1. 27 Apr, 2016 11 commits
    • Rick Hudson's avatar
      [dev.garbage] runtime: add gc work buffer tryGet and put fast paths · 1354b32c
      Rick Hudson authored
      The complexity of the GC work buffers put and tryGet
      prevented them from being inlined. This CL simplifies
      the fast path thus enabling inlining. If the fast
      path does not succeed the previous put and tryGet
      functions are called.
      
      Change-Id: I6da6495d0dadf42bd0377c110b502274cc01acf5
      Reviewed-on: https://go-review.googlesource.com/20704Reviewed-by: default avatarAustin Clements <austin@google.com>
      1354b32c
    • Rick Hudson's avatar
      [dev.garbage] runtime: cleanup and optimize span.base() · f8d0d4fd
      Rick Hudson authored
      Prior to this CL the base of a span was calculated in various
      places using shifts or calls to base(). This CL now
      always calls base() which has been optimized to calculate the
      base of the span when the span is initialized and store that
      value in the span structure.
      
      Change-Id: I661f2bfa21e3748a249cdf049ef9062db6e78100
      Reviewed-on: https://go-review.googlesource.com/20703Reviewed-by: default avatarAustin Clements <austin@google.com>
      f8d0d4fd
    • Rick Hudson's avatar
      [dev.garbage] runtime: remove heapBitsSweepSpan · 8dda1c4c
      Rick Hudson authored
      Prior to this CL the sweep phase was responsible for locating
      all objects that were about to be freed and calling a function
      to process the object. This was done by the function
      heapBitsSweepSpan. Part of processing included calls to
      tracefree and msanfree as well as counting how many objects
      were freed.
      
      The calls to tracefree and msanfree have been moved into the
      gcmalloc routine and called when the object is about to be
      reallocated. The counting of free objects has been optimized
      using an array based popcnt algorithm and if all the objects
      in a span are free then span is freed.
      
      Similarly the code to locate the next free object has been
      optimized to use an array based ctz (count trailing zero).
      Various hot paths in the allocation logic have been optimized.
      
      At this point the garbage benchmark is within 3% of the 1.6
      release.
      
      Change-Id: I00643c442e2ada1685c010c3447e4ea8537d2dfa
      Reviewed-on: https://go-review.googlesource.com/20201Reviewed-by: default avatarAustin Clements <austin@google.com>
      8dda1c4c
    • Rick Hudson's avatar
      [dev.garbage] runtime: add bit and cache ctz64 (count trailing zero) · 40934815
      Rick Hudson authored
      Add to each span a 64 bit cache (allocCache) of the allocBits
      at freeindex. allocCache is shifted such that the lowest bit
      corresponds to the bit freeindex. allocBits uses a 0 to
      indicate an object is free, on the other hand allocCache
      uses a 1 to indicate an object is free. This facilitates
      ctz64 (count trailing zero) which counts the number of 0s
      trailing the least significant 1. This is also the index of
      the least significant 1.
      
      Each span maintains a freeindex indicating the boundary
      between allocated objects and unallocated objects. allocCache
      is shifted as freeindex is incremented such that the low bit
      in allocCache corresponds to the bit a freeindex in the
      allocBits array.
      
      Currently ctz64 is written in Go using a for loop so it is
      not very efficient. Use of the hardware instruction will
      follow. With this in mind comparisons of the garbage
      benchmark are as follows.
      
      1.6 release        2.8 seconds
      dev:garbage branch 3.1 seconds.
      
      Profiling shows the go implementation of ctz64 takes up
      1% of the total time.
      
      Change-Id: If084ed9c3b1eda9f3c6ab2e794625cb870b8167f
      Reviewed-on: https://go-review.googlesource.com/20200Reviewed-by: default avatarAustin Clements <austin@google.com>
      40934815
    • Rick Hudson's avatar
      [dev.garbage] runtime: logic that uses count trailing zero (ctz) · 44fe90d0
      Rick Hudson authored
      Most (all?) processors that Go supports supply a hardware
      instruction that takes a byte and returns the number
      of zeros trailing the first 1 encountered, or 8
      if no ones are found. This is the index within the
      byte of the first 1 encountered. CTZ should improve the
      performance of the nextFreeIndex function.
      
      Since nextFreeIndex wants the next unmarked (0) bit
      a bit-wise complement is needed before calling ctz.
      Furthermore unmarked bits associated with previously
      allocated objects need to be ignored. Instead of writing
      a 1 as we allocate the code masks all bits less than the
      freeindex after loading the byte.
      
      While this CL does not actual execute a CTZ instruction
      it supplies a ctz function with the appropiate signature
      along with the logic to execute it.
      
      Change-Id: I5c55ce0ed48ca22c21c4dd9f969b0819b4eadaa7
      Reviewed-on: https://go-review.googlesource.com/20169Reviewed-by: default avatarKeith Randall <khr@golang.org>
      Reviewed-by: default avatarAustin Clements <austin@google.com>
      44fe90d0
    • Rick Hudson's avatar
      [dev.garbage] runtime: replace ref with allocCount · e4ac2d4a
      Rick Hudson authored
      This is a renaming of the field ref to the
      more appropriate allocCount. The field
      holds the number of objects in the span
      that are currently allocated. Some throws
      strings were adjusted to more accurately
      convey the meaning of allocCount.
      
      Change-Id: I10daf44e3e9cc24a10912638c7de3c1984ef8efe
      Reviewed-on: https://go-review.googlesource.com/19518Reviewed-by: default avatarAustin Clements <austin@google.com>
      e4ac2d4a
    • Rick Hudson's avatar
      [dev.garbage] runtime: allocate directly from GC mark bits · 3479b065
      Rick Hudson authored
      Instead of building a freelist from the mark bits generated
      by the GC this CL allocates directly from the mark bits.
      
      The approach moves the mark bits from the pointer/no pointer
      heap structures into their own per span data structures. The
      mark/allocation vectors consist of a single mark bit per
      object. Two vectors are maintained, one for allocation and
      one for the GC's mark phase. During the GC cycle's sweep
      phase the interpretation of the vectors is swapped. The
      mark vector becomes the allocation vector and the old
      allocation vector is cleared and becomes the mark vector that
      the next GC cycle will use.
      
      Marked entries in the allocation vector indicate that the
      object is not free. Each allocation vector maintains a boundary
      between areas of the span already allocated from and areas
      not yet allocated from. As objects are allocated this boundary
      is moved until it reaches the end of the span. At this point
      further allocations will be done from another span.
      
      Since we no longer sweep a span inspecting each freed object
      the responsibility for maintaining pointer/scalar bits in
      the heapBitMap containing is now the responsibility of the
      the routines doing the actual allocation.
      
      This CL is functionally complete and ready for performance
      tuning.
      
      Change-Id: I336e0fc21eef1066e0b68c7067cc71b9f3d50e04
      Reviewed-on: https://go-review.googlesource.com/19470Reviewed-by: default avatarAustin Clements <austin@google.com>
      3479b065
    • Rick Hudson's avatar
      [dev.garbage] runtime: mark/allocation helper functions · dc65a82e
      Rick Hudson authored
      The gcmarkBits is a bit vector used by the GC to mark
      reachable objects. Once a GC cycle is complete the gcmarkBits
      swap places with the allocBits. allocBits is then used directly
      by malloc to locate free objects, thus avoiding the
      construction of a linked free list. This CL introduces a set
      of helper functions for manipulating gcmarkBits and allocBits
      that will be used by later CLs to realize the actual
      algorithm. Minimal attempts have been made to optimize these
      helper routines.
      
      Change-Id: I55ad6240ca32cd456e8ed4973c6970b3b882dd34
      Reviewed-on: https://go-review.googlesource.com/19420Reviewed-by: default avatarAustin Clements <austin@google.com>
      Run-TryBot: Rick Hudson <rlh@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      dc65a82e
    • Rick Hudson's avatar
      [dev.garbage] runtime: refactor next free object · e1c4e9a7
      Rick Hudson authored
      In preparation for changing how the next free object is chosen
      refactor and consolidate code into a single function.
      
      Change-Id: I6836cd88ed7cbf0b2df87abd7c1c3b9fabc1cbd8
      Reviewed-on: https://go-review.googlesource.com/19317Reviewed-by: default avatarAustin Clements <austin@google.com>
      e1c4e9a7
    • Rick Hudson's avatar
      [dev.garbage] runtime: add stackfreelist · aed86103
      Rick Hudson authored
      The freelist for normal objects and the freelist
      for stacks share the same mspan field for holding
      the list head but are operated on by different code
      sequences. This overloading complicates the use of bit
      vectors for allocation of normal objects. This change
      refactors the use of the stackfreelist out from the
      use of freelist.
      
      Change-Id: I5b155b5b8a1fcd8e24c12ee1eb0800ad9b6b4fa0
      Reviewed-on: https://go-review.googlesource.com/19315Reviewed-by: default avatarAustin Clements <austin@google.com>
      aed86103
    • Rick Hudson's avatar
      [dev.garbage] runtime: bitmap allocation data structs · 2ac8bdc5
      Rick Hudson authored
      The bitmap allocation data structure prototypes. Before
      this is released these underlying data structures need
      to be more performant but the signatures of helper
      functions utilizing these structures will remain stable.
      
      Change-Id: I5ace12f2fb512a7038a52bbde2bfb7e98783bcbe
      Reviewed-on: https://go-review.googlesource.com/19221Reviewed-by: default avatarAustin Clements <austin@google.com>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      2ac8bdc5
  2. 05 Apr, 2016 25 commits
    • Austin Clements's avatar
      [dev.garbage] Merge branch 'master' into dev.garbage · 6d6e1600
      Austin Clements authored
      Change-Id: I47ac4112befc07d3674d7a88827227199edd93b4
      6d6e1600
    • Josh Bleecher Snyder's avatar
      cmd/compile: pull ssa OAPPEND expression handing into its own function · 5e1b7bde
      Josh Bleecher Snyder authored
      Pure code movement.
      
      Change-Id: Ia07ee0b0041c931b08adf090f262a6f74a6fdb01
      Reviewed-on: https://go-review.googlesource.com/21546
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      Reviewed-by: default avatarBrad Fitzpatrick <bradfitz@golang.org>
      5e1b7bde
    • Josh Bleecher Snyder's avatar
      cmd/compile: give TLS relocations a name when dumping assembly · 7735dfb6
      Josh Bleecher Snyder authored
      Before:
      
      	...
      	0x00d0 ff ff ff e8 00 00 00 00 e9 23 ff ff ff cc cc cc  .........#......
      	rel 5+4 t=14 +0
      	rel 82+4 t=13 runtime.writeBarrier+0
      	...
      
      After:
      
      	...
      	0x00d0 ff ff ff e8 00 00 00 00 e9 23 ff ff ff cc cc cc  .........#......
      	rel 5+4 t=14 TLS+0
      	rel 82+4 t=13 runtime.writeBarrier+0
      	...
      
      Change-Id: Ibdaf694581b5fd5fb87fa8ce6a792f3eb4493622
      Reviewed-on: https://go-review.googlesource.com/21545Reviewed-by: default avatarBrad Fitzpatrick <bradfitz@golang.org>
      7735dfb6
    • Joe Tsai's avatar
      os: deprecate os.SEEK_SET, os.SEEK_CUR, and os.SEEK_END · 260ea689
      Joe Tsai authored
      CL/19862 introduced the same set of constants to the io package.
      We should steer users away from the os.SEEK* versions and towards
      the io.Seek* versions.
      
      Updates #6885
      
      Change-Id: I96ec5be3ec3439e1295c937159dadaf1ebfb2737
      Reviewed-on: https://go-review.googlesource.com/21540Reviewed-by: default avatarBrad Fitzpatrick <bradfitz@golang.org>
      260ea689
    • Robert Griesemer's avatar
      go/importer: match predeclared type list with gc's list in binary exporter · f79b50b8
      Robert Griesemer authored
      I think we had this code before but it may have gone lost somehow.
      
      Change-Id: Ifde490e686de0d2bfe907cbe19c9197f24f5fa8e
      Reviewed-on: https://go-review.googlesource.com/21537Reviewed-by: default avatarMatthew Dempsky <mdempsky@google.com>
      f79b50b8
    • David Chase's avatar
      cmd/compile: note escape of parts of closured-capture vars · d8c815d8
      David Chase authored
      Missed a case for closure calls (OCALLFUNC && indirect) in
      esc.go:esccall.
      
      Cleanup to runtime code for windows to more thoroughly hide
      a technical escape.  Also made code pickier about failing
      to late non-optional kernel32.dll.
      
      Fixes #14409.
      
      Change-Id: Ie75486a2c8626c4583224e02e4872c2875f7bca5
      Reviewed-on: https://go-review.googlesource.com/20102
      Run-TryBot: David Chase <drchase@google.com>
      Reviewed-by: default avatarBrad Fitzpatrick <bradfitz@golang.org>
      d8c815d8
    • Robert Griesemer's avatar
      crypto/dsa: eliminate invalid PublicKey early · eb876dd8
      Robert Griesemer authored
      For PublicKey.P == 0, Verify will fail. Don't even try.
      
      Change-Id: I1009f2b3dead8d0041626c946633acb10086d8c8
      Reviewed-on: https://go-review.googlesource.com/21533Reviewed-by: default avatarBrad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      eb876dd8
    • Brad Fitzpatrick's avatar
      doc: add httptest.ResponseRecorder note to go1.7.txt notes · b9531d31
      Brad Fitzpatrick authored
      Fixes #14928
      
      Change-Id: Id772eb623815cb2bb3e49de68a916762345a9dc1
      Reviewed-on: https://go-review.googlesource.com/21531Reviewed-by: default avatarBrad Fitzpatrick <bradfitz@golang.org>
      b9531d31
    • Dmitry Vyukov's avatar
      runtime: don't burn CPU unnecessarily · 475d113b
      Dmitry Vyukov authored
      Two GC-related functions, scang and casgstatus, wait in an active spin loop.
      Active spinning is never a good idea in user-space. Once we wait several
      times more than the expected wait time, something unexpected is happenning
      (e.g. the thread we are waiting for is descheduled or handling a page fault)
      and we need to yield to OS scheduler. Moreover, the expected wait time is
      very high for these functions: scang wait time can be tens of milliseconds,
      casgstatus can be hundreds of microseconds. It does not make sense to spin
      even for that time.
      
      go install -a std profile on a 4-core machine shows that 11% of time is spent
      in the active spin in scang:
      
        6.12%    compile  compile                [.] runtime.scang
        3.27%    compile  compile                [.] runtime.readgstatus
        1.72%    compile  compile                [.] runtime/internal/atomic.Load
      
      The active spin also increases tail latency in the case of the slightest
      oversubscription: GC goroutines spend whole quantum in the loop instead of
      executing user code.
      
      Here is scang wait time histogram during go install -a std:
      
      13707.0000 - 1815442.7667 [   118]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎...
      1815442.7667 - 3617178.5333 [     9]: ∎∎∎∎∎∎∎∎∎
      3617178.5333 - 5418914.3000 [    11]: ∎∎∎∎∎∎∎∎∎∎∎
      5418914.3000 - 7220650.0667 [     5]: ∎∎∎∎∎
      7220650.0667 - 9022385.8333 [    12]: ∎∎∎∎∎∎∎∎∎∎∎∎
      9022385.8333 - 10824121.6000 [    13]: ∎∎∎∎∎∎∎∎∎∎∎∎∎
      10824121.6000 - 12625857.3667 [    15]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
      12625857.3667 - 14427593.1333 [    18]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
      14427593.1333 - 16229328.9000 [    18]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
      16229328.9000 - 18031064.6667 [    32]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
      18031064.6667 - 19832800.4333 [    28]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
      19832800.4333 - 21634536.2000 [     6]: ∎∎∎∎∎∎
      21634536.2000 - 23436271.9667 [    15]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
      23436271.9667 - 25238007.7333 [    11]: ∎∎∎∎∎∎∎∎∎∎∎
      25238007.7333 - 27039743.5000 [    27]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
      27039743.5000 - 28841479.2667 [    20]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
      28841479.2667 - 30643215.0333 [    10]: ∎∎∎∎∎∎∎∎∎∎
      30643215.0333 - 32444950.8000 [     7]: ∎∎∎∎∎∎∎
      32444950.8000 - 34246686.5667 [     4]: ∎∎∎∎
      34246686.5667 - 36048422.3333 [     4]: ∎∎∎∎
      36048422.3333 - 37850158.1000 [     1]: ∎
      37850158.1000 - 39651893.8667 [     5]: ∎∎∎∎∎
      39651893.8667 - 41453629.6333 [     2]: ∎∎
      41453629.6333 - 43255365.4000 [     2]: ∎∎
      43255365.4000 - 45057101.1667 [     2]: ∎∎
      45057101.1667 - 46858836.9333 [     1]: ∎
      46858836.9333 - 48660572.7000 [     2]: ∎∎
      48660572.7000 - 50462308.4667 [     3]: ∎∎∎
      50462308.4667 - 52264044.2333 [     2]: ∎∎
      52264044.2333 - 54065780.0000 [     2]: ∎∎
      
      and the zoomed-in first part:
      
      13707.0000 - 19916.7667 [     2]: ∎∎
      19916.7667 - 26126.5333 [     2]: ∎∎
      26126.5333 - 32336.3000 [     9]: ∎∎∎∎∎∎∎∎∎
      32336.3000 - 38546.0667 [     8]: ∎∎∎∎∎∎∎∎
      38546.0667 - 44755.8333 [    12]: ∎∎∎∎∎∎∎∎∎∎∎∎
      44755.8333 - 50965.6000 [    10]: ∎∎∎∎∎∎∎∎∎∎
      50965.6000 - 57175.3667 [     5]: ∎∎∎∎∎
      57175.3667 - 63385.1333 [     6]: ∎∎∎∎∎∎
      63385.1333 - 69594.9000 [     5]: ∎∎∎∎∎
      69594.9000 - 75804.6667 [     6]: ∎∎∎∎∎∎
      75804.6667 - 82014.4333 [     6]: ∎∎∎∎∎∎
      82014.4333 - 88224.2000 [     4]: ∎∎∎∎
      88224.2000 - 94433.9667 [     1]: ∎
      94433.9667 - 100643.7333 [     1]: ∎
      100643.7333 - 106853.5000 [     2]: ∎∎
      106853.5000 - 113063.2667 [     0]:
      113063.2667 - 119273.0333 [     2]: ∎∎
      119273.0333 - 125482.8000 [     2]: ∎∎
      125482.8000 - 131692.5667 [     1]: ∎
      131692.5667 - 137902.3333 [     1]: ∎
      137902.3333 - 144112.1000 [     0]:
      144112.1000 - 150321.8667 [     2]: ∎∎
      150321.8667 - 156531.6333 [     1]: ∎
      156531.6333 - 162741.4000 [     1]: ∎
      162741.4000 - 168951.1667 [     0]:
      168951.1667 - 175160.9333 [     0]:
      175160.9333 - 181370.7000 [     1]: ∎
      181370.7000 - 187580.4667 [     1]: ∎
      187580.4667 - 193790.2333 [     2]: ∎∎
      193790.2333 - 200000.0000 [     0]:
      
      Here is casgstatus wait time histogram:
      
        631.0000 -  5276.6333 [     3]: ∎∎∎
       5276.6333 -  9922.2667 [     5]: ∎∎∎∎∎
       9922.2667 - 14567.9000 [     2]: ∎∎
      14567.9000 - 19213.5333 [     6]: ∎∎∎∎∎∎
      19213.5333 - 23859.1667 [     5]: ∎∎∎∎∎
      23859.1667 - 28504.8000 [     6]: ∎∎∎∎∎∎
      28504.8000 - 33150.4333 [     6]: ∎∎∎∎∎∎
      33150.4333 - 37796.0667 [     2]: ∎∎
      37796.0667 - 42441.7000 [     1]: ∎
      42441.7000 - 47087.3333 [     3]: ∎∎∎
      47087.3333 - 51732.9667 [     0]:
      51732.9667 - 56378.6000 [     1]: ∎
      56378.6000 - 61024.2333 [     0]:
      61024.2333 - 65669.8667 [     0]:
      65669.8667 - 70315.5000 [     0]:
      70315.5000 - 74961.1333 [     1]: ∎
      74961.1333 - 79606.7667 [     0]:
      79606.7667 - 84252.4000 [     0]:
      84252.4000 - 88898.0333 [     0]:
      88898.0333 - 93543.6667 [     0]:
      93543.6667 - 98189.3000 [     0]:
      98189.3000 - 102834.9333 [     0]:
      102834.9333 - 107480.5667 [     1]: ∎
      107480.5667 - 112126.2000 [     0]:
      112126.2000 - 116771.8333 [     0]:
      116771.8333 - 121417.4667 [     0]:
      121417.4667 - 126063.1000 [     0]:
      126063.1000 - 130708.7333 [     0]:
      130708.7333 - 135354.3667 [     0]:
      135354.3667 - 140000.0000 [     1]: ∎
      
      Ideally we eliminate the waiting by switching to async
      state machine for GC, but for now just yield to OS scheduler
      after a reasonable wait time.
      
      To choose yielding parameters I've measured
      golang.org/x/benchmarks/http tail latencies with different yield
      delays and oversubscription levels.
      
      With no oversubscription (to the degree possible):
      
      scang yield delay = 1, casgstatus yield delay = 1
      Latency-50   1.41ms ±15%  1.41ms ± 5%    ~     (p=0.611 n=13+12)
      Latency-95   5.21ms ± 2%  5.15ms ± 2%  -1.15%  (p=0.012 n=13+13)
      Latency-99   7.16ms ± 2%  7.05ms ± 2%  -1.54%  (p=0.002 n=13+13)
      Latency-999  10.7ms ± 9%  10.2ms ±10%  -5.46%  (p=0.004 n=12+13)
      
      scang yield delay = 5000, casgstatus yield delay = 3000
      Latency-50   1.41ms ±15%  1.41ms ± 8%    ~     (p=0.511 n=13+13)
      Latency-95   5.21ms ± 2%  5.14ms ± 2%  -1.23%  (p=0.006 n=13+13)
      Latency-99   7.16ms ± 2%  7.02ms ± 2%  -1.94%  (p=0.000 n=13+13)
      Latency-999  10.7ms ± 9%  10.1ms ± 8%  -6.14%  (p=0.000 n=12+13)
      
      scang yield delay = 10000, casgstatus yield delay = 5000
      Latency-50   1.41ms ±15%  1.45ms ± 6%    ~     (p=0.724 n=13+13)
      Latency-95   5.21ms ± 2%  5.18ms ± 1%    ~     (p=0.287 n=13+13)
      Latency-99   7.16ms ± 2%  7.05ms ± 2%  -1.64%  (p=0.002 n=13+13)
      Latency-999  10.7ms ± 9%  10.0ms ± 5%  -6.72%  (p=0.000 n=12+13)
      
      scang yield delay = 30000, casgstatus yield delay = 10000
      Latency-50   1.41ms ±15%  1.51ms ± 7%  +6.57%  (p=0.002 n=13+13)
      Latency-95   5.21ms ± 2%  5.21ms ± 2%    ~     (p=0.960 n=13+13)
      Latency-99   7.16ms ± 2%  7.06ms ± 2%  -1.50%  (p=0.012 n=13+13)
      Latency-999  10.7ms ± 9%  10.0ms ± 6%  -6.49%  (p=0.000 n=12+13)
      
      scang yield delay = 100000, casgstatus yield delay = 50000
      Latency-50   1.41ms ±15%  1.53ms ± 6%  +8.48%  (p=0.000 n=13+12)
      Latency-95   5.21ms ± 2%  5.23ms ± 2%    ~     (p=0.287 n=13+13)
      Latency-99   7.16ms ± 2%  7.08ms ± 2%  -1.21%  (p=0.004 n=13+13)
      Latency-999  10.7ms ± 9%   9.9ms ± 3%  -7.99%  (p=0.000 n=12+12)
      
      scang yield delay = 200000, casgstatus yield delay = 100000
      Latency-50   1.41ms ±15%  1.47ms ± 5%    ~     (p=0.072 n=13+13)
      Latency-95   5.21ms ± 2%  5.17ms ± 2%    ~     (p=0.091 n=13+13)
      Latency-99   7.16ms ± 2%  7.02ms ± 2%  -1.99%  (p=0.000 n=13+13)
      Latency-999  10.7ms ± 9%   9.9ms ± 5%  -7.86%  (p=0.000 n=12+13)
      
      With slight oversubscription (another instance of http benchmark
      was running in background with reduced GOMAXPROCS):
      
      scang yield delay = 1, casgstatus yield delay = 1
      Latency-50    840µs ± 3%   804µs ± 3%  -4.37%  (p=0.000 n=15+18)
      Latency-95   6.52ms ± 4%  6.03ms ± 4%  -7.51%  (p=0.000 n=18+18)
      Latency-99   10.8ms ± 7%  10.0ms ± 4%  -7.33%  (p=0.000 n=18+14)
      Latency-999  18.0ms ± 9%  16.8ms ± 7%  -6.84%  (p=0.000 n=18+18)
      
      scang yield delay = 5000, casgstatus yield delay = 3000
      Latency-50    840µs ± 3%   809µs ± 3%  -3.71%  (p=0.000 n=15+17)
      Latency-95   6.52ms ± 4%  6.11ms ± 4%  -6.29%  (p=0.000 n=18+18)
      Latency-99   10.8ms ± 7%   9.9ms ± 6%  -7.55%  (p=0.000 n=18+18)
      Latency-999  18.0ms ± 9%  16.5ms ±11%  -8.49%  (p=0.000 n=18+18)
      
      scang yield delay = 10000, casgstatus yield delay = 5000
      Latency-50    840µs ± 3%   823µs ± 5%  -2.06%  (p=0.002 n=15+18)
      Latency-95   6.52ms ± 4%  6.32ms ± 3%  -3.05%  (p=0.000 n=18+18)
      Latency-99   10.8ms ± 7%  10.2ms ± 4%  -5.22%  (p=0.000 n=18+18)
      Latency-999  18.0ms ± 9%  16.7ms ±10%  -7.09%  (p=0.000 n=18+18)
      
      scang yield delay = 30000, casgstatus yield delay = 10000
      Latency-50    840µs ± 3%   836µs ± 5%    ~     (p=0.442 n=15+18)
      Latency-95   6.52ms ± 4%  6.39ms ± 3%  -2.00%  (p=0.000 n=18+18)
      Latency-99   10.8ms ± 7%  10.2ms ± 6%  -5.15%  (p=0.000 n=18+17)
      Latency-999  18.0ms ± 9%  16.6ms ± 8%  -7.48%  (p=0.000 n=18+18)
      
      scang yield delay = 100000, casgstatus yield delay = 50000
      Latency-50    840µs ± 3%   836µs ± 6%    ~     (p=0.401 n=15+18)
      Latency-95   6.52ms ± 4%  6.40ms ± 4%  -1.79%  (p=0.010 n=18+18)
      Latency-99   10.8ms ± 7%  10.2ms ± 5%  -4.95%  (p=0.000 n=18+18)
      Latency-999  18.0ms ± 9%  16.5ms ±14%  -8.17%  (p=0.000 n=18+18)
      
      scang yield delay = 200000, casgstatus yield delay = 100000
      Latency-50    840µs ± 3%   828µs ± 2%  -1.49%  (p=0.001 n=15+17)
      Latency-95   6.52ms ± 4%  6.38ms ± 4%  -2.04%  (p=0.001 n=18+18)
      Latency-99   10.8ms ± 7%  10.2ms ± 4%  -4.77%  (p=0.000 n=18+18)
      Latency-999  18.0ms ± 9%  16.9ms ± 9%  -6.23%  (p=0.000 n=18+18)
      
      With significant oversubscription (background http benchmark
      was running with full GOMAXPROCS):
      
      scang yield delay = 1, casgstatus yield delay = 1
      Latency-50   1.32ms ±12%  1.30ms ±13%    ~     (p=0.454 n=14+14)
      Latency-95   16.3ms ±10%  15.3ms ± 7%  -6.29%  (p=0.001 n=14+14)
      Latency-99   29.4ms ±10%  27.9ms ± 5%  -5.04%  (p=0.001 n=14+12)
      Latency-999  49.9ms ±19%  45.9ms ± 5%  -8.00%  (p=0.008 n=14+13)
      
      scang yield delay = 5000, casgstatus yield delay = 3000
      Latency-50   1.32ms ±12%  1.29ms ± 9%    ~     (p=0.227 n=14+14)
      Latency-95   16.3ms ±10%  15.4ms ± 5%  -5.27%  (p=0.002 n=14+14)
      Latency-99   29.4ms ±10%  27.9ms ± 6%  -5.16%  (p=0.001 n=14+14)
      Latency-999  49.9ms ±19%  46.8ms ± 8%  -6.21%  (p=0.050 n=14+14)
      
      scang yield delay = 10000, casgstatus yield delay = 5000
      Latency-50   1.32ms ±12%  1.35ms ± 9%     ~     (p=0.401 n=14+14)
      Latency-95   16.3ms ±10%  15.0ms ± 4%   -7.67%  (p=0.000 n=14+14)
      Latency-99   29.4ms ±10%  27.4ms ± 5%   -6.98%  (p=0.000 n=14+14)
      Latency-999  49.9ms ±19%  44.7ms ± 5%  -10.56%  (p=0.000 n=14+11)
      
      scang yield delay = 30000, casgstatus yield delay = 10000
      Latency-50   1.32ms ±12%  1.36ms ±10%     ~     (p=0.246 n=14+14)
      Latency-95   16.3ms ±10%  14.9ms ± 5%   -8.31%  (p=0.000 n=14+14)
      Latency-99   29.4ms ±10%  27.4ms ± 7%   -6.70%  (p=0.000 n=14+14)
      Latency-999  49.9ms ±19%  44.9ms ±15%  -10.13%  (p=0.003 n=14+14)
      
      scang yield delay = 100000, casgstatus yield delay = 50000
      Latency-50   1.32ms ±12%  1.41ms ± 9%  +6.37%  (p=0.008 n=14+13)
      Latency-95   16.3ms ±10%  15.1ms ± 8%  -7.45%  (p=0.000 n=14+14)
      Latency-99   29.4ms ±10%  27.5ms ±12%  -6.67%  (p=0.002 n=14+14)
      Latency-999  49.9ms ±19%  45.9ms ±16%  -8.06%  (p=0.019 n=14+14)
      
      scang yield delay = 200000, casgstatus yield delay = 100000
      Latency-50   1.32ms ±12%  1.42ms ±10%   +7.21%  (p=0.003 n=14+14)
      Latency-95   16.3ms ±10%  15.0ms ± 7%   -7.59%  (p=0.000 n=14+14)
      Latency-99   29.4ms ±10%  27.3ms ± 8%   -7.20%  (p=0.000 n=14+14)
      Latency-999  49.9ms ±19%  44.8ms ± 8%  -10.21%  (p=0.001 n=14+13)
      
      All numbers are on 8 cores and with GOGC=10 (http benchmark has
      tiny heap, few goroutines and low allocation rate, so by default
      GC barely affects tail latency).
      
      10us/5us yield delays seem to provide a reasonable compromise
      and give 5-10% tail latency reduction. That's what used in this change.
      
      go install -a std results on 4 core machine:
      
      name      old time/op  new time/op  delta
      Time       8.39s ± 2%   7.94s ± 2%  -5.34%  (p=0.000 n=47+49)
      UserTime   24.6s ± 2%   22.9s ± 2%  -6.76%  (p=0.000 n=49+49)
      SysTime    1.77s ± 9%   1.89s ±11%  +7.00%  (p=0.000 n=49+49)
      CpuLoad    315ns ± 2%   313ns ± 1%  -0.59%  (p=0.000 n=49+48) # %CPU
      MaxRSS    97.1ms ± 4%  97.5ms ± 9%    ~     (p=0.838 n=46+49) # bytes
      
      Update #14396
      Update #14189
      
      Change-Id: I3f4109bf8f7fd79b39c466576690a778232055a2
      Reviewed-on: https://go-review.googlesource.com/21503
      Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarRick Hudson <rlh@golang.org>
      Reviewed-by: default avatarAustin Clements <austin@google.com>
      475d113b
    • Dmitry Vyukov's avatar
      runtime: sleep less when we can do work · 3b246fa8
      Dmitry Vyukov authored
      Usleep(100) in runqgrab negatively affects latency and throughput
      of parallel application. We are sleeping instead of doing useful work.
      This is effect is particularly visible on windows where minimal
      sleep duration is 1-15ms.
      
      Reduce sleep from 100us to 3us and use osyield on windows.
      Sync chan send/recv takes ~50ns, so 3us gives us ~50x overshoot.
      
      benchmark                    old ns/op     new ns/op     delta
      BenchmarkChanSync-12         216           217           +0.46%
      BenchmarkChanSyncWork-12     27213         25816         -5.13%
      
      CPU consumption goes up from 106% to 108% in the first case,
      and from 107% to 125% in the second case.
      
      Test case from #14790 on windows:
      
      BenchmarkDefaultResolution-8  4583372   29720    -99.35%
      Benchmark1ms-8                992056    30701    -96.91%
      
      99-th latency percentile for HTTP request serving is improved by up to 15%
      (see http://golang.org/cl/20835 for details).
      
      The following benchmarks are from the change that originally added this sleep
      (see https://golang.org/s/go15gomaxprocs):
      
      name        old time/op  new time/op  delta
      Chain       22.6µs ± 2%  22.7µs ± 6%    ~      (p=0.905 n=9+10)
      ChainBuf    22.4µs ± 3%  22.5µs ± 4%    ~      (p=0.780 n=9+10)
      Chain-2     23.5µs ± 4%  24.9µs ± 1%  +5.66%   (p=0.000 n=10+9)
      ChainBuf-2  23.7µs ± 1%  24.4µs ± 1%  +3.31%   (p=0.000 n=9+10)
      Chain-4     24.2µs ± 2%  25.1µs ± 3%  +3.70%   (p=0.000 n=9+10)
      ChainBuf-4  24.4µs ± 5%  25.0µs ± 2%  +2.37%  (p=0.023 n=10+10)
      Powser       2.37s ± 1%   2.37s ± 1%    ~       (p=0.423 n=8+9)
      Powser-2     2.48s ± 2%   2.57s ± 2%  +3.74%   (p=0.000 n=10+9)
      Powser-4     2.66s ± 1%   2.75s ± 1%  +3.40%  (p=0.000 n=10+10)
      Sieve        13.3s ± 2%   13.3s ± 2%    ~      (p=1.000 n=10+9)
      Sieve-2      7.00s ± 2%   7.44s ±16%    ~      (p=0.408 n=8+10)
      Sieve-4      4.13s ±21%   3.85s ±22%    ~       (p=0.113 n=9+9)
      
      Fixes #14790
      
      Change-Id: Ie7c6a1c4f9c8eb2f5d65ab127a3845386d6f8b5d
      Reviewed-on: https://go-review.googlesource.com/20835Reviewed-by: default avatarAustin Clements <austin@google.com>
      3b246fa8
    • Ilya Tocar's avatar
      cmd/compile/internal/amd64: Use 32-bit operands for byte operations · 036d09d5
      Ilya Tocar authored
      We already generate ADDL for byte operations, reflect this in code.
      This also allows inc/dec for +-1 operation, which are 1-byte shorter,
      and enables lea for 3-operand addition/subtraction.
      
      Change-Id: Ibfdfee50667ca4cd3c28f72e3dece0c6d114d3ae
      Reviewed-on: https://go-review.googlesource.com/21251Reviewed-by: default avatarKeith Randall <khr@golang.org>
      Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      036d09d5
    • Augusto Roman's avatar
      encoding/json: allow non-string type keys for (un-)marshal · ffbd31e9
      Augusto Roman authored
      This CL allows JSON-encoding & -decoding maps whose keys are types that
      implement encoding.TextMarshaler / TextUnmarshaler.
      
      During encode, the map keys are marshaled upfront so that they can be
      sorted.
      
      Fixes #12146
      
      Change-Id: I43809750a7ad82a3603662f095c7baf75fd172da
      Reviewed-on: https://go-review.googlesource.com/20356
      Run-TryBot: Caleb Spare <cespare@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarBrad Fitzpatrick <bradfitz@golang.org>
      ffbd31e9
    • Eric Lagergren's avatar
      io: define SeekStart, SeekCurrent, SeekEnd constants for use with Seeker · acefcb73
      Eric Lagergren authored
      Fixes #6885
      
      Change-Id: I6907958186f6a2427da1ad2f6c20bd5d7bf7a3f9
      Reviewed-on: https://go-review.googlesource.com/19862Reviewed-by: default avatarBrad Fitzpatrick <bradfitz@golang.org>
      acefcb73
    • Alexandru Moșoi's avatar
      cmd/compile: add a pass to print bound checks · 1747788c
      Alexandru Moșoi authored
      Since BCE happens over several passes (opt, loopbce, prove)
      it's easy to regress especially with rewriting.
      
      The pass is only activated with special debug flag.
      
      Change-Id: I46205982e7a2751156db8e875d69af6138068f59
      Reviewed-on: https://go-review.googlesource.com/21510
      Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarDavid Chase <drchase@google.com>
      1747788c
    • Brad Fitzpatrick's avatar
      net/http: zero pad Response status codes to three digits · 3bbede0c
      Brad Fitzpatrick authored
      Go 1.6's HTTP/1.x Transport started enforcing that responses have 3
      status digits, per the spec, but we could still write out invalid
      status codes ourselves if the called
      ResponseWriter.WriteHeader(0). That is bogus anyway, since the minimum
      status code is 1xx, but be a little bit less bogus (and consistent)
      and zero pad our responses.
      
      Change-Id: I6883901fd95073cb72f6b74035cabf1a79c35e1c
      Reviewed-on: https://go-review.googlesource.com/19130
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarAndrew Gerrand <adg@golang.org>
      3bbede0c
    • Dave Cheney's avatar
      cmd/compile/internal/ssa: hide gen packge from ./make.bash · 7208a2cd
      Dave Cheney authored
      Fixes #15122
      
      Change-Id: Ie2c802d78aea731e25bf4b193b3c2e4c884e0573
      Reviewed-on: https://go-review.googlesource.com/21524
      Run-TryBot: Dave Cheney <dave@cheney.net>
      Reviewed-by: default avatarBrad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      7208a2cd
    • David Symonds's avatar
      expvar: document that the Var interface's String method should return a valid JSON value. · e8f01e5c
      David Symonds authored
      Change-Id: If4e740f3dbef4053355542eebdd899b3099d872c
      Reviewed-on: https://go-review.googlesource.com/21525Reviewed-by: default avatarAndrew Gerrand <adg@golang.org>
      e8f01e5c
    • Paul Marks's avatar
      net: wait for cancelation goroutine before returning from connect. · 869e5765
      Paul Marks authored
      This fixes a race which made it possible to cancel a connection after
      returning from net.Dial.
      
      Fixes #15035
      Fixes #15078
      
      Change-Id: Iec6215009538362f7ad9f408a33549f3e94d1606
      Reviewed-on: https://go-review.googlesource.com/21497Reviewed-by: default avatarBrad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      869e5765
    • Alex Brainman's avatar
      cmd/go: leave directory before removing it in TestSharedLibName · bbbd572c
      Alex Brainman authored
      Fixes #15124
      
      Change-Id: I55fe4c2957370f3fb417c3df54f99fb085a5dada
      Reviewed-on: https://go-review.googlesource.com/21522Reviewed-by: default avatarBrad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      bbbd572c
    • Alex Brainman's avatar
      debug/gosym: do not forget to close test binay file handle in TestPCLine · 31f2bb4b
      Alex Brainman authored
      Fixes #15121
      
      Change-Id: I651521743c56244c55eda5762905889d7e06887a
      Reviewed-on: https://go-review.googlesource.com/21521Reviewed-by: default avatarBrad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      31f2bb4b
    • Alex Brainman's avatar
      runtime: leave directory before removing it in TestDLLPreloadMitigation · ffeae198
      Alex Brainman authored
      Fixes #15120
      
      Change-Id: I1d9a192ac163826bad8b46e8c0b0b9e218e69570
      Reviewed-on: https://go-review.googlesource.com/21520Reviewed-by: default avatarBrad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      ffeae198
    • Brad Fitzpatrick's avatar
      net/http: add Request.Context and Request.WithContext · c1c7547f
      Brad Fitzpatrick authored
      Currently only used by the client. The server is not yet wired up.  A
      TODO remains to document how it works server-side, once implemented.
      
      Updates #14660
      
      Change-Id: I27c2e74198872b2720995fa8271d91de200e23d5
      Reviewed-on: https://go-review.googlesource.com/21496Reviewed-by: default avatarAndrew Gerrand <adg@golang.org>
      c1c7547f
    • Alex Brainman's avatar
      runtime: remove race out of BenchmarkChanToSyscallPing1ms · fcac8809
      Alex Brainman authored
      Fixes #15119
      
      Change-Id: I31445bf282a5e2a160ff4e66c5a592b989a5798f
      Reviewed-on: https://go-review.googlesource.com/21448Reviewed-by: default avatarBrad Fitzpatrick <bradfitz@golang.org>
      fcac8809
    • Hiroshi Ioka's avatar
      path/filepath: normalize output of EvalSymlinks on windows · c4dda7e5
      Hiroshi Ioka authored
      Current implementation uses GetShortPathName and GetLongPathName
      to get a normalized path. That approach sometimes fails because
      user can disable short path name anytime. This CL provides
      an alternative approach suggested by MSDN.
      
      https://msdn.microsoft.com/en-us/library/windows/desktop/aa364989(v=vs.85).aspx
      
      Fixes #13980
      
      Change-Id: Icf4afe4c9c4b507fc110c1483bf8db2c3f606b0a
      Reviewed-on: https://go-review.googlesource.com/20860Reviewed-by: default avatarAlex Brainman <alex.brainman@gmail.com>
      Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      c4dda7e5
    • Brad Fitzpatrick's avatar
      context: add the context package from golang.org/x/net/context · 9db7ef56
      Brad Fitzpatrick authored
      This copies the golang.org/x/net/context package to the standard library.
      
      It is imported from the x/net repo's git rev 1d9fd3b8333e (the most
      recent modified to x/net/context as of 2016-03-07).
      
      The corresponding change to x/net/context is in https://golang.org/cl/20347
      
      Updates #14660
      
      Change-Id: Ida14b1b7e115194d6218d9ac614548b9f41641cc
      Reviewed-on: https://go-review.googlesource.com/20346Reviewed-by: default avatarSameer Ajmani <sameer@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      9db7ef56
  3. 04 Apr, 2016 4 commits