1. 13 Apr, 2016 1 commit
    • cmd/compile, etc: store method tables as offsets · 7d469179
      David Crawshaw authored
      This CL introduces the typeOff type and a lookup method of the same
      name that can turn a typeOff offset into an *rtype.
      
      In a typical Go binary (built with buildmode=exe, pie, c-archive, or
      c-shared), there is one moduledata and all typeOff values are offsets
      relative to firstmoduledata.types. This makes computing the pointer
      cheap in typical programs.
      
      With buildmode=shared (and one day, buildmode=plugin) there are
      multiple modules whose relative offset is determined at runtime.
      We identify a type in the general case by the pair of the original
      *rtype that references it and its typeOff value. We determine
      the module from the original pointer, and then use the typeOff from
      there to compute the final *rtype.
      
      To ensure there is only one *rtype representing each type, the
      runtime initializes a typemap for each module, using any identical
      type from an earlier module when resolving that offset. This means
      that types computed from an offset match the type mapped by the
      pointer dynamic relocations.
      
      A series of followup CLs will replace other *rtype values with typeOff
      (and name/*string with nameOff).
      
      For types created at runtime by reflect, type offsets are treated as
      global IDs and reference into a reflect offset map kept by the runtime.
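
      A minimal sketch of the lookup, with details simplified from the actual
      runtime (moduledata fields abbreviated; lookupReflectOff stands in for
      the reflect offset map described above):

      type typeOff int32

      func (t *rtype) typeOff(off typeOff) *rtype {
      	if off == 0 {
      		return nil
      	}
      	// Find the module containing the original *rtype; in the
      	// typical single-module case this loop runs once.
      	base := uintptr(unsafe.Pointer(t))
      	for md := &firstmoduledata; md != nil; md = md.next {
      		if base >= md.types && base < md.etypes {
      			return (*rtype)(unsafe.Pointer(md.types + uintptr(off)))
      		}
      	}
      	// Not in any module: a type created at runtime by reflect,
      	// so the offset is a global ID into the reflect offset map.
      	return lookupReflectOff(off)
      }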
      
      darwin/amd64:
      	cmd/go:  -57KB (0.6%)
      	jujud:  -557KB (0.8%)
      
      linux/amd64 PIE:
      	cmd/go: -361KB (3.0%)
      	jujud:  -3.5MB (4.2%)
      
      For #6853.
      
      Change-Id: Icf096fd884a0a0cb9f280f46f7a26c70a9006c96
      Reviewed-on: https://go-review.googlesource.com/21285
      
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      Run-TryBot: David Crawshaw <crawshaw@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
  2. 10 Apr, 2016 1 commit
  3. 07 Apr, 2016 1 commit
  4. 05 Apr, 2016 2 commits
    • runtime: don't burn CPU unnecessarily · 475d113b
      Dmitry Vyukov authored
      Two GC-related functions, scang and casgstatus, wait in an active spin loop.
      Active spinning is never a good idea in user-space. Once we wait several
      times more than the expected wait time, something unexpected is happening
      (e.g. the thread we are waiting for is descheduled or handling a page fault)
      and we need to yield to the OS scheduler. Moreover, the expected wait time is
      very high for these functions: scang wait time can be tens of milliseconds,
      casgstatus can be hundreds of microseconds. It does not make sense to spin
      even for that time.
      
      A profile of go install -a std on a 4-core machine shows that 11% of the
      time is spent in the active spin in scang:
      
        6.12%    compile  compile                [.] runtime.scang
        3.27%    compile  compile                [.] runtime.readgstatus
        1.72%    compile  compile                [.] runtime/internal/atomic.Load
      
      The active spin also increases tail latency in the case of the slightest
      oversubscription: GC goroutines spend whole quantum in the loop instead of
      executing user code.
      
      Here is scang wait time histogram during go install -a std:
      
      13707.0000 - 1815442.7667 [   118]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎...
      1815442.7667 - 3617178.5333 [     9]: ∎∎∎∎∎∎∎∎∎
      3617178.5333 - 5418914.3000 [    11]: ∎∎∎∎∎∎∎∎∎∎∎
      5418914.3000 - 7220650.0667 [     5]: ∎∎∎∎∎
      7220650.0667 - 9022385.8333 [    12]: ∎∎∎∎∎∎∎∎∎∎∎∎
      9022385.8333 - 10824121.6000 [    13]: ∎∎∎∎∎∎∎∎∎∎∎∎∎
      10824121.6000 - 12625857.3667 [    15]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
      12625857.3667 - 14427593.1333 [    18]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
      14427593.1333 - 16229328.9000 [    18]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
      16229328.9000 - 18031064.6667 [    32]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
      18031064.6667 - 19832800.4333 [    28]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
      19832800.4333 - 21634536.2000 [     6]: ∎∎∎∎∎∎
      21634536.2000 - 23436271.9667 [    15]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
      23436271.9667 - 25238007.7333 [    11]: ∎∎∎∎∎∎∎∎∎∎∎
      25238007.7333 - 27039743.5000 [    27]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
      27039743.5000 - 28841479.2667 [    20]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
      28841479.2667 - 30643215.0333 [    10]: ∎∎∎∎∎∎∎∎∎∎
      30643215.0333 - 32444950.8000 [     7]: ∎∎∎∎∎∎∎
      32444950.8000 - 34246686.5667 [     4]: ∎∎∎∎
      34246686.5667 - 36048422.3333 [     4]: ∎∎∎∎
      36048422.3333 - 37850158.1000 [     1]: ∎
      37850158.1000 - 39651893.8667 [     5]: ∎∎∎∎∎
      39651893.8667 - 41453629.6333 [     2]: ∎∎
      41453629.6333 - 43255365.4000 [     2]: ∎∎
      43255365.4000 - 45057101.1667 [     2]: ∎∎
      45057101.1667 - 46858836.9333 [     1]: ∎
      46858836.9333 - 48660572.7000 [     2]: ∎∎
      48660572.7000 - 50462308.4667 [     3]: ∎∎∎
      50462308.4667 - 52264044.2333 [     2]: ∎∎
      52264044.2333 - 54065780.0000 [     2]: ∎∎
      
      and the zoomed-in first part:
      
      13707.0000 - 19916.7667 [     2]: ∎∎
      19916.7667 - 26126.5333 [     2]: ∎∎
      26126.5333 - 32336.3000 [     9]: ∎∎∎∎∎∎∎∎∎
      32336.3000 - 38546.0667 [     8]: ∎∎∎∎∎∎∎∎
      38546.0667 - 44755.8333 [    12]: ∎∎∎∎∎∎∎∎∎∎∎∎
      44755.8333 - 50965.6000 [    10]: ∎∎∎∎∎∎∎∎∎∎
      50965.6000 - 57175.3667 [     5]: ∎∎∎∎∎
      57175.3667 - 63385.1333 [     6]: ∎∎∎∎∎∎
      63385.1333 - 69594.9000 [     5]: ∎∎∎∎∎
      69594.9000 - 75804.6667 [     6]: ∎∎∎∎∎∎
      75804.6667 - 82014.4333 [     6]: ∎∎∎∎∎∎
      82014.4333 - 88224.2000 [     4]: ∎∎∎∎
      88224.2000 - 94433.9667 [     1]: ∎
      94433.9667 - 100643.7333 [     1]: ∎
      100643.7333 - 106853.5000 [     2]: ∎∎
      106853.5000 - 113063.2667 [     0]:
      113063.2667 - 119273.0333 [     2]: ∎∎
      119273.0333 - 125482.8000 [     2]: ∎∎
      125482.8000 - 131692.5667 [     1]: ∎
      131692.5667 - 137902.3333 [     1]: ∎
      137902.3333 - 144112.1000 [     0]:
      144112.1000 - 150321.8667 [     2]: ∎∎
      150321.8667 - 156531.6333 [     1]: ∎
      156531.6333 - 162741.4000 [     1]: ∎
      162741.4000 - 168951.1667 [     0]:
      168951.1667 - 175160.9333 [     0]:
      175160.9333 - 181370.7000 [     1]: ∎
      181370.7000 - 187580.4667 [     1]: ∎
      187580.4667 - 193790.2333 [     2]: ∎∎
      193790.2333 - 200000.0000 [     0]:
      
      Here is casgstatus wait time histogram:
      
        631.0000 -  5276.6333 [     3]: ∎∎∎
       5276.6333 -  9922.2667 [     5]: ∎∎∎∎∎
       9922.2667 - 14567.9000 [     2]: ∎∎
      14567.9000 - 19213.5333 [     6]: ∎∎∎∎∎∎
      19213.5333 - 23859.1667 [     5]: ∎∎∎∎∎
      23859.1667 - 28504.8000 [     6]: ∎∎∎∎∎∎
      28504.8000 - 33150.4333 [     6]: ∎∎∎∎∎∎
      33150.4333 - 37796.0667 [     2]: ∎∎
      37796.0667 - 42441.7000 [     1]: ∎
      42441.7000 - 47087.3333 [     3]: ∎∎∎
      47087.3333 - 51732.9667 [     0]:
      51732.9667 - 56378.6000 [     1]: ∎
      56378.6000 - 61024.2333 [     0]:
      61024.2333 - 65669.8667 [     0]:
      65669.8667 - 70315.5000 [     0]:
      70315.5000 - 74961.1333 [     1]: ∎
      74961.1333 - 79606.7667 [     0]:
      79606.7667 - 84252.4000 [     0]:
      84252.4000 - 88898.0333 [     0]:
      88898.0333 - 93543.6667 [     0]:
      93543.6667 - 98189.3000 [     0]:
      98189.3000 - 102834.9333 [     0]:
      102834.9333 - 107480.5667 [     1]: ∎
      107480.5667 - 112126.2000 [     0]:
      112126.2000 - 116771.8333 [     0]:
      116771.8333 - 121417.4667 [     0]:
      121417.4667 - 126063.1000 [     0]:
      126063.1000 - 130708.7333 [     0]:
      130708.7333 - 135354.3667 [     0]:
      135354.3667 - 140000.0000 [     1]: ∎
      
      Ideally we would eliminate the waiting by switching to an async
      state machine for GC, but for now just yield to the OS scheduler
      after a reasonable wait time.
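
      A sketch of the pattern as applied to casgstatus (helper names as in
      the runtime; the delay constants come from the measurements below):

      const yieldDelay = 5 * 1000 // ns

      var nextYield int64
      for i := 0; !atomic.Cas(&gp.atomicstatus, oldval, newval); i++ {
      	if i == 0 {
      		nextYield = nanotime() + yieldDelay
      	}
      	if nanotime() < nextYield {
      		procyield(1) // brief active spin
      	} else {
      		// Waited several times the expected time:
      		// hand the CPU back to the OS scheduler.
      		osyield()
      		nextYield = nanotime() + yieldDelay/2
      	}
      }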
      
      To choose yielding parameters I've measured
      golang.org/x/benchmarks/http tail latencies with different yield
      delays and oversubscription levels.
      
      With no oversubscription (to the degree possible):
      
      scang yield delay = 1, casgstatus yield delay = 1
      Latency-50   1.41ms ±15%  1.41ms ± 5%    ~     (p=0.611 n=13+12)
      Latency-95   5.21ms ± 2%  5.15ms ± 2%  -1.15%  (p=0.012 n=13+13)
      Latency-99   7.16ms ± 2%  7.05ms ± 2%  -1.54%  (p=0.002 n=13+13)
      Latency-999  10.7ms ± 9%  10.2ms ±10%  -5.46%  (p=0.004 n=12+13)
      
      scang yield delay = 5000, casgstatus yield delay = 3000
      Latency-50   1.41ms ±15%  1.41ms ± 8%    ~     (p=0.511 n=13+13)
      Latency-95   5.21ms ± 2%  5.14ms ± 2%  -1.23%  (p=0.006 n=13+13)
      Latency-99   7.16ms ± 2%  7.02ms ± 2%  -1.94%  (p=0.000 n=13+13)
      Latency-999  10.7ms ± 9%  10.1ms ± 8%  -6.14%  (p=0.000 n=12+13)
      
      scang yield delay = 10000, casgstatus yield delay = 5000
      Latency-50   1.41ms ±15%  1.45ms ± 6%    ~     (p=0.724 n=13+13)
      Latency-95   5.21ms ± 2%  5.18ms ± 1%    ~     (p=0.287 n=13+13)
      Latency-99   7.16ms ± 2%  7.05ms ± 2%  -1.64%  (p=0.002 n=13+13)
      Latency-999  10.7ms ± 9%  10.0ms ± 5%  -6.72%  (p=0.000 n=12+13)
      
      scang yield delay = 30000, casgstatus yield delay = 10000
      Latency-50   1.41ms ±15%  1.51ms ± 7%  +6.57%  (p=0.002 n=13+13)
      Latency-95   5.21ms ± 2%  5.21ms ± 2%    ~     (p=0.960 n=13+13)
      Latency-99   7.16ms ± 2%  7.06ms ± 2%  -1.50%  (p=0.012 n=13+13)
      Latency-999  10.7ms ± 9%  10.0ms ± 6%  -6.49%  (p=0.000 n=12+13)
      
      scang yield delay = 100000, casgstatus yield delay = 50000
      Latency-50   1.41ms ±15%  1.53ms ± 6%  +8.48%  (p=0.000 n=13+12)
      Latency-95   5.21ms ± 2%  5.23ms ± 2%    ~     (p=0.287 n=13+13)
      Latency-99   7.16ms ± 2%  7.08ms ± 2%  -1.21%  (p=0.004 n=13+13)
      Latency-999  10.7ms ± 9%   9.9ms ± 3%  -7.99%  (p=0.000 n=12+12)
      
      scang yield delay = 200000, casgstatus yield delay = 100000
      Latency-50   1.41ms ±15%  1.47ms ± 5%    ~     (p=0.072 n=13+13)
      Latency-95   5.21ms ± 2%  5.17ms ± 2%    ~     (p=0.091 n=13+13)
      Latency-99   7.16ms ± 2%  7.02ms ± 2%  -1.99%  (p=0.000 n=13+13)
      Latency-999  10.7ms ± 9%   9.9ms ± 5%  -7.86%  (p=0.000 n=12+13)
      
      With slight oversubscription (another instance of http benchmark
      was running in background with reduced GOMAXPROCS):
      
      scang yield delay = 1, casgstatus yield delay = 1
      Latency-50    840µs ± 3%   804µs ± 3%  -4.37%  (p=0.000 n=15+18)
      Latency-95   6.52ms ± 4%  6.03ms ± 4%  -7.51%  (p=0.000 n=18+18)
      Latency-99   10.8ms ± 7%  10.0ms ± 4%  -7.33%  (p=0.000 n=18+14)
      Latency-999  18.0ms ± 9%  16.8ms ± 7%  -6.84%  (p=0.000 n=18+18)
      
      scang yield delay = 5000, casgstatus yield delay = 3000
      Latency-50    840µs ± 3%   809µs ± 3%  -3.71%  (p=0.000 n=15+17)
      Latency-95   6.52ms ± 4%  6.11ms ± 4%  -6.29%  (p=0.000 n=18+18)
      Latency-99   10.8ms ± 7%   9.9ms ± 6%  -7.55%  (p=0.000 n=18+18)
      Latency-999  18.0ms ± 9%  16.5ms ±11%  -8.49%  (p=0.000 n=18+18)
      
      scang yield delay = 10000, casgstatus yield delay = 5000
      Latency-50    840µs ± 3%   823µs ± 5%  -2.06%  (p=0.002 n=15+18)
      Latency-95   6.52ms ± 4%  6.32ms ± 3%  -3.05%  (p=0.000 n=18+18)
      Latency-99   10.8ms ± 7%  10.2ms ± 4%  -5.22%  (p=0.000 n=18+18)
      Latency-999  18.0ms ± 9%  16.7ms ±10%  -7.09%  (p=0.000 n=18+18)
      
      scang yield delay = 30000, casgstatus yield delay = 10000
      Latency-50    840µs ± 3%   836µs ± 5%    ~     (p=0.442 n=15+18)
      Latency-95   6.52ms ± 4%  6.39ms ± 3%  -2.00%  (p=0.000 n=18+18)
      Latency-99   10.8ms ± 7%  10.2ms ± 6%  -5.15%  (p=0.000 n=18+17)
      Latency-999  18.0ms ± 9%  16.6ms ± 8%  -7.48%  (p=0.000 n=18+18)
      
      scang yield delay = 100000, casgstatus yield delay = 50000
      Latency-50    840µs ± 3%   836µs ± 6%    ~     (p=0.401 n=15+18)
      Latency-95   6.52ms ± 4%  6.40ms ± 4%  -1.79%  (p=0.010 n=18+18)
      Latency-99   10.8ms ± 7%  10.2ms ± 5%  -4.95%  (p=0.000 n=18+18)
      Latency-999  18.0ms ± 9%  16.5ms ±14%  -8.17%  (p=0.000 n=18+18)
      
      scang yield delay = 200000, casgstatus yield delay = 100000
      Latency-50    840µs ± 3%   828µs ± 2%  -1.49%  (p=0.001 n=15+17)
      Latency-95   6.52ms ± 4%  6.38ms ± 4%  -2.04%  (p=0.001 n=18+18)
      Latency-99   10.8ms ± 7%  10.2ms ± 4%  -4.77%  (p=0.000 n=18+18)
      Latency-999  18.0ms ± 9%  16.9ms ± 9%  -6.23%  (p=0.000 n=18+18)
      
      With significant oversubscription (background http benchmark
      was running with full GOMAXPROCS):
      
      scang yield delay = 1, casgstatus yield delay = 1
      Latency-50   1.32ms ±12%  1.30ms ±13%    ~     (p=0.454 n=14+14)
      Latency-95   16.3ms ±10%  15.3ms ± 7%  -6.29%  (p=0.001 n=14+14)
      Latency-99   29.4ms ±10%  27.9ms ± 5%  -5.04%  (p=0.001 n=14+12)
      Latency-999  49.9ms ±19%  45.9ms ± 5%  -8.00%  (p=0.008 n=14+13)
      
      scang yield delay = 5000, casgstatus yield delay = 3000
      Latency-50   1.32ms ±12%  1.29ms ± 9%    ~     (p=0.227 n=14+14)
      Latency-95   16.3ms ±10%  15.4ms ± 5%  -5.27%  (p=0.002 n=14+14)
      Latency-99   29.4ms ±10%  27.9ms ± 6%  -5.16%  (p=0.001 n=14+14)
      Latency-999  49.9ms ±19%  46.8ms ± 8%  -6.21%  (p=0.050 n=14+14)
      
      scang yield delay = 10000, casgstatus yield delay = 5000
      Latency-50   1.32ms ±12%  1.35ms ± 9%     ~     (p=0.401 n=14+14)
      Latency-95   16.3ms ±10%  15.0ms ± 4%   -7.67%  (p=0.000 n=14+14)
      Latency-99   29.4ms ±10%  27.4ms ± 5%   -6.98%  (p=0.000 n=14+14)
      Latency-999  49.9ms ±19%  44.7ms ± 5%  -10.56%  (p=0.000 n=14+11)
      
      scang yield delay = 30000, casgstatus yield delay = 10000
      Latency-50   1.32ms ±12%  1.36ms ±10%     ~     (p=0.246 n=14+14)
      Latency-95   16.3ms ±10%  14.9ms ± 5%   -8.31%  (p=0.000 n=14+14)
      Latency-99   29.4ms ±10%  27.4ms ± 7%   -6.70%  (p=0.000 n=14+14)
      Latency-999  49.9ms ±19%  44.9ms ±15%  -10.13%  (p=0.003 n=14+14)
      
      scang yield delay = 100000, casgstatus yield delay = 50000
      Latency-50   1.32ms ±12%  1.41ms ± 9%  +6.37%  (p=0.008 n=14+13)
      Latency-95   16.3ms ±10%  15.1ms ± 8%  -7.45%  (p=0.000 n=14+14)
      Latency-99   29.4ms ±10%  27.5ms ±12%  -6.67%  (p=0.002 n=14+14)
      Latency-999  49.9ms ±19%  45.9ms ±16%  -8.06%  (p=0.019 n=14+14)
      
      scang yield delay = 200000, casgstatus yield delay = 100000
      Latency-50   1.32ms ±12%  1.42ms ±10%   +7.21%  (p=0.003 n=14+14)
      Latency-95   16.3ms ±10%  15.0ms ± 7%   -7.59%  (p=0.000 n=14+14)
      Latency-99   29.4ms ±10%  27.3ms ± 8%   -7.20%  (p=0.000 n=14+14)
      Latency-999  49.9ms ±19%  44.8ms ± 8%  -10.21%  (p=0.001 n=14+13)
      
      All numbers are on 8 cores and with GOGC=10 (the http benchmark has a
      tiny heap, few goroutines, and a low allocation rate, so by default
      GC barely affects tail latency).
      
      10us/5us yield delays seem to provide a reasonable compromise
      and give a 5-10% tail latency reduction. That's what is used in this change.
      
      go install -a std results on a 4-core machine:
      
      name      old time/op  new time/op  delta
      Time       8.39s ± 2%   7.94s ± 2%  -5.34%  (p=0.000 n=47+49)
      UserTime   24.6s ± 2%   22.9s ± 2%  -6.76%  (p=0.000 n=49+49)
      SysTime    1.77s ± 9%   1.89s ±11%  +7.00%  (p=0.000 n=49+49)
      CpuLoad    315ns ± 2%   313ns ± 1%  -0.59%  (p=0.000 n=49+48) # %CPU
      MaxRSS    97.1ms ± 4%  97.5ms ± 9%    ~     (p=0.838 n=46+49) # bytes
      
      Update #14396
      Update #14189
      
      Change-Id: I3f4109bf8f7fd79b39c466576690a778232055a2
      Reviewed-on: https://go-review.googlesource.com/21503
      
      
      Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Reviewed-by: Austin Clements <austin@google.com>
    • runtime: sleep less when we can do work · 3b246fa8
      Dmitry Vyukov authored
      Usleep(100) in runqgrab negatively affects latency and throughput
      of parallel applications. We are sleeping instead of doing useful work.
      This effect is particularly visible on Windows, where the minimal
      sleep duration is 1-15ms.
      
      Reduce the sleep from 100us to 3us and use osyield on Windows.
      A sync chan send/recv takes ~50ns, so 3us gives us ~50x overshoot.
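
      A sketch of the resulting backoff in runqgrab (assuming the usual
      runtime helpers):

      // The victim P is mid-operation; back off briefly rather than
      // sleeping for a whole timer tick.
      if GOOS == "windows" {
      	osyield() // usleep would round up to the 1-15ms timer granularity
      } else {
      	usleep(3)
      }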
      
      benchmark                    old ns/op     new ns/op     delta
      BenchmarkChanSync-12         216           217           +0.46%
      BenchmarkChanSyncWork-12     27213         25816         -5.13%
      
      CPU consumption goes up from 106% to 108% in the first case,
      and from 107% to 125% in the second case.
      
      Test case from #14790 on Windows:
      
      benchmark                      old ns/op  new ns/op  delta
      BenchmarkDefaultResolution-8   4583372    29720      -99.35%
      Benchmark1ms-8                 992056     30701      -96.91%
      
      99-th latency percentile for HTTP request serving is improved by up to 15%
      (see http://golang.org/cl/20835 for details).
      
      The following benchmarks are from the change that originally added this sleep
      (see https://golang.org/s/go15gomaxprocs):
      
      name        old time/op  new time/op  delta
      Chain       22.6µs ± 2%  22.7µs ± 6%    ~      (p=0.905 n=9+10)
      ChainBuf    22.4µs ± 3%  22.5µs ± 4%    ~      (p=0.780 n=9+10)
      Chain-2     23.5µs ± 4%  24.9µs ± 1%  +5.66%   (p=0.000 n=10+9)
      ChainBuf-2  23.7µs ± 1%  24.4µs ± 1%  +3.31%   (p=0.000 n=9+10)
      Chain-4     24.2µs ± 2%  25.1µs ± 3%  +3.70%   (p=0.000 n=9+10)
      ChainBuf-4  24.4µs ± 5%  25.0µs ± 2%  +2.37%  (p=0.023 n=10+10)
      Powser       2.37s ± 1%   2.37s ± 1%    ~       (p=0.423 n=8+9)
      Powser-2     2.48s ± 2%   2.57s ± 2%  +3.74%   (p=0.000 n=10+9)
      Powser-4     2.66s ± 1%   2.75s ± 1%  +3.40%  (p=0.000 n=10+10)
      Sieve        13.3s ± 2%   13.3s ± 2%    ~      (p=1.000 n=10+9)
      Sieve-2      7.00s ± 2%   7.44s ±16%    ~      (p=0.408 n=8+10)
      Sieve-4      4.13s ±21%   3.85s ±22%    ~       (p=0.113 n=9+9)
      
      Fixes #14790
      
      Change-Id: Ie7c6a1c4f9c8eb2f5d65ab127a3845386d6f8b5d
      Reviewed-on: https://go-review.googlesource.com/20835
      
      Reviewed-by: Austin Clements <austin@google.com>
  5. 01 Apr, 2016 1 commit
  6. 29 Mar, 2016 1 commit
  7. 25 Mar, 2016 1 commit
    • runtime: improve randomized stealing logic · ea0386f8
      Dmitry Vyukov authored
      During random stealing we steal 4*GOMAXPROCS times from random procs.
      One would expect that most of the time we check all procs this way,
      but due to the low-quality PRNG we actually miss procs with frightening
      probability. Below are results of a modelling experiment with 1e6 tries:
      
      GOMAXPROCS = 2 : missed 1 procs 7944 times
      
      GOMAXPROCS = 3 : missed 1 procs 101620 times
      GOMAXPROCS = 3 : missed 2 procs 3571 times
      
      GOMAXPROCS = 4 : missed 1 procs 63916 times
      GOMAXPROCS = 4 : missed 2 procs 61 times
      GOMAXPROCS = 4 : missed 3 procs 16 times
      
      GOMAXPROCS = 5 : missed 1 procs 133136 times
      GOMAXPROCS = 5 : missed 2 procs 1025 times
      GOMAXPROCS = 5 : missed 3 procs 101 times
      GOMAXPROCS = 5 : missed 4 procs 15 times
      
      GOMAXPROCS = 8 : missed 1 procs 151765 times
      GOMAXPROCS = 8 : missed 2 procs 5057 times
      GOMAXPROCS = 8 : missed 3 procs 1726 times
      GOMAXPROCS = 8 : missed 4 procs 68 times
      
      GOMAXPROCS = 12 : missed 1 procs 199081 times
      GOMAXPROCS = 12 : missed 2 procs 27489 times
      GOMAXPROCS = 12 : missed 3 procs 3113 times
      GOMAXPROCS = 12 : missed 4 procs 233 times
      GOMAXPROCS = 12 : missed 5 procs 9 times
      
      GOMAXPROCS = 16 : missed 1 procs 237477 times
      GOMAXPROCS = 16 : missed 2 procs 30037 times
      GOMAXPROCS = 16 : missed 3 procs 9466 times
      GOMAXPROCS = 16 : missed 4 procs 1334 times
      GOMAXPROCS = 16 : missed 5 procs 192 times
      GOMAXPROCS = 16 : missed 6 procs 5 times
      GOMAXPROCS = 16 : missed 7 procs 1 times
      GOMAXPROCS = 16 : missed 8 procs 1 times
      
      A missed proc won't lead to underutilization because we check all procs
      again after dropping P. But it can lead to an unpleasant situation
      when we miss a proc, drop P, check all procs, discover work, acquire P,
      miss the proc again, repeat.
      
      Improve stealing logic to cover all procs.
      Also don't enter spinning mode and try to steal when there is nobody around.
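
      A sketch of the idea: start at a random proc and walk with a random
      stride coprime to GOMAXPROCS, so that every proc is visited exactly
      once while the order still looks random (coprimes is assumed to be
      precomputed for the current GOMAXPROCS value n):

      offset := fastrand() % n
      stride := coprimes[fastrand()%uint32(len(coprimes))]
      for i := uint32(0); i < n; i++ {
      	p := allp[(offset+i*stride)%n]
      	// ... try to steal from p's run queue ...
      }

      Because gcd(stride, n) = 1, the walk is a permutation of all n procs,
      so no proc can be missed.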
      
      Change-Id: Ibb6b122cc7fb836991bad7d0639b77c807aab4c2
      Reviewed-on: https://go-review.googlesource.com/20836
      
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Austin Clements <austin@google.com>
      Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
  8. 16 Mar, 2016 2 commits
    • runtime: never pass stack pointers to gopark · 8fb182d0
      Austin Clements authored
      gopark calls the unlock function after setting the G to _Gwaiting.
      This means it's generally unsafe to access the G's stack from the
      unlock function because the G may start running on another P. Once we
      start shrinking stacks concurrently, a stack shrink could also move
      the stack the moment after it enters _Gwaiting and before the unlock
      function is called.
      
      Document this restriction and fix the two places where we currently
      violate it.
      
      This is unlikely to be a problem in practice for these two places
      right now, but they're already skating on thin ice. For example, the
      following sequence could in principle cause corruption, deadlock, or a
      panic in the select code:
      
      On M1/P1:
      1. G1 selects on channels A and B.
      2. selectgoImpl calls gopark.
      3. gopark puts G1 in _Gwaiting.
      4. gopark calls selparkcommit.
      5. selparkcommit releases the lock on channel A.
      
      On M2/P2:
      6. G2 sends to channel A.
      7. The send puts G1 in _Grunnable and puts it on P2's run queue.
      8. The scheduler runs, selects G1, puts it in _Grunning, and resumes G1.
      9. On G1, the sellock immediately following the gopark gets called.
      10. sellock grows and moves the stack.
      
      On M1/P1:
      11. selparkcommit continues to scan the lock order for the next
      channel to unlock, but it's now reading from a freed (and possibly
      reused) stack.
      
      This shouldn't happen in practice because step 10 isn't the first call
      to sellock, so the stack should already be big enough. However, once
      we start shrinking stacks concurrently, this reasoning won't work any
      more.
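
      A sketch of the fix for selparkcommit: walk the heap-allocated sudogs
      hanging off gp.waiting instead of the stack-allocated lock order,
      relying on the channel recorded in each sudog (next entry in this log):

      func selparkcommit(gp *g, _ unsafe.Pointer) bool {
      	// gp is already in _Gwaiting and may be resumed on another P
      	// at any moment, so this must not touch gp's stack.
      	var lastc *hchan
      	for sg := gp.waiting; sg != nil; sg = sg.waitlink {
      		// Multiple sudogs may share a channel; unlock only
      		// after passing the last instance of that channel.
      		if sg.c != lastc && lastc != nil {
      			unlock(&lastc.lock)
      		}
      		lastc = sg.c
      	}
      	if lastc != nil {
      		unlock(&lastc.lock)
      	}
      	return true
      }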
      
      For #12967.
      
      Change-Id: I3660c5be37e5be9f87433cb8141bdfdf37fadc4c
      Reviewed-on: https://go-review.googlesource.com/20038
      
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: record channel in sudog · e4a95b63
      Austin Clements authored
      Given a G, there's currently no way to find the channel it's blocking
      on. We'll need this information to fix a (probably theoretical) bug in
      select and to implement concurrent stack shrinking, so record the
      channel in the sudog.
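
      A sketch of the struct change (other sudog fields elided):

      type sudog struct {
      	g        *g
      	elem     unsafe.Pointer // data element
      	waitlink *sudog         // g.waiting list
      	c        *hchan         // channel this sudog is blocked on (new)
      	// ...
      }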
      
      For #12967.
      
      Change-Id: If8fb63a140f1d07175818824d08c0ebeec2bdf66
      Reviewed-on: https://go-review.googlesource.com/20035
      
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
  9. 13 Mar, 2016 1 commit
  10. 02 Mar, 2016 1 commit
    • all: single space after period. · 5fea2ccc
      Brad Fitzpatrick authored
      The tree's pretty inconsistent about single space vs double space
      after a period in documentation. Make it consistently a single space,
      per earlier decisions. This means contributors won't be confused by
      misleading precedence.
      
      This CL doesn't use go/doc to parse. It only addresses // comments.
      It was generated with:
      
      $ perl -i -npe 's,^(\s*// .+[a-z]\.)  +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.)  +([A-Z])')
      $ go test go/doc -update
      
      Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7
      Reviewed-on: https://go-review.googlesource.com/20022
      
      Reviewed-by: Rob Pike <r@golang.org>
      Reviewed-by: Dave Day <djd@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
  11. 26 Feb, 2016 1 commit
    • runtime: unwire g/m in dropg always · bdc14698
      Dmitry Vyukov authored
      Currently dropg does not unwire locked g/m.
      This is an unnecessary distinction between locked and non-locked g/m.
      We always restart goroutines with execute, which re-wires g/m.
      
      First, this produces a false sense that the distinction is necessary.
      Second, it can confuse some sanity and cross checks. For example,
      if we check that g/m are unwired before we wire them in execute,
      the check will fail for locked g/m. I've hit this while doing some
      race detector changes: when we deschedule a goroutine and run
      scheduler code, m.curg is generally nil, but not for locked Ms.
      
      Remove the distinction.
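
      A sketch of the resulting unconditional dropg:

      func dropg() {
      	_g_ := getg()
      	// Always unwire, locked or not; execute re-wires g/m.
      	_g_.m.curg.m = nil
      	_g_.m.curg = nil
      }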
      
      Change-Id: I3b87a28ff343baa1d564aab1f821b582a84dee07
      Reviewed-on: https://go-review.googlesource.com/19950
      
      Reviewed-by: Austin Clements <austin@google.com>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
  12. 25 Feb, 2016 1 commit
  13. 24 Feb, 2016 1 commit
  14. 21 Feb, 2016 1 commit
  15. 02 Feb, 2016 1 commit
    • runtime: start an M when handing off a P when there's GC work · f309bf3e
      Austin Clements authored
      Currently it's possible for the scheduler to deadlock with the right
      confluence of locked Gs, assists, and scheduling of background mark
      workers. Broadly, this happens because handoffp is stricter than
      findrunnable, and if the only work for a P is GC work, handoffp will
      put the P into idle, rather than starting an M to execute that P. One
      way this can happen is as follows:
      
      0. There is only one user G, which we'll call G 1. There is more than
         one P, but they're all idle except the one running G 1.
      
      1. G 1 locks itself to an M using runtime.LockOSThread.
      
      2. GC starts up and enters mark 1.
      
      3. G 1 performs a GC assist, which completes mark 1 without being
         fully satisfied. Completing mark 1 causes all background mark
         workers to park. And since the assist isn't fully satisfied, it
         parks as well, waiting for a background mark worker to satisfy its
         remaining assist debt.
      
      4. The assist park enters the scheduler. Since G 1 is locked to the M,
         the scheduler releases the P and calls handoffp to hand the P to
         another M.
      
      5. handoffp checks the local and global run queues, which are empty,
         and sees that there are idle Ps, so rather than start an M, it puts
         the P into idle.
      
      At this point, all of the Gs are waiting and all of the Ps are idle.
      In particular, none of the GC workers are running, so no mark work
      gets done and the assist on the main G is never satisfied, so the
      whole process soft locks up.
      
      Fix this by making handoffp start an M if there is GC work. This
      reintroduces a key invariant: that in any situation where findrunnable
      would return a G to run on a P, handoffp for that P will start an M to
      run work on that P.
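
      A sketch of the check added to handoffp, mirroring the condition under
      which findrunnable would run a background mark worker:

      // In handoffp, before putting the P into idle:
      if gcBlackenEnabled != 0 && gcMarkWorkAvailable(_p_) {
      	startm(_p_, false)
      	return
      }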
      
      Fixes #13645.
      
      Tested by running 2,689 iterations of `go tool dist test -no-rebuild
      runtime:cpu124` across 10 linux-amd64-noopt VMs with no failures.
      Without this change, the failure rate was somewhere around 1%.
      
      Performance change is negligible.
      
      name              old time/op  new time/op  delta
      XBenchGarbage-12  2.48ms ± 2%  2.48ms ± 1%  -0.24%  (p=0.000 n=92+93)
      
      name                      old time/op    new time/op    delta
      BinaryTree17-12              2.86s ± 2%     2.87s ± 2%    ~     (p=0.667 n=19+20)
      Fannkuch11-12                2.52s ± 1%     2.47s ± 1%  -2.05%  (p=0.000 n=18+20)
      FmtFprintfEmpty-12          51.7ns ± 1%    51.5ns ± 3%    ~     (p=0.931 n=16+20)
      FmtFprintfString-12          170ns ± 1%     168ns ± 1%  -0.65%  (p=0.000 n=19+19)
      FmtFprintfInt-12             160ns ± 0%     160ns ± 0%  +0.18%  (p=0.033 n=17+19)
      FmtFprintfIntInt-12          265ns ± 1%     273ns ± 1%  +2.98%  (p=0.000 n=17+19)
      FmtFprintfPrefixedInt-12     235ns ± 1%     239ns ± 1%  +1.99%  (p=0.000 n=16+19)
      FmtFprintfFloat-12           315ns ± 0%     315ns ± 1%    ~     (p=0.250 n=17+19)
      FmtManyArgs-12              1.04µs ± 1%    1.05µs ± 0%  +0.87%  (p=0.000 n=17+19)
      GobDecode-12                7.93ms ± 0%    7.85ms ± 1%  -1.03%  (p=0.000 n=16+18)
      GobEncode-12                6.62ms ± 1%    6.58ms ± 1%  -0.60%  (p=0.000 n=18+19)
      Gzip-12                      322ms ± 1%     320ms ± 1%  -0.46%  (p=0.009 n=20+20)
      Gunzip-12                   42.5ms ± 1%    42.5ms ± 0%    ~     (p=0.751 n=19+19)
      HTTPClientServer-12         69.7µs ± 1%    70.0µs ± 2%    ~     (p=0.056 n=19+19)
      JSONEncode-12               16.9ms ± 1%    16.7ms ± 1%  -1.13%  (p=0.000 n=19+19)
      JSONDecode-12               61.5ms ± 1%    61.3ms ± 1%  -0.35%  (p=0.001 n=20+17)
      Mandelbrot200-12            3.94ms ± 0%    3.91ms ± 0%  -0.67%  (p=0.000 n=20+18)
      GoParse-12                  3.71ms ± 1%    3.70ms ± 1%    ~     (p=0.244 n=17+19)
      RegexpMatchEasy0_32-12       101ns ± 1%     102ns ± 2%  +0.54%  (p=0.037 n=19+20)
      RegexpMatchEasy0_1K-12       349ns ± 0%     350ns ± 0%  +0.33%  (p=0.000 n=17+18)
      RegexpMatchEasy1_32-12      84.5ns ± 2%    84.2ns ± 1%  -0.43%  (p=0.048 n=19+20)
      RegexpMatchEasy1_1K-12       510ns ± 1%     513ns ± 2%  +0.58%  (p=0.002 n=18+20)
      RegexpMatchMedium_32-12      132ns ± 1%     134ns ± 1%  +0.95%  (p=0.000 n=20+20)
      RegexpMatchMedium_1K-12     40.1µs ± 1%    39.6µs ± 1%  -1.39%  (p=0.000 n=20+20)
      RegexpMatchHard_32-12       2.08µs ± 0%    2.06µs ± 1%  -0.95%  (p=0.000 n=18+18)
      RegexpMatchHard_1K-12       62.2µs ± 1%    61.9µs ± 1%  -0.42%  (p=0.001 n=19+20)
      Revcomp-12                   537ms ± 0%     536ms ± 0%    ~     (p=0.076 n=20+20)
      Template-12                 71.3ms ± 1%    69.3ms ± 1%  -2.75%  (p=0.000 n=20+20)
      TimeParse-12                 361ns ± 0%     360ns ± 1%    ~     (p=0.056 n=19+19)
      TimeFormat-12                353ns ± 0%     352ns ± 0%  -0.23%  (p=0.000 n=17+18)
      [Geo mean]                  62.6µs         62.5µs       -0.17%
      
      Change-Id: I0fbbbe4d7d99653ba5600ffb4394fa03558bc4e9
      Reviewed-on: https://go-review.googlesource.com/19107
      
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Reviewed-by: Russ Cox <rsc@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
  16. 27 Jan, 2016 2 commits
    • runtime: make p.gcBgMarkWorker a guintptr · 09940b92
      Austin Clements authored
      Currently p.gcBgMarkWorker is a *g. Change it to a guintptr. This
      eliminates a write barrier during the subtle mark worker parking dance
      (which isn't known to be causing problems, but may).
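
      For reference, a guintptr holds a *g as a bare uintptr, so assigning
      one is a plain store with no write barrier (a sketch of the type):

      type guintptr uintptr

      func (gp guintptr) ptr() *g   { return (*g)(unsafe.Pointer(gp)) }
      func (gp *guintptr) set(g *g) { *gp = guintptr(unsafe.Pointer(g)) }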
      
      Change-Id: Ibf12c05ac910820448059e69a68e5b882c993ed8
      Reviewed-on: https://go-review.googlesource.com/18970
      
      
      Run-TryBot: Austin Clements <austin@google.com>
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Reviewed-by: Russ Cox <rsc@golang.org>
    • runtime: attach mark workers to P after they park · eb3b1830
      Austin Clements authored
      Currently mark workers attach to their designated Ps before parking,
      either during initialization or after performing a phase transition.
      However, in both of these cases, it's possible that the mark worker is
      running on a different P than the one it attaches to. This is a
      problem, because as soon as the worker attaches to a P, that P's
      scheduler can execute the worker. If the worker hasn't yet parked on
      the P it's actually running on, this means the worker G will be
      running in two places at once. The most visible consequence of this is
      that once the first instance of the worker does park, it will clear
      g.m and the second instance will crash shortly when it tries to use
      g.m.
      
      Fix this by moving the attach to the gopark callback. At this point,
      the G is genuinely stopped and the callback is running on the system
      stack, so it's safe for another P's scheduler to pick up the worker G.
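
      A sketch of the parking path with the attach moved into the callback
      (simplified; the real worker also handles the P being destroyed by
      returning false to abort the park):

      gopark(func(g *g, parkp unsafe.Pointer) bool {
      	// g is genuinely stopped and we are on the system stack,
      	// so it is now safe for pp's scheduler to find and run g.
      	pp := (*p)(parkp)
      	if pp.gcBgMarkWorker == nil {
      		pp.gcBgMarkWorker = g
      	}
      	return true
      }, unsafe.Pointer(_p_), "GC worker (idle)", traceEvGoBlock, 0)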
      
      Fixes #13363. Fixes #13978.
      
      Change-Id: If2f7c4a4174f9511f6227e14a27c56fb842d1cc8
      Reviewed-on: https://go-review.googlesource.com/18761
      
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Reviewed-by: Russ Cox <rsc@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
  17. 14 Jan, 2016 1 commit
  18. 13 Jan, 2016 1 commit
  19. 11 Jan, 2016 1 commit
  20. 09 Jan, 2016 1 commit
  21. 08 Jan, 2016 2 commits
  22. 07 Jan, 2016 2 commits
    • runtime: fix sigprof stack barrier locking · 3f22adec
      Austin Clements authored
      f90b48e0 intended to require the stack barrier lock in all cases of
      sigprof that walked the user stack, but got it wrong. In particular,
      if sp < gp.stack.lo || gp.stack.hi < sp, tracebackUser would be true,
      but we wouldn't acquire the stack lock. If it then turned out that we
      were in a cgo call, it would walk the stack without the lock.
      
      In fact, the whole structure of stack locking in sigprof is somewhat
      wrong because it assumes the G to lock is gp.m.curg, but all three
      gentraceback calls start from potentially different Gs.
      
      To fix this, we lower the gcTryLockStackBarriers calls much closer to
      the gentraceback calls. There are now three separate trylock calls,
      each clearly associated with a gentraceback and the locked G clearly
      matches the G from which the gentraceback starts. This actually brings
      the sigprof logic closer to what it originally was before stack
      barrier locking.
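
      The shape of the fix, sketched for the user-stack case (the cgo and
      libcall cases get the same treatment with gp.m.curg and the libcall G):

      if gcTryLockStackBarriers(gp) {
      	n = gentraceback(pc, sp, lr, gp, 0, &stk[0], len(stk),
      		nil, nil, _TraceTrap|_TraceJumpStack)
      	gcUnlockStackBarriers(gp)
      }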
      
      This depends on "runtime: increase assumed stack size in
      externalthreadhandler" because it very slightly increases the stack
      used by sigprof; without this other commit, this is enough to blow the
      profiler thread's assumed stack size.
      
      Fixes #12528 (hopefully for real this time!).
      
      For the 1.5 branch, though it will require some backporting. On the
      1.5 branch, this will *not* require the "runtime: increase assumed
      stack size in externalthreadhandler" commit: there's no pcvalue cache,
      so the used stack is smaller.
      
      Change-Id: Id2f6446ac276848f6fc158bee550cccd03186b83
      Reviewed-on: https://go-review.googlesource.com/18328
      
      
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Russ Cox <rsc@golang.org>
    • runtime: don't ignore success of cgo profiling tracebacks · b50b2483
      Austin Clements authored
      If a sigprof happens during a cgo call, we traceback from the entry
      point of the cgo call. However, if the SP is outside of the G's stack,
      we'll then ignore this traceback, even if it was successful, and
      overwrite it with just _ExternalCode.
      
      Fix this by accepting any successful traceback, regardless of whether
      we got it from a cgo entry point or from regular Go code.
      
      Fixes #13466.
      
      Change-Id: I5da9684361fc5964f44985d74a8cdf02ffefd213
      Reviewed-on: https://go-review.googlesource.com/18327
      
      
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Russ Cox <rsc@golang.org>
  23. 06 Jan, 2016 2 commits
  24. 18 Dec, 2015 1 commit
    • runtime: require the stack barrier lock to traceback cgo and libcalls · f90b48e0
      Austin Clements authored
      Currently, if sigprof determines that the G is in user code (not cgo
      or libcall code), it will only traceback the G stack if it can acquire
      the stack barrier lock. However, it has no such restriction if the G
      is in cgo or libcall code. Because cgo calls count as syscalls, stack
      scanning and stack barrier installation can occur during a cgo call,
      which means sigprof could attempt to traceback a G in a cgo call while
      scanstack is installing stack barriers in that G's stack. As a result,
      the following sequence of events can cause the sigprof traceback to
      panic with "missed stack barrier":
      
      1. M1: G1 performs a Cgo call (which, on Windows, is any system call,
         which could explain why this is easier to reproduce on Windows).
      
      2. M1: The Cgo call puts G1 into _Gsyscall state.
      
      3. M2: GC starts a scan of G1's stack. It puts G1 in to _Gscansyscall
         and acquires the stack barrier lock.
      
      4. M3: A profiling signal comes in. On Windows this is a global
         (though I don't think this matters), so the runtime stops M1 and
         calls sigprof for G1.
      
      5. M3: sigprof fails to acquire the stack barrier lock (because the
         GC's stack scan holds it).
      
      6. M3: sigprof observes that G1 is in a Cgo call, so it calls
         gentraceback on G1 with its Cgo transition point.
      
      7. M3: gentraceback on G1 grabs the currently empty g.stkbar slice.
      
      8. M2: GC finishes scanning G1's stack and installing stack barriers.
      
      9. M3: gentraceback encounters one of the just-installed stack
         barriers and panics.
      
      This commit fixes this by only allowing cgo tracebacks if sigprof can
      acquire the stack barrier lock, just like in the regular user
      traceback case.
      
      For good measure, we put the same constraint on libcall tracebacks.
      This case is probably already safe because, unlike cgo calls, libcalls
      leave the G in _Grunning and prevent reaching a safe point, so
      scanstack cannot run during a libcall. However, this also means that
      sigprof will always acquire the stack barrier lock without contention,
      so there's no cost to adding this constraint to libcall tracebacks.
      
      Fixes #12528. For 1.5.3 (will require some backporting).
      
      Change-Id: Ia5a4b8e3d66b23b02ffcd54c6315c81055c0cec2
      Reviewed-on: https://go-review.googlesource.com/18023
      
      
      Run-TryBot: Austin Clements <austin@google.com>
      Reviewed-by: Russ Cox <rsc@golang.org>
  25. 16 Dec, 2015 1 commit
  26. 15 Dec, 2015 2 commits
    • runtime: only trigger forced GC if GC is not running · 01baf13b
      Austin Clements authored
      Currently, sysmon triggers a forced GC solely based on
      memstats.last_gc. However, memstats.last_gc isn't updated until mark
      termination, so once sysmon starts triggering forced GC, it will keep
      triggering them until GC finishes. The first of these actually starts
      a GC; the remainder, up to the last, print "GC forced" but return
      immediately from gcStart because gcphase != _GCoff; the last one may
      start another GC if the previous GC finishes (and sets last_gc)
      between sysmon triggering it and gcStart checking the GC phase.
      
      Fix this by expanding the condition for starting a forced GC to also
      require that no GC is currently running. This, combined with the way
      forcegchelper blocks until the GC cycle is started, ensures sysmon
      only starts one GC when the time exceeds the forced GC threshold.
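
      A sketch of the expanded sysmon condition:

      lastgc := int64(atomic.Load64(&memstats.last_gc))
      if gcphase == _GCoff && lastgc != 0 && now-lastgc > forcegcperiod &&
      	atomic.Load(&forcegc.idle) != 0 {
      	// Deadline passed and no cycle is running: wake
      	// forcegchelper, which starts exactly one cycle.
      	lock(&forcegc.lock)
      	forcegc.idle = 0
      	injectglist(forcegc.g)
      	unlock(&forcegc.lock)
      }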
      
      Fixes #13458.
      
      Change-Id: Ie6cf841927f6085136be3f45259956cd5cf10d23
      Reviewed-on: https://go-review.googlesource.com/17819
      
      
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Russ Cox <rsc@golang.org>
    • runtime: simplify sigprof traceback interlocking · 50d8d4e8
      Austin Clements authored
      The addition of stack barrier locking to copystack subsumes the
      partial fix from commit bbd1a1c7 for SIGPROF during copystack. With the
      stack barrier locking, this commit simplifies the rule in sigprof to:
      the user stack can be traced only if sigprof can acquire the stack
      barrier lock.
      
      Updates #12932, #13362.
      
      Change-Id: I1c1f80015053d0ac7761e9e0c7437c2aba26663f
      Reviewed-on: https://go-review.googlesource.com/17192
      
      
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Russ Cox <rsc@golang.org>
  27. 11 Dec, 2015 2 commits
    • runtime: remove unnecessary wakeups of worker threads · fb6f8a96
      Dmitry Vyukov authored
      Currently we wake up new worker threads whenever we pass
      through the scheduler with nmspinning==0. This leads to
      lots of unnecessary thread wake ups.
      Instead let only spinning threads wake up new spinning threads.
      
      For the following program:
      
      package main
      import "runtime"
      func main() {
      	for i := 0; i < 1e7; i++ {
      		runtime.Gosched()
      	}
      }
      
      Before:
      $ time ./test
      real	0m4.278s
      user	0m7.634s
      sys	0m1.423s
      
      $ strace -c ./test
      % time     seconds  usecs/call     calls    errors syscall
       99.93    9.314936           3   2685009     17536 futex
      
      After:
      $ time ./test
      real	0m1.200s
      user	0m1.181s
      sys	0m0.024s
      
      $ strace -c ./test
      % time     seconds  usecs/call     calls    errors syscall
        3.11    0.000049          25         2           futex
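
      The mechanism, sketched: wakep starts a new M only if it is the one to
      take nmspinning from 0 to 1, and the scheduler now calls it only from
      paths where a spinning thread found work:

      // Tries to add one more P to execute G's.
      // Called when a G is made runnable (newproc, ready).
      func wakep() {
      	// Be conservative about spinning threads.
      	if !atomic.Cas(&sched.nmspinning, 0, 1) {
      		return
      	}
      	startm(nil, true)
      }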
      
      Fixes #13527
      
      Change-Id: Ia1f5bf8a896dcc25d8b04beb1f4317aa9ff16f74
      Reviewed-on: https://go-review.googlesource.com/17540
      
      Reviewed-by: Austin Clements <austin@google.com>
      Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: fix GODEBUG=schedtrace=X delay handling. · f939ee13
      Rahul Chaudhry authored
      debug.schedtrace is an int32. Convert it to int64 before
      multiplying by the constant 1000000. Otherwise, schedtrace
      values greater than 2147 result in int32 overflow, causing
      incorrect delays between traces.
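
      The fix, in essence (the sysmon trace check, with the multiply widened
      to int64):

      if debug.schedtrace > 0 && lasttrace+int64(debug.schedtrace)*1000000 <= now {
      	lasttrace = now
      	schedtrace(debug.scheddetail > 0)
      }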
      
      Change-Id: I064e8d7b432c1e892a705ee1f31a2e8cdd2c3ea3
      Reviewed-on: https://go-review.googlesource.com/17712
      
      Reviewed-by: Austin Clements <austin@google.com>
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
  28. 19 Nov, 2015 3 commits
    • runtime: recursively disallow write barriers in sysmon · d2c81ad8
      Austin Clements authored
      sysmon runs without a P. This means it can't interact with the garbage
      collector, so write barriers are not allowed in anything that sysmon does.
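
      The enforcement hangs off the compiler's nowritebarrierrec pragma,
      which rejects write barriers in the annotated function and,
      recursively, in everything it calls (sketch):

      // sysmon runs without a P, and must not have write barriers.
      //
      //go:nowritebarrierrec
      func sysmon() {
      	// ...
      }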
      
      Fixes #10600.
      
      Change-Id: I9de1283900dadee4f72e2ebfc8787123e382ae88
      Reviewed-on: https://go-review.googlesource.com/17006
      
      
      Run-TryBot: Austin Clements <austin@google.com>
      Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • cmd/compile: special case nowritebarrierrec for allocm · 402e37d4
      Austin Clements authored
      allocm is a very unusual function: it is specifically designed to
      allocate in contexts where m.p is nil by temporarily taking over a P.
      Since allocm is used in many contexts where it would make sense to use
      nowritebarrierrec, this commit teaches the nowritebarrierrec analysis
      to stop at allocm.
      
      Updates #10600.
      
      Change-Id: I8499629461d4fe25712d861720dfe438df7ada9b
      Reviewed-on: https://go-review.googlesource.com/17005
      
      Reviewed-by: Russ Cox <rsc@golang.org>
    • runtime: prevent sigprof during all stack barrier ops · 9c9d74ab
      Austin Clements authored
      A sigprof during stack barrier insertion or removal can crash if it
      detects an inconsistency between the stkbar array and the stack
      itself. Currently we protect against this when scanning another G's
      stack using stackLock, but we don't protect against it when unwinding
      stack barriers for a recover or a memmove to the stack.
      
      This commit cleans up and improves the stack locking code. It
      abstracts out the lock and unlock operations. It uses the lock
      consistently everywhere we perform stack operations, and pushes the
      lock/unlock down closer to where the stack barrier operations happen
      to make it more obvious what it's protecting. Finally, it modifies
      sigprof so that instead of spinning until it acquires the lock, it
      simply doesn't perform a traceback if it can't acquire it. This is
      necessary to prevent self-deadlock.
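
      A sketch of the abstracted operations (gp.stackLock is a spinlock word
      on the G; bodies simplified):

      func gcLockStackBarriers(gp *g) {
      	for !atomic.Cas(&gp.stackLock, 0, 1) {
      		osyield()
      	}
      }

      // The non-blocking form used by sigprof: on failure, sigprof
      // simply skips the traceback instead of spinning.
      func gcTryLockStackBarriers(gp *g) bool {
      	return atomic.Cas(&gp.stackLock, 0, 1)
      }

      func gcUnlockStackBarriers(gp *g) {
      	atomic.Store(&gp.stackLock, 0)
      }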
      
      Updates #11863, which introduced stackLock to fix some of these
      issues, but didn't go far enough.
      
      Updates #12528.
      
      Change-Id: I9d1fa88ae3744d31ba91500c96c6988ce1a3a349
      Reviewed-on: https://go-review.googlesource.com/17036
      
      Reviewed-by: Russ Cox <rsc@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
  29. 18 Nov, 2015 1 commit
  30. 12 Nov, 2015 1 commit
    • cmd/compile, cmd/link, runtime: on ppc64x, maintain the TOC pointer in R2 when compiling PIC · 368d5484
      Michael Hudson-Doyle authored
      The PowerPC ISA does not have a PC-relative load instruction, which poses
      obvious challenges when generating position-independent code. The way the ELFv2
      ABI addresses this is to specify that r2 points to a per "module" (shared
      library or executable) TOC pointer. Maintaining this pointer requires
      cooperation between codegen and the system linker:
      
       * Non-leaf functions leave space on the stack at r1+24 to save the TOC pointer.
       * A call to a function that *might* have to go via a PLT stub must be followed
         by a nop instruction that the system linker can replace with "ld r2, 24(r1)"
         to restore the TOC pointer (only when dynamically linking Go code).
       * When calling a function via a function pointer, the address of the function
         must be in r12, and the first couple of instructions (the "global entry
         point") of the called function use this to derive the address of the TOC
         for the module it is in.
       * When calling a function that is implemented in the same module, the system
         linker adjusts the call to skip over the instructions mentioned above (the
         "local entry point"), assuming that r2 is already correctly set.
      
      So this changeset adds the global entry point instructions, sets the metadata so
      the system linker knows where the local entry point is, inserts code to save the
      TOC pointer at 24(r1), adds a nop after any call not known to be local and copes
      with the odd non-local code transfer in the runtime (e.g. the stuff around
      jmpdefer). It does not actually compile PIC yet.
      
      Change-Id: I7522e22bdfd2f891745a900c60254fe9e372c854
      Reviewed-on: https://go-review.googlesource.com/15967
      
      Reviewed-by: Russ Cox <rsc@golang.org>