1. 07 Jun, 2018 1 commit
  2. 05 Jun, 2018 1 commit
  3. 04 Jun, 2018 1 commit
  4. 01 Jun, 2018 1 commit
  5. 29 May, 2018 1 commit
  6. 25 May, 2018 1 commit
  7. 22 May, 2018 1 commit
      internal/cpu: add experiment to disable CPU features with GODEBUGCPU · f045ddc6
      Martin Möhrmann authored
      Requires the Go compiler to be built with GOEXPERIMENT=debugcpu to be active.
      
      The GODEBUGCPU environment variable can be used to disable usage of
      specific processor features in the Go standard library.
      This is useful for testing and benchmarking different code paths that
      are guarded by internal/cpu variable checks.
      
      Use of processor features cannot be enabled through GODEBUGCPU.
      
      To disable use of the AVX and SSE41 CPU features on GOARCH=amd64, use:
      GODEBUGCPU=avx=0,sse41=0
      
      The special "all" option can be used to disable all features:
      GODEBUGCPU=all=0
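The option string is a simple comma-separated list of key=value pairs. A hypothetical parser for it (the real one lives in internal/cpu and its details may differ; the feature names and map here are purely illustrative) could look like:

```go
package main

import (
	"fmt"
	"strings"
)

// Illustrative feature flags in the style of internal/cpu booleans.
var features = map[string]*bool{
	"avx":   new(bool),
	"sse41": new(bool),
}

// parseGODEBUGCPU disables features named in a GODEBUGCPU-style option
// string such as "avx=0,sse41=0" or "all=0". Values other than "0" are
// ignored, mirroring the rule that features can only be disabled.
func parseGODEBUGCPU(opts string) {
	for _, field := range strings.Split(opts, ",") {
		key, val, ok := strings.Cut(field, "=")
		if !ok || val != "0" {
			continue // only disabling is supported
		}
		if key == "all" {
			for _, f := range features {
				*f = false
			}
			continue
		}
		if f, ok := features[key]; ok {
			*f = false
		}
	}
}

func main() {
	*features["avx"], *features["sse41"] = true, true
	parseGODEBUGCPU("avx=0")
	fmt.Println(*features["avx"], *features["sse41"]) // false true
}
```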
      
      Updates #12805
      Updates #15403
      
      Change-Id: I699c2e6f74d98472b6fb4b1e5ffbf29b15697aab
      Reviewed-on: https://go-review.googlesource.com/91737
      
      
      Run-TryBot: Martin Möhrmann <moehrmann@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
  8. 21 May, 2018 1 commit
  9. 08 May, 2018 1 commit
  10. 07 May, 2018 1 commit
  11. 03 May, 2018 1 commit
      runtime: convert g.waitreason from string to uint8 · 4d7cf3fe
      Josh Bleecher Snyder authored
      Every time I poke at #14921, the g.waitreason string
      pointer writes show up.
      
      They're not particularly important performance-wise,
      but it'd be nice to clear the noise away.
      
      And it does open up a few extra bytes in the g struct
      for some future use.
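The conversion replaces a string field with a one-byte enum plus a string table. A sketch of the shape this takes (the constant names are abbreviated stand-ins, not the runtime's full list):

```go
package main

import "fmt"

// waitReason records why a goroutine is blocked in a single byte instead
// of a string header, which costs two words plus a pointer write in the
// g struct every time it is set.
type waitReason uint8

const (
	waitReasonZero            waitReason = iota // ""
	waitReasonGCAssistMarking                   // "GC assist marking"
	waitReasonChanReceive                       // "chan receive"
	waitReasonSleep                             // "sleep"
)

var waitReasonStrings = [...]string{
	waitReasonZero:            "",
	waitReasonGCAssistMarking: "GC assist marking",
	waitReasonChanReceive:     "chan receive",
	waitReasonSleep:           "sleep",
}

// String converts back to text only when a human needs to read it,
// e.g. in goroutine dumps.
func (w waitReason) String() string {
	if int(w) < len(waitReasonStrings) {
		return waitReasonStrings[w]
	}
	return "unknown wait reason"
}

func main() {
	fmt.Println(waitReasonChanReceive) // chan receive
}
```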
      
      This is a re-roll of CL 99078, which was rolled
      back because of failures on s390x.
      Those failures were apparently due to an old version of gdb.
      
      Change-Id: Icc2c12f449b2934063fd61e272e06237625ed589
      Reviewed-on: https://go-review.googlesource.com/111256
      
      
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Michael Munday <mike.munday@ibm.com>
  12. 01 May, 2018 1 commit
  13. 30 Apr, 2018 1 commit
  14. 26 Apr, 2018 1 commit
  15. 18 Apr, 2018 1 commit
  16. 13 Apr, 2018 2 commits
  17. 10 Apr, 2018 1 commit
  18. 15 Mar, 2018 1 commit
  19. 13 Mar, 2018 1 commit
  20. 12 Mar, 2018 1 commit
  21. 09 Mar, 2018 1 commit
  22. 08 Mar, 2018 1 commit
      runtime: make throw safer to call · 7f1b2738
      Austin Clements authored
      Currently, throw may grow the stack, which means whenever we call it
      from a context where it's not safe to grow the stack, we first have to
      switch to the system stack. This is pretty easy to get wrong.
      
      Fix this by making throw switch to the system stack so it doesn't grow
      the stack and is hence safe to call without a system stack switch at
      the call site.
      
      The only thing this complicates is badsystemstack itself, which would
      now go into an infinite loop before printing anything (previously it
      would also go into an infinite loop, but would at least print the
      error first). Fix this by making badsystemstack do a direct write and
      then crash hard.
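Outside the runtime, systemstack is not available, but the control-flow change can be sketched with a stand-in. This is only an illustration of moving the stack switch into throw itself, under assumed names; the real implementation differs:

```go
package main

import "fmt"

// systemstack is a stand-in: the real one switches to the g0 stack, which
// is fixed-size and never grows, making the printing below safe anywhere.
func systemstack(f func()) { f() }

// Before the CL, callers in no-stack-growth contexts had to write
// systemstack(func() { throw("...") }) themselves. After it, throw does
// the switch internally, so a plain throw("...") is safe at any call site.
func throw(s string) {
	systemstack(func() {
		fmt.Println("fatal error: " + s)
	})
	panic(s) // stand-in for the runtime's hard crash
}

func main() {
	defer func() { recover() }() // keep the sketch from actually dying
	throw("example")
}
```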
      
      Change-Id: Ic5b4a610df265e47962dcfa341cabac03c31c049
      Reviewed-on: https://go-review.googlesource.com/93659
      
      
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
  23. 07 Mar, 2018 1 commit
      runtime: get traceback from VDSO code · 419c0645
      Ian Lance Taylor authored
      Currently if a profiling signal arrives while executing within a VDSO
      the profiler will report _ExternalCode, which is needlessly confusing
      for a pure Go program. Change the VDSO calling code to record the
      caller's PC/SP, so that we can do a traceback from that point. If that
      fails for some reason, report _VDSO rather than _ExternalCode, which
      should at least point in the right direction.
      
      This adds some instructions to the code that calls the VDSO, but the
      slowdown is reasonably negligible:
      
      name                                  old time/op  new time/op  delta
      ClockVDSOAndFallbackPaths/vDSO-8      40.5ns ± 2%  41.3ns ± 1%  +1.85%  (p=0.002 n=10+10)
      ClockVDSOAndFallbackPaths/Fallback-8  41.9ns ± 1%  43.5ns ± 1%  +3.84%  (p=0.000 n=9+9)
      TimeNow-8                             41.5ns ± 3%  41.5ns ± 2%    ~     (p=0.723 n=10+10)
      
      Fixes #24142
      
      Change-Id: Iacd935db3c4c782150b3809aaa675a71799b1c9c
      Reviewed-on: https://go-review.googlesource.com/97315
      
      
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Austin Clements <austin@google.com>
  24. 14 Feb, 2018 1 commit
  25. 04 Jan, 2018 1 commit
      runtime: avoid race on allp in findrunnable · 7c2cf4e7
      Austin Clements authored
      findrunnable loops over allp to check run queues *after* it has
      dropped its own P. This is unsafe because allp can change when nothing
      is blocking safe-points. Hence, procresize could change allp
      concurrently with findrunnable's loop. Beyond generally violating Go's
      memory model, in the best case this could cause findrunnable to observe a
      nil P pointer if allp has been grown but the new slots not yet
      initialized. In the worst case, the reads of allp could tear, causing
      findrunnable to read a word that isn't even a valid *P pointer.
      
      Fix this by taking a snapshot of the allp slice header (but not the
      backing store) before findrunnable drops its P and iterating over this
      snapshot. The actual contents of allp are immutable up to len(allp),
      so this fixes the race.
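The fix is the standard snapshot-the-header pattern: copy the slice value while it is still safe, then range over the copy. A minimal illustration (allp and the P type here are stand-ins, not the runtime's):

```go
package main

import "fmt"

type p struct{ id int32 }

var allp []*p // grown by a stand-in procresize; writers hold a lock

// countRunnable iterates a snapshot of allp. Because append may replace
// the backing array, reading the global header mid-loop could tear or
// observe half-initialized growth; the local header copy is immune to
// both, and the contents up to len(snapshot) are immutable.
func countRunnable() int {
	snapshot := allp // copy the slice header, not the backing array
	n := 0
	for _, pp := range snapshot {
		if pp != nil {
			n++
		}
	}
	return n
}

func main() {
	allp = []*p{{0}, {1}, nil}
	fmt.Println(countRunnable()) // 2
}
```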
      
      Updates #23098 (may fix).
      
      Change-Id: I556ae2dbfffe9fe4a1bf43126e930b9e5c240ea8
      Reviewed-on: https://go-review.googlesource.com/86215
      
      
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
  26. 01 Dec, 2017 1 commit
      runtime: restore the Go-allocated signal stack in unminit · 292558be
      Austin Clements authored
      Currently, when we minit on a thread that already has an alternate
      signal stack (e.g., because the M was an extram being used for a cgo
      callback, or to handle a signal on a C thread, or because the
      platform's libc always allocates a signal stack like on Android), we
      simply drop the Go-allocated gsignal stack on the floor.
      
      This is a problem for Ms on the extram list because those Ms may later
      be reused for a different thread that may not have its own alternate
      signal stack. On tip, this manifests as a crash in sigaltstack because
      we clear the gsignal stack bounds in unminit and later try to use
      those cleared bounds when we re-minit that M. On 1.9 and earlier, we
      didn't clear the bounds, so this manifests as running more than one
      signal handler on the same signal stack, which could lead to arbitrary
      memory corruption.
      
      This CL fixes this problem by saving the Go-allocated gsignal stack in
      a new field in the m struct when overwriting it with a system-provided
      signal stack, and then restoring the original gsignal stack in
      unminit.
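The stash-and-restore shape can be sketched as follows. The struct and method names here are illustrative, not the runtime's actual fields:

```go
package main

import "fmt"

type stack struct{ lo, hi uintptr }

// m mirrors the relevant state: gsignalStack is whatever alternate signal
// stack is currently installed; goSigStack saves the Go-allocated one
// while a system-provided stack (cgo callback, Android libc, ...) is in use.
type m struct {
	gsignalStack stack
	goSigStack   stack
}

// minitTakeOver runs when the thread already has an alternate stack:
// stash the Go-allocated stack instead of dropping it on the floor.
func (mp *m) minitTakeOver(sys stack) {
	mp.goSigStack = mp.gsignalStack
	mp.gsignalStack = sys
}

// unminit restores the Go-allocated stack so a reused M has valid bounds.
func (mp *m) unminit() {
	if mp.goSigStack != (stack{}) {
		mp.gsignalStack = mp.goSigStack
		mp.goSigStack = stack{}
	}
}

func main() {
	mp := &m{gsignalStack: stack{0x1000, 0x9000}}
	mp.minitTakeOver(stack{0xa000, 0xb000})
	mp.unminit()
	fmt.Println(mp.gsignalStack == (stack{0x1000, 0x9000})) // true
}
```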
      
      This CL is designed to be easy to back-port to 1.9. It won't quite
      cherry-pick cleanly, but it should be sufficient to simply ignore the
      change in mexit (which didn't exist in 1.9).
      
      Now that we always have a place to stash the original signal stack in
      the m struct, there are some simplifications we can make to the signal
      stack handling. We'll do those in a later CL.
      
      Fixes #22930.
      
      Change-Id: I55c5a6dd9d97532f131146afdef0b216e1433054
      Reviewed-on: https://go-review.googlesource.com/81476
      
      
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
  27. 24 Nov, 2017 1 commit
      runtime: fix final stack split in exitsyscall · be589f8d
      Austin Clements authored
      exitsyscall should be recursively nosplit, but we don't have a way to
      annotate that right now (see #21314). There's exactly one remaining
      place where this is violated right now: exitsyscall -> casgstatus ->
      print. The other prints in casgstatus are wrapped in systemstack
      calls. This fixes the remaining print.
      
      Updates #21431 (in theory could fix it, but that would just indicate
      that we have a different G status-related crash and we've *never* seen
      that failure on the dashboard.)
      
      Change-Id: I9a5e8d942adce4a5c78cfc6b306ea5bda90dbd33
      Reviewed-on: https://go-review.googlesource.com/79815
      
      
      Run-TryBot: Austin Clements <austin@google.com>
      Reviewed-by: Rick Hudson <rlh@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
  28. 22 Nov, 2017 1 commit
  29. 21 Nov, 2017 2 commits
      runtime: skip netpoll check if there are no waiters · b75b4d0e
      Michael Pratt authored
      If there are no netpoll waiters then calling netpoll will never find any
      goroutines. The later blocking netpoll in findrunnable already has this
      optimization.
      
      With golang.org/cl/78538 also applied, this change has a small impact on
      latency:
      
      name                             old time/op  new time/op  delta
      WakeupParallelSpinning/0s-12     13.6µs ± 1%  13.7µs ± 1%    ~     (p=0.873 n=19+20)
      WakeupParallelSpinning/1µs-12    17.7µs ± 0%  17.6µs ± 0%  -0.31%  (p=0.000 n=20+20)
      WakeupParallelSpinning/2µs-12    20.2µs ± 2%  19.9µs ± 1%  -1.59%  (p=0.000 n=20+19)
      WakeupParallelSpinning/5µs-12    32.0µs ± 1%  32.1µs ± 1%    ~     (p=0.201 n=20+19)
      WakeupParallelSpinning/10µs-12   51.7µs ± 0%  51.4µs ± 1%  -0.60%  (p=0.000 n=20+18)
      WakeupParallelSpinning/20µs-12   92.2µs ± 0%  92.2µs ± 0%    ~     (p=0.474 n=19+19)
      WakeupParallelSpinning/50µs-12    215µs ± 0%   215µs ± 0%    ~     (p=0.319 n=20+19)
      WakeupParallelSpinning/100µs-12   330µs ± 2%   331µs ± 2%    ~     (p=0.296 n=20+19)
      WakeupParallelSyscall/0s-12       127µs ± 0%   126µs ± 0%  -0.57%  (p=0.000 n=18+18)
      WakeupParallelSyscall/1µs-12      129µs ± 0%   128µs ± 1%  -0.43%  (p=0.000 n=18+19)
      WakeupParallelSyscall/2µs-12      131µs ± 1%   130µs ± 1%  -0.78%  (p=0.000 n=20+19)
      WakeupParallelSyscall/5µs-12      137µs ± 1%   136µs ± 0%  -0.54%  (p=0.000 n=18+19)
      WakeupParallelSyscall/10µs-12     147µs ± 1%   146µs ± 0%  -0.58%  (p=0.000 n=18+19)
      WakeupParallelSyscall/20µs-12     168µs ± 0%   167µs ± 0%  -0.52%  (p=0.000 n=19+19)
      WakeupParallelSyscall/50µs-12     228µs ± 0%   227µs ± 0%  -0.37%  (p=0.000 n=19+18)
      WakeupParallelSyscall/100µs-12    329µs ± 0%   328µs ± 0%  -0.28%  (p=0.000 n=20+18)
      
      There is a bigger improvement in CPU utilization. Before this CL, these
      benchmarks spent 12% of cycles in netpoll, which are gone after this CL.
      
      This also fixes the sched.lastpoll load, which should be atomic.
      
      Change-Id: I600961460608bd5ba3eeddc599493d2be62064c6
      Reviewed-on: https://go-review.googlesource.com/78915
      
      
      Run-TryBot: Michael Pratt <mpratt@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
      Reviewed-by: Austin Clements <austin@google.com>
      runtime: only sleep before stealing work from a running P · 868c8b37
      Jamie Liu authored
      The sleep in question does not make sense if the stolen-from P cannot
      run the stolen G. The usleep(3) has been observed delaying execution of
      woken G's by ~60us; skipping it reduces the wakeup-to-execution latency
      to ~7us in these cases, improving CPU utilization.
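The heuristic reduces to a status check on the victim P, as the commit title says: sleep only before stealing from a running P. A sketch with stand-in status constants (the runtime's P status values are internal):

```go
package main

import "fmt"

type pstatus int32

const (
	_Pidle pstatus = iota
	_Prunning
	_Psyscall
)

// shouldSleepBeforeSteal mirrors the adjusted logic: the brief usleep,
// which gives the victim a chance to schedule its own G, only makes sense
// if the victim is actually running. Otherwise stealing immediately avoids
// the ~60µs of added wakeup latency the sleep was observed to cause.
func shouldSleepBeforeSteal(victim pstatus) bool {
	return victim == _Prunning
}

func main() {
	fmt.Println(shouldSleepBeforeSteal(_Psyscall), shouldSleepBeforeSteal(_Prunning)) // false true
}
```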
      
      Benchmarks added by this change:
      
      name                             old time/op  new time/op  delta
      WakeupParallelSpinning/0s-12     14.4µs ± 1%  14.3µs ± 1%     ~     (p=0.227 n=19+20)
      WakeupParallelSpinning/1µs-12    18.3µs ± 0%  18.3µs ± 1%     ~     (p=0.950 n=20+19)
      WakeupParallelSpinning/2µs-12    22.3µs ± 1%  22.3µs ± 1%     ~     (p=0.670 n=20+18)
      WakeupParallelSpinning/5µs-12    31.7µs ± 0%  31.7µs ± 0%     ~     (p=0.460 n=20+17)
      WakeupParallelSpinning/10µs-12   51.8µs ± 0%  51.8µs ± 0%     ~     (p=0.883 n=20+20)
      WakeupParallelSpinning/20µs-12   91.9µs ± 0%  91.9µs ± 0%     ~     (p=0.245 n=20+20)
      WakeupParallelSpinning/50µs-12    214µs ± 0%   214µs ± 0%     ~     (p=0.509 n=19+20)
      WakeupParallelSpinning/100µs-12   335µs ± 0%   335µs ± 0%   -0.05%  (p=0.006 n=17+15)
      WakeupParallelSyscall/0s-12       228µs ± 2%   129µs ± 1%  -43.32%  (p=0.000 n=20+19)
      WakeupParallelSyscall/1µs-12      232µs ± 1%   131µs ± 1%  -43.60%  (p=0.000 n=19+20)
      WakeupParallelSyscall/2µs-12      236µs ± 1%   133µs ± 1%  -43.44%  (p=0.000 n=18+19)
      WakeupParallelSyscall/5µs-12      248µs ± 2%   139µs ± 1%  -43.68%  (p=0.000 n=18+19)
      WakeupParallelSyscall/10µs-12     263µs ± 3%   150µs ± 2%  -42.97%  (p=0.000 n=18+20)
      WakeupParallelSyscall/20µs-12     281µs ± 2%   170µs ± 1%  -39.43%  (p=0.000 n=19+19)
      WakeupParallelSyscall/50µs-12     345µs ± 4%   246µs ± 7%  -28.85%  (p=0.000 n=20+20)
      WakeupParallelSyscall/100µs-12    460µs ± 5%   350µs ± 4%  -23.85%  (p=0.000 n=20+20)
      
      Benchmarks associated with the change that originally added this sleep
      (see https://golang.org/s/go15gomaxprocs):
      
      name        old time/op  new time/op  delta
      Chain       19.4µs ± 2%  19.3µs ± 1%    ~     (p=0.101 n=19+20)
      ChainBuf    19.5µs ± 2%  19.4µs ± 2%    ~     (p=0.840 n=19+19)
      Chain-2     19.9µs ± 1%  19.9µs ± 2%    ~     (p=0.734 n=19+19)
      ChainBuf-2  20.0µs ± 2%  20.0µs ± 2%    ~     (p=0.175 n=19+17)
      Chain-4     20.3µs ± 1%  20.1µs ± 1%  -0.62%  (p=0.010 n=19+18)
      ChainBuf-4  20.3µs ± 1%  20.2µs ± 1%  -0.52%  (p=0.023 n=19+19)
      Powser       2.09s ± 1%   2.10s ± 3%    ~     (p=0.908 n=19+19)
      Powser-2     2.21s ± 1%   2.20s ± 1%  -0.35%  (p=0.010 n=19+18)
      Powser-4     2.31s ± 2%   2.31s ± 2%    ~     (p=0.578 n=18+19)
      Sieve        13.6s ± 1%   13.6s ± 1%    ~     (p=0.909 n=17+18)
      Sieve-2      8.02s ±52%   7.28s ±15%    ~     (p=0.336 n=20+16)
      Sieve-4      4.00s ±35%   3.98s ±26%    ~     (p=0.654 n=20+18)
      
      Change-Id: I58edd8ce01075859d871e2348fc0833e9c01f70f
      Reviewed-on: https://go-review.googlesource.com/78538
      
      Reviewed-by: Austin Clements <austin@google.com>
  30. 07 Nov, 2017 2 commits
  31. 30 Oct, 2017 1 commit
      runtime: buffered write barrier implementation · e9079a69
      Austin Clements authored
      This implements runtime support for buffered write barriers on amd64.
      The buffered write barrier has a fast path that simply enqueues
      pointers in a per-P buffer. Unlike the current write barrier, this
      fast path is *not* a normal Go call and does not require the compiler
      to spill general-purpose registers or put arguments on the stack. When
      the buffer fills up, the write barrier takes the slow path, which
      spills all general purpose registers and flushes the buffer. We don't
      allow safe-points or stack splits while this frame is active, so it
      doesn't matter that we have no type information for the spilled
      registers in this frame.
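The fast-path/slow-path split can be sketched as a fixed-size buffer with a fill check. Only the control flow is shown; the real fast path is hand-tuned assembly-adjacent code that avoids register spills, and the buffer is much larger than this toy size:

```go
package main

import "fmt"

const wbBufEntries = 4 // tiny for the sketch

// wbBuf is a per-P buffer of pointers awaiting the garbage collector.
type wbBuf struct {
	next    int
	buf     [wbBufEntries]uintptr
	flushes int
}

// putFast is the fast path: store the pointer and report whether there is
// still room. (For cgocheck=2, the buffer would be kept at minimum size
// so every call falls through to the slow path.)
func (b *wbBuf) putFast(p uintptr) bool {
	b.buf[b.next] = p
	b.next++
	return b.next < wbBufEntries
}

// flush is the slow path: hand the buffered pointers to the GC and reset.
func (b *wbBuf) flush() {
	b.flushes++ // the real flush greys every buffered pointer
	b.next = 0
}

func writeBarrier(b *wbBuf, p uintptr) {
	if !b.putFast(p) {
		b.flush()
	}
}

func main() {
	var b wbBuf
	for i := uintptr(1); i <= 9; i++ {
		writeBarrier(&b, i)
	}
	fmt.Println(b.flushes) // 2
}
```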
      
      One minor complication is cgocheck=2 mode, which uses the write
      barrier to detect Go pointers being written to non-Go memory. We
      obviously can't buffer this, so instead we set the buffer to its
      minimum size, forcing the write barrier into the slow path on every
      call. For this specific case, we pass additional information as
      arguments to the flush function. This also requires enabling the cgo
      write barrier slightly later during runtime initialization, after Ps
      (and the per-P write barrier buffers) have been initialized.
      
      The code in this CL is not yet active. The next CL will modify the
      compiler to generate calls to the new write barrier.
      
      This reduces the average cost of the write barrier by roughly a factor
      of 4, which will pay for the cost of having it enabled more of the
      time after we make the GC pacer less aggressive. (Benchmarks will be
      in the next CL.)
      
      Updates #14951.
      Updates #22460.
      
      Change-Id: I396b5b0e2c5e5c4acfd761a3235fd15abadc6cb1
      Reviewed-on: https://go-review.googlesource.com/73711
      
      
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Rick Hudson <rlh@golang.org>
  32. 29 Oct, 2017 1 commit
      runtime: allow write barriers in gchelper · 3526d803
      Austin Clements authored
      We're about to start tracking nowritebarrierrec through systemstack
      calls, which detects that we're calling markroot (which has write
      barriers) from gchelper, which is called from the scheduler during STW
      apparently without a P.
      
      But it turns out that func helpgc, which wakes up blocked Ms to run
      gchelper, installs a P for gchelper to use. This means there *is* a P
      when gchelper runs, so it is allowed to have write barriers. Tell the
      compiler this by marking gchelper go:yeswritebarrierrec. Also,
      document the call to gchelper so I don't have to spend another half a
      day puzzling over how on earth this could possibly work before
      discovering the spooky action-at-a-distance in helpgc.
      
      Updates #22384.
      For #22460.
      
      Change-Id: I7394c9b4871745575f87a2d4fbbc5b8e54d669f7
      Reviewed-on: https://go-review.googlesource.com/72772
      
      
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Rick Hudson <rlh@golang.org>
  33. 25 Oct, 2017 1 commit
  34. 17 Oct, 2017 2 commits
  35. 12 Oct, 2017 1 commit
  36. 11 Oct, 2017 1 commit
      runtime: don't try to free OS-created signal stacks · 44d9e96d
      Austin Clements authored
      Android's libc creates a signal stack for every thread it creates. In
      Go, minitSignalStack picks up this existing signal stack and puts it
      in m.gsignal.stack. However, if we later try to exit a thread (because
      a locked goroutine is exiting), we'll attempt to stackfree this
      libc-allocated signal stack and panic.
      
      Fix this by clearing gsignal.stack when we unminitSignals in such a
      situation.
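The essence of the fix is to distinguish Go-allocated signal stacks (which may be freed) from OS-created ones (which must only be forgotten). A rough sketch under illustrative names, not the runtime's actual code:

```go
package main

import "fmt"

type stack struct{ lo, hi uintptr }

// mSig holds the two fields the fix concerns: newSigstack records whether
// Go allocated the current signal stack.
type mSig struct {
	gsignal     stack
	newSigstack bool
}

// unminitSignals mirrors the fixed logic: a Go-allocated stack may be
// freed, but an OS-created one (e.g. Android libc's per-thread stack)
// must never be handed to stackfree. Clearing gsignal in both cases
// keeps a later thread exit from trying to free memory Go doesn't own.
func (mp *mSig) unminitSignals() (canFree bool) {
	canFree = mp.newSigstack
	mp.gsignal = stack{}
	return canFree
}

func main() {
	mp := &mSig{gsignal: stack{0x1000, 0x5000}, newSigstack: false}
	fmt.Println(mp.unminitSignals(), mp.gsignal == (stack{})) // false true
}
```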
      
      This should fix the Android build, which is currently broken.
      
      Change-Id: Ieea8d72ef063d22741c54c9daddd8bb84926a488
      Reviewed-on: https://go-review.googlesource.com/70130
      
      Reviewed-by: David Crawshaw <crawshaw@golang.org>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      Run-TryBot: David Crawshaw <crawshaw@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>