  1. 12 Nov, 2018 3 commits
    • cmd/compile: fix race on initializing Sym symFunc flag · 5cf2b4c2
      Austin Clements authored
      SSA lowering can create PFUNC ONAME nodes when compiling method calls.
      Since we generally initialize the node's Sym to a func when we set its
      class to PFUNC, we did this here, too. Unfortunately, since SSA
      compilation is concurrent, this can cause a race if two function
      compilations try to initialize the same symbol.
      
      Luckily, we don't need to do this at all, since we're actually just
      wrapping an ONAME node around an existing Sym that's already marked as
      a function symbol.
      
      Fixes the linux-amd64-racecompile builder, which was broken by CL
      147158.
      
      Updates #27539.
      
      Change-Id: I8ddfce6e66a08ce53998c5bfa6f5a423c1ffc1eb
      Reviewed-on: https://go-review.googlesource.com/c/149158
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
    • cmd/compile, cmd/link: separate stable and internal ABIs · 685aca45
      Austin Clements authored
      This implements compiler and linker support for separating the
      function calling ABI into two ABIs: a stable and an internal ABI. At
      the moment, the two ABIs are identical, but we'll be able to evolve
      the internal ABI without breaking existing assembly code that depends
      on the stable ABI for calling to and from Go.
      
      The Go compiler generates internal ABI symbols for all Go functions.
      It uses the symabis information produced by the assembler to create
      ABI wrappers whenever it encounters a body-less Go function that's
      defined in assembly or a Go function that's referenced from assembly.
      
      Since the two ABIs are currently identical, for the moment this is
      implemented using "ABI alias" symbols, which are just forwarding
      references to the native ABI symbol for a function. This way there's
      no actual code involved in the ABI wrapper, which is good because
      we're not deriving any benefit from it right now. Once the ABIs
      diverge, we can eliminate ABI aliases.
      
      The linker represents these different ABIs internally as different
      versions of the same symbol. This way, the linker keeps us honest,
      since every symbol definition and reference also specifies its
      version. The linker is responsible for resolving ABI aliases.
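
The versioned-symbol idea can be illustrated with a toy model. This is only a sketch under stated assumptions: the names `abi`, `abiStable`, `abiInternal`, `symKey`, `symbol`, and `resolve` are hypothetical and are not the linker's actual data structures.

```go
package main

import "fmt"

// abi is a hypothetical stand-in for the linker's symbol version.
type abi int

const (
	abiStable abi = iota
	abiInternal
)

// symKey identifies a symbol by name and ABI version, so the same name
// can exist once per ABI.
type symKey struct {
	name string
	ver  abi
}

// symbol is a toy linker symbol; alias is non-nil for an ABI alias.
type symbol struct {
	key   symKey
	alias *symbol
}

// resolve follows ABI aliases until it reaches a real definition.
func resolve(s *symbol) *symbol {
	for s.alias != nil {
		s = s.alias
	}
	return s
}

func main() {
	syms := map[symKey]*symbol{}

	// The compiler defines the Go function under the internal ABI.
	def := &symbol{key: symKey{"pkg.F", abiInternal}}
	syms[def.key] = def

	// Assembly refers to pkg.F through the stable ABI, so an alias is added.
	al := &symbol{key: symKey{"pkg.F", abiStable}, alias: def}
	syms[al.key] = al

	ref := syms[symKey{"pkg.F", abiStable}]
	fmt.Println(resolve(ref) == def) // true
}
```

Because every lookup carries a version, a reference that names the wrong ABI simply fails to find a symbol instead of silently binding to the other one.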
      
      Fixes #27539.
      
      Change-Id: I197c52ec9f8fc435db8f7a4259029b20f6d65e95
      Reviewed-on: https://go-review.googlesource.com/c/147160
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
    • cmd/compile: mark function Syms · 16e6cd9a
      Austin Clements authored
      In order to mark the obj.LSyms produced by the compiler with the
      correct ABI, we need to know which types.Syms refer to function
      symbols. This CL adds a flag to types.Syms to mark symbols for
      functions, and sets this flag everywhere we create a PFUNC-class node,
      and in the one place where we directly create function symbols without
      always wrapping them in a PFUNC node (methodSym).
      
      We'll use this information to construct obj.LSyms with correct ABI
      information.
      
      For #27539.
      
      Change-Id: Ie3ac8bf3da013e449e78f6ca85546a055f275463
      Reviewed-on: https://go-review.googlesource.com/c/147158
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
      Reviewed-by: Keith Randall <khr@golang.org>
  2. 03 Nov, 2018 1 commit
    • cmd/compile: avoid duplicate GC bitmap symbols · 15265ec4
      Austin Clements authored
      Currently, liveness produces a distinct obj.LSym for each GC bitmap
      for each function. These are then named by content hash and only
      ultimately deduplicated by WriteObjFile.
      
      For various reasons (see next commit), we want to remove this
      deduplication behavior from WriteObjFile. Furthermore, it's
      inefficient to produce these duplicate symbols in the first place.
      
      GC bitmaps are the only source of duplicate symbols in the compiler.
      This commit eliminates these duplicate symbols by declaring them in
      the Ctxt symbol hash just like every other obj.LSym. As a result, all
      GC bitmaps with the same content now refer to the same obj.LSym.
      
      The next commit will remove deduplication from WriteObjFile.
      
      For #27539.
      
      Change-Id: I4f15e3d99530122cdf473b7a838c69ef5f79db59
      Reviewed-on: https://go-review.googlesource.com/c/146557
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
  3. 29 Oct, 2018 1 commit
  4. 27 Oct, 2018 2 commits
  5. 25 Oct, 2018 3 commits
    • cmd/compile: fix Mul->Mul64 intrinsic alias · 7a634034
      Keith Randall authored
      The alias declaration needs to come after the function it is aliasing.
      
      It isn't a big deal in this case, as bits.Mul inlines and has as its
      body bits.Mul64, so the desired code gets generated regardless.
      The alias should only have an effect on inlining cost estimates
      (for functions that call bits.Mul).
      
      Change-Id: I0d814899ce7049a0fb36e8ce1ad5ababbaf6265f
      Reviewed-on: https://go-review.googlesource.com/c/144597
      Run-TryBot: Keith Randall <khr@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Giovanni Bajo <rasky@develer.com>
    • cmd/compile: intrinsify math/bits.Sub on amd64 · dd789550
      Keith Randall authored
      name             old time/op  new time/op  delta
      Sub-8            1.12ns ± 1%  1.17ns ± 1%   +5.20%          (p=0.008 n=5+5)
      Sub32-8          1.11ns ± 0%  1.11ns ± 0%     ~     (all samples are equal)
      Sub64-8          1.12ns ± 0%  1.18ns ± 1%   +5.00%          (p=0.016 n=4+5)
      Sub64multiple-8  4.10ns ± 1%  0.86ns ± 1%  -78.93%          (p=0.008 n=5+5)
      
      Fixes #28273
      
      Change-Id: Ibcb6f2fd32d987c3bcbae4f4cd9d335a3de98548
      Reviewed-on: https://go-review.googlesource.com/c/144258
      Run-TryBot: Keith Randall <khr@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
    • cmd/compile: intrinsify math/bits.Add on amd64 · 899f3a28
      Keith Randall authored
      name             old time/op  new time/op  delta
      Add-8            1.11ns ± 0%  1.18ns ± 0%   +6.31%  (p=0.029 n=4+4)
      Add32-8          1.02ns ± 0%  1.02ns ± 1%     ~     (p=0.333 n=4+5)
      Add64-8          1.11ns ± 1%  1.17ns ± 0%   +5.79%  (p=0.008 n=5+5)
      Add64multiple-8  4.35ns ± 1%  0.86ns ± 0%  -80.22%  (p=0.000 n=5+4)
      
      The individual ops are a bit slower (but still very fast).
      Using the ops in carry chains is very fast.
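
For illustration, a carry chain of the kind the multiple-op benchmark name suggests might look like the sketch below. The function `add256` and its limb layout are assumptions made for this example; the benchmark's actual code is not shown here.

```go
package main

import (
	"fmt"
	"math/bits"
)

// add256 adds two 256-bit numbers stored as little-endian 64-bit limbs.
// Each bits.Add64 consumes the carry produced by the previous one.
func add256(x, y [4]uint64) (sum [4]uint64, carry uint64) {
	sum[0], carry = bits.Add64(x[0], y[0], 0)
	sum[1], carry = bits.Add64(x[1], y[1], carry)
	sum[2], carry = bits.Add64(x[2], y[2], carry)
	sum[3], carry = bits.Add64(x[3], y[3], carry)
	return sum, carry
}

func main() {
	// x is 2^128 - 1, so adding 1 ripples a carry through two limbs.
	x := [4]uint64{^uint64(0), ^uint64(0), 0, 0}
	y := [4]uint64{1, 0, 0, 0}
	sum, carry := add256(x, y)
	fmt.Println(sum, carry) // [0 0 1 0] 0
}
```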
      
      Update #28273
      
      Change-Id: Id975f76df2b930abf0e412911d327b6c5b1befe5
      Reviewed-on: https://go-review.googlesource.com/c/144257
      Run-TryBot: Keith Randall <khr@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
  6. 23 Oct, 2018 1 commit
    • cmd/compile, runtime: add new lightweight atomics for ppc64x · 5c472132
      Carlos Eduardo Seo authored
      This change creates the infrastructure for new lightweight atomics
      primitives in runtime/internal/atomic:
      
      - LoadAcq, for load-acquire
      - StoreRel, for store-release
      - CasRel, for Compare-and-Swap-release
      
      and implements them for ppc64x. There is visible performance improvement
      in producer-consumer scenarios, like BenchmarkChanProdCons*:
      
      benchmark                           old ns/op     new ns/op     delta
      BenchmarkChanProdCons0-48           2034          2034          +0.00%
      BenchmarkChanProdCons10-48          1798          1608          -10.57%
      BenchmarkChanProdCons100-48         1596          1585          -0.69%
      BenchmarkChanProdConsWork0-48       2084          2046          -1.82%
      BenchmarkChanProdConsWork10-48      1829          1668          -8.80%
      BenchmarkChanProdConsWork100-48     1650          1650          +0.00%
      
      Fixes #21348
      
      Change-Id: I1f6ce377e4a0fe4bd7f5f775e8036f50070ad8db
      Reviewed-on: https://go-review.googlesource.com/c/142277
      Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
  7. 22 Oct, 2018 1 commit
    • cmd/compile: intrinsify math/big.mulWW on ppc64x · 1e8ecefc
      Carlos Eduardo Seo authored
      This change implements mulWW as an intrinsic for ppc64x. Performance
      numbers below:
      
      name                            old time/op    new time/op    delta
      QuoRem                            4.54µs ±45%    3.22µs ± 0%  -29.22%  (p=0.029 n=4+4)
      ModSqrt225_Tonelli                 765µs ± 3%     757µs ± 0%   -1.02%  (p=0.029 n=4+4)
      ModSqrt225_3Mod4                   231µs ± 0%     231µs ± 0%   -0.10%  (p=0.029 n=4+4)
      ModSqrt231_Tonelli                 789µs ± 0%     788µs ± 0%   -0.14%  (p=0.029 n=4+4)
      ModSqrt231_5Mod8                   267µs ± 0%     267µs ± 0%   -0.13%  (p=0.029 n=4+4)
      Sqrt                              49.5µs ±17%    45.3µs ± 0%   -8.48%  (p=0.029 n=4+4)
      IntSqr/1                          32.2ns ±22%    24.2ns ± 0%  -24.79%  (p=0.029 n=4+4)
      IntSqr/2                          60.6ns ± 0%    60.9ns ± 0%   +0.50%  (p=0.029 n=4+4)
      IntSqr/3                          82.8ns ± 0%    83.3ns ± 0%   +0.51%  (p=0.029 n=4+4)
      IntSqr/5                           122ns ± 0%     121ns ± 0%   -1.22%  (p=0.029 n=4+4)
      IntSqr/8                           227ns ± 0%     226ns ± 0%   -0.44%  (p=0.029 n=4+4)
      IntSqr/10                          300ns ± 0%     298ns ± 0%   -0.67%  (p=0.029 n=4+4)
      IntSqr/20                         1.02µs ± 0%    0.89µs ± 0%  -13.08%  (p=0.029 n=4+4)
      IntSqr/30                         1.73µs ± 0%    1.51µs ± 0%  -12.73%  (p=0.029 n=4+4)
      IntSqr/50                         3.69µs ± 1%    3.29µs ± 0%  -10.70%  (p=0.029 n=4+4)
      IntSqr/80                         7.64µs ± 0%    7.04µs ± 0%   -7.91%  (p=0.029 n=4+4)
      IntSqr/100                        11.1µs ± 0%    10.3µs ± 0%   -7.04%  (p=0.029 n=4+4)
      IntSqr/200                        37.9µs ± 0%    36.4µs ± 0%   -4.13%  (p=0.029 n=4+4)
      IntSqr/300                        69.4µs ± 0%    66.0µs ± 0%   -4.94%  (p=0.029 n=4+4)
      IntSqr/500                         174µs ± 0%     168µs ± 0%   -3.10%  (p=0.029 n=4+4)
      IntSqr/800                         347µs ± 0%     333µs ± 0%   -4.06%  (p=0.029 n=4+4)
      IntSqr/1000                        524µs ± 0%     507µs ± 0%   -3.21%  (p=0.029 n=4+4)
      
      Change-Id: If067452f5b6579ad3a2e9daa76a7ffe6fceae1bb
      Reviewed-on: https://go-review.googlesource.com/c/143217
      Run-TryBot: Giovanni Bajo <rasky@develer.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Giovanni Bajo <rasky@develer.com>
  8. 19 Oct, 2018 1 commit
    • cmd/compile: move argument stack construction to SSA generation · 2578ac54
      Josh Bleecher Snyder authored
      The goal of this change is to move work from walk to SSA,
      and simplify things along the way.
      
      This is hard to accomplish cleanly with small incremental changes,
      so this large commit message aims to provide a roadmap to the diff.
      
      High level description:
      
      Prior to this change, walk was responsible for constructing (most of) the stack for function calls.
      ascompatte gathered variadic arguments into a slice.
      It also rewrote n.List from a list of arguments to a list of assignments to stack slots.
      ascompatte was called multiple times to handle the receiver in a method call.
      reorder1 then introduced temporaries into n.List as needed to avoid smashing the stack.
      adjustargs then made extra stack space for go/defer args as needed.
      
      Node to SSA construction evaluated all the statements in n.List,
      and issued the function call, assuming that the stack was correctly constructed.
      Intrinsic calls had to dig around inside n.List to extract the arguments,
      since intrinsics don't use the stack to make function calls.
      
      This change moves stack construction to the SSA construction phase.
      ascompatte, now called walkParams, does all the work that ascompatte and reorder1 did.
      It handles variadic arguments, inserts the method receiver if needed, and allocates temporaries.
      It does not, however, make any assignments to stack slots.
      Instead, it moves the function arguments to n.Rlist, leaving assignments to temporaries in n.List.
      (It would be better to use Ninit instead of List; future work.)
      During SSA construction, after doing all the temporary assignments in n.List,
      the function arguments are assigned to stack slots by
      constructing the appropriate SSA Value, using (*state).storeArg.
      SSA construction also now handles adjustments for go/defer args.
      This change also simplifies intrinsic calls, since we no longer need to undo walk's work.
      
      Along the way, we simplify nodarg by pushing the fp==1 case to its callers, where it fits nicely.
      
      Generated code differences:
      
      There were a few optimizations applied along the way, the old way.
      f(g()) was rewritten to do a block copy of function results to function arguments.
      And reorder1 avoided introducing the final "save the stack" temporary in n.List.
      
      The f(g()) block copy optimization never actually triggered; the order pass rewrote away g(), so that has been removed.
      
      SSA optimizations mostly obviated the need for reorder1's optimization of avoiding the final temporary.
      The exception was when the temporary's type was not SSA-able;
      in that case, we got a Move into an autotmp and then an immediate Move onto the stack,
      with the autotmp never read or used again.
      This change introduces a new rewrite rule to detect such pointless double Moves
      and collapse them into a single Move.
      This is actually more powerful than the original optimization,
      since the original optimization relied on the imprecise Node.HasCall calculation.
      
      The other significant difference in the generated code is that the stack is now constructed
      completely in SP-offset order. Prior to this change, the stack was constructed somewhat
      haphazardly: first the final argument that Node.HasCall deemed to require a temporary,
      then other arguments, then the method receiver, then the defer/go args.
      SP-offset is probably a good default order. See future work.
      
      There are a few minor object file size changes as a result of this change.
      I investigated some regressions in early versions of this change.
      
      One regression (in archive/tar) was the addition of a single CMPQ instruction,
      which would be eliminated were this TODO from flagalloc to be done:
      	// TODO: Remove original instructions if they are never used.
      
      One regression (in text/template) was an ADDQconstmodify that is now
      a regular MOVQLoad+ADDQconst+MOVQStore, due to an unlucky change
      in the order in which arguments are written. The argument change
      order can also now be luckier, so this appears to be a wash.
      
      All in all, though there will be minor winners and losers,
      this change appears to be performance neutral.
      
      Future work:
      
      Move loading the result of function calls to SSA construction; eliminate OINDREGSP.
      
      Consider pushing stack construction deeper into SSA world, perhaps in an arch-specific pass.
      Among other benefits, this would make it easier to transition to a new calling convention.
      This would require rethinking the handling of stack conflicts and is non-trivial.
      
      Figure out some clean way to indicate that stack construction Stores/Moves
      do not alias each other, so that subsequent passes may do things like
      CSE+tighten shared stack setup, do DSE using non-first Stores, etc.
      This would allow us to eliminate the minor text/template regression.
      
      Possibly make assignments to stack slots not treated as statements by DWARF.
      
      Compiler benchmarks:
      
      name        old time/op       new time/op       delta
      Template          182ms ± 2%        179ms ± 2%  -1.69%  (p=0.000 n=47+48)
      Unicode          86.3ms ± 5%       85.1ms ± 4%  -1.36%  (p=0.001 n=50+50)
      GoTypes           646ms ± 1%        642ms ± 1%  -0.63%  (p=0.000 n=49+48)
      Compiler          2.89s ± 1%        2.86s ± 2%  -1.36%  (p=0.000 n=48+50)
      SSA               8.47s ± 1%        8.37s ± 2%  -1.22%  (p=0.000 n=47+50)
      Flate             122ms ± 2%        121ms ± 2%  -0.66%  (p=0.000 n=47+45)
      GoParser          147ms ± 2%        146ms ± 2%  -0.53%  (p=0.006 n=46+49)
      Reflect           406ms ± 2%        403ms ± 2%  -0.76%  (p=0.000 n=48+43)
      Tar               162ms ± 3%        162ms ± 4%    ~     (p=0.191 n=46+50)
      XML               223ms ± 2%        222ms ± 2%  -0.37%  (p=0.031 n=45+49)
      [Geo mean]        382ms             378ms       -0.89%
      
      name        old user-time/op  new user-time/op  delta
      Template          219ms ± 3%        216ms ± 3%  -1.56%  (p=0.000 n=50+48)
      Unicode           109ms ± 6%        109ms ± 5%    ~     (p=0.190 n=50+49)
      GoTypes           836ms ± 2%        828ms ± 2%  -0.96%  (p=0.000 n=49+48)
      Compiler          3.87s ± 2%        3.80s ± 1%  -1.81%  (p=0.000 n=49+46)
      SSA               12.0s ± 1%        11.8s ± 1%  -2.01%  (p=0.000 n=48+50)
      Flate             142ms ± 3%        141ms ± 3%  -0.85%  (p=0.003 n=50+48)
      GoParser          178ms ± 4%        175ms ± 4%  -1.66%  (p=0.000 n=48+46)
      Reflect           520ms ± 2%        512ms ± 2%  -1.44%  (p=0.000 n=45+48)
      Tar               200ms ± 3%        198ms ± 4%  -0.61%  (p=0.037 n=47+50)
      XML               277ms ± 3%        275ms ± 3%  -0.85%  (p=0.000 n=49+48)
      [Geo mean]        482ms             476ms       -1.23%
      
      name        old alloc/op      new alloc/op      delta
      Template         36.1MB ± 0%       35.3MB ± 0%  -2.18%  (p=0.008 n=5+5)
      Unicode          29.8MB ± 0%       29.3MB ± 0%  -1.58%  (p=0.008 n=5+5)
      GoTypes           125MB ± 0%        123MB ± 0%  -2.13%  (p=0.008 n=5+5)
      Compiler          531MB ± 0%        513MB ± 0%  -3.40%  (p=0.008 n=5+5)
      SSA              2.00GB ± 0%       1.93GB ± 0%  -3.34%  (p=0.008 n=5+5)
      Flate            24.5MB ± 0%       24.3MB ± 0%  -1.18%  (p=0.008 n=5+5)
      GoParser         29.4MB ± 0%       28.7MB ± 0%  -2.34%  (p=0.008 n=5+5)
      Reflect          87.1MB ± 0%       86.0MB ± 0%  -1.33%  (p=0.008 n=5+5)
      Tar              35.3MB ± 0%       34.8MB ± 0%  -1.44%  (p=0.008 n=5+5)
      XML              47.9MB ± 0%       47.1MB ± 0%  -1.86%  (p=0.008 n=5+5)
      [Geo mean]       82.8MB            81.1MB       -2.08%
      
      name        old allocs/op     new allocs/op     delta
      Template           352k ± 0%         347k ± 0%  -1.32%  (p=0.008 n=5+5)
      Unicode            342k ± 0%         339k ± 0%  -0.66%  (p=0.008 n=5+5)
      GoTypes           1.29M ± 0%        1.27M ± 0%  -1.30%  (p=0.008 n=5+5)
      Compiler          4.98M ± 0%        4.87M ± 0%  -2.14%  (p=0.008 n=5+5)
      SSA               15.7M ± 0%        15.2M ± 0%  -2.86%  (p=0.008 n=5+5)
      Flate              233k ± 0%         231k ± 0%  -0.83%  (p=0.008 n=5+5)
      GoParser           296k ± 0%         291k ± 0%  -1.54%  (p=0.016 n=5+4)
      Reflect           1.05M ± 0%        1.04M ± 0%  -0.65%  (p=0.008 n=5+5)
      Tar                343k ± 0%         339k ± 0%  -0.97%  (p=0.008 n=5+5)
      XML                432k ± 0%         426k ± 0%  -1.19%  (p=0.008 n=5+5)
      [Geo mean]         815k              804k       -1.35%
      
      name        old object-bytes  new object-bytes  delta
      Template          505kB ± 0%        505kB ± 0%  -0.01%  (p=0.008 n=5+5)
      Unicode           224kB ± 0%        224kB ± 0%    ~     (all equal)
      GoTypes          1.82MB ± 0%       1.83MB ± 0%  +0.06%  (p=0.008 n=5+5)
      Flate             324kB ± 0%        324kB ± 0%  +0.00%  (p=0.008 n=5+5)
      GoParser          402kB ± 0%        402kB ± 0%  +0.04%  (p=0.008 n=5+5)
      Reflect          1.39MB ± 0%       1.39MB ± 0%  -0.01%  (p=0.008 n=5+5)
      Tar               449kB ± 0%        449kB ± 0%  -0.02%  (p=0.008 n=5+5)
      XML               598kB ± 0%        597kB ± 0%  -0.05%  (p=0.008 n=5+5)
      
      Change-Id: Ifc9d5c1bd01f90171414b8fb18ffe2290d271143
      Reviewed-on: https://go-review.googlesource.com/c/114797
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
  9. 18 Oct, 2018 1 commit
    • cmd/compile: attach slots to incoming params for better debugging · fa31093e
      David Chase authored
      This change attaches slots to the OpArg values for
      incoming params, which in turn causes location lists
      to be generated for params, and that yields better
      debugging in Delve and sometimes in gdb.
      
      The parameter lifetimes could start earlier; they are in
      fact defined on entry, not at the point where the OpArg is
      finally mentioned. (That will be addressed in another CL.)
      
      Change-Id: Icca891e118291d260c35a14acd5bc92bb82d9e9f
      Reviewed-on: https://go-review.googlesource.com/c/141697
      Run-TryBot: David Chase <drchase@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
  10. 15 Oct, 2018 2 commits
    • cmd/compile: add intrinsics for runtime/internal/math on 386 and amd64 · a1ca4893
      Martin Möhrmann authored
      Add generic, 386 and amd64 specific ops and SSA rules for multiplication
      with overflow and branching based on overflow flags. Use these to intrinsify
      runtime/internal/math.MulUintptr.
      
      On amd64
        mul, overflow := math.MulUintptr(a, b)
        if overflow {
      is lowered to two instructions:
        MULQ SI
        JO 0x10ee35c
      
      No codegen tests as codegen can not currently test unexported internal runtime
      functions.
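
Since runtime/internal/math cannot be imported from user code, the sketch below uses math/bits.Mul64 as a stand-in to show the same multiply-and-check-overflow pattern. The helper `mulOverflows` is hypothetical, works on uint64 rather than uintptr, and makes no claim about the instructions the compiler emits for it.

```go
package main

import (
	"fmt"
	"math/bits"
)

// mulOverflows returns the low 64 bits of a*b and reports whether the full
// product did not fit in 64 bits (that is, the high word is non-zero).
func mulOverflows(a, b uint64) (uint64, bool) {
	hi, lo := bits.Mul64(a, b)
	return lo, hi != 0
}

func main() {
	fmt.Println(mulOverflows(3, 4))     // 12 false
	fmt.Println(mulOverflows(1<<63, 2)) // 0 true
}
```

A typical use is a size calculation such as element size times element count, where an overflowed product must be rejected before allocating.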
      
      amd64:
      name              old time/op  new time/op  delta
      MulUintptr/small  1.16ns ± 5%  0.88ns ± 6%  -24.36%  (p=0.000 n=19+20)
      MulUintptr/large  10.7ns ± 1%   1.1ns ± 1%  -89.28%  (p=0.000 n=17+19)
      
      Change-Id: If60739a86f820e5044d677276c21df90d3c7a86a
      Reviewed-on: https://go-review.googlesource.com/c/141820
      Run-TryBot: Martin Möhrmann <moehrmann@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
    • cmd/compile: avoid implicit bounds checks after explicit checks for append · 9f66b41b
      Martin Möhrmann authored
      The generated code for the append builtin already checks whether the
      slice being appended to is large enough and calls growslice if it is not.
      Trust that this ensures the slice is large enough and avoid the
      implicit bounds check when slicing the slice to its new size.
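
Written out by hand, the shape of an append of one element is roughly as follows. This is a sketch only: `appendOne` and `grow` are hypothetical, and `grow` merely stands in for the runtime's growslice with a made-up growth policy.

```go
package main

import "fmt"

// grow is a hypothetical stand-in for the runtime's growslice: it returns a
// copy of s with capacity of at least need.
func grow(s []int, need int) []int {
	newCap := 2 * cap(s)
	if newCap < need {
		newCap = need
	}
	t := make([]int, len(s), newCap)
	copy(t, s)
	return t
}

// appendOne mirrors the steps described above for append(s, v).
func appendOne(s []int, v int) []int {
	n := len(s)
	if n+1 > cap(s) { // explicit capacity check
		s = grow(s, n+1)
	}
	// At this point n+1 <= cap(s) always holds, so a second bounds check
	// on this reslice would be redundant.
	s = s[:n+1]
	s[n] = v
	return s
}

func main() {
	var s []int
	for i := 0; i < 5; i++ {
		s = appendOne(s, i)
	}
	fmt.Println(s) // [0 1 2 3 4]
}
```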
      
      Removes 365 panicslice calls (-14%) from the go binary which
      reduces the binary size by ~12kbyte.
      
      Change-Id: I1b88418675ff409bc0b956853c9e95241274d5a6
      Reviewed-on: https://go-review.googlesource.com/c/119315
      Run-TryBot: Martin Möhrmann <moehrmann@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
  11. 11 Oct, 2018 2 commits
  12. 04 Oct, 2018 2 commits
  13. 03 Oct, 2018 2 commits
    • cmd/compile,runtime: remove ambiguously live logic · 9a8372f8
      Keith Randall authored
      The previous CL introduced stack objects. This CL removes the old
      ambiguously live liveness analysis. After this CL we're relying
      on stack objects exclusively.
      
      Update a bunch of liveness tests to reflect the new world.
      
      Fixes #22350
      
      Change-Id: I739b26e015882231011ce6bc1a7f426049e59f31
      Reviewed-on: https://go-review.googlesource.com/c/134156
      Reviewed-by: Austin Clements <austin@google.com>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
    • cmd/compile,runtime: implement stack objects · cbafcc55
      Keith Randall authored
      Rework how the compiler+runtime handles stack-allocated variables
      whose address is taken.
      
      Direct references to such variables work as before. References through
      pointers, however, use a new mechanism. The new mechanism is more
      precise than the old "ambiguously live" mechanism. It computes liveness
      at runtime based on the actual references among objects on the stack.
      
      Each function records all of its address-taken objects in a FUNCDATA.
      These are called "stack objects". The runtime then uses that
      information while scanning a stack to find all of the stack objects on
      a stack. It then does a mark phase on the stack objects, using all the
      pointers found on the stack (and ancillary structures, like defer
      records) as the root set. Only stack objects which are found to be
      live during this mark phase will be scanned and thus retain any heap
      objects they point to.
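
The mark phase over stack objects can be modelled with a small toy. It is a sketch under stated assumptions: `stackObject` and `mark` are hypothetical, and the real runtime works from FUNCDATA and raw stack words rather than Go pointers like these.

```go
package main

import "fmt"

// stackObject is a hypothetical model of an address-taken stack variable.
type stackObject struct {
	name   string
	ptrs   []*stackObject // pointers this object holds to other stack objects
	marked bool
}

// mark marks every stack object reachable from the roots. Roots model the
// pointers found directly on the stack and in ancillary structures.
func mark(roots []*stackObject) {
	work := append([]*stackObject(nil), roots...)
	for len(work) > 0 {
		obj := work[len(work)-1]
		work = work[:len(work)-1]
		if obj == nil || obj.marked {
			continue
		}
		obj.marked = true
		work = append(work, obj.ptrs...)
	}
}

func main() {
	c := &stackObject{name: "c"}
	b := &stackObject{name: "b", ptrs: []*stackObject{c}}
	dead := &stackObject{name: "dead"} // no live pointer refers to it

	mark([]*stackObject{b})
	for _, o := range []*stackObject{b, c, dead} {
		fmt.Println(o.name, o.marked)
	}
}
```

Only the marked objects would then be scanned, so anything that `dead` points to in the heap is not retained on its account.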
      
      A subsequent CL will remove all the "ambiguously live" logic from
      the compiler, so that the stack object tracing will be required.
      For this CL, the stack tracing is all redundant with the current
      ambiguously live logic.
      
      Update #22350
      
      Change-Id: Ide19f1f71a5b6ec8c4d54f8f66f0e9a98344772f
      Reviewed-on: https://go-review.googlesource.com/c/134155
      Reviewed-by: Austin Clements <austin@google.com>
  14. 02 Oct, 2018 1 commit
  15. 26 Sep, 2018 1 commit
    • cmd/compile: intrinsify math/bits.Mul · 9eb53ab9
      Brian Kessler authored
      Add SSA rules to intrinsify Mul/Mul64 (AMD64 and ARM64).
      SSA rules for other functions and architectures are left as a future
      optimization.  Benchmark results on AMD64/ARM64 before and after SSA
      implementation are below.
      
      amd64
      name     old time/op  new time/op  delta
      Add-4    1.78ns ± 0%  1.85ns ±12%     ~     (p=0.397 n=4+5)
      Add32-4  1.71ns ± 1%  1.70ns ± 0%     ~     (p=0.683 n=5+5)
      Add64-4  1.80ns ± 2%  1.77ns ± 0%   -1.22%  (p=0.048 n=5+5)
      Sub-4    1.78ns ± 0%  1.78ns ± 0%     ~     (all equal)
      Sub32-4  1.78ns ± 1%  1.78ns ± 0%     ~     (p=1.000 n=5+5)
      Sub64-4  1.78ns ± 1%  1.78ns ± 0%     ~     (p=0.968 n=5+4)
      Mul-4    11.5ns ± 1%   1.8ns ± 2%  -84.39%  (p=0.008 n=5+5)
      Mul32-4  1.39ns ± 0%  1.38ns ± 3%     ~     (p=0.175 n=5+5)
      Mul64-4  6.85ns ± 1%  1.78ns ± 1%  -73.97%  (p=0.008 n=5+5)
      Div-4    57.1ns ± 1%  56.7ns ± 0%     ~     (p=0.087 n=5+5)
      Div32-4  18.0ns ± 0%  18.0ns ± 0%     ~     (all equal)
      Div64-4  56.4ns ±10%  53.6ns ± 1%     ~     (p=0.071 n=5+5)
      
      arm64
      name      old time/op  new time/op  delta
      Add-96    5.51ns ± 0%  5.51ns ± 0%     ~     (all equal)
      Add32-96  5.51ns ± 0%  5.51ns ± 0%     ~     (all equal)
      Add64-96  5.52ns ± 0%  5.51ns ± 0%     ~     (p=0.444 n=5+5)
      Sub-96    5.51ns ± 0%  5.51ns ± 0%     ~     (all equal)
      Sub32-96  5.51ns ± 0%  5.51ns ± 0%     ~     (all equal)
      Sub64-96  5.51ns ± 0%  5.51ns ± 0%     ~     (all equal)
      Mul-96    34.6ns ± 0%   5.0ns ± 0%  -85.52%  (p=0.008 n=5+5)
      Mul32-96  4.51ns ± 0%  4.51ns ± 0%     ~     (all equal)
      Mul64-96  21.1ns ± 0%   5.0ns ± 0%  -76.26%  (p=0.008 n=5+5)
      Div-96    64.7ns ± 0%  64.7ns ± 0%     ~     (all equal)
      Div32-96  17.0ns ± 0%  17.0ns ± 0%     ~     (all equal)
      Div64-96  53.1ns ± 0%  53.1ns ± 0%     ~     (all equal)
      
      Updates #24813
      
      Change-Id: I9bda6d2102f65cae3d436a2087b47ed8bafeb068
      Reviewed-on: https://go-review.googlesource.com/129415
      Run-TryBot: Keith Randall <khr@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
  16. 13 Sep, 2018 1 commit
    • cmd/compile: intrinsify math.RoundToEven and math.Abs on arm64 · 8149db4f
      erifan01 authored
      math.RoundToEven can be done with a single arm64 instruction, FRINTND, so intrinsify it to improve performance.
      The current pure Go implementation of Abs is translated into five instructions on arm64:
      str, ldr, and, str, ldr. The intrinsic implementation requires only one instruction, so
      intrinsifying it is worthwhile for performance.
      
      Benchmarks:
      name           old time/op  new time/op  delta
      Abs-8          3.50ns ± 0%  1.50ns ± 0%  -57.14%  (p=0.000 n=10+10)
      RoundToEven-8  9.26ns ± 0%  1.50ns ± 0%  -83.80%  (p=0.000 n=10+10)
      
      Change-Id: I9456b26ab282b544dfac0154fc86f17aed96ac3d
      Reviewed-on: https://go-review.googlesource.com/116535
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
  17. 07 Sep, 2018 1 commit
    • cmd/compile: implement non-constant rotates using ROR on arm64 · 204cc14b
      erifan01 authored
      Add rules to match Go code such as:
      	y &= 63
      	x << y | x >> (64-y)
      or
      	y &= 63
      	x >> y | x << (64-y)
      as a ROR instruction. Make math/bits.RotateLeft faster on arm64.
      
      Extends CL 132435 to arm64.
      
      Benchmarks of math/bits.RotateLeftxxN:
      name            old time/op      new time/op      delta
      RotateLeft-8    3.548750ns ± 1%  2.003750ns ± 0%  -43.54%  (p=0.000 n=8+8)
      RotateLeft8-8   3.925000ns ± 0%  3.925000ns ± 0%     ~     (p=1.000 n=8+8)
      RotateLeft16-8  3.925000ns ± 0%  3.927500ns ± 0%     ~     (p=0.608 n=8+8)
      RotateLeft32-8  3.925000ns ± 0%  2.002500ns ± 0%  -48.98%  (p=0.000 n=8+8)
      RotateLeft64-8  3.536250ns ± 0%  2.003750ns ± 0%  -43.34%  (p=0.000 n=8+8)
      
      Change-Id: I77622cd7f39b917427e060647321f5513973232c
      Reviewed-on: https://go-review.googlesource.com/122542
      Run-TryBot: Ben Shi <powerman1st@163.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      204cc14b
  18. 05 Sep, 2018 1 commit
  19. 03 Sep, 2018 1 commit
    • Michael Munday's avatar
      cmd/compile: implement OnesCount{8,16,32,64} intrinsics on s390x · 6f9b94ab
      Michael Munday authored
      This CL implements the math/bits.OnesCount{8,16,32,64} functions
      as intrinsics on s390x using the 'population count' (popcnt)
      instruction. This instruction was released as the 'population-count'
      facility which uses the same facility bit (45) as the
      'distinct-operands' facility which is a pre-requisite for Go on
      s390x. We can therefore use it without a feature check.
      
      The s390x popcnt instruction treats a 64 bit register as a vector
      of 8 bytes, summing the number of ones in each byte individually.
      It then writes the results to the corresponding bytes in the
      output register. Therefore to implement OnesCount{16,32,64} we
      need to sum the individual byte counts using some extra
      instructions. To do this efficiently I've added some additional
      pseudo operations to the s390x SSA backend.
      
      Unlike other architectures the new instruction sequence is faster
      for OnesCount8, so that is implemented using the intrinsic.
      
      name         old time/op  new time/op  delta
      OnesCount    3.21ns ± 1%  1.35ns ± 0%  -58.00%  (p=0.000 n=20+20)
      OnesCount8   0.91ns ± 1%  0.81ns ± 0%  -11.43%  (p=0.000 n=20+20)
      OnesCount16  1.51ns ± 3%  1.21ns ± 0%  -19.71%  (p=0.000 n=20+17)
      OnesCount32  1.91ns ± 0%  1.12ns ± 1%  -41.60%  (p=0.000 n=19+20)
      OnesCount64  3.18ns ± 4%  1.35ns ± 0%  -57.52%  (p=0.000 n=20+20)
      
      Change-Id: Id54f0bd28b6db9a887ad12c0d72fcc168ef9c4e0
      Reviewed-on: https://go-review.googlesource.com/114675
      Run-TryBot: Michael Munday <mike.munday@ibm.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      6f9b94ab
  20. 30 Aug, 2018 1 commit
  21. 24 Aug, 2018 1 commit
  22. 23 Aug, 2018 3 commits
    • Kazuhiro Sera's avatar
      all: fix typos detected by github.com/client9/misspell · ad644d2e
      Kazuhiro Sera authored
      Change-Id: Iadb3c5de8ae9ea45855013997ed70f7929a88661
      GitHub-Last-Rev: ae85bcf82be8fee533e2b9901c6133921382c70a
      GitHub-Pull-Request: golang/go#26920
      Reviewed-on: https://go-review.googlesource.com/128955
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      ad644d2e
    • Yury Smolsky's avatar
      cmd/compile: add sources for inlined functions to ssa.html · 9e2a04d5
      Yury Smolsky authored
      This CL adds the source code of all inlined functions
      into the function specified in $GOSSAFUNC.
      The code is appended to the sources column of ssa.html.
      
      ssaDumpInlined is populated with references to inlined functions.
      Then it is used for dumping the sources in buildssa.
      
      The sources column contains code in the following order:
      target function, then inlined functions sorted by filename and lineno.
      
      Fixes #25904
      
      Change-Id: I4f6d4834376f1efdfda1f968a5335c0543ed36bc
      Reviewed-on: https://go-review.googlesource.com/126606
      Run-TryBot: Yury Smolsky <yury@smolsky.by>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
      9e2a04d5
    • Yury Smolsky's avatar
      cmd/compile: clean the output of GOSSAFUNC · c35069d6
      Yury Smolsky authored
      Since we print almost everything to ssa.html in the GOSSAFUNC mode,
      we should stop spamming stdout when the user just wants to see
      ssa.html.
      
      This change cleans up the output of the GOSSAFUNC debug mode.
      To also dump the debug data to stdout, append the suffix +
      to the function name, like this:
      
      GOSSAFUNC=Foo+
      
      Otherwise gc will not print the IR and ASM to stdout after each phase.
      AST IR is still sent to stdout because it is not included
      in ssa.html. It will be fixed in a separate change.
      
      The change also prints the full path to the ssa.html file.
      
      Updates #25942
      
      Change-Id: I711e145e05f0443c7df5459ca528dced273a62ee
      Reviewed-on: https://go-review.googlesource.com/126603
      Run-TryBot: Yury Smolsky <yury@smolsky.by>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
      c35069d6
  23. 22 Aug, 2018 2 commits
  24. 21 Aug, 2018 2 commits
    • Daniel Martí's avatar
      cmd/compile: fix racy setting of gc's Config.Race · c1807989
      Daniel Martí authored
      ssaConfig.Race was being set by many goroutines concurrently, resulting
      in a data race seen below. This was very likely introduced by CL 121235.
      
      	WARNING: DATA RACE
      	Write at 0x00c000344408 by goroutine 12:
      	  cmd/compile/internal/gc.buildssa()
      	      /workdir/go/src/cmd/compile/internal/gc/ssa.go:134 +0x7a8
      	  cmd/compile/internal/gc.compileSSA()
      	      /workdir/go/src/cmd/compile/internal/gc/pgen.go:259 +0x5d
      	  cmd/compile/internal/gc.compileFunctions.func2()
      	      /workdir/go/src/cmd/compile/internal/gc/pgen.go:323 +0x5a
      
      	Previous write at 0x00c000344408 by goroutine 11:
      	  cmd/compile/internal/gc.buildssa()
      	      /workdir/go/src/cmd/compile/internal/gc/ssa.go:134 +0x7a8
      	  cmd/compile/internal/gc.compileSSA()
      	      /workdir/go/src/cmd/compile/internal/gc/pgen.go:259 +0x5d
      	  cmd/compile/internal/gc.compileFunctions.func2()
      	      /workdir/go/src/cmd/compile/internal/gc/pgen.go:323 +0x5a
      
      	Goroutine 12 (running) created at:
      	  cmd/compile/internal/gc.compileFunctions()
      	      /workdir/go/src/cmd/compile/internal/gc/pgen.go:321 +0x39b
      	  cmd/compile/internal/gc.Main()
      	      /workdir/go/src/cmd/compile/internal/gc/main.go:651 +0x437d
      	  main.main()
      	      /workdir/go/src/cmd/compile/main.go:51 +0x100
      
      	Goroutine 11 (running) created at:
      	  cmd/compile/internal/gc.compileFunctions()
      	      /workdir/go/src/cmd/compile/internal/gc/pgen.go:321 +0x39b
      	  cmd/compile/internal/gc.Main()
      	      /workdir/go/src/cmd/compile/internal/gc/main.go:651 +0x437d
      	  main.main()
      	      /workdir/go/src/cmd/compile/main.go:51 +0x100
      
      Instead, set up the field exactly once as part of initssaconfig.
      
      Change-Id: I2c30c6b1cf92b8fd98e7cb5c2e10c526467d0b0a
      Reviewed-on: https://go-review.googlesource.com/130375
      Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Dave Cheney <dave@cheney.net>
      c1807989
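
The general shape of this fix, writing shared configuration once before any worker goroutine starts, can be sketched as follows. This is not the compiler's code: the config type, initConfig, and compileAll are hypothetical stand-ins for ssaConfig, initssaconfig, and the concurrent function compilation.

```go
package main

import (
	"fmt"
	"sync"
)

// config stands in for a shared configuration struct (hypothetical).
type config struct {
	race bool
}

var shared config

// initConfig sets the field exactly once, before any worker is started.
func initConfig(race bool) {
	shared.race = race
}

// compileAll starts n workers that only read the shared config. Each worker
// writes to its own slice element, so no two goroutines touch the same
// memory location. It returns how many workers saw race == true.
func compileAll(n int) int {
	var wg sync.WaitGroup
	seen := make([]bool, n)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			seen[i] = shared.race // read only; the write happened before the go statement
		}(i)
	}
	wg.Wait()

	count := 0
	for _, s := range seen {
		if s {
			count++
		}
	}
	return count
}

func main() {
	initConfig(true)
	fmt.Println(compileAll(4)) // prints 4
}
```

Had each worker assigned shared.race itself, even with the same value every time, the race detector would report concurrent writes like the ones in the trace above.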
    • Tobias Klauser's avatar
      cmd/compile/internal/gc: unexport Deferproc and Newproc · e7f59f02
      Tobias Klauser authored
      They are no longer used outside the package since CL 38080.
      
      Passes toolstash-check -all
      
      Change-Id: I30977ed2b233b7c8c53632cc420938bc3b0e37c6
      Reviewed-on: https://go-review.googlesource.com/129781
      Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      e7f59f02
  25. 20 Aug, 2018 1 commit
    • Ilya Tocar's avatar
      cmd/compile: omit racefuncentry/exit when they are not needed · 4201c207
      Ilya Tocar authored
      When compiling with -race, we insert calls to racefuncentry
      into every function. Add a rule that removes them in leaf functions
      without instrumented loads/stores.
      This shaves ~30kb from the "-race" version of the go tool:
      
      file difference:
      go_old 15626192
      go_new 15597520 [-28672 bytes]
      
      section differences:
      global text (code) = -24513 bytes (-0.358598%)
      read-only data = -5849 bytes (-0.167064%)
      Total difference -30362 bytes (-0.097928%)
      
      Fixes #24662
      
      Change-Id: Ia63bf1827f4cf2c25e3e28dcd097c150994ade0a
      Reviewed-on: https://go-review.googlesource.com/121235
      Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
      4201c207
  26. 14 Aug, 2018 1 commit
  27. 12 Jul, 2018 1 commit
    • David Chase's avatar
      cmd/compile: add LocalAddr that takes SP,mem operands · 0029cd47
      David Chase authored
      Lack of a well-defined order between VarDef and related
      address operations sometimes causes problems with store order
      and write barrier transformations; glitches in the order are
      made irreparable (by later optimizations) if the two parts of
      the glitch straddle a split in the original block caused by
      insertion of a write barrier diamond.
      
      Fix this by creating a LocalAddr for addresses of locals
      (what VarDef matters for) that takes a memory input to
      help make the order explicit.  Addr is modified to only
      be legal for SB operand, so there is no overlap between
      Addr and LocalAddr uses (there may be some downstream
      cleanup from this).
      
      Changes to generic.rules and rewrite.go ensure that codegen
      tests continue to pass; CSE of LocalAddr is impaired, not
      quite sure of the cost.
      
      Fixes #26105.
      
      Change-Id: Id4192b4440aa4e9d7ba54a465c456df9b530b515
      Reviewed-on: https://go-review.googlesource.com/122483
      Run-TryBot: David Chase <drchase@google.com>
      Reviewed-by: Keith Randall <khr@golang.org>
      0029cd47