- 12 Nov, 2018 3 commits
-
-
Austin Clements authored
SSA lowering can create PFUNC ONAME nodes when compiling method calls. Since we generally initialize the node's Sym to a func when we set its class to PFUNC, we did this here, too. Unfortunately, since SSA compilation is concurrent, this can cause a race if two function compilations try to initialize the same symbol. Luckily, we don't need to do this at all, since we're actually just wrapping an ONAME node around an existing Sym that's already marked as a function symbol.

Fixes the linux-amd64-racecompile builder, which was broken by CL 147158.

Updates #27539.

Change-Id: I8ddfce6e66a08ce53998c5bfa6f5a423c1ffc1eb
Reviewed-on: https://go-review.googlesource.com/c/149158
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
-
Austin Clements authored
This implements compiler and linker support for separating the function calling ABI into two ABIs: a stable and an internal ABI. At the moment, the two ABIs are identical, but we'll be able to evolve the internal ABI without breaking existing assembly code that depends on the stable ABI for calling to and from Go.

The Go compiler generates internal ABI symbols for all Go functions. It uses the symabis information produced by the assembler to create ABI wrappers whenever it encounters a body-less Go function that's defined in assembly or a Go function that's referenced from assembly.

Since the two ABIs are currently identical, for the moment this is implemented using "ABI alias" symbols, which are just forwarding references to the native ABI symbol for a function. This way there's no actual code involved in the ABI wrapper, which is good because we're not deriving any benefit from it right now. Once the ABIs diverge, we can eliminate ABI aliases.

The linker represents these different ABIs internally as different versions of the same symbol. This way, the linker keeps us honest, since every symbol definition and reference also specifies its version. The linker is responsible for resolving ABI aliases.

Fixes #27539.

Change-Id: I197c52ec9f8fc435db8f7a4259029b20f6d65e95
Reviewed-on: https://go-review.googlesource.com/c/147160
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
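For illustration, this is the kind of declaration that triggers wrapper generation: a body-less Go function whose implementation lives in assembly. A minimal sketch; the file and function names are hypothetical, not from the commit.

```go
package sum

// Add is implemented in add_amd64.s (hypothetical file). Because this
// Go declaration has no body, the compiler consults the assembler's
// symabis output and, under the new scheme, emits an ABI wrapper
// (currently just an ABI alias) so that internal-ABI callers can reach
// the stable-ABI assembly implementation.
func Add(x, y int64) int64
```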
-
Austin Clements authored
In order to mark the obj.LSyms produced by the compiler with the correct ABI, we need to know which types.Syms refer to function symbols. This CL adds a flag to types.Syms to mark symbols for functions, and sets this flag everywhere we create a PFUNC-class node, and in the one place where we directly create function symbols without always wrapping them in a PFUNC node (methodSym). We'll use this information to construct obj.LSyms with correct ABI information.

For #27539.

Change-Id: Ie3ac8bf3da013e449e78f6ca85546a055f275463
Reviewed-on: https://go-review.googlesource.com/c/147158
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
-
- 03 Nov, 2018 1 commit
-
-
Austin Clements authored
Currently, liveness produces a distinct obj.LSym for each GC bitmap for each function. These are then named by content hash and only ultimately deduplicated by WriteObjFile. For various reasons (see next commit), we want to remove this deduplication behavior from WriteObjFile. Furthermore, it's inefficient to produce these duplicate symbols in the first place. GC bitmaps are the only source of duplicate symbols in the compiler.

This commit eliminates these duplicate symbols by declaring them in the Ctxt symbol hash just like every other obj.LSym. As a result, all GC bitmaps with the same content now refer to the same obj.LSym. The next commit will remove deduplication from WriteObjFile.

For #27539.

Change-Id: I4f15e3d99530122cdf473b7a838c69ef5f79db59
Reviewed-on: https://go-review.googlesource.com/c/146557
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
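A loose sketch of the content-addressed sharing this describes: keying a symbol table on a hash of the bitmap content means identical bitmaps resolve to one symbol up front. The type and names below are illustrative, not the compiler's actual API.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// sym stands in for an obj.LSym.
type sym struct {
	name string
	data []byte
}

// lookup returns one shared *sym per distinct bitmap content by keying
// a table on a content hash -- loosely the effect of declaring GC
// bitmaps in the Ctxt symbol hash instead of creating fresh symbols.
func lookup(table map[string]*sym, bitmap []byte) *sym {
	name := fmt.Sprintf("gcbits.%x", sha256.Sum256(bitmap))
	if s, ok := table[name]; ok {
		return s // same content, same symbol: nothing left to dedup later
	}
	s := &sym{name: name, data: bitmap}
	table[name] = s
	return s
}

func main() {
	table := map[string]*sym{}
	a := lookup(table, []byte{0b1010})
	b := lookup(table, []byte{0b1010})
	fmt.Println(a == b) // true
}
```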
-
- 29 Oct, 2018 1 commit
-
-
Martin Möhrmann authored
Only return a pointer p to the new slice's backing array from makeslice. Makeslice callers then construct sliceheader{p, len, cap} explicitly instead of makeslice returning the slice.

Reduces go binary size by ~0.2%. Removes 92 (~3.5%) panicindex calls from the go binary.

Change-Id: I29b7c3b5fe8b9dcec96e2c43730575071cfe8a94
Reviewed-on: https://go-review.googlesource.com/c/141822
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
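A rough picture of the new lowering, as ordinary (unsafe) Go: the runtime call yields only the backing-array pointer, and the slice header {p, len, cap} is assembled at the call site. The sliceHeader type here is illustrative; it mirrors the runtime's slice layout but is not the runtime's own type.

```go
package main

import (
	"fmt"
	"unsafe"
)

// sliceHeader mirrors the runtime's slice layout (illustrative).
type sliceHeader struct {
	p   unsafe.Pointer
	len int
	cap int
}

func main() {
	// Conceptually what the compiler now emits for make([]int, 4, 8):
	// obtain a backing-array pointer (here a plain allocation stands in
	// for makeslice's result), then build the header inline.
	backing := new([8]int)
	hdr := sliceHeader{unsafe.Pointer(backing), 4, 8}
	s := *(*[]int)(unsafe.Pointer(&hdr))
	fmt.Println(len(s), cap(s)) // 4 8
}
```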
-
- 27 Oct, 2018 2 commits
-
-
Matthew Dempsky authored
Avoids allocating an ONAME for OLABEL, OGOTO, and named OBREAK and OCONTINUE nodes.

Passes toolstash-check.

Change-Id: I359142cd48e8987b5bf29ac100752f8c497261c1
Reviewed-on: https://go-review.googlesource.com/c/145200
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
-
Matthew Dempsky authored
This is supposed to print out function stack frames, but it's been broken since golang.org/cl/38593, and no one has noticed.

Change-Id: Iad428a9097d452b878b1f8c5df22afd6f671ac2e
Reviewed-on: https://go-review.googlesource.com/c/145199
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
-
- 25 Oct, 2018 3 commits
-
-
Keith Randall authored
The alias declaration needs to come after the function it is aliasing. It isn't a big deal in this case, as bits.Mul inlines and has as its body bits.Mul64, so the desired code gets generated regardless. The alias should only have an effect on inlining cost estimates (for functions that call bits.Mul).

Change-Id: I0d814899ce7049a0fb36e8ce1ad5ababbaf6265f
Reviewed-on: https://go-review.googlesource.com/c/144597
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
-
Keith Randall authored
name             old time/op  new time/op  delta
Sub-8            1.12ns ± 1%  1.17ns ± 1%  +5.20%   (p=0.008 n=5+5)
Sub32-8          1.11ns ± 0%  1.11ns ± 0%  ~        (all samples are equal)
Sub64-8          1.12ns ± 0%  1.18ns ± 1%  +5.00%   (p=0.016 n=4+5)
Sub64multiple-8  4.10ns ± 1%  0.86ns ± 1%  -78.93%  (p=0.008 n=5+5)

Fixes #28273

Change-Id: Ibcb6f2fd32d987c3bcbae4f4cd9d335a3de98548
Reviewed-on: https://go-review.googlesource.com/c/144258
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
-
Keith Randall authored
name             old time/op  new time/op  delta
Add-8            1.11ns ± 0%  1.18ns ± 0%  +6.31%   (p=0.029 n=4+4)
Add32-8          1.02ns ± 0%  1.02ns ± 1%  ~        (p=0.333 n=4+5)
Add64-8          1.11ns ± 1%  1.17ns ± 0%  +5.79%   (p=0.008 n=5+5)
Add64multiple-8  4.35ns ± 1%  0.86ns ± 0%  -80.22%  (p=0.000 n=5+4)

The individual ops are a bit slower (but still very fast). Using the ops in carry chains is very fast.

Update #28273

Change-Id: Id975f76df2b930abf0e412911d327b6c5b1befe5
Reviewed-on: https://go-review.googlesource.com/c/144257
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
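A minimal example of the carry-chain shape the "Add64multiple" benchmark measures: a 128-bit add built from chained bits.Add64 calls, where the intrinsic keeps the carry in the flags register across the chain.

```go
package main

import (
	"fmt"
	"math/bits"
)

// add128 adds two 128-bit values held as (hi, lo) pairs. The carry out
// of the low word feeds directly into the high-word add.
func add128(xhi, xlo, yhi, ylo uint64) (hi, lo uint64) {
	lo, carry := bits.Add64(xlo, ylo, 0)
	hi, _ = bits.Add64(xhi, yhi, carry)
	return
}

func main() {
	hi, lo := add128(0, ^uint64(0), 0, 1) // (2^64 - 1) + 1
	fmt.Println(hi, lo)                   // 1 0
}
```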
-
- 23 Oct, 2018 1 commit
-
-
Carlos Eduardo Seo authored
This change creates the infrastructure for new lightweight atomics primitives in runtime/internal/atomic:

- LoadAcq, for load-acquire
- StoreRel, for store-release
- CasRel, for Compare-and-Swap-release

and implements them for ppc64x. There is visible performance improvement in producer-consumer scenarios, like BenchmarkChanProdCons*:

benchmark                        old ns/op  new ns/op  delta
BenchmarkChanProdCons0-48        2034       2034       +0.00%
BenchmarkChanProdCons10-48       1798       1608       -10.57%
BenchmarkChanProdCons100-48      1596       1585       -0.69%
BenchmarkChanProdConsWork0-48    2084       2046       -1.82%
BenchmarkChanProdConsWork10-48   1829       1668       -8.80%
BenchmarkChanProdConsWork100-48  1650       1650       +0.00%

Fixes #21348

Change-Id: I1f6ce377e4a0fe4bd7f5f775e8036f50070ad8db
Reviewed-on: https://go-review.googlesource.com/c/142277
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
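A sketch of the publish/consume pattern these primitives target. The runtime-internal package is not importable, so this uses sync/atomic as a stand-in; its operations are sequentially consistent, which is strictly stronger than acquire/release, so the ordering guarantee shown still holds.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

var (
	data int32
	flag atomic.Int32
)

// producer writes data, then publishes it by setting the flag. In the
// runtime this final store would be a StoreRel: everything written
// before it is guaranteed visible to whoever observes the flag.
func producer() {
	data = 42
	flag.Store(1)
}

// consumer spins on the flag (a LoadAcq in the runtime) and may then
// read data without further synchronization.
func consumer() int32 {
	for flag.Load() == 0 {
	}
	return data
}

func main() {
	go producer()
	fmt.Println(consumer()) // 42
}
```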
-
- 22 Oct, 2018 1 commit
-
-
Carlos Eduardo Seo authored
This change implements mulWW as an intrinsic for ppc64x. Performance numbers below:

name                old time/op  new time/op  delta
QuoRem              4.54µs ±45%  3.22µs ± 0%  -29.22%  (p=0.029 n=4+4)
ModSqrt225_Tonelli  765µs ± 3%   757µs ± 0%   -1.02%   (p=0.029 n=4+4)
ModSqrt225_3Mod4    231µs ± 0%   231µs ± 0%   -0.10%   (p=0.029 n=4+4)
ModSqrt231_Tonelli  789µs ± 0%   788µs ± 0%   -0.14%   (p=0.029 n=4+4)
ModSqrt231_5Mod8    267µs ± 0%   267µs ± 0%   -0.13%   (p=0.029 n=4+4)
Sqrt                49.5µs ±17%  45.3µs ± 0%  -8.48%   (p=0.029 n=4+4)
IntSqr/1            32.2ns ±22%  24.2ns ± 0%  -24.79%  (p=0.029 n=4+4)
IntSqr/2            60.6ns ± 0%  60.9ns ± 0%  +0.50%   (p=0.029 n=4+4)
IntSqr/3            82.8ns ± 0%  83.3ns ± 0%  +0.51%   (p=0.029 n=4+4)
IntSqr/5            122ns ± 0%   121ns ± 0%   -1.22%   (p=0.029 n=4+4)
IntSqr/8            227ns ± 0%   226ns ± 0%   -0.44%   (p=0.029 n=4+4)
IntSqr/10           300ns ± 0%   298ns ± 0%   -0.67%   (p=0.029 n=4+4)
IntSqr/20           1.02µs ± 0%  0.89µs ± 0%  -13.08%  (p=0.029 n=4+4)
IntSqr/30           1.73µs ± 0%  1.51µs ± 0%  -12.73%  (p=0.029 n=4+4)
IntSqr/50           3.69µs ± 1%  3.29µs ± 0%  -10.70%  (p=0.029 n=4+4)
IntSqr/80           7.64µs ± 0%  7.04µs ± 0%  -7.91%   (p=0.029 n=4+4)
IntSqr/100          11.1µs ± 0%  10.3µs ± 0%  -7.04%   (p=0.029 n=4+4)
IntSqr/200          37.9µs ± 0%  36.4µs ± 0%  -4.13%   (p=0.029 n=4+4)
IntSqr/300          69.4µs ± 0%  66.0µs ± 0%  -4.94%   (p=0.029 n=4+4)
IntSqr/500          174µs ± 0%   168µs ± 0%   -3.10%   (p=0.029 n=4+4)
IntSqr/800          347µs ± 0%   333µs ± 0%   -4.06%   (p=0.029 n=4+4)
IntSqr/1000         524µs ± 0%   507µs ± 0%   -3.21%   (p=0.029 n=4+4)

Change-Id: If067452f5b6579ad3a2e9daa76a7ffe6fceae1bb
Reviewed-on: https://go-review.googlesource.com/c/143217
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
-
- 19 Oct, 2018 1 commit
-
-
Josh Bleecher Snyder authored
The goal of this change is to move work from walk to SSA, and simplify things along the way. This is hard to accomplish cleanly with small incremental changes, so this large commit message aims to provide a roadmap to the diff.

High level description:

Prior to this change, walk was responsible for constructing (most of) the stack for function calls. ascompatte gathered variadic arguments into a slice. It also rewrote n.List from a list of arguments to a list of assignments to stack slots. ascompatte was called multiple times to handle the receiver in a method call. reorder1 then introduced temporaries into n.List as needed to avoid smashing the stack. adjustargs then made extra stack space for go/defer args as needed.

Node to SSA construction evaluated all the statements in n.List, and issued the function call, assuming that the stack was correctly constructed. Intrinsic calls had to dig around inside n.List to extract the arguments, since intrinsics don't use the stack to make function calls.

This change moves stack construction to the SSA construction phase. ascompatte, now called walkParams, does all the work that ascompatte and reorder1 did. It handles variadic arguments, inserts the method receiver if needed, and allocates temporaries. It does not, however, make any assignments to stack slots. Instead, it moves the function arguments to n.Rlist, leaving assignments to temporaries in n.List. (It would be better to use Ninit instead of List; future work.)

During SSA construction, after doing all the temporary assignments in n.List, the function arguments are assigned to stack slots by constructing the appropriate SSA Value, using (*state).storeArg. SSA construction also now handles adjustments for go/defer args. This change also simplifies intrinsic calls, since we no longer need to undo walk's work. Along the way, we simplify nodarg by pushing the fp==1 case to its callers, where it fits nicely.

Generated code differences:

There were a few optimizations applied along the way, the old way. f(g()) was rewritten to do a block copy of function results to function arguments. And reorder1 avoided introducing the final "save the stack" temporary in n.List. (A minimal example of the f(g()) call shape appears after this message.)

The f(g()) block copy optimization never actually triggered; the order pass rewrote away g(), so that has been removed.

SSA optimizations mostly obviated the need for reorder1's optimization of avoiding the final temporary. The exception was when the temporary's type was not SSA-able; in that case, we got a Move into an autotmp and then an immediate Move onto the stack, with the autotmp never read or used again. This change introduces a new rewrite rule to detect such pointless double Moves and collapse them into a single Move. This is actually more powerful than the original optimization, since the original optimization relied on the imprecise Node.HasCall calculation.

The other significant difference in the generated code is that the stack is now constructed completely in SP-offset order. Prior to this change, the stack was constructed somewhat haphazardly: first the final argument that Node.HasCall deemed to require a temporary, then other arguments, then the method receiver, then the defer/go args. SP-offset is probably a good default order. See future work.

There are a few minor object file size changes as a result of this change. I investigated some regressions in early versions of this change.

One regression (in archive/tar) was the addition of a single CMPQ instruction, which would be eliminated were this TODO from flagalloc to be done:

	// TODO: Remove original instructions if they are never used.

One regression (in text/template) was an ADDQconstmodify that is now a regular MOVQLoad+ADDQconst+MOVQStore, due to an unlucky change in the order in which arguments are written. The argument change order can also now be luckier, so this appears to be a wash.

All in all, though there will be minor winners and losers, this change appears to be performance neutral.

Future work:

Move loading the result of function calls to SSA construction; eliminate OINDREGSP.

Consider pushing stack construction deeper into SSA world, perhaps in an arch-specific pass. Among other benefits, this would make it easier to transition to a new calling convention. This would require rethinking the handling of stack conflicts and is non-trivial.

Figure out some clean way to indicate that stack construction Stores/Moves do not alias each other, so that subsequent passes may do things like CSE+tighten shared stack setup, do DSE using non-first Stores, etc. This would allow us to eliminate the minor text/template regression.

Possibly make assignments to stack slots not treated as statements by DWARF.

Compiler benchmarks:

name        old time/op  new time/op  delta
Template    182ms ± 2%   179ms ± 2%   -1.69%  (p=0.000 n=47+48)
Unicode     86.3ms ± 5%  85.1ms ± 4%  -1.36%  (p=0.001 n=50+50)
GoTypes     646ms ± 1%   642ms ± 1%   -0.63%  (p=0.000 n=49+48)
Compiler    2.89s ± 1%   2.86s ± 2%   -1.36%  (p=0.000 n=48+50)
SSA         8.47s ± 1%   8.37s ± 2%   -1.22%  (p=0.000 n=47+50)
Flate       122ms ± 2%   121ms ± 2%   -0.66%  (p=0.000 n=47+45)
GoParser    147ms ± 2%   146ms ± 2%   -0.53%  (p=0.006 n=46+49)
Reflect     406ms ± 2%   403ms ± 2%   -0.76%  (p=0.000 n=48+43)
Tar         162ms ± 3%   162ms ± 4%   ~       (p=0.191 n=46+50)
XML         223ms ± 2%   222ms ± 2%   -0.37%  (p=0.031 n=45+49)
[Geo mean]  382ms        378ms        -0.89%

name        old user-time/op  new user-time/op  delta
Template    219ms ± 3%        216ms ± 3%        -1.56%  (p=0.000 n=50+48)
Unicode     109ms ± 6%        109ms ± 5%        ~       (p=0.190 n=50+49)
GoTypes     836ms ± 2%        828ms ± 2%        -0.96%  (p=0.000 n=49+48)
Compiler    3.87s ± 2%        3.80s ± 1%        -1.81%  (p=0.000 n=49+46)
SSA         12.0s ± 1%        11.8s ± 1%        -2.01%  (p=0.000 n=48+50)
Flate       142ms ± 3%        141ms ± 3%        -0.85%  (p=0.003 n=50+48)
GoParser    178ms ± 4%        175ms ± 4%        -1.66%  (p=0.000 n=48+46)
Reflect     520ms ± 2%        512ms ± 2%        -1.44%  (p=0.000 n=45+48)
Tar         200ms ± 3%        198ms ± 4%        -0.61%  (p=0.037 n=47+50)
XML         277ms ± 3%        275ms ± 3%        -0.85%  (p=0.000 n=49+48)
[Geo mean]  482ms             476ms             -1.23%

name        old alloc/op  new alloc/op  delta
Template    36.1MB ± 0%   35.3MB ± 0%   -2.18%  (p=0.008 n=5+5)
Unicode     29.8MB ± 0%   29.3MB ± 0%   -1.58%  (p=0.008 n=5+5)
GoTypes     125MB ± 0%    123MB ± 0%    -2.13%  (p=0.008 n=5+5)
Compiler    531MB ± 0%    513MB ± 0%    -3.40%  (p=0.008 n=5+5)
SSA         2.00GB ± 0%   1.93GB ± 0%   -3.34%  (p=0.008 n=5+5)
Flate       24.5MB ± 0%   24.3MB ± 0%   -1.18%  (p=0.008 n=5+5)
GoParser    29.4MB ± 0%   28.7MB ± 0%   -2.34%  (p=0.008 n=5+5)
Reflect     87.1MB ± 0%   86.0MB ± 0%   -1.33%  (p=0.008 n=5+5)
Tar         35.3MB ± 0%   34.8MB ± 0%   -1.44%  (p=0.008 n=5+5)
XML         47.9MB ± 0%   47.1MB ± 0%   -1.86%  (p=0.008 n=5+5)
[Geo mean]  82.8MB        81.1MB        -2.08%

name        old allocs/op  new allocs/op  delta
Template    352k ± 0%      347k ± 0%      -1.32%  (p=0.008 n=5+5)
Unicode     342k ± 0%      339k ± 0%      -0.66%  (p=0.008 n=5+5)
GoTypes     1.29M ± 0%     1.27M ± 0%     -1.30%  (p=0.008 n=5+5)
Compiler    4.98M ± 0%     4.87M ± 0%     -2.14%  (p=0.008 n=5+5)
SSA         15.7M ± 0%     15.2M ± 0%     -2.86%  (p=0.008 n=5+5)
Flate       233k ± 0%      231k ± 0%      -0.83%  (p=0.008 n=5+5)
GoParser    296k ± 0%      291k ± 0%      -1.54%  (p=0.016 n=5+4)
Reflect     1.05M ± 0%     1.04M ± 0%     -0.65%  (p=0.008 n=5+5)
Tar         343k ± 0%      339k ± 0%      -0.97%  (p=0.008 n=5+5)
XML         432k ± 0%      426k ± 0%      -1.19%  (p=0.008 n=5+5)
[Geo mean]  815k           804k           -1.35%

name        old object-bytes  new object-bytes  delta
Template    505kB ± 0%        505kB ± 0%        -0.01%  (p=0.008 n=5+5)
Unicode     224kB ± 0%        224kB ± 0%        ~       (all equal)
GoTypes     1.82MB ± 0%       1.83MB ± 0%       +0.06%  (p=0.008 n=5+5)
Flate       324kB ± 0%        324kB ± 0%        +0.00%  (p=0.008 n=5+5)
GoParser    402kB ± 0%        402kB ± 0%        +0.04%  (p=0.008 n=5+5)
Reflect     1.39MB ± 0%       1.39MB ± 0%       -0.01%  (p=0.008 n=5+5)
Tar         449kB ± 0%        449kB ± 0%        -0.02%  (p=0.008 n=5+5)
XML         598kB ± 0%        597kB ± 0%        -0.05%  (p=0.008 n=5+5)

Change-Id: Ifc9d5c1bd01f90171414b8fb18ffe2290d271143
Reviewed-on: https://go-review.googlesource.com/c/114797
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
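The f(g()) call shape referenced above, for concreteness: g's multiple results feed f's parameters directly. Per the message, walk's block-copy optimization for this form never actually fired because the order pass rewrote g() into temporaries first.

```go
package main

import "fmt"

func g() (int, string) { return 1, "x" }

func f(n int, s string) { fmt.Println(n, s) }

func main() {
	f(g()) // g's results become f's arguments
}
```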
-
- 18 Oct, 2018 1 commit
-
-
David Chase authored
This change attaches slots to the OpArg values for incoming params, and this in turn causes location lists to be generated for params, which yields better debugging, in delve and sometimes in gdb. The parameter lifetimes could start earlier; they are in fact defined on entry, not at the point where the OpArg is finally mentioned. (That will be addressed in another CL.)

Change-Id: Icca891e118291d260c35a14acd5bc92bb82d9e9f
Reviewed-on: https://go-review.googlesource.com/c/141697
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
-
- 15 Oct, 2018 2 commits
-
-
Martin Möhrmann authored
Add generic, 386 and amd64 specific ops and SSA rules for multiplication with overflow and branching based on overflow flags. Use these to intrinsify runtime/internal/math.MulUintptr.

On amd64,

	mul, overflow := math.MulUintptr(a, b)
	if overflow {

is lowered to two instructions:

	MULQ SI
	JO   0x10ee35c

No codegen tests as codegen can not currently test unexported internal runtime functions.

amd64:
name              old time/op  new time/op  delta
MulUintptr/small  1.16ns ± 5%  0.88ns ± 6%  -24.36%  (p=0.000 n=19+20)
MulUintptr/large  10.7ns ± 1%  1.1ns ± 1%   -89.28%  (p=0.000 n=17+19)

Change-Id: If60739a86f820e5044d677276c21df90d3c7a86a
Reviewed-on: https://go-review.googlesource.com/c/141820
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
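A portable sketch of what the intrinsified function computes; runtime/internal/math is not importable, so this version (built on math/bits) is illustrative rather than the runtime's actual implementation.

```go
package main

import (
	"fmt"
	"math/bits"
)

// mulUintptr multiplies and reports overflow. The intrinsic reduces
// this to a single widening multiply plus a jump on the overflow flag
// (MULQ + JO on amd64).
func mulUintptr(a, b uintptr) (uintptr, bool) {
	hi, lo := bits.Mul(uint(a), uint(b))
	return uintptr(lo), hi != 0
}

func main() {
	p, overflow := mulUintptr(1<<40, 1<<40) // 2^80 overflows 64 bits
	fmt.Println(p, overflow)                // 0 true on 64-bit platforms
}
```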
-
Martin Möhrmann authored
The generated code for the append builtin already checks if the appended-to slice is large enough and calls growslice if that is not the case. Trust that this ensures the slice is large enough and avoid the implicit bounds check when slicing the slice to its new size.

Removes 365 panicslice calls (-14%) from the go binary, which reduces the binary size by ~12 kB.

Change-Id: I1b88418675ff409bc0b956853c9e95241274d5a6
Reviewed-on: https://go-review.googlesource.com/c/119315
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
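Spelled out in plain Go, this is roughly the shape append expands to; after the grow branch, cap(s) >= n always holds, which is exactly the fact the compiler now trusts to drop the bounds check on s[:n]. The helper below is a stand-in for the real lowering, with make+copy standing in for runtime.growslice.

```go
package main

import "fmt"

func appendOne(s []int, x int) []int {
	n := len(s) + 1
	if n > cap(s) {
		grown := make([]int, len(s), 2*n) // stand-in for runtime.growslice
		copy(grown, s)
		s = grown
	}
	s = s[:n] // cap(s) >= n is guaranteed: bounds check provably redundant
	s[n-1] = x
	return s
}

func main() {
	fmt.Println(appendOne([]int{1, 2}, 3)) // [1 2 3]
}
```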
-
- 11 Oct, 2018 2 commits
-
-
David Chase authored
This restores the printing of vXX and bYY in the left-hand edge of the last column of ssa.html, where the generated progs appear.

Change-Id: I81ab9b2fa5ae28e6e5de1b77665cfbed8d14e000
Reviewed-on: https://go-review.googlesource.com/c/141277
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Yury Smolsky <yury@smolsky.by>
-
Carlos Eduardo Seo authored
This change implements TrailingZeros16, OnesCount8 and OnesCount16 as intrinsics for ppc64x.

benchmark                    old ns/op  new ns/op  delta
BenchmarkTrailingZeros16-40  2.16       1.61       -25.46%

benchmark                    old ns/op  new ns/op  delta
BenchmarkOnesCount-40        0.71       0.71       +0.00%
BenchmarkOnesCount8-40       0.93       0.69       -25.81%
BenchmarkOnesCount16-40      1.54       0.75       -51.30%
BenchmarkOnesCount32-40      0.75       0.74       -1.33%
BenchmarkOnesCount64-40      0.71       0.71       +0.00%

Change-Id: I010fa9c0ef596a09362870d81193c633e70da637
Reviewed-on: https://go-review.googlesource.com/c/139137
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
-
- 04 Oct, 2018 2 commits
-
-
Matthew Dempsky authored
Change-Id: I0490098a7235458c5aede1135426a9f19f8584a7
Reviewed-on: https://go-review.googlesource.com/c/76312
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
-
Matthew Dempsky authored
Change-Id: Ie4bab0b74d5a4e1aecd8501a48176b2e9a3d8c42
Reviewed-on: https://go-review.googlesource.com/c/76311
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Robert Griesemer <gri@golang.org>
-
- 03 Oct, 2018 2 commits
-
-
Keith Randall authored
The previous CL introduced stack objects. This CL removes the old ambiguously live liveness analysis. After this CL we're relying on stack objects exclusively.

Update a bunch of liveness tests to reflect the new world.

Fixes #22350

Change-Id: I739b26e015882231011ce6bc1a7f426049e59f31
Reviewed-on: https://go-review.googlesource.com/c/134156
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
-
Keith Randall authored
Rework how the compiler+runtime handles stack-allocated variables whose address is taken.

Direct references to such variables work as before. References through pointers, however, use a new mechanism. The new mechanism is more precise than the old "ambiguously live" mechanism. It computes liveness at runtime based on the actual references among objects on the stack.

Each function records all of its address-taken objects in a FUNCDATA. These are called "stack objects". The runtime then uses that information while scanning a stack to find all of the stack objects on a stack. It then does a mark phase on the stack objects, using all the pointers found on the stack (and ancillary structures, like defer records) as the root set. Only stack objects which are found to be live during this mark phase will be scanned and thus retain any heap objects they point to.

A subsequent CL will remove all the "ambiguously live" logic from the compiler, so that the stack object tracing will be required. For this CL, the stack tracing is all redundant with the current ambiguously live logic.

Update #22350

Change-Id: Ide19f1f71a5b6ec8c4d54f8f66f0e9a98344772f
Reviewed-on: https://go-review.googlesource.com/c/134155
Reviewed-by: Austin Clements <austin@google.com>
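A minimal example of the shape stack objects exist for: a stack-allocated, address-taken variable that is afterwards reachable only through a pointer. Under the new mechanism the runtime records x as a stack object and decides at GC time whether any live pointer on the stack actually reaches it, instead of conservatively treating it as "ambiguously live".

```go
package main

import "fmt"

func f(cond bool) int {
	x := [64]int{1: 7} // stack-allocated; its address is taken below
	p := &x            // from here on, x is reached only via p
	if cond {
		return p[1]
	}
	return 0
}

func main() {
	fmt.Println(f(true)) // 7
}
```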
-
- 02 Oct, 2018 1 commit
-
-
Carlos Eduardo Seo authored
Add SSA rules to intrinsify Mul/Mul64 on ppc64x.

benchmark          old ns/op  new ns/op  delta
BenchmarkMul-40    8.80       0.93       -89.43%
BenchmarkMul32-40  1.39       1.39       +0.00%
BenchmarkMul64-40  5.39       0.93       -82.75%

Updates #24813

Change-Id: I6e95bfbe976a2278bd17799df184a7fbc0e57829
Reviewed-on: https://go-review.googlesource.com/138917
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
-
- 26 Sep, 2018 1 commit
-
-
Brian Kessler authored
Add SSA rules to intrinsify Mul/Mul64 (AMD64 and ARM64). SSA rules for other functions and architectures are left as a future optimization. Benchmark results on AMD64/ARM64 before and after SSA implementation are below.

amd64
name     old time/op  new time/op  delta
Add-4    1.78ns ± 0%  1.85ns ±12%  ~        (p=0.397 n=4+5)
Add32-4  1.71ns ± 1%  1.70ns ± 0%  ~        (p=0.683 n=5+5)
Add64-4  1.80ns ± 2%  1.77ns ± 0%  -1.22%   (p=0.048 n=5+5)
Sub-4    1.78ns ± 0%  1.78ns ± 0%  ~        (all equal)
Sub32-4  1.78ns ± 1%  1.78ns ± 0%  ~        (p=1.000 n=5+5)
Sub64-4  1.78ns ± 1%  1.78ns ± 0%  ~        (p=0.968 n=5+4)
Mul-4    11.5ns ± 1%  1.8ns ± 2%   -84.39%  (p=0.008 n=5+5)
Mul32-4  1.39ns ± 0%  1.38ns ± 3%  ~        (p=0.175 n=5+5)
Mul64-4  6.85ns ± 1%  1.78ns ± 1%  -73.97%  (p=0.008 n=5+5)
Div-4    57.1ns ± 1%  56.7ns ± 0%  ~        (p=0.087 n=5+5)
Div32-4  18.0ns ± 0%  18.0ns ± 0%  ~        (all equal)
Div64-4  56.4ns ±10%  53.6ns ± 1%  ~        (p=0.071 n=5+5)

arm64
name      old time/op  new time/op  delta
Add-96    5.51ns ± 0%  5.51ns ± 0%  ~        (all equal)
Add32-96  5.51ns ± 0%  5.51ns ± 0%  ~        (all equal)
Add64-96  5.52ns ± 0%  5.51ns ± 0%  ~        (p=0.444 n=5+5)
Sub-96    5.51ns ± 0%  5.51ns ± 0%  ~        (all equal)
Sub32-96  5.51ns ± 0%  5.51ns ± 0%  ~        (all equal)
Sub64-96  5.51ns ± 0%  5.51ns ± 0%  ~        (all equal)
Mul-96    34.6ns ± 0%  5.0ns ± 0%   -85.52%  (p=0.008 n=5+5)
Mul32-96  4.51ns ± 0%  4.51ns ± 0%  ~        (all equal)
Mul64-96  21.1ns ± 0%  5.0ns ± 0%   -76.26%  (p=0.008 n=5+5)
Div-96    64.7ns ± 0%  64.7ns ± 0%  ~        (all equal)
Div32-96  17.0ns ± 0%  17.0ns ± 0%  ~        (all equal)
Div64-96  53.1ns ± 0%  53.1ns ± 0%  ~        (all equal)

Updates #24813

Change-Id: I9bda6d2102f65cae3d436a2087b47ed8bafeb068
Reviewed-on: https://go-review.googlesource.com/129415
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
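Usage, for reference: bits.Mul64 returns the full 64x64 -> 128-bit product, which the new rules lower to a single widening multiply (MULQ on amd64; MUL and UMULH on arm64) instead of the generic partial-products fallback.

```go
package main

import (
	"fmt"
	"math/bits"
)

func main() {
	hi, lo := bits.Mul64(1<<40, 1<<40)    // 2^80, split across two words
	fmt.Printf("hi=%#x lo=%#x\n", hi, lo) // hi=0x10000 lo=0
}
```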
-
- 13 Sep, 2018 1 commit
-
-
erifan01 authored
math.RoundToEven can be done with a single arm64 instruction, FRINTND, so intrinsify it to improve performance.

The current pure Go implementation of the function Abs is translated into five instructions on arm64: str, ldr, and, str, ldr. The intrinsic implementation requires only one instruction, so in terms of performance, intrinsifying it is worthwhile.

Benchmarks:
name           old time/op  new time/op  delta
Abs-8          3.50ns ± 0%  1.50ns ± 0%  -57.14%  (p=0.000 n=10+10)
RoundToEven-8  9.26ns ± 0%  1.50ns ± 0%  -83.80%  (p=0.000 n=10+10)

Change-Id: I9456b26ab282b544dfac0154fc86f17aed96ac3d
Reviewed-on: https://go-review.googlesource.com/116535
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
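For context, the pure-Go body of math.Abs simply clears the sign bit; the round trip through Float64bits is what costs the five arm64 instructions quoted above, and the intrinsic does the same work in one.

```go
package main

import (
	"fmt"
	"math"
)

// abs reproduces math.Abs's pure-Go implementation: clear the sign bit.
func abs(x float64) float64 {
	return math.Float64frombits(math.Float64bits(x) &^ (1 << 63))
}

func main() {
	fmt.Println(abs(-1.5), abs(2.25)) // 1.5 2.25
}
```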
-
- 07 Sep, 2018 1 commit
-
-
erifan01 authored
Add some rules to match Go code like:

	y &= 63
	x << y | x >> (64-y)

or

	y &= 63
	x >> y | x << (64-y)

as a ROR instruction. Make math/bits.RotateLeft faster on arm64. Extends CL 132435 to arm64.

Benchmarks of math/bits.RotateLeftxxN:
name            old time/op       new time/op       delta
RotateLeft-8    3.548750ns +- 1%  2.003750ns +- 0%  -43.54%  (p=0.000 n=8+8)
RotateLeft8-8   3.925000ns +- 0%  3.925000ns +- 0%  ~        (p=1.000 n=8+8)
RotateLeft16-8  3.925000ns +- 0%  3.927500ns +- 0%  ~        (p=0.608 n=8+8)
RotateLeft32-8  3.925000ns +- 0%  2.002500ns +- 0%  -48.98%  (p=0.000 n=8+8)
RotateLeft64-8  3.536250ns +- 0%  2.003750ns +- 0%  -43.34%  (p=0.000 n=8+8)

Change-Id: I77622cd7f39b917427e060647321f5513973232c
Reviewed-on: https://go-review.googlesource.com/122542
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
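Packaged as a compilable function, this is exactly the source shape the new rules match: with the y &= 63 mask present, the shifts-plus-OR can lower to a single ROR on arm64 (math/bits.RotateLeft64 compiles to the same instruction).

```go
package main

import "fmt"

// rotl64 rotates x left by y bits. The mask makes both shift counts
// provably in range, which is what licenses the single-ROR lowering.
func rotl64(x uint64, y uint) uint64 {
	y &= 63
	return x<<y | x>>(64-y)
}

func main() {
	fmt.Printf("%#x\n", rotl64(0x1, 4)) // 0x10
}
```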
-
- 05 Sep, 2018 1 commit
-
-
Michael Munday authored
Extends CL 132435 to s390x. s390x has 32- and 64-bit variable rotate left instructions.

Change-Id: Ic4f1ebb0e0543207ed2fc8c119e0163b428138a5
Reviewed-on: https://go-review.googlesource.com/133035
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
-
- 03 Sep, 2018 1 commit
-
-
Michael Munday authored
This CL implements the math/bits.OnesCount{8,16,32,64} functions as intrinsics on s390x using the 'population count' (popcnt) instruction. This instruction was released as the 'population-count' facility which uses the same facility bit (45) as the 'distinct-operands' facility which is a pre-requisite for Go on s390x. We can therefore use it without a feature check.

The s390x popcnt instruction treats a 64 bit register as a vector of 8 bytes, summing the number of ones in each byte individually. It then writes the results to the corresponding bytes in the output register. Therefore to implement OnesCount{16,32,64} we need to sum the individual byte counts using some extra instructions. To do this efficiently I've added some additional pseudo operations to the s390x SSA backend.

Unlike other architectures the new instruction sequence is faster for OnesCount8, so that is implemented using the intrinsic.

name         old time/op  new time/op  delta
OnesCount    3.21ns ± 1%  1.35ns ± 0%  -58.00%  (p=0.000 n=20+20)
OnesCount8   0.91ns ± 1%  0.81ns ± 0%  -11.43%  (p=0.000 n=20+20)
OnesCount16  1.51ns ± 3%  1.21ns ± 0%  -19.71%  (p=0.000 n=20+17)
OnesCount32  1.91ns ± 0%  1.12ns ± 1%  -41.60%  (p=0.000 n=19+20)
OnesCount64  3.18ns ± 4%  1.35ns ± 0%  -57.52%  (p=0.000 n=20+20)

Change-Id: Id54f0bd28b6db9a887ad12c0d72fcc168ef9c4e0
Reviewed-on: https://go-review.googlesource.com/114675
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
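The byte-summing step described above can be shown in portable Go: given a word whose eight bytes each hold a per-byte popcount (0..8), one multiply accumulates the total into the top byte. Illustrative only; the actual s390x lowering uses dedicated pseudo-ops rather than this multiply trick.

```go
package main

import "fmt"

// sumBytes sums the eight byte lanes of perByte. Since each lane is at
// most 8, every column sum fits in a byte and the multiply cannot carry
// between lanes.
func sumBytes(perByte uint64) uint64 {
	return (perByte * 0x0101010101010101) >> 56
}

func main() {
	// Per-byte counts 8,0,8,0,8,0,8,0 (e.g. for input 0xFF00FF00FF00FF00).
	fmt.Println(sumBytes(0x0800080008000800)) // 32
}
```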
-
- 30 Aug, 2018 1 commit
-
-
Andrew Bonventre authored
Previously, pattern matching was good enough to achieve good performance for the RotateLeft* functions, but the inlining cost for them was much too high. Make RotateLeft* intrinsic on amd64 as a stop-gap for now to reduce inlining costs. This should be done (or at least looked at) for other architectures as well.

Updates golang/go#17566

Change-Id: I6a106ff00b6c4e3f490650af3e083ed2be00c819
Reviewed-on: https://go-review.googlesource.com/132435
Run-TryBot: Andrew Bonventre <andybons@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
-
- 24 Aug, 2018 1 commit
-
-
Yury Smolsky authored
This change adds a new column, AST IR. That column contains the AST nodes for the function specified in $GOSSAFUNC. This CL also enables horizontal scrolling of the sources and AST columns.

Fixes #26662

Change-Id: I3fba39fd998bb05e9c93038e8ec2384c69613b24
Reviewed-on: https://go-review.googlesource.com/126858
Run-TryBot: Yury Smolsky <yury@smolsky.by>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
-
- 23 Aug, 2018 3 commits
-
-
Kazuhiro Sera authored
Change-Id: Iadb3c5de8ae9ea45855013997ed70f7929a88661
GitHub-Last-Rev: ae85bcf82be8fee533e2b9901c6133921382c70a
GitHub-Pull-Request: golang/go#26920
Reviewed-on: https://go-review.googlesource.com/128955
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Yury Smolsky authored
This CL adds the source code of all inlined functions into the function specified in $GOSSAFUNC. The code is appended to the sources column of ssa.html. ssaDumpInlined is populated with references to inlined functions, which is then used for dumping the sources in buildssa. The sources column contains code in the following order: the target function, then inlined functions sorted by filename and line number.

Fixes #25904

Change-Id: I4f6d4834376f1efdfda1f968a5335c0543ed36bc
Reviewed-on: https://go-review.googlesource.com/126606
Run-TryBot: Yury Smolsky <yury@smolsky.by>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
-
Yury Smolsky authored
Since we print almost everything to ssa.html in GOSSAFUNC mode, there is a need to stop spamming stdout when the user just wants to see ssa.html. This change cleans up the output of the GOSSAFUNC debug mode.

To enable the dump of the debug data to stdout, one must append a "+" suffix to the function name, like this: GOSSAFUNC=Foo+. Otherwise gc will not print the IR and ASM to stdout after each phase. AST IR is still sent to stdout because it is not included in ssa.html; it will be fixed in a separate change.

The change also adds printing out the full path to the ssa.html file.

Updates #25942

Change-Id: I711e145e05f0443c7df5459ca528dced273a62ee
Reviewed-on: https://go-review.googlesource.com/126603
Run-TryBot: Yury Smolsky <yury@smolsky.by>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
-
- 22 Aug, 2018 2 commits
-
-
Yury Smolsky authored
Store the value of GOSSAFUNC in a global variable to avoid multiple calls to os.Getenv from gc.buildssa and gc.mkinlcall1. The latter is implemented in CL 126606.

Updates #25942

Change-Id: I58caaef2fee23694d80dc5a561a2e809bf077fa4
Reviewed-on: https://go-review.googlesource.com/126604
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Yury Smolsky authored
This CL adds the "sources" column at the beginning of the SSA table. This column displays the source code for the function being passed in the GOSSAFUNC env variable. The UI was also extended so that clicking on a particular line highlights all places that line is referenced. The JS code was cleaned and formatted.

This CL does not handle inlined functions. See issue 25904.

Change-Id: Ic7833a0b05e38795f4cf090f3dc82abf62d97026
Reviewed-on: https://go-review.googlesource.com/119035
Run-TryBot: Yury Smolsky <yury@smolsky.by>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
-
- 21 Aug, 2018 2 commits
-
-
Daniel Martí authored
ssaConfig.Race was being set by many goroutines concurrently, resulting in a data race seen below. This was very likely introduced by CL 121235.

WARNING: DATA RACE
Write at 0x00c000344408 by goroutine 12:
  cmd/compile/internal/gc.buildssa()
      /workdir/go/src/cmd/compile/internal/gc/ssa.go:134 +0x7a8
  cmd/compile/internal/gc.compileSSA()
      /workdir/go/src/cmd/compile/internal/gc/pgen.go:259 +0x5d
  cmd/compile/internal/gc.compileFunctions.func2()
      /workdir/go/src/cmd/compile/internal/gc/pgen.go:323 +0x5a

Previous write at 0x00c000344408 by goroutine 11:
  cmd/compile/internal/gc.buildssa()
      /workdir/go/src/cmd/compile/internal/gc/ssa.go:134 +0x7a8
  cmd/compile/internal/gc.compileSSA()
      /workdir/go/src/cmd/compile/internal/gc/pgen.go:259 +0x5d
  cmd/compile/internal/gc.compileFunctions.func2()
      /workdir/go/src/cmd/compile/internal/gc/pgen.go:323 +0x5a

Goroutine 12 (running) created at:
  cmd/compile/internal/gc.compileFunctions()
      /workdir/go/src/cmd/compile/internal/gc/pgen.go:321 +0x39b
  cmd/compile/internal/gc.Main()
      /workdir/go/src/cmd/compile/internal/gc/main.go:651 +0x437d
  main.main()
      /workdir/go/src/cmd/compile/main.go:51 +0x100

Goroutine 11 (running) created at:
  cmd/compile/internal/gc.compileFunctions()
      /workdir/go/src/cmd/compile/internal/gc/pgen.go:321 +0x39b
  cmd/compile/internal/gc.Main()
      /workdir/go/src/cmd/compile/internal/gc/main.go:651 +0x437d
  main.main()
      /workdir/go/src/cmd/compile/main.go:51 +0x100

Instead, set up the field exactly once as part of initssaconfig.

Change-Id: I2c30c6b1cf92b8fd98e7cb5c2e10c526467d0b0a
Reviewed-on: https://go-review.googlesource.com/130375
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
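The shape of the fix, as a minimal sketch: the field is written exactly once, before any compile worker starts, so the workers' reads can no longer race with a write. All names below are illustrative stand-ins, not the compiler's actual declarations.

```go
package main

import "sync"

var flagRace = true // stand-in for the -race flag

var ssaConfig struct{ Race bool }

// initssaconfig runs once, before the concurrent backend starts.
func initssaconfig() {
	ssaConfig.Race = flagRace
}

func main() {
	initssaconfig() // the single setup write
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() { // workers only read the field
			defer wg.Done()
			_ = ssaConfig.Race
		}()
	}
	wg.Wait()
}
```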
-
Tobias Klauser authored
They are no longer used outside the package since CL 38080.

Passes toolstash-check -all

Change-Id: I30977ed2b233b7c8c53632cc420938bc3b0e37c6
Reviewed-on: https://go-review.googlesource.com/129781
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
- 20 Aug, 2018 1 commit
-
-
Ilya Tocar authored
When compiling with -race, we insert calls to racefuncentry into every function. Add a rule that removes them in leaf functions without instrumented loads/stores.

Shaves ~30kb from the "-race" version of the go tool:

file difference:
go_old  15626192
go_new  15597520 [-28672 bytes]

section differences:
global text (code) = -24513 bytes (-0.358598%)
read-only data = -5849 bytes (-0.167064%)
Total difference -30362 bytes (-0.097928%)

Fixes #24662

Change-Id: Ia63bf1827f4cf2c25e3e28dcd097c150994ade0a
Reviewed-on: https://go-review.googlesource.com/121235
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
-
- 14 Aug, 2018 1 commit
-
-
Richard Musiol authored
This commit adds an explicit nil check for closure calls on wasm, so calling a nil func causes a proper panic instead of crashing on the WebAssembly level.

Change-Id: I6246844f316677976cdd420618be5664444c25ae
Reviewed-on: https://go-review.googlesource.com/127759
Run-TryBot: Richard Musiol <neelance@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
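The behavior being guaranteed, in a few lines: calling a nil func value must panic like any nil dereference on every port, now including wasm.

```go
package main

import "fmt"

func main() {
	defer func() { fmt.Println("recovered:", recover()) }()
	var f func()
	f() // nil closure call -> runtime panic, not a WebAssembly-level trap
}
```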
-
- 12 Jul, 2018 1 commit
-
-
David Chase authored
Lack of a well-defined order between VarDef and related address operations sometimes causes problems with store order and write barrier transformations; glitches in the order are made irreparable (by later optimizations) if the two parts of the glitch straddle a split in the original block caused by insertion of a write barrier diamond.

Fix this by creating a LocalAddr for addresses of locals (what VarDef matters for) that takes a memory input to help make the order explicit. Addr is modified to only be legal for SB operand, so there is no overlap between Addr and LocalAddr uses (there may be some downstream cleanup from this).

Changes to generic.rules and rewrite.go ensure that codegen tests continue to pass; CSE of LocalAddr is impaired, not quite sure of the cost.

Fixes #26105.

Change-Id: Id4192b4440aa4e9d7ba54a465c456df9b530b515
Reviewed-on: https://go-review.googlesource.com/122483
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
-