  1. 29 May, 2014 2 commits
    • cmd/gc: fix x=x crash · 89d46fed
      Russ Cox authored
      [Same as CL 102820043 except applied changes to 6g/gsubr.c
      also to 5g/gsubr.c and 8g/gsubr.c. The problem I had last night
      trying to do that was that 8g's copy of nodarg has different
      (but equivalent) control flow and I was pasting the new code
      into the wrong place.]
      
      Description from CL 102820043:
      
      The 'nodarg' function is used to obtain a Node*
      representing a function argument or result.
      It returned a brand new Node*, but that violates
      the guarantee in most places in the compiler that
      two Node*s refer to the same variable if and only if
      they are the same Node* pointer. Reestablish that
      invariant by making nodarg return a preexisting
      named variable if present.
      
      Having fixed that, avoid any copy during x=x in
      componentgen, because the VARDEF we emit
      before the copy marks the lhs x as dead incorrectly.
      
      The change in walk.c avoids modifying the result
      of nodarg. This was the only place in the compiler
      that did so.
      
      Fixes #8097.
      
      LGTM=khr
      R=golang-codereviews, khr
      CC=golang-codereviews, iant, khr, r
      https://golang.org/cl/103750043
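      
      For illustration, the kind of source-level self-assignment the
      commit title refers to can be written as below. This is a minimal
      sketch, not the reproducer from issue 8097; the type big and the
      function selfAssign are hypothetical names chosen for the example.
      
      ```go
      package main

      import "fmt"

      // big is a hypothetical multiword struct type used only for this sketch.
      type big struct {
      	a, b, c, d int
      }

      // selfAssign assigns a parameter and a named result to themselves.
      // Both self-assignments are no-ops at the language level.
      func selfAssign(x big) (r big) {
      	x = x
      	r = x
      	r = r
      	return r
      }

      func main() {
      	fmt.Println(selfAssign(big{1, 2, 3, 4})) // prints {1 2 3 4}
      }
      ```
      
      Tools such as go vet may flag the self-assignments, but the
      program is valid Go and must return its argument unchanged.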
    • undo CL 102820043 / b0ce6dbafc18 · 9dd062b8
      Russ Cox authored
      Breaks 386 and arm builds.
      The obvious reason is that this CL only edited 6g/gsubr.c
      and failed to edit 5g/gsubr.c and 8g/gsubr.c.
      However, the obvious CL applying the same edit to those
      files (CL 101900043) causes mysterious build failures
      in various of the standard package tests, usually involving
      reflect. Something deep and subtle is broken but only on
      the 32-bit systems.
      
      Undo this CL for now.
      
      ««« original CL description
      cmd/gc: fix x=x crash
      
      The 'nodarg' function is used to obtain a Node*
      representing a function argument or result.
      It returned a brand new Node*, but that violates
      the guarantee in most places in the compiler that
      two Node*s refer to the same variable if and only if
      they are the same Node* pointer. Reestablish that
      invariant by making nodarg return a preexisting
      named variable if present.
      
      Having fixed that, avoid any copy during x=x in
      componentgen, because the VARDEF we emit
      before the copy marks the lhs x as dead incorrectly.
      
      The change in walk.c avoids modifying the result
      of nodarg. This was the only place in the compiler
      that did so.
      
      Fixes #8097.
      
      LGTM=r, khr
      R=golang-codereviews, r, khr
      CC=golang-codereviews, iant
      https://golang.org/cl/102820043
      »»»
      
      TBR=r
      CC=golang-codereviews, khr
      https://golang.org/cl/95660043
  2. 28 May, 2014 1 commit
    • cmd/gc: fix x=x crash · 948b2c72
      Russ Cox authored
      The 'nodarg' function is used to obtain a Node*
      representing a function argument or result.
      It returned a brand new Node*, but that violates
      the guarantee in most places in the compiler that
      two Node*s refer to the same variable if and only if
      they are the same Node* pointer. Reestablish that
      invariant by making nodarg return a preexisting
      named variable if present.
      
      Having fixed that, avoid any copy during x=x in
      componentgen, because the VARDEF we emit
      before the copy marks the lhs x as dead incorrectly.
      
      The change in walk.c avoids modifying the result
      of nodarg. This was the only place in the compiler
      that did so.
      
      Fixes #8097.
      
      LGTM=r, khr
      R=golang-codereviews, r, khr
      CC=golang-codereviews, iant
      https://golang.org/cl/102820043
  3. 12 May, 2014 1 commit
    • cmd/gc: alias more variables during register allocation · 03c0f3fe
      Josh Bleecher Snyder authored
      This is joint work with Daniel Morsing.
      
      In order for the register allocator to alias two variables, they must have the same width, stack offset, and etype. Code generation was altering a variable's etype in a few places. This prevented the variable from being moved to a register, which in turn prevented peephole optimization. This failure to alias was very common, with almost 23,000 instances just running make.bash.
      
      This phenomenon was not visible in the register allocation debug output because the variables that failed to alias had the same name. The debugging-only change to bits.c fixes this by printing the variable number with its name.
      
      This CL fixes the source of all etype mismatches for 6g, all but one case for 8g, and depressingly few cases for 5g. (I believe that extending CL 6819083 to 5g is a prerequisite.) Fixing the remaining cases in 8g and 5g is work for the future.
      
      The etype mismatch fixes are:
      
      * [gc] Slicing changed the type of the base pointer into a uintptr in order to perform arithmetic on it. Instead, support addition directly on pointers.
      
      * [*g] OSPTR was giving type uintptr to slice base pointers; undo that. This arose, for example, while compiling copy(dst, src).
      
      * [8g] 64 bit float conversion was assigning int64 type during codegen, overwriting the existing uint64 type.
      
      Note that some etype mismatches are appropriate, such as a struct with a single field or an array with a single element.
      
      With these fixes, the number of registerizations that occur while running make.bash for 6g increases ~10%. Hello world binary size shrinks ~1.5%. Running all benchmarks in the standard library shows performance improvements ranging from nominal to substantive (>10%); a full comparison using 6g on my laptop is available at https://gist.github.com/josharian/8f9b5beb46667c272064. The microbenchmarks must be taken with a grain of salt; see issue 7920. The few benchmarks that show real regressions are likely due to issue 7920. I manually examined the generated code for the top few regressions and none had any assembly output changes. The few benchmarks that show extraordinary improvements are likely also due to issue 7920.
      
      Performance results from 8g appear similar to 6g.
      
      5g shows no performance improvements. This is not surprising, given the discussion above.
      
      Update #7316
      
      LGTM=rsc
      R=rsc, daniel.morsing, bradfitz
      CC=dave, golang-codereviews
      https://golang.org/cl/91850043
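      
      The aliasing condition described above (same width, stack offset,
      and etype) can be modeled as a simple predicate. This is a sketch
      under assumptions: the variable struct, its field names, and
      canAlias are hypothetical and are not the compiler's actual data
      structures, which are written in C.
      
      ```go
      package main

      import "fmt"

      // variable is a hypothetical stand-in for the allocator's record of a variable.
      type variable struct {
      	name   string
      	width  int64  // size in bytes
      	offset int64  // stack offset
      	etype  string // element type tag, for example "ptr" or "uintptr"
      }

      // canAlias reports whether two records satisfy the three conditions
      // named in the commit message: equal width, stack offset, and etype.
      func canAlias(a, b variable) bool {
      	return a.width == b.width && a.offset == b.offset && a.etype == b.etype
      }

      func main() {
      	p := variable{"p", 8, 16, "ptr"}
      	q := variable{"p", 8, 16, "uintptr"} // same name and slot, different etype
      	fmt.Println(canAlias(p, p), canAlias(p, q)) // prints true false
      }
      ```
      
      The second record shows why the failures were hard to see in
      debug output: the two entries share a name and differ only in
      etype.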
  4. 04 Apr, 2014 1 commit
  5. 01 Apr, 2014 1 commit
    • runtime: get rid of most uses of REP for copying/zeroing. · 6c7cbf08
      Keith Randall authored
      REP MOVSQ and REP STOSQ have a really high startup overhead.
      Use a Duff's device to do the repetition instead.
      
      benchmark                 old ns/op     new ns/op     delta
      BenchmarkClearFat32       7.20          1.60          -77.78%
      BenchmarkCopyFat32        6.88          2.38          -65.41%
      BenchmarkClearFat64       7.15          3.20          -55.24%
      BenchmarkCopyFat64        6.88          3.44          -50.00%
      BenchmarkClearFat128      9.53          5.34          -43.97%
      BenchmarkCopyFat128       9.27          5.56          -40.02%
      BenchmarkClearFat256      13.8          9.53          -30.94%
      BenchmarkCopyFat256       13.5          10.3          -23.70%
      BenchmarkClearFat512      22.3          18.0          -19.28%
      BenchmarkCopyFat512       22.0          19.7          -10.45%
      BenchmarkCopyFat1024      36.5          38.4          +5.21%
      BenchmarkClearFat1024     35.1          35.0          -0.28%
      
      TODO: use for stack frame zeroing
      TODO: REP prefixes are still used for "reverse" copying when src/dst
      regions overlap.  Might be worth fixing.
      
      LGTM=rsc
      R=golang-codereviews, rsc
      CC=golang-codereviews, r
      https://golang.org/cl/81370046
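      
      The idea behind a Duff's device is to enter an unrolled sequence
      of stores partway through, so that exactly n stores run without
      a loop counter. Go cannot jump into the middle of a loop, so the
      sketch below approximates the technique with switch and
      fallthrough. It is an illustration only, not the runtime's
      assembly; clearWords and the fixed size of 8 words are
      assumptions made for the example.
      
      ```go
      package main

      import "fmt"

      // clearWords zeroes the first n words of buf, for 0 <= n <= 8.
      // Execution enters the unrolled sequence at case n and falls
      // through to the end, performing exactly n stores.
      func clearWords(buf *[8]uint64, n int) {
      	switch n {
      	case 8:
      		buf[7] = 0
      		fallthrough
      	case 7:
      		buf[6] = 0
      		fallthrough
      	case 6:
      		buf[5] = 0
      		fallthrough
      	case 5:
      		buf[4] = 0
      		fallthrough
      	case 4:
      		buf[3] = 0
      		fallthrough
      	case 3:
      		buf[2] = 0
      		fallthrough
      	case 2:
      		buf[1] = 0
      		fallthrough
      	case 1:
      		buf[0] = 0
      	}
      }

      func main() {
      	buf := [8]uint64{1, 2, 3, 4, 5, 6, 7, 8}
      	clearWords(&buf, 5)
      	fmt.Println(buf) // prints [0 0 0 0 0 6 7 8]
      }
      ```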
  6. 20 Mar, 2014 1 commit
    • cmd/6g, cmd/8g: skip CONVNOP nodes in bgen. · 0285d2b9
      Rémy Oudompheng authored
      Revision 3ae4607a43ff introduced CONVNOP layers
      to fix type checking issues arising from comparisons.
      The added complexity made 8g run out of registers
      when compiling an equality function in go.net/ipv6.
      
      A similar issue occurred in test/sizeof.go on
      amd64p32 with 6g.
      
      Fixes #7405.
      
      LGTM=khr
      R=rsc, dave, iant, khr
      CC=golang-codereviews
      https://golang.org/cl/78100044
  7. 26 Feb, 2014 1 commit
  8. 15 Feb, 2014 1 commit
    • cmd/gc: correct liveness for fat variables · 7a7c0ffb
      Russ Cox authored
      The VARDEF placement must be before the initialization
      but after any final use. If you have something like s = ... using s ...
      the rhs must be evaluated, then the VARDEF, then the lhs
      assigned.
      
      There is a large comment in pgen.c on gvardef explaining
      this in more detail.
      
      This CL also includes Ian's suggestions from earlier CLs,
      namely commenting the use of mode in link.h and fixing
      the precedence of the ~r check in dcl.c.
      
      This CL enables the check that if liveness analysis decides
      a variable is live on entry to the function, that variable must
      be a function parameter (not a result, and not a local variable).
      If this check fails, it indicates a bug in the liveness analysis or
      in the generated code being analyzed.
      
      The race detector generates invalid code for append(x, y...).
      The code declares a temporary t and then uses cap(t) before
      initializing t. The new liveness check catches this bug and
      stops the compiler from writing out the buggy code.
      Consequently, this CL disables the race detector tests in
      run.bash until the race detector bug can be fixed
      (golang.org/issue/7334).
      
      Except for the race detector bug, the liveness analysis check
      does not detect any problems (this CL and the previous CLs
      fixed all the detected problems).
      
      The net test still fails with GOGC=0 but the rest of the tests
      now pass or time out (because GOGC=0 is so slow).
      
      TBR=iant
      CC=golang-codereviews
      https://golang.org/cl/64170043
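      
      A source-level instance of the "s = ... using s ..." pattern is
      sketched below. The right-hand side reads the old value of s, so
      it must be fully evaluated before s is treated as redefined. The
      type pair and the function swap are hypothetical names for this
      example and do not come from the CL.
      
      ```go
      package main

      import "fmt"

      // pair is a hypothetical multiword struct type.
      type pair struct {
      	a, b int
      }

      // swap assigns to s a value computed from s itself.
      // The old fields must be read before the new value is stored.
      func swap(s pair) pair {
      	s = pair{s.b, s.a}
      	return s
      }

      func main() {
      	fmt.Println(swap(pair{1, 2})) // prints {2 1}
      }
      ```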
  9. 14 Feb, 2014 1 commit
  10. 17 Sep, 2013 1 commit
    • cmd/gc: eliminate redundant &x.Field nil checks · aa0439ba
      Russ Cox authored
      This eliminates ~75% of the nil checks being emitted,
      on all architectures. We can do better, but we need
      a bit more general support from the compiler, and
      I don't want to do that so close to Go 1.2.
      What's here is simple but effective and safe.
      
      A few small code generation cleanups were required
      to make the analysis consistent on all systems about
      which nil checks are omitted, at least in the test.
      
      Fixes #6019.
      
      R=ken2
      CC=golang-dev
      https://golang.org/cl/13334052
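      
      The source pattern in question looks like the sketch below: two
      field addresses taken through the same pointer, where a second
      nil check on that pointer would add nothing once the first has
      passed. The commit message does not spell out exactly which
      checks are removed, so this only shows the language-level
      behavior that must be preserved. The names point, fieldAddrs,
      and panics are hypothetical.
      
      ```go
      package main

      import "fmt"

      // point is a hypothetical struct type.
      type point struct {
      	X, Y int
      }

      // fieldAddrs takes the addresses of two fields through one pointer.
      // Evaluating &p.X panics if p is nil, so p is known to be
      // non-nil by the time &p.Y is evaluated.
      func fieldAddrs(p *point) (*int, *int) {
      	return &p.X, &p.Y
      }

      // panics reports whether calling f panics.
      func panics(f func()) (didPanic bool) {
      	defer func() {
      		if recover() != nil {
      			didPanic = true
      		}
      	}()
      	f()
      	return false
      }

      func main() {
      	x, y := fieldAddrs(&point{1, 2})
      	fmt.Println(*x, *y)                             // prints 1 2
      	fmt.Println(panics(func() { fieldAddrs(nil) })) // prints true
      }
      ```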
  11. 11 Sep, 2013 1 commit
    • cmd/gc: inline copy in frontend to call memmove directly. · ff416a3f
      Rémy Oudompheng authored
      A new node type OSPTR is added to refer to the data pointer of
      strings and slices in a simple way during walk(). It will be
      useful for future work on simplification of slice arithmetic.
      
      benchmark                  old ns/op    new ns/op    delta
      BenchmarkCopy1Byte                 9            8  -13.98%
      BenchmarkCopy2Byte                14            8  -40.49%
      BenchmarkCopy4Byte                13            8  -35.04%
      BenchmarkCopy8Byte                13            8  -37.10%
      BenchmarkCopy12Byte               14           12  -15.38%
      BenchmarkCopy16Byte               14           12  -17.24%
      BenchmarkCopy32Byte               19           14  -27.32%
      BenchmarkCopy128Byte              31           26  -15.29%
      BenchmarkCopy1024Byte            100           92   -7.50%
      BenchmarkCopy1String              10            7  -28.99%
      BenchmarkCopy2String              10            7  -28.06%
      BenchmarkCopy4String              10            8  -22.69%
      BenchmarkCopy8String              10            8  -23.30%
      BenchmarkCopy12String             11           11   -5.88%
      BenchmarkCopy16String             11           11   -5.08%
      BenchmarkCopy32String             15           14   -6.58%
      BenchmarkCopy128String            28           25  -10.60%
      BenchmarkCopy1024String           95           95   +0.53%
      
      R=golang-dev, bradfitz, cshapiro, dave, daniel.morsing, rsc, khr, khr
      CC=golang-dev
      https://golang.org/cl/9101048
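      
      The two forms of the built-in copy that the benchmarks above
      exercise (slice source and string source) are shown below as a
      usage sketch. The helper names copyBytes and copyString are
      hypothetical wrappers added for the example; only the built-in
      copy itself comes from the language.
      
      ```go
      package main

      import "fmt"

      // copyBytes copies from a byte slice and returns the number of bytes copied.
      func copyBytes(dst, src []byte) int {
      	return copy(dst, src)
      }

      // copyString copies from a string into a byte slice.
      func copyString(dst []byte, src string) int {
      	return copy(dst, src)
      }

      func main() {
      	dst := make([]byte, 4)
      	n := copyBytes(dst, []byte("hello"))
      	fmt.Println(n, string(dst)) // prints 4 hell
      	m := copyString(dst, "go")
      	fmt.Println(m, string(dst)) // prints 2 goll
      }
      ```
      
      In both forms the number of bytes copied is the smaller of the
      two lengths.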
  12. 15 Aug, 2013 1 commit
  13. 02 Jul, 2013 1 commit
  14. 09 Jun, 2013 1 commit
  15. 30 Apr, 2013 1 commit
  16. 24 Apr, 2013 1 commit
  17. 07 Mar, 2013 1 commit
  18. 02 Jan, 2013 1 commit
  19. 21 Dec, 2012 1 commit
  20. 27 Nov, 2012 1 commit
  21. 26 Nov, 2012 1 commit
  22. 21 Nov, 2012 1 commit
  23. 01 Nov, 2012 1 commit
    • cmd/5g, cmd/6g, cmd/8g: remove width check for componentgen. · 022b361a
      Rémy Oudompheng authored
      The move to 64-bit ints in 6g made componentgen ineffective.
      In componentgen, the code already selects which values it can handle.
      
      On amd64:
      benchmark                 old ns/op    new ns/op    delta
      BenchmarkBinaryTree17    9477970000   9582314000   +1.10%
      BenchmarkFannkuch11      5928750000   5255080000  -11.36%
      BenchmarkGobDecode         37103040     31451120  -15.23%
      BenchmarkGobEncode         16042490     16844730   +5.00%
      BenchmarkGzip             811337400    741373600   -8.62%
      BenchmarkGunzip           197928700    192844500   -2.57%
      BenchmarkJSONEncode       224164100    140064200  -37.52%
      BenchmarkJSONDecode       258346800    231829000  -10.26%
      BenchmarkMandelbrot200      7561780      7601615   +0.53%
      BenchmarkParse             12970340     11624360  -10.38%
      BenchmarkRevcomp         1969917000   1699137000  -13.75%
      BenchmarkTemplate         296182000    263117400  -11.16%
      
      R=nigeltao, dave, daniel.morsing
      CC=golang-dev
      https://golang.org/cl/6821052
  24. 16 Oct, 2012 1 commit
  25. 02 Oct, 2012 1 commit
    • cmd/8g: do not take the address of string/slice for &s[i] · 2de064b6
      Rémy Oudompheng authored
      A similar change was made in 6g recently.
      
      LEALs in cmd/go: 31440 before, 27867 after.
      
      benchmark                 old ns/op    new ns/op    delta
      BenchmarkBinaryTree17    7065794000   6723617000   -4.84%
      BenchmarkFannkuch11      7767395000   7477945000   -3.73%
      BenchmarkGobDecode         34708140     34857820   +0.43%
      BenchmarkGobEncode         10998780     10960060   -0.35%
      BenchmarkGzip            1603630000   1471052000   -8.27%
      BenchmarkGunzip           242573900    240650400   -0.79%
      BenchmarkJSONEncode       120842200    117966100   -2.38%
      BenchmarkJSONDecode       247254900    249103100   +0.75%
      BenchmarkMandelbrot200     29237330     29241790   +0.02%
      BenchmarkParse              8111320      8096865   -0.18%
      BenchmarkRevcomp         2595780000   2694153000   +3.79%
      BenchmarkTemplate         276679600    264497000   -4.40%
      
      benchmark                              old ns/op    new ns/op    delta
      BenchmarkAppendFloatDecimal                  429          416   -3.03%
      BenchmarkAppendFloat                         780          740   -5.13%
      BenchmarkAppendFloatExp                      746          700   -6.17%
      BenchmarkAppendFloatNegExp                   752          694   -7.71%
      BenchmarkAppendFloatBig                     1228         1108   -9.77%
      BenchmarkAppendFloat32Integer                457          416   -8.97%
      BenchmarkAppendFloat32ExactFraction          662          631   -4.68%
      BenchmarkAppendFloat32Point                  771          735   -4.67%
      BenchmarkAppendFloat32Exp                    722          672   -6.93%
      BenchmarkAppendFloat32NegExp                 724          659   -8.98%
      BenchmarkAppendFloat64Fixed1                 429          400   -6.76%
      BenchmarkAppendFloat64Fixed2                 463          442   -4.54%
      
      Update #1914.
      
      R=golang-dev, daniel.morsing, rsc
      CC=golang-dev
      https://golang.org/cl/6574043
  26. 26 Sep, 2012 1 commit
    • cmd/6g, cmd/8g: fix two "out of fixed registers" cases. · 6feb6132
      Rémy Oudompheng authored
      In two cases, registers were allocated too early, exhausting
      the available registers when these operations were nested.
      
      The case of method calls was due to missing cases in igen,
      which only makes calls but doesn't allocate a register for
      the result.
      
      The case of 8-bit multiplication was due to a wrong order
      in register allocation when Ullman numbers were bigger on the
      RHS.
      
      Fixes #3907.
      Fixes #4156.
      
      R=rsc
      CC=golang-dev, remy
      https://golang.org/cl/6560054
  27. 24 Sep, 2012 2 commits
    • cmd/6g, cmd/8g: add OINDREG, ODOT, ODOTPTR cases to igen. · f4e76d5e
      Rémy Oudompheng authored
      Apart from reducing the number of LEAL/LEAQ instructions by about
      30%, it gives 8g easier registerization in several cases,
      for example in strconv. Performance with 6g is not affected.
      
      Before (386):
      src/pkg/strconv/decimal.go:22   TEXT  (*decimal).String+0(SB),$240-12
      src/pkg/strconv/extfloat.go:540 TEXT  (*extFloat).ShortestDecimal+0(SB),$584-20
      
      After (386):
      src/pkg/strconv/decimal.go:22   TEXT  (*decimal).String+0(SB),$196-12
      src/pkg/strconv/extfloat.go:540 TEXT  (*extFloat).ShortestDecimal+0(SB),$420-20
      
      Benchmarks with GOARCH=386 (on a Core 2).
      
      benchmark                 old ns/op    new ns/op    delta
      BenchmarkBinaryTree17    7110191000   7079644000   -0.43%
      BenchmarkFannkuch11      7769274000   7766514000   -0.04%
      BenchmarkGobDecode         33454820     34755400   +3.89%
      BenchmarkGobEncode         11675710     11007050   -5.73%
      BenchmarkGzip            2013519000   1593855000  -20.84%
      BenchmarkGunzip           253368200    242667600   -4.22%
      BenchmarkJSONEncode       152443900    120763400  -20.78%
      BenchmarkJSONDecode       304112800    247461800  -18.63%
      BenchmarkMandelbrot200     29245520     29240490   -0.02%
      BenchmarkParse              8484105      8088660   -4.66%
      BenchmarkRevcomp         2695688000   2841263000   +5.40%
      BenchmarkTemplate         363759800    277271200  -23.78%
      
      benchmark                       old ns/op    new ns/op    delta
      BenchmarkAtof64Decimal                127          129   +1.57%
      BenchmarkAtof64Float                  166          164   -1.20%
      BenchmarkAtof64FloatExp               308          300   -2.60%
      BenchmarkAtof64Big                    584          571   -2.23%
      BenchmarkAppendFloatDecimal           440          430   -2.27%
      BenchmarkAppendFloat                  995          776  -22.01%
      BenchmarkAppendFloatExp               897          746  -16.83%
      BenchmarkAppendFloatNegExp            900          752  -16.44%
      BenchmarkAppendFloatBig              1528         1228  -19.63%
      BenchmarkAppendFloat32Integer         443          453   +2.26%
      BenchmarkAppendFloat32ExactFraction   812          661  -18.60%
      BenchmarkAppendFloat32Point          1002          773  -22.85%
      BenchmarkAppendFloat32Exp             858          725  -15.50%
      BenchmarkAppendFloat32NegExp          848          728  -14.15%
      BenchmarkAppendFloat64Fixed1          447          431   -3.58%
      BenchmarkAppendFloat64Fixed2          480          462   -3.75%
      BenchmarkAppendFloat64Fixed3          461          457   -0.87%
      BenchmarkAppendFloat64Fixed4          509          484   -4.91%
      
      Update #1914.
      
      R=rsc, nigeltao
      CC=golang-dev, remy
      https://golang.org/cl/6494107
    • cmd/8g: don't create redundant temporaries in bgen. · 14f3276c
      Rémy Oudompheng authored
      Comparisons used to create temporaries for arguments
      even if they were already variables or addressable.
      Removing the extra ones reduces pressure on regopt.
      
      benchmark                 old ns/op    new ns/op    delta
      BenchmarkGobDecode         50787620     49908980   -1.73%
      BenchmarkGobEncode         19870190     19473030   -2.00%
      BenchmarkGzip            3214321000   3067929000   -4.55%
      BenchmarkGunzip           496792800    465828600   -6.23%
      BenchmarkJSONEncode       232524800    263864400  +13.48%
      BenchmarkJSONDecode       622038400    506600600  -18.56%
      BenchmarkMandelbrot200     23937310     45913060  +91.81%
      BenchmarkParse             14364450     13997010   -2.56%
      BenchmarkRevcomp         6919028000   6480009000   -6.35%
      BenchmarkTemplate         594458800    539528200   -9.24%
      
      benchmark                  old MB/s     new MB/s  speedup
      BenchmarkGobDecode            15.11        15.38    1.02x
      BenchmarkGobEncode            38.63        39.42    1.02x
      BenchmarkGzip                  6.04         6.33    1.05x
      BenchmarkGunzip               39.06        41.66    1.07x
      BenchmarkJSONEncode            8.35         7.35    0.88x
      BenchmarkJSONDecode            3.12         3.83    1.23x
      BenchmarkParse                 4.03         4.14    1.03x
      BenchmarkRevcomp              36.73        39.22    1.07x
      BenchmarkTemplate              3.26         3.60    1.10x
      
      R=mtj, daniel.morsing, rsc
      CC=golang-dev
      https://golang.org/cl/6547064
  28. 23 Sep, 2012 1 commit
  29. 12 Sep, 2012 1 commit
  30. 11 Sep, 2012 1 commit
  31. 09 Sep, 2012 1 commit
    • cmd/8g: import componentgen from 6g. · ae0862c1
      Rémy Oudompheng authored
      This makes the compilers' code more similar and improves
      code generation a lot.
      
      The number of LEAL instructions generated for cmd/go drops
      by 60%.
      
      % GOARCH=386 go build -gcflags -S -a cmd/go | grep LEAL | wc -l
      Before:       89774
      After:        47548
      
      benchmark                              old ns/op    new ns/op    delta
      BenchmarkAppendFloatDecimal                  540          444  -17.78%
      BenchmarkAppendFloat                        1160         1035  -10.78%
      BenchmarkAppendFloatExp                     1060          922  -13.02%
      BenchmarkAppendFloatNegExp                  1053          920  -12.63%
      BenchmarkAppendFloatBig                     1773         1558  -12.13%
      BenchmarkFormatInt                         13065        12481   -4.47%
      BenchmarkAppendInt                         10981         9900   -9.84%
      BenchmarkFormatUint                         3804         3650   -4.05%
      BenchmarkAppendUint                         3506         3303   -5.79%
      BenchmarkUnquoteEasy                         714          683   -4.34%
      BenchmarkUnquoteHard                        5117         2915  -43.03%
      
      Update #1914.
      
      R=nigeltao, rsc, golang-dev
      CC=golang-dev, remy
      https://golang.org/cl/6489067
  32. 23 Aug, 2012 1 commit
  33. 14 Jun, 2012 1 commit
    • cmd/gc: inline convT2E when T is uintptr-shaped. · 8f84328f
      Nigel Tao authored
      GOARCH=amd64 benchmarks
      
      src/pkg/runtime
      benchmark                  old ns/op    new ns/op    delta
      BenchmarkConvT2ESmall             10           10   +1.00%
      BenchmarkConvT2EUintptr            9            0  -92.07%
      BenchmarkConvT2EBig               74           74   -0.27%
      BenchmarkConvT2I                  27           26   -3.62%
      BenchmarkConvI2E                   4            4   -7.05%
      BenchmarkConvI2I                  20           19   -2.99%
      
      test/bench/go1
      benchmark                 old ns/op    new ns/op    delta
      BenchmarkBinaryTree17    5930908000   5937260000   +0.11%
      BenchmarkFannkuch11      3927057000   3933556000   +0.17%
      BenchmarkGobDecode         21998090     21870620   -0.58%
      BenchmarkGobEncode         12725310     12734480   +0.07%
      BenchmarkGzip             567617600    567892800   +0.05%
      BenchmarkGunzip           178284100    178706900   +0.24%
      BenchmarkJSONEncode        87693550     86794300   -1.03%
      BenchmarkJSONDecode       314212600    324115000   +3.15%
      BenchmarkMandelbrot200      7016640      7073766   +0.81%
      BenchmarkParse              7852100      7892085   +0.51%
      BenchmarkRevcomp         1285663000   1286147000   +0.04%
      BenchmarkTemplate         566823800    567606200   +0.14%
      
      I'm not entirely sure why the JSON* numbers have changed, but
      eyeballing the profile suggests that it could be spending less
      and more time in runtime.{new,old}stack, so it could simply be
      stack-split boundary noise.
      
      R=rsc, dave, bsiegert, dsymonds
      CC=golang-dev
      https://golang.org/cl/6280049
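      
      At the source level, the conversion measured by
      BenchmarkConvT2EUintptr corresponds to storing a pointer-sized
      value in an empty interface, as in the sketch below. The reading
      of "uintptr-shaped" as "the same size as a uintptr" is an
      assumption here, and toEmpty is a hypothetical helper. The
      sketch shows only the language-level operation, not how any
      particular compiler version lowers it.
      
      ```go
      package main

      import "fmt"

      // toEmpty converts a uintptr to an empty interface value.
      func toEmpty(v uintptr) interface{} {
      	return v
      }

      func main() {
      	e := toEmpty(42)
      	v, ok := e.(uintptr) // recover the concrete value with a type assertion
      	fmt.Println(v, ok)   // prints 42 true
      }
      ```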
  34. 03 Jun, 2012 1 commit
  35. 30 May, 2012 1 commit
    • cmd/gc: contiguous loop layout · 001b75c9
      Russ Cox authored
      Drop expecttaken function in favor of extra argument
      to gbranch and bgen. Mark loop condition as likely to
      be true, so that loops are generated inline.
      
      The main benefit here is contiguous code when trying
      to read the generated assembly. It has only minor effects
      on the timing, and they mostly cancel the minor effects
      that aligning function entry points had.  One exception:
      both changes made Fannkuch faster.
      
      Compared to before CL 6244066 (before aligned functions)
      benchmark                 old ns/op    new ns/op    delta
      BenchmarkBinaryTree17    4222117400   4201958800   -0.48%
      BenchmarkFannkuch11      3462631800   3215908600   -7.13%
      BenchmarkGobDecode         20887622     20899164   +0.06%
      BenchmarkGobEncode          9548772      9439083   -1.15%
      BenchmarkGzip                151687       152060   +0.25%
      BenchmarkGunzip                8742         8711   -0.35%
      BenchmarkJSONEncode        62730560     62686700   -0.07%
      BenchmarkJSONDecode       252569180    252368960   -0.0...
  36. 29 May, 2012 1 commit
    • cmd/6g, cmd/8g: move panicindex calls out of line · fefae6ee
      Russ Cox authored
      The old code generated for a bounds check was
                      CMP
                      JLT ok
                      CALL panicindex
              ok:
                      ...
      
      The new code is (once the linker finishes with it):
                      CMP
                      JGE panic
                      ...
              panic:
                      CALL panicindex
      
      which moves the calls out of line, putting more useful
      code in each cache line.  This matters especially in tight
      loops, such as in Fannkuch.  The benefit is more modest
      elsewhere, but real.
      
      From test/bench/go1, amd64:
      
      benchmark                old ns/op    new ns/op    delta
      BenchmarkBinaryTree17   6096092000   6088808000   -0.12%
      BenchmarkFannkuch11     6151404000   4020463000  -34.64%
      BenchmarkGobDecode        28990050     28894630   -0.33%
      BenchmarkGobEncode        12406310     12136730   -2.17%
      BenchmarkGzip               179923       179903   -0.01%
      BenchmarkGunzip              11219        11130   -0.79%
      BenchmarkJSONEncode       86429350     86515900   +0.10%
      BenchmarkJSONDecode      334593800    315728400   -5.64%
      BenchmarkRevcomp25M     1219763000   1180767000   -3.20%
      BenchmarkTemplate        492947600    483646800   -1.89%
      
      And 386:
      
      benchmark                old ns/op    new ns/op    delta
      BenchmarkBinaryTree17   6354902000   6243000000   -1.76%
      BenchmarkFannkuch11     8043769000   7326965000   -8.91%
      BenchmarkGobDecode        19010800     18941230   -0.37%
      BenchmarkGobEncode        14077500     13792460   -2.02%
      BenchmarkGzip               194087       193619   -0.24%
      BenchmarkGunzip              12495        12457   -0.30%
      BenchmarkJSONEncode      125636400    125451400   -0.15%
      BenchmarkJSONDecode      696648600    685032800   -1.67%
      BenchmarkRevcomp25M     2058088000   2052545000   -0.27%
      BenchmarkTemplate        602140000    589876800   -2.04%
      
      To implement this, two new instruction forms:
      
              JLT target      // same as always
              JLT $0, target  // branch expected not taken
              JLT $1, target  // branch expected taken
      
      The linker could also emit the prediction prefixes, but it
      does not: expected taken branches are reversed so that the
      expected case is not taken (as in the example above), and
      the default expectation for such a jump is not taken
      already.
      
      R=golang-dev, gri, r, dave
      CC=golang-dev
      https://golang.org/cl/6248049
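      
      The bounds check discussed above is the one the compiler inserts
      for an index expression such as s[i]. The sketch below shows the
      source-level behavior only: an in-range index returns the
      element, and an out-of-range index panics at run time. The
      function names at and safeAt are hypothetical, and nothing here
      depends on how the check is laid out in the generated code.
      
      ```go
      package main

      import "fmt"

      // at indexes s; an out-of-range i causes a run-time panic.
      func at(s []int, i int) int {
      	return s[i]
      }

      // safeAt calls at and reports whether the index was in range,
      // recovering from the panic raised by a failed bounds check.
      func safeAt(s []int, i int) (v int, ok bool) {
      	defer func() {
      		if recover() != nil {
      			v, ok = 0, false
      		}
      	}()
      	return at(s, i), true
      }

      func main() {
      	s := []int{10, 20, 30}
      	fmt.Println(safeAt(s, 1)) // prints 20 true
      	fmt.Println(safeAt(s, 5)) // prints 0 false
      }
      ```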
  37. 24 May, 2012 1 commit
  38. 13 Feb, 2012 1 commit
    • gc, 8g, 8l: fix a handful of warnings · dbec4210
      Anthony Martin authored
      8g/cgen.c
              print format type mismatch
      
      8l/asm.c
              resoff set and not used
      
      gc/pgen.c
              misleading comparison INT > 0x80000000
      
      gc/reflect.c
              dalgsym must be static to match forward declaration
      
      gc/subr.c
              assumed_equal set and not used
              hashmem's second argument is not used
      
      gc/walk.c
              duplicated (unreachable) code
      
      R=rsc
      CC=golang-dev
      https://golang.org/cl/5651079