1. 14 Feb, 2017 18 commits
    • cmd/compile/internal/ssa: combine 2 byte loads + shifts into word load + rolw 8 on AMD64 · 4477fd09
      Kirill Smelkov authored
      ... and same for stores. This does for binary.BigEndian.Uint16() what
      was already done for Uint32 and Uint64 with BSWAP in 10f75748 (CL 32222).
      
      Here is how the generated code changes, e.g. for the following function
      (omitting the prologue/epilogue, which stay the same):
      
      	func get16(b [2]byte) uint16 {
      		return binary.BigEndian.Uint16(b[:])
      	}
      
      "".get16 t=1 size=21 args=0x10 locals=0x0
      
      	// before
      	0x0000 00000 (x.go:15)	MOVBLZX	"".b+9(FP), AX
      	0x0005 00005 (x.go:15)	MOVBLZX	"".b+8(FP), CX
      	0x000a 00010 (x.go:15)	SHLL	$8, CX
      	0x000d 00013 (x.go:15)	ORL	CX, AX
      
      	// after
      	0x0000 00000 (x.go:15)	MOVWLZX	"".b+8(FP), AX
      	0x0005 00005 (x.go:15)	ROLW	$8, AX
      
      encoding/binary is sped up a bit overall:
      
      name                    old time/op    new time/op    delta
      ReadSlice1000Int32s-4     4.83µs ± 0%    4.83µs ± 0%     ~     (p=0.206 n=4+5)
      ReadStruct-4              1.29µs ± 2%    1.28µs ± 1%   -1.27%  (p=0.032 n=4+5)
      ReadInts-4                 384ns ± 1%     385ns ± 1%     ~     (p=0.968 n=4+5)
      WriteInts-4                534ns ± 3%     526ns ± 0%   -1.54%  (p=0.048 n=4+5)
      WriteSlice1000Int32s-4    5.02µs ± 0%    5.11µs ± 3%     ~     (p=0.175 n=4+5)
      PutUint16-4               0.59ns ± 0%    0.49ns ± 2%  -16.95%  (p=0.016 n=4+5)
      PutUint32-4               0.52ns ± 0%    0.52ns ± 0%     ~     (all equal)
      PutUint64-4               0.53ns ± 0%    0.53ns ± 0%     ~     (all equal)
      PutUvarint32-4            19.9ns ± 0%    19.9ns ± 1%     ~     (p=0.556 n=4+5)
      PutUvarint64-4            54.5ns ± 1%    54.2ns ± 0%     ~     (p=0.333 n=4+5)
      
      name                    old speed      new speed      delta
      ReadSlice1000Int32s-4    829MB/s ± 0%   828MB/s ± 0%     ~     (p=0.190 n=4+5)
      ReadStruct-4            58.0MB/s ± 2%  58.7MB/s ± 1%   +1.30%  (p=0.032 n=4+5)
      ReadInts-4              78.0MB/s ± 1%  77.8MB/s ± 1%     ~     (p=0.968 n=4+5)
      WriteInts-4             56.1MB/s ± 3%  57.0MB/s ± 0%     ~     (p=0.063 n=4+5)
      WriteSlice1000Int32s-4   797MB/s ± 0%   783MB/s ± 3%     ~     (p=0.190 n=4+5)
      PutUint16-4             3.37GB/s ± 0%  4.07GB/s ± 2%  +20.83%  (p=0.016 n=4+5)
      PutUint32-4             7.73GB/s ± 0%  7.72GB/s ± 0%     ~     (p=0.556 n=4+5)
      PutUint64-4             15.1GB/s ± 0%  15.1GB/s ± 0%     ~     (p=0.905 n=4+5)
      PutUvarint32-4           201MB/s ± 0%   201MB/s ± 0%     ~     (p=0.905 n=4+5)
      PutUvarint64-4           147MB/s ± 1%   147MB/s ± 0%     ~     (p=0.286 n=4+5)
      
      ( "a bit" only because most of the time there is spent in reflection-like
        things, not in actual byte decoding. Even for the direct PutUint16 benchmark
        the loop adds overhead and lowers the visible benefit. For code-generated
        encoders / decoders the actual effect is more than 20% )
      
      Adding Uint32 and Uint64 raw benchmarks too for completeness.
      
      NOTE: I had to adjust the load-combining rule for the bswap case so that it
      matches the first 2 byte loads as the result of the "2-byte load+shift" ->
      "loadw + rolw 8" rewrite. The reason is: for loads+shift, even e.g. into a
      uint16 var
      
      	var b []byte
      	var v uint16
      	v = uint16(b[1]) | uint16(b[0])<<8
      
      the compiler eventually generates an L(ong) shift - SHLLconst [8], probably
      because it is more straightforward (or for other reasons) to work on the
      whole register. So the 2-byte rewrite rule uses SHLLconst (not SHLWconst) in
      its pattern, and it always gets matched first, even if the 2-byte rule comes
      syntactically after the 4-byte rule in AMD64.rules, because the 4-byte rule
      seemingly needs more applyRewrite() cycles to trigger. If the 2-byte rule gets
      matched for the inner half of
      
      	var b []byte
      	var v uint32
      	v = uint32(b[3]) | uint32(b[2])<<8 | uint32(b[1])<<16 | uint32(b[0])<<24
      
      and we keep the 4-byte load rule unchanged, the result will be MOVW + ROLW $8
      followed by a series of byte loads and shifts, not one MOVL + BSWAPL.
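The three shapes discussed in this note can be sketched in Go as follows. This is only an illustration of the value-level equivalence, not the SSA rules themselves; the function names are hypothetical, and `math/bits.ReverseBytes32` stands in for BSWAPL.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// get32Manual is the source-level pattern: four byte loads combined with shifts.
func get32Manual(b []byte) uint32 {
	_ = b[3] // one bounds check up front
	return uint32(b[3]) | uint32(b[2])<<8 | uint32(b[1])<<16 | uint32(b[0])<<24
}

// get32Swapped is the desired result: one 32-bit load plus a byte swap.
func get32Swapped(b []byte) uint32 {
	return bits.ReverseBytes32(binary.LittleEndian.Uint32(b))
}

// get32Mixed is the undesired intermediate shape: the inner half (b[3], b[2])
// has already become a 16-bit load plus rotate, while the outer two bytes are
// still loaded and shifted one at a time.
func get32Mixed(b []byte) uint32 {
	_ = b[3]
	lo := binary.LittleEndian.Uint16(b[2:4])
	lo = lo<<8 | lo>>8
	return uint32(lo) | uint32(b[1])<<16 | uint32(b[0])<<24
}

func main() {
	b := []byte{0xde, 0xad, 0xbe, 0xef}
	fmt.Printf("%#x %#x %#x\n", get32Manual(b), get32Swapped(b), get32Mixed(b))
}
```

All three compute the same value; the point of adjusting the 4-byte rule is to let the compiler still reach the `get32Swapped` shape after the inner half has been rewritten.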
      
      There is no such problem for stores: there the compiler, probably because it
      knows the store destination is 2 bytes wide, uses SHRWconst 8 (not SHRLconst 8),
      and thus the 2-byte store rule is not a subset of the rule for 4-byte stores.
      
      Fixes #17151  (int16 was last missing piece there)
      
      Change-Id: Idc03ba965bfce2b94fef456b02ff6742194748f6
      Reviewed-on: https://go-review.googlesource.com/34636
      Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
      Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      4477fd09
    • expvar: add benchmarks for steady-state Map Add calls · 7ffdb757
      Bryan C. Mills authored
      Add a benchmark for setting a String value, which we may
      want to treat differently from Int or Float due to the need to support
      Add methods for the latter.
      
      Update tests to use only the exported API instead of making (fragile)
      assumptions about unexported fields.
      
      The existing Map benchmarks construct a new Map for each iteration, which
      focuses the benchmark results on the initial allocation costs for the
      Map and its entries. This change adds variants of the benchmarks which
      use a long-lived map in order to measure steady-state performance for
      Map updates on existing keys.
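The difference between the two benchmark shapes can be sketched roughly as below. This is an assumption-laden illustration, not the benchmark code from the change: the function names and the key "red" are made up, and plain loops stand in for `testing.B` iterations.

```go
package main

import (
	"expvar"
	"fmt"
)

// addFresh builds a new Map on every iteration, so each Add pays for
// allocating the Map and its first entry.
func addFresh(iters int) int64 {
	var last int64
	for i := 0; i < iters; i++ {
		m := new(expvar.Map).Init()
		m.Add("red", 1)
		last = m.Get("red").(*expvar.Int).Value()
	}
	return last
}

// addSteady reuses one long-lived Map, so after the first iteration every
// Add updates an existing key (the steady-state case).
func addSteady(iters int) int64 {
	m := new(expvar.Map).Init()
	for i := 0; i < iters; i++ {
		m.Add("red", 1)
	}
	return m.Get("red").(*expvar.Int).Value()
}

func main() {
	fmt.Println(addFresh(1000), addSteady(1000))
}
```

In the fresh variant the counter never exceeds 1, while the steady variant accumulates across iterations, which is what makes it a measure of updates on existing keys.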
      
      Updates #18177
      
      Change-Id: I62c920991d17d5898c592446af382cd5c04c528a
      Reviewed-on: https://go-review.googlesource.com/36959
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      7ffdb757
    • math/big: fix s390x test build tags · d2fea044
      Michael Munday authored
      The tests failed to compile when using the math_big_pure_go tag on
      s390x.
      
      Change-Id: I2a09f53ff6562ab9bc9b886cffc0f6205bbfcfbb
      Reviewed-on: https://go-review.googlesource.com/36956
      Run-TryBot: Michael Munday <munday@ca.ibm.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      d2fea044
    • cmd/compile: undo special handling of zero-valued STRUCTLIT · 78200799
      Cherry Zhang authored
      CL 35261 introduces special handling of zero-valued STRUCTLIT for
      efficient struct zeroing. But it didn't cover all use cases, for
      example, CONVNOP STRUCTLIT is not handled.
      
      On the other hand, CL 34566 handles zeroing earlier, so we don't
      need the change in CL 35261 for efficient zeroing. Other uses of
      zero-valued struct literals are very rare. So undo the change in
      walk.go in CL 35261.
      
      Add a test for efficient zeroing.
      
      Fixes #19084.
      
      Change-Id: I0807f7423fb44d47bf325b3c1ce9611a14953853
      Reviewed-on: https://go-review.googlesource.com/36955
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
      Reviewed-by: Keith Randall <khr@golang.org>
      78200799
    • cmd/compile/internal/ssa: generate bswap/store for indexed bigendian byte stores too on AMD64 · bd91e356
      Kirill Smelkov authored
      Commit 10f75748 (CL 32222) added rewrite rules to combine byte loads/stores +
      shifts into larger loads/stores + bswap. For loads, both MOVBload and
      MOVBloadidx1 were handled, but for stores only MOVBstore was handled;
      MOVBstoreidx was missing from the rewrite pattern. Fix it.
      
      Here is how the generated code changes for the following 2 functions
      (omitting the prologue/epilogue, which stay the same):
      
          func put32(b []byte, i int, v uint32) {
                  binary.BigEndian.PutUint32(b[i:], v)
          }
      
          func put64(b []byte, i int, v uint64) {
                  binary.BigEndian.PutUint64(b[i:], v)
          }
      
      "".put32 t=1 size=100 args=0x28 locals=0x0
      
      	// before
      	0x0032 00050 (x.go:5)	MOVL	CX, DX
      	0x0034 00052 (x.go:5)	SHRL	$24, CX
      	0x0037 00055 (x.go:5)	MOVQ	"".b+8(FP), BX
      	0x003c 00060 (x.go:5)	MOVB	CL, (BX)(AX*1)
      	0x003f 00063 (x.go:5)	MOVL	DX, CX
      	0x0041 00065 (x.go:5)	SHRL	$16, DX
      	0x0044 00068 (x.go:5)	MOVB	DL, 1(BX)(AX*1)
      	0x0048 00072 (x.go:5)	MOVL	CX, DX
      	0x004a 00074 (x.go:5)	SHRL	$8, CX
      	0x004d 00077 (x.go:5)	MOVB	CL, 2(BX)(AX*1)
      	0x0051 00081 (x.go:5)	MOVB	DL, 3(BX)(AX*1)
      
      	// after
      	0x0032 00050 (x.go:5)	BSWAPL	CX
      	0x0034 00052 (x.go:5)	MOVQ	"".b+8(FP), DX
      	0x0039 00057 (x.go:5)	MOVL	CX, (DX)(AX*1)
      
      "".put64 t=1 size=155 args=0x28 locals=0x0
      
      	// before
      	0x0037 00055 (x.go:9)	MOVQ	CX, DX
      	0x003a 00058 (x.go:9)	SHRQ	$56, CX
      	0x003e 00062 (x.go:9)	MOVQ	"".b+8(FP), BX
      	0x0043 00067 (x.go:9)	MOVB	CL, (BX)(AX*1)
      	0x0046 00070 (x.go:9)	MOVQ	DX, CX
      	0x0049 00073 (x.go:9)	SHRQ	$48, DX
      	0x004d 00077 (x.go:9)	MOVB	DL, 1(BX)(AX*1)
      	0x0051 00081 (x.go:9)	MOVQ	CX, DX
      	0x0054 00084 (x.go:9)	SHRQ	$40, CX
      	0x0058 00088 (x.go:9)	MOVB	CL, 2(BX)(AX*1)
      	0x005c 00092 (x.go:9)	MOVQ	DX, CX
      	0x005f 00095 (x.go:9)	SHRQ	$32, DX
      	0x0063 00099 (x.go:9)	MOVB	DL, 3(BX)(AX*1)
      	0x0067 00103 (x.go:9)	MOVQ	CX, DX
      	0x006a 00106 (x.go:9)	SHRQ	$24, CX
      	0x006e 00110 (x.go:9)	MOVB	CL, 4(BX)(AX*1)
      	0x0072 00114 (x.go:9)	MOVQ	DX, CX
      	0x0075 00117 (x.go:9)	SHRQ	$16, DX
      	0x0079 00121 (x.go:9)	MOVB	DL, 5(BX)(AX*1)
      	0x007d 00125 (x.go:9)	MOVQ	CX, DX
      	0x0080 00128 (x.go:9)	SHRQ	$8, CX
      	0x0084 00132 (x.go:9)	MOVB	CL, 6(BX)(AX*1)
      	0x0088 00136 (x.go:9)	MOVB	DL, 7(BX)(AX*1)
      
      	// after
      	0x0033 00051 (x.go:9)	BSWAPQ	CX
      	0x0036 00054 (x.go:9)	MOVQ	"".b+8(FP), DX
      	0x003b 00059 (x.go:9)	MOVQ	CX, (DX)(AX*1)
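For the 32-bit case, the before/after listings correspond to the following value-level equivalence. This is a hedged sketch with hypothetical function names; `math/bits.ReverseBytes32` stands in for BSWAPL and the little-endian store for the single MOVL.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// put32Manual stores v big-endian at b[i:] one byte at a time, like the
// "before" listing (four shifts and four indexed byte stores).
func put32Manual(b []byte, i int, v uint32) {
	b = b[i:]
	_ = b[3] // one bounds check up front
	b[0] = byte(v >> 24)
	b[1] = byte(v >> 16)
	b[2] = byte(v >> 8)
	b[3] = byte(v)
}

// put32Swapped mirrors the "after" listing: swap the bytes once, then do a
// single 32-bit store at the indexed address.
func put32Swapped(b []byte, i int, v uint32) {
	binary.LittleEndian.PutUint32(b[i:], bits.ReverseBytes32(v))
}

func main() {
	var x, y [8]byte
	put32Manual(x[:], 2, 0xdeadbeef)
	put32Swapped(y[:], 2, 0xdeadbeef)
	fmt.Printf("% x\n% x\n", x, y)
}
```

Both functions should leave the buffer in the same state for any index and value; the 64-bit case is analogous with an 8-byte swap and store.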
      
      Updates #17151
      
      Change-Id: I3f4a7f28f210e62e153e60da5abd1d39508cc6c4
      Reviewed-on: https://go-review.googlesource.com/34635
      Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
      bd91e356
    • net/http: document ErrServerClosed · a0645fca
      Kale Blankenship authored
      Fixes #19085
      
      Change-Id: Ib11b9a22ea8092aca9e1c9c36b1fb015dd555c4b
      Reviewed-on: https://go-review.googlesource.com/36943
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      a0645fca
    • runtime: remove g.stackAlloc · 0993b2fd
      Austin Clements authored
      Since we're no longer stealing space for the stack barrier array from
      the stack allocation, the stack allocation is simply
      g.stack.hi-g.stack.lo.
      
      Updates #17503.
      
      Change-Id: Id9b450ae12c3df9ec59cfc4365481a0a16b7c601
      Reviewed-on: https://go-review.googlesource.com/36621
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Rick Hudson <rlh@golang.org>
      0993b2fd
    • runtime: remove stack barriers · d089a6c7
      Austin Clements authored
      Now that we don't rescan stacks, stack barriers are unnecessary. This
      removes all of the code and structures supporting them as well as
      tests that were specifically for stack barriers.
      
      Updates #17503.
      
      Change-Id: Ia29221730e0f2bbe7beab4fa757f31a032d9690c
      Reviewed-on: https://go-review.googlesource.com/36620
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      d089a6c7
    • runtime: remove rescan list · c5ebcd2c
      Austin Clements authored
      With the hybrid barrier, rescanning stacks is no longer necessary so
      the rescan list is no longer necessary. Remove it.
      
      This leaves the gcrescanstacks GODEBUG variable, since it's useful for
      debugging, but changes it to simply walk all of the Gs to rescan
      stacks rather than using the rescan list.
      
      We could also remove g.gcscanvalid, which is effectively a distributed
      rescan list. However, it's still useful for gcrescanstacks mode and it
      adds little complexity, so we'll leave it in.
      
      Fixes #17099.
      Updates #17503.
      
      Change-Id: I776d43f0729567335ef1bfd145b75c74de2cc7a9
      Reviewed-on: https://go-review.googlesource.com/36619
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Rick Hudson <rlh@golang.org>
      c5ebcd2c
    • runtime: remove unused debug.wbshadow · 7aeb915d
      Austin Clements authored
      The wbshadow implementation was removed a year and a half ago in
      1635ab7d, but the GODEBUG setting remained. Remove the GODEBUG
      setting since it doesn't do anything.
      
      Change-Id: I19cde324a79472aff60acb5cc9f7d4aa86c0c0ed
      Reviewed-on: https://go-review.googlesource.com/36618
      Run-TryBot: Austin Clements <austin@google.com>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Reviewed-by: Rick Hudson <rlh@golang.org>
      7aeb915d
    • net/http: handle absolute paths in mapDirOpenError · a610957f
      Nathan Caza authored
      The current implementation does not account for Dir being
      initialized with an absolute path on systems that start
      paths with filepath.Separator. In this scenario, the
      original error is returned, and not checked for file
      segments.
      
      This change adds a test for this case, and corrects the
      behavior by ignoring blank path segments in the loop.
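A rough sketch of the segment walk described above, assuming the usual shape of such a check: split the name on the path separator, skip blank segments, and stat each growing prefix. The name `mapOpenError` and the `demo` helper are hypothetical; this is not presented as the net/http source.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// mapOpenError reports os.ErrNotExist when some prefix of name is a regular
// file rather than a directory; otherwise it returns originalErr unchanged.
func mapOpenError(originalErr error, name string) error {
	if os.IsNotExist(originalErr) || os.IsPermission(originalErr) {
		return originalErr
	}
	sep := string(filepath.Separator)
	parts := strings.Split(name, sep)
	for i := range parts {
		if parts[i] == "" {
			// An absolute path such as "/a/b" splits into "", "a", "b".
			// Without this skip, the first Stat("") fails and the walk
			// gives up before checking any real segment.
			continue
		}
		fi, err := os.Stat(strings.Join(parts[:i+1], sep))
		if err != nil {
			return originalErr
		}
		if !fi.IsDir() {
			return os.ErrNotExist
		}
	}
	return originalErr
}

// demo opens "<tmpdir>/file.txt/child" (an absolute path that runs through a
// regular file) and reports whether the error is mapped to os.ErrNotExist.
func demo() (bool, error) {
	dir, err := os.MkdirTemp("", "maperr")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)
	file := filepath.Join(dir, "file.txt")
	if err := os.WriteFile(file, []byte("x"), 0o600); err != nil {
		return false, err
	}
	name := filepath.Join(file, "child")
	_, openErr := os.Open(name)
	if openErr == nil {
		return false, fmt.Errorf("unexpectedly opened %s", name)
	}
	return mapOpenError(openErr, name) == os.ErrNotExist, nil
}

func main() {
	fmt.Println(demo())
}
```

The demo assumes a Unix-like system, where opening a path through a regular file fails with a "not a directory" error rather than "not exist".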
      
      Refs #18984
      
      Change-Id: I9b79fd0a73a46976c8e2feda0283ef0bb2b62ea1
      Reviewed-on: https://go-review.googlesource.com/36804
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      a610957f
    • runtime: fix some assembly offset names · ef30a1c8
      Josh Bleecher Snyder authored
      For vet. There are more. This is a start.
      
      Change-Id: Ibbbb2b20b5db60ee3fac4a1b5913d18fab01f6b9
      Reviewed-on: https://go-review.googlesource.com/36939
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      ef30a1c8
    • all: fix some printf format strings · 785cb7e0
      Josh Bleecher Snyder authored
      Appease vet.
      
      Change-Id: Ie88de08b91041990c0eaf2e15628cdb98d40c660
      Reviewed-on: https://go-review.googlesource.com/36938
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      785cb7e0
    • all: use keyed composite literals · cc2a52ad
      Josh Bleecher Snyder authored
      Makes vet happy.
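For readers unfamiliar with the vet check in question, here is a small illustration of a keyed composite literal for a struct type from another package. The example type (`net.IPNet`) and the function name are chosen only for illustration and are not taken from the change.

```go
package main

import (
	"fmt"
	"net"
)

// privateNet builds a net.IPNet with field keys. The unkeyed form,
// net.IPNet{ip, mask}, compiles as well, but it is the kind of literal that
// vet reports, since it silently depends on the field order of a struct
// defined in another package.
func privateNet() net.IPNet {
	return net.IPNet{
		IP:   net.IPv4(10, 0, 0, 0),
		Mask: net.CIDRMask(8, 32),
	}
}

func main() {
	n := privateNet()
	fmt.Println(n.String())
}
```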
      
      Change-Id: I7250f283c96e82b9796c5672a0a143ba7568fa63
      Reviewed-on: https://go-review.googlesource.com/36937
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      cc2a52ad
    • internal/poll: only build str.go on plan9 · c0165a38
      Dave Cheney authored
      Alternatively the contents of str.go could be moved into fd_io_plan9.go
      
      Change-Id: I9d7ec85bbb376f4244eeca732f25c0b77cadc6a6
      Reviewed-on: https://go-review.googlesource.com/36971
      Run-TryBot: Dave Cheney <dave@cheney.net>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      c0165a38
    • internal/poll: remove named return values and naked returns · 84cf1f05
      Dave Cheney authored
      Change-Id: I283f4453e5cf8b22995b3abffccae182cfbb6945
      Reviewed-on: https://go-review.googlesource.com/36970
      Reviewed-by: Dave Cheney <dave@cheney.net>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      Run-TryBot: Dave Cheney <dave@cheney.net>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      84cf1f05
    • time: add Duration.Truncate and Duration.Round · 45356c1a
      Caleb Spare authored
      Fixes #18996
      
      Change-Id: I0b0f7270960b368ce97ad4456f60bcc1fc2a8313
      Reviewed-on: https://go-review.googlesource.com/36615
      Run-TryBot: Caleb Spare <cespare@gmail.com>
      Reviewed-by: Russ Cox <rsc@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      45356c1a
    • runtime: speed up fastrand() % n · 46a75870
      Josh Bleecher Snyder authored
      This occurs a fair amount in the runtime for non-power-of-two n.
      Use an alternative, faster formulation.
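The message does not spell out the formulation. A well-known alternative to `x % n` for mapping a uniform 32-bit value into [0, n) is the multiply-and-shift reduction; the sketch below assumes that is the technique meant. The function names are hypothetical, and `math/rand` stands in for the runtime's fastrand.

```go
package main

import (
	"fmt"
	"math/rand"
)

// reduceMod maps x into [0, n) with a modulo, which needs a division.
func reduceMod(x, n uint32) uint32 {
	return x % n
}

// reduceMulShift maps x into [0, n) by taking the high 32 bits of the 64-bit
// product x*n: one multiply and one shift instead of a division.
func reduceMulShift(x, n uint32) uint32 {
	return uint32((uint64(x) * uint64(n)) >> 32)
}

func main() {
	const n = 5
	var modHist, mulHist [n]int
	for i := 0; i < 1000000; i++ {
		x := rand.Uint32()
		modHist[reduceMod(x, n)]++
		mulHist[reduceMulShift(x, n)]++
	}
	// Both histograms should be close to uniform, although the two
	// reductions map any individual x to different buckets.
	fmt.Println(modHist, mulHist)
}
```

Note that the two functions are not interchangeable value for value: they agree on the output range and approximate uniformity, not on which bucket a given x lands in.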
      
      name           old time/op  new time/op  delta
      Fastrandn/2-8  4.45ns ± 2%  2.09ns ± 3%  -53.12%  (p=0.000 n=14+14)
      Fastrandn/3-8  4.78ns ±11%  2.06ns ± 2%  -56.94%  (p=0.000 n=15+15)
      Fastrandn/4-8  4.76ns ± 9%  1.99ns ± 3%  -58.28%  (p=0.000 n=15+13)
      Fastrandn/5-8  4.96ns ±13%  2.03ns ± 6%  -59.14%  (p=0.000 n=15+15)
      
      name                    old time/op  new time/op  delta
      SelectUncontended-8     33.7ns ± 2%  33.9ns ± 2%  +0.70%  (p=0.000 n=49+50)
      SelectSyncContended-8   1.68µs ± 4%  1.65µs ± 4%  -1.54%  (p=0.000 n=50+45)
      SelectAsyncContended-8   282ns ± 1%   277ns ± 1%  -1.50%  (p=0.000 n=48+43)
      SelectNonblock-8        5.31ns ± 1%  5.32ns ± 1%    ~     (p=0.275 n=45+44)
      SelectProdCons-8         585ns ± 3%   577ns ± 2%  -1.35%  (p=0.000 n=50+50)
      GoroutineSelect-8       1.59ms ± 2%  1.59ms ± 1%    ~     (p=0.084 n=49+48)
      
      Updates #16213
      
      Change-Id: Ib555a4d7da2042a25c3976f76a436b536487d5b7
      Reviewed-on: https://go-review.googlesource.com/36932
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      46a75870
  2. 13 Feb, 2017 22 commits