Commit a2f22a68 authored by Ben Shi's avatar Ben Shi Committed by Cherry Zhang

cmd/compile: optimize ARM with more efficient MOVB/MOVBU/MOVH/MOVHU

Like the indexed MOVW (MOVWloadidx/MOVWstoreidx) used in current
ARM backend, the indexed MOVB/MOVBU/MOVH/MOVHU can also be used to
generate further optimized ARM code.

My patch implements this optimization. Here are some contrast test
results against the original go compiler.

1. The total size of all .a files in pkg/ shrinks by 0.03%.

2. The compilecmp benchmark shows a little decline.
name        old time/op       new time/op       delta
Template          2.35s ± 1%        2.37s ± 3%  +0.94%  (p=0.006 n=19+19)
Unicode           1.33s ± 3%        1.33s ± 2%    ~     (p=0.158 n=20+18)
GoTypes           7.86s ± 2%        7.84s ± 1%    ~     (p=0.284 n=19+18)
Compiler          37.5s ± 1%        37.7s ± 2%    ~     (p=0.101 n=20+19)
SSA               83.4s ± 2%        83.6s ± 2%    ~     (p=0.231 n=20+20)
Flate             1.46s ± 2%        1.45s ± 1%    ~     (p=0.097 n=20+17)
GoParser          1.86s ± 2%        1.86s ± 4%    ~     (p=0.738 n=20+20)
Reflect           5.10s ± 1%        5.11s ± 1%    ~     (p=0.290 n=20+18)
Tar               1.78s ± 2%        1.77s ± 2%    ~     (p=0.166 n=19+20)
XML               2.61s ± 2%        2.61s ± 2%    ~     (p=0.665 n=19+19)
[Geo mean]        4.67s             4.68s       +0.16%

name        old user-time/op  new user-time/op  delta
Template          2.79s ± 3%        2.80s ± 2%    ~     (p=0.662 n=20+20)
Unicode           1.62s ± 3%        1.64s ± 4%    ~     (p=0.252 n=20+20)
GoTypes           9.58s ± 2%        9.62s ± 2%    ~     (p=0.250 n=20+20)
Compiler          46.2s ± 1%        46.2s ± 1%    ~     (p=0.602 n=20+19)
SSA                108s ± 1%         108s ± 2%    ~     (p=0.242 n=18+20)
Flate             1.69s ± 3%        1.69s ± 4%    ~     (p=0.470 n=20+20)
GoParser          2.16s ± 3%        2.20s ± 4%  +1.70%  (p=0.005 n=19+20)
Reflect           6.02s ± 2%        6.02s ± 2%    ~     (p=0.700 n=20+17)
Tar               2.11s ± 2%        2.11s ± 3%    ~     (p=0.480 n=18+20)
XML               3.07s ± 2%        3.11s ± 4%  +1.50%  (p=0.043 n=20+20)
[Geo mean]        5.61s             5.64s       +0.55%

name        old text-bytes    new text-bytes    delta
HelloSize         586kB ± 0%        586kB ± 0%    ~     (all equal)

name        old data-bytes    new data-bytes    delta
HelloSize        5.46kB ± 0%       5.46kB ± 0%    ~     (all equal)

name        old bss-bytes     new bss-bytes     delta
HelloSize        72.9kB ± 0%       72.9kB ± 0%    ~     (all equal)

name        old exe-bytes     new exe-bytes     delta
HelloSize        1.03MB ± 0%       1.03MB ± 0%    ~     (all equal)

3. The go1 benchmark shows improvement totally, and even more than 10%
improvement in the test case Revcomp. 
name                     old time/op    new time/op    delta
BinaryTree17-4              42.0s ± 1%     41.5s ± 1%   -1.32%  (p=0.000 n=39+40)
Fannkuch11-4                24.1s ± 1%     23.6s ± 0%   -2.38%  (p=0.000 n=40+40)
FmtFprintfEmpty-4           843ns ± 0%     839ns ± 1%   -0.46%  (p=0.000 n=33+40)
FmtFprintfString-4         1.44µs ± 1%    1.37µs ± 1%   -5.48%  (p=0.000 n=40+35)
FmtFprintfInt-4            1.44µs ± 1%    1.41µs ± 2%   -1.50%  (p=0.000 n=40+40)
FmtFprintfIntInt-4         2.07µs ± 1%    2.06µs ± 0%   -0.78%  (p=0.000 n=40+40)
FmtFprintfPrefixedInt-4    2.50µs ± 1%    2.33µs ± 1%   -6.85%  (p=0.000 n=40+40)
FmtFprintfFloat-4          4.36µs ± 1%    4.34µs ± 0%   -0.39%  (p=0.017 n=40+40)
FmtManyArgs-4              8.11µs ± 0%    8.00µs ± 0%   -1.37%  (p=0.000 n=40+40)
GobDecode-4                 105ms ± 2%     103ms ± 2%   -2.17%  (p=0.000 n=39+39)
GobEncode-4                90.1ms ± 2%    88.6ms ± 1%   -1.67%  (p=0.000 n=40+39)
Gzip-4                      4.18s ± 1%     4.09s ± 1%   -2.03%  (p=0.000 n=40+40)
Gunzip-4                    608ms ± 1%     603ms ± 1%   -0.86%  (p=0.000 n=40+34)
HTTPClientServer-4          674µs ± 3%     661µs ± 2%   -1.82%  (p=0.000 n=40+39)
JSONEncode-4                256ms ± 1%     243ms ± 0%   -5.11%  (p=0.000 n=39+31)
JSONDecode-4                915ms ± 1%     904ms ± 1%   -1.18%  (p=0.000 n=40+36)
Mandelbrot200-4            49.2ms ± 0%    49.3ms ± 0%     ~     (p=0.254 n=34+40)
GoParse-4                  46.9ms ± 2%    46.9ms ± 1%     ~     (p=0.737 n=40+39)
RegexpMatchEasy0_32-4      1.28µs ± 1%    1.27µs ± 1%   -0.71%  (p=0.000 n=40+40)
RegexpMatchEasy0_1K-4      7.86µs ± 4%    7.67µs ± 4%   -2.46%  (p=0.000 n=38+40)
RegexpMatchEasy1_32-4      1.28µs ± 1%    1.28µs ± 1%   -0.54%  (p=0.000 n=40+40)
RegexpMatchEasy1_1K-4      10.4µs ± 2%    10.3µs ± 2%   -0.88%  (p=0.003 n=40+39)
RegexpMatchMedium_32-4     2.05µs ± 0%    2.04µs ± 0%   -0.34%  (p=0.000 n=40+33)
RegexpMatchMedium_1K-4      541µs ± 1%     535µs ± 1%   -1.02%  (p=0.000 n=40+38)
RegexpMatchHard_32-4       29.3µs ± 1%    29.1µs ± 1%   -0.51%  (p=0.000 n=40+40)
RegexpMatchHard_1K-4        881µs ± 1%     871µs ± 1%   -1.15%  (p=0.000 n=40+40)
Revcomp-4                  81.7ms ± 2%    67.5ms ± 2%  -17.37%  (p=0.000 n=39+39)
Template-4                  1.05s ± 1%     1.08s ± 2%   +3.67%  (p=0.000 n=40+40)
TimeParse-4                7.24µs ± 1%    7.09µs ± 1%   -2.13%  (p=0.000 n=40+40)
TimeFormat-4               13.2µs ± 1%    13.1µs ± 0%   -0.31%  (p=0.007 n=40+31)
[Geo mean]                  733µs          718µs        -2.03%

name                     old speed      new speed      delta
GobDecode-4              7.28MB/s ± 2%  7.44MB/s ± 2%   +2.23%  (p=0.000 n=39+39)
GobEncode-4              8.52MB/s ± 2%  8.67MB/s ± 1%   +1.70%  (p=0.000 n=40+39)
Gzip-4                   4.65MB/s ± 1%  4.74MB/s ± 1%   +1.94%  (p=0.000 n=37+40)
Gunzip-4                 31.9MB/s ± 1%  32.2MB/s ± 1%   +0.90%  (p=0.000 n=40+36)
JSONEncode-4             7.57MB/s ± 1%  7.98MB/s ± 0%   +5.41%  (p=0.000 n=40+31)
JSONDecode-4             2.12MB/s ± 1%  2.15MB/s ± 1%   +1.23%  (p=0.000 n=40+40)
GoParse-4                1.23MB/s ± 1%  1.23MB/s ± 1%     ~     (p=0.769 n=39+40)
RegexpMatchEasy0_32-4    25.0MB/s ± 1%  25.2MB/s ± 1%   +0.71%  (p=0.000 n=40+40)
RegexpMatchEasy0_1K-4     130MB/s ± 5%   134MB/s ± 4%   +2.53%  (p=0.000 n=38+40)
RegexpMatchEasy1_32-4    24.9MB/s ± 1%  25.1MB/s ± 1%   +0.55%  (p=0.000 n=40+40)
RegexpMatchEasy1_1K-4    98.5MB/s ± 2%  99.4MB/s ± 2%   +0.88%  (p=0.003 n=40+39)
RegexpMatchMedium_32-4    490kB/s ± 0%   490kB/s ± 0%     ~     (all equal)
RegexpMatchMedium_1K-4   1.89MB/s ± 1%  1.91MB/s ± 1%   +1.02%  (p=0.000 n=40+38)
RegexpMatchHard_32-4     1.10MB/s ± 1%  1.10MB/s ± 0%   +0.41%  (p=0.000 n=40+33)
RegexpMatchHard_1K-4     1.16MB/s ± 1%  1.17MB/s ± 1%   +1.21%  (p=0.000 n=40+40)
Revcomp-4                31.1MB/s ± 2%  37.6MB/s ± 2%  +21.03%  (p=0.000 n=39+39)
Template-4               1.86MB/s ± 1%  1.79MB/s ± 1%   -3.51%  (p=0.000 n=40+38)
[Geo mean]               6.66MB/s       6.80MB/s        +2.13%

fixes #21492

Change-Id: Ia26e7ca393f0a5f31de240e8ff9a220453ca7e0d
Reviewed-on: https://go-review.googlesource.com/58450Reviewed-by: default avatarCherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
parent 1fe1512f
......@@ -516,7 +516,7 @@ func ssaGenValue(s *gc.SSAGenState, v *ssa.Value) {
p.To.Type = obj.TYPE_MEM
p.To.Reg = v.Args[0].Reg()
gc.AddAux(&p.To, v)
case ssa.OpARMMOVWloadidx:
case ssa.OpARMMOVWloadidx, ssa.OpARMMOVBUloadidx, ssa.OpARMMOVBloadidx, ssa.OpARMMOVHUloadidx, ssa.OpARMMOVHloadidx:
// this is just shift 0 bits
fallthrough
case ssa.OpARMMOVWloadshiftLL:
......@@ -528,7 +528,7 @@ func ssaGenValue(s *gc.SSAGenState, v *ssa.Value) {
case ssa.OpARMMOVWloadshiftRA:
p := genshift(s, v.Op.Asm(), 0, v.Args[1].Reg(), v.Reg(), arm.SHIFT_AR, v.AuxInt)
p.From.Reg = v.Args[0].Reg()
case ssa.OpARMMOVWstoreidx:
case ssa.OpARMMOVWstoreidx, ssa.OpARMMOVBstoreidx, ssa.OpARMMOVHstoreidx:
// this is just shift 0 bits
fallthrough
case ssa.OpARMMOVWstoreshiftLL:
......
......@@ -499,6 +499,10 @@
(MOVWloadshiftLL ptr idx [c] (MOVWstoreshiftLL ptr2 idx [d] x _)) && c==d && isSamePtr(ptr, ptr2) -> x
(MOVWloadshiftRL ptr idx [c] (MOVWstoreshiftRL ptr2 idx [d] x _)) && c==d && isSamePtr(ptr, ptr2) -> x
(MOVWloadshiftRA ptr idx [c] (MOVWstoreshiftRA ptr2 idx [d] x _)) && c==d && isSamePtr(ptr, ptr2) -> x
(MOVBUloadidx ptr idx (MOVBstoreidx ptr2 idx x _)) && isSamePtr(ptr, ptr2) -> (MOVBUreg x)
(MOVBloadidx ptr idx (MOVBstoreidx ptr2 idx x _)) && isSamePtr(ptr, ptr2) -> (MOVBreg x)
(MOVHUloadidx ptr idx (MOVHstoreidx ptr2 idx x _)) && isSamePtr(ptr, ptr2) -> (MOVHUreg x)
(MOVHloadidx ptr idx (MOVHstoreidx ptr2 idx x _)) && isSamePtr(ptr, ptr2) -> (MOVHreg x)
// fold constant into arithmatic ops
(ADD x (MOVWconst [c])) -> (ADDconst [c] x)
......@@ -1152,13 +1156,31 @@
(MOVWstore [0] {sym} (ADDshiftLL ptr idx [c]) val mem) && sym == nil && !config.nacl -> (MOVWstoreshiftLL ptr idx [c] val mem)
(MOVWstore [0] {sym} (ADDshiftRL ptr idx [c]) val mem) && sym == nil && !config.nacl -> (MOVWstoreshiftRL ptr idx [c] val mem)
(MOVWstore [0] {sym} (ADDshiftRA ptr idx [c]) val mem) && sym == nil && !config.nacl -> (MOVWstoreshiftRA ptr idx [c] val mem)
(MOVBUload [0] {sym} (ADD ptr idx) mem) && sym == nil && !config.nacl -> (MOVBUloadidx ptr idx mem)
(MOVBload [0] {sym} (ADD ptr idx) mem) && sym == nil && !config.nacl -> (MOVBloadidx ptr idx mem)
(MOVBstore [0] {sym} (ADD ptr idx) val mem) && sym == nil && !config.nacl -> (MOVBstoreidx ptr idx val mem)
(MOVHUload [0] {sym} (ADD ptr idx) mem) && sym == nil && !config.nacl -> (MOVHUloadidx ptr idx mem)
(MOVHload [0] {sym} (ADD ptr idx) mem) && sym == nil && !config.nacl -> (MOVHloadidx ptr idx mem)
(MOVHstore [0] {sym} (ADD ptr idx) val mem) && sym == nil && !config.nacl -> (MOVHstoreidx ptr idx val mem)
// constant folding in indexed loads and stores
(MOVWloadidx ptr (MOVWconst [c]) mem) -> (MOVWload [c] ptr mem)
(MOVWloadidx (MOVWconst [c]) ptr mem) -> (MOVWload [c] ptr mem)
(MOVBloadidx ptr (MOVWconst [c]) mem) -> (MOVBload [c] ptr mem)
(MOVBloadidx (MOVWconst [c]) ptr mem) -> (MOVBload [c] ptr mem)
(MOVBUloadidx ptr (MOVWconst [c]) mem) -> (MOVBUload [c] ptr mem)
(MOVBUloadidx (MOVWconst [c]) ptr mem) -> (MOVBUload [c] ptr mem)
(MOVHUloadidx ptr (MOVWconst [c]) mem) -> (MOVHUload [c] ptr mem)
(MOVHUloadidx (MOVWconst [c]) ptr mem) -> (MOVHUload [c] ptr mem)
(MOVHloadidx ptr (MOVWconst [c]) mem) -> (MOVHload [c] ptr mem)
(MOVHloadidx (MOVWconst [c]) ptr mem) -> (MOVHload [c] ptr mem)
(MOVWstoreidx ptr (MOVWconst [c]) val mem) -> (MOVWstore [c] ptr val mem)
(MOVWstoreidx (MOVWconst [c]) ptr val mem) -> (MOVWstore [c] ptr val mem)
(MOVBstoreidx ptr (MOVWconst [c]) val mem) -> (MOVBstore [c] ptr val mem)
(MOVBstoreidx (MOVWconst [c]) ptr val mem) -> (MOVBstore [c] ptr val mem)
(MOVHstoreidx ptr (MOVWconst [c]) val mem) -> (MOVHstore [c] ptr val mem)
(MOVHstoreidx (MOVWconst [c]) ptr val mem) -> (MOVHstore [c] ptr val mem)
(MOVWloadidx ptr (SLLconst idx [c]) mem) -> (MOVWloadshiftLL ptr idx [c] mem)
(MOVWloadidx (SLLconst idx [c]) ptr mem) -> (MOVWloadshiftLL ptr idx [c] mem)
......
......@@ -346,11 +346,17 @@ func init() {
{name: "MOVWloadshiftLL", argLength: 3, reg: gp2load, asm: "MOVW", aux: "Int32"}, // load from arg0 + arg1<<auxInt. arg2=mem
{name: "MOVWloadshiftRL", argLength: 3, reg: gp2load, asm: "MOVW", aux: "Int32"}, // load from arg0 + arg1>>auxInt, unsigned shift. arg2=mem
{name: "MOVWloadshiftRA", argLength: 3, reg: gp2load, asm: "MOVW", aux: "Int32"}, // load from arg0 + arg1>>auxInt, signed shift. arg2=mem
{name: "MOVBUloadidx", argLength: 3, reg: gp2load, asm: "MOVBU"}, // load from arg0 + arg1. arg2=mem
{name: "MOVBloadidx", argLength: 3, reg: gp2load, asm: "MOVB"}, // load from arg0 + arg1. arg2=mem
{name: "MOVHUloadidx", argLength: 3, reg: gp2load, asm: "MOVHU"}, // load from arg0 + arg1. arg2=mem
{name: "MOVHloadidx", argLength: 3, reg: gp2load, asm: "MOVH"}, // load from arg0 + arg1. arg2=mem
{name: "MOVWstoreidx", argLength: 4, reg: gp2store, asm: "MOVW"}, // store arg2 to arg0 + arg1. arg3=mem
{name: "MOVWstoreshiftLL", argLength: 4, reg: gp2store, asm: "MOVW", aux: "Int32"}, // store arg2 to arg0 + arg1<<auxInt. arg3=mem
{name: "MOVWstoreshiftRL", argLength: 4, reg: gp2store, asm: "MOVW", aux: "Int32"}, // store arg2 to arg0 + arg1>>auxInt, unsigned shift. arg3=mem
{name: "MOVWstoreshiftRA", argLength: 4, reg: gp2store, asm: "MOVW", aux: "Int32"}, // store arg2 to arg0 + arg1>>auxInt, signed shift. arg3=mem
{name: "MOVBstoreidx", argLength: 4, reg: gp2store, asm: "MOVB"}, // store arg2 to arg0 + arg1. arg3=mem
{name: "MOVHstoreidx", argLength: 4, reg: gp2store, asm: "MOVH"}, // store arg2 to arg0 + arg1. arg3=mem
{name: "MOVBreg", argLength: 1, reg: gp11, asm: "MOVBS"}, // move from arg0, sign-extended from byte
{name: "MOVBUreg", argLength: 1, reg: gp11, asm: "MOVBU"}, // move from arg0, unsign-extended from byte
......
......@@ -847,10 +847,16 @@ const (
OpARMMOVWloadshiftLL
OpARMMOVWloadshiftRL
OpARMMOVWloadshiftRA
OpARMMOVBUloadidx
OpARMMOVBloadidx
OpARMMOVHUloadidx
OpARMMOVHloadidx
OpARMMOVWstoreidx
OpARMMOVWstoreshiftLL
OpARMMOVWstoreshiftRL
OpARMMOVWstoreshiftRA
OpARMMOVBstoreidx
OpARMMOVHstoreidx
OpARMMOVBreg
OpARMMOVBUreg
OpARMMOVHreg
......@@ -10651,6 +10657,62 @@ var opcodeTable = [...]opInfo{
},
},
},
{
name: "MOVBUloadidx",
argLen: 3,
asm: arm.AMOVBU,
reg: regInfo{
inputs: []inputInfo{
{1, 22527}, // R0 R1 R2 R3 R4 R5 R6 R7 R8 R9 g R12 R14
{0, 4294998015}, // R0 R1 R2 R3 R4 R5 R6 R7 R8 R9 g R12 SP R14 SB
},
outputs: []outputInfo{
{0, 21503}, // R0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R12 R14
},
},
},
{
name: "MOVBloadidx",
argLen: 3,
asm: arm.AMOVB,
reg: regInfo{
inputs: []inputInfo{
{1, 22527}, // R0 R1 R2 R3 R4 R5 R6 R7 R8 R9 g R12 R14
{0, 4294998015}, // R0 R1 R2 R3 R4 R5 R6 R7 R8 R9 g R12 SP R14 SB
},
outputs: []outputInfo{
{0, 21503}, // R0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R12 R14
},
},
},
{
name: "MOVHUloadidx",
argLen: 3,
asm: arm.AMOVHU,
reg: regInfo{
inputs: []inputInfo{
{1, 22527}, // R0 R1 R2 R3 R4 R5 R6 R7 R8 R9 g R12 R14
{0, 4294998015}, // R0 R1 R2 R3 R4 R5 R6 R7 R8 R9 g R12 SP R14 SB
},
outputs: []outputInfo{
{0, 21503}, // R0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R12 R14
},
},
},
{
name: "MOVHloadidx",
argLen: 3,
asm: arm.AMOVH,
reg: regInfo{
inputs: []inputInfo{
{1, 22527}, // R0 R1 R2 R3 R4 R5 R6 R7 R8 R9 g R12 R14
{0, 4294998015}, // R0 R1 R2 R3 R4 R5 R6 R7 R8 R9 g R12 SP R14 SB
},
outputs: []outputInfo{
{0, 21503}, // R0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R12 R14
},
},
},
{
name: "MOVWstoreidx",
argLen: 4,
......@@ -10702,6 +10764,30 @@ var opcodeTable = [...]opInfo{
},
},
},
{
name: "MOVBstoreidx",
argLen: 4,
asm: arm.AMOVB,
reg: regInfo{
inputs: []inputInfo{
{1, 22527}, // R0 R1 R2 R3 R4 R5 R6 R7 R8 R9 g R12 R14
{2, 22527}, // R0 R1 R2 R3 R4 R5 R6 R7 R8 R9 g R12 R14
{0, 4294998015}, // R0 R1 R2 R3 R4 R5 R6 R7 R8 R9 g R12 SP R14 SB
},
},
},
{
name: "MOVHstoreidx",
argLen: 4,
asm: arm.AMOVH,
reg: regInfo{
inputs: []inputInfo{
{1, 22527}, // R0 R1 R2 R3 R4 R5 R6 R7 R8 R9 g R12 R14
{2, 22527}, // R0 R1 R2 R3 R4 R5 R6 R7 R8 R9 g R12 R14
{0, 4294998015}, // R0 R1 R2 R3 R4 R5 R6 R7 R8 R9 g R12 SP R14 SB
},
},
},
{
name: "MOVBreg",
argLen: 1,
......
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment