    math/big: rewrite pure Go implementations to use math/bits · d5edbcac
    Author: Josh Bleecher Snyder
    While we're here, delete addWW_g and subWW_g, per the TODO.
    They are now obsolete.
    
    Benchmarks on amd64 with -tags=math_big_pure_go.
    
    name                old time/op    new time/op     delta
    AddVV/1-8             5.24ns ± 2%     5.12ns ± 1%    -2.11%  (p=0.000 n=82+87)
    AddVV/2-8             6.44ns ± 1%     6.33ns ± 2%    -1.82%  (p=0.000 n=77+82)
    AddVV/3-8             7.89ns ± 8%     6.97ns ± 4%   -11.71%  (p=0.000 n=100+96)
    AddVV/4-8             8.60ns ± 0%     7.72ns ± 4%   -10.24%  (p=0.000 n=90+96)
    AddVV/5-8             10.3ns ± 4%      8.5ns ± 1%   -17.02%  (p=0.000 n=96+91)
    AddVV/10-8            16.2ns ± 5%     12.8ns ± 1%   -21.11%  (p=0.000 n=97+86)
    AddVV/100-8            148ns ± 1%      117ns ± 5%   -21.07%  (p=0.000 n=66+98)
    AddVV/1000-8          1.41µs ± 4%     1.13µs ± 3%   -19.90%  (p=0.000 n=97+97)
    AddVV/10000-8         14.2µs ± 5%     11.2µs ± 1%   -20.82%  (p=0.000 n=99+84)
    AddVV/100000-8         142µs ± 4%      113µs ± 4%   -20.40%  (p=0.000 n=91+92)
    SubVV/1-8             5.29ns ± 1%     5.11ns ± 0%    -3.30%  (p=0.000 n=87+88)
    SubVV/2-8             6.36ns ± 4%     6.33ns ± 2%    -0.56%  (p=0.002 n=98+73)
    SubVV/3-8             7.58ns ± 5%     6.98ns ± 4%    -8.01%  (p=0.000 n=97+91)
    SubVV/4-8             8.61ns ± 3%     7.98ns ± 2%    -7.31%  (p=0.000 n=95+83)
    SubVV/5-8             10.6ns ± 2%      8.5ns ± 1%   -19.56%  (p=0.000 n=79+89)
    SubVV/10-8            16.3ns ± 4%     12.7ns ± 1%   -21.97%  (p=0.000 n=98+82)
    SubVV/100-8            124ns ± 1%      118ns ± 1%    -4.83%  (p=0.000 n=85+81)
    SubVV/1000-8          1.14µs ± 5%     1.12µs ± 2%    -1.17%  (p=0.000 n=97+81)
    SubVV/10000-8         11.6µs ±10%     11.2µs ± 1%    -3.39%  (p=0.000 n=100+84)
    SubVV/100000-8         114µs ± 6%      114µs ± 5%      ~     (p=0.396 n=83+94)
    AddVW/1-8             4.04ns ± 4%     4.34ns ± 4%    +7.57%  (p=0.000 n=96+98)
    AddVW/2-8             4.34ns ± 5%     4.40ns ± 5%    +1.40%  (p=0.000 n=99+98)
    AddVW/3-8             5.43ns ± 0%     5.54ns ± 2%    +1.97%  (p=0.000 n=85+94)
    AddVW/4-8             6.23ns ± 1%     6.18ns ± 2%    -0.66%  (p=0.000 n=77+78)
    AddVW/5-8             6.78ns ± 2%     6.90ns ± 4%    +1.77%  (p=0.000 n=80+99)
    AddVW/10-8            10.5ns ± 4%      9.9ns ± 1%    -5.77%  (p=0.000 n=97+69)
    AddVW/100-8            114ns ± 3%       91ns ± 0%   -20.38%  (p=0.000 n=98+77)
    AddVW/1000-8          1.12µs ± 1%     0.87µs ± 1%   -22.80%  (p=0.000 n=82+68)
    AddVW/10000-8         11.2µs ± 2%      8.5µs ± 5%   -23.85%  (p=0.000 n=85+100)
    AddVW/100000-8         112µs ± 2%       85µs ± 5%   -24.22%  (p=0.000 n=71+96)
    SubVW/1-8             4.09ns ± 2%     4.18ns ± 4%    +2.32%  (p=0.000 n=78+96)
    SubVW/2-8             4.59ns ± 5%     4.52ns ± 7%    -1.54%  (p=0.000 n=98+94)
    SubVW/3-8             5.41ns ±10%     5.55ns ± 1%    +2.48%  (p=0.000 n=100+89)
    SubVW/4-8             6.51ns ± 2%     6.19ns ± 0%    -4.85%  (p=0.000 n=97+81)
    SubVW/5-8             7.25ns ± 3%     6.90ns ± 4%    -4.93%  (p=0.000 n=97+96)
    SubVW/10-8            10.6ns ± 4%      9.8ns ± 2%    -7.32%  (p=0.000 n=95+96)
    SubVW/100-8           90.4ns ± 0%     90.8ns ± 0%    +0.43%  (p=0.000 n=83+78)
    SubVW/1000-8           853ns ± 4%      857ns ± 2%    +0.42%  (p=0.000 n=100+98)
    SubVW/10000-8         8.52µs ± 4%     8.53µs ± 2%      ~     (p=0.061 n=99+97)
    SubVW/100000-8        84.8µs ± 5%     84.2µs ± 2%    -0.78%  (p=0.000 n=99+93)
    AddMulVVW/1-8         8.73ns ± 0%     5.33ns ± 3%   -38.91%  (p=0.000 n=91+96)
    AddMulVVW/2-8         14.8ns ± 3%      6.5ns ± 2%   -56.33%  (p=0.000 n=100+79)
    AddMulVVW/3-8         18.6ns ± 2%      7.8ns ± 5%   -57.84%  (p=0.000 n=89+96)
    AddMulVVW/4-8         24.0ns ± 2%      9.8ns ± 0%   -59.09%  (p=0.000 n=95+67)
    AddMulVVW/5-8         29.0ns ± 2%     11.5ns ± 5%   -60.44%  (p=0.000 n=90+97)
    AddMulVVW/10-8        54.1ns ± 0%     18.8ns ± 1%   -65.37%  (p=0.000 n=82+84)
    AddMulVVW/100-8        508ns ± 2%      165ns ± 4%   -67.62%  (p=0.000 n=72+98)
    AddMulVVW/1000-8      4.96µs ± 3%     1.55µs ± 1%   -68.86%  (p=0.000 n=99+91)
    AddMulVVW/10000-8     50.0µs ± 4%     15.5µs ± 4%   -68.95%  (p=0.000 n=97+97)
    AddMulVVW/100000-8     491µs ± 1%      156µs ± 8%   -68.22%  (p=0.000 n=79+95)
    
    Change-Id: I4c6ae0b4065f371aea8103f6a85d9e9274bf01d0
    Reviewed-on: https://go-review.googlesource.com/c/go/+/164965
    Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Robert Griesemer <gri@golang.org>
arith.go