• Robert Griesemer's avatar
    math/bits: faster Reverse, ReverseBytes · 4498b683
    Robert Griesemer authored
    - moved from: x&m>>k | x&^m<<k to: x&m>>k | x<<k&m
      This permits use of the same constant m twice (*) which may be
      better for machines that can't use large immediate constants
      directly with an AND instruction and have to load them explicitly.
      *) CPUs don't usually have a &^ instruction, so x&^m becomes x&(^m)
    
    - simplified returns
      This improves the generated code because the compiler recognizes
      x>>k | x<<k as ROT when k is the bitsize of x.
    
    The 8-bit versions of these instructions can be significantly faster
    still if they are replaced with table lookups, as long as the table
    is in cache. If the table is not in cache, table-lookup is probably
    slower, hence the choice of an explicit register-only implementation
    for now.
    
    BenchmarkReverse-8            8.50          6.86          -19.29%
    BenchmarkReverse8-8           2.17          1.74          -19.82%
    BenchmarkReverse16-8          2.89          2.34          -19.03%
    BenchmarkReverse32-8          3.55          2.95          -16.90%
    BenchmarkReverse64-8          6.81          5.57          -18.21%
    BenchmarkReverseBytes-8       3.49          2.48          -28.94%
    BenchmarkReverseBytes16-8     0.93          0.62          -33.33%
    BenchmarkReverseBytes32-8     1.55          1.13          -27.10%
    BenchmarkReverseBytes64-8     2.47          2.47          +0.00%
    
    Reverse-8         8.50ns ± 0%  6.86ns ± 0%   ~             (p=1.000 n=1+1)
    Reverse8-8        2.17ns ± 0%  1.74ns ± 0%   ~             (p=1.000 n=1+1)
    Reverse16-8       2.89ns ± 0%  2.34ns ± 0%   ~             (p=1.000 n=1+1)
    Reverse32-8       3.55ns ± 0%  2.95ns ± 0%   ~             (p=1.000 n=1+1)
    Reverse64-8       6.81ns ± 0%  5.57ns ± 0%   ~             (p=1.000 n=1+1)
    ReverseBytes-8    3.49ns ± 0%  2.48ns ± 0%   ~             (p=1.000 n=1+1)
    ReverseBytes16-8  0.93ns ± 0%  0.62ns ± 0%   ~             (p=1.000 n=1+1)
    ReverseBytes32-8  1.55ns ± 0%  1.13ns ± 0%   ~             (p=1.000 n=1+1)
    ReverseBytes64-8  2.47ns ± 0%  2.47ns ± 0%   ~     (all samples are equal)
    
    Change-Id: I0064de8c7e0e568ca7885d6f7064344bef91a06d
    Reviewed-on: https://go-review.googlesource.com/37215
    
    
    Run-TryBot: Robert Griesemer <gri@golang.org>
    Reviewed-by: default avatarMatthew Dempsky <mdempsky@google.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    4498b683
bits.go 7.29 KB