    runtime: adjust the arm64 memmove and memclr to operate by word as much as they can · 168a51b3
    Michael Hudson-Doyle authored
    Not only is this an obvious optimization:
    
    benchmark                           old MB/s     new MB/s     speedup
    BenchmarkMemmove1-4                 35.35        29.65        0.84x
    BenchmarkMemmove2-4                 63.78        52.53        0.82x
    BenchmarkMemmove3-4                 89.72        73.96        0.82x
    BenchmarkMemmove4-4                 109.94       95.73        0.87x
    BenchmarkMemmove5-4                 127.60       112.80       0.88x
    BenchmarkMemmove6-4                 143.59       126.67       0.88x
    BenchmarkMemmove7-4                 157.90       138.92       0.88x
    BenchmarkMemmove8-4                 167.18       231.81       1.39x
    BenchmarkMemmove9-4                 175.23       252.07       1.44x
    BenchmarkMemmove10-4                165.68       261.10       1.58x
    BenchmarkMemmove11-4                174.43       263.31       1.51x
    BenchmarkMemmove12-4                180.76       267.56       1.48x
    Benchma...
memmove_arm64.s 2.36 KB
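
For readers who want the idea behind "operate by word as much as they can" without reading assembly, here is a minimal Go sketch. It is not the code in memmove_arm64.s. The helper name `moveWords`, the single-buffer-plus-offsets signature, and the use of `encoding/binary` are all assumptions made for illustration. It moves as many 8-byte words as fit and copies the remaining bytes one at a time, choosing the copy direction so that overlapping ranges are handled the way memmove requires. The benchmark table above is consistent with an 8-byte word: sizes 1 through 7 run at 0.82x to 0.88x, while sizes 8 and up run at 1.39x or better.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// moveWords is a hypothetical helper, not part of the Go runtime.
// It copies n bytes inside buf from offset src to offset dst and is
// safe for overlapping ranges. The caller must ensure that src+n and
// dst+n do not exceed len(buf).
func moveWords(buf []byte, dst, src, n int) {
	if n <= 0 || dst == src {
		return
	}
	words := n / 8 // number of full 8-byte words

	if dst < src {
		// Forward copy: full words first, then the leftover bytes.
		for i := 0; i < words; i++ {
			w := binary.LittleEndian.Uint64(buf[src+8*i:])
			binary.LittleEndian.PutUint64(buf[dst+8*i:], w)
		}
		for i := 8 * words; i < n; i++ {
			buf[dst+i] = buf[src+i]
		}
		return
	}

	// Backward copy: leftover bytes at the high end first, then full
	// words in descending order, so no source byte is overwritten
	// before it has been read.
	for i := n - 1; i >= 8*words; i-- {
		buf[dst+i] = buf[src+i]
	}
	for i := words - 1; i >= 0; i-- {
		w := binary.LittleEndian.Uint64(buf[src+8*i:])
		binary.LittleEndian.PutUint64(buf[dst+8*i:], w)
	}
}

func main() {
	buf := []byte("0123456789abcdefghij")
	moveWords(buf, 3, 0, 13) // overlapping move toward higher offsets
	fmt.Println(string(buf))
}
```

Each word is loaded into a local variable before it is stored, so a single word step is safe even when source and destination overlap by fewer than 8 bytes. A move shorter than 8 bytes never enters the word loop and pays only for the extra length bookkeeping, which is one plausible reading of the slowdown in the 1 to 7 byte rows, though the commit message as shown here does not state the cause.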