• Carlo Alberto Ferraris's avatar
    bytes: make bytes.Buffer cache-friendly · 55310403
    Carlo Alberto Ferraris authored
    During benchmark of an internal tool we found out that (*Buffer).Reset() was
    surprisingly showing up in CPU profiles.
    
    This CL contains two related changes aimed at speeding up Reset():
    1. Create a fast path for Truncate(0) by moving the logic to Reset()
       (this makes Reset() a simple leaf func that gets inlined since it
       gets compiled to 3 MOVx instructions). Accordingly change calls in
       the rest of the Buffer methods to call Reset() instead of Truncate(0).
    2. Reorder the fields in the Buffer struct so that frequently accessed
       fields are packed together (buf, off, lastRead). This also make them
       likely to be in the same cacheline.
    
    Ideally it would be advisable to have Buffer{} cacheline-aligned, but I
    couldn't find a way to do this without changing the size of the bootstrap
    array (but this will cause some regressions, because it will make duffcopy
    show up in CPU profiles where it wasn't showing up before).
    
    go1 benchmarks are not really affected, but some other benchmarks that
    exercise Buffer more show improvements:
    
    name                     old time/op    new time/op    delta
    BinaryTree17-4              2.46s ± 9%     2.43s ± 3%    ~     (p=0.982 n=14+14)
    Fannkuch11-4                2.98s ± 1%     2.90s ± 1%  -2.58%  (p=0.000 n=15+14)
    FmtFprintfEmpty-4          45.2ns ± 1%    45.2ns ± 1%    ~     (p=0.494 n=14+15)
    FmtFprintfString-4         76.8ns ± 1%    83.1ns ± 2%  +8.23%  (p=0.000 n=10+15)
    FmtFprintfInt-4            78.0ns ± 2%    74.6ns ± 1%  -4.46%  (p=0.000 n=15+15)
    FmtFprintfIntInt-4          113ns ± 1%     109ns ± 2%  -2.91%  (p=0.000 n=13+15)
    FmtFprintfPrefixedInt-4     152ns ± 2%     143ns ± 2%  -6.04%  (p=0.000 n=15+14)
    FmtFprintfFloat-4           224ns ± 1%     222ns ± 2%  -1.08%  (p=0.001 n=15+14)
    FmtManyArgs-4               464ns ± 2%     463ns ± 2%    ~     (p=0.303 n=14+15)
    GobDecode-4                6.25ms ± 2%    6.32ms ± 3%  +1.20%  (p=0.002 n=14+14)
    GobEncode-4                5.41ms ± 2%    5.41ms ± 2%    ~     (p=0.967 n=15+15)
    Gzip-4                      215ms ± 2%     218ms ± 2%  +1.35%  (p=0.002 n=15+15)
    Gunzip-4                   34.3ms ± 2%    34.2ms ± 2%    ~     (p=0.539 n=15+15)
    HTTPClientServer-4         76.4µs ± 2%    75.4µs ± 1%  -1.31%  (p=0.000 n=15+15)
    JSONEncode-4               14.7ms ± 2%    14.6ms ± 3%    ~     (p=0.094 n=14+14)
    JSONDecode-4               48.0ms ± 1%    48.5ms ± 1%  +0.92%  (p=0.001 n=14+12)
    Mandelbrot200-4            4.04ms ± 2%    4.06ms ± 1%    ~     (p=0.108 n=15+13)
    GoParse-4                  2.99ms ± 2%    3.00ms ± 1%    ~     (p=0.130 n=15+13)
    RegexpMatchEasy0_32-4      78.3ns ± 1%    79.5ns ± 1%  +1.51%  (p=0.000 n=15+14)
    RegexpMatchEasy0_1K-4       185ns ± 1%     186ns ± 1%  +0.76%  (p=0.005 n=15+15)
    RegexpMatchEasy1_32-4      79.0ns ± 2%    76.7ns ± 1%  -2.87%  (p=0.000 n=14+15)
    
    name                     old speed      new speed      delta
    GobDecode-4               123MB/s ± 2%   121MB/s ± 3%  -1.18%  (p=0.002 n=14+14)
    GobEncode-4               142MB/s ± 2%   142MB/s ± 1%    ~     (p=0.959 n=15+15)
    Gzip-4                   90.3MB/s ± 2%  89.1MB/s ± 2%  -1.34%  (p=0.002 n=15+15)
    Gunzip-4                  565MB/s ± 2%   567MB/s ± 2%    ~     (p=0.539 n=15+15)
    JSONEncode-4              132MB/s ± 2%   133MB/s ± 3%    ~     (p=0.091 n=14+14)
    JSONDecode-4             40.4MB/s ± 1%  40.0MB/s ± 1%  -0.92%  (p=0.001 n=14+12)
    GoParse-4                19.4MB/s ± 2%  19.3MB/s ± 1%    ~     (p=0.121 n=15+13)
    RegexpMatchEasy0_32-4     409MB/s ± 1%   403MB/s ± 1%  -1.47%  (p=0.000 n=15+14)
    RegexpMatchEasy0_1K-4    5.53GB/s ± 1%  5.49GB/s ± 1%  -0.86%  (p=0.002 n=15+15)
    RegexpMatchEasy1_32-4     405MB/s ± 2%   417MB/s ± 1%  +2.94%  (p=0.000 n=14+15)
    
    name                old time/op  new time/op  delta
    PoolsSingle1K-4     34.9ns ± 2%  30.4ns ± 4%  -12.80%  (p=0.000 n=15+15)
    PoolsSingle64K-4    36.9ns ± 1%  34.4ns ± 4%   -6.72%  (p=0.000 n=14+15)
    PoolsRandomSmall-4  34.8ns ± 3%  29.5ns ± 1%  -15.19%  (p=0.000 n=15+14)
    PoolsRandomLarge-4  38.6ns ± 1%  34.3ns ± 3%  -11.17%  (p=0.000 n=14+15)
    PoolSingle1K-4      26.1ns ± 1%  21.2ns ± 2%  -18.59%  (p=0.000 n=15+14)
    PoolSingle64K-4     26.7ns ± 2%  21.5ns ± 2%  -19.72%  (p=0.000 n=15+15)
    MakeSingle1K-4      24.2ns ± 2%  24.3ns ± 3%     ~     (p=0.132 n=13+15)
    MakeSingle64K-4     6.76µs ± 1%  6.96µs ± 5%   +2.94%  (p=0.002 n=13+13)
    MakeRandomSmall-4    531ns ± 4%   538ns ± 5%     ~     (p=0.066 n=14+15)
    MakeRandomLarge-4    152µs ± 0%   152µs ± 1%   -0.31%  (p=0.001 n=14+13)
    
    Change-Id: I86d7d9d2cac65335baf62214fbb35ba0fd8f9528
    Reviewed-on: https://go-review.googlesource.com/37416
    Run-TryBot: Ian Lance Taylor <iant@golang.org>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: default avatarIan Lance Taylor <iant@golang.org>
    55310403
buffer.go 13.5 KB