    fs: implement faster dentry memcmp · 9d55c369
    Nick Piggin authored
    The standard memcmp function on a Westmere system shows up hot in
    profiles of the `git diff` workload (both parallel and single threaded).
    This is likely due to the cost of trapping into microcode, with little
    opportunity to improve memory access in return (a dentry name is
    unlikely to take up more than a cacheline).
    
    So replace it with an open-coded byte comparison. This increases code
    size by 8 bytes in the critical __d_lookup_rcu function, but the
    speedup is huge. Averages over 10 runs of each:
    
    git diff st   user   sys    elapsed  CPU%
    before        1.15   2.57   3.82      97.1
    after         1.14   2.35   3.61      96.8
    
    git diff mt   user   sys    elapsed  CPU%
    before        1.27   3.85   1.46     349
    after         1.26   3.54   1.43     333
    
    Elapsed time for single threaded git diff at 95.0% confidence:
            -0.21  +/- 0.01
            -5.45% +/- 0.24%
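
As a rough cross-check of the figures above, the relative change can be recomputed from the rounded table means. This is only a sketch: `relative_change_pct` is a hypothetical helper, not part of the patch, and it works from the two-decimal averages rather than the raw per-run samples, so it cannot reproduce the confidence interval and will differ slightly from the reported -5.45%.

```c
/*
 * Relative change from `before` to `after`, in percent.
 * Negative values mean `after` is smaller (faster).
 * Hypothetical helper for illustration only.
 */
static double relative_change_pct(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

With the single threaded elapsed means, `relative_change_pct(3.82, 3.61)` gives roughly -5.5%, in line with the reported value; the small gap is presumably rounding in the table.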
    
    It's -0.66% +/- 0.06% elapsed time on my Opteron, so rep cmp costs on the
    fam10h seem to be relatively smaller, but there is still a win.
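
The open-coded byte comparison described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the name `name_cmp_sketch` and its signature are assumptions chosen for the example. Like memcmp used as an equality test, it returns 0 when the two names match and nonzero otherwise.

```c
#include <stddef.h>

/*
 * Illustrative open-coded name comparison (hypothetical name and
 * signature, not taken from the patch).
 *
 * Returns 0 if both names have the same length and the same bytes,
 * 1 otherwise. A plain byte loop avoids the startup cost of a string
 * instruction, which matters for short names.
 */
static inline int name_cmp_sketch(const unsigned char *cs, size_t scount,
                                  const unsigned char *ct, size_t tcount)
{
    /* Names of different length can never match. */
    if (scount != tcount)
        return 1;

    /* Compare one byte at a time, stopping at the first mismatch. */
    while (tcount--) {
        if (*cs++ != *ct++)
            return 1;
    }
    return 0;
}
```

Only equality is needed for a name lookup, so the sketch does not preserve the ordering result (less than or greater than) that memcmp provides.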
    
    Signed-off-by: Nick Piggin <npiggin@kernel.dk>
dcache.c 77.7 KB