• Denis Vlasenko's avatar
    vsprintf.c: optimizing, part 2: base 10 conversion speedup, v2 · 4277eedd
    Denis Vlasenko authored
    Optimize integer-to-string conversion in vsprintf.c for base 10.  This is
    by far the most used conversion, and in some use cases it impacts
    performance.  For example, top reads /proc/$PID/stat for every process, and
    with 4000 processes decimal conversion alone takes noticeable time.
    
    Using code from
    
    http://www.cs.uiowa.edu/~jones/bcd/decimal.html
    (with permission from the author, Douglas W. Jones)
    
    binary-to-decimal-string conversion is done in groups of five digits at
    once, using only additions/subtractions/shifts (with -O2; -Os throws in
    some multiply instructions).
    
    On i386 arch gcc 4.1.2 -O2 generates ~500 bytes of code.
    
    This patch is run tested. Userspace benchmark/test is also attached.
    I tested it on PIII and AMD64 and new code is generally ~2.5 times
    faster. On AMD64:
    
    # ./vsprintf_verify-O2
    Original decimal conv: .......... 151 ns per iteration
    Patched decimal conv:  .......... 62 ns per iteration
    Testing correctness
    12895992590592 ok...        [Ctrl-C]
    # ./vsprintf_verify-O2
    Original decimal conv: .......... 151 ns per iteration
    Patched decimal conv:  .......... 62 ns per iteration
    Testing correctness
    26025406464 ok...        [Ctrl-C]
    
    More realistic test: top from busybox project was modified to
    report how many us it took to scan /proc (this does not account
    any processing done after that, like sorting process list),
    and then I test it with 4000 processes:
    
    #!/bin/sh
    i=4000
    while test $i != 0; do
        sleep 30 &
        let i--
    done
    busybox top -b -n3 >/dev/null
    
    on unpatched kernel:
    
    top: 4120 processes took 102864 microseconds to scan
    top: 4120 processes took 91757 microseconds to scan
    top: 4120 processes took 92517 microseconds to scan
    top: 4120 processes took 92581 microseconds to scan
    
    on patched kernel:
    
    top: 4120 processes took 75460 microseconds to scan
    top: 4120 processes took 66451 microseconds to scan
    top: 4120 processes took 67267 microseconds to scan
    top: 4120 processes took 67618 microseconds to scan
    
    The speedup comes from much faster generation of /proc/PID/stat
    by sprintf() calls inside the kernel.
    Signed-off-by: default avatarDouglas W Jones <jones@cs.uiowa.edu>
    Signed-off-by: default avatarDenys Vlasenko <vda.linux@googlemail.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    4277eedd
vsprintf.c 22.4 KB