    [PATCH] Time interpolator: Scalability enhancements and high resolution time for IA64 · bd46a4f1
    Christoph Lameter authored
    This has been in the ia64 (and hence -mm) trees for a couple of months.
    
    Changelog:
     * Affects only architectures which define CONFIG_TIME_INTERPOLATION
       (currently only IA64)
     * Genericize time interpolation, make time interpolators easily usable
       and provide instructions on how to use the interpolator for other
       architectures.
     * Provide nanosecond resolution for clock_gettime, with accuracy
       limited only by the time base of the time interpolator.
     * clock_getres() reports resolution of underlying time basis which
       is typically <50ns and may be 1ns on some systems.
     * Make time interpolator self-tuning to limit time jumps
       and to make the interpolators work correctly on systems with
       broken time base specifications.
     * SMP scalability: Make clock_gettime and gettimeofday scale O(1)
       by removing the cmpxchg for most clocks (tested for up to 512 CPUs)
     * IA64: provide asm fastcall that doubles the performance
       of gettimeofday and clock_gettime on SGI and other IA64 systems
       (asm fastcalls scale O(1) together with the scalability fixes).
     * IA64: provide nojitter kernel option so that IA64 systems with
       correctly synchronized ITC counters may also enjoy the
       scalability enhancements.
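
The effect of the clock_gettime and clock_getres changes listed above can be observed from user space. The following is a minimal sketch, assuming a POSIX system; the helper name show_clock is hypothetical and is not part of this patch or of any test tool mentioned here.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: print the current value and the reported
 * resolution of one POSIX clock.
 * Returns 0 on success, -1 if either library call fails. */
int show_clock(clockid_t id, const char *name)
{
    struct timespec now, res;

    assert(name != NULL);

    if (clock_gettime(id, &now) != 0 || clock_getres(id, &res) != 0)
        return -1;

    /* Seconds and nanoseconds are printed as a single decimal value. */
    printf("%25s= %10lld.%09ld resolution= %lld.%09ld\n",
           name, (long long)now.tv_sec, now.tv_nsec,
           (long long)res.tv_sec, res.tv_nsec);
    return 0;
}
```

Calling show_clock(CLOCK_REALTIME, "CLOCK_REALTIME") and show_clock(CLOCK_MONOTONIC, "CLOCK_MONOTONIC") prints one line per clock. The resolution printed is whatever the running kernel reports, so it will differ between systems.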
    
    Performance measurements for single calls (ITC cycles):
    
    A. 4 way Intel IA64 SMP system (kmart)
    
    ITC offsets:
    kmart:/usr/src/noship-tests # dmesg|grep synchr
    CPU 1: synchronized ITC with CPU 0 (last diff 1 cycles, maxerr 417 cycles)
    CPU 2: synchronized ITC with CPU 0 (last diff 2 cycles, maxerr 417 cycles)
    CPU 3: synchronized ITC with CPU 0 (last diff 1 cycles, maxerr 417 cycles)
    
    A.1 Current kernel code
    
    kmart:/usr/src/noship-tests # ./dmt
    gettimeofday cycles: 3737 220 215 215 215 215 215 215 215 215
    clock_gettime(REAL) cycles: 4058 575 564 576 565 566 558 558 558 558
    clock_gettime(MONO) cycles: 1583 621 609 609 609 609 609 609 609 609
    clock_gettime(PROCESS) cycles: 71428 298 259 259 259 259 259 259 259 259
    clock_gettime(THREAD) cycles: 3982 336 290 298 298 298 298 286 286 286
    
    A.2 New code using cmpxchg
    
    kmart:/usr/src/noship-tests # ./dmt
    gettimeofday cycles: 3145 213 216 213 213 213 213 213 213 213
    clock_gettime(REAL) cycles: 3185 230 210 210 210 210 210 210 210 210
    clock_gettime(MONO) cycles: 284 217 217 216 216 216 216 216 216 216
    clock_gettime(PROCESS) cycles: 68857 289 270 259 259 259 259 259 259 259
    clock_gettime(THREAD) cycles: 3862 339 298 298 298 298 290 286 286 286
    
    A.3 New code with cmpxchg switched off (nojitter kernel option)
    
    kmart:/usr/src/noship-tests # ./dmt
    gettimeofday cycles: 3195 219 219 212 212 212 212 212 212 212
    clock_gettime(REAL) cycles: 3003 228 205 205 205 205 205 205 205 205
    clock_gettime(MONO) cycles: 279 209 209 209 208 208 208 208 208 208
    clock_gettime(PROCESS) cycles: 65849 292 259 259 268 270 270 259 259 259
    
    B. SGI SN2 system running 512 IA64 CPUs.
    
    B.1 Current kernel code
    
    [root@ascender noship-tests]# ./dmt
    gettimeofday cycles: 17221 1028 1007 1004 1004 1004 1010 25928 1002 1003
    clock_gettime(REAL) cycles: 10388 1099 1055 1044 1064 1063 1051 1056 1061 1056
    clock_gettime(MONO) cycles: 2363 96 96 96 96 96 96 96 96 96
    clock_gettime(PROCESS) cycles: 46537 804 660 666 666 666 666 666 666 666
    clock_gettime(THREAD) cycles: 10945 727 710 684 685 686 685 686 685 686
    
    B.2 New code
    
    ascender:~/noship-tests # ./dmt
    gettimeofday cycles: 3874 610 588 588 588 588 588 588 588 588
    clock_gettime(REAL) cycles: 3893 612 588 582 588 588 588 588 588 588
    clock_gettime(MONO) cycles: 686 595 595 588 588 588 588 588 588 588
    clock_gettime(PROCESS) cycles: 290759 322 269 269 259 265 265 265 259 259
    clock_gettime(THREAD) cycles: 5153 358 306 298 296 304 290 298 298 298
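
Each row above lists the cost of ten consecutive calls, with the first call noticeably more expensive than the rest. The source of the dmt tool is not shown here, so the following is only a sketch of how such a series could be collected. It assumes a POSIX system and measures in nanoseconds on CLOCK_MONOTONIC rather than in IA64 ITC cycles; the names mono_ns and time_gettimeofday_calls are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Nanoseconds on CLOCK_MONOTONIC. This stands in for the ITC cycle
 * counter, which is specific to IA64 (assumption of this sketch). */
static long long mono_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Time n back-to-back gettimeofday() calls one by one and store the
 * cost of each call, in nanoseconds, in out[0..n-1]. Each sample also
 * includes the overhead of the two mono_ns() reads around the call. */
void time_gettimeofday_calls(long long *out, size_t n)
{
    struct timeval tv;

    assert(out != NULL);

    for (size_t i = 0; i < n; i++) {
        long long start = mono_ns();

        gettimeofday(&tv, NULL);
        out[i] = mono_ns() - start;
    }
}
```

Passing an array of ten elements gives a series comparable in shape to the rows above, although the absolute numbers are in different units and are not comparable to the ITC cycle counts.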
    
    Scalability of time functions (measured as the time taken to make a million calls):
    =======================================================================
    
    A. 4 way Intel IA64 SMP system (kmart)
    A.1 Current code
    
    kmart:/usr/src/noship-tests # ./todscale -n1000000
     CPUS       WALL  WALL/CPUS
        1      0.192      0.192
        2      1.125      0.563
        4      9.229      2.307
    
    A.2 New code using cmpxchg
    
    kmart:/usr/src/noship-tests # ./todscale
     CPUS       WALL  WALL/CPUS
        1      0.188      0.188
        2      0.457      0.229
        4      0.413      0.103
    
    (the 4-CPU measurement occasionally fluctuates, reaching as high as 15.x, for reasons not yet understood)
    
    A.3 New code without cmpxchg (nojitter kernel option)
    
    kmart:/usr/src/noship-tests # ./todscale -n10000000
     CPUS       WALL  WALL/CPUS
        1      0.180      0.180
        2      0.180      0.090
        4      0.252      0.063
    
    B. SGI SN2 system running 512 IA64 CPUs.
    
    The system has a global monotonic clock and therefore has
    no need for compensation. Current code uses a cmpxchg. New
    code has no cmpxchg.
    
    B.1 Current code
    
    ascender:~/noship-tests # ./todscale
     CPUS       WALL  WALL/CPUS
        1      0.850      0.850
        2      1.767      0.884
        4      6.124      1.531
        8     20.777      2.597
       16     57.693      3.606
       32    164.688      5.146
       64    456.647      7.135
      128   1093.371      8.542
      256   2778.257     10.853
    (System crash at 512 CPUs)
    
    B.2 New code
    
    ascender:~/noship-tests # ./todscale -n1000000
     CPUS       WALL  WALL/CPUS
        1      0.426      0.426
        2      0.429      0.215
        4      0.436      0.109
        8      0.452      0.057
       16      0.454      0.028
       32      0.457      0.014
       64      0.459      0.007
      128      0.466      0.004
      256      0.474      0.002
      512      0.518      0.001
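
The WALL column in these tables is the elapsed time for all CPUs to finish their calls, and WALL/CPUS is that time divided by the CPU count. The source of todscale is not shown here, so the following is only a sketch of a comparable measurement, assuming POSIX threads. The names worker and time_parallel_calls are hypothetical, and unlike a real scalability test the threads are not pinned to individual CPUs.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/time.h>
#include <time.h>

/* Each thread issues the requested number of gettimeofday() calls. */
static void *worker(void *arg)
{
    long ncalls = *(const long *)arg;
    struct timeval tv;

    for (long i = 0; i < ncalls; i++)
        gettimeofday(&tv, NULL);
    return NULL;
}

/* Run nthreads threads that each make ncalls calls and return the
 * elapsed wall-clock time in seconds, or -1.0 on failure. The result
 * includes thread creation and join overhead. */
double time_parallel_calls(int nthreads, long ncalls)
{
    struct timespec t0, t1;
    pthread_t *tids;
    int started = 0;

    assert(nthreads > 0 && ncalls >= 0);

    tids = malloc((size_t)nthreads * sizeof *tids);
    if (tids == NULL)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (; started < nthreads; started++)
        if (pthread_create(&tids[started], NULL, worker, &ncalls) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    free(tids);
    if (started != nthreads)
        return -1.0;
    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Dividing the return value by nthreads gives a figure analogous to the WALL/CPUS column. If the time functions scale O(1), the returned wall time should stay roughly flat as nthreads grows up to the number of available CPUs.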
    
    Clock Accuracy
    ==============
    A. 4 CPU SMP system
    
    A.1 Old code
    
    kmart:/usr/src/noship-tests # ./cdisp
              Gettimeofday() = 1092124757.270305000
               CLOCK_REALTIME= 1092124757.270382000 resolution= 0.000976563
              CLOCK_MONOTONIC=         89.696726590 resolution= 0.000976563
     CLOCK_PROCESS_CPUTIME_ID=          0.001242507 resolution= 0.000000001
      CLOCK_THREAD_CPUTIME_ID=          0.001255310 resolution= 0.000000001
    
    A.2 New code
    
    kmart:/usr/src/noship-tests # ./cdisp
              Gettimeofday() = 1092124478.194530000
               CLOCK_REALTIME= 1092124478.194603399 resolution= 0.000000001
              CLOCK_MONOTONIC=         88.198315204 resolution= 0.000000001
     CLOCK_PROCESS_CPUTIME_ID=          0.001241235 resolution= 0.000000001
      CLOCK_THREAD_CPUTIME_ID=          0.001254747 resolution= 0.000000001

    Signed-off-by: Christoph Lameter <clameter@sgi.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
cyclone.c 2.57 KB