• Chris Metcalf's avatar
    tile: avoid using clocksource_cyc2ns with absolute cycle count · 4df31626
    Chris Metcalf authored
    commit e658a6f1 upstream.
    
    For large values of "mult" and long uptimes, the intermediate
    result of "cycles * mult" can overflow 64 bits.  For example,
    the tile platform calls clocksource_cyc2ns with a 1.2 GHz clock;
    we have mult = 853, and after 208.5 days, we overflow 64 bits.
    
    Since clocksource_cyc2ns() is intended to be used for relative
    cycle counts, not absolute cycle counts, performance is more
    importance than accepting a wider range of cycle values.  So,
    just use mult_frac() directly in tile's sched_clock().
    
    Commit 4cecf6d4 ("sched, x86: Avoid unnecessary overflow
    in sched_clock") by Salman Qazi results in essentially the same
    generated code for x86 as this change does for tile.  In fact,
    a follow-on change by Salman introduced mult_frac() and switched
    to using it, so the C code was largely identical at that point too.
    
    Peter Zijlstra then added mul_u64_u32_shr() and switched x86
    to use it.  This is, in principle, better; by optimizing the
    64x64->64 multiplies to be 32x32->64 multiplies we can potentially
    save some time.  However, the compiler piplines the 64x64->64
    multiplies pretty well, and the conditional branch in the generic
    mul_u64_u32_shr() causes some bubbles in execution, with the
    result that it's pretty much a wash.  If tilegx provided its own
    implementation of mul_u64_u32_shr() without the conditional branch,
    we could potentially save 3 cycles, but that seems like small gain
    for a fair amount of additional build scaffolding; no other platform
    currently provides a mul_u64_u32_shr() override, and tile doesn't
    currently have an <asm/div64.h> header to put the override in.
    
    Additionally, gcc currently has an optimization bug that prevents
    it from recognizing the opportunity to use a 32x32->64 multiply,
    and so the result would be no better than the existing mult_frac()
    until such time as the compiler is fixed.
    
    For now, just using mult_frac() seems like the right answer.
    Signed-off-by: default avatarChris Metcalf <cmetcalf@mellanox.com>
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    4df31626
time.c 8.38 KB