• Thomas Gleixner's avatar
    timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion · ca22975a
    Thomas Gleixner authored
    commit 9c164572 upstream.
    
    The clocksource delta to nanoseconds conversion is using signed math, but
    the delta is unsigned. This makes the conversion space smaller than
    necessary and in case of a multiplication overflow the conversion can
    become negative. The conversion is done with scaled math:
    
        s64 nsec_delta = ((s64)clkdelta * clk->mult) >> clk->shift;
    
    Shifting a signed integer right obvioulsy preserves the sign, which has
    interesting consequences:
    
     - Time jumps backwards
    
     - __iter_div_u64_rem() which is used in one of the calling code pathes
       will take forever to piecewise calculate the seconds/nanoseconds part.
    
    This has been reported by several people with different scenarios:
    
    David observed that when stopping a VM with a debugger:
    
     "It was essentially the stopped by debugger case.  I forget exactly why,
      but the guest was being explicitly stopped from outside, it wasn't just
      scheduling lag.  I think it was something in the vicinity of 10 minutes
      stopped."
    
     When lifting the stop the machine went dead.
    
    The stopped by debugger case is not really interesting, but nevertheless it
    would be a good thing not to die completely.
    
    But this was also observed on a live system by Liav:
    
     "When the OS is too overloaded, delta will get a high enough value for the
      msb of the sum delta * tkr->mult + tkr->xtime_nsec to be set, and so
      after the shift the nsec variable will gain a value similar to
      0xffffffffff000000."
    
    Unfortunately this has been reintroduced recently with commit 6bd58f09
    ("time: Add cycles to nanoseconds translation"). It had been fixed a year
    ago already in commit 35a4933a ("time: Avoid signed overflow in
    timekeeping_get_ns()").
    
    Though it's not surprising that the issue has been reintroduced because the
    function itself and the whole call chain uses s64 for the result and the
    propagation of it. The change in this recent commit is subtle:
    
       s64 nsec;
    
    -  nsec = (d * m + n) >> s:
    +  nsec = d * m + n;
    +  nsec >>= s;
    
    d being type of cycle_t adds another level of obfuscation.
    
    This wouldn't have happened if the previous change to unsigned computation
    would have made the 'nsec' variable u64 right away and a follow up patch
    had cleaned up the whole call chain.
    
    There have been patches submitted which basically did a revert of the above
    patch leaving everything else unchanged as signed. Back to square one. This
    spawned a admittedly pointless discussion about potential users which rely
    on the unsigned behaviour until someone pointed out that it had been fixed
    before. The changelogs of said patches added further confusion as they made
    finally false claims about the consequences for eventual users which expect
    signed results.
    
    Despite delta being cycle_t, aka. u64, it's very well possible to hand in
    a signed negative value and the signed computation will happily return the
    correct result. But nobody actually sat down and analyzed the code which
    was added as user after the propably unintended signed conversion.
    
    Though in sensitive code like this it's better to analyze it proper and
    make sure that nothing relies on this than hunting the subtle wreckage half
    a year later. After analyzing all call chains it stands that no caller can
    hand in a negative value (which actually would work due to the s64 cast)
    and rely on the signed math to do the right thing.
    
    Change the conversion function to unsigned math. The conversion of all call
    chains is done in a follow up patch.
    
    This solves the starvation issue, which was caused by the negative result,
    but it does not solve the underlying problem. It merily procrastinates
    it. When the timekeeper update is deferred long enough that the unsigned
    multiplication overflows, then time going backwards is observable again.
    
    It does neither solve the issue of clocksources with a small counter width
    which will wrap around possibly several times and cause random time stamps
    to be generated. But those are usually not found on systems used for
    virtualization, so this is likely a non issue.
    
    I took the liberty to claim authorship for this simply because
    analyzing all callsites and writing the changelog took substantially
    more time than just making the simple s/s64/u64/ change and ignore the
    rest.
    
    Fixes: 6bd58f09 ("time: Add cycles to nanoseconds translation")
    Reported-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
    Reported-by: default avatarLiav Rehana <liavr@mellanox.com>
    Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
    Reviewed-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
    Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Parit Bhargava <prarit@redhat.com>
    Cc: Laurent Vivier <lvivier@redhat.com>
    Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>
    Cc: Chris Metcalf <cmetcalf@mellanox.com>
    Cc: Richard Cochran <richardcochran@gmail.com>
    Cc: John Stultz <john.stultz@linaro.org>
    Link: http://lkml.kernel.org/r/20161208204228.688545601@linutronix.deSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    ca22975a
timekeeping.c 63.7 KB