• Mel Gorman's avatar
    sched/fair: Do not re-read ->h_load_next during hierarchical load calculation · 0e9f0245
    Mel Gorman authored
    A NULL pointer dereference bug was reported on a distribution kernel but
    the same issue should be present on mainline kernel. It occured on s390
    but should not be arch-specific.  A partial oops looks like:
    
      Unable to handle kernel pointer dereference in virtual kernel address space
      ...
      Call Trace:
        ...
        try_to_wake_up+0xfc/0x450
        vhost_poll_wakeup+0x3a/0x50 [vhost]
        __wake_up_common+0xbc/0x178
        __wake_up_common_lock+0x9e/0x160
        __wake_up_sync_key+0x4e/0x60
        sock_def_readable+0x5e/0x98
    
    The bug hits any time between 1 hour to 3 days. The dereference occurs
    in update_cfs_rq_h_load when accumulating h_load. The problem is that
    cfq_rq->h_load_next is not protected by any locking and can be updated
    by parallel calls to task_h_load. Depending on the compiler, code may be
    generated that re-reads cfq_rq->h_load_next after the check for NULL and
    then oops when reading se->avg.load_avg. The dissassembly showed that it
    was possible to reread h_load_next after the check for NULL.
    
    While this does not appear to be an issue for later compilers, it's still
    an accident if the correct code is generated. Full locking in this path
    would have high overhead so this patch uses READ_ONCE to read h_load_next
    only once and check for NULL before dereferencing. It was confirmed that
    there were no further oops after 10 days of testing.
    
    As Peter pointed out, it is also necessary to use WRITE_ONCE() to avoid any
    potential problems with store tearing.
    Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
    Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
    Reviewed-by: default avatarValentin Schneider <valentin.schneider@arm.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: <stable@vger.kernel.org>
    Fixes: 68520796 ("sched: Move h_load calculation to task_h_load()")
    Link: https://lkml.kernel.org/r/20190319123610.nsivgf3mjbjjesxb@techsingularity.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
    0e9f0245
fair.c 284 KB