• Rafael J. Wysocki's avatar
    x86 / hibernate: Use hlt_play_dead() when resuming from hibernation · 72b57e4e
    Rafael J. Wysocki authored
    BugLink: https://bugs.launchpad.net/bugs/1686061
    
    On Intel hardware, native_play_dead() uses mwait_play_dead() by
    default and only falls back to the other methods if that fails.
    That also happens during resume from hibernation, when the restore
    (boot) kernel runs disable_nonboot_cpus() to take all of the CPUs
    except for the boot one offline.
    
    However, that is problematic, because the address passed to
    __monitor() in mwait_play_dead() is likely to be written to in the
    last phase of hibernate image restoration and that causes the "dead"
    CPU to start executing instructions again.  Unfortunately, the page
    containing the address in that CPU's instruction pointer may not be
    valid any more at that point.
    
    First, that page may have been overwritten with image kernel memory
    contents already, so the instructions the CPU attempts to execute may
    simply be invalid.  Second, the page tables previously used by that
    CPU may have been overwritten by image kernel memory contents, so the
    address in its instruction pointer is impossible to resolve then.
    
    A report from Varun Koyyalagunta and investigation carried out by
    Chen Yu show that the latter sometimes happens in practice.
    
    To prevent it from happening, temporarily change the smp_ops.play_dead
    pointer during resume from hibernation so that it points to a special
    "play dead" routine which uses hlt_play_dead() and avoids the
    inadvertent "revivals" of "dead" CPUs this way.
    
    A slightly unpleasant consequence of this change is that if the
    system is hibernated with one or more CPUs offline, it will generally
    draw more power after resume than it did before hibernation, because
    the physical state entered by CPUs via hlt_play_dead() is higher-power
    than the mwait_play_dead() one in the majority of cases.  It is
    possible to work around this, but it is unclear how much of a problem
    that's going to be in practice, so the workaround will be implemented
    later if it turns out to be necessary.
    
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=106371Reported-by: default avatarVarun Koyyalagunta <cpudebug@centtech.com>
    Original-by: default avatarChen Yu <yu.c.chen@intel.com>
    Tested-by: default avatarChen Yu <yu.c.chen@intel.com>
    Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
    Acked-by: default avatarIngo Molnar <mingo@kernel.org>
    (cherry picked from commit 406f992e)
    Signed-off-by: default avatarKai-Heng Feng <kai.heng.feng@canonical.com>
    Acked-by: default avatarColin Ian King <colin.king@canonical.com>
    Acked-by: default avatarKamal Mostafa <kamal@canonical.com>
    Signed-off-by: default avatarThadeu Lima de Souza Cascardo <cascardo@canonical.com>
    72b57e4e
smpboot.c 40.9 KB