    x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic · 59107e2f
    Vitaly Kuznetsov authored
    There is a feature in Hyper-V ('Debug-VM --InjectNonMaskableInterrupt')
    which injects an NMI into the guest. We may want to crash the guest and do
    kdump on this NMI by enabling unknown_nmi_panic. To make kdump succeed we
    need to allow the kdump kernel to re-establish the VMBus connection so it
    will see VMBus devices (storage, network, ...).
    
    To unload VMBus properly, so that the kdump kernel can start over, we
    need to do the following:
    
     - Send an 'unload' message to the hypervisor. This can be done on any CPU
       so we do this on the crashing CPU.
    
     - Receive the 'unload finished' reply message. WS2012R2 delivers this
       message to the CPU which was used to establish VMBus connection during
       module load and this CPU may differ from the CPU sending 'unload'.
    
    Receiving a VMBus message means the following:
    
     - There is a per-CPU slot in memory for one message. This slot can in
       theory be accessed by any CPU.
    
     - We get an interrupt on the CPU when a message is placed into the slot.
    
     - When we read the message we need to clear the slot and signal the fact
       to the hypervisor. If more messages are pending for this CPU, the
       hypervisor will then deliver the next one. The signaling is done by
       writing to an MSR so this can only be done on the appropriate CPU.
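
The receive steps above can be modelled in user space. This is only a sketch, not the VMBus driver code: NR_CPUS, MSG_NONE, struct cpu_slot, hv_post(), signal_eom() and read_message() are hypothetical names, and the hypervisor's pending messages are assumed to sit in a small per-CPU queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS     4    /* hypothetical CPU count for this model */
#define MSG_NONE    0u   /* hypothetical "slot is empty" marker */
#define MAX_PENDING 8    /* hypothetical queue depth */

/* One message slot per CPU plus the messages still held by the hypervisor. */
struct cpu_slot {
    uint32_t msg_type;              /* MSG_NONE when the slot is empty */
    uint32_t pending[MAX_PENDING];  /* not yet delivered to the slot */
    int npending;
};

static struct cpu_slot slots[NR_CPUS];

/* Hypervisor side: place the message in the slot, or queue it if busy. */
static void hv_post(int cpu, uint32_t type)
{
    struct cpu_slot *s = &slots[cpu];

    assert(cpu >= 0 && cpu < NR_CPUS);
    if (s->msg_type == MSG_NONE) {
        s->msg_type = type;
    } else {
        assert(s->npending < MAX_PENDING);
        s->pending[s->npending++] = type;
    }
}

/*
 * Stand-in for the MSR write: it only has an effect when issued on the
 * CPU that owns the slot. Returns false when issued from another CPU.
 */
static bool signal_eom(int this_cpu, int slot_cpu)
{
    struct cpu_slot *s = &slots[slot_cpu];

    if (this_cpu != slot_cpu)
        return false;
    if (s->msg_type == MSG_NONE && s->npending > 0) {
        s->msg_type = s->pending[0];
        for (int i = 1; i < s->npending; i++)
            s->pending[i - 1] = s->pending[i];
        s->npending--;
    }
    return true;
}

/* Guest side: any CPU can read and clear a slot, but only the owner can
 * make the hypervisor deliver the next pending message. */
static uint32_t read_message(int this_cpu, int slot_cpu)
{
    struct cpu_slot *s = &slots[slot_cpu];
    uint32_t type = s->msg_type;

    if (type == MSG_NONE)
        return MSG_NONE;
    s->msg_type = MSG_NONE;          /* clear the slot */
    signal_eom(this_cpu, slot_cpu);  /* no effect from a foreign CPU */
    return type;
}
```

In this model, reading a slot from its owning CPU pulls the next queued message into the slot, while reading it from another CPU leaves the queued message undelivered.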
    
    To avoid doing cross-CPU work on crash we have the vmbus_wait_for_unload()
    function, which checks the message slots of all CPUs in a loop, waiting
    for the 'unload finished' message. However, there is an issue which arises
    when these conditions are met:
    
     - We're crashing on a CPU which is different from the one which was used
       to initially contact the hypervisor.
    
     - The CPU which was used for the initial contact is blocked with interrupts
       disabled and there is a message pending in the message slot.
    
    In this case we won't be able to read the 'unload finished' message on the
    crashing CPU. This is reproducible when we receive unknown NMIs on all CPUs
    simultaneously: the first CPU entering panic() will proceed to crash and
    all other CPUs will stop themselves with interrupts disabled.
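
The failure mode can be illustrated with a small user-space model of the polling loop. This is a sketch under assumptions, not the body of vmbus_wait_for_unload(): struct slot, the MSG_* values and wait_for_unload() are hypothetical, each slot is assumed to have at most one message queued behind it, and the loop is bounded by max_rounds only so that the model terminates.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4  /* hypothetical CPU count for this model */

/* Hypothetical message types used only by this model. */
enum { MSG_NONE = 0, MSG_OTHER = 1, MSG_UNLOAD_RESPONSE = 2 };

struct slot {
    int current;  /* message visible in the per-CPU slot */
    int queued;   /* next message, still held by the hypervisor */
};

/*
 * Poll every CPU's slot from the crashing CPU. A slot can be read and
 * cleared from any CPU, but the next message is delivered only when the
 * slot's owner signals end-of-message, so a queued message on a foreign,
 * blocked CPU never shows up.
 */
static bool wait_for_unload(struct slot slots[NR_CPUS], int crashing_cpu,
                            int max_rounds)
{
    assert(crashing_cpu >= 0 && crashing_cpu < NR_CPUS);

    for (int round = 0; round < max_rounds; round++) {
        for (int cpu = 0; cpu < NR_CPUS; cpu++) {
            int type = slots[cpu].current;

            if (type == MSG_NONE)
                continue;

            slots[cpu].current = MSG_NONE;
            if (cpu == crashing_cpu) {
                /* Owner signalled: hypervisor delivers the queued message. */
                slots[cpu].current = slots[cpu].queued;
                slots[cpu].queued = MSG_NONE;
            }
            if (type == MSG_UNLOAD_RESPONSE)
                return true;
        }
    }
    return false;  /* the reply never became visible */
}
```

With another message occupying CPU 0's slot and the 'unload finished' reply queued behind it, polling from CPU 1 never sees the reply, while polling from CPU 0 does.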
    
    The suggested solution is to handle unknown NMIs for Hyper-V guests only
    on the first CPU which gets them. This will allow us to rely on the VMBus
    interrupt handler being able to receive the 'unload finished' message in
    case it is delivered to a different CPU.
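
One way to express "first CPU only" is an atomic compare-and-exchange on a shared owner variable. The following is a user-space sketch with C11 atomics and is not presented as the code of this patch: on_unknown_nmi(), nmi_cpu and unknown_nmi_panic_enabled are hypothetical names, and the NMI_DONE/NMI_HANDLED enum is defined locally for the model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Local stand-ins for the handler return codes used by this model. */
enum nmi_result { NMI_DONE = 0, NMI_HANDLED = 1 };

/* -1 means no CPU has claimed the unknown NMI yet. */
static atomic_int nmi_cpu = -1;

/* Hypothetical stand-in for the unknown_nmi_panic setting. */
static bool unknown_nmi_panic_enabled = true;

/*
 * Called on each CPU that receives an unknown NMI. Only the first caller
 * wins the compare-and-exchange and lets the NMI fall through to the
 * default handling (which would panic); every later caller reports the
 * NMI as handled, so it keeps running with interrupts enabled.
 */
static enum nmi_result on_unknown_nmi(int this_cpu)
{
    int expected = -1;

    assert(this_cpu >= 0);

    if (!unknown_nmi_panic_enabled)
        return NMI_DONE;

    if (!atomic_compare_exchange_strong(&nmi_cpu, &expected, this_cpu))
        return NMI_HANDLED;  /* another CPU already claimed it */

    return NMI_DONE;         /* first CPU: proceed to the panic path */
}
```

Because the losing CPUs do not enter panic() in this scheme, they are not stopped with interrupts disabled and can still take the VMBus interrupt for the 'unload finished' message.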
    
    The issue is not reproducible on WS2016, as Debug-VM delivers the NMI to
    the boot CPU only. WS2012R2 and earlier Hyper-V versions are affected.
    
    Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
    Acked-by: K. Y. Srinivasan <kys@microsoft.com>
    Cc: devel@linuxdriverproject.org
    Cc: Haiyang Zhang <haiyangz@microsoft.com>
    Link: http://lkml.kernel.org/r/20161202100720.28121-1-vkuznets@redhat.com
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
mshyperv.c 5.73 KB