    KVM: nVMX: Synthesize TRIPLE_FAULT for L2 if emulation is required · cd0e615c
    Sean Christopherson authored
    
    
    Synthesize a triple fault if L2 guest state is invalid at the time of
    VM-Enter, which can happen if L1 modifies SMRAM or if userspace stuffs
    guest state via ioctls(), e.g. KVM_SET_SREGS.  KVM should never emulate
    invalid guest state, since from L1's perspective, it's architecturally
    impossible for L2 to have invalid state while L2 is running in hardware.
    E.g. attempts to set CR0 or CR4 to unsupported values will either VM-Exit
    or #GP.
    
    Modifying vCPU state via RSM+SMRAM and ioctl() are the only paths that
    can trigger this scenario, as nested VM-Enter correctly rejects any
    attempt to enter L2 with invalid state.
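
    The resulting policy can be sketched as a small decision function.  This
    is only an illustrative model, not the actual patch: the enum, the
    struct, and pick_action() are hypothetical names, and the "emulate for
    L1" branch is an assumption about the non-nested case that this message
    does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; these are not KVM's real types. */
enum exit_action {
	ACTION_ENTER_GUEST,        /* state is valid, run the guest in hardware */
	ACTION_EMULATE,            /* assumed non-nested behavior: emulate */
	ACTION_SYNTH_TRIPLE_FAULT, /* L2: reflect a triple fault to L1 */
};

struct vcpu_model {
	bool guest_mode;         /* true while the vCPU is running L2 */
	bool emulation_required; /* true if guest state is invalid at VM-Enter */
};

/* Decide what to do at VM-Enter time for the modeled vCPU. */
static enum exit_action pick_action(const struct vcpu_model *v)
{
	assert(v != NULL);

	if (!v->emulation_required)
		return ACTION_ENTER_GUEST;

	/* Invalid L2 state is never emulated; L1 sees a triple fault. */
	return v->guest_mode ? ACTION_SYNTH_TRIPLE_FAULT : ACTION_EMULATE;
}
```

    In this model, an L2 vCPU with invalid state always maps to
    ACTION_SYNTH_TRIPLE_FAULT, regardless of whether the state was corrupted
    via RSM+SMRAM or via an ioctl().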
    
    RSM is a straightforward case as (a) KVM follows AMD's SMRAM layout and
    behavior, and (b) Intel's SDM states that loading reserved CR0/CR4 bits
    via RSM results in shutdown, i.e. there is precedent for KVM's behavior.
    Following AMD's SMRAM layout is important as AMD's layout saves/restores
    the descriptor cache information, including CS.RPL and SS.RPL, and also
    defines all the fields relevant to invalid guest state as read-only, i.e.
    so long as the vCPU had valid state before the SMI, which is guaranteed
    for L2, RSM will generate valid state unless SMRAM was modified.  Intel's
    layout saves/restores only the selector, which means that scenarios where
    the selector and cached RPL don't match, e.g. conforming code segments,
    would yield invalid guest state.  Intel CPUs fudge around this issue by
    stuffing SS.RPL and CS.RPL on RSM.  Per Intel's SDM on the "Default
    Treatment of RSM", paraphrasing for brevity:
    
      IF internal storage indicates that the [CPU was post-VMXON]
      THEN
         enter VMX operation (root or non-root);
         restore VMX-critical state as defined in Section 34.14.1;
         set to their fixed values any bits in CR0 and CR4 whose values must
         be fixed in VMX operation [unless coming from an unrestricted guest];
         IF RFLAGS.VM = 0 AND (in VMX root operation OR the
            “unrestricted guest” VM-execution control is 0)
         THEN
           CS.RPL := SS.DPL;
           SS.RPL := SS.DPL;
         FI;
         restore current VMCS pointer;
      FI;
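
    The RPL stuffing in the pseudocode above can be modeled in C as follows.
    This is a sketch under stated assumptions: struct rsm_model, struct
    seg_model, and rsm_fixup_rpl() are hypothetical names made up for
    illustration, and only the RPL step is modeled (the RPL being bits 1:0
    of the selector).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a segment register: visible selector + cached DPL. */
struct seg_model {
	uint16_t selector;
	uint8_t dpl;
};

/* Hypothetical model of the state consulted by the RSM pseudocode. */
struct rsm_model {
	bool rflags_vm;          /* RFLAGS.VM */
	bool vmx_root;           /* in VMX root operation */
	bool unrestricted_guest; /* "unrestricted guest" VM-execution control */
	struct seg_model cs, ss;
};

/* Replace the RPL (selector bits 1:0) with the given privilege level. */
static uint16_t set_rpl(uint16_t selector, uint8_t level)
{
	return (uint16_t)((selector & ~0x3u) | (level & 0x3u));
}

/* Model of: CS.RPL := SS.DPL; SS.RPL := SS.DPL; under the SDM's condition. */
static void rsm_fixup_rpl(struct rsm_model *s)
{
	assert(s != NULL);

	if (!s->rflags_vm && (s->vmx_root || !s->unrestricted_guest)) {
		s->cs.selector = set_rpl(s->cs.selector, s->ss.dpl);
		s->ss.selector = set_rpl(s->ss.selector, s->ss.dpl);
	}
}
```

    For example, a CS selector of 0x0B (RPL 3) paired with SS.DPL 0 becomes
    0x08 in this model, which is how a selector/cached-RPL mismatch gets
    papered over on RSM.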
    
    Note that Intel CPUs also overwrite the fixed CR0/CR4 bits, whereas KVM
    will synthesize TRIPLE_FAULT in this scenario.  KVM's behavior is allowed
    as both Intel and AMD define CR0/CR4 SMRAM fields as read-only, i.e. the
    only way for CR0 and/or CR4 to have illegal values is if they were
    modified by the L1 SMM handler, and Intel's SDM "SMRAM State Save Map"
    section states "modifying these registers will result in unpredictable
    behavior".
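
    To make the contrast concrete, the two treatments of the fixed CR0/CR4
    bits can be sketched as below.  The helper names are hypothetical, and
    the fixed0/fixed1 parameters follow the usual VMX convention (bits set
    in fixed0 must be 1, bits clear in fixed1 must be 0); any concrete
    values passed in are illustrative assumptions, not values read from
    hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Check a control register value against the fixed-bit masks: every bit
 * set in fixed0 must be set in val, and every bit clear in fixed1 must be
 * clear in val.
 */
static bool cr_fixed_bits_valid(uint64_t val, uint64_t fixed0, uint64_t fixed1)
{
	/* A bit cannot be required to be both 1 and 0. */
	assert((fixed0 & ~fixed1) == 0);

	return (val & fixed0) == fixed0 && (val & ~fixed1) == 0;
}

/* Model of the Intel CPU behavior: overwrite the fixed bits on RSM. */
static uint64_t cr_force_fixed_bits(uint64_t val, uint64_t fixed0,
				    uint64_t fixed1)
{
	assert((fixed0 & ~fixed1) == 0);

	return (val | fixed0) & fixed1;
}

/* Model of the KVM behavior described here: invalid value => shutdown. */
static bool rsm_causes_triple_fault(uint64_t val, uint64_t fixed0,
				    uint64_t fixed1)
{
	return !cr_fixed_bits_valid(val, fixed0, fixed1);
}
```

    With illustrative masks fixed0 = 0x80000021 and fixed1 = 0xffffffff, a
    CR0 value of 0x11 would be silently rewritten to 0x80000031 by the
    overwrite model, whereas the triple-fault model reports it as fatal.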
    
    KVM's ioctl() behavior is less straightforward.  Because KVM allows
    ioctls() to be executed in any order, rejecting an ioctl() if it would
    result in invalid L2 guest state is not an option as KVM cannot know if
    a future ioctl() would resolve the invalid state, e.g. KVM_SET_SREGS, or
    drop the vCPU out of L2, e.g. KVM_SET_NESTED_STATE.  Ideally, KVM would
    reject KVM_RUN if L2 contained invalid guest state, but that carries the
    risk of a false positive, e.g. if RSM loaded invalid guest state and KVM
    exited to userspace.  Setting a flag/request to detect such a scenario is
    undesirable because (a) it's extremely unlikely to add value to KVM as a
    whole, and (b) KVM would need to consider ioctl() interactions with such
    a flag, e.g. if userspace migrated the vCPU while the flag was set.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20211207193006.120997-3-seanjc@google.com>
    Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
vmx.c 225 KB