1. 21 Apr, 2020 23 commits
  2. 20 Apr, 2020 10 commits
  3. 15 Apr, 2020 7 commits
    • KVM: x86: Export kvm_propagate_fault() (as kvm_inject_emulated_page_fault) · 53b3d8e9
      Sean Christopherson authored
      Export the page fault propagation helper so that VMX can use it to
      correctly emulate TLB invalidation on page faults in an upcoming patch.
      
      In the (hopefully) not-too-distant future, SGX virtualization will also
      want access to the helper for injecting page faults to the correct level
      (L1 vs. L2) when emulating ENCLS instructions.
      
      Rename the function to kvm_inject_emulated_page_fault() to clarify that
      it is (a) injecting a fault and (b) only for page faults.  WARN if it's
      invoked with an exception other than PF_VECTOR.

      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200320212833.3507-6-sean.j.christopherson@intel.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Invalidate all roots when emulating INVVPID without EPT · d6e3f838
      Junaid Shahid authored
      Free all roots when emulating INVVPID for L1 and EPT is disabled, as
      outstanding changes to the page tables managed by L1 need to be
      recognized.  Because L1 and L2 share an MMU when EPT is disabled, and
      because VPID is not tracked by the MMU role, all roots in the current
      MMU (root_mmu) need to be freed, otherwise a future nested VM-Enter or
      VM-Exit could do a fast CR3 switch (without a flush/sync) and consume
      stale SPTEs.
      
      Fixes: 5c614b35 ("KVM: nVMX: nested VPID emulation")
      Signed-off-by: Junaid Shahid <junaids@google.com>
      [sean: ported to upstream KVM, reworded the comment and changelog]
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200320212833.3507-5-sean.j.christopherson@intel.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Invalidate all EPTP contexts when emulating INVEPT for L1 · f8aa7e39
      Sean Christopherson authored
      Free all L2 (guest_mmu) roots when emulating INVEPT for L1.  Outstanding
      changes to the EPT tables managed by L1 need to be recognized, and
      relying on KVM to always flush L2's EPTP context on nested VM-Enter is
      dangerous.
      
      Similar to handle_invpcid(), rely on kvm_mmu_free_roots() to do a remote
      TLB flush if necessary, e.g. if L1 has never entered L2 then there is
      nothing to be done.
      
      Nuking all L2 roots is overkill for the single-context variant, but it's
      the safe and easy bet.  A more precise zap mechanism will be added in
      the future.  Add a TODO to call out that KVM only needs to invalidate
      affected contexts.
      
      Fixes: 14c07ad8 ("x86/kvm/mmu: introduce guest_mmu")
      Reported-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200320212833.3507-4-sean.j.christopherson@intel.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
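      The "free every L2 root, flush only if something was actually freed"
      behavior described in this changelog can be illustrated with a toy
      model.  This is a minimal sketch, not KVM code: struct toy_mmu,
      toy_load_root() and toy_free_all_roots() are hypothetical names, and
      the fixed-size root cache is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NUM_ROOTS 4  /* arbitrary cache size chosen for this sketch */

/* Hypothetical cached root: one shadow EPT hierarchy keyed by L1's EPTP. */
struct toy_root {
    bool valid;
    uint64_t eptp;
};

struct toy_mmu {
    struct toy_root roots[TOY_NUM_ROOTS];
};

/* Cache a root for the given EPTP, reusing a matching or free slot. */
void toy_load_root(struct toy_mmu *mmu, uint64_t eptp)
{
    for (size_t i = 0; i < TOY_NUM_ROOTS; i++) {
        if (mmu->roots[i].valid && mmu->roots[i].eptp == eptp)
            return;  /* already cached */
    }
    for (size_t i = 0; i < TOY_NUM_ROOTS; i++) {
        if (!mmu->roots[i].valid) {
            mmu->roots[i].valid = true;
            mmu->roots[i].eptp = eptp;
            return;
        }
    }
    assert(0 && "toy root cache is full");
}

/*
 * Emulate INVEPT the blunt way: drop every cached root, regardless of
 * which EPTP the instruction named.  Returns true if at least one root
 * was freed, i.e. if a TLB flush would be needed.
 */
bool toy_free_all_roots(struct toy_mmu *mmu)
{
    bool freed = false;

    for (size_t i = 0; i < TOY_NUM_ROOTS; i++) {
        if (mmu->roots[i].valid) {
            mmu->roots[i].valid = false;
            freed = true;
        }
    }
    return freed;
}
```

      If no root was ever loaded (L1 never entered L2), toy_free_all_roots()
      returns false and no flush is required.  A more precise variant would
      free only the roots whose eptp matches the INVEPT operand, which is
      what the TODO in the changelog alludes to.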
    • KVM: nVMX: Validate the EPTP when emulating INVEPT(EXTENT_CONTEXT) · eed0030e
      Sean Christopherson authored
      Signal VM-Fail for the single-context variant of INVEPT if the specified
      EPTP is invalid.  Per the INVEPT pseudocode in Intel's SDM, it's subject
      to the standard EPT checks:
      
        If VM entry with the "enable EPT" VM execution control set to 1 would
        fail due to the EPTP value then VMfail(Invalid operand to INVEPT/INVVPID);
      
      Fixes: bfd0a56b ("nEPT: Nested INVEPT")
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200320212833.3507-3-sean.j.christopherson@intel.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
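      As a rough illustration of what "the standard EPT checks" on an EPTP
      involve, the sketch below validates the low control bits and the
      address width.  It is a simplified userspace model, not KVM's
      implementation: struct toy_ept_caps and toy_eptp_is_valid() are
      hypothetical names, the capability flags stand in for what hardware
      reports, and the bit layout assumed here (memory type in bits 2:0,
      page-walk length minus one in bits 5:3, accessed/dirty enable in
      bit 6, bits 11:7 reserved) may not cover bits that later SDM
      revisions assign a meaning to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability summary; real values come from the CPU. */
struct toy_ept_caps {
    bool uc_supported;    /* EPT memory type 0 (uncacheable) allowed */
    bool wb_supported;    /* EPT memory type 6 (write-back) allowed */
    bool ad_supported;    /* accessed/dirty flags for EPT allowed */
    unsigned maxphyaddr;  /* physical-address width in bits */
};

bool toy_eptp_is_valid(uint64_t eptp, const struct toy_ept_caps *caps)
{
    assert(caps->maxphyaddr >= 12 && caps->maxphyaddr < 64);

    /* Bits 2:0: memory type must be one the CPU supports. */
    switch (eptp & 0x7) {
    case 0:
        if (!caps->uc_supported)
            return false;
        break;
    case 6:
        if (!caps->wb_supported)
            return false;
        break;
    default:
        return false;
    }

    /* Bits 5:3: page-walk length minus one; only 4-level walks here. */
    if (((eptp >> 3) & 0x7) != 3)
        return false;

    /* Bit 6: accessed/dirty enable requires hardware support. */
    if ((eptp & (1ULL << 6)) && !caps->ad_supported)
        return false;

    /* Bits 11:7 are treated as reserved in this sketch. */
    if ((eptp >> 7) & 0x1f)
        return false;

    /* No address bits may be set at or above the physical width. */
    if (eptp >> caps->maxphyaddr)
        return false;

    return true;
}
```

      With such a helper, an emulated single-context INVEPT would signal
      VM-Fail whenever the check returns false, instead of silently
      accepting the operand.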
    • KVM: VMX: Flush all EPTP/VPID contexts on remote TLB flush · e8eff282
      Sean Christopherson authored
      Flush all EPTP/VPID contexts if a TLB flush _may_ have been triggered by
      a remote or deferred TLB flush, i.e. by KVM_REQ_TLB_FLUSH.  Remote TLB
      flushes require all contexts to be invalidated, not just the active
      contexts, e.g. all mappings in all contexts for a given HVA need to be
      invalidated on a mmu_notifier invalidation.  Similarly, the instigator
      of the deferred TLB flush may be expecting all contexts to be flushed,
      e.g. vmx_vcpu_load_vmcs().
      
      Without nested VMX, flushing only the current EPTP/VPID context isn't
      problematic because KVM uses a constant VPID for each vCPU, and
      mmu_alloc_direct_roots() all but guarantees KVM will use a single EPTP
      for L1.  In the rare case where a different EPTP is created or reused,
      KVM (currently) unconditionally flushes the new EPTP context prior to
      entering the guest.
      
      With nested VMX, KVM conditionally uses a different VPID for L2, and
      unconditionally uses a different EPTP for L2.  Because KVM doesn't
      _intentionally_ guarantee L2's EPTP/VPID context is flushed on nested
      VM-Enter, it'd be possible for a malicious L1 to attack the host and/or
      different VMs by exploiting the lack of flushing for L2.
      
        1) Launch nested guest from malicious L1.
      
        2) Nested VM-Enter to L2.
      
        3) Access target GPA 'g'.  CPU inserts TLB entry tagged with L2's ASID
           mapping 'g' to host PFN 'x'.
      
        4) Nested VM-Exit to L1.
      
        5) L1 triggers kernel same-page merging (ksm) by duplicating/zeroing
           the page for PFN 'x'.
      
        6) Host kernel merges PFN 'x' with PFN 'y', i.e. unmaps PFN 'x' and
           remaps the page to PFN 'y'.  mmu_notifier sends invalidate command,
           KVM flushes TLB only for L1's ASID.
      
        7) Host kernel reallocates PFN 'x' to some other task/guest.
      
        8) Nested VM-Enter to L2.  KVM does not invalidate L2's EPTP or VPID.
      
        9) L2 accesses GPA 'g' and gains read/write access to PFN 'x' via its
           stale TLB entry.
      
      However, current KVM unconditionally flushes L1's EPTP/VPID context on
      nested VM-Exit.  That behavior is mostly unintentional, though: KVM
      doesn't go out of its way to flush EPTP/VPID on nested
      VM-Enter/VM-Exit; rather, a TLB flush is guaranteed to occur prior to
      re-entering L1 due to __kvm_mmu_new_cr3() always being called with
      skip_tlb_flush=false.  On nested VM-Enter, this happens via
      kvm_init_shadow_ept_mmu() (nested EPT enabled) or in
      nested_vmx_load_cr3() (nested EPT disabled).  On nested VM-Exit, it
      occurs via nested_vmx_load_cr3().
      
      This also fixes a bug where a deferred TLB flush in the context of L2,
      with EPT disabled, would flush L1's VPID instead of L2's VPID, as
      vmx_flush_tlb() flushes L1's VPID regardless of is_guest_mode().
      
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Ben Gardon <bgardon@google.com>
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Junaid Shahid <junaids@google.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: John Haxby <john.haxby@oracle.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Fixes: efebf0aa ("KVM: nVMX: Do not flush TLB on L1<->L2 transitions if L1 uses VPID and EPT")
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200320212833.3507-2-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
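      The stale-entry scenario in this changelog can be reproduced in a toy
      model of an ASID-tagged TLB.  The sketch below is illustrative only:
      struct toy_tlb and the toy_tlb_*() helpers are hypothetical, and the
      ASID values 1 (for L1) and 2 (for L2) used in the usage note are
      arbitrary.  It shows that flushing a single ASID leaves translations
      tagged with another ASID intact, while flushing all contexts does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TLB_ENTRIES 8  /* arbitrary size chosen for this sketch */

/* One cached translation, tagged with the address-space ID that made it. */
struct toy_tlb_entry {
    bool valid;
    unsigned asid;
    uint64_t gpa;
    uint64_t pfn;
};

struct toy_tlb {
    struct toy_tlb_entry e[TOY_TLB_ENTRIES];
};

/* Insert a translation into the first free slot. */
void toy_tlb_insert(struct toy_tlb *t, unsigned asid, uint64_t gpa, uint64_t pfn)
{
    for (size_t i = 0; i < TOY_TLB_ENTRIES; i++) {
        if (!t->e[i].valid) {
            t->e[i].valid = true;
            t->e[i].asid = asid;
            t->e[i].gpa = gpa;
            t->e[i].pfn = pfn;
            return;
        }
    }
    assert(0 && "toy TLB is full");
}

/* Look up a translation; on a hit, store the PFN and return true. */
bool toy_tlb_lookup(const struct toy_tlb *t, unsigned asid, uint64_t gpa,
                    uint64_t *pfn)
{
    for (size_t i = 0; i < TOY_TLB_ENTRIES; i++) {
        if (t->e[i].valid && t->e[i].asid == asid && t->e[i].gpa == gpa) {
            *pfn = t->e[i].pfn;
            return true;
        }
    }
    return false;
}

/* Flush only the entries tagged with one ASID (the pre-patch behavior). */
void toy_tlb_flush_asid(struct toy_tlb *t, unsigned asid)
{
    for (size_t i = 0; i < TOY_TLB_ENTRIES; i++) {
        if (t->e[i].asid == asid)
            t->e[i].valid = false;
    }
}

/* Flush every entry regardless of ASID (the post-patch behavior). */
void toy_tlb_flush_all(struct toy_tlb *t)
{
    for (size_t i = 0; i < TOY_TLB_ENTRIES; i++)
        t->e[i].valid = false;
}
```

      Inserting a mapping with ASID 2, then calling toy_tlb_flush_asid()
      with ASID 1, leaves the ASID 2 entry visible to a later lookup, which
      corresponds to L2 reaching PFN 'x' through its stale entry.  After
      toy_tlb_flush_all(), the same lookup misses.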
    • selftests: kvm: Add testcase for creating max number of memslots · 909e0aba
      Wainer dos Santos Moschetta authored
      This patch introduces test_add_max_memory_regions(), which checks
      that memory slots can be added to a VM up to the limit defined by
      KVM_CAP_NR_MEMSLOTS.  It then attempts to add one more slot to
      verify that the addition fails as expected.

      Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
      Reviewed-by: Andrew Jones <drjones@redhat.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200410231707.7128-11-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
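      The shape of such a test (fill every slot, then expect one more
      addition to be rejected) can be sketched without touching KVM at all.
      The model below is an assumption-laden stand-in: struct toy_vm,
      toy_add_memory_region() and toy_test_add_max_memory_regions() are
      hypothetical, and max_slots plays the role of the value a real test
      would obtain from KVM_CAP_NR_MEMSLOTS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical VM that only tracks how many memory slots are in use. */
struct toy_vm {
    uint32_t max_slots;   /* stand-in for the KVM_CAP_NR_MEMSLOTS value */
    uint32_t used_slots;
};

/* Add one region; returns 0 on success, -1 once the limit is reached. */
int toy_add_memory_region(struct toy_vm *vm)
{
    if (vm->used_slots >= vm->max_slots)
        return -1;
    vm->used_slots++;
    return 0;
}

/*
 * Mirror the flow described in the changelog: every slot up to the limit
 * must be accepted, and the slot after the limit must be rejected.
 * Returns true if the VM behaved as expected.
 */
bool toy_test_add_max_memory_regions(struct toy_vm *vm)
{
    assert(vm->used_slots == 0);

    for (uint32_t slot = 0; slot < vm->max_slots; slot++) {
        if (toy_add_memory_region(vm) != 0)
            return false;
    }
    return toy_add_memory_region(vm) == -1;
}
```

      The real selftest exercises the kernel through the KVM ioctl
      interface; this sketch only captures the accounting that the test is
      meant to verify.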
    • KVM: selftests: Make set_memory_region_test common to all architectures · 5b4f758f
      Sean Christopherson authored
      Make set_memory_region_test available on all architectures by wrapping
      the bits that are x86-specific in ifdefs.  A future testcase
      to create the maximum number of memslots will be architecture
      agnostic.

      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200410231707.7128-10-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>