1. 02 Jul, 2019 10 commits
    • kvm: nVMX: Remove unnecessary sync_roots from handle_invept · b1190198
      Jim Mattson authored
      When L0 is executing handle_invept(), the TDP MMU is active. Emulating
      an L1 INVEPT does require synchronizing the appropriate shadow EPT
      root(s), but a call to kvm_mmu_sync_roots in this context won't do
      that. Similarly, the hardware TLB and paging-structure-cache entries
      associated with the appropriate shadow EPT root(s) must be flushed,
      but requesting a TLB_FLUSH from this context won't do that either.
      
      How did this ever work? KVM always does a sync_roots and TLB flush (in
      the correct context) when transitioning from L1 to L2. That isn't the
      best choice for nested VM performance, but it effectively papers over
      the mistakes here.
      
      Remove the unnecessary operations and leave a comment to try to do
      better in the future.
      Reported-by: Junaid Shahid <junaids@google.com>
      Fixes: bfd0a56b ("nEPT: Nested INVEPT")
      Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Cc: Nadav Har'El <nyh@il.ibm.com>
      Cc: Jun Nakajima <jun.nakajima@intel.com>
      Cc: Xinhao Xu <xinhao.xu@intel.com>
      Cc: Yang Zhang <yang.z.zhang@Intel.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Peter Shier <pshier@google.com>
      Reviewed-by: Junaid Shahid <junaids@google.com>
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Documentation: kvm: document CPUID bit for MSR_KVM_POLL_CONTROL · 9824c83f
      Paolo Bonzini authored
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: X86: Expose PV_SCHED_YIELD CPUID feature bit to guest · 32b72ecc
      Wanpeng Li authored
      Expose the PV_SCHED_YIELD feature bit to the guest so that the guest
      can check it before using paravirtualized sched yield.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
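The guest-side check mentioned in the PV_SCHED_YIELD commit can be sketched as a plain bit test on the value a guest would read from the KVM feature CPUID leaf. This is only an illustration: the helper name `kvm_feature_present` is hypothetical, the bit number 13 is an assumption taken from the kernel's uapi header and not from this commit, and the CPUID read itself is left out so the snippet stays portable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position of PV_SCHED_YIELD in the KVM feature leaf (EAX). */
#define KVM_FEATURE_PV_SCHED_YIELD 13

/*
 * Hypothetical helper: report whether feature bit `bit` is set in the
 * EAX value that a guest read from the KVM feature CPUID leaf.
 */
static bool kvm_feature_present(uint32_t features_eax, unsigned int bit)
{
    assert(bit < 32); /* EAX only carries 32 feature bits */
    return (features_eax >> bit) & 1u;
}
```

A guest would call this once at boot and fall back to its normal IPI path when the bit is clear.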
    • KVM: X86: Implement PV sched yield hypercall · 71506297
      Wanpeng Li authored
      The target vCPUs are in runnable state after vcpu_kick and suitable
      as a yield target. This patch implements the sched yield hypercall.
      
      A 17% performance increase in the ebizzy benchmark can be observed in
      an over-subscribed environment (with kvm-pv-tlb disabled, testing TLB
      flush call-function IPI-many, since call-function is not easy to
      trigger from a userspace workload).
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: X86: Yield to IPI target if necessary · f85f6e7b
      Wanpeng Li authored
      When sending a call-function IPI-many to vCPUs, yield if any of
      the IPI target vCPUs was preempted. Only the first preempted
      target vCPU found is selected, since the state of the target
      vCPUs can change underneath us and this avoids race conditions.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
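The selection rule in the "Yield to IPI target" commit (pick the first preempted vCPU among the IPI targets) can be sketched in user-space C as below. This is not the kernel code: the function name, the 64-bit target mask, and the `preempted` array are all hypothetical stand-ins for the per-vCPU state the guest actually consults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_VCPUS 64 /* illustrative limit so the mask fits in 64 bits */

/*
 * Hypothetical helper: return the index of the first vCPU that is both
 * an IPI target (bit set in target_mask) and marked preempted, or -1 if
 * no target is preempted. Only the first match is reported; later
 * targets are not examined.
 */
static int first_preempted_target(uint64_t target_mask,
                                  const bool *preempted, int nr_vcpus)
{
    assert(nr_vcpus >= 0 && nr_vcpus <= MAX_VCPUS);

    for (int cpu = 0; cpu < nr_vcpus; cpu++) {
        if (((target_mask >> cpu) & 1u) && preempted[cpu])
            return cpu;
    }
    return -1;
}
```

Stopping at the first match keeps the window between reading the preempted state and acting on it short, which matches the commit's remark that the target state can change underneath the sender.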
    • x86/kvm/nVMX: fix VMCLEAR when Enlightened VMCS is in use · 11e34914
      Vitaly Kuznetsov authored
      When Enlightened VMCS is in use, it is valid to do VMCLEAR and,
      according to TLFS, this should "transition an enlightened VMCS from the
      active to the non-active state". It is, however, wrong to assume that
      it is only valid to do VMCLEAR for the eVMCS which is currently active
      on the vCPU performing VMCLEAR.
      
      Currently, the logic in handle_vmclear() is broken: in case there is no
      active eVMCS on the vCPU doing VMCLEAR, we treat the argument as a 'normal'
      VMCS, and kvm_vcpu_write_guest() to the 'launch_state' field irreversibly
      corrupts the memory area.
      
      So, in case the VMCLEAR argument is not the current active eVMCS on the
      vCPU, how can we know if the area it is pointing to is a normal or an
      enlightened VMCS?
      Thanks to the bug in Hyper-V (see commit 72aeb60c ("KVM: nVMX: Verify
      eVMCS revision id match supported eVMCS version on eVMCS VMPTRLD")) we
      cannot: the revision can't be used to distinguish between them. So let's
      assume it is always enlightened in case enlightened vmentry is enabled in
      the assist page. Also, check vmx->nested.enlightened_vmcs_enabled to
      minimize the impact for 'unenlightened' workloads.
      
      Fixes: b8bbab92 ("KVM: nVMX: implement enlightened VMPTRLD and VMCLEAR")
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
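The decision described in the VMCLEAR commit (treat the argument as enlightened only when eVMCS is enabled for the vCPU and enlightened vmentry is enabled in the assist page) reduces to a two-input check. The sketch below is a simplified model, not the body of handle_vmclear(): the enum, the function name, and both boolean parameters are hypothetical names standing in for the state the commit message mentions.

```c
#include <stdbool.h>

/* Hypothetical classification of the memory area a VMCLEAR points at. */
enum vmclear_target {
    VMCLEAR_NORMAL_VMCS,
    VMCLEAR_ENLIGHTENED_VMCS,
};

/*
 * Hypothetical model of the rule in the commit message: the revision id
 * cannot distinguish the two formats, so assume an enlightened VMCS
 * whenever eVMCS is enabled for the vCPU and enlightened vmentry is
 * enabled in the assist page. Otherwise treat it as a normal VMCS.
 */
static enum vmclear_target
classify_vmclear(bool evmcs_enabled_for_vcpu, bool assist_page_enlightened_vmentry)
{
    if (evmcs_enabled_for_vcpu && assist_page_enlightened_vmentry)
        return VMCLEAR_ENLIGHTENED_VMCS;
    return VMCLEAR_NORMAL_VMCS;
}
```

Checking the per-vCPU flag first means 'unenlightened' workloads never need to look at the assist page at all.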
    • x86/KVM/nVMX: don't use clean fields data on enlightened VMLAUNCH · a21a39c2
      Vitaly Kuznetsov authored
      Apparently, Windows doesn't maintain clean fields data after it does
      VMCLEAR for an enlightened VMCS, so we can only use it on VMRESUME.
      The issue went unnoticed because currently we do nested_release_evmcs()
      in handle_vmclear(), and the subsequent enlightened VMPTRLD invalidates
      clean fields when a new eVMCS is mapped, but we're going to change the
      logic.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: list VMX MSRs in KVM_GET_MSR_INDEX_LIST · 95c5c7c7
      Paolo Bonzini authored
      This allows userspace to know which MSRs are supported by the hypervisor.
      Unfortunately userspace must resort to tricks for everything except
      MSR_IA32_VMX_VMFUNC (which was just added in the previous patch).
      One possibility is to use the feature control MSR, which is tied to nested
      VMX as well and is present on all KVM versions that support feature MSRs.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: allow setting the VMFUNC controls MSR · e8a70bd4
      Paolo Bonzini authored
      Allow userspace to set a custom value for the VMFUNC controls MSR, as long
      as the capabilities it advertises do not exceed those of the host.
      
      Fixes: 27c42a1b ("KVM: nVMX: Enable VMFUNC for the L1 hypervisor", 2017-08-03)
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
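The constraint in the VMFUNC controls commit (a userspace-supplied value is acceptable as long as it advertises no capability the host lacks) is a subset test on bitmasks. The sketch below illustrates only that test; the function name is hypothetical and the real MSR write path in KVM may apply further checks that this commit message does not describe.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: a requested VMFUNC controls value is allowed
 * only if every capability bit it sets is also set in the host's
 * capability mask, i.e. the request is a subset of the host value.
 */
static bool vmfunc_controls_allowed(uint64_t requested, uint64_t host_caps)
{
    return (requested & ~host_caps) == 0;
}
```

For example, with a host mask of 0x1, requests of 0x0 and 0x1 pass while 0x3 is rejected because bit 1 is not offered by the host.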
    • KVM: nVMX: include conditional controls in /dev/kvm KVM_GET_MSRS · 6defc591
      Paolo Bonzini authored
      Some secondary controls are automatically enabled/disabled based on the CPUID
      values that are set for the guest.  However, they are still available at a
      global level and therefore should be present when KVM_GET_MSRS is sent to
      /dev/kvm.
      
      Fixes: 1389309c ("KVM: nVMX: expose VMX capabilities for nested hypervisors to userspace", 2018-02-26)
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 20 Jun, 2019 2 commits
  3. 18 Jun, 2019 28 commits