1. 01 Oct, 2019 1 commit
    • kvm: vmx: Limit guest PMCs to those supported on the host · e1fba49c
      Jim Mattson authored
      KVM can only virtualize as many PMCs as the host supports.
      
      Limit the number of generic counters and fixed counters to the number
      of corresponding counters supported on the host, rather than to
      INTEL_PMC_MAX_GENERIC and INTEL_PMC_MAX_FIXED, respectively.
      
      Note that INTEL_PMC_MAX_GENERIC is currently 32, which exceeds the 18
      contiguous MSR indices reserved by Intel for event selectors. Since
      the existing code relies on a contiguous range of MSR indices for
      event selectors, it can't possibly work for more than 18 general
      purpose counters.
      
      Fixes: f5132b01 ("KVM: Expose a version 2 architectural PMU to a guests")
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Marc Orr <marcorr@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
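
      A minimal sketch of the clamping described above (illustrative;
      limit_guest_pmcs() is a hypothetical wrapper, while
      perf_get_x86_pmu_capability() and the kvm_pmu fields are the ones the
      KVM/perf code exposes):

        #include <linux/kvm_host.h>
        #include <linux/perf_event.h>

        /* Clamp the counters advertised to the guest to what the host PMU has. */
        static void limit_guest_pmcs(struct kvm_pmu *pmu,
                                     unsigned int guest_gp, unsigned int guest_fixed)
        {
                struct x86_pmu_capability host_cap;

                perf_get_x86_pmu_capability(&host_cap);

                pmu->nr_arch_gp_counters    = min_t(unsigned int, guest_gp,
                                                    host_cap.num_counters_gp);
                pmu->nr_arch_fixed_counters = min_t(unsigned int, guest_fixed,
                                                    host_cap.num_counters_fixed);
        }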
  2. 30 Sep, 2019 1 commit
  3. 27 Sep, 2019 5 commits
    • KVM: selftests: x86: clarify what is reported on KVM_GET_MSRS failure · 2e4a7597
      Vitaly Kuznetsov authored
      When KVM_GET_MSRS fails, the report looks like:
      
      ==== Test Assertion Failure ====
        lib/x86_64/processor.c:1089: r == nmsrs
        pid=28775 tid=28775 - Argument list too long
           1	0x000000000040a55f: vcpu_save_state at processor.c:1088 (discriminator 3)
           2	0x00000000004010e3: main at state_test.c:171 (discriminator 4)
           3	0x00007fb8e69223d4: ?? ??:0
           4	0x0000000000401287: _start at ??:?
        Unexpected result from KVM_GET_MSRS, r: 36 (failed at 194)
      
      and it's not obvious that '194' here is the failed MSR index and that
      it's printed in hex. Change that.
      Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
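
      A sketch of the clarified assertion (illustrative; r, nmsrs, vcpu and
      state are the names visible in the selftest's vcpu_save_state()):

        r = ioctl(vcpu->fd, KVM_GET_MSRS, &state->msrs);
        TEST_ASSERT(r == nmsrs,
                    "Unexpected result from KVM_GET_MSRS, r: %i (failed MSR was 0x%x)",
                    r, r < 0 ? -1 : (u32)state->msrs.entries[r].index);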
    • KVM: VMX: Set VMENTER_L1D_FLUSH_NOT_REQUIRED if !X86_BUG_L1TF · 19a36d32
      Waiman Long authored
      The l1tf_vmx_mitigation is only set to VMENTER_L1D_FLUSH_NOT_REQUIRED
      when the ARCH_CAPABILITIES MSR indicates that L1D flush is not required.
      However, if the CPU is not affected by L1TF, l1tf_vmx_mitigation will
      still be set to VMENTER_L1D_FLUSH_AUTO. This is certainly not the best
      option for a !X86_BUG_L1TF CPU.
      
      So force l1tf_vmx_mitigation to VMENTER_L1D_FLUSH_NOT_REQUIRED to make it
      more explicit in case users are checking the vmentry_l1d_flush parameter.
      Signed-off-by: Waiman Long <longman@redhat.com>
      [Patch rewritten according to Borislav Petkov's suggestion. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
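
      A simplified sketch of the resulting early-out (a fragment, not the full
      patch; the names are the ones used in the VMX L1TF setup code):

        /* In vmx_setup_l1d_flush(), before consulting ARCH_CAPABILITIES: */
        if (!boot_cpu_has_bug(X86_BUG_L1TF)) {
                /* CPU not affected by L1TF: say so explicitly. */
                l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_NOT_REQUIRED;
                return 0;
        }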
    • selftests: kvm: add test for dirty logging inside nested guests · 09444420
      Paolo Bonzini authored
      Check that accesses by nested guests are logged according to the
      L1 physical addresses rather than L2.
      
      Most of the patch is really adding EPT support to the testing
      framework.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: fix nested guest live migration with PML · 1f4e5fc8
      Paolo Bonzini authored
      Shadow paging is fundamentally incompatible with the page-modification
      log, because the GPAs in the log come from the wrong memory map.
      In particular, for the EPT page-modification log, the GPAs in the log come
      from L2 rather than L1.  (If there were a non-EPT page-modification log,
      we couldn't use it for shadow paging, because it would log GVAs rather
      than GPAs.)
      
      Therefore, we need to rely on write protection to record dirty pages.
      This has the side effect of bypassing PML, since writes now result in an
      EPT violation vmexit.
      
      This is relatively easy to add to KVM, because pretty much the only place
      that needs changing is spte_clear_dirty.  The first access to the page
      already goes through the page fault path and records the correct GPA;
      it's only subsequent accesses that are wrong.  Therefore, we can equip
      set_spte (where the first access happens) to record that the SPTE will
      have to be write protected, and then spte_clear_dirty will use this
      information to do the right thing.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
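
      A sketch of the resulting dirty-clearing decision (simplified;
      clear_dirty_for_logging() is a hypothetical wrapper, while the helpers it
      calls follow the KVM MMU naming):

        static bool clear_dirty_for_logging(u64 *sptep)
        {
                /*
                 * SPTEs created while shadowing a nested guest are flagged by
                 * set_spte() as "write protect instead of clearing the D bit".
                 */
                if (spte_ad_need_write_protect(*sptep))
                        /* The next write faults and is logged with the L1 GPA. */
                        return spte_wrprot_for_clear_dirty(sptep);

                /* A/D bits are trustworthy here: just clear the dirty bit. */
                return spte_clear_dirty(sptep);
        }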
    • KVM: x86: assign two bits to track SPTE kinds · 6eeb4ef0
      Paolo Bonzini authored
      Currently, we are overloading SPTE_SPECIAL_MASK to mean both
      "A/D bits unavailable" and MMIO, where the difference between the
      two is determined by mmio_mask and mmio_value.

      However, the next patch will need two bits to distinguish the
      availability of A/D bits from write protection.  So, while at it,
      give MMIO its own bit pattern, and move the two bits from bit 62
      to bits 52..53, since Intel is allocating EPT page-table bits from
      the top.
      Reviewed-by: Junaid Shahid <junaids@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
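
      A sketch of the resulting layout (the exact encodings are illustrative;
      the point is that bits 52..53 now identify the SPTE kind and bit 62 is
      freed for EPT use):

        #define SPTE_SPECIAL_MASK      (3ULL << 52)
        #define SPTE_AD_ENABLED_MASK   (0ULL << 52)  /* A/D bits usable      */
        #define SPTE_AD_DISABLED_MASK  (1ULL << 52)  /* A/D bits unavailable */
        #define SPTE_MMIO_MASK         (3ULL << 52)  /* MMIO SPTE            */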
  4. 26 Sep, 2019 7 commits
  5. 25 Sep, 2019 9 commits
    • KVM: nVMX: cleanup and fix host 64-bit mode checks · fd3edd4a
      Paolo Bonzini authored
      KVM was incorrectly checking vmcs12->host_ia32_efer even if the "load
      IA32_EFER" exit control was clear.  Also, some checks were not using
      the new CC macro for tracing.

      Clean up everything so that the vCPU's 64-bit mode is determined
      directly from EFER_LMA and the VMCS checks are based on that, which
      matches section 26.2.4 of the SDM.
      
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Fixes: 5845038c
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
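
      A sketch of the reworked check (a simplified fragment of
      nested_vmx_check_host_state(); CC() is the tracing macro mentioned above):

        ia32e = !!(vcpu->arch.efer & EFER_LMA);    /* the vCPU's 64-bit mode */

        if (CC(ia32e != !!(vmcs12->vm_exit_controls & VM_EXIT_HOST_ADDR_SPACE_SIZE)))
                return -EINVAL;

        /* Only look at host_ia32_efer when it will actually be loaded. */
        if ((vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_EFER) &&
            (CC(ia32e != !!(vmcs12->host_ia32_efer & EFER_LMA)) ||
             CC(ia32e != !!(vmcs12->host_ia32_efer & EFER_LME))))
                return -EINVAL;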
    • KVM: vmx: fix build warnings in hv_enable_direct_tlbflush() on i386 · cab01850
      Vitaly Kuznetsov authored
      The following was reported on i386:
      
        arch/x86/kvm/vmx/vmx.c: In function 'hv_enable_direct_tlbflush':
        arch/x86/kvm/vmx/vmx.c:503:10: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
      
      The pr_debug() calls in this function are more or less useless; let's
      just remove them.  evmcs->hv_vm_id can use 'unsigned long' instead of
      'u64'.
      
      Also, simplify the code a little bit.
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
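
      The warning is the classic pointer-to-u64 cast on a 32-bit build; a
      sketch of the fix (hv_vm_id is the enlightened-VMCS field named above):

        /* unsigned long is pointer-sized on both i386 and x86_64. */
        evmcs->hv_vm_id = (unsigned long)vcpu->kvm;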
    • KVM: x86: Don't check kvm_rebooting in __kvm_handle_fault_on_reboot() · f209a26d
      Sean Christopherson authored
      Remove the kvm_rebooting check from VMX/SVM instruction exception fixup
      now that kvm_spurious_fault() conditions its BUG() on !kvm_rebooting.
      Because the 'cleanup_insn' functionality is also gone, deferring to
      kvm_spurious_fault() means __kvm_handle_fault_on_reboot() can eliminate
      its .fixup code entirely and have its exception table entry branch
      directly to the call to kvm_spurious_fault().
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Drop ____kvm_handle_fault_on_reboot() · 98cd382d
      Sean Christopherson authored
      Remove the variation of __kvm_handle_fault_on_reboot() that accepts a
      post-fault cleanup instruction now that its sole user (VMREAD) uses
      a different method for handling faults.
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: Add error handling to VMREAD helper · 6e202097
      Sean Christopherson authored
      Now that VMREAD flows require a taken branch, courtesy of commit
      
        3901336e ("x86/kvm: Don't call kvm_spurious_fault() from .fixup")
      
      bite the bullet and add full error handling to VMREAD, i.e. replace the
      JMP added by __ex()/____kvm_handle_fault_on_reboot() with a hinted Jcc.
      
      To minimize the code footprint, add a helper function, vmread_error(),
      to handle both faults and failures so that the inline flow has a single
      CALL.
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
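
      A sketch of how the shared error path can look (simplified; the real
      helper is reached from the inline asm, via a hinted Jcc for VM-fail and
      via the exception table for faults):

        asmlinkage void vmread_error(unsigned long field, bool fault)
        {
                if (fault)        /* faulted, e.g. during emergency reboot */
                        kvm_spurious_fault();
                else              /* VMfail: bad or unsupported field      */
                        pr_warn_ratelimited("kvm: vmread failed: field=%lx\n",
                                            field);
        }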
    • KVM: VMX: Optimize VMX instruction error and fault handling · 52a9fcbc
      Sean Christopherson authored
      Rework the VMX instruction helpers using asm-goto to branch directly
      to error/fault "handlers" in lieu of using __ex(), i.e. the generic
      ____kvm_handle_fault_on_reboot().  Branching directly to fault handling
      code during fixup avoids the extra JMP that is inserted after every VMX
      instruction when using the generic "fault on reboot" (see commit
      3901336e, "x86/kvm: Don't call kvm_spurious_fault() from .fixup").
      
      Opportunistically clean up the helpers so that they all have consistent
      error handling and messages.
      
      Leave the usage of ____kvm_handle_fault_on_reboot() (via __ex()) in
      kvm_cpu_vmxoff() and nested_vmx_check_vmentry_hw() as is.  The VMXOFF
      case is not a fast path, i.e. the cleanliness of __ex() is worth the
      JMP, and the extra JMP in nested_vmx_check_vmentry_hw() is unavoidable.
      
      Note, VMREAD cannot get the asm-goto treatment as output operands aren't
      compatible with GCC's asm-goto due to internal compiler restrictions.
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
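
      A sketch of the asm-goto pattern (simplified from the shared helper
      macros; vmcs_write_sketch() is a hypothetical stand-in and the
      vmwrite_error() signature is an assumption):

        static __always_inline void vmcs_write_sketch(unsigned long field,
                                                      unsigned long value)
        {
                asm_volatile_goto("1: vmwrite %[value], %[field]\n\t"
                                  "jna %l[vmfail]\n\t"
                                  _ASM_EXTABLE(1b, %l[vmfault])
                                  : : [field] "r" (field), [value] "r" (value)
                                  : "cc" : vmfail, vmfault);
                return;

        vmfail:
                vmwrite_error(field, value);   /* VM-fail: report and continue */
                return;
        vmfault:
                kvm_spurious_fault();          /* faulted: reboot in progress  */
        }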
    • KVM: x86: Check kvm_rebooting in kvm_spurious_fault() · 4b526de5
      Sean Christopherson authored
      Explicitly check kvm_rebooting in kvm_spurious_fault() prior to invoking
      BUG(), as opposed to assuming the caller has already done so.  Letting
      kvm_spurious_fault() be called "directly" will allow VMX to better
      optimize its low level assembly flows.
      
      As a happy side effect, kvm_spurious_fault() no longer needs to be
      marked as a dead end since it doesn't unconditionally BUG().
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
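
      The resulting check, as a sketch:

        asmlinkage void kvm_spurious_fault(void)
        {
                /* Fault while not rebooting: something is genuinely wrong. */
                BUG_ON(!kvm_rebooting);
        }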
    • KVM: selftests: fix ucall on x86 · 90a48843
      Vitaly Kuznetsov authored
      After commit e8bb4755eea2 ("KVM: selftests: Split ucall.c into architecture
      specific files"), selftests which use ucall on x86 started segfaulting, and
      apparently gcc is to blame: it "optimizes" the ucall() function by throwing
      away the va_start/va_end part because it thinks the structure is not being
      used.  Previously, it couldn't do that, because there was also an MMIO
      version and the decision about which implementation to use was made at
      runtime.
      
      With older gccs it's possible to solve the problem by adding 'volatile'
      to 'struct ucall' but at least with gcc-8.3 this trick doesn't work.
      
      A 'memory' clobber seems to do the job.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
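
      A sketch of the fixed PIO exit with the clobber (UCALL_PIO_PORT comes from
      the selftest; the wrapper name is made up):

        static void ucall_pio_exit(struct ucall *uc)
        {
                /*
                 * The "memory" clobber tells gcc that the other end reads *uc
                 * through the pointer forced into %rdi, so the on-stack struct
                 * (and the va_start/va_end that filled it) must be kept.
                 */
                asm volatile("in %[port], %%al"
                             : : [port] "d" (UCALL_PIO_PORT), "D" (uc)
                             : "rax", "memory");
        }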
    • Revert "locking/pvqspinlock: Don't wait if vCPU is preempted" · 89340d09
      Wanpeng Li authored
      This patch reverts commit 75437bb3 ("locking/pvqspinlock: Don't
      wait if vCPU is preempted").  That commit caused a large performance
      regression in over-subscription scenarios.
      
      The test was run on a Xeon Skylake box, 2 sockets, 40 cores, 80 threads,
      with three VMs of 80 vCPUs each.  The score of ebizzy -M is reduced from
      13000-14000 records/s to 1700-1800 records/s:
      
      Host                             Guest       score

      vanilla w/o kvm optimizations    upstream    1700-1800 records/s
      vanilla w/o kvm optimizations    revert      13000-14000 records/s
      vanilla w/ kvm optimizations     upstream    4500-5000 records/s
      vanilla w/ kvm optimizations     revert      14000-15500 records/s
      
      Exiting early via the aggressive wait-early mechanism can result in
      premature yields and extra scheduling latency.
      
      Actually, only 6% of wait_early events are caused by vcpu_is_preempted()
      being true.  However, when one vCPU voluntarily yields its physical CPU,
      all the subsequent waiters in the queue will do the same, and the
      cascading effect leads to bad performance.
      
      kvm optimizations:
      [1] commit d73eb57b (KVM: Boost vCPUs that are delivering interrupts)
      [2] commit 266e85a5 (KVM: X86: Boost queue head vCPU to mitigate lock waiter preemption)
      
      Tested-by: loobinliu@tencent.com
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: loobinliu@tencent.com
      Cc: stable@vger.kernel.org
      Fixes: 75437bb3 (locking/pvqspinlock: Don't wait if vCPU is preempted)
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  6. 24 Sep, 2019 17 commits