1. 15 May, 2020 10 commits
    • KVM: X86: Sanity check on gfn before removal · 0fd46044
      Peter Xu authored
      The index returned by kvm_async_pf_gfn_slot() will be removed when an
      async pf gfn is going to be removed.  However kvm_async_pf_gfn_slot()
      is not reliable in that it can return the last key it loops over even
      if the gfn is not found in the async gfn array.  It should never
      happen, but it's still better to sanity check against that to make
      sure no unexpected gfn will be removed.
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20200416155910.267514-1-peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
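The lookup problem described in the commit above can be illustrated with a minimal userspace sketch. It is not the kernel code: the table size `NSLOTS`, the `EMPTY` marker, the trivial hash, and the helper `gfn_slot_checked()` are assumptions made for illustration. It only shows why a linear-probing lookup can return a slot that does not hold the requested gfn, and how a check on the returned slot guards against removing the wrong entry.

```c
#include <stdint.h>

#define NSLOTS 8u           /* hypothetical table size, a power of two */
#define EMPTY  UINT64_MAX   /* hypothetical marker for an unused slot */

/* Simplified hash for illustration: the low bits of the gfn. */
static uint32_t hash_fn(uint64_t gfn)
{
    return (uint32_t)gfn & (NSLOTS - 1);
}

static uint32_t next_probe(uint32_t key)
{
    return (key + 1) & (NSLOTS - 1);
}

/*
 * Linear-probing lookup. The loop stops on a match, on an empty slot,
 * or after NSLOTS probes, so the returned key is NOT guaranteed to
 * hold gfn.
 */
static uint32_t gfn_slot(const uint64_t *slots, uint64_t gfn)
{
    uint32_t key = hash_fn(gfn);

    for (uint32_t i = 0;
         i < NSLOTS && slots[key] != gfn && slots[key] != EMPTY; i++)
        key = next_probe(key);

    return key;
}

/*
 * Sanity-checked lookup: returns the slot index only if it really
 * holds gfn, and -1 otherwise, so a caller never removes another entry.
 */
static int gfn_slot_checked(const uint64_t *slots, uint64_t gfn)
{
    uint32_t key = gfn_slot(slots, gfn);

    return slots[key] == gfn ? (int)key : -1;
}
```

A removal routine built on this sketch would bail out when `gfn_slot_checked()` returns -1 instead of clearing whatever slot the probe happened to stop on.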
    • KVM: No need to retry for hva_to_pfn_remapped() · 5b494aea
      Peter Xu authored
      hva_to_pfn_remapped() calls fixup_user_fault(), which already handles
      the retry gracefully.  If "unlocked" is set to true, it only means
      that we got a VM_FAULT_RETRY inside fixup_user_fault(); the page fault
      has already been retried by then and the pfn should be set correctly.
      There is no need to retry again.
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20200416155906.267462-1-peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: X86: Force ASYNC_PF_PER_VCPU to be power of two · dd03bcaa
      Peter Xu authored
      Requiring ASYNC_PF_PER_VCPU to be a power of two is simpler than
      repeatedly calling roundup_pow_of_two().  Enforce this by adding a
      BUILD_BUG_ON() inside the hash function.
      
      Also, async pf does not allow concurrency beyond ASYNC_PF_PER_VCPU
      anyway (see kvm_setup_async_pf()), so it makes little sense for the
      value not to be a power of two; otherwise some of the entries will
      definitely be wasted.
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20200416155859.267366-1-peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
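The compile-time constraint from the commit above can be sketched in userspace C as follows. This is an illustration only: `static_assert` stands in for the kernel's BUILD_BUG_ON(), the `IS_POWER_OF_2` macro and `next_probe()` helper are hypothetical, and the value 64 is assumed for the example.

```c
#include <assert.h>
#include <stdint.h>

#define ASYNC_PF_PER_VCPU 64u   /* value assumed for illustration */

/* Nonzero with exactly one bit set. */
#define IS_POWER_OF_2(n) ((n) != 0 && (((n) & ((n) - 1)) == 0))

/* Userspace stand-in for BUILD_BUG_ON(): the build fails if violated. */
static_assert(IS_POWER_OF_2(ASYNC_PF_PER_VCPU),
              "ASYNC_PF_PER_VCPU must be a power of two");

/*
 * With a power-of-two size, wrapping the probe index is a plain mask;
 * no roundup_pow_of_two() call is needed at the use site.
 */
static uint32_t next_probe(uint32_t key)
{
    return (key + 1) & (ASYNC_PF_PER_VCPU - 1);
}
```

Changing the define to a value such as 48 makes the sketch fail to compile, which is the behavior the commit message asks for.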
    • KVM: VMX: Remove unneeded __ASM_SIZE usage with POP instruction · c16312f4
      Uros Bizjak authored
      POP [mem] defaults to the word size, and the only legal non-default
      size is 16 bits, e.g. a 32-bit POP will #UD in 64-bit mode and vice
      versa, so there is no need to use the __ASM_SIZE macro to force the
      operating mode.
      
      Changes since v1:
      - Fix commit message.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
      Message-Id: <20200427205035.1594232-1-ubizjak@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Add a helper to consolidate root sp allocation · 8123f265
      Sean Christopherson authored
      Add a helper, mmu_alloc_root(), to consolidate the allocation of a root
      shadow page, which has the same basic mechanics for all flavors of TDP
      and shadow paging.
      
      Note, __pa(sp->spt) doesn't need to be protected by mmu_lock, as
      sp->spt points at a kernel page.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200428023714.31923-1-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Drop KVM's hugepage enums in favor of the kernel's enums · 3bae0459
      Sean Christopherson authored
      Replace KVM's PT_PAGE_TABLE_LEVEL, PT_DIRECTORY_LEVEL and PT_PDPE_LEVEL
      with the kernel's PG_LEVEL_4K, PG_LEVEL_2M and PG_LEVEL_1G.  KVM's
      enums are borderline impossible to remember and result in code that is
      visually difficult to audit, e.g.
      
              if (!enable_ept)
                      ept_lpage_level = 0;
              else if (cpu_has_vmx_ept_1g_page())
                      ept_lpage_level = PT_PDPE_LEVEL;
              else if (cpu_has_vmx_ept_2m_page())
                      ept_lpage_level = PT_DIRECTORY_LEVEL;
              else
                      ept_lpage_level = PT_PAGE_TABLE_LEVEL;
      
      versus
      
              if (!enable_ept)
                      ept_lpage_level = 0;
              else if (cpu_has_vmx_ept_1g_page())
                      ept_lpage_level = PG_LEVEL_1G;
              else if (cpu_has_vmx_ept_2m_page())
                      ept_lpage_level = PG_LEVEL_2M;
              else
                      ept_lpage_level = PG_LEVEL_4K;
      
      No functional change intended.
      Suggested-by: Barret Rhoden <brho@google.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200428005422.4235-4-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Move max hugepage level to a separate #define · e662ec3e
      Sean Christopherson authored
      Rename PT_MAX_HUGEPAGE_LEVEL to KVM_MAX_HUGEPAGE_LEVEL and make it a
      separate define in anticipation of dropping KVM's PT_*_LEVEL enums in
      favor of the kernel's PG_LEVEL_* enums.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200428005422.4235-3-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Tweak PSE hugepage handling to avoid 2M vs 4M conundrum · b2f432f8
      Sean Christopherson authored
      Change the PSE hugepage handling in walk_addr_generic() to fire on any
      page level greater than PT_PAGE_TABLE_LEVEL, a.k.a. PG_LEVEL_4K.  PSE
      paging only has two levels, so "== 2" and "> 1" are functionally the
      same, i.e. this is a nop.
      
      A future patch will drop KVM's PT_*_LEVEL enums in favor of the kernel's
      PG_LEVEL_* enums, at which point "walker->level == PG_LEVEL_2M" is
      semantically incorrect (though still functionally ok).
      
      No functional change intended.
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200428005422.4235-2-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86: Cleanup vcpu->arch.guest_xstate_size · a71936ab
      Xiaoyao Li authored
      vcpu->arch.guest_xstate_size lost its only user since commit df1daba7
      ("KVM: x86: support XSAVES usage in the host"), so clean it up.
      Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
      Message-Id: <20200429154312.1411-1-xiaoyao.li@intel.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Tweak handling of failure code for nested VM-Enter failure · 68cda40d
      Sean Christopherson authored
      Use an enum for passing around the failure code for a failed VM-Enter
      that results in VM-Exit to provide a level of indirection from the final
      resting place of the failure code, vmcs.EXIT_QUALIFICATION.  The exit
      qualification field is an unsigned long, e.g. passing around
      'u32 exit_qual' throws up red flags as it suggests KVM may be dropping
      bits when reporting errors to L1.  This is a red herring because the
      only defined failure codes are 0, 2, 3, and 4, i.e. don't come remotely
      close to overflowing a u32.
      
      Setting vmcs.EXIT_QUALIFICATION on entry failure is further complicated
      by the MSR load list, which returns the (1-based) entry that failed, and
      the number of MSRs to load is a 32-bit VMCS field.  At first blush, it
      would appear that overflowing a u32 is possible, but the number of MSRs
      that can be loaded is hard-capped at 4096 (limited by MSR_IA32_VMX_MISC).
      
      In other words, there are two completely disparate types of data that
      eventually get stuffed into vmcs.EXIT_QUALIFICATION, neither of which is
      an 'unsigned long' in nature.  This was presumably the reasoning for
      switching to 'u32' when the related code was refactored in commit
      ca0bde28 ("kvm: nVMX: Split VMCS checks from nested_vmx_run()").
      
      Using an enum for the failure code addresses the technically-possible-
      but-will-never-happen scenario where Intel defines a failure code that
      doesn't fit in a 32-bit integer.  The enum variables and values will
      either be automatically sized (gcc 5.4 behavior) or be subjected to some
      combination of truncation.  The former case will simply work, while the
      latter will trigger a compile-time warning unless the compiler is being
      particularly unhelpful.
      
      Separating the failure code from the failed MSR entry allows for
      disassociating both from vmcs.EXIT_QUALIFICATION, which avoids the
      conundrum where KVM has to choose between 'u32 exit_qual' and tracking
      values as 'unsigned long' that have no business being tracked as such.
      To cement the split, set vmcs12->exit_qualification directly from the
      entry error code or failed MSR index instead of bouncing through a local
      variable.
      
      Opportunistically rename the variables in load_vmcs12_host_state() and
      vmx_set_nested_state() to call out that they're ignored, set exit_reason
      on demand on nested VM-Enter failure, and add a comment in
      nested_vmx_load_msr() to call out that returning 'i + 1' can't wrap.
      
      No functional change intended.
      Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Jim Mattson <jmattson@google.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200511220529.11402-1-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
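The split described in the commit above, between the entry failure code and the failed MSR index, might look roughly like the following userspace sketch. The enum names, `struct vmcs12_model`, and all helper functions are assumptions chosen for illustration and may differ from the actual patch; only the values 0, 2, 3 and 4 and the 4096-entry cap come from the commit message.

```c
#include <stdint.h>

/* Failure codes; the commit message lists 0, 2, 3 and 4 as defined. */
enum vm_entry_failure_code {
    ENTRY_FAIL_DEFAULT       = 0,
    ENTRY_FAIL_PDPTE         = 2,
    ENTRY_FAIL_NMI           = 3,
    ENTRY_FAIL_VMCS_LINK_PTR = 4,
};

#define MAX_MSR_LIST_ENTRIES 4096u   /* cap cited in the commit message */

/* Hypothetical stand-in for the relevant part of vmcs12. */
struct vmcs12_model {
    unsigned long exit_qualification;
};

/*
 * Returns 0 on success, or the 1-based index of the entry that failed.
 * 'i + 1' cannot wrap because i never exceeds MAX_MSR_LIST_ENTRIES.
 */
static uint32_t load_msr_list(const int *entry_ok, uint32_t count)
{
    for (uint32_t i = 0; i < count; i++) {
        if (i >= MAX_MSR_LIST_ENTRIES || !entry_ok[i])
            return i + 1;
    }
    return 0;
}

/* Each kind of value is written directly, with no shared u32 local. */
static void report_check_failure(struct vmcs12_model *v,
                                 enum vm_entry_failure_code code)
{
    v->exit_qualification = code;
}

static void report_msr_failure(struct vmcs12_model *v, uint32_t failed_index)
{
    v->exit_qualification = failed_index;
}
```

Keeping the two reporting paths separate means neither value has to travel through a variable typed after the exit qualification field itself.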
  2. 13 May, 2020 30 commits