1. 11 Jun, 2022 10 commits
  2. 10 Jun, 2022 8 commits
  3. 09 Jun, 2022 22 commits
    • Merge branch 'kvm-5.20-early' · e15f5e6f
      Paolo Bonzini authored
      s390:
      
      * add an interface to provide a hypervisor dump for secure guests
      
      * improve selftests to show tests
      
      x86:
      
      * Intel IPI virtualization
      
      * Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS
      
      * PEBS virtualization
      
      * Simplify PMU emulation by just using PERF_TYPE_RAW events
      
      * More accurate event reinjection on SVM (avoid retrying instructions)
      
      * Allow getting/setting the state of the speaker port data bit
      
      * Rewrite gfn-pfn cache refresh
      
      * Refuse starting the module if VM-Entry/VM-Exit controls are inconsistent
      
      * "Notify" VM exit
      e15f5e6f
    • KVM: selftests: Restrict test region to 48-bit physical addresses when using nested · e0f3f46e
      David Matlack authored
      The selftests nested code only supports 4-level paging at the moment.
      This means it cannot map nested guest physical addresses with more than
      48 bits. Allow perf_test_util nested mode to work on hosts with more
      than 48 physical address bits by restricting the guest test region to
      48 bits.
      
      While here, opportunistically fix an off-by-one error when dealing with
      vm_get_max_gfn(). perf_test_util.c was treating this as the maximum
      number of GFNs, rather than the maximum allowed GFN. This didn't result
      in any correctness issues, but it did end up shifting the test region
      down slightly when using huge pages.
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220520233249.3776001-12-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e0f3f46e
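      As a rough illustration of the two changes described above (helper and
      field names such as vm_get_max_gfn(), vm->page_shift and run_vcpus_in_l2
      are taken from or approximated after the selftests code; this is a
      sketch, not the actual patch):

        /* vm_get_max_gfn() returns the highest usable GFN (the last valid
         * GFN), not the number of GFNs. */
        uint64_t max_gfn = vm_get_max_gfn(vm);

        /* The nested paging code only builds 4-level tables, so keep guest
         * physical addresses below 48 bits when running vCPUs in L2. */
        uint64_t limit_48bit = (1ULL << (48 - vm->page_shift)) - 1;

        if (run_vcpus_in_l2 && max_gfn > limit_48bit)
                max_gfn = limit_48bit;

        /* The number of usable pages is max_gfn + 1 (the off-by-one fix). */
        uint64_t num_guest_pages = max_gfn + 1;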
    • KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2 · 71d48966
      David Matlack authored
      Add an option to dirty_log_perf_test that configures the vCPUs to run in
      L2 instead of L1. This makes it possible to benchmark the dirty logging
      performance of nested virtualization, which is particularly interesting
      because KVM must shadow L1's EPT/NPT tables.
      
      For now this support only works on x86_64 CPUs with VMX. Otherwise
      passing -n results in the test being skipped.
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220520233249.3776001-11-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      71d48966
    • KVM: selftests: Clean up LIBKVM files in Makefile · cf97d5e9
      David Matlack authored
      Break up the long lines for LIBKVM and alphabetize each architecture.
      This makes reading the Makefile easier, and will make reading diffs to
      LIBKVM easier.
      
      No functional change intended.
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220520233249.3776001-10-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      cf97d5e9
    • KVM: selftests: Link selftests directly with lib object files · cdc979da
      David Matlack authored
      The linker does obey strong/weak symbols when linking static libraries;
      it simply resolves an undefined symbol to the first-encountered symbol.
      This means that defining __weak arch-generic functions and then defining
      arch-specific strong functions to override them in libkvm will not
      always work.
      
      More specifically, if we have:
      
      lib/generic.c:
      
        void __weak foo(void)
        {
                pr_info("weak\n");
        }
      
        void bar(void)
        {
                foo();
        }
      
      lib/x86_64/arch.c:
      
        void foo(void)
        {
                pr_info("strong\n");
        }
      
      Then a selftest that calls bar() will print "weak". Now if you make
      generic.o explicitly depend on arch.o (e.g. add a function to arch.c that
      is called directly from generic.c) it will print "strong". In other
      words, it seems that the linker is free to throw out arch.o when linking
      because generic.o does not explicitly depend on it, which causes the
      linker to lose the strong symbol.
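      For completeness, the selftest side of the example above could be as
      simple as (illustrative only, not a file from the patch):

        selftest.c:

          int main(void)
          {
                  bar();  /* prints "weak" if arch.o is dropped from the link */
                  return 0;
          }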
      
      One solution is to link libkvm.a with --whole-archive so that the linker
      doesn't throw away object files it thinks are unnecessary. However that
      is a bit difficult to plumb since we are using the common selftests
      makefile rules. An easier solution is to drop libkvm.a and just link
      selftests with all the .o files that were originally in libkvm.a.
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220520233249.3776001-9-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      cdc979da
    • KVM: selftests: Drop unnecessary rule for STATIC_LIBS · acf57736
      David Matlack authored
      Drop the "all: $(STATIC_LIBS)" rule. The KVM selftests already depend
      on $(STATIC_LIBS), so there is no reason to have an extra "all" rule.
      Suggested-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220520233249.3776001-8-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      acf57736
    • KVM: selftests: Add a helper to check EPT/VPID capabilities · c363d959
      David Matlack authored
      Create a small helper function to check if a given EPT/VPID capability
      is supported. This will be re-used in a follow-up commit to check for 1G
      page support.
      
      No functional change intended.
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220520233249.3776001-7-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c363d959
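      The helper can boil down to a one-line check of the EPT/VPID capability
      MSR; a sketch along these lines (ept_vpid_cap_supported is a guessed
      name; rdmsr() and MSR_IA32_VMX_EPT_VPID_CAP are as used elsewhere in the
      selftests):

        static inline bool ept_vpid_cap_supported(uint64_t mask)
        {
                return rdmsr(MSR_IA32_VMX_EPT_VPID_CAP) & mask;
        }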
    • KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h · b6c086d0
      David Matlack authored
      This is a VMX-related macro so move it to vmx.h. While here, open code
      the mask like the rest of the VMX bitmask macros.
      
      No functional change intended.
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220520233249.3776001-6-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b6c086d0
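      Open-coded like the other VMX bitmask macros in vmx.h, the definition
      would look roughly like this (bit 21 of IA32_VMX_EPT_VPID_CAP reports
      support for EPT accessed/dirty flags):

        #define VMX_EPT_VPID_CAP_AD_BITS        (1ULL << 21)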
    • KVM: selftests: Refactor nested_map() to specify target level · ce690e9c
      David Matlack authored
      Refactor nested_map() to specify that it explicitly wants 4K mappings
      (the existing behavior) and push the implementation down into
      __nested_map(), which can be used in subsequent commits to create huge
      page mappings.
      
      No functional change intended.
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220520233249.3776001-5-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      ce690e9c
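      A sketch of the resulting split (signatures approximate; the real
      functions take the selftests' vmx_pages/kvm_vm types and a level
      argument as described):

        /* Map a range of nested guest physical memory at an explicit level. */
        void __nested_map(struct vmx_pages *vmx, struct kvm_vm *vm,
                          uint64_t nested_paddr, uint64_t paddr,
                          uint64_t size, int level);

        /* Keep the existing behavior: always create 4K mappings. */
        void nested_map(struct vmx_pages *vmx, struct kvm_vm *vm,
                        uint64_t nested_paddr, uint64_t paddr, uint64_t size)
        {
                __nested_map(vmx, vm, nested_paddr, paddr, size, PG_LEVEL_4K);
        }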
    • KVM: selftests: Drop stale function parameter comment for nested_map() · b8ca01ea
      David Matlack authored
      nested_map() does not take a parameter named eptp_memslot. Drop the
      comment referring to it.
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220520233249.3776001-4-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b8ca01ea
    • KVM: selftests: Add option to create 2M and 1G EPT mappings · c5a0ccec
      David Matlack authored
      The current EPT mapping code in the selftests only supports mapping 4K
      pages. This commit extends that support with an option to map at 2M or
      1G. This will be used in a future commit to create large page mappings
      to test eager page splitting.
      
      No functional change intended.
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220520233249.3776001-3-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c5a0ccec
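      With the target level plumbed through, the mapping size is a simple
      function of the level; a hedged sketch of the arithmetic (4K at level 1,
      2M at level 2, 1G at level 3, each level adding 9 address bits):

        static uint64_t ept_page_size(int level)
        {
                return 1ULL << (12 + 9 * (level - 1));
        }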
    • KVM: selftests: Replace x86_page_size with PG_LEVEL_XX · 4ee602e7
      David Matlack authored
      x86_page_size is an enum used to communicate the desired page size with
      which to map a range of memory. Under the hood they just encode the
      desired level at which to map the page. This ends up being clunky in a
      few ways:
      
       - The name suggests it encodes the size of the page rather than the
         level.
       - In other places in x86_64/processor.c we just use a raw int to encode
         the level.
      
      Simplify this by adopting the kernel style of PG_LEVEL_XX enums and pass
      around raw ints when referring to the level. This makes the code easier
      to understand since these macros are very common in KVM MMU code.
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220520233249.3776001-2-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4ee602e7
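      The kernel-style enum referred to here is roughly the following (values
      mirror the kernel's PG_LEVEL_* definitions, where the 4K level is 1):

        enum pg_level {
                PG_LEVEL_NONE,
                PG_LEVEL_4K,
                PG_LEVEL_2M,
                PG_LEVEL_1G,
                PG_LEVEL_512G,
                PG_LEVEL_NUM
        };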
    • KVM: x86: SVM: fix nested PAUSE filtering when L0 intercepts PAUSE · e3cdaab5
      Paolo Bonzini authored
      Commit 74fd41ed ("KVM: x86: nSVM: support PAUSE filtering when L0
      doesn't intercept PAUSE") introduced passthrough support for nested PAUSE
      filtering when the host doesn't intercept PAUSE (i.e. when PAUSE
      interception is disabled with the kvm module param, or with
      '-overcommit cpu-pm=on').

      Before this commit, L1 KVM didn't intercept PAUSE at all; afterwards,
      the feature was exposed as supported by KVM cpuid unconditionally, thus
      L1 could try to use it even when L0 KVM can't really support it.
      
      In this case the fallback caused KVM to intercept each PAUSE instruction;
      in some cases, such interception can slow down the nested guest so much
      that it can fail to boot.  Note that before the problematic commit KVM
      was already setting both thresholds to 0 in vmcb02, but after the first
      userspace VM exit shrink_ple_window was called and would reset the
      pause_filter_count to the default value.
      
      To fix this, change the fallback strategy - ignore the guest threshold
      values, but use/update the host threshold values unless the guest
      specifically requests disabling PAUSE filtering (either simple or
      advanced).
      
      Also fix a minor bug: on nested VM exit, when the PAUSE filter counter
      was copied back to vmcb01, a dirty bit was not set.
      
      Thanks a lot to Suravee Suthikulpanit for debugging this!
      
      Fixes: 74fd41ed ("KVM: x86: nSVM: support PAUSE filtering when L0 doesn't intercept PAUSE")
      Reported-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Co-developed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220518072709.730031-1-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e3cdaab5
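      A hedged sketch of the chosen fallback on nested VMRUN (hypothetical
      helper name l1_disabled_pause_filtering(); the vmcb fields and
      vmcb_mark_dirty() exist in KVM's SVM code, but this is not the actual
      diff):

        /* Ignore L1's PAUSE filter thresholds and keep using/updating L0's,
         * unless L1 explicitly disabled PAUSE filtering. */
        if (l1_disabled_pause_filtering(vmcb12)) {
                vmcb02->control.pause_filter_count  = 0;
                vmcb02->control.pause_filter_thresh = 0;
        } else {
                vmcb02->control.pause_filter_count  = vmcb01->control.pause_filter_count;
                vmcb02->control.pause_filter_thresh = vmcb01->control.pause_filter_thresh;
        }
        vmcb_mark_dirty(vmcb02, VMCB_INTERCEPTS);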
    • KVM: x86: SVM: drop preempt-safe wrappers for avic_vcpu_load/put · ba8ec273
      Maxim Levitsky authored
      Now that these functions are always called with preemption disabled,
      remove the preempt_disable()/preempt_enable() pair inside them.
      
      No functional change intended.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220606180829.102503-8-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      ba8ec273
    • KVM: x86: disable preemption around the call to kvm_arch_vcpu_{un|}blocking · 18869f26
      Maxim Levitsky authored
      On SVM/AVIC, if preemption happens right after the call to finish_rcuwait
      but before the call to kvm_arch_vcpu_unblocking, the preemption itself
      will re-enable AVIC, and then we will try to re-enable it again
      in kvm_arch_vcpu_unblocking, which will lead to a warning
      in __avic_vcpu_load.

      The same problem can happen if the vCPU is preempted right after the call
      to kvm_arch_vcpu_blocking but before the call to prepare_to_rcuwait;
      in this case, we will end up with AVIC enabled during sleep - Ooops.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220606180829.102503-7-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      18869f26
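      A minimal sketch of the fix in the generic halt/block path (simplified,
      not the exact upstream diff):

        /* Keep the arch callback and the wait bookkeeping in one
         * preempt-disabled region, so a preempt notifier cannot run in
         * between and toggle AVIC behind our back. */
        preempt_disable();
        kvm_arch_vcpu_blocking(vcpu);
        prepare_to_rcuwait(&vcpu->wait);
        preempt_enable();

        /* ... sleep until there is a reason to wake up ... */

        preempt_disable();
        finish_rcuwait(&vcpu->wait);
        kvm_arch_vcpu_unblocking(vcpu);
        preempt_enable();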
    • KVM: x86: disable preemption while updating apicv inhibition · 66c768d3
      Maxim Levitsky authored
      Currently nothing prevents preemption in kvm_vcpu_update_apicv.
      
      On SVM, if the preemption happens after we update
      vcpu->arch.apicv_active, the preemption itself will
      'update' the inhibition, since the AVIC will first be disabled
      on vCPU unload and then enabled when the current task
      is loaded again.

      Then we will try to update it again, which will lead to a warning
      in __avic_vcpu_load that the AVIC is already enabled.
      
      Fix this by disabling preemption in this code.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220606180829.102503-6-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      66c768d3
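      A rough sketch of the fix in kvm_vcpu_update_apicv() (simplified; the
      real function does more bookkeeping around the update):

        preempt_disable();
        /* Toggle the flag and refresh the hardware state without allowing a
         * preempt notifier to observe them out of sync. */
        vcpu->arch.apicv_active = activate;
        kvm_apic_update_apicv(vcpu);
        static_call(kvm_x86_refresh_apicv_exec_ctrl)(vcpu);
        preempt_enable();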
    • KVM: x86: SVM: fix avic_kick_target_vcpus_fast · 603ccef4
      Maxim Levitsky authored
      There are two issues in avic_kick_target_vcpus_fast:
      
      1. It is legal to issue an IPI request with APIC_DEST_NOSHORT
         and a physical destination of 0xFF (or 0xFFFFFFFF in case of x2apic),
         which must be treated as a broadcast destination.
      
         Fix this by explicitly checking for it.
         Also don't use 'index' in this case as it gives no new information.
      
      2. It is legal to issue a logical IPI request to more than one target.
         The index field only provides the index into the physical ID table of
         the first such target, and therefore it can't be used before we are
         sure that only a single target was addressed.
      
         Instead, parse the ICRL/ICRH, double check that a unicast interrupt
         was requested, and use that info to figure out the physical id
         of the target vCPU.
         At that point there is no need to use the index field as well.
      
      In addition to fixing the above issues, also skip the call to
      kvm_apic_match_dest.

      It is possible to do this now because, as long as AVIC is not
      inhibited, it is guaranteed that none of the vCPUs has changed its
      apic id from its default value.
      
      This fixes booting a Windows guest with AVIC enabled, because Windows
      uses an IPI with a 0xFF destination and no destination shorthand.
      
      Fixes: 7223fd2d ("KVM: SVM: Use target APIC ID to complete AVIC IRQs when possible")
      Cc: stable@vger.kernel.org
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220606180829.102503-5-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      603ccef4
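      A hedged sketch of the broadcast check from point 1 (constants
      APIC_BROADCAST/X2APIC_BROADCAST and apic_x2apic_mode() are from KVM's
      lapic code; the surrounding logic is simplified):

        /* In xAPIC mode the destination lives in bits 24-31 of ICRH. */
        u32 dest = apic_x2apic_mode(source) ? icrh : icrh >> 24;
        u32 broadcast = apic_x2apic_mode(source) ? X2APIC_BROADCAST :
                                                   APIC_BROADCAST;

        /* A physical-destination IPI to the broadcast ID with no shorthand
         * must hit every vCPU, so bail out to the slow path. */
        if (dest == broadcast)
                return -EINVAL;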
    • KVM: x86: SVM: remove avic's broken code that updated APIC ID · f5f9089f
      Maxim Levitsky authored
      AVIC is now inhibited if the guest changes the apic id,
      and therefore this code is no longer needed.
      
      There are several ways this code was broken, including:
      
      1. A vCPU was only allowed to change its apic id to an apic id
      of an existing vCPU.

      2. After such a change, the vCPU whose apic id entry was overwritten
      could not correctly change its own apic id, because its own
      entry had already been overwritten.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220606180829.102503-4-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f5f9089f
    • KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base · 3743c2f0
      Maxim Levitsky authored
      Neither of these settings should be changed by the guest, and it is
      a burden to support them in the acceleration code, so just inhibit
      APICv/AVIC instead.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220606180829.102503-3-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      3743c2f0
    • KVM: x86: document AVIC/APICv inhibit reasons · a9603ae0
      Maxim Levitsky authored
      These days there are too many AVIC/APICv inhibit
      reasons, and it doesn't hurt to have some documentation
      for them.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220606180829.102503-2-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      a9603ae0
    • KVM: x86/mmu: Set memory encryption "value", not "mask", in shadow PDPTRs · d2263de1
      Yuan Yao authored
      Assign shadow_me_value, not shadow_me_mask, to PAE root entries,
      a.k.a. shadow PDPTRs, when host memory encryption is supported.  The
      "mask" is the set of all possible memory encryption bits, e.g. MKTME
      KeyIDs, whereas "value" holds the actual value that needs to be
      stuffed into host page tables.
      
      Using shadow_me_mask results in a failed VM-Entry due to setting
      reserved PA bits in the PDPTRs, and ultimately causes an OOPS due to
      physical addresses with non-zero MKTME bits sending to_shadow_page()
      into the weeds:
      
      set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state.
      BUG: unable to handle page fault for address: ffd43f00063049e8
      PGD 86dfd8067 P4D 0
      Oops: 0000 [#1] PREEMPT SMP
      RIP: 0010:mmu_free_root_page+0x3c/0x90 [kvm]
       kvm_mmu_free_roots+0xd1/0x200 [kvm]
       __kvm_mmu_unload+0x29/0x70 [kvm]
       kvm_mmu_unload+0x13/0x20 [kvm]
       kvm_arch_destroy_vm+0x8a/0x190 [kvm]
       kvm_put_kvm+0x197/0x2d0 [kvm]
       kvm_vm_release+0x21/0x30 [kvm]
       __fput+0x8e/0x260
       ____fput+0xe/0x10
       task_work_run+0x6f/0xb0
       do_exit+0x327/0xa90
       do_group_exit+0x35/0xa0
       get_signal+0x911/0x930
       arch_do_signal_or_restart+0x37/0x720
       exit_to_user_mode_prepare+0xb2/0x140
       syscall_exit_to_user_mode+0x16/0x30
       do_syscall_64+0x4e/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fixes: e54f1ff2 ("KVM: x86/mmu: Add shadow_me_value and repurpose shadow_me_mask")
      Signed-off-by: Yuan Yao <yuan.yao@intel.com>
      Reviewed-by: Kai Huang <kai.huang@intel.com>
      Message-Id: <20220608012015.19566-1-yuan.yao@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d2263de1
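      A hedged, simplified sketch of the kind of one-line change described
      (the PAE root assignment is approximated, not the exact diff):

        /* Stuff the actual encryption value, not the mask of all possible
         * encryption bits, into the PAE root (shadow PDPTR) entry. */
        mmu->pae_root[i] = root | PT_PRESENT_MASK | shadow_me_value;
                                            /* was: ... | shadow_me_mask */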
    • Merge tag 'kvmarm-fixes-5.19-1' of... · 76599a47
      Paolo Bonzini authored
      Merge tag 'kvmarm-fixes-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
      
      KVM/arm64 fixes for 5.19, take #1
      
      - Properly reset the SVE/SME flags on vcpu load
      
      - Fix a vgic-v2 regression regarding accessing the pending
        state of a HW interrupt from userspace (and make the code
        common with vgic-v3)
      
      - Fix access to the idreg range for protected guests
      
      - Ignore 'kvm-arm.mode=protected' when using VHE
      
      - Return an error from kvm_arch_init_vm() on allocation failure
      
      - A bunch of small cleanups (comments, annotations, indentation)
      76599a47