1. 19 Jan, 2022 27 commits
  2. 18 Jan, 2022 1 commit
    • Marcelo Tosatti's avatar
      KVM: VMX: switch blocked_vcpu_on_cpu_lock to raw spinlock · 5f02ef74
      Marcelo Tosatti authored
      blocked_vcpu_on_cpu_lock is taken from hard interrupt context
      (pi_wakeup_handler), therefore it cannot sleep.
      
      Switch it to a raw spinlock.
      
      Fixes:
      
      [41297.066254] BUG: scheduling while atomic: CPU 0/KVM/635218/0x00010001
      [41297.066323] Preemption disabled at:
      [41297.066324] [<ffffffff902ee47f>] irq_enter_rcu+0xf/0x60
      [41297.066339] Call Trace:
      [41297.066342]  <IRQ>
      [41297.066346]  dump_stack_lvl+0x34/0x44
      [41297.066353]  ? irq_enter_rcu+0xf/0x60
      [41297.066356]  __schedule_bug.cold+0x7d/0x8b
      [41297.066361]  __schedule+0x439/0x5b0
      [41297.066365]  ? task_blocks_on_rt_mutex.constprop.0.isra.0+0x1b0/0x440
      [41297.066369]  schedule_rtlock+0x1e/0x40
      [41297.066371]  rtlock_slowlock_locked+0xf1/0x260
      [41297.066374]  rt_spin_lock+0x3b/0x60
      [41297.066378]  pi_wakeup_handler+0x31/0x90 [kvm_intel]
      [41297.066388]  sysvec_kvm_posted_intr_wakeup_ipi+0x9d/0xd0
      [41297.066392]  </IRQ>
      [41297.066392]  asm_sysvec_kvm_posted_intr_wakeup_ipi+0x12/0x20
      ...
      Signed-off-by: default avatarMarcelo Tosatti <mtosatti@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      5f02ef74
  3. 17 Jan, 2022 6 commits
    • Like Xu's avatar
      KVM: x86: Making the module parameter of vPMU more common · 4732f244
      Like Xu authored
      The new module parameter to control PMU virtualization should apply
      to Intel as well as AMD, for situations where userspace is not trusted.
      If the module parameter allows PMU virtualization, there could be a
      new KVM_CAP or guest CPUID bits whereby userspace can enable/disable
      PMU virtualization on a per-VM basis.
      
      If the module parameter does not allow PMU virtualization, there
      should be no userspace override, since we have no precedent for
      authorizing that kind of override. If it's false, other counter-based
      profiling features (such as LBR including the associated CPUID bits
      if any) will not be exposed.
      
      Change its name from "pmu" to "enable_pmu" as we have temporary
      variables with the same name in our code like "struct kvm_pmu *pmu".
      
      Fixes: b1d66dad ("KVM: x86/svm: Add module param to control PMU virtualization")
      Suggested-by : Jim Mattson <jmattson@google.com>
      Signed-off-by: default avatarLike Xu <likexu@tencent.com>
      Message-Id: <20220111073823.21885-1-likexu@tencent.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      4732f244
    • Vitaly Kuznetsov's avatar
      KVM: selftests: Test KVM_SET_CPUID2 after KVM_RUN · ecebb966
      Vitaly Kuznetsov authored
      KVM forbids KVM_SET_CPUID2 after KVM_RUN was performed on a vCPU unless
      the supplied CPUID data is equal to what was previously set. Test this.
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20220117150542.2176196-5-vkuznets@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      ecebb966
    • Vitaly Kuznetsov's avatar
      KVM: selftests: Rename 'get_cpuid_test' to 'cpuid_test' · 9e6d484f
      Vitaly Kuznetsov authored
      In preparation to reusing the existing 'get_cpuid_test' for testing
      "KVM_SET_CPUID{,2} after KVM_RUN" rename it to 'cpuid_test' to avoid
      the confusion.
      
      No functional change intended.
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20220117150542.2176196-4-vkuznets@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      9e6d484f
    • Vitaly Kuznetsov's avatar
      KVM: x86: Partially allow KVM_SET_CPUID{,2} after KVM_RUN · c6617c61
      Vitaly Kuznetsov authored
      Commit feb627e8 ("KVM: x86: Forbid KVM_SET_CPUID{,2} after KVM_RUN")
      forbade changing CPUID altogether but unfortunately this is not fully
      compatible with existing VMMs. In particular, QEMU reuses vCPU fds for
      CPU hotplug after unplug and it calls KVM_SET_CPUID2. Instead of full ban,
      check whether the supplied CPUID data is equal to what was previously set.
      Reported-by: default avatarIgor Mammedov <imammedo@redhat.com>
      Fixes: feb627e8 ("KVM: x86: Forbid KVM_SET_CPUID{,2} after KVM_RUN")
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20220117150542.2176196-3-vkuznets@redhat.com>
      Cc: stable@vger.kernel.org
      [Do not call kvm_find_cpuid_entry repeatedly. - Paolo]
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      c6617c61
    • Vitaly Kuznetsov's avatar
      KVM: x86: Do runtime CPUID update before updating vcpu->arch.cpuid_entries · ee3a5f9e
      Vitaly Kuznetsov authored
      kvm_update_cpuid_runtime() mangles CPUID data coming from userspace
      VMM after updating 'vcpu->arch.cpuid_entries', this makes it
      impossible to compare an update with what was previously
      supplied. Introduce __kvm_update_cpuid_runtime() version which can be
      used to tweak the input before it goes to 'vcpu->arch.cpuid_entries'
      so the upcoming update check can compare tweaked data.
      
      No functional change intended.
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20220117150542.2176196-2-vkuznets@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      ee3a5f9e
    • Like Xu's avatar
      KVM: x86/pmu: Fix available_event_types check for REF_CPU_CYCLES event · a2186448
      Like Xu authored
      According to CPUID 0x0A.EBX bit vector, the event [7] should be the
      unrealized event "Topdown Slots" instead of the *kernel* generalized
      common hardware event "REF_CPU_CYCLES", so we need to skip the cpuid
      unavaliblity check in the intel_pmc_perf_hw_id() for the last
      REF_CPU_CYCLES event and update the confusing comment.
      
      If the event is marked as unavailable in the Intel guest CPUID
      0AH.EBX leaf, we need to avoid any perf_event creation, whether
      it's a gp or fixed counter. To distinguish whether it is a rejected
      event or an event that needs to be programmed with PERF_TYPE_RAW type,
      a new special returned value of "PERF_COUNT_HW_MAX + 1" is introduced.
      
      Fixes: 62079d8a ("KVM: PMU: add proper support for fixed counter 2")
      Signed-off-by: default avatarLike Xu <likexu@tencent.com>
      Message-Id: <20220105051509.69437-1-likexu@tencent.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      a2186448
  4. 14 Jan, 2022 6 commits
    • Yang Zhong's avatar
      x86/fpu: Fix inline prefix warnings · c862dcd1
      Yang Zhong authored
      Fix sparse warnings in xstate and remove inline prefix.
      
      Fixes: 980fe2fd ("x86/fpu: Extend fpu_xstate_prctl() with guest permissions")
      Signed-off-by: default avatarYang Zhong <yang.zhong@intel.com>
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      Message-Id: <20220113180825.322333-1-yang.zhong@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      c862dcd1
    • Yang Zhong's avatar
      selftest: kvm: Add amx selftest · bf70636d
      Yang Zhong authored
      This selftest covers two aspects of AMX.  The first is triggering #NM
      exception and checking the MSR XFD_ERR value.  The second case is
      loading tile config and tile data into guest registers and trapping to
      the host side for a complete save/load of the guest state.  TMM0
      is also checked against memory data after save/restore.
      Signed-off-by: default avatarYang Zhong <yang.zhong@intel.com>
      Message-Id: <20211223145322.2914028-4-yang.zhong@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      bf70636d
    • Yang Zhong's avatar
      selftest: kvm: Move struct kvm_x86_state to header · 6559b4a5
      Yang Zhong authored
      Those changes can avoid dereferencing pointer compile issue
      when amx_test.c reference state->xsave.
      
      Move struct kvm_x86_state definition to processor.h.
      Signed-off-by: default avatarYang Zhong <yang.zhong@intel.com>
      Message-Id: <20211223145322.2914028-3-yang.zhong@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      6559b4a5
    • Paolo Bonzini's avatar
      selftest: kvm: Reorder vcpu_load_state steps for AMX · 551447cf
      Paolo Bonzini authored
      For AMX support it is recommended to load XCR0 after XFD, so
      that KVM does not see XFD=0, XCR=1 for a save state that will
      eventually be disabled (which would lead to premature allocation
      of the space required for that save state).
      
      It is also required to load XSAVE data after XCR0 and XFD, so
      that KVM can trigger allocation of the extra space required to
      store AMX state.
      
      Adjust vcpu_load_state to obey these new requirements.
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarYang Zhong <yang.zhong@intel.com>
      Message-Id: <20211223145322.2914028-2-yang.zhong@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      551447cf
    • Kevin Tian's avatar
      kvm: x86: Disable interception for IA32_XFD on demand · b5274b1b
      Kevin Tian authored
      Always intercepting IA32_XFD causes non-negligible overhead when this
      register is updated frequently in the guest.
      
      Disable r/w emulation after intercepting the first WRMSR(IA32_XFD)
      with a non-zero value.
      
      Disable WRMSR emulation implies that IA32_XFD becomes out-of-sync
      with the software states in fpstate and the per-cpu xfd cache. This
      leads to two additional changes accordingly:
      
        - Call fpu_sync_guest_vmexit_xfd_state() after vm-exit to bring
          software states back in-sync with the MSR, before handle_exit_irqoff()
          is called.
      
        - Always trap #NM once write interception is disabled for IA32_XFD.
          The #NM exception is rare if the guest doesn't use dynamic
          features. Otherwise, there is at most one exception per guest
          task given a dynamic feature.
      
      p.s. We have confirmed that SDM is being revised to say that
      when setting IA32_XFD[18] the AMX register state is not guaranteed
      to be preserved. This clarification avoids adding mess for a creative
      guest which sets IA32_XFD[18]=1 before saving active AMX state to
      its own storage.
      Signed-off-by: default avatarKevin Tian <kevin.tian@intel.com>
      Signed-off-by: default avatarJing Liu <jing2.liu@intel.com>
      Signed-off-by: default avatarYang Zhong <yang.zhong@intel.com>
      Message-Id: <20220105123532.12586-22-yang.zhong@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      b5274b1b
    • Thomas Gleixner's avatar
      x86/fpu: Provide fpu_sync_guest_vmexit_xfd_state() · 5429cead
      Thomas Gleixner authored
      KVM can disable the write emulation for the XFD MSR when the vCPU's fpstate
      is already correctly sized to reduce the overhead.
      
      When write emulation is disabled the XFD MSR state after a VMEXIT is
      unknown and therefore not in sync with the software states in fpstate and
      the per CPU XFD cache.
      
      Provide fpu_sync_guest_vmexit_xfd_state() which has to be invoked after a
      VMEXIT before enabling interrupts when write emulation is disabled for the
      XFD MSR.
      
      It could be invoked unconditionally even when write emulation is enabled
      for the price of a pointless MSR read.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarJing Liu <jing2.liu@intel.com>
      Signed-off-by: default avatarYang Zhong <yang.zhong@intel.com>
      Message-Id: <20220105123532.12586-21-yang.zhong@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      5429cead