An error occurred fetching the project authors.
- 08 Jun, 2022 6 commits
-
-
Like Xu authored
Splitting the logic for determining the guest values is unnecessarily confusing, and potentially fragile. Perf should have full knowledge and control of what values are loaded for the guest. If we change .guest_get_msrs() to take a struct kvm_pmu pointer, then it can generate the full set of guest values by grabbing guest ds_area and pebs_data_cfg. Alternatively, .guest_get_msrs() could take the desired guest MSR values directly (ds_area and pebs_data_cfg), but kvm_pmu is vendor agnostic, so we don't see any reason to not just pass the pointer. Suggested-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Like Xu <like.xu@linux.intel.com> Acked-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Message-Id: <20220411101946.20262-4-likexu@tencent.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Chao Gao authored
With IPI virtualization enabled, the processor emulates writes to APIC registers that would send IPIs. The processor sets the bit corresponding to the vector in target vCPU's PIR and may send a notification (IPI) specified by NDST and NV fields in target vCPU's Posted-Interrupt Descriptor (PID). It is similar to what IOMMU engine does when dealing with posted interrupt from devices. A PID-pointer table is used by the processor to locate the PID of a vCPU with the vCPU's APIC ID. The table size depends on maximum APIC ID assigned for current VM session from userspace. Allocating memory for PID-pointer table is deferred to vCPU creation, because irqchip mode and VM-scope maximum APIC ID is settled at that point. KVM can skip PID-pointer table allocation if !irqchip_in_kernel(). Like VT-d PI, if a vCPU goes to blocked state, VMM needs to switch its notification vector to wakeup vector. This can ensure that when an IPI for blocked vCPUs arrives, VMM can get control and wake up blocked vCPUs. And if a VCPU is preempted, its posted interrupt notification is suppressed. Note that IPI virtualization can only virualize physical-addressing, flat mode, unicast IPIs. Sending other IPIs would still cause a trap-like APIC-write VM-exit and need to be handled by VMM. Signed-off-by:
Chao Gao <chao.gao@intel.com> Signed-off-by:
Zeng Guang <guang.zeng@intel.com> Message-Id: <20220419154510.11938-1-guang.zeng@intel.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Zeng Guang authored
Remove the condition check cpu_has_secondary_exec_ctrls(). Calling vmx_refresh_apicv_exec_ctrl() premises secondary controls activated and VMCS fields related to APICv valid as well. If it's invoked in wrong circumstance at the worst case, VMX operation will report VMfailValid error without further harmful impact and just functions as if all the secondary controls were 0. Suggested-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Zeng Guang <guang.zeng@intel.com> Message-Id: <20220419153604.11786-1-guang.zeng@intel.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Robert Hoo authored
Add tertiary_exec_control field report in dump_vmcs(). Meanwhile, reorganize the dump output of VMCS category as follows. Before change: *** Control State *** PinBased=0x000000ff CPUBased=0xb5a26dfa SecondaryExec=0x061037eb EntryControls=0000d1ff ExitControls=002befff After change: *** Control State *** CPUBased=0xb5a26dfa SecondaryExec=0x061037eb TertiaryExec=0x0000000000000010 PinBased=0x000000ff EntryControls=0000d1ff ExitControls=002befff Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by:
Robert Hoo <robert.hu@linux.intel.com> Signed-off-by:
Zeng Guang <guang.zeng@intel.com> Message-Id: <20220419153441.11687-1-guang.zeng@intel.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Robert Hoo authored
Check VMX features on tertiary execution control in VMCS config setup. Sub-features in tertiary execution control to be enabled are adjusted according to hardware capabilities although no sub-feature is enabled in this patch. EVMCSv1 doesn't support tertiary VM-execution control, so disable it when EVMCSv1 is in use. And define the auxiliary functions for Tertiary control field here, using the new BUILD_CONTROLS_SHADOW(). Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by:
Robert Hoo <robert.hu@linux.intel.com> Signed-off-by:
Zeng Guang <guang.zeng@intel.com> Message-Id: <20220419153400.11642-1-guang.zeng@intel.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
In the IRQ injection tracepoint, differentiate between Hard IRQs and Soft "IRQs", i.e. interrupts that are reinjected after incomplete delivery of a software interrupt from an INTn instruction. Tag reinjected interrupts as such, even though the information is usually redundant since soft interrupts are only ever reinjected by KVM. Though rare in practice, a hard IRQ can be reinjected. Signed-off-by:
Sean Christopherson <seanjc@google.com> [MSS: change "kvm_inj_virq" event "reinjected" field type to bool] Signed-off-by:
Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <9664d49b3bd21e227caa501cff77b0569bebffe2.1651440202.git.maciej.szmigiero@oracle.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- 27 May, 2022 1 commit
-
-
Bo Liu authored
Rather than waiting for the bots to fix these one-by-one, fix all occurences of "the the" throughout arch/x86. Signed-off-by:
Bo Liu <liubo03@inspur.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20220527061400.5694-1-liubo03@inspur.com
-
- 25 May, 2022 3 commits
-
-
Jim Mattson authored
Change the printf format character from 'd' to 'u' for the VM-instruction error in vmwrite_error(). Fixes: 6aa8b732 ("[PATCH] kvm: userspace interface") Reported-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Jim Mattson <jmattson@google.com> Message-Id: <20220510224035.1792952-2-jmattson@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
David Matlack authored
Include the value of the "VM-instruction error" field from the current VMCS (if any) in the error message for VMCLEAR and VMPTRLD, since each of these instructions may result in more than one VM-instruction error. Previously, this field was only reported for VMWRITE errors. Signed-off-by:
David Matlack <dmatlack@google.com> [Rebased and refactored code; dropped the error number for INVVPID and INVEPT; reworded commit message.] Signed-off-by:
Jim Mattson <jmattson@google.com> Message-Id: <20220510224035.1792952-1-jmattson@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Yanfei Xu authored
When kernel handles the vm-exit caused by external interrupts and NMI, it always sets kvm_intr_type to tell if it's dealing an IRQ or NMI. For the PMI scenario, it could be IRQ or NMI. However, intel_pt PMIs are only generated for HARDWARE perf events, and HARDWARE events are always configured to generate NMIs. Use kvm_handling_nmi_from_guest() to precisely identify if the intel_pt PMI came from the guest; this avoids false positives if an intel_pt PMI/NMI arrives while the host is handling an unrelated IRQ VM-Exit. Fixes: db215756 ("KVM: x86: More precisely identify NMI from guest when handling PMI") Signed-off-by:
Yanfei Xu <yanfei.xu@intel.com> Message-Id: <20220523140821.1345605-1-yanfei.xu@intel.com> Cc: stable@vger.kernel.org Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- 12 May, 2022 1 commit
-
-
Kai Huang authored
Intel MKTME KeyID bits (including Intel TDX private KeyID bits) should never be set to SPTE. Set shadow_me_value to 0 and shadow_me_mask to include all MKTME KeyID bits to include them to shadow_zero_check. Signed-off-by:
Kai Huang <kai.huang@intel.com> Message-Id: <27bc10e97a3c0b58a4105ff9107448c190328239.1650363789.git.kai.huang@intel.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- 06 May, 2022 1 commit
-
-
Sean Christopherson authored
Exit to userspace with an emulation error if KVM encounters an injected exception with invalid guest state, in addition to the existing check of bailing if there's a pending exception (KVM doesn't support emulating exceptions except when emulating real mode via vm86). In theory, KVM should never get to such a situation as KVM is supposed to exit to userspace before injecting an exception with invalid guest state. But in practice, userspace can intervene and manually inject an exception and/or stuff registers to force invalid guest state while a previously injected exception is awaiting reinjection. Fixes: fc4fad79 ("KVM: VMX: Reject KVM_RUN if emulation is required with pending exception") Reported-by: syzbot+cfafed3bb76d3e37581b@syzkaller.appspotmail.com Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20220502221850.131873-1-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- 29 Apr, 2022 1 commit
-
-
Paolo Bonzini authored
root_role.level is always the same value as shadow_level: - it's kvm_mmu_get_tdp_level(vcpu) when going through init_kvm_tdp_mmu - it's the level argument when going through kvm_init_shadow_ept_mmu - it's assigned directly from new_role.base.level when going through shadow_mmu_init_context Remove the duplication and get the level directly from the role. Reviewed-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- 21 Apr, 2022 1 commit
-
-
Sean Christopherson authored
Defer APICv updates that occur while L2 is active until nested VM-Exit, i.e. until L1 regains control. vmx_refresh_apicv_exec_ctrl() assumes L1 is active and (a) stomps all over vmcs02 and (b) neglects to ever updated vmcs01. E.g. if vmcs12 doesn't enable the TPR shadow for L2 (and thus no APICv controls), L1 performs nested VM-Enter APICv inhibited, and APICv becomes unhibited while L2 is active, KVM will set various APICv controls in vmcs02 and trigger a failed VM-Entry. The kicker is that, unless running with nested_early_check=1, KVM blames L1 and chaos ensues. In all cases, ignoring vmcs02 and always deferring the inhibition change to vmcs01 is correct (or at least acceptable). The ABSENT and DISABLE inhibitions cannot truly change while L2 is active (see below). IRQ_BLOCKING can change, but it is firmly a best effort debug feature. Furthermore, only L2's APIC is accelerated/virtualized to the full extent possible, e.g. even if L1 passes through its APIC to L2, normal MMIO/MSR interception will apply to the virtual APIC managed by KVM. The exception is the SELF_IPI register when x2APIC is enabled, but that's an acceptable hole. Lastly, Hyper-V's Auto EOI can technically be toggled if L1 exposes the MSRs to L2, but for that to work in any sane capacity, L1 would need to pass through IRQs to L2 as well, and IRQs must be intercepted to enable virtual interrupt delivery. I.e. exposing Auto EOI to L2 and enabling VID for L2 are, for all intents and purposes, mutually exclusive. Lack of dynamic toggling is also why this scenario is all but impossible to encounter in KVM's current form. But a future patch will pend an APICv update request _during_ vCPU creation to plug a race where a vCPU that's being created doesn't get included in the "all vCPUs request" because it's not yet visible to other vCPUs. If userspaces restores L2 after VM creation (hello, KVM selftests), the first KVM_RUN will occur while L2 is active and thus service the APICv update request made during VM creation. Cc: stable@vger.kernel.org Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20220420013732.3308816-3-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- 13 Apr, 2022 3 commits
-
-
Like Xu authored
The pmu_ops should be moved to kvm_x86_init_ops and tagged as __initdata. That'll save those precious few bytes, and more importantly make the original ops unreachable, i.e. make it harder to sneak in post-init modification bugs. Suggested-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Like Xu <likexu@tencent.com> Reviewed-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20220329235054.3534728-4-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Derive the mask of RWX bits reported on EPT violations from the mask of RWX bits that are shoved into EPT entries; the layout is the same, the EPT violation bits are simply shifted by three. Use the new shift and a slight copy-paste of the mask derivation instead of completely open coding the same to convert between the EPT entry bits and the exit qualification when synthesizing a nested EPT Violation. No functional change intended. Cc: SU Hang <darcy.sh@antgroup.com> Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20220329030108.97341-3-darcy.sh@antgroup.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Peng Hao authored
Remove redundant parentheses. Signed-off-by:
Peng Hao <flyingpeng@tencent.com> Message-Id: <20220228030902.88465-1-flyingpeng@tencent.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- 02 Apr, 2022 5 commits
-
-
Zeng Guang authored
Currently KVM setup posted interrupt VMCS only depending on per-vcpu APICv activation status at the vCPU creation time. However, this status can be toggled dynamically under some circumstance. So potentially, later posted interrupt enabling may be problematic without VMCS readiness. To fix this, always settle the VMCS setting for posted interrupt as long as APICv is available and lapic locates in kernel. Signed-off-by:
Zeng Guang <guang.zeng@intel.com> Message-Id: <20220315145836.9910-1-guang.zeng@intel.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Zhenzhong Duan authored
When emulating exit from long mode, EFER_LMA is cleared with vmx_set_efer(). This will already unset the VM_ENTRY_IA32E_MODE control bit as requested by SDM, so there is no need to unset VM_ENTRY_IA32E_MODE again in exit_lmode() explicitly. In case EFER isn't supported by hardware, long mode isn't supported, so exit_lmode() cannot be reached. Note that, thanks to the shadow controls mechanism, this change doesn't eliminate vmread or vmwrite. Signed-off-by:
Zhenzhong Duan <zhenzhong.duan@intel.com> Message-Id: <20220311102643.807507-3-zhenzhong.duan@intel.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Zhenzhong Duan authored
vmx_set_efer() sets uret->data but, in fact if the value of uret->data will be used vmx_setup_uret_msrs() will have rewritten it with the value returned by update_transition_efer(). uret->data is consumed if and only if uret->load_into_hardware is true, and vmx_setup_uret_msrs() takes care of (a) updating uret->data before setting uret->load_into_hardware to true (b) setting uret->load_into_hardware to false if uret->data isn't updated. Opportunistically use "vmx" directly instead of redoing to_vmx(). Signed-off-by:
Zhenzhong Duan <zhenzhong.duan@intel.com> Message-Id: <20220311102643.807507-2-zhenzhong.duan@intel.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Maxim Levitsky authored
It was decided that when TSC scaling is not supported, the virtual MSR_AMD64_TSC_RATIO should still have the default '1.0' value. However in this case kvm_max_tsc_scaling_ratio is not set, which breaks various assumptions. Fix this by always calculating kvm_max_tsc_scaling_ratio regardless of host support. For consistency, do the same for VMX. Suggested-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220322172449.235575-8-mlevitsk@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Use an enum for the APICv inhibit reasons, there is no meaning behind their values and they most definitely are not "unsigned longs". Rename the various params to "reason" for consistency and clarity (inhibit may be confused as a command, i.e. inhibit APICv, instead of the reason that is getting toggled/checked). No functional change intended. Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20220311043517.17027-2-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- 01 Mar, 2022 1 commit
-
-
Sean Christopherson authored
Move the vAPIC offset adjustments done in the APIC-write trap path from common x86 to VMX in anticipation of using the nodecode path for SVM's AVIC. The adjustment reflects hardware behavior, i.e. it's technically a property of VMX, no common x86. SVM's AVIC behavior is identical, so it's a bit of a moot point, the goal is purely to make it easier to understand why the adjustment is ok. No functional change intended. Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20220204214205.3306634-3-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- 25 Feb, 2022 6 commits
-
-
Paolo Bonzini authored
The root_hpa and root_pgd fields form essentially a struct kvm_mmu_root_info. Use the struct to have more consistency between mmu->root and mmu->prev_roots. The patch is entirely search and replace except for cached_root_available, which does not need a temporary struct kvm_mmu_root_info anymore. Reviewed-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Peng Hao authored
From: Peng Hao <flyingpeng@tencent.com> Remove a redundant 'cpu' declaration from inside an if-statement that that shadows an identical declaration at function scope. Both variables are used as scratch variables in for_each_*_cpu() loops, thus there's no harm in sharing a variable. Reviewed-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Peng Hao <flyingpeng@tencent.com> Message-Id: <20220222103954.70062-1-flyingpeng@tencent.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Peng Hao authored
Fix a comment documenting the memory barrier related to clearing a loaded_vmcs; loaded_vmcs tracks the host CPU the VMCS is loaded on via the field 'cpu', it doesn't have a 'vcpu' field. Reviewed-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Peng Hao <flyingpeng@tencent.com> Message-Id: <20220222104029.70129-1-flyingpeng@tencent.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Peng Hao authored
Make sure nested_vmx_hardware_setup/unsetup() are called in pairs under the same conditions. Calling nested_vmx_hardware_unsetup() when nested is false "works" right now because it only calls free_page() on zero- initialized pointers, but it's possible that more code will be added to nested_vmx_hardware_unsetup() in the future. Reviewed-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Peng Hao <flyingpeng@tencent.com> Message-Id: <20220222104054.70286-1-flyingpeng@tencent.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Revert back to refreshing vmcs.HOST_CR3 immediately prior to VM-Enter. The PCID (ASID) part of CR3 can be bumped without KVM being scheduled out, as the kernel will switch CR3 during __text_poke(), e.g. in response to a static key toggling. If switch_mm_irqs_off() chooses a new ASID for the mm associate with KVM, KVM will do VM-Enter => VM-Exit with a stale vmcs.HOST_CR3. Add a comment to explain why KVM must wait until VM-Enter is imminent to refresh vmcs.HOST_CR3. The following splat was captured by stashing vmcs.HOST_CR3 in kvm_vcpu and adding a WARN in load_new_mm_cr3() to fire if a new ASID is being loaded for the KVM-associated mm while KVM has a "running" vCPU: static void load_new_mm_cr3(pgd_t *pgdir, u16 new_asid, bool need_flush) { struct kvm_vcpu *vcpu = kvm_get_running_vcpu(); ... WARN(vcpu && (vcpu->cr3 & GENMASK(11, 0)) != (new_mm_cr3 & GENMASK(11, 0)) && (vcpu->cr3 & PHYSICAL_PAGE_MASK) == (new_mm_cr3 & PHYSICAL_PAGE_MASK), "KVM is hosed, loading CR3 = %lx, vmcs.HOST_CR3 = %lx", new_mm_cr3, vcpu->cr3); } ------------[ cut here ]------------ KVM is hosed, loading CR3 = 8000000105393004, vmcs.HOST_CR3 = 105393003 WARNING: CPU: 4 PID: 20717 at arch/x86/mm/tlb.c:291 load_new_mm_cr3+0x82/0xe0 Modules linked in: vhost_net vhost vhost_iotlb tap kvm_intel CPU: 4 PID: 20717 Comm: stable Tainted: G W 5.17.0-rc3+ #747 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:load_new_mm_cr3+0x82/0xe0 RSP: 0018:ffffc9000489fa98 EFLAGS: 00010082 RAX: 0000000000000000 RBX: 8000000105393004 RCX: 0000000000000027 RDX: 0000000000000027 RSI: 00000000ffffdfff RDI: ffff888277d1b788 RBP: 0000000000000004 R08: ffff888277d1b780 R09: ffffc9000489f8b8 R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000 R13: ffff88810678a800 R14: 0000000000000004 R15: 0000000000000c33 FS: 00007fa9f0e72700(0000) GS:ffff888277d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000001001b5003 CR4: 0000000000172ea0 Call Trace: <TASK> switch_mm_irqs_off+0x1cb/0x460 __text_poke+0x308/0x3e0 text_poke_bp_batch+0x168/0x220 text_poke_finish+0x1b/0x30 arch_jump_label_transform_apply+0x18/0x30 static_key_slow_inc_cpuslocked+0x7c/0x90 static_key_slow_inc+0x16/0x20 kvm_lapic_set_base+0x116/0x190 kvm_set_apic_base+0xa5/0xe0 kvm_set_msr_common+0x2f4/0xf60 vmx_set_msr+0x355/0xe70 [kvm_intel] kvm_set_msr_ignored_check+0x91/0x230 kvm_emulate_wrmsr+0x36/0x120 vmx_handle_exit+0x609/0x6c0 [kvm_intel] kvm_arch_vcpu_ioctl_run+0x146f/0x1b80 kvm_vcpu_ioctl+0x279/0x690 __x64_sys_ioctl+0x83/0xb0 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> ---[ end trace 0000000000000000 ]--- This reverts commit 15ad9762. Fixes: 15ad9762 ("KVM: VMX: Save HOST_CR3 in vmx_prepare_switch_to_guest()") Reported-by:
Wanpeng Li <kernellwp@gmail.com> Cc: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by:
Sean Christopherson <seanjc@google.com> Acked-by:
Lai Jiangshan <jiangshanlai@gmail.com> Message-Id: <20220224191917.3508476-3-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Undo a nested VMX fix as a step toward reverting the commit it fixed, 15ad9762 ("KVM: VMX: Save HOST_CR3 in vmx_prepare_switch_to_guest()"), as the underlying premise that "host CR3 in the vcpu thread can only be changed when scheduling" is wrong. This reverts commit a9f2705e. Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20220224191917.3508476-2-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- 18 Feb, 2022 1 commit
-
-
Paolo Bonzini authored
The two ioctls used to implement userspace-accelerated TPR, KVM_TPR_ACCESS_REPORTING and KVM_SET_VAPIC_ADDR, are available even if hardware-accelerated TPR can be used. So there is no reason not to report KVM_CAP_VAPIC. Reviewed-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- 10 Feb, 2022 5 commits
-
-
Oliver Upton authored
There is a local that contains a pointer to vcpu_vmx already. Just use that instead to get at the structure directly instead of doing pointer arithmetic. No functional change intended. Signed-off-by:
Oliver Upton <oupton@google.com> Message-Id: <20220204204705.3538240-8-oupton@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Wanpeng Li authored
When delivering a virtual interrupt, don't actually send a posted interrupt if the target vCPU is also the currently running vCPU and is IN_GUEST_MODE, in which case the interrupt is being sent from a VM-Exit fastpath and the core run loop in vcpu_enter_guest() will manually move the interrupt from the PIR to vmcs.GUEST_RVI. IRQs are disabled while IN_GUEST_MODE, thus there's no possibility of the virtual interrupt being sent from anything other than KVM, i.e. KVM won't suppress a wake event from an IRQ handler (see commit fdba608f, "KVM: VMX: Wake vCPU when delivering posted IRQ even if vCPU == this vCPU"). Eliding the posted interrupt restores the performance provided by the combination of commits 379a3c8e ("KVM: VMX: Optimize posted-interrupt delivery for timer fastpath") and 26efe2fd ("KVM: VMX: Handle preemption timer fastpath"). Thanks Sean for better comments. Suggested-by:
Chao Gao <chao.gao@intel.com> Reviewed-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Wanpeng Li <wanpengli@tencent.com> Message-Id: <1643111979-36447-1-git-send-email-wanpengli@tencent.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Massage VMX's implementation names for kvm_x86_ops to maximize use of kvm-x86-ops.h. Leave cpu_has_vmx_wbinvd_exit() as-is to preserve the cpu_has_vmx_*() pattern used for querying VMCS capabilities. Keep pi_has_pending_interrupt() as vmx_dy_apicv_has_pending_interrupt() does a poor job of describing exactly what is being checked in VMX land. No functional change intended. Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20220128005208.4008533-14-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Use vmx_get_cpl() instead of bouncing through kvm_x86_ops.get_cpl() when performing a CPL check on MOV DR accesses. This avoids a RETPOLINE (when enabled), and more importantly removes a vendor reference to kvm_x86_ops and helps pave the way for unexporting kvm_x86_ops. Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20220128005208.4008533-7-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Rename a variety of kvm_x86_op function pointers so that preferred name for vendor implementations follows the pattern <vendor>_<function>, e.g. rename .run() to .vcpu_run() to match {svm,vmx}_vcpu_run(). This will allow vendor implementations to be wired up via the KVM_X86_OP macro. In many cases, VMX and SVM "disagree" on the preferred name, though in reality it's VMX and x86 that disagree as SVM blindly prepended _svm to the kvm_x86_ops name. Justification for using the VMX nomenclature: - set_{irq,nmi} => inject_{irq,nmi} because the helper is injecting an event that has already been "set" in e.g. the vIRR. SVM's relevant VMCB field is even named event_inj, and KVM's stat is irq_injections. - prepare_guest_switch => prepare_switch_to_guest because the former is ambiguous, e.g. it could mean switching between multiple guests, switching from the guest to host, etc... - update_pi_irte => pi_update_irte to allow for matching match the rest of VMX's posted interrupt naming scheme, which is vmx_pi_<blah>(). - start_assignment => pi_start_assignment to again follow VMX's posted interrupt naming scheme, and to provide context for what bit of code might care about an otherwise undescribed "assignment". The "tlb_flush" => "flush_tlb" creates an inconsistency with respect to Hyper-V's "tlb_remote_flush" hooks, but Hyper-V really is the one that's wrong. x86, VMX, and SVM all use flush_tlb, and even common KVM is on a variant of the bandwagon with "kvm_flush_remote_tlbs", e.g. a more appropriate name for the Hyper-V hooks would be flush_remote_tlbs. Leave that change for another time as the Hyper-V hooks always start as NULL, i.e. the name doesn't matter for using kvm-x86-ops.h, and changing all names requires an astounding amount of churn. VMX and SVM function names are intentionally left as is to minimize the diff. Both VMX and SVM will need to rename even more functions in order to fully utilize KVM_X86_OPS, i.e. an additional patch for each is inevitable. No functional change intended. Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20220128005208.4008533-5-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- 08 Feb, 2022 1 commit
-
-
Maxim Levitsky authored
While RSM induced VM entries are not full VM entries, they still need to be followed by actual VM entry to complete it, unlike setting the nested state. This patch fixes boot of hyperv and SMM enabled windows VM running nested on KVM, which fail due to this issue combined with lack of dirty bit setting. Signed-off-by:
Maxim Levitsky <mlevitsk@redhat.com> Cc: stable@vger.kernel.org Message-Id: <20220207155447.840194-5-mlevitsk@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- 01 Feb, 2022 2 commits
-
-
Mark Rutland authored
For consistency and clarity, migrate x86 over to the generic helpers for guest timing and lockdep/RCU/tracing management, and remove the x86-specific helpers. Prior to this patch, the guest timing was entered in kvm_guest_enter_irqoff() (called by svm_vcpu_enter_exit() and svm_vcpu_enter_exit()), and was exited by the call to vtime_account_guest_exit() within vcpu_enter_guest(). To minimize duplication and to more clearly balance entry and exit, both entry and exit of guest timing are placed in vcpu_enter_guest(), using the new guest_timing_{enter,exit}_irqoff() helpers. When context tracking is used a small amount of additional time will be accounted towards guests; tick-based accounting is unnaffected as IRQs are disabled at this point and not enabled until after the return from the guest. This also corrects (benign) mis-balanced context tracking accounting introduced in commits: ae95f566 ("KVM: X86: TSCDEADLINE MSR emulation fastpath") 26efe2fd ("KVM: VMX: Handle preemption timer fastpath") Where KVM can enter a guest multiple times, calling vtime_guest_enter() without a corresponding call to vtime_account_guest_exit(), and with vtime_account_system() called when vtime_account_guest() should be used. As account_system_time() checks PF_VCPU and calls account_guest_time(), this doesn't result in any functional problem, but is unnecessarily confusing. Signed-off-by:
Mark Rutland <mark.rutland@arm.com> Acked-by:
Paolo Bonzini <pbonzini@redhat.com> Reviewed-by:
Nicolas Saenz Julienne <nsaenzju@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jim Mattson <jmattson@google.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wanpeng Li <wanpengli@tencent.com> Message-Id: <20220201132926.3301912-4-mark.rutland@arm.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Handle non-APICv interrupt delivery in vendor code, even though it means VMX and SVM will temporarily have duplicate code. SVM's AVIC has a race condition that requires KVM to fall back to legacy interrupt injection _after_ the interrupt has been logged in the vIRR, i.e. to fix the race, SVM will need to open code the full flow anyways[*]. Refactor the code so that the SVM bug without introducing other issues, e.g. SVM would return "success" and thus invoke trace_kvm_apicv_accept_irq() even when delivery through the AVIC failed, and to opportunistically prepare for using KVM_X86_OP to fill each vendor's kvm_x86_ops struct, which will rely on the vendor function matching the kvm_x86_op pointer name. No functional change intended. [*] https://lore.kernel.org/all/20211213104634.199141-4-mlevitsk@redhat.comSigned-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20220128005208.4008533-3-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- 26 Jan, 2022 2 commits
-
-
Sean Christopherson authored
Pass the emulation type to kvm_x86_ops.can_emulate_insutrction() so that a future commit can harden KVM's SEV support to WARN on emulation scenarios that should never happen. No functional change intended. Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Liam Merwick <liam.merwick@oracle.com> Message-Id: <20220120010719.711476-6-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Jim Mattson authored
The maximum size of a VMCS (or VMXON region) is 4096. By definition, these are order 0 allocations. Signed-off-by:
Jim Mattson <jmattson@google.com> Message-Id: <20220125004359.147600-1-jmattson@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-