- 07 Jan, 2022 1 commit
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarmPaolo Bonzini authored
KVM/arm64 updates for Linux 5.16 - Simplification of the 'vcpu first run' by integrating it into KVM's 'pid change' flow - Refactoring of the FP and SVE state tracking, also leading to a simpler state and less shared data between EL1 and EL2 in the nVHE case - Tidy up the header file usage for the nvhe hyp object - New HYP unsharing mechanism, finally allowing pages to be unmapped from the Stage-1 EL2 page-tables - Various pKVM cleanups around refcounting and sharing - A couple of vgic fixes for bugs that would trigger once the vcpu xarray rework is merged, but not sooner - Add minimal support for ARMv8.7's PMU extension - Rework kvm_pgtable initialisation ahead of the NV work - New selftest for IRQ injection - Teach selftests about the lack of default IPA space and page sizes - Expand sysreg selftest to deal with Pointer Authentication - The usual bunch of cleanups and doc update
-
- 04 Jan, 2022 6 commits
-
-
Marc Zyngier authored
* kvm-arm64/misc-5.17: : . : Misc fixes and improvements: : - Add minimal support for ARMv8.7's PMU extension : - Constify kvm_io_gic_ops : - Drop kvm_is_transparent_hugepage() prototype : - Drop unused workaround_flags field : - Rework kvm_pgtable initialisation : - Documentation fixes : - Replace open-coded SCTLR_EL1.EE useage with its defined macro : - Sysreg list selftest update to handle PAuth : - Include cleanups : . KVM: arm64: vgic: Replace kernel.h with the necessary inclusions KVM: arm64: Fix comment typo in kvm_vcpu_finalize_sve() KVM: arm64: selftests: get-reg-list: Add pauth configuration KVM: arm64: Fix comment on barrier in kvm_psci_vcpu_on() KVM: arm64: Fix comment for kvm_reset_vcpu() KVM: arm64: Use defined value for SCTLR_ELx_EE KVM: arm64: Rework kvm_pgtable initialisation KVM: arm64: Drop unused workaround_flags vcpu field Signed-off-by: Marc Zyngier <maz@kernel.org>
-
Andy Shevchenko authored
arm_vgic.h does not require all the stuff that kernel.h provides. Replace kernel.h inclusion with the list of what is really being used. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220104151940.55399-1-andriy.shevchenko@linux.intel.com
-
Marc Zyngier authored
* kvm-arm64/selftest/irq-injection: : . : New tests from Ricardo Koller: : "This series adds a new test, aarch64/vgic-irq, that validates the injection of : different types of IRQs from userspace using various methods and configurations" : . KVM: selftests: aarch64: Add test for restoring active IRQs KVM: selftests: aarch64: Add ISPENDR write tests in vgic_irq KVM: selftests: aarch64: Add tests for IRQFD in vgic_irq KVM: selftests: Add IRQ GSI routing library functions KVM: selftests: aarch64: Add test_inject_fail to vgic_irq KVM: selftests: aarch64: Add tests for LEVEL_INFO in vgic_irq KVM: selftests: aarch64: Level-sensitive interrupts tests in vgic_irq KVM: selftests: aarch64: Add preemption tests in vgic_irq KVM: selftests: aarch64: Cmdline arg to set EOI mode in vgic_irq KVM: selftests: aarch64: Cmdline arg to set number of IRQs in vgic_irq test KVM: selftests: aarch64: Abstract the injection functions in vgic_irq KVM: selftests: aarch64: Add vgic_irq to test userspace IRQ injection KVM: selftests: aarch64: Add vGIC library functions to deal with vIRQ state KVM: selftests: Add kvm_irq_line library function KVM: selftests: aarch64: Add GICv3 register accessor library functions KVM: selftests: aarch64: Add function for accessing GICv3 dist and redist registers KVM: selftests: aarch64: Move gic_v3.h to shared headers Signed-off-by: Marc Zyngier <maz@kernel.org>
-
Marc Zyngier authored
* kvm-arm64/selftest/ipa: : . : Expand the KVM/arm64 selftest infrastructure to discover : supported page sizes at runtime, support 16kB pages, and : find out about the original M1 stupidly small IPA space. : . KVM: selftests: arm64: Add support for various modes with 16kB page size KVM: selftests: arm64: Add support for VM_MODE_P36V48_{4K,64K} KVM: selftests: arm64: Rework TCR_EL1 configuration KVM: selftests: arm64: Check for supported page sizes KVM: selftests: arm64: Introduce a variable default IPA size KVM: selftests: arm64: Initialise default guest mode at test startup time Signed-off-by: Marc Zyngier <maz@kernel.org>
-
Zenghui Yu authored
kvm_arm_init_arch_resources() was renamed to kvm_arm_init_sve() in commit a3be836d ("KVM: arm/arm64: Demote kvm_arm_init_arch_resources() to just set up SVE"). Fix the function name in comment of kvm_vcpu_finalize_sve(). Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211230141535.1389-1-yuzenghui@huawei.com
-
Marc Zyngier authored
The get-reg-list test ignores the Pointer Authentication features, which is a shame now that we have relatively common HW with this feature. Define two new configurations (with and without PMU) that exercise the KVM capabilities. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Link: https://lore.kernel.org/r/20211228121414.1013250-1-maz@kernel.org
-
- 28 Dec, 2021 23 commits
-
-
Ricardo Koller authored
Add a test that restores multiple IRQs in active state, it does it by writing into ISACTIVER from the guest and using KVM ioctls. This test tries to emulate what would happen during a live migration: restore active IRQs. Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-18-ricarkol@google.com
-
Ricardo Koller authored
Add injection tests that use writing into the ISPENDR register (to mark IRQs as pending). This is typically used by migration code. Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-17-ricarkol@google.com
-
Ricardo Koller authored
Add injection tests for the KVM_IRQFD ioctl into vgic_irq. Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-16-ricarkol@google.com
-
Ricardo Koller authored
Add an architecture independent wrapper function for creating and writing IRQ GSI routing tables. Also add a function to add irqchip entries. Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-15-ricarkol@google.com
-
Ricardo Koller authored
Add tests for failed injections to vgic_irq. This tests that KVM can handle bogus IRQ numbers. Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-14-ricarkol@google.com
-
Ricardo Koller authored
Add injection tests for the LEVEL_INFO ioctl (level-sensitive specific) into vgic_irq. Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-13-ricarkol@google.com
-
Ricardo Koller authored
Add a cmdline arg for using level-sensitive interrupts (vs the default edge-triggered). Then move the handler into a generic handler function that takes the type of interrupt (level vs. edge) as an arg. When handling line-sensitive interrupts it sets the line to low after acknowledging the IRQ. Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-12-ricarkol@google.com
-
Ricardo Koller authored
Add tests for IRQ preemption (having more than one activated IRQ at the same time). This test injects multiple concurrent IRQs and handles them without handling the actual exceptions. This is done by masking interrupts for the whole test. Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-11-ricarkol@google.com
-
Ricardo Koller authored
Add a new cmdline arg to set the EOI mode for all vgic_irq tests. This specifies whether a write to EOIR will deactivate IRQs or not. Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-10-ricarkol@google.com
-
Ricardo Koller authored
Add the ability to specify the number of vIRQs exposed by KVM (arg defaults to 64). Then extend the KVM_IRQ_LINE test by injecting all available SPIs at once (specified by the nr-irqs arg). As a bonus, inject all SGIs at once as well. Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-9-ricarkol@google.com
-
Ricardo Koller authored
Build an abstraction around the injection functions, so the preparation and checking around the actual injection can be shared between tests. All functions are stored as pointers in arrays of kvm_inject_desc's which include the pointer and what kind of interrupts they can inject. Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-8-ricarkol@google.com
-
Ricardo Koller authored
Add a new KVM selftest, vgic_irq, for testing userspace IRQ injection. This particular test injects an SPI using KVM_IRQ_LINE on GICv3 and verifies that the IRQ is handled in the guest. The next commits will add more types of IRQs and different modes. Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-7-ricarkol@google.com
-
Ricardo Koller authored
Add a set of library functions for userspace code in selftests to deal with vIRQ state (i.e., ioctl wrappers). Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-6-ricarkol@google.com
-
Ricardo Koller authored
Add an architecture independent wrapper function for the KVM_IRQ_LINE ioctl. Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-5-ricarkol@google.com
-
Ricardo Koller authored
Add library functions for accessing GICv3 registers: DIR, PMR, CTLR, ISACTIVER, ISPENDR. Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-4-ricarkol@google.com
-
Ricardo Koller authored
Add a generic library function for reading and writing GICv3 distributor and redistributor registers. Then adapt some functions to use it; more will come and use it in the next commit. Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-3-ricarkol@google.com
-
Ricardo Koller authored
Move gic_v3.h to the shared headers location. There are some definitions that will be used in the vgic-irq test. Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-2-ricarkol@google.com
-
Marc Zyngier authored
The 16kB page size is not a popular choice, due to only a few CPUs actually implementing support for it. However, it can lead to some interesting performance improvements given the right uarch choices. Add support for this page size for various PA/VA combinations. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Link: https://lore.kernel.org/r/20211227124809.1335409-7-maz@kernel.org
-
Marc Zyngier authored
Some of the arm64 systems out there have an IPA space that is positively tiny. Nonetheless, they make great KVM hosts. Add support for 36bit IPA support with 4kB pages, which makes some of the fruity machines happy. Whilst we're at it, add support for 64kB pages as well, though these boxes have no support for it. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211227124809.1335409-6-maz@kernel.org
-
Marc Zyngier authored
The current way we initialise TCR_EL1 is a bit cumbersome, as we mix setting TG0 and IPS in the same swtch statement. Split it into two statements (one for the base granule size, and another for the IPA size), allowing new modes to be added in a more elegant way. No functional change intended. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Link: https://lore.kernel.org/r/20211227124809.1335409-5-maz@kernel.org
-
Marc Zyngier authored
Just as arm64 implemenations don't necessary support all IPA ranges, they don't all support the same page sizes either. Fun. Create a dummy VM to snapshot the page sizes supported by the host, and filter the supported modes. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Link: https://lore.kernel.org/r/20211227124809.1335409-4-maz@kernel.org
-
Marc Zyngier authored
Contrary to popular belief, there is no such thing as a default IPA size on arm64. Anything goes, and implementations are the usual Wild West. The selftest infrastructure default to 40bit IPA, which obviously doesn't work for some systems out there. Turn VM_MODE_DEFAULT from a constant into a variable, and let guest_modes_append_default() populate it, depending on what the HW can do. In order to preserve the current behaviour, we still pick 40bits IPA as the default if it is available, and the largest supported IPA space otherwise. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Link: https://lore.kernel.org/r/20211227124809.1335409-3-maz@kernel.org
-
Marc Zyngier authored
As we are going to add support for a variable default mode on arm64, let's make sure it is setup first by using a constructor that gets called before the actual test runs. Suggested-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Link: https://lore.kernel.org/r/20211227124809.1335409-2-maz@kernel.org
-
- 21 Dec, 2021 3 commits
-
-
Paolo Bonzini authored
Merge tag 'kvm-s390-next-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Fix and cleanup - fix sigp sense/start/stop/inconsistency - cleanups
-
Paolo Bonzini authored
Pick commit fdba608f ("KVM: VMX: Wake vCPU when delivering posted IRQ even if vCPU == this vCPU"). In addition to fixing a bug, it also aligns the non-nested and nested usage of triggering posted interrupts, allowing for additional cleanups. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Drop a check that guards triggering a posted interrupt on the currently running vCPU, and more importantly guards waking the target vCPU if triggering a posted interrupt fails because the vCPU isn't IN_GUEST_MODE. If a vIRQ is delivered from asynchronous context, the target vCPU can be the currently running vCPU and can also be blocking, in which case skipping kvm_vcpu_wake_up() is effectively dropping what is supposed to be a wake event for the vCPU. The "do nothing" logic when "vcpu == running_vcpu" mostly works only because the majority of calls to ->deliver_posted_interrupt(), especially when using posted interrupts, come from synchronous KVM context. But if a device is exposed to the guest using vfio-pci passthrough, the VFIO IRQ and vCPU are bound to the same pCPU, and the IRQ is _not_ configured to use posted interrupts, wake events from the device will be delivered to KVM from IRQ context, e.g. vfio_msihandler() | |-> eventfd_signal() | |-> ... | |-> irqfd_wakeup() | |->kvm_arch_set_irq_inatomic() | |-> kvm_irq_delivery_to_apic_fast() | |-> kvm_apic_set_irq() This also aligns the non-nested and nested usage of triggering posted interrupts, and will allow for additional cleanups. Fixes: 379a3c8e ("KVM: VMX: Optimize posted-interrupt delivery for timer fastpath") Cc: stable@vger.kernel.org Reported-by: Longpeng (Mike) <longpeng2@huawei.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211208015236.1616697-18-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 20 Dec, 2021 7 commits
-
-
Fuad Tabba authored
The barrier is there for power_off rather than power_state. Probably typo in commit 358b28f0 ("arm/arm64: KVM: Allow a VCPU to fully reset itself"). Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211208193257.667613-3-tabba@google.com
-
Fuad Tabba authored
The comment for kvm_reset_vcpu() refers to the sysreg table as being the table above, probably because of the code extracted at commit f4672752 ("arm64: KVM: virtual CPU reset"). Fix the comment to remove the potentially confusing reference. Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211208193257.667613-2-tabba@google.com
-
Fuad Tabba authored
Replace the hardcoded value with the existing definition. No functional change intended. Signed-off-by: Fuad Tabba <tabba@google.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211208192810.657360-1-tabba@google.com
-
Sean Christopherson authored
Add a selftest to attempt to enter L2 with invalid guests state by exiting to userspace via I/O from L2, and then using KVM_SET_SREGS to set invalid guest state (marking TR unusable is arbitrary chosen for its relative simplicity). This is a regression test for a bug introduced by commit c8607e4a ("KVM: x86: nVMX: don't fail nested VM entry on invalid guest state if !from_vmentry"), which incorrectly set vmx->fail=true when L2 had invalid guest state and ultimately triggered a WARN due to nested_vmx_vmexit() seeing vmx->fail==true while attempting to synthesize a nested VM-Exit. The is also a functional test to verify that KVM sythesizes TRIPLE_FAULT for L2, which is somewhat arbitrary behavior, instead of emulating L2. KVM should never emulate L2 due to invalid guest state, as it's architecturally impossible for L1 to run an L2 guest with invalid state as nested VM-Enter should always fail, i.e. L1 needs to do the emulation. Stuffing state via KVM ioctl() is a non-architctural, out-of-band case, hence the TRIPLE_FAULT being rather arbitrary. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211207193006.120997-5-seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Update the documentation for kvm-intel's emulate_invalid_guest_state to rectify the description of KVM's default behavior, and to document that the behavior and thus parameter only applies to L1. Fixes: a27685c3 ("KVM: VMX: Emulate invalid guest state by default") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211207193006.120997-4-seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Synthesize a triple fault if L2 guest state is invalid at the time of VM-Enter, which can happen if L1 modifies SMRAM or if userspace stuffs guest state via ioctls(), e.g. KVM_SET_SREGS. KVM should never emulate invalid guest state, since from L1's perspective, it's architecturally impossible for L2 to have invalid state while L2 is running in hardware. E.g. attempts to set CR0 or CR4 to unsupported values will either VM-Exit or #GP. Modifying vCPU state via RSM+SMRAM and ioctl() are the only paths that can trigger this scenario, as nested VM-Enter correctly rejects any attempt to enter L2 with invalid state. RSM is a straightforward case as (a) KVM follows AMD's SMRAM layout and behavior, and (b) Intel's SDM states that loading reserved CR0/CR4 bits via RSM results in shutdown, i.e. there is precedent for KVM's behavior. Following AMD's SMRAM layout is important as AMD's layout saves/restores the descriptor cache information, including CS.RPL and SS.RPL, and also defines all the fields relevant to invalid guest state as read-only, i.e. so long as the vCPU had valid state before the SMI, which is guaranteed for L2, RSM will generate valid state unless SMRAM was modified. Intel's layout saves/restores only the selector, which means that scenarios where the selector and cached RPL don't match, e.g. conforming code segments, would yield invalid guest state. Intel CPUs fudge around this issued by stuffing SS.RPL and CS.RPL on RSM. Per Intel's SDM on the "Default Treatment of RSM", paraphrasing for brevity: IF internal storage indicates that the [CPU was post-VMXON] THEN enter VMX operation (root or non-root); restore VMX-critical state as defined in Section 34.14.1; set to their fixed values any bits in CR0 and CR4 whose values must be fixed in VMX operation [unless coming from an unrestricted guest]; IF RFLAGS.VM = 0 AND (in VMX root operation OR the “unrestricted guest” VM-execution control is 0) THEN CS.RPL := SS.DPL; SS.RPL := SS.DPL; FI; restore current VMCS pointer; FI; Note that Intel CPUs also overwrite the fixed CR0/CR4 bits, whereas KVM will sythesize TRIPLE_FAULT in this scenario. KVM's behavior is allowed as both Intel and AMD define CR0/CR4 SMRAM fields as read-only, i.e. the only way for CR0 and/or CR4 to have illegal values is if they were modified by the L1 SMM handler, and Intel's SDM "SMRAM State Save Map" section states "modifying these registers will result in unpredictable behavior". KVM's ioctl() behavior is less straightforward. Because KVM allows ioctls() to be executed in any order, rejecting an ioctl() if it would result in invalid L2 guest state is not an option as KVM cannot know if a future ioctl() would resolve the invalid state, e.g. KVM_SET_SREGS, or drop the vCPU out of L2, e.g. KVM_SET_NESTED_STATE. Ideally, KVM would reject KVM_RUN if L2 contained invalid guest state, but that carries the risk of a false positive, e.g. if RSM loaded invalid guest state and KVM exited to userspace. Setting a flag/request to detect such a scenario is undesirable because (a) it's extremely unlikely to add value to KVM as a whole, and (b) KVM would need to consider ioctl() interactions with such a flag, e.g. if userspace migrated the vCPU while the flag were set. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211207193006.120997-3-seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Revert a relatively recent change that set vmx->fail if the vCPU is in L2 and emulation_required is true, as that behavior is completely bogus. Setting vmx->fail and synthesizing a VM-Exit is contradictory and wrong: (a) it's impossible to have both a VM-Fail and VM-Exit (b) vmcs.EXIT_REASON is not modified on VM-Fail (c) emulation_required refers to guest state and guest state checks are always VM-Exits, not VM-Fails. For KVM specifically, emulation_required is handled before nested exits in __vmx_handle_exit(), thus setting vmx->fail has no immediate effect, i.e. KVM calls into handle_invalid_guest_state() and vmx->fail is ignored. Setting vmx->fail can ultimately result in a WARN in nested_vmx_vmexit() firing when tearing down the VM as KVM never expects vmx->fail to be set when L2 is active, KVM always reflects those errors into L1. ------------[ cut here ]------------ WARNING: CPU: 0 PID: 21158 at arch/x86/kvm/vmx/nested.c:4548 nested_vmx_vmexit+0x16bd/0x17e0 arch/x86/kvm/vmx/nested.c:4547 Modules linked in: CPU: 0 PID: 21158 Comm: syz-executor.1 Not tainted 5.16.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:nested_vmx_vmexit+0x16bd/0x17e0 arch/x86/kvm/vmx/nested.c:4547 Code: <0f> 0b e9 2e f8 ff ff e8 57 b3 5d 00 0f 0b e9 00 f1 ff ff 89 e9 80 Call Trace: vmx_leave_nested arch/x86/kvm/vmx/nested.c:6220 [inline] nested_vmx_free_vcpu+0x83/0xc0 arch/x86/kvm/vmx/nested.c:330 vmx_free_vcpu+0x11f/0x2a0 arch/x86/kvm/vmx/vmx.c:6799 kvm_arch_vcpu_destroy+0x6b/0x240 arch/x86/kvm/x86.c:10989 kvm_vcpu_destroy+0x29/0x90 arch/x86/kvm/../../../virt/kvm/kvm_main.c:441 kvm_free_vcpus arch/x86/kvm/x86.c:11426 [inline] kvm_arch_destroy_vm+0x3ef/0x6b0 arch/x86/kvm/x86.c:11545 kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1189 [inline] kvm_put_kvm+0x751/0xe40 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1220 kvm_vcpu_release+0x53/0x60 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3489 __fput+0x3fc/0x870 fs/file_table.c:280 task_work_run+0x146/0x1c0 kernel/task_work.c:164 exit_task_work include/linux/task_work.h:32 [inline] do_exit+0x705/0x24f0 kernel/exit.c:832 do_group_exit+0x168/0x2d0 kernel/exit.c:929 get_signal+0x1740/0x2120 kernel/signal.c:2852 arch_do_signal_or_restart+0x9c/0x730 arch/x86/kernel/signal.c:868 handle_signal_work kernel/entry/common.c:148 [inline] exit_to_user_mode_loop kernel/entry/common.c:172 [inline] exit_to_user_mode_prepare+0x191/0x220 kernel/entry/common.c:207 __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline] syscall_exit_to_user_mode+0x2e/0x70 kernel/entry/common.c:300 do_syscall_64+0x53/0xd0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: c8607e4a ("KVM: x86: nVMX: don't fail nested VM entry on invalid guest state if !from_vmentry") Reported-by: syzbot+f1d2136db9c80d4733e8@syzkaller.appspotmail.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211207193006.120997-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-