1. 21 Jun, 2024 2 commits
    • Merge tag 'kvm-x86-fixes-6.10-rcN' of https://github.com/kvm-x86/linux into HEAD · dee67a94
      Paolo Bonzini authored
      KVM fixes for 6.10
      
       - Fix a "shift too big" goof in the KVM_SEV_INIT2 selftest.
      
       - Compute the max mappable gfn for KVM selftests on x86 using GuestMaxPhyAddr
         from KVM's supported CPUID (if it's available).
      
       - Fix a race in kvm_vcpu_on_spin() by ensuring loads and stores are atomic.
      
 - Fix a technically benign bug in __kvm_handle_hva_range() where KVM consumes
   the return from a void-returning function as if it were a boolean.
      dee67a94
    • KVM: SEV-ES: Fix svm_get_msr()/svm_set_msr() for KVM_SEV_ES_INIT guests · cf6d9d2d
      Michael Roth authored
      With commit 27bd5fdc ("KVM: SEV-ES: Prevent MSR access post VMSA
      encryption"), older VMMs like QEMU 9.0 and older will fail when booting
      SEV-ES guests with something like the following error:
      
        qemu-system-x86_64: error: failed to get MSR 0x174
        qemu-system-x86_64: ../qemu.git/target/i386/kvm/kvm.c:3950: kvm_get_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
      
This is because such older VMMs might still call
svm_get_msr()/svm_set_msr() for SEV-ES guests after guest boot, even
though those interfaces were essentially just noops because the vCPU
state is encrypted and stored separately in the VMSA. Those VMMs will
now get an -EINVAL and generally crash.
      
Newer VMMs that are aware of KVM_SEV_INIT2, however, already know about
the stricter limitations on what vCPU state can be sync'd during guest
run-time, so newer QEMU, for instance, will work with both the legacy
KVM_SEV_ES_INIT interface and KVM_SEV_INIT2.
      
So when using KVM_SEV_INIT2 it's okay to assume userspace can deal with
-EINVAL, whereas for the legacy KVM_SEV_ES_INIT interface the kernel
might be dealing with an older VMM and needs to assume that returning
-EINVAL might break it.
      
      Address this by only returning -EINVAL if the guest was started with
      KVM_SEV_INIT2. Otherwise, just silently return.
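
As a hedged sketch of that policy (names and structure are illustrative, not KVM's actual svm_get_msr()/svm_set_msr() code), the error path is gated on how the guest was created:

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative sketch, not KVM's actual code: reject accesses to
 * VMSA-backed MSRs with -EINVAL only for guests created via
 * KVM_SEV_INIT2, whose userspace is known to tolerate the error.
 * For legacy KVM_SEV_ES_INIT guests, silently succeed so older
 * VMMs (e.g. QEMU 9.0 and older) keep working. */
struct sev_guest {
	bool vmsa_encrypted;     /* vCPU state sealed into the VMSA */
	bool created_via_init2;  /* KVM_SEV_INIT2 vs. KVM_SEV_ES_INIT */
};

static int sev_es_vmsa_msr_access(const struct sev_guest *g)
{
	if (!g->vmsa_encrypted)
		return 0;        /* state still accessible pre-encryption */
	if (g->created_via_init2)
		return -EINVAL;  /* INIT2-aware userspace can cope */
	return 0;                /* legacy VMM: silently return success */
}
```
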
      
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Nikunj A Dadhania <nikunj@amd.com>
Reported-by: Srikanth Aithal <sraithal@amd.com>
Closes: https://lore.kernel.org/lkml/37usuu4yu4ok7be2hqexhmcyopluuiqj3k266z4gajc2rcj4yo@eujb23qc3zcm/
Fixes: 27bd5fdc ("KVM: SEV-ES: Prevent MSR access post VMSA encryption")
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240604233510.764949-1-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      cf6d9d2d
  2. 20 Jun, 2024 6 commits
  3. 18 Jun, 2024 1 commit
    • KVM: Stop processing *all* memslots when "null" mmu_notifier handler is found · c3f3edf7
      Babu Moger authored
      Bail from outer address space loop, not just the inner memslot loop, when
      a "null" handler is encountered by __kvm_handle_hva_range(), which is the
      intended behavior.  On x86, which has multiple address spaces thanks to
      SMM emulation, breaking from just the memslot loop results in undefined
      behavior due to assigning the non-existent return value from kvm_null_fn()
      to a bool.
      
      In practice, the bug is benign as kvm_mmu_notifier_invalidate_range_end()
      is the only caller that passes handler=kvm_null_fn, and it doesn't set
      flush_on_ret, i.e. assigning garbage to r.ret is ultimately ignored.  And
for most configurations the compiler elides the entire sequence, i.e. there
      is no undefined behavior at runtime.
      
        ------------[ cut here ]------------
        UBSAN: invalid-load in arch/x86/kvm/../../../virt/kvm/kvm_main.c:655:10
        load of value 160 is not a valid value for type '_Bool'
        CPU: 370 PID: 8246 Comm: CPU 0/KVM Not tainted 6.8.2-amdsos-build58-ubuntu-22.04+ #1
        Hardware name: AMD Corporation Sh54p/Sh54p, BIOS WPC4429N 04/25/2024
        Call Trace:
         <TASK>
         dump_stack_lvl+0x48/0x60
         ubsan_epilogue+0x5/0x30
         __ubsan_handle_load_invalid_value+0x79/0x80
         kvm_mmu_notifier_invalidate_range_end.cold+0x18/0x4f [kvm]
         __mmu_notifier_invalidate_range_end+0x63/0xe0
         __split_huge_pmd+0x367/0xfc0
         do_huge_pmd_wp_page+0x1cc/0x380
         __handle_mm_fault+0x8ee/0xe50
         handle_mm_fault+0xe4/0x4a0
         __get_user_pages+0x190/0x840
         get_user_pages_unlocked+0xe0/0x590
         hva_to_pfn+0x114/0x550 [kvm]
         kvm_faultin_pfn+0xed/0x5b0 [kvm]
         kvm_tdp_page_fault+0x123/0x170 [kvm]
         kvm_mmu_page_fault+0x244/0xaa0 [kvm]
         vcpu_enter_guest+0x592/0x1070 [kvm]
         kvm_arch_vcpu_ioctl_run+0x145/0x8a0 [kvm]
         kvm_vcpu_ioctl+0x288/0x6d0 [kvm]
         __x64_sys_ioctl+0x8f/0xd0
         do_syscall_64+0x77/0x120
         entry_SYSCALL_64_after_hwframe+0x6e/0x76
         </TASK>
        ---[ end trace ]---
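
The control-flow fix can be sketched in miniature (hypothetical stand-in code, not the actual __kvm_handle_hva_range()): the null handler must end the walk over every address space, not just the current memslot loop:

```c
#include <stdbool.h>

/* Illustrative sketch of the control-flow bug and fix, not KVM's code:
 * a "null" handler must terminate the walk over *all* address spaces;
 * breaking only the inner memslot loop would let the caller consume
 * the (non-existent) return value of the null handler as a bool. */
#define NR_ADDRESS_SPACES 2
#define NR_MEMSLOTS 3

static bool null_handler;   /* stands in for handler == kvm_null_fn */
static int slots_visited;

static bool handle_hva_range(void)
{
	bool ret = false;

	for (int as = 0; as < NR_ADDRESS_SPACES; as++) {
		for (int slot = 0; slot < NR_MEMSLOTS; slot++) {
			if (null_handler)
				return false;  /* bail from BOTH loops (the fix) */
			slots_visited++;
			ret = true;            /* a real handler result would go here */
		}
	}
	return ret;
}
```
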
      
      Fixes: 071064f1 ("KVM: Don't take mmu_lock for range invalidation unless necessary")
Signed-off-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/b8723d39903b64c241c50f5513f804390c7b5eec.1718203311.git.babu.moger@amd.com
[sean: massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
      c3f3edf7
  4. 11 Jun, 2024 1 commit
    • KVM: arm64: FFA: Release hyp rx buffer · d66e50be
      Vincent Donnefort authored
      According to the FF-A spec (Buffer states and ownership), after a
      producer has written into a buffer, it is "full" and now owned by the
      consumer. The producer won't be able to use that buffer, until the
      consumer hands it over with an invocation such as RX_RELEASE.
      
      It is clear in the following paragraph (Transfer of buffer ownership),
      that MEM_RETRIEVE_RESP is transferring the ownership from producer (in
      our case SPM) to consumer (hypervisor). RX_RELEASE is therefore
      mandatory here.
      
It is less clear, though, what happens with MEM_FRAG_TX. But this
invocation, as a response to MEM_FRAG_RX, writes into the same hypervisor
RX buffer (see the paragraph "Transmission of transaction descriptor in
fragments"). This also matches the TF-A implementation, where the RX
buffer is marked "full" during a MEM_FRAG_RX.
      
Release the RX hypervisor buffer in those two cases. This will unblock
later invocations using this buffer which would otherwise fail
(RETRIEVE_REQ, MEM_FRAG_RX and PARTITION_INFO_GET).
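
The ownership rule can be sketched as a toy state machine (illustrative only, not the pKVM FF-A code): both MEM_RETRIEVE_RESP and MEM_FRAG_TX leave the RX buffer full, and only RX_RELEASE makes it usable again:

```c
/* Toy model of FF-A RX buffer ownership, not the hypervisor
 * implementation: a producer writing into the buffer makes it "full"
 * (owned by the consumer); rx_release() hands it back. Invocations
 * that need the buffer (RETRIEVE_REQ, MEM_FRAG_RX, PARTITION_INFO_GET)
 * fail while it is full. All function names here are hypothetical. */
enum rx_state { RX_EMPTY, RX_FULL };

static enum rx_state rx;

static void mem_retrieve_resp(void) { rx = RX_FULL; }  /* SPM wrote into RX */
static void mem_frag_tx(void)       { rx = RX_FULL; }  /* fragment response */
static void rx_release(void)        { rx = RX_EMPTY; } /* ownership returned */

static int invocation_needing_rx(void)
{
	return rx == RX_EMPTY ? 0 : -1;  /* -1: would fail, buffer busy */
}
```
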
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20240611175317.1220842-1-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
      d66e50be
  5. 06 Jun, 2024 1 commit
  6. 05 Jun, 2024 5 commits
    • KVM: Fix a data race on last_boosted_vcpu in kvm_vcpu_on_spin() · 49f683b4
      Breno Leitao authored
      Use {READ,WRITE}_ONCE() to access kvm->last_boosted_vcpu to ensure the
      loads and stores are atomic.  In the extremely unlikely scenario the
      compiler tears the stores, it's theoretically possible for KVM to attempt
      to get a vCPU using an out-of-bounds index, e.g. if the write is split
      into multiple 8-bit stores, and is paired with a 32-bit load on a VM with
      257 vCPUs:
      
        CPU0                              CPU1
        last_boosted_vcpu = 0xff;
      
                                          (last_boosted_vcpu = 0x100)
                                          last_boosted_vcpu[15:8] = 0x01;
        i = (last_boosted_vcpu = 0x1ff)
                                          last_boosted_vcpu[7:0] = 0x00;
      
        vcpu = kvm->vcpu_array[0x1ff];
      
      As detected by KCSAN:
      
        BUG: KCSAN: data-race in kvm_vcpu_on_spin [kvm] / kvm_vcpu_on_spin [kvm]
      
        write to 0xffffc90025a92344 of 4 bytes by task 4340 on cpu 16:
        kvm_vcpu_on_spin (arch/x86/kvm/../../../virt/kvm/kvm_main.c:4112) kvm
        handle_pause (arch/x86/kvm/vmx/vmx.c:5929) kvm_intel
        vmx_handle_exit (arch/x86/kvm/vmx/vmx.c:?
      		 arch/x86/kvm/vmx/vmx.c:6606) kvm_intel
        vcpu_run (arch/x86/kvm/x86.c:11107 arch/x86/kvm/x86.c:11211) kvm
        kvm_arch_vcpu_ioctl_run (arch/x86/kvm/x86.c:?) kvm
        kvm_vcpu_ioctl (arch/x86/kvm/../../../virt/kvm/kvm_main.c:?) kvm
        __se_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:904 fs/ioctl.c:890)
        __x64_sys_ioctl (fs/ioctl.c:890)
        x64_sys_call (arch/x86/entry/syscall_64.c:33)
        do_syscall_64 (arch/x86/entry/common.c:?)
        entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
      
        read to 0xffffc90025a92344 of 4 bytes by task 4342 on cpu 4:
        kvm_vcpu_on_spin (arch/x86/kvm/../../../virt/kvm/kvm_main.c:4069) kvm
        handle_pause (arch/x86/kvm/vmx/vmx.c:5929) kvm_intel
        vmx_handle_exit (arch/x86/kvm/vmx/vmx.c:?
      			arch/x86/kvm/vmx/vmx.c:6606) kvm_intel
        vcpu_run (arch/x86/kvm/x86.c:11107 arch/x86/kvm/x86.c:11211) kvm
        kvm_arch_vcpu_ioctl_run (arch/x86/kvm/x86.c:?) kvm
        kvm_vcpu_ioctl (arch/x86/kvm/../../../virt/kvm/kvm_main.c:?) kvm
        __se_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:904 fs/ioctl.c:890)
        __x64_sys_ioctl (fs/ioctl.c:890)
        x64_sys_call (arch/x86/entry/syscall_64.c:33)
        do_syscall_64 (arch/x86/entry/common.c:?)
        entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
      
        value changed: 0x00000012 -> 0x00000000
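
A userspace approximation of the fix (the kernel's READ_ONCE()/WRITE_ONCE() are more elaborate; volatile casts merely model the "one untorn access" guarantee for an aligned word-sized value):

```c
/* Sketch only: volatile accesses approximate READ_ONCE/WRITE_ONCE for
 * an aligned 32-bit value, preventing the compiler from tearing or
 * re-reading last_boosted_vcpu across the racy, lockless accesses. */
#define READ_ONCE(x)     (*(const volatile unsigned int *)&(x))
#define WRITE_ONCE(x, v) (*(volatile unsigned int *)&(x) = (v))

static unsigned int last_boosted_vcpu;

static unsigned int pick_start_index(unsigned int nr_vcpus)
{
	/* one load: the snapshot may be stale, but it is never torn */
	unsigned int i = READ_ONCE(last_boosted_vcpu);

	return i % nr_vcpus;  /* defensive clamp, an illustrative extra */
}

static void record_boosted(unsigned int idx)
{
	WRITE_ONCE(last_boosted_vcpu, idx);  /* one untorn store */
}
```
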
      
      Fixes: 217ece61 ("KVM: use yield_to instead of sleep in kvm_vcpu_on_spin")
      Cc: stable@vger.kernel.org
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20240510092353.2261824-1-leitao@debian.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
      49f683b4
    • KVM: selftests: x86: Prioritize getting max_gfn from GuestPhysBits · 980b8bc0
      Tao Su authored
Use the max mappable GPA via GuestPhysBits advertised by KVM to calculate
max_gfn. Currently some selftests (e.g. access_tracking_perf_test,
dirty_log_test...) add RAM regions close to max_gfn, so the guest may
access a GPA beyond its mappable range and get stuck in an infinite loop.

Adjust max_gfn in vm_compute_max_gfn(), since the x86 selftests already
override vm_compute_max_gfn() specifically to deal with goofy edge cases.
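
The computation itself is simple; as a hedged sketch (names illustrative, not the selftest's actual helpers), max_gfn is derived from the mappable address width rather than the raw MAXPHYADDR:

```c
#include <stdint.h>

/* Illustrative sketch: when KVM advertises GuestPhysBits (the max
 * *mappable* guest physical address width, which can be smaller than
 * the raw MAXPHYADDR on hosts that steal GPA bits), derive max_gfn
 * from it so tests don't place memslots beyond what the guest can
 * actually map. Function name is hypothetical. */
static uint64_t compute_max_gfn(unsigned int guest_phys_bits,
				unsigned int page_shift)
{
	/* highest frame number = 2^(bits - page_shift) - 1 */
	return ((uint64_t)1 << (guest_phys_bits - page_shift)) - 1;
}
```
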
Reported-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20240513014003.104593-1-tao1.su@linux.intel.com
[sean: tweak name, add comment and sanity check]
Signed-off-by: Sean Christopherson <seanjc@google.com>
      980b8bc0
    • KVM: selftests: Fix shift of 32 bit unsigned int more than 32 bits · d21b3c60
      Colin Ian King authored
Currently a 32-bit 1u value is being shifted by more than 32 bits, causing
undefined behaviour and incorrect checking of bits 32-63. Fix this by using
the BIT_ULL macro for shifting bits.
      
      Detected by cppcheck:
      sev_init2_tests.c:108:34: error: Shifting 32-bit value by 63 bits is
      undefined behaviour [shiftTooManyBits]
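
The undefined shift and its fix in miniature (BIT_ULL shown here as the kernel defines it, modulo the ULL() wrapper):

```c
#include <stdint.h>

/* Shifting the 32-bit constant 1u by 63 is undefined behaviour;
 * BIT_ULL() shifts a 64-bit value, which is well-defined for bit
 * positions 0-63. */
#define BIT_ULL(nr) (1ULL << (nr))

static int bit63_set(uint64_t features)
{
	/* return !!(features & (1u << 63));  <-- UB: 32-bit shift by 63 */
	return !!(features & BIT_ULL(63));    /* well-defined */
}
```
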
      
      Fixes: dfc083a1 ("selftests: kvm: add tests for KVM_SEV_INIT2")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20240523154102.2236133-1-colin.i.king@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
      d21b3c60
    • KVM: x86/mmu: Don't save mmu_invalidate_seq after checking private attr · db574f2f
      Tao Su authored
      Drop the second snapshot of mmu_invalidate_seq in kvm_faultin_pfn().
      Before checking the mismatch of private vs. shared, mmu_invalidate_seq is
      saved to fault->mmu_seq, which can be used to detect an invalidation
      related to the gfn occurred, i.e. KVM will not install a mapping in page
      table if fault->mmu_seq != mmu_invalidate_seq.
      
Currently there is a second snapshot of mmu_invalidate_seq, which may not
be the same as the first snapshot in kvm_faultin_pfn(), i.e. the gfn
attribute may change between the two snapshots, yet the gfn can still be
mapped in the page table without hindrance. Therefore, drop the second
snapshot, as it has no obvious benefit.
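
A toy model of the sequence check (illustrative structures, not KVM's): the fault path snapshots mmu_invalidate_seq once, and any invalidation that bumps the sequence before the mapping is installed forces a retry:

```c
/* Illustrative sketch of the invalidation-sequence protocol, not KVM's
 * actual structures: the fault handler takes ONE snapshot up front.
 * If an invalidation (e.g. a gfn attribute change) bumps the sequence
 * before the mapping is installed, the fault is retried. A second,
 * later snapshot would let a change between the two go unnoticed. */
static unsigned long mmu_invalidate_seq;

struct fault { unsigned long mmu_seq; };

static void fault_begin(struct fault *f)
{
	f->mmu_seq = mmu_invalidate_seq;  /* the one and only snapshot */
}

static void invalidate_range(void)
{
	mmu_invalidate_seq++;             /* e.g. private/shared attr changed */
}

static int may_install_mapping(const struct fault *f)
{
	return f->mmu_seq == mmu_invalidate_seq;  /* 0 means: retry fault */
}
```
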
      
      Fixes: f6adeae8 ("KVM: x86/mmu: Handle no-slot faults at the beginning of kvm_faultin_pfn()")
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Message-ID: <20240528102234.2162763-1-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      db574f2f
    • Merge tag 'kvmarm-fixes-6.10-1' of... · 45ce0314
      Paolo Bonzini authored
      Merge tag 'kvmarm-fixes-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
      
      KVM/arm64 fixes for 6.10, take #1
      
      - Large set of FP/SVE fixes for pKVM, addressing the fallout
        from the per-CPU data rework and making sure that the host
        is not involved in the FP/SVE switching any more
      
- Allow FEAT_BTI to be enabled with NV now that FEAT_PAUTH
  is completely supported
      
      - Fix for the respective priorities of Failed PAC, Illegal
        Execution state and Instruction Abort exceptions
      
      - Fix the handling of AArch32 instruction traps failing their
        condition code, which was broken by the introduction of
        ESR_EL2.ISS2
      
- Allow vcpus running in AArch32 state to be restored in
  System mode
      
      - Fix AArch32 GPR restore that would lose the 64 bit state
        under some conditions
      45ce0314
  7. 04 Jun, 2024 9 commits
  8. 03 Jun, 2024 6 commits
    • Merge tag 'kvm-riscv-fixes-6.10-1' of https://github.com/kvm-riscv/linux into HEAD · b50788f7
      Paolo Bonzini authored
      KVM/riscv fixes for 6.10, take #1
      
      - No need to use mask when hart-index-bits is 0
      - Fix incorrect reg_subtype labels in kvm_riscv_vcpu_set_reg_isa_ext()
      b50788f7
    • Merge branch 'kvm-fixes-6.10-1' into HEAD · b3233c73
      Paolo Bonzini authored
      * Fixes and debugging help for the #VE sanity check.  Also disable
        it by default, even for CONFIG_DEBUG_KERNEL, because it was found
        to trigger spuriously (most likely a processor erratum as the
        exact symptoms vary by generation).
      
      * Avoid WARN() when two NMIs arrive simultaneously during an NMI-disabled
        situation (GIF=0 or interrupt shadow) when the processor supports
        virtual NMI.  While generally KVM will not request an NMI window
        when virtual NMIs are supported, in this case it *does* have to
        single-step over the interrupt shadow or enable the STGI intercept,
        in order to deliver the latched second NMI.
      
      * Drop support for hand tuning APIC timer advancement from userspace.
        Since we have adaptive tuning, and it has proved to work well,
        drop the module parameter for manual configuration and with it a
        few stupid bugs that it had.
      b3233c73
    • KVM: x86: Drop support for hand tuning APIC timer advancement from userspace · 89a58812
      Sean Christopherson authored
      Remove support for specifying a static local APIC timer advancement value,
      and instead present a read-only boolean parameter to let userspace enable
      or disable KVM's dynamic APIC timer advancement.  Realistically, it's all
      but impossible for userspace to specify an advancement that is more
      precise than what KVM's adaptive tuning can provide.  E.g. a static value
      needs to be tuned for the exact hardware and kernel, and if KVM is using
      hrtimers, likely requires additional tuning for the exact configuration of
      the entire system.
      
      Dropping support for a userspace provided value also fixes several flaws
      in the interface.  E.g. KVM interprets a negative value other than -1 as a
      large advancement, toggling between a negative and positive value yields
      unpredictable behavior as vCPUs will switch from dynamic to static
      advancement, changing the advancement in the middle of VM creation can
      result in different values for vCPUs within a VM, etc.  Those flaws are
      mostly fixable, but there's almost no justification for taking on yet more
      complexity (it's minimal complexity, but still non-zero).
      
      The only arguments against using KVM's adaptive tuning is if a setup needs
      a higher maximum, or if the adjustments are too reactive, but those are
      arguments for letting userspace control the absolute max advancement and
      the granularity of each adjustment, e.g. similar to how KVM provides knobs
      for halt polling.
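
For illustration only, since this commit removes a knob rather than adding code, here is a hedged sketch of adaptive tuning in the spirit described above; the step size and clamp are invented constants, not KVM's:

```c
/* Hypothetical sketch of adaptive APIC timer advancement tuning: each
 * expiry measures how early or late the interrupt landed relative to
 * the guest deadline and nudges the advancement toward zero error,
 * clamped to a maximum. ADJUST_STEP_NS and MAX_ADVANCE_NS are
 * illustrative values, not KVM's actual constants. */
#define ADJUST_STEP_NS 8
#define MAX_ADVANCE_NS 5000

static long tune_advance(long advance_ns, long delivery_error_ns)
{
	if (delivery_error_ns > 0)        /* delivered late: advance more */
		advance_ns += ADJUST_STEP_NS;
	else if (delivery_error_ns < 0)   /* delivered early: advance less */
		advance_ns -= ADJUST_STEP_NS;

	if (advance_ns < 0)
		advance_ns = 0;
	if (advance_ns > MAX_ADVANCE_NS)
		advance_ns = MAX_ADVANCE_NS;
	return advance_ns;
}
```
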
      
      Link: https://lore.kernel.org/all/20240520115334.852510-1-zhoushuling@huawei.com
      Cc: Shuling Zhou <zhoushuling@huawei.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240522010304.1650603-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      89a58812
    • KVM: SEV-ES: Delegate LBR virtualization to the processor · b7e4be0a
      Ravi Bangoria authored
      As documented in APM[1], LBR Virtualization must be enabled for SEV-ES
      guests. Although KVM currently enforces LBRV for SEV-ES guests, there
      are multiple issues with it:
      
      o MSR_IA32_DEBUGCTLMSR is still intercepted. Since MSR_IA32_DEBUGCTLMSR
        interception is used to dynamically toggle LBRV for performance reasons,
  this can be fatal for SEV-ES guests. For example, an SEV-ES guest on Zen3:
      
        [guest ~]# wrmsr 0x1d9 0x4
        KVM: entry failed, hardware error 0xffffffff
        EAX=00000004 EBX=00000000 ECX=000001d9 EDX=00000000
      
        Fix this by never intercepting MSR_IA32_DEBUGCTLMSR for SEV-ES guests.
        No additional save/restore logic is required since MSR_IA32_DEBUGCTLMSR
        is of swap type A.
      
      o KVM will disable LBRV if userspace sets MSR_IA32_DEBUGCTLMSR before the
        VMSA is encrypted. Fix this by moving LBRV enablement code post VMSA
        encryption.
      
      [1]: AMD64 Architecture Programmer's Manual Pub. 40332, Rev. 4.07 - June
           2023, Vol 2, 15.35.2 Enabling SEV-ES.
           https://bugzilla.kernel.org/attachment.cgi?id=304653
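
The resulting interception policy reduces to a one-line sketch (hypothetical helper, not KVM's code): intercept MSR_IA32_DEBUGCTLMSR only for non-SEV-ES guests, where KVM still toggles LBRV dynamically:

```c
#include <stdbool.h>

/* Illustrative sketch of the policy described above: for ordinary
 * guests, KVM intercepts MSR_IA32_DEBUGCTLMSR so it can toggle LBR
 * virtualization on demand; for SEV-ES guests LBRV is always on and
 * the MSR is swap type A (saved/restored by hardware), so intercepting
 * it is both pointless and fatal. Function name is hypothetical. */
static bool intercept_debugctl(bool sev_es_guest)
{
	return !sev_es_guest;
}
```
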
      
      Fixes: 376c6d28 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading")
Co-developed-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Message-ID: <20240531044644.768-4-ravi.bangoria@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b7e4be0a
    • KVM: SEV-ES: Disallow SEV-ES guests when X86_FEATURE_LBRV is absent · d9220562
      Ravi Bangoria authored
      As documented in APM[1], LBR Virtualization must be enabled for SEV-ES
      guests. So, prevent SEV-ES guests when LBRV support is missing.
      
      [1]: AMD64 Architecture Programmer's Manual Pub. 40332, Rev. 4.07 - June
           2023, Vol 2, 15.35.2 Enabling SEV-ES.
           https://bugzilla.kernel.org/attachment.cgi?id=304653
      
      Fixes: 376c6d28 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading")
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Message-ID: <20240531044644.768-3-ravi.bangoria@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d9220562
    • KVM: SEV-ES: Prevent MSR access post VMSA encryption · 27bd5fdc
      Nikunj A Dadhania authored
      KVM currently allows userspace to read/write MSRs even after the VMSA is
encrypted. This can cause unintentional issues if MSR access has
side-effects. For example, while migrating a guest, userspace could attempt
to
      migrate MSR_IA32_DEBUGCTLMSR and end up unintentionally disabling LBRV on
      the target. Fix this by preventing access to those MSRs which are context
      switched via the VMSA, once the VMSA is encrypted.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Message-ID: <20240531044644.768-2-ravi.bangoria@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      27bd5fdc
  9. 02 Jun, 2024 8 commits
  10. 01 Jun, 2024 1 commit