1. 22 Oct, 2021 6 commits
    • Paolo Bonzini's avatar
      KVM: SEV-ES: keep INS functions together · 4fa4b38d
      Paolo Bonzini authored
      Make the diff a little nicer when we actually get to fixing
      the bug.  No functional change intended.
      
      Cc: stable@vger.kernel.org
      Fixes: 7ed9abfe ("KVM: SVM: Support string IO operations for an SEV-ES guest")
      Reviewed-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      4fa4b38d
    • Paolo Bonzini's avatar
      KVM: x86: remove unnecessary arguments from complete_emulator_pio_in · 6b5efc93
      Paolo Bonzini authored
      complete_emulator_pio_in can expect that vcpu->arch.pio has been filled in,
      and therefore does not need the size and count arguments.  This makes things
      nicer when the function is called directly from a complete_userspace_io
      callback.
      
      No functional change intended.
      
      Cc: stable@vger.kernel.org
      Fixes: 7ed9abfe ("KVM: SVM: Support string IO operations for an SEV-ES guest")
      Reviewed-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      6b5efc93
    • Paolo Bonzini's avatar
      KVM: x86: split the two parts of emulator_pio_in · 3b27de27
      Paolo Bonzini authored
      emulator_pio_in handles both the case where the data is pending in
      vcpu->arch.pio.count, and the case where I/O has to be done via either
      an in-kernel device or a userspace exit.  For SEV-ES we would like
      to split these, to identify clearly the moment at which the
      sev_pio_data is consumed.  To this end, create two different
      functions: __emulator_pio_in fills in vcpu->arch.pio.count, while
      complete_emulator_pio_in clears it and releases vcpu->arch.pio.data.
      
      Because this patch has to be backported, things are left a bit messy.
      kernel_pio() operates on vcpu->arch.pio, which leads to emulator_pio_in()
      having with two calls to complete_emulator_pio_in().  It will be fixed
      in the next release.
      
      While at it, remove the unused void* val argument of emulator_pio_in_out.
      The function currently hardcodes vcpu->arch.pio_data as the
      source/destination buffer, which sucks but will be fixed after the more
      severe SEV-ES buffer overflow.
      
      No functional change intended.
      
      Cc: stable@vger.kernel.org
      Fixes: 7ed9abfe ("KVM: SVM: Support string IO operations for an SEV-ES guest")
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      3b27de27
    • Paolo Bonzini's avatar
      KVM: SEV-ES: clean up kvm_sev_es_ins/outs · ea724ea4
      Paolo Bonzini authored
      A few very small cleanups to the functions, smushed together because
      the patch is already very small like this:
      
      - inline emulator_pio_in_emulated and emulator_pio_out_emulated,
        since we already have the vCPU
      
      - remove the data argument and pull setting vcpu->arch.sev_pio_data into
        the caller
      
      - remove unnecessary clearing of vcpu->arch.pio.count when
        emulation is done by the kernel (and therefore vcpu->arch.pio.count
        is already clear on exit from emulator_pio_in and emulator_pio_out).
      
      No functional change intended.
      
      Cc: stable@vger.kernel.org
      Fixes: 7ed9abfe ("KVM: SVM: Support string IO operations for an SEV-ES guest")
      Reviewed-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      ea724ea4
    • Paolo Bonzini's avatar
      KVM: x86: leave vcpu->arch.pio.count alone in emulator_pio_in_out · 0d33b1ba
      Paolo Bonzini authored
      Currently emulator_pio_in clears vcpu->arch.pio.count twice if
      emulator_pio_in_out performs kernel PIO.  Move the clear into
      emulator_pio_out where it is actually necessary.
      
      No functional change intended.
      
      Cc: stable@vger.kernel.org
      Fixes: 7ed9abfe ("KVM: SVM: Support string IO operations for an SEV-ES guest")
      Reviewed-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      0d33b1ba
    • Paolo Bonzini's avatar
      KVM: SEV-ES: rename guest_ins_data to sev_pio_data · b5998402
      Paolo Bonzini authored
      We will be using this field for OUTS emulation as well, in case the
      data that is pushed via OUTS spans more than one page.  In that case,
      there will be a need to save the data pointer across exits to userspace.
      
      So, change the name to something that refers to any kind of PIO.
      Also spell out what it is used for, namely SEV-ES.
      
      No functional change intended.
      
      Cc: stable@vger.kernel.org
      Fixes: 7ed9abfe ("KVM: SVM: Support string IO operations for an SEV-ES guest")
      Reviewed-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      b5998402
  2. 21 Oct, 2021 4 commits
  3. 18 Oct, 2021 7 commits
    • Paolo Bonzini's avatar
      KVM: SEV-ES: reduce ghcb_sa_len to 32 bits · 9f1ee7b1
      Paolo Bonzini authored
      The size of the GHCB scratch area is limited to 16 KiB (GHCB_SCRATCH_AREA_LIMIT),
      so there is no need for it to be a u64.  This fixes a build error on 32-bit
      systems:
      
      i686-linux-gnu-ld: arch/x86/kvm/svm/sev.o: in function `sev_es_string_io:
      sev.c:(.text+0x110f): undefined reference to `__udivdi3'
      
      Cc: stable@vger.kernel.org
      Fixes: 019057bd ("KVM: SEV-ES: fix length of string I/O")
      Reported-by: default avatarNaresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      9f1ee7b1
    • Hao Xiang's avatar
      KVM: VMX: Remove redundant handling of bus lock vmexit · d61863c6
      Hao Xiang authored
      Hardware may or may not set exit_reason.bus_lock_detected on BUS_LOCK
      VM-Exits. Dealing with KVM_RUN_X86_BUS_LOCK in handle_bus_lock_vmexit
      could be redundant when exit_reason.basic is EXIT_REASON_BUS_LOCK.
      
      We can remove redundant handling of bus lock vmexit. Unconditionally Set
      exit_reason.bus_lock_detected in handle_bus_lock_vmexit(), and deal with
      KVM_RUN_X86_BUS_LOCK only in vmx_handle_exit().
      Signed-off-by: default avatarHao Xiang <hao.xiang@linux.alibaba.com>
      Message-Id: <1634299161-30101-1-git-send-email-hao.xiang@linux.alibaba.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      d61863c6
    • Christian Borntraeger's avatar
      KVM: kvm_stat: do not show halt_wait_ns · 01c7d267
      Christian Borntraeger authored
      Similar to commit 111d0bda ("tools/kvm_stat: Exempt time-based
      counters"), we should not show timer values in kvm_stat. Remove the new
      halt_wait_ns.
      
      Fixes: 87bcc5fa ("KVM: stats: Add halt_wait_ns stats for all architectures")
      Cc: Jing Zhang <jingzhangos@google.com>
      Cc: Stefan Raspl <raspl@de.ibm.com>
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: default avatarStefan Raspl <raspl@linux.ibm.com>
      Message-Id: <20211006121724.4154-1-borntraeger@de.ibm.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      01c7d267
    • Sean Christopherson's avatar
      KVM: x86: WARN if APIC HW/SW disable static keys are non-zero on unload · 9139a7a6
      Sean Christopherson authored
      WARN if the static keys used to track if any vCPU has disabled its APIC
      are left elevated at module exit.  Unlike the underflow case, nothing in
      the static key infrastructure will complain if a key is left elevated,
      and because an elevated key only affects performance, nothing in KVM will
      fail if either key is improperly incremented.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20211013003554.47705-3-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      9139a7a6
    • Sean Christopherson's avatar
      Revert "KVM: x86: Open code necessary bits of kvm_lapic_set_base() at vCPU RESET" · f7d8a19f
      Sean Christopherson authored
      Revert a change to open code bits of kvm_lapic_set_base() when emulating
      APIC RESET to fix an apic_hw_disabled underflow bug due to arch.apic_base
      and apic_hw_disabled being unsyncrhonized when the APIC is created.  If
      kvm_arch_vcpu_create() fails after creating the APIC, kvm_free_lapic()
      will see the initialized-to-zero vcpu->arch.apic_base and decrement
      apic_hw_disabled without KVM ever having incremented apic_hw_disabled.
      
      Using kvm_lapic_set_base() in kvm_lapic_reset() is also desirable for a
      potential future where KVM supports RESET outside of vCPU creation, in
      which case all the side effects of kvm_lapic_set_base() are needed, e.g.
      to handle the transition from x2APIC => xAPIC.
      
      Alternatively, KVM could temporarily increment apic_hw_disabled (and call
      kvm_lapic_set_base() at RESET), but that's a waste of cycles and would
      impact the performance of other vCPUs and VMs.  The other subtle side
      effect is that updating the xAPIC ID needs to be done at RESET regardless
      of whether the APIC was previously enabled, i.e. kvm_lapic_reset() needs
      an explicit call to kvm_apic_set_xapic_id() regardless of whether or not
      kvm_lapic_set_base() also performs the update.  That makes stuffing the
      enable bit at vCPU creation slightly more palatable, as doing so affects
      only the apic_hw_disabled key.
      
      Opportunistically tweak the comment to explicitly call out the connection
      between vcpu->arch.apic_base and apic_hw_disabled, and add a comment to
      call out the need to always do kvm_apic_set_xapic_id() at RESET.
      
      Underflow scenario:
      
        kvm_vm_ioctl() {
          kvm_vm_ioctl_create_vcpu() {
            kvm_arch_vcpu_create() {
              if (something_went_wrong)
                goto fail_free_lapic;
              /* vcpu->arch.apic_base is initialized when something_went_wrong is false. */
              kvm_vcpu_reset() {
                kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event) {
                  vcpu->arch.apic_base = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE;
                }
              }
              return 0;
            fail_free_lapic:
              kvm_free_lapic() {
                /* vcpu->arch.apic_base is not yet initialized when something_went_wrong is true. */
                if (!(vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE))
                  static_branch_slow_dec_deferred(&apic_hw_disabled); // <= underflow bug.
              }
              return r;
            }
          }
        }
      
      This (mostly) reverts commit 42122123.
      
      Fixes: 42122123 ("KVM: x86: Open code necessary bits of kvm_lapic_set_base() at vCPU RESET")
      Reported-by: syzbot+9fc046ab2b0cf295a063@syzkaller.appspotmail.com
      Debugged-by: default avatarTetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20211013003554.47705-2-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      f7d8a19f
    • Peter Gonda's avatar
      KVM: SEV-ES: Set guest_state_protected after VMSA update · baa1e5ca
      Peter Gonda authored
      The refactoring in commit bb18a677 ("KVM: SEV: Acquire
      vcpu mutex when updating VMSA") left behind the assignment to
      svm->vcpu.arch.guest_state_protected; add it back.
      Signed-off-by: default avatarPeter Gonda <pgonda@google.com>
      [Delta between v2 and v3 of Peter's patch, which had already been
       committed; the commit message is my own. - Paolo]
      Fixes: bb18a677 ("KVM: SEV: Acquire vcpu mutex when updating VMSA")
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      baa1e5ca
    • Paolo Bonzini's avatar
      KVM: X86: fix lazy allocation of rmaps · fa13843d
      Paolo Bonzini authored
      If allocation of rmaps fails, but some of the pointers have already been written,
      those pointers can be cleaned up when the memslot is freed, or even reused later
      for another attempt at allocating the rmaps.  Therefore there is no need to
      WARN, as done for example in memslot_rmap_alloc, but the allocation *must* be
      skipped lest KVM will overwrite the previous pointer and will indeed leak memory.
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      fa13843d
  4. 15 Oct, 2021 2 commits
  5. 05 Oct, 2021 3 commits
  6. 04 Oct, 2021 1 commit
  7. 30 Sep, 2021 4 commits
    • Sean Christopherson's avatar
      KVM: selftests: Ensure all migrations are performed when test is affined · 7b0035ea
      Sean Christopherson authored
      Rework the CPU selection in the migration worker to ensure the specified
      number of migrations are performed when the test iteslf is affined to a
      subset of CPUs.  The existing logic skips iterations if the target CPU is
      not in the original set of possible CPUs, which causes the test to fail
      if too many iterations are skipped.
      
        ==== Test Assertion Failure ====
        rseq_test.c:228: i > (NR_TASK_MIGRATIONS / 2)
        pid=10127 tid=10127 errno=4 - Interrupted system call
           1  0x00000000004018e5: main at rseq_test.c:227
           2  0x00007fcc8fc66bf6: ?? ??:0
           3  0x0000000000401959: _start at ??:?
        Only performed 4 KVM_RUNs, task stalled too much?
      
      Calculate the min/max possible CPUs as a cheap "best effort" to avoid
      high runtimes when the test is affined to a small percentage of CPUs.
      Alternatively, a list or xarray of the possible CPUs could be used, but
      even in a horrendously inefficient setup, such optimizations are not
      needed because the runtime is completely dominated by the cost of
      migrating the task, and the absolute runtime is well under a minute in
      even truly absurd setups, e.g. running on a subset of vCPUs in a VM that
      is heavily overcommited (16 vCPUs per pCPU).
      
      Fixes: 61e52f16 ("KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs")
      Reported-by: default avatarDongli Zhang <dongli.zhang@oracle.com>
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20210929234112.1862848-1-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      7b0035ea
    • Sean Christopherson's avatar
      KVM: x86: Swap order of CPUID entry "index" vs. "significant flag" checks · e8a747d0
      Sean Christopherson authored
      Check whether a CPUID entry's index is significant before checking for a
      matching index to hack-a-fix an undefined behavior bug due to consuming
      uninitialized data.  RESET/INIT emulation uses kvm_cpuid() to retrieve
      CPUID.0x1, which does _not_ have a significant index, and fails to
      initialize the dummy variable that doubles as EBX/ECX/EDX output _and_
      ECX, a.k.a. index, input.
      
      Practically speaking, it's _extremely_  unlikely any compiler will yield
      code that causes problems, as the compiler would need to inline the
      kvm_cpuid() call to detect the uninitialized data, and intentionally hose
      the kernel, e.g. insert ud2, instead of simply ignoring the result of
      the index comparison.
      
      Although the sketchy "dummy" pattern was introduced in SVM by commit
      66f7b72e ("KVM: x86: Make register state after reset conform to
      specification"), it wasn't actually broken until commit 7ff6c035
      ("KVM: x86: Remove stateful CPUID handling") arbitrarily swapped the
      order of operations such that "index" was checked before the significant
      flag.
      
      Avoid consuming uninitialized data by reverting to checking the flag
      before the index purely so that the fix can be easily backported; the
      offending RESET/INIT code has been refactored, moved, and consolidated
      from vendor code to common x86 since the bug was introduced.  A future
      patch will directly address the bad RESET/INIT behavior.
      
      The undefined behavior was detected by syzbot + KernelMemorySanitizer.
      
        BUG: KMSAN: uninit-value in cpuid_entry2_find arch/x86/kvm/cpuid.c:68
        BUG: KMSAN: uninit-value in kvm_find_cpuid_entry arch/x86/kvm/cpuid.c:1103
        BUG: KMSAN: uninit-value in kvm_cpuid+0x456/0x28f0 arch/x86/kvm/cpuid.c:1183
         cpuid_entry2_find arch/x86/kvm/cpuid.c:68 [inline]
         kvm_find_cpuid_entry arch/x86/kvm/cpuid.c:1103 [inline]
         kvm_cpuid+0x456/0x28f0 arch/x86/kvm/cpuid.c:1183
         kvm_vcpu_reset+0x13fb/0x1c20 arch/x86/kvm/x86.c:10885
         kvm_apic_accept_events+0x58f/0x8c0 arch/x86/kvm/lapic.c:2923
         vcpu_enter_guest+0xfd2/0x6d80 arch/x86/kvm/x86.c:9534
         vcpu_run+0x7f5/0x18d0 arch/x86/kvm/x86.c:9788
         kvm_arch_vcpu_ioctl_run+0x245b/0x2d10 arch/x86/kvm/x86.c:10020
      
        Local variable ----dummy@kvm_vcpu_reset created at:
         kvm_vcpu_reset+0x1fb/0x1c20 arch/x86/kvm/x86.c:10812
         kvm_apic_accept_events+0x58f/0x8c0 arch/x86/kvm/lapic.c:2923
      
      Reported-by: syzbot+f3985126b746b3d59c9d@syzkaller.appspotmail.com
      Reported-by: default avatarAlexander Potapenko <glider@google.com>
      Fixes: 2a24be79 ("KVM: VMX: Set EDX at INIT with CPUID.0x1, Family-Model-Stepping")
      Fixes: 7ff6c035 ("KVM: x86: Remove stateful CPUID handling")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Reviewed-by: default avatarJim Mattson <jmattson@google.com>
      Message-Id: <20210929222426.1855730-2-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      e8a747d0
    • Zelin Deng's avatar
      ptp: Fix ptp_kvm_getcrosststamp issue for x86 ptp_kvm · 773e89ab
      Zelin Deng authored
      hv_clock is preallocated to have only HVC_BOOT_ARRAY_SIZE (64) elements;
      if the PTP_SYS_OFFSET_PRECISE ioctl is executed on vCPUs whose index is
      64 of higher, retrieving the struct pvclock_vcpu_time_info pointer with
      "src = &hv_clock[cpu].pvti" will result in an out-of-bounds access and
      a wild pointer.  Change it to "this_cpu_pvti()" which is guaranteed to
      be valid.
      
      Fixes: 95a3d445 ("Switch kvmclock data to a PER_CPU variable")
      Signed-off-by: default avatarZelin Deng <zelin.deng@linux.alibaba.com>
      Cc: <stable@vger.kernel.org>
      Message-Id: <1632892429-101194-3-git-send-email-zelin.deng@linux.alibaba.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      773e89ab
    • Zelin Deng's avatar
      x86/kvmclock: Move this_cpu_pvti into kvmclock.h · ad9af930
      Zelin Deng authored
      There're other modules might use hv_clock_per_cpu variable like ptp_kvm,
      so move it into kvmclock.h and export the symbol to make it visiable to
      other modules.
      Signed-off-by: default avatarZelin Deng <zelin.deng@linux.alibaba.com>
      Cc: <stable@vger.kernel.org>
      Message-Id: <1632892429-101194-2-git-send-email-zelin.deng@linux.alibaba.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      ad9af930
  8. 28 Sep, 2021 2 commits
  9. 27 Sep, 2021 1 commit
    • Zhenzhong Duan's avatar
      KVM: VMX: Fix a TSX_CTRL_CPUID_CLEAR field mask issue · 5c49d185
      Zhenzhong Duan authored
      When updating the host's mask for its MSR_IA32_TSX_CTRL user return entry,
      clear the mask in the found uret MSR instead of vmx->guest_uret_msrs[i].
      Modifying guest_uret_msrs directly is completely broken as 'i' does not
      point at the MSR_IA32_TSX_CTRL entry.  In fact, it's guaranteed to be an
      out-of-bounds accesses as is always set to kvm_nr_uret_msrs in a prior
      loop. By sheer dumb luck, the fallout is limited to "only" failing to
      preserve the host's TSX_CTRL_CPUID_CLEAR.  The out-of-bounds access is
      benign as it's guaranteed to clear a bit in a guest MSR value, which are
      always zero at vCPU creation on both x86-64 and i386.
      
      Cc: stable@vger.kernel.org
      Fixes: 8ea8b8d6 ("KVM: VMX: Use common x86's uret MSR list as the one true list")
      Signed-off-by: default avatarZhenzhong Duan <zhenzhong.duan@intel.com>
      Reviewed-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20210926015545.281083-1-zhenzhong.duan@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      5c49d185
  10. 24 Sep, 2021 3 commits
  11. 23 Sep, 2021 7 commits