  1. 12 Dec, 2020 2 commits
    • tools/kvm_stat: Exempt time-based counters · 111d0bda
      Stefan Raspl authored
      The new counters halt_poll_success_ns and halt_poll_fail_ns do not count
      events. Instead, they report a time and therefore skew our statistics, so
      we should exclude them.
      Removal is currently implemented with an exempt list. If more counters like
      these appear, we can think about a more general rule, like excluding all
      fields named "*_ns", in case that becomes a standing convention.
      Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
      Tested-and-reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Message-Id: <20201208210829.101324-1-raspl@linux.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: mmu: Fix SPTE encoding of MMIO generation upper half · 34c0f6f2
      Maciej S. Szmigiero authored
      Commit cae7ed3c ("KVM: x86: Refactor the MMIO SPTE generation handling")
      cleaned up the computation of MMIO generation SPTE masks, but it
      introduced a bug in how the upper part was encoded:
      SPTE bits 52-61 were supposed to contain bits 10-19 of the current
      generation number; however, a missing shift encoded bits 1-10 there instead
      (mostly duplicating the lower part of the encoded generation number, which
      at the time consisted of bits 1-9).
      
      In the meantime, the upper part was shrunk by one bit and moved by
      subsequent commits to become the upper half of the encoded generation number
      (bits 9-17 of the 18 bits, 0-17, encoded in an SPTE).
      
      In addition to the above, commit 56871d44 ("KVM: x86: fix overlap between SPTE_MMIO_MASK and generation")
      changed the SPTE bit range assigned to encode the generation number, and the
      total number of bits encoded, but did not update either in the comment
      attached to their defines or in the KVM MMU documentation.
      Let's do that here, too, since it is too trivial a change to warrant a
      separate commit.
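      
      To make the bug class concrete, here is a minimal, self-contained sketch
      of splitting a generation number across two disjoint SPTE bit ranges.
      The bit positions and macro names below are illustrative only, not the
      kernel's actual SPTE layout:
      
          #include <stdint.h>
      
          /* Illustrative layout: an 18-bit generation number stored as 9 low
           * bits and 9 high bits of a 64-bit SPTE. */
          #define GEN_LOW_SHIFT   3
          #define GEN_LOW_BITS    9
          #define GEN_HIGH_SHIFT  52
          #define GEN_HIGH_BITS   9
      
          #define GEN_LOW_MASK  (((1ULL << GEN_LOW_BITS) - 1) << GEN_LOW_SHIFT)
          #define GEN_HIGH_MASK (((1ULL << GEN_HIGH_BITS) - 1) << GEN_HIGH_SHIFT)
      
          static uint64_t encode_gen(uint64_t gen)
          {
          	uint64_t spte = 0;
      
          	spte |= (gen << GEN_LOW_SHIFT) & GEN_LOW_MASK;
          	/* The missing ">> GEN_LOW_BITS" was the bug: without it, the
          	 * low generation bits get duplicated into the high range. */
          	spte |= ((gen >> GEN_LOW_BITS) << GEN_HIGH_SHIFT) & GEN_HIGH_MASK;
          	return spte;
          }
      
          static uint64_t decode_gen(uint64_t spte)
          {
          	return ((spte & GEN_LOW_MASK) >> GEN_LOW_SHIFT) |
          	       (((spte & GEN_HIGH_MASK) >> GEN_HIGH_SHIFT) << GEN_LOW_BITS);
          }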
      
      Fixes: cae7ed3c ("KVM: x86: Refactor the MMIO SPTE generation handling")
      Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Message-Id: <156700708db2a5296c5ed7a8b9ac71f1e9765c85.1607129096.git.maciej.szmigiero@oracle.com>
      Cc: stable@vger.kernel.org
      [Reorganize macros so that everything is computed from the bit ranges. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 03 Dec, 2020 1 commit
    • selftests: kvm/set_memory_region_test: Fix race in move region test · 0c55f867
      Maciej S. Szmigiero authored
      The current memory region move test correctly handles the case where the
      second (realigning) memslot move operation temporarily triggers MMIO until
      it completes, but it does not handle the case where the first (misaligning)
      move operation does this, too.
      This results in spurious test assertion failures when that happens.
      
      Fix this by handling temporary MMIO from the first memslot move operation
      in the test guest code, too.
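      
      As a hypothetical sketch (not the test's verbatim code; MEM_REGION_GPA
      and 'expected' stand in for the test's actual symbols), tolerating such
      a transient MMIO window in the guest looks like this:
      
          /* While the memslot is being moved, reads of the region are
           * emulated as MMIO and can return garbage; spin until the
           * expected value reappears. */
          uint64_t val;
      
          do {
          	val = READ_ONCE(*(uint64_t *)MEM_REGION_GPA);
          } while (val != expected);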
      
      Fixes: 8a0639fe ("KVM: sefltests: Add explicit synchronization to move mem region test")
      Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Message-Id: <0fdddb94bb0e31b7da129a809a308d91c10c0b5e.1606941224.git.maciej.szmigiero@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 27 Nov, 2020 4 commits
    • kvm: x86/mmu: Fix get_mmio_spte() on CPUs supporting 5-level PT · 9a2a0d3c
      Vitaly Kuznetsov authored
      Commit 95fb5b02 ("kvm: x86/mmu: Support MMIO in the TDP MMU") caused
      the following WARNING on an Intel Ice Lake CPU:
      
       get_mmio_spte: detect reserved bits on spte, addr 0xb80a0, dump hierarchy:
       ------ spte 0xb80a0 level 5.
       ------ spte 0xfcd210107 level 4.
       ------ spte 0x1004c40107 level 3.
       ------ spte 0x1004c41107 level 2.
       ------ spte 0x1db00000000b83b6 level 1.
       WARNING: CPU: 109 PID: 10254 at arch/x86/kvm/mmu/mmu.c:3569 kvm_mmu_page_fault.cold.150+0x54/0x22f [kvm]
      ...
       Call Trace:
        ? kvm_io_bus_get_first_dev+0x55/0x110 [kvm]
        vcpu_enter_guest+0xaa1/0x16a0 [kvm]
        ? vmx_get_cs_db_l_bits+0x17/0x30 [kvm_intel]
        ? skip_emulated_instruction+0xaa/0x150 [kvm_intel]
        kvm_arch_vcpu_ioctl_run+0xca/0x520 [kvm]
      
      The guest that triggers this crashes. Note, this happens with the
      traditional MMU and EPT enabled, not with the newly introduced TDP MMU.
      It turns out there was a subtle change in the above-mentioned commit.
      Previously, walk_shadow_page_get_mmio_spte() was setting 'root' to
      'iterator.level', which is returned by shadow_walk_init() and equals
      'vcpu->arch.mmu->shadow_root_level'. Now, get_mmio_spte() sets it with
      'int root = vcpu->arch.mmu->root_level'.
      
      The difference between 'root_level' and 'shadow_root_level' on CPUs
      supporting 5-level page tables is that in some cases we don't want to
      use 5-level paging; in particular, when 'cpuid_maxphyaddr(vcpu) <= 48',
      kvm_mmu_get_tdp_level() returns '4'. When the upper level is not used,
      the corresponding SPTE fails the '__is_rsvd_bits_set()' check.
      
      Revert to using 'shadow_root_level'.
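      
      A paraphrased sketch of the resulting one-line fix in get_mmio_spte()
      (not the verbatim kernel diff):
      
          -	int root = vcpu->arch.mmu->root_level;
          +	int root = vcpu->arch.mmu->shadow_root_level;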
      
      Fixes: 95fb5b02 ("kvm: x86/mmu: Support MMIO in the TDP MMU")
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20201126110206.2118959-1-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Fix split-irqchip vs interrupt injection window request · 71cc849b
      Paolo Bonzini authored
      kvm_cpu_accept_dm_intr and kvm_vcpu_ready_for_interrupt_injection are
      a hodge-podge of conditions, hacked together to get something that
      more or less works.  But what is actually needed is much simpler;
      in both cases the fundamental question is, do we have a place to stash
      an interrupt if userspace does KVM_INTERRUPT?
      
      In userspace irqchip mode, that is !vcpu->arch.interrupt.injected.
      Currently kvm_event_needs_reinjection(vcpu) covers it, but it is
      unnecessarily restrictive.
      
      In split irqchip mode it's a bit more complicated, we need to check
      kvm_apic_accept_pic_intr(vcpu) (the IRQ window exit is basically an INTACK
      cycle and thus requires ExtINTs not to be masked) as well as
      !pending_userspace_extint(vcpu).  However, there is no need to
      check kvm_event_needs_reinjection(vcpu), since split irqchip keeps
      pending ExtINT state separate from event injection state, and checking
      kvm_cpu_has_interrupt(vcpu) is wrong too since ExtINT has higher
      priority than APIC interrupts.  In fact the latter fixes a bug:
      when userspace requests an IRQ window vmexit, an interrupt in the
      local APIC can cause kvm_cpu_has_interrupt() to be true and thus
      kvm_vcpu_ready_for_interrupt_injection() to return false.  When this
      happens, vcpu_run does not exit to userspace but the interrupt window
      vmexits keep occurring.  The VM loops without any hope of making progress.
      
      Once we try to fix these with something like
      
           return kvm_arch_interrupt_allowed(vcpu) &&
      -        !kvm_cpu_has_interrupt(vcpu) &&
      -        !kvm_event_needs_reinjection(vcpu) &&
      -        kvm_cpu_accept_dm_intr(vcpu);
      +        (!lapic_in_kernel(vcpu)
      +         ? !vcpu->arch.interrupt.injected
      +         : (kvm_apic_accept_pic_intr(vcpu)
       +            && !pending_userspace_extint(vcpu)));
      
      we realize two things.  First, thanks to the previous patch the complex
      conditional can reuse !kvm_cpu_has_extint(vcpu).  Second, the interrupt
      window request in vcpu_enter_guest()
      
              bool req_int_win =
                      dm_request_for_irq_injection(vcpu) &&
                      kvm_cpu_accept_dm_intr(vcpu);
      
      should be kept in sync with kvm_vcpu_ready_for_interrupt_injection():
      it is unnecessary to ask the processor for an interrupt window
      if we would not be able to return to userspace.  Therefore,
      kvm_cpu_accept_dm_intr(vcpu) is basically !kvm_cpu_has_extint(vcpu)
      ANDed with the existing check for masked ExtINT.  It all makes sense:
      
      - we can accept an interrupt from userspace if there is a place
        to stash it (and, for irqchip split, ExtINTs are not masked).
        Interrupts from userspace _can_ be accepted even if right now
        EFLAGS.IF=0.
      
      - in order to tell userspace we will inject its interrupt ("IRQ
        window open" i.e. kvm_vcpu_ready_for_interrupt_injection), both
        KVM and the vCPU need to be ready to accept the interrupt.
      
      ... and this is what the patch implements.
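      
      Based on the description above, the end result is along these lines (a
      sketch reconstructed from this commit message, not necessarily the
      verbatim patch):
      
          static bool kvm_cpu_accept_dm_intr(struct kvm_vcpu *vcpu)
          {
          	/*
          	 * We can accept userspace's interrupt only if there is a
          	 * place to stash it, i.e. no ExtINT is already pending.
          	 */
          	if (kvm_cpu_has_extint(vcpu))
          		return false;
      
          	/* Acknowledging ExtINT does not happen if LINT0 is masked. */
          	return !lapic_in_kernel(vcpu) ||
          	       kvm_apic_accept_pic_intr(vcpu);
          }
      
          static bool kvm_vcpu_ready_for_interrupt_injection(struct kvm_vcpu *vcpu)
          {
          	return kvm_arch_interrupt_allowed(vcpu) &&
          	       kvm_cpu_accept_dm_intr(vcpu);
          }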
      Reported-by: David Woodhouse <dwmw@amazon.co.uk>
      Analyzed-by: David Woodhouse <dwmw@amazon.co.uk>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Nikos Tsironis <ntsironis@arrikto.com>
      Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
      Tested-by: David Woodhouse <dwmw@amazon.co.uk>
    • KVM: x86: handle !lapic_in_kernel case in kvm_cpu_*_extint · 72c3bcdc
      Paolo Bonzini authored
      Centralize handling of interrupts from the userspace APIC
      in kvm_cpu_has_extint and kvm_cpu_get_extint, since
      userspace APIC interrupts are handled more or less the
      same as ExtINTs are with split irqchip.  This removes
      duplicated code from kvm_cpu_has_injectable_intr and
      kvm_cpu_has_interrupt, and makes the code more similar
      between kvm_cpu_has_{extint,interrupt} on one side
      and kvm_cpu_get_{extint,interrupt} on the other.
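      
      For illustration, the centralized check has roughly this shape (a sketch
      based on the description above; details may differ from the actual
      patch):
      
          static bool kvm_cpu_has_extint(struct kvm_vcpu *v)
          {
          	/* With a userspace APIC, a pending KVM_INTERRUPT plays the
          	 * role of an ExtINT. */
          	if (!lapic_in_kernel(v))
          		return v->arch.interrupt.injected;
      
          	if (!kvm_apic_accept_pic_intr(v))
          		return false;
      
          	if (irqchip_split(v->kvm))
          		return pending_userspace_extint(v);
      
          	return v->kvm->arch.vpic->output;
          }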
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Filippo Sironi <sironi@amazon.de>
      Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
      Tested-by: David Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Merge tag 'kvmarm-fixes-5.10-4' of... · 545f6394
      Paolo Bonzini authored
      Merge tag 'kvmarm-fixes-5.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master
      
      KVM/arm64 fixes for v5.10, take #4
      
      - Fix alignment of the new HYP sections
      - Fix GICR_TYPER access from userspace
  4. 20 Nov, 2020 1 commit
    • MAINTAINERS: Update email address for Sean Christopherson · c2b1209d
      Sean Christopherson authored
      Update my email address to one provided by my new benefactor.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jarkko Sakkinen <jarkko@kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Wanpeng Li <wanpengli@tencent.com>
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: kvm@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20201119183707.291864-1-sean.kvm@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  5. 16 Nov, 2020 3 commits
    • KVM: SVM: Fix offset computation bug in __sev_dbg_decrypt(). · 854c57f0
      Ashish Kalra authored
      Fix the offset computation in __sev_dbg_decrypt() to include the source
      paddr before it is rounded down to the 16-byte alignment required by the
      SEV API. This fixes incorrect guest memory dumps observed when using the
      QEMU monitor.
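      
      The idea, as an illustrative sketch (not the verbatim kernel code):
      
          /* Capture the sub-16-byte offset of the source address before
           * aligning it down, so the decrypted data is copied out from the
           * correct position within the 16-byte-aligned block. */
          offset = src_paddr & 15;
          src_paddr = round_down(src_paddr, 16);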
      Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
      Message-Id: <20201110224205.29444-1-Ashish.Kalra@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Merge tag 'kvm-s390-master-5.10-1' of... · d4d3c84d
      Paolo Bonzini authored
      Merge tag 'kvm-s390-master-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master
      
      KVM: s390: Fixes for 5.10
      
      - do not reset the global diag318 data for per-cpu reset
      - do not mark memory as protected too early
    • KVM: arm64: Correctly align nVHE percpu data · 7bab16a6
      Jamie Iles authored
      The nVHE percpu data is partially linked but the nVHE linker script did
      not align the percpu section.  The PERCPU_INPUT macro would then align
      the data to a page boundary:
      
        #define PERCPU_INPUT(cacheline)					\
        	__per_cpu_start = .;						\
        	*(.data..percpu..first)						\
        	. = ALIGN(PAGE_SIZE);						\
        	*(.data..percpu..page_aligned)					\
        	. = ALIGN(cacheline);						\
        	*(.data..percpu..read_mostly)					\
        	. = ALIGN(cacheline);						\
        	*(.data..percpu)						\
        	*(.data..percpu..shared_aligned)				\
        	PERCPU_DECRYPTED_SECTION					\
        	__per_cpu_end = .;
      
      but then when the final vmlinux linking happens the hypervisor percpu
      data is included after page alignment and so the offsets potentially
      don't match.  On my build I saw that the .hyp.data..percpu section was
      at address 0x20 and then the percpu data would begin at 0x1000 (because
      of the page alignment in PERCPU_INPUT), but when linked into vmlinux,
      everything would be shifted down by 0x20 bytes.
      
      This manifests as one of the CPUs getting lost when running
      kvm-unit-tests or starting any VM, and a subsequent soft lockup on a
      Cortex-A72 device.
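      
      A paraphrased sketch of the fix in the nVHE hyp linker script (the
      HYP_SECTION* macro names and surrounding context are assumed, not
      quoted from the actual patch):
      
          SECTIONS {
          	HYP_SECTION(.text)
          	/*
          	 * Page-align the percpu section here so that its internal
          	 * offsets match what PERCPU_INPUT produces when the data is
          	 * finally linked into vmlinux.
          	 */
          	. = ALIGN(PAGE_SIZE);
          	HYP_SECTION_NAME(.data..percpu) : {
          		PERCPU_INPUT(L1_CACHE_BYTES)
          	}
          }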
      
      Fixes: 30c95391 ("kvm: arm64: Set up hyp percpu data for nVHE")
      Signed-off-by: Jamie Iles <jamie@nuviainc.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Acked-by: David Brazdil <dbrazdil@google.com>
      Cc: David Brazdil <dbrazdil@google.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20201113150406.14314-1-jamie@nuviainc.com
  6. 15 Nov, 2020 1 commit
    • kvm: mmu: fix is_tdp_mmu_check when the TDP MMU is not in use · c887c9b9
      Paolo Bonzini authored
      In some cases where shadow paging is in use, the root page will be
      either mmu->pae_root or vcpu->arch.mmu->lm_root. It then has no
      associated struct kvm_mmu_page, because it is allocated with alloc_page
      instead of kvm_mmu_alloc_page.
      
      Just return false early from is_tdp_mmu_root if the TDP MMU is not in
      use, which covers, in particular, the case where shadow paging is
      enabled.
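      
      A sketch of the resulting early-out (reconstructed from this
      description; the tdp_mmu_enabled field name and the tail of the
      function are assumptions):
      
          static bool is_tdp_mmu_root(struct kvm *kvm, hpa_t hpa)
          {
          	struct kvm_mmu_page *sp;
      
          	/* Shadow-paging roots (pae_root/lm_root) have no struct
          	 * kvm_mmu_page, so bail out before looking one up. */
          	if (!kvm->arch.tdp_mmu_enabled)
          		return false;
      
          	sp = to_shadow_page(hpa);
          	return sp->tdp_mmu_page && sp->root_count;
          }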
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>