1. 24 Jan, 2023 3 commits
    • Merge branch 'kvm-lapic-fix-and-cleanup' into HEAD · f15a87c0
      Paolo Bonzini authored
      The first half or so of the patches fix semi-urgent, real-world-relevant
      APICv and AVIC bugs.
      
      The second half fixes a variety of AVIC and optimized APIC map bugs
      where KVM doesn't play nice with various edge cases that are
      architecturally legal(ish), but are unlikely to occur in most real-world
      scenarios.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Merge branch 'kvm-v6.2-rc4-fixes' into HEAD · dc7c31e9
      Paolo Bonzini authored
      ARM:
      
      * Fix the PMCR_EL0 reset value after the PMU rework
      
      * Correctly handle S2 fault triggered by a S1 page table walk
        by not always classifying it as a write, as this breaks on
        R/O memslots
      
      * Document why we cannot exit with KVM_EXIT_MMIO when taking
        a write fault from a S1 PTW on a R/O memslot
      
      * Put the Apple M2 on the naughty list for not being able to
        correctly implement the vgic SEIS feature, just like the M1
        before it
      
      * Reviewer updates: Alex is stepping down, replaced by Zenghui
      
      x86:
      
      * Fix various rare locking issues in Xen emulation and teach lockdep
        to detect them
      
      * Documentation improvements
      
      * Do not return host topology information from KVM_GET_SUPPORTED_CPUID
    • Merge branch 'kvm-hw-enable-refactor' into HEAD · edd731d7
      Paolo Bonzini authored
      The main theme of this series is to kill off kvm_arch_init(),
      kvm_arch_hardware_(un)setup(), and kvm_arch_check_processor_compat(), which
      all originated in x86 code from way back when, and needlessly complicate
      both common KVM code and architecture code.  E.g. many architectures don't
      mark functions/data as __init/__ro_after_init purely because kvm_init()
      isn't marked __init to support x86's separate vendor modules.
      
      The idea/hope is that with those hooks gone (moved to arch code), it will
      be easier for x86 (and other architectures) to modify their module init
      sequences as needed without having to fight common KVM code.  E.g. I'm
      hoping that ARM can build on this to simplify its hardware enabling logic,
      especially the pKVM side of things.
      
      There are bug fixes throughout this series.  They are more scattered than
      I would usually prefer, but getting the sequencing correct was a gigantic
      pain for many of the x86 fixes due to needing to fix common code in order
      for the x86 fix to have any meaning.  And while the bugs are often fatal,
      they aren't all that interesting for most users as they either require a
      malicious admin or broken hardware, i.e. aren't likely to be encountered
      by the vast majority of KVM users.  So unless someone _really_ wants a
      particular fix isolated for backporting, I'm not planning on shuffling
      patches.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 13 Jan, 2023 33 commits
  3. 11 Jan, 2023 4 commits
    • KVM: x86/xen: Avoid deadlock by adding kvm->arch.xen.xen_lock leaf node lock · 310bc395
      David Woodhouse authored
      In commit 14243b38 ("KVM: x86/xen: Add KVM_IRQ_ROUTING_XEN_EVTCHN
      and event channel delivery") the clever version of me left some helpful
      notes for those who would come after him:
      
             /*
              * For the irqfd workqueue, using the main kvm->lock mutex is
              * fine since this function is invoked from kvm_set_irq() with
              * no other lock held, no srcu. In future if it will be called
              * directly from a vCPU thread (e.g. on hypercall for an IPI)
              * then it may need to switch to using a leaf-node mutex for
              * serializing the shared_info mapping.
              */
             mutex_lock(&kvm->lock);
      
      In commit 2fd6df2f ("KVM: x86/xen: intercept EVTCHNOP_send from guests")
      the other version of me ran straight past that comment without reading it,
      and introduced a potential deadlock by taking vcpu->mutex and kvm->lock
      in the wrong order.
      
      Solve this as originally suggested, by adding a leaf-node lock in the Xen
      state rather than using kvm->lock for it.
      
      Fixes: 2fd6df2f ("KVM: x86/xen: intercept EVTCHNOP_send from guests")
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <20230111180651.14394-4-dwmw2@infradead.org>
      [Rebase, add docs. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
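      [Editor's note: a minimal C sketch of the leaf-node-lock idea described
      above. The structure and function names are hypothetical illustrations,
      not the actual kvm->arch.xen code.]

        #include <linux/mutex.h>

        /* Hypothetical Xen state with its own leaf-node lock. */
        struct demo_xen_state {
                struct mutex xen_lock;   /* leaf: nothing else nests inside it */
                void *shared_info;       /* mapping state the lock protects */
        };

        static void demo_evtchn_deliver(struct demo_xen_state *xen)
        {
                /*
                 * This path used to take kvm->lock, which can deadlock when
                 * the caller already holds vcpu->mutex (the documented order
                 * is kvm->lock outside vcpu->mutex). A dedicated leaf-node
                 * mutex serializes only the shared_info mapping and is safe
                 * to take under vcpu->mutex.
                 */
                mutex_lock(&xen->xen_lock);
                /* ... touch the shared_info mapping ... */
                mutex_unlock(&xen->xen_lock);
        }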
    • KVM: Ensure lockdep knows about kvm->lock vs. vcpu->mutex ordering rule · 42a90008
      David Woodhouse authored
      Documentation/virt/kvm/locking.rst tells us that kvm->lock is taken outside
      vcpu->mutex. But that doesn't actually happen very often; it's only in
      some esoteric cases like migration with AMD SEV. This means that lockdep
      usually doesn't notice, and doesn't do its job of keeping us honest.
      
      Ensure that lockdep *always* knows about the ordering of these two locks,
      by briefly taking vcpu->mutex in kvm_vm_ioctl_create_vcpu() while kvm->lock
      is held.
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <20230111180651.14394-3-dwmw2@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
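      [Editor's note: a hedged sketch of the shape of this change; it mirrors
      what the message describes for kvm_vm_ioctl_create_vcpu() but is not a
      verbatim excerpt.]

        mutex_lock(&kvm->lock);

        /*
         * Briefly take and release vcpu->mutex while kvm->lock is held so
         * lockdep records the documented ordering (kvm->lock taken outside
         * vcpu->mutex) on every vCPU creation, rather than only in rare
         * paths such as SEV intra-host migration.
         */
        mutex_lock(&vcpu->mutex);
        mutex_unlock(&vcpu->mutex);

        /* ... install the new vCPU ... */

        mutex_unlock(&kvm->lock);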
    • KVM: x86/xen: Fix potential deadlock in kvm_xen_update_runstate_guest() · bbe17c62
      David Woodhouse authored
      The kvm_xen_update_runstate_guest() function can be called when the vCPU
      is being scheduled out, from a preempt notifier. It *opportunistically*
      updates the runstate area in the guest memory, if the gfn_to_pfn_cache
      which caches the appropriate address is still valid.
      
      If there is *contention* when it attempts to obtain gpc->lock, then
      locking inside the priority inheritance checks may cause a deadlock.
      Lockdep reports:
      
      [13890.148997] Chain exists of:
                       &gpc->lock --> &p->pi_lock --> &rq->__lock
      
      [13890.149002]  Possible unsafe locking scenario:
      
      [13890.149003]        CPU0                    CPU1
      [13890.149004]        ----                    ----
      [13890.149005]   lock(&rq->__lock);
      [13890.149007]                                lock(&p->pi_lock);
      [13890.149009]                                lock(&rq->__lock);
      [13890.149011]   lock(&gpc->lock);
      [13890.149013]
                      *** DEADLOCK ***
      
      In the general case, if there's contention for a read lock on gpc->lock,
      that's going to be because something else is either invalidating or
      revalidating the cache. Either way, we've raced with seeing it in an
      invalid state, in which case we would have aborted the opportunistic
      update anyway.
      
      So in the 'atomic' case when called from the preempt notifier, just
      switch to using read_trylock() and avoid the PI handling altogether.
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <20230111180651.14394-2-dwmw2@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
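      [Editor's note: a hedged sketch of the trylock pattern described above;
      the names are illustrative, not the actual gfn_to_pfn_cache code.]

        #include <linux/spinlock.h>

        struct demo_cache {
                rwlock_t lock;
                /* ... cached guest mapping ... */
        };

        static void demo_update_runstate(struct demo_cache *dc, bool atomic)
        {
                if (atomic) {
                        /*
                         * Preempt-notifier path: if the read lock is contended,
                         * the cache is being invalidated or revalidated anyway,
                         * so skip the opportunistic update instead of blocking
                         * through the priority-inheritance paths mentioned
                         * above.
                         */
                        if (!read_trylock(&dc->lock))
                                return;
                } else {
                        read_lock(&dc->lock);
                }

                /* ... write the runstate area through the cached mapping ... */

                read_unlock(&dc->lock);
        }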
    • KVM: x86/xen: Fix lockdep warning on "recursive" gpc locking · 23e60258
      David Woodhouse authored
      In commit 5ec3289b ("KVM: x86/xen: Compatibility fixes for shared runstate
      area") we declared it safe to obtain two gfn_to_pfn_cache locks at the same
      time:
      	/*
      	 * The guest's runstate_info is split across two pages and we
      	 * need to hold and validate both GPCs simultaneously. We can
      	 * declare a lock ordering GPC1 > GPC2 because nothing else
      	 * takes them more than one at a time.
      	 */
      
      However, we forgot to tell lockdep. Do so, by setting a subclass on the
      first lock before taking the second.
      
      Fixes: 5ec3289b ("KVM: x86/xen: Compatibility fixes for shared runstate area")
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <20230111180651.14394-1-dwmw2@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
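      [Editor's note: a minimal sketch of the subclass annotation described
      above, with hypothetical lock names; it assumes lockdep is enabled, and
      only the lockdep bookkeeping changes, not the GPC1 > GPC2 ordering.]

        #include <linux/kernel.h>
        #include <linux/lockdep.h>
        #include <linux/spinlock.h>

        struct demo_gpc {
                rwlock_t lock;
                /* ... cached guest mapping ... */
        };

        static void demo_update_split_area(struct demo_gpc *gpc_a,
                                           struct demo_gpc *gpc_b)
        {
                unsigned long flags;

                read_lock_irqsave(&gpc_a->lock, flags);

                /*
                 * Both locks share a lock class, so move the already-held lock
                 * into subclass 1 before taking the second one; otherwise
                 * lockdep reports the second acquisition as recursive locking.
                 * Pure lockdep bookkeeping, a no-op without lockdep.
                 */
                lock_set_subclass(&gpc_a->lock.dep_map, 1, _THIS_IP_);
                read_lock(&gpc_b->lock);

                /* ... validate and update both halves of the runstate area ... */

                read_unlock(&gpc_b->lock);
                read_unlock_irqrestore(&gpc_a->lock, flags);
        }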