    KVM: x86/mmu: Stop zapping invalidated TDP MMU roots asynchronously · 0df9dab8
    Sean Christopherson authored
    Stop zapping invalidated TDP MMU roots via work queue now that KVM
    preserves TDP MMU roots until they are explicitly invalidated.  Zapping
    roots asynchronously was effectively a workaround to avoid stalling a vCPU
    for an extended duration if a vCPU unloaded a root, which at the time
    happened whenever the guest toggled CR0.WP (a frequent operation for some
    guest kernels).
    
    While a clever hack, zapping roots via an unbound worker had subtle,
    unintended consequences on host scheduling, especially when zapping
    multiple roots, e.g. as part of a memslot deletion.  Because the work of zapping a
    root is no longer bound to the task that initiated the zap, things like
    the CPU affinity and priority of the original task get lost.  Losing the
    affinity and priority can be especially problematic if unbound workqueues
    aren't affined to a small number of CPUs, as zapping multiple roots can
    cause KVM to heavily utilize the majority of CPUs in the system, *beyond*
    the CPUs KVM is already using to run vCPUs.
    
    When deleting a memslot via KVM_SET_USER_MEMORY_REGION, the async root
    zap can result in KVM occupying all logical CPUs for ~8ms, and result in
    high priority tasks not getting scheduled in promptly.  In v5.15,
    which doesn't preserve unloaded roots, the issues were even more noticeable
    as KVM would zap roots more frequently and could occupy all CPUs for 50ms+.
    
    Consuming all CPUs for an extended duration can lead to significant jitter
    throughout the system, e.g. on ChromeOS with virtio-gpu, deleting memslots
    is a semi-frequent operation as memslots are deleted and recreated with
    different host virtual addresses to react to host GPU drivers allocating
    and freeing GPU blobs.  On ChromeOS, the jitter manifests as audio blips
    during games due to the audio server's tasks not getting scheduled in
    promptly, despite the tasks having a high realtime priority.
    
    Deleting memslots isn't exactly a fast path and should be avoided when
    possible, and ChromeOS is working towards utilizing MAP_FIXED to avoid the
    memslot shenanigans, but KVM is squarely in the wrong.  Not to mention
    that removing the async zapping eliminates a non-trivial amount of
    complexity.
    
    Note, one of the subtle behaviors hidden behind the async zapping is that
    KVM would zap invalidated roots only once (ignoring partial zaps from
    things like mmu_notifier events).  Preserve this behavior by adding a flag
    to identify roots that are scheduled to be zapped versus roots that have
    already been zapped but not yet freed.
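    
    As a rough illustration of the "zap only once" bookkeeping described
    above, here is a minimal userspace sketch.  It is not the kernel code:
    struct root, zap_scheduled, zap_count and both helper names are
    hypothetical stand-ins, and the actual zap is reduced to a counter.  It
    assumes invalidation marks a root and schedules it in one step, and
    that a later synchronous pass zaps only roots still marked as scheduled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a TDP MMU root (not the kernel struct). */
struct root {
	bool invalid;       /* root has been invalidated */
	bool zap_scheduled; /* invalidated and still waiting for its one full zap */
	int zap_count;      /* number of full zaps performed, for illustration only */
};

/*
 * Invalidate every still-valid root and schedule it to be zapped.  Roots
 * that are already invalid are skipped: they were scheduled or zapped by
 * an earlier invalidation and must not be scheduled a second time.
 */
static void invalidate_all_roots(struct root *roots, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (roots[i].invalid)
			continue;
		roots[i].invalid = true;
		roots[i].zap_scheduled = true;
	}
}

/*
 * Synchronously zap only the roots that are scheduled, in the context of
 * the caller.  Clearing the flag separates "zapped but not yet freed"
 * roots from "scheduled to be zapped" roots.  Returns the number zapped.
 */
static size_t zap_invalidated_roots(struct root *roots, size_t n)
{
	size_t zapped = 0;

	for (size_t i = 0; i < n; i++) {
		if (!roots[i].zap_scheduled)
			continue;
		assert(roots[i].invalid);
		roots[i].zap_scheduled = false;
		roots[i].zap_count++; /* stand-in for the real zap */
		zapped++;
	}
	return zapped;
}
```

    In this sketch, a second invalidate-and-zap cycle touches only roots
    that became invalid since the previous cycle; roots that were already
    zapped but are still allocated are left alone.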
    
    Add a comment calling out why kvm_tdp_mmu_invalidate_all_roots() can
    encounter invalid roots, as it's not at all obvious why zapping
    invalidated roots shouldn't simply zap all invalid roots.
    
    Reported-by: Pattara Teerapong <pteerapong@google.com>
    Cc: David Stevens <stevensd@google.com>
    Cc: Yiwei Zhang <zzyiwei@google.com>
    Cc: Paul Hsia <paulhsia@google.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20230916003916.2545000-4-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>