    KVM: x86/mmu: Allow yielding when zapping GFNs for defunct TDP MMU root · 8351779c
    Paolo Bonzini authored
    Allow yielding when zapping SPTEs after the last reference to a valid
    root is put.  Because KVM must drop all SPTEs in response to relevant
    mmu_notifier events, mark defunct roots invalid and reset their refcount
    prior to zapping the root.  Keeping the refcount elevated while the zap
    is in-progress ensures the root is reachable via mmu_notifier until the
    zap completes and the last reference to the invalid, defunct root is put.
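    
    The put/zap ordering described above can be sketched as a simplified,
    single-threaded userspace model.  Everything here is hypothetical
    (struct toy_root, toy_put_root(), toy_zap_root(), the batch parameter);
    it is not the kernel code, which relies on refcount_t, RCU and mmu_lock.
    The sketch only shows why the refcount is reset to 1 before the zap: the
    root stays alive, and therefore findable, while the zap yields.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a TDP MMU root; not struct kvm_mmu_page. */
struct toy_root {
	int  refcount;  /* models the root's reference count */
	bool invalid;   /* set once the root is defunct */
	bool listed;    /* still reachable by walkers, e.g. mmu_notifier */
	long sptes;     /* SPTEs left to zap */
	long yields;    /* how often the zap gave up the CPU */
};

/* Zap in batches; each gap between batches models a yield point. */
static void toy_zap_root(struct toy_root *root, long batch)
{
	assert(batch > 0);

	while (root->sptes > 0) {
		long n = root->sptes < batch ? root->sptes : batch;

		root->sptes -= n;
		if (root->sptes > 0)
			root->yields++;  /* real code may reschedule here */
	}
}

static void toy_put_root(struct toy_root *root, long batch)
{
	if (--root->refcount > 0)
		return;

	if (!root->invalid) {
		/*
		 * Last reference to a valid root: mark it defunct and
		 * hold a reference across the (yielding) zap.
		 */
		root->invalid = true;
		root->refcount = 1;
		toy_zap_root(root, batch);

		/* Give the reference back; someone else may hold one now. */
		if (--root->refcount > 0)
			return;
	}

	/* Final put of an invalid root: unlink it (and free it). */
	root->listed = false;
}
```

    With refcount = 1, sptes = 10 and batch = 4, toy_put_root() marks the
    root invalid, zaps in three batches (two yield points), and unlinks the
    root only after the zap has finished.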
    
    Allowing kvm_tdp_mmu_put_root() to yield fixes soft lockup issues if the
    root being put has a massive paging structure, e.g. zapping a root
    that is backed entirely by 4KiB pages for a guest with 32TiB of memory can
    take hundreds of seconds to complete.
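    
    For a sense of scale, the arithmetic behind that example can be sketched
    as below.  The helpers and constants are illustrative, not kernel code.
    The sketch assumes x86-64 page tables with 512 entries per table, 4-level
    paging, and a guest size of 2^45 bytes (32 TiB) mapped entirely with
    4 KiB pages.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants: 4 KiB pages, 512 (2^9) entries per page table. */
#define TOY_PAGE_SHIFT     12
#define TOY_PT_LEVEL_BITS  9

/* Number of 4 KiB leaf SPTEs needed to map mem_bytes of guest memory. */
static uint64_t toy_leaf_sptes(uint64_t mem_bytes)
{
	return mem_bytes >> TOY_PAGE_SHIFT;
}

/*
 * Number of page-table pages at a given level, rounded up.
 * Level 1 tables hold the leaf SPTEs; level 4 is the root.
 */
static uint64_t toy_tables_at_level(uint64_t mem_bytes, int level)
{
	assert(level >= 1 && level <= 4);

	/* Bytes of guest memory covered by one table at this level. */
	unsigned int span_shift = TOY_PAGE_SHIFT + TOY_PT_LEVEL_BITS * level;
	uint64_t span = 1ULL << span_shift;

	return (mem_bytes + span - 1) >> span_shift;
}
```

    For 2^45 bytes this gives 2^33 (8,589,934,592) leaf SPTEs spread over
    2^24 (16,777,216) level-1 tables, 32,768 level-2 tables, 64 level-3
    tables and a single root, all of which have to be visited by one zap.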
    
      watchdog: BUG: soft lockup - CPU#49 stuck for 485s! [max_guest_memor:52368]
      RIP: 0010:kvm_set_pfn_dirty+0x30/0x50 [kvm]
       __handle_changed_spte+0x1b2/0x2f0 [kvm]
       handle_removed_tdp_mmu_page+0x1a7/0x2b8 [kvm]
       __handle_changed_spte+0x1f4/0x2f0 [kvm]
       handle_removed_tdp_mmu_page+0x1a7/0x2b8 [kvm]
       __handle_changed_spte+0x1f4/0x2f0 [kvm]
       tdp_mmu_zap_root+0x307/0x4d0 [kvm]
       kvm_tdp_mmu_put_root+0x7c/0xc0 [kvm]
       kvm_mmu_free_roots+0x22d/0x350 [kvm]
       kvm_mmu_reset_context+0x20/0x60 [kvm]
       kvm_arch_vcpu_ioctl_set_sregs+0x5a/0xc0 [kvm]
       kvm_vcpu_ioctl+0x5bd/0x710 [kvm]
       __se_sys_ioctl+0x77/0xc0
       __x64_sys_ioctl+0x1d/0x20
       do_syscall_64+0x44/0xa0
       entry_SYSCALL_64_after_hwframe+0x44/0xae
    
    KVM currently doesn't put a root from a non-preemptible context, so other
    than the mmu_notifier wrinkle, yielding when putting a root is safe.
    
    Yield-unfriendly iteration uses for_each_tdp_mmu_root(), which doesn't
    take a reference to each root (it requires mmu_lock be held for the
    entire duration of the walk).
    
    tdp_mmu_next_root() is used only by the yield-friendly iterator.
    
    tdp_mmu_zap_root_work() is explicitly yield-friendly.
    
    kvm_mmu_free_roots() => mmu_free_root_page() is a much bigger fan-out,
    but is still yield-friendly in all call sites, as all callers can be
    traced back to some combination of vcpu_run(), kvm_destroy_vm(), and/or
    kvm_create_vm().
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20220226001546.360188-21-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>