    KVM: x86/mmu: Do not create SPTEs for GFNs that exceed host.MAXPHYADDR · 86931ff7
    Sean Christopherson authored
    Disallow memslots and MMIO SPTEs whose gpa range would exceed the host's
    MAXPHYADDR, i.e. don't create SPTEs for gfns that exceed host.MAXPHYADDR.
    The TDP MMU bounds its zapping based on host.MAXPHYADDR, and so if the
    guest, possibly with help from userspace, manages to coerce KVM into
    creating a SPTE for an "impossible" gfn, KVM will leak the associated
    shadow pages (page tables):
    
      WARNING: CPU: 10 PID: 1122 at arch/x86/kvm/mmu/tdp_mmu.c:57
                                    kvm_mmu_uninit_tdp_mmu+0x4b/0x60 [kvm]
      Modules linked in: kvm_intel kvm irqbypass
      CPU: 10 PID: 1122 Comm: set_memory_regi Tainted: G        W         5.18.0-rc1+ #293
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
      RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x4b/0x60 [kvm]
      Call Trace:
       <TASK>
       kvm_arch_destroy_vm+0x130/0x1b0 [kvm]
       kvm_destroy_vm+0x162/0x2d0 [kvm]
       kvm_vm_release+0x1d/0x30 [kvm]
       __fput+0x82/0x240
       task_work_run+0x5b/0x90
       exit_to_user_mode_prepare+0xd2/0xe0
       syscall_exit_to_user_mode+0x1d/0x40
       entry_SYSCALL_64_after_hwframe+0x44/0xae
       </TASK>
    
    On bare metal, encountering an impossible gpa in the page fault path is
    well and truly impossible, barring CPU bugs, as the CPU will signal #PF
    during the gva=>gpa translation (or a similar failure when stuffing a
    physical address into e.g. the VMCS/VMCB).  But if KVM is running as a VM
    itself, the MAXPHYADDR enumerated to KVM may not be the actual MAXPHYADDR
    of the underlying hardware, in which case the hardware will not fault on
    the illegal-from-KVM's-perspective gpa.
    
    Alternatively, KVM could continue allowing the dodgy behavior and simply
    zap the max possible range.  But, for hosts with MAXPHYADDR < 52, that's
    a (minor) waste of cycles, and more importantly, KVM can't reasonably
    support impossible memslots when running on bare metal (or with an
    accurate MAXPHYADDR as a VM).  Note, limiting the overhead by checking if
    KVM is running as a guest is not a safe option as the host isn't required
    to announce itself to the guest in any way, e.g. doesn't need to set the
    HYPERVISOR CPUID bit.
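
To make the point about the HYPERVISOR CPUID bit concrete, here is a minimal sketch of the rejected heuristic. It assumes the usual x86 convention that CPUID.01H:ECX bit 31 is the "hypervisor present" bit. The helper names are hypothetical and are not part of KVM.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.01H:ECX bit 31 is conventionally the "hypervisor present" bit. */
#define SKETCH_CPUID_1_ECX_HYPERVISOR (1u << 31)

/* Hypothetical helper: true if the enumerated ECX value has the bit set. */
static inline bool sketch_hypervisor_bit_set(uint32_t cpuid_1_ecx)
{
	return (cpuid_1_ecx & SKETCH_CPUID_1_ECX_HYPERVISOR) != 0;
}

/*
 * Hypothetical version of the rejected idea: zap the maximum possible
 * range (52 bits) only when CPUID says we are virtualized, otherwise
 * trust the enumerated MAXPHYADDR.
 */
static inline unsigned int sketch_zap_bits(uint32_t cpuid_1_ecx,
					   unsigned int enumerated_maxphyaddr)
{
	return sketch_hypervisor_bit_set(cpuid_1_ecx) ? 52 : enumerated_maxphyaddr;
}
```

A host that leaves the bit clear makes sketch_zap_bits() return the enumerated value even though the real hardware MAXPHYADDR may be larger, which is exactly the case the heuristic was meant to catch.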
    
    A second alternative to disallowing the memslot behavior would be to
    disallow creating a VM with guest.MAXPHYADDR > host.MAXPHYADDR.  That
    restriction is undesirable as there are legitimate use cases for doing
    so, e.g. using the highest host.MAXPHYADDR out of a pool of heterogeneous
    systems so that VMs can be migrated between hosts with different
    MAXPHYADDRs without running afoul of the allow_smaller_maxphyaddr mess.
    
    Note that any guest.MAXPHYADDR is valid with shadow paging, and it is
    even useful in order to test KVM with MAXPHYADDR=52 (i.e. without
    any reserved physical address bits).
    
    The now-shared kvm_mmu_max_gfn() helper returns an inclusive value
    rather than an exclusive one.  The memslot and TDP MMU code want an
    exclusive bound, but the name implies the returned value is inclusive,
    and the MMIO path needs an inclusive check.
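
The inclusive/exclusive distinction can be illustrated with a small userspace sketch. It is not the kernel's code: the sketch_* names, the 4 KiB page shift, and passing the address width as a parameter are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Assumption: 4 KiB base pages, as on x86. */
#define SKETCH_PAGE_SHIFT 12

/* Highest legal gfn (inclusive) for a given physical address width. */
static inline gfn_t sketch_max_gfn(unsigned int max_gpa_bits)
{
	assert(max_gpa_bits > SKETCH_PAGE_SHIFT && max_gpa_bits <= 52);
	return (1ULL << (max_gpa_bits - SKETCH_PAGE_SHIFT)) - 1;
}

/* MMIO-style check: a single gfn is compared against the inclusive max. */
static inline bool sketch_mmio_gfn_ok(gfn_t gfn, unsigned int max_gpa_bits)
{
	return gfn <= sketch_max_gfn(max_gpa_bits);
}

/*
 * Memslot-style check: the slot covers [base_gfn, base_gfn + npages), so
 * its exclusive end is compared against max_gfn + 1.
 */
static inline bool sketch_memslot_fits(gfn_t base_gfn, uint64_t npages,
				       unsigned int max_gpa_bits)
{
	gfn_t end = base_gfn + npages;

	if (end < base_gfn)	/* reject wraparound */
		return false;
	return end <= sketch_max_gfn(max_gpa_bits) + 1;
}
```

With a 46-bit address width, the last legal gfn is (1ULL << 34) - 1. A slot that ends exactly at gfn 1ULL << 34 (exclusive) is accepted, and one that extends a single page further is rejected.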
    
    Fixes: faaf05b0 ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU")
    Fixes: 524a1e4e ("KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEs")
    Cc: stable@vger.kernel.org
    Cc: Maxim Levitsky <mlevitsk@redhat.com>
    Cc: Ben Gardon <bgardon@google.com>
    Cc: David Matlack <dmatlack@google.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20220428233416.2446833-1-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
spte.h 15.5 KB