1. 15 Jul, 2021 13 commits
    • Vitaly Kuznetsov's avatar
      KVM: nSVM: Fix L1 state corruption upon return from SMM · 37be407b
      Vitaly Kuznetsov authored
      VMCB split commit 4995a368 ("KVM: SVM: Use a separate vmcb for the
      nested L2 guest") broke return from SMM when we entered there from guest
      (L2) mode. Gen2 WS2016/Hyper-V is known to do this on boot. The problem
      manifests itself like this:
      
        kvm_exit:             reason EXIT_RSM rip 0x7ffbb280 info 0 0
        kvm_emulate_insn:     0:7ffbb280: 0f aa
        kvm_smm_transition:   vcpu 0: leaving SMM, smbase 0x7ffb3000
        kvm_nested_vmrun:     rip: 0x000000007ffbb280 vmcb: 0x0000000008224000
          nrip: 0xffffffffffbbe119 int_ctl: 0x01020000 event_inj: 0x00000000
          npt: on
        kvm_nested_intercepts: cr_read: 0000 cr_write: 0010 excp: 40060002
          intercepts: fd44bfeb 0000217f 00000000
        kvm_entry:            vcpu 0, rip 0xffffffffffbbe119
        kvm_exit:             reason EXIT_NPF rip 0xffffffffffbbe119 info
          200000006 1ab000
        kvm_nested_vmexit:    vcpu 0 reason npf rip 0xffffffffffbbe119 info1
          0x0000000200000006 info2 0x00000000001ab000 intr_info 0x00000000
          error_code 0x00000000
        kvm_page_fault:       address 1ab000 error_code 6
        kvm_nested_vmexit_inject: reason EXIT_NPF info1 200000006 info2 1ab000
          int_info 0 int_info_err 0
        kvm_entry:            vcpu 0, rip 0x7ffbb280
        kvm_exit:             reason EXIT_EXCP_GP rip 0x7ffbb280 info 0 0
        kvm_emulate_insn:     0:7ffbb280: 0f aa
        kvm_inj_exception:    #GP (0x0)
      
      Note: return to L2 succeeded but upon first exit to L1 its RIP points to
      'RSM' instruction but we're not in SMM.
      
      The problem appears to be that VMCB01 gets irreversibly destroyed during
      SMM execution. Previously, we used to have 'hsave' VMCB where regular
      (pre-SMM) L1's state was saved upon nested_svm_vmexit() but now we just
      switch to VMCB01 from VMCB02.
      
      Pre-split (working) flow looked like:
      - SMM is triggered during L2's execution
      - L2's state is pushed to SMRAM
      - nested_svm_vmexit() restores L1's state from 'hsave'
      - SMM -> RSM
      - enter_svm_guest_mode() switches to L2 but keeps 'hsave' intact so we have
        pre-SMM (and pre L2 VMRUN) L1's state there
      - L2's state is restored from SMRAM
      - upon first exit L1's state is restored from L1.
      
      This was always broken with regards to svm_get_nested_state()/
      svm_set_nested_state(): 'hsave' was never a part of what's being
      save and restored so migration happening during SMM triggered from L2 would
      never restore L1's state correctly.
      
      Post-split flow (broken) looks like:
      - SMM is triggered during L2's execution
      - L2's state is pushed to SMRAM
      - nested_svm_vmexit() switches to VMCB01 from VMCB02
      - SMM -> RSM
      - enter_svm_guest_mode() switches from VMCB01 to VMCB02 but pre-SMM VMCB01
        is already lost.
      - L2's state is restored from SMRAM
      - upon first exit L1's state is restored from VMCB01 but it is corrupted
       (reflects the state during 'RSM' execution).
      
      VMX doesn't have this problem because unlike VMCB, VMCS keeps both guest
      and host state so when we switch back to VMCS02 L1's state is intact there.
      
      To resolve the issue we need to save L1's state somewhere. We could've
      created a third VMCB for SMM but that would require us to modify saved
      state format. L1's architectural HSAVE area (pointed by MSR_VM_HSAVE_PA)
      seems appropriate: L0 is free to save any (or none) of L1's state there.
      Currently, KVM does 'none'.
      
      Note, for nested state migration to succeed, both source and destination
      hypervisors must have the fix. We, however, don't need to create a new
      flag indicating the fact that HSAVE area is now populated as migration
      during SMM triggered from L2 was always broken.
      
      Fixes: 4995a368 ("KVM: SVM: Use a separate vmcb for the nested L2 guest")
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      37be407b
    • Vitaly Kuznetsov's avatar
      KVM: nSVM: Introduce svm_copy_vmrun_state() · 0a758290
      Vitaly Kuznetsov authored
      Separate the code setting non-VMLOAD-VMSAVE state from
      svm_set_nested_state() into its own function. This is going to be
      re-used from svm_enter_smm()/svm_leave_smm().
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210628104425.391276-4-vkuznets@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      0a758290
    • Vitaly Kuznetsov's avatar
      KVM: nSVM: Check that VM_HSAVE_PA MSR was set before VMRUN · fb79f566
      Vitaly Kuznetsov authored
      APM states that "The address written to the VM_HSAVE_PA MSR, which holds
      the address of the page used to save the host state on a VMRUN, must point
      to a hypervisor-owned page. If this check fails, the WRMSR will fail with
      a #GP(0) exception. Note that a value of 0 is not considered valid for the
      VM_HSAVE_PA MSR and a VMRUN that is attempted while the HSAVE_PA is 0 will
      fail with a #GP(0) exception."
      
      svm_set_msr() already checks that the supplied address is valid, so only
      check for '0' is missing. Add it to nested_svm_vmrun().
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210628104425.391276-3-vkuznets@redhat.com>
      Reviewed-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      fb79f566
    • Vitaly Kuznetsov's avatar
      KVM: nSVM: Check the value written to MSR_VM_HSAVE_PA · fce7e152
      Vitaly Kuznetsov authored
      APM states that #GP is raised upon write to MSR_VM_HSAVE_PA when
      the supplied address is not page-aligned or is outside of "maximum
      supported physical address for this implementation".
      page_address_valid() check seems suitable. Also, forcefully page-align
      the address when it's written from VMM.
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210628104425.391276-2-vkuznets@redhat.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      [Add comment about behavior for host-provided values. - Paolo]
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      fce7e152
    • Sean Christopherson's avatar
      KVM: SVM: Fix sev_pin_memory() error checks in SEV migration utilities · c7a1b2b6
      Sean Christopherson authored
      Use IS_ERR() instead of checking for a NULL pointer when querying for
      sev_pin_memory() failures.  sev_pin_memory() always returns an error code
      cast to a pointer, or a valid pointer; it never returns NULL.
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Cc: Steve Rutherford <srutherford@google.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Ashish Kalra <ashish.kalra@amd.com>
      Fixes: d3d1af85 ("KVM: SVM: Add KVM_SEND_UPDATE_DATA command")
      Fixes: 15fb7de1 ("KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command")
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20210506175826.2166383-3-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      c7a1b2b6
    • Sean Christopherson's avatar
      KVM: SVM: Return -EFAULT if copy_to_user() for SEV mig packet header fails · b4a69392
      Sean Christopherson authored
      Return -EFAULT if copy_to_user() fails; if accessing user memory faults,
      copy_to_user() returns the number of bytes remaining, not an error code.
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Cc: Steve Rutherford <srutherford@google.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Ashish Kalra <ashish.kalra@amd.com>
      Fixes: d3d1af85 ("KVM: SVM: Add KVM_SEND_UPDATE_DATA command")
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20210506175826.2166383-2-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      b4a69392
    • Maxim Levitsky's avatar
      KVM: SVM: add module param to control the #SMI interception · 4b639a9f
      Maxim Levitsky authored
      In theory there are no side effects of not intercepting #SMI,
      because then #SMI becomes transparent to the OS and the KVM.
      
      Plus an observation on recent Zen2 CPUs reveals that these
      CPUs ignore #SMI interception and never deliver #SMI VMexits.
      
      This is also useful to test nested KVM to see that L1
      handles #SMIs correctly in case when L1 doesn't intercept #SMI.
      
      Finally the default remains the same, the SMI are intercepted
      by default thus this patch doesn't have any effect unless
      non default module param value is used.
      Signed-off-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210707125100.677203-4-mlevitsk@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      4b639a9f
    • Maxim Levitsky's avatar
      KVM: SVM: remove INIT intercept handler · 896707c2
      Maxim Levitsky authored
      Kernel never sends real INIT even to CPUs, other than on boot.
      
      Thus INIT interception is an error which should be caught
      by a check for an unknown VMexit reason.
      
      On top of that, the current INIT VM exit handler skips
      the current instruction which is wrong.
      That was added in commit 5ff3a351 ("KVM: x86: Move trivial
      instruction-based exit handlers to common code").
      
      Fixes: 5ff3a351 ("KVM: x86: Move trivial instruction-based exit handlers to common code")
      Signed-off-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210707125100.677203-3-mlevitsk@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      896707c2
    • Maxim Levitsky's avatar
      KVM: SVM: #SMI interception must not skip the instruction · 991afbbe
      Maxim Levitsky authored
      Commit 5ff3a351 ("KVM: x86: Move trivial instruction-based
      exit handlers to common code"), unfortunately made a mistake of
      treating nop_on_interception and nop_interception in the same way.
      
      Former does truly nothing while the latter skips the instruction.
      
      SMI VM exit handler should do nothing.
      (SMI itself is handled by the host when we do STGI)
      
      Fixes: 5ff3a351 ("KVM: x86: Move trivial instruction-based exit handlers to common code")
      Signed-off-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210707125100.677203-2-mlevitsk@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      991afbbe
    • Yu Zhang's avatar
      KVM: VMX: Remove vmx_msr_index from vmx.h · c0e1303e
      Yu Zhang authored
      vmx_msr_index was used to record the list of MSRs which can be lazily
      restored when kvm returns to userspace. It is now reimplemented as
      kvm_uret_msrs_list, a common x86 list which is only used inside x86.c.
      So just remove the obsolete declaration in vmx.h.
      Signed-off-by: default avatarYu Zhang <yu.c.zhang@linux.intel.com>
      Message-Id: <20210707235702.31595-1-yu.c.zhang@linux.intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      c0e1303e
    • Lai Jiangshan's avatar
      KVM: X86: Disable hardware breakpoints unconditionally before kvm_x86->run() · f85d4016
      Lai Jiangshan authored
      When the host is using debug registers but the guest is not using them
      nor is the guest in guest-debug state, the kvm code does not reset
      the host debug registers before kvm_x86->run().  Rather, it relies on
      the hardware vmentry instruction to automatically reset the dr7 registers
      which ensures that the host breakpoints do not affect the guest.
      
      This however violates the non-instrumentable nature around VM entry
      and exit; for example, when a host breakpoint is set on vcpu->arch.cr2,
      
      Another issue is consistency.  When the guest debug registers are active,
      the host breakpoints are reset before kvm_x86->run(). But when the
      guest debug registers are inactive, the host breakpoints are delayed to
      be disabled.  The host tracing tools may see different results depending
      on what the guest is doing.
      
      To fix the problems, we clear %db7 unconditionally before kvm_x86->run()
      if the host has set any breakpoints, no matter if the guest is using
      them or not.
      Signed-off-by: default avatarLai Jiangshan <laijs@linux.alibaba.com>
      Message-Id: <20210628172632.81029-1-jiangshanlai@gmail.com>
      Cc: stable@vger.kernel.org
      [Only clear %db7 instead of reloading all debug registers. - Paolo]
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      f85d4016
    • Ricardo Koller's avatar
      KVM: selftests: Address extra memslot parameters in vm_vaddr_alloc · 6f2f86ec
      Ricardo Koller authored
      Commit a75a895e ("KVM: selftests: Unconditionally use memslot 0 for
      vaddr allocations") removed the memslot parameters from vm_vaddr_alloc.
      It addressed all callers except one under lib/aarch64/, due to a race
      with commit e3db7579 ("KVM: selftests: Add exception handling
      support for aarch64")
      
      Fix the vm_vaddr_alloc call in lib/aarch64/processor.c.
      Reported-by: default avatarZenghui Yu <yuzenghui@huawei.com>
      Signed-off-by: default avatarRicardo Koller <ricarkol@google.com>
      Message-Id: <20210702201042.4036162-1-ricarkol@google.com>
      Reviewed-by: default avatarEric Auger <eric.auger@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      6f2f86ec
    • Pavel Skripkin's avatar
      kvm: debugfs: fix memory leak in kvm_create_vm_debugfs · 004d62eb
      Pavel Skripkin authored
      In commit bc9e9e67 ("KVM: debugfs: Reuse binary stats descriptors")
      loop for filling debugfs_stat_data was copy-pasted 2 times, but
      in the second loop pointers are saved over pointers allocated
      in the first loop.  All this causes is a memory leak, fix it.
      
      Fixes: bc9e9e67 ("KVM: debugfs: Reuse binary stats descriptors")
      Signed-off-by: default avatarPavel Skripkin <paskripkin@gmail.com>
      Reviewed-by: default avatarJing Zhang <jingzhangos@google.com>
      Message-Id: <20210701195500.27097-1-paskripkin@gmail.com>
      Reviewed-by: default avatarJing Zhang <jingzhangos@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      004d62eb
  2. 14 Jul, 2021 9 commits
    • Like Xu's avatar
      KVM: x86/pmu: Clear anythread deprecated bit when 0xa leaf is unsupported on the SVM · 7234c362
      Like Xu authored
      The AMD platform does not support the functions Ah CPUID leaf. The returned
      results for this entry should all remain zero just like the native does:
      
      AMD host:
         0x0000000a 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
      (uncanny) AMD guest:
         0x0000000a 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00008000
      
      Fixes: cadbaa03 ("perf/x86/intel: Make anythread filter support conditional")
      Signed-off-by: default avatarLike Xu <likexu@tencent.com>
      Message-Id: <20210628074354.33848-1-likexu@tencent.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      7234c362
    • Kefeng Wang's avatar
      KVM: mmio: Fix use-after-free Read in kvm_vm_ioctl_unregister_coalesced_mmio · 23fa2e46
      Kefeng Wang authored
      BUG: KASAN: use-after-free in kvm_vm_ioctl_unregister_coalesced_mmio+0x7c/0x1ec arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:183
      Read of size 8 at addr ffff0000c03a2500 by task syz-executor083/4269
      
      CPU: 5 PID: 4269 Comm: syz-executor083 Not tainted 5.10.0 #7
      Hardware name: linux,dummy-virt (DT)
      Call trace:
       dump_backtrace+0x0/0x2d0 arch/arm64/kernel/stacktrace.c:132
       show_stack+0x28/0x34 arch/arm64/kernel/stacktrace.c:196
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x110/0x164 lib/dump_stack.c:118
       print_address_description+0x78/0x5c8 mm/kasan/report.c:385
       __kasan_report mm/kasan/report.c:545 [inline]
       kasan_report+0x148/0x1e4 mm/kasan/report.c:562
       check_memory_region_inline mm/kasan/generic.c:183 [inline]
       __asan_load8+0xb4/0xbc mm/kasan/generic.c:252
       kvm_vm_ioctl_unregister_coalesced_mmio+0x7c/0x1ec arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:183
       kvm_vm_ioctl+0xe30/0x14c4 arch/arm64/kvm/../../../virt/kvm/kvm_main.c:3755
       vfs_ioctl fs/ioctl.c:48 [inline]
       __do_sys_ioctl fs/ioctl.c:753 [inline]
       __se_sys_ioctl fs/ioctl.c:739 [inline]
       __arm64_sys_ioctl+0xf88/0x131c fs/ioctl.c:739
       __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
       invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
       el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
       do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:220
       el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367
       el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383
       el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670
      
      Allocated by task 4269:
       stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121
       kasan_save_stack mm/kasan/common.c:48 [inline]
       kasan_set_track mm/kasan/common.c:56 [inline]
       __kasan_kmalloc+0xdc/0x120 mm/kasan/common.c:461
       kasan_kmalloc+0xc/0x14 mm/kasan/common.c:475
       kmem_cache_alloc_trace include/linux/slab.h:450 [inline]
       kmalloc include/linux/slab.h:552 [inline]
       kzalloc include/linux/slab.h:664 [inline]
       kvm_vm_ioctl_register_coalesced_mmio+0x78/0x1cc arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:146
       kvm_vm_ioctl+0x7e8/0x14c4 arch/arm64/kvm/../../../virt/kvm/kvm_main.c:3746
       vfs_ioctl fs/ioctl.c:48 [inline]
       __do_sys_ioctl fs/ioctl.c:753 [inline]
       __se_sys_ioctl fs/ioctl.c:739 [inline]
       __arm64_sys_ioctl+0xf88/0x131c fs/ioctl.c:739
       __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
       invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
       el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
       do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:220
       el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367
       el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383
       el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670
      
      Freed by task 4269:
       stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121
       kasan_save_stack mm/kasan/common.c:48 [inline]
       kasan_set_track+0x38/0x6c mm/kasan/common.c:56
       kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:355
       __kasan_slab_free+0x124/0x150 mm/kasan/common.c:422
       kasan_slab_free+0x10/0x1c mm/kasan/common.c:431
       slab_free_hook mm/slub.c:1544 [inline]
       slab_free_freelist_hook mm/slub.c:1577 [inline]
       slab_free mm/slub.c:3142 [inline]
       kfree+0x104/0x38c mm/slub.c:4124
       coalesced_mmio_destructor+0x94/0xa4 arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:102
       kvm_iodevice_destructor include/kvm/iodev.h:61 [inline]
       kvm_io_bus_unregister_dev+0x248/0x280 arch/arm64/kvm/../../../virt/kvm/kvm_main.c:4374
       kvm_vm_ioctl_unregister_coalesced_mmio+0x158/0x1ec arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:186
       kvm_vm_ioctl+0xe30/0x14c4 arch/arm64/kvm/../../../virt/kvm/kvm_main.c:3755
       vfs_ioctl fs/ioctl.c:48 [inline]
       __do_sys_ioctl fs/ioctl.c:753 [inline]
       __se_sys_ioctl fs/ioctl.c:739 [inline]
       __arm64_sys_ioctl+0xf88/0x131c fs/ioctl.c:739
       __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
       invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
       el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
       do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:220
       el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367
       el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383
       el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670
      
      If kvm_io_bus_unregister_dev() return -ENOMEM, we already call kvm_iodevice_destructor()
      inside this function to delete 'struct kvm_coalesced_mmio_dev *dev' from list
      and free the dev, but kvm_iodevice_destructor() is called again, it will lead
      the above issue.
      
      Let's check the the return value of kvm_io_bus_unregister_dev(), only call
      kvm_iodevice_destructor() if the return value is 0.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: kvm@vger.kernel.org
      Reported-by: default avatarHulk Robot <hulkci@huawei.com>
      Signed-off-by: default avatarKefeng Wang <wangkefeng.wang@huawei.com>
      Message-Id: <20210626070304.143456-1-wangkefeng.wang@huawei.com>
      Cc: stable@vger.kernel.org
      Fixes: 5d3c4c79 ("KVM: Stop looking for coalesced MMIO zones if the bus is destroyed", 2021-04-20)
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      23fa2e46
    • Sean Christopherson's avatar
      KVM: SVM: Revert clearing of C-bit on GPA in #NPF handler · 76ff371b
      Sean Christopherson authored
      Don't clear the C-bit in the #NPF handler, as it is a legal GPA bit for
      non-SEV guests, and for SEV guests the C-bit is dropped before the GPA
      hits the NPT in hardware.  Clearing the bit for non-SEV guests causes KVM
      to mishandle #NPFs with that collide with the host's C-bit.
      
      Although the APM doesn't explicitly state that the C-bit is not reserved
      for non-SEV, Tom Lendacky confirmed that the following snippet about the
      effective reduction due to the C-bit does indeed apply only to SEV guests.
      
        Note that because guest physical addresses are always translated
        through the nested page tables, the size of the guest physical address
        space is not impacted by any physical address space reduction indicated
        in CPUID 8000_001F[EBX]. If the C-bit is a physical address bit however,
        the guest physical address space is effectively reduced by 1 bit.
      
      And for SEV guests, the APM clearly states that the bit is dropped before
      walking the nested page tables.
      
        If the C-bit is an address bit, this bit is masked from the guest
        physical address when it is translated through the nested page tables.
        Consequently, the hypervisor does not need to be aware of which pages
        the guest has chosen to mark private.
      
      Note, the bogus C-bit clearing was removed from legacy #PF handler in
      commit 6d1b867d ("KVM: SVM: Don't strip the C-bit from CR2 on #PF
      interception").
      
      Fixes: 0ede79e1 ("KVM: SVM: Clear C-bit from the page fault address")
      Cc: Peter Gonda <pgonda@google.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20210625020354.431829-3-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      76ff371b
    • Sean Christopherson's avatar
      KVM: x86/mmu: Do not apply HPA (memory encryption) mask to GPAs · fc9bf2e0
      Sean Christopherson authored
      Ignore "dynamic" host adjustments to the physical address mask when
      generating the masks for guest PTEs, i.e. the guest PA masks.  The host
      physical address space and guest physical address space are two different
      beasts, e.g. even though SEV's C-bit is the same bit location for both
      host and guest, disabling SME in the host (which clears shadow_me_mask)
      does not affect the guest PTE->GPA "translation".
      
      For non-SEV guests, not dropping bits is the correct behavior.  Assuming
      KVM and userspace correctly enumerate/configure guest MAXPHYADDR, bits
      that are lost as collateral damage from memory encryption are treated as
      reserved bits, i.e. KVM will never get to the point where it attempts to
      generate a gfn using the affected bits.  And if userspace wants to create
      a bogus vCPU, then userspace gets to deal with the fallout of hardware
      doing odd things with bad GPAs.
      
      For SEV guests, not dropping the C-bit is technically wrong, but it's a
      moot point because KVM can't read SEV guest's page tables in any case
      since they're always encrypted.  Not to mention that the current KVM code
      is also broken since sme_me_mask does not have to be non-zero for SEV to
      be supported by KVM.  The proper fix would be to teach all of KVM to
      correctly handle guest private memory, but that's a task for the future.
      
      Fixes: d0ec49d4 ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
      Cc: stable@vger.kernel.org
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20210623230552.4027702-5-seanjc@google.com>
      [Use a new header instead of adding header guards to paging_tmpl.h. - Paolo]
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      fc9bf2e0
    • Sean Christopherson's avatar
      KVM: x86: Use kernel's x86_phys_bits to handle reduced MAXPHYADDR · e39f00f6
      Sean Christopherson authored
      Use boot_cpu_data.x86_phys_bits instead of the raw CPUID information to
      enumerate the MAXPHYADDR for KVM guests when TDP is disabled (the guest
      version is only relevant to NPT/TDP).
      
      When using shadow paging, any reductions to the host's MAXPHYADDR apply
      to KVM and its guests as well, i.e. using the raw CPUID info will cause
      KVM to misreport the number of PA bits available to the guest.
      
      Unconditionally zero out the "Physical Address bit reduction" entry.
      For !TDP, the adjustment is already done, and for TDP enumerating the
      host's reduction is wrong as the reduction does not apply to GPAs.
      
      Fixes: 9af9b940 ("x86/cpu/AMD: Handle SME reduction in physical address size")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20210623230552.4027702-3-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      e39f00f6
    • Sean Christopherson's avatar
      KVM: x86: Use guest MAXPHYADDR from CPUID.0x8000_0008 iff TDP is enabled · 4bf48e3c
      Sean Christopherson authored
      Ignore the guest MAXPHYADDR reported by CPUID.0x8000_0008 if TDP, i.e.
      NPT, is disabled, and instead use the host's MAXPHYADDR.  Per AMD'S APM:
      
        Maximum guest physical address size in bits. This number applies only
        to guests using nested paging. When this field is zero, refer to the
        PhysAddrSize field for the maximum guest physical address size.
      
      Fixes: 24c82e57 ("KVM: Sanitize cpuid")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20210623230552.4027702-2-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      4bf48e3c
    • Sean Christopherson's avatar
      Revert "KVM: x86: WARN and reject loading KVM if NX is supported but not enabled" · f0414b07
      Sean Christopherson authored
      Let KVM load if EFER.NX=0 even if NX is supported, the analysis and
      testing (or lack thereof) for the non-PAE host case was garbage.
      
      If the kernel won't be using PAE paging, .Ldefault_entry in head_32.S
      skips over the entire EFER sequence.  Hopefully that can be changed in
      the future to allow KVM to require EFER.NX, but the motivation behind
      KVM's requirement isn't yet merged.  Reverting and revisiting the mess
      at a later date is by far the safest approach.
      
      This reverts commit 8bbed95d.
      
      Fixes: 8bbed95d ("KVM: x86: WARN and reject loading KVM if NX is supported but not enabled")
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20210625001853.318148-1-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      f0414b07
    • Marc Zyngier's avatar
      KVM: selftests: x86: Address missing vm_install_exception_handler conversions · f8f0edab
      Marc Zyngier authored
      Commit b78f4a59 ("KVM: selftests: Rename vm_handle_exception")
      raced with a couple of new x86 tests, missing two vm_handle_exception
      to vm_install_exception_handler conversions.
      
      Help the two broken tests to catch up with the new world.
      
      Cc: Andrew Jones <drjones@redhat.com>
      CC: Ricardo Koller <ricarkol@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarMarc Zyngier <maz@kernel.org>
      Message-Id: <20210701071928.2971053-1-maz@kernel.org>
      Reviewed-by: default avatarAndrew Jones <drjones@redhat.com>
      Reviewed-by: default avatarRicardo Koller <ricarkol@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      f8f0edab
    • Paolo Bonzini's avatar
      Merge tag 'kvm-s390-master-5.14-1' of... · f3cf8007
      Paolo Bonzini authored
      Merge tag 'kvm-s390-master-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
      
      KVM: selftests: Fixes
      
      - provide memory model for  IBM z196 and zEC12
      - do not require 64GB of memory
      f3cf8007
  3. 06 Jul, 2021 2 commits
  4. 25 Jun, 2021 3 commits
  5. 24 Jun, 2021 13 commits