1. 26 Apr, 2021 7 commits
    • KVM: SVM: Delay restoration of host MSR_TSC_AUX until return to userspace · 844d69c2
      Sean Christopherson authored
      Use KVM's "user return MSRs" framework to defer restoring the host's
      MSR_TSC_AUX until the CPU returns to userspace.  Add/improve comments to
      clarify why MSR_TSC_AUX is intercepted on both RDMSR and WRMSR, and why
      it's safe for KVM to keep the guest's value loaded even if KVM is
      scheduled out.
      
      Cc: Reiji Watanabe <reijiw@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210423223404.3860547-5-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
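
      A minimal sketch of the mechanism, with the slot variable and helper
      name assumed for illustration (this is not the literal patch):

          /*
           * Hedged sketch: load the guest's TSC_AUX and queue the host value
           * for lazy restoration.  kvm_set_user_return_msr() arms a notifier
           * that restores the host value on the CPU's next return to
           * userspace, instead of KVM restoring it on every VM-Exit.
           */
          static int tsc_aux_uret_slot;   /* assumed: registered at hardware setup */

          static void svm_load_guest_tsc_aux(struct vcpu_svm *svm)
          {
                  kvm_set_user_return_msr(tsc_aux_uret_slot, svm->tsc_aux, -1ull);
          }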
    • KVM: SVM: Clear MSR_TSC_AUX[63:32] on write · dbd61273
      Sean Christopherson authored
      Force clear bits 63:32 of MSR_TSC_AUX on write to emulate current AMD
      CPUs, which completely ignore the upper 32 bits, including dropping them
      on write.  Emulating AMD hardware will also allow migrating a vCPU from
      AMD hardware to Intel hardware without requiring userspace to manually
      clear the upper bits, which are reserved on Intel hardware.
      
      Presumably, MSR_TSC_AUX[63:32] are intended to be reserved on AMD, but
      sadly the APM doesn't say _anything_ about those bits in the context of
      MSR access.  The RDTSCP entry simply states that RCX contains bits 31:0
      of the MSR, zero extended.  And even worse is that the RDPID description
      implies that it can consume all 64 bits of the MSR:
      
        RDPID reads the value of TSC_AUX MSR used by the RDTSCP instruction
        into the specified destination register. Normal operand size prefixes
        do not apply and the update is either 32 bit or 64 bit based on the
        current mode.
      
      Emulate current hardware behavior to give KVM the best odds of playing
      nice with whatever the behavior of future AMD CPUs happens to be.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210423223404.3860547-3-seanjc@google.com>
      [Fix broken patch. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
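
      The emulation itself is a one-line truncation; a hedged sketch (the
      helper is illustrative, the real change sits in SVM's MSR write path):

          static void svm_write_tsc_aux(struct vcpu_svm *svm, u64 data)
          {
                  /* Current AMD CPUs silently drop bits 63:32 on write. */
                  svm->tsc_aux = (u32)data;
          }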
    • KVM: SVM: Inject #GP on guest MSR_TSC_AUX accesses if RDTSCP unsupported · 6f2b296a
      Sean Christopherson authored
      Inject #GP on guest accesses to MSR_TSC_AUX if RDTSCP is unsupported in
      the guest's CPUID model.
      
      Fixes: 46896c73 ("KVM: svm: add support for RDTSCP")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210423223404.3860547-2-seanjc@google.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
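
      A hedged sketch of the guard as it would sit in the switch of
      svm_get_msr()/svm_set_msr(); a non-zero return from the vendor hook
      causes the common MSR code to inject #GP into the guest:

          case MSR_TSC_AUX:
                  if (!boot_cpu_has(X86_FEATURE_RDTSCP))
                          return 1;
                  if (!msr_info->host_initiated &&
                      !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
                          return 1;       /* common code injects #GP */
                  break;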
    • KVM: VMX: Invert the inlining of MSR interception helpers · e23f6d49
      Sean Christopherson authored
      Invert the inline declarations of the MSR interception helpers between
      the wrapper, vmx_set_intercept_for_msr(), and the core implementations,
      vmx_{dis,en}able_intercept_for_msr().  Letting the compiler _not_
      inline the implementation reduces KVM's code footprint by ~3k bytes.
      
      Back when the helpers were added in commit 904e14fb ("KVM: VMX: make
      MSR bitmaps per-VCPU"), both the wrapper and the implementations were
      __always_inline because the end code distilled down to a few conditionals
      and a bit operation.  Today, the implementations involve a variety of
      checks and bit ops in order to support userspace MSR filtering.
      
      Furthermore, the vast majority of calls to manipulate MSR interception
      are not performance sensitive, e.g. vCPU creation and x2APIC toggling.
      On the other hand, the one path that is performance sensitive, dynamic
      LBR passthrough, uses the wrappers, i.e. is largely untouched by
      inverting the inlining.
      
      In short, forcing the low level MSR interception code to be inlined no
      longer makes sense.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210423221912.3857243-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
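
      The resulting arrangement, sketched with approximate signatures: the
      thin wrapper stays inline while the filtering-heavy implementations
      move out of line:

          /* Out-of-line in vmx.c: these carry the MSR-filtering logic. */
          void vmx_disable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type);
          void vmx_enable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type);

          /* The wrapper distills to a single branch, so inlining it is cheap. */
          static inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu,
                                                       u32 msr, int type, bool value)
          {
                  if (value)
                          vmx_enable_intercept_for_msr(vcpu, msr, type);
                  else
                          vmx_disable_intercept_for_msr(vcpu, msr, type);
          }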
    • KVM: documentation: fix sphinx warnings · f82762fb
      Paolo Bonzini authored
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: X86: Fix failure to boost kernel lock holder candidate in SEV-ES guests · b86bb11e
      Wanpeng Li authored
      Commit f1c6366e ("KVM: SVM: Add required changes to support intercepts under
      SEV-ES") prevents the hypervisor from accessing guest register state when the
      guest is running under SEV-ES.  Because vcpu->arch.guest_state_protected is
      set, the preemption notifiers can no longer determine whether a preempted
      vCPU was in guest kernel mode, so the kernel spinlock holder is always
      skipped for boosting.  Fix it by always treating a preempted vCPU as being
      in guest kernel mode; a false positive is better than skipping the candidate
      completely.
      
      Fixes: f1c6366e ("KVM: SVM: Add required changes to support intercepts under SEV-ES")
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <1619080459-30032-1-git-send-email-wanpengli@tencent.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
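
      A hedged sketch of the idea, assuming the x86 kvm_arch_vcpu_in_kernel()
      implementation of the time:

          bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
          {
                  if (vcpu->arch.guest_state_protected)
                          return true;    /* CPL unreadable; assume kernel mode */

                  return vcpu->arch.preempted_in_kernel;
          }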
    • KVM: x86: Properly handle APF vs disabled LAPIC situation · 2f15d027
      Vitaly Kuznetsov authored
      An async PF 'page ready' event may arrive while the LAPIC is (temporarily)
      disabled.  In particular, Sebastien reports that when the Linux kernel is
      booted directly by Cloud Hypervisor, the LAPIC is 'software disabled' when
      the APF mechanism is initialized.  On initialization, KVM tries to inject a
      'wakeup all' event and puts the corresponding token into the slot.  It fails,
      however, to inject the interrupt (kvm_apic_set_irq() -> __apic_accept_irq()
      -> !apic_enabled()), so the guest never gets notified and the whole APF
      mechanism gets stuck.  The same issue is likely to happen if the guest
      temporarily disables the LAPIC and a previously unavailable page becomes
      available.
      
      Do two things to resolve the issue:
      - Avoid dequeuing 'page ready' events from APF queue when LAPIC is
        disabled.
      - Trigger an attempt to deliver pending 'page ready' events when LAPIC
        becomes enabled (SPIV or MSR_IA32_APICBASE).
      Reported-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210422092948.568327-1-vkuznets@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
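
      A hedged sketch of the two parts; the helper names approximate those
      in x86.c/lapic.c:

          /* 1) Don't dequeue 'page ready' events the LAPIC can't accept. */
          bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu)
          {
                  if (!kvm_pv_async_pf_enabled(vcpu))
                          return true;

                  return kvm_lapic_enabled(vcpu) && apf_pageready_slot_free(vcpu);
          }

          /*
           * 2) When the guest re-enables the LAPIC (SPIV write or
           *    MSR_IA32_APICBASE), retry delivery of pending events:
           *    kvm_make_request(KVM_REQ_APF_READY, vcpu);
           */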
  2. 23 Apr, 2021 3 commits
    • KVM: x86: Fix implicit enum conversion goof in scattered reverse CPUID code · 462f8dde
      Sean Christopherson authored
      Take "enum kvm_only_cpuid_leafs" in scattered specific CPUID helpers
      (which is obvious in hindsight), and use "unsigned int" for leafs that
      can be the kernel's standard "enum cpuid_leaf" or the aforementioned
      KVM-only variant.  Loss of the enum params is a bit disapponting, but
      gcc obviously isn't providing any extra sanity checks, and the various
      BUILD_BUG_ON() assertions ensure the input is in range.
      
      This fixes implicit enum conversions that are detected by clang-11:
      
      arch/x86/kvm/cpuid.c:499:29: warning: implicit conversion from enumeration type 'enum kvm_only_cpuid_leafs' to different enumeration type 'enum cpuid_leafs' [-Wenum-conversion]
              kvm_cpu_cap_init_scattered(CPUID_12_EAX,
              ~~~~~~~~~~~~~~~~~~~~~~~~~~ ^~~~~~~~~~~~
      arch/x86/kvm/cpuid.c:837:31: warning: implicit conversion from enumeration type 'enum kvm_only_cpuid_leafs' to different enumeration type 'enum cpuid_leafs' [-Wenum-conversion]
                      cpuid_entry_override(entry, CPUID_12_EAX);
                      ~~~~~~~~~~~~~~~~~~~~        ^~~~~~~~~~~~
      2 warnings generated.
      
      Fixes: 4e66c0cb ("KVM: x86: Add support for reverse CPUID lookup of scattered features")
      Cc: Kai Huang <kai.huang@intel.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210421010850.3009718-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
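
      The resulting parameter types, roughly (the real helpers are static
      __always_inline; simplified prototypes shown):

          /* Scattered-specific helper: only ever sees KVM-only leafs. */
          void kvm_cpu_cap_init_scattered(enum kvm_only_cpuid_leafs leaf, u32 mask);

          /* Shared helper: may receive either enum, so take a plain integer. */
          void cpuid_entry_override(struct kvm_cpuid_entry2 *entry, unsigned int leaf);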
    • KVM: VMX: use EPT_VIOLATION_GVA_TRANSLATED instead of 0x100 · 10835602
      Isaku Yamahata authored
      Use symbolic value, EPT_VIOLATION_GVA_TRANSLATED, instead of 0x100
      in handle_ept_violation().
      Signed-off-by: Yao Yuan <yuan.yao@intel.com>
      Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
      Message-Id: <724e8271ea301aece3eb2afe286a9e2e92a70b18.1619136576.git.isaku.yamahata@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
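
      For context, bit 8 of the EPT-violation exit qualification indicates
      that the guest linear address was translated; the commit simply swaps
      the magic number for the existing symbolic name:

          #include <asm/vmx.h>    /* EPT_VIOLATION_GVA_TRANSLATED == (1 << 8) */

          /* Hedged sketch of the substitution in handle_ept_violation(). */
          static bool ept_gva_translated(unsigned long exit_qualification)
          {
                  return exit_qualification & EPT_VIOLATION_GVA_TRANSLATED; /* was & 0x100 */
          }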
    • Merge tag 'kvmarm-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD · c4f71901
      Paolo Bonzini authored
      KVM/arm64 updates for Linux 5.13
      
      New features:
      
      - Stage-2 isolation for the host kernel when running in protected mode
      - Guest SVE support when running in nVHE mode
      - Force W^X hypervisor mappings in nVHE mode
      - ITS save/restore for guests using direct injection with GICv4.1
      - nVHE panics now produce readable backtraces
      - Guest support for PTP using the ptp_kvm driver
      - Performance improvements in the S2 fault handler
      - Alexandru is now a reviewer (not really a new feature...)
      
      Fixes:
      - Proper emulation of the GICR_TYPER register
      - Handle the complete set of relocations in the nVHE EL2 object
      - Get rid of the oprofile dependency in the PMU code (and of the
        oprofile body parts at the same time)
      - Debug and SPE fixes
      - Fix vcpu reset
  3. 22 Apr, 2021 7 commits
  4. 21 Apr, 2021 21 commits
    • KVM: SVM: Allocate SEV command structures on local stack · 238eca82
      Sean Christopherson authored
      Use the local stack to "allocate" the structures used to communicate with
      the PSP.  The largest struct used by KVM, sev_data_launch_secret, clocks
      in at 52 bytes, well within the realm of reasonable stack usage.  The
      smallest structs are a mere 4 bytes, i.e. the pointer for the allocation
      is larger than the allocation itself.
      
      Now that the PSP driver plays nice with vmalloc pointers, putting the
      data on a virtually mapped stack (CONFIG_VMAP_STACK=y) will not cause
      explosions.
      
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210406224952.4177376-9-seanjc@google.com>
      Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
      Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
      [Apply same treatment to PSP migration commands. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
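
      A hedged before/after sketch for one representative command
      (surrounding plumbing assumed; the real conversion also copies the
      result back to userspace):

          static int sev_guest_status(struct kvm *kvm, struct kvm_sev_cmd *argp)
          {
                  struct sev_data_guest_status data;      /* was: kzalloc(...) */

                  memset(&data, 0, sizeof(data));
                  data.handle = to_kvm_svm(kvm)->sev_info.handle;

                  return sev_issue_cmd(kvm, SEV_CMD_GUEST_STATUS, &data, &argp->error);
          }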
    • crypto: ccp: Use the stack and common buffer for INIT command · a402e351
      Sean Christopherson authored
      Drop the dedicated init_cmd_buf and instead use a local variable.  Now
      that the low level helper uses an internal buffer for all commands,
      using the stack for the upper layers is safe even when running with
      CONFIG_VMAP_STACK=y.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210406224952.4177376-8-seanjc@google.com>
      Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
      Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • crypto: ccp: Use the stack and common buffer for status commands · 38103671
      Sean Christopherson authored
      Drop the dedicated status_cmd_buf and instead use a local variable for
      PLATFORM_STATUS.  Now that the low level helper uses an internal buffer
      for all commands, using the stack for the upper layers is safe even when
      running with CONFIG_VMAP_STACK=y.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210406224952.4177376-7-seanjc@google.com>
      Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
      Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • crypto: ccp: Use the stack for small SEV command buffers · e4a9af79
      Sean Christopherson authored
      For commands with small input/output buffers, use the local stack to
      "allocate" the structures used to communicate with the PSP.   Now that
      __sev_do_cmd_locked() gracefully handles vmalloc'd buffers, there's no
      reason to avoid using the stack, e.g. CONFIG_VMAP_STACK=y will just work.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210406224952.4177376-6-seanjc@google.com>
      Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
      Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • crypto: ccp: Play nice with vmalloc'd memory for SEV command structs · 8347b994
      Sean Christopherson authored
      Copy the incoming @data command to an internal buffer so that callers can
      put SEV command buffers on the stack without running afoul of
      CONFIG_VMAP_STACK=y, i.e. without bombing on vmalloc'd pointers.  As of
      today, the largest supported command takes a 68 byte buffer, i.e. pretty
      much every command can be put on the stack.  Because sev_cmd_mutex is
      held for the entirety of a transaction, only a single bounce buffer is
      required.
      
      Use the internal buffer unconditionally, as the majority of in-kernel
      users will soon switch to using the stack.  At that point, checking
      virt_addr_valid() becomes (negligible) overhead in most cases, and
      supporting both paths slightly increases complexity.  Since the commands
      are all quite small, the cost of the copies is insignificant compared to
      the latency of communicating with the PSP.
      
      Allocate a full page for the buffer as opportunistic preparation for
      SEV-SNP, which requires the command buffer to be in firmware state for
      commands that trigger memory writes from the PSP firmware.  Using a full
      page now will allow SEV-SNP support to simply transition the page as
      needed.
      
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210406224952.4177376-5-seanjc@google.com>
      Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
      Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
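
      A hedged sketch of the bounce-buffer idea; the field and helper names
      approximate the driver internals:

          static int sev_cmd_bounced(struct sev_device *sev, int cmd,
                                     void *data, size_t buf_len, int *psp_ret)
          {
                  int ret;

                  if (data)       /* @data may be vmalloc'd or on the stack */
                          memcpy(sev->cmd_buf, data, buf_len);

                  /* Only the page-backed bounce buffer is fed to __pa(). */
                  ret = sev_do_cmd_raw(sev, cmd, __pa(sev->cmd_buf), psp_ret);

                  if (data)       /* copy the PSP's response back out */
                          memcpy(data, sev->cmd_buf, buf_len);

                  return ret;
          }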
    • crypto: ccp: Reject SEV commands with mismatching command buffer · d5760dee
      Sean Christopherson authored
      WARN on and reject SEV commands that provide a valid data pointer, but do
      not have a known, non-zero length.  And conversely, reject commands that
      take a command buffer but none is provided (data is null).
      
      Aside from sanity checking input, disallowing a non-null pointer without
      a non-zero size will allow a future patch to cleanly handle vmalloc'd
      data by copying the data to an internal __pa() friendly buffer.
      
      Note, this also effectively prevents callers from using commands that
      have a non-zero length and are not known to the kernel.  This is not an
      explicit goal, but arguably the side effect is a good thing from the
      kernel's perspective.
      
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210406224952.4177376-4-seanjc@google.com>
      Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
      Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
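
      A hedged sketch of the check's shape (sev_cmd_buffer_len() is the
      driver's existing per-command size table):

          static int sev_check_cmd_buffer(int cmd, void *data)
          {
                  size_t buf_len = sev_cmd_buffer_len(cmd);

                  /* True iff exactly one of @data / @buf_len is NULL/zero. */
                  if (WARN_ON_ONCE(!data != !buf_len))
                          return -EINVAL;

                  return 0;
          }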
    • crypto: ccp: Detect and reject "invalid" addresses destined for PSP · 74c1f136
      Sean Christopherson authored
      Explicitly reject using pointers that are not virt_to_phys() friendly
      as the source for SEV commands that are sent to the PSP.  The PSP works
      with physical addresses, and __pa()/virt_to_phys() will not return the
      correct address in these cases, e.g. for a vmalloc'd pointer.  At best,
      the bogus address will cause the command to fail, and at worst lead to
      system instability.
      
      While it's unlikely that callers will deliberately use a bad pointer for
      SEV buffers, a caller can easily use a vmalloc'd pointer unknowingly when
      running with CONFIG_VMAP_STACK=y as it's not obvious that putting the
      command buffers on the stack would be bad.  The command buffers are
      relatively small and easily fit on the stack, and the APIs do not
      document that the incoming pointer must be a physically contiguous,
      __pa() friendly pointer.
      
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Fixes: 200664d5 ("crypto: ccp: Add Secure Encrypted Virtualization (SEV) command support")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210406224952.4177376-3-seanjc@google.com>
      Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
      Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
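
      The guard itself is small; a hedged sketch of how it would sit in the
      command path:

          /*
           * virt_addr_valid() is false for vmalloc'd pointers, including
           * stack buffers under CONFIG_VMAP_STACK=y, for which __pa()
           * would hand the PSP a bogus physical address.
           */
          if (WARN_ON_ONCE(data && !virt_addr_valid(data)))
                  return -EINVAL;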
    • crypto: ccp: Free SEV device if SEV init fails · b61a9071
      Sean Christopherson authored
      Free the SEV device if later initialization fails.  The memory isn't
      technically leaked as it's tracked in the top-level device's devres
      list, but unless the top-level device is removed, the memory won't be
      freed and is effectively leaked.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210406224952.4177376-2-seanjc@google.com>
      Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
      Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
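
      A hedged sketch of the unwind in sev_dev_init() (labels and context
      approximate):

          sev = devm_kzalloc(dev, sizeof(*sev), GFP_KERNEL);
          if (!sev)
                  goto e_err;
          /* ... later init steps jump to e_sev on failure ... */
          e_sev:
                  devm_kfree(dev, sev);   /* unpin from the parent's devres list */
          e_err:
                  return ret;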
    • KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command · 6a443def
      Brijesh Singh authored
      The command finalizes the guest receiving process and makes the SEV guest
      ready for execution.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Reviewed-by: Steve Rutherford <srutherford@google.com>
      Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
      Message-Id: <d08914dc259644de94e29b51c3b68a13286fc5a3.1618498113.git.ashish.kalra@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command · 15fb7de1
      Brijesh Singh authored
      The command is used for copying the incoming buffer into the
      SEV guest memory space.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Reviewed-by: Steve Rutherford <srutherford@google.com>
      Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
      Message-Id: <c5d0e3e719db7bb37ea85d79ed4db52e9da06257.1618498113.git.ashish.kalra@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Add support for KVM_SEV_RECEIVE_START command · af43cbbf
      Brijesh Singh authored
      The command is used to create the encryption context for an incoming
      SEV guest. The encryption context can be later used by the hypervisor
      to import the incoming data into the SEV guest memory space.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Reviewed-by: Steve Rutherford <srutherford@google.com>
      Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
      Message-Id: <c7400111ed7458eee01007c4d8d57cdf2cbb0fc2.1618498113.git.ashish.kalra@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Add support for KVM_SEV_SEND_CANCEL command · 5569e2e7
      Steve Rutherford authored
      After completion of SEND_START, but before SEND_FINISH, the source VMM can
      issue the SEND_CANCEL command to stop a migration. This is necessary so
      that a cancelled migration can restart with a new target later.
      Reviewed-by: Nathan Tempelman <natet@google.com>
      Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Steve Rutherford <srutherford@google.com>
      Message-Id: <20210412194408.2458827-1-srutherford@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Add KVM_SEV_SEND_FINISH command · fddecf6a
      Brijesh Singh authored
      The command is used to finalize the encryption context created with the
      KVM_SEV_SEND_START command.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Reviewed-by: Steve Rutherford <srutherford@google.com>
      Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
      Message-Id: <5082bd6a8539d24bc55a1dd63a1b341245bb168f.1618498113.git.ashish.kalra@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Add KVM_SEND_UPDATE_DATA command · d3d1af85
      Brijesh Singh authored
      The command is used for encrypting the guest memory region using the encryption
      context created with KVM_SEV_SEND_START.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Reviewed-by: Steve Rutherford <srutherford@google.com>
      Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
      Message-Id: <d6a6ea740b0c668b30905ae31eac5ad7da048bb3.1618498113.git.ashish.kalra@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Add KVM_SEV SEND_START command · 4cfdd47d
      Brijesh Singh authored
      The command is used to create an outgoing SEV guest encryption context.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Reviewed-by: Steve Rutherford <srutherford@google.com>
      Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>
      Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
      Message-Id: <2f1686d0164e0f1b3d6a41d620408393e0a48376.1618498113.git.ashish.kalra@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
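
      The SEND_* and RECEIVE_* commands above all reach KVM through the same
      plumbing; a hedged userspace sketch (fd setup assumed, and a real
      SEND_START also passes a struct kvm_sev_send_start via .data):

          #include <sys/ioctl.h>
          #include <linux/kvm.h>

          static int sev_send_start(int vm_fd, int sev_fd)
          {
                  struct kvm_sev_cmd cmd = {
                          .id = KVM_SEV_SEND_START,
                          .sev_fd = sev_fd,       /* /dev/sev handle */
                  };

                  return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
          }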
    • KVM: Boost vCPU candidate in user mode which is delivering interrupt · 52acd22f
      Wanpeng Li authored
      Both the lock holder vCPU and an IPI receiver that has halted are candidates
      for boosting.  However, the PLE handler was originally designed to deal with
      the lock holder preemption problem: Intel PLE occurs when the spinlock
      waiter is in kernel mode.  That assumption doesn't hold for IPI receivers,
      which can be in either kernel or user mode, so a vCPU candidate in user
      mode is not boosted even when it should respond to an IPI.  Some benchmarks,
      such as pbzip2 and swaptions, do TLB shootdowns in kernel mode but run in
      user mode most of the time.  This can lead to a long run of continuous PLE
      events, because the IPI sender keeps causing PLE events until the receiver
      is scheduled, while the receiver is never a candidate for a boost.
      
      This patch boosts vCPU candidates in user mode that are being delivered an
      interrupt.  With it, pbzip2 improves by 10% in a 96-vCPU VM in an
      over-subscribed scenario (the host machine is a 2-socket, 48-core, 96-HT
      Intel CLX box).  There is no performance regression for other benchmarks
      such as Unixbench spawn (mostly contending read/write locks in kernel mode)
      and ebizzy (mostly contending read/write semaphores and doing TLB shootdowns
      in kernel mode).
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <1618542490-14756-1-git-send-email-wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
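
      A hedged sketch of the tweak inside kvm_vcpu_on_spin()'s candidate
      loop; kvm_arch_dy_has_pending_interrupt() is the new arch hook (on x86
      it consults APICv's pending-interrupt state):

          if (READ_ONCE(vcpu->preempted) && yield_to_kernel_mode &&
              !kvm_arch_dy_has_pending_interrupt(vcpu) &&
              !kvm_arch_vcpu_in_kernel(vcpu))
                  continue;       /* user mode and nothing pending: skip */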
    • KVM: selftests: Always run vCPU thread with blocked SIG_IPI · bf1e15a8
      Paolo Bonzini authored
      The main thread could start to send SIG_IPI at any time, even before the
      signal is blocked on the vcpu thread.  Therefore, start the vcpu thread
      with the signal blocked.

      Without this patch, on very busy cores dirty_log_test could fail outright
      on receiving a SIGUSR1 without a handler (when the vcpu thread runs far
      slower than the main thread).
      Reported-by: Peter Xu <peterx@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
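
      A hedged sketch of the fix: a new thread inherits its creator's signal
      mask, so blocking SIG_IPI (SIGUSR1) around pthread_create() guarantees
      the vCPU thread starts with it blocked:

          #include <pthread.h>
          #include <signal.h>

          static void spawn_vcpu_thread(pthread_t *thread,
                                        void *(*vcpu_worker)(void *), void *arg)
          {
                  sigset_t sigset, oldset;

                  sigemptyset(&sigset);
                  sigaddset(&sigset, SIGUSR1);    /* SIG_IPI in dirty_log_test */

                  pthread_sigmask(SIG_BLOCK, &sigset, &oldset);
                  pthread_create(thread, NULL, vcpu_worker, arg);
                  pthread_sigmask(SIG_SETMASK, &oldset, NULL);    /* main keeps it */
          }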
    • KVM: selftests: Sync data verify of dirty logging with guest sync · 016ff1a4
      Peter Xu authored
      This fixes a bug that can be triggered with e.g. "taskset -c 0
      ./dirty_log_test" or when the testing host is very busy.

      A similar attempt was made previously [1], but it is not enough; the reason
      is stated in the reply [2].

      As a summary (partly quoting from [2]):
      
      The problem, I think, is that one guest memory write operation (of this
      specific test) consists of a few micro-steps while the page is under KVM
      dirty tracking (here I'm only considering write-protect rather than PML,
      but PML should be similar, at least when the log buffer is full):

        (1) The guest reads the 'iteration' number into a register, prepares to
            write, and takes a page fault
        (2) KVM sets the dirty bit in either the dirty bitmap or the dirty ring
        (3) KVM returns to the guest and the data is written
      
      When we verify the data, we assume that these steps are "atomic", i.e. when
      (1) has happened for a page, (2) and (3) must have happened as well.  We
      had a trick to work around the non-atomicity of the above three steps: a
      previous version of this patch tried to fix the atomicity of steps (2)+(3)
      by explicitly letting the main thread wait for at least one vmenter of the
      vcpu thread, which should work.  However, what I overlooked is that there
      is still a race because (1) and (2) can be interrupted.
      
      One example call trace where it can happen: we read an old iteration
      number, then get interrupted before even setting the dirty bit and
      flushing the data:
      
          __schedule+1742
          __cond_resched+52
          __get_user_pages+530
          get_user_pages_unlocked+197
          hva_to_pfn+206
          try_async_pf+132
          direct_page_fault+320
          kvm_mmu_page_fault+103
          vmx_handle_exit+288
          vcpu_enter_guest+2460
          kvm_arch_vcpu_ioctl_run+325
          kvm_vcpu_ioctl+526
          __x64_sys_ioctl+131
          do_syscall_64+51
          entry_SYSCALL_64_after_hwframe+68
      
      It means the iteration number cached in the vCPU's register can be very
      old by the time the dirty bit is set and the data is flushed.

      So far I don't see an easy way to guarantee the atomicity of all steps 1-3
      other than syncing at the GUEST_SYNC() point of the guest code when
      verifying the dirty bits, which is what this patch does.
      
      [1] https://lore.kernel.org/lkml/20210413213641.23742-1-peterx@redhat.com/
      [2] https://lore.kernel.org/lkml/20210417140956.GV4440@xz-x1/
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Andrew Jones <drjones@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20210417143602.215059-2-peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Support KVM VMs sharing SEV context · 54526d1f
      Nathan Tempelman authored
      Add a capability for userspace to mirror SEV encryption context from
      one vm to another. On our side, this is intended to support a
      Migration Helper vCPU, but it can also be used generically to support
      other in-guest workloads scheduled by the host. The intention is for
      the primary guest and the mirror to have nearly identical memslots.
      
      The primary benefits of this are that:
      1) The VMs do not share KVM contexts (think APIC/MSRs/etc), so they
      can't accidentally clobber each other.
      2) The VMs can have different memory views, which is necessary for post-copy
      migration (the migration vCPUs on the target need to read and write to
      pages, when the primary guest would VMEXIT).
      
      This does not change the threat model for AMD SEV. Any memory involved
      is still owned by the primary guest and its initial state is still
      attested to through the normal SEV_LAUNCH_* flows. If userspace wanted
      to circumvent SEV, it could achieve the same effect by simply attaching
      a vCPU to the primary VM.

      This patch deliberately leaves userspace in charge of the memslots for the
      mirror, as it already has the power to mess with them in the primary guest.
      
      This patch does not support SEV-ES (much less SNP), as it does not
      handle handing off attested VMSAs to the mirror.
      
      For additional context, we need a Migration Helper because SEV PSP
      migration is far too slow for our live migration on its own. Using
      an in-guest migrator lets us speed this up significantly.
      Signed-off-by: Nathan Tempelman <natet@google.com>
      Message-Id: <20210408223214.2582277-1-natet@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
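
      A hedged userspace sketch of wiring up a mirror VM with the new
      capability (fd setup assumed):

          #include <sys/ioctl.h>
          #include <linux/kvm.h>

          static int mirror_sev_context(int mirror_vm_fd, int primary_vm_fd)
          {
                  struct kvm_enable_cap cap = {
                          .cap = KVM_CAP_VM_COPY_ENC_CONTEXT_FROM,
                          .args[0] = primary_vm_fd,
                  };

                  return ioctl(mirror_vm_fd, KVM_ENABLE_CAP, &cap);
          }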
    • nSVM: Check addresses of MSR and IO permission maps · ee695f22
      Krish Sadhukhan authored
      According to section "Canonicalization and Consistency Checks" in APM vol 2,
      the following guest state is illegal:
      
          "The MSR or IOIO intercept tables extend to a physical address that
           is greater than or equal to the maximum supported physical address."
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Message-Id: <20210412215611.110095-5-krish.sadhukhan@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
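
      A hedged sketch of the consistency check (helper shape approximate):
      both the start and the end of each permission map must be legal guest
      physical addresses.

          static bool nested_svm_check_bitmap_pa(struct kvm_vcpu *vcpu, u64 pa, u32 size)
          {
                  u64 addr = PAGE_ALIGN(pa);

                  return kvm_vcpu_is_legal_gpa(vcpu, addr) &&
                         kvm_vcpu_is_legal_gpa(vcpu, addr + size - 1);
          }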
  5. 20 Apr, 2021 2 commits