1. 26 Jul, 2024 16 commits
    • KVM: guest_memfd: abstract how prepared folios are recorded · 66a644c0
      Paolo Bonzini authored
      Right now, large folios are not supported in guest_memfd, and therefore the order
      used by kvm_gmem_populate() is always 0.  In this scenario, using the up-to-date
      bit to track prepared-ness is nice and easy because we have one bit available
      per page.
      
      In the future, however, we might have large pages that are partially populated;
      for example, in the case of SEV-SNP, if a large page has both shared and private
      areas inside, it is necessary to populate it at a granularity that is smaller
      than that of the guest_memfd's backing store.  In that case we will have
      to track preparedness at a 4K level, probably as a bitmap.
      
      In preparation for that, do not use folio_test_uptodate() and
      folio_mark_uptodate() explicitly.  Return the state of the page directly from
      __kvm_gmem_get_pfn(), so that it is expected to apply to 2^N pages
      with N=*max_order.  The function to mark a range as prepared for now
      takes just a folio, but is expected to take also an index and order
      (or something like that) when large pages are introduced.
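
      A minimal userspace sketch of the 4K-granularity bitmap idea described
      above. This is not the kernel implementation: struct demo_folio,
      demo_mark_prepared() and demo_range_prepared() are hypothetical names,
      and the single 64-bit word limits the model to folios of order 6 or less.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a large folio: one "prepared" bit per 4K page. */
struct demo_folio {
    unsigned int order;   /* the folio spans 2^order 4K pages (order <= 6) */
    uint64_t prepared;    /* bit i set => page i has been prepared */
};

/* Mark pages [index, index + 2^order) of the folio as prepared. */
static void demo_mark_prepared(struct demo_folio *f, unsigned int index,
                               unsigned int order)
{
    unsigned int n = 1u << order;

    assert(index + n <= (1u << f->order));
    for (unsigned int i = 0; i < n; i++)
        f->prepared |= UINT64_C(1) << (index + i);
}

/* Report whether every page in [index, index + 2^order) is prepared. */
static bool demo_range_prepared(const struct demo_folio *f, unsigned int index,
                                unsigned int order)
{
    unsigned int n = 1u << order;

    for (unsigned int i = 0; i < n; i++)
        if (!(f->prepared & (UINT64_C(1) << (index + i))))
            return false;
    return true;
}
```

      With order-0 folios the bitmap degenerates to a single bit, which is
      what the up-to-date flag provides today.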
      
      Thanks to Michael Roth for pointing out the issue with large pages.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: guest_memfd: let kvm_gmem_populate() operate only on private gfns · e4ee5447
      Paolo Bonzini authored
      This check is currently performed by sev_gmem_post_populate(), but it
      applies to all callers of kvm_gmem_populate(): the point of the function
      is that the memory is being encrypted and some work has to be done
      on all the gfns in order to encrypt them.
      
      Therefore, check the KVM_MEMORY_ATTRIBUTE_PRIVATE attribute prior
      to invoking the callback, and stop the operation if a shared page
      is encountered.  Because CONFIG_KVM_PRIVATE_MEM in principle does
      not require attributes, this makes kvm_gmem_populate() depend on
      CONFIG_KVM_GENERIC_PRIVATE_MEM (which does require them).
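
      The check described above can be pictured with a small userspace
      model. Everything here is an assumption made for illustration:
      demo_populate(), the flat attrs[] array, the DEMO_ATTR_PRIVATE
      stand-in value and the choice of -EINVAL are not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_ATTR_PRIVATE 0x8u   /* stand-in for KVM_MEMORY_ATTRIBUTE_PRIVATE */

typedef int (*demo_post_populate_t)(size_t gfn, void *opaque);

/* Example callback: counts how many gfns it was invoked for. */
static int demo_count_cb(size_t gfn, void *opaque)
{
    (void)gfn;
    (*(int *)opaque)++;
    return 0;
}

/*
 * Walk npages gfns starting at 'start'.  The private attribute is checked
 * before the callback runs, and the walk stops at the first shared page.
 * Returns the number of pages handled, or a negative errno if none were.
 */
static long demo_populate(const unsigned int *attrs, size_t start,
                          size_t npages, demo_post_populate_t cb, void *opaque)
{
    long done = 0;
    int ret = 0;

    assert(attrs && cb);
    for (size_t i = 0; i < npages; i++) {
        size_t gfn = start + i;

        if (!(attrs[gfn] & DEMO_ATTR_PRIVATE)) {
            ret = -EINVAL;          /* shared page: stop the operation */
            break;
        }
        ret = cb(gfn, opaque);
        if (ret)
            break;
        done++;
    }
    return (ret && !done) ? ret : done;
}
```

      Because the check lives in the common loop, a callback such as
      demo_count_cb() never sees a shared gfn.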
      Reviewed-by: Michael Roth <michael.roth@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: extend kvm_range_has_memory_attributes() to check subset of attributes · 4b5f6712
      Paolo Bonzini authored
      While currently there is no other attribute than KVM_MEMORY_ATTRIBUTE_PRIVATE,
      KVM code such as kvm_mem_is_private() is written to expect their existence.
      Allow using kvm_range_has_memory_attributes() as a multi-page version of
      kvm_mem_is_private(), without it breaking later when more attributes are
      introduced.
      Reviewed-by: Michael Roth <michael.roth@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: cleanup and add shortcuts to kvm_range_has_memory_attributes() · e300614f
      Paolo Bonzini authored
      Use a guard to simplify early returns, and add two more easy
      shortcuts.  If the requested attributes are invalid, the attributes
      xarray will never show them as set.  And if testing a single page,
      kvm_get_memory_attributes() is more efficient.
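
      A rough userspace model of a masked range check with the two
      shortcuts mentioned above. The flat attrs[] array stands in for the
      attributes xarray, and demo_range_has_attrs() and the DEMO_* macros
      are hypothetical names; in this model the single-page path is not
      actually faster, it only shows where the shortcut sits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_ATTR_PRIVATE    0x8ul             /* stand-in attribute bit */
#define DEMO_SUPPORTED_ATTRS DEMO_ATTR_PRIVATE /* all bits this model knows */

/*
 * Return true if every page in [start, end) has exactly the bits 'want'
 * set among the bits selected by 'mask'.
 */
static bool demo_range_has_attrs(const unsigned long *attrs, size_t start,
                                 size_t end, unsigned long mask,
                                 unsigned long want)
{
    assert(start <= end);
    mask &= DEMO_SUPPORTED_ATTRS;

    /* Shortcut 1: unsupported or unmasked bits can never show up as set. */
    if (want & ~mask)
        return false;

    /* Shortcut 2: a single page needs only one lookup. */
    if (end == start + 1)
        return (attrs[start] & mask) == want;

    for (size_t i = start; i < end; i++)
        if ((attrs[i] & mask) != want)
            return false;
    return true;
}
```

      Passing a mask lets a caller ask only about the private bit, so the
      answer would not change if further attribute bits were defined later.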
      Reviewed-by: Michael Roth <michael.roth@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: guest_memfd: move check for already-populated page to common code · de802524
      Paolo Bonzini authored
      Do not allow populating the same page twice with startup data.  In the
      case of SEV-SNP, for example, the firmware does not allow it anyway,
      since the launch-update operation is only possible on pages that are
      still shared in the RMP.
      
      Even if it worked, kvm_gmem_populate()'s callback is meant to have side
      effects such as updating launch measurements, and updating the same
      page twice is unlikely to have the desired results.
      
      Races between calls to the ioctl are not possible because
      kvm_gmem_populate() holds slots_lock and the VM should not be running.
      But again, even if this worked on other confidential computing technology,
      it doesn't matter to guest_memfd.c whether this is something fishy
      such as missing synchronization in userspace, or rather something
      intentional.  One of the racers wins, and the page is initialized by
      either kvm_gmem_prepare_folio() or kvm_gmem_populate().
      
      Out of paranoia, also adjust sev_gmem_post_populate() to use
      the same errno that kvm_gmem_populate() is using.
      Reviewed-by: Michael Roth <michael.roth@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: remove kvm_arch_gmem_prepare_needed() · 7239ed74
      Paolo Bonzini authored
      It is enough to return 0 if a guest need not do any preparation.
      This is in fact how sev_gmem_prepare() works for non-SNP guests,
      and it extends naturally to Intel hosts: the x86 callback for
      gmem_prepare is optional and returns 0 if not defined.
      Reviewed-by: Michael Roth <michael.roth@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: guest_memfd: make kvm_gmem_prepare_folio() operate on a single struct kvm · 6dd761d9
      Paolo Bonzini authored
      This is now possible because preparation is done by kvm_gmem_get_pfn()
      instead of fallocate().  In practice this is not a limitation, because
      even though guest_memfd can be bound to multiple struct kvm, for
      hardware implementations of confidential computing only one guest
      (identified by an ASID on SEV-SNP, or an HKID on TDX) will be able
      to access it.
      
      In the case of intra-host migration (not implemented yet for SEV-SNP,
      but we can use SEV-ES as an idea of how it will work), the new struct
      kvm inherits the same ASID and preparation need not be repeated.
      Reviewed-by: Michael Roth <michael.roth@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: guest_memfd: delay kvm_gmem_prepare_folio() until the memory is passed to the guest · b8552431
      Paolo Bonzini authored
      Initializing the contents of the folio on fallocate() is unnecessarily
      restrictive.  It means that the page is registered with the firmware and
      then it cannot be touched anymore.  In particular, this loses the
      possibility of using fallocate() to pre-allocate the page for SEV-SNP
      guests, because kvm_arch_gmem_prepare() then fails.
      
      It's only when the guest actually accesses the page (and therefore
      kvm_gmem_get_pfn() is called) that the page must be cleared from any
      stale host data and registered with the firmware.  The up-to-date flag
      is clear if this has to be done (i.e. it is the first access and
      kvm_gmem_populate() has not been called).
      
      All in all, there are enough differences between kvm_gmem_get_pfn() and
      kvm_gmem_populate(), that it's better to separate the two flows completely.
      Extract the bulk of kvm_gmem_get_folio(), which takes a folio and ends up
      setting its up-to-date flag, to a new function kvm_gmem_prepare_folio();
      this is now done only by the non-__-prefixed kvm_gmem_get_pfn().
      As a bonus, __kvm_gmem_get_pfn() loses its ugly "bool prepare" argument.
      
      One difference is that fallocate(PUNCH_HOLE) can now race with a
      page fault.  Potentially this causes a page to be prepared and inserted into the
      filemap even after fallocate(PUNCH_HOLE).  This is harmless, as it can be
      fixed by another hole punching operation, and can be avoided by clearing
      the private-page attribute prior to invoking fallocate(PUNCH_HOLE).
      This way, the page fault will cause an exit to user space.
      
      The previous semantics, where fallocate() could be used to prepare
      the pages in advance of running the guest, can be accessed with
      KVM_PRE_FAULT_MEMORY.
      
      For now, accessing a page in one VM will attempt to call
      kvm_arch_gmem_prepare() in all of those that have bound the guest_memfd.
      Cleaning this up is left to a separate patch.
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: guest_memfd: return locked folio from __kvm_gmem_get_pfn · 78c42933
      Paolo Bonzini authored
      Allow testing the up-to-date flag in the caller without taking the
      lock again.
      Reviewed-by: Michael Roth <michael.roth@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: rename CONFIG_HAVE_KVM_GMEM_* to CONFIG_HAVE_KVM_ARCH_GMEM_* · 564429a6
      Paolo Bonzini authored
      Add "ARCH" to the symbols; shortly, the "prepare" phase will include both
      the arch-independent step to clear out contents left in the page by the
      host, and the arch-dependent step enabled by CONFIG_HAVE_KVM_GMEM_PREPARE.
      For consistency do the same for CONFIG_HAVE_KVM_GMEM_INVALIDATE as well.
      Reviewed-by: Michael Roth <michael.roth@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: guest_memfd: do not go through struct page · 7fbdda31
      Paolo Bonzini authored
      We have a perfectly usable folio, use it to retrieve the pfn and order.
      All that's needed is a version of folio_file_page that returns a pfn.
      Reviewed-by: Michael Roth <michael.roth@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: guest_memfd: delay folio_mark_uptodate() until after successful preparation · d04c77d2
      Paolo Bonzini authored
      The up-to-date flag as is now is not too useful; it tells guest_memfd not
      to overwrite the contents of a folio, but it doesn't say that the page
      is ready to be mapped into the guest.  For encrypted guests, mapping
      a private page requires that the "preparation" phase has succeeded,
      and at the same time the same page cannot be prepared twice.
      
      So, ensure that folio_mark_uptodate() is only called on a prepared page.  If
      kvm_gmem_prepare_folio() or the post_populate callback fail, the folio
      will not be marked up-to-date; it's not a problem to call clear_highpage()
      again on such a page prior to the next preparation attempt.
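
      The ordering described above (clear, prepare, and only then set the
      flag) can be sketched in userspace as follows. struct demo_page,
      demo_get_page() and the two prepare callbacks are hypothetical; the
      memset() merely stands in for clear_highpage().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical page model: a flag plus a few bytes of contents. */
struct demo_page {
    bool uptodate;            /* set only once preparation has succeeded */
    unsigned char data[16];
};

typedef int (*demo_prepare_t)(struct demo_page *page);

/* Example preparation callbacks: one always fails, one always succeeds. */
static int demo_prepare_fail(struct demo_page *page) { (void)page; return -EIO; }
static int demo_prepare_ok(struct demo_page *page)   { (void)page; return 0; }

static int demo_get_page(struct demo_page *page, demo_prepare_t prepare)
{
    int ret;

    assert(page && prepare);
    if (page->uptodate)
        return 0;             /* already prepared: never prepare twice */

    /* Clearing again after an earlier failed attempt is harmless. */
    memset(page->data, 0, sizeof(page->data));

    ret = prepare(page);
    if (ret)
        return ret;           /* flag stays clear, so a retry redoes both steps */

    page->uptodate = true;    /* marked only at the very end */
    return 0;
}
```

      A failed attempt leaves the page cleared but unmarked, so the next
      call simply repeats the clear and the preparation.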
      Reviewed-by: Michael Roth <michael.roth@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: guest_memfd: return folio from __kvm_gmem_get_pfn() · d0d87226
      Paolo Bonzini authored
      Right now this is simply more consistent and avoids use of pfn_to_page()
      and put_page().  It will be put to more use in upcoming patches, to
      ensure that the up-to-date flag is set at the very end of both the
      kvm_gmem_get_pfn() and kvm_gmem_populate() flows.
      Reviewed-by: Michael Roth <michael.roth@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: disallow pre-fault for SNP VMs before initialization · 5932ca41
      Paolo Bonzini authored
      KVM_PRE_FAULT_MEMORY for an SNP guest can race with
      sev_gmem_post_populate() in bad ways. The following sequence for
      instance can potentially trigger an RMP fault:
      
        thread A, sev_gmem_post_populate: called
        thread B, sev_gmem_prepare: places below 'pfn' in a private state in RMP
        thread A, sev_gmem_post_populate: *vaddr = kmap_local_pfn(pfn + i);
        thread A, sev_gmem_post_populate: copy_from_user(vaddr, src + i * PAGE_SIZE, PAGE_SIZE);
        RMP #PF
      
      Fix this by only allowing KVM_PRE_FAULT_MEMORY to run after a guest's
      initial private memory contents have been finalized via
      KVM_SEV_SNP_LAUNCH_FINISH.
      
      Beyond fixing this issue, it just sort of makes sense to enforce this,
      since the KVM_PRE_FAULT_MEMORY documentation states:
      
        "KVM maps memory as if the vCPU generated a stage-2 read page fault"
      
      which sort of implies we should be acting on the same guest state that a
      vCPU would see post-launch after the initial guest memory is all set up.
      Co-developed-by: Michael Roth <michael.roth@amd.com>
      Signed-off-by: Michael Roth <michael.roth@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Documentation: Fix title underline too short warning · c2adcf05
      Chang Yu authored
      Fix "WARNING: Title underline too short" by extending title line to the
      proper length.
      Signed-off-by: Chang Yu <marcus.yu.56@gmail.com>
      Message-ID: <ZqB3lofbzMQh5Q-5@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Eliminate log spam from limited APIC timer periods · 0005ca20
      Jim Mattson authored
      SAP's vSMP MemoryONE continuously requests a local APIC timer period
      less than 500 us, resulting in the following kernel log spam:
      
        kvm: vcpu 15: requested 70240 ns lapic timer period limited to 500000 ns
        kvm: vcpu 19: requested 52848 ns lapic timer period limited to 500000 ns
        kvm: vcpu 15: requested 70256 ns lapic timer period limited to 500000 ns
        kvm: vcpu 9: requested 70256 ns lapic timer period limited to 500000 ns
        kvm: vcpu 9: requested 70208 ns lapic timer period limited to 500000 ns
        kvm: vcpu 9: requested 387520 ns lapic timer period limited to 500000 ns
        kvm: vcpu 9: requested 70160 ns lapic timer period limited to 500000 ns
        kvm: vcpu 66: requested 205744 ns lapic timer period limited to 500000 ns
        kvm: vcpu 9: requested 70224 ns lapic timer period limited to 500000 ns
        kvm: vcpu 9: requested 70256 ns lapic timer period limited to 500000 ns
        limit_periodic_timer_frequency: 7569 callbacks suppressed
        ...
      
      To eliminate this spam, change the pr_info_ratelimited() in
      limit_periodic_timer_frequency() to pr_info_once().
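
      To illustrate why a print-once helper removes the spam, here is a
      small userspace imitation. demo_info_once() and demo_limit_period()
      are hypothetical and only mimic the idea behind pr_info_once(); they
      are not the kernel macros, and the counter exists purely so the
      behaviour can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int demo_lines_printed;   /* number of messages actually emitted */

/* Print at most once per call site, however often the site is reached. */
#define demo_info_once(...)                 \
    do {                                    \
        static bool demo_done_;             \
        if (!demo_done_) {                  \
            demo_done_ = true;              \
            demo_lines_printed++;           \
            printf(__VA_ARGS__);            \
        }                                   \
    } while (0)

/* Clamp a requested timer period to a minimum, logging only the first time. */
static long demo_limit_period(long requested_ns, long min_ns)
{
    assert(min_ns > 0);
    if (requested_ns < min_ns) {
        demo_info_once("requested %ld ns lapic timer period limited to %ld ns\n",
                       requested_ns, min_ns);
        return min_ns;
    }
    return requested_ns;
}
```

      A guest that keeps requesting short periods still gets the clamped
      value every time, but only the first request is logged.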
      Reported-by: James Houghton <jthoughton@google.com>
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Message-ID: <20240724190640.2449291-1-jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 17 Jul, 2024 1 commit
    • crypto: ccp: Add the SNP_VLEK_LOAD command · 332d2c1d
      Michael Roth authored
      When requesting an attestation report a guest is able to specify whether
      it wants SNP firmware to sign the report using either a Versioned Chip
      Endorsement Key (VCEK), which is derived from chip-unique secrets, or a
      Versioned Loaded Endorsement Key (VLEK) which is obtained from an AMD
      Key Derivation Service (KDS) and derived from seeds allocated to
      enrolled cloud service providers (CSPs).
      
      For VLEK keys, an SNP_VLEK_LOAD SNP firmware command is used to load
      them into the system after obtaining them from the KDS. Add a
      corresponding userspace interface to allow the loading of VLEK keys
      into the system.
      
      See SEV-SNP Firmware ABI 1.54, SNP_VLEK_LOAD for more details.
      Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Michael Roth <michael.roth@amd.com>
      Message-ID: <20240501085210.2213060-21-michael.roth@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 16 Jul, 2024 20 commits
    • KVM: x86/pmu: Add kvm_pmu_call() to simplify static calls of kvm_pmu_ops · 5d766508
      Wei Wang authored
      Similar to kvm_x86_call(), kvm_pmu_call() is added to streamline the usage
      of static calls of kvm_pmu_ops, which improves code readability.
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Wei Wang <wei.w.wang@intel.com>
      Link: https://lore.kernel.org/r/20240507133103.15052-4-wei.w.wang@intel.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Introduce kvm_x86_call() to simplify static calls of kvm_x86_ops · 89604647
      Wei Wang authored
      Introduce kvm_x86_call() to streamline the usage of static calls of
      kvm_x86_ops. The current implementation of these calls is verbose and
      could lead to alignment challenges. This makes the code susceptible to
      exceeding the "80 columns per single line of code" limit as defined in
      the coding-style document. Another issue with the existing implementation
      is that the addition of kvm_x86_ prefix to hooks at the static_call sites
      hinders code readability and navigation. kvm_x86_call() is added to
      improve code readability and maintainability, while adhering to the coding
      style guidelines.
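
      The readability argument can be seen in a userspace imitation of the
      wrapper. This is a sketch only: the demo_* names are hypothetical, and
      a plain function pointer stands in for the kernel's static call
      machinery, so nothing here reflects how the real calls are patched.

```c
#include <assert.h>

/* Example hook implementation: derive a privilege level from a vCPU id. */
static int demo_get_cpl_impl(int vcpu_id)
{
    return vcpu_id & 3;
}

/* The hook is stored under a prefixed name, as at the verbose call sites. */
static int (*demo_x86_get_cpl)(int) = demo_get_cpl_impl;

/* Verbose form: the caller spells out the full prefixed name. */
#define demo_static_call(name) (*(name))

/* Short form: the wrapper pastes the prefix on, so call sites name only the hook. */
#define demo_x86_call(func) demo_static_call(demo_x86_##func)

static int demo_current_cpl(int vcpu_id)
{
    assert(vcpu_id >= 0);
    return demo_x86_call(get_cpl)(vcpu_id);
}
```

      Both demo_static_call(demo_x86_get_cpl)(id) and
      demo_x86_call(get_cpl)(id) expand to the same call; the second is
      shorter and keeps the hook name searchable without the prefix.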
      Signed-off-by: Wei Wang <wei.w.wang@intel.com>
      Link: https://lore.kernel.org/r/20240507133103.15052-3-wei.w.wang@intel.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Replace static_call_cond() with static_call() · f4854bf7
      Wei Wang authored
      The use of static_call_cond() is essentially the same as static_call() on
      x86 (e.g. static_call() now handles a NULL pointer as a NOP), so replace
      it with static_call() to simplify the code.
      
      Link: https://lore.kernel.org/all/3916caa1dcd114301a49beafa5030eca396745c1.1679456900.git.jpoimboe@kernel.org/
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Wei Wang <wei.w.wang@intel.com>
      Link: https://lore.kernel.org/r/20240507133103.15052-2-wei.w.wang@intel.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Merge branch 'kvm-6.11-sev-attestation' into HEAD · bc9cd5a2
      Paolo Bonzini authored
      The GHCB 2.0 specification defines 2 GHCB request types to allow SNP guests
      to send encrypted messages/requests to firmware: SNP Guest Requests and SNP
      Extended Guest Requests. These encrypted messages are used for things like
      servicing attestation requests issued by the guest. Implementing support for
      these is required to be fully GHCB-compliant.
      
      For the most part, KVM only needs to handle forwarding these requests to
      firmware (to be issued via the SNP_GUEST_REQUEST firmware command defined
      in the SEV-SNP Firmware ABI), and then forwarding the encrypted response to
      the guest.
      
      However, in the case of SNP Extended Guest Requests, the host is also
      able to provide the certificate data corresponding to the endorsement key
      used by firmware to sign attestation report requests. This certificate data
      is provided by userspace because:
      
        1) It allows for different keys/key types to be used for each particular
           guest without requiring any sort of KVM API to configure the certificate
           table in advance on a per-guest basis.
      
        2) It provides additional flexibility with how attestation requests might
           be handled during live migration where the certificate data for
           source/dest might be different.
      
        3) It allows all synchronization between certificates and firmware/signing
           key updates to be handled purely by userspace rather than requiring
           some in-kernel mechanism to facilitate it. [1]
      
      To support fetching certificate data from userspace, a new KVM exit type will
      be needed. An attempt to define a new KVM_EXIT_COCO/KVM_EXIT_COCO_REQ_CERTS
      exit type to handle this was introduced in v1 of this patchset, but is still
      being discussed by the community, so for now this patchset only implements a
      stub version of SNP Extended Guest Requests that does not provide certificate
      data, but is still enough to provide compliance with the GHCB 2.0 spec.
    • KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event · 74458e48
      Michael Roth authored
      Version 2 of the GHCB specification added support for the SNP Extended Guest
      Request Message NAE event. This event serves a nearly identical purpose
      to the previously-added SNP_GUEST_REQUEST event, but for certain message
      types it allows the guest to supply a buffer to be used for additional
      information.
      
      Currently the GHCB spec only defines extended handling of this sort in
      the case of attestation requests, where the additional buffer is used to
      supply a table of certificate data corresponding to the attestation
      report's signing key. Support for this extended handling will require
      additional KVM APIs to handle coordinating with userspace.
      
      Whether or not the hypervisor opts to provide this certificate data is
      optional. However, support for processing SNP_EXTENDED_GUEST_REQUEST
      GHCB requests is required by the GHCB 2.0 specification for SNP guests,
      so for now implement a stub implementation that provides an empty
      certificate table to the guest if it supplies an additional buffer, but
      otherwise behaves identically to SNP_GUEST_REQUEST.
      Reviewed-by: Carlos Bilbao <carlos.bilbao.osdev@gmail.com>
      Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
      Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
      Signed-off-by: Michael Roth <michael.roth@amd.com>
      Message-ID: <20240701223148.3798365-4-michael.roth@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86/sev: Move sev_guest.h into common SEV header · f55f3c3a
      Michael Roth authored
      sev_guest.h currently contains various definitions relating to the
      format of SNP_GUEST_REQUEST commands to SNP firmware. Currently only the
      sev-guest driver makes use of them, but when the KVM side of this is
      implemented there's a need to parse the SNP_GUEST_REQUEST header to
      determine whether additional information needs to be provided to the
      guest. Prepare for this by moving those definitions to a common header
      that's shared by host/guest code so that KVM can also make use of them.
      Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
      Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
      Signed-off-by: Michael Roth <michael.roth@amd.com>
      Message-ID: <20240701223148.3798365-3-michael.roth@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event · 88caf544
      Brijesh Singh authored
      Version 2 of GHCB specification added support for the SNP Guest Request
      Message NAE event. The event allows for an SEV-SNP guest to make
      requests to the SEV-SNP firmware through the hypervisor using the
      SNP_GUEST_REQUEST API defined in the SEV-SNP firmware specification.
      
      This is used by guests primarily to request attestation reports from
      firmware. Other request types are available as well, but the
      specifics of which guest requests are being made generally do not
      affect how they are handled by the hypervisor, which only serves as a
      proxy for the guest requests and firmware responses.
      
      Implement handling for these events.
      
      When an SNP Guest Request is issued, the guest will provide its own
      request/response pages, which could in theory be passed along directly
      to firmware. However, these pages would need special care:
      
        - Both pages are from shared guest memory, so they need to be
          protected from migration/etc. occurring while firmware reads/writes
          to them. At a minimum, this requires elevating the ref counts and
          potentially needing an explicit pinning of the memory. This places
          additional restrictions on what type of memory backends userspace
          can use for shared guest memory since there would be some reliance
          on using refcounted pages.
      
        - The response page needs to be switched to Firmware-owned state
          before the firmware can write to it, which can lead to potential
          host RMP #PFs if the guest is misbehaved and hands the host a
          guest page that KVM is writing to for other reasons (e.g. virtio
          buffers).
      
      Both of these issues can be avoided completely by using
      separately-allocated bounce pages for both the request/response pages
      and passing those to firmware instead. So that's the approach taken
      here.
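
      A simplified userspace model of the bounce-buffer approach. The names
      demo_guest_request() and demo_fw_invert() are hypothetical, the
      "firmware" is an ordinary callback, and allocating the bounce pages on
      every call is a simplification made for brevity rather than a claim
      about how the patch manages them.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096

typedef int (*demo_fw_t)(const unsigned char *req, unsigned char *resp);

/* Example "firmware": writes the bitwise inverse of the request. */
static int demo_fw_invert(const unsigned char *req, unsigned char *resp)
{
    for (size_t i = 0; i < DEMO_PAGE_SIZE; i++)
        resp[i] = (unsigned char)(req[i] ^ 0xFF);
    return 0;
}

/*
 * Proxy one request: firmware only ever touches the host-owned bounce
 * pages, never the guest-provided request/response pages themselves.
 */
static int demo_guest_request(const unsigned char *guest_req,
                              unsigned char *guest_resp, demo_fw_t fw)
{
    unsigned char *req = malloc(DEMO_PAGE_SIZE);
    unsigned char *resp = calloc(1, DEMO_PAGE_SIZE);
    int ret = -ENOMEM;

    assert(guest_req && guest_resp && fw);
    if (req && resp) {
        memcpy(req, guest_req, DEMO_PAGE_SIZE);        /* copy request in */
        ret = fw(req, resp);
        if (!ret)
            memcpy(guest_resp, resp, DEMO_PAGE_SIZE);  /* copy response out */
    }
    free(req);
    free(resp);
    return ret;
}
```

      Since the guest pages are only read and written by plain copies, no
      pinning or page-state change is needed on them in this model.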
      Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
      Co-developed-by: Alexey Kardashevskiy <aik@amd.com>
      Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
      Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
      Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
      Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
      Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
      [mdr: ensure FW command failures are indicated to guest, drop extended
       request handling to be re-written as separate patch, massage commit]
      Signed-off-by: Michael Roth <michael.roth@amd.com>
      Message-ID: <20240701223148.3798365-2-michael.roth@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Suppress MMIO that is triggered during task switch emulation · 2a1fc7dc
      Sean Christopherson authored
      Explicitly suppress userspace emulated MMIO exits that are triggered when
      emulating a task switch as KVM doesn't support userspace MMIO during
      complex (multi-step) emulation.  Silently ignoring the exit request can
      result in the WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to
      userspace for some other reason prior to purging mmio_needed.
      
      See commit 0dc90226 ("KVM: x86: Suppress pending MMIO write exits if
      emulator detects exception") for more details on KVM's limitations with
      respect to emulated MMIO during complex emulator flows.
      
      Reported-by: syzbot+2fb9f8ed752c01bc9a3f@syzkaller.appspotmail.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-ID: <20240712144841.1230591-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Clean up make_huge_page_split_spte() definition and intro · 9fe17d2a
      Sean Christopherson authored
      Tweak the definition of make_huge_page_split_spte() to eliminate an
      unnecessarily long line, and opportunistically initialize child_spte to
      make it more obvious that the child is directly derived from the huge
      parent.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-ID: <20240712151335.1242633-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Bug the VM if KVM tries to split a !hugepage SPTE · 3d4415ed
      Sean Christopherson authored
      Bug the VM instead of simply warning if KVM tries to split a SPTE that is
      non-present or not-huge.  KVM is guaranteed to end up in a broken state as
      the callers fully expect a valid SPTE, e.g. the shadow MMU will add an
      rmap entry, and all MMUs will account the expected small page.  Returning
      '0' is also technically wrong now that SHADOW_NONPRESENT_VALUE exists,
      i.e. would cause KVM to create a potential #VE SPTE.
      
      While it would be possible to have the callers gracefully handle failure,
      doing so would provide no practical value as the scenario really should be
      impossible, while the error handling would add a non-trivial amount of
      noise.
      
      Fixes: a3fe5dbd ("KVM: x86/mmu: Split huge pages mapped by the TDP MMU when dirty logging is enabled")
      Cc: David Matlack <dmatlack@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-ID: <20240712151335.1242633-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Merge tag 'kvm-x86-vmx-6.11' of https://github.com/kvm-x86/linux into HEAD · 208a352a
      Paolo Bonzini authored
      KVM VMX changes for 6.11
      
       - Remove an unnecessary EPT TLB flush when enabling hardware.
      
       - Fix a series of bugs that cause KVM to fail to detect nested pending posted
         interrupts as valid wake events for a vCPU executing HLT in L2 (with
         HLT-exiting disabled by L1).
      
       - Misc cleanups
    • Merge tag 'kvm-x86-svm-6.11' of https://github.com/kvm-x86/linux into HEAD · 1229cbef
      Paolo Bonzini authored
      KVM SVM changes for 6.11
      
       - Make per-CPU save_area allocations NUMA-aware.
      
       - Force sev_es_host_save_area() to be inlined to avoid calling into an
         instrumentable function from noinstr code.
    • Merge tag 'kvm-x86-selftests-6.11' of https://github.com/kvm-x86/linux into HEAD · dbfd50cb
      Paolo Bonzini authored
      KVM selftests for 6.11
      
       - Remove dead code in the memslot modification stress test.
      
       - Treat "branch instructions retired" as supported on all AMD Family 17h+ CPUs.
      
       - Print the guest pseudo-RNG seed only when it changes, to avoid spamming the
         log for tests that create lots of VMs.
      
       - Make the PMU counters test less flaky when counting LLC cache misses by
         doing CLFLUSH{OPT} in every loop iteration.
    • Merge tag 'kvm-x86-pmu-6.11' of https://github.com/kvm-x86/linux into HEAD · cda231cd
      Paolo Bonzini authored
      KVM x86/pmu changes for 6.11
      
       - Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as it reads
         '0' and writes from userspace are ignored.
      
       - Update to the newfangled Intel CPU FMS infrastructure.
      
       - Use macros instead of open-coded literals to clean up KVM's manipulation of
         FIXED_CTR_CTRL MSRs.
      cda231cd
    • Paolo Bonzini's avatar
      Merge tag 'kvm-x86-mtrrs-6.11' of https://github.com/kvm-x86/linux into HEAD · 5c5ddf71
      Paolo Bonzini authored
      KVM x86 MTRR virtualization removal
      
      Remove support for virtualizing MTRRs on Intel CPUs, along with a nasty CR0.CD
      hack, and instead always honor guest PAT on CPUs that support self-snoop.
      5c5ddf71
    • Paolo Bonzini's avatar
      Merge tag 'kvm-x86-mmu-6.11' of https://github.com/kvm-x86/linux into HEAD · 34b69ede
      Paolo Bonzini authored
      KVM x86 MMU changes for 6.11
      
       - Don't allocate kvm_mmu_page.shadowed_translation for shadow pages that can't
         hold leaf SPTEs.
      
       - Unconditionally drop mmu_lock when allocating TDP MMU page tables for eager
         page splitting to avoid stalling vCPUs when splitting huge pages.
      
       - Misc cleanups
      34b69ede
    • Paolo Bonzini's avatar
      Merge tag 'kvm-x86-misc-6.11' of https://github.com/kvm-x86/linux into HEAD · 5dcc1e76
      Paolo Bonzini authored
      KVM x86 misc changes for 6.11
      
       - Add a global struct to consolidate tracking of host values, e.g. EFER, and
         move "shadow_phys_bits" into the structure as "maxphyaddr".
      
       - Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC
         bus frequency, because TDX.
      
       - Print the name of the APICv/AVIC inhibits in the relevant tracepoint.
      
       - Clean up KVM's handling of vendor specific emulation to consistently act on
         "compatible with Intel/AMD", versus checking for a specific vendor.
      
       - Misc cleanups
      5dcc1e76
    • Paolo Bonzini's avatar
      Merge tag 'kvm-x86-generic-6.11' of https://github.com/kvm-x86/linux into HEAD · 86014c1e
      Paolo Bonzini authored
      KVM generic changes for 6.11
      
       - Enable halt poll shrinking by default, as Intel found it to be a clear win.
      
       - Set up empty IRQ routing when creating a VM to avoid having to synchronize
         SRCU when creating a split IRQCHIP on x86.
      
       - Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag
         that arch code can use for hooking both sched_in() and sched_out().
      
       - Take the vCPU @id as an "unsigned long" instead of "u32" to avoid
         truncating a bogus value from userspace, e.g. to help userspace detect bugs.
      
       - Mark a vCPU as preempted if and only if it's scheduled out while in the
         KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest
         memory when retrieving guest state during live migration blackout.
      
       - A few minor cleanups
      86014c1e
    • Paolo Bonzini's avatar
      Merge tag 'kvm-x86-fixes-6.10-11' of https://github.com/kvm-x86/linux into HEAD · f4501e8b
      Paolo Bonzini authored
      KVM Xen:
      
      Fix a bug where KVM fails to check the validity of an incoming userspace
      virtual address and tries to activate a gfn_to_pfn_cache with a kernel address.
      f4501e8b
    • Paolo Bonzini's avatar
      Merge tag 'kvmarm-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD · 1c5a0b55
      Paolo Bonzini authored
      KVM/arm64 changes for 6.11
      
       - Initial infrastructure for shadow stage-2 MMUs, as part of nested
         virtualization enablement
      
       - Support for userspace changes to the guest CTR_EL0 value, enabling
         (in part) migration of VMs between heterogeneous hardware
      
       - Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1 of
         the protocol
      
       - FPSIMD/SVE support for nested, including merged trap configuration
         and exception routing
      
       - New command-line parameter to control the WFx trap behavior under KVM
      
       - Introduce kCFI hardening in the EL2 hypervisor
      
       - Fixes + cleanups for handling presence/absence of FEAT_TCRX
      
       - Miscellaneous fixes + documentation updates
      1c5a0b55
  4. 14 Jul, 2024 3 commits
    • Oliver Upton's avatar
      Merge branch kvm-arm64/docs into kvmarm/next · bb032b23
      Oliver Upton authored
      * kvm-arm64/docs:
        : KVM Documentation fixes, courtesy of Changyuan Lyu
        :
        : Small set of typo fixes / corrections to the KVM API documentation
        : relating to MSIs and arm64 VGIC UAPI.
        MAINTAINERS: Include documentation in KVM/arm64 entry
        KVM: Documentation: Correct the VGIC V2 CPU interface addr space size
        KVM: Documentation: Enumerate allowed value macros of `irq_type`
        KVM: Documentation: Fix typo `BFD`
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
      bb032b23
    • Oliver Upton's avatar
      Merge branch kvm-arm64/nv-tcr2 into kvmarm/next · bc2e3253
      Oliver Upton authored
      * kvm-arm64/nv-tcr2:
        : Fixes to the handling of TCR_EL1, courtesy of Marc Zyngier
        :
        : Series addresses a couple gaps that are present in KVM (from cover
        : letter):
        :
        :   - VM configuration: HCRX_EL2.TCR2En is forced to 1, and we blindly
        :     save/restore stuff.
        :
        :   - trap bit description and routing: none, obviously, since we make a
        :     point in not trapping.
        KVM: arm64: Honor trap routing for TCR2_EL1
        KVM: arm64: Make PIR{,E0}_EL1 save/restore conditional on FEAT_TCRX
        KVM: arm64: Make TCR2_EL1 save/restore dependent on the VM features
        KVM: arm64: Get rid of HCRX_GUEST_FLAGS
        KVM: arm64: Correctly honor the presence of FEAT_TCRX
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
      bc2e3253
    • Oliver Upton's avatar
      Merge branch kvm-arm64/nv-sve into kvmarm/next · 8c2899e7
      Oliver Upton authored
      * kvm-arm64/nv-sve:
        : CPTR_EL2, FPSIMD/SVE support for nested
        :
        : This series brings support for honoring the guest hypervisor's CPTR_EL2
        : trap configuration when running a nested guest, along with support for
        : FPSIMD/SVE usage at L1 and L2.
        KVM: arm64: Allow the use of SVE+NV
        KVM: arm64: nv: Add additional trap setup for CPTR_EL2
        KVM: arm64: nv: Add trap description for CPTR_EL2
        KVM: arm64: nv: Add TCPAC/TTA to CPTR->CPACR conversion helper
        KVM: arm64: nv: Honor guest hypervisor's FP/SVE traps in CPTR_EL2
        KVM: arm64: nv: Load guest FP state for ZCR_EL2 trap
        KVM: arm64: nv: Handle CPACR_EL1 traps
        KVM: arm64: Spin off helper for programming CPTR traps
        KVM: arm64: nv: Ensure correct VL is loaded before saving SVE state
        KVM: arm64: nv: Use guest hypervisor's max VL when running nested guest
        KVM: arm64: nv: Save guest's ZCR_EL2 when in hyp context
        KVM: arm64: nv: Load guest hyp's ZCR into EL1 state
        KVM: arm64: nv: Handle ZCR_EL2 traps
        KVM: arm64: nv: Forward SVE traps to guest hypervisor
        KVM: arm64: nv: Forward FP/ASIMD traps to guest hypervisor
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
      8c2899e7