1. 14 Nov, 2023 5 commits
      KVM: x86/mmu: Handle page fault for private memory · 8dd2eee9
      Chao Peng authored
      Add support for resolving page faults on guest private memory for VMs
      that differentiate between "shared" and "private" memory.  For such VMs,
      KVM_MEM_GUEST_MEMFD memslots can include both fd-based private memory and
      hva-based shared memory, and KVM needs to map in the "correct" variant,
      i.e. KVM needs to map the gfn shared/private as appropriate based on the
      current state of the gfn's KVM_MEMORY_ATTRIBUTE_PRIVATE flag.
      
      For AMD's SEV-SNP and Intel's TDX, the guest effectively gets to request
      shared vs. private via a bit in the guest page tables, i.e. what the guest
      wants may conflict with the current memory attributes.  To support such
      "implicit" conversion requests, exit to user with KVM_EXIT_MEMORY_FAULT
      to forward the request to userspace.  Add a new flag for memory faults,
      KVM_MEMORY_EXIT_FLAG_PRIVATE, to communicate whether the guest wants to
      map memory as shared vs. private.
      
      Like KVM_MEMORY_ATTRIBUTE_PRIVATE, use bit 3 for flagging private memory
      so that KVM can use bits 0-2 for capturing RWX behavior if/when userspace
      needs such information, e.g. a likely user of KVM_EXIT_MEMORY_FAULT is to
      exit on missing mappings when handling guest page fault VM-Exits.  In
      that case, userspace will want to know RWX information in order to
      correctly/precisely resolve the fault.
      
      Note, private memory *must* be backed by guest_memfd, i.e. shared mappings
      always come from the host userspace page tables, and private mappings
      always come from a guest_memfd instance.
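A minimal C sketch of the fault-resolution decision described above, assuming 4 KiB pages and a flat per-gfn attribute array. All `sketch_*` names and structures are hypothetical stand-ins, not KVM's internal API; only the use of bit 3 for the private flag comes from the text. A shared access to a shared gfn maps from the host userspace page tables, a private access to a private gfn maps from guest_memfd, and any mismatch becomes a memory-fault exit that tells userspace which variant the guest wanted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 3 flags a private access; bits 0-2 stay free for RWX information. */
#define SKETCH_MEMORY_EXIT_FLAG_PRIVATE (1ULL << 3)

#define SKETCH_NR_GFNS    16
#define SKETCH_PAGE_SHIFT 12 /* assumes 4 KiB pages */

/* Hypothetical per-VM state: the PRIVATE attribute of each gfn. */
struct sketch_vm {
	bool attr_private[SKETCH_NR_GFNS];
};

/* Simplified stand-in for the memory-fault exit payload. */
struct sketch_memory_fault {
	uint64_t flags;
	uint64_t gpa;
	uint64_t size;
};

enum sketch_fault_result {
	SKETCH_MAP_FROM_HVA,         /* shared: host userspace page tables */
	SKETCH_MAP_FROM_GUEST_MEMFD, /* private: the guest_memfd instance */
	SKETCH_EXIT_MEMORY_FAULT,    /* mismatch: forward to userspace */
};

enum sketch_fault_result
sketch_resolve_fault(const struct sketch_vm *vm, uint64_t gfn,
		     bool guest_wants_private, struct sketch_memory_fault *fault)
{
	assert(gfn < SKETCH_NR_GFNS);

	if (guest_wants_private != vm->attr_private[gfn]) {
		/* Implicit conversion request: report it and let userspace decide. */
		fault->flags = guest_wants_private ? SKETCH_MEMORY_EXIT_FLAG_PRIVATE : 0;
		fault->gpa = gfn << SKETCH_PAGE_SHIFT;
		fault->size = 1ULL << SKETCH_PAGE_SHIFT;
		return SKETCH_EXIT_MEMORY_FAULT;
	}

	/* Attributes agree: pick the backing source that matches. */
	return guest_wants_private ? SKETCH_MAP_FROM_GUEST_MEMFD
				   : SKETCH_MAP_FROM_HVA;
}
```

In this model, userspace handling the exit would be expected to convert the reported range (or treat the request as an error) and then re-enter the guest; that step is outside the sketch.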
      Co-developed-by: Yu Zhang <yu.c.zhang@linux.intel.com>
      Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
      Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
      Co-developed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Fuad Tabba <tabba@google.com>
      Tested-by: Fuad Tabba <tabba@google.com>
      Message-Id: <20231027182217.3615211-21-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      KVM: x86: Disallow hugepages when memory attributes are mixed · 90b4fe17
      Chao Peng authored
      Disallow creating hugepages with mixed memory attributes, e.g. shared
      versus private, as mapping a hugepage in this case would allow the guest
      to access memory with the wrong attributes, e.g. overlaying private memory
      with a shared hugepage.
      
      Track whether or not attributes are mixed via the existing
      disallow_lpage field, but use the most significant bit in 'disallow_lpage'
      to indicate a hugepage has mixed attributes instead of using the normal
      refcounting.  Whether or not attributes are mixed is binary; either they
      are or they aren't.  Attempting to squeeze that info into the refcount is
      unnecessarily complex as it would require knowing the previous state of
      the mixed count when updating attributes.  Using a flag means KVM just
      needs to ensure the current status is reflected in the memslots.
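A minimal sketch of the flag-plus-refcount encoding, assuming a 32-bit counter; the type, macro, and function names are hypothetical. Because the mixed state is a single bit, setting it twice or clearing it when it is already clear is harmless, so the updater never needs to know the previous state. A hugepage is allowed only when both the refcount and the flag are zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name: the most significant bit marks "attributes are mixed". */
#define SKETCH_LPAGE_MIXED_FLAG (1u << 31)

struct sketch_lpage_info {
	uint32_t disallow_lpage; /* bits 0-30: refcount, bit 31: mixed flag */
};

void sketch_lpage_get(struct sketch_lpage_info *info)
{
	/* Keep the refcount from carrying into the flag bit. */
	assert((info->disallow_lpage & ~SKETCH_LPAGE_MIXED_FLAG) < ~SKETCH_LPAGE_MIXED_FLAG);
	info->disallow_lpage++;
}

void sketch_lpage_put(struct sketch_lpage_info *info)
{
	assert((info->disallow_lpage & ~SKETCH_LPAGE_MIXED_FLAG) != 0);
	info->disallow_lpage--;
}

/* Binary state: just reflect the current status, no history needed. */
void sketch_lpage_set_mixed(struct sketch_lpage_info *info, bool mixed)
{
	if (mixed)
		info->disallow_lpage |= SKETCH_LPAGE_MIXED_FLAG;
	else
		info->disallow_lpage &= ~SKETCH_LPAGE_MIXED_FLAG;
}

bool sketch_hugepage_allowed(const struct sketch_lpage_info *info)
{
	return info->disallow_lpage == 0;
}
```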
      Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
      Co-developed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20231027182217.3615211-20-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      KVM: x86: "Reset" vcpu->run->exit_reason early in KVM_RUN · ee605e31
      Sean Christopherson authored
      Initialize run->exit_reason to KVM_EXIT_UNKNOWN early in KVM_RUN to reduce
      the probability of exiting to userspace with a stale run->exit_reason that
      *appears* to be valid.
      
      To support fd-based guest memory (guest memory without a corresponding
      userspace virtual address), KVM will exit to userspace for various memory
      related errors, which userspace *may* be able to resolve, instead of using
      e.g. BUS_MCEERR_AR.  And in the more distant future, KVM will also likely
      utilize the same functionality to let userspace "intercept" and handle
      memory faults when the userspace mapping is missing, i.e. when fast gup()
      fails.
      
      Because many of KVM's internal APIs related to guest memory use '0' to
      indicate "success, continue on" and not "exit to userspace", reporting
      memory faults/errors to userspace will set run->exit_reason and
      corresponding fields in the run structure in conjunction with a
      non-zero, negative return code, e.g. -EFAULT or -EHWPOISON.  And because
      KVM already returns -EFAULT in many paths, there's a relatively high
      probability that KVM could return -EFAULT without setting run->exit_reason,
      in which case reporting KVM_EXIT_UNKNOWN is much better than reporting
      whatever exit reason happened to be in the run structure.
      
      Note, KVM must wait until after run->immediate_exit is serviced to
      sanitize run->exit_reason as KVM's ABI is that run->exit_reason is
      preserved across KVM_RUN when run->immediate_exit is true.
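A simplified model of the ordering described above. The struct, the enum values, and the callback type are hypothetical, and returning -EINTR on immediate_exit is an assumption of this sketch; the point is only that exit_reason is reset after the immediate_exit check and before any path that can return -EFAULT.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative exit reasons; the numeric values here are arbitrary. */
enum sketch_exit_reason {
	SKETCH_EXIT_UNKNOWN,
	SKETCH_EXIT_IO,
	SKETCH_EXIT_MEMORY_FAULT,
};

struct sketch_run {
	bool immediate_exit;
	enum sketch_exit_reason exit_reason;
};

/* Stand-in for the guest-entry path: returns 0 or a negative errno. */
typedef int (*sketch_enter_guest_fn)(struct sketch_run *run);

int sketch_kvm_run(struct sketch_run *run, sketch_enter_guest_fn enter_guest)
{
	assert(run && enter_guest);

	/* ABI: exit_reason is preserved when immediate_exit is set. */
	if (run->immediate_exit)
		return -EINTR;

	/* Sanitize now, so a forgotten update reads as "unknown", not stale. */
	run->exit_reason = SKETCH_EXIT_UNKNOWN;
	return enter_guest(run);
}

/* Example paths: one reports the fault properly, one forgets to. */
int sketch_path_reports_fault(struct sketch_run *run)
{
	run->exit_reason = SKETCH_EXIT_MEMORY_FAULT;
	return -EFAULT;
}

int sketch_path_forgets_reason(struct sketch_run *run)
{
	(void)run;
	return -EFAULT;
}
```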
      
      Link: https://lore.kernel.org/all/20230908222905.1321305-1-amoorthy@google.com
      Link: https://lore.kernel.org/all/ZFFbwOXZ5uI%2Fgdaf@google.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Fuad Tabba <tabba@google.com>
      Tested-by: Fuad Tabba <tabba@google.com>
      Message-Id: <20231027182217.3615211-19-seanjc@google.com>
      Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory · a7800aa8
      Sean Christopherson authored
      Introduce an ioctl(), KVM_CREATE_GUEST_MEMFD, to allow creating file-based
      memory that is tied to a specific KVM virtual machine and whose primary
      purpose is to serve guest memory.
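A hedged userspace sketch of calling the new ioctl on a VM file descriptor. It assumes the `struct kvm_create_guest_memfd` layout (size, flags, reserved fields) from later `<linux/kvm.h>` headers and a page-aligned size (4 KiB pages assumed); when the installed headers predate the ioctl, the helper `create_guest_memfd()`, a hypothetical name, simply fails with ENOTSUP.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

#ifdef __has_include
#if __has_include(<linux/kvm.h>)
#include <linux/kvm.h>
#endif
#endif

/*
 * Create a guest_memfd of @size bytes for the VM behind @vm_fd.
 * Returns the new file descriptor, or -1 with errno set.
 */
int create_guest_memfd(int vm_fd, uint64_t size)
{
	assert(size % 4096 == 0); /* assumption: 4 KiB pages, page-aligned size */

#ifdef KVM_CREATE_GUEST_MEMFD
	struct kvm_create_guest_memfd gmem;

	memset(&gmem, 0, sizeof(gmem)); /* flags and reserved fields stay zero */
	gmem.size = size;
	return ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);
#else
	(void)vm_fd;
	errno = ENOTSUP; /* installed headers predate guest_memfd */
	return -1;
#endif
}
```

In the merged UAPI the returned descriptor is bound to a memslot through KVM_SET_USER_MEMORY_REGION2 with the KVM_MEM_GUEST_MEMFD flag; check the API documentation of the kernel you target before relying on exact field names.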
      
      A guest-first memory subsystem allows for optimizations and enhancements
      that are kludgy or outright infeasible to implement/support in a generic
      memory subsystem.  With guest_memfd, guest protections and mapping sizes
      are fully decoupled from host userspace mappings.  E.g. KVM currently
      doesn't support mapping memory as writable in the guest without it also
      being writable in host userspace, as KVM's ABI uses VMA protections to
      define the allowed guest protection.  Userspace can fudge this by
      establishing two mappings, a writable mapping for the guest and readable
      one for itself, but that’s suboptimal on multiple fronts.
      
      Similarly, KVM currently requires the guest mapping size to be a strict
      subset of the host userspace mapping size, e.g. KVM doesn’t support
      creating a 1GiB guest mapping unless userspace also has a 1GiB
      mapping.  Decoupling the mapping sizes would allow userspace to precisely
      map only what is needed without impacting guest performance, e.g. to
      harden against unintentional accesses to guest memory.
      
      Decoupling guest and userspace mappings may also allow for a cleaner
      alternative to high-granularity mappings for HugeTLB, which has reached a
      bit of an impasse and is unlikely to ever be merged.
      
      A guest-first memory subsystem also provides a clearer line of sight to
      things like a dedicated memory pool (for slice-of-hardware VMs) and
      elimination of "struct page" (for offload setups where userspace _never_
      needs to mmap() guest memory).
      
      More immediately, being able to map memory into KVM guests without mapping
      said memory into the host is critical for Confidential VMs (CoCo VMs), the
      initial use case for guest_memfd.  While AMD's SEV and Intel's TDX prevent
      untrusted software from reading guest private data by encrypting guest
      memory with a key that isn't usable by the untrusted host, projects such
      as Protected KVM (pKVM) provide confidentiality and integrity *without*
      relying on memory encryption.  And with SEV-SNP and TDX, accessing guest
      private memory can be fatal to the host, i.e. KVM must prevent host
      userspace from accessing guest memory irrespective of hardware behavior.
      
      Attempt #1 to support CoCo VMs was to add a VMA flag to mark memory as
      being mappable only by KVM (or a similarly enlightened kernel subsystem).
      That approach was abandoned largely due to it needing to play games with
      PROT_NONE to prevent userspace from accessing guest memory.
      
      Attempt #2 was to usurp PG_hwpoison to prevent the host from mapping
      guest private memory into userspace, but that approach failed to meet
      several requirements for software-based CoCo VMs, e.g. pKVM, as the kernel
      wouldn't easily be able to enforce a 1:1 page:guest association, let alone
      a 1:1 pfn:gfn mapping.  And using PG_hwpoison does not work for memory
      that isn't backed by 'struct page', e.g. if devices gain support for
      exposing encrypted memory regions to guests.
      
      Attempt #3 was to extend the memfd() syscall and wrap shmem to provide
      dedicated file-based guest memory.  That approach made it as far as v10
      before feedback from Hugh Dickins and Christian Brauner (and others) led
      to its demise.
      
      Hugh's objection was that piggybacking shmem made no sense for KVM's use
      case as KVM didn't actually *want* the features provided by shmem.  I.e.
      KVM was using memfd() and shmem to avoid having to manage memory directly,
      not because memfd() and shmem were the optimal solution, e.g. things like
      read/write/mmap in shmem were dead weight.
      
      Christian pointed out flaws with implementing a partial overlay (wrapping
      only _some_ of shmem), e.g. poking at inode_operations or super_operations
      would show shmem stuff, but address_space_operations and file_operations
      would show KVM's overlay.  Paraphrasing heavily, Christian suggested KVM
      stop being lazy and create a proper API.
      
      Link: https://lore.kernel.org/all/20201020061859.18385-1-kirill.shutemov@linux.intel.com
      Link: https://lore.kernel.org/all/20210416154106.23721-1-kirill.shutemov@linux.intel.com
      Link: https://lore.kernel.org/all/20210824005248.200037-1-seanjc@google.com
      Link: https://lore.kernel.org/all/20211111141352.26311-1-chao.p.peng@linux.intel.com
      Link: https://lore.kernel.org/all/20221202061347.1070246-1-chao.p.peng@linux.intel.com
      Link: https://lore.kernel.org/all/ff5c5b97-acdf-9745-ebe5-c6609dd6322e@google.com
      Link: https://lore.kernel.org/all/20230418-anfallen-irdisch-6993a61be10b@brauner
      Link: https://lore.kernel.org/all/ZEM5Zq8oo+xnApW9@google.com
      Link: https://lore.kernel.org/linux-mm/20230306191944.GA15773@monkey
      Link: https://lore.kernel.org/linux-mm/ZII1p8ZHlHaQ3dDl@casper.infradead.org
      Cc: Fuad Tabba <tabba@google.com>
      Cc: Vishal Annapurve <vannapurve@google.com>
      Cc: Ackerley Tng <ackerleytng@google.com>
      Cc: Jarkko Sakkinen <jarkko@kernel.org>
      Cc: Maciej Szmigiero <mail@maciej.szmigiero.name>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Quentin Perret <qperret@google.com>
      Cc: Michael Roth <michael.roth@amd.com>
      Cc: Wang <wei.w.wang@intel.com>
      Cc: Liam Merwick <liam.merwick@oracle.com>
      Cc: Isaku Yamahata <isaku.yamahata@gmail.com>
      Co-developed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Co-developed-by: Yu Zhang <yu.c.zhang@linux.intel.com>
      Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
      Co-developed-by: Chao Peng <chao.p.peng@linux.intel.com>
      Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
      Co-developed-by: Ackerley Tng <ackerleytng@google.com>
      Signed-off-by: Ackerley Tng <ackerleytng@google.com>
      Co-developed-by: Isaku Yamahata <isaku.yamahata@intel.com>
      Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
      Co-developed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Co-developed-by: Michael Roth <michael.roth@amd.com>
      Signed-off-by: Michael Roth <michael.roth@amd.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20231027182217.3615211-17-seanjc@google.com>
      Reviewed-by: Fuad Tabba <tabba@google.com>
      Tested-by: Fuad Tabba <tabba@google.com>
      Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      fs: Rename anon_inode_getfile_secure() and anon_inode_getfd_secure() · 4f0b9194
      Paolo Bonzini authored
      The call to the inode_init_security_anon() LSM hook is not the sole
      reason to use anon_inode_getfile_secure() or anon_inode_getfd_secure().
      For example, the functions also allow one to create a file with non-zero
      size, without needing a full-blown filesystem.  In this case, you don't
      need a "secure" version, just unique inodes; the current name of the
      functions is confusing and does not explain well the difference with
      the more "standard" anon_inode_getfile() and anon_inode_getfd().
      
      Of course, there is another side of the coin; neither io_uring nor
      userfaultfd strictly speaking need distinct inodes, and it is not
      that clear anymore that anon_inode_create_get{file,fd}() allow the LSM
      to intercept and block the inode's creation.  If one was so inclined,
      anon_inode_getfile_secure() and anon_inode_getfd_secure() could be kept,
      using the shared inode or a new one depending on CONFIG_SECURITY.
      However, this is probably overkill, and potentially a cause of bugs in
      different configurations.  Therefore, just add a comment to io_uring
      and userfaultfd explaining the choice of the function.
      
      While at it, remove the export for what is now anon_inode_create_getfd().
      There is no in-tree module that uses it, and the old name is gone anyway.
      If anybody actually needs the symbol, they can ask or they can just use
      anon_inode_create_getfile(), which will be exported very soon for use
      in KVM.
      Suggested-by: Christian Brauner <brauner@kernel.org>
      Reviewed-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 13 Nov, 2023 13 commits
  3. 08 Nov, 2023 1 commit
  4. 31 Oct, 2023 11 commits
      Merge tag 'kvmarm-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD · 45b890f7
      Paolo Bonzini authored
      KVM/arm64 updates for 6.7
      
       - Generalized infrastructure for 'writable' ID registers, effectively
         allowing userspace to opt out of certain vCPU features for its guest
      
       - Optimization for vSGI injection, opportunistically compressing MPIDR
         to vCPU mapping into a table
      
       - Improvements to KVM's PMU emulation, allowing userspace to select
         the number of PMCs available to a VM
      
       - Guest support for memory operation instructions (FEAT_MOPS)
      
       - Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing
         bugs and getting rid of useless code
      
       - Changes to the way the SMCCC filter is constructed, avoiding wasted
         memory allocations when not in use
      
       - Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing
         the overhead of errata mitigations
      
       - Miscellaneous kernel and selftest fixes
      Merge tag 'kvm-x86-svm-6.7' of https://github.com/kvm-x86/linux into HEAD · be479419
      Paolo Bonzini authored
      KVM SVM changes for 6.7:
      
       - Report KVM_EXIT_SHUTDOWN instead of EINVAL if KVM intercepts SHUTDOWN while
         running an SEV-ES guest.
      
       - Clean up handling "failures" when KVM detects it can't emulate the "skip"
         action for an instruction that has already been partially emulated.  Drop a
         hack in the SVM code that was fudging around the emulator code not giving
         SVM enough information to do the right thing.
      Merge tag 'kvm-x86-pmu-6.7' of https://github.com/kvm-x86/linux into HEAD · d5cde2e0
      Paolo Bonzini authored
      KVM PMU change for 6.7:
      
       - Handle NMI/SMI requests after PMU/PMI requests so that a PMI=>NMI doesn't
         require redoing the entire run loop due to the NMI not being detected until
         the final kvm_vcpu_exit_request() check before entering the guest.
      Merge tag 'kvm-x86-xen-6.7' of https://github.com/kvm-x86/linux into HEAD · e122d7a1
      Paolo Bonzini authored
      KVM x86 Xen changes for 6.7:
      
       - Omit "struct kvm_vcpu_xen" entirely when CONFIG_KVM_XEN=n.
      
       - Use the fast path directly from the timer callback when delivering Xen timer
         events.  Avoid the problematic races with using the fast path by ensuring
         the hrtimer isn't running when (re)starting the timer or saving the timer
         information (for userspace).
      
       - Follow the lead of upstream Xen and ignore the VCPU_SSHOTTMR_future flag.
      Merge tag 'kvm-x86-mmu-6.7' of https://github.com/kvm-x86/linux into HEAD · f0f59d06
      Paolo Bonzini authored
      KVM x86 MMU changes for 6.7:
      
       - Clean up code that deals with honoring guest MTRRs when the VM has
         non-coherent DMA and host MTRRs are ignored, i.e. EPT is enabled.
      
       - Zap EPT entries when non-coherent DMA assignment stops/starts to prevent
         using stale entries with the wrong memtype.
      
       - Don't ignore guest PAT for CR0.CD=1 && KVM_X86_QUIRK_CD_NW_CLEARED=y, as
         there's zero reason to ignore guest PAT if the effective MTRR memtype is WB.
         This will also allow for future optimizations of handling guest MTRR updates
         for VMs with non-coherent DMA and the quirk enabled.
      
       - Harden the fast page fault path to guard against encountering an invalid
         root when walking SPTEs.
      Merge tag 'kvm-x86-misc-6.7' of https://github.com/kvm-x86/linux into HEAD · f292dc8a
      Paolo Bonzini authored
      KVM x86 misc changes for 6.7:
      
       - Add CONFIG_KVM_MAX_NR_VCPUS to allow supporting up to 4096 vCPUs without
         forcing more common use cases to eat the extra memory overhead.
      
       - Add IBPB and SBPB virtualization support.
      
       - Fix a bug where restoring a vCPU snapshot that was taken within 1 second of
         creating the original vCPU would cause KVM to try to synchronize the vCPU's
         TSC and thus clobber the correct TSC being set by userspace.
      
       - Compute guest wall clock using a single TSC read to avoid generating an
         inaccurate time, e.g. if the vCPU is preempted between multiple TSC reads.
      
       - "Virtualize" HWCR.TscFreqSel to make Linux guests happy, as they complain
         about a "Firmware Bug" if the bit isn't set for select F/M/S combos.
      
       - Don't apply side effects to Hyper-V's synthetic timer on writes from
         userspace to fix an issue where the auto-enable behavior can trigger
         spurious interrupts, i.e. do auto-enabling only for guest writes.
      
       - Remove an unnecessary kick of all vCPUs when synchronizing the dirty log
         without PML enabled.
      
       - Advertise "support" for non-serializing FS/GS base MSR writes as appropriate.
      
       - Use octal notation for file permissions throughout KVM x86.
      
       - Fix a handful of typos and warts.
      Merge tag 'kvm-x86-docs-6.7' of https://github.com/kvm-x86/linux into HEAD · fadaf574
      Paolo Bonzini authored
      KVM x86 Documentation updates for 6.7:
      
       - Fix various typos, notably a confusing reference to the non-existent
         "struct kvm_vcpu_event" (the actual structure is kvm_vcpu_events, plural).
      
       - Update x86's kvm_mmu_page documentation to bring it closer to the code
         (this raced with the removal of async zapping and so the documentation is
         already stale; my bad).
      
       - Document the behavior of x86 PMU filters on fixed counters.
      Merge tag 'kvm-x86-apic-6.7' of https://github.com/kvm-x86/linux into HEAD · f2336467
      Paolo Bonzini authored
      KVM x86 APIC changes for 6.7:
      
       - Purge VMX's posted interrupt descriptor *before* loading APIC state when
         handling KVM_SET_LAPIC.  Purging the PID after loading APIC state results in
         lost APIC timer IRQs as the APIC timer can be armed as part of loading APIC
         state, i.e. can immediately pend an IRQ if the expiry is in the past.
      
       - Clear the ICR.BUSY bit when handling trap-like x2APIC writes.  This avoids a
         WARN, due to KVM expecting the BUSY bit to be cleared when sending IPIs.
      Merge tag 'kvm-s390-next-6.7-1' of... · 140139c5
      Paolo Bonzini authored
      Merge tag 'kvm-s390-next-6.7-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
      
      - nested page table management performance counters
      Merge tag 'kvm-riscv-6.7-1' of https://github.com/kvm-riscv/linux into HEAD · 957eedc7
      Paolo Bonzini authored
      KVM/riscv changes for 6.7
      
      - Smstateen and Zicond support for Guest/VM
      - Virtualized senvcfg CSR for Guest/VM
      - Added Smstateen registers to the get-reg-list selftests
      - Added Zicond to the get-reg-list selftests
      - Virtualized SBI debug console (DBCN) for Guest/VM
      - Added SBI debug console (DBCN) to the get-reg-list selftests
      Merge tag 'loongarch-kvm-6.7' of... · ef12ea62
      Paolo Bonzini authored
      Merge tag 'loongarch-kvm-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD
      
      LoongArch KVM changes for v6.7
      
      Add KVM support for LoongArch. The Loongson 3A5000/3A6000 support
      hardware-assisted virtualization. With CPU virtualization, there are
      separate hardware-supported user and kernel modes in guest mode. With
      memory virtualization, there are two-level hardware MMU tables for guest
      mode and host mode. There is also a separate hardware CPU timer with a
      constant frequency in guest mode, so that VMs can migrate between hosts
      with different frequencies. Currently, we are able to boot LoongArch
      Linux guests.
      
      A few key aspects of KVM LoongArch added by this series are:
      1. Enable the KVM hardware function when the KVM module is loaded.
      2. Implement the VM and vCPU ioctl interfaces, such as vCPU create,
         vCPU run, etc. The GET_ONE_REG/SET_ONE_REG ioctl commands are used
         to get general registers one by one.
      3. Hardware accesses to the MMU, timer and CSRs are emulated in the kernel.
      4. Devices accessed via MMIO and IOCSR, such as IPI, irqchips and PCI
         devices, are emulated in user space.
  5. 30 Oct, 2023 10 commits
      Merge branch kvm-arm64/pmu_pmcr_n into kvmarm/next · 123f42f0
      Oliver Upton authored
      * kvm-arm64/pmu_pmcr_n:
        : User-defined PMC limit, courtesy Raghavendra Rao Ananta
        :
        : Certain VMMs may want to reserve some PMCs for host use while running a
        : KVM guest. This was a bit difficult before, as KVM advertised all
        : supported counters to the guest. Userspace can now limit the number of
        : advertised PMCs by writing to PMCR_EL0.N, as KVM's sysreg and PMU
        : emulation enforce the specified limit for handling guest accesses.
        KVM: selftests: aarch64: vPMU test for validating user accesses
        KVM: selftests: aarch64: vPMU register test for unimplemented counters
        KVM: selftests: aarch64: vPMU register test for implemented counters
        KVM: selftests: aarch64: Introduce vpmu_counter_access test
        tools: Import arm_pmuv3.h
        KVM: arm64: PMU: Allow userspace to limit PMCR_EL0.N for the guest
        KVM: arm64: Sanitize PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} before first run
        KVM: arm64: Add {get,set}_user for PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}
        KVM: arm64: PMU: Set PMCR_EL0.N for vCPU based on the associated PMU
        KVM: arm64: PMU: Add a helper to read a vCPU's PMCR_EL0
        KVM: arm64: Select default PMU in KVM_ARM_VCPU_INIT handler
        KVM: arm64: PMU: Introduce helpers to set the guest's PMU
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
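A sketch of how a userspace-chosen counter limit could be validated and enforced, assuming the architectural PMCR_EL0.N field at bits [15:11]. The helper names are hypothetical, and the model ignores the cycle counter and the other PMCR_EL0 fields; it only shows that a write advertising more counters than the hardware PMU provides is rejected, and that event counters at or above N are then treated as unimplemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PMCR_EL0.N occupies bits [15:11]: the number of event counters. */
#define SKETCH_PMCR_N_SHIFT 11
#define SKETCH_PMCR_N_MASK  0x1fULL
#define SKETCH_PMCR_N_FIELD (SKETCH_PMCR_N_MASK << SKETCH_PMCR_N_SHIFT)

static inline unsigned int sketch_pmcr_n(uint64_t pmcr)
{
	return (unsigned int)((pmcr >> SKETCH_PMCR_N_SHIFT) & SKETCH_PMCR_N_MASK);
}

/* Accept a userspace N only if it does not exceed the hardware PMU's N. */
bool sketch_set_guest_pmcr_n(uint64_t *guest_pmcr, uint64_t user_val,
			     unsigned int host_n)
{
	assert(host_n <= SKETCH_PMCR_N_MASK);

	if (sketch_pmcr_n(user_val) > host_n)
		return false;

	/* Only the N field is taken from userspace in this model. */
	*guest_pmcr = (*guest_pmcr & ~SKETCH_PMCR_N_FIELD) |
		      (user_val & SKETCH_PMCR_N_FIELD);
	return true;
}

/* Guest accesses to event counters at or above N are unimplemented. */
bool sketch_counter_implemented(uint64_t guest_pmcr, unsigned int idx)
{
	return idx < sketch_pmcr_n(guest_pmcr);
}
```

Limiting N this way leaves the counters above the limit free for the host, which is the VMM use case the merge description gives.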
      Merge branch kvm-arm64/mops into kvmarm/next · 53ce49ea
      Oliver Upton authored
      * kvm-arm64/mops:
        : KVM support for MOPS, courtesy of Kristina Martsenko
        :
        : MOPS adds new instructions for accelerating memcpy(), memset(), and
        : memmove() operations in hardware. This series brings virtualization
        : support for KVM guests, and allows VMs to run on asymmetric systems
        : that may have different MOPS implementations.
        KVM: arm64: Expose MOPS instructions to guests
        KVM: arm64: Add handler for MOPS exceptions
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
      Merge branch kvm-arm64/writable-id-regs into kvmarm/next · a87a3643
      Oliver Upton authored
      * kvm-arm64/writable-id-regs:
        : Writable ID registers, courtesy of Jing Zhang
        :
        : This series significantly expands the architectural feature set that
        : userspace can manipulate via the ID registers. A new ioctl is defined
        : that makes the mutable fields in the ID registers discoverable to
        : userspace.
        KVM: selftests: Avoid using forced target for generating arm64 headers
        tools headers arm64: Fix references to top srcdir in Makefile
        KVM: arm64: selftests: Test for setting ID register from usersapce
        tools headers arm64: Update sysreg.h with kernel sources
        KVM: selftests: Generate sysreg-defs.h and add to include path
        perf build: Generate arm64's sysreg-defs.h and add to include path
        tools: arm64: Add a Makefile for generating sysreg-defs.h
        KVM: arm64: Document vCPU feature selection UAPIs
        KVM: arm64: Allow userspace to change ID_AA64ZFR0_EL1
        KVM: arm64: Allow userspace to change ID_AA64PFR0_EL1
        KVM: arm64: Allow userspace to change ID_AA64MMFR{0-2}_EL1
        KVM: arm64: Allow userspace to change ID_AA64ISAR{0-2}_EL1
        KVM: arm64: Bump up the default KVM sanitised debug version to v8p8
        KVM: arm64: Reject attempts to set invalid debug arch version
        KVM: arm64: Advertise selected DebugVer in DBGDIDR.Version
        KVM: arm64: Use guest ID register values for the sake of emulation
        KVM: arm64: Document KVM_ARM_GET_REG_WRITABLE_MASKS
        KVM: arm64: Allow userspace to get the writable masks for feature ID registers
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
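A minimal model of what a writable mask gives userspace, assuming an ID register made entirely of 4-bit unsigned fields where a lower value means fewer features. The helpers are hypothetical; KVM's real checks apply per-field rules (some ID fields are signed or follow different safe-value policies). Here a write is accepted only if it changes no bits outside the writable mask and no field exceeds the KVM-provided limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract the 4-bit ID register field starting at @shift. */
static inline unsigned int sketch_id_field(uint64_t reg, unsigned int shift)
{
	assert(shift % 4 == 0 && shift < 64);
	return (unsigned int)((reg >> shift) & 0xf);
}

/*
 * Hypothetical write check: @limit is KVM's default (host-supported) value,
 * @val the value userspace wants, @writable_mask the mutable bits.
 */
bool sketch_id_reg_write_ok(uint64_t limit, uint64_t val, uint64_t writable_mask)
{
	/* Bits outside the mask must keep KVM's value. */
	if ((limit ^ val) & ~writable_mask)
		return false;

	/* Each field may only advertise the same or fewer features. */
	for (unsigned int shift = 0; shift < 64; shift += 4)
		if (sketch_id_field(val, shift) > sketch_id_field(limit, shift))
			return false;

	return true;
}
```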
      KVM: selftests: Avoid using forced target for generating arm64 headers · 70c7b704
      Oliver Upton authored
      The 'prepare' target that generates the arm64 sysreg headers had no
      prerequisites, so it wound up forcing a rebuild of all KVM selftests
      each invocation. Add a rule for the generated headers and just have
      dependents use that for a prerequisite.
      Reported-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
      Fixes: 9697d84c ("KVM: selftests: Generate sysreg-defs.h and add to include path")
      Tested-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
      Link: https://lore.kernel.org/r/20231027005439.3142015-3-oliver.upton@linux.dev
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
      tools headers arm64: Fix references to top srcdir in Makefile · fbb075c1
      Oliver Upton authored
      Aishwarya reports that KVM selftests for arm64 fail with the following
      error:
      
       | make[4]: Entering directory '/tmp/kci/linux/tools/testing/selftests/kvm'
       | Makefile:270: warning: overriding recipe for target
       | '/tmp/kci/linux/build/kselftest/kvm/get-reg-list'
       | Makefile:265: warning: ignoring old recipe for target
       | '/tmp/kci/linux/build/kselftest/kvm/get-reg-list'
       | make -C ../../../../tools/arch/arm64/tools/
       | make[5]: Entering directory '/tmp/kci/linux/tools/arch/arm64/tools'
       | Makefile:10: ../tools/scripts/Makefile.include: No such file or directory
       | make[5]: *** No rule to make target '../tools/scripts/Makefile.include'.
       |  Stop.
      
      It would appear that this only affects builds from the top-level
      Makefile (e.g. make kselftest-all), as $(srctree) is set to ".". Work
      around the issue by shadowing the kselftest naming scheme for the source
      tree variable.
      Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
      Fixes: 0359c946 ("tools headers arm64: Update sysreg.h with kernel sources")
      Link: https://lore.kernel.org/r/20231027005439.3142015-2-oliver.upton@linux.dev
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
      Merge branch kvm-arm64/sgi-injection into kvmarm/next · 54b44ad2
      Oliver Upton authored
      * kvm-arm64/sgi-injection:
        : vSGI injection improvements + fixes, courtesy Marc Zyngier
        :
        : Avoid linearly searching for vSGI targets using a compressed MPIDR to
        : index a cache. While at it, fix some egregious bugs in KVM's mishandling
        : of vcpuid (user-controlled value) and vcpu_idx.
        KVM: arm64: Clarify the ordering requirements for vcpu/RD creation
        KVM: arm64: vgic-v3: Optimize affinity-based SGI injection
        KVM: arm64: Fast-track kvm_mpidr_to_vcpu() when mpidr_data is available
        KVM: arm64: Build MPIDR to vcpu index cache at runtime
        KVM: arm64: Simplify kvm_vcpu_get_mpidr_aff()
        KVM: arm64: Use vcpu_idx for invalidation tracking
        KVM: arm64: vgic: Use vcpu_idx for the debug information
        KVM: arm64: vgic-v2: Use cpuid from userspace as vcpu_id
        KVM: arm64: vgic-v3: Refactor GICv3 SGI generation
        KVM: arm64: vgic-its: Treat the collection target address as a vcpu_id
        KVM: arm64: vgic: Make kvm_vgic_inject_irq() take a vcpu pointer
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
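As an illustration of the cache described in this merge, the sketch below gathers the MPIDR affinity bits that differ between vCPUs into a dense index (a PEXT-style bit extract) and uses it in place of a linear scan. It is a standalone sketch under stated assumptions, not KVM's implementation: `struct mpidr_cache`, `compress_bits()`, `mpidr_cache_build()`, `mpidr_cache_lookup()` and the 16-bit table-size cap are hypothetical; only the affinity field layout (Aff3 in bits 39:32, Aff2 to Aff0 in bits 23:0) comes from the architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* MPIDR_EL1 affinity fields: Aff3 [39:32], Aff2 [23:16], Aff1 [15:8], Aff0 [7:0]. */
#define MPIDR_AFF_MASK 0xff00ffffffULL

/* Hypothetical cache layout, for illustration only. */
struct mpidr_cache {
	uint64_t mask;           /* affinity bits that differ between vCPUs */
	int *idx_of;             /* compressed index -> vCPU index, or -1 */
	const uint64_t *mpidr;   /* vCPU index -> MPIDR, used to verify hits */
};

/* Gather the bits of @val selected by @mask into the low bits (PEXT-like). */
static uint64_t compress_bits(uint64_t val, uint64_t mask)
{
	uint64_t out = 0;
	unsigned int pos = 0;

	for (unsigned int bit = 0; bit < 64; bit++) {
		if (!(mask & (1ULL << bit)))
			continue;
		if (val & (1ULL << bit))
			out |= 1ULL << pos;
		pos++;
	}
	return out;
}

/* Returns 0 on success, or -1 if the caller should keep the linear search. */
static int mpidr_cache_build(struct mpidr_cache *c, const uint64_t *mpidr, size_t n)
{
	uint64_t mask = 0;
	unsigned int bits = 0;

	assert(n > 0);
	for (size_t i = 1; i < n; i++)
		mask |= (mpidr[i] ^ mpidr[0]) & MPIDR_AFF_MASK;

	for (uint64_t m = mask; m; m &= m - 1)
		bits++;
	if (bits > 16)           /* hypothetical cap on the table size */
		return -1;

	size_t slots = (size_t)1 << bits;
	int *tbl = malloc(slots * sizeof(*tbl));
	if (!tbl)
		return -1;
	for (size_t s = 0; s < slots; s++)
		tbl[s] = -1;
	for (size_t i = 0; i < n; i++)
		tbl[compress_bits(mpidr[i], mask)] = (int)i;

	c->mask = mask;
	c->idx_of = tbl;
	c->mpidr = mpidr;
	return 0;
}

/* Table lookup instead of a scan over all vCPUs; -1 if no vCPU matches. */
static int mpidr_cache_lookup(const struct mpidr_cache *c, uint64_t mpidr)
{
	int idx = c->idx_of[compress_bits(mpidr, c->mask)];

	/* Bits outside the mask were not indexed, so confirm the hit. */
	if (idx < 0 || ((c->mpidr[idx] ^ mpidr) & MPIDR_AFF_MASK))
		return -1;
	return idx;
}
```

Distinct MPIDRs always land in distinct slots: any bit in which two of them differ also differs from vCPU 0's MPIDR in at least one of them, so it is part of the mask. An MPIDR that belongs to no vCPU can still map onto an occupied slot, which is why the lookup re-checks the stored value.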
    • Merge branch kvm-arm64/stage2-vhe-load into kvmarm/next · df26b779
      Oliver Upton authored
      * kvm-arm64/stage2-vhe-load:
        : Set up the stage-2 MMU from vcpu_load() for VHE
        :
        : Unlike nVHE, there is no need to switch the stage-2 MMU around on guest
        : entry/exit in VHE mode, as the host is running at EL2. Despite this, KVM
        : reloads the stage-2 on every guest entry, which is unnecessary.
        :
        : This series moves the setup of the stage-2 MMU context to vcpu_load()
        : when running in VHE mode. This is likely to be a win across the board,
        : but also allows us to remove an ISB on the guest entry path for systems
        : with one of the speculative AT errata.
        KVM: arm64: Move VTCR_EL2 into struct s2_mmu
        KVM: arm64: Load the stage-2 MMU context in kvm_vcpu_load_vhe()
        KVM: arm64: Rename helpers for VHE vCPU load/put
        KVM: arm64: Reload stage-2 for VMID change on VHE
        KVM: arm64: Restore the stage-2 context in VHE's __tlb_switch_to_host()
        KVM: arm64: Don't zero VTTBR in __tlb_switch_to_host()
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
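A toy model can make the saving concrete. The sketch below is not KVM code: it only counts stage-2 context writes under the old scheme (reload on every guest entry) and under a load at vcpu_load() time that is repeated only when the VMID changes, echoing the "Reload stage-2 for VMID change on VHE" and "__tlb_switch_to_host()" patches listed above. All names (`struct s2_model`, `model_enter_eager()`, `model_vcpu_load()`, `model_enter_lazy()`, `model_tlb_op_other_vm()`) are hypothetical, and a VMID of 0 is assumed to mean "nothing loaded".

```c
#include <assert.h>

/* Hypothetical model: counts how often a stage-2 context would be written. */
struct s2_model {
	unsigned long loaded_vmid;   /* VMID currently installed; 0 = none */
	unsigned int s2_loads;       /* number of stage-2 context writes */
};

static void load_stage2(struct s2_model *m, unsigned long vmid)
{
	assert(vmid != 0);           /* 0 is reserved for "nothing loaded" here */
	m->loaded_vmid = vmid;
	m->s2_loads++;
}

/* Previous behaviour: reprogram the stage-2 on every guest entry. */
static void model_enter_eager(struct s2_model *m, unsigned long vmid)
{
	load_stage2(m, vmid);
}

/* New behaviour: install the context once when the vCPU is loaded ... */
static void model_vcpu_load(struct s2_model *m, unsigned long vmid)
{
	load_stage2(m, vmid);
}

/* ... and on entry only reload if the VMID has changed since then. */
static void model_enter_lazy(struct s2_model *m, unsigned long vmid)
{
	if (m->loaded_vmid != vmid)
		load_stage2(m, vmid);
}

/*
 * TLB maintenance for another VM borrows the stage-2 registers, so with
 * lazy loading the previously loaded context has to be put back afterwards.
 */
static void model_tlb_op_other_vm(struct s2_model *m, unsigned long other_vmid)
{
	unsigned long saved = m->loaded_vmid;

	load_stage2(m, other_vmid);
	/* ... TLB invalidation for other_vmid would happen here ... */
	if (saved)
		load_stage2(m, saved);
}
```

The restore step is what keeps the lazy scheme correct: without it, a later entry would run the guest on whichever stage-2 the TLB maintenance left behind.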
    • Merge branch kvm-arm64/nv-trap-fixes into kvmarm/next · 51e60796
      Oliver Upton authored
      * kvm-arm64/nv-trap-fixes:
        : NV trap forwarding fixes, courtesy Miguel Luis and Marc Zyngier
        :
        :  - Explicitly define the effects of HCR_EL2.NV on EL2 sysregs in the
        :    NV trap encoding
        :
        :  - Make EL2 registers that access AArch32 guest state UNDEF or RAZ/WI
        :    where appropriate for NV guests
        KVM: arm64: Handle AArch32 SPSR_{irq,abt,und,fiq} as RAZ/WI
        KVM: arm64: Do not let a L1 hypervisor access the *32_EL2 sysregs
        KVM: arm64: Refine _EL2 system register list that require trap reinjection
        arm64: Add missing _EL2 encodings
        arm64: Add missing _EL12 encodings
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
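The RAZ/WI and UNDEF behaviours mentioned in this merge can be summarised as a small access-emulation helper. This is a hypothetical sketch, not KVM's sysreg tables: `enum nv_sysreg_kind`, `struct sysreg_access` and `emulate_access()` are illustrative names. A RAZ/WI register returns zero on reads and drops writes, while an UNDEF register makes the helper report that an UNDEFINED exception should be injected into the L1 hypervisor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of an EL2-related register for an NV guest. */
enum nv_sysreg_kind {
	SYSREG_NORMAL,   /* backed by saved vCPU state */
	SYSREG_RAZ_WI,   /* e.g. SPSR_{irq,abt,und,fiq}, per the patch list above */
	SYSREG_UNDEF,    /* e.g. the *32_EL2 registers hidden from an L1 hypervisor */
};

enum access_result { ACCESS_OK, ACCESS_UNDEF };

struct sysreg_access {
	bool is_write;
	uint64_t val;    /* value written, or value returned by a read */
};

static enum access_result emulate_access(enum nv_sysreg_kind kind,
					 uint64_t *backing,
					 struct sysreg_access *a)
{
	switch (kind) {
	case SYSREG_RAZ_WI:
		if (!a->is_write)
			a->val = 0;          /* reads as zero */
		return ACCESS_OK;            /* writes are silently dropped */
	case SYSREG_UNDEF:
		return ACCESS_UNDEF;         /* caller injects an UNDEF into L1 */
	case SYSREG_NORMAL:
		assert(backing != NULL);
		if (a->is_write)
			*backing = a->val;
		else
			a->val = *backing;
		return ACCESS_OK;
	}
	return ACCESS_UNDEF;
}
```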
    • Merge branch kvm-arm64/smccc-filter-cleanups into kvmarm/next · 25a35c1a
      Oliver Upton authored
      * kvm-arm64/smccc-filter-cleanups:
        : Clean up the management of KVM's SMCCC maple tree
        :
        : Avoid the cost of maintaining the SMCCC filter maple tree if userspace
        : hasn't written a rule to the filter. While at it, rip out the now
        : unnecessary VM flag to indicate whether or not the SMCCC filter was
        : configured.
        KVM: arm64: Use mtree_empty() to determine if SMCCC filter configured
        KVM: arm64: Only insert reserved ranges when SMCCC filter is used
        KVM: arm64: Add a predicate for testing if SMCCC filter is configured
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
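The cleanup boils down to deriving "configured" from the filter's contents and paying for the reserved ranges only once userspace writes a rule. The sketch below models this with a fixed-size range list standing in for the maple tree; the types, function names, action names and the reserved range values are all hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum filter_action { ACTION_HANDLE, ACTION_DENY, ACTION_FWD_TO_USER };

struct range_rule {
	uint32_t base, last;         /* inclusive SMCCC function-ID range */
	enum filter_action action;
};

#define MAX_RULES     16
#define RESERVED_BASE 0x80000000u    /* hypothetical reserved range */
#define RESERVED_LAST 0x8000ffffu

/* Fixed-size range list standing in for the maple tree. */
struct smccc_filter {
	struct range_rule rules[MAX_RULES];
	size_t nr;
};

/* "Configured" is derived from emptiness rather than tracked in a flag. */
static bool filter_configured(const struct smccc_filter *f)
{
	return f->nr != 0;
}

static int insert_rule(struct smccc_filter *f, uint32_t base, uint32_t last,
		       enum filter_action action)
{
	assert(base <= last);
	if (f->nr == MAX_RULES)
		return -1;
	for (size_t i = 0; i < f->nr; i++)
		if (base <= f->rules[i].last && f->rules[i].base <= last)
			return -1;           /* overlaps an existing range */
	f->rules[f->nr++] = (struct range_rule){ base, last, action };
	return 0;
}

/* Reserved ranges are only inserted when userspace adds its first rule. */
static int set_user_rule(struct smccc_filter *f, uint32_t base, uint32_t last,
			 enum filter_action action)
{
	if (!filter_configured(f) &&
	    insert_rule(f, RESERVED_BASE, RESERVED_LAST, ACTION_HANDLE))
		return -1;
	return insert_rule(f, base, last, action);
}

static enum filter_action filter_lookup(const struct smccc_filter *f,
					uint32_t func_id)
{
	if (!filter_configured(f))
		return ACTION_HANDLE;        /* fast path: nothing to search */
	for (size_t i = 0; i < f->nr; i++)
		if (f->rules[i].base <= func_id && func_id <= f->rules[i].last)
			return f->rules[i].action;
	return ACTION_HANDLE;
}
```

Because the reserved entries only appear alongside the first user rule, an empty filter stays empty, and the emptiness check alone answers "was the filter configured?".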
    • Merge branch kvm-arm64/pmevtyper-filter into kvmarm/next · 7ff7dfe9
      Oliver Upton authored
      * kvm-arm64/pmevtyper-filter:
        : Fixes to KVM's handling of the PMUv3 exception level filtering bits
        :
        :  - NSH (count at EL2) and M (count at EL3) should be stateful when the
        :    respective EL is advertised in the ID registers but have no effect on
        :    event counting.
        :
        :  - NSU and NSK modify the event filtering of EL0 and EL1, respectively.
        :    Though the kernel may not use these bits, other KVM guests might.
        :    Implement these bits exactly as written in the pseudocode if EL3 is
        :    advertised.
        KVM: arm64: Add PMU event filter bits required if EL3 is implemented
        KVM: arm64: Make PMEVTYPER<n>_EL0.NSH RES0 if EL2 isn't advertised
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
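The NSU/NSK rule can be expressed in a few lines. In the Arm architecture's description of PMEVTYPER<n>_EL0, Non-secure EL1 events are counted only when NSK equals P, and Non-secure EL0 events only when NSU equals U. The sketch below assumes the guest runs in Non-secure state and also models which filter bits are writable depending on the advertised ELs. The macro and function names are hypothetical; the bit positions (P=31, U=30, NSK=29, NSU=28, NSH=27, M=26) follow the architectural register layout. Requires C11 for `static_assert`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PMEVTYPER<n>_EL0 filter bits; positions per the Arm ARM, names hypothetical. */
#define EVT_P   (1U << 31)   /* filter EL1 */
#define EVT_U   (1U << 30)   /* filter EL0 */
#define EVT_NSK (1U << 29)   /* Non-secure EL1 filter (with EL3) */
#define EVT_NSU (1U << 28)   /* Non-secure EL0 filter (with EL3) */
#define EVT_NSH (1U << 27)   /* count at EL2 */
#define EVT_M   (1U << 26)   /* count at EL3 */

static_assert(EVT_M == 0x04000000U, "M is bit 26");

/* Filter bits a guest may set; everything else is treated as RES0. */
static uint32_t evtyper_filter_mask(bool el2_advertised, bool el3_advertised)
{
	uint32_t mask = EVT_P | EVT_U;

	if (el2_advertised)
		mask |= EVT_NSH;
	if (el3_advertised)
		mask |= EVT_NSK | EVT_NSU | EVT_M;
	return mask;
}

/* Non-secure EL1: counted only if NSK == P (NSK reads as 0 without EL3). */
static bool counts_at_ns_el1(uint32_t evtyper)
{
	bool p = evtyper & EVT_P;
	bool nsk = evtyper & EVT_NSK;

	return p == nsk;
}

/* Non-secure EL0: counted only if NSU == U (NSU reads as 0 without EL3). */
static bool counts_at_ns_el0(uint32_t evtyper)
{
	bool u = evtyper & EVT_U;
	bool nsu = evtyper & EVT_NSU;

	return u == nsu;
}
```

NSH and M only appear in the writable mask: as the merge text says, they are stateful when the corresponding EL is advertised but do not change what a guest counts. Without EL3, masking NSK/NSU to zero makes the same comparison reduce to plain P and U filtering.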