1. 20 Oct, 2024 12 commits
    • Merge tag 'kvmarm-fixes-6.12-2' of... · ddd5c582
      Paolo Bonzini authored
      Merge tag 'kvmarm-fixes-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
      
      KVM/arm64 fixes for 6.12, take #2
      
      - Fix the guest view of the ID registers, making the relevant fields
        writable from userspace (affecting ID_AA64DFR0_EL1 and ID_AA64PFR1_EL1)
      
      - Correctly expose S1PIE to guests, fixing a regression introduced
        in 6.12-rc1 with the S1POE support
      
      - Fix the recycling of stage-2 shadow MMUs by tracking the context
        (are we allowed to block or not) as well as the recycling state
      
      - Address a couple of issues with the vgic when userspace misconfigures
        the emulation, resulting in various splats. Headaches courtesy
        of our Syzkaller friends
    • RISCV: KVM: use raw_spinlock for critical section in imsic · 3ec4350d
      Cyan Yang authored
      The external interrupt update procedure in imsic is already protected
      by a spinlock. Because this critical section must never be preempted,
      switch to a raw_spinlock so that it stays non-preemptible even when
      PREEMPT_RT is enabled.
      Signed-off-by: Cyan Yang <cyan.yang@sifive.com>
      Reviewed-by: Yong-Xuan Wang <yongxuan.wang@sifive.com>
      Reviewed-by: Anup Patel <anup@brainfault.org>
      Message-ID: <20240919160126.44487-1-cyan.yang@sifive.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: selftests: Fix out-of-bounds reads in CPUID test's array lookups · 773cca18
      Sean Christopherson authored
      When looking for a "mangled", i.e. dynamic, CPUID entry, terminate the
      walk based on the number of array _entries_, not the size in bytes of
      the array.  Iterating based on the total size of the array can result in
      false passes, e.g. if the random data beyond the array happens to match
      a CPUID entry's function and index.
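
      As a rough illustration of this bug class (not the selftest's actual
      code), the sketch below bounds a table walk by the number of entries
      rather than the size in bytes. The struct, the table contents, and the
      names cpuid_entry_sketch and is_mangled_sketch are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for a CPUID entry, not the kernel's structure. */
struct cpuid_entry_sketch {
	uint32_t function;
	uint32_t index;
};

/* Made-up table of "mangled" (dynamic) entries. */
static const struct cpuid_entry_sketch mangled[] = {
	{ 0x0000000d, 0 },
	{ 0x0000000d, 1 },
	{ 0x80000001, 0 },
};

/* Return 1 if (function, index) is listed in the table, else 0. */
int is_mangled_sketch(uint32_t function, uint32_t index)
{
	size_t i;

	/*
	 * Bound the walk by the entry count. sizeof(mangled) is a byte
	 * count and would run the loop past the end of the array.
	 */
	for (i = 0; i < ARRAY_SIZE(mangled); i++) {
		if (mangled[i].function == function &&
		    mangled[i].index == index)
			return 1;
	}
	return 0;
}
```

      With a byte-count bound, a lookup that should miss can appear to hit
      if whatever follows the array happens to match, which is the "false
      pass" described above.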
      
      Fixes: fb18d053 ("selftest: kvm: x86: test KVM_GET_CPUID2 and guest visible CPUIDs against KVM_GET_SUPPORTED_CPUID")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-ID: <20241003234337.273364-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: selftests: x86: Avoid using SSE/AVX instructions · 9a400068
      Vitaly Kuznetsov authored
      Some distros switched gcc to '-march=x86-64-v3' by default and while it's
      hard to find a CPU which doesn't support it today, many KVM selftests fail
      with
      
        ==== Test Assertion Failure ====
          lib/x86_64/processor.c:570: Unhandled exception in guest
          pid=72747 tid=72747 errno=4 - Interrupted system call
          Unhandled exception '0x6' at guest RIP '0x4104f7'
      
      The failure is easy to reproduce elsewhere with
      
         $ make clean && CFLAGS='-march=x86-64-v3' make -j && ./x86_64/kvm_pv_test
      
      The root cause of the problem seems to be that with '-march=x86-64-v3' GCC
      uses AVX* instructions (VMOVQ in the example above) and without prior
      XSETBV() in the guest this results in #UD. It is certainly possible to add
      it there, e.g. the following saves the day as well:
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-ID: <20240920154422.2890096-1-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nSVM: Ignore nCR3[4:0] when loading PDPTEs from memory · f559b2e9
      Sean Christopherson authored
      Ignore nCR3[4:0] when loading PDPTEs from memory for nested SVM, as bits
      4:0 of CR3 are ignored when PAE paging is used, and thus VMRUN doesn't
      enforce 32-byte alignment of nCR3.
      
      In the absolute worst case scenario, failure to ignore bits 4:0 can result
      in an out-of-bounds read, e.g. if the target page is at the end of a
      memslot, and the VMM isn't using guard pages.
      
      Per the APM:
      
        The CR3 register points to the base address of the page-directory-pointer
        table. The page-directory-pointer table is aligned on a 32-byte boundary,
        with the low 5 address bits 4:0 assumed to be 0.
      
      And the SDM's much more explicit:
      
        4:0    Ignored
      
      Note, KVM gets this right when loading PDPTRs, it's only the nSVM flow
      that is broken.
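
      The masking described above can be sketched as follows. This is a
      minimal userspace illustration, not KVM's code: the macro and function
      names are hypothetical, and it models only the "bits 4:0 are ignored"
      rule for a 32-byte PDPT inside a 4KiB page.

```c
#include <stdint.h>

/* Hypothetical names; bits 4:0 of CR3 are ignored under PAE paging. */
#define PAE_CR3_IGNORED_MASK	0x1fULL
#define PAGE_OFFSET_MASK	0xfffULL
#define PDPT_SIZE_BYTES		32ULL	/* 4 PDPTEs x 8 bytes */

/* Address of the page-directory-pointer table with bits 4:0 cleared. */
uint64_t pae_pdpt_base_sketch(uint64_t cr3)
{
	return cr3 & ~PAE_CR3_IGNORED_MASK;
}

/* Offset of the table within its 4KiB page, after masking. */
uint64_t pae_pdpt_page_offset_sketch(uint64_t cr3)
{
	return pae_pdpt_base_sketch(cr3) & PAGE_OFFSET_MASK;
}

/* Return 1 if a 32-byte read at the given page offset stays in the page. */
int pdpt_read_fits_in_page_sketch(uint64_t page_offset)
{
	return page_offset + PDPT_SIZE_BYTES <= 4096;
}
```

      With the low bits cleared, the largest possible offset is 0xfe0, so
      the 32-byte read ends exactly at the page boundary. An unmasked offset
      such as 0xfff would run past it.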
      
      Fixes: e4e517b4 ("KVM: MMU: Do not unconditionally read PDPTE from guest memory")
      Reported-by: Kirk Swidowski <swidowski@google.com>
      Cc: Andy Nguyen <theflow@google.com>
      Cc: 3pvd <3pvd@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-ID: <20241009140838.1036226-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: reset the segment cache after segment init in vmx_vcpu_reset() · 731285fb
      Maxim Levitsky authored
      Reset the segment cache after segment initialization in vmx_vcpu_reset()
      to harden KVM against caching stale/uninitialized data.  Without the
      recent fix to bypass the cache in kvm_arch_vcpu_put(), the following
      scenario is possible:
      
       - vCPU is just created, and the vCPU thread is preempted before
         SS.AR_BYTES is written in vmx_vcpu_reset().
      
       - When scheduling out the vCPU task, kvm_arch_vcpu_in_kernel() =>
         vmx_get_cpl() reads and caches '0' for SS.AR_BYTES.
      
       - vmx_vcpu_reset() => seg_setup() configures SS.AR_BYTES, but doesn't
         invoke vmx_segment_cache_clear() to invalidate the cache.
      
      As a result, KVM retains a stale value in the cache, which can be read,
      e.g. via KVM_GET_SREGS.  Usually this is not a problem because the VMX
      segment cache is reset on each VM-Exit, but if the userspace VMM (e.g. KVM
      selftests) reads and writes system registers just after the vCPU was
      created, _without_ modifying SS.AR_BYTES, userspace will write back the
      stale '0' value and ultimately will trigger a VM-Entry failure due to
      incorrect SS segment type.
      
      Invalidating the cache after writing the VMCS doesn't address the general
      issue of cache accesses from IRQ context being unsafe, but it does prevent
      KVM from clobbering the VMCS, i.e. mitigates the harm done _if_ KVM has a
      bug that results in an unsafe cache access.
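
      The stale-cache scenario above can be modelled with a toy read-through
      cache. This is only a sketch under simplifying assumptions: the struct
      and every function name are hypothetical, and a single field stands in
      for the VMCS segment state.

```c
#include <stdint.h>

/* Toy model, not KVM's data structures. */
struct toy_vcpu {
	uint32_t hw_ss_ar;	/* stands in for the VMCS field */
	uint32_t cached_ss_ar;	/* software copy of that field */
	int cache_valid;	/* nonzero once cached_ss_ar is filled */
};

/* Read through the cache, filling it on first use. */
uint32_t toy_read_ss_ar(struct toy_vcpu *v)
{
	if (!v->cache_valid) {
		v->cached_ss_ar = v->hw_ss_ar;
		v->cache_valid = 1;
	}
	return v->cached_ss_ar;
}

/* Write the field without touching the cache (the pre-fix behaviour). */
void toy_write_ss_ar_no_clear(struct toy_vcpu *v, uint32_t ar)
{
	v->hw_ss_ar = ar;
}

/* Write the field, then invalidate the cache (the hardened behaviour). */
void toy_write_ss_ar_and_clear(struct toy_vcpu *v, uint32_t ar)
{
	v->hw_ss_ar = ar;
	v->cache_valid = 0;
}
```

      In this model, an early read caches 0; a later write that skips the
      invalidation leaves readers seeing 0, while the variant that clears
      the cache makes the next read return the newly written value.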
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Fixes: 2fb92db1 ("KVM: VMX: Cache vmcs segment fields")
      [sean: rework changelog to account for previous patch]
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-ID: <20241009175002.1118178-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Clean up documentation for KVM_X86_QUIRK_SLOT_ZAP_ALL · 5a279842
      Sean Christopherson authored
      Massage the documentation for KVM_X86_QUIRK_SLOT_ZAP_ALL to call out that
      it applies to moved memslots as well as deleted memslots, to avoid KVM's
      "fast zap" terminology (which has no meaning for userspace), and to reword
      the documented targeted zap behavior to specifically say that KVM _may_
      zap a subset of all SPTEs.  As evidenced by the fix to zap non-leaf SPTEs
      with gPTEs, formally documenting KVM's exact internal behavior is risky
      and unnecessary.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-ID: <20241009192345.1148353-4-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Add lockdep assert to enforce safe usage of kvm_unmap_gfn_range() · 28cf4978
      Sean Christopherson authored
      Add a lockdep assertion in kvm_unmap_gfn_range() to ensure that either
      mmu_invalidate_in_progress is elevated, or that the range is being zapped
      due to memslot removal (loosely detected by slots_lock being held).
      Zapping SPTEs without mmu_invalidate_{in_progress,seq} protection is unsafe
      as KVM's page fault path snapshots state before acquiring mmu_lock, and
      thus can create SPTEs with stale information if vCPUs aren't forced to
      retry faults (due to seeing an in-progress or past MMU invalidation).
      
      Memslot removal is a special case, as the memslot is retrieved outside of
      mmu_invalidate_seq, i.e. doesn't use the "standard" protections, and
      instead relies on SRCU synchronization to ensure any in-flight page faults
      are fully resolved before zapping SPTEs.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-ID: <20241009192345.1148353-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Zap only SPs that shadow gPTEs when deleting memslot · 58a20a94
      Sean Christopherson authored
      When performing a targeted zap on memslot removal, zap only MMU pages that
      shadow guest PTEs, as zapping all SPs that "match" the gfn is inexact and
      unnecessary.  Furthermore, for_each_gfn_valid_sp() arguably shouldn't
      exist, because it doesn't do what most people would expect it to do.
      The "round gfn for level" adjustment that is done for direct SPs (no gPTE)
      means that the exact gfn comparison will not get a match, even when a SP
      does "cover" a gfn, or was even created specifically for a gfn.
      
      For memslot deletion specifically, KVM's behavior will vary significantly
      based on the size and alignment of a memslot, and in weird ways.  E.g. for
      a 4KiB memslot, KVM will zap more SPs if the slot is 1GiB aligned than if
      it's only 4KiB aligned.  And as described below, zapping SPs in the
      aligned case overzaps for direct MMUs, as odds are good the upper-level
      SPs are serving other memslots.
      
      To iterate over all potentially-relevant gfns, KVM would need to make a
      pass over the hash table for each level, with the gfn used for lookup
      rounded for said level.  And then check that the SP is of the correct
      level, too, e.g. to avoid over-zapping.
      
      But even then, KVM would massively overzap, as processing every level is
      all but guaranteed to zap SPs that serve other memslots, especially if the
      memslot being removed is relatively small.  KVM could mitigate that issue
      by processing only levels that can be possible guest huge pages, i.e. are
      less likely to be re-used for other memslots, but while somewhat logical,
      that's quite arbitrary and would be a bit of a mess to implement.
      
      So, zap only SPs with gPTEs, as the resulting behavior is easy to describe,
      is predictable, and is explicitly minimal, i.e. KVM only zaps SPs that
      absolutely must be zapped.
      
      Cc: Yan Zhao <yan.y.zhao@intel.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
      Tested-by: Yan Zhao <yan.y.zhao@intel.com>
      Message-ID: <20241009192345.1148353-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86/kvm: Override default caching mode for SEV-SNP and TDX · 8e690b81
      Kirill A. Shutemov authored
      AMD SEV-SNP and Intel TDX have limited access to MTRR: either it is not
      advertised in CPUID or it cannot be programmed (on TDX, due to #VE on
      CR0.CD clear).
      
      This results in guests using uncached mappings where they shouldn't and
      pmd/pud_set_huge() failures due to non-uniform memory type reported by
      mtrr_type_lookup().
      
      Override MTRR state, making it WB by default as the kernel does for
      Hyper-V guests.
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Suggested-by: Binbin Wu <binbin.wu@intel.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Message-ID: <20241015095818.357915-1-kirill.shutemov@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Remove unused kvm_vcpu_gfn_to_pfn_atomic · bc07eea2
      Dr. David Alan Gilbert authored
      The last use of kvm_vcpu_gfn_to_pfn_atomic was removed by commit
      1bbc60d0 ("KVM: x86/mmu: Remove MMU auditing")
      
      Remove it.
      Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
      Message-ID: <20241001141354.18009-3-linux@treblig.org>
      [Adjust Documentation/virt/kvm/locking.rst. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Remove unused kvm_vcpu_gfn_to_pfn · 88a387cf
      Dr. David Alan Gilbert authored
      The last use of kvm_vcpu_gfn_to_pfn was removed by commit
      b1624f99 ("KVM: Remove kvm_vcpu_gfn_to_page() and kvm_vcpu_gpa_to_page()")
      
      Remove it.
      Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
      Message-ID: <20241001141354.18009-2-linux@treblig.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 11 Oct, 2024 1 commit
    • KVM: arm64: Don't eagerly teardown the vgic on init error · df5fd75e
      Marc Zyngier authored
      As there is very little ordering in the KVM API, userspace can
      instantiate a half-baked GIC (missing its memory map, for example)
      at almost any time.
      
      This means that, with the right timing, a thread running vcpu-0
      can enter the kernel without a GIC configured and get a GIC created
      behind its back by another thread. Amusingly, it will pick up
      that GIC and start messing with the data structures without the
      GIC having been fully initialised.
      
      Similarly, a thread running vcpu-1 can enter the kernel, and try
      to init the GIC that was previously created. Since this GIC isn't
      properly configured (no memory map), it fails to correctly initialise.
      
      And that's the point where we decide to teardown the GIC, freeing all
      its resources. Behind vcpu-0's back. Things stop pretty abruptly,
      with a variety of symptoms.  Clearly, this isn't good, we should be
      a bit more careful about this.
      
      It is obvious that this guest is not viable, as it is missing some
      important part of its configuration. So instead of trying to tear
      bits of it down, let's just mark it as *dead*. It means that any
      further interaction from userspace will result in -EIO. The memory
      will be released on the "normal" path, when userspace gives up.
      
      Cc: stable@vger.kernel.org
      Reported-by: Alexander Potapenko <glider@google.com>
      Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
      Link: https://lore.kernel.org/r/20241009183603.3221824-1-maz@kernel.org
      Signed-off-by: Marc Zyngier <maz@kernel.org>
  3. 08 Oct, 2024 7 commits
    • KVM: arm64: Expose S1PIE to guests · d4a89e5a
      Mark Brown authored
      Prior to commit 70ed7238 ("KVM: arm64: Sanitise ID_AA64MMFR3_EL1")
      we just exposed the sanitised view of ID_AA64MMFR3_EL1 to guests, meaning
      that they saw both TCRX and S1PIE if present on the host machine. That
      commit added VMM control over the contents of the register and exposed
      S1POE but removed S1PIE, meaning that the extension is no longer visible
      to guests. Reenable support for S1PIE with VMM control.
      
      Fixes: 70ed7238 ("KVM: arm64: Sanitise ID_AA64MMFR3_EL1")
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Reviewed-by: Joey Gouly <joey.gouly@arm.com>
      Link: https://lore.kernel.org/r/20241005-kvm-arm64-fix-s1pie-v1-1-5901f02de749@kernel.org
      Signed-off-by: Marc Zyngier <maz@kernel.org>
    • KVM: arm64: nv: Clarify safety of allowing TLBI unmaps to reschedule · 79cc6cdb
      Oliver Upton authored
      There's been a decent amount of attention around unmaps of nested MMUs,
      and TLBI handling is no exception to this. Add a comment clarifying why
      it is safe to reschedule during a TLBI unmap, even without a reference
      on the MMU in progress.
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
      Link: https://lore.kernel.org/r/20241007233028.2236133-5-oliver.upton@linux.dev
      Signed-off-by: Marc Zyngier <maz@kernel.org>
    • KVM: arm64: nv: Punt stage-2 recycling to a vCPU request · c268f204
      Oliver Upton authored
      Currently, when a nested MMU is repurposed for some other MMU context,
      KVM unmaps everything during vcpu_load() while holding the MMU lock for
      write. This is quite a performance bottleneck for large nested VMs, as
      all vCPU scheduling will spin until the unmap completes.
      
      Start punting the MMU cleanup to a vCPU request, where it is then
      possible to periodically release the MMU lock and CPU in the presence of
      contention.
      
      Ensure that no vCPU winds up using a stale MMU by tracking the pending
      unmap on the S2 MMU itself and requesting an unmap on every vCPU that
      finds it.
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
      Link: https://lore.kernel.org/r/20241007233028.2236133-4-oliver.upton@linux.dev
      Signed-off-by: Marc Zyngier <maz@kernel.org>
    • KVM: arm64: nv: Do not block when unmapping stage-2 if disallowed · 3c164eb9
      Oliver Upton authored
      Right now the nested code allows unmap operations on a shadow stage-2 to
      block unconditionally. This is wrong in a couple places, such as a
      non-blocking MMU notifier or on the back of a sched_in() notifier as
      part of shadow MMU recycling.
      
      Carry through whether or not blocking is allowed to
      kvm_pgtable_stage2_unmap(). This 'fixes' an issue where stage-2 MMU
      reclaim would precipitate a stack overflow from a pile of kvm_sched_in()
      callbacks, all trying to recycle a stage-2 MMU.
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
      Link: https://lore.kernel.org/r/20241007233028.2236133-3-oliver.upton@linux.dev
      Signed-off-by: Marc Zyngier <maz@kernel.org>
    • KVM: arm64: nv: Keep reference on stage-2 MMU when scheduled out · 6ded46b5
      Oliver Upton authored
      If a vCPU is scheduling out and not in WFI emulation, it is highly
      likely it will get scheduled again soon and reuse the MMU it had before.
      Dropping the MMU at vcpu_put() can have some unfortunate consequences,
      as the MMU could get reclaimed and used in a different context, forcing
      another 'cold start' on an otherwise active MMU.
      
      Avoid that altogether by keeping a reference on the MMU if the vCPU is
      scheduling out, ensuring that another vCPU cannot reclaim it while the
      current vCPU is away. Since there are more MMUs than vCPUs, this does
      not affect the guarantee that an unused MMU is available at any time.
      
      Furthermore, this makes the vcpu->arch.hw_mmu ~stable in preemptible
      code, at least for where it matters in the stage-2 abort path. Yes, the
      MMU can change across WFI emulation, but there isn't even a use case
      where this would matter.
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
      Link: https://lore.kernel.org/r/20241007233028.2236133-2-oliver.upton@linux.dev
      Signed-off-by: Marc Zyngier <maz@kernel.org>
    • KVM: arm64: Unregister redistributor for failed vCPU creation · ae8f8b37
      Oliver Upton authored
      Alex reports that syzkaller has managed to trigger a use-after-free when
      tearing down a VM:
      
        BUG: KASAN: slab-use-after-free in kvm_put_kvm+0x300/0xe68 virt/kvm/kvm_main.c:5769
        Read of size 8 at addr ffffff801c6890d0 by task syz.3.2219/10758
      
        CPU: 3 UID: 0 PID: 10758 Comm: syz.3.2219 Not tainted 6.11.0-rc6-dirty #64
        Hardware name: linux,dummy-virt (DT)
        Call trace:
         dump_backtrace+0x17c/0x1a8 arch/arm64/kernel/stacktrace.c:317
         show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:324
         __dump_stack lib/dump_stack.c:93 [inline]
         dump_stack_lvl+0x94/0xc0 lib/dump_stack.c:119
         print_report+0x144/0x7a4 mm/kasan/report.c:377
         kasan_report+0xcc/0x128 mm/kasan/report.c:601
         __asan_report_load8_noabort+0x20/0x2c mm/kasan/report_generic.c:381
         kvm_put_kvm+0x300/0xe68 virt/kvm/kvm_main.c:5769
         kvm_vm_release+0x4c/0x60 virt/kvm/kvm_main.c:1409
         __fput+0x198/0x71c fs/file_table.c:422
         ____fput+0x20/0x30 fs/file_table.c:450
         task_work_run+0x1cc/0x23c kernel/task_work.c:228
         do_notify_resume+0x144/0x1a0 include/linux/resume_user_mode.h:50
         el0_svc+0x64/0x68 arch/arm64/kernel/entry-common.c:169
         el0t_64_sync_handler+0x90/0xfc arch/arm64/kernel/entry-common.c:730
         el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
      
      Upon closer inspection, it appears that we do not properly tear down the
      MMIO registration for a vCPU that fails creation late in the game, e.g.
      a vCPU w/ the same ID already exists in the VM.
      
      It is important to consider the context of the commit that introduced this bug
      by moving the unregistration out of __kvm_vgic_vcpu_destroy(). That
      change correctly sought to avoid an srcu v. config_lock inversion by
      breaking up the vCPU teardown into two parts, one guarded by the
      config_lock.
      
      Fix the use-after-free while avoiding lock inversion by adding a
      special-cased unregistration to __kvm_vgic_vcpu_destroy(). This is safe
      because failed vCPUs are torn down outside of the config_lock.
      
      Cc: stable@vger.kernel.org
      Fixes: f6165067 ("KVM: arm64: vgic: Don't hold config_lock while unregistering redistributors")
      Reported-by: Alexander Potapenko <glider@google.com>
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
      Link: https://lore.kernel.org/r/20241007223909.2157336-1-oliver.upton@linux.dev
      Signed-off-by: Marc Zyngier <maz@kernel.org>
    • Merge branch kvm-arm64/idregs-6.12 into kvmarm/fixes · 9b7c3dd5
      Marc Zyngier authored
      * kvm-arm64/idregs-6.12:
        : .
        : Make some fields of ID_AA64DFR0_EL1 and ID_AA64PFR1_EL1
        : writable from userspace, so that a VMM can influence the
        : set of guest-visible features.
        :
        : - for ID_AA64DFR0_EL1: DoubleLock, WRPs, PMUVer and DebugVer
        :   are writable (courtesy of Shameer Kolothum)
        :
        : - for ID_AA64PFR1_EL1: BT, SSBS, CSV2_frac are writable
        :   (courtesy of Shaoqin Huang)
        : .
        KVM: selftests: aarch64: Add writable test for ID_AA64PFR1_EL1
        KVM: arm64: Allow userspace to change ID_AA64PFR1_EL1
        KVM: arm64: Use kvm_has_feat() to check if FEAT_SSBS is advertised to the guest
        KVM: arm64: Disable fields that KVM doesn't know how to handle in ID_AA64PFR1_EL1
        KVM: arm64: Make the exposed feature bits in AA64DFR0_EL1 writable from userspace
      Signed-off-by: Marc Zyngier <maz@kernel.org>
  4. 06 Oct, 2024 3 commits
    • Merge tag 'kvmarm-fixes-6.12-1' of... · c8d430db
      Paolo Bonzini authored
      Merge tag 'kvmarm-fixes-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
      
      KVM/arm64 fixes for 6.12, take #1
      
      - Fix pKVM error path on init, making sure we do not change critical
        system registers as we're about to fail
      
      - Make sure that the host's vector length is capped by a value
        common to all CPUs
      
      - Fix kvm_has_feat*() handling of "negative" features, as the current
        code is pretty broken
      
      - Promote Joey to the status of official reviewer, while James steps
        down -- hopefully only temporarily
    • x86/reboot: emergency callbacks are now registered by common KVM code · 2a5fe5a0
      Paolo Bonzini authored
      Guard them with CONFIG_KVM_X86_COMMON rather than the two vendor modules.
      In practice this has no functional change, because CONFIG_KVM_X86_COMMON
      is set if and only if at least one vendor-specific module is being built.
      However, it is cleaner to specify CONFIG_KVM_X86_COMMON for functions that
      are used in kvm.ko.
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Fixes: 590b09b1 ("KVM: x86: Register "emergency disable" callbacks when virt is enabled")
      Fixes: 6d55a942 ("x86/reboot: Unconditionally define cpu_emergency_virt_cb typedef")
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: leave kvm.ko out of the build if no vendor module is requested · ea4290d7
      Paolo Bonzini authored
      kvm.ko is nothing but library code shared by kvm-intel.ko and kvm-amd.ko.
      It provides no functionality on its own and it is unnecessary unless one
      of the vendor-specific modules is compiled.  In particular, /dev/kvm is
      not created until one of kvm-intel.ko or kvm-amd.ko is loaded.
      
      Use CONFIG_KVM to decide if it is built-in or a module, but use the
      vendor-specific modules for the actual decision on whether to build it.
      
      This also fixes a build failure when CONFIG_KVM_INTEL and CONFIG_KVM_AMD
      are both disabled.  The cpu_emergency_register_virt_callback() function
      is called from kvm.ko, but it is only defined if at least one of
      CONFIG_KVM_INTEL and CONFIG_KVM_AMD is provided.
      
      Fixes: 590b09b1 ("KVM: x86: Register "emergency disable" callbacks when virt is enabled")
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  5. 03 Oct, 2024 2 commits
    • KVM: x86/mmu: fix KVM_X86_QUIRK_SLOT_ZAP_ALL for shadow MMU · fcd1ec9c
      Paolo Bonzini authored
      As was tried in commit 4e103134 ("KVM: x86/mmu: Zap only the relevant
      pages when removing a memslot"), all shadow pages, i.e. non-leaf SPTEs,
      need to be zapped.  All of the accounting for a shadow page is tied to the
      memslot, i.e. the shadow page holds a reference to the memslot, for all
      intents and purposes.  Deleting the memslot without removing all relevant
      shadow pages, as is done when KVM_X86_QUIRK_SLOT_ZAP_ALL is disabled,
      results in NULL pointer derefs when tearing down the VM.
      
      Reintroduce from that commit the code that walks the whole memslot when
      there are active shadow MMU pages.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: arm64: Fix kvm_has_feat*() handling of negative features · a1d402ab
      Marc Zyngier authored
      Oliver reports that the kvm_has_feat() helper is not behaving as
      expected for negative features. On investigation, the main issue
      seems to be caused by the following construct:
      
       #define get_idreg_field(kvm, id, fld)				\
       	(id##_##fld##_SIGNED ?					\
      	 get_idreg_field_signed(kvm, id, fld) :			\
      	 get_idreg_field_unsigned(kvm, id, fld))
      
      where one side of the expression evaluates as something signed,
      and the other as something unsigned. In retrospect, this is totally
      braindead, as the compiler converts this into an unsigned expression.
      When compared to something that is 0, the test is simply elided.
      
      Epic fail. Similar issue exists in the expand_field_sign() macro.
      
      The correct way to handle this is to choose between signed and unsigned
      comparisons, so that both sides of the ternary expression are of the
      same type (bool).
      
      In order to keep the code readable (sort of), we introduce new
      comparison primitives taking an operator as a parameter, and
      rewrite the kvm_has_feat*() helpers in terms of these primitives.
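
      The conversion problem described above can be reproduced outside the
      kernel. The sketch below is a minimal illustration with hypothetical
      function names (it is not the kernel's macros): a ternary with one int
      arm and one unsigned int arm yields an unsigned int, so a negative
      field value compares as a very large number.

```c
#include <stdbool.h>

/*
 * Mixed-type ternary: the int arm is converted to unsigned int, and the
 * comparison against limit is then done as unsigned as well. Compilers
 * may warn about the sign change here; that is the point of the example.
 */
bool naive_field_ge(bool is_signed, int s, unsigned int u, int limit)
{
	return (is_signed ? s : u) >= limit;
}

/*
 * Choose between two complete comparisons instead, so that both arms of
 * the ternary have the same type (bool).
 */
bool fixed_field_ge(bool is_signed, int s, unsigned int u, int limit)
{
	return is_signed ? (s >= limit) : (u >= (unsigned int)limit);
}
```

      For a signed field holding -1 and a limit of 0, the naive version
      reports true while the fixed version reports false.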
      
      Fixes: c62d7a23 ("KVM: arm64: Add feature checking helpers")
      Reported-by: Oliver Upton <oliver.upton@linux.dev>
      Tested-by: Oliver Upton <oliver.upton@linux.dev>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20241002204239.2051637-1-maz@kernel.org
      Signed-off-by: Marc Zyngier <maz@kernel.org>
  6. 01 Oct, 2024 4 commits
  7. 29 Sep, 2024 11 commits
    • Linux 6.12-rc1 · 9852d85e
      Linus Torvalds authored
    • x86: kvm: fix build error · 3f749bef
      Linus Torvalds authored
      The cpu_emergency_register_virt_callback() function is used
      unconditionally by the x86 kvm code, but it is declared (and defined)
      conditionally:
      
        #if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
        void cpu_emergency_register_virt_callback(cpu_emergency_virt_cb *callback);
        ...
      
      leading to a build error when neither KVM_INTEL nor KVM_AMD support is
      enabled:
      
        arch/x86/kvm/x86.c: In function ‘kvm_arch_enable_virtualization’:
        arch/x86/kvm/x86.c:12517:9: error: implicit declaration of function ‘cpu_emergency_register_virt_callback’ [-Wimplicit-function-declaration]
        12517 |         cpu_emergency_register_virt_callback(kvm_x86_ops.emergency_disable_virtualization_cpu);
              |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        arch/x86/kvm/x86.c: In function ‘kvm_arch_disable_virtualization’:
        arch/x86/kvm/x86.c:12522:9: error: implicit declaration of function ‘cpu_emergency_unregister_virt_callback’ [-Wimplicit-function-declaration]
        12522 |         cpu_emergency_unregister_virt_callback(kvm_x86_ops.emergency_disable_virtualization_cpu);
              |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      Fix the build by defining empty helper functions the same way the old
      cpu_emergency_disable_virtualization() function was dealt with for the
      same situation.
      
      Maybe we could instead have made the call sites conditional, since the
      callers (kvm_arch_{en,dis}able_virtualization()) have an empty weak
      fallback.  I'll leave that to the kvm people to argue about, this at
      least gets the build going for that particular config.
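
      The empty-helper approach described above can be sketched as follows.
      HAVE_VIRT_CALLBACKS_SKETCH and every function name here are
      hypothetical; the kernel keys this off IS_ENABLED() of its own config
      symbols, which this userspace example does not use.

```c
/* Hypothetical switch standing in for the kernel's config condition. */
#ifndef HAVE_VIRT_CALLBACKS_SKETCH
#define HAVE_VIRT_CALLBACKS_SKETCH 0
#endif

typedef void (virt_cb_sketch)(void);

#if HAVE_VIRT_CALLBACKS_SKETCH
/* Real implementations would live in another translation unit. */
void register_virt_callback_sketch(virt_cb_sketch *cb);
void unregister_virt_callback_sketch(virt_cb_sketch *cb);
#else
/* Empty inline stubs keep unconditional callers compiling. */
static inline void register_virt_callback_sketch(virt_cb_sketch *cb)
{
	(void)cb;
}
static inline void unregister_virt_callback_sketch(virt_cb_sketch *cb)
{
	(void)cb;
}
#endif

/* Placeholder callback used only by this example. */
void emergency_disable_sketch(void)
{
}

/* An unconditional caller: it builds whichever branch is selected. */
int enable_virtualization_sketch(void)
{
	register_virt_callback_sketch(emergency_disable_sketch);
	return 0;
}

int disable_virtualization_sketch(void)
{
	unregister_virt_callback_sketch(emergency_disable_sketch);
	return 0;
}
```

      The alternative mentioned above, making the call sites conditional,
      would move the #if into the callers instead of the header.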
      
      Fixes: 590b09b1 ("KVM: x86: Register "emergency disable" callbacks when virt is enabled")
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Kai Huang <kai.huang@intel.com>
      Cc: Chao Gao <chao.gao@intel.com>
      Cc: Farrah Chen <farrah.chen@intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3f749bef
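The stub approach described in the build fix above can be sketched in plain C. This is a minimal illustration, not the actual kernel patch: `HAVE_VIRT_BACKEND` is a hypothetical stand-in for the `IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)` condition, the callback typedef is assumed, and the `demo_*` functions are invented callers that only loosely mirror kvm_arch_{en,dis}able_virtualization().

```c
/*
 * Sketch only: HAVE_VIRT_BACKEND is a hypothetical stand-in for the
 * kernel's IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD).
 */
#ifndef HAVE_VIRT_BACKEND
#define HAVE_VIRT_BACKEND 0
#endif

/* Callback type; the exact kernel typedef is an assumption here. */
typedef void (cpu_emergency_virt_cb)(void);

#if HAVE_VIRT_BACKEND
/* Real implementations live elsewhere when a backend is built in. */
void cpu_emergency_register_virt_callback(cpu_emergency_virt_cb *callback);
void cpu_emergency_unregister_virt_callback(cpu_emergency_virt_cb *callback);
#else
/* Empty stubs keep unconditional callers compiling without a backend. */
static inline void cpu_emergency_register_virt_callback(cpu_emergency_virt_cb *callback)
{
	(void)callback;
}

static inline void cpu_emergency_unregister_virt_callback(cpu_emergency_virt_cb *callback)
{
	(void)callback;
}
#endif

/* Hypothetical callback and callers, for illustration only. */
void demo_emergency_disable(void)
{
}

int demo_enable_virtualization(void)
{
	/* Called unconditionally; resolves to the stub when no backend exists. */
	cpu_emergency_register_virt_callback(demo_emergency_disable);
	return 0;
}

int demo_disable_virtualization(void)
{
	cpu_emergency_unregister_virt_callback(demo_emergency_disable);
	return 0;
}
```

With `HAVE_VIRT_BACKEND` left at 0, the callers compile and link against the empty inline stubs, which is the situation the build error arose in. The alternative mentioned in the commit message, making the call sites themselves conditional, would move the `#if` into the callers instead.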
    • Linus Torvalds's avatar
      Merge tag 'mailbox-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox · e7ed3436
      Linus Torvalds authored
      Pull mailbox updates from Jassi Brar:
      
       - fix kconfig dependencies (mhu-v3, omap2+)
      
       - use device name instead of generic imx_mu_chan as interrupt name
         (imx)
      
       - enable sa8255p and qcs8300 ipc controllers (qcom)
      
       - Fix timeout during suspend mode (bcm2835)
      
       - convert to use of_property_match_string (mailbox)
      
       - enable mt8188 (mediatek)
      
       - use devm_clk_get_enabled helpers (spreadtrum)
      
       - fix device-id typo (rockchip)
      
      * tag 'mailbox-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox:
        mailbox, remoteproc: omap2+: fix compile testing
        dt-bindings: mailbox: qcom-ipcc: Document QCS8300 IPCC
        dt-bindings: mailbox: qcom-ipcc: document the support for SA8255p
        dt-bindings: mailbox: mtk,adsp-mbox: Add compatible for MT8188
        mailbox: Use of_property_match_string() instead of open-coding
        mailbox: bcm2835: Fix timeout during suspend mode
        mailbox: sprd: Use devm_clk_get_enabled() helpers
        mailbox: rockchip: fix a typo in module autoloading
        mailbox: imx: use device name in interrupt name
        mailbox: ARM_MHU_V3 should depend on ARM64
      e7ed3436
    • Linus Torvalds's avatar
      Merge tag 'i2c-for-6.12-rc1-additional_fixes' of... · 907537f5
      Linus Torvalds authored
      Merge tag 'i2c-for-6.12-rc1-additional_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
      
      Pull i2c fixes from Wolfram Sang:
      
       - fix DesignWare driver ENABLE-ABORT sequence, ensuring ABORT can
         always be sent when needed
      
       - check for PCLK in the SynQuacer controller as an optional clock,
         allowing ACPI to directly provide the clock rate
      
       - KEBA driver Kconfig dependency fix
      
       - fix XIIC driver power suspend sequence
      
      * tag 'i2c-for-6.12-rc1-additional_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: xiic: Fix pm_runtime_set_suspended() with runtime pm enabled
        i2c: keba: I2C_KEBA should depend on KEBA_CP500
        i2c: synquacer: Deal with optional PCLK correctly
        i2c: designware: fix controller is holding SCL low while ENABLE bit is disabled
      907537f5
    • Linus Torvalds's avatar
      Merge tag 'dma-mapping-6.12-2024-09-29' of git://git.infradead.org/users/hch/dma-mapping · b81b78da
      Linus Torvalds authored
      Pull dma-mapping fix from Christoph Hellwig:
      
       - handle chained SGLs in the new tracing code (Christoph Hellwig)
      
      * tag 'dma-mapping-6.12-2024-09-29' of git://git.infradead.org/users/hch/dma-mapping:
        dma-mapping: fix DMA API tracing for chained scatterlists
      b81b78da
    • Linus Torvalds's avatar
      Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 3ed7df08
      Linus Torvalds authored
      Pull more SCSI updates from James Bottomley:
       "These are mostly minor updates.
      
        There are two drivers (lpfc and mpi3mr) which missed the initial
        pull and a core change to retry a start/stop unit which affects
        suspend/resume"
      
      * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (32 commits)
        scsi: lpfc: Update lpfc version to 14.4.0.5
        scsi: lpfc: Support loopback tests with VMID enabled
        scsi: lpfc: Revise TRACE_EVENT log flag severities from KERN_ERR to KERN_WARNING
        scsi: lpfc: Ensure DA_ID handling completion before deleting an NPIV instance
        scsi: lpfc: Fix kref imbalance on fabric ndlps from dev_loss_tmo handler
        scsi: lpfc: Restrict support for 32 byte CDBs to specific HBAs
        scsi: lpfc: Update phba link state conditional before sending CMF_SYNC_WQE
        scsi: lpfc: Add ELS_RSP cmd to the list of WQEs to flush in lpfc_els_flush_cmd()
        scsi: mpi3mr: Update driver version to 8.12.0.0.50
        scsi: mpi3mr: Improve wait logic while controller transitions to READY state
        scsi: mpi3mr: Update MPI Headers to revision 34
        scsi: mpi3mr: Use firmware-provided timestamp update interval
        scsi: mpi3mr: Enhance the Enable Controller retry logic
        scsi: sd: Fix off-by-one error in sd_read_block_characteristics()
        scsi: pm8001: Do not overwrite PCI queue mapping
        scsi: scsi_debug: Remove a useless memset()
        scsi: pmcraid: Convert comma to semicolon
        scsi: sd: Retry START STOP UNIT commands
        scsi: mpi3mr: A performance fix
        scsi: ufs: qcom: Update MODE_MAX cfg_bw value
        ...
      3ed7df08
    • Linus Torvalds's avatar
      Merge tag 'bcachefs-2024-09-28' of git://evilpiepirate.org/bcachefs · 9f9a5347
      Linus Torvalds authored
      Pull more bcachefs updates from Kent Overstreet:
       "Assorted minor syzbot fixes, and for bigger stuff:
      
        Fix two disk accounting rewrite bugs:
      
         - Disk accounting keys use the version field of bkey so that journal
           replay can tell which updates have been applied to the btree.
      
           This is set in the transaction commit path, after we've gotten our
           journal reservation (and our time ordering), but the
           BCH_TRANS_COMMIT_skip_accounting_apply flag that journal replay
           uses was incorrectly skipping this for new updates generated prior
           to journal replay.
      
           This fixes the underlying cause of an assertion pop in
           disk_accounting_read.
      
         - A couple of fixes for disk accounting + device removal.
      
           Checking if accounting replicas entries were marked in the
           superblock was being done at the wrong point, when deltas in the
           journal could still zero them out, and then additionally we'd try
           to add a missing replicas entry to the superblock without checking
           if it referred to an invalid (removed) device.
      
        A whole slew of repair fixes:
      
         - fix infinite loop in propagate_key_to_snapshot_leaves(), this fixes
           an infinite loop when repairing a filesystem with many snapshots
      
         - fix incorrect transaction restart handling leading to occasional
           "fsck counted ..." warnings
      
         - fix warning in __bch2_fsck_err() for bkey fsck errors
      
         - check_inode() in fsck now correctly checks if the filesystem was
           clean
      
         - there shouldn't be pending logged ops if the fs was clean, we now
           check for this
      
         - remove_backpointer() doesn't remove a dirent that doesn't actually
           point to the inode
      
         - many more fsck errors are AUTOFIX"
      
      * tag 'bcachefs-2024-09-28' of git://evilpiepirate.org/bcachefs: (35 commits)
        bcachefs: check_subvol_path() now prints subvol root inode
        bcachefs: remove_backpointer() now checks if dirent points to inode
        bcachefs: dirent_points_to_inode() now warns on mismatch
        bcachefs: Fix lost wake up
        bcachefs: Check for logged ops when clean
        bcachefs: BCH_FS_clean_recovery
        bcachefs: Convert disk accounting BUG_ON() to WARN_ON()
        bcachefs: Fix BCH_TRANS_COMMIT_skip_accounting_apply
        bcachefs: Check for accounting keys with bversion=0
        bcachefs: rename version -> bversion
        bcachefs: Don't delete unlinked inodes before logged op resume
        bcachefs: Fix BCH_SB_ERRS() so we can reorder
        bcachefs: Fix fsck warnings from bkey validation
        bcachefs: Move transaction commit path validation to as late as possible
        bcachefs: Fix disk accounting attempting to mark invalid replicas entry
        bcachefs: Fix unlocked access to c->disk_sb.sb in bch2_replicas_entry_validate()
        bcachefs: Fix accounting read + device removal
        bcachefs: bch_accounting_mode
        bcachefs: fix transaction restart handling in check_extents(), check_dirents()
        bcachefs: kill inode_walker_entry.seen_this_pos
        ...
      9f9a5347
    • Linus Torvalds's avatar
      Merge tag 'x86-urgent-2024-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d37421e6
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "Fix TDX MMIO #VE fault handling, and add two new Intel model numbers
        for 'Pantherlake' and 'Diamond Rapids'"
      
      * tag 'x86-urgent-2024-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/cpu: Add two Intel CPU model numbers
        x86/tdx: Fix "in-kernel MMIO" check
      d37421e6
    • Linus Torvalds's avatar
      Merge tag 'locking-urgent-2024-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ec03de73
      Linus Torvalds authored
      Pull locking updates from Ingo Molnar:
       "lockdep:
          - Fix potential deadlock between lockdep and RCU (Zhiguo Niu)
          - Use str_plural() to address Coccinelle warning (Thorsten Blum)
          - Add debuggability enhancement (Luis Claudio R. Goncalves)
      
        static keys & calls:
          - Fix static_key_slow_dec() yet again (Peter Zijlstra)
          - Handle module init failure correctly in static_call_del_module()
            (Thomas Gleixner)
          - Replace pointless WARN_ON() in static_call_module_notify() (Thomas
            Gleixner)
      
        <linux/cleanup.h>:
          - Add usage and style documentation (Dan Williams)
      
        rwsems:
          - Move is_rwsem_reader_owned() and rwsem_owner() under
            CONFIG_DEBUG_RWSEMS (Waiman Long)
      
        atomic ops, x86:
          - Redeclare x86_32 arch_atomic64_{add,sub}() as void (Uros Bizjak)
          - Introduce the read64_nonatomic macro to x86_32 with cx8 (Uros
            Bizjak)"
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      
      * tag 'locking-urgent-2024-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        locking/rwsem: Move is_rwsem_reader_owned() and rwsem_owner() under CONFIG_DEBUG_RWSEMS
        jump_label: Fix static_key_slow_dec() yet again
        static_call: Replace pointless WARN_ON() in static_call_module_notify()
        static_call: Handle module init failure correctly in static_call_del_module()
        locking/lockdep: Simplify character output in seq_line()
        lockdep: fix deadlock issue between lockdep and rcu
        lockdep: Use str_plural() to fix Coccinelle warning
        cleanup: Add usage and style documentation
        lockdep: suggest the fix for "lockdep bfs error:-1" on print_bfs_bug
        locking/atomic/x86: Redeclare x86_32 arch_atomic64_{add,sub}() as void
        locking/atomic/x86: Introduce the read64_nonatomic macro to x86_32 with cx8
      ec03de73
    • Linus Torvalds's avatar
      Merge tag 'cocci-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jlawall/linux · 68e4b0e0
      Linus Torvalds authored
      Pull coccinelle updates from Julia Lawall:
       "Extend string_choices.cocci to use more available helpers
      
        Ten patches from Hongbo Li extending string_choices.cocci with the
        complete set of functions offered by include/linux/string_choices.h.
      
        One patch from myself reducing the number of redundant cases that are
        checked by Coccinelle, giving a small performance improvement"
      
      * tag 'cocci-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jlawall/linux:
        Reduce Coccinelle choices in string_choices.cocci
        coccinelle: Remove unnecessary parentheses for only one possible change.
        coccinelle: Add rules to find str_yes_no() replacements
        coccinelle: Add rules to find str_on_off() replacements
        coccinelle: Add rules to find str_write_read() replacements
        coccinelle: Add rules to find str_read_write() replacements
        coccinelle: Add rules to find str_enable{d}_disable{d}() replacements
        coccinelle: Add rules to find str_lo{w}_hi{gh}() replacements
        coccinelle: Add rules to find str_hi{gh}_lo{w}() replacements
        coccinelle: Add rules to find str_false_true() replacements
        coccinelle: Add rules to find str_true_false() replacements
      68e4b0e0
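As a rough illustration of what the string_choices.cocci rules above look for, the sketch below contrasts an open-coded ternary with the equivalent helper call. The helpers are user-space stand-ins modelled on include/linux/string_choices.h, and the `link_*` and `power_*` functions are hypothetical names invented for this example.

```c
#include <stdbool.h>

/* User-space stand-ins modelled on include/linux/string_choices.h. */
static inline const char *str_yes_no(bool v)
{
	return v ? "yes" : "no";
}

static inline const char *str_on_off(bool v)
{
	return v ? "on" : "off";
}

static inline const char *str_true_false(bool v)
{
	return v ? "true" : "false";
}

/* Open-coded form of the kind the semantic patch flags (hypothetical function). */
const char *link_up_open_coded(bool up)
{
	return up ? "yes" : "no";
}

/* The same function after the suggested replacement. */
const char *link_up_helper(bool up)
{
	return str_yes_no(up);
}

/* A second hypothetical case, using str_on_off(). */
const char *power_state_helper(bool powered)
{
	return str_on_off(powered);
}
```

Both `link_up_*` variants return the same string for any input; the helper form simply avoids repeating the literal pair at every call site.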
    • Linus Torvalds's avatar
      Merge tag 'linux_kselftest-next-6.12-rc1-fixes' of... · e7ebdb51
      Linus Torvalds authored
      Merge tag 'linux_kselftest-next-6.12-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull kselftest fix from Shuah Khan:
       "One urgent fix to vDSO as automated testing is failing due to this
        bug"
      
      * tag 'linux_kselftest-next-6.12-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        selftests: vDSO: align stack for O2-optimized memcpy
      e7ebdb51