- 11 Nov, 2021 5 commits
-
-
Ashish Kalra authored
Reset the host's shared pages list related to kernel-specific page encryption status settings before we load a new kernel by kexec. We cannot reset the complete shared pages list here, as we need to retain the UEFI/OVMF firmware-specific settings. The host's shared pages list is maintained for the guest to keep track of all unencrypted guest memory regions; therefore we need to explicitly mark all shared pages as encrypted again before rebooting into the new guest kernel. Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Reviewed-by: Steve Rutherford <srutherford@google.com> Message-Id: <3e051424ab839ea470f88333273d7a185006754f.1629726117.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
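As an illustration only, with an entirely hypothetical shared_region list and set_memory_enc_hypercall() helper standing in for the kernel's real bookkeeping and notification interface:

```c
#include <linux/list.h>
#include <linux/types.h>

/* Hypothetical bookkeeping entry; names are illustrative only. */
struct shared_region {
	struct list_head list;
	unsigned long gpa;
	unsigned long npages;
	bool firmware_owned;	/* UEFI/OVMF-specific setting to retain */
};

static LIST_HEAD(shared_regions);

/* Hypothetical stand-in for the real page-status notification helper. */
void set_memory_enc_hypercall(unsigned long gpa, unsigned long npages, bool enc);

/* Before kexec'ing into a new kernel, mark all kernel-owned shared
 * (unencrypted) pages as encrypted again, keeping firmware regions. */
static void reset_shared_pages_for_kexec(void)
{
	struct shared_region *r;

	list_for_each_entry(r, &shared_regions, list) {
		if (r->firmware_owned)
			continue;
		set_memory_enc_hypercall(r->gpa, r->npages, true);
	}
}
```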
-
Ashish Kalra authored
The guest support for detecting and enabling the SEV live migration feature uses the following logic:
- kvm_init_platform() checks if the guest is booted under EFI.
- If not EFI:
  i) if kvm_para_has_feature(KVM_FEATURE_MIGRATION_CONTROL), issue a wrmsrl() to enable SEV live migration support.
- If EFI:
  i) if kvm_para_has_feature(KVM_FEATURE_MIGRATION_CONTROL), read the UEFI variable which indicates OVMF support for live migration;
  ii) if the variable indicates live migration is supported, issue a wrmsrl() to enable SEV live migration support.
The EFI live migration check is done using a late_initcall() callback, as sketched after this entry. Also, ensure that the __bss_decrypted section is marked as decrypted in the hypervisor's guest page encryption status tracking. Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Reviewed-by: Steve Rutherford <srutherford@google.com> Message-Id: <b4453e4c87103ebef12217d2505ea99a1c3e0f0f.1629726117.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
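A condensed sketch of that flow. MSR_KVM_MIGRATION_CONTROL and KVM_FEATURE_MIGRATION_CONTROL are from the KVM paravirt ABI; the KVM_MIGRATION_READY value, the UEFI variable name, and the GUID symbol are assumptions here:

```c
#include <linux/efi.h>
#include <asm/kvm_para.h>
#include <asm/msr.h>

/* GUID introduced by the "AMD Memory Encryption GUID" entry below;
 * value omitted here, see that patch. */
extern efi_guid_t amd_mem_encrypt_guid;

static bool efi_sev_migration_supported(void)
{
	efi_char16_t name[] = L"SevLiveMigrationEnabled";	/* assumed name */
	u8 enabled = 0;
	unsigned long size = sizeof(enabled);

	return efi_rt_services_supported(EFI_RT_SUPPORTED_GET_VARIABLE) &&
	       efi.get_variable(name, &amd_mem_encrypt_guid, NULL, &size,
				&enabled) == EFI_SUCCESS && enabled;
}

static int __init sev_migration_init(void)
{
	if (!kvm_para_has_feature(KVM_FEATURE_MIGRATION_CONTROL))
		return 0;
	/* Non-EFI guests enable the feature directly in kvm_init_platform(). */
	if (efi_enabled(EFI_RUNTIME_SERVICES) && efi_sev_migration_supported())
		wrmsrl(MSR_KVM_MIGRATION_CONTROL, KVM_MIGRATION_READY);
	return 0;
}
late_initcall(sev_migration_init);
```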
-
Ashish Kalra authored
Introduce a new AMD Memory Encryption GUID which is currently used for defining a new UEFI environment variable which indicates UEFI/OVMF support for the SEV live migration feature. This variable is setup when UEFI/OVMF detects host/hypervisor support for SEV live migration and later this variable is read by the kernel using EFI runtime services to verify if OVMF supports the live migration feature. Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Message-Id: <1cea22976d2208f34d47e0c1ce0ecac816c13111.1629726117.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Brijesh Singh authored
Invoke a hypercall when a memory region is changed from encrypted to decrypted and vice versa. The hypervisor needs to know the page encryption status during guest migration. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Steve Rutherford <srutherford@google.com> Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Reviewed-by: Borislav Petkov <bp@suse.de> Message-Id: <0a237d5bb08793916c7790a3e653a2cbe7485761.1629726117.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
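For orientation, a sketch of the guest-side notification, assuming the KVM_HC_MAP_GPA_RANGE hypercall ABI from the matching host-side KVM series (the attribute macro names are part of that assumption):

```c
#include <linux/kvm_para.h>

/* Tell the hypervisor that npages 4K pages starting at gpa changed
 * encryption status. Early SEV callers would use the VMMCALL-based
 * variant introduced in the next entry instead of kvm_hypercall3(). */
static void notify_enc_status_changed(unsigned long gpa,
				      unsigned long npages, bool enc)
{
	kvm_hypercall3(KVM_HC_MAP_GPA_RANGE, gpa, npages,
		       KVM_MAP_GPA_RANGE_PAGE_SZ_4K |
		       (enc ? KVM_MAP_GPA_RANGE_ENCRYPTED
			    : KVM_MAP_GPA_RANGE_DECRYPTED));
}
```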
-
Brijesh Singh authored
The KVM hypercall framework relies on the alternatives framework to patch VMCALL -> VMMCALL on AMD platforms. If a hypercall is made before apply_alternatives() is called, it defaults to VMCALL. That approach works fine for non-SEV guests: a VMCALL would cause a #UD, and the hypervisor would be able to decode the instruction and do the right thing. But when SEV is active, guest memory is encrypted with the guest key, and the hypervisor cannot decode the instruction bytes. To highlight the need for this interface, consider the flow around apply_alternatives(): setup_arch() calls init_hypervisor_platform(), which detects the hypervisor platform the kernel is running under; the hypervisor-specific initialization code can then make early hypercalls. For example, KVM-specific initialization in the SEV case will try to mark the "__bss_decrypted" section's encryption state via early page encryption status hypercalls. apply_alternatives() is only called much later, when setup_arch() calls check_bugs(), so we do need some kind of early, pre-alternatives hypercall interface. Other cases of pre-alternatives hypercalls include marking per-cpu GHCB pages as decrypted on SEV-ES, and per-cpu apf_reason, steal_time and kvm_apic_eoi as decrypted for SEV generally. Add an SEV-specific hypercall3 that unconditionally uses VMMCALL. The hypercall will be used by the SEV guest to notify the hypervisor of page encryption status changes. This kvm_sev_hypercall3() function is abstracted and used as follows: all these early hypercalls are made through the early_set_memory_XX() interfaces, which in turn invoke pv_ops (paravirt_ops). The early_set_memory_XX() -> pv_ops.mmu.notify_page_enc_status_changed() path is a generic interface and can easily have SEV, TDX, and any other future platform-specific abstractions added to it. Currently, the pv_ops.mmu.notify_page_enc_status_changed() callback is set up to invoke kvm_sev_hypercall3() in the SEV case; for TDX, it can similarly be set to a TDX-specific callback. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Steve Rutherford <srutherford@google.com> Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Message-Id: <6fd25c749205dd0b1eb492c60d41b124760cc6ae.1629726117.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
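A sketch of what such a VMMCALL-only helper looks like (modeled on the description above; treat it as illustrative rather than the exact patch):

```c
/* Bypasses alternatives patching: always emits VMMCALL. This is only
 * used in SEV guests, which run on AMD, so VMMCALL is always correct
 * and no pre-alternatives VMCALL #UD fixup is needed. */
static inline long kvm_sev_hypercall3(unsigned int nr, unsigned long p1,
				      unsigned long p2, unsigned long p3)
{
	long ret;

	asm volatile("vmmcall"
		     : "=a"(ret)
		     : "a"(nr), "b"(p1), "c"(p2), "d"(p3)
		     : "memory");
	return ret;
}
```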
-
- 02 Nov, 2021 1 commit
-
-
Paolo Bonzini authored
Merge https://github.com/kvm-riscv/linux into HEAD. Minor coccinelle warning fixes: 1) bool return warning fix 2) unneeded semicolon warning fix
-
- 01 Nov, 2021 2 commits
-
-
Bixuan Cui authored
Fix boolreturn.cocci warnings:
./arch/riscv/kvm/mmu.c:603:9-10: WARNING: return of 0/1 in function 'kvm_age_gfn' with return type bool
./arch/riscv/kvm/mmu.c:582:9-10: WARNING: return of 0/1 in function 'kvm_set_spte_gfn' with return type bool
./arch/riscv/kvm/mmu.c:621:9-10: WARNING: return of 0/1 in function 'kvm_test_age_gfn' with return type bool
./arch/riscv/kvm/mmu.c:568:9-10: WARNING: return of 0/1 in function 'kvm_unmap_gfn_range' with return type bool
Signed-off-by: Bixuan Cui <cuibixuan@linux.alibaba.com> Signed-off-by: Anup Patel <anup.patel@wdc.com>
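The shape of the fix, with an illustrative body (not the real mmu.c code):

```c
/* bool-returning functions should return true/false, not 1/0. */
bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
{
	if (!kvm->arch.pgd)
		return false;	/* was: return 0; */

	/* ... unmap the stage-2 range ... */
	return false;		/* was: return 0; */
}
```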
-
ran jianping authored
Eliminate the following coccinelle check warnings:
./arch/riscv/kvm/vcpu_sbi.c:169:2-3: Unneeded semicolon
./arch/riscv/kvm/vcpu_exit.c:397:2-3: Unneeded semicolon
./arch/riscv/kvm/vcpu_exit.c:687:2-3: Unneeded semicolon
./arch/riscv/kvm/vcpu_exit.c:645:2-3: Unneeded semicolon
./arch/riscv/kvm/vcpu.c:247:2-3: Unneeded semicolon
./arch/riscv/kvm/vcpu.c:284:2-3: Unneeded semicolon
./arch/riscv/kvm/vcpu_timer.c:123:2-3: Unneeded semicolon
./arch/riscv/kvm/vcpu_timer.c:170:2-3: Unneeded semicolon
Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: ran jianping <ran.jianping@zte.com.cn> Signed-off-by: Anup Patel <anup.patel@wdc.com>
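The pattern being removed, illustrated:

```c
/* Illustrative: coccinelle flags a stray semicolon after a switch block. */
static int demo_semicolon(int n)
{
	switch (n) {
	case 0:
		return 0;
	default:
		return -EINVAL;
	};	/* <- this trailing ';' is the "unneeded semicolon" the fix deletes */
}
```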
-
- 31 Oct, 2021 4 commits
-
-
Paolo Bonzini authored
Merge tag 'kvm-s390-next-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Fixes and Features for 5.16
- SIGP Fixes
- initial preparations for lazy destroy of secure VMs
- storage key improvements/fixes
- Log the guest CPNC
-
Anup Patel authored
The parameter passed to the HFENCE.GVMA instruction in the rs1 register is the guest physical address right-shifted by 2 (i.e., divided by 4). Unfortunately, we overlooked the semantics of the rs1 register for the HFENCE.GVMA instruction and never right-shifted the guest physical address by 2. This issue did not manifest for hypervisors until now because:
1) Currently, only __kvm_riscv_hfence_gvma_all() and SBI HFENCE calls are used to invalidate the TLB.
2) All H-extension implementations (such as QEMU, Spike, Rocket Core FPGA, etc.) that we tried until now were conservatively flushing everything upon any HFENCE.GVMA instruction.
This patch fixes the GPA passed to the __kvm_riscv_hfence_gvma_vmid_gpa() and __kvm_riscv_hfence_gvma_gpa() functions. Fixes: fd7bb4a2 ("RISC-V: KVM: Implement VMID allocator") Reported-by: Ian Huang <ihuang@ventanamicro.com> Signed-off-by: Anup Patel <anup.patel@wdc.com> Message-Id: <20211026170136.2147619-4-anup.patel@wdc.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
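A sketch of the corrected flush helper, assuming a toolchain that accepts the hypervisor-extension mnemonic (the in-tree code may hand-encode the instruction instead):

```c
/* rs1 takes the guest physical address divided by 4, rs2 the VMID. */
static inline void hfence_gvma_vmid_gpa(unsigned long gpa,
					unsigned long vmid)
{
	asm volatile("hfence.gvma %0, %1"
		     : : "r"(gpa >> 2), "r"(vmid) : "memory");
}
```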
-
Anup Patel authored
The timer and SBI virtualization are already in separate sources. In the future, vector and AIA virtualization will also be added as separate sources. To align with this modularity, factor out FP virtualization into separate sources. Signed-off-by: Anup Patel <anup.patel@wdc.com> Message-Id: <20211026170136.2147619-3-anup.patel@wdc.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Merge git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD. KVM/arm64 updates for Linux 5.16:
- More progress on the protected VM front, now with the full fixed feature set as well as the limitation of some hypercalls after initialisation
- Cleanup of the RAZ/WI sysreg handling, which was pointlessly complicated
- Fixes for the vgic placement in the IPA space, together with a bunch of selftests
- More memcg accounting of the memory allocated on behalf of a guest
- Timer and vgic selftests
- Workarounds for the Apple M1 broken vgic implementation
- KConfig cleanups
- New kvmarm.mode=none option, for those who really dislike us
-
- 27 Oct, 2021 3 commits
-
-
Collin Walling authored
The diag 318 data contains values that denote information regarding the guest's environment. Currently, it is unnecessarily difficult to observe this value (manually-inserted debug statements, gdb stepping, memory dumping, etc.). It's useful to observe this information to obtain an at-a-glance view of the guest's environment, so let's add a simple VCPU event that prints the CPNC to the s390dbf logs. Signed-off-by: Collin Walling <walling@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Link: https://lore.kernel.org/r/20211027025451.290124-1-walling@linux.ibm.com [borntraeger@de.ibm.com]: change debug level to 3 Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
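The event itself is essentially a one-liner; a sketch, where the handler name and format string are assumptions (debug level 3 per the note above):

```c
/* Hypothetical handler shape; the CPNC is the top byte of the
 * 8-byte diag 318 information. */
static int handle_diag_318(struct kvm_vcpu *vcpu, u64 info)
{
	u8 cpnc = info >> 56;

	VCPU_EVENT(vcpu, 3, "diag 318 info: cpnc %u", cpnc);
	return 0;
}
```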
-
Claudio Imbrenda authored
Introduce variants of the convert and destroy page functions that also clear the PG_arch_1 bit used to mark them as secure pages. The PG_arch_1 flag is always allowed to overindicate; using the new functions introduced here makes it possible to reduce the extent of overindication and thus improve performance. These new functions can only be called on pages for which a reference is already being held. Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Link: https://lore.kernel.org/r/20210920132502.36111-7-imbrenda@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Janis Schoetterl-Glausch authored
If handle_sske cannot set the storage key, because there is no page table entry or no present large page entry, it calls fixup_user_fault. However, currently, if the call succeeds, handle_sske returns -EAGAIN, without having set the storage key. Instead, retry by continuing the loop without incrementing the address. The same issue in handle_pfmf was fixed by a11bdb1a ("KVM: s390: Fix pfmf and conditional skey emulation"). Fixes: bd096f64 ("KVM: s390: Add skey emulation fault handling") Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Link: https://lore.kernel.org/r/20211022152648.26536-1-scgl@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
- 25 Oct, 2021 20 commits
-
-
Paolo Bonzini authored
pvclock_gtod_sync_lock is completely gone in Linux 5.16. Include this fix in the kvm/next history to record that the syzkaller report is not valid there. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
David Woodhouse authored
On the preemption path when updating a Xen guest's runstate times, this lock is taken inside the scheduler rq->lock, which is a raw spinlock. This was shown in a lockdep warning:

[ 89.138354] =============================
[ 89.138356] [ BUG: Invalid wait context ]
[ 89.138358] 5.15.0-rc5+ #834 Tainted: G S I E
[ 89.138360] -----------------------------
[ 89.138361] xen_shinfo_test/2575 is trying to lock:
[ 89.138363] ffffa34a0364efd8 (&kvm->arch.pvclock_gtod_sync_lock){....}-{3:3}, at: get_kvmclock_ns+0x1f/0x130 [kvm]
[ 89.138442] other info that might help us debug this:
[ 89.138444] context-{5:5}
[ 89.138445] 4 locks held by xen_shinfo_test/2575:
[ 89.138447] #0: ffff972bdc3b8108 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0x77/0x6f0 [kvm]
[ 89.138483] #1: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_ioctl_run+0xdc/0x8b0 [kvm]
[ 89.138526] #2: ffff97331fdbac98 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0xff/0xbd0
[ 89.138534] #3: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_put+0x26/0x170 [kvm]
...
[ 89.138695] get_kvmclock_ns+0x1f/0x130 [kvm]
[ 89.138734] kvm_xen_update_runstate+0x14/0x90 [kvm]
[ 89.138783] kvm_xen_update_runstate_guest+0x15/0xd0 [kvm]
[ 89.138830] kvm_arch_vcpu_put+0xe6/0x170 [kvm]
[ 89.138870] kvm_sched_out+0x2f/0x40 [kvm]
[ 89.138900] __schedule+0x5de/0xbd0

Cc: stable@vger.kernel.org Reported-by: syzbot+b282b65c2c68492df769@syzkaller.appspotmail.com Fixes: 30b5c851 ("KVM: x86/xen: Add support for vCPU runstate information") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <1b02a06421c17993df337493a68ba923f3bd5c0f.camel@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
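The class of fix, given that the preceding entry names pvclock_gtod_sync_lock: a lock that nests under the scheduler's raw rq->lock must itself be a raw spinlock. A sketch:

```c
/* In struct kvm_arch: was spinlock_t, becomes raw_spinlock_t so it
 * may legally nest under the scheduler's raw rq->lock on the
 * preemption (vCPU put) path. */
raw_spinlock_t pvclock_gtod_sync_lock;

/* Call sites switch to the raw variants accordingly: */
unsigned long flags;
raw_spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags);
/* ... read/update the master clock ... */
raw_spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags);
```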
-
David Edmondson authored
When passing the failing address and size out to user space, SGX must ensure not to trample on the earlier fields of the emulation_failure sub-union of struct kvm_run. Signed-off-by: David Edmondson <david.edmondson@oracle.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210920103737.2696756-5-david.edmondson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
David Edmondson authored
Should instruction emulation fail, include the VM exit reason, etc. in the emulation_failure data passed to userspace, in order that the VMM can report it as a debugging aid when describing the failure. Suggested-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: David Edmondson <david.edmondson@oracle.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210920103737.2696756-4-david.edmondson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
David Edmondson authored
Extend the get_exit_info static call to provide the reason for the VM exit. Modify relevant trace points to use this rather than extracting the reason in the caller. Signed-off-by: David Edmondson <david.edmondson@oracle.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210920103737.2696756-3-david.edmondson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
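A sketch of the extended static-call signature (parameter names assumed):

```c
/* *reason is the new output; trace points consume it directly
 * instead of re-extracting the exit reason themselves. */
void (*get_exit_info)(struct kvm_vcpu *vcpu, u32 *reason,
		      u64 *info1, u64 *info2,
		      u32 *intr_info, u32 *error_code);
```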
-
David Edmondson authored
Until more flags for kvm_run.emulation_failure are defined, it is undetermined whether new payload elements corresponding to those flags will be additive or alternative. As a hint to userspace that an alternative is possible, wrap the current payload elements in a union. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Edmondson <david.edmondson@oracle.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210920103737.2696756-2-david.edmondson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
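Roughly, the resulting kvm_run payload layout (a sketch, not the verbatim uAPI):

```c
/* KVM_EXIT_EMULATION_FAILURE payload inside struct kvm_run */
struct {
	__u32 suberror;
	__u32 ndata;
	__u64 flags;
	union {
		/* valid when the instruction-bytes flag is set */
		struct {
			__u8 insn_size;
			__u8 insn_bytes[15];
		};
		/* future flags may define alternative members here */
	};
} emulation_failure;
```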
-
Eric Farman authored
This capability exists, but we don't record anything when userspace enables it. Let's refactor that code so that a note can be made in the debug logs that it was enabled. Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20211008203112.1979843-7-farman@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Eric Farman authored
The Principles of Operation describe the various reasons that each individual SIGP order might be rejected, and the status bits that are set for each condition. For example, for the Set Architecture order, it states: "If it is not true that all other CPUs in the configuration are in the stopped or check-stop state, ... bit 54 (incorrect state) ... is set to one." However, it also states: "... if the CZAM facility is installed, ... bit 55 (invalid parameter) ... is set to one." Since the Configuration-z/Architecture-Architectural Mode (CZAM) facility is unconditionally presented, there is no need to examine each VCPU to determine if it is started/stopped. The order can simply be rejected outright with the Invalid Parameter bit. Fixes: b697e435 ("KVM: s390: Support Configuration z/Architecture Mode") Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Link: https://lore.kernel.org/r/20211008203112.1979843-2-farman@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
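A sketch of the resulting rejection path (SIGP status constants per the s390 definitions; the surrounding switch is elided):

```c
case SIGP_SET_ARCHITECTURE:
	/* CZAM is always installed: reject with "invalid parameter"
	 * (bit 55) without checking whether other VCPUs are stopped. */
	*status_reg &= 0xffffffff00000000UL;	/* clear status bits */
	*status_reg |= SIGP_STATUS_INVALID_PARAMETER;
	rc = SIGP_CC_STATUS_STORED;
	break;
```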
-
Claudio Imbrenda authored
Improve make_secure_pte to avoid stalls when the system is heavily overcommitted. This was especially problematic in kvm_s390_pv_unpack, because of the loop over all pages that needed unpacking. Due to the locks being held, it was not possible to simply replace uv_call with uv_call_sched. A more complex approach was needed, in which uv_call is replaced with __uv_call, which does not loop. When the UVC needs to be executed again, -EAGAIN is returned, and the caller (or its caller) will try again. When -EAGAIN is returned, the path is the same as when the page is in writeback (and the writeback check is also performed, which is harmless). Fixes: 214d9bbc ("s390/mm: provide memory management functions for protected KVM guests") Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Link: https://lore.kernel.org/r/20210920132502.36111-5-imbrenda@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
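The shape of the change inside make_secure_pte, sketched:

```c
/*
 * __uv_call() executes the UVC once and reports the condition code
 * instead of looping internally. On "busy", hand -EAGAIN back to the
 * caller, which can drop its locks and retry (same path as a page
 * under writeback).
 */
cc = __uv_call(0, (u64)uvcb);
if (cc == UVC_CC_BUSY)
	return -EAGAIN;
```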
-
Claudio Imbrenda authored
When the system is heavily overcommitted, kvm_s390_pv_init_vm might generate stall notifications. Fix this by using uv_call_sched instead of just uv_call. This is ok because we are not holding spinlocks. Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Fixes: 214d9bbc ("s390/mm: provide memory management functions for protected KVM guests") Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Message-Id: <20210920132502.36111-4-imbrenda@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Claudio Imbrenda authored
If kvm_s390_pv_destroy_cpu is called more than once, we risk calling free_page on a random page, since the sidad field is aliased with the gbea, which is not guaranteed to be zero. This can happen, for example, if userspace calls the KVM_PV_DISABLE IOCTL, and it fails, and then userspace calls the same IOCTL again. This scenario is only possible if KVM has some serious bug or if the hardware is broken. The solution is to simply return successfully immediately if the vCPU was already non-secure. Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Fixes: 19e12277 ("KVM: S390: protvirt: Introduce instruction data area bounce buffer") Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Message-Id: <20210920132502.36111-3-imbrenda@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
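The guard is a simple idempotence check; a sketch, assuming kvm_s390_pv_cpu_get_handle() as the "is this vCPU secure" test:

```c
int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu, u16 *rc, u16 *rrc)
{
	/* Already non-secure: the sidad field aliases the gbea and may
	 * hold garbage, so don't free anything; just report success. */
	if (!kvm_s390_pv_cpu_get_handle(vcpu))
		return 0;

	/* ... otherwise destroy the secure CPU and free the sida page ... */
	return 0;
}
```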
-
Claudio Imbrenda authored
Add macros to describe the 4 possible CC values returned by the UVC instruction. Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Message-Id: <20210920132502.36111-2-imbrenda@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
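The values follow the UVC condition codes directly; a sketch of the macros (comments paraphrase their meaning):

```c
/* Condition codes returned by the UVC instruction. */
#define UVC_CC_OK	0	/* executed successfully */
#define UVC_CC_ERROR	1	/* ended with an error */
#define UVC_CC_BUSY	2	/* busy, try again */
#define UVC_CC_PARTIAL	3	/* partial completion, re-issue */
```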
-
David Hildenbrand authored
We already optimize get_guest_storage_key() to assume that if we don't have a PTE table and don't have a huge page mapped that the storage key is 0. Similarly, optimize reset_guest_reference_bit() to simply do nothing if there is no PTE table and no huge page mapped. Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20210909162248.14969-10-david@redhat.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
David Hildenbrand authored
We already optimize get_guest_storage_key() to assume that if we don't have a PTE table and don't have a huge page mapped that the storage key is 0. Similarly, optimize set_guest_storage_key() to simply do nothing in case the key to set is 0. Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20210909162248.14969-9-david@redhat.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
David Hildenbrand authored
pte_map_lock() is sufficient. Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20210909162248.14969-8-david@redhat.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
David Hildenbrand authored
We should not walk/touch page tables outside of VMA boundaries when holding only the mmap sem in read mode. Evil user space can modify the VMA layout just before this function runs and e.g., trigger races with page table removal code since commit dd2283f2 ("mm: mmap: zap pages with read mmap_sem in munmap"). find_vma() does not check if the address is >= the VMA start address; use vma_lookup() instead. Fixes: 214d9bbc ("s390/mm: provide memory management functions for protected KVM guests") Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Link: https://lore.kernel.org/r/20210909162248.14969-6-david@redhat.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
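The one-line difference, sketched:

```c
struct vm_area_struct *vma;

/* find_vma() returns the first VMA with vm_end > addr, possibly one
 * that starts entirely *above* addr; vma_lookup() additionally
 * requires vm_start <= addr. */
vma = vma_lookup(mm, uaddr);	/* was: vma = find_vma(mm, uaddr); */
if (!vma)
	return -EFAULT;		/* uaddr is outside any VMA */
```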
-
David Hildenbrand authored
There are multiple things broken about our storage key handling functions:
1. We should not walk/touch page tables outside of VMA boundaries when holding only the mmap sem in read mode. Evil user space can modify the VMA layout just before this function runs and e.g., trigger races with page table removal code since commit dd2283f2 ("mm: mmap: zap pages with read mmap_sem in munmap"). gfn_to_hva() will only translate using KVM memory regions, but won't validate the VMA.
2. We should not allocate page tables outside of VMA boundaries: if evil user space decides to map hugetlbfs to these ranges, bad things will happen because we suddenly have PTE or PMD page tables where we shouldn't have them.
3. We don't handle large PUDs that might suddenly appear inside our page table hierarchy.
Don't manually allocate page tables, properly validate that we have a VMA, and bail out on pud_large(). All callers of the page table handling functions, except get_guest_storage_key(), call fixup_user_fault() in case they receive an -EFAULT and retry; this will allocate the necessary page tables if required. To keep get_guest_storage_key() working as expected, without requiring kvm_s390_get_skeys() to call fixup_user_fault(), distinguish between "there is simply no page table or huge page yet and the key is assumed to be 0" and "this is a fault to be reported". Although commit 637ff9ef ("s390/mm: Add huge pmd storage key handling") introduced most of the affected code, it was actually already broken before when using get_locked_pte() without any VMA checks. Note: ever since commit 637ff9ef ("s390/mm: Add huge pmd storage key handling") we can no longer set a guest storage key (for example from QEMU during VM live migration) without actually resolving a fault. Although we would have created most page tables, we would choke on the !pmd_present(), requiring a call to fixup_user_fault(). I would have thought that this is problematic in combination with postcopy live migration ... but nobody noticed and this patch doesn't change the situation. So maybe it's just fine. Fixes: 9fcf93b5 ("KVM: S390: Create helper function get_guest_storage_key") Fixes: 24d5dd02 ("s390/kvm: Provide function for setting the guest storage key") Fixes: a7e19ab5 ("KVM: s390: handle missing storage-key facility") Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20210909162248.14969-5-david@redhat.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
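The retry pattern the callers rely on, sketched (assuming the mmap lock is held for read around the sequence):

```c
int rc;
bool unlocked = false;

mmap_read_lock(mm);
while (1) {
	rc = set_guest_storage_key(mm, hva, key, false);
	if (rc != -EFAULT)
		break;
	/* Fault the page in (allocating page tables within the VMA
	 * as needed), then retry the key operation. */
	rc = fixup_user_fault(mm, hva, FAULT_FLAG_WRITE, &unlocked);
	if (rc)
		break;
}
mmap_read_unlock(mm);
```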
-
David Hildenbrand authored
We should not walk/touch page tables outside of VMA boundaries when holding only the mmap sem in read mode. Evil user space can modify the VMA layout just before this function runs and e.g., trigger races with page table removal code since commit dd2283f2 ("mm: mmap: zap pages with read mmap_sem in munmap"). gfn_to_hva() will only translate using KVM memory regions, but won't validate the VMA. Further, we should not allocate page tables outside of VMA boundaries: if evil user space decides to map hugetlbfs to these ranges, bad things will happen because we suddenly have PTE or PMD page tables where we shouldn't have them. Similarly, we have to check if we suddenly find a hugetlbfs VMA, before calling get_locked_pte(). Fixes: 2d42f947 ("s390/kvm: Add PGSTE manipulation functions") Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20210909162248.14969-4-david@redhat.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
David Hildenbrand authored
... otherwise we will try unlocking a spinlock that was never locked via a garbage pointer. At the time we reach this code path, we usually successfully looked up a PGSTE already; however, evil user space could have manipulated the VMA layout in the meantime and triggered removal of the page table. Fixes: 1e133ab2 ("s390/mm: split arch/s390/mm/pgtable.c") Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20210909162248.14969-3-david@redhat.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
David Hildenbrand authored
We should not walk/touch page tables outside of VMA boundaries when holding only the mmap sem in read mode. Evil user space can modify the VMA layout just before this function runs and e.g., trigger races with page table removal code since commit dd2283f2 ("mm: mmap: zap pages with read mmap_sem in munmap"). The pure presence in our guest_to_host radix tree does not imply that there is a VMA. Further, we should not allocate page tables (via get_locked_pte()) outside of VMA boundaries: if evil user space decides to map hugetlbfs to these ranges, bad things will happen because we suddenly have PTE or PMD page tables where we shouldn't have them. Similarly, we have to check if we suddenly find a hugetlbfs VMA, before calling get_locked_pte(). Note that gmap_discard() is different: zap_page_range()->unmap_single_vma() makes sure to stay within VMA boundaries. Fixes: b31288fa ("s390/kvm: support collaborative memory management") Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20210909162248.14969-2-david@redhat.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
- 22 Oct, 2021 5 commits
-
-
Jim Mattson authored
Though gcc conveniently compiles a simple memset to "rep stos," clang prefers to call the libc version of memset. If a test is dynamically linked, the libc memset isn't available in L1 (nor is the PLT or the GOT, for that matter). Even if the test is statically linked, the libc memset may choose to use some CPU features, like AVX, which may not be enabled in L1. Note that __builtin_memset doesn't solve the problem, because (a) the compiler is free to call memset anyway, and (b) __builtin_memset may also choose to use features like AVX, which may not be available in L1. To avoid a myriad of problems, use an explicit "rep stos" to clear the VMCB in generic_svm_setup(), which is called both from L0 and L1. Reported-by: Ricardo Koller <ricarkol@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Fixes: 20ba262f ("selftests: KVM: AMD Nested test infrastructure") Message-Id: <20210930003649.4026553-1-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
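The gist of the workaround, sketched (helper name illustrative):

```c
#include <stddef.h>

/* Clear memory without calling memset(): plain "rep stosb" needs no
 * libc, no PLT/GOT, and no CPU features (e.g. AVX) that L1 might not
 * have enabled. Stores AL (0) to [RDI], RCX times. */
static inline void clear_memory(void *p, size_t n)
{
	asm volatile("rep stosb"
		     : "+D"(p), "+c"(n)
		     : "a"(0)
		     : "memory");
}

/* e.g. in generic_svm_setup(): clear_memory(vmcb, sizeof(*vmcb)); */
```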
-
Jim Mattson authored
This variable was renamed to kvm_has_noapic_vcpu in commit 6e4e3b4d ("KVM: Stop using deprecated jump label APIs"). Signed-off-by: Jim Mattson <jmattson@google.com> Message-Id: <20211021185449.3471763-1-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Unregister KVM's posted interrupt wakeup handler during unsetup so that a spurious interrupt that arrives after kvm_intel.ko is unloaded doesn't call into freed memory. Fixes: bf9f6ac8 ("KVM: Update Posted-Interrupts Descriptor when vCPU is blocked") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211009001107.3936588-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Add a synchronize_rcu() after clearing the posted interrupt wakeup handler to ensure all readers, i.e. in-flight IRQ handlers, see the new handler before returning to the caller. If the caller is an exiting module and is unregistering its handler, failure to wait could result in the IRQ handler jumping into an unloaded module. The registration path doesn't require synchronization, as it's the caller's responsibility to not generate interrupts it cares about until after its handler is registered. Fixes: f6b3c72c ("x86/irq: Define a global vector for VT-d Posted-Interrupts") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211009001107.3936588-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
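Roughly the shape of the fix in the IRQ code:

```c
void (*kvm_posted_intr_wakeup_handler)(void);

static void dummy_handler(void) { }

void kvm_set_posted_intr_wakeup_handler(void (*handler)(void))
{
	if (handler) {
		kvm_posted_intr_wakeup_handler = handler;
	} else {
		kvm_posted_intr_wakeup_handler = dummy_handler;
		/* Ensure all in-flight IRQ handlers observe the reset
		 * handler before, e.g., kvm_intel.ko is unloaded. */
		synchronize_rcu();
	}
}
```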
-
Sean Christopherson authored
Use a rw_semaphore instead of a mutex to coordinate APICv updates so that vCPUs responding to requests can take the lock for read and run in parallel. Using a mutex forces serialization of vCPUs even though kvm_vcpu_update_apicv() only touches data local to that vCPU or is protected by a different lock, e.g. SVM's ir_list_lock. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211022004927.1448382-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
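A sketch of the resulting locking, assuming apicv_update_lock as the field name:

```c
/* Writers (toggling APICv activation VM-wide) take it exclusively: */
down_write(&kvm->arch.apicv_update_lock);
/* ... update activation state, make requests on all vCPUs ... */
up_write(&kvm->arch.apicv_update_lock);

/* vCPUs responding to the request only need shared access, so they
 * can all run kvm_vcpu_update_apicv() in parallel: */
down_read(&vcpu->kvm->arch.apicv_update_lock);
kvm_vcpu_update_apicv(vcpu);
up_read(&vcpu->kvm->arch.apicv_update_lock);
```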
-