- 13 Aug, 2021 3 commits
-
-
Sean Christopherson authored
Add yet another spinlock for the TDP MMU and take it when marking indirect shadow pages unsync. When using the TDP MMU and L1 is running L2(s) with nested TDP, KVM may encounter shadow pages for the TDP entries managed by L1 (controlling L2) when handling a TDP MMU page fault. The unsync logic is not thread safe, e.g. the kvm_mmu_page fields are not atomic, and misbehaves when a shadow page is marked unsync via a TDP MMU page fault, which runs with mmu_lock held for read, not write. Lack of a critical section manifests most visibly as an underflow of unsync_children in clear_unsync_child_bit() due to unsync_children being corrupted when multiple CPUs write it without a critical section and without atomic operations. But underflow is the best case scenario. The worst case scenario is that unsync_children prematurely hits '0' and leads to guest memory corruption due to KVM neglecting to properly sync shadow pages. Use an entirely new spinlock even though piggybacking tdp_mmu_pages_lock would functionally be ok. Usurping the lock could degrade performance when building upper level page tables on different vCPUs, especially since the unsync flow could hold the lock for a comparatively long time depending on the number of indirect shadow pages and the depth of the paging tree. For simplicity, take the lock for all MMUs, even though KVM could fairly easily know that mmu_lock is held for write. If mmu_lock is held for write, there cannot be contention for the inner spinlock, and marking shadow pages unsync across multiple vCPUs will be slow enough that bouncing the kvm_arch cacheline should be in the noise. Note, even though L2 could theoretically be given access to its own EPT entries, a nested MMU must hold mmu_lock for write and thus cannot race against a TDP MMU page fault. I.e. the additional spinlock only _needs_ to be taken by the TDP MMU, as opposed to being taken by any MMU for a VM that is running with the TDP MMU enabled. Holding mmu_lock for read also prevents the indirect shadow page from being freed. But as above, keep it simple and always take the lock. Alternative #1, the TDP MMU could simply pass "false" for can_unsync and effectively disable unsync behavior for nested TDP. Write protecting leaf shadow pages is unlikely to noticeably impact traditional L1 VMMs, as such VMMs typically don't modify TDP entries, but the same may not hold true for non-standard use cases and/or VMMs that are migrating physical pages (from L1's perspective). Alternative #2, the unsync logic could be made thread safe. In theory, simply converting all relevant kvm_mmu_page fields to atomics and using atomic bitops for the bitmap would suffice. However, (a) an in-depth audit would be required, (b) the code churn would be substantial, and (c) legacy shadow paging would incur additional atomic operations in performance sensitive paths for no benefit (to legacy shadow paging). Fixes: a2855afc ("KVM: x86/mmu: Allow parallel page faults for the TDP MMU") Cc: stable@vger.kernel.org Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210812181815.3378104-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Set the min_level for the TDP iterator at the root level when zapping all SPTEs to optimize the iterator's try_step_down(). Zapping a non-leaf SPTE will recursively zap all its children, thus there is no need for the iterator to attempt to step down. This avoids rereading the top-level SPTEs after they are zapped by causing try_step_down() to short-circuit. In most cases, optimizing try_step_down() will be in the noise as the cost of zapping SPTEs completely dominates the overall time. The optimization is however helpful if the zap occurs with relatively few SPTEs, e.g. if KVM is zapping in response to multiple memslot updates when userspace is adding and removing read-only memslots for option ROMs. In that case, the task doing the zapping likely isn't a vCPU thread, but it still holds mmu_lock for read and thus can be a noisy neighbor of sorts. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210812181414.3376143-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Pass "all ones" as the end GFN to signal "zap all" for the TDP MMU and really zap all SPTEs in this case. As is, zap_gfn_range() skips non-leaf SPTEs whose range exceeds the range to be zapped. If shadow_phys_bits is not aligned to the range size of top-level SPTEs, e.g. 512gb with 4-level paging, the "zap all" flows will skip top-level SPTEs whose range extends beyond shadow_phys_bits and leak their SPs when the VM is destroyed. Use the current upper bound (based on host.MAXPHYADDR) to detect that the caller wants to zap all SPTEs, e.g. instead of using the max theoretical gfn, 1 << (52 - 12). The more precise upper bound allows the TDP iterator to terminate its walk earlier when running on hosts with MAXPHYADDR < 52. Add a WARN on kmv->arch.tdp_mmu_pages when the TDP MMU is destroyed to help future debuggers should KVM decide to leak SPTEs again. The bug is most easily reproduced by running (and unloading!) KVM in a VM whose host.MAXPHYADDR < 39, as the SPTE for gfn=0 will be skipped. ============================================================================= BUG kvm_mmu_page_header (Not tainted): Objects remaining in kvm_mmu_page_header on __kmem_cache_shutdown() ----------------------------------------------------------------------------- Slab 0x000000004d8f7af1 objects=22 used=2 fp=0x00000000624d29ac flags=0x4000000000000200(slab|zone=1) CPU: 0 PID: 1582 Comm: rmmod Not tainted 5.14.0-rc2+ #420 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: dump_stack_lvl+0x45/0x59 slab_err+0x95/0xc9 __kmem_cache_shutdown.cold+0x3c/0x158 kmem_cache_destroy+0x3d/0xf0 kvm_mmu_module_exit+0xa/0x30 [kvm] kvm_arch_exit+0x5d/0x90 [kvm] kvm_exit+0x78/0x90 [kvm] vmx_exit+0x1a/0x50 [kvm_intel] __x64_sys_delete_module+0x13f/0x220 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: faaf05b0 ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU") Cc: stable@vger.kernel.org Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210812181414.3376143-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 05 Aug, 2021 1 commit
-
-
Sean Christopherson authored
Take a signed 'long' instead of an 'unsigned long' for the number of pages to add/subtract to the total number of pages used by the MMU. This fixes a zero-extension bug on 32-bit kernels that effectively corrupts the per-cpu counter used by the shrinker. Per-cpu counters take a signed 64-bit value on both 32-bit and 64-bit kernels, whereas kvm_mod_used_mmu_pages() takes an unsigned long and thus an unsigned 32-bit value on 32-bit kernels. As a result, the value used to adjust the per-cpu counter is zero-extended (unsigned -> signed), not sign-extended (signed -> signed), and so KVM's intended -1 gets morphed to 4294967295 and effectively corrupts the counter. This was found by a staggering amount of sheer dumb luck when running kvm-unit-tests on a 32-bit KVM build. The shrinker just happened to kick in while running tests and do_shrink_slab() logged an error about trying to free a negative number of objects. The truly lucky part is that the kernel just happened to be a slightly stale build, as the shrinker no longer yells about negative objects as of commit 18bb473e ("mm: vmscan: shrink deferred objects proportional to priority"). vmscan: shrink_slab: mmu_shrink_scan+0x0/0x210 [kvm] negative objects to delete nr=-858993460 Fixes: bc8a3d89 ("kvm: mmu: Fix overflow on kvm mmu page limit calculation") Cc: stable@vger.kernel.org Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210804214609.1096003-1-seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 04 Aug, 2021 4 commits
-
-
Maxim Levitsky authored
The test was mistakenly using addr_gpa2hva on a gva and that happened to work accidentally. Commit 106a2e76 ("KVM: selftests: Lower the min virtual address for misc page allocations") revealed this bug. Fixes: 2c7f76b4 ("selftests: kvm: Add basic Hyper-V clocksources tests", 2021-03-18) Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210804112057.409498-1-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Mingwei Zhang authored
KVM SEV code uses bitmaps to manage ASID states. ASID 0 was always skipped because it is never used by VM. Thus, in existing code, ASID value and its bitmap postion always has an 'offset-by-1' relationship. Both SEV and SEV-ES shares the ASID space, thus KVM uses a dynamic range [min_asid, max_asid] to handle SEV and SEV-ES ASIDs separately. Existing code mixes the usage of ASID value and its bitmap position by using the same variable called 'min_asid'. Fix the min_asid usage: ensure that its usage is consistent with its name; allocate extra size for ASID 0 to ensure that each ASID has the same value with its bitmap position. Add comments on ASID bitmap allocation to clarify the size change. Signed-off-by: Mingwei Zhang <mizhang@google.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Marc Orr <marcorr@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Alper Gun <alpergun@google.com> Cc: Dionna Glaze <dionnaglaze@google.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Vipin Sharma <vipinsh@google.com> Cc: Peter Gonda <pgonda@google.com> Cc: Joerg Roedel <joro@8bytes.org> Message-Id: <20210802180903.159381-1-mizhang@google.com> [Fix up sev_asid_free to also index by ASID, as suggested by Sean Christopherson, and use nr_asids in sev_cpu_init. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Use the raw ASID, not ASID-1, when nullifying the last used VMCB when freeing an SEV ASID. The consumer, pre_sev_run(), indexes the array by the raw ASID, thus KVM could get a false negative when checking for a different VMCB if KVM manages to reallocate the same ASID+VMCB combo for a new VM. Note, this cannot cause a functional issue _in the current code_, as pre_sev_run() also checks which pCPU last did VMRUN for the vCPU, and last_vmentry_cpu is initialized to -1 during vCPU creation, i.e. is guaranteed to mismatch on the first VMRUN. However, prior to commit 8a14fe4f ("kvm: x86: Move last_cpu into kvm_vcpu_arch as last_vmentry_cpu"), SVM tracked pCPU on its own and zero-initialized the last_cpu variable. Thus it's theoretically possible that older versions of KVM could miss a TLB flush if the first VMRUN is on pCPU0 and the ASID and VMCB exactly match those of a prior VM. Fixes: 70cd94e6 ("KVM: SVM: VMRUN should use associated ASID when SEV is enabled") Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
KVM creates a debugfs directory for each VM in order to store statistics about the virtual machine. The directory name is built from the process pid and a VM fd. While generally unique, it is possible to keep a file descriptor alive in a way that causes duplicate directories, which manifests as these messages: [ 471.846235] debugfs: Directory '20245-4' with parent 'kvm' already present! Even though this should not happen in practice, it is more or less expected in the case of KVM for testcases that call KVM_CREATE_VM and close the resulting file descriptor repeatedly and in parallel. When this happens, debugfs_create_dir() returns an error but kvm_create_vm_debugfs() goes on to allocate stat data structs which are later leaked. The slow memory leak was spotted by syzkaller, where it caused OOM reports. Since the issue only affects debugfs, do a lookup before calling debugfs_create_dir, so that the message is downgraded and rate-limited. While at it, ensure kvm->debugfs_dentry is NULL rather than an error if it is not created. This fixes kvm_destroy_vm_debugfs, which was not checking IS_ERR_OR_NULL correctly. Cc: stable@vger.kernel.org Fixes: 536a6f88 ("KVM: Create debugfs dir and stat files for each VM") Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 03 Aug, 2021 4 commits
-
-
Vitaly Kuznetsov authored
Check that #UD is raised if bit 16 is clear in HYPERV_CPUID_FEATURES.EDX and an 'XMM fast' hypercall is issued. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Siddharth Chandrasekaran <sidcha@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210730122625.112848-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
TLFS states that "Availability of the XMM fast hypercall interface is indicated via the “Hypervisor Feature Identification” CPUID Leaf (0x40000003, see section 2.4.4) ... Any attempt to use this interface when the hypervisor does not indicate availability will result in a #UD fault." Implement the check for 'strict' mode (KVM_CAP_HYPERV_ENFORCE_CPUID). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Siddharth Chandrasekaran <sidcha@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210730122625.112848-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
Hypercall failures are unusual with potentially far going consequences so it would be useful to see their results when tracing. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Siddharth Chandrasekaran <sidcha@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210730122625.112848-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
In case guest doesn't have access to the particular hypercall we can avoid reading XMM registers. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Siddharth Chandrasekaran <sidcha@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210730122625.112848-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 30 Jul, 2021 1 commit
-
-
Paolo Bonzini authored
Once an exception has been injected, any side effects related to the exception (such as setting CR2 or DR6) have been taked place. Therefore, once KVM sets the VM-entry interruption information field or the AMD EVENTINJ field, the next VM-entry must deliver that exception. Pending interrupts are processed after injected exceptions, so in theory it would not be a problem to use KVM_INTERRUPT when an injected exception is present. However, DOSEMU is using run->ready_for_interrupt_injection to detect interrupt windows and then using KVM_SET_SREGS/KVM_SET_REGS to inject the interrupt manually. For this to work, the interrupt window must be delayed after the completion of the previous event injection. Cc: stable@vger.kernel.org Reported-by: Stas Sergeev <stsp2@yandex.ru> Tested-by: Stas Sergeev <stsp2@yandex.ru> Fixes: 71cc849b ("KVM: x86: Fix split-irqchip vs interrupt injection window request") Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 27 Jul, 2021 10 commits
-
-
Paolo Bonzini authored
The arguments to the KVM_CLEAR_DIRTY_LOG ioctl include a pointer, therefore it needs a compat ioctl implementation. Otherwise, 32-bit userspace fails to invoke it on 64-bit kernels; for x86 it might work fine by chance if the padding is zero, but not on big-endian architectures. Reported-by: Thomas Sattler Cc: stable@vger.kernel.org Fixes: 2a31b9db ("kvm: introduce manual dirty log reprotect") Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Li RongQing authored
SMT siblings share caches and other hardware, and busy halt polling will degrade its sibling performance if its sibling is working Sean Christopherson suggested as below: "Rather than disallowing halt-polling entirely, on x86 it should be sufficient to simply have the hardware thread yield to its sibling(s) via PAUSE. It probably won't get back all performance, but I would expect it to be close. This compiles on all KVM architectures, and AFAICT the intended usage of cpu_relax() is identical for all architectures." Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Message-Id: <20210727111247.55510-1-lirongqing@baidu.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Maxim Levitsky authored
Currently when SVM is enabled in guest CPUID, AVIC is inhibited as soon as the guest CPUID is set. AVIC happens to be fully disabled on all vCPUs by the time any guest entry starts (if after migration the entry can be nested). The reason is that currently we disable avic right away on vCPU from which the kvm_request_apicv_update was called and for this case, it happens to be called on all vCPUs (by svm_vcpu_after_set_cpuid). After we stop doing this, AVIC will end up being disabled only when KVM_REQ_APICV_UPDATE is processed which is after we done switching to the nested guest. Fix this by just using vmcb01 in svm_refresh_apicv_exec_ctrl for avic (which is a right thing to do anyway). Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210713142023.106183-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Maxim Levitsky authored
It is possible that AVIC was requested to be disabled but not yet disabled, e.g if the nested entry is done right after svm_vcpu_after_set_cpuid. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210713142023.106183-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Maxim Levitsky authored
It is possible for AVIC inhibit and AVIC active state to be mismatched. Currently we disable AVIC right away on vCPU which started the AVIC inhibit request thus this warning doesn't trigger but at least in theory, if svm_set_vintr is called at the same time on multiple vCPUs, the warning can happen. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210713142023.106183-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Christian Borntraeger authored
commit bc9e9e67 ("KVM: debugfs: Reuse binary stats descriptors") did replace the old definitions with the binary ones. While doing that it missed that some files are names different than the counters. This is especially important for kvm_stat which does have special handling for counters named instruction_*. Fixes: commit bc9e9e67 ("KVM: debugfs: Reuse binary stats descriptors") CC: Jing Zhang <jingzhangos@google.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Message-Id: <20210726150108.5603-1-borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Right now, svm_hv_vmcb_dirty_nested_enlightenments has an incorrect dereference of vmcb->control.reserved_sw before the vmcb is checked for being non-NULL. The compiler is usually sinking the dereference after the check; instead of doing this ourselves in the source, ensure that svm_hv_vmcb_dirty_nested_enlightenments is only called with a non-NULL VMCB. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Vineeth Pillai <viremana@linux.microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [Untested for now due to issues with my AMD machine. - Paolo]
-
David Matlack authored
This test measures the performance effects of KVM's access tracking. Access tracking is driven by the MMU notifiers test_young, clear_young, and clear_flush_young. These notifiers do not have a direct userspace API, however the clear_young notifier can be triggered by marking a pages as idle in /sys/kernel/mm/page_idle/bitmap. This test leverages that mechanism to enable access tracking on guest memory. To measure performance this test runs a VM with a configurable number of vCPUs that each touch every page in disjoint regions of memory. Performance is measured in the time it takes all vCPUs to finish touching their predefined region. Example invocation: $ ./access_tracking_perf_test -v 8 Testing guest mode: PA-bits:ANY, VA-bits:48, 4K pages guest physical test memory offset: 0xffdfffff000 Populating memory : 1.337752570s Writing to populated memory : 0.010177640s Reading from populated memory : 0.009548239s Mark memory idle : 23.973131748s Writing to idle memory : 0.063584496s Mark memory idle : 24.924652964s Reading from idle memory : 0.062042814s Breaking down the results: * "Populating memory": The time it takes for all vCPUs to perform the first write to every page in their region. * "Writing to populated memory" / "Reading from populated memory": The time it takes for all vCPUs to write and read to every page in their region after it has been populated. This serves as a control for the later results. * "Mark memory idle": The time it takes for every vCPU to mark every page in their region as idle through page_idle. * "Writing to idle memory" / "Reading from idle memory": The time it takes for all vCPUs to write and read to every page in their region after it has been marked idle. This test should be portable across architectures but it is only enabled for x86_64 since that's all I have tested. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20210713220957.3493520-7-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
David Matlack authored
There is a missing break statement which causes a fallthrough to the next statement where optarg will be null and a segmentation fault will be generated. Fixes: 9e965bb7 ("KVM: selftests: Add backing src parameter to dirty_log_perf_test") Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20210713220957.3493520-6-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Juergen Gross authored
KVM_MAX_VCPU_ID is the maximum vcpu-id of a guest, and not the number of vcpu-ids. Fix array indexed by vcpu-id to have KVM_MAX_VCPU_ID+1 elements. Note that this is currently no real problem, as KVM_MAX_VCPU_ID is an odd number, resulting in always enough padding being available at the end of those arrays. Nevertheless this should be fixed in order to avoid rare problems in case someone is using an even number for KVM_MAX_VCPU_ID. Signed-off-by: Juergen Gross <jgross@suse.com> Message-Id: <20210701154105.23215-2-jgross@suse.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 26 Jul, 2021 5 commits
-
-
Vitaly Kuznetsov authored
MSR_KVM_ASYNC_PF_ACK MSR is part of interrupt based asynchronous page fault interface and not the original (deprecated) KVM_FEATURE_ASYNC_PF. This is stated in Documentation/virt/kvm/msr.rst. Fixes: 66570e96 ("kvm: x86: only provide PV features if enabled in guest's CPUID") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Oliver Upton <oupton@google.com> Message-Id: <20210722123018.260035-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Mauro Carvalho Chehab authored
The conversion tools used during DocBook/LaTeX/html/Markdown->ReST conversion and some cut-and-pasted text contain some characters that aren't easily reachable on standard keyboards and/or could cause troubles when parsed by the documentation build system. Replace the occurences of the following characters: - U+00a0 (' '): NO-BREAK SPACE as it can cause lines being truncated on PDF output Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Message-Id: <ff70cb42d63f3a1da66af1b21b8d038418ed5189.1626947264.git.mchehab+huawei@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
'KVM_CAP_ENFORCE_PV_CPUID' doesn't match the define in include/uapi/linux/kvm.h. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210722092628.236474-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
Make svm_copy_vmrun_state()/svm_copy_vmloadsave_state() interface match 'memcpy(dest, src)' to avoid any confusion. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210719090322.625277-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
To match svm_copy_vmrun_state(), rename nested_svm_vmloadsave() to svm_copy_vmloadsave_state(). Opportunistically add missing braces to 'else' branch in vmload_vmsave_interception(). No functional change intended. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210716144104.465269-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 19 Jul, 2021 1 commit
-
-
Paolo Bonzini authored
Merge tag 'kvmarm-fixes-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.14, take #1 - Fix MTE shared page detection - Fix selftest use of obsolete pthread_yield() in favour of sched_yield() - Enable selftest's use of PMU registers when asked to
-
- 18 Jul, 2021 11 commits
-
-
Linus Torvalds authored
-
Linus Torvalds authored
Merge tag 'perf-tools-fixes-for-v5.14-2021-07-18' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools fixes from Arnaldo Carvalho de Melo: - Skip invalid hybrid PMU on hybrid systems when the atom (little) CPUs are offlined. - Fix 'perf test' problems related to the recently added hybrid (BIG/little) code. - Split ARM's coresight (hw tracing) decode by aux records to avoid fatal decoding errors. - Fix add event failure in 'perf probe' when running 32-bit perf in a 64-bit kernel. - Fix 'perf sched record' failure when CONFIG_SCHEDSTATS is not set. - Fix memory and refcount leaks detected by ASAn when running 'perf test', should be clean of warnings now. - Remove broken definition of __LITTLE_ENDIAN from tools' linux/kconfig.h, which was breaking the build in some systems. - Cast PTHREAD_STACK_MIN to int as it may turn into 'long sysconf(__SC_THREAD_STACK_MIN_VALUE), breaking the build in some systems. - Fix libperf build error with LIBPFM4=1. - Sync UAPI files changed by the memfd_secret new syscall. * tag 'perf-tools-fixes-for-v5.14-2021-07-18' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (35 commits) perf sched: Fix record failure when CONFIG_SCHEDSTATS is not set perf probe: Fix add event failure when running 32-bit perf in a 64-bit kernel perf data: Close all files in close_dir() perf probe-file: Delete namelist in del_events() on the error path perf test bpf: Free obj_buf perf trace: Free strings in trace__parse_events_option() perf trace: Free syscall tp fields in evsel->priv perf trace: Free syscall->arg_fmt perf trace: Free malloc'd trace fields on exit perf lzma: Close lzma stream on exit perf script: Fix memory 'threads' and 'cpus' leaks on exit perf script: Release zstd data perf session: Cleanup trace_event perf inject: Close inject.output on exit perf report: Free generated help strings for sort option perf env: Fix memory leak of cpu_pmu_caps perf test maps__merge_in: Fix memory leak of maps perf dso: Fix memory leak in dso__new_map() perf test event_update: Fix memory leak of unit perf test event_update: Fix memory leak of evlist ...
-
git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds authored
Pull xfs fixes from Darrick Wong: "A few fixes for issues in the new online shrink code, additional corrections for my recent bug-hunt w.r.t. extent size hints on realtime, and improved input checking of the GROWFSRT ioctl. IOW, the usual 'I somehow got bored during the merge window and resumed auditing the farther reaches of xfs': - Fix shrink eligibility checking when sparse inode clusters enabled - Reset '..' directory entries when unlinking directories to prevent verifier errors if fs is shrinked later - Don't report unusable extent size hints to FSGETXATTR - Don't warn when extent size hints are unusable because the sysadmin configured them that way - Fix insufficient parameter validation in GROWFSRT ioctl - Fix integer overflow when adding rt volumes to filesystem" * tag 'xfs-5.14-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: detect misaligned rtinherit directory extent size hints xfs: fix an integer overflow error in xfs_growfs_rt xfs: improve FSGROWFSRT precondition checking xfs: don't expose misaligned extszinherit hints to userspace xfs: correct the narrative around misaligned rtinherit/extszinherit dirs xfs: reset child dir '..' entry when unlinking child xfs: check for sparse inode clusters that cross new EOAG when shrinking
-
git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds authored
Pull iomap fixes from Darrick Wong: "A handful of bugfixes for the iomap code. There's nothing especially exciting here, just fixes for UBSAN (not KASAN as I erroneously wrote in the tag message) warnings about undefined behavior in the SEEK_DATA/SEEK_HOLE code, and some reshuffling of per-page block state info to fix some problems with gfs2. - Fix KASAN warnings due to integer overflow in SEEK_DATA/SEEK_HOLE - Fix assertion errors when using inlinedata files on gfs2" * tag 'iomap-5.14-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: Don't create iomap_page objects in iomap_page_mkwrite_actor iomap: Don't create iomap_page objects for inline files iomap: Permit pages without an iop to enter writeback iomap: remove the length variable in iomap_seek_hole iomap: remove the length variable in iomap_seek_data
-
Linus Torvalds authored
Merge tag 'kbuild-fixes-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Restore the original behavior of scripts/setlocalversion when LOCALVERSION is set to empty. - Show Kconfig prompts even for 'make -s' - Fix the combination of COFNIG_LTO_CLANG=y and CONFIG_MODVERSIONS=y for older GNU Make versions * tag 'kbuild-fixes-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: Documentation: Fix intiramfs script name Kbuild: lto: fix module versionings mismatch in GNU make 3.X kbuild: do not suppress Kconfig prompts for silent build scripts/setlocalversion: fix a bug when LOCALVERSION is empty
-
Robert Richter authored
Documentation was not changed when renaming the script in commit 80e715a0 ("initramfs: rename gen_initramfs_list.sh to gen_initramfs.sh"). Fixing this. Basically does: $ sed -i -e s/gen_initramfs_list.sh/gen_initramfs.sh/g $(git grep -l gen_initramfs_list.sh) Fixes: 80e715a0 ("initramfs: rename gen_initramfs_list.sh to gen_initramfs.sh") Signed-off-by: Robert Richter <rrichter@amd.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
-
Lecopzer Chen authored
When building modules(CONFIG_...=m), I found some of module versions are incorrect and set to 0. This can be found in build log for first clean build which shows WARNING: EXPORT symbol "XXXX" [drivers/XXX/XXX.ko] version generation failed, symbol will not be versioned. But in second build(incremental build), the WARNING disappeared and the module version becomes valid CRC and make someone who want to change modules without updating kernel image can't insert their modules. The problematic code is + $(foreach n, $(filter-out FORCE,$^), \ + $(if $(wildcard $(n).symversions), \ + ; cat $(n).symversions >> $@.symversions)) For example: rm -f fs/notify/built-in.a.symversions ; rm -f fs/notify/built-in.a; \ llvm-ar cDPrST fs/notify/built-in.a fs/notify/fsnotify.o \ fs/notify/notification.o fs/notify/group.o ... `foreach n` shows nothing to `cat` into $(n).symversions because `if $(wildcard $(n).symversions)` return nothing, but actually they do exist during this line was executed. -rw-r--r-- 1 root root 168580 Jun 13 19:10 fs/notify/fsnotify.o -rw-r--r-- 1 root root 111 Jun 13 19:10 fs/notify/fsnotify.o.symversions The reason is the $(n).symversions are generated at runtime, but Makefile wildcard function expends and checks the file exist or not during parsing the Makefile. Thus fix this by use `test` shell command to check the file existence in runtime. Rebase from both: 1. [https://lore.kernel.org/lkml/20210616080252.32046-1-lecopzer.chen@mediatek.com/] 2. [https://lore.kernel.org/lkml/20210702032943.7865-1-lecopzer.chen@mediatek.com/] Fixes: 38e89184 ("kbuild: lto: fix module versioning") Co-developed-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
-
Masahiro Yamada authored
When a new CONFIG option is available, Kbuild shows a prompt to get the user input. $ make [ snip ] Core Scheduling for SMT (SCHED_CORE) [N/y/?] (NEW) This is the only interactive place in the build process. Commit 174a1dcc ("kbuild: sink stdout from cmd for silent build") suppressed Kconfig prompts as well because syncconfig is invoked by the 'cmd' macro. You cannot notice the fact that Kconfig is waiting for the user input. Use 'kecho' to show the equivalent short log without suppressing stdout from sub-make. Fixes: 174a1dcc ("kbuild: sink stdout from cmd for silent build") Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
-
Mikulas Patocka authored
The commit 042da426 ("scripts/setlocalversion: simplify the short version part") reduces indentation. Unfortunately, it also changes behavior in a subtle way - if the user has empty "LOCALVERSION" variable, the plus sign is appended to the kernel version. It wasn't appended before. This patch reverts to the old behavior - we append the plus sign only if the LOCALVERSION variable is not set. Fixes: 042da426 ("scripts/setlocalversion: simplify the short version part") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
-
Yang Jihong authored
The tracepoints trace_sched_stat_{wait, sleep, iowait} are not exposed to user if CONFIG_SCHEDSTATS is not set, "perf sched record" records the three events. As a result, the command fails. Before: #perf sched record sleep 1 event syntax error: 'sched:sched_stat_wait' \___ unknown tracepoint Error: File /sys/kernel/tracing/events/sched/sched_stat_wait not found. Hint: Perhaps this kernel misses some CONFIG_ setting to enable this feature?. Run 'perf list' for a list of valid events Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] -e, --event <event> event selector. use 'perf list' to list available events Solution: Check whether schedstat tracepoints are exposed. If no, these events are not recorded. After: # perf sched record sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.163 MB perf.data (1091 samples) ] # perf sched report run measurement overhead: 4736 nsecs sleep measurement overhead: 9059979 nsecs the run test took 999854 nsecs the sleep test took 8945271 nsecs nr_run_events: 716 nr_sleep_events: 785 nr_wakeup_events: 0 ... ------------------------------------------------------------ Fixes: 2a09b5de ("sched/fair: do not expose some tracepoints to user if CONFIG_SCHEDSTATS is not set") Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Yafang Shao <laoar.shao@gmail.com> Link: http://lore.kernel.org/lkml/20210713112358.194693-1-yangjihong1@huawei.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Yang Jihong authored
The "address" member of "struct probe_trace_point" uses long data type. If kernel is 64-bit and perf program is 32-bit, size of "address" variable is 32 bits. As a result, upper 32 bits of address read from kernel are truncated, an error occurs during address comparison in kprobe_warn_out_range(). Before: # perf probe -a schedule schedule is out of .text, skip it. Error: Failed to add events. Solution: Change data type of "address" variable to u64 and change corresponding address printing and value assignment. After: # perf.new.new probe -a schedule Added new event: probe:schedule (on schedule) You can now use it in all perf tools, such as: perf record -e probe:schedule -aR sleep 1 # perf probe -l probe:schedule (on schedule@kernel/sched/core.c) # perf record -e probe:schedule -aR sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.156 MB perf.data (1366 samples) ] # perf report --stdio # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 1K of event 'probe:schedule' # Event count (approx.): 1366 # # Overhead Command Shared Object Symbol # ........ ............... ................. ............ # 6.22% migration/0 [kernel.kallsyms] [k] schedule 6.22% migration/1 [kernel.kallsyms] [k] schedule 6.22% migration/2 [kernel.kallsyms] [k] schedule 6.22% migration/3 [kernel.kallsyms] [k] schedule 6.15% migration/10 [kernel.kallsyms] [k] schedule 6.15% migration/11 [kernel.kallsyms] [k] schedule 6.15% migration/12 [kernel.kallsyms] [k] schedule 6.15% migration/13 [kernel.kallsyms] [k] schedule 6.15% migration/14 [kernel.kallsyms] [k] schedule 6.15% migration/15 [kernel.kallsyms] [k] schedule 6.15% migration/4 [kernel.kallsyms] [k] schedule 6.15% migration/5 [kernel.kallsyms] [k] schedule 6.15% migration/6 [kernel.kallsyms] [k] schedule 6.15% migration/7 [kernel.kallsyms] [k] schedule 6.15% migration/8 [kernel.kallsyms] [k] schedule 6.15% migration/9 [kernel.kallsyms] [k] schedule 0.22% rcu_sched [kernel.kallsyms] [k] schedule ... # # (Cannot load tips.txt file, please install perf!) # Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Frank Ch. Eigler <fche@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Jianlin Lv <jianlin.lv@arm.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Li Huafei <lihuafei1@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lore.kernel.org/lkml/20210715063723.11926-1-yangjihong1@huawei.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-