1. 20 Dec, 2021 6 commits
    • Sean Christopherson's avatar
      KVM: nVMX: Synthesize TRIPLE_FAULT for L2 if emulation is required · cd0e615c
      Sean Christopherson authored
      Synthesize a triple fault if L2 guest state is invalid at the time of
      VM-Enter, which can happen if L1 modifies SMRAM or if userspace stuffs
      guest state via ioctls(), e.g. KVM_SET_SREGS.  KVM should never emulate
      invalid guest state, since from L1's perspective, it's architecturally
      impossible for L2 to have invalid state while L2 is running in hardware.
      E.g. attempts to set CR0 or CR4 to unsupported values will either VM-Exit
      or #GP.
      
      Modifying vCPU state via RSM+SMRAM and ioctl() are the only paths that
      can trigger this scenario, as nested VM-Enter correctly rejects any
      attempt to enter L2 with invalid state.
      
      RSM is a straightforward case as (a) KVM follows AMD's SMRAM layout and
      behavior, and (b) Intel's SDM states that loading reserved CR0/CR4 bits
      via RSM results in shutdown, i.e. there is precedent for KVM's behavior.
      Following AMD's SMRAM layout is important as AMD's layout saves/restores
      the descriptor cache information, including CS.RPL and SS.RPL, and also
      defines all the fields relevant to invalid guest state as read-only, i.e.
      so long as the vCPU had valid state before the SMI, which is guaranteed
      for L2, RSM will generate valid state unless SMRAM was modified.  Intel's
      layout saves/restores only the selector, which means that scenarios where
      the selector and cached RPL don't match, e.g. conforming code segments,
      would yield invalid guest state.  Intel CPUs fudge around this issued by
      stuffing SS.RPL and CS.RPL on RSM.  Per Intel's SDM on the "Default
      Treatment of RSM", paraphrasing for brevity:
      
        IF internal storage indicates that the [CPU was post-VMXON]
        THEN
           enter VMX operation (root or non-root);
           restore VMX-critical state as defined in Section 34.14.1;
           set to their fixed values any bits in CR0 and CR4 whose values must
           be fixed in VMX operation [unless coming from an unrestricted guest];
           IF RFLAGS.VM = 0 AND (in VMX root operation OR the
              “unrestricted guest” VM-execution control is 0)
           THEN
             CS.RPL := SS.DPL;
             SS.RPL := SS.DPL;
           FI;
           restore current VMCS pointer;
        FI;
      
      Note that Intel CPUs also overwrite the fixed CR0/CR4 bits, whereas KVM
      will sythesize TRIPLE_FAULT in this scenario.  KVM's behavior is allowed
      as both Intel and AMD define CR0/CR4 SMRAM fields as read-only, i.e. the
      only way for CR0 and/or CR4 to have illegal values is if they were
      modified by the L1 SMM handler, and Intel's SDM "SMRAM State Save Map"
      section states "modifying these registers will result in unpredictable
      behavior".
      
      KVM's ioctl() behavior is less straightforward.  Because KVM allows
      ioctls() to be executed in any order, rejecting an ioctl() if it would
      result in invalid L2 guest state is not an option as KVM cannot know if
      a future ioctl() would resolve the invalid state, e.g. KVM_SET_SREGS, or
      drop the vCPU out of L2, e.g. KVM_SET_NESTED_STATE.  Ideally, KVM would
      reject KVM_RUN if L2 contained invalid guest state, but that carries the
      risk of a false positive, e.g. if RSM loaded invalid guest state and KVM
      exited to userspace.  Setting a flag/request to detect such a scenario is
      undesirable because (a) it's extremely unlikely to add value to KVM as a
      whole, and (b) KVM would need to consider ioctl() interactions with such
      a flag, e.g. if userspace migrated the vCPU while the flag were set.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20211207193006.120997-3-seanjc@google.com>
      Reviewed-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      cd0e615c
    • Sean Christopherson's avatar
      KVM: VMX: Always clear vmx->fail on emulation_required · a80dfc02
      Sean Christopherson authored
      Revert a relatively recent change that set vmx->fail if the vCPU is in L2
      and emulation_required is true, as that behavior is completely bogus.
      Setting vmx->fail and synthesizing a VM-Exit is contradictory and wrong:
      
        (a) it's impossible to have both a VM-Fail and VM-Exit
        (b) vmcs.EXIT_REASON is not modified on VM-Fail
        (c) emulation_required refers to guest state and guest state checks are
            always VM-Exits, not VM-Fails.
      
      For KVM specifically, emulation_required is handled before nested exits
      in __vmx_handle_exit(), thus setting vmx->fail has no immediate effect,
      i.e. KVM calls into handle_invalid_guest_state() and vmx->fail is ignored.
      Setting vmx->fail can ultimately result in a WARN in nested_vmx_vmexit()
      firing when tearing down the VM as KVM never expects vmx->fail to be set
      when L2 is active, KVM always reflects those errors into L1.
      
        ------------[ cut here ]------------
        WARNING: CPU: 0 PID: 21158 at arch/x86/kvm/vmx/nested.c:4548
                                      nested_vmx_vmexit+0x16bd/0x17e0
                                      arch/x86/kvm/vmx/nested.c:4547
        Modules linked in:
        CPU: 0 PID: 21158 Comm: syz-executor.1 Not tainted 5.16.0-rc3-syzkaller #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        RIP: 0010:nested_vmx_vmexit+0x16bd/0x17e0 arch/x86/kvm/vmx/nested.c:4547
        Code: <0f> 0b e9 2e f8 ff ff e8 57 b3 5d 00 0f 0b e9 00 f1 ff ff 89 e9 80
        Call Trace:
         vmx_leave_nested arch/x86/kvm/vmx/nested.c:6220 [inline]
         nested_vmx_free_vcpu+0x83/0xc0 arch/x86/kvm/vmx/nested.c:330
         vmx_free_vcpu+0x11f/0x2a0 arch/x86/kvm/vmx/vmx.c:6799
         kvm_arch_vcpu_destroy+0x6b/0x240 arch/x86/kvm/x86.c:10989
         kvm_vcpu_destroy+0x29/0x90 arch/x86/kvm/../../../virt/kvm/kvm_main.c:441
         kvm_free_vcpus arch/x86/kvm/x86.c:11426 [inline]
         kvm_arch_destroy_vm+0x3ef/0x6b0 arch/x86/kvm/x86.c:11545
         kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1189 [inline]
         kvm_put_kvm+0x751/0xe40 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1220
         kvm_vcpu_release+0x53/0x60 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3489
         __fput+0x3fc/0x870 fs/file_table.c:280
         task_work_run+0x146/0x1c0 kernel/task_work.c:164
         exit_task_work include/linux/task_work.h:32 [inline]
         do_exit+0x705/0x24f0 kernel/exit.c:832
         do_group_exit+0x168/0x2d0 kernel/exit.c:929
         get_signal+0x1740/0x2120 kernel/signal.c:2852
         arch_do_signal_or_restart+0x9c/0x730 arch/x86/kernel/signal.c:868
         handle_signal_work kernel/entry/common.c:148 [inline]
         exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
         exit_to_user_mode_prepare+0x191/0x220 kernel/entry/common.c:207
         __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
         syscall_exit_to_user_mode+0x2e/0x70 kernel/entry/common.c:300
         do_syscall_64+0x53/0xd0 arch/x86/entry/common.c:86
         entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fixes: c8607e4a ("KVM: x86: nVMX: don't fail nested VM entry on invalid guest state if !from_vmentry")
      Reported-by: syzbot+f1d2136db9c80d4733e8@syzkaller.appspotmail.com
      Reviewed-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20211207193006.120997-2-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      a80dfc02
    • Andrew Jones's avatar
      selftests: KVM: Fix non-x86 compiling · 577e022b
      Andrew Jones authored
      Attempting to compile on a non-x86 architecture fails with
      
      include/kvm_util.h: In function ‘vm_compute_max_gfnâ€:
      include/kvm_util.h:79:21: error: dereferencing pointer to incomplete type ‘struct kvm_vmâ€
        return ((1ULL << vm->pa_bits) >> vm->page_shift) - 1;
                           ^~
      
      This is because the declaration of struct kvm_vm is in
      lib/kvm_util_internal.h as an effort to make it private to
      the test lib code. We can still provide arch specific functions,
      though, by making the generic function symbols weak. Do that to
      fix the compile error.
      
      Fixes: c8cc43c1 ("selftests: KVM: avoid failures due to reserved HyperTransport region")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAndrew Jones <drjones@redhat.com>
      Message-Id: <20211214151842.848314-1-drjones@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      577e022b
    • Marc Orr's avatar
      KVM: x86: Always set kvm_run->if_flag · c5063551
      Marc Orr authored
      The kvm_run struct's if_flag is a part of the userspace/kernel API. The
      SEV-ES patches failed to set this flag because it's no longer needed by
      QEMU (according to the comment in the source code). However, other
      hypervisors may make use of this flag. Therefore, set the flag for
      guests with encrypted registers (i.e., with guest_state_protected set).
      
      Fixes: f1c6366e ("KVM: SVM: Add required changes to support intercepts under SEV-ES")
      Signed-off-by: default avatarMarc Orr <marcorr@google.com>
      Message-Id: <20211209155257.128747-1-marcorr@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      c5063551
    • Sean Christopherson's avatar
      KVM: x86/mmu: Don't advance iterator after restart due to yielding · 3a0f64de
      Sean Christopherson authored
      After dropping mmu_lock in the TDP MMU, restart the iterator during
      tdp_iter_next() and do not advance the iterator.  Advancing the iterator
      results in skipping the top-level SPTE and all its children, which is
      fatal if any of the skipped SPTEs were not visited before yielding.
      
      When zapping all SPTEs, i.e. when min_level == root_level, restarting the
      iter and then invoking tdp_iter_next() is always fatal if the current gfn
      has as a valid SPTE, as advancing the iterator results in try_step_side()
      skipping the current gfn, which wasn't visited before yielding.
      
      Sprinkle WARNs on iter->yielded being true in various helpers that are
      often used in conjunction with yielding, and tag the helper with
      __must_check to reduce the probabily of improper usage.
      
      Failing to zap a top-level SPTE manifests in one of two ways.  If a valid
      SPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(),
      the shadow page will be leaked and KVM will WARN accordingly.
      
        WARNING: CPU: 1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm]
        RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm]
        Call Trace:
         <TASK>
         kvm_arch_destroy_vm+0x130/0x1b0 [kvm]
         kvm_destroy_vm+0x162/0x2a0 [kvm]
         kvm_vcpu_release+0x34/0x60 [kvm]
         __fput+0x82/0x240
         task_work_run+0x5c/0x90
         do_exit+0x364/0xa10
         ? futex_unqueue+0x38/0x60
         do_group_exit+0x33/0xa0
         get_signal+0x155/0x850
         arch_do_signal_or_restart+0xed/0x750
         exit_to_user_mode_prepare+0xc5/0x120
         syscall_exit_to_user_mode+0x1d/0x40
         do_syscall_64+0x48/0xc0
         entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      If kvm_tdp_mmu_zap_all() skips a gfn/SPTE but that SPTE is then zapped by
      kvm_tdp_mmu_put_root(), KVM triggers a use-after-free in the form of
      marking a struct page as dirty/accessed after it has been put back on the
      free list.  This directly triggers a WARN due to encountering a page with
      page_count() == 0, but it can also lead to data corruption and additional
      errors in the kernel.
      
        WARNING: CPU: 7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171
        RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm]
        Call Trace:
         <TASK>
         kvm_set_pfn_dirty+0x120/0x1d0 [kvm]
         __handle_changed_spte+0x92e/0xca0 [kvm]
         __handle_changed_spte+0x63c/0xca0 [kvm]
         __handle_changed_spte+0x63c/0xca0 [kvm]
         __handle_changed_spte+0x63c/0xca0 [kvm]
         zap_gfn_range+0x549/0x620 [kvm]
         kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm]
         mmu_free_root_page+0x219/0x2c0 [kvm]
         kvm_mmu_free_roots+0x1b4/0x4e0 [kvm]
         kvm_mmu_unload+0x1c/0xa0 [kvm]
         kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm]
         kvm_put_kvm+0x3b1/0x8b0 [kvm]
         kvm_vcpu_release+0x4e/0x70 [kvm]
         __fput+0x1f7/0x8c0
         task_work_run+0xf8/0x1a0
         do_exit+0x97b/0x2230
         do_group_exit+0xda/0x2a0
         get_signal+0x3be/0x1e50
         arch_do_signal_or_restart+0x244/0x17f0
         exit_to_user_mode_prepare+0xcb/0x120
         syscall_exit_to_user_mode+0x1d/0x40
         do_syscall_64+0x4d/0x90
         entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Note, the underlying bug existed even before commit 1af4a960 ("KVM:
      x86/mmu: Yield in TDU MMU iter even if no SPTES changed") moved calls to
      tdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could still
      incorrectly advance past a top-level entry when yielding on a lower-level
      entry.  But with respect to leaking shadow pages, the bug was introduced
      by yielding before processing the current gfn.
      
      Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, or
      callers could jump to their "retry" label.  The downside of that approach
      is that tdp_mmu_iter_cond_resched() _must_ be called before anything else
      in the loop, and there's no easy way to enfornce that requirement.
      
      Ideally, KVM would handling the cond_resched() fully within the iterator
      macro (the code is actually quite clean) and avoid this entire class of
      bugs, but that is extremely difficult do while also supporting yielding
      after tdp_mmu_set_spte_atomic() fails.  Yielding after failing to set a
      SPTE is very desirable as the "owner" of the REMOVED_SPTE isn't strictly
      bounded, e.g. if it's zapping a high-level shadow page, the REMOVED_SPTE
      may block operations on the SPTE for a significant amount of time.
      
      Fixes: faaf05b0 ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU")
      Fixes: 1af4a960 ("KVM: x86/mmu: Yield in TDU MMU iter even if no SPTES changed")
      Reported-by: default avatarIgnat Korchagin <ignat@cloudflare.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20211214033528.123268-1-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      3a0f64de
    • Wei Wang's avatar
      KVM: x86: remove PMU FIXED_CTR3 from msrs_to_save_all · 9fb12fe5
      Wei Wang authored
      The fixed counter 3 is used for the Topdown metrics, which hasn't been
      enabled for KVM guests. Userspace accessing to it will fail as it's not
      included in get_fixed_pmc(). This breaks KVM selftests on ICX+ machines,
      which have this counter.
      
      To reproduce it on ICX+ machines, ./state_test reports:
      ==== Test Assertion Failure ====
      lib/x86_64/processor.c:1078: r == nmsrs
      pid=4564 tid=4564 - Argument list too long
      1  0x000000000040b1b9: vcpu_save_state at processor.c:1077
      2  0x0000000000402478: main at state_test.c:209 (discriminator 6)
      3  0x00007fbe21ed5f92: ?? ??:0
      4  0x000000000040264d: _start at ??:?
       Unexpected result from KVM_GET_MSRS, r: 17 (failed MSR was 0x30c)
      
      With this patch, it works well.
      Signed-off-by: default avatarWei Wang <wei.w.wang@intel.com>
      Message-Id: <20211217124934.32893-1-wei.w.wang@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      9fb12fe5
  2. 19 Dec, 2021 3 commits
  3. 10 Dec, 2021 6 commits
  4. 09 Dec, 2021 1 commit
  5. 08 Dec, 2021 1 commit
    • Vitaly Kuznetsov's avatar
      KVM: nVMX: Don't use Enlightened MSR Bitmap for L3 · 250552b9
      Vitaly Kuznetsov authored
      When KVM runs as a nested hypervisor on top of Hyper-V it uses Enlightened
      VMCS and enables Enlightened MSR Bitmap feature for its L1s and L2s (which
      are actually L2s and L3s from Hyper-V's perspective). When MSR bitmap is
      updated, KVM has to reset HV_VMX_ENLIGHTENED_CLEAN_FIELD_MSR_BITMAP from
      clean fields to make Hyper-V aware of the change. For KVM's L1s, this is
      done in vmx_disable_intercept_for_msr()/vmx_enable_intercept_for_msr().
      MSR bitmap for L2 is build in nested_vmx_prepare_msr_bitmap() by blending
      MSR bitmap for L1 and L1's idea of MSR bitmap for L2. KVM, however, doesn't
      check if the resulting bitmap is different and never cleans
      HV_VMX_ENLIGHTENED_CLEAN_FIELD_MSR_BITMAP in eVMCS02. This is incorrect and
      may result in Hyper-V missing the update.
      
      The issue could've been solved by calling evmcs_touch_msr_bitmap() for
      eVMCS02 from nested_vmx_prepare_msr_bitmap() unconditionally but doing so
      would not give any performance benefits (compared to not using Enlightened
      MSR Bitmap at all). 3-level nesting is also not a very common setup
      nowadays.
      
      Don't enable 'Enlightened MSR Bitmap' feature for KVM's L2s (real L3s) for
      now.
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20211129094704.326635-2-vkuznets@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      250552b9
  6. 05 Dec, 2021 12 commits
    • Linus Torvalds's avatar
      Linux 5.16-rc4 · 0fcfb00b
      Linus Torvalds authored
      0fcfb00b
    • Linus Torvalds's avatar
      Merge tag 'for-5.16/parisc-6' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 268ba095
      Linus Torvalds authored
      Pull parisc fixes from Helge Deller:
       "Some bug and warning fixes:
      
         - Fix "make install" to use debians "installkernel" script which is
           now in /usr/sbin
      
         - Fix the bindeb-pkg make target by giving the correct KBUILD_IMAGE
           file name
      
         - Fix compiler warnings by annotating parisc agp init functions with
           __init
      
         - Fix timekeeping on SMP machines with dual-core CPUs
      
         - Enable some more config options in the 64-bit defconfig"
      
      * tag 'for-5.16/parisc-6' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: Mark cr16 CPU clocksource unstable on all SMP machines
        parisc: Fix "make install" on newer debian releases
        parisc/agp: Annotate parisc agp init functions with __init
        parisc: Enable sata sil, audit and usb support on 64-bit defconfig
        parisc: Fix KBUILD_IMAGE for self-extracting kernel
      268ba095
    • Linus Torvalds's avatar
      Merge tag 'usb-5.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 94420704
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are some small USB fixes for a few reported issues. Included in
        here are:
      
         - xhci fix for a _much_ reported regression. I don't think there's a
           community distro that has not reported this problem yet :(
      
         - new USB quirk addition
      
         - cdns3 minor fixes
      
         - typec regression fix.
      
        All of these have been in linux-next with no reported problems, and
        the xhci fix has been reported by many to resolve their reported
        problem"
      
      * tag 'usb-5.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        usb: cdnsp: Fix a NULL pointer dereference in cdnsp_endpoint_init()
        usb: cdns3: gadget: fix new urb never complete if ep cancel previous requests
        usb: typec: tcpm: Wait in SNK_DEBOUNCED until disconnect
        USB: NO_LPM quirk Lenovo Powered USB-C Travel Hub
        xhci: Fix commad ring abort, write all 64 bits to CRCR register.
      94420704
    • Linus Torvalds's avatar
      Merge tag 'tty-5.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · 51639539
      Linus Torvalds authored
      Pull tty/serial fixes from Greg KH:
       "Here are some small TTY and Serial driver fixes for 5.16-rc4 to
        resolve a number of reported problems.
      
        They include:
      
         - liteuart serial driver fixes
      
         - 8250_pci serial driver fixes for pericom devices
      
         - 8250 RTS line control fix while in RS-485 mode
      
         - tegra serial driver fix
      
         - msm_serial driver fix
      
         - pl011 serial driver new id
      
         - fsl_lpuart revert of broken change
      
         - 8250_bcm7271 serial driver fix
      
         - MAINTAINERS file update for rpmsg tty driver that came in 5.16-rc1
      
         - vgacon fix for reported problem
      
        All of these, except for the 8250_bcm7271 fix have been in linux-next
        with no reported problem. The 8250_bcm7271 fix was added to the tree
        on Friday so no chance to be linux-next yet. But it should be fine as
        the affected developers submitted it"
      
      * tag 'tty-5.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        serial: 8250_bcm7271: UART errors after resuming from S2
        serial: 8250_pci: rewrite pericom_do_set_divisor()
        serial: 8250_pci: Fix ACCES entries in pci_serial_quirks array
        serial: 8250: Fix RTS modem control while in rs485 mode
        Revert "tty: serial: fsl_lpuart: drop earlycon entry for i.MX8QXP"
        serial: tegra: Change lower tolerance baud rate limit for tegra20 and tegra30
        serial: liteuart: relax compile-test dependencies
        serial: liteuart: fix minor-number leak on probe errors
        serial: liteuart: fix use-after-free and memleak on unbind
        serial: liteuart: Fix NULL pointer dereference in ->remove()
        vgacon: Propagate console boot parameters before calling `vc_resize'
        tty: serial: msm_serial: Deactivate RX DMA for polling support
        serial: pl011: Add ACPI SBSA UART match id
        serial: core: fix transmit-buffer reset and memleak
        MAINTAINERS: Add rpmsg tty driver maintainer
      51639539
    • Linus Torvalds's avatar
      Merge tag 'timers_urgent_for_v5.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7587a4a5
      Linus Torvalds authored
      Pull timer fix from Borislav Petkov:
      
       - Prevent a tick storm when a dedicated timekeeper CPU in nohz_full
         mode runs for prolonged periods with interrupts disabled and ends up
         programming the next tick in the past, leading to that storm
      
      * tag 'timers_urgent_for_v5.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        timers/nohz: Last resort update jiffies on nohz_full IRQ entry
      7587a4a5
    • Linus Torvalds's avatar
      Merge tag 'sched_urgent_for_v5.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1d213767
      Linus Torvalds authored
      Pull scheduler fixes from Borislav Petkov:
      
       - Properly init uclamp_flags of a runqueue, on first enqueuing
      
       - Fix preempt= callback return values
      
       - Correct utime/stime resource usage reporting on nohz_full to return
         the proper times instead of shorter ones
      
      * tag 'sched_urgent_for_v5.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/uclamp: Fix rq->uclamp_max not set on first enqueue
        preempt/dynamic: Fix setup_preempt_mode() return value
        sched/cputime: Fix getrusage(RUSAGE_THREAD) with nohz_full
      1d213767
    • Linus Torvalds's avatar
      Merge tag 'x86_urgent_for_v5.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f5d54a42
      Linus Torvalds authored
      Pull x86 fixes from Borislav Petkov:
      
       - Fix a couple of SWAPGS fencing issues in the x86 entry code
      
       - Use the proper operand types in __{get,put}_user() to prevent
         truncation in SEV-ES string io
      
       - Make sure the kernel mappings are present in trampoline_pgd in order
         to prevent any potential accesses to unmapped memory after switching
         to it
      
       - Fix a trivial list corruption in objtool's pv_ops validation
      
       - Disable the clocksource watchdog for TSC on platforms which claim
         that the TSC is constant, doesn't stop in sleep states, CPU has TSC
         adjust and the number of sockets of the platform are max 2, to
         prevent erroneous markings of the TSC as unstable.
      
       - Make sure TSC adjust is always checked not only when going idle
      
       - Prevent a stack leak by initializing struct _fpx_sw_bytes properly in
         the FPU code
      
       - Fix INTEL_FAM6_RAPTORLAKE define naming to adhere to the convention
      
      * tag 'x86_urgent_for_v5.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/xen: Add xenpv_restore_regs_and_return_to_usermode()
        x86/entry: Use the correct fence macro after swapgs in kernel CR3
        x86/entry: Add a fence for kernel entry SWAPGS in paranoid_entry()
        x86/sev: Fix SEV-ES INS/OUTS instructions for word, dword, and qword
        x86/64/mm: Map all kernel memory into trampoline_pgd
        objtool: Fix pv_ops noinstr validation
        x86/tsc: Disable clocksource watchdog for TSC on qualified platorms
        x86/tsc: Add a timer to make sure TSC_adjust is always checked
        x86/fpu/signal: Initialize sw_bytes in save_xstate_epilog()
        x86/cpu: Drop spurious underscore from RAPTOR_LAKE #define
      f5d54a42
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 90bf8d98
      Linus Torvalds authored
      Pull more kvm fixes from Paolo Bonzini:
      
       - Static analysis fix
      
       - New SEV-ES protocol for communicating invalid VMGEXIT requests
      
       - Ensure APICv is considered inactive if there is no APIC
      
       - Fix reserved bits for AMD PerfEvtSeln register
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure
        KVM: SEV: Fall back to vmalloc for SEV-ES scratch area if necessary
        KVM: SEV: Return appropriate error codes if SEV-ES scratch setup fails
        KVM: x86/mmu: Retry page fault if root is invalidated by memslot update
        KVM: VMX: Set failure code in prepare_vmcs02()
        KVM: ensure APICv is considered inactive if there is no APIC
        KVM: x86/pmu: Fix reserved bits for AMD PerfEvtSeln register
      90bf8d98
    • Tom Lendacky's avatar
      KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure · ad5b3532
      Tom Lendacky authored
      Currently, an SEV-ES guest is terminated if the validation of the VMGEXIT
      exit code or exit parameters fails.
      
      The VMGEXIT instruction can be issued from userspace, even though
      userspace (likely) can't update the GHCB. To prevent userspace from being
      able to kill the guest, return an error through the GHCB when validation
      fails rather than terminating the guest. For cases where the GHCB can't be
      updated (e.g. the GHCB can't be mapped, etc.), just return back to the
      guest.
      
      The new error codes are documented in the lasest update to the GHCB
      specification.
      
      Fixes: 291bd20d ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
      Signed-off-by: default avatarTom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <b57280b5562893e2616257ac9c2d4525a9aeeb42.1638471124.git.thomas.lendacky@amd.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      ad5b3532
    • Sean Christopherson's avatar
      KVM: SEV: Fall back to vmalloc for SEV-ES scratch area if necessary · a655276a
      Sean Christopherson authored
      Use kvzalloc() to allocate KVM's buffer for SEV-ES's GHCB scratch area so
      that KVM falls back to __vmalloc() if physically contiguous memory isn't
      available.  The buffer is purely a KVM software construct, i.e. there's
      no need for it to be physically contiguous.
      
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20211109222350.2266045-3-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      a655276a
    • Sean Christopherson's avatar
      KVM: SEV: Return appropriate error codes if SEV-ES scratch setup fails · 75236f5f
      Sean Christopherson authored
      Return appropriate error codes if setting up the GHCB scratch area for an
      SEV-ES guest fails.  In particular, returning -EINVAL instead of -ENOMEM
      when allocating the kernel buffer could be confusing as userspace would
      likely suspect a guest issue.
      
      Fixes: 8f423a80 ("KVM: SVM: Support MMIO for an SEV-ES guest")
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20211109222350.2266045-2-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      75236f5f
    • Linus Torvalds's avatar
      Merge tag 'xfs-5.16-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 79a72162
      Linus Torvalds authored
      Pull xfs fix from Darrick Wong:
       "Remove an unnecessary (and backwards) rename flags check that
        duplicates a VFS level check"
      
      * tag 'xfs-5.16-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: remove incorrect ASSERT in xfs_rename
      79a72162
  7. 04 Dec, 2021 9 commits
  8. 03 Dec, 2021 2 commits
    • Linus Torvalds's avatar
      Merge tag 'vfio-v5.16-rc4' of git://github.com/awilliam/linux-vfio · 12119cfa
      Linus Torvalds authored
      Pull VFIO fixes from Alex Williamson:
      
       - Fix OpRegion pointer arithmetic (Zhenyu Wang)
      
       - Fix comment format triggering kernel-doc warnings (Randy Dunlap)
      
      * tag 'vfio-v5.16-rc4' of git://github.com/awilliam/linux-vfio:
        vfio/pci: Fix OpRegion read
        vfio: remove all kernel-doc notation
      12119cfa
    • Linus Torvalds's avatar
      Merge tag 'pm-5.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 4ec6afd6
      Linus Torvalds authored
      Pull power management fixes from Rafael Wysocki:
       "These fix a CPU hot-add issue in the cpufreq core, fix a comment in
        the cpufreq core code and update its documentation, and disable the
        DTPM (Dynamic Thermal Power Management) code for the time being to
        prevent it from causing issues to appear.
      
        Specifics:
      
         - Disable DTPM for this cycle to prevent it from causing issues to
           appear on otherwise functional systems (Daniel Lezcano)
      
         - Fix cpufreq sysfs interface failure related to physical CPU hot-add
           (Xiongfeng Wang)
      
         - Fix comment in cpufreq core and update its documentation (Tang
           Yizhou)"
      
      * tag 'pm-5.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        powercap: DTPM: Drop unused local variable from init_dtpm()
        cpufreq: docs: Update core.rst
        cpufreq: Fix a comment in cpufreq_policy_free
        powercap/drivers/dtpm: Disable DTPM at boot time
        cpufreq: Fix get_cpu_device() failure in add_cpu_dev_symlink()
      4ec6afd6