1. 15 Nov, 2019 13 commits
    • KVM: nVMX: Update vmcs01 TPR_THRESHOLD if L2 changed L1 TPR · 02d496cf
      Liran Alon authored
      When L1 doesn't use TPR-Shadow to run L2, L0 configures vmcs02 without
      TPR-Shadow and installs intercepts on CR8 accesses (load and store).
      
      If L1 does not intercept L2's CR8 accesses, L0's intercepts on those
      accesses emulate the load/store on L1's LAPIC TPR. If in this case L2
      lowers the TPR such that there is now an injectable interrupt to L1,
      apic_update_ppr() requests a KVM_REQ_EVENT, which triggers a call to
      update_cr8_intercept() to update TPR-Threshold to the highest pending
      IRR priority.
      
      However, this update to TPR-Threshold is done while the active vmcs is
      vmcs02 instead of vmcs01. Thus, when L0 later emulates an exit from L2
      to L1, L1 will still run with a high TPR-Threshold. As a result, every
      VMEntry to L1 immediately exits on TPR_BELOW_THRESHOLD, and keeps doing
      so until some condition causes KVM_REQ_EVENT to be set.
      (Note that the TPR_BELOW_THRESHOLD exit handler does not set
      KVM_REQ_EVENT until apic_update_ppr() notices a new injectable
      interrupt for PPR.)
      
      To fix this issue, change update_cr8_intercept() such that if L2 lowers
      L1's TPR in a way that requires lowering L1's TPR-Threshold, the update
      to TPR-Threshold is saved and applied to vmcs01 when L0 emulates an
      exit from L2 to L1.
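      The deferral described above can be modeled in a few lines of
      standalone C. This is only a sketch, not the kernel patch: the
      struct and function names are hypothetical, the two VMCSs are
      reduced to one integer field each, and -1 is an assumed marker for
      "no deferred update".

```c
#include <stdbool.h>

/* Hypothetical stand-in for a VMCS: only the TPR-Threshold field is modeled. */
struct vmcs_model {
    int tpr_threshold;
};

/* Hypothetical per-vCPU state for the sketch. */
struct vcpu_model {
    struct vmcs_model vmcs01;   /* VMCS used while L1 runs */
    struct vmcs_model vmcs02;   /* VMCS used while L2 runs */
    bool guest_mode;            /* true while L2 is running (vmcs02 active) */
    int saved_l1_tpr_threshold; /* -1 means no deferred update (assumption) */
};

/* Called when L1's TPR-Threshold has to change. */
void update_tpr_threshold(struct vcpu_model *v, int threshold)
{
    if (v->guest_mode)
        v->saved_l1_tpr_threshold = threshold; /* vmcs02 is active: defer */
    else
        v->vmcs01.tpr_threshold = threshold;   /* vmcs01 is active: write now */
}

/* Called when L0 emulates an exit from L2 to L1. */
void emulate_exit_to_l1(struct vcpu_model *v)
{
    v->guest_mode = false; /* vmcs01 becomes the active VMCS again */
    if (v->saved_l1_tpr_threshold != -1) {
        v->vmcs01.tpr_threshold = v->saved_l1_tpr_threshold;
        v->saved_l1_tpr_threshold = -1;
    }
}
```

      In this model, an update requested while guest_mode is true leaves
      both VMCSs untouched and only reaches vmcs01 once
      emulate_exit_to_l1() runs.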
      Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: Refactor update_cr8_intercept() · 132f4f7e
      Liran Alon authored
      No functional changes.
      Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Remove check if APICv enabled in SVM update_cr8_intercept() handler · 49d654d8
      Liran Alon authored
      This check is unnecessary, as the x86 update_cr8_intercept(), which
      calls this VMX/SVM-specific callback, already performs it.
      Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: APIC: add helper func to remove duplicate code in kvm_pv_send_ipi · 1a686237
      Miaohe Lin authored
      There is some duplicate code in kvm_pv_send_ipi when dealing with the
      IPI bitmap. Add a helper function to remove it, eliminate the odd out
      label, get rid of unnecessary kvm_lapic_irq field initialization, and
      so on.
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: X86: avoid unused setup_syscalls_segments call when SYSCALL check failed · 5b4ce93a
      Miaohe Lin authored
      When the SYSCALL/SYSENTER ability check fails, cs and ss are
      initialized but remain unused. Delay initializing cs and ss until the
      SYSCALL/SYSENTER ability check has passed.
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: MMIO: get rid of odd out_err label in kvm_coalesced_mmio_init · b139b5a2
      Miaohe Lin authored
      The out_err label and the ret variable are unnecessary; clean them up.
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: Consume pending LAPIC INIT event when exit on INIT_SIGNAL · e64a8508
      Liran Alon authored
      Intel SDM section 25.2 OTHER CAUSES OF VM EXITS specifies the following
      on INIT signals: "Such exits do not modify register state or clear pending
      events as they would outside of VMX operation."
      
      When commit 4b9852f4 ("KVM: x86: Fix INIT signal handling in various CPU states")
      was applied, I interpreted the above Intel SDM statement to mean that
      an INIT_SIGNAL exit doesn't consume the LAPIC INIT pending event.
      
      However, when Nadav Amit ran the matching kvm-unit-test on a bare-metal
      machine, it turned out my interpretation was wrong, i.e. an INIT_SIGNAL
      exit does consume the LAPIC INIT pending event.
      (See: https://www.spinics.net/lists/kvm/msg196757.html)
      
      Therefore, fix KVM code to behave as observed on bare-metal.
      
      Fixes: 4b9852f4 ("KVM: x86: Fix INIT signal handling in various CPU states")
      Reported-by: Nadav Amit <nadav.amit@gmail.com>
      Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
      Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Prevent set vCPU into INIT/SIPI_RECEIVED state when INIT are latched · 27cbe7d6
      Liran Alon authored
      Commit 4b9852f4 ("KVM: x86: Fix INIT signal handling in various CPU states")
      fixed KVM to also latch pending LAPIC INIT event when vCPU is in VMX
      operation.
      
      However, the current API of KVM_SET_MP_STATE allows userspace to put a
      vCPU into KVM_MP_STATE_SIPI_RECEIVED or KVM_MP_STATE_INIT_RECEIVED even
      when the vCPU is in VMX operation.
      
      Fix this by introducing a utility method that checks whether the vCPU
      state latches INIT signals, and use it in the KVM_SET_MP_STATE handler.
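      As a rough illustration of such a check, the following standalone C
      sketch models a vCPU with two flags and rejects the two INIT-related
      states while INIT is latched. All names here are hypothetical, the
      inclusion of SMM as a latching state and the -EINVAL return value
      are assumptions, and the real handler may check more conditions.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, reduced set of MP states for the sketch. */
enum mp_state {
    MP_STATE_RUNNABLE,
    MP_STATE_INIT_RECEIVED,
    MP_STATE_SIPI_RECEIVED,
};

/* Hypothetical vCPU model: only the fields the check needs. */
struct vcpu_state {
    bool in_smm;           /* assumption: SMM also latches INIT */
    bool in_vmx_operation; /* INIT is latched in VMX operation */
    enum mp_state mp_state;
};

/* Utility: does the vCPU currently latch INIT signals? */
bool vcpu_latches_init(const struct vcpu_state *v)
{
    return v->in_smm || v->in_vmx_operation;
}

/* Model of a set-MP-state handler that refuses inconsistent requests. */
int set_mp_state(struct vcpu_state *v, enum mp_state requested)
{
    if (vcpu_latches_init(v) &&
        (requested == MP_STATE_INIT_RECEIVED ||
         requested == MP_STATE_SIPI_RECEIVED))
        return -EINVAL; /* assumed error code for the sketch */

    v->mp_state = requested;
    return 0;
}
```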
      
      Fixes: 4b9852f4 ("KVM: x86: Fix INIT signal handling in various CPU states")
      Reported-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Evaluate latched_init in KVM_SET_VCPU_EVENTS when vCPU not in SMM · ff90afa7
      Liran Alon authored
      Commit 4b9852f4 ("KVM: x86: Fix INIT signal handling in various CPU states")
      fixed KVM to also latch pending LAPIC INIT event when vCPU is in VMX
      operation.
      
      However, the current API of KVM_SET_VCPU_EVENTS defines this field as
      part of SMM state and only sets the pending LAPIC INIT event if the
      vCPU is specified to be in SMM mode (events->smi.smm is set).
      
      Change the KVM_SET_VCPU_EVENTS handler to set the pending LAPIC INIT
      event according to the latched_init field regardless of whether the
      vCPU is in SMM mode.
      
      Fixes: 4b9852f4 ("KVM: x86: Fix INIT signal handling in various CPU states")
      Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86: retpolines: eliminate retpoline from msr event handlers · 74c504a6
      Andrea Arcangeli authored
      It's enough to check the value and issue the direct call.
      
      After this commit is applied, here are the most common retpolines
      executed under a high resolution timer workload in the guest on a VMX
      host:
      
      [..]
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 267
      @[]: 2256
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          __kvm_wait_lapic_expire+284
          vmx_vcpu_run.part.97+1091
          vcpu_enter_guest+377
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 2390
      @[]: 33410
      
      @total: 315707
      
      Note that the highest hit above is __delay, so it is probably not worth
      optimizing even if it were more frequent than 2k hits per sec.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: retpolines: x86: eliminate retpoline from svm.c exit handlers · 3dcb2a3f
      Andrea Arcangeli authored
      It's enough to check the exit value and issue a direct call to avoid
      the retpoline for all the common vmexit reasons.
      
      After this commit is applied, here are the most common retpolines
      executed under a high resolution timer workload in the guest on an SVM
      host:
      
      [..]
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          ktime_get_update_offsets_now+70
          hrtimer_interrupt+131
          smp_apic_timer_interrupt+106
          apic_timer_interrupt+15
          start_sw_timer+359
          restart_apic_timer+85
          kvm_set_msr_common+1497
          msr_interception+142
          vcpu_enter_guest+684
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 1940
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_r12+33
          force_qs_rnp+217
          rcu_gp_kthread+1270
          kthread+268
          ret_from_fork+34
      ]: 4644
      @[]: 25095
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          lapic_next_event+28
          clockevents_program_event+148
          hrtimer_start_range_ns+528
          start_sw_timer+356
          restart_apic_timer+85
          kvm_set_msr_common+1497
          msr_interception+142
          vcpu_enter_guest+684
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 41474
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          clockevents_program_event+148
          hrtimer_start_range_ns+528
          start_sw_timer+356
          restart_apic_timer+85
          kvm_set_msr_common+1497
          msr_interception+142
          vcpu_enter_guest+684
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 41474
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          ktime_get+58
          clockevents_program_event+84
          hrtimer_start_range_ns+528
          start_sw_timer+356
          restart_apic_timer+85
          kvm_set_msr_common+1497
          msr_interception+142
          vcpu_enter_guest+684
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 41887
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          lapic_next_event+28
          clockevents_program_event+148
          hrtimer_try_to_cancel+168
          hrtimer_cancel+21
          kvm_set_lapic_tscdeadline_msr+43
          kvm_set_msr_common+1497
          msr_interception+142
          vcpu_enter_guest+684
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 42723
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          clockevents_program_event+148
          hrtimer_try_to_cancel+168
          hrtimer_cancel+21
          kvm_set_lapic_tscdeadline_msr+43
          kvm_set_msr_common+1497
          msr_interception+142
          vcpu_enter_guest+684
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 42766
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          ktime_get+58
          clockevents_program_event+84
          hrtimer_try_to_cancel+168
          hrtimer_cancel+21
          kvm_set_lapic_tscdeadline_msr+43
          kvm_set_msr_common+1497
          msr_interception+142
          vcpu_enter_guest+684
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 42848
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          ktime_get+58
          start_sw_timer+279
          restart_apic_timer+85
          kvm_set_msr_common+1497
          msr_interception+142
          vcpu_enter_guest+684
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 499845
      
      @total: 1780243
      
      SVM has no TSC based programmable preemption timer so it is invoking
      ktime_get() frequently.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: retpolines: x86: eliminate retpoline from vmx.c exit handlers · 4289d272
      Andrea Arcangeli authored
      It's enough to check the exit value and issue a direct call to avoid
      the retpoline for all the common vmexit reasons.
      
      Of course CONFIG_RETPOLINE already forbids gcc from using indirect
      jumps while compiling all switch() statements; however, switch() would
      still allow the compiler to bisect the case value. It's more efficient
      to prioritize the most frequent vmexits instead.
      
      Halt may be a slow path from the point of view of the guest, but not
      necessarily so from the point of view of the host if the host runs at
      full CPU capacity and no host CPU is ever left idle.
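      The idea of checking the exit value and issuing a direct call can be
      sketched in standalone C as follows. The exit reasons, handler names
      and return values are all hypothetical and chosen only for
      illustration; the actual set and ordering of prioritized vmexits is
      defined by the commit itself.

```c
/* Hypothetical exit reasons for the sketch. */
enum exit_reason {
    EXIT_MSR_WRITE,
    EXIT_HLT,
    EXIT_CPUID,
    EXIT_REASON_COUNT,
};

/* Hypothetical handlers; the return values only identify which one ran. */
static int handle_msr_write(void) { return 1; }
static int handle_hlt(void)       { return 2; }
static int handle_cpuid(void)     { return 3; }

/* Function-pointer table: calling through it is an indirect branch. */
static int (*const exit_handlers[EXIT_REASON_COUNT])(void) = {
    [EXIT_MSR_WRITE] = handle_msr_write,
    [EXIT_HLT]       = handle_hlt,
    [EXIT_CPUID]     = handle_cpuid,
};

int dispatch_exit(enum exit_reason reason)
{
    /* Most frequent exits first: compare, then call directly. */
    if (reason == EXIT_MSR_WRITE)
        return handle_msr_write();
    else if (reason == EXIT_HLT)
        return handle_hlt();

    /* Less frequent exits still go through the table. */
    return exit_handlers[reason]();
}
```

      The if/else chain fixes the order in which exit reasons are tested,
      which a switch() statement would leave to the compiler.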
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: optimize more exit handlers in vmx.c · f399e60c
      Andrea Arcangeli authored
      Eliminate the wasteful call/ret in the non-RETPOLINE case and
      unnecessary fentry dynamic tracing hook points.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 11 Nov, 2019 1 commit
  3. 02 Nov, 2019 1 commit
    • KVM: x86: switch KVMCLOCK base to monotonic raw clock · 53fafdbb
      Marcelo Tosatti authored
      Commit 0bc48bea ("KVM: x86: update master clock before computing
      kvmclock_offset")
      switches the order of operations to avoid the conversion
      
      TSC (without frequency correction) ->
      system_timestamp (with frequency correction),
      
      which might cause a time jump.
      
      However, it leaves any other masterclock update unsafe, which includes,
      at the moment:
      
              * HV_X64_MSR_REFERENCE_TSC MSR write.
              * TSC writes.
              * Host suspend/resume.
      
      Avoid the time jump issue by using the frequency-uncorrected
      CLOCK_MONOTONIC_RAW clock.
      
      It is the responsibility of the guest's timekeeping software
      to track and correct a reference clock such as UTC.
      
      This fixes a forward time jump (which can result in
      failure to bring up a vCPU) during vCPU hotplug:
      
      Oct 11 14:48:33 storage kernel: CPU2 has been hot-added
      Oct 11 14:48:34 storage kernel: CPU3 has been hot-added
      Oct 11 14:49:22 storage kernel: smpboot: Booting Node 0 Processor 2 APIC 0x2          <-- time jump of almost 1 minute
      Oct 11 14:49:22 storage kernel: smpboot: do_boot_cpu failed(-1) to wakeup CPU#2
      Oct 11 14:49:23 storage kernel: smpboot: Booting Node 0 Processor 3 APIC 0x3
      Oct 11 14:49:23 storage kernel: kvm-clock: cpu 3, msr 0:7ff640c1, secondary cpu clock
      
      Which happens because:
      
                      /*
                       * Wait 10s total for a response from AP
                       */
                      boot_error = -1;
                      timeout = jiffies + 10*HZ;
                      while (time_before(jiffies, timeout)) {
                               ...
                      }
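      
      Why a forward jump defeats this loop can be shown with a small
      standalone model. The sketch below is not kernel code: tick_before()
      imitates the wraparound-safe comparison style of time_before(), the
      tick counter is an ordinary variable, and the function name, the
      tick rate of 1000 per second and the step sizes are assumptions for
      illustration.

```c
#include <stdbool.h>

/* Wraparound-safe "a is before b" comparison on a free-running counter. */
static bool tick_before(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}

/*
 * Count how many polling iterations run before the deadline when the
 * tick counter advances by `step` ticks per iteration (step must be > 0).
 */
unsigned long polls_before_timeout(unsigned long start,
                                   unsigned long budget,
                                   unsigned long step)
{
    unsigned long now = start;
    unsigned long deadline = start + budget;
    unsigned long polls = 0;

    while (tick_before(now, deadline)) {
        polls++;
        now += step;
    }
    return polls;
}
```

      With an assumed 1000 ticks per second, a 10 s budget and one poll per
      second, polls_before_timeout(0, 10000, 1000) gives 10 iterations. If
      the counter instead jumps by 48 s between two polls,
      polls_before_timeout(0, 10000, 48000) gives a single iteration, so
      the waiter gives up almost immediately.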
      Analyzed-by: Igor Mammedov <imammedo@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  4. 31 Oct, 2019 1 commit
    • Merge tag 'kvm-ppc-next-5.5-1' of... · e7011c5d
      Paolo Bonzini authored
      Merge tag 'kvm-ppc-next-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
      
      KVM PPC update for 5.5
      
      * Add capability to tell userspace whether we can single-step the guest.
      
      * Improve the allocation of XIVE virtual processor IDs, to reduce the
        risk of running out of IDs when running many VMs on POWER9.
      
      * Rewrite interrupt synthesis code to deliver interrupts in virtual
        mode when appropriate.
      
      * Minor cleanups and improvements.
  5. 25 Oct, 2019 1 commit
  6. 22 Oct, 2019 23 commits