1. 15 Nov, 2019 20 commits
    • KVM: x86: Optimization: Request TLB flush in fast_cr3_switch() instead of doing it directly · 1924242b
      Liran Alon authored
      When KVM emulates a nested VMEntry (L1->L2 VMEntry), it switches the mmu root
      page. If nEPT is used, this happens from
      kvm_init_shadow_ept_mmu()->__kvm_mmu_new_cr3(); otherwise it happens
      from nested_vmx_load_cr3()->kvm_mmu_new_cr3(). In either case,
      __kvm_mmu_new_cr3() uses fast_cr3_switch() in an attempt to switch to a
      previously cached root page.
      
      In case fast_cr3_switch() finds a matching cached root page, it will
      set it in mmu->root_hpa and request KVM_REQ_LOAD_CR3 such that on
      next entry to guest, KVM will set root HPA in appropriate hardware
      fields (e.g. vmcs->eptp). In addition, fast_cr3_switch() calls
      kvm_x86_ops->tlb_flush() in order to flush TLB as MMU root page
      was replaced.
      
      This works because mmu->root_hpa, which vmx_flush_tlb() uses, was
      already replaced in cached_root_available(). However, it may
      result in an unnecessary INVEPT execution because a KVM_REQ_TLB_FLUSH
      may have already been requested, for example by prepare_vmcs02()
      in case L1 doesn't use VPID.
      
      Therefore, change fast_cr3_switch() to just request a TLB flush on the
      next entry to the guest (a sketch follows below).
      Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      1924242b
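      A minimal C sketch of the change described above (hedged: the surrounding
      prev_roots bookkeeping is omitted and the exact signature may differ):

          static bool fast_cr3_switch(struct kvm_vcpu *vcpu, gpa_t new_cr3,
                                      union kvm_mmu_page_role new_role,
                                      bool skip_tlb_flush)
          {
                  if (cached_root_available(vcpu, new_cr3, new_role)) {
                          kvm_make_request(KVM_REQ_LOAD_CR3, vcpu);
                          if (!skip_tlb_flush) {
                                  kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
                                  /* was: kvm_x86_ops->tlb_flush(vcpu, true); */
                                  kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
                          }
                          return true;
                  }
                  return false;
          }

      Requesting the flush lets vcpu_enter_guest() coalesce it with a
      KVM_REQ_TLB_FLUSH that may already be pending (e.g. from prepare_vmcs02()),
      avoiding a redundant INVEPT.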
    • KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC · b35e5548
      Like Xu authored
      Currently, a host perf_event is created to emulate vPMC functionality.
      It is hard to predict whether a disabled perf_event will be reused.
      If perf_events stay disabled and are not reused for a considerable period of
      time, these obsolete perf_events increase host context switch overhead that
      could have been avoided.
      
      If the guest doesn't WRMSR any of the vPMC's MSRs during an entire vcpu
      scheduling time slice, and the vPMC's individual enable bit isn't set,
      we can predict that the guest has finished using this vPMC. In that case,
      request KVM_REQ_PMU in kvm_arch_sched_in and release those perf_events
      in the first call to kvm_pmu_handle_event() after the vcpu is scheduled in.
      
      This lazy mechanism delays the event release to the beginning of the
      next scheduled time slice if the vPMC's MSRs aren't changed during the
      current one. If the guest comes back to use this vPMC in the next time
      slice, a new perf_event is re-created via perf_event_create_kernel_counter()
      as usual (see the sketch below).
      Suggested-by: Wei Wang <wei.w.wang@intel.com>
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Like Xu <like.xu@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b35e5548
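      A rough sketch of the lazy-release flow described above (hedged: the
      pmc_in_use bitmap, the cleanup helper and the exact call sites approximate
      the patch):

          /* Added to kvm_arch_sched_in(): make the vPMU code run on the
           * next entry to the guest. */
          kvm_make_request(KVM_REQ_PMU, vcpu);

          /* Called from the first kvm_pmu_handle_event() after sched-in:
           * release perf_events backing vPMCs that were neither written via
           * WRMSR nor individually enabled during the previous time slice. */
          static void kvm_pmu_cleanup(struct kvm_vcpu *vcpu)
          {
                  struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
                  struct kvm_pmc *pmc;
                  int i;

                  for_each_clear_bit(i, pmu->pmc_in_use, X86_PMC_IDX_MAX) {
                          pmc = kvm_x86_ops->pmu_ops->pmc_idx_to_pmc(pmu, i);
                          if (pmc && pmc->perf_event)
                                  pmc_stop_counter(pmc); /* frees the perf_event */
                  }

                  /* Start tracking usage for the new time slice. */
                  bitmap_zero(pmu->pmc_in_use, X86_PMC_IDX_MAX);
          }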
    • KVM: x86/vPMU: Reuse perf_event to avoid unnecessary pmc_reprogram_counter · a6da0d77
      Like Xu authored
      The perf_event_create_kernel_counter() call in pmc_reprogram_counter() is
      a heavyweight and high-frequency operation, especially when the host disables
      the watchdog (up to 21000000 ns), which leads to an unacceptable latency
      of the guest NMI handler and limits the use of vPMUs in the guest.
      
      When a vPMC is fully enabled, the legacy reprogram_*_counter() would stop
      and release its existing perf_event (if any) every time, even though in most
      cases an almost identical perf_event will be created and configured again.
      
      For each vPMC, if the requested config ('u64 eventsel' for gp counters and
      'u8 ctrl' for fixed counters) is the same as its current config AND a new
      sample period based on pmc->counter is accepted by the host perf interface,
      the current event can safely be reused as if it were newly created.
      Otherwise, release the stale perf_event and reprogram a new one as usual.
      
      Calling pmc_pause_counter (disable, read and reset the event) and
      pmc_resume_counter (recalibrate the period and re-enable the event) as the
      guest expects is lightweight compared to releasing and re-creating the event
      on every condition. Rather than relying on the filterable event->attr or
      hw.config, a new 'u64 current_config' field is added to save the last
      originally programmed config for each vPMC (see the sketch below).
      
      With this implementation, the number of calls to pmc_reprogram_counter
      is reduced by ~82.5% for a gp sampling event and ~99.9% for a fixed event.
      When multiplexing perf sampling mode is used, the average latency of the
      guest NMI handler is reduced from 104923 ns to 48393 ns (~2.16x speedup).
      If the host disables the watchdog, the minimum latency of the guest NMI
      handler improves by ~3413x (from 20407603 ns to 5979 ns) and by ~786x on average.
      Suggested-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Like Xu <like.xu@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      a6da0d77
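      A sketch of the pause/resume helpers described above (hedged: helper names
      and the pmc_bitmask()/current_config details approximate the patch):

          static void pmc_pause_counter(struct kvm_pmc *pmc)
          {
                  u64 counter = pmc->counter;

                  if (!pmc->perf_event)
                          return;
                  /* Fold the hardware count into the guest-visible counter
                   * and reset the event so it is not accumulated twice. */
                  counter += perf_event_pause(pmc->perf_event, true);
                  pmc->counter = counter & pmc_bitmask(pmc);
          }

          static bool pmc_resume_counter(struct kvm_pmc *pmc)
          {
                  if (!pmc->perf_event)
                          return false;

                  /* Recalibrate the sample period from the guest counter;
                   * bail out if the perf core rejects the new period. */
                  if (perf_event_period(pmc->perf_event,
                                        (-pmc->counter) & pmc_bitmask(pmc)))
                          return false;

                  perf_event_enable(pmc->perf_event);
                  return true;
          }

      In reprogram_gp_counter()/reprogram_fixed_counter() the event is then
      reused roughly as:

          pmc_pause_counter(pmc);
          if (eventsel == pmc->current_config && pmc_resume_counter(pmc))
                  return;                      /* reuse the existing event */
          pmc_release_perf_event(pmc);         /* otherwise recreate it */
          pmc->current_config = eventsel;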
    • KVM: x86/vPMU: Introduce a new kvm_pmu_ops->msr_idx_to_pmc callback · c900c156
      Like Xu authored
      Introduce a new callback msr_idx_to_pmc that returns a struct kvm_pmc*,
      and change kvm_pmu_is_valid_msr to return ".msr_idx_to_pmc(vcpu, msr) ||
      .is_valid_msr(vcpu, msr)"; AMD simply returns false from .is_valid_msr
      (see the sketch below).
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: Like Xu <like.xu@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c900c156
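      A sketch of the resulting wrapper, following the description above:

          bool kvm_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
          {
                  return kvm_x86_ops->pmu_ops->msr_idx_to_pmc(vcpu, msr) ||
                         kvm_x86_ops->pmu_ops->is_valid_msr(vcpu, msr);
          }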
    • KVM: x86/vPMU: Rename pmu_ops callbacks from msr_idx to rdpmc_ecx · 98ff80f5
      Like Xu authored
      The legacy pmu_ops->msr_idx_to_pmc is only called in kvm_pmu_rdpmc, so
      this function actually receives the contents of ECX before RDPMC, and
      translates it to a kvm_pmc. Let's clarify its semantics by renaming the
      existing msr_idx_to_pmc to rdpmc_ecx_to_pmc, and is_valid_msr_idx to
      is_valid_rdpmc_ecx; likewise for the wrapper kvm_pmu_is_valid_msr_idx.
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Like Xu <like.xu@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      98ff80f5
    • perf/core: Provide a kernel-internal interface to pause perf_event · 52ba4b0b
      Like Xu authored
      Export perf_event_pause() as an external accessor for kernel users (such
      as KVM) who may want to both disable a perf_event and read its count while
      taking perf_event_ctx_lock only once. The value can also optionally be reset
      (see the sketch below).
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Like Xu <like.xu@linux.intel.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      52ba4b0b
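      The accessor described above has, to my reading, the following shape
      (declared in include/linux/perf_event.h):

          /* Disable the event and read its count while holding
           * perf_event_ctx_lock only once; optionally reset the count. */
          u64 perf_event_pause(struct perf_event *event, bool reset);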
    • perf/core: Provide a kernel-internal interface to recalibrate event period · 3ca270fc
      Like Xu authored
      Currently, perf_event_period() is used by user tools via ioctl. Following
      the existing naming convention, export perf_event_period() for kernel users
      (such as KVM) who may need to recalibrate the event period of their assigned
      counter according to their requirements.
      
      perf_event_period() is an external accessor, just like
      perf_event_{en,dis}able(), and should thus use perf_event_ctx_lock()
      (see the sketch below).
      Suggested-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Like Xu <like.xu@linux.intel.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      3ca270fc
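      A hedged sketch of the kernel-internal accessor's shape:

          /* Takes perf_event_ctx_lock like perf_event_{en,dis}able();
           * returns 0 on success or a negative errno if the requested
           * period is rejected. */
          int perf_event_period(struct perf_event *event, u64 value);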
    • KVM: nVMX: Update vmcs01 TPR_THRESHOLD if L2 changed L1 TPR · 02d496cf
      Liran Alon authored
      When L1 doesn't use TPR-Shadow to run L2, L0 configures vmcs02 without
      TPR-Shadow and installs intercepts on CR8 access (load and store).
      
      If L1 does not intercept L2 CR8 accesses, L0's intercepts on those accesses
      will emulate the load/store on L1's LAPIC TPR. If in this case L2 lowers
      the TPR such that there is now an injectable interrupt to L1,
      apic_update_ppr() will request a KVM_REQ_EVENT, which will trigger a call
      to update_cr8_intercept() to update TPR-Threshold to the highest pending IRR
      priority.
      
      However, this update to TPR-Threshold is done while the active vmcs is
      vmcs02 instead of vmcs01. Thus, when L0 later emulates an exit from L2
      to L1, L1 will still run with a high TPR-Threshold. This results in
      every VMEntry to L1 immediately exiting on TPR_BELOW_THRESHOLD, and it
      continues to do so indefinitely until some condition causes KVM_REQ_EVENT
      to be set.
      (Note that the TPR_BELOW_THRESHOLD exit handler does not set KVM_REQ_EVENT
      until apic_update_ppr() notices a new injectable interrupt for PPR.)
      
      To fix this issue, change update_cr8_intercept() such that if L2 lowers
      L1's TPR in a way that requires lowering L1's TPR-Threshold, the update
      to TPR-Threshold is saved and applied to vmcs01 when L0 emulates an exit
      from L2 to L1 (see the sketch below).
      Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      02d496cf
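      A sketch of the fix (hedged: the l1_tpr_threshold field name and the exact
      placement in nested_vmx_vmexit() approximate the patch):

          static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
          {
                  int tpr_threshold;

                  if (is_guest_mode(vcpu) &&
                      nested_cpu_has(get_vmcs12(vcpu), CPU_BASED_TPR_SHADOW))
                          return;

                  tpr_threshold = (irr == -1 || tpr < irr) ? 0 : irr;
                  if (is_guest_mode(vcpu))
                          /* vmcs02 is active: only record the new threshold. */
                          to_vmx(vcpu)->nested.l1_tpr_threshold = tpr_threshold;
                  else
                          vmcs_write32(TPR_THRESHOLD, tpr_threshold);
          }

          /* In nested_vmx_vmexit(), once vmcs01 is loaded again: */
          if (vmx->nested.l1_tpr_threshold != -1)
                  vmcs_write32(TPR_THRESHOLD, vmx->nested.l1_tpr_threshold);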
    • KVM: VMX: Refactor update_cr8_intercept() · 132f4f7e
      Liran Alon authored
      No functional changes.
      Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      132f4f7e
    • KVM: SVM: Remove check if APICv enabled in SVM update_cr8_intercept() handler · 49d654d8
      Liran Alon authored
      This check is unnecessary, as the x86 update_cr8_intercept() that calls
      this VMX/SVM-specific callback already performs it.
      Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      49d654d8
    • KVM: APIC: add helper func to remove duplicate code in kvm_pv_send_ipi · 1a686237
      Miaohe Lin authored
      There is some duplicated code in kvm_pv_send_ipi when dealing with the ipi
      bitmap. Add a helper function to remove it, eliminate the odd 'out' label,
      and get rid of the unnecessary kvm_lapic_irq field initialization.
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      1a686237
    • KVM: X86: avoid unused setup_syscalls_segments call when SYSCALL check failed · 5b4ce93a
      Miaohe Lin authored
      When the SYSCALL/SYSENTER ability check fails, cs and ss are initialized but
      never used. Delay initializing cs and ss until the SYSCALL/SYSENTER
      ability check has passed (see the sketch below).
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      5b4ce93a
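      A sketch of the reordering in the emulator's SYSCALL path (hedged:
      abbreviated, and the SYSENTER path gets the analogous treatment):

          static int em_syscall(struct x86_emulate_ctxt *ctxt)
          {
                  const struct x86_emulate_ops *ops = ctxt->ops;
                  struct desc_struct cs, ss;
                  u64 efer = 0;

                  /* syscall is not available in real mode */
                  if (ctxt->mode == X86EMUL_MODE_REAL ||
                      ctxt->mode == X86EMUL_MODE_VM86)
                          return emulate_ud(ctxt);

                  if (!(em_syscall_is_enabled(ctxt, ops)))
                          return emulate_ud(ctxt);

                  ops->get_msr(ctxt, MSR_EFER, &efer);
                  if (!(efer & EFER_SCE))
                          return emulate_ud(ctxt);

                  /* Only pay for the segment setup once the checks above have
                   * passed (previously this ran before them). */
                  setup_syscalls_segments(ctxt, &cs, &ss);

                  /* ... MSR_STAR/MSR_LSTAR handling unchanged ... */
                  return X86EMUL_CONTINUE;
          }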
    • KVM: MMIO: get rid of odd out_err label in kvm_coalesced_mmio_init · b139b5a2
      Miaohe Lin authored
      The out_err label and the ret variable are unnecessary; clean them up.
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b139b5a2
    • KVM: VMX: Consume pending LAPIC INIT event when exit on INIT_SIGNAL · e64a8508
      Liran Alon authored
      Intel SDM section 25.2 OTHER CAUSES OF VM EXITS specifies the following
      on INIT signals: "Such exits do not modify register state or clear pending
      events as they would outside of VMX operation."
      
      When commit 4b9852f4 ("KVM: x86: Fix INIT signal handling in various CPU states")
      was applied, I interpreted the above Intel SDM statement to mean that an
      INIT_SIGNAL exit does not consume the pending LAPIC INIT event.
      
      However, when Nadav Amit ran the matching kvm-unit-test on a bare-metal
      machine, it turned out my interpretation was wrong: an INIT_SIGNAL
      exit does consume the pending LAPIC INIT event.
      (See: https://www.spinics.net/lists/kvm/msg196757.html)
      
      Therefore, fix KVM code to behave as observed on bare-metal.
      
      Fixes: 4b9852f4 ("KVM: x86: Fix INIT signal handling in various CPU states")
      Reported-by: Nadav Amit <nadav.amit@gmail.com>
      Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
      Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e64a8508
    • KVM: x86: Prevent set vCPU into INIT/SIPI_RECEIVED state when INIT are latched · 27cbe7d6
      Liran Alon authored
      Commit 4b9852f4 ("KVM: x86: Fix INIT signal handling in various CPU states")
      fixed KVM to also latch a pending LAPIC INIT event when the vCPU is in VMX
      operation.
      
      However, the current KVM_SET_MP_STATE API allows userspace to put a vCPU
      into KVM_MP_STATE_SIPI_RECEIVED or KVM_MP_STATE_INIT_RECEIVED even when
      the vCPU is in VMX operation.
      
      Fix this by introducing a utility method that checks whether the vCPU's
      state latches INIT signals, and use it in the KVM_SET_MP_STATE handler
      (see the sketch below).
      
      Fixes: 4b9852f4 ("KVM: x86: Fix INIT signal handling in various CPU states")
      Reported-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      27cbe7d6
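      A sketch of the helper and its use in the KVM_SET_MP_STATE handler (hedged:
      names follow the commit description and may differ slightly):

          static inline bool kvm_vcpu_latch_init(struct kvm_vcpu *vcpu)
          {
                  /* INIT is latched while in SMM or, after 4b9852f4, while
                   * the vCPU is in VMX operation. */
                  return is_smm(vcpu) ||
                         kvm_x86_ops->apic_init_signal_blocked(vcpu);
          }

          /* In kvm_arch_vcpu_ioctl_set_mpstate(): */
          if ((mp_state->mp_state == KVM_MP_STATE_INIT_RECEIVED ||
               mp_state->mp_state == KVM_MP_STATE_SIPI_RECEIVED) &&
              kvm_vcpu_latch_init(vcpu))
                  return -EINVAL;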
    • KVM: x86: Evaluate latched_init in KVM_SET_VCPU_EVENTS when vCPU not in SMM · ff90afa7
      Liran Alon authored
      Commit 4b9852f4 ("KVM: x86: Fix INIT signal handling in various CPU states")
      fixed KVM to also latch a pending LAPIC INIT event when the vCPU is in VMX
      operation.
      
      However, the current KVM_SET_VCPU_EVENTS API defines this field as
      part of the SMM state and only sets the pending LAPIC INIT event if the
      vCPU is specified to be in SMM mode (events->smi.smm is set).
      
      Change the KVM_SET_VCPU_EVENTS handler to set the pending LAPIC INIT event
      from the latched_init field regardless of whether the vCPU is in SMM mode
      (see the sketch below).
      
      Fixes: 4b9852f4 ("KVM: x86: Fix INIT signal handling in various CPU states")
      Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      ff90afa7
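      A sketch of the change in kvm_vcpu_ioctl_x86_set_vcpu_events() (hedged:
      the surrounding SMM handling is abbreviated):

          if (events->flags & KVM_VCPUEVENT_VALID_SMM) {
                  /* ... smm / smm_inside_nmi handling unchanged ... */

                  /* Previously nested inside "if (events->smi.smm)"; now the
                   * latched INIT is applied regardless of SMM state. */
                  if (lapic_in_kernel(vcpu)) {
                          if (events->smi.latched_init)
                                  set_bit(KVM_APIC_INIT,
                                          &vcpu->arch.apic->pending_events);
                          else
                                  clear_bit(KVM_APIC_INIT,
                                            &vcpu->arch.apic->pending_events);
                  }
          }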
    • x86: retpolines: eliminate retpoline from msr event handlers · 74c504a6
      Andrea Arcangeli authored
      It's enough to check the value and issue the direct call.
      
      After this commit is applied, these are the most common retpolines executed
      under a high-resolution timer workload in the guest on a VMX host:
      
      [..]
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 267
      @[]: 2256
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          __kvm_wait_lapic_expire+284
          vmx_vcpu_run.part.97+1091
          vcpu_enter_guest+377
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 2390
      @[]: 33410
      
      @total: 315707
      
      Note that the highest hit above is __delay, so it is probably not worth
      optimizing even if it were more frequent than 2k hits per second.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      74c504a6
    • KVM: retpolines: x86: eliminate retpoline from svm.c exit handlers · 3dcb2a3f
      Andrea Arcangeli authored
      It's enough to check the exit value and issue a direct call to avoid
      the retpoline for all the common vmexit reasons.
      
      After this commit is applied, these are the most common retpolines executed
      under a high-resolution timer workload in the guest on an SVM host:
      
      [..]
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          ktime_get_update_offsets_now+70
          hrtimer_interrupt+131
          smp_apic_timer_interrupt+106
          apic_timer_interrupt+15
          start_sw_timer+359
          restart_apic_timer+85
          kvm_set_msr_common+1497
          msr_interception+142
          vcpu_enter_guest+684
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 1940
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_r12+33
          force_qs_rnp+217
          rcu_gp_kthread+1270
          kthread+268
          ret_from_fork+34
      ]: 4644
      @[]: 25095
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          lapic_next_event+28
          clockevents_program_event+148
          hrtimer_start_range_ns+528
          start_sw_timer+356
          restart_apic_timer+85
          kvm_set_msr_common+1497
          msr_interception+142
          vcpu_enter_guest+684
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 41474
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          clockevents_program_event+148
          hrtimer_start_range_ns+528
          start_sw_timer+356
          restart_apic_timer+85
          kvm_set_msr_common+1497
          msr_interception+142
          vcpu_enter_guest+684
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 41474
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          ktime_get+58
          clockevents_program_event+84
          hrtimer_start_range_ns+528
          start_sw_timer+356
          restart_apic_timer+85
          kvm_set_msr_common+1497
          msr_interception+142
          vcpu_enter_guest+684
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 41887
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          lapic_next_event+28
          clockevents_program_event+148
          hrtimer_try_to_cancel+168
          hrtimer_cancel+21
          kvm_set_lapic_tscdeadline_msr+43
          kvm_set_msr_common+1497
          msr_interception+142
          vcpu_enter_guest+684
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 42723
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          clockevents_program_event+148
          hrtimer_try_to_cancel+168
          hrtimer_cancel+21
          kvm_set_lapic_tscdeadline_msr+43
          kvm_set_msr_common+1497
          msr_interception+142
          vcpu_enter_guest+684
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 42766
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          ktime_get+58
          clockevents_program_event+84
          hrtimer_try_to_cancel+168
          hrtimer_cancel+21
          kvm_set_lapic_tscdeadline_msr+43
          kvm_set_msr_common+1497
          msr_interception+142
          vcpu_enter_guest+684
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 42848
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          ktime_get+58
          start_sw_timer+279
          restart_apic_timer+85
          kvm_set_msr_common+1497
          msr_interception+142
          vcpu_enter_guest+684
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 499845
      
      @total: 1780243
      
      SVM has no TSC-based programmable preemption timer, so it invokes
      ktime_get() frequently.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      3dcb2a3f
    • KVM: retpolines: x86: eliminate retpoline from vmx.c exit handlers · 4289d272
      Andrea Arcangeli authored
      It's enough to check the exit value and issue a direct call to avoid
      the retpoline for all the common vmexit reasons.
      
      Of course CONFIG_RETPOLINE already forbids gcc from using indirect jumps
      while compiling all switch() statements; however, switch() would still
      allow the compiler to bisect the case value. It's more efficient to
      prioritize the most frequent vmexits instead (see the sketch below).
      
      Halt may be a slow path from the point of view of the guest, but not
      necessarily so from the point of view of the host if the host runs at full
      CPU capacity and no host CPU is ever left idle.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4289d272
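      A sketch of the resulting dispatch at the end of vmx.c's exit handling
      (hedged: the exact set of fast-pathed exit reasons and handler names
      approximate the patch):

          #ifdef CONFIG_RETPOLINE
                  if (exit_reason == EXIT_REASON_MSR_WRITE)
                          return kvm_emulate_wrmsr(vcpu);
                  else if (exit_reason == EXIT_REASON_PREEMPTION_TIMER)
                          return handle_preemption_timer(vcpu);
                  else if (exit_reason == EXIT_REASON_PENDING_INTERRUPT)
                          return handle_interrupt_window(vcpu);
                  else if (exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
                          return handle_external_interrupt(vcpu);
                  else if (exit_reason == EXIT_REASON_HLT)
                          return kvm_emulate_halt(vcpu);
                  else if (exit_reason == EXIT_REASON_EPT_MISCONFIG)
                          return handle_ept_misconfig(vcpu);
          #endif
                  return kvm_vmx_exit_handlers[exit_reason](vcpu);

      The svm.c change above (3dcb2a3f) applies the same pattern to the most
      frequent SVM_EXIT_* codes before falling back to the indirect
      svm_exit_handlers[] table.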
    • KVM: x86: optimize more exit handlers in vmx.c · f399e60c
      Andrea Arcangeli authored
      Eliminate the wasteful call/ret in the non-RETPOLINE case and the
      unnecessary fentry dynamic tracing hook points.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f399e60c
  2. 11 Nov, 2019 1 commit
  3. 02 Nov, 2019 1 commit
    • KVM: x86: switch KVMCLOCK base to monotonic raw clock · 53fafdbb
      Marcelo Tosatti authored
      Commit 0bc48bea ("KVM: x86: update master clock before computing
      kvmclock_offset")
      switches the order of operations to avoid the conversion
      
      TSC (without frequency correction) ->
      system_timestamp (with frequency correction),
      
      which might cause a time jump.
      
      However, it leaves any other masterclock update unsafe, which includes,
      at the moment:
      
              * HV_X64_MSR_REFERENCE_TSC MSR write.
              * TSC writes.
              * Host suspend/resume.
      
      Avoid the time jump issue by using the frequency-uncorrected
      CLOCK_MONOTONIC_RAW clock.
      
      It is the guest's timekeeping software's responsibility
      to track and correct a reference clock such as UTC.
      
      This fixes a forward time jump (which can result in a
      failure to bring up a vCPU) during vCPU hotplug:
      
      Oct 11 14:48:33 storage kernel: CPU2 has been hot-added
      Oct 11 14:48:34 storage kernel: CPU3 has been hot-added
      Oct 11 14:49:22 storage kernel: smpboot: Booting Node 0 Processor 2 APIC 0x2          <-- time jump of almost 1 minute
      Oct 11 14:49:22 storage kernel: smpboot: do_boot_cpu failed(-1) to wakeup CPU#2
      Oct 11 14:49:23 storage kernel: smpboot: Booting Node 0 Processor 3 APIC 0x3
      Oct 11 14:49:23 storage kernel: kvm-clock: cpu 3, msr 0:7ff640c1, secondary cpu clock
      
      Which happens because:
      
                      /*
                       * Wait 10s total for a response from AP
                       */
                      boot_error = -1;
                      timeout = jiffies + 10*HZ;
                      while (time_before(jiffies, timeout)) {
                               ...
                      }
      Analyzed-by: Igor Mammedov <imammedo@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      53fafdbb
  4. 31 Oct, 2019 1 commit
    • Merge tag 'kvm-ppc-next-5.5-1' of... · e7011c5d
      Paolo Bonzini authored
      Merge tag 'kvm-ppc-next-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
      
      KVM PPC update for 5.5
      
      * Add capability to tell userspace whether we can single-step the guest.
      
      * Improve the allocation of XIVE virtual processor IDs, to reduce the
        risk of running out of IDs when running many VMs on POWER9.
      
      * Rewrite interrupt synthesis code to deliver interrupts in virtual
        mode when appropriate.
      
      * Minor cleanups and improvements.
      e7011c5d
  5. 25 Oct, 2019 1 commit
  6. 22 Oct, 2019 16 commits