1. 11 Mar, 2014 10 commits
    • KVM: svm: set/clear all DR intercepts in one swoop · 5315c716
      Paolo Bonzini authored
      Unlike other intercepts, debug register intercepts will be modified
      in hot paths if the guest OS is bad or otherwise gets tricked into
      doing so.
      
      Avoid calling recalc_intercepts 16 times for debug registers.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
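The batching idea in this commit can be sketched in a small standalone model. This is not the SVM code: the struct, the function names, and the assumption of 16 intercept bits (8 debug registers, read and write) are all hypothetical and only illustrate why one recalculation beats sixteen.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_DR_INTERCEPTS 16u  /* assumed: 8 debug registers x read/write */

/* Hypothetical model of the per-vCPU intercept state. */
struct intercept_model {
    uint32_t dr_intercepts;   /* one bit per debug register intercept */
    unsigned recalc_calls;    /* how often the expensive recalculation ran */
};

/* Stand-in for the expensive recalculation step. */
void model_recalc(struct intercept_model *m)
{
    m->recalc_calls++;
}

/* Per-bit variant: every single bit change triggers a recalculation. */
void model_set_one(struct intercept_model *m, unsigned bit)
{
    assert(bit < NUM_DR_INTERCEPTS);
    m->dr_intercepts |= UINT32_C(1) << bit;
    model_recalc(m);
}

void model_set_all_per_bit(struct intercept_model *m)
{
    for (unsigned bit = 0; bit < NUM_DR_INTERCEPTS; bit++)
        model_set_one(m, bit);
}

/* Batched variant: flip the whole mask, then recalculate once. */
void model_set_all_batched(struct intercept_model *m)
{
    m->dr_intercepts |= (UINT32_C(1) << NUM_DR_INTERCEPTS) - 1;
    model_recalc(m);
}

void model_clear_all_batched(struct intercept_model *m)
{
    m->dr_intercepts &= ~((UINT32_C(1) << NUM_DR_INTERCEPTS) - 1);
    model_recalc(m);
}
```

Both variants end with the same mask; they differ only in how many times the recalculation stand-in runs.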
    • KVM: nVMX: Allow nested guests to run with dirty debug registers · d16c293e
      Paolo Bonzini authored
      When preparing the VMCS02, the CPU-based execution controls are computed
      by vmx_exec_control.  Turn off DR access exits there, too, if the
      KVM_DEBUGREG_WONT_EXIT bit is set in switch_db_regs.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: vmx: Allow the guest to run with dirty debug registers · 81908bf4
      Paolo Bonzini authored
      When not running in guest-debug mode (i.e. the guest controls the debug
      registers), having to take an exit for each DR access is a waste of time.
      If the guest gets into a state where each context switch causes DR to be
      saved and restored, this can take away as much as 40% of the execution
      time from the guest.
      
      If the guest is running with vcpu->arch.db == vcpu->arch.eff_db, we
      can let it write freely to the debug registers and reload them on the
      next exit.  We still need to exit on the first access, so that the
      KVM_DEBUGREG_WONT_EXIT flag is set in switch_db_regs; after that, further
      accesses to the debug registers will not cause a vmexit.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Allow the guest to run with dirty debug registers · c77fb5fe
      Paolo Bonzini authored
      When not running in guest-debug mode, the guest controls the debug
      registers and having to take an exit for each DR access is a waste
      of time.  If the guest gets into a state where each context switch
      causes DR to be saved and restored, this can take away as much as 40%
      of the execution time from the guest.
      
      After this patch, VMX- and SVM-specific code can set a flag in
      switch_db_regs, telling vcpu_enter_guest that on the next exit the debug
      registers might be dirty and need to be reloaded (syncing will be taken
      care of by a new callback in kvm_x86_ops).  This flag can be set on the
      first access to a debug register, so that multiple accesses to the
      debug registers only cause one vmexit.
      
      Note that since the guest will be able to read debug registers and
      enable breakpoints in DR7, we need to ensure that they are synchronized
      on entry to the guest, including DR6, which was not synced before.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
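The "exit once, then run with dirty registers" scheme described in this commit can be illustrated with a toy model. Everything below is an assumption made for illustration: the struct, the `MODEL_` flag names (loosely mirroring the KVM_DEBUGREG_WONT_EXIT bit mentioned in this series), and the simplification that the first intercepted write is emulated directly. It is a sketch of the idea, not the KVM implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_NUM_DB 4
/* Hypothetical bits of the switch_db_regs mask. */
#define MODEL_DEBUGREG_BP_ENABLED 1u  /* not exercised in this sketch */
#define MODEL_DEBUGREG_WONT_EXIT  2u  /* guest may touch DRs without exiting */

struct dr_model {
    unsigned switch_db_regs;             /* bit mask of the flags above */
    bool dr_intercepted;                 /* do DR accesses currently exit? */
    unsigned long hw_db[MODEL_NUM_DB];   /* what the "hardware" holds */
    unsigned long saved_db[MODEL_NUM_DB];/* hypervisor's cached copy */
    unsigned exits;                      /* number of DR-access exits taken */
};

/* Guest writes a debug register. Only the first write after intercepts
 * were enabled costs an exit; it sets the flag and drops the intercept. */
void model_guest_write_dr(struct dr_model *v, unsigned reg, unsigned long val)
{
    assert(reg < MODEL_NUM_DB);
    if (v->dr_intercepted) {
        v->exits++;
        v->switch_db_regs |= MODEL_DEBUGREG_WONT_EXIT;
        v->dr_intercepted = false;
        v->saved_db[reg] = val;
    }
    v->hw_db[reg] = val;
}

/* On the next exit (for any reason), the registers may be dirty: sync the
 * cached copy from hardware, clear the flag, and re-enable intercepts. */
void model_vmexit(struct dr_model *v)
{
    if (v->switch_db_regs & MODEL_DEBUGREG_WONT_EXIT) {
        memcpy(v->saved_db, v->hw_db, sizeof(v->saved_db));
        v->switch_db_regs &= ~MODEL_DEBUGREG_WONT_EXIT;
        v->dr_intercepted = true;
    }
}
```

In this model, a burst of debug register writes during a guest context switch costs one exit instead of one per register, at the price of the cached copy being stale until the next exit.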
    • KVM: x86: change vcpu->arch.switch_db_regs to a bit mask · 360b948d
      Paolo Bonzini authored
      The next patch will add another bit that we can test with the
      same "if".
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: vmx: we do rely on loading DR7 on entry · c845f9c6
      Paolo Bonzini authored
      Currently, this works even if the bit is not in "min", because the bit is always
      set in MSR_IA32_VMX_ENTRY_CTLS.  Mention it for the sake of documentation, and
      to avoid surprises if we later switch to MSR_IA32_VMX_TRUE_ENTRY_CTLS.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Remove return code from enable_irq/nmi_window · c9a7953f
      Jan Kiszka authored
      It's no longer possible to enter enable_irq_window in guest mode when
      L1 intercepts external interrupts and we are entering L2. This is now
      caught in vcpu_enter_guest. So we can remove the check from the VMX
      version of enable_irq_window, and with it the need to return an error
      code from both enable_irq_window and enable_nmi_window.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Do not inject NMI vmexits when L2 has a pending interrupt · 220c5672
      Jan Kiszka authored
      According to SDM 27.2.3, IDT vectoring information will not be valid on
      vmexits caused by external NMIs. So we have to avoid creating such
      scenarios by delaying EXIT_REASON_EXCEPTION_NMI injection as long as we
      have a pending interrupt because that one would be migrated to L1's IDT
      vectoring info on nested exit.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Fully emulate preemption timer · f4124500
      Jan Kiszka authored
      We cannot rely on the hardware-provided preemption timer support because
      we are holding L2 in HLT outside non-root mode. Furthermore, emulating
      the preemption timer will resolve tick rate errata on older Intel CPUs.
      
      The emulation is based on hrtimer which is started on L2 entry, stopped
      on L2 exit and evaluated via the new check_nested_events hook. As we no
      longer rely on hardware features, we can enable both the preemption
      timer support and value saving unconditionally.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Rework interception of IRQs and NMIs · b6b8a145
      Jan Kiszka authored
      Move the check for leaving L2 on pending and intercepted IRQs or NMIs
      from the *_allowed handler into a dedicated callback. Invoke this
      callback at the relevant points before KVM checks if IRQs/NMIs can be
      injected. The callback is responsible for switching from L2 to L1 if
      needed and injecting the proper vmexit events.
      
      The rework fixes L2 wakeups from HLT and provides the foundation for
      preemption timer emulation.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 06 Mar, 2014 2 commits
  3. 04 Mar, 2014 12 commits
  4. 03 Mar, 2014 13 commits
  5. 27 Feb, 2014 3 commits
    • kvm, vmx: Really fix lazy FPU on nested guest · 1b385cbd
      Paolo Bonzini authored
      Commit e504c909 (kvm, vmx: Fix lazy FPU on nested guest, 2013-11-13)
      highlighted a real problem, but the fix was subtly wrong.
      
      nested_read_cr0 is the CR0 as read by L2, but here we want to look at
      the CR0 value reflecting L1's setup.  In other words, L2 might think
      that TS=0 (so nested_read_cr0 has the bit clear); but if L1 is actually
      running it with TS=1, we should inject the fault into L1.
      
      The effective value of CR0 in L2 is contained in vmcs12->guest_cr0, use
      it.
      
      Fixes: e504c909
      Reported-by: Kashyap Chamarty <kchamart@redhat.com>
      Reported-by: Stefan Bader <stefan.bader@canonical.com>
      Tested-by: Kashyap Chamarty <kchamart@redhat.com>
      Tested-by: Anthoine Bourgeois <bourgeois@bertin.fr>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
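The difference between "CR0 as read by L2" and "CR0 reflecting L1's setup" can be made concrete with a small model. The struct and helper names are hypothetical; the read-shadow combination follows the usual VMX rule (bits owned by the host come from the read shadow, the rest from the guest CR0 field), and the two predicates only illustrate which value the TS check looks at, not the exact kernel code.

```c
#include <assert.h>

#define MODEL_CR0_TS (1ul << 3)  /* CR0.TS is bit 3 */

/* Hypothetical subset of the vmcs12 fields involved. */
struct vmcs12_model {
    unsigned long guest_cr0;           /* value L1 actually runs L2 with */
    unsigned long cr0_read_shadow;     /* value L2 sees for masked bits */
    unsigned long cr0_guest_host_mask; /* bits L1 hides from L2 */
};

/* CR0 as L2 would read it: masked bits come from the read shadow. */
unsigned long model_cr0_read_by_l2(const struct vmcs12_model *v)
{
    return (v->guest_cr0 & ~v->cr0_guest_host_mask) |
           (v->cr0_read_shadow & v->cr0_guest_host_mask);
}

/* Subtly wrong check: consult the value L2 believes it has. */
int model_l0_handles_nm_wrong(const struct vmcs12_model *v)
{
    return !(model_cr0_read_by_l2(v) & MODEL_CR0_TS);
}

/* Check described in the commit: consult vmcs12->guest_cr0, i.e. L1's setup. */
int model_l0_handles_nm_fixed(const struct vmcs12_model *v)
{
    return !(v->guest_cr0 & MODEL_CR0_TS);
}
```

With L1 running L2 with TS=1 while showing it TS=0 through the read shadow, the first predicate keeps the fault in L0, whereas the second lets it be injected into L1.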
    • kvm: x86: fix emulator buffer overflow (CVE-2014-0049) · a08d3b3b
      Andrew Honig authored
      The problem occurs when the guest performs a pusha with the stack
      address pointing to an mmio address (or an invalid guest physical
      address) to start with, but then extending into an ordinary guest
      physical address.  When doing repeated emulated pushes,
      emulator_read_write sets mmio_needed to 1 on the first one.  On a
      later push when the stack points to regular memory,
      mmio_nr_fragments is set to 0, but mmio_needed is not set to 0.
      
      As a result, KVM exits to userspace, and then returns to
      complete_emulated_mmio.  In complete_emulated_mmio
      vcpu->mmio_cur_fragment is incremented.  The termination condition of
      vcpu->mmio_cur_fragment == vcpu->mmio_nr_fragments is never achieved.
      The code bounces back and forth to userspace, incrementing
      mmio_cur_fragment past its buffer.  If the guest does nothing else, it
      eventually leads to a crash on a memcpy from an invalid memory address.
      
      However, if guest code can cause the VM to be destroyed in another
      vcpu with excellent timing, then kvm_clear_async_pf_completion_queue
      can be used by the guest to control the data that's pointed to by the
      call to cancel_work_item, which can be used to gain execution.
      
      Fixes: f78146b0
      Signed-off-by: Andrew Honig <ahonig@google.com>
      Cc: stable@vger.kernel.org (3.5+)
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
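The runaway cursor described in this commit comes down to an equality test that can be stepped over. The function below is a hypothetical model, not the KVM code: it counts how many completion rounds pass before the loop terminates, with either an `==` or a `>=` termination test. The `>=` variant is shown as one defensive option under that assumption; the commit message itself does not say which change was applied.

```c
#include <assert.h>

/* Model of the completion loop: each round trip to userspace advances the
 * fragment cursor, then checks whether all fragments are done.
 * Returns the number of rounds needed, or -1 if the cursor is still running
 * after `limit` rounds (in the real bug it would walk past its buffer). */
int model_rounds_until_done(unsigned nr_fragments, int use_ge, unsigned limit)
{
    unsigned cur_fragment = 0;

    for (unsigned round = 1; round <= limit; round++) {
        cur_fragment++;
        if (use_ge ? (cur_fragment >= nr_fragments)
                   : (cur_fragment == nr_fragments))
            return (int)round;
    }
    return -1;
}
```

With a stale "MMIO needed" flag and zero fragments, the cursor starts at 0, is incremented to 1 before the first comparison, and never equals 0 again, which is the situation the commit message describes.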
    • arm/arm64: KVM: detect CPU reset on CPU_PM_EXIT · b20c9f29
      Marc Zyngier authored
      Commit 1fcf7ce0 (arm: kvm: implement CPU PM notifier) added
      support for CPU power-management, using a cpu_notifier to re-init
      KVM on a CPU that entered CPU idle.
      
      The code assumed that a CPU entering idle would actually be powered
      off, losing its state entirely, and would then need to be
      reinitialized. It turns out that this is not always the case, and
      some HW performs CPU PM without actually killing the core. In this
      case, we try to reinitialize KVM while it is still live. It ends
      badly, as reported by Andre Przywara (using a Calxeda Midway):
      
      [    3.663897] Kernel panic - not syncing: unexpected prefetch abort in Hyp mode at: 0x685760
      [    3.663897] unexpected data abort in Hyp mode at: 0xc067d150
      [    3.663897] unexpected HVC/SVC trap in Hyp mode at: 0xc0901dd0
      
      The trick here is to detect if we've been through a full re-init or
      not by looking at HVBAR (VBAR_EL2 on arm64). This involves
      implementing the backend for __hyp_get_vectors in the main KVM HYP
      code (rather small), and checking the return value against the
      default one when the CPU notifier is called on CPU_PM_EXIT.
      Reported-by: Andre Przywara <osp@andrep.de>
      Tested-by: Andre Przywara <osp@andrep.de>
      Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Cc: Rob Herring <rob.herring@linaro.org>
      Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
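The detection trick in this commit, comparing the current vector base against the default one, can be sketched as follows. The enum, struct, and function names are hypothetical stand-ins (only CPU_PM_EXIT and the idea of reading the vectors come from the commit message), and the addresses in any usage are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of CPU PM notifier actions. */
enum model_pm_action { MODEL_CPU_PM_ENTER, MODEL_CPU_PM_EXIT };

/* Hypothetical per-CPU state: where the hypervisor vectors point now. */
struct model_cpu {
    unsigned long hyp_vectors;  /* stands in for HVBAR / VBAR_EL2 */
};

/* A CPU that really lost its state comes back with the default (stub)
 * vectors installed; one that kept its state still has KVM's vectors. */
bool model_needs_reinit(enum model_pm_action action,
                        const struct model_cpu *cpu,
                        unsigned long default_vectors)
{
    return action == MODEL_CPU_PM_EXIT &&
           cpu->hyp_vectors == default_vectors;
}
```

Under this model, re-initialization is skipped when the vectors still point at KVM's own table, which avoids re-initializing a hypervisor that is still live.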