  1. 10 Nov, 2016 1 commit
  2. 05 Sep, 2016 1 commit
    • KVM: lapic: adjust preemption timer correctly when TSC goes backward · e12c8f36
      Wanpeng Li authored
      TSC_OFFSET is adjusted if the TSC is found to have gone backward during
      vCPU load.  The preemption timer, which relies on the guest TSC to
      reprogram its timer value, is also reprogrammed when the vCPU is
      scheduled in on a different pCPU.  However, the current implementation
      reprograms the preemption timer before TSC_OFFSET has been adjusted to
      the right value, so the preemption timer fires prematurely.
      
      This patch fixes it by adjusting TSC_OFFSET before reprogramming the
      preemption timer when the TSC has gone backward (see the sketch after
      this entry).
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Yunhong Jiang <yunhong.jiang@intel.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
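      
      A minimal sketch of the ordering this patch establishes on the vCPU-load
      path; every name below is a hypothetical stand-in for illustration, not
      the actual KVM code.
      
          #include <stdint.h>
      
          /* Hypothetical stand-in state; the real fields live in KVM's vCPU structures. */
          struct vcpu_sketch {
              uint64_t last_host_tsc;
              uint64_t tsc_offset;     /* guest_tsc = host_tsc + tsc_offset */
              int pcpu;
          };
      
          static void adjust_tsc_offset_for_backward_tsc(struct vcpu_sketch *v, uint64_t host_tsc)
          {
              v->tsc_offset += v->last_host_tsc - host_tsc;   /* keep the guest TSC monotonic */
          }
      
          static void reprogram_preemption_timer(struct vcpu_sketch *v)
          {
              (void)v;   /* would convert the guest TSC deadline using the corrected offset */
          }
      
          /* Ordering the patch establishes: fix TSC_OFFSET *before* touching the timer. */
          void vcpu_load_sketch(struct vcpu_sketch *v, uint64_t host_tsc, int new_pcpu)
          {
              if (host_tsc < v->last_host_tsc)
                  adjust_tsc_offset_for_backward_tsc(v, host_tsc);
      
              if (v->pcpu != new_pcpu)
                  reprogram_preemption_timer(v);
              v->pcpu = new_pcpu;
          }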
  3. 15 Jul, 2016 2 commits
  4. 14 Jul, 2016 7 commits
    • x86/kvm: Audit and remove any unnecessary uses of module.h · 1767e931
      Paul Gortmaker authored
      Historically a lot of these existed because we did not have
      a distinction between what was modular code and what was providing
      support to modules via EXPORT_SYMBOL and friends.  That changed
      when we forked out support for the latter into the export.h file.
      
      This means we should be able to reduce the usage of module.h
      in code that is always built in (obj-y in the Makefile or a bool
      Kconfig option).  In the case of kvm, which is modular, we can extend
      that to files that build basic support functionality but are not
      involved in loading or registering the final module; such files also
      have no need whatsoever for module.h.
      
      The advantage of removing such instances is that module.h itself
      sources about 15 other headers, adding significantly to what we feed
      cpp, and it can obscure which headers we are actually relying on
      (a representative replacement is sketched after this entry).
      
      Since module.h was the source for init.h (for __init) and for
      export.h (for EXPORT_SYMBOL) we consider each instance for the
      presence of either and replace as needed.
      
      Several instances got replaced with moduleparam.h since that was
      really all that was required for those particular files.
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kvm@vger.kernel.org
      Link: http://lkml.kernel.org/r/20160714001901.31603-8-paul.gortmaker@windriver.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
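      
      A representative before/after of the header cleanup described above; the
      includes shown are illustrative, not taken from the actual diff.
      
          /* Before: a single include that pulls in ~15 more headers just for two macros. */
          #include <linux/module.h>
      
          /* After: include only what the file actually uses.  A file that is always
           * built in (obj-y / bool Kconfig) typically needs no more than: */
          #include <linux/export.h>   /* EXPORT_SYMBOL(), EXPORT_SYMBOL_GPL() */
          #include <linux/init.h>     /* __init, __exit annotations */
      
          /* Files that only declare parameters switch to moduleparam.h instead: */
          #include <linux/moduleparam.h>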
    • KVM: x86: bump KVM_MAX_VCPU_ID to 1023 · af1bae54
      Radim Krčmář authored
      kzalloc was replaced with kvm_kvzalloc to allow non-contiguous
      allocations, and the RCU usage had to be modified to cope with that.
      
      The practical limit for KVM_MAX_VCPU_ID right now is INT_MAX, but a lower
      value was chosen in case there are bugs.  1023 is a sufficient maximum
      APIC ID for 288 VCPUs.
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: add a flag to disable KVM x2apic broadcast quirk · c519265f
      Radim Krčmář authored
      Add KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK as a feature flag to
      KVM_CAP_X2APIC_API.
      
      The quirk made KVM interpret 0xff as a broadcast even in x2APIC mode.
      A capability that can be explicitly enabled is needed in order to support
      standard x2APIC behavior while remaining backward compatible (the intended
      interpretation is sketched after this entry).
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      [Expand kvm_apic_mda comment. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
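      
      A rough sketch of the destination-ID interpretation described above.  The
      function and constants are stand-ins, and the behavior with the quirk left
      enabled is an assumption based on the text; the real logic lives around
      kvm_apic_mda().
      
          #include <stdbool.h>
          #include <stdint.h>
      
          #define XAPIC_BROADCAST  0xffu
          #define X2APIC_BROADCAST 0xffffffffu
      
          static bool dest_is_broadcast(uint32_t dest_id, bool in_x2apic_mode,
                                        bool broadcast_quirk_disabled)
          {
              if (in_x2apic_mode && broadcast_quirk_disabled)
                  return dest_id == X2APIC_BROADCAST;   /* standard x2APIC behavior */
      
              /* With the quirk (the default), 0xff still counts as broadcast. */
              return dest_id == XAPIC_BROADCAST || dest_id == X2APIC_BROADCAST;
          }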
    • KVM: x86: add KVM_CAP_X2APIC_API · 37131313
      Radim Krčmář authored
      KVM_CAP_X2APIC_API is a capability for features related to x2APIC
      enablement.  The KVM_X2APIC_API_32BIT_FORMAT feature can be enabled to
      extend the APIC ID in the get/set ioctls and in MSI addresses to 32 bits.
      Both are needed to support x2APIC (an example of enabling the capability
      follows this entry).
      
      The feature has to be explicitly enabled and is disabled by default,
      because the get/set ioctls used to shift and truncate the APIC ID to
      8 bits, following a non-standard protocol inspired by xAPIC, and the
      change is not backward compatible.
      
      Changes to MSI addresses follow the format used by the interrupt remapping
      unit.  The upper address word, which used to be 0, contains the upper 24
      bits of the LAPIC address in its upper 24 bits; the lower 8 bits are
      reserved as 0.  Using the upper address word is not backward compatible
      either, because we never checked that userspace zeroed it.  Reserved bits
      are still not explicitly checked, but non-zero data will affect the LAPIC
      address, which will cause a bug.
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
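      
      A minimal userspace sketch of opting in to the capability, assuming kernel
      headers that define KVM_CAP_X2APIC_API.  The feature-flag names follow the
      commit text above and are defined locally as assumptions; check
      <linux/kvm.h> for the exact constants and bit values.
      
          #include <fcntl.h>
          #include <stdio.h>
          #include <sys/ioctl.h>
          #include <linux/kvm.h>
      
          /* Assumed bit assignments for the flags named in the commit message. */
          #ifndef KVM_X2APIC_API_32BIT_FORMAT
          #define KVM_X2APIC_API_32BIT_FORMAT            (1ULL << 0)
          #endif
          #ifndef KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK
          #define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
          #endif
      
          int main(void)
          {
              int kvm = open("/dev/kvm", O_RDWR);
              int vm  = ioctl(kvm, KVM_CREATE_VM, 0);
      
              struct kvm_enable_cap cap = {
                  .cap     = KVM_CAP_X2APIC_API,
                  .args[0] = KVM_X2APIC_API_32BIT_FORMAT |
                             KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK,
              };
      
              if (ioctl(vm, KVM_ENABLE_CAP, &cap))
                  perror("KVM_ENABLE_CAP(KVM_CAP_X2APIC_API)");
              return 0;
          }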
    • KVM: x86: use hardware-compatible format for APIC ID register · a92e2543
      Radim Krčmář authored
      We currently always shift the APIC ID as if the APIC were in xAPIC mode.
      x2APIC mode wants to use more bits, and storing a hardware-compatible
      value is the sanest option.
      
      The KVM API to set the LAPIC expects the bottom 8 bits of the APIC ID to
      be in the top 8 bits of the APIC_ID register, so the register needs to be
      shifted in x2APIC mode (see the sketch after this entry).
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
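      
      A small sketch of the two hardware layouts of the APIC ID register and of
      the legacy ioctl convention described above; illustrative only.
      
          #include <stdint.h>
      
          /* xAPIC: an 8-bit ID lives in bits 31:24 of the APIC ID register. */
          static uint32_t apic_id_reg_xapic(uint8_t id)
          {
              return (uint32_t)id << 24;
          }
      
          /* x2APIC: the full 32-bit ID is stored verbatim (the hardware-compatible format). */
          static uint32_t apic_id_reg_x2apic(uint32_t id)
          {
              return id;
          }
      
          /* The legacy get/set-lapic ABI still exchanges the low 8 bits of the ID in
           * bits 31:24, so a register kept in x2APIC format is shifted on its way
           * through that ioctl. */
          static uint32_t x2apic_reg_to_legacy_abi(uint32_t reg)
          {
              return (reg & 0xffu) << 24;
          }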
    • kvm: mmu: don't set the present bit unconditionally · ffb128c8
      Bandan Das authored
      To support execute-only mappings on behalf of L1
      hypervisors, we need to teach set_spte() to honor all three of
      L1's XWR bits.  As a start, add a new variable "shadow_present_mask"
      that will be set for non-EPT shadow paging and clear for EPT.
      Signed-off-by: Bandan Das <bsd@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: mmu: remove is_present_gpte() · 812f30b2
      Bandan Das authored
      We have two versions of the above function.  To prevent confusion and
      future bugs, remove the non-FNAME version entirely and replace all calls
      with the actual check (illustrated after this entry).
      Signed-off-by: Bandan Das <bsd@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
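      
      A sketch of the removed helper and of the open-coded check that replaces
      it; PT_PRESENT_MASK is redefined locally and the call site is hypothetical.
      
          #include <stdbool.h>
          #include <stdint.h>
      
          #define PT_PRESENT_MASK (1ULL << 0)   /* bit 0 of a guest PTE */
      
          /* The removed helper was essentially this one-liner... */
          static bool is_present_gpte(uint64_t gpte)
          {
              return gpte & PT_PRESENT_MASK;
          }
      
          /* ...so call sites now open-code the check instead: */
          static bool walk_one_level(uint64_t gpte)
          {
              if (!(gpte & PT_PRESENT_MASK))   /* was: if (!is_present_gpte(gpte)) */
                  return false;
              /* ... continue the walk ... */
              return true;
          }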
  5. 01 Jul, 2016 2 commits
  6. 27 Jun, 2016 1 commit
  7. 23 Jun, 2016 2 commits
  8. 16 Jun, 2016 2 commits
    • kvm: vmx: hook preemption timer support · 64672c95
      Yunhong Jiang authored
      Hook the VMX preemption timer to the "hv timer" functionality added
      by the previous patch.  This includes checking whether the feature is
      supported, checking whether it is broken on the CPU, the hooks to set up
      and clean up the VMX preemption timer, arming the timer on vmentry, and
      handling the vmexit.
      
      A module parameter controls whether the VMX preemption timer is used
      (see the sketch after this entry).
      Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
      [Move hv_deadline_tsc to struct vcpu_vmx, use -1 as the "unset" value.
       Put all VMX bits here.  Enable it by default #yolo. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
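      
      A minimal sketch of how such a module parameter is typically declared in
      kvm-intel; the variable and parameter names are assumptions based on the
      commit description.
      
          #include <linux/types.h>
          #include <linux/moduleparam.h>
      
          /* Whether to back the hv timer with the VMX preemption timer (sketch only). */
          static bool enable_preemption_timer = true;
          module_param_named(preemption_timer, enable_preemption_timer, bool, 0444);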
    • KVM: x86: support using the vmx preemption timer for tsc deadline timer · ce7a058a
      Yunhong Jiang authored
      The VMX preemption timer can be used to virtualize the TSC deadline timer.
      The VMX preemption timer is armed when the vCPU is running, and a VMExit
      will happen if the virtual TSC deadline timer expires.
      
      When the vCPU thread is blocked because of HLT, KVM will switch to use
      an hrtimer, and then go back to the VMX preemption timer when the vCPU
      thread is unblocked.
      
      This solution avoids the OS's complex hrtimer system and the cost of
      host timer interrupt handling, replacing them with a little math
      (the guest->host TSC and host TSC->preemption timer conversions,
      sketched after this entry) and a cheaper VMexit.  This benefits latency
      on isolated pCPUs.
      
      [A word about performance... Yunhong reported a 30% reduction in average
       latency from cyclictest.  I made a similar test with tscdeadline_latency
       from kvm-unit-tests, and measured
      
       - ~20 clock cycles loss (out of ~3200, so less than 1% but still
         statistically significant) in the worst case where the test halts
         just after programming the TSC deadline timer
      
       - ~800 clock cycles gain (25% reduction in latency) in the best case
         where the test busy waits.
      
       I removed the VMX bits from Yunhong's patch, to concentrate them in the
       next patch - Paolo]
      Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
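      
      A sketch of the guest-deadline-to-preemption-timer conversion mentioned
      above; the field names and shift-based scaling are assumptions (the real
      rate comes from the VMX capability MSRs).
      
          #include <stdbool.h>
          #include <stdint.h>
      
          struct hv_timer_sketch {
              uint64_t tsc_offset;          /* guest_tsc = host_tsc + tsc_offset */
              uint8_t  tsc_to_timer_shift;  /* preemption-timer ticks = TSC >> shift */
          };
      
          /* Returns true and fills *timer_value if the deadline is still in the future. */
          static bool deadline_to_preemption_timer(const struct hv_timer_sketch *s,
                                                   uint64_t guest_deadline_tsc,
                                                   uint64_t host_tsc_now,
                                                   uint32_t *timer_value)
          {
              uint64_t guest_tsc_now = host_tsc_now + s->tsc_offset;
      
              if (guest_deadline_tsc <= guest_tsc_now)
                  return false;   /* already expired: deliver the interrupt now */
      
              /* Remaining guest TSC ticks, scaled down to preemption-timer units.
               * (The real code also guards against values that overflow the field.) */
              *timer_value = (uint32_t)((guest_deadline_tsc - guest_tsc_now)
                                        >> s->tsc_to_timer_shift);
              return true;
          }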
  9. 15 Jun, 2016 1 commit
    • KVM: remove kvm_vcpu_compatible · 557abc40
      Paolo Bonzini authored
      The new created_vcpus field makes it possible to avoid the race between
      irqchip and VCPU creation in a much nicer way: just check under kvm->lock
      whether a VCPU has already been created (sketched after this entry).
      
      We can then remove KVM_APIC_ARCHITECTURE too, because at this point the
      symbol only governs the default definition of kvm_vcpu_compatible.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
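      
      A sketch of the check described above; the struct and lock helpers are
      stand-ins for struct kvm and kvm->lock.
      
          #include <errno.h>
      
          struct kvm_sketch {
              int lock_held;       /* stands in for kvm->lock */
              int created_vcpus;
          };
      
          static void lock(struct kvm_sketch *kvm)   { kvm->lock_held = 1; }
          static void unlock(struct kvm_sketch *kvm) { kvm->lock_held = 0; }
      
          static int create_irqchip_sketch(struct kvm_sketch *kvm)
          {
              int r = 0;
      
              lock(kvm);
              if (kvm->created_vcpus) {
                  /* A vCPU already exists; creating an irqchip now would race. */
                  r = -EINVAL;
              } else {
                  /* Safe: no vCPU can appear while the lock is held, because
                   * vCPU creation takes the same lock to bump created_vcpus. */
              }
              unlock(kvm);
              return r;
          }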
  10. 14 Jun, 2016 1 commit
  11. 08 Jun, 2016 1 commit
    • x86/fpu: Add tracepoints to dump FPU state at key points · d1898b73
      Dave Hansen authored
      I've been carrying this patch around for a bit and it's helped me
      solve at least a couple of FPU-related bugs.  In addition to using
      it for debugging, I also dragged it out because using AVX (and
      AVX2/AVX-512) can have serious power consequences for a modern
      core.  It's very important to be able to figure out who is using
      it.
      
      It's also insanely useful to go out and see who is using a given
      feature, like MPX or Memory Protection Keys.  If you, for
      instance, want to find all processes using protection keys, you
      can do:
      
      	echo 'xfeatures & 0x200' > filter
      
      since 0x200 is the protection keys feature bit (see the note after
      this entry).
      
      Note that this touches the KVM code.  KVM did a CREATE_TRACE_POINTS
      and then included a bunch of random headers.  If any one of
      those included other tracepoints, it would have defined the *OTHER*
      tracepoints.  That's bogus, so move it to the right place.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160601174220.3CDFB90E@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
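      
      For reference, where the 0x200 in the filter above comes from (bit
      numbering per the XSAVE state-component layout); a tiny standalone
      illustration.
      
          #include <stdio.h>
      
          /* Protection Keys (PKRU) is XSAVE state component 9, so its mask is 1 << 9 = 0x200. */
          #define XFEATURE_PKRU_BIT  9
          #define XFEATURE_MASK_PKRU (1u << XFEATURE_PKRU_BIT)
      
          int main(void)
          {
              printf("filter: xfeatures & %#x\n", XFEATURE_MASK_PKRU);   /* prints 0x200 */
              return 0;
          }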
  12. 03 Jun, 2016 3 commits
    • KVM: x86: protect KVM_CREATE_PIT/KVM_CREATE_PIT2 with kvm->lock · 250715a6
      Paolo Bonzini authored
      The syzkaller folks reported a NULL pointer dereference that seems
      to be caused by a race between KVM_CREATE_IRQCHIP and KVM_CREATE_PIT2.
      The former takes kvm->lock (except when registering the devices,
      which needs kvm->slots_lock); the latter takes only kvm->slots_lock.
      Change KVM_CREATE_PIT2 to follow the same model as KVM_CREATE_IRQCHIP.
      
      Testcase:
      
          #include <pthread.h>
          #include <linux/kvm.h>
          #include <fcntl.h>
          #include <sys/ioctl.h>
          #include <stdint.h>
          #include <string.h>
          #include <stdlib.h>
          #include <sys/syscall.h>
          #include <unistd.h>
      
          long r[23];
      
          void* thr1(void* arg)
          {
              struct kvm_pit_config pitcfg = { .flags = 4 };
              switch ((long)arg) {
              case 0: r[2]  = open("/dev/kvm", O_RDONLY|O_ASYNC);    break;
              case 1: r[3]  = ioctl(r[2], KVM_CREATE_VM, 0);         break;
              case 2: r[4]  = ioctl(r[3], KVM_CREATE_IRQCHIP, 0);    break;
              case 3: r[22] = ioctl(r[3], KVM_CREATE_PIT2, &pitcfg); break;
              }
              return 0;
          }
      
          int main(int argc, char **argv)
          {
              long i;
              pthread_t th[4];
      
              memset(r, -1, sizeof(r));
              for (i = 0; i < 4; i++) {
                  pthread_create(&th[i], 0, thr1, (void*)i);
                  if (argc > 1 && rand()%2) usleep(rand()%1000);
              }
              usleep(20000);
              return 0;
          }
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: x86: rename process_smi to enter_smm, process_smi_request to process_smi · ee2cd4b7
      Paolo Bonzini authored
      Make the function names more similar between KVM_REQ_NMI and KVM_REQ_SMI.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: x86: avoid simultaneous queueing of both IRQ and SMI · c43203ca
      Paolo Bonzini authored
      If the processor exits to KVM while delivering an interrupt,
      the hypervisor then requeues the interrupt for the next vmentry.
      Trying to enter SMM in this same window causes the vCPU to enter
      non-root mode in emulated SMM (i.e. with IF=0) and with a request to
      inject an IRQ (i.e. with a valid VM-entry interrupt info field).
      This is invalid guest state (SDM 26.3.1.4 "Checks on Guest RIP
      and RFLAGS") and the processor fails vmentry.
      
      The fix is to defer the injection from KVM_REQ_SMI to KVM_REQ_EVENT,
      like we already do for e.g. NMIs (a sketch of the deferral follows this
      entry).  This patch doesn't change the name of the process_smi function,
      so that it can be applied to stable releases.  The next patch will rename
      things so that process_nmi and process_smi handle KVM_REQ_NMI and
      KVM_REQ_SMI respectively.
      
      This is especially common with Windows, probably due to the
      self-IPI trick that it uses to deliver deferred procedure
      calls (DPCs).
      Reported-by: Laszlo Ersek <lersek@redhat.com>
      Reported-by: Michał Zegan <webczat_200@poczta.onet.pl>
      Fixes: 64d60670
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
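      
      A sketch of the deferral described above; the structure and helper names
      are illustrative, not the actual inject_pending_event() code.
      
          #include <stdbool.h>
      
          struct vcpu_events_sketch {
              bool smi_pending;              /* set by the KVM_REQ_SMI path */
              bool irq_reinjection_pending;  /* IRQ interrupted by the vmexit */
          };
      
          static void enter_smm(struct vcpu_events_sketch *v)    { v->smi_pending = false; }
          static void reinject_irq(struct vcpu_events_sketch *v) { v->irq_reinjection_pending = false; }
      
          /* Runs from the KVM_REQ_EVENT path, just before vmentry. */
          static void inject_pending_event_sketch(struct vcpu_events_sketch *v)
          {
              if (v->irq_reinjection_pending) {
                  /* Finish re-injecting the interrupted IRQ first... */
                  reinject_irq(v);
              } else if (v->smi_pending) {
                  /* ...and only enter SMM on a later vmentry, never while a
                   * valid VM-entry interrupt-info field is still outstanding. */
                  enter_smm(v);
              }
          }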
  13. 02 Jun, 2016 4 commits
    • KVM: x86: fix OOPS after invalid KVM_SET_DEBUGREGS · d14bdb55
      Paolo Bonzini authored
      MOV to DR6 or DR7 causes a #GP if an attempt is made to write a 1 to
      any of bits 63:32.  However, this is not detected at KVM_SET_DEBUGREGS
      time, and the next KVM_RUN oopses (a sketch of the missing validation
      follows this entry):
      
         general protection fault: 0000 [#1] SMP
         CPU: 2 PID: 14987 Comm: a.out Not tainted 4.4.9-300.fc23.x86_64 #1
         Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012
         [...]
         Call Trace:
          [<ffffffffa072c93d>] kvm_arch_vcpu_ioctl_run+0x141d/0x14e0 [kvm]
          [<ffffffffa071405d>] kvm_vcpu_ioctl+0x33d/0x620 [kvm]
          [<ffffffff81241648>] do_vfs_ioctl+0x298/0x480
          [<ffffffff812418a9>] SyS_ioctl+0x79/0x90
          [<ffffffff817a0f2e>] entry_SYSCALL_64_fastpath+0x12/0x71
         Code: 55 83 ff 07 48 89 e5 77 27 89 ff ff 24 fd 90 87 80 81 0f 23 fe 5d c3 0f 23 c6 5d c3 0f 23 ce 5d c3 0f 23 d6 5d c3 0f 23 de 5d c3 <0f> 23 f6 5d c3 0f 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
         RIP  [<ffffffff810639eb>] native_set_debugreg+0x2b/0x40
          RSP <ffff88005836bd50>
      
      Testcase (beautified/reduced from syzkaller output):
      
          #include <unistd.h>
          #include <sys/syscall.h>
          #include <string.h>
          #include <stdint.h>
          #include <linux/kvm.h>
          #include <fcntl.h>
          #include <sys/ioctl.h>
      
          long r[8];
      
          int main()
          {
              struct kvm_debugregs dr = { 0 };
      
              r[2] = open("/dev/kvm", O_RDONLY);
              r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
              r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7);
      
              memcpy(&dr,
                     "\x5d\x6a\x6b\xe8\x57\x3b\x4b\x7e\xcf\x0d\xa1\x72"
                     "\xa3\x4a\x29\x0c\xfc\x6d\x44\x00\xa7\x52\xc7\xd8"
                     "\x00\xdb\x89\x9d\x78\xb5\x54\x6b\x6b\x13\x1c\xe9"
                     "\x5e\xd3\x0e\x40\x6f\xb4\x66\xf7\x5b\xe3\x36\xcb",
                     48);
              r[7] = ioctl(r[4], KVM_SET_DEBUGREGS, &dr);
              r[6] = ioctl(r[4], KVM_RUN, 0);
          }
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
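      
      A sketch of the missing validation implied by the fix; illustrative, not
      the exact upstream diff.
      
          #include <errno.h>
          #include <stdint.h>
      
          struct debugregs_sketch {
              uint64_t db[4];
              uint64_t dr6;
              uint64_t dr7;
          };
      
          /* MOV to DR6/DR7 #GPs if bits 63:32 are set, so KVM_SET_DEBUGREGS should
           * reject such values instead of letting native_set_debugreg() fault later. */
          static int set_debugregs_sketch(const struct debugregs_sketch *d)
          {
              if (d->dr6 >> 32 || d->dr7 >> 32)
                  return -EINVAL;   /* reserved upper bits must be zero */
              /* ... otherwise copy into the vCPU state as before ... */
              return 0;
          }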
    • KVM: fail KVM_SET_VCPU_EVENTS with invalid exception number · 78e546c8
      Paolo Bonzini authored
      An invalid exception number cannot be returned by KVM_GET_VCPU_EVENTS,
      so it is okay for KVM_SET_VCPU_EVENTS to reject it with EINVAL (a sketch
      of the check follows this entry).  Left unchecked, it causes a WARN from
      exception_type:
      
          WARNING: CPU: 3 PID: 16732 at arch/x86/kvm/x86.c:345 exception_type+0x49/0x50 [kvm]()
          CPU: 3 PID: 16732 Comm: a.out Tainted: G        W       4.4.6-300.fc23.x86_64 #1
          Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012
           0000000000000286 000000006308a48b ffff8800bec7fcf8 ffffffff813b542e
           0000000000000000 ffffffffa0966496 ffff8800bec7fd30 ffffffff810a40f2
           ffff8800552a8000 0000000000000000 00000000002c267c 0000000000000001
          Call Trace:
           [<ffffffff813b542e>] dump_stack+0x63/0x85
           [<ffffffff810a40f2>] warn_slowpath_common+0x82/0xc0
           [<ffffffff810a423a>] warn_slowpath_null+0x1a/0x20
           [<ffffffffa0924809>] exception_type+0x49/0x50 [kvm]
           [<ffffffffa0934622>] kvm_arch_vcpu_ioctl_run+0x10a2/0x14e0 [kvm]
           [<ffffffffa091c04d>] kvm_vcpu_ioctl+0x33d/0x620 [kvm]
           [<ffffffff81241248>] do_vfs_ioctl+0x298/0x480
           [<ffffffff812414a9>] SyS_ioctl+0x79/0x90
           [<ffffffff817a04ee>] entry_SYSCALL_64_fastpath+0x12/0x71
          ---[ end trace b1a0391266848f50 ]---
      
      Testcase (beautified/reduced from syzkaller output):
      
          #include <unistd.h>
          #include <sys/syscall.h>
          #include <string.h>
          #include <stdint.h>
          #include <fcntl.h>
          #include <sys/ioctl.h>
          #include <linux/kvm.h>
      
          long r[31];
      
          int main()
          {
              memset(r, -1, sizeof(r));
              r[2] = open("/dev/kvm", O_RDONLY);
              r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
              r[7] = ioctl(r[3], KVM_CREATE_VCPU, 0);
      
              struct kvm_vcpu_events ve = {
                      .exception.injected = 1,
                      .exception.nr = 0xd4
              };
              r[27] = ioctl(r[7], KVM_SET_VCPU_EVENTS, &ve);
              r[30] = ioctl(r[7], KVM_RUN, 0);
              return 0;
          }
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
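      
      A sketch of the added check; the exact set of rejected vectors is an
      assumption based on the commit description.
      
          #include <errno.h>
          #include <stdbool.h>
          #include <stdint.h>
      
          #define NMI_VECTOR 2   /* NMIs are reported via a separate field */
      
          struct vcpu_exception_sketch {
              bool    injected;
              uint8_t nr;
          };
      
          static int set_vcpu_events_sketch(const struct vcpu_exception_sketch *e)
          {
              /* Only architectural exception vectors 0-31 make sense here. */
              if (e->injected && (e->nr > 31 || e->nr == NMI_VECTOR))
                  return -EINVAL;
              /* ... otherwise accept the events as before ... */
              return 0;
          }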
    • kvm: x86: avoid warning on repeated KVM_SET_TSS_ADDR · b21629da
      Paolo Bonzini authored
      Found by syzkaller:
      
          WARNING: CPU: 3 PID: 15175 at arch/x86/kvm/x86.c:7705 __x86_set_memory_region+0x1dc/0x1f0 [kvm]()
          CPU: 3 PID: 15175 Comm: a.out Tainted: G        W       4.4.6-300.fc23.x86_64 #1
          Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012
           0000000000000286 00000000950899a7 ffff88011ab3fbf0 ffffffff813b542e
           0000000000000000 ffffffffa0966496 ffff88011ab3fc28 ffffffff810a40f2
           00000000000001fd 0000000000003000 ffff88014fc50000 0000000000000000
          Call Trace:
           [<ffffffff813b542e>] dump_stack+0x63/0x85
           [<ffffffff810a40f2>] warn_slowpath_common+0x82/0xc0
           [<ffffffff810a423a>] warn_slowpath_null+0x1a/0x20
           [<ffffffffa09251cc>] __x86_set_memory_region+0x1dc/0x1f0 [kvm]
           [<ffffffffa092521b>] x86_set_memory_region+0x3b/0x60 [kvm]
           [<ffffffffa09bb61c>] vmx_set_tss_addr+0x3c/0x150 [kvm_intel]
           [<ffffffffa092f4d4>] kvm_arch_vm_ioctl+0x654/0xbc0 [kvm]
           [<ffffffffa091d31a>] kvm_vm_ioctl+0x9a/0x6f0 [kvm]
           [<ffffffff81241248>] do_vfs_ioctl+0x298/0x480
           [<ffffffff812414a9>] SyS_ioctl+0x79/0x90
           [<ffffffff817a04ee>] entry_SYSCALL_64_fastpath+0x12/0x71
      
      Testcase:
      
          #include <unistd.h>
          #include <sys/ioctl.h>
          #include <fcntl.h>
          #include <string.h>
          #include <linux/kvm.h>
      
          long r[8];
      
          int main()
          {
              memset(r, -1, sizeof(r));
      	r[2] = open("/dev/kvm", O_RDONLY|O_TRUNC);
              r[3] = ioctl(r[2], KVM_CREATE_VM, 0x0ul);
              r[5] = ioctl(r[3], KVM_SET_TSS_ADDR, 0x20000000ul);
              r[7] = ioctl(r[3], KVM_SET_TSS_ADDR, 0x20000000ul);
              return 0;
          }
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: Handle MSR_IA32_PERF_CTL · 0c2df2a1
      Dmitry Bilunov authored
      Intel CPUs having Turbo Boost feature implement an MSR to provide a
      control interface via rdmsr/wrmsr instructions. One could detect the
      presence of this feature by issuing one of these instructions and
      handling the #GP exception which is generated in case the referenced MSR
      is not implemented by the CPU.
      
      KVM's vCPU model behaves exactly like a real CPU in this case, injecting
      a fault when MSR_IA32_PERF_CTL (which KVM does not implement) is accessed.
      However, some operating systems use this register during an early boot
      stage in which their kernel is not capable of handling #GP correctly,
      causing a double fault (#DF) and finally a triple fault, effectively
      resetting the vCPU.
      
      This patch implements a dummy handler for MSR_IA32_PERF_CTL to avoid the
      crashes (sketched after this entry).
      Signed-off-by: Dmitry Bilunov <kmeaw@yandex-team.ru>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
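      
      A sketch of a "dummy" read handler of the kind described above; the MSR
      number is per the SDM (IA32_PERF_CTL = 0x199), while the function shape is
      illustrative.
      
          #include <stdint.h>
      
          #define MSR_IA32_PERF_CTL 0x199
      
          /* Returns 0 if the read was emulated, -1 if the caller should inject #GP. */
          static int emulate_rdmsr_sketch(uint32_t msr, uint64_t *data)
          {
              switch (msr) {
              case MSR_IA32_PERF_CTL:
                  *data = 0;     /* pretend the MSR exists and reads as zero */
                  return 0;
              default:
                  return -1;     /* unknown MSR: #GP as before */
              }
          }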
  14. 18 May, 2016 2 commits
  15. 13 May, 2016 1 commit
    • KVM: halt_polling: provide a way to qualify wakeups during poll · 3491caf2
      Christian Borntraeger authored
      Some wakeups should not be considered a successful poll.  For example, on
      s390 I/O interrupts are usually floating, which means that _ALL_ CPUs
      would be considered runnable - letting all vCPUs poll all the time for
      transaction-like workloads, even if one vCPU would be enough.
      This can result in huge CPU usage for large guests.
      This patch lets architectures provide a way to qualify wakeups as good
      or bad with regard to polling.
      
      For s390 the implementation will fence off halt polling for anything but
      known-good, single-vCPU events.  The s390 implementation for floating
      interrupts does a wakeup for one vCPU, but the interrupt will be delivered
      by whatever CPU checks first for a pending interrupt.  We give preference
      to the woken-up CPU by marking its poll as a "good" poll.
      This code will also mark several other wakeup reasons like IPIs or
      expired timers as "good".  This will of course also mark some events as
      not successful.  Since KVM on z always runs as a 2nd-level hypervisor,
      we prefer not to poll unless we are really sure, though.
      
      This patch successfully limits the CPU usage for cases like a uperf 1-byte
      transactional ping-pong workload or wakeup-heavy workloads like OLTP,
      while still providing a proper speedup.
      
      It also introduces a new vcpu stat, "halt_poll_no_tuning", that counts
      wakeups considered not good for polling.
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Radim Krčmář <rkrcmar@redhat.com> (for an earlier version)
      Cc: David Matlack <dmatlack@google.com>
      Cc: Wanpeng Li <kernellwp@gmail.com>
      [Rename config symbol. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  16. 11 May, 2016 1 commit
  17. 03 May, 2016 1 commit
  18. 20 Apr, 2016 1 commit
  19. 13 Apr, 2016 2 commits
  20. 10 Apr, 2016 1 commit
    • kvm: x86: do not leak guest xcr0 into host interrupt handlers · fc5b7f3b
      David Matlack authored
      An interrupt handler that uses the fpu can kill a KVM VM, if it runs
      under the following conditions:
       - the guest's xcr0 register is loaded on the cpu
       - the guest's fpu context is not loaded
       - the host is using eagerfpu
      
      Note that the guest's xcr0 register and fpu context are not loaded as
      part of the atomic world switch into "guest mode". They are loaded by
      KVM while the cpu is still in "host mode".
      
      Usage of the fpu in interrupt context is gated by irq_fpu_usable(). The
      interrupt handler will look something like this:
      
      if (irq_fpu_usable()) {
              kernel_fpu_begin();
      
              [... code that uses the fpu ...]
      
              kernel_fpu_end();
      }
      
      As long as the guest's fpu is not loaded and the host is using eager
      fpu, irq_fpu_usable() returns true (interrupted_kernel_fpu_idle()
      returns true). The interrupt handler proceeds to use the fpu with
      the guest's xcr0 live.
      
      kernel_fpu_begin() saves the current fpu context. If this uses
      XSAVE[OPT], it may leave the xsave area in an undesirable state.
      According to the SDM, during XSAVE bit i of XSTATE_BV is not modified
      if bit i is 0 in xcr0. So it's possible that XSTATE_BV[i] == 1 and
      xcr0[i] == 0 following an XSAVE.
      
      kernel_fpu_end() restores the fpu context. Now if any bit i in
      XSTATE_BV == 1 while xcr0[i] == 0, XRSTOR generates a #GP. The
      fault is trapped and SIGSEGV is delivered to the current process.
      
      Only pre-4.2 kernels appear to be vulnerable to this sequence of
      events. Commit 653f52c3 ("kvm,x86: load guest FPU context more eagerly")
      from 4.2 forces the guest's fpu to always be loaded on eagerfpu hosts.
      
      This patch fixes the bug by keeping the host's xcr0 loaded outside of the
      interrupts-disabled region where KVM switches into guest mode (sketched
      after this entry).
      
      Cc: stable@vger.kernel.org
      Suggested-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: David Matlack <dmatlack@google.com>
      [Move load after goto cancel_injection. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
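      
      A sketch of the ordering the fix establishes in vcpu_enter_guest(); the
      stubs stand in for the real KVM/arch helpers and are assumptions.
      
          struct vcpu_sketch { int dummy; };
      
          static void local_irq_disable_stub(void) {}
          static void local_irq_enable_stub(void)  {}
          static void load_guest_xcr0(struct vcpu_sketch *v) { (void)v; }   /* set guest xcr0 */
          static void put_guest_xcr0(struct vcpu_sketch *v)  { (void)v; }   /* restore host xcr0 */
      
          static void vcpu_enter_guest_sketch(struct vcpu_sketch *vcpu)
          {
              /* ... request handling; may still bail out via cancel_injection ... */
      
              local_irq_disable_stub();
      
              /* Guest xcr0 is loaded only while interrupts are off, so no interrupt
               * handler can run kernel_fpu_begin()/XSAVE with the guest's xcr0 live. */
              load_guest_xcr0(vcpu);
      
              /* ... switch to guest mode, run, handle the exit ... */
      
              put_guest_xcr0(vcpu);   /* host xcr0 is back before IRQs are re-enabled */
      
              local_irq_enable_stub();
          }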
  21. 01 Apr, 2016 1 commit
    • KVM: x86: Inject pending interrupt even if a pending NMI exists · 321c5658
      Yuki Shibuya authored
      Non-maskable interrupts (NMIs) are preferred to interrupts in the current
      implementation.  If an NMI is pending but NMI injection is blocked (per
      nmi_allowed()), a pending interrupt is not injected and
      enable_irq_window() is not executed, even if interrupt injection is
      allowed.
      
      In old kernels (e.g. 2.6.32), schedule() is often called in NMI context.
      In this case, interrupts are needed to execute the iret that ends the
      NMI.  The flag blocking new NMIs is not cleared until the guest executes
      that iret, and interrupts are blocked by the pending NMI.  Because of
      this, the iret can't be invoked in the guest, and the guest is starved
      until the block is cleared by some event (e.g. canceling injection).
      
      This patch injects pending interrupts, when allowed, even if an NMI is
      blocked.  Also, if an interrupt is still pending after executing
      inject_pending_event(), enable_irq_window() is executed regardless of the
      NMI pending counter (see the sketch after this entry).
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Yuki Shibuya <shibuya.yk@ncos.nec.co.jp>
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
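      
      A sketch of the injection policy after this patch; names mirror the commit
      text but the structure is illustrative.
      
          #include <stdbool.h>
      
          struct pending_events_sketch {
              int  nmi_pending;
              bool irq_pending;
              bool nmi_allowed;
              bool irq_allowed;
          };
      
          static void inject_nmi(struct pending_events_sketch *e) { e->nmi_pending--; }
          static void inject_irq(struct pending_events_sketch *e) { e->irq_pending = false; }
          static void enable_irq_window(void) { /* request an IRQ-window vmexit */ }
      
          static void inject_pending_event_sketch(struct pending_events_sketch *e)
          {
              if (e->nmi_pending && e->nmi_allowed)
                  inject_nmi(e);
              else if (e->irq_pending && e->irq_allowed)
                  inject_irq(e);              /* no longer starved by a blocked NMI */
      
              /* Open the IRQ window whenever an interrupt is still pending,
               * regardless of the NMI pending counter. */
              if (e->irq_pending)
                  enable_irq_window();
          }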
  22. 22 Mar, 2016 2 commits