1. 30 Nov, 2012 3 commits
    • KVM: x86: Emulate IA32_TSC_ADJUST MSR · ba904635
      Will Auld authored
      CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported
      
      Basic design is to emulate the MSR by allowing reads and writes to a guest
      vcpu specific location to store the value of the emulated MSR while adding
      the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will
      be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This
      is of course as long as the "use TSC counter offsetting" VM-execution control
      is enabled as well as the IA32_TSC_ADJUST control.
      
      However, because hardware will only return the TSC + IA32_TSC_ADJUST +
      vmcs tsc_offset for a guest process when it does an rdtsc (with the correct
      settings), the value of our virtualized IA32_TSC_ADJUST must be stored in one
      of these three locations. The argument against storing it in the actual MSR
      is performance. This is likely to be seldom used while the save/restore is
      required on every transition. IA32_TSC_ADJUST was created as a way to solve
      some issues with writing TSC itself so that is not an option either.
      
      The remaining option, defined above as our solution, has the problem of
      returning incorrect vmcs tsc_offset values (unless we intercept and fix, not
      done here) as mentioned above. However, more problematic is that storing the
      data in vmcs tsc_offset will have a different semantic effect on the system
      than does using the actual MSR. This is illustrated in the following example:
      
      The hypervisor sets the IA32_TSC_ADJUST, then the guest sets it and a guest
      process performs a rdtsc. In this case the guest process will get
      TSC + IA32_TSC_ADJUST_hypervisor + vmcs tsc_offset including
      IA32_TSC_ADJUST_guest. While the total system semantics changed, the semantics
      as seen by the guest do not and hence this will not cause a problem.
      Signed-off-by: Will Auld <will.auld@intel.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
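The design described in this commit message can be pictured with a small Python model. This is only a sketch under stated assumptions, not the KVM code: `VcpuModel`, `write_tsc_adjust`, `read_tsc_adjust` and `guest_rdtsc` are hypothetical names, and folding only the difference between the new and old MSR value into the offset is an assumption made so that repeated writes do not accumulate.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1  # MSRs and the TSC are 64-bit and wrap around


@dataclass
class VcpuModel:
    """Hypothetical stand-in for per-vcpu state; not a KVM structure."""
    tsc_offset: int = 0       # models the vmcs tsc_offset field
    ia32_tsc_adjust: int = 0  # vcpu-specific copy of the emulated MSR


def write_tsc_adjust(vcpu: VcpuModel, value: int) -> None:
    """Emulated write: store the value and fold the change into tsc_offset."""
    # Assumption: only the delta against the previous value is applied.
    delta = (value - vcpu.ia32_tsc_adjust) & MASK64
    vcpu.tsc_offset = (vcpu.tsc_offset + delta) & MASK64
    vcpu.ia32_tsc_adjust = value & MASK64


def read_tsc_adjust(vcpu: VcpuModel) -> int:
    """Emulated read: return the stored copy, not the hardware MSR."""
    return vcpu.ia32_tsc_adjust


def guest_rdtsc(host_tsc: int, vcpu: VcpuModel) -> int:
    """With TSC offsetting enabled, the guest sees host TSC plus tsc_offset."""
    return (host_tsc + vcpu.tsc_offset) & MASK64
```

In this model the guest never touches the real MSR, so nothing has to be saved or restored on transitions; the cost is that the tsc_offset field no longer holds only the hypervisor's offset.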
    • KVM: x86: Add code to track call origin for msr assignment · 8fe8ab46
      Will Auld authored
      In order to track who initiated the call (host or guest) to modify an msr
      value I have changed function call parameters along the call path. The
      specific change is to add a struct pointer parameter that points to (index,
      data, caller) information rather than having this information passed as
      individual parameters.
      
      The initial use for this capability is for updating the IA32_TSC_ADJUST msr
      while setting the tsc value. It is anticipated that this capability is
      useful for other tasks.
      Signed-off-by: Will Auld <will.auld@intel.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
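A minimal Python sketch of the idea of bundling (index, data, caller) into one object follows. The names `MsrWrite` and `set_msr` are hypothetical, and the policy shown (a guest write to the TSC moves IA32_TSC_ADJUST by the same delta, a host write does not) is an assumption chosen to illustrate why the caller matters; the commit message itself does not spell out the rule.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1
MSR_IA32_TSC = 0x10         # architectural index of the TSC MSR
MSR_IA32_TSC_ADJUST = 0x3B  # index quoted in the IA32_TSC_ADJUST commit


@dataclass
class MsrWrite:
    """Hypothetical bundle replacing three separate parameters."""
    index: int
    data: int
    host_initiated: bool  # True when the host, not the guest, made the call


def set_msr(regs: dict, msr: MsrWrite) -> None:
    """Apply one MSR write to a dict that models the register file."""
    if msr.index == MSR_IA32_TSC and not msr.host_initiated:
        # Assumed policy: guest TSC writes shift TSC_ADJUST by the delta.
        delta = (msr.data - regs.get(MSR_IA32_TSC, 0)) & MASK64
        adjust = regs.get(MSR_IA32_TSC_ADJUST, 0)
        regs[MSR_IA32_TSC_ADJUST] = (adjust + delta) & MASK64
    regs[msr.index] = msr.data & MASK64
```

Passing one object along the call path means a later field can be added without touching every function signature again.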
    • KVM: Fix user memslot overlap check · 5419369e
      Alex Williamson authored
      Prior to memory slot sorting this loop compared all of the user memory
      slots for overlap with new entries.  With memory slot sorting, we're
      just checking some number of entries in the array that may or may not
      be user slots.  Instead, walk all the slots with kvm_for_each_memslot,
      which has the added benefit of terminating early when we hit the first
      empty slot, and skip comparison to private slots.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
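The walk described above can be sketched in Python roughly as follows. This is an illustration, not the kernel loop: `Slot`, `overlaps_user_slot` and the `USER_SLOT_COUNT` boundary are hypothetical, and skipping the slot with the same id as the new entry is an added assumption (so that a slot can be replaced by itself).

```python
from dataclasses import dataclass
from typing import List

USER_SLOT_COUNT = 32  # assumed boundary: ids at or above this are private


@dataclass
class Slot:
    id: int
    base_gfn: int  # first guest frame number covered by the slot
    npages: int    # 0 means the slot is empty


def overlaps_user_slot(slots: List[Slot], new: Slot) -> bool:
    """Return True if `new` overlaps any existing user slot."""
    for slot in slots:
        if slot.npages == 0:
            break  # sorted array: the first empty slot ends the walk
        if slot.id >= USER_SLOT_COUNT or slot.id == new.id:
            continue  # private slots and the slot being replaced are skipped
        new_end = new.base_gfn + new.npages
        slot_end = slot.base_gfn + slot.npages
        if new.base_gfn < slot_end and slot.base_gfn < new_end:
            return True
    return False
```

The early `break` relies on the array being sorted so that empty slots come last, which is the property the commit message attributes to memory slot sorting.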
  2. 29 Nov, 2012 2 commits
  3. 28 Nov, 2012 19 commits
  4. 14 Nov, 2012 3 commits
  5. 01 Nov, 2012 2 commits
    • Merge branch 'for-queue' of https://github.com/agraf/linux-2.6 into queue · f026399f
      Marcelo Tosatti authored
      * 'for-queue' of https://github.com/agraf/linux-2.6:
        PPC: ePAPR: Convert hcall header to uapi (round 2)
        KVM: PPC: Book3S HV: Fix thinko in try_lock_hpte()
        KVM: PPC: Book3S HV: Allow DTL to be set to address 0, length 0
        KVM: PPC: Book3S HV: Fix accounting of stolen time
        KVM: PPC: Book3S HV: Run virtual core whenever any vcpus in it can run
        KVM: PPC: Book3S HV: Fixes for late-joining threads
        KVM: PPC: Book3s HV: Don't access runnable threads list without vcore lock
        KVM: PPC: Book3S HV: Fix some races in starting secondary threads
        KVM: PPC: Book3S HV: Allow KVM guests to stop secondary threads coming online
        PPC: ePAPR: Convert header to uapi
        KVM: PPC: Move mtspr/mfspr emulation into own functions
        KVM: Documentation: Fix reentry-to-be-consistent paragraph
        KVM: PPC: 44x: fix DCR read/write
    • KVM: SVM: update MAINTAINERS entry · 7de609c8
      Joerg Roedel authored
      I have no access to my AMD email address anymore. Update
      entry in MAINTAINERS to the new address.
      
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Joerg Roedel <joro@8bytes.org>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
  6. 31 Oct, 2012 2 commits
  7. 30 Oct, 2012 9 commits
    • KVM: PPC: Book3S HV: Fix thinko in try_lock_hpte() · 8b5869ad
      Paul Mackerras authored
      This fixes an error in the inline asm in try_lock_hpte() where we
      were erroneously using a register number as an immediate operand.
      The bug only affects an error path, and in fact the code will still
      work as long as the compiler chooses some register other than r0
      for the "bits" variable.  Nevertheless it should still be fixed.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Allow DTL to be set to address 0, length 0 · 9f8c8c78
      Paul Mackerras authored
      Commit 55b665b0 ("KVM: PPC: Book3S HV: Provide a way for userspace
      to get/set per-vCPU areas") includes a check on the length of the
      dispatch trace log (DTL) to make sure the buffer is at least one entry
      long.  This is appropriate when registering a buffer, but the
      interface also allows for any existing buffer to be unregistered by
      specifying a zero address.  In this case the length check is not
      appropriate.  This makes the check conditional on the address being
      non-zero.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
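The conditional check can be illustrated with a few lines of Python. This is a sketch only: `validate_dtl` is a hypothetical name, and the 48-byte entry size is an assumption used to make "at least one entry long" concrete.

```python
DTL_ENTRY_SIZE = 48  # assumed size in bytes of one dispatch trace log entry


def validate_dtl(addr: int, length: int) -> bool:
    """Accept a DTL registration or unregistration request."""
    if addr == 0:
        # A zero address unregisters the buffer, so the length is irrelevant.
        return True
    # Registering a buffer: it must hold at least one entry.
    return length >= DTL_ENTRY_SIZE
```

With this shape, a request of address 0 and length 0 passes, while a non-zero address with a too-short buffer is still rejected.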
    • KVM: PPC: Book3S HV: Fix accounting of stolen time · c7b67670
      Paul Mackerras authored
      Currently the code that accounts stolen time tends to overestimate the
      stolen time, and will sometimes report more stolen time in a DTL
      (dispatch trace log) entry than has elapsed since the last DTL entry.
      This can cause guests to underflow the user or system time measured
      for some tasks, leading to ridiculous CPU percentages and total runtimes
      being reported by top and other utilities.
      
      In addition, the current code was designed for the previous policy where
      a vcore would only run when all the vcpus in it were runnable, and so
      only counted stolen time on a per-vcore basis.  Now that a vcore can
      run while some of the vcpus in it are doing other things in the kernel
      (e.g. handling a page fault), we need to count the time when a vcpu task
      is preempted while it is not running as part of a vcore as stolen also.
      
      To do this, we bring back the BUSY_IN_HOST vcpu state and extend the
      vcpu_load/put functions to count preemption time while the vcpu is
      in that state.  Handling the transitions between the RUNNING and
      BUSY_IN_HOST states requires checking and updating two variables
      (accumulated time stolen and time last preempted), so we add a new
      spinlock, vcpu->arch.tbacct_lock.  This protects both the per-vcpu
      stolen/preempt-time variables, and the per-vcore variables while this
      vcpu is running the vcore.
      
      Finally, we now don't count time spent in userspace as stolen time.
      The task could be executing in userspace on behalf of the vcpu, or
      it could be preempted, or the vcpu could be genuinely stopped.  Since
      we have no way of dividing up the time between these cases, we don't
      count any of it as stolen.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
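The per-vcpu part of this accounting can be modelled in Python as below. It is a simplified sketch, not the kernel implementation: `VcpuAcct`, `vcpu_put` and `vcpu_load` are hypothetical stand-ins, time is a plain integer, and the per-vcore variables mentioned in the commit message are left out.

```python
import threading
from dataclasses import dataclass, field
from typing import Optional

RUNNING, BUSY_IN_HOST = "RUNNING", "BUSY_IN_HOST"


@dataclass
class VcpuAcct:
    state: str = BUSY_IN_HOST
    stolen: int = 0                      # accumulated time stolen
    preempted_at: Optional[int] = None   # time last preempted, if any
    # Plays the role the commit message gives to the new spinlock.
    lock: threading.Lock = field(default_factory=threading.Lock)


def vcpu_put(vcpu: VcpuAcct, now: int) -> None:
    """The vcpu task is scheduled out; note the time if BUSY_IN_HOST."""
    with vcpu.lock:
        if vcpu.state == BUSY_IN_HOST:
            vcpu.preempted_at = now


def vcpu_load(vcpu: VcpuAcct, now: int) -> None:
    """The vcpu task is scheduled back in; count the gap as stolen."""
    with vcpu.lock:
        if vcpu.state == BUSY_IN_HOST and vcpu.preempted_at is not None:
            vcpu.stolen += now - vcpu.preempted_at
            vcpu.preempted_at = None
```

Both variables are updated under one lock because a reader that saw a new `stolen` value with a stale `preempted_at` could count the same interval twice.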
    • KVM: PPC: Book3S HV: Run virtual core whenever any vcpus in it can run · 8455d79e
      Paul Mackerras authored
      Currently the Book3S HV code implements a policy on multi-threaded
      processors (i.e. POWER7) that requires all of the active vcpus in a
      virtual core to be ready to run before we run the virtual core.
      However, that causes problems on reset, because reset stops all vcpus
      except vcpu 0, and can also reduce throughput since all four threads
      in a virtual core have to wait whenever any one of them hits a
      hypervisor page fault.
      
      This relaxes the policy, allowing the virtual core to run as soon as
      any vcpu in it is runnable.  With this, the KVMPPC_VCPU_STOPPED state
      and the KVMPPC_VCPU_BUSY_IN_HOST state have been combined into a single
      KVMPPC_VCPU_NOTREADY state, since we no longer need to distinguish
      between them.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
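The difference between the two policies comes down to "all" versus "any", as this small Python sketch shows. The function names are hypothetical, and each list entry simply records whether one active vcpu of the virtual core is runnable.

```python
from typing import List


def can_run_all_ready(runnable: List[bool]) -> bool:
    """Previous policy: every active vcpu in the core must be ready."""
    return all(runnable)


def can_run_any_ready(runnable: List[bool]) -> bool:
    """Relaxed policy: one runnable vcpu is enough to run the core."""
    return any(runnable)
```

For the reset case from the commit message, where only vcpu 0 is left running, the input is `[True, False, False, False]`: the old policy refuses to run the core, the relaxed one allows it.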
    • KVM: PPC: Book3S HV: Fixes for late-joining threads · 2f12f034
      Paul Mackerras authored
      If a thread in a virtual core becomes runnable while other threads
      in the same virtual core are already running in the guest, it is
      possible for the latecomer to join the others on the core without
      first pulling them all out of the guest.  Currently this only happens
      rarely, when a vcpu is first started.  This fixes some bugs and
      omissions in the code in this case.
      
      First, we need to check for VPA updates for the latecomer and make
      a DTL entry for it.  Secondly, if it comes along while the master
      vcpu is doing a VPA update, we don't need to do anything since the
      master will pick it up in kvmppc_run_core.  To handle this correctly
      we introduce a new vcore state, VCORE_STARTING.  Thirdly, there is
      a race because we currently clear the hardware thread's hwthread_req
      before waiting to see it get to nap.  A latecomer thread could have
      its hwthread_req cleared before it gets to test it, and therefore
      never increment the nap_count, leading to messages about wait_for_nap
      timeouts.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3s HV: Don't access runnable threads list without vcore lock · 913d3ff9
      Paul Mackerras authored
      There were a few places where we were traversing the list of runnable
      threads in a virtual core, i.e. vc->runnable_threads, without holding
      the vcore spinlock.  This extends the places where we hold the vcore
      spinlock to cover everywhere that we traverse that list.
      
      Since we possibly need to sleep inside kvmppc_book3s_hv_page_fault,
      this moves the call of it from kvmppc_handle_exit out to
      kvmppc_vcpu_run, where we don't hold the vcore lock.
      
      In kvmppc_vcore_blocked, we don't actually need to check whether
      all vcpus are ceded and don't have any pending exceptions, since the
      caller has already done that.  The caller (kvmppc_run_vcpu) wasn't
      actually checking for pending exceptions, so we add that.
      
      The change of if to while in kvmppc_run_vcpu is to make sure that we
      never call kvmppc_remove_runnable() when the vcore state is RUNNING or
      EXITING.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Fix some races in starting secondary threads · 7b444c67
      Paul Mackerras authored
      Subsequent patches implementing in-kernel XICS emulation will make it
      possible for IPIs to arrive at secondary threads at arbitrary times.
      This fixes some races in how we start the secondary threads, which
      if not fixed could lead to occasional crashes of the host kernel.
      
      This makes sure that (a) we have grabbed all the secondary threads,
      and verified that they are no longer in the kernel, before we start
      any thread, (b) that the secondary thread loads its vcpu pointer
      after clearing the IPI that woke it up (so we don't miss a wakeup),
      and (c) that the secondary thread clears its vcpu pointer before
      incrementing the nap count.  It also removes unnecessary setting
      of the vcpu and vcore pointers in the paca in kvmppc_core_vcpu_load.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Allow KVM guests to stop secondary threads coming online · 512691d4
      Paul Mackerras authored
      When a Book3S HV KVM guest is running, we need the host to be in
      single-thread mode, that is, all of the cores (or at least all of
      the cores where the KVM guest could run) to be running only one
      active hardware thread.  This is because of the hardware restriction
      in POWER processors that all of the hardware threads in the core
      must be in the same logical partition.  Complying with this restriction
      is much easier if, from the host kernel's point of view, only one
      hardware thread is active.
      
      This adds two hooks in the SMP hotplug code to allow the KVM code to
      make sure that secondary threads (i.e. hardware threads other than
      thread 0) cannot come online while any KVM guest exists.  The KVM
      code still has to check that any core where it runs a guest has the
      secondary threads offline, but having done that check it can now be
      sure that they will not come online while the guest is running.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
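One way to picture the two hooks is a gate that the hotplug path consults before bringing a thread online. The Python sketch below is an illustration under assumptions: `HotplugGate` and its method names are hypothetical, and four hardware threads per core with cpu numbers laid out core by core is assumed for the thread-0 test.

```python
THREADS_PER_CORE = 4  # assumed SMT width; thread 0 of each core is primary


class HotplugGate:
    """Hypothetical model of hooks that block secondary threads."""

    def __init__(self) -> None:
        self.guests = 0  # number of KVM guests that currently exist

    def guest_created(self) -> None:
        self.guests += 1

    def guest_destroyed(self) -> None:
        self.guests -= 1

    def may_come_online(self, cpu: int) -> bool:
        """Primary threads may always come up; secondaries only without guests."""
        is_secondary = cpu % THREADS_PER_CORE != 0
        return not (is_secondary and self.guests > 0)
```

The gate only stops new secondaries from appearing. As the commit message notes, the KVM code still has to check that the secondaries of a core are already offline before running a guest there.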
    • PPC: ePAPR: Convert header to uapi · c99ec973
      Alexander Graf authored
      The new uapi framework splits kernel internal and user space exported
      bits of header files more cleanly. Adjust the ePAPR header accordingly.
      Signed-off-by: Alexander Graf <agraf@suse.de>