An error occurred fetching the project authors.
  1. 22 Aug, 2016 2 commits
  2. 18 Dec, 2015 1 commit
  3. 15 Jan, 2015 1 commit
    • Paul Mackerras's avatar
      powerpc/powernv: Switch off MMU before entering nap/sleep/rvwinkle mode · e7c25b93
      Paul Mackerras authored
      commit 8117ac6a upstream.
      
      Currently, when going idle, we set the flag indicating that we are in
      nap mode (paca->kvm_hstate.hwthread_state) and then execute the nap
      (or sleep or rvwinkle) instruction, all with the MMU on.  This is bad
      for two reasons: (a) the architecture specifies that those instructions
      must be executed with the MMU off, and in fact with only the SF, HV, ME
      and possibly RI bits set, and (b) this introduces a race, because as
      soon as we set the flag, another thread can switch the MMU to a guest
      context.  If the race is lost, this thread will typically start looping
      on relocation-on ISIs at 0xc...4400.
      
      This fixes it by setting the MSR as required by the architecture before
      setting the flag or executing the nap/sleep/rvwinkle instruction.
      
      [ shreyas@linux.vnet.ibm.com: Edited to handle LE ]
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarShreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      e7c25b93
  4. 30 May, 2014 1 commit
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Work around POWER8 performance monitor bugs · 9bc01a9b
      Paul Mackerras authored
      This adds workarounds for two hardware bugs in the POWER8 performance
      monitor unit (PMU), both related to interrupt generation.  The effect
      of these bugs is that PMU interrupts can get lost, leading to tools
      such as perf reporting fewer counts and samples than they should.
      
      The first bug relates to the PMAO (perf. mon. alert occurred) bit in
      MMCR0; setting it should cause an interrupt, but doesn't.  The other
      bug relates to the PMAE (perf. mon. alert enable) bit in MMCR0.
      Setting PMAE when a counter is negative and counter negative
      conditions are enabled to cause alerts should cause an alert, but
      doesn't.
      
      The workaround for the first bug is to create conditions where a
      counter will overflow, whenever we are about to restore a MMCR0
      value that has PMAO set (and PMAO_SYNC clear).  The workaround for
      the second bug is to freeze all counters using MMCR2 before reading
      MMCR0.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      9bc01a9b
  5. 28 May, 2014 1 commit
    • Michael Ellerman's avatar
      powerpc/powernv: Add support for POWER8 split core on powernv · e2186023
      Michael Ellerman authored
      Upcoming POWER8 chips support a concept called split core. This is where the
      core can be split into subcores that although not full cores, are able to
      appear as full cores to a guest.
      
      The splitting & unsplitting procedure is mildly complicated, and explained at
      length in the comments within the patch.
      
      One notable detail is that when splitting or unsplitting we need to pull
      offline cpus out of their offline state to do work as part of the procedure.
      
      The interface for changing the split mode is via a sysfs file, eg:
      
       $ echo 2 > /sys/devices/system/cpu/subcores_per_core
      
      Currently supported values are '1', '2' and '4'. And indicate respectively that
      the core should be unsplit, split in half, and split in quarters. These modes
      correspond to threads_per_subcore of 8, 4 and 2.
      
      We do not allow changing the split mode while KVM VMs are active. This is to
      prevent the value changing while userspace is configuring the VM, and also to
      prevent the mode being changed in such a way that existing guests are unable to
      be run.
      
      CPU hotplug fixes by Srivatsa.  max_cpus fixes by Mahesh.  cpuset fixes by
      benh.  Fix for irq race by paulus.  The rest by mikey and mpe.
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Signed-off-by: default avatarSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: default avatarMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      e2186023
  6. 28 Apr, 2014 1 commit
    • Michael Neuling's avatar
      powerpc/tm: Add checking to treclaim/trechkpt · 7f06f21d
      Michael Neuling authored
      If we do a treclaim and we are not in TM suspend mode, it results in a TM bad
      thing (ie. a 0x700 program check).  Similarly if we do a trechkpt and we have
      an active transaction or TEXASR Failure Summary (FS) is not set, we also take a
      TM bad thing.
      
      This should never happen, but if it does (ie. a kernel bug), the cause is
      almost impossible to debug as the GPR state is mostly userspace and hence we
      don't get a call chain.
      
      This adds some checks in these cases case a BUG_ON() (in asm) in case we ever
      hit these cases.  It moves the register saving around to preserve r1 till later
      also.
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      7f06f21d
  7. 07 Apr, 2014 1 commit
    • Vaidyanathan Srinivasan's avatar
      cpufreq: powernv: cpufreq driver for powernv platform · b3d627a5
      Vaidyanathan Srinivasan authored
      Backend driver to dynamically set voltage and frequency on
      IBM POWER non-virtualized platforms.  Power management SPRs
      are used to set the required PState.
      
      This driver works in conjunction with cpufreq governors
      like 'ondemand' to provide a demand based frequency and
      voltage setting on IBM POWER non-virtualized platforms.
      
      PState table is obtained from OPAL v3 firmware through device
      tree.
      
      powernv_cpufreq back-end driver would parse the relevant device-tree
      nodes and initialise the cpufreq subsystem on powernv platform.
      
      The code was originally written by svaidy@linux.vnet.ibm.com. Over
      time it was modified to accomodate bug-fixes as well as updates to the
      the cpu-freq core. Relevant portions of the change logs corresponding
      to those modifications are noted below:
      
       * The policy->cpus needs to be populated in a hotplug-invariant
         manner instead of using cpu_sibling_mask() which varies with
         cpu-hotplug. This is because the cpufreq core code copies this
         content into policy->related_cpus mask which should not vary on
         cpu-hotplug. [Authored by srivatsa.bhat@linux.vnet.ibm.com]
      
       * Create a helper routine that can return the cpu-frequency for the
         corresponding pstate_id. Also, cache the values of the pstate_max,
         pstate_min and pstate_nominal and nr_pstates in a static structure
         so that they can be reused in the future to perform any
         validations. [Authored by ego@linux.vnet.ibm.com]
      
       * Create a driver attribute named cpuinfo_nominal_freq which creates
         a sysfs read-only file named cpuinfo_nominal_freq. Export the
         frequency corresponding to the nominal_pstate through this
         interface.
      
           Nominal frequency is the highest non-turbo frequency for the
         platform.  This is generally used for setting governor policies
         from user space for optimal energy efficiency. [Authored by
         ego@linux.vnet.ibm.com]
      
       * Implement a powernv_cpufreq_get(unsigned int cpu) method which will
         return the current operating frequency. Export this via the sysfs
         interface cpuinfo_cur_freq by setting powernv_cpufreq_driver.get to
         powernv_cpufreq_get(). [Authored by ego@linux.vnet.ibm.com]
      
      [Change log updated by ego@linux.vnet.ibm.com]
      Reviewed-by: default avatarPreeti U Murthy <preeti@linux.vnet.ibm.com>
      Signed-off-by: default avatarVaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: default avatarSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: default avatarAnton Blanchard <anton@samba.org>
      Signed-off-by: default avatarGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      b3d627a5
  8. 29 Mar, 2014 1 commit
  9. 23 Mar, 2014 2 commits
  10. 20 Mar, 2014 2 commits
    • Scott Wood's avatar
      powerpc/booke64: Use SPRG7 for VDSO · 9d378dfa
      Scott Wood authored
      Previously SPRG3 was marked for use by both VDSO and critical
      interrupts (though critical interrupts were not fully implemented).
      
      In commit 8b64a9df ("powerpc/booke64:
      Use SPRG0/3 scratch for bolted TLB miss & crit int"), Mihai Caraman
      made an attempt to resolve this conflict by restoring the VDSO value
      early in the critical interrupt, but this has some issues:
      
       - It's incompatible with EXCEPTION_COMMON which restores r13 from the
         by-then-overwritten scratch (this cost me some debugging time).
       - It forces critical exceptions to be a special case handled
         differently from even machine check and debug level exceptions.
       - It didn't occur to me that it was possible to make this work at all
         (by doing a final "ld r13, PACA_EXCRIT+EX_R13(r13)") until after
         I made (most of) this patch. :-)
      
      It might be worth investigating using a load rather than SPRG on return
      from all exceptions (except TLB misses where the scratch never leaves
      the SPRG) -- it could save a few cycles.  Until then, let's stick with
      SPRG for all exceptions.
      
      Since we cannot use SPRG4-7 for scratch without corrupting the state of
      a KVM guest, move VDSO to SPRG7 on book3e.  Since neither SPRG4-7 nor
      critical interrupts exist on book3s, SPRG3 is still used for VDSO
      there.
      Signed-off-by: default avatarScott Wood <scottwood@freescale.com>
      Cc: Mihai Caraman <mihai.caraman@freescale.com>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: kvm-ppc@vger.kernel.org
      9d378dfa
    • Wang Dongsheng's avatar
  11. 27 Jan, 2014 5 commits
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Add support for DABRX register on POWER7 · 8563bf52
      Paul Mackerras authored
      The DABRX (DABR extension) register on POWER7 processors provides finer
      control over which accesses cause a data breakpoint interrupt.  It
      contains 3 bits which indicate whether to enable accesses in user,
      kernel and hypervisor modes respectively to cause data breakpoint
      interrupts, plus one bit that enables both real mode and virtual mode
      accesses to cause interrupts.  Currently, KVM sets DABRX to allow
      both kernel and user accesses to cause interrupts while in the guest.
      
      This adds support for the guest to specify other values for DABRX.
      PAPR defines a H_SET_XDABR hcall to allow the guest to set both DABR
      and DABRX with one call.  This adds a real-mode implementation of
      H_SET_XDABR, which shares most of its code with the existing H_SET_DABR
      implementation.  To support this, we add a per-vcpu field to store the
      DABRX value plus code to get and set it via the ONE_REG interface.
      
      For Linux guests to use this new hcall, userspace needs to add
      "hcall-xdabr" to the set of strings in the /chosen/hypertas-functions
      property in the device tree.  If userspace does this and then migrates
      the guest to a host where the kernel doesn't include this patch, then
      userspace will need to implement H_SET_XDABR by writing the specified
      DABR value to the DABR using the ONE_REG interface.  In that case, the
      old kernel will set DABRX to DABRX_USER | DABRX_KERNEL.  That should
      still work correctly, at least for Linux guests, since Linux guests
      cope with getting data breakpoint interrupts in modes that weren't
      requested by just ignoring the interrupt, and Linux guests never set
      DABRX_BTI.
      
      The other thing this does is to make H_SET_DABR and H_SET_XDABR work
      on POWER8, which has the DAWR and DAWRX instead of DABR/X.  Guests that
      know about POWER8 should use H_SET_MODE rather than H_SET_[X]DABR, but
      guests running in POWER7 compatibility mode will still use H_SET_[X]DABR.
      For them, this adds the logic to convert DABR/X values into DAWR/X values
      on POWER8.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      8563bf52
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Handle new LPCR bits on POWER8 · e0622bd9
      Paul Mackerras authored
      POWER8 has a bit in the LPCR to enable or disable the PURR and SPURR
      registers to count when in the guest.  Set this bit.
      
      POWER8 has a field in the LPCR called AIL (Alternate Interrupt Location)
      which is used to enable relocation-on interrupts.  Allow userspace to
      set this field.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      e0622bd9
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Handle guest using doorbells for IPIs · aa31e843
      Paul Mackerras authored
      * SRR1 wake reason field for system reset interrupt on wakeup from nap
        is now a 4-bit field on P8, compared to 3 bits on P7.
      
      * Set PECEDP in LPCR when napping because of H_CEDE so guest doorbells
        will wake us up.
      
      * Waking up from nap because of a guest doorbell interrupt is not a
        reason to exit the guest.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      aa31e843
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Implement architecture compatibility modes for POWER8 · 5557ae0e
      Paul Mackerras authored
      This allows us to select architecture 2.05 (POWER6) or 2.06 (POWER7)
      compatibility modes on a POWER8 processor.  (Note that transactional
      memory is disabled for usermode if either or both of the PCR_TM_DIS
      and PCR_ARCH_206 bits are set.)
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      5557ae0e
    • Michael Neuling's avatar
      KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs · b005255e
      Michael Neuling authored
      This adds fields to the struct kvm_vcpu_arch to store the new
      guest-accessible SPRs on POWER8, adds code to the get/set_one_reg
      functions to allow userspace to access this state, and adds code to
      the guest entry and exit to context-switch these SPRs between host
      and guest.
      
      Note that DPDES (Directed Privileged Doorbell Exception State) is
      shared between threads on a core; hence we store it in struct
      kvmppc_vcore and have the master thread save and restore it.
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      b005255e
  12. 08 Jan, 2014 1 commit
  13. 22 Nov, 2013 1 commit
  14. 17 Oct, 2013 3 commits
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Support POWER6 compatibility mode on POWER7 · 388cc6e1
      Paul Mackerras authored
      This enables us to use the Processor Compatibility Register (PCR) on
      POWER7 to put the processor into architecture 2.05 compatibility mode
      when running a guest.  In this mode the new instructions and registers
      that were introduced on POWER7 are disabled in user mode.  This
      includes all the VSX facilities plus several other instructions such
      as ldbrx, stdbrx, popcntw, popcntd, etc.
      
      To select this mode, we have a new register accessible through the
      set/get_one_reg interface, called KVM_REG_PPC_ARCH_COMPAT.  Setting
      this to zero gives the full set of capabilities of the processor.
      Setting it to one of the "logical" PVR values defined in PAPR puts
      the vcpu into the compatibility mode for the corresponding
      architecture level.  The supported values are:
      
      0x0f000002	Architecture 2.05 (POWER6)
      0x0f000003	Architecture 2.06 (POWER7)
      0x0f100003	Architecture 2.06+ (POWER7+)
      
      Since the PCR is per-core, the architecture compatibility level and
      the corresponding PCR value are stored in the struct kvmppc_vcore, and
      are therefore shared between all vcpus in a virtual core.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      [agraf: squash in fix to add missing break statements and documentation]
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      388cc6e1
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Store LPCR value for each virtual core · a0144e2a
      Paul Mackerras authored
      This adds the ability to have a separate LPCR (Logical Partitioning
      Control Register) value relating to a guest for each virtual core,
      rather than only having a single value for the whole VM.  This
      corresponds to what real POWER hardware does, where there is a LPCR
      per CPU thread but most of the fields are required to have the same
      value on all active threads in a core.
      
      The per-virtual-core LPCR can be read and written using the
      GET/SET_ONE_REG interface.  Userspace can can only modify the
      following fields of the LPCR value:
      
      DPFD	Default prefetch depth
      ILE	Interrupt little-endian
      TC	Translation control (secondary HPT hash group search disable)
      
      We still maintain a per-VM default LPCR value in kvm->arch.lpcr, which
      contains bits relating to memory management, i.e. the Virtualized
      Partition Memory (VPM) bits and the bits relating to guest real mode.
      When this default value is updated, the update needs to be propagated
      to the per-vcore values, so we add a kvmppc_update_lpcr() helper to do
      that.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      [agraf: fix whitespace]
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      a0144e2a
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Implement timebase offset for guests · 93b0f4dc
      Paul Mackerras authored
      This allows guests to have a different timebase origin from the host.
      This is needed for migration, where a guest can migrate from one host
      to another and the two hosts might have a different timebase origin.
      However, the timebase seen by the guest must not go backwards, and
      should go forwards only by a small amount corresponding to the time
      taken for the migration.
      
      Therefore this provides a new per-vcpu value accessed via the one_reg
      interface using the new KVM_REG_PPC_TB_OFFSET identifier.  This value
      defaults to 0 and is not modified by KVM.  On entering the guest, this
      value is added onto the timebase, and on exiting the guest, it is
      subtracted from the timebase.
      
      This is only supported for recent POWER hardware which has the TBU40
      (timebase upper 40 bits) register.  Writing to the TBU40 register only
      alters the upper 40 bits of the timebase, leaving the lower 24 bits
      unchanged.  This provides a way to modify the timebase for guest
      migration without disturbing the synchronization of the timebase
      registers across CPU cores.  The kernel rounds up the value given
      to a multiple of 2^24.
      
      Timebase values stored in KVM structures (struct kvm_vcpu, struct
      kvmppc_vcore, etc.) are stored as host timebase values.  The timebase
      values in the dispatch trace log need to be guest timebase values,
      however, since that is read directly by the guest.  This moves the
      setting of vcpu->arch.dec_expires on guest exit to a point after we
      have restored the host timebase so that vcpu->arch.dec_expires is a
      host timebase value.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      93b0f4dc
  15. 11 Oct, 2013 1 commit
  16. 05 Sep, 2013 1 commit
  17. 21 Aug, 2013 1 commit
    • Scott Wood's avatar
      powerpc: Convert some mftb/mftbu into mfspr · beb2dc0a
      Scott Wood authored
      Some CPUs (such as e500v1/v2) don't implement mftb and will take a
      trap.  mfspr should work on everything that has a timebase, and is the
      preferred instruction according to ISA v2.06.
      
      Currently we get away with mftb on 85xx because the assembler converts
      it to mfspr due to -Wa,-me500.  However, that flag has other effects
      that are undesireable for certain targets (e.g.  lwsync is converted to
      sync), and is hostile to multiplatform kernels.  Thus we would like to
      stop setting it for all e500-family builds.
      
      mftb/mftbu instances which are in 85xx code or common code are
      converted.  Instances which will never run on 85xx are left alone.
      Signed-off-by: default avatarScott Wood <scottwood@freescale.com>
      beb2dc0a
  18. 20 Aug, 2013 1 commit
    • Scott Wood's avatar
      powerpc/fsl-booke: Work around erratum A-006958 · d52459ca
      Scott Wood authored
      Erratum A-006598 says that 64-bit mftb is not atomic -- it's subject
      to a similar race condition as doing mftbu/mftbl on 32-bit.  The lower
      half of timebase is updated before the upper half; thus, we can share
      the workaround for a similar bug on Cell.  This workaround involves
      looping if the lower half of timebase is zero, thus avoiding the need
      for a scratch register (other than CR0).  This workaround must be
      avoided when the timebase is frozen, such as during the timebase sync
      code.
      
      This deals with kernel and vdso accesses, but other userspace accesses
      will of course need to be fixed elsewhere.
      Signed-off-by: default avatarScott Wood <scottwood@freescale.com>
      d52459ca
  19. 14 Aug, 2013 1 commit
  20. 09 Aug, 2013 1 commit
  21. 24 Jul, 2013 1 commit
  22. 01 Jul, 2013 2 commits
  23. 31 May, 2013 3 commits
  24. 02 May, 2013 3 commits
  25. 26 Apr, 2013 2 commits
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Improve real-mode handling of external interrupts · 4619ac88
      Paul Mackerras authored
      This streamlines our handling of external interrupts that come in
      while we're in the guest.  First, when waking up a hardware thread
      that was napping, we split off the "napping due to H_CEDE" case
      earlier, and use the code that handles an external interrupt (0x500)
      in the guest to handle that too.  Secondly, the code that handles
      those external interrupts now checks if any other thread is exiting
      to the host before bouncing an external interrupt to the guest, and
      also checks that there is actually an external interrupt pending for
      the guest before setting the LPCR MER bit (mediated external request).
      
      This also makes sure that we clear the "ceded" flag when we handle a
      wakeup from cede in real mode, and fixes a potential infinite loop
      in kvmppc_run_vcpu() which can occur if we ever end up with the ceded
      flag set but MSR[EE] off.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      4619ac88
    • Michael Ellerman's avatar
      powerpc/perf: Add support for SIER · 8f61aa32
      Michael Ellerman authored
      On power8 we have a new SIER (Sampled Instruction Event Register), which
      captures information about instructions when we have random sampling
      enabled.
      
      Add support for loading the SIER into pt_regs, overloading regs->dar.
      Also set the new NO_SIPR flag in regs->result if we don't have SIPR.
      
      Update regs_sihv/sipr() to look for SIPR/SIHV in SIER.
      Signed-off-by: default avatarMichael Ellerman <michael@ellerman.id.au>
      Acked-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      8f61aa32