1. 04 Nov, 2015 4 commits
    • Paolo Bonzini's avatar
      Merge tag 'kvm-arm-for-4.4' of... · 197a4f4b
      Paolo Bonzini authored
      Merge tag 'kvm-arm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
      
      KVM/ARM Changes for v4.4-rc1
      
      Includes a number of fixes for the arch-timer, introducing proper
      level-triggered semantics for the arch-timers, a series of patches to
      synchronously halt a guest (prerequisite for IRQ forwarding), some tracepoint
      improvements, a tweak for the EL2 panic handlers, some more VGIC cleanups
      getting rid of redundant state, and finally a stylistic change that gets rid of
      some ctags warnings.
      
      Conflicts:
      	arch/x86/include/asm/kvm_host.h
      197a4f4b
    • Pavel Fedin's avatar
      KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr() · 26caea76
      Pavel Fedin authored
      Now we see that vgic_set_lr() and vgic_sync_lr_elrsr() are always used
      together. Merge them into one function, saving from second vgic_ops
      dereferencing every time.
      Signed-off-by: default avatarPavel Fedin <p.fedin@samsung.com>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      26caea76
    • Pavel Fedin's avatar
      KVM: arm/arm64: Clean up vgic_retire_lr() and surroundings · 212c7654
      Pavel Fedin authored
      1. Remove unnecessary 'irq' argument, because irq number can be retrieved
         from the LR.
      2. Since cff9211e
         ("arm/arm64: KVM: Fix arch timer behavior for disabled interrupts ")
         LR_STATE_PENDING is queued back by vgic_retire_lr() itself. Also, it
         clears vlr.state itself. Therefore, we remove the same, now duplicated,
         check with all accompanying bit manipulations from vgic_unqueue_irqs().
      3. vgic_retire_lr() is always accompanied by vgic_irq_clear_queued(). Since
         it already does more than just clearing the LR, move
         vgic_irq_clear_queued() inside of it.
      Signed-off-by: default avatarPavel Fedin <p.fedin@samsung.com>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      212c7654
    • Pavel Fedin's avatar
      KVM: arm/arm64: Optimize away redundant LR tracking · c4cd4c16
      Pavel Fedin authored
      Currently we use vgic_irq_lr_map in order to track which LRs hold which
      IRQs, and lr_used bitmap in order to track which LRs are used or free.
      
      vgic_irq_lr_map is actually used only for piggy-back optimization, and
      can be easily replaced by iteration over lr_used. This is good because in
      future, when LPI support is introduced, number of IRQs will grow up to at
      least 16384, while numbers from 1024 to 8192 are never going to be used.
      This would be a huge memory waste.
      
      In its turn, lr_used is also completely redundant since
      ae705930 ("arm/arm64: KVM: Keep elrsr/aisr
      in sync with software model"), because together with lr_used we also update
      elrsr. This allows to easily replace lr_used with elrsr, inverting all
      conditions (because in elrsr '1' means 'free').
      Signed-off-by: default avatarPavel Fedin <p.fedin@samsung.com>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      c4cd4c16
  2. 02 Nov, 2015 2 commits
  3. 29 Oct, 2015 3 commits
    • Christian Borntraeger's avatar
      KVM: s390: use simple switch statement as multiplexer · 46b708ea
      Christian Borntraeger authored
      We currently do some magic shifting (by exploiting that exit codes
      are always a multiple of 4) and a table lookup to jump into the
      exit handlers. This causes some calculations and checks, just to
      do an potentially expensive function call.
      
      Changing that to a switch statement gives the compiler the chance
      to inline and dynamically decide between jump tables or inline
      compare and branches. In addition it makes the code more readable.
      
      bloat-o-meter gives me a small reduction in code size:
      
      add/remove: 0/7 grow/shrink: 1/1 up/down: 986/-1334 (-348)
      function                                     old     new   delta
      kvm_handle_sie_intercept                      72    1058    +986
      handle_prog                                  704     696      -8
      handle_noop                                   54       -     -54
      handle_partial_execution                      60       -     -60
      intercept_funcs                              120       -    -120
      handle_instruction                           198       -    -198
      handle_validity                              210       -    -210
      handle_stop                                  316       -    -316
      handle_external_interrupt                    368       -    -368
      
      Right now my gcc does conditional branches instead of jump tables.
      The inlining seems to give us enough cycles as some micro-benchmarking
      shows minimal improvements, but still in noise.
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: default avatarCornelia Huck <cornelia.huck@de.ibm.com>
      46b708ea
    • Christian Borntraeger's avatar
      KVM: s390: drop useless newline in debugging data · 58c383c6
      Christian Borntraeger authored
      the s390 debug feature does not need newlines. In fact it will
      result in empty lines. Get rid of 4 leftovers.
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: default avatarCornelia Huck <cornelia.huck@de.ibm.com>
      58c383c6
    • David Hildenbrand's avatar
      KVM: s390: SCA must not cross page boundaries · c5c2c393
      David Hildenbrand authored
      We seemed to have missed a few corner cases in commit f6c137ff
      ("KVM: s390: randomize sca address").
      
      The SCA has a maximum size of 2112 bytes. By setting the sca_offset to
      some unlucky numbers, we exceed the page.
      
      0x7c0 (1984) -> Fits exactly
      0x7d0 (2000) -> 16 bytes out
      0x7e0 (2016) -> 32 bytes out
      0x7f0 (2032) -> 48 bytes out
      
      One VCPU entry is 32 bytes long.
      
      For the last two cases, we actually write data to the other page.
      1. The address of the VCPU.
      2. Injection/delivery/clearing of SIGP externall calls via SIGP IF.
      
      Especially the 2. happens regularly. So this could produce two problems:
      1. The guest losing/getting external calls.
      2. Random memory overwrites in the host.
      
      So this problem happens on every 127 + 128 created VM with 64 VCPUs.
      
      Cc: stable@vger.kernel.org # v3.15+
      Acked-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarDavid Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      c5c2c393
  4. 22 Oct, 2015 18 commits
    • Michal Marek's avatar
      KVM: arm: Do not indent the arguments of DECLARE_BITMAP · 5fdf876d
      Michal Marek authored
      Besides being a coding style issue, it confuses make tags:
      
      ctags: Warning: include/kvm/arm_vgic.h:307: null expansion of name pattern "\1"
      ctags: Warning: include/kvm/arm_vgic.h:308: null expansion of name pattern "\1"
      ctags: Warning: include/kvm/arm_vgic.h:309: null expansion of name pattern "\1"
      ctags: Warning: include/kvm/arm_vgic.h:317: null expansion of name pattern "\1"
      
      Cc: kvmarm@lists.cs.columbia.edu
      Signed-off-by: default avatarMichal Marek <mmarek@suse.com>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      5fdf876d
    • Mark Rutland's avatar
      arm64: kvm: restore EL1N SP for panic · db85c55f
      Mark Rutland authored
      If we panic in hyp mode, we inject a call to panic() into the EL1N host
      kernel. If a guest context is active, we first attempt to restore the
      minimal amount of state necessary to execute the host kernel with
      restore_sysregs.
      
      However, the SP is restored as part of restore_common_regs, and so we
      may return to the host's panic() function with the SP of the guest. Any
      calculations based on the SP will be bogus, and any attempt to access
      the stack will result in recursive data aborts.
      
      When running Linux as a guest, the guest's EL1N SP is like to be some
      valid kernel address. In this case, the host kernel may use that region
      as a stack for panic(), corrupting it in the process.
      
      Avoid the problem by restoring the host SP prior to returning to the
      host. To prevent misleading backtraces in the host, the FP is zeroed at
      the same time. We don't need any of the other "common" registers in
      order to panic successfully.
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Acked-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Cc: Christoffer Dall <christoffer.dall@linaro.org>
      Cc: <kvmarm@lists.cs.columbia.edu>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      db85c55f
    • Christoffer Dall's avatar
      arm/arm64: KVM: Add tracepoints for vgic and timer · e21f0910
      Christoffer Dall authored
      The VGIC and timer code for KVM arm/arm64 doesn't have any tracepoints
      or tracepoint infrastructure defined.  Rewriting some of the timer code
      handling showed me how much we need this, so let's add these simple
      trace points once and for all and we can easily expand with additional
      trace points in these files as we go along.
      
      Cc: Wei Huang <wei@redhat.com>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      e21f0910
    • Christoffer Dall's avatar
      arm/arm64: KVM: Improve kvm_exit tracepoint · b5905dc1
      Christoffer Dall authored
      The ARM architecture only saves the exit class to the HSR (ESR_EL2 for
      arm64) on synchronous exceptions, not on asynchronous exceptions like an
      IRQ.  However, we only report the exception class on kvm_exit, which is
      confusing because an IRQ looks like it exited at some PC with the same
      reason as the previous exit.  Add a lookup table for the exception index
      and prepend the kvm_exit tracepoint text with the exception type to
      clarify this situation.
      
      Also resolve the exception class (EC) to a human-friendly text version
      so the trace output becomes immediately usable for debugging this code.
      
      Cc: Wei Huang <wei@redhat.com>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      b5905dc1
    • Pavel Fedin's avatar
      KVM: arm/arm64: Fix vGIC documentation · 952105ab
      Pavel Fedin authored
      Correct some old mistakes in the API documentation:
      
      1. VCPU is identified by index (using kvm_get_vcpu() function), but
         "cpu id" can be mistaken for affinity ID.
      2. Some error codes are wrong.
      
        [ Slightly tweaked some grammer and did some s/CPU index/vcpu_index/
          in the descriptions.  -Christoffer ]
      Signed-off-by: default avatarPavel Fedin <p.fedin@samsung.com>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      952105ab
    • Eric Auger's avatar
      KVM: arm/arm64: implement kvm_arm_[halt,resume]_guest · 3b92830a
      Eric Auger authored
      We introduce kvm_arm_halt_guest and resume functions. They
      will be used for IRQ forward state change.
      
      Halt is synchronous and prevents the guest from being re-entered.
      We use the same mechanism put in place for PSCI former pause,
      now renamed power_off. A new flag is introduced in arch vcpu state,
      pause, only meant to be used by those functions.
      Signed-off-by: default avatarEric Auger <eric.auger@linaro.org>
      Reviewed-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      3b92830a
    • Eric Auger's avatar
      KVM: arm/arm64: check power_off in critical section before VCPU run · 101d3da0
      Eric Auger authored
      In case a vcpu off PSCI call is called just after we executed the
      vcpu_sleep check, we can enter the guest although power_off
      is set. Let's check the power_off state in the critical section,
      just before entering the guest.
      Signed-off-by: default avatarEric Auger <eric.auger@linaro.org>
      Reported-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      Reviewed-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      101d3da0
    • Eric Auger's avatar
      KVM: arm/arm64: check power_off in kvm_arch_vcpu_runnable · 4f5f1dc0
      Eric Auger authored
      kvm_arch_vcpu_runnable now also checks whether the power_off
      flag is set.
      Signed-off-by: default avatarEric Auger <eric.auger@linaro.org>
      Reviewed-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      4f5f1dc0
    • Eric Auger's avatar
      KVM: arm/arm64: rename pause into power_off · 3781528e
      Eric Auger authored
      The kvm_vcpu_arch pause field is renamed into power_off to prepare
      for the introduction of a new pause field. Also vcpu_pause is renamed
      into vcpu_sleep since we will sleep until both power_off and pause are
      false.
      Signed-off-by: default avatarEric Auger <eric.auger@linaro.org>
      Reviewed-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      3781528e
    • Wei Huang's avatar
      arm/arm64: KVM : Enable vhost device selection under KVM config menu · 75755c6d
      Wei Huang authored
      vhost drivers provide guest VMs with better I/O performance and lower
      CPU utilization. This patch allows users to select vhost devices under
      KVM configuration menu on ARM. This makes vhost support on arm/arm64
      on a par with other architectures (e.g. x86, ppc).
      Signed-off-by: default avatarWei Huang <wei@redhat.com>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      75755c6d
    • Christoffer Dall's avatar
      arm/arm64: KVM: Support edge-triggered forwarded interrupts · 8fe2f19e
      Christoffer Dall authored
      We mark edge-triggered interrupts with the HW bit set as queued to
      prevent the VGIC code from injecting LRs with both the Active and
      Pending bits set at the same time while also setting the HW bit,
      because the hardware does not support this.
      
      However, this means that we must also clear the queued flag when we sync
      back a LR where the state on the physical distributor went from active
      to inactive because the guest deactivated the interrupt.  At this point
      we must also check if the interrupt is pending on the distributor, and
      tell the VGIC to queue it again if it is.
      
      Since these actions on the sync path are extremely close to those for
      level-triggered interrupts, rename process_level_irq to
      process_queued_irq, allowing it to cater for both cases.
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      8fe2f19e
    • Christoffer Dall's avatar
      arm/arm64: KVM: Rework the arch timer to use level-triggered semantics · 4b4b4512
      Christoffer Dall authored
      The arch timer currently uses edge-triggered semantics in the sense that
      the line is never sampled by the vgic and lowering the line from the
      timer to the vgic doesn't have any effect on the pending state of
      virtual interrupts in the vgic.  This means that we do not support a
      guest with the otherwise valid behavior of (1) disable interrupts (2)
      enable the timer (3) disable the timer (4) enable interrupts.  Such a
      guest would validly not expect to see any interrupts on real hardware,
      but will see interrupts on KVM.
      
      This patch fixes this shortcoming through the following series of
      changes.
      
      First, we change the flow of the timer/vgic sync/flush operations.  Now
      the timer is always flushed/synced before the vgic, because the vgic
      samples the state of the timer output.  This has the implication that we
      move the timer operations in to non-preempible sections, but that is
      fine after the previous commit getting rid of hrtimer schedules on every
      entry/exit.
      
      Second, we change the internal behavior of the timer, letting the timer
      keep track of its previous output state, and only lower/raise the line
      to the vgic when the state changes.  Note that in theory this could have
      been accomplished more simply by signalling the vgic every time the
      state *potentially* changed, but we don't want to be hitting the vgic
      more often than necessary.
      
      Third, we get rid of the use of the map->active field in the vgic and
      instead simply set the interrupt as active on the physical distributor
      whenever the input to the GIC is asserted and conversely clear the
      physical active state when the input to the GIC is deasserted.
      
      Fourth, and finally, we now initialize the timer PPIs (and all the other
      unused PPIs for now), to be level-triggered, and modify the sync code to
      sample the line state on HW sync and re-inject a new interrupt if it is
      still pending at that time.
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      4b4b4512
    • Christoffer Dall's avatar
      arm/arm64: KVM: Add forwarded physical interrupts documentation · 4cf1bc4c
      Christoffer Dall authored
      Forwarded physical interrupts on arm/arm64 is a tricky concept and the
      way we deal with them is not apparently easy to understand by reading
      various specs.
      
      Therefore, add a proper documentation file explaining the flow and
      rationale of the behavior of the vgic.
      
      Some of this text was contributed by Marc Zyngier and edited by me.
      Omissions and errors are all mine.
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      4cf1bc4c
    • Christoffer Dall's avatar
      arm/arm64: KVM: Use appropriate define in VGIC reset code · 54723bb3
      Christoffer Dall authored
      We currently initialize the SGIs to be enabled in the VGIC code, but we
      use the VGIC_NR_PPIS define for this purpose, instead of the the more
      natural VGIC_NR_SGIS.  Change this slightly confusing use of the
      defines.
      
      Note: This should have no functional change, as both names are defined
      to the number 16.
      Acked-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      54723bb3
    • Christoffer Dall's avatar
      arm/arm64: KVM: Implement GICD_ICFGR as RO for PPIs · 8bf9a701
      Christoffer Dall authored
      The GICD_ICFGR allows the bits for the SGIs and PPIs to be read only.
      We currently simulate this behavior by writing a hardcoded value to the
      register for the SGIs and PPIs on every write of these bits to the
      register (ignoring what the guest actually wrote), and by writing the
      same value as the reset value to the register.
      
      This is a bit counter-intuitive, as the register is RO for these bits,
      and we can just implement it that way, allowing us to control the value
      of the bits purely in the reset code.
      Reviewed-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      8bf9a701
    • Christoffer Dall's avatar
      arm/arm64: KVM: vgic: Factor out level irq processing on guest exit · 9103617d
      Christoffer Dall authored
      Currently vgic_process_maintenance() processes dealing with a completed
      level-triggered interrupt directly, but we are soon going to reuse this
      logic for level-triggered mapped interrupts with the HW bit set, so
      move this logic into a separate static function.
      
      Probably the most scary part of this commit is convincing yourself that
      the current flow is safe compared to the old one.  In the following I
      try to list the changes and why they are harmless:
      
        Move vgic_irq_clear_queued after kvm_notify_acked_irq:
          Harmless because the only potential effect of clearing the queued
          flag wrt.  kvm_set_irq is that vgic_update_irq_pending does not set
          the pending bit on the emulated CPU interface or in the
          pending_on_cpu bitmask if the function is called with level=1.
          However, the point of kvm_notify_acked_irq is to call kvm_set_irq
          with level=0, and we set the queued flag again in
          __kvm_vgic_sync_hwstate later on if the level is stil high.
      
        Move vgic_set_lr before kvm_notify_acked_irq:
          Also, harmless because the LR are cpu-local operations and
          kvm_notify_acked only affects the dist
      
        Move vgic_dist_irq_clear_soft_pend after kvm_notify_acked_irq:
          Also harmless, because now we check the level state in the
          clear_soft_pend function and lower the pending bits if the level is
          low.
      Reviewed-by: default avatarEric Auger <eric.auger@linaro.org>
      Reviewed-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      9103617d
    • Christoffer Dall's avatar
      arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_block · d35268da
      Christoffer Dall authored
      We currently schedule a soft timer every time we exit the guest if the
      timer did not expire while running the guest.  This is really not
      necessary, because the only work we do in the timer work function is to
      kick the vcpu.
      
      Kicking the vcpu does two things:
      (1) If the vpcu thread is on a waitqueue, make it runnable and remove it
      from the waitqueue.
      (2) If the vcpu is running on a different physical CPU from the one
      doing the kick, it sends a reschedule IPI.
      
      The second case cannot happen, because the soft timer is only ever
      scheduled when the vcpu is not running.  The first case is only relevant
      when the vcpu thread is on a waitqueue, which is only the case when the
      vcpu thread has called kvm_vcpu_block().
      
      Therefore, we only need to make sure a timer is scheduled for
      kvm_vcpu_block(), which we do by encapsulating all calls to
      kvm_vcpu_block() with kvm_timer_{un}schedule calls.
      
      Additionally, we only schedule a soft timer if the timer is enabled and
      unmasked, since it is useless otherwise.
      
      Note that theoretically userspace can use the SET_ONE_REG interface to
      change registers that should cause the timer to fire, even if the vcpu
      is blocked without a scheduled timer, but this case was not supported
      before this patch and we leave it for future work for now.
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      d35268da
    • Christoffer Dall's avatar
      KVM: Add kvm_arch_vcpu_{un}blocking callbacks · 3217f7c2
      Christoffer Dall authored
      Some times it is useful for architecture implementations of KVM to know
      when the VCPU thread is about to block or when it comes back from
      blocking (arm/arm64 needs to know this to properly implement timers, for
      example).
      
      Therefore provide a generic architecture callback function in line with
      what we do elsewhere for KVM generic-arch interactions.
      Reviewed-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      3217f7c2
  5. 21 Oct, 2015 4 commits
    • Gautham R. Shenoy's avatar
      KVM: PPC: Book3S HV: Handle H_DOORBELL on the guest exit path · 70aa3961
      Gautham R. Shenoy authored
      Currently a CPU running a guest can receive a H_DOORBELL in the
      following two cases:
      1) When the CPU is napping due to CEDE or there not being a guest
      vcpu.
      2) The CPU is running the guest vcpu.
      
      Case 1), the doorbell message is not cleared since we were waking up
      from nap. Hence when the EE bit gets set on transition from guest to
      host, the H_DOORBELL interrupt is delivered to the host and the
      corresponding handler is invoked.
      
      However in Case 2), the message gets cleared by the action of taking
      the H_DOORBELL interrupt. Since the CPU was running a guest, instead
      of invoking the doorbell handler, the code invokes the second-level
      interrupt handler to switch the context from the guest to the host. At
      this point the setting of the EE bit doesn't result in the CPU getting
      the doorbell interrupt since it has already been delivered once. So,
      the handler for this doorbell is never invoked!
      
      This causes softlockups if the missed DOORBELL was an IPI sent from a
      sibling subcore on the same CPU.
      
      This patch fixes it by explitly invoking the doorbell handler on the
      exit path if the exit reason is H_DOORBELL similar to the way an
      EXTERNAL interrupt is handled. Since this will also handle Case 1), we
      can unconditionally clear the doorbell message in
      kvmppc_check_wake_reason.
      Signed-off-by: default avatarGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      70aa3961
    • Nikunj A Dadhania's avatar
      KVM: PPC: Implement extension to report number of memslots · bfec5c2c
      Nikunj A Dadhania authored
      QEMU assumes 32 memslots if this extension is not implemented. Although,
      current value of KVM_USER_MEM_SLOTS is 32, once KVM_USER_MEM_SLOTS
      changes QEMU would take a wrong value.
      Signed-off-by: default avatarNikunj A Dadhania <nikunj@linux.vnet.ibm.com>
      Reviewed-by: default avatarThomas Huth <thuth@redhat.com>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      bfec5c2c
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Make H_REMOVE return correct HPTE value for absent HPTEs · c64dfe2a
      Paul Mackerras authored
      This fixes a bug where the old HPTE value returned by H_REMOVE has
      the valid bit clear if the HPTE was an absent HPTE, as happens for
      HPTEs for emulated MMIO pages and for RAM pages that have been paged
      out by the host.  If the absent bit is set, we clear it and set the
      valid bit, because from the guest's point of view, the HPTE is valid.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      c64dfe2a
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Don't fall back to smaller HPT size in allocation ioctl · 572abd56
      Paul Mackerras authored
      Currently the KVM_PPC_ALLOCATE_HTAB will try to allocate the requested
      size of HPT, and if that is not possible, then try to allocate smaller
      sizes (by factors of 2) until either a minimum is reached or the
      allocation succeeds.  This is not ideal for userspace, particularly in
      migration scenarios, where the destination VM really does require the
      size requested.  Also, the minimum HPT size of 256kB may be
      insufficient for the guest to run successfully.
      
      This removes the fallback to smaller sizes on allocation failure for
      the KVM_PPC_ALLOCATE_HTAB ioctl.  The fallback still exists for the
      case where the HPT is allocated at the time the first VCPU is run, if
      no HPT has been allocated by ioctl by that time.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      572abd56
  6. 20 Oct, 2015 6 commits
    • Christoffer Dall's avatar
      arm/arm64: KVM: Fix disabled distributor operation · 0d997491
      Christoffer Dall authored
      We currently do a single update of the vgic state when the distributor
      enable/disable control register is accessed and then bypass updating the
      state for as long as the distributor remains disabled.
      
      This is incorrect, because updating the state does not consider the
      distributor enable bit, and this you can end up in a situation where an
      interrupt is marked as pending on the CPU interface, but not pending on
      the distributor, which is an impossible state to be in, and triggers a
      warning.  Consider for example the following sequence of events:
      
      1. An interrupt is marked as pending on the distributor
         - the interrupt is also forwarded to the CPU interface
      2. The guest turns off the distributor (it's about to do a reboot)
         - we stop updating the CPU interface state from now on
      3. The guest disables the pending interrupt
         - we remove the pending state from the distributor, but don't touch
           the CPU interface, see point 2.
      
      Since the distributor disable bit really means that no interrupts should
      be forwarded to the CPU interface, we modify the code to keep updating
      the internal VGIC state, but always set the CPU interface pending bits
      to zero when the distributor is disabled.
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      0d997491
    • Christoffer Dall's avatar
      arm/arm64: KVM: Clear map->active on pend/active clear · 544c572e
      Christoffer Dall authored
      When a guest reboots or offlines/onlines CPUs, it is not uncommon for it
      to clear the pending and active states of an interrupt through the
      emulated VGIC distributor.  However, since the architected timers are
      defined by the architecture to be level triggered and the guest
      rightfully expects them to be that, but we emulate them as
      edge-triggered, we have to mimic level-triggered behavior for an
      edge-triggered virtual implementation.
      
      We currently do not signal the VGIC when the map->active field is true,
      because it indicates that the guest has already been signalled of the
      interrupt as required.  Normally this field is set to false when the
      guest deactivates the virtual interrupt through the sync path.
      
      We also need to catch the case where the guest deactivates the interrupt
      through the emulated distributor, again allowing guests to boot even if
      the original virtual timer signal hit before the guest's GIC
      initialization sequence is run.
      Reviewed-by: default avatarEric Auger <eric.auger@linaro.org>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      544c572e
    • Christoffer Dall's avatar
      arm/arm64: KVM: Fix arch timer behavior for disabled interrupts · cff9211e
      Christoffer Dall authored
      We have an interesting issue when the guest disables the timer interrupt
      on the VGIC, which happens when turning VCPUs off using PSCI, for
      example.
      
      The problem is that because the guest disables the virtual interrupt at
      the VGIC level, we never inject interrupts to the guest and therefore
      never mark the interrupt as active on the physical distributor.  The
      host also never takes the timer interrupt (we only use the timer device
      to trigger a guest exit and everything else is done in software), so the
      interrupt does not become active through normal means.
      
      The result is that we keep entering the guest with a programmed timer
      that will always fire as soon as we context switch the hardware timer
      state and run the guest, preventing forward progress for the VCPU.
      
      Since the active state on the physical distributor is really part of the
      timer logic, it is the job of our virtual arch timer driver to manage
      this state.
      
      The timer->map->active boolean field indicates whether we have signalled
      this interrupt to the vgic and if that interrupt is still pending or
      active.  As long as that is the case, the hardware doesn't have to
      generate physical interrupts and therefore we mark the interrupt as
      active on the physical distributor.
      
      We also have to restore the pending state of an interrupt that was
      queued to an LR but was retired from the LR for some reason, while
      remaining pending in the LR.
      
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Reported-by: default avatarLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      cff9211e
    • Arnd Bergmann's avatar
      KVM: arm: use GIC support unconditionally · 4a5d69b7
      Arnd Bergmann authored
      The vgic code on ARM is built for all configurations that enable KVM,
      but the parent_data field that it references is only present when
      CONFIG_IRQ_DOMAIN_HIERARCHY is set:
      
      virt/kvm/arm/vgic.c: In function 'kvm_vgic_map_phys_irq':
      virt/kvm/arm/vgic.c:1781:13: error: 'struct irq_data' has no member named 'parent_data'
      
      This flag is implied by the GIC driver, and indeed the VGIC code only
      makes sense if a GIC is present. This changes the CONFIG_KVM symbol
      to always select GIC, which avoids the issue.
      
      Fixes: 662d9715 ("arm/arm64: KVM: Kill CONFIG_KVM_ARM_{VGIC,TIMER}")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      4a5d69b7
    • Pavel Fedin's avatar
      KVM: arm/arm64: Fix memory leak if timer initialization fails · 399ea0f6
      Pavel Fedin authored
      Jump to correct label and free kvm_host_cpu_state
      Reviewed-by: default avatarWei Huang <wei@redhat.com>
      Signed-off-by: default avatarPavel Fedin <p.fedin@samsung.com>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      399ea0f6
    • Pavel Fedin's avatar
      KVM: arm/arm64: Do not inject spurious interrupts · 437f9963
      Pavel Fedin authored
      When lowering a level-triggered line from userspace, we forgot to lower
      the pending bit on the emulated CPU interface and we also did not
      re-compute the pending_on_cpu bitmap for the CPU affected by the change.
      
      Update vgic_update_irq_pending() to fix the two issues above and also
      raise a warning in vgic_quue_irq_to_lr if we encounter an interrupt
      pending on a CPU which is neither marked active nor pending.
      
        [ Commit text reworked completely - Christoffer ]
      Signed-off-by: default avatarPavel Fedin <p.fedin@samsung.com>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      437f9963
  7. 19 Oct, 2015 2 commits
    • Takuya Yoshikawa's avatar
      KVM: x86: MMU: Initialize force_pt_level before calling mapping_level() · 8c85ac1c
      Takuya Yoshikawa authored
      Commit fd136902 ("KVM: x86: MMU: Move mapping_level_dirty_bitmap()
      call in mapping_level()") forgot to initialize force_pt_level to false
      in FNAME(page_fault)() before calling mapping_level() like
      nonpaging_map() does.  This can sometimes result in forcing page table
      level mapping unnecessarily.
      
      Fix this and move the first *force_pt_level check in mapping_level()
      before kvm_vcpu_gfn_to_memslot() call to make it a bit clearer that
      the variable must be initialized before mapping_level() gets called.
      
      This change can also avoid calling kvm_vcpu_gfn_to_memslot() when
      !check_hugepage_cache_consistency() check in tdp_page_fault() forces
      page table level mapping.
      Signed-off-by: default avatarTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      8c85ac1c
    • Paolo Bonzini's avatar
      kvm: x86: zero EFER on INIT · 5690891b
      Paolo Bonzini authored
      Not zeroing EFER means that a 32-bit firmware cannot enter paging mode
      without clearing EFER.LME first (which it should not know about).
      Yang Zhang from Intel confirmed that the manual is wrong and EFER is
      cleared to zero on INIT.
      
      Fixes: d28bc9dd
      Cc: stable@vger.kernel.org
      Cc: Yang Z Zhang <yang.z.zhang@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      5690891b
  8. 16 Oct, 2015 1 commit