1. 16 Nov, 2017 1 commit
  2. 09 Nov, 2017 9 commits
  3. 08 Nov, 2017 2 commits
    • Radim Krčmář's avatar
      Merge tag 'kvm-arm-for-v4.15' of... · f0d438e4
      Radim Krčmář authored
      Merge tag 'kvm-arm-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into next
      
      KVM/ARM Changes for v4.15
      
      Changes include:
       - Optimized arch timer handling for KVM/ARM
       - Improvements to the VGIC ITS code and introduction of an ITS reset
         ioctl
       - Unification of the 32-bit fault injection logic
       - More exact external abort matching logic
      f0d438e4
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Fix exclusion between HPT resizing and other HPT updates · 38c53af8
      Paul Mackerras authored
      Commit 5e985969 ("KVM: PPC: Book3S HV: Outline of KVM-HV HPT resizing
      implementation", 2016-12-20) added code that tries to exclude any use
      or update of the hashed page table (HPT) while the HPT resizing code
      is iterating through all the entries in the HPT.  It does this by
      taking the kvm->lock mutex, clearing the kvm->arch.hpte_setup_done
      flag and then sending an IPI to all CPUs in the host.  The idea is
      that any VCPU task that tries to enter the guest will see that the
      hpte_setup_done flag is clear and therefore call kvmppc_hv_setup_htab_rma,
      which also takes the kvm->lock mutex and will therefore block until
      we release kvm->lock.
      
      However, any VCPU that is already in the guest, or is handling a
      hypervisor page fault or hypercall, can re-enter the guest without
      rechecking the hpte_setup_done flag.  The IPI will cause a guest exit
      of any VCPUs that are currently in the guest, but does not prevent
      those VCPU tasks from immediately re-entering the guest.
      
      The result is that after resize_hpt_rehash_hpte() has made a HPTE
      absent, a hypervisor page fault can occur and make that HPTE present
      again.  This includes updating the rmap array for the guest real page,
      meaning that we now have a pointer in the rmap array which connects
      with pointers in the old rev array but not the new rev array.  In
      fact, if the HPT is being reduced in size, the pointer in the rmap
      array could point outside the bounds of the new rev array.  If that
      happens, we can get a host crash later on such as this one:
      
      [91652.628516] Unable to handle kernel paging request for data at address 0xd0000000157fb10c
      [91652.628668] Faulting instruction address: 0xc0000000000e2640
      [91652.628736] Oops: Kernel access of bad area, sig: 11 [#1]
      [91652.628789] LE SMP NR_CPUS=1024 NUMA PowerNV
      [91652.628847] Modules linked in: binfmt_misc vhost_net vhost tap xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables ses enclosure scsi_transport_sas i2c_opal ipmi_powernv ipmi_devintf i2c_core ipmi_msghandler powernv_op_panel nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc kvm_hv kvm_pr kvm scsi_dh_alua dm_service_time dm_multipath tg3 ptp pps_core [last unloaded: stap_552b612747aec2da355051e464fa72a1_14259]
      [91652.629566] CPU: 136 PID: 41315 Comm: CPU 21/KVM Tainted: G           O    4.14.0-1.rc4.dev.gitb27fc5c.el7.centos.ppc64le #1
      [91652.629684] task: c0000007a419e400 task.stack: c0000000028d8000
      [91652.629750] NIP:  c0000000000e2640 LR: d00000000c36e498 CTR: c0000000000e25f0
      [91652.629829] REGS: c0000000028db5d0 TRAP: 0300   Tainted: G           O     (4.14.0-1.rc4.dev.gitb27fc5c.el7.centos.ppc64le)
      [91652.629932] MSR:  900000010280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 44022422  XER: 00000000
      [91652.630034] CFAR: d00000000c373f84 DAR: d0000000157fb10c DSISR: 40000000 SOFTE: 1
      [91652.630034] GPR00: d00000000c36e498 c0000000028db850 c000000001403900 c0000007b7960000
      [91652.630034] GPR04: d0000000117fb100 d000000007ab00d8 000000000033bb10 0000000000000000
      [91652.630034] GPR08: fffffffffffffe7f 801001810073bb10 d00000000e440000 d00000000c373f70
      [91652.630034] GPR12: c0000000000e25f0 c00000000fdb9400 f000000003b24680 0000000000000000
      [91652.630034] GPR16: 00000000000004fb 00007ff7081a0000 00000000000ec91a 000000000033bb10
      [91652.630034] GPR20: 0000000000010000 00000000001b1190 0000000000000001 0000000000010000
      [91652.630034] GPR24: c0000007b7ab8038 d0000000117fb100 0000000ec91a1190 c000001e6a000000
      [91652.630034] GPR28: 00000000033bb100 000000000073bb10 c0000007b7960000 d0000000157fb100
      [91652.630735] NIP [c0000000000e2640] kvmppc_add_revmap_chain+0x50/0x120
      [91652.630806] LR [d00000000c36e498] kvmppc_book3s_hv_page_fault+0xbb8/0xc40 [kvm_hv]
      [91652.630884] Call Trace:
      [91652.630913] [c0000000028db850] [c0000000028db8b0] 0xc0000000028db8b0 (unreliable)
      [91652.630996] [c0000000028db8b0] [d00000000c36e498] kvmppc_book3s_hv_page_fault+0xbb8/0xc40 [kvm_hv]
      [91652.631091] [c0000000028db9e0] [d00000000c36a078] kvmppc_vcpu_run_hv+0xdf8/0x1300 [kvm_hv]
      [91652.631179] [c0000000028dbb30] [d00000000c2248c4] kvmppc_vcpu_run+0x34/0x50 [kvm]
      [91652.631266] [c0000000028dbb50] [d00000000c220d54] kvm_arch_vcpu_ioctl_run+0x114/0x2a0 [kvm]
      [91652.631351] [c0000000028dbbd0] [d00000000c2139d8] kvm_vcpu_ioctl+0x598/0x7a0 [kvm]
      [91652.631433] [c0000000028dbd40] [c0000000003832e0] do_vfs_ioctl+0xd0/0x8c0
      [91652.631501] [c0000000028dbde0] [c000000000383ba4] SyS_ioctl+0xd4/0x130
      [91652.631569] [c0000000028dbe30] [c00000000000b8e0] system_call+0x58/0x6c
      [91652.631635] Instruction dump:
      [91652.631676] fba1ffe8 fbc1fff0 fbe1fff8 f8010010 f821ffa1 2fa70000 793d0020 e9432110
      [91652.631814] 7bbf26e4 7c7e1b78 7feafa14 409e0094 <807f000c> 786326e4 7c6a1a14 93a40008
      [91652.631959] ---[ end trace ac85ba6db72e5b2e ]---
      
      To fix this, we tighten up the way that the hpte_setup_done flag is
      checked to ensure that it does provide the guarantee that the resizing
      code needs.  In kvmppc_run_core(), we check the hpte_setup_done flag
      after disabling interrupts and refuse to enter the guest if it is
      clear (for a HPT guest).  The code that checks hpte_setup_done and
      calls kvmppc_hv_setup_htab_rma() is moved from kvmppc_vcpu_run_hv()
      to a point inside the main loop in kvmppc_run_vcpu(), ensuring that
      we don't just spin endlessly calling kvmppc_run_core() while
      hpte_setup_done is clear, but instead have a chance to block on the
      kvm->lock mutex.
      
      Finally we also check hpte_setup_done inside the region in
      kvmppc_book3s_hv_page_fault() where the HPTE is locked and we are about
      to update the HPTE, and bail out if it is clear.  If another CPU is
      inside kvm_vm_ioctl_resize_hpt_commit) and has cleared hpte_setup_done,
      then we know that either we are looking at a HPTE
      that resize_hpt_rehash_hpte() has not yet processed, which is OK,
      or else we will see hpte_setup_done clear and refuse to update it,
      because of the full barrier formed by the unlock of the HPTE in
      resize_hpt_rehash_hpte() combined with the locking of the HPTE
      in kvmppc_book3s_hv_page_fault().
      
      Fixes: 5e985969 ("KVM: PPC: Book3S HV: Outline of KVM-HV HPT resizing implementation")
      Cc: stable@vger.kernel.org # v4.10+
      Reported-by: default avatarSatheesh Rajendran <satheera@in.ibm.com>
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      38c53af8
  4. 06 Nov, 2017 26 commits
  5. 02 Nov, 2017 1 commit
    • Paolo Bonzini's avatar
      Merge branch 'kvm-ppc-next' of... · 6d6ab940
      Paolo Bonzini authored
      Merge branch 'kvm-ppc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
      
      Apart from various bugfixes and code cleanups, the major new feature
      is the ability to run guests using the hashed page table (HPT) MMU
      mode on a host that is using the radix MMU mode.  Because of limitations
      in the current POWER9 chip (all SMT threads in each core must use the
      same MMU mode, HPT or radix), this requires the host to be configured
      to run similar to POWER8: the host runs in single-threaded mode (only
      thread 0 of each core online), and have KVM be able to wake up the other
      threads when a KVM guest is to be run, and use the other threads for
      running guest VCPUs.  A new module parameter, called "indep_threads_mode",
      is normally Y on POWER9 but must be set to N before any HPT guests can
      be run on a radix host:
      
          # echo N >/sys/module/kvm_hv/parameters/indep_threads_mode
          # ppc64_cpu --smt=off
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      6d6ab940
  6. 01 Nov, 2017 1 commit
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Run HPT guests on POWER9 radix hosts · c0101509
      Paul Mackerras authored
      This patch removes the restriction that a radix host can only run
      radix guests, allowing us to run HPT (hashed page table) guests as
      well.  This is useful because it provides a way to run old guest
      kernels that know about POWER8 but not POWER9.
      
      Unfortunately, POWER9 currently has a restriction that all threads
      in a given code must either all be in HPT mode, or all in radix mode.
      This means that when entering a HPT guest, we have to obtain control
      of all 4 threads in the core and get them to switch their LPIDR and
      LPCR registers, even if they are not going to run a guest.  On guest
      exit we also have to get all threads to switch LPIDR and LPCR back
      to host values.
      
      To make this feasible, we require that KVM not be in the "independent
      threads" mode, and that the CPU cores be in single-threaded mode from
      the host kernel's perspective (only thread 0 online; threads 1, 2 and
      3 offline).  That allows us to use the same code as on POWER8 for
      obtaining control of the secondary threads.
      
      To manage the LPCR/LPIDR changes required, we extend the kvm_split_info
      struct to contain the information needed by the secondary threads.
      All threads perform a barrier synchronization (where all threads wait
      for every other thread to reach the synchronization point) on guest
      entry, both before and after loading LPCR and LPIDR.  On guest exit,
      they all once again perform a barrier synchronization both before
      and after loading host values into LPCR and LPIDR.
      
      Finally, it is also currently necessary to flush the entire TLB every
      time we enter a HPT guest on a radix host.  We do this on thread 0
      with a loop of tlbiel instructions.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      c0101509