1. 08 Jul, 2020 26 commits
  2. 06 Jul, 2020 3 commits
    • Merge tag 'kvmarm-fixes-5.8-3' of... · 8038a922
      Paolo Bonzini authored
      Merge tag 'kvmarm-fixes-5.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master
      
      KVM/arm fixes for 5.8, take #3
      
      - Disable preemption on context-switching PMU EL0 state happening
        on system register trap
      - Don't clobber X0 when tearing down KVM via a soft reset (kexec)
    • KVM: arm64: Stop clobbering x0 for HVC_SOFT_RESTART · b9e10d4a
      Andrew Scull authored
      HVC_SOFT_RESTART is given values for x0-2 that it should install
      before exiting to the new address, so it should not set x0 to the
      stub HVC success or failure code.
      
      Fixes: af42f204 ("arm64: hyp-stub: Zero x0 on successful stub handling")
      Cc: stable@vger.kernel.org
      Signed-off-by: Andrew Scull <ascull@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20200706095259.1338221-1-ascull@google.com
    • KVM: arm64: PMU: Fix per-CPU access in preemptible context · 146f76cc
      Marc Zyngier authored
      Commit 07da1ffa ("KVM: arm64: Remove host_cpu_context
      member from vcpu structure") has, by removing the host CPU
      context pointer, exposed that kvm_vcpu_pmu_restore_guest
      is called in preemptible contexts:
      
      [  266.932442] BUG: using smp_processor_id() in preemptible [00000000] code: qemu-system-aar/779
      [  266.939721] caller is debug_smp_processor_id+0x20/0x30
      [  266.944157] CPU: 2 PID: 779 Comm: qemu-system-aar Tainted: G            E     5.8.0-rc3-00015-g8d4aa58b2fe3 #1374
      [  266.954268] Hardware name: amlogic w400/w400, BIOS 2020.04 05/22/2020
      [  266.960640] Call trace:
      [  266.963064]  dump_backtrace+0x0/0x1e0
      [  266.966679]  show_stack+0x20/0x30
      [  266.969959]  dump_stack+0xe4/0x154
      [  266.973338]  check_preemption_disabled+0xf8/0x108
      [  266.977978]  debug_smp_processor_id+0x20/0x30
      [  266.982307]  kvm_vcpu_pmu_restore_guest+0x2c/0x68
      [  266.986949]  access_pmcr+0xf8/0x128
      [  266.990399]  perform_access+0x8c/0x250
      [  266.994108]  kvm_handle_sys_reg+0x10c/0x2f8
      [  266.998247]  handle_exit+0x78/0x200
      [  267.001697]  kvm_arch_vcpu_ioctl_run+0x2ac/0xab8
      
      Note that the bug was always there; it is only the switch to
      using percpu accessors that made it obvious.
      The fix is to wrap these accesses in a preempt-disabled section,
      so that we sample a coherent context on trap from the guest.
      
      Fixes: 435e53fb ("arm64: KVM: Enable VHE support for :G/:H perf event modifiers")
      Cc: Andrew Murray <amurray@thegoodpenguin.co.uk>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
  3. 03 Jul, 2020 3 commits
  4. 02 Jul, 2020 1 commit
  5. 01 Jul, 2020 1 commit
  6. 30 Jun, 2020 1 commit
    • KVM: x86: bit 8 of non-leaf PDPEs is not reserved · 5ecad245
      Paolo Bonzini authored
      Bit 8 would be the "global" bit, which does not quite make sense for non-leaf
      page table entries.  Intel ignores it; AMD ignores it in PDEs and PDPEs, but
      reserves it in PML4Es.
      
      Probably, earlier versions of the AMD manual documented it as reserved in PDPEs
      as well, and that behavior made it into KVM as well as kvm-unit-tests; fix it.
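The vendor behavior described above can be summarized in a small stand-alone sketch. This is only an illustration of the rule as the commit message states it, not KVM's reserved-bit code; the names `cpu_vendor`, `pt_level`, `nonleaf_bit8_reserved` and `nonleaf_bit8_faults` are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration; these are not KVM identifiers. */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD };

/* Non-leaf levels in 4-level paging: 2 = PDE, 3 = PDPE, 4 = PML4E. */
enum pt_level { LEVEL_PDE = 2, LEVEL_PDPE = 3, LEVEL_PML4E = 4 };

/* Bit 8 is where the "global" bit lives in leaf entries. */
#define PTE_BIT8 (UINT64_C(1) << 8)

/*
 * Per the commit message: Intel ignores bit 8 in non-leaf entries;
 * AMD ignores it in PDEs and PDPEs but reserves it in PML4Es.
 */
static bool nonleaf_bit8_reserved(enum cpu_vendor vendor, enum pt_level level)
{
	assert(level >= LEVEL_PDE && level <= LEVEL_PML4E);

	if (vendor == VENDOR_INTEL)
		return false;
	return level == LEVEL_PML4E;
}

/* True if bit 8 alone would make this non-leaf entry a reserved-bit violation. */
static bool nonleaf_bit8_faults(enum cpu_vendor vendor, enum pt_level level,
				uint64_t entry)
{
	return (entry & PTE_BIT8) != 0 && nonleaf_bit8_reserved(vendor, level);
}
```

Under this model, an AMD PDPE with bit 8 set is accepted, while an AMD PML4E with bit 8 set is still flagged.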
      
      Cc: stable@vger.kernel.org
      Reported-by: Nadav Amit <namit@vmware.com>
      Fixes: a0c0feb5 ("KVM: x86: reserve bit 8 of non-leaf PDPEs and PML4Es in 64-bit mode on AMD", 2014-09-03)
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  7. 29 Jun, 2020 1 commit
    • KVM: X86: Fix async pf caused null-ptr-deref · 9d3c447c
      Wanpeng Li authored
      Syzbot reported that:
      
        CPU: 1 PID: 6780 Comm: syz-executor153 Not tainted 5.7.0-syzkaller #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        RIP: 0010:__apic_accept_irq+0x46/0xb80
        Call Trace:
         kvm_arch_async_page_present+0x7de/0x9e0
         kvm_check_async_pf_completion+0x18d/0x400
         kvm_arch_vcpu_ioctl_run+0x18bf/0x69f0
         kvm_vcpu_ioctl+0x46a/0xe20
         ksys_ioctl+0x11a/0x180
         __x64_sys_ioctl+0x6f/0xb0
         do_syscall_64+0xf6/0x7d0
         entry_SYSCALL_64_after_hwframe+0x49/0xb3
      
      The testcase enables the APF mechanism in MSR_KVM_ASYNC_PF_EN with
      ASYNC_PF_INT enabled, without first setting MSR_KVM_ASYNC_PF_INT.
      Worse, interrupt-based APF 'page ready' event delivery depends on the
      in-kernel lapic, but we did not bail out when the lapic is not in the
      kernel while the guest sets MSR_KVM_ASYNC_PF_EN, which later causes
      the null-ptr-deref in the host. This patch fixes it.
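The shape of such a bail-out can be sketched in user-space C. This is a simplified model, not the actual patch: `struct vcpu_model` and `model_enable_async_pf` are hypothetical, and the flag bit positions are assumptions based on my reading of the KVM paravirt MSR layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag bits of MSR_KVM_ASYNC_PF_EN; check the KVM MSR docs before relying on them. */
#define KVM_ASYNC_PF_ENABLED         (UINT64_C(1) << 0)
#define KVM_ASYNC_PF_DELIVERY_AS_INT (UINT64_C(1) << 3)

/* Hypothetical, heavily simplified stand-in for per-vCPU state. */
struct vcpu_model {
	bool lapic_in_kernel;    /* is the local APIC emulated in the kernel? */
	uint64_t apf_msr_en_val; /* last accepted MSR_KVM_ASYNC_PF_EN value */
};

/*
 * Model of handling a guest write to MSR_KVM_ASYNC_PF_EN.
 * Returns 0 if the write is accepted and 1 if it is rejected.
 */
static int model_enable_async_pf(struct vcpu_model *vcpu, uint64_t data)
{
	/*
	 * 'Page ready' events are delivered through the in-kernel lapic,
	 * so refuse to enable APF when there is no in-kernel lapic rather
	 * than dereferencing a missing APIC later on.
	 */
	if ((data & KVM_ASYNC_PF_ENABLED) && !vcpu->lapic_in_kernel)
		return 1;

	vcpu->apf_msr_en_val = data;
	return 0;
}
```

The point of the design is to reject the request at MSR-write time, so the later 'page ready' delivery path never runs without the lapic it depends on.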
      
      Reported-by: syzbot+1bf777dfdde86d64b89b@syzkaller.appspotmail.com
      Fixes: 2635b5c4 (KVM: x86: interrupt based APF 'page ready' event delivery)
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <1593426391-8231-1-git-send-email-wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  8. 24 Jun, 2020 1 commit
  9. 23 Jun, 2020 3 commits
    • KVM: arm64: vgic-v4: Plug race between non-residency and v4.1 doorbell · a3f574cd
      Marc Zyngier authored
      When making a vPE non-resident because it has hit a blocking WFI,
      the doorbell can fire at any time after the write to the RD.
      Crucially, it can fire right between the write to GICR_VPENDBASER
      and the write to the pending_last field in the its_vpe structure.
      
      This means that we would overwrite pending_last with stale data,
      and potentially not wake up until some unrelated event (such as
      a timer interrupt) puts the vPE back on the CPU.
      
      GICv4 isn't affected by this as we actively mask the doorbell on
      entering the guest, while GICv4.1 automatically manages doorbell
      delivery without any hypervisor-driven masking.
      
      Use the vpe_lock to synchronize such updates, which solves the
      problem altogether.
      
      Fixes: ae699ad3 ("irqchip/gic-v4.1: Move doorbell management to the GICv4 abstraction layer")
      Reported-by: Zenghui Yu <yuzenghui@huawei.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
    • KVM: VMX: Remove vcpu_vmx's defunct copy of host_pkru · e4553b49
      Sean Christopherson authored
      Remove vcpu_vmx.host_pkru, which got left behind when PKRU support was
      moved to common x86 code.
      
      No functional change intended.
      
      Fixes: 37486135 ("KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.c")
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200617034123.25647-1-sean.j.christopherson@intel.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: allow TSC to differ by NTP correction bounds without TSC scaling · 26769f96
      Marcelo Tosatti authored
      The Linux TSC calibration procedure is subject to small variations
      (it's common to see a +/-1 kHz difference between reboots on a given CPU, for example).
      
      So migrating a guest between two hosts with identical processors can fail if
      the calibrated TSC differs slightly between them.
      
      Without TSC scaling, the current kernel interface will either return an error
      (if user_tsc_khz <= tsc_khz) or enable TSC catchup mode.
      
      This change enables the following TSC tolerance check to
      accept KVM_SET_TSC_KHZ within tsc_tolerance_ppm (which is 250ppm by default).
      
              /*
               * Compute the variation in TSC rate which is acceptable
               * within the range of tolerance and decide if the
               * rate being applied is within that bounds of the hardware
               * rate.  If so, no scaling or compensation need be done.
               */
              thresh_lo = adjust_tsc_khz(tsc_khz, -tsc_tolerance_ppm);
              thresh_hi = adjust_tsc_khz(tsc_khz, tsc_tolerance_ppm);
              if (user_tsc_khz < thresh_lo || user_tsc_khz > thresh_hi) {
                      pr_debug("kvm: requested TSC rate %u falls outside tolerance [%u,%u]\n", user_tsc_khz, thresh_lo, thresh_hi);
                      use_scaling = 1;
              }
      
      An NTP daemon in the guest can correct this difference (NTP can correct up to 500ppm).
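To give a feel for how wide the 250ppm window is, here is a user-space sketch of the tolerance check quoted above. The `adjust_tsc_khz` helper is re-implemented here on the assumption that it scales a rate by (1 + ppm/1e6) with a 64-bit intermediate, and `tsc_khz_needs_scaling` is a hypothetical wrapper name; neither is copied from the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Scale a rate in kHz by (1 + ppm / 1e6), using a 64-bit intermediate
 * to avoid overflow. Assumes ppm > -1000000.
 */
static uint32_t adjust_tsc_khz(uint32_t khz, int32_t ppm)
{
	uint64_t v = (uint64_t)khz * (uint64_t)(1000000 + ppm);

	return (uint32_t)(v / 1000000);
}

/*
 * True if the requested rate falls outside the tolerance window around
 * the hardware rate, meaning scaling or catchup would be required.
 */
static bool tsc_khz_needs_scaling(uint32_t tsc_khz, uint32_t user_tsc_khz,
				  int32_t tolerance_ppm)
{
	uint32_t thresh_lo = adjust_tsc_khz(tsc_khz, -tolerance_ppm);
	uint32_t thresh_hi = adjust_tsc_khz(tsc_khz, tolerance_ppm);

	return user_tsc_khz < thresh_lo || user_tsc_khz > thresh_hi;
}
```

For a 2.4 GHz host (tsc_khz = 2400000) and 250ppm, the window works out to [2399400, 2400600] kHz, so a +/-1 kHz calibration difference between hosts sits well inside it, while a request for 2401000 kHz would not.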
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      
      Message-Id: <20200616114741.GA298183@fuller.cnet>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>