1. 10 Jun, 2022 3 commits
    • Sean Christopherson's avatar
      KVM: x86: Omit VCPU_REGS_RIP from emulator's _regs array · a5ba67b4
      Sean Christopherson authored
      Omit RIP from the emulator's _regs array, which is used only for GPRs,
      i.e. registers that can be referenced via ModRM and/or SIB bytes.  The
      emulator uses the dedicated _eip field for RIP, and manually reads from
      _eip to handle RIP-relative addressing.
      
      To avoid an even bigger, slightly more dangerous change, hardcode the
      number of GPRs to 16 for the time being even though 32-bit KVM's emulator
      technically should only have 8 GPRs.  Add a TODO to address that in a
      future commit.
      
      See also the comments above the read_gpr() and write_gpr() declarations,
      and obviously the handling in writeback_registers().
      
      No functional change intended.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      Message-Id: <20220526210817.3428868-4-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      a5ba67b4
    • Sean Christopherson's avatar
      KVM: x86: Harden _regs accesses to guard against buggy input · dfe21e6b
      Sean Christopherson authored
      WARN and truncate the incoming GPR number/index when reading/writing GPRs
      in the emulator to guard against KVM bugs, e.g. to avoid out-of-bounds
      accesses to ctxt->_regs[] if KVM generates a bogus index.  Truncate the
      index instead of returning e.g. zero, as reg_write() returns a pointer
      to the register, i.e. returning zero would result in a NULL pointer
      dereference.  KVM could also force the index to any arbitrary GPR, but
      that's no better or worse, just different.
      
      Open code the restriction to 16 registers; RIP is handled via _eip and
      should never be accessed through reg_read() or reg_write().  See the
      comments above the declarations of reg_read() and reg_write(), and the
      behavior of writeback_registers().  The horrific open coded mess will be
      cleaned up in a future commit.
      
      There are no such bugs known to exist in the emulator, but determining
      that KVM is bug-free is not at all simple and requires a deep dive into
      the emulator.  The code is so convoluted that GCC-12 with the recently
      enable -Warray-bounds spits out a false-positive due to a GCC bug:
      
        arch/x86/kvm/emulate.c:254:27: warning: array subscript 32 is above array
                                       bounds of 'long unsigned int[17]' [-Warray-bounds]
          254 |         return ctxt->_regs[nr];
              |                ~~~~~~~~~~~^~~~
        In file included from arch/x86/kvm/emulate.c:23:
        arch/x86/kvm/kvm_emulate.h: In function 'reg_rmw':
        arch/x86/kvm/kvm_emulate.h:366:23: note: while referencing '_regs'
          366 |         unsigned long _regs[NR_VCPU_REGS];
              |                       ^~~~~
      
      Link: https://lore.kernel.org/all/YofQlBrlx18J7h9Y@google.com
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=216026
      Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105679Reported-and-tested-by: default avatarRobert Dinse <nanook@eskimo.com>
      Reported-by: default avatarKees Cook <keescook@chromium.org>
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220526210817.3428868-3-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      dfe21e6b
    • Sean Christopherson's avatar
      KVM: x86: Grab regs_dirty in local 'unsigned long' · 61d9c412
      Sean Christopherson authored
      Capture ctxt->regs_dirty in a local 'unsigned long' instead of casting it
      to an 'unsigned long *' for use in for_each_set_bit().  The bitops helpers
      really do read the entire 'unsigned long', even though the walking of the
      read value is capped at the specified size.  I.e. 64-bit KVM is reading
      memory beyond ctxt->regs_dirty, which is a u32 and thus 4 bytes, whereas
      an unsigned long is 8 bytes.  Functionally it's not an issue because
      regs_dirty is in the middle of x86_emulate_ctxt, i.e. KVM is just reading
      its own memory, but relying on that coincidence is gross and unsafe.
      Reviewed-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220526210817.3428868-2-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      61d9c412
  2. 09 Jun, 2022 31 commits
  3. 08 Jun, 2022 6 commits
    • Paul Durrant's avatar
      KVM: x86: PIT: Preserve state of speaker port data bit · b1728622
      Paul Durrant authored
      Currently the state of the speaker port (0x61) data bit (bit 1) is not
      saved in the exported state (kvm_pit_state2) and hence is lost when
      re-constructing guest state.
      
      This patch removes the 'speaker_data_port' field from kvm_kpit_state and
      instead tracks the state using a new KVM_PIT_FLAGS_SPEAKER_DATA_ON flag
      defined in the API.
      Signed-off-by: default avatarPaul Durrant <pdurrant@amazon.com>
      Message-Id: <20220531124421.1427-1-pdurrant@amazon.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      b1728622
    • Sean Christopherson's avatar
      KVM: VMX: Reject kvm_intel if an inconsistent VMCS config is detected · 3dbec44d
      Sean Christopherson authored
      Add an on-by-default module param, error_on_inconsistent_vmcs_config, to
      allow rejecting the load of kvm_intel if an inconsistent VMCS config is
      detected.  Continuing on with an inconsistent, degraded config is
      undesirable in the vast majority of use cases, e.g. may result in a
      misconfigured VM, poor performance due to lack of fast MSR switching, or
      even security issues in the unlikely event the guest is relying on MPX.
      
      Practically speaking, an inconsistent VMCS config should never be
      encountered in a production quality environment, e.g. on bare metal it
      indicates a silicon defect (or a disturbing lack of validation by the
      hardware vendor), and in a virtualized machine (KVM as L1) it indicates a
      buggy/misconfigured L0 VMM/hypervisor.
      
      Provide a module param to override the behavior for testing purposes, or
      in the unlikely scenario that KVM is deployed on a flawed-but-usable CPU
      or virtual machine.
      
      Note, what is or isn't an inconsistency is somewhat subjective, e.g. one
      might argue that LOAD_EFER without SAVE_EFER is an inconsistency.  KVM's
      unofficial guideline for an "inconsistency" is either scenarios that are
      completely nonsensical, e.g. the existing checks on having EPT/VPID knobs
      without EPT/VPID, and/or scenarios that prevent KVM from virtualizing or
      utilizing a feature, e.g. the unpaired entry/exit controls checks.  Other
      checks that fall into one or both of the covered scenarios could be added
      in the future, e.g. asserting that a VMCS control exists available if and
      only if the associated feature is supported in bare metal.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220527170658.3571367-3-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      3dbec44d
    • Sean Christopherson's avatar
      KVM: VMX: Sanitize VM-Entry/VM-Exit control pairs at kvm_intel load time · f5a81d0e
      Sean Christopherson authored
      Sanitize the VM-Entry/VM-Exit control pairs (load+load or load+clear)
      during setup instead of checking both controls in a pair at runtime.  If
      only one control is supported, KVM will report the associated feature as
      not available, but will leave the supported control bit set in the VMCS
      config, which could lead to corruption of host state.  E.g. if only the
      VM-Entry control is supported and the feature is not dynamically toggled,
      KVM will set the control in all VMCSes and load zeros without restoring
      host state.
      
      Note, while this is technically a bug fix, practically speaking no sane
      CPU or VMM would support only one control.  KVM's behavior of checking
      both controls is mostly pedantry.
      
      Cc: Chenyi Qiang <chenyi.qiang@intel.com>
      Cc: Lei Wang <lei4.wang@intel.com>
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220527170658.3571367-2-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      f5a81d0e
    • Like Xu's avatar
      KVM: x86/pmu: Accept 0 for absent PMU MSRs when host-initiated if !enable_pmu · 8e6a58e2
      Like Xu authored
      Whenever an MSR is part of KVM_GET_MSR_INDEX_LIST, as is the case for
      MSR_K7_EVNTSEL0 or MSR_F15H_PERF_CTL0, it has to be always retrievable
      and settable with KVM_GET_MSR and KVM_SET_MSR.
      
      Accept a zero value for these MSRs to obey the contract.
      Signed-off-by: default avatarLike Xu <likexu@tencent.com>
      Message-Id: <20220601031925.59693-1-likexu@tencent.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      8e6a58e2
    • Like Xu's avatar
      KVM: x86/pmu: Restrict advanced features based on module enable_pmu · 6ef25aa0
      Like Xu authored
      Once vPMU is disabled, the KVM would not expose features like:
      PEBS (via clear kvm_pmu_cap.pebs_ept), legacy LBR and ARCH_LBR,
      CPUID 0xA leaf, PDCM bit and MSR_IA32_PERF_CAPABILITIES, plus
      PT_MODE_HOST_GUEST mode.
      
      What this group of features has in common is that their use
      relies on the underlying PMU counter and the host perf_event as a
      back-end resource requester or sharing part of the irq delivery path.
      Signed-off-by: default avatarLike Xu <likexu@tencent.com>
      Message-Id: <20220601031925.59693-2-likexu@tencent.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      6ef25aa0
    • Like Xu's avatar
      KVM: x86/pmu: Avoid exposing Intel BTS feature · b9181c8e
      Like Xu authored
      The BTS feature (including the ability to set the BTS and BTINT
      bits in the DEBUGCTL MSR) is currently unsupported on KVM.
      
      But we may try using the BTS facility on a PEBS enabled guest like this:
          perf record -e branches:u -c 1 -d ls
      and then we would encounter the following call trace:
      
       [] unchecked MSR access error: WRMSR to 0x1d9 (tried to write 0x00000000000003c0)
              at rIP: 0xffffffff810745e4 (native_write_msr+0x4/0x20)
       [] Call Trace:
       []  intel_pmu_enable_bts+0x5d/0x70
       []  bts_event_add+0x54/0x70
       []  event_sched_in+0xee/0x290
      
      As it lacks any CPUID indicator or perf_capabilities valid bit
      fields to prompt for this information, the platform would hint
      the Intel BTS feature unavailable to guest by setting the
      BTS_UNAVAIL bit in the IA32_MISC_ENABLE.
      Signed-off-by: default avatarLike Xu <likexu@tencent.com>
      Message-Id: <20220601031925.59693-3-likexu@tencent.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      b9181c8e