1. 06 Aug, 2018 5 commits
    • KVM: x86: ensure all MSRs can always be KVM_GET/SET_MSR'd · 44883f01
      Paolo Bonzini authored
      Some of the MSRs returned by GET_MSR_INDEX_LIST currently cannot be sent back
      to KVM_GET_MSR and/or KVM_SET_MSR; either they can never be sent back, or
      they are only accepted under special conditions.  This makes the API a pain to
      use.
      
      To avoid this pain, this patch makes it so that the result of the get-list
      ioctl can always be used for host-initiated get and set.  Since we don't have
      a separate way to check for read-only MSRs, this means some Hyper-V MSRs are
      ignored when written.  Arguably they should not even be in the result of
      GET_MSR_INDEX_LIST, but I am leaving them there in case userspace is using the
      outcome of GET_MSR_INDEX_LIST to derive the support for the corresponding
      Hyper-V feature.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      44883f01
    • KVM: vmx: remove save/restore of host BNDCFGS MSR · cf81a7e5
      Sean Christopherson authored
      Linux does not support Memory Protection Extensions (MPX) in the
      kernel itself, thus the BNDCFGS (Bound Config Supervisor) MSR will
      always be zero in the KVM host, i.e. RDMSR in vmx_save_host_state()
      is superfluous.  KVM unconditionally sets VM_EXIT_CLEAR_BNDCFGS,
      i.e. BNDCFGS will always be zero after VMEXIT, thus manually loading
      BNDCFGS is also superfluous.
      
      And in the event that MPX kernel support is added (unlikely given
      that MPX for userspace is in its death throes[1]), BNDCFGS will
      likely be common across all CPUs[2], and at the least shouldn't
      change on a regular basis, i.e. saving the MSR on every VMENTRY is
      completely unnecessary.
      
      WARN_ONCE in hardware_setup() if the host's BNDCFGS is non-zero to
      document that KVM does not preserve BNDCFGS and to serve as a hint
      as to how BNDCFGS likely should be handled if MPX is used in the
      kernel, e.g. BNDCFGS should be saved once during KVM setup.
      
      [1] https://lkml.org/lkml/2018/4/27/1046
      [2] http://www.openwall.com/lists/kernel-hardening/2017/07/24/28
      
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      cf81a7e5
    • KVM: Switch 'requests' to be 64-bit (explicitly) · 86dafed5
      KarimAllah Ahmed authored
      Switch 'requests' to be explicitly 64-bit and update BUILD_BUG_ON check to
      use the size of "requests" instead of the hard-coded '32'.
      
      That gives us a bit more room again for arch-specific requests as we
      already ran out of space for x86 due to the hard-coded check.
      
      The only exception here is ARM32 as it is still 32-bits.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      86dafed5
    • kvm: selftests: add cr4_cpuid_sync_test · ca359066
      Wei Huang authored
      KVM is supposed to update some guest VM's CPUID bits (e.g. OSXSAVE) when
      CR4 is changed. A bug was found in KVM recently and it was fixed by
      Commit c4d21882 ("KVM: x86: Update cpuid properly when CR4.OSXAVE or
      CR4.PKE is changed"). This patch adds a test to verify the synchronization
      between guest VM's CR4 and CPUID bits.
      Signed-off-by: Wei Huang <wei@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      ca359066
    • Merge tag 'v4.18-rc6' into HEAD · d2ce98ca
      Paolo Bonzini authored
      Pull bug fixes into the KVM development tree to avoid nasty conflicts.
      d2ce98ca
  2. 02 Aug, 2018 2 commits
  3. 30 Jul, 2018 15 commits
  4. 26 Jul, 2018 3 commits
    • KVM: PPC: Book3S HV: Read kvm->arch.emul_smt_mode under kvm->lock · b5c6f760
      Paul Mackerras authored
      Commit 1e175d2e ("KVM: PPC: Book3S HV: Pack VCORE IDs to access full
      VCPU ID space", 2018-07-25) added code that uses kvm->arch.emul_smt_mode
      before any VCPUs are created.  However, userspace can change
      kvm->arch.emul_smt_mode at any time up until the first VCPU is created.
      Hence it is (theoretically) possible for the check in
      kvmppc_core_vcpu_create_hv() to race with another userspace thread
      changing kvm->arch.emul_smt_mode.
      
      This fixes it by moving the test that uses kvm->arch.emul_smt_mode into
      the block where kvm->lock is held.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      b5c6f760
    • KVM: PPC: Book3S HV: Allow creating max number of VCPUs on POWER9 · 1ebe6b81
      Paul Mackerras authored
      Commit 1e175d2e ("KVM: PPC: Book3S HV: Pack VCORE IDs to access full
      VCPU ID space", 2018-07-25) allowed use of VCPU IDs up to
      KVM_MAX_VCPU_ID on POWER9 in all guest SMT modes and guest emulated
      hardware SMT modes.  However, with the current definition of
      KVM_MAX_VCPU_ID, a guest SMT mode of 1 and an emulated SMT mode of 8,
      it is only possible to create KVM_MAX_VCPUS / 2 VCPUS, because
      threads_per_subcore is 4 on POWER9 CPUs.  (Using an emulated SMT mode
      of 8 is useful when migrating VMs to or from POWER8 hosts.)
      
      This increases KVM_MAX_VCPU_ID to 8 * KVM_MAX_VCPUS when HV KVM is
      configured in, so that a full complement of KVM_MAX_VCPUS VCPUs can
      be created on POWER9 in all guest SMT modes and emulated hardware
      SMT modes.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      1ebe6b81
    • KVM: PPC: Book3S HV: Pack VCORE IDs to access full VCPU ID space · 1e175d2e
      Sam Bobroff authored
      It is not currently possible to create the full number of possible
      VCPUs (KVM_MAX_VCPUS) on Power9 with KVM-HV when the guest uses fewer
      threads per core than its core stride (or "VSMT mode"). This is
      because the VCORE ID and XIVE offsets grow beyond KVM_MAX_VCPUS
      even though the VCPU ID is less than KVM_MAX_VCPU_ID.
      
      To address this, "pack" the VCORE ID and XIVE offsets by using
      knowledge of the way the VCPU IDs will be used when there are fewer
      guest threads per core than the core stride. The primary thread of
      each core will always be used first. Then, if the guest uses more than
      one thread per core, these secondary threads will sequentially follow
      the primary in each core.
      
      So, the only way an ID above KVM_MAX_VCPUS can be seen, is if the
      VCPUs are being spaced apart, so at least half of each core is empty,
      and IDs between KVM_MAX_VCPUS and (KVM_MAX_VCPUS * 2) can be mapped
      into the second half of each core (4..7, in an 8-thread core).
      
      Similarly, if IDs above KVM_MAX_VCPUS * 2 are seen, at least 3/4 of
      each core is being left empty, and we can map down into the second and
      third quarters of each core (2, 3 and 5, 6 in an 8-thread core).
      
      Lastly, if IDs above KVM_MAX_VCPUS * 4 are seen, only the primary
      threads are being used and 7/8 of the core is empty, allowing use of
      the 1, 5, 3 and 7 thread slots.
      
      (Strides less than 8 are handled similarly.)
      
      This allows the VCORE ID or offset to be calculated quickly from the
      VCPU ID or XIVE server numbers, without access to the VCPU structure.
      
      [paulus@ozlabs.org - tidied up comment a little, changed some WARN_ONCE
       to pr_devel, wrapped line, fixed id check.]
      Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      1e175d2e
  5. 22 Jul, 2018 8 commits
  6. 21 Jul, 2018 7 commits
    • Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ea75a2c7
      Linus Torvalds authored
      Pull core kernel fixes from Ingo Molnar:
       "This is mostly the copy_to_user_mcsafe() related fixes from Dan
        Williams, and an ORC fix for Clang"
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/asm/memcpy_mcsafe: Fix copy_to_user_mcsafe() exception handling
        lib/iov_iter: Fix pipe handling in _copy_to_iter_mcsafe()
        lib/iov_iter: Document _copy_to_iter_flushcache()
        lib/iov_iter: Document _copy_to_iter_mcsafe()
        objtool: Use '.strtab' if '.shstrtab' doesn't exist, to support ORC tables on Clang
      ea75a2c7
    • Merge tag 'powerpc-4.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · ffb48e79
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "Two regression fixes, one for xmon disassembly formatting and the
        other to fix the E500 build.
      
        Two commits to fix a potential security issue in the VFIO code under
        obscure circumstances.
      
        And finally a fix to the Power9 idle code to restore SPRG3, which is
        user visible and used for sched_getcpu().
      
        Thanks to: Alexey Kardashevskiy, David Gibson, Gautham R. Shenoy,
        James Clarke"
      
      * tag 'powerpc-4.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/powernv: Fix save/restore of SPRG3 on entry/exit from stop (idle)
        powerpc/Makefile: Assemble with -me500 when building for E500
        KVM: PPC: Check if IOMMU page is contained in the pinned physical page
        vfio/spapr: Use IOMMU pageshift rather than pagesize
        powerpc/xmon: Fix disassembly since printf changes
      ffb48e79
    • Merge tag 'for-4.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 55b636b4
      Linus Torvalds authored
      Pull btrfs fix from David Sterba:
       "A fix of a corruption regarding fsync and clone, under some very
        specific conditions explained in the patch.
      
        The fix is marked for stable 3.16+ so I'd like to get it merged now
        given the impact"
      
      * tag 'for-4.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        Btrfs: fix file data corruption after cloning a range and fsync
      55b636b4
    • mm: make vm_area_alloc() initialize core fields · 490fc053
      Linus Torvalds authored
      Like vm_area_dup(), it initializes the anon_vma_chain head, and the
      basic mm pointer.
      
      The rest of the fields end up being different for different users,
      although the plan is to also initialize the 'vm_ops' field to a dummy
      entry.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      490fc053
    • mm: make vm_area_dup() actually copy the old vma data · 95faf699
      Linus Torvalds authored
      .. and re-initialize the anon_vma_chain head.
      
      This removes some boiler-plate from the users, and also makes it clear
      why it didn't need to use the 'zalloc()' version.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      95faf699
    • mm: use helper functions for allocating and freeing vm_area structs · 3928d4f5
      Linus Torvalds authored
      The vm_area_struct is one of the most fundamental memory management
      objects, but the management of it is entirely open-coded everywhere,
      ranging from allocation and freeing (using kmem_cache_[z]alloc and
      kmem_cache_free) to initializing all the fields.
      
      We want to unify this in order to end up having some unified
      initialization of the vmas, and the first step to this is to at least
      have basic allocation functions.
      
      Right now those functions are literally just wrappers around the
      kmem_cache_*() calls.  This is a purely mechanical conversion:
      
          # new vma:
          kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL) -> vm_area_alloc()
      
          # copy old vma
          kmem_cache_alloc(vm_area_cachep, GFP_KERNEL) -> vm_area_dup(old)
      
          # free vma
          kmem_cache_free(vm_area_cachep, vma) -> vm_area_free(vma)
      
      to the point where the old vma passed in to the vm_area_dup() function
      isn't even used yet (because I've left all the old manual initialization
      alone).
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3928d4f5
    • Merge branch 'akpm' (patches from Andrew) · 191a3afa
      Linus Torvalds authored
      Merge fixes from Andrew Morton:
       "5 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm: memcg: fix use after free in mem_cgroup_iter()
        mm/huge_memory.c: fix data loss when splitting a file pmd
        fat: fix memory allocation failure handling of match_strdup()
        MAINTAINERS: Peter has moved
        mm/memblock: add missing include <linux/bootmem.h>
      191a3afa