1. 06 Aug, 2018 4 commits
  2. 02 Aug, 2018 2 commits
  3. 30 Jul, 2018 15 commits
  4. 26 Jul, 2018 3 commits
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Read kvm->arch.emul_smt_mode under kvm->lock · b5c6f760
      Paul Mackerras authored
      Commit 1e175d2e ("KVM: PPC: Book3S HV: Pack VCORE IDs to access full
      VCPU ID space", 2018-07-25) added code that uses kvm->arch.emul_smt_mode
      before any VCPUs are created.  However, userspace can change
      kvm->arch.emul_smt_mode at any time up until the first VCPU is created.
      Hence it is (theoretically) possible for the check in
      kvmppc_core_vcpu_create_hv() to race with another userspace thread
      changing kvm->arch.emul_smt_mode.
      
      This fixes it by moving the test that uses kvm->arch.emul_smt_mode into
      the block where kvm->lock is held.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      b5c6f760
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Allow creating max number of VCPUs on POWER9 · 1ebe6b81
      Paul Mackerras authored
      Commit 1e175d2e ("KVM: PPC: Book3S HV: Pack VCORE IDs to access full
      VCPU ID space", 2018-07-25) allowed use of VCPU IDs up to
      KVM_MAX_VCPU_ID on POWER9 in all guest SMT modes and guest emulated
      hardware SMT modes.  However, with the current definition of
      KVM_MAX_VCPU_ID, a guest SMT mode of 1 and an emulated SMT mode of 8,
      it is only possible to create KVM_MAX_VCPUS / 2 VCPUS, because
      threads_per_subcore is 4 on POWER9 CPUs.  (Using an emulated SMT mode
      of 8 is useful when migrating VMs to or from POWER8 hosts.)
      
      This increases KVM_MAX_VCPU_ID to 8 * KVM_MAX_VCPUS when HV KVM is
      configured in, so that a full complement of KVM_MAX_VCPUS VCPUs can
      be created on POWER9 in all guest SMT modes and emulated hardware
      SMT modes.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      1ebe6b81
    • Sam Bobroff's avatar
      KVM: PPC: Book3S HV: Pack VCORE IDs to access full VCPU ID space · 1e175d2e
      Sam Bobroff authored
      It is not currently possible to create the full number of possible
      VCPUs (KVM_MAX_VCPUS) on Power9 with KVM-HV when the guest uses fewer
      threads per core than its core stride (or "VSMT mode"). This is
      because the VCORE ID and XIVE offsets grow beyond KVM_MAX_VCPUS
      even though the VCPU ID is less than KVM_MAX_VCPU_ID.
      
      To address this, "pack" the VCORE ID and XIVE offsets by using
      knowledge of the way the VCPU IDs will be used when there are fewer
      guest threads per core than the core stride. The primary thread of
      each core will always be used first. Then, if the guest uses more than
      one thread per core, these secondary threads will sequentially follow
      the primary in each core.
      
      So, the only way an ID above KVM_MAX_VCPUS can be seen, is if the
      VCPUs are being spaced apart, so at least half of each core is empty,
      and IDs between KVM_MAX_VCPUS and (KVM_MAX_VCPUS * 2) can be mapped
      into the second half of each core (4..7, in an 8-thread core).
      
      Similarly, if IDs above KVM_MAX_VCPUS * 2 are seen, at least 3/4 of
      each core is being left empty, and we can map down into the second and
      third quarters of each core (2, 3 and 5, 6 in an 8-thread core).
      
      Lastly, if IDs above KVM_MAX_VCPUS * 4 are seen, only the primary
      threads are being used and 7/8 of the core is empty, allowing use of
      the 1, 5, 3 and 7 thread slots.
      
      (Strides less than 8 are handled similarly.)
      
      This allows the VCORE ID or offset to be calculated quickly from the
      VCPU ID or XIVE server numbers, without access to the VCPU structure.
      
      [paulus@ozlabs.org - tidied up comment a little, changed some WARN_ONCE
       to pr_devel, wrapped line, fixed id check.]
      Signed-off-by: default avatarSam Bobroff <sam.bobroff@au1.ibm.com>
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      1e175d2e
  5. 22 Jul, 2018 8 commits
  6. 21 Jul, 2018 8 commits
    • Linus Torvalds's avatar
      Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ea75a2c7
      Linus Torvalds authored
      Pull core kernel fixes from Ingo Molnar:
       "This is mostly the copy_to_user_mcsafe() related fixes from Dan
        Williams, and an ORC fix for Clang"
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/asm/memcpy_mcsafe: Fix copy_to_user_mcsafe() exception handling
        lib/iov_iter: Fix pipe handling in _copy_to_iter_mcsafe()
        lib/iov_iter: Document _copy_to_iter_flushcache()
        lib/iov_iter: Document _copy_to_iter_mcsafe()
        objtool: Use '.strtab' if '.shstrtab' doesn't exist, to support ORC tables on Clang
      ea75a2c7
    • Linus Torvalds's avatar
      Merge tag 'powerpc-4.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · ffb48e79
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "Two regression fixes, one for xmon disassembly formatting and the
        other to fix the E500 build.
      
        Two commits to fix a potential security issue in the VFIO code under
        obscure circumstances.
      
        And finally a fix to the Power9 idle code to restore SPRG3, which is
        user visible and used for sched_getcpu().
      
        Thanks to: Alexey Kardashevskiy, David Gibson. Gautham R. Shenoy,
        James Clarke"
      
      * tag 'powerpc-4.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/powernv: Fix save/restore of SPRG3 on entry/exit from stop (idle)
        powerpc/Makefile: Assemble with -me500 when building for E500
        KVM: PPC: Check if IOMMU page is contained in the pinned physical page
        vfio/spapr: Use IOMMU pageshift rather than pagesize
        powerpc/xmon: Fix disassembly since printf changes
      ffb48e79
    • Linus Torvalds's avatar
      Merge tag 'for-4.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 55b636b4
      Linus Torvalds authored
      Pull btrfs fix from David Sterba:
       "A fix of a corruption regarding fsync and clone, under some very
        specific conditions explained in the patch.
      
        The fix is marked for stable 3.16+ so I'd like to get it merged now
        given the impact"
      
      * tag 'for-4.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        Btrfs: fix file data corruption after cloning a range and fsync
      55b636b4
    • Linus Torvalds's avatar
      mm: make vm_area_alloc() initialize core fields · 490fc053
      Linus Torvalds authored
      Like vm_area_dup(), it initializes the anon_vma_chain head, and the
      basic mm pointer.
      
      The rest of the fields end up being different for different users,
      although the plan is to also initialize the 'vm_ops' field to a dummy
      entry.
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      490fc053
    • Linus Torvalds's avatar
      mm: make vm_area_dup() actually copy the old vma data · 95faf699
      Linus Torvalds authored
      .. and re-initialize th eanon_vma_chain head.
      
      This removes some boiler-plate from the users, and also makes it clear
      why it didn't need use the 'zalloc()' version.
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      95faf699
    • Linus Torvalds's avatar
      mm: use helper functions for allocating and freeing vm_area structs · 3928d4f5
      Linus Torvalds authored
      The vm_area_struct is one of the most fundamental memory management
      objects, but the management of it is entirely open-coded evertwhere,
      ranging from allocation and freeing (using kmem_cache_[z]alloc and
      kmem_cache_free) to initializing all the fields.
      
      We want to unify this in order to end up having some unified
      initialization of the vmas, and the first step to this is to at least
      have basic allocation functions.
      
      Right now those functions are literally just wrappers around the
      kmem_cache_*() calls.  This is a purely mechanical conversion:
      
          # new vma:
          kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL) -> vm_area_alloc()
      
          # copy old vma
          kmem_cache_alloc(vm_area_cachep, GFP_KERNEL) -> vm_area_dup(old)
      
          # free vma
          kmem_cache_free(vm_area_cachep, vma) -> vm_area_free(vma)
      
      to the point where the old vma passed in to the vm_area_dup() function
      isn't even used yet (because I've left all the old manual initialization
      alone).
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3928d4f5
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · 191a3afa
      Linus Torvalds authored
      Merge fixes from Andrew Morton:
       "5 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm: memcg: fix use after free in mem_cgroup_iter()
        mm/huge_memory.c: fix data loss when splitting a file pmd
        fat: fix memory allocation failure handling of match_strdup()
        MAINTAINERS: Peter has moved
        mm/memblock: add missing include <linux/bootmem.h>
      191a3afa
    • Jing Xia's avatar
      mm: memcg: fix use after free in mem_cgroup_iter() · 9f15bde6
      Jing Xia authored
      It was reported that a kernel crash happened in mem_cgroup_iter(), which
      can be triggered if the legacy cgroup-v1 non-hierarchical mode is used.
      
      Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b8f
      ......
      Call trace:
        mem_cgroup_iter+0x2e0/0x6d4
        shrink_zone+0x8c/0x324
        balance_pgdat+0x450/0x640
        kswapd+0x130/0x4b8
        kthread+0xe8/0xfc
        ret_from_fork+0x10/0x20
      
        mem_cgroup_iter():
            ......
            if (css_tryget(css))    <-- crash here
      	    break;
            ......
      
      The crashing reason is that mem_cgroup_iter() uses the memcg object whose
      pointer is stored in iter->position, which has been freed before and
      filled with POISON_FREE(0x6b).
      
      And the root cause of the use-after-free issue is that
      invalidate_reclaim_iterators() fails to reset the value of iter->position
      to NULL when the css of the memcg is released in non- hierarchical mode.
      
      Link: http://lkml.kernel.org/r/1531994807-25639-1-git-send-email-jing.xia@unisoc.com
      Fixes: 6df38689 ("mm: memcontrol: fix possible memcg leak due to interrupted reclaim")
      Signed-off-by: default avatarJing Xia <jing.xia.mail@gmail.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: <chunyan.zhang@unisoc.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9f15bde6