1. 20 Jul, 2020 5 commits
    • powerpc/mm/radix: Create separate mappings for hot-plugged memory · af9d00e9
      Aneesh Kumar K.V authored
      To enable memory unplug without splitting the kernel page table
      mapping, we force the max mapping size to the LMB size. The LMB
      size is the unit in which the hypervisor performs memory add/remove
      operations.
      
      Pseries systems support a max LMB size of 256MB. Hence on pseries,
      we now end up mapping memory with a 2M page size instead of 1G. To
      improve on that, we want the hypervisor to hint the kernel about the
      hotplug memory range. That hint was added as part of
      
      commit b6eca183 ("powerpc/kernel: Enables memory
      hot-remove after reboot on pseries guests")
      
      But PowerVM doesn't provide that hint yet. Once PowerVM is
      updated, we can force the 2M mapping only for the hot-pluggable
      memory region using memblock_is_hotpluggable(). Till then,
      let's depend on the LMB size to find the mapping page size
      for the linear range.
      
      With this change, KVM guests will also do linear mapping with a
      2M page size.
      
      The actual TLB benefit of mapping guest page table entries with a
      huge page size can only be realized if the partition-scoped
      entries also use the same or a higher page size. A guest using
      1G hugetlbfs to back guest memory can see a performance impact with
      the above change.
      Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      [mpe: Fold in fix from Aneesh spotted by lkp@intel.com]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200709131925.922266-5-aneesh.kumar@linux.ibm.com
    • powerpc/mm/radix: Remove split_kernel_mapping() · d6d6ebfc
      Bharata B Rao authored
      We split the page table mapping on memory unplug if the
      linear range was mapped with a huge page mapping (e.g. 1G).
      The page table splitting code has a few issues:
      
      1. Recursive locking
      --------------------
      Memory unplug path takes cpu_hotplug_lock and calls stop_machine()
      to split the mappings. However, stop_machine() takes
      cpu_hotplug_lock again, causing a deadlock.
      
      2. BUG: sleeping function called from in_atomic() context
      ---------------------------------------------------------
      Memory unplug path (remove_pagetable) takes the init_mm.page_table_lock
      spinlock and later calls stop_machine(), which does wait_for_completion().
      
      3. Bad unlock unbalance
      -----------------------
      Memory unplug path takes the init_mm.page_table_lock spinlock and calls
      stop_machine(). The stop_machine thread function runs in a different
      thread context (migration thread), which tries to release and reacquire
      the ptl. Releasing the ptl from a different thread than the one that
      acquired it causes a bad unlock unbalance.
      
      These problems can be avoided if we avoid mapping hot-plugged memory
      with a 1G mapping, thereby removing the need to split the mappings
      during unplug. The kernel always makes sure the minimum unplug request
      is SUBSECTION_SIZE for device memory and SECTION_SIZE for regular memory.
      
      In preparation for such a change remove page table splitting support.
      
      This essentially is a revert of
      commit 4dd5f8a9 ("powerpc/mm/radix: Split linear mapping on hot-unplug")
      Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200709131925.922266-4-aneesh.kumar@linux.ibm.com
    • powerpc/mm/radix: Free PUD table when freeing pagetable · 9ce8853b
      Bharata B Rao authored
      remove_pagetable() isn't freeing the PUD table. This causes a memory
      leak during memory unplug. Fix this.
      
      Fixes: 4b5d62ca ("powerpc/mm: add radix__remove_section_mapping()")
      Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200709131925.922266-3-aneesh.kumar@linux.ibm.com
    • powerpc/mm/radix: Fix PTE/PMD fragment count for early page table mappings · 645d5ce2
      Aneesh Kumar K.V authored
      We can hit the following BUG_ON during memory unplug:
      
      kernel BUG at arch/powerpc/mm/book3s64/pgtable.c:342!
      Oops: Exception in kernel mode, sig: 5 [#1]
      LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
      NIP [c000000000093308] pmd_fragment_free+0x48/0xc0
      LR [c00000000147bfec] remove_pagetable+0x578/0x60c
      Call Trace:
      0xc000008050000000 (unreliable)
      remove_pagetable+0x384/0x60c
      radix__remove_section_mapping+0x18/0x2c
      remove_section_mapping+0x1c/0x3c
      arch_remove_memory+0x11c/0x180
      try_remove_memory+0x120/0x1b0
      __remove_memory+0x20/0x40
      dlpar_remove_lmb+0xc0/0x114
      dlpar_memory+0x8b0/0xb20
      handle_dlpar_errorlog+0xc0/0x190
      pseries_hp_work_fn+0x2c/0x60
      process_one_work+0x30c/0x810
      worker_thread+0x98/0x540
      kthread+0x1c4/0x1d0
      ret_from_kernel_thread+0x5c/0x74
      
      This occurs when unplug is attempted for memory that was mapped
      using memblock pages as part of early kernel page table setup.
      We wouldn't have initialized the PMD or PTE fragment count for
      those PMD or PTE pages.
      
      This can be fixed by allocating memory with PAGE_SIZE granularity
      during early page table allocation. This makes sure a specific
      page is not shared by another memblock allocation and that we can
      free it correctly when removing page-table pages.
      
      Since we now do PAGE_SIZE allocations for both PUD table and
      PMD table (Note that PTE table allocation is already of PAGE_SIZE),
      we end up allocating more memory for the same amount of system RAM.
      Here is a comparison of how much more we need for a 64T and a 2G
      system after this patch:
      
      1. 64T system
      -------------
      64T RAM would need 64G for vmemmap with struct page size being 64B.
      
      128 PUD tables for 64T memory (1G mappings)
      1 PUD table and 64 PMD tables for 64G vmemmap (2M mappings)
      
      With default PUD[PMD]_TABLE_SIZE(4K), (128+1+64)*4K=772K
      With PAGE_SIZE(64K) table allocations, (128+1+64)*64K=12352K
      
      2. 2G system
      ------------
      2G RAM would need 2M for vmemmap with struct page size being 64B.
      
      1 PUD table for 2G memory (1G mapping)
      1 PUD table and 1 PMD table for 2M vmemmap (2M mappings)
      
      With default PUD[PMD]_TABLE_SIZE(4K), (1+1+1)*4K=12K
      With new PAGE_SIZE(64K) table allocations, (1+1+1)*64K=192K
      Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200709131925.922266-2-aneesh.kumar@linux.ibm.com
    • powerpc/prom: Enable Radix GTSE in cpu pa-features · 9a77c4a0
      Nicholas Piggin authored
      When commit 029ab30b ("powerpc/mm: Enable radix GTSE only if supported.")
      made GTSE an MMU feature, it was enabled by default in
      powerpc-cpu-features but was missed in pa-features. This causes random
      memory corruption during boot of PowerNV kernels where
      CONFIG_PPC_DT_CPU_FTRS isn't enabled.
      
      Fixes: 029ab30b ("powerpc/mm: Enable radix GTSE only if supported.")
      Reported-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
      [mpe: Unwrap long line]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200720044258.863574-1-bharata@linux.ibm.com
  2. 18 Jul, 2020 1 commit
  3. 16 Jul, 2020 34 commits