1. 04 Sep, 2024 40 commits
    • mm: shmem: split large entry if the swapin folio is not large · 12885cbe
      Baolin Wang authored
      Currently the swap device can only swap in order-0 folios, even when a
      large folio was swapped out.  This requires us to split the large entry
      previously saved in the shmem pagecache to support the swap-in of small
      folios.
      
      [hughd@google.com: fix warnings from kmalloc_fix_flags()]
        Link: https://lkml.kernel.org/r/e2a2ba5d-864c-50aa-7579-97cba1c7dd0c@google.com
      [baolin.wang@linux.alibaba.com: drop the 'new_order' parameter]
        Link: https://lkml.kernel.org/r/39c71ccf-669b-4d9f-923c-f6b9c4ceb8df@linux.alibaba.com
      Link: https://lkml.kernel.org/r/4a0f12f27c54a62eb4d9ca1265fed3a62531a63e.1723434324.git.baolin.wang@linux.alibaba.com
      Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Barry Song <baohua@kernel.org>
      Cc: Chris Li <chrisl@kernel.org>
      Cc: Daniel Gomez <da.gomez@samsung.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Lance Yang <ioworker0@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Pankaj Raghav <p.raghav@samsung.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: shmem: drop folio reference count using 'nr_pages' in shmem_delete_from_page_cache() · 872339c3
      Baolin Wang authored
      To support large folio swapin/swapout for shmem in the following patches,
      drop the folio's reference count by the number of pages contained in the
      folio when a shmem folio is deleted from shmem pagecache after adding into
      swap cache.
      
      Link: https://lkml.kernel.org/r/b371eadb27f42fc51261c51008fbb9a334985b4c.1723434324.git.baolin.wang@linux.alibaba.com
      Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: Barry Song <baohua@kernel.org>
      Cc: Chris Li <chrisl@kernel.org>
      Cc: Daniel Gomez <da.gomez@samsung.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Lance Yang <ioworker0@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Pankaj Raghav <p.raghav@samsung.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: shmem: support large folio allocation for shmem_replace_folio() · 736f0e03
      Baolin Wang authored
      To support large folio swapin for shmem in the following patches, add
      large folio allocation for the new replacement folio in
      shmem_replace_folio().  Moreover, large folios occupy N consecutive entries
      in the swap cache instead of using multi-index entries like the page
      cache, so we should replace each of the consecutive entries in the swap
      cache instead of using shmem_replace_entry().
      
      Also update the statistics and the folio reference count using the number
      of pages in the folio.
      
      [baolin.wang@linux.alibaba.com: fix the gfp flag for large folio allocation]
        Link: https://lkml.kernel.org/r/5b1e9c5a-7f61-4d97-a8d7-41767ca04c77@linux.alibaba.com
      [baolin.wang@linux.alibaba.com: fix build without CONFIG_TRANSPARENT_HUGEPAGE]
        Link: https://lkml.kernel.org/r/8c03467c-63b2-43b4-9851-222d4188725c@linux.alibaba.com
      Link: https://lkml.kernel.org/r/a41138ecc857ef13e7c5ffa0174321e9e2c9970a.1723434324.git.baolin.wang@linux.alibaba.com
      Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: Barry Song <baohua@kernel.org>
      Cc: Chris Li <chrisl@kernel.org>
      Cc: Daniel Gomez <da.gomez@samsung.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Lance Yang <ioworker0@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Pankaj Raghav <p.raghav@samsung.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: shmem: use swap_free_nr() to free shmem swap entries · 40ff2d11
      Baolin Wang authored
      As a preparation for supporting shmem large folio swapout, use
      swap_free_nr() to free the contiguous swap entries of the shmem large
      folio when the large folio was swapped in from the swap cache.  In
      addition, the index should also be rounded down to the number of pages
      when adding the swapin folio into the pagecache.
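
The index adjustment can be illustrated with a minimal userspace sketch in C. This is not the kernel code: `round_down_index()` is a hypothetical helper that assumes the folio spans a power-of-two number of pages, which is the case the kernel's `round_down()` macro handles.

```c
#include <assert.h>

/*
 * Hypothetical helper (not a kernel function): align a page-cache index
 * down to the first index of a folio that spans nr_pages pages.
 * nr_pages must be a non-zero power of two.
 */
static unsigned long round_down_index(unsigned long index,
				      unsigned long nr_pages)
{
	assert(nr_pages != 0 && (nr_pages & (nr_pages - 1)) == 0);
	return index & ~(nr_pages - 1);
}
```

With a 16-page folio, a fault at index 19 maps to the folio that starts at index 16, so `round_down_index(19, 16)` yields 16. An order-0 folio leaves the index unchanged.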
      
      Link: https://lkml.kernel.org/r/342207fa679fc88a447dac2e101ad79e6050fe79.1723434324.git.baolin.wang@linux.alibaba.com
      Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: Barry Song <baohua@kernel.org>
      Cc: Chris Li <chrisl@kernel.org>
      Cc: Daniel Gomez <da.gomez@samsung.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Lance Yang <ioworker0@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Pankaj Raghav <p.raghav@samsung.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: filemap: use xa_get_order() to get the swap entry order · fb724159
      Baolin Wang authored
      In the following patches, shmem will support the swap out of large folios,
      which means the shmem mappings may contain large order swap entries, so
      use xa_get_order() to get the folio order of the shmem swap entry and
      update the '*start' correctly.
      
      [hughd@google.com: use xa_get_order() to get the swap entry order]
        Link: https://lkml.kernel.org/r/c336e6e4-da7f-b714-c0f1-12df715f2611@google.com
      Link: https://lkml.kernel.org/r/6876d55145c1cc80e79df7884aa3a62e397b101d.1723434324.git.baolin.wang@linux.alibaba.com
      Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Barry Song <baohua@kernel.org>
      Cc: Chris Li <chrisl@kernel.org>
      Cc: Daniel Gomez <da.gomez@samsung.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Lance Yang <ioworker0@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Pankaj Raghav <p.raghav@samsung.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: shmem: return number of pages being freed in shmem_free_swap · 6ea0d1cc
      Daniel Gomez authored
      Both shmem_free_swap callers expect the number of pages being freed.  In
      the large folios context, this needs to support values other than 0
      (used as 1 page being freed) and -ENOENT (used as 0 pages being freed).
      In preparation for large folios adoption, make the shmem_free_swap routine
      return the number of pages being freed.  So, returning 0 in this context
      means 0 pages being freed.
      
      While we are at it, change to use free_swap_and_cache_nr() to free large
      order swap entries (this part by Baolin Wang).
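
The new return convention can be sketched with a small userspace model in C. The struct and function below are hypothetical stand-ins, not the kernel's shmem_free_swap(); they only show a page count replacing the old 0 / -ENOENT pair.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one page-cache slot that may hold a swap entry. */
struct slot_model {
	bool present;        /* false: the entry is no longer there */
	unsigned int order;  /* log2 of the number of pages the entry covers */
};

/*
 * Returns the number of pages freed: 0 if there was nothing to free,
 * otherwise 1 << order. No negative errno is returned.
 */
static long free_swap_model(struct slot_model *slot)
{
	assert(slot != NULL);
	if (!slot->present)
		return 0;
	slot->present = false;
	return 1L << slot->order;
}
```

A caller can then add the return value directly to its running total of freed pages, whatever the order of the entry.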
      
      Link: https://lkml.kernel.org/r/9623e863c83d749d5ab407f6fdf0a8e5a3bdf052.1723434324.git.baolin.wang@linux.alibaba.com
      Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
      Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Suggested-by: Matthew Wilcox <willy@infradead.org>
      Cc: Barry Song <baohua@kernel.org>
      Cc: Chris Li <chrisl@kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Lance Yang <ioworker0@gmail.com>
      Cc: Pankaj Raghav <p.raghav@samsung.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: shmem: extend shmem_partial_swap_usage() to support large folio swap · 50f381ec
      Baolin Wang authored
      To support shmem large folio swapout in the following patches, use
      xa_get_order() to get the order of the swap entry to calculate the swap
      usage of shmem.
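
The accounting change can be sketched as follows. This is a userspace model, not the kernel implementation: the array of orders stands in for what xa_get_order() would report for each swap entry in the range, and `PAGE_SIZE_MODEL` is an assumed 4 KiB page size.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_MODEL 4096UL  /* assumed page size, for illustration only */

/*
 * Hypothetical helper: orders[i] is the order of the i-th swap entry
 * found in the range. An order-N entry accounts for 1 << N pages, not 1.
 */
static unsigned long swap_usage_bytes(const unsigned int *orders, size_t n)
{
	unsigned long pages = 0;

	assert(orders != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		pages += 1UL << orders[i];
	return pages * PAGE_SIZE_MODEL;
}
```

Two order-0 entries and one order-4 entry add up to 18 pages. Counting one page per entry would report only 3.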
      
      Link: https://lkml.kernel.org/r/60b130b9fc3e422bb91293a172c2113c85e9233a.1723434324.git.baolin.wang@linux.alibaba.com
      Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: Barry Song <baohua@kernel.org>
      Cc: Chris Li <chrisl@kernel.org>
      Cc: Daniel Gomez <da.gomez@samsung.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Lance Yang <ioworker0@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Pankaj Raghav <p.raghav@samsung.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: swap: extend swap_shmem_alloc() to support batch SWAP_MAP_SHMEM flag setting · 65018076
      Baolin Wang authored
      Patch series "support large folio swap-out and swap-in for shmem", v5.
      
      Shmem will support large folio allocation [1] [2] to get better
      performance.  However, memory reclaim still splits the precious large
      folios when trying to swap out shmem, which may lead to memory
      fragmentation and fails to take advantage of large folios for
      shmem.
      
      Moreover, the swap code already supports swapping out large folios
      without splitting, and the large folio swap-in [3] series is queued into
      the mm-unstable branch.  Hence this patch set also supports large folio
      swap-out and swap-in for shmem.
      
      
      This patch (of 9):
      
      To support shmem large folio swap operations, add a new parameter to
      swap_shmem_alloc() that allows batch SWAP_MAP_SHMEM flag setting for shmem
      swap entries.
      
      While we are at it, use folio_nr_pages() to get the number of pages of
      the folio as a preparation.
      
      Link: https://lkml.kernel.org/r/cover.1723434324.git.baolin.wang@linux.alibaba.com
      Link: https://lkml.kernel.org/r/99f64115d04b285e009580eb177352c57119ffd0.1723434324.git.baolin.wang@linux.alibaba.com
      Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Reviewed-by: Barry Song <baohua@kernel.org>
      Cc: Chris Li <chrisl@kernel.org>
      Cc: Daniel Gomez <da.gomez@samsung.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Lance Yang <ioworker0@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Pankaj Raghav <p.raghav@samsung.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: attempt to batch free swap entries for zap_pte_range() · bea67dcc
      Barry Song authored
      Zhiguo reported that swap release could be a serious bottleneck during
      process exits[1].  With mTHP, we have the opportunity to batch free swaps.
      
      Thanks to the work of Chris and Kairui[2], I was able to achieve this
      optimization with minimal code changes by building on their efforts.
      
      If swap_count is 1, which is likely true as most anon memory is private,
      we can free all contiguous swap slots together.
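
The condition can be sketched with a simplified userspace model. It is not the kernel code: `can_batch_free()` and its array of per-slot counts are hypothetical, and the model assumes that any slot with a count other than 1 rules out freeing the run as one batch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model: counts[i] is the swap count of the i-th slot in a
 * contiguous run of nr slots. The run qualifies for a single batched free
 * only if every slot has exactly one user.
 */
static bool can_batch_free(const unsigned char *counts, size_t nr)
{
	assert(counts != NULL);
	for (size_t i = 0; i < nr; i++) {
		if (counts[i] != 1)
			return false;
	}
	return nr > 0;
}
```

For a 64KiB mTHP backed by 16 slots that all have a count of 1, the check passes once and the 16 slots can be released together rather than one by one.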
      
      Ran the below test program for measuring the bandwidth of munmap
      using zRAM and 64KiB mTHP:
      
       #include <stdio.h>
       #include <stdlib.h>
       #include <string.h>
       #include <sys/mman.h>
       #include <sys/time.h>
      
       #define SIZE (1024UL * 1024 * 1024)
      
       unsigned long long tv_to_ms(struct timeval tv)
       {
              return tv.tv_sec * 1000ULL + tv.tv_usec / 1000;
       }
      
       int main(void)
       {
              struct timeval tv_b, tv_e;
              void *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
              /* mmap() reports failure with MAP_FAILED, not NULL */
              if (p == MAP_FAILED) {
                      perror("fail to get memory");
                      exit(-1);
              }
      
              madvise(p, SIZE, MADV_HUGEPAGE);
              memset(p, 0x11, SIZE); /* write to get mem */
      
              madvise(p, SIZE, MADV_PAGEOUT);
      
              gettimeofday(&tv_b, NULL);
              munmap(p, SIZE);
              gettimeofday(&tv_e, NULL);
      
              printf("munmap in bandwidth: %llu bytes/ms\n",
                              SIZE/(tv_to_ms(tv_e) - tv_to_ms(tv_b)));
              return 0;
       }
      
      The results are as follows (munmap bandwidth, in bytes/ms):
                      mm-unstable  mm-unstable-with-patch
         round1       21053761      63161283
         round2       21053761      63161283
         round3       21053761      63161283
         round4       20648881      67108864
         round5       20648881      67108864
      
      munmap bandwidth becomes 3X faster.
      
      [1] https://lore.kernel.org/linux-mm/20240731133318.527-1-justinjiang@vivo.com/
      [2] https://lore.kernel.org/linux-mm/20240730-swap-allocator-v5-0-cb9c148b9297@kernel.org/
      
      [v-songbaohua@oppo.com: check all swaps belong to same swap_cgroup in swap_pte_batch()]
        Link: https://lkml.kernel.org/r/20240815215308.55233-1-21cnbao@gmail.com
      [hughd@google.com: add mem_cgroup_disabled() check]
        Link: https://lkml.kernel.org/r/33f34a88-0130-5444-9b84-93198eeb50e7@google.com
      [21cnbao@gmail.com: add missing zswap_invalidate()]
        Link: https://lkml.kernel.org/r/20240821054921.43468-1-21cnbao@gmail.com
      Link: https://lkml.kernel.org/r/20240807215859.57491-3-21cnbao@gmail.com
      Signed-off-by: Barry Song <v-songbaohua@oppo.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Kairui Song <kasong@tencent.com>
      Cc: Chris Li <chrisl@kernel.org>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Kalesh Singh <kaleshsingh@google.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Barry Song <baohua@kernel.org>
      Cc: Yosry Ahmed <yosryahmed@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: rename instances of swap_info_struct to meaningful 'si' · b85508d7
      Barry Song authored
      Patch series "mm: batch free swaps for zap_pte_range()", v3.
      
      Batch free swap slots for zap_pte_range(), making munmap three times
      faster when the page table entries are filled with swap entries to
      be freed. This is likely another advantage of using mTHP.
      
      
      This patch (of 3):
      
      "p" only means "pointer to something", so rename it to a more meaningful
      identifier, "si".  We also have a case with the name "sis"; rename it to
      "si" as well.
      
      Link: https://lkml.kernel.org/r/20240807215859.57491-1-21cnbao@gmail.com
      Link: https://lkml.kernel.org/r/20240807215859.57491-2-21cnbao@gmail.com
      Signed-off-by: Barry Song <v-songbaohua@oppo.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Chris Li <chrisl@kernel.org>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Kairui Song <kasong@tencent.com>
      Cc: Kalesh Singh <kaleshsingh@google.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Zhiguo Jiang <justinjiang@vivo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • docs: move numa=fake description to kernel-parameters.txt · 101d6470
      Mike Rapoport (Microsoft) authored
      NUMA emulation can now be enabled on arm64 and riscv in addition to x86.
      
      Move the description of the numa=fake parameters from the x86 documentation
      to admin-guide/kernel-parameters.txt.
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-27-rppt@kernel.org
      Suggested-by: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: make range-to-target_node lookup facility a part of numa_memblks · 1b5695b0
      Mike Rapoport (Microsoft) authored
      The x86 implementation of range-to-target_node lookup (i.e. 
      phys_to_target_node() and memory_add_physaddr_to_nid()) relies on
      numa_memblks.
      
      Since numa_memblks are now part of the generic code, move these functions
      from x86 to mm/numa_memblks.c and select CONFIG_NUMA_KEEP_MEMINFO when
      CONFIG_NUMA_MEMBLKS=y for dax and cxl.
      
      [rppt@kernel.org: fix build]
        Link: https://lkml.kernel.org/r/ZtVfSt_zloPdDqVB@kernel.org
      Link: https://lkml.kernel.org/r/20240807064110.1003856-26-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • arch_numa: switch over to numa_memblks · 76750765
      Mike Rapoport (Microsoft) authored
      Until now arch_numa was directly translating firmware NUMA information
      to memblock.
      
      Using numa_memblks as an intermediate step has a few advantages:
      * alignment with the more battle-tested x86 implementation
      * availability of NUMA emulation
      * maintaining node information for not yet populated memory
      
      Adjust a few places in numa_memblks to compile with 32-bit phys_addr_t and
      replace current functionality related to numa_add_memblk() and
      __node_distance() in arch_numa with the implementation based on
      numa_memblks and add functions required by numa_emulation.
      
      [rppt@kernel.org: fix section mismatch]
        Link: https://lkml.kernel.org/r/ZrO6cExVz1He_yPn@kernel.org
      [rppt@kernel.org: PFN_PHYS() translation is unnecessary here]
        Link: https://lkml.kernel.org/r/Zs2T5wkSYO9MGcab@kernel.org
      Link: https://lkml.kernel.org/r/20240807064110.1003856-25-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • of, numa: return -EINVAL when no numa-node-id is found · 7e488677
      Mike Rapoport (Microsoft) authored
      Currently of_numa_parse_memory_nodes() returns 0 if no "memory" node in
      device tree contains "numa-node-id" property.  This makes of_numa_init()
      return "success" even though no NUMA nodes were actually parsed and set up.
      
      arch_numa works around this by returning an error if numa_nodes_parsed is
      empty.
      
      numa_memblks, however, would WARN() in such a case, and since it will be
      used by arch_numa shortly, such a warning is not desirable.
      
      Make sure of_numa_init() returns -EINVAL when no NUMA node information was
      found in the device tree.
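
The intended behaviour can be sketched with a userspace model in C. This is not the kernel's device tree parser: `parse_memory_nodes_model()` is a hypothetical function, and a negative array element is an assumed encoding for a "memory" node that lacks the "numa-node-id" property.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical model: node_ids[i] is the "numa-node-id" of the i-th
 * "memory" node, or a negative value when the property is missing.
 * Returns 0 if at least one node id was found, -EINVAL otherwise.
 */
static int parse_memory_nodes_model(const int *node_ids, size_t n)
{
	int found = 0;

	assert(node_ids != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (node_ids[i] >= 0)
			found = 1;
	}
	return found ? 0 : -EINVAL;
}
```

With this convention a caller can treat the error as "no NUMA information, use the fallback" instead of proceeding with an empty set of parsed nodes.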
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-24-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: numa_memblks: use memblock_{start,end}_of_DRAM() when sanitizing meminfo · f7feea28
      Mike Rapoport (Microsoft) authored
      numa_cleanup_meminfo() moves blocks outside system RAM to
      numa_reserved_meminfo and it uses 0 and PFN_PHYS(max_pfn) to determine the
      memory boundaries.
      
      Replace the memory range boundaries with more portable
      memblock_start_of_DRAM() and memblock_end_of_DRAM().
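
Why the lower bound matters can be shown with a small userspace model. It is not the kernel's numa_cleanup_meminfo(): `struct blk_model` and `clamp_to_dram()` are hypothetical, and the DRAM bounds are passed in as plain numbers where the kernel would query memblock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a memory block covering [start, end). */
struct blk_model {
	unsigned long long start;
	unsigned long long end;
};

/*
 * Clamp blk to [dram_start, dram_end). Returns 1 if a non-empty part of
 * the block lies inside RAM (blk is updated to that part), or 0 if the
 * block lies entirely outside RAM (blk is left untouched).
 */
static int clamp_to_dram(struct blk_model *blk,
			 unsigned long long dram_start,
			 unsigned long long dram_end)
{
	unsigned long long s, e;

	assert(blk != NULL);
	s = blk->start > dram_start ? blk->start : dram_start;
	e = blk->end < dram_end ? blk->end : dram_end;
	if (s >= e)
		return 0;
	blk->start = s;
	blk->end = e;
	return 1;
}
```

On a platform whose RAM does not start at physical address 0, a block that sits below the start of DRAM is treated as in range when 0 is the lower bound, and as out of range when the real start of DRAM is used.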
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-23-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: numa_memblks: make several functions and variables static · 317ef459
      Mike Rapoport (Microsoft) authored
      Make functions and variables that are exclusively used by numa_memblks
      static.
      
      Move numa_nodemask_from_meminfo() before its callers to avoid forward
      declaration.
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-22-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: numa_memblks: introduce numa_memblks_init · 692d73d2
      Mike Rapoport (Microsoft) authored
      Move most of x86::numa_init() to numa_memblks so that the latter will be
      more self-contained.
      
      With this change, the numa_memblk data structures no longer need to be
      exposed to architecture-specific code.
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-21-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      692d73d2
    • Mike Rapoport (Microsoft)'s avatar
      mm: introduce numa_emulation · b0c4e27c
      Mike Rapoport (Microsoft) authored
      Move numa_emulation code from arch/x86 to mm/numa_emulation.c
      
      This code will be later reused by arch_numa.
      
      No functional changes.
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-20-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      b0c4e27c
    • Mike Rapoport (Microsoft)'s avatar
      mm: move numa_distance and related code from x86 to numa_memblks · 75f9d4cc
      Mike Rapoport (Microsoft) authored
      Move code dealing with numa_distance array from arch/x86 to
      mm/numa_memblks.c
      
      This code will be later reused by arch_numa.
      
      No functional changes.
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-19-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      75f9d4cc
    • Mike Rapoport (Microsoft)'s avatar
      mm: introduce numa_memblks · 87482708
      Mike Rapoport (Microsoft) authored
      Move code dealing with numa_memblks from arch/x86 to mm/ and add Kconfig
      options to let x86 select it in its Kconfig.
      
      This code will be later reused by arch_numa.
      
      No functional changes.
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-18-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      87482708
    • Mike Rapoport (Microsoft)'s avatar
      x86/numa: numa_{add,remove}_cpu: make cpu parameter unsigned · 7a715285
      Mike Rapoport (Microsoft) authored
      CPU id cannot be negative.
      
      Making it unsigned also aligns with declarations in
      include/asm-generic/numa.h used by arm64 and riscv and allows sharing numa
      emulation code with these architectures.
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-17-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      7a715285
    • Mike Rapoport (Microsoft)'s avatar
      x86/numa_emu: use a helper function to get MAX_DMA32_PFN · e52d5873
      Mike Rapoport (Microsoft) authored
      This is required to make numa emulation code architecture independent so
      that it can be moved to generic code in following commits.
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-16-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      e52d5873
    • Mike Rapoport (Microsoft)'s avatar
      x86/numa_emu: split __apicid_to_node update to a helper function · 55e74bcc
      Mike Rapoport (Microsoft) authored
      This is required to make numa emulation code architecture independent so
      that it can be moved to generic code in following commits.
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-15-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      55e74bcc
    • Mike Rapoport (Microsoft)'s avatar
      x86/numa_emu: simplify allocation of phys_dist · e3c1299c
      Mike Rapoport (Microsoft) authored
      By the time numa_emulation() is called, all physical memory is already
      mapped in the direct map and there is no need to define limits for
      memblock allocation.
      
      Replace memblock_phys_alloc_range() with memblock_alloc().
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-14-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      e3c1299c
    • Mike Rapoport (Microsoft)'s avatar
      x86/numa: move FAKE_NODE_* defines to numa_emu · e4a5e5a5
      Mike Rapoport (Microsoft) authored
      The definitions of FAKE_NODE_MIN_SIZE and FAKE_NODE_MIN_HASH_MASK are only
      used by numa emulation code, so make them local to
      arch/x86/mm/numa_emulation.c.
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-13-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      e4a5e5a5
    • Mike Rapoport (Microsoft)'s avatar
      x86/numa: use get_pfn_range_for_nid to verify that node spans memory · 77c1d0e7
      Mike Rapoport (Microsoft) authored
      Instead of looping over the numa_meminfo array to detect a node's start
      and end addresses, use get_pfn_range_for_nid().
      
      This is shorter and makes it easier to lift numa_memblks to generic code.
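      
      As a rough userspace illustration of the kind of open-coded loop this
      commit replaces, the sketch below scans an array of memory blocks and
      records the lowest start and highest end that belong to one node.  The
      names sketch_memblk and sketch_node_span() are hypothetical and are not
      the kernel's structures or helpers; the sketch works on page frame
      numbers for simplicity and only shows the idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one numa_meminfo-style entry (not the kernel struct). */
struct sketch_memblk {
	unsigned long start_pfn;	/* first page frame of the block */
	unsigned long end_pfn;		/* one past the last page frame */
	int nid;			/* node that owns the block */
};

/*
 * Walk all blocks and report the lowest start and highest end PFN that
 * belong to nid.  Returns false if the node spans no memory, in which
 * case *start and *end are left untouched.
 */
static bool sketch_node_span(const struct sketch_memblk *blk, size_t nr,
			     int nid, unsigned long *start, unsigned long *end)
{
	bool found = false;
	size_t i;

	assert(blk != NULL && start != NULL && end != NULL);

	for (i = 0; i < nr; i++) {
		if (blk[i].nid != nid)
			continue;
		if (!found || blk[i].start_pfn < *start)
			*start = blk[i].start_pfn;
		if (!found || blk[i].end_pfn > *end)
			*end = blk[i].end_pfn;
		found = true;
	}
	return found;
}
```

      A caller would treat a false return as "this node spans no memory",
      which is the condition the commit title says is being verified.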
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-12-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      77c1d0e7
    • Mike Rapoport (Microsoft)'s avatar
      x86/numa: simplify numa_distance allocation · 9916c27d
      Mike Rapoport (Microsoft) authored
      Allocation of numa_distance uses memblock_phys_alloc_range() to limit
      allocation to be below the last mapped page.
      
      But NUMA initialization runs after the direct map is populated and there
      is also code in setup_arch() that adjusts memblock limit to reflect how
      much memory is already mapped in the direct map.
      
      Simplify the allocation of numa_distance and use plain memblock_alloc().
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-11-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      9916c27d
    • Mike Rapoport (Microsoft)'s avatar
      arch, mm: pull out allocation of NODE_DATA to generic code · 3515863d
      Mike Rapoport (Microsoft) authored
      Architectures that support NUMA duplicate the code that allocates
      NODE_DATA on the node-local memory with slight variations in reporting of
      the addresses where the memory was allocated.
      
      Use x86 version as the basis for the generic alloc_node_data() function
      and call this function in architecture specific numa initialization.
      
      Round up node data size to SMP_CACHE_BYTES rather than to PAGE_SIZE like
      x86 used to do since the bootmem era when allocation granularity was
      PAGE_SIZE anyway.
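      
      To make the size difference concrete, here is a minimal userspace
      sketch of the round-up arithmetic.  The constants are assumptions
      chosen for illustration (64-byte cache lines, 4 KiB pages); the real
      SMP_CACHE_BYTES and PAGE_SIZE depend on the architecture and
      configuration, and sketch_round_up() is a hypothetical helper, not the
      kernel's macro.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values only; the real constants are architecture-specific. */
#define SKETCH_SMP_CACHE_BYTES 64u
#define SKETCH_PAGE_SIZE       4096u

/* Round size up to the next multiple of align (align must be a power of two). */
static size_t sketch_round_up(size_t size, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (size + align - 1) & ~(align - 1);
}
```

      Under these assumed values, a 5000-byte structure would occupy 5056
      bytes when rounded to the cache line size but 8192 bytes when rounded
      to the page size.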
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-10-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Acked-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      3515863d
    • Mike Rapoport (Microsoft)'s avatar
      mm: drop CONFIG_HAVE_ARCH_NODEDATA_EXTENSION · ec164cf1
      Mike Rapoport (Microsoft) authored
      There are no users of HAVE_ARCH_NODEDATA_EXTENSION left, so
      arch_alloc_nodedata() and arch_refresh_nodedata() are not needed anymore.
      
      Replace the call to arch_alloc_nodedata() in free_area_init() with a new
      helper alloc_offline_node_data(), remove arch_refresh_nodedata() and
      clean up include/linux/memory_hotplug.h by removing the associated ifdefery.
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-9-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      ec164cf1
    • Mike Rapoport (Microsoft)'s avatar
      arch, mm: move definition of node_data to generic code · 46bcce50
      Mike Rapoport (Microsoft) authored
      Every architecture that supports NUMA defines node_data in the same way:
      
      	struct pglist_data *node_data[MAX_NUMNODES];
      
      No reason to keep multiple copies of this definition and its forward
      declarations, especially when such forward declaration is the only thing
      in include/asm/mmzone.h for many architectures.
      
      Add definition and declaration of node_data to generic code and drop
      architecture-specific versions.
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-8-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Acked-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Acked-by: Davidlohr Bueso <dave@stgolabs.net>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      46bcce50
    • Mike Rapoport (Microsoft)'s avatar
      MIPS: loongson64: drop HAVE_ARCH_NODEDATA_EXTENSION · 3ac9999c
      Mike Rapoport (Microsoft) authored
      Commit f8f9f21c ("MIPS: Fix build error for loongson64 and sgi-ip27")
      added HAVE_ARCH_NODEDATA_EXTENSION to loongson64 to silence a compilation
      error that happened because loongson64 didn't define array of pg_data_t as
      node_data like most other architectures did.
      
      After the rename of __node_data to node_data, arch_alloc_nodedata() and
      HAVE_ARCH_NODEDATA_EXTENSION can be dropped from loongson64.
      
      Since it was the only user of the HAVE_ARCH_NODEDATA_EXTENSION config
      option, also remove this option from arch/mips/Kconfig.
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-7-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      3ac9999c
    • MIPS: loongson64: rename __node_data to node_data · e20bac65
      Mike Rapoport (Microsoft) authored
      Make the definition of node_data match other architectures.  This will
      allow pulling the declaration of node_data into the generic mm code in
      the following commit.
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-6-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      e20bac65
    • MIPS: sgi-ip27: drop HAVE_ARCH_NODEDATA_EXTENSION · 6c701269
      Mike Rapoport (Microsoft) authored
      Commit f8f9f21c ("MIPS: Fix build error for loongson64 and sgi-ip27")
      added HAVE_ARCH_NODEDATA_EXTENSION to sgi-ip27 to silence a compilation
      error that happened because sgi-ip27 didn't define an array of pg_data_t
      as node_data like most other architectures did.
      
      After the addition of a node_data array that matches other architectures
      and after ensuring that offline nodes do not appear in node_possible_map,
      it is safe to drop arch_alloc_nodedata() and HAVE_ARCH_NODEDATA_EXTENSION
      from sgi-ip27.
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-5-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      6c701269
    • MIPS: sgi-ip27: ensure node_possible_map only contains valid nodes · 0c445078
      Mike Rapoport (Microsoft) authored
      For SGI IP27 machines, node_possible_map is statically set to
      NODE_MASK_ALL and is not updated during NUMA initialization.
      
      Ensure that it only contains nodes present in the system.
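
      The idea can be illustrated with a minimal user-space sketch.  This is
      not the kernel code: nodemask_t is simplified to a single integer, and
      trim_possible_map() and node_possible() are hypothetical helper names
      introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NUMNODES 8                       /* assumption: small value for the sketch */

typedef uint32_t nodemask_t;                 /* simplification of the kernel bitmap type */
#define NODE_MASK_ALL ((nodemask_t)((1u << MAX_NUMNODES) - 1))

/* Starts out claiming that every node is possible. */
static nodemask_t node_possible_map = NODE_MASK_ALL;

/* Hypothetical helper: keep only the nodes that were actually discovered. */
static void trim_possible_map(const int *present, int n)
{
    nodemask_t found = 0;

    for (int i = 0; i < n; i++) {
        if (present[i] >= 0 && present[i] < MAX_NUMNODES)
            found |= (nodemask_t)1u << present[i];
    }
    node_possible_map = found;
}

/* Hypothetical helper: test one bit of the map. */
static int node_possible(int nid)
{
    return (node_possible_map >> nid) & 1u;
}
```

      With nodes 0 and 2 discovered, only those two bits remain set, so code
      that walks the possible nodes no longer visits absent ones.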
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-4-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      0c445078
    • MIPS: sgi-ip27: make NODE_DATA() the same as on all other architectures · bc5c8ad3
      Mike Rapoport (Microsoft) authored
      sgi-ip27 is the only system that defines NODE_DATA() differently from the
      rest of the NUMA machines.
      
      Add a node_data array of struct pglist_data pointers that will point to
      __node_data[node]->pglist and redefine NODE_DATA() to use the node_data
      array.
      
      This will allow pulling the declaration of node_data into the generic mm
      code in the next commit.
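
      A rough user-space sketch of the resulting layout follows.  The struct
      fields, the ip27_node type, the arch_node_data array (standing in for
      __node_data) and setup_node() are assumptions made for illustration
      and are not taken from the kernel sources.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NUMNODES 4                       /* assumption: small value for the sketch */

/* Stand-in for the kernel's pg_data_t; the field is illustrative only. */
struct pglist_data {
    int node_id;
};

/* Hypothetical platform wrapper that embeds the generic per-node data. */
struct ip27_node {
    struct pglist_data pglist;
    /* platform-specific fields elided */
};

static struct ip27_node node_storage[MAX_NUMNODES];
static struct ip27_node *arch_node_data[MAX_NUMNODES]; /* plays the role of __node_data */

/* Array of pointers shaped like the one other architectures use. */
static struct pglist_data *node_data[MAX_NUMNODES];

#define NODE_DATA(nid) (node_data[(nid)])

/* Hypothetical setup step: point node_data[nid] at the embedded pglist. */
static void setup_node(int nid)
{
    arch_node_data[nid] = &node_storage[nid];
    arch_node_data[nid]->pglist.node_id = nid;
    node_data[nid] = &arch_node_data[nid]->pglist;
}
```

      After setup_node(2), NODE_DATA(2) resolves through the common
      node_data array while the platform wrapper remains reachable through
      its own array.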
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-3-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      bc5c8ad3
    • mm: move kernel/numa.c to mm/ · 0e8b6798
      Mike Rapoport (Microsoft) authored
      Patch series "mm: introduce numa_memblks", v4.
      
      Following the discussion about handling of CXL fixed memory windows on
      arm64 [1] I decided to bite the bullet and move numa_memblks from x86 to
      the generic code so they will be available on arm64/riscv and maybe on
      loongarch sometime later.
      
      While it could be possible to use memblock to describe CXL memory windows,
      it currently lacks a notion of unpopulated memory ranges, and numa_memblks
      does implement this.
      
      Another reason to make numa_memblks generic is that both arch_numa (arm64
      and riscv) and loongarch use a trimmed copy of the x86 code although there
      is no fundamental reason why the same code cannot be used on all these
      platforms.  Having numa_memblks in mm/ will make its interaction with
      ACPI and FDT more consistent and I believe will reduce the maintenance
      burden.
      
      And with generic numa_memblks it is (almost) straightforward to enable
      NUMA emulation on arm64 and riscv.
      
      The first 9 commits in this series are cleanups that are not strictly
      related to numa_memblks.
      Commits 10-16 slightly reorder code in x86 to allow extracting numa_memblks
      and NUMA emulation to the generic code.
      Commits 17-19 actually move the code from arch/x86/ to mm/ and commits 20-22
      do some follow-up cleanups.
      Commit 23 updates of_numa_init() to return an error if no NUMA nodes were
      found in the device tree.
      Commit 24 switches arch_numa to numa_memblks.
      Commit 25 enables usage of phys_to_target_node() and
      memory_add_physaddr_to_nid() with numa_memblks.
      Commit 26 moves the description for numa=fake from x86 to admin-guide.
      
      [1] https://lore.kernel.org/all/20240529171236.32002-1-Jonathan.Cameron@huawei.com/
      
      
      This patch (of 26):
      
      The stub functions in kernel/numa.c belong in mm/ rather than in kernel/.
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-1-rppt@kernel.org
      Link: https://lkml.kernel.org/r/20240807064110.1003856-2-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Acked-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      0e8b6798
    • mm: swap: add a adaptive full cluster cache reclaim · 2cacbdfd
      Kairui Song authored
      Link all full clusters on one full list, and reclaim from it when the
      allocator has run out of usable clusters.
      
      There are many reasons a folio can end up in the swap cache while
      having no swap count reference.  So the best way to search for such slots
      is still to iterate over the swap clusters.
      
      With the list used as an LRU, iterating from the oldest cluster and
      keeping the clusters rotating is a simple and clean way to free up
      clusters that are potentially no longer in use.
      
      On any allocation failure, try to reclaim and rotate only one cluster.
      This is adaptive: high order allocations can tolerate fallback, so this
      avoids latency and gives the full cluster list a fair chance to get
      reclaimed.  It relieves the pressure on the fallback order 0 allocation
      or on the following high order allocation.
      
      If the swap device is getting very full, reclaim more aggressively to
      ensure no OOM will happen.  This ensures an order 0 heavy workload won't
      go OOM, as an order 0 allocation won't fail if any cluster still has any
      space.
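
      The rotate-and-reclaim step can be sketched in user space as below.
      This is a simplified model, not the kernel implementation: the slot
      states, struct cluster, struct full_list and all function names are
      hypothetical, and reclaiming a cache-only slot is assumed to always
      succeed.

```c
#include <assert.h>
#include <stddef.h>

#define SLOTS_PER_CLUSTER 4                  /* assumption: tiny clusters for the sketch */

enum slot_state { SLOT_FREE, SLOT_IN_USE, SLOT_CACHE_ONLY };

struct cluster {
    enum slot_state slots[SLOTS_PER_CLUSTER];
    struct cluster *next;
};

/* Singly linked FIFO used as an LRU: head is oldest, tail is newest. */
struct full_list {
    struct cluster *head, *tail;
};

static void full_list_add_tail(struct full_list *l, struct cluster *c)
{
    c->next = NULL;
    if (l->tail)
        l->tail->next = c;
    else
        l->head = c;
    l->tail = c;
}

static struct cluster *full_list_pop_head(struct full_list *l)
{
    struct cluster *c = l->head;

    if (!c)
        return NULL;
    l->head = c->next;
    if (!l->head)
        l->tail = NULL;
    c->next = NULL;
    return c;
}

/*
 * Handle one allocation failure: look at the oldest full cluster only.
 * Slots that are held solely by the swap cache are freed.  If nothing
 * could be freed, the cluster is rotated to the tail so the next failure
 * examines a different cluster.  Returns the number of slots freed; a
 * cluster with freed slots leaves the full list (the caller would put it
 * on a list of partially used clusters).
 */
static int reclaim_one_full_cluster(struct full_list *l)
{
    struct cluster *c = full_list_pop_head(l);
    int freed = 0;

    if (!c)
        return 0;
    for (int i = 0; i < SLOTS_PER_CLUSTER; i++) {
        if (c->slots[i] == SLOT_CACHE_ONLY) {
            c->slots[i] = SLOT_FREE;
            freed++;
        }
    }
    if (!freed)
        full_list_add_tail(l, c);
    return freed;
}
```

      Touching a single cluster per failure keeps the cost of each failed
      allocation bounded, while the rotation gives every full cluster a turn
      over time.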
      
      [ryncsn@gmail.com: fix discard of full cluster]
        Link: https://lkml.kernel.org/r/CAMgjq7CWwK75_2Zi5P40K08pk9iqOcuWKL6khu=x4Yg_nXaQag@mail.gmail.com
      Link: https://lkml.kernel.org/r/20240730-swap-allocator-v5-9-cb9c148b9297@kernel.org
      Signed-off-by: Kairui Song <kasong@tencent.com>
      Reported-by: Barry Song <21cnbao@gmail.com>
      Cc: Chris Li <chrisl@kernel.org>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Kalesh Singh <kaleshsingh@google.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Kairui Song <ryncsn@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      2cacbdfd
    • mm: swap: relaim the cached parts that got scanned · 661383c6
      Kairui Song authored
      This commit implements reclaim during scanning for the cluster allocator.
      
      Cluster scanning was unable to reuse SWAP_HAS_CACHE slots, which could
      result in a low allocation success rate or early OOM.
      
      So to ensure the maximum allocation success rate, integrate reclaiming
      with scanning.  If a range of suitable swap slots is found but is
      fragmented due to HAS_CACHE, just try to reclaim the slots.
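
      A minimal user-space sketch of scanning with integrated reclaim is
      shown below.  The slot states and scan_and_reclaim() are hypothetical
      names, and the sketch assumes that reclaiming a cache-only slot always
      succeeds, which the real allocator cannot assume.

```c
#include <assert.h>

enum slot_state { SLOT_FREE, SLOT_IN_USE, SLOT_CACHE_ONLY };

/*
 * Look for an aligned run of (1 << order) slots in which no slot is still
 * referenced.  Slots held only by the swap cache do not disqualify the
 * run; they are reclaimed on the spot.  Returns the first slot index of
 * the run, or -1 if no suitable run exists.
 */
static int scan_and_reclaim(enum slot_state *slots, int nslots, int order)
{
    int need = 1 << order;

    for (int start = 0; start + need <= nslots; start += need) {
        int usable = 1;

        for (int i = start; i < start + need; i++) {
            if (slots[i] == SLOT_IN_USE) {
                usable = 0;
                break;
            }
        }
        if (!usable)
            continue;

        /* Reclaim the cache-only slots so the whole run becomes free. */
        for (int i = start; i < start + need; i++) {
            if (slots[i] == SLOT_CACHE_ONLY)
                slots[i] = SLOT_FREE;
        }
        return start;
    }
    return -1;
}
```

      Without the reclaim step, a run interrupted only by cache-only slots
      would be rejected even though none of its slots is really needed.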
      
      Link: https://lkml.kernel.org/r/20240730-swap-allocator-v5-8-cb9c148b9297@kernel.org
      Signed-off-by: Kairui Song <kasong@tencent.com>
      Reported-by: Barry Song <21cnbao@gmail.com>
      Cc: Chris Li <chrisl@kernel.org>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Kalesh Singh <kaleshsingh@google.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      661383c6
    • mm: swap: add a fragment cluster list · 477cb7ba
      Kairui Song authored
      Now the swap cluster allocator arranges the clusters in LRU style, so the
      "cold" clusters that stay at the head of the nonfull lists are the ones
      that were used for allocation a long time ago and are still partially
      occupied.  So if the allocator can't find enough contiguous slots to
      satisfy a high order allocation, it's unlikely that slots will be freed
      on them to satisfy the allocation, at least in the short term.
      
      As a result, nonfull cluster scanning will waste time repeatedly scanning
      the unusable head of the list.
      
      Also, multiple CPUs could contend on the same head cluster of the nonfull
      list.  Unlike free clusters, which are removed from the list when any CPU
      starts using them, a nonfull cluster stays at the head.
      
      So introduce a new list, the frag list; all scanned nonfull clusters will
      be moved to this list, both to avoid repeated scanning and to avoid
      contention.
      
      The frag list is still used as a fallback for allocations, so if one CPU
      fails to allocate one order of slots, it can still steal other CPUs'
      clusters.  And order 0 will favor the fragmented clusters to better
      protect nonfull clusters.
      
      If any slot of a cluster on the fragment list is freed, move that cluster
      back to the nonfull list, indicating it is worth another scan.  Compared
      to scanning upon freeing a slot, this keeps the scanning lazy and saves
      some CPU if there are still other clusters to use.
      
      It may seem unnecessary to keep the fragmented clusters on a list at all
      if they can't be used for a specific order allocation.  But this will
      start to make sense once reclaim during scanning is ready.
      
      Link: https://lkml.kernel.org/r/20240730-swap-allocator-v5-7-cb9c148b9297@kernel.org
      Signed-off-by: Kairui Song <kasong@tencent.com>
      Reported-by: Barry Song <21cnbao@gmail.com>
      Cc: Chris Li <chrisl@kernel.org>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Kalesh Singh <kaleshsingh@google.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      477cb7ba
    • mm: swap: allow cache reclaim to skip slot cache · 862590ac
      Kairui Song authored
      Currently we free the reclaimed slots through the slot cache even if the
      slot is required to be empty immediately.  As a result the reclaim caller
      will see the slot still occupied even after a successful reclaim, and
      needs to keep reclaiming until the slot cache gets flushed.  This caused
      ineffective or excessive reclaim when swap is under stress.
      
      So introduce a new flag allowing the slot to be emptied bypassing the slot
      cache.
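
      The effect of such a flag can be sketched with a small user-space
      model.  The struct swap_dev layout, the cache size and the function
      names are assumptions for illustration; the flag is modelled as a
      plain "direct" argument rather than the kernel's actual flag.

```c
#include <assert.h>

#define NR_SLOTS        16                   /* assumption: tiny device for the sketch */
#define SLOT_CACHE_SIZE 4

struct swap_dev {
    unsigned char in_use[NR_SLOTS];          /* 1 while the slot looks occupied */
    int cached[SLOT_CACHE_SIZE];             /* slots whose free is deferred */
    int nr_cached;
};

/* Really release every slot whose free was deferred. */
static void flush_slot_cache(struct swap_dev *dev)
{
    for (int i = 0; i < dev->nr_cached; i++)
        dev->in_use[dev->cached[i]] = 0;
    dev->nr_cached = 0;
}

/*
 * Free one slot.  With direct == 0 the free is batched in the slot cache,
 * so the slot still looks occupied until the cache is flushed.  With
 * direct != 0 the cache is bypassed and the slot is empty on return.
 */
static void free_slot(struct swap_dev *dev, int slot, int direct)
{
    if (direct) {
        dev->in_use[slot] = 0;
        return;
    }
    if (dev->nr_cached == SLOT_CACHE_SIZE)
        flush_slot_cache(dev);
    dev->cached[dev->nr_cached++] = slot;
}
```

      A reclaim caller that needs the slot right away would use the direct
      path and can rely on the slot being empty when the call returns.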
      
      [21cnbao@gmail.com: small folios should have nr_pages == 1 but not nr_page == 0]
        Link: https://lkml.kernel.org/r/20240805015324.45134-1-21cnbao@gmail.com
      Link: https://lkml.kernel.org/r/20240730-swap-allocator-v5-6-cb9c148b9297@kernel.org
      Signed-off-by: Kairui Song <kasong@tencent.com>
      Reported-by: Barry Song <21cnbao@gmail.com>
      Cc: Chris Li <chrisl@kernel.org>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Kalesh Singh <kaleshsingh@google.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      862590ac