06 May, 2024 (40 commits)
    • mm: move mm counter updating out of set_pte_range() · 1f2d8b44
      Kefeng Wang authored
      Patch series "mm: batch mm counter updating in filemap_map_pages()", v3.
      
      Let's batch mm counter updating to accelerate filemap_map_pages().
      
      
      This patch (of 2):
      
      In order to support batch mm counter updating in filemap_map_pages(),
      move the mm counter update out of set_pte_range().  The folios handled
      there are file folios from the page cache, while the other caller,
      finish_fault(), distinguishes the folio type by vmf->flags and
      vma->vm_flags.
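
      As a hedged illustration of the batching this enables (the *_sketch
      helpers below are made up for this log and are not part of the patch),
      a caller can count the PTEs it installed and charge the counter once:

      #include <linux/mm.h>

      static void map_folio_range_sketch(struct vm_fault *vmf, struct folio *folio,
                                         struct page *page, unsigned int nr,
                                         unsigned long addr, unsigned long *rss)
      {
              /* set_pte_range() no longer touches the mm counters itself */
              set_pte_range(vmf, folio, page, nr, addr);
              *rss += nr;     /* remember how many PTEs were installed */
      }

      static void map_pages_done_sketch(struct vm_area_struct *vma, unsigned long rss)
      {
              /* one batched update instead of one per mapped PTE */
              add_mm_counter(vma->vm_mm, MM_FILEPAGES, rss);
      }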
      
      Link: https://lkml.kernel.org/r/20240412064751.119015-1-wangkefeng.wang@huawei.com
      Link: https://lkml.kernel.org/r/20240412064751.119015-2-wangkefeng.wang@huawei.com
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: correct the docs for thp_fault_alloc and thp_fault_fallback · a14421ae
      Barry Song authored
      The documentation does not align with the code.  In
      __do_huge_pmd_anonymous_page(), THP_FAULT_FALLBACK is incremented when
      mem_cgroup_charge() fails, despite the allocation succeeding, whereas
      THP_FAULT_ALLOC is only incremented after a successful charge.
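
      For reference, a simplified, hedged sketch of the ordering in question
      (illustrative only, not the verbatim kernel code; error handling and the
      actual mapping step are trimmed):

      #include <linux/mm.h>
      #include <linux/memcontrol.h>
      #include <linux/vmstat.h>

      static vm_fault_t huge_anon_fault_sketch(struct vm_fault *vmf,
                                               struct folio *folio, gfp_t gfp)
      {
              if (mem_cgroup_charge(folio, vmf->vma->vm_mm, gfp)) {
                      folio_put(folio);
                      /* the allocation succeeded, but charging it failed */
                      count_vm_event(THP_FAULT_FALLBACK);
                      count_vm_event(THP_FAULT_FALLBACK_CHARGE);
                      return VM_FAULT_FALLBACK;
              }
              /* only a successfully charged THP counts as allocated */
              count_vm_event(THP_FAULT_ALLOC);
              /* ... clear the pages and map the PMD ... */
              return 0;
      }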
      
      Link: https://lkml.kernel.org/r/20240412114858.407208-5-21cnbao@gmail.com
      Signed-off-by: Barry Song <v-songbaohua@oppo.com>
      Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Chris Li <chrisl@kernel.org>
      Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
      Cc: Kairui Song <kasong@tencent.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Yosry Ahmed <yosryahmed@google.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: add docs for per-order mTHP counters and transhuge_page ABI · 42248b9d
      Barry Song authored
      This patch includes documentation for mTHP counters and an ABI file for
      sys-kernel-mm-transparent-hugepage, which appears to have been missing for
      some time.
      
      [v-songbaohua@oppo.com: fix the name and unexpected indentation]
        Link: https://lkml.kernel.org/r/20240415054538.17071-1-21cnbao@gmail.com
      Link: https://lkml.kernel.org/r/20240412114858.407208-4-21cnbao@gmail.com
      Signed-off-by: Barry Song <v-songbaohua@oppo.com>
      Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Chris Li <chrisl@kernel.org>
      Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
      Cc: Kairui Song <kasong@tencent.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Yosry Ahmed <yosryahmed@google.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: add per-order mTHP anon_swpout and anon_swpout_fallback counters · d0f048ac
      Barry Song authored
      This helps to show how fragmented the swapfile is, by indicating what
      proportion of large folios could be swapped out without being split.  So
      far, non-split swapout is only supported for anon memory, with the
      possibility of expanding to shmem in the future, so the counter names
      carry the "anon" prefix.
      
      Link: https://lkml.kernel.org/r/20240412114858.407208-3-21cnbao@gmail.com
      Signed-off-by: Barry Song <v-songbaohua@oppo.com>
      Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Chris Li <chrisl@kernel.org>
      Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
      Cc: Kairui Song <kasong@tencent.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Yosry Ahmed <yosryahmed@google.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback counters · ec33687c
      Barry Song authored
      Patch series "mm: add per-order mTHP alloc and swpout counters", v6.
      
      The patchset introduces a framework to facilitate mTHP counters, starting
      with the allocation and swap-out counters.  Currently, only four new nodes
      are appended to the stats directory for each mTHP size.
      
      /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
      	anon_fault_alloc
      	anon_fault_fallback
      	anon_fault_fallback_charge
      	anon_swpout
      	anon_swpout_fallback
      
      These nodes are crucial for us to monitor the fragmentation levels of both
      the buddy system and the swap partitions.  In the future, we may consider
      adding additional nodes for further insights.
      
      
      This patch (of 4):
      
      Profiling a system blindly with mTHP has become challenging due to the
      lack of visibility into its operations.  Presenting the success rate of
      mTHP allocations appears to be a pressing need.
      
      Recently, I've been experiencing significant difficulty debugging
      performance improvements and regressions without these figures.  It's
      crucial for us to understand the true effectiveness of mTHP in real-world
      scenarios, especially in systems with fragmented memory.
      
      This patch establishes the framework for per-order mTHP counters.  It
      begins by introducing the anon_fault_alloc and anon_fault_fallback
      counters.  Additionally, to maintain consistency with
      thp_fault_fallback_charge in /proc/vmstat, this patch also tracks
      anon_fault_fallback_charge when mem_cgroup_charge fails for mTHP. 
      Incorporating additional counters should now be straightforward as well.
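
      A hedged sketch of what such a per-order counter framework can look like
      (the item names follow the stats files listed above; the array bound and
      the helper body are assumptions for illustration, not a quote of the patch):

      #include <linux/mm.h>
      #include <linux/percpu.h>

      enum mthp_stat_item {
              MTHP_STAT_ANON_FAULT_ALLOC,
              MTHP_STAT_ANON_FAULT_FALLBACK,
              MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
              __MTHP_STAT_COUNT
      };

      /* assumed bound: one slot per mTHP order up to PMD size */
      #define MTHP_MAX_ORDER (PMD_SHIFT - PAGE_SHIFT)

      struct mthp_stat {
              unsigned long stats[MTHP_MAX_ORDER + 1][__MTHP_STAT_COUNT];
      };

      static DEFINE_PER_CPU(struct mthp_stat, mthp_stats);

      static inline void count_mthp_stat(int order, enum mthp_stat_item item)
      {
              if (order <= 0 || order > MTHP_MAX_ORDER)
                      return;
              this_cpu_inc(mthp_stats.stats[order][item]);
      }

      A fault path would then bump, for example,
      count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC).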
      
      Link: https://lkml.kernel.org/r/20240412114858.407208-1-21cnbao@gmail.com
      Link: https://lkml.kernel.org/r/20240412114858.407208-2-21cnbao@gmail.com
      Signed-off-by: Barry Song <v-songbaohua@oppo.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Chris Li <chrisl@kernel.org>
      Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
      Cc: Kairui Song <kasong@tencent.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Yosry Ahmed <yosryahmed@google.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/hugetlb: rename dissolve_free_huge_pages() to dissolve_free_hugetlb_folios() · d199483c
      Sidhartha Kumar authored
      dissolve_free_huge_pages() only uses folios internally; rename it to
      dissolve_free_hugetlb_folios() and update the comments that reference it.
      
      [akpm@linux-foundation.org: remove unneeded `extern']
      Link: https://lkml.kernel.org/r/20240412182139.120871-2-sidhartha.kumar@oracle.com
      Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
      Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Oscar Salvador <osalvador@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/hugetlb: convert dissolve_free_huge_page() to folios · 54fa49b2
      Sidhartha Kumar authored
      This allows us to rename dissolve_free_huge_page() to
      dissolve_free_hugetlb_folio().  Convert one caller to pass in a folio
      directly, and use page_folio() to convert the page in the
      mm/memory-failure caller.
      
      [sidhartha.kumar@oracle.com: remove unneeded `extern']
        Link: https://lkml.kernel.org/r/71760ed4-e80d-493a-95ea-2545414b1aba@oracle.com
      [sidhartha.kumar@oracle.com: v2]
        Link: https://lkml.kernel.org/r/20240412182139.120871-1-sidhartha.kumar@oracle.com
      Link: https://lkml.kernel.org/r/20240411164756.261178-1-sidhartha.kumar@oracle.com
      Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Muchun Song <muchun.song@linux.dev>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/ksm: replace set_page_stable_node by folio_set_stable_node · 452e862f
      Alex Shi (tencent) authored
      Only a single page can be reached at the point where we set the stable
      node after write protection, so use the folio-converted function in place
      of the page one, and remove the now-unused set_page_stable_node().
      
      Link: https://lkml.kernel.org/r/20240411061713.1847574-11-alexs@kernel.org
      Signed-off-by: Alex Shi (tencent) <alexs@kernel.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Izik Eidus <izik.eidus@ravellosystems.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/ksm: rename get_ksm_page_flags to ksm_get_folio_flags · 85b67b01
      David Hildenbrand authored
      As we are removing get_ksm_page_flags(), make the flags match the new
      function name.
      
      Link: https://lkml.kernel.org/r/20240411061713.1847574-10-alexs@kernel.org
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Alex Shi <alexs@kernel.org>
      Reviewed-by: Alex Shi <alexs@kernel.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Izik Eidus <izik.eidus@ravellosystems.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/ksm: convert chain series funcs and replace get_ksm_page · 79899cce
      Alex Shi (tencent) authored
      In the KSM stable tree all pages are single pages, so convert them to
      folios, together with the stable_tree_insert()/stable_tree_search()
      functions, and replace get_ksm_page() with ksm_get_folio() since the
      former is no longer needed.
      
      This saves a few compound_head() calls.
      
      Link: https://lkml.kernel.org/r/20240411061713.1847574-9-alexs@kernel.org
      Signed-off-by: Alex Shi (tencent) <alexs@kernel.org>
      Cc: Izik Eidus <izik.eidus@ravellosystems.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/ksm: use folio in write_protect_page · 40d707f3
      Alex Shi (tencent) authored
      Compound pages are checked and skipped before write_protect_page() is
      called, so use a folio to save a few compound_head() calls.
      
      Link: https://lkml.kernel.org/r/20240411061713.1847574-8-alexs@kernel.org
      Signed-off-by: Alex Shi (tencent) <alexs@kernel.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Izik Eidus <izik.eidus@ravellosystems.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/ksm: use ksm_get_folio in scan_get_next_rmap_item · 72556a4c
      Alex Shi (tencent) authored
      Save a compound_head call.
      
      Link: https://lkml.kernel.org/r/20240411061713.1847574-7-alexs@kernel.org
      Signed-off-by: Alex Shi (tencent) <alexs@kernel.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Izik Eidus <izik.eidus@ravellosystems.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/ksm: use folio in stable_node_dup · 6f528de2
      Alex Shi (tencent) authored
      Use ksm_get_folio() and save 2 compound_head calls.
      
      Link: https://lkml.kernel.org/r/20240411061713.1847574-6-alexs@kernel.org
      Signed-off-by: Alex Shi (tencent) <alexs@kernel.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Izik Eidus <izik.eidus@ravellosystems.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/ksm: use folio in remove_stable_node · 9d5cc140
      Alex Shi (tencent) authored
      Pages in the stable tree are all single normal pages, so use
      ksm_get_folio() and folio_set_stable_node(); this also saves three calls
      to compound_head().
      
      Link: https://lkml.kernel.org/r/20240411061713.1847574-5-alexs@kernel.org
      Signed-off-by: Alex Shi (tencent) <alexs@kernel.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Izik Eidus <izik.eidus@ravellosystems.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/ksm: add folio_set_stable_node · b8b0ff24
      Alex Shi (tencent) authored
      Turn set_page_stable_node() into a wrapper around a new
      folio_set_stable_node(), and then use the folio version to replace the
      former.  We will merge them together once all places are converted to
      folios.
      
      Link: https://lkml.kernel.org/r/20240411061713.1847574-4-alexs@kernel.org
      Signed-off-by: Alex Shi (tencent) <alexs@kernel.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Izik Eidus <izik.eidus@ravellosystems.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/ksm: use folio in remove_rmap_item_from_tree · f39b6e2d
      Alex Shi (tencent) authored
      To save 2 compound_head calls.
      
      Link: https://lkml.kernel.org/r/20240411061713.1847574-3-alexs@kernel.org
      Signed-off-by: Alex Shi (tencent) <alexs@kernel.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Izik Eidus <izik.eidus@ravellosystems.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/ksm: add ksm_get_folio · b91f9472
      Alex Shi (tencent) authored
      Patch series "transfer page to folio in KSM".
      
      This is the first part of the page to folio transfer in KSM.  Since only
      single pages can be stored in KSM, we can safely transfer stable tree
      pages to folios.
      
      This patchset reduces ksm.o by 57 kbytes from 2541776 bytes on the latest
      akpm/mm-stable branch with CONFIG_DEBUG_VM enabled.  It passes the KSM
      testing in LTP and the kernel selftests.
      
      Thanks to Matthew Wilcox and David Hildenbrand for their suggestions and
      comments!
      
      
      This patch (of 10):
      
      KSM only contains single pages, so we can add a new function,
      ksm_get_folio(), as the folio counterpart of get_ksm_page(), saving a
      couple of compound_head() calls.
      
      After all callers are replaced, get_ksm_page() will be removed.
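
      A hedged sketch of the transitional shape (ksm_get_folio() and the
      stable-node type are mm/ksm.c internals; treat the exact signature and
      flags type here as illustrative):

      /* mm/ksm.c (sketch): the old page helper becomes a thin wrapper until
       * every caller is converted and it can be deleted. */
      static struct page *get_ksm_page_sketch(struct ksm_stable_node *stable_node,
                                              enum ksm_get_folio_flags flags)
      {
              struct folio *folio = ksm_get_folio(stable_node, flags);

              return folio ? &folio->page : NULL;
      }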
      
      Link: https://lkml.kernel.org/r/20240411061713.1847574-1-alexs@kernel.org
      Link: https://lkml.kernel.org/r/20240411061713.1847574-2-alexs@kernel.org
      Signed-off-by: Alex Shi (tencent) <alexs@kernel.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Izik Eidus <izik.eidus@ravellosystems.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • arm: mm: drop VM_FAULT_BADMAP/VM_FAULT_BADACCESS · e9016174
      Kefeng Wang authored
      On a bad map or access, directly set the si_code to SEGV_MAPERR or
      SEGV_ACCERR, also set fault to 0 and go to the error handling, which
      allows us to drop the arch's special vm fault reasons.
      
      [akpm@linux-foundation.org: coding-style cleanups]
      Link: https://lkml.kernel.org/r/20240411130925.73281-3-wangkefeng.wang@huawei.com
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Aishwarya TCV <aishwarya.tcv@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Cristian Marussi <cristian.marussi@arm.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • arm64: mm: drop VM_FAULT_BADMAP/VM_FAULT_BADACCESS · eebb5181
      Kefeng Wang authored
      Patch series "mm: remove arch's private VM_FAULT_BADMAP/BADACCESS", v2.
      
      Directly set SEGV_MAPERR or SEGV_ACCERR for arm/arm64 to remove the last
      two arch-private vm_fault reasons.
      
      
      This patch (of 2):
      
      On a bad map or access, directly set si_code to SEGV_MAPERR or
      SEGV_ACCERR, also set fault to 0 and go to the error handling, which
      allows us to drop the arch's special vm fault reasons.
      
      Link: https://lkml.kernel.org/r/20240411130925.73281-1-wangkefeng.wang@huawei.com
      Link: https://lkml.kernel.org/r/20240411130925.73281-2-wangkefeng.wang@huawei.com
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Aishwarya TCV <aishwarya.tcv@arm.com>
      Cc: Cristian Marussi <cristian.marussi@arm.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Documentation/admin-guide/cgroup-v1/memory.rst: don't reference page_mapcount() · 65867060
      David Hildenbrand authored
      Let's stop talking about page_mapcount().
      
      Link: https://lkml.kernel.org/r/20240409192301.907377-19-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Richard Chang <richardycc@google.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yin Fengwei <fengwei.yin@intel.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/debug: print only page mapcount (excluding folio entire mapcount) in __dump_folio() · 7441d349
      David Hildenbrand authored
      Let's simplify and only print the page mapcount: we already print the
      large folio mapcount and the entire folio mapcount for large folios
      separately; that should be sufficient to figure out what's happening.
      
      While at it, print the page mapcount also if it had an underflow,
      filtering out only typed pages.
      
      Link: https://lkml.kernel.org/r/20240409192301.907377-18-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Richard Chang <richardycc@google.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yin Fengwei <fengwei.yin@intel.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • xtensa/mm: convert check_tlb_entry() to sanity check folios · 5f8856cd
      David Hildenbrand authored
      We want to limit the use of page_mapcount() to the places where it is
      absolutely necessary.  So let's convert check_tlb_entry() to perform
      sanity checks on folios instead of pages.
      
      This essentially already happened: page_count() is mapped to
      folio_ref_count(), and page_mapped() to folio_mapped() internally. 
      However, we would have printed the page_mapcount(), which does not really
      match what page_mapped() would have checked.
      
      Let's simply print the folio mapcount to avoid using page_mapcount().  For
      small folios there is no change.
      
      Link: https://lkml.kernel.org/r/20240409192301.907377-17-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Richard Chang <richardycc@google.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yin Fengwei <fengwei.yin@intel.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • trace/events/page_ref: trace the raw page mapcount value · 6eca3256
      David Hildenbrand authored
      We want to limit the use of page_mapcount() to the places where it is
      absolutely necessary.  We already trace raw page->refcount, raw
      page->flags and raw page->mapping, and don't involve any folios.  Let's
      also trace the raw mapcount value that does not consider the entire
      mapcount of large folios, and we don't add "1" to it.
      
      When dealing with typed folios, this makes a lot more sense.  ...  and
      it's for debugging purposes only either way.
      
      Link: https://lkml.kernel.org/r/20240409192301.907377-16-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Richard Chang <richardycc@google.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yin Fengwei <fengwei.yin@intel.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/migrate_device: use folio_mapcount() in migrate_vma_check_page() · f2f8a7a0
      David Hildenbrand authored
      We want to limit the use of page_mapcount() to the places where it is
      absolutely necessary.  Let's convert migrate_vma_check_page() to work on a
      folio internally so we can remove the page_mapcount() usage.
      
      Note that we reject any large folios.
      
      There is a lot more folio conversion to be had, but that has to wait for
      another day.  No functional change intended.
      
      Link: https://lkml.kernel.org/r/20240409192301.907377-15-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Richard Chang <richardycc@google.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yin Fengwei <fengwei.yin@intel.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/filemap: use folio_mapcount() in filemap_unaccount_folio() · f0376c71
      David Hildenbrand authored
      We want to limit the use of page_mapcount() to the places where it is
      absolutely necessary.
      
      Let's use folio_mapcount() instead of page_mapcount() in
      filemap_unaccount_folio().
      
      No functional change intended, because we're only dealing with small
      folios.
      
      Link: https://lkml.kernel.org/r/20240409192301.907377-14-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Richard Chang <richardycc@google.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yin Fengwei <fengwei.yin@intel.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • sh/mm/cache: use folio_mapped() in copy_from_user_page() · 60706580
      David Hildenbrand authored
      We want to limit the use of page_mapcount() to the places where it is
      absolutely necessary.
      
      We're already using folio_mapped() in copy_user_highpage() and
      copy_to_user_page() for a similar purpose so ...  let's also simply use it
      for copy_from_user_page().
      
      There is no change for small folios.  Likely we won't stumble over many
      large folios on sh in that code either way.
      
      Link: https://lkml.kernel.org/r/20240409192301.907377-13-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Richard Chang <richardycc@google.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yin Fengwei <fengwei.yin@intel.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/migrate: use folio_likely_mapped_shared() in add_page_for_migration() · 31ce0d7e
      David Hildenbrand authored
      We want to limit the use of page_mapcount() to the places where it is
      absolutely necessary.  In add_page_for_migration(), we actually want to
      check if the folio is mapped shared, to reject such folios.  So let's use
      folio_likely_mapped_shared() instead.
      
      For small folios, fully mapped THP, and hugetlb folios, there is no change.
      For partially mapped, shared THP, we should now do a better job at
      rejecting such folios.
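
      A hedged sketch of the check (illustrative only; the real
      add_page_for_migration() also honours the MPOL_MF_MOVE_ALL flag and uses
      its own error code):

      #include <linux/errno.h>
      #include <linux/mm.h>

      static int queue_folio_for_migration_sketch(struct folio *folio)
      {
              /* skip folios that appear to be mapped by multiple processes */
              if (folio_likely_mapped_shared(folio))
                      return -EBUSY;  /* assumed errno, for illustration */

              /* ... isolate the folio and add it to the migration list ... */
              return 0;
      }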
      
      Link: https://lkml.kernel.org/r/20240409192301.907377-12-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Richard Chang <richardycc@google.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yin Fengwei <fengwei.yin@intel.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/page_alloc: use folio_mapped() in __alloc_contig_migrate_range() · 7115936a
      David Hildenbrand authored
      We want to limit the use of page_mapcount() to the places where it is
      absolutely necessary.
      
      For tracing purposes, we use page_mapcount() in
      __alloc_contig_migrate_range().  Adding that mapcount to total_mapped
      sounds strange: total_migrated and total_reclaimed would count each page
      only once, not multiple times.
      
      But then, isolate_migratepages_range() adds each folio only once to the
      list.  So for large folios, we would query the mapcount of the first page
      of the folio, which doesn't make too much sense for large folios.
      
      Let's simply use folio_mapped() * folio_nr_pages(), which makes more sense
      as nr_migratepages is also incremented by the number of pages in the folio
      in case of successful migration.
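
      In other words (hedged sketch; tracepoint plumbing omitted and the helper
      name made up):

      #include <linux/mm.h>

      static void count_isolated_folio_sketch(struct folio *folio,
                                              unsigned long *total_mapped)
      {
              /* count the whole folio once, scaled by its size, rather than
               * reading the first subpage's page_mapcount() */
              if (folio_mapped(folio))
                      *total_mapped += folio_nr_pages(folio);
      }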
      
      Link: https://lkml.kernel.org/r/20240409192301.907377-11-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Richard Chang <richardycc@google.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yin Fengwei <fengwei.yin@intel.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/memory-failure: use folio_mapcount() in hwpoison_user_mappings() · 33d844bb
      David Hildenbrand authored
      We want to limit the use of page_mapcount() to the places where it is
      absolutely necessary.  We can only unmap full folios; page_mapped(), which
      we check here, is translated to folio_mapped() -- based on
      folio_mapcount().  So let's print the folio mapcount instead.
      
      Link: https://lkml.kernel.org/r/20240409192301.907377-10-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Richard Chang <richardycc@google.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yin Fengwei <fengwei.yin@intel.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/huge_memory: use folio_mapcount() in zap_huge_pmd() sanity check · 0a7bda48
      David Hildenbrand authored
      We want to limit the use of page_mapcount() to the places where it is
      absolutely necessary.  Let's similarly check for folio_mapcount()
      underflows instead of page_mapcount() underflows like we do in
      zap_present_folio_ptes() now.
      
      Instead of the VM_BUG_ON(), we should actually be doing something like
      print_bad_pte().  For now, let's keep it simple and use WARN_ON_ONCE(),
      performing that check independently of DEBUG_VM.
      
      Link: https://lkml.kernel.org/r/20240409192301.907377-9-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Richard Chang <richardycc@google.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yin Fengwei <fengwei.yin@intel.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/memory: use folio_mapcount() in zap_present_folio_ptes() · 3aeea4fc
      David Hildenbrand authored
      We want to limit the use of page_mapcount() to the places where it is
      absolutely necessary.  In zap_present_folio_ptes(), let's simply check the
      folio mapcount().  If there is some issue, it will underflow at some point
      either way when unmapping.
      
      As indicated already in commit 10ebac4f ("mm/memory: optimize
      unmap/zap with PTE-mapped THP"), we already documented "If we ever have a
      cheap folio_mapcount(), we might just want to check for underflows
      there.".
      
      There is no change for small folios.  For large folios, we'll now catch
      more underflows when batch-unmapping, because instead of only testing the
      mapcount of the first subpage, we'll test if the folio mapcount
      underflows.
      
      Link: https://lkml.kernel.org/r/20240409192301.907377-8-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Richard Chang <richardycc@google.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yin Fengwei <fengwei.yin@intel.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: make folio_mapcount() return 0 for small typed folios · 4103b93b
      David Hildenbrand authored
      We already handle it properly for large folios.  Let's also return "0" for
      small typed folios, like page_mapcount() currently would.
      
      Consequently, folio_mapcount() will never return negative values for typed
      folios, but may return negative values for underflows.
      
      [david@redhat.com: make folio_mapcount() slightly more efficient]
        Link: https://lkml.kernel.org/r/c30fcda1-ed87-46f5-8297-cdedbddac009@redhat.com
      Link: https://lkml.kernel.org/r/20240409192301.907377-7-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Richard Chang <richardycc@google.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yin Fengwei <fengwei.yin@intel.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: improve folio_likely_mapped_shared() using the mapcount of large folios · eefb9b27
      David Hildenbrand authored
      We can now read the mapcount of large folios very efficiently.  Use it to
      improve our handling of partially-mappable folios, falling back to making
      a guess only in case the folio is not "obviously mapped shared".
      
      We can now better detect partially-mappable folios where the first page is
      not mapped as "mapped shared", reducing "false negatives"; but false
      negatives are still possible.
      
      While at it, fixup a wrong comment (false positive vs.  false negative)
      for KSM folios.
      
      Link: https://lkml.kernel.org/r/20240409192301.907377-6-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Richard Chang <richardycc@google.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: track mapcount of large folios in single value · 05c5323b
      David Hildenbrand authored
      Let's track the mapcount of large folios in a single value.  The mapcount
      of a large folio currently corresponds to the sum of the entire mapcount
      and all page mapcounts.
      
      This sum is what we actually want to know in folio_mapcount() and it is
      also sufficient for implementing folio_mapped().
      
      With PTE-mapped THP becoming more important and more widely used, we want
      to avoid looping over all pages of a folio just to obtain the mapcount of
      large folios.  The comment "In the common case, avoid the loop when no
      pages mapped by PTE" in folio_total_mapcount() no longer holds for
      mTHP that are always mapped by PTE.
      
      Further, we are planning on using folio_mapcount() more frequently, and
      might even want to remove page mapcounts for large folios in some kernel
      configs.  Therefore, allow for reading the mapcount of large folios
      efficiently and atomically without looping over any pages.
      
      Maintain the mapcount also for hugetlb pages for simplicity.  Use the new
      mapcount to implement folio_mapcount() and folio_mapped().  Make
      page_mapped() simply call folio_mapped().  We can now get rid of
      folio_large_is_mapped().
      
      _nr_pages_mapped is now only used in rmap code and for debugging purposes.
      Keep folio_nr_pages_mapped() around, but document that its use should be
      limited to rmap internals and debugging purposes.
      
      This change implies one additional atomic add/sub whenever
      mapping/unmapping (parts of) a large folio.
      
      As we now batch RMAP operations for PTE-mapped THP during fork(), during
      unmap/zap, and when PTE-remapping a PMD-mapped THP, and we adjust the
      large mapcount for a PTE batch only once, the added overhead in the common
      case is small.  Only when unmapping individual pages of a large folio
      (e.g., during COW), the overhead might be bigger in comparison, but it's
      essentially one additional atomic operation.
      
      Note that before the new mapcount could overflow, our refcount would
      already overflow: each mapping requires a folio reference.  Extend the
      documentation of folio_mapcount().
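
      A hedged sketch of the resulting readers (the _large_mapcount field name
      is an assumption for this sketch; the helpers are simplified, not the
      verbatim kernel code):

      #include <linux/mm.h>

      static inline int folio_mapcount_sketch(const struct folio *folio)
      {
              if (likely(!folio_test_large(folio)))
                      return atomic_read(&folio->_mapcount) + 1;
              /* one atomic read instead of looping over all subpages;
               * stored off-by-one, like the per-page mapcount */
              return atomic_read(&folio->_large_mapcount) + 1;
      }

      static inline bool folio_mapped_sketch(const struct folio *folio)
      {
              return folio_mapcount_sketch(folio) >= 1;
      }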
      
      Link: https://lkml.kernel.org/r/20240409192301.907377-5-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Richard Chang <richardycc@google.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/rmap: add fast-path for small folios when adding/removing/duplicating · 46d62de7
      David Hildenbrand authored
      Let's add a fast-path for small folios to all relevant rmap functions. 
      Note that only RMAP_LEVEL_PTE applies.
      
      This is a preparation for tracking the mapcount of large folios in a
      single value.
      
      Link: https://lkml.kernel.org/r/20240409192301.907377-4-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Richard Chang <richardycc@google.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/rmap: always inline anon/file rmap duplication of a single PTE · c2e65ebc
      David Hildenbrand authored
      As we grow the code, the compiler might make stupid decisions and
      unnecessarily degrade fork() performance.  Let's make sure to always
      inline functions that operate on a single PTE so the compiler will always
      optimize out the loop and avoid a function call.
      
      This is a preparation for maintaining a total mapcount for large folios.
      
      Link: https://lkml.kernel.org/r/20240409192301.907377-3-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Richard Chang <richardycc@google.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: allow for detecting underflows with page_mapcount() again · 02faa73f
      David Hildenbrand authored
      Patch series "mm: mapcount for large folios + page_mapcount() cleanups".
      
      This series tracks the mapcount of large folios in a single value, so it
      can be read efficiently and atomically, just like the mapcount of small
      folios.
      
      folio_mapcount() is then used in a couple more places, most notably to
      reduce false negatives in folio_likely_mapped_shared(), and many users of
      page_mapcount() are cleaned up (that's maybe why you got CCed on the full
      series, sorry sh+xtensa folks!  :) ).
      
      The remaining s390x user and one KSM user of page_mapcount() are getting
      removed separately on the list right now.  I have patches to handle the
      other KSM one, the khugepaged one and the kpagecount one; as they are not
      as "obvious", I will send them out separately in the future.  Once that is
      all in place, I'm planning on moving page_mapcount() into
      fs/proc/task_mmu.c, the remaining user for the time being (and we can
      discuss at LSF/MM details on that :) ).
      
      I proposed the mapcount for large folios (previously called total
      mapcount) originally in part of [1] and I later included it in [2] where
      it is a requirement.  In the meantime, I changed the patch a bit so I
      dropped all RB's.  During the discussion of [1], Peter Xu correctly raised
      that this additional tracking might affect the performance when PMD->PTE
      remapping THPs.  In the meantime, I addressed that by batching RMAP
      operations during fork(), unmap/zap and when PMD->PTE remapping THPs.
      
      Running some of my micro-benchmarks [3] (fork,munmap,cow-byte,remap) on 1
      GiB of memory backed by folios with the same order, I observe the
      following on an Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz tuned for
      reproducible results as much as possible:
      
      Standard deviation is mostly < 1%, except for order-9, where it's < 2% for
      fork() and munmap().
      
      (1) Small folios are not affected (< 1%) in all 4 microbenchmarks.
      (2) Order-4 folios are not affected (< 1%) in all 4 microbenchmarks. A bit
          weird compared to the other orders ...
      (3) PMD->PTE remapping of order-9 THPs is not affected (< 1%)
      (4) COW-byte (COWing a single page by writing a single byte) is not
          affected for any order (< 1 %). The page copy_fault overhead dominates
          everything.
      (5) fork() is mostly not affected (< 1%), except order-2, where we have
          a slowdown of ~4%. Already for order-3 folios, we're down to a slowdown
          of < 1%.
      (6) munmap() sees a slowdown by < 3% for some orders (order-5,
          order-6, order-9), but less for others (< 1% for order-4 and order-8,
          < 2% for order-2, order-3, order-7).
      
      Especially the fork() and munmap() benchmark are sensitive to each added
      instruction and other system noise, so I suspect some of the change and
      observed weirdness (order-4) is due to code layout changes and other
      factors, but not really due to the added atomics.
      
      So in the common case where we can batch, the added atomics don't really
      make a big difference, especially in light of the recent improvements for
      large folios that we recently gained due to batching.  Surprisingly, for
      some cases where we cannot batch (e.g., COW), the added atomics don't seem
      to matter, because other overhead dominates.
      
      My fork and munmap micro-benchmarks don't cover cases where we cannot
      batch-process bigger parts of large folios.  As this is not the common
      case, I'm not worrying about that right now.
      
      Future work is batching RMAP operations during swapout and folio
      migration.
      
      [1] https://lore.kernel.org/all/20230809083256.699513-1-david@redhat.com/
      [2] https://lore.kernel.org/all/20231124132626.235350-1-david@redhat.com/
      [3] https://gitlab.com/davidhildenbrand/scratchspace/-/raw/main/pte-mapped-folio-benchmarks.c?ref_type=heads
      
      
      This patch (of 18):
      
      Commit 53277bcf126d ("mm: support page_mapcount() on page_has_type()
      pages") made it impossible to detect mapcount underflows by treating any
      negative raw mapcount value as a mapcount of 0.
      
      We perform such underflow checks in zap_present_folio_ptes() and
      zap_huge_pmd(), which would currently no longer trigger.
      
      Let's check against PAGE_MAPCOUNT_RESERVE instead by using
      page_type_has_type(), like page_has_type() would, so we can still catch
      some underflows.
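
      A hedged sketch of the resulting check (simplified; the helper name is
      made up and the real page_mapcount() has extra handling for compound
      pages):

      #include <linux/mm.h>
      #include <linux/page-flags.h>

      static inline int page_mapcount_sketch(struct page *page)
      {
              int raw = atomic_read(&page->_mapcount);
              int mapcount = raw + 1;

              /* typed pages (page_has_type()) read as mapcount 0, but a mere
               * underflow stays negative and can still be caught by callers */
              if (page_type_has_type((unsigned int)raw))
                      mapcount = 0;
              return mapcount;
      }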
      
      [david@redhat.com: make page_mapcount() slightly more efficient]
        Link: https://lkml.kernel.org/r/1af4fd61-7926-47c8-be45-833c0dbec08b@redhat.com
      Link: https://lkml.kernel.org/r/20240409192301.907377-1-david@redhat.com
      Link: https://lkml.kernel.org/r/20240409192301.907377-2-david@redhat.com
      Fixes: 53277bcf126d ("mm: support page_mapcount() on page_has_type() pages")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Richard Chang <richardycc@google.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yin Fengwei <fengwei.yin@intel.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: follow_pte() improvements · c5541ba3
      David Hildenbrand authored
      follow_pte() is now our main function to lookup PTEs in VM_PFNMAP/VM_IO
      VMAs.  Let's perform some more sanity checks to make this exported
      function harder to abuse.
      
      Further, extend the doc a bit, it still focuses on the KVM use case with
      MMU notifiers.  Drop the KVM+follow_pfn() comment, follow_pfn() is no
      more, and we have other users nowadays.
      
      Also extend the doc regarding refcounted pages and the interaction with
      MMU notifiers.
      
      KVM is one example that uses MMU notifiers and can deal with refcounted
      pages properly.  VFIO is one example that doesn't use MMU notifiers, and
      to prevent use-after-free, rejects refcounted pages: pfn_valid(pfn) &&
      !PageReserved(pfn_to_page(pfn)).  Protection changes are less of a concern
      for users like VFIO: the behavior is similar to longterm-pinning a page,
      and getting the PTE protection changed afterwards.
      
      The primary concern with refcounted pages is use-after-free, which callers
      should be aware of.
      
      Link: https://lkml.kernel.org/r/20240410155527.474777-4-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Fei Li <fei1.li@intel.com>
      Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Yonghua Huang <yonghua.huang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: pass VMA instead of MM to follow_pte() · 29ae7d96
      David Hildenbrand authored
      ... and centralize the VM_IO/VM_PFNMAP sanity check in there. We'll
      now also perform these sanity checks for direct follow_pte()
      invocations.
      
      For generic_access_phys(), we might now check multiple times: nothing to
      worry about, really.
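
      A hedged usage sketch under the new calling convention (assuming the
      post-series prototype; the lookup helper itself is made up):

      #include <linux/mm.h>

      static int lookup_pfn_sketch(struct vm_area_struct *vma, unsigned long addr,
                                   unsigned long *pfn)
      {
              spinlock_t *ptl;
              pte_t *ptep;
              int ret;

              /* was follow_pte(vma->vm_mm, ...); the VM_IO/VM_PFNMAP sanity
               * check now happens inside follow_pte() itself */
              ret = follow_pte(vma, addr, &ptep, &ptl);
              if (ret)
                      return ret;

              *pfn = pte_pfn(ptep_get(ptep));
              pte_unmap_unlock(ptep, ptl);
              return 0;
      }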
      
      Link: https://lkml.kernel.org/r/20240410155527.474777-3-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Sean Christopherson <seanjc@google.com>	[KVM]
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Fei Li <fei1.li@intel.com>
      Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Yonghua Huang <yonghua.huang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • drivers/virt/acrn: fix PFNMAP PTE checks in acrn_vm_ram_map() · 3d658600
      David Hildenbrand authored
      Patch series "mm: follow_pte() improvements and acrn follow_pte() fixes".
      
      Patch #1 fixes a bunch of issues I spotted in the acrn driver.  It
      compiles, that's all I know.  I'll appreciate some review and testing from
      acrn folks.
      
      Patch #2+#3 improve follow_pte(), passing a VMA instead of the MM, adding
      more sanity checks, and improving the documentation.  Gave it a quick test
      on x86-64 using VM_PAT that ends up using follow_pte().
      
      
      This patch (of 3):
      
      We currently miss handling various cases, resulting in a dangerous
      follow_pte() (previously follow_pfn()) usage.
      
      (1) We're not checking PTE write permissions.
      
      Maybe we should simply always require pte_write() like we do for
      pin_user_pages_fast(FOLL_WRITE)? Hard to tell, so let's check for
      ACRN_MEM_ACCESS_WRITE for now.
      
      (2) We're not rejecting refcounted pages.
      
      As we are not using MMU notifiers, messing with refcounted pages is
      dangerous and can result in use-after-free. Let's make sure to reject them.
      
      (3) We are only looking at the first PTE of a bigger range.
      
      We only lookup a single PTE, but memmap->len may span a larger area.
      Let's loop over all involved PTEs and make sure the PFN range is
      actually contiguous. Reject everything else: it couldn't have worked
      either way, and rather made us access PFNs we shouldn't be accessing
      (see the sketch below).
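
      A hedged sketch of the combined checks (illustrative only, not the acrn
      patch itself; the helper and its error codes are made up, and it assumes
      the VMA-based follow_pte() prototype from later in this series):

      #include <linux/mm.h>

      static int check_pfnmap_range_sketch(struct vm_area_struct *vma,
                                           unsigned long addr, unsigned long npages,
                                           bool need_write, unsigned long *first_pfn)
      {
              unsigned long i, pfn, expect_pfn = 0;
              spinlock_t *ptl;
              pte_t *ptep;
              pte_t pte;

              for (i = 0; i < npages; i++, addr += PAGE_SIZE) {
                      if (follow_pte(vma, addr, &ptep, &ptl))
                              return -EINVAL;
                      pte = ptep_get(ptep);
                      pfn = pte_pfn(pte);
                      pte_unmap_unlock(ptep, ptl);

                      if (need_write && !pte_write(pte))      /* (1) */
                              return -EFAULT;
                      /* (2) no MMU notifier here, so refuse refcounted pages */
                      if (pfn_valid(pfn) && !PageReserved(pfn_to_page(pfn)))
                              return -EINVAL;
                      if (i == 0)
                              *first_pfn = expect_pfn = pfn;
                      else if (pfn != ++expect_pfn)           /* (3) not contiguous */
                              return -EINVAL;
              }
              return 0;
      }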
      
      Link: https://lkml.kernel.org/r/20240410155527.474777-1-david@redhat.com
      Link: https://lkml.kernel.org/r/20240410155527.474777-2-david@redhat.com
      Fixes: 8a6e85f7 ("virt: acrn: obtain pa from VMA with PFNMAP flag")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Fei Li <fei1.li@intel.com>
      Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Yonghua Huang <yonghua.huang@intel.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>