1. 18 Apr, 2023 40 commits
    • mm: vmscan: refactor updating current->reclaim_state · c7b23b68
      Yosry Ahmed authored
      During reclaim, we keep track of pages reclaimed by means other than
      LRU-based reclaim through scan_control->reclaim_state->reclaimed_slab,
      a pointer to which we stash in the current task_struct.
      
      However, we track more than just reclaimed slab pages through this
      field.  We also use it for clean file pages dropped through pruned
      inodes, and for freed xfs buffer pages.  Rename reclaimed_slab to
      reclaimed, and add a helper function that wraps updating it through
      current, so that future changes to this logic are contained within
      include/linux/swap.h.
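      
      As a rough illustration of the refactoring above, here is a userspace
      model in which every update to the per-task reclaim_state goes through
      one helper.  The names struct task, current_task and
      account_reclaimed_pages() are hypothetical stand-ins for task_struct,
      current and the new helper in include/linux/swap.h; only the
      reclaimed_slab -> reclaimed rename comes from the text.

```c
#include <stddef.h>

/* After the rename: counts every page freed outside LRU-based reclaim
 * (slab pages, clean file pages from pruned inodes, xfs buffer pages). */
struct reclaim_state {
	unsigned long reclaimed;
};

/* Hypothetical stand-in for task_struct, reduced to the one field used. */
struct task {
	struct reclaim_state *reclaim_state;
};

/* Hypothetical stand-in for the kernel's `current`. */
static struct task *current_task;

/*
 * Hypothetical single entry point for updating the per-task counter, so
 * callers no longer touch current->reclaim_state directly.  Tasks that are
 * not reclaiming have no reclaim_state and are ignored.
 */
static void account_reclaimed_pages(unsigned long pages)
{
	if (current_task && current_task->reclaim_state)
		current_task->reclaim_state->reclaimed += pages;
}
```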
      
      Link: https://lkml.kernel.org/r/20230413104034.1086717-4-yosryahmed@google.com
      Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Darrick J. Wong <djwong@kernel.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: NeilBrown <neilb@suse.de>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: vmscan: move set_task_reclaim_state() near flush_reclaim_state() · ef05e689
      Yosry Ahmed authored
      Move set_task_reclaim_state() near flush_reclaim_state() so that all
      helpers manipulating reclaim_state are in close proximity.
      
      Link: https://lkml.kernel.org/r/20230413104034.1086717-3-yosryahmed@google.com
      Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Darrick J. Wong <djwong@kernel.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: NeilBrown <neilb@suse.de>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: vmscan: ignore non-LRU-based reclaim in memcg reclaim · 583c27a1
      Yosry Ahmed authored
      Patch series "Ignore non-LRU-based reclaim in memcg reclaim", v6.
      
      Upon running some proactive reclaim tests using memory.reclaim, we noticed
      some flaky tests where writing to memory.reclaim would succeed even
      though we did not fully reclaim the requested amount.  Looking further
      into it, I discovered that *sometimes* we overestimate the number of
      reclaimed pages in memcg reclaim.
      
      Pages reclaimed by means other than LRU-based reclaim are tracked
      through reclaim_state in struct scan_control, which is stashed in the
      current task_struct.  These pages are added to the number of pages
      reclaimed through the LRUs.  For memcg reclaim, these pages generally
      cannot be linked to the memcg under reclaim and can cause an
      overestimated count of reclaimed pages.  This short series tries to
      address that.
      
      Patch 1 ignores pages reclaimed outside of LRU reclaim in memcg reclaim. 
      The pages are uncharged anyway, so even if we end up under-reporting
      reclaimed pages we will still succeed in making progress during charging.
      
      Patches 2-3 are just refactoring.  Patch 2 moves the
      set_task_reclaim_state() helper next to flush_reclaim_state().  Patch 3
      adds a helper that wraps updating current->reclaim_state, and renames
      reclaim_state->reclaimed_slab to reclaim_state->reclaimed.
      
      
      This patch (of 3):
      
      We keep track of different types of reclaimed pages through
      reclaim_state->reclaimed_slab, and we add them to the reported number of
      reclaimed pages.  For non-memcg reclaim, this makes sense.  For memcg
      reclaim, we have no clue if those pages are charged to the memcg under
      reclaim.
      
      Slab pages are shared by different memcgs, so a freed slab page may have
      been only partially charged to the memcg under reclaim.  The same goes
      for clean file pages from pruned inodes (on highmem systems) or xfs
      buffer pages: there is currently no simple way to link them to the memcg
      under reclaim.
      
      Stop reporting those freed pages as reclaimed pages during memcg reclaim.
      This should make the return value of writing to memory.reclaim more
      accurate, and may help reduce unnecessary reclaim retries during memcg
      charging.  Writing to memory.reclaim on the root memcg counts as
      cgroup_reclaim(), but in this case we want to include any freed pages,
      so use the global_reclaim() check instead of !cgroup_reclaim().
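      
      A minimal sketch of the global_reclaim() versus cgroup_reclaim()
      distinction, assuming a simplified scan_control in which reclaim_state
      is reached through the struct rather than through current.  The
      structure layouts are illustrative, not the kernel's.

```c
#include <stdbool.h>
#include <stddef.h>

struct mem_cgroup { bool is_root; };		/* simplified */
struct reclaim_state { unsigned long reclaimed; };

struct scan_control {
	struct mem_cgroup *target_mem_cgroup;	/* NULL means global reclaim */
	unsigned long nr_reclaimed;		/* pages reclaimed from the LRUs */
	struct reclaim_state *reclaim_state;	/* stands in for current->reclaim_state */
};

/* True whenever a memcg is targeted, including the root memcg. */
static bool cgroup_reclaim(const struct scan_control *sc)
{
	return sc->target_mem_cgroup != NULL;
}

/* True for global reclaim and for reclaim on the root memcg. */
static bool global_reclaim(const struct scan_control *sc)
{
	return !sc->target_mem_cgroup || sc->target_mem_cgroup->is_root;
}

/* Add non-LRU frees only when they cannot be misattributed to a memcg. */
static void flush_reclaim_state(struct scan_control *sc)
{
	if (sc->reclaim_state && global_reclaim(sc)) {
		sc->nr_reclaimed += sc->reclaim_state->reclaimed;
		sc->reclaim_state->reclaimed = 0;
	}
}
```

      With a non-root target, nr_reclaimed keeps only the LRU count.  For the
      root memcg both predicates are true, which is why the check is
      global_reclaim() rather than !cgroup_reclaim().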
      
      Generally, this should make the return value of
      try_to_free_mem_cgroup_pages() more accurate.  In some limited cases (e.g.
      freeing a slab page that was mostly charged to the memcg under reclaim),
      the return value of try_to_free_mem_cgroup_pages() can be underestimated,
      but this should be fine.  The freed pages will be uncharged anyway, and we
      can charge the memcg the next time around, as memcg reclaim usually runs
      in a retry loop.
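      
      Why an underestimate is tolerable can be modelled with a toy charge
      loop: success is decided by re-checking usage against the limit, not
      by trusting the number that reclaim reports.  Everything here (struct
      memcg_model, reclaim_step(), try_charge_model()) is hypothetical and
      far simpler than the real memcg charge path.

```c
#include <stdbool.h>

/* Hypothetical memcg: current usage and hard limit, in pages. */
struct memcg_model {
	unsigned long usage;
	unsigned long limit;
};

/*
 * Hypothetical reclaim step: uncharges `freed` pages but reports only
 * `reported`, modelling pages that were uncharged yet not counted.
 */
static unsigned long reclaim_step(struct memcg_model *m, unsigned long freed,
				  unsigned long reported)
{
	m->usage -= freed < m->usage ? freed : m->usage;
	return reported;
}

/* Hypothetical charge loop that re-checks the limit on every iteration. */
static bool try_charge_model(struct memcg_model *m, unsigned long nr_pages,
			     int max_retries)
{
	for (int i = 0; i <= max_retries; i++) {
		if (m->usage + nr_pages <= m->limit) {
			m->usage += nr_pages;
			return true;
		}
		/* Free 4 pages but report 1: the reported value is not trusted. */
		unsigned long reported = reclaim_step(m, 4, 1);
		(void)reported;
	}
	return false;
}
```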
      
      Link: https://lkml.kernel.org/r/20230413104034.1086717-1-yosryahmed@google.com
      Link: https://lkml.kernel.org/r/20230413104034.1086717-2-yosryahmed@google.com
      Fixes: f2fe7b09 ("mm: memcg/slab: charge individual slab objects instead of pages")
      Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Darrick J. Wong <djwong@kernel.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: NeilBrown <neilb@suse.de>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: apply __must_check to vmap_pages_range_noflush() · d905ae2b
      Alexander Potapenko authored
      To prevent errors when vmap_pages_range_noflush() or
      __vmap_pages_range_noflush() silently fail (see the link below for an
      example), annotate them with __must_check so that the callers do not
      unconditionally assume the mapping succeeded.
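      
      A small userspace sketch of what __must_check buys: the kernel defines
      it as GCC's warn_unused_result attribute, so ignoring the return value
      produces a compile-time warning.  map_pages_noflush() is a hypothetical
      stand-in for vmap_pages_range_noflush().

```c
#include <errno.h>
#include <stddef.h>

/* Same spelling as the kernel's annotation; supported by GCC and Clang. */
#define __must_check __attribute__((__warn_unused_result__))

/* Hypothetical mapper: fails with -ENOMEM when too few pages are available. */
static int __must_check map_pages_noflush(size_t nr_pages, size_t available)
{
	return nr_pages <= available ? 0 : -ENOMEM;
}

/* A caller that propagates the error instead of assuming success. */
static int map_and_use(size_t nr_pages, size_t available)
{
	int err = map_pages_noflush(nr_pages, available);

	if (err)
		return err;	/* never touch a range that was not mapped */
	/* ... use the mapping here ... */
	return 0;
}
```

      With GCC, a bare `map_pages_noflush(4, 8);` statement now triggers
      -Wunused-result.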
      
      Link: https://lkml.kernel.org/r/20230413131223.4135168-4-glider@google.com
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Reported-by: Dipanjan Das <mail.dipanjan.das@gmail.com>
        Link: https://lore.kernel.org/linux-mm/CANX2M5ZRrRA64k0hOif02TjmY9kbbO2aCBPyq79es34RXZ=cAw@mail.gmail.com/
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: kmsan: apply __must_check to non-void functions · bb1508c2
      Alexander Potapenko authored
      Non-void KMSAN hooks may return error codes that indicate that KMSAN
      failed to reflect the changed memory state in the metadata (e.g.  it could
      not create the necessary memory mappings).  In such cases the callers
      should handle the errors to prevent the tool from using the inconsistent
      metadata in the future.
      
      We mark non-void hooks with __must_check so that error handling is not
      skipped.
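      
      The error-handling pattern the annotation enforces can be sketched as
      follows, assuming a hypothetical hook that creates metadata before the
      real mapping is made: if the hook fails, the caller bails out, so the
      mapping never exists without matching metadata.  The types and names
      are illustrative, not KMSAN's API.

```c
#include <errno.h>
#include <stdbool.h>

#define __must_check __attribute__((__warn_unused_result__))

/* Hypothetical state tracking whether metadata mirrors the mapped range. */
struct shadow_model {
	bool can_allocate;	/* can the tool create its metadata mappings? */
	bool metadata_mapped;
	bool range_mapped;
};

/* Hypothetical non-void hook: fails if metadata mappings cannot be created. */
static int __must_check hook_map_metadata(struct shadow_model *s)
{
	if (!s->can_allocate)
		return -ENOMEM;
	s->metadata_mapped = true;
	return 0;
}

/* Caller: only map the real range once its metadata is in place. */
static int map_range(struct shadow_model *s)
{
	int ret = hook_map_metadata(s);

	if (ret)
		return ret;	/* both stay absent, so the state is consistent */
	s->range_mapped = true;
	return 0;
}
```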
      
      Link: https://lkml.kernel.org/r/20230413131223.4135168-3-glider@google.com
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dipanjan Das <mail.dipanjan.das@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: hwpoison: support recovery from HugePage copy-on-write faults · 1cb9dc4b
      Liu Shixin authored
      Copy-on-write of hugetlb user pages with uncorrectable errors will result
      in a kernel crash.  This is because the copy is performed in kernel mode,
      and in general we cannot handle accessing memory with such errors while
      in kernel mode.  Commit a873dfe1 ("mm, hwpoison: try to recover from
      copy-on write faults") introduced the routine copy_mc_user_highpage() to
      gracefully handle copying of user pages with uncorrectable errors.
      However, the separate hugetlb copy-on-write code paths were not modified
      as part of commit a873dfe1.
      
      Modify the hugetlb copy-on-write code paths to use copy_mc_user_highpage()
      so that they can also gracefully handle uncorrectable errors in user
      pages.  This involves changing the hugetlb-specific routine
      copy_user_large_folio() from type void to int so that it can return an
      error.  Modify the hugetlb userfaultfd code in the same way so that it can
      return -EHWPOISON if it encounters an uncorrectable error.
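      
      A userspace model of the void -> int change, assuming a large folio is
      copied one subpage at a time with a machine-check-safe copy that
      reports failure instead of crashing.  The subpage structure, the tiny
      MODEL_PAGE_SIZE and the helper names are hypothetical; only the
      -EHWPOISON return comes from the text.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#ifndef EHWPOISON
#define EHWPOISON 133	/* Linux value; normally provided by <errno.h> */
#endif

#define MODEL_PAGE_SIZE 64	/* tiny pages keep the example small */

/* Hypothetical subpage: data plus a flag marking an uncorrectable error. */
struct subpage {
	bool poisoned;
	unsigned char data[MODEL_PAGE_SIZE];
};

/* Stand-in for a machine-check-safe copy: non-zero means the copy failed. */
static int copy_mc_subpage(struct subpage *dst, const struct subpage *src)
{
	if (src->poisoned)
		return 1;
	memcpy(dst->data, src->data, MODEL_PAGE_SIZE);
	return 0;
}

/* Models copy_user_large_folio() returning an error instead of void. */
static int copy_large_folio_model(struct subpage *dst,
				  const struct subpage *src, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		if (copy_mc_subpage(&dst[i], &src[i]))
			return -EHWPOISON;	/* let the fault path handle it */
	}
	return 0;
}
```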
      
      Link: https://lkml.kernel.org/r/20230413131349.2524210-1-liushixin2@huawei.com
      Signed-off-by: Liu Shixin <liushixin2@huawei.com>
      Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • memcg: page_cgroup_ino() get memcg from the page's folio · ec342603
      Yosry Ahmed authored
      In a kernel with an added WARN_ON_ONCE(PageTail) in page_memcg_check(), we
      observed a warning from page_cgroup_ino() when reading /proc/kpagecgroup.
      This warning was added to catch fragile reads of a page memcg.  Make
      page_cgroup_ino() get memcg from the page's folio using
      folio_memcg_check(): that gives it the correct memcg for each page of a
      folio, so it is the right fix.
      
      Note that page_folio() is racy (the page's folio can change from under
      us), but the entire function is racy and documented as such.
      
      I dithered between the right fix and the safer "fix": it's unlikely but
      conceivable that some userspace has learnt that /proc/kpagecgroup gives no
      memcg on tail pages, and compensates for that in some (racy) way: so
      continuing to give no memcg on tails, without warning, might be safer.
      
      But hwpoison_filter_task(), the only other user of page_cgroup_ino(),
      persuaded me.  It looks as if it currently leaves out tail pages of the
      selected memcg, by mistake: whereas hwpoison_inject() uses compound_head()
      and expects the tails to be included.  So hwpoison testing coverage has
      probably been restricted by the wrong output from page_cgroup_ino() (if
      that memcg filter is used at all): in the short term, it might be safer
      not to enable wider coverage there, but long term we would regret that.
      
      This is based on a patch originally written by Hugh Dickins and retains
      most of the original commit log [1].
      
      The patch was changed to use folio_memcg_check(page_folio(page)) instead
      of page_memcg_check(compound_head(page)) based on discussions with Matthew
      Wilcox, who stated that callers of page_memcg_check() should stop using
      it due to the ambiguity around tail pages; instead, they should use
      folio_memcg_check() and handle tail pages themselves.
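      
      The difference between reading the memcg from the page and from its
      folio can be shown with a toy model in which only the head page
      records the memcg.  struct page here is a hypothetical two-field
      stand-in, not the kernel's struct page or struct folio, and the
      function names are illustrative.

```c
#include <stddef.h>

/* Hypothetical memcg identified by its cgroup inode number. */
struct mem_cgroup { unsigned long ino; };

/* Hypothetical page: only the head page records the memcg, as for folios. */
struct page {
	struct page *head;		/* NULL for a head page, else its head */
	struct mem_cgroup *memcg;	/* only meaningful on the head page */
};

/* Model of page_folio(): map any page to its head page. */
static struct page *page_folio_model(struct page *page)
{
	return page->head ? page->head : page;
}

/* Old behaviour: read the memcg straight from the page, so tails see none. */
static unsigned long page_cgroup_ino_old(struct page *page)
{
	return page->memcg ? page->memcg->ino : 0;
}

/* New behaviour: read it from the folio, so every page of a folio agrees. */
static unsigned long page_cgroup_ino_new(struct page *page)
{
	struct page *folio = page_folio_model(page);

	return folio->memcg ? folio->memcg->ino : 0;
}
```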
      
      Link: https://lkml.kernel.org/r/20230412003451.4018887-1-yosryahmed@google.com
      Link: https://lore.kernel.org/linux-mm/20230313083452.1319968-1-yosryahmed@google.com/ [1]
      Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/hugetlb_vmemmap: rename ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP · 0b376f1e
      Aneesh Kumar K.V authored
      We now use the ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP config option to
      indicate both devdax and hugetlb vmemmap optimization support.  Hence
      rename it to a generic ARCH_WANT_OPTIMIZE_VMEMMAP.
      
      Link: https://lkml.kernel.org/r/20230412050025.84346-2-aneesh.kumar@linux.ibm.com
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Cc: Joao Martins <joao.m.martins@oracle.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Tarun Sahu <tsahu@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/vmemmap/devdax: fix kernel crash when probing devdax devices · 87a7ae75
      Aneesh Kumar K.V authored
      Commit 4917f55b ("mm/sparse-vmemmap: improve memory savings for
      compound devmaps") added support for using optimized vmemmap for devdax
      devices.  But how vmemmap mappings are created is architecture-specific.
      For example, powerpc with hash translation doesn't have vmemmap mappings
      in the init_mm page table; instead, they are bolted entries in the
      hardware page table.
      
      vmemmap_populate_compound_pages(), used by the vmemmap optimization code,
      is not aware of these architecture-specific mappings.  Hence allow
      architectures to opt in to this feature.  I selected the architectures
      supporting the HUGETLB_PAGE_OPTIMIZE_VMEMMAP option as also supporting
      this feature.
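      
      The opt-in can be pictured as a gate in front of the optimized path.
      A minimal sketch, assuming a hypothetical can_optimize_vmemmap() that
      takes the architecture's opt-in as a plain flag plus the device's
      compound-page shift; the real kernel check may consider more
      conditions.

```c
#include <stdbool.h>

/*
 * Hypothetical gate: use compound-page vmemmap optimization only when the
 * architecture opted in (its vmemmap is populated through generic page
 * tables) and the device actually uses compound pages.  Otherwise the
 * caller falls back to populating the full vmemmap.
 */
static bool can_optimize_vmemmap(bool arch_opted_in, unsigned int vmemmap_shift)
{
	return arch_opted_in && vmemmap_shift > 0;
}
```

      On an architecture that has not opted in, such as powerpc with hash
      translation in the crash below, the gate returns false whatever the
      device asks for.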
      
      This patch fixes the below crash on ppc64.
      
      BUG: Unable to handle kernel data access on write at 0xc00c000100400038
      Faulting instruction address: 0xc000000001269d90
      Oops: Kernel access of bad area, sig: 11 [#1]
      LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
      Modules linked in:
      CPU: 7 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc5-150500.34-default+ #2 5c90a668b6bbd142599890245c2fb5de19d7d28a
      Hardware name: IBM,9009-42G POWER9 (raw) 0x4e0202 0xf000005 of:IBM,FW950.40 (VL950_099) hv:phyp pSeries
      NIP:  c000000001269d90 LR: c0000000004c57d4 CTR: 0000000000000000
      REGS: c000000003632c30 TRAP: 0300   Not tainted  (6.3.0-rc5-150500.34-default+)
      MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24842228  XER: 00000000
      CFAR: c0000000004c57d0 DAR: c00c000100400038 DSISR: 42000000 IRQMASK: 0
      ....
      NIP [c000000001269d90] __init_single_page.isra.74+0x14/0x4c
      LR [c0000000004c57d4] __init_zone_device_page+0x44/0xd0
      Call Trace:
      [c000000003632ed0] [c000000003632f60] 0xc000000003632f60 (unreliable)
      [c000000003632f10] [c0000000004c5ca0] memmap_init_zone_device+0x170/0x250
      [c000000003632fe0] [c0000000005575f8] memremap_pages+0x2c8/0x7f0
      [c0000000036330c0] [c000000000557b5c] devm_memremap_pages+0x3c/0xa0
      [c000000003633100] [c000000000d458a8] dev_dax_probe+0x108/0x3e0
      [c0000000036331a0] [c000000000d41430] dax_bus_probe+0xb0/0x140
      [c0000000036331d0] [c000000000cef27c] really_probe+0x19c/0x520
      [c000000003633260] [c000000000cef6b4] __driver_probe_device+0xb4/0x230
      [c0000000036332e0] [c000000000cef888] driver_probe_device+0x58/0x120
      [c000000003633320] [c000000000cefa6c] __device_attach_driver+0x11c/0x1e0
      [c0000000036333a0] [c000000000cebc58] bus_for_each_drv+0xa8/0x130
      [c000000003633400] [c000000000ceefcc] __device_attach+0x15c/0x250
      [c0000000036334a0] [c000000000ced458] bus_probe_device+0x108/0x110
      [c0000000036334f0] [c000000000ce92dc] device_add+0x7fc/0xa10
      [c0000000036335b0] [c000000000d447c8] devm_create_dev_dax+0x1d8/0x530
      [c000000003633640] [c000000000d46b60] __dax_pmem_probe+0x200/0x270
      [c0000000036337b0] [c000000000d46bf0] dax_pmem_probe+0x20/0x70
      [c0000000036337d0] [c000000000d2279c] nvdimm_bus_probe+0xac/0x2b0
      [c000000003633860] [c000000000cef27c] really_probe+0x19c/0x520
      [c0000000036338f0] [c000000000cef6b4] __driver_probe_device+0xb4/0x230
      [c000000003633970] [c000000000cef888] driver_probe_device+0x58/0x120
      [c0000000036339b0] [c000000000cefd08] __driver_attach+0x1d8/0x240
      [c000000003633a30] [c000000000cebb04] bus_for_each_dev+0xb4/0x130
      [c000000003633a90] [c000000000cee564] driver_attach+0x34/0x50
      [c000000003633ab0] [c000000000ced878] bus_add_driver+0x218/0x300
      [c000000003633b40] [c000000000cf1144] driver_register+0xa4/0x1b0
      [c000000003633bb0] [c000000000d21a0c] __nd_driver_register+0x5c/0x100
      [c000000003633c10] [c00000000206a2e8] dax_pmem_init+0x34/0x48
      [c000000003633c30] [c0000000000132d0] do_one_initcall+0x60/0x320
      [c000000003633d00] [c0000000020051b0] kernel_init_freeable+0x360/0x400
      [c000000003633de0] [c000000000013764] kernel_init+0x34/0x1d0
      [c000000003633e50] [c00000000000de14] ret_from_kernel_thread+0x5c/0x64
      
      Link: https://lkml.kernel.org/r/20230411142214.64464-1-aneesh.kumar@linux.ibm.com
      Fixes: 4917f55b ("mm/sparse-vmemmap: improve memory savings for compound devmaps")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Reported-by: Tarun Sahu <tsahu@linux.ibm.com>
      Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: add uffdio register ioctls test · 43759d44
      Peter Xu authored
      This new test checks the ioctls returned by UFFDIO_REGISTER in
      uffdio_register.ioctls.
      
      It also tests the expected failure cases of UFFDIO_REGISTER, namely:
      
        - Register with empty mode should fail with -EINVAL
        - Register minor without page cache (anon) should fail with -EINVAL
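      
      A standalone sketch of the first failure case, assuming a Linux host
      with <linux/userfaultfd.h>.  It reports a skip when userfaultfd is
      unavailable or not permitted.  The CHECK_* result codes and the helper
      name are hypothetical; this is not the selftest's code.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

enum { CHECK_PASS = 0, CHECK_FAIL = 1, CHECK_SKIP = 2 };

/* CHECK_PASS if UFFDIO_REGISTER with an empty mode fails with EINVAL. */
static int check_empty_mode_register(void)
{
	long psize = sysconf(_SC_PAGESIZE);
	long fd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
	struct uffdio_api api = { .api = UFFD_API, .features = 0 };
	int result = CHECK_SKIP;

	if (fd < 0)
		return CHECK_SKIP;	/* no syscall, or not permitted */
	if (ioctl((int)fd, UFFDIO_API, &api) == 0) {
		void *area = mmap(NULL, (size_t)psize, PROT_READ | PROT_WRITE,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (area != MAP_FAILED) {
			struct uffdio_register reg = {
				.range = { .start = (unsigned long)area,
					   .len = (unsigned long)psize },
				.mode = 0,	/* empty mode: must be rejected */
			};
			int ret = ioctl((int)fd, UFFDIO_REGISTER, &reg);

			result = (ret == -1 && errno == EINVAL) ? CHECK_PASS
							       : CHECK_FAIL;
			munmap(area, (size_t)psize);
		}
	}
	close((int)fd);
	return result;
}
```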
      
      Link: https://lkml.kernel.org/r/20230412164548.329376-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: add shmem-private test to uffd-stress · 5aec236f
      Peter Xu authored
      The userfaultfd stress test never tested private shmem, which I think was
      an oversight long overdue for fixing.  Add it so that it matches the uffd
      unit test and covers all memory supported by the three memory types.
      
      Meanwhile, rename the memory types slightly.  Since shared memory is the
      major use case for both shmem and hugetlbfs, change from:
      
        anon, hugetlb, hugetlb_shared, shmem
      
      To (with shmem-private added):
      
        anon, hugetlb, hugetlb-private, shmem, shmem-private
      
      Add shmem-private to run_vmtests.sh too.
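      
      After the rename, each name implies whether the test mapping is shared
      or private.  A sketch of that naming as a lookup table, assuming anon is
      mapped privately and the plain shmem/hugetlb names map to MAP_SHARED;
      the struct and helper are hypothetical, not the selftest's own.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical description of each memory type after the rename. */
struct mem_type {
	const char *name;
	bool hugetlb;
	int share_flag;	/* MAP_SHARED or MAP_PRIVATE for the test mapping */
};

static const struct mem_type mem_types[] = {
	{ "anon",            false, MAP_PRIVATE },
	{ "hugetlb",         true,  MAP_SHARED  },	/* shared is the default */
	{ "hugetlb-private", true,  MAP_PRIVATE },
	{ "shmem",           false, MAP_SHARED  },
	{ "shmem-private",   false, MAP_PRIVATE },	/* newly covered */
};

/* Look up a memory type by name; NULL for unknown (e.g. old) names. */
static const struct mem_type *find_mem_type(const char *name)
{
	for (size_t i = 0; i < sizeof(mem_types) / sizeof(mem_types[0]); i++)
		if (strcmp(mem_types[i].name, name) == 0)
			return &mem_types[i];
	return NULL;
}
```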
      
      Link: https://lkml.kernel.org/r/20230412164546.329355-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: drop sys/dev test in uffd-stress test · 111fd29b
      Peter Xu authored
      With the new uffd unit test covering the /dev/userfaultfd path and syscall
      path of uffd initializations, we can safely drop the devnode test in the
      old stress test.
      
      One benefit is avoiding running the stress test twice, which is overkill
      just to test the /dev/ interface in run_vmtests.sh.
      
      The other benefit is that all uffd tests (that use userfaultfd_open) can
      now run automatically as long as either interface (syscall or dev) is
      enabled, so they are more likely to succeed rather than fail due to lack
      of privilege.
      
      With this patch landed, we can drop all the "mem_type:XXX" handling too.
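      
      A sketch of trying both interfaces, in the spirit of the selftests'
      userfaultfd_open() but not its actual code: the syscall first, then
      /dev/userfaultfd with USERFAULTFD_IOC_NEW where the installed headers
      provide it.  open_uffd_any() is a hypothetical name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns a userfaultfd, or -errno from the last
 * attempt when neither the syscall nor /dev/userfaultfd is usable.
 */
static int open_uffd_any(void)
{
	int err = ENOSYS;
#ifdef __NR_userfaultfd
	long fd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);

	if (fd >= 0)
		return (int)fd;
	err = errno;
#endif
#ifdef USERFAULTFD_IOC_NEW
	int dev = open("/dev/userfaultfd", O_RDWR | O_CLOEXEC);

	if (dev >= 0) {
		int ufd = ioctl(dev, USERFAULTFD_IOC_NEW, O_CLOEXEC | O_NONBLOCK);
		int saved = errno;

		close(dev);
		if (ufd >= 0)
			return ufd;
		err = saved;
	} else {
		err = errno;
	}
#endif
	return -err;
}
```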
      
      Link: https://lkml.kernel.org/r/20230412164525.329176-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: allow uffd test to skip properly with no privilege · f9da2426
      Peter Xu authored
      Allow a unit test to be skipped properly when it lacks privilege (e.g.
      the sigbus and events tests).
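      
      One way to express "skip on missing privilege" is to map the errno of
      a failed setup step to a kselftest result code.  The KSFT_* values
      below match tools/testing/selftests/kselftest.h; the helper and the
      exact set of errnos treated as a skip are assumptions.

```c
#include <errno.h>

/* kselftest result codes. */
#define KSFT_PASS 0
#define KSFT_FAIL 1
#define KSFT_SKIP 4

/*
 * Hypothetical mapping from a setup step's errno to a test result: lack of
 * privilege or a missing interface means "skip", anything else "fail".
 */
static int result_for_setup_errno(int err)
{
	switch (err) {
	case 0:
		return KSFT_PASS;
	case EPERM:
	case EACCES:
	case ENOSYS:
	case ENOENT:
		return KSFT_SKIP;
	default:
		return KSFT_FAIL;
	}
}
```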
      
      [colin.i.king@gmail.com: fix spelling mistake "priviledge" -> "privilege"]
        Link: https://lkml.kernel.org/r/20230414081506.1678998-1-colin.i.king@gmail.com
      Link: https://lkml.kernel.org/r/20230412164520.329163-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: workaround no way to detect uffd-minor + wp · 4df9cefa
      Peter Xu authored
      Userfaultfd minor+wp mode was added very recently.  The test will fail
      mysteriously on old kernels at ioctl(UFFDIO_CONTINUE).  Unfortunately,
      there's no feature bit to detect this support.
      
      Add a hack that leverages WP_UNPOPULATED to detect whether the feature
      exists, since WP_UNPOPULATED was merged right after minor+wp.
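      
      A sketch of the proxy check, assuming the caller already has the
      feature mask returned by UFFDIO_API.  Requiring MINOR_SHMEM as well is
      an illustrative choice; the selftest's own condition may differ.  On
      headers that lack either macro, the helper conservatively reports no
      support.

```c
#include <stdbool.h>
#include <stdint.h>
#include <linux/userfaultfd.h>

/*
 * No feature bit advertises minor+wp, so use WP_UNPOPULATED (merged right
 * after it) as a proxy, together with minor-fault support for shmem.
 */
static bool minor_wp_supported(uint64_t features)
{
#if defined(UFFD_FEATURE_MINOR_SHMEM) && defined(UFFD_FEATURE_WP_UNPOPULATED)
	return (features & UFFD_FEATURE_MINOR_SHMEM) &&
	       (features & UFFD_FEATURE_WP_UNPOPULATED);
#else
	(void)features;	/* headers too old to describe either feature */
	return false;
#endif
}
```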
      
      Link: https://lkml.kernel.org/r/20230412164517.329152-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: move zeropage test into uffd unit tests · c3315502
      Peter Xu authored
      Simplify it a bit along the way, e.g. drop the never-used offset field
      (it was always the first page, so offset=0).
      
      Factor uffd_register_with_ioctls() out of uffd_register() so that the
      ioctls returned in uffdio_register.ioctls can be checked.  Check them
      automatically when testing UFFDIO_ZEROPAGE on different types of memory
      (and kernels).
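      
      A sketch of checking the returned ioctls bitmask, assuming (as an
      illustration) that a MISSING-mode registration advertises
      UFFDIO_ZEROPAGE for anon and shmem but not for hugetlb.  The helper
      name is hypothetical; _UFFDIO_ZEROPAGE is the bit number from
      <linux/userfaultfd.h>.

```c
#include <stdbool.h>
#include <stdint.h>
#include <linux/userfaultfd.h>

/* True if UFFDIO_ZEROPAGE is advertised exactly when it is expected. */
static bool zeropage_ioctl_as_expected(uint64_t ioctls, bool is_hugetlb)
{
	bool has_zeropage = ioctls & ((uint64_t)1 << _UFFDIO_ZEROPAGE);

	return has_zeropage == !is_hugetlb;
}
```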
      
      Link: https://lkml.kernel.org/r/20230412164404.328815-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: move uffd sig/events tests into uffd unit tests · 73c1ea93
      Peter Xu authored
      Move the two tests into the unit test, and convert them into 20
      standalone tests:
      
        - events test on all 5 mem types, with wp on/off
        - signal test on all 5 mem types, with wp on/off
      
        Testing sigbus on anon... done
        Testing sigbus on shmem... done
        Testing sigbus on shmem-private... done
        Testing sigbus on hugetlb... done
        Testing sigbus on hugetlb-private... done
        Testing sigbus-wp on anon... done
        Testing sigbus-wp on shmem... done
        Testing sigbus-wp on shmem-private... done
        Testing sigbus-wp on hugetlb... done
        Testing sigbus-wp on hugetlb-private... done
        Testing events on anon... done
        Testing events on shmem... done
        Testing events on shmem-private... done
        Testing events on hugetlb... done
        Testing events on hugetlb-private... done
        Testing events-wp on anon... done
        Testing events-wp on shmem... done
        Testing events-wp on shmem-private... done
        Testing events-wp on hugetlb... done
        Testing events-wp on hugetlb-private... done
      
      This also removes a lot of global references along the way, e.g.
      test_uffdio_wp is replaced with the wp value passed in.
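      
      The 20 combinations come from two tests times wp on/off times five
      memory types.  A small generator, with hypothetical names, reproduces
      the order of the log above:

```c
#include <stdio.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static const char *const mem_names[] = {
	"anon", "shmem", "shmem-private", "hugetlb", "hugetlb-private",
};
static const char *const test_names[] = { "sigbus", "events" };

/*
 * Write one "<test>[-wp] on <mem>" name per combination into out (at most
 * max entries) and return the total number of combinations.
 */
static int enumerate_tests(char out[][64], int max)
{
	int n = 0;

	for (size_t t = 0; t < ARRAY_SIZE(test_names); t++)
		for (int wp = 0; wp <= 1; wp++)
			for (size_t m = 0; m < ARRAY_SIZE(mem_names); m++) {
				if (n < max)
					snprintf(out[n], 64, "%s%s on %s",
						 test_names[t], wp ? "-wp" : "",
						 mem_names[m]);
				n++;
			}
	return n;
}
```

      Each generated name matches one "Testing ... done" line above, without
      the surrounding words.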
      
      Link: https://lkml.kernel.org/r/20230412164400.328798-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: move uffd minor test to unit test · 62515b5f
      Peter Xu authored
      This moves the minor test to the new unit test.
      
      Rewrite the content check with char* operations to avoid fiddling with
      my_bcmp().
      
      Drop the global variables test_uffdio_minor and test_collapse; for now,
      just assume they are always tested in the common code.
      
      Meanwhile, split this single test into five tests:
      
        - minor test on [shmem, hugetlb] with wp=false
        - minor test on [shmem, hugetlb] with wp=true
        - minor test + collapse on shmem only
      
      One thing to mention is that we used to test COLLAPSE+WP, but that doesn't
      sound right at all.  It may have been silently broken without anyone
      noticing, because COLLAPSE is not part of the default test suite.
      
      Make the MADV_COLLAPSE test tolerate failure (by skipping it when it
      fails), because it's not guaranteed to succeed anyway.
      
      Drop a bunch of useless code after the move, because the unit test always
      uses an aligned number of pages and has nothing to do with n_cpus.
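      
      A minimal sketch of the skip-on-failure behaviour for MADV_COLLAPSE,
      assuming a kernel and libc that define it; with older headers the
      helper simply skips.  try_collapse() is a hypothetical name and the
      KSFT_* values are the kselftest result codes.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

#define KSFT_PASS 0
#define KSFT_SKIP 4

/* MADV_COLLAPSE is best effort, so a failure is reported as a skip. */
static int try_collapse(void *addr, size_t len)
{
#ifdef MADV_COLLAPSE
	if (madvise(addr, len, MADV_COLLAPSE) == 0)
		return KSFT_PASS;
#else
	(void)addr;
	(void)len;
#endif
	return KSFT_SKIP;
}
```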
      
      Link: https://lkml.kernel.org/r/20230412164357.328779-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: move uffd pagemap test to unit test · 8bda424f
      Peter Xu authored
      Move it over and split it into two tests: one for pagemap and a separate
      one for the new WP_UNPOPULATED feature.
      
      The thp pagemap test wasn't really working (with MADV_HUGEPAGE).  Let's
      just drop it (since it never really worked anyway) and leave that for
      later.
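      
      Both tests read /proc/<pid>/pagemap.  A sketch of decoding the relevant
      bits follows; the bit numbers are those documented in the kernel's
      pagemap documentation, while the helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Bit numbers from the kernel's pagemap documentation. */
#define PM_UFFD_WP_BIT	57	/* pte is uffd-wp write-protected */
#define PM_SWAP_BIT	62	/* page is swapped */
#define PM_PRESENT_BIT	63	/* page is present */

static bool pm_bit(uint64_t entry, int bit)
{
	return (entry >> bit) & 1;
}

/* Read the 64-bit pagemap entry for addr from an open pagemap fd. */
static int pagemap_read_entry(int fd, const void *addr, uint64_t *entry)
{
	uintptr_t psize = (uintptr_t)sysconf(_SC_PAGESIZE);
	off_t off = (off_t)(((uintptr_t)addr / psize) * sizeof(*entry));

	return pread(fd, entry, sizeof(*entry), off) == (ssize_t)sizeof(*entry)
		       ? 0 : -1;
}
```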
      
      Link: https://lkml.kernel.org/r/20230412164352.328733-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: add framework for uffd-unit-test · 16a45b57
      Peter Xu authored
      Add a framework in preparation for moving unit tests from uffd-stress.c
      into uffd-unit-tests.c.  The goal is to allow detection of uffd features
      for each test, and to loop over the types of memory that each test
      supports.
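      
      A hypothetical shape for such a framework: each test declares the
      memory types it supports and the uffd features it requires, and the
      runner loops over memory types, skipping when the kernel lacks a
      feature.  The struct, enum and function names are illustrative, not
      the ones in uffd-unit-tests.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory-type bits a test can declare support for. */
enum {
	MEM_ANON = 1 << 0,
	MEM_SHMEM = 1 << 1,
	MEM_HUGETLB = 1 << 2,
};

typedef bool (*test_fn)(unsigned int mem_type);

/* Hypothetical descriptor: what a test needs and which memory it covers. */
struct unit_test {
	const char *name;
	test_fn fn;
	unsigned int mem_targets;	/* bitmask of MEM_* */
	uint64_t features_required;	/* uffd feature bits */
};

struct run_stats { int pass, fail, skip; };

/* Example test body that always passes. */
static bool always_pass(unsigned int mem_type)
{
	(void)mem_type;
	return true;
}

/* Run each test on every memory type it supports; skip on missing features. */
static struct run_stats run_tests(const struct unit_test *tests, size_t n,
				  uint64_t kernel_features)
{
	struct run_stats st = { 0, 0, 0 };

	for (size_t i = 0; i < n; i++) {
		for (unsigned int mem = MEM_ANON; mem <= MEM_HUGETLB; mem <<= 1) {
			if (!(tests[i].mem_targets & mem))
				continue;
			if ((kernel_features & tests[i].features_required) !=
			    tests[i].features_required) {
				st.skip++;
				continue;
			}
			if (tests[i].fn(mem))
				st.pass++;
			else
				st.fail++;
		}
	}
	return st;
}
```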
      
      Link: https://lkml.kernel.org/r/20230412164348.328710-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      16a45b57
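      A minimal sketch of such a framework, assuming hypothetical names
      (struct uffd_test_case, the MEM_* bits, FEAT_WP_UNPOPULATED) that are not
      taken from the actual uffd-unit-tests.c: each case declares the memory
      types it supports and the features it requires, and the runner loops
      over the supported types and skips a case when a required feature was
      not detected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical memory-type bits; each test lists the types it supports. */
#define MEM_ANON     (1u << 0)
#define MEM_SHMEM    (1u << 1)
#define MEM_HUGETLB  (1u << 2)

/* Hypothetical feature bit standing in for some UFFD_FEATURE_* flag. */
#define FEAT_WP_UNPOPULATED  (1ULL << 0)

static const struct {
	unsigned int bit;
	const char *name;
} mem_types[] = {
	{ MEM_ANON, "anon" },
	{ MEM_SHMEM, "shmem" },
	{ MEM_HUGETLB, "hugetlb" },
};

struct uffd_test_case {
	const char *name;
	int (*fn)(const char *mem_name);   /* 0 = pass, nonzero = fail */
	unsigned int mem_targets;          /* MEM_* bits this test supports */
	uint64_t features_required;        /* features that must be detected */
};

struct results {
	int pass, skip, fail;
};

/* Run each case on each memory type it supports; skip if features are missing. */
static struct results run_tests(const struct uffd_test_case *tests, size_t n,
				uint64_t features_detected)
{
	struct results r = { 0, 0, 0 };

	for (size_t i = 0; i < n; i++) {
		const struct uffd_test_case *t = &tests[i];

		for (size_t j = 0; j < sizeof(mem_types) / sizeof(mem_types[0]); j++) {
			if (!(t->mem_targets & mem_types[j].bit))
				continue;
			printf("Testing %s on %s... ", t->name, mem_types[j].name);
			if ((t->features_required & features_detected) != t->features_required) {
				printf("skipped [reason: feature missing]\n");
				r.skip++;
			} else if (t->fn(mem_types[j].name) == 0) {
				printf("done\n");
				r.pass++;
			} else {
				printf("failed\n");
				r.fail++;
			}
		}
	}
	printf("Userfaults unit tests: pass=%d, skip=%d, fail=%d (total=%d)\n",
	       r.pass, r.skip, r.fail, r.pass + r.skip + r.fail);
	return r;
}

/* Placeholder test body so the runner can be exercised on its own. */
static int always_pass(const char *mem_name)
{
	(void)mem_name;
	return 0;
}

static const struct uffd_test_case example_tests[] = {
	{ "zeropage", always_pass, MEM_ANON | MEM_SHMEM | MEM_HUGETLB, 0 },
	{ "wp-unpopulated", always_pass, MEM_ANON, FEAT_WP_UNPOPULATED },
};
```

      With no optional features detected, run_tests(example_tests, 2, 0)
      passes zeropage on all three types and skips wp-unpopulated, which is
      how one binary can report skips on older kernels instead of failing.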
    • Peter Xu
      selftests/mm: allow allocate_area() to fail properly · be39fec4
      Peter Xu authored
      Mostly to detect hugetlb allocation errors and skip hugetlb tests when
      pages are not allocated.
      
      Link: https://lkml.kernel.org/r/20230412164345.328659-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      be39fec4
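      A minimal sketch of what "fail properly" can mean, assuming a
      hypothetical helper alloc_hugetlb_area() rather than the real
      allocate_area(): mmap() failure is reported as a negative errno instead
      of exiting, so the caller can turn it into a skip (e.g. when
      vm.nr_hugepages is 0).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map a private hugetlb area of @len bytes.
 * Returns 0 and stores the mapping in *area, or -errno on failure.
 */
static int alloc_hugetlb_area(void **area, size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

	if (p == MAP_FAILED) {
		*area = NULL;
		return -errno;
	}
	*area = p;
	return 0;
}

enum test_result { TEST_PASS, TEST_SKIP, TEST_FAIL };

/* Hypothetical caller: an allocation failure becomes a skip, not an abort. */
static enum test_result run_hugetlb_test(size_t len)
{
	void *area;

	if (alloc_hugetlb_area(&area, len) < 0)
		return TEST_SKIP;
	/* ... the real test body would run against @area here ... */
	munmap(area, len);
	return TEST_PASS;
}
```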
    • Peter Xu
      selftests/mm: let uffd_handle_page_fault() take wp parameter · 0210c43e
      Peter Xu authored
      Make the handler optionally apply the WP bit when resolving either
      missing or minor page faults.  This is a step towards removing the
      global test_uffdio_wp from the common code.
      
      Link: https://lkml.kernel.org/r/20230412164341.328618-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      0210c43e
    • Peter Xu
      selftests/mm: rename uffd_stats to uffd_args · 50834084
      Peter Xu authored
      Prepare for adding more fields into the struct.
      
      Link: https://lkml.kernel.org/r/20230412164337.328607-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Suggested-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      50834084
    • Peter Xu
      selftests/mm: drop global hpage_size in uffd tests · 265818ef
      Peter Xu authored
      hpage_size was used inconsistently: sometimes it meant the default
      hugetlb page size, and sometimes it was used as the THP size.
      
      Remove the global variable and use the right size in each place.
      
      Link: https://lkml.kernel.org/r/20230412164333.328596-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      265818ef
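      The THP size and the default hugetlb size come from different places.
      A minimal sketch for the THP side, assuming a hypothetical helper name
      thp_pmd_size(): it reads the PMD-sized THP size from
      /sys/kernel/mm/transparent_hugepage/hpage_pmd_size, whereas the default
      hugetlb size is the "Hugepagesize:" line of /proc/meminfo.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: return the THP (PMD-sized) page size in bytes,
 * or 0 if THP is unavailable. Not the same as the default hugetlb size.
 */
static unsigned long thp_pmd_size(void)
{
	unsigned long size = 0;
	FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/hpage_pmd_size", "r");

	if (!f)
		return 0;
	if (fscanf(f, "%lu", &size) != 1)
		size = 0;
	fclose(f);
	return size;
}
```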
    • Peter Xu
      selftests/mm: drop global mem_fd in uffd tests · c5cb9036
      Peter Xu authored
      Drop it by creating the memfd dynamically in the tests.
      
      Link: https://lkml.kernel.org/r/20230412164331.328584-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      c5cb9036
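      A minimal sketch of creating the shmem backing per test instead of
      keeping a global fd, assuming a hypothetical helper
      create_shmem_alias() (not the selftest's actual code): memfd_create()
      gives an anonymous shmem file, ftruncate() sizes it, and mapping it
      twice with MAP_SHARED yields two views of the same pages, so a write
      through one view is visible through the other.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a memfd of @len bytes and map it twice.
 * Returns the memfd (caller closes it) or -1 on failure.
 */
static int create_shmem_alias(size_t len, char **area, char **alias)
{
	int fd = memfd_create("uffd-test", 0);

	if (fd < 0)
		return -1;
	if (ftruncate(fd, len) < 0)
		goto err_close;
	*area = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (*area == MAP_FAILED)
		goto err_close;
	*alias = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (*alias == MAP_FAILED) {
		munmap(*area, len);
		goto err_close;
	}
	return fd;

err_close:
	close(fd);
	return -1;
}
```

      Because the fd is handed back to the caller, each test owns and closes
      its own memfd rather than sharing one global descriptor.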
    • Peter Xu
      selftests/mm: UFFDIO_API test · d5433ce8
      Peter Xu authored
      Add a simple test for UFFDIO_API.  Along the way, also add a bunch of
      small but handy helpers.
      
      Link: https://lkml.kernel.org/r/20230412164257.328375-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      d5433ce8
    • Peter Xu
      selftests/mm: uffd_open_{dev|sys}() · 78391f64
      Peter Xu authored
      Provide two helpers to open a uffd handle.  Drop the error checks around
      SKIPs: they sit inside an errexit() anyway, and IMHO distinguishing them
      doesn't help much if the test will not continue.
      
      Link: https://lkml.kernel.org/r/20230412164254.328335-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      78391f64
    • Peter Xu
      selftests/mm: uffd_[un]register() · c4277cb6
      Peter Xu authored
      Add two helpers to register and unregister a range with a uffd, and use
      them to remove duplicated code.
      
      This patch also drops assert_expected_ioctls_present() and
      get_expected_ioctls().  Reasons:
      
        - Passing test_type==HUGETLB into it from the callers would take a lot
          of effort, so dropping it is the simplest way to get rid of another
          global variable.
      
        - The ioctls returned by UFFDIO_REGISTER are hardly useful, because any
          app can already detect kernel support for an ioctl via its
          corresponding UFFD_FEATURE_*.  The check here is mostly for sanity,
          and likely no user app will ever use it.
      
        - It works against a future goal of running the uffd tests on old
          kernels: get_expected_ioctls() compiles against
          UFFD_API_RANGE_IOCTLS, a value that depends on where the test is
          compiled rather than on what the running kernel supports.  As a
          result it would report spurious failures on old kernels, which is
          not what we want.
      
      So let's make our lives easier.
      
      [peterx@redhat.com; tools/testing/selftests/mm/hugepage-mremap.c: add headers]
        Link: https://lkml.kernel.org/r/ZDxrvZh/cw357D8P@x1n
      Link: https://lkml.kernel.org/r/20230412164247.328293-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      c4277cb6
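      A minimal sketch of such a helper pair, assuming the standard
      <linux/userfaultfd.h> UAPI (struct uffdio_register, struct uffdio_range,
      UFFDIO_REGISTER, UFFDIO_UNREGISTER).  The function names and bool
      parameters are illustrative and may differ from the selftest's actual
      signatures.  Errors come back as -errno so callers can choose between
      skipping and failing, and the ioctls field filled in by the kernel is
      deliberately ignored.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

#ifndef UFFDIO_REGISTER_MODE_MINOR
/* Assumption: older UAPI headers lack this bit; newer ones define it as 1 << 2. */
#define UFFDIO_REGISTER_MODE_MINOR ((__u64)1 << 2)
#endif

/* Build the UFFDIO_REGISTER mode from the requested fault types. */
static uint64_t uffd_register_mode(bool miss, bool wp, bool minor)
{
	uint64_t mode = 0;

	if (miss)
		mode |= UFFDIO_REGISTER_MODE_MISSING;
	if (wp)
		mode |= UFFDIO_REGISTER_MODE_WP;
	if (minor)
		mode |= UFFDIO_REGISTER_MODE_MINOR;
	return mode;
}

/* Illustrative helper: register [addr, addr + len) with @uffd; 0 or -errno. */
static int uffd_register_range(int uffd, void *addr, uint64_t len,
			       bool miss, bool wp, bool minor)
{
	struct uffdio_register reg = {
		.range = { .start = (uint64_t)(uintptr_t)addr, .len = len },
		.mode = uffd_register_mode(miss, wp, minor),
	};

	if (ioctl(uffd, UFFDIO_REGISTER, &reg) == -1)
		return -errno;
	return 0;
}

/* Illustrative helper: unregister [addr, addr + len) from @uffd; 0 or -errno. */
static int uffd_unregister_range(int uffd, void *addr, uint64_t len)
{
	struct uffdio_range range = {
		.start = (uint64_t)(uintptr_t)addr,
		.len = len,
	};

	if (ioctl(uffd, UFFDIO_UNREGISTER, &range) == -1)
		return -errno;
	return 0;
}
```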
    • Peter Xu
      selftests/mm: split uffd tests into uffd-stress and uffd-unit-tests · 686a8bb7
      Peter Xu authored
      In many ways it's awkward and unwanted to keep all the tests in the same
      userfaultfd.c, at least in its current form.
      
      For example, it doesn't make much sense to run the stress test once for
      each way of creating a userfaultfd handle (via the syscall or the /dev/
      node).  Running the whole stress test twice wastes time, since the
      stress paths are the same and only the open path differs.
      
      It's also odd to have to specify each memory type manually to run all
      the unit tests for the userfaultfd interface.  We should be able to run
      a single program that goes through all functional uffd tests without
      running the stress test at all.  The stress test is meant for torturing
      the code and finding race conditions; we don't want to wait for it to
      finish just to regression-test a functional test.
      
      As more things pile up on top of the same file and the same functions,
      things get a bit chaotic, and the tons of global variables make the code
      harder to maintain.
      
      This patch creates a new test, uffd-unit-tests, to hold userfaultfd unit
      tests in the future; it is currently empty.
      
      Meanwhile, rename the old userfaultfd.c test to uffd-stress.c.
      
      Link: https://lkml.kernel.org/r/20230412164244.328270-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      686a8bb7
    • Peter Xu
      selftests/mm: create uffd-common.[ch] · 33be4e89
      Peter Xu authored
      Move common utility functions from the original userfaultfd.c into
      uffd-common.[ch].  This prepares for splitting userfaultfd.c into two
      tests: one covering only the old but powerful stress test, the other
      covering all the functional tests.
      
      The move is a brute-force effort for now, with light touch-ups, and
      nothing should really change.  There are chances to optimize further,
      but let's leave that for later.
      
      Link: https://lkml.kernel.org/r/20230412164241.328259-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      33be4e89
    • Peter Xu
      selftests/mm: drop test_uffdio_zeropage_eexist · 618aeb5d
      Peter Xu authored
      The idea was to flip this variable in the alarm handler from time to
      time to test the -EEXIST path of UFFDIO_ZEROPAGE.  However, it's only
      used in the zeropage test, so it's probably only checked once, and since
      we pass "retry==false" that path never gets tested anyway.
      
      Drop both sides so that we always test UFFDIO_ZEROPAGE retries if
      has_zeropage is set (!hugetlb).
      
      We also need to do UFFDIO_REGISTER for the alias buffer, because
      otherwise the test won't even pass!  We were just lucky that this test
      never actually ran.
      
      Link: https://lkml.kernel.org/r/20230412164238.328238-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      618aeb5d
    • Peter Xu
      selftests/mm: test UFFDIO_ZEROPAGE only when !hugetlb · 4af9ff29
      Peter Xu authored
      Make the check as simple as "test_type == TEST_HUGETLB", because hugetlb
      is the only memory type that doesn't support ZEROPAGE.
      
      Link: https://lkml.kernel.org/r/20230412164234.328168-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      4af9ff29
    • Peter Xu
      selftests/mm: reuse pagemap_get_entry() in vm_util.h · 366e93c4
      Peter Xu authored
      Meanwhile drop pagemap_read_vaddr().
      
      Link: https://lkml.kernel.org/r/20230412164231.328157-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      366e93c4
    • Peter Xu
      selftests/mm: use PM_* macros in vm_utils.h · 9f74696b
      Peter Xu authored
      The macros already exist in uffd-stress.c; move them over and use them
      in vm_util.h.
      
      Link: https://lkml.kernel.org/r/20230412164227.328145-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      9f74696b
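      For reference, a minimal sketch of reading and decoding one
      /proc/<pid>/pagemap entry.  The bit positions follow the kernel's
      pagemap documentation (63 present, 62 swapped, 61 file-page or
      shared-anon, 57 uffd-wp, 56 exclusively mapped, 55 soft-dirty, 0-54 PFN
      when present).  The macro names mirror the PM_* style but may not match
      vm_util.h exactly, and pagemap_entry() is a hypothetical stand-in for
      pagemap_get_entry().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

#define PM_SOFT_DIRTY      (1ULL << 55)
#define PM_MMAP_EXCLUSIVE  (1ULL << 56)
#define PM_UFFD_WP         (1ULL << 57)
#define PM_FILE            (1ULL << 61)
#define PM_SWAP            (1ULL << 62)
#define PM_PRESENT         (1ULL << 63)
#define PM_PFN_MASK        ((1ULL << 55) - 1)   /* bits 0-54 */

/* Hypothetical helper: read the 64-bit entry for the page containing @vaddr. */
static uint64_t pagemap_entry(int fd, const void *vaddr)
{
	uint64_t entry = 0;
	uintptr_t psize = (uintptr_t)sysconf(_SC_PAGESIZE);
	off_t off = (off_t)(((uintptr_t)vaddr / psize) * sizeof(entry));

	if (pread(fd, &entry, sizeof(entry), off) != (ssize_t)sizeof(entry))
		return 0;   /* this sketch treats a failed read as "not present" */
	return entry;
}

static bool pagemap_is_present(uint64_t entry) { return entry & PM_PRESENT; }
static bool pagemap_is_swapped(uint64_t entry) { return entry & PM_SWAP; }

/* Only meaningful for present pages; reads as 0 without CAP_SYS_ADMIN. */
static uint64_t pagemap_pfn(uint64_t entry) { return entry & PM_PFN_MASK; }
```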
    • Peter Xu
      selftests/mm: merge default_huge_page_size() into one · bd4d67e7
      Peter Xu authored
      There are already three identical definitions of this function.  Move it
      into vm_util.[ch].
      
      Link: https://lkml.kernel.org/r/20230412164223.328134-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      bd4d67e7
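      A minimal sketch of such a shared helper, assuming the size is parsed
      from the "Hugepagesize:" line of /proc/meminfo (reported in kB).  The
      stream-taking parse_hugepagesize() is a hypothetical split so the parser
      can be exercised without depending on the host's hugetlb configuration;
      the real vm_util.c body may differ.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse "Hugepagesize: <n> kB" from a meminfo-formatted stream; bytes or 0. */
static unsigned long parse_hugepagesize(FILE *f)
{
	unsigned long hps = 0;
	char *line = NULL;
	size_t linelen = 0;

	while (getline(&line, &linelen, f) > 0) {
		if (sscanf(line, "Hugepagesize: %lu kB", &hps) == 1) {
			hps <<= 10;   /* kB -> bytes */
			break;
		}
	}
	free(line);
	return hps;
}

/* Default hugetlb page size in bytes, or 0 if it cannot be determined. */
static unsigned long default_huge_page_size(void)
{
	unsigned long hps;
	FILE *f = fopen("/proc/meminfo", "r");

	if (!f)
		return 0;
	hps = parse_hugepagesize(f);
	fclose(f);
	return hps;
}
```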
    • Peter Xu
      selftests/mm: link vm_util.c always · 4b54f5a7
      Peter Xu authored
      Plenty of files want to link against vm_util.c, so keep it simple by
      always linking it.
      
      Link: https://lkml.kernel.org/r/20230412164220.328123-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      4b54f5a7
    • Peter Xu
      selftests/mm: use TEST_GEN_PROGS where proper · aef6fde7
      Peter Xu authored
      TEST_GEN_PROGS and TEST_GEN_FILES are used inconsistently in the
      mm/Makefile to specify programs that need to be built.  Logically, all
      these binaries belong in TEST_GEN_PROGS.
      
      Replace those TEST_GEN_FILES with TEST_GEN_PROGS, so that we can reference
      all the tests easily later.
      
      [peterx@redhat.com: tools/testing/selftests/mm/Makefile: don't wipe out TEST_GEN_PROGS]
        Link: https://lkml.kernel.org/r/ZDxrvZh/cw357D8P@x1n
      Link: https://lkml.kernel.org/r/20230412164218.328104-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      aef6fde7
    • Peter Xu
      selftests/mm: merge util.h into vm_util.h · af605d26
      Peter Xu authored
      There are two util headers under the mm/ kselftests; merge one into the
      other.  util.h turns out to be the easier one to move.
      
      While merging, drop PAGE_SIZE / PAGE_SHIFT because they're unnecessary
      wrappers around page_size() / page_shift(), and rename those to psize()
      and pshift() so they don't conflict with existing definitions in some
      test files that include vm_util.h.
      
      Link: https://lkml.kernel.org/r/20230412164120.327731-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      af605d26
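      A minimal sketch of cached psize()/pshift() helpers, assuming
      sysconf(_SC_PAGESIZE) as the source and ffsl() to derive the shift; the
      cache variables are hypothetical and the actual vm_util.h definitions
      may differ.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <strings.h>
#include <unistd.h>

static unsigned int cached_page_size;
static unsigned int cached_page_shift;

/* Page size in bytes, queried once and cached. */
static inline unsigned int psize(void)
{
	if (!cached_page_size)
		cached_page_size = (unsigned int)sysconf(_SC_PAGESIZE);
	return cached_page_size;
}

/* log2 of the page size: ffsl() returns the 1-based index of the lowest set bit. */
static inline unsigned int pshift(void)
{
	if (!cached_page_shift)
		cached_page_shift = (unsigned int)(ffsl((long)psize()) - 1);
	return cached_page_shift;
}
```

      Short lowercase function names also avoid clashing with PAGE_SIZE-style
      macros that individual test files may already define.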
    • Peter Xu
      selftests/mm: dump a summary in run_vmtests.sh · c7c55fc4
      Peter Xu authored
      Dump a summary after running whichever tests were specified.  This helps
      human runners spot any kind of failure (besides the exit code).
      
      Link: https://lkml.kernel.org/r/20230412164117.327720-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      c7c55fc4
    • Peter Xu
      selftests/mm: update .gitignore with two missing tests · c14ef378
      Peter Xu authored
      Patch series "selftests/mm: Split / Refactor userfault test", v2.
      
      This patchset splits userfaultfd.c into two tests:
      
        - uffd-stress: the "vanilla", old and powerful stress test
        - uffd-unit-tests: all the unit tests will be moved here
      
      This has been on my todo list for a long time, but I never actually did
      it.  The uffd test is growing into a small and cute monster, and I've
      started to notice it's getting harder to maintain and to keep useful.
      
      A few issues I found when looking at userfaultfd test:
      
        - We have a bunch of unit tests in userfaultfd.c, but they can only be
          run after a stress test of some memory type.  There's no way to skip
          the stress part.
      
        - We can only run a unit test for one memory type at a time, so there's
          no good way to do a quick smoke test for regressions.  The best
          option currently is "bash ./run_vmtests.sh -t userfaultfd", thanks to
          the most recent tagging changes to run_vmtests.sh.  Still, that
          always runs the stress tests, and it's hard to see what's wrong.
      
        - It's hard to add a new unit test to userfaultfd.c: we don't really
          know what's happening until we have read most of the file.
      
        - We run a bunch of useless tests, e.g. we run the whole stress suite
          twice just to verify both the syscall and /dev/userfaultfd.  Both use
          userfaultfd_new() to create the handle, so everything underneath
          should really be the same.  One simple unit test should cover that!
      
        - We have tens of global variables in one file, shared by all the
          tests.  From a maintenance point of view, some of them should not be
          global.  They force every unit test to consider how these variables
          affect the stress test and vice versa, which is logically
          unnecessary.
      
        - The userfaultfd test is not friendly to old kernels; it mostly only
          works on the latest kernel tree.  It should preferably run on all
          kernels and properly report what's missing.
      
      I'll stop here, though I feel I could still list a few more.
      
      This patchset should resolve all the issues above, and we can actually
      do even more on top.  I stopped once I found I already had 29 patches
      and 2000+ LOC of changes.  The patchset is already large enough, so we
      should move in small steps.
      
      With the whole set applied, "./run_vmtests.sh -t userfaultfd" produces
      output like this:
      
      ===8<===
      vm.nr_hugepages = 1024
      -------------------------
      running ./uffd-unit-tests
      -------------------------
      Testing UFFDIO_API (with syscall)... done
      Testing UFFDIO_API (with /dev/userfaultfd)... done
      Testing register-ioctls on anon... done
      Testing register-ioctls on shmem... done
      Testing register-ioctls on shmem-private... done
      Testing register-ioctls on hugetlb... done
      Testing register-ioctls on hugetlb-private... done
      Testing zeropage on anon... done
      Testing zeropage on shmem... done
      Testing zeropage on shmem-private... done
      Testing zeropage on hugetlb... done
      Testing zeropage on hugetlb-private... done
      Testing pagemap on anon... done
      Testing wp-unpopulated on anon... done
      Testing minor on shmem... done
      Testing minor on hugetlb... done
      Testing minor-wp on shmem... done
      Testing minor-wp on hugetlb... done
      Testing minor-collapse on shmem... done
      Testing sigbus on anon... done
      Testing sigbus on shmem... done
      Testing sigbus on shmem-private... done
      Testing sigbus on hugetlb... done
      Testing sigbus on hugetlb-private... done
      Testing sigbus-wp on anon... done
      Testing sigbus-wp on shmem... done
      Testing sigbus-wp on shmem-private... done
      Testing sigbus-wp on hugetlb... done
      Testing sigbus-wp on hugetlb-private... done
      Testing events on anon... done
      Testing events on shmem... done
      Testing events on shmem-private... done
      Testing events on hugetlb... done
      Testing events on hugetlb-private... done
      Testing events-wp on anon... done
      Testing events-wp on shmem... done
      Testing events-wp on shmem-private... done
      Testing events-wp on hugetlb... done
      Testing events-wp on hugetlb-private... done
      Userfaults unit tests: pass=39, skip=0, fail=0 (total=39)
      [PASS]
      --------------------------------
      running ./uffd-stress anon 20 16
      --------------------------------
      nr_pages: 5120, nr_pages_per_cpu: 640
      bounces: 15, mode: rnd racing ver poll, userfaults: 345 missing (26+48+61+102+30+12+59+7) 1596 wp (120+139+317+346+215+67+306+86)
      [...]
      [PASS]
      ------------------------------------
      running ./uffd-stress hugetlb 128 32
      ------------------------------------
      nr_pages: 64, nr_pages_per_cpu: 8
      bounces: 31, mode: rnd racing ver poll, userfaults: 29 missing (6+6+6+5+4+2+0+0) 104 wp (20+19+22+18+7+12+5+1)
      [...]
      [PASS]
      --------------------------------------------
      running ./uffd-stress hugetlb-private 128 32
      --------------------------------------------
      nr_pages: 64, nr_pages_per_cpu: 8
      bounces: 31, mode: rnd racing ver poll, userfaults: 33 missing (12+9+7+0+5+0+0+0) 111 wp (24+25+14+14+11+17+5+1)
      [...]
      [PASS]
      ---------------------------------
      running ./uffd-stress shmem 20 16
      ---------------------------------
      nr_pages: 5120, nr_pages_per_cpu: 640
      bounces: 15, mode: rnd racing ver poll, userfaults: 247 missing (15+17+34+60+81+37+3+0) 2038 wp (180+114+276+400+381+318+165+204)
      [...]
      [PASS]
      -----------------------------------------
      running ./uffd-stress shmem-private 20 16
      -----------------------------------------
      nr_pages: 5120, nr_pages_per_cpu: 640
      bounces: 15, mode: rnd racing ver poll, userfaults: 235 missing (52+29+55+56+13+9+16+5) 2849 wp (218+406+461+531+328+284+430+191)
      [...]
      [PASS]
      SUMMARY: PASS=6 SKIP=0 FAIL=0
      ===8<===
      
      The output may differ if some features are missing (e.g., hugetlb pages
      not allocated, an old kernel, or insufficient privilege for the uffd
      handle), but those cases should show up with clear reasons.  E.g.,
      running the unit tests on my Fedora kernel gives me:
      
      ===8<===
      UFFDIO_API (with syscall)... failed [reason: UFFDIO_API should fail with wrong api but didn't]
      UFFDIO_API (with /dev/userfaultfd)... skipped [reason: cannot open userfaultfd handle]
      zeropage on anon... done
      zeropage on shmem... done
      zeropage on shmem-private... done
      zeropage-hugetlb on hugetlb... done
      zeropage-hugetlb on hugetlb-private... done
      pagemap on anon... pagemap on anon... pagemap on anon... done
      wp-unpopulated on anon... skipped [reason: feature missing]
      minor on shmem... done
      minor on hugetlb... done
      minor-wp on shmem... skipped [reason: feature missing]
      minor-wp on hugetlb... skipped [reason: feature missing]
      minor-collapse on shmem... done
      sigbus on anon... skipped [reason: possible lack of priviledge]
      sigbus on shmem... skipped [reason: possible lack of priviledge]
      sigbus on shmem-private... skipped [reason: possible lack of priviledge]
      sigbus on hugetlb... skipped [reason: possible lack of priviledge]
      sigbus on hugetlb-private... skipped [reason: possible lack of priviledge]
      sigbus-wp on anon... skipped [reason: possible lack of priviledge]
      sigbus-wp on shmem... skipped [reason: possible lack of priviledge]
      sigbus-wp on shmem-private... skipped [reason: possible lack of priviledge]
      sigbus-wp on hugetlb... skipped [reason: possible lack of priviledge]
      sigbus-wp on hugetlb-private... skipped [reason: possible lack of priviledge]
      events on anon... skipped [reason: possible lack of priviledge]
      events on shmem... skipped [reason: possible lack of priviledge]
      events on shmem-private... skipped [reason: possible lack of priviledge]
      events on hugetlb... skipped [reason: possible lack of priviledge]
      events on hugetlb-private... skipped [reason: possible lack of priviledge]
      events-wp on anon... skipped [reason: possible lack of priviledge]
      events-wp on shmem... skipped [reason: possible lack of priviledge]
      events-wp on shmem-private... skipped [reason: possible lack of priviledge]
      events-wp on hugetlb... skipped [reason: possible lack of priviledge]
      events-wp on hugetlb-private... skipped [reason: possible lack of priviledge]
      Userfaults unit tests: pass=9, skip=24, fail=1 (total=34)
      ===8<===
      
      Patch layout:
      
      - Revert "userfaultfd: don't fail on unrecognized features"
      
        Something I found while writing the UFFDIO_API test below.  Axel, I
        still propose to revert it as a whole, but feel free to continue the
        discussion in the original patch thread.
      
      - selftests/mm: Update .gitignore with two missing tests
      - selftests/mm: Dump a summary in run_vmtests.sh
      - selftests/mm: Merge util.h into vm_util.h
      - selftests/mm: Use TEST_GEN_PROGS where proper
      - selftests/mm: Link vm_util.c always
      - selftests/mm: Merge default_huge_page_size() into one
      - selftests/mm: Use PM_* macros in vm_utils.h
      - selftests/mm: Reuse pagemap_get_entry() in vm_util.h
      - selftests/mm: Test UFFDIO_ZEROPAGE only when !hugetlb
      - selftests/mm: Drop test_uffdio_zeropage_eexist
      
        Up to here, these are all cleanups here and there.  I wanted to keep
        going, but found that splitting the test might take a few more days.
        Hence the split starts from the next patch, so we have something
        working first.
      
      - selftests/mm: Create uffd-common.[ch]
      - selftests/mm: Split uffd tests into uffd-stress and uffd-unit-tests
      
        These do the major brute-force split of common code into
        uffd-common.[ch], which becomes the common base for the stress and
        unit tests so far.  Then a new unit test is created.
      
      - selftests/mm: uffd_[un]register()
      - selftests/mm: uffd_open_{dev|sys}()
      - selftests/mm: UFFDIO_API test
      
        The UFFDIO_API patch is tucked in here: it starts the first unit test,
        for UFFDIO_API, which also detects userfaultfd privileges.
      
      - selftests/mm: Drop global mem_fd in uffd tests
      - selftests/mm: Drop global hpage_size in uffd tests
      - selftests/mm: Rename uffd_stats to uffd_args
      - selftests/mm: Let uffd_handle_page_fault() take wp parameter
      - selftests/mm: Allow allocate_area() to fail properly
      
        Some further cleanups I found necessary, since otherwise the tests
        would be hard to move.
      
      - selftests/mm: Add framework for uffd-unit-test
      
        The major patch: it provides the framework for most of the remaining
        unit tests.
      
      - selftests/mm: Move uffd pagemap test to unit test
      - selftests/mm: Move uffd minor test to unit test
      - selftests/mm: Move uffd sig/events tests into uffd unit tests
      - selftests/mm: Move zeropage test into uffd unit tests
      
        Move the unit tests into the new file and fit them into its framework.
      
      - selftests/mm: Workaround no way to detect uffd-minor + wp
      - selftests/mm: Allow uffd test to skip properly with no privilege
      - selftests/mm: Drop sys/dev test in uffd-stress test
      - selftests/mm: Add shmem-private test to uffd-stress
      
        A bunch of changes to improve error reporting, and to add shmem-private
        to the stress test, which had long been missing.
      
      - selftests/mm: Add uffdio register ioctls test
      
        One more patch to test uffdio_register.ioctls.
      
      
      This patch (of 30):
      
      Update .gitignore with two missing tests.
      
      Link: https://lkml.kernel.org/r/20230412163922.327282-1-peterx@redhat.com
      Link: https://lkml.kernel.org/r/20230412164114.327709-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      c14ef378
    • Haifeng Xu
      mm/vmscan: simplify shrink_node() · 54c4fe08
      Haifeng Xu authored
      The difference between sc->nr_reclaimed and nr_reclaimed is computed three
      times.  Introduce a new variable to record the value, so it only needs to
      be computed once.
      
      Link: https://lkml.kernel.org/r/20230411061757.12041-1-haifeng.xu@shopee.com
      Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      54c4fe08
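      A simplified sketch of the pattern, assuming hypothetical stand-in types
      (struct scan_ctl, struct pass_report) and a made-up retry threshold; it
      is not the mm/vmscan.c code, only an illustration of computing the
      per-pass delta once and reusing it for every consumer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for struct scan_control. */
struct scan_ctl {
	unsigned long nr_reclaimed;   /* running total across passes */
};

/* Hypothetical consumers that previously each recomputed the delta. */
struct pass_report {
	unsigned long reported;       /* e.g. handed to a pressure tracker */
	bool reclaimable;             /* did this pass free anything? */
	bool retry;                   /* hypothetical "try another pass" policy */
};

static struct pass_report finish_pass(const struct scan_ctl *sc,
				      unsigned long nr_reclaimed_before,
				      unsigned long retry_threshold)
{
	/* Compute "pages reclaimed in this pass" once, then reuse it. */
	unsigned long nr_pass_reclaimed = sc->nr_reclaimed - nr_reclaimed_before;
	struct pass_report r = {
		.reported = nr_pass_reclaimed,
		.reclaimable = nr_pass_reclaimed != 0,
		.retry = nr_pass_reclaimed >= retry_threshold,
	};

	return r;
}
```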