    mm/debug_vm_pgtable: more pte_swp_exclusive() sanity checks · 2321ba3e
    David Hildenbrand authored
    Patch series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all
    architectures with swap PTEs".
    
    This is the follow-up on [1]:
    	[PATCH v2 0/8] mm: COW fixes part 3: reliable GUP R/W FOLL_GET of
    	anonymous pages
    
After we implemented __HAVE_ARCH_PTE_SWP_EXCLUSIVE on the most prominent
enterprise architectures, implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all
remaining architectures that support swap PTEs.
    
    This makes sure that exclusive anonymous pages will stay exclusive, even
    after they were swapped out -- for example, making GUP R/W FOLL_GET of
    anonymous pages reliable.  Details can be found in [1].
    
    This primarily fixes remaining known O_DIRECT memory corruptions that can
    happen on concurrent swapout, whereby we can lose DMA reads to a page
    (modifying the user page by writing to it).
    
    To verify, there are two test cases (requiring swap space, obviously):
(1) The O_DIRECT+swapout test case [2] from Andrea. This test case tries
    to trigger a race condition.
    (2) My vmsplice() test case [3] that tries to detect if the exclusive
        marker was lost during swapout, not relying on a race condition.
    
    
    For example, on 32bit x86 (with and without PAE), my test case fails
    without these patches:
    	$ ./test_swp_exclusive
    	FAIL: page was replaced during COW
    But succeeds with these patches:
    	$ ./test_swp_exclusive
    	PASS: page was not replaced during COW
    
    
    Why implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE for all architectures, even
    the ones where swap support might be in a questionable state?  This is the
first step towards removing "readable_exclusive" migration entries, and
instead using pte_swp_exclusive() also with (readable) migration entries
(as suggested by Peter).  The only missing piece for that is
    supporting pmd_swp_exclusive() on relevant architectures with THP
    migration support.
    
As all relevant architectures now implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE,
    we can drop __HAVE_ARCH_PTE_SWP_EXCLUSIVE in the last patch.
    
    I tried cross-compiling all relevant setups and tested on x86 and sparc64
    so far.
    
    CCing arch maintainers only on this cover letter and on the respective
    patch(es).
    
    [1] https://lkml.kernel.org/r/20220329164329.208407-1-david@redhat.com
    [2] https://gitlab.com/aarcange/kernel-testcases-for-v5.11/-/blob/main/page_count_do_wp_page-swap.c
    [3] https://gitlab.com/davidhildenbrand/scratchspace/-/blob/main/test_swp_exclusive.c
    
    
    This patch (of 26):
    
    We want to implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures. 
    Let's extend our sanity checks, especially testing that our PTE bit does
    not affect:
    
    * is_swap_pte() -> pte_present() and pte_none()
    * the swap entry + type
    * pte_swp_soft_dirty()
    
In particular, pfn_pte() is dodgy when the swap PTE layout differs
heavily from ordinary PTEs.  Let's properly construct a swap PTE from
swap type+offset instead.
    
    [david@redhat.com: fix build]
      Link: https://lkml.kernel.org/r/6aaad548-cf48-77fa-9d6c-db83d724b2eb@redhat.com
    Link: https://lkml.kernel.org/r/20230113171026.582290-1-david@redhat.com
Link: https://lkml.kernel.org/r/20230113171026.582290-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
    Cc: <aou@eecs.berkeley.edu>
    Cc: Borislav Petkov (AMD) <bp@alien8.de>
    Cc: Brian Cain <bcain@quicinc.com>
    Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
    Cc: Chris Zankel <chris@zankel.net>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: David S. Miller <davem@davemloft.net>
    Cc: Dinh Nguyen <dinguyen@kernel.org>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Greg Ungerer <gerg@linux-m68k.org>
    Cc: Guo Ren <guoren@kernel.org>
    Cc: Helge Deller <deller@gmx.de>
    Cc: H. Peter Anvin (Intel) <hpa@zytor.com>
    Cc: Huacai Chen <chenhuacai@kernel.org>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
    Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
    Cc: Jason Gunthorpe <jgg@nvidia.com>
    Cc: Johannes Berg <johannes@sipsolutions.net>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Matt Turner <mattst88@gmail.com>
    Cc: Max Filippov <jcmvbkbc@gmail.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Michal Simek <monstr@monstr.eu>
    Cc: Mike Rapoport <rppt@linux.ibm.com>
    Cc: Nadav Amit <namit@vmware.com>
    Cc: Nicholas Piggin <npiggin@gmail.com>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Richard Henderson <richard.henderson@linaro.org>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Rich Felker <dalias@libc.org>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: Stafford Horne <shorne@gmail.com>
    Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vineet Gupta <vgupta@kernel.org>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Xuerui Wang <kernel@xen0n.name>
    Cc: Yang Shi <shy828301@gmail.com>
    Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>