    mm: avoid early COW write protect games during fork() · f3c64eda
    Linus Torvalds authored
    In commit 70e806e4 ("mm: Do early cow for pinned pages during fork()
    for ptes") we write-protected the PTE before doing the page pinning
    check, in order to avoid a race with concurrent fast-GUP pinning (which
    doesn't take the mm semaphore or the page table lock).
    
    That trick doesn't actually work - it doesn't handle memory ordering
    properly, and doing so would be prohibitively expensive.
    
    It also isn't really needed.  While we're moving in the direction of
    allowing and supporting page pinning without marking the pinned area
    with MADV_DONTFORK, the fact is that we've never really supported this
    kind of odd "concurrent fork() and page pinning", and doing the
    serialization on a pte level is just wrong.
    
    We can add serialization with a per-mm sequence counter, so we know how
    to solve that race properly, but we'll do that at a more appropriate
    time.  Right now this just removes the write protect games.
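
    As a rough illustration of the per-mm sequence counter idea mentioned
    above (not part of this commit), here is a minimal userspace sketch
    using C11 atomics.  Everything in it is hypothetical: struct toy_mm,
    the write_protect_seq field, and the toy_* helpers are illustrative
    names, and the default sequentially consistent atomics stand in for
    whatever ordering a real implementation would need.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-mm state; not the kernel's mm_struct. */
struct toy_mm {
    atomic_uint write_protect_seq; /* odd while a fork-like writer is active */
    atomic_int pinned;             /* stand-in for a page pin count */
};

/* Writer side: the fork-like path bumps the counter to an odd value. */
static void toy_fork_begin(struct toy_mm *mm)
{
    unsigned int seq = atomic_fetch_add(&mm->write_protect_seq, 1);
    assert((seq & 1) == 0); /* writers must not nest in this sketch */
}

/* Writer side: bump again so the counter is even and has changed. */
static void toy_fork_end(struct toy_mm *mm)
{
    atomic_fetch_add(&mm->write_protect_seq, 1);
}

/*
 * Reader side: a lockless pin attempt.  Returns false when it may have
 * raced with a fork, in which case the caller would fall back to a slow,
 * properly locked path.
 */
static bool toy_try_fast_pin(struct toy_mm *mm)
{
    unsigned int seq = atomic_load(&mm->write_protect_seq);

    if (seq & 1)
        return false;                     /* fork in progress */

    atomic_fetch_add(&mm->pinned, 1);     /* take the pin */

    if (atomic_load(&mm->write_protect_seq) != seq) {
        atomic_fetch_sub(&mm->pinned, 1); /* raced: undo the pin */
        return false;
    }
    return true;
}
```

    The point of the design is that the check happens once per mm rather
    than per pte, so the fork path never has to modify and then restore
    individual page table entries.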
    
    It also turns out that the write protect games actually break on Power,
    as reported by Aneesh Kumar:
    
     "Architecture like ppc64 expects set_pte_at to be not used for updating
      a valid pte. This is further explained in commit 56eecdb9 ("mm:
      Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit")"
    
    and the code triggered a warning there:
    
      WARNING: CPU: 0 PID: 30613 at arch/powerpc/mm/pgtable.c:185 set_pte_at+0x2a8/0x3a0 arch/powerpc/mm/pgtable.c:185
      Call Trace:
        copy_present_page mm/memory.c:857 [inline]
        copy_present_pte mm/memory.c:899 [inline]
        copy_pte_range mm/memory.c:1014 [inline]
        copy_pmd_range mm/memory.c:1092 [inline]
        copy_pud_range mm/memory.c:1127 [inline]
        copy_p4d_range mm/memory.c:1150 [inline]
        copy_page_range+0x1f6c/0x2cc0 mm/memory.c:1212
        dup_mmap kernel/fork.c:592 [inline]
        dup_mm+0x77c/0xab0 kernel/fork.c:1355
        copy_mm kernel/fork.c:1411 [inline]
        copy_process+0x1f00/0x2740 kernel/fork.c:2070
        _do_fork+0xc4/0x10b0 kernel/fork.c:2429
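
    The rule quoted above can be pictured with a toy model.  This is only
    a sketch under assumptions: toy_pte_t, the flag bits, and the toy_*
    helpers are hypothetical and are not the powerpc implementation.  It
    assumes the warning fires because a still-valid entry is overwritten
    when the early write protection is undone, which matches the
    copy_present_page -> set_pte_at path in the trace.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PTE_PRESENT 0x1u
#define TOY_PTE_WRITE   0x2u

/* Hypothetical page table entry; real ptes are architecture specific. */
typedef struct { uint32_t val; } toy_pte_t;

static int toy_warnings; /* counts would-be WARNINGs in this model */

/* Modeled rule: installing over an already valid entry is a violation. */
static void toy_set_pte_at(toy_pte_t *ptep, toy_pte_t pte)
{
    if (ptep->val & TOY_PTE_PRESENT)
        toy_warnings++;
    *ptep = pte;
}

/* In-place update that keeps the entry valid while clearing write. */
static void toy_ptep_set_wrprotect(toy_pte_t *ptep)
{
    ptep->val &= ~TOY_PTE_WRITE;
}

/* The "write protect game": protect first, then put the old value back. */
static void toy_wrprotect_then_restore(toy_pte_t *ptep)
{
    toy_pte_t orig = *ptep;

    toy_ptep_set_wrprotect(ptep); /* entry is still present here */
    toy_set_pte_at(ptep, orig);   /* overwrites a valid entry */
}

/* Small self-check showing the violation in the model. */
static void toy_selfcheck(void)
{
    toy_pte_t pte = { TOY_PTE_PRESENT | TOY_PTE_WRITE };

    toy_warnings = 0;
    toy_wrprotect_then_restore(&pte);
    assert(toy_warnings == 1);
    assert(pte.val == (TOY_PTE_PRESENT | TOY_PTE_WRITE));
}
```

    Dropping the protect-and-restore step removes the only place in this
    model where a valid entry gets overwritten.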
    
    Link: https://lore.kernel.org/lkml/CAHk-=wiWr+gO0Ro4LvnJBMs90OiePNyrE3E+pJvc9PzdBShdmw@mail.gmail.com/
    Link: https://lore.kernel.org/linuxppc-dev/20201008092541.398079-1-aneesh.kumar@linux.ibm.com/
    Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Tested-by: Leon Romanovsky <leonro@nvidia.com>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Jason Gunthorpe <jgg@ziepe.ca>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Kirill Shutemov <kirill@shutemov.name>
    Cc: Hugh Dickins <hughd@google.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>