    userfaultfd: wp: apply _PAGE_UFFD_WP bit · 292924b2
    Peter Xu authored
    First, introduce two new flags, MM_CP_UFFD_WP and MM_CP_UFFD_WP_RESOLVE,
    for change_protection() when used with uffd-wp, and make sure the two
    flags are never set together.  Then:
    
      - For MM_CP_UFFD_WP: apply the _PAGE_UFFD_WP bit and remove _PAGE_RW
        when a range of memory is write protected by uffd
    
      - For MM_CP_UFFD_WP_RESOLVE: remove the _PAGE_UFFD_WP bit and recover
        _PAGE_RW when write protection is resolved from userspace
    
    And use this new interface in mwriteprotect_range() to replace the old
    MM_CP_DIRTY_ACCT.
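
    As a rough illustration of the two flags described above, here is a
    minimal userspace sketch.  It is a toy model, not the kernel code: the
    TOY_* names, the bit positions and the allow_write parameter (standing in
    for whatever else decides whether the write bit may come back, e.g. COW)
    are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy bit positions, chosen for illustration only; not the kernel's values. */
#define TOY_PAGE_RW             (1u << 1)
#define TOY_PAGE_UFFD_WP        (1u << 10)

/* Toy stand-ins for MM_CP_UFFD_WP and MM_CP_UFFD_WP_RESOLVE. */
#define TOY_CP_UFFD_WP          (1u << 2)
#define TOY_CP_UFFD_WP_RESOLVE  (1u << 3)
#define TOY_CP_UFFD_WP_ALL      (TOY_CP_UFFD_WP | TOY_CP_UFFD_WP_RESOLVE)

typedef uint64_t toy_pte_t;

/*
 * Apply one of the two uffd-wp operations to a toy PTE value.
 * allow_write is a hypothetical parameter: it models whether the write
 * bit may be restored when the uffd-wp protection is resolved.
 */
static toy_pte_t toy_change_pte(toy_pte_t pte, unsigned int cp_flags,
                                bool allow_write)
{
    /* The two uffd-wp flags must never be passed together. */
    assert((cp_flags & TOY_CP_UFFD_WP_ALL) != TOY_CP_UFFD_WP_ALL);

    if (cp_flags & TOY_CP_UFFD_WP) {
        /* Write protect: set the uffd-wp bit and drop the write bit. */
        pte |= TOY_PAGE_UFFD_WP;
        pte &= ~(toy_pte_t)TOY_PAGE_RW;
    } else if (cp_flags & TOY_CP_UFFD_WP_RESOLVE) {
        /* Resolve: clear the uffd-wp bit, restore write only if allowed. */
        pte &= ~(toy_pte_t)TOY_PAGE_UFFD_WP;
        if (allow_write)
            pte |= TOY_PAGE_RW;
    }
    return pte;
}
```

    Calling toy_change_pte() with TOY_CP_UFFD_WP on a writable entry leaves
    only the uffd-wp bit set; calling it with TOY_CP_UFFD_WP_RESOLVE clears
    that bit again.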
    
    Do this change for both PTEs and huge PMDs.  Then we can start to identify
    which PTE/PMD is write protected for general reasons (e.g., COW or soft
    dirty tracking), and which is write protected for userfaultfd-wp.
    
    Since we should keep the _PAGE_UFFD_WP when doing pte_modify(), add it
    into _PAGE_CHG_MASK as well.  Meanwhile, since we have this new bit, we
    can be even more strict when detecting uffd-wp page faults in either
    do_wp_page() or wp_huge_pmd().
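
    Why the bit has to be part of the change mask can be sketched with a toy
    pte_modify().  This is only an illustration under assumed names and bit
    positions (everything prefixed TOY_ or toy_ is hypothetical); the real
    mask covers more bits than shown here.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t toy_pte_t;

/* Toy bit positions, for illustration only. */
#define TOY_PAGE_PRESENT  (1u << 0)
#define TOY_PAGE_RW       (1u << 1)
#define TOY_PAGE_DIRTY    (1u << 6)
#define TOY_PAGE_UFFD_WP  (1u << 10)

/* Bits that survive a protection change, without and with the uffd-wp bit. */
#define TOY_CHG_MASK_OLD  ((toy_pte_t)TOY_PAGE_DIRTY)
#define TOY_CHG_MASK_NEW  (TOY_CHG_MASK_OLD | TOY_PAGE_UFFD_WP)

/*
 * Keep the bits named in chg_mask from the old entry and take every
 * other bit from the new protection value.
 */
static toy_pte_t toy_pte_modify(toy_pte_t pte, toy_pte_t newprot,
                                toy_pte_t chg_mask)
{
    return (pte & chg_mask) | (newprot & ~chg_mask);
}
```

    With TOY_CHG_MASK_OLD, changing the protection of an entry that carries
    TOY_PAGE_UFFD_WP silently drops the bit; with TOY_CHG_MASK_NEW the bit is
    carried over to the modified entry.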
    
    With _PAGE_UFFD_WP in place, a special case is a page that is protected
    both by the general COW logic and by userfault-wp.  Here userfault-wp has
    higher priority and is handled first.  Only after the uffd-wp bit is
    cleared on the PTE/PMD do we continue to handle the general COW.  These
    are the steps for such a page:
    
      1. The CPU accesses a write protected shared page (protected by both
         general COW and uffd-wp).  The access is blocked by uffd-wp first,
         because do_wp_page() handles uffd-wp before general COW.
    
      2. The uffd service thread receives the request and does
         UFFDIO_WRITEPROTECT to remove the uffd-wp bit from the PTE/PMD.
         The write bit is still kept cleared here.  It then notifies the
         blocked CPU.
    
      3. The blocked CPU resumes the page fault process with a fault
         retry.  During the retry it notices that the uffd-wp bit is no
         longer set but the page is still write protected by general COW,
         so it goes through the COW path in the fault handler, copies the
         page, applies the write bit where necessary, and retries again.
    
      4. The CPU will be able to access this page with write bit set.
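
    The four steps above can be sketched as a small state machine.  This is
    a simplified userspace model, not the fault handler itself: struct
    toy_pte, toy_write_fault() and toy_uffd_resolve() are hypothetical names
    invented for this example, and blocking/wakeup is reduced to a return
    value.

```c
#include <assert.h>
#include <stdbool.h>

/* What a simulated write fault ended up doing. */
enum toy_fault_result {
    TOY_FAULT_NONE,     /* entry was already writable, nothing to do */
    TOY_FAULT_UFFD_WP,  /* reported to userspace, faulting thread waits */
    TOY_FAULT_COW       /* handled by the general COW path */
};

struct toy_pte {
    bool write;    /* models the write bit */
    bool uffd_wp;  /* models the uffd-wp bit */
};

/* Simulated write fault: uffd-wp is checked before general COW. */
static enum toy_fault_result toy_write_fault(struct toy_pte *pte)
{
    if (pte->write)
        return TOY_FAULT_NONE;
    if (pte->uffd_wp)
        return TOY_FAULT_UFFD_WP;  /* step 1: uffd-wp has priority */
    pte->write = true;             /* step 3: COW copies, then sets write */
    return TOY_FAULT_COW;
}

/* Step 2: userspace resolves; only the uffd-wp bit is cleared. */
static void toy_uffd_resolve(struct toy_pte *pte)
{
    pte->uffd_wp = false;
}
```

    Starting from an entry with write cleared and uffd_wp set, the first
    fault reports TOY_FAULT_UFFD_WP, the retry after toy_uffd_resolve()
    takes the COW path, and a further access finds the write bit set.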

    Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
    Signed-off-by: Peter Xu <peterx@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Brian Geffon <bgeffon@google.com>
    Cc: Pavel Emelyanov <xemul@openvz.org>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Martin Cracauer <cracauer@cons.org>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Bobby Powers <bobbypowers@gmail.com>
    Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
    Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
    Cc: Maya Gokhale <gokhale2@llnl.gov>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Marty McFadden <mcfadden8@llnl.gov>
    Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
    Cc: Jerome Glisse <jglisse@redhat.com>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Shaohua Li <shli@fb.com>
    Link: http://lkml.kernel.org/r/20200220163112.11409-8-peterx@redhat.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mprotect.c 16.8 KB