    mm/userfaultfd: enable writenotify while userfaultfd-wp is enabled for a VMA · 51d3d5eb
    David Hildenbrand authored
    Currently, we don't enable writenotify when enabling userfaultfd-wp on a
    shared writable mapping (for now only shmem and hugetlb).  The consequence
    is that vma->vm_page_prot will still include write permissions, to be set
    as default for all PTEs that get remapped (e.g., mprotect(), NUMA hinting,
    page migration, ...).
    
    So far, vma->vm_page_prot is assumed to be a safe default, meaning that we
    only add permissions (e.g., mkwrite) but not remove permissions (e.g.,
    wrprotect).  For example, when enabling softdirty tracking, we enable
    writenotify.  With uffd-wp on shared mappings, that changed.  More details
    on vma->vm_page_prot semantics were summarized in [1].
    
    This is problematic for uffd-wp: we'd have to manually check for uffd-wp
    PTEs/PMDs and manually write-protect them, which is error prone.  Any
    code that uses vma->vm_page_prot to set PTE permissions is prone to such
    issues: primarily pte_modify() and mk_pte().
    
    Instead, let's enable writenotify such that PTEs/PMDs/...  will be mapped
    write-protected as default and we will only allow selected PTEs that are
    definitely safe to be mapped without write-protection (see
    can_change_pte_writable()) to be writable.  In the future, we might want
    to enable write-bit recovery -- e.g., can_change_pte_writable() -- at more
    locations, for example, also when removing uffd-wp protection.
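
The default-protection idea described above can be sketched as a small userspace toy model. This is not kernel code: every name here (toy_vma, toy_pte, toy_update_page_prot(), toy_remap_pte(), toy_try_recover_write(), the TOY_* bits) is hypothetical, and the recovery check is a deliberately simplified stand-in for can_change_pte_writable(), which considers more conditions than the uffd-wp marker alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical permission bits for the toy model. */
#define TOY_READ  0x1u
#define TOY_WRITE 0x2u

struct toy_vma {
	unsigned int vm_flags;      /* what the mapping permits */
	bool shared;                /* shared (e.g., shmem-like) mapping */
	bool uffd_wp_enabled;       /* uffd-wp registered on this VMA */
	bool softdirty_tracking;    /* another user of writenotify */
	unsigned int vm_page_prot;  /* default protection for (re)mapped PTEs */
};

struct toy_pte {
	unsigned int prot;
	bool uffd_wp;               /* PTE is uffd-wp'ed */
};

/* Assumption: writenotify matters only for shared writable mappings. */
static bool toy_wants_writenotify(const struct toy_vma *vma)
{
	if (!vma->shared || !(vma->vm_flags & TOY_WRITE))
		return false;
	return vma->uffd_wp_enabled || vma->softdirty_tracking;
}

/* With writenotify, the default protection drops the write bit. */
static void toy_update_page_prot(struct toy_vma *vma)
{
	unsigned int prot = vma->vm_flags;

	if (toy_wants_writenotify(vma))
		prot &= ~TOY_WRITE;
	vma->vm_page_prot = prot;
}

/* Remapping code applies the VMA default and nothing else. */
static void toy_remap_pte(const struct toy_vma *vma, struct toy_pte *pte)
{
	assert(vma && pte);
	pte->prot = vma->vm_page_prot;
}

/* Optional write-bit recovery: only for PTEs known to be safe. */
static bool toy_try_recover_write(const struct toy_vma *vma,
				  struct toy_pte *pte)
{
	if (!(vma->vm_flags & TOY_WRITE) || pte->uffd_wp)
		return false;
	pte->prot |= TOY_WRITE;
	return true;
}
```

In this model the safe default only ever needs permissions added, never removed: a remapped PTE starts write-protected, and the write bit is granted afterwards only to PTEs that are not uffd-wp'ed.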
    
    This fixes two known cases:
    
    (a) remove_migration_pte() mapping uffd-wp'ed PTEs writable, resulting
        in uffd-wp not triggering on write access.
    (b) do_numa_page() / do_huge_pmd_numa_page() mapping uffd-wp'ed PTEs/PMDs
        writable, resulting in uffd-wp not triggering on write access.
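
Case (a) can be illustrated with a minimal sketch, assuming a restore path that simply applies the VMA default protection to the PTE. The names sk_pte, sk_restore_migration_pte(), sk_write_triggers_uffd_wp() and the SK_* bits are hypothetical; the real remove_migration_pte() does considerably more.

```c
#include <stdbool.h>

/* Hypothetical permission bits for the sketch. */
#define SK_READ  0x1u
#define SK_WRITE 0x2u

struct sk_pte {
	unsigned int prot;
	bool uffd_wp;   /* PTE carries the uffd-wp marker */
};

/*
 * Restore a PTE from a migration entry using the VMA default
 * protection, preserving the uffd-wp marker.
 */
static struct sk_pte sk_restore_migration_pte(unsigned int vm_page_prot,
					      bool was_uffd_wp)
{
	struct sk_pte pte = { .prot = vm_page_prot, .uffd_wp = was_uffd_wp };

	return pte;
}

/*
 * A write to a uffd-wp'ed PTE can only be intercepted if the PTE is
 * actually mapped without write permission.
 */
static bool sk_write_triggers_uffd_wp(struct sk_pte pte)
{
	return pte.uffd_wp && !(pte.prot & SK_WRITE);
}
```

With a default that still includes the write bit (no writenotify), the restored uffd-wp'ed PTE is writable and the write goes unnoticed; with a write-protected default, the write faults as intended.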
    
    Note that do_numa_page() / do_huge_pmd_numa_page() can be reached even
    without NUMA hinting (which currently doesn't seem to be applicable to
    shmem), for example, by using uffd-wp with a PROT_WRITE shmem VMA.  On
    such a VMA, userfaultfd-wp is currently non-functional.
    
    Note that when enabling userfaultfd-wp, there is no need to walk page
    tables to enforce the new default protection for the PTEs: we know that
    they cannot be uffd-wp'ed yet, because that can only happen after enabling
    uffd-wp for the VMA in general.
    
    Also note that this makes mprotect() on ranges with uffd-wp'ed PTEs not
    accidentally set the write bit -- which would result in uffd-wp not
    triggering on later write access.  This commit makes uffd-wp on shmem
    behave just like uffd-wp on anonymous memory in that regard, even though
    mixing mprotect() with uffd-wp is controversial.
    
    [1] https://lkml.kernel.org/r/92173bad-caa3-6b43-9d1e-9a471fdbc184@redhat.com
    
    Link: https://lkml.kernel.org/r/20221209080912.7968-1-david@redhat.com
    Fixes: b1f9e876 ("mm/uffd: enable write protection for shmem & hugetlbfs")
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Reported-by: Ives van Hoorne <ives@codesandbox.io>
    Debugged-by: Peter Xu <peterx@redhat.com>
    Acked-by: Peter Xu <peterx@redhat.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Alistair Popple <apopple@nvidia.com>
    Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
    Cc: Nadav Amit <nadav.amit@gmail.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>