• Peter Xu's avatar
    mm/hugetlb: fix race condition of uffd missing/minor handling · 2ea7ff1e
    Peter Xu authored
    Patch series "mm/hugetlb: Fix selftest failures with write check", v3.
    
    Currently akpm mm-unstable fails with uffd hugetlb private mapping test
    randomly on a write check.
    
    The initial bisection of that points to the recent pmd unshare series, but
    it turns out there's no direction relationship with the series but only
    some timing change caused the race to start trigger.
    
    The race should be fixed in patch 1.  Patch 2 is a trivial cleanup on the
    similar race with hugetlb migrations, patch 3 comment on the write check
    so when anyone read it again it'll be clear why it's there.
    
    
    This patch (of 3):
    
    After the recent rework patchset of hugetlb locking on pmd sharing,
    kselftest for userfaultfd sometimes fails on hugetlb private tests with
    unexpected write fault checks.
    
    It turns out there's nothing wrong within the locking series regarding
    this matter, but it could have changed the timing of threads so it can
    trigger an old bug.
    
    The real bug is when we call hugetlb_no_page() we're not with the pgtable
    lock.  It means we're reading the pte values lockless.  It's perfectly
    fine in most cases because before we do normal page allocations we'll take
    the lock and check pte_same() again.  However before that, there are
    actually two paths on userfaultfd missing/minor handling that may directly
    move on with the fault process without checking the pte values.
    
    It means for these two paths we may be generating an uffd message based on
    an unstable pte, while an unstable pte can legally be anything as long as
    the modifier holds the pgtable lock.
    
    One example, which is also what happened in the failing kselftest and
    caused the test failure, is that for private mappings wr-protection
    changes can happen on one page.  While hugetlb_change_protection()
    generally requires pte being cleared before being changed, then there can
    be a race condition like:
    
            thread 1                              thread 2
            --------                              --------
    
          UFFDIO_WRITEPROTECT                     hugetlb_fault
            hugetlb_change_protection
              pgtable_lock()
              huge_ptep_modify_prot_start
                                                  pte==NULL
                                                  hugetlb_no_page
                                                    generate uffd missing event
                                                    even if page existed!!
              huge_ptep_modify_prot_commit
              pgtable_unlock()
    
    Fix this by rechecking the pte after pgtable lock for both userfaultfd
    missing & minor fault paths.
    
    This bug should have been around starting from uffd hugetlb introduced, so
    attaching a Fixes to the commit.  Also attach another Fixes to the minor
    support commit for easier tracking.
    
    Note that userfaultfd is actually fine with false positives (e.g.  caused
    by pte changed), but not wrong logical events (e.g.  caused by reading a
    pte during changing).  The latter can confuse the userspace, so the
    strictness is very much preferred.  E.g., MISSING event should never
    happen on the page after UFFDIO_COPY has correctly installed the page and
    returned.
    
    Link: https://lkml.kernel.org/r/20221004193400.110155-1-peterx@redhat.com
    Link: https://lkml.kernel.org/r/20221004193400.110155-2-peterx@redhat.com
    Fixes: 1a1aad8a ("userfaultfd: hugetlbfs: add userfaultfd hugetlb hook")
    Fixes: 7677f7fd
    
     ("userfaultfd: add minor fault registration mode")
    Signed-off-by: default avatarPeter Xu <peterx@redhat.com>
    Co-developed-by: default avatarMike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: default avatarMike Kravetz <mike.kravetz@oracle.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Axel Rasmussen <axelrasmussen@google.com>
    Cc: Nadav Amit <nadav.amit@gmail.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    2ea7ff1e
hugetlb.c 207 KB