    hugetlbfs: revert use i_mmap_rwsem for more pmd sharing synchronization · 3a47c54f
    Mike Kravetz authored
    Commit c0d0381a ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
    synchronization") added code to take i_mmap_rwsem in read mode for the
    duration of fault processing.  However, this has been shown to cause
    performance/scaling issues.  Revert the code and go back to only taking
    the semaphore in huge_pmd_share during the fault path.
    
    Keep the code that takes i_mmap_rwsem in write mode before calling
    try_to_unmap as this is required if huge_pmd_unshare is called.
    
    NOTE: Reverting this code does expose the following race condition.
    
    Faulting thread                                 Unsharing thread
    ...                                                  ...
    ptep = huge_pte_offset()
          or
    ptep = huge_pte_alloc()
    ...
                                                    i_mmap_lock_write
                                                    lock page table
    ptep invalid   <------------------------        huge_pmd_unshare()
    Could be in a previously                        unlock_page_table
    sharing process or worse                        i_mmap_unlock_write
    ...
    ptl = huge_pte_lock(ptep)
    get/update pte
    set_pte_at(pte, ptep)
    
    It is unknown if the above race was ever experienced by a user.  It was
    discovered via code inspection when initially addressed.
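
    The interleaving in the diagram above can be replayed deterministically in
    a small userspace model. This is only an illustrative sketch, not kernel
    code: every name prefixed with model_ is hypothetical, and the single
    threaded replay stands in for two real threads. model_pte_lookup() plays
    the role of huge_pte_offset(), and model_unshare() plays the role of
    huge_pmd_unshare().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_pmd_page {
    int pte[4];                     /* stand-in for the PTEs in a shared PMD page */
};

struct model_mm {
    struct model_pmd_page *pmd;     /* NULL once the PMD page has been unshared */
};

/* Stand-in for huge_pte_offset(): returns a raw pointer into the PMD page. */
static int *model_pte_lookup(struct model_mm *mm, size_t idx)
{
    return mm->pmd ? &mm->pmd->pte[idx] : NULL;
}

/* Stand-in for huge_pmd_unshare(): detaches the shared PMD page from this mm. */
static void model_unshare(struct model_mm *mm)
{
    mm->pmd = NULL;
}

/*
 * Replays the fault path. Returns true if the ptep obtained in step 1 no
 * longer belongs to the faulting mm by the time it would be locked and used.
 */
static bool replay_interleaving(bool unshare_in_window)
{
    struct model_pmd_page page = { .pte = { 0 } };
    struct model_mm faulting = { .pmd = &page };

    /* Step 1 (faulting thread): look up ptep with no lock held. */
    int *ptep = model_pte_lookup(&faulting, 1);
    assert(ptep != NULL);

    /* Step 2 (unsharing thread): unshare while the faulting thread is paused. */
    if (unshare_in_window)
        model_unshare(&faulting);

    /* Step 3 (faulting thread): a fresh lookup no longer matches ptep. */
    return model_pte_lookup(&faulting, 1) != ptep;
}
```

    In this model, replay_interleaving(true) reports a stale ptep, while
    replay_interleaving(false) does not. Holding a lock that excludes the
    unshare step across steps 1 through 3 would correspond to the second case.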
    
    In subsequent patches, a new synchronization mechanism will be added to
    coordinate pmd sharing and eliminate this race.
    
    Link: https://lkml.kernel.org/r/20220914221810.95771-3-mike.kravetz@oracle.com
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
    Cc: Axel Rasmussen <axelrasmussen@google.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: James Houghton <jthoughton@google.com>
    Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Mina Almasry <almasrymina@google.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
    Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
    Cc: Sven Schnelle <svens@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>