    mm/khugepaged: collapse_pte_mapped_thp() with mmap_read_lock() · 1043173e
    Hugh Dickins authored
    Bring collapse_and_free_pmd() back into collapse_pte_mapped_thp().  It
    does need mmap_read_lock(), but it does not need mmap_write_lock(), nor
    vma_start_write() nor i_mmap lock nor anon_vma lock.  All racing paths are
    relying on pte_offset_map_lock() and pmd_lock(), so use those.
    
    Follow the pattern in retract_page_tables(); and using pte_free_defer()
    removes most of the need for tlb_remove_table_sync_one() here; but call
    pmdp_get_lockless_sync() to use it in the PAE case.
    
    First check the VMA, in case page tables are being torn down: from JannH. 
    Confirm the preliminary find_pmd_or_thp_or_none() once page lock has been
    acquired and the page looks suitable: from then on its state is stable.
    
    However, collapse_pte_mapped_thp() was doing something others don't:
    freeing a page table still containing "valid" entries.  i_mmap lock did
    stop a racing truncate from double-freeing those pages, but we prefer
    collapse_pte_mapped_thp() to clear the entries as usual.  Their TLB flush
    can wait until the pmdp_collapse_flush() which follows, but the
    mmu_notifier_invalidate_range_start() has to be done earlier.
    
    Do the "step 1" checking loop without mmu_notifier: it wouldn't be good
    for khugepaged to keep on repeatedly invalidating a range which is then
    found unsuitable, e.g. because it contains COWs.  "step 2", which does
    the clearing, must then be more careful (after dropping ptl to do
    mmu_notifier), with abort prepared to correct the accounting like
    "step 3".  But with those
    entries now cleared, "step 4" (after dropping ptl to do pmd_lock) is kept
    safe by the huge page lock, which stops new PTEs from being faulted in.
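
    As an illustration only, the check-then-clear split of "step 1" and
    "step 2" can be sketched in userspace C.  This is not the kernel code:
    struct toy_table, step1_check(), step2_clear() and toy_collapse() are
    hypothetical names, a pthread mutex stands in for ptl, and a counter
    stands in for mmu_notifier_invalidate_range_start().  It only shows
    that an unsuitable range is rejected before any invalidation, and that
    an abort during clearing still corrects the accounting.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PTES 8	/* toy table size, far smaller than a real page table */

/* Hypothetical stand-in for one page table plus its lock (ptl). */
struct toy_table {
	pthread_mutex_t ptl;
	int pte[NR_PTES];	/* 0 = none, otherwise id of the mapped subpage */
	int mapcount;		/* toy accounting of mapped subpages */
	int notifier_calls;	/* how many times the range was invalidated */
};

/* "step 1": read-only check under ptl, deliberately without the notifier. */
static bool step1_check(struct toy_table *t, int base)
{
	bool ok = true;

	pthread_mutex_lock(&t->ptl);
	for (size_t i = 0; i < NR_PTES; i++) {
		/* Every present entry must map the expected subpage. */
		if (t->pte[i] != 0 && t->pte[i] != base + (int)i) {
			ok = false;
			break;
		}
	}
	pthread_mutex_unlock(&t->ptl);
	return ok;
}

/*
 * "step 2": invalidate, retake ptl, and re-validate each entry while
 * clearing it, because ptl was dropped after step 1.  Returns the number
 * of entries cleared, or -1 on abort.  Entries cleared before an abort
 * are still subtracted from the accounting, like "step 3".
 */
static int step2_clear(struct toy_table *t, int base)
{
	int cleared = 0;
	bool aborted = false;

	t->notifier_calls++;	/* stand-in for the mmu_notifier call */

	pthread_mutex_lock(&t->ptl);
	for (size_t i = 0; i < NR_PTES; i++) {
		if (t->pte[i] == 0)
			continue;
		if (t->pte[i] != base + (int)i) {
			aborted = true;	/* changed since step 1 */
			break;
		}
		t->pte[i] = 0;
		cleared++;
	}
	pthread_mutex_unlock(&t->ptl);

	t->mapcount -= cleared;	/* correct accounting on success and abort */
	return aborted ? -1 : cleared;
}

/* Run step 1, and only if it passes pay for the invalidation in step 2. */
int toy_collapse(struct toy_table *t, int base)
{
	assert(base > 0);	/* 0 is reserved for "none" in this toy model */
	if (!step1_check(t, base))
		return -1;
	return step2_clear(t, base);
}
```

    In this toy model a table whose entries all match is cleared with one
    invalidation, while a table with a foreign entry is rejected by
    step1_check() with notifier_calls left at zero.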
    
    [hughd@google.com: don't set mmap_locked = true in madvise_collapse()]
      Link: https://lkml.kernel.org/r/d3d9ff14-ef8-8f84-e160-bfa1f5794275@google.com
    [hughd@google.com: use ptep_clear() instead of pte_clear()]
      Link: https://lkml.kernel.org/r/e0197433-8a47-6a65-534d-eda26eeb78b0@google.com
    Link: https://lkml.kernel.org/r/b53be6a4-7715-51f9-aad-f1347dcb7c4@google.com
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Reviewed-by: Qi Zheng <zhengqi.arch@bytedance.com>
    Cc: Alexander Gordeev <agordeev@linux.ibm.com>
    Cc: Alistair Popple <apopple@nvidia.com>
    Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Cc: Anshuman Khandual <anshuman.khandual@arm.com>
    Cc: Axel Rasmussen <axelrasmussen@google.com>
    Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
    Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
    Cc: Christoph Hellwig <hch@infradead.org>
    Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
    Cc: Heiko Carstens <hca@linux.ibm.com>
    Cc: Huang, Ying <ying.huang@intel.com>
    Cc: Ira Weiny <ira.weiny@intel.com>
    Cc: Jann Horn <jannh@google.com>
    Cc: Jason Gunthorpe <jgg@ziepe.ca>
    Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Lorenzo Stoakes <lstoakes@gmail.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Mike Rapoport (IBM) <rppt@kernel.org>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Ralph Campbell <rcampbell@nvidia.com>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: SeongJae Park <sj@kernel.org>
    Cc: Song Liu <song@kernel.org>
    Cc: Steven Price <steven.price@arm.com>
    Cc: Suren Baghdasaryan <surenb@google.com>
    Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
    Cc: Vasily Gorbik <gor@linux.ibm.com>
    Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Will Deacon <will@kernel.org>
    Cc: Yang Shi <shy828301@gmail.com>
    Cc: Yu Zhao <yuzhao@google.com>
    Cc: Zack Rusin <zackr@vmware.com>
    Cc: Zi Yan <ziy@nvidia.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>