• Yin Fengwei's avatar
    mm/thp: check and bail out if page in deferred queue already · 81e506be
    Yin Fengwei authored
    Kernel build regression with LLVM was reported here:
    https://lore.kernel.org/all/Y1GCYXGtEVZbcv%2F5@dev-arch.thelio-3990X/ with
    commit f35b5d7d ("mm: align larger anonymous mappings on THP
    boundaries").  And the commit f35b5d7d was reverted.
    
    It turned out the regression is related with madvise(MADV_DONTNEED)
    was used by ld.lld. But with none PMD_SIZE aligned parameter len.
    trace-bpfcc captured:
    531607  531732  ld.lld          do_madvise.part.0 start: 0x7feca9000000, len: 0x7fb000, behavior: 0x4
    531607  531793  ld.lld          do_madvise.part.0 start: 0x7fec86a00000, len: 0x7fb000, behavior: 0x4
    
    If the underneath physical page is THP, the madvise(MADV_DONTNEED) can
    trigger split_queue_lock contention raised significantly. perf showed
    following data:
        14.85%     0.00%  ld.lld           [kernel.kallsyms]           [k]
           entry_SYSCALL_64_after_hwframe
               11.52%
                    entry_SYSCALL_64_after_hwframe
                    do_syscall_64
                    __x64_sys_madvise
                    do_madvise.part.0
                    zap_page_range
                    unmap_single_vma
                    unmap_page_range
                    page_remove_rmap
                    deferred_split_huge_page
                    __lock_text_start
                    native_queued_spin_lock_slowpath
    
    If THP can't be removed from rmap as whole THP, partial THP will be
    removed from rmap by removing sub-pages from rmap.  Even the THP head page
    is added to deferred queue already, the split_queue_lock will be acquired
    and check whether the THP head page is in the queue already.  Thus, the
    contention of split_queue_lock is raised.
    
    Before acquire split_queue_lock, check and bail out early if the THP
    head page is in the queue already. The checking without holding
    split_queue_lock could race with deferred_split_scan, but it doesn't
    impact the correctness here.
    
    Test result of building kernel with ld.lld:
    commit 7b5a0b66 (parent commit of f35b5d7d):
    time -f "\t%E real,\t%U user,\t%S sys" make LD=ld.lld -skj96 allmodconfig all
            6:07.99 real,   26367.77 user,  5063.35 sys
    
    commit f35b5d7d:
    time -f "\t%E real,\t%U user,\t%S sys" make LD=ld.lld -skj96 allmodconfig all
            7:22.15 real,   26235.03 user,  12504.55 sys
    
    commit f35b5d7d with the fixing patch:
    time -f "\t%E real,\t%U user,\t%S sys" make LD=ld.lld -skj96 allmodconfig all
            6:08.49 real,   26520.15 user,  5047.91 sys
    
    Link: https://lkml.kernel.org/r/20221223135207.2275317-1-fengwei.yin@intel.comSigned-off-by: default avatarYin Fengwei <fengwei.yin@intel.com>
    Tested-by: default avatarNathan Chancellor <nathan@kernel.org>
    Acked-by: default avatarDavid Rientjes <rientjes@google.com>
    Reviewed-by: default avatar"Huang, Ying" <ying.huang@intel.com>
    Cc: Feng Tang <feng.tang@intel.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Rik van Riel <riel@surriel.com>
    Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
    Cc: Yang Shi <shy828301@gmail.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    81e506be
huge_memory.c 88.4 KB