    mm: hugetlb: clear target sub-page last when clearing huge page · c79b57e4
    Huang Ying authored
    Huge pages help to reduce the TLB miss rate, but they have a larger
    cache footprint, which can sometimes cause problems.  For example, when
    clearing a huge page on an x86_64 platform, the cache footprint is 2M.
    But a Xeon E5 v3 2699 CPU has 18 cores, 36 threads, and only 45M of LLC
    (last level cache).  That is, on average there is 2.5M of LLC per core
    and 1.25M per thread.
    
    If the cache pressure is heavy when clearing the huge page, and we
    clear the huge page from begin to end, it is possible that the
    beginning of the huge page has already been evicted from the cache by
    the time we finish clearing the end.  And it is likely that the
    application will access the beginning of the huge page right after the
    clearing.
    
    To help with this situation, this patch changes the order in which the
    sub-pages of a huge page are cleared.  In quite a few situations we can
    get the address that the application will access after the huge page is
    cleared, for example, in a page fault handler.  Instead of clearing the
    huge page from begin to end, we clear the sub-pages farthest from the
    target sub-page first, and clear the target sub-page last.  This keeps
    the target sub-page the most cache-hot, and the sub-pages around it
    cache-hot too.  If we cannot know the address the application will
    access, the beginning of the huge page is assumed to be the target.
    
    With this patch, throughput increases ~28.3% in the vm-scalability
    anon-w-seq test case with 72 processes on a 2-socket Xeon E5 v3 2699
    system (36 cores, 72 threads).  The test case creates 72 processes;
    each process mmaps a big anonymous memory area and writes to it from
    begin to end.  For each process, the other processes can be seen as
    background workload generating heavy cache pressure.  At the same time,
    the cache miss rate is reduced from ~33.4% to ~31.7%, the IPC
    (instructions per cycle) increases from 0.56 to 0.74, and the time
    spent in user space is reduced by ~7.9%.
    
    Christopher Lameter suggested clearing the bytes inside a sub-page from
    end to begin too.  But tests showed no visible performance difference,
    probably because the sub-page size is small compared with the cache
    size.
    
    Thanks to Andi Kleen for proposing to use the access address to
    determine the order in which sub-pages are cleared.
    
    The hugetlbfs access address could be improved; that will be done in
    another patch.
    
    [ying.huang@intel.com: improve readability of clear_huge_page()]
      Link: http://lkml.kernel.org/r/20170830051842.1397-1-ying.huang@intel.com
    Link: http://lkml.kernel.org/r/20170815014618.15842-1-ying.huang@intel.com
    Suggested-by: Andi Kleen <andi.kleen@intel.com>
    Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
    Acked-by: Jan Kara <jack@suse.cz>
    Reviewed-by: Michal Hocko <mhocko@suse.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
    Cc: Nadia Yvette Chambers <nyc@holomorphy.com>
    Cc: Matthew Wilcox <mawilcox@microsoft.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Shaohua Li <shli@fb.com>
    Cc: Christopher Lameter <cl@linux.com>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>