• Vineeth Remanan Pillai's avatar
    mm: rid swapoff of quadratic complexity · b56a2d8a
    Vineeth Remanan Pillai authored
    This patch was initially posted by Kelley Nielsen.  Reposting the patch
    with all review comments addressed and with minor modifications and
    optimizations.  Also, folding in the fixes offered by Hugh Dickins and
    Huang Ying.  Tests were rerun and commit message updated with new
    results.
    
    try_to_unuse() is of quadratic complexity, with a lot of wasted effort.
    It unuses swap entries one by one, potentially iterating over all the
    page tables for all the processes in the system for each one.
    
    This new proposed implementation of try_to_unuse simplifies its
    complexity to linear.  It iterates over the system's mms once, unusing
    all the affected entries as it walks each set of page tables.  It also
    makes similar changes to shmem_unuse.
    
    Improvement
    
    swapoff was called on a swap partition containing about 6G of data, in a
    VM(8cpu, 16G RAM), and calls to unuse_pte_range() were counted.
    
    Present implementation....about 1200M calls(8min, avg 80% cpu util).
    Prototype.................about  9.0K calls(3min, avg 5% cpu util).
    
    Details
    
    In shmem_unuse(), iterate over the shmem_swaplist and, for each
    shmem_inode_info that contains a swap entry, pass it to
    shmem_unuse_inode(), along with the swap type.  In shmem_unuse_inode(),
    iterate over its associated xarray, and store the index and value of
    each swap entry in an array for passing to shmem_swapin_page() outside
    of the RCU critical section.
    
    In try_to_unuse(), instead of iterating over the entries in the type and
    unusing them one by one, perhaps walking all the page tables for all the
    processes for each one, iterate over the mmlist, making one pass.  Pass
    each mm to unuse_mm() to begin its page table walk, and during the walk,
    unuse all the ptes that have backing store in the swap type received by
    try_to_unuse().  After the walk, check the type for orphaned swap
    entries with find_next_to_unuse(), and remove them from the swap cache.
    If find_next_to_unuse() starts over at the beginning of the type, repeat
    the check of the shmem_swaplist and the walk a maximum of three times.
    
    Change unuse_mm() and the intervening walk functions down to
    unuse_pte_range() to take the type as a parameter, and to iterate over
    their entire range, calling the next function down on every iteration.
    In unuse_pte_range(), make a swap entry from each pte in the range using
    the passed in type.  If it has backing store in the type, call
    swapin_readahead() to retrieve the page and pass it to unuse_pte().
    
    Pass the count of pages_to_unuse down the page table walks in
    try_to_unuse(), and return from the walk when the desired number of
    pages has been swapped back in.
    
    Link: http://lkml.kernel.org/r/20190114153129.4852-2-vpillai@digitalocean.comSigned-off-by: default avatarVineeth Remanan Pillai <vpillai@digitalocean.com>
    Signed-off-by: default avatarKelley Nielsen <kelleynnn@gmail.com>
    Signed-off-by: default avatarHuang Ying <ying.huang@intel.com>
    Acked-by: default avatarHugh Dickins <hughd@google.com>
    Cc: Rik van Riel <riel@surriel.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    b56a2d8a
shmem.c 104 KB