    mm/swap: reduce swap cache search space · 7aad25b4
    Kairui Song authored
    Currently we use one swap_address_space for every 64M chunk to reduce lock
    contention; this is like having a set of smaller swap files inside one
    swap device.  But when doing a swap cache lookup or insert, we still use
    the offset within the whole large swap device.  This is OK for
    correctness, as the offset (key) is unique.
    
    But XArray is specially optimized for small indexes: it creates the radix
    tree levels lazily, just deep enough to fit the largest key stored in one
    XArray.  So we are wasting tree nodes unnecessarily.
    
    For a 64M chunk it should take at most 3 levels to contain everything.
    But if we use the offset within the whole swap device, the offset
    (key) value can be far beyond the 64M range, and the tree grows
    correspondingly deeper.
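
The level arithmetic above can be checked with a small user-space sketch. It assumes 4 KiB pages and 64 slots per tree node (6 index bits per level); `NODE_SHIFT` and `levels_for_max_key` are hypothetical names used only for illustration, and this is not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: each tree node has 64 slots, i.e. 6 index bits per level. */
#define NODE_SHIFT 6

static_assert(NODE_SHIFT > 0 && NODE_SHIFT < 64, "NODE_SHIFT must be a valid shift");

/*
 * Hypothetical helper: number of tree levels needed so that the
 * largest stored key is reachable, at NODE_SHIFT bits per level.
 */
static unsigned int levels_for_max_key(uint64_t max_key)
{
	unsigned int levels = 1;

	/* Each extra level extends the addressable key range by NODE_SHIFT bits. */
	while (max_key >> NODE_SHIFT) {
		max_key >>= NODE_SHIFT;
		levels++;
	}
	return levels;
}
```

Under these assumptions, a 64M chunk holds 2^14 page offsets (largest key 16383) and needs 3 levels, while a key near the end of a 128G device (2^25 - 1) needs 5.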
    
    Optimize this by using a new helper swap_cache_index to get a swap entry's
    unique offset in its own 64M swap_address_space.
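
A minimal user-space sketch of what such a helper could compute, assuming 4 KiB pages so that a 64M chunk covers 2^14 page offsets. The names `CHUNK_SHIFT`, `chunk_number` and `chunk_local_index` are illustrative and are not taken from this commit's diff.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 64M chunk / 4 KiB pages = 2^14 offsets per address space. */
#define CHUNK_SHIFT 14
#define CHUNK_MASK  ((UINT64_C(1) << CHUNK_SHIFT) - 1)

static_assert(CHUNK_SHIFT < 64, "CHUNK_SHIFT must fit in a 64-bit offset");

/* Which 64M chunk (address space) a device-wide offset falls into. */
static uint64_t chunk_number(uint64_t offset)
{
	return offset >> CHUNK_SHIFT;
}

/* Offset within that chunk; always below 2^14, so the key stays small. */
static uint64_t chunk_local_index(uint64_t offset)
{
	return offset & CHUNK_MASK;
}
```

The pair (chunk number, chunk-local index) identifies the same entry as the device-wide offset, so uniqueness is preserved as long as the chunk selects the address space and only the local index is used as the key.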
    
    I see a ~1% performance gain in both a benchmark and an actual workload
    under high memory pressure.
    
    Test with `time memhog 128G` inside an 8G memcg using 128G swap (ramdisk
    with SWP_SYNCHRONOUS_IO dropped, tested 3 times, results are stable.  The
    test result is similar but the improvement is smaller if
    SWP_SYNCHRONOUS_IO is enabled, as the swap-out path can never skip the
    swap cache):
    
    Before:
    6.07user 250.74system 4:17.26elapsed 99%CPU (0avgtext+0avgdata 8373376maxresident)k
    0inputs+0outputs (55major+33555018minor)pagefaults 0swaps
    
    After (1.8% faster):
    6.08user 246.09system 4:12.58elapsed 99%CPU (0avgtext+0avgdata 8373248maxresident)k
    0inputs+0outputs (54major+33555027minor)pagefaults 0swaps
    
    Similar result with MySQL and sysbench using swap:
    Before:
    94055.61 qps
    
    After (0.8% faster):
    94834.91 qps
    
    Radix tree slab usage is also very slightly lower.
    
    Link: https://lkml.kernel.org/r/20240521175854.96038-12-ryncsn@gmail.com
    Signed-off-by: Kairui Song <kasong@tencent.com>
    Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
    Cc: Anna Schumaker <anna@kernel.org>
    Cc: Barry Song <v-songbaohua@oppo.com>
    Cc: Chao Yu <chao@kernel.org>
    Cc: Chris Li <chrisl@kernel.org>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Howells <dhowells@redhat.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Ilya Dryomov <idryomov@gmail.com>
    Cc: Jaegeuk Kim <jaegeuk@kernel.org>
    Cc: Jeff Layton <jlayton@kernel.org>
    Cc: Marc Dionne <marc.dionne@auristor.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: NeilBrown <neilb@suse.de>
    Cc: Ryan Roberts <ryan.roberts@arm.com>
    Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
    Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
    Cc: Xiubo Li <xiubli@redhat.com>
    Cc: Yosry Ahmed <yosryahmed@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>