    mm: support large folios swap-in for sync io devices · 242d12c9
    Chuanhua Han authored
    mTHP is available today, but large folio swap-in is not supported.  Once
    large folios are swapped out, they are lost as large folios, because mTHP
    swap is a one-way process.  The lack of mTHP swap-in prevents mTHP from
    being used on systems such as Android devices that rely heavily on swap.
    
    This patch introduces mTHP swap-in support, starting with sync io devices
    such as zRAM.  This is probably the simplest and most common use case,
    benefiting billions of Android phones and similar devices with minimal
    implementation cost.  In this straightforward scenario, large folios are
    always exclusive, eliminating the need to handle complex rmap and
    swapcache issues.
    
    It offers several benefits:
    1. Enables bidirectional mTHP swapping, allowing retrieval of mTHP after
       swap-out and swap-in. Large folios in the buddy system are also
       preserved as much as possible, rather than being fragmented due
       to swap-in.
    
    2. Eliminates fragmentation in swap slots and supports successful
       THP_SWPOUT.
    
       w/o this patch (data from Chris's and Kairui's latest swap allocator
       optimization, running ./thp_swap_allocator_test w/o the "-a"
       option [1]):
    
       ./thp_swap_allocator_test
       Iteration 1: swpout inc: 233, swpout fallback inc: 0, Fallback percentage: 0.00%
       Iteration 2: swpout inc: 131, swpout fallback inc: 101, Fallback percentage: 43.53%
       Iteration 3: swpout inc: 71, swpout fallback inc: 155, Fallback percentage: 68.58%
       Iteration 4: swpout inc: 55, swpout fallback inc: 168, Fallback percentage: 75.34%
       Iteration 5: swpout inc: 35, swpout fallback inc: 191, Fallback percentage: 84.51%
       Iteration 6: swpout inc: 25, swpout fallback inc: 199, Fallback percentage: 88.84%
       Iteration 7: swpout inc: 23, swpout fallback inc: 205, Fallback percentage: 89.91%
       Iteration 8: swpout inc: 9, swpout fallback inc: 219, Fallback percentage: 96.05%
       Iteration 9: swpout inc: 13, swpout fallback inc: 213, Fallback percentage: 94.25%
       Iteration 10: swpout inc: 12, swpout fallback inc: 216, Fallback percentage: 94.74%
       Iteration 11: swpout inc: 16, swpout fallback inc: 213, Fallback percentage: 93.01%
       Iteration 12: swpout inc: 10, swpout fallback inc: 210, Fallback percentage: 95.45%
       Iteration 13: swpout inc: 16, swpout fallback inc: 212, Fallback percentage: 92.98%
       Iteration 14: swpout inc: 12, swpout fallback inc: 212, Fallback percentage: 94.64%
       Iteration 15: swpout inc: 15, swpout fallback inc: 211, Fallback percentage: 93.36%
       Iteration 16: swpout inc: 15, swpout fallback inc: 200, Fallback percentage: 93.02%
       Iteration 17: swpout inc: 9, swpout fallback inc: 220, Fallback percentage: 96.07%
    
       w/ this patch (always 0%):
       Iteration 1: swpout inc: 948, swpout fallback inc: 0, Fallback percentage: 0.00%
       Iteration 2: swpout inc: 953, swpout fallback inc: 0, Fallback percentage: 0.00%
       Iteration 3: swpout inc: 950, swpout fallback inc: 0, Fallback percentage: 0.00%
       Iteration 4: swpout inc: 952, swpout fallback inc: 0, Fallback percentage: 0.00%
       Iteration 5: swpout inc: 950, swpout fallback inc: 0, Fallback percentage: 0.00%
       Iteration 6: swpout inc: 950, swpout fallback inc: 0, Fallback percentage: 0.00%
       Iteration 7: swpout inc: 947, swpout fallback inc: 0, Fallback percentage: 0.00%
       Iteration 8: swpout inc: 950, swpout fallback inc: 0, Fallback percentage: 0.00%
       Iteration 9: swpout inc: 950, swpout fallback inc: 0, Fallback percentage: 0.00%
       Iteration 10: swpout inc: 945, swpout fallback inc: 0, Fallback percentage: 0.00%
       Iteration 11: swpout inc: 947, swpout fallback inc: 0, Fallback percentage: 0.00%
       ...
    
    3. With both mTHP swap-out and swap-in supported, we offer the option to
       enable zsmalloc compression/decompression with larger granularity [2].
       The upcoming optimization in zsmalloc will significantly increase swap
       speed and improve compression efficiency.  In a test running 100
       iterations of swapping 100 MiB of anon memory, swap time dropped
       substantially:

       algorithm, granularity   swap-in time (ms)   swap-out time (ms)
       lz4 4k                   45274               90540
       lz4 64k                  22942               55667
       zstd 4k                  85035               186585
       zstd 64k                 46558               118533
    
       The compression ratio also improved, as evaluated with 1000 MiB of data:

       granularity   orig_data_size   compr_data_size
       4KiB-zstd     1048576000       246876055
       64KiB-zstd    1048576000       199763892
    
       Without mTHP swap-in, the potential optimizations in zsmalloc cannot be
       realized.
    
    4. Even on its own, mTHP swap-in can reduce swap-in page faults by a
       factor of nr_pages.  The test below swaps in content filled with the
       same data 0x11 for five rounds, w/o and w/ the patch.  Since the
       content is identical, decompression is very fast, so the numbers
       primarily reflect the impact of reduced page faults:
    
       swap-in bandwidth (bytes/ms)   w/o       w/
       round1                         624152    1127501
       round2                         631672    1127501
       round3                         620459    1139756
       round4                         606113    1139756
       round5                         624152    1152281
       avg                            621310    1137359   (+83%)
    
    5. With both mTHP swap-out and swap-in supported, we offer the option to
       enable hardware accelerators (Intel IAA) to do parallel decompression,
       with which Kanchana reported a 7X improvement in zRAM read latency [3].
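
    The percentages quoted in items 2-4 can be re-derived from the raw
    numbers.  The sketch below is only an illustration: the helper names are
    hypothetical and are not part of the patch or of thp_swap_allocator_test,
    and the fallback formula (fallbacks divided by swap-outs plus fallbacks)
    is inferred from the figures printed above rather than taken from the
    tool's source.

```python
# Hypothetical helpers for re-deriving the figures quoted in this commit
# message. None of these names come from the kernel or the test tool.

def fallback_pct(swpout_inc: int, fallback_inc: int) -> float:
    """Share of large-folio swap-out attempts that fell back, in percent."""
    total = swpout_inc + fallback_inc
    return 100.0 * fallback_inc / total if total else 0.0


def improvement_pct(before: float, after: float) -> float:
    """Relative gain of `after` over `before`, in percent."""
    return 100.0 * (after - before) / before


def compression_ratio(orig_size: int, compr_size: int) -> float:
    """Original size divided by compressed size."""
    return orig_size / compr_size


# Item 2, iteration 2 without the patch: 131 swap-outs, 101 fallbacks.
iter2_fallback = round(fallback_pct(131, 101), 2)

# Item 4: average swap-in bandwidth (bytes/ms) without and with the patch.
without_patch = [624152, 631672, 620459, 606113, 624152]
with_patch = [1127501, 1127501, 1139756, 1139756, 1152281]
avg_without = sum(without_patch) / len(without_patch)
avg_with = sum(with_patch) / len(with_patch)
bandwidth_gain = round(improvement_pct(avg_without, avg_with))

# Item 3: zstd compression ratio at 4KiB and 64KiB granularity.
ratio_4k = compression_ratio(1048576000, 246876055)
ratio_64k = compression_ratio(1048576000, 199763892)

print(f"fallback (iteration 2, w/o patch): {iter2_fallback}%")
print(f"swap-in bandwidth gain: +{bandwidth_gain}%")
print(f"zstd ratio 4KiB: {ratio_4k:.2f}, 64KiB: {ratio_64k:.2f}")
```

    The same helpers apply to any other row, for example
    fallback_pct(9, 220) for iteration 17 without the patch.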
    
    [1] https://lore.kernel.org/all/20240730-swap-allocator-v5-0-cb9c148b9297@kernel.org/
    [2] https://lore.kernel.org/all/20240327214816.31191-1-21cnbao@gmail.com/
    [3] https://lore.kernel.org/all/cover.1714581792.git.andre.glover@linux.intel.com/
    
    Link: https://lkml.kernel.org/r/20240908232119.2157-4-21cnbao@gmail.com
    Signed-off-by: Chuanhua Han <hanchuanhua@oppo.com>
    Co-developed-by: Barry Song <v-songbaohua@oppo.com>
    Signed-off-by: Barry Song <v-songbaohua@oppo.com>
    Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
    Cc: Chris Li <chrisl@kernel.org>
    Cc: Christoph Hellwig <hch@infradead.org>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Gao Xiang <xiang@kernel.org>
    Cc: "Huang, Ying" <ying.huang@intel.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Kairui Song <kasong@tencent.com>
    Cc: Kalesh Singh <kaleshsingh@google.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Nhat Pham <nphamcs@gmail.com>
    Cc: Ryan Roberts <ryan.roberts@arm.com>
    Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
    Cc: Shakeel Butt <shakeel.butt@linux.dev>
    Cc: Suren Baghdasaryan <surenb@google.com>
    Cc: Yang Shi <shy828301@gmail.com>
    Cc: Yosry Ahmed <yosryahmed@google.com>
    Cc: Usama Arif <usamaarif642@gmail.com>
    Cc: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
    Cc: Kairui Song <ryncsn@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>