    migrate_pages: organize stats with struct migrate_pages_stats · 5b855937
    Huang Ying authored
    Patch series "migrate_pages(): batch TLB flushing", v5.
    
    Currently, migrate_pages() migrates folios one by one, as in the
    following pseudocode:
    
      for each folio
        unmap
        flush TLB
        copy
        restore map
    
    If multiple folios are passed to migrate_pages(), there are opportunities
    to batch the TLB flushing and copying.  That is, we can change the code to
    something like the following:
    
      for each folio
        unmap
      for each folio
        flush TLB
      for each folio
        copy
      for each folio
        restore map
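
    To make the difference between the two loops concrete, here is a minimal
    user-space sketch that only counts flush rounds.  It is a model, not
    kernel code: the function names are hypothetical, and it assumes that a
    whole batch of unmapped folios can be covered by a single flush round.

```c
#include <assert.h>
#include <stddef.h>

/* One-by-one scheme: every folio pays for its own TLB flush. */
static size_t flushes_one_by_one(size_t nr_folios)
{
	size_t flushes = 0;

	for (size_t i = 0; i < nr_folios; i++) {
		/* unmap folio i */
		flushes++;	/* flush TLB for this folio alone */
		/* copy folio i, then restore its mapping */
	}
	return flushes;
}

/*
 * Batched scheme: unmap all folios first and defer the flush.
 * Assumption: one flush round covers every folio unmapped so far.
 */
static size_t flushes_batched(size_t nr_folios)
{
	size_t pending = 0;
	size_t flushes = 0;

	for (size_t i = 0; i < nr_folios; i++)
		pending++;	/* unmap folio i, flush deferred */

	if (pending)
		flushes++;	/* single flush round for the batch */

	/* copy all folios, then restore all mappings */
	return flushes;
}

/* Flush rounds avoided by batching nr_folios folios. */
static size_t flushes_saved(size_t nr_folios)
{
	size_t single = flushes_one_by_one(nr_folios);
	size_t batched = flushes_batched(nr_folios);

	assert(batched <= single);
	return single - batched;
}
```

    In this model, migrating 512 folios costs 512 flush rounds one by one
    and a single round when batched.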
    
    The total number of TLB flushing IPIs can be reduced considerably.  We
    may also use a hardware accelerator such as DSA to accelerate the folio
    copying.
    
    So in this patch, we refactor the migrate_pages() implementation and
    implement the TLB flushing batching.  Based on this, hardware accelerated
    folio copying can be implemented.
    
    If too many folios are passed to migrate_pages(), the naive batched
    implementation may unmap too many folios at the same time.  This raises
    the likelihood that a task has to wait for the migrated folios to be
    mapped again, so latency may suffer.  To deal with this issue, the
    maximum number of folios unmapped in one batch is restricted to no more
    than HPAGE_PMD_NR, counted in units of pages.  That is, the impact is at
    the same level as THP migration.
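
    The batch limit can be sketched as a simple greedy split.  This is a
    user-space illustration only: next_batch(), count_batches() and
    BATCH_LIMIT_PAGES are hypothetical names, and the value 512 assumes
    4 KiB base pages with 2 MiB PMD-sized huge pages, where HPAGE_PMD_NR
    is 512.  The real splitting logic in mm/migrate.c may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: stands in for HPAGE_PMD_NR (4 KiB pages, 2 MiB PMD). */
#define BATCH_LIMIT_PAGES 512u

/*
 * Return how many folios, starting at index start, fit into one batch
 * without exceeding BATCH_LIMIT_PAGES pages.  A batch always takes at
 * least one folio, so a single large folio forms a batch by itself.
 */
static size_t next_batch(const unsigned int *folio_pages, size_t nr_folios,
			 size_t start)
{
	size_t i = start;
	unsigned int pages = 0;

	assert(folio_pages != NULL && start < nr_folios);

	while (i < nr_folios) {
		if (i > start && pages + folio_pages[i] > BATCH_LIMIT_PAGES)
			break;
		pages += folio_pages[i];
		i++;
	}
	return i - start;
}

/* Count the batches needed to cover the whole folio list. */
static size_t count_batches(const unsigned int *folio_pages, size_t nr_folios)
{
	size_t start = 0;
	size_t batches = 0;

	while (start < nr_folios) {
		start += next_batch(folio_pages, nr_folios, start);
		batches++;
	}
	return batches;
}
```

    With this split, one 512-page THP followed by two single-page folios
    yields two batches: the THP alone, then the two small folios together.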
    
    We use the following test to measure the performance impact of the
    patchset on a 2-socket Intel server:
    
     - Run pmbench memory accessing benchmark
    
     - Run `migratepages` to migrate pages of pmbench between node 0 and
       node 1 back and forth.
    
    With the patch, the number of TLB flushing IPIs is reduced by 99.1%
    during the test and the number of pages migrated successfully per second
    increases by 291.7%.
    
    Xin Hao helped to test the patchset on an ARM64 server with 128 cores
    and 2 NUMA nodes.  Test results show that the page migration performance
    increases by up to 78%.
    
    
    This patch (of 9):
    
    Define struct migrate_pages_stats to organize the various statistics in
    migrate_pages().  This makes it easier to collect and consume the
    statistics in multiple functions.  This will be needed in the following
    patches in the series.
    
    Link: https://lkml.kernel.org/r/20230213123444.155149-1-ying.huang@intel.com
    Link: https://lkml.kernel.org/r/20230213123444.155149-2-ying.huang@intel.com
    Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
    Reviewed-by: Alistair Popple <apopple@nvidia.com>
    Reviewed-by: Zi Yan <ziy@nvidia.com>
    Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
    Reviewed-by: Xin Hao <xhao@linux.alibaba.com>
    Cc: Yang Shi <shy828301@gmail.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Bharata B Rao <bharata@amd.com>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>