• David Hildenbrand's avatar
    mm: optimize do_wp_page() for fresh pages in local LRU pagevecs · d4c47097
    David Hildenbrand authored
    For example, if a page just got swapped in via a read fault, the LRU
    pagevecs might still hold a reference to the page.  If we trigger a write
    fault on such a page, the additional reference from the LRU pagevecs will
    prohibit reusing the page.
    
    Let's conditionally drain the local LRU pagevecs when we stumble over a
    !PageLRU() page.  We cannot easily drain remote LRU pagevecs and it might
    not be desirable performance-wise.  Consequently, this will only avoid
    copying in some cases.
    
    Add a simple "page_count(page) > 3" check first but keep the
    "page_count(page) > 1 + PageSwapCache(page)" check in place, as we want to
    minimize cases where we remove a page from the swapcache but won't be able
    to reuse it, for example, because another process has it mapped R/O, to
    not affect reclaim.
    
    We cannot easily handle the following cases and we will always have to
    copy:
    
    (1) The page is referenced in the LRU pagevecs of other CPUs. We really
        would have to drain the LRU pagevecs of all CPUs -- most probably
        copying is much cheaper.
    
    (2) The page is already PageLRU() but is getting moved between LRU
        lists, for example, for activation (e.g., mark_page_accessed()),
        deactivation (MADV_COLD), or lazyfree (MADV_FREE). We'd have to
        drain mostly unconditionally, which might be bad performance-wise.
        Most probably this won't happen too often in practice.
    
    Note that there are other reasons why an anon page might temporarily not
    be PageLRU(): for example, compaction and migration have to isolate LRU
    pages from the LRU lists first (isolate_lru_page()), moving them to
    temporary local lists and clearing PageLRU() and holding an additional
    reference on the page.  In that case, we'll always copy.
    
    This change seems to be fairly effective with the reproducer [1] shared by
    Nadav, as long as writeback is done synchronously, for example, using
    zram.  However, with asynchronous writeback, we'll usually fail to free
    the swapcache because the page is still under writeback: something we
    cannot easily optimize for, and maybe it's not really relevant in
    practice.
    
    [1] https://lkml.kernel.org/r/0480D692-D9B2-429A-9A88-9BBA1331AC3A@gmail.com
    
    Link: https://lkml.kernel.org/r/20220131162940.210846-3-david@redhat.comSigned-off-by: default avatarDavid Hildenbrand <david@redhat.com>
    Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Don Dutile <ddutile@redhat.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Jann Horn <jannh@google.com>
    Cc: Jason Gunthorpe <jgg@nvidia.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Liang Zhang <zhangliang5@huawei.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Mike Rapoport <rppt@linux.ibm.com>
    Cc: Nadav Amit <nadav.amit@gmail.com>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Rik van Riel <riel@surriel.com>
    Cc: Roman Gushchin <roman.gushchin@linux.dev>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Yang Shi <shy828301@gmail.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    d4c47097
memory.c 149 KB