Commit d8c47cc7 authored by Johannes Weiner's avatar Johannes Weiner Committed by Linus Torvalds

mm: page_io: fix psi memory pressure error on cold swapins

Once upon a time, all swapins counted toward memory pressure[1].  Then
Joonsoo introduced workingset detection for anonymous pages and we gained
the ability to distinguish hot from cold swapins[2][3].  But we failed to
update swap_readpage() accordingly, and now we account partial memory
pressure in the swapin path of cold memory.

Not for all situations - which adds more inconsistency: paths using the
conventional submit_bio() and lock_page() route will not see much pressure
- unless storage itself is heavily congested and the bio submissions
stall.  ZRAM and ZSWAP do most of the work directly from swap_readpage()
and will see all swapins reflected as pressure.

IOW, a workload doing cold swapins could see little to no pressure
reported with on-disk swap, but potentially high pressure with a zram or
zswap backend.  That confuses any psi-based health monitoring, load
shedding, proactive reclaim, or userspace OOM killing schemes that might
be in place for the workload.

Restore consistency by making all swapin stall accounting conditional on
the page actually being part of the workingset.

[1] commit 93779069 ("mm/page_io.c: annotate refault stalls from swap_readpage")
[2] commit aae466b0 ("mm/swap: implement workingset detection for anonymous LRU")
[3] commit cad8320b ("mm/swap: don't SetPageWorkingset unconditionally during swapin")

Link: https://lkml.kernel.org/r/20220214214921.419687-1-hannes@cmpxchg.orgSigned-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
Reported-by: default avatarCGEL <cgel.zte@gmail.com>
Acked-by: default avatarMinchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
parent a1a3a2fc
......@@ -359,6 +359,7 @@ int swap_readpage(struct page *page, bool synchronous)
struct bio *bio;
int ret = 0;
struct swap_info_struct *sis = page_swap_info(page);
bool workingset = PageWorkingset(page);
unsigned long pflags;
VM_BUG_ON_PAGE(!PageSwapCache(page) && !synchronous, page);
......@@ -370,7 +371,8 @@ int swap_readpage(struct page *page, bool synchronous)
* or the submitting cgroup IO-throttled, submission can be a
* significant part of overall IO time.
*/
psi_memstall_enter(&pflags);
if (workingset)
psi_memstall_enter(&pflags);
delayacct_swapin_start();
if (frontswap_load(page) == 0) {
......@@ -433,7 +435,8 @@ int swap_readpage(struct page *page, bool synchronous)
bio_put(bio);
out:
psi_memstall_leave(&pflags);
if (workingset)
psi_memstall_leave(&pflags);
delayacct_swapin_end();
return ret;
}
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment