mm, vmscan: guarantee drop_slab_node() termination (1399af7e) · Commits · Kirill Smelkov / linux

Commit 1399af7e authored Sep 02, 2021 by

Vlastimil Babka Committed by Linus Torvalds Sep 03, 2021

mm, vmscan: guarantee drop_slab_node() termination

drop_slab_node() is called as part of echo 2>/proc/sys/vm/drop_caches
operation.  It iterates over all memcgs and calls shrink_slab() which in
turn iterates over all slab shrinkers.  Freed objects are counted and as
long as the total number of freed objects from all memcgs and shrinkers is
higher than 10, drop_slab_node() loops for another full memcgs*shrinkers
iteration.

This arbitrary constant threshold of 10 can result in effectively an
infinite loop on a system with large number of memcgs and/or parallel
activity that allocates new objects.  This has been reported previously by
Chunxin Zang [1] and recently by our customer.

The previous report [1] has resulted in commit 069c411d ("mm/vmscan:
fix infinite loop in drop_slab_node") which added a check for signals
allowing the user to terminate the command writing to drop_caches.  At the
time it was also considered to make the threshold grow with each iteration
to guarantee termination, but such patch hasn't been formally proposed
yet.

This patch implements the dynamically growing threshold.  At first
iteration it's enough to free one object to continue, and this threshold
effectively doubles with each iteration.  Our customer's feedback was
positive.

There is always a risk that this change will result on some system in a
previously terminating drop_caches operation to terminate sooner and free
fewer objects.  Ideally the semantics would guarantee freeing all freeable
objects that existed at the moment of starting the operation, while not
looping forever for newly allocated objects, but that's not feasible to
track.  In the less ideal solution based on thresholds, arguably the
termination guarantee is more important than the exhaustiveness guarantee.
If there are reports of large regression wrt being exhaustive, we can
tune how fast the threshold grows.

[1] https://lore.kernel.org/lkml/20200909152047.27905-1-zangchunxin@bytedance.com/T/#u

[vbabka@suse.cz: avoid undefined shift behaviour]
  Link: https://lkml.kernel.org/r/2f034e6f-a753-550a-f374-e4e23899d3d5@suse.cz

Link: https://lkml.kernel.org/r/20210818152239.25502-1-vbabka@suse.czSigned-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Chunxin Zang <zangchunxin@bytedance.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Chris Down <chris@chrisdown.name>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

parent 2e786d9e

Show whitespace changes

Inline Side-by-side

View file @ 1399af7e

...	@@ -939,6 +939,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,	...	@@ -939,6 +939,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
	void drop_slab_node(int nid)		void drop_slab_node(int nid)
	{		{
	unsigned long freed;		unsigned long freed;
			int shift = 0;

	do {		do {
	struct mem_cgroup *memcg = NULL;		struct mem_cgroup *memcg = NULL;
...	@@ -951,7 +952,7 @@ void drop_slab_node(int nid)	...	@@ -951,7 +952,7 @@ void drop_slab_node(int nid)
	do {		do {
	freed += shrink_slab(GFP_KERNEL, nid, memcg, 0);		freed += shrink_slab(GFP_KERNEL, nid, memcg, 0);
	} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);		} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
	} while (freed > 10);		} while ((freed >> shift++) > 1);
	}		}

	void drop_slab(void)		void drop_slab(void)
...		...

Please register or to comment