Commit 7854ea6c authored by Michal Hocko's avatar Michal Hocko Committed by Linus Torvalds

mm: consider compaction feedback also for costly allocation

PAGE_ALLOC_COSTLY_ORDER retry logic is mostly handled inside
should_reclaim_retry currently where we decide to not retry after at
least order worth of pages were reclaimed or the watermark check for at
least one zone would succeed after reclaiming all pages if the reclaim
hasn't made any progress.  Compaction feedback is mostly ignored and we
just try to make sure that the compaction did at least something before
giving up.

The first condition was added by a41f24ea ("page allocator: smarter
retry of costly-order allocations) and it assumed that lumpy reclaim
could have created a page of the sufficient order.  Lumpy reclaim, has
been removed quite some time ago so the assumption doesn't hold anymore.
Remove the check for the number of reclaimed pages and rely on the
compaction feedback solely.  should_reclaim_retry now only makes sure
that we keep retrying reclaim for high order pages only if they are
hidden by watermaks so order-0 reclaim makes really sense.

should_compact_retry now keeps retrying even for the costly allocations.
The number of retries is reduced wrt.  !costly requests because they are
less important and harder to grant and so their pressure shouldn't cause
contention for other requests or cause an over reclaim.  We also do not
reset no_progress_loops for costly request to make sure we do not keep
reclaiming too agressively.

This has been tested by running a process which fragments memory:
	- compact memory
	- mmap large portion of the memory (1920M on 2GRAM machine with 2G
	  of swapspace)
	- MADV_DONTNEED single page in PAGE_SIZE*((1UL<<MAX_ORDER)-1)
	  steps until certain amount of memory is freed (250M in my test)
	  and reduce the step to (step / 2) + 1 after reaching the end of
	  the mapping
	- then run a script which populates the page cache 2G (MemTotal)
	  from /dev/zero to a new file
And then tries to allocate
nr_hugepages=$(awk '/MemAvailable/{printf "%d\n", $2/(2*1024)}' /proc/meminfo)
huge pages.

root@test1:~# echo 1 > /proc/sys/vm/overcommit_memory;echo 1 > /proc/sys/vm/compact_memory; ./fragment-mem-and-run /root/alloc_hugepages.sh 1920M 250M
Node 0, zone      DMA     31     28     31     10      2      0      2      1      2      3      1
Node 0, zone    DMA32    437    319    171     50     28     25     20     16     16     14    437

* This is the /proc/buddyinfo after the compaction

Done fragmenting. size=2013265920 freed=262144000
Node 0, zone      DMA    165     48      3      1      2      0      2      2      2      2      0
Node 0, zone    DMA32  35109  14575    185     51     41     12      6      0      0      0      0

* /proc/buddyinfo after memory got fragmented

Executing "/root/alloc_hugepages.sh"
Eating some pagecache
508623+0 records in
508623+0 records out
2083319808 bytes (2.1 GB) copied, 11.7292 s, 178 MB/s
Node 0, zone      DMA      3      5      3      1      2      0      2      2      2      2      0
Node 0, zone    DMA32    111    344    153     20     24     10      3      0      0      0      0

* /proc/buddyinfo after page cache got eaten

Trying to allocate 129
129

* 129 hugepages requested and all of them granted.

Node 0, zone      DMA      3      5      3      1      2      0      2      2      2      2      0
Node 0, zone    DMA32    127     97     30     99     11      6      2      1      4      0      0

* /proc/buddyinfo after hugetlb allocation.

10 runs will behave as follows:
Trying to allocate 130
130
--
Trying to allocate 129
129
--
Trying to allocate 128
128
--
Trying to allocate 129
129
--
Trying to allocate 128
128
--
Trying to allocate 129
129
--
Trying to allocate 132
132
--
Trying to allocate 129
129
--
Trying to allocate 128
128
--
Trying to allocate 129
129

So basically 100% success for all 10 attempts.
Without the patch numbers looked much worse:
Trying to allocate 128
12
--
Trying to allocate 129
14
--
Trying to allocate 129
7
--
Trying to allocate 129
16
--
Trying to allocate 129
30
--
Trying to allocate 129
38
--
Trying to allocate 129
19
--
Trying to allocate 129
37
--
Trying to allocate 129
28
--
Trying to allocate 129
37

Just for completness the base kernel without oom detection rework looks
as follows:
Trying to allocate 127
30
--
Trying to allocate 129
12
--
Trying to allocate 129
52
--
Trying to allocate 128
32
--
Trying to allocate 129
12
--
Trying to allocate 129
10
--
Trying to allocate 129
32
--
Trying to allocate 128
14
--
Trying to allocate 128
16
--
Trying to allocate 129
8

As we can see the success rate is much more volatile and smaller without
this patch. So the patch not only makes the retry logic for costly
requests more sensible the success rate is even higher.
Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
Acked-by: default avatarHillf Danton <hillf.zj@alibaba-inc.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
parent 33c2d214
...@@ -3260,6 +3260,8 @@ should_compact_retry(unsigned int order, enum compact_result compact_result, ...@@ -3260,6 +3260,8 @@ should_compact_retry(unsigned int order, enum compact_result compact_result,
enum migrate_mode *migrate_mode, enum migrate_mode *migrate_mode,
int compaction_retries) int compaction_retries)
{ {
int max_retries = MAX_COMPACT_RETRIES;
if (!order) if (!order)
return false; return false;
...@@ -3277,17 +3279,24 @@ should_compact_retry(unsigned int order, enum compact_result compact_result, ...@@ -3277,17 +3279,24 @@ should_compact_retry(unsigned int order, enum compact_result compact_result,
} }
/* /*
* !costly allocations are really important and we have to make sure * make sure the compaction wasn't deferred or didn't bail out early
* the compaction wasn't deferred or didn't bail out early due to locks * due to locks contention before we declare that we should give up.
* contention before we go OOM. Still cap the reclaim retry loops with
* progress to prevent from looping forever and potential trashing.
*/ */
if (order <= PAGE_ALLOC_COSTLY_ORDER) { if (compaction_withdrawn(compact_result))
if (compaction_withdrawn(compact_result)) return true;
return true;
if (compaction_retries <= MAX_COMPACT_RETRIES) /*
return true; * !costly requests are much more important than __GFP_REPEAT
} * costly ones because they are de facto nofail and invoke OOM
* killer to move on while costly can fail and users are ready
* to cope with that. 1/4 retries is rather arbitrary but we
* would need much more detailed feedback from compaction to
* make a better decision.
*/
if (order > PAGE_ALLOC_COSTLY_ORDER)
max_retries /= 4;
if (compaction_retries <= max_retries)
return true;
return false; return false;
} }
...@@ -3449,18 +3458,17 @@ static inline bool is_thp_gfp_mask(gfp_t gfp_mask) ...@@ -3449,18 +3458,17 @@ static inline bool is_thp_gfp_mask(gfp_t gfp_mask)
* Checks whether it makes sense to retry the reclaim to make a forward progress * Checks whether it makes sense to retry the reclaim to make a forward progress
* for the given allocation request. * for the given allocation request.
* The reclaim feedback represented by did_some_progress (any progress during * The reclaim feedback represented by did_some_progress (any progress during
* the last reclaim round), pages_reclaimed (cumulative number of reclaimed * the last reclaim round) and no_progress_loops (number of reclaim rounds without
* pages) and no_progress_loops (number of reclaim rounds without any progress * any progress in a row) is considered as well as the reclaimable pages on the
* in a row) is considered as well as the reclaimable pages on the applicable * applicable zone list (with a backoff mechanism which is a function of
* zone list (with a backoff mechanism which is a function of no_progress_loops). * no_progress_loops).
* *
* Returns true if a retry is viable or false to enter the oom path. * Returns true if a retry is viable or false to enter the oom path.
*/ */
static inline bool static inline bool
should_reclaim_retry(gfp_t gfp_mask, unsigned order, should_reclaim_retry(gfp_t gfp_mask, unsigned order,
struct alloc_context *ac, int alloc_flags, struct alloc_context *ac, int alloc_flags,
bool did_some_progress, unsigned long pages_reclaimed, bool did_some_progress, int no_progress_loops)
int no_progress_loops)
{ {
struct zone *zone; struct zone *zone;
struct zoneref *z; struct zoneref *z;
...@@ -3472,14 +3480,6 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order, ...@@ -3472,14 +3480,6 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
if (no_progress_loops > MAX_RECLAIM_RETRIES) if (no_progress_loops > MAX_RECLAIM_RETRIES)
return false; return false;
if (order > PAGE_ALLOC_COSTLY_ORDER) {
if (pages_reclaimed >= (1<<order))
return false;
if (did_some_progress)
return true;
}
/* /*
* Keep reclaiming pages while there is a chance this will lead somewhere. * Keep reclaiming pages while there is a chance this will lead somewhere.
* If none of the target zones can satisfy our allocation request even * If none of the target zones can satisfy our allocation request even
...@@ -3550,7 +3550,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, ...@@ -3550,7 +3550,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM; bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM;
struct page *page = NULL; struct page *page = NULL;
unsigned int alloc_flags; unsigned int alloc_flags;
unsigned long pages_reclaimed = 0;
unsigned long did_some_progress; unsigned long did_some_progress;
enum migrate_mode migration_mode = MIGRATE_ASYNC; enum migrate_mode migration_mode = MIGRATE_ASYNC;
enum compact_result compact_result; enum compact_result compact_result;
...@@ -3686,16 +3685,18 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, ...@@ -3686,16 +3685,18 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT)) if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT))
goto noretry; goto noretry;
if (did_some_progress) { /*
* Costly allocations might have made a progress but this doesn't mean
* their order will become available due to high fragmentation so
* always increment the no progress counter for them
*/
if (did_some_progress && order <= PAGE_ALLOC_COSTLY_ORDER)
no_progress_loops = 0; no_progress_loops = 0;
pages_reclaimed += did_some_progress; else
} else {
no_progress_loops++; no_progress_loops++;
}
if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags, if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
did_some_progress > 0, pages_reclaimed, did_some_progress > 0, no_progress_loops))
no_progress_loops))
goto retry; goto retry;
/* /*
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment