    mm/page_alloc.c: make sure OOM victim can try allocations with no watermarks once · de12c73f
    Tetsuo Handa authored
    commit c288983d upstream.
    
    Roman Gushchin has reported that the OOM killer can trivially select the
    next OOM victim when a thread doing memory allocation from the page fault
    path was selected as the first OOM victim.
    
        allocate invoked oom-killer: gfp_mask=0x14280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null),  order=0, oom_score_adj=0
        allocate cpuset=/ mems_allowed=0
        CPU: 1 PID: 492 Comm: allocate Not tainted 4.12.0-rc1-mm1+ #181
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
        Call Trace:
         oom_kill_process+0x219/0x3e0
         out_of_memory+0x11d/0x480
         __alloc_pages_slowpath+0xc84/0xd40
         __alloc_pages_nodemask+0x245/0x260
         alloc_pages_vma+0xa2/0x270
         __handle_mm_fault+0xca9/0x10c0
         handle_mm_fault+0xf3/0x210
         __do_page_fault+0x240/0x4e0
         trace_do_page_fault+0x37/0xe0
         do_async_page_fault+0x19/0x70
         async_page_fault+0x28/0x30
        ...
        Out of memory: Kill process 492 (allocate) score 899 or sacrifice child
        Killed process 492 (allocate) total-vm:2052368kB, anon-rss:1894576kB, file-rss:4kB, shmem-rss:0kB
        allocate: page allocation failure: order:0, mode:0x14280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null)
        allocate cpuset=/ mems_allowed=0
        CPU: 1 PID: 492 Comm: allocate Not tainted 4.12.0-rc1-mm1+ #181
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
        Call Trace:
         __alloc_pages_slowpath+0xd32/0xd40
         __alloc_pages_nodemask+0x245/0x260
         alloc_pages_vma+0xa2/0x270
         __handle_mm_fault+0xca9/0x10c0
         handle_mm_fault+0xf3/0x210
         __do_page_fault+0x240/0x4e0
         trace_do_page_fault+0x37/0xe0
         do_async_page_fault+0x19/0x70
         async_page_fault+0x28/0x30
        ...
        oom_reaper: reaped process 492 (allocate), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
        ...
        allocate invoked oom-killer: gfp_mask=0x0(), nodemask=(null),  order=0, oom_score_adj=0
        allocate cpuset=/ mems_allowed=0
        CPU: 1 PID: 492 Comm: allocate Not tainted 4.12.0-rc1-mm1+ #181
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
        Call Trace:
         oom_kill_process+0x219/0x3e0
         out_of_memory+0x11d/0x480
         pagefault_out_of_memory+0x68/0x80
         mm_fault_error+0x8f/0x190
         ? handle_mm_fault+0xf3/0x210
         __do_page_fault+0x4b2/0x4e0
         trace_do_page_fault+0x37/0xe0
         do_async_page_fault+0x19/0x70
         async_page_fault+0x28/0x30
        ...
        Out of memory: Kill process 233 (firewalld) score 10 or sacrifice child
        Killed process 233 (firewalld) total-vm:246076kB, anon-rss:20956kB, file-rss:0kB, shmem-rss:0kB
    
    There is a race window in which the OOM reaper finishes reclaiming the
    first victim's memory while nothing but mutex_trylock() prevents the
    first victim from calling out_of_memory() from pagefault_out_of_memory()
    after its memory allocation in the page fault path failed because it had
    been selected as an OOM victim.
    
    This is a side effect of commit 9a67f648 ("mm: consolidate
    GFP_NOFAIL checks in the allocator slowpath") because that commit
    silently changed the behavior from
    
        /* Avoid allocations with no watermarks from looping endlessly */
    
    to
    
        /*
         * Give up allocations without trying memory reserves if selected
         * as an OOM victim
         */
    
    in __alloc_pages_slowpath() by moving the location of the TIF_MEMDIE
    flag check.  I noticed this change but did not post a patch because I
    thought it was acceptable, apart from the noise from warn_alloc(),
    since !__GFP_NOFAIL allocations are allowed to fail.  But we
    overlooked that failing a memory allocation from the page fault path
    makes a difference because of the race window explained above.
    
    While it might be possible to add a check to pagefault_out_of_memory()
    that prevents the first victim from calling out_of_memory(), or to remove
    out_of_memory() from pagefault_out_of_memory(), changing
    pagefault_out_of_memory() does not suppress the noise from warn_alloc()
    when the allocating thread was selected as an OOM victim.  There is little
    point in printing similar backtraces and memory information from both
    out_of_memory() and warn_alloc().
    
    Instead, if we guarantee that the current thread can try an allocation
    with no watermarks once when it is selected as an OOM victim while looping
    inside __alloc_pages_slowpath(), we can follow the "who can use memory
    reserves" rules, suppress the noise from warn_alloc(), and prevent memory
    allocations from the page fault path from calling
    pagefault_out_of_memory().
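    
    The guarantee described above can be illustrated with a small userspace
    model.  This is only a sketch under stated assumptions, not kernel code:
    struct model, enum policy, model_slowpath() and check_model() are
    hypothetical names, the "OOM killer" step always picks the caller, and
    the normal free lists are assumed to be permanently empty.  It compares
    bailing out as soon as the thread is a victim with bailing out only after
    one attempt without watermarks has been made.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical policies for the bail-out check in the modelled loop. */
enum policy {
    BAIL_WHEN_VICTIM,       /* give up as soon as the thread is a victim */
    BAIL_AFTER_RESERVE_TRY  /* give up only after one no-watermark attempt */
};

struct model {
    bool victim;            /* stands in for TIF_MEMDIE */
    int reserve_pages;      /* pages reachable only without watermarks */
    int reserve_attempts;   /* number of no-watermark attempts made */
};

/* Returns true if the modelled allocation obtained a page. */
bool model_slowpath(struct model *m, enum policy p)
{
    for (;;) {
        /* In this model, only an OOM victim may ignore the watermarks. */
        bool no_watermarks = m->victim;

        if (no_watermarks) {
            m->reserve_attempts++;
            if (m->reserve_pages > 0) {
                m->reserve_pages--;
                return true;
            }
        }

        /* Assumption: reclaim fails and the OOM killer picks the caller. */
        m->victim = true;

        if (p == BAIL_WHEN_VICTIM)
            return false;
        if (p == BAIL_AFTER_RESERVE_TRY && no_watermarks)
            return false;
        /* Otherwise loop once more, now allowed to use the reserves. */
    }
}

/* Self-check of the three cases discussed in the text. */
void check_model(void)
{
    struct model a = { false, 1, 0 };
    struct model b = { false, 1, 0 };
    struct model c = { false, 0, 0 };

    /* Bailing immediately fails without ever touching the reserves. */
    assert(!model_slowpath(&a, BAIL_WHEN_VICTIM) && a.reserve_attempts == 0);
    /* One guaranteed attempt succeeds when a reserve page exists. */
    assert(model_slowpath(&b, BAIL_AFTER_RESERVE_TRY) && b.reserve_attempts == 1);
    /* With empty reserves the loop still ends after exactly one attempt. */
    assert(!model_slowpath(&c, BAIL_AFTER_RESERVE_TRY) && c.reserve_attempts == 1);
}
```

    In the model, the first policy returns failure to the caller while a
    reserve page is still available, which corresponds to the page fault
    path falling through to pagefault_out_of_memory(); the second policy
    either gets the page or fails after a single bounded retry.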
    
    If we take the comment literally, this patch would do
    
      -    if (test_thread_flag(TIF_MEMDIE))
      -        goto nopage;
      +    if (alloc_flags == ALLOC_NO_WATERMARKS || (gfp_mask & __GFP_NOMEMALLOC))
      +        goto nopage;
    
    because gfp_pfmemalloc_allowed() returns false if __GFP_NOMEMALLOC is
    given.  But if I recall correctly (I couldn't find the message), the
    condition is meant to apply only to OOM victims despite the comment.
    Therefore, this patch preserves the TIF_MEMDIE check.
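    
    As a hedged illustration of the difference between the literal reading
    and the victim-only reading, the sketch below evaluates both conditions
    in userspace.  It assumes that "preserving the TIF_MEMDIE check" means
    AND-ing it with the literal condition; the struct, the function names and
    the MODEL_* constants are hypothetical, and the constant values are not
    the kernel's.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these values are NOT the kernel's. */
#define MODEL_ALLOC_NO_WATERMARKS 0x04u
#define MODEL_ALLOC_OTHER         0x40u
#define MODEL_GFP_NOMEMALLOC      0x01u

struct model_ctx {
    bool tif_memdie;        /* models test_thread_flag(TIF_MEMDIE) */
    unsigned alloc_flags;   /* models alloc_flags in the slowpath */
    unsigned gfp_mask;      /* models gfp_mask */
};

/* Condition before this patch: any OOM victim gives up. */
bool bail_old(const struct model_ctx *c)
{
    return c->tif_memdie;
}

/* Literal reading of the comment: applies to every caller. */
bool bail_literal(const struct model_ctx *c)
{
    return c->alloc_flags == MODEL_ALLOC_NO_WATERMARKS ||
           (c->gfp_mask & MODEL_GFP_NOMEMALLOC) != 0;
}

/* Assumed victim-only reading: literal condition AND TIF_MEMDIE. */
bool bail_victim_only(const struct model_ctx *c)
{
    return c->tif_memdie && bail_literal(c);
}

void check_conditions(void)
{
    /* Victim that has not yet tried without watermarks. */
    struct model_ctx fresh_victim = { true, MODEL_ALLOC_OTHER, 0 };
    /* Victim that already tried without watermarks. */
    struct model_ctx tried_victim = { true, MODEL_ALLOC_NO_WATERMARKS, 0 };
    /* Victim that is not allowed to use reserves at all. */
    struct model_ctx nomemalloc_victim =
        { true, MODEL_ALLOC_OTHER, MODEL_GFP_NOMEMALLOC };
    /* Non-victim that is already allocating without watermarks. */
    struct model_ctx reserve_user = { false, MODEL_ALLOC_NO_WATERMARKS, 0 };

    assert(bail_old(&fresh_victim));          /* old: fails at once */
    assert(!bail_victim_only(&fresh_victim)); /* new: gets one more try */
    assert(bail_victim_only(&tried_victim));
    assert(bail_victim_only(&nomemalloc_victim));
    assert(bail_literal(&reserve_user));      /* literal reading gives up */
    assert(!bail_victim_only(&reserve_user)); /* victim-only does not */
}
```

    The last two assertions show the case where the two readings disagree:
    a caller that is not an OOM victim but is already allocating without
    watermarks would give up under the literal reading only.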
    
    Fixes: 9a67f648 ("mm: consolidate GFP_NOFAIL checks in the allocator slowpath")
    Link: http://lkml.kernel.org/r/201705192112.IAF69238.OQOHSJLFOFFMtV@I-love.SAKURA.ne.jp
    Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
    Reported-by: Roman Gushchin <guro@fb.com>
    Tested-by: Roman Gushchin <guro@fb.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>