• Roman Gushchin's avatar
    mm: don't raise MEMCG_OOM event due to failed high-order allocation · 7a1adfdd
    Roman Gushchin authored
    It was reported that on some of our machines containers were restarted
    with OOM symptoms without an obvious reason.  Despite there were almost no
    memory pressure and plenty of page cache, MEMCG_OOM event was raised
    occasionally, causing the container management software to think, that OOM
    has happened.  However, no tasks have been killed.
    
    The following investigation showed that the problem is caused by a failing
    attempt to charge a high-order page.  In such case, the OOM killer is
    never invoked.  As shown below, it can happen under conditions, which are
    very far from a real OOM: e.g.  there is plenty of clean page cache and no
    memory pressure.
    
    There is no sense in raising an OOM event in this case, as it might
    confuse a user and lead to wrong and excessive actions (e.g.  restart the
    workload, as in my case).
    
    Let's look at the charging path in try_charge().  If the memory usage is
    about memory.max, which is absolutely natural for most memory cgroups, we
    try to reclaim some pages.  Even if we were able to reclaim enough memory
    for the allocation, the following check can fail due to a race with
    another concurrent allocation:
    
        if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
            goto retry;
    
    For regular pages the following condition will save us from triggering
    the OOM:
    
       if (nr_reclaimed && nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER))
           goto retry;
    
    But for high-order allocation this condition will intentionally fail.  The
    reason behind is that we'll likely fall to regular pages anyway, so it's
    ok and even preferred to return ENOMEM.
    
    In this case the idea of raising MEMCG_OOM looks dubious.
    
    Fix this by moving MEMCG_OOM raising to mem_cgroup_oom() after allocation
    order check, so that the event won't be raised for high order allocations.
    This change doesn't affect regular pages allocation and charging.
    
    Link: http://lkml.kernel.org/r/20181004214050.7417-1-guro@fb.comSigned-off-by: default avatarRoman Gushchin <guro@fb.com>
    Acked-by: default avatarDavid Rientjes <rientjes@google.com>
    Acked-by: default avatarMichal Hocko <mhocko@kernel.org>
    Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
    Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    7a1adfdd
cgroup-v2.rst 78.8 KB