    mm: memcontrol: don't throttle dying tasks on memory.high · 63fd3270
    Johannes Weiner authored
    While investigating hosts with high cgroup memory pressures, Tejun
    found culprit zombie tasks that were holding on to a lot of
    memory, had SIGKILL pending, but were stuck in memory.high reclaim.
    
    In the past, we used to always force-charge allocations from tasks
    that were exiting in order to accelerate them dying and freeing up
    their rss. This changed for memory.max in a4ebf1b6 ("memcg:
    prohibit unconditional exceeding the limit of dying tasks"); it noted
    that this can cause (userspace-inducible) containment failures, so it
    added a mandatory reclaim and OOM kill cycle before forcing charges.
    At the time, memory.high enforcement was handled in the userspace
    return path, which isn't reached by dying tasks, and so memory.high
    was still never enforced on dying tasks.
    
    When c9afe31e ("memcg: synchronously enforce memory.high for large
    overcharges") added synchronous reclaim for memory.high, it added
    unconditional memory.high enforcement for dying tasks as well. The
    callstack shows that this is the path the zombie is stuck in.
    
    We need to accelerate dying tasks getting past memory.high, but we
    cannot do it quite the same way as we do for memory.max: memory.max is
    enforced strictly, and tasks aren't allowed to move past it without
    FIRST reclaiming and OOM killing if necessary. This ensures very small
    levels of excess. With memory.high, though, enforcement happens lazily
    after the charge, and OOM killing is never triggered. A lot of
    concurrent threads could have pushed, or could actively be pushing,
    the cgroup into excess. The dying task will enter reclaim on every
    allocation attempt, with little hope of restoring balance.
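
    The effect described above can be illustrated with a toy userspace
    model. This is only a sketch under stated assumptions, not kernel
    code: struct toy_cgroup, toy_reclaim_entries() and all of their
    parameters are hypothetical names made up for illustration. It
    assumes lazy enforcement (the charge always succeeds, reclaim runs
    afterwards) and a fixed amount of charging by other threads per
    allocation.

```c
/*
 * Toy userspace model, NOT kernel code. All names and numbers here are
 * hypothetical and exist only to illustrate the reasoning above.
 */
struct toy_cgroup {
	long usage;	/* pages currently charged to the cgroup */
	long high;	/* memory.high threshold, in pages */
};

/*
 * Count how many of a task's allocations end up entering reclaim when
 * enforcement is lazy: the charge is applied first, and reclaim only
 * runs afterwards if usage is above the threshold. Other threads are
 * modeled as charging concurrent_charge pages per allocation.
 */
long toy_reclaim_entries(struct toy_cgroup *cg, int allocs,
			 long pages_per_alloc, long concurrent_charge,
			 long reclaim_per_pass)
{
	long entries = 0;

	for (int i = 0; i < allocs; i++) {
		cg->usage += pages_per_alloc;	/* the charge always succeeds */
		cg->usage += concurrent_charge;	/* other threads keep charging */

		if (cg->usage > cg->high) {
			entries++;		/* task goes into reclaim */
			cg->usage -= reclaim_per_pass;
			if (cg->usage < 0)
				cg->usage = 0;
		}
	}
	return entries;
}
```

    In this model, a cgroup that starts far above its threshold and
    keeps receiving more concurrent charges than one reclaim pass
    recovers sends the task into reclaim on every single allocation,
    while a cgroup well below the threshold never does.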
    
    To fix this, skip synchronous memory.high enforcement on dying tasks
    altogether again. Update memory.high path documentation while at it.
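
    The shape of the fix can be sketched in userspace as follows. This
    is a simplified model and not the actual patch: struct toy_task,
    toy_task_is_dying(), toy_handle_over_high() and
    TOY_RECLAIM_PER_PASS are hypothetical stand-ins. The dying check
    sits at the top of the reclaim loop so that it also covers a task
    that gets killed while reclaim is already in progress.

```c
#include <stdbool.h>

/* Assumed number of pages one reclaim pass recovers (made up). */
#define TOY_RECLAIM_PER_PASS 32

/* Hypothetical stand-in for the task state relevant to this sketch. */
struct toy_task {
	bool fatal_signal_pending;	/* e.g. SIGKILL has been queued */
	bool exiting;			/* task is already on its exit path */
	long nr_pages_over_high;	/* excess attributed to this task */
};

static bool toy_task_is_dying(const struct toy_task *t)
{
	return t->fatal_signal_pending || t->exiting;
}

/*
 * Model of memory.high enforcement with the dying-task bailout.
 * killed_at_pass simulates a fatal signal arriving right after that
 * reclaim pass; pass -1 for "never killed". Returns the number of
 * reclaim passes the task actually performed.
 */
int toy_handle_over_high(struct toy_task *t, int max_passes,
			 int killed_at_pass)
{
	int passes = 0;

	while (passes < max_passes && t->nr_pages_over_high > 0) {
		/* Checked on every iteration, not just on entry. */
		if (toy_task_is_dying(t))
			break;

		passes++;
		t->nr_pages_over_high -= TOY_RECLAIM_PER_PASS;
		if (t->nr_pages_over_high < 0)
			t->nr_pages_over_high = 0;

		if (passes == killed_at_pass)
			t->fatal_signal_pending = true;
	}
	return passes;
}
```

    With this model, a task that is already dying performs no reclaim
    passes and leaves its excess untouched, and a task killed after its
    second pass stops there instead of continuing to reclaim.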
    
    [hannes@cmpxchg.org: also handle tasks that are being killed during reclaim]
      Link: https://lkml.kernel.org/r/20240111192807.GA424308@cmpxchg.org
    Link: https://lkml.kernel.org/r/20240111132902.389862-1-hannes@cmpxchg.org
    Fixes: c9afe31e ("memcg: synchronously enforce memory.high for large overcharges")
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Reported-by: Tejun Heo <tj@kernel.org>
    Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
    Acked-by: Shakeel Butt <shakeelb@google.com>
    Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
    Cc: Dan Schatzberg <schatzberg.dan@gmail.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Muchun Song <muchun.song@linux.dev>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
memcontrol.c 215 KB