    mm/memcg: automatically penalize tasks with high swap use · 4b82ab4f
    Jakub Kicinski authored
    Add a memory.swap.high knob, which can be used to protect the system
    from SWAP exhaustion.  The mechanism used for penalizing is similar to
    memory.high penalty (sleep on return to user space).
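    To make "sleep on return to user space" concrete, here is a simplified
    user-space model of the penalty arithmetic. It is based on our reading
    of how memory.high throttling works and is not the kernel code. All
    names are hypothetical, and the constants (HZ of 1000, a nominal batch
    of 32 pages, a 2*HZ cap) are assumptions. Overage is a fixed-point
    fraction of the limit. The delay grows with its square. Memory and swap
    delays are computed separately and summed.

```c
#include <stdint.h>

/* Illustrative constants (assumptions, not guaranteed kernel values). */
#define HZ 1000u                         /* hypothetical tick rate */
#define DELAY_PRECISION_SHIFT 20         /* fixed-point precision of overage */
#define DELAY_SCALING_SHIFT 14           /* softens the quadratic curve */
#define CHARGE_BATCH 32u                 /* nominal pages per charge batch */
#define MAX_HIGH_DELAY_JIFFIES (2u * HZ) /* cap on a single sleep */

/* Overage as a fixed-point fraction: (usage - high) / high, scaled by
 * 2^DELAY_PRECISION_SHIFT. Zero when usage is at or below the limit.
 * Inputs are assumed small enough that the shift does not overflow. */
static uint64_t calc_overage(uint64_t usage, uint64_t high)
{
    if (usage <= high)
        return 0;
    if (high == 0)
        high = 1; /* avoid dividing by zero: treat as a 1-page limit */
    return ((usage - high) << DELAY_PRECISION_SHIFT) / high;
}

/* Delay in jiffies: quadratic in the overage, so small overages are
 * lenient and large ones are punished hard. The result is then scaled by
 * how many pages this task charged relative to a nominal batch. */
static uint64_t calc_high_delay(uint64_t max_overage, unsigned int nr_pages)
{
    uint64_t penalty;

    if (max_overage == 0)
        return 0;
    penalty = max_overage * max_overage * HZ;
    penalty >>= DELAY_PRECISION_SHIFT;
    penalty >>= DELAY_SCALING_SHIFT;
    return penalty * nr_pages / CHARGE_BATCH;
}

/* Memory and swap delays are computed independently and summed, then
 * clamped so that one return to user space never sleeps too long. */
static uint64_t total_delay(uint64_t mem_overage, uint64_t swap_overage,
                            unsigned int nr_pages)
{
    uint64_t d = calc_high_delay(mem_overage, nr_pages) +
                 calc_high_delay(swap_overage, nr_pages);

    return d > MAX_HIGH_DELAY_JIFFIES ? MAX_HIGH_DELAY_JIFFIES : d;
}
```

    With these assumed constants, a swap usage 1% over the limit costs about
    6 jiffies for a nominal batch. At 50% over, the delay jumps straight to
    the 2*HZ cap. The quadratic curve is what makes the slowdown gradual at
    first and then increasingly harsh.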
    
    That is not to say that the knob itself is equivalent to memory.high.
    The objective is rather to protect the system from potentially buggy
    tasks consuming a lot of swap and impacting other tasks, or even
    bringing the whole system to a standstill through complete SWAP
    exhaustion, hopefully without the need to find per-task hard limits.
    
    Slowing misbehaving tasks down gradually allows user space oom killers
    or other protection mechanisms to react.  oomd and earlyoom already do
    killing based on swap exhaustion, and memory.swap.high protection will
    help implement such userspace oom policies more reliably.
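    The following is not code from oomd or earlyoom. It is a minimal sketch
    of how a user-space monitor might read the cgroup v2 files
    memory.swap.current and memory.swap.high, which hold a byte count or the
    literal "max". The cgroup directory passed in (for example
    "/sys/fs/cgroup/workload") is an assumption. swap_above_high() is a
    hypothetical policy hook, not part of any existing tool.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse one cgroup v2 value: a decimal byte count or "max" (no limit),
 * optionally followed by a newline. "max" maps to UINT64_MAX.
 * Returns 0 on success, -1 on malformed input. */
static int parse_cgroup_value(const char *s, uint64_t *out)
{
    char *end;
    unsigned long long v;

    if (strncmp(s, "max", 3) == 0 && (s[3] == '\0' || s[3] == '\n')) {
        *out = UINT64_MAX;
        return 0;
    }
    if (!isdigit((unsigned char)s[0]))
        return -1; /* reject empty input, signs and leading spaces */
    errno = 0;
    v = strtoull(s, &end, 10);
    if (errno != 0 || (*end != '\0' && *end != '\n'))
        return -1;
    *out = (uint64_t)v;
    return 0;
}

/* Read and parse <cgroup_dir>/<file>, e.g. memory.swap.current. */
static int read_cgroup_value(const char *cgroup_dir, const char *file,
                             uint64_t *out)
{
    char path[4096], buf[64];
    int n, ret = -1;
    FILE *f;

    n = snprintf(path, sizeof(path), "%s/%s", cgroup_dir, file);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (fgets(buf, sizeof(buf), f))
        ret = parse_cgroup_value(buf, out);
    fclose(f);
    return ret;
}

/* Hypothetical policy hook: a cgroup above memory.swap.high is already
 * being slowed down by the kernel, which gives the monitor time to act. */
static int swap_above_high(uint64_t swap_current, uint64_t swap_high)
{
    return swap_high != UINT64_MAX && swap_current > swap_high;
}
```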
    
    We can use one counter for the number of pages allocated under
    pressure to save struct task space and avoid two separate hierarchy
    walks on the hot path.  The exact overage is calculated on return to
    user space anyway.
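    The single-counter design can be sketched as follows. This is a
    simplified model under stated assumptions: struct cg and struct task
    are hypothetical stand-ins for the kernel structures, usage is assumed
    to be charged already, and limits are in pages with UINT64_MAX meaning
    "no limit". The hot path does one walk that checks both memory and swap
    and stops at the first breach. Only the slow path on return to user
    space works out the separate memory and swap overages.

```c
#include <stddef.h>
#include <stdint.h>

#define PRECISION_SHIFT 20 /* fixed-point precision, an assumption */

/* Hypothetical, heavily simplified cgroup node. */
struct cg {
    struct cg *parent; /* NULL at the top of the hierarchy */
    uint64_t mem_usage, mem_high;
    uint64_t swap_usage, swap_high;
};

/* Hypothetical per-task state: ONE counter covers both memory and swap. */
struct task {
    struct cg *cg;
    unsigned int nr_pages_over_high;
};

/* Fixed-point (usage - high) / high; zero when within the limit. */
static uint64_t overage(uint64_t usage, uint64_t high)
{
    if (usage <= high)
        return 0;
    if (high == 0)
        high = 1;
    return ((usage - high) << PRECISION_SHIFT) / high;
}

/* Hot path: a single walk up the hierarchy checks both limits and bumps
 * the shared counter at the first breach it finds. */
static void note_charge(struct task *t, unsigned int nr_pages)
{
    for (const struct cg *c = t->cg; c; c = c->parent) {
        if (c->mem_usage > c->mem_high || c->swap_usage > c->swap_high) {
            t->nr_pages_over_high += nr_pages;
            break;
        }
    }
}

/* Slow path: the worst memory (swap == 0) or swap (swap != 0) overage
 * anywhere in the ancestry. */
static uint64_t max_overage(const struct cg *c, int swap)
{
    uint64_t worst = 0;

    for (; c; c = c->parent) {
        uint64_t o = swap ? overage(c->swap_usage, c->swap_high)
                          : overage(c->mem_usage, c->mem_high);
        if (o > worst)
            worst = o;
    }
    return worst;
}

/* Return to user space: consume the counter and report both overages;
 * the caller turns them into a sleep. */
static unsigned int handle_over_high(struct task *t, uint64_t *mem_ov,
                                     uint64_t *swap_ov)
{
    unsigned int nr_pages = t->nr_pages_over_high;

    t->nr_pages_over_high = 0;
    *mem_ov = max_overage(t->cg, 0);
    *swap_ov = max_overage(t->cg, 1);
    return nr_pages;
}
```

    The trade-off is that the counter does not record which limit was
    breached. That is acceptable because the slow path recomputes the exact
    overage for each limit anyway.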
    
    Take the new high limit into account when determining if swap is "full".
    Borrowing the explanation from Johannes:
    
      The idea behind "swap full" is that as long as the workload has plenty
      of swap space available and it's not changing its memory contents, it
      makes sense to generously hold on to copies of data in the swap device,
      even after the swapin.  A later reclaim cycle can drop the page without
      any IO.  Trading disk space for IO.
    
      But the only two ways to reclaim a swap slot are when it's faulted
      in and the references go away, or by scanning the virtual address space
      like swapoff does - which is very expensive (one could argue it's too
      expensive even for swapoff; it's often more practical to just reboot).
    
      So at some point in the fill level, we have to start freeing up swap
      slots on fault/swapin.  Otherwise we could eventually run out of swap
      slots while they're filled with copies of data that is also in RAM.
    
      We don't want to OOM a workload because its available swap space is
      filled with redundant cache.
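    One way to read "take the new high limit into account" is sketched
    below. This is a minimal model, not the kernel code. The global check
    uses the familiar rule that more than half of all swap is in use. On
    top of that, we assume a cgroup's swap counts as full once its usage
    reaches half of memory.swap.high or memory.swap.max at any ancestor
    level. The half-full threshold, struct swap_cg and both function names
    are assumptions for illustration. When the check returns true, swap
    slots would be freed on swapin instead of being kept as redundant cache.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cgroup node; values in pages, UINT64_MAX = no limit. */
struct swap_cg {
    struct swap_cg *parent; /* NULL at the top of the hierarchy */
    uint64_t swap_usage;
    uint64_t swap_high;
    uint64_t swap_max;
};

/* Global heuristic: more than half of all swap slots are in use. */
static bool global_swap_full(uint64_t free_swap, uint64_t total_swap)
{
    return free_swap * 2 < total_swap;
}

/* Assumed per-cgroup extension: full if the global check fires, or if any
 * ancestor has used half of its high or max swap limit. */
static bool cg_swap_full(const struct swap_cg *cg, uint64_t free_swap,
                         uint64_t total_swap)
{
    if (global_swap_full(free_swap, total_swap))
        return true;
    for (; cg; cg = cg->parent) {
        if (cg->swap_usage * 2 >= cg->swap_high ||
            cg->swap_usage * 2 >= cg->swap_max)
            return true;
    }
    return false;
}
```

    Checking against memory.swap.high as well as memory.swap.max matters
    because a workload throttled at swap.high may never get near swap.max.
    Without the extra check, it could sit on swap slots full of data that
    is also in RAM while being penalized.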

    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Chris Down <chris@chrisdown.name>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Hugh Dickins <hughd@google.com>
    Link: http://lkml.kernel.org/r/20200527195846.102707-5-kuba@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>