    list_lru: introduce list_lru_shrink_{count,walk} · 503c358c
    Vladimir Davydov authored
    Kmem accounting of memcg is unusable now, because it lacks slab shrinker
    support.  That means when we hit the limit we will get ENOMEM without any
    chance to recover.  What we should do then is to call shrink_slab, which
    would reclaim old inode/dentry caches from this cgroup.  This is what
    this patch set is intended to do.
    
    Basically, it does two things.  First, it introduces the notion of
    per-memcg slab shrinker.  A shrinker that wants to reclaim objects per
    cgroup should mark itself as SHRINKER_MEMCG_AWARE.  Then it will be
    passed the memory cgroup to scan from in shrink_control->memcg.  For
    such shrinkers shrink_slab iterates over the whole cgroup subtree under
    the target cgroup and calls the shrinker for each kmem-active memory
    cgroup.
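
    The subtree iteration described above can be sketched roughly as
    follows.  This is a userspace mock, not the kernel code: every type and
    function with a _mock suffix, demo_scan(), the tree layout, and the
    value given to SHRINKER_MEMCG_AWARE are assumptions made for
    illustration.  Calling a non-memcg-aware shrinker once with a NULL memcg
    is likewise an assumption, not something this patch set specifies.

```c
#include <stddef.h>

/* Flag name is from the text above; the numeric value here is arbitrary. */
#define SHRINKER_MEMCG_AWARE (1UL << 0)

/* Hypothetical stand-in for a memory cgroup in a hierarchy. */
struct mem_cgroup_mock {
    int kmem_active;                      /* is kmem accounting active? */
    unsigned long nr_objects;             /* reclaimable objects charged here */
    struct mem_cgroup_mock *first_child;
    struct mem_cgroup_mock *next_sibling;
};

/* Hypothetical stand-in for shrink_control, reduced to two fields. */
struct shrink_control_mock {
    unsigned long nr_to_scan;
    struct mem_cgroup_mock *memcg;        /* cgroup to scan, NULL if none */
};

struct shrinker_mock {
    unsigned long (*scan_objects)(struct shrinker_mock *s,
                                  struct shrink_control_mock *sc);
    unsigned long flags;
};

/* Pre-order walk: call the shrinker for each kmem-active cgroup. */
static unsigned long shrink_subtree(struct shrinker_mock *s,
                                    struct mem_cgroup_mock *root,
                                    unsigned long nr_to_scan)
{
    struct mem_cgroup_mock *child;
    unsigned long freed = 0;

    if (root->kmem_active) {
        struct shrink_control_mock sc = {
            .nr_to_scan = nr_to_scan,
            .memcg = root,
        };
        freed += s->scan_objects(s, &sc);
    }
    /* Inactive cgroups are skipped, but their children are still visited. */
    for (child = root->first_child; child; child = child->next_sibling)
        freed += shrink_subtree(s, child, nr_to_scan);
    return freed;
}

static unsigned long shrink_slab_mock(struct shrinker_mock *s,
                                      struct mem_cgroup_mock *target,
                                      unsigned long nr_to_scan)
{
    if (!(s->flags & SHRINKER_MEMCG_AWARE)) {
        /* Assumption: a shrinker that is not memcg-aware is called once. */
        struct shrink_control_mock sc = {
            .nr_to_scan = nr_to_scan,
            .memcg = NULL,
        };
        return s->scan_objects(s, &sc);
    }
    return shrink_subtree(s, target, nr_to_scan);
}

/* Demo shrinker: frees up to nr_to_scan objects from sc->memcg. */
static unsigned long demo_scan(struct shrinker_mock *s,
                               struct shrink_control_mock *sc)
{
    unsigned long n;

    (void)s;
    if (!sc->memcg)
        return 0;
    n = sc->memcg->nr_objects < sc->nr_to_scan ?
        sc->memcg->nr_objects : sc->nr_to_scan;
    sc->memcg->nr_objects -= n;
    return n;
}
```

    The point of the sketch is that the shrinker itself never walks the
    hierarchy; it only looks at the memcg it is handed in the control
    structure.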
    
    Secondly, this patch set makes the list_lru structure per-memcg.  This is
    done transparently to list_lru users: all they have to do is tell
    list_lru_init that they want a memcg-aware list_lru.  The list_lru will
    then automatically distribute objects among per-memcg lists based on
    which cgroup the object is accounted to.  This way, to make FS shrinkers
    (icache, dcache) memcg-aware we only need to make them use a memcg-aware
    list_lru, and this is what this patch set does.
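
    As a rough illustration of distributing objects among per-memcg lists,
    here is a small userspace mock.  It is a sketch under simplifying
    assumptions: the names list_lru_mock, lru_item, lru_init(), lru_add()
    and lru_count(), the fixed MAX_MEMCGS bound, and the integer memcg_id
    are all hypothetical, and the per-node dimension of the real list_lru
    is left out.

```c
#include <stddef.h>

#define MAX_MEMCGS 4   /* hypothetical fixed bound, for the demo only */

/* Hypothetical object on an LRU list. */
struct lru_item {
    struct lru_item *next;
    int memcg_id;      /* index of the cgroup the object is accounted to */
};

/* Hypothetical, much simplified list_lru: one list per cgroup. */
struct list_lru_mock {
    int memcg_aware;
    struct lru_item *lists[MAX_MEMCGS];
    unsigned long nr_items[MAX_MEMCGS];
};

static void lru_init(struct list_lru_mock *lru, int memcg_aware)
{
    int i;

    lru->memcg_aware = memcg_aware;
    for (i = 0; i < MAX_MEMCGS; i++) {
        lru->lists[i] = NULL;
        lru->nr_items[i] = 0;
    }
}

/* Pick the list for a cgroup; a non-aware lru keeps everything on list 0. */
static int lru_index(const struct list_lru_mock *lru, int memcg_id)
{
    if (!lru->memcg_aware)
        return 0;
    if (memcg_id < 0 || memcg_id >= MAX_MEMCGS)
        return -1;
    return memcg_id;
}

/* The caller does not choose a list; it is derived from the item itself. */
static int lru_add(struct list_lru_mock *lru, struct lru_item *item)
{
    int idx = lru_index(lru, item->memcg_id);

    if (idx < 0)
        return -1;
    item->next = lru->lists[idx];
    lru->lists[idx] = item;
    lru->nr_items[idx]++;
    return 0;
}

static unsigned long lru_count(const struct list_lru_mock *lru, int memcg_id)
{
    int idx = lru_index(lru, memcg_id);

    return idx < 0 ? 0 : lru->nr_items[idx];
}
```

    Users call lru_add() the same way in both modes; only the flag given to
    lru_init() differs, which mirrors the "transparent to users" property
    described above.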
    
    As before, this patch set only enables per-memcg kmem reclaim when the
    pressure comes from memory.limit, not from memory.kmem.limit.  Handling
    memory.kmem.limit is going to be tricky due to GFP_NOFS allocations, and
    it is still unclear whether we will have this knob in the unified
    hierarchy.
    
    This patch (of 9):
    
    NUMA-aware slab shrinkers use the list_lru structure to distribute
    objects coming from different NUMA nodes to different lists.  Whenever
    such a shrinker needs to count or scan objects from a particular node,
    it issues commands like this:
    
            count = list_lru_count_node(lru, sc->nid);
            freed = list_lru_walk_node(lru, sc->nid, isolate_func,
                                       isolate_arg, &sc->nr_to_scan);
    
    where sc is an instance of the shrink_control structure passed to it
    from vmscan.
    
    To simplify this, let's add special list_lru functions to be used by
    shrinkers, list_lru_shrink_count() and list_lru_shrink_walk(), which
    consolidate the nid and nr_to_scan arguments in the shrink_control
    structure.
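
    A minimal sketch of what such wrappers could look like, written as a
    userspace mock so that it compiles on its own.  The function names and
    argument order follow the calls quoted above; the struct layouts, the
    callback type, MAX_NODES, and the bodies of list_lru_count_node() and
    list_lru_walk_node() are simplified stand-ins, not the kernel
    definitions.

```c
#include <stddef.h>

#define MAX_NODES 2   /* hypothetical bound for the demo */

/* Simplified stand-in: only per-node item counts are tracked. */
struct list_lru {
    unsigned long nr_items[MAX_NODES];
};

/* Simplified stand-in carrying only the two fields used here. */
struct shrink_control {
    int nid;
    unsigned long nr_to_scan;
};

/* Simplified callback: return nonzero if the item was isolated. */
typedef int (*list_lru_walk_cb)(void *cb_arg);

static unsigned long list_lru_count_node(struct list_lru *lru, int nid)
{
    return lru->nr_items[nid];
}

/* Visit up to *nr_to_walk items on one node, consuming the budget. */
static unsigned long list_lru_walk_node(struct list_lru *lru, int nid,
                                        list_lru_walk_cb isolate,
                                        void *cb_arg,
                                        unsigned long *nr_to_walk)
{
    unsigned long remaining = lru->nr_items[nid];
    unsigned long isolated = 0;

    while (*nr_to_walk > 0 && remaining > 0) {
        --*nr_to_walk;
        --remaining;
        if (isolate(cb_arg)) {
            lru->nr_items[nid]--;
            isolated++;
        }
    }
    return isolated;
}

/* The wrappers: nid and nr_to_scan are taken from shrink_control. */
static unsigned long list_lru_shrink_count(struct list_lru *lru,
                                           struct shrink_control *sc)
{
    return list_lru_count_node(lru, sc->nid);
}

static unsigned long list_lru_shrink_walk(struct list_lru *lru,
                                          struct shrink_control *sc,
                                          list_lru_walk_cb isolate,
                                          void *cb_arg)
{
    return list_lru_walk_node(lru, sc->nid, isolate, cb_arg,
                              &sc->nr_to_scan);
}

/* Demo callback: isolates every item and counts how often it ran. */
static int isolate_all(void *cb_arg)
{
    ++*(unsigned long *)cb_arg;
    return 1;
}
```

    With these in place, a shrinker would only need to call
    list_lru_shrink_count(lru, sc) and list_lru_shrink_walk(lru, sc,
    isolate_func, isolate_arg), without touching sc->nid or
    sc->nr_to_scan directly.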
    
    This will also allow us to avoid patching shrinkers that use list_lru
    when we make shrink_slab() per-memcg - all we will have to do is extend
    the shrink_control structure to include the target memcg and make
    list_lru_shrink_{count,walk} handle this appropriately.

    Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
    Suggested-by: Dave Chinner <david@fromorbit.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Michal Hocko <mhocko@suse.cz>
    Cc: Greg Thelen <gthelen@google.com>
    Cc: Glauber Costa <glommer@gmail.com>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: Pekka Enberg <penberg@kernel.org>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Tejun Heo <tj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>