    mm: multi-gen LRU: per-node lru_gen_folio lists · e4dde56c
    Yu Zhao authored
    For each node, memcgs are divided into two generations: the old and
    the young. For each generation, memcgs are randomly sharded into
    multiple bins to improve scalability. For each bin, an RCU hlist_nulls
    is virtually divided into three segments: the head, the tail and the
    default.
    
    An onlining memcg is added to the tail of a random bin in the old
    generation. The eviction starts at the head of a random bin in the old
    generation. The per-node memcg generation counter, whose remainder (mod
    2) indexes the old generation, is incremented when all its bins become
    empty.
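
As a rough illustration of the layout and the counter rule described above, here is a minimal user-space sketch in C. It is not the kernel code: the names memcg_lru_model, model_online and model_leave_old are hypothetical, the bin count of 8 is an assumption, bins are modeled as plain counters rather than RCU hlist_nulls lists, and the bin is passed in by the caller instead of being picked at random so that the behavior is deterministic.

```c
#include <assert.h>
#include <stddef.h>

#define NR_GENS 2   /* the old and the young generation */
#define NR_BINS 8   /* assumed bin count, for illustration only */

/* Hypothetical per-node model; each bin is reduced to a population count. */
struct memcg_lru_model {
    unsigned long seq;                          /* generation counter */
    unsigned long nr_memcgs[NR_GENS];           /* memcgs per generation */
    unsigned long nr_in_bin[NR_GENS][NR_BINS];  /* memcgs per bin */
};

/* The remainder of seq (mod 2) indexes the old generation. */
static int old_gen(const struct memcg_lru_model *lru)
{
    return (int)(lru->seq % NR_GENS);
}

/* The other index is the young generation. */
static int young_gen(const struct memcg_lru_model *lru)
{
    return (int)((lru->seq + 1) % NR_GENS);
}

/* Onlining: the memcg joins a bin of the old generation. */
static void model_online(struct memcg_lru_model *lru, int bin)
{
    int gen = old_gen(lru);

    lru->nr_in_bin[gen][bin]++;
    lru->nr_memcgs[gen]++;
}

/*
 * A memcg leaves the old generation. Once every bin of the old
 * generation is empty, the counter advances, so the former young
 * generation becomes the old one.
 */
static void model_leave_old(struct memcg_lru_model *lru, int bin)
{
    int gen = old_gen(lru);

    assert(lru->nr_in_bin[gen][bin] > 0);
    lru->nr_in_bin[gen][bin]--;
    if (--lru->nr_memcgs[gen] == 0)
        lru->seq++;
}
```

With two generations, advancing the counter by one simply swaps the roles of the two sets of bins, so no memcg has to be moved when the old generation drains.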
    
    There are four operations:
    1. MEMCG_LRU_HEAD, which moves an memcg to the head of a random bin in
       its current generation (old or young) and updates its "seg" to
       "head";
    2. MEMCG_LRU_TAIL, which moves an memcg to the tail of a random bin in
       its current generation (old or young) and updates its "seg" to
       "tail";
    3. MEMCG_LRU_OLD, which moves an memcg to the head of a random bin in
       the old generation, updates its "gen" to "old" and resets its "seg"
       to "default";
    4. MEMCG_LRU_YOUNG, which moves an memcg to the tail of a random bin
       in the young generation, updates its "gen" to "young" and resets
       its "seg" to "default".
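
The four operations above can be sketched on a simplified model as follows. This is an illustration under stated assumptions, not the kernel implementation: each bin is an ordinary doubly linked list instead of an RCU hlist_nulls, there is no locking, the bin is chosen by the caller rather than at random, and updates to the generation counter are left out. The struct names, the seg enum and the helpers unlink_memcg, link_memcg and memcg_lru_apply are hypothetical; only the four MEMCG_LRU_* names come from the description.

```c
#include <assert.h>
#include <stddef.h>

#define NR_GENS 2
#define NR_BINS 8   /* assumed bin count, for illustration only */

enum op  { MEMCG_LRU_HEAD, MEMCG_LRU_TAIL, MEMCG_LRU_OLD, MEMCG_LRU_YOUNG };
enum seg { SEG_DEFAULT, SEG_HEAD, SEG_TAIL };   /* hypothetical encoding */

struct memcg {
    int gen;                    /* generation index the memcg is in */
    int bin;                    /* bin the memcg is in */
    enum seg seg;
    struct memcg *prev, *next;
};

struct bin { struct memcg *head, *tail; };

struct memcg_lru {
    unsigned long seq;          /* seq % 2 indexes the old generation */
    struct bin fifo[NR_GENS][NR_BINS];
};

/* Remove a memcg from the bin it is currently linked into. */
static void unlink_memcg(struct memcg_lru *lru, struct memcg *m)
{
    struct bin *b = &lru->fifo[m->gen][m->bin];

    if (m->prev) m->prev->next = m->next; else b->head = m->next;
    if (m->next) m->next->prev = m->prev; else b->tail = m->prev;
    m->prev = m->next = NULL;
}

/* Add a memcg at the head or the tail of the given bin. */
static void link_memcg(struct memcg_lru *lru, struct memcg *m,
                       int gen, int bin, int at_head)
{
    struct bin *b = &lru->fifo[gen][bin];

    m->gen = gen;
    m->bin = bin;
    m->prev = m->next = NULL;
    if (!b->head) {
        b->head = b->tail = m;
    } else if (at_head) {
        m->next = b->head;
        b->head->prev = m;
        b->head = m;
    } else {
        m->prev = b->tail;
        b->tail->next = m;
        b->tail = m;
    }
}

/* Apply one operation to a memcg that is already on a list. */
static void memcg_lru_apply(struct memcg_lru *lru, struct memcg *m,
                            enum op op, int bin)
{
    int old = (int)(lru->seq % NR_GENS);
    int young = (int)((lru->seq + 1) % NR_GENS);
    int gen = m->gen, at_head = 0;
    enum seg seg = SEG_DEFAULT;

    switch (op) {
    case MEMCG_LRU_HEAD:  at_head = 1; seg = SEG_HEAD; break; /* same gen */
    case MEMCG_LRU_TAIL:  seg = SEG_TAIL; break;              /* same gen */
    case MEMCG_LRU_OLD:   gen = old; at_head = 1; break;      /* seg reset */
    case MEMCG_LRU_YOUNG: gen = young; break;                 /* seg reset */
    }
    unlink_memcg(lru, m);
    link_memcg(lru, m, gen, bin, at_head);
    m->seg = seg;
}
```

Onlining can be modeled with link_memcg(lru, m, old, bin, 0), which places the memcg at the tail of a bin in the old generation.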
    
    The events that trigger the above operations are:
    1. Exceeding the soft limit, which triggers MEMCG_LRU_HEAD;
    2. The first attempt to reclaim an memcg below low, which triggers
       MEMCG_LRU_TAIL;
    3. The first attempt to reclaim an memcg below the reclaimable size
       threshold, which triggers MEMCG_LRU_TAIL;
    4. The second attempt to reclaim an memcg below the reclaimable size
       threshold, which triggers MEMCG_LRU_YOUNG;
    5. Attempting to reclaim an memcg below min, which triggers
       MEMCG_LRU_YOUNG;
    6. Finishing the aging on the eviction path, which triggers
       MEMCG_LRU_YOUNG;
    7. Offlining an memcg, which triggers MEMCG_LRU_OLD.
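
The event list above amounts to a small lookup from event to operation, which might be written as the table-like function below. The event names (EV_*) and the function op_for_event are hypothetical labels introduced only for this sketch; how the kernel detects each event, and how it tells a first attempt from a second one, is not shown.

```c
#include <assert.h>

enum op { MEMCG_LRU_HEAD, MEMCG_LRU_TAIL, MEMCG_LRU_OLD, MEMCG_LRU_YOUNG };

/* Hypothetical event labels, numbered as in the list above. */
enum event {
    EV_SOFT_LIMIT_EXCEEDED,   /* 1 */
    EV_BELOW_LOW_FIRST,       /* 2 */
    EV_BELOW_SIZE_FIRST,      /* 3 */
    EV_BELOW_SIZE_SECOND,     /* 4 */
    EV_BELOW_MIN,             /* 5 */
    EV_AGING_DONE,            /* 6 */
    EV_OFFLINE,               /* 7 */
};

/* Map each event to the operation it triggers. */
static enum op op_for_event(enum event ev)
{
    switch (ev) {
    case EV_SOFT_LIMIT_EXCEEDED:
        return MEMCG_LRU_HEAD;    /* evict sooner within its generation */
    case EV_BELOW_LOW_FIRST:
    case EV_BELOW_SIZE_FIRST:
        return MEMCG_LRU_TAIL;    /* evict later within its generation */
    case EV_BELOW_SIZE_SECOND:
    case EV_BELOW_MIN:
    case EV_AGING_DONE:
        return MEMCG_LRU_YOUNG;   /* defer to the young generation */
    case EV_OFFLINE:
        return MEMCG_LRU_OLD;     /* head of the old generation */
    }
    return MEMCG_LRU_YOUNG;       /* not reached for valid events */
}
```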
    
    Note that the memcg LRU only applies to global reclaim, and the
    round-robin incrementing of the memcgs' max_seq counters ensures
    eventual fairness to all eligible memcgs. Memcg reclaim still
    relies on mem_cgroup_iter().
    
    Link: https://lkml.kernel.org/r/20221222041905.2431096-7-yuzhao@google.com
    Signed-off-by: Yu Zhao <yuzhao@google.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Michael Larabel <Michael@MichaelLarabel.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Mike Rapoport <rppt@kernel.org>
    Cc: Roman Gushchin <roman.gushchin@linux.dev>
    Cc: Suren Baghdasaryan <surenb@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
memcontrol.h 45.5 KB