    sched/psi: Optimize task switch inside shared cgroups again · 65176f59
    Chengming Zhou authored
    Way back when PSI_MEM_FULL was accounted from the timer tick, task
    switching could simply iterate next and prev to the common ancestor to
    update TSK_ONCPU and be done.
    
    Then memstall ticks were replaced with checking curr->in_memstall
    directly in psi_group_change(). That meant that now if the task switch
    was between a memstall and a !memstall task, we had to iterate through
    the common ancestors at least ONCE to fix up their state_masks.
    
    We added the identical_state filter to make sure the common ancestor
    elimination was skipped in that case. It seems that was always a
    little too eager, because it caused us to walk the common ancestors
    *twice* instead of the required once: the iteration for next could
    have stopped at the common ancestor; prev could have updated TSK_ONCPU
    up to the common ancestor, then finished the walk to the root without
    changing any flags, just to get the new curr->in_memstall into the
    state_masks.
    
    This patch recognizes this and makes it so that we walk to the root
    exactly once if state_mask needs updating, which is simply catching up
    on a missed optimization that could have been done in commit 7fae6c81
    ("psi: Use ONCPU state tracking machinery to detect reclaim") directly.
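The difference between the two walks can be sketched with a small user-space model. This is not the kernel code: `struct group`, `group_change()`, `task_on_cpu()`, `switch_old()` and `switch_new()` are hypothetical names, `nr_oncpu` stands in for tasks[NR_ONCPU], and `visits` counts how often a stand-in for psi_group_change() touches a group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a psi group on one CPU. */
struct group {
	struct group *parent;	/* NULL for the root */
	int nr_oncpu;		/* models tasks[NR_ONCPU] */
	int visits;		/* number of group_change() calls */
};

/* Stand-in for psi_group_change(): adjust the counter, count the visit. */
void group_change(struct group *g, int oncpu_delta)
{
	g->nr_oncpu += oncpu_delta;
	assert(g->nr_oncpu >= 0);
	g->visits++;
}

/* Setup helper: mark a task in @leaf as running (no visits counted). */
void task_on_cpu(struct group *leaf)
{
	for (struct group *g = leaf; g; g = g->parent)
		g->nr_oncpu++;
}

/* Old scheme: stop at the common ancestor only if the states are identical. */
void switch_old(struct group *prev, struct group *next, bool states_differ)
{
	struct group *g, *common = NULL;

	for (g = next; g; g = g->parent) {
		if (!states_differ && g->nr_oncpu) {
			common = g;
			break;
		}
		group_change(g, +1);	/* set TSK_ONCPU */
	}
	for (g = prev; g && g != common; g = g->parent)
		group_change(g, -1);	/* clear TSK_ONCPU */
}

/* New scheme: next always stops at the common ancestor. */
void switch_new(struct group *prev, struct group *next, bool states_differ)
{
	struct group *g, *common = NULL;

	for (g = next; g; g = g->parent) {
		if (g->nr_oncpu) {
			common = g;
			break;
		}
		group_change(g, +1);	/* set TSK_ONCPU */
	}
	for (g = prev; g && g != common; g = g->parent)
		group_change(g, -1);	/* clear TSK_ONCPU */

	/* Finish to the root once, leaving TSK_ONCPU alone. */
	if (states_differ)
		for (; g; g = g->parent)
			group_change(g, 0);
}
```

With prev and next in sibling groups and differing states, this model has switch_old() visit each common ancestor twice and switch_new() visit each once, while the final nr_oncpu values are the same in both cases.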
    
    Apart from this, it's also necessary for the next patch "sched/psi: remove
    NR_ONCPU task accounting". Suppose we walk the common ancestors twice:
    
    (1) psi_group_change(.clear = 0, .set = TSK_ONCPU)
    (2) psi_group_change(.clear = TSK_ONCPU, .set = 0)
    
    Previously, TSK_ONCPU was recorded in the tasks[NR_ONCPU] counter: (1) did
    tasks[NR_ONCPU]++ and (2) did tasks[NR_ONCPU]--, so the counter still
    ended up correct.
    
    The next patch changes this to use one bit in the state mask to record
    TSK_ONCPU. The PSI_ONCPU bit would be set in (1) but then cleared in (2),
    leaving a psi_group_cpu that has a task running on the CPU but no
    PSI_ONCPU bit set!
    
    With this patch, we never walk the common ancestors twice, so the above
    problem cannot occur.
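The counter-versus-bit problem with steps (1) and (2) can be reproduced in a toy model. The types and helpers below (`counter_state`, `bit_state`, `ONCPU_BIT`, and the `*_change()` functions) are hypothetical illustrations, not the kernel's data structures. Both start with prev already running in the common ancestor.

```c
#include <assert.h>
#include <stdbool.h>

#define ONCPU_BIT 1u	/* hypothetical stand-in for the PSI_ONCPU bit */

/* Counter-based tracking, in the spirit of tasks[NR_ONCPU]. */
struct counter_state {
	unsigned int nr_oncpu;
};

/* Bit-based tracking, in the spirit of one bit in state_mask. */
struct bit_state {
	unsigned int state_mask;
};

void counter_change(struct counter_state *s, bool clear, bool set)
{
	if (clear) {
		assert(s->nr_oncpu > 0);
		s->nr_oncpu--;
	}
	if (set)
		s->nr_oncpu++;
}

void bit_change(struct bit_state *s, bool clear, bool set)
{
	if (clear)
		s->state_mask &= ~ONCPU_BIT;
	if (set)
		s->state_mask |= ONCPU_BIT;
}

bool counter_oncpu(const struct counter_state *s)
{
	return s->nr_oncpu > 0;
}

bool bit_oncpu(const struct bit_state *s)
{
	return (s->state_mask & ONCPU_BIT) != 0;
}

/* Walk a common ancestor twice: (1) set for next, (2) clear for prev. */
void double_walk_counter(struct counter_state *s)
{
	counter_change(s, false, true);	/* (1) .set = TSK_ONCPU */
	counter_change(s, true, false);	/* (2) .clear = TSK_ONCPU */
}

void double_walk_bit(struct bit_state *s)
{
	bit_change(s, false, true);	/* (1) bit was already set */
	bit_change(s, true, false);	/* (2) bit is lost */
}
```

After the double walk, the counter still reports a task on the CPU (1 + 1 - 1 = 1), whereas the bit has been cleared even though next is running. A set bit cannot remember that two tasks contributed to it, which is why the single walk is a prerequisite for the bit-based scheme.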
    Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Link: https://lore.kernel.org/r/20220825164111.29534-6-zhouchengming@bytedance.com