    cgroup/rstat: add cgroup_rstat_lock helpers and tracepoints · fc29e04a
    Jesper Dangaard Brouer authored
    This commit enhances the ability to troubleshoot the global
    cgroup_rstat_lock by introducing wrapper helper functions for the lock
    along with associated tracepoints.
    
    Although global, the cgroup_rstat_lock helper APIs and tracepoints take
    arguments such as cgroup pointer and cpu_in_loop variable. This
    adjustment is made because flushing occurs per cgroup despite the lock
    being global. Hence, when troubleshooting, it's important to identify the
    relevant cgroup. The cpu_in_loop variable is necessary because the global
    lock may be released within the main flushing loop that traverses CPUs.
    In the tracepoints, the cpu_in_loop value is set to -1 when acquiring the
    main lock; otherwise, it denotes the CPU number processed last.
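    
    The trylock-first pattern described above can be illustrated in userspace.
    This is a minimal sketch, not the kernel code: it assumes a pthread mutex in
    place of the spinlock, and the names rstat_lock_traced(),
    rstat_unlock_traced(), trace_event() and demo_contention() are hypothetical,
    with counters standing in for the tracepoints.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for the global cgroup_rstat_lock (assumption). */
static pthread_mutex_t rstat_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical counters standing in for the three tracepoints. */
static atomic_int n_contended, n_locked, n_unlock;
static atomic_bool holder_has_lock;

/* Stand-in for a tracepoint: count the event and log its arguments. */
static void trace_event(atomic_int *counter, const char *name,
                        const char *cgrp, int cpu_in_loop, bool contended)
{
    atomic_fetch_add(counter, 1);
    fprintf(stderr, "%s cgrp=%s cpu_in_loop=%d contended=%d\n",
            name, cgrp, cpu_in_loop, contended);
}

/* Try the lock first; only a failed trylock takes the contended path. */
static bool rstat_lock_traced(const char *cgrp, int cpu_in_loop)
{
    bool contended = pthread_mutex_trylock(&rstat_lock) != 0;

    if (contended) {
        trace_event(&n_contended, "lock_contended", cgrp, cpu_in_loop, true);
        int rc = pthread_mutex_lock(&rstat_lock); /* now block until free */
        assert(rc == 0);
        (void)rc;
    }
    /* Fires for every acquisition; 'contended' tells the two cases apart. */
    trace_event(&n_locked, "locked", cgrp, cpu_in_loop, contended);
    return contended;
}

static void rstat_unlock_traced(const char *cgrp, int cpu_in_loop)
{
    trace_event(&n_unlock, "unlock", cgrp, cpu_in_loop, false);
    int rc = pthread_mutex_unlock(&rstat_lock);
    assert(rc == 0);
    (void)rc;
}

/* Holds the lock until another thread has reported contention. */
static void *holder(void *arg)
{
    (void)arg;
    rstat_lock_traced("holder", -1); /* -1: acquiring the main lock */
    atomic_store(&holder_has_lock, true);
    while (atomic_load(&n_contended) == 0)
        ; /* busy-wait, acceptable only in a demo */
    rstat_unlock_traced("holder", -1);
    return NULL;
}

/* Forces one contended acquisition; returns what the waiter observed. */
bool demo_contention(void)
{
    pthread_t t;

    atomic_store(&holder_has_lock, false);
    atomic_store(&n_contended, 0);
    int rc = pthread_create(&t, NULL, holder, NULL);
    assert(rc == 0);
    (void)rc;
    while (!atomic_load(&holder_has_lock))
        ; /* wait until the holder owns the lock */
    bool contended = rstat_lock_traced("waiter", -1);
    rstat_unlock_traced("waiter", -1);
    pthread_join(t, NULL);
    return contended;
}
```

    Calling demo_contention() from a main() and building with -pthread should
    log one lock_contended event for the waiter. The uncontended path costs a
    single trylock plus the locked event, which is why attaching only to the
    contended event keeps overhead low.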
    
    The new feature in this patchset is detecting when the lock is contended.
    The tracepoints are implemented with production use in mind. For minimum
    overhead, attach to cgroup:cgroup_rstat_lock_contended, which is only
    activated when trylock detects that the lock is contended. A quick
    production check for issues can be done with this perf command:
    
     perf record -g -e cgroup:cgroup_rstat_lock_contended
    
    The next natural question is how long lock contenders wait to obtain the
    lock. This can be answered by measuring the time between
    cgroup:cgroup_rstat_lock_contended and cgroup:cgroup_rstat_locked
    when args->contended is set, as in this bpftrace script:
    
     bpftrace -e '
       tracepoint:cgroup:cgroup_rstat_lock_contended {@start[tid]=nsecs}
       tracepoint:cgroup:cgroup_rstat_locked {
         if (args->contended) {
           @wait_ns=hist(nsecs-@start[tid]); delete(@start[tid]);}}
       interval:s:1 {time("%H:%M:%S "); print(@wait_ns); }'
    
    Extending this to also measure the time spent holding the lock is more
    expensive, because it also looks at all the non-contended cases, as in
    this bpftrace script:
    
     bpftrace -e '
       tracepoint:cgroup:cgroup_rstat_lock_contended {@start[tid]=nsecs}
       tracepoint:cgroup:cgroup_rstat_locked { @locked[tid]=nsecs;
         if (args->contended) {
           @wait_ns=hist(nsecs-@start[tid]); delete(@start[tid]);}}
       tracepoint:cgroup:cgroup_rstat_unlock {
           @locked_ns=hist(nsecs-@locked[tid]); delete(@locked[tid]);}
       interval:s:1 {time("%H:%M:%S ");  print(@wait_ns);print(@locked_ns); }'
    
    Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Signed-off-by: Tejun Heo <tj@kernel.org>