    cgroup/rstat: Reduce cpu_lock hold time in cgroup_rstat_flush_locked() · e76d28bd
    Waiman Long authored
    
    
    When cgroup_rstat_updated() isn't being called concurrently with
    cgroup_rstat_flush_locked(), its run time is pretty short. When
    both are called concurrently, the cgroup_rstat_updated() run time
    can spike to a pretty high value due to high cpu_lock hold time in
    cgroup_rstat_flush_locked(). This can be problematic if the task calling
    cgroup_rstat_updated() is a realtime task running on an isolated CPU
    with a strict latency requirement. The cgroup_rstat_updated() call can
    happen when there is a page fault even though the task is running in
    user space most of the time.
    
    The percpu cpu_lock is used to protect the update tree -
    updated_next and updated_children. This protection is only needed when
    cgroup_rstat_cpu_pop_updated() is being called. The subsequent flushing
    operation which can take a much longer time does not need that protection
    as it is already protected by cgroup_rstat_lock.
    
    To reduce the cpu_lock hold time, perform all the
    cgroup_rstat_cpu_pop_updated() calls up front and release the lock
    before doing any flushing. This patch adds a new
    cgroup_rstat_updated_list() function that returns a singly linked list
    of cgroups to be flushed.
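
The pop-first, flush-later pattern described above can be illustrated with a minimal userspace sketch. This is not the kernel code: `struct node`, `mark_updated()`, `updated_list()` and `flush_all()` are hypothetical names, a pthread mutex stands in for the percpu cpu_lock, and the caller is assumed to already hold an outer lock playing the role of cgroup_rstat_lock. Snapshotting the pending value while detaching is a simplification of this sketch, not something the patch describes.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup with pending per-CPU stats. */
struct node {
    struct node *updated_next; /* link in the lock-protected updated set */
    struct node *flush_next;   /* link in the private to-flush list */
    int on_list;               /* non-zero while on the updated set */
    long pending;              /* written by updaters under cpu_lock */
    long snapshot;             /* private to the flusher */
    long total;                /* flushed value, guarded by the outer lock */
};

static pthread_mutex_t cpu_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node *updated_head; /* protected by cpu_lock */

/* Updater side: a short critical section that records a change. */
void mark_updated(struct node *n, long delta)
{
    pthread_mutex_lock(&cpu_lock);
    n->pending += delta;
    if (!n->on_list) {
        n->on_list = 1;
        n->updated_next = updated_head;
        updated_head = n;
    }
    pthread_mutex_unlock(&cpu_lock);
}

/* Detach every updated node under the lock and return a private list. */
struct node *updated_list(void)
{
    struct node *head = NULL;

    pthread_mutex_lock(&cpu_lock);
    while (updated_head) {
        struct node *n = updated_head;

        assert(n->on_list);
        updated_head = n->updated_next;
        n->updated_next = NULL;
        n->on_list = 0;
        n->snapshot = n->pending; /* capture while still locked */
        n->pending = 0;
        n->flush_next = head;
        head = n;
    }
    pthread_mutex_unlock(&cpu_lock);
    return head;
}

/* Flusher side: the slow per-node work runs with cpu_lock released. */
long flush_all(void)
{
    struct node *n = updated_list();
    long flushed = 0;

    while (n) {
        struct node *next = n->flush_next;

        n->flush_next = NULL;
        n->total += n->snapshot;
        flushed += n->snapshot;
        n->snapshot = 0;
        n = next;
    }
    return flushed;
}
```

In this sketch the lock hold time in `updated_list()` depends only on the number of nodes to detach, not on how long each node takes to flush, which is the effect the patch is after.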
    
    Some instrumentation code was added to measure the cpu_lock hold time,
    from right after lock acquisition to right after lock release. A
    parallel kernel build on a 2-socket x86-64 server was used as the
    benchmark for measuring the lock hold time.
    
    The maximum cpu_lock hold times before and after the patch are 100us
    and 29us respectively, so the worst case time is reduced to about 30%
    of the original. However, there may be some OS or hardware noise such
    as NMIs or SMIs in the test system that can worsen the worst case
    value. Such noise is usually tuned out in a real production environment
    to get a better result.
    
    OTOH, the lock hold time frequency distribution should give a better
    idea of the performance benefit of the patch. Below is the frequency
    distribution before and after the patch:
    
         Hold time        Before patch       After patch
         ---------        ------------       -----------
           0-01 us           804,139         13,738,708
          01-05 us         9,772,767          1,177,194
          05-10 us         4,595,028              4,984
          10-15 us           303,481              3,562
          15-20 us            78,971              1,314
          20-25 us            24,583                 18
          25-30 us             6,908                 12
          30-40 us             8,015
          40-50 us             2,192
          50-60 us               316
          60-70 us                43
          70-80 us                 7
          80-90 us                 2
            >90 us                 3
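
As a rough way to read the table above, the share of samples in the lowest buckets can be computed directly from the counts. The helper below is a hypothetical sketch (the function and array names are not part of the patch); the counts are copied from the table. By this calculation, roughly 5% of hold times were under 1 us before the patch versus roughly 92% after, and roughly 68% were under 5 us before versus more than 99.9% after.

```c
#include <assert.h>
#include <stddef.h>

/* Bucket counts copied from the table, lowest hold time first. */
static const unsigned long before[] = {
    804139, 9772767, 4595028, 303481, 78971, 24583, 6908,
    8015, 2192, 316, 43, 7, 2, 3,
};
static const unsigned long after[] = {
    13738708, 1177194, 4984, 3562, 1314, 18, 12,
};

/* Fraction of all samples that fall in the first nfirst buckets. */
double share_of_first(const unsigned long *counts, size_t n, size_t nfirst)
{
    unsigned long total = 0, head = 0;

    assert(nfirst <= n);
    for (size_t i = 0; i < n; i++) {
        total += counts[i];
        if (i < nfirst)
            head += counts[i];
    }
    return total ? (double)head / (double)total : 0.0;
}
```

For example, `share_of_first(after, 7, 2)` gives the fraction of post-patch samples with a hold time under 5 us.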
    
    Signed-off-by: Waiman Long <longman@redhat.com>
    Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
    Signed-off-by: Tejun Heo <tj@kernel.org>