• Shrikanth Hegde's avatar
    sched: Interleave cfs bandwidth timers for improved single thread performance at low utilization · 41abdba9
    Shrikanth Hegde authored
    CPU cfs bandwidth controller uses hrtimer. Currently there is no initial
    value set. Hence all period timers would align at expiry.
    This happens when there are multiple CPU cgroup's.
    
    There is a performance gain that can be achieved here if the timers are
    interleaved when the utilization of each CPU cgroup is low and total
    utilization of all the CPU cgroup's is less than 50%. If the timers are
    interleaved, then the unthrottled cgroup can run freely without many
    context switches and can also benefit from SMT Folding. This effect will
    be further amplified in SPLPAR environment.
    
    This commit adds a random offset after initializing each hrtimer. This
    would result in interleaving the timers at expiry, which helps in achieving
    the said performance gain.
    
    This was tested on powerpc platform with 8 core SMT=8. Socket power was
    measured when the workload. Benchmarked the stress-ng with power
    information. Throughput oriented benchmarks show significant gain up to
    25% while power consumption increases up to 15%.
    
    Workload: stress-ng --cpu=32 --cpu-ops=50000.
    1CG - 1 cgroup is running.
    2CG - 2 cgroups are running together.
    Time taken to complete stress-ng in seconds and power is in watts.
    each cgroup is throttled at 25% with 100ms as the period value.
               6.2-rc6                     |   with patch
    8 core   1CG    power   2CG     power  |  1CG    power  2 CG    power
            27.5    80.6    40      90     |  27.3    82    32.3    104
            27.5    81      40.2    91     |  27.5    81    38.7     96
            27.7    80      40.1    89     |  27.6    80    29.7    106
            27.7    80.1    40.3    94     |  27.6    80    31.5    105
    
    Latency might be affected by this change. That could happen if the CPU was
    in a deep idle state which is possible if we interleave the timers. Used
    schbench for measuring the latency. Each cgroup is throttled at 25% with
    period value is set to 100ms. Numbers are when both the cgroups are
    running simultaneously. Latency values don't degrade much. Some
    improvement is seen in tail latencies.
    
    		6.2-rc6        with patch
    Groups: 16
    50.0th:          39.5            42.5
    75.0th:         924.0           922.0
    90.0th:         972.0           968.0
    95.0th:        1005.5           994.0
    99.0th:        4166.0          2287.0
    99.5th:        7314.0          7448.0
    99.9th:       15024.0         13600.0
    
    Groups: 32
    50.0th:         819.0           463.0
    75.0th:        1596.0           918.0
    90.0th:        5992.0          1281.5
    95.0th:       13184.0          2765.0
    99.0th:       21792.0         14240.0
    99.5th:       25696.0         18920.0
    99.9th:       33280.0         35776.0
    
    Groups: 64
    50.0th:        4806.0          3440.0
    75.0th:       31136.0         33664.0
    90.0th:       54144.0         58752.0
    95.0th:       66176.0         67200.0
    99.0th:       84736.0         91520.0
    99.5th:       97408.0        114048.0
    99.9th:      136448.0        140032.0
    
    Initial RFC PATCH, discussions and details on the problem:
    
    Link1: https://lore.kernel.org/lkml/5ae3cb09-8c9a-11e8-75a7-cc774d9bc283@linux.vnet.ibm.com/
    Link2: https://lore.kernel.org/lkml/9c57c92c-3e0c-b8c5-4be9-8f4df344a347@linux.vnet.ibm.com/Suggested-by: default avatarPeter Zijlstra <peterz@infradead.org>
    Suggested-by: default avatarThomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Shrikanth Hegde<sshegde@linux.vnet.ibm.com>
    Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
    Reviewed-by: default avatarBen Segall <bsegall@google.com>
    Reviewed-by: default avatarVincent Guittot <vincent.guittot@linaro.org>
    Link: https://lore.kernel.org/r/20230223185153.1499710-1-sshegde@linux.vnet.ibm.com
    41abdba9
fair.c 334 KB