    sched/uclamp: Add a new sysctl to control RT default boost value · 13685c4a
    Qais Yousef authored
    RT tasks by default run at the highest capacity/performance level. When
    uclamp is selected this default behavior is retained by enforcing the
    requested uclamp.min (p->uclamp_req[UCLAMP_MIN]) of the RT tasks to be
    uclamp_none(UCLAMP_MAX), which is SCHED_CAPACITY_SCALE; the maximum
    value.
    
    This is also referred to as 'the default boost value of RT tasks'.
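
The default described above can be pictured with a small user-space model. This is only a sketch: `struct demo_task` and `demo_set_default_clamps()` are hypothetical names, and the real kernel stores each request in a richer per-task structure than the plain integers used here. `uclamp_none()` follows the behavior the text relies on (0 for the min clamp, SCHED_CAPACITY_SCALE for the max clamp).

```c
#include <assert.h>
#include <stddef.h>

#define SCHED_CAPACITY_SCALE 1024u	/* full capacity, as in the kernel */

enum uclamp_id { UCLAMP_MIN, UCLAMP_MAX, UCLAMP_CNT };

/* "No clamp" value: 0 for the min clamp, full scale for the max clamp. */
static unsigned int uclamp_none(enum uclamp_id id)
{
	return id == UCLAMP_MIN ? 0 : SCHED_CAPACITY_SCALE;
}

/* Hypothetical, simplified stand-in for the per-task uclamp request. */
struct demo_task {
	int is_rt;
	unsigned int uclamp_req[UCLAMP_CNT];
};

/* RT tasks request the maximum boost by default; other tasks request none. */
static void demo_set_default_clamps(struct demo_task *p)
{
	assert(p != NULL);
	p->uclamp_req[UCLAMP_MIN] = p->is_rt ? uclamp_none(UCLAMP_MAX)
					     : uclamp_none(UCLAMP_MIN);
	p->uclamp_req[UCLAMP_MAX] = uclamp_none(UCLAMP_MAX);
}
```

In this model an RT task ends up with a requested uclamp.min of 1024, which is the hardcoded default the new knob makes tunable.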
    
    See commit 1a00d999 ("sched/uclamp: Set default clamps for RT tasks").
    
    On battery powered devices, it is desired to control this default
    (currently hardcoded) behavior at runtime to reduce energy consumed by
    RT tasks.
    
    For example, for a mobile device manufacturer in a market where the
    big.LITTLE architecture is dominant, the performance of the little
    cores varies across SoCs, and on high end SoCs the big cores could be
    too power hungry.
    
    Given the diversity of SoCs, the new knob allows manufacturers to tune
    the best performance/power balance for RT tasks on the particular
    hardware they run on.
    
    They could opt to further tune the value when the user selects
    a different power saving mode or when the device is actively charging.
    
    The runtime aspect of it further helps in creating a single kernel image
    that can be run on multiple devices that require different tuning.
    
    Keep in mind that a lot of RT tasks in the system are created by the
    kernel. On Android for instance I can see over 50 RT tasks, only
    a handful of which are created by the Android framework.
    
    To let system admins and device integrators control the default
    behavior globally, introduce the new
    sysctl_sched_uclamp_util_min_rt_default to change the default boost
    value of the RT tasks.
    
    I anticipate this to be mostly in the form of modifying the init script
    of a particular device.
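
An init-time helper that sets the knob might look like the sketch below. The procfs path is an assumption based on how the sysctl is exposed by this patch, so check it against the running kernel. `write_rt_default()` is a hypothetical helper name, and the 0..1024 range check simply mirrors SCHED_CAPACITY_SCALE.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed procfs name of the new knob; verify on the target kernel. */
#define RT_DEFAULT_PATH "/proc/sys/kernel/sched_util_clamp_min_rt_default"
#define CAPACITY_SCALE 1024

/*
 * Write a new default boost value for RT tasks.
 * Returns 0 on success and -1 on a bad value or an I/O error.
 */
static int write_rt_default(const char *path, int value)
{
	FILE *f;
	int ok;

	assert(path != NULL);
	if (value < 0 || value > CAPACITY_SCALE)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%d\n", value) > 0;
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A device init program could call `write_rt_default(RT_DEFAULT_PATH, 128)` as root. The value 128 is only an illustration, not a recommended setting.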
    
    To avoid polluting the fast path with unnecessary code, the approach
    taken is to do the update synchronously by traversing all the existing
    tasks in the system. This could race with a concurrent fork(), which is
    dealt with by introducing the sched_post_fork() function, which ensures
    that the racy fork gets the right update applied.
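
The race and its fix can be illustrated with a single-threaded user-space model. Everything here is hypothetical (`struct toy_task`, `toy_sysctl_write()`, `toy_fork_copy()`, `toy_post_fork()`), and the locking and memory barriers the kernel needs are left out. The sketch only shows why a child that copied its parent's state before becoming visible to the traversal needs a second look after fork.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TASKS 8

/* Hypothetical, simplified task: only the fields the model needs. */
struct toy_task {
	int is_rt;
	int user_defined;	/* task chose its own uclamp.min */
	unsigned int uclamp_min;
};

static unsigned int rt_default = 1024;	/* stands in for the sysctl value */
static struct toy_task *task_list[MAX_TASKS];
static size_t nr_tasks;

/* Apply the current default unless the task is not RT or set its own value. */
static void toy_update_rt_default(struct toy_task *p)
{
	if (!p->is_rt || p->user_defined)
		return;
	p->uclamp_min = rt_default;
}

/* Slow path: runs once per sysctl write and walks every visible task. */
static void toy_sysctl_write(unsigned int value)
{
	rt_default = value;
	for (size_t i = 0; i < nr_tasks; i++)
		toy_update_rt_default(task_list[i]);
}

/* First half of fork: the child copies the parent but is not yet listed. */
static void toy_fork_copy(struct toy_task *child, const struct toy_task *parent)
{
	*child = *parent;
}

/* Second half: the child becomes visible, then the post-fork hook re-syncs it. */
static void toy_post_fork(struct toy_task *child)
{
	assert(nr_tasks < MAX_TASKS);
	task_list[nr_tasks++] = child;
	toy_update_rt_default(child);
}
```

If `toy_sysctl_write()` runs between `toy_fork_copy()` and `toy_post_fork()`, the traversal misses the child, and the hook in `toy_post_fork()` is what brings it up to date. No per-wakeup check is needed, which is the point of keeping the fast path clean.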
    
    Tested on Juno-r2 in combination with the RT capacity awareness [1].
    By default an RT task will go to the highest capacity CPU and run at the
    maximum frequency, which is particularly energy inefficient on high end
    mobile devices because the biggest core[s] are 'huge' and power hungry.
    
    With this patch the RT task can be controlled to run anywhere by
    default, and doesn't cause the frequency to be maximum all the time.
    Yet any task that really needs to be boosted can easily escape this
    default behavior by modifying its requested uclamp.min value
    (p->uclamp_req[UCLAMP_MIN]) via the sched_setattr() syscall.
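
A task that wants to escape the default could request its own uclamp.min roughly as follows. This is a sketch under stated assumptions: the struct is a local copy of the uapi `sched_attr` layout under a different name (`struct uclamp_attr`) to avoid clashing with libc headers, the `UCLAMP_FLAG_*` macros are local copies of the corresponding `SCHED_FLAG_*` values, and `fill_uclamp_min_attr()` and `set_uclamp_min()` are hypothetical helper names.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Local copy of the uapi sched_attr layout (56 bytes with the uclamp fields). */
struct uclamp_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
	uint32_t sched_util_min;
	uint32_t sched_util_max;
};

/* Local copies of SCHED_FLAG_KEEP_POLICY, _KEEP_PARAMS and _UTIL_CLAMP_MIN. */
#define UCLAMP_FLAG_KEEP_POLICY		0x08
#define UCLAMP_FLAG_KEEP_PARAMS		0x10
#define UCLAMP_FLAG_UTIL_CLAMP_MIN	0x20

/* Build a request that changes only uclamp.min, keeping policy and params. */
static void fill_uclamp_min_attr(struct uclamp_attr *attr, uint32_t util_min)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->sched_flags = UCLAMP_FLAG_KEEP_POLICY | UCLAMP_FLAG_KEEP_PARAMS |
			    UCLAMP_FLAG_UTIL_CLAMP_MIN;
	attr->sched_util_min = util_min;
}

/* pid 0 means the calling thread. Returns 0 on success, -1 with errno set. */
static int set_uclamp_min(pid_t pid, uint32_t util_min)
{
	struct uclamp_attr attr;

	assert(util_min <= 1024);
	fill_uclamp_min_attr(&attr, util_min);
	return (int)syscall(SYS_sched_setattr, pid, &attr, 0);
}
```

Calling `set_uclamp_min(0, 1024)` from an RT thread would ask for the full boost again regardless of the system-wide default. The call fails on kernels built without uclamp support, so the return value should be checked.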
    
    [1] 804d402f: ("sched/rt: Make RT capacity-aware")
    Signed-off-by: Qais Yousef <qais.yousef@arm.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20200716110347.19553-2-qais.yousef@arm.com
sysctl.c 79.7 KB