Commit b366f976 authored by Rafael J. Wysocki's avatar Rafael J. Wysocki

Merge branch 'pm-cpufreq'

* pm-cpufreq: (30 commits)
  Documentation: cpufreq: intel_pstate: enhance documentation
  cpufreq-dt: fix handling regulator_get_voltage() result
  cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC
  cpufreq: mt8173: migrate to use operating-points-v2 bindings
  cpufreq: Simplify core code related to boost support
  cpufreq: acpi-cpufreq: Simplify boost-related code
  cpufreq: Make cpufreq_boost_supported() static
  blackfin-cpufreq: Mark cpu_set_cclk() as static
  blackfin-cpufreq: Change return type of cpu_set_cclk() to that of clk_set_rate()
  dt: cpufreq: st: Provide bindings for ST's CPUFreq implementation
  cpufreq: st: Provide runtime initialised driver for ST's platforms
  cpufreq: mt8173: Move resources allocation into ->probe()
  cpufreq: intel_pstate: Account for IO wait time
  cpufreq: intel_pstate: Account for non C0 time
  cpufreq: intel_pstate: Configurable algorithm to get target pstate
  cpufreq: mt8173: check return value of regulator_get_voltage() call
  cpufreq: mt8173: remove redundant regulator_get_voltage() call
  cpufreq: mt8173: add CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag
  cpufreq: qoriq: Register cooling device based on device tree
  cpufreq: pcc-cpufreq: update default value of cpuinfo_transition_latency
  ...
parents 7f4a3702 a032d2de
Intel P-state driver
Intel P-State driver
--------------------
This driver provides an interface to control the P state selection for
SandyBridge+ Intel processors. The driver can operate two different
modes based on the processor model, legacy mode and Hardware P state (HWP)
mode.
In legacy mode, the Intel P-state implements two internal governors,
performance and powersave, that differ from the general cpufreq governors of
the same name (the general cpufreq governors implement target(), whereas the
internal Intel P-state governors implement setpolicy()). The internal
performance governor sets the max_perf_pct and min_perf_pct to 100; that is,
the governor selects the highest available P state to maximize the performance
of the core. The internal powersave governor selects the appropriate P state
based on the current load on the CPU.
In HWP mode P state selection is implemented in the processor
itself. The driver provides the interfaces between the cpufreq core and
the processor to control P state selection based on user preferences
and reporting frequency to the cpufreq core. In this mode the
internal Intel P-state governor code is disabled.
In addition to the interfaces provided by the cpufreq core for
controlling frequency the driver provides sysfs files for
controlling P state selection. These files have been added to
/sys/devices/system/cpu/intel_pstate/
max_perf_pct: limits the maximum P state that will be requested by
the driver stated as a percentage of the available performance. The
available (P states) performance may be reduced by the no_turbo
This driver provides an interface to control the P-State selection for the
SandyBridge+ Intel processors.
The following document explains P-States:
http://events.linuxfoundation.org/sites/events/files/slides/LinuxConEurope_2015.pdf
As stated in the document, P-State doesn’t exactly mean a frequency. However, for
the sake of the relationship with cpufreq, P-State and frequency are used
interchangeably.
Understanding the cpufreq core governors and policies are important before
discussing more details about the Intel P-State driver. Based on what callbacks
a cpufreq driver provides to the cpufreq core, it can support two types of
drivers:
- with target_index() callback: In this mode, the drivers using cpufreq core
simply provide the minimum and maximum frequency limits and an additional
interface target_index() to set the current frequency. The cpufreq subsystem
has a number of scaling governors ("performance", "powersave", "ondemand",
etc.). Depending on which governor is in use, cpufreq core will call for
transitions to a specific frequency using target_index() callback.
- setpolicy() callback: In this mode, drivers do not provide target_index()
callback, so cpufreq core can't request a transition to a specific frequency.
The driver provides minimum and maximum frequency limits and callbacks to set a
policy. The policy in cpufreq sysfs is referred to as the "scaling governor".
The cpufreq core can request the driver to operate in any of the two policies:
"performance: and "powersave". The driver decides which frequency to use based
on the above policy selection considering minimum and maximum frequency limits.
The Intel P-State driver falls under the latter category, which implements the
setpolicy() callback. This driver decides what P-State to use based on the
requested policy from the cpufreq core. If the processor is capable of
selecting its next P-State internally, then the driver will offload this
responsibility to the processor (aka HWP: Hardware P-States). If not, the
driver implements algorithms to select the next P-State.
Since these policies are implemented in the driver, they are not same as the
cpufreq scaling governors implementation, even if they have the same name in
the cpufreq sysfs (scaling_governors). For example the "performance" policy is
similar to cpufreq’s "performance" governor, but "powersave" is completely
different than the cpufreq "powersave" governor. The strategy here is similar
to cpufreq "ondemand", where the requested P-State is related to the system load.
Sysfs Interface
In addition to the frequency-controlling interfaces provided by the cpufreq
core, the driver provides its own sysfs files to control the P-State selection.
These files have been added to /sys/devices/system/cpu/intel_pstate/.
Any changes made to these files are applicable to all CPUs (even in a
multi-package system).
max_perf_pct: Limits the maximum P-State that will be requested by
the driver. It states it as a percentage of the available performance. The
available (P-State) performance may be reduced by the no_turbo
setting described below.
min_perf_pct: limits the minimum P state that will be requested by
the driver stated as a percentage of the max (non-turbo)
min_perf_pct: Limits the minimum P-State that will be requested by
the driver. It states it as a percentage of the max (non-turbo)
performance level.
no_turbo: limits the driver to selecting P states below the turbo
no_turbo: Limits the driver to selecting P-State below the turbo
frequency range.
turbo_pct: displays the percentage of the total performance that
is supported by hardware that is in the turbo range. This number
turbo_pct: Displays the percentage of the total performance that
is supported by hardware that is in the turbo range. This number
is independent of whether turbo has been disabled or not.
num_pstates: displays the number of pstates that are supported
by hardware. This number is independent of whether turbo has
num_pstates: Displays the number of P-States that are supported
by hardware. This number is independent of whether turbo has
been disabled or not.
For example, if a system has these parameters:
Max 1 core turbo ratio: 0x21 (Max 1 core ratio is the maximum P-State)
Max non turbo ratio: 0x17
Minimum ratio : 0x08 (Here the ratio is called max efficiency ratio)
Sysfs will show :
max_perf_pct:100, which corresponds to 1 core ratio
min_perf_pct:24, max_efficiency_ratio / max 1 Core ratio
no_turbo:0, turbo is not disabled
num_pstates:26 = (max 1 Core ratio - Max Efficiency Ratio + 1)
turbo_pct:39 = (max 1 core ratio - max non turbo ratio) / num_pstates
Refer to "Intel® 64 and IA-32 Architectures Software Developer’s Manual
Volume 3: System Programming Guide" to understand ratios.
cpufreq sysfs for Intel P-State
Since this driver registers with cpufreq, cpufreq sysfs is also presented.
There are some important differences, which need to be considered.
scaling_cur_freq: This displays the real frequency which was used during
the last sample period instead of what is requested. Some other cpufreq driver,
like acpi-cpufreq, displays what is requested (Some changes are on the
way to fix this for acpi-cpufreq driver). The same is true for frequencies
displayed at /proc/cpuinfo.
scaling_governor: This displays current active policy. Since each CPU has a
cpufreq sysfs, it is possible to set a scaling governor to each CPU. But this
is not possible with Intel P-States, as there is one common policy for all
CPUs. Here, the last requested policy will be applicable to all CPUs. It is
suggested that one use the cpupower utility to change policy to all CPUs at the
same time.
scaling_setspeed: This attribute can never be used with Intel P-State.
scaling_max_freq/scaling_min_freq: This interface can be used similarly to
the max_perf_pct/min_perf_pct of Intel P-State sysfs. However since frequencies
are converted to nearest possible P-State, this is prone to rounding errors.
This method is not preferred to limit performance.
affected_cpus: Not used
related_cpus: Not used
For contemporary Intel processors, the frequency is controlled by the
processor itself and the P-states exposed to software are related to
processor itself and the P-State exposed to software is related to
performance levels. The idea that frequency can be set to a single
frequency is fiction for Intel Core processors. Even if the scaling
driver selects a single P state the actual frequency the processor
frequency is fictional for Intel Core processors. Even if the scaling
driver selects a single P-State, the actual frequency the processor
will run at is selected by the processor itself.
For legacy mode debugfs files have also been added to allow tuning of
the internal governor algorythm. These files are located at
/sys/kernel/debug/pstate_snb/ These files are NOT present in HWP mode.
Tuning Intel P-State driver
When HWP mode is not used, debugfs files have also been added to allow the
tuning of the internal governor algorithm. These files are located at
/sys/kernel/debug/pstate_snb/. The algorithm uses a PID (Proportional
Integral Derivative) controller. The PID tunable parameters are:
deadband
d_gain_pct
......@@ -63,3 +133,90 @@ the internal governor algorythm. These files are located at
p_gain_pct
sample_rate_ms
setpoint
To adjust these parameters, some understanding of driver implementation is
necessary. There are some tweeks described here, but be very careful. Adjusting
them requires expert level understanding of power and performance relationship.
These limits are only useful when the "powersave" policy is active.
-To make the system more responsive to load changes, sample_rate_ms can
be adjusted (current default is 10ms).
-To make the system use higher performance, even if the load is lower, setpoint
can be adjusted to a lower number. This will also lead to faster ramp up time
to reach the maximum P-State.
If there are no derivative and integral coefficients, The next P-State will be
equal to:
current P-State - ((setpoint - current cpu load) * p_gain_pct)
For example, if the current PID parameters are (Which are defaults for the core
processors like SandyBridge):
deadband = 0
d_gain_pct = 0
i_gain_pct = 0
p_gain_pct = 20
sample_rate_ms = 10
setpoint = 97
If the current P-State = 0x08 and current load = 100, this will result in the
next P-State = 0x08 - ((97 - 100) * 0.2) = 8.6 (rounded to 9). Here the P-State
goes up by only 1. If during next sample interval the current load doesn't
change and still 100, then P-State goes up by one again. This process will
continue as long as the load is more than the setpoint until the maximum P-State
is reached.
For the same load at setpoint = 60, this will result in the next P-State
= 0x08 - ((60 - 100) * 0.2) = 16
So by changing the setpoint from 97 to 60, there is an increase of the
next P-State from 9 to 16. So this will make processor execute at higher
P-State for the same CPU load. If the load continues to be more than the
setpoint during next sample intervals, then P-State will go up again till the
maximum P-State is reached. But the ramp up time to reach the maximum P-State
will be much faster when the setpoint is 60 compared to 97.
Debugging Intel P-State driver
Event tracing
To debug P-State transition, the Linux event tracing interface can be used.
There are two specific events, which can be enabled (Provided the kernel
configs related to event tracing are enabled).
# cd /sys/kernel/debug/tracing/
# echo 1 > events/power/pstate_sample/enable
# echo 1 > events/power/cpu_frequency/enable
# cat trace
gnome-terminal--4510 [001] ..s. 1177.680733: pstate_sample: core_busy=107
scaled=94 from=26 to=26 mperf=1143818 aperf=1230607 tsc=29838618
freq=2474476
cat-5235 [002] ..s. 1177.681723: cpu_frequency: state=2900000 cpu_id=2
Using ftrace
If function level tracing is required, the Linux ftrace interface can be used.
For example if we want to check how often a function to set a P-State is
called, we can set ftrace filter to intel_pstate_set_pstate.
# cd /sys/kernel/debug/tracing/
# cat available_filter_functions | grep -i pstate
intel_pstate_set_pstate
intel_pstate_cpu_init
...
# echo intel_pstate_set_pstate > set_ftrace_filter
# echo function > current_tracer
# cat trace | head -15
# tracer: function
#
# entries-in-buffer/entries-written: 80/80 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
Xorg-3129 [000] ..s. 2537.644844: intel_pstate_set_pstate <-intel_pstate_timer_func
gnome-terminal--4510 [002] ..s. 2537.649844: intel_pstate_set_pstate <-intel_pstate_timer_func
gnome-shell-3409 [001] ..s. 2537.650850: intel_pstate_set_pstate <-intel_pstate_timer_func
<idle>-0 [000] ..s. 2537.654843: intel_pstate_set_pstate <-intel_pstate_timer_func
......@@ -159,8 +159,8 @@ to be strictly associated with a P-state.
2.2 cpuinfo_transition_latency:
-------------------------------
The cpuinfo_transition_latency field is 0. The PCC specification does
not include a field to expose this value currently.
The cpuinfo_transition_latency field is CPUFREQ_ETERNAL. The PCC specification
does not include a field to expose this value currently.
2.3 cpuinfo_cur_freq:
---------------------
......
......@@ -242,6 +242,23 @@ nodes to be present and contain the properties described below.
Definition: Specifies the syscon node controlling the cpu core
power domains.
- dynamic-power-coefficient
Usage: optional
Value type: <prop-encoded-array>
Definition: A u32 value that represents the running time dynamic
power coefficient in units of mW/MHz/uVolt^2. The
coefficient can either be calculated from power
measurements or derived by analysis.
The dynamic power consumption of the CPU is
proportional to the square of the Voltage (V) and
the clock frequency (f). The coefficient is used to
calculate the dynamic power as below -
Pdyn = dynamic-power-coefficient * V^2 * f
where voltage is in uV, frequency is in MHz.
Example 1 (dual-cluster big.LITTLE system 32-bit):
cpus {
......
Binding for ST's CPUFreq driver
===============================
ST's CPUFreq driver attempts to read 'process' and 'version' attributes
from the SoC, then supplies the OPP framework with 'prop' and 'supported
hardware' information respectively. The framework is then able to read
the DT and operate in the usual way.
For more information about the expected DT format [See: ../opp/opp.txt].
Frequency Scaling only
----------------------
No vendor specific driver required for this.
Located in CPU's node:
- operating-points : [See: ../power/opp.txt]
Example [safe]
--------------
cpus {
cpu@0 {
/* kHz uV */
operating-points = <1500000 0
1200000 0
800000 0
500000 0>;
};
};
Dynamic Voltage and Frequency Scaling (DVFS)
--------------------------------------------
This requires the ST CPUFreq driver to supply 'process' and 'version' info.
Located in CPU's node:
- operating-points-v2 : [See ../power/opp.txt]
Example [unsafe]
----------------
cpus {
cpu@0 {
operating-points-v2 = <&cpu0_opp_table>;
};
};
cpu0_opp_table: opp_table {
compatible = "operating-points-v2";
/* ############################################################### */
/* # WARNING: Do not attempt to copy/replicate these nodes, # */
/* # they are only to be supplied by the bootloader !!! # */
/* ############################################################### */
opp0 {
/* Major Minor Substrate */
/* 2 all all */
opp-supported-hw = <0x00000004 0xffffffff 0xffffffff>;
opp-hz = /bits/ 64 <1500000000>;
clock-latency-ns = <10000000>;
opp-microvolt-pcode0 = <1200000>;
opp-microvolt-pcode1 = <1200000>;
opp-microvolt-pcode2 = <1200000>;
opp-microvolt-pcode3 = <1200000>;
opp-microvolt-pcode4 = <1170000>;
opp-microvolt-pcode5 = <1140000>;
opp-microvolt-pcode6 = <1100000>;
opp-microvolt-pcode7 = <1070000>;
};
opp1 {
/* Major Minor Substrate */
/* all all all */
opp-supported-hw = <0xffffffff 0xffffffff 0xffffffff>;
opp-hz = /bits/ 64 <1200000000>;
clock-latency-ns = <10000000>;
opp-microvolt-pcode0 = <1110000>;
opp-microvolt-pcode1 = <1150000>;
opp-microvolt-pcode2 = <1100000>;
opp-microvolt-pcode3 = <1080000>;
opp-microvolt-pcode4 = <1040000>;
opp-microvolt-pcode5 = <1020000>;
opp-microvolt-pcode6 = <980000>;
opp-microvolt-pcode7 = <930000>;
};
};
......@@ -6,6 +6,8 @@
config ARM_BIG_LITTLE_CPUFREQ
tristate "Generic ARM big LITTLE CPUfreq driver"
depends on (ARM_CPU_TOPOLOGY || ARM64) && HAVE_CLK
# if CPU_THERMAL is on and THERMAL=m, ARM_BIT_LITTLE_CPUFREQ cannot be =y
depends on !CPU_THERMAL || THERMAL
select PM_OPP
help
This enables the Generic CPUfreq driver for ARM big.LITTLE platforms.
......@@ -217,6 +219,16 @@ config ARM_SPEAR_CPUFREQ
help
This adds the CPUFreq driver support for SPEAr SOCs.
config ARM_STI_CPUFREQ
tristate "STi CPUFreq support"
depends on SOC_STIH407
help
This driver uses the generic OPP framework to match the running
platform with a predefined set of suitable values. If not provided
we will fall-back so safe-values contained in Device Tree. Enable
this config option if you wish to add CPUFreq support for STi based
SoCs.
config ARM_TEGRA20_CPUFREQ
bool "Tegra20 CPUFreq support"
depends on ARCH_TEGRA
......
......@@ -73,6 +73,7 @@ obj-$(CONFIG_ARM_SA1100_CPUFREQ) += sa1100-cpufreq.o
obj-$(CONFIG_ARM_SA1110_CPUFREQ) += sa1110-cpufreq.o
obj-$(CONFIG_ARM_SCPI_CPUFREQ) += scpi-cpufreq.o
obj-$(CONFIG_ARM_SPEAR_CPUFREQ) += spear-cpufreq.o
obj-$(CONFIG_ARM_STI_CPUFREQ) += sti-cpufreq.o
obj-$(CONFIG_ARM_TEGRA20_CPUFREQ) += tegra20-cpufreq.o
obj-$(CONFIG_ARM_TEGRA124_CPUFREQ) += tegra124-cpufreq.o
obj-$(CONFIG_ARM_VEXPRESS_SPC_CPUFREQ) += vexpress-spc-cpufreq.o
......
......@@ -135,7 +135,7 @@ static void boost_set_msrs(bool enable, const struct cpumask *cpumask)
wrmsr_on_cpus(cpumask, msr_addr, msrs);
}
static int _store_boost(int val)
static int set_boost(int val)
{
get_online_cpus();
boost_set_msrs(val, cpu_online_mask);
......@@ -158,29 +158,24 @@ static ssize_t show_freqdomain_cpus(struct cpufreq_policy *policy, char *buf)
cpufreq_freq_attr_ro(freqdomain_cpus);
#ifdef CONFIG_X86_ACPI_CPUFREQ_CPB
static ssize_t store_boost(const char *buf, size_t count)
static ssize_t store_cpb(struct cpufreq_policy *policy, const char *buf,
size_t count)
{
int ret;
unsigned long val = 0;
unsigned int val = 0;
if (!acpi_cpufreq_driver.boost_supported)
if (!acpi_cpufreq_driver.set_boost)
return -EINVAL;
ret = kstrtoul(buf, 10, &val);
if (ret || (val > 1))
ret = kstrtouint(buf, 10, &val);
if (ret || val > 1)
return -EINVAL;
_store_boost((int) val);
set_boost(val);
return count;
}
static ssize_t store_cpb(struct cpufreq_policy *policy, const char *buf,
size_t count)
{
return store_boost(buf, count);
}
static ssize_t show_cpb(struct cpufreq_policy *policy, char *buf)
{
return sprintf(buf, "%u\n", acpi_cpufreq_driver.boost_enabled);
......@@ -905,7 +900,6 @@ static struct cpufreq_driver acpi_cpufreq_driver = {
.resume = acpi_cpufreq_resume,
.name = "acpi-cpufreq",
.attr = acpi_cpufreq_attr,
.set_boost = _store_boost,
};
static void __init acpi_cpufreq_boost_init(void)
......@@ -916,7 +910,7 @@ static void __init acpi_cpufreq_boost_init(void)
if (!msrs)
return;
acpi_cpufreq_driver.boost_supported = true;
acpi_cpufreq_driver.set_boost = set_boost;
acpi_cpufreq_driver.boost_enabled = boost_state(0);
cpu_notifier_register_begin();
......
......@@ -23,6 +23,7 @@
#include <linux/cpu.h>
#include <linux/cpufreq.h>
#include <linux/cpumask.h>
#include <linux/cpu_cooling.h>
#include <linux/export.h>
#include <linux/module.h>
#include <linux/mutex.h>
......@@ -55,6 +56,7 @@ static bool bL_switching_enabled;
#define ACTUAL_FREQ(cluster, freq) ((cluster == A7_CLUSTER) ? freq << 1 : freq)
#define VIRT_FREQ(cluster, freq) ((cluster == A7_CLUSTER) ? freq >> 1 : freq)
static struct thermal_cooling_device *cdev[MAX_CLUSTERS];
static struct cpufreq_arm_bL_ops *arm_bL_ops;
static struct clk *clk[MAX_CLUSTERS];
static struct cpufreq_frequency_table *freq_table[MAX_CLUSTERS + 1];
......@@ -493,6 +495,12 @@ static int bL_cpufreq_init(struct cpufreq_policy *policy)
static int bL_cpufreq_exit(struct cpufreq_policy *policy)
{
struct device *cpu_dev;
int cur_cluster = cpu_to_cluster(policy->cpu);
if (cur_cluster < MAX_CLUSTERS) {
cpufreq_cooling_unregister(cdev[cur_cluster]);
cdev[cur_cluster] = NULL;
}
cpu_dev = get_cpu_device(policy->cpu);
if (!cpu_dev) {
......@@ -507,6 +515,38 @@ static int bL_cpufreq_exit(struct cpufreq_policy *policy)
return 0;
}
static void bL_cpufreq_ready(struct cpufreq_policy *policy)
{
struct device *cpu_dev = get_cpu_device(policy->cpu);
int cur_cluster = cpu_to_cluster(policy->cpu);
struct device_node *np;
/* Do not register a cpu_cooling device if we are in IKS mode */
if (cur_cluster >= MAX_CLUSTERS)
return;
np = of_node_get(cpu_dev->of_node);
if (WARN_ON(!np))
return;
if (of_find_property(np, "#cooling-cells", NULL)) {
u32 power_coefficient = 0;
of_property_read_u32(np, "dynamic-power-coefficient",
&power_coefficient);
cdev[cur_cluster] = of_cpufreq_power_cooling_register(np,
policy->related_cpus, power_coefficient, NULL);
if (IS_ERR(cdev[cur_cluster])) {
dev_err(cpu_dev,
"running cpufreq without cooling device: %ld\n",
PTR_ERR(cdev[cur_cluster]));
cdev[cur_cluster] = NULL;
}
}
of_node_put(np);
}
static struct cpufreq_driver bL_cpufreq_driver = {
.name = "arm-big-little",
.flags = CPUFREQ_STICKY |
......@@ -517,6 +557,7 @@ static struct cpufreq_driver bL_cpufreq_driver = {
.get = bL_cpufreq_get_rate,
.init = bL_cpufreq_init,
.exit = bL_cpufreq_exit,
.ready = bL_cpufreq_ready,
.attr = cpufreq_generic_attr,
};
......
......@@ -112,7 +112,7 @@ static unsigned int bfin_getfreq_khz(unsigned int cpu)
}
#ifdef CONFIG_BF60x
unsigned long cpu_set_cclk(int cpu, unsigned long new)
static int cpu_set_cclk(int cpu, unsigned long new)
{
struct clk *clk;
int ret;
......
......@@ -50,7 +50,8 @@ static int set_target(struct cpufreq_policy *policy, unsigned int index)
struct private_data *priv = policy->driver_data;
struct device *cpu_dev = priv->cpu_dev;
struct regulator *cpu_reg = priv->cpu_reg;
unsigned long volt = 0, volt_old = 0, tol = 0;
unsigned long volt = 0, tol = 0;
int volt_old = 0;
unsigned int old_freq, new_freq;
long freq_Hz, freq_exact;
int ret;
......@@ -83,7 +84,7 @@ static int set_target(struct cpufreq_policy *policy, unsigned int index)
opp_freq / 1000, volt);
}
dev_dbg(cpu_dev, "%u MHz, %ld mV --> %u MHz, %ld mV\n",
dev_dbg(cpu_dev, "%u MHz, %d mV --> %u MHz, %ld mV\n",
old_freq / 1000, (volt_old > 0) ? volt_old / 1000 : -1,
new_freq / 1000, volt ? volt / 1000 : -1);
......@@ -407,8 +408,13 @@ static void cpufreq_ready(struct cpufreq_policy *policy)
* thermal DT code takes care of matching them.
*/
if (of_find_property(np, "#cooling-cells", NULL)) {
priv->cdev = of_cpufreq_cooling_register(np,
policy->related_cpus);
u32 power_coefficient = 0;
of_property_read_u32(np, "dynamic-power-coefficient",
&power_coefficient);
priv->cdev = of_cpufreq_power_cooling_register(np,
policy->related_cpus, power_coefficient, NULL);
if (IS_ERR(priv->cdev)) {
dev_err(priv->cpu_dev,
"running cpufreq without cooling device: %ld\n",
......
......@@ -2330,29 +2330,15 @@ int cpufreq_boost_trigger_state(int state)
return ret;
}
int cpufreq_boost_supported(void)
static bool cpufreq_boost_supported(void)
{
if (likely(cpufreq_driver))
return cpufreq_driver->boost_supported;
return 0;
return likely(cpufreq_driver) && cpufreq_driver->set_boost;
}
EXPORT_SYMBOL_GPL(cpufreq_boost_supported);
static int create_boost_sysfs_file(void)
{
int ret;
if (!cpufreq_boost_supported())
return 0;
/*
* Check if driver provides function to enable boost -
* if not, use cpufreq_boost_set_sw as default
*/
if (!cpufreq_driver->set_boost)
cpufreq_driver->set_boost = cpufreq_boost_set_sw;
ret = sysfs_create_file(cpufreq_global_kobject, &boost.attr);
if (ret)
pr_err("%s: cannot register global BOOST sysfs file\n",
......@@ -2375,7 +2361,7 @@ int cpufreq_enable_boost_support(void)
if (cpufreq_boost_supported())
return 0;
cpufreq_driver->boost_supported = true;
cpufreq_driver->set_boost = cpufreq_boost_set_sw;
/* This will get removed on driver unregister */
return create_boost_sysfs_file();
......@@ -2435,9 +2421,11 @@ int cpufreq_register_driver(struct cpufreq_driver *driver_data)
if (driver_data->setpolicy)
driver_data->flags |= CPUFREQ_CONST_LOOPS;
ret = create_boost_sysfs_file();
if (ret)
goto err_null_driver;
if (cpufreq_boost_supported()) {
ret = create_boost_sysfs_file();
if (ret)
goto err_null_driver;
}
ret = subsys_interface_register(&cpufreq_interface);
if (ret)
......
......@@ -115,13 +115,13 @@ static void cs_check_cpu(int cpu, unsigned int load)
}
}
static unsigned int cs_dbs_timer(struct cpu_dbs_info *cdbs,
struct dbs_data *dbs_data, bool modify_all)
static unsigned int cs_dbs_timer(struct cpufreq_policy *policy, bool modify_all)
{
struct dbs_data *dbs_data = policy->governor_data;
struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
if (modify_all)
dbs_check_cpu(dbs_data, cdbs->shared->policy->cpu);
dbs_check_cpu(dbs_data, policy->cpu);
return delay_for_sampling_rate(cs_tuners->sampling_rate);
}
......
......@@ -84,6 +84,9 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
(cur_wall_time - j_cdbs->prev_cpu_wall);
j_cdbs->prev_cpu_wall = cur_wall_time;
if (cur_idle_time < j_cdbs->prev_cpu_idle)
cur_idle_time = j_cdbs->prev_cpu_idle;
idle_time = (unsigned int)
(cur_idle_time - j_cdbs->prev_cpu_idle);
j_cdbs->prev_cpu_idle = cur_idle_time;
......@@ -158,47 +161,55 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
}
EXPORT_SYMBOL_GPL(dbs_check_cpu);
static inline void __gov_queue_work(int cpu, struct dbs_data *dbs_data,
unsigned int delay)
void gov_add_timers(struct cpufreq_policy *policy, unsigned int delay)
{
struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
mod_delayed_work_on(cpu, system_wq, &cdbs->dwork, delay);
}
void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
unsigned int delay, bool all_cpus)
{
int i;
struct dbs_data *dbs_data = policy->governor_data;
struct cpu_dbs_info *cdbs;
int cpu;
if (!all_cpus) {
/*
* Use raw_smp_processor_id() to avoid preemptible warnings.
* We know that this is only called with all_cpus == false from
* works that have been queued with *_work_on() functions and
* those works are canceled during CPU_DOWN_PREPARE so they
* can't possibly run on any other CPU.
*/
__gov_queue_work(raw_smp_processor_id(), dbs_data, delay);
} else {
for_each_cpu(i, policy->cpus)
__gov_queue_work(i, dbs_data, delay);
for_each_cpu(cpu, policy->cpus) {
cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
cdbs->timer.expires = jiffies + delay;
add_timer_on(&cdbs->timer, cpu);
}
}
EXPORT_SYMBOL_GPL(gov_queue_work);
EXPORT_SYMBOL_GPL(gov_add_timers);
static inline void gov_cancel_work(struct dbs_data *dbs_data,
struct cpufreq_policy *policy)
static inline void gov_cancel_timers(struct cpufreq_policy *policy)
{
struct dbs_data *dbs_data = policy->governor_data;
struct cpu_dbs_info *cdbs;
int i;
for_each_cpu(i, policy->cpus) {
cdbs = dbs_data->cdata->get_cpu_cdbs(i);
cancel_delayed_work_sync(&cdbs->dwork);
del_timer_sync(&cdbs->timer);
}
}
void gov_cancel_work(struct cpu_common_dbs_info *shared)
{
/* Tell dbs_timer_handler() to skip queuing up work items. */
atomic_inc(&shared->skip_work);
/*
* If dbs_timer_handler() is already running, it may not notice the
* incremented skip_work, so wait for it to complete to prevent its work
* item from being queued up after the cancel_work_sync() below.
*/
gov_cancel_timers(shared->policy);
/*
* In case dbs_timer_handler() managed to run and spawn a work item
* before the timers have been canceled, wait for that work item to
* complete and then cancel all of the timers set up by it. If
* dbs_timer_handler() runs again at that point, it will see the
* positive value of skip_work and won't spawn any more work items.
*/
cancel_work_sync(&shared->work);
gov_cancel_timers(shared->policy);
atomic_set(&shared->skip_work, 0);
}
EXPORT_SYMBOL_GPL(gov_cancel_work);
/* Will return if we need to evaluate cpu load again or not */
static bool need_load_eval(struct cpu_common_dbs_info *shared,
unsigned int sampling_rate)
......@@ -217,29 +228,21 @@ static bool need_load_eval(struct cpu_common_dbs_info *shared,
return true;
}
static void dbs_timer(struct work_struct *work)
static void dbs_work_handler(struct work_struct *work)
{
struct cpu_dbs_info *cdbs = container_of(work, struct cpu_dbs_info,
dwork.work);
struct cpu_common_dbs_info *shared = cdbs->shared;
struct cpu_common_dbs_info *shared = container_of(work, struct
cpu_common_dbs_info, work);
struct cpufreq_policy *policy;
struct dbs_data *dbs_data;
unsigned int sampling_rate, delay;
bool modify_all = true;
mutex_lock(&shared->timer_mutex);
bool eval_load;
policy = shared->policy;
/*
* Governor might already be disabled and there is no point continuing
* with the work-handler.
*/
if (!policy)
goto unlock;
dbs_data = policy->governor_data;
/* Kill all timers */
gov_cancel_timers(policy);
if (dbs_data->cdata->governor == GOV_CONSERVATIVE) {
struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
......@@ -250,14 +253,37 @@ static void dbs_timer(struct work_struct *work)
sampling_rate = od_tuners->sampling_rate;
}
if (!need_load_eval(cdbs->shared, sampling_rate))
modify_all = false;
delay = dbs_data->cdata->gov_dbs_timer(cdbs, dbs_data, modify_all);
gov_queue_work(dbs_data, policy, delay, modify_all);
eval_load = need_load_eval(shared, sampling_rate);
unlock:
/*
* Make sure cpufreq_governor_limits() isn't evaluating load in
* parallel.
*/
mutex_lock(&shared->timer_mutex);
delay = dbs_data->cdata->gov_dbs_timer(policy, eval_load);
mutex_unlock(&shared->timer_mutex);
atomic_dec(&shared->skip_work);
gov_add_timers(policy, delay);
}
static void dbs_timer_handler(unsigned long data)
{
struct cpu_dbs_info *cdbs = (struct cpu_dbs_info *)data;
struct cpu_common_dbs_info *shared = cdbs->shared;
/*
* Timer handler may not be allowed to queue the work at the moment,
* because:
* - Another timer handler has done that
* - We are stopping the governor
* - Or we are updating the sampling rate of the ondemand governor
*/
if (atomic_inc_return(&shared->skip_work) > 1)
atomic_dec(&shared->skip_work);
else
queue_work(system_wq, &shared->work);
}
static void set_sampling_rate(struct dbs_data *dbs_data,
......@@ -287,6 +313,9 @@ static int alloc_common_dbs_info(struct cpufreq_policy *policy,
for_each_cpu(j, policy->related_cpus)
cdata->get_cpu_cdbs(j)->shared = shared;
mutex_init(&shared->timer_mutex);
atomic_set(&shared->skip_work, 0);
INIT_WORK(&shared->work, dbs_work_handler);
return 0;
}
......@@ -297,6 +326,8 @@ static void free_common_dbs_info(struct cpufreq_policy *policy,
struct cpu_common_dbs_info *shared = cdbs->shared;
int j;
mutex_destroy(&shared->timer_mutex);
for_each_cpu(j, policy->cpus)
cdata->get_cpu_cdbs(j)->shared = NULL;
......@@ -433,7 +464,6 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
shared->policy = policy;
shared->time_stamp = ktime_get();
mutex_init(&shared->timer_mutex);
for_each_cpu(j, policy->cpus) {
struct cpu_dbs_info *j_cdbs = cdata->get_cpu_cdbs(j);
......@@ -450,7 +480,9 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
if (ignore_nice)
j_cdbs->prev_cpu_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];
INIT_DEFERRABLE_WORK(&j_cdbs->dwork, dbs_timer);
__setup_timer(&j_cdbs->timer, dbs_timer_handler,
(unsigned long)j_cdbs,
TIMER_DEFERRABLE | TIMER_IRQSAFE);
}
if (cdata->governor == GOV_CONSERVATIVE) {
......@@ -468,8 +500,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
od_ops->powersave_bias_init_cpu(cpu);
}
gov_queue_work(dbs_data, policy, delay_for_sampling_rate(sampling_rate),
true);
gov_add_timers(policy, delay_for_sampling_rate(sampling_rate));
return 0;
}
......@@ -483,18 +514,9 @@ static int cpufreq_governor_stop(struct cpufreq_policy *policy,
if (!shared || !shared->policy)
return -EBUSY;
/*
* Work-handler must see this updated, as it should not proceed any
* further after governor is disabled. And so timer_mutex is taken while
* updating this value.
*/
mutex_lock(&shared->timer_mutex);
gov_cancel_work(shared);
shared->policy = NULL;
mutex_unlock(&shared->timer_mutex);
gov_cancel_work(dbs_data, policy);
mutex_destroy(&shared->timer_mutex);
return 0;
}
......
......@@ -17,6 +17,7 @@
#ifndef _CPUFREQ_GOVERNOR_H
#define _CPUFREQ_GOVERNOR_H
#include <linux/atomic.h>
#include <linux/cpufreq.h>
#include <linux/kernel_stat.h>
#include <linux/module.h>
......@@ -132,12 +133,14 @@ static void *get_cpu_dbs_info_s(int cpu) \
struct cpu_common_dbs_info {
struct cpufreq_policy *policy;
/*
* percpu mutex that serializes governor limit change with dbs_timer
* invocation. We do not want dbs_timer to run when user is changing
* the governor or limits.
* Per policy mutex that serializes load evaluation from limit-change
* and work-handler.
*/
struct mutex timer_mutex;
ktime_t time_stamp;
atomic_t skip_work;
struct work_struct work;
};
/* Per cpu structures */
......@@ -152,7 +155,7 @@ struct cpu_dbs_info {
* wake-up from idle.
*/
unsigned int prev_load;
struct delayed_work dwork;
struct timer_list timer;
struct cpu_common_dbs_info *shared;
};
......@@ -209,8 +212,7 @@ struct common_dbs_data {
struct cpu_dbs_info *(*get_cpu_cdbs)(int cpu);
void *(*get_cpu_dbs_info_s)(int cpu);
unsigned int (*gov_dbs_timer)(struct cpu_dbs_info *cdbs,
struct dbs_data *dbs_data,
unsigned int (*gov_dbs_timer)(struct cpufreq_policy *policy,
bool modify_all);
void (*gov_check_cpu)(int cpu, unsigned int load);
int (*init)(struct dbs_data *dbs_data, bool notify);
......@@ -269,11 +271,11 @@ static ssize_t show_sampling_rate_min_gov_pol \
extern struct mutex cpufreq_governor_lock;
void gov_add_timers(struct cpufreq_policy *policy, unsigned int delay);
void gov_cancel_work(struct cpu_common_dbs_info *shared);
void dbs_check_cpu(struct dbs_data *dbs_data, int cpu);
int cpufreq_governor_dbs(struct cpufreq_policy *policy,
struct common_dbs_data *cdata, unsigned int event);
void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
unsigned int delay, bool all_cpus);
void od_register_powersave_bias_handler(unsigned int (*f)
(struct cpufreq_policy *, unsigned int, unsigned int),
unsigned int powersave_bias);
......
......@@ -191,10 +191,9 @@ static void od_check_cpu(int cpu, unsigned int load)
}
}
static unsigned int od_dbs_timer(struct cpu_dbs_info *cdbs,
struct dbs_data *dbs_data, bool modify_all)
static unsigned int od_dbs_timer(struct cpufreq_policy *policy, bool modify_all)
{
struct cpufreq_policy *policy = cdbs->shared->policy;
struct dbs_data *dbs_data = policy->governor_data;
unsigned int cpu = policy->cpu;
struct od_cpu_dbs_info_s *dbs_info = &per_cpu(od_cpu_dbs_info,
cpu);
......@@ -247,40 +246,66 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
unsigned int new_rate)
{
struct od_dbs_tuners *od_tuners = dbs_data->tuners;
struct cpumask cpumask;
int cpu;
od_tuners->sampling_rate = new_rate = max(new_rate,
dbs_data->min_sampling_rate);
for_each_online_cpu(cpu) {
/*
* Lock governor so that governor start/stop can't execute in parallel.
*/
mutex_lock(&od_dbs_cdata.mutex);
cpumask_copy(&cpumask, cpu_online_mask);
for_each_cpu(cpu, &cpumask) {
struct cpufreq_policy *policy;
struct od_cpu_dbs_info_s *dbs_info;
struct cpu_dbs_info *cdbs;
struct cpu_common_dbs_info *shared;
unsigned long next_sampling, appointed_at;
policy = cpufreq_cpu_get(cpu);
if (!policy)
continue;
if (policy->governor != &cpufreq_gov_ondemand) {
cpufreq_cpu_put(policy);
continue;
}
dbs_info = &per_cpu(od_cpu_dbs_info, cpu);
cpufreq_cpu_put(policy);
cdbs = &dbs_info->cdbs;
shared = cdbs->shared;
if (!delayed_work_pending(&dbs_info->cdbs.dwork))
/*
* A valid shared and shared->policy means governor hasn't
* stopped or exited yet.
*/
if (!shared || !shared->policy)
continue;
policy = shared->policy;
/* clear all CPUs of this policy */
cpumask_andnot(&cpumask, &cpumask, policy->cpus);
/*
* Update sampling rate for CPUs whose policy is governed by
* dbs_data. In case of governor_per_policy, only a single
* policy will be governed by dbs_data, otherwise there can be
* multiple policies that are governed by the same dbs_data.
*/
if (dbs_data != policy->governor_data)
continue;
/*
* Checking this for any CPU should be fine, timers for all of
* them are scheduled together.
*/
next_sampling = jiffies + usecs_to_jiffies(new_rate);
appointed_at = dbs_info->cdbs.dwork.timer.expires;
appointed_at = dbs_info->cdbs.timer.expires;
if (time_before(next_sampling, appointed_at)) {
cancel_delayed_work_sync(&dbs_info->cdbs.dwork);
gov_queue_work(dbs_data, policy,
usecs_to_jiffies(new_rate), true);
gov_cancel_work(shared);
gov_add_timers(policy, usecs_to_jiffies(new_rate));
}
}
mutex_unlock(&od_dbs_cdata.mutex);
}
static ssize_t store_sampling_rate(struct dbs_data *dbs_data, const char *buf,
......
......@@ -66,6 +66,7 @@ static inline int ceiling_fp(int32_t x)
struct sample {
int32_t core_pct_busy;
int32_t busy_scaled;
u64 aperf;
u64 mperf;
u64 tsc;
......@@ -112,6 +113,7 @@ struct cpudata {
u64 prev_aperf;
u64 prev_mperf;
u64 prev_tsc;
u64 prev_cummulative_iowait;
struct sample sample;
};
......@@ -133,6 +135,7 @@ struct pstate_funcs {
int (*get_scaling)(void);
void (*set)(struct cpudata*, int pstate);
void (*get_vid)(struct cpudata *);
int32_t (*get_target_pstate)(struct cpudata *);
};
struct cpu_defaults {
......@@ -140,6 +143,9 @@ struct cpu_defaults {
struct pstate_funcs funcs;
};
static inline int32_t get_target_pstate_use_performance(struct cpudata *cpu);
static inline int32_t get_target_pstate_use_cpu_load(struct cpudata *cpu);
static struct pstate_adjust_policy pid_params;
static struct pstate_funcs pstate_funcs;
static int hwp_active;
......@@ -738,6 +744,7 @@ static struct cpu_defaults core_params = {
.get_turbo = core_get_turbo_pstate,
.get_scaling = core_get_scaling,
.set = core_set_pstate,
.get_target_pstate = get_target_pstate_use_performance,
},
};
......@@ -758,6 +765,7 @@ static struct cpu_defaults silvermont_params = {
.set = atom_set_pstate,
.get_scaling = silvermont_get_scaling,
.get_vid = atom_get_vid,
.get_target_pstate = get_target_pstate_use_cpu_load,
},
};
......@@ -778,6 +786,7 @@ static struct cpu_defaults airmont_params = {
.set = atom_set_pstate,
.get_scaling = airmont_get_scaling,
.get_vid = atom_get_vid,
.get_target_pstate = get_target_pstate_use_cpu_load,
},
};
......@@ -797,6 +806,7 @@ static struct cpu_defaults knl_params = {
.get_turbo = knl_get_turbo_pstate,
.get_scaling = core_get_scaling,
.set = core_set_pstate,
.get_target_pstate = get_target_pstate_use_performance,
},
};
......@@ -882,12 +892,11 @@ static inline void intel_pstate_sample(struct cpudata *cpu)
local_irq_save(flags);
rdmsrl(MSR_IA32_APERF, aperf);
rdmsrl(MSR_IA32_MPERF, mperf);
if (cpu->prev_mperf == mperf) {
tsc = rdtsc();
if ((cpu->prev_mperf == mperf) || (cpu->prev_tsc == tsc)) {
local_irq_restore(flags);
return;
}
tsc = rdtsc();
local_irq_restore(flags);
cpu->last_sample_time = cpu->sample.time;
......@@ -922,7 +931,43 @@ static inline void intel_pstate_set_sample_time(struct cpudata *cpu)
mod_timer_pinned(&cpu->timer, jiffies + delay);
}
static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu)
static inline int32_t get_target_pstate_use_cpu_load(struct cpudata *cpu)
{
struct sample *sample = &cpu->sample;
u64 cummulative_iowait, delta_iowait_us;
u64 delta_iowait_mperf;
u64 mperf, now;
int32_t cpu_load;
cummulative_iowait = get_cpu_iowait_time_us(cpu->cpu, &now);
/*
* Convert iowait time into number of IO cycles spent at max_freq.
* IO is considered as busy only for the cpu_load algorithm. For
* performance this is not needed since we always try to reach the
* maximum P-State, so we are already boosting the IOs.
*/
delta_iowait_us = cummulative_iowait - cpu->prev_cummulative_iowait;
delta_iowait_mperf = div64_u64(delta_iowait_us * cpu->pstate.scaling *
cpu->pstate.max_pstate, MSEC_PER_SEC);
mperf = cpu->sample.mperf + delta_iowait_mperf;
cpu->prev_cummulative_iowait = cummulative_iowait;
/*
* The load can be estimated as the ratio of the mperf counter
* running at a constant frequency during active periods
* (C0) and the time stamp counter running at the same frequency
* also during C-states.
*/
cpu_load = div64_u64(int_tofp(100) * mperf, sample->tsc);
cpu->sample.busy_scaled = cpu_load;
return cpu->pstate.current_pstate - pid_calc(&cpu->pid, cpu_load);
}
static inline int32_t get_target_pstate_use_performance(struct cpudata *cpu)
{
int32_t core_busy, max_pstate, current_pstate, sample_ratio;
s64 duration_us;
......@@ -960,30 +1005,24 @@ static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu)
core_busy = mul_fp(core_busy, sample_ratio);
}
return core_busy;
cpu->sample.busy_scaled = core_busy;
return cpu->pstate.current_pstate - pid_calc(&cpu->pid, core_busy);
}
static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
{
int32_t busy_scaled;
struct _pid *pid;
signed int ctl;
int from;
int from, target_pstate;
struct sample *sample;
from = cpu->pstate.current_pstate;
pid = &cpu->pid;
busy_scaled = intel_pstate_get_scaled_busy(cpu);
target_pstate = pstate_funcs.get_target_pstate(cpu);
ctl = pid_calc(pid, busy_scaled);
/* Negative values of ctl increase the pstate and vice versa */
intel_pstate_set_pstate(cpu, cpu->pstate.current_pstate - ctl, true);
intel_pstate_set_pstate(cpu, target_pstate, true);
sample = &cpu->sample;
trace_pstate_sample(fp_toint(sample->core_pct_busy),
fp_toint(busy_scaled),
fp_toint(sample->busy_scaled),
from,
cpu->pstate.current_pstate,
sample->mperf,
......@@ -1237,6 +1276,8 @@ static void copy_cpu_funcs(struct pstate_funcs *funcs)
pstate_funcs.get_scaling = funcs->get_scaling;
pstate_funcs.set = funcs->set;
pstate_funcs.get_vid = funcs->get_vid;
pstate_funcs.get_target_pstate = funcs->get_target_pstate;
}
#if IS_ENABLED(CONFIG_ACPI)
......
......@@ -41,16 +41,35 @@
* the original PLL becomes stable at target frequency.
*/
struct mtk_cpu_dvfs_info {
struct cpumask cpus;
struct device *cpu_dev;
struct regulator *proc_reg;
struct regulator *sram_reg;
struct clk *cpu_clk;
struct clk *inter_clk;
struct thermal_cooling_device *cdev;
struct list_head list_head;
int intermediate_voltage;
bool need_voltage_tracking;
};
static LIST_HEAD(dvfs_info_list);
static struct mtk_cpu_dvfs_info *mtk_cpu_dvfs_info_lookup(int cpu)
{
struct mtk_cpu_dvfs_info *info;
struct list_head *list;
list_for_each(list, &dvfs_info_list) {
info = list_entry(list, struct mtk_cpu_dvfs_info, list_head);
if (cpumask_test_cpu(cpu, &info->cpus))
return info;
}
return NULL;
}
static int mtk_cpufreq_voltage_tracking(struct mtk_cpu_dvfs_info *info,
int new_vproc)
{
......@@ -59,7 +78,10 @@ static int mtk_cpufreq_voltage_tracking(struct mtk_cpu_dvfs_info *info,
int old_vproc, old_vsram, new_vsram, vsram, vproc, ret;
old_vproc = regulator_get_voltage(proc_reg);
old_vsram = regulator_get_voltage(sram_reg);
if (old_vproc < 0) {
pr_err("%s: invalid Vproc value: %d\n", __func__, old_vproc);
return old_vproc;
}
/* Vsram should not exceed the maximum allowed voltage of SoC. */
new_vsram = min(new_vproc + MIN_VOLT_SHIFT, MAX_VOLT_LIMIT);
......@@ -72,7 +94,17 @@ static int mtk_cpufreq_voltage_tracking(struct mtk_cpu_dvfs_info *info,
*/
do {
old_vsram = regulator_get_voltage(sram_reg);
if (old_vsram < 0) {
pr_err("%s: invalid Vsram value: %d\n",
__func__, old_vsram);
return old_vsram;
}
old_vproc = regulator_get_voltage(proc_reg);
if (old_vproc < 0) {
pr_err("%s: invalid Vproc value: %d\n",
__func__, old_vproc);
return old_vproc;
}
vsram = min(new_vsram, old_vproc + MAX_VOLT_SHIFT);
......@@ -117,7 +149,17 @@ static int mtk_cpufreq_voltage_tracking(struct mtk_cpu_dvfs_info *info,
*/
do {
old_vproc = regulator_get_voltage(proc_reg);
if (old_vproc < 0) {
pr_err("%s: invalid Vproc value: %d\n",
__func__, old_vproc);
return old_vproc;
}
old_vsram = regulator_get_voltage(sram_reg);
if (old_vsram < 0) {
pr_err("%s: invalid Vsram value: %d\n",
__func__, old_vsram);
return old_vsram;
}
vproc = max(new_vproc, old_vsram - MAX_VOLT_SHIFT);
ret = regulator_set_voltage(proc_reg, vproc,
......@@ -185,6 +227,10 @@ static int mtk_cpufreq_set_target(struct cpufreq_policy *policy,
old_freq_hz = clk_get_rate(cpu_clk);
old_vproc = regulator_get_voltage(info->proc_reg);
if (old_vproc < 0) {
pr_err("%s: invalid Vproc value: %d\n", __func__, old_vproc);
return old_vproc;
}
freq_hz = freq_table[index].frequency * 1000;
......@@ -344,7 +390,15 @@ static int mtk_cpu_dvfs_info_init(struct mtk_cpu_dvfs_info *info, int cpu)
/* Both presence and absence of sram regulator are valid cases. */
sram_reg = regulator_get_exclusive(cpu_dev, "sram");
ret = dev_pm_opp_of_add_table(cpu_dev);
/* Get OPP-sharing information from "operating-points-v2" bindings */
ret = dev_pm_opp_of_get_sharing_cpus(cpu_dev, &info->cpus);
if (ret) {
pr_err("failed to get OPP-sharing information for cpu%d\n",
cpu);
goto out_free_resources;
}
ret = dev_pm_opp_of_cpumask_add_table(&info->cpus);
if (ret) {
pr_warn("no OPP table for cpu%d\n", cpu);
goto out_free_resources;
......@@ -378,7 +432,7 @@ static int mtk_cpu_dvfs_info_init(struct mtk_cpu_dvfs_info *info, int cpu)
return 0;
out_free_opp_table:
dev_pm_opp_of_remove_table(cpu_dev);
dev_pm_opp_of_cpumask_remove_table(&info->cpus);
out_free_resources:
if (!IS_ERR(proc_reg))
......@@ -404,7 +458,7 @@ static void mtk_cpu_dvfs_info_release(struct mtk_cpu_dvfs_info *info)
if (!IS_ERR(info->inter_clk))
clk_put(info->inter_clk);
dev_pm_opp_of_remove_table(info->cpu_dev);
dev_pm_opp_of_cpumask_remove_table(&info->cpus);
}
static int mtk_cpufreq_init(struct cpufreq_policy *policy)
......@@ -413,22 +467,18 @@ static int mtk_cpufreq_init(struct cpufreq_policy *policy)
struct cpufreq_frequency_table *freq_table;
int ret;
info = kzalloc(sizeof(*info), GFP_KERNEL);
if (!info)
return -ENOMEM;
ret = mtk_cpu_dvfs_info_init(info, policy->cpu);
if (ret) {
pr_err("%s failed to initialize dvfs info for cpu%d\n",
__func__, policy->cpu);
goto out_free_dvfs_info;
info = mtk_cpu_dvfs_info_lookup(policy->cpu);
if (!info) {
pr_err("dvfs info for cpu%d is not initialized.\n",
policy->cpu);
return -EINVAL;
}
ret = dev_pm_opp_init_cpufreq_table(info->cpu_dev, &freq_table);
if (ret) {
pr_err("failed to init cpufreq table for cpu%d: %d\n",
policy->cpu, ret);
goto out_release_dvfs_info;
return ret;
}
ret = cpufreq_table_validate_and_show(policy, freq_table);
......@@ -437,8 +487,7 @@ static int mtk_cpufreq_init(struct cpufreq_policy *policy)
goto out_free_cpufreq_table;
}
/* CPUs in the same cluster share a clock and power domain. */
cpumask_copy(policy->cpus, &cpu_topology[policy->cpu].core_sibling);
cpumask_copy(policy->cpus, &info->cpus);
policy->driver_data = info;
policy->clk = info->cpu_clk;
......@@ -446,13 +495,6 @@ static int mtk_cpufreq_init(struct cpufreq_policy *policy)
out_free_cpufreq_table:
dev_pm_opp_free_cpufreq_table(info->cpu_dev, &freq_table);
out_release_dvfs_info:
mtk_cpu_dvfs_info_release(info);
out_free_dvfs_info:
kfree(info);
return ret;
}
......@@ -462,14 +504,13 @@ static int mtk_cpufreq_exit(struct cpufreq_policy *policy)
cpufreq_cooling_unregister(info->cdev);
dev_pm_opp_free_cpufreq_table(info->cpu_dev, &policy->freq_table);
mtk_cpu_dvfs_info_release(info);
kfree(info);
return 0;
}
static struct cpufreq_driver mt8173_cpufreq_driver = {
.flags = CPUFREQ_STICKY | CPUFREQ_NEED_INITIAL_FREQ_CHECK,
.flags = CPUFREQ_STICKY | CPUFREQ_NEED_INITIAL_FREQ_CHECK |
CPUFREQ_HAVE_GOVERNOR_PER_POLICY,
.verify = cpufreq_generic_frequency_table_verify,
.target_index = mtk_cpufreq_set_target,
.get = cpufreq_generic_get,
......@@ -482,11 +523,47 @@ static struct cpufreq_driver mt8173_cpufreq_driver = {
static int mt8173_cpufreq_probe(struct platform_device *pdev)
{
int ret;
struct mtk_cpu_dvfs_info *info;
struct list_head *list, *tmp;
int cpu, ret;
for_each_possible_cpu(cpu) {
info = mtk_cpu_dvfs_info_lookup(cpu);
if (info)
continue;
info = devm_kzalloc(&pdev->dev, sizeof(*info), GFP_KERNEL);
if (!info) {
ret = -ENOMEM;
goto release_dvfs_info_list;
}
ret = mtk_cpu_dvfs_info_init(info, cpu);
if (ret) {
dev_err(&pdev->dev,
"failed to initialize dvfs info for cpu%d\n",
cpu);
goto release_dvfs_info_list;
}
list_add(&info->list_head, &dvfs_info_list);
}
ret = cpufreq_register_driver(&mt8173_cpufreq_driver);
if (ret)
pr_err("failed to register mtk cpufreq driver\n");
if (ret) {
dev_err(&pdev->dev, "failed to register mtk cpufreq driver\n");
goto release_dvfs_info_list;
}
return 0;
release_dvfs_info_list:
list_for_each_safe(list, tmp, &dvfs_info_list) {
info = list_entry(list, struct mtk_cpu_dvfs_info, list_head);
mtk_cpu_dvfs_info_release(info);
list_del(list);
}
return ret;
}
......
......@@ -555,6 +555,8 @@ static int pcc_cpufreq_cpu_init(struct cpufreq_policy *policy)
policy->min = policy->cpuinfo.min_freq =
ioread32(&pcch_hdr->minimum_frequency) * 1000;
policy->cpuinfo.transition_latency = CPUFREQ_ETERNAL;
pr_debug("init: policy->max is %d, policy->min is %d\n",
policy->max, policy->min);
out:
......
......@@ -12,6 +12,7 @@
#include <linux/clk.h>
#include <linux/cpufreq.h>
#include <linux/cpu_cooling.h>
#include <linux/errno.h>
#include <linux/init.h>
#include <linux/kernel.h>
......@@ -33,6 +34,7 @@
struct cpu_data {
struct clk **pclk;
struct cpufreq_frequency_table *table;
struct thermal_cooling_device *cdev;
};
/**
......@@ -321,6 +323,27 @@ static int qoriq_cpufreq_target(struct cpufreq_policy *policy,
return clk_set_parent(policy->clk, parent);
}
static void qoriq_cpufreq_ready(struct cpufreq_policy *policy)
{
struct cpu_data *cpud = policy->driver_data;
struct device_node *np = of_get_cpu_node(policy->cpu, NULL);
if (of_find_property(np, "#cooling-cells", NULL)) {
cpud->cdev = of_cpufreq_cooling_register(np,
policy->related_cpus);
if (IS_ERR(cpud->cdev)) {
pr_err("Failed to register cooling device cpu%d: %ld\n",
policy->cpu, PTR_ERR(cpud->cdev));
cpud->cdev = NULL;
}
}
of_node_put(np);
}
static struct cpufreq_driver qoriq_cpufreq_driver = {
.name = "qoriq_cpufreq",
.flags = CPUFREQ_CONST_LOOPS,
......@@ -329,6 +352,7 @@ static struct cpufreq_driver qoriq_cpufreq_driver = {
.verify = cpufreq_generic_frequency_table_verify,
.target_index = qoriq_cpufreq_target,
.get = cpufreq_generic_get,
.ready = qoriq_cpufreq_ready,
.attr = cpufreq_generic_attr,
};
......
/*
* Match running platform with pre-defined OPP values for CPUFreq
*
* Author: Ajit Pal Singh <ajitpal.singh@st.com>
* Lee Jones <lee.jones@linaro.org>
*
* Copyright (C) 2015 STMicroelectronics (R&D) Limited
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the version 2 of the GNU General Public License as
* published by the Free Software Foundation
*/
#include <linux/cpu.h>
#include <linux/io.h>
#include <linux/mfd/syscon.h>
#include <linux/module.h>
#include <linux/of.h>
#include <linux/of_platform.h>
#include <linux/pm_opp.h>
#include <linux/regmap.h>
#define VERSION_ELEMENTS 3
#define MAX_PCODE_NAME_LEN 7
#define VERSION_SHIFT 28
#define HW_INFO_INDEX 1
#define MAJOR_ID_INDEX 1
#define MINOR_ID_INDEX 2
/*
* Only match on "suitable for ALL versions" entries
*
* This will be used with the BIT() macro. It sets the
* top bit of a 32bit value and is equal to 0x80000000.
*/
#define DEFAULT_VERSION 31
enum {
PCODE = 0,
SUBSTRATE,
DVFS_MAX_REGFIELDS,
};
/**
* ST CPUFreq Driver Data
*
* @cpu_node CPU's OF node
* @syscfg_eng Engineering Syscon register map
* @regmap Syscon register map
*/
static struct sti_cpufreq_ddata {
struct device *cpu;
struct regmap *syscfg_eng;
struct regmap *syscfg;
} ddata;
static int sti_cpufreq_fetch_major(void) {
struct device_node *np = ddata.cpu->of_node;
struct device *dev = ddata.cpu;
unsigned int major_offset;
unsigned int socid;
int ret;
ret = of_property_read_u32_index(np, "st,syscfg",
MAJOR_ID_INDEX, &major_offset);
if (ret) {
dev_err(dev, "No major number offset provided in %s [%d]\n",
np->full_name, ret);
return ret;
}
ret = regmap_read(ddata.syscfg, major_offset, &socid);
if (ret) {
dev_err(dev, "Failed to read major number from syscon [%d]\n",
ret);
return ret;
}
return ((socid >> VERSION_SHIFT) & 0xf) + 1;
}
static int sti_cpufreq_fetch_minor(void)
{
struct device *dev = ddata.cpu;
struct device_node *np = dev->of_node;
unsigned int minor_offset;
unsigned int minid;
int ret;
ret = of_property_read_u32_index(np, "st,syscfg-eng",
MINOR_ID_INDEX, &minor_offset);
if (ret) {
dev_err(dev,
"No minor number offset provided %s [%d]\n",
np->full_name, ret);
return ret;
}
ret = regmap_read(ddata.syscfg_eng, minor_offset, &minid);
if (ret) {
dev_err(dev,
"Failed to read the minor number from syscon [%d]\n",
ret);
return ret;
}
return minid & 0xf;
}
static int sti_cpufreq_fetch_regmap_field(const struct reg_field *reg_fields,
int hw_info_offset, int field)
{
struct regmap_field *regmap_field;
struct reg_field reg_field = reg_fields[field];
struct device *dev = ddata.cpu;
unsigned int value;
int ret;
reg_field.reg = hw_info_offset;
regmap_field = devm_regmap_field_alloc(dev,
ddata.syscfg_eng,
reg_field);
if (IS_ERR(regmap_field)) {
dev_err(dev, "Failed to allocate reg field\n");
return PTR_ERR(regmap_field);
}
ret = regmap_field_read(regmap_field, &value);
if (ret) {
dev_err(dev, "Failed to read %s code\n",
field ? "SUBSTRATE" : "PCODE");
return ret;
}
return value;
}
static const struct reg_field sti_stih407_dvfs_regfields[DVFS_MAX_REGFIELDS] = {
[PCODE] = REG_FIELD(0, 16, 19),
[SUBSTRATE] = REG_FIELD(0, 0, 2),
};
static const struct reg_field *sti_cpufreq_match(void)
{
if (of_machine_is_compatible("st,stih407") ||
of_machine_is_compatible("st,stih410"))
return sti_stih407_dvfs_regfields;
return NULL;
}
static int sti_cpufreq_set_opp_info(void)
{
struct device *dev = ddata.cpu;
struct device_node *np = dev->of_node;
const struct reg_field *reg_fields;
unsigned int hw_info_offset;
unsigned int version[VERSION_ELEMENTS];
int pcode, substrate, major, minor;
int ret;
char name[MAX_PCODE_NAME_LEN];
reg_fields = sti_cpufreq_match();
if (!reg_fields) {
dev_err(dev, "This SoC doesn't support voltage scaling");
return -ENODEV;
}
ret = of_property_read_u32_index(np, "st,syscfg-eng",
HW_INFO_INDEX, &hw_info_offset);
if (ret) {
dev_warn(dev, "Failed to read HW info offset from DT\n");
substrate = DEFAULT_VERSION;
pcode = 0;
goto use_defaults;
}
pcode = sti_cpufreq_fetch_regmap_field(reg_fields,
hw_info_offset,
PCODE);
if (pcode < 0) {
dev_warn(dev, "Failed to obtain process code\n");
/* Use default pcode */
pcode = 0;
}
substrate = sti_cpufreq_fetch_regmap_field(reg_fields,
hw_info_offset,
SUBSTRATE);
if (substrate) {
dev_warn(dev, "Failed to obtain substrate code\n");
/* Use default substrate */
substrate = DEFAULT_VERSION;
}
use_defaults:
major = sti_cpufreq_fetch_major();
if (major < 0) {
dev_err(dev, "Failed to obtain major version\n");
/* Use default major number */
major = DEFAULT_VERSION;
}
minor = sti_cpufreq_fetch_minor();
if (minor < 0) {
dev_err(dev, "Failed to obtain minor version\n");
/* Use default minor number */
minor = DEFAULT_VERSION;
}
snprintf(name, MAX_PCODE_NAME_LEN, "pcode%d", pcode);
ret = dev_pm_opp_set_prop_name(dev, name);
if (ret) {
dev_err(dev, "Failed to set prop name\n");
return ret;
}
version[0] = BIT(major);
version[1] = BIT(minor);
version[2] = BIT(substrate);
ret = dev_pm_opp_set_supported_hw(dev, version, VERSION_ELEMENTS);
if (ret) {
dev_err(dev, "Failed to set supported hardware\n");
return ret;
}
dev_dbg(dev, "pcode: %d major: %d minor: %d substrate: %d\n",
pcode, major, minor, substrate);
dev_dbg(dev, "version[0]: %x version[1]: %x version[2]: %x\n",
version[0], version[1], version[2]);
return 0;
}
static int sti_cpufreq_fetch_syscon_regsiters(void)
{
struct device *dev = ddata.cpu;
struct device_node *np = dev->of_node;
ddata.syscfg = syscon_regmap_lookup_by_phandle(np, "st,syscfg");
if (IS_ERR(ddata.syscfg)) {
dev_err(dev, "\"st,syscfg\" not supplied\n");
return PTR_ERR(ddata.syscfg);
}
ddata.syscfg_eng = syscon_regmap_lookup_by_phandle(np, "st,syscfg-eng");
if (IS_ERR(ddata.syscfg_eng)) {
dev_err(dev, "\"st,syscfg-eng\" not supplied\n");
return PTR_ERR(ddata.syscfg_eng);
}
return 0;
}
static int sti_cpufreq_init(void)
{
int ret;
ddata.cpu = get_cpu_device(0);
if (!ddata.cpu) {
dev_err(ddata.cpu, "Failed to get device for CPU0\n");
goto skip_voltage_scaling;
}
if (!of_get_property(ddata.cpu->of_node, "operating-points-v2", NULL)) {
dev_err(ddata.cpu, "OPP-v2 not supported\n");
goto skip_voltage_scaling;
}
ret = sti_cpufreq_fetch_syscon_regsiters();
if (ret)
goto skip_voltage_scaling;
ret = sti_cpufreq_set_opp_info();
if (!ret)
goto register_cpufreq_dt;
skip_voltage_scaling:
dev_err(ddata.cpu, "Not doing voltage scaling\n");
register_cpufreq_dt:
platform_device_register_simple("cpufreq-dt", -1, NULL, 0);
return 0;
}
module_init(sti_cpufreq_init);
MODULE_DESCRIPTION("STMicroelectronics CPUFreq/OPP driver");
MODULE_AUTHOR("Ajitpal Singh <ajitpal.singh@st.com>");
MODULE_AUTHOR("Lee Jones <lee.jones@linaro.org>");
MODULE_LICENSE("GPL v2");
......@@ -278,7 +278,6 @@ struct cpufreq_driver {
struct freq_attr **attr;
/* platform specific boost support code */
bool boost_supported;
bool boost_enabled;
int (*set_boost)(int state);
};
......@@ -574,7 +573,6 @@ ssize_t cpufreq_show_cpus(const struct cpumask *mask, char *buf);
#ifdef CONFIG_CPU_FREQ
int cpufreq_boost_trigger_state(int state);
int cpufreq_boost_supported(void);
int cpufreq_boost_enabled(void);
int cpufreq_enable_boost_support(void);
bool policy_has_boost_freq(struct cpufreq_policy *policy);
......@@ -583,10 +581,6 @@ static inline int cpufreq_boost_trigger_state(int state)
{
return 0;
}
static inline int cpufreq_boost_supported(void)
{
return 0;
}
static inline int cpufreq_boost_enabled(void)
{
return 0;
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment