Commit fecc8c0e authored by Rafael J. Wysocki

Merge branch 'pm-cpufreq'

* pm-cpufreq: (51 commits)
  Documentation: intel_pstate: Document HWP energy/performance hints
  cpufreq: intel_pstate: Support for energy performance hints with HWP
  cpufreq: intel_pstate: Add locking around HWP requests
  cpufreq: ondemand: Set MIN_FREQUENCY_UP_THRESHOLD to 1
  cpufreq: intel_pstate: Add Knights Mill CPUID
  MAINTAINERS: Add bug tracking system location entry for cpufreq
  cpufreq: dt: Add support for zx296718
  cpufreq: acpi-cpufreq: drop rdmsr_on_cpus() usage
  cpufreq: acpi-cpufreq: Convert to hotplug state machine
  cpufreq: intel_pstate: fix intel_pstate_exit_perf_limits() prototype
  cpufreq: intel_pstate: Set EPP/EPB to 0 in performance mode
  cpufreq: schedutil: Rectify comment in sugov_irq_work() function
  cpufreq: intel_pstate: increase precision of performance limits
  cpufreq: intel_pstate: round up min_perf limits
  cpufreq: Make cpufreq_update_policy() void
  ACPI / processor: Make acpi_processor_ppc_has_changed() void
  cpufreq: Avoid using inactive policies
  cpufreq: intel_pstate: Generic governors support
  cpufreq: intel_pstate: Request P-states control from SMM if needed
  cpufreq: dt: Add support for r8a7743 and r8a7745
  ...
parents 57def856 2bf3b685
......@@ -44,11 +44,17 @@ the stats driver insertion.
total 0
drwxr-xr-x 2 root root 0 May 14 16:06 .
drwxr-xr-x 3 root root 0 May 14 15:58 ..
--w------- 1 root root 4096 May 14 16:06 reset
-r--r--r-- 1 root root 4096 May 14 16:06 time_in_state
-r--r--r-- 1 root root 4096 May 14 16:06 total_trans
-r--r--r-- 1 root root 4096 May 14 16:06 trans_table
--------------------------------------------------------------------------------
- reset
Write-only attribute that can be used to reset the stat counters. This can be
useful for evaluating system behaviour under different governors without the
need for a reboot.
- time_in_state
This gives the amount of time spent in each of the frequencies supported by
this CPU. The cat output will have "<frequency> <time>" pair in each line, which
......
......@@ -48,7 +48,7 @@ In addition to the frequency-controlling interfaces provided by the cpufreq
core, the driver provides its own sysfs files to control the P-State selection.
These files have been added to /sys/devices/system/cpu/intel_pstate/.
Any changes made to these files are applicable to all CPUs (even in a
multi-package system).
multi-package system; refer to the later section on "Per-CPU limits").
max_perf_pct: Limits the maximum P-State that will be requested by
the driver. It states it as a percentage of the available performance. The
......@@ -120,13 +120,57 @@ frequency is fictional for Intel Core processors. Even if the scaling
driver selects a single P-State, the actual frequency the processor
will run at is selected by the processor itself.
Per-CPU limits
The kernel command line option "intel_pstate=per_cpu_perf_limits" forces
the intel_pstate driver to use per-CPU performance limits. When it is set,
the sysfs control interface described above is subject to limitations.
- The following controls are not available for either read or write:
/sys/devices/system/cpu/intel_pstate/max_perf_pct
/sys/devices/system/cpu/intel_pstate/min_perf_pct
- The following controls can be used to set performance limits, as far as the
architecture of the processor permits:
/sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
/sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq
/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
- Users can still read the turbo percentage and the number of P-States from
/sys/devices/system/cpu/intel_pstate/turbo_pct
/sys/devices/system/cpu/intel_pstate/num_pstates
- Users can read and write the system-wide turbo status via
/sys/devices/system/cpu/no_turbo
Support of energy performance hints
It is possible to provide hints to the HWP algorithms in the processor,
ranging from more performance-centric to more energy-centric. When the
driver is using HWP, two additional cpufreq sysfs attributes are presented
for each logical CPU.
These attributes are:
- energy_performance_available_preferences
- energy_performance_preference
To get the list of supported hints:
$ cat energy_performance_available_preferences
default performance balance_performance balance_power power
The current preference can be read or changed via the cpufreq sysfs
attribute "energy_performance_preference". Reading from this attribute
displays the current effective setting. Users can write any of the valid
preference strings to this attribute, and can always restore the power-on
default by writing "default".
Since threads can migrate to different CPUs, it is possible that the
new CPU has a different energy performance preference than the previous
one. To avoid such issues, either pin threads to specific CPUs or set the
same energy performance preference value on all CPUs.
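A minimal userspace sketch of checking a string against the hint list shown above before writing it to "energy_performance_preference". The helper name epp_hint_is_valid() is hypothetical and not part of the driver; the list of strings is copied from the example output and may differ on a given system.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hint strings copied from the example output of
 * energy_performance_available_preferences above. */
static const char *const epp_hints[] = {
	"default",
	"performance",
	"balance_performance",
	"balance_power",
	"power",
};

/* Hypothetical helper: returns true if 'hint' is one of the listed strings. */
static bool epp_hint_is_valid(const char *hint)
{
	size_t i;

	for (i = 0; i < sizeof(epp_hints) / sizeof(epp_hints[0]); i++) {
		if (strcmp(hint, epp_hints[i]) == 0)
			return true;
	}
	return false;
}
```

A tool that applies the same preference to every CPU could run such a check once and then write the string to each per-CPU attribute.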
Tuning Intel P-State driver
When HWP mode is not used, debugfs files have also been added to allow the
tuning of the internal governor algorithm. These files are located at
/sys/kernel/debug/pstate_snb/. The algorithm uses a PID (Proportional
Integral Derivative) controller. The PID tunable parameters are:
When the performance can be tuned using a PID (Proportional Integral
Derivative) controller, debugfs files are provided for adjusting performance.
They are presented under:
/sys/kernel/debug/pstate_snb/
The PID tunable parameters are:
deadband
d_gain_pct
i_gain_pct
......
Broadcom AVS mail box and interrupt register bindings
=====================================================
A total of three DT nodes are required. One node (brcm,avs-cpu-data-mem)
references the mailbox register used to communicate with the AVS CPU[1]. The
second node (brcm,avs-cpu-l2-intr) is required to trigger an interrupt on
the AVS CPU. The interrupt tells the AVS CPU that it needs to process a
command sent to it by a driver. Interrupting the AVS CPU is mandatory for
commands to be processed.
The interface also requires a reference to the AVS host interrupt controller,
so a driver can react to interrupts generated by the AVS CPU whenever a command
has been processed. See [2] for more information on the brcm,l2-intc node.
[1] The AVS CPU is an independent co-processor that runs proprietary
firmware. On some SoCs, this firmware supports DFS and DVFS in addition to
Adaptive Voltage Scaling.
[2] Documentation/devicetree/bindings/interrupt-controller/brcm,l2-intc.txt
Node brcm,avs-cpu-data-mem
--------------------------
Required properties:
- compatible: must include: brcm,avs-cpu-data-mem and
should include: one of brcm,bcm7271-avs-cpu-data-mem or
brcm,bcm7268-avs-cpu-data-mem
- reg: Specifies base physical address and size of the registers.
- interrupts: The interrupt that the AVS CPU will use to interrupt the host
when a command has completed.
- interrupt-parent: The interrupt controller the above interrupt is routed
through.
- interrupt-names: The name of the interrupt used to interrupt the host.
Optional properties:
- None
Node brcm,avs-cpu-l2-intr
-------------------------
Required properties:
- compatible: must include: brcm,avs-cpu-l2-intr and
should include: one of brcm,bcm7271-avs-cpu-l2-intr or
brcm,bcm7268-avs-cpu-l2-intr
- reg: Specifies base physical address and size of the registers.
Optional properties:
- None
Example
=======
avs_host_l2_intc: interrupt-controller@f04d1200 {
#interrupt-cells = <1>;
compatible = "brcm,l2-intc";
interrupt-parent = <&intc>;
reg = <0xf04d1200 0x48>;
interrupt-controller;
interrupts = <0x0 0x19 0x0>;
interrupt-names = "avs";
};
avs-cpu-data-mem@f04c4000 {
compatible = "brcm,bcm7271-avs-cpu-data-mem",
"brcm,avs-cpu-data-mem";
reg = <0xf04c4000 0x60>;
interrupts = <0x1a>;
interrupt-parent = <&avs_host_l2_intc>;
interrupt-names = "sw_intr";
};
avs-cpu-l2-intr@f04d1100 {
compatible = "brcm,bcm7271-avs-cpu-l2-intr",
"brcm,avs-cpu-l2-intr";
reg = <0xf04d1100 0x10>;
};
......@@ -1760,6 +1760,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
disable
Do not enable intel_pstate as the default
scaling driver for the supported processors
passive
Use intel_pstate as a scaling driver, but configure it
to work with generic cpufreq governors (instead of
enabling its internal governor). This mode cannot be
used along with the hardware-managed P-states (HWP)
feature.
force
Enable intel_pstate on systems that prohibit it by default
in favor of acpi-cpufreq. Forcing the intel_pstate driver
......@@ -1780,6 +1786,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
Description Table, specifies preferred power management
profile as "Enterprise Server" or "Performance Server",
then this feature is turned on by default.
per_cpu_perf_limits
Allow per-logical-CPU P-State performance control limits using
cpufreq sysfs interface
intremap= [X86-64, Intel-IOMMU]
on enable Interrupt Remapping (default)
......
......@@ -2749,6 +2749,14 @@ L: bcm-kernel-feedback-list@broadcom.com
S: Maintained
F: drivers/mtd/nand/brcmnand/
BROADCOM STB AVS CPUFREQ DRIVER
M: Markus Mayer <mmayer@broadcom.com>
M: bcm-kernel-feedback-list@broadcom.com
L: linux-pm@vger.kernel.org
S: Maintained
F: Documentation/devicetree/bindings/cpufreq/brcm,stb-avs-cpu-freq.txt
F: drivers/cpufreq/brcmstb*
BROADCOM SPECIFIC AMBA DRIVER (BCMA)
M: Rafał Miłecki <zajec5@gmail.com>
L: linux-wireless@vger.kernel.org
......@@ -3341,6 +3349,7 @@ L: linux-pm@vger.kernel.org
S: Maintained
T: git git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
T: git git://git.linaro.org/people/vireshk/linux.git (For ARM Updates)
B: https://bugzilla.kernel.org
F: Documentation/cpu-freq/
F: drivers/cpufreq/
F: include/linux/cpufreq.h
......
......@@ -157,7 +157,7 @@ static void acpi_processor_ppc_ost(acpi_handle handle, int status)
status, NULL);
}
int acpi_processor_ppc_has_changed(struct acpi_processor *pr, int event_flag)
void acpi_processor_ppc_has_changed(struct acpi_processor *pr, int event_flag)
{
int ret;
......@@ -168,7 +168,7 @@ int acpi_processor_ppc_has_changed(struct acpi_processor *pr, int event_flag)
*/
if (event_flag)
acpi_processor_ppc_ost(pr->handle, 1);
return 0;
return;
}
ret = acpi_processor_get_platform_limit(pr);
......@@ -182,10 +182,8 @@ int acpi_processor_ppc_has_changed(struct acpi_processor *pr, int event_flag)
else
acpi_processor_ppc_ost(pr->handle, 0);
}
if (ret < 0)
return (ret);
else
return cpufreq_update_policy(pr->id);
if (ret >= 0)
cpufreq_update_policy(pr->id);
}
int acpi_processor_get_bios_limit(int cpu, unsigned int *limit)
......@@ -465,11 +463,33 @@ int acpi_processor_get_performance_info(struct acpi_processor *pr)
return result;
}
EXPORT_SYMBOL_GPL(acpi_processor_get_performance_info);
int acpi_processor_notify_smm(struct module *calling_module)
int acpi_processor_pstate_control(void)
{
acpi_status status;
static int is_done = 0;
if (!acpi_gbl_FADT.smi_command || !acpi_gbl_FADT.pstate_control)
return 0;
ACPI_DEBUG_PRINT((ACPI_DB_INFO,
"Writing pstate_control [0x%x] to smi_command [0x%x]\n",
acpi_gbl_FADT.pstate_control, acpi_gbl_FADT.smi_command));
status = acpi_os_write_port(acpi_gbl_FADT.smi_command,
(u32)acpi_gbl_FADT.pstate_control, 8);
if (ACPI_SUCCESS(status))
return 1;
ACPI_EXCEPTION((AE_INFO, status,
"Failed to write pstate_control [0x%x] to smi_command [0x%x]",
acpi_gbl_FADT.pstate_control, acpi_gbl_FADT.smi_command));
return -EIO;
}
int acpi_processor_notify_smm(struct module *calling_module)
{
static int is_done = 0;
int result;
if (!(acpi_processor_ppc_status & PPC_REGISTERED))
return -EBUSY;
......@@ -492,26 +512,15 @@ int acpi_processor_notify_smm(struct module *calling_module)
is_done = -EIO;
/* Can't write pstate_control to smi_command if either value is zero */
if ((!acpi_gbl_FADT.smi_command) || (!acpi_gbl_FADT.pstate_control)) {
result = acpi_processor_pstate_control();
if (!result) {
ACPI_DEBUG_PRINT((ACPI_DB_INFO, "No SMI port or pstate_control\n"));
module_put(calling_module);
return 0;
}
ACPI_DEBUG_PRINT((ACPI_DB_INFO,
"Writing pstate_control [0x%x] to smi_command [0x%x]\n",
acpi_gbl_FADT.pstate_control, acpi_gbl_FADT.smi_command));
status = acpi_os_write_port(acpi_gbl_FADT.smi_command,
(u32) acpi_gbl_FADT.pstate_control, 8);
if (ACPI_FAILURE(status)) {
ACPI_EXCEPTION((AE_INFO, status,
"Failed to write pstate_control [0x%x] to "
"smi_command [0x%x]", acpi_gbl_FADT.pstate_control,
acpi_gbl_FADT.smi_command));
if (result < 0) {
module_put(calling_module);
return status;
return result;
}
/* Success. If there's no _PPC, we need to fear nothing, so
......
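The new acpi_processor_pstate_control() helper in the hunks above returns one of three values, which acpi_processor_notify_smm() then tells apart. The following is a userspace model of that contract only, not the kernel code: the function name pstate_control_model() and the write_ok flag (standing in for the result of the port write) are assumptions made for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the return contract:
 *   0    - no SMI port or no pstate_control value, nothing was written
 *   1    - the value was written successfully
 *   -EIO - the write was attempted and failed
 */
static int pstate_control_model(uint32_t smi_command, uint8_t pstate_control,
				bool write_ok)
{
	if (!smi_command || !pstate_control)
		return 0;

	return write_ok ? 1 : -EIO;
}
```

With this shape a caller can treat zero as "nothing to do", a negative value as an error to propagate, and a positive value as success.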
......@@ -12,6 +12,27 @@ config ARM_BIG_LITTLE_CPUFREQ
help
This enables the Generic CPUfreq driver for ARM big.LITTLE platforms.
config ARM_BRCMSTB_AVS_CPUFREQ
tristate "Broadcom STB AVS CPUfreq driver"
depends on ARCH_BRCMSTB || COMPILE_TEST
default y
help
Some Broadcom STB SoCs use a co-processor running proprietary firmware
("AVS") to handle voltage and frequency scaling. This driver provides
a standard CPUfreq interface to the firmware.
Say Y, if you have a Broadcom SoC with AVS support for DFS or DVFS.
config ARM_BRCMSTB_AVS_CPUFREQ_DEBUG
bool "Broadcom STB AVS CPUfreq driver sysfs debug capability"
depends on ARM_BRCMSTB_AVS_CPUFREQ
help
Enabling this option turns on debug support via sysfs under
/sys/kernel/debug/brcmstb-avs-cpufreq. It is possible to read all and
write some AVS mailbox registers through sysfs entries.
If in doubt, say N.
config ARM_DT_BL_CPUFREQ
tristate "Generic probing via DT for ARM big LITTLE CPUfreq driver"
depends on ARM_BIG_LITTLE_CPUFREQ && OF
......@@ -60,14 +81,6 @@ config ARM_IMX6Q_CPUFREQ
If in doubt, say N.
config ARM_INTEGRATOR
tristate "CPUfreq driver for ARM Integrator CPUs"
depends on ARCH_INTEGRATOR
default y
help
This enables the CPUfreq driver for ARM Integrator CPUs.
If in doubt, say Y.
config ARM_KIRKWOOD_CPUFREQ
def_bool MACH_KIRKWOOD
help
......
......@@ -51,12 +51,12 @@ obj-$(CONFIG_ARM_BIG_LITTLE_CPUFREQ) += arm_big_little.o
# LITTLE drivers, so that it is probed last.
obj-$(CONFIG_ARM_DT_BL_CPUFREQ) += arm_big_little_dt.o
obj-$(CONFIG_ARM_BRCMSTB_AVS_CPUFREQ) += brcmstb-avs-cpufreq.o
obj-$(CONFIG_ARCH_DAVINCI) += davinci-cpufreq.o
obj-$(CONFIG_UX500_SOC_DB8500) += dbx500-cpufreq.o
obj-$(CONFIG_ARM_EXYNOS5440_CPUFREQ) += exynos5440-cpufreq.o
obj-$(CONFIG_ARM_HIGHBANK_CPUFREQ) += highbank-cpufreq.o
obj-$(CONFIG_ARM_IMX6Q_CPUFREQ) += imx6q-cpufreq.o
obj-$(CONFIG_ARM_INTEGRATOR) += integrator-cpufreq.o
obj-$(CONFIG_ARM_KIRKWOOD_CPUFREQ) += kirkwood-cpufreq.o
obj-$(CONFIG_ARM_MT8173_CPUFREQ) += mt8173-cpufreq.o
obj-$(CONFIG_ARM_OMAP2PLUS_CPUFREQ) += omap-cpufreq.o
......
......@@ -84,7 +84,6 @@ static inline struct acpi_processor_performance *to_perf_data(struct acpi_cpufre
static struct cpufreq_driver acpi_cpufreq_driver;
static unsigned int acpi_pstate_strict;
static struct msr __percpu *msrs;
static bool boost_state(unsigned int cpu)
{
......@@ -104,11 +103,10 @@ static bool boost_state(unsigned int cpu)
return false;
}
static void boost_set_msrs(bool enable, const struct cpumask *cpumask)
static int boost_set_msr(bool enable)
{
u32 cpu;
u32 msr_addr;
u64 msr_mask;
u64 msr_mask, val;
switch (boot_cpu_data.x86_vendor) {
case X86_VENDOR_INTEL:
......@@ -120,26 +118,31 @@ static void boost_set_msrs(bool enable, const struct cpumask *cpumask)
msr_mask = MSR_K7_HWCR_CPB_DIS;
break;
default:
return;
return -EINVAL;
}
rdmsr_on_cpus(cpumask, msr_addr, msrs);
rdmsrl(msr_addr, val);
for_each_cpu(cpu, cpumask) {
struct msr *reg = per_cpu_ptr(msrs, cpu);
if (enable)
reg->q &= ~msr_mask;
val &= ~msr_mask;
else
reg->q |= msr_mask;
}
val |= msr_mask;
wrmsr_on_cpus(cpumask, msr_addr, msrs);
wrmsrl(msr_addr, val);
return 0;
}
static void boost_set_msr_each(void *p_en)
{
bool enable = (bool) p_en;
boost_set_msr(enable);
}
static int set_boost(int val)
{
get_online_cpus();
boost_set_msrs(val, cpu_online_mask);
on_each_cpu(boost_set_msr_each, (void *)(long)val, 1);
put_online_cpus();
pr_debug("Core Boosting %sabled.\n", val ? "en" : "dis");
......@@ -536,46 +539,24 @@ static void free_acpi_perf_data(void)
free_percpu(acpi_perf_data);
}
static int boost_notify(struct notifier_block *nb, unsigned long action,
void *hcpu)
static int cpufreq_boost_online(unsigned int cpu)
{
unsigned cpu = (long)hcpu;
const struct cpumask *cpumask;
cpumask = get_cpu_mask(cpu);
/*
* On the CPU_UP path we simply keep the boost-disable flag
* in sync with the current global state.
*/
return boost_set_msr(acpi_cpufreq_driver.boost_enabled);
}
static int cpufreq_boost_down_prep(unsigned int cpu)
{
/*
* Clear the boost-disable bit on the CPU_DOWN path so that
* this cpu cannot block the remaining ones from boosting. On
* the CPU_UP path we simply keep the boost-disable flag in
* sync with the current global state.
* this cpu cannot block the remaining ones from boosting.
*/
switch (action) {
case CPU_DOWN_FAILED:
case CPU_DOWN_FAILED_FROZEN:
case CPU_ONLINE:
case CPU_ONLINE_FROZEN:
boost_set_msrs(acpi_cpufreq_driver.boost_enabled, cpumask);
break;
case CPU_DOWN_PREPARE:
case CPU_DOWN_PREPARE_FROZEN:
boost_set_msrs(1, cpumask);
break;
default:
break;
}
return NOTIFY_OK;
return boost_set_msr(1);
}
static struct notifier_block boost_nb = {
.notifier_call = boost_notify,
};
/*
* acpi_cpufreq_early_init - initialize ACPI P-States library
*
......@@ -922,37 +903,35 @@ static struct cpufreq_driver acpi_cpufreq_driver = {
.attr = acpi_cpufreq_attr,
};
static enum cpuhp_state acpi_cpufreq_online;
static void __init acpi_cpufreq_boost_init(void)
{
if (boot_cpu_has(X86_FEATURE_CPB) || boot_cpu_has(X86_FEATURE_IDA)) {
msrs = msrs_alloc();
int ret;
if (!msrs)
if (!(boot_cpu_has(X86_FEATURE_CPB) || boot_cpu_has(X86_FEATURE_IDA)))
return;
acpi_cpufreq_driver.set_boost = set_boost;
acpi_cpufreq_driver.boost_enabled = boost_state(0);
cpu_notifier_register_begin();
/* Force all MSRs to the same value */
boost_set_msrs(acpi_cpufreq_driver.boost_enabled,
cpu_online_mask);
__register_cpu_notifier(&boost_nb);
cpu_notifier_register_done();
/*
* This calls the online callback on all online cpu and forces all
* MSRs to the same value.
*/
ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "cpufreq/acpi:online",
cpufreq_boost_online, cpufreq_boost_down_prep);
if (ret < 0) {
pr_err("acpi_cpufreq: failed to register hotplug callbacks\n");
return;
}
acpi_cpufreq_online = ret;
}
static void acpi_cpufreq_boost_exit(void)
{
if (msrs) {
unregister_cpu_notifier(&boost_nb);
msrs_free(msrs);
msrs = NULL;
}
if (acpi_cpufreq_online >= 0)
cpuhp_remove_state_nocalls(acpi_cpufreq_online);
}
static int __init acpi_cpufreq_init(void)
......
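The reworked boost_set_msr() above reads the register on the current CPU, clears or sets the boost-disable bit, and writes the value back. A rough sketch of just the bit manipulation, in plain C without any MSR access: the name boost_apply() is hypothetical, and the mask passed in by a caller is whatever disable bit applies to the vendor.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the read-modify-write step: the mask marks a
 * "boost disabled" bit, so enabling boost clears it and disabling boost
 * sets it. All other bits of the value are preserved.
 */
static uint64_t boost_apply(uint64_t val, uint64_t msr_mask, bool enable)
{
	if (enable)
		val &= ~msr_mask;
	else
		val |= msr_mask;

	return val;
}
```

Because each CPU now updates only its own register, the per-CPU buffer and the cpumask argument used by the old boost_set_msrs() are no longer needed.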
......@@ -247,3 +247,10 @@ MODULE_DESCRIPTION("CPUFreq driver based on the ACPI CPPC v5.0+ spec");
MODULE_LICENSE("GPL");
late_initcall(cppc_cpufreq_init);
static const struct acpi_device_id cppc_acpi_ids[] = {
{ACPI_PROCESSOR_DEVICE_HID, },
{}
};
MODULE_DEVICE_TABLE(acpi, cppc_acpi_ids);
......@@ -26,6 +26,9 @@ static const struct of_device_id machines[] __initconst = {
{ .compatible = "allwinner,sun8i-a83t", },
{ .compatible = "allwinner,sun8i-h3", },
{ .compatible = "arm,integrator-ap", },
{ .compatible = "arm,integrator-cp", },
{ .compatible = "hisilicon,hi6220", },
{ .compatible = "fsl,imx27", },
......@@ -34,6 +37,8 @@ static const struct of_device_id machines[] __initconst = {
{ .compatible = "fsl,imx7d", },
{ .compatible = "marvell,berlin", },
{ .compatible = "marvell,pxa250", },
{ .compatible = "marvell,pxa270", },
{ .compatible = "samsung,exynos3250", },
{ .compatible = "samsung,exynos4210", },
......@@ -50,6 +55,8 @@ static const struct of_device_id machines[] __initconst = {
{ .compatible = "renesas,r7s72100", },
{ .compatible = "renesas,r8a73a4", },
{ .compatible = "renesas,r8a7740", },
{ .compatible = "renesas,r8a7743", },
{ .compatible = "renesas,r8a7745", },
{ .compatible = "renesas,r8a7778", },
{ .compatible = "renesas,r8a7779", },
{ .compatible = "renesas,r8a7790", },
......@@ -72,6 +79,12 @@ static const struct of_device_id machines[] __initconst = {
{ .compatible = "sigma,tango4" },
{ .compatible = "socionext,uniphier-pro5", },
{ .compatible = "socionext,uniphier-pxs2", },
{ .compatible = "socionext,uniphier-ld6b", },
{ .compatible = "socionext,uniphier-ld11", },
{ .compatible = "socionext,uniphier-ld20", },
{ .compatible = "ti,am33xx", },
{ .compatible = "ti,dra7", },
{ .compatible = "ti,omap2", },
......@@ -81,6 +94,8 @@ static const struct of_device_id machines[] __initconst = {
{ .compatible = "xlnx,zynq-7000", },
{ .compatible = "zte,zx296718", },
{ }
};
......
......@@ -1526,7 +1526,10 @@ unsigned int cpufreq_get(unsigned int cpu)
if (policy) {
down_read(&policy->rwsem);
if (!policy_is_inactive(policy))
ret_freq = __cpufreq_get(policy);
up_read(&policy->rwsem);
cpufreq_cpu_put(policy);
......@@ -2254,17 +2257,19 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy,
* Useful for policy notifiers which have different necessities
* at different times.
*/
int cpufreq_update_policy(unsigned int cpu)
void cpufreq_update_policy(unsigned int cpu)
{
struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
struct cpufreq_policy new_policy;
int ret;
if (!policy)
return -ENODEV;
return;
down_write(&policy->rwsem);
if (policy_is_inactive(policy))
goto unlock;
pr_debug("updating policy for CPU %u\n", cpu);
memcpy(&new_policy, policy, sizeof(*policy));
new_policy.min = policy->user_policy.min;
......@@ -2275,24 +2280,20 @@ int cpufreq_update_policy(unsigned int cpu)
* -> ask driver for current freq and notify governors about a change
*/
if (cpufreq_driver->get && !cpufreq_driver->setpolicy) {
if (cpufreq_suspended) {
ret = -EAGAIN;
if (cpufreq_suspended)
goto unlock;
}
new_policy.cur = cpufreq_update_current_freq(policy);
if (WARN_ON(!new_policy.cur)) {
ret = -EIO;
if (WARN_ON(!new_policy.cur))
goto unlock;
}
}
ret = cpufreq_set_policy(policy, &new_policy);
cpufreq_set_policy(policy, &new_policy);
unlock:
up_write(&policy->rwsem);
cpufreq_cpu_put(policy);
return ret;
}
EXPORT_SYMBOL(cpufreq_update_policy);
......
......@@ -37,16 +37,16 @@ struct cs_dbs_tuners {
#define DEF_SAMPLING_DOWN_FACTOR (1)
#define MAX_SAMPLING_DOWN_FACTOR (10)
static inline unsigned int get_freq_target(struct cs_dbs_tuners *cs_tuners,
static inline unsigned int get_freq_step(struct cs_dbs_tuners *cs_tuners,
struct cpufreq_policy *policy)
{
unsigned int freq_target = (cs_tuners->freq_step * policy->max) / 100;
unsigned int freq_step = (cs_tuners->freq_step * policy->max) / 100;
/* max freq cannot be less than 100. But who knows... */
if (unlikely(freq_target == 0))
freq_target = DEF_FREQUENCY_STEP;
if (unlikely(freq_step == 0))
freq_step = DEF_FREQUENCY_STEP;
return freq_target;
return freq_step;
}
/*
......@@ -55,10 +55,10 @@ static inline unsigned int get_freq_target(struct cs_dbs_tuners *cs_tuners,
* sampling_down_factor, we check, if current idle time is more than 80%
* (default), then we try to decrease frequency
*
* Any frequency increase takes it to the maximum frequency. Frequency reduction
* happens at minimum steps of 5% (default) of maximum frequency
* Frequency updates happen at minimum steps of 5% (default) of maximum
* frequency
*/
static unsigned int cs_dbs_timer(struct cpufreq_policy *policy)
static unsigned int cs_dbs_update(struct cpufreq_policy *policy)
{
struct policy_dbs_info *policy_dbs = policy->governor_data;
struct cs_policy_dbs_info *dbs_info = to_dbs_info(policy_dbs);
......@@ -66,6 +66,7 @@ static unsigned int cs_dbs_timer(struct cpufreq_policy *policy)
struct dbs_data *dbs_data = policy_dbs->dbs_data;
struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
unsigned int load = dbs_update(policy);
unsigned int freq_step;
/*
* break out if we 'cannot' reduce the speed as the user might
......@@ -82,6 +83,23 @@ static unsigned int cs_dbs_timer(struct cpufreq_policy *policy)
if (requested_freq > policy->max || requested_freq < policy->min)
requested_freq = policy->cur;
freq_step = get_freq_step(cs_tuners, policy);
/*
* Decrease requested_freq one freq_step for each idle period that
* we didn't update the frequency.
*/
if (policy_dbs->idle_periods < UINT_MAX) {
unsigned int freq_steps = policy_dbs->idle_periods * freq_step;
if (requested_freq > freq_steps)
requested_freq -= freq_steps;
else
requested_freq = policy->min;
policy_dbs->idle_periods = UINT_MAX;
}
/* Check for frequency increase */
if (load > dbs_data->up_threshold) {
dbs_info->down_skip = 0;
......@@ -90,7 +108,7 @@ static unsigned int cs_dbs_timer(struct cpufreq_policy *policy)
if (requested_freq == policy->max)
goto out;
requested_freq += get_freq_target(cs_tuners, policy);
requested_freq += freq_step;
if (requested_freq > policy->max)
requested_freq = policy->max;
......@@ -106,16 +124,14 @@ static unsigned int cs_dbs_timer(struct cpufreq_policy *policy)
/* Check for frequency decrease */
if (load < cs_tuners->down_threshold) {
unsigned int freq_target;
/*
* if we cannot reduce the frequency anymore, break out early
*/
if (requested_freq == policy->min)
goto out;
freq_target = get_freq_target(cs_tuners, policy);
if (requested_freq > freq_target)
requested_freq -= freq_target;
if (requested_freq > freq_step)
requested_freq -= freq_step;
else
requested_freq = policy->min;
......@@ -305,7 +321,7 @@ static void cs_start(struct cpufreq_policy *policy)
static struct dbs_governor cs_governor = {
.gov = CPUFREQ_DBS_GOVERNOR_INITIALIZER("conservative"),
.kobj_type = { .default_attrs = cs_attributes },
.gov_dbs_timer = cs_dbs_timer,
.gov_dbs_update = cs_dbs_update,
.alloc = cs_alloc,
.free = cs_free,
.init = cs_init,
......
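The idle-period handling added to cs_dbs_update() above can be modelled in userspace as follows. This is a sketch under stated assumptions: the function name cs_apply_idle_periods() and the flat parameter list are invented here, while the arithmetic follows the hunk (UINT_MAX means "no idle periods recorded").

```c
#include <limits.h>

/*
 * Hypothetical model: lower requested_freq by one freq_step per idle
 * period, falling back to policy_min when the total step would reach or
 * exceed the requested frequency.
 */
static unsigned int cs_apply_idle_periods(unsigned int requested_freq,
					  unsigned int idle_periods,
					  unsigned int freq_step,
					  unsigned int policy_min)
{
	unsigned int freq_steps;

	/* UINT_MAX marks "nothing to account for". */
	if (idle_periods == UINT_MAX)
		return requested_freq;

	freq_steps = idle_periods * freq_step;
	if (requested_freq > freq_steps)
		return requested_freq - freq_steps;

	return policy_min;
}
```

For example, with a 100000 kHz step and three idle periods, a request of 2000000 kHz drops to 1700000 kHz before the usual up/down threshold checks run.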
......@@ -61,7 +61,7 @@ ssize_t store_sampling_rate(struct gov_attr_set *attr_set, const char *buf,
* entries can't be freed concurrently.
*/
list_for_each_entry(policy_dbs, &attr_set->policy_list, list) {
mutex_lock(&policy_dbs->timer_mutex);
mutex_lock(&policy_dbs->update_mutex);
/*
* On 32-bit architectures this may race with the
* sample_delay_ns read in dbs_update_util_handler(), but that
......@@ -76,7 +76,7 @@ ssize_t store_sampling_rate(struct gov_attr_set *attr_set, const char *buf,
* taken, so it shouldn't be significant.
*/
gov_update_sample_delay(policy_dbs, 0);
mutex_unlock(&policy_dbs->timer_mutex);
mutex_unlock(&policy_dbs->update_mutex);
}
return count;
......@@ -117,7 +117,7 @@ unsigned int dbs_update(struct cpufreq_policy *policy)
struct policy_dbs_info *policy_dbs = policy->governor_data;
struct dbs_data *dbs_data = policy_dbs->dbs_data;
unsigned int ignore_nice = dbs_data->ignore_nice_load;
unsigned int max_load = 0;
unsigned int max_load = 0, idle_periods = UINT_MAX;
unsigned int sampling_rate, io_busy, j;
/*
......@@ -215,9 +215,19 @@ unsigned int dbs_update(struct cpufreq_policy *policy)
j_cdbs->prev_load = load;
}
if (time_elapsed > 2 * sampling_rate) {
unsigned int periods = time_elapsed / sampling_rate;
if (periods < idle_periods)
idle_periods = periods;
}
if (load > max_load)
max_load = load;
}
policy_dbs->idle_periods = idle_periods;
return max_load;
}
EXPORT_SYMBOL_GPL(dbs_update);
......@@ -236,9 +246,9 @@ static void dbs_work_handler(struct work_struct *work)
* Make sure cpufreq_governor_limits() isn't evaluating load or the
* ondemand governor isn't updating the sampling rate in parallel.
*/
mutex_lock(&policy_dbs->timer_mutex);
gov_update_sample_delay(policy_dbs, gov->gov_dbs_timer(policy));
mutex_unlock(&policy_dbs->timer_mutex);
mutex_lock(&policy_dbs->update_mutex);
gov_update_sample_delay(policy_dbs, gov->gov_dbs_update(policy));
mutex_unlock(&policy_dbs->update_mutex);
/* Allow the utilization update handler to queue up more work. */
atomic_set(&policy_dbs->work_count, 0);
......@@ -348,7 +358,7 @@ static struct policy_dbs_info *alloc_policy_dbs_info(struct cpufreq_policy *poli
return NULL;
policy_dbs->policy = policy;
mutex_init(&policy_dbs->timer_mutex);
mutex_init(&policy_dbs->update_mutex);
atomic_set(&policy_dbs->work_count, 0);
init_irq_work(&policy_dbs->irq_work, dbs_irq_work);
INIT_WORK(&policy_dbs->work, dbs_work_handler);
......@@ -367,7 +377,7 @@ static void free_policy_dbs_info(struct policy_dbs_info *policy_dbs,
{
int j;
mutex_destroy(&policy_dbs->timer_mutex);
mutex_destroy(&policy_dbs->update_mutex);
for_each_cpu(j, policy_dbs->policy->related_cpus) {
struct cpu_dbs_info *j_cdbs = &per_cpu(cpu_dbs, j);
......@@ -547,10 +557,10 @@ void cpufreq_dbs_governor_limits(struct cpufreq_policy *policy)
{
struct policy_dbs_info *policy_dbs = policy->governor_data;
mutex_lock(&policy_dbs->timer_mutex);
mutex_lock(&policy_dbs->update_mutex);
cpufreq_policy_apply_limits(policy);
gov_update_sample_delay(policy_dbs, 0);
mutex_unlock(&policy_dbs->timer_mutex);
mutex_unlock(&policy_dbs->update_mutex);
}
EXPORT_SYMBOL_GPL(cpufreq_dbs_governor_limits);
......@@ -85,7 +85,7 @@ struct policy_dbs_info {
* Per policy mutex that serializes load evaluation from limit-change
* and work-handler.
*/
struct mutex timer_mutex;
struct mutex update_mutex;
u64 last_sample_time;
s64 sample_delay_ns;
......@@ -97,6 +97,7 @@ struct policy_dbs_info {
struct list_head list;
/* Multiplier for increasing sample delay temporarily. */
unsigned int rate_mult;
unsigned int idle_periods; /* For conservative */
/* Status indicators */
bool is_shared; /* This object is used by multiple CPUs */
bool work_in_progress; /* Work is being queued up or in progress */
......@@ -135,7 +136,7 @@ struct dbs_governor {
*/
struct dbs_data *gdbs_data;
unsigned int (*gov_dbs_timer)(struct cpufreq_policy *policy);
unsigned int (*gov_dbs_update)(struct cpufreq_policy *policy);
struct policy_dbs_info *(*alloc)(void);
void (*free)(struct policy_dbs_info *policy_dbs);
int (*init)(struct dbs_data *dbs_data);
......
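The idle_periods value stored in struct policy_dbs_info is computed in dbs_update() as the smallest number of whole sampling periods skipped by any CPU of the policy. A simplified userspace sketch of that computation, assuming the per-CPU elapsed times are given as an array (the function name min_idle_periods() and its signature are hypothetical):

```c
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical model: a CPU counts as having been idle if more than two
 * sampling periods elapsed since its last update. The result is the
 * minimum period count over such CPUs, or UINT_MAX if there are none.
 */
static unsigned int min_idle_periods(const unsigned int *time_elapsed,
				     size_t ncpus, unsigned int sampling_rate)
{
	unsigned int idle_periods = UINT_MAX;
	size_t i;

	for (i = 0; i < ncpus; i++) {
		if (time_elapsed[i] > 2 * sampling_rate) {
			unsigned int periods = time_elapsed[i] / sampling_rate;

			if (periods < idle_periods)
				idle_periods = periods;
		}
	}

	return idle_periods;
}
```

Taking the minimum keeps the estimate conservative for shared policies: the frequency is only stepped down by as many periods as the least idle of the long-idle CPUs was away.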
......@@ -25,7 +25,7 @@
#define MAX_SAMPLING_DOWN_FACTOR (100000)
#define MICRO_FREQUENCY_UP_THRESHOLD (95)
#define MICRO_FREQUENCY_MIN_SAMPLE_RATE (10000)
#define MIN_FREQUENCY_UP_THRESHOLD (11)
#define MIN_FREQUENCY_UP_THRESHOLD (1)
#define MAX_FREQUENCY_UP_THRESHOLD (100)
static struct od_ops od_ops;
......@@ -169,7 +169,7 @@ static void od_update(struct cpufreq_policy *policy)
}
}
static unsigned int od_dbs_timer(struct cpufreq_policy *policy)
static unsigned int od_dbs_update(struct cpufreq_policy *policy)
{
struct policy_dbs_info *policy_dbs = policy->governor_data;
struct dbs_data *dbs_data = policy_dbs->dbs_data;
......@@ -191,7 +191,7 @@ static unsigned int od_dbs_timer(struct cpufreq_policy *policy)
od_update(policy);
if (dbs_info->freq_lo) {
/* Setup timer for SUB_SAMPLE */
/* Setup SUB_SAMPLE */
dbs_info->sample_type = OD_SUB_SAMPLE;
return dbs_info->freq_hi_delay_us;
}
......@@ -255,11 +255,11 @@ static ssize_t store_sampling_down_factor(struct gov_attr_set *attr_set,
list_for_each_entry(policy_dbs, &attr_set->policy_list, list) {
/*
* Doing this without locking might lead to using different
* rate_mult values in od_update() and od_dbs_timer().
* rate_mult values in od_update() and od_dbs_update().
*/
mutex_lock(&policy_dbs->timer_mutex);
mutex_lock(&policy_dbs->update_mutex);
policy_dbs->rate_mult = 1;
mutex_unlock(&policy_dbs->timer_mutex);
mutex_unlock(&policy_dbs->update_mutex);
}
return count;
......@@ -374,8 +374,7 @@ static int od_init(struct dbs_data *dbs_data)
dbs_data->up_threshold = MICRO_FREQUENCY_UP_THRESHOLD;
/*
* In nohz/micro accounting case we set the minimum frequency
* not depending on HZ, but fixed (very low). The deferred
* timer might skip some samples if idle/sleeping as needed.
* not depending on HZ, but fixed (very low).
*/
dbs_data->min_sampling_rate = MICRO_FREQUENCY_MIN_SAMPLE_RATE;
} else {
......@@ -415,7 +414,7 @@ static struct od_ops od_ops = {
static struct dbs_governor od_dbs_gov = {
.gov = CPUFREQ_DBS_GOVERNOR_INITIALIZER("ondemand"),
.kobj_type = { .default_attrs = od_attributes },
.gov_dbs_timer = od_dbs_timer,
.gov_dbs_update = od_dbs_update,
.alloc = od_alloc,
.free = od_free,
.init = od_init,
......
......@@ -41,6 +41,18 @@ static int cpufreq_stats_update(struct cpufreq_stats *stats)
return 0;
}
static void cpufreq_stats_clear_table(struct cpufreq_stats *stats)
{
unsigned int count = stats->max_state;
memset(stats->time_in_state, 0, count * sizeof(u64));
#ifdef CONFIG_CPU_FREQ_STAT_DETAILS
memset(stats->trans_table, 0, count * count * sizeof(int));
#endif
stats->last_time = get_jiffies_64();
stats->total_trans = 0;
}
static ssize_t show_total_trans(struct cpufreq_policy *policy, char *buf)
{
return sprintf(buf, "%d\n", policy->stats->total_trans);
......@@ -64,6 +76,14 @@ static ssize_t show_time_in_state(struct cpufreq_policy *policy, char *buf)
return len;
}
static ssize_t store_reset(struct cpufreq_policy *policy, const char *buf,
size_t count)
{
/* We don't care what is written to the attribute. */
cpufreq_stats_clear_table(policy->stats);
return count;
}
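From user space, the new write-only attribute can be exercised with an ordinary write. The following is a minimal sketch and not part of the commit: the helper name `cpufreq_stats_reset` is hypothetical, and the caller supplies the sysfs path. Because `store_reset()` ignores the written data, any non-empty write should do.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write one byte to a cpufreq stats "reset" attribute.
 * Returns 0 on success or a negative errno value on failure.
 */
static int cpufreq_stats_reset(const char *path)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -errno;

	/* store_reset() ignores the payload, so a single byte is enough. */
	ssize_t n = write(fd, "1", 1);
	int err = (n == 1) ? 0 : (n < 0 ? -errno : -EIO);

	close(fd);
	return err;
}
```

A caller would typically pass something like `/sys/devices/system/cpu/cpu0/cpufreq/stats/reset` (the usual location, which may differ per system). The attribute is created with mode 0200, so the write needs root.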
#ifdef CONFIG_CPU_FREQ_STAT_DETAILS
static ssize_t show_trans_table(struct cpufreq_policy *policy, char *buf)
{
......@@ -113,10 +133,12 @@ cpufreq_freq_attr_ro(trans_table);
cpufreq_freq_attr_ro(total_trans);
cpufreq_freq_attr_ro(time_in_state);
cpufreq_freq_attr_wo(reset);
static struct attribute *default_attrs[] = {
&total_trans.attr,
&time_in_state.attr,
&reset.attr,
#ifdef CONFIG_CPU_FREQ_STAT_DETAILS
&trans_table.attr,
#endif
......
/*
* Copyright (C) 2001-2002 Deep Blue Solutions Ltd.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*
* CPU support functions
*/
#include <linux/module.h>
#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/cpufreq.h>
#include <linux/sched.h>
#include <linux/smp.h>
#include <linux/init.h>
#include <linux/io.h>
#include <linux/platform_device.h>
#include <linux/of.h>
#include <linux/of_address.h>
#include <asm/mach-types.h>
#include <asm/hardware/icst.h>
static void __iomem *cm_base;
/* The cpufreq driver only uses the OSC register */
#define INTEGRATOR_HDR_OSC_OFFSET 0x08
#define INTEGRATOR_HDR_LOCK_OFFSET 0x14
static struct cpufreq_driver integrator_driver;
static const struct icst_params lclk_params = {
.ref = 24000000,
.vco_max = ICST525_VCO_MAX_5V,
.vco_min = ICST525_VCO_MIN,
.vd_min = 8,
.vd_max = 132,
.rd_min = 24,
.rd_max = 24,
.s2div = icst525_s2div,
.idx2s = icst525_idx2s,
};
static const struct icst_params cclk_params = {
.ref = 24000000,
.vco_max = ICST525_VCO_MAX_5V,
.vco_min = ICST525_VCO_MIN,
.vd_min = 12,
.vd_max = 160,
.rd_min = 24,
.rd_max = 24,
.s2div = icst525_s2div,
.idx2s = icst525_idx2s,
};
/*
* Validate the speed policy.
*/
static int integrator_verify_policy(struct cpufreq_policy *policy)
{
struct icst_vco vco;
cpufreq_verify_within_cpu_limits(policy);
vco = icst_hz_to_vco(&cclk_params, policy->max * 1000);
policy->max = icst_hz(&cclk_params, vco) / 1000;
vco = icst_hz_to_vco(&cclk_params, policy->min * 1000);
policy->min = icst_hz(&cclk_params, vco) / 1000;
cpufreq_verify_within_cpu_limits(policy);
return 0;
}
static int integrator_set_target(struct cpufreq_policy *policy,
unsigned int target_freq,
unsigned int relation)
{
cpumask_t cpus_allowed;
int cpu = policy->cpu;
struct icst_vco vco;
struct cpufreq_freqs freqs;
u_int cm_osc;
/*
* Save this thread's cpus_allowed mask.
*/
cpus_allowed = current->cpus_allowed;
/*
* Bind to the specified CPU. When this call returns,
* we should be running on the right CPU.
*/
set_cpus_allowed_ptr(current, cpumask_of(cpu));
BUG_ON(cpu != smp_processor_id());
/* get current setting */
cm_osc = __raw_readl(cm_base + INTEGRATOR_HDR_OSC_OFFSET);
if (machine_is_integrator())
vco.s = (cm_osc >> 8) & 7;
else if (machine_is_cintegrator())
vco.s = 1;
vco.v = cm_osc & 255;
vco.r = 22;
freqs.old = icst_hz(&cclk_params, vco) / 1000;
/* icst_hz_to_vco rounds down -- so we need the next
* larger freq in case of CPUFREQ_RELATION_L.
*/
if (relation == CPUFREQ_RELATION_L)
target_freq += 999;
if (target_freq > policy->max)
target_freq = policy->max;
vco = icst_hz_to_vco(&cclk_params, target_freq * 1000);
freqs.new = icst_hz(&cclk_params, vco) / 1000;
if (freqs.old == freqs.new) {
set_cpus_allowed_ptr(current, &cpus_allowed);
return 0;
}
cpufreq_freq_transition_begin(policy, &freqs);
cm_osc = __raw_readl(cm_base + INTEGRATOR_HDR_OSC_OFFSET);
if (machine_is_integrator()) {
cm_osc &= 0xfffff800;
cm_osc |= vco.s << 8;
} else if (machine_is_cintegrator()) {
cm_osc &= 0xffffff00;
}
cm_osc |= vco.v;
__raw_writel(0xa05f, cm_base + INTEGRATOR_HDR_LOCK_OFFSET);
__raw_writel(cm_osc, cm_base + INTEGRATOR_HDR_OSC_OFFSET);
__raw_writel(0, cm_base + INTEGRATOR_HDR_LOCK_OFFSET);
/*
* Restore the CPUs allowed mask.
*/
set_cpus_allowed_ptr(current, &cpus_allowed);
cpufreq_freq_transition_end(policy, &freqs, 0);
return 0;
}
static unsigned int integrator_get(unsigned int cpu)
{
cpumask_t cpus_allowed;
unsigned int current_freq;
u_int cm_osc;
struct icst_vco vco;
cpus_allowed = current->cpus_allowed;
set_cpus_allowed_ptr(current, cpumask_of(cpu));
BUG_ON(cpu != smp_processor_id());
/* detect memory etc. */
cm_osc = __raw_readl(cm_base + INTEGRATOR_HDR_OSC_OFFSET);
if (machine_is_integrator())
vco.s = (cm_osc >> 8) & 7;
else
vco.s = 1;
vco.v = cm_osc & 255;
vco.r = 22;
current_freq = icst_hz(&cclk_params, vco) / 1000; /* current freq */
set_cpus_allowed_ptr(current, &cpus_allowed);
return current_freq;
}
static int integrator_cpufreq_init(struct cpufreq_policy *policy)
{
/* set default policy and cpuinfo */
policy->max = policy->cpuinfo.max_freq = 160000;
policy->min = policy->cpuinfo.min_freq = 12000;
policy->cpuinfo.transition_latency = 1000000; /* 1 ms, assumed */
return 0;
}
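The 12000 kHz and 160000 kHz limits above line up with the `vd_min`/`vd_max` values in `cclk_params` if the ICST output frequency follows the commonly documented formula f = 2 * ref * (v + 8) / ((r + 2) * s_div). The sketch below assumes that formula rather than quoting the kernel's `icst_hz()`. The function name `icst_hz_sketch` is illustrative, and the output divider is passed in directly instead of being looked up from the S field.

```c
#include <stdint.h>

/*
 * Sketch of the ICST output-frequency formula (an assumption, not the
 * kernel implementation):
 *
 *     f = 2 * ref * (v + 8) / ((r + 2) * s_div)
 *
 * ref_hz: reference clock in Hz
 * v, r:   VCO and reference divider register fields
 * s_div:  output divider already decoded from the S field
 */
static uint64_t icst_hz_sketch(uint64_t ref_hz, unsigned int v,
			       unsigned int r, unsigned int s_div)
{
	return 2 * ref_hz * (v + 8) / ((uint64_t)(r + 2) * s_div);
}
```

With `ref = 24000000`, `r = 22` and an output divider of 2, this reduces to (v + 8) MHz, so v = 4 gives 12 MHz and v = 152 gives 160 MHz.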
static struct cpufreq_driver integrator_driver = {
.flags = CPUFREQ_NEED_INITIAL_FREQ_CHECK,
.verify = integrator_verify_policy,
.target = integrator_set_target,
.get = integrator_get,
.init = integrator_cpufreq_init,
.name = "integrator",
};
static int __init integrator_cpufreq_probe(struct platform_device *pdev)
{
struct resource *res;
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
if (!res)
return -ENODEV;
cm_base = devm_ioremap(&pdev->dev, res->start, resource_size(res));
if (!cm_base)
return -ENODEV;
return cpufreq_register_driver(&integrator_driver);
}
static int __exit integrator_cpufreq_remove(struct platform_device *pdev)
{
return cpufreq_unregister_driver(&integrator_driver);
}
static const struct of_device_id integrator_cpufreq_match[] = {
{ .compatible = "arm,core-module-integrator"},
{ },
};
MODULE_DEVICE_TABLE(of, integrator_cpufreq_match);
static struct platform_driver integrator_cpufreq_driver = {
.driver = {
.name = "integrator-cpufreq",
.of_match_table = integrator_cpufreq_match,
},
.remove = __exit_p(integrator_cpufreq_remove),
};
module_platform_driver_probe(integrator_cpufreq_driver,
integrator_cpufreq_probe);
MODULE_AUTHOR("Russell M. King");
MODULE_DESCRIPTION("cpufreq driver for ARM Integrator CPUs");
MODULE_LICENSE("GPL");
......@@ -42,6 +42,10 @@
#define PMSR_PSAFE_ENABLE (1UL << 30)
#define PMSR_SPR_EM_DISABLE (1UL << 31)
#define PMSR_MAX(x) ((x >> 32) & 0xFF)
#define LPSTATE_SHIFT 48
#define GPSTATE_SHIFT 56
#define GET_LPSTATE(x) (((x) >> LPSTATE_SHIFT) & 0xFF)
#define GET_GPSTATE(x) (((x) >> GPSTATE_SHIFT) & 0xFF)
#define MAX_RAMP_DOWN_TIME 5120
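To illustrate the new accessor macros: they pull the local and global pstate bytes out of a 64-bit PMCR value, and the timer handler in this diff casts each result to `s8` because pstate ids can be negative. The sketch below repeats the macros so it compiles on its own. The wrapper names `pmcr_local_pstate` and `pmcr_global_pstate` are hypothetical.

```c
#include <stdint.h>

#define LPSTATE_SHIFT	48
#define GPSTATE_SHIFT	56
#define GET_LPSTATE(x)	(((x) >> LPSTATE_SHIFT) & 0xFF)
#define GET_GPSTATE(x)	(((x) >> GPSTATE_SHIFT) & 0xFF)

/* Local pstate: bits 48..55, reinterpreted as a signed 8-bit id. */
static int8_t pmcr_local_pstate(uint64_t pmcr)
{
	return (int8_t)GET_LPSTATE(pmcr);
}

/* Global pstate: bits 56..63, reinterpreted as a signed 8-bit id. */
static int8_t pmcr_global_pstate(uint64_t pmcr)
{
	return (int8_t)GET_GPSTATE(pmcr);
}
```

For example, a PMCR value of `0x05FE000000000000` decodes to a global pstate of 5 and a local pstate of -2.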
/*
......@@ -592,7 +596,8 @@ void gpstate_timer_handler(unsigned long data)
{
struct cpufreq_policy *policy = (struct cpufreq_policy *)data;
struct global_pstate_info *gpstates = policy->driver_data;
int gpstate_idx;
int gpstate_idx, lpstate_idx;
unsigned long val;
unsigned int time_diff = jiffies_to_msecs(jiffies)
- gpstates->last_sampled_time;
struct powernv_smp_call_data freq_data;
......@@ -600,21 +605,37 @@ void gpstate_timer_handler(unsigned long data)
if (!spin_trylock(&gpstates->gpstate_lock))
return;
/*
* If PMCR was last updated using fast_switch, then
* gpstates->last_lpstate_idx may hold a stale value.
* Hence, read from PMCR to get correct data.
*/
val = get_pmspr(SPRN_PMCR);
freq_data.gpstate_id = (s8)GET_GPSTATE(val);
freq_data.pstate_id = (s8)GET_LPSTATE(val);
if (freq_data.gpstate_id == freq_data.pstate_id) {
reset_gpstates(policy);
spin_unlock(&gpstates->gpstate_lock);
return;
}
gpstates->last_sampled_time += time_diff;
gpstates->elapsed_time += time_diff;
freq_data.pstate_id = idx_to_pstate(gpstates->last_lpstate_idx);
if ((gpstates->last_gpstate_idx == gpstates->last_lpstate_idx) ||
(gpstates->elapsed_time > MAX_RAMP_DOWN_TIME)) {
if (gpstates->elapsed_time > MAX_RAMP_DOWN_TIME) {
gpstate_idx = pstate_to_idx(freq_data.pstate_id);
lpstate_idx = gpstate_idx;
reset_gpstates(policy);
gpstates->highest_lpstate_idx = gpstate_idx;
} else {
lpstate_idx = pstate_to_idx(freq_data.pstate_id);
gpstate_idx = calc_global_pstate(gpstates->elapsed_time,
gpstates->highest_lpstate_idx,
gpstates->last_lpstate_idx);
lpstate_idx);
}
freq_data.gpstate_id = idx_to_pstate(gpstate_idx);
gpstates->last_gpstate_idx = gpstate_idx;
gpstates->last_lpstate_idx = lpstate_idx;
/*
* If local pstate is equal to global pstate, rampdown is over,
* so the timer does not need to be queued.
......@@ -622,10 +643,6 @@ void gpstate_timer_handler(unsigned long data)
if (gpstate_idx != gpstates->last_lpstate_idx)
queue_gpstate_timer(gpstates);
freq_data.gpstate_id = idx_to_pstate(gpstate_idx);
gpstates->last_gpstate_idx = pstate_to_idx(freq_data.gpstate_id);
gpstates->last_lpstate_idx = pstate_to_idx(freq_data.pstate_id);
spin_unlock(&gpstates->gpstate_lock);
/* Timer may get migrated to a different cpu on cpu hot unplug */
......@@ -647,8 +664,14 @@ static int powernv_cpufreq_target_index(struct cpufreq_policy *policy,
if (unlikely(rebooting) && new_index != get_nominal_index())
return 0;
if (!throttled)
if (!throttled) {
/* we don't want to be preempted while
* checking if the CPU frequency has been throttled
*/
preempt_disable();
powernv_cpufreq_throttle_check(NULL);
preempt_enable();
}
cur_msec = jiffies_to_msecs(get_jiffies_64());
......@@ -752,9 +775,12 @@ static int powernv_cpufreq_cpu_init(struct cpufreq_policy *policy)
spin_lock_init(&gpstates->gpstate_lock);
ret = cpufreq_table_validate_and_show(policy, powernv_freqs);
if (ret < 0)
if (ret < 0) {
kfree(policy->driver_data);
return ret;
}
policy->fast_switch_possible = true;
return ret;
}
......@@ -897,6 +923,20 @@ static void powernv_cpufreq_stop_cpu(struct cpufreq_policy *policy)
del_timer_sync(&gpstates->timer);
}
static unsigned int powernv_fast_switch(struct cpufreq_policy *policy,
unsigned int target_freq)
{
int index;
struct powernv_smp_call_data freq_data;
index = cpufreq_table_find_index_dl(policy, target_freq);
freq_data.pstate_id = powernv_freqs[index].driver_data;
freq_data.gpstate_id = powernv_freqs[index].driver_data;
set_pstate(&freq_data);
return powernv_freqs[index].frequency;
}
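The fast-switch path relies on `cpufreq_table_find_index_dl()`, which is understood here to select the lowest frequency at or above the target from a table sorted in descending order. The sketch below models that lookup on a plain array, assuming every entry is valid. The name `find_index_dl_sketch` and the array representation are illustrative, not the kernel's frequency-table API.

```c
/*
 * Sketch: in a table sorted in descending order, return the index of the
 * lowest frequency that is >= target. If every entry is below target,
 * fall back to index 0 (the highest frequency). Returns -1 for an empty
 * table.
 */
static int find_index_dl_sketch(const unsigned int *freqs, int n,
				unsigned int target)
{
	int best = -1;

	for (int i = 0; i < n; i++) {
		if (freqs[i] == target)
			return i;

		if (freqs[i] > target) {
			/* Still at or above target; remember and keep going. */
			best = i;
			continue;
		}

		/* First entry below target: use the last one above it, if any. */
		return best != -1 ? best : i;
	}

	return best;
}
```

With the table `{3000, 2000, 1000}`, a target of 1500 selects index 1 (2000), a target of 5000 falls back to index 0, and a target of 500 selects index 2.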
static struct cpufreq_driver powernv_cpufreq_driver = {
.name = "powernv-cpufreq",
.flags = CPUFREQ_CONST_LOOPS,
......@@ -904,6 +944,7 @@ static struct cpufreq_driver powernv_cpufreq_driver = {
.exit = powernv_cpufreq_cpu_exit,
.verify = cpufreq_generic_frequency_table_verify,
.target_index = powernv_cpufreq_target_index,
.fast_switch = powernv_fast_switch,
.get = powernv_cpufreq_get,
.stop_cpu = powernv_cpufreq_stop_cpu,
.attr = powernv_cpu_freq_attr,
......
......@@ -249,6 +249,7 @@ extern int acpi_processor_register_performance(struct acpi_processor_performance
*performance, unsigned int cpu);
extern void acpi_processor_unregister_performance(unsigned int cpu);
int acpi_processor_pstate_control(void);
/* note: this locks both the calling module and the processor module
if a _PPC object exists, rmmod is disallowed then */
int acpi_processor_notify_smm(struct module *calling_module);
......@@ -294,7 +295,7 @@ static inline void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx
#ifdef CONFIG_CPU_FREQ
void acpi_processor_ppc_init(void);
void acpi_processor_ppc_exit(void);
int acpi_processor_ppc_has_changed(struct acpi_processor *pr, int event_flag);
void acpi_processor_ppc_has_changed(struct acpi_processor *pr, int event_flag);
extern int acpi_processor_get_bios_limit(int cpu, unsigned int *limit);
#else
static inline void acpi_processor_ppc_init(void)
......
......@@ -175,7 +175,7 @@ void disable_cpufreq(void);
u64 get_cpu_idle_time(unsigned int cpu, u64 *wall, int io_busy);
int cpufreq_get_policy(struct cpufreq_policy *policy, unsigned int cpu);
int cpufreq_update_policy(unsigned int cpu);
void cpufreq_update_policy(unsigned int cpu);
bool have_governor_per_policy(void);
struct kobject *get_governor_parent_kobj(struct cpufreq_policy *policy);
void cpufreq_enable_fast_switch(struct cpufreq_policy *policy);
......@@ -234,6 +234,10 @@ __ATTR(_name, _perm, show_##_name, NULL)
static struct freq_attr _name = \
__ATTR(_name, 0644, show_##_name, store_##_name)
#define cpufreq_freq_attr_wo(_name) \
static struct freq_attr _name = \
__ATTR(_name, 0200, NULL, store_##_name)
struct global_attr {
struct attribute attr;
ssize_t (*show)(struct kobject *kobj,
......
......@@ -12,11 +12,14 @@
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/cpufreq.h>
#include <linux/kthread.h>
#include <linux/slab.h>
#include <trace/events/power.h>
#include "sched.h"
#define SUGOV_KTHREAD_PRIORITY 50
struct sugov_tunables {
struct gov_attr_set attr_set;
unsigned int rate_limit_us;
......@@ -35,8 +38,10 @@ struct sugov_policy {
/* The next fields are only needed if fast switch cannot be used. */
struct irq_work irq_work;
struct work_struct work;
struct kthread_work work;
struct mutex work_lock;
struct kthread_worker worker;
struct task_struct *thread;
bool work_in_progress;
bool need_freq_update;
......@@ -291,7 +296,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
raw_spin_unlock(&sg_policy->update_lock);
}
static void sugov_work(struct work_struct *work)
static void sugov_work(struct kthread_work *work)
{
struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work);
......@@ -308,7 +313,21 @@ static void sugov_irq_work(struct irq_work *irq_work)
struct sugov_policy *sg_policy;
sg_policy = container_of(irq_work, struct sugov_policy, irq_work);
schedule_work_on(smp_processor_id(), &sg_policy->work);
/*
* For RT and deadline tasks, the schedutil governor shoots the
* frequency to maximum. Special care must be taken to ensure that this
* kthread doesn't result in the same behavior.
*
* This is (mostly) guaranteed by the work_in_progress flag. The flag is
* updated only at the end of the sugov_work() function and before that
* the schedutil governor rejects all other frequency scaling requests.
*
* There is a very rare case though, where the RT thread yields right
* after the work_in_progress flag is cleared. The effects of that are
* neglected for now.
*/
kthread_queue_work(&sg_policy->worker, &sg_policy->work);
}
/************************** sysfs interface ************************/
......@@ -371,19 +390,64 @@ static struct sugov_policy *sugov_policy_alloc(struct cpufreq_policy *policy)
return NULL;
sg_policy->policy = policy;
init_irq_work(&sg_policy->irq_work, sugov_irq_work);
INIT_WORK(&sg_policy->work, sugov_work);
mutex_init(&sg_policy->work_lock);
raw_spin_lock_init(&sg_policy->update_lock);
return sg_policy;
}
static void sugov_policy_free(struct sugov_policy *sg_policy)
{
mutex_destroy(&sg_policy->work_lock);
kfree(sg_policy);
}
static int sugov_kthread_create(struct sugov_policy *sg_policy)
{
struct task_struct *thread;
struct sched_param param = { .sched_priority = MAX_USER_RT_PRIO / 2 };
struct cpufreq_policy *policy = sg_policy->policy;
int ret;
/* kthread only required for slow path */
if (policy->fast_switch_enabled)
return 0;
kthread_init_work(&sg_policy->work, sugov_work);
kthread_init_worker(&sg_policy->worker);
thread = kthread_create(kthread_worker_fn, &sg_policy->worker,
"sugov:%d",
cpumask_first(policy->related_cpus));
if (IS_ERR(thread)) {
pr_err("failed to create sugov thread: %ld\n", PTR_ERR(thread));
return PTR_ERR(thread);
}
ret = sched_setscheduler_nocheck(thread, SCHED_FIFO, &param);
if (ret) {
kthread_stop(thread);
pr_warn("%s: failed to set SCHED_FIFO\n", __func__);
return ret;
}
sg_policy->thread = thread;
kthread_bind_mask(thread, policy->related_cpus);
init_irq_work(&sg_policy->irq_work, sugov_irq_work);
mutex_init(&sg_policy->work_lock);
wake_up_process(thread);
return 0;
}
static void sugov_kthread_stop(struct sugov_policy *sg_policy)
{
/* kthread only required for slow path */
if (sg_policy->policy->fast_switch_enabled)
return;
kthread_flush_worker(&sg_policy->worker);
kthread_stop(sg_policy->thread);
mutex_destroy(&sg_policy->work_lock);
}
static struct sugov_tunables *sugov_tunables_alloc(struct sugov_policy *sg_policy)
{
struct sugov_tunables *tunables;
......@@ -416,16 +480,24 @@ static int sugov_init(struct cpufreq_policy *policy)
if (policy->governor_data)
return -EBUSY;
cpufreq_enable_fast_switch(policy);
sg_policy = sugov_policy_alloc(policy);
if (!sg_policy)
return -ENOMEM;
if (!sg_policy) {
ret = -ENOMEM;
goto disable_fast_switch;
}
ret = sugov_kthread_create(sg_policy);
if (ret)
goto free_sg_policy;
mutex_lock(&global_tunables_lock);
if (global_tunables) {
if (WARN_ON(have_governor_per_policy())) {
ret = -EINVAL;
goto free_sg_policy;
goto stop_kthread;
}
policy->governor_data = sg_policy;
sg_policy->tunables = global_tunables;
......@@ -437,7 +509,7 @@ static int sugov_init(struct cpufreq_policy *policy)
tunables = sugov_tunables_alloc(sg_policy);
if (!tunables) {
ret = -ENOMEM;
goto free_sg_policy;
goto stop_kthread;
}
tunables->rate_limit_us = LATENCY_MULTIPLIER;
......@@ -454,20 +526,25 @@ static int sugov_init(struct cpufreq_policy *policy)
if (ret)
goto fail;
out:
out:
mutex_unlock(&global_tunables_lock);
cpufreq_enable_fast_switch(policy);
return 0;
fail:
fail:
policy->governor_data = NULL;
sugov_tunables_free(tunables);
free_sg_policy:
stop_kthread:
sugov_kthread_stop(sg_policy);
free_sg_policy:
mutex_unlock(&global_tunables_lock);
sugov_policy_free(sg_policy);
disable_fast_switch:
cpufreq_disable_fast_switch(policy);
pr_err("initialization failed (error %d)\n", ret);
return ret;
}
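The reworked `sugov_init()` error path follows the usual kernel pattern of stacked labels, where each failure jumps to the label that undoes only what has been set up so far, in reverse order. The sketch below shows that pattern in isolation. All names in it (`init_sketch`, `UNDO_A`, `UNDO_B`, the `fail_at` parameter) are hypothetical and stand in for the real setup steps.

```c
#include <errno.h>

/* Flags recording which teardown steps ran (for illustration only). */
enum { UNDO_A = 1, UNDO_B = 2 };

/*
 * Hypothetical three-step init. fail_at picks the step (1..3) that fails,
 * or 0 for success. *undone collects the teardown steps that were run.
 */
static int init_sketch(int fail_at, unsigned int *undone)
{
	int ret;

	*undone = 0;

	/* Step A: nothing has been acquired yet, so fail directly. */
	if (fail_at == 1)
		return -ENOMEM;

	/* Step B: on failure only step A must be undone. */
	if (fail_at == 2) {
		ret = -ENOMEM;
		goto undo_a;
	}

	/* Step C: on failure undo B, then fall through to undo A. */
	if (fail_at == 3) {
		ret = -EINVAL;
		goto undo_b;
	}

	return 0;

undo_b:
	*undone |= UNDO_B;
undo_a:
	*undone |= UNDO_A;
	return ret;
}
```

The labels are ordered so that falling through from a later label runs every earlier teardown step, which mirrors how `stop_kthread` falls into `free_sg_policy` and then `disable_fast_switch` above.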
......@@ -478,8 +555,6 @@ static void sugov_exit(struct cpufreq_policy *policy)
struct sugov_tunables *tunables = sg_policy->tunables;
unsigned int count;
cpufreq_disable_fast_switch(policy);
mutex_lock(&global_tunables_lock);
count = gov_attr_set_put(&tunables->attr_set, &sg_policy->tunables_hook);
......@@ -489,7 +564,9 @@ static void sugov_exit(struct cpufreq_policy *policy)
mutex_unlock(&global_tunables_lock);
sugov_kthread_stop(sg_policy);
sugov_policy_free(sg_policy);
cpufreq_disable_fast_switch(policy);
}
static int sugov_start(struct cpufreq_policy *policy)
......@@ -535,8 +612,10 @@ static void sugov_stop(struct cpufreq_policy *policy)
synchronize_sched();
if (!policy->fast_switch_enabled) {
irq_work_sync(&sg_policy->irq_work);
cancel_work_sync(&sg_policy->work);
kthread_cancel_work_sync(&sg_policy->work);
}
}
static void sugov_limits(struct cpufreq_policy *policy)
......