Commit 02b82b02 authored by Linus Torvalds

Merge tag 'pm-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These are mostly fixes and cleanups all over the code and a new piece
  of documentation for Intel uncore frequency scaling.

  Functionality-wise, the intel_idle driver will support Sapphire Rapids
  Xeons natively now (with some extra facilities for controlling
  C-states more precisely on those systems), virtual guests will take
  the ACPI S4 hardware signature into account by default, the
   intel_pstate driver will take the default EPP value from the firmware,
  cpupower utility will support the AMD P-state driver added in the
  previous cycle, and there is a new tracer utility for that driver.

  Specifics:

   - Allow device_pm_check_callbacks() to be called from interrupt
     context without issues (Dmitry Baryshkov).

   - Modify devm_pm_runtime_enable() to automatically handle
     pm_runtime_dont_use_autosuspend() at driver exit time (Douglas
     Anderson).

   - Make the schedutil cpufreq governor use to_gov_attr_set() instead
     of open coding it (Kevin Hao).

   - Replace acpi_bus_get_device() with acpi_fetch_acpi_dev() in the
     cpufreq longhaul driver (Rafael Wysocki).

   - Unify show() and store() naming in cpufreq and make it use
     __ATTR_XX (Lianjie Zhang).

   - Make the intel_pstate driver use the EPP value set by the firmware
     by default (Srinivas Pandruvada).

   - Re-order the init checks in the powernow-k8 cpufreq driver (Mario
     Limonciello).

   - Make the ACPI processor idle driver check for architectural support
     for LPI to avoid using it on x86 by mistake (Mario Limonciello).

   - Add Sapphire Rapids Xeon support to the intel_idle driver (Artem
     Bityutskiy).

   - Add a 'preferred_cstates' module argument to the intel_idle driver to
     work around a C1 and C1E handling issue on Sapphire Rapids (Artem
     Bityutskiy); a brief usage sketch follows this list.

   - Add core C6 optimization on Sapphire Rapids to the intel_idle
     driver (Artem Bityutskiy).

   - Optimize the haltpoll cpuidle driver a bit (Li RongQing).

   - Remove leftover text from intel_idle() kerneldoc comment and fix up
     white space in intel_idle (Rafael Wysocki).

   - Fix load_image_and_restore() error path (Ye Bin).

   - Fix typos in comments in the system wakeup handling code (Tom Rix).

   - Clean up non-kernel-doc comments in hibernation code (Jiapeng
     Chong).

   - Fix __setup handler error handling in system-wide suspend and
     hibernation core code (Randy Dunlap).

   - Add device name to suspend_report_result() (Youngjin Jang).

   - Make virtual guests honour ACPI S4 hardware signature by default
     (David Woodhouse).

   - Block power off of a parent PM domain unless child is in deepest
     state (Ulf Hansson).

   - Use dev_err_probe() to simplify error handling for generic PM
     domains (Ahmad Fatoum).

   - Fix sleep-in-atomic bug caused by genpd_debug_remove() (Shawn Guo).

   - Document Intel uncore frequency scaling (Srinivas Pandruvada).

   - Add DTPM hierarchy description (Daniel Lezcano).

   - Change the locking scheme in DTPM (Daniel Lezcano).

   - Fix dtpm_cpu cleanup at exit time and missing virtual DTPM pointer
     release (Daniel Lezcano).

   - Make dtpm_node_callback[] static (kernel test robot).

   - Fix spelling mistake "initialze" -> "initialize" in
     dtpm_create_hierarchy() (Colin Ian King).

   - Add tracer tool for the amd-pstate driver (Jinzhou Su).

   - Fix PC6 displaying in turbostat on some systems (Artem Bityutskiy).

   - Add AMD P-State support to the cpupower utility (Huang Rui)"
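
  A brief, illustrative usage sketch for the new intel_idle 'preferred_cstates'
  parameter mentioned above: it is a bit mask in which bit 1 stands for C1 and
  bit 2 stands for C1E (see the intel_idle changes below), so preferring C1E
  over C1 on a Sapphire Rapids system could look like this on the kernel
  command line (the value 4, i.e. BIT(2), is only an example):

    # kernel command line, illustrative only; BIT(2) marks C1E as preferred
    intel_idle.preferred_cstates=4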

* tag 'pm-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (58 commits)
  cpufreq: powernow-k8: Re-order the init checks
  cpuidle: intel_idle: Drop redundant backslash at line end
  cpuidle: intel_idle: Update intel_idle() kerneldoc comment
  PM: hibernate: Honour ACPI hardware signature by default for virtual guests
  cpufreq: intel_pstate: Use firmware default EPP
  cpufreq: unify show() and store() naming and use __ATTR_XX
  PM: core: keep irq flags in device_pm_check_callbacks()
  cpuidle: haltpoll: Call cpuidle_poll_state_init() later
  Documentation: amd-pstate: add tracer tool introduction
  tools/power/x86/amd_pstate_tracer: Add tracer tool for AMD P-state
  tools/power/x86/intel_pstate_tracer: make tracer as a module
  cpufreq: amd-pstate: Add more tracepoint for AMD P-State module
  PM: sleep: Add device name to suspend_report_result()
  turbostat: fix PC6 displaying on some systems
  intel_idle: add core C6 optimization for SPR
  intel_idle: add 'preferred_cstates' module argument
  intel_idle: add SPR support
  PM: runtime: Have devm_pm_runtime_enable() handle pm_runtime_dont_use_autosuspend()
  ACPI: processor idle: Check for architectural support for LPI
  cpuidle: PSCI: Move the `has_lpi` check to the beginning of the function
  ...
parents 242ba665 ec3d8b83
......@@ -369,6 +369,32 @@ governor (for the policies it is attached to), or by the ``CPUFreq`` core (for t
policies with other scaling governors).
Tracer Tool
-------------
``amd_pstate_tracer.py`` can record and parse the ``amd-pstate`` trace log and
generate performance plots. This utility can be used to debug and tune the
performance of the ``amd-pstate`` driver. The tracer tool imports the
intel_pstate tracer as a module.
The tracer tool is located in ``linux/tools/power/x86/amd_pstate_tracer``. It
can be used in two ways. If a trace file is already available, parse it
directly with the command ::
./amd_pstate_trace.py [-c cpus] -t <trace_file> -n <test_name>
Or generate a trace file with root privileges, then parse and plot it with the command ::
sudo ./amd_pstate_trace.py [-c cpus] -n <test_name> -i <interval> [-m kbytes]
The test results can be found in ``results/test_name``. The following is an
example of part of the output. ::
common_cpu common_secs common_usecs min_perf des_perf max_perf freq mperf apef tsc load duration_ms sample_num elapsed_time common_comm
CPU_005 712 116384 39 49 166 0.7565 9645075 2214891 38431470 25.1 11.646 469 2.496 kworker/5:0-40
CPU_006 712 116408 39 49 166 0.6769 8950227 1839034 37192089 24.06 11.272 470 2.496 kworker/6:0-1264
Reference
===========
......
.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>
==============================
Intel Uncore Frequency Scaling
==============================
:Copyright: |copy| 2022 Intel Corporation
:Author: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Introduction
------------
The uncore can consume a significant amount of power in Intel's Xeon servers,
depending on the workload characteristics. To optimize total power and improve
overall performance, SoCs have internal algorithms for scaling the uncore
frequency. These algorithms monitor workload usage of the uncore and set a
desirable frequency.
It is possible that users have different expectations of uncore performance and
want to have control over it. The objective is similar to allowing users to set
the scaling min/max frequencies via cpufreq sysfs to improve CPU performance.
Users may have some latency sensitive workloads where they do not want any
change to uncore frequency. Also, users may have workloads which require
different core and uncore performance at distinct phases and they may want to
use both cpufreq and the uncore scaling interface to distribute power and
improve overall performance.
Sysfs Interface
---------------
To control uncore frequency, a sysfs interface is provided in the directory:
`/sys/devices/system/cpu/intel_uncore_frequency/`.
There is one directory for each package and die combination, as the scope of
uncore scaling control is per die in multi-die/package SoCs, or per package
for single-die-per-package SoCs. The name represents the scope of control. For
example: 'package_00_die_00' is for package id 0 and die 0.
Each package_*_die_* contains the following attributes:
``initial_max_freq_khz``
        Out of reset, this attribute represents the maximum possible frequency.
This is a read-only attribute. If users adjust max_freq_khz,
they can always go back to maximum using the value from this attribute.
``initial_min_freq_khz``
        Out of reset, this attribute represents the minimum possible frequency.
This is a read-only attribute. If users adjust min_freq_khz,
they can always go back to minimum using the value from this attribute.
``max_freq_khz``
This attribute is used to set the maximum uncore frequency.
``min_freq_khz``
This attribute is used to set the minimum uncore frequency.
``current_freq_khz``
This attribute is used to get the current uncore frequency.
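
A minimal shell sketch of the interface described above, assuming a
single-die package 0 (i.e. the 'package_00_die_00' directory exists) and
using an illustrative 1.8 GHz cap ::

  cd /sys/devices/system/cpu/intel_uncore_frequency/package_00_die_00
  cat initial_min_freq_khz initial_max_freq_khz current_freq_khz
  # Cap the uncore at 1.8 GHz (values are in kHz); requires root.
  echo 1800000 > max_freq_khz
  # Restore the out-of-reset maximum.
  cat initial_max_freq_khz > max_freq_khz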
......@@ -15,3 +15,4 @@ Working-State Power Management
cpufreq_drivers
intel_epb
intel-speed-select
intel_uncore_frequency_scaling
......@@ -1002,6 +1002,7 @@ L: linux-pm@vger.kernel.org
S: Supported
F: Documentation/admin-guide/pm/amd-pstate.rst
F: drivers/cpufreq/amd-pstate*
F: tools/power/x86/amd_pstate_tracer/amd_pstate_trace.py
AMD PTDMA DRIVER
M: Sanjay R Mehta <sanju.mehta@amd.com>
......
......@@ -54,6 +54,9 @@ static int psci_acpi_cpu_init_idle(unsigned int cpu)
struct acpi_lpi_state *lpi;
struct acpi_processor *pr = per_cpu(processors, cpu);
if (unlikely(!pr || !pr->flags.has_lpi))
return -EINVAL;
/*
* If the PSCI cpu_suspend function hook has not been initialized
* idle states must not be enabled, so bail out
......@@ -61,9 +64,6 @@ static int psci_acpi_cpu_init_idle(unsigned int cpu)
if (!psci_ops.cpu_suspend)
return -EOPNOTSUPP;
if (unlikely(!pr || !pr->flags.has_lpi))
return -EINVAL;
count = pr->power.count - 1;
if (count <= 0)
return -ENODEV;
......
......@@ -15,6 +15,7 @@
#include <asm/desc.h>
#include <asm/cacheflush.h>
#include <asm/realmode.h>
#include <asm/hypervisor.h>
#include <linux/ftrace.h>
#include "../../realmode/rm/wakeup.h"
......@@ -140,9 +141,9 @@ static int __init acpi_sleep_setup(char *str)
acpi_realmode_flags |= 4;
#ifdef CONFIG_HIBERNATION
if (strncmp(str, "s4_hwsig", 8) == 0)
acpi_check_s4_hw_signature(1);
acpi_check_s4_hw_signature = 1;
if (strncmp(str, "s4_nohwsig", 10) == 0)
acpi_check_s4_hw_signature(0);
acpi_check_s4_hw_signature = 0;
#endif
if (strncmp(str, "nonvs", 5) == 0)
acpi_nvs_nosave();
......@@ -160,3 +161,21 @@ static int __init acpi_sleep_setup(char *str)
}
__setup("acpi_sleep=", acpi_sleep_setup);
#if defined(CONFIG_HIBERNATION) && defined(CONFIG_HYPERVISOR_GUEST)
static int __init init_s4_sigcheck(void)
{
/*
* If running on a hypervisor, honour the ACPI specification
* by default and trigger a clean reboot when the hardware
* signature in FACS is changed after hibernation.
*/
if (acpi_check_s4_hw_signature == -1 &&
!hypervisor_is_type(X86_HYPER_NATIVE))
acpi_check_s4_hw_signature = 1;
return 0;
}
/* This must happen before acpi_init() which is a subsys initcall */
arch_initcall(init_s4_sigcheck);
#endif
......@@ -1080,6 +1080,11 @@ static int flatten_lpi_states(struct acpi_processor *pr,
return 0;
}
int __weak acpi_processor_ffh_lpi_probe(unsigned int cpu)
{
return -EOPNOTSUPP;
}
static int acpi_processor_get_lpi_info(struct acpi_processor *pr)
{
int ret, i;
......@@ -1088,6 +1093,11 @@ static int acpi_processor_get_lpi_info(struct acpi_processor *pr)
struct acpi_device *d = NULL;
struct acpi_lpi_states_array info[2], *tmp, *prev, *curr;
/* make sure our architecture has support */
ret = acpi_processor_ffh_lpi_probe(pr->id);
if (ret == -EOPNOTSUPP)
return ret;
if (!osc_pc_lpi_support_confirmed)
return -EOPNOTSUPP;
......@@ -1139,11 +1149,6 @@ static int acpi_processor_get_lpi_info(struct acpi_processor *pr)
return 0;
}
int __weak acpi_processor_ffh_lpi_probe(unsigned int cpu)
{
return -ENODEV;
}
int __weak acpi_processor_ffh_lpi_enter(struct acpi_lpi_state *lpi)
{
return -ENODEV;
......
......@@ -871,12 +871,7 @@ static inline void acpi_sleep_syscore_init(void) {}
#ifdef CONFIG_HIBERNATION
static unsigned long s4_hardware_signature;
static struct acpi_table_facs *facs;
static int sigcheck = -1; /* Default behaviour is just to warn */
void __init acpi_check_s4_hw_signature(int check)
{
sigcheck = check;
}
int acpi_check_s4_hw_signature = -1; /* Default behaviour is just to warn */
static int acpi_hibernation_begin(pm_message_t stage)
{
......@@ -1001,7 +996,7 @@ static void acpi_sleep_hibernate_setup(void)
hibernation_set_ops(old_suspend_ordering ?
&acpi_hibernation_ops_old : &acpi_hibernation_ops);
sleep_states[ACPI_STATE_S4] = 1;
if (!sigcheck)
if (!acpi_check_s4_hw_signature)
return;
acpi_get_table(ACPI_SIG_FACS, 1, (struct acpi_table_header **)&facs);
......@@ -1013,7 +1008,7 @@ static void acpi_sleep_hibernate_setup(void)
*/
s4_hardware_signature = facs->hardware_signature;
if (sigcheck > 0) {
if (acpi_check_s4_hw_signature > 0) {
/*
* If we're actually obeying the ACPI specification
* then the signature is written out as part of the
......
......@@ -636,6 +636,18 @@ static int genpd_power_off(struct generic_pm_domain *genpd, bool one_dev_on,
atomic_read(&genpd->sd_count) > 0)
return -EBUSY;
/*
* The children must be in their deepest (powered-off) states to allow
* the parent to be powered off. Note that, there's no need for
* additional locking, as powering on a child, requires the parent's
* lock to be acquired first.
*/
list_for_each_entry(link, &genpd->parent_links, parent_node) {
struct generic_pm_domain *child = link->child;
if (child->state_idx < child->state_count - 1)
return -EBUSY;
}
list_for_each_entry(pdd, &genpd->dev_list, list_node) {
enum pm_qos_flags_status stat;
......@@ -1073,6 +1085,13 @@ static void genpd_sync_power_off(struct generic_pm_domain *genpd, bool use_lock,
|| atomic_read(&genpd->sd_count) > 0)
return;
/* Check that the children are in their deepest (powered-off) state. */
list_for_each_entry(link, &genpd->parent_links, parent_node) {
struct generic_pm_domain *child = link->child;
if (child->state_idx < child->state_count - 1)
return;
}
/* Choose the deepest state when suspending */
genpd->state_idx = genpd->state_count - 1;
if (_genpd_power_off(genpd, false))
......@@ -2058,9 +2077,9 @@ static int genpd_remove(struct generic_pm_domain *genpd)
kfree(link);
}
genpd_debug_remove(genpd);
list_del(&genpd->gpd_list_node);
genpd_unlock(genpd);
genpd_debug_remove(genpd);
cancel_work_sync(&genpd->power_off_work);
if (genpd_is_cpu_domain(genpd))
free_cpumask_var(genpd->cpus);
......@@ -2248,12 +2267,8 @@ int of_genpd_add_provider_simple(struct device_node *np,
/* Parse genpd OPP table */
if (genpd->set_performance_state) {
ret = dev_pm_opp_of_add_table(&genpd->dev);
if (ret) {
if (ret != -EPROBE_DEFER)
dev_err(&genpd->dev, "Failed to add OPP table: %d\n",
ret);
return ret;
}
if (ret)
return dev_err_probe(&genpd->dev, ret, "Failed to add OPP table\n");
/*
* Save table for faster processing while setting performance
......@@ -2312,9 +2327,8 @@ int of_genpd_add_provider_onecell(struct device_node *np,
if (genpd->set_performance_state) {
ret = dev_pm_opp_of_add_table_indexed(&genpd->dev, i);
if (ret) {
if (ret != -EPROBE_DEFER)
dev_err(&genpd->dev, "Failed to add OPP table for index %d: %d\n",
i, ret);
dev_err_probe(&genpd->dev, ret,
"Failed to add OPP table for index %d\n", i);
goto error;
}
......@@ -2672,12 +2686,8 @@ static int __genpd_dev_pm_attach(struct device *dev, struct device *base_dev,
ret = genpd_add_device(pd, dev, base_dev);
mutex_unlock(&gpd_list_lock);
if (ret < 0) {
if (ret != -EPROBE_DEFER)
dev_err(dev, "failed to add to PM domain %s: %d",
pd->name, ret);
return ret;
}
if (ret < 0)
return dev_err_probe(dev, ret, "failed to add to PM domain %s\n", pd->name);
dev->pm_domain->detach = genpd_dev_pm_detach;
dev->pm_domain->sync = genpd_dev_pm_sync;
......
......@@ -485,7 +485,7 @@ static int dpm_run_callback(pm_callback_t cb, struct device *dev,
trace_device_pm_callback_start(dev, info, state.event);
error = cb(dev);
trace_device_pm_callback_end(dev, error);
suspend_report_result(cb, error);
suspend_report_result(dev, cb, error);
initcall_debug_report(dev, calltime, cb, error);
......@@ -1568,7 +1568,7 @@ static int legacy_suspend(struct device *dev, pm_message_t state,
trace_device_pm_callback_start(dev, info, state.event);
error = cb(dev, state);
trace_device_pm_callback_end(dev, error);
suspend_report_result(cb, error);
suspend_report_result(dev, cb, error);
initcall_debug_report(dev, calltime, cb, error);
......@@ -1855,7 +1855,7 @@ static int device_prepare(struct device *dev, pm_message_t state)
device_unlock(dev);
if (ret < 0) {
suspend_report_result(callback, ret);
suspend_report_result(dev, callback, ret);
pm_runtime_put(dev);
return ret;
}
......@@ -1960,10 +1960,10 @@ int dpm_suspend_start(pm_message_t state)
}
EXPORT_SYMBOL_GPL(dpm_suspend_start);
void __suspend_report_result(const char *function, void *fn, int ret)
void __suspend_report_result(const char *function, struct device *dev, void *fn, int ret)
{
if (ret)
pr_err("%s(): %pS returns %d\n", function, fn, ret);
dev_err(dev, "%s(): %pS returns %d\n", function, fn, ret);
}
EXPORT_SYMBOL_GPL(__suspend_report_result);
......@@ -2018,7 +2018,9 @@ static bool pm_ops_is_empty(const struct dev_pm_ops *ops)
void device_pm_check_callbacks(struct device *dev)
{
spin_lock_irq(&dev->power.lock);
unsigned long flags;
spin_lock_irqsave(&dev->power.lock, flags);
dev->power.no_pm_callbacks =
(!dev->bus || (pm_ops_is_empty(dev->bus->pm) &&
!dev->bus->suspend && !dev->bus->resume)) &&
......@@ -2027,7 +2029,7 @@ void device_pm_check_callbacks(struct device *dev)
(!dev->pm_domain || pm_ops_is_empty(&dev->pm_domain->ops)) &&
(!dev->driver || (pm_ops_is_empty(dev->driver->pm) &&
!dev->driver->suspend && !dev->driver->resume));
spin_unlock_irq(&dev->power.lock);
spin_unlock_irqrestore(&dev->power.lock, flags);
}
bool dev_pm_skip_suspend(struct device *dev)
......
......@@ -1476,11 +1476,16 @@ EXPORT_SYMBOL_GPL(pm_runtime_enable);
static void pm_runtime_disable_action(void *data)
{
pm_runtime_dont_use_autosuspend(data);
pm_runtime_disable(data);
}
/**
* devm_pm_runtime_enable - devres-enabled version of pm_runtime_enable.
*
* NOTE: this will also handle calling pm_runtime_dont_use_autosuspend() for
* you at driver exit time if needed.
*
* @dev: Device to handle.
*/
int devm_pm_runtime_enable(struct device *dev)
......
......@@ -289,7 +289,7 @@ EXPORT_SYMBOL_GPL(dev_pm_disable_wake_irq);
*
* Enables wakeirq conditionally. We need to enable wake-up interrupt
* lazily on the first rpm_suspend(). This is needed as the consumer device
* starts in RPM_SUSPENDED state, and the the first pm_runtime_get() would
* starts in RPM_SUSPENDED state, and the first pm_runtime_get() would
* otherwise try to disable already disabled wakeirq. The wake-up interrupt
* starts disabled with IRQ_NOAUTOEN set.
*
......
......@@ -587,7 +587,7 @@ static bool wakeup_source_not_registered(struct wakeup_source *ws)
* @ws: Wakeup source to handle.
*
* Update the @ws' statistics and, if @ws has just been activated, notify the PM
* core of the event by incrementing the counter of of wakeup events being
* core of the event by incrementing the counter of the wakeup events being
* processed.
*/
static void wakeup_source_activate(struct wakeup_source *ws)
......@@ -733,7 +733,7 @@ static void wakeup_source_deactivate(struct wakeup_source *ws)
/*
* Increment the counter of registered wakeup events and decrement the
* couter of wakeup events in progress simultaneously.
* counter of wakeup events in progress simultaneously.
*/
cec = atomic_add_return(MAX_IN_PROGRESS, &combined_event_count);
trace_wakeup_source_deactivate(ws->name, cec);
......
......@@ -27,6 +27,10 @@ TRACE_EVENT(amd_pstate_perf,
TP_PROTO(unsigned long min_perf,
unsigned long target_perf,
unsigned long capacity,
u64 freq,
u64 mperf,
u64 aperf,
u64 tsc,
unsigned int cpu_id,
bool changed,
bool fast_switch
......@@ -35,6 +39,10 @@ TRACE_EVENT(amd_pstate_perf,
TP_ARGS(min_perf,
target_perf,
capacity,
freq,
mperf,
aperf,
tsc,
cpu_id,
changed,
fast_switch
......@@ -44,6 +52,10 @@ TRACE_EVENT(amd_pstate_perf,
__field(unsigned long, min_perf)
__field(unsigned long, target_perf)
__field(unsigned long, capacity)
__field(unsigned long long, freq)
__field(unsigned long long, mperf)
__field(unsigned long long, aperf)
__field(unsigned long long, tsc)
__field(unsigned int, cpu_id)
__field(bool, changed)
__field(bool, fast_switch)
......@@ -53,15 +65,23 @@ TRACE_EVENT(amd_pstate_perf,
__entry->min_perf = min_perf;
__entry->target_perf = target_perf;
__entry->capacity = capacity;
__entry->freq = freq;
__entry->mperf = mperf;
__entry->aperf = aperf;
__entry->tsc = tsc;
__entry->cpu_id = cpu_id;
__entry->changed = changed;
__entry->fast_switch = fast_switch;
),
TP_printk("amd_min_perf=%lu amd_des_perf=%lu amd_max_perf=%lu cpu_id=%u changed=%s fast_switch=%s",
TP_printk("amd_min_perf=%lu amd_des_perf=%lu amd_max_perf=%lu freq=%llu mperf=%llu aperf=%llu tsc=%llu cpu_id=%u changed=%s fast_switch=%s",
(unsigned long)__entry->min_perf,
(unsigned long)__entry->target_perf,
(unsigned long)__entry->capacity,
(unsigned long long)__entry->freq,
(unsigned long long)__entry->mperf,
(unsigned long long)__entry->aperf,
(unsigned long long)__entry->tsc,
(unsigned int)__entry->cpu_id,
(__entry->changed) ? "true" : "false",
(__entry->fast_switch) ? "true" : "false"
......
......@@ -65,6 +65,18 @@ MODULE_PARM_DESC(shared_mem,
static struct cpufreq_driver amd_pstate_driver;
/**
* struct amd_aperf_mperf
* @aperf: actual performance frequency clock count
* @mperf: maximum performance frequency clock count
* @tsc: time stamp counter
*/
struct amd_aperf_mperf {
u64 aperf;
u64 mperf;
u64 tsc;
};
/**
* struct amd_cpudata - private CPU data for AMD P-State
* @cpu: CPU number
......@@ -81,6 +93,9 @@ static struct cpufreq_driver amd_pstate_driver;
* @min_freq: the frequency that mapped to lowest_perf
* @nominal_freq: the frequency that mapped to nominal_perf
* @lowest_nonlinear_freq: the frequency that mapped to lowest_nonlinear_perf
* @cur: Difference of Aperf/Mperf/tsc count between last and current sample
* @prev: Last Aperf/Mperf/tsc count value read from register
* @freq: current cpu frequency value
* @boost_supported: check whether the Processor or SBIOS supports boost mode
*
* The amd_cpudata is key private data for each CPU thread in AMD P-State, and
......@@ -102,6 +117,10 @@ struct amd_cpudata {
u32 nominal_freq;
u32 lowest_nonlinear_freq;
struct amd_aperf_mperf cur;
struct amd_aperf_mperf prev;
u64 freq;
bool boost_supported;
};
......@@ -211,6 +230,39 @@ static inline void amd_pstate_update_perf(struct amd_cpudata *cpudata,
max_perf, fast_switch);
}
static inline bool amd_pstate_sample(struct amd_cpudata *cpudata)
{
u64 aperf, mperf, tsc;
unsigned long flags;
local_irq_save(flags);
rdmsrl(MSR_IA32_APERF, aperf);
rdmsrl(MSR_IA32_MPERF, mperf);
tsc = rdtsc();
if (cpudata->prev.mperf == mperf || cpudata->prev.tsc == tsc) {
local_irq_restore(flags);
return false;
}
local_irq_restore(flags);
cpudata->cur.aperf = aperf;
cpudata->cur.mperf = mperf;
cpudata->cur.tsc = tsc;
cpudata->cur.aperf -= cpudata->prev.aperf;
cpudata->cur.mperf -= cpudata->prev.mperf;
cpudata->cur.tsc -= cpudata->prev.tsc;
cpudata->prev.aperf = aperf;
cpudata->prev.mperf = mperf;
cpudata->prev.tsc = tsc;
cpudata->freq = div64_u64((cpudata->cur.aperf * cpu_khz), cpudata->cur.mperf);
return true;
}
static void amd_pstate_update(struct amd_cpudata *cpudata, u32 min_perf,
u32 des_perf, u32 max_perf, bool fast_switch)
{
......@@ -226,8 +278,11 @@ static void amd_pstate_update(struct amd_cpudata *cpudata, u32 min_perf,
value &= ~AMD_CPPC_MAX_PERF(~0L);
value |= AMD_CPPC_MAX_PERF(max_perf);
trace_amd_pstate_perf(min_perf, des_perf, max_perf,
cpudata->cpu, (value != prev), fast_switch);
if (trace_amd_pstate_perf_enabled() && amd_pstate_sample(cpudata)) {
trace_amd_pstate_perf(min_perf, des_perf, max_perf, cpudata->freq,
cpudata->cur.mperf, cpudata->cur.aperf, cpudata->cur.tsc,
cpudata->cpu, (value != prev), fast_switch);
}
if (value == prev)
return;
......
......@@ -146,7 +146,7 @@ static unsigned int cs_dbs_update(struct cpufreq_policy *policy)
/************************** sysfs interface ************************/
static ssize_t store_sampling_down_factor(struct gov_attr_set *attr_set,
static ssize_t sampling_down_factor_store(struct gov_attr_set *attr_set,
const char *buf, size_t count)
{
struct dbs_data *dbs_data = to_dbs_data(attr_set);
......@@ -161,7 +161,7 @@ static ssize_t store_sampling_down_factor(struct gov_attr_set *attr_set,
return count;
}
static ssize_t store_up_threshold(struct gov_attr_set *attr_set,
static ssize_t up_threshold_store(struct gov_attr_set *attr_set,
const char *buf, size_t count)
{
struct dbs_data *dbs_data = to_dbs_data(attr_set);
......@@ -177,7 +177,7 @@ static ssize_t store_up_threshold(struct gov_attr_set *attr_set,
return count;
}
static ssize_t store_down_threshold(struct gov_attr_set *attr_set,
static ssize_t down_threshold_store(struct gov_attr_set *attr_set,
const char *buf, size_t count)
{
struct dbs_data *dbs_data = to_dbs_data(attr_set);
......@@ -195,7 +195,7 @@ static ssize_t store_down_threshold(struct gov_attr_set *attr_set,
return count;
}
static ssize_t store_ignore_nice_load(struct gov_attr_set *attr_set,
static ssize_t ignore_nice_load_store(struct gov_attr_set *attr_set,
const char *buf, size_t count)
{
struct dbs_data *dbs_data = to_dbs_data(attr_set);
......@@ -220,7 +220,7 @@ static ssize_t store_ignore_nice_load(struct gov_attr_set *attr_set,
return count;
}
static ssize_t store_freq_step(struct gov_attr_set *attr_set, const char *buf,
static ssize_t freq_step_store(struct gov_attr_set *attr_set, const char *buf,
size_t count)
{
struct dbs_data *dbs_data = to_dbs_data(attr_set);
......
......@@ -27,7 +27,7 @@ static DEFINE_MUTEX(gov_dbs_data_mutex);
/* Common sysfs tunables */
/*
* store_sampling_rate - update sampling rate effective immediately if needed.
* sampling_rate_store - update sampling rate effective immediately if needed.
*
* If new rate is smaller than the old, simply updating
* dbs.sampling_rate might not be appropriate. For example, if the
......@@ -41,7 +41,7 @@ static DEFINE_MUTEX(gov_dbs_data_mutex);
* This must be called with dbs_data->mutex held, otherwise traversing
* policy_dbs_list isn't safe.
*/
ssize_t store_sampling_rate(struct gov_attr_set *attr_set, const char *buf,
ssize_t sampling_rate_store(struct gov_attr_set *attr_set, const char *buf,
size_t count)
{
struct dbs_data *dbs_data = to_dbs_data(attr_set);
......@@ -80,7 +80,7 @@ ssize_t store_sampling_rate(struct gov_attr_set *attr_set, const char *buf,
return count;
}
EXPORT_SYMBOL_GPL(store_sampling_rate);
EXPORT_SYMBOL_GPL(sampling_rate_store);
/**
* gov_update_cpu_data - Update CPU load data.
......
......@@ -51,7 +51,7 @@ static inline struct dbs_data *to_dbs_data(struct gov_attr_set *attr_set)
}
#define gov_show_one(_gov, file_name) \
static ssize_t show_##file_name \
static ssize_t file_name##_show \
(struct gov_attr_set *attr_set, char *buf) \
{ \
struct dbs_data *dbs_data = to_dbs_data(attr_set); \
......@@ -60,7 +60,7 @@ static ssize_t show_##file_name \
}
#define gov_show_one_common(file_name) \
static ssize_t show_##file_name \
static ssize_t file_name##_show \
(struct gov_attr_set *attr_set, char *buf) \
{ \
struct dbs_data *dbs_data = to_dbs_data(attr_set); \
......@@ -68,12 +68,10 @@ static ssize_t show_##file_name \
}
#define gov_attr_ro(_name) \
static struct governor_attr _name = \
__ATTR(_name, 0444, show_##_name, NULL)
static struct governor_attr _name = __ATTR_RO(_name)
#define gov_attr_rw(_name) \
static struct governor_attr _name = \
__ATTR(_name, 0644, show_##_name, store_##_name)
static struct governor_attr _name = __ATTR_RW(_name)
/* Common to all CPUs of a policy */
struct policy_dbs_info {
......@@ -176,7 +174,7 @@ void od_register_powersave_bias_handler(unsigned int (*f)
(struct cpufreq_policy *, unsigned int, unsigned int),
unsigned int powersave_bias);
void od_unregister_powersave_bias_handler(void);
ssize_t store_sampling_rate(struct gov_attr_set *attr_set, const char *buf,
ssize_t sampling_rate_store(struct gov_attr_set *attr_set, const char *buf,
size_t count);
void gov_update_cpu_data(struct dbs_data *dbs_data);
#endif /* _CPUFREQ_GOVERNOR_H */
......@@ -8,11 +8,6 @@
#include "cpufreq_governor.h"
static inline struct gov_attr_set *to_gov_attr_set(struct kobject *kobj)
{
return container_of(kobj, struct gov_attr_set, kobj);
}
static inline struct governor_attr *to_gov_attr(struct attribute *attr)
{
return container_of(attr, struct governor_attr, attr);
......
......@@ -202,7 +202,7 @@ static unsigned int od_dbs_update(struct cpufreq_policy *policy)
/************************** sysfs interface ************************/
static struct dbs_governor od_dbs_gov;
static ssize_t store_io_is_busy(struct gov_attr_set *attr_set, const char *buf,
static ssize_t io_is_busy_store(struct gov_attr_set *attr_set, const char *buf,
size_t count)
{
struct dbs_data *dbs_data = to_dbs_data(attr_set);
......@@ -220,7 +220,7 @@ static ssize_t store_io_is_busy(struct gov_attr_set *attr_set, const char *buf,
return count;
}
static ssize_t store_up_threshold(struct gov_attr_set *attr_set,
static ssize_t up_threshold_store(struct gov_attr_set *attr_set,
const char *buf, size_t count)
{
struct dbs_data *dbs_data = to_dbs_data(attr_set);
......@@ -237,7 +237,7 @@ static ssize_t store_up_threshold(struct gov_attr_set *attr_set,
return count;
}
static ssize_t store_sampling_down_factor(struct gov_attr_set *attr_set,
static ssize_t sampling_down_factor_store(struct gov_attr_set *attr_set,
const char *buf, size_t count)
{
struct dbs_data *dbs_data = to_dbs_data(attr_set);
......@@ -265,7 +265,7 @@ static ssize_t store_sampling_down_factor(struct gov_attr_set *attr_set,
return count;
}
static ssize_t store_ignore_nice_load(struct gov_attr_set *attr_set,
static ssize_t ignore_nice_load_store(struct gov_attr_set *attr_set,
const char *buf, size_t count)
{
struct dbs_data *dbs_data = to_dbs_data(attr_set);
......@@ -290,7 +290,7 @@ static ssize_t store_ignore_nice_load(struct gov_attr_set *attr_set,
return count;
}
static ssize_t store_powersave_bias(struct gov_attr_set *attr_set,
static ssize_t powersave_bias_store(struct gov_attr_set *attr_set,
const char *buf, size_t count)
{
struct dbs_data *dbs_data = to_dbs_data(attr_set);
......
......@@ -1692,6 +1692,37 @@ static void intel_pstate_enable_hwp_interrupt(struct cpudata *cpudata)
}
}
static void intel_pstate_update_epp_defaults(struct cpudata *cpudata)
{
cpudata->epp_default = intel_pstate_get_epp(cpudata, 0);
/*
* If this CPU gen doesn't call for change in balance_perf
* EPP return.
*/
if (epp_values[EPP_INDEX_BALANCE_PERFORMANCE] == HWP_EPP_BALANCE_PERFORMANCE)
return;
/*
* If powerup EPP is something other than chipset default 0x80 and
* - is more performance oriented than 0x80 (default balance_perf EPP)
* - But less performance oriented than performance EPP
* then use this as new balance_perf EPP.
*/
if (cpudata->epp_default < HWP_EPP_BALANCE_PERFORMANCE &&
cpudata->epp_default > HWP_EPP_PERFORMANCE) {
epp_values[EPP_INDEX_BALANCE_PERFORMANCE] = cpudata->epp_default;
return;
}
/*
* Use hard coded value per gen to update the balance_perf
* and default EPP.
*/
cpudata->epp_default = epp_values[EPP_INDEX_BALANCE_PERFORMANCE];
intel_pstate_set_epp(cpudata, cpudata->epp_default);
}
static void intel_pstate_hwp_enable(struct cpudata *cpudata)
{
/* First disable HWP notification interrupt till we activate again */
......@@ -1705,12 +1736,7 @@ static void intel_pstate_hwp_enable(struct cpudata *cpudata)
if (cpudata->epp_default >= 0)
return;
if (epp_values[EPP_INDEX_BALANCE_PERFORMANCE] == HWP_EPP_BALANCE_PERFORMANCE) {
cpudata->epp_default = intel_pstate_get_epp(cpudata, 0);
} else {
cpudata->epp_default = epp_values[EPP_INDEX_BALANCE_PERFORMANCE];
intel_pstate_set_epp(cpudata, cpudata->epp_default);
}
intel_pstate_update_epp_defaults(cpudata);
}
static int atom_get_min_pstate(void)
......
......@@ -668,9 +668,9 @@ static acpi_status longhaul_walk_callback(acpi_handle obj_handle,
u32 nesting_level,
void *context, void **return_value)
{
struct acpi_device *d;
struct acpi_device *d = acpi_fetch_acpi_dev(obj_handle);
if (acpi_bus_get_device(obj_handle, &d))
if (!d)
return 0;
*return_value = acpi_driver_data(d);
......
......@@ -1172,14 +1172,14 @@ static int powernowk8_init(void)
unsigned int i, supported_cpus = 0;
int ret;
if (!x86_match_cpu(powernow_k8_ids))
return -ENODEV;
if (boot_cpu_has(X86_FEATURE_HW_PSTATE)) {
__request_acpi_cpufreq();
return -ENODEV;
}
if (!x86_match_cpu(powernow_k8_ids))
return -ENODEV;
cpus_read_lock();
for_each_online_cpu(i) {
smp_call_function_single(i, check_supported_cpu, &ret, 1);
......
......@@ -108,11 +108,11 @@ static int __init haltpoll_init(void)
if (boot_option_idle_override != IDLE_NO_OVERRIDE)
return -ENODEV;
cpuidle_poll_state_init(drv);
if (!kvm_para_available() || !haltpoll_want())
return -ENODEV;
cpuidle_poll_state_init(drv);
ret = cpuidle_register_driver(drv);
if (ret < 0)
return ret;
......
......@@ -64,6 +64,7 @@ static struct cpuidle_driver intel_idle_driver = {
/* intel_idle.max_cstate=0 disables driver */
static int max_cstate = CPUIDLE_STATE_MAX - 1;
static unsigned int disabled_states_mask;
static unsigned int preferred_states_mask;
static struct cpuidle_device __percpu *intel_idle_cpuidle_devices;
......@@ -121,9 +122,6 @@ static unsigned int mwait_substates __initdata;
* If the local APIC timer is not known to be reliable in the target idle state,
* enable one-shot tick broadcasting for the target CPU before executing MWAIT.
*
* Optionally call leave_mm() for the target CPU upfront to avoid wakeups due to
* flushing user TLBs.
*
* Must be called under local_irq_disable().
*/
static __cpuidle int intel_idle(struct cpuidle_device *dev,
......@@ -761,6 +759,46 @@ static struct cpuidle_state icx_cstates[] __initdata = {
.enter = NULL }
};
/*
* On Sapphire Rapids Xeon C1 has to be disabled if C1E is enabled, and vice
* versa. On SPR C1E is enabled only if "C1E promotion" bit is set in
* MSR_IA32_POWER_CTL. But in this case there effectively no C1, because C1
* requests are promoted to C1E. If the "C1E promotion" bit is cleared, then
* both C1 and C1E requests end up with C1, so there is effectively no C1E.
*
* By default we enable C1 and disable C1E by marking it with
* 'CPUIDLE_FLAG_UNUSABLE'.
*/
static struct cpuidle_state spr_cstates[] __initdata = {
{
.name = "C1",
.desc = "MWAIT 0x00",
.flags = MWAIT2flg(0x00),
.exit_latency = 1,
.target_residency = 1,
.enter = &intel_idle,
.enter_s2idle = intel_idle_s2idle, },
{
.name = "C1E",
.desc = "MWAIT 0x01",
.flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_ALWAYS_ENABLE |
CPUIDLE_FLAG_UNUSABLE,
.exit_latency = 2,
.target_residency = 4,
.enter = &intel_idle,
.enter_s2idle = intel_idle_s2idle, },
{
.name = "C6",
.desc = "MWAIT 0x20",
.flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED,
.exit_latency = 290,
.target_residency = 800,
.enter = &intel_idle,
.enter_s2idle = intel_idle_s2idle, },
{
.enter = NULL }
};
static struct cpuidle_state atom_cstates[] __initdata = {
{
.name = "C1E",
......@@ -1104,6 +1142,12 @@ static const struct idle_cpu idle_cpu_icx __initconst = {
.use_acpi = true,
};
static const struct idle_cpu idle_cpu_spr __initconst = {
.state_table = spr_cstates,
.disable_promotion_to_c1e = true,
.use_acpi = true,
};
static const struct idle_cpu idle_cpu_avn __initconst = {
.state_table = avn_cstates,
.disable_promotion_to_c1e = true,
......@@ -1166,6 +1210,7 @@ static const struct x86_cpu_id intel_idle_ids[] __initconst = {
X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE_X, &idle_cpu_skx),
X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X, &idle_cpu_icx),
X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_D, &idle_cpu_icx),
X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, &idle_cpu_spr),
X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNL, &idle_cpu_knl),
X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNM, &idle_cpu_knl),
X86_MATCH_INTEL_FAM6_MODEL(ATOM_GOLDMONT, &idle_cpu_bxt),
......@@ -1353,6 +1398,8 @@ static inline void intel_idle_init_cstates_acpi(struct cpuidle_driver *drv) { }
static inline bool intel_idle_off_by_default(u32 mwait_hint) { return false; }
#endif /* !CONFIG_ACPI_PROCESSOR_CSTATE */
static void c1e_promotion_enable(void);
/**
* ivt_idle_state_table_update - Tune the idle states table for Ivy Town.
*
......@@ -1523,6 +1570,41 @@ static void __init skx_idle_state_table_update(void)
}
}
/**
* spr_idle_state_table_update - Adjust Sapphire Rapids idle states table.
*/
static void __init spr_idle_state_table_update(void)
{
unsigned long long msr;
/* Check if user prefers C1E over C1. */
if (preferred_states_mask & BIT(2)) {
if (preferred_states_mask & BIT(1))
/* Both can't be enabled, stick to the defaults. */
return;
spr_cstates[0].flags |= CPUIDLE_FLAG_UNUSABLE;
spr_cstates[1].flags &= ~CPUIDLE_FLAG_UNUSABLE;
/* Enable C1E using the "C1E promotion" bit. */
c1e_promotion_enable();
disable_promotion_to_c1e = false;
}
/*
* By default, the C6 state assumes the worst-case scenario of package
* C6. However, if PC6 is disabled, we update the numbers to match
* core C6.
*/
rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, msr);
/* Limit value 2 and above allow for PC6. */
if ((msr & 0x7) < 2) {
spr_cstates[2].exit_latency = 190;
spr_cstates[2].target_residency = 600;
}
}
static bool __init intel_idle_verify_cstate(unsigned int mwait_hint)
{
unsigned int mwait_cstate = MWAIT_HINT2CSTATE(mwait_hint) + 1;
......@@ -1557,6 +1639,9 @@ static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
case INTEL_FAM6_SKYLAKE_X:
skx_idle_state_table_update();
break;
case INTEL_FAM6_SAPPHIRERAPIDS_X:
spr_idle_state_table_update();
break;
}
for (cstate = 0; cstate < CPUIDLE_STATE_MAX; ++cstate) {
......@@ -1629,6 +1714,15 @@ static void auto_demotion_disable(void)
wrmsrl(MSR_PKG_CST_CONFIG_CONTROL, msr_bits);
}
static void c1e_promotion_enable(void)
{
unsigned long long msr_bits;
rdmsrl(MSR_IA32_POWER_CTL, msr_bits);
msr_bits |= 0x2;
wrmsrl(MSR_IA32_POWER_CTL, msr_bits);
}
static void c1e_promotion_disable(void)
{
unsigned long long msr_bits;
......@@ -1798,3 +1892,14 @@ module_param(max_cstate, int, 0444);
*/
module_param_named(states_off, disabled_states_mask, uint, 0444);
MODULE_PARM_DESC(states_off, "Mask of disabled idle states");
/*
* Some platforms come with mutually exclusive C-states, so that if one is
* enabled, the other C-states must not be used. Example: C1 and C1E on
* Sapphire Rapids platform. This parameter allows for selecting the
* preferred C-states among the groups of mutually exclusive C-states - the
* selected C-states will be registered, the other C-states from the mutually
* exclusive group won't be registered. If the platform has no mutually
* exclusive C-states, this parameter has no effect.
*/
module_param_named(preferred_cstates, preferred_states_mask, uint, 0444);
MODULE_PARM_DESC(preferred_cstates, "Mask of preferred idle states");
......@@ -596,7 +596,7 @@ static int pci_legacy_suspend(struct device *dev, pm_message_t state)
int error;
error = drv->suspend(pci_dev, state);
suspend_report_result(drv->suspend, error);
suspend_report_result(dev, drv->suspend, error);
if (error)
return error;
......@@ -775,7 +775,7 @@ static int pci_pm_suspend(struct device *dev)
int error;
error = pm->suspend(dev);
suspend_report_result(pm->suspend, error);
suspend_report_result(dev, pm->suspend, error);
if (error)
return error;
......@@ -821,7 +821,7 @@ static int pci_pm_suspend_noirq(struct device *dev)
int error;
error = pm->suspend_noirq(dev);
suspend_report_result(pm->suspend_noirq, error);
suspend_report_result(dev, pm->suspend_noirq, error);
if (error)
return error;
......@@ -1010,7 +1010,7 @@ static int pci_pm_freeze(struct device *dev)
int error;
error = pm->freeze(dev);
suspend_report_result(pm->freeze, error);
suspend_report_result(dev, pm->freeze, error);
if (error)
return error;
}
......@@ -1030,7 +1030,7 @@ static int pci_pm_freeze_noirq(struct device *dev)
int error;
error = pm->freeze_noirq(dev);
suspend_report_result(pm->freeze_noirq, error);
suspend_report_result(dev, pm->freeze_noirq, error);
if (error)
return error;
}
......@@ -1116,7 +1116,7 @@ static int pci_pm_poweroff(struct device *dev)
int error;
error = pm->poweroff(dev);
suspend_report_result(pm->poweroff, error);
suspend_report_result(dev, pm->poweroff, error);
if (error)
return error;
}
......@@ -1154,7 +1154,7 @@ static int pci_pm_poweroff_noirq(struct device *dev)
int error;
error = pm->poweroff_noirq(dev);
suspend_report_result(pm->poweroff_noirq, error);
suspend_report_result(dev, pm->poweroff_noirq, error);
if (error)
return error;
}
......
......@@ -171,7 +171,7 @@ static int __pnp_bus_suspend(struct device *dev, pm_message_t state)
if (pnp_drv->driver.pm && pnp_drv->driver.pm->suspend) {
error = pnp_drv->driver.pm->suspend(dev);
suspend_report_result(pnp_drv->driver.pm->suspend, error);
suspend_report_result(dev, pnp_drv->driver.pm->suspend, error);
if (error)
return error;
}
......
......@@ -46,6 +46,7 @@ config IDLE_INJECT
config DTPM
bool "Power capping for Dynamic Thermal Power Management (EXPERIMENTAL)"
depends on OF
help
This enables support for the power capping for the dynamic
thermal power management userspace engine.
......@@ -56,4 +57,11 @@ config DTPM_CPU
help
This enables support for CPU power limitation based on
energy model.
config DTPM_DEVFREQ
bool "Add device power capping based on the energy model"
depends on DTPM && ENERGY_MODEL
help
This enables support for device power limitation based on
energy model.
endif
# SPDX-License-Identifier: GPL-2.0-only
obj-$(CONFIG_DTPM) += dtpm.o
obj-$(CONFIG_DTPM_CPU) += dtpm_cpu.o
obj-$(CONFIG_DTPM_DEVFREQ) += dtpm_devfreq.o
obj-$(CONFIG_POWERCAP) += powercap_sys.o
obj-$(CONFIG_INTEL_RAPL_CORE) += intel_rapl_common.o
obj-$(CONFIG_INTEL_RAPL) += intel_rapl_msr.o
......
......@@ -21,6 +21,7 @@
#include <linux/cpuhotplug.h>
#include <linux/dtpm.h>
#include <linux/energy_model.h>
#include <linux/of.h>
#include <linux/pm_qos.h>
#include <linux/slab.h>
#include <linux/units.h>
......@@ -150,10 +151,17 @@ static int update_pd_power_uw(struct dtpm *dtpm)
static void pd_release(struct dtpm *dtpm)
{
struct dtpm_cpu *dtpm_cpu = to_dtpm_cpu(dtpm);
struct cpufreq_policy *policy;
if (freq_qos_request_active(&dtpm_cpu->qos_req))
freq_qos_remove_request(&dtpm_cpu->qos_req);
policy = cpufreq_cpu_get(dtpm_cpu->cpu);
if (policy) {
for_each_cpu(dtpm_cpu->cpu, policy->related_cpus)
per_cpu(dtpm_per_cpu, dtpm_cpu->cpu) = NULL;
}
kfree(dtpm_cpu);
}
......@@ -176,6 +184,17 @@ static int cpuhp_dtpm_cpu_offline(unsigned int cpu)
}
static int cpuhp_dtpm_cpu_online(unsigned int cpu)
{
struct dtpm_cpu *dtpm_cpu;
dtpm_cpu = per_cpu(dtpm_per_cpu, cpu);
if (dtpm_cpu)
return dtpm_update_power(&dtpm_cpu->dtpm);
return 0;
}
static int __dtpm_cpu_setup(int cpu, struct dtpm *parent)
{
struct dtpm_cpu *dtpm_cpu;
struct cpufreq_policy *policy;
......@@ -183,6 +202,10 @@ static int cpuhp_dtpm_cpu_online(unsigned int cpu)
char name[CPUFREQ_NAME_LEN];
int ret = -ENOMEM;
dtpm_cpu = per_cpu(dtpm_per_cpu, cpu);
if (dtpm_cpu)
return 0;
policy = cpufreq_cpu_get(cpu);
if (!policy)
return 0;
......@@ -191,10 +214,6 @@ static int cpuhp_dtpm_cpu_online(unsigned int cpu)
if (!pd)
return -EINVAL;
dtpm_cpu = per_cpu(dtpm_per_cpu, cpu);
if (dtpm_cpu)
return dtpm_update_power(&dtpm_cpu->dtpm);
dtpm_cpu = kzalloc(sizeof(*dtpm_cpu), GFP_KERNEL);
if (!dtpm_cpu)
return -ENOMEM;
......@@ -207,7 +226,7 @@ static int cpuhp_dtpm_cpu_online(unsigned int cpu)
snprintf(name, sizeof(name), "cpu%d-cpufreq", dtpm_cpu->cpu);
ret = dtpm_register(name, &dtpm_cpu->dtpm, NULL);
ret = dtpm_register(name, &dtpm_cpu->dtpm, parent);
if (ret)
goto out_kfree_dtpm_cpu;
......@@ -231,7 +250,18 @@ static int cpuhp_dtpm_cpu_online(unsigned int cpu)
return ret;
}
static int __init dtpm_cpu_init(void)
static int dtpm_cpu_setup(struct dtpm *dtpm, struct device_node *np)
{
int cpu;
cpu = of_cpu_node_to_id(np);
if (cpu < 0)
return 0;
return __dtpm_cpu_setup(cpu, dtpm);
}
static int dtpm_cpu_init(void)
{
int ret;
......@@ -269,4 +299,15 @@ static int __init dtpm_cpu_init(void)
return 0;
}
DTPM_DECLARE(dtpm_cpu, dtpm_cpu_init);
static void dtpm_cpu_exit(void)
{
cpuhp_remove_state_nocalls(CPUHP_AP_ONLINE_DYN);
cpuhp_remove_state_nocalls(CPUHP_AP_DTPM_CPU_DEAD);
}
struct dtpm_subsys_ops dtpm_cpu_ops = {
.name = KBUILD_MODNAME,
.init = dtpm_cpu_init,
.exit = dtpm_cpu_exit,
.setup = dtpm_cpu_setup,
};
// SPDX-License-Identifier: GPL-2.0-only
/*
* Copyright 2021 Linaro Limited
*
* Author: Daniel Lezcano <daniel.lezcano@linaro.org>
*
* The devfreq device combined with the energy model and the load can
* give an estimation of the power consumption as well as limiting the
* power.
*
*/
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/cpumask.h>
#include <linux/devfreq.h>
#include <linux/dtpm.h>
#include <linux/energy_model.h>
#include <linux/of.h>
#include <linux/pm_qos.h>
#include <linux/slab.h>
#include <linux/units.h>
struct dtpm_devfreq {
struct dtpm dtpm;
struct dev_pm_qos_request qos_req;
struct devfreq *devfreq;
};
static struct dtpm_devfreq *to_dtpm_devfreq(struct dtpm *dtpm)
{
return container_of(dtpm, struct dtpm_devfreq, dtpm);
}
static int update_pd_power_uw(struct dtpm *dtpm)
{
struct dtpm_devfreq *dtpm_devfreq = to_dtpm_devfreq(dtpm);
struct devfreq *devfreq = dtpm_devfreq->devfreq;
struct device *dev = devfreq->dev.parent;
struct em_perf_domain *pd = em_pd_get(dev);
dtpm->power_min = pd->table[0].power;
dtpm->power_min *= MICROWATT_PER_MILLIWATT;
dtpm->power_max = pd->table[pd->nr_perf_states - 1].power;
dtpm->power_max *= MICROWATT_PER_MILLIWATT;
return 0;
}
static u64 set_pd_power_limit(struct dtpm *dtpm, u64 power_limit)
{
struct dtpm_devfreq *dtpm_devfreq = to_dtpm_devfreq(dtpm);
struct devfreq *devfreq = dtpm_devfreq->devfreq;
struct device *dev = devfreq->dev.parent;
struct em_perf_domain *pd = em_pd_get(dev);
unsigned long freq;
u64 power;
int i;
for (i = 0; i < pd->nr_perf_states; i++) {
power = pd->table[i].power * MICROWATT_PER_MILLIWATT;
if (power > power_limit)
break;
}
freq = pd->table[i - 1].frequency;
dev_pm_qos_update_request(&dtpm_devfreq->qos_req, freq);
power_limit = pd->table[i - 1].power * MICROWATT_PER_MILLIWATT;
return power_limit;
}
static void _normalize_load(struct devfreq_dev_status *status)
{
if (status->total_time > 0xfffff) {
status->total_time >>= 10;
status->busy_time >>= 10;
}
status->busy_time <<= 10;
status->busy_time /= status->total_time ? : 1;
status->busy_time = status->busy_time ? : 1;
status->total_time = 1024;
}
static u64 get_pd_power_uw(struct dtpm *dtpm)
{
struct dtpm_devfreq *dtpm_devfreq = to_dtpm_devfreq(dtpm);
struct devfreq *devfreq = dtpm_devfreq->devfreq;
struct device *dev = devfreq->dev.parent;
struct em_perf_domain *pd = em_pd_get(dev);
struct devfreq_dev_status status;
unsigned long freq;
u64 power;
int i;
mutex_lock(&devfreq->lock);
status = devfreq->last_status;
mutex_unlock(&devfreq->lock);
freq = DIV_ROUND_UP(status.current_frequency, HZ_PER_KHZ);
_normalize_load(&status);
for (i = 0; i < pd->nr_perf_states; i++) {
if (pd->table[i].frequency < freq)
continue;
power = pd->table[i].power * MICROWATT_PER_MILLIWATT;
power *= status.busy_time;
power >>= 10;
return power;
}
return 0;
}
static void pd_release(struct dtpm *dtpm)
{
struct dtpm_devfreq *dtpm_devfreq = to_dtpm_devfreq(dtpm);
if (dev_pm_qos_request_active(&dtpm_devfreq->qos_req))
dev_pm_qos_remove_request(&dtpm_devfreq->qos_req);
kfree(dtpm_devfreq);
}
static struct dtpm_ops dtpm_ops = {
.set_power_uw = set_pd_power_limit,
.get_power_uw = get_pd_power_uw,
.update_power_uw = update_pd_power_uw,
.release = pd_release,
};
static int __dtpm_devfreq_setup(struct devfreq *devfreq, struct dtpm *parent)
{
struct device *dev = devfreq->dev.parent;
struct dtpm_devfreq *dtpm_devfreq;
struct em_perf_domain *pd;
int ret = -ENOMEM;
pd = em_pd_get(dev);
if (!pd) {
ret = dev_pm_opp_of_register_em(dev, NULL);
if (ret) {
pr_err("No energy model available for '%s'\n", dev_name(dev));
return -EINVAL;
}
}
dtpm_devfreq = kzalloc(sizeof(*dtpm_devfreq), GFP_KERNEL);
if (!dtpm_devfreq)
return -ENOMEM;
dtpm_init(&dtpm_devfreq->dtpm, &dtpm_ops);
dtpm_devfreq->devfreq = devfreq;
ret = dtpm_register(dev_name(dev), &dtpm_devfreq->dtpm, parent);
if (ret) {
pr_err("Failed to register '%s': %d\n", dev_name(dev), ret);
kfree(dtpm_devfreq);
return ret;
}
ret = dev_pm_qos_add_request(dev, &dtpm_devfreq->qos_req,
DEV_PM_QOS_MAX_FREQUENCY,
PM_QOS_MAX_FREQUENCY_DEFAULT_VALUE);
if (ret) {
pr_err("Failed to add QoS request: %d\n", ret);
goto out_dtpm_unregister;
}
dtpm_update_power(&dtpm_devfreq->dtpm);
return 0;
out_dtpm_unregister:
dtpm_unregister(&dtpm_devfreq->dtpm);
return ret;
}
static int dtpm_devfreq_setup(struct dtpm *dtpm, struct device_node *np)
{
struct devfreq *devfreq;
devfreq = devfreq_get_devfreq_by_node(np);
if (IS_ERR(devfreq))
return 0;
return __dtpm_devfreq_setup(devfreq, dtpm);
}
struct dtpm_subsys_ops dtpm_devfreq_ops = {
.name = KBUILD_MODNAME,
.setup = dtpm_devfreq_setup,
};
/* SPDX-License-Identifier: GPL-2.0-only */
/*
* Copyright (C) 2022 Linaro Ltd
*
* Author: Daniel Lezcano <daniel.lezcano@linaro.org>
*/
#ifndef ___DTPM_SUBSYS_H__
#define ___DTPM_SUBSYS_H__
extern struct dtpm_subsys_ops dtpm_cpu_ops;
extern struct dtpm_subsys_ops dtpm_devfreq_ops;
struct dtpm_subsys_ops *dtpm_subsys[] = {
#ifdef CONFIG_DTPM_CPU
&dtpm_cpu_ops,
#endif
#ifdef CONFIG_DTPM_DEVFREQ
&dtpm_devfreq_ops,
#endif
};
#endif
......@@ -34,4 +34,12 @@ config ROCKCHIP_PM_DOMAINS
If unsure, say N.
config ROCKCHIP_DTPM
tristate "Rockchip DTPM hierarchy"
depends on DTPM && m
help
Describe the hierarchy for the Dynamic Thermal Power
Management tree on this platform. That will create all the
power capping capable devices.
endif
......@@ -5,3 +5,4 @@
obj-$(CONFIG_ROCKCHIP_GRF) += grf.o
obj-$(CONFIG_ROCKCHIP_IODOMAIN) += io-domain.o
obj-$(CONFIG_ROCKCHIP_PM_DOMAINS) += pm_domains.o
obj-$(CONFIG_ROCKCHIP_DTPM) += dtpm.o
// SPDX-License-Identifier: GPL-2.0-only
/*
* Copyright 2021 Linaro Limited
*
* Author: Daniel Lezcano <daniel.lezcano@linaro.org>
*
* DTPM hierarchy description
*/
#include <linux/dtpm.h>
#include <linux/module.h>
#include <linux/of.h>
#include <linux/platform_device.h>
static struct dtpm_node __initdata rk3399_hierarchy[] = {
[0]{ .name = "rk3399",
.type = DTPM_NODE_VIRTUAL },
[1]{ .name = "package",
.type = DTPM_NODE_VIRTUAL,
.parent = &rk3399_hierarchy[0] },
[2]{ .name = "/cpus/cpu@0",
.type = DTPM_NODE_DT,
.parent = &rk3399_hierarchy[1] },
[3]{ .name = "/cpus/cpu@1",
.type = DTPM_NODE_DT,
.parent = &rk3399_hierarchy[1] },
[4]{ .name = "/cpus/cpu@2",
.type = DTPM_NODE_DT,
.parent = &rk3399_hierarchy[1] },
[5]{ .name = "/cpus/cpu@3",
.type = DTPM_NODE_DT,
.parent = &rk3399_hierarchy[1] },
[6]{ .name = "/cpus/cpu@100",
.type = DTPM_NODE_DT,
.parent = &rk3399_hierarchy[1] },
[7]{ .name = "/cpus/cpu@101",
.type = DTPM_NODE_DT,
.parent = &rk3399_hierarchy[1] },
[8]{ .name = "/gpu@ff9a0000",
.type = DTPM_NODE_DT,
.parent = &rk3399_hierarchy[1] },
[9]{ /* sentinel */ }
};
static struct of_device_id __initdata rockchip_dtpm_match_table[] = {
{ .compatible = "rockchip,rk3399", .data = rk3399_hierarchy },
{},
};
static int __init rockchip_dtpm_init(void)
{
return dtpm_create_hierarchy(rockchip_dtpm_match_table);
}
module_init(rockchip_dtpm_init);
static void __exit rockchip_dtpm_exit(void)
{
return dtpm_destroy_hierarchy();
}
module_exit(rockchip_dtpm_exit);
MODULE_SOFTDEP("pre: panfrost cpufreq-dt");
MODULE_DESCRIPTION("Rockchip DTPM driver");
MODULE_LICENSE("GPL");
MODULE_ALIAS("platform:dtpm");
MODULE_AUTHOR("Daniel Lezcano <daniel.lezcano@kernel.org");
......@@ -446,7 +446,7 @@ static int suspend_common(struct device *dev, bool do_wakeup)
HCD_WAKEUP_PENDING(hcd->shared_hcd))
return -EBUSY;
retval = hcd->driver->pci_suspend(hcd, do_wakeup);
suspend_report_result(hcd->driver->pci_suspend, retval);
suspend_report_result(dev, hcd->driver->pci_suspend, retval);
/* Check again in case wakeup raced with pci_suspend */
if ((retval == 0 && do_wakeup && HCD_WAKEUP_PENDING(hcd)) ||
......@@ -556,7 +556,7 @@ static int hcd_pci_suspend_noirq(struct device *dev)
dev_dbg(dev, "--> PCI %s\n",
pci_power_name(pci_dev->current_state));
} else {
suspend_report_result(pci_prepare_to_sleep, retval);
suspend_report_result(dev, pci_prepare_to_sleep, retval);
return retval;
}
......
......@@ -321,16 +321,6 @@
#define THERMAL_TABLE(name)
#endif
#ifdef CONFIG_DTPM
#define DTPM_TABLE() \
. = ALIGN(8); \
__dtpm_table = .; \
KEEP(*(__dtpm_table)) \
__dtpm_table_end = .;
#else
#define DTPM_TABLE()
#endif
#define KERNEL_DTB() \
STRUCT_ALIGN(); \
__dtb_start = .; \
......@@ -723,7 +713,6 @@
ACPI_PROBE_TABLE(irqchip) \
ACPI_PROBE_TABLE(timer) \
THERMAL_TABLE(governor) \
DTPM_TABLE() \
EARLYCON_TABLE() \
LSM_TABLE() \
EARLY_LSM_TABLE() \
......
......@@ -526,7 +526,7 @@ acpi_status acpi_release_memory(acpi_handle handle, struct resource *res,
int acpi_resources_are_enforced(void);
#ifdef CONFIG_HIBERNATION
void __init acpi_check_s4_hw_signature(int check);
extern int acpi_check_s4_hw_signature;
#endif
#ifdef CONFIG_PM_SLEEP
......
......@@ -661,6 +661,11 @@ struct gov_attr_set {
/* sysfs ops for cpufreq governors */
extern const struct sysfs_ops governor_sysfs_ops;
static inline struct gov_attr_set *to_gov_attr_set(struct kobject *kobj)
{
return container_of(kobj, struct gov_attr_set, kobj);
}
void gov_attr_set_init(struct gov_attr_set *attr_set, struct list_head *list_node);
void gov_attr_set_get(struct gov_attr_set *attr_set, struct list_head *list_node);
unsigned int gov_attr_set_put(struct gov_attr_set *attr_set, struct list_head *list_node);
......
......@@ -32,28 +32,25 @@ struct dtpm_ops {
void (*release)(struct dtpm *);
};
typedef int (*dtpm_init_t)(void);
struct device_node;
struct dtpm_descr {
dtpm_init_t init;
struct dtpm_subsys_ops {
const char *name;
int (*init)(void);
void (*exit)(void);
int (*setup)(struct dtpm *, struct device_node *);
};
/* Init section thermal table */
extern struct dtpm_descr __dtpm_table[];
extern struct dtpm_descr __dtpm_table_end[];
#define DTPM_TABLE_ENTRY(name, __init) \
static struct dtpm_descr __dtpm_table_entry_##name \
__used __section("__dtpm_table") = { \
.init = __init, \
}
#define DTPM_DECLARE(name, init) DTPM_TABLE_ENTRY(name, init)
enum DTPM_NODE_TYPE {
DTPM_NODE_VIRTUAL = 0,
DTPM_NODE_DT,
};
#define for_each_dtpm_table(__dtpm) \
for (__dtpm = __dtpm_table; \
__dtpm < __dtpm_table_end; \
__dtpm++)
struct dtpm_node {
enum DTPM_NODE_TYPE type;
const char *name;
struct dtpm_node *parent;
};
static inline struct dtpm *to_dtpm(struct powercap_zone *zone)
{
......@@ -70,4 +67,7 @@ void dtpm_unregister(struct dtpm *dtpm);
int dtpm_register(const char *name, struct dtpm *dtpm, struct dtpm *parent);
int dtpm_create_hierarchy(struct of_device_id *dtpm_match_table);
void dtpm_destroy_hierarchy(void);
#endif
......@@ -770,11 +770,11 @@ extern int dpm_suspend_late(pm_message_t state);
extern int dpm_suspend(pm_message_t state);
extern int dpm_prepare(pm_message_t state);
extern void __suspend_report_result(const char *function, void *fn, int ret);
extern void __suspend_report_result(const char *function, struct device *dev, void *fn, int ret);
#define suspend_report_result(fn, ret) \
#define suspend_report_result(dev, fn, ret) \
do { \
__suspend_report_result(__func__, fn, ret); \
__suspend_report_result(__func__, dev, fn, ret); \
} while (0)
extern int device_pm_wait_for_dev(struct device *sub, struct device *dev);
......@@ -814,7 +814,7 @@ static inline int dpm_suspend_start(pm_message_t state)
return 0;
}
#define suspend_report_result(fn, ret) do {} while (0)
#define suspend_report_result(dev, fn, ret) do {} while (0)
static inline int device_pm_wait_for_dev(struct device *a, struct device *b)
{
......
......@@ -567,6 +567,10 @@ static inline void pm_runtime_disable(struct device *dev)
* Allow the runtime PM autosuspend mechanism to be used for @dev whenever
* requested (or "autosuspend" will be handled as direct runtime-suspend for
* it).
*
* NOTE: It's important to undo this with pm_runtime_dont_use_autosuspend()
* at driver exit time unless your driver initially enabled pm_runtime
* with devm_pm_runtime_enable() (which handles it for you).
*/
static inline void pm_runtime_use_autosuspend(struct device *dev)
{
......
......@@ -689,8 +689,10 @@ static int load_image_and_restore(void)
lock_device_hotplug();
error = create_basic_memory_bitmaps();
if (error)
if (error) {
swsusp_close(FMODE_READ | FMODE_EXCL);
goto Unlock;
}
error = swsusp_read(&flags);
swsusp_close(FMODE_READ | FMODE_EXCL);
......@@ -1328,7 +1330,7 @@ static int __init resumedelay_setup(char *str)
int rc = kstrtouint(str, 0, &resume_delay);
if (rc)
return rc;
pr_warn("resumedelay: bad option string '%s'\n", str);
return 1;
}
......
......@@ -157,22 +157,22 @@ static int __init setup_test_suspend(char *value)
value++;
suspend_type = strsep(&value, ",");
if (!suspend_type)
return 0;
return 1;
repeat = strsep(&value, ",");
if (repeat) {
if (kstrtou32(repeat, 0, &test_repeat_count_max))
return 0;
return 1;
}
for (i = PM_SUSPEND_MIN; i < PM_SUSPEND_MAX; i++)
if (!strcmp(pm_labels[i], suspend_type)) {
test_state_label = pm_labels[i];
return 0;
return 1;
}
printk(warn_bad_state, suspend_type);
return 0;
return 1;
}
__setup("test_suspend", setup_test_suspend);
......
......@@ -89,7 +89,7 @@ struct swap_map_page_list {
struct swap_map_page_list *next;
};
/**
/*
* The swap_map_handle structure is used for handling swap in
* a file-alike way
*/
......@@ -117,7 +117,7 @@ struct swsusp_header {
static struct swsusp_header *swsusp_header;
/**
/*
* The following functions are used for tracing the allocated
* swap pages, so that they can be freed in case of an error.
*/
......@@ -171,7 +171,7 @@ static int swsusp_extents_insert(unsigned long swap_offset)
return 0;
}
/**
/*
* alloc_swapdev_block - allocate a swap page and register that it has
* been allocated, so that it can be freed in case of an error.
*/
......@@ -190,7 +190,7 @@ sector_t alloc_swapdev_block(int swap)
return 0;
}
/**
/*
* free_all_swap_pages - free swap pages allocated for saving image data.
* It also frees the extents used to register which swap entries had been
* allocated.
......
......@@ -539,7 +539,7 @@ ATTRIBUTE_GROUPS(sugov);
static void sugov_tunables_free(struct kobject *kobj)
{
struct gov_attr_set *attr_set = container_of(kobj, struct gov_attr_set, kobj);
struct gov_attr_set *attr_set = to_gov_attr_set(kobj);
kfree(to_sugov_tunables(attr_set));
}
......
......@@ -143,9 +143,9 @@ UTIL_HEADERS = utils/helpers/helpers.h utils/idle_monitor/cpupower-monitor.h \
utils/helpers/bitmask.h \
utils/idle_monitor/idle_monitors.h utils/idle_monitor/idle_monitors.def
LIB_HEADERS = lib/cpufreq.h lib/cpupower.h lib/cpuidle.h
LIB_SRC = lib/cpufreq.c lib/cpupower.c lib/cpuidle.c
LIB_OBJS = lib/cpufreq.o lib/cpupower.o lib/cpuidle.o
LIB_HEADERS = lib/cpufreq.h lib/cpupower.h lib/cpuidle.h lib/acpi_cppc.h
LIB_SRC = lib/cpufreq.c lib/cpupower.c lib/cpuidle.c lib/acpi_cppc.c
LIB_OBJS = lib/cpufreq.o lib/cpupower.o lib/cpuidle.o lib/acpi_cppc.o
LIB_OBJS := $(addprefix $(OUTPUT),$(LIB_OBJS))
override CFLAGS += -pipe
......
// SPDX-License-Identifier: GPL-2.0-only
#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include "cpupower_intern.h"
#include "acpi_cppc.h"
/* ACPI CPPC sysfs access ***********************************************/
static int acpi_cppc_read_file(unsigned int cpu, const char *fname,
char *buf, size_t buflen)
{
char path[SYSFS_PATH_MAX];
snprintf(path, sizeof(path), PATH_TO_CPU "cpu%u/acpi_cppc/%s",
cpu, fname);
return cpupower_read_sysfs(path, buf, buflen);
}
static const char * const acpi_cppc_value_files[] = {
[HIGHEST_PERF] = "highest_perf",
[LOWEST_PERF] = "lowest_perf",
[NOMINAL_PERF] = "nominal_perf",
[LOWEST_NONLINEAR_PERF] = "lowest_nonlinear_perf",
[LOWEST_FREQ] = "lowest_freq",
[NOMINAL_FREQ] = "nominal_freq",
[REFERENCE_PERF] = "reference_perf",
[WRAPAROUND_TIME] = "wraparound_time"
};
unsigned long acpi_cppc_get_data(unsigned int cpu, enum acpi_cppc_value which)
{
unsigned long long value;
unsigned int len;
char linebuf[MAX_LINE_LEN];
char *endp;
if (which >= MAX_CPPC_VALUE_FILES)
return 0;
len = acpi_cppc_read_file(cpu, acpi_cppc_value_files[which],
linebuf, sizeof(linebuf));
if (len == 0)
return 0;
value = strtoull(linebuf, &endp, 0);
if (endp == linebuf || errno == ERANGE)
return 0;
return value;
}
/* SPDX-License-Identifier: GPL-2.0-only */
#ifndef __ACPI_CPPC_H__
#define __ACPI_CPPC_H__
enum acpi_cppc_value {
HIGHEST_PERF,
LOWEST_PERF,
NOMINAL_PERF,
LOWEST_NONLINEAR_PERF,
LOWEST_FREQ,
NOMINAL_FREQ,
REFERENCE_PERF,
WRAPAROUND_TIME,
MAX_CPPC_VALUE_FILES
};
unsigned long acpi_cppc_get_data(unsigned int cpu,
enum acpi_cppc_value which);
#endif /* __ACPI_CPPC_H__ */
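As a rough illustration of the new library call, a cpupower-internal user could read CPU 0's CPPC capabilities like this (the wrapper function here is hypothetical; the in-tree caller is the AMD P-state helper further down in this commit):

#include <stdio.h>

#include "acpi_cppc.h"

/* Hypothetical helper: print CPU 0's CPPC nominal and highest performance. */
static void print_cppc_caps(void)
{
	unsigned long nominal = acpi_cppc_get_data(0, NOMINAL_PERF);
	unsigned long highest = acpi_cppc_get_data(0, HIGHEST_PERF);

	/* acpi_cppc_get_data() returns 0 if the sysfs file is missing or unreadable. */
	printf("nominal_perf: %lu, highest_perf: %lu\n", nominal, highest);
}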
......@@ -83,20 +83,21 @@ static const char *cpufreq_value_files[MAX_CPUFREQ_VALUE_READ_FILES] = {
[STATS_NUM_TRANSITIONS] = "stats/total_trans"
};
static unsigned long sysfs_cpufreq_get_one_value(unsigned int cpu,
enum cpufreq_value which)
unsigned long cpufreq_get_sysfs_value_from_table(unsigned int cpu,
const char **table,
unsigned int index,
unsigned int size)
{
unsigned long value;
unsigned int len;
char linebuf[MAX_LINE_LEN];
char *endp;
if (which >= MAX_CPUFREQ_VALUE_READ_FILES)
if (!table || index >= size || !table[index])
return 0;
len = sysfs_cpufreq_read_file(cpu, cpufreq_value_files[which],
linebuf, sizeof(linebuf));
len = sysfs_cpufreq_read_file(cpu, table[index], linebuf,
sizeof(linebuf));
if (len == 0)
return 0;
......@@ -109,6 +110,14 @@ static unsigned long sysfs_cpufreq_get_one_value(unsigned int cpu,
return value;
}
static unsigned long sysfs_cpufreq_get_one_value(unsigned int cpu,
enum cpufreq_value which)
{
return cpufreq_get_sysfs_value_from_table(cpu, cpufreq_value_files,
which,
MAX_CPUFREQ_VALUE_READ_FILES);
}
/* read access to files which contain one string */
enum cpufreq_string {
......@@ -124,7 +133,7 @@ static const char *cpufreq_string_files[MAX_CPUFREQ_STRING_FILES] = {
static char *sysfs_cpufreq_get_one_string(unsigned int cpu,
enum cpufreq_string which)
enum cpufreq_string which)
{
char linebuf[MAX_LINE_LEN];
char *result;
......
......@@ -203,6 +203,18 @@ int cpufreq_modify_policy_governor(unsigned int cpu, char *governor);
int cpufreq_set_frequency(unsigned int cpu,
unsigned long target_frequency);
/*
 * get the sysfs value from a specific table
 *
 * Read the value for the sysfs file name at the given index in the table.
 * Only works if the cpufreq driver provides the corresponding sysfs
 * interfaces.
 */
unsigned long cpufreq_get_sysfs_value_from_table(unsigned int cpu,
const char **table,
unsigned int index,
unsigned int size);
#ifdef __cplusplus
}
#endif
......
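The generic table helper lets callers outside cpufreq.c reuse the per-CPU sysfs parsing; the amd_pstate_get_data() wrapper later in this commit is the in-tree user. A hypothetical caller with its own file table might look like:

#include "cpufreq.h"

/* Hypothetical table of driver-specific files under cpufreq/ for each CPU. */
static const char *example_value_files[] = {
	"example_min_perf",
	"example_max_perf",
};

static unsigned long example_get_max_perf(unsigned int cpu)
{
	/* Reads cpufreq/example_max_perf for the given CPU; 0 on any error. */
	return cpufreq_get_sysfs_value_from_table(cpu, example_value_files,
						  1, 2);
}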
......@@ -53,6 +53,9 @@ human\-readable output for the \-f, \-w, \-s and \-y parameters.
\fB\-n\fR \fB\-\-no-rounding\fR
Output frequencies and latencies without rounding off values.
.TP
\fB\-c\fR \fB\-\-perf\fR
Get performance and frequency capabilities from CPPC by reading them from the hardware (only available on hardware with CPPC support).
.TP
.SH "REMARKS"
.LP
By default only values of core zero are displayed. How to display settings of
......
......@@ -4,7 +4,7 @@
cpupower\-idle\-set \- Utility to set cpu idle state specific kernel options
.SH "SYNTAX"
.LP
cpupower [ \-c cpulist ] idle\-info [\fIoptions\fP]
cpupower [ \-c cpulist ] idle\-set [\fIoptions\fP]
.SH "DESCRIPTION"
.LP
The cpupower idle\-set subcommand allows to set cpu idle, also called cpu
......
......@@ -84,43 +84,6 @@ static void proc_cpufreq_output(void)
}
static int no_rounding;
static void print_speed(unsigned long speed)
{
unsigned long tmp;
if (no_rounding) {
if (speed > 1000000)
printf("%u.%06u GHz", ((unsigned int) speed/1000000),
((unsigned int) speed%1000000));
else if (speed > 1000)
printf("%u.%03u MHz", ((unsigned int) speed/1000),
(unsigned int) (speed%1000));
else
printf("%lu kHz", speed);
} else {
if (speed > 1000000) {
tmp = speed%10000;
if (tmp >= 5000)
speed += 10000;
printf("%u.%02u GHz", ((unsigned int) speed/1000000),
((unsigned int) (speed%1000000)/10000));
} else if (speed > 100000) {
tmp = speed%1000;
if (tmp >= 500)
speed += 1000;
printf("%u MHz", ((unsigned int) speed/1000));
} else if (speed > 1000) {
tmp = speed%100;
if (tmp >= 50)
speed += 100;
printf("%u.%01u MHz", ((unsigned int) speed/1000),
((unsigned int) (speed%1000)/100));
}
}
return;
}
static void print_duration(unsigned long duration)
{
unsigned long tmp;
......@@ -183,9 +146,12 @@ static int get_boost_mode_x86(unsigned int cpu)
printf(_(" Supported: %s\n"), support ? _("yes") : _("no"));
printf(_(" Active: %s\n"), active ? _("yes") : _("no"));
if ((cpupower_cpu_info.vendor == X86_VENDOR_AMD &&
cpupower_cpu_info.family >= 0x10) ||
cpupower_cpu_info.vendor == X86_VENDOR_HYGON) {
if (cpupower_cpu_info.vendor == X86_VENDOR_AMD &&
cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_PSTATE) {
return 0;
} else if ((cpupower_cpu_info.vendor == X86_VENDOR_AMD &&
cpupower_cpu_info.family >= 0x10) ||
cpupower_cpu_info.vendor == X86_VENDOR_HYGON) {
ret = decode_pstates(cpu, b_states, pstates, &pstate_no);
if (ret)
return ret;
......@@ -254,11 +220,11 @@ static int get_boost_mode(unsigned int cpu)
if (freqs) {
printf(_(" boost frequency steps: "));
while (freqs->next) {
print_speed(freqs->frequency);
print_speed(freqs->frequency, no_rounding);
printf(", ");
freqs = freqs->next;
}
print_speed(freqs->frequency);
print_speed(freqs->frequency, no_rounding);
printf("\n");
cpufreq_put_available_frequencies(freqs);
}
......@@ -277,7 +243,7 @@ static int get_freq_kernel(unsigned int cpu, unsigned int human)
return -EINVAL;
}
if (human) {
print_speed(freq);
print_speed(freq, no_rounding);
} else
printf("%lu", freq);
printf(_(" (asserted by call to kernel)\n"));
......@@ -296,7 +262,7 @@ static int get_freq_hardware(unsigned int cpu, unsigned int human)
return -EINVAL;
}
if (human) {
print_speed(freq);
print_speed(freq, no_rounding);
} else
printf("%lu", freq);
printf(_(" (asserted by call to hardware)\n"));
......@@ -316,9 +282,9 @@ static int get_hardware_limits(unsigned int cpu, unsigned int human)
if (human) {
printf(_(" hardware limits: "));
print_speed(min);
print_speed(min, no_rounding);
printf(" - ");
print_speed(max);
print_speed(max, no_rounding);
printf("\n");
} else {
printf("%lu %lu\n", min, max);
......@@ -350,9 +316,9 @@ static int get_policy(unsigned int cpu)
return -EINVAL;
}
printf(_(" current policy: frequency should be within "));
print_speed(policy->min);
print_speed(policy->min, no_rounding);
printf(_(" and "));
print_speed(policy->max);
print_speed(policy->max, no_rounding);
printf(".\n ");
printf(_("The governor \"%s\" may decide which speed to use\n"
......@@ -436,7 +402,7 @@ static int get_freq_stats(unsigned int cpu, unsigned int human)
struct cpufreq_stats *stats = cpufreq_get_stats(cpu, &total_time);
while (stats) {
if (human) {
print_speed(stats->frequency);
print_speed(stats->frequency, no_rounding);
printf(":%.2f%%",
(100.0 * stats->time_in_state) / total_time);
} else
......@@ -472,6 +438,17 @@ static int get_latency(unsigned int cpu, unsigned int human)
return 0;
}
/* --performance / -c */
static int get_perf_cap(unsigned int cpu)
{
if (cpupower_cpu_info.vendor == X86_VENDOR_AMD &&
cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_PSTATE)
amd_pstate_show_perf_and_freq(cpu, no_rounding);
return 0;
}
static void debug_output_one(unsigned int cpu)
{
struct cpufreq_available_frequencies *freqs;
......@@ -486,11 +463,11 @@ static void debug_output_one(unsigned int cpu)
if (freqs) {
printf(_(" available frequency steps: "));
while (freqs->next) {
print_speed(freqs->frequency);
print_speed(freqs->frequency, no_rounding);
printf(", ");
freqs = freqs->next;
}
print_speed(freqs->frequency);
print_speed(freqs->frequency, no_rounding);
printf("\n");
cpufreq_put_available_frequencies(freqs);
}
......@@ -500,6 +477,7 @@ static void debug_output_one(unsigned int cpu)
if (get_freq_hardware(cpu, 1) < 0)
get_freq_kernel(cpu, 1);
get_boost_mode(cpu);
get_perf_cap(cpu);
}
static struct option info_opts[] = {
......@@ -518,6 +496,7 @@ static struct option info_opts[] = {
{"proc", no_argument, NULL, 'o'},
{"human", no_argument, NULL, 'm'},
{"no-rounding", no_argument, NULL, 'n'},
{"performance", no_argument, NULL, 'c'},
{ },
};
......@@ -531,7 +510,7 @@ int cmd_freq_info(int argc, char **argv)
int output_param = 0;
do {
ret = getopt_long(argc, argv, "oefwldpgrasmybn", info_opts,
ret = getopt_long(argc, argv, "oefwldpgrasmybnc", info_opts,
NULL);
switch (ret) {
case '?':
......@@ -554,6 +533,7 @@ int cmd_freq_info(int argc, char **argv)
case 'e':
case 's':
case 'y':
case 'c':
if (output_param) {
output_param = -1;
cont = 0;
......@@ -660,6 +640,9 @@ int cmd_freq_info(int argc, char **argv)
case 'y':
ret = get_latency(cpu, human);
break;
case 'c':
ret = get_perf_cap(cpu);
break;
}
if (ret)
return ret;
......
......@@ -8,7 +8,10 @@
#include <pci/pci.h>
#include "helpers/helpers.h"
#include "cpufreq.h"
#include "acpi_cppc.h"
/* ACPI P-States Helper Functions for AMD Processors ***************/
#define MSR_AMD_PSTATE_STATUS 0xc0010063
#define MSR_AMD_PSTATE 0xc0010064
#define MSR_AMD_PSTATE_LIMIT 0xc0010061
......@@ -146,4 +149,78 @@ int amd_pci_get_num_boost_states(int *active, int *states)
pci_cleanup(pci_acc);
return 0;
}
/* ACPI P-States Helper Functions for AMD Processors ***************/
/* AMD P-State Helper Functions ************************************/
enum amd_pstate_value {
AMD_PSTATE_HIGHEST_PERF,
AMD_PSTATE_MAX_FREQ,
AMD_PSTATE_LOWEST_NONLINEAR_FREQ,
MAX_AMD_PSTATE_VALUE_READ_FILES,
};
static const char *amd_pstate_value_files[MAX_AMD_PSTATE_VALUE_READ_FILES] = {
[AMD_PSTATE_HIGHEST_PERF] = "amd_pstate_highest_perf",
[AMD_PSTATE_MAX_FREQ] = "amd_pstate_max_freq",
[AMD_PSTATE_LOWEST_NONLINEAR_FREQ] = "amd_pstate_lowest_nonlinear_freq",
};
static unsigned long amd_pstate_get_data(unsigned int cpu,
enum amd_pstate_value value)
{
return cpufreq_get_sysfs_value_from_table(cpu,
amd_pstate_value_files,
value,
MAX_AMD_PSTATE_VALUE_READ_FILES);
}
void amd_pstate_boost_init(unsigned int cpu, int *support, int *active)
{
unsigned long highest_perf, nominal_perf, cpuinfo_min,
cpuinfo_max, amd_pstate_max;
highest_perf = amd_pstate_get_data(cpu, AMD_PSTATE_HIGHEST_PERF);
nominal_perf = acpi_cppc_get_data(cpu, NOMINAL_PERF);
*support = highest_perf > nominal_perf ? 1 : 0;
if (!(*support))
return;
cpufreq_get_hardware_limits(cpu, &cpuinfo_min, &cpuinfo_max);
amd_pstate_max = amd_pstate_get_data(cpu, AMD_PSTATE_MAX_FREQ);
*active = cpuinfo_max == amd_pstate_max ? 1 : 0;
}
void amd_pstate_show_perf_and_freq(unsigned int cpu, int no_rounding)
{
printf(_(" AMD PSTATE Highest Performance: %lu. Maximum Frequency: "),
amd_pstate_get_data(cpu, AMD_PSTATE_HIGHEST_PERF));
/*
* If boost isn't active, the cpuinfo_max doesn't indicate real max
* frequency. So we read it back from amd-pstate sysfs entry.
*/
print_speed(amd_pstate_get_data(cpu, AMD_PSTATE_MAX_FREQ), no_rounding);
printf(".\n");
printf(_(" AMD PSTATE Nominal Performance: %lu. Nominal Frequency: "),
acpi_cppc_get_data(cpu, NOMINAL_PERF));
print_speed(acpi_cppc_get_data(cpu, NOMINAL_FREQ) * 1000,
no_rounding);
printf(".\n");
printf(_(" AMD PSTATE Lowest Non-linear Performance: %lu. Lowest Non-linear Frequency: "),
acpi_cppc_get_data(cpu, LOWEST_NONLINEAR_PERF));
print_speed(amd_pstate_get_data(cpu, AMD_PSTATE_LOWEST_NONLINEAR_FREQ),
no_rounding);
printf(".\n");
printf(_(" AMD PSTATE Lowest Performance: %lu. Lowest Frequency: "),
acpi_cppc_get_data(cpu, LOWEST_PERF));
print_speed(acpi_cppc_get_data(cpu, LOWEST_FREQ) * 1000, no_rounding);
printf(".\n");
}
/* AMD P-State Helper Functions ************************************/
#endif /* defined(__i386__) || defined(__x86_64__) */
......@@ -149,6 +149,19 @@ int get_cpu_info(struct cpupower_cpu_info *cpu_info)
if (ext_cpuid_level >= 0x80000008 &&
cpuid_ebx(0x80000008) & (1 << 4))
cpu_info->caps |= CPUPOWER_CAP_AMD_RDPRU;
if (cpupower_amd_pstate_enabled()) {
cpu_info->caps |= CPUPOWER_CAP_AMD_PSTATE;
/*
* If AMD P-State is enabled, the firmware will treat
* AMD P-State function as high priority.
*/
cpu_info->caps &= ~CPUPOWER_CAP_AMD_CPB;
cpu_info->caps &= ~CPUPOWER_CAP_AMD_CPB_MSR;
cpu_info->caps &= ~CPUPOWER_CAP_AMD_HW_PSTATE;
cpu_info->caps &= ~CPUPOWER_CAP_AMD_PSTATEDEF;
}
}
if (cpu_info->vendor == X86_VENDOR_INTEL) {
......
......@@ -11,6 +11,7 @@
#include <libintl.h>
#include <locale.h>
#include <stdbool.h>
#include "helpers/bitmask.h"
#include <cpupower.h>
......@@ -73,6 +74,7 @@ enum cpupower_cpu_vendor {X86_VENDOR_UNKNOWN = 0, X86_VENDOR_INTEL,
#define CPUPOWER_CAP_AMD_HW_PSTATE 0x00000100
#define CPUPOWER_CAP_AMD_PSTATEDEF 0x00000200
#define CPUPOWER_CAP_AMD_CPB_MSR 0x00000400
#define CPUPOWER_CAP_AMD_PSTATE 0x00000800
#define CPUPOWER_AMD_CPBDIS 0x02000000
......@@ -135,6 +137,16 @@ extern int decode_pstates(unsigned int cpu, int boost_states,
extern int cpufreq_has_boost_support(unsigned int cpu, int *support,
int *active, int * states);
/* AMD P-State stuff **************************/
bool cpupower_amd_pstate_enabled(void);
void amd_pstate_boost_init(unsigned int cpu,
int *support, int *active);
void amd_pstate_show_perf_and_freq(unsigned int cpu,
int no_rounding);
/* AMD P-State stuff **************************/
/*
* CPUID functions returning a single datum
*/
......@@ -167,6 +179,15 @@ static inline int cpufreq_has_boost_support(unsigned int cpu, int *support,
int *active, int * states)
{ return -1; }
static inline bool cpupower_amd_pstate_enabled(void)
{ return false; }
static inline void amd_pstate_boost_init(unsigned int cpu, int *support,
int *active)
{}
static inline void amd_pstate_show_perf_and_freq(unsigned int cpu,
int no_rounding)
{}
/* cpuid and cpuinfo helpers **************************/
static inline unsigned int cpuid_eax(unsigned int op) { return 0; };
......@@ -184,5 +205,6 @@ extern struct bitmask *offline_cpus;
void get_cpustate(void);
void print_online_cpus(void);
void print_offline_cpus(void);
void print_speed(unsigned long speed, int no_rounding);
#endif /* __CPUPOWERUTILS_HELPERS__ */
......@@ -3,9 +3,11 @@
#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include "helpers/helpers.h"
#include "helpers/sysfs.h"
#include "cpufreq.h"
#if defined(__i386__) || defined(__x86_64__)
......@@ -39,6 +41,8 @@ int cpufreq_has_boost_support(unsigned int cpu, int *support, int *active,
if (ret)
return ret;
}
} else if (cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_PSTATE) {
amd_pstate_boost_init(cpu, support, active);
} else if (cpupower_cpu_info.caps & CPUPOWER_CAP_INTEL_IDA)
*support = *active = 1;
return 0;
......@@ -83,6 +87,22 @@ int cpupower_intel_set_perf_bias(unsigned int cpu, unsigned int val)
return 0;
}
bool cpupower_amd_pstate_enabled(void)
{
char *driver = cpufreq_get_driver(0);
bool ret = false;
if (!driver)
return ret;
if (!strcmp(driver, "amd-pstate"))
ret = true;
cpufreq_put_driver(driver);
return ret;
}
#endif /* #if defined(__i386__) || defined(__x86_64__) */
/* get_cpustate
......@@ -144,3 +164,43 @@ void print_offline_cpus(void)
printf(_("cpupower set operation was not performed on them\n"));
}
}
/*
* print_speed
*
* Print the exact CPU frequency with appropriate unit
*/
void print_speed(unsigned long speed, int no_rounding)
{
unsigned long tmp;
if (no_rounding) {
if (speed > 1000000)
printf("%u.%06u GHz", ((unsigned int)speed / 1000000),
((unsigned int)speed % 1000000));
else if (speed > 1000)
printf("%u.%03u MHz", ((unsigned int)speed / 1000),
(unsigned int)(speed % 1000));
else
printf("%lu kHz", speed);
} else {
if (speed > 1000000) {
tmp = speed % 10000;
if (tmp >= 5000)
speed += 10000;
printf("%u.%02u GHz", ((unsigned int)speed / 1000000),
((unsigned int)(speed % 1000000) / 10000));
} else if (speed > 100000) {
tmp = speed % 1000;
if (tmp >= 500)
speed += 1000;
printf("%u MHz", ((unsigned int)speed / 1000));
} else if (speed > 1000) {
tmp = speed % 100;
if (tmp >= 50)
speed += 100;
printf("%u.%01u MHz", ((unsigned int)speed / 1000),
((unsigned int)(speed % 1000) / 100));
}
}
}
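To make the effect of the new no_rounding parameter concrete, a value of 2437500 kHz would come out roughly as follows (a small sketch assuming the cpupower helpers are linked in):

#include <stdio.h>

#include "helpers/helpers.h"

int main(void)
{
	print_speed(2437500, 1);	/* exact: prints "2.437500 GHz" */
	printf("\n");
	print_speed(2437500, 0);	/* rounded: prints "2.44 GHz" */
	printf("\n");
	return 0;
}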
......@@ -2323,7 +2323,7 @@ int skx_pkg_cstate_limits[16] =
};
int icx_pkg_cstate_limits[16] =
{ PCL__0, PCL__2, PCL__6, PCLRSV, PCLRSV, PCLRSV, PCLRSV, PCLUNL, PCLRSV, PCLRSV, PCLRSV, PCLRSV, PCLRSV, PCLRSV,
{ PCL__0, PCL__2, PCL__6, PCL__6, PCLRSV, PCLRSV, PCLRSV, PCLUNL, PCLRSV, PCLRSV, PCLRSV, PCLRSV, PCLRSV, PCLRSV,
PCLRSV, PCLRSV
};
......