Commit 631ddaba authored by Rafael J. Wysocki's avatar Rafael J. Wysocki

Merge branches 'pm-sleep' and 'powercap'

* pm-sleep:
  PM / sleep: Print active wakeup sources when blocking on wakeup_count reads
  x86/suspend: fix false positive KASAN warning on suspend/resume
  PM / sleep / ACPI: Use the ACPI_FADT_LOW_POWER_S0 flag
  PM / sleep: System sleep state selection interface rework
  PM / hibernate: Verify the consistent of e820 memory map by md5 digest

* powercap:
  powercap / RAPL: Add Knights Mill CPUID
  powercap/intel_rapl: fix and tidy up error handling
  powercap/intel_rapl: Track active CPUs internally
  powercap/intel_rapl: Cleanup duplicated init code
  powercap/intel rapl: Convert to hotplug state machine
  powercap/intel_rapl: Propagate error code when registration fails
  powercap/intel_rapl: Add missing domain data update on hotplug
......@@ -7,30 +7,35 @@ Description:
subsystem.
What: /sys/power/state
Date: May 2014
Date: November 2016
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Description:
The /sys/power/state file controls system sleep states.
Reading from this file returns the available sleep state
labels, which may be "mem", "standby", "freeze" and "disk"
(hibernation). The meanings of the first three labels depend on
the relative_sleep_states command line argument as follows:
1) relative_sleep_states = 1
"mem", "standby", "freeze" represent non-hibernation sleep
states from the deepest ("mem", always present) to the
shallowest ("freeze"). "standby" and "freeze" may or may
not be present depending on the capabilities of the
platform. "freeze" can only be present if "standby" is
present.
2) relative_sleep_states = 0 (default)
"mem" - "suspend-to-RAM", present if supported.
"standby" - "power-on suspend", present if supported.
"freeze" - "suspend-to-idle", always present.
Writing to this file one of these strings causes the system to
transition into the corresponding state, if available. See
Documentation/power/states.txt for a description of what
"suspend-to-RAM", "power-on suspend" and "suspend-to-idle" mean.
labels, which may be "mem" (suspend), "standby" (power-on
suspend), "freeze" (suspend-to-idle) and "disk" (hibernation).
Writing one of the above strings to this file causes the system
to transition into the corresponding state, if available.
See Documentation/power/states.txt for more information.
What: /sys/power/mem_sleep
Date: November 2016
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Description:
The /sys/power/mem_sleep file controls the operating mode of
system suspend. Reading from it returns the available modes
as "s2idle" (always present), "shallow" and "deep" (present if
supported). The mode that will be used on subsequent attempts
to suspend the system (by writing "mem" to the /sys/power/state
file described above) is enclosed in square brackets.
Writing one of the above strings to this file causes the mode
represented by it to be used on subsequent attempts to suspend
the system.
See Documentation/power/states.txt for more information.
What: /sys/power/disk
Date: September 2006
......
......@@ -2334,6 +2334,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
memory contents and reserves bad memory
regions that are detected.
mem_sleep_default= [SUSPEND] Default system suspend mode:
s2idle - Suspend-To-Idle
shallow - Power-On Suspend or equivalent (if supported)
deep - Suspend-To-RAM or equivalent (if supported)
See Documentation/power/states.txt.
meye.*= [HW] Set MotionEye Camera parameters
See Documentation/video4linux/meye.txt.
......@@ -3677,13 +3683,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
[KNL, SMP] Set scheduler's default relax_domain_level.
See Documentation/cgroup-v1/cpusets.txt.
relative_sleep_states=
[SUSPEND] Use sleep state labeling where the deepest
state available other than hibernation is always "mem".
Format: { "0" | "1" }
0 -- Traditional sleep state labels.
1 -- Relative sleep state labels.
reserve= [KNL,BUGS] Force the kernel to ignore some iomem area
reservetop= [X86-32]
......
......@@ -8,25 +8,43 @@ for each state.
The states are represented by strings that can be read or written to the
/sys/power/state file. Those strings may be "mem", "standby", "freeze" and
"disk", where the last one always represents hibernation (Suspend-To-Disk) and
the meaning of the remaining ones depends on the relative_sleep_states command
line argument.
For relative_sleep_states=1, the strings "mem", "standby" and "freeze" label the
available non-hibernation sleep states from the deepest to the shallowest,
respectively. In that case, "mem" is always present in /sys/power/state,
because there is at least one non-hibernation sleep state in every system. If
the given system supports two non-hibernation sleep states, "standby" is present
in /sys/power/state in addition to "mem". If the system supports three
non-hibernation sleep states, "freeze" will be present in /sys/power/state in
addition to "mem" and "standby".
For relative_sleep_states=0, which is the default, the following descriptions
apply.
state: Suspend-To-Idle
"disk", where the last three always represent Power-On Suspend (if supported),
Suspend-To-Idle and hibernation (Suspend-To-Disk), respectively.
The meaning of the "mem" string is controlled by the /sys/power/mem_sleep file.
It contains strings representing the available modes of system suspend that may
be triggered by writing "mem" to /sys/power/state. These modes are "s2idle"
(Suspend-To-Idle), "shallow" (Power-On Suspend) and "deep" (Suspend-To-RAM).
The "s2idle" mode is always available, while the other ones are only available
if supported by the platform (if not supported, the strings representing them
are not present in /sys/power/mem_sleep). The string representing the suspend
mode to be used subsequently is enclosed in square brackets. Writing one of
the other strings present in /sys/power/mem_sleep to it causes the suspend mode
to be used subsequently to change to the one represented by that string.
Consequently, there are two ways to cause the system to go into the
Suspend-To-Idle sleep state. The first one is to write "freeze" directly to
/sys/power/state. The second one is to write "s2idle" to /sys/power/mem_sleep
and then to wrtie "mem" to /sys/power/state. Similarly, there are two ways
to cause the system to go into the Power-On Suspend sleep state (the strings to
write to the control files in that case are "standby" or "shallow" and "mem",
respectively) if that state is supported by the platform. In turn, there is
only one way to cause the system to go into the Suspend-To-RAM state (write
"deep" into /sys/power/mem_sleep and "mem" into /sys/power/state).
The default suspend mode (ie. the one to be used without writing anything into
/sys/power/mem_sleep) is either "deep" (if Suspend-To-RAM is supported) or
"s2idle", but it can be overridden by the value of the "mem_sleep_default"
parameter in the kernel command line. On some ACPI-based systems, depending on
the information in the FADT, the default may be "s2idle" even if Suspend-To-RAM
is supported.
The properties of all of the sleep states are described below.
State: Suspend-To-Idle
ACPI state: S0
Label: "freeze"
Label: "s2idle" ("freeze")
This state is a generic, pure software, light-weight, system sleep state.
It allows more energy to be saved relative to runtime idle by freezing user
......@@ -35,13 +53,13 @@ lower-power than available at run time), such that the processors can
spend more time in their idle states.
This state can be used for platforms without Power-On Suspend/Suspend-to-RAM
support, or it can be used in addition to Suspend-to-RAM (memory sleep)
to provide reduced resume latency. It is always supported.
support, or it can be used in addition to Suspend-to-RAM to provide reduced
resume latency. It is always supported.
State: Standby / Power-On Suspend
ACPI State: S1
Label: "standby"
Label: "shallow" ("standby")
This state, if supported, offers moderate, though real, power savings, while
providing a relatively low-latency transition back to a working system. No
......@@ -58,7 +76,7 @@ state.
State: Suspend-to-RAM
ACPI State: S3
Label: "mem"
Label: "deep"
This state, if supported, offers significant power savings as everything in the
system is put into a low-power state, except for memory, which should be placed
......
......@@ -109,6 +109,15 @@ ENTRY(do_suspend_lowlevel)
movq pt_regs_r14(%rax), %r14
movq pt_regs_r15(%rax), %r15
#ifdef CONFIG_KASAN
/*
* The suspend path may have poisoned some areas deeper in the stack,
* which we now need to unpoison.
*/
movq %rsp, %rdi
call kasan_unpoison_task_stack_below
#endif
xorl %eax, %eax
addq $8, %rsp
FRAME_END
......
......@@ -11,6 +11,10 @@
#include <linux/gfp.h>
#include <linux/smp.h>
#include <linux/suspend.h>
#include <linux/scatterlist.h>
#include <linux/kdebug.h>
#include <crypto/hash.h>
#include <asm/init.h>
#include <asm/proto.h>
......@@ -177,14 +181,86 @@ int pfn_is_nosave(unsigned long pfn)
return (pfn >= nosave_begin_pfn) && (pfn < nosave_end_pfn);
}
#define MD5_DIGEST_SIZE 16
struct restore_data_record {
unsigned long jump_address;
unsigned long jump_address_phys;
unsigned long cr3;
unsigned long magic;
u8 e820_digest[MD5_DIGEST_SIZE];
};
#define RESTORE_MAGIC 0x123456789ABCDEF0UL
#define RESTORE_MAGIC 0x23456789ABCDEF01UL
#if IS_BUILTIN(CONFIG_CRYPTO_MD5)
/**
* get_e820_md5 - calculate md5 according to given e820 map
*
* @map: the e820 map to be calculated
* @buf: the md5 result to be stored to
*/
static int get_e820_md5(struct e820map *map, void *buf)
{
struct scatterlist sg;
struct crypto_ahash *tfm;
int size;
int ret = 0;
tfm = crypto_alloc_ahash("md5", 0, CRYPTO_ALG_ASYNC);
if (IS_ERR(tfm))
return -ENOMEM;
{
AHASH_REQUEST_ON_STACK(req, tfm);
size = offsetof(struct e820map, map)
+ sizeof(struct e820entry) * map->nr_map;
ahash_request_set_tfm(req, tfm);
sg_init_one(&sg, (u8 *)map, size);
ahash_request_set_callback(req, 0, NULL, NULL);
ahash_request_set_crypt(req, &sg, buf, size);
if (crypto_ahash_digest(req))
ret = -EINVAL;
ahash_request_zero(req);
}
crypto_free_ahash(tfm);
return ret;
}
static void hibernation_e820_save(void *buf)
{
get_e820_md5(e820_saved, buf);
}
static bool hibernation_e820_mismatch(void *buf)
{
int ret;
u8 result[MD5_DIGEST_SIZE];
memset(result, 0, MD5_DIGEST_SIZE);
/* If there is no digest in suspend kernel, let it go. */
if (!memcmp(result, buf, MD5_DIGEST_SIZE))
return false;
ret = get_e820_md5(e820_saved, result);
if (ret)
return true;
return memcmp(result, buf, MD5_DIGEST_SIZE) ? true : false;
}
#else
static void hibernation_e820_save(void *buf)
{
}
static bool hibernation_e820_mismatch(void *buf)
{
/* If md5 is not builtin for restore kernel, let it go. */
return false;
}
#endif
/**
* arch_hibernation_header_save - populate the architecture specific part
......@@ -201,6 +277,9 @@ int arch_hibernation_header_save(void *addr, unsigned int max_size)
rdr->jump_address_phys = __pa_symbol(&restore_registers);
rdr->cr3 = restore_cr3;
rdr->magic = RESTORE_MAGIC;
hibernation_e820_save(rdr->e820_digest);
return 0;
}
......@@ -216,5 +295,16 @@ int arch_hibernation_header_restore(void *addr)
restore_jump_address = rdr->jump_address;
jump_address_phys = rdr->jump_address_phys;
restore_cr3 = rdr->cr3;
return (rdr->magic == RESTORE_MAGIC) ? 0 : -EINVAL;
if (rdr->magic != RESTORE_MAGIC) {
pr_crit("Unrecognized hibernate image header format!\n");
return -EINVAL;
}
if (hibernation_e820_mismatch(rdr->e820_digest)) {
pr_crit("Hibernate inconsistent memory map detected!\n");
return -ENODEV;
}
return 0;
}
......@@ -674,6 +674,14 @@ static void acpi_sleep_suspend_setup(void)
if (acpi_sleep_state_supported(i))
sleep_states[i] = 1;
/*
* Use suspend-to-idle by default if ACPI_FADT_LOW_POWER_S0 is set and
* the default suspend mode was not selected from the command line.
*/
if (acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0 &&
mem_sleep_default > PM_SUSPEND_MEM)
mem_sleep_default = PM_SUSPEND_FREEZE;
suspend_set_ops(old_suspend_ordering ?
&acpi_suspend_ops_old : &acpi_suspend_ops);
freeze_set_ops(&acpi_freeze_ops);
......
......@@ -811,7 +811,7 @@ void pm_print_active_wakeup_sources(void)
rcu_read_lock();
list_for_each_entry_rcu(ws, &wakeup_sources, entry) {
if (ws->active) {
pr_info("active wakeup source: %s\n", ws->name);
pr_debug("active wakeup source: %s\n", ws->name);
active = 1;
} else if (!active &&
(!last_activity_ws ||
......@@ -822,7 +822,7 @@ void pm_print_active_wakeup_sources(void)
}
if (!active && last_activity_ws)
pr_info("last active wakeup source: %s\n",
pr_debug("last active wakeup source: %s\n",
last_activity_ws->name);
rcu_read_unlock();
}
......@@ -905,7 +905,7 @@ bool pm_get_wakeup_count(unsigned int *count, bool block)
split_counters(&cnt, &inpr);
if (inpr == 0 || signal_pending(current))
break;
pm_print_active_wakeup_sources();
schedule();
}
finish_wait(&wakeup_count_wait_queue, &wait);
......
......@@ -189,14 +189,13 @@ struct rapl_package {
unsigned int time_unit;
struct rapl_domain *domains; /* array of domains, sized at runtime */
struct powercap_zone *power_zone; /* keep track of parent zone */
int nr_cpus; /* active cpus on the package, topology info is lost during
* cpu hotplug. so we have to track ourselves.
*/
unsigned long power_limit_irq; /* keep track of package power limit
* notify interrupt enable status.
*/
struct list_head plist;
int lead_cpu; /* one active cpu per package for access */
/* Track active cpus */
struct cpumask cpumask;
};
struct rapl_defaults {
......@@ -275,18 +274,6 @@ static struct rapl_package *find_package_by_id(int id)
return NULL;
}
/* caller must hold cpu hotplug lock */
static void rapl_cleanup_data(void)
{
struct rapl_package *p, *tmp;
list_for_each_entry_safe(p, tmp, &rapl_packages, plist) {
kfree(p->domains);
list_del(&p->plist);
kfree(p);
}
}
static int get_energy_counter(struct powercap_zone *power_zone, u64 *energy_raw)
{
struct rapl_domain *rd;
......@@ -442,6 +429,7 @@ static int contraint_to_pl(struct rapl_domain *rd, int cid)
return i;
}
}
pr_err("Cannot find matching power limit for constraint %d\n", cid);
return -EINVAL;
}
......@@ -457,6 +445,10 @@ static int set_power_limit(struct powercap_zone *power_zone, int cid,
get_online_cpus();
rd = power_zone_to_rapl_domain(power_zone);
id = contraint_to_pl(rd, cid);
if (id < 0) {
ret = id;
goto set_exit;
}
rp = rd->rp;
......@@ -496,6 +488,11 @@ static int get_current_power_limit(struct powercap_zone *power_zone, int cid,
get_online_cpus();
rd = power_zone_to_rapl_domain(power_zone);
id = contraint_to_pl(rd, cid);
if (id < 0) {
ret = id;
goto get_exit;
}
switch (rd->rpl[id].prim_id) {
case PL1_ENABLE:
prim = POWER_LIMIT1;
......@@ -512,6 +509,7 @@ static int get_current_power_limit(struct powercap_zone *power_zone, int cid,
else
*data = val;
get_exit:
put_online_cpus();
return ret;
......@@ -527,6 +525,10 @@ static int set_time_window(struct powercap_zone *power_zone, int cid,
get_online_cpus();
rd = power_zone_to_rapl_domain(power_zone);
id = contraint_to_pl(rd, cid);
if (id < 0) {
ret = id;
goto set_time_exit;
}
switch (rd->rpl[id].prim_id) {
case PL1_ENABLE:
......@@ -538,6 +540,8 @@ static int set_time_window(struct powercap_zone *power_zone, int cid,
default:
ret = -EINVAL;
}
set_time_exit:
put_online_cpus();
return ret;
}
......@@ -552,6 +556,10 @@ static int get_time_window(struct powercap_zone *power_zone, int cid, u64 *data)
get_online_cpus();
rd = power_zone_to_rapl_domain(power_zone);
id = contraint_to_pl(rd, cid);
if (id < 0) {
ret = id;
goto get_time_exit;
}
switch (rd->rpl[id].prim_id) {
case PL1_ENABLE:
......@@ -566,6 +574,8 @@ static int get_time_window(struct powercap_zone *power_zone, int cid, u64 *data)
}
if (!ret)
*data = val;
get_time_exit:
put_online_cpus();
return ret;
......@@ -707,7 +717,7 @@ static u64 rapl_unit_xlate(struct rapl_domain *rd, enum unit_type type,
case ENERGY_UNIT:
scale = ENERGY_UNIT_SCALE;
/* per domain unit takes precedence */
if (rd && rd->domain_energy_unit)
if (rd->domain_energy_unit)
units = rd->domain_energy_unit;
else
units = rp->energy_unit;
......@@ -976,10 +986,20 @@ static void package_power_limit_irq_save(struct rapl_package *rp)
smp_call_function_single(rp->lead_cpu, power_limit_irq_save_cpu, rp, 1);
}
static void power_limit_irq_restore_cpu(void *info)
/*
* Restore per package power limit interrupt enable state. Called from cpu
* hotplug code on package removal.
*/
static void package_power_limit_irq_restore(struct rapl_package *rp)
{
u32 l, h = 0;
struct rapl_package *rp = (struct rapl_package *)info;
u32 l, h;
if (!boot_cpu_has(X86_FEATURE_PTS) || !boot_cpu_has(X86_FEATURE_PLN))
return;
/* irq enable state not saved, nothing to restore */
if (!(rp->power_limit_irq & PACKAGE_PLN_INT_SAVED))
return;
rdmsr_safe(MSR_IA32_PACKAGE_THERM_INTERRUPT, &l, &h);
......@@ -991,19 +1011,6 @@ static void power_limit_irq_restore_cpu(void *info)
wrmsr_safe(MSR_IA32_PACKAGE_THERM_INTERRUPT, l, h);
}
/* restore per package power limit interrupt enable state */
static void package_power_limit_irq_restore(struct rapl_package *rp)
{
if (!boot_cpu_has(X86_FEATURE_PTS) || !boot_cpu_has(X86_FEATURE_PLN))
return;
/* irq enable state not saved, nothing to restore */
if (!(rp->power_limit_irq & PACKAGE_PLN_INT_SAVED))
return;
smp_call_function_single(rp->lead_cpu, power_limit_irq_restore_cpu, rp, 1);
}
static void set_floor_freq_default(struct rapl_domain *rd, bool mode)
{
int nr_powerlimit = find_nr_power_limit(rd);
......@@ -1160,84 +1167,49 @@ static const struct x86_cpu_id rapl_ids[] __initconst = {
RAPL_CPU(INTEL_FAM6_ATOM_DENVERTON, rapl_defaults_core),
RAPL_CPU(INTEL_FAM6_XEON_PHI_KNL, rapl_defaults_hsw_server),
RAPL_CPU(INTEL_FAM6_XEON_PHI_KNM, rapl_defaults_hsw_server),
{}
};
MODULE_DEVICE_TABLE(x86cpu, rapl_ids);
/* read once for all raw primitive data for all packages, domains */
static void rapl_update_domain_data(void)
/* Read once for all raw primitive data for domains */
static void rapl_update_domain_data(struct rapl_package *rp)
{
int dmn, prim;
u64 val;
struct rapl_package *rp;
list_for_each_entry(rp, &rapl_packages, plist) {
for (dmn = 0; dmn < rp->nr_domains; dmn++) {
pr_debug("update package %d domain %s data\n", rp->id,
rp->domains[dmn].name);
/* exclude non-raw primitives */
for (prim = 0; prim < NR_RAW_PRIMITIVES; prim++)
for (prim = 0; prim < NR_RAW_PRIMITIVES; prim++) {
if (!rapl_read_data_raw(&rp->domains[dmn], prim,
rpi[prim].unit,
&val))
rp->domains[dmn].rdd.primitives[prim] =
val;
rpi[prim].unit, &val))
rp->domains[dmn].rdd.primitives[prim] = val;
}
}
}
static int rapl_unregister_powercap(void)
static void rapl_unregister_powercap(void)
{
struct rapl_package *rp;
struct rapl_domain *rd, *rd_package = NULL;
/* unregister all active rapl packages from the powercap layer,
* hotplug lock held
*/
list_for_each_entry(rp, &rapl_packages, plist) {
package_power_limit_irq_restore(rp);
for (rd = rp->domains; rd < rp->domains + rp->nr_domains;
rd++) {
pr_debug("remove package, undo power limit on %d: %s\n",
rp->id, rd->name);
rapl_write_data_raw(rd, PL1_ENABLE, 0);
rapl_write_data_raw(rd, PL1_CLAMP, 0);
if (find_nr_power_limit(rd) > 1) {
rapl_write_data_raw(rd, PL2_ENABLE, 0);
rapl_write_data_raw(rd, PL2_CLAMP, 0);
}
if (rd->id == RAPL_DOMAIN_PACKAGE) {
rd_package = rd;
continue;
}
powercap_unregister_zone(control_type, &rd->power_zone);
}
/* do the package zone last */
if (rd_package)
powercap_unregister_zone(control_type,
&rd_package->power_zone);
}
if (platform_rapl_domain) {
powercap_unregister_zone(control_type,
&platform_rapl_domain->power_zone);
kfree(platform_rapl_domain);
}
powercap_unregister_control_type(control_type);
return 0;
}
static int rapl_package_register_powercap(struct rapl_package *rp)
{
struct rapl_domain *rd;
int ret = 0;
char dev_name[17]; /* max domain name = 7 + 1 + 8 for int + 1 for null*/
struct powercap_zone *power_zone = NULL;
int nr_pl;
int nr_pl, ret;;
/* Update the domain data of the new package */
rapl_update_domain_data(rp);
/* first we register package domain as the parent zone*/
for (rd = rp->domains; rd < rp->domains + rp->nr_domains; rd++) {
......@@ -1257,8 +1229,7 @@ static int rapl_package_register_powercap(struct rapl_package *rp)
if (IS_ERR(power_zone)) {
pr_debug("failed to register package, %d\n",
rp->id);
ret = PTR_ERR(power_zone);
goto exit_package;
return PTR_ERR(power_zone);
}
/* track parent zone in per package/socket data */
rp->power_zone = power_zone;
......@@ -1268,8 +1239,7 @@ static int rapl_package_register_powercap(struct rapl_package *rp)
}
if (!power_zone) {
pr_err("no package domain found, unknown topology!\n");
ret = -ENODEV;
goto exit_package;
return -ENODEV;
}
/* now register domains as children of the socket/package*/
for (rd = rp->domains; rd < rp->domains + rp->nr_domains; rd++) {
......@@ -1290,11 +1260,11 @@ static int rapl_package_register_powercap(struct rapl_package *rp)
goto err_cleanup;
}
}
return 0;
exit_package:
return ret;
err_cleanup:
/* clean up previously initialized domains within the package if we
/*
* Clean up previously initialized domains within the package if we
* failed after the first domain setup.
*/
while (--rd >= rp->domains) {
......@@ -1305,7 +1275,7 @@ static int rapl_package_register_powercap(struct rapl_package *rp)
return ret;
}
static int rapl_register_psys(void)
static int __init rapl_register_psys(void)
{
struct rapl_domain *rd;
struct powercap_zone *power_zone;
......@@ -1346,40 +1316,14 @@ static int rapl_register_psys(void)
return 0;
}
static int rapl_register_powercap(void)
static int __init rapl_register_powercap(void)
{
struct rapl_domain *rd;
struct rapl_package *rp;
int ret = 0;
control_type = powercap_register_control_type(NULL, "intel-rapl", NULL);
if (IS_ERR(control_type)) {
pr_debug("failed to register powercap control_type.\n");
return PTR_ERR(control_type);
}
/* read the initial data */
rapl_update_domain_data();
list_for_each_entry(rp, &rapl_packages, plist)
if (rapl_package_register_powercap(rp))
goto err_cleanup_package;
/* Don't bail out if PSys is not supported */
rapl_register_psys();
return ret;
err_cleanup_package:
/* clean up previously initialized packages */
list_for_each_entry_continue_reverse(rp, &rapl_packages, plist) {
for (rd = rp->domains; rd < rp->domains + rp->nr_domains;
rd++) {
pr_debug("unregister zone/package %d, %s domain\n",
rp->id, rd->name);
powercap_unregister_zone(control_type, &rd->power_zone);
}
}
return ret;
return 0;
}
static int rapl_check_domain(int cpu, int domain)
......@@ -1452,9 +1396,8 @@ static void rapl_detect_powerlimit(struct rapl_domain *rd)
*/
static int rapl_detect_domains(struct rapl_package *rp, int cpu)
{
int i;
int ret = 0;
struct rapl_domain *rd;
int i;
for (i = 0; i < RAPL_DOMAIN_MAX; i++) {
/* use physical package id to read counters */
......@@ -1466,84 +1409,20 @@ static int rapl_detect_domains(struct rapl_package *rp, int cpu)
rp->nr_domains = bitmap_weight(&rp->domain_map, RAPL_DOMAIN_MAX);
if (!rp->nr_domains) {
pr_debug("no valid rapl domains found in package %d\n", rp->id);
ret = -ENODEV;
goto done;
return -ENODEV;
}
pr_debug("found %d domains on package %d\n", rp->nr_domains, rp->id);
rp->domains = kcalloc(rp->nr_domains + 1, sizeof(struct rapl_domain),
GFP_KERNEL);
if (!rp->domains) {
ret = -ENOMEM;
goto done;
}
if (!rp->domains)
return -ENOMEM;
rapl_init_domains(rp);
for (rd = rp->domains; rd < rp->domains + rp->nr_domains; rd++)
rapl_detect_powerlimit(rd);
done:
return ret;
}
static bool is_package_new(int package)
{
struct rapl_package *rp;
/* caller prevents cpu hotplug, there will be no new packages added
* or deleted while traversing the package list, no need for locking.
*/
list_for_each_entry(rp, &rapl_packages, plist)
if (package == rp->id)
return false;
return true;
}
/* RAPL interface can be made of a two-level hierarchy: package level and domain
* level. We first detect the number of packages then domains of each package.
* We have to consider the possiblity of CPU online/offline due to hotplug and
* other scenarios.
*/
static int rapl_detect_topology(void)
{
int i;
int phy_package_id;
struct rapl_package *new_package, *rp;
for_each_online_cpu(i) {
phy_package_id = topology_physical_package_id(i);
if (is_package_new(phy_package_id)) {
new_package = kzalloc(sizeof(*rp), GFP_KERNEL);
if (!new_package) {
rapl_cleanup_data();
return -ENOMEM;
}
/* add the new package to the list */
new_package->id = phy_package_id;
new_package->nr_cpus = 1;
/* use the first active cpu of the package to access */
new_package->lead_cpu = i;
/* check if the package contains valid domains */
if (rapl_detect_domains(new_package, i) ||
rapl_defaults->check_unit(new_package, i)) {
kfree(new_package->domains);
kfree(new_package);
/* free up the packages already initialized */
rapl_cleanup_data();
return -ENODEV;
}
INIT_LIST_HEAD(&new_package->plist);
list_add(&new_package->plist, &rapl_packages);
} else {
rp = find_package_by_id(phy_package_id);
if (rp)
++rp->nr_cpus;
}
}
return 0;
}
......@@ -1552,12 +1431,21 @@ static void rapl_remove_package(struct rapl_package *rp)
{
struct rapl_domain *rd, *rd_package = NULL;
package_power_limit_irq_restore(rp);
for (rd = rp->domains; rd < rp->domains + rp->nr_domains; rd++) {
rapl_write_data_raw(rd, PL1_ENABLE, 0);
rapl_write_data_raw(rd, PL1_CLAMP, 0);
if (find_nr_power_limit(rd) > 1) {
rapl_write_data_raw(rd, PL2_ENABLE, 0);
rapl_write_data_raw(rd, PL2_CLAMP, 0);
}
if (rd->id == RAPL_DOMAIN_PACKAGE) {
rd_package = rd;
continue;
}
pr_debug("remove package %d, %s domain\n", rp->id, rd->name);
pr_debug("remove package, undo power limit on %d: %s\n",
rp->id, rd->name);
powercap_unregister_zone(control_type, &rd->power_zone);
}
/* do parent zone last */
......@@ -1567,20 +1455,17 @@ static void rapl_remove_package(struct rapl_package *rp)
}
/* called from CPU hotplug notifier, hotplug lock held */
static int rapl_add_package(int cpu)
static struct rapl_package *rapl_add_package(int cpu, int pkgid)
{
int ret = 0;
int phy_package_id;
struct rapl_package *rp;
int ret;
phy_package_id = topology_physical_package_id(cpu);
rp = kzalloc(sizeof(struct rapl_package), GFP_KERNEL);
if (!rp)
return -ENOMEM;
return ERR_PTR(-ENOMEM);
/* add the new package to the list */
rp->id = phy_package_id;
rp->nr_cpus = 1;
rp->id = pkgid;
rp->lead_cpu = cpu;
/* check if the package contains valid domains */
......@@ -1589,17 +1474,17 @@ static int rapl_add_package(int cpu)
ret = -ENODEV;
goto err_free_package;
}
if (!rapl_package_register_powercap(rp)) {
ret = rapl_package_register_powercap(rp);
if (!ret) {
INIT_LIST_HEAD(&rp->plist);
list_add(&rp->plist, &rapl_packages);
return ret;
return rp;
}
err_free_package:
kfree(rp->domains);
kfree(rp);
return ret;
return ERR_PTR(ret);
}
/* Handles CPU hotplug on multi-socket systems.
......@@ -1609,55 +1494,46 @@ static int rapl_add_package(int cpu)
* associated domains. Cooling devices are handled accordingly at
* per-domain level.
*/
static int rapl_cpu_callback(struct notifier_block *nfb,
unsigned long action, void *hcpu)
static int rapl_cpu_online(unsigned int cpu)
{
unsigned long cpu = (unsigned long)hcpu;
int phy_package_id;
int pkgid = topology_physical_package_id(cpu);
struct rapl_package *rp;
rp = find_package_by_id(pkgid);
if (!rp) {
rp = rapl_add_package(cpu, pkgid);
if (IS_ERR(rp))
return PTR_ERR(rp);
}
cpumask_set_cpu(cpu, &rp->cpumask);
return 0;
}
static int rapl_cpu_down_prep(unsigned int cpu)
{
int pkgid = topology_physical_package_id(cpu);
struct rapl_package *rp;
int lead_cpu;
phy_package_id = topology_physical_package_id(cpu);
switch (action) {
case CPU_ONLINE:
case CPU_ONLINE_FROZEN:
case CPU_DOWN_FAILED:
case CPU_DOWN_FAILED_FROZEN:
rp = find_package_by_id(phy_package_id);
if (rp)
++rp->nr_cpus;
else
rapl_add_package(cpu);
break;
case CPU_DOWN_PREPARE:
case CPU_DOWN_PREPARE_FROZEN:
rp = find_package_by_id(phy_package_id);
rp = find_package_by_id(pkgid);
if (!rp)
break;
if (--rp->nr_cpus == 0)
return 0;
cpumask_clear_cpu(cpu, &rp->cpumask);
lead_cpu = cpumask_first(&rp->cpumask);
if (lead_cpu >= nr_cpu_ids)
rapl_remove_package(rp);
else if (cpu == rp->lead_cpu) {
/* choose another active cpu in the package */
lead_cpu = cpumask_any_but(topology_core_cpumask(cpu), cpu);
if (lead_cpu < nr_cpu_ids)
else if (rp->lead_cpu == cpu)
rp->lead_cpu = lead_cpu;
else /* should never go here */
pr_err("no active cpu available for package %d\n",
phy_package_id);
}
}
return NOTIFY_OK;
return 0;
}
static struct notifier_block rapl_cpu_notifier = {
.notifier_call = rapl_cpu_callback,
};
static enum cpuhp_state pcap_rapl_online;
static int __init rapl_init(void)
{
int ret = 0;
const struct x86_cpu_id *id;
int ret;
id = x86_match_cpu(rapl_ids);
if (!id) {
......@@ -1669,36 +1545,29 @@ static int __init rapl_init(void)
rapl_defaults = (struct rapl_defaults *)id->driver_data;
cpu_notifier_register_begin();
/* prevent CPU hotplug during detection */
get_online_cpus();
ret = rapl_detect_topology();
ret = rapl_register_powercap();
if (ret)
goto done;
return ret;
if (rapl_register_powercap()) {
rapl_cleanup_data();
ret = -ENODEV;
goto done;
}
__register_hotcpu_notifier(&rapl_cpu_notifier);
done:
put_online_cpus();
cpu_notifier_register_done();
ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "powercap/rapl:online",
rapl_cpu_online, rapl_cpu_down_prep);
if (ret < 0)
goto err_unreg;
pcap_rapl_online = ret;
/* Don't bail out if PSys is not supported */
rapl_register_psys();
return 0;
err_unreg:
rapl_unregister_powercap();
return ret;
}
static void __exit rapl_exit(void)
{
cpu_notifier_register_begin();
get_online_cpus();
__unregister_hotcpu_notifier(&rapl_cpu_notifier);
cpuhp_remove_state(pcap_rapl_online);
rapl_unregister_powercap();
rapl_cleanup_data();
put_online_cpus();
cpu_notifier_register_done();
}
module_init(rapl_init);
......
......@@ -194,6 +194,8 @@ struct platform_freeze_ops {
};
#ifdef CONFIG_SUSPEND
extern suspend_state_t mem_sleep_default;
/**
* suspend_set_ops - set platform dependent suspend operations
* @ops: The new suspend operations to set.
......
......@@ -78,6 +78,78 @@ static ssize_t pm_async_store(struct kobject *kobj, struct kobj_attribute *attr,
power_attr(pm_async);
#ifdef CONFIG_SUSPEND
static ssize_t mem_sleep_show(struct kobject *kobj, struct kobj_attribute *attr,
char *buf)
{
char *s = buf;
suspend_state_t i;
for (i = PM_SUSPEND_MIN; i < PM_SUSPEND_MAX; i++)
if (mem_sleep_states[i]) {
const char *label = mem_sleep_states[i];
if (mem_sleep_current == i)
s += sprintf(s, "[%s] ", label);
else
s += sprintf(s, "%s ", label);
}
/* Convert the last space to a newline if needed. */
if (s != buf)
*(s-1) = '\n';
return (s - buf);
}
static suspend_state_t decode_suspend_state(const char *buf, size_t n)
{
suspend_state_t state;
char *p;
int len;
p = memchr(buf, '\n', n);
len = p ? p - buf : n;
for (state = PM_SUSPEND_MIN; state < PM_SUSPEND_MAX; state++) {
const char *label = mem_sleep_states[state];
if (label && len == strlen(label) && !strncmp(buf, label, len))
return state;
}
return PM_SUSPEND_ON;
}
static ssize_t mem_sleep_store(struct kobject *kobj, struct kobj_attribute *attr,
const char *buf, size_t n)
{
suspend_state_t state;
int error;
error = pm_autosleep_lock();
if (error)
return error;
if (pm_autosleep_state() > PM_SUSPEND_ON) {
error = -EBUSY;
goto out;
}
state = decode_suspend_state(buf, n);
if (state < PM_SUSPEND_MAX && state > PM_SUSPEND_ON)
mem_sleep_current = state;
else
error = -EINVAL;
out:
pm_autosleep_unlock();
return error ? error : n;
}
power_attr(mem_sleep);
#endif /* CONFIG_SUSPEND */
#ifdef CONFIG_PM_DEBUG
int pm_test_level = TEST_NONE;
......@@ -368,12 +440,16 @@ static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
}
state = decode_state(buf, n);
if (state < PM_SUSPEND_MAX)
if (state < PM_SUSPEND_MAX) {
if (state == PM_SUSPEND_MEM)
state = mem_sleep_current;
error = pm_suspend(state);
else if (state == PM_SUSPEND_MAX)
} else if (state == PM_SUSPEND_MAX) {
error = hibernate();
else
} else {
error = -EINVAL;
}
out:
pm_autosleep_unlock();
......@@ -485,6 +561,9 @@ static ssize_t autosleep_store(struct kobject *kobj,
&& strcmp(buf, "off") && strcmp(buf, "off\n"))
return -EINVAL;
if (state == PM_SUSPEND_MEM)
state = mem_sleep_current;
error = pm_autosleep_set_state(state);
return error ? error : n;
}
......@@ -602,6 +681,9 @@ static struct attribute * g[] = {
#ifdef CONFIG_PM_SLEEP
&pm_async_attr.attr,
&wakeup_count_attr.attr,
#ifdef CONFIG_SUSPEND
&mem_sleep_attr.attr,
#endif
#ifdef CONFIG_PM_AUTOSLEEP
&autosleep_attr.attr,
#endif
......
......@@ -189,11 +189,15 @@ extern void swsusp_show_speed(ktime_t, ktime_t, unsigned int, char *);
#ifdef CONFIG_SUSPEND
/* kernel/power/suspend.c */
extern const char *pm_labels[];
extern const char * const pm_labels[];
extern const char *pm_states[];
extern const char *mem_sleep_states[];
extern suspend_state_t mem_sleep_current;
extern int suspend_devices_and_enter(suspend_state_t state);
#else /* !CONFIG_SUSPEND */
#define mem_sleep_current PM_SUSPEND_ON
static inline int suspend_devices_and_enter(suspend_state_t state)
{
return -ENOSYS;
......
......@@ -32,8 +32,21 @@
#include "power.h"
const char *pm_labels[] = { "mem", "standby", "freeze", NULL };
const char * const pm_labels[] = {
[PM_SUSPEND_FREEZE] = "freeze",
[PM_SUSPEND_STANDBY] = "standby",
[PM_SUSPEND_MEM] = "mem",
};
const char *pm_states[PM_SUSPEND_MAX];
static const char * const mem_sleep_labels[] = {
[PM_SUSPEND_FREEZE] = "s2idle",
[PM_SUSPEND_STANDBY] = "shallow",
[PM_SUSPEND_MEM] = "deep",
};
const char *mem_sleep_states[PM_SUSPEND_MAX];
suspend_state_t mem_sleep_current = PM_SUSPEND_FREEZE;
suspend_state_t mem_sleep_default = PM_SUSPEND_MAX;
unsigned int pm_suspend_global_flags;
EXPORT_SYMBOL_GPL(pm_suspend_global_flags);
......@@ -110,30 +123,32 @@ static bool valid_state(suspend_state_t state)
return suspend_ops && suspend_ops->valid && suspend_ops->valid(state);
}
/*
* If this is set, the "mem" label always corresponds to the deepest sleep state
* available, the "standby" label corresponds to the second deepest sleep state
* available (if any), and the "freeze" label corresponds to the remaining
* available sleep state (if there is one).
*/
static bool relative_states;
void __init pm_states_init(void)
{
/* "mem" and "freeze" are always present in /sys/power/state. */
pm_states[PM_SUSPEND_MEM] = pm_labels[PM_SUSPEND_MEM];
pm_states[PM_SUSPEND_FREEZE] = pm_labels[PM_SUSPEND_FREEZE];
/*
* freeze state should be supported even without any suspend_ops,
* initialize pm_states accordingly here
* Suspend-to-idle should be supported even without any suspend_ops,
* initialize mem_sleep_states[] accordingly here.
*/
pm_states[PM_SUSPEND_FREEZE] = pm_labels[relative_states ? 0 : 2];
mem_sleep_states[PM_SUSPEND_FREEZE] = mem_sleep_labels[PM_SUSPEND_FREEZE];
}
static int __init sleep_states_setup(char *str)
static int __init mem_sleep_default_setup(char *str)
{
relative_states = !strncmp(str, "1", 1);
suspend_state_t state;
for (state = PM_SUSPEND_FREEZE; state <= PM_SUSPEND_MEM; state++)
if (mem_sleep_labels[state] &&
!strcmp(str, mem_sleep_labels[state])) {
mem_sleep_default = state;
break;
}
return 1;
}
__setup("relative_sleep_states=", sleep_states_setup);
__setup("mem_sleep_default=", mem_sleep_default_setup);
/**
* suspend_set_ops - Set the global suspend method table.
......@@ -141,21 +156,21 @@ __setup("relative_sleep_states=", sleep_states_setup);
*/
void suspend_set_ops(const struct platform_suspend_ops *ops)
{
suspend_state_t i;
int j = 0;
lock_system_sleep();
suspend_ops = ops;
for (i = PM_SUSPEND_MEM; i >= PM_SUSPEND_STANDBY; i--)
if (valid_state(i)) {
pm_states[i] = pm_labels[j++];
} else if (!relative_states) {
pm_states[i] = NULL;
j++;
}
pm_states[PM_SUSPEND_FREEZE] = pm_labels[j];
if (valid_state(PM_SUSPEND_STANDBY)) {
mem_sleep_states[PM_SUSPEND_STANDBY] = mem_sleep_labels[PM_SUSPEND_STANDBY];
pm_states[PM_SUSPEND_STANDBY] = pm_labels[PM_SUSPEND_STANDBY];
if (mem_sleep_default == PM_SUSPEND_STANDBY)
mem_sleep_current = PM_SUSPEND_STANDBY;
}
if (valid_state(PM_SUSPEND_MEM)) {
mem_sleep_states[PM_SUSPEND_MEM] = mem_sleep_labels[PM_SUSPEND_MEM];
if (mem_sleep_default >= PM_SUSPEND_MEM)
mem_sleep_current = PM_SUSPEND_MEM;
}
unlock_system_sleep();
}
......
......@@ -80,7 +80,14 @@ void kasan_unpoison_task_stack(struct task_struct *task)
/* Unpoison the stack for the current task beyond a watermark sp value. */
asmlinkage void kasan_unpoison_task_stack_below(const void *watermark)
{
__kasan_unpoison_stack(current, watermark);
/*
* Calculate the task stack base address. Avoid using 'current'
* because this function is called by early resume code which hasn't
* yet set up the percpu register (%gs).
*/
void *base = (void *)((unsigned long)watermark & ~(THREAD_SIZE - 1));
kasan_unpoison_shadow(base, watermark - base);
}
/*
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment