Commit d79ee93d authored by Linus Torvalds

Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler changes from Ingo Molnar:
 "The biggest change is the cleanup/simplification of the load-balancer:
  instead of the current practice of architectures twiddling scheduler
  internal data structures and providing the scheduler domains in
  colorfully inconsistent ways, we now have generic scheduler code in
  kernel/sched/core.c:sched_init_numa() that looks at the architecture's
   node_distance() parameters and (while not fully trusting it) deduces a
  NUMA topology from it.

  This inevitably changes balancing behavior - hopefully for the better.

  There are various smaller optimizations, cleanups and fixlets as well"
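
  As an illustration of the idea described above (deducing topology levels from
  node_distance()), here is a minimal user-space sketch. It is not the kernel's
  sched_init_numa() implementation; the 4-node distance table and all names are
  made up for illustration.

    /* build_numa_levels.c -- illustrative sketch only, not kernel code. */
    #include <stdio.h>

    #define NR_NODES 4

    /* Hypothetical SLIT-style table an architecture might report. */
    static const int node_distance[NR_NODES][NR_NODES] = {
            { 10, 20, 20, 30 },
            { 20, 10, 30, 20 },
            { 20, 30, 10, 20 },
            { 30, 20, 20, 10 },
    };

    static int distances[NR_NODES * NR_NODES];
    static int nr_levels;

    /* Remember each distinct distance value, keeping the list sorted. */
    static void record_distance(int d)
    {
            int i;

            for (i = 0; i < nr_levels; i++)
                    if (distances[i] == d)
                            return;

            i = nr_levels++;
            while (i > 0 && distances[i - 1] > d) {
                    distances[i] = distances[i - 1];
                    i--;
            }
            distances[i] = d;
    }

    int main(void)
    {
            int i, j, l;

            /* Pass 1: every distinct distance value becomes one NUMA level. */
            for (i = 0; i < NR_NODES; i++)
                    for (j = 0; j < NR_NODES; j++)
                            record_distance(node_distance[i][j]);

            /*
             * Pass 2: at each level, a node "spans" every node that is no
             * farther away than that level's distance.  The real code turns
             * such spans into sched_domain masks.
             */
            for (l = 0; l < nr_levels; l++) {
                    printf("level %d (distance <= %d):\n", l, distances[l]);
                    for (i = 0; i < NR_NODES; i++) {
                            printf("  node %d spans:", i);
                            for (j = 0; j < NR_NODES; j++)
                                    if (node_distance[i][j] <= distances[l])
                                            printf(" %d", j);
                            printf("\n");
                    }
            }
            return 0;
    }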

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Taint kernel with TAINT_WARN after sleep-in-atomic bug
  sched: Remove stale power aware scheduling remnants and dysfunctional knobs
  sched/debug: Fix printing large integers on 32-bit platforms
  sched/fair: Improve the ->group_imb logic
  sched/nohz: Fix rq->cpu_load[] calculations
  sched/numa: Don't scale the imbalance
  sched/fair: Revert sched-domain iteration breakage
  sched/x86: Rewrite set_cpu_sibling_map()
  sched/numa: Fix the new NUMA topology bits
  sched/numa: Rewrite the CONFIG_NUMA sched domain support
  sched/fair: Propagate 'struct lb_env' usage into find_busiest_group
  sched/fair: Add some serialization to the sched_domain load-balance walk
  sched/fair: Let minimally loaded cpu balance the group
  sched: Change rq->nr_running to unsigned int
  x86/numa: Check for nonsensical topologies on real hw as well
  x86/numa: Hard partition cpu topology masks on node boundaries
  x86/numa: Allow specifying node_distance() for numa=fake
  x86/sched: Make mwait_usable() heed to "idle=" kernel parameters properly
  sched: Update documentation and comments
  sched_rt: Avoid unnecessary dequeue and enqueue of pushable tasks in set_cpus_allowed_rt()
parents 2ff2b289 1c2927f1
@@ -9,31 +9,6 @@ Description:
 		/sys/devices/system/cpu/cpu#/
-What:		/sys/devices/system/cpu/sched_mc_power_savings
-		/sys/devices/system/cpu/sched_smt_power_savings
-Date:		June 2006
-Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
-Description:	Discover and adjust the kernel's multi-core scheduler support.
-		Possible values are:
-		0 - No power saving load balance (default value)
-		1 - Fill one thread/core/package first for long running threads
-		2 - Also bias task wakeups to semi-idle cpu package for power
-		    savings
-		sched_mc_power_savings is dependent upon SCHED_MC, which is
-		itself architecture dependent.
-		sched_smt_power_savings is dependent upon SCHED_SMT, which
-		is itself architecture dependent.
-		The two files are independent of each other. It is possible
-		that one file may be present without the other.
-		Introduced by git commit 5c45bf27.
 What:		/sys/devices/system/cpu/kernel_max
 		/sys/devices/system/cpu/offline
 		/sys/devices/system/cpu/online
...
@@ -130,7 +130,7 @@ CFS implements three scheduling policies:
   idle timer scheduler in order to avoid to get into priority
   inversion problems which would deadlock the machine.
-SCHED_FIFO/_RR are implemented in sched_rt.c and are as specified by
+SCHED_FIFO/_RR are implemented in sched/rt.c and are as specified by
 POSIX.
 The command chrt from util-linux-ng 2.13.1.1 can set all of these except
@@ -145,9 +145,9 @@ Classes," an extensible hierarchy of scheduler modules. These modules
 encapsulate scheduling policy details and are handled by the scheduler core
 without the core code assuming too much about them.
-sched_fair.c implements the CFS scheduler described above.
+sched/fair.c implements the CFS scheduler described above.
-sched_rt.c implements SCHED_FIFO and SCHED_RR semantics, in a simpler way than
+sched/rt.c implements SCHED_FIFO and SCHED_RR semantics, in a simpler way than
 the previous vanilla scheduler did. It uses 100 runqueues (for all 100 RT
 priority levels, instead of 140 in the previous scheduler) and it needs no
 expired array.
...
@@ -61,10 +61,6 @@ The implementor should read comments in include/linux/sched.h:
 struct sched_domain fields, SD_FLAG_*, SD_*_INIT to get an idea of
 the specifics and what to tune.
-For SMT, the architecture must define CONFIG_SCHED_SMT and provide a
-cpumask_t cpu_sibling_map[NR_CPUS], where cpu_sibling_map[i] is the mask of
-all "i"'s siblings as well as "i" itself.
 Architectures may retain the regular override the default SD_*_INIT flags
 while using the generic domain builder in kernel/sched.c if they wish to
 retain the traditional SMT->SMP->NUMA topology (or some subset of that). This
...
@@ -70,31 +70,6 @@ void build_cpu_to_node_map(void);
 	.nr_balance_failed = 0, \
 }
-/* sched_domains SD_NODE_INIT for IA64 NUMA machines */
-#define SD_NODE_INIT (struct sched_domain) { \
-	.parent = NULL, \
-	.child = NULL, \
-	.groups = NULL, \
-	.min_interval = 8, \
-	.max_interval = 8*(min(num_online_cpus(), 32U)), \
-	.busy_factor = 64, \
-	.imbalance_pct = 125, \
-	.cache_nice_tries = 2, \
-	.busy_idx = 3, \
-	.idle_idx = 2, \
-	.newidle_idx = 0, \
-	.wake_idx = 0, \
-	.forkexec_idx = 0, \
-	.flags = SD_LOAD_BALANCE \
-		| SD_BALANCE_NEWIDLE \
-		| SD_BALANCE_EXEC \
-		| SD_BALANCE_FORK \
-		| SD_SERIALIZE, \
-	.last_balance = jiffies, \
-	.balance_interval = 64, \
-	.nr_balance_failed = 0, \
-}
 #endif /* CONFIG_NUMA */
 #ifdef CONFIG_SMP
...
@@ -36,23 +36,6 @@ extern unsigned char __node_distances[MAX_COMPACT_NODES][MAX_COMPACT_NODES];
 #define node_distance(from, to) (__node_distances[(from)][(to)])
-/* sched_domains SD_NODE_INIT for SGI IP27 machines */
-#define SD_NODE_INIT (struct sched_domain) { \
-	.parent = NULL, \
-	.child = NULL, \
-	.groups = NULL, \
-	.min_interval = 8, \
-	.max_interval = 32, \
-	.busy_factor = 32, \
-	.imbalance_pct = 125, \
-	.cache_nice_tries = 1, \
-	.flags = SD_LOAD_BALANCE | \
-		 SD_BALANCE_EXEC, \
-	.last_balance = jiffies, \
-	.balance_interval = 1, \
-	.nr_balance_failed = 0, \
-}
 #include <asm-generic/topology.h>
 #endif /* _ASM_MACH_TOPOLOGY_H */
@@ -18,12 +18,6 @@ struct device_node;
  */
 #define RECLAIM_DISTANCE 10
-/*
- * Avoid creating an extra level of balancing (SD_ALLNODES) on the largest
- * POWER7 boxes which have a maximum of 32 nodes.
- */
-#define SD_NODES_PER_DOMAIN 32
 #include <asm/mmzone.h>
 static inline int cpu_to_node(int cpu)
@@ -51,36 +45,6 @@ static inline int pcibus_to_node(struct pci_bus *bus)
 			cpu_all_mask : \
 			cpumask_of_node(pcibus_to_node(bus)))
-/* sched_domains SD_NODE_INIT for PPC64 machines */
-#define SD_NODE_INIT (struct sched_domain) { \
-	.min_interval = 8, \
-	.max_interval = 32, \
-	.busy_factor = 32, \
-	.imbalance_pct = 125, \
-	.cache_nice_tries = 1, \
-	.busy_idx = 3, \
-	.idle_idx = 1, \
-	.newidle_idx = 0, \
-	.wake_idx = 0, \
-	.forkexec_idx = 0, \
-	\
-	.flags = 1*SD_LOAD_BALANCE \
-		| 0*SD_BALANCE_NEWIDLE \
-		| 1*SD_BALANCE_EXEC \
-		| 1*SD_BALANCE_FORK \
-		| 0*SD_BALANCE_WAKE \
-		| 1*SD_WAKE_AFFINE \
-		| 0*SD_PREFER_LOCAL \
-		| 0*SD_SHARE_CPUPOWER \
-		| 0*SD_POWERSAVINGS_BALANCE \
-		| 0*SD_SHARE_PKG_RESOURCES \
-		| 1*SD_SERIALIZE \
-		| 0*SD_PREFER_SIBLING \
-		, \
-	.last_balance = jiffies, \
-	.balance_interval = 1, \
-}
 extern int __node_distance(int, int);
 #define node_distance(a, b) __node_distance(a, b)
...
@@ -3,31 +3,6 @@
 #ifdef CONFIG_NUMA
-/* sched_domains SD_NODE_INIT for sh machines */
-#define SD_NODE_INIT (struct sched_domain) { \
-	.parent = NULL, \
-	.child = NULL, \
-	.groups = NULL, \
-	.min_interval = 8, \
-	.max_interval = 32, \
-	.busy_factor = 32, \
-	.imbalance_pct = 125, \
-	.cache_nice_tries = 2, \
-	.busy_idx = 3, \
-	.idle_idx = 2, \
-	.newidle_idx = 0, \
-	.wake_idx = 0, \
-	.forkexec_idx = 0, \
-	.flags = SD_LOAD_BALANCE \
-		| SD_BALANCE_FORK \
-		| SD_BALANCE_EXEC \
-		| SD_BALANCE_NEWIDLE \
-		| SD_SERIALIZE, \
-	.last_balance = jiffies, \
-	.balance_interval = 1, \
-	.nr_balance_failed = 0, \
-}
 #define cpu_to_node(cpu)	((void)(cpu),0)
 #define parent_node(node)	((void)(node),0)
...
@@ -31,25 +31,6 @@ static inline int pcibus_to_node(struct pci_bus *pbus)
 			cpu_all_mask : \
 			cpumask_of_node(pcibus_to_node(bus)))
-#define SD_NODE_INIT (struct sched_domain) { \
-	.min_interval = 8, \
-	.max_interval = 32, \
-	.busy_factor = 32, \
-	.imbalance_pct = 125, \
-	.cache_nice_tries = 2, \
-	.busy_idx = 3, \
-	.idle_idx = 2, \
-	.newidle_idx = 0, \
-	.wake_idx = 0, \
-	.forkexec_idx = 0, \
-	.flags = SD_LOAD_BALANCE \
-		| SD_BALANCE_FORK \
-		| SD_BALANCE_EXEC \
-		| SD_SERIALIZE, \
-	.last_balance = jiffies, \
-	.balance_interval = 1, \
-}
 #else /* CONFIG_NUMA */
 #include <asm-generic/topology.h>
...
@@ -78,32 +78,6 @@ static inline const struct cpumask *cpumask_of_node(int node)
 	.balance_interval = 32, \
 }
-/* sched_domains SD_NODE_INIT for TILE architecture */
-#define SD_NODE_INIT (struct sched_domain) { \
-	.min_interval = 16, \
-	.max_interval = 512, \
-	.busy_factor = 32, \
-	.imbalance_pct = 125, \
-	.cache_nice_tries = 1, \
-	.busy_idx = 3, \
-	.idle_idx = 1, \
-	.newidle_idx = 2, \
-	.wake_idx = 1, \
-	.flags = 1*SD_LOAD_BALANCE \
-		| 1*SD_BALANCE_NEWIDLE \
-		| 1*SD_BALANCE_EXEC \
-		| 1*SD_BALANCE_FORK \
-		| 0*SD_BALANCE_WAKE \
-		| 0*SD_WAKE_AFFINE \
-		| 0*SD_PREFER_LOCAL \
-		| 0*SD_SHARE_CPUPOWER \
-		| 0*SD_SHARE_PKG_RESOURCES \
-		| 1*SD_SERIALIZE \
-		, \
-	.last_balance = jiffies, \
-	.balance_interval = 128, \
-}
 /* By definition, we create nodes based on online memory. */
 #define node_has_online_mem(nid) 1
...
@@ -92,44 +92,6 @@ extern void setup_node_to_cpumask_map(void);
 #define pcibus_to_node(bus) __pcibus_to_node(bus)
-#ifdef CONFIG_X86_32
-# define SD_CACHE_NICE_TRIES	1
-# define SD_IDLE_IDX		1
-#else
-# define SD_CACHE_NICE_TRIES	2
-# define SD_IDLE_IDX		2
-#endif
-
-/* sched_domains SD_NODE_INIT for NUMA machines */
-#define SD_NODE_INIT (struct sched_domain) { \
-	.min_interval = 8, \
-	.max_interval = 32, \
-	.busy_factor = 32, \
-	.imbalance_pct = 125, \
-	.cache_nice_tries = SD_CACHE_NICE_TRIES, \
-	.busy_idx = 3, \
-	.idle_idx = SD_IDLE_IDX, \
-	.newidle_idx = 0, \
-	.wake_idx = 0, \
-	.forkexec_idx = 0, \
-	\
-	.flags = 1*SD_LOAD_BALANCE \
-		| 1*SD_BALANCE_NEWIDLE \
-		| 1*SD_BALANCE_EXEC \
-		| 1*SD_BALANCE_FORK \
-		| 0*SD_BALANCE_WAKE \
-		| 1*SD_WAKE_AFFINE \
-		| 0*SD_PREFER_LOCAL \
-		| 0*SD_SHARE_CPUPOWER \
-		| 0*SD_POWERSAVINGS_BALANCE \
-		| 0*SD_SHARE_PKG_RESOURCES \
-		| 1*SD_SERIALIZE \
-		| 0*SD_PREFER_SIBLING \
-		, \
-	.last_balance = jiffies, \
-	.balance_interval = 1, \
-}
 extern int __node_distance(int, int);
 #define node_distance(a, b) __node_distance(a, b)
...
@@ -582,9 +582,17 @@ int mwait_usable(const struct cpuinfo_x86 *c)
 {
 	u32 eax, ebx, ecx, edx;
+	/* Use mwait if idle=mwait boot option is given */
 	if (boot_option_idle_override == IDLE_FORCE_MWAIT)
 		return 1;
+	/*
+	 * Any idle= boot option other than idle=mwait means that we must not
+	 * use mwait. Eg: idle=halt or idle=poll or idle=nomwait
+	 */
+	if (boot_option_idle_override != IDLE_NO_OVERRIDE)
+		return 0;
 	if (c->cpuid_level < MWAIT_INFO)
 		return 0;
...
@@ -299,59 +299,90 @@ void __cpuinit smp_store_cpu_info(int id)
 	identify_secondary_cpu(c);
 }
-static void __cpuinit link_thread_siblings(int cpu1, int cpu2)
+static bool __cpuinit
+topology_sane(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o, const char *name)
 {
-	cpumask_set_cpu(cpu1, cpu_sibling_mask(cpu2));
-	cpumask_set_cpu(cpu2, cpu_sibling_mask(cpu1));
-	cpumask_set_cpu(cpu1, cpu_core_mask(cpu2));
-	cpumask_set_cpu(cpu2, cpu_core_mask(cpu1));
-	cpumask_set_cpu(cpu1, cpu_llc_shared_mask(cpu2));
-	cpumask_set_cpu(cpu2, cpu_llc_shared_mask(cpu1));
+	int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
+
+	return !WARN_ONCE(cpu_to_node(cpu1) != cpu_to_node(cpu2),
+		"sched: CPU #%d's %s-sibling CPU #%d is not on the same node! "
+		"[node: %d != %d]. Ignoring dependency.\n",
+		cpu1, name, cpu2, cpu_to_node(cpu1), cpu_to_node(cpu2));
 }
+#define link_mask(_m, c1, c2) \
+do { \
+	cpumask_set_cpu((c1), cpu_##_m##_mask(c2)); \
+	cpumask_set_cpu((c2), cpu_##_m##_mask(c1)); \
+} while (0)
-void __cpuinit set_cpu_sibling_map(int cpu)
+static bool __cpuinit match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 {
-	int i;
-	struct cpuinfo_x86 *c = &cpu_data(cpu);
-
-	cpumask_set_cpu(cpu, cpu_sibling_setup_mask);
-
-	if (smp_num_siblings > 1) {
-		for_each_cpu(i, cpu_sibling_setup_mask) {
-			struct cpuinfo_x86 *o = &cpu_data(i);
-
-			if (cpu_has(c, X86_FEATURE_TOPOEXT)) {
-				if (c->phys_proc_id == o->phys_proc_id &&
-				    per_cpu(cpu_llc_id, cpu) == per_cpu(cpu_llc_id, i) &&
-				    c->compute_unit_id == o->compute_unit_id)
-					link_thread_siblings(cpu, i);
-			} else if (c->phys_proc_id == o->phys_proc_id &&
-				   c->cpu_core_id == o->cpu_core_id) {
-				link_thread_siblings(cpu, i);
-			}
-		}
-	} else {
-		cpumask_set_cpu(cpu, cpu_sibling_mask(cpu));
+	if (cpu_has(c, X86_FEATURE_TOPOEXT)) {
+		int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
+
+		if (c->phys_proc_id == o->phys_proc_id &&
+		    per_cpu(cpu_llc_id, cpu1) == per_cpu(cpu_llc_id, cpu2) &&
+		    c->compute_unit_id == o->compute_unit_id)
+			return topology_sane(c, o, "smt");
+	} else if (c->phys_proc_id == o->phys_proc_id &&
+		   c->cpu_core_id == o->cpu_core_id) {
+		return topology_sane(c, o, "smt");
 	}
-	cpumask_set_cpu(cpu, cpu_llc_shared_mask(cpu));
+	return false;
+}
+
+static bool __cpuinit match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
+{
+	int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
+
+	if (per_cpu(cpu_llc_id, cpu1) != BAD_APICID &&
+	    per_cpu(cpu_llc_id, cpu1) == per_cpu(cpu_llc_id, cpu2))
+		return topology_sane(c, o, "llc");
+
+	return false;
+}
+
+static bool __cpuinit match_mc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
+{
+	if (c->phys_proc_id == o->phys_proc_id)
+		return topology_sane(c, o, "mc");
+
+	return false;
+}
+
+void __cpuinit set_cpu_sibling_map(int cpu)
+{
+	bool has_mc = boot_cpu_data.x86_max_cores > 1;
+	bool has_smt = smp_num_siblings > 1;
+	struct cpuinfo_x86 *c = &cpu_data(cpu);
+	struct cpuinfo_x86 *o;
+	int i;
+
+	cpumask_set_cpu(cpu, cpu_sibling_setup_mask);
-	if (__this_cpu_read(cpu_info.x86_max_cores) == 1) {
-		cpumask_copy(cpu_core_mask(cpu), cpu_sibling_mask(cpu));
+	if (!has_smt && !has_mc) {
+		cpumask_set_cpu(cpu, cpu_sibling_mask(cpu));
+		cpumask_set_cpu(cpu, cpu_llc_shared_mask(cpu));
+		cpumask_set_cpu(cpu, cpu_core_mask(cpu));
 		c->booted_cores = 1;
 		return;
 	}
 	for_each_cpu(i, cpu_sibling_setup_mask) {
-		if (per_cpu(cpu_llc_id, cpu) != BAD_APICID &&
-		    per_cpu(cpu_llc_id, cpu) == per_cpu(cpu_llc_id, i)) {
-			cpumask_set_cpu(i, cpu_llc_shared_mask(cpu));
-			cpumask_set_cpu(cpu, cpu_llc_shared_mask(i));
-		}
-		if (c->phys_proc_id == cpu_data(i).phys_proc_id) {
-			cpumask_set_cpu(i, cpu_core_mask(cpu));
-			cpumask_set_cpu(cpu, cpu_core_mask(i));
+		o = &cpu_data(i);
+
+		if ((i == cpu) || (has_smt && match_smt(c, o)))
+			link_mask(sibling, cpu, i);
+
+		if ((i == cpu) || (has_mc && match_llc(c, o)))
+			link_mask(llc_shared, cpu, i);
+
+		if ((i == cpu) || (has_mc && match_mc(c, o))) {
+			link_mask(core, cpu, i);
+
 			/*
 			 * Does this new cpu bringup a new core?
 			 */
@@ -382,8 +413,7 @@ const struct cpumask *cpu_coregroup_mask(int cpu)
 	 * For perf, we return last level cache shared map.
 	 * And for power savings, we return cpu_core_map
 	 */
-	if ((sched_mc_power_savings || sched_smt_power_savings) &&
-	    !(cpu_has(c, X86_FEATURE_AMD_DCM)))
+	if (!(cpu_has(c, X86_FEATURE_AMD_DCM)))
 		return cpu_core_mask(cpu);
 	else
 		return cpu_llc_shared_mask(cpu);
...
@@ -339,9 +339,11 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
 	} else {
 		unsigned long n;
-		n = simple_strtoul(emu_cmdline, NULL, 0);
+		n = simple_strtoul(emu_cmdline, &emu_cmdline, 0);
 		ret = split_nodes_interleave(&ei, &pi, 0, max_addr, n);
 	}
+	if (*emu_cmdline == ':')
+		emu_cmdline++;
 	if (ret < 0)
 		goto no_emu;
@@ -418,7 +420,9 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
 			int physj = emu_nid_to_phys[j];
 			int dist;
-			if (physi >= numa_dist_cnt || physj >= numa_dist_cnt)
+			if (get_option(&emu_cmdline, &dist) == 2)
+				;
+			else if (physi >= numa_dist_cnt || physj >= numa_dist_cnt)
 				dist = physi == physj ?
 					LOCAL_DISTANCE : REMOTE_DISTANCE;
 			else
...
@@ -330,8 +330,4 @@ void __init cpu_dev_init(void)
 		panic("Failed to register CPU subsystem");
 	cpu_dev_register_generic();
-
-#if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
-	sched_create_sysfs_power_savings_entries(cpu_subsys.dev_root);
-#endif
 }
@@ -36,8 +36,6 @@ extern void cpu_remove_dev_attr(struct device_attribute *attr);
 extern int cpu_add_dev_attr_group(struct attribute_group *attrs);
 extern void cpu_remove_dev_attr_group(struct attribute_group *attrs);
-extern int sched_create_sysfs_power_savings_entries(struct device *dev);
-
 #ifdef CONFIG_HOTPLUG_CPU
 extern void unregister_cpu(struct cpu *cpu);
 extern ssize_t arch_cpu_probe(const char *, size_t);
...
@@ -855,61 +855,14 @@ enum cpu_idle_type {
 #define SD_WAKE_AFFINE		0x0020	/* Wake task to waking CPU */
 #define SD_PREFER_LOCAL		0x0040	/* Prefer to keep tasks local to this domain */
 #define SD_SHARE_CPUPOWER	0x0080	/* Domain members share cpu power */
-#define SD_POWERSAVINGS_BALANCE	0x0100	/* Balance for power savings */
 #define SD_SHARE_PKG_RESOURCES	0x0200	/* Domain members share cpu pkg resources */
 #define SD_SERIALIZE		0x0400	/* Only a single load balancing instance */
 #define SD_ASYM_PACKING		0x0800	/* Place busy groups earlier in the domain */
 #define SD_PREFER_SIBLING	0x1000	/* Prefer to place tasks in a sibling domain */
 #define SD_OVERLAP		0x2000	/* sched_domains of this level overlap */
-enum powersavings_balance_level {
-	POWERSAVINGS_BALANCE_NONE = 0,	/* No power saving load balance */
-	POWERSAVINGS_BALANCE_BASIC,	/* Fill one thread/core/package
-					 * first for long running threads
-					 */
-	POWERSAVINGS_BALANCE_WAKEUP,	/* Also bias task wakeups to semi-idle
-					 * cpu package for power savings
-					 */
-	MAX_POWERSAVINGS_BALANCE_LEVELS
-};
-
-extern int sched_mc_power_savings, sched_smt_power_savings;
-
-static inline int sd_balance_for_mc_power(void)
-{
-	if (sched_smt_power_savings)
-		return SD_POWERSAVINGS_BALANCE;
-
-	if (!sched_mc_power_savings)
-		return SD_PREFER_SIBLING;
-
-	return 0;
-}
-
-static inline int sd_balance_for_package_power(void)
-{
-	if (sched_mc_power_savings | sched_smt_power_savings)
-		return SD_POWERSAVINGS_BALANCE;
-
-	return SD_PREFER_SIBLING;
-}
 extern int __weak arch_sd_sibiling_asym_packing(void);
-/*
- * Optimise SD flags for power savings:
- * SD_BALANCE_NEWIDLE helps aggressive task consolidation and power savings.
- * Keep default SD flags if sched_{smt,mc}_power_saving=0
- */
-static inline int sd_power_saving_flags(void)
-{
-	if (sched_mc_power_savings | sched_smt_power_savings)
-		return SD_BALANCE_NEWIDLE;
-
-	return 0;
-}
 struct sched_group_power {
 	atomic_t ref;
 	/*
@@ -1962,7 +1915,7 @@ static inline int set_cpus_allowed(struct task_struct *p, cpumask_t new_mask)
  */
 extern unsigned long long notrace sched_clock(void);
 /*
- * See the comment in kernel/sched_clock.c
+ * See the comment in kernel/sched/clock.c
  */
 extern u64 cpu_clock(int cpu);
 extern u64 local_clock(void);
...
@@ -70,7 +70,6 @@ int arch_update_cpu_topology(void);
  * Below are the 3 major initializers used in building sched_domains:
  * SD_SIBLING_INIT, for SMT domains
  * SD_CPU_INIT, for SMP domains
- * SD_NODE_INIT, for NUMA domains
  *
  * Any architecture that cares to do any tuning to these values should do so
  * by defining their own arch-specific initializer in include/asm/topology.h.
@@ -99,7 +98,6 @@ int arch_update_cpu_topology(void);
 		| 0*SD_BALANCE_WAKE \
 		| 1*SD_WAKE_AFFINE \
 		| 1*SD_SHARE_CPUPOWER \
-		| 0*SD_POWERSAVINGS_BALANCE \
 		| 1*SD_SHARE_PKG_RESOURCES \
 		| 0*SD_SERIALIZE \
 		| 0*SD_PREFER_SIBLING \
@@ -135,8 +133,6 @@ int arch_update_cpu_topology(void);
 		| 0*SD_SHARE_CPUPOWER \
 		| 1*SD_SHARE_PKG_RESOURCES \
 		| 0*SD_SERIALIZE \
-		| sd_balance_for_mc_power() \
-		| sd_power_saving_flags() \
 		, \
 	.last_balance = jiffies, \
 	.balance_interval = 1, \
@@ -168,56 +164,18 @@ int arch_update_cpu_topology(void);
 		| 0*SD_SHARE_CPUPOWER \
 		| 0*SD_SHARE_PKG_RESOURCES \
 		| 0*SD_SERIALIZE \
-		| sd_balance_for_package_power() \
-		| sd_power_saving_flags() \
 		, \
 	.last_balance = jiffies, \
 	.balance_interval = 1, \
 }
 #endif
-
-/* sched_domains SD_ALLNODES_INIT for NUMA machines */
-#define SD_ALLNODES_INIT (struct sched_domain) { \
-	.min_interval = 64, \
-	.max_interval = 64*num_online_cpus(), \
-	.busy_factor = 128, \
-	.imbalance_pct = 133, \
-	.cache_nice_tries = 1, \
-	.busy_idx = 3, \
-	.idle_idx = 3, \
-	.flags = 1*SD_LOAD_BALANCE \
-		| 1*SD_BALANCE_NEWIDLE \
-		| 0*SD_BALANCE_EXEC \
-		| 0*SD_BALANCE_FORK \
-		| 0*SD_BALANCE_WAKE \
-		| 0*SD_WAKE_AFFINE \
-		| 0*SD_SHARE_CPUPOWER \
-		| 0*SD_POWERSAVINGS_BALANCE \
-		| 0*SD_SHARE_PKG_RESOURCES \
-		| 1*SD_SERIALIZE \
-		| 0*SD_PREFER_SIBLING \
-		, \
-	.last_balance = jiffies, \
-	.balance_interval = 64, \
-}
-
-#ifndef SD_NODES_PER_DOMAIN
-#define SD_NODES_PER_DOMAIN 16
-#endif
 #ifdef CONFIG_SCHED_BOOK
 #ifndef SD_BOOK_INIT
 #error Please define an appropriate SD_BOOK_INIT in include/asm/topology.h!!!
 #endif
 #endif /* CONFIG_SCHED_BOOK */
-#ifdef CONFIG_NUMA
-#ifndef SD_NODE_INIT
-#error Please define an appropriate SD_NODE_INIT in include/asm/topology.h!!!
-#endif
-#endif /* CONFIG_NUMA */
 #ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
 DECLARE_PER_CPU(int, numa_node);
...
This diff is collapsed.
@@ -202,7 +202,7 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
 			SPLIT_NS(spread0));
 	SEQ_printf(m, " .%-30s: %d\n", "nr_spread_over",
 			cfs_rq->nr_spread_over);
-	SEQ_printf(m, " .%-30s: %ld\n", "nr_running", cfs_rq->nr_running);
+	SEQ_printf(m, " .%-30s: %d\n", "nr_running", cfs_rq->nr_running);
 	SEQ_printf(m, " .%-30s: %ld\n", "load", cfs_rq->load.weight);
 #ifdef CONFIG_FAIR_GROUP_SCHED
 #ifdef CONFIG_SMP
@@ -261,7 +261,13 @@ static void print_cpu(struct seq_file *m, int cpu)
 #endif
 #define P(x) \
-	SEQ_printf(m, " .%-30s: %Ld\n", #x, (long long)(rq->x))
+do { \
+	if (sizeof(rq->x) == 4) \
+		SEQ_printf(m, " .%-30s: %ld\n", #x, (long)(rq->x)); \
+	else \
+		SEQ_printf(m, " .%-30s: %Ld\n", #x, (long long)(rq->x));\
+} while (0)
 #define PN(x) \
 	SEQ_printf(m, " .%-30s: %Ld.%06ld\n", #x, SPLIT_NS(rq->x))
...
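
The sizeof-based format selection used in the new P() macro above can also be
tried in isolation. A small user-space sketch follows; the struct and field
names are hypothetical stand-ins, and %lld is used instead of the kernel's %Ld.

    #include <stdio.h>
    #include <stdint.h>

    struct rq_sketch {                /* hypothetical stand-in for struct rq */
            unsigned int nr_running;  /* 4 bytes after the unsigned int conversion */
            uint64_t clock;           /* still 8 bytes */
    };

    /* Same idea as the sched/debug P() macro: pick the cast and format by size. */
    #define P(rq, x)                                                          \
    do {                                                                      \
            if (sizeof((rq)->x) == 4)                                         \
                    printf("  .%-30s: %ld\n", #x, (long)((rq)->x));           \
            else                                                              \
                    printf("  .%-30s: %lld\n", #x, (long long)((rq)->x));     \
    } while (0)

    int main(void)
    {
            struct rq_sketch rq = { .nr_running = 3, .clock = 123456789012ULL };

            P(&rq, nr_running);     /* printed via %ld, no 64-bit cast needed */
            P(&rq, clock);          /* printed via %lld */
            return 0;
    }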
This diff is collapsed.
@@ -4,7 +4,7 @@
  * idle-task scheduling class.
  *
  * (NOTE: these are not related to SCHED_IDLE tasks which are
- *  handled in sched_fair.c)
+ *  handled in sched/fair.c)
  */
 #ifdef CONFIG_SMP
...
@@ -1803,44 +1803,40 @@ static void task_woken_rt(struct rq *rq, struct task_struct *p)
 static void set_cpus_allowed_rt(struct task_struct *p,
 				const struct cpumask *new_mask)
 {
-	int weight = cpumask_weight(new_mask);
+	struct rq *rq;
+	int weight;
 
 	BUG_ON(!rt_task(p));
 
-	/*
-	 * Update the migration status of the RQ if we have an RT task
-	 * which is running AND changing its weight value.
-	 */
-	if (p->on_rq && (weight != p->rt.nr_cpus_allowed)) {
-		struct rq *rq = task_rq(p);
-
-		if (!task_current(rq, p)) {
-			/*
-			 * Make sure we dequeue this task from the pushable list
-			 * before going further. It will either remain off of
-			 * the list because we are no longer pushable, or it
-			 * will be requeued.
-			 */
-			if (p->rt.nr_cpus_allowed > 1)
-				dequeue_pushable_task(rq, p);
-
-			/*
-			 * Requeue if our weight is changing and still > 1
-			 */
-			if (weight > 1)
-				enqueue_pushable_task(rq, p);
-		}
-
-		if ((p->rt.nr_cpus_allowed <= 1) && (weight > 1)) {
-			rq->rt.rt_nr_migratory++;
-		} else if ((p->rt.nr_cpus_allowed > 1) && (weight <= 1)) {
-			BUG_ON(!rq->rt.rt_nr_migratory);
-			rq->rt.rt_nr_migratory--;
-		}
-
-		update_rt_migration(&rq->rt);
-	}
+	if (!p->on_rq)
+		return;
+
+	weight = cpumask_weight(new_mask);
+
+	/*
+	 * Only update if the process changes its state from whether it
+	 * can migrate or not.
+	 */
+	if ((p->rt.nr_cpus_allowed > 1) == (weight > 1))
+		return;
+
+	rq = task_rq(p);
+
+	/*
+	 * The process used to be able to migrate OR it can now migrate
+	 */
+	if (weight <= 1) {
+		if (!task_current(rq, p))
+			dequeue_pushable_task(rq, p);
+		BUG_ON(!rq->rt.rt_nr_migratory);
+		rq->rt.rt_nr_migratory--;
+	} else {
+		if (!task_current(rq, p))
+			enqueue_pushable_task(rq, p);
+		rq->rt.rt_nr_migratory++;
+	}
+
+	update_rt_migration(&rq->rt);
 }
 
 /* Assumes rq->lock is held */
...
@@ -201,7 +201,7 @@ struct cfs_bandwidth { };
 /* CFS-related fields in a runqueue */
 struct cfs_rq {
 	struct load_weight load;
-	unsigned long nr_running, h_nr_running;
+	unsigned int nr_running, h_nr_running;
 	u64 exec_clock;
 	u64 min_vruntime;
@@ -279,7 +279,7 @@ static inline int rt_bandwidth_enabled(void)
 /* Real-Time classes' related field in a runqueue: */
 struct rt_rq {
 	struct rt_prio_array active;
-	unsigned long rt_nr_running;
+	unsigned int rt_nr_running;
 #if defined CONFIG_SMP || defined CONFIG_RT_GROUP_SCHED
 	struct {
 		int curr; /* highest queued rt task prio */
@@ -353,7 +353,7 @@ struct rq {
 	 * nr_running and cpu_load should be in the same cacheline because
 	 * remote CPUs use both these fields when doing load calculation.
 	 */
-	unsigned long nr_running;
+	unsigned int nr_running;
 	#define CPU_LOAD_IDX_MAX 5
 	unsigned long cpu_load[CPU_LOAD_IDX_MAX];
 	unsigned long last_load_update_tick;
@@ -876,7 +876,7 @@ extern void resched_cpu(int cpu);
 extern struct rt_bandwidth def_rt_bandwidth;
 extern void init_rt_bandwidth(struct rt_bandwidth *rt_b, u64 period, u64 runtime);
-extern void update_cpu_load(struct rq *this_rq);
+extern void update_idle_cpu_load(struct rq *this_rq);
 #ifdef CONFIG_CGROUP_CPUACCT
 #include <linux/cgroup.h>
...
@@ -85,15 +85,6 @@ Possible values are:
 savings
 .RE
-
-sched_mc_power_savings is dependent upon SCHED_MC, which is
-itself architecture dependent.
-
-sched_smt_power_savings is dependent upon SCHED_SMT, which
-is itself architecture dependent.
-
-The two files are independent of each other. It is possible
-that one file may be present without the other.
 .SH "SEE ALSO"
 cpupower-info(1), cpupower-monitor(1), powertop(1)
 .PP
...
@@ -362,22 +362,7 @@ char *sysfs_get_cpuidle_driver(void)
  */
 int sysfs_get_sched(const char *smt_mc)
 {
-	unsigned long value;
-	char linebuf[MAX_LINE_LEN];
-	char *endp;
-	char path[SYSFS_PATH_MAX];
-
-	if (strcmp("mc", smt_mc) && strcmp("smt", smt_mc))
-		return -EINVAL;
-
-	snprintf(path, sizeof(path),
-		PATH_TO_CPU "sched_%s_power_savings", smt_mc);
-
-	if (sysfs_read_file(path, linebuf, MAX_LINE_LEN) == 0)
-		return -1;
-	value = strtoul(linebuf, &endp, 0);
-	if (endp == linebuf || errno == ERANGE)
-		return -1;
-	return value;
+	return -ENODEV;
 }
 
 /*
@@ -388,21 +373,5 @@ int sysfs_get_sched(const char *smt_mc)
  */
 int sysfs_set_sched(const char *smt_mc, int val)
 {
-	char linebuf[MAX_LINE_LEN];
-	char path[SYSFS_PATH_MAX];
-	struct stat statbuf;
-
-	if (strcmp("mc", smt_mc) && strcmp("smt", smt_mc))
-		return -EINVAL;
-
-	snprintf(path, sizeof(path),
-		PATH_TO_CPU "sched_%s_power_savings", smt_mc);
-	sprintf(linebuf, "%d", val);
-
-	if (stat(path, &statbuf) != 0)
-		return -ENODEV;
-
-	if (sysfs_write_file(path, linebuf, MAX_LINE_LEN) == 0)
-		return -1;
-	return 0;
+	return -ENODEV;
 }