Commit 831576fe authored by Linus Torvalds

Merge branch 'sched-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'sched-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (46 commits)
  sched: Add comments to find_busiest_group() function
  sched: Refactor the power savings balance code
  sched: Optimize the !power_savings_balance during fbg()
  sched: Create a helper function to calculate imbalance
  sched: Create helper to calculate small_imbalance in fbg()
  sched: Create a helper function to calculate sched_domain stats for fbg()
  sched: Define structure to store the sched_domain statistics for fbg()
  sched: Create a helper function to calculate sched_group stats for fbg()
  sched: Define structure to store the sched_group statistics for fbg()
  sched: Fix indentations in find_busiest_group() using gotos
  sched: Simple helper functions for find_busiest_group()
  sched: remove unused fields from struct rq
  sched: jiffies not printed per CPU
  sched: small optimisation of can_migrate_task()
  sched: fix typos in documentation
  sched: add avg_overlap decay
  x86, sched_clock(): mark variables read-mostly
  sched: optimize ttwu vs group scheduling
  sched: TIF_NEED_RESCHED -> need_resched() cleanup
  sched: don't rebalance if attached on NULL domain
  ...
parents 21cdbc13 66fef08f
@@ -2,8 +2,6 @@
 	- this file.
 sched-arch.txt
 	- CPU Scheduler implementation hints for architecture specific code.
-sched-coding.txt
-	- reference for various scheduler-related methods in the O(1) scheduler.
 sched-design-CFS.txt
 	- goals, design and implementation of the Complete Fair Scheduler.
 sched-domains.txt
...
Reference for various scheduler-related methods in the O(1) scheduler
Robert Love <rml@tech9.net>, MontaVista Software
Note most of these methods are local to kernel/sched.c - this is by design.
The scheduler is meant to be self-contained and abstracted away. This document
is primarily for understanding the scheduler, not interfacing to it. Some of
the discussed interfaces, however, are general process/scheduling methods.
They are typically defined in include/linux/sched.h.
Main Scheduling Methods
-----------------------
void load_balance(runqueue_t *this_rq, int idle)
Attempts to pull tasks from one cpu to another to balance cpu usage,
if needed. This method is called explicitly if the runqueues are
imbalanced or periodically by the timer tick. Prior to calling,
the current runqueue must be locked and interrupts disabled.
void schedule()
The main scheduling function. Upon return, the highest priority
process will be active.
Locking
-------
Each runqueue has its own lock, rq->lock. When multiple runqueues need
to be locked, lock acquires must be ordered by ascending &runqueue value.
A specific runqueue is locked via
task_rq_lock(task_t pid, unsigned long *flags)
which disables preemption, disables interrupts, and locks the runqueue pid is
running on. Likewise,
task_rq_unlock(task_t pid, unsigned long *flags)
unlocks the runqueue pid is running on, restores interrupts to their previous
state, and reenables preemption.
The routines
double_rq_lock(runqueue_t *rq1, runqueue_t *rq2)
and
double_rq_unlock(runqueue_t *rq1, runqueue_t *rq2)
safely lock and unlock, respectively, the two specified runqueues. They do
not, however, disable and restore interrupts. Users are required to do so
manually before and after calls.
Values
------
MAX_PRIO
The maximum priority of the system, stored in the task as task->prio.
Lower priorities are higher. Normal (non-RT) priorities range from
MAX_RT_PRIO to (MAX_PRIO - 1).
MAX_RT_PRIO
The maximum real-time priority of the system. Valid RT priorities
range from 0 to (MAX_RT_PRIO - 1).
MAX_USER_RT_PRIO
The maximum real-time priority that is exported to user-space. Should
always be equal to or less than MAX_RT_PRIO. Setting it less allows
kernel threads to have higher priorities than any user-space task.
MIN_TIMESLICE
MAX_TIMESLICE
Respectively, the minimum and maximum timeslices (quanta) of a process.
Data
----
struct runqueue
The main per-CPU runqueue data structure.
struct task_struct
The main per-process data structure.
General Methods
---------------
cpu_rq(cpu)
Returns the runqueue of the specified cpu.
this_rq()
Returns the runqueue of the current cpu.
task_rq(pid)
Returns the runqueue which holds the specified pid.
cpu_curr(cpu)
Returns the task currently running on the given cpu.
rt_task(pid)
Returns true if pid is real-time, false if not.
Process Control Methods
-----------------------
void set_user_nice(task_t *p, long nice)
Sets the "nice" value of task p to the given value.
int setscheduler(pid_t pid, int policy, struct sched_param *param)
Sets the scheduling policy and parameters for the given pid.
int set_cpus_allowed(task_t *p, unsigned long new_mask)
Sets a given task's CPU affinity and migrates it to a proper cpu.
Callers must have a valid reference to the task and assure the
task not exit prematurely. No locks can be held during the call.
set_task_state(tsk, state_value)
Sets the given task's state to the given value.
set_current_state(state_value)
Sets the current task's state to the given value.
void set_tsk_need_resched(struct task_struct *tsk)
Sets need_resched in the given task.
void clear_tsk_need_resched(struct task_struct *tsk)
Clears need_resched in the given task.
void set_need_resched()
Sets need_resched in the current task.
void clear_need_resched()
Clears need_resched in the current task.
int need_resched()
Returns true if need_resched is set in the current task, false
otherwise.
yield()
Place the current process at the end of the runqueue and call schedule.
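For illustration, a minimal sketch of the locking discipline described in the Locking section of the removed document above: per-runqueue locks taken via task_rq_lock()/task_rq_unlock(), and ordered double-locking with the interrupt state handled by the caller. The helper names follow the removed text; these routines are internal to kernel/sched.c and their exact signatures have changed over time, so treat this as pseudocode rather than the current API.

/*
 * Illustrative sketch only -- mirrors the rules in the removed
 * sched-coding.txt; not a buildable user of the scheduler internals.
 */
static void example_touch_one_runqueue(struct task_struct *p)
{
        unsigned long flags;
        struct rq *rq;

        /* Disables preemption and interrupts, locks the runqueue p runs on. */
        rq = task_rq_lock(p, &flags);

        /* ... examine or modify per-runqueue state here ... */

        /* Unlocks, restores the interrupt state, re-enables preemption. */
        task_rq_unlock(rq, &flags);
}

static void example_touch_two_runqueues(struct rq *rq1, struct rq *rq2)
{
        /*
         * double_rq_lock() orders the two acquisitions by ascending
         * runqueue address, but leaves the interrupt state alone;
         * per the text above, the caller disables and restores it.
         */
        local_irq_disable();
        double_rq_lock(rq1, rq2);

        /* ... e.g. pull tasks from rq2 to rq1 ... */

        double_rq_unlock(rq1, rq2);
        local_irq_enable();
}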
@@ -4,6 +4,7 @@
 #include <linux/string.h>
 #include <linux/bitops.h>
 #include <linux/smp.h>
+#include <linux/sched.h>
 #include <linux/thread_info.h>
 #include <linux/module.h>
@@ -56,11 +57,16 @@ static void __cpuinit early_init_intel(struct cpuinfo_x86 *c)
 	/*
 	 * c->x86_power is 8000_0007 edx. Bit 8 is TSC runs at constant rate
-	 * with P/T states and does not stop in deep C-states
+	 * with P/T states and does not stop in deep C-states.
+	 *
+	 * It is also reliable across cores and sockets. (but not across
+	 * cabinets - we turn it off in that case explicitly.)
 	 */
 	if (c->x86_power & (1 << 8)) {
 		set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
 		set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
+		set_cpu_cap(c, X86_FEATURE_TSC_RELIABLE);
+		sched_clock_stable = 1;
 	}
 }
...
@@ -17,20 +17,21 @@
 #include <asm/delay.h>
 #include <asm/hypervisor.h>
 
-unsigned int cpu_khz;	/* TSC clocks / usec, not used here */
+unsigned int __read_mostly cpu_khz;	/* TSC clocks / usec, not used here */
 EXPORT_SYMBOL(cpu_khz);
-unsigned int tsc_khz;
+
+unsigned int __read_mostly tsc_khz;
 EXPORT_SYMBOL(tsc_khz);
 
 /*
  * TSC can be unstable due to cpufreq or due to unsynced TSCs
  */
-static int tsc_unstable;
+static int __read_mostly tsc_unstable;
 
 /* native_sched_clock() is called before tsc_init(), so
    we must start with the TSC soft disabled to prevent
    erroneous rdtsc usage on !cpu_has_tsc processors */
-static int tsc_disabled = -1;
+static int __read_mostly tsc_disabled = -1;
 
 static int tsc_clocksource_reliable;
 /*
...
@@ -147,6 +147,7 @@ extern struct cred init_cred;
 		.nr_cpus_allowed = NR_CPUS, \
 	}, \
 	.tasks = LIST_HEAD_INIT(tsk.tasks), \
+	.pushable_tasks = PLIST_NODE_INIT(tsk.pushable_tasks, MAX_PRIO), \
 	.ptraced = LIST_HEAD_INIT(tsk.ptraced), \
 	.ptrace_entry = LIST_HEAD_INIT(tsk.ptrace_entry), \
 	.real_parent = &tsk, \
...
@@ -9,6 +9,7 @@
 #ifndef _INCLUDE_GUARD_LATENCYTOP_H_
 #define _INCLUDE_GUARD_LATENCYTOP_H_
 
+#include <linux/compiler.h>
 #ifdef CONFIG_LATENCYTOP
 
 #define LT_SAVECOUNT 32
@@ -24,7 +25,14 @@ struct latency_record {
 
 struct task_struct;
 
-void account_scheduler_latency(struct task_struct *task, int usecs, int inter);
+extern int latencytop_enabled;
+void __account_scheduler_latency(struct task_struct *task, int usecs, int inter);
+static inline void
+account_scheduler_latency(struct task_struct *task, int usecs, int inter)
+{
+	if (unlikely(latencytop_enabled))
+		__account_scheduler_latency(task, usecs, inter);
+}
 
 void clear_all_latency_tracing(struct task_struct *p);
...
@@ -96,6 +96,10 @@ struct plist_node {
 # define PLIST_HEAD_LOCK_INIT(_lock)
 #endif
 
+#define _PLIST_HEAD_INIT(head)				\
+	.prio_list = LIST_HEAD_INIT((head).prio_list),	\
+	.node_list = LIST_HEAD_INIT((head).node_list)
+
 /**
  * PLIST_HEAD_INIT - static struct plist_head initializer
  * @head:	struct plist_head variable name
@@ -103,8 +107,7 @@ struct plist_node {
  */
 #define PLIST_HEAD_INIT(head, _lock)			\
 {							\
-	.prio_list = LIST_HEAD_INIT((head).prio_list),	\
-	.node_list = LIST_HEAD_INIT((head).node_list),	\
+	_PLIST_HEAD_INIT(head),				\
 	PLIST_HEAD_LOCK_INIT(&(_lock))			\
 }
 
@@ -116,7 +119,7 @@ struct plist_node {
 #define PLIST_NODE_INIT(node, __prio)			\
 {							\
 	.prio  = (__prio),				\
-	.plist = PLIST_HEAD_INIT((node).plist, NULL),	\
+	.plist = { _PLIST_HEAD_INIT((node).plist) },	\
 }
 
 /**
...
@@ -998,6 +998,7 @@ struct sched_class {
 			      struct rq *busiest, struct sched_domain *sd,
 			      enum cpu_idle_type idle);
 	void (*pre_schedule) (struct rq *this_rq, struct task_struct *task);
+	int  (*needs_post_schedule) (struct rq *this_rq);
 	void (*post_schedule) (struct rq *this_rq);
 	void (*task_wake_up) (struct rq *this_rq, struct task_struct *task);
 
@@ -1052,6 +1053,10 @@ struct sched_entity {
 	u64			last_wakeup;
 	u64			avg_overlap;
 
+	u64			start_runtime;
+	u64			avg_wakeup;
+
+	u64			nr_migrations;
+
 #ifdef CONFIG_SCHEDSTATS
 	u64			wait_start;
 	u64			wait_max;
@@ -1067,7 +1072,6 @@ struct sched_entity {
 	u64			exec_max;
 	u64			slice_max;
 
-	u64			nr_migrations;
 	u64			nr_migrations_cold;
 	u64			nr_failed_migrations_affine;
 	u64			nr_failed_migrations_running;
@@ -1164,6 +1168,7 @@ struct task_struct {
 #endif
 
 	struct list_head tasks;
+	struct plist_node pushable_tasks;
 
 	struct mm_struct *mm, *active_mm;
@@ -1675,6 +1680,16 @@ static inline int set_cpus_allowed(struct task_struct *p, cpumask_t new_mask)
 	return set_cpus_allowed_ptr(p, &new_mask);
 }
 
+/*
+ * Architectures can set this to 1 if they have specified
+ * CONFIG_HAVE_UNSTABLE_SCHED_CLOCK in their arch Kconfig,
+ * but then during bootup it turns out that sched_clock()
+ * is reliable after all:
+ */
+#ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
+extern int sched_clock_stable;
+#endif
+
 extern unsigned long long sched_clock(void);
 extern void sched_clock_init(void);
...
@@ -966,7 +966,6 @@ config SLABINFO
 
 config RT_MUTEXES
 	boolean
-	select PLIST
 
 config BASE_SMALL
 	int
...
@@ -9,6 +9,44 @@
  * as published by the Free Software Foundation; version 2
  * of the License.
  */
+
+/*
+ * CONFIG_LATENCYTOP enables a kernel latency tracking infrastructure that is
+ * used by the "latencytop" userspace tool. The latency that is tracked is not
+ * the 'traditional' interrupt latency (which is primarily caused by something
+ * else consuming CPU), but instead, it is the latency an application encounters
+ * because the kernel sleeps on its behalf for various reasons.
+ *
+ * This code tracks 2 levels of statistics:
+ * 1) System level latency
+ * 2) Per process latency
+ *
+ * The latency is stored in fixed sized data structures in an accumulated form;
+ * if the "same" latency cause is hit twice, this will be tracked as one entry
+ * in the data structure. Both the count, total accumulated latency and maximum
+ * latency are tracked in this data structure. When the fixed size structure is
+ * full, no new causes are tracked until the buffer is flushed by writing to
+ * the /proc file; the userspace tool does this on a regular basis.
+ *
+ * A latency cause is identified by a stringified backtrace at the point that
+ * the scheduler gets invoked. The userland tool will use this string to
+ * identify the cause of the latency in human readable form.
+ *
+ * The information is exported via /proc/latency_stats and /proc/<pid>/latency.
+ * These files look like this:
+ *
+ * Latency Top version : v0.1
+ * 70 59433 4897 i915_irq_wait drm_ioctl vfs_ioctl do_vfs_ioctl sys_ioctl
+ * |    |    |    |
+ * |    |    |    +----> the stringified backtrace
+ * |    |    +---------> The maximum latency for this entry in microseconds
+ * |    +--------------> The accumulated latency for this entry (microseconds)
+ * +-------------------> The number of times this entry is hit
+ *
+ * (note: the average latency is the accumulated latency divided by the number
+ * of times)
+ */
+
 #include <linux/latencytop.h>
 #include <linux/kallsyms.h>
 #include <linux/seq_file.h>
@@ -72,7 +110,7 @@ account_global_scheduler_latency(struct task_struct *tsk, struct latency_record
 			firstnonnull = i;
 			continue;
 		}
-		for (q = 0 ; q < LT_BACKTRACEDEPTH ; q++) {
+		for (q = 0; q < LT_BACKTRACEDEPTH; q++) {
 			unsigned long record = lat->backtrace[q];
 
 			if (latency_record[i].backtrace[q] != record) {
@@ -101,31 +139,52 @@ account_global_scheduler_latency(struct task_struct *tsk, struct latency_record
 	memcpy(&latency_record[i], lat, sizeof(struct latency_record));
 }
 
-static inline void store_stacktrace(struct task_struct *tsk, struct latency_record *lat)
+/*
+ * Iterator to store a backtrace into a latency record entry
+ */
+static inline void store_stacktrace(struct task_struct *tsk,
+				    struct latency_record *lat)
 {
 	struct stack_trace trace;
 
 	memset(&trace, 0, sizeof(trace));
 	trace.max_entries = LT_BACKTRACEDEPTH;
 	trace.entries = &lat->backtrace[0];
-	trace.skip = 0;
 	save_stack_trace_tsk(tsk, &trace);
 }
 
+/**
+ * __account_scheduler_latency - record an occured latency
+ * @tsk - the task struct of the task hitting the latency
+ * @usecs - the duration of the latency in microseconds
+ * @inter - 1 if the sleep was interruptible, 0 if uninterruptible
+ *
+ * This function is the main entry point for recording latency entries
+ * as called by the scheduler.
+ *
+ * This function has a few special cases to deal with normal 'non-latency'
+ * sleeps: specifically, interruptible sleep longer than 5 msec is skipped
+ * since this usually is caused by waiting for events via select() and co.
+ *
+ * Negative latencies (caused by time going backwards) are also explicitly
+ * skipped.
+ */
 void __sched
-account_scheduler_latency(struct task_struct *tsk, int usecs, int inter)
+__account_scheduler_latency(struct task_struct *tsk, int usecs, int inter)
 {
 	unsigned long flags;
 	int i, q;
 	struct latency_record lat;
 
-	if (!latencytop_enabled)
-		return;
-
 	/* Long interruptible waits are generally user requested... */
 	if (inter && usecs > 5000)
 		return;
 
+	/* Negative sleeps are time going backwards */
+	/* Zero-time sleeps are non-interesting */
+	if (usecs <= 0)
+		return;
+
 	memset(&lat, 0, sizeof(lat));
 	lat.count = 1;
 	lat.time = usecs;
@@ -143,12 +202,12 @@ account_scheduler_latency(struct task_struct *tsk, int usecs, int inter)
 	if (tsk->latency_record_count >= LT_SAVECOUNT)
 		goto out_unlock;
 
-	for (i = 0; i < LT_SAVECOUNT ; i++) {
+	for (i = 0; i < LT_SAVECOUNT; i++) {
 		struct latency_record *mylat;
 		int same = 1;
 
 		mylat = &tsk->latency_record[i];
-		for (q = 0 ; q < LT_BACKTRACEDEPTH ; q++) {
+		for (q = 0; q < LT_BACKTRACEDEPTH; q++) {
 			unsigned long record = lat.backtrace[q];
 
 			if (mylat->backtrace[q] != record) {
@@ -186,7 +245,7 @@ static int lstats_show(struct seq_file *m, void *v)
 	for (i = 0; i < MAXLR; i++) {
 		if (latency_record[i].backtrace[0]) {
 			int q;
-			seq_printf(m, "%i %li %li ",
+			seq_printf(m, "%i %lu %lu ",
 				   latency_record[i].count,
 				   latency_record[i].time,
 				   latency_record[i].max);
@@ -223,7 +282,7 @@ static int lstats_open(struct inode *inode, struct file *filp)
 	return single_open(filp, lstats_show, NULL);
 }
 
-static struct file_operations lstats_fops = {
+static const struct file_operations lstats_fops = {
 	.open = lstats_open,
 	.read = seq_read,
 	.write = lstats_write,
@@ -236,4 +295,4 @@ static int __init init_lstats_procfs(void)
 	proc_create("latency_stats", 0644, NULL, &lstats_fops);
 	return 0;
 }
-__initcall(init_lstats_procfs);
+device_initcall(init_lstats_procfs);
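As a usage note (not part of the patch): the header comment added above fully specifies the /proc/latency_stats record layout, so a userspace reader can be sketched directly from it. The following minimal parser is illustrative only; the file path and field meanings come from that comment, everything else is an assumption.

/*
 * Userspace sketch: reads /proc/latency_stats and prints each record using
 * the layout documented above -- hit count, accumulated latency (usec),
 * maximum latency (usec), then the stringified backtrace. The leading
 * "Latency Top version" header line is skipped.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
        char line[4096];
        FILE *f = fopen("/proc/latency_stats", "r");

        if (!f) {
                perror("fopen");
                return 1;
        }

        while (fgets(line, sizeof(line), f)) {
                unsigned long count, total, max;
                int pos = 0;

                if (strncmp(line, "Latency Top version", 19) == 0)
                        continue;       /* header line */

                if (sscanf(line, "%lu %lu %lu %n", &count, &total, &max, &pos) < 3)
                        continue;

                printf("hits=%lu avg=%lu us max=%lu us  trace: %s",
                       count, count ? total / count : 0, max, line + pos);
        }

        fclose(f);
        return 0;
}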
This diff is collapsed.
@@ -24,11 +24,11 @@
  * The clock: sched_clock_cpu() is monotonic per cpu, and should be somewhat
  * consistent between cpus (never more than 2 jiffies difference).
  */
-#include <linux/sched.h>
-#include <linux/percpu.h>
 #include <linux/spinlock.h>
-#include <linux/ktime.h>
 #include <linux/module.h>
+#include <linux/percpu.h>
+#include <linux/ktime.h>
+#include <linux/sched.h>
 
 /*
  * Scheduler clock - returns current time in nanosec units.
@@ -43,6 +43,7 @@ unsigned long long __attribute__((weak)) sched_clock(void)
 static __read_mostly int sched_clock_running;
 
 #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
+__read_mostly int sched_clock_stable;
 
 struct sched_clock_data {
 	/*
@@ -87,7 +88,7 @@ void sched_clock_init(void)
 }
 
 /*
- * min,max except they take wrapping into account
+ * min, max except they take wrapping into account
 */
 
 static inline u64 wrap_min(u64 x, u64 y)
@@ -111,15 +112,13 @@ static u64 __update_sched_clock(struct sched_clock_data *scd, u64 now)
 	s64 delta = now - scd->tick_raw;
 	u64 clock, min_clock, max_clock;
 
-	WARN_ON_ONCE(!irqs_disabled());
-
 	if (unlikely(delta < 0))
 		delta = 0;
 
 	/*
 	 * scd->clock = clamp(scd->tick_gtod + delta,
 	 *		      max(scd->tick_gtod, scd->clock),
 	 *		      scd->tick_gtod + TICK_NSEC);
 	 */
 
 	clock = scd->tick_gtod + delta;
@@ -148,12 +147,13 @@ static void lock_double_clock(struct sched_clock_data *data1,
 
 u64 sched_clock_cpu(int cpu)
 {
-	struct sched_clock_data *scd = cpu_sdc(cpu);
 	u64 now, clock, this_clock, remote_clock;
+	struct sched_clock_data *scd;
 
-	if (unlikely(!sched_clock_running))
-		return 0ull;
+	if (sched_clock_stable)
+		return sched_clock();
+
+	scd = cpu_sdc(cpu);
 
 	WARN_ON_ONCE(!irqs_disabled());
 	now = sched_clock();
@@ -195,14 +195,18 @@ u64 sched_clock_cpu(int cpu)
 
 void sched_clock_tick(void)
 {
-	struct sched_clock_data *scd = this_scd();
+	struct sched_clock_data *scd;
 	u64 now, now_gtod;
 
+	if (sched_clock_stable)
+		return;
+
 	if (unlikely(!sched_clock_running))
 		return;
 
 	WARN_ON_ONCE(!irqs_disabled());
 
+	scd = this_scd();
 	now_gtod = ktime_to_ns(ktime_get());
 	now = sched_clock();
@@ -250,7 +254,7 @@ u64 sched_clock_cpu(int cpu)
 	return sched_clock();
 }
 
-#endif
+#endif /* CONFIG_HAVE_UNSTABLE_SCHED_CLOCK */
 
 unsigned long long cpu_clock(int cpu)
 {
...
@@ -272,7 +272,6 @@ static void print_cpu(struct seq_file *m, int cpu)
 	P(nr_switches);
 	P(nr_load_updates);
 	P(nr_uninterruptible);
-	SEQ_printf(m, " .%-30s: %lu\n", "jiffies", jiffies);
 	PN(next_balance);
 	P(curr->pid);
 	PN(clock);
@@ -287,9 +286,6 @@ static void print_cpu(struct seq_file *m, int cpu)
 #ifdef CONFIG_SCHEDSTATS
 #define P(n) SEQ_printf(m, " .%-30s: %d\n", #n, rq->n);
 
-	P(yld_exp_empty);
-	P(yld_act_empty);
-	P(yld_both_empty);
 	P(yld_count);
 
 	P(sched_switch);
@@ -314,7 +310,7 @@ static int sched_debug_show(struct seq_file *m, void *v)
 	u64 now = ktime_to_ns(ktime_get());
 	int cpu;
 
-	SEQ_printf(m, "Sched Debug Version: v0.08, %s %.*s\n",
+	SEQ_printf(m, "Sched Debug Version: v0.09, %s %.*s\n",
 		init_utsname()->release,
 		(int)strcspn(init_utsname()->version, " "),
 		init_utsname()->version);
@@ -325,6 +321,7 @@ static int sched_debug_show(struct seq_file *m, void *v)
 	SEQ_printf(m, " .%-40s: %Ld\n", #x, (long long)(x))
 #define PN(x) \
 	SEQ_printf(m, " .%-40s: %Ld.%06ld\n", #x, SPLIT_NS(x))
+	P(jiffies);
 	PN(sysctl_sched_latency);
 	PN(sysctl_sched_min_granularity);
 	PN(sysctl_sched_wakeup_granularity);
@@ -397,6 +394,7 @@ void proc_sched_show_task(struct task_struct *p, struct seq_file *m)
 	PN(se.vruntime);
 	PN(se.sum_exec_runtime);
 	PN(se.avg_overlap);
+	PN(se.avg_wakeup);
 
 	nr_switches = p->nvcsw + p->nivcsw;
...
@@ -1314,16 +1314,63 @@ static int select_task_rq_fair(struct task_struct *p, int sync)
 }
 #endif /* CONFIG_SMP */
 
-static unsigned long wakeup_gran(struct sched_entity *se)
+/*
+ * Adaptive granularity
+ *
+ * se->avg_wakeup gives the average time a task runs until it does a wakeup,
+ * with the limit of wakeup_gran -- when it never does a wakeup.
+ *
+ * So the smaller avg_wakeup is the faster we want this task to preempt,
+ * but we don't want to treat the preemptee unfairly and therefore allow it
+ * to run for at least the amount of time we'd like to run.
+ *
+ * NOTE: we use 2*avg_wakeup to increase the probability of actually doing one
+ *
+ * NOTE: we use *nr_running to scale with load, this nicely matches the
+ *       degrading latency on load.
+ */
+static unsigned long
+adaptive_gran(struct sched_entity *curr, struct sched_entity *se)
+{
+	u64 this_run = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
+	u64 expected_wakeup = 2*se->avg_wakeup * cfs_rq_of(se)->nr_running;
+	u64 gran = 0;
+
+	if (this_run < expected_wakeup)
+		gran = expected_wakeup - this_run;
+
+	return min_t(s64, gran, sysctl_sched_wakeup_granularity);
+}
+
+static unsigned long
+wakeup_gran(struct sched_entity *curr, struct sched_entity *se)
 {
 	unsigned long gran = sysctl_sched_wakeup_granularity;
 
+	if (cfs_rq_of(curr)->curr && sched_feat(ADAPTIVE_GRAN))
+		gran = adaptive_gran(curr, se);
+
 	/*
-	 * More easily preempt - nice tasks, while not making it harder for
-	 * + nice tasks.
+	 * Since its curr running now, convert the gran from real-time
+	 * to virtual-time in his units.
 	 */
-	if (!sched_feat(ASYM_GRAN) || se->load.weight > NICE_0_LOAD)
-		gran = calc_delta_fair(sysctl_sched_wakeup_granularity, se);
+	if (sched_feat(ASYM_GRAN)) {
+		/*
+		 * By using 'se' instead of 'curr' we penalize light tasks, so
+		 * they get preempted easier. That is, if 'se' < 'curr' then
+		 * the resulting gran will be larger, therefore penalizing the
+		 * lighter, if otoh 'se' > 'curr' then the resulting gran will
+		 * be smaller, again penalizing the lighter task.
+		 *
+		 * This is especially important for buddies when the leftmost
+		 * task is higher priority than the buddy.
+		 */
+		if (unlikely(se->load.weight != NICE_0_LOAD))
+			gran = calc_delta_fair(gran, se);
+	} else {
+		if (unlikely(curr->load.weight != NICE_0_LOAD))
+			gran = calc_delta_fair(gran, curr);
+	}
 
 	return gran;
 }
@@ -1350,7 +1397,7 @@ wakeup_preempt_entity(struct sched_entity *curr, struct sched_entity *se)
 	if (vdiff <= 0)
 		return -1;
 
-	gran = wakeup_gran(curr);
+	gran = wakeup_gran(curr, se);
 	if (vdiff > gran)
 		return 1;
 
...
 SCHED_FEAT(NEW_FAIR_SLEEPERS, 1)
-SCHED_FEAT(NORMALIZED_SLEEPER, 1)
+SCHED_FEAT(NORMALIZED_SLEEPER, 0)
+SCHED_FEAT(ADAPTIVE_GRAN, 1)
 SCHED_FEAT(WAKEUP_PREEMPT, 1)
 SCHED_FEAT(START_DEBIT, 1)
 SCHED_FEAT(AFFINE_WAKEUPS, 1)
...
This diff is collapsed.
@@ -4,7 +4,7 @@
  * bump this up when changing the output format or the meaning of an existing
  * format, so that tools can adapt (or abort)
  */
-#define SCHEDSTAT_VERSION 14
+#define SCHEDSTAT_VERSION 15
 
 static int show_schedstat(struct seq_file *seq, void *v)
 {
@@ -26,9 +26,8 @@ static int show_schedstat(struct seq_file *seq, void *v)
 
 		/* runqueue-specific stats */
 		seq_printf(seq,
-		    "cpu%d %u %u %u %u %u %u %u %u %u %llu %llu %lu",
-		    cpu, rq->yld_both_empty,
-		    rq->yld_act_empty, rq->yld_exp_empty, rq->yld_count,
+		    "cpu%d %u %u %u %u %u %u %llu %llu %lu",
+		    cpu, rq->yld_count,
 		    rq->sched_switch, rq->sched_count, rq->sched_goidle,
 		    rq->ttwu_count, rq->ttwu_local,
 		    rq->rq_cpu_time,
...
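For reference, a hedged userspace sketch that reads the version-15 per-cpu line whose format string appears in the hunk above. Only the fields visible in this hunk (through rq_cpu_time) are parsed; the remaining fields are truncated in this view and are ignored here. The field interpretation follows the seq_printf() arguments shown; this is illustrative, not a schedstat specification.

/*
 * Userspace sketch: parse the leading fields of a "cpuN ..." line from
 * /proc/schedstat, matching the seq_printf() arguments visible above.
 * Version, timestamp and domain lines will not match and are skipped.
 */
#include <stdio.h>

int main(void)
{
        char line[1024];
        FILE *f = fopen("/proc/schedstat", "r");

        if (!f) {
                perror("fopen");
                return 1;
        }

        while (fgets(line, sizeof(line), f)) {
                unsigned int cpu, yld_count, sched_switch, sched_count;
                unsigned int sched_goidle, ttwu_count, ttwu_local;
                unsigned long long rq_cpu_time;

                if (sscanf(line, "cpu%u %u %u %u %u %u %u %llu",
                           &cpu, &yld_count, &sched_switch, &sched_count,
                           &sched_goidle, &ttwu_count, &ttwu_local,
                           &rq_cpu_time) != 8)
                        continue;

                printf("cpu%u: %u yields, %llu ns of runqueue cpu time\n",
                       cpu, yld_count, rq_cpu_time);
        }

        fclose(f);
        return 0;
}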
@@ -136,12 +136,6 @@ config TEXTSEARCH_BM
 config TEXTSEARCH_FSM
 	tristate
 
-#
-# plist support is select#ed if needed
-#
-config PLIST
-	boolean
-
 config HAS_IOMEM
 	boolean
 	depends on !NO_IOMEM
...
@@ -11,7 +11,8 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 rbtree.o radix-tree.o dump_stack.o \
 	 idr.o int_sqrt.o extable.o prio_tree.o \
 	 sha1.o irq_regs.o reciprocal_div.o argv_split.o \
-	 proportions.o prio_heap.o ratelimit.o show_mem.o is_single_threaded.o
+	 proportions.o prio_heap.o ratelimit.o show_mem.o \
+	 is_single_threaded.o plist.o
 
 lib-$(CONFIG_MMU) += ioremap.o
 lib-$(CONFIG_SMP) += cpumask.o
@@ -40,7 +41,6 @@ lib-$(CONFIG_GENERIC_FIND_NEXT_BIT) += find_next_bit.o
 lib-$(CONFIG_GENERIC_FIND_LAST_BIT) += find_last_bit.o
 obj-$(CONFIG_GENERIC_HWEIGHT) += hweight.o
 obj-$(CONFIG_LOCK_KERNEL) += kernel_lock.o
-obj-$(CONFIG_PLIST) += plist.o
 obj-$(CONFIG_DEBUG_PREEMPT) += smp_processor_id.o
 obj-$(CONFIG_DEBUG_LIST) += list_debug.o
 obj-$(CONFIG_DEBUG_OBJECTS) += debugobjects.o
...
@@ -39,7 +39,7 @@ static __cacheline_aligned_in_smp DEFINE_SPINLOCK(kernel_flag);
 int __lockfunc __reacquire_kernel_lock(void)
 {
 	while (!_raw_spin_trylock(&kernel_flag)) {
-		if (test_thread_flag(TIF_NEED_RESCHED))
+		if (need_resched())
 			return -EAGAIN;
 		cpu_relax();
 	}
...