Commits · e5f32a3856caabe745381279f7f32e3b581b59dc · nexedi / linux

15 Oct, 2007 40 commits

sched: speed up context-switches a bit · e5f32a38

Ingo Molnar authored Oct 15, 2007

speed up context-switches a bit by not clearing p->exec_start.

(as a side-effect, this also makes p->exec_start a universal timestamp
available to cache-hot estimations.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>


sched: do not wakeup-preempt with SCHED_BATCH tasks · 91c234b4

Ingo Molnar authored Oct 15, 2007

do not wakeup-preempt with SCHED_BATCH tasks, their preemption
is batched too, driven by the tick.
Signed-off-by: Ingo Molnar <mingo@elte.hu>


sched: generate uevents for user creation/destruction · fb7dde37

Srivatsa Vaddagiri authored Oct 15, 2007

Generate uevents when a user is being created/destroyed. These events
can be used to configure cpu share of a new user.
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


sched: do not normalize kernel threads via SysRq-N · 178be793

Ingo Molnar authored Oct 15, 2007

do not normalize kernel threads via SysRq-N: the migration threads,
softlockup threads, etc. might be essential for the system to
function properly. So only zap user tasks.

pointed out by Andi Kleen.
Signed-off-by: Ingo Molnar <mingo@elte.hu>


sched: remove stale comment from sched_group_set_shares() · 1666703a

Andi Kleen authored Oct 15, 2007

remove stale comment from sched_group_set_shares().

Function never returns -EINVAL.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


sched: clean up is_migration_thread() · d5036e89

Ingo Molnar authored Oct 15, 2007

clean up is_migration_thread() and turn it into an inline function.
Signed-off-by: Ingo Molnar <mingo@elte.hu>


sched: cleanup: refactor normalize_rt_tasks · 3a5e4dc1

Andi Kleen authored Oct 15, 2007

Replace a particularly ugly ifdef with an inline and a new macro.
Also split up the function to be easier to read.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


sched: cleanup: refactor common code of sleep_on / wait_for_completion · 8cbbe86d

Andi Kleen authored Oct 15, 2007

Refactor common code of sleep_on / wait_for_completion

These functions were largely cut'n'pasted. This moves
the common code into single helpers instead.  Advantage
is about 1k less code on x86-64 and 91 lines of code removed.
It adds one function call to the non timeout version of
the functions; i don't expect this to be measurable.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


sched: cleanup: remove unnecessary gotos · 3a5c359a

Andi Kleen authored Oct 15, 2007

Replace loops implemented with gotos with real loops.
Replace err = ...; goto x; x: return err; with return ...;

No functional changes.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
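
The two rewrites described above can be sketched in plain C. This is an illustrative example only: the functions `count_nonzero_goto()` and `count_nonzero_loop()` are hypothetical and are not taken from the scheduler source. They show the general shape of a goto-based loop and exit path next to the equivalent real loop with a direct return.

```c
#include <stddef.h>

/* "Before" shape: a loop and an exit path both spelled with goto. */
int count_nonzero_goto(const int *v, size_t n)
{
	size_t i = 0;
	int count = 0;
	int err;

again:
	if (i < n) {
		if (v[i])
			count++;
		i++;
		goto again;
	}
	err = count;
	goto out;
out:
	return err;
}

/* "After" shape: a real loop and a direct return, same behaviour. */
int count_nonzero_loop(const int *v, size_t n)
{
	int count = 0;

	for (size_t i = 0; i < n; i++) {
		if (v[i])
			count++;
	}
	return count;
}
```

Both functions return the same value for any input, which is the sense in which such a change has no functional effect.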


sched: update comment · d274a4ce

Ingo Molnar authored Oct 15, 2007

update comment: clarify time-slices and remove obsolete tuning detail.
Signed-off-by: Ingo Molnar <mingo@elte.hu>


sched: prevent wakeup over-scheduling · 95938a35

Mike Galbraith authored Oct 15, 2007

Prevent wakeup over-scheduling.  Once a task has been preempted by a
task of the same or lower priority, it cannot be preempted again by
such a task until it has been ticked or has slept.  Instead, the
task is marked for preemption at the next tick.  Tasks of higher
priority still preempt immediately.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


sched: disable forced preemption by default · ce6c1311

Peter Zijlstra authored Oct 15, 2007

Implement feature bit to disable forced preemption. This way
it can be checked whether a workload is overscheduling or not.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
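
As a rough illustration of gating behaviour on a feature bit, here is a minimal sketch. All names (`FEAT_EXAMPLE_A`, `FEAT_FORCED_PREEMPT`, `feature_mask`, `feat_enabled()`, `feat_set()`) are hypothetical and chosen for this example; the actual bit names and default mask are in the scheduler source and are not reproduced here.

```c
/* Hypothetical feature bits; the real names live in the scheduler source. */
enum {
	FEAT_EXAMPLE_A      = 1u << 0,
	FEAT_FORCED_PREEMPT = 1u << 1,
};

/* Default mask: the forced-preemption bit is left out, i.e. off by default. */
static unsigned int feature_mask = FEAT_EXAMPLE_A;

/* Return non-zero if the given feature bit is currently set. */
int feat_enabled(unsigned int bit)
{
	return (feature_mask & bit) != 0;
}

/* Turn a feature bit on or off at run time. */
void feat_set(unsigned int bit, int on)
{
	if (on)
		feature_mask |= bit;
	else
		feature_mask &= ~bit;
}
```

Toggling one bit and re-running a workload is what makes it possible to compare behaviour with and without the feature.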


sched: fix group scheduling for SCHED_BATCH · e62dd02e

Dmitry Adamushko authored Oct 15, 2007

The following patch (sched: disable sleeper_fairness on SCHED_BATCH)
seems to break GROUP_SCHED, although it may not oops, because 'p'
may always happen to be a valid address.
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


sched: some proc entries are missed in sched_domain sys_ctl debug code · ace8b3d6

Zou Nan hai authored Oct 15, 2007

The cache_nice_tries and flags entries do not appear in the procfs
sched_domain directory because their ctl_table entries are skipped.

This patch fixes the issue.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


sched: fix rt ptracer monopolizing CPU · 638e13ac

Gautham R Shenoy authored Oct 15, 2007

yield() in wait_task_inactive() can cause a high-priority thread to be
scheduled back in and thereby loop forever while it waits for some
lower-priority thread that is unfortunately still on the runqueue.

Use schedule_timeout_uninterruptible(1) instead.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Credit: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
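
The same idea can be shown with a userspace analogue. This is only a sketch under assumptions: `wait_until_idle()` and its 1 ms nap are invented for illustration, and `nanosleep()` stands in for the kernel's schedule_timeout_uninterruptible(1). The point is that a real-time waiter that merely yields stays runnable and can be picked again ahead of the lower-priority thread it is waiting for, whereas a waiter that sleeps leaves the CPU free for that thread.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/*
 * Poll until *busy becomes 0, sleeping between polls instead of yielding.
 * Returns 0 once *busy is clear, or -1 after max_polls unsuccessful polls.
 */
int wait_until_idle(const volatile int *busy, int max_polls)
{
	/* 1 ms nap; the value is arbitrary and chosen for this sketch. */
	const struct timespec nap = { .tv_sec = 0, .tv_nsec = 1000000 };

	while (*busy) {
		if (max_polls-- <= 0)
			return -1;
		/* Sleeping blocks the caller, so lower-priority work can run. */
		nanosleep(&nap, NULL);
	}
	return 0;
}
```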


sched: group scheduling, sysfs tunables · 5cb350ba

Dhaval Giani authored Oct 15, 2007

Add tunables in sysfs to modify a user's cpu share.

A directory is created in sysfs for each new user in the system.

	/sys/kernel/uids/<uid>/cpu_share

Reading this file returns the cpu shares granted for the user.
Writing into this file modifies the cpu share for the user. Only an
administrator is allowed to modify a user's cpu share.

Ex:
	# cd /sys/kernel/uids/
	# cat 512/cpu_share
	1024
	# echo 2048 > 512/cpu_share
	# cat 512/cpu_share
	2048
	#
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
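
The shell session above could also be driven from a program. The following is a minimal sketch, assuming the file holds a single decimal number as shown in the example; the helper names `read_cpu_share()` and `write_cpu_share()` are hypothetical, and the caller supplies the path (for instance /sys/kernel/uids/512/cpu_share on a kernel that provides it).

```c
#include <stdio.h>

/* Read a decimal share value from the given file. Returns 0 on success. */
int read_cpu_share(const char *path, long *share)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%ld", share) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* Write a decimal share value to the given file. Returns 0 on success. */
int write_cpu_share(const char *path, long share)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = (fprintf(f, "%ld\n", share) > 0);
	/* fclose() flushes the buffer, so a rejected write shows up here. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Since only an administrator may modify a share, a caller without that privilege should expect `write_cpu_share()` to fail.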


sched: disable sleeper_fairness on SCHED_BATCH · 8ca0e14f

Peter Zijlstra authored Oct 15, 2007

disable sleeper fairness for batch tasks - they are about
batch processing after all.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


sched: another wakeup_granularity fix · 810e95cc

Peter Zijlstra authored Oct 15, 2007

unit mis-match: wakeup_gran was used against a vruntime
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


sched: export cpu_clock() · a58f6f25

Paul E. McKenney authored Oct 15, 2007

export cpu_clock() - the preferred API instead of sched_clock().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


sched: fix: move the CPU check into ->task_new_fair() · 00bf7bfc

Ingo Molnar authored Oct 15, 2007

noticed by Peter Zijlstra:

fix: move the CPU check into ->task_new_fair(), this way we
can call place_entity() and get child ->vruntime right at
initial wakeup time.

(without this there can be large latencies)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>


sched: cleanup: function prototype cleanups · 0702e3eb

Ingo Molnar authored Oct 15, 2007

noticed by Thomas Gleixner:

cleanup: function prototype cleanups - move into single line
wherever possible.
Signed-off-by: Ingo Molnar <mingo@elte.hu>


sched: cleanup: rename task_grp to task_group · 4cf86d77

Ingo Molnar authored Oct 15, 2007

cleanup: rename task_grp to task_group. No need to save two characters
and 'grp' is annoying to read.
Signed-off-by: Ingo Molnar <mingo@elte.hu>


sched: cleanup: rename SCHED_FEAT_USE_TREE_AVG to SCHED_FEAT_TREE_AVG · 06877c33

Ingo Molnar authored Oct 15, 2007

cleanup: rename SCHED_FEAT_USE_TREE_AVG to SCHED_FEAT_TREE_AVG, to
make SCHED_FEAT_ names more consistent.
Signed-off-by: Ingo Molnar <mingo@elte.hu>


sched: kfree(NULL) is valid · a65914b3

Ingo Molnar authored Oct 15, 2007

kfree(NULL) is valid.

pointed out by checkpatch.pl.

the fix shrinks the code a bit:

   text    data     bss     dec     hex filename
  40024    3842     100   43966    abbe sched.o.before
  40002    3842     100   43944    aba8 sched.o.after
Signed-off-by: Ingo Molnar <mingo@elte.hu>
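
The pattern behind this kind of cleanup can be sketched in userspace C, where free(NULL) is likewise defined to do nothing. The struct and function names below (`group_bufs`, `bufs_release_checked()`, `bufs_release()`) are hypothetical and only illustrate the shape of the change, not the actual scheduler code.

```c
#include <stdlib.h>

/* Hypothetical pair of optionally allocated buffers. */
struct group_bufs {
	int *a;
	int *b;
};

/* "Before" shape: redundant NULL checks around each free. */
void bufs_release_checked(struct group_bufs *g)
{
	if (g->a)
		free(g->a);
	if (g->b)
		free(g->b);
	g->a = g->b = NULL;
}

/* "After" shape: free(NULL) is a no-op, so the checks can go. */
void bufs_release(struct group_bufs *g)
{
	free(g->a);
	free(g->b);
	g->a = g->b = NULL;
}
```

Dropping the checks removes a compare and branch per call site, which is consistent with the small text-size reduction reported above.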


sched: style cleanup · 8927f494

Ingo Molnar authored Oct 15, 2007

fix up __setup() style bug - noticed via checkpatch.pl.
Signed-off-by: Ingo Molnar <mingo@elte.hu>


sched: break out if printing a warning in sched_domain_debug() · 26797a34

Ingo Molnar authored Oct 15, 2007

checkpatch.pl and Andy Whitcroft noticed the following bug: we did
not break out after printing an error.
Signed-off-by: Ingo Molnar <mingo@elte.hu>


sched: run sched_domain_debug() if CONFIG_SCHED_DEBUG=y · 3e9830dc

Ingo Molnar authored Oct 15, 2007

run sched_domain_debug() if CONFIG_SCHED_DEBUG=y, instead
of relying on the hand-crafted SCHED_DOMAIN_DEBUG switch.
Signed-off-by: Ingo Molnar <mingo@elte.hu>


sched: cleanup, remove the TASK_NONINTERACTIVE flag · af927232

Mike Galbraith authored Oct 15, 2007

Here's another piece of low hanging obsolete fruit.

Remove obsolete TASK_NONINTERACTIVE.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


sched: cleanup, make dequeue_entity() and update_stats_wait_end() similar · a2a2d680

Dmitry Adamushko authored Oct 15, 2007

make dequeue_entity() / enqueue_entity() and update_stats_dequeue() /
update_stats_enqueue() look similar, structure-wise.

zero effect, functionality-wise:

   text    data     bss     dec     hex filename
  34550    3026     100   37676    932c sched.o.before
  34550    3026     100   37676    932c sched.o.after
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


sched: cleanup, remove calc_weighted() · a03c9061

Dmitry Adamushko authored Oct 15, 2007

remove obsolete code -- calc_weighted()
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


sched: tidy up SCHED_RR · a4ec24b4

Dmitry Adamushko authored Oct 15, 2007

- make timeslices of SCHED_RR tasks constant and not
dependent on task's static_prio [1] ;
- remove obsolete code (timeslice related bits);
- make sched_rr_get_interval() return something more
meaningful [2] for SCHED_OTHER tasks.

[1] according to the following link, it's not compliant with SUSv3
(not sure though, what is the reference for us :-)
http://lkml.org/lkml/2007/3/7/656

[2] the interval is dynamic and can be depicted as follows "should a
task be one of the runnable tasks at this particular moment, it would
expect to run for this interval of time before being re-scheduled by the
scheduler tick".
(i.e. it's more precise if a task is runnable at the moment)

yeah, this seems to require task_rq_lock/unlock() but this is not a hot
path.

results:

(SCHED_FIFO)

dimm@earth:~/storage/prog$ sudo chrt -f 10 ./rr_interval 
time_slice: 0 : 0

(SCHED_RR)

dimm@earth:~/storage/prog$ sudo chrt 10 ./rr_interval 
time_slice: 0 : 99984800

(SCHED_NORMAL)

dimm@earth:~/storage/prog$ ./rr_interval 
time_slice: 0 : 19996960

(SCHED_NORMAL + a cpu_hog of similar 'weight' on the same CPU --- so should be a half of the previous result)

dimm@earth:~/storage/prog$ taskset 1 ./rr_interval 
time_slice: 0 : 9998480
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
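
The source of the rr_interval test program is not included in the message. A program along the following lines would produce output in the same format; this is a guess at its shape, with the "seconds : nanoseconds" interpretation of the two printed fields being an assumption. It relies only on the POSIX sched_rr_get_interval() call, and `print_rr_interval()` is a name made up for this sketch.

```c
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>

/*
 * Query the interval reported for a task (pid 0 means the caller) and
 * print it as "time_slice: <seconds> : <nanoseconds>".
 * Returns 0 on success, -1 on failure.
 */
int print_rr_interval(pid_t pid)
{
	struct timespec ts;

	if (sched_rr_get_interval(pid, &ts) != 0) {
		perror("sched_rr_get_interval");
		return -1;
	}
	printf("time_slice: %ld : %ld\n", (long)ts.tv_sec, (long)ts.tv_nsec);
	return 0;
}
```

Calling `print_rr_interval(0)` from `main()` and launching the binary under chrt or taskset, as in the runs above, reports the value for whichever policy the process was started with.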


sched: uninline scheduler · a9957449

Alexey Dobriyan authored Oct 15, 2007

* save ~300 bytes
* activate_idle_task() was moved to avoid a warning

bloat-o-meter output:

add/remove: 6/0 grow/shrink: 0/16 up/down: 438/-733 (-295)		<===
function                                     old     new   delta
__enqueue_entity                               -     165    +165
finish_task_switch                             -     110    +110
update_curr_rt                                 -      79     +79
__load_balance_iterator                        -      32     +32
__task_rq_unlock                               -      28     +28
find_process_by_pid                            -      24     +24
do_sched_setscheduler                        133     123     -10
sys_sched_rr_get_interval                    176     165     -11
sys_sched_getparam                           156     145     -11
normalize_rt_tasks                           482     470     -12
sched_getaffinity                            112      99     -13
sys_sched_getscheduler                        86      72     -14
sched_setaffinity                            226     212     -14
sched_setscheduler                           666     642     -24
load_balance_start_fair                       33       9     -24
load_balance_next_fair                        33       9     -24
dequeue_task_rt                              133      67     -66
put_prev_task_rt                              97      28     -69
schedule_tail                                133      50     -83
schedule                                     682     594     -88
enqueue_entity                               499     366    -133
task_new_fair                                317     180    -137
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


sched: tweak wakeup granularity · 155bb293

Ingo Molnar authored Oct 15, 2007

tweak wakeup granularity.
Signed-off-by: Ingo Molnar <mingo@elte.hu>


sched: optimize schedule() a bit on SMP · 1e819950

Ingo Molnar authored Oct 15, 2007

optimize schedule() a bit on SMP, by moving the rq-clock update
outside the rq lock.

code size is the same:

      text    data     bss     dec     hex filename
     25725    2666      96   28487    6f47 sched.o.before
     25725    2666      96   28487    6f47 sched.o.after
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>


sched: fix __pick_next_entity() · 08ec3df5

Dmitry Adamushko authored Oct 15, 2007

The thing is that __pick_next_entity() must never be called when
first_fair(cfs_rq) == NULL. It wouldn't be a problem, should 'run_node'
be the very first field of 'struct sched_entity' (and it's the second).

The 'nr_running != 0' check is _not_ enough, due to the fact that
'current' is not within the tree. Generic paths are ok (e.g. schedule()
as put_prev_task() is called previously)... I'm more worried about e.g.
migration_call() -> CPU_DEAD_FROZEN -> migrate_dead_tasks()... if
'current' == rq->idle, no problems.. if it's one of the SCHED_NORMAL
tasks (or imagine, some other use-cases in the future -- i.e. we should
not make outer world dependent on internal details of sched_fair class)
-- it may be "Houston, we've got a problem" case.

it's +16 bytes to the ".text". Another variant is to make 'run_node' the
first data member of 'struct sched_entity' but an additional check
(se != NULL) is still needed in pick_next_entity().
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
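
Why the field position matters can be shown with a small userspace sketch. The types here (`rb_node_stub`, `entity`) are simplified stand-ins invented for this example, not the kernel structures. Converting a node pointer to its containing structure subtracts the member offset, so a NULL node only maps to a NULL entity when that offset is zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a tree node embedded in a larger structure. */
struct rb_node_stub {
	struct rb_node_stub *left, *right;
};

/* Stand-in for an entity whose tree node is the second field. */
struct entity {
	unsigned long load;           /* first field */
	struct rb_node_stub run_node; /* second field, non-zero offset */
};

/*
 * container_of-style conversion done on integers, so that the NULL case
 * can be inspected without doing pointer arithmetic on a null pointer.
 */
uintptr_t entity_addr_from_node(const struct rb_node_stub *node)
{
	return (uintptr_t)node - offsetof(struct entity, run_node);
}

/* Guarded variant: an empty tree (NULL leftmost node) yields NULL. */
struct entity *pick_entity(struct rb_node_stub *leftmost)
{
	if (!leftmost)
		return NULL;
	return (struct entity *)((char *)leftmost -
				 offsetof(struct entity, run_node));
}
```

With the node at a non-zero offset, the unguarded conversion of NULL gives a non-zero bogus address, so a later NULL test on the result would not catch the empty-tree case.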


sched: vslice fixups for non-0 nice levels · 647e7cac

Ingo Molnar authored Oct 15, 2007

Make vslice accurate wrt nice levels, and add some comments
while we're at it.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>


sched: whitespace cleanups · 3a252015

Ingo Molnar authored Oct 15, 2007

more whitespace cleanups. No code changed:

      text    data     bss     dec     hex filename
     26553    2790     288   29631    73bf sched.o.before
     26553    2790     288   29631    73bf sched.o.after
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>


sched: mark scheduling classes as const · 5522d5d5

Ingo Molnar authored Oct 15, 2007

mark scheduling classes as const. This speeds up the code
a bit and shrinks it:

   text    data     bss     dec     hex filename
  40027    4018     292   44337    ad31 sched.o.before
  40190    3842     292   44324    ad24 sched.o.after
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
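
A const table of function pointers looks roughly like the following. This is a generic sketch: `sched_class_stub`, `fair_class`, `pick_fair()` and `class_has_work()` are hypothetical names, not the kernel's definitions. A const object can be placed in read-only data, which the default output of the size tool counts under text; that is likely why the text column grows while data shrinks in the table above.

```c
/* Simplified stand-in for a table of per-class operations. */
struct sched_class_stub {
	const char *name;
	int (*pick)(int nr_running);
};

/* Example operation: report whether there is anything to run. */
static int pick_fair(int nr_running)
{
	return nr_running > 0;
}

/* Declared const: never written after initialisation. */
const struct sched_class_stub fair_class = {
	.name = "fair",
	.pick = pick_fair,
};

/* Callers take a pointer to const, so they cannot modify the table. */
int class_has_work(const struct sched_class_stub *cls, int nr_running)
{
	return cls->pick(nr_running);
}
```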


sched: group scheduler, fix latency · b9fa3df3

Srivatsa Vaddagiri authored Oct 15, 2007

There is a possibility that because a task of a group moves from one
cpu to another, it may gain more cpu time than desired. See
http://marc.info/?l=linux-kernel&m=119073197730334 for details.

This is an attempt to fix that problem. Basically it simulates dequeue
of higher level entities as if they are going to sleep. Similarly it
simulates wakeup of higher level entities as if they are waking up from
sleep.
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>


sched: group scheduler, fix bloat · fad095a7

Srivatsa Vaddagiri authored Oct 15, 2007

Recent fix to check_preempt_wakeup() to check for preemption at higher
levels caused a size bloat for !CONFIG_FAIR_GROUP_SCHED.

Fix the problem.

   text    data     bss     dec     hex filename
  42277   10598     320   53195    cfcb kernel/sched.o-before_this_patch
  42216   10598     320   53134    cf8e kernel/sched.o-after_this_patch
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
