Commit 911b2898 authored by Peter Zijlstra's avatar Peter Zijlstra Committed by Ingo Molnar

sched: Optimize task_sched_runtime()

Large multi-threaded apps like to hit this using do_sys_times() and
then queue up on the rq->lock.

Avoid when possible.

Larry reported ~20% performance increase his test case.
Reported-by: default avatarLarry Woodman <lwoodman@redhat.com>
Suggested-by: default avatarPaul Turner <pjt@google.com>
Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20131111172925.GG26898@twins.programming.kicks-ass.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
parent 5eca82a9
...@@ -2253,6 +2253,20 @@ unsigned long long task_sched_runtime(struct task_struct *p) ...@@ -2253,6 +2253,20 @@ unsigned long long task_sched_runtime(struct task_struct *p)
struct rq *rq; struct rq *rq;
u64 ns = 0; u64 ns = 0;
#if defined(CONFIG_64BIT) && defined(CONFIG_SMP)
/*
* 64-bit doesn't need locks to atomically read a 64bit value.
* So we have a optimization chance when the task's delta_exec is 0.
* Reading ->on_cpu is racy, but this is ok.
*
* If we race with it leaving cpu, we'll take a lock. So we're correct.
* If we race with it entering cpu, unaccounted time is 0. This is
* indistinguishable from the read occurring a few cycles earlier.
*/
if (!p->on_cpu)
return p->se.sum_exec_runtime;
#endif
rq = task_rq_lock(p, &flags); rq = task_rq_lock(p, &flags);
ns = p->se.sum_exec_runtime + do_task_delta_exec(p, rq); ns = p->se.sum_exec_runtime + do_task_delta_exec(p, rq);
task_rq_unlock(rq, p, &flags); task_rq_unlock(rq, p, &flags);
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment