Commit 0abd93e5 authored by 4ast, committed by GitHub

Merge pull request #870 from brendangregg/master

add cpuunclaimed
parents dd5799bc 06223652
@@ -86,6 +86,7 @@ Examples:
- tools/[cachestat](tools/cachestat.py): Trace page cache hit/miss ratio. [Examples](tools/cachestat_example.txt).
- tools/[cachetop](tools/cachetop.py): Trace page cache hit/miss ratio by processes. [Examples](tools/cachetop_example.txt).
- tools/[cpudist](tools/cpudist.py): Summarize on- and off-CPU time per task as a histogram. [Examples](tools/cpudist_example.txt)
- tools/[cpuunclaimed](tools/cpuunclaimed.py): Sample CPU run queues and calculate unclaimed idle CPU. [Examples](tools/cpuunclaimed_example.txt)
- tools/[dcsnoop](tools/dcsnoop.py): Trace directory entry cache (dcache) lookups. [Examples](tools/dcsnoop_example.txt).
- tools/[dcstat](tools/dcstat.py): Directory entry cache (dcache) stats. [Examples](tools/dcstat_example.txt).
- tools/[execsnoop](tools/execsnoop.py): Trace new processes via exec() syscalls. [Examples](tools/execsnoop_example.txt).
......
.TH cpuunclaimed 8 "2016-12-21" "USER COMMANDS"
.SH NAME
cpuunclaimed \- Sample CPU run queues and calculate unclaimed idle CPU. Uses Linux eBPF/bcc.
.SH SYNOPSIS
.B cpuunclaimed
[\-T] [\-j] [\-J] [interval [count]]
.SH DESCRIPTION
This tool samples the length of the CPU run queues and determines when there
are idle CPUs while other CPUs have threads queued and waiting their turn. It
reports the amount of idle CPU (left unclaimed by waiting threads) as a
system-wide percentage.
This situation can happen for a number of reasons:
.IP -
An application has been bound to some, but not all, CPUs, and has runnable
threads that cannot migrate to other CPUs due to this configuration.
.IP -
CPU affinity: an optimization that leaves threads on CPUs where the CPU
caches are warm, even if this means short periods of waiting while other
CPUs are idle. The wait period is tunale (see sysctl, kernel.sched*).
.IP -
Scheduler bugs.
.P
An unclaimed idle of < 1% is likely to be CPU affinity, and not usually a
cause for concern. By leaving the CPU idle, overall throughput of the system
may be improved. This tool is best for identifying larger issues, > 2%, due
to the coarseness of its 99 Hertz samples.
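.P
To make the accounting concrete, the following is a minimal sketch in Python,
not this tool's actual implementation, of how one sample group could be
scored. The function name and inputs are illustrative: queued gives, per CPU,
the number of threads waiting (not running), and idle flags CPUs that were
idle during that group:
.P
.nf
# Illustrative sketch only, not this tool's implementation.
# queued[i]: threads waiting (not running) on CPU i in one sample group.
# idle[i]:   True if CPU i was idle in that same sample group.
def unclaimed_for_group(queued, idle):
    waiting = sum(queued)                  # threads that could run elsewhere
    idle_cpus = sum(1 for i in idle if i)  # CPUs with nothing to do
    # each idle CPU could have claimed at most one waiting thread
    return min(waiting, idle_cpus)

# Example: 4 CPUs; CPU 1 has two threads waiting while CPU 2 sits idle,
# so one CPU's worth of idle time goes unclaimed in this group:
# unclaimed_for_group([0, 2, 0, 0], [False, False, True, False]) -> 1
.fi
.P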
This is an experimental tool that currently works by sampling to keep
overheads low. Tool assumptions:
.IP -
CPU samples consistently fire around the same offset. There will sometimes
be a lag as a sample is delayed by higher-priority interrupts, but it is
assumed the subsequent samples will catch up to the expected offsets (as
is seen in practice). You can use \-J to inspect sample offsets (a rough
sketch of such a check follows this list). Some
systems can power down CPUs when idle, and when they wake up again they
may begin firing at a skewed offset: this tool will detect the skew, print
an error, and exit.
.IP -
All CPUs are online (see ncpu).
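.P
As a rough illustration of the first assumption, and not this tool's actual
check, per-CPU sample offsets within one sample group can be inspected for
skew as follows; the function name, period, and tolerance are assumptions:
.P
.nf
# Illustrative sketch only: flag skewed per-CPU sample offsets in a group.
PERIOD_NS = int(1e9 / 99)        # 99 Hertz sample period
TOLERANCE_NS = PERIOD_NS / 2     # assumed skew tolerance

def skewed_cpus(group_ts):
    # group_ts: {cpu: timestamp_ns} for one sample group
    start = min(group_ts.values())
    return [cpu for cpu, ts in group_ts.items()
            if ts - start > TOLERANCE_NS]

# Example: CPU 3 woke from a deep idle state and fired a full period late:
# skewed_cpus({0: 100, 1: 180, 2: 240, 3: 100 + PERIOD_NS}) -> [3]
.fi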
.P
If this tool identifies unclaimed idle CPU, you can double-check it by dumping
the raw samples (\-j), as well as by using other tracing tools to instrument
scheduler events (although this latter approach has much higher overhead).
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bcc.
.SH EXAMPLES
.TP
Sample and calculate unclaimed idle CPUs, output every 1 second (default):
#
.B cpuunclaimed
.TP
Print 5 second summaries, 10 times:
#
.B cpuunclaimed 5 10
.TP
Print 1 second summaries with timestamps:
#
.B cpuunclaimed \-T 1
.TP
Raw dump of all samples (verbose), as comma-separated values:
#
.B cpuunclaimed \-j
.SH FIELDS
.TP
%CPU
CPU utilization as a system-wide percentage.
.TP
unclaimed idle
Percentage of CPU resources that were idle while threads were queued waiting
their turn on other CPUs, measured system-wide.
.TP
TIME
Time (HH:MM:SS).
.TP
TIMESTAMP_ns
Timestamp, nanoseconds.
.TP
CPU#
CPU ID.
.TP
OFFSET_ns_CPU#
Time offset at which this CPU's sample fired within its sample group.
.SH OVERHEAD
The overhead is expected to be low/negligible as this tool uses sampling at
99 Hertz (on all CPUs), which has a fixed and low cost, rather than tracing
every scheduler event, as some other approaches do (which can involve
instrumenting millions of events per second). Sampled CPUs, run queue lengths,
and timestamps are written to ring buffers that are periodically read by
user space for reporting. Measure overhead in a test environment.
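.P
For reference, the general bcc pattern for this kind of sampling looks roughly
as follows. This is a minimal sketch, not this tool's source: it attaches a 99
Hertz perf software clock event to every CPU and counts samples per CPU; the
BPF function and map names are illustrative:
.P
.nf
#!/usr/bin/python
# Minimal sketch (not this tool's source): 99 Hertz sampling on all CPUs.
from bcc import BPF, PerfType, PerfSWConfig
from time import sleep

prog = """
#include <uapi/linux/ptrace.h>

BPF_HASH(counts, u32, u64);

int do_perf_event()
{
    u32 cpu = bpf_get_smp_processor_id();
    counts.increment(cpu);    // count samples per CPU
    return 0;
}
"""

b = BPF(text=prog)
# sample_period=0 with sample_freq=99 requests 99 samples per second per CPU
b.attach_perf_event(ev_type=PerfType.SOFTWARE,
    ev_config=PerfSWConfig.CPU_CLOCK, fn_name="do_perf_event",
    sample_period=0, sample_freq=99)

sleep(5)
for k, v in b["counts"].items():
    print("CPU %d: %d samples" % (k.value, v.value))
.fi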
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.PP
Also look in the bcc distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
runqlen(8)
@@ -84,8 +84,8 @@ int do_perf_event()
bpf_probe_read(&my_q, sizeof(my_q), &task->se.cfs_rq);
bpf_probe_read(&len, sizeof(len), &my_q->nr_running);
// Decrement idle thread by dropping the run queue by one. We could do
// this other ways if needed, like matching on task->pid.
// Calculate run queue length by subtracting the currently running task,
// if present. len 0 == idle, len 1 == one running task.
if (len > 0)
len--;
......
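For illustration only (not part of this change), the adjustment above maps a
sampled nr_running value to the number of threads left waiting by excluding
the currently running task when one is present; waiting_threads is a
hypothetical name used only for this sketch:

    # Illustrative restatement of the hunk above, in Python.
    def waiting_threads(nr_running):
        return nr_running - 1 if nr_running > 0 else 0

    # nr_running 0 -> 0 waiting (idle CPU)
    # nr_running 1 -> 0 waiting (one task running, nothing queued)
    # nr_running 3 -> 2 waiting (one running, two waiting their turn)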