Commit 81baae4f authored by Brendan Gregg's avatar Brendan Gregg Committed by yonghong-song

profile: exclude CPU idle stacks by default (#2166)

profile: exclude CPU idle stacks by default
parent 77f4f663
......@@ -50,6 +50,9 @@ Show stacks from user space only (no kernel space stacks).
\-K
Show stacks from kernel space only (no user space stacks).
.TP
\-I
Include CPU idle stacks (by default these are excluded).
.TP
\-\-stack-storage-size COUNT
The maximum number of unique stack traces that the kernel will count (default
16384). If the sampled count exceeds this, a warning will be printed.
......
......@@ -9,6 +9,8 @@
# counting there. Only the unique stacks and counts are passed to user space
# at the end of the profile, greatly reducing the kernel<->user transfer.
#
# By default CPU idle stacks are excluded by simply excluding PID 0.
#
# REQUIRES: Linux 4.9+ (BPF_PROG_TYPE_PERF_EVENT support). Under tools/old is
# a version of this tool that may work on Linux 4.6 - 4.8.
#
......@@ -22,6 +24,7 @@
#
# 15-Jul-2016 Brendan Gregg Created this.
# 20-Oct-2016 " " Switched to use the new 4.9 support.
# 26-Jan-2019 " " Changed to exclude CPU idle by default.
from __future__ import print_function
from bcc import BPF, PerfType, PerfSWConfig
......@@ -93,6 +96,8 @@ parser.add_argument("-d", "--delimited", action="store_true",
help="insert delimiter between kernel/user stacks")
parser.add_argument("-a", "--annotations", action="store_true",
help="add _[k] annotations to kernel frames")
parser.add_argument("-I", "--include-idle", action="store_true",
help="include CPU idle stacks")
parser.add_argument("-f", "--folded", action="store_true",
help="output folded format, one line per stack (for flame graphs)")
parser.add_argument("--stack-storage-size", default=16384,
......@@ -141,6 +146,9 @@ BPF_STACK_TRACE(stack_traces, STACK_STORAGE_SIZE);
int do_perf_event(struct bpf_perf_event_data *ctx) {
u32 pid = bpf_get_current_pid_tgid() >> 32;
if (IDLE_FILTER)
return 0;
if (!(THREAD_FILTER))
return 0;
......@@ -184,6 +192,12 @@ int do_perf_event(struct bpf_perf_event_data *ctx) {
}
"""
# set idle filter
idle_filter = "pid == 0"
if args.include_idle:
idle_filter = "0"
bpf_text = bpf_text.replace('IDLE_FILTER', idle_filter)
# set thread filter
thread_context = ""
perf_filter = "-a"
......
......@@ -41,6 +41,27 @@ Sampling at 49 Hertz of all threads by user + kernel stack... Hit Ctrl-C to end.
- func_ab (13549)
5
The output was long; I truncated some lines ("[...]").
This default output prints stack traces, followed by a line to describe the
process (a dash, the process name, and a PID in parenthesis), and then an
integer count of how many times this stack trace was sampled.
The func_ab process is running the func_a() function, called by main(),
called by __libc_start_main(), and called by "[unknown]" with what looks
like a bogus address (1st column). That's evidence of a broken stack trace.
It's common for user-level software that hasn't been compiled with frame
pointers (in this case, libc).
The dd process has called read(), and then enters the kernel via
entry_SYSCALL_64_fastpath(), calling sys_read(), and so on. Yes, I'm now
reading it bottom up. That way follows the code flow.
By default, CPU idle stacks are excluded. They can be included with -I:
# ./profile -I
[...]
native_safe_halt
......@@ -64,32 +85,16 @@ Sampling at 49 Hertz of all threads by user + kernel stack... Hit Ctrl-C to end.
- swapper/1 (0)
75
The output was long; I truncated some lines ("[...]").
This default output prints stack traces, followed by a line to describe the
process (a dash, the process name, and a PID in parenthesis), and then an
integer count of how many times this stack trace was sampled.
The output above shows the most frequent stack was from the "swapper/1"
process (PID 0), running the native_safe_halt() function, which was called
by default_idle(), which was called by arch_cpu_idle(), and so on. This is
the idle thread. Stacks can be read top-down, to follow ancestry: child,
parent, grandparent, etc.
The func_ab process is running the func_a() function, called by main(),
called by __libc_start_main(), and called by "[unknown]" with what looks
like a bogus address (1st column). That's evidence of a broken stack trace.
It's common for user-level software that hasn't been compiled with frame
pointers (in this case, libc).
The dd process has called read(), and then enters the kernel via
entry_SYSCALL_64_fastpath(), calling sys_read(), and so on. Yes, I'm now
reading it bottom up. That way follows the code flow.
The dd process is actually "dd if=/dev/zero of=/dev/null": it's a simple
workload to analyze that just moves bytes from /dev/zero to /dev/null.
Profiling just that process:
The dd process profiled ealrier is actually "dd if=/dev/zero of=/dev/null":
it's a simple workload to analyze that just moves bytes from /dev/zero to
/dev/null. Profiling just that process:
# ./profile -p 25036
Sampling at 49 Hertz of PID 25036 by user + kernel stack... Hit Ctrl-C to end.
......@@ -539,6 +544,8 @@ You can increase or decrease the sample frequency. Eg, sampling at 9 Hertz:
# ./profile -F 9
Sampling at 9 Hertz of all threads by user + kernel stack... Hit Ctrl-C to end.
^C
[...]
func_b
main
__libc_start_main
......@@ -548,27 +555,6 @@ Sampling at 9 Hertz of all threads by user + kernel stack... Hit Ctrl-C to end.
[...]
native_safe_halt
default_idle
arch_cpu_idle
default_idle_call
cpu_startup_entry
start_secondary
- swapper/3 (0)
8
native_safe_halt
default_idle
arch_cpu_idle
default_idle_call
cpu_startup_entry
rest_init
start_kernel
x86_64_start_reservations
x86_64_start_kernel
- swapper/0 (0)
8
You can also restrict profiling to just kernel stacks (-K) or user stacks (-U).
For example, just user stacks:
......@@ -707,24 +693,6 @@ Sampling at 49 Hertz of all threads by user stack... Hit Ctrl-C to end.
- dd (2931)
14
- swapper/7 (0)
46
- swapper/0 (0)
46
- swapper/2 (0)
46
- swapper/1 (0)
46
- swapper/3 (0)
46
- swapper/4 (0)
46
If there are too many unique stack traces for the kernel to save, a warning
will be printed. Eg:
......@@ -739,8 +707,8 @@ Run ./profile -h to see the default.
USAGE message:
# ./profile -h
usage: profile [-h] [-p PID] [-U | -K] [-F FREQUENCY | -c COUNT] [-d] [-a]
[-f] [--stack-storage-size STACK_STORAGE_SIZE]
usage: profile.py [-h] [-p PID] [-U | -K] [-F FREQUENCY | -c COUNT] [-d] [-a]
[-I] [-f] [--stack-storage-size STACK_STORAGE_SIZE] [-C CPU]
[duration]
Profile CPU stack traces at a timed interval
......@@ -763,11 +731,12 @@ optional arguments:
sample period, number of events
-d, --delimited insert delimiter between kernel/user stacks
-a, --annotations add _[k] annotations to kernel frames
-I, --include-idle include CPU idle stacks
-f, --folded output folded format, one line per stack (for flame
graphs)
--stack-storage-size STACK_STORAGE_SIZE
the number of unique stack traces that can be stored
and displayed (default 2048)
and displayed (default 16384)
-C CPU, --cpu CPU cpu number to run profile on
examples:
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment