Commit 81baae4f authored by Brendan Gregg's avatar Brendan Gregg Committed by yonghong-song

profile: exclude CPU idle stacks by default (#2166)

profile: exclude CPU idle stacks by default
parent 77f4f663
...@@ -50,6 +50,9 @@ Show stacks from user space only (no kernel space stacks). ...@@ -50,6 +50,9 @@ Show stacks from user space only (no kernel space stacks).
\-K \-K
Show stacks from kernel space only (no user space stacks). Show stacks from kernel space only (no user space stacks).
.TP .TP
\-I
Include CPU idle stacks (by default these are excluded).
.TP
\-\-stack-storage-size COUNT \-\-stack-storage-size COUNT
The maximum number of unique stack traces that the kernel will count (default The maximum number of unique stack traces that the kernel will count (default
16384). If the sampled count exceeds this, a warning will be printed. 16384). If the sampled count exceeds this, a warning will be printed.
......
...@@ -9,6 +9,8 @@ ...@@ -9,6 +9,8 @@
# counting there. Only the unique stacks and counts are passed to user space # counting there. Only the unique stacks and counts are passed to user space
# at the end of the profile, greatly reducing the kernel<->user transfer. # at the end of the profile, greatly reducing the kernel<->user transfer.
# #
# By default CPU idle stacks are excluded by simply excluding PID 0.
#
# REQUIRES: Linux 4.9+ (BPF_PROG_TYPE_PERF_EVENT support). Under tools/old is # REQUIRES: Linux 4.9+ (BPF_PROG_TYPE_PERF_EVENT support). Under tools/old is
# a version of this tool that may work on Linux 4.6 - 4.8. # a version of this tool that may work on Linux 4.6 - 4.8.
# #
...@@ -22,6 +24,7 @@ ...@@ -22,6 +24,7 @@
# #
# 15-Jul-2016 Brendan Gregg Created this. # 15-Jul-2016 Brendan Gregg Created this.
# 20-Oct-2016 " " Switched to use the new 4.9 support. # 20-Oct-2016 " " Switched to use the new 4.9 support.
# 26-Jan-2019 " " Changed to exclude CPU idle by default.
from __future__ import print_function from __future__ import print_function
from bcc import BPF, PerfType, PerfSWConfig from bcc import BPF, PerfType, PerfSWConfig
...@@ -93,6 +96,8 @@ parser.add_argument("-d", "--delimited", action="store_true", ...@@ -93,6 +96,8 @@ parser.add_argument("-d", "--delimited", action="store_true",
help="insert delimiter between kernel/user stacks") help="insert delimiter between kernel/user stacks")
parser.add_argument("-a", "--annotations", action="store_true", parser.add_argument("-a", "--annotations", action="store_true",
help="add _[k] annotations to kernel frames") help="add _[k] annotations to kernel frames")
parser.add_argument("-I", "--include-idle", action="store_true",
help="include CPU idle stacks")
parser.add_argument("-f", "--folded", action="store_true", parser.add_argument("-f", "--folded", action="store_true",
help="output folded format, one line per stack (for flame graphs)") help="output folded format, one line per stack (for flame graphs)")
parser.add_argument("--stack-storage-size", default=16384, parser.add_argument("--stack-storage-size", default=16384,
...@@ -141,6 +146,9 @@ BPF_STACK_TRACE(stack_traces, STACK_STORAGE_SIZE); ...@@ -141,6 +146,9 @@ BPF_STACK_TRACE(stack_traces, STACK_STORAGE_SIZE);
int do_perf_event(struct bpf_perf_event_data *ctx) { int do_perf_event(struct bpf_perf_event_data *ctx) {
u32 pid = bpf_get_current_pid_tgid() >> 32; u32 pid = bpf_get_current_pid_tgid() >> 32;
if (IDLE_FILTER)
return 0;
if (!(THREAD_FILTER)) if (!(THREAD_FILTER))
return 0; return 0;
...@@ -184,6 +192,12 @@ int do_perf_event(struct bpf_perf_event_data *ctx) { ...@@ -184,6 +192,12 @@ int do_perf_event(struct bpf_perf_event_data *ctx) {
} }
""" """
# set idle filter
idle_filter = "pid == 0"
if args.include_idle:
idle_filter = "0"
bpf_text = bpf_text.replace('IDLE_FILTER', idle_filter)
# set thread filter # set thread filter
thread_context = "" thread_context = ""
perf_filter = "-a" perf_filter = "-a"
......
...@@ -41,6 +41,27 @@ Sampling at 49 Hertz of all threads by user + kernel stack... Hit Ctrl-C to end. ...@@ -41,6 +41,27 @@ Sampling at 49 Hertz of all threads by user + kernel stack... Hit Ctrl-C to end.
- func_ab (13549) - func_ab (13549)
5 5
The output was long; I truncated some lines ("[...]").
This default output prints stack traces, followed by a line to describe the
process (a dash, the process name, and a PID in parenthesis), and then an
integer count of how many times this stack trace was sampled.
The func_ab process is running the func_a() function, called by main(),
called by __libc_start_main(), and called by "[unknown]" with what looks
like a bogus address (1st column). That's evidence of a broken stack trace.
It's common for user-level software that hasn't been compiled with frame
pointers (in this case, libc).
The dd process has called read(), and then enters the kernel via
entry_SYSCALL_64_fastpath(), calling sys_read(), and so on. Yes, I'm now
reading it bottom up. That way follows the code flow.
By default, CPU idle stacks are excluded. They can be included with -I:
# ./profile -I
[...] [...]
native_safe_halt native_safe_halt
...@@ -64,32 +85,16 @@ Sampling at 49 Hertz of all threads by user + kernel stack... Hit Ctrl-C to end. ...@@ -64,32 +85,16 @@ Sampling at 49 Hertz of all threads by user + kernel stack... Hit Ctrl-C to end.
- swapper/1 (0) - swapper/1 (0)
75 75
The output was long; I truncated some lines ("[...]").
This default output prints stack traces, followed by a line to describe the
process (a dash, the process name, and a PID in parenthesis), and then an
integer count of how many times this stack trace was sampled.
The output above shows the most frequent stack was from the "swapper/1" The output above shows the most frequent stack was from the "swapper/1"
process (PID 0), running the native_safe_halt() function, which was called process (PID 0), running the native_safe_halt() function, which was called
by default_idle(), which was called by arch_cpu_idle(), and so on. This is by default_idle(), which was called by arch_cpu_idle(), and so on. This is
the idle thread. Stacks can be read top-down, to follow ancestry: child, the idle thread. Stacks can be read top-down, to follow ancestry: child,
parent, grandparent, etc. parent, grandparent, etc.
The func_ab process is running the func_a() function, called by main(),
called by __libc_start_main(), and called by "[unknown]" with what looks
like a bogus address (1st column). That's evidence of a broken stack trace.
It's common for user-level software that hasn't been compiled with frame
pointers (in this case, libc).
The dd process has called read(), and then enters the kernel via
entry_SYSCALL_64_fastpath(), calling sys_read(), and so on. Yes, I'm now
reading it bottom up. That way follows the code flow.
The dd process profiled ealrier is actually "dd if=/dev/zero of=/dev/null":
The dd process is actually "dd if=/dev/zero of=/dev/null": it's a simple it's a simple workload to analyze that just moves bytes from /dev/zero to
workload to analyze that just moves bytes from /dev/zero to /dev/null. /dev/null. Profiling just that process:
Profiling just that process:
# ./profile -p 25036 # ./profile -p 25036
Sampling at 49 Hertz of PID 25036 by user + kernel stack... Hit Ctrl-C to end. Sampling at 49 Hertz of PID 25036 by user + kernel stack... Hit Ctrl-C to end.
...@@ -539,6 +544,8 @@ You can increase or decrease the sample frequency. Eg, sampling at 9 Hertz: ...@@ -539,6 +544,8 @@ You can increase or decrease the sample frequency. Eg, sampling at 9 Hertz:
# ./profile -F 9 # ./profile -F 9
Sampling at 9 Hertz of all threads by user + kernel stack... Hit Ctrl-C to end. Sampling at 9 Hertz of all threads by user + kernel stack... Hit Ctrl-C to end.
^C ^C
[...]
func_b func_b
main main
__libc_start_main __libc_start_main
...@@ -548,27 +555,6 @@ Sampling at 9 Hertz of all threads by user + kernel stack... Hit Ctrl-C to end. ...@@ -548,27 +555,6 @@ Sampling at 9 Hertz of all threads by user + kernel stack... Hit Ctrl-C to end.
[...] [...]
native_safe_halt
default_idle
arch_cpu_idle
default_idle_call
cpu_startup_entry
start_secondary
- swapper/3 (0)
8
native_safe_halt
default_idle
arch_cpu_idle
default_idle_call
cpu_startup_entry
rest_init
start_kernel
x86_64_start_reservations
x86_64_start_kernel
- swapper/0 (0)
8
You can also restrict profiling to just kernel stacks (-K) or user stacks (-U). You can also restrict profiling to just kernel stacks (-K) or user stacks (-U).
For example, just user stacks: For example, just user stacks:
...@@ -707,24 +693,6 @@ Sampling at 49 Hertz of all threads by user stack... Hit Ctrl-C to end. ...@@ -707,24 +693,6 @@ Sampling at 49 Hertz of all threads by user stack... Hit Ctrl-C to end.
- dd (2931) - dd (2931)
14 14
- swapper/7 (0)
46
- swapper/0 (0)
46
- swapper/2 (0)
46
- swapper/1 (0)
46
- swapper/3 (0)
46
- swapper/4 (0)
46
If there are too many unique stack traces for the kernel to save, a warning If there are too many unique stack traces for the kernel to save, a warning
will be printed. Eg: will be printed. Eg:
...@@ -739,8 +707,8 @@ Run ./profile -h to see the default. ...@@ -739,8 +707,8 @@ Run ./profile -h to see the default.
USAGE message: USAGE message:
# ./profile -h # ./profile -h
usage: profile [-h] [-p PID] [-U | -K] [-F FREQUENCY | -c COUNT] [-d] [-a] usage: profile.py [-h] [-p PID] [-U | -K] [-F FREQUENCY | -c COUNT] [-d] [-a]
[-f] [--stack-storage-size STACK_STORAGE_SIZE] [-I] [-f] [--stack-storage-size STACK_STORAGE_SIZE] [-C CPU]
[duration] [duration]
Profile CPU stack traces at a timed interval Profile CPU stack traces at a timed interval
...@@ -763,11 +731,12 @@ optional arguments: ...@@ -763,11 +731,12 @@ optional arguments:
sample period, number of events sample period, number of events
-d, --delimited insert delimiter between kernel/user stacks -d, --delimited insert delimiter between kernel/user stacks
-a, --annotations add _[k] annotations to kernel frames -a, --annotations add _[k] annotations to kernel frames
-I, --include-idle include CPU idle stacks
-f, --folded output folded format, one line per stack (for flame -f, --folded output folded format, one line per stack (for flame
graphs) graphs)
--stack-storage-size STACK_STORAGE_SIZE --stack-storage-size STACK_STORAGE_SIZE
the number of unique stack traces that can be stored the number of unique stack traces that can be stored
and displayed (default 2048) and displayed (default 16384)
-C CPU, --cpu CPU cpu number to run profile on -C CPU, --cpu CPU cpu number to run profile on
examples: examples:
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment