Commit f3ab5097 authored by Brendan Gregg, committed by GitHub

Merge pull request #98 from iovisor/tools

More Tools
parents e2d6b93e 097a92bc
@@ -8,34 +8,23 @@ For instructions on building BPFtrace, see [INSTALL.md](INSTALL.md). There is al
## Examples
Count system calls:
```
tracepoint:syscalls:sys_enter_*
{
@[name] = count();
}
```
Count system calls using tracepoints:
```
# bpftrace -e 'tracepoint:syscalls:sys_enter_* { @[name] = count(); }'
Attaching 320 probes...
^C
...
@[tracepoint:syscalls:sys_enter_futex]: 50
@[tracepoint:syscalls:sys_enter_newfstat]: 52
@[tracepoint:syscalls:sys_enter_clock_gettime]: 56
@[tracepoint:syscalls:sys_enter_perf_event_open]: 148
@[tracepoint:syscalls:sys_enter_select]: 156
@[tracepoint:syscalls:sys_enter_dup]: 291
@[tracepoint:syscalls:sys_enter_read]: 308
@[tracepoint:syscalls:sys_enter_bpf]: 310
@[tracepoint:syscalls:sys_enter_open]: 363
@[tracepoint:syscalls:sys_enter_ioctl]: 571
@[tracepoint:syscalls:sys_enter_dup2]: 580
@[tracepoint:syscalls:sys_enter_close]: 998
@[tracepoint:syscalls:sys_enter_access]: 3291
@[tracepoint:syscalls:sys_enter_close]: 3897
@[tracepoint:syscalls:sys_enter_newstat]: 4268
@[tracepoint:syscalls:sys_enter_open]: 4609
@[tracepoint:syscalls:sys_enter_mmap]: 4781
```
Produce a histogram of amount of time (in nanoseconds) spent in the `read()` system call:
Produce a histogram of time (in nanoseconds) spent in the `read()` system call:
```
// read.bt file
tracepoint:syscalls:sys_enter_read
{
@start[tid] = nsecs;
@@ -48,20 +37,11 @@ tracepoint:syscalls:sys_exit_read / @start[tid] /
}
```
```
# bpftrace read.bt
Attaching 2 probes...
^C
@start[9134]: 6465933686812
@times:
[0, 1] 0 | |
[2, 4) 0 | |
[4, 8) 0 | |
[8, 16) 0 | |
[16, 32) 0 | |
[32, 64) 0 | |
[64, 128) 0 | |
[128, 256) 0 | |
[256, 512) 326 |@ |
[512, 1k) 7715 |@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[1k, 2k) 15306 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
@@ -73,50 +53,26 @@ Attaching 2 probes...
[64k, 128k) 5 | |
```
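A complete version of this read.bt program might look like the following (a sketch: the exit probe is assumed to histogram the elapsed nanoseconds into @times and delete the per-thread start timestamp, matching the @start and @times maps in the output above):
```
// read.bt file
tracepoint:syscalls:sys_enter_read
{
  @start[tid] = nsecs;
}

tracepoint:syscalls:sys_exit_read / @start[tid] /
{
  @times = hist(nsecs - @start[tid]);
  delete(@start[tid]);
}
```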
Print paths of any files opened along with the name of the process which opened them:
```
kprobe:sys_open
{
printf("%s: %s\n", comm, str(arg0))
}
```
Print process name and paths for file opens, using kprobes (kernel dynamic tracing) of do_sys_open():
```
# bpftrace -e 'kprobe:do_sys_open { printf("%s: %s\n", comm, str(arg0)) }'
Attaching 1 probe...
git: .git/objects/70
git: .git/objects/pack
git: .git/objects/da
git: .git/objects/pack
git: /etc/localtime
systemd-journal: /var/log/journal/72d0774c88dc4943ae3d34ac356125dd
DNS Res~ver #15: /etc/hosts
DNS Res~ver #16: /etc/hosts
DNS Res~ver #15: /etc/hosts
^C
```
Whole system profiling (TODO make example check if kernel is on-cpu before recording):
```
profile:hz:99
{
@[stack] = count()
}
```
CPU profiling, sampling kernel stacks at 99 Hertz:
```
# bpftrace -e 'profile:hz:99 { @[stack] = count() }'
Attaching 1 probe...
^C
...
@[
_raw_spin_unlock_irq+23
finish_task_switch+117
__schedule+574
schedule_idle+44
do_idle+333
cpu_startup_entry+113
start_secondary+344
verify_cpu+0
]: 83
@[
queue_work_on+41
tty_flip_buffer_push+43
pty_write+83
@@ -146,15 +102,23 @@ verify_cpu+0
bpftrace contains various tools, which also serve as examples of programming in the bpftrace language.
- tools/[bashreadline.bt](tools/bashreadline.bt): Print entered bash commands system wide. [Examples](tools/bashreadline_example.txt).
- tools/[biolatency.bt](tools/biolatency.bt): Block I/O latency as a histogram. [Examples](tools/biolatency_example.txt).
- tools/[biosnoop.bt](tools/biosnoop.bt): Block I/O tracing tool, showing per I/O latency. [Examples](tools/biosnoop_example.txt).
- tools/[bitesize.bt](tools/bitesize.bt): Show disk I/O size as a histogram. [Examples](tools/bitesize_example.txt).
- tools/[capable.bt](tools/capable.bt): Trace security capability checks. [Examples](tools/capable_example.txt).
- tools/[cpuwalk.bt](tools/cpuwalk.bt): Sample which CPUs are executing processes. [Examples](tools/cpuwalk_example.txt).
- tools/[execsnoop.bt](tools/execsnoop.bt): Trace new processes via exec() syscalls. [Examples](tools/execsnoop_example.txt).
- tools/[gethostlatency.bt](tools/gethostlatency.bt): Show latency for getaddrinfo/gethostbyname[2] calls. [Examples](tools/gethostlatency_example.txt).
- tools/[killsnoop.bt](tools/killsnoop.bt): Trace signals issued by the kill() syscall. [Examples](tools/killsnoop_example.txt).
- tools/[loads.bt](tools/loads.bt): Print load averages. [Examples](tools/loads_example.txt).
- tools/[opensnoop.bt](tools/opensnoop.bt): Trace open() syscalls showing filenames. [Examples](tools/opensnoop_example.txt).
- tools/[pidpersec.bt](tools/pidpersec.bt): Count new processes (via fork). [Examples](tools/pidpersec_example.txt).
- tools/[statsnoop.bt](tools/statsnoop.bt): Trace stat() syscalls for general debugging. [Examples](tools/statsnoop_example.txt).
- tools/[syncsnoop.bt](tools/syncsnoop.bt): Trace sync() variety of syscalls. [Examples](tools/syncsnoop_example.txt).
- tools/[syscount.bt](tools/syscount.bt): Count system calls. [Examples](tools/syscount_example.txt).
- tools/[vfscount.bt](tools/vfscount.bt): Count VFS calls. [Examples](tools/vfscount_example.txt).
- tools/[vfsstat.bt](tools/vfsstat.bt): Count some VFS calls, with per-second summaries. [Examples](tools/vfsstat_example.txt).
- tools/[writeback.bt](tools/writeback.bt): Trace file system writeback events with details. [Examples](tools/writeback_example.txt).
- tools/[xfsdist.bt](tools/xfsdist.bt): Summarize XFS operation latency distribution as a histogram. [Examples](tools/xfsdist_example.txt).
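These tools are run by passing the .bt file to bpftrace. For example, running opensnoop (output abridged; the full example appears later in this page):
```
# bpftrace tools/opensnoop.bt
Attaching 3 probes...
Tracing open syscalls... Hit Ctrl-C to end.
PID    COMM               FD ERR PATH
[...]
```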
For more eBPF observability tools, see [bcc tools](https://github.com/iovisor/bcc#tools).
.TH biolatency 8 "2018-09-13" "USER COMMANDS"
.SH NAME
biolatency.bt \- Block I/O latency as a histogram. Uses bpftrace/eBPF.
.SH SYNOPSIS
.B biolatency.bt
.SH DESCRIPTION
This tool summarizes time (latency) spent in block device I/O (disk I/O)
as a power-of-2 histogram. This allows the distribution to be studied,
including modes and outliers. There are often two modes, one for device cache
hits and one for cache misses, which can be shown by this tool. Latency
outliers will also be shown.
This tool currently works by dynamic tracing of the blk_account*() kernel
functions, which will need updating to match any changes to these functions
in future kernel versions.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bpftrace.
.SH EXAMPLES
.TP
Trace block device I/O (disk I/O), and print a latency histogram on Ctrl-C:
#
.B biolatency.bt
.SH FIELDS
.TP
1st, 2nd
This is a range of latency, in microseconds (shown in "[...)" set notation).
.TP
3rd
A column showing the count of operations in this range.
.TP
4th
This is an ASCII histogram representing the count column.
.SH OVERHEAD
Since block device I/O usually has a relatively low frequency (< 10,000/s),
the overhead for this tool is expected to be negligible. For high IOPS storage
systems, test and quantify before use.
.SH SOURCE
This is from bpftrace.
.IP
https://github.com/iovisor/bpftrace
.PP
Also look in the bpftrace distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
This is a bpftrace version of the bcc tool of the same name. The bcc tool
may provide more options and customizations.
.IP
https://github.com/iovisor/bcc
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
biosnoop(1)
@@ -8,6 +8,10 @@ This is a basic block I/O (disk I/O) tracing tool, showing each I/O event
along with the issuing process ID, and the I/O latency. This can be used to
investigate disk I/O performance issues.
This tool currently works by dynamic tracing of the blk_account*() kernel
functions, which will need updating to match any changes to these functions
in future kernel versions.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bpftrace.
.TH bitesize 8 "2018-09-07" "USER COMMANDS"
.SH NAME
bitesize.bt \- Show disk I/O size as a histogram. Uses bpftrace/eBPF.
.SH SYNOPSIS
.B bitesize.bt
.SH DESCRIPTION
This can be used to characterize the distribution of block device
(disk) I/O sizes. To study block device I/O in more detail, see biosnoop(8).
This uses the tracepoint:block:block_rq_issue tracepoint, and is a simple
example of bpftrace.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bpftrace.
.SH EXAMPLES
.TP
Trace block I/O and summarize as a histogram by process:
#
.B bitesize.bt
.SH FIELDS
.TP
0th
A process name (shown in "@[...]") is printed before each I/O histogram.
.TP
1st, 2nd
This is a range of I/O sizes, in bytes (shown in "[...)" set notation).
.TP
3rd
A column showing the count of I/O in this range.
.TP
4th
This is an ASCII histogram representing the count column.
.SH OVERHEAD
Since block device I/O usually has a relatively low frequency (< 10,000/s),
the overhead for this tool is expected to be low or negligible. For high IOPS
storage systems, test and quantify before use.
.SH SOURCE
This is from bpftrace.
.IP
https://github.com/iovisor/bpftrace
.PP
Also look in the bpftrace distribution for a companion _examples.txt file
containing example usage, output, and commentary for this tool.
This is a bpftrace version of the bcc tool of the same name.
.IP
https://github.com/iovisor/bcc
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
biosnoop.bt(8)
.TH killsnoop 8 "2018-09-07" "USER COMMANDS"
.SH NAME
killsnoop.bt \- Trace signals issued by the kill() syscall. Uses bpftrace/eBPF.
.SH SYNOPSIS
.B killsnoop.bt
.SH DESCRIPTION
killsnoop traces the kill() syscall, to show signals sent via this method. This
may be useful to troubleshoot failing applications, where an unknown mechanism
is sending signals.
This works by tracing the kill() syscall tracepoints.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bpftrace.
.SH EXAMPLES
.TP
Trace all kill() syscalls:
#
.B killsnoop.bt
.SH FIELDS
.TP
TIME
Time of the kill call.
.TP
PID
Source process ID
.TP
COMM
Source process name
.TP
SIG
Signal number. See signal(7).
.TP
TPID
Target process ID
.TP
RES
Result. 0 == success, a negative value (of the error code) for failure.
.SH OVERHEAD
This traces the kill() syscall tracepoints and prints output for each event. As the
rate of this is generally expected to be low (< 100/s), the overhead is also
expected to be negligible. If you have an application that is calling a very
high rate of kill()s for some reason, then test and understand overhead before
use.
.SH SOURCE
This is from bpftrace.
.IP
https://github.com/iovisor/bpftrace
.PP
Also look in the bpftrace distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
This is a bpftrace version of the bcc tool of the same name. The bcc tool
may provide more options and customizations.
.IP
https://github.com/iovisor/bcc
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
opensnoop(8)
.TH opensnoop 8 "2018-09-08" "USER COMMANDS"
.SH NAME
opensnoop.bt \- Trace open() syscalls. Uses bpftrace/eBPF.
.SH SYNOPSIS
.B opensnoop.bt
.SH DESCRIPTION
opensnoop traces the open() syscall, showing which processes are attempting
to open which files. This can be useful for determining the location of config
and log files, or for troubleshooting applications that are failing, especially
on startup.
This works by tracing the open() syscall tracepoint.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bpftrace.
.SH EXAMPLES
.TP
Trace all open() syscalls:
#
.B opensnoop.bt
.SH FIELDS
.TP
PID
Process ID
.TP
TID
Thread ID
.TP
COMM
Process name
.TP
FD
File descriptor (if success), or -1 (if failed)
.TP
ERR
Error number (see the system's errno.h)
.TP
PATH
Open path
.SH OVERHEAD
This traces the open tracepoint and prints output for each event. As the
rate of this is generally expected to be low (< 1000/s), the overhead is also
expected to be negligible. If you have an application that is calling a high
rate of open()s, then test and understand overhead before use.
.SH SOURCE
This is from bpftrace.
.IP
https://github.com/iovisor/bpftrace
.PP
Also look in the bpftrace distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
This is a bpftrace version of the bcc tool of the same name. The bcc tool
may provide more options and customizations.
.IP
https://github.com/iovisor/bcc
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
statsnoop(8), execsnoop(8)
.TH statsnoop 8 "2018-09-08" "USER COMMANDS"
.SH NAME
statsnoop.bt \- Trace stat() syscalls. Uses bpftrace/eBPF.
.SH SYNOPSIS
.B statsnoop.bt
.SH DESCRIPTION
statsnoop traces the stat() syscall, showing which processes are attempting
to stat which files. This can be useful for determining the location of config
and log files, or for troubleshooting applications that are failing, especially
on startup.
This traces the tracepoints for statfs(), statx(), newstat(), and
newlstat(). These aren't the only stat syscalls: if you are missing
activity, you may need to add more variants.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bpftrace.
.SH EXAMPLES
.TP
Trace all stat() syscalls:
#
.B statsnoop.bt
.SH FIELDS
.TP
PID
Process ID
.TP
TID
Thread ID
.TP
COMM
Process name
.TP
FD
File descriptor (if success), or -1 (if failed)
.TP
ERR
Error number (see the system's errno.h)
.TP
PATH
Stat path
.SH OVERHEAD
This traces the stat tracepoints and prints output for each event. As the
rate of this is generally expected to be low (< 1000/s), the overhead is also
expected to be negligible. If you have an application that is calling a high
rate of stat()s, then test and understand overhead before use.
.SH SOURCE
This is from bpftrace.
.IP
https://github.com/iovisor/bpftrace
.PP
Also look in the bpftrace distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
This is a bpftrace version of the bcc tool of the same name. The bcc tool
may provide more options and customizations.
.IP
https://github.com/iovisor/bcc
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
opensnoop(8), execsnoop(8)
.TH syncsnoop 8 "2018-09-06" "USER COMMANDS"
.SH NAME
syncsnoop.bt \- Trace the sync() variety of syscalls. Uses bpftrace/eBPF.
.SH SYNOPSIS
.B syncsnoop.bt
.SH DESCRIPTION
syncsnoop traces calls to sync() syscalls (sync(), fsync(), msync(), etc), which
flush file system cache and buffers to storage devices. These calls can cause
performance perturbations, and it can be useful to know if they are happening,
when they happen, and how frequently.
This works by tracing the sync() variety of syscalls via tracepoints.
This program is also a basic example of eBPF/bpftrace.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bpftrace.
.SH EXAMPLES
.TP
Trace calls to sync() syscalls:
#
.B syncsnoop.bt
.SH FIELDS
.TP
TIME
A timestamp on the output, in "HH:MM:SS" format.
.TP
PID
The process ID that was on-CPU during the event.
.TP
COMM
The process name that was on-CPU during the event.
.TP
EVENT
The tracepoint name for the sync event.
.SH OVERHEAD
This traces sync syscalls and prints output for each event. As the
rate of this is generally expected to be low (<< 100/s), the overhead is also
expected to be negligible.
.SH SOURCE
This is from bpftrace.
.IP
https://github.com/iovisor/bpftrace
.PP
Also look in the bpftrace distribution for a companion _examples.txt file
containing example usage, output, and commentary for this tool.
This is a bpftrace version of the bcc tool of the same name.
.IP
https://github.com/iovisor/bcc
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
iostat(1)
.TH syscount 8 "2018-09-06" "USER COMMANDS"
.SH NAME
syscount.bt \- Count system calls. Uses bpftrace/eBPF.
.SH SYNOPSIS
.B syscount.bt
.SH DESCRIPTION
This counts system calls (syscalls), printing a summary of the top ten
syscall IDs, and the top ten process names making syscalls. This can be
helpful for characterizing the kernel and resource workload, and finding
applications that are using syscalls inefficiently.
This works by using the tracepoint:raw_syscalls:sys_enter tracepoint.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bpftrace.
.SH EXAMPLES
.TP
Count system calls until Ctrl-C is hit:
#
.B syscount.bt
.SH OUTPUT
.TP
Top 10 syscalls IDs:
This shows the syscall ID number (in @syscall[]) followed by a count for this
syscall during tracing. To see the syscall name for that ID, you can use
"ausyscall --dump", or the bcc version of this tool that does translations.
.TP
Top 10 processes:
This shows the process name (in @process[]) followed by a count of syscalls
during tracing.
.SH OVERHEAD
For most applications, the overhead should be manageable if they perform 1000's
or even 10,000's of syscalls per second. For higher rates, the overhead may
become considerable. For example, tracing a microbenchmark loop of 4 million
calls to geteuid() slows it down by 2.4x. However, this represents tracing
a workload that has a syscall rate of over 4 million syscalls per second per
CPU, which should not be typical (in one large cloud production environment,
rates of between 10k and 50k syscalls per second are typical, where the application overhead is
expected to be closer to 1%).
For comparison, strace(1) in its current ptrace-based implementation (which it
has had for decades) runs the same geteuid() workload 102x slower (that's one
hundred and two times slower).
.SH SOURCE
This is from bpftrace.
.IP
https://github.com/iovisor/bpftrace
.PP
Also look in the bpftrace distribution for a companion _examples.txt file
containing example usage, output, and commentary for this tool.
This is a bpftrace version of the bcc tool of the same name.
The bcc version provides different command line options, and translates the
syscall IDs to their syscall names.
.IP
https://github.com/iovisor/bcc
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
strace(1)
.TH writeback 8 "2018-09-14" "USER COMMANDS"
.SH NAME
writeback.bt \- Trace file system writeback events with details. Uses bpftrace/eBPF.
.SH SYNOPSIS
.B writeback.bt
.SH DESCRIPTION
This traces when file system dirtied pages are flushed to disk by kernel
writeback, and prints details including when the event occurred, and the
duration of the event. This can be useful for correlating these times with
other performance problems, and if there is a match, it would be a clue
that the problem may be caused by writeback. How quickly the kernel does
writeback can be tuned: see the kernel docs, eg,
vm.dirty_writeback_centisecs.
This uses the tracepoint:writeback:writeback_start and
tracepoint:writeback:writeback_written tracepoints.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bpftrace.
.SH EXAMPLES
.TP
Trace all writeback events with timestamps and latency details:
#
.B writeback.bt
.SH FIELDS
.TP
TIME
Time that the writeback event completed, in %H:%M:%S format.
.TP
DEVICE
Device name in major:minor number format.
.TP
PAGES
Pages written during writeback.
.TP
REASON
Reason for the writeback event. This may be "background", "vmscan", "sync", "periodic", etc.
.TP
ms
Duration of the writeback event in milliseconds.
.SH OVERHEAD
Since writeback events are expected to be infrequent (<10/sec), the overhead
of this tool is expected to be negligible (near 0%).
.SH SOURCE
This is from bpftrace.
.IP
https://github.com/iovisor/bpftrace
.PP
Also look in the bpftrace distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
biosnoop(8)
@@ -57,4 +57,4 @@ Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
iosnoop(1)
biolatency(8)
/*
* biolatency.bt Block I/O latency as a histogram.
* For Linux, uses bpftrace, eBPF.
*
* This is a bpftrace version of the bcc tool of the same name.
*
* Copyright 2018 Netflix, Inc.
* Licensed under the Apache License, Version 2.0 (the "License")
*
* 13-Sep-2018 Brendan Gregg Created this.
*/
BEGIN
{
printf("Tracing block device I/O... Hit Ctrl-C to end.\n");
}
kprobe:blk_account_io_start
{
@start[arg0] = nsecs;
}
kprobe:blk_account_io_completion
/@start[arg0]/
{
@usecs = hist((nsecs - @start[arg0]) / 1000);
delete(@start[arg0]);
}
Demonstrations of biolatency, the Linux BPF/bpftrace version.
This traces block I/O, and shows latency as a power-of-2 histogram. For example:
# biolatency.bt
Attaching 3 probes...
Tracing block device I/O... Hit Ctrl-C to end.
^C
@usecs:
[0, 1] 0 | |
[2, 4) 0 | |
[4, 8) 0 | |
[8, 16) 0 | |
[16, 32) 0 | |
[32, 64) 0 | |
[64, 128) 0 | |
[128, 256) 0 | |
[256, 512) 2 | |
[512, 1K) 10 |@ |
[1K, 2K) 426 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2K, 4K) 230 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[4K, 8K) 9 |@ |
[8K, 16K) 128 |@@@@@@@@@@@@@@@ |
[16K, 32K) 68 |@@@@@@@@ |
[32K, 64K) 0 | |
[64K, 128K) 0 | |
[128K, 256K) 10 |@ |
While tracing, this shows that 426 block I/O had a latency of between 1K and 2K
usecs (1024 and 2048 microseconds), which is between 1 and 2 milliseconds.
There are also two modes visible, one between 1 and 2 milliseconds, and another
between 8 and 16 milliseconds: this sounds like cache hits and cache misses.
There were also 10 I/O with latency 128 to 256 ms: outliers. Other tools and
instrumentation, like biosnoop.bt, can shed more light on those outliers.
There is another version of this tool in bcc: https://github.com/iovisor/bcc
The bcc version provides options to customize the output.
/*
* bitesize Show disk I/O size as a histogram.
* For Linux, uses bpftrace and eBPF.
*
* USAGE: bitesize.bt
*
* This is a bpftrace version of the bcc tool of the same name.
*
* Copyright 2018 Netflix, Inc.
* Licensed under the Apache License, Version 2.0 (the "License")
*
* 07-Sep-2018 Brendan Gregg Created this.
*/
BEGIN
{
printf("Tracing block device I/O... Hit Ctrl-C to end.\n");
}
tracepoint:block:block_rq_issue
{
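// I/O size in bytes, read via a raw ctx offset into the tracepoint record
// (the field offsets come from the block_rq_issue tracepoint's format file)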
$b = *(ctx+28) & 0x3ffff;
@[comm] = hist($b);
}
END
{
printf("\nI/O size (bytes) histograms by process name:");
}
Demonstrations of bitesize, the Linux bpftrace/eBPF version.
This traces disk I/O via the block I/O interface, and prints a summary of I/O
sizes as histograms for each process name. For example:
# bpftrace bitesize.bt
Attaching 3 probes...
Tracing block device I/O... Hit Ctrl-C to end.
^C
I/O size (bytes) histograms by process name:
@[cleanup]:
[0, 1] 0 | |
[2, 4) 0 | |
[4, 8) 0 | |
[8, 16) 0 | |
[16, 32) 0 | |
[32, 64) 0 | |
[64, 128) 0 | |
[128, 256) 0 | |
[256, 512) 0 | |
[512, 1K) 0 | |
[1K, 2K) 0 | |
[2K, 4K) 0 | |
[4K, 8K) 2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
@[postdrop]:
[0, 1] 0 | |
[2, 4) 0 | |
[4, 8) 0 | |
[8, 16) 0 | |
[16, 32) 0 | |
[32, 64) 0 | |
[64, 128) 0 | |
[128, 256) 0 | |
[256, 512) 0 | |
[512, 1K) 0 | |
[1K, 2K) 0 | |
[2K, 4K) 0 | |
[4K, 8K) 2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
@[jps]:
[0, 1] 0 | |
[2, 4) 0 | |
[4, 8) 0 | |
[8, 16) 0 | |
[16, 32) 0 | |
[32, 64) 0 | |
[64, 128) 0 | |
[128, 256) 0 | |
[256, 512) 0 | |
[512, 1K) 0 | |
[1K, 2K) 0 | |
[2K, 4K) 0 | |
[4K, 8K) 1 |@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[8K, 16K) 0 | |
[16K, 32K) 0 | |
[32K, 64K) 2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
@[kworker/2:1H]:
[0, 1] 3 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2, 4) 0 | |
[4, 8) 0 | |
[8, 16) 0 | |
[16, 32) 0 | |
[32, 64) 0 | |
[64, 128) 0 | |
[128, 256) 0 | |
[256, 512) 0 | |
[512, 1K) 0 | |
[1K, 2K) 0 | |
[2K, 4K) 0 | |
[4K, 8K) 0 | |
[8K, 16K) 0 | |
[16K, 32K) 0 | |
[32K, 64K) 0 | |
[64K, 128K) 1 |@@@@@@@@@@@@@@@@@ |
@[jbd2/nvme0n1-8]:
[0, 1] 0 | |
[2, 4) 0 | |
[4, 8) 0 | |
[8, 16) 0 | |
[16, 32) 0 | |
[32, 64) 0 | |
[64, 128) 0 | |
[128, 256) 0 | |
[256, 512) 0 | |
[512, 1K) 0 | |
[1K, 2K) 0 | |
[2K, 4K) 0 | |
[4K, 8K) 3 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[8K, 16K) 0 | |
[16K, 32K) 0 | |
[32K, 64K) 2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[64K, 128K) 1 |@@@@@@@@@@@@@@@@@ |
@[dd]:
[0, 1] 0 | |
[2, 4) 0 | |
[4, 8) 0 | |
[8, 16) 0 | |
[16, 32) 0 | |
[32, 64) 0 | |
[64, 128) 0 | |
[128, 256) 0 | |
[256, 512) 0 | |
[512, 1K) 0 | |
[1K, 2K) 0 | |
[2K, 4K) 0 | |
[4K, 8K) 0 | |
[8K, 16K) 0 | |
[16K, 32K) 921 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
The most active process while tracing was "dd", which issued 921 I/O between
16 Kbytes and 32 Kbytes in size.
There is another version of this tool in bcc: https://github.com/iovisor/bcc
/*
* killsnoop Trace signals issued by the kill() syscall.
* For Linux, uses bpftrace and eBPF.
*
* USAGE: killsnoop.bt
*
* Also a basic example of bpftrace.
*
* This is a bpftrace version of the bcc tool of the same name.
*
* Copyright 2018 Netflix, Inc.
* Licensed under the Apache License, Version 2.0 (the "License")
*
* 07-Sep-2018 Brendan Gregg Created this.
*/
BEGIN
{
printf("Tracing kill() signals... Hit Ctrl-C to end.\n");
printf("%-9s %-6s %-16s %-4s %-6s %s\n", "TIME", "PID", "COMM", "SIG",
"TPID", "RESULT");
}
tracepoint:syscalls:sys_enter_kill
{
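// kill() arguments read via raw ctx offsets (see this tracepoint's format file): target pid, then signal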
$pid = *(ctx + 16);
$sig = *(ctx + 24);
@tpid[tid] = $pid;
@tsig[tid] = $sig;
}
tracepoint:syscalls:sys_exit_kill
/@tpid[tid]/
{
$ret = *(ctx + 16);
time("%H:%M:%S ");
printf("%-6d %-16s %-4d %-6d %d\n", pid, comm, @tsig[tid], @tpid[tid],
$ret);
delete(@tpid[tid]);
delete(@tsig[tid]);
}
Demonstrations of killsnoop, the Linux bpftrace/eBPF version.
This traces signals sent via the kill() syscall. For example:
# bpftrace killsnoop.bt
Attaching 3 probes...
Tracing kill() signals... Hit Ctrl-C to end.
TIME PID COMM SIG TPID RESULT
00:09:37 22485 bash 2 23856 0
00:09:40 22485 bash 2 23856 -3
00:09:31 22485 bash 15 23814 -3
The first line showed a SIGINT (2) sent from PID 22485 (a bash shell) to
PID 23856. The result, 0, means success. The next line shows the same signal
sent, which resulted in -3, a failure (likely because the target process
no longer existed).
There is another version of this tool in bcc: https://github.com/iovisor/bcc
The bcc version provides command line options to customize the output.
/*
* opensnoop Trace open() syscalls.
* For Linux, uses bpftrace and eBPF.
*
* Also a basic example of bpftrace.
*
* USAGE: opensnoop.bt
*
* This is a bpftrace version of the bcc tool of the same name.
*
* Copyright 2018 Netflix, Inc.
* Licensed under the Apache License, Version 2.0 (the "License")
*
* 08-Sep-2018 Brendan Gregg Created this.
*/
BEGIN
{
printf("Tracing open syscalls... Hit Ctrl-C to end.\n");
printf("%-6s %-16s %4s %3s %s\n", "PID", "COMM", "FD", "ERR", "PATH");
}
tracepoint:syscalls:sys_enter_open
{
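// filename pointer argument of open(), read via a raw ctx offset (see this tracepoint's format file)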
$file = *(ctx + 16);
@filename[tid] = $file;
}
tracepoint:syscalls:sys_exit_open
/@filename[tid]/
{
$ret = *(ctx + 16);
$fd = $ret > 0 ? $ret : -1;
$errno = $ret > 0 ? 0 : - $ret;
printf("%-6d %-16s %4d %3d %s\n", pid, comm, $fd, $errno,
str(@filename[tid]));
delete(@filename[tid]);
}
Demonstrations of opensnoop, the Linux bpftrace/eBPF version.
opensnoop traces the open() syscall system-wide, and prints various details.
Example output:
# bpftrace opensnoop.bt
Attaching 3 probes...
Tracing open syscalls... Hit Ctrl-C to end.
PID COMM FD ERR PATH
2440 snmp-pass 4 0 /proc/cpuinfo
2440 snmp-pass 4 0 /proc/stat
25706 ls 3 0 /etc/ld.so.cache
25706 ls 3 0 /lib/x86_64-linux-gnu/libselinux.so.1
25706 ls 3 0 /lib/x86_64-linux-gnu/libc.so.6
25706 ls 3 0 /lib/x86_64-linux-gnu/libpcre.so.3
25706 ls 3 0 /lib/x86_64-linux-gnu/libdl.so.2
25706 ls 3 0 /lib/x86_64-linux-gnu/libpthread.so.0
25706 ls 3 0 /proc/filesystems
25706 ls 3 0 /usr/lib/locale/locale-archive
25706 ls 3 0 .
1744 snmpd 8 0 /proc/net/dev
1744 snmpd 21 0 /proc/net/if_inet6
1744 snmpd 21 0 /sys/class/net/eth0/device/vendor
1744 snmpd 21 0 /sys/class/net/eth0/device/device
1744 snmpd 21 0 /proc/sys/net/ipv4/neigh/eth0/retrans_time_ms
1744 snmpd 21 0 /proc/sys/net/ipv6/neigh/eth0/retrans_time_ms
1744 snmpd 21 0 /proc/sys/net/ipv6/conf/eth0/forwarding
1744 snmpd 21 0 /proc/sys/net/ipv6/neigh/eth0/base_reachable_time_ms
1744 snmpd -1 2 /sys/class/net/lo/device/vendor
1744 snmpd 21 0 /proc/sys/net/ipv4/neigh/lo/retrans_time_ms
1744 snmpd 21 0 /proc/sys/net/ipv6/neigh/lo/retrans_time_ms
1744 snmpd 21 0 /proc/sys/net/ipv6/conf/lo/forwarding
1744 snmpd 21 0 /proc/sys/net/ipv6/neigh/lo/base_reachable_time_ms
2440 snmp-pass 4 0 /proc/cpuinfo
2440 snmp-pass 4 0 /proc/stat
22884 pickup 12 0 maildrop
2440 snmp-pass 4 0 /proc/cpuinfo
2440 snmp-pass 4 0 /proc/stat
While tracing, at "ls" command was launched: the libraries it uses can be seen
as they were opened. Also, the snmpd process opened various /proc and /sys
files (reading metrics).
was starting up: a new process).
opensnoop can be useful for discovering configuration and log files, if used
during application startup.
There is another version of this tool in bcc: https://github.com/iovisor/bcc
The bcc version provides command line options to customize the output.
/*
* statsnoop Trace stat() syscalls.
* For Linux, uses bpftrace and eBPF.
*
* This traces the traecepoints for statfs(), statx(), newstat(), and
* newlstat(). These aren't the only the stat syscalls: if you are missing
* activity, you may need to add more variants.
*
* Also a basic example of bpftrace.
*
* USAGE: statsnoop.bt
*
* This is a bpftrace version of the bcc tool of the same name.
*
* Copyright 2018 Netflix, Inc.
* Licensed under the Apache License, Version 2.0 (the "License")
*
* 08-Sep-2018 Brendan Gregg Created this.
*/
BEGIN
{
printf("Tracing stat syscalls... Hit Ctrl-C to end.\n");
printf("%-6s %-16s %3s %s\n", "PID", "COMM", "ERR", "PATH");
}
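// The probes below read the path argument via raw ctx offsets; the offsets come from each tracepoint's format file.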
tracepoint:syscalls:sys_enter_statfs
{
$path = *(ctx + 16);
@filename[tid] = $path;
}
tracepoint:syscalls:sys_enter_statx
{
$file = *(ctx + 24);
@filename[tid] = $file;
}
tracepoint:syscalls:sys_enter_newstat
{
$file = *(ctx + 16);
@filename[tid] = $file;
}
tracepoint:syscalls:sys_enter_newlstat
{
$file = *(ctx + 16);
@filename[tid] = $file;
}
tracepoint:syscalls:sys_exit_statfs,
tracepoint:syscalls:sys_exit_statx,
tracepoint:syscalls:sys_exit_newstat,
tracepoint:syscalls:sys_exit_newlstat
/@filename[tid]/
{
$ret = *(ctx + 16);
$errno = $ret >= 0 ? 0 : - $ret;
printf("%-6d %-16s %3d %s\n", pid, comm, $errno,
str(@filename[tid]));
delete(@filename[tid]);
}
Demonstrations of statsnoop, the Linux bpftrace/eBPF version.
statsnoop traces different stat() syscalls system-wide, and prints details.
Example output:
# bpftrace statsnoop.bt
Attaching 9 probes...
Tracing stat syscalls... Hit Ctrl-C to end.
PID COMM ERR PATH
27835 bash 0 .
27835 bash 2 /usr/local/sbin/iconfig
27835 bash 2 /usr/local/bin/iconfig
27835 bash 2 /usr/sbin/iconfig
27835 bash 2 /usr/bin/iconfig
27835 bash 2 /sbin/iconfig
27835 bash 2 /bin/iconfig
27835 bash 2 /usr/games/iconfig
27835 bash 2 /usr/local/games/iconfig
27835 bash 2 /snap/bin/iconfig
27835 bash 2 /apps/python/bin/iconfig
30573 command-not-fou 2 /usr/bin/Modules/Setup
30573 command-not-fou 2 /usr/bin/lib/python3.5/os.py
30573 command-not-fou 2 /usr/bin/lib/python3.5/os.pyc
30573 command-not-fou 0 /usr/lib/python3.5/os.py
30573 command-not-fou 2 /usr/bin/pybuilddir.txt
30573 command-not-fou 2 /usr/bin/lib/python3.5/lib-dynload
30573 command-not-fou 0 /usr/lib/python3.5/lib-dynload
30573 command-not-fou 2 /usr/lib/python35.zip
30573 command-not-fou 0 /usr/lib
30573 command-not-fou 2 /usr/lib/python35.zip
30573 command-not-fou 0 /usr/lib/python3.5/
30573 command-not-fou 0 /usr/lib/python3.5/
30573 command-not-fou 0 /usr/lib/python3.5/
30573 command-not-fou 2 /usr/lib/python3.5/encodings/__init__.cpython-35m-x86_64-linux-
30573 command-not-fou 2 /usr/lib/python3.5/encodings/__init__.abi3.so
30573 command-not-fou 2 /usr/lib/python3.5/encodings/__init__.so
30573 command-not-fou 0 /usr/lib/python3.5/encodings/__init__.py
30573 command-not-fou 0 /usr/lib/python3.5/encodings/__init__.py
This output has caught me mistyping a command in another shell, "iconfig"
instead of "ifconfig". The first several lines show the bash shell searching
the $PATH (why is games in my $PATH??), and failing to find it (ERR == 2 is
file not found). Then, a "command-not-found" program executes (the name is
truncated to 16 characters in the COMM field, including the NULL), which
begins the process of searching for and suggesting a package. ie, this:
# iconfig
The program 'iconfig' is currently not installed. You can install it by typing:
apt install ipmiutil
statsnoop can be used for general debugging, to see what file information has
been requested, and whether those files exist. It can be used as a companion
to opensnoop, which shows what files were actually opened.
There is another version of this tool in bcc: https://github.com/iovisor/bcc
The bcc version provides options to customize the output.
/*
* syncsnoop Trace sync() variety of syscalls.
* For Linux, uses bpftrace and eBPF.
*
* Also a basic example of bpftrace.
*
* USAGE: syncsnoop.bt
*
* This is a bpftrace version of the bcc tool of the same name.
*
* Copyright 2018 Netflix, Inc.
* Licensed under the Apache License, Version 2.0 (the "License")
*
* 06-Sep-2018 Brendan Gregg Created this.
*/
BEGIN
{
printf("Tracing sync syscalls... Hit Ctrl-C to end.\n");
printf("%-9s %-6s %-16s %s\n", "TIME", "PID", "COMM", "EVENT");
}
tracepoint:syscalls:sys_enter_sync,
tracepoint:syscalls:sys_enter_syncfs,
tracepoint:syscalls:sys_enter_fsync,
tracepoint:syscalls:sys_enter_fdatasync,
tracepoint:syscalls:sys_enter_sync_file_range,
tracepoint:syscalls:sys_enter_msync
{
time("%H:%M:%S ");
printf("%-6d %-16s %s\n", pid, comm, name);
}
Demonstrations of syncsnoop, the Linux bpftrace/eBPF version.
Tracing file system sync events:
# bpftrace syncsnoop.bt
Attaching 7 probes...
Tracing sync syscalls... Hit Ctrl-C to end.
TIME PID COMM EVENT
02:02:17 27933 sync tracepoint:syscalls:sys_enter_sync
02:03:43 27936 sync tracepoint:syscalls:sys_enter_sync
The output shows calls to the sync() syscall (traced via its tracepoint),
along with various details.
There is another version of this tool in bcc: https://github.com/iovisor/bcc
/*
* syscount.bt Count system calls.
* For Linux, uses bpftrace, eBPF.
*
* This is a bpftrace version of the bcc tool of the same name.
* The bcc version translates syscall IDs to their names, and this version
* currently does not. Syscall IDs can be listed by "ausyscall --dump".
*
* Copyright 2018 Netflix, Inc.
* Licensed under the Apache License, Version 2.0 (the "License")
*
* 13-Sep-2018 Brendan Gregg Created this.
*/
BEGIN
{
printf("Counting syscalls... Hit Ctrl-C to end.\n");
// ausyscall --dump | awk 'NR > 1 { printf("\t@sysname[%d] = \"%s\";\n", $1, $2); }'
}
tracepoint:raw_syscalls:sys_enter
{
$id = *(ctx + 8); // workaround until #32
@syscall[$id] = count();
@process[comm] = count();
}
END
{
printf("\nTop 10 syscalls IDs:\n");
print(@syscall, 10);
clear(@syscall);
printf("\nTop 10 processes:\n");
print(@process, 10);
clear(@process);
}
Demonstrations of syscount, the Linux bpftrace/eBPF version.
syscount counts system calls, and prints summaries of the top ten syscall IDs,
and the top ten process names making syscalls. For example:
# bpftrace syscount.bt
Attaching 3 probes...
Counting syscalls... Hit Ctrl-C to end.
^C
Top 10 syscalls IDs:
@syscall[6]: 36862
@syscall[21]: 42189
@syscall[13]: 44532
@syscall[12]: 58456
@syscall[9]: 82113
@syscall[8]: 95575
@syscall[5]: 147658
@syscall[3]: 163269
@syscall[2]: 270801
@syscall[4]: 326333
Top 10 processes:
@process[rm]: 14360
@process[tail]: 16011
@process[objtool]: 20767
@process[fixdep]: 28489
@process[as]: 48982
@process[gcc]: 90652
@process[command-not-fou]: 172874
@process[sh]: 270515
@process[cc1]: 482888
@process[make]: 1404065
The above output was traced during a Linux kernel build, and the process name
with the most syscalls was "make" with 1,404,065 syscalls while tracing. The
highest syscall ID was 4, which is stat().
There is another version of this tool in bcc: https://github.com/iovisor/bcc
The bcc version provides different command line options, and translates the
syscall IDs to their syscall names.
/*
* writeback Trace file system writeback events with details.
* For Linux, uses bpftrace and eBPF.
*
* This traces when file system dirtied pages are flushed to disk by kernel
* writeback, and prints details including when the event occurred, and the
* duration of the event. This can be useful for correlating these times with
* other performance problems, and if there is a match, it would be a clue
* that the problem may be caused by writeback. How quickly the kernel does
* writeback can be tuned: see the kernel docs, eg,
* vm.dirty_writeback_centisecs.
*
* USAGE: writeback.bt
*
* Copyright 2018 Netflix, Inc.
* Licensed under the Apache License, Version 2.0 (the "License")
*
* 14-Sep-2018 Brendan Gregg Created this.
*/
BEGIN
{
printf("Tracing writeback... Hit Ctrl-C to end.\n");
printf("%-9s %-8s %-8s %-16s %s\n", "TIME", "DEVICE", "PAGES",
"REASON", "ms");
// see /sys/kernel/debug/tracing/events/writeback/writeback_start/format
@reason[0] = "background";
@reason[1] = "vmscan";
@reason[2] = "sync";
@reason[3] = "periodic";
@reason[4] = "laptop_timer";
@reason[5] = "free_more_memory";
@reason[6] = "fs_free_space";
@reason[7] = "forker_thread";
}
tracepoint:writeback:writeback_start
{
$sb_dev = *(ctx + 48) & 0xffffffff;
@start[$sb_dev] = nsecs;
}
tracepoint:writeback:writeback_written
{
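// fields read via raw ctx offsets (see this tracepoint's format file): name, nr_pages, sb_dev, reason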
$name = str(ctx + 8);
$nr_pages = *(ctx + 40) & 0xffff;
$sb_dev = *(ctx + 48) & 0xffffffff;
$r = *(ctx + 68) & 0xffffffff;
$s = @start[$sb_dev];
delete(@start[$sb_dev]);
$lat = $s ? (nsecs - $s) / 1000 : 0;
time("%H:%M:%S ");
printf("%-8s %-8d %-16s %d.%03d\n", $name, $nr_pages, @reason[$r],
$lat / 1000, $lat % 1000);
}
END
{
clear(@reason);
clear(@start);
}
Demonstrations of writeback, the Linux bpftrace/eBPF version.
This tool traces when the kernel writeback procedure is writing dirtied pages
to disk, and shows details such as the time, device numbers, reason for the
write back, and the duration. For example:
# bpftrace writeback.bt
Attaching 4 probes...
Tracing writeback... Hit Ctrl-C to end.
TIME DEVICE PAGES REASON ms
23:28:47 259:1 15791 periodic 0.005
23:28:48 259:0 15792 periodic 0.004
23:28:52 259:1 15784 periodic 0.003
23:28:53 259:0 18682 periodic 0.003
23:28:55 259:0 41970 background 326.663
23:28:56 259:0 18418 background 332.689
23:28:56 259:0 60402 background 362.446
23:28:57 259:1 18230 periodic 0.005
23:28:57 259:1 65492 background 3.343
23:28:57 259:1 65492 background 0.002
23:28:58 259:0 36850 background 0.000
23:28:58 259:0 13298 background 597.198
23:28:58 259:0 55282 background 322.050
23:28:59 259:0 31730 background 336.031
23:28:59 259:0 8178 background 357.119
23:29:01 259:0 50162 background 1803.146
23:29:02 259:0 27634 background 1311.876
23:29:03 259:0 6130 background 331.599
23:29:03 259:0 50162 background 293.968
23:29:03 259:0 28658 background 284.946
23:29:03 259:0 7154 background 286.572
[...]
By looking at the timestamps and latency, it can be seen that the system was
not spending much time in writeback until 23:28:55, when "background"
writeback began, taking over 300 milliseconds per flush.
If timestamps of heavy writeback coincide with times when applications suffered
performance issues, that would be a clue that they are correlated and there
is contention for the disk devices. There are various ways to tune this:
eg, vm.dirty_writeback_centisecs.