Commit e4863ace authored by Brendan Gregg, committed by GitHub

Merge pull request #141 from iovisor/tools

Tools
parents 269c7388 a42eb919
@@ -144,12 +144,16 @@ bpftrace contains various tools, which also serve as examples of programming in
- tools/[bitesize.bt](tools/bitesize.bt): Show disk I/O size as a histogram. [Examples](tools/bitesize_example.txt).
- tools/[capable.bt](tools/capable.bt): Trace security capability checks. [Examples](tools/capable_example.txt).
- tools/[cpuwalk.bt](tools/cpuwalk.bt): Sample which CPUs are executing processes. [Examples](tools/cpuwalk_example.txt).
- tools/[dcsnoop.bt](tools/dcsnoop.bt): Trace directory entry cache (dcache) lookups. [Examples](tools/dcsnoop_example.txt).
- tools/[execsnoop.bt](tools/execsnoop.bt): Trace new processes via exec() syscalls. [Examples](tools/execsnoop_example.txt).
- tools/[gethostlatency.bt](tools/gethostlatency.bt): Show latency for getaddrinfo/gethostbyname[2] calls. [Examples](tools/gethostlatency_example.txt).
- tools/[killsnoop.bt](tools/killsnoop.bt): Trace signals issued by the kill() syscall. [Examples](tools/killsnoop_example.txt).
- tools/[loads.bt](tools/loads.bt): Print load averages. [Examples](tools/loads_example.txt).
- tools/[mdflush.bt](tools/mdflush.bt): Trace md flush events. [Examples](tools/mdflush_example.txt).
- tools/[opensnoop.bt](tools/opensnoop.bt): Trace open() syscalls showing filenames. [Examples](tools/opensnoop_example.txt).
- tools/[oomkill.bt](tools/oomkill.bt): Trace OOM killer. [Examples](tools/oomkill_example.txt).
- tools/[pidpersec.bt](tools/pidpersec.bt): Count new processes (via fork). [Examples](tools/pidpersec_example.txt).
- tools/[runqlen.bt](tools/runqlen.bt): CPU scheduler run queue length as a histogram. [Examples](tools/runqlen_example.txt).
- tools/[statsnoop.bt](tools/statsnoop.bt): Trace stat() syscalls for general debugging. [Examples](tools/statsnoop_example.txt).
- tools/[syncsnoop.bt](tools/syncsnoop.bt): Trace sync() variety of syscalls. [Examples](tools/syncsnoop_example.txt).
- tools/[syscount.bt](tools/syscount.bt): Count system calls. [Examples](tools/syscount_example.txt).
......
@@ -367,17 +367,38 @@ These can be used in bpftrace scripts to document your code.
Example:
```
# bpftrace -e 'tracepoint:syscalls:sys_enter_open { printf("%s %s\n", comm, str(args->filename)); }'
Attaching 1 probe...
snmpd /proc/diskstats
snmpd /proc/stat
snmpd /proc/vmstat
[...]
```
This is returning the `filename` member from the `args` struct, which for tracepoint probes contains the tracepoint arguments. See the [Static Tracing, Kernel-Level Arguments](#6-tracepoint-static-tracing-kernel-level-arguments) section for the contents of this struct.
Here is an example of dynamic tracing of the `vfs_open()` kernel function, via the short script path.bt:
```
# cat path.bt
#include <linux/path.h>
#include <linux/dcache.h>

kprobe:vfs_open
{
    printf("open path: %s\n", str(((path *)arg0)->dentry->d_name.name));
}
# bpftrace path.bt
Attaching 1 probe...
open path: dev
open path: if_inet6
open path: retrans_time_ms
[...]
```
Some kernel headers needed to be included to understand the `path` and `dentry` structs.
# Probes
- `kprobe` - kernel function start
@@ -456,7 +477,7 @@ returned: 21
[...]
```
See [C Struct Navigation](#4---c-struct-navigation) for an example of accessing kprobe struct arguments.
## 3. `uprobe`/`uretprobe`: Dynamic Tracing, User-Level
......
.TH dcsnoop 8 "2018-09-08" "USER COMMANDS"
.SH NAME
dcsnoop.bt \- Trace directory entry cache (dcache) lookups. Uses bpftrace/eBPF.
.SH SYNOPSIS
.B dcsnoop.bt
.SH DESCRIPTION
By default, this traces every dcache lookup, and shows the
process performing the lookup and the filename requested.
The output of this tool can be verbose, and is intended for further
investigations of dcache performance beyond dcstat(8), which prints
per-second summaries.
This uses kernel dynamic tracing of the lookup_fast() and d_lookup() functions,
and will need updating to match any changes to these functions.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bpftrace.
.SH EXAMPLES
.TP
Trace all dcache lookups:
#
.B dcsnoop.bt
.SH FIELDS
.TP
TIME(ms)
Time of lookup, in milliseconds.
.TP
PID
Process ID.
.TP
COMM
Process name.
.TP
T
Type: R == reference, M == miss. A miss will print two
lines, one for the reference, and one for the miss.
.TP
FILE
The file name component that was being looked up. This contains trailing
pathname components (after '/'), which will be the subject of subsequent
lookups.
.SH OVERHEAD
File name lookups can be frequent (depending on the workload), and this tool
prints a line for each reference, plus a second line for each miss. The
output may be verbose, and the incurred overhead, while optimized to some
extent, may still be from noticeable to significant. This is only really
intended for deeper investigations beyond dcstat(8), when absolutely necessary.
Measure and quantify the overhead in a test environment before use.
.SH SOURCE
This is from bpftrace.
.IP
https://github.com/iovisor/bpftrace
.PP
Also look in the bpftrace distribution for a companion _example.txt file containing
example usage, output, and commentary for this tool.
This is a bpftrace version of the bcc tool of the same name. The bcc tool
may provide more options and customizations.
.IP
https://github.com/iovisor/bcc
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
dcstat(8)
.TH mdflush 8 "2018-09-07" "USER COMMANDS"
.SH NAME
mdflush.bt \- Trace md flush events. Uses bpftrace/eBPF.
.SH SYNOPSIS
.B mdflush.bt
.SH DESCRIPTION
This tool traces flush events by md, the Linux multiple device driver
(software RAID). The timestamp and md device for the flush are printed.
Knowing when these flushes happen can be useful for correlation with
unexplained spikes in disk latency.
This works by tracing the kernel md_flush_request() function using kernel
dynamic tracing, and will need updating to match any changes to this function.
Note that the flushes themselves are likely to originate from higher in the
I/O stack, such as from the file systems.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bpftrace.
.SH EXAMPLES
.TP
Trace md flush events:
#
.B mdflush.bt
.SH FIELDS
.TP
TIME
Time of the flush event (HH:MM:SS).
.TP
PID
The process ID that was on-CPU when the event was issued. This may identify
the cause of the flush (eg, the "sync" command), but will often identify a
kernel worker thread that was managing I/O.
.TP
COMM
The command name for the PID.
.TP
DEVICE
The md device name.
.SH OVERHEAD
Expected to be negligible.
.SH SOURCE
This is from bpftrace.
.IP
https://github.com/iovisor/bpftrace
.PP
Also look in the bpftrace distribution for a companion _example.txt file containing
example usage, output, and commentary for this tool.
This is a bpftrace version of the bcc tool of the same name. The bcc tool
may provide more options and customizations.
.IP
https://github.com/iovisor/bcc
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
biosnoop(8)
.TH oomkill 8 "2018-09-07" "USER COMMANDS"
.SH NAME
oomkill.bt \- Trace OOM killer. Uses bpftrace/eBPF.
.SH SYNOPSIS
.B oomkill.bt
.SH DESCRIPTION
This traces the kernel out-of-memory killer, and prints basic details,
including the system load averages at the time of the OOM kill. This can
provide more context on the system state at the time: was it getting busier
or steady, based on the load averages? This tool may also be useful to
customize for investigations; for example, by adding other task_struct
details at the time of OOM, or by adding other commands to run at the shell.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bpftrace.
.SH EXAMPLES
.TP
Trace OOM kill events:
#
.B oomkill.bt
.SH FIELDS
.TP
Triggered by ...
The process ID and process name of the task that was running when another task was OOM
killed.
.TP
OOM kill of ...
The process ID and name of the target process that was OOM killed.
.TP
loadavg
Contents of /proc/loadavg. The first three numbers are 1, 5, and 15 minute
load averages (where the average is an exponentially damped moving sum, and
those numbers are constants in the equation); then there is the number of
running tasks, a slash, and the total number of tasks; and then the last number
is the last PID to be created.
.SH OVERHEAD
Negligible.
.SH SOURCE
This is from bpftrace.
.IP
https://github.com/iovisor/bpftrace
.PP
Also look in the bpftrace distribution for a companion _example.txt file containing
example usage, output, and commentary for this tool.
This is a bpftrace version of the bcc tool of the same name. The bcc tool
may provide more options and customizations.
.IP
https://github.com/iovisor/bcc
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
dmesg(1)
.TH runqlen 8 "2018-10-07" "USER COMMANDS"
.SH NAME
runqlen.bt \- CPU scheduler run queue length as a histogram. Uses bpftrace/eBPF.
.SH SYNOPSIS
.B runqlen.bt
.SH DESCRIPTION
This program summarizes scheduler queue length as a histogram, and can also
show run queue occupancy. It works by sampling the run queue length on all
CPUs at 99 Hertz.
This tool can be used to identify imbalances, eg, when processes are bound
to CPUs causing queueing, or interrupt mappings causing the same.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bpftrace.
.SH EXAMPLES
.TP
Trace CPU run queue length system wide, printing a histogram on Ctrl-C:
#
.B runqlen.bt
.SH FIELDS
.TP
1st, 2nd
The run queue length is shown in the first field (after "[").
.TP
3rd
A column showing the count of samples for that length.
.TP
4th
This is an ASCII histogram representing the count column.
.SH OVERHEAD
This samples scheduler structs at 99 Hertz across all CPUs. Relatively,
this is a low rate of events, and the overhead of this tool is expected
to be near zero.
.SH SOURCE
This is from bpftrace.
.IP
https://github.com/iovisor/bpftrace
.PP
Also look in the bpftrace distribution for a companion _example.txt file containing
example usage, output, and commentary for this tool.
This is a bpftrace version of the bcc tool of the same name. The bcc tool
may provide more options and customizations.
.IP
https://github.com/iovisor/bcc
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
mpstat(1), pidstat(1), uptime(1)
/*
* dcsnoop Trace directory entry cache (dcache) lookups.
* For Linux, uses bpftrace and eBPF.
*
* This uses kernel dynamic tracing of kernel functions, lookup_fast() and
* d_lookup(), which will need to be modified to match kernel changes. See
* code comments.
*
* USAGE: dcsnoop.bt
*
* Copyright 2018 Netflix, Inc.
* Licensed under the Apache License, Version 2.0 (the "License")
*
* 08-Sep-2018 Brendan Gregg Created this.
*/
#include <linux/fs.h>
#include <linux/sched.h>

// from fs/namei.c:
struct nameidata {
    struct path path;
    struct qstr last;
    // [...]
}

BEGIN
{
    printf("Tracing dcache lookups... Hit Ctrl-C to end.\n");
    printf("%-8s %-6s %-16s %1s %s\n", "TIME", "PID", "COMM", "T", "FILE");
    @epoch = nsecs;
}

// comment out this block to avoid showing hits:
kprobe:lookup_fast
{
    $nd = (nameidata *)arg0;
    printf("%-8d %-6d %-16s R %s\n", (nsecs - @epoch) / 1000000, pid, comm,
        str($nd->last.name));
}

kprobe:d_lookup
{
    $name = (qstr *)arg1;
    @fname[tid] = $name->name;
}

kretprobe:d_lookup
/@fname[tid]/
{
    printf("%-8d %-6d %-16s M %s\n", (nsecs - @epoch) / 1000000, pid, comm,
        str(@fname[tid]));
    delete(@fname[tid]);
}
Demonstrations of dcsnoop, the Linux bpftrace/eBPF version.
dcsnoop traces directory entry cache (dcache) lookups, and can be used for
further investigation beyond dcstat(8). The output is likely verbose, as
dcache lookups are likely frequent. For example:
# dcsnoop.bt
Attaching 4 probes...
Tracing dcache lookups... Hit Ctrl-C to end.
TIME PID COMM T FILE
427 1518 irqbalance R proc/interrupts
427 1518 irqbalance R interrupts
427 1518 irqbalance R proc/stat
427 1518 irqbalance R stat
483 2440 snmp-pass R proc/cpuinfo
483 2440 snmp-pass R cpuinfo
486 2440 snmp-pass R proc/stat
486 2440 snmp-pass R stat
834 1744 snmpd R proc/net/dev
834 1744 snmpd R net/dev
834 1744 snmpd R self/net
834 1744 snmpd R 1744
834 1744 snmpd R net
834 1744 snmpd R dev
834 1744 snmpd R proc/net/if_inet6
834 1744 snmpd R net/if_inet6
834 1744 snmpd R self/net
834 1744 snmpd R 1744
834 1744 snmpd R net
834 1744 snmpd R if_inet6
835 1744 snmpd R sys/class/net/docker0/device/vendor
835 1744 snmpd R class/net/docker0/device/vendor
835 1744 snmpd R net/docker0/device/vendor
835 1744 snmpd R docker0/device/vendor
835 1744 snmpd R devices/virtual/net/docker0
835 1744 snmpd R virtual/net/docker0
835 1744 snmpd R net/docker0
835 1744 snmpd R docker0
835 1744 snmpd R device/vendor
835 1744 snmpd R proc/sys/net/ipv4/neigh/docker0/retrans_time_ms
835 1744 snmpd R sys/net/ipv4/neigh/docker0/retrans_time_ms
835 1744 snmpd R net/ipv4/neigh/docker0/retrans_time_ms
835 1744 snmpd R ipv4/neigh/docker0/retrans_time_ms
835 1744 snmpd R neigh/docker0/retrans_time_ms
835 1744 snmpd R docker0/retrans_time_ms
835 1744 snmpd R retrans_time_ms
835 1744 snmpd R proc/sys/net/ipv6/neigh/docker0/retrans_time_ms
835 1744 snmpd R sys/net/ipv6/neigh/docker0/retrans_time_ms
835 1744 snmpd R net/ipv6/neigh/docker0/retrans_time_ms
835 1744 snmpd R ipv6/neigh/docker0/retrans_time_ms
835 1744 snmpd R neigh/docker0/retrans_time_ms
835 1744 snmpd R docker0/retrans_time_ms
835 1744 snmpd R retrans_time_ms
835 1744 snmpd R proc/sys/net/ipv6/conf/docker0/forwarding
835 1744 snmpd R sys/net/ipv6/conf/docker0/forwarding
835 1744 snmpd R net/ipv6/conf/docker0/forwarding
835 1744 snmpd R ipv6/conf/docker0/forwarding
835 1744 snmpd R conf/docker0/forwarding
[...]
5154 934 cksum R usr/bin/basename
5154 934 cksum R bin/basename
5154 934 cksum R basename
5154 934 cksum R usr/bin/bashbug
5154 934 cksum R bin/bashbug
5154 934 cksum R bashbug
5154 934 cksum M bashbug
5155 934 cksum R usr/bin/batch
5155 934 cksum R bin/batch
5155 934 cksum R batch
5155 934 cksum M batch
5155 934 cksum R usr/bin/bc
5155 934 cksum R bin/bc
5155 934 cksum R bc
5155 934 cksum M bc
5169 934 cksum R usr/bin/bdftopcf
5169 934 cksum R bin/bdftopcf
5169 934 cksum R bdftopcf
5169 934 cksum M bdftopcf
5173 934 cksum R usr/bin/bdftruncate
5173 934 cksum R bin/bdftruncate
5173 934 cksum R bdftruncate
5173 934 cksum M bdftruncate
The way the dcache is currently implemented, each component of a path is
checked in turn. The first line, showing "proc/interrupts" from irqbalance,
will be a lookup for "proc" in a directory (that isn't shown here). If it
finds "proc", it will then look up "interrupts" inside proc.
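The component-by-component walk described above can be modeled with a short Python sketch. It is only an illustration of how the FILE column relates to the path being resolved: the function name `lookup_components` is made up for this example, and the model ignores symlinks (such as /proc/net pointing to self/net in the output above), which cause extra lookups in practice.

```python
def lookup_components(path):
    """Return the remaining-path string seen at each lookup step.

    Each entry starts at the component being looked up and still carries
    the trailing components, matching the FILE column of dcsnoop.
    """
    parts = [p for p in path.split("/") if p]
    return ["/".join(parts[i:]) for i in range(len(parts))]


for remaining in lookup_components("proc/net/dev"):
    # The component actually being looked up is the text before the first '/'.
    component = remaining.split("/", 1)[0]
    print(f"lookup of {component}: FILE shows {remaining}")
```

Reading the output this way, only the leading component of each FILE value is the subject of that lookup; the rest is what later lookups will resolve.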
The script is easily modifiable to only show misses, reducing the volume of
the output. Or use the bcc version of this tool, which only shows misses by
default: https://github.com/iovisor/bcc
/*
* mdflush Trace md flush events.
* For Linux, uses bpftrace and eBPF.
*
* USAGE: mdflush.bt
*
* This is a bpftrace version of the bcc tool of the same name.
*
* Copyright 2018 Netflix, Inc.
* Licensed under the Apache License, Version 2.0 (the "License")
*
* 08-Sep-2018 Brendan Gregg Created this.
*/
#include <linux/genhd.h>
#include <linux/bio.h>

BEGIN
{
    printf("Tracing md flush events... Hit Ctrl-C to end.\n");
    // end the header with a newline so the first event starts on its own line
    printf("%-8s %-6s %-16s %s\n", "TIME", "PID", "COMM", "DEVICE");
}

kprobe:md_flush_request
{
    time("%H:%M:%S ");
    printf("%-6d %-16s %s\n", pid, comm, ((bio *)arg1)->bi_disk->disk_name);
}
Demonstrations of mdflush, the Linux bpftrace/eBPF version.
The mdflush tool traces flushes at the md driver level, and prints details
including the time of the flush:
# ./mdflush.bt
Tracing md flush events... Hit Ctrl-C to end.
TIME PID COMM DEVICE
03:13:49 16770 sync md0
03:14:08 16864 sync md0
03:14:49 496 kworker/1:0H md0
03:14:49 488 xfsaild/md0 md0
03:14:54 488 xfsaild/md0 md0
03:15:00 488 xfsaild/md0 md0
03:15:02 85 kswapd0 md0
03:15:02 488 xfsaild/md0 md0
03:15:05 488 xfsaild/md0 md0
03:15:08 488 xfsaild/md0 md0
03:15:10 488 xfsaild/md0 md0
03:15:11 488 xfsaild/md0 md0
03:15:11 488 xfsaild/md0 md0
03:15:11 488 xfsaild/md0 md0
03:15:11 488 xfsaild/md0 md0
03:15:11 488 xfsaild/md0 md0
03:15:12 488 xfsaild/md0 md0
03:15:13 488 xfsaild/md0 md0
03:15:15 488 xfsaild/md0 md0
03:15:19 496 kworker/1:0H md0
03:15:49 496 kworker/1:0H md0
03:15:55 18840 sync md0
03:16:49 496 kworker/1:0H md0
03:17:19 496 kworker/1:0H md0
03:20:19 496 kworker/1:0H md0
03:21:19 496 kworker/1:0H md0
03:21:49 496 kworker/1:0H md0
03:25:19 496 kworker/1:0H md0
[...]
This can be useful for correlation with latency outliers or spikes in disk
latency, as measured using another tool (eg, system monitoring). If spikes in
disk latency often coincide with md flush events, then it would make flushing
a target for tuning.
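As a rough illustration of that correlation step, the Python sketch below checks which latency spikes fall within a few seconds of a flush. The flush times are taken from the TIME column above; the spike times, the function names, and the 2 second window are hypothetical values chosen for the example, and timestamps that wrap past midnight are not handled.

```python
from datetime import datetime, timedelta


def parse_hms(text):
    """Parse an HH:MM:SS timestamp as printed by mdflush.bt."""
    return datetime.strptime(text, "%H:%M:%S")


def spikes_near_flushes(flush_times, spike_times, window_secs=2):
    """Return the spikes that occurred within window_secs of any flush."""
    window = timedelta(seconds=window_secs)
    flushes = [parse_hms(t) for t in flush_times]
    matched = []
    for spike in spike_times:
        when = parse_hms(spike)
        if any(abs(when - flush) <= window for flush in flushes):
            matched.append(spike)
    return matched


flushes = ["03:13:49", "03:14:08", "03:14:49", "03:15:02"]  # from the output above
spikes = ["03:14:09", "03:14:30", "03:15:01"]  # hypothetical monitoring data
print(spikes_near_flushes(flushes, spikes))
```

If most spikes land in the matched list, flushing is a plausible tuning target; if few do, the spikes likely have another cause.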
Note that the flush events are likely to originate from higher in the I/O
stack, such as from file systems. This traces md processing them, and the
timestamp corresponds with when md began to issue the flush to disks.
There is another version of this tool in bcc: https://github.com/iovisor/bcc
/*
* oomkill Trace OOM killer.
* For Linux, uses bpftrace and eBPF.
*
* This traces the kernel out-of-memory killer, and prints basic details,
* including the system load averages. This can provide more context on the
* system state at the time of OOM: was it getting busier or steady, based
* on the load averages? This tool may also be useful to customize for
* investigations; for example, by adding other task_struct details at the
* time of the OOM, or other commands in the system() call.
*
* This currently works by using kernel dynamic tracing of oom_kill_process().
*
* USAGE: oomkill.bt
*
* Copyright 2018 Netflix, Inc.
* Licensed under the Apache License, Version 2.0 (the "License")
*
* 07-Sep-2018 Brendan Gregg Created this.
*/
#include <linux/oom.h>

BEGIN
{
    printf("Tracing oom_kill_process()... Hit Ctrl-C to end.\n");
}

kprobe:oom_kill_process
{
    $oc = (oom_control *)arg1;
    time("%H:%M:%S ");
    printf("Triggered by PID %d (\"%s\"), ", pid, comm);
    printf("OOM kill of PID %d (\"%s\"), %d pages, loadavg: ",
        $oc->chosen->pid, $oc->chosen->comm, $oc->totalpages);
    system("cat /proc/loadavg");
}
Demonstrations of oomkill, the Linux bpftrace/eBPF version.
oomkill is a simple program that traces the Linux out-of-memory (OOM) killer,
and shows basic details on one line per OOM kill:
# ./oomkill.bt
Tracing oom_kill_process()... Hit Ctrl-C to end.
21:03:39 Triggered by PID 3297 ("ntpd"), OOM kill of PID 22516 ("perl"), 3850642 pages, loadavg: 0.99 0.39 0.30 3/282 22724
21:03:48 Triggered by PID 22517 ("perl"), OOM kill of PID 22517 ("perl"), 3850642 pages, loadavg: 0.99 0.41 0.30 2/282 22932
The first line shows that PID 22516, with process name "perl", was OOM killed
when it reached 3850642 pages (usually 4 Kbytes per page). This OOM kill
happened to be triggered by PID 3297, process name "ntpd", doing some memory
allocation.
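To put the page count in more familiar units, here is a small Python sketch of the conversion. It assumes the usual 4 Kbyte page size mentioned above; the helper name `pages_to_gib` is made up for this example, and systems with a different page size would need a different constant.

```python
PAGE_SIZE_BYTES = 4096  # assumption: 4 Kbyte pages, as noted in the text


def pages_to_gib(pages, page_size=PAGE_SIZE_BYTES):
    """Convert a page count to GiB (2**30 bytes)."""
    return pages * page_size / 2**30


print(f"{pages_to_gib(3850642):.2f} GiB")  # prints "14.69 GiB"
```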
The system log (dmesg) shows pages of details and system context about an OOM
kill. What it currently lacks, however, is context on how the system had been
changing over time. I've seen OOM kills where I wanted to know if the system
was at steady state at the time, or if there had been a recent increase in
workload that triggered the OOM event. oomkill provides some context: at the
end of the line is the load average information from /proc/loadavg. For both
of the oomkills here, we can see that the system was getting busier at the
time (a higher 1 minute "average" of 0.99, compared to the 15 minute "average"
of 0.30).
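The comparison of the 1 minute and 15 minute figures can be sketched in Python as follows. The field layout follows /proc/loadavg as printed at the end of each oomkill line; the function names and the simple "increasing / decreasing / steady" labels are assumptions made for this illustration, not part of the tool.

```python
def parse_loadavg(line):
    """Split a /proc/loadavg line into named fields."""
    one, five, fifteen, tasks, last_pid = line.split()
    running, total = tasks.split("/")
    return {
        "load1": float(one),
        "load5": float(five),
        "load15": float(fifteen),
        "running": int(running),
        "total": int(total),
        "last_pid": int(last_pid),
    }


def trend(load):
    """Compare the 1 minute figure against the 15 minute figure."""
    if load["load1"] > load["load15"]:
        return "increasing"
    if load["load1"] < load["load15"]:
        return "decreasing"
    return "steady"


sample = "0.99 0.39 0.30 3/282 22724"  # from the first oomkill line above
print(trend(parse_loadavg(sample)))  # prints "increasing"
```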
oomkill can also be the basis of other tools and customizations. For example,
you can edit it to include other task_struct details from the target PID at
the time of the OOM kill, or to run other commands from the shell.
There is another version of this tool in bcc: https://github.com/iovisor/bcc
/*
* runqlen.bt CPU scheduler run queue length as a histogram.
* For Linux, uses bpftrace, eBPF.
*
* This is a bpftrace version of the bcc tool of the same name.
*
* Copyright 2018 Netflix, Inc.
* Licensed under the Apache License, Version 2.0 (the "License")
*
* 07-Oct-2018 Brendan Gregg Created this.
*/
#include <linux/sched.h>

// Until BTF is available, we'll need to declare some of this struct manually,
// since it isn't available to be #included. This will need maintenance to match
// your kernel version. It is from kernel/sched/sched.h:
struct cfs_rq_partial {
    struct load_weight load;
    unsigned long runnable_weight;
    unsigned int nr_running;
    unsigned int h_nr_running;
}

BEGIN
{
    printf("Sampling run queue length at 99 Hertz... Hit Ctrl-C to end.\n");
}

profile:hz:99
{
    $task = (task_struct *)curtask;
    $my_q = (cfs_rq_partial *)$task->se.cfs_rq;
    $len = $my_q->nr_running;
    $len = $len > 0 ? $len - 1 : 0;    // subtract currently running task
    @runqlen = lhist($len, 0, 100, 1);
}
Demonstrations of runqlen, the Linux BPF/bpftrace version.
This tool samples the length of the CPU scheduler run queues, showing these
sampled lengths as a histogram. This can be used to characterize demand for
CPU resources. For example:
# runqlen.bt
Attaching 2 probes...
Sampling run queue length at 99 Hertz... Hit Ctrl-C to end.
^C
@runqlen:
[0, 1) 1967 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1, 2) 0 | |
[2, 3) 0 | |
[3, 4) 306 |@@@@@@@@ |
This output shows that the run queue length was usually zero, except for some
samples where it was 3. This was caused by binding 4 CPU bound threads to a
single CPU.
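The bucketing behind this histogram can be approximated in Python. This is a simplified model, not how bpftrace implements lhist(): the function name `runq_histogram` is made up for the example, out-of-range values are clamped instead of going into separate underflow and overflow buckets, and the sample values are constructed to reproduce the counts shown above.

```python
from collections import Counter


def runq_histogram(nr_running_samples, lo=0, hi=100, step=1):
    """Count sampled queue lengths into linear buckets of width `step`."""
    hist = Counter()
    for nr_running in nr_running_samples:
        # Subtract the task currently on-CPU, as runqlen.bt does.
        length = nr_running - 1 if nr_running > 0 else 0
        clamped = min(max(length, lo), hi)
        bucket = lo + ((clamped - lo) // step) * step
        hist[bucket] += 1
    return hist


# Constructed samples: one runnable task on most samples, four on the rest.
samples = [1] * 1967 + [4] * 306
hist = runq_histogram(samples)
for start in sorted(hist):
    print(f"[{start}, {start + 1}) {hist[start]}")
```

With four runnable threads on one CPU, one is running and three are waiting, which is why the non-zero samples land in the [3, 4) bucket.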
There is another version of this tool in bcc: https://github.com/iovisor/bcc
The bcc version provides options to customize the output.