Commit 2fc5fc4b authored by Sasha Goldshtein

Merge branch 'master' into memleak-enh

parents 50459640 849f83d2
......@@ -72,7 +72,9 @@ Tools:
- tools/[bitesize](tools/bitesize.py): Show per process I/O size histogram. [Examples](tools/bitesize_example.txt).
- tools/[cachestat](tools/cachestat.py): Trace page cache hit/miss ratio. [Examples](tools/cachestat_example.txt).
- tools/[execsnoop](tools/execsnoop.py): Trace new processes via exec() syscalls. [Examples](tools/execsnoop_example.txt).
- tools/[filelife](tools/filelife.py): Trace the lifespan of short-lived files. [Examples](tools/filelife_example.txt).
- tools/[fileslower](tools/fileslower.py): Trace slow synchronous file reads and writes. [Examples](tools/fileslower_example.txt).
- tools/[filetop](tools/filetop.py): File reads and writes by filename and process. Top for files. [Examples](tools/filetop_example.txt).
- tools/[funccount](tools/funccount.py): Count kernel function calls. [Examples](tools/funccount_example.txt).
- tools/[funclatency](tools/funclatency.py): Time kernel functions and show their latency distribution. [Examples](tools/funclatency_example.txt).
- tools/[gethostlatency](tools/gethostlatency.py): Show latency for getaddrinfo/gethostbyname[2] calls. [Examples](tools/gethostlatency_example.txt).
......@@ -87,6 +89,7 @@ Tools:
- tools/[softirqs](tools/softirqs.py): Measure soft IRQ (soft interrupt) event time. [Examples](tools/softirqs_example.txt).
- tools/[stackcount](tools/stackcount.py): Count kernel function calls and their stack traces. [Examples](tools/stackcount_example.txt).
- tools/[stacksnoop](tools/stacksnoop.py): Trace a kernel function and print all kernel stack traces. [Examples](tools/stacksnoop_example.txt).
- tools/[statsnoop](tools/statsnoop.py): Trace stat() syscalls. [Examples](tools/statsnoop_example.txt).
- tools/[syncsnoop](tools/syncsnoop.py): Trace sync() syscall. [Examples](tools/syncsnoop_example.txt).
- tools/[tcpaccept](tools/tcpaccept.py): Trace TCP passive connections (accept()). [Examples](tools/tcpaccept_example.txt).
- tools/[tcpconnect](tools/tcpconnect.py): Trace TCP active connections (connect()). [Examples](tools/tcpconnect_example.txt).
......
......@@ -22,6 +22,22 @@ and will need updating to match any changes to these functions.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bcc.
.SH OPTIONS
.TP
\-C
Don't clear the screen.
.TP
\-r MAXROWS
Maximum number of rows to print. Default is 20.
.TP
\-p PID
Trace this PID only.
.TP
interval
Interval between updates, seconds.
.TP
count
Number of interval summaries.
.SH EXAMPLES
.TP
Summarize block device I/O by process, 1 second screen refresh:
......@@ -86,5 +102,7 @@ Linux
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH INSPIRATION
top(1) by William LeFebvre
.SH SEE ALSO
biosnoop(8), biolatency(8), iostat(1)
......@@ -2,10 +2,10 @@
.SH NAME
execsnoop \- Trace new processes via exec() syscalls. Uses Linux eBPF/bcc.
.SH SYNOPSIS
.B execsnoop [\-h] [\-t] [\-X] [\-n NAME]
.B execsnoop [\-h] [\-t] [\-x] [\-n NAME]
.SH DESCRIPTION
execsnoop traces new processes, showing the filename executed, argument
list, and return value (0 for success).
execsnoop traces new processes, showing the filename executed and argument
list.
It works by tracing the execve() system call (commonly used exec() variant).
This catches new processes that follow the fork->exec sequence, as well as
......@@ -27,8 +27,8 @@ Print usage message.
\-t
Include a timestamp column.
.TP
\-X
Exclude failed exec()s
\-x
Include failed exec()s
.TP
\-n NAME
Only print command lines matching this name (regex), matched anywhere
......@@ -42,9 +42,9 @@ Trace all exec() syscalls, and include timestamps:
#
.B execsnoop \-t
.TP
Only trace successful exec()s:
Include failed exec()s:
#
.B execsnoop \-X
.B execsnoop \-x
.TP
Only trace exec()s where the filename or arguments contain "mount":
#
......@@ -61,7 +61,8 @@ PID
Process ID
.TP
RET
Return value of exec(). 0 == success.
Return value of exec(). 0 == success. Failures are only shown when using the
\-x option.
.TP
ARGS
Filename for the exec(), followed by up to 19 arguments. An ellipsis "..." is
......
.TH filelife 8 "2016-02-08" "USER COMMANDS"
.SH NAME
filelife \- Trace the lifespan of short-lived files. Uses Linux eBPF/bcc.
.SH SYNOPSIS
.B filelife [\-h] [\-p PID]
.SH DESCRIPTION
This traces the creation and deletion of files, providing information
on who deleted the file, the file age, and the file name. The intent is to
provide information on short-lived files, for debugging or performance
analysis.
This works by tracing the kernel vfs_create() and vfs_unlink() functions using
dynamic tracing, and will need updating to match any changes to these
functions.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bcc.
.SH OPTIONS
.TP
\-h
Print usage message.
.TP
\-p PID
Trace this process ID only (filtered in-kernel).
.SH EXAMPLES
.TP
Trace all short-lived files, and print details:
#
.B filelife
.TP
Trace all short-lived files created AND deleted by PID 181:
#
.B filelife \-p 181
.SH FIELDS
.TP
TIME
Time of the deletion.
.TP
PID
Process ID that deleted the file.
.TP
COMM
Process name for the PID.
.TP
AGE(s)
Age of the file, from creation to deletion, in seconds.
.TP
FILE
Filename.
.SH OVERHEAD
This traces the kernel VFS file create and delete functions and prints output
for each delete. As the rate of this is generally expected to be low
(< 1000/s), the overhead is also expected to be negligible.
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.PP
Also look in the bcc distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
opensnoop(1)
.TH filetop 8 "2016-02-08" "USER COMMANDS"
.SH NAME
filetop \- File reads and writes by filename and process. Top for files.
.SH SYNOPSIS
.B filetop [\-h] [\-C] [\-r MAXROWS] [\-p PID] [interval] [count]
.SH DESCRIPTION
This is top for files.
This traces file reads and writes, and prints a per-file summary every
interval (by default, 1 second). The summary is sorted on the highest read
throughput (Kbytes).
This uses in-kernel eBPF maps to store per process summaries for efficiency.
This script works by tracing the __vfs_read() and __vfs_write() functions using
kernel dynamic tracing, which instruments explicit read and write calls. If
files are read or written using another means (eg, via mmap()), then they
will not be visible using this tool. Also, this tool will need updating to
match any code changes to those vfs functions.
This should be useful for file system workload characterization when analyzing
the performance of applications.
Note that tracing VFS level reads and writes can be a frequent activity, and
this tool can begin to cost measurable overhead at high I/O rates.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bcc.
.SH OPTIONS
.TP
\-C
Don't clear the screen.
.TP
\-r MAXROWS
Maximum number of rows to print. Default is 20.
.TP
\-p PID
Trace this PID only.
.TP
interval
Interval between updates, seconds.
.TP
count
Number of interval summaries.
.SH EXAMPLES
.TP
Summarize file I/O by file and process, 1 second screen refresh:
#
.B filetop
.TP
Don't clear the screen, and top 8 rows only:
#
.B filetop -Cr 8
.TP
5 second summaries, 10 times only:
#
.B filetop 5 10
.SH FIELDS
.TP
loadavg:
The contents of /proc/loadavg
.TP
PID
Process ID.
.TP
COMM
Process name.
.TP
READS
Count of reads during interval.
.TP
WRITES
Count of writes during interval.
.TP
R_Kb
Total read Kbytes during interval.
.TP
W_Kb
Total write Kbytes during interval.
.TP
T
Type of file: R == regular, S == socket, O == other (pipe, etc).
.SH OVERHEAD
Depending on the frequency of application reads and writes, overhead can become
significant, in the worst case slowing applications by over 50%. Hopefully for
real world workloads the overhead is much less -- test before use. The reason
for the high overhead is that VFS reads and writes can be a frequent event, and
despite the eBPF overhead being very small per event, if you multiply this
small overhead by a million events per second, it becomes a million times
worse. Literally. You can gauge the number of reads and writes using the
vfsstat(8) tool, also from bcc.
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.PP
Also look in the bcc distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH INSPIRATION
top(1) by William LeFebvre
.SH SEE ALSO
vfsstat(8), vfscount(8), fileslower(8)
.TH statsnoop 8 "2016-02-08" "USER COMMANDS"
.SH NAME
statsnoop \- Trace stat() syscalls. Uses Linux eBPF/bcc.
.SH SYNOPSIS
.B statsnoop [\-h] [\-t] [\-x] [\-p PID]
.SH DESCRIPTION
statsnoop traces the different stat() syscalls, showing which processes are
attempting to read information about which files. This can be useful for
determining the location of config and log files, or for troubleshooting
applications that are failing, especially on startup.
This works by tracing various kernel sys_stat() functions using dynamic
tracing, and will need updating to match any changes to these functions.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bcc.
.SH OPTIONS
.TP
\-h
Print usage message.
.TP
\-t
Include a timestamp column.
.TP
\-x
Only print failed stats.
.TP
\-p PID
Trace this process ID only (filtered in-kernel).
.SH EXAMPLES
.TP
Trace all stat() syscalls:
#
.B statsnoop
.TP
Trace all stat() syscalls, and include timestamps:
#
.B statsnoop \-t
.TP
Trace only stat() syscalls that failed:
#
.B statsnoop \-x
.TP
Trace PID 181 only:
#
.B statsnoop \-p 181
.SH FIELDS
.TP
TIME(s)
Time of the call, in seconds.
.TP
PID
Process ID
.TP
COMM
Process name
.TP
FD
Return value of the stat() call: 0 (if success), or -1 (if failed)
.TP
ERR
Error number (see the system's errno.h)
.TP
PATH
Path passed to stat()
.SH OVERHEAD
This traces the kernel stat function and prints output for each event. As the
rate of this is generally expected to be low (< 1000/s), the overhead is also
expected to be negligible. If you have an application that is calling a high
rate of stat()s, then test and understand overhead before use.
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.PP
Also look in the bcc distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
opensnoop(1)
......@@ -4,7 +4,7 @@
# execsnoop Trace new processes via exec() syscalls.
# For Linux, uses BCC, eBPF. Embedded C.
#
# USAGE: execsnoop [-h] [-t] [-X] [-n NAME]
# USAGE: execsnoop [-h] [-t] [-x] [-n NAME]
#
# This currently will print up to a maximum of 19 arguments, plus the process
# name, so 20 fields in total (MAXARG).
......@@ -24,7 +24,7 @@ import re
# arguments
examples = """examples:
./execsnoop # trace all exec() syscalls
./execsnoop -X # only show successful exec()s
./execsnoop -x # include failed exec()s
./execsnoop -t # include timestamps
./execsnoop -n main # only print command lines containing "main"
"""
......@@ -34,8 +34,8 @@ parser = argparse.ArgumentParser(
epilog=examples)
parser.add_argument("-t", "--timestamp", action="store_true",
help="include timestamp on output")
parser.add_argument("-X", "--excludefails", action="store_true",
help="exclude failed exec()s")
parser.add_argument("-x", "--fails", action="store_true",
help="include failed exec()s")
parser.add_argument("-n", "--name",
help="only print commands matching this name (regex), any arg")
args = parser.parse_args()
......@@ -125,17 +125,25 @@ pcomm = {}
# format output
while 1:
    (task, pid, cpu, flags, ts, msg) = b.trace_fields()
    (type, arg) = msg.split(" ", 1)
    try:
        (type, arg) = msg.split(" ", 1)
    except ValueError:
        continue
    if start_ts == 0:
        start_ts = ts
    if type == "RET":
        if pid not in cmd:
            # zero args
            cmd[pid] = ""
            pcomm[pid] = ""
        skip = 0
        if args.name:
            if not re.search(args.name, cmd[pid]):
                skip = 1
        if args.excludefails and int(arg) < 0:
        if not args.fails and int(arg) < 0:
            skip = 1
        if skip:
            del cmd[pid]
......
Demonstrations of execsnoop, the Linux eBPF/bcc version.
execsnoop traces new processes. For example:
execsnoop traces new processes. For example, tracing the commands invoked when
running "man ls":
# ./execsnoop
# ./execsnoop
PCOMM PID RET ARGS
bash 15887 0 /usr/bin/man ls
preconv 15894 0 /usr/bin/preconv -e UTF-8
man 15896 0 /usr/bin/tbl
man 15897 0 /usr/bin/nroff -mandoc -rLL=169n -rLT=169n -Tutf8
man 15898 0 /usr/bin/pager -s
nroff 15900 0 /usr/bin/locale charmap
nroff 15901 0 /usr/bin/groff -mtty-char -Tutf8 -mandoc -rLL=169n -rLT=169n
groff 15902 0 /usr/bin/troff -mtty-char -mandoc -rLL=169n -rLT=169n -Tutf8
groff 15903 0 /usr/bin/grotty
The output shows the parent process/command name (PCOMM), the PID, the return
value of the exec() (RET), and the filename with arguments (ARGS).
This works by tracing the execve() system call (commonly used exec() variant),
and shows details of the arguments and return value. This catches new processes
that follow the fork->exec sequence, as well as processes that re-exec()
themselves. Some applications fork() but do not exec(), eg, for worker
processes, which won't be included in the execsnoop output.
The -x option can be used to include failed exec()s. For example:
# ./execsnoop -x
PCOMM PID RET ARGS
supervise 9660 0 ./run
supervise 9661 0 ./run
......@@ -21,35 +46,9 @@ run 9661 -2 /usr/local/bin/setuidgid nobody /command/multilog t
supervise 9670 0 ./run
[...]
The output shows the parent process/command name (PCOMM), the PID, the return
value of the exec() (RET), and the filename with arguments (ARGS). The example
above shows various regular system daemon activity, including some failures
(trying to execute a /usr/local/bin/setuidgid, which I just noticed doesn't
exist).
It works by tracing the execve() system call (commonly used exec() variant), and
shows details of the arguments and return value. This catches new processes
that follow the fork->exec sequence, as well as processes that re-exec()
themselves. Some applications fork() but do not exec(), eg, for worker
processes, which won't be included in the execsnoop output.
The -X option can be used to only show successful exec()s. For example, tracing
a "man ls":
# ./execsnoop -X
PCOMM PID RET ARGS
bash 15887 0 /usr/bin/man ls
preconv 15894 0 /usr/bin/preconv -e UTF-8
man 15896 0 /usr/bin/tbl
man 15897 0 /usr/bin/nroff -mandoc -rLL=169n -rLT=169n -Tutf8
man 15898 0 /usr/bin/pager -s
nroff 15900 0 /usr/bin/locale charmap
nroff 15901 0 /usr/bin/groff -mtty-char -Tutf8 -mandoc -rLL=169n -rLT=169n
groff 15902 0 /usr/bin/troff -mtty-char -mandoc -rLL=169n -rLT=169n -Tutf8
groff 15903 0 /usr/bin/grotty
This shows the various commands used to process the "man ls" command.
This example shows various regular system daemon activity, including some
failures (trying to execute a /usr/local/bin/setuidgid, which I just noticed
doesn't exist).
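The parsing and filtering behaviour described above can be sketched in plain Python. This is a minimal illustration, not the tool's actual code: the helper names parse_trace_msg and should_skip are hypothetical, and the sketch only assumes that each trace message has the form "TYPE payload" and that a negative return value marks a failed exec().

```python
import re


def parse_trace_msg(msg):
    """Split a 'TYPE payload' message; return None when it is malformed."""
    try:
        kind, arg = msg.split(" ", 1)
    except ValueError:
        # no space in the message, so there is nothing to unpack
        return None
    return kind, arg


def should_skip(ret, cmdline, include_fails=False, name_regex=None):
    """Decide whether an exec() event is hidden from the output."""
    # the name filter matches anywhere in the command line
    if name_regex and not re.search(name_regex, cmdline):
        return True
    # failed exec()s are hidden unless explicitly requested (as with -x)
    if not include_fails and ret < 0:
        return True
    return False


if __name__ == "__main__":
    print(parse_trace_msg("RET -2"))
    print(should_skip(-2, "/usr/local/bin/setuidgid nobody"))
    print(should_skip(-2, "/usr/local/bin/setuidgid nobody", include_fails=True))
```

With the defaults, the failed setuidgid event is skipped; passing include_fails=True keeps it, which corresponds to running with -x.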
A -t option can be used to include a timestamp column, and a -n option to match
......@@ -64,19 +63,19 @@ TIME(s) PCOMM PID RET ARGS
USAGE message:
# ./execsnoop -h
usage: execsnoop [-h] [-t] [-X] [-n NAME]
usage: execsnoop [-h] [-t] [-x] [-n NAME]
Trace exec() syscalls
optional arguments:
-h, --help show this help message and exit
-t, --timestamp include timestamp on output
-X, --excludefails exclude failed exec()s
-x, --fails include failed exec()s
-n NAME, --name NAME only print commands matching this name (regex), any
arg
examples:
./execsnoop # trace all exec() syscalls
./execsnoop -X # only show successful exec()s
./execsnoop -x # include failed exec()s
./execsnoop -t # include timestamps
./execsnoop -n main # only print command lines containing "main"
#!/usr/bin/python
# @lint-avoid-python-3-compatibility-imports
#
# filelife Trace the lifespan of short-lived files.
# For Linux, uses BCC, eBPF. Embedded C.
#
# This traces the creation and deletion of files, providing information
# on who deleted the file, the file age, and the file name. The intent is to
# provide information on short-lived files, for debugging or performance
# analysis.
#
# USAGE: filelife [-h] [-p PID]
#
# Copyright 2016 Netflix, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
#
# 08-Feb-2016 Brendan Gregg Created this.
from __future__ import print_function
from bcc import BPF
import argparse
from time import strftime
# arguments
examples = """examples:
./filelife # trace all short-lived files
./filelife -p 181 # only trace PID 181
"""
parser = argparse.ArgumentParser(
    description="Trace the lifespan of short-lived files",
    formatter_class=argparse.RawDescriptionHelpFormatter,
    epilog=examples)
parser.add_argument("-p", "--pid",
    help="trace this PID only")
args = parser.parse_args()
debug = 0
# define BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/fs.h>
BPF_HASH(birth, struct dentry *);
// trace file creation time
int trace_create(struct pt_regs *ctx, struct inode *dir, struct dentry *dentry)
{
    u32 pid = bpf_get_current_pid_tgid();
    FILTER
    u64 ts = bpf_ktime_get_ns();
    birth.update(&dentry, &ts);
    return 0;
};
// trace file deletion and output details
int trace_unlink(struct pt_regs *ctx, struct inode *dir, struct dentry *dentry)
{
    u32 pid = bpf_get_current_pid_tgid();
    FILTER
    u64 *tsp, delta;
    tsp = birth.lookup(&dentry);
    if (tsp == 0) {
        return 0; // missed create
    }
    delta = (bpf_ktime_get_ns() - *tsp) / 1000000;
    birth.delete(&dentry);
    if (dentry->d_iname[0] == 0)
        return 0;
    bpf_trace_printk("%d %s\\n", delta, dentry->d_iname);
    return 0;
}
"""
if args.pid:
    bpf_text = bpf_text.replace('FILTER',
        'if (pid != %s) { return 0; }' % args.pid)
else:
    bpf_text = bpf_text.replace('FILTER', '')
if debug:
    print(bpf_text)
# initialize BPF
b = BPF(text=bpf_text)
b.attach_kprobe(event="vfs_create", fn_name="trace_create")
b.attach_kprobe(event="vfs_unlink", fn_name="trace_unlink")
# header
print("%-8s %-6s %-16s %-7s %s" % ("TIME", "PID", "COMM", "AGE(s)", "FILE"))
start_ts = 0
# format output
while 1:
    (task, pid, cpu, flags, ts, msg) = b.trace_fields()
    # split on the first space only, so filenames containing spaces survive
    (delta, filename) = msg.split(" ", 1)
    # print columns
    print("%-8s %-6d %-16s %-7.2f %s" % (strftime("%H:%M:%S"), pid, task,
        float(delta) / 1000, filename))
Demonstrations of filelife, the Linux eBPF/bcc version.
filelife traces short-lived files: those that have been created and then
deleted while tracing. For example:
# ./filelife
TIME PID COMM AGE(s) FILE
05:57:59 8556 gcc 0.04 ccCB5EDe.s
05:57:59 8560 rm 0.02 .entry_64.o.d
05:57:59 8563 gcc 0.02 cc5UFHXf.s
05:57:59 8567 rm 0.01 .thunk_64.o.d
05:57:59 8578 rm 0.02 .syscall_64.o.d
05:58:00 8589 rm 0.03 .common.o.d
05:58:00 8596 rm 0.01 .8592.tmp
05:58:00 8601 rm 0.01 .8597.tmp
05:58:00 8606 rm 0.01 .8602.tmp
05:58:00 8639 rm 0.02 .vma.o.d
05:58:00 8650 rm 0.02 .vdso32-setup.o.d
05:58:00 8656 rm 0.00 .vdso.lds.d
05:58:00 8659 gcc 0.01 ccveeJAz.s
05:58:00 8663 rm 0.01 .vdso-note.o.d
05:58:00 8674 rm 0.02 .vclock_gettime.o.d
05:58:01 8684 rm 0.01 .vgetcpu.o.d
05:58:01 8690 collect2 0.00 ccvKMxdm.ld
This has caught short-lived files that were created during a Linux kernel
build. The PID shows the process ID that finally deleted the file, and COMM
is its process name. The AGE(s) column shows the age of the file, in seconds,
when it was deleted. These are all short-lived, and existed for less than
one tenth of a second.
Creating, populating, and then deleting files as part of another process can
be an inefficient method of inter-process communication. It can cause disk I/O
as files are closed and their file descriptors flushed, only later to be
deleted. As such, short-lived files can be a target of performance
optimizations.
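The bookkeeping behind the AGE(s) column can be sketched in user-space Python. This is only an illustration of the idea (remember a creation timestamp per file, then subtract it at deletion); the function name short_lived_files and the event tuples are hypothetical, and the real tool does this in-kernel with a BPF hash keyed by dentry pointer.

```python
def short_lived_files(events):
    """Return (name, age_seconds) for files both created and deleted.

    events is an iterable of (timestamp_ns, op, key, name) tuples, where
    op is "create" or "unlink" and key stands in for the dentry pointer.
    """
    birth = {}      # key -> creation timestamp in nanoseconds
    results = []
    for ts_ns, op, key, name in events:
        if op == "create":
            birth[key] = ts_ns
        elif op == "unlink":
            created = birth.pop(key, None)
            if created is None:
                continue  # creation was not observed while tracing
            age_ms = (ts_ns - created) // 1000000
            results.append((name, age_ms / 1000.0))
    return results


if __name__ == "__main__":
    demo = [
        (1000000000, "create", 1, "ccCB5EDe.s"),
        (1040000000, "unlink", 1, "ccCB5EDe.s"),
        (2000000000, "unlink", 2, "old.log"),  # created before tracing began
    ]
    for name, age in short_lived_files(demo):
        print("%-7.2f %s" % (age, name))
```

Files deleted without a recorded creation are silently ignored, which is why only files created while tracing appear in the output.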
USAGE message:
# ./filelife -h
usage: filelife [-h] [-p PID]
Trace the lifespan of short-lived files
optional arguments:
-h, --help show this help message and exit
-p PID, --pid PID trace this PID only
examples:
./filelife # trace all short-lived files
./filelife -p 181 # only trace PID 181
#!/usr/bin/python
# @lint-avoid-python-3-compatibility-imports
#
# filetop file reads and writes by process.
# For Linux, uses BCC, eBPF.
#
# USAGE: filetop.py [-h] [-C] [-r MAXROWS] [-p PID] [interval] [count]
#
# This uses in-kernel eBPF maps to store per process summaries for efficiency.
#
# Copyright 2016 Netflix, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
#
# 06-Feb-2016 Brendan Gregg Created this.
from __future__ import print_function
from bcc import BPF
from time import sleep, strftime
import argparse
import signal
from subprocess import call
# arguments
examples = """examples:
./filetop # file I/O top, 1 second refresh
./filetop -C # don't clear the screen
./filetop -p 181 # PID 181 only
./filetop 5 # 5 second summaries
./filetop 5 10 # 5 second summaries, 10 times only
"""
parser = argparse.ArgumentParser(
    description="File reads and writes by process",
    formatter_class=argparse.RawDescriptionHelpFormatter,
    epilog=examples)
parser.add_argument("-C", "--noclear", action="store_true",
    help="don't clear the screen")
parser.add_argument("-r", "--maxrows", default=20,
    help="maximum rows to print, default 20")
parser.add_argument("-p", "--pid",
    help="trace this PID only")
parser.add_argument("interval", nargs="?", default=1,
    help="output interval, in seconds")
parser.add_argument("count", nargs="?", default=99999999,
    help="number of outputs")
args = parser.parse_args()
interval = int(args.interval)
countdown = int(args.count)
maxrows = int(args.maxrows)
clear = not int(args.noclear)
debug = 0
# linux stats
loadavg = "/proc/loadavg"
# signal handler
def signal_ignore(signal, frame):
    print()
# define BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/blkdev.h>
#define MAX_FILE_LEN 32
// the key for the output summary
struct info_t {
    u32 pid;
    char name[TASK_COMM_LEN];
    char file[MAX_FILE_LEN];
    char type;
};
// the value of the output summary
struct val_t {
    u64 reads;
    u64 writes;
    u64 rbytes;
    u64 wbytes;
};
BPF_HASH(counts, struct info_t, struct val_t);
static int do_entry(struct pt_regs *ctx, struct file *file,
    char __user *buf, size_t count, int is_read)
{
    u32 pid;
    pid = bpf_get_current_pid_tgid();
    if (FILTER)
        return 0;
    // skip I/O lacking a filename
    struct dentry *de = file->f_path.dentry;
    if (de->d_iname[0] == 0)
        return 0;
    // store counts and sizes by pid & file
    struct info_t info = {.pid = pid};
    bpf_get_current_comm(&info.name, sizeof(info.name));
    __builtin_memcpy(&info.file, de->d_iname, sizeof(info.file));
    int mode = file->f_inode->i_mode;
    if (S_ISREG(mode)) {
        info.type = 'R';
    } else if (S_ISSOCK(mode)) {
        info.type = 'S';
    } else {
        info.type = 'O';
    }
    struct val_t *valp, zero = {};
    valp = counts.lookup_or_init(&info, &zero);
    if (is_read) {
        valp->reads++;
        valp->rbytes += count;
    } else {
        valp->writes++;
        valp->wbytes += count;
    }
    return 0;
}
int trace_read_entry(struct pt_regs *ctx, struct file *file,
    char __user *buf, size_t count)
{
    return do_entry(ctx, file, buf, count, 1);
}
int trace_write_entry(struct pt_regs *ctx, struct file *file,
    char __user *buf, size_t count)
{
    return do_entry(ctx, file, buf, count, 0);
}
"""
if args.pid:
    bpf_text = bpf_text.replace('FILTER', 'pid != %s' % args.pid)
else:
    bpf_text = bpf_text.replace('FILTER', '0')
if debug:
    print(bpf_text)
# initialize BPF
b = BPF(text=bpf_text)
b.attach_kprobe(event="__vfs_read", fn_name="trace_read_entry")
b.attach_kprobe(event="__vfs_write", fn_name="trace_write_entry")
print('Tracing... Output every %d secs. Hit Ctrl-C to end' % interval)
# output
exiting = 0
while 1:
    try:
        sleep(interval)
    except KeyboardInterrupt:
        exiting = 1
    # header
    if clear:
        call("clear")
    else:
        print()
    with open(loadavg) as stats:
        print("%-8s loadavg: %s" % (strftime("%H:%M:%S"), stats.read()))
    print("%-6s %-16s %-6s %-6s %-7s %-7s %1s %s" % ("PID", "COMM",
        "READS", "WRITES", "R_Kb", "W_Kb", "T", "FILE"))
    # by-PID output
    counts = b.get_table("counts")
    line = 0
    for k, v in reversed(sorted(counts.items(),
                                key=lambda counts: counts[1].rbytes)):
        # print line
        print("%-6d %-16s %-6d %-6d %-7d %-7d %1s %s" % (k.pid, k.name,
            v.reads, v.writes, v.rbytes / 1024, v.wbytes / 1024, k.type,
            k.file))
        line += 1
        if line >= maxrows:
            break
    counts.clear()
    countdown -= 1
    if exiting or countdown == 0:
        print("Detaching...")
        exit()
Demonstrations of filetop, the Linux eBPF/bcc version.
filetop shows reads and writes by file, with process details. For example:
# ./filetop -C
Tracing... Output every 1 secs. Hit Ctrl-C to end
08:00:23 loadavg: 0.91 0.33 0.23 3/286 26635
PID COMM READS WRITES R_Kb W_Kb T FILE
26628 ld 161 186 643 152 R built-in.o
26634 cc1 1 0 200 0 R autoconf.h
26618 cc1 1 0 200 0 R autoconf.h
26634 cc1 12 0 192 0 R tracepoint.h
26584 cc1 2 0 143 0 R mm.h
26634 cc1 2 0 143 0 R mm.h
26631 make 34 0 136 0 R auto.conf
26634 cc1 1 0 98 0 R fs.h
26584 cc1 1 0 98 0 R fs.h
26634 cc1 1 0 91 0 R sched.h
26634 cc1 1 0 78 0 R printk.c
26634 cc1 3 0 73 0 R mmzone.h
26628 ld 18 0 72 0 R hibernate.o
26628 ld 16 0 64 0 R suspend.o
26628 ld 16 0 64 0 R snapshot.o
26630 cat 1 0 64 0 O null
26628 ld 16 0 64 0 R qos.o
26628 ld 13 0 52 0 R main.o
26628 ld 12 0 52 0 R swap.o
12421 sshd 3 0 48 0 O ptmx
[...]
This shows various files read and written during a Linux kernel build. The
output is sorted by the total read size in Kbytes (R_Kb). This is instrumenting
at the VFS interface, so this is reads and writes that may return entirely
from the file system cache (page cache).
While not printed, the average read and write size can be calculated by
dividing R_Kb by READS, and the same for writes.
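As a small worked example of that calculation, the following sketch divides a Kbytes total by an operation count. The helper name avg_kbytes is hypothetical, and because the tool prints whole Kbytes, the result is only an approximation for small totals.

```python
def avg_kbytes(total_kb, ops):
    """Average Kbytes per operation; 0.0 when there were no operations."""
    if not ops:
        return 0.0
    return float(total_kb) / ops


if __name__ == "__main__":
    # values taken from the "ld built-in.o" row above: 161 reads, 643 R_Kb,
    # 186 writes, 152 W_Kb
    print("avg read  Kbytes: %.2f" % avg_kbytes(643, 161))
    print("avg write Kbytes: %.2f" % avg_kbytes(152, 186))
```

For the ld row this gives roughly 4 Kbytes per read and under 1 Kbyte per write.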
The "T" column indicates the type of the file: "R" for regular files, "S" for
sockets, and "O" for other (including pipes).
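The same three-way classification can be reproduced in user space with Python's standard stat module. This is a sketch for illustration only: the function name file_type_code is hypothetical, and the tool itself performs the equivalent S_ISREG()/S_ISSOCK() checks on the inode mode inside the kernel.

```python
import stat


def file_type_code(mode):
    """Map a file mode to the single-letter code used in the "T" column."""
    if stat.S_ISREG(mode):
        return "R"   # regular file
    if stat.S_ISSOCK(mode):
        return "S"   # socket
    return "O"       # anything else: pipes, character devices, ...


if __name__ == "__main__":
    print(file_type_code(stat.S_IFREG | 0o644))
    print(file_type_code(stat.S_IFSOCK | 0o600))
    print(file_type_code(stat.S_IFIFO | 0o600))
```

A mode obtained from os.stat(path).st_mode can be passed in directly to classify a real file.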
This script works by tracing the __vfs_read() and __vfs_write() functions using
kernel dynamic tracing, which instruments explicit read and write calls. If
files are read or written using another means (eg, via mmap()), then they
will not be visible using this tool.
This should be useful for file system workload characterization when analyzing
the performance of applications.
Note that tracing VFS level reads and writes can be a frequent activity, and
this tool can begin to cost measurable overhead at high I/O rates.
A -C option will stop clearing the screen, and -r with a number will restrict
the output to that many rows (20 by default). For example, not clearing
the screen and showing the top 5 only:
# ./filetop -Cr 5
Tracing... Output every 1 secs. Hit Ctrl-C to end
08:05:11 loadavg: 0.75 0.35 0.25 3/285 822
PID COMM READS WRITES R_Kb W_Kb T FILE
32672 cksum 5006 0 320384 0 R data1
12296 sshd 2 0 32 0 O ptmx
809 run 2 0 8 0 R nsswitch.conf
811 run 2 0 8 0 R nsswitch.conf
804 chown 2 0 8 0 R nsswitch.conf
08:05:12 loadavg: 0.75 0.35 0.25 3/285 845
PID COMM READS WRITES R_Kb W_Kb T FILE
32672 cksum 4986 0 319104 0 R data1
845 chown 2 0 8 0 R nsswitch.conf
828 run 2 0 8 0 R nsswitch.conf
835 run 2 0 8 0 R nsswitch.conf
830 run 2 0 8 0 R nsswitch.conf
08:05:13 loadavg: 0.75 0.35 0.25 3/285 868
PID COMM READS WRITES R_Kb W_Kb T FILE
32672 cksum 4985 0 319040 0 R data1
857 run 2 0 8 0 R nsswitch.conf
858 run 2 0 8 0 R nsswitch.conf
859 run 2 0 8 0 R nsswitch.conf
848 run 2 0 8 0 R nsswitch.conf
[...]
This output shows a cksum command reading data1.
An optional interval and optional count can also be added to the end of the
command line. For example, for 1 second interval, and 3 summaries in total:
# ./filetop -Cr 5 1 3
Tracing... Output every 1 secs. Hit Ctrl-C to end
08:08:20 loadavg: 0.30 0.42 0.31 3/282 5187
PID COMM READS WRITES R_Kb W_Kb T FILE
12421 sshd 14101 0 225616 0 O ptmx
12296 sshd 4 0 64 0 O ptmx
12421 sshd 3 14104 48 778 S TCP
5178 run 2 0 8 0 R nsswitch.conf
5165 run 2 0 8 0 R nsswitch.conf
08:08:21 loadavg: 0.30 0.42 0.31 5/282 5210
PID COMM READS WRITES R_Kb W_Kb T FILE
12421 sshd 9159 0 146544 0 O ptmx
12421 sshd 3 9161 48 534 S TCP
12296 sshd 1 0 16 0 S TCP
5188 run 2 0 8 0 R nsswitch.conf
5203 run 2 0 8 0 R nsswitch.conf
08:08:22 loadavg: 0.30 0.42 0.31 2/282 5233
PID COMM READS WRITES R_Kb W_Kb T FILE
12421 sshd 26166 0 418656 0 O ptmx
12421 sshd 4 26171 64 1385 S TCP
12296 sshd 1 0 16 0 O ptmx
5214 run 2 0 8 0 R nsswitch.conf
5227 run 2 0 8 0 R nsswitch.conf
Detaching...
This example has caught heavy socket I/O from an sshd process, showing up as
non-regular file types (the "O" for other, and "S" for socket, in the type
column: "T").
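The optional interval and count arguments used in the examples above can be sketched with argparse optional positionals. This is a simplified illustration, not the tool's exact code: it converts with type=int at parse time, and the helper names build_parser and interval_and_count are hypothetical; the defaults (1 second, 99999999 outputs) are the ones shown in the USAGE message.

```python
import argparse


def build_parser():
    """Parser with two optional positionals: interval, then count."""
    parser = argparse.ArgumentParser(description="interval/count sketch")
    parser.add_argument("interval", nargs="?", default=1, type=int,
        help="output interval, in seconds")
    parser.add_argument("count", nargs="?", default=99999999, type=int,
        help="number of outputs")
    return parser


def interval_and_count(argv):
    """Return (interval, count) for a list of command line arguments."""
    args = build_parser().parse_args(argv)
    return args.interval, args.count


if __name__ == "__main__":
    print(interval_and_count([]))
    print(interval_and_count(["5"]))
    print(interval_and_count(["5", "10"]))
```

Because both positionals use nargs="?", a single number is always taken as the interval, and a count can only be given together with an interval.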
USAGE message:
# ./filetop -h
usage: filetop [-h] [-C] [-r MAXROWS] [-p PID] [interval] [count]
File reads and writes by process
positional arguments:
interval output interval, in seconds
count number of outputs
optional arguments:
-h, --help show this help message and exit
-C, --noclear don't clear the screen
-r MAXROWS, --maxrows MAXROWS
maximum rows to print, default 20
-p PID, --pid PID trace this PID only
examples:
./filetop # file I/O top, 1 second refresh
./filetop -C # don't clear the screen
./filetop -p 181 # PID 181 only
./filetop 5 # 5 second summaries
./filetop 5 10 # 5 second summaries, 10 times only
......@@ -59,7 +59,7 @@ debug = 0
# define BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
#include "kernel/sched/sched.h"
#include <linux/sched.h>
typedef struct pid_key {
u64 pid; // work around
......@@ -68,6 +68,8 @@ typedef struct pid_key {
BPF_HASH(start, u32);
STORAGE
struct rq;
// record enqueue timestamp
int trace_enqueue(struct pt_regs *ctx, struct rq *rq, struct task_struct *p,
int flags)
......
#!/usr/bin/python
# @lint-avoid-python-3-compatibility-imports
#
# statsnoop Trace stat() syscalls.
# For Linux, uses BCC, eBPF. Embedded C.
#
# USAGE: statsnoop [-h] [-t] [-x] [-p PID]
#
# Copyright 2016 Netflix, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
#
# 08-Feb-2016 Brendan Gregg Created this.
from __future__ import print_function
from bcc import BPF
import argparse
# arguments
examples = """examples:
./statsnoop # trace all stat() syscalls
./statsnoop -t # include timestamps
./statsnoop -x # only show failed stats
./statsnoop -p 181 # only trace PID 181
"""
parser = argparse.ArgumentParser(
    description="Trace stat() syscalls",
    formatter_class=argparse.RawDescriptionHelpFormatter,
    epilog=examples)
parser.add_argument("-t", "--timestamp", action="store_true",
    help="include timestamp on output")
parser.add_argument("-x", "--failed", action="store_true",
    help="only show failed stats")
parser.add_argument("-p", "--pid",
    help="trace this PID only")
args = parser.parse_args()
debug = 0
# define BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
BPF_HASH(args_filename, u32, const char *);
int trace_entry(struct pt_regs *ctx, const char __user *filename)
{
    u32 pid = bpf_get_current_pid_tgid();
    FILTER
    args_filename.update(&pid, &filename);
    return 0;
};
int trace_return(struct pt_regs *ctx)
{
    const char **filenamep;
    int ret = ctx->ax;
    u32 pid = bpf_get_current_pid_tgid();
    filenamep = args_filename.lookup(&pid);
    if (filenamep == 0) {
        // missed entry
        return 0;
    }
    bpf_trace_printk("%s %d\\n", *filenamep, ret);
    args_filename.delete(&pid);
    return 0;
}
"""
if args.pid:
    bpf_text = bpf_text.replace('FILTER',
        'if (pid != %s) { return 0; }' % args.pid)
else:
    bpf_text = bpf_text.replace('FILTER', '')
if debug:
    print(bpf_text)
# initialize BPF
b = BPF(text=bpf_text)
b.attach_kprobe(event="sys_stat", fn_name="trace_entry")
b.attach_kprobe(event="sys_statfs", fn_name="trace_entry")
b.attach_kprobe(event="sys_newstat", fn_name="trace_entry")
b.attach_kretprobe(event="sys_stat", fn_name="trace_return")
b.attach_kretprobe(event="sys_statfs", fn_name="trace_return")
b.attach_kretprobe(event="sys_newstat", fn_name="trace_return")
# header
if args.timestamp:
    print("%-14s" % ("TIME(s)"), end="")
print("%-6s %-16s %4s %3s %s" % ("PID", "COMM", "FD", "ERR", "PATH"))
start_ts = 0
# format output
while 1:
    (task, pid, cpu, flags, ts, msg) = b.trace_fields()
    # split on the last space only, so paths containing spaces survive
    (filename, ret_s) = msg.rsplit(" ", 1)
    ret = int(ret_s)
    if (args.failed and (ret >= 0)):
        continue
    # split return value into FD and errno columns
    if ret >= 0:
        fd_s = ret
        err = 0
    else:
        fd_s = "-1"
        err = - ret
    # print columns
    if args.timestamp:
        if start_ts == 0:
            start_ts = ts
        print("%-14.9f" % (ts - start_ts), end="")
    print("%-6d %-16s %4s %3s %s" % (pid, task, fd_s, err, filename))
Demonstrations of statsnoop, the Linux eBPF/bcc version.
statsnoop traces the different stat() syscalls system-wide, and prints various
details. Example output:
# ./statsnoop
PID COMM FD ERR PATH
31126 bash 0 0 .
31126 bash -1 2 /usr/local/sbin/iconfig
31126 bash -1 2 /usr/local/bin/iconfig
31126 bash -1 2 /usr/sbin/iconfig
31126 bash -1 2 /usr/bin/iconfig
31126 bash -1 2 /sbin/iconfig
31126 bash -1 2 /bin/iconfig
31126 bash -1 2 /usr/games/iconfig
31126 bash -1 2 /usr/local/games/iconfig
31126 bash -1 2 /apps/python/bin/iconfig
31126 bash -1 2 /mnt/src/llvm/build/bin/iconfig
8902 command-not-fou -1 2 /usr/bin/Modules/Setup
8902 command-not-fou -1 2 /usr/bin/lib/python3.4/os.py
8902 command-not-fou -1 2 /usr/bin/lib/python3.4/os.pyc
8902 command-not-fou 0 0 /usr/lib/python3.4/os.py
8902 command-not-fou -1 2 /usr/bin/pybuilddir.txt
8902 command-not-fou -1 2 /usr/bin/lib/python3.4/lib-dynload
8902 command-not-fou 0 0 /usr/lib/python3.4/lib-dynload
8902 command-not-fou 0 0 /apps/python/lib/python2.7/site-packages
8902 command-not-fou 0 0 /apps/python/lib/python2.7/site-packages
8902 command-not-fou 0 0 /apps/python/lib/python2.7/site-packages
8902 command-not-fou 0 0 /usr/lib/python3.4/
8902 command-not-fou 0 0 /usr/lib/python3.4/
[...]
This output has caught me mistyping a command in another shell, "iconfig"
instead of "ifconfig". The first several lines show the bash shell searching
the $PATH, and failing to find it (ERR == 2 is file not found). Then, a
"command-not-found" program executes (the name is truncated to 15 characters
to fit the 16-byte COMM field), which begins the process of searching for and suggesting
a package. ie, this:
# iconfig
No command 'iconfig' found, did you mean:
Command 'vconfig' from package 'vlan' (main)
Command 'fconfig' from package 'redboot-tools' (universe)
Command 'mconfig' from package 'mono-devel' (main)
Command 'iwconfig' from package 'wireless-tools' (main)
Command 'zconfig' from package 'python-zconfig' (universe)
Command 'ifconfig' from package 'net-tools' (main)
iconfig: command not found
statsnoop can be used for general debugging, to see what file information has
been requested, and whether those files exist. It can be used as a companion
to opensnoop, which shows what files were actually opened.
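How a raw return value maps onto the FD and ERR columns, and how an ERR number can be turned into a readable name, can be sketched as follows. The helper names split_return and describe_error are hypothetical; the sketch assumes, as in the output above, that a negative return value is the negated errno.

```python
import errno
import os


def split_return(ret):
    """Return (fd_column, err_column) for a raw syscall return value."""
    if ret >= 0:
        return str(ret), 0
    # failures are reported as -1 with the errno in its own column
    return "-1", -ret


def describe_error(err):
    """Translate an ERR column value into a symbolic name and message."""
    if err == 0:
        return "OK"
    return "%s (%s)" % (errno.errorcode.get(err, "?"), os.strerror(err))


if __name__ == "__main__":
    for raw in (0, -2, -13):
        fd_s, err = split_return(raw)
        print("%4s %3d %s" % (fd_s, err, describe_error(err)))
```

An ERR of 2 is reported as ENOENT, matching the "file not found" lines in the example output.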
USAGE message:
# ./statsnoop -h
usage: statsnoop [-h] [-t] [-x] [-p PID]
Trace stat() syscalls
optional arguments:
-h, --help show this help message and exit
-t, --timestamp include timestamp on output
-x, --failed only show failed stats
-p PID, --pid PID trace this PID only
examples:
./statsnoop # trace all stat() syscalls
./statsnoop -t # include timestamps
./statsnoop -x # only show failed stats
./statsnoop -p 181 # only trace PID 181