Merge pull request #358 from brendangregg/master

updates and 3 tools: statsnoop, filelife, filetop

Commit 21878139 authored Feb 09, 2016 by 4ast
14 changed files
--- a/README.md
+++ b/README.md
@@ -72,7 +72,9 @@ Tools:
 - tools/[bitesize](tools/bitesize.py): Show per process I/O size histogram. [Examples](tools/bitesize_example.txt).
 - tools/[cachestat](tools/cachestat.py): Trace page cache hit/miss ratio. [Examples](tools/cachestat_example.txt).
 - tools/[execsnoop](tools/execsnoop.py): Trace new processes via exec() syscalls. [Examples](tools/execsnoop_example.txt).
+- tools/[filelife](tools/filelife.py): Trace the lifespan of short-lived files. [Examples](tools/filelife_example.txt).
 - tools/[fileslower](tools/fileslower.py): Trace slow synchronous file reads and writes. [Examples](tools/fileslower_example.txt).
+- tools/[filetop](tools/filetop.py): File reads and writes by filename and process. Top for files. [Examples](tools/filetop_example.txt).
 - tools/[funccount](tools/funccount.py): Count kernel function calls. [Examples](tools/funccount_example.txt).
 - tools/[funclatency](tools/funclatency.py): Time kernel functions and show their latency distribution. [Examples](tools/funclatency_example.txt).
 - tools/[gethostlatency](tools/gethostlatency.py): Show latency for getaddrinfo/gethostbyname[2] calls. [Examples](tools/gethostlatency_example.txt).
@@ -87,6 +89,7 @@ Tools:
 - tools/[softirqs](tools/softirqs.py):  Measure soft IRQ (soft interrupt) event time. [Examples](tools/softirqs_example.txt).
 - tools/[stackcount](tools/stackcount.py): Count kernel function calls and their stack traces. [Examples](tools/stackcount_example.txt).
 - tools/[stacksnoop](tools/stacksnoop.py): Trace a kernel function and print all kernel stack traces. [Examples](tools/stacksnoop_example.txt).
+- tools/[statsnoop](tools/statsnoop.py): Trace stat() syscalls. [Examples](tools/statsnoop_example.txt).
 - tools/[syncsnoop](tools/syncsnoop.py): Trace sync() syscall. [Examples](tools/syncsnoop_example.txt).
 - tools/[tcpaccept](tools/tcpaccept.py): Trace TCP passive connections (accept()). [Examples](tools/tcpaccept_example.txt).
 - tools/[tcpconnect](tools/tcpconnect.py): Trace TCP active connections (connect()). [Examples](tools/tcpconnect_example.txt).

--- a/man/man8/biotop.8
+++ b/man/man8/biotop.8
@@ -22,6 +22,22 @@ and will need updating to match any changes to these functions.
 Since this uses BPF, only the root user can use this tool.
 .SH REQUIREMENTS
 CONFIG_BPF and bcc.
+.SH OPTIONS
+.TP
+\-C
+Don't clear the screen.
+.TP
+\-r MAXROWS
+Maximum number of rows to print. Default is 20.
+.TP
+\-p PID
+Trace this PID only.
+.TP
+interval
+Interval between updates, seconds.
+.TP
+count
+Number of interval summaries.
 .SH EXAMPLES
 .TP
 Summarize block device I/O by process, 1 second screen refresh:
@@ -86,5 +102,7 @@ Linux
 Unstable - in development.
 .SH AUTHOR
 Brendan Gregg
+.SH INSPIRATION
+top(1) by William LeFebvre
 .SH SEE ALSO
 biosnoop(8), biolatency(8), iostat(1)
--- a/man/man8/execsnoop.8
+++ b/man/man8/execsnoop.8
@@ -2,10 +2,10 @@
 .SH NAME
 execsnoop \- Trace new processes via exec() syscalls. Uses Linux eBPF/bcc.
 .SH SYNOPSIS
-.B execsnoop [\-h] [\-t] [\-X] [\-n NAME]
+.B execsnoop [\-h] [\-t] [\-x] [\-n NAME]
 .SH DESCRIPTION
-execsnoop traces new processes, showing the filename executed, argument
-list, and return value (0 for success).
+execsnoop traces new processes, showing the filename executed and argument
+list.

 It works by traces the execve() system call (commonly used exec() variant).
 This catches new processes that follow the fork->exec sequence, as well as
@@ -27,8 +27,8 @@ Print usage message.
 \-t
 Include a timestamp column.
 .TP
-\-X
-Exclude failed exec()s
+\-x
+Include failed exec()s
 .TP
 \-n NAME
 Only print command lines matching this name (regex), matched anywhere
@@ -42,9 +42,9 @@ Trace all exec() syscalls, and include timestamps:
 #
 .B execsnoop \-t
 .TP
-Only trace successful exec()s:
+Include failed exec()s:
 #
-.B execsnoop \-X
+.B execsnoop \-x
 .TP
 Only trace exec()s where the filename or arguments contain "mount":
 #
@@ -61,7 +61,8 @@ PID
 Process ID
 .TP
 RET
-Return value of exec(). 0 == successs.
+Return value of exec(). 0 == success. Failures are only shown when using the
+\-x option.
 .TP
 ARGS
 Filename for the exec(), followed be up to 19 arguments. An ellipsis "..." is

--- a/man/man8/filelife.8
+++ b/man/man8/filelife.8
+.TH filelife 8  "2016-02-08" "USER COMMANDS"
+.SH NAME
+filelife \- Trace the lifespan of short-lived files. Uses Linux eBPF/bcc.
+.SH SYNOPSIS
+.B filelife [\-h] [\-p PID]
+.SH DESCRIPTION
+This traces the creation and deletion of files, providing information
+on who deleted the file, the file age, and the file name. The intent is to
+provide information on short-lived files, for debugging or performance
+analysis.
+
+This works by tracing the kernel vfs_create() and vfs_delete() functions using
+dynamic tracing, and will need updating to match any changes to these
+functions.
+
+Since this uses BPF, only the root user can use this tool.
+.SH REQUIREMENTS
+CONFIG_BPF and bcc.
+.SH OPTIONS
+.TP
+\-h
+Print usage message.
+.TP
+\-p PID
+Trace this process ID only (filtered in-kernel).
+.SH EXAMPLES
+.TP
+Trace all short-lived files, and print details:
+#
+.B filelife
+.TP
+Trace all short-lived files created AND deleted by PID 181:
+#
+.B filelife \-p 181
+.SH FIELDS
+.TP
+TIME
+Time of the deletion.
+.TP
+PID
+Process ID that deleted the file.
+.TP
+COMM
+Process name for the PID.
+.TP
+AGE(s)
+Age of the file, from creation to deletion, in seconds.
+.TP
+FILE
+Filename.
+.SH OVERHEAD
+This traces the kernel VFS file create and delete functions and prints output
+for each delete. As the rate of this is generally expected to be low
+(< 1000/s), the overhead is also expected to be negligible.
+.SH SOURCE
+This is from bcc.
+.IP
+https://github.com/iovisor/bcc
+.PP
+Also look in the bcc distribution for a companion _examples.txt file containing
+example usage, output, and commentary for this tool.
+.SH OS
+Linux
+.SH STABILITY
+Unstable - in development.
+.SH AUTHOR
+Brendan Gregg
+.SH SEE ALSO
+opensnoop(1)
--- a/man/man8/filetop.8
+++ b/man/man8/filetop.8
+.TH filetop 8  "2016-02-08" "USER COMMANDS"
+.SH NAME
+filetop \- File reads and writes by filename and process. Top for files.
+.SH SYNOPSIS
+.B filetop [\-h] [\-C] [\-r MAXROWS] [\-p PID] [interval] [count]
+.SH DESCRIPTION
+This is top for files. 
+
+This traces file reads and writes, and prints a per-file summary every
+interval (by default, 1 second). The summary is sorted on the highest read
+throughput (Kbytes).
+
+This uses in-kernel eBPF maps to store per process summaries for efficiency.
+
+This script works by tracing the __vfs_read() and __vfs_write() functions using
+kernel dynamic tracing, which instruments explicit read and write calls. If
+files are read or written using another means (eg, via mmap()), then they
+will not be visible using this tool. Also, this tool will need updating to
+match any code changes to those vfs functions.
+
+This should be useful for file system workload characterization when analyzing
+the performance of applications.
+
+Note that tracing VFS level reads and writes can be a frequent activity, and
+this tool can begin to cost measurable overhead at high I/O rates.
+
+Since this uses BPF, only the root user can use this tool.
+.SH REQUIREMENTS
+CONFIG_BPF and bcc.
+.SH OPTIONS
+.TP
+\-C
+Don't clear the screen.
+.TP
+\-r MAXROWS
+Maximum number of rows to print. Default is 20.
+.TP
+\-p PID
+Trace this PID only.
+.TP
+interval
+Interval between updates, seconds.
+.TP
+count
+Number of interval summaries.
+
+.SH EXAMPLES
+.TP
+Summarize file reads and writes by file and process, 1 second screen refresh:
+#
+.B filetop
+.TP
+Don't clear the screen, and top 8 rows only:
+#
+.B filetop -Cr 8
+.TP
+5 second summaries, 10 times only:
+#
+.B filetop 5 10
+.SH FIELDS
+.TP
+loadavg:
+The contents of /proc/loadavg
+.TP
+PID
+Process ID.
+.TP
+COMM
+Process name.
+.TP
+READS
+Count of reads during interval.
+.TP
+WRITES
+Count of writes during interval.
+.TP
+R_Kb
+Total read Kbytes during interval.
+.TP
+W_Kb
+Total write Kbytes during interval.
+.TP
+T
+Type of file: R == regular, S == socket, O == other (pipe, etc).
+.SH OVERHEAD
+Depending on the frequency of application reads and writes, overhead can become
+significant, in the worst case slowing applications by over 50%. Hopefully for
+real world workloads the overhead is much less -- test before use. The reason
+for the high overhead is that VFS reads and writes can be a frequent event, and
+despite the eBPF overhead being very small per event, if you multiply this
+small overhead by a million events per second, it becomes a million times
+worse. Literally. You can gauge the number of reads and writes using the
+vfsstat(8) tool, also from bcc.
+.SH SOURCE
+This is from bcc.
+.IP
+https://github.com/iovisor/bcc
+.PP
+Also look in the bcc distribution for a companion _examples.txt file containing
+example usage, output, and commentary for this tool.
+.SH OS
+Linux
+.SH STABILITY
+Unstable - in development.
+.SH AUTHOR
+Brendan Gregg
+.SH INSPIRATION
+top(1) by William LeFebvre
+.SH SEE ALSO
+vfsstat(8), vfscount(8), fileslower(8)
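
The OVERHEAD section of filetop.8 is a multiplication: cost per event times events per second. Below is a rough sketch of that arithmetic. The function name and the per-event cost are hypothetical placeholders for illustration, not measured figures for filetop; the event rate would come from something like vfsstat(8).

```python
def overhead_fraction(events_per_sec, ns_per_event):
    """Fraction of one CPU consumed by tracing, given a per-event cost.

    Both arguments are assumptions supplied by the caller: measure the
    event rate, and treat the per-event cost as an estimate only.
    """
    return events_per_sec * ns_per_event / 1e9


# Hypothetical figures: one million events/s at 100 ns each.
fraction = overhead_fraction(1000000, 100)
print("%.0f%% of one CPU" % (fraction * 100))
```

The same per-event cost is negligible at a thousand events per second and significant at a million, which is why the man page suggests testing before use.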
--- a/man/man8/statsnoop.8
+++ b/man/man8/statsnoop.8
+.TH statsnoop 8  "2016-02-08" "USER COMMANDS"
+.SH NAME
+statsnoop \- Trace stat() syscalls. Uses Linux eBPF/bcc.
+.SH SYNOPSIS
+.B statsnoop [\-h] [\-t] [\-x] [\-p PID]
+.SH DESCRIPTION
+statsnoop traces the different stat() syscalls, showing which processes are
+attempting to read information about which files. This can be useful for
+determining the location of config and log files, or for troubleshooting
+applications that are failing, especially on startup.
+
+This works by tracing various kernel sys_stat() functions using dynamic
+tracing, and will need updating to match any changes to these functions.
+
+Since this uses BPF, only the root user can use this tool.
+.SH REQUIREMENTS
+CONFIG_BPF and bcc.
+.SH OPTIONS
+.TP
+\-h
+Print usage message.
+.TP
+\-t
+Include a timestamp column.
+.TP
+\-x
+Only print failed stats.
+.TP
+\-p PID
+Trace this process ID only (filtered in-kernel).
+.SH EXAMPLES
+.TP
+Trace all stat() syscalls:
+#
+.B statsnoop
+.TP
+Trace all stat() syscalls, and include timestamps:
+#
+.B statsnoop \-t
+.TP
+Trace only stat() syscalls that failed:
+#
+.B statsnoop \-x
+.TP
+Trace PID 181 only:
+#
+.B statsnoop \-p 181
+.SH FIELDS
+.TP
+TIME(s)
+Time of the call, in seconds.
+.TP
+PID
+Process ID
+.TP
+COMM
+Process name
+.TP
+FD
+Return value of the stat() call: 0 (if success), or -1 (if failed)
+.TP
+ERR
+Error number (see the system's errno.h)
+.TP
+PATH
+Path passed to stat()
+.SH OVERHEAD
+This traces the kernel stat function and prints output for each event. As the
+rate of this is generally expected to be low (< 1000/s), the overhead is also
+expected to be negligible. If you have an application that is calling a high
+rate of stat()s, then test and understand overhead before use.
+.SH SOURCE
+This is from bcc.
+.IP
+https://github.com/iovisor/bcc
+.PP
+Also look in the bcc distribution for a companion _examples.txt file containing
+example usage, output, and commentary for this tool.
+.SH OS
+Linux
+.SH STABILITY
+Unstable - in development.
+.SH AUTHOR
+Brendan Gregg
+.SH SEE ALSO
+opensnoop(1)
--- a/tools/execsnoop.py
+++ b/tools/execsnoop.py
@@ -4,7 +4,7 @@
 # execsnoop Trace new processes via exec() syscalls.
 #           For Linux, uses BCC, eBPF. Embedded C.
 #
-# USAGE: execsnoop [-h] [-t] [-X] [-n NAME]
+# USAGE: execsnoop [-h] [-t] [-x] [-n NAME]
 #
 # This currently will print up to a maximum of 19 arguments, plus the process
 # name, so 20 fields in total (MAXARG).
@@ -24,7 +24,7 @@ import re
 # arguments
 examples = """examples:
    ./execsnoop           # trace all exec() syscalls
-    ./execsnoop -X        # only show successful exec()s
+    ./execsnoop -x        # include failed exec()s
    ./execsnoop -t        # include timestamps
    ./execsnoop -n main   # only print command lines containing "main"
 """
@@ -34,8 +34,8 @@ parser = argparse.ArgumentParser(
    epilog=examples)
 parser.add_argument("-t", "--timestamp", action="store_true",
    help="include timestamp on output")
-parser.add_argument("-X", "--excludefails", action="store_true",
-    help="exclude failed exec()s")
+parser.add_argument("-x", "--fails", action="store_true",
+    help="include failed exec()s")
 parser.add_argument("-n", "--name",
    help="only print commands matching this name (regex), any arg")
 args = parser.parse_args()
@@ -125,17 +125,25 @@ pcomm = {}
 # format output
 while 1:
    (task, pid, cpu, flags, ts, msg) = b.trace_fields()
-    (type, arg) = msg.split(" ", 1)
+    try:
+        (type, arg) = msg.split(" ", 1)
+    except ValueError:
+        continue

    if start_ts == 0:
        start_ts = ts

    if type == "RET":
+        if pid not in cmd:
+            # zero args
+            cmd[pid] = ""
+            pcomm[pid] = ""
+
        skip = 0
        if args.name:
            if not re.search(args.name, cmd[pid]):
                skip = 1
-        if args.excludefails and int(arg) < 0:
+        if not args.fails and int(arg) < 0:
            skip = 1
        if skip:
            del cmd[pid]
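
With this change, failed exec()s are hidden unless -x is given. A minimal sketch of the resulting filter decision, pulled out into a standalone function for illustration; `should_skip` and its parameter names are hypothetical and not part of execsnoop.py.

```python
import re


def should_skip(ret, cmdline, name_regex=None, include_fails=False):
    """Return True if an exec() event should be hidden from the output.

    ret           -- exec() return value (negative means failure)
    cmdline       -- command line collected for this PID
    name_regex    -- value of -n, or None when not given
    include_fails -- True when -x was passed
    """
    if name_regex and not re.search(name_regex, cmdline):
        return True
    if not include_fails and ret < 0:
        return True
    return False


# A failed exec() is hidden by default and shown with -x.
print(should_skip(-2, "/usr/local/bin/setuidgid nobody"))
print(should_skip(-2, "/usr/local/bin/setuidgid nobody", include_fails=True))
```

Inverting the flag means the default output only lists processes that actually started, which is the more common use.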

--- a/tools/execsnoop_example.txt
+++ b/tools/execsnoop_example.txt
 Demonstrations of execsnoop, the Linux eBPF/bcc version.


-execsnoop traces new processes. For example:
+execsnoop traces new processes. For example, tracing the commands invoked when
+running "man ls":

-# ./execsnoop 
+# ./execsnoop
+PCOMM            PID    RET ARGS
+bash             15887    0 /usr/bin/man ls
+preconv          15894    0 /usr/bin/preconv -e UTF-8
+man              15896    0 /usr/bin/tbl
+man              15897    0 /usr/bin/nroff -mandoc -rLL=169n -rLT=169n -Tutf8
+man              15898    0 /usr/bin/pager -s
+nroff            15900    0 /usr/bin/locale charmap
+nroff            15901    0 /usr/bin/groff -mtty-char -Tutf8 -mandoc -rLL=169n -rLT=169n
+groff            15902    0 /usr/bin/troff -mtty-char -mandoc -rLL=169n -rLT=169n -Tutf8
+groff            15903    0 /usr/bin/grotty
+
+The output shows the parent process/command name (PCOMM), the PID, the return
+value of the exec() (RET), and the filename with arguments (ARGS). 
+
+This works by tracing the execve() system call (commonly used exec() variant),
+and shows details of the arguments and return value. This catches new processes
+that follow the fork->exec sequence, as well as processes that re-exec()
+themselves. Some applications fork() but do not exec(), eg, for worker
+processes, which won't be included in the execsnoop output.
+
+
+The -x option can be used to include failed exec()s. For example:
+
+# ./execsnoop -x
 PCOMM            PID    RET ARGS
 supervise        9660     0 ./run
 supervise        9661     0 ./run
@@ -21,35 +46,9 @@ run              9661    -2 /usr/local/bin/setuidgid nobody /command/multilog t
 supervise        9670     0 ./run
 [...]

-The output shows the parent process/command name (PCOMM), the PID, the return
-value of the exec() (RET), and the filename with arguments (ARGS). The example
-above shows various regular system daemon activity, including some failures
-(trying to execute a /usr/local/bin/setuidgid, which I just noticed doesn't
-exist).
-
-It works by traces the execve() system call (commonly used exec() variant), and
-shows details of the arguments and return value. This catches new processes
-that follow the fork->exec sequence, as well as processes that re-exec()
-themselves. Some applications fork() but do not exec(), eg, for worker
-processes, which won't be included in the execsnoop output.
-
-
-The -X option can be used to only show successful exec()s. For example, tracing
-a "man ls":
-
-# ./execsnoop -X
-PCOMM            PID    RET ARGS
-bash             15887    0 /usr/bin/man ls
-preconv          15894    0 /usr/bin/preconv -e UTF-8
-man              15896    0 /usr/bin/tbl
-man              15897    0 /usr/bin/nroff -mandoc -rLL=169n -rLT=169n -Tutf8
-man              15898    0 /usr/bin/pager -s
-nroff            15900    0 /usr/bin/locale charmap
-nroff            15901    0 /usr/bin/groff -mtty-char -Tutf8 -mandoc -rLL=169n -rLT=169n
-groff            15902    0 /usr/bin/troff -mtty-char -mandoc -rLL=169n -rLT=169n -Tutf8
-groff            15903    0 /usr/bin/grotty
-
-This shows the various commands used to process the "man ls" command.
+This example shows various regular system daemon activity, including some
+failures (trying to execute a /usr/local/bin/setuidgid, which I just noticed
+doesn't exist).


 A -t option can be used to include a timestamp column, and a -n option to match
@@ -64,19 +63,19 @@ TIME(s) PCOMM            PID    RET ARGS
 USAGE message:

 # ./execsnoop -h
-usage: execsnoop [-h] [-t] [-X] [-n NAME]
+usage: execsnoop [-h] [-t] [-x] [-n NAME]

 Trace exec() syscalls

 optional arguments:
  -h, --help            show this help message and exit
  -t, --timestamp       include timestamp on output
-  -X, --excludefails    exclude failed exec()s
+  -x, --fails           include failed exec()s
  -n NAME, --name NAME  only print commands matching this name (regex), any
                        arg

 examples:
    ./execsnoop           # trace all exec() syscalls
-    ./execsnoop -X        # only show successful exec()s
+    ./execsnoop -x        # include failed exec()s 
    ./execsnoop -t        # include timestamps
    ./execsnoop -n main   # only print command lines containing "main"
--- a/tools/filelife.py
+++ b/tools/filelife.py
+#!/usr/bin/python
+# @lint-avoid-python-3-compatibility-imports
+#
+# filelife    Trace the lifespan of short-lived files.
+#             For Linux, uses BCC, eBPF. Embedded C.
+#
+# This traces the creation and deletion of files, providing information
+# on who deleted the file, the file age, and the file name. The intent is to
+# provide information on short-lived files, for debugging or performance
+# analysis.
+#
+# USAGE: filelife [-h] [-p PID]
+#
+# Copyright 2016 Netflix, Inc.
+# Licensed under the Apache License, Version 2.0 (the "License")
+#
+# 08-Feb-2016   Brendan Gregg   Created this.
+
+from __future__ import print_function
+from bcc import BPF
+import argparse
+from time import strftime
+
+# arguments
+examples = """examples:
+    ./filelife           # trace all short-lived files
+    ./filelife -p 181    # only trace PID 181
+"""
+parser = argparse.ArgumentParser(
+    description="Trace the lifespan of short-lived files",
+    formatter_class=argparse.RawDescriptionHelpFormatter,
+    epilog=examples)
+parser.add_argument("-p", "--pid",
+    help="trace this PID only")
+args = parser.parse_args()
+debug = 0
+
+# define BPF program
+bpf_text = """
+#include <uapi/linux/ptrace.h>
+#include <linux/fs.h>
+
+BPF_HASH(birth, struct dentry *);
+
+// trace file creation time
+int trace_create(struct pt_regs *ctx, struct inode *dir, struct dentry *dentry)
+{
+    u32 pid = bpf_get_current_pid_tgid();
+    FILTER
+
+    u64 ts = bpf_ktime_get_ns();
+    birth.update(&dentry, &ts);
+
+    return 0;
+};
+
+// trace file deletion and output details
+int trace_unlink(struct pt_regs *ctx, struct inode *dir, struct dentry *dentry)
+{
+    u32 pid = bpf_get_current_pid_tgid();
+    FILTER
+
+    u64 *tsp, delta;
+    tsp = birth.lookup(&dentry);
+    if (tsp == 0) {
+        return 0;   // missed create
+    }
+    delta = (bpf_ktime_get_ns() - *tsp) / 1000000;
+    birth.delete(&dentry);
+
+    if (dentry->d_iname[0] == 0)
+        return 0;
+
+    bpf_trace_printk("%d %s\\n", delta, dentry->d_iname);
+
+    return 0;
+}
+"""
+if args.pid:
+    bpf_text = bpf_text.replace('FILTER',
+        'if (pid != %s) { return 0; }' % args.pid)
+else:
+    bpf_text = bpf_text.replace('FILTER', '')
+if debug:
+    print(bpf_text)
+
+# initialize BPF
+b = BPF(text=bpf_text)
+b.attach_kprobe(event="vfs_create", fn_name="trace_create")
+b.attach_kprobe(event="vfs_unlink", fn_name="trace_unlink")
+
+# header
+print("%-8s %-6s %-16s %-7s %s" % ("TIME", "PID", "COMM", "AGE(s)", "FILE"))
+
+start_ts = 0
+
+# format output
+while 1:
+    (task, pid, cpu, flags, ts, msg) = b.trace_fields()
+    # split on the first space only: filenames may themselves contain spaces
+    (delta, filename) = msg.split(" ", 1)
+
+    # print columns
+    print("%-8s %-6d %-16s %-7.2f %s" % (strftime("%H:%M:%S"), pid, task,
+        float(delta) / 1000, filename))
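
The BPF program in filelife.py stores a timestamp when a file is created and looks it up again at unlink time. A userspace sketch of that bookkeeping, with a plain dict standing in for the BPF hash; `LifespanTracker` and the string keys are hypothetical, since the real tool keys on the kernel `struct dentry *`.

```python
class LifespanTracker(object):
    """Userspace stand-in for the BPF 'birth' hash (illustration only)."""

    def __init__(self):
        self.birth = {}  # file key -> creation timestamp in nanoseconds

    def on_create(self, key, ts_ns):
        # Mirrors trace_create(): remember when the file was created.
        self.birth[key] = ts_ns

    def on_unlink(self, key, ts_ns):
        """Return the file age in milliseconds, or None if the create was missed."""
        start = self.birth.pop(key, None)
        if start is None:
            return None
        return (ts_ns - start) // 1000000


tracker = LifespanTracker()
tracker.on_create("ccCB5EDe.s", 1000000000)
age_ms = tracker.on_unlink("ccCB5EDe.s", 1040000000)
print("%-7.2f" % (age_ms / 1000.0))  # AGE(s) column, in seconds
```

Files created before tracing started have no entry, so their deletion is ignored; this is why the tool only reports files created and deleted while it is running.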
--- a/tools/filelife_example.txt
+++ b/tools/filelife_example.txt
+Demonstrations of filelife, the Linux eBPF/bcc version.
+
+
+filelife traces short-lived files: those that have been created and then
+deleted while tracing. For example:
+
+# ./filelife 
+TIME     PID    COMM             AGE(s)  FILE
+05:57:59 8556   gcc              0.04    ccCB5EDe.s
+05:57:59 8560   rm               0.02    .entry_64.o.d
+05:57:59 8563   gcc              0.02    cc5UFHXf.s
+05:57:59 8567   rm               0.01    .thunk_64.o.d
+05:57:59 8578   rm               0.02    .syscall_64.o.d
+05:58:00 8589   rm               0.03    .common.o.d
+05:58:00 8596   rm               0.01    .8592.tmp
+05:58:00 8601   rm               0.01    .8597.tmp
+05:58:00 8606   rm               0.01    .8602.tmp
+05:58:00 8639   rm               0.02    .vma.o.d
+05:58:00 8650   rm               0.02    .vdso32-setup.o.d
+05:58:00 8656   rm               0.00    .vdso.lds.d
+05:58:00 8659   gcc              0.01    ccveeJAz.s
+05:58:00 8663   rm               0.01    .vdso-note.o.d
+05:58:00 8674   rm               0.02    .vclock_gettime.o.d
+05:58:01 8684   rm               0.01    .vgetcpu.o.d
+05:58:01 8690   collect2         0.00    ccvKMxdm.ld
+
+This has caught short-lived files that were created during a Linux kernel
+build. The PID shows the process ID that finally deleted the file, and COMM
+is its process name. The AGE(s) column shows the age of the file, in seconds,
+when it was deleted. These are all short-lived, and existed for less than
+one tenth of a second.
+
+Creating, populating, and then deleting files as part of another process can
+be an inefficient method of inter-process communication. It can cause disk I/O
+as files are closed and their file descriptors flushed, only later to be
+deleted. As such, short-lived files can be a target of performance
+optimizations.
+
+USAGE message:
+
+# ./filelife -h
+usage: filelife [-h] [-p PID]
+
+Trace the lifespan of short-lived files
+
+optional arguments:
+  -h, --help         show this help message and exit
+  -p PID, --pid PID  trace this PID only
+
+examples:
+    ./filelife           # trace all short-lived files
+    ./filelife -p 181    # only trace PID 181
--- a/tools/filetop.py
+++ b/tools/filetop.py
+#!/usr/bin/python
+# @lint-avoid-python-3-compatibility-imports
+#
+# filetop  file reads and writes by process.
+#          For Linux, uses BCC, eBPF.
+#
+# USAGE: filetop.py [-h] [-C] [-r MAXROWS] [-p PID] [interval] [count]
+#
+# This uses in-kernel eBPF maps to store per process summaries for efficiency.
+#
+# Copyright 2016 Netflix, Inc.
+# Licensed under the Apache License, Version 2.0 (the "License")
+#
+# 06-Feb-2016   Brendan Gregg   Created this.
+
+from __future__ import print_function
+from bcc import BPF
+from time import sleep, strftime
+import argparse
+import signal
+from subprocess import call
+
+# arguments
+examples = """examples:
+    ./filetop            # file I/O top, 1 second refresh
+    ./filetop -C         # don't clear the screen
+    ./filetop -p 181     # PID 181 only
+    ./filetop 5          # 5 second summaries
+    ./filetop 5 10       # 5 second summaries, 10 times only
+"""
+parser = argparse.ArgumentParser(
+    description="File reads and writes by process",
+    formatter_class=argparse.RawDescriptionHelpFormatter,
+    epilog=examples)
+parser.add_argument("-C", "--noclear", action="store_true",
+    help="don't clear the screen")
+parser.add_argument("-r", "--maxrows", default=20,
+    help="maximum rows to print, default 20")
+parser.add_argument("-p", "--pid",
+    help="trace this PID only")
+parser.add_argument("interval", nargs="?", default=1,
+    help="output interval, in seconds")
+parser.add_argument("count", nargs="?", default=99999999,
+    help="number of outputs")
+args = parser.parse_args()
+interval = int(args.interval)
+countdown = int(args.count)
+maxrows = int(args.maxrows)
+clear = not int(args.noclear)
+debug = 0
+
+# linux stats
+loadavg = "/proc/loadavg"
+
+# signal handler
+def signal_ignore(signal, frame):
+    print()
+
+# define BPF program
+bpf_text = """
+#include <uapi/linux/ptrace.h>
+#include <linux/blkdev.h>
+
+#define MAX_FILE_LEN    32
+
+// the key for the output summary
+struct info_t {
+    u32 pid;
+    char name[TASK_COMM_LEN];
+    char file[MAX_FILE_LEN];
+    char type;
+};
+
+// the value of the output summary
+struct val_t {
+    u64 reads;
+    u64 writes;
+    u64 rbytes;
+    u64 wbytes;
+};
+
+BPF_HASH(counts, struct info_t, struct val_t);
+
+static int do_entry(struct pt_regs *ctx, struct file *file,
+    char __user *buf, size_t count, int is_read)
+{
+    u32 pid;
+
+    pid = bpf_get_current_pid_tgid();
+    if (FILTER)
+        return 0;
+
+    // skip I/O lacking a filename
+    struct dentry *de = file->f_path.dentry;
+    if (de->d_iname[0] == 0)
+        return 0;
+
+    // store counts and sizes by pid & file
+    struct info_t info = {.pid = pid};
+    bpf_get_current_comm(&info.name, sizeof(info.name));
+    __builtin_memcpy(&info.file, de->d_iname, sizeof(info.file));
+    int mode = file->f_inode->i_mode;
+    if (S_ISREG(mode)) {
+        info.type = 'R';
+    } else if (S_ISSOCK(mode)) {
+        info.type = 'S';
+    } else {
+        info.type = 'O';
+    }
+
+    struct val_t *valp, zero = {};
+    valp = counts.lookup_or_init(&info, &zero);
+    if (is_read) {
+        valp->reads++;
+        valp->rbytes += count;
+    } else {
+        valp->writes++;
+        valp->wbytes += count;
+    }
+
+    return 0;
+}
+
+int trace_read_entry(struct pt_regs *ctx, struct file *file,
+    char __user *buf, size_t count)
+{
+    return do_entry(ctx, file, buf, count, 1);
+}
+
+int trace_write_entry(struct pt_regs *ctx, struct file *file,
+    char __user *buf, size_t count)
+{
+    return do_entry(ctx, file, buf, count, 0);
+}
+
+"""
+if args.pid:
+    bpf_text = bpf_text.replace('FILTER', 'pid != %s' % args.pid)
+else:
+    bpf_text = bpf_text.replace('FILTER', '0')
+if debug:
+    print(bpf_text)
+
+# initialize BPF
+b = BPF(text=bpf_text)
+b.attach_kprobe(event="__vfs_read", fn_name="trace_read_entry")
+b.attach_kprobe(event="__vfs_write", fn_name="trace_write_entry")
+
+print('Tracing... Output every %d secs. Hit Ctrl-C to end' % interval)
+
+# output
+exiting = 0
+while 1:
+    try:
+        sleep(interval)
+    except KeyboardInterrupt:
+        exiting = 1
+
+    # header
+    if clear:
+        call("clear")
+    else:
+        print()
+    with open(loadavg) as stats:
+        print("%-8s loadavg: %s" % (strftime("%H:%M:%S"), stats.read()))
+    print("%-6s %-16s %-6s %-6s %-7s %-7s %1s %s" % ("PID", "COMM",
+        "READS", "WRITES", "R_Kb", "W_Kb", "T", "FILE"))
+
+    # per-process, per-file output
+    counts = b.get_table("counts")
+    line = 0
+    for k, v in reversed(sorted(counts.items(),
+                                key=lambda counts: counts[1].rbytes)):
+
+        # print line
+        print("%-6d %-16s %-6d %-6d %-7d %-7d %1s %s" % (k.pid, k.name,
+            v.reads, v.writes, v.rbytes / 1024, v.wbytes / 1024, k.type,
+            k.file))
+
+        line += 1
+        if line >= maxrows:
+            break
+    counts.clear()
+
+    countdown -= 1
+    if exiting or countdown == 0:
+        print("Detaching...")
+        exit()
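
The output loop in filetop.py ranks the map entries by bytes read and stops after MAXROWS. A minimal sketch of that ranking step using plain Python containers; `Key`, `Val` and `top_rows` are hypothetical stand-ins for the BPF table types, and the sample values are illustrative.

```python
from collections import namedtuple

# Hypothetical stand-ins for the BPF map's key and value structs.
Key = namedtuple("Key", "pid name file type")
Val = namedtuple("Val", "reads writes rbytes wbytes")


def top_rows(counts, maxrows=20):
    """Return up to maxrows (key, value) pairs, largest rbytes first."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1].rbytes, reverse=True)
    return ranked[:maxrows]


counts = {
    Key(26630, "cat", "null", "O"): Val(1, 0, 64 * 1024, 0),
    Key(26628, "ld", "built-in.o", "R"): Val(161, 186, 643 * 1024, 152 * 1024),
    Key(26634, "cc1", "autoconf.h", "R"): Val(1, 0, 200 * 1024, 0),
}
for k, v in top_rows(counts, maxrows=2):
    print("%-6d %-16s %-7d %s" % (k.pid, k.name, v.rbytes // 1024, k.file))
```

Sorting on read bytes alone means write-heavy files can fall below the row limit; a different sort key would be needed to surface them.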
--- a/tools/filetop_example.txt
+++ b/tools/filetop_example.txt
+Demonstrations of filetop, the Linux eBPF/bcc version.
+
+
+filetop shows reads and writes by file, with process details. For example:
+
+# ./filetop -C
+Tracing... Output every 1 secs. Hit Ctrl-C to end
+
+08:00:23 loadavg: 0.91 0.33 0.23 3/286 26635
+
+PID    COMM             READS  WRITES R_Kb    W_Kb    T FILE
+26628  ld               161    186    643     152     R built-in.o
+26634  cc1              1      0      200     0       R autoconf.h
+26618  cc1              1      0      200     0       R autoconf.h
+26634  cc1              12     0      192     0       R tracepoint.h
+26584  cc1              2      0      143     0       R mm.h
+26634  cc1              2      0      143     0       R mm.h
+26631  make             34     0      136     0       R auto.conf
+26634  cc1              1      0      98      0       R fs.h
+26584  cc1              1      0      98      0       R fs.h
+26634  cc1              1      0      91      0       R sched.h
+26634  cc1              1      0      78      0       R printk.c
+26634  cc1              3      0      73      0       R mmzone.h
+26628  ld               18     0      72      0       R hibernate.o
+26628  ld               16     0      64      0       R suspend.o
+26628  ld               16     0      64      0       R snapshot.o
+26630  cat              1      0      64      0       O null
+26628  ld               16     0      64      0       R qos.o
+26628  ld               13     0      52      0       R main.o
+26628  ld               12     0      52      0       R swap.o
+12421  sshd             3      0      48      0       O ptmx
+[...]
+
+This shows various files read and written during a Linux kernel build. The
+output is sorted by the total read size in Kbytes (R_Kb). This instruments the
+VFS interface, so it includes reads and writes that may be served entirely
+from the file system cache (page cache).
+
+While not printed, the average read and write size can be calculated by
+dividing R_Kb by READS, and the same for writes.
+
+The "T" column indicates the type of the file: "R" for regular files, "S" for
+sockets, and "O" for other (including pipes).
+
+This script works by tracing the __vfs_read() and __vfs_write() functions using
+kernel dynamic tracing, which instruments explicit read and write calls. If
+files are read or written using another means (eg, via mmap()), then they
+will not be visible using this tool.
+
+This should be useful for file system workload characterization when analyzing
+the performance of applications.
+
+Note that tracing VFS level reads and writes can be a frequent activity, and
+this tool can begin to cost measurable overhead at high I/O rates.
+
+
+A -C option will stop clearing the screen, and -r with a number will restrict
+the output to that many rows (20 by default). For example, not clearing
+the screen and showing the top 5 only:
+
+# ./filetop -Cr 5
+Tracing... Output every 1 secs. Hit Ctrl-C to end
+
+08:05:11 loadavg: 0.75 0.35 0.25 3/285 822
+
+PID    COMM             READS  WRITES R_Kb    W_Kb    T FILE
+32672  cksum            5006   0      320384  0       R data1
+12296  sshd             2      0      32      0       O ptmx
+809    run              2      0      8       0       R nsswitch.conf
+811    run              2      0      8       0       R nsswitch.conf
+804    chown            2      0      8       0       R nsswitch.conf
+
+08:05:12 loadavg: 0.75 0.35 0.25 3/285 845
+
+PID    COMM             READS  WRITES R_Kb    W_Kb    T FILE
+32672  cksum            4986   0      319104  0       R data1
+845    chown            2      0      8       0       R nsswitch.conf
+828    run              2      0      8       0       R nsswitch.conf
+835    run              2      0      8       0       R nsswitch.conf
+830    run              2      0      8       0       R nsswitch.conf
+
+08:05:13 loadavg: 0.75 0.35 0.25 3/285 868
+
+PID    COMM             READS  WRITES R_Kb    W_Kb    T FILE
+32672  cksum            4985   0      319040  0       R data1
+857    run              2      0      8       0       R nsswitch.conf
+858    run              2      0      8       0       R nsswitch.conf
+859    run              2      0      8       0       R nsswitch.conf
+848    run              2      0      8       0       R nsswitch.conf
+[...]
+
+This output shows a cksum command reading data1: about 5,000 reads and
+320,000 Kbytes in each one-second summary.
+
+
+An optional interval and optional count can also be added to the end of the
+command line. For example, for 1 second interval, and 3 summaries in total:
+
+# ./filetop -Cr 5 1 3
+Tracing... Output every 1 secs. Hit Ctrl-C to end
+
+08:08:20 loadavg: 0.30 0.42 0.31 3/282 5187
+
+PID    COMM             READS  WRITES R_Kb    W_Kb    T FILE
+12421  sshd             14101  0      225616  0       O ptmx
+12296  sshd             4      0      64      0       O ptmx
+12421  sshd             3      14104  48      778     S TCP
+5178   run              2      0      8       0       R nsswitch.conf
+5165   run              2      0      8       0       R nsswitch.conf
+
+08:08:21 loadavg: 0.30 0.42 0.31 5/282 5210
+
+PID    COMM             READS  WRITES R_Kb    W_Kb    T FILE
+12421  sshd             9159   0      146544  0       O ptmx
+12421  sshd             3      9161   48      534     S TCP
+12296  sshd             1      0      16      0       S TCP
+5188   run              2      0      8       0       R nsswitch.conf
+5203   run              2      0      8       0       R nsswitch.conf
+
+08:08:22 loadavg: 0.30 0.42 0.31 2/282 5233
+
+PID    COMM             READS  WRITES R_Kb    W_Kb    T FILE
+12421  sshd             26166  0      418656  0       O ptmx
+12421  sshd             4      26171  64      1385    S TCP
+12296  sshd             1      0      16      0       O ptmx
+5214   run              2      0      8       0       R nsswitch.conf
+5227   run              2      0      8       0       R nsswitch.conf
+Detaching...
+
+This example has caught heavy I/O from an sshd process: reads from a pseudo
+terminal (ptmx) and writes to a TCP socket. These show up as non-regular file
+types in the type column, "T": "O" for other, and "S" for socket.
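As a rough illustration of reading this output, the following Python sketch totals R_Kb and W_Kb by the "T" column, using the rows of the 08:08:20 summary. The parse_row() and totals_by_type() helpers are hypothetical and not part of filetop, and the parsing assumes that COMM contains no spaces.

```python
from collections import defaultdict

# Rows copied from the 08:08:20 summary in the example output.
ROWS = """\
12421  sshd             14101  0      225616  0       O ptmx
12296  sshd             4      0      64      0       O ptmx
12421  sshd             3      14104  48      778     S TCP
5178   run              2      0      8       0       R nsswitch.conf
5165   run              2      0      8       0       R nsswitch.conf
"""


def parse_row(line):
    """Split one filetop row into a dict (assumes COMM has no spaces)."""
    pid, comm, reads, writes, r_kb, w_kb, ftype, name = line.split(None, 7)
    return {
        "pid": int(pid),
        "comm": comm,
        "reads": int(reads),
        "writes": int(writes),
        "r_kb": int(r_kb),
        "w_kb": int(w_kb),
        "type": ftype,
        "file": name.strip(),
    }


def totals_by_type(text):
    """Return {type: [total R_Kb, total W_Kb]} for a block of rows."""
    totals = defaultdict(lambda: [0, 0])
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between summaries
        row = parse_row(line)
        totals[row["type"]][0] += row["r_kb"]
        totals[row["type"]][1] += row["w_kb"]
    return dict(totals)


for ftype, (r_kb, w_kb) in sorted(totals_by_type(ROWS).items()):
    print("%s R_Kb=%d W_Kb=%d" % (ftype, r_kb, w_kb))
```

Grouping by type makes it easy to see that almost all of the Kbytes in this interval came from the "O" (ptmx) reads rather than from regular files.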
+
+
+USAGE message:
+
+# ./filetop -h
+usage: filetop [-h] [-C] [-r MAXROWS] [-p PID] [interval] [count]
+
+File reads and writes by process
+
+positional arguments:
+  interval              output interval, in seconds
+  count                 number of outputs
+
+optional arguments:
+  -h, --help            show this help message and exit
+  -C, --noclear         don't clear the screen
+  -r MAXROWS, --maxrows MAXROWS
+                        maximum rows to print, default 20
+  -p PID, --pid PID     trace this PID only
+
+examples:
+    ./filetop            # file I/O top, 1 second refresh
+    ./filetop -C         # don't clear the screen
+    ./filetop -p 181     # PID 181 only
+    ./filetop 5          # 5 second summaries
+    ./filetop 5 10       # 5 second summaries, 10 times only
--- a/tools/statsnoop.py
+++ b/tools/statsnoop.py
+#!/usr/bin/python
+# @lint-avoid-python-3-compatibility-imports
+#
+# statsnoop Trace stat() syscalls.
+#           For Linux, uses BCC, eBPF. Embedded C.
+#
+# USAGE: statsnoop [-h] [-t] [-x] [-p PID]
+#
+# Copyright 2016 Netflix, Inc.
+# Licensed under the Apache License, Version 2.0 (the "License")
+#
+# 08-Feb-2016   Brendan Gregg   Created this.
+
+from __future__ import print_function
+from bcc import BPF
+import argparse
+
+# arguments
+examples = """examples:
+    ./statsnoop           # trace all stat() syscalls
+    ./statsnoop -t        # include timestamps
+    ./statsnoop -x        # only show failed stats
+    ./statsnoop -p 181    # only trace PID 181
+"""
+parser = argparse.ArgumentParser(
+    description="Trace stat() syscalls",
+    formatter_class=argparse.RawDescriptionHelpFormatter,
+    epilog=examples)
+parser.add_argument("-t", "--timestamp", action="store_true",
+    help="include timestamp on output")
+parser.add_argument("-x", "--failed", action="store_true",
+    help="only show failed stats")
+parser.add_argument("-p", "--pid",
+    help="trace this PID only")
+args = parser.parse_args()
+debug = 0
+
+# define BPF program
+bpf_text = """
+#include <uapi/linux/ptrace.h>
+
+BPF_HASH(args_filename, u32, const char *);
+
+int trace_entry(struct pt_regs *ctx, const char __user *filename)
+{
+    u32 pid = bpf_get_current_pid_tgid();
+
+    FILTER
+    args_filename.update(&pid, &filename);
+
+    return 0;
+}
+
+int trace_return(struct pt_regs *ctx)
+{
+    const char **filenamep;
+    int ret = ctx->ax;    // return value; the ax register is x86-specific
+    u32 pid = bpf_get_current_pid_tgid();
+
+    filenamep = args_filename.lookup(&pid);
+    if (filenamep == 0) {
+        // missed entry
+        return 0;
+    }
+
+    bpf_trace_printk("%s %d\\n", *filenamep, ret);
+    args_filename.delete(&pid);
+
+    return 0;
+}
+"""
+if args.pid:
+    bpf_text = bpf_text.replace('FILTER',
+        'if (pid != %s) { return 0; }' % args.pid)
+else:
+    bpf_text = bpf_text.replace('FILTER', '')
+if debug:
+    print(bpf_text)
+
+# initialize BPF
+b = BPF(text=bpf_text)
+b.attach_kprobe(event="sys_stat", fn_name="trace_entry")
+b.attach_kprobe(event="sys_statfs", fn_name="trace_entry")
+b.attach_kprobe(event="sys_newstat", fn_name="trace_entry")
+b.attach_kretprobe(event="sys_stat", fn_name="trace_return")
+b.attach_kretprobe(event="sys_statfs", fn_name="trace_return")
+b.attach_kretprobe(event="sys_newstat", fn_name="trace_return")
+
+# header
+if args.timestamp:
+    print("%-14s" % ("TIME(s)"), end="")
+print("%-6s %-16s %4s %3s %s" % ("PID", "COMM", "FD", "ERR", "PATH"))
+
+start_ts = 0
+
+# format output
+while 1:
+    (task, pid, cpu, flags, ts, msg) = b.trace_fields()
+    # split on the last space only, as the filename may contain spaces
+    (filename, ret_s) = msg.rsplit(" ", 1)
+
+    ret = int(ret_s)
+    if (args.failed and (ret >= 0)):
+        continue
+
+    # split return value into FD and errno columns
+    if ret >= 0:
+        fd_s = ret
+        err = 0
+    else:
+        fd_s = "-1"
+        err = -ret
+
+    # print columns
+    if args.timestamp:
+        if start_ts == 0:
+            start_ts = ts
+        print("%-14.9f" % (ts - start_ts), end="")
+    print("%-6d %-16s %4s %3s %s" % (pid, task, fd_s, err, filename))
--- a/tools/statsnoop_example.txt
+++ b/tools/statsnoop_example.txt
+Demonstrations of statsnoop, the Linux eBPF/bcc version.
+
+
+statsnoop traces the different stat() syscalls system-wide, and prints the
+process, return value, and path for each. Example output:
+
+# ./statsnoop
+PID    COMM               FD ERR PATH
+31126  bash                0   0 .
+31126  bash               -1   2 /usr/local/sbin/iconfig
+31126  bash               -1   2 /usr/local/bin/iconfig
+31126  bash               -1   2 /usr/sbin/iconfig
+31126  bash               -1   2 /usr/bin/iconfig
+31126  bash               -1   2 /sbin/iconfig
+31126  bash               -1   2 /bin/iconfig
+31126  bash               -1   2 /usr/games/iconfig
+31126  bash               -1   2 /usr/local/games/iconfig
+31126  bash               -1   2 /apps/python/bin/iconfig
+31126  bash               -1   2 /mnt/src/llvm/build/bin/iconfig
+8902   command-not-fou    -1   2 /usr/bin/Modules/Setup
+8902   command-not-fou    -1   2 /usr/bin/lib/python3.4/os.py
+8902   command-not-fou    -1   2 /usr/bin/lib/python3.4/os.pyc
+8902   command-not-fou     0   0 /usr/lib/python3.4/os.py
+8902   command-not-fou    -1   2 /usr/bin/pybuilddir.txt
+8902   command-not-fou    -1   2 /usr/bin/lib/python3.4/lib-dynload
+8902   command-not-fou     0   0 /usr/lib/python3.4/lib-dynload
+8902   command-not-fou     0   0 /apps/python/lib/python2.7/site-packages
+8902   command-not-fou     0   0 /apps/python/lib/python2.7/site-packages
+8902   command-not-fou     0   0 /apps/python/lib/python2.7/site-packages
+8902   command-not-fou     0   0 /usr/lib/python3.4/
+8902   command-not-fou     0   0 /usr/lib/python3.4/
+[...]
+
+This output has caught me mistyping a command in another shell, "iconfig"
+instead of "ifconfig". The first several lines show the bash shell searching
+the $PATH, and failing to find it (ERR == 2 is ENOENT: no such file or
+directory). Then, a "command-not-found" program executes (the name is
+truncated to 15 characters in the COMM field), which begins the process of
+searching for and suggesting a package, i.e., this:
+
+# iconfig
+No command 'iconfig' found, did you mean:
+ Command 'vconfig' from package 'vlan' (main)
+ Command 'fconfig' from package 'redboot-tools' (universe)
+ Command 'mconfig' from package 'mono-devel' (main)
+ Command 'iwconfig' from package 'wireless-tools' (main)
+ Command 'zconfig' from package 'python-zconfig' (universe)
+ Command 'ifconfig' from package 'net-tools' (main)
+iconfig: command not found
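The ERR column holds an errno value, so it can be translated into a symbolic name and message. A minimal Python sketch using the standard errno and os modules follows; describe_err() is a hypothetical helper, not part of statsnoop, and the message text comes from the platform's C library, so it may differ between systems.

```python
import errno
import os


def describe_err(err):
    """Translate a statsnoop ERR value into a symbolic name and message."""
    if err == 0:
        return "OK"
    # errno.errorcode maps numbers to names such as 'ENOENT'
    name = errno.errorcode.get(err, "UNKNOWN")
    return "%s (%s)" % (name, os.strerror(err))


# ERR values seen in the example output
for err in (0, 2):
    print("%d -> %s" % (err, describe_err(err)))
```

This can be handy when post-processing saved statsnoop output, where only the numeric ERR value is available.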
+
+statsnoop can be used for general debugging, to see what file information has
+been requested, and whether those files exist. It can be used as a companion
+to opensnoop, which shows what files were actually opened.
+
+
+USAGE message:
+
+# ./statsnoop -h
+usage: statsnoop [-h] [-t] [-x] [-p PID]
+
+Trace stat() syscalls
+
+optional arguments:
+  -h, --help         show this help message and exit
+  -t, --timestamp    include timestamp on output
+  -x, --failed       only show failed stats
+  -p PID, --pid PID  trace this PID only
+
+examples:
+    ./statsnoop           # trace all stat() syscalls
+    ./statsnoop -t        # include timestamps
+    ./statsnoop -x        # only show failed stats
+    ./statsnoop -p 181    # only trace PID 181