Commit fdfacb71 authored by Allan McAleavy

Merge pull request #2 from iovisor/master

update local repo
parents eb3c9605 dd221c4e
......@@ -24,11 +24,11 @@ As said earlier: keep it short, neat, and documented (code comments).
A checklist for bcc tool development:
1. **Research the topic landscape**. Learn the existing tools and metrics (incl. from /proc). Determine what real world problems exist and need solving. We have too many tools and metrics as it is, we don't need more "I guess that's useful" tools, we need more "ah-hah! I couldn't do this before!" tools. Consider asking other developers about your idea. Many of us can be found in IRC, in the #iovisor channel on irc.oftc.net. There's also the mailing list (see the README.md), and github for issues.
1. **Create a known workload for testing**. This might involving writing a 10 line C program, using a microbenchmark, or just improvising at the shell. If you don't know how to create a workload, learn! Figuring this out will provide invaluable context and details that you may have otherwise overlooked. Sometimes it's easy, and I'm able to just use dd(1) from /dev/urandom or a disk device to /dev/null. It lets me set the I/O size, count, and provides throughput statistics for cross-checking checking my tool output. But other times I need a micro-benchhark, or some C.
1. **Create a known workload for testing**. This might involve writing a 10 line C program, using a micro-benchmark, or just improvising at the shell. If you don't know how to create a workload, learn! Figuring this out will provide invaluable context and details that you may have otherwise overlooked. Sometimes it's easy, and I'm able to just use dd(1) from /dev/urandom or a disk device to /dev/null. It lets me set the I/O size and count, and provides throughput statistics for cross-checking my tool output. But other times I need a micro-benchmark, or some C (a minimal sketch follows this checklist).
1. **Write the tool to solve the problem and no more**. Unix philosophy: do one thing and do it well. netstat doesn't have an option to dump packets, tcpdump-style. They are two different tools.
1. **Check your tool correctly measures your known workload**. If possible, run a prime number of events (eg, 23) and check that the numbers match. Try other workload variations.
1. **Use other observability tools to perform a cross-check or sanity check**. Eg, imagine you write a PCI bus tool that shows current throughput is 28 Gbytes/sec. How could you sanity test that? Well, what PCI devices are there? Disks and network cards? Measure their throughput (iostat, nicstat, sar), and check if it is in the ballpark of 28 Gbytes/sec (which would include PCI frame overheads). Ideally, your numbers match.
1. **Measure the overhead of the tool**. If you are running a microbenchmark, how much slower is it with the tool running. Is more CPU consumed? Try to determine the worst case: run the microbenchmark so that CPU headroom is exhausted, and then run the bcc tool. Can overhead be lowered?
1. **Measure the overhead of the tool**. If you are running a micro-benchmark, how much slower is it with the tool running? Is more CPU consumed? Try to determine the worst case: run the micro-benchmark so that CPU headroom is exhausted, and then run the bcc tool. Can overhead be lowered?
1. **Test again, and stress test**. You want to discover and fix all the bad things before others hit them.
1. **Consider command line options**. Should it have -p for filtering on a PID? -T for timestamps? -i for interval? See other tools for examples, and copy the style: the usage message should list example usage at the end. Remember to keep the tool doing one thing and doing it well. Also, if there's one option that seems to be the common case, perhaps it should just be the first argument and not need a switch (no -X). A special case of this is *stat tools, like iostat/vmstat/etc, where the convention is [interval [count]].
1. **Use pep8 to check Python style**: pep8 --show-source --ignore=E123,E125,E126,E127,E128,E302 filename . Note that it misses some things, like consistent usage, so you'll still need to double check your script.
......@@ -36,5 +36,6 @@ A checklist for bcc tool development:
1. **Read your example.txt file**. Does this sound too niche or convoluted? Are you spending too much time explaining caveats? These can be hints that perhaps you should fix your tool, or abandon it! Perhaps it better belongs as an /example, and not a tool. I've abandoned many tools at this stage.
1. **Write a man page**. Either ROFF (.8), markdown (.md), or plain text (.txt): so long as it documents the important sections, particularly columns (fields) and caveats. These go under man/man8. See the other examples. Include a section on overhead, and pull no punches. It's better for end users to know about high overhead beforehand, than to discover it the hard way. Also explain caveats. Don't assume those will be obvious to tool users.
1. **Read your man page**. For ROFF: nroff -man filename. Like before, this exercise is like saying something out loud. Does it sound too niche or convoluted? Again, hints that you might need to go back and fix things, or abandon it.
1. **Spell check your documentation**. Use a spell checker like aspell to check your document quality before committing.
1. **Add an entry to README.md**.
1. If you made it this far, pull request!
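
The known-workload and correctness-check steps above can often be satisfied with something very small. A minimal sketch (the device, count, and I/O size are arbitrary examples):

```python
#!/usr/bin/python
# Known-workload sketch: perform exactly 23 reads (a prime number) of 8192
# bytes each from /dev/urandom, so a tool's event counts and sizes can be
# cross-checked against numbers you control.
with open("/dev/urandom", "rb") as f:
    for i in range(23):
        buf = f.read(8192)
print("done: 23 reads of 8192 bytes")
```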
......@@ -67,21 +67,32 @@ Tools:
- tools/[bashreadline](tools/bashreadline.py): Print entered bash commands system wide. [Examples](tools/bashreadline_example.txt).
- tools/[biolatency](tools/biolatency.py): Summarize block device I/O latency as a histogram. [Examples](tools/biolatency_example.txt).
- tools/[biotop](tools/biotop.py): Top for disks: Summarize block device I/O by process. [Examples](tools/biotop_example.txt).
- tools/[biosnoop](tools/biosnoop.py): Trace block device I/O with PID and latency. [Examples](tools/biosnoop_example.txt).
- tools/[bitesize](tools/bitesize.py): Show per process I/O size histogram. [Examples](tools/bitesize_example.txt).
- tools/[cachestat](tools/cachestat.py): Trace page cache hit/miss ratio. [Examples](tools/cachestat_example.txt).
- tools/[execsnoop](tools/execsnoop.py): Trace new processes via exec() syscalls. [Examples](tools/execsnoop_example.txt).
- tools/[dcsnoop](tools/dcsnoop.py): Trace directory entry cache (dcache) lookups. [Examples](tools/dcsnoop_example.txt).
- tools/[dcstat](tools/dcstat.py): Directory entry cache (dcache) stats. [Examples](tools/dcstat_example.txt).
- tools/[filelife](tools/filelife.py): Trace the lifespan of short-lived files. [Examples](tools/filelife_example.txt).
- tools/[fileslower](tools/fileslower.py): Trace slow synchronous file reads and writes. [Examples](tools/fileslower_example.txt).
- tools/[filetop](tools/filetop.py): File reads and writes by filename and process. Top for files. [Examples](tools/filetop_example.txt).
- tools/[funccount](tools/funccount.py): Count kernel function calls. [Examples](tools/funccount_example.txt).
- tools/[funclatency](tools/funclatency.py): Time kernel functions and show their latency distribution. [Examples](tools/funclatency_example.txt).
- tools/[gethostlatency](tools/gethostlatency.py): Show latency for getaddrinfo/gethostbyname[2] calls. [Examples](tools/gethostlatency_example.txt).
- tools/[hardirqs](tools/hardirqs.py): Measure hard IRQ (hard interrupt) event time. [Examples](tools/hardirqs_example.txt).
- tools/[killsnoop](tools/killsnoop.py): Trace signals issued by the kill() syscall. [Examples](tools/killsnoop_example.txt).
- tools/[memleak](tools/memleak.py): Display outstanding memory allocations to find memory leaks. [Examples](tools/memleak_examples.txt).
- tools/[offcputime](tools/offcputime.py): Summarize off-CPU time by kernel stack trace. [Examples](tools/offcputime_example.txt).
- tools/[offwaketime](tools/offwaketime.py): Summarize blocked time by kernel off-CPU stack and waker stack. [Examples](tools/offwaketime_example.txt).
- tools/[oomkill](tools/oomkill.py): Trace the out-of-memory (OOM) killer. [Examples](tools/oomkill_example.txt).
- tools/[opensnoop](tools/opensnoop.py): Trace open() syscalls. [Examples](tools/opensnoop_example.txt).
- tools/[pidpersec](tools/pidpersec.py): Count new processes (via fork). [Examples](tools/pidpersec_example.txt).
- tools/[runqlat](tools/runqlat.py): Run queue (scheduler) latency as a histogram. [Examples](tools/runqlat_example.txt).
- tools/[softirqs](tools/softirqs.py): Measure soft IRQ (soft interrupt) event time. [Examples](tools/softirqs_example.txt).
- tools/[stackcount](tools/stackcount.py): Count kernel function calls and their stack traces. [Examples](tools/stackcount_example.txt).
- tools/[stacksnoop](tools/stacksnoop.py): Trace a kernel function and print all kernel stack traces. [Examples](tools/stacksnoop_example.txt).
- tools/[statsnoop](tools/statsnoop.py): Trace stat() syscalls. [Examples](tools/statsnoop_example.txt).
- tools/[syncsnoop](tools/syncsnoop.py): Trace sync() syscall. [Examples](tools/syncsnoop_example.txt).
- tools/[tcpaccept](tools/tcpaccept.py): Trace TCP passive connections (accept()). [Examples](tools/tcpaccept_example.txt).
- tools/[tcpconnect](tools/tcpconnect.py): Trace TCP active connections (connect()). [Examples](tools/tcpconnect_example.txt).
......@@ -177,7 +188,7 @@ section of the [kernel ftrace doc](https://www.kernel.org/doc/Documentation/trac
### Networking
At RedHat Summit 2015, BCC was presented as part of a [session on BPF](http://www.devnation.org/#7784f1f7513e8542e4db519e79ff5eec).
At Red Hat Summit 2015, BCC was presented as part of a [session on BPF](http://www.devnation.org/#7784f1f7513e8542e4db519e79ff5eec).
A multi-host vxlan environment is simulated and a BPF program is used to monitor
one of the physical interfaces. The BPF program keeps statistics on the inner
and outer IP addresses traversing the interface, and the userspace component
......
# Simple HTTP Filter: Project purpose
Write an eBPF application that parses HTTP packets and extracts (and prints on screen) the URL contained in the GET/POST request.
[eBPF HTTP Filter - Short Presentation](https://github.com/iovisor/bpf-docs/blob/master/ebpf_http_filter.pdf)
# Usage Example
```Shell
$ sudo python http-parse-v2.py
GET /pipermail/iovisor-dev/ HTTP/1.1
HTTP/1.1 200 OK
GET /favicon.ico HTTP/1.1
HTTP/1.1 404 Not Found
GET /pipermail/iovisor-dev/2016-January/thread.html HTTP/1.1
HTTP/1.1 200 OK
GET /pipermail/iovisor-dev/2016-January/000046.html HTTP/1.1
HTTP/1.1 200 OK
```
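
Output like the above can be reproduced by generating some plain-HTTP traffic on the monitored interface while the script is running. A minimal sketch (the URL is only an example; HTTPS traffic would not be readable since its payload is encrypted):

```python
#!/usr/bin/python
# Generate a plain-HTTP GET so the filter has a request line to print.
# The URL below is just an example; any non-HTTPS URL reachable via the
# monitored interface (eth0) will do.
import urllib2
urllib2.urlopen("http://lists.iovisor.org/pipermail/iovisor-dev/").read()
```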
# Implementation using BCC
eBPF socket filter.<br />
Filters IP and TCP packets whose payload contains "HTTP", "GET", or "POST", plus all subsequent packets belonging to the same session, i.e. having the same (ip_src, ip_dst, port_src, port_dst).<br />
The program is loaded as PROG_TYPE_SOCKET_FILTER and attached to a socket bound to eth0. <br />
Matching packets are forwarded to user space; the others are dropped by the filter.<br />
<br />
The Python script reads the filtered raw packets from the socket, reassembles packets belonging to the same session if necessary, and prints the first line of the HTTP GET/POST request to stdout. <br />
# v1 vs v2
The first version is the simple one: if the URL is too long (split across more than one packet), it is truncated. <br />
The second version is more complex: if necessary, it reassembles packets belonging to the same session and prints the complete URL (sketched below).
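
The v2 reassembly logic in userspace amounts to buffering payload per session until the end of the request line arrives; a rough sketch of the idea (an illustration only, not the actual script):

```python
# Sketch of v2-style reassembly: buffer payload per (src_ip, dst_ip,
# src_port, dst_port) session until the request line's "\r\n" terminator
# arrives, then return the complete first line of the GET/POST request.
sessions = {}

def feed(key, payload):
    buf = sessions.get(key, b"") + payload
    end = buf.find(b"\r\n")
    if end == -1:
        sessions[key] = buf   # request line still incomplete, keep buffering
        return None
    sessions.pop(key, None)   # session served, free the buffer
    return buf[:end]
```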
# To run:
```Shell
$ sudo python http-parse.py
$ sudo python http-parse-v2.py
```
\ No newline at end of file
#include <uapi/linux/ptrace.h>
#include <net/sock.h>
#include <bcc/proto.h>
#define IP_TCP 6
#define ETH_HLEN 14
struct Key {
u32 src_ip; //source ip
u32 dst_ip; //destination ip
unsigned short src_port; //source port
unsigned short dst_port; //destination port
};
struct Leaf {
int timestamp; //timestamp in ns
};
//BPF_TABLE(map_type, key_type, leaf_type, table_name, num_entry)
//map <Key, Leaf>
//tracing sessions having same Key(dst_ip, src_ip, dst_port,src_port)
BPF_TABLE("hash", struct Key, struct Leaf, sessions, 1024);
/*eBPF program.
Filter IP and TCP packets whose payload is not empty
and starts with "HTTP", "GET", or "POST",
AND all other packets having the same (src_ip,dst_ip,src_port,dst_port),
i.e. belonging to the same "session".
This additional check avoids URL truncation when the URL is too long;
the userspace script, if necessary, reassembles URLs split across 2 or more packets.
If the program is loaded as PROG_TYPE_SOCKET_FILTER
and attached to a socket:
return 0  -> DROP the packet
return -1 -> KEEP the packet and return it to user space (userspace can read it from the socket_fd)
*/
int http_filter(struct __sk_buff *skb) {
u8 *cursor = 0;
struct ethernet_t *ethernet = cursor_advance(cursor, sizeof(*ethernet));
//filter IP packets (ethernet type = 0x0800)
if (!(ethernet->type == 0x0800)) {
goto DROP;
}
struct ip_t *ip = cursor_advance(cursor, sizeof(*ip));
//filter TCP packets (ip next protocol = 0x06)
if (ip->nextp != IP_TCP) {
goto DROP;
}
u32 tcp_header_length = 0;
u32 ip_header_length = 0;
u32 payload_offset = 0;
u32 payload_length = 0;
struct Key key;
struct Leaf leaf;
struct tcp_t *tcp = cursor_advance(cursor, sizeof(*tcp));
//retrieve ip src/dest and port src/dest of current packet
//and save it into struct Key
key.dst_ip = ip->dst;
key.src_ip = ip->src;
key.dst_port = tcp->dst_port;
key.src_port = tcp->src_port;
//calculate ip header length
//value to multiply * 4
//e.g. ip->hlen = 5 ; IP Header Length = 5 x 4 byte = 20 byte
ip_header_length = ip->hlen << 2; //SHL 2 -> *4 multiply
//calculate tcp header length
//value to multiply *4
//e.g. tcp->offset = 5 ; TCP Header Length = 5 x 4 byte = 20 byte
tcp_header_length = tcp->offset << 2; //SHL 2 -> *4 multiply
//calculate payload offset and length
payload_offset = ETH_HLEN + ip_header_length + tcp_header_length;
payload_length = ip->tlen - ip_header_length - tcp_header_length;
//http://stackoverflow.com/questions/25047905/http-request-minimum-size-in-bytes
//minimum length of http request is always greater than 7 bytes
//avoid invalid memory access
//this also covers the empty payload case
if(payload_length < 7) {
goto DROP;
}
//load first 7 bytes of payload into p (payload_array)
//direct access to skb not allowed
unsigned long p[7];
int i = 0;
int j = 0;
for (i = payload_offset ; i < (payload_offset + 7) ; i++) {
p[j] = load_byte(skb , i);
j++;
}
//find a match with an HTTP message
//HTTP
if ((p[0] == 'H') && (p[1] == 'T') && (p[2] == 'T') && (p[3] == 'P')) {
goto HTTP_MATCH;
}
//GET
if ((p[0] == 'G') && (p[1] == 'E') && (p[2] == 'T')) {
goto HTTP_MATCH;
}
//POST
if ((p[0] == 'P') && (p[1] == 'O') && (p[2] == 'S') && (p[3] == 'T')) {
goto HTTP_MATCH;
}
//PUT
if ((p[0] == 'P') && (p[1] == 'U') && (p[2] == 'T')) {
goto HTTP_MATCH;
}
//DELETE
if ((p[0] == 'D') && (p[1] == 'E') && (p[2] == 'L') && (p[3] == 'E') && (p[4] == 'T') && (p[5] == 'E')) {
goto HTTP_MATCH;
}
//HEAD
if ((p[0] == 'H') && (p[1] == 'E') && (p[2] == 'A') && (p[3] == 'D')) {
goto HTTP_MATCH;
}
//no HTTP match
//check if packet belongs to an HTTP session
struct Leaf * lookup_leaf = sessions.lookup(&key);
if(lookup_leaf) {
//send packet to userspace
goto KEEP;
}
goto DROP;
//keep the packet and send it to userspace returning -1
HTTP_MATCH:
//if not already present, insert into map <Key, Leaf>
leaf.timestamp = 0;
sessions.lookup_or_init(&key, &leaf);
sessions.update(&key,&leaf);
//send packet to userspace returning -1
KEEP:
return -1;
//drop the packet returning 0
DROP:
return 0;
}
#include <uapi/linux/ptrace.h>
#include <net/sock.h>
#include <bcc/proto.h>
#define IP_TCP 6
#define ETH_HLEN 14
/*eBPF program.
Filter IP and TCP packets whose payload is not empty
and starts with "HTTP", "GET", "POST", etc.
If the program is loaded as PROG_TYPE_SOCKET_FILTER
and attached to a socket:
return 0  -> DROP the packet
return -1 -> KEEP the packet and return it to user space (userspace can read it from the socket_fd)
*/
int http_filter(struct __sk_buff *skb) {
u8 *cursor = 0;
struct ethernet_t *ethernet = cursor_advance(cursor, sizeof(*ethernet));
//filter IP packets (ethernet type = 0x0800)
if (!(ethernet->type == 0x0800)) {
goto DROP;
}
struct ip_t *ip = cursor_advance(cursor, sizeof(*ip));
//filter TCP packets (ip next protocol = 0x06)
if (ip->nextp != IP_TCP) {
goto DROP;
}
u32 tcp_header_length = 0;
u32 ip_header_length = 0;
u32 payload_offset = 0;
u32 payload_length = 0;
struct tcp_t *tcp = cursor_advance(cursor, sizeof(*tcp));
//calculate ip header length
//value to multiply * 4
//e.g. ip->hlen = 5 ; IP Header Length = 5 x 4 byte = 20 byte
ip_header_length = ip->hlen << 2; //SHL 2 -> *4 multiply
//calculate tcp header length
//value to multiply *4
//e.g. tcp->offset = 5 ; TCP Header Length = 5 x 4 byte = 20 byte
tcp_header_length = tcp->offset << 2; //SHL 2 -> *4 multiply
//calculate payload offset and length
payload_offset = ETH_HLEN + ip_header_length + tcp_header_length;
payload_length = ip->tlen - ip_header_length - tcp_header_length;
//http://stackoverflow.com/questions/25047905/http-request-minimum-size-in-bytes
//minimum length of http request is always greater than 7 bytes
//avoid invalid memory access
//this also covers the empty payload case
if(payload_length < 7) {
goto DROP;
}
//load first 7 bytes of payload into p (payload_array)
//direct access to skb not allowed
unsigned long p[7];
int i = 0;
int j = 0;
for (i = payload_offset ; i < (payload_offset + 7) ; i++) {
p[j] = load_byte(skb , i);
j++;
}
//find a match with an HTTP message
//HTTP
if ((p[0] == 'H') && (p[1] == 'T') && (p[2] == 'T') && (p[3] == 'P')) {
goto KEEP;
}
//GET
if ((p[0] == 'G') && (p[1] == 'E') && (p[2] == 'T')) {
goto KEEP;
}
//POST
if ((p[0] == 'P') && (p[1] == 'O') && (p[2] == 'S') && (p[3] == 'T')) {
goto KEEP;
}
//PUT
if ((p[0] == 'P') && (p[1] == 'U') && (p[2] == 'T')) {
goto KEEP;
}
//DELETE
if ((p[0] == 'D') && (p[1] == 'E') && (p[2] == 'L') && (p[3] == 'E') && (p[4] == 'T') && (p[5] == 'E')) {
goto KEEP;
}
//HEAD
if ((p[0] == 'H') && (p[1] == 'E') && (p[2] == 'A') && (p[3] == 'D')) {
goto KEEP;
}
//no HTTP match
goto DROP;
//keep the packet and send it to userspace returning -1
KEEP:
return -1;
//drop the packet returning 0
DROP:
return 0;
}
\ No newline at end of file
#!/usr/bin/python
#
#Bertrone Matteo - Polytechnic of Turin
#November 2015
#
#eBPF application that parses HTTP packets
#and extracts (and prints on screen) the URL contained in the GET/POST request.
#
#eBPF program http_filter is used as SOCKET_FILTER attached to eth0 interface.
#only packets of type IP and TCP containing HTTP GET/POST are returned to userspace; others are dropped
#
#python script uses bcc BPF Compiler Collection by iovisor (https://github.com/iovisor/bcc)
#and prints on stdout the first line of the HTTP GET/POST request containing the url
from __future__ import print_function
from bcc import BPF
import sys
import socket
import os
# initialize BPF - load source code from http-parse.c
bpf = BPF(src_file = "http-parse.c",debug = 0)
#load eBPF program http_filter of type SOCKET_FILTER into the kernel eBPF vm
#more info about eBPF program types
#http://man7.org/linux/man-pages/man2/bpf.2.html
function_http_filter = bpf.load_func("http_filter", BPF.SOCKET_FILTER)
#create raw socket, bind it to eth0
#attach bpf program to socket created
BPF.attach_raw_socket(function_http_filter, "eth0")
#get file descriptor of the socket previously created inside BPF.attach_raw_socket
socket_fd = function_http_filter.sock
#create python socket object, from the file descriptor
sock = socket.fromfd(socket_fd,socket.PF_PACKET,socket.SOCK_RAW,socket.IPPROTO_IP)
#set it as blocking socket
sock.setblocking(True)
while 1:
  #retrieve raw packet from socket
  packet_str = os.read(socket_fd,2048)

  #DEBUG - print raw packet in hex format
  #packet_hex = toHex(packet_str)
  #print ("%s" % packet_hex)

  #convert packet into bytearray
  packet_bytearray = bytearray(packet_str)

  #ethernet header length
  ETH_HLEN = 14

  #IP HEADER
  #https://tools.ietf.org/html/rfc791
  # 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  #+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  #|Version|  IHL  |Type of Service|          Total Length         |
  #+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  #
  #IHL : Internet Header Length is the length of the internet header
  #value to multiply * 4 byte
  #e.g. IHL = 5 ; IP Header Length = 5 * 4 byte = 20 byte
  #
  #Total length: This 16-bit field defines the entire packet size,
  #including header and data, in bytes.

  #calculate packet total length
  total_length = packet_bytearray[ETH_HLEN + 2]                #load MSB
  total_length = total_length << 8                             #shift MSB
  total_length = total_length + packet_bytearray[ETH_HLEN+3]   #add LSB

  #calculate ip header length
  ip_header_length = packet_bytearray[ETH_HLEN]                #load Byte
  ip_header_length = ip_header_length & 0x0F                   #mask bits 0..3
  ip_header_length = ip_header_length << 2                     #shift to obtain length

  #TCP HEADER
  #https://www.rfc-editor.org/rfc/rfc793.txt
  #  12              13              14              15
  # 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  #+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  #|  Data |           |U|A|P|R|S|F|                               |
  #| Offset| Reserved  |R|C|S|S|Y|I|            Window             |
  #|       |           |G|K|H|T|N|N|                               |
  #+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  #
  #Data Offset: This indicates where the data begins.
  #The TCP header is an integral number of 32 bits long.
  #value to multiply * 4 byte
  #e.g. DataOffset = 5 ; TCP Header Length = 5 * 4 byte = 20 byte

  #calculate tcp header length
  tcp_header_length = packet_bytearray[ETH_HLEN + ip_header_length + 12]  #load Byte
  tcp_header_length = tcp_header_length & 0xF0                            #mask bit 4..7
  tcp_header_length = tcp_header_length >> 2                              #SHR 4 ; SHL 2 -> SHR 2

  #calculate payload offset
  payload_offset = ETH_HLEN + ip_header_length + tcp_header_length

  #print first line of the HTTP GET/POST request
  #line ends with 0x0D 0x0A (\r\n)
  #(if we want to print all the header print until \r\n\r\n)
  for i in range (payload_offset-1,len(packet_bytearray)-1):
    if (packet_bytearray[i] == 0x0A):
      if (packet_bytearray[i-1] == 0x0D):
        break
    print ("%c" % chr(packet_bytearray[i]), end = "")
  print("")
......@@ -21,5 +21,5 @@ between 128 and 255 Kbytes in size, and another mode of 211 I/O were between
4 and 7 Kbytes in size.
Understanding this distribution is useful for characterizing workloads and
understanding performance. The existance of this distribution is not visible
understanding performance. The existence of this distribution is not visible
from averages alone.
......@@ -65,7 +65,7 @@ Show a latency histogram for each disk device separately:
usecs
Microsecond range
.TP
mecs
msecs
Millisecond range
.TP
count
......@@ -74,7 +74,7 @@ How many I/O fell into this range
distribution
An ASCII bar chart to visualize the distribution (count column)
.SH OVERHEAD
This traces kernel functions and maintains in-kernel timestamps and a histgroam,
This traces kernel functions and maintains in-kernel timestamps and a histogram,
which are asynchronously copied to user-space. This method is very efficient,
and the overhead for most storage I/O rates (< 10k IOPS) should be negligible.
If you have a higher IOPS storage environment, test and quantify the overhead
......
.TH biotop 8 "2016-02-06" "USER COMMANDS"
.SH NAME
biotop \- Block device (disk) I/O by process top.
.SH SYNOPSIS
.B biotop [\-h] [\-C] [\-r MAXROWS] [interval] [count]
.SH DESCRIPTION
This is top for disks.
This traces block device I/O (disk I/O), and prints a per-process summary every
interval (by default, 1 second). The summary is sorted on the top disk
consumers by throughput (Kbytes). The PID and process name shown are measured
from when the I/O was first created, which usually identifies the responsible
process.
For efficiency, this uses in-kernel eBPF maps to cache process details (PID and
comm) by I/O request, as well as a starting timestamp for calculating I/O
latency, and the final summary.
This works by tracing various kernel blk_*() functions using dynamic tracing,
and will need updating to match any changes to these functions.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bcc.
.SH OPTIONS
.TP
\-C
Don't clear the screen.
.TP
\-r MAXROWS
Maximum number of rows to print. Default is 20.
.TP
\-p PID
Trace this PID only.
.TP
interval
Interval between updates, seconds.
.TP
count
Number of interval summaries.
.SH EXAMPLES
.TP
Summarize block device I/O by process, 1 second screen refresh:
#
.B biotop
.TP
Don't clear the screen:
#
.B biotop -C
.TP
5 second summaries, 10 times only:
#
.B biotop 5 10
.SH FIELDS
.TP
loadavg:
The contents of /proc/loadavg
.TP
PID
Cached process ID, if present. This usually, but not always, identifies the
process responsible for the I/O.
.TP
COMM
Cached process name, if present. This usually, but not always, identifies the
process responsible for the I/O.
.TP
D
Direction: R == read, W == write.
.TP
MAJ
Major device number.
.TP
MIN
Minor device number.
.TP
DISK
Disk device name.
.TP
I/O
Number of I/O during the interval.
.TP
Kbytes
Total Kbytes for these I/O, during the interval.
.TP
AVGms
Average time for the I/O (latency) from the issue to the device, to its
completion, in milliseconds.
.SH OVERHEAD
Since block device I/O usually has a relatively low frequency (< 10,000/s),
the overhead for this tool is expected to be low or negligible. For high IOPS
storage systems, test and quantify before use.
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.PP
Also look in the bcc distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH INSPIRATION
top(1) by William LeFebvre
.SH SEE ALSO
biosnoop(8), biolatency(8), iostat(1)
......@@ -34,7 +34,7 @@ Print output every five seconds, three times:
#
.B cachestat 5 3
.TP
Print output with timetsmap every five seconds, three times:
Print output with timestamp every five seconds, three times:
#
.B cachestat -T 5 3
.SH FIELDS
......
.TH dcsnoop 8 "2016-02-10" "USER COMMANDS"
.SH NAME
dcsnoop \- Trace directory entry cache (dcache) lookups. Uses Linux eBPF/bcc.
.SH SYNOPSIS
.B dcsnoop [\-h] [\-a]
.SH DESCRIPTION
By default, this traces every failed dcache lookup (cache miss), and shows the
process performing the lookup and the filename requested. A \-a option can be
used to show all lookups, not just failed ones.
The output of this tool can be verbose, and is intended for further
investigations of dcache performance beyond dcstat(8), which prints
per-second summaries.
This uses kernel dynamic tracing of the d_lookup() function, and will need
updating to match any changes to this function.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bcc.
.SH OPTIONS
.TP
\-h
Print usage message.
.TP
\-a
Trace references, not just failed lookups.
.SH EXAMPLES
.TP
Trace failed dcache lookups:
#
.B dcsnoop
.TP
Trace all dcache lookups:
#
.B dcsnoop \-a
.SH FIELDS
.TP
TIME(s)
Time of lookup, in seconds.
.TP
PID
Process ID.
.TP
COMM
Process name.
.TP
T
Type: R == reference (only visible with \-a), M == miss. A miss will print two
lines, one for the reference, and one for the miss.
.TP
FILE
The file name component that was being looked up. This contains trailing
pathname components (after '/'), which will be the subject of subsequent
lookups.
.SH OVERHEAD
File name lookups can be frequent (depending on the workload), and this tool
prints a line for each failed lookup, and with \-a, each reference as well. The
output may be verbose, and the incurred overhead, while optimized to some
extent, may still range from noticeable to significant. This is only really
intended for deeper investigations beyond dcstat(8), when absolutely necessary.
Measure and quantify the overhead in a test environment before use.
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.PP
Also look in the bcc distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
dcstat(8)
.TH dcstat 8 "2016-02-09" "USER COMMANDS"
.SH NAME
dcstat \- Directory entry cache (dcache) stats. Uses Linux eBPF/bcc.
.SH SYNOPSIS
.B dcstat
[interval [count]]
.SH DESCRIPTION
The Linux directory entry cache (dcache) improves the performance of file and
directory name lookups. This tool provides per-second summary statistics of
dcache performance.
This uses kernel dynamic tracing of kernel functions, lookup_fast() and
d_lookup(), which will need to be modified to match kernel changes.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bcc.
.SH EXAMPLES
.TP
Print summaries each second:
#
.B dcstat
.TP
Print output every five seconds, three times:
#
.B dcstat 5 3
.SH FIELDS
.TP
REFS/s
Number of dcache lookups (references) per second.
.TP
SLOW/s
Number of dcache lookups that failed the lookup_fast() path and executed the
lookup_slow() path instead.
.TP
MISS/s
Number of dcache misses (failed both fast and slow lookups).
.TP
HIT%
Percentage of dcache hits over total references.
.SH OVERHEAD
The overhead depends on the frequency of file and directory name lookups.
While the per-event overhead is low, some applications may make over 100k
lookups per second, at which point the low per-event overhead adds up and
could become measurable (over 10% CPU usage). Measure in a test
environment.
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.PP
Also look in the bcc distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
dcsnoop(8)
.TH execsnoop 8 "2016-02-07" "USER COMMANDS"
.SH NAME
execsnoop \- Trace new processes via exec() syscalls. Uses Linux eBPF/bcc.
.SH SYNOPSIS
.B execsnoop [\-h] [\-t] [\-x] [\-n NAME]
.SH DESCRIPTION
execsnoop traces new processes, showing the filename executed and argument
list.
It works by tracing the execve() system call (the commonly used exec() variant).
This catches new processes that follow the fork->exec sequence, as well as
processes that re-exec() themselves. Some applications fork() but do not
exec(), eg, for worker processes, which won't be included in the execsnoop
output.
This works by tracing the kernel sys_execve() function using dynamic tracing,
and will need updating to match any changes to this function.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bcc.
.SH OPTIONS
.TP
\-h
Print usage message.
.TP
\-t
Include a timestamp column.
.TP
\-x
Include failed exec()s
.TP
\-n NAME
Only print command lines matching this name (regex), matched anywhere
.SH EXAMPLES
.TP
Trace all exec() syscalls:
#
.B execsnoop
.TP
Trace all exec() syscalls, and include timestamps:
#
.B execsnoop \-t
.TP
Include failed exec()s:
#
.B execsnoop \-x
.TP
Only trace exec()s where the filename or arguments contain "mount":
#
.B execsnoop \-n mount
.SH FIELDS
.TP
TIME(s)
Time of exec() return, in seconds.
.TP
PCOMM
Parent process/command name.
.TP
PID
Process ID
.TP
RET
Return value of exec(). 0 == success. Failures are only shown when using the
\-x option.
.TP
ARGS
Filename for the exec(), followed by up to 19 arguments. An ellipsis "..." is
shown if the argument list is known to be truncated.
.SH OVERHEAD
This traces the kernel execve function and prints output for each event. As the
rate of this is generally expected to be low (< 1000/s), the overhead is also
expected to be negligible. If you have an application that is calling a high
rate of exec()s, then test and understand overhead before use.
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.PP
Also look in the bcc distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
opensnoop(1)
.TH filelife 8 "2016-02-08" "USER COMMANDS"
.SH NAME
filelife \- Trace the lifespan of short-lived files. Uses Linux eBPF/bcc.
.SH SYNOPSIS
.B filelife [\-h] [\-p PID]
.SH DESCRIPTION
This traces the creation and deletion of files, providing information
on who deleted the file, the file age, and the file name. The intent is to
provide information on short-lived files, for debugging or performance
analysis.
This works by tracing the kernel vfs_create() and vfs_unlink() functions using
dynamic tracing, and will need updating to match any changes to these
functions.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bcc.
.SH OPTIONS
.TP
\-h
Print usage message.
.TP
\-p PID
Trace this process ID only (filtered in-kernel).
.SH EXAMPLES
.TP
Trace all short-lived files, and print details:
#
.B filelife
.TP
Trace all short-lived files created AND deleted by PID 181:
#
.B filelife \-p 181
.SH FIELDS
.TP
TIME
Time of the deletion.
.TP
PID
Process ID that deleted the file.
.TP
COMM
Process name for the PID.
.TP
AGE(s)
Age of the file, from creation to deletion, in seconds.
.TP
FILE
Filename.
.SH OVERHEAD
This traces the kernel VFS file create and delete functions and prints output
for each delete. As the rate of this is generally expected to be low
(< 1000/s), the overhead is also expected to be negligible.
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.PP
Also look in the bcc distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
opensnoop(1)
.TH fileslower 8 "2016-02-07" "USER COMMANDS"
.SH NAME
fileslower \- Trace slow synchronous file reads and writes.
.SH SYNOPSIS
.B fileslower [\-h] [\-p PID] [min_ms]
.SH DESCRIPTION
This script uses kernel dynamic tracing of synchronous reads and writes
at the VFS interface, to identify slow file reads and writes for any file
system.
This version traces __vfs_read() and __vfs_write(), and only shows
synchronous I/O (the path to new_sync_read() and new_sync_write()), and
I/O with filenames. This approach provides a view of just two file
system request types: file reads and writes. There are typically many others:
asynchronous I/O, directory operations, file handle operations, file open()s,
fflush(), etc.
WARNING: See the OVERHEAD section.
By default, a minimum millisecond threshold of 10 is used.
Since this works by tracing various kernel __vfs_*() functions using dynamic
tracing, it will need updating to match any changes to these functions. A
future version should switch to using FS tracepoints instead.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bcc.
.SH OPTIONS
.TP
\-p PID
Trace this PID only.
.TP
min_ms
Minimum I/O latency (duration) to trace, in milliseconds. Default is 10 ms.
.SH EXAMPLES
.TP
Trace synchronous file reads and writes slower than 10 ms:
#
.B fileslower
.TP
Trace slower than 1 ms:
#
.B fileslower 1
.TP
Trace slower than 1 ms, for PID 181 only:
#
.B fileslower \-p 181 1
.SH FIELDS
.TP
TIME(s)
Time of I/O completion since the first I/O seen, in seconds.
.TP
COMM
Process name.
.TP
PID
Process ID.
.TP
D
Direction of I/O. R == read, W == write.
.TP
BYTES
Size of I/O, in bytes.
.TP
LAT(ms)
Latency (duration) of I/O, measured from when the application issued it to VFS
to when it completed. This time is inclusive of block device I/O, file system
CPU cycles, file system locks, run queue latency, etc. It's a more accurate
measure of the latency suffered by applications performing file system I/O,
than to measure this down at the block device interface.
.TP
FILENAME
A cached kernel file name (comes from dentry->d_iname).
.SH OVERHEAD
Depending on the frequency of application reads and writes, overhead can become
severe, in the worst case slowing applications by 2x. In the best case, the
overhead is negligible. Hopefully for real world workloads the overhead is
often at the lower end of the spectrum -- test before use. The reason for
high overhead is that this traces VFS reads and writes, which includes FS
cache reads and writes, and can exceed one million events per second if the
application is I/O heavy. While the instrumentation is extremely lightweight,
and uses in-kernel eBPF maps for efficient timing and filtering, multiply that
cost by one million events per second and that cost becomes a million times
worse. You can get an idea of the possible cost by just counting the
instrumented events using the bcc funccount tool, eg:
.PP
# ./funccount.py -i 1 -r '^__vfs_(read|write)$'
.PP
This also costs overhead, but is somewhat less than fileslower.
.PP
If the overhead is prohibitive for your workload, I'd recommend moving
down-stack a little from VFS into the file system functions (ext4, xfs, etc).
Look for updates to bcc for specific file system tools that do this. The
advantage of a per-file system approach is that we can trace post-cache,
greatly reducing events and overhead. The disadvantage is needing custom
tracing approaches for each different file system (whereas VFS is generic).
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.PP
Also look in the bcc distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
biosnoop(8), funccount(8)
.TH filetop 8 "2016-02-08" "USER COMMANDS"
.SH NAME
filetop \- File reads and writes by filename and process. Top for files.
.SH SYNOPSIS
.B filetop [\-h] [\-C] [\-r MAXROWS] [\-p PID] [interval] [count]
.SH DESCRIPTION
This is top for files.
This traces file reads and writes, and prints a per-file summary every
interval (by default, 1 second). The summary is sorted on the highest read
throughput (Kbytes).
This uses in-kernel eBPF maps to store per process summaries for efficiency.
This script works by tracing the __vfs_read() and __vfs_write() functions using
kernel dynamic tracing, which instruments explicit read and write calls. If
files are read or written using another means (eg, via mmap()), then they
will not be visible using this tool. Also, this tool will need updating to
match any code changes to those vfs functions.
This should be useful for file system workload characterization when analyzing
the performance of applications.
Note that tracing VFS level reads and writes can be a frequent activity, and
this tool can begin to cost measurable overhead at high I/O rates.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bcc.
.SH OPTIONS
.TP
\-C
Don't clear the screen.
.TP
\-r MAXROWS
Maximum number of rows to print. Default is 20.
.TP
\-p PID
Trace this PID only.
.TP
interval
Interval between updates, seconds.
.TP
count
Number of interval summaries.
.SH EXAMPLES
.TP
Summarize file I/O by process, 1 second screen refresh:
#
.B filetop
.TP
Don't clear the screen, and top 8 rows only:
#
.B filetop -Cr 8
.TP
5 second summaries, 10 times only:
#
.B filetop 5 10
.SH FIELDS
.TP
loadavg:
The contents of /proc/loadavg
.TP
PID
Process ID.
.TP
COMM
Process name.
.TP
READS
Count of reads during interval.
.TP
WRITES
Count of writes during interval.
.TP
R_Kb
Total read Kbytes during interval.
.TP
W_Kb
Total write Kbytes during interval.
.TP
T
Type of file: R == regular, S == socket, O == other (pipe, etc).
.SH OVERHEAD
Depending on the frequency of application reads and writes, overhead can become
significant, in the worst case slowing applications by over 50%. Hopefully for
real world workloads the overhead is much less -- test before use. The reason
for the high overhead is that VFS reads and writes can be a frequent event, and
despite the eBPF overhead being very small per event, if you multiply this
small overhead by a million events per second, it becomes a million times
worse. Literally. You can gauge the number of reads and writes using the
vfsstat(8) tool, also from bcc.
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.PP
Also look in the bcc distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH INSPIRATION
top(1) by William LeFebvre
.SH SEE ALSO
vfsstat(8), vfscount(8), fileslower(8)
.TH funclatency 8 "2015-08-18" "USER COMMANDS"
.SH NAME
funclatency \- Time kernel funcitons and print latency as a histogram.
funclatency \- Time kernel functions and print latency as a histogram.
.SH SYNOPSIS
.B funclatency [\-h] [\-p PID] [\-i INTERVAL] [\-T] [\-u] [\-m] [\-r] [\-F] pattern
.SH DESCRIPTION
......@@ -88,7 +88,7 @@ Nanosecond range
usecs
Microsecond range
.TP
mecs
msecs
Millisecond range
.TP
count
......@@ -97,7 +97,7 @@ How many calls fell into this range
distribution
An ASCII bar chart to visualize the distribution (count column)
.SH OVERHEAD
This traces kernel functions and maintains in-kernel timestamps and a histgroam,
This traces kernel functions and maintains in-kernel timestamps and a histogram,
which are asynchronously copied to user-space. While this method is very
efficient, the rate of kernel functions can also be very high (>1M/sec), at
which point the overhead is expected to be measurable. Measure in a test
......
......@@ -73,7 +73,7 @@ This traces kernel functions and maintains in-kernel counts, which
are asynchronously copied to user-space. While the rate of interrupts
can be very high (>1M/sec), this is a relatively efficient way to trace these
events, and so the overhead is expected to be small for normal workloads, but
could become noticable for heavy workloads. Measure in a test environment
could become noticeable for heavy workloads. Measure in a test environment
before use.
.SH SOURCE
This is from bcc.
......
.TH memleak 8 "2016-01-14" "USER COMMANDS"
.SH NAME
memleak \- Print a summary of outstanding allocations and their call stacks to detect memory leaks. Uses Linux eBPF/bcc.
.SH SYNOPSIS
.B memleak [-h] [-p PID] [-t] [-a] [-o OLDER] [-c COMMAND] [-s SAMPLE_RATE]
[-d STACK_DEPTH] [-T TOP] [-z MIN_SIZE] [-Z MAX_SIZE] [INTERVAL] [COUNT]
.SH DESCRIPTION
memleak traces and matches memory allocation and deallocation requests, and
collects call stacks for each allocation. memleak can then print a summary
of which call stacks performed allocations that weren't subsequently freed.
When tracing a specific process, memleak instruments malloc and free from libc.
When tracing all processes, memleak instruments kmalloc and kfree.
memleak may introduce significant overhead when tracing processes that allocate
and free many blocks very quickly. See the OVERHEAD section below.
The stack depth is limited to 10 by default (+1 for the current instruction pointer),
but it can be controlled using the \-d switch if deeper stacks are required.
This currently only works on x86_64. Check for future versions.
.SH REQUIREMENTS
CONFIG_BPF and bcc.
.SH OPTIONS
.TP
\-h
Print usage message.
.TP
\-p PID
Trace this process ID only (filtered in-kernel). This traces malloc and free from libc.
.TP
\-t
Print a trace of all allocation and free requests and results.
.TP
\-a
Print a list of allocations that weren't freed (and their sizes) in addition to their call stacks.
.TP
\-o OLDER
Print only allocations older than OLDER milliseconds. Useful to remove false positives.
The default value is 500 milliseconds.
.TP
\-c COMMAND
Run the specified command and trace its allocations only. This traces malloc and free from libc.
.TP
\-s SAMPLE_RATE
Record roughly every SAMPLE_RATE-th allocation to reduce overhead.
.TP
\-d STACK_DEPTH
Capture STACK_DEPTH frames (or less) when obtaining allocation call stacks.
The default value is 10.
.TP
\-T TOP
Print only the top TOP stacks (sorted by size).
The default value is 10.
.TP
\-z MIN_SIZE
Capture only allocations that are larger than or equal to MIN_SIZE bytes.
.TP
\-Z MAX_SIZE
Capture only allocations that are smaller than or equal to MAX_SIZE bytes.
.TP
INTERVAL
Print a summary of outstanding allocations and their call stacks every INTERVAL seconds.
The default interval is 5 seconds.
.TP
COUNT
Print the outstanding allocations summary COUNT times and then exit.
.SH EXAMPLES
.TP
Print outstanding kernel allocation stacks every 3 seconds:
#
.B memleak 3
.TP
Print user outstanding allocation stacks and allocation details for the process 1005:
#
.B memleak -p 1005 -a
.TP
Sample roughly every 5th allocation (~20%) of the call stacks and print the top 5
stacks 10 times before quitting.
#
.B memleak -s 5 --top=5 10
.TP
Run ./allocs and print outstanding allocation stacks for that process:
#
.B memleak -c "./allocs"
.TP
Capture only allocations between 16 and 32 bytes in size:
#
.B memleak -z 16 -Z 32
.SH OVERHEAD
memleak can have significant overhead if the target process or kernel performs
allocations at a very high rate. Pathological cases may exhibit up to 100x
degradation in running time. Most of the time, however, memleak shouldn't cause
a significant slowdown. You can use the \-s switch to reduce the overhead
further by capturing only every N-th allocation. The \-z and \-Z switches can
also reduce overhead by capturing only allocations of specific sizes.
To determine the rate at which your application is calling malloc/free, or the
rate at which your kernel is calling kmalloc/kfree, place a probe with perf and
collect statistics. For example, to determine how many calls to __kmalloc are
placed in a typical period of 10 seconds:
#
.B perf probe '__kmalloc'
#
.B perf stat -a -e 'probe:__kmalloc' -- sleep 10
Another setting that may help reduce overhead is lowering the number of stack
frames captured and parsed by memleak for each allocation, using the \-d switch.
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.PP
Also look in the bcc distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Sasha Goldshtein
.TH oomkill 8 "2016-02-09" "USER COMMANDS"
.SH NAME
oomkill \- Trace oom_kill_process(). Uses Linux eBPF/bcc.
.SH SYNOPSIS
.B oomkill
.SH DESCRIPTION
This traces the kernel out-of-memory killer, and prints basic details,
including the system load averages at the time of the OOM kill. This can
provide more context on the system state at the time: was it getting busier
or steady, based on the load averages? This tool may also be useful to
customize for investigations; for example, by adding other task_struct
details at the time of OOM.
This program is also a basic example of eBPF/bcc.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bcc.
.SH EXAMPLES
.TP
Trace OOM kill events:
#
.B oomkill
.SH FIELDS
.TP
Triggered by ...
The process ID and process name of the task that was running when another task was OOM
killed.
.TP
OOM kill of ...
The process ID and name of the target process that was OOM killed.
.TP
loadavg
Contents of /proc/loadavg. The first three numbers are 1, 5, and 15 minute
load averages (where the average is an exponentially damped moving sum, and
those numbers are constants in the equation); then there is the number of
running tasks, a slash, and the total number of tasks; and then the last number
is the last PID to be created.
.SH OVERHEAD
Negligible.
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.PP
Also look in the bcc distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
memleak(8)
.TH runqlat 8 "2016-02-07" "USER COMMANDS"
.SH NAME
runqlat \- Run queue (scheduler) latency as a histogram.
.SH SYNOPSIS
.B runqlat [\-h] [\-T] [\-m] [\-P] [\-p PID] [interval] [count]
.SH DESCRIPTION
This measures the time a task spends waiting on a run queue (or equivalent
scheduler data structure) for a turn on-CPU, and shows this time as a
histogram. This time should be small, but a task may need to wait its turn due
to CPU load. The higher the CPU load, the longer a task will generally need to
wait its turn.
This tool measures two types of run queue latency:
1. The time from a task being enqueued on a run queue to its context switch
and execution. This traces enqueue_task_*() -> finish_task_switch(),
and instruments the run queue latency after a voluntary context switch.
2. The time from when a task was involuntarily context switched and still
in the runnable state, to when it next executed. This is instrumented
from finish_task_switch() alone.
This tool uses in-kernel eBPF maps for storing timestamps and the histogram,
for efficiency. Despite this, the overhead of this tool may become significant
for some workloads: see the OVERHEAD section.
This works by tracing various kernel scheduler functions using dynamic tracing,
and will need updating to match any changes to these functions.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bcc.
.SH OPTIONS
.TP
\-h
Print usage message.
.TP
\-T
Include timestamps on output.
.TP
\-m
Output histogram in milliseconds.
.TP
\-P
Print a histogram for each PID.
.TP
\-p PID
Only show this PID (filtered in kernel for efficiency).
.TP
interval
Output interval, in seconds.
.TP
count
Number of outputs.
.SH EXAMPLES
.TP
Summarize run queue latency as a histogram:
#
.B runqlat
.TP
Print 1 second summaries, 10 times:
#
.B runqlat 1 10
.TP
Print 1 second summaries, using milliseconds as units for the histogram, and include timestamps on output:
#
.B runqlat \-mT 1
.TP
Trace PID 185 only, 1 second summaries:
#
.B runqlat \-p 185 1
.SH FIELDS
.TP
usecs
Microsecond range
.TP
msecs
Millisecond range
.TP
count
How many times a task event fell into this range
.TP
distribution
An ASCII bar chart to visualize the distribution (count column)
.SH OVERHEAD
This traces scheduler functions, which can become very frequent. While eBPF
has very low overhead, and this tool uses in-kernel maps for efficiency, the
frequency of scheduler events for some workloads may be high enough that the
overhead of this tool becomes significant. Measure in a lab environment
to quantify the overhead before use.
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.PP
Also look in the bcc distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
pidstat(1)
......@@ -74,7 +74,7 @@ This traces kernel functions and maintains in-kernel counts, which
are asynchronously copied to user-space. While the rate of interrupts
can be very high (>1M/sec), this is a relatively efficient way to trace these
events, and so the overhead is expected to be small for normal workloads, but
could become noticable for heavy workloads. Measure in a test environment
could become noticeable for heavy workloads. Measure in a test environment
before use.
.SH SOURCE
This is from bcc.
......
......@@ -7,7 +7,7 @@ stackcount \- Count kernel function calls and their stack traces. Uses Linux eBP
stackcount traces kernel functions and frequency counts them with their entire
kernel stack trace, summarized in-kernel for efficiency. This allows higher
frequency events to be studied. The output consists of unique stack traces,
and their occurance counts.
and their occurrence counts.
The pattern is a string with optional '*' wildcards, similar to file globbing.
If you'd prefer to use regular expressions, use the \-r option.
......
......@@ -56,7 +56,7 @@ Time of the call, in seconds.
STACK
Kernel stack trace. The first column shows "ip" for instruction pointer, and
"r#" for each return pointer in the stack. The second column is the stack trace
as hexidecimal. The third column is the translated kernel symbol names.
as hexadecimal. The third column is the translated kernel symbol names.
.SH OVERHEAD
This can have significant overhead if frequently called functions (> 1000/s) are
traced, and is only intended for low frequency function calls. This is because
......
.TH statsnoop 8 "2016-02-08" "USER COMMANDS"
.SH NAME
statsnoop \- Trace stat() syscalls. Uses Linux eBPF/bcc.
.SH SYNOPSIS
.B statsnoop [\-h] [\-t] [\-x] [\-p PID]
.SH DESCRIPTION
statsnoop traces the different stat() syscalls, showing which processes are
attempting to read information about which files. This can be useful for
determining the location of config and log files, or for troubleshooting
applications that are failing, especially on startup.
This works by tracing various kernel sys_stat() functions using dynamic
tracing, and will need updating to match any changes to these functions.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bcc.
.SH OPTIONS
.TP
\-h
Print usage message.
.TP
\-t
Include a timestamp column.
.TP
\-x
Only print failed stats.
.TP
\-p PID
Trace this process ID only (filtered in-kernel).
.SH EXAMPLES
.TP
Trace all stat() syscalls:
#
.B statsnoop
.TP
Trace all stat() syscalls, and include timestamps:
#
.B statsnoop \-t
.TP
Trace only stat() syscalls that failed:
#
.B statsnoop \-x
.TP
Trace PID 181 only:
#
.B statsnoop \-p 181
.SH FIELDS
.TP
TIME(s)
Time of the call, in seconds.
.TP
PID
Process ID
.TP
COMM
Process name
.TP
FD
File descriptor (if success), or -1 (if failed)
.TP
ERR
Error number (see the system's errno.h)
.TP
PATH
Open path
.SH OVERHEAD
This traces the kernel stat function and prints output for each event. As the
rate of this is generally expected to be low (< 1000/s), the overhead is also
expected to be negligible. If you have an application that is calling a high
rate of stat()s, then test and understand overhead before use.
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.PP
Also look in the bcc distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
opensnoop(1)
......@@ -55,7 +55,7 @@ install(TARGETS bcc LIBRARY COMPONENT libbcc
install(DIRECTORY export/ COMPONENT libbcc
DESTINATION share/bcc/include/bcc
FILES_MATCHING PATTERN "*.h")
install(FILES bpf_common.h ../libbpf.h COMPONENT libbcc
install(FILES bpf_common.h bpf_module.h ../libbpf.h COMPONENT libbcc
DESTINATION include/bcc)
install(DIRECTORY compat/linux/ COMPONENT libbcc
DESTINATION include/bcc/compat/linux
......
......@@ -148,6 +148,19 @@ static void parse_type(IRBuilder<> &B, vector<Value *> *args, string *fmt,
*fmt += " ";
}
*fmt += "}";
} else if (ArrayType *at = dyn_cast<ArrayType>(type)) {
*fmt += "[ ";
for (size_t i = 0; i < at->getNumElements(); ++i) {
parse_type(B, args, fmt, at->getElementType(), B.CreateStructGEP(type, out, i), is_writer);
*fmt += " ";
}
*fmt += "]";
} else if (PointerType *pt = dyn_cast<PointerType>(type)) {
*fmt += "0xl";
if (is_writer)
*fmt += "x";
else
*fmt += "i";
} else if (IntegerType *it = dyn_cast<IntegerType>(type)) {
if (is_writer)
*fmt += "0x";
......
......@@ -122,6 +122,8 @@ static int (*bpf_probe_read)(void *dst, u64 size, void *unsafe_ptr) =
(void *) BPF_FUNC_probe_read;
static u64 (*bpf_ktime_get_ns)(void) =
(void *) BPF_FUNC_ktime_get_ns;
static u32 (*bpf_get_prandom_u32) =
(void *) BPF_FUNC_get_prandom_u32;
static int (*bpf_trace_printk_)(const char *fmt, u64 fmt_size, ...) =
(void *) BPF_FUNC_trace_printk;
int bpf_trace_printk(const char *fmt, ...) asm("llvm.bpf.extra");
......
......@@ -113,7 +113,7 @@ several types of hooks available:
through the socket/interface.
EBPF programs can be used for many purposes; the main use cases are
dynamic tracing and monitoring, and packet procesisng. We are mostly
dynamic tracing and monitoring, and packet processing. We are mostly
interested in the latter use case in this document.
#### EBPF Tables
......@@ -219,7 +219,7 @@ very complex packet filters and simple packet forwarding engines. In
the spirit of open-source "release early, release often", we expect
that the compiler's capabilities will improve gradually.
* Packet filtering is peformed using the `drop()` action. Packets
* Packet filtering is performed using the `drop()` action. Packets
that are not dropped will be forwarded.
* Packet forwarding is performed by setting the
......@@ -233,7 +233,7 @@ Here are some limitations imposed on the P4 programs:
EBPF program). In the future the compiler should probably generate
two separate EBPF programs.
* arbirary parsers can be compiled, but the BCC compiler will reject
* arbitrary parsers can be compiled, but the BCC compiler will reject
parsers that contain cycles
* arithmetic on data wider than 32 bits is not supported
......@@ -311,7 +311,7 @@ p4toEbpf.py file.p4 -o file.c
The P4 compiler first runs the C preprocessor on the input P4 file.
Some of the command-line options are passed directly to the
preprocesor.
preprocessor.
The following compiler options are available:
......
......@@ -286,6 +286,30 @@ class BPF(object):
if res < 0:
raise KeyError
# override the MutableMapping's implementation of these since they
# don't handle KeyError nicely
def itervalues(self):
for key in self:
# a map entry may be deleted in between discovering the key and
# fetching the value, suppress such errors
try:
yield self[key]
except KeyError:
pass
def iteritems(self):
for key in self:
try:
yield (key, self[key])
except KeyError:
pass
def items(self):
return [item for item in self.iteritems()]
def values(self):
return [value for value in self.itervalues()]
def clear(self):
# default clear uses popitem, which can race with the bpf prog
for k in self.keys():
......
......@@ -3,6 +3,7 @@
# Licensed under the Apache License, Version 2.0 (the "License")
from bcc import BPF
import ctypes
from unittest import main, TestCase
class TestClang(TestCase):
......@@ -89,6 +90,22 @@ int foo(void *ctx) {
self.assertEqual(l.s.a, 5)
self.assertEqual(l.s.b, 6)
def test_sscanf_array(self):
text = """
BPF_TABLE("hash", int, struct { u32 a[3]; u32 b; }, stats, 10);
"""
b = BPF(text=text, debug=0)
t = b.get_table("stats")
s1 = t.key_sprintf(t.Key(2))
self.assertEqual(s1, b"0x2")
s2 = t.leaf_sprintf(t.Leaf((ctypes.c_uint * 3)(1,2,3), 4))
self.assertEqual(s2, b"{ [ 0x1 0x2 0x3 ] 0x4 }")
l = t.leaf_scanf(s2)
self.assertEqual(l.a[0], 1)
self.assertEqual(l.a[1], 2)
self.assertEqual(l.a[2], 3)
self.assertEqual(l.b, 4)
def test_iosnoop(self):
text = """
#include <linux/blkdev.h>
......
......@@ -41,7 +41,7 @@ the last row printed, for which there were 2 I/O.
For efficiency, biolatency uses an in-kernel eBPF map to store timestamps
with requests, and another in-kernel map to store the histogram (the "count")
column, which is copied to user-space only when output is printed. These
methods lower the perormance overhead when tracing is performed.
methods lower the performance overhead when tracing is performed.
In the following example, the -m option is used to print a histogram using
......
......@@ -21,23 +21,21 @@ b = BPF(text="""
#include <linux/blkdev.h>
struct val_t {
u32 pid;
char name[TASK_COMM_LEN];
};
BPF_HASH(start, struct request *);
BPF_HASH(pidbyreq, struct request *, u32);
BPF_HASH(commbyreq, struct request *, struct val_t);
BPF_HASH(infobyreq, struct request *, struct val_t);
// cache PID and comm by-req
int trace_pid_start(struct pt_regs *ctx, struct request *req)
{
u32 pid;
struct val_t val = {};
pid = bpf_get_current_pid_tgid();
pidbyreq.update(&req, &pid);
if (bpf_get_current_comm(&val.name, sizeof(val.name)) == 0) {
commbyreq.update(&req, &val);
val.pid = bpf_get_current_pid_tgid();
infobyreq.update(&req, &val);
}
return 0;
......@@ -74,12 +72,11 @@ int trace_req_completion(struct pt_regs *ctx, struct request *req)
// As bpf_trace_prink() is limited to a maximum of 1 string and 2
// integers, we'll use more than one to output the data.
//
valp = commbyreq.lookup(&req);
pidp = pidbyreq.lookup(&req);
if (pidp == 0 || valp == 0) {
valp = infobyreq.lookup(&req);
if (valp == 0) {
bpf_trace_printk("0 0 ? %d\\n", req->__data_len);
} else {
bpf_trace_printk("0 %d %s %d\\n", *pidp, valp->name,
bpf_trace_printk("0 %d %s %d\\n", valp->pid, valp->name,
req->__data_len);
}
......@@ -93,8 +90,7 @@ int trace_req_completion(struct pt_regs *ctx, struct request *req)
}
start.delete(&req);
pidbyreq.delete(&req);
commbyreq.delete(&req);
infobyreq.delete(&req);
return 0;
}
......
#!/usr/bin/python
# @lint-avoid-python-3-compatibility-imports
#
# biotop block device (disk) I/O by process.
# For Linux, uses BCC, eBPF.
#
# USAGE: biotop.py [-h] [-C] [-r MAXROWS] [interval] [count]
#
# This uses in-kernel eBPF maps to cache process details (PID and comm) by I/O
# request, as well as a starting timestamp for calculating I/O latency.
#
# Copyright 2016 Netflix, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
#
# 06-Feb-2016 Brendan Gregg Created this.
from __future__ import print_function
from bcc import BPF
from time import sleep, strftime
import argparse
import signal
from subprocess import call
# arguments
examples = """examples:
./biotop # block device I/O top, 1 second refresh
./biotop -C # don't clear the screen
./biotop 5 # 5 second summaries
./biotop 5 10 # 5 second summaries, 10 times only
"""
parser = argparse.ArgumentParser(
description="Block device (disk) I/O by process",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=examples)
parser.add_argument("-C", "--noclear", action="store_true",
help="don't clear the screen")
parser.add_argument("-r", "--maxrows", default=20,
help="maximum rows to print, default 20")
parser.add_argument("interval", nargs="?", default=1,
help="output interval, in seconds")
parser.add_argument("count", nargs="?", default=99999999,
help="number of outputs")
args = parser.parse_args()
interval = int(args.interval)
countdown = int(args.count)
maxrows = int(args.maxrows)
clear = not int(args.noclear)
# linux stats
loadavg = "/proc/loadavg"
diskstats = "/proc/diskstats"
# signal handler
def signal_ignore(signal, frame):
print()
# load BPF program
b = BPF(text="""
#include <uapi/linux/ptrace.h>
#include <linux/blkdev.h>
// for saving process info by request
struct who_t {
u32 pid;
char name[TASK_COMM_LEN];
};
// the key for the output summary
struct info_t {
u32 pid;
int type;
int major;
int minor;
char name[TASK_COMM_LEN];
};
// the value of the output summary
struct val_t {
u64 bytes;
u64 us;
u32 io;
};
BPF_HASH(start, struct request *);
BPF_HASH(whobyreq, struct request *, struct who_t);
BPF_HASH(counts, struct info_t, struct val_t);
// cache PID and comm by-req
int trace_pid_start(struct pt_regs *ctx, struct request *req)
{
struct who_t who = {};
if (bpf_get_current_comm(&who.name, sizeof(who.name)) == 0) {
who.pid = bpf_get_current_pid_tgid();
whobyreq.update(&req, &who);
}
return 0;
}
// time block I/O
int trace_req_start(struct pt_regs *ctx, struct request *req)
{
u64 ts;
ts = bpf_ktime_get_ns();
start.update(&req, &ts);
return 0;
}
// output
int trace_req_completion(struct pt_regs *ctx, struct request *req)
{
u64 *tsp;
// fetch timestamp and calculate delta
tsp = start.lookup(&req);
if (tsp == 0) {
return 0; // missed tracing issue
}
struct who_t *whop;
struct val_t *valp, zero = {};
u64 delta_us = (bpf_ktime_get_ns() - *tsp) / 1000;
// setup info_t key
struct info_t info = {};
info.major = req->rq_disk->major;
info.minor = req->rq_disk->first_minor;
info.type = req->cmd_flags & REQ_WRITE;
whop = whobyreq.lookup(&req);
if (whop == 0) {
// missed pid who, save stats as pid 0
valp = counts.lookup_or_init(&info, &zero);
} else {
info.pid = whop->pid;
__builtin_memcpy(&info.name, whop->name, sizeof(info.name));
valp = counts.lookup_or_init(&info, &zero);
}
// save stats
valp->us += delta_us;
valp->bytes += req->__data_len;
valp->io++;
start.delete(&req);
whobyreq.delete(&req);
return 0;
}
""", debug=0)
b.attach_kprobe(event="blk_account_io_start", fn_name="trace_pid_start")
b.attach_kprobe(event="blk_start_request", fn_name="trace_req_start")
b.attach_kprobe(event="blk_mq_start_request", fn_name="trace_req_start")
b.attach_kprobe(event="blk_account_io_completion",
fn_name="trace_req_completion")
print('Tracing... Output every %d secs. Hit Ctrl-C to end' % interval)
# cache disk major,minor -> diskname
disklookup = {}
with open(diskstats) as stats:
for line in stats:
a = line.split()
disklookup[a[0] + "," + a[1]] = a[2]
# output
exiting = 0
while 1:
try:
sleep(interval)
except KeyboardInterrupt:
exiting = 1
# header
if clear:
call("clear")
else:
print()
with open(loadavg) as stats:
print("%-8s loadavg: %s" % (strftime("%H:%M:%S"), stats.read()))
print("%-6s %-16s %1s %-3s %-3s %-8s %5s %7s %6s" % ("PID", "COMM",
"D", "MAJ", "MIN", "DISK", "I/O", "Kbytes", "AVGms"))
# by-PID output
counts = b.get_table("counts")
line = 0
for k, v in reversed(sorted(counts.items(),
key=lambda counts: counts[1].bytes)):
# lookup disk
disk = str(k.major) + "," + str(k.minor)
if disk in disklookup:
diskname = disklookup[disk]
else:
diskname = "?"
# print line
avg_ms = (float(v.us) / 1000) / v.io
print("%-6d %-16s %1s %-3d %-3d %-8s %5s %7s %6.2f" % (k.pid, k.name,
"W" if k.type else "R", k.major, k.minor, diskname, v.io,
v.bytes / 1024, avg_ms))
line += 1
if line >= maxrows:
break
counts.clear()
countdown -= 1
if exiting or countdown == 0:
print("Detaching...")
exit()
Demonstrations of biotop, the Linux eBPF/bcc version.
Short for block device I/O top, biotop summarizes which processes are
performing disk I/O. It's top for disks. Sample output:
# ./biotop
Tracing... Output every 1 secs. Hit Ctrl-C to end
08:04:11 loadavg: 1.48 0.87 0.45 1/287 14547
PID COMM D MAJ MIN DISK I/O Kbytes AVGms
14501 cksum R 202 1 xvda1 361 28832 3.39
6961 dd R 202 1 xvda1 1628 13024 0.59
13855 dd R 202 1 xvda1 1627 13016 0.59
326 jbd2/xvda1-8 W 202 1 xvda1 3 168 3.00
1880 supervise W 202 1 xvda1 2 8 6.71
1873 supervise W 202 1 xvda1 2 8 2.51
1871 supervise W 202 1 xvda1 2 8 1.57
1876 supervise W 202 1 xvda1 2 8 1.22
1892 supervise W 202 1 xvda1 2 8 0.62
1878 supervise W 202 1 xvda1 2 8 0.78
1886 supervise W 202 1 xvda1 2 8 1.30
1894 supervise W 202 1 xvda1 2 8 3.46
1869 supervise W 202 1 xvda1 2 8 0.73
1888 supervise W 202 1 xvda1 2 8 1.48
By default the screen refreshes every 1 second, and shows the top 20 disk
consumers, sorted on total Kbytes. The first line printed is the header,
which has the time and then the contents of /proc/loadavg.
For the interval summarized by the output above, the "cksum" command performed
361 disk reads to the "xvda1" device, for a total of 28832 Kbytes, with an
average I/O time of 3.39 ms. Two "dd" processes were also reading from the
same disk, with a higher I/O rate and lower latency. While the average I/O
size is not printed, it can be determined by dividing the Kbytes column by
the I/O column.
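For example, taking the cksum line above, 28832 Kbytes over 361 I/O works out
to roughly 80 Kbytes per I/O. A minimal Python sketch of that arithmetic
(values copied from the sample output; not part of the tool):

    # average I/O size for the cksum line above: Kbytes column / I/O column
    kbytes, ios = 28832, 361
    print("avg I/O size: %.1f Kbytes" % (kbytes / float(ios)))   # ~79.9 Kbytes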
The columns through to Kbytes show the workload applied. The final column,
AVGms, shows resulting performance. Other bcc tools can be used to get more
details when needed: biolatency and biosnoop.
Many years ago I created the original "iotop", and later regretted not calling
it diskiotop or blockiotop, as "io" alone is ambiguous. This time it is biotop.
The -C option can be used to prevent the screen from clearing (my preference).
Here's using it with a 5 second interval:
# ./biotop -C 5
Tracing... Output every 5 secs. Hit Ctrl-C to end
08:09:44 loadavg: 0.42 0.44 0.39 2/282 22115
PID COMM D MAJ MIN DISK I/O Kbytes AVGms
22069 dd R 202 1 xvda1 5993 47976 0.33
326 jbd2/xvda1-8 W 202 1 xvda1 3 168 2.67
1866 svscan R 202 1 xvda1 33 132 1.24
1880 supervise W 202 1 xvda1 10 40 0.56
1873 supervise W 202 1 xvda1 10 40 0.79
1871 supervise W 202 1 xvda1 10 40 0.78
1876 supervise W 202 1 xvda1 10 40 0.68
1892 supervise W 202 1 xvda1 10 40 0.71
1878 supervise W 202 1 xvda1 10 40 0.65
1886 supervise W 202 1 xvda1 10 40 0.78
1894 supervise W 202 1 xvda1 10 40 0.80
1869 supervise W 202 1 xvda1 10 40 0.91
1888 supervise W 202 1 xvda1 10 40 0.63
22069 bash R 202 1 xvda1 1 16 19.94
9251 kworker/u16:2 W 202 16 xvdb 2 8 0.13
08:09:49 loadavg: 0.47 0.44 0.39 1/282 22231
PID COMM D MAJ MIN DISK I/O Kbytes AVGms
22069 dd R 202 1 xvda1 13450 107600 0.35
22199 cksum R 202 1 xvda1 941 45548 4.63
326 jbd2/xvda1-8 W 202 1 xvda1 3 168 2.93
24467 kworker/0:2 W 202 16 xvdb 1 64 0.28
1880 supervise W 202 1 xvda1 10 40 0.81
1873 supervise W 202 1 xvda1 10 40 0.81
1871 supervise W 202 1 xvda1 10 40 1.03
1876 supervise W 202 1 xvda1 10 40 0.76
1892 supervise W 202 1 xvda1 10 40 0.74
1878 supervise W 202 1 xvda1 10 40 0.94
1886 supervise W 202 1 xvda1 10 40 0.76
1894 supervise W 202 1 xvda1 10 40 0.69
1869 supervise W 202 1 xvda1 10 40 0.72
1888 supervise W 202 1 xvda1 10 40 1.70
22199 bash R 202 1 xvda1 2 20 0.35
482 xfsaild/md0 W 202 16 xvdb 5 13 0.27
482 xfsaild/md0 W 202 32 xvdc 2 8 0.33
31331 pickup R 202 1 xvda1 1 4 0.31
08:09:54 loadavg: 0.51 0.45 0.39 2/282 22346
PID COMM D MAJ MIN DISK I/O Kbytes AVGms
22069 dd R 202 1 xvda1 14689 117512 0.32
326 jbd2/xvda1-8 W 202 1 xvda1 3 168 2.33
1880 supervise W 202 1 xvda1 10 40 0.65
1873 supervise W 202 1 xvda1 10 40 1.08
1871 supervise W 202 1 xvda1 10 40 0.66
1876 supervise W 202 1 xvda1 10 40 0.79
1892 supervise W 202 1 xvda1 10 40 0.67
1878 supervise W 202 1 xvda1 10 40 0.66
1886 supervise W 202 1 xvda1 10 40 1.02
1894 supervise W 202 1 xvda1 10 40 0.88
1869 supervise W 202 1 xvda1 10 40 0.89
1888 supervise W 202 1 xvda1 10 40 1.25
08:09:59 loadavg: 0.55 0.46 0.40 2/282 22461
PID COMM D MAJ MIN DISK I/O Kbytes AVGms
22069 dd R 202 1 xvda1 14442 115536 0.33
326 jbd2/xvda1-8 W 202 1 xvda1 3 168 3.46
1880 supervise W 202 1 xvda1 10 40 0.87
1873 supervise W 202 1 xvda1 10 40 0.87
1871 supervise W 202 1 xvda1 10 40 0.78
1876 supervise W 202 1 xvda1 10 40 0.86
1892 supervise W 202 1 xvda1 10 40 0.89
1878 supervise W 202 1 xvda1 10 40 0.87
1886 supervise W 202 1 xvda1 10 40 0.86
1894 supervise W 202 1 xvda1 10 40 1.06
1869 supervise W 202 1 xvda1 10 40 1.12
1888 supervise W 202 1 xvda1 10 40 0.98
08:10:04 loadavg: 0.59 0.47 0.40 3/282 22576
PID COMM D MAJ MIN DISK I/O Kbytes AVGms
22069 dd R 202 1 xvda1 14179 113432 0.34
326 jbd2/xvda1-8 W 202 1 xvda1 3 168 2.39
1880 supervise W 202 1 xvda1 10 40 0.81
1873 supervise W 202 1 xvda1 10 40 1.02
1871 supervise W 202 1 xvda1 10 40 1.15
1876 supervise W 202 1 xvda1 10 40 1.10
1892 supervise W 202 1 xvda1 10 40 0.77
1878 supervise W 202 1 xvda1 10 40 0.72
1886 supervise W 202 1 xvda1 10 40 0.81
1894 supervise W 202 1 xvda1 10 40 0.86
1869 supervise W 202 1 xvda1 10 40 0.83
1888 supervise W 202 1 xvda1 10 40 0.79
24467 kworker/0:2 R 202 32 xvdc 3 12 0.26
1056 cron R 202 1 xvda1 2 8 0.30
24467 kworker/0:2 R 202 16 xvdb 1 4 0.23
08:10:09 loadavg: 0.54 0.46 0.40 2/281 22668
PID COMM D MAJ MIN DISK I/O Kbytes AVGms
22069 dd R 202 1 xvda1 250 2000 0.34
326 jbd2/xvda1-8 W 202 1 xvda1 3 168 2.40
1880 supervise W 202 1 xvda1 8 32 0.93
1873 supervise W 202 1 xvda1 8 32 0.76
1871 supervise W 202 1 xvda1 8 32 0.60
1876 supervise W 202 1 xvda1 8 32 0.61
1892 supervise W 202 1 xvda1 8 32 0.68
1878 supervise W 202 1 xvda1 8 32 0.90
1886 supervise W 202 1 xvda1 8 32 0.57
1894 supervise W 202 1 xvda1 8 32 0.97
1869 supervise W 202 1 xvda1 8 32 0.69
1888 supervise W 202 1 xvda1 8 32 0.67
This shows another "dd" command reading from xvda1. On this system, various
"supervise" processes perform a steady 8-10 disk writes every 5 seconds (they
are creating and updating "status" files).
USAGE message:
# ./biotop.py -h
usage: biotop.py [-h] [-C] [-r MAXROWS] [interval] [count]
Block device (disk) I/O by process
positional arguments:
interval output interval, in seconds
count number of outputs
optional arguments:
-h, --help show this help message and exit
-C, --noclear don't clear the screen
-r MAXROWS, --maxrows MAXROWS
maximum rows to print, default 20
examples:
./biotop # block device I/O top, 1 second refresh
./biotop -C # don't clear the screen
./biotop 5 # 5 second summaries
./biotop 5 10 # 5 second summaries, 10 times only
......@@ -4,8 +4,8 @@
# For Linux, uses BCC, eBPF. See .c file.
#
# USAGE: bitesize
# Ctrl-C will print the partially gathered histogram then exit.
#
# Ctrl-C will print the partially gathered histogram then exit.
#
# Copyright (c) 2016 Allan McAleavy
# Licensed under the Apache License, Version 2.0 (the "License")
......@@ -19,7 +19,7 @@ bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/blkdev.h>
struct proc_key_t {
struct proc_key_t {
char name[TASK_COMM_LEN];
u64 slot;
};
......@@ -41,12 +41,12 @@ int trace_pid_start(struct pt_regs *ctx, struct request *req)
return 0;
}
int do_count (struct pt_regs *ctx, struct request *req)
int do_count(struct pt_regs *ctx, struct request *req)
{
struct val_t *valp;
valp = commbyreq.lookup(&req);
if ( valp == 0) {
if (valp == 0) {
return 0;
}
......@@ -55,7 +55,7 @@ int do_count (struct pt_regs *ctx, struct request *req)
bpf_probe_read(&key.name, sizeof(key.name),valp->name);
dist.increment(key);
}
return 0;
return 0;
}
"""
......@@ -72,4 +72,4 @@ dist = b.get_table("dist")
try:
sleep(99999999)
except KeyboardInterrupt:
dist.print_log2_hist("Kbytes", "Process Name:")
dist.print_log2_hist("Kbytes", "Process Name")
Example of BCC tool bitesize.py
Examples of bitesize.py, the Linux bcc/eBPF version.
The aim of this tool is to show I/O distribution for requested block sizes, by process name.
......@@ -6,13 +7,13 @@ The aim of this tool is to show I/O distribution for requested block sizes, by p
Tracing... Hit Ctrl-C to end.
^C
Process Name: = 'kworker/u128:1'
Process Name = 'kworker/u128:1'
Kbytes : count distribution
0 -> 1 : 1 |******************** |
2 -> 3 : 0 | |
4 -> 7 : 2 |****************************************|
Process Name: = 'bitesize.py'
Process Name = 'bitesize.py'
Kbytes : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
......@@ -23,7 +24,7 @@ Process Name: = 'bitesize.py'
64 -> 127 : 0 | |
128 -> 255 : 1 |****************************************|
Process Name: = 'dd'
Process Name = 'dd'
Kbytes : count distribution
0 -> 1 : 3 | |
2 -> 3 : 0 | |
......@@ -37,13 +38,13 @@ Process Name: = 'dd'
512 -> 1023 : 0 | |
1024 -> 2047 : 488 |****************************************|
Process Name: = 'jbd2/dm-1-8'
Process Name = 'jbd2/dm-1-8'
Kbytes : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 1 |****************************************|
Process Name: = 'cat'
Process Name = 'cat'
Kbytes : count distribution
0 -> 1 : 1 | |
2 -> 3 : 0 | |
......@@ -55,19 +56,19 @@ Process Name: = 'cat'
128 -> 255 : 0 | |
256 -> 511 : 1924 |****************************************|
Process Name: = 'ntpd'
Process Name = 'ntpd'
Kbytes : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 104 |****************************************|
Process Name: = 'vmtoolsd'
Process Name = 'vmtoolsd'
Kbytes : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 1 |****************************************|
Process Name: = 'bash'
Process Name = 'bash'
Kbytes : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
......@@ -75,7 +76,7 @@ Process Name: = 'bash'
8 -> 15 : 0 | |
16 -> 31 : 2 |****************************************|
Process Name: = 'jbd2/sdb-8'
Process Name = 'jbd2/sdb-8'
Kbytes : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
......@@ -86,7 +87,3 @@ Process Name: = 'jbd2/sdb-8'
We can see from above that there was a dd command being run which generated 488 IOPS between 1MB and 2MB, we can also see the
cat command generating 1924 IOPS between 256Kb and 512Kb.
See also systemtap version:
https://github.com/brendangregg/systemtap-lwtools/blob/master/disk/bitesize-nd.stp
#!/usr/bin/python
# @lint-avoid-python-3-compatibility-imports
#
# dcsnoop Trace directory entry cache (dcache) lookups.
# For Linux, uses BCC, eBPF. Embedded C.
#
# USAGE: dcsnoop [-h] [-a]
#
# By default, this traces every failed dcache lookup, and shows the process
# performing the lookup and the filename requested. A -a option can be used
# to show all lookups, not just failed ones.
#
# This uses kernel dynamic tracing of the d_lookup() function, and will need
# to be modified to match kernel changes.
#
# Also see dcstat(8), for per-second summaries.
#
# Copyright 2016 Netflix, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
#
# 09-Feb-2016 Brendan Gregg Created this.
from __future__ import print_function
from bcc import BPF
import argparse
import re
# arguments
examples = """examples:
./dcsnoop # trace failed dcache lookups
./dcsnoop -a # trace all dcache lookups
"""
parser = argparse.ArgumentParser(
description="Trace directory entry cache (dcache) lookups",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=examples)
parser.add_argument("-a", "--all", action="store_true",
help="trace all lookups (default is fails only)")
args = parser.parse_args()
# define BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/fs.h>
#include <linux/sched.h>
#define MAX_FILE_LEN 64
struct entry_t {
char name[MAX_FILE_LEN];
};
BPF_HASH(entrybypid, u32, struct entry_t);
/* from fs/namei.c: */
struct nameidata {
struct path path;
struct qstr last;
// [...]
};
int trace_fast(struct pt_regs *ctx, struct nameidata *nd, struct path *path)
{
bpf_trace_printk("R %s\\n", nd->last.name);
return 1;
}
int kprobe__d_lookup(struct pt_regs *ctx, const struct dentry *parent,
const struct qstr *name)
{
u32 pid = bpf_get_current_pid_tgid();
struct entry_t entry = {};
const char *fname = name->name;
if (fname) {
bpf_probe_read(&entry.name, sizeof(entry.name), (void *)fname);
}
entrybypid.update(&pid, &entry);
return 0;
}
int kretprobe__d_lookup(struct pt_regs *ctx)
{
u32 pid = bpf_get_current_pid_tgid();
struct entry_t *ep;
ep = entrybypid.lookup(&pid);
if (ep == 0) {
return 0; // missed entry
}
if (ctx->ax == 0) {
bpf_trace_printk("M %s\\n", ep->name);
}
entrybypid.delete(&pid);
return 0;
}
"""
# initialize BPF
b = BPF(text=bpf_text)
if args.all:
b.attach_kprobe(event="lookup_fast", fn_name="trace_fast")
# header
print("%-11s %-6s %-16s %1s %s" % ("TIME(s)", "PID", "COMM", "T", "FILE"))
start_ts = 0
# format output
while 1:
(task, pid, cpu, flags, ts, msg) = b.trace_fields()
try:
(type, file) = msg.split(" ", 1)
except ValueError:
continue
if start_ts == 0:
start_ts = ts
print("%-11.6f %-6s %-16s %1s %s" % (ts - start_ts, pid, task, type, file))
Demonstrations of dcsnoop, the Linux eBPF/bcc version.
dcsnoop traces directory entry cache (dcache) lookups, and can be used for
further investigation beyond dcstat(8). The output is likely verbose, as
dcache lookups are likely frequent. By default, only failed lookups are shown.
For example:
# ./dcsnoop.py
TIME(s) PID COMM T FILE
0.002837 1643 snmpd M net/dev
0.002852 1643 snmpd M 1643
0.002856 1643 snmpd M net
0.002863 1643 snmpd M dev
0.002952 1643 snmpd M net/if_inet6
0.002964 1643 snmpd M if_inet6
0.003180 1643 snmpd M net/ipv4/neigh/eth0/retrans_time_ms
0.003192 1643 snmpd M ipv4/neigh/eth0/retrans_time_ms
0.003197 1643 snmpd M neigh/eth0/retrans_time_ms
0.003203 1643 snmpd M eth0/retrans_time_ms
0.003206 1643 snmpd M retrans_time_ms
0.003245 1643 snmpd M ipv6/neigh/eth0/retrans_time_ms
0.003249 1643 snmpd M neigh/eth0/retrans_time_ms
0.003252 1643 snmpd M eth0/retrans_time_ms
0.003255 1643 snmpd M retrans_time_ms
0.003287 1643 snmpd M conf/eth0/forwarding
0.003292 1643 snmpd M eth0/forwarding
0.003295 1643 snmpd M forwarding
0.003326 1643 snmpd M base_reachable_time_ms
[...]
I dropped the caches at the same time as executing this tool. The output shows
the processes, the type of event ("T" column: M == miss, R == reference),
and the filename for the dcache lookup.
The way the dcache is currently implemented, each component of a path is
checked in turn. The first line, showing "net/dev" from snmp, will be a lookup
for "net" in a directory (that isn't shown here). If it finds "net", it will
then lookup "dev" inside net. You can see this sequence a little later,
starting at time 0.003180, where a pathname is being searched
directory by directory.
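To make that sequence concrete, here is a small Python sketch (an illustration,
not part of the tool) that reproduces the FILE column printed from 0.003180
onwards, where each successive line drops the leading path component:

    # mirror the per-component lookup sequence shown in the output above
    path = "net/ipv4/neigh/eth0/retrans_time_ms"
    parts = path.split("/")
    for i in range(len(parts)):
        print("/".join(parts[i:]))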
The -a option will show all lookups, although be warned, the output will be
very verbose. For example:
# ./dcsnoop
TIME(s) PID COMM T FILE
0.000000 20279 dcsnoop.py M p_lookup_fast
0.000010 20279 dcsnoop.py M enable
0.000013 20279 dcsnoop.py M id
0.000015 20279 dcsnoop.py M filter
0.000017 20279 dcsnoop.py M trigger
0.000019 20279 dcsnoop.py M format
0.006148 20279 dcsnoop.py R sys/kernel/debug/tracing/trace_pipe
0.006158 20279 dcsnoop.py R kernel/debug/tracing/trace_pipe
0.006161 20279 dcsnoop.py R debug/tracing/trace_pipe
0.006164 20279 dcsnoop.py R tracing/trace_pipe
0.006166 20279 dcsnoop.py R trace_pipe
0.015900 1643 snmpd R proc/sys/net/ipv6/conf/lo/forwarding
0.015901 1643 snmpd R sys/net/ipv6/conf/lo/forwarding
0.015901 1643 snmpd R net/ipv6/conf/lo/forwarding
0.015902 1643 snmpd R ipv6/conf/lo/forwarding
0.015903 1643 snmpd R conf/lo/forwarding
0.015904 1643 snmpd R lo/forwarding
0.015905 1643 snmpd M lo/forwarding
0.015908 1643 snmpd R forwarding
0.015909 1643 snmpd M forwarding
0.015937 1643 snmpd R proc/sys/net/ipv6/neigh/lo/base_reachable_time_ms
0.015937 1643 snmpd R sys/net/ipv6/neigh/lo/base_reachable_time_ms
0.015938 1643 snmpd R net/ipv6/neigh/lo/base_reachable_time_ms
0.015939 1643 snmpd R ipv6/neigh/lo/base_reachable_time_ms
0.015940 1643 snmpd R neigh/lo/base_reachable_time_ms
0.015941 1643 snmpd R lo/base_reachable_time_ms
0.015941 1643 snmpd R base_reachable_time_ms
0.015943 1643 snmpd M base_reachable_time_ms
0.043569 1876 supervise M 20281
0.043573 1886 supervise M 20280
0.043582 1886 supervise R supervise/status.new
[...]
USAGE message:
# ./dcsnoop.py -h
usage: dcsnoop.py [-h] [-a]
Trace directory entry cache (dcache) lookups
optional arguments:
-h, --help show this help message and exit
-a, --all trace all lookups (default is fails only)
examples:
./dcsnoop # trace failed dcache lookups
./dcsnoop -a # trace all dcache lookups
#!/usr/bin/python
# @lint-avoid-python-3-compatibility-imports
#
# dcstat Directory entry cache (dcache) stats.
# For Linux, uses BCC, eBPF.
#
# USAGE: dcstat [interval [count]]
#
# This uses kernel dynamic tracing of kernel functions, lookup_fast() and
# d_lookup(), which will need to be modified to match kernel changes. See
# code comments.
#
# Copyright 2016 Netflix, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
#
# 09-Feb-2016 Brendan Gregg Created this.
from __future__ import print_function
from bcc import BPF
from ctypes import c_int
from time import sleep, strftime
from sys import argv
def usage():
print("USAGE: %s [interval [count]]" % argv[0])
exit()
# arguments
interval = 1
count = -1
if len(argv) > 1:
try:
interval = int(argv[1])
if interval == 0:
raise
if len(argv) > 2:
count = int(argv[2])
except: # also catches -h, --help
usage()
# define BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
enum stats {
S_REFS = 1,
S_SLOW,
S_MISS,
S_MAXSTAT
};
BPF_TABLE("array", int, u64, stats, S_MAXSTAT + 1);
/*
* How this is instrumented, and how to interpret the statistics, is very much
* tied to the current kernel implementation (this was written on Linux 4.4).
* This will need maintenance to keep working as the implementation changes. To
* aid future adventurers, this is is what the current code does, and why.
*
* First problem: the current implementation takes a path and then does a
* lookup of each component. So how do we count a reference? Once for the path
* lookup, or once for every component lookup? I've chosen the latter
* since it seems to map more closely to actual dcache lookups (via
* __d_lookup_rcu()). It's counted via calls to lookup_fast().
*
* The implementation tries different, progressively slower, approaches to
* lookup a file. At what point do we call it a dcache miss? I've choosen when
* a d_lookup() (which is called during lookup_slow()) returns zero.
*
* I've also included a "SLOW" statistic to show how often the fast lookup
* failed. Whether this exists or is interesting is an implementation detail,
* and the "SLOW" statistic may be removed in future versions.
*/
void count_fast(struct pt_regs *ctx) {
int key = S_REFS;
u64 *leaf = stats.lookup(&key);
if (leaf) (*leaf)++;
}
void count_lookup(struct pt_regs *ctx) {
int key = S_SLOW;
u64 *leaf = stats.lookup(&key);
if (leaf) (*leaf)++;
if (ctx->ax == 0) {
key = S_MISS;
leaf = stats.lookup(&key);
if (leaf) (*leaf)++;
}
}
"""
# load BPF program
b = BPF(text=bpf_text)
b.attach_kprobe(event="lookup_fast", fn_name="count_fast")
b.attach_kretprobe(event="d_lookup", fn_name="count_lookup")
# stat column labels and indexes
stats = {
"REFS": 1,
"SLOW": 2,
"MISS": 3
}
# header
print("%-8s " % "TIME", end="")
for stype, idx in sorted(stats.iteritems(), key=lambda (k, v): (v, k)):
print(" %8s" % (stype + "/s"), end="")
print(" %8s" % "HIT%")
# output
i = 0
while (1):
if count > 0:
i += 1
if i > count:
exit()
try:
sleep(interval)
except KeyboardInterrupt:
pass
exit()
print("%-8s: " % strftime("%H:%M:%S"), end="")
# print each statistic as a column
for stype, idx in sorted(stats.iteritems(), key=lambda (k, v): (v, k)):
try:
val = b["stats"][c_int(idx)].value / interval
print(" %8d" % val, end="")
except:
print(" %8d" % 0, end="")
# print hit ratio percentage
try:
ref = b["stats"][c_int(stats["REFS"])].value
miss = b["stats"][c_int(stats["MISS"])].value
hit = ref - miss
pct = float(100) * hit / ref
print(" %8.2f" % pct)
except:
print(" %7s%%" % "-")
b["stats"].clear()
Demonstrations of dcstat, the Linux eBPF/bcc version.
dcstat shows directory entry cache (dcache) statistics. For example:
# ./dcstat
TIME REFS/s SLOW/s MISS/s HIT%
08:11:47: 2059 141 97 95.29
08:11:48: 79974 151 106 99.87
08:11:49: 192874 146 102 99.95
08:11:50: 2051 144 100 95.12
08:11:51: 73373 17239 17194 76.57
08:11:52: 54685 25431 25387 53.58
08:11:53: 18127 8182 8137 55.12
08:11:54: 22517 10345 10301 54.25
08:11:55: 7524 2881 2836 62.31
08:11:56: 2067 141 97 95.31
08:11:57: 2115 145 101 95.22
The output shows the total references per second ("REFS/s"), the number that
took a slower code path to be processed ("SLOW/s"), the number of dcache misses
("MISS/s"), and the hit ratio as a percentage. By default, an interval of 1
second is used.
At 08:11:49, there were 192 thousand references, which almost entirely hit
from the dcache, with a hit ratio of 99.95%. A little later, starting at
08:11:51, a workload began that walked many uncached files, reducing the hit
ratio to 53% and, more importantly, pushing the miss rate to over 10 thousand per second.
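As a worked check of the HIT% column, the hit ratio is (REFS - MISS) / REFS.
Using the 08:11:49 numbers from the output above (a Python sketch, not tool
code):

    # hit ratio for the 08:11:49 sample
    refs, miss = 192874, 102
    print("HIT%%: %.2f" % (100.0 * (refs - miss) / refs))   # 99.95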
Here's an interesting workload:
# ./dcstat
TIME REFS/s SLOW/s MISS/s HIT%
08:15:53: 250683 141 97 99.96
08:15:54: 266115 145 101 99.96
08:15:55: 268428 141 97 99.96
08:15:56: 260389 143 99 99.96
It's a 99.96% hit ratio, and these are all negative hits: accessing a file that
does not exist. Here's the C program that generated the workload:
# cat -n badopen.c
1 #include <sys/types.h>
2 #include <sys/stat.h>
3 #include <fcntl.h>
4
5 int
6 main(int argc, char *argv[])
7 {
8 int fd;
9 while (1) {
10 fd = open("bad", O_RDONLY);
11 }
12 return 0;
13 }
This is a simple workload generator that tries to open a missing file ("bad")
as quickly as possible.
Let's see what happens if the workload attempts to open a different filename
each time (which is also a missing file), using the following C code:
# cat -n badopen2.c
1 #include <sys/types.h>
2 #include <sys/stat.h>
3 #include <fcntl.h>
4 #include <stdio.h>
5
6 int
7 main(int argc, char *argv[])
8 {
9 int fd, i = 0;
10 char buf[128] = {};
11
12 while (1) {
13 sprintf(buf, "bad%d", i++);
14 fd = open(buf, O_RDONLY);
15 }
16 return 0;
17 }
Here's dcstat:
# ./dcstat
TIME REFS/s SLOW/s MISS/s HIT%
08:18:52: 241131 237544 237505 1.51
08:18:53: 238210 236323 236278 0.82
08:18:54: 235259 233307 233261 0.85
08:18:55: 233144 231256 231214 0.83
08:18:56: 231981 230097 230053 0.83
dcstat also supports an optional interval and optional count. For example,
printing 5 second summaries 3 times:
# ./dcstat 5 3
TIME REFS/s SLOW/s MISS/s HIT%
08:20:03: 2085 143 99 95.23
08:20:08: 2077 143 98 95.24
08:20:14: 2071 144 100 95.15
USAGE message:
# ./dcstat -h
USAGE: ./dcstat [interval [count]]
#!/usr/bin/python
# @lint-avoid-python-3-compatibility-imports
#
# execsnoop Trace new processes via exec() syscalls.
# For Linux, uses BCC, eBPF. Embedded C.
#
# USAGE: execsnoop [-h] [-t] [-x] [-n NAME]
#
# This currently will print up to a maximum of 19 arguments, plus the process
# name, so 20 fields in total (MAXARG).
#
# This won't catch all new processes: an application may fork() but not exec().
#
# Copyright 2016 Netflix, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
#
# 07-Feb-2016 Brendan Gregg Created this.
from __future__ import print_function
from bcc import BPF
import argparse
import re
# arguments
examples = """examples:
./execsnoop # trace all exec() syscalls
./execsnoop -x # include failed exec()s
./execsnoop -t # include timestamps
./execsnoop -n main # only print command lines containing "main"
"""
parser = argparse.ArgumentParser(
description="Trace exec() syscalls",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=examples)
parser.add_argument("-t", "--timestamp", action="store_true",
help="include timestamp on output")
parser.add_argument("-x", "--fails", action="store_true",
help="include failed exec()s")
parser.add_argument("-n", "--name",
help="only print commands matching this name (regex), any arg")
args = parser.parse_args()
# define BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/sched.h>
#include <linux/fs.h>
#define MAXARG 20
#define ARGSIZE 64
static int print_arg(void *ptr) {
// Fetch an argument, and print using bpf_trace_printk(). This is a work
// around until we have a binary trace interface for passing event data to
// bcc. Since exec()s should be low frequency, the additional overhead in
// this case should not be a problem.
const char *argp = NULL;
char buf[ARGSIZE] = {};
bpf_probe_read(&argp, sizeof(argp), ptr);
if (argp == NULL) return 0;
bpf_probe_read(&buf, sizeof(buf), (void *)(argp));
bpf_trace_printk("ARG %s\\n", buf);
return 1;
}
int kprobe__sys_execve(struct pt_regs *ctx, struct filename *filename,
const char __user *const __user *__argv,
const char __user *const __user *__envp)
{
char fname[ARGSIZE] = {};
bpf_probe_read(&fname, sizeof(fname), (void *)(filename));
bpf_trace_printk("ARG %s\\n", fname);
int i = 1; // skip first arg, as we printed fname
// unrolled loop to walk argv[] (MAXARG)
if (print_arg((void *)&__argv[i]) == 0) goto out; i++;
if (print_arg((void *)&__argv[i]) == 0) goto out; i++;
if (print_arg((void *)&__argv[i]) == 0) goto out; i++;
if (print_arg((void *)&__argv[i]) == 0) goto out; i++;
if (print_arg((void *)&__argv[i]) == 0) goto out; i++;
if (print_arg((void *)&__argv[i]) == 0) goto out; i++;
if (print_arg((void *)&__argv[i]) == 0) goto out; i++;
if (print_arg((void *)&__argv[i]) == 0) goto out; i++;
if (print_arg((void *)&__argv[i]) == 0) goto out; i++;
if (print_arg((void *)&__argv[i]) == 0) goto out; i++; // X
if (print_arg((void *)&__argv[i]) == 0) goto out; i++;
if (print_arg((void *)&__argv[i]) == 0) goto out; i++;
if (print_arg((void *)&__argv[i]) == 0) goto out; i++;
if (print_arg((void *)&__argv[i]) == 0) goto out; i++;
if (print_arg((void *)&__argv[i]) == 0) goto out; i++;
if (print_arg((void *)&__argv[i]) == 0) goto out; i++;
if (print_arg((void *)&__argv[i]) == 0) goto out; i++;
if (print_arg((void *)&__argv[i]) == 0) goto out; i++;
if (print_arg((void *)&__argv[i]) == 0) goto out; i++;
if (print_arg((void *)&__argv[i]) == 0) goto out; i++; // XX
bpf_trace_printk("ARG ...\\n"); // truncated
out:
return 0;
}
int kretprobe__sys_execve(struct pt_regs *ctx)
{
bpf_trace_printk("RET %d\\n", ctx->ax);
return 0;
}
"""
# initialize BPF
b = BPF(text=bpf_text)
# header
if args.timestamp:
print("%-8s" % ("TIME(s)"), end="")
print("%-16s %-6s %3s %s" % ("PCOMM", "PID", "RET", "ARGS"))
start_ts = 0
cmd = {}
pcomm = {}
# format output
while 1:
(task, pid, cpu, flags, ts, msg) = b.trace_fields()
try:
(type, arg) = msg.split(" ", 1)
except ValueError:
continue
if start_ts == 0:
start_ts = ts
if type == "RET":
if pid not in cmd:
# zero args
cmd[pid] = ""
pcomm[pid] = ""
skip = 0
if args.name:
if not re.search(args.name, cmd[pid]):
skip = 1
if not args.fails and int(arg) < 0:
skip = 1
if skip:
del cmd[pid]
del pcomm[pid]
continue
# output
if args.timestamp:
print("%-8.3f" % (ts - start_ts), end="")
print("%-16s %-6s %3s %s" % (pcomm[pid], pid, arg, cmd[pid]))
del cmd[pid]
del pcomm[pid]
else:
# build command line string
if pid in cmd:
cmd[pid] = cmd[pid] + " " + arg
else:
cmd[pid] = arg
if pid not in pcomm:
pcomm[pid] = task
Demonstrations of execsnoop, the Linux eBPF/bcc version.
execsnoop traces new processes. For example, tracing the commands invoked when
running "man ls":
# ./execsnoop
PCOMM PID RET ARGS
bash 15887 0 /usr/bin/man ls
preconv 15894 0 /usr/bin/preconv -e UTF-8
man 15896 0 /usr/bin/tbl
man 15897 0 /usr/bin/nroff -mandoc -rLL=169n -rLT=169n -Tutf8
man 15898 0 /usr/bin/pager -s
nroff 15900 0 /usr/bin/locale charmap
nroff 15901 0 /usr/bin/groff -mtty-char -Tutf8 -mandoc -rLL=169n -rLT=169n
groff 15902 0 /usr/bin/troff -mtty-char -mandoc -rLL=169n -rLT=169n -Tutf8
groff 15903 0 /usr/bin/grotty
The output shows the parent process/command name (PCOMM), the PID, the return
value of the exec() (RET), and the filename with arguments (ARGS).
This works by tracing the execve() system call (the commonly used exec() variant),
and shows details of the arguments and return value. This catches new processes
that follow the fork->exec sequence, as well as processes that re-exec()
themselves. Some applications fork() but do not exec(), eg, for worker
processes, which won't be included in the execsnoop output.
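As an illustration of what is missed (a minimal Python sketch, not from the
original documentation), a worker created by fork() alone never calls execve()
and so never appears in execsnoop's output:

    # fork() without exec(): no execve() occurs, so execsnoop stays silent
    import os
    pid = os.fork()
    if pid == 0:
        # child worker: keeps running the same program image
        os._exit(0)
    os.waitpid(pid, 0)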
The -x option can be used to include failed exec()s. For example:
# ./execsnoop -x
PCOMM PID RET ARGS
supervise 9660 0 ./run
supervise 9661 0 ./run
mkdir 9662 0 /bin/mkdir -p ./main
run 9663 0 ./run
chown 9664 0 /bin/chown nobody:nobody ./main
run 9665 0 /bin/mkdir -p ./main
supervise 9667 0 ./run
run 9660 -2 /usr/local/bin/setuidgid nobody /command/multilog t ./main
chown 9668 0 /bin/chown nobody:nobody ./main
run 9666 0 /bin/chmod 0777 main
run 9663 -2 /usr/local/bin/setuidgid nobody /command/multilog t ./main
run 9669 0 /bin/mkdir -p ./main
run 9661 -2 /usr/local/bin/setuidgid nobody /command/multilog t ./main
supervise 9670 0 ./run
[...]
This example shows the activity of various regular system daemons, including some
failures (trying to execute a /usr/local/bin/setuidgid, which I just noticed
doesn't exist).
A -t option can be used to include a timestamp column, and a -n option to match
on a name or substring from the full command line (filename + args). Regular
expressions are allowed. For example, matching commands containing "mount":
# ./execsnoop -tn mount
TIME(s) PCOMM PID RET ARGS
2.849 bash 18049 0 /bin/mount -p
USAGE message:
# ./execsnoop -h
usage: execsnoop [-h] [-t] [-x] [-n NAME]
Trace exec() syscalls
optional arguments:
-h, --help show this help message and exit
-t, --timestamp include timestamp on output
-x, --fails include failed exec()s
-n NAME, --name NAME only print commands matching this name (regex), any
arg
examples:
./execsnoop # trace all exec() syscalls
./execsnoop -x # include failed exec()s
./execsnoop -t # include timestamps
./execsnoop -n main # only print command lines containing "main"
#!/usr/bin/python
# @lint-avoid-python-3-compatibility-imports
#
# filelife Trace the lifespan of short-lived files.
# For Linux, uses BCC, eBPF. Embedded C.
#
# This traces the creation and deletion of files, providing information
# on who deleted the file, the file age, and the file name. The intent is to
# provide information on short-lived files, for debugging or performance
# analysis.
#
# USAGE: filelife [-h] [-p PID]
#
# Copyright 2016 Netflix, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
#
# 08-Feb-2015 Brendan Gregg Created this.
from __future__ import print_function
from bcc import BPF
import argparse
from time import strftime
# arguments
examples = """examples:
./filelife # trace all stat() syscalls
./filelife -p 181 # only trace PID 181
"""
parser = argparse.ArgumentParser(
description="Trace stat() syscalls",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=examples)
parser.add_argument("-p", "--pid",
help="trace this PID only")
args = parser.parse_args()
debug = 0
# define BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/fs.h>
BPF_HASH(birth, struct dentry *);
// trace file creation time
int trace_create(struct pt_regs *ctx, struct inode *dir, struct dentry *dentry)
{
u32 pid = bpf_get_current_pid_tgid();
FILTER
u64 ts = bpf_ktime_get_ns();
birth.update(&dentry, &ts);
return 0;
};
// trace file deletion and output details
int trace_unlink(struct pt_regs *ctx, struct inode *dir, struct dentry *dentry)
{
u32 pid = bpf_get_current_pid_tgid();
FILTER
u64 *tsp, delta;
tsp = birth.lookup(&dentry);
if (tsp == 0) {
return 0; // missed create
}
delta = (bpf_ktime_get_ns() - *tsp) / 1000000;
birth.delete(&dentry);
if (dentry->d_iname[0] == 0)
return 0;
bpf_trace_printk("%d %s\\n", delta, dentry->d_iname);
return 0;
}
"""
if args.pid:
bpf_text = bpf_text.replace('FILTER',
'if (pid != %s) { return 0; }' % args.pid)
else:
bpf_text = bpf_text.replace('FILTER', '')
if debug:
print(bpf_text)
# initialize BPF
b = BPF(text=bpf_text)
b.attach_kprobe(event="vfs_create", fn_name="trace_create")
b.attach_kprobe(event="vfs_unlink", fn_name="trace_unlink")
# header
print("%-8s %-6s %-16s %-7s %s" % ("TIME", "PID", "COMM", "AGE(s)", "FILE"))
start_ts = 0
# format output
while 1:
(task, pid, cpu, flags, ts, msg) = b.trace_fields()
(delta, filename) = msg.split(" ", 1)
# print columns
print("%-8s %-6d %-16s %-7.2f %s" % (strftime("%H:%M:%S"), pid, task,
float(delta) / 1000, filename))
Demonstrations of filelife, the Linux eBPF/bcc version.
filelife traces short-lived files: those that have been created and then
deleted while tracing. For example:
# ./filelife
TIME PID COMM AGE(s) FILE
05:57:59 8556 gcc 0.04 ccCB5EDe.s
05:57:59 8560 rm 0.02 .entry_64.o.d
05:57:59 8563 gcc 0.02 cc5UFHXf.s
05:57:59 8567 rm 0.01 .thunk_64.o.d
05:57:59 8578 rm 0.02 .syscall_64.o.d
05:58:00 8589 rm 0.03 .common.o.d
05:58:00 8596 rm 0.01 .8592.tmp
05:58:00 8601 rm 0.01 .8597.tmp
05:58:00 8606 rm 0.01 .8602.tmp
05:58:00 8639 rm 0.02 .vma.o.d
05:58:00 8650 rm 0.02 .vdso32-setup.o.d
05:58:00 8656 rm 0.00 .vdso.lds.d
05:58:00 8659 gcc 0.01 ccveeJAz.s
05:58:00 8663 rm 0.01 .vdso-note.o.d
05:58:00 8674 rm 0.02 .vclock_gettime.o.d
05:58:01 8684 rm 0.01 .vgetcpu.o.d
05:58:01 8690 collect2 0.00 ccvKMxdm.ld
This has caught short-lived files that were created during a Linux kernel
build. The PID shows the process ID that finally deleted the file, and COMM
is its process name. The AGE(s) column shows the age of the file, in seconds,
when it was deleted. These are all short-lived, and existed for less than
one tenth of a second.
Creating, populating, and then deleting files as part of another process can
be an inefficient method of inter-process communication. It can cause disk I/O
as files are closed and their file descriptors flushed, only later to be
deleted. As such, short-lived files can be a target of performance
optimizations.
USAGE message:
# ./filelife -h
usage: filelife [-h] [-p PID]
Trace stat() syscalls
optional arguments:
-h, --help show this help message and exit
-p PID, --pid PID trace this PID only
examples:
./filelife # trace all stat() syscalls
./filelife -p 181 # only trace PID 181
#!/usr/bin/python
# @lint-avoid-python-3-compatibility-imports
#
# fileslower Trace slow synchronous file reads and writes.
# For Linux, uses BCC, eBPF.
#
# USAGE: fileslower [-h] [-p PID] [min_ms]
#
# This script uses kernel dynamic tracing of synchronous reads and writes
# at the VFS interface, to identify slow file reads and writes for any file
# system.
#
# This works by tracing __vfs_read() and __vfs_write(), and filtering for
# synchronous I/O (the path to new_sync_read() and new_sync_write()), and
# for I/O with filenames. This approach provides a view of just two file
# system request types. There are typically many others: asynchronous I/O,
# directory operations, file handle operations, etc, that this tool does not
# instrument.
#
# WARNING: This traces VFS reads and writes, which can be extremely frequent,
# and so the overhead of this tool can become severe depending on the
# workload.
#
# By default, a minimum millisecond threshold of 10 is used.
#
# Copyright 2016 Netflix, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
#
# 06-Feb-2016 Brendan Gregg Created this.
from __future__ import print_function
from bcc import BPF
import argparse
import signal
# arguments
examples = """examples:
./fileslower # trace sync file I/O slower than 10 ms (default)
./fileslower 1 # trace sync file I/O slower than 1 ms
./fileslower -p 185 # trace PID 185 only
"""
parser = argparse.ArgumentParser(
description="Trace slow synchronous file reads and writes",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=examples)
parser.add_argument("-p", "--pid",
help="trace this PID only")
parser.add_argument("min_ms", nargs="?", default='10',
help="minimum I/O duration to trace, in ms (default 10)")
args = parser.parse_args()
min_ms = int(args.min_ms)
pid = args.pid
debug = 0
# signal handler
def signal_ignore(signal, frame):
print()
# define BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/fs.h>
#define TRACE_READ 0
#define TRACE_WRITE 1
struct val_t {
u32 sz;
u64 ts;
char name[DNAME_INLINE_LEN];
};
BPF_HASH(entryinfo, pid_t, struct val_t);
// store timestamp and size on entry
static int trace_rw_entry(struct pt_regs *ctx, struct file *file,
char __user *buf, size_t count)
{
u32 pid;
pid = bpf_get_current_pid_tgid();
if (FILTER)
return 0;
// skip I/O lacking a filename
struct dentry *de = file->f_path.dentry;
if (de->d_iname[0] == 0)
return 0;
// store size and timestamp by pid
struct val_t val = {};
val.sz = count;
val.ts = bpf_ktime_get_ns();
__builtin_memcpy(&val.name, de->d_iname, sizeof(val.name));
entryinfo.update(&pid, &val);
return 0;
}
int trace_read_entry(struct pt_regs *ctx, struct file *file,
char __user *buf, size_t count)
{
// skip non-sync I/O; see kernel code for __vfs_read()
if (!(file->f_op->read_iter))
return 0;
return trace_rw_entry(ctx, file, buf, count);
}
int trace_write_entry(struct pt_regs *ctx, struct file *file,
char __user *buf, size_t count)
{
// skip non-sync I/O; see kernel code for __vfs_write()
if (!(file->f_op->write_iter))
return 0;
return trace_rw_entry(ctx, file, buf, count);
}
// output
static int trace_rw_return(struct pt_regs *ctx, int type)
{
struct val_t *valp;
u32 pid = bpf_get_current_pid_tgid();
valp = entryinfo.lookup(&pid);
if (valp == 0) {
// missed tracing issue or filtered
return 0;
}
u64 delta_us = (bpf_ktime_get_ns() - valp->ts) / 1000;
entryinfo.delete(&pid);
if (delta_us < MIN_US)
return 0;
if (type == TRACE_READ) {
bpf_trace_printk("R %d %d %s\\n", valp->sz, delta_us, valp->name);
} else {
bpf_trace_printk("W %d %d %s\\n", valp->sz, delta_us, valp->name);
}
return 0;
}
int trace_read_return(struct pt_regs *ctx)
{
return trace_rw_return(ctx, TRACE_READ);
}
int trace_write_return(struct pt_regs *ctx)
{
return trace_rw_return(ctx, TRACE_WRITE);
}
"""
bpf_text = bpf_text.replace('MIN_US', str(min_ms * 1000))
if args.pid:
bpf_text = bpf_text.replace('FILTER', 'pid != %s' % pid)
else:
bpf_text = bpf_text.replace('FILTER', '0')
if debug:
print(bpf_text)
# initialize BPF
b = BPF(text=bpf_text)
# I'd rather trace these via new_sync_read/new_sync_write (which used to be
# do_sync_read/do_sync_write), but those became static. So trace these from
# the parent functions, at the cost of more overhead, instead.
# Ultimately, we should be using [V]FS tracepoints.
b.attach_kprobe(event="__vfs_read", fn_name="trace_read_entry")
b.attach_kprobe(event="__vfs_write", fn_name="trace_write_entry")
b.attach_kretprobe(event="__vfs_read", fn_name="trace_read_return")
b.attach_kretprobe(event="__vfs_write", fn_name="trace_write_return")
# header
print("Tracing sync read/writes slower than %d ms" % min_ms)
print("%-8s %-14s %-6s %1s %-7s %7s %s" % ("TIME(s)", "COMM", "PID", "D",
"BYTES", "LAT(ms)", "FILENAME"))
start_ts = 0
# format output
while 1:
(task, pid, cpu, flags, ts, msg) = b.trace_fields()
args = msg.split(" ", 3)
(type_s, sz, delta_us_s) = (args[0], args[1], args[2])
try:
filename = args[3]
except:
filename = "?"
if start_ts == 0:
start_ts = ts
ms = float(int(delta_us_s, 10)) / 1000
print("%-8.3f %-14.14s %-6s %1s %-7s %7.2f %s" % (
ts - start_ts, task, pid, type_s, sz, ms, filename))
Demonstrations of fileslower, the Linux eBPF/bcc version.
fileslower shows file-based synchronous reads and writes slower than a
threshold. For example:
# ./fileslower
Tracing sync read/writes slower than 10 ms
TIME(s) COMM PID D BYTES LAT(ms) FILENAME
0.000 randread.pl 4762 R 8192 12.70 data1
8.850 randread.pl 4762 R 8192 11.26 data1
12.852 randread.pl 4762 R 8192 10.43 data1
This showed a few reads from a "randread.pl" program, each 8 Kbytes in size,
and from a "data1" file. These all had over 10 ms latency.
This "latency" is measured from when the read or write was issued at the VFS
interface, to when it completed. This spans everything: block device I/O (disk
I/O), file system CPU cycles, file system locks, run queue latency, etc. This
is a better measure of the latency suffered by applications reading from the
file system than measuring this down at the block device interface.
Note that this only traces file reads and writes: other file system operations
(eg, directory operations, open(), fflush()) are not traced.
The threshold can be provided as an argument. Eg, I/O slower than 1 ms:
# ./fileslower 1
Tracing sync read/writes slower than 1 ms
TIME(s) COMM PID D BYTES LAT(ms) FILENAME
0.000 randread.pl 6925 R 8192 1.06 data1
0.082 randread.pl 6925 R 8192 2.42 data1
0.116 randread.pl 6925 R 8192 1.78 data1
0.153 randread.pl 6925 R 8192 2.31 data1
0.330 randread.pl 6925 R 8192 1.14 data1
0.345 randread.pl 6925 R 8192 1.52 data1
0.359 randread.pl 6925 R 8192 1.04 data1
0.532 randread.pl 6925 R 8192 2.56 data1
0.609 supervise 1892 W 18 3.65 status.new
0.610 randread.pl 6925 R 8192 1.37 data1
0.614 randread.pl 6925 R 8192 3.04 data1
0.729 randread.pl 6925 R 8192 2.90 data1
0.755 randread.pl 6925 R 8192 1.12 data1
0.762 randread.pl 6925 R 8192 2.62 data1
0.771 randread.pl 6925 R 8192 1.07 data1
0.816 randread.pl 6925 R 8192 10.50 data1
0.983 randread.pl 6925 R 8192 1.73 data1
0.989 randread.pl 6925 R 8192 2.12 data1
0.992 randread.pl 6925 R 8192 2.17 data1
1.001 randread.pl 6925 R 8192 1.93 data1
1.007 randread.pl 6925 R 8192 2.03 data1
1.210 randread.pl 6925 R 8192 1.82 data1
1.213 randread.pl 6925 R 8192 2.58 data1
1.219 randread.pl 6925 R 8192 2.20 data1
1.430 randread.pl 6925 R 8192 1.01 data1
1.448 randread.pl 6925 R 8192 2.22 data1
[...]
There's now much more output (this spans only 1.4 seconds, the previous output
spanned 12 seconds), and the lower threshold is catching more I/O.
In the following example, the file system caches were dropped before running
fileslower, and then in another session a "man ls" was executed. The command
and files read from disk can be seen:
# echo 3 > /proc/sys/vm/drop_caches; ./fileslower 1
Tracing sync read/writes slower than 1 ms
TIME(s) COMM PID D BYTES LAT(ms) FILENAME
0.000 bash 9647 R 128 5.83 man
0.050 man 9647 R 832 19.52 libmandb-2.6.7.1.so
0.066 man 9647 R 832 15.79 libman-2.6.7.1.so
0.123 man 9647 R 832 56.36 libpipeline.so.1.3.0
0.135 man 9647 R 832 9.79 libgdbm.so.3.0.0
0.323 man 9647 R 4096 59.52 locale.alias
0.540 man 9648 R 8192 11.11 ls.1.gz
0.558 man 9647 R 72 6.97 index.db
0.563 man 9647 R 4096 5.12 index.db
0.723 man 9658 R 128 12.06 less
0.725 man 9656 R 128 14.52 nroff
0.779 man 9655 R 128 68.86 tbl
0.814 nroff 9660 R 128 14.55 locale
0.830 pager 9658 R 4096 28.27 .lesshst
0.866 man 9654 R 128 163.12 preconv
0.980 nroff 9684 R 128 13.80 groff
0.999 groff 9684 R 4096 14.29 DESC
1.036 groff 9685 R 128 5.94 troff
1.038 groff 9686 R 128 7.76 grotty
1.065 troff 9685 R 4096 6.33 R
1.082 troff 9685 R 4096 10.52 BI
1.096 troff 9685 R 4096 8.70 troffrc
1.176 troff 9685 R 4096 80.12 composite.tmac
1.195 troff 9685 R 4096 19.20 fallbacks.tmac
1.202 troff 9685 R 4096 6.79 tty.tmac
1.221 troff 9685 R 4096 7.87 man.local
2.977 supervise 1876 W 18 4.23 status.new
This caught an individual I/O reaching 163.12 ms, for the "preconv" file. While
the file system cache was flushed, causing these to need to be read from disk,
the duration here may not be entirely disk I/O: it can include file system
locks, run queue latency, etc. These can be explored using other commands.
USAGE message:
# ./fileslower -h
usage: fileslower [-h] [-p PID] [min_ms]
Trace slow synchronous file reads and writes
positional arguments:
min_ms minimum I/O duration to trace, in ms (default 10)
optional arguments:
-h, --help show this help message and exit
-p PID, --pid PID trace this PID only
examples:
./fileslower # trace sync file I/O slower than 10 ms (default)
./fileslower 1 # trace sync file I/O slower than 1 ms
./fileslower -p 185 # trace PID 185 only
#!/usr/bin/python
# @lint-avoid-python-3-compatibility-imports
#
# filetop file reads and writes by process.
# For Linux, uses BCC, eBPF.
#
# USAGE: filetop.py [-h] [-C] [-r MAXROWS] [interval] [count]
#
# This uses in-kernel eBPF maps to store per process summaries for efficiency.
#
# Copyright 2016 Netflix, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
#
# 06-Feb-2016 Brendan Gregg Created this.
from __future__ import print_function
from bcc import BPF
from time import sleep, strftime
import argparse
import signal
from subprocess import call
# arguments
examples = """examples:
./filetop # file I/O top, 1 second refresh
./filetop -C # don't clear the screen
./filetop -p 181 # PID 181 only
./filetop 5 # 5 second summaries
./filetop 5 10 # 5 second summaries, 10 times only
"""
parser = argparse.ArgumentParser(
description="File reads and writes by process",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=examples)
parser.add_argument("-C", "--noclear", action="store_true",
help="don't clear the screen")
parser.add_argument("-r", "--maxrows", default=20,
help="maximum rows to print, default 20")
parser.add_argument("-p", "--pid",
help="trace this PID only")
parser.add_argument("interval", nargs="?", default=1,
help="output interval, in seconds")
parser.add_argument("count", nargs="?", default=99999999,
help="number of outputs")
args = parser.parse_args()
interval = int(args.interval)
countdown = int(args.count)
maxrows = int(args.maxrows)
clear = not int(args.noclear)
debug = 0
# linux stats
loadavg = "/proc/loadavg"
# signal handler
def signal_ignore(signal, frame):
print()
# define BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/blkdev.h>
#define MAX_FILE_LEN 32
// the key for the output summary
struct info_t {
u32 pid;
char name[TASK_COMM_LEN];
char file[MAX_FILE_LEN];
char type;
};
// the value of the output summary
struct val_t {
u64 reads;
u64 writes;
u64 rbytes;
u64 wbytes;
};
BPF_HASH(counts, struct info_t, struct val_t);
static int do_entry(struct pt_regs *ctx, struct file *file,
char __user *buf, size_t count, int is_read)
{
u32 pid;
pid = bpf_get_current_pid_tgid();
if (FILTER)
return 0;
// skip I/O lacking a filename
struct dentry *de = file->f_path.dentry;
if (de->d_iname[0] == 0)
return 0;
// store counts and sizes by pid & file
struct info_t info = {.pid = pid};
bpf_get_current_comm(&info.name, sizeof(info.name));
__builtin_memcpy(&info.file, de->d_iname, sizeof(info.file));
int mode = file->f_inode->i_mode;
if (S_ISREG(mode)) {
info.type = 'R';
} else if (S_ISSOCK(mode)) {
info.type = 'S';
} else {
info.type = 'O';
}
struct val_t *valp, zero = {};
valp = counts.lookup_or_init(&info, &zero);
if (is_read) {
valp->reads++;
valp->rbytes += count;
} else {
valp->writes++;
valp->wbytes += count;
}
return 0;
}
int trace_read_entry(struct pt_regs *ctx, struct file *file,
char __user *buf, size_t count)
{
return do_entry(ctx, file, buf, count, 1);
}
int trace_write_entry(struct pt_regs *ctx, struct file *file,
char __user *buf, size_t count)
{
return do_entry(ctx, file, buf, count, 0);
}
"""
if args.pid:
bpf_text = bpf_text.replace('FILTER', 'pid != %s' % args.pid)
else:
bpf_text = bpf_text.replace('FILTER', '0')
if debug:
print(bpf_text)
# initialize BPF
b = BPF(text=bpf_text)
b.attach_kprobe(event="__vfs_read", fn_name="trace_read_entry")
b.attach_kprobe(event="__vfs_write", fn_name="trace_write_entry")
print('Tracing... Output every %d secs. Hit Ctrl-C to end' % interval)
# output
exiting = 0
while 1:
try:
sleep(interval)
except KeyboardInterrupt:
exiting = 1
# header
if clear:
call("clear")
else:
print()
with open(loadavg) as stats:
print("%-8s loadavg: %s" % (strftime("%H:%M:%S"), stats.read()))
print("%-6s %-16s %-6s %-6s %-7s %-7s %1s %s" % ("PID", "COMM",
"READS", "WRITES", "R_Kb", "W_Kb", "T", "FILE"))
# by-PID output
counts = b.get_table("counts")
line = 0
for k, v in reversed(sorted(counts.items(),
key=lambda counts: counts[1].rbytes)):
# print line
print("%-6d %-16s %-6d %-6d %-7d %-7d %1s %s" % (k.pid, k.name,
v.reads, v.writes, v.rbytes / 1024, v.wbytes / 1024, k.type,
k.file))
line += 1
if line >= maxrows:
break
counts.clear()
countdown -= 1
if exiting or countdown == 0:
print("Detaching...")
exit()
Demonstrations of filetop, the Linux eBPF/bcc version.
filetop shows reads and writes by file, with process details. For example:
# ./filetop -C
Tracing... Output every 1 secs. Hit Ctrl-C to end
08:00:23 loadavg: 0.91 0.33 0.23 3/286 26635
PID COMM READS WRITES R_Kb W_Kb T FILE
26628 ld 161 186 643 152 R built-in.o
26634 cc1 1 0 200 0 R autoconf.h
26618 cc1 1 0 200 0 R autoconf.h
26634 cc1 12 0 192 0 R tracepoint.h
26584 cc1 2 0 143 0 R mm.h
26634 cc1 2 0 143 0 R mm.h
26631 make 34 0 136 0 R auto.conf
26634 cc1 1 0 98 0 R fs.h
26584 cc1 1 0 98 0 R fs.h
26634 cc1 1 0 91 0 R sched.h
26634 cc1 1 0 78 0 R printk.c
26634 cc1 3 0 73 0 R mmzone.h
26628 ld 18 0 72 0 R hibernate.o
26628 ld 16 0 64 0 R suspend.o
26628 ld 16 0 64 0 R snapshot.o
26630 cat 1 0 64 0 O null
26628 ld 16 0 64 0 R qos.o
26628 ld 13 0 52 0 R main.o
26628 ld 12 0 52 0 R swap.o
12421 sshd 3 0 48 0 O ptmx
[...]
This shows various files read and written during a Linux kernel build. The
output is sorted by the total read size in Kbytes (R_Kb). This instruments
at the VFS interface, so these are reads and writes that may be returned
entirely from the file system cache (page cache).
While not printed, the average read and write size can be calculated by
dividing R_Kb by READS, and the same for writes.
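For example, the ld row above shows 643 R_Kb over 161 READS, or roughly a
4 Kbyte average read. A throwaway snippet (illustrative only; these names are
not part of filetop) does the same arithmetic:

    # illustrative only: average I/O size from filetop's columns
    r_kb, reads = 643, 161                  # values from the ld row above
    print("avg read size: %.1f Kbytes" % (r_kb / float(reads)))   # ~4.0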
The "T" column indicates the type of the file: "R" for regular files, "S" for
sockets, and "O" for other (including pipes).
This script works by tracing the __vfs_read() and __vfs_write() functions using
kernel dynamic tracing, which instruments explicit read and write calls. If
files are read or written using another means (eg, via mmap()), then they
will not be visible using this tool.
This should be useful for file system workload characterization when analyzing
the performance of applications.
Note that tracing VFS level reads and writes can be a frequent activity, and
this tool can begin to cost measurable overhead at high I/O rates.
A -C option will stop clearing the screen, and -r with a number will restrict
the output to that many rows (20 by default). For example, not clearing
the screen and showing the top 5 only:
# ./filetop -Cr 5
Tracing... Output every 1 secs. Hit Ctrl-C to end
08:05:11 loadavg: 0.75 0.35 0.25 3/285 822
PID COMM READS WRITES R_Kb W_Kb T FILE
32672 cksum 5006 0 320384 0 R data1
12296 sshd 2 0 32 0 O ptmx
809 run 2 0 8 0 R nsswitch.conf
811 run 2 0 8 0 R nsswitch.conf
804 chown 2 0 8 0 R nsswitch.conf
08:05:12 loadavg: 0.75 0.35 0.25 3/285 845
PID COMM READS WRITES R_Kb W_Kb T FILE
32672 cksum 4986 0 319104 0 R data1
845 chown 2 0 8 0 R nsswitch.conf
828 run 2 0 8 0 R nsswitch.conf
835 run 2 0 8 0 R nsswitch.conf
830 run 2 0 8 0 R nsswitch.conf
08:05:13 loadavg: 0.75 0.35 0.25 3/285 868
PID COMM READS WRITES R_Kb W_Kb T FILE
32672 cksum 4985 0 319040 0 R data1
857 run 2 0 8 0 R nsswitch.conf
858 run 2 0 8 0 R nsswitch.conf
859 run 2 0 8 0 R nsswitch.conf
848 run 2 0 8 0 R nsswitch.conf
[...]
This output shows a cksum command reading data1.
An optional interval and optional count can also be added to the end of the
command line. For example, for 1 second interval, and 3 summaries in total:
# ./filetop -Cr 5 1 3
Tracing... Output every 1 secs. Hit Ctrl-C to end
08:08:20 loadavg: 0.30 0.42 0.31 3/282 5187
PID COMM READS WRITES R_Kb W_Kb T FILE
12421 sshd 14101 0 225616 0 O ptmx
12296 sshd 4 0 64 0 O ptmx
12421 sshd 3 14104 48 778 S TCP
5178 run 2 0 8 0 R nsswitch.conf
5165 run 2 0 8 0 R nsswitch.conf
08:08:21 loadavg: 0.30 0.42 0.31 5/282 5210
PID COMM READS WRITES R_Kb W_Kb T FILE
12421 sshd 9159 0 146544 0 O ptmx
12421 sshd 3 9161 48 534 S TCP
12296 sshd 1 0 16 0 S TCP
5188 run 2 0 8 0 R nsswitch.conf
5203 run 2 0 8 0 R nsswitch.conf
08:08:22 loadavg: 0.30 0.42 0.31 2/282 5233
PID COMM READS WRITES R_Kb W_Kb T FILE
12421 sshd 26166 0 418656 0 O ptmx
12421 sshd 4 26171 64 1385 S TCP
12296 sshd 1 0 16 0 O ptmx
5214 run 2 0 8 0 R nsswitch.conf
5227 run 2 0 8 0 R nsswitch.conf
Detaching...
This example has caught heavy socket I/O from an sshd process, showing up as
non-regular file types (the "O" for other, and "S" for socket, in the type
column: "T").
USAGE message:
# ./filetop -h
usage: filetop [-h] [-C] [-r MAXROWS] [-p PID] [interval] [count]
File reads and writes by process
positional arguments:
interval output interval, in seconds
count number of outputs
optional arguments:
-h, --help show this help message and exit
-C, --noclear don't clear the screen
-r MAXROWS, --maxrows MAXROWS
maximum rows to print, default 20
-p PID, --pid PID trace this PID only
examples:
./filetop # file I/O top, 1 second refresh
./filetop -C # don't clear the screen
./filetop -p 181 # PID 181 only
./filetop 5 # 5 second summaries
./filetop 5 10 # 5 second summaries, 10 times only
......@@ -166,7 +166,7 @@ The current implementation can take many seconds to detach from tracing, after
Ctrl-C has been hit.
Couting all vfs functions for process ID 5276 only:
Counting all vfs functions for process ID 5276 only:
# ./funccount -p 5276 'vfs_*'
Tracing... Ctrl-C to end.
......
......@@ -37,12 +37,12 @@ the function began executing (was called) to when it finished (returned).
This example output shows that most of the time, do_sys_open() took between
2048 and 65536 nanoseconds (2 to 65 microseconds). The peak of this distribution
shows 291 calls of between 4096 and 8191 nanoseconds. There was also one
occurrance, an outlier, in the 2 to 4 millisecond range.
occurrence, an outlier, in the 2 to 4 millisecond range.
How this works: the function entry and return are traced using the kernel kprobe
and kretprobe tracer. Timestamps are collected, the delta time calculated, which
is then bucketized and stored as an in-kernel histogram for efficiency. The
histgram is visible in the output: it's the "count" column; everything else is
histogram is visible in the output: it's the "count" column; everything else is
decoration. Only the count column is copied to user-level on output. This is an
efficient way to time kernel functions and examine their latency distribution.
......@@ -242,7 +242,7 @@ USAGE message:
usage: funclatency [-h] [-p PID] [-i INTERVAL] [-T] [-u] [-m] [-F] [-r]
pattern
Time kernel funcitons and print latency as a histogram
Time kernel functions and print latency as a histogram
positional arguments:
pattern search expression for kernel functions
......@@ -260,7 +260,7 @@ optional arguments:
only.
examples:
./funclatency do_sys_open # time the do_sys_open() kenel function
./funclatency do_sys_open # time the do_sys_open() kernel function
./funclatency -u vfs_read # time vfs_read(), in microseconds
./funclatency -m do_nanosleep # time do_nanosleep(), in milliseconds
./funclatency -mTi 5 vfs_read # output every 5 seconds, with timestamps
......
Demonstrations of memleak.
memleak traces and matches memory allocation and deallocation requests, and
collects call stacks for each allocation. memleak can then print a summary
of which call stacks performed allocations that weren't subsequently freed.
For example:
# ./memleak.py -p $(pidof allocs)
Attaching to malloc and free in pid 5193, Ctrl+C to quit.
[11:16:33] Top 2 stacks with outstanding allocations:
80 bytes in 5 allocations from stack
main+0x6d [/home/vagrant/allocs] (400862)
__libc_start_main+0xf0 [/usr/lib64/libc-2.21.so] (7fd460ac2790)
[11:16:34] Top 2 stacks with outstanding allocations:
160 bytes in 10 allocations from stack
main+0x6d [/home/vagrant/allocs] (400862)
__libc_start_main+0xf0 [/usr/lib64/libc-2.21.so] (7fd460ac2790)
Each entry printed is a set of allocations that originate from the same call
stack, and that weren't freed yet. The number of bytes and number of allocs
are followed by the call stack, top to bottom, of the allocation site.
As time goes on, it becomes apparent that the main function in the allocs
process is leaking memory, 16 bytes at a time. Fortunately, you don't have to
inspect each allocation individually -- you get a nice summary of which stack
is responsible for a large leak.
Occasionally, you do want the individual allocation details. Perhaps the same
stack is allocating various sizes and you want to confirm which sizes are
prevalent. Use the -a switch:
# ./memleak.py -p $(pidof allocs) -a
Attaching to malloc and free in pid 5193, Ctrl+C to quit.
[11:16:33] Top 2 stacks with outstanding allocations:
addr = 948cd0 size = 16
addr = 948d10 size = 16
addr = 948d30 size = 16
addr = 948cf0 size = 16
64 bytes in 4 allocations from stack
main+0x6d [/home/vagrant/allocs] (400862)
__libc_start_main+0xf0 [/usr/lib64/libc-2.21.so] (7fd460ac2790)
[11:16:34] Top 2 stacks with outstanding allocations:
addr = 948d50 size = 16
addr = 948cd0 size = 16
addr = 948d10 size = 16
addr = 948d30 size = 16
addr = 948cf0 size = 16
addr = 948dd0 size = 16
addr = 948d90 size = 16
addr = 948db0 size = 16
addr = 948d70 size = 16
addr = 948df0 size = 16
160 bytes in 10 allocations from stack
main+0x6d [/home/vagrant/allocs] (400862)
__libc_start_main+0xf0 [/usr/lib64/libc-2.21.so] (7fd460ac2790)
When using the -p switch, memleak traces the allocations of a particular
process. Without this switch, kernel allocations (kmalloc) are traced instead.
For example:
# ./memleak.py
Attaching to kmalloc and kfree, Ctrl+C to quit.
...
248 bytes in 4 allocations from stack
bpf_prog_load [kernel] (ffffffff8118c471)
sys_bpf [kernel] (ffffffff8118c8b5)
328 bytes in 1 allocations from stack
perf_mmap [kernel] (ffffffff811990fd)
mmap_region [kernel] (ffffffff811df5d4)
do_mmap [kernel] (ffffffff811dfb83)
vm_mmap_pgoff [kernel] (ffffffff811c494f)
sys_mmap_pgoff [kernel] (ffffffff811ddf02)
sys_mmap [kernel] (ffffffff8101b0ab)
464 bytes in 1 allocations from stack
traceprobe_command [kernel] (ffffffff81187cf2)
traceprobe_probes_write [kernel] (ffffffff81187d86)
probes_write [kernel] (ffffffff81181580)
__vfs_write [kernel] (ffffffff812237b7)
vfs_write [kernel] (ffffffff81223ec6)
sys_write [kernel] (ffffffff81224b85)
entry_SYSCALL_64_fastpath [kernel] (ffffffff8178182e)
8192 bytes in 1 allocations from stack
alloc_and_copy_ftrace_hash.constprop.59 [kernel] (ffffffff8115d17e)
ftrace_set_hash [kernel] (ffffffff8115e767)
ftrace_set_filter_ip [kernel] (ffffffff8115e9a8)
arm_kprobe [kernel] (ffffffff81148600)
enable_kprobe [kernel] (ffffffff811486f6)
kprobe_register [kernel] (ffffffff81182399)
perf_trace_init [kernel] (ffffffff8117c4e0)
perf_tp_event_init [kernel] (ffffffff81192479)
Here you can see that arming the kprobe to which our eBPF program is attached
consumed 8KB of memory. Loading the BPF program also consumed a couple hundred
bytes (in bpf_prog_load).
memleak stores each allocated block along with its size, timestamp, and the
stack that allocated it. When the block is deleted, this information is freed
to reduce the memory overhead.
To avoid false positives, allocations younger than a certain age (500ms by
default) are not printed. To change this threshold, use the -o switch.
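As a rough mental model only (memleak keeps the equivalent state in in-kernel
BPF maps, not Python dictionaries), the bookkeeping amounts to a map keyed by
address, pruned on free and filtered by age at report time:

    # simplified sketch of the bookkeeping idea, not memleak's actual code
    outstanding = {}   # addr -> (size, timestamp, stack_id)

    def on_alloc(addr, size, timestamp, stack_id):
        outstanding[addr] = (size, timestamp, stack_id)

    def on_free(addr):
        outstanding.pop(addr, None)          # freed blocks are forgotten

    def report(now, min_age=0.5):
        # blocks younger than the age threshold are skipped as likely in-use
        return {a: v for a, v in outstanding.items() if now - v[1] > min_age}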
By default, memleak prints its output every 5 seconds. To change this
interval, pass the interval as a positional parameter to memleak. You can
also control the number of times the output will be printed before exiting.
For example:
# ./memleak.py 1 10
... will print the outstanding allocation statistics every second, for ten
times, and then exit.
memleak may introduce considerable overhead if your application or kernel is
allocating and freeing memory at a very high rate. In that case, you can
control the overhead by sampling every N-th allocation. For example, to sample
roughly 10% of the allocations and print the outstanding allocations every 5
seconds, 3 times before quitting:
# ./memleak.py -p $(pidof allocs) -s 10 5 3
Attaching to malloc and free in pid 2614, Ctrl+C to quit.
[11:16:33] Top 2 stacks with outstanding allocations:
16 bytes in 1 allocations from stack
main+0x6d [/home/vagrant/allocs] (400862)
__libc_start_main+0xf0 [/usr/lib64/libc-2.21.so] (7fdc11ce8790)
[11:16:38] Top 2 stacks with outstanding allocations:
16 bytes in 1 allocations from stack
main+0x6d [/home/vagrant/allocs] (400862)
__libc_start_main+0xf0 [/usr/lib64/libc-2.21.so] (7fdc11ce8790)
[11:16:43] Top 2 stacks with outstanding allocations:
32 bytes in 2 allocations from stack
main+0x6d [/home/vagrant/allocs] (400862)
__libc_start_main+0xf0 [/usr/lib64/libc-2.21.so] (7fdc11ce8790)
Note that even though the application leaks 16 bytes of memory every second,
the report (printed every 5 seconds) doesn't "see" all the allocations because
of the sampling rate applied.
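The sampling itself is conceptually just a modulo counter on the allocation
path; a sketch of the idea (illustrative only; the real tool applies this
inside the BPF program):

    # illustrative only: track roughly every N-th allocation, skip the rest
    SAMPLE_EVERY_N = 10
    tracked = {}
    seen = 0

    def maybe_record(addr, size):
        global seen
        seen += 1
        if seen % SAMPLE_EVERY_N == 0:
            tracked[addr] = size    # only ~1/N of allocations are bookkept

Leaks that land on skipped allocations therefore stay invisible until a
sampled allocation from the same stack shows up.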
USAGE message:
# ./memleak.py -h
usage: memleak.py [-h] [-p PID] [-t] [-a] [-o OLDER] [-c COMMAND]
[-s SAMPLE_RATE] [-d STACK_DEPTH] [-T TOP]
[interval] [count]
Trace outstanding memory allocations that weren't freed.
Supports both user-mode allocations made with malloc/free and kernel-mode
allocations made with kmalloc/kfree.
positional arguments:
  interval              interval in seconds to print outstanding allocations
count number of times to print the report before exiting
optional arguments:
-h, --help show this help message and exit
-p PID, --pid PID the PID to trace; if not specified, trace kernel
allocs
-t, --trace print trace messages for each alloc/free call
-a, --show-allocs show allocation addresses and sizes as well as call
stacks
-o OLDER, --older OLDER
prune allocations younger than this age in
milliseconds
-c COMMAND, --command COMMAND
execute and trace the specified command
-s SAMPLE_RATE, --sample-rate SAMPLE_RATE
sample every N-th allocation to decrease the overhead
-d STACK_DEPTH, --stack_depth STACK_DEPTH
maximum stack depth to capture
-T TOP, --top TOP display only this many top allocating stacks (by size)
-z MIN_SIZE, --min-size MIN_SIZE
capture only allocations larger than this size
-Z MAX_SIZE, --max-size MAX_SIZE
capture only allocations smaller than this size
EXAMPLES:
./memleak.py -p $(pidof allocs)
Trace allocations and display a summary of "leaked" (outstanding)
allocations every 5 seconds
./memleak.py -p $(pidof allocs) -t
Trace allocations and display each individual call to malloc/free
./memleak.py -ap $(pidof allocs) 10
Trace allocations and display allocated addresses, sizes, and stacks
every 10 seconds for outstanding allocations
./memleak.py -c "./allocs"
Run the specified command and trace its allocations
./memleak.py
Trace allocations in kernel mode and display a summary of outstanding
allocations every 5 seconds
./memleak.py -o 60000
Trace allocations in kernel mode and display a summary of outstanding
allocations that are at least one minute (60 seconds) old
./memleak.py -s 5
Trace roughly every 5th allocation, to reduce overhead
......@@ -743,4 +743,4 @@ examples:
./offcputime 5 # trace for 5 seconds only
./offcputime -f 5 # 5 seconds, and output in folded format
./offcputime -u # don't include kernel threads (user only)
./offcputime -p 185 # trace fo PID 185 only
./offcputime -p 185 # trace for PID 185 only
......@@ -165,7 +165,7 @@ via vfs_read() and the other doing a link_path_walk(). There is also a vmstat(8)
stack showing it sleeping between intervals, and an sshd(8) stack showing it
waiting on a file descriptor for input.
The stack shown at the bottom is the off-CPU stack beloning to the task name
The stack shown at the bottom is the off-CPU stack belonging to the task name
shown after "target:". Then there is a separator, "-", and above it the waker
stack and the waker task name after "waker:". The wakeup stack is printed
in reverse order.
......
#!/usr/bin/env python
#
# oomkill Trace oom_kill_process(). For Linux, uses BCC, eBPF.
#
# This traces the kernel out-of-memory killer, and prints basic details,
# including the system load averages. This can provide more context on the
# system state at the time of OOM: was it getting busier or steady, based
# on the load averages? This tool may also be useful to customize for
# investigations; for example, by adding other task_struct details at the time
# of OOM.
#
# Copyright 2016 Netflix, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
#
# 09-Feb-2016 Brendan Gregg Created this.
from bcc import BPF
from time import strftime
import ctypes as ct
# linux stats
loadavg = "/proc/loadavg"
# define BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/oom.h>
struct data_t {
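// fpid/fcomm: the task that triggered the OOM kill; tpid/tcomm: the task that was killed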
u64 fpid;
u64 tpid;
u64 pages;
char fcomm[TASK_COMM_LEN];
char tcomm[TASK_COMM_LEN];
};
BPF_PERF_OUTPUT(events);
void kprobe__oom_kill_process(struct pt_regs *ctx, struct oom_control *oc,
struct task_struct *p, unsigned int points, unsigned long totalpages)
{
struct data_t data = {};
u32 pid = bpf_get_current_pid_tgid();
data.fpid = pid;
data.tpid = p->pid;
data.pages = totalpages;
bpf_get_current_comm(&data.fcomm, sizeof(data.fcomm));
bpf_probe_read(&data.tcomm, sizeof(data.tcomm), p->comm);
events.perf_submit(ctx, &data, sizeof(data));
}
"""
# kernel->user event data: struct data_t
TASK_COMM_LEN = 16 # linux/sched.h
class Data(ct.Structure):
_fields_ = [
("fpid", ct.c_ulonglong),
("tpid", ct.c_ulonglong),
("pages", ct.c_ulonglong),
("fcomm", ct.c_char * TASK_COMM_LEN),
("tcomm", ct.c_char * TASK_COMM_LEN)
]
# process event
def print_event(cpu, data, size):
event = ct.cast(data, ct.POINTER(Data)).contents
with open(loadavg) as stats:
avgline = stats.read().rstrip()
print(("%s Triggered by PID %d (\"%s\"), OOM kill of PID %d (\"%s\")"
", %d pages, loadavg: %s") % (strftime("%H:%M:%S"), event.fpid,
event.fcomm, event.tpid, event.tcomm, event.pages, avgline))
# initialize BPF
b = BPF(text=bpf_text)
print("Tracing OOM kills... Ctrl-C to stop.")
b["events"].open_perf_buffer(print_event)
while 1:
b.kprobe_poll()
Demonstrations of oomkill, the Linux eBPF/bcc version.
oomkill is a simple program that traces the Linux out-of-memory (OOM) killer,
and shows basic details on one line per OOM kill:
# ./oomkill
Tracing oom_kill_process()... Ctrl-C to end.
21:03:39 Triggered by PID 3297 ("ntpd"), OOM kill of PID 22516 ("perl"), 3850642 pages, loadavg: 0.99 0.39 0.30 3/282 22724
21:03:48 Triggered by PID 22517 ("perl"), OOM kill of PID 22517 ("perl"), 3850642 pages, loadavg: 0.99 0.41 0.30 2/282 22932
The first line shows that PID 22516, with process name "perl", was OOM killed
when it reached 3850642 pages (usually 4 Kbytes per page). This OOM kill
happened to be triggered by PID 3297, process name "ntpd", doing some memory
allocation.
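For a sense of scale, assuming the usual 4 Kbyte page size, 3850642 pages works
out to roughly 14.7 Gbytes:

    # back-of-the-envelope: page count -> Gbytes, assuming 4 KiB pages
    print(3850642 * 4096 / float(2 ** 30))   # ~14.7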
The system log (dmesg) shows pages of details and system context about an OOM
kill. What it currently lacks, however, is context on how the system had been
changing over time. I've seen OOM kills where I wanted to know if the system
was at steady state at the time, or if there had been a recent increase in
workload that triggered the OOM event. oomkill provides some context: at the
end of the line is the load average information from /proc/loadavg. For both
of the oomkills here, we can see that the system was getting busier at the
time (a higher 1 minute "average" of 0.99, compared to the 15 minute "average"
of 0.30).
oomkill can also be the basis of other tools and customizations. For example,
you can edit it to include other task_struct details from the target PID at
the time of the OOM kill.
The following commands can be used to test this program, and invoke a memory
consuming process that exhausts system memory and is OOM killed:
sysctl -w vm.overcommit_memory=1 # always overcommit
perl -e 'while (1) { $a .= "A" x 1024; }' # eat all memory
WARNING: This exhausts system memory after disabling some overcommit checks.
Only test in a lab environment.
......@@ -63,7 +63,7 @@ int kretprobe__sys_open(struct pt_regs *ctx)
return 0;
}
bpf_trace_printk("%s %d\\n", *filenamep, ret);
bpf_trace_printk("%d %s\\n", ret, *filenamep);
args_filename.delete(&pid);
return 0;
......@@ -90,7 +90,7 @@ start_ts = 0
# format output
while 1:
(task, pid, cpu, flags, ts, msg) = b.trace_fields()
(filename, ret_s) = msg.split(" ")
(ret_s, filename) = msg.split(" ", 1)
ret = int(ret_s)
if (args.failed and (ret >= 0)):
......
......@@ -60,7 +60,7 @@ net_rx_action 15656
This can be useful for quantifying where CPU cycles are spent among the soft
interrupts (summarized as the %softirq column from mpstat(1), and shown as
event counts in /proc/softirqs). The output above shows that most time was spent
processing net_rx_action(), which was around 15 milleconds per second (total
processing net_rx_action(), which was around 15 milliseconds per second (total
time across all CPUs).
......
......@@ -376,7 +376,7 @@ Tracing 1 functions for "tcp_sendmsg"... Hit Ctrl-C to end.
Detaching...
If it wasn't clear how one function called another, knowing the instruction
offset can help you locate the lines of code from a dissassembly dump.
offset can help you locate the lines of code from a disassembly dump.
A wildcard can also be used. Eg, all functions beginning with "tcp_send":
......
......@@ -3,7 +3,7 @@ Demonstrations of stacksnoop, the Linux eBPF/bcc version.
This program traces the given kernel function and prints the kernel stack trace
for every call. This tool is useful for studying low frequency kernel functions,
to see how they were invoked. For exmaple, tracing the ext4_sync_fs() call:
to see how they were invoked. For example, tracing the ext4_sync_fs() call:
# ./stacksnoop ext4_sync_fs
TIME(s) STACK
......
#!/usr/bin/python
# @lint-avoid-python-3-compatibility-imports
#
# statsnoop Trace stat() syscalls.
# For Linux, uses BCC, eBPF. Embedded C.
#
# USAGE: statsnoop [-h] [-t] [-x] [-p PID]
#
# Copyright 2016 Netflix, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
#
# 08-Feb-2016 Brendan Gregg Created this.
from __future__ import print_function
from bcc import BPF
import argparse
# arguments
examples = """examples:
./statsnoop # trace all stat() syscalls
./statsnoop -t # include timestamps
./statsnoop -x # only show failed stats
./statsnoop -p 181 # only trace PID 181
"""
parser = argparse.ArgumentParser(
description="Trace stat() syscalls",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=examples)
parser.add_argument("-t", "--timestamp", action="store_true",
help="include timestamp on output")
parser.add_argument("-x", "--failed", action="store_true",
help="only show failed stats")
parser.add_argument("-p", "--pid",
help="trace this PID only")
args = parser.parse_args()
debug = 0
# define BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
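// stash the filename argument at syscall entry, keyed by PID, so that the
// return probe can print the path alongside the syscall return value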
BPF_HASH(args_filename, u32, const char *);
int trace_entry(struct pt_regs *ctx, const char __user *filename)
{
u32 pid = bpf_get_current_pid_tgid();
FILTER
args_filename.update(&pid, &filename);
return 0;
};
int trace_return(struct pt_regs *ctx)
{
const char **filenamep;
int ret = ctx->ax;
u32 pid = bpf_get_current_pid_tgid();
filenamep = args_filename.lookup(&pid);
if (filenamep == 0) {
// missed entry
return 0;
}
bpf_trace_printk("%d %s\\n", ret, *filenamep);
args_filename.delete(&pid);
return 0;
}
"""
if args.pid:
bpf_text = bpf_text.replace('FILTER',
'if (pid != %s) { return 0; }' % args.pid)
else:
bpf_text = bpf_text.replace('FILTER', '')
if debug:
print(bpf_text)
# initialize BPF
b = BPF(text=bpf_text)
b.attach_kprobe(event="sys_stat", fn_name="trace_entry")
b.attach_kprobe(event="sys_statfs", fn_name="trace_entry")
b.attach_kprobe(event="sys_newstat", fn_name="trace_entry")
b.attach_kretprobe(event="sys_stat", fn_name="trace_return")
b.attach_kretprobe(event="sys_statfs", fn_name="trace_return")
b.attach_kretprobe(event="sys_newstat", fn_name="trace_return")
# header
if args.timestamp:
print("%-14s" % ("TIME(s)"), end="")
print("%-6s %-16s %4s %3s %s" % ("PID", "COMM", "FD", "ERR", "PATH"))
start_ts = 0
# format output
while 1:
(task, pid, cpu, flags, ts, msg) = b.trace_fields()
(ret_s, filename) = msg.split(" ", 1)
ret = int(ret_s)
if (args.failed and (ret >= 0)):
continue
# split return value into FD and errno columns
if ret >= 0:
fd_s = ret
err = 0
else:
fd_s = "-1"
err = - ret
# print columns
if args.timestamp:
if start_ts == 0:
start_ts = ts
print("%-14.9f" % (ts - start_ts), end="")
print("%-6d %-16s %4s %3s %s" % (pid, task, fd_s, err, filename))
......@@ -467,4 +467,4 @@ examples:
./wakeuptime 5 # trace for 5 seconds only
./wakeuptime -f 5 # 5 seconds, and output in folded format
./wakeuptime -u # don't include kernel threads (user only)
./wakeuptime -p 185 # trace fo PID 185 only
./wakeuptime -p 185 # trace for PID 185 only