Merge pull request #377 from goldshtn/argdist-tuples

Enhancements to argdist

Merge pull request #377 from goldshtn/argdist-tuples
Enhancements to argdist
61f5535f · 4ast · e01b6544 · 7df65da7 · 61f5535f · 61f5535f
Commit 61f5535f authored Feb 14, 2016 by 4ast
Expand all Hide whitespace changes
Inline Side-by-side

Showing with 584 additions and 169 deletions

man/man8/argdist.8 man/man8/argdist.8 +50 -21

tools/argdist.py tools/argdist.py +396 -103

tools/argdist_examples.txt tools/argdist_examples.txt +138 -45

No files found.
--- a/man/man8/argdist.8
+++ b/man/man8/argdist.8
@@ -2,14 +2,14 @@
 .SH NAME
 argdist \- Trace a function and display a histogram or frequency count of its parameter values. Uses Linux eBPF/bcc.
 .SH SYNOPSIS
-.B argdist [-h] [-p PID] [-z STRING_SIZE] [-i INTERVAL] [-n COUNT] [-H specifier [specifier ...]] [-C specifier [specifier ...]]
+.B argdist [-h] [-p PID] [-z STRING_SIZE] [-i INTERVAL] [-n COUNT] [-v] [-T TOP] [-H specifier [specifier ...]] [-C specifier [specifier ...]] [-I header [header ...]]
 .SH DESCRIPTION
 argdist attaches to function entry and exit points, collects specified parameter
 values, and stores them in a histogram or a frequency collection that counts
 the number of times a parameter value occurred. It can also filter parameter
 values and instrument multiple entry points at once.

-This currently only works on x86_64. Check for future versions.
+Since this uses BPF, only the root user can use this tool.
 .SH REQUIREMENTS
 CONFIG_BPF and bcc.
 .SH OPTIONS
@@ -24,20 +24,33 @@ Trace only functions in the process PID.
 When collecting string arguments (of type char*), collect up to STRING_SIZE 
 characters. Longer strings will be truncated.
 .TP
-i INTERVAL
+\-i INTERVAL
 Print the collected data every INTERVAL seconds. The default is 1 second.
 .TP
-n NUMBER
+\-n NUMBER
 Print the collected data COUNT times and then exit.
 .TP
-H SPECIFIER, -C SPECIFIER
+\-v
+Display the generated BPF program, for debugging purposes.
+.TP
+\-T TOP
+When collecting frequency counts, display only the top TOP entries.
+.TP
+\-H SPECIFIER, \-C SPECIFIER
 One or more probe specifications that instruct argdist which functions to
 probe, which parameters to collect, how to aggregate them, and whether to perform
 any filtering. See SPECIFIER SYNTAX below.
+.TP
+\-I HEADER
+One or more header files that should be included in the BPF program. This 
+enables the use of structure definitions, enumerations, and constants that
+are available in these headers. You should provide the same path you would
+include in the BPF program, e.g. 'linux/blkdev.h' or 'linux/time.h'. Note: in
+many cases, argdist will deduce the necessary header files automatically. 
 .SH SPECIFIER SYNTAX
 The general specifier syntax is as follows:

-.B {p,r}:[library]:function(signature)[:type:expr[:filter]][;label]
+.B {p,r}:[library]:function(signature)[:type[,type...]:expr[,expr...][:filter]][#label]
 .TP
 .B {p,r}
 Probe type \- "p" for function entry, "r" for function return;
@@ -47,8 +60,7 @@ count information, or aggregate the collected values into a histogram. Counting
 probes will collect the number of times every parameter value was observed,
 whereas histogram probes will collect the parameter values into a histogram.
 Only integral types can be used with histogram probes; there is no such limitation
-for counting probes. Return probes can only use the pseudo-variable $retval, which
-is an alias for the function's return value.
+for counting probes.
 .TP
 .B [library]
 Library containing the probe.
@@ -63,21 +75,30 @@ on the other hand, is only required if you plan to collect parameter values
 based on that signature. For example, if you only want to collect the first
 parameter, you don't have to specify the rest of the parameters in the signature.
 .TP
-.B [type]
-The type of the expression to capture.
+.B [type[,type...]]
+The type(s) of the expression(s) to capture.
 This is the type of the keys in the histogram or raw event collection that are
 collected by the probes.
 .TP
-.B [expr]
-The expression to capture.
+.B [expr[,expr...]]
+The expression(s) to capture.
 These are the values that are assigned to the histogram or raw event collection.
 You may use the parameters directly, or valid C expressions that involve the
 parameters, such as "size % 10".
+Return probes can use the argument values received by the
+function when it was entered, through the $entry(paramname) special variable.
+Return probes can also access the function's return value in $retval, and the
+function's execution time in nanoseconds in $latency. Note that adding the
+$latency or $entry(paramname) variables to the expression will introduce an
+additional probe at the function's entry to collect this data, and therefore
+introduce additional overhead.
 .TP
 .B [filter]
 The filter applied to the captured data.
 Only parameter values that pass the filter will be collected. This is any valid
 C expression that refers to the parameter values, such as "fd == 1 && length > 16".
+The $entry, $retval, and $latency variables can be used here as well, in return
+probes.
 .TP
 .B [label]
 The label that will be displayed when printing the probed values. By default,
@@ -86,39 +107,47 @@ this is the probe specifier.
 .TP
 Print a histogram of allocation sizes passed to kmalloc:
 #
-.B argdist.py -H 'p::__kmalloc(u64 size):u64:size'
+.B argdist -H 'p::__kmalloc(u64 size):u64:size'
 .TP
 Print a count of how many times process 1005 called malloc with an allocation size of 16 bytes:
 #
-.B argdist.py -p 1005 -C 'p:c:malloc(size_t size):size_t:size:size==16'
+.B argdist -p 1005 -C 'p:c:malloc(size_t size):size_t:size:size==16'
 .TP
 Snoop on all strings returned by gets():
 #
-.B argdist.py -C 'r:c:gets():char*:$retval'
+.B argdist -C 'r:c:gets():char*:$retval'
+.TP
+Print a histogram of read sizes that were longer than 1ms:
+#
+.B argdist -H 'r::__vfs_read(void *file, void *buf, size_t count):size_t:$entry(count):$latency > 1000000'
 .TP
 Print frequency counts of how many times writes were issued to a particular file descriptor number, in process 1005:
 #
-.B argdist.py -p 1005 -C 'p:c:write(int fd):int:fd'
+.B argdist -p 1005 -C 'p:c:write(int fd):int:fd'
 .TP
 Print a histogram of error codes returned by read() in process 1005:
 #
-.B argdist.py -p 1005 -H 'r:c:read()'
+.B argdist -p 1005 -H 'r:c:read()'
 .TP
 Print a histogram of buffer sizes passed to write() across all processes, where the file descriptor was 1 (STDOUT):
 #
-.B argdist.py -H 'p:c:write(int fd, const void *buf, size_t count):size_t:count:fd==1'
+.B argdist -H 'p:c:write(int fd, const void *buf, size_t count):size_t:count:fd==1'
 .TP
 Count fork() calls in libc across all processes, grouped by pid:
 #
-.B argdist.py -C 'p:c:fork():int:$PID;fork per process'
+.B argdist -C 'p:c:fork():int:$PID;fork per process'
 .TP
 Print histograms of sleep() and nanosleep() parameter values:
 #
-.B argdist.py -H 'p:c:sleep(u32 seconds):u32:seconds' -H 'p:c:nanosleep(struct timespec { time_t tv_sec; long tv_nsec; } *req):long:req->tv_nsec'
+.B argdist -H 'p:c:sleep(u32 seconds):u32:seconds' 'p:c:nanosleep(struct timespec *req):long:req->tv_nsec'
 .TP
 Spy on writes to STDOUT performed by process 2780, up to a string size of 120 characters:
 #
-.B argdist.py -p 2780 -z 120 -C 'p:c:write(int fd, char* buf, size_t len):char*:buf:fd==1'
+.B argdist -p 2780 -z 120 -C 'p:c:write(int fd, char* buf, size_t len):char*:buf:fd==1'
+.TP
+Group files being read from and the read sizes from __vfs_read:
+#
+.B argdist -C 'p::__vfs_read(struct file *file, void *buf, size_t count):char*,size_t:file->f_path.dentry->d_iname,count:file->f_path.dentry->d_iname[0]!=0'
 .SH SOURCE
 This is from bcc.
 .IP

--- a/tools/argdist.py
+++ b/tools/argdist.py
--- a/tools/argdist_examples.txt
+++ b/tools/argdist_examples.txt