Commit 61f5535f authored by 4ast's avatar 4ast

Merge pull request #377 from goldshtn/argdist-tuples

Enhancements to argdist
parents e01b6544 7df65da7
......@@ -2,14 +2,14 @@
.SH NAME
argdist \- Trace a function and display a histogram or frequency count of its parameter values. Uses Linux eBPF/bcc.
.SH SYNOPSIS
.B argdist [-h] [-p PID] [-z STRING_SIZE] [-i INTERVAL] [-n COUNT] [-H specifier [specifier ...]] [-C specifier [specifier ...]]
.B argdist [-h] [-p PID] [-z STRING_SIZE] [-i INTERVAL] [-n COUNT] [-v] [-T TOP] [-H specifier [specifier ...]] [-C specifier [specifier ...]] [-I header [header ...]]
.SH DESCRIPTION
argdist attaches to function entry and exit points, collects specified parameter
values, and stores them in a histogram or a frequency collection that counts
the number of times a parameter value occurred. It can also filter parameter
values and instrument multiple entry points at once.
This currently only works on x86_64. Check for future versions.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bcc.
.SH OPTIONS
......@@ -24,20 +24,33 @@ Trace only functions in the process PID.
When collecting string arguments (of type char*), collect up to STRING_SIZE
characters. Longer strings will be truncated.
.TP
-i INTERVAL
\-i INTERVAL
Print the collected data every INTERVAL seconds. The default is 1 second.
.TP
-n NUMBER
\-n NUMBER
Print the collected data COUNT times and then exit.
.TP
-H SPECIFIER, -C SPECIFIER
\-v
Display the generated BPF program, for debugging purposes.
.TP
\-T TOP
When collecting frequency counts, display only the top TOP entries.
.TP
\-H SPECIFIER, \-C SPECIFIER
One or more probe specifications that instruct argdist which functions to
probe, which parameters to collect, how to aggregate them, and whether to perform
any filtering. See SPECIFIER SYNTAX below.
.TP
\-I HEADER
One or more header files that should be included in the BPF program. This
enables the use of structure definitions, enumerations, and constants that
are available in these headers. You should provide the same path you would
include in the BPF program, e.g. 'linux/blkdev.h' or 'linux/time.h'. Note: in
many cases, argdist will deduce the necessary header files automatically.
.SH SPECIFIER SYNTAX
The general specifier syntax is as follows:
.B {p,r}:[library]:function(signature)[:type:expr[:filter]][;label]
.B {p,r}:[library]:function(signature)[:type[,type...]:expr[,expr...][:filter]][#label]
.TP
.B {p,r}
Probe type \- "p" for function entry, "r" for function return;
......@@ -47,8 +60,7 @@ count information, or aggregate the collected values into a histogram. Counting
probes will collect the number of times every parameter value was observed,
whereas histogram probes will collect the parameter values into a histogram.
Only integral types can be used with histogram probes; there is no such limitation
for counting probes. Return probes can only use the pseudo-variable $retval, which
is an alias for the function's return value.
for counting probes.
.TP
.B [library]
Library containing the probe.
......@@ -63,21 +75,30 @@ on the other hand, is only required if you plan to collect parameter values
based on that signature. For example, if you only want to collect the first
parameter, you don't have to specify the rest of the parameters in the signature.
.TP
.B [type]
The type of the expression to capture.
.B [type[,type...]]
The type(s) of the expression(s) to capture.
This is the type of the keys in the histogram or raw event collection that are
collected by the probes.
.TP
.B [expr]
The expression to capture.
.B [expr[,expr...]]
The expression(s) to capture.
These are the values that are assigned to the histogram or raw event collection.
You may use the parameters directly, or valid C expressions that involve the
parameters, such as "size % 10".
Return probes can use the argument values received by the
function when it was entered, through the $entry(paramname) special variable.
Return probes can also access the function's return value in $retval, and the
function's execution time in nanoseconds in $latency. Note that adding the
$latency or $entry(paramname) variables to the expression will introduce an
additional probe at the function's entry to collect this data, and therefore
introduce additional overhead.
.TP
.B [filter]
The filter applied to the captured data.
Only parameter values that pass the filter will be collected. This is any valid
C expression that refers to the parameter values, such as "fd == 1 && length > 16".
The $entry, $retval, and $latency variables can be used here as well, in return
probes.
.TP
.B [label]
The label that will be displayed when printing the probed values. By default,
......@@ -86,39 +107,47 @@ this is the probe specifier.
.TP
Print a histogram of allocation sizes passed to kmalloc:
#
.B argdist.py -H 'p::__kmalloc(u64 size):u64:size'
.B argdist -H 'p::__kmalloc(u64 size):u64:size'
.TP
Print a count of how many times process 1005 called malloc with an allocation size of 16 bytes:
#
.B argdist.py -p 1005 -C 'p:c:malloc(size_t size):size_t:size:size==16'
.B argdist -p 1005 -C 'p:c:malloc(size_t size):size_t:size:size==16'
.TP
Snoop on all strings returned by gets():
#
.B argdist.py -C 'r:c:gets():char*:$retval'
.B argdist -C 'r:c:gets():char*:$retval'
.TP
Print a histogram of read sizes that were longer than 1ms:
#
.B argdist -H 'r::__vfs_read(void *file, void *buf, size_t count):size_t:$entry(count):$latency > 1000000'
.TP
Print frequency counts of how many times writes were issued to a particular file descriptor number, in process 1005:
#
.B argdist.py -p 1005 -C 'p:c:write(int fd):int:fd'
.B argdist -p 1005 -C 'p:c:write(int fd):int:fd'
.TP
Print a histogram of error codes returned by read() in process 1005:
#
.B argdist.py -p 1005 -H 'r:c:read()'
.B argdist -p 1005 -H 'r:c:read()'
.TP
Print a histogram of buffer sizes passed to write() across all processes, where the file descriptor was 1 (STDOUT):
#
.B argdist.py -H 'p:c:write(int fd, const void *buf, size_t count):size_t:count:fd==1'
.B argdist -H 'p:c:write(int fd, const void *buf, size_t count):size_t:count:fd==1'
.TP
Count fork() calls in libc across all processes, grouped by pid:
#
.B argdist.py -C 'p:c:fork():int:$PID;fork per process'
.B argdist -C 'p:c:fork():int:$PID;fork per process'
.TP
Print histograms of sleep() and nanosleep() parameter values:
#
.B argdist.py -H 'p:c:sleep(u32 seconds):u32:seconds' -H 'p:c:nanosleep(struct timespec { time_t tv_sec; long tv_nsec; } *req):long:req->tv_nsec'
.B argdist -H 'p:c:sleep(u32 seconds):u32:seconds' 'p:c:nanosleep(struct timespec *req):long:req->tv_nsec'
.TP
Spy on writes to STDOUT performed by process 2780, up to a string size of 120 characters:
#
.B argdist.py -p 2780 -z 120 -C 'p:c:write(int fd, char* buf, size_t len):char*:buf:fd==1'
.B argdist -p 2780 -z 120 -C 'p:c:write(int fd, char* buf, size_t len):char*:buf:fd==1'
.TP
Group files being read from and the read sizes from __vfs_read:
#
.B argdist -C 'p::__vfs_read(struct file *file, void *buf, size_t count):char*,size_t:file->f_path.dentry->d_iname,count:file->f_path.dentry->d_iname[0]!=0'
.SH SOURCE
This is from bcc.
.IP
......
#!/usr/bin/env python
#
# argdist.py Trace a function and display a distribution of its
# argdist Trace a function and display a distribution of its
# parameter values as a histogram or frequency count.
#
# USAGE: argdist.py [-h] [-p PID] [-z STRING_SIZE] [-i INTERVAL]
# [-n COUNT] [-C specifier [specifier ...]]
# USAGE: argdist [-h] [-p PID] [-z STRING_SIZE] [-i INTERVAL]
# [-n COUNT] [-v] [-T TOP]
# [-C specifier [specifier ...]]
# [-H specifier [specifier ...]]
# [-I header [header ...]]
#
# Licensed under the Apache License, Version 2.0 (the "License")
# Copyright (C) 2016 Sasha Goldshtein.
......@@ -13,22 +15,41 @@
from bcc import BPF
from time import sleep, strftime
import argparse
import re
class Specifier(object):
text = """
probe_text = """
DATA_DECL
int PROBENAME(struct pt_regs *ctx SIGNATURE)
{
PREFIX
PID_FILTER
KEY_EXPR
if (!(FILTER)) return 0;
KEY_EXPR
COLLECT
return 0;
}
"""
next_probe_index = 0
aliases = { "$PID": "bpf_get_current_pid_tgid()" }
auto_includes = {
"linux/time.h" : ["time"],
"linux/fs.h" : ["fs", "file"],
"linux/blkdev.h" : ["bio", "request"],
"linux/slab.h" : ["alloc"]
}
@staticmethod
def generate_auto_includes(specifiers):
headers = ""
for header, keywords in Specifier.auto_includes.items():
for keyword in keywords:
for specifier in specifiers:
if keyword in specifier:
headers += "#include <%s>\n" \
% header
return headers
def _substitute_aliases(self, expr):
if expr is None:
......@@ -37,45 +58,198 @@ int PROBENAME(struct pt_regs *ctx SIGNATURE)
expr = expr.replace(alias, subst)
return expr
def _parse_signature(self):
params = map(str.strip, self.signature.split(','))
self.param_types = {}
for param in params:
# If the type is a pointer, the * can be next to the
# param name. Other complex types like arrays are not
# supported right now.
index = param.rfind('*')
index = index if index != -1 else param.rfind(' ')
param_type = param[0:index+1].strip()
param_name = param[index+1:].strip()
self.param_types[param_name] = param_type
entry_probe_text = """
int PROBENAME(struct pt_regs *ctx SIGNATURE)
{
u32 pid = bpf_get_current_pid_tgid();
PID_FILTER
COLLECT
return 0;
}
"""
def _generate_entry(self):
self.entry_probe_func = self.probe_func_name + "_entry"
text = self.entry_probe_text
text = text.replace("PROBENAME", self.entry_probe_func)
text = text.replace("SIGNATURE",
"" if len(self.signature) == 0 else ", " + self.signature)
pid_filter = "" if self.is_user or self.pid is None \
else "if (pid != %d) { return 0; }" % self.pid
text = text.replace("PID_FILTER", pid_filter)
collect = ""
for pname in self.args_to_probe:
param_hash = self.hashname_prefix + pname
if pname == "__latency":
collect += """
u64 __time = bpf_ktime_get_ns();
%s.update(&pid, &__time);
""" % param_hash
else:
collect += "%s.update(&pid, &%s);\n" % \
(param_hash, pname)
text = text.replace("COLLECT", collect)
return text
def _generate_entry_probe(self):
# Any $entry(name) expressions result in saving that argument
# when entering the function.
self.args_to_probe = set()
regex = r"\$entry\((\w+)\)"
for expr in self.exprs:
for arg in re.finditer(regex, expr):
self.args_to_probe.add(arg.group(1))
for arg in re.finditer(regex, self.filter):
self.args_to_probe.add(arg.group(1))
if any(map(lambda expr: "$latency" in expr, self.exprs)) or \
"$latency" in self.filter:
self.args_to_probe.add("__latency")
self.param_types["__latency"] = "u64" # nanoseconds
for pname in self.args_to_probe:
if pname not in self.param_types:
raise ValueError("$entry(%s): no such param" \
% arg)
self.hashname_prefix = "%s_param_" % self.probe_hash_name
text = ""
for pname in self.args_to_probe:
# Each argument is stored in a separate hash that is
# keyed by pid.
text += "BPF_HASH(%s, u32, %s);\n" % \
(self.hashname_prefix + pname,
self.param_types[pname])
text += self._generate_entry()
return text
def _generate_retprobe_prefix(self):
# After we're done here, there are __%s_val variables for each
# argument we needed to probe using $entry(name), and they all
# have values (which isn't necessarily the case if we missed
# the method entry probe).
text = "u32 __pid = bpf_get_current_pid_tgid();\n"
self.param_val_names = {}
for pname in self.args_to_probe:
val_name = "__%s_val" % pname
text += "%s *%s = %s.lookup(&__pid);\n" % \
(self.param_types[pname], val_name,
self.hashname_prefix + pname)
text += "if (%s == 0) { return 0 ; }\n" % val_name
self.param_val_names[pname] = val_name
return text
def _replace_entry_exprs(self):
for pname, vname in self.param_val_names.items():
if pname == "__latency":
entry_expr = "$latency"
val_expr = "(bpf_ktime_get_ns() - *%s)" % vname
else:
entry_expr = "$entry(%s)" % pname
val_expr = "(*%s)" % vname
for i in range(0, len(self.exprs)):
self.exprs[i] = self.exprs[i].replace(
entry_expr, val_expr)
self.filter = self.filter.replace(entry_expr,
val_expr)
def _attach_entry_probe(self):
if self.is_user:
self.bpf.attach_uprobe(name=self.library,
sym=self.function,
fn_name=self.entry_probe_func,
pid=self.pid or -1)
else:
self.bpf.attach_kprobe(event=self.function,
fn_name=self.entry_probe_func)
def _bail(self, error):
raise ValueError("error parsing probe '%s': %s" %
(self.raw_spec, error))
def _validate_specifier(self):
# Everything after '#' is the probe label, ignore it
spec = self.raw_spec.split('#')[0]
parts = spec.strip().split(':')
if len(parts) < 3:
self._bail("at least the probe type, library, and " +
"function signature must be specified")
if len(parts) > 6:
self._bail("extraneous ':'-separated parts detected")
if parts[0] not in ["r", "p"]:
self._bail("probe type must be either 'p' or 'r', " +
"but got '%s'" % parts[0])
if re.match(r"\w+\(.*\)", parts[2]) is None:
self._bail(("function signature '%s' has an invalid " +
"format") % parts[2])
def _parse_expr_types(self, expr_types):
if len(expr_types) == 0:
self._bail("no expr types specified")
self.expr_types = expr_types.split(',')
def _parse_exprs(self, exprs):
if len(exprs) == 0:
self._bail("no exprs specified")
self.exprs = exprs.split(',')
def __init__(self, type, specifier, pid):
self.raw_spec = specifier
spec_and_label = specifier.split(';')
self.raw_spec = specifier
self._validate_specifier()
spec_and_label = specifier.split('#')
self.label = spec_and_label[1] \
if len(spec_and_label) == 2 else None
parts = spec_and_label[0].strip().split(':')
if len(parts) < 3 or len(parts) > 6:
raise ValueError("invalid specifier format")
self.type = type # hist or freq
self.is_ret_probe = parts[0] == "r"
if self.type != "hist" and self.type != "freq":
raise ValueError("unrecognized probe type")
if parts[0] not in ["r", "p"]:
raise ValueError("unrecognized probe type")
self.library = parts[1]
self.is_user = len(self.library) > 0
fparts = parts[2].split('(')
if len(fparts) != 2:
raise ValueError("invalid specifier format")
self.function = fparts[0]
self.signature = fparts[1][:-1]
self.function = fparts[0].strip()
self.signature = fparts[1].strip()[:-1]
self._parse_signature()
# If the user didn't specify an expression to probe, we probe
# the retval in a ret probe, or simply the value "1" otherwise.
self.is_default_expr = len(parts) < 5
if not self.is_default_expr:
self.expr_type = parts[3]
self.expr = parts[4]
self._parse_expr_types(parts[3])
self._parse_exprs(parts[4])
if len(self.exprs) != len(self.expr_types):
self._bail("mismatched # of exprs and types")
if self.type == "hist" and len(self.expr_types) > 1:
self._bail("histograms can only have 1 expr")
else:
if not self.is_ret_probe and self.type == "hist":
raise ValueError("dist probes must have expr")
self.expr_type = \
"u64" if not self.is_ret_probe else "int"
self.expr = "1" if not self.is_ret_probe else "$retval"
self.expr = self.expr.replace("$retval",
"(%s)ctx->ax" % self.expr_type)
self.filter = None if len(parts) != 6 else parts[5]
if self.filter is not None:
self.filter = self.filter.replace("$retval",
"(%s)ctx->ax" % self.expr_type)
self.expr = self._substitute_aliases(self.expr)
self.filter = self._substitute_aliases(self.filter)
self._bail("histograms must have expr")
self.expr_types = \
["u64" if not self.is_ret_probe else "int"]
self.exprs = \
["1" if not self.is_ret_probe else "$retval"]
self.filter = "" if len(parts) != 6 else parts[5]
self._substitute_exprs()
# Do we need to attach an entry probe so that we can collect an
# argument that is required for an exit (return) probe?
def check(expr):
keywords = ["$entry", "$latency"]
return any(map(lambda kw: kw in expr, keywords))
self.entry_probe_required = self.is_ret_probe and \
(any(map(check, self.exprs)) or check(self.filter))
self.pid = pid
self.probe_func_name = "%s_probe%d" % \
(self.function, Specifier.next_probe_index)
......@@ -83,51 +257,107 @@ int PROBENAME(struct pt_regs *ctx SIGNATURE)
(self.function, Specifier.next_probe_index)
Specifier.next_probe_index += 1
def _is_string_probe(self):
return self.expr_type == "char*" or self.expr_type == "char *"
def _substitute_exprs(self):
def repl(expr):
expr = self._substitute_aliases(expr)
return expr.replace("$retval", "ctx->ax")
for i in range(0, len(self.exprs)):
self.exprs[i] = repl(self.exprs[i])
self.filter = repl(self.filter)
def generate_text(self, string_size):
program = self.text.replace("PROBENAME", self.probe_func_name)
signature = "" if len(self.signature) == 0 \
else "," + self.signature
program = program.replace("SIGNATURE", signature)
def _is_string(self, expr_type):
return expr_type == "char*" or expr_type == "char *"
def _generate_hash_field(self, i):
if self._is_string(self.expr_types[i]):
return "struct __string_t v%d;\n" % i
else:
return "%s v%d;\n" % (self.expr_types[i], i)
def _generate_field_assignment(self, i):
if self._is_string(self.expr_types[i]):
return "bpf_probe_read(" + \
"&__key.v%d.s, sizeof(__key.v%d.s), %s);\n" % \
(i, i, self.exprs[i])
else:
return "__key.v%d = %s;\n" % (i, self.exprs[i])
def _generate_hash_decl(self):
if self.type == "hist":
return "BPF_HISTOGRAM(%s, %s);" % \
(self.probe_hash_name, self.expr_types[0])
else:
text = "struct %s_key_t {\n" % self.probe_hash_name
for i in range(0, len(self.expr_types)):
text += self._generate_hash_field(i)
text += "};\n"
text += "BPF_HASH(%s, struct %s_key_t, u64);\n" % \
(self.probe_hash_name, self.probe_hash_name)
return text
def _generate_key_assignment(self):
if self.type == "hist":
return "%s __key = %s;\n" % \
(self.expr_types[0], self.exprs[0])
else:
text = "struct %s_key_t __key = {};\n" % \
self.probe_hash_name
for i in range(0, len(self.exprs)):
text += self._generate_field_assignment(i)
return text
def _generate_hash_update(self):
if self.type == "hist":
return "%s.increment(bpf_log2l(__key));" % \
self.probe_hash_name
else:
return "%s.increment(__key);" % self.probe_hash_name
def _generate_pid_filter(self):
# Kernel probes need to explicitly filter pid, because the
# attach interface doesn't support pid filtering
if self.pid is not None and not self.is_user:
# kernel probes need to explicitly filter pid
program = program.replace("PID_FILTER",
"u32 pid = bpf_get_current_pid_tgid();\n" + \
"if (pid != %d) { return 0; }" % self.pid)
return "u32 pid = bpf_get_current_pid_tgid();\n" + \
"if (pid != %d) { return 0; }" % self.pid
else:
program = program.replace("PID_FILTER", "")
if self._is_string_probe():
decl = """
struct %s_key_t { char key[%d]; };
BPF_HASH(%s, struct %s_key_t, u64);
""" \
% (self.function, string_size,
self.probe_hash_name, self.function)
collect = "%s.increment(__key);" % self.probe_hash_name
key_expr = """
struct %s_key_t __key = {0};
bpf_probe_read(&__key.key, sizeof(__key.key), %s);
""" \
% (self.function, self.expr)
elif self.type == "freq":
decl = "BPF_HASH(%s, %s, u64);" % \
(self.probe_hash_name, self.expr_type)
collect = "%s.increment(__key);" % self.probe_hash_name
key_expr = "%s __key = %s;" % \
(self.expr_type, self.expr)
elif self.type == "hist":
decl = "BPF_HISTOGRAM(%s, %s);" % \
(self.probe_hash_name, self.expr_type)
collect = "%s.increment(bpf_log2l(__key));" % \
self.probe_hash_name
key_expr = "%s __key = %s;" % \
(self.expr_type, self.expr)
return ""
def generate_text(self):
# We don't like tools writing tools (Brendan Gregg), but this
# is an exception because we're letting the user fully
# customize the values we probe. As a rule of thumb though,
# try to build a custom tool for a specific purpose.
program = ""
# If any entry arguments are probed in a ret probe, we need
# to generate an entry probe to collect them
prefix = ""
if self.entry_probe_required:
program = self._generate_entry_probe()
prefix = self._generate_retprobe_prefix()
# Replace $entry(paramname) with a reference to the
# value we collected when entering the function:
self._replace_entry_exprs()
program += self.probe_text.replace("PROBENAME",
self.probe_func_name)
signature = "" if len(self.signature) == 0 \
or self.is_ret_probe \
else ", " + self.signature
program = program.replace("SIGNATURE", signature)
program = program.replace("PID_FILTER",
self._generate_pid_filter())
decl = self._generate_hash_decl()
key_expr = self._generate_key_assignment()
collect = self._generate_hash_update()
program = program.replace("DATA_DECL", decl)
program = program.replace("KEY_EXPR", key_expr)
program = program.replace("FILTER", self.filter or "1")
program = program.replace("FILTER",
"1" if len(self.filter) == 0 else self.filter)
program = program.replace("COLLECT", collect)
program = program.replace("PREFIX", prefix)
return program
def attach(self, bpf):
......@@ -150,81 +380,131 @@ bpf_probe_read(&__key.key, sizeof(__key.key), %s);
else:
bpf.attach_kprobe(event=self.function,
fn_name=self.probe_func_name)
if self.entry_probe_required:
self._attach_entry_probe()
def _v2s(self, v):
# Most fields can be converted with plain str(), but strings
# are wrapped in a __string_t which has an .s field
if "__string_t" in type(v).__name__:
return str(v.s)
return str(v)
def _display_expr(self, i):
# Replace ugly latency calculation with $latency
expr = self.exprs[i].replace(
"(bpf_ktime_get_ns() - *____latency_val)", "$latency")
# Replace alias values back with the alias name
for alias, subst in Specifier.aliases.items():
expr = expr.replace(subst, alias)
# Replace retval expression with $retval
expr = expr.replace("ctx->ax", "$retval")
# Replace ugly (*__param_val) expressions with param name
return re.sub(r"\(\*__(\w+)_val\)", r"\1", expr)
def display(self):
print(self.label or self.raw_spec)
def _display_key(self, key):
if self.is_default_expr:
if not self.is_ret_probe:
return "total calls"
else:
return "retval = %s" % str(key.v0)
else:
# The key object has v0, ..., vk fields containing
# the values of the expressions from self.exprs
def str_i(i):
key_i = self._v2s(getattr(key, "v%d" % i))
return "%s = %s" % \
(self._display_expr(i), key_i)
return ", ".join(map(str_i, range(0, len(self.exprs))))
def display(self, top):
data = self.bpf.get_table(self.probe_hash_name)
if self.type == "freq":
print(self.label or self.raw_spec)
print("\t%-10s %s" % ("COUNT", "EVENT"))
for key, value in sorted(data.items(),
key=lambda kv: kv[1].value):
key_val = key.key if self._is_string_probe() \
else str(key.value)
data = sorted(data.items(), key=lambda kv: kv[1].value)
if top is not None:
data = data[-top:]
for key, value in data:
# Print some nice values if the user didn't
# specify an expression to probe
if self.is_default_expr:
if not self.is_ret_probe:
key_str = "total calls"
else:
key_str = "retval = %s" % \
key_val
self._v2s(key.v0)
else:
key_str = "%s = %s" % \
(self.expr, key_val)
key_str = self._display_key(key)
print("\t%-10s %s" % \
(str(value.value), key_str))
elif self.type == "hist":
label = self.expr if not self.is_default_expr \
else "retval"
label = self.label or (self._display_expr(0)
if not self.is_default_expr else "retval")
data.print_log2_hist(val_type=label)
examples = """
Probe specifier syntax:
{p,r}:[library]:function(signature)[:type:expr[:filter]][;label]
{p,r}:[library]:function(signature)[:type[,type...]:expr[,expr...][:filter]][#label]
Where:
p,r -- probe at function entry or at function exit
in exit probes, only $retval is accessible
in exit probes: can use $retval, $entry(param), $latency
library -- the library that contains the function
(leave empty for kernel functions)
function -- the function name to trace
signature -- the function's parameters, as in the C header
type -- the type of the expression to collect
expr -- the expression to collect
type -- the type of the expression to collect (supports multiple)
expr -- the expression to collect (supports multiple)
filter -- the filter that is applied to collected values
label -- the label for this probe in the resulting output
EXAMPLES:
argdist.py -H 'p::__kmalloc(u64 size):u64:size'
argdist -H 'p::__kmalloc(u64 size):u64:size'
Print a histogram of allocation sizes passed to kmalloc
argdist.py -p 1005 -C 'p:c:malloc(size_t size):size_t:size:size==16'
argdist -p 1005 -C 'p:c:malloc(size_t size):size_t:size:size==16'
Print a frequency count of how many times process 1005 called malloc
with an allocation size of 16 bytes
argdist.py -C 'r:c:gets():char*:$retval;snooped strings'
argdist -C 'r:c:gets():char*:(char*)$retval#snooped strings'
Snoop on all strings returned by gets()
argdist.py -p 1005 -C 'p:c:write(int fd):int:fd'
argdist -H 'r::__kmalloc(size_t size):u64:$latency/$entry(size)#ns per byte'
Print a histogram of nanoseconds per byte from kmalloc allocations
argdist -C 'p::__kmalloc(size_t size, gfp_t flags):size_t:size:flags&GFP_ATOMIC'
Print frequency count of kmalloc allocation sizes that have GFP_ATOMIC
argdist -p 1005 -C 'p:c:write(int fd):int:fd' -T 5
Print frequency counts of how many times writes were issued to a
particular file descriptor number, in process 1005
particular file descriptor number, in process 1005, but only show
the top 5 busiest fds
argdist -p 1005 -H 'r:c:read()'
Print a histogram of results (sizes) returned by read() in process 1005
argdist.py -p 1005 -H 'r:c:read()'
Print a histogram of error codes returned by read() in process 1005
argdist -C 'r::__vfs_read():u32:$PID:$latency > 100000'
Print frequency of reads by process where the latency was >0.1ms
argdist.py -H \\
argdist -H 'r::__vfs_read(void *file, void *buf, size_t count):size_t:$entry(count):$latency > 1000000'
Print a histogram of read sizes that were longer than 1ms
argdist -H \\
'p:c:write(int fd, const void *buf, size_t count):size_t:count:fd==1'
Print a histogram of buffer sizes passed to write() across all
processes, where the file descriptor was 1 (STDOUT)
argdist.py -C 'p:c:fork();fork calls'
argdist -C 'p:c:fork()#fork calls'
Count fork() calls in libc across all processes
Can also use funccount.py, which is easier and more flexible
argdist.py \\
-H 'p:c:sleep(u32 seconds):u32:seconds' \\
-H 'p:c:nanosleep(struct timespec { time_t tv_sec; long tv_nsec; } *req):long:req->tv_nsec'
argdist -H \\
'p:c:sleep(u32 seconds):u32:seconds' \\
'p:c:nanosleep(struct timespec *req):long:req->tv_nsec'
Print histograms of sleep() and nanosleep() parameter values
argdist.py -p 2780 -z 120 \\
argdist -p 2780 -z 120 \\
-C 'p:c:write(int fd, char* buf, size_t len):char*:buf:fd==1'
Spy on writes to STDOUT performed by process 2780, up to a string size
of 120 characters
......@@ -242,12 +522,18 @@ parser.add_argument("-i", "--interval", default=1, type=int,
help="output interval, in seconds")
parser.add_argument("-n", "--number", type=int, dest="count",
help="number of outputs")
parser.add_argument("-v", "--verbose", action="store_true",
help="print resulting BPF program code before executing")
parser.add_argument("-T", "--top", type=int,
help="number of top results to show (not applicable to histograms)")
parser.add_argument("-H", "--histogram", nargs="*", dest="histspecifier",
metavar="specifier",
help="probe specifier to capture histogram of (see examples below)")
parser.add_argument("-C", "--count", nargs="*", dest="countspecifier",
metavar="specifier",
help="probe specifier to capture count of (see examples below)")
parser.add_argument("-v", "--verbose", action="store_true",
help="print resulting BPF program code before executing")
parser.add_argument("-I", "--include", nargs="*", metavar="header",
help="additional header files to include in the BPF program")
args = parser.parse_args()
specifiers = []
......@@ -259,9 +545,16 @@ if len(specifiers) == 0:
print("at least one specifier is required")
exit(1)
bpf_source = "#include <uapi/linux/ptrace.h>\n"
bpf_source = """
struct __string_t { char s[%d]; };
#include <uapi/linux/ptrace.h>
""" % args.string_size
for include in (args.include or []):
bpf_source += "#include <%s>\n" % include
bpf_source += Specifier.generate_auto_includes(map(lambda s: s.raw_spec, specifiers))
for specifier in specifiers:
bpf_source += specifier.generate_text(args.string_size)
bpf_source += specifier.generate_text()
if args.verbose:
print(bpf_source)
......@@ -279,7 +572,7 @@ while True:
exit()
print("[%s]" % strftime("%H:%M:%S"))
for specifier in specifiers:
specifier.display()
specifier.display(args.top)
count_so_far += 1
if args.count is not None and count_so_far >= args.count:
exit()
......@@ -10,7 +10,7 @@ various functions.
For example, suppose you want to find what allocation sizes are common in
your application:
# ./argdist.py -p 2420 -C 'p:c:malloc(size_t size):size_t:size'
# ./argdist -p 2420 -C 'p:c:malloc(size_t size):size_t:size'
[01:42:29]
p:c:malloc(size_t size):size_t:size
COUNT EVENT
......@@ -43,7 +43,7 @@ probed and its value was 16, repeatedly.
Now, suppose you wanted a histogram of buffer sizes passed to the write()
function across the system:
# ./argdist.py -H 'p:c:write(int fd, void *buf, size_t len):size_t:len'
# ./argdist -H 'p:c:write(int fd, void *buf, size_t len):size_t:len'
[01:45:22]
p:c:write(int fd, void *buf, size_t len):size_t:len
len : count distribution
......@@ -81,7 +81,7 @@ bytes, medium writes of 32-63 bytes, and larger writes of 64-127 bytes.
But these are writes across the board -- what if you wanted to focus on writes
to STDOUT?
# ./argdist.py -H 'p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1'
# ./argdist -H 'p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1'
[01:47:17]
p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1
len : count distribution
......@@ -120,7 +120,7 @@ You can also use argdist to trace kernel functions. For example, suppose you
wanted a histogram of kernel allocation (kmalloc) sizes across the system,
printed twice with 3 second intervals:
# ./argdist.py -i 3 -n 2 -H 'p::__kmalloc(size_t size):size_t:size'
# ./argdist -i 3 -n 2 -H 'p::__kmalloc(size_t size):size_t:size'
[01:50:00]
p::__kmalloc(size_t size):size_t:size
size : count distribution
......@@ -143,7 +143,7 @@ p::__kmalloc(size_t size):size_t:size
Occasionally, numeric information isn't enough and you want to capture strings.
What are the strings printed by puts() across the system?
# ./argdist.py -i 10 -n 1 -C 'p:c:puts(char *str):char*:str'
# ./argdist -i 10 -n 1 -C 'p:c:puts(char *str):char*:str'
[01:53:54]
p:c:puts(char *str):char*:str
COUNT EVENT
......@@ -156,17 +156,17 @@ What about reads? You could trace gets() across the system and print the
strings input by the user (note how "r" is used instead of "p" to attach a
probe to the function's return):
# ./argdist.py -i 10 -n 1 -C 'r:c:gets():char*:$retval:$retval!=0'
# ./argdist -i 10 -n 1 -C 'r:c:gets():char*:(char*)$retval:$retval!=0'
[02:12:23]
r:c:gets():char*:$retval:$retval!=0
COUNT EVENT
1 (char*)ctx->ax = hi there
3 (char*)ctx->ax = sasha
8 (char*)ctx->ax = hello
1 (char*)$retval = hi there
3 (char*)$retval = sasha
8 (char*)$retval = hello
Similarly, we could get a histogram of the error codes returned by read():
# ./argdist.py -i 10 -c 1 -H 'r:c:read()'
# ./argdist -i 10 -c 1 -H 'r:c:read()'
[02:15:36]
r:c:read()
retval : count distribution
......@@ -183,32 +183,109 @@ r:c:read()
1024 -> 2047 : 0 | |
2048 -> 4095 : 2 |** |
In return probes, you can also trace the latency of the function (unless it is
recursive) and the parameters it had on entry. For example, we can identify
which processes are performing slow synchronous filesystem reads -- say,
longer than 0.1ms (100,000ns):
# ./argdist -C 'r::__vfs_read():u32:$PID:$latency > 100000'
[01:08:48]
r::__vfs_read():u32:$PID:$latency > 100000
COUNT EVENT
1 $PID = 10457
21 $PID = 2780
[01:08:49]
r::__vfs_read():u32:$PID:$latency > 100000
COUNT EVENT
1 $PID = 10457
21 $PID = 2780
^C
It looks like process 2780 performed 21 slow reads.
Occasionally, entry parameter values are also interesting. For example, you
might be curious how long it takes malloc() to allocate memory -- nanoseconds
per byte allocated. Let's go:
# ./argdist -H 'r:c:malloc(size_t size):u64:$latency/$entry(size);ns per byte' -n 1 -i 10
[01:11:13]
ns per byte : count distribution
0 -> 1 : 0 | |
2 -> 3 : 4 |***************** |
4 -> 7 : 3 |************* |
8 -> 15 : 2 |******** |
16 -> 31 : 1 |**** |
32 -> 63 : 0 | |
64 -> 127 : 7 |******************************* |
128 -> 255 : 1 |**** |
256 -> 511 : 0 | |
512 -> 1023 : 1 |**** |
1024 -> 2047 : 1 |**** |
2048 -> 4095 : 9 |****************************************|
4096 -> 8191 : 1 |**** |
It looks like a tri-modal distribution. Some allocations are extremely cheap,
and take 2-15 nanoseconds per byte. Other allocations are slower, and take
64-127 nanoseconds per byte. And some allocations are slower still, and take
multiple microseconds per byte.
You could also group results by more than one field. For example, __kmalloc
takes an additional flags parameter that describes how to allocate memory:
# ./argdist -C 'p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size'
[03:42:29]
p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size
COUNT EVENT
1 flags = 16, size = 152
2 flags = 131280, size = 8
7 flags = 131280, size = 16
[03:42:30]
p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size
COUNT EVENT
1 flags = 16, size = 152
6 flags = 131280, size = 8
19 flags = 131280, size = 16
[03:42:31]
p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size
COUNT EVENT
2 flags = 16, size = 152
10 flags = 131280, size = 8
31 flags = 131280, size = 16
[03:42:32]
p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size
COUNT EVENT
2 flags = 16, size = 152
14 flags = 131280, size = 8
43 flags = 131280, size = 16
^C
The flags value must be expanded by hand, but it's still helpful to eliminate
certain kinds of allocations or visually group them together.
Here's a final example that finds how many write() system calls are performed
by each process on the system:
# argdist.py -C 'p:c:write():int:$PID;write per process' -n 2
# argdist -C 'p:c:write():int:$PID;write per process' -n 2
[06:47:18]
write by process
COUNT EVENT
3 bpf_get_current_pid_tgid() = 8889
7 bpf_get_current_pid_tgid() = 7615
7 bpf_get_current_pid_tgid() = 2480
3 $PID = 8889
7 $PID = 7615
7 $PID = 2480
[06:47:19]
write by process
COUNT EVENT
9 bpf_get_current_pid_tgid() = 8889
23 bpf_get_current_pid_tgid() = 7615
23 bpf_get_current_pid_tgid() = 2480
As you see, the $PID alias is expanded to the BPF function bpf_get_current_pid_tgid(),
which returns the current process' pid.
9 $PID = 8889
23 $PID = 7615
23 $PID = 2480
USAGE message:
usage: argdist.py [-h] [-p PID] [-z STRING_SIZE] [-i INTERVAL] [-n COUNT]
[-H [HISTSPECIFIER [HISTSPECIFIER ...]]]
[-C [COUNTSPECIFIER [COUNTSPECIFIER ...]]] [-v]
# argdist -h
usage: argdist [-h] [-p PID] [-z STRING_SIZE] [-i INTERVAL] [-n COUNT] [-v]
[-T TOP] [-H [specifier [specifier ...]]]
[-C [specifier [specifier ...]]] [-I [header [header ...]]]
Trace a function and display a summary of its parameter values.
......@@ -221,63 +298,79 @@ optional arguments:
output interval, in seconds
-n COUNT, --number COUNT
number of outputs
-H [HISTSPECIFIER [HISTSPECIFIER ...]], --histogram [HISTSPECIFIER [HISTSPECIFIER ...]]
-v, --verbose print resulting BPF program code before executing
-T TOP, --top TOP number of top results to show (not applicable to
histograms)
-H [specifier [specifier ...]], --histogram [specifier [specifier ...]]
probe specifier to capture histogram of (see examples
below)
-C [COUNTSPECIFIER [COUNTSPECIFIER ...]], --count [COUNTSPECIFIER [COUNTSPECIFIER ...]]
-C [specifier [specifier ...]], --count [specifier [specifier ...]]
probe specifier to capture count of (see examples
below)
-v, --verbose print resulting BPF program code before executing
-I [header [header ...]], --include [header [header ...]]
additional header files to include in the BPF program
Probe specifier syntax:
{p,r}:[library]:function(signature)[:type:expr[:filter]][;label]
{p,r}:[library]:function(signature)[:type[,type...]:expr[,expr...][:filter]][#label]
Where:
p,r -- probe at function entry or at function exit
in exit probes, only $retval is accessible
in exit probes: can use $retval, $entry(param), $latency
library -- the library that contains the function
(leave empty for kernel functions)
function -- the function name to trace
signature -- the function's parameters, as in the C header
type -- the type of the expression to collect
expr -- the expression to collect
type -- the type of the expression to collect (supports multiple)
expr -- the expression to collect (supports multiple)
filter -- the filter that is applied to collected values
label -- the label for this probe in the resulting output
EXAMPLES:
argdist.py -H 'p::__kmalloc(u64 size):u64:size'
argdist -H 'p::__kmalloc(u64 size):u64:size'
Print a histogram of allocation sizes passed to kmalloc
argdist.py -p 1005 -C 'p:c:malloc(size_t size):size_t:size:size==16'
argdist -p 1005 -C 'p:c:malloc(size_t size):size_t:size:size==16'
Print a frequency count of how many times process 1005 called malloc
with an allocation size of 16 bytes
argdist.py -C 'r:c:gets():char*:$retval;snooped strings'
argdist -C 'r:c:gets():char*:$retval#snooped strings'
Snoop on all strings returned by gets()
argdist.py -p 1005 -C 'p:c:write(int fd):int:fd'
argdist -H 'r::__kmalloc(size_t size):u64:$latency/$entry(size)#ns per byte'
Print a histogram of nanoseconds per byte from kmalloc allocations
argdist -C 'p::__kmalloc(size_t size, gfp_t flags):size_t:size:flags&GFP_ATOMIC'
Print frequency count of kmalloc allocation sizes that have GFP_ATOMIC
argdist -p 1005 -C 'p:c:write(int fd):int:fd' -T 5
Print frequency counts of how many times writes were issued to a
particular file descriptor number, in process 1005
particular file descriptor number, in process 1005, but only show
the top 5 busiest fds
argdist.py -p 1005 -H 'r:c:read()'
argdist -p 1005 -H 'r:c:read()'
Print a histogram of error codes returned by read() in process 1005
argdist.py -H \
argdist -C 'r::__vfs_read():u32:$PID:$latency > 100000'
Print frequency of reads by process where the latency was >0.1ms
argdist -H 'r::__vfs_read(void *file, void *buf, size_t count):size_t:$entry(count):$latency > 1000000'
Print a histogram of read sizes that were longer than 1ms
argdist -H \
'p:c:write(int fd, const void *buf, size_t count):size_t:count:fd==1'
Print a histogram of buffer sizes passed to write() across all
processes, where the file descriptor was 1 (STDOUT)
argdist.py -C 'p:c:fork();fork calls'
argdist -C 'p:c:fork()#fork calls'
Count fork() calls in libc across all processes
Can also use funccount.py, which is easier and more flexible
Can also use funccount.py, which is easier and more flexible
argdist.py \
-H 'p:c:sleep(u32 seconds):u32:seconds' \
-H 'p:c:nanosleep(struct timespec { time_t tv_sec; long tv_nsec; } *req):long:req->tv_nsec'
argdist -H \
'p:c:sleep(u32 seconds):u32:seconds' \
'p:c:nanosleep(struct timespec *req):long:req->tv_nsec'
Print histograms of sleep() and nanosleep() parameter values
argdist.py -p 2780 -z 120 \
argdist -p 2780 -z 120 \
-C 'p:c:write(int fd, char* buf, size_t len):char*:buf:fd==1'
Spy on writes to STDOUT performed by process 2780, up to a string size
of 120 characters
of 120 characters
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment