Skip to content
Projects
Groups
Snippets
Help
Loading...
Help
Support
Keyboard shortcuts
?
Submit feedback
Contribute to GitLab
Sign in / Register
Toggle navigation
B
bcc
Project overview
Project overview
Details
Activity
Releases
Repository
Repository
Files
Commits
Branches
Tags
Contributors
Graph
Compare
Issues
0
Issues
0
List
Boards
Labels
Milestones
Merge Requests
0
Merge Requests
0
Analytics
Analytics
Repository
Value Stream
Wiki
Wiki
Snippets
Snippets
Members
Members
Collapse sidebar
Close sidebar
Activity
Graph
Create a new issue
Commits
Issue Boards
Open sidebar
Kirill Smelkov
bcc
Commits
61f5535f
Commit
61f5535f
authored
Feb 14, 2016
by
4ast
Browse files
Options
Browse Files
Download
Plain Diff
Merge pull request #377 from goldshtn/argdist-tuples
Enhancements to argdist
parents
e01b6544
7df65da7
Changes
3
Show whitespace changes
Inline
Side-by-side
Showing
3 changed files
with
584 additions
and
169 deletions
+584
-169
man/man8/argdist.8
man/man8/argdist.8
+50
-21
tools/argdist.py
tools/argdist.py
+396
-103
tools/argdist_examples.txt
tools/argdist_examples.txt
+138
-45
No files found.
man/man8/argdist.8
View file @
61f5535f
...
...
@@ -2,14 +2,14 @@
.SH NAME
argdist \- Trace a function and display a histogram or frequency count of its parameter values. Uses Linux eBPF/bcc.
.SH SYNOPSIS
.B argdist [-h] [-p PID] [-z STRING_SIZE] [-i INTERVAL] [-n COUNT] [-
H specifier [specifier ...]] [-C specifier [specifi
er ...]]
.B argdist [-h] [-p PID] [-z STRING_SIZE] [-i INTERVAL] [-n COUNT] [-
v] [-T TOP] [-H specifier [specifier ...]] [-C specifier [specifier ...]] [-I header [head
er ...]]
.SH DESCRIPTION
argdist attaches to function entry and exit points, collects specified parameter
values, and stores them in a histogram or a frequency collection that counts
the number of times a parameter value occurred. It can also filter parameter
values and instrument multiple entry points at once.
This currently only works on x86_64. Check for future versions
.
Since this uses BPF, only the root user can use this tool
.
.SH REQUIREMENTS
CONFIG_BPF and bcc.
.SH OPTIONS
...
...
@@ -24,20 +24,33 @@ Trace only functions in the process PID.
When collecting string arguments (of type char*), collect up to STRING_SIZE
characters. Longer strings will be truncated.
.TP
-i INTERVAL
\
-i INTERVAL
Print the collected data every INTERVAL seconds. The default is 1 second.
.TP
-n NUMBER
\
-n NUMBER
Print the collected data COUNT times and then exit.
.TP
-H SPECIFIER, -C SPECIFIER
\-v
Display the generated BPF program, for debugging purposes.
.TP
\-T TOP
When collecting frequency counts, display only the top TOP entries.
.TP
\-H SPECIFIER, \-C SPECIFIER
One or more probe specifications that instruct argdist which functions to
probe, which parameters to collect, how to aggregate them, and whether to perform
any filtering. See SPECIFIER SYNTAX below.
.TP
\-I HEADER
One or more header files that should be included in the BPF program. This
enables the use of structure definitions, enumerations, and constants that
are available in these headers. You should provide the same path you would
include in the BPF program, e.g. 'linux/blkdev.h' or 'linux/time.h'. Note: in
many cases, argdist will deduce the necessary header files automatically.
.SH SPECIFIER SYNTAX
The general specifier syntax is as follows:
.B {p,r}:[library]:function(signature)[:type
:expr[:filter]][;
label]
.B {p,r}:[library]:function(signature)[:type
[,type...]:expr[,expr...][:filter]][#
label]
.TP
.B {p,r}
Probe type \- "p" for function entry, "r" for function return;
...
...
@@ -47,8 +60,7 @@ count information, or aggregate the collected values into a histogram. Counting
probes will collect the number of times every parameter value was observed,
whereas histogram probes will collect the parameter values into a histogram.
Only integral types can be used with histogram probes; there is no such limitation
for counting probes. Return probes can only use the pseudo-variable $retval, which
is an alias for the function's return value.
for counting probes.
.TP
.B [library]
Library containing the probe.
...
...
@@ -63,21 +75,30 @@ on the other hand, is only required if you plan to collect parameter values
based on that signature. For example, if you only want to collect the first
parameter, you don't have to specify the rest of the parameters in the signature.
.TP
.B [type]
The type
of the expression
to capture.
.B [type
[,type...]
]
The type
(s) of the expression(s)
to capture.
This is the type of the keys in the histogram or raw event collection that are
collected by the probes.
.TP
.B [expr]
The expression to capture.
.B [expr
[,expr...]
]
The expression
(s)
to capture.
These are the values that are assigned to the histogram or raw event collection.
You may use the parameters directly, or valid C expressions that involve the
parameters, such as "size % 10".
Return probes can use the argument values received by the
function when it was entered, through the $entry(paramname) special variable.
Return probes can also access the function's return value in $retval, and the
function's execution time in nanoseconds in $latency. Note that adding the
$latency or $entry(paramname) variables to the expression will introduce an
additional probe at the function's entry to collect this data, and therefore
introduce additional overhead.
.TP
.B [filter]
The filter applied to the captured data.
Only parameter values that pass the filter will be collected. This is any valid
C expression that refers to the parameter values, such as "fd == 1 && length > 16".
The $entry, $retval, and $latency variables can be used here as well, in return
probes.
.TP
.B [label]
The label that will be displayed when printing the probed values. By default,
...
...
@@ -86,39 +107,47 @@ this is the probe specifier.
.TP
Print a histogram of allocation sizes passed to kmalloc:
#
.B argdist
.py
-H 'p::__kmalloc(u64 size):u64:size'
.B argdist -H 'p::__kmalloc(u64 size):u64:size'
.TP
Print a count of how many times process 1005 called malloc with an allocation size of 16 bytes:
#
.B argdist
.py
-p 1005 -C 'p:c:malloc(size_t size):size_t:size:size==16'
.B argdist -p 1005 -C 'p:c:malloc(size_t size):size_t:size:size==16'
.TP
Snoop on all strings returned by gets():
#
.B argdist.py -C 'r:c:gets():char*:$retval'
.B argdist -C 'r:c:gets():char*:$retval'
.TP
Print a histogram of read sizes that were longer than 1ms:
#
.B argdist -H 'r::__vfs_read(void *file, void *buf, size_t count):size_t:$entry(count):$latency > 1000000'
.TP
Print frequency counts of how many times writes were issued to a particular file descriptor number, in process 1005:
#
.B argdist
.py
-p 1005 -C 'p:c:write(int fd):int:fd'
.B argdist -p 1005 -C 'p:c:write(int fd):int:fd'
.TP
Print a histogram of error codes returned by read() in process 1005:
#
.B argdist
.py
-p 1005 -H 'r:c:read()'
.B argdist -p 1005 -H 'r:c:read()'
.TP
Print a histogram of buffer sizes passed to write() across all processes, where the file descriptor was 1 (STDOUT):
#
.B argdist
.py
-H 'p:c:write(int fd, const void *buf, size_t count):size_t:count:fd==1'
.B argdist -H 'p:c:write(int fd, const void *buf, size_t count):size_t:count:fd==1'
.TP
Count fork() calls in libc across all processes, grouped by pid:
#
.B argdist
.py
-C 'p:c:fork():int:$PID;fork per process'
.B argdist -C 'p:c:fork():int:$PID;fork per process'
.TP
Print histograms of sleep() and nanosleep() parameter values:
#
.B argdist
.py -H 'p:c:sleep(u32 seconds):u32:seconds' -H 'p:c:nanosleep(struct timespec { time_t tv_sec; long tv_nsec; }
*req):long:req->tv_nsec'
.B argdist
-H 'p:c:sleep(u32 seconds):u32:seconds' 'p:c:nanosleep(struct timespec
*req):long:req->tv_nsec'
.TP
Spy on writes to STDOUT performed by process 2780, up to a string size of 120 characters:
#
.B argdist.py -p 2780 -z 120 -C 'p:c:write(int fd, char* buf, size_t len):char*:buf:fd==1'
.B argdist -p 2780 -z 120 -C 'p:c:write(int fd, char* buf, size_t len):char*:buf:fd==1'
.TP
Group files being read from and the read sizes from __vfs_read:
#
.B argdist -C 'p::__vfs_read(struct file *file, void *buf, size_t count):char*,size_t:file->f_path.dentry->d_iname,count:file->f_path.dentry->d_iname[0]!=0'
.SH SOURCE
This is from bcc.
.IP
...
...
tools/argdist.py
View file @
61f5535f
#!/usr/bin/env python
#
# argdist
.py
Trace a function and display a distribution of its
# argdist Trace a function and display a distribution of its
# parameter values as a histogram or frequency count.
#
# USAGE: argdist.py [-h] [-p PID] [-z STRING_SIZE] [-i INTERVAL]
# [-n COUNT] [-C specifier [specifier ...]]
# USAGE: argdist [-h] [-p PID] [-z STRING_SIZE] [-i INTERVAL]
# [-n COUNT] [-v] [-T TOP]
# [-C specifier [specifier ...]]
# [-H specifier [specifier ...]]
# [-I header [header ...]]
#
# Licensed under the Apache License, Version 2.0 (the "License")
# Copyright (C) 2016 Sasha Goldshtein.
...
...
@@ -13,22 +15,41 @@
from
bcc
import
BPF
from
time
import
sleep
,
strftime
import
argparse
import
re
class
Specifier
(
object
):
text
=
"""
probe_
text
=
"""
DATA_DECL
int PROBENAME(struct pt_regs *ctx SIGNATURE)
{
PREFIX
PID_FILTER
KEY_EXPR
if (!(FILTER)) return 0;
KEY_EXPR
COLLECT
return 0;
}
"""
next_probe_index
=
0
aliases
=
{
"$PID"
:
"bpf_get_current_pid_tgid()"
}
auto_includes
=
{
"linux/time.h"
:
[
"time"
],
"linux/fs.h"
:
[
"fs"
,
"file"
],
"linux/blkdev.h"
:
[
"bio"
,
"request"
],
"linux/slab.h"
:
[
"alloc"
]
}
@
staticmethod
def
generate_auto_includes
(
specifiers
):
headers
=
""
for
header
,
keywords
in
Specifier
.
auto_includes
.
items
():
for
keyword
in
keywords
:
for
specifier
in
specifiers
:
if
keyword
in
specifier
:
headers
+=
"#include <%s>
\
n
"
\
%
header
return
headers
def
_substitute_aliases
(
self
,
expr
):
if
expr
is
None
:
...
...
@@ -37,45 +58,198 @@ int PROBENAME(struct pt_regs *ctx SIGNATURE)
expr
=
expr
.
replace
(
alias
,
subst
)
return
expr
def
_parse_signature
(
self
):
params
=
map
(
str
.
strip
,
self
.
signature
.
split
(
','
))
self
.
param_types
=
{}
for
param
in
params
:
# If the type is a pointer, the * can be next to the
# param name. Other complex types like arrays are not
# supported right now.
index
=
param
.
rfind
(
'*'
)
index
=
index
if
index
!=
-
1
else
param
.
rfind
(
' '
)
param_type
=
param
[
0
:
index
+
1
].
strip
()
param_name
=
param
[
index
+
1
:].
strip
()
self
.
param_types
[
param_name
]
=
param_type
entry_probe_text
=
"""
int PROBENAME(struct pt_regs *ctx SIGNATURE)
{
u32 pid = bpf_get_current_pid_tgid();
PID_FILTER
COLLECT
return 0;
}
"""
def
_generate_entry
(
self
):
self
.
entry_probe_func
=
self
.
probe_func_name
+
"_entry"
text
=
self
.
entry_probe_text
text
=
text
.
replace
(
"PROBENAME"
,
self
.
entry_probe_func
)
text
=
text
.
replace
(
"SIGNATURE"
,
""
if
len
(
self
.
signature
)
==
0
else
", "
+
self
.
signature
)
pid_filter
=
""
if
self
.
is_user
or
self
.
pid
is
None
\
else
"if (pid != %d) { return 0; }"
%
self
.
pid
text
=
text
.
replace
(
"PID_FILTER"
,
pid_filter
)
collect
=
""
for
pname
in
self
.
args_to_probe
:
param_hash
=
self
.
hashname_prefix
+
pname
if
pname
==
"__latency"
:
collect
+=
"""
u64 __time = bpf_ktime_get_ns();
%s.update(&pid, &__time);
"""
%
param_hash
else
:
collect
+=
"%s.update(&pid, &%s);
\
n
"
%
\
(
param_hash
,
pname
)
text
=
text
.
replace
(
"COLLECT"
,
collect
)
return
text
def
_generate_entry_probe
(
self
):
# Any $entry(name) expressions result in saving that argument
# when entering the function.
self
.
args_to_probe
=
set
()
regex
=
r"\
$e
ntry\
((
\w+)\
)
"
for expr in self.exprs:
for arg in re.finditer(regex, expr):
self.args_to_probe.add(arg.group(1))
for arg in re.finditer(regex, self.filter):
self.args_to_probe.add(arg.group(1))
if any(map(lambda expr: "
$
latency
" in expr, self.exprs)) or
\
"
$
latency
" in self.filter:
self.args_to_probe.add("
__latency
")
self.param_types["
__latency
"] = "
u64
" # nanoseconds
for pname in self.args_to_probe:
if pname not in self.param_types:
raise ValueError("
$
entry
(
%
s
):
no
such
param
"
\
% arg)
self.hashname_prefix = "
%
s_param_
" % self.probe_hash_name
text = ""
for pname in self.args_to_probe:
# Each argument is stored in a separate hash that is
# keyed by pid.
text += "
BPF_HASH
(
%
s
,
u32
,
%
s
);
\
n
" %
\
(self.hashname_prefix + pname,
self.param_types[pname])
text += self._generate_entry()
return text
def _generate_retprobe_prefix(self):
# After we're done here, there are __%s_val variables for each
# argument we needed to probe using $entry(name), and they all
# have values (which isn't necessarily the case if we missed
# the method entry probe).
text = "
u32
__pid
=
bpf_get_current_pid_tgid
();
\
n
"
self.param_val_names = {}
for pname in self.args_to_probe:
val_name = "
__
%
s_val
" % pname
text += "
%
s
*%
s
=
%
s
.
lookup
(
&
__pid
);
\
n
" %
\
(self.param_types[pname], val_name,
self.hashname_prefix + pname)
text += "
if
(
%
s
==
0
)
{
return
0
;
}
\
n
" % val_name
self.param_val_names[pname] = val_name
return text
def _replace_entry_exprs(self):
for pname, vname in self.param_val_names.items():
if pname == "
__latency
":
entry_expr = "
$
latency
"
val_expr = "
(
bpf_ktime_get_ns
()
-
*%
s
)
" % vname
else:
entry_expr = "
$
entry
(
%
s
)
" % pname
val_expr = "
(
*%
s
)
" % vname
for i in range(0, len(self.exprs)):
self.exprs[i] = self.exprs[i].replace(
entry_expr, val_expr)
self.filter = self.filter.replace(entry_expr,
val_expr)
def _attach_entry_probe(self):
if self.is_user:
self.bpf.attach_uprobe(name=self.library,
sym=self.function,
fn_name=self.entry_probe_func,
pid=self.pid or -1)
else:
self.bpf.attach_kprobe(event=self.function,
fn_name=self.entry_probe_func)
def _bail(self, error):
raise ValueError("
error
parsing
probe
'%s'
:
%
s
" %
(self.raw_spec, error))
def _validate_specifier(self):
# Everything after '#' is the probe label, ignore it
spec = self.raw_spec.split('#')[0]
parts = spec.strip().split(':')
if len(parts) < 3:
self._bail("
at
least
the
probe
type
,
library
,
and
" +
"
function
signature
must
be
specified
")
if len(parts) > 6:
self._bail("
extraneous
':'
-
separated
parts
detected
")
if parts[0] not in ["r", "
p
"]:
self._bail("
probe
type
must
be
either
'p'
or
'r'
,
" +
"
but
got
'%s'" % parts[0])
if re.match(r"
\
w
+
\
(.
*
\
)
", parts[2]) is None:
self._bail(("
function
signature
'%s'
has
an
invalid
" +
"
format
") % parts[2])
def _parse_expr_types(self, expr_types):
if len(expr_types) == 0:
self._bail("
no
expr
types
specified
")
self.expr_types = expr_types.split(',')
def _parse_exprs(self, exprs):
if len(exprs) == 0:
self._bail("
no
exprs
specified
")
self.exprs = exprs.split(',')
def __init__(self, type, specifier, pid):
self.raw_spec = specifier
spec_and_label
=
specifier
.
split
(
';'
)
self._validate_specifier()
spec_and_label = specifier.split('#')
self.label = spec_and_label[1]
\
if len(spec_and_label) == 2 else None
parts = spec_and_label[0].strip().split(':')
if
len
(
parts
)
<
3
or
len
(
parts
)
>
6
:
raise
ValueError
(
"invalid specifier format"
)
self.type = type # hist or freq
self.is_ret_probe = parts[0] == "r"
if
self
.
type
!=
"hist"
and
self
.
type
!=
"freq"
:
raise
ValueError
(
"unrecognized probe type"
)
if
parts
[
0
]
not
in
[
"r"
,
"p"
]:
raise
ValueError
(
"unrecognized probe type"
)
self.library = parts[1]
self.is_user = len(self.library) > 0
fparts = parts[2].split('(')
if
len
(
fparts
)
!=
2
:
raise
ValueError
(
"invalid specifier format"
)
self
.
function
=
fparts
[
0
]
self
.
signature
=
fparts
[
1
][:
-
1
]
self.function = fparts[0].strip()
self.signature = fparts[1].strip()[:-1]
self._parse_signature()
# If the user didn't specify an expression to probe, we probe
# the retval in a ret probe, or simply the value "
1
" otherwise.
self.is_default_expr = len(parts) < 5
if not self.is_default_expr:
self
.
expr_type
=
parts
[
3
]
self
.
expr
=
parts
[
4
]
self._parse_expr_types(parts[3])
self._parse_exprs(parts[4])
if len(self.exprs) != len(self.expr_types):
self._bail("
mismatched
# of exprs and types")
if
self
.
type
==
"hist"
and
len
(
self
.
expr_types
)
>
1
:
self
.
_bail
(
"histograms can only have 1 expr"
)
else
:
if
not
self
.
is_ret_probe
and
self
.
type
==
"hist"
:
raise
ValueError
(
"dist probes must have expr"
)
self
.
expr_type
=
\
"u64"
if
not
self
.
is_ret_probe
else
"int"
self
.
expr
=
"1"
if
not
self
.
is_ret_probe
else
"$retval"
self
.
expr
=
self
.
expr
.
replace
(
"$retval"
,
"(%s)ctx->ax"
%
self
.
expr_type
)
self
.
filter
=
None
if
len
(
parts
)
!=
6
else
parts
[
5
]
if
self
.
filter
is
not
None
:
self
.
filter
=
self
.
filter
.
replace
(
"$retval"
,
"(%s)ctx->ax"
%
self
.
expr_type
)
self
.
expr
=
self
.
_substitute_aliases
(
self
.
expr
)
self
.
filter
=
self
.
_substitute_aliases
(
self
.
filter
)
self
.
_bail
(
"histograms must have expr"
)
self
.
expr_types
=
\
[
"u64"
if
not
self
.
is_ret_probe
else
"int"
]
self
.
exprs
=
\
[
"1"
if
not
self
.
is_ret_probe
else
"$retval"
]
self
.
filter
=
""
if
len
(
parts
)
!=
6
else
parts
[
5
]
self
.
_substitute_exprs
()
# Do we need to attach an entry probe so that we can collect an
# argument that is required for an exit (return) probe?
def
check
(
expr
):
keywords
=
[
"$entry"
,
"$latency"
]
return
any
(
map
(
lambda
kw
:
kw
in
expr
,
keywords
))
self
.
entry_probe_required
=
self
.
is_ret_probe
and
\
(
any
(
map
(
check
,
self
.
exprs
))
or
check
(
self
.
filter
))
self
.
pid
=
pid
self
.
probe_func_name
=
"%s_probe%d"
%
\
(
self
.
function
,
Specifier
.
next_probe_index
)
...
...
@@ -83,51 +257,107 @@ int PROBENAME(struct pt_regs *ctx SIGNATURE)
(
self
.
function
,
Specifier
.
next_probe_index
)
Specifier
.
next_probe_index
+=
1
def
_is_string_probe
(
self
):
return
self
.
expr_type
==
"char*"
or
self
.
expr_type
==
"char *"
def
_substitute_exprs
(
self
):
def
repl
(
expr
):
expr
=
self
.
_substitute_aliases
(
expr
)
return
expr
.
replace
(
"$retval"
,
"ctx->ax"
)
for
i
in
range
(
0
,
len
(
self
.
exprs
)):
self
.
exprs
[
i
]
=
repl
(
self
.
exprs
[
i
])
self
.
filter
=
repl
(
self
.
filter
)
def
_is_string
(
self
,
expr_type
):
return
expr_type
==
"char*"
or
expr_type
==
"char *"
def
generate_text
(
self
,
string_size
):
program
=
self
.
text
.
replace
(
"PROBENAME"
,
self
.
probe_func_name
)
def
_generate_hash_field
(
self
,
i
):
if
self
.
_is_string
(
self
.
expr_types
[
i
]):
return
"struct __string_t v%d;
\
n
"
%
i
else
:
return
"%s v%d;
\
n
"
%
(
self
.
expr_types
[
i
],
i
)
def
_generate_field_assignment
(
self
,
i
):
if
self
.
_is_string
(
self
.
expr_types
[
i
]):
return
"bpf_probe_read("
+
\
"&__key.v%d.s, sizeof(__key.v%d.s), %s);
\
n
"
%
\
(
i
,
i
,
self
.
exprs
[
i
])
else
:
return
"__key.v%d = %s;
\
n
"
%
(
i
,
self
.
exprs
[
i
])
def
_generate_hash_decl
(
self
):
if
self
.
type
==
"hist"
:
return
"BPF_HISTOGRAM(%s, %s);"
%
\
(
self
.
probe_hash_name
,
self
.
expr_types
[
0
])
else
:
text
=
"struct %s_key_t {
\
n
"
%
self
.
probe_hash_name
for
i
in
range
(
0
,
len
(
self
.
expr_types
)):
text
+=
self
.
_generate_hash_field
(
i
)
text
+=
"};
\
n
"
text
+=
"BPF_HASH(%s, struct %s_key_t, u64);
\
n
"
%
\
(
self
.
probe_hash_name
,
self
.
probe_hash_name
)
return
text
def
_generate_key_assignment
(
self
):
if
self
.
type
==
"hist"
:
return
"%s __key = %s;
\
n
"
%
\
(
self
.
expr_types
[
0
],
self
.
exprs
[
0
])
else
:
text
=
"struct %s_key_t __key = {};
\
n
"
%
\
self
.
probe_hash_name
for
i
in
range
(
0
,
len
(
self
.
exprs
)):
text
+=
self
.
_generate_field_assignment
(
i
)
return
text
def
_generate_hash_update
(
self
):
if
self
.
type
==
"hist"
:
return
"%s.increment(bpf_log2l(__key));"
%
\
self
.
probe_hash_name
else
:
return
"%s.increment(__key);"
%
self
.
probe_hash_name
def
_generate_pid_filter
(
self
):
# Kernel probes need to explicitly filter pid, because the
# attach interface doesn't support pid filtering
if
self
.
pid
is
not
None
and
not
self
.
is_user
:
return
"u32 pid = bpf_get_current_pid_tgid();
\
n
"
+
\
"if (pid != %d) { return 0; }"
%
self
.
pid
else
:
return
""
def
generate_text
(
self
):
# We don't like tools writing tools (Brendan Gregg), but this
# is an exception because we're letting the user fully
# customize the values we probe. As a rule of thumb though,
# try to build a custom tool for a specific purpose.
program
=
""
# If any entry arguments are probed in a ret probe, we need
# to generate an entry probe to collect them
prefix
=
""
if
self
.
entry_probe_required
:
program
=
self
.
_generate_entry_probe
()
prefix
=
self
.
_generate_retprobe_prefix
()
# Replace $entry(paramname) with a reference to the
# value we collected when entering the function:
self
.
_replace_entry_exprs
()
program
+=
self
.
probe_text
.
replace
(
"PROBENAME"
,
self
.
probe_func_name
)
signature
=
""
if
len
(
self
.
signature
)
==
0
\
else
","
+
self
.
signature
or
self
.
is_ret_probe
\
else
", "
+
self
.
signature
program
=
program
.
replace
(
"SIGNATURE"
,
signature
)
if
self
.
pid
is
not
None
and
not
self
.
is_user
:
# kernel probes need to explicitly filter pid
program
=
program
.
replace
(
"PID_FILTER"
,
"u32 pid = bpf_get_current_pid_tgid();
\
n
"
+
\
"if (pid != %d) { return 0; }"
%
self
.
pid
)
else
:
program
=
program
.
replace
(
"PID_FILTER"
,
""
)
if
self
.
_is_string_probe
():
decl
=
"""
struct %s_key_t { char key[%d]; };
BPF_HASH(%s, struct %s_key_t, u64);
"""
\
%
(
self
.
function
,
string_size
,
self
.
probe_hash_name
,
self
.
function
)
collect
=
"%s.increment(__key);"
%
self
.
probe_hash_name
key_expr
=
"""
struct %s_key_t __key = {0};
bpf_probe_read(&__key.key, sizeof(__key.key), %s);
"""
\
%
(
self
.
function
,
self
.
expr
)
elif
self
.
type
==
"freq"
:
decl
=
"BPF_HASH(%s, %s, u64);"
%
\
(
self
.
probe_hash_name
,
self
.
expr_type
)
collect
=
"%s.increment(__key);"
%
self
.
probe_hash_name
key_expr
=
"%s __key = %s;"
%
\
(
self
.
expr_type
,
self
.
expr
)
elif
self
.
type
==
"hist"
:
decl
=
"BPF_HISTOGRAM(%s, %s);"
%
\
(
self
.
probe_hash_name
,
self
.
expr_type
)
collect
=
"%s.increment(bpf_log2l(__key));"
%
\
self
.
probe_hash_name
key_expr
=
"%s __key = %s;"
%
\
(
self
.
expr_type
,
self
.
expr
)
self
.
_generate_pid_filter
())
decl
=
self
.
_generate_hash_decl
()
key_expr
=
self
.
_generate_key_assignment
()
collect
=
self
.
_generate_hash_update
()
program
=
program
.
replace
(
"DATA_DECL"
,
decl
)
program
=
program
.
replace
(
"KEY_EXPR"
,
key_expr
)
program
=
program
.
replace
(
"FILTER"
,
self
.
filter
or
"1"
)
program
=
program
.
replace
(
"FILTER"
,
"1"
if
len
(
self
.
filter
)
==
0
else
self
.
filter
)
program
=
program
.
replace
(
"COLLECT"
,
collect
)
program
=
program
.
replace
(
"PREFIX"
,
prefix
)
return
program
def
attach
(
self
,
bpf
):
...
...
@@ -150,81 +380,131 @@ bpf_probe_read(&__key.key, sizeof(__key.key), %s);
else
:
bpf
.
attach_kprobe
(
event
=
self
.
function
,
fn_name
=
self
.
probe_func_name
)
if
self
.
entry_probe_required
:
self
.
_attach_entry_probe
()
def
display
(
self
):
print
(
self
.
label
or
self
.
raw_spec
)
def
_v2s
(
self
,
v
):
# Most fields can be converted with plain str(), but strings
# are wrapped in a __string_t which has an .s field
if
"__string_t"
in
type
(
v
).
__name__
:
return
str
(
v
.
s
)
return
str
(
v
)
def
_display_expr
(
self
,
i
):
# Replace ugly latency calculation with $latency
expr
=
self
.
exprs
[
i
].
replace
(
"(bpf_ktime_get_ns() - *____latency_val)"
,
"$latency"
)
# Replace alias values back with the alias name
for
alias
,
subst
in
Specifier
.
aliases
.
items
():
expr
=
expr
.
replace
(
subst
,
alias
)
# Replace retval expression with $retval
expr
=
expr
.
replace
(
"ctx->ax"
,
"$retval"
)
# Replace ugly (*__param_val) expressions with param name
return
re
.
sub
(
r"\
(
\*__(\
w+)_
val\
)
", r"
\
1
", expr)
def _display_key(self, key):
if self.is_default_expr:
if not self.is_ret_probe:
return "
total
calls
"
else:
return "
retval
=
%
s
" % str(key.v0)
else:
# The key object has v0, ..., vk fields containing
# the values of the expressions from self.exprs
def str_i(i):
key_i = self._v2s(getattr(key, "
v
%
d
" % i))
return "
%
s
=
%
s
" %
\
(self._display_expr(i), key_i)
return "
,
".join(map(str_i, range(0, len(self.exprs))))
def display(self, top):
data = self.bpf.get_table(self.probe_hash_name)
if self.type == "
freq
":
print(self.label or self.raw_spec)
print("
\
t
%-
10
s
%
s
" % ("
COUNT
", "
EVENT
"))
for
key
,
value
in
sorted
(
data
.
items
(),
key
=
lambda
kv
:
kv
[
1
].
value
):
key_val
=
key
.
key
if
self
.
_is_string_probe
()
\
else
str
(
key
.
value
)
data = sorted(data.items(), key=lambda kv: kv[1].value)
if top is not None:
data = data[-top:]
for key, value in data:
# Print some nice values if the user didn't
# specify an expression to probe
if self.is_default_expr:
if not self.is_ret_probe:
key_str = "
total
calls
"
else:
key_str = "
retval
=
%
s
" %
\
key_val
self._v2s(key.v0)
else:
key_str
=
"%s = %s"
%
\
(
self
.
expr
,
key_val
)
key_str = self._display_key(key)
print("
\
t
%-
10
s
%
s
" %
\
(str(value.value), key_str))
elif self.type == "
hist
":
label
=
self
.
expr
if
not
self
.
is_default_expr
\
else
"retval"
label = self.
label or (self._display_expr(0)
if not self.is_default_expr else "
retval
")
data.print_log2_hist(val_type=label)
examples = """
Probe specifier syntax:
{p,r}:[library]:function(signature)[:type
:expr[:filter]][;
label]
{p,r}:[library]:function(signature)[:type
[,type...]:expr[,expr...][:filter]][#
label]
Where:
p,r -- probe at function entry or at function exit
in exit probes
, only $retval is accessible
in exit probes
: can use $retval, $entry(param), $latency
library -- the library that contains the function
(leave empty for kernel functions)
function -- the function name to trace
signature -- the function's parameters, as in the C header
type -- the type of the expression to collect
expr -- the expression to collect
type -- the type of the expression to collect
(supports multiple)
expr -- the expression to collect
(supports multiple)
filter -- the filter that is applied to collected values
label -- the label for this probe in the resulting output
EXAMPLES:
argdist
.py
-H 'p::__kmalloc(u64 size):u64:size'
argdist -H 'p::__kmalloc(u64 size):u64:size'
Print a histogram of allocation sizes passed to kmalloc
argdist
.py
-p 1005 -C 'p:c:malloc(size_t size):size_t:size:size==16'
argdist -p 1005 -C 'p:c:malloc(size_t size):size_t:size:size==16'
Print a frequency count of how many times process 1005 called malloc
with an allocation size of 16 bytes
argdist
.py -C 'r:c:gets():char*:$retval;
snooped strings'
argdist
-C 'r:c:gets():char*:(char*)$retval#
snooped strings'
Snoop on all strings returned by gets()
argdist.py -p 1005 -C 'p:c:write(int fd):int:fd'
argdist -H 'r::__kmalloc(size_t size):u64:$latency/$entry(size)#ns per byte'
Print a histogram of nanoseconds per byte from kmalloc allocations
argdist -C 'p::__kmalloc(size_t size, gfp_t flags):size_t:size:flags&GFP_ATOMIC'
Print frequency count of kmalloc allocation sizes that have GFP_ATOMIC
argdist -p 1005 -C 'p:c:write(int fd):int:fd' -T 5
Print frequency counts of how many times writes were issued to a
particular file descriptor number, in process 1005
particular file descriptor number, in process 1005, but only show
the top 5 busiest fds
argdist -p 1005 -H 'r:c:read()'
Print a histogram of results (sizes) returned by read() in process 1005
argdist
.py -p 1005 -H 'r:c:read()
'
Print
a histogram of error codes returned by read() in process 1005
argdist
-C 'r::__vfs_read():u32:$PID:$latency > 100000
'
Print
frequency of reads by process where the latency was >0.1ms
argdist.py -H
\
\
argdist -H 'r::__vfs_read(void *file, void *buf, size_t count):size_t:$entry(count):$latency > 1000000'
Print a histogram of read sizes that were longer than 1ms
argdist -H
\
\
'p:c:write(int fd, const void *buf, size_t count):size_t:count:fd==1'
Print a histogram of buffer sizes passed to write() across all
processes, where the file descriptor was 1 (STDOUT)
argdist
.py -C 'p:c:fork();
fork calls'
argdist
-C 'p:c:fork()#
fork calls'
Count fork() calls in libc across all processes
Can also use funccount.py, which is easier and more flexible
argdist
.py
\
\
-H
'p:c:sleep(u32 seconds):u32:seconds'
\
\
-H 'p:c:nanosleep(struct timespec { time_t tv_sec; long tv_nsec; }
*req):long:req->tv_nsec'
argdist
-H
\
\
'p:c:sleep(u32 seconds):u32:seconds'
\
\
'p:c:nanosleep(struct timespec
*req):long:req->tv_nsec'
Print histograms of sleep() and nanosleep() parameter values
argdist
.py
-p 2780 -z 120
\
\
argdist -p 2780 -z 120
\
\
-C 'p:c:write(int fd, char* buf, size_t len):char*:buf:fd==1'
Spy on writes to STDOUT performed by process 2780, up to a string size
of 120 characters
...
...
@@ -242,12 +522,18 @@ parser.add_argument("-i", "--interval", default=1, type=int,
help="
output
interval
,
in
seconds
")
parser.add_argument("
-
n
", "
--
number
", type=int, dest="
count
",
help="
number
of
outputs
")
parser.add_argument("
-
v
", "
--
verbose
", action="
store_true
",
help="
print
resulting
BPF
program
code
before
executing
")
parser.add_argument("
-
T
", "
--
top
", type=int,
help="
number
of
top
results
to
show
(
not
applicable
to
histograms
)
")
parser.add_argument("
-
H
", "
--
histogram
", nargs="
*
", dest="
histspecifier
",
metavar="
specifier
",
help="
probe
specifier
to
capture
histogram
of
(
see
examples
below
)
")
parser.add_argument("
-
C
", "
--
count
", nargs="
*
", dest="
countspecifier
",
metavar="
specifier
",
help="
probe
specifier
to
capture
count
of
(
see
examples
below
)
")
parser
.
add_argument
(
"-
v"
,
"--verbose"
,
action
=
"store_true
"
,
help
=
"
print resulting BPF program code before executing
"
)
parser.add_argument("
-
I
", "
--
include
", nargs="
*
", metavar="
header
",
help="
additional
header
files
to
include
in
the
BPF
program
")
args = parser.parse_args()
specifiers = []
...
...
@@ -259,9 +545,16 @@ if len(specifiers) == 0:
print("
at
least
one
specifier
is
required
")
exit(1)
bpf_source
=
"#include <uapi/linux/ptrace.h>
\
n
"
bpf_source = """
struct __string_t { char s[%d]; };
#include <uapi/linux/ptrace.h>
""" % args.string_size
for include in (args.include or []):
bpf_source += "
#include <%s>\n" % include
bpf_source
+=
Specifier
.
generate_auto_includes
(
map
(
lambda
s
:
s
.
raw_spec
,
specifiers
))
for
specifier
in
specifiers
:
bpf_source
+=
specifier
.
generate_text
(
args
.
string_size
)
bpf_source
+=
specifier
.
generate_text
()
if
args
.
verbose
:
print
(
bpf_source
)
...
...
@@ -279,7 +572,7 @@ while True:
exit
()
print
(
"[%s]"
%
strftime
(
"%H:%M:%S"
))
for
specifier
in
specifiers
:
specifier
.
display
()
specifier
.
display
(
args
.
top
)
count_so_far
+=
1
if
args
.
count
is
not
None
and
count_so_far
>=
args
.
count
:
exit
()
tools/argdist_examples.txt
View file @
61f5535f
...
...
@@ -10,7 +10,7 @@ various functions.
For example, suppose you want to find what allocation sizes are common in
your application:
# ./argdist
.py
-p 2420 -C 'p:c:malloc(size_t size):size_t:size'
# ./argdist -p 2420 -C 'p:c:malloc(size_t size):size_t:size'
[01:42:29]
p:c:malloc(size_t size):size_t:size
COUNT EVENT
...
...
@@ -43,7 +43,7 @@ probed and its value was 16, repeatedly.
Now, suppose you wanted a histogram of buffer sizes passed to the write()
function across the system:
# ./argdist
.py
-H 'p:c:write(int fd, void *buf, size_t len):size_t:len'
# ./argdist -H 'p:c:write(int fd, void *buf, size_t len):size_t:len'
[01:45:22]
p:c:write(int fd, void *buf, size_t len):size_t:len
len : count distribution
...
...
@@ -81,7 +81,7 @@ bytes, medium writes of 32-63 bytes, and larger writes of 64-127 bytes.
But these are writes across the board -- what if you wanted to focus on writes
to STDOUT?
# ./argdist
.py
-H 'p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1'
# ./argdist -H 'p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1'
[01:47:17]
p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1
len : count distribution
...
...
@@ -120,7 +120,7 @@ You can also use argdist to trace kernel functions. For example, suppose you
wanted a histogram of kernel allocation (kmalloc) sizes across the system,
printed twice with 3 second intervals:
# ./argdist
.py
-i 3 -n 2 -H 'p::__kmalloc(size_t size):size_t:size'
# ./argdist -i 3 -n 2 -H 'p::__kmalloc(size_t size):size_t:size'
[01:50:00]
p::__kmalloc(size_t size):size_t:size
size : count distribution
...
...
@@ -143,7 +143,7 @@ p::__kmalloc(size_t size):size_t:size
Occasionally, numeric information isn't enough and you want to capture strings.
What are the strings printed by puts() across the system?
# ./argdist
.py
-i 10 -n 1 -C 'p:c:puts(char *str):char*:str'
# ./argdist -i 10 -n 1 -C 'p:c:puts(char *str):char*:str'
[01:53:54]
p:c:puts(char *str):char*:str
COUNT EVENT
...
...
@@ -156,17 +156,17 @@ What about reads? You could trace gets() across the system and print the
strings input by the user (note how "r" is used instead of "p" to attach a
probe to the function's return):
# ./argdist
.py -i 10 -n 1 -C 'r:c:gets():char*:
$retval:$retval!=0'
# ./argdist
-i 10 -n 1 -C 'r:c:gets():char*:(char*)
$retval:$retval!=0'
[02:12:23]
r:c:gets():char*:$retval:$retval!=0
COUNT EVENT
1 (char*)
ctx->ax
= hi there
3 (char*)
ctx->ax
= sasha
8 (char*)
ctx->ax
= hello
1 (char*)
$retval
= hi there
3 (char*)
$retval
= sasha
8 (char*)
$retval
= hello
Similarly, we could get a histogram of the error codes returned by read():
# ./argdist
.py
-i 10 -c 1 -H 'r:c:read()'
# ./argdist -i 10 -c 1 -H 'r:c:read()'
[02:15:36]
r:c:read()
retval : count distribution
...
...
@@ -183,32 +183,109 @@ r:c:read()
1024 -> 2047 : 0 | |
2048 -> 4095 : 2 |** |
In return probes, you can also trace the latency of the function (unless it is
recursive) and the parameters it had on entry. For example, we can identify
which processes are performing slow synchronous filesystem reads -- say,
longer than 0.1ms (100,000ns):
# ./argdist -C 'r::__vfs_read():u32:$PID:$latency > 100000'
[01:08:48]
r::__vfs_read():u32:$PID:$latency > 100000
COUNT EVENT
1 $PID = 10457
21 $PID = 2780
[01:08:49]
r::__vfs_read():u32:$PID:$latency > 100000
COUNT EVENT
1 $PID = 10457
21 $PID = 2780
^C
It looks like process 2780 performed 21 slow reads.
Occasionally, entry parameter values are also interesting. For example, you
might be curious how long it takes malloc() to allocate memory -- nanoseconds
per byte allocated. Let's go:
# ./argdist -H 'r:c:malloc(size_t size):u64:$latency/$entry(size);ns per byte' -n 1 -i 10
[01:11:13]
ns per byte : count distribution
0 -> 1 : 0 | |
2 -> 3 : 4 |***************** |
4 -> 7 : 3 |************* |
8 -> 15 : 2 |******** |
16 -> 31 : 1 |**** |
32 -> 63 : 0 | |
64 -> 127 : 7 |******************************* |
128 -> 255 : 1 |**** |
256 -> 511 : 0 | |
512 -> 1023 : 1 |**** |
1024 -> 2047 : 1 |**** |
2048 -> 4095 : 9 |****************************************|
4096 -> 8191 : 1 |**** |
It looks like a tri-modal distribution. Some allocations are extremely cheap,
and take 2-15 nanoseconds per byte. Other allocations are slower, and take
64-127 nanoseconds per byte. And some allocations are slower still, and take
multiple microseconds per byte.
You could also group results by more than one field. For example, __kmalloc
takes an additional flags parameter that describes how to allocate memory:
# ./argdist -C 'p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size'
[03:42:29]
p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size
COUNT EVENT
1 flags = 16, size = 152
2 flags = 131280, size = 8
7 flags = 131280, size = 16
[03:42:30]
p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size
COUNT EVENT
1 flags = 16, size = 152
6 flags = 131280, size = 8
19 flags = 131280, size = 16
[03:42:31]
p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size
COUNT EVENT
2 flags = 16, size = 152
10 flags = 131280, size = 8
31 flags = 131280, size = 16
[03:42:32]
p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size
COUNT EVENT
2 flags = 16, size = 152
14 flags = 131280, size = 8
43 flags = 131280, size = 16
^C
The flags value must be expanded by hand, but it's still helpful to eliminate
certain kinds of allocations or visually group them together.
Here's a final example that finds how many write() system calls are performed
by each process on the system:
# argdist
.py
-C 'p:c:write():int:$PID;write per process' -n 2
# argdist -C 'p:c:write():int:$PID;write per process' -n 2
[06:47:18]
write by process
COUNT EVENT
3
bpf_get_current_pid_tgid()
= 8889
7
bpf_get_current_pid_tgid()
= 7615
7
bpf_get_current_pid_tgid()
= 2480
3
$PID
= 8889
7
$PID
= 7615
7
$PID
= 2480
[06:47:19]
write by process
COUNT EVENT
9 bpf_get_current_pid_tgid() = 8889
23 bpf_get_current_pid_tgid() = 7615
23 bpf_get_current_pid_tgid() = 2480
As you see, the $PID alias is expanded to the BPF function bpf_get_current_pid_tgid(),
which returns the current process' pid.
9 $PID = 8889
23 $PID = 7615
23 $PID = 2480
USAGE message:
usage: argdist.py [-h] [-p PID] [-z STRING_SIZE] [-i INTERVAL] [-n COUNT]
[-H [HISTSPECIFIER [HISTSPECIFIER ...]]]
[-C [COUNTSPECIFIER [COUNTSPECIFIER ...]]] [-v]
# argdist -h
usage: argdist [-h] [-p PID] [-z STRING_SIZE] [-i INTERVAL] [-n COUNT] [-v]
[-T TOP] [-H [specifier [specifier ...]]]
[-C [specifier [specifier ...]]] [-I [header [header ...]]]
Trace a function and display a summary of its parameter values.
...
...
@@ -221,63 +298,79 @@ optional arguments:
output interval, in seconds
-n COUNT, --number COUNT
number of outputs
-H [HISTSPECIFIER [HISTSPECIFIER ...]], --histogram [HISTSPECIFIER [HISTSPECIFIER ...]]
-v, --verbose print resulting BPF program code before executing
-T TOP, --top TOP number of top results to show (not applicable to
histograms)
-H [specifier [specifier ...]], --histogram [specifier [specifier ...]]
probe specifier to capture histogram of (see examples
below)
-C [
COUNTSPECIFIER [COUNTSPECIFIER ...]], --count [COUNTSPECIFIER [COUNTSPECIFIER
...]]
-C [
specifier [specifier ...]], --count [specifier [specifier
...]]
probe specifier to capture count of (see examples
below)
-v, --verbose print resulting BPF program code before executing
-I [header [header ...]], --include [header [header ...]]
additional header files to include in the BPF program
Probe specifier syntax:
{p,r}:[library]:function(signature)[:type
:expr[:filter]][;
label]
{p,r}:[library]:function(signature)[:type
[,type...]:expr[,expr...][:filter]][#
label]
Where:
p,r -- probe at function entry or at function exit
in exit probes
, only $retval is accessible
in exit probes
: can use $retval, $entry(param), $latency
library -- the library that contains the function
(leave empty for kernel functions)
function -- the function name to trace
signature -- the function's parameters, as in the C header
type -- the type of the expression to collect
expr -- the expression to collect
type -- the type of the expression to collect
(supports multiple)
expr -- the expression to collect
(supports multiple)
filter -- the filter that is applied to collected values
label -- the label for this probe in the resulting output
EXAMPLES:
argdist
.py
-H 'p::__kmalloc(u64 size):u64:size'
argdist -H 'p::__kmalloc(u64 size):u64:size'
Print a histogram of allocation sizes passed to kmalloc
argdist
.py
-p 1005 -C 'p:c:malloc(size_t size):size_t:size:size==16'
argdist -p 1005 -C 'p:c:malloc(size_t size):size_t:size:size==16'
Print a frequency count of how many times process 1005 called malloc
with an allocation size of 16 bytes
argdist
.py -C 'r:c:gets():char*:$retval;
snooped strings'
argdist
-C 'r:c:gets():char*:$retval#
snooped strings'
Snoop on all strings returned by gets()
argdist.py -p 1005 -C 'p:c:write(int fd):int:fd'
argdist -H 'r::__kmalloc(size_t size):u64:$latency/$entry(size)#ns per byte'
Print a histogram of nanoseconds per byte from kmalloc allocations
argdist -C 'p::__kmalloc(size_t size, gfp_t flags):size_t:size:flags&GFP_ATOMIC'
Print frequency count of kmalloc allocation sizes that have GFP_ATOMIC
argdist -p 1005 -C 'p:c:write(int fd):int:fd' -T 5
Print frequency counts of how many times writes were issued to a
particular file descriptor number, in process 1005
particular file descriptor number, in process 1005, but only show
the top 5 busiest fds
argdist
.py
-p 1005 -H 'r:c:read()'
argdist -p 1005 -H 'r:c:read()'
Print a histogram of error codes returned by read() in process 1005
argdist.py -H \
argdist -C 'r::__vfs_read():u32:$PID:$latency > 100000'
Print frequency of reads by process where the latency was >0.1ms
argdist -H 'r::__vfs_read(void *file, void *buf, size_t count):size_t:$entry(count):$latency > 1000000'
Print a histogram of read sizes that were longer than 1ms
argdist -H \
'p:c:write(int fd, const void *buf, size_t count):size_t:count:fd==1'
Print a histogram of buffer sizes passed to write() across all
processes, where the file descriptor was 1 (STDOUT)
argdist
.py -C 'p:c:fork();
fork calls'
argdist
-C 'p:c:fork()#
fork calls'
Count fork() calls in libc across all processes
Can also use funccount.py, which is easier and more flexible
argdist
.py
\
-H
'p:c:sleep(u32 seconds):u32:seconds' \
-H 'p:c:nanosleep(struct timespec { time_t tv_sec; long tv_nsec; }
*req):long:req->tv_nsec'
argdist
-H
\
'p:c:sleep(u32 seconds):u32:seconds' \
'p:c:nanosleep(struct timespec
*req):long:req->tv_nsec'
Print histograms of sleep() and nanosleep() parameter values
argdist
.py
-p 2780 -z 120 \
argdist -p 2780 -z 120 \
-C 'p:c:write(int fd, char* buf, size_t len):char*:buf:fd==1'
Spy on writes to STDOUT performed by process 2780, up to a string size
of 120 characters
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment