Commit fd60d55c, authored Mar 01, 2016 by Sasha Goldshtein
tracepoint support for argdist and trace, and new tplist tool for printing tracepoints
Parent: f6bf78f5
Showing 10 changed files with 654 additions and 60 deletions:
README.md                     +1   -0
man/man8/argdist.8            +20  -4
man/man8/tplist.8             +39  -0
man/man8/trace.8              +20  -3
src/python/bcc/__init__.py    +5   -0
tools/argdist.py              +227 -37
tools/argdist_example.txt     +31  -4
tools/tplist.py               +61  -0
tools/trace.py                +223 -12
tools/trace_example.txt       +27  -0
README.md
@@ -108,6 +108,7 @@ Examples:
 - tools/[tcpconnect](tools/tcpconnect.py): Trace TCP active connections (connect()). [Examples](tools/tcpconnect_example.txt).
 - tools/[tcpconnlat](tools/tcpconnlat.py): Trace TCP active connection latency (connect()). [Examples](tools/tcpconnlat_example.txt).
 - tools/[tcpretrans](tools/tcpretrans.py): Trace TCP retransmits and TLPs. [Examples](tools/tcpretrans_example.txt).
+- tools/[tplist](tools/tplist.py): Display kernel tracepoints and their format.
 - tools/[trace](tools/trace.py): Trace arbitrary functions, with filters. [Examples](tools/trace_example.txt)
 - tools/[vfscount](tools/vfscount.py) tools/[vfscount.c](tools/vfscount.c): Count VFS calls. [Examples](tools/vfscount_example.txt).
 - tools/[vfsstat](tools/vfsstat.py) tools/[vfsstat.c](tools/vfsstat.c): Count some VFS calls, with column output. [Examples](tools/vfsstat_example.txt).
man/man8/argdist.8
@@ -50,11 +50,11 @@ many cases, argdist will deduce the necessary header files automatically.
 .SH SPECIFIER SYNTAX
 The general specifier syntax is as follows:
-.B {p,r}:[library]:function(signature)[:type[,type...]:expr[,expr...][:filter]][#label]
+.B {p,r,t}:{[library],category}:function(signature)[:type[,type...]:expr[,expr...][:filter]][#label]
 .TP
-.B {p,r}
-Probe type \- "p" for function entry, "r" for function return;
-\-H for histogram collection, \-C for frequency count.
+.B {p,r,t}
+Probe type \- "p" for function entry, "r" for function return, "t" for kernel
+tracepoint; \-H for histogram collection, \-C for frequency count.
 Indicates where to place the probe and whether the probe should collect frequency
 count information, or aggregate the collected values into a histogram. Counting
 probes will collect the number of times every parameter value was observed,
@@ -68,12 +68,17 @@ Specify the full path to the .so or executable file where the function to probe
 resides. Alternatively, you can specify just the lib name: for example, "c"
 refers to libc. If no library name is specified, the kernel is assumed.
 .TP
+.B category
+The category of the kernel tracepoint. For example: net, sched, block.
+.TP
 .B function(signature)
 The function to probe, and its signature.
 The function name must match exactly for the probe to be placed. The signature,
 on the other hand, is only required if you plan to collect parameter values
 based on that signature. For example, if you only want to collect the first
 parameter, you don't have to specify the rest of the parameters in the signature.
+When capturing kernel tracepoints, this should be the name of the event, e.g.
+net_dev_start_xmit. The signature for kernel tracepoints should be empty.
 .TP
 .B [type[,type...]]
 The type(s) of the expression(s) to capture.
@@ -85,6 +90,9 @@ The expression(s) to capture.
 These are the values that are assigned to the histogram or raw event collection.
 You may use the parameters directly, or valid C expressions that involve the
 parameters, such as "size % 10".
+Tracepoints may access a special structure called "tp" that is formatted according
+to the tracepoint format (which you can obtain using tplist). For example, the
+block:block_rq_complete tracepoint can access tp.nr_sector.
 Return probes can use the argument values received by the
 function when it was entered, through the $entry(paramname) special variable.
 Return probes can also access the function's return value in $retval, and the
@@ -137,6 +145,14 @@ Count fork() calls in libc across all processes, grouped by pid:
 #
 .B argdist -C 'p:c:fork():int:$PID;fork per process'
 .TP
+Print histogram of number of sectors in completing block I/O requests:
+#
+.B argdist -H 't:block:block_rq_complete():u32:tp.nr_sector'
+.TP
+Aggregate interrupts by interrupt request (IRQ):
+#
+.B argdist -C 't:irq:irq_handler_entry():int:tp.irq'
+.TP
 Print histograms of sleep() and nanosleep() parameter values:
 #
 .B argdist -H 'p:c:sleep(u32 seconds):u32:seconds' 'p:c:nanosleep(struct timespec *req):long:req->tv_nsec'
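The specifier grammar documented in this man page can be illustrated with a small parser. This is a minimal sketch, not argdist's own implementation: `ProbeSpec` and `parse_specifier` are hypothetical names introduced here, and the sketch only splits on ':' and '#', so expressions or filters that contain those characters themselves are not handled.

```python
from collections import namedtuple

# Hypothetical container for illustration; argdist keeps these values as
# attributes on its Specifier class instead.
ProbeSpec = namedtuple(
    "ProbeSpec",
    "probe_type target function signature types exprs filter label")


def parse_specifier(spec):
    """Split a specifier of the documented form into its parts."""
    body, _, label = spec.partition("#")
    parts = body.strip().split(":")
    if len(parts) < 3:
        raise ValueError("need probe type, library/category and function")
    if len(parts) > 6:
        raise ValueError("extraneous ':'-separated parts detected")
    if parts[0] not in ("p", "r", "t"):
        raise ValueError("probe type must be 'p', 'r', or 't', got %r"
                         % parts[0])
    # "name(signature)" -> name and the text between the parentheses
    function, _, rest = parts[2].partition("(")
    rest = rest.rstrip()
    if not rest.endswith(")"):
        raise ValueError("function must be written as name(signature)")
    signature = rest[:-1].strip()
    types = parts[3].split(",") if len(parts) > 3 else []
    exprs = parts[4].split(",") if len(parts) > 4 else []
    filter_expr = parts[5] if len(parts) == 6 else ""
    # For "t" probes the second field is the tracepoint category,
    # otherwise it is the library (empty means kernel).
    return ProbeSpec(parts[0], parts[1], function.strip(), signature,
                     types, exprs, filter_expr, label or None)


spec = parse_specifier("t:block:block_rq_complete():u32:tp.nr_sector")
print(spec.probe_type, spec.target, spec.function, spec.exprs)
```

For a "t" probe the `target` field holds the tracepoint category and the signature is empty, which matches the man page text added in this commit.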
man/man8/tplist.8
new file mode 100644
.TH tplist 8 "2016-03-20" "USER COMMANDS"
.SH NAME
tplist \- Display kernel tracepoints and their format.
.SH SYNOPSIS
.B tplist [-v] [tracepoint]
.SH DESCRIPTION
tplist lists all kernel tracepoints, and can optionally print out the tracepoint
format; namely, the variables that you can trace when the tracepoint is hit. This
is usually used in conjunction with the argdist and/or trace tools.
On a typical system, accessing the tracepoint list and format requires root.
.SH OPTIONS
.TP
\-v
Display the variables associated with the tracepoint or tracepoints.
.TP
[tracepoint]
A wildcard expression that specifies which tracepoints to print. For example,
block:* will print all block tracepoints (block:block_rq_complete, etc.).
Regular expressions are not supported.
.SH EXAMPLES
.TP
Print all kernel tracepoints:
#
.B tplist
.TP
Print all net tracepoints with their format:
#
.B tplist -v 'net:*'
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Sasha Goldshtein
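The wildcard behaviour described under OPTIONS can be approximated with shell-style matching from the Python standard library. This is a sketch under assumptions: tplist.py's actual matching code is not shown in this section, `filter_tracepoints` is a hypothetical helper, and the sample names are illustrative. Note that `fnmatchcase` also accepts `?` and `[seq]`, which goes slightly beyond the `*` form the man page mentions.

```python
from fnmatch import fnmatchcase


def filter_tracepoints(names, pattern=None):
    """Return the category:event names that match a shell-style wildcard.

    With no pattern, every name is returned, mirroring a plain `tplist` run.
    """
    if pattern is None:
        return list(names)
    return [name for name in names if fnmatchcase(name, pattern)]


# Illustrative names only; a real list comes from the kernel's tracing tree.
sample = [
    "block:block_rq_complete",
    "block:block_rq_issue",
    "net:net_dev_start_xmit",
    "sched:sched_switch",
]

print(filter_tracepoints(sample, "block:*"))
```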
man/man8/trace.8
@@ -45,10 +45,12 @@ information. See PROBE SYNTAX below.
 The general probe syntax is as follows:
 .B [{p,r}]:[library]:function [(predicate)] ["format string"[, arguments]]
+.B t:category:event [(predicate)] ["format string"[, arguments]]
 .TP
-.B [{p,r}]
-Probe type \- "p" for function entry, "r" for function return. The default
-probe type is "p".
+.B {[{p,r}],t}
+Probe type \- "p" for function entry, "r" for function return, "t" for kernel
+tracepoint. The default probe type is "p".
 .TP
 .B [library]
 Library containing the probe.
@@ -58,9 +60,15 @@ refers to libc. If no library name is specified, the kernel is assumed. Also,
 you can specify an executable name (without a full path) if it is in the PATH.
 For example, "bash".
 .TP
+.B category
+The tracepoint category. For example, "sched" or "irq".
+.TP
 .B function
 The function to probe.
 .TP
+.B event
+The tracepoint event. For example, "block_rq_complete".
+.TP
 .B [(predicate)]
 The filter applied to the captured data. Only if the filter evaluates as true,
 the trace message will be printed. The filter can use any valid C expression
@@ -81,6 +89,11 @@ number of arguments as there are placeholders in the format string. The
 format specifier replacements may be any C expressions, and may refer to the
 same special keywords as in the predicate (arg1, arg2, etc.).
+In tracepoints, both the predicate and the arguments may refer to the tracepoint
+format structure, which is stored in the special "tp" variable. For example, the
+block:block_rq_complete tracepoint can print or filter by tp.nr_sector. To
+discover the format of your tracepoint, use the tplist tool.
 The predicate expression and the format specifier replacements for printing
 may also use the following special keywords: $pid, $tgid to refer to the
 current process' pid and tgid; $uid, $gid to refer to the current user's
@@ -102,6 +115,10 @@ Trace all malloc calls and print the size of the requested allocation:
 Trace returns from the readline function in bash and print the return value as a string:
 #
 .B trace 'r:bash:readline """%s"", retval'
+.TP
+Trace the block:block_rq_complete tracepoint and print the number of sectors completed:
+#
+.B trace 't:block:block_rq_complete """%d sectors"", tp.nr_sector'
 .SH SOURCE
 This is from bcc.
 .IP
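To see which members the "tp" variable is likely to expose, the tracepoint format text can be parsed directly. The sketch below applies the same regular expression and the same skip rules (`common_` fields and `__data_loc` fields) that this commit adds to argdist.py. `SAMPLE_FORMAT` and `tp_fields` are hypothetical, and the sample text, including the ID and offsets, is illustrative rather than copied from a specific kernel.

```python
import re

# Illustrative format text; real files live under the tracing events
# directory and differ between kernel versions.
SAMPLE_FORMAT = """\
name: block_rq_complete
ID: 942
format:
    field:unsigned short common_type; offset:0; size:2; signed:0;
    field:int common_pid; offset:4; size:4; signed:1;

    field:dev_t dev; offset:8; size:4; signed:0;
    field:sector_t sector; offset:16; size:8; signed:0;
    field:unsigned int nr_sector; offset:24; size:4; signed:0;
    field:__data_loc char[] cmd; offset:40; size:4; signed:1;
"""

FIELD_RE = re.compile(r"field:([^;]*);.*size:(\d+);")


def tp_fields(format_text):
    """Return (type, name, size) for each field usable through tp."""
    fields = []
    for line in format_text.splitlines():
        match = FIELD_RE.search(line)
        if match is None:
            continue
        decl = match.group(1).split()
        field_type, field_name = " ".join(decl[:-1]), decl[-1]
        # Skip dynamically located data and the common header fields.
        if "__data_loc" in field_type or field_name.startswith("common_"):
            continue
        fields.append((field_type, field_name, int(match.group(2))))
    return fields


for ftype, fname, size in tp_fields(SAMPLE_FORMAT):
    print("tp.%s: %s (%d bytes)" % (fname, ftype, size))
```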
src/python/bcc/__init__.py
@@ -325,6 +325,11 @@ class BPF(object):
         global open_kprobes
         return open_kprobes
 
+    @staticmethod
+    def open_uprobes():
+        global open_uprobes
+        return open_uprobes
+
     @staticmethod
     def detach_kprobe(event):
         ev_name = "p_" + event.replace("+", "_").replace(".", "_")
tools/argdist.py
@@ -15,10 +15,129 @@
 from bcc import BPF
 from time import sleep, strftime
 import argparse
+import ctypes as ct
 import re
 import traceback
+import os
+import multiprocessing
 import sys
 
+class Perf(object):
+    class perf_event_attr(ct.Structure):
+        _fields_ = [
+            ('type', ct.c_uint),
+            ('size', ct.c_uint),
+            ('config', ct.c_ulong),
+            ('sample_period', ct.c_ulong),
+            ('sample_type', ct.c_ulong),
+            ('IGNORE1', ct.c_ulong),
+            ('IGNORE2', ct.c_ulong),
+            ('wakeup_events', ct.c_uint),
+            ('IGNORE3', ct.c_uint),
+            ('IGNORE4', ct.c_ulong),
+            ('IGNORE5', ct.c_ulong),
+            ('IGNORE6', ct.c_ulong),
+            ('IGNORE7', ct.c_uint),
+            ('IGNORE8', ct.c_int),
+            ('IGNORE9', ct.c_ulong),
+            ('IGNORE10', ct.c_uint),
+            ('IGNORE11', ct.c_uint)
+        ]
+
+    NR_PERF_EVENT_OPEN = 298
+    PERF_TYPE_TRACEPOINT = 2
+    PERF_SAMPLE_RAW = 1024
+    PERF_FLAG_FD_CLOEXEC = 8
+    PERF_EVENT_IOC_SET_FILTER = 1074275334
+    PERF_EVENT_IOC_ENABLE = 9216
+
+    libc = ct.CDLL('libc.so.6', use_errno=True)
+    syscall = libc.syscall          # not declaring vararg types
+    ioctl = libc.ioctl              # not declaring vararg types
+
+    @staticmethod
+    def _open_for_cpu(cpu, attr):
+        pfd = Perf.syscall(Perf.NR_PERF_EVENT_OPEN, ct.byref(attr),
+                           -1, cpu, -1, Perf.PERF_FLAG_FD_CLOEXEC)
+        if pfd < 0:
+            errno_ = ct.get_errno()
+            raise OSError(errno_, os.strerror(errno_))
+        if Perf.ioctl(pfd, Perf.PERF_EVENT_IOC_SET_FILTER,
+                      "common_pid == -17") < 0:
+            errno_ = ct.get_errno()
+            raise OSError(errno_, os.strerror(errno_))
+        if Perf.ioctl(pfd, Perf.PERF_EVENT_IOC_ENABLE, 0) < 0:
+            errno_ = ct.get_errno()
+            raise OSError(errno_, os.strerror(errno_))
+
+    @staticmethod
+    def perf_event_open(tpoint_id):
+        attr = Perf.perf_event_attr()
+        attr.config = tpoint_id
+        attr.type = Perf.PERF_TYPE_TRACEPOINT
+        attr.sample_type = Perf.PERF_SAMPLE_RAW
+        attr.sample_period = 1
+        attr.wakeup_events = 1
+        for cpu in range(0, multiprocessing.cpu_count()):
+            Perf._open_for_cpu(cpu, attr)
+
+class Tracepoint(object):
+    tracepoints_enabled = 0
+    trace_root = "/sys/kernel/debug/tracing"
+    event_root = os.path.join(trace_root, "events")
+
+    @staticmethod
+    def generate_decl():
+        if Tracepoint.tracepoints_enabled == 0:
+            return ""
+        return "\nBPF_HASH(__trace_di, u64, u64);\n"
+
+    @staticmethod
+    def generate_entry_probe():
+        if Tracepoint.tracepoints_enabled == 0:
+            return ""
+        return """
+int __trace_entry_update(struct pt_regs *ctx)
+{
+        u64 tid = bpf_get_current_pid_tgid();
+        u64 val = ctx->di;
+        __trace_di.update(&tid, &val);
+        return 0;
+}
+"""
+
+    @staticmethod
+    def enable_tracepoint(category, event):
+        tp_id = Tracepoint.get_tpoint_id(category, event)
+        if tp_id == -1:
+            raise ValueError("no such tracepoint found: %s:%s" %
+                             (category, event))
+        Perf.perf_event_open(tp_id)
+        Tracepoint.tracepoints_enabled += 1
+
+    @staticmethod
+    def get_tpoint_id(category, event):
+        evt_dir = os.path.join(Tracepoint.event_root, category, event)
+        try:
+            return int(
+                open(os.path.join(evt_dir, "id")).read().strip())
+        except:
+            return -1
+
+    @staticmethod
+    def get_tpoint_format(category, event):
+        evt_dir = os.path.join(Tracepoint.event_root, category, event)
+        try:
+            return open(os.path.join(evt_dir, "format")).readlines()
+        except:
+            return ""
+
+    @staticmethod
+    def attach(bpf):
+        if Tracepoint.tracepoints_enabled > 0:
+            bpf.attach_kprobe(event="tracing_generic_entry_update",
+                              fn_name="__trace_entry_update")
+
 class Specifier(object):
     probe_text = """
 DATA_DECL
@@ -39,7 +158,8 @@ int PROBENAME(struct pt_regs *ctx SIGNATURE)
         "linux/time.h": ["time"],
         "linux/fs.h": ["fs", "file"],
         "linux/blkdev.h": ["bio", "request"],
-        "linux/slab.h": ["alloc"]
+        "linux/slab.h": ["alloc"],
+        "linux/netdevice.h": ["sk_buff", "net_device"]
     }
 
     @staticmethod
@@ -189,8 +309,8 @@ u64 __time = bpf_ktime_get_ns();
                        "function signature must be specified")
         if len(parts) > 6:
             self._bail("extraneous ':'-separated parts detected")
-        if parts[0] not in ["r", "p"]:
-            self._bail("probe type must be either 'p' or 'r', " +
+        if parts[0] not in ["r", "p", "t"]:
+            self._bail("probe type must be 'p', 'r', or 't', " +
                        "but got '%s'" % parts[0])
         if re.match(r"\w+\(.*\)", parts[2]) is None:
             self._bail(("function signature '%s' has an invalid " +
@@ -216,11 +336,19 @@ u64 __time = bpf_ktime_get_ns();
         parts = spec_and_label[0].strip().split(':')
         self.type = type    # hist or freq
-        self.is_ret_probe = parts[0] == "r"
-        self.library = parts[1]
-        self.is_user = len(self.library) > 0
+        self.probe_type = parts[0]
         fparts = parts[2].split('(')
         self.function = fparts[0].strip()
+        if self.probe_type == "t":
+            self.library = ""       # kernel
+            self.tp_category = parts[1]
+            self.tp_event = self.function
+            Tracepoint.enable_tracepoint(
+                self.tp_category, self.tp_event)
+            self.function = "perf_trace_" + self.function
+        else:
+            self.library = parts[1]
+        self.is_user = len(self.library) > 0
         self.signature = fparts[1].strip()[:-1]
         self._parse_signature()
@@ -235,12 +363,12 @@ u64 __time = bpf_ktime_get_ns();
             if self.type == "hist" and len(self.expr_types) > 1:
                 self._bail("histograms can only have 1 expr")
         else:
-            if not self.is_ret_probe and self.type == "hist":
+            if not self.probe_type == "r" and self.type == "hist":
                 self._bail("histograms must have expr")
             self.expr_types = \
-                ["u64" if not self.is_ret_probe else "int"]
+                ["u64" if not self.probe_type == "r" else "int"]
             self.exprs = \
-                ["1" if not self.is_ret_probe else "$retval"]
+                ["1" if not self.probe_type == "r" else "$retval"]
         self.filter = "" if len(parts) != 6 else parts[5]
         self._substitute_exprs()
@@ -249,7 +377,7 @@ u64 __time = bpf_ktime_get_ns();
         def check(expr):
             keywords = ["$entry", "$latency"]
             return any(map(lambda kw: kw in expr, keywords))
-        self.entry_probe_required = self.is_ret_probe and \
+        self.entry_probe_required = self.probe_type == "r" and \
             (any(map(check, self.exprs)) or check(self.filter))
 
         self.pid = pid
@@ -278,11 +406,11 @@ u64 __time = bpf_ktime_get_ns();
     def _generate_field_assignment(self, i):
         if self._is_string(self.expr_types[i]):
-            return "bpf_probe_read(" + \
+            return "        bpf_probe_read(" + \
                    "&__key.v%d.s, sizeof(__key.v%d.s), %s);\n" % \
                    (i, i, self.exprs[i])
         else:
-            return "__key.v%d = %s;\n" % (i, self.exprs[i])
+            return "        __key.v%d = %s;\n" % (i, self.exprs[i])
 
     def _generate_hash_decl(self):
         if self.type == "hist":
@@ -324,28 +452,71 @@ u64 __time = bpf_ktime_get_ns();
         else:
             return ""
 
-    def generate_text(self):
-        # We don't like tools writing tools (Brendan Gregg), but this
-        # is an exception because we're letting the user fully
-        # customize the values we probe. As a rule of thumb though,
-        # try to build a custom tool for a specific purpose.
+    def _generate_tpoint_entry_struct_fields(self):
+        format_lines = Tracepoint.get_tpoint_format(self.tp_category,
+                                                    self.tp_event)
+        text = ""
+        for line in format_lines:
+            match = re.search(r'field:([^;]*);.*size:(\d+);', line)
+            if match is None:
+                continue
+            parts = match.group(1).split()
+            field_name = parts[-1:][0]
+            field_type = " ".join(parts[:-1])
+            field_size = int(match.group(2))
+            if "__data_loc" in field_type:
+                continue
+            if field_name.startswith("common_"):
+                continue
+            text += " %s %s;\n" % (field_type, field_name)
+        return text
+
+    def _generate_tpoint_entry_struct(self):
+        text = """
+struct %s {
+        u64 __do_not_use__;
+%s
+};
+        """
+        self.tp_entry_struct_name = self.probe_func_name + \
+                                    "_trace_entry"
+        fields = self._generate_tpoint_entry_struct_fields()
+        return text % (self.tp_entry_struct_name, fields)
+
+    def _generate_tpoint_entry_prefix(self):
+        text = """
+        u64 tid = bpf_get_current_pid_tgid();
+        u64 *di = __trace_di.lookup(&tid);
+        if (di == 0) { return 0; }
+        struct %s tp = {};
+        bpf_probe_read(&tp, sizeof(tp), (void *)*di);
+        """ % self.tp_entry_struct_name
+        return text
+
+    def generate_text(self):
         program = ""
         # If any entry arguments are probed in a ret probe, we need
         # to generate an entry probe to collect them
         prefix = ""
         if self.entry_probe_required:
-            program = self._generate_entry_probe()
-            prefix = self._generate_retprobe_prefix()
+            program += self._generate_entry_probe()
+            prefix += self._generate_retprobe_prefix()
             # Replace $entry(paramname) with a reference to the
             # value we collected when entering the function:
             self._replace_entry_exprs()
 
+        # If this is a tracepoint probe, generate a local variable
+        # that enables access to the tracepoint structure and also
+        # the structure definition itself
+        if self.probe_type == "t":
+            program += self._generate_tpoint_entry_struct()
+            prefix += self._generate_tpoint_entry_prefix()
+
         program += self.probe_text.replace("PROBENAME",
                                            self.probe_func_name)
         signature = "" if len(self.signature) == 0 \
-                       or self.is_ret_probe \
+                       or self.probe_type == "r" \
                        else ", " + self.signature
         program = program.replace("SIGNATURE", signature)
         program = program.replace("PID_FILTER",
@@ -364,8 +535,10 @@ u64 __time = bpf_ktime_get_ns();
     def attach(self, bpf):
         self.bpf = bpf
+        uprobes_start = len(BPF.open_uprobes())
+        kprobes_start = len(BPF.open_kprobes())
         if self.is_user:
-            if self.is_ret_probe:
+            if self.probe_type == "r":
                 bpf.attach_uretprobe(name=self.library,
                                      sym=self.function,
                                      fn_name=self.probe_func_name,
@@ -375,13 +548,17 @@ u64 __time = bpf_ktime_get_ns();
                                   sym=self.function,
                                   fn_name=self.probe_func_name,
                                   pid=self.pid or -1)
+            if len(BPF.open_uprobes()) != uprobes_start + 1:
+                self._bail("error attaching probe")
         else:
-            if self.is_ret_probe:
+            if self.probe_type == "r" or self.probe_type == "t":
                 bpf.attach_kretprobe(event=self.function,
                                      fn_name=self.probe_func_name)
             else:
                 bpf.attach_kprobe(event=self.function,
                                   fn_name=self.probe_func_name)
+            if len(BPF.open_kprobes()) != kprobes_start + 1:
+                self._bail("error attaching probe")
         if self.entry_probe_required:
             self._attach_entry_probe()
@@ -406,7 +583,7 @@ u64 __time = bpf_ktime_get_ns();
     def _display_key(self, key):
         if self.is_default_expr:
-            if not self.is_ret_probe:
+            if not self.probe_type == "r":
                 return "total calls"
             else:
                 return "retval = %s" % str(key.v0)
@@ -431,7 +608,7 @@ u64 __time = bpf_ktime_get_ns();
             # Print some nice values if the user didn't
             # specify an expression to probe
             if self.is_default_expr:
-                if not self.is_ret_probe:
+                if not self.probe_type == "r":
                     key_str = "total calls"
                 else:
                     key_str = "retval = %s" % \
@@ -448,13 +625,14 @@ u64 __time = bpf_ktime_get_ns();
 class Tool(object):
     examples = """
 Probe specifier syntax:
-        {p,r}:[library]:function(signature)[:type[,type...]:expr[,expr...][:filter]][#label]
+        {p,r,t}:{[library],category}:function(signature)[:type[,type...]:expr[,expr...][:filter]][#label]
 Where:
-        p,r        -- probe at function entry or at function exit
+        p,r,t      -- probe at function entry, function exit, or kernel tracepoint
                       in exit probes: can use $retval, $entry(param), $latency
         library    -- the library that contains the function
                       (leave empty for kernel functions)
-        function   -- the function name to trace
+        category   -- the category of the kernel tracepoint (e.g. net, sched)
+        function   -- the function name to trace (or tracepoint name)
         signature  -- the function's parameters, as in the C header
         type       -- the type of the expression to collect (supports multiple)
         expr       -- the expression to collect (supports multiple)
@@ -502,6 +680,12 @@ argdist -C 'p:c:fork()#fork calls'
         Count fork() calls in libc across all processes
         Can also use funccount.py, which is easier and more flexible
 
+argdist -H 't:block:block_rq_complete():u32:tp.nr_sector'
+        Print histogram of number of sectors in completing block I/O requests
+
+argdist -C 't:irq:irq_handler_entry():int:tp.irq'
+        Aggregate interrupts by interrupt request (IRQ)
+
 argdist -H \\
         'p:c:sleep(u32 seconds):u32:seconds' \\
         'p:c:nanosleep(struct timespec *req):long:req->tv_nsec'
@@ -555,7 +739,7 @@ argdist -p 2780 -z 120 \\
                 Specifier("hist", histspecifier, self.args.pid))
         if len(self.specifiers) == 0:
             print("at least one specifier is required")
-            exit(1)
+            exit()
 
     def _generate_program(self):
         bpf_source = """
@@ -567,6 +751,8 @@ struct __string_t { char s[%d]; };
             bpf_source += "#include <%s>\n" % include
         bpf_source += Specifier.generate_auto_includes(
             map(lambda s: s.raw_spec, self.specifiers))
+        bpf_source += Tracepoint.generate_decl()
+        bpf_source += Tracepoint.generate_entry_probe()
         for specifier in self.specifiers:
             bpf_source += specifier.generate_text()
         if self.args.verbose:
@@ -574,8 +760,12 @@ struct __string_t { char s[%d]; };
         self.bpf = BPF(text=bpf_source)
 
     def _attach(self):
+        Tracepoint.attach(self.bpf)
         for specifier in self.specifiers:
             specifier.attach(self.bpf)
+        if self.args.verbose:
+            print("open uprobes: %s" % BPF.open_uprobes())
+            print("open kprobes: %s" % BPF.open_kprobes())
 
     def _main_loop(self):
         count_so_far = 0
@@ -601,7 +791,7 @@ struct __string_t { char s[%d]; };
         except:
             if self.args.verbose:
                 traceback.print_exc()
-            else:
+            elif sys.exc_type is not SystemExit:
                 print(sys.exc_value)
 
 if __name__ == "__main__":
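The last hunk of tools/argdist.py relies on `sys.exc_type` and `sys.exc_value`, which exist only in Python 2 and are not available in Python 3. As a hedged illustration of the same intent (print a traceback when verbose, otherwise print the error unless it is a `SystemExit`), the sketch below binds the exception in the `except` clause instead. `run_guarded` is a hypothetical helper introduced here and is not part of the commit.

```python
import sys
import traceback


def run_guarded(action, verbose=False):
    """Call action() and report failures the way the main loop intends to.

    Returns the caught exception, or None if action() completed normally.
    """
    try:
        action()
    except BaseException as exc:  # as broad as the bare `except:` in the diff
        if verbose:
            traceback.print_exc()
        elif not isinstance(exc, SystemExit):
            print(exc)
        return exc
    return None


# A SystemExit is swallowed silently; other errors are printed.
run_guarded(lambda: sys.exit(0))
run_guarded(lambda: int("not a number"))
```

Binding the exception with `as exc` works in Python 2.6 and later as well as in Python 3, so the check does not depend on interpreter-global state.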
tools/argdist_example.txt
View file @
fd60d55c
...
@@ -262,6 +262,27 @@ p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size
 The flags value must be expanded by hand, but it's still helpful to eliminate
 certain kinds of allocations or visually group them together.

+argdist also has basic support for kernel tracepoints. It is sometimes more
+convenient to use tracepoints because they are documented and, unlike function
+signatures, don't vary much between kernel versions. For example, let's
+trace the net:net_dev_start_xmit tracepoint and print the name of the interface
+that is transmitting:
+
+# argdist -C 't:net:net_dev_start_xmit(void *a, void *b, struct net_device *c):char*:c->name' -n 2
+[05:01:10]
+t:net:net_dev_start_xmit(void *a, void *b, struct net_device *c):char*:c->name
+        COUNT      EVENT
+        4          c->name = eth0
+[05:01:11]
+t:net:net_dev_start_xmit(void *a, void *b, struct net_device *c):char*:c->name
+        COUNT      EVENT
+        6          c->name = lo
+        92         c->name = eth0
+
+Note that to determine the necessary function signature you need to look at the
+TP_PROTO declaration in the kernel headers. For example, the net_dev_start_xmit
+tracepoint is defined in the include/trace/events/net.h header file.
+
 Here's a final example that finds how many write() system calls are performed
 by each process on the system:
...
@@ -311,13 +332,13 @@ optional arguments:
                         additional header files to include in the BPF program

 Probe specifier syntax:
-        {p,r}:[library]:function(signature)[:type[,type...]:expr[,expr...][:filter]][#label]
+        {p,r,t}:{[library],category}:function(signature)[:type[,type...]:expr[,expr...][:filter]][#label]
 Where:
-        p,r        -- probe at function entry or at function exit
+        p,r,t      -- probe at function entry, function exit, or kernel tracepoint
                       in exit probes: can use $retval, $entry(param), $latency
         library    -- the library that contains the function
                       (leave empty for kernel functions)
-        function   -- the function name to trace
+        category   -- the category of the kernel tracepoint (e.g. net, sched)
         signature  -- the function's parameters, as in the C header
         type       -- the type of the expression to collect (supports multiple)
         expr       -- the expression to collect (supports multiple)
...
@@ -365,6 +386,12 @@ argdist -C 'p:c:fork()#fork calls'
         Count fork() calls in libc across all processes
         Can also use funccount.py, which is easier and more flexible

+argdist -H 't:block:block_rq_complete():u32:tp.nr_sector'
+        Print histogram of number of sectors in completing block I/O requests
+
+argdist -C 't:irq:irq_handler_entry():int:tp.irq'
+        Aggregate interrupts by interrupt request (IRQ)
+
 argdist -H \
         'p:c:sleep(u32 seconds):u32:seconds' \
         'p:c:nanosleep(struct timespec *req):long:req->tv_nsec'
...
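The probe specifier syntax documented in the argdist help text above can be illustrated with a small parser. This is only a sketch: `split_specifier` is a hypothetical helper, not argdist's actual parsing code, and it assumes that expressions and filters contain no `:` characters and that expressions contain no commas.

```python
import re

def split_specifier(spec):
    """Split an argdist-style probe specifier into named parts.

    Hypothetical helper for illustration only; argdist's real parser differs.
    """
    label = None
    if "#" in spec:
        spec, label = spec.split("#", 1)
    match = re.match(r'^([prt]):([^:]*):([^(]+)\(([^)]*)\)(?::(.*))?$', spec)
    if match is None:
        raise ValueError("unrecognized specifier: %r" % spec)
    probe_type, scope, function, signature, rest = match.groups()
    fields = rest.split(":") if rest else []
    # For 't' probes the second component is the tracepoint category;
    # for 'p' and 'r' probes it is the library (empty for the kernel).
    scope_key = "category" if probe_type == "t" else "library"
    return {
        "type": probe_type,
        scope_key: scope,
        "function": function,
        "signature": signature,
        "types": fields[0].split(",") if len(fields) > 0 else [],
        "exprs": fields[1].split(",") if len(fields) > 1 else [],
        "filter": fields[2] if len(fields) > 2 else None,
        "label": label,
    }

tp_spec = split_specifier(
    "t:net:net_dev_start_xmit(void *a, void *b, struct net_device *c)"
    ":char*:c->name")
fork_spec = split_specifier("p:c:fork()#fork calls")
```

Here `tp_spec` carries a `category` key because the specifier starts with `t`, while `fork_spec` carries a `library` key and a label.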
tools/tplist.py
0 → 100755
#!/usr/bin/env python
#
# tplist Display kernel tracepoints and their formats.
#
# USAGE: tplist [-v] [tracepoint]
#
# Licensed under the Apache License, Version 2.0 (the "License")
# Copyright (C) 2016 Sasha Goldshtein.

import argparse
import fnmatch
import re
import os

trace_root = "/sys/kernel/debug/tracing"
event_root = os.path.join(trace_root, "events")

parser = argparse.ArgumentParser(
    description="Display kernel tracepoints and their formats.",
    formatter_class=argparse.RawDescriptionHelpFormatter)
parser.add_argument("-v", dest="variables", action="store_true",
    help="Print the format (available variables) for each tracepoint")
parser.add_argument(dest="tracepoint", nargs="?",
    help="The tracepoint name to print (wildcards allowed)")
args = parser.parse_args()

def print_tpoint_format(category, event):
    fmt = open(os.path.join(event_root, category, event, "format")
               ).readlines()
    for line in fmt:
        match = re.search(r'field:([^;]*);', line)
        if match is None:
            continue
        parts = match.group(1).split()
        field_name = parts[-1:][0]
        field_type = " ".join(parts[:-1])
        if "__data_loc" in field_type:
            continue
        if field_name.startswith("common_"):
            continue
        print(" %s %s;" % (field_type, field_name))

def print_tpoint(category, event):
    tpoint = "%s:%s" % (category, event)
    if not args.tracepoint or fnmatch.fnmatch(tpoint, args.tracepoint):
        print(tpoint)
        if args.variables:
            print_tpoint_format(category, event)

def print_all():
    for category in os.listdir(event_root):
        cat_dir = os.path.join(event_root, category)
        if not os.path.isdir(cat_dir):
            continue
        for event in os.listdir(cat_dir):
            evt_dir = os.path.join(cat_dir, event)
            if os.path.isdir(evt_dir):
                print_tpoint(category, event)

if __name__ == "__main__":
    print_all()
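The field filtering and wildcard matching that tplist.py performs can be tried without access to /sys/kernel/debug. The sketch below applies the same regular expression and skip rules to a sample string. `SAMPLE_FORMAT` is made up to resemble a tracepoint format file (its ID, offsets and sizes are illustrative, not taken from a real kernel), and `parse_fields` and `matches` are hypothetical helper names.

```python
import fnmatch
import re

# Illustrative text shaped like an events/<category>/<event>/format file.
SAMPLE_FORMAT = """\
name: block_rq_complete
ID: 1012
format:
\tfield:unsigned short common_type;\toffset:0;\tsize:2;\tsigned:0;
\tfield:int common_pid;\toffset:4;\tsize:4;\tsigned:1;

\tfield:dev_t dev;\toffset:8;\tsize:4;\tsigned:0;
\tfield:sector_t sector;\toffset:16;\tsize:8;\tsigned:0;
\tfield:unsigned int nr_sector;\toffset:24;\tsize:4;\tsigned:0;
\tfield:__data_loc char[] cmd;\toffset:36;\tsize:4;\tsigned:1;
"""

def parse_fields(format_text):
    """Return (type, name) pairs, skipping common_* and __data_loc fields."""
    fields = []
    for line in format_text.splitlines():
        match = re.search(r'field:([^;]*);', line)
        if match is None:
            continue
        parts = match.group(1).split()
        field_type, field_name = " ".join(parts[:-1]), parts[-1]
        if "__data_loc" in field_type or field_name.startswith("common_"):
            continue
        fields.append((field_type, field_name))
    return fields

def matches(tpoint, pattern):
    """Mirror the tplist filter: no pattern means every tracepoint matches."""
    return not pattern or fnmatch.fnmatch(tpoint, pattern)

fields = parse_fields(SAMPLE_FORMAT)
```

The last whitespace-separated token of each `field:` entry is taken as the name and everything before it as the type, which is why multi-word types such as `unsigned int` survive intact.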
tools/trace.py
...
@@ -5,6 +5,7 @@
 #
 # USAGE: trace [-h] [-p PID] [-v] [-Z STRING_SIZE] [-S] [-M MAX_EVENTS] [-o]
 #              probe [probe ...]
+#
 # Licensed under the Apache License, Version 2.0 (the "License")
 # Copyright (C) 2016 Sasha Goldshtein.

...
@@ -13,6 +14,7 @@ from time import sleep, strftime
 import argparse
 import re
 import ctypes as ct
+import multiprocessing
 import os
 import traceback
 import sys
...
@@ -42,6 +44,122 @@ class Time(object):
             raise OSError(errno_, os.strerror(errno_))
         return t.tv_sec * 1e9 + t.tv_nsec

+class Perf(object):
+    class perf_event_attr(ct.Structure):
+        _fields_ = [
+            ('type', ct.c_uint),
+            ('size', ct.c_uint),
+            ('config', ct.c_ulong),
+            ('sample_period', ct.c_ulong),
+            ('sample_type', ct.c_ulong),
+            ('IGNORE1', ct.c_ulong),
+            ('IGNORE2', ct.c_ulong),
+            ('wakeup_events', ct.c_uint),
+            ('IGNORE3', ct.c_uint),
+            ('IGNORE4', ct.c_ulong),
+            ('IGNORE5', ct.c_ulong),
+            ('IGNORE6', ct.c_ulong),
+            ('IGNORE7', ct.c_uint),
+            ('IGNORE8', ct.c_int),
+            ('IGNORE9', ct.c_ulong),
+            ('IGNORE10', ct.c_uint),
+            ('IGNORE11', ct.c_uint)
+        ]
+
+    NR_PERF_EVENT_OPEN = 298
+    PERF_TYPE_TRACEPOINT = 2
+    PERF_SAMPLE_RAW = 1024
+    PERF_FLAG_FD_CLOEXEC = 8
+    PERF_EVENT_IOC_SET_FILTER = 1074275334
+    PERF_EVENT_IOC_ENABLE = 9216
+
+    libc = ct.CDLL('libc.so.6', use_errno=True)
+    syscall = libc.syscall          # not declaring vararg types
+    ioctl = libc.ioctl              # not declaring vararg types
+
+    @staticmethod
+    def _open_for_cpu(cpu, attr):
+        pfd = Perf.syscall(Perf.NR_PERF_EVENT_OPEN, ct.byref(attr),
+                           -1, cpu, -1, Perf.PERF_FLAG_FD_CLOEXEC)
+        if pfd < 0:
+            errno_ = ct.get_errno()
+            raise OSError(errno_, os.strerror(errno_))
+        if Perf.ioctl(pfd, Perf.PERF_EVENT_IOC_SET_FILTER,
+                      "common_pid == -17") < 0:
+            errno_ = ct.get_errno()
+            raise OSError(errno_, os.strerror(errno_))
+        if Perf.ioctl(pfd, Perf.PERF_EVENT_IOC_ENABLE, 0) < 0:
+            errno_ = ct.get_errno()
+            raise OSError(errno_, os.strerror(errno_))
+
+    @staticmethod
+    def perf_event_open(tpoint_id):
+        attr = Perf.perf_event_attr()
+        attr.config = tpoint_id
+        attr.type = Perf.PERF_TYPE_TRACEPOINT
+        attr.sample_type = Perf.PERF_SAMPLE_RAW
+        attr.sample_period = 1
+        attr.wakeup_events = 1
+        for cpu in range(0, multiprocessing.cpu_count()):
+            Perf._open_for_cpu(cpu, attr)
+
+class Tracepoint(object):
+    tracepoints_enabled = 0
+    trace_root = "/sys/kernel/debug/tracing"
+    event_root = os.path.join(trace_root, "events")
+
+    @staticmethod
+    def generate_decl():
+        if Tracepoint.tracepoints_enabled == 0:
+            return ""
+        return "\nBPF_HASH(__trace_di, u64, u64);\n"
+
+    @staticmethod
+    def generate_entry_probe():
+        if Tracepoint.tracepoints_enabled == 0:
+            return ""
+        return """
+int __trace_entry_update(struct pt_regs *ctx)
+{
+        u64 tid = bpf_get_current_pid_tgid();
+        u64 val = ctx->di;
+        __trace_di.update(&tid, &val);
+        return 0;
+}
+"""
+
+    @staticmethod
+    def enable_tracepoint(category, event):
+        tp_id = Tracepoint.get_tpoint_id(category, event)
+        if tp_id == -1:
+            raise ValueError("no such tracepoint found: %s:%s" %
+                             (category, event))
+        Perf.perf_event_open(tp_id)
+        Tracepoint.tracepoints_enabled += 1
+
+    @staticmethod
+    def get_tpoint_id(category, event):
+        evt_dir = os.path.join(Tracepoint.event_root, category, event)
+        try:
+            return int(
+                open(os.path.join(evt_dir, "id")).read().strip())
+        except:
+            return -1
+
+    @staticmethod
+    def get_tpoint_format(category, event):
+        evt_dir = os.path.join(Tracepoint.event_root, category, event)
+        try:
+            return open(os.path.join(evt_dir, "format")).readlines()
+        except:
+            return ""
+
+    @staticmethod
+    def attach(bpf):
+        if Tracepoint.tracepoints_enabled > 0:
+            bpf.attach_kprobe(event="tracing_generic_entry_update",
+                              fn_name="__trace_entry_update")
+
 class Probe(object):
     probe_count = 0
     max_events = None
...
@@ -72,6 +190,47 @@ class Probe(object):
     def is_default_action(self):
         return self.python_format == ""

+    def _generate_tpoint_entry_struct_fields(self):
+        format_lines = Tracepoint.get_tpoint_format(self.tp_category,
+                                                    self.tp_event)
+        text = ""
+        for line in format_lines:
+            match = re.search(r'field:([^;]*);.*size:(\d+);', line)
+            if match is None:
+                continue
+            parts = match.group(1).split()
+            field_name = parts[-1:][0]
+            field_type = " ".join(parts[:-1])
+            field_size = int(match.group(2))
+            if "__data_loc" in field_type:
+                continue
+            if field_name.startswith("common_"):
+                continue
+            text += "        %s %s;\n" % (field_type, field_name)
+        return text
+
+    def _generate_tpoint_entry_struct(self):
+        text = """
+struct %s {
+        u64 __do_not_use__;
+%s
+};
+        """
+        self.tp_entry_struct_name = self.probe_name + \
+                                    "_trace_entry"
+        fields = self._generate_tpoint_entry_struct_fields()
+        return text % (self.tp_entry_struct_name, fields)
+
+    def _generate_tpoint_entry_prefix(self):
+        text = """
+        u64 tid = bpf_get_current_pid_tgid();
+        u64 *di = __trace_di.lookup(&tid);
+        if (di == 0) { return 0; }
+        struct %s tp = {};
+        bpf_probe_read(&tp, sizeof(tp), (void *)*di);
+        """ % self.tp_entry_struct_name
+        return text
+
     def _bail(self, error):
         raise ValueError("error parsing probe '%s': %s" %
                          (self.raw_probe, error))
...
@@ -123,11 +282,19 @@ class Probe(object):
             parts = ["p", parts[0], parts[1]]
         if len(parts[0]) == 0:
             self.probe_type = "p"
-        elif parts[0] in ["p", "r"]:
+        elif parts[0] in ["p", "r", "t"]:
             self.probe_type = parts[0]
         else:
-            self._bail("expected '', 'p', or 'r', got '%s'" %
+            self._bail("expected '', 'p', 't', or 'r', got '%s'" %
                        parts[0])
+        if self.probe_type == "t":
+            self.tp_category = parts[1]
+            self.tp_event = parts[2]
+            Tracepoint.enable_tracepoint(self.tp_category,
+                                         self.tp_event)
+            self.library = ""       # kernel
+            self.function = "perf_trace_%s" % self.tp_event
+        else:
             self.library = parts[1]
             self.function = parts[2]

...
@@ -149,11 +316,16 @@ class Probe(object):
         if len(action) == 0:
             return

-        parts = action.split(',')
-        self.raw_format = parts[0]
+        action = action.strip()
+        match = re.search(r'(\".*\"),?(.*)', action)
+        if match is None:
+            self._bail("expected format string in \"s")
+
+        self.raw_format = match.group(1)
         self._parse_types(self.raw_format)
-        for part in parts[1:]:
+        for part in match.group(2).split(','):
             part = self._replace_args(part)
+            if len(part) > 0:
                 self.values.append(part)

         aliases = {
...
@@ -283,9 +455,15 @@ BPF_PERF_OUTPUT(%s);
         for i, expr in enumerate(self.values):
             data_fields += self._generate_field_assign(i)

+        prefix = ""
+        if self.probe_type == "t":
+            data_decl += self._generate_tpoint_entry_struct()
+            prefix = self._generate_tpoint_entry_prefix()
+
         text = """
 int %s(struct pt_regs *ctx)
 {
         %s
+        %s
         if (!(%s)) return 0;

...
@@ -298,7 +476,7 @@ int %s(struct pt_regs *ctx)
         return 0;
 }
 """
-        text = text % (self.probe_name, pid_filter,
+        text = text % (self.probe_name, pid_filter, prefix,
                        self.filter, self.struct_name,
                        data_fields, self.events_name)
...
@@ -308,6 +486,31 @@ int %s(struct pt_regs *ctx)
     def _time_off_str(cls, timestamp_ns):
         return "%.6f" % (1e-9 * (timestamp_ns - cls.first_ts))

+    auto_includes = {
+        "linux/time.h": ["time"],
+        "linux/fs.h": ["fs", "file"],
+        "linux/blkdev.h": ["bio", "request"],
+        "linux/slab.h": ["alloc"],
+        "linux/netdevice.h": ["sk_buff"]
+    }
+
+    @classmethod
+    def generate_auto_includes(cls, probes):
+        headers = ""
+        for header, keywords in cls.auto_includes.items():
+            for keyword in keywords:
+                for probe in probes:
+                    if keyword in probe:
+                        headers += "#include <%s>\n" \
+                                   % header
+        return headers
+
+    def _display_function(self):
+        if self.probe_type != 't':
+            return self.function
+        else:
+            return self.function.replace("perf_trace_", "")
+
     def print_event(self, cpu, data, size):
         # Cast as the generated structure type and display
         # according to the format string in the probe.
...
@@ -318,7 +521,8 @@ int %s(struct pt_regs *ctx)
         time = strftime("%H:%M:%S") if Probe.use_localtime else \
                Probe._time_off_str(event.timestamp_ns)
         print("%-8s %-6d %-12s %-16s %s" % \
-            (time[:8], event.pid, event.comm[:12], self.function, msg))
+            (time[:8], event.pid, event.comm[:12],
+             self._display_function(), msg))

         Probe.event_count += 1
         if Probe.max_events is not None and \
...
@@ -337,7 +541,7 @@ int %s(struct pt_regs *ctx)
         if self.probe_type == "r":
             bpf.attach_kretprobe(event=self.function,
                                  fn_name=self.probe_name)
-        elif self.probe_type == "p":
+        elif self.probe_type == "p" or self.probe_type == "t":
             bpf.attach_kprobe(event=self.function,
                               fn_name=self.probe_name)
...
@@ -384,6 +588,8 @@ trace 'r::__kmalloc (retval == 0) "kmalloc failed!"
         Trace returns from __kmalloc which returned a null pointer
 trace 'r:c:malloc (retval) "allocated = %p", retval
         Trace returns from malloc and print non-NULL allocated buffers
+trace 't:block:block_rq_complete "sectors=%d", tp.nr_sector'
+        Trace the block_rq_complete kernel tracepoint and print # of tx sectors
 """

     def __init__(self):
...
@@ -420,6 +626,10 @@ trace 'r:c:malloc (retval) "allocated = %p", retval
 #include <linux/sched.h>        /* For TASK_COMM_LEN */

 """
+        self.program += Probe.generate_auto_includes(
+                map(lambda p: p.raw_probe, self.probes))
+        self.program += Tracepoint.generate_decl()
+        self.program += Tracepoint.generate_entry_probe()
         for probe in self.probes:
             self.program += probe.generate_program(
                     self.args.pid or -1, self.args.include_self)
...
@@ -429,6 +639,7 @@ trace 'r:c:malloc (retval) "allocated = %p", retval

     def _attach_probes(self):
         self.bpf = BPF(text=self.program)
+        Tracepoint.attach(self.bpf)
         for probe in self.probes:
             if self.args.verbose:
                 print(probe)
...
@@ -455,7 +666,7 @@ trace 'r:c:malloc (retval) "allocated = %p", retval
         except:
             if self.args.verbose:
                 traceback.print_exc()
-            else:
+            elif sys.exc_type is not SystemExit:
                 print(sys.exc_value)

 if __name__ == "__main__":
...
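The numeric constants hard-coded in the `Perf` class of tools/trace.py can be cross-checked against the generic Linux ioctl number encoding. The sketch below assumes the asm-generic layout (as used on x86-64) and an 8-byte pointer; `_ioc` is a hypothetical helper written for this illustration. Architectures with a different ioctl encoding would give different values, and the syscall number 298 for perf_event_open is likewise specific to x86-64.

```python
def _ioc(direction, type_char, nr, size):
    # asm-generic ioctl layout: bits 0-7 command number, bits 8-15 type
    # character, bits 16-29 argument size, bits 30-31 direction.
    return (direction << 30) | (size << 16) | (ord(type_char) << 8) | nr

IOC_NONE, IOC_WRITE = 0, 1
POINTER_SIZE = 8  # assumption: 64-bit userspace, sizeof(char *) == 8

# perf ioctls use the type character '$'.
PERF_EVENT_IOC_ENABLE = _ioc(IOC_NONE, '$', 0, 0)
PERF_EVENT_IOC_SET_FILTER = _ioc(IOC_WRITE, '$', 6, POINTER_SIZE)

# Bit flags from the perf_event_open interface.
PERF_SAMPLE_RAW = 1 << 10
PERF_FLAG_FD_CLOEXEC = 1 << 3

print(PERF_EVENT_IOC_ENABLE, PERF_EVENT_IOC_SET_FILTER)
```

Deriving the values this way makes the portability assumption explicit, whereas the literal numbers in the commit silently tie the tool to one architecture.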
tools/trace_example.txt
...
@@ -80,6 +80,31 @@ Note that the retval variable must be cast to int before comparing to zero.
 The reason is that the default type for argN and retval is an unsigned 64-bit
 integer, which can never be smaller than 0.

+trace also has basic support for kernel tracepoints. For example, let's
+trace the block:block_rq_complete tracepoint and print out the number of sectors
+transferred:
+
+# trace 't:block:block_rq_complete "sectors=%d", tp.nr_sector'
+TIME     PID    COMM         FUNC             -
+01:23:51 0      swapper/0    block_rq_complete sectors=8
+01:23:55 10017  kworker/u64: block_rq_complete sectors=1
+01:23:55 0      swapper/0    block_rq_complete sectors=8
+^C
+
+To discover the tracepoint structure format (which you can refer to as the "tp"
+variable), use the tplist tool. For example:
+
+# tplist -v block:block_rq_complete
+block:block_rq_complete
+    dev_t dev;
+    sector_t sector;
+    unsigned int nr_sector;
+    int errors;
+    char rwbs[8];
+
+This output tells you that you can use "tp.dev", "tp.sector", etc. in your
+predicate and trace arguments.
+
 As a final example, let's trace open syscalls for a specific process. By
 default, tracing is system-wide, but the -p switch overrides this:
...
@@ -144,4 +169,6 @@ trace 'r::__kmalloc (retval == 0) "kmalloc failed!"
         Trace returns from __kmalloc which returned a null pointer
 trace 'r:c:malloc (retval) "allocated = %p", retval
         Trace returns from malloc and print non-NULL allocated buffers
+trace 't:block:block_rq_complete "sectors=%d", tp.nr_sector'
+        Trace the block_rq_complete kernel tracepoint and print # of tx sectors
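To make the "tp" variable more concrete, the following sketch shows how a field list like the one printed by `tplist -v` could be turned into a C struct declaration shaped like the one trace.py generates for tracepoint probes. `build_tp_struct` and the struct name `probe_trace_entry` are hypothetical names for this illustration. The leading `u64 __do_not_use__` member is copied from the commit, where it appears to stand in for the skipped `common_*` fields; that reading is an inference, not something the commit states.

```python
def build_tp_struct(struct_name, fields):
    """Render (type, name) pairs as a C struct, one member per line."""
    members = "".join("        %s %s;\n" % (ftype, fname)
                      for ftype, fname in fields)
    return ("struct %s {\n"
            "        u64 __do_not_use__;\n"
            "%s"
            "};\n") % (struct_name, members)

# Fields as listed by tplist for block:block_rq_complete in the example above.
fields = [
    ("dev_t", "dev"),
    ("sector_t", "sector"),
    ("unsigned int", "nr_sector"),
    ("int", "errors"),
    ("char", "rwbs[8]"),
]
struct_text = build_tp_struct("probe_trace_entry", fields)
print(struct_text)
```

An expression such as `tp.nr_sector` in a probe then refers to a member of a local variable of this struct type, which the generated BPF program fills in before evaluating the predicate.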