Commit 3e39a08a authored by Sasha Goldshtein's avatar Sasha Goldshtein

argdist, trace, and tplist support for USDT probes

These tools now support USDT probes with the 'u:provider:probe' syntax.
Probes in a library or process can be listed with 'tplist -l LIB' or 'tplist -p PID'.
Probe arguments are also parsed and available in both argdist and trace as arg1,
arg2, etc., regardless of the probe attach location.

The same USDT probe can be used at multiple locations, which means the attach infra-
structure must probe all these locations. argdist and trace register thunk probes
at each location, which call a central probe function (which is static inline) with
the location id (__loc_id). The central probe function checks the location id to
determine how the arguments should be retrieved -- this is location-dependent.

Finally, some USDT probes must be enabled first by writing a value to a memory
location (this is called a "semaphore"). This value is per-process, so we require a
process id for this kind of probes.

Along with trace and argdist tool support, this commit also introduces new classes
in the bcc module: ProcStat handles pid-wrap detection, whereas USDTReader,
USDTProbe, USDTProbeLocation, and USDTArgument are the shared USDT-related
infrastructure that enables enumeration, attachment, and argument retrieval for
USDT probes.
parent af66546d
......@@ -109,7 +109,7 @@ Examples:
- tools/[tcpconnect](tools/tcpconnect.py): Trace TCP active connections (connect()). [Examples](tools/tcpconnect_example.txt).
- tools/[tcpconnlat](tools/tcpconnlat.py): Trace TCP active connection latency (connect()). [Examples](tools/tcpconnlat_example.txt).
- tools/[tcpretrans](tools/tcpretrans.py): Trace TCP retransmits and TLPs. [Examples](tools/tcpretrans_example.txt).
- tools/[tplist](tools/tplist.py): Display kernel tracepoints and their format.
- tools/[tplist](tools/tplist.py): Display kernel tracepoints or USDT probes and their formats. [Examples](tools/tplist_example.txt).
- tools/[trace](tools/trace.py): Trace arbitrary functions, with filters. [Examples](tools/trace_example.txt)
- tools/[vfscount](tools/vfscount.py) tools/[vfscount.c](tools/vfscount.c): Count VFS calls. [Examples](tools/vfscount_example.txt).
- tools/[vfsstat](tools/vfsstat.py) tools/[vfsstat.c](tools/vfsstat.c): Count some VFS calls, with column output. [Examples](tools/vfsstat_example.txt).
......
......@@ -50,11 +50,11 @@ many cases, argdist will deduce the necessary header files automatically.
.SH SPECIFIER SYNTAX
The general specifier syntax is as follows:
.B {p,r,t}:{[library],category}:function(signature)[:type[,type...]:expr[,expr...][:filter]][#label]
.B {p,r,t,u}:{[library],category}:function(signature)[:type[,type...]:expr[,expr...][:filter]][#label]
.TP
.B {p,r,t}
.B {p,r,t,u}
Probe type \- "p" for function entry, "r" for function return, "t" for kernel
tracepoint; \-H for histogram collection, \-C for frequency count.
tracepoint, "u" for USDT probe; \-H for histogram collection, \-C for frequency count.
Indicates where to place the probe and whether the probe should collect frequency
count information, or aggregate the collected values into a histogram. Counting
probes will collect the number of times every parameter value was observed,
......@@ -78,7 +78,9 @@ on the other hand, is only required if you plan to collect parameter values
based on that signature. For example, if you only want to collect the first
parameter, you don't have to specify the rest of the parameters in the signature.
When capturing kernel tracepoints, this should be the name of the event, e.g.
net_dev_start_xmit. The signature for kernel tracepoints should be empty.
net_dev_start_xmit. The signature for kernel tracepoints should be empty. When
capturing USDT probes, this should be the name of the probe, e.g. reloc_complete.
The signature for USDT probes should be empty.
.TP
.B [type[,type...]]
The type(s) of the expression(s) to capture.
......@@ -94,6 +96,8 @@ Tracepoints may access a special structure called "tp" that is formatted accordi
to the tracepoint format (which you can obtain using tplist). For example, the
block:block_rq_complete tracepoint can access tp.nr_sector. You may also use the
members of the "tp" struct directly, e.g. "nr_sector" instead of "tp.nr_sector".
USDT probes may access the arguments defined by the tracing program in the
special arg1, arg2, ... variables. To obtain their types, use the tplist tool.
Return probes can use the argument values received by the
function when it was entered, through the $entry(paramname) special variable.
Return probes can also access the function's return value in $retval, and the
......@@ -154,6 +158,10 @@ Aggregate interrupts by interrupt request (IRQ):
#
.B argdist -C 't:irq:irq_handler_entry():int:irq'
.TP
Print the functions used as thread entry points and how common they are:
#
.B argdist -C 'u:pthread:pthread_start():u64:arg2' -p 1337
.TP
Print histograms of sleep() and nanosleep() parameter values:
#
.B argdist -H 'p:c:sleep(u32 seconds):u32:seconds' 'p:c:nanosleep(struct timespec *req):long:req->tv_nsec'
......
.TH tplist 8 "2016-03-20" "USER COMMANDS"
.SH NAME
tplist \- Display kernel tracepoints and their format.
tplist \- Display kernel tracepoints or USDT probes and their formats.
.SH SYNOPSIS
.B tplist [-v] [tracepoint]
.B tplist [-p PID] [-l LIB] [-v] [filter]
.SH DESCRIPTION
tplist lists all kernel tracepoints, and can optionally print out the tracepoint
format; namely, the variables that you can trace when the tracepoint is hit. This
is usually used in conjunction with the argdist and/or trace tools.
format; namely, the variables that you can trace when the tracepoint is hit.
tplist can also list USDT probes embedded in a specific library or executable,
and can list USDT probes for all the libraries loaded by a specific process.
These features are usually used in conjunction with the argdist and/or trace tools.
On a typical system, accessing the tracepoint list and format requires root.
However, accessing USDT probes does not require root.
.SH OPTIONS
.TP
\-p PID
Display the USDT probes from all the libraries loaded by the specified process.
.TP
\-l LIB
Display the USDT probes from the specified library or executable. If the librar
or executable can be found in the standard paths, a full path is not required.
.TP
\-v
Display the variables associated with the tracepoint or tracepoints.
Display the variables associated with the tracepoint or USDT probe.
.TP
[tracepoint]
A wildcard expression that specifies which tracepoints to print. For example,
block:* will print all block tracepoints (block:block_rq_complete, etc.).
Regular expressions are not supported.
[filter]
A wildcard expression that specifies which tracepoints or probes to print.
For example, block:* will print all block tracepoints (block:block_rq_complete,
etc.). Regular expressions are not supported.
.SH EXAMPLES
.TP
Print all kernel tracepoints:
......@@ -27,6 +37,14 @@ Print all kernel tracepoints:
Print all net tracepoints with their format:
#
.B tplist -v 'net:*'
.TP
Print all USDT probes in libpthread:
$
.B tplist -l pthread
.TP
Print all USDT probes in process 4717 from the libc provider:
$
.B tplist -p 4717 'libc:*'
.SH SOURCE
This is from bcc.
.IP
......
......@@ -46,11 +46,11 @@ The general probe syntax is as follows:
.B [{p,r}]:[library]:function [(predicate)] ["format string"[, arguments]]
.B t:category:event [(predicate)] ["format string"[, arguments]]
.B {t:category:event,u:library:probe} [(predicate)] ["format string"[, arguments]]
.TP
.B {[{p,r}],t}
.B {[{p,r}],t,u}
Probe type \- "p" for function entry, "r" for function return, "t" for kernel
tracepoint. The default probe type is "p".
tracepoint, "u" for USDT probe. The default probe type is "p".
.TP
.B [library]
Library containing the probe.
......@@ -69,6 +69,9 @@ The function to probe.
.B event
The tracepoint event. For example, "block_rq_complete".
.TP
.B probe
The USDT probe name. For example, "pthread_create".
.TP
.B [(predicate)]
The filter applied to the captured data. Only if the filter evaluates as true,
the trace message will be printed. The filter can use any valid C expression
......@@ -96,6 +99,9 @@ discover the format of your tracepoint, use the tplist tool. Note that you can
also use the members of the "tp" struct directly, e.g "nr_sector" instead of
"tp.nr_sector".
In USDT probes, the arg1, ..., argN variables refer to the probe's arguments.
To determine which arguments your probe has, use the tplist tool.
The predicate expression and the format specifier replacements for printing
may also use the following special keywords: $pid, $tgid to refer to the
current process' pid and tgid; $uid, $gid to refer to the current user's
......@@ -121,6 +127,10 @@ Trace returns from the readline function in bash and print the return value as a
Trace the block:block_rq_complete tracepoint and print the number of sectors completed:
#
.B trace 't:block:block_rq_complete """%d sectors"", nr_sector'
.TP
Trace the pthread_create USDT probe from the pthread library and print the address of the thread's start function:
#
.B trace 'u:pthread:pthread_create """start addr = %llx"", arg3'
.SH SOURCE
This is from bcc.
.IP
......
......@@ -26,6 +26,7 @@ import sys
basestring = (unicode if sys.version_info[0] < 3 else str)
from .libbcc import lib, _CB_TYPE
from .procstat import ProcStat
from .table import Table
from .tracepoint import Perf, Tracepoint
from .usyms import ProcessSymbols
......@@ -341,7 +342,7 @@ class BPF(object):
desc.encode("ascii"), pid, cpu, group_fd,
self._reader_cb_impl, ct.cast(id(self), ct.py_object))
res = ct.cast(res, ct.c_void_p)
if res.value is None:
if res == None:
raise Exception("Failed to attach BPF to kprobe")
open_kprobes[ev_name] = res
return self
......@@ -389,7 +390,7 @@ class BPF(object):
desc.encode("ascii"), pid, cpu, group_fd,
self._reader_cb_impl, ct.cast(id(self), ct.py_object))
res = ct.cast(res, ct.c_void_p)
if res.value is None:
if res == None:
raise Exception("Failed to attach BPF to kprobe")
open_kprobes[ev_name] = res
return self
......@@ -513,7 +514,7 @@ class BPF(object):
desc.encode("ascii"), pid, cpu, group_fd,
self._reader_cb_impl, ct.cast(id(self), ct.py_object))
res = ct.cast(res, ct.c_void_p)
if res.value is None:
if res == None:
raise Exception("Failed to attach BPF to uprobe")
open_uprobes[ev_name] = res
return self
......@@ -557,7 +558,7 @@ class BPF(object):
desc.encode("ascii"), pid, cpu, group_fd,
self._reader_cb_impl, ct.cast(id(self), ct.py_object))
res = ct.cast(res, ct.c_void_p)
if res.value is None:
if res == None:
raise Exception("Failed to attach BPF to uprobe")
open_uprobes[ev_name] = res
return self
......@@ -793,3 +794,5 @@ class BPF(object):
except KeyboardInterrupt:
exit()
from .usdt import USDTReader
# Copyright 2016 Sasha Goldshtein
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
class ProcStat(object):
def __init__(self, pid):
self.pid = pid
self.exe = self._get_exe()
self.start_time = self._get_start_time()
def is_stale(self):
return self.exe != self._get_exe() or \
self.start_time != self._get_start_time()
def _get_exe(self):
return os.popen("readlink -f /proc/%d/exe" % self.pid).read()
def _get_start_time(self):
return os.popen("cut -d' ' -f 22 /proc/%d/stat" %
self.pid).read()
This diff is collapsed.
......@@ -27,16 +27,7 @@ class ProcessSymbols(object):
def refresh_code_ranges(self):
self.code_ranges = self._get_code_ranges()
self.ranges_cache = {}
self.exe = self._get_exe()
self.start_time = self._get_start_time()
def _get_exe(self):
return ProcessSymbols._run_command_get_output(
"readlink -f /proc/%d/exe" % self.pid)
def _get_start_time(self):
return ProcessSymbols._run_command_get_output(
"cut -d' ' -f 22 /proc/%d/stat" % self.pid)
self.procstat = ProcStat(self.pid)
@staticmethod
def _is_binary_segment(parts):
......@@ -101,10 +92,7 @@ class ProcessSymbols(object):
return "%x" % offset
def _check_pid_wrap(self):
# If the pid wrapped, our exe name and start time must have changed.
# Detect this and get rid of the cached ranges.
if self.exe != self._get_exe() or \
self.start_time != self._get_start_time():
if self.procstat.is_stale():
self.refresh_code_ranges()
def decode_addr(self, addr):
......@@ -127,3 +115,4 @@ class ProcessSymbols(object):
binary)
return "%x" % addr
from . import ProcStat
This diff is collapsed.
......@@ -332,9 +332,10 @@ optional arguments:
additional header files to include in the BPF program
Probe specifier syntax:
{p,r,t}:{[library],category}:function(signature)[:type[,type...]:expr[,expr...][:filter]][#label]
{p,r,t,u}:{[library],category}:function(signature)[:type[,type...]:expr[,expr...][:filter]][#label]
Where:
p,r,t -- probe at function entry, function exit, or kernel tracepoint
p,r,t,u -- probe at function entry, function exit, kernel tracepoint,
or USDT probe
in exit probes: can use $retval, $entry(param), $latency
library -- the library that contains the function
(leave empty for kernel functions)
......@@ -392,6 +393,10 @@ argdist -H 't:block:block_rq_complete():u32:tp.nr_sector'
argdist -C 't:irq:irq_handler_entry():int:tp.irq'
Aggregate interrupts by interrupt request (IRQ)
argdist -C 'u:pthread:pthread_start():u64:arg2' -p 1337
Print frequency of function addresses used as a pthread start function,
relying on the USDT pthread_start probe in process 1337
argdist -H \
'p:c:sleep(u32 seconds):u32:seconds' \
'p:c:nanosleep(struct timespec *req):long:req->tv_nsec'
......
#!/usr/bin/env python
#
# tplist Display kernel tracepoints and their formats.
# tplist Display kernel tracepoints or USDT probes and their formats.
#
# USAGE: tplist [-v] [tracepoint]
# USAGE: tplist [-p PID] [-l LIB] [-v] [filter]
#
# Licensed under the Apache License, Version 2.0 (the "License")
# Copyright (C) 2016 Sasha Goldshtein.
import argparse
import fnmatch
import re
import os
import re
import sys
from bcc import USDTReader
trace_root = "/sys/kernel/debug/tracing"
event_root = os.path.join(trace_root, "events")
parser = argparse.ArgumentParser(description=
"Display kernel tracepoints and their formats.",
"Display kernel tracepoints or USDT probes and their formats.",
formatter_class=argparse.RawDescriptionHelpFormatter)
parser.add_argument("-p", "--pid", type=int, default=-1, help=
"List USDT probes in the specified process")
parser.add_argument("-l", "--lib", default="", help=
"List USDT probes in the specified library or executable")
parser.add_argument("-v", dest="variables", action="store_true", help=
"Print the format (available variables) for each tracepoint")
parser.add_argument(dest="tracepoint", nargs="?",
help="The tracepoint name to print (wildcards allowed)")
"Print the format (available variables)")
parser.add_argument(dest="filter", nargs="?", help=
"A filter that specifies which probes/tracepoints to print")
args = parser.parse_args()
def print_tpoint_format(category, event):
......@@ -42,12 +49,12 @@ def print_tpoint_format(category, event):
def print_tpoint(category, event):
tpoint = "%s:%s" % (category, event)
if not args.tracepoint or fnmatch.fnmatch(tpoint, args.tracepoint):
if not args.filter or fnmatch.fnmatch(tpoint, args.filter):
print(tpoint)
if args.variables:
print_tpoint_format(category, event)
def print_all():
def print_tracepoints():
for category in os.listdir(event_root):
cat_dir = os.path.join(event_root, category)
if not os.path.isdir(cat_dir):
......@@ -57,5 +64,28 @@ def print_all():
if os.path.isdir(evt_dir):
print_tpoint(category, event)
def print_usdt(pid, lib):
reader = USDTReader(bin_path=lib, pid=pid)
probes_seen = []
for probe in reader.probes:
probe_name = "%s:%s" % (probe.provider, probe.name)
if not args.filter or fnmatch.fnmatch(probe_name, args.filter):
if probe_name in probes_seen:
continue
probes_seen.append(probe_name)
if args.variables:
print(probe.display_verbose())
else:
print("%s %s:%s" % (probe.bin_path,
probe.provider, probe.name))
if __name__ == "__main__":
print_all()
try:
if args.pid != -1 or args.lib != "":
print_usdt(args.pid, args.lib)
else:
print_tracepoints()
except:
if sys.exc_type is not SystemExit:
print(sys.exc_value)
Demonstrations of tplist.
tplist displays kernel tracepoints and USDT probes, including their
format. It can be used to discover probe points for use with the trace
and argdist tools. Kernel tracepoints are scattered around the kernel
and provide valuable static tracing on block and network I/O, scheduling,
power events, and many other subjects. USDT probes are placed in libraries
(such as libc) and executables (such as node) and provide static tracing
information that can (optionally) be turned on and off at runtime.
For example, suppose you want to discover which USDT probes a particular
executable contains. Just run tplist on that executable (or library):
$ tplist -l basic_usdt
/home/vagrant/basic_usdt basic_usdt:start_main
/home/vagrant/basic_usdt basic_usdt:loop_iter
/home/vagrant/basic_usdt basic_usdt:end_main
The loop_iter probe sounds interesting. What are the locations of that
probe, and which variables are available?
$ tplist '*loop_iter' -l basic_usdt -v
/home/vagrant/basic_usdt basic_usdt:loop_iter [sema 0x601036]
location 0x400550 raw args: -4@$42 8@%rax
4 signed bytes @ constant 42
8 unsigned bytes @ register %rax
location 0x40056f raw args: 8@-8(%rbp) 8@%rax
8 unsigned bytes @ -8(%rbp)
8 unsigned bytes @ register %rax
This output indicates that the loop_iter probe is used in two locations
in the basic_usdt executable. The first location passes a constant value,
42, to the probe. The second location passes a variable value located at
an offset from the %rbp register. Don't worry -- you don't have to trace
the register values yourself. The argdist and trace tools understand the
probe format and can print out the arguments automatically -- you can
refer to them as arg1, arg2, and so on.
Try to explore with some common libraries on your system and see if they
contain UDST probes. Here are two examples you might find interesting:
$ tplist -l pthread # list probes in libpthread
/lib64/libpthread.so.0 libpthread:pthread_start
/lib64/libpthread.so.0 libpthread:pthread_create
/lib64/libpthread.so.0 libpthread:pthread_join
/lib64/libpthread.so.0 libpthread:pthread_join_ret
/lib64/libpthread.so.0 libpthread:mutex_init
... more output truncated
$ tplist -l c # list probes in libc
/lib64/libc.so.6 libc:setjmp
/lib64/libc.so.6 libc:longjmp
/lib64/libc.so.6 libc:longjmp_target
/lib64/libc.so.6 libc:memory_arena_reuse_free_list
/lib64/libc.so.6 libc:memory_heap_new
... more output truncated
tplist also understands kernel tracepoints, and can list their format
as well. For example, let's look for all block I/O-related tracepoints:
# tplist 'block*'
block:block_touch_buffer
block:block_dirty_buffer
block:block_rq_abort
block:block_rq_requeue
block:block_rq_complete
block:block_rq_insert
block:block_rq_issue
block:block_bio_bounce
block:block_bio_complete
block:block_bio_backmerge
block:block_bio_frontmerge
block:block_bio_queue
block:block_getrq
block:block_sleeprq
block:block_plug
block:block_unplug
block:block_split
block:block_bio_remap
block:block_rq_remap
The block:block_rq_complete tracepoints sounds interesting. Let's print
its format to see what we can trace with argdist and trace:
$ tplist -v block:block_rq_complete
block:block_rq_complete
dev_t dev;
sector_t sector;
unsigned int nr_sector;
int errors;
char rwbs[8];
The dev, sector, nr_sector, etc. variables can now all be used in probes
you specify with argdist or trace.
USAGE message:
$ tplist -h
usage: tplist.py [-h] [-p PID] [-l LIB] [-v] [filter]
Display kernel tracepoints or USDT probes and their formats.
positional arguments:
filter A filter that specifies which probes/tracepoints to print
optional arguments:
-h, --help show this help message and exit
-p PID, --pid PID List USDT probes in the specified process
-l LIB, --lib LIB List USDT probes in the specified library or executable
-v Print the format (available variables)
This diff is collapsed.
......@@ -171,4 +171,6 @@ trace 'r:c:malloc (retval) "allocated = %p", retval
Trace returns from malloc and print non-NULL allocated buffers
trace 't:block:block_rq_complete "sectors=%d", tp.nr_sector'
Trace the block_rq_complete kernel tracepoint and print # of tx sectors
trace 'u:pthread:pthread_create (arg4 != 0)'
Trace the USDT probe pthread_create when its 4th argument is non-zero
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment