Commit 3e39a08a authored by Sasha Goldshtein

argdist, trace, and tplist support for USDT probes

These tools now support USDT probes with the 'u:provider:probe' syntax.
Probes in a library or process can be listed with 'tplist -l LIB' or 'tplist -p PID'.
Probe arguments are also parsed and available in both argdist and trace as arg1,
arg2, etc., regardless of the probe attach location.
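
As a rough illustration of the argument parsing mentioned above, the sketch below handles two of the descriptor forms that appear in USDT notes on x86-64: a plain register (`8@%rdi`) and a register-relative dereference (`-4@-8(%rbp)`). The `parse_usdt_arg` helper and the `ParsedArg` tuple are hypothetical names made up for this example and are not part of bcc; constants and `%rip`-relative globals are deliberately left out.

```python
import re
from collections import namedtuple

# Hypothetical container for this sketch; bcc has its own USDTArgument class.
ParsedArg = namedtuple("ParsedArg", "size signed register offset")

# Accepts [-]SIZE@[OFFSET](%reg) or [-]SIZE@%reg; a leading '-' means signed.
_ARG_RE = re.compile(r'^(-?)(\d+)@(?:(-?\d*)\((%\w+)\)|(%\w+))$')

def parse_usdt_arg(text):
    m = _ARG_RE.match(text)
    if m is None:
        raise ValueError("unrecognized argument format: '%s'" % text)
    signed = m.group(1) == '-'
    size = int(m.group(2))
    if m.group(5) is not None:
        # Plain register: the value is read directly, no dereference.
        return ParsedArg(size, signed, m.group(5), None)
    # Dereference: the offset is optional, as in 4@(%rax).
    offset = int(m.group(3)) if m.group(3) not in ("", "-") else 0
    return ParsedArg(size, signed, m.group(4), offset)

print(parse_usdt_arg("8@%rdi"))
print(parse_usdt_arg("-4@-8(%rbp)"))
```

A register argument maps to a direct read from the saved registers, while a dereference needs an extra memory read at register plus offset, which is why the two cases are kept distinct.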

The same USDT probe can be used at multiple locations, which means the attach
infrastructure must probe all these locations. argdist and trace register thunk
probes at each location, which call a central probe function (which is static
inline) with the location id (__loc_id). The central probe function checks the
location id to determine how the arguments should be retrieved -- this is
location-dependent.
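
The thunk arrangement can be sketched as a small text generator. The names `make_thunks` and `my_probe` are hypothetical and chosen for this example; the generated C follows the shape described above (one thunk per location, each calling the central function with its location id), but it is not claimed to be the exact text the tools emit.

```python
def make_thunks(central_name, num_locations):
    """Return (thunk_names, c_text) for num_locations attach sites."""
    names = []
    parts = []
    for loc_id in range(num_locations):
        name = "%s_thunk_%d" % (central_name, loc_id)
        names.append(name)
        # Each thunk only forwards ctx plus its own location id.
        parts.append(
            "int %s(struct pt_regs *ctx) {\n"
            "    return %s(ctx, %d);\n"
            "}\n" % (name, central_name, loc_id))
    return names, "\n".join(parts)

names, text = make_thunks("my_probe", 2)
print(text)
```

Each name in `names` would then be attached at the matching location address, so the central function can tell from `__loc_id` which argument layout applies.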

Finally, some USDT probes must be enabled first by writing a value to a memory
location (this is called a "semaphore"). This value is per-process, so we require a
process id for this kind of probe.
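
A minimal sketch of the semaphore update, assuming the semaphore is an unsigned 16-bit counter (the code in this commit reads and writes two bytes with the `"H"` struct format). The helper name `add_to_semaphore` is made up for this example, and an in-memory `io.BytesIO` buffer stands in for `/proc/PID/mem` so that the sketch runs without privileges.

```python
import io
import struct

def add_to_semaphore(mem, addr, delta):
    """Add delta to the 16-bit counter at addr in a seekable, writable
    binary file object and return the new value."""
    mem.seek(addr)
    (prev,) = struct.unpack("H", mem.read(2))
    mem.seek(addr)
    mem.write(struct.pack("H", prev + delta))
    return prev + delta

# Stand-in for the target process memory (an assumption for this sketch).
fake_mem = io.BytesIO(b"\x00" * 16)
add_to_semaphore(fake_mem, 8, +1)   # first tracer enables the probe
add_to_semaphore(fake_mem, 8, +1)   # second tracer enables it as well
add_to_semaphore(fake_mem, 8, -1)   # first tracer detaches
print(struct.unpack("H", fake_mem.getvalue()[8:10])[0])
```

Incrementing and decrementing, rather than writing a fixed value, lets several tracers share one probe: it stays enabled as long as the counter is nonzero.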

Along with trace and argdist tool support, this commit also introduces new classes
in the bcc module: ProcStat handles pid-wrap detection, whereas USDTReader,
USDTProbe, USDTProbeLocation, and USDTArgument are the shared USDT-related
infrastructure that enables enumeration, attachment, and argument retrieval for
USDT probes.
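
The pid-wrap check can be illustrated with the process start time alone. This sketch takes a different route from the commit, which shells out to `readlink` and `cut`: it parses a `/proc/PID/stat` line in Python. The function names and the sample line are hypothetical and exist only for this example.

```python
def parse_start_time(stat_line):
    """Return field 22 (starttime) of a /proc/<pid>/stat line."""
    # Field 2 (comm) is parenthesised and may contain spaces, so split only
    # the text after the last ')'; that remainder begins at field 3.
    rest = stat_line[stat_line.rindex(')') + 1:].split()
    return int(rest[22 - 3])

def is_same_process(saved_start_time, current_stat_line):
    """True if the pid still refers to the process seen earlier."""
    return parse_start_time(current_stat_line) == saved_start_time

# Made-up stat line; note the space inside the command name.
sample = ("1337 (my proc) S 1 1337 1337 0 -1 4194304 100 0 0 0 "
          "5 3 0 0 20 0 1 0 987654 1000000")
print(parse_start_time(sample))
```

If the pid has been recycled, the new process has a different start time, so a tracer that saved the old value can tell that it must not touch that process's memory.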
parent af66546d
......@@ -109,7 +109,7 @@ Examples:
- tools/[tcpconnect](tools/tcpconnect.py): Trace TCP active connections (connect()). [Examples](tools/tcpconnect_example.txt).
- tools/[tcpconnlat](tools/tcpconnlat.py): Trace TCP active connection latency (connect()). [Examples](tools/tcpconnlat_example.txt).
- tools/[tcpretrans](tools/tcpretrans.py): Trace TCP retransmits and TLPs. [Examples](tools/tcpretrans_example.txt).
- tools/[tplist](tools/tplist.py): Display kernel tracepoints and their format.
- tools/[tplist](tools/tplist.py): Display kernel tracepoints or USDT probes and their formats. [Examples](tools/tplist_example.txt).
- tools/[trace](tools/trace.py): Trace arbitrary functions, with filters. [Examples](tools/trace_example.txt)
- tools/[vfscount](tools/vfscount.py) tools/[vfscount.c](tools/vfscount.c): Count VFS calls. [Examples](tools/vfscount_example.txt).
- tools/[vfsstat](tools/vfsstat.py) tools/[vfsstat.c](tools/vfsstat.c): Count some VFS calls, with column output. [Examples](tools/vfsstat_example.txt).
......
......@@ -50,11 +50,11 @@ many cases, argdist will deduce the necessary header files automatically.
.SH SPECIFIER SYNTAX
The general specifier syntax is as follows:
.B {p,r,t}:{[library],category}:function(signature)[:type[,type...]:expr[,expr...][:filter]][#label]
.B {p,r,t,u}:{[library],category}:function(signature)[:type[,type...]:expr[,expr...][:filter]][#label]
.TP
.B {p,r,t}
.B {p,r,t,u}
Probe type \- "p" for function entry, "r" for function return, "t" for kernel
tracepoint; \-H for histogram collection, \-C for frequency count.
tracepoint, "u" for USDT probe; \-H for histogram collection, \-C for frequency count.
Indicates where to place the probe and whether the probe should collect frequency
count information, or aggregate the collected values into a histogram. Counting
probes will collect the number of times every parameter value was observed,
......@@ -78,7 +78,9 @@ on the other hand, is only required if you plan to collect parameter values
based on that signature. For example, if you only want to collect the first
parameter, you don't have to specify the rest of the parameters in the signature.
When capturing kernel tracepoints, this should be the name of the event, e.g.
net_dev_start_xmit. The signature for kernel tracepoints should be empty.
net_dev_start_xmit. The signature for kernel tracepoints should be empty. When
capturing USDT probes, this should be the name of the probe, e.g. reloc_complete.
The signature for USDT probes should be empty.
.TP
.B [type[,type...]]
The type(s) of the expression(s) to capture.
......@@ -94,6 +96,8 @@ Tracepoints may access a special structure called "tp" that is formatted accordi
to the tracepoint format (which you can obtain using tplist). For example, the
block:block_rq_complete tracepoint can access tp.nr_sector. You may also use the
members of the "tp" struct directly, e.g. "nr_sector" instead of "tp.nr_sector".
USDT probes may access the arguments defined by the tracing program in the
special arg1, arg2, ... variables. To obtain their types, use the tplist tool.
Return probes can use the argument values received by the
function when it was entered, through the $entry(paramname) special variable.
Return probes can also access the function's return value in $retval, and the
......@@ -154,6 +158,10 @@ Aggregate interrupts by interrupt request (IRQ):
#
.B argdist -C 't:irq:irq_handler_entry():int:irq'
.TP
Print the functions used as thread entry points and how common they are:
#
.B argdist -C 'u:pthread:pthread_start():u64:arg2' -p 1337
.TP
Print histograms of sleep() and nanosleep() parameter values:
#
.B argdist -H 'p:c:sleep(u32 seconds):u32:seconds' 'p:c:nanosleep(struct timespec *req):long:req->tv_nsec'
......
.TH tplist 8 "2016-03-20" "USER COMMANDS"
.SH NAME
tplist \- Display kernel tracepoints and their format.
tplist \- Display kernel tracepoints or USDT probes and their formats.
.SH SYNOPSIS
.B tplist [-v] [tracepoint]
.B tplist [-p PID] [-l LIB] [-v] [filter]
.SH DESCRIPTION
tplist lists all kernel tracepoints, and can optionally print out the tracepoint
format; namely, the variables that you can trace when the tracepoint is hit. This
is usually used in conjunction with the argdist and/or trace tools.
format; namely, the variables that you can trace when the tracepoint is hit.
tplist can also list USDT probes embedded in a specific library or executable,
and can list USDT probes for all the libraries loaded by a specific process.
These features are usually used in conjunction with the argdist and/or trace tools.
On a typical system, accessing the tracepoint list and format requires root.
However, accessing USDT probes does not require root.
.SH OPTIONS
.TP
\-p PID
Display the USDT probes from all the libraries loaded by the specified process.
.TP
\-l LIB
Display the USDT probes from the specified library or executable. If the library
or executable can be found in the standard paths, a full path is not required.
.TP
\-v
Display the variables associated with the tracepoint or tracepoints.
Display the variables associated with the tracepoint or USDT probe.
.TP
[tracepoint]
A wildcard expression that specifies which tracepoints to print. For example,
block:* will print all block tracepoints (block:block_rq_complete, etc.).
Regular expressions are not supported.
[filter]
A wildcard expression that specifies which tracepoints or probes to print.
For example, block:* will print all block tracepoints (block:block_rq_complete,
etc.). Regular expressions are not supported.
.SH EXAMPLES
.TP
Print all kernel tracepoints:
......@@ -27,6 +37,14 @@ Print all kernel tracepoints:
Print all net tracepoints with their format:
#
.B tplist -v 'net:*'
.TP
Print all USDT probes in libpthread:
$
.B tplist -l pthread
.TP
Print all USDT probes in process 4717 from the libc provider:
$
.B tplist -p 4717 'libc:*'
.SH SOURCE
This is from bcc.
.IP
......
......@@ -46,11 +46,11 @@ The general probe syntax is as follows:
.B [{p,r}]:[library]:function [(predicate)] ["format string"[, arguments]]
.B t:category:event [(predicate)] ["format string"[, arguments]]
.B {t:category:event,u:library:probe} [(predicate)] ["format string"[, arguments]]
.TP
.B {[{p,r}],t}
.B {[{p,r}],t,u}
Probe type \- "p" for function entry, "r" for function return, "t" for kernel
tracepoint. The default probe type is "p".
tracepoint, "u" for USDT probe. The default probe type is "p".
.TP
.B [library]
Library containing the probe.
......@@ -69,6 +69,9 @@ The function to probe.
.B event
The tracepoint event. For example, "block_rq_complete".
.TP
.B probe
The USDT probe name. For example, "pthread_create".
.TP
.B [(predicate)]
The filter applied to the captured data. Only if the filter evaluates as true,
the trace message will be printed. The filter can use any valid C expression
......@@ -96,6 +99,9 @@ discover the format of your tracepoint, use the tplist tool. Note that you can
also use the members of the "tp" struct directly, e.g. "nr_sector" instead of
"tp.nr_sector".
In USDT probes, the arg1, ..., argN variables refer to the probe's arguments.
To determine which arguments your probe has, use the tplist tool.
The predicate expression and the format specifier replacements for printing
may also use the following special keywords: $pid, $tgid to refer to the
current process' pid and tgid; $uid, $gid to refer to the current user's
......@@ -121,6 +127,10 @@ Trace returns from the readline function in bash and print the return value as a
Trace the block:block_rq_complete tracepoint and print the number of sectors completed:
#
.B trace 't:block:block_rq_complete """%d sectors"", nr_sector'
.TP
Trace the pthread_create USDT probe from the pthread library and print the address of the thread's start function:
#
.B trace 'u:pthread:pthread_create """start addr = %llx"", arg3'
.SH SOURCE
This is from bcc.
.IP
......
......@@ -26,6 +26,7 @@ import sys
basestring = (unicode if sys.version_info[0] < 3 else str)
from .libbcc import lib, _CB_TYPE
from .procstat import ProcStat
from .table import Table
from .tracepoint import Perf, Tracepoint
from .usyms import ProcessSymbols
......@@ -341,7 +342,7 @@ class BPF(object):
desc.encode("ascii"), pid, cpu, group_fd,
self._reader_cb_impl, ct.cast(id(self), ct.py_object))
res = ct.cast(res, ct.c_void_p)
if res.value is None:
if res == None:
raise Exception("Failed to attach BPF to kprobe")
open_kprobes[ev_name] = res
return self
......@@ -389,7 +390,7 @@ class BPF(object):
desc.encode("ascii"), pid, cpu, group_fd,
self._reader_cb_impl, ct.cast(id(self), ct.py_object))
res = ct.cast(res, ct.c_void_p)
if res.value is None:
if res == None:
raise Exception("Failed to attach BPF to kprobe")
open_kprobes[ev_name] = res
return self
......@@ -513,7 +514,7 @@ class BPF(object):
desc.encode("ascii"), pid, cpu, group_fd,
self._reader_cb_impl, ct.cast(id(self), ct.py_object))
res = ct.cast(res, ct.c_void_p)
if res.value is None:
if res == None:
raise Exception("Failed to attach BPF to uprobe")
open_uprobes[ev_name] = res
return self
......@@ -557,7 +558,7 @@ class BPF(object):
desc.encode("ascii"), pid, cpu, group_fd,
self._reader_cb_impl, ct.cast(id(self), ct.py_object))
res = ct.cast(res, ct.c_void_p)
if res.value is None:
if res == None:
raise Exception("Failed to attach BPF to uprobe")
open_uprobes[ev_name] = res
return self
......@@ -793,3 +794,5 @@ class BPF(object):
except KeyboardInterrupt:
exit()
from .usdt import USDTReader
# Copyright 2016 Sasha Goldshtein
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os

class ProcStat(object):
    def __init__(self, pid):
        self.pid = pid
        self.exe = self._get_exe()
        self.start_time = self._get_start_time()

    def is_stale(self):
        return self.exe != self._get_exe() or \
               self.start_time != self._get_start_time()

    def _get_exe(self):
        return os.popen("readlink -f /proc/%d/exe" % self.pid).read()

    def _get_start_time(self):
        return os.popen("cut -d' ' -f 22 /proc/%d/stat" %
                        self.pid).read()
# Copyright 2016 Sasha Goldshtein
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import struct
import re
from . import BPF
from . import ProcStat
class USDTArgument(object):
def __init__(self, size, is_signed, register=None, constant=None,
deref_offset=None, deref_name=None):
self.size = size
self.is_signed = is_signed
self.register = register
self.constant = constant
self.deref_offset = deref_offset
self.deref_name = deref_name
def _normalize_register(self):
normalized = self.register
if normalized is None:
return None
if normalized.startswith('%'):
normalized = normalized[1:]
if normalized in USDTArgument.translations:
normalized = USDTArgument.translations[normalized]
return normalized
translations = {
"rax": "ax", "rbx": "bx", "rcx": "cx", "rdx": "dx",
"rdi": "di", "rsi": "si", "rbp": "bp", "rsp": "sp",
"rip": "ip", "eax": "ax", "ebx": "bx", "ecx": "cx",
"edx": "dx", "edi": "di", "esi": "si", "ebp": "bp",
"esp": "sp", "eip": "ip", "al": "ax", "bl": "bx",
"cl": "cx", "dl": "dx"
}
def generate_assign_to_local(self, local_name):
"""
generate_assign_to_local(local_name)
Generates an assignment statement that initializes a local
variable with the value of this argument. Assumes that the
struct pt_regs pointer is called 'ctx', and accesses registers
from that pointer. The local variable must already be declared
by the caller. Use get_type() to get the proper type for that
declaration.
Example output:
local1 = (u64)ctx->di;
{
u64 __tmp;
bpf_probe_read(&__tmp, sizeof(__tmp),
(void *)(ctx->bp - 8));
bpf_probe_read(&local2, sizeof(local2),
(void *)__tmp);
}
"""
normalized_reg = self._normalize_register()
if self.constant is not None:
# Simplest case, it's just a constant
return "%s = %d;" % (local_name, self.constant)
if self.deref_offset is None:
# Simple read from the specified register
return "%s = (%s)ctx->%s;" % \
(local_name, self.get_type(), normalized_reg)
# Note that the cast to a smaller type should grab the
# relevant part of the register anyway, if we're dealing
# with 32/16/8-bit registers like ecx, dx, al, etc.
if self.deref_offset is not None and self.deref_name is None:
# Add deref_offset to register value and bpf_probe_read
# from the resulting address
return \
"""{
u64 __temp = ctx->%s + (%d);
bpf_probe_read(&%s, sizeof(%s), (void *)__temp);
} """ % (normalized_reg, self.deref_offset,
local_name, local_name)
# Final case: dereference global, need to find address of global
# with the provided name and then potentially add deref_offset
# and bpf_probe_read the result. None of this will work with BPF
# because we can't just access arbitrary addresses.
return "%s = 0; /* UNSUPPORTED CASE, SEE SOURCE */" % \
local_name
def get_type(self):
result_type = None
if self.size == 1:
result_type = "char"
elif self.size == 2:
result_type = "short"
elif self.size == 4:
result_type = "int"
elif self.size == 8:
result_type = "long"
if result_type is None:
# Parenthesize the concatenation so that % formats the whole message
raise ValueError(("arguments of size %d are not " +
"currently supported") % self.size)
if not self.is_signed:
result_type = "unsigned " + result_type
return result_type
def __str__(self):
prefix = "%d %s bytes @ " % (self.size,
" signed" if self.is_signed else "unsigned")
if self.constant is not None:
return prefix + "constant %d" % self.constant
if self.deref_offset is None:
return prefix + "register " + self.register
if self.deref_offset is not None and self.deref_name is None:
return prefix + "%d(%s)" % (self.deref_offset,
self.register)
return prefix + "%d from %s global" % (self.deref_offset,
self.deref_name)
class USDTProbeLocation(object):
def __init__(self, address, args):
self.address = address
self.raw_args = args
self.args = []
self._parse_args()
def generate_usdt_assignments(self, prefix="arg"):
text = ""
for i, arg in enumerate(self.args, 1):
text += (" "*16) + \
arg.generate_assign_to_local(
"%s%d" % (prefix, i)) + "\n"
return text
def _parse_args(self):
for arg in self.raw_args.split():
self._parse_arg(arg.strip())
def _parse_arg(self, arg):
qregs = ["%rax", "%rbx", "%rcx", "%rdx", "%rdi", "%rsi",
"%rbp", "%rsp", "%rip", "%r8", "%r9", "%r10", "%r11",
"%r12", "%r13", "%r14", "%r15"]
dregs = ["%eax", "%ebx", "%ecx", "%edx", "%edi", "%esi",
"%ebp", "%esp", "%eip"]
wregs = ["%ax", "%bx", "%cx", "%dx", "%di", "%si",
"%bp", "%sp", "%ip"]
bregs = ["%al", "%bl", "%cl", "%dl"]
any_reg = "(" + "|".join(qregs + dregs + wregs + bregs) + ")"
# -4@$0, 8@$1234
m = re.match(r'(\-?)(\d+)@\$(\d+)', arg)
if m is not None:
self.args.append(USDTArgument(
int(m.group(2)),
m.group(1) == '-',
constant=int(m.group(3))
))
return
# %rdi, %rax, %rsi
m = re.match(any_reg, arg)
if m is not None:
if arg in qregs:
size = 8
elif arg in dregs:
size = 4
elif arg in wregs:
size = 2
elif arg in bregs:
size = 1
self.args.append(USDTArgument(
size, False, register=arg
))
return
# -8@%rbx, 4@%r12
m = re.match(r'(\-?)(\d+)@' + any_reg, arg)
if m is not None:
self.args.append(USDTArgument(
int(m.group(2)), # Size (in bytes)
m.group(1) == '-', # Signed
register=m.group(3)
))
return
# 8@-8(%rbp), 4@(%rax)
m = re.match(r'(\-?)(\d+)@(\-?)(\d*)\(' + any_reg + r'\)', arg)
if m is not None:
# The offset is optional, e.g. 4@(%rax), so default to 0
deref_offset = int(m.group(4)) if len(m.group(4)) > 0 else 0
if m.group(3) == '-':
deref_offset = -deref_offset
self.args.append(USDTArgument(
int(m.group(2)), m.group(1) == '-',
register=m.group(5), deref_offset=deref_offset
))
return
# -4@global_max_action(%rip)
m = re.match(r'(\-?)(\d+)@(\w+)\(%rip\)', arg)
if m is not None:
self.args.append(USDTArgument(
int(m.group(2)), m.group(1) == '-',
register="%rip", deref_name=m.group(3),
deref_offset=0
))
return
# 8@24+mp_(%rip)
m = re.match(r'(\-?)(\d+)@(\-?)(\d+)\+(\w+)\(%rip\)', arg)
if m is not None:
deref_offset = int(m.group(4))
if m.group(3) == '-':
deref_offset = -deref_offset
self.args.append(USDTArgument(
int(m.group(2)), m.group(1) == '-',
register="%rip", deref_offset=deref_offset,
deref_name=m.group(5)
))
return
raise ValueError("unrecognized argument format: '%s'" % arg)
class USDTProbe(object):
def __init__(self, bin_path, provider, name, semaphore):
self.bin_path = bin_path
self.provider = provider
self.name = name
self.semaphore = semaphore
self.enabled_procs = {}
self.proc_semas = {}
self.locations = []
def add_location(self, location, arguments):
self.locations.append(USDTProbeLocation(location, arguments))
def need_enable(self):
"""
Returns whether this probe needs to be enabled in each
process that uses it. Probes that must be enabled can't be
traced without specifying a specific pid.
"""
return self.semaphore != 0
def enable(self, pid):
"""Enables this probe in the specified process."""
self._add_to_semaphore(pid, +1)
self.enabled_procs[pid] = ProcStat(pid)
def disable(self, pid):
"""Disables the probe in the specified process."""
if pid not in self.enabled_procs:
raise ValueError("probe wasn't enabled in this process")
# Because of the possibility of pid wrap, it's extremely
# important to verify that we are still dealing with the same
# process. Otherwise, we are overwriting random memory in some
# other process :-)
if not self.enabled_procs[pid].is_stale():
self._add_to_semaphore(pid, -1)
del(self.enabled_procs[pid])
def get_arg_types(self):
"""
Returns the argument types used by this probe. Different probe
locations might use different argument types, e.g. signed i32
vs. unsigned i64. We should take the largest type, and the
sign really doesn't matter that much.
"""
arg_types = []
for i in range(len(self.locations[0].args)):
max_size_loc = max(self.locations, key=lambda loc:
loc.args[i].size)
arg_types.append(max_size_loc.args[i].get_type())
return arg_types
def generate_usdt_thunks(self, name_prefix, thunk_names):
text = ""
for i in range(len(self.locations)):
thunk_name = "%s_thunk_%d" % (name_prefix, i)
thunk_names.append(thunk_name)
text += """
int %s(struct pt_regs *ctx) {
return %s(ctx, %d);
} """ % (thunk_name, name_prefix, i)
return text
def generate_usdt_cases(self):
text = ""
for i, arg_type in enumerate(self.get_arg_types(), 1):
text += " %s arg%d = 0;\n" % (arg_type, i)
for i, location in enumerate(self.locations):
assignments = location.generate_usdt_assignments()
text += \
"""
if (__loc_id == %d) {
%s
} \n""" % (i, assignments)
return text
def _ensure_proc_sema(self, pid):
if pid in self.proc_semas:
return self.proc_semas[pid]
if self.bin_path.endswith(".so"):
# Semaphores declared in shared objects are relative
# to that shared object's load address
with open("/proc/%d/maps" % pid) as m:
maps = m.readlines()
# Build a list so that len() and indexing also work on Python 3
addrs = [l.split('-')[0] for l in maps
if self.bin_path in l]
if len(addrs) == 0:
raise ValueError("lib %s not loaded in pid %d"
% (self.bin_path, pid))
sema_addr = int(addrs[0], 16) + self.semaphore
else:
sema_addr = self.semaphore # executable, absolute
self.proc_semas[pid] = sema_addr
return sema_addr
def _add_to_semaphore(self, pid, val):
sema_addr = self._ensure_proc_sema(pid)
with open("/proc/%d/mem" % pid, "r+b") as fd:
fd.seek(sema_addr, 0)
prev = struct.unpack("H", fd.read(2))[0]
fd.seek(sema_addr, 0)
fd.write(struct.pack("H", prev + val))
def __str__(self):
return "%s %s:%s" % (self.bin_path, self.provider, self.name)
def display_verbose(self):
text = str(self) + " [sema 0x%x]\n" % self.semaphore
for location in self.locations:
text += " location 0x%x raw args: %s\n" % \
(location.address, location.raw_args)
for arg in location.args:
text += " %s\n" % str(arg)
return text
class USDTReader(object):
def __init__(self, bin_path="", pid=-1):
"""
__init__(bin_path="", pid=-1)
Reads all the probes from the specified library, executable,
or process. If a pid is specified, all the libraries (including
the executable) are searched for probes. After initialization
completes, the found probes are in the 'probes' property.
"""
self.probes = []
if pid != -1:
for mod in USDTReader._get_modules(pid):
self._add_probes(mod)
elif len(bin_path) != 0:
self._add_probes(bin_path)
else:
raise ValueError("pid or bin_path is required")
@staticmethod
def _get_modules(pid):
with open("/proc/%d/maps" % pid) as f:
maps = f.readlines()
modules = []
for line in maps:
parts = line.strip().split()
if len(parts) < 6:
continue
if parts[5][0] == '[' or not 'x' in parts[1]:
continue
modules.append(parts[5])
return modules
def _add_probes(self, bin_path):
if not os.path.isfile(bin_path):
attempt1 = os.popen(
"which --skip-alias %s 2>/dev/null"
% bin_path).read().strip()
if attempt1 is None or not os.path.isfile(attempt1):
attempt2 = BPF.find_library(bin_path)
if attempt2 is None or \
not os.path.isfile(attempt2):
raise ValueError("can't find %s"
% bin_path)
else:
bin_path = attempt2
else:
bin_path = attempt1
with os.popen("readelf -n %s 2>/dev/null" % bin_path) as child:
notes = child.read()
for match in re.finditer(r'stapsdt.*?NT_STAPSDT.*?Provider: ' +
r'(\w+).*?Name: (\w+).*?Location: (\w+), Base: ' +
r'(\w+), Semaphore: (\w+).*?Arguments: ([^\n]*)',
notes, re.DOTALL):
self._add_or_merge_probe(
bin_path, match.group(1), match.group(2),
int(match.group(3), 16),
int(match.group(5), 16), match.group(6)
)
# Note that BPF.attach_uprobe takes care of subtracting
# the load address for that bin, so we can report the actual
# address that appears in the note
def _add_or_merge_probe(self, bin_path, provider, name, location,
semaphore, arguments):
# Build a list so that len() and indexing also work on Python 3
matches = [p for p in self.probes if p.provider == provider
and p.name == name]
if len(matches) > 0:
probe = matches[0]
else:
probe = USDTProbe(bin_path, provider, name, semaphore)
self.probes.append(probe)
probe.add_location(location, arguments)
def __str__(self):
return "\n".join(map(USDTProbe.display_verbose, self.probes))
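
The note-scanning step in `_add_probes` can be tried in isolation. The sketch below reuses the regular expression from the code above on a sample of `readelf -n` output. The sample text, the `demo` provider, the `request_start` probe, and the `parse_notes` helper are all made up for illustration, and the sample assumes the usual layout of the `.note.stapsdt` section dump.

```python
import re

# Same pattern as in _add_probes, compiled once.
NOTE_RE = re.compile(
    r'stapsdt.*?NT_STAPSDT.*?Provider: (\w+).*?Name: (\w+).*?'
    r'Location: (\w+), Base: (\w+), Semaphore: (\w+).*?'
    r'Arguments: ([^\n]*)', re.DOTALL)

def parse_notes(notes):
    """Return one dict per USDT note found in readelf -n output."""
    probes = []
    for m in NOTE_RE.finditer(notes):
        probes.append({
            "provider": m.group(1),
            "name": m.group(2),
            "location": int(m.group(3), 16),
            "semaphore": int(m.group(5), 16),
            "arguments": m.group(6),
        })
    return probes

# Illustrative input: the same probe compiled in at two locations.
SAMPLE = """Displaying notes found in: .note.stapsdt
  Owner                 Data size       Description
  stapsdt              0x00000033       NT_STAPSDT (SystemTap probe descriptors)
    Provider: demo
    Name: request_start
    Location: 0x0000000000400d2f, Base: 0x0000000000401a10, Semaphore: 0x0000000000000000
    Arguments: -4@%eax 8@-8(%rbp)
  stapsdt              0x00000030       NT_STAPSDT (SystemTap probe descriptors)
    Provider: demo
    Name: request_start
    Location: 0x0000000000400e01, Base: 0x0000000000401a10, Semaphore: 0x0000000000000000
    Arguments: -4@%edx 8@%rdi
"""

probes = parse_notes(SAMPLE)
for p in probes:
    print("%s:%s @ 0x%x args: %s" % (p["provider"], p["name"],
                                     p["location"], p["arguments"]))
```

The two entries share a provider and name but differ in location and argument layout, which is the situation that merging locations into a single probe is meant to handle.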
......@@ -27,16 +27,7 @@ class ProcessSymbols(object):
def refresh_code_ranges(self):
self.code_ranges = self._get_code_ranges()
self.ranges_cache = {}
self.exe = self._get_exe()
self.start_time = self._get_start_time()
def _get_exe(self):
return ProcessSymbols._run_command_get_output(
"readlink -f /proc/%d/exe" % self.pid)
def _get_start_time(self):
return ProcessSymbols._run_command_get_output(
"cut -d' ' -f 22 /proc/%d/stat" % self.pid)
self.procstat = ProcStat(self.pid)
@staticmethod
def _is_binary_segment(parts):
......@@ -101,10 +92,7 @@ class ProcessSymbols(object):
return "%x" % offset
def _check_pid_wrap(self):
# If the pid wrapped, our exe name and start time must have changed.
# Detect this and get rid of the cached ranges.
if self.exe != self._get_exe() or \
self.start_time != self._get_start_time():
if self.procstat.is_stale():
self.refresh_code_ranges()
def decode_addr(self, addr):
......@@ -127,3 +115,4 @@ class ProcessSymbols(object):
binary)
return "%x" % addr
from . import ProcStat
......@@ -12,7 +12,7 @@
# Licensed under the Apache License, Version 2.0 (the "License")
# Copyright (C) 2016 Sasha Goldshtein.
from bcc import BPF, Tracepoint, Perf
from bcc import BPF, Tracepoint, Perf, USDTReader
from time import sleep, strftime
import argparse
import re
......@@ -20,27 +20,14 @@ import traceback
import os
import sys
class Specifier(object):
probe_text = """
DATA_DECL
int PROBENAME(struct pt_regs *ctx SIGNATURE)
{
PREFIX
PID_FILTER
if (!(FILTER)) return 0;
KEY_EXPR
COLLECT
return 0;
}
"""
class Probe(object):
next_probe_index = 0
aliases = { "$PID": "bpf_get_current_pid_tgid()" }
def _substitute_aliases(self, expr):
if expr is None:
return expr
for alias, subst in Specifier.aliases.items():
for alias, subst in Probe.aliases.items():
expr = expr.replace(alias, subst)
return expr
......@@ -57,7 +44,9 @@ int PROBENAME(struct pt_regs *ctx SIGNATURE)
param_name = param[index+1:].strip()
self.param_types[param_name] = param_type
entry_probe_text = """
def _generate_entry(self):
self.entry_probe_func = self.probe_func_name + "_entry"
text = """
int PROBENAME(struct pt_regs *ctx SIGNATURE)
{
u32 pid = bpf_get_current_pid_tgid();
......@@ -66,10 +55,6 @@ int PROBENAME(struct pt_regs *ctx SIGNATURE)
return 0;
}
"""
def _generate_entry(self):
self.entry_probe_func = self.probe_func_name + "_entry"
text = self.entry_probe_text
text = text.replace("PROBENAME", self.entry_probe_func)
text = text.replace("SIGNATURE",
"" if len(self.signature) == 0 else ", " + self.signature)
......@@ -173,8 +158,8 @@ u64 __time = bpf_ktime_get_ns();
"function signature must be specified")
if len(parts) > 6:
self._bail("extraneous ':'-separated parts detected")
if parts[0] not in ["r", "p", "t"]:
self._bail("probe type must be 'p', 'r', or 't', " +
if parts[0] not in ["r", "p", "t", "u"]:
self._bail("probe type must be 'p', 'r', 't', or 'u' " +
"but got '%s'" % parts[0])
if re.match(r"\w+\(.*\)", parts[2]) is None:
self._bail(("function signature '%s' has an invalid " +
......@@ -191,6 +176,7 @@ u64 __time = bpf_ktime_get_ns();
self.exprs = exprs.split(',')
def __init__(self, type, specifier, pid):
self.pid = pid
self.raw_spec = specifier
self._validate_specifier()
......@@ -210,6 +196,10 @@ u64 __time = bpf_ktime_get_ns();
self.tp = Tracepoint.enable_tracepoint(
self.tp_category, self.tp_event)
self.function = "perf_trace_" + self.function
elif self.probe_type == "u":
self.library = parts[1]
self._find_usdt_probe()
self._enable_usdt_probe()
else:
self.library = parts[1]
self.is_user = len(self.library) > 0
......@@ -244,12 +234,32 @@ u64 __time = bpf_ktime_get_ns();
self.entry_probe_required = self.probe_type == "r" and \
(any(map(check, self.exprs)) or check(self.filter))
self.pid = pid
self.probe_func_name = "%s_probe%d" % \
(self.function, Specifier.next_probe_index)
(self.function, Probe.next_probe_index)
self.probe_hash_name = "%s_hash%d" % \
(self.function, Specifier.next_probe_index)
Specifier.next_probe_index += 1
(self.function, Probe.next_probe_index)
Probe.next_probe_index += 1
def _enable_usdt_probe(self):
if self.usdt.need_enable():
if self.pid is None:
self._bail("probe needs pid to enable")
self.usdt.enable(self.pid)
def _disable_usdt_probe(self):
if self.probe_type == "u" and self.usdt.need_enable():
self.usdt.disable(self.pid)
def close(self):
self._disable_usdt_probe()
def _find_usdt_probe(self):
reader = USDTReader(bin_path=self.library)
for probe in reader.probes:
if probe.name == self.function:
self.usdt = probe
return
self._bail("unrecognized USDT probe %s" % self.function)
def _substitute_exprs(self):
def repl(expr):
......@@ -270,8 +280,8 @@ u64 __time = bpf_ktime_get_ns();
def _generate_field_assignment(self, i):
if self._is_string(self.expr_types[i]):
return " bpf_probe_read(" + \
"&__key.v%d.s, sizeof(__key.v%d.s), %s);\n" % \
return (" bpf_probe_read(&__key.v%d.s," +
" sizeof(__key.v%d.s), (void *)%s);\n") % \
(i, i, self.exprs[i])
else:
return " __key.v%d = %s;\n" % (i, self.exprs[i])
......@@ -318,10 +328,25 @@ u64 __time = bpf_ktime_get_ns();
def generate_text(self):
program = ""
probe_text = """
DATA_DECL
QUALIFIER int PROBENAME(struct pt_regs *ctx SIGNATURE)
{
PID_FILTER
PREFIX
if (!(FILTER)) return 0;
KEY_EXPR
COLLECT
return 0;
}
"""
prefix = ""
qualifier = ""
signature = ""
# If any entry arguments are probed in a ret probe, we need
# to generate an entry probe to collect them
prefix = ""
if self.entry_probe_required:
program += self._generate_entry_probe()
prefix += self._generate_retprobe_prefix()
......@@ -329,18 +354,19 @@ u64 __time = bpf_ktime_get_ns();
# value we collected when entering the function:
self._replace_entry_exprs()
# If this is a tracepoint probe, generate a local variable
# that enables access to the tracepoint structure and also
# the structure definition itself
if self.probe_type == "t":
program += self.tp.generate_struct()
prefix += self.tp.generate_get_struct()
program += self.probe_text.replace("PROBENAME",
self.probe_func_name)
signature = "" if len(self.signature) == 0 \
or self.probe_type == "r" \
else ", " + self.signature
elif self.probe_type == "u":
qualifier = "static inline"
signature = ", int __loc_id"
prefix += self.usdt.generate_usdt_cases()
elif self.probe_type == "p" and len(self.signature) > 0:
# Only entry uprobes/kprobes can have user-specified
# signatures. Other probes force it to ().
signature = ", " + self.signature
program += probe_text.replace("PROBENAME", self.probe_func_name)
program = program.replace("SIGNATURE", signature)
program = program.replace("PID_FILTER",
self._generate_pid_filter())
......@@ -354,34 +380,56 @@ u64 __time = bpf_ktime_get_ns();
"1" if len(self.filter) == 0 else self.filter)
program = program.replace("COLLECT", collect)
program = program.replace("PREFIX", prefix)
program = program.replace("QUALIFIER", qualifier)
if self.probe_type == "u":
self.usdt_thunk_names = []
program += self.usdt.generate_usdt_thunks(
self.probe_func_name, self.usdt_thunk_names)
return program
def attach(self, bpf):
self.bpf = bpf
uprobes_start = len(BPF.open_uprobes())
kprobes_start = len(BPF.open_kprobes())
if self.is_user:
if self.probe_type == "r":
bpf.attach_uretprobe(name=self.library,
def _attach_u(self):
libpath = BPF.find_library(self.library)
if libpath is None:
with os.popen(("which --skip-alias %s " +
"2>/dev/null") % self.library) as w:
libpath = w.read().strip()
if libpath is None or len(libpath) == 0:
self._bail("unable to find library %s" %
self.library)
if self.probe_type == "u":
for i, location in enumerate(self.usdt.locations):
self.bpf.attach_uprobe(name=libpath,
addr=location.address,
fn_name=self.usdt_thunk_names[i],
pid=self.pid or -1)
elif self.probe_type == "r":
self.bpf.attach_uretprobe(name=libpath,
sym=self.function,
fn_name=self.probe_func_name,
pid=self.pid or -1)
else:
bpf.attach_uprobe(name=self.library,
self.bpf.attach_uprobe(name=libpath,
sym=self.function,
fn_name=self.probe_func_name,
pid=self.pid or -1)
if len(BPF.open_uprobes()) != uprobes_start + 1:
self._bail("error attaching probe")
else:
def _attach_k(self):
if self.probe_type == "r" or self.probe_type == "t":
bpf.attach_kretprobe(event=self.function,
self.bpf.attach_kretprobe(event=self.function,
fn_name=self.probe_func_name)
else:
bpf.attach_kprobe(event=self.function,
self.bpf.attach_kprobe(event=self.function,
fn_name=self.probe_func_name)
if len(BPF.open_kprobes()) != kprobes_start + 1:
self._bail("error attaching probe")
def attach(self, bpf):
self.bpf = bpf
if self.is_user:
self._attach_u()
else:
self._attach_k()
if self.entry_probe_required:
self._attach_entry_probe()
......@@ -397,7 +445,7 @@ u64 __time = bpf_ktime_get_ns();
expr = self.exprs[i].replace(
"(bpf_ktime_get_ns() - *____latency_val)", "$latency")
# Replace alias values back with the alias name
for alias, subst in Specifier.aliases.items():
for alias, subst in Probe.aliases.items():
expr = expr.replace(subst, alias)
# Replace retval expression with $retval
expr = expr.replace("ctx->ax", "$retval")
......@@ -445,12 +493,16 @@ u64 __time = bpf_ktime_get_ns();
if not self.is_default_expr else "retval")
data.print_log2_hist(val_type=label)
def __str__(self):
return self.label or self.raw_spec
class Tool(object):
examples = """
Probe specifier syntax:
{p,r,t}:{[library],category}:function(signature)[:type[,type...]:expr[,expr...][:filter]][#label]
{p,r,t,u}:{[library],category}:function(signature)[:type[,type...]:expr[,expr...][:filter]][#label]
Where:
p,r,t -- probe at function entry, function exit, or kernel tracepoint
p,r,t,u -- probe at function entry, function exit, kernel tracepoint,
or USDT probe
in exit probes: can use $retval, $entry(param), $latency
library -- the library that contains the function
(leave empty for kernel functions)
......@@ -509,6 +561,10 @@ argdist -H 't:block:block_rq_complete():u32:tp.nr_sector'
argdist -C 't:irq:irq_handler_entry():int:tp.irq'
Aggregate interrupts by interrupt request (IRQ)
argdist -C 'u:pthread:pthread_start():u64:arg2' -p 1337
Print frequency of function addresses used as a pthread start function,
relying on the USDT pthread_start probe in process 1337
argdist -H \\
'p:c:sleep(u32 seconds):u32:seconds' \\
'p:c:nanosleep(struct timespec *req):long:req->tv_nsec'
......@@ -552,15 +608,15 @@ argdist -p 2780 -z 120 \\
help="additional header files to include in the BPF program")
self.args = parser.parse_args()
def _create_specifiers(self):
self.specifiers = []
def _create_probes(self):
self.probes = []
for specifier in (self.args.countspecifier or []):
self.specifiers.append(Specifier(
self.probes.append(Probe(
"freq", specifier, self.args.pid))
for histspecifier in (self.args.histspecifier or []):
self.specifiers.append(
Specifier("hist", histspecifier, self.args.pid))
if len(self.specifiers) == 0:
self.probes.append(
Probe("hist", histspecifier, self.args.pid))
if len(self.probes) == 0:
print("at least one specifier is required")
exit()
......@@ -573,19 +629,19 @@ struct __string_t { char s[%d]; };
for include in (self.args.include or []):
bpf_source += "#include <%s>\n" % include
bpf_source += BPF.generate_auto_includes(
map(lambda s: s.raw_spec, self.specifiers))
map(lambda p: p.raw_spec, self.probes))
bpf_source += Tracepoint.generate_decl()
bpf_source += Tracepoint.generate_entry_probe()
for specifier in self.specifiers:
bpf_source += specifier.generate_text()
for probe in self.probes:
bpf_source += probe.generate_text()
if self.args.verbose:
print(bpf_source)
self.bpf = BPF(text=bpf_source)
def _attach(self):
Tracepoint.attach(self.bpf)
for specifier in self.specifiers:
specifier.attach(self.bpf)
for probe in self.probes:
probe.attach(self.bpf)
if self.args.verbose:
print("open uprobes: %s" % BPF.open_uprobes())
print("open kprobes: %s" % BPF.open_kprobes())
......@@ -598,16 +654,22 @@ struct __string_t { char s[%d]; };
except KeyboardInterrupt:
exit()
print("[%s]" % strftime("%H:%M:%S"))
for specifier in self.specifiers:
specifier.display(self.args.top)
for probe in self.probes:
probe.display(self.args.top)
count_so_far += 1
if self.args.count is not None and \
count_so_far >= self.args.count:
exit()
def _close_probes(self):
for probe in self.probes:
probe.close()
if self.args.verbose:
print("closed probe: " + str(probe))
def run(self):
try:
self._create_specifiers()
self._create_probes()
self._generate_program()
self._attach()
self._main_loop()
......@@ -616,6 +678,7 @@ struct __string_t { char s[%d]; };
traceback.print_exc()
elif sys.exc_type is not SystemExit:
print(sys.exc_value)
self._close_probes()
if __name__ == "__main__":
Tool().run()
......@@ -332,9 +332,10 @@ optional arguments:
additional header files to include in the BPF program
Probe specifier syntax:
{p,r,t}:{[library],category}:function(signature)[:type[,type...]:expr[,expr...][:filter]][#label]
{p,r,t,u}:{[library],category}:function(signature)[:type[,type...]:expr[,expr...][:filter]][#label]
Where:
p,r,t -- probe at function entry, function exit, or kernel tracepoint
p,r,t,u -- probe at function entry, function exit, kernel tracepoint,
or USDT probe
in exit probes: can use $retval, $entry(param), $latency
library -- the library that contains the function
(leave empty for kernel functions)
......@@ -392,6 +393,10 @@ argdist -H 't:block:block_rq_complete():u32:tp.nr_sector'
argdist -C 't:irq:irq_handler_entry():int:tp.irq'
Aggregate interrupts by interrupt request (IRQ)
argdist -C 'u:pthread:pthread_start():u64:arg2' -p 1337
Print frequency of function addresses used as a pthread start function,
relying on the USDT pthread_start probe in process 1337
argdist -H \
'p:c:sleep(u32 seconds):u32:seconds' \
'p:c:nanosleep(struct timespec *req):long:req->tv_nsec'
......
#!/usr/bin/env python
#
# tplist Display kernel tracepoints and their formats.
# tplist Display kernel tracepoints or USDT probes and their formats.
#
# USAGE: tplist [-v] [tracepoint]
# USAGE: tplist [-p PID] [-l LIB] [-v] [filter]
#
# Licensed under the Apache License, Version 2.0 (the "License")
# Copyright (C) 2016 Sasha Goldshtein.
import argparse
import fnmatch
import re
import os
import re
import sys
from bcc import USDTReader
trace_root = "/sys/kernel/debug/tracing"
event_root = os.path.join(trace_root, "events")
parser = argparse.ArgumentParser(description=
"Display kernel tracepoints and their formats.",
"Display kernel tracepoints or USDT probes and their formats.",
formatter_class=argparse.RawDescriptionHelpFormatter)
parser.add_argument("-p", "--pid", type=int, default=-1, help=
"List USDT probes in the specified process")
parser.add_argument("-l", "--lib", default="", help=
"List USDT probes in the specified library or executable")
parser.add_argument("-v", dest="variables", action="store_true", help=
"Print the format (available variables) for each tracepoint")
parser.add_argument(dest="tracepoint", nargs="?",
help="The tracepoint name to print (wildcards allowed)")
"Print the format (available variables)")
parser.add_argument(dest="filter", nargs="?", help=
"A filter that specifies which probes/tracepoints to print")
args = parser.parse_args()
def print_tpoint_format(category, event):
......@@ -42,12 +49,12 @@ def print_tpoint_format(category, event):
def print_tpoint(category, event):
tpoint = "%s:%s" % (category, event)
if not args.tracepoint or fnmatch.fnmatch(tpoint, args.tracepoint):
if not args.filter or fnmatch.fnmatch(tpoint, args.filter):
print(tpoint)
if args.variables:
print_tpoint_format(category, event)
def print_all():
def print_tracepoints():
for category in os.listdir(event_root):
cat_dir = os.path.join(event_root, category)
if not os.path.isdir(cat_dir):
......@@ -57,5 +64,28 @@ def print_all():
if os.path.isdir(evt_dir):
print_tpoint(category, event)
def print_usdt(pid, lib):
reader = USDTReader(bin_path=lib, pid=pid)
probes_seen = []
for probe in reader.probes:
probe_name = "%s:%s" % (probe.provider, probe.name)
if not args.filter or fnmatch.fnmatch(probe_name, args.filter):
if probe_name in probes_seen:
continue
probes_seen.append(probe_name)
if args.variables:
print(probe.display_verbose())
else:
print("%s %s:%s" % (probe.bin_path,
probe.provider, probe.name))
if __name__ == "__main__":
print_all()
try:
if args.pid != -1 or args.lib != "":
print_usdt(args.pid, args.lib)
else:
print_tracepoints()
except:
if sys.exc_type is not SystemExit:
print(sys.exc_value)
Demonstrations of tplist.
tplist displays kernel tracepoints and USDT probes, including their
format. It can be used to discover probe points for use with the trace
and argdist tools. Kernel tracepoints are scattered around the kernel
and provide valuable static tracing on block and network I/O, scheduling,
power events, and many other subjects. USDT probes are placed in libraries
(such as libc) and executables (such as node) and provide static tracing
information that can (optionally) be turned on and off at runtime.
For example, suppose you want to discover which USDT probes a particular
executable contains. Just run tplist on that executable (or library):
$ tplist -l basic_usdt
/home/vagrant/basic_usdt basic_usdt:start_main
/home/vagrant/basic_usdt basic_usdt:loop_iter
/home/vagrant/basic_usdt basic_usdt:end_main
The loop_iter probe sounds interesting. What are the locations of that
probe, and which variables are available?
$ tplist '*loop_iter' -l basic_usdt -v
/home/vagrant/basic_usdt basic_usdt:loop_iter [sema 0x601036]
location 0x400550 raw args: -4@$42 8@%rax
4 signed bytes @ constant 42
8 unsigned bytes @ register %rax
location 0x40056f raw args: 8@-8(%rbp) 8@%rax
8 unsigned bytes @ -8(%rbp)
8 unsigned bytes @ register %rax
This output indicates that the loop_iter probe is used in two locations
in the basic_usdt executable. The first location passes a constant value,
42, to the probe. The second location passes a variable value located at
an offset from the %rbp register. Don't worry -- you don't have to trace
the register values yourself. The argdist and trace tools understand the
probe format and can print out the arguments automatically -- you can
refer to them as arg1, arg2, and so on.
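To make the "raw args" notation above less mysterious, here is a minimal sketch of how a descriptor such as -4@$42 or 8@-8(%rbp) can be decoded into the descriptions tplist prints. The names ARG_RE, describe_usdt_arg, and describe_usdt_args are hypothetical and are not part of bcc; the sketch only assumes the "[-]size@operand" shape visible in the output above, and real probes may use operand forms it does not handle specially.

```python
import re

# Hypothetical helper, not the bcc implementation. A raw USDT argument
# looks like "[-]SIZE@OPERAND"; a negative size marks a signed value.
ARG_RE = re.compile(r"^(?P<size>-?\d+)@(?P<operand>.+)$")


def describe_usdt_arg(raw):
    """Describe one raw argument, e.g. '-4@$42' or '8@%rax'."""
    match = ARG_RE.match(raw)
    if match is None:
        raise ValueError("unrecognized USDT argument %r" % raw)
    size = int(match.group("size"))
    signedness = "signed" if size < 0 else "unsigned"
    operand = match.group("operand")
    if operand.startswith("$"):
        # "$42" is an immediate constant baked into the probe site
        where = "constant %s" % operand[1:]
    elif operand.startswith("%"):
        # "%rax" means the value is held in a register
        where = "register %s" % operand
    else:
        # Anything else, such as "-8(%rbp)", is kept as written
        where = operand
    return "%d %s bytes @ %s" % (abs(size), signedness, where)


def describe_usdt_args(raw_args):
    """Describe every whitespace-separated argument of one location."""
    return [describe_usdt_arg(raw) for raw in raw_args.split()]
```

Calling describe_usdt_args("-4@$42 8@%rax") gives the two lines shown for location 0x400550 above, and describe_usdt_args("8@-8(%rbp) 8@%rax") gives the two lines shown for location 0x40056f.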
Try exploring some common libraries on your system to see whether they
contain USDT probes. Here are two examples you might find interesting:
$ tplist -l pthread # list probes in libpthread
/lib64/libpthread.so.0 libpthread:pthread_start
/lib64/libpthread.so.0 libpthread:pthread_create
/lib64/libpthread.so.0 libpthread:pthread_join
/lib64/libpthread.so.0 libpthread:pthread_join_ret
/lib64/libpthread.so.0 libpthread:mutex_init
... more output truncated
$ tplist -l c # list probes in libc
/lib64/libc.so.6 libc:setjmp
/lib64/libc.so.6 libc:longjmp
/lib64/libc.so.6 libc:longjmp_target
/lib64/libc.so.6 libc:memory_arena_reuse_free_list
/lib64/libc.so.6 libc:memory_heap_new
... more output truncated
tplist also understands kernel tracepoints, and can list their format
as well. For example, let's look for all block I/O-related tracepoints:
# tplist 'block*'
block:block_touch_buffer
block:block_dirty_buffer
block:block_rq_abort
block:block_rq_requeue
block:block_rq_complete
block:block_rq_insert
block:block_rq_issue
block:block_bio_bounce
block:block_bio_complete
block:block_bio_backmerge
block:block_bio_frontmerge
block:block_bio_queue
block:block_getrq
block:block_sleeprq
block:block_plug
block:block_unplug
block:block_split
block:block_bio_remap
block:block_rq_remap
The block:block_rq_complete tracepoint sounds interesting. Let's print
its format to see what we can trace with argdist and trace:
$ tplist -v block:block_rq_complete
block:block_rq_complete
dev_t dev;
sector_t sector;
unsigned int nr_sector;
int errors;
char rwbs[8];
The dev, sector, nr_sector, etc. variables can now all be used in probes
you specify with argdist or trace.
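The filters used in the examples above ('block*' and '*loop_iter') are shell-style wildcards matched against the full "category:event" or "provider:probe" name with Python's fnmatch module. The following is a minimal sketch of that selection logic; select_probes is a hypothetical name, and the de-duplication step mirrors what tplist does for USDT probes that appear at more than one location.

```python
import fnmatch


def select_probes(names, pattern=None):
    """Return the names matching pattern, keeping first occurrences only.

    names   -- iterable of "category:event" or "provider:probe" strings
    pattern -- shell-style wildcard; None or "" keeps every name
    """
    selected = []
    for name in names:
        if pattern and not fnmatch.fnmatch(name, pattern):
            continue  # filtered out by the wildcard
        if name in selected:
            continue  # same probe seen again, e.g. at another location
        selected.append(name)
    return selected
```

Note that '*' also matches the ':' separator, which is why '*loop_iter' selects basic_usdt:loop_iter without naming the provider.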
USAGE message:
$ tplist -h
usage: tplist.py [-h] [-p PID] [-l LIB] [-v] [filter]
Display kernel tracepoints or USDT probes and their formats.
positional arguments:
filter A filter that specifies which probes/tracepoints to print
optional arguments:
-h, --help show this help message and exit
-p PID, --pid PID List USDT probes in the specified process
-l LIB, --lib LIB List USDT probes in the specified library or executable
-v Print the format (available variables)
......@@ -9,7 +9,7 @@
# Licensed under the Apache License, Version 2.0 (the "License")
# Copyright (C) 2016 Sasha Goldshtein.
from bcc import BPF, Tracepoint, Perf
from bcc import BPF, Tracepoint, Perf, USDTReader
from time import sleep, strftime
import argparse
import re
......@@ -49,12 +49,14 @@ class Probe(object):
event_count = 0
first_ts = 0
use_localtime = True
pid = -1
@classmethod
def configure(cls, args):
cls.max_events = args.max_events
cls.use_localtime = not args.offset
cls.first_ts = Time.monotonic_time()
cls.pid = args.pid or -1
def __init__(self, probe, string_size):
self.raw_probe = probe
......@@ -63,18 +65,18 @@ class Probe(object):
self._parse_probe()
self.probe_num = Probe.probe_count
self.probe_name = "probe_%s_%d" % \
(self.function, self.probe_num)
(self._display_function(), self.probe_num)
def __str__(self):
return "%s:%s`%s FLT=%s ACT=%s/%s" % (self.probe_type,
self.library, self.function, self.filter,
return "%s:%s:%s FLT=%s ACT=%s/%s" % (self.probe_type,
self.library, self._display_function(), self.filter,
self.types, self.values)
def is_default_action(self):
return self.python_format == ""
def _bail(self, error):
raise ValueError("error parsing probe '%s': %s" %
raise ValueError("error in probe '%s': %s" %
(self.raw_probe, error))
def _parse_probe(self):
......@@ -124,11 +126,11 @@ class Probe(object):
parts = ["p", parts[0], parts[1]]
if len(parts[0]) == 0:
self.probe_type = "p"
elif parts[0] in ["p", "r", "t"]:
elif parts[0] in ["p", "r", "t", "u"]:
self.probe_type = parts[0]
else:
self._bail("expected '', 'p', 't', or 'r', got '%s'" %
parts[0])
self._bail("probe type must be '', 'p', 't', 'r', " +
"or 'u', but got '%s'" % parts[0])
if self.probe_type == "t":
self.tp_category = parts[1]
self.tp_event = parts[2]
......@@ -136,10 +138,39 @@ class Probe(object):
self.tp_category, self.tp_event)
self.library = "" # kernel
self.function = "perf_trace_%s" % self.tp_event
elif self.probe_type == "u":
self.library = parts[1]
self.usdt_name = parts[2]
self.function = "" # no function, just address
# We will discover the USDT provider by matching on
# the USDT name in the specified library
self._find_usdt_probe()
self._enable_usdt_probe()
else:
self.library = parts[1]
self.function = parts[2]
def _enable_usdt_probe(self):
if self.usdt.need_enable():
if Probe.pid == -1:
self._bail("probe needs pid to enable")
self.usdt.enable(Probe.pid)
def _disable_usdt_probe(self):
if self.probe_type == "u" and self.usdt.need_enable():
self.usdt.disable(Probe.pid)
def close(self):
self._disable_usdt_probe()
def _find_usdt_probe(self):
reader = USDTReader(bin_path=self.library)
for probe in reader.probes:
if probe.name == self.usdt_name:
self.usdt = probe
return
self._bail("unrecognized USDT probe %s" % self.usdt_name)
def _parse_filter(self, filt):
self.filter = self._replace_args(filt)
......@@ -187,6 +218,10 @@ class Probe(object):
def _replace_args(self, expr):
for alias, replacement in Probe.aliases.items():
# For USDT probes, we replace argN values with the
# actual arguments for that probe.
if alias.startswith("arg") and self.probe_type == "u":
continue
expr = expr.replace(alias, replacement)
return expr
......@@ -206,7 +241,7 @@ class Probe(object):
def _generate_python_data_decl(self):
self.python_struct_name = "%s_%d_Data" % \
(self.function, self.probe_num)
(self._display_function(), self.probe_num)
fields = [
("timestamp_ns", ct.c_ulonglong),
("pid", ct.c_uint),
......@@ -266,21 +301,16 @@ BPF_PERF_OUTPUT(%s);
bpf_probe_read(&__data.v%d, sizeof(__data.v%d), (void *)%s);
}
""" % (expr, idx, idx, expr)
# return ("bpf_probe_read(&__data.v%d, " + \
# "sizeof(__data.v%d), (char*)%s);\n") % (idx, idx, expr)
# return ("__builtin_memcpy(&__data.v%d, (void *)%s, " + \
# "sizeof(__data.v%d));\n") % (idx, expr, idx)
if field_type in Probe.fmt_types:
return " __data.v%d = (%s)%s;\n" % \
(idx, Probe.c_type[field_type], expr)
self._bail("unrecognized field type %s" % field_type)
def generate_program(self, pid, include_self):
def generate_program(self, include_self):
data_decl = self._generate_data_decl()
self.pid = pid
# kprobes don't have built-in pid filters, so we have to add
# it to the function body:
if len(self.library) == 0 and pid != -1:
if len(self.library) == 0 and Probe.pid != -1:
pid_filter = """
u32 __pid = bpf_get_current_pid_tgid();
if (__pid != %d) { return 0; }
......@@ -293,17 +323,23 @@ BPF_PERF_OUTPUT(%s);
else:
pid_filter = ""
data_fields = ""
for i, expr in enumerate(self.values):
data_fields += self._generate_field_assign(i)
prefix = ""
qualifier = ""
signature = "struct pt_regs *ctx"
if self.probe_type == "t":
data_decl += self.tp.generate_struct()
prefix = self.tp.generate_get_struct()
elif self.probe_type == "u":
signature += ", int __loc_id"
prefix = self.usdt.generate_usdt_cases()
qualifier = "static inline"
data_fields = ""
for i, expr in enumerate(self.values):
data_fields += self._generate_field_assign(i)
text = """
int %s(struct pt_regs *ctx)
%s int %s(%s)
{
%s
%s
......@@ -318,9 +354,14 @@ int %s(struct pt_regs *ctx)
return 0;
}
"""
text = text % (self.probe_name, pid_filter, prefix,
self.filter, self.struct_name,
data_fields, self.events_name)
text = text % (qualifier, self.probe_name, signature,
pid_filter, prefix, self.filter,
self.struct_name, data_fields, self.events_name)
if self.probe_type == "u":
self.usdt_thunk_names = []
text += self.usdt.generate_usdt_thunks(
self.probe_name, self.usdt_thunk_names)
return data_decl + "\n" + text
......@@ -329,10 +370,12 @@ int %s(struct pt_regs *ctx)
return "%.6f" % (1e-9 * (timestamp_ns - cls.first_ts))
def _display_function(self):
if self.probe_type != 't':
if self.probe_type == 'p' or self.probe_type == 'r':
return self.function
else:
return self.function.replace("perf_trace_", "")
elif self.probe_type == 'u':
return self.usdt_name
else: # self.probe_type == 't'
return self.tp_event
def print_event(self, cpu, data, size):
# Cast as the generated structure type and display
......@@ -361,39 +404,40 @@ int %s(struct pt_regs *ctx)
bpf[self.events_name].open_perf_buffer(self.print_event)
def _attach_k(self, bpf):
kprobes_start = len(BPF.open_kprobes())
if self.probe_type == "r":
bpf.attach_kretprobe(event=self.function,
fn_name=self.probe_name)
elif self.probe_type == "p" or self.probe_type == "t":
bpf.attach_kprobe(event=self.function,
fn_name=self.probe_name)
if len(BPF.open_kprobes()) != kprobes_start + 1:
self._bail("error attaching probe")
def _attach_u(self, bpf):
libpath = BPF.find_library(self.library)
if libpath is None:
# This might be an executable (e.g. 'bash')
with os.popen("/usr/bin/which %s 2>/dev/null" %
with os.popen(
"/usr/bin/which --skip-alias %s 2>/dev/null" %
self.library) as w:
libpath = w.read().strip()
if libpath is None or len(libpath) == 0:
self._bail("unable to find library %s" % self.library)
uprobes_start = len(BPF.open_uprobes())
if self.probe_type == "r":
if self.probe_type == "u":
for i, location in enumerate(self.usdt.locations):
bpf.attach_uprobe(name=libpath,
addr=location.address,
fn_name=self.usdt_thunk_names[i],
pid=Probe.pid)
elif self.probe_type == "r":
bpf.attach_uretprobe(name=libpath,
sym=self.function,
fn_name=self.probe_name,
pid=self.pid)
pid=Probe.pid)
else:
bpf.attach_uprobe(name=libpath,
sym=self.function,
fn_name=self.probe_name,
pid=self.pid)
if len(BPF.open_uprobes()) != uprobes_start + 1:
self._bail("error attaching probe")
pid=Probe.pid)
class Tool(object):
examples = """
......@@ -419,6 +463,8 @@ trace 'r:c:malloc (retval) "allocated = %p", retval
Trace returns from malloc and print non-NULL allocated buffers
trace 't:block:block_rq_complete "sectors=%d", tp.nr_sector'
Trace the block_rq_complete kernel tracepoint and print # of tx sectors
trace 'u:pthread:pthread_create (arg4 != 0)'
Trace the USDT probe pthread_create when its 4th argument is non-zero
"""
def __init__(self):
......@@ -461,7 +507,7 @@ trace 't:block:block_rq_complete "sectors=%d", tp.nr_sector'
self.program += Tracepoint.generate_entry_probe()
for probe in self.probes:
self.program += probe.generate_program(
self.args.pid or -1, self.args.include_self)
self.args.include_self)
if self.args.verbose:
print(self.program)
......@@ -486,6 +532,12 @@ trace 't:block:block_rq_complete "sectors=%d", tp.nr_sector'
while True:
self.bpf.kprobe_poll()
def _close_probes(self):
for probe in self.probes:
probe.close()
if self.args.verbose:
print("closed probe: " + str(probe))
def run(self):
try:
self._create_probes()
......@@ -497,6 +549,7 @@ trace 't:block:block_rq_complete "sectors=%d", tp.nr_sector'
traceback.print_exc()
elif sys.exc_type is not SystemExit:
print(sys.exc_value)
self._close_probes()
if __name__ == "__main__":
Tool().run()
......@@ -171,4 +171,6 @@ trace 'r:c:malloc (retval) "allocated = %p", retval
Trace returns from malloc and print non-NULL allocated buffers
trace 't:block:block_rq_complete "sectors=%d", tp.nr_sector'
Trace the block_rq_complete kernel tracepoint and print # of tx sectors
trace 'u:pthread:pthread_create (arg4 != 0)'
Trace the USDT probe pthread_create when its 4th argument is non-zero