Commit fd60d55c authored by Sasha Goldshtein's avatar Sasha Goldshtein

tracepoint support for argdist and trace, and new tplist tool for printing tracepoints

parent f6bf78f5
......@@ -108,6 +108,7 @@ Examples:
- tools/[tcpconnect](tools/tcpconnect.py): Trace TCP active connections (connect()). [Examples](tools/tcpconnect_example.txt).
- tools/[tcpconnlat](tools/tcpconnlat.py): Trace TCP active connection latency (connect()). [Examples](tools/tcpconnlat_example.txt).
- tools/[tcpretrans](tools/tcpretrans.py): Trace TCP retransmits and TLPs. [Examples](tools/tcpretrans_example.txt).
- tools/[tplist](tools/tplist.py): Display kernel tracepoints and their format.
- tools/[trace](tools/trace.py): Trace arbitrary functions, with filters. [Examples](tools/trace_example.txt)
- tools/[vfscount](tools/vfscount.py) tools/[vfscount.c](tools/vfscount.c): Count VFS calls. [Examples](tools/vfscount_example.txt).
- tools/[vfsstat](tools/vfsstat.py) tools/[vfsstat.c](tools/vfsstat.c): Count some VFS calls, with column output. [Examples](tools/vfsstat_example.txt).
......
......@@ -50,11 +50,11 @@ many cases, argdist will deduce the necessary header files automatically.
.SH SPECIFIER SYNTAX
The general specifier syntax is as follows:
.B {p,r}:[library]:function(signature)[:type[,type...]:expr[,expr...][:filter]][#label]
.B {p,r,t}:{[library],category}:function(signature)[:type[,type...]:expr[,expr...][:filter]][#label]
.TP
.B {p,r}
Probe type \- "p" for function entry, "r" for function return;
\-H for histogram collection, \-C for frequency count.
.B {p,r,t}
Probe type \- "p" for function entry, "r" for function return, "t" for kernel
tracepoint; \-H for histogram collection, \-C for frequency count.
Indicates where to place the probe and whether the probe should collect frequency
count information, or aggregate the collected values into a histogram. Counting
probes will collect the number of times every parameter value was observed,
......@@ -68,12 +68,17 @@ Specify the full path to the .so or executable file where the function to probe
resides. Alternatively, you can specify just the lib name: for example, "c"
refers to libc. If no library name is specified, the kernel is assumed.
.TP
.B category
The category of the kernel tracepoint. For example: net, sched, block.
.TP
.B function(signature)
The function to probe, and its signature.
The function name must match exactly for the probe to be placed. The signature,
on the other hand, is only required if you plan to collect parameter values
based on that signature. For example, if you only want to collect the first
parameter, you don't have to specify the rest of the parameters in the signature.
When capturing kernel tracepoints, this should be the name of the event, e.g.
net_dev_start_xmit. The signature for kernel tracepoints should be empty.
.TP
.B [type[,type...]]
The type(s) of the expression(s) to capture.
......@@ -85,6 +90,9 @@ The expression(s) to capture.
These are the values that are assigned to the histogram or raw event collection.
You may use the parameters directly, or valid C expressions that involve the
parameters, such as "size % 10".
Tracepoints may access a special structure called "tp" that is formatted according
to the tracepoint format (which you can obtain using tplist). For example, the
block:block_rq_complete tracepoint can access tp.nr_sector.
Return probes can use the argument values received by the
function when it was entered, through the $entry(paramname) special variable.
Return probes can also access the function's return value in $retval, and the
......@@ -137,6 +145,14 @@ Count fork() calls in libc across all processes, grouped by pid:
#
.B argdist -C 'p:c:fork():int:$PID;fork per process'
.TP
Print histogram of number of sectors in completing block I/O requests:
#
.B argdist -H 't:block:block_rq_complete():u32:tp.nr_sector'
.TP
Aggregate interrupts by interrupt request (IRQ):
#
.B argdist -C 't:irq:irq_handler_entry():int:tp.irq'
.TP
Print histograms of sleep() and nanosleep() parameter values:
#
.B argdist -H 'p:c:sleep(u32 seconds):u32:seconds' 'p:c:nanosleep(struct timespec *req):long:req->tv_nsec'
......
.TH tplist 8 "2016-03-20" "USER COMMANDS"
.SH NAME
tplist \- Display kernel tracepoints and their format.
.SH SYNOPSIS
.B tplist [-v] [tracepoint]
.SH DESCRIPTION
tplist lists all kernel tracepoints, and can optionally print out the tracepoint
format; namely, the variables that you can trace when the tracepoint is hit. This
is usually used in conjunction with the argdist and/or trace tools.
On a typical system, accessing the tracepoint list and format requires root.
.SH OPTIONS
.TP
\-v
Display the variables associated with the tracepoint or tracepoints.
.TP
[tracepoint]
A wildcard expression that specifies which tracepoints to print. For example,
block:* will print all block tracepoints (block:block_rq_complete, etc.).
Regular expressions are not supported.
.SH EXAMPLES
.TP
Print all kernel tracepoints:
#
.B tplist
.TP
Print all net tracepoints with their format:
#
.B tplist -v 'net:*'
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Sasha Goldshtein
......@@ -45,10 +45,12 @@ information. See PROBE SYNTAX below.
The general probe syntax is as follows:
.B [{p,r}]:[library]:function [(predicate)] ["format string"[, arguments]]
.B t:category:event [(predicate)] ["format string"[, arguments]]
.TP
.B [{p,r}]
Probe type \- "p" for function entry, "r" for function return. The default
probe type is "p".
.B {[{p,r}],t}
Probe type \- "p" for function entry, "r" for function return, "t" for kernel
tracepoint. The default probe type is "p".
.TP
.B [library]
Library containing the probe.
......@@ -58,9 +60,15 @@ refers to libc. If no library name is specified, the kernel is assumed. Also,
you can specify an executable name (without a full path) if it is in the PATH.
For example, "bash".
.TP
.B category
The tracepoint category. For example, "sched" or "irq".
.TP
.B function
The function to probe.
.TP
.B event
The tracepoint event. For example, "block_rq_complete".
.TP
.B [(predicate)]
The filter applied to the captured data. Only if the filter evaluates as true,
the trace message will be printed. The filter can use any valid C expression
......@@ -81,6 +89,11 @@ number of arguments as there are placeholders in the format string. The
format specifier replacements may be any C expressions, and may refer to the
same special keywords as in the predicate (arg1, arg2, etc.).
In tracepoints, both the predicate and the arguments may refer to the tracepoint
format structure, which is stored in the special "tp" variable. For example, the
block:block_rq_complete tracepoint can print or filter by tp.nr_sector. To
discover the format of your tracepoint, use the tplist tool.
The predicate expression and the format specifier replacements for printing
may also use the following special keywords: $pid, $tgid to refer to the
current process' pid and tgid; $uid, $gid to refer to the current user's
......@@ -102,6 +115,10 @@ Trace all malloc calls and print the size of the requested allocation:
Trace returns from the readline function in bash and print the return value as a string:
#
.B trace 'r:bash:readline """%s"", retval'
.TP
Trace the block:block_rq_complete tracepoint and print the number of sectors completed:
#
.B trace 't:block:block_rq_complete """%d sectors"", tp.nr_sector'
.SH SOURCE
This is from bcc.
.IP
......
......@@ -325,6 +325,11 @@ class BPF(object):
global open_kprobes
return open_kprobes
@staticmethod
def open_uprobes():
global open_uprobes
return open_uprobes
@staticmethod
def detach_kprobe(event):
ev_name = "p_" + event.replace("+", "_").replace(".", "_")
......
This diff is collapsed.
......@@ -262,6 +262,27 @@ p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size
The flags value must be expanded by hand, but it's still helpful to eliminate
certain kinds of allocations or visually group them together.
argdist also has basic support for kernel tracepoints. It is sometimes more
convenient to use tracepoints because they are documented and don't vary a lot
between kernel versions like function signatures tend to. For example, let's
trace the net:net_dev_start_xmit tracepoint and print the interface name that
is transmitting:
# argdist -C 't:net:net_dev_start_xmit(void *a, void *b, struct net_device *c):char*:c->name' -n 2
[05:01:10]
t:net:net_dev_start_xmit(void *a, void *b, struct net_device *c):char*:c->name
COUNT EVENT
4 c->name = eth0
[05:01:11]
t:net:net_dev_start_xmit(void *a, void *b, struct net_device *c):char*:c->name
COUNT EVENT
6 c->name = lo
92 c->name = eth0
Note that to determine the necessary function signature you need to look at the
TP_PROTO declaration in the kernel headers. For example, the net_dev_start_xmit
tracepoint is defined in the include/trace/events/net.h header file.
Here's a final example that finds how many write() system calls are performed
by each process on the system:
......@@ -311,13 +332,13 @@ optional arguments:
additional header files to include in the BPF program
Probe specifier syntax:
{p,r}:[library]:function(signature)[:type[,type...]:expr[,expr...][:filter]][#label]
{p,r,t}:{[library],category}:function(signature)[:type[,type...]:expr[,expr...][:filter]][#label]
Where:
p,r -- probe at function entry or at function exit
p,r,t -- probe at function entry, function exit, or kernel tracepoint
in exit probes: can use $retval, $entry(param), $latency
library -- the library that contains the function
(leave empty for kernel functions)
function -- the function name to trace
category -- the category of the kernel tracepoint (e.g. net, sched)
signature -- the function's parameters, as in the C header
type -- the type of the expression to collect (supports multiple)
expr -- the expression to collect (supports multiple)
......@@ -365,6 +386,12 @@ argdist -C 'p:c:fork()#fork calls'
Count fork() calls in libc across all processes
Can also use funccount.py, which is easier and more flexible
argdist -H 't:block:block_rq_complete():u32:tp.nr_sector'
Print histogram of number of sectors in completing block I/O requests
argdist -C 't:irq:irq_handler_entry():int:tp.irq'
Aggregate interrupts by interrupt request (IRQ)
argdist -H \
'p:c:sleep(u32 seconds):u32:seconds' \
'p:c:nanosleep(struct timespec *req):long:req->tv_nsec'
......
#!/usr/bin/env python
#
# tplist Display kernel tracepoints and their formats.
#
# USAGE: tplist [-v] [tracepoint]
#
# Licensed under the Apache License, Version 2.0 (the "License")
# Copyright (C) 2016 Sasha Goldshtein.
import argparse
import fnmatch
import re
import os
trace_root = "/sys/kernel/debug/tracing"
event_root = os.path.join(trace_root, "events")
parser = argparse.ArgumentParser(description=
"Display kernel tracepoints and their formats.",
formatter_class=argparse.RawDescriptionHelpFormatter)
parser.add_argument("-v", dest="variables", action="store_true", help=
"Print the format (available variables) for each tracepoint")
parser.add_argument(dest="tracepoint", nargs="?",
help="The tracepoint name to print (wildcards allowed)")
args = parser.parse_args()
def print_tpoint_format(category, event):
fmt = open(os.path.join(event_root, category, event, "format")
).readlines()
for line in fmt:
match = re.search(r'field:([^;]*);', line)
if match is None:
continue
parts = match.group(1).split()
field_name = parts[-1:][0]
field_type = " ".join(parts[:-1])
if "__data_loc" in field_type:
continue
if field_name.startswith("common_"):
continue
print(" %s %s;" % (field_type, field_name))
def print_tpoint(category, event):
tpoint = "%s:%s" % (category, event)
if not args.tracepoint or fnmatch.fnmatch(tpoint, args.tracepoint):
print(tpoint)
if args.variables:
print_tpoint_format(category, event)
def print_all():
for category in os.listdir(event_root):
cat_dir = os.path.join(event_root, category)
if not os.path.isdir(cat_dir):
continue
for event in os.listdir(cat_dir):
evt_dir = os.path.join(cat_dir, event)
if os.path.isdir(evt_dir):
print_tpoint(category, event)
if __name__ == "__main__":
print_all()
This diff is collapsed.
......@@ -80,6 +80,31 @@ Note that the retval variable must be cast to int before comparing to zero.
The reason is that the default type for argN and retval is an unsigned 64-bit
integer, which can never be smaller than 0.
trace has also some basic support for kernel tracepoints. For example, let's
trace the block:block_rq_complete tracepoint and print out the number of sectors
transferred:
# trace 't:block:block_rq_complete "sectors=%d", tp.nr_sector'
TIME PID COMM FUNC -
01:23:51 0 swapper/0 block_rq_complete sectors=8
01:23:55 10017 kworker/u64: block_rq_complete sectors=1
01:23:55 0 swapper/0 block_rq_complete sectors=8
^C
To discover the tracepoint structure format (which you can refer to as the "tp"
variable), use the tplist tool. For example:
# tplist -v block:block_rq_complete
block:block_rq_complete
dev_t dev;
sector_t sector;
unsigned int nr_sector;
int errors;
char rwbs[8];
This output tells you that you can use "tp.dev", "tp.sector", etc. in your
predicate and trace arguments.
As a final example, let's trace open syscalls for a specific process. By
default, tracing is system-wide, but the -p switch overrides this:
......@@ -144,4 +169,6 @@ trace 'r::__kmalloc (retval == 0) "kmalloc failed!"
Trace returns from __kmalloc which returned a null pointer
trace 'r:c:malloc (retval) "allocated = %p", retval
Trace returns from malloc and print non-NULL allocated buffers
trace 't:block:block_rq_complete "sectors=%d", tp.nr_sector'
Trace the block_rq_complete kernel tracepoint and print # of tx sectors
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment