Commit a466c469 authored by Sasha Goldshtein's avatar Sasha Goldshtein Committed by 4ast

funclatency: user functions support (#733)

* bcc: Allow custom bucket formatting for histogram keys

When histogram keys are complex structures (`ct.Structure`
created from C struct), they can't be used as dictionary keys
for counting purposes without a custom hashing function.
Allow the user to provide such hashing function when calling
`print_log_hist` to print the histogram.

* bcc: Allow regular expression in attach_uretprobe

Similarly to `attach_uprobe`, `attach_uretprobe` now supports
taking a regular expression.

* funclatency: Support user functions

funclatency now supports user functions (including regular
expressions) in addition to kernel functions. When multiple
processes are traced, the output is always per-function, per-
process. When a single process is traced, the output can be
combined for all traced functions (as with kernel functions).

Usage examples:

```
funclatency pthread:*mutex* -p 6449 -F
funclatency c:read
funclatency dd:* -p $(pidof dd) -uF
```
parent db287f59
......@@ -95,7 +95,7 @@ Examples:
- tools/[fileslower](tools/fileslower.py): Trace slow synchronous file reads and writes. [Examples](tools/fileslower_example.txt).
- tools/[filetop](tools/filetop.py): File reads and writes by filename and process. Top for files. [Examples](tools/filetop_example.txt).
- tools/[funccount](tools/funccount.py): Count kernel function calls. [Examples](tools/funccount_example.txt).
- tools/[funclatency](tools/funclatency.py): Time kernel functions and show their latency distribution. [Examples](tools/funclatency_example.txt).
- tools/[funclatency](tools/funclatency.py): Time functions and show their latency distribution. [Examples](tools/funclatency_example.txt).
- tools/[gethostlatency](tools/gethostlatency.py): Show latency for getaddrinfo/gethostbyname[2] calls. [Examples](tools/gethostlatency_example.txt).
- tools/[hardirqs](tools/hardirqs.py): Measure hard IRQ (hard interrupt) event time. [Examples](tools/hardirqs_example.txt).
- tools/[killsnoop](tools/killsnoop.py): Trace signals issued by the kill() syscall. [Examples](tools/killsnoop_example.txt).
......
.TH funclatency 8 "2015-08-18" "USER COMMANDS"
.SH NAME
funclatency \- Time kernel functions and print latency as a histogram.
funclatency \- Time functions and print latency as a histogram.
.SH SYNOPSIS
.B funclatency [\-h] [\-p PID] [\-i INTERVAL] [\-T] [\-u] [\-m] [\-r] [\-F] pattern
.B funclatency [\-h] [\-p PID] [\-i INTERVAL] [\-T] [\-u] [\-m] [\-F] [\-r] [\-v] pattern
.SH DESCRIPTION
This tool traces kernel function calls and times their duration (latency), and
This tool traces function calls and times their duration (latency), and
shows the latency distribution as a histogram. The time is measured from when
the function is called to when it returns, and is inclusive of both on-CPU
time and time spent blocked.
......@@ -17,7 +17,7 @@ timestamps will be overwritten, creating dubious output. Try to match single
functions, or groups of functions that run at the same stack layer, and
don't ultimately call each other.
WARNING: This uses dynamic tracing of (what can be many) kernel functions, an
WARNING: This uses dynamic tracing of (what can be many) functions, an
activity that has had issues on some kernel versions (risk of panics or
freezes). Test, and know what you are doing, before use.
......@@ -51,12 +51,19 @@ Print a separate histogram per function matched.
.TP
\-r
Use regular expressions for the search pattern.
.TP
\-v
Print the BPF program (for debugging purposes).
.SH EXAMPLES
.TP
Time the do_sys_open() kernel function, and print the distribution as a histogram:
#
.B funclatency do_sys_open
.TP
Time the read() function in libc across all processes on the system:
#
.B funclatency c:read
.TP
Time vfs_read(), and print the histogram in units of microseconds:
#
.B funclatency \-u vfs_read
......@@ -116,6 +123,6 @@ Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
Brendan Gregg, Sasha Goldshtein
.SH SEE ALSO
funccount(8)
......@@ -700,9 +700,9 @@ class BPF(object):
raise Exception("Failed to detach BPF from uprobe")
self._del_uprobe(ev_name)
def attach_uretprobe(self, name="", sym="", addr=None,
def attach_uretprobe(self, name="", sym="", sym_re="", addr=None,
fn_name="", pid=-1, cpu=0, group_fd=-1):
"""attach_uretprobe(name="", sym="", addr=None, fn_name=""
"""attach_uretprobe(name="", sym="", sym_re="", addr=None, fn_name=""
pid=-1, cpu=0, group_fd=-1)
Run the bpf function denoted by fn_name every time the symbol sym in
......@@ -710,6 +710,13 @@ class BPF(object):
meaning of additional parameters.
"""
if sym_re:
for sym_addr in self._get_user_functions(name, sym_re):
self.attach_uretprobe(name=name, addr=sym_addr,
fn_name=fn_name, pid=pid, cpu=cpu,
group_fd=group_fd)
return
name = str(name)
(path, addr) = BPF._check_path_symbol(name, sym, addr)
......
......@@ -243,16 +243,17 @@ class TableBase(MutableMapping):
return next_key
def print_log2_hist(self, val_type="value", section_header="Bucket ptr",
section_print_fn=None):
section_print_fn=None, bucket_fn=None):
"""print_log2_hist(val_type="value", section_header="Bucket ptr",
section_print_fn=None)
section_print_fn=None, bucket_fn=None)
Prints a table as a log2 histogram. The table must be stored as
log2. The val_type argument is optional, and is a column header.
If the histogram has a secondary key, multiple tables will print
and section_header can be used as a header description for each.
If section_print_fn is not None, it will be passed the bucket value
to format into a string as it sees fit.
to format into a string as it sees fit. If bucket_fn is not None,
it will be used to produce a bucket value for the histogram keys.
"""
if isinstance(self.Key(), ct.Structure):
tmp = {}
......@@ -260,6 +261,8 @@ class TableBase(MutableMapping):
f2 = self.Key._fields_[1][0]
for k, v in self.items():
bucket = getattr(k, f1)
if bucket_fn:
bucket = bucket_fn(bucket)
vals = tmp[bucket] = tmp.get(bucket, [0] * 65)
slot = getattr(k, f2)
vals[slot] = v.value
......
#!/usr/bin/python
# @lint-avoid-python-3-compatibility-imports
#
# funclatency Time kernel funcitons and print latency as a histogram.
# funclatency Time functions and print latency as a histogram.
# For Linux, uses BCC, eBPF.
#
# USAGE: funclatency [-h] [-p PID] [-i INTERVAL] [-T] [-u] [-m] [-r] pattern
# USAGE: funclatency [-h] [-p PID] [-i INTERVAL] [-T] [-u] [-m] [-F] [-r] [-v]
# pattern
#
# Run "funclatency -h" for full usage.
#
# The pattern is a string with optional '*' wildcards, similar to file globbing.
# If you'd prefer to use regular expressions, use the -r option. Matching
# multiple functions is of limited use, since the output has one histogram for
# everything. Future versions should split the output histogram by the function.
# If you'd prefer to use regular expressions, use the -r option.
#
# Currently nested or recursive functions are not supported properly, and
# timestamps will be overwritten, creating dubious output. Try to match single
......@@ -21,7 +20,8 @@
# Copyright (c) 2015 Brendan Gregg.
# Licensed under the Apache License, Version 2.0 (the "License")
#
# 20-Sep-2015 Brendan Gregg Created this.
# 20-Sep-2015 Brendan Gregg Created this.
# 06-Oct-2016 Sasha Goldshtein Added user function support.
from __future__ import print_function
from bcc import BPF
......@@ -31,16 +31,18 @@ import signal
# arguments
examples = """examples:
./funclatency do_sys_open # time the do_sys_open() kenel function
./funclatency do_sys_open # time the do_sys_open() kernel function
./funclatency c:read # time the read() C library function
./funclatency -u vfs_read # time vfs_read(), in microseconds
./funclatency -m do_nanosleep # time do_nanosleep(), in milliseconds
./funclatency -mTi 5 vfs_read # output every 5 seconds, with timestamps
./funclatency -p 181 vfs_read # time process 181 only
./funclatency 'vfs_fstat*' # time both vfs_fstat() and vfs_fstatat()
./funclatency 'c:*printf' # time the *printf family of functions
./funclatency -F 'vfs_r*' # show one histogram per matched function
"""
parser = argparse.ArgumentParser(
description="Time kernel funcitons and print latency as a histogram",
description="Time functions and print latency as a histogram",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=examples)
parser.add_argument("-p", "--pid",
......@@ -57,31 +59,57 @@ parser.add_argument("-F", "--function", action="store_true",
help="show a separate histogram per function")
parser.add_argument("-r", "--regexp", action="store_true",
help="use regular expressions. Default is \"*\" wildcards only.")
parser.add_argument("-v", "--verbose", action="store_true",
help="print the BPF program (for debugging purposes)")
parser.add_argument("pattern",
help="search expression for kernel functions")
help="search expression for functions")
args = parser.parse_args()
pattern = args.pattern
def bail(error):
print("Error: " + error)
exit(1)
parts = args.pattern.split(':')
if len(parts) == 1:
library = None
pattern = args.pattern
elif len(parts) == 2:
library = parts[0]
libpath = BPF.find_library(library) or BPF.find_exe(library)
if not libpath:
bail("can't resolve library %s" % library)
library = libpath
pattern = parts[1]
else:
bail("unrecognized pattern format '%s'" % pattern)
if not args.regexp:
pattern = pattern.replace('*', '.*')
pattern = '^' + pattern + '$'
debug = 0
# define BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/blkdev.h>
typedef struct ip_key {
typedef struct ip_pid {
u64 ip;
u64 pid;
} ip_pid_t;
typedef struct hist_key {
ip_pid_t key;
u64 slot;
} ip_key_t;
} hist_key_t;
BPF_HASH(start, u32);
STORAGE
int trace_func_entry(struct pt_regs *ctx)
{
u32 pid = bpf_get_current_pid_tgid();
u64 pid_tgid = bpf_get_current_pid_tgid();
u32 pid = pid_tgid;
u32 tgid = pid_tgid >> 32;
u64 ts = bpf_ktime_get_ns();
FILTER
......@@ -94,7 +122,9 @@ int trace_func_entry(struct pt_regs *ctx)
int trace_func_return(struct pt_regs *ctx)
{
u64 *tsp, delta;
u32 pid = bpf_get_current_pid_tgid();
u64 pid_tgid = bpf_get_current_pid_tgid();
u32 pid = pid_tgid;
u32 tgid = pid_tgid >> 32;
// calculate delta time
tsp = start.lookup(&pid);
......@@ -112,10 +142,13 @@ int trace_func_return(struct pt_regs *ctx)
}
"""
# do we need to store the IP and pid for each invocation?
need_key = args.function or (library and not args.pid)
# code substitutions
if args.pid:
bpf_text = bpf_text.replace('FILTER',
'if (pid != %s) { return 0; }' % args.pid)
'if (tgid != %s) { return 0; }' % args.pid)
else:
bpf_text = bpf_text.replace('FILTER', '')
if args.milliseconds:
......@@ -127,22 +160,32 @@ elif args.microseconds:
else:
bpf_text = bpf_text.replace('FACTOR', '')
label = "nsecs"
if args.function:
if need_key:
bpf_text = bpf_text.replace('STORAGE', 'BPF_HASH(ipaddr, u32);\n' +
'BPF_HISTOGRAM(dist, ip_key_t);')
'BPF_HISTOGRAM(dist, hist_key_t);')
# stash the IP on entry, as on return it's kretprobe_trampoline:
bpf_text = bpf_text.replace('ENTRYSTORE',
'u64 ip = PT_REGS_IP(ctx); ipaddr.update(&pid, &ip);')
pid = '-1' if not library else 'tgid'
bpf_text = bpf_text.replace('STORE',
'u64 ip, *ipp = ipaddr.lookup(&pid); if (ipp) { ip = *ipp; ' +
'dist.increment((ip_key_t){ip, bpf_log2l(delta)}); ' +
'ipaddr.delete(&pid); }')
"""
u64 ip, *ipp = ipaddr.lookup(&pid);
if (ipp) {
ip = *ipp;
hist_key_t key;
key.key.ip = ip;
key.key.pid = %s;
key.slot = bpf_log2l(delta);
dist.increment(key);
ipaddr.delete(&pid);
}
""" % pid)
else:
bpf_text = bpf_text.replace('STORAGE', 'BPF_HISTOGRAM(dist);')
bpf_text = bpf_text.replace('ENTRYSTORE', '')
bpf_text = bpf_text.replace('STORE',
'dist.increment(bpf_log2l(delta));')
if debug:
if args.verbose:
print(bpf_text)
# signal handler
......@@ -151,9 +194,17 @@ def signal_ignore(signal, frame):
# load BPF program
b = BPF(text=bpf_text)
b.attach_kprobe(event_re=pattern, fn_name="trace_func_entry")
b.attach_kretprobe(event_re=pattern, fn_name="trace_func_return")
matched = b.num_open_kprobes()
# attach probes
if not library:
b.attach_kprobe(event_re=pattern, fn_name="trace_func_entry")
b.attach_kretprobe(event_re=pattern, fn_name="trace_func_return")
matched = b.num_open_kprobes()
else:
b.attach_uprobe(name=library, sym_re=pattern, fn_name="trace_func_entry")
b.attach_uretprobe(name=library, sym_re=pattern, fn_name="trace_func_return")
matched = b.num_open_uprobes()
if matched == 0:
print("0 functions matched by \"%s\". Exiting." % args.pattern)
exit()
......@@ -163,6 +214,12 @@ print("Tracing %d functions for \"%s\"... Hit Ctrl-C to end." %
(matched / 2, args.pattern))
# output
def print_section(key):
if not library:
return BPF.sym(key[0], -1)
else:
return "%s [%d]" % (BPF.sym(key[0], key[1]), key[1])
exiting = 0 if args.interval else 1
dist = b.get_table("dist")
while (1):
......@@ -177,8 +234,9 @@ while (1):
if args.timestamp:
print("%-8s\n" % strftime("%H:%M:%S"), end="")
if args.function:
dist.print_log2_hist(label, "Function", BPF.ksym)
if need_key:
dist.print_log2_hist(label, "Function", section_print_fn=print_section,
bucket_fn=lambda k: (k.ip, k.pid))
else:
dist.print_log2_hist(label)
dist.clear()
......
......@@ -47,6 +47,98 @@ decoration. Only the count column is copied to user-level on output. This is an
efficient way to time kernel functions and examine their latency distribution.
Now trace a user function, pthread_mutex_lock in libpthread, to determine if
there is considerable lock contention:
# ./funclatency pthread:pthread_mutex_lock -p $(pidof contentions)
Tracing 1 function for "pthread:pthread_mutex_lock"... Hit Ctrl-C to end.
nsecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 0 | |
64 -> 127 : 0 | |
128 -> 255 : 0 | |
256 -> 511 : 0 | |
512 -> 1023 : 0 | |
1024 -> 2047 : 0 | |
2048 -> 4095 : 508967 |****************************************|
4096 -> 8191 : 70072 |***** |
8192 -> 16383 : 27686 |** |
16384 -> 32767 : 5075 | |
32768 -> 65535 : 2318 | |
65536 -> 131071 : 581 | |
131072 -> 262143 : 38 | |
262144 -> 524287 : 5 | |
524288 -> 1048575 : 1 | |
1048576 -> 2097151 : 9 | |
Detaching...
It seems that most calls to pthread_mutex_lock completed rather quickly (in
under 4us), but there were some cases of considerable contention, sometimes
over a full millisecond.
Run a quick-and-dirty profiler over all the functions in an executable:
# ./funclatency /home/user/primes:* -p $(pidof primes) -F
Tracing 15 functions for "/home/user/primes:*"... Hit Ctrl-C to end.
^C
Function = is_prime [6556]
nsecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 0 | |
64 -> 127 : 0 | |
128 -> 255 : 0 | |
256 -> 511 : 0 | |
512 -> 1023 : 0 | |
1024 -> 2047 : 0 | |
2048 -> 4095 : 1495322 |****************************************|
4096 -> 8191 : 95744 |** |
8192 -> 16383 : 9926 | |
16384 -> 32767 : 3070 | |
32768 -> 65535 : 1415 | |
65536 -> 131071 : 112 | |
131072 -> 262143 : 9 | |
262144 -> 524287 : 3 | |
524288 -> 1048575 : 0 | |
1048576 -> 2097151 : 8 | |
Function = insert_result [6556]
nsecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 0 | |
64 -> 127 : 0 | |
128 -> 255 : 0 | |
256 -> 511 : 0 | |
512 -> 1023 : 0 | |
1024 -> 2047 : 0 | |
2048 -> 4095 : 111047 |****************************************|
4096 -> 8191 : 3998 |* |
8192 -> 16383 : 720 | |
16384 -> 32767 : 238 | |
32768 -> 65535 : 106 | |
65536 -> 131071 : 5 | |
131072 -> 262143 : 4 | |
Detaching...
From the results, we can see that the is_prime function has something resembling
an exponential distribution -- very few primes take a very long time to test,
while most numbers are verified as prime or composite in less than 4us. The
insert_result function exhibits a similar phenomenon, likely due to contention
over a shared results container.
Now vfs_read() is traced, and a microseconds histogram printed:
# ./funclatency -u vfs_read
......@@ -239,13 +331,13 @@ Detaching...
USAGE message:
# ./funclatency -h
usage: funclatency [-h] [-p PID] [-i INTERVAL] [-T] [-u] [-m] [-F] [-r]
usage: funclatency [-h] [-p PID] [-i INTERVAL] [-T] [-u] [-m] [-F] [-r] [-v]
pattern
Time kernel functions and print latency as a histogram
Time functions and print latency as a histogram
positional arguments:
pattern search expression for kernel functions
pattern search expression for functions
optional arguments:
-h, --help show this help message and exit
......@@ -258,12 +350,15 @@ optional arguments:
-F, --function show a separate histogram per function
-r, --regexp use regular expressions. Default is "*" wildcards
only.
-v, --verbose print the BPF program (for debugging purposes)
examples:
./funclatency do_sys_open # time the do_sys_open() kernel function
./funclatency c:read # time the read() C library function
./funclatency -u vfs_read # time vfs_read(), in microseconds
./funclatency -m do_nanosleep # time do_nanosleep(), in milliseconds
./funclatency -mTi 5 vfs_read # output every 5 seconds, with timestamps
./funclatency -p 181 vfs_read # time process 181 only
./funclatency 'vfs_fstat*' # time both vfs_fstat() and vfs_fstatat()
./funclatency 'c:*printf' # time the *printf family of functions
./funclatency -F 'vfs_r*' # show one histogram per matched function
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment