Commit 064b2f51 authored by Brendan Gregg's avatar Brendan Gregg Committed by GitHub

Merge pull request #1078 from goldshtn/funcslower

funcslower: Trace slow kernel or user function calls
parents caa14ed6 9f8b9e93
......@@ -104,6 +104,7 @@ pair of .c and .py files, and some are directories of files.
- tools/[filetop](tools/filetop.py): File reads and writes by filename and process. Top for files. [Examples](tools/filetop_example.txt).
- tools/[funccount](tools/funccount.py): Count kernel function calls. [Examples](tools/funccount_example.txt).
- tools/[funclatency](tools/funclatency.py): Time functions and show their latency distribution. [Examples](tools/funclatency_example.txt).
- tools/[funcslower](tools/funcslower.py): Trace slow kernel or user function calls. [Examples](tools/funcslower_example.txt).
- tools/[gethostlatency](tools/gethostlatency.py): Show latency for getaddrinfo/gethostbyname[2] calls. [Examples](tools/gethostlatency_example.txt).
- tools/[hardirqs](tools/hardirqs.py): Measure hard IRQ (hard interrupt) event time. [Examples](tools/hardirqs_example.txt).
- tools/[killsnoop](tools/killsnoop.py): Trace signals issued by the kill() syscall. [Examples](tools/killsnoop_example.txt).
......
.TH funcslower 8 "2017-03-30" "USER COMMANDS"
.SH NAME
fileslower \- Trace slow kernel or user function calls.
.SH SYNOPSIS
.B fileslower [\-h] [\-p PID] [-m MIN_MS] [-u MIN_US] [-a ARGUMENTS] [-T] [-t] [-v] function [function ...]
.SH DESCRIPTION
This script traces a kernel or user function's entry and return points, and
prints a message when the function's latency exceeded the specified threshold.
Multiple functions are supported, and you can mix kernel functions with user
functions in different libraries.
WARNING: See the OVERHEAD section.
By default, a minimum millisecond threshold of 1 is used. Recursive functions
are not supported: only the inner-most recursive invocation will be traced.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bcc.
.SH OPTIONS
\-p PID
Trace this PID only.
.TP
\-m MIN_NS
Minimum duration to trace, in milliseconds. Default is 1 ms.
.TP
\-u MIN_US
Minimum duration to trace, in microseconds.
.TP
\-a ARGUMENTS
Print the function's arguments, up to 6.
.TP
\-T
Print a HH:MM:SS timestamp with each entry.
.TP
\-t
Print a seconds timestamp with each entry, at microsecond resolution.
.TP
\-v
Print the resulting BPF program, for debugging purposes.
.TP
function
The function to trace -- multiple functions are supported. If a plain function
name is provided, the function is assumed to be a kernel function. For user
functions, provide the library name and the function name, e.g. bash:readline
or c:malloc.
.SH EXAMPLES
.TP
Trace vfs_write calls slower than 1ms:
#
.B funcslower vfs_write
.TP
Trace open() calls in libc slower than 10us:
#
.B funcslower \-u 10 c:open
.TP
Trace both malloc() and free() slower than 10us, in pid 135 only:
#
.B funcslower \-p 135 \-u 10 c:malloc c:free
.TP
Trace the write syscall and print its first 4 arguments:
#
.B funcslower -a 4 SyS_write
.SH FIELDS
.TP
TIME
Time of the event as a human-readable HH:MM:SS format, or a timestamp in seconds
at microsecond-accuracy from the first event seen.
.TP
COMM
Process name.
.TP
PID
Process ID.
.TP
LAT
Latency of the operation in either microseconds (us) or milliseconds (ms).
.TP
RVAL
The return value from the function. Often useful for diagnosing a relationship
between slow and failed function calls.
.TP
FUNC
The function name, followed by its arguments if requested.
.SH OVERHEAD
Depending on the function(s) being traced, overhead can become severe. For
example, tracing a common function like malloc() can slow down a C/C++ program
by a factor of 2 or more. On the other hand, tracing a low-frequency event like
the SyS_setreuid() function will probably not be as prohibitive, and in fact
negligible for functions that are called up to 100-1000 times per second.
You should first use the funclatency and argdist tools for investigation,
because they summarize data in-kernel and have a much lower overhead than this
tool. To get a general idea of the number of times a particular function is
called (and estimate the overhead), use the funccount tool, e.g.:
.PP
# funccount c:open
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.PP
Also look in the bcc distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Sasha Goldshtein
.SH SEE ALSO
funccount(8), funclatency(8), argdist(8), trace(8)
......@@ -164,6 +164,10 @@ class SmokeTests(TestCase):
def test_funclatency(self):
self.run_with_int("funclatency.py __kmalloc -i 1")
@skipUnless(kernel_version_ge(4,4), "requires kernel >= 4.4")
def test_funcslower(self):
self.run_with_int("funcslower.py __kmalloc")
@skipUnless(kernel_version_ge(4,4), "requires kernel >= 4.4")
def test_gethostlatency(self):
self.run_with_int("gethostlatency.py")
......
#!/usr/bin/python
# @lint-avoid-python-3-compatibility-imports
#
# funcslower Trace slow kernel or user function calls.
# For Linux, uses BCC, eBPF.
#
# USAGE: funcslower [-h] [-p PID] [-m MIN_MS] [-u MIN_US] [-a ARGUMENTS]
# [-T] [-t] [-v] function [function ...]
#
# WARNING: This tool traces function calls by instrumenting the entry and
# return from each function. For commonly-invoked functions like memory allocs
# or file writes, this can be extremely expensive. Mind the overhead.
#
# By default, a minimum millisecond threshold of 1 is used.
#
# Copyright 2017, Sasha Goldshtein
# Licensed under the Apache License, Version 2.0 (the "License")
#
# 30-Mar-2017 Sasha Goldshtein Created this.
from __future__ import print_function
from bcc import BPF
import argparse
import ctypes as ct
import time
examples = """examples:
./funcslower vfs_write # trace vfs_write calls slower than 1ms
./funcslower -m 10 vfs_write # same, but slower than 10ms
./funcslower -u 10 c:open # trace open calls slower than 10us
./funcslower -p 135 c:open # trace pid 135 only
./funcslower c:malloc c:free # trace both malloc and free slower than 1ms
./funcslower -a 2 c:open # show first two arguments to open
"""
parser = argparse.ArgumentParser(
description="Trace slow kernel or user function calls.",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=examples)
parser.add_argument("-p", "--pid", type=int, metavar="PID", dest="tgid",
help="trace this PID only")
parser.add_argument("-m", "--min-ms", type=float, dest="min_ms",
help="minimum duration to trace (ms)")
parser.add_argument("-u", "--min-us", type=float, dest="min_us",
help="minimum duration to trace (us)")
parser.add_argument("-a", "--arguments", type=int,
help="print this many entry arguments, as hex")
parser.add_argument("-T", "--time", action="store_true",
help="show HH:MM:SS timestamp")
parser.add_argument("-t", "--timestamp", action="store_true",
help="show timestamp in seconds at us resolution")
parser.add_argument("-v", "--verbose", action="store_true",
help="print the BPF program for debugging purposes")
parser.add_argument(metavar="function", nargs="+", dest="functions",
help="function(s) to trace")
args = parser.parse_args()
# fractions are allowed, but rounded to an integer nanosecond
if args.min_ms:
duration_ns = int(args.min_ms * 1000000)
elif args.min_us:
duration_ns = int(args.min_us * 1000)
else:
duration_ns = 1000000 # default to 1ms
bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/sched.h> // for TASK_COMM_LEN
struct entry_t {
u64 id;
u64 start_ns;
#ifdef GRAB_ARGS
u64 args[6];
#endif
};
struct data_t {
u64 id;
u64 tgid_pid;
u64 start_ns;
u64 duration_ns;
u64 retval;
char comm[TASK_COMM_LEN];
#ifdef GRAB_ARGS
u64 args[6];
#endif
};
BPF_HASH(entryinfo, u64, struct entry_t);
BPF_PERF_OUTPUT(events);
static int trace_entry(struct pt_regs *ctx, int id)
{
u64 tgid_pid = bpf_get_current_pid_tgid();
u32 tgid = tgid_pid >> 32;
if (TGID_FILTER)
return 0;
u32 pid = tgid_pid;
struct entry_t entry = {};
entry.start_ns = bpf_ktime_get_ns();
entry.id = id;
#ifdef GRAB_ARGS
entry.args[0] = PT_REGS_PARM1(ctx);
entry.args[1] = PT_REGS_PARM2(ctx);
entry.args[2] = PT_REGS_PARM3(ctx);
entry.args[3] = PT_REGS_PARM4(ctx);
entry.args[4] = PT_REGS_PARM5(ctx);
entry.args[5] = PT_REGS_PARM6(ctx);
#endif
entryinfo.update(&tgid_pid, &entry);
return 0;
}
int trace_return(struct pt_regs *ctx)
{
struct entry_t *entryp;
u64 tgid_pid = bpf_get_current_pid_tgid();
entryp = entryinfo.lookup(&tgid_pid);
if (entryp == 0) {
return 0;
}
u64 delta_ns = bpf_ktime_get_ns() - entryp->start_ns;
entryinfo.delete(&tgid_pid);
if (delta_ns < DURATION_NS)
return 0;
struct data_t data = {};
data.id = entryp->id;
data.tgid_pid = tgid_pid;
data.start_ns = entryp->start_ns;
data.duration_ns = delta_ns;
data.retval = PT_REGS_RC(ctx);
#ifdef GRAB_ARGS
bpf_probe_read(&data.args[0], sizeof(data.args), entryp->args);
#endif
bpf_get_current_comm(&data.comm, sizeof(data.comm));
events.perf_submit(ctx, &data, sizeof(data));
return 0;
}
"""
bpf_text = bpf_text.replace('DURATION_NS', str(duration_ns))
if args.arguments:
bpf_text = "#define GRAB_ARGS\n" + bpf_text
if args.tgid:
bpf_text = bpf_text.replace('TGID_FILTER', 'tgid != %d' % tgid)
else:
bpf_text = bpf_text.replace('TGID_FILTER', '0')
for i in range(len(args.functions)):
bpf_text += """
int trace_%d(struct pt_regs *ctx) {
return trace_entry(ctx, %d);
}
""" % (i, i)
if args.verbose:
print(bpf_text)
b = BPF(text=bpf_text)
for i, function in enumerate(args.functions):
if ":" in function:
library, func = function.split(":")
b.attach_uprobe(name=library, sym=func, fn_name="trace_%d" % i)
b.attach_uretprobe(name=library, sym=func, fn_name="trace_return")
else:
b.attach_kprobe(event=function, fn_name="trace_%d" % i)
b.attach_kretprobe(event=function, fn_name="trace_return")
TASK_COMM_LEN = 16 # linux/sched.h
class Data(ct.Structure):
_fields_ = [
("id", ct.c_ulonglong),
("tgid_pid", ct.c_ulonglong),
("start_ns", ct.c_ulonglong),
("duration_ns", ct.c_ulonglong),
("retval", ct.c_ulonglong),
("comm", ct.c_char * TASK_COMM_LEN)
] + ([("args", ct.c_ulonglong * 6)] if args.arguments else [])
time_designator = "us" if args.min_us else "ms"
time_value = args.min_us or args.min_ms or 1
time_multiplier = 1000 if args.min_us else 1000000
time_col = args.time or args.timestamp
print("Tracing function calls slower than %g %s... Ctrl+C to quit." %
(time_value, time_designator))
print((("%-10s " % "TIME" if time_col else "") + "%-14s %-6s %7s %16s %s") %
("COMM", "PID", "LAT(%s)" % time_designator, "RVAL",
"FUNC" + (" ARGS" if args.arguments else "")))
earliest_ts = 0
def time_str(event):
if args.time:
return "%-10s " % time.strftime("%H:%M:%S")
if args.timestamp:
global earliest_ts
if earliest_ts == 0:
earliest_ts = event.start_ns
return "%-10.6f " % ((event.start_ns - earliest_ts) / 1000000000.0)
return ""
def args_str(event):
if not args.arguments:
return ""
return str.join(" ", ["0x%x" % arg for arg in event.args[:args.arguments]])
def print_event(cpu, data, size):
event = ct.cast(data, ct.POINTER(Data)).contents
ts = float(event.duration_ns) / time_multiplier
print((time_str(event) + "%-14.14s %-6s %7.2f %16x %s %s") %
(event.comm.decode(), event.tgid_pid >> 32,
ts, event.retval, args.functions[event.id], args_str(event)))
b["events"].open_perf_buffer(print_event, page_cnt=64)
while True:
b.kprobe_poll()
Demonstrations of funcslower, the Linux eBPF/bcc version.
funcslower shows kernel or user function invocations slower than a threshold.
This can be used for last-resort diagnostics when aggregation-based tools have
failed. For example, trace the open() function in libc when it is slower than
1 microsecond (us):
# ./funcslower c:open -u 1
Tracing function calls slower than 1 us... Ctrl+C to quit.
COMM PID LAT(us) RVAL FUNC
less 27074 33.77 3 c:open
less 27074 9.96 ffffffffffffffff c:open
less 27074 5.92 ffffffffffffffff c:open
less 27074 15.88 ffffffffffffffff c:open
less 27074 8.89 3 c:open
less 27074 15.89 3 c:open
sh 27075 20.97 4 c:open
bash 27075 20.14 4 c:open
lesspipe.sh 27075 18.77 4 c:open
lesspipe.sh 27075 11.21 4 c:open
lesspipe.sh 27075 13.68 4 c:open
file 27076 14.83 ffffffffffffffff c:open
file 27076 8.02 4 c:open
file 27076 10.26 4 c:open
file 27076 6.55 4 c:open
less 27074 11.67 4 c:open
^C
This shows several open operations performed by less and some helpers it invoked
in the process. The latency (in microseconds) is shown, as well as the return
value from the open() function, which helps indicate if there is a correlation
between failures and slow invocations. Most open() calls seemed to have
completed successfully (returning a valid file descriptor), but some have failed
and returned -1.
You can also trace kernel functions:
# ./funcslower -m 10 vfs_read
Tracing function calls slower than 10 ms... Ctrl+C to quit.
COMM PID LAT(ms) RVAL FUNC
bash 11527 78.97 1 vfs_read
bash 11527 101.26 1 vfs_read
bash 11527 1053.60 1 vfs_read
bash 11527 44.21 1 vfs_read
bash 11527 79.50 1 vfs_read
bash 11527 33.37 1 vfs_read
bash 11527 112.17 1 vfs_read
bash 11527 101.49 1 vfs_read
^C
Occasionally, it is also useful to see the arguments passed to the functions.
The raw hex values of the arguments are available when using the -a switch:
# ./funcslower __kmalloc -a 2 -u 1
Tracing function calls slower than 1 us... Ctrl+C to quit.
COMM PID LAT(us) RVAL FUNC ARGS
kworker/0:2 27077 7.46 ffff90054f9f8e40 __kmalloc 0x98 0x1400000
kworker/0:2 27077 6.84 ffff90054f9f8e40 __kmalloc 0x98 0x1400000
bash 11527 6.87 ffff90054f9f8e40 __kmalloc 0x90 0x1408240
bash 11527 1.15 ffff90054f9f8e40 __kmalloc 0x90 0x1408240
bash 11527 1.15 ffff90055a1b8c00 __kmalloc 0x2c 0x1400240
bash 11527 1.18 ffff90054b87d240 __kmalloc 0x1c 0x1400040
bash 11527 10.59 ffff900546d60000 __kmalloc 0x10000 0x14082c0
bash 11527 1.49 ffff90054fbd4c00 __kmalloc 0x280 0x15080c0
bash 11527 1.00 ffff90054789b000 __kmalloc 0x800 0x15012c0
bash 27128 3.47 ffff90057ca1a200 __kmalloc 0x150 0x1400240
bash 27128 1.82 ffff90054fbd4c00 __kmalloc 0x230 0x14000c0
bash 27128 1.17 ffff90054b87d5a0 __kmalloc 0x1c 0x14000c0
perf 27128 4.81 ffff90054f9f8e40 __kmalloc 0x90 0x1408240
perf 27128 24.71 ffff900566990000 __kmalloc 0x10000 0x14082c0
^C
This shows the first two arguments to __kmalloc -- the first one is the size
of the requested allocation. The return value is also shown (null return values
would indicate a failure).
USAGE message:
usage: funcslower.py [-h] [-p PID] [-m MIN_MS] [-u MIN_US] [-a ARGUMENTS] [-T]
[-t] [-v]
function [function ...]
Trace slow kernel or user function calls.
positional arguments:
function function(s) to trace
optional arguments:
-h, --help show this help message and exit
-p PID, --pid PID trace this PID only
-m MIN_MS, --min-ms MIN_MS
minimum duration to trace (ms)
-u MIN_US, --min-us MIN_US
minimum duration to trace (us)
-a ARGUMENTS, --arguments ARGUMENTS
print this many entry arguments, as hex
-T, --time show HH:MM:SS timestamp
-t, --timestamp show timestamp in seconds at us resolution
-v, --verbose print the BPF program for debugging purposes
examples:
./funcslower vfs_write # trace vfs_write calls slower than 1ms
./funcslower -m 10 vfs_write # same, but slower than 10ms
./funcslower -u 10 c:open # trace open calls slower than 10us
./funcslower -p 135 c:open # trace pid 135 only
./funcslower c:malloc c:free # trace both malloc and free slower than 1ms
./funcslower -a 2 c:open # show first two arguments to open
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment