Commit b712c66a authored by 4ast

Merge pull request #383 from brendangregg/master

3 tools: tcpretrans, zfsslower, zfsdist
parents fb7ae106 157fee3f
@@ -72,10 +72,10 @@ Tools:
 - tools/[biosnoop](tools/biosnoop.py): Trace block device I/O with PID and latency. [Examples](tools/biosnoop_example.txt).
 - tools/[bitesize](tools/bitesize.py): Show per process I/O size histogram. [Examples](tools/bitesize_example.txt).
 - tools/[cachestat](tools/cachestat.py): Trace page cache hit/miss ratio. [Examples](tools/cachestat_example.txt).
-- tools/[execsnoop](tools/execsnoop.py): Trace new processes via exec() syscalls. [Examples](tools/execsnoop_example.txt).
 - tools/[dcsnoop](tools/dcsnoop.py): Trace directory entry cache (dcache) lookups. [Examples](tools/dcsnoop_example.txt).
 - tools/[dcstat](tools/dcstat.py): Directory entry cache (dcache) stats. [Examples](tools/dcstat_example.txt).
-- tools/[ext4dist](tools/ext4dist.py): Summarize ext4 operation latency. [Examples](tools/ext4dist_example.txt).
+- tools/[execsnoop](tools/execsnoop.py): Trace new processes via exec() syscalls. [Examples](tools/execsnoop_example.txt).
+- tools/[ext4dist](tools/ext4dist.py): Summarize ext4 operation latency distribution as a histogram. [Examples](tools/ext4dist_example.txt).
 - tools/[ext4slower](tools/ext4slower.py): Trace slow ext4 operations. [Examples](tools/ext4slower_example.txt).
 - tools/[filelife](tools/filelife.py): Trace the lifespan of short-lived files. [Examples](tools/filelife_example.txt).
 - tools/[fileslower](tools/fileslower.py): Trace slow synchronous file reads and writes. [Examples](tools/fileslower_example.txt).
@@ -85,7 +85,7 @@ Tools:
 - tools/[gethostlatency](tools/gethostlatency.py): Show latency for getaddrinfo/gethostbyname[2] calls. [Examples](tools/gethostlatency_example.txt).
 - tools/[hardirqs](tools/hardirqs.py): Measure hard IRQ (hard interrupt) event time. [Examples](tools/hardirqs_example.txt).
 - tools/[killsnoop](tools/killsnoop.py): Trace signals issued by the kill() syscall. [Examples](tools/killsnoop_example.txt).
-- tools/[mdflush](tools/mdflush.py): Trace md flush events. [Examples](tools/mdflush.txt).
+- tools/[mdflush](tools/mdflush.py): Trace md flush events. [Examples](tools/mdflush_example.txt).
 - tools/[memleak](tools/memleak.py): Display outstanding memory allocations to find memory leaks. [Examples](tools/memleak_example.txt).
 - tools/[offcputime](tools/offcputime.py): Summarize off-CPU time by kernel stack trace. [Examples](tools/offcputime_example.txt).
 - tools/[offwaketime](tools/offwaketime.py): Summarize blocked time by kernel off-CPU stack and waker stack. [Examples](tools/offwaketime_example.txt).
@@ -100,11 +100,14 @@ Tools:
 - tools/[syncsnoop](tools/syncsnoop.py): Trace sync() syscall. [Examples](tools/syncsnoop_example.txt).
 - tools/[tcpaccept](tools/tcpaccept.py): Trace TCP passive connections (accept()). [Examples](tools/tcpaccept_example.txt).
 - tools/[tcpconnect](tools/tcpconnect.py): Trace TCP active connections (connect()). [Examples](tools/tcpconnect_example.txt).
+- tools/[tcpretrans](tools/tcpretrans.py): Trace TCP retransmits and TLPs. [Examples](tools/tcpretrans_example.txt).
 - tools/[vfscount](tools/vfscount.py) tools/[vfscount.c](tools/vfscount.c): Count VFS calls. [Examples](tools/vfscount_example.txt).
 - tools/[vfsstat](tools/vfsstat.py) tools/[vfsstat.c](tools/vfsstat.c): Count some VFS calls, with column output. [Examples](tools/vfsstat_example.txt).
 - tools/[wakeuptime](tools/wakeuptime.py): Summarize sleep to wakeup time by waker kernel stack. [Examples](tools/wakeuptime_example.txt).
-- tools/[xfsdist](tools/xfsdist.py): Summarize XFS operation latency. [Examples](tools/xfsdist_example.txt).
+- tools/[xfsdist](tools/xfsdist.py): Summarize XFS operation latency distribution as a histogram. [Examples](tools/xfsdist_example.txt).
 - tools/[xfsslower](tools/xfsslower.py): Trace slow XFS operations. [Examples](tools/xfsslower_example.txt).
+- tools/[zfsdist](tools/zfsdist.py): Summarize ZFS operation latency distribution as a histogram. [Examples](tools/zfsdist_example.txt).
+- tools/[zfsslower](tools/zfsslower.py): Trace slow ZFS operations. [Examples](tools/zfsslower_example.txt).

 ### Networking
.TH tcpretrans 8 "2016-02-14" "USER COMMANDS"
.SH NAME
tcpretrans \- Trace TCP retransmits and TLPs. Uses Linux eBPF/bcc.
.SH SYNOPSIS
.B tcpretrans [\-h] [\-l]
.SH DESCRIPTION
This traces TCP retransmits, showing address, port, and TCP state information,
and sometimes the PID (although usually not, since retransmits are usually
sent by the kernel on timeouts). To keep overhead very low, only
the TCP retransmit functions are traced. This does not trace every packet
(like tcpdump(8) or a packet sniffer).
This uses dynamic tracing of the kernel tcp_retransmit_skb() and
tcp_send_loss_probe() functions, and will need to be updated to
match kernel changes to these functions.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bcc.
.SH OPTIONS
.TP
\-h
Print usage message.
.TP
\-l
Include tail loss probe attempts (in some cases the kernel may not
complete the TLP send).
.SH EXAMPLES
.TP
Trace TCP retransmits:
#
.B tcpretrans
.TP
Trace TCP retransmits and TLP attempts:
#
.B tcpretrans \-l
.SH FIELDS
.TP
TIME
Time of the retransmit.
.TP
PID
Process ID that was on-CPU at the time of the retransmit. This is less useful
than it might sound, as it is usually 0 (the kernel) for timer-based
retransmits.
.TP
IP
IP address family (4 or 6).
.TP
LADDR
Local IP address. IPv4 as a dotted quad, IPv6 shows "..." then the last 4
bytes (check for newer versions of this tool for the full address).
.TP
LPORT
Local port.
.TP
T>
Type of event: R> == retransmit, L> == tail loss probe.
.TP
RADDR
Remote IP address. IPv4 as a dotted quad, IPv6 shows "..." then the last 4
bytes (check for newer versions of this tool for the full address).
.TP
RPORT
Remote port.
.TP
STATE
TCP session state.
.SH OVERHEAD
Should be negligible: TCP retransmit rates are typically low (<1000/s), and
this tool adds only low overhead to each event.
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.PP
Also look in the bcc distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
tcpconnect(8), tcpaccept(8)
.TH zfsdist 8 "2016-02-12" "USER COMMANDS"
.SH NAME
zfsdist \- Summarize ZFS operation latency. Uses Linux eBPF/bcc.
.SH SYNOPSIS
.B zfsdist [\-h] [\-T] [\-m] [\-p PID] [interval] [count]
.SH DESCRIPTION
This tool summarizes time (latency) spent in common ZFS file operations: reads,
writes, opens, and syncs, and presents it as a power-of-2 histogram. It uses an
in-kernel eBPF map to store the histogram for efficiency.
This uses kernel dynamic tracing of the ZPL interface (ZFS POSIX
Layer), and will need updates to match any changes to this interface.
.TP
This is intended to work with the ZFS on Linux project:
http://zfsonlinux.org
.PP
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bcc.
.SH OPTIONS
.TP
\-h
Print usage message.
.TP
\-T
Don't include timestamps on interval output.
.TP
\-m
Output in milliseconds.
.TP
\-p PID
Trace this PID only.
.TP
interval
Output interval, in seconds.
.TP
count
Number of outputs.
.SH EXAMPLES
.TP
Trace ZFS operation time, and print a summary on Ctrl-C:
#
.B zfsdist
.TP
Trace PID 181 only:
#
.B zfsdist -p 181
.TP
Print 1 second summaries, 10 times:
#
.B zfsdist 1 10
.TP
1 second summaries, printed in milliseconds:
#
.B zfsdist \-m 1
.SH FIELDS
.TP
msecs
Range of milliseconds for this bucket.
.TP
usecs
Range of microseconds for this bucket.
.TP
count
Number of operations in this time range.
.TP
distribution
ASCII representation of the distribution (the count column).
.SH OVERHEAD
This adds low-overhead instrumentation to these ZFS operations,
including reads and writes from the file system cache. Such reads and writes
can be very frequent (depending on the workload; eg, 1M/sec), at which
point the overhead of this tool may become noticeable.
Measure and quantify before use.
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.PP
Also look in the bcc distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
zfsslower(8)
.TH zfsslower 8 "2016-02-11" "USER COMMANDS"
.SH NAME
zfsslower \- Trace slow ZFS file operations, with per-event details. Uses Linux eBPF/bcc.
.SH SYNOPSIS
.B zfsslower [\-h] [\-j] [\-p PID] [min_ms]
.SH DESCRIPTION
This tool traces common ZFS file operations: reads, writes, opens, and
syncs. It measures the time spent in these operations, and prints details
for each that exceeded a threshold.
WARNING: See the OVERHEAD section.
By default, a minimum millisecond threshold of 10 is used. If a threshold of 0
is used, all events are printed (warning: verbose).
This uses kernel dynamic tracing of the ZPL interface (ZFS POSIX
Layer), and will need updates to match any changes to this interface.
.TP
This is intended to work with the ZFS on Linux project:
http://zfsonlinux.org
.PP
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bcc.
.SH OPTIONS
.TP
\-h
Print usage message.
.TP
\-j
Just print fields: comma-separated values (csv) output.
.TP
\-p PID
Trace this PID only.
.TP
min_ms
Minimum I/O latency (duration) to trace, in milliseconds. Default is 10 ms.
.SH EXAMPLES
.TP
Trace ZFS operations slower than 10 ms:
#
.B zfsslower
.TP
Trace slower than 1 ms:
#
.B zfsslower 1
.TP
Trace slower than 1 ms, and output just the fields in parsable format (csv):
#
.B zfsslower \-j 1
.TP
Trace all file reads and writes (warning: the output will be verbose):
#
.B zfsslower 0
.TP
Trace slower than 1 ms, for PID 181 only:
#
.B zfsslower \-p 181 1
.SH FIELDS
.TP
TIME(s)
Time of I/O completion since the first I/O seen, in seconds.
.TP
COMM
Process name.
.TP
PID
Process ID.
.TP
T
Type of operation. R == read, W == write, O == open, S == fsync.
.TP
OFF_KB
File offset for the I/O, in Kbytes.
.TP
BYTES
Size of I/O, in bytes.
.TP
LAT(ms)
Latency (duration) of I/O, measured from when it was issued by VFS to the
filesystem, to when it completed. This time is inclusive of block device I/O,
file system CPU cycles, file system locks, run queue latency, etc. It's a more
accurate measure of the latency suffered by applications performing file
system I/O, than to measure this down at the block device interface.
.TP
FILENAME
A cached kernel file name (comes from dentry->d_iname).
.TP
ENDTIME_us
Completion timestamp, microseconds (\-j only).
.TP
OFFSET_b
File offset, bytes (\-j only).
.TP
LATENCY_us
Latency (duration) of the I/O, in microseconds (\-j only).
.SH OVERHEAD
This adds low-overhead instrumentation to these ZFS operations,
including reads and writes from the file system cache. Such reads and writes
can be very frequent (depending on the workload; eg, 1M/sec), at which
point the overhead of this tool (even if it prints no "slower" events) can
begin to become significant. Measure and quantify before use. If this
continues to be a problem, consider switching to a tool that prints in-kernel
summaries only.
.PP
Note that the overhead of this tool should be less than fileslower(8), as
this tool targets zfs functions only, and not all file read/write paths
(which can include socket I/O).
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.PP
Also look in the bcc distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
biosnoop(8), funccount(8), fileslower(8)
#!/usr/bin/python
# @lint-avoid-python-3-compatibility-imports
#
# tcpretrans Trace TCP retransmits and TLPs.
# For Linux, uses BCC, eBPF. Embedded C.
#
# USAGE: tcpretrans [-h] [-l]
#
# This uses dynamic tracing of kernel functions, and will need to be updated
# to match kernel changes.
#
# IPv4 addresses are printed as dotted quads. For IPv6 addresses, the last four
# bytes are printed after "..."; check for future versions with better IPv6
# support.
#
# Copyright 2016 Netflix, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
#
# 14-Feb-2016 Brendan Gregg Created this.
from __future__ import print_function
from bcc import BPF
import argparse
from time import strftime
import ctypes as ct
# arguments
examples = """examples:
./tcpretrans # trace TCP retransmits
./tcpretrans -l # include TLP attempts
"""
parser = argparse.ArgumentParser(
description="Trace TCP retransmits",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=examples)
parser.add_argument("-l", "--lossprobe", action="store_true",
help="include tail loss probe attempts")
args = parser.parse_args()
# define BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
#include <net/sock.h>
#include <bcc/proto.h>
#define RETRANSMIT 1
#define TLP 2
// separate data structs for ipv4 and ipv6
struct ipv4_data_t {
// XXX: switch some to u32's when supported
u64 pid;
u64 ip;
u64 saddr;
u64 daddr;
u64 lport;
u64 dport;
u64 state;
u64 type;
};
BPF_PERF_OUTPUT(ipv4_events);
struct ipv6_data_t {
// XXX: update to transfer full ipv6 addrs
u64 pid;
u64 ip;
u64 saddr;
u64 daddr;
u64 lport;
u64 dport;
u64 state;
u64 type;
};
BPF_PERF_OUTPUT(ipv6_events);
static int trace_event(struct pt_regs *ctx, struct sock *sk, int type)
{
if (sk == NULL)
return 0;
u32 pid = bpf_get_current_pid_tgid();
struct sock *skp = NULL;
bpf_probe_read(&skp, sizeof(skp), &sk);
// pull in details
u16 family = 0, lport = 0, dport = 0;
char state = 0;
bpf_probe_read(&family, sizeof(family), &skp->__sk_common.skc_family);
bpf_probe_read(&lport, sizeof(lport), &skp->__sk_common.skc_num);
bpf_probe_read(&dport, sizeof(dport), &skp->__sk_common.skc_dport);
bpf_probe_read(&state, sizeof(state), (void *)&skp->__sk_common.skc_state);
if (family == AF_INET) {
struct ipv4_data_t data4 = {.pid = pid, .ip = 4, .type = type};
bpf_probe_read(&data4.saddr, sizeof(u32),
&skp->__sk_common.skc_rcv_saddr);
bpf_probe_read(&data4.daddr, sizeof(u32),
&skp->__sk_common.skc_daddr);
data4.lport = lport;
data4.dport = bpf_ntohs(dport); // network to host order
data4.state = state;
ipv4_events.perf_submit(ctx, &data4, sizeof(data4));
} else if (family == AF_INET6) {
struct ipv6_data_t data6 = {.pid = pid, .ip = 6, .type = type};
// just grab the last 4 bytes for now
u32 saddr = 0, daddr = 0;
bpf_probe_read(&saddr, sizeof(saddr),
&skp->__sk_common.skc_v6_rcv_saddr.in6_u.u6_addr32[3]);
bpf_probe_read(&daddr, sizeof(daddr),
&skp->__sk_common.skc_v6_daddr.in6_u.u6_addr32[3]);
data6.saddr = bpf_ntohl(saddr);
data6.daddr = bpf_ntohl(daddr);
data6.lport = lport;
data6.dport = bpf_ntohs(dport); // network to host order
data6.state = state;
ipv6_events.perf_submit(ctx, &data6, sizeof(data6));
}
// else drop
return 0;
}
int trace_retransmit(struct pt_regs *ctx, struct sock *sk)
{
trace_event(ctx, sk, RETRANSMIT);
return 0;
}
int trace_tlp(struct pt_regs *ctx, struct sock *sk)
{
trace_event(ctx, sk, TLP);
return 0;
}
"""
# event data
TASK_COMM_LEN = 16 # linux/sched.h
class Data_ipv4(ct.Structure):
_fields_ = [
("pid", ct.c_ulonglong),
("ip", ct.c_ulonglong),
("saddr", ct.c_ulonglong),
("daddr", ct.c_ulonglong),
("lport", ct.c_ulonglong),
("dport", ct.c_ulonglong),
("state", ct.c_ulonglong),
("type", ct.c_ulonglong)
]
class Data_ipv6(ct.Structure):
_fields_ = [
("pid", ct.c_ulonglong),
("ip", ct.c_ulonglong),
("saddr", ct.c_ulonglong),
("daddr", ct.c_ulonglong),
("lport", ct.c_ulonglong),
("dport", ct.c_ulonglong),
("state", ct.c_ulonglong),
("type", ct.c_ulonglong)
]
# from bpf_text:
type = {}
type[1] = 'R'
type[2] = 'L'
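# convert an IPv4 address, read raw from the sock struct in network byte
# order, to a dotted quad; e.g. 0x0100007f -> "127.0.0.1" on x86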
def inet_ntoa(addr):
dq = ''
for i in range(0, 4):
dq = dq + str(addr & 0xff)
if (i != 3):
dq = dq + '.'
addr = addr >> 8
return dq
# from include/net/tcp_states.h:
tcpstate = {}
tcpstate[1] = 'ESTABLISHED'
tcpstate[2] = 'SYN_SENT'
tcpstate[3] = 'SYN_RECV'
tcpstate[4] = 'FIN_WAIT1'
tcpstate[5] = 'FIN_WAIT2'
tcpstate[6] = 'TIME_WAIT'
tcpstate[7] = 'CLOSE'
tcpstate[8] = 'CLOSE_WAIT'
tcpstate[9] = 'LAST_ACK'
tcpstate[10] = 'LISTEN'
tcpstate[11] = 'CLOSING'
tcpstate[12] = 'NEW_SYN_RECV'
# process event
def print_ipv4_event(cpu, data, size):
event = ct.cast(data, ct.POINTER(Data_ipv4)).contents
print("%-8s %-6d %-2d %-20s %1s> %-20s %s" % (
strftime("%H:%M:%S"), event.pid, event.ip,
"%s:%s" % (inet_ntoa(event.saddr), event.lport),
type[event.type],
"%s:%s" % (inet_ntoa(event.daddr), event.dport),
tcpstate[event.state]))
def print_ipv6_event(cpu, data, size):
event = ct.cast(data, ct.POINTER(Data_ipv6)).contents
print("%%-8s -6d %-2d %-20s %1s> %-20s %s" % (
strftime("%H:%M:%S"), event.pid, event.ip,
"...%x:%d" % (event.saddr, event.lport),
type[event.type],
"...%x:%d" % (event.daddr, event.dport),
tcpstate[event.state]))
# initialize BPF
b = BPF(text=bpf_text)
b.attach_kprobe(event="tcp_retransmit_skb", fn_name="trace_retransmit")
b.attach_kprobe(event="tcp_send_loss_probe", fn_name="trace_tlp")
# header
print("%-8s %-6s %-2s %-20s %1s> %-20s %-4s" % ("TIME", "PID", "IP",
"LADDR:LPORT", "T", "RADDR:RPORT", "STATE"))
# read events
b["ipv4_events"].open_perf_buffer(print_ipv4_event)
b["ipv6_events"].open_perf_buffer(print_ipv6_event)
while 1:
b.kprobe_poll()
Demonstrations of tcpretrans, the Linux eBPF/bcc version.
This tool traces the kernel TCP retransmit function to show details of these
retransmits. For example:
# ./tcpretrans
TIME PID IP LADDR:LPORT T> RADDR:RPORT STATE
01:55:05 0 4 10.153.223.157:22 R> 69.53.245.40:34619 ESTABLISHED
01:55:05 0 4 10.153.223.157:22 R> 69.53.245.40:34619 ESTABLISHED
01:55:17 0 4 10.153.223.157:22 R> 69.53.245.40:22957 ESTABLISHED
[...]
This output shows three TCP retransmits, the first two were for an IPv4
connection from 10.153.223.157 port 22 to 69.53.245.40 port 34619. The TCP
state was "ESTABLISHED" at the time of the retransmit. The on-CPU PID at the
time of the retransmit is printed, in this case 0 (the kernel, which will
be the case most of the time).
Retransmits are usually a sign of poor network health, and this tool is
useful for their investigation. Unlike using tcpdump, this tool has very
low overhead, as it only traces the retransmit function. It also prints
additional kernel details: the state of the TCP session at the time of the
retransmit.
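The tool works by attaching kprobes to the kernel retransmit functions. As a
minimal, hypothetical sketch of the same approach (assuming bcc is installed
and this is run as root), one can watch tcp_retransmit_skb() fire:

#!/usr/bin/python
# minimal sketch: print a trace line whenever the kernel retransmit path runs
from bcc import BPF

b = BPF(text="""
#include <uapi/linux/ptrace.h>
int trace_retransmit(struct pt_regs *ctx) {
    bpf_trace_printk("TCP retransmit\\n");
    return 0;
}
""")
b.attach_kprobe(event="tcp_retransmit_skb", fn_name="trace_retransmit")
b.trace_print()

The real tool instead reads the address, port, and state details from the
struct sock argument, and emits events via BPF_PERF_OUTPUT (see the full
source above).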
A -l option will include TCP tail loss probe attempts:
# ./tcpretrans -l
TIME PID IP LADDR:LPORT T> RADDR:RPORT STATE
01:55:45 0 4 10.153.223.157:22 R> 69.53.245.40:51601 ESTABLISHED
01:55:46 0 4 10.153.223.157:22 R> 69.53.245.40:51601 ESTABLISHED
01:55:46 0 4 10.153.223.157:22 R> 69.53.245.40:51601 ESTABLISHED
01:55:53 0 4 10.153.223.157:22 L> 69.53.245.40:46444 ESTABLISHED
01:56:06 0 4 10.153.223.157:22 R> 69.53.245.40:46444 ESTABLISHED
01:56:06 0 4 10.153.223.157:22 R> 69.53.245.40:46444 ESTABLISHED
01:56:08 0 4 10.153.223.157:22 R> 69.53.245.40:46444 ESTABLISHED
01:56:08 0 4 10.153.223.157:22 R> 69.53.245.40:46444 ESTABLISHED
01:56:08 1938 4 10.153.223.157:22 R> 69.53.245.40:46444 ESTABLISHED
01:56:08 0 4 10.153.223.157:22 R> 69.53.245.40:46444 ESTABLISHED
01:56:08 0 4 10.153.223.157:22 R> 69.53.245.40:46444 ESTABLISHED
[...]
See the "L>" in the "T>" column. These are attempts: the kernel probably
sent a TLP, but in some cases it might not have been ultimately sent.
USAGE message:
# ./tcpretrans -h
usage: tcpretrans [-h] [-l]
Trace TCP retransmits
optional arguments:
-h, --help show this help message and exit
-l, --lossprobe include tail loss probe attempts
examples:
./tcpretrans # trace TCP retransmits
./tcpretrans -l # include TLP attempts
#!/usr/bin/python
# @lint-avoid-python-3-compatibility-imports
#
# zfsdist Summarize ZFS operation latency.
# For Linux, uses BCC, eBPF.
#
# USAGE: zfsdist [-h] [-T] [-m] [-p PID] [interval] [count]
#
# Copyright 2016 Netflix, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
#
# 14-Feb-2016 Brendan Gregg Created this.
from __future__ import print_function
from bcc import BPF
from time import sleep, strftime
import argparse
# arguments
examples = """examples:
./zfsdist # show operation latency as a histogram
./zfsdist -p 181 # trace PID 181 only
./zfsdist 1 10 # print 1 second summaries, 10 times
./zfsdist -m 5 # 5s summaries, milliseconds
"""
parser = argparse.ArgumentParser(
description="Summarize ZFS operation latency",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=examples)
parser.add_argument("-T", "--notimestamp", action="store_true",
help="don't include timestamp on interval output")
parser.add_argument("-m", "--milliseconds", action="store_true",
help="output in milliseconds")
parser.add_argument("-p", "--pid",
help="trace this PID only")
parser.add_argument("interval", nargs="?",
help="output interval, in seconds")
parser.add_argument("count", nargs="?", default=99999999,
help="number of outputs")
args = parser.parse_args()
pid = args.pid
countdown = int(args.count)
if args.milliseconds:
factor = 1000000
label = "msecs"
else:
factor = 1000
label = "usecs"
if args.interval and int(args.interval) == 0:
print("ERROR: interval 0. Exiting.")
exit()
debug = 0
# define BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/fs.h>
#include <linux/sched.h>
#define OP_NAME_LEN 8
typedef struct dist_key {
char op[OP_NAME_LEN];
u64 slot;
} dist_key_t;
BPF_HASH(start, u32);
BPF_HISTOGRAM(dist, dist_key_t);
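// dist is keyed by (operation name, log2 slot); user space prints one
// histogram per op value via dist.print_log2_hist(label, "operation")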
// time operation
int trace_entry(struct pt_regs *ctx)
{
u32 pid = bpf_get_current_pid_tgid();
if (FILTER_PID)
return 0;
u64 ts = bpf_ktime_get_ns();
start.update(&pid, &ts);
return 0;
}
static int trace_return(struct pt_regs *ctx, const char *op)
{
u64 *tsp;
u32 pid = bpf_get_current_pid_tgid();
// fetch timestamp and calculate delta
tsp = start.lookup(&pid);
if (tsp == 0) {
return 0; // missed start or filtered
}
u64 delta = (bpf_ktime_get_ns() - *tsp) / FACTOR;
// store as histogram
dist_key_t key = {.slot = bpf_log2l(delta)};
__builtin_memcpy(&key.op, op, sizeof(key.op));
dist.increment(key);
start.delete(&pid);
return 0;
}
int trace_read_return(struct pt_regs *ctx)
{
char *op = "read";
return trace_return(ctx, op);
}
int trace_write_return(struct pt_regs *ctx)
{
char *op = "write";
return trace_return(ctx, op);
}
int trace_open_return(struct pt_regs *ctx)
{
char *op = "open";
return trace_return(ctx, op);
}
int trace_fsync_return(struct pt_regs *ctx)
{
char *op = "fsync";
return trace_return(ctx, op);
}
"""
bpf_text = bpf_text.replace('FACTOR', str(factor))
if args.pid:
bpf_text = bpf_text.replace('FILTER_PID', 'pid != %s' % pid)
else:
bpf_text = bpf_text.replace('FILTER_PID', '0')
if debug:
print(bpf_text)
# load BPF program
b = BPF(text=bpf_text)
# common file functions
b.attach_kprobe(event="zpl_read", fn_name="trace_entry")
b.attach_kprobe(event="zpl_write", fn_name="trace_entry")
b.attach_kprobe(event="zpl_open", fn_name="trace_entry")
b.attach_kprobe(event="zpl_fsync", fn_name="trace_entry")
b.attach_kretprobe(event="zpl_read", fn_name="trace_read_return")
b.attach_kretprobe(event="zpl_write", fn_name="trace_write_return")
b.attach_kretprobe(event="zpl_open", fn_name="trace_open_return")
b.attach_kretprobe(event="zpl_fsync", fn_name="trace_fsync_return")
print("Tracing ZFS operation latency... Hit Ctrl-C to end.")
# output
exiting = 0
dist = b.get_table("dist")
while (1):
try:
if args.interval:
sleep(int(args.interval))
else:
sleep(99999999)
except KeyboardInterrupt:
exiting = 1
print()
if args.interval and (not args.notimestamp):
print(strftime("%H:%M:%S:"))
dist.print_log2_hist(label, "operation")
dist.clear()
countdown -= 1
if exiting or countdown == 0:
exit()
Demonstrations of zfsdist, the Linux eBPF/bcc version.
zfsdist traces ZFS reads, writes, opens, and fsyncs, and summarizes their
latency as a power-of-2 histogram. It has been written to work on ZFS on Linux
(http://zfsonlinux.org). For example:
# ./zfsdist
Tracing ZFS operation latency... Hit Ctrl-C to end.
^C
operation = 'read'
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 4479 |****************************************|
8 -> 15 : 1028 |********* |
16 -> 31 : 14 | |
32 -> 63 : 1 | |
64 -> 127 : 2 | |
128 -> 255 : 6 | |
256 -> 511 : 1 | |
512 -> 1023 : 1256 |*********** |
1024 -> 2047 : 9 | |
2048 -> 4095 : 1 | |
4096 -> 8191 : 2 | |
operation = 'write'
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 0 | |
64 -> 127 : 0 | |
128 -> 255 : 75 |****************************************|
256 -> 511 : 11 |***** |
512 -> 1023 : 0 | |
1024 -> 2047 : 0 | |
2048 -> 4095 : 0 | |
4096 -> 8191 : 0 | |
8192 -> 16383 : 0 | |
16384 -> 32767 : 0 | |
32768 -> 65535 : 0 | |
65536 -> 131071 : 13 |****** |
131072 -> 262143 : 1 | |
operation = 'open'
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 2 |****************************************|
This output shows a bimodal distribution for read latency, with a faster
mode of around 5 thousand reads that took between 4 and 15 microseconds, and a
slower mode of 1256 reads that took between 512 and 1023 microseconds. It's
likely that the faster mode was a hit from the in-memory file system cache,
and the slower mode is a read from a storage device (disk).
The write latency is also bimodal, with a faster mode between 128 and 511 us,
and the slower mode between 65 and 131 ms.
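Each histogram row covers a power-of-2 range: the BPF program computes a log2
slot for each measured delta (via bpf_log2l()), and the row labeled
"512 -> 1023" holds deltas from 2^9 to 2^10 - 1. A tiny, hypothetical Python
sketch of the bucketing idea (the in-kernel helper differs in details):

def log2_slot(delta):
    # floor(log2(delta)): which power-of-2 bucket this latency falls in
    slot = 0
    while delta > 1:
        delta >>= 1
        slot += 1
    return slot

print(log2_slot(700))   # 9: the "512 -> 1023" usecs row above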
This "latency" is measured from when the operation was issued from the VFS
interface to the file system (via the ZFS POSIX layer), to when it completed.
This spans everything: block device I/O (disk I/O), file system CPU cycles,
file system locks, run queue latency, etc. This is a better measure of the
latency suffered by applications reading from the file system than measuring
this down at the block device interface.
Note that this only traces the common file system operations previously
listed: other file system operations (eg, inode operations including
getattr()) are not traced.
An optional interval and a count can be provided, as well as -m to show the
distributions in milliseconds. For example:
# ./zfsdist 1 5
Tracing ZFS operation latency... Hit Ctrl-C to end.
06:55:41:
operation = 'read'
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 3976 |****************************************|
8 -> 15 : 1181 |*********** |
16 -> 31 : 18 | |
32 -> 63 : 4 | |
64 -> 127 : 17 | |
128 -> 255 : 16 | |
256 -> 511 : 0 | |
512 -> 1023 : 1275 |************ |
1024 -> 2047 : 36 | |
2048 -> 4095 : 3 | |
4096 -> 8191 : 0 | |
8192 -> 16383 : 1 | |
16384 -> 32767 : 1 | |
06:55:42:
operation = 'read'
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 12751 |****************************************|
8 -> 15 : 1190 |*** |
16 -> 31 : 38 | |
32 -> 63 : 7 | |
64 -> 127 : 85 | |
128 -> 255 : 47 | |
256 -> 511 : 0 | |
512 -> 1023 : 1010 |*** |
1024 -> 2047 : 49 | |
2048 -> 4095 : 12 | |
06:55:43:
operation = 'read'
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 80925 |****************************************|
8 -> 15 : 1645 | |
16 -> 31 : 251 | |
32 -> 63 : 24 | |
64 -> 127 : 16 | |
128 -> 255 : 12 | |
256 -> 511 : 0 | |
512 -> 1023 : 80 | |
1024 -> 2047 : 1 | |
06:55:44:
operation = 'read'
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 81207 |****************************************|
8 -> 15 : 2075 |* |
16 -> 31 : 2005 | |
32 -> 63 : 177 | |
64 -> 127 : 3 | |
06:55:45:
operation = 'read'
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 74364 |****************************************|
8 -> 15 : 865 | |
16 -> 31 : 4960 |** |
32 -> 63 : 625 | |
64 -> 127 : 2 | |
This workload was randomly reading from a file that became cached. The slower
mode can be seen to disappear by the final summaries.
USAGE message:
# ./zfsdist -h
usage: zfsdist [-h] [-T] [-m] [-p PID] [interval] [count]
Summarize ZFS operation latency
positional arguments:
interval output interval, in seconds
count number of outputs
optional arguments:
-h, --help show this help message and exit
-T, --notimestamp don't include timestamp on interval output
-m, --milliseconds output in milliseconds
-p PID, --pid PID trace this PID only
examples:
./zfsdist # show operation latency as a histogram
./zfsdist -p 181 # trace PID 181 only
./zfsdist 1 10 # print 1 second summaries, 10 times
./zfsdist -m 5 # 5s summaries, milliseconds
#!/usr/bin/python
# @lint-avoid-python-3-compatibility-imports
#
# zfsslower Trace slow ZFS operations.
# For Linux, uses BCC, eBPF.
#
# USAGE: zfsslower [-h] [-j] [-p PID] [min_ms]
#
# This script traces common ZFS file operations: reads, writes, opens, and
# syncs. It measures the time spent in these operations, and prints details
# for each that exceeded a threshold.
#
# WARNING: This adds low-overhead instrumentation to these ZFS operations,
# including reads and writes from the file system cache. Such reads and writes
# can be very frequent (depending on the workload; eg, 1M/sec), at which
# point the overhead of this tool (even if it prints no "slower" events) can
# begin to become significant.
#
# This works by using kernel dynamic tracing of the ZPL interface, and will
# need updates to match any changes to this interface.
#
# By default, a minimum millisecond threshold of 10 is used.
#
# Copyright 2016 Netflix, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
#
# 14-Feb-2016 Brendan Gregg Created this.
from __future__ import print_function
from bcc import BPF
import argparse
from time import strftime
import ctypes as ct
# arguments
examples = """examples:
./zfsslower # trace operations slower than 10 ms (default)
./zfsslower 1 # trace operations slower than 1 ms
./zfsslower -j 1 # ... 1 ms, parsable output (csv)
./zfsslower 0 # trace all operations (warning: verbose)
./zfsslower -p 185 # trace PID 185 only
"""
parser = argparse.ArgumentParser(
description="Trace common ZFS file operations slower than a threshold",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=examples)
parser.add_argument("-j", "--csv", action="store_true",
help="just print fields: comma-separated values")
parser.add_argument("-p", "--pid",
help="trace this PID only")
parser.add_argument("min_ms", nargs="?", default='10',
help="minimum I/O duration to trace, in ms (default 10)")
args = parser.parse_args()
min_ms = int(args.min_ms)
pid = args.pid
csv = args.csv
debug = 0
# define BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/fs.h>
#include <linux/sched.h>
#include <linux/dcache.h>
// XXX: switch these to char's when supported
#define TRACE_READ 0
#define TRACE_WRITE 1
#define TRACE_OPEN 2
#define TRACE_FSYNC 3
struct val_t {
u64 ts;
u64 offset;
struct file *fp;
};
struct data_t {
// XXX: switch some to u32's when supported
u64 ts_us;
u64 type;
u64 size;
u64 offset;
u64 delta_us;
u64 pid;
char task[TASK_COMM_LEN];
char file[DNAME_INLINE_LEN];
};
BPF_HASH(entryinfo, pid_t, struct val_t);
BPF_PERF_OUTPUT(events);
//
// Store timestamp and size on entry
//
// zpl_read(), zpl_write():
int trace_rw_entry(struct pt_regs *ctx, struct file *filp, char __user *buf,
size_t len, loff_t *ppos)
{
u32 pid;
pid = bpf_get_current_pid_tgid();
if (FILTER_PID)
return 0;
// store filep and timestamp by pid
struct val_t val = {};
val.ts = bpf_ktime_get_ns();
val.fp = filp;
val.offset = *ppos;
if (val.fp)
entryinfo.update(&pid, &val);
return 0;
}
// zpl_open():
int trace_open_entry(struct pt_regs *ctx, struct inode *inode,
struct file *filp)
{
u32 pid;
pid = bpf_get_current_pid_tgid();
if (FILTER_PID)
return 0;
// store filep and timestamp by pid
struct val_t val = {};
val.ts = bpf_ktime_get_ns();
val.fp = filp;
val.offset = 0;
if (val.fp)
entryinfo.update(&pid, &val);
return 0;
}
// zpl_fsync():
int trace_fsync_entry(struct pt_regs *ctx, struct file *filp)
{
u32 pid;
pid = bpf_get_current_pid_tgid();
if (FILTER_PID)
return 0;
// store filp and timestamp by pid
struct val_t val = {};
val.ts = bpf_ktime_get_ns();
val.fp = filp;
val.offset = 0;
if (val.fp)
entryinfo.update(&pid, &val);
return 0;
}
//
// Output
//
static int trace_return(struct pt_regs *ctx, int type)
{
struct val_t *valp;
u32 pid = bpf_get_current_pid_tgid();
valp = entryinfo.lookup(&pid);
if (valp == 0) {
// missed tracing issue or filtered
return 0;
}
// calculate delta
u64 ts = bpf_ktime_get_ns();
u64 delta_us = (ts - valp->ts) / 1000;
entryinfo.delete(&pid);
if (FILTER_US)
return 0;
// workaround (rewriter should handle file to d_iname in one step):
struct dentry *de = NULL;
bpf_probe_read(&de, sizeof(de), &valp->fp->f_path.dentry);
// populate output struct
u32 size = ctx->ax;
struct data_t data = {.type = type, .size = size, .delta_us = delta_us,
.pid = pid};
data.ts_us = ts / 1000;
data.offset = valp->offset;
bpf_probe_read(&data.file, sizeof(data.file), de->d_iname);
bpf_get_current_comm(&data.task, sizeof(data.task));
events.perf_submit(ctx, &data, sizeof(data));
return 0;
}
int trace_read_return(struct pt_regs *ctx)
{
return trace_return(ctx, TRACE_READ);
}
int trace_write_return(struct pt_regs *ctx)
{
return trace_return(ctx, TRACE_WRITE);
}
int trace_open_return(struct pt_regs *ctx)
{
return trace_return(ctx, TRACE_OPEN);
}
int trace_fsync_return(struct pt_regs *ctx)
{
return trace_return(ctx, TRACE_FSYNC);
}
"""
if min_ms == 0:
bpf_text = bpf_text.replace('FILTER_US', '0')
else:
bpf_text = bpf_text.replace('FILTER_US',
'delta_us <= %s' % str(min_ms * 1000))
if args.pid:
bpf_text = bpf_text.replace('FILTER_PID', 'pid != %s' % pid)
else:
bpf_text = bpf_text.replace('FILTER_PID', '0')
if debug:
print(bpf_text)
# kernel->user event data: struct data_t
DNAME_INLINE_LEN = 32 # linux/dcache.h
TASK_COMM_LEN = 16 # linux/sched.h
class Data(ct.Structure):
_fields_ = [
("ts_us", ct.c_ulonglong),
("type", ct.c_ulonglong),
("size", ct.c_ulonglong),
("offset", ct.c_ulonglong),
("delta_us", ct.c_ulonglong),
("pid", ct.c_ulonglong),
("task", ct.c_char * TASK_COMM_LEN),
("file", ct.c_char * DNAME_INLINE_LEN)
]
# process event
def print_event(cpu, data, size):
event = ct.cast(data, ct.POINTER(Data)).contents
type = 'R'
if event.type == 1:
type = 'W'
elif event.type == 2:
type = 'O'
elif event.type == 3:
type = 'S'
if (csv):
print("%d,%s,%d,%s,%d,%d,%d,%s" % (
event.ts_us, event.task, event.pid, type, event.size,
event.offset, event.delta_us, event.file))
return
print("%-8s %-14.14s %-6s %1s %-7s %-8d %7.2f %s" % (strftime("%H:%M:%S"),
event.task, event.pid, type, event.size, event.offset / 1024,
float(event.delta_us) / 1000, event.file))
# initialize BPF
b = BPF(text=bpf_text)
# common file functions
b.attach_kprobe(event="zpl_read", fn_name="trace_rw_entry")
b.attach_kprobe(event="zpl_write", fn_name="trace_rw_entry")
b.attach_kprobe(event="zpl_open", fn_name="trace_open_entry")
b.attach_kprobe(event="zpl_fsync", fn_name="trace_fsync_entry")
b.attach_kretprobe(event="zpl_read", fn_name="trace_read_return")
b.attach_kretprobe(event="zpl_write", fn_name="trace_write_return")
b.attach_kretprobe(event="zpl_open", fn_name="trace_open_return")
b.attach_kretprobe(event="zpl_fsync", fn_name="trace_fsync_return")
# header
if (csv):
print("ENDTIME_us,TASK,PID,TYPE,BYTES,OFFSET_b,LATENCY_us,FILE")
else:
if min_ms == 0:
print("Tracing ZFS operations")
else:
print("Tracing ZFS operations slower than %d ms" % min_ms)
print("%-8s %-14s %-6s %1s %-7s %-8s %7s %s" % ("TIME", "COMM", "PID", "T",
"BYTES", "OFF_KB", "LAT(ms)", "FILENAME"))
# read events
b["events"].open_perf_buffer(print_event)
while 1:
b.kprobe_poll()
Demonstrations of zfsslower, the Linux eBPF/bcc version.
zfsslower shows ZFS reads, writes, opens, and fsyncs, slower than a threshold.
It has been written to work on ZFS on Linux (http://zfsonlinux.org). For
example:
# ./zfsslower
Tracing ZFS operations slower than 10 ms
TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME
06:31:28 dd 25570 W 131072 38784 303.92 data1
06:31:34 dd 25686 W 131072 38784 388.28 data1
06:31:35 dd 25686 W 131072 78720 519.66 data1
06:31:35 dd 25686 W 131072 116992 405.94 data1
06:31:35 dd 25686 W 131072 153600 433.52 data1
06:31:36 dd 25686 W 131072 188672 314.37 data1
06:31:36 dd 25686 W 131072 222336 372.33 data1
06:31:36 dd 25686 W 131072 254592 309.59 data1
06:31:37 dd 25686 W 131072 285440 304.52 data1
06:31:37 dd 25686 W 131072 315008 236.45 data1
06:31:37 dd 25686 W 131072 343424 193.54 data1
06:31:38 dd 25686 W 131072 370560 286.07 data1
06:31:38 dd 25686 W 131072 396672 251.92 data1
[...]
This shows writes to a "data1" file, each taking well over the 10 ms threshold.
The slowest, on the 3rd line of output, reached 519.66 ms for a 128 Kbyte
write by the "dd" command.
This "latency" is measured from when the operation was issued from the VFS
interface to the file system (via the ZFS POSIX layer), to when it completed.
This spans everything: block device I/O (disk I/O), file system CPU cycles,
file system locks, run queue latency, etc. This is a better measure of the
latency suffered by applications reading from the file system than measuring
this down at the block device interface.
Note that this only traces the common file system operations previously
listed: other file system operations (eg, inode operations including
getattr()) are not traced.
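The OVERHEAD section of the man page advises measuring event rates before
use. A hypothetical sketch (assuming the same bcc environment and the ZFS
zpl_* symbols) that counts how often the traced functions fire in ten
seconds:

#!/usr/bin/python
# hypothetical sketch: count calls to the zpl_* functions zfsslower traces,
# to estimate how many events its instrumentation would have to process
from bcc import BPF
from time import sleep

b = BPF(text="""
BPF_TABLE("array", int, u64, counts, 1);
int do_count(struct pt_regs *ctx) {
    int zero = 0;
    u64 *val = counts.lookup(&zero);
    if (val)
        (*val)++;       // racy across CPUs; close enough for an estimate
    return 0;
}
""")
for fn in ("zpl_read", "zpl_write", "zpl_open", "zpl_fsync"):
    b.attach_kprobe(event=fn, fn_name="do_count")
sleep(10)
for k, v in b["counts"].items():
    print("zpl_* calls in 10 seconds: %d" % v.value)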
A threshold of 0 will trace all operations. Warning: the output will be
verbose, as it will include all file system cache hits.
# ./zfsslower 0
Tracing ZFS operations
TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME
06:36:07 dd 32242 O 0 0 0.01 data1
06:36:07 dd 32242 W 131072 0 0.25 data1
06:36:07 dd 32242 W 131072 128 0.03 data1
06:36:07 dd 32242 W 131072 256 0.04 data1
06:36:07 dd 32242 W 131072 384 0.04 data1
06:36:07 dd 32242 W 131072 512 0.04 data1
06:36:07 dd 32242 W 131072 640 0.03 data1
06:36:07 dd 32242 W 131072 768 0.03 data1
06:36:07 dd 32242 W 131072 896 0.04 data1
06:36:07 dd 32242 W 131072 1024 0.28 data1
06:36:07 dd 32242 W 131072 1152 0.04 data1
06:36:07 dd 32242 W 131072 1280 0.03 data1
[...]
06:36:07 dd 32242 W 131072 13824 0.04 data1
06:36:07 dd 32242 W 131072 13952 0.04 data1
06:36:07 dd 32242 W 131072 14080 0.04 data1
06:36:07 dd 32242 W 131072 14208 398.92 data1
06:36:07 dd 32242 W 131072 14336 0.04 data1
06:36:07 dd 32242 W 131072 14464 0.04 data1
06:36:07 dd 32242 W 131072 15104 0.03 data1
[...]
The output now includes the open operation for this file ("O"), and then the
writes. Most of the writes are very fast, with only an occasional outlier that
is in the hundreds of milliseconds.
Fortunately this is not a real world environment: I set up a zpool on top of
an XFS file system for testing purposes. More debugging using other tools
would be needed to explain these outliers: possibly XFS flushing.
Here's a random read workload, and showing operations slower than 1 ms:
# ./zfsslower 1
Tracing ZFS operations slower than 1 ms
TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME
06:47:30 randread.pl 15431 R 8192 97840 1.03 data1
06:47:30 randread.pl 15431 R 8192 416744 1.12 data1
06:47:31 randread.pl 15431 R 8192 228856 1.96 data1
06:47:31 randread.pl 15431 R 8192 452248 1.02 data1
06:47:31 randread.pl 15431 R 8192 315288 5.90 data1
06:47:31 randread.pl 15431 R 8192 752696 1.20 data1
06:47:31 randread.pl 15431 R 8192 481832 1.39 data1
06:47:31 randread.pl 15431 R 8192 673752 1.39 data1
06:47:31 randread.pl 15431 R 8192 691736 1.01 data1
06:47:31 randread.pl 15431 R 8192 694776 1.78 data1
06:47:31 randread.pl 15431 R 8192 403328 3.75 data1
06:47:31 randread.pl 15431 R 8192 567688 1.08 data1
06:47:31 randread.pl 15431 R 8192 694280 1.31 data1
06:47:31 randread.pl 15431 R 8192 669280 1.06 data1
06:47:31 randread.pl 15431 R 8192 426608 1.56 data1
06:47:31 randread.pl 15431 R 8192 42512 1.01 data1
06:47:31 randread.pl 15431 R 8192 22944 1.33 data1
06:47:31 randread.pl 15431 R 8192 427432 1.48 data1
06:47:31 randread.pl 15431 R 8192 261320 1.28 data1
06:47:31 randread.pl 15431 R 8192 132248 1.23 data1
06:47:31 randread.pl 15431 R 8192 96936 1.04 data1
06:47:31 randread.pl 15431 R 8192 482800 2.63 data1
[...]
A -j option will print just the fields (parsable output, csv):
# ./zfsslower -j 1
ENDTIME_us,TASK,PID,TYPE,BYTES,OFFSET_b,LATENCY_us,FILE
252305490911,randread.pl,17922,R,8192,163446784,1156,data1
252305493852,randread.pl,17922,R,8192,321437696,1129,data1
252305498839,randread.pl,17922,R,8192,475152384,1154,data1
252305505515,randread.pl,17922,R,8192,49094656,1082,data1
252305506774,randread.pl,17922,R,8192,470401024,1245,data1
252305509265,randread.pl,17922,R,8192,553246720,2412,data1
252305512365,randread.pl,17922,R,8192,20963328,1093,data1
252305513755,randread.pl,17922,R,8192,304111616,1350,data1
252305583330,randread.pl,17922,R,8192,166174720,1154,data1
252305593913,randread.pl,17922,R,8192,175079424,1241,data1
252305602833,randread.pl,17922,R,8192,305340416,3307,data1
252305608663,randread.pl,17922,R,8192,655958016,2704,data1
252305611212,randread.pl,17922,R,8192,40951808,1033,data1
252305614609,randread.pl,17922,R,8192,318922752,2687,data1
252305623800,randread.pl,17922,R,8192,246734848,2983,data1
252305711125,randread.pl,17922,R,8192,581795840,1091,data1
252305728694,randread.pl,17922,R,8192,710483968,1034,data1
252305762046,randread.pl,17922,R,8192,329367552,1405,data1
252305798215,randread.pl,17922,R,8192,44482560,1030,data1
252305806748,randread.pl,17922,R,8192,660602880,1069,data1
252305826360,randread.pl,17922,R,8192,616144896,2327,data1
[...]
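Parsable output like this is easy to post-process. As a hypothetical sketch,
the following reads saved -j output (zfs.csv is an assumed file name) and
reports the worst latency seen per file:

#!/usr/bin/python
# hypothetical sketch: summarize zfsslower -j (csv) output per file
import csv

worst = {}
with open("zfs.csv") as f:
    for row in csv.DictReader(f):        # first csv line is the header
        lat_us = int(row["LATENCY_us"])
        if lat_us > worst.get(row["FILE"], 0):
            worst[row["FILE"]] = lat_us
for name, lat_us in sorted(worst.items(), key=lambda kv: -kv[1]):
    print("%-32s %8.2f ms" % (name, lat_us / 1000.0))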
USAGE message:
# ./zfsslower -h
usage: zfsslower [-h] [-j] [-p PID] [min_ms]
Trace common ZFS file operations slower than a threshold
positional arguments:
min_ms minimum I/O duration to trace, in ms (default 10)
optional arguments:
-h, --help show this help message and exit
-j, --csv just print fields: comma-separated values
-p PID, --pid PID trace this PID only
examples:
./zfsslower # trace operations slower than 10 ms (default)
./zfsslower 1 # trace operations slower than 1 ms
./zfsslower -j 1 # ... 1 ms, parsable output (csv)
./zfsslower 0 # trace all operations (warning: verbose)
./zfsslower -p 185 # trace PID 185 only