Commit cd1cad12, authored Feb 12, 2016 by Brendan Gregg

ext4slower

Parent: 23c96fe4

Showing 4 changed files with 653 additions and 0 deletions:

README.md                      +1   -0
man/man8/ext4slower.8          +113 -0
tools/ext4slower.py            +330 -0
tools/ext4slower_example.txt   +209 -0
README.md

@@ -74,6 +74,7 @@ Tools:
 - tools/[execsnoop](tools/execsnoop.py): Trace new processes via exec() syscalls. [Examples](tools/execsnoop_example.txt).
 - tools/[dcsnoop](tools/dcsnoop.py): Trace directory entry cache (dcache) lookups. [Examples](tools/dcsnoop_example.txt).
 - tools/[dcstat](tools/dcstat.py): Directory entry cache (dcache) stats. [Examples](tools/dcstat_example.txt).
+- tools/[ext4slower](tools/ext4slower.py): Trace slow ext4 operations. [Examples](tools/ext4slower_example.txt).
 - tools/[filelife](tools/filelife.py): Trace the lifespan of short-lived files. [Examples](tools/filelife_example.txt).
 - tools/[fileslower](tools/fileslower.py): Trace slow synchronous file reads and writes. [Examples](tools/fileslower_example.txt).
 - tools/[filetop](tools/filetop.py): File reads and writes by filename and process. Top for files. [Examples](tools/filetop_example.txt).
man/man8/ext4slower.8 (new file, mode 100644)
.TH ext4slower 8 "2016-02-11" "USER COMMANDS"
.SH NAME
ext4slower \- Trace slow ext4 file operations, with per-event details.
.SH SYNOPSIS
.B ext4slower [\-h] [\-j] [\-p PID] [min_ms]
.SH DESCRIPTION
This tool traces common ext4 file operations: reads, writes, opens, and
syncs. It measures the time spent in these operations, and prints details
for each that exceeded a threshold.
.PP
WARNING: See the OVERHEAD section.
.PP
By default, a minimum millisecond threshold of 10 is used. If a threshold of 0
is used, all events are printed (warning: verbose).
.PP
Since this works by tracing the ext4_file_operations interface functions, it
will need updating to match any changes to these functions.
.PP
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bcc.
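.SH OPTIONS
.TP
\-h
Print usage message.
.TP
\-j
Print the fields of each event as comma-separated values (CSV), for
post-processing.
.TP
\-p PID
Trace this PID only.
.TP
min_ms
Minimum I/O latency (duration) to trace, in milliseconds. Default is 10 ms.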
.SH EXAMPLES
.TP
Trace ext4 reads, writes, opens, and fsyncs slower than 10 ms (the default):
#
.B ext4slower
.TP
Trace slower than 1 ms:
#
.B ext4slower 1
.TP
Trace slower than 1 ms, and output just the fields in parsable format (csv):
#
.B ext4slower \-j 1
.TP
Trace all ext4 operations (warning: the output will be verbose):
#
.B ext4slower 0
.TP
Trace slower than 1 ms, for PID 181 only:
#
.B ext4slower \-p 181 1
.SH FIELDS
.TP
TIME
Wall-clock time (HH:MM:SS) at which the event was printed, shortly after the
I/O completed.
.TP
COMM
Process name.
.TP
PID
Process ID.
.TP
T
Type of operation. R == read, W == write, O == open, S == fsync.
.TP
BYTES
Size of I/O, in bytes.
.TP
OFF_KB
File offset for the I/O, in Kbytes.
.TP
LAT(ms)
Latency (duration) of I/O, measured from when it was issued by VFS to the
filesystem, to when it completed. This time is inclusive of block device I/O,
file system CPU cycles, file system locks, run queue latency, etc. This is a
more accurate measure of the latency suffered by applications performing file
system I/O than a measurement taken at the block device interface.
.TP
FILENAME
A cached kernel file name (comes from dentry->d_iname).
.TP
ENDTIME_us
Completion timestamp, microseconds (\-j only).
.TP
OFFSET_b
File offset, bytes (\-j only).
.TP
LATENCY_us
Latency (duration) of the I/O, in microseconds (\-j only).
.SH OVERHEAD
This adds low-overhead instrumentation to these ext4 operations,
including reads and writes from the file system cache. Such reads and writes
can be very frequent (depending on the workload; e.g., 1M/sec), at which
point the overhead of this tool (even if it prints no "slower" events) can
become significant. Measure and quantify before use. If this
continues to be a problem, consider switching to a tool that prints in-kernel
summaries only.
.PP
Note that the overhead of this tool should be less than fileslower(8), as
this tool targets ext4 functions only, and not all file read/write paths
(which can include socket I/O).
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.PP
Also look in the bcc distribution for a companion _example.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
biosnoop(8), funccount(8), fileslower(8)
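The DESCRIPTION and LAT(ms) entries describe the measurement in prose: a timestamp is recorded when an ext4 operation starts, the elapsed time is computed when it returns, and only operations slower than min_ms are reported (all of them when min_ms is 0). The following is a minimal user-space sketch of that entry/return pairing, assuming a hypothetical LatencyTracker class that is not part of bcc; the tool itself does this in BPF, with a hash map keyed by the lower 32 bits of bpf_get_current_pid_tgid() (the thread ID).

```python
from typing import Dict, Optional, Tuple


class LatencyTracker:
    """Hypothetical user-space model of ext4slower's entry/return pairing."""

    def __init__(self, min_ms: int = 10) -> None:
        self.min_us = min_ms * 1000
        # Like the BPF entryinfo map: thread ID -> (entry timestamp ns, file offset).
        self.entryinfo: Dict[int, Tuple[int, int]] = {}

    def on_entry(self, tid: int, ts_ns: int, offset: int) -> None:
        # Record when the operation started and at which file offset.
        self.entryinfo[tid] = (ts_ns, offset)

    def on_return(self, tid: int, ts_ns: int, op: str, size: int) -> Optional[dict]:
        entry = self.entryinfo.pop(tid, None)  # lookup and delete in one step
        if entry is None:
            return None  # entry was missed or filtered out
        start_ns, offset = entry
        delta_us = (ts_ns - start_ns) // 1000
        # min_ms == 0 disables the filter; otherwise drop events <= threshold.
        if self.min_us and delta_us <= self.min_us:
            return None
        return {"type": op, "size": size, "offset": offset, "delta_us": delta_us}
```

With min_ms=1, an entry at 0 ns followed by a return at 2,500,000 ns gives delta_us 2500 and the event is kept; a return exactly 1,000,000 ns after entry is dropped, matching the tool's `delta_us <= min_ms * 1000` filter.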
tools/ext4slower.py (new file, mode 100755)
#!/usr/bin/python
# @lint-avoid-python-3-compatibility-imports
#
# ext4slower Trace slow ext4 operations.
# For Linux, uses BCC, eBPF.
#
# USAGE: ext4slower [-h] [-j] [-p PID] [min_ms]
#
# This script traces common ext4 file operations: reads, writes, opens, and
# syncs. It measures the time spent in these operations, and prints details
# for each that exceeded a threshold.
#
# WARNING: This adds low-overhead instrumentation to these ext4 operations,
# including reads and writes from the file system cache. Such reads and writes
# can be very frequent (depending on the workload; e.g., 1M/sec), at which
# point the overhead of this tool (even if it prints no "slower" events) can
# become significant.
#
# By default, a minimum millisecond threshold of 10 is used.
#
# Copyright 2016 Netflix, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
#
# 11-Feb-2016 Brendan Gregg Created this.
from __future__ import print_function
from bcc import BPF
import argparse
from time import strftime
import ctypes as ct

# symbols
kallsyms = "/proc/kallsyms"

# arguments
examples = """examples:
    ./ext4slower             # trace operations slower than 10 ms (default)
    ./ext4slower 1           # trace operations slower than 1 ms
    ./ext4slower -j 1        # ... 1 ms, parsable output (csv)
    ./ext4slower 0           # trace all operations (warning: verbose)
    ./ext4slower -p 185      # trace PID 185 only
"""
parser = argparse.ArgumentParser(
    description="Trace common ext4 file operations slower than a threshold",
    formatter_class=argparse.RawDescriptionHelpFormatter,
    epilog=examples)
parser.add_argument("-j", "--csv", action="store_true",
    help="just print fields: comma-separated values")
parser.add_argument("-p", "--pid",
    help="trace this PID only")
parser.add_argument("min_ms", nargs="?", default='10',
    help="minimum I/O duration to trace, in ms (default 10)")
args = parser.parse_args()
min_ms = int(args.min_ms)
pid = args.pid
csv = args.csv
debug = 0
# define BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/fs.h>
#include <linux/sched.h>
#include <linux/dcache.h>

// XXX: switch these to chars when supported
#define TRACE_READ      0
#define TRACE_WRITE     1
#define TRACE_OPEN      2
#define TRACE_FSYNC     3

struct val_t {
    u64 ts;
    u64 offset;
    struct file *fp;
};

struct data_t {
    // XXX: switch some to u32s when supported
    u64 ts_us;
    u64 type;
    u64 size;
    u64 offset;
    u64 delta_us;
    u64 pid;
    char task[TASK_COMM_LEN];
    char file[DNAME_INLINE_LEN];
};

BPF_HASH(entryinfo, pid_t, struct val_t);
BPF_PERF_OUTPUT(events);

//
// Store timestamp, file pointer, and offset on entry
//

// The current ext4 (Linux 4.5) uses generic_file_read_iter(), instead of its
// own function, for reads. So we need to trace that and then filter on ext4,
// which I do by checking file->f_op.
int trace_read_entry(struct pt_regs *ctx, struct kiocb *iocb)
{
    u32 pid;
    pid = bpf_get_current_pid_tgid();
    if (FILTER_PID)
        return 0;

    // ext4 filter on file->f_op == ext4_file_operations
    struct file *fp = iocb->ki_filp;
    if ((u64)fp->f_op != EXT4_FILE_OPERATIONS)
        return 0;

    // store filep and timestamp by pid
    struct val_t val = {};
    val.ts = bpf_ktime_get_ns();
    val.fp = fp;
    val.offset = iocb->ki_pos;
    if (val.fp)
        entryinfo.update(&pid, &val);

    return 0;
}

// ext4_file_write_iter():
int trace_write_entry(struct pt_regs *ctx, struct kiocb *iocb)
{
    u32 pid;
    pid = bpf_get_current_pid_tgid();
    if (FILTER_PID)
        return 0;

    // store filep and timestamp by pid
    struct val_t val = {};
    val.ts = bpf_ktime_get_ns();
    val.fp = iocb->ki_filp;
    val.offset = iocb->ki_pos;
    if (val.fp)
        entryinfo.update(&pid, &val);

    return 0;
}

// ext4_file_open():
int trace_open_entry(struct pt_regs *ctx, struct inode *inode,
    struct file *file)
{
    u32 pid;
    pid = bpf_get_current_pid_tgid();
    if (FILTER_PID)
        return 0;

    // store filep and timestamp by pid
    struct val_t val = {};
    val.ts = bpf_ktime_get_ns();
    val.fp = file;
    val.offset = 0;
    if (val.fp)
        entryinfo.update(&pid, &val);

    return 0;
}

// ext4_sync_file():
int trace_fsync_entry(struct pt_regs *ctx, struct file *file)
{
    u32 pid;
    pid = bpf_get_current_pid_tgid();
    if (FILTER_PID)
        return 0;

    // store filep and timestamp by pid
    struct val_t val = {};
    val.ts = bpf_ktime_get_ns();
    val.fp = file;
    val.offset = 0;
    if (val.fp)
        entryinfo.update(&pid, &val);

    return 0;
}

//
// Output
//

static int trace_return(struct pt_regs *ctx, int type)
{
    struct val_t *valp;
    u32 pid = bpf_get_current_pid_tgid();

    valp = entryinfo.lookup(&pid);
    if (valp == 0) {
        // missed tracing issue or filtered
        return 0;
    }

    // calculate delta
    u64 ts = bpf_ktime_get_ns();
    u64 delta_us = (ts - valp->ts) / 1000;
    entryinfo.delete(&pid);
    if (FILTER_US)
        return 0;

    // workaround (rewriter should handle file to d_iname in one step):
    struct dentry *de = NULL;
    bpf_probe_read(&de, sizeof(de), &valp->fp->f_path.dentry);

    // populate output struct
    // size is the traced function's return value (bytes for read/write);
    // PT_REGS_RC() is the portable form of the x86-only ctx->ax
    u32 size = PT_REGS_RC(ctx);
    struct data_t data = {.type = type, .size = size, .delta_us = delta_us,
        .pid = pid};
    data.ts_us = ts / 1000;
    data.offset = valp->offset;
    bpf_probe_read(&data.file, sizeof(data.file), de->d_iname);
    bpf_get_current_comm(&data.task, sizeof(data.task));

    events.perf_submit(ctx, &data, sizeof(data));

    return 0;
}

int trace_read_return(struct pt_regs *ctx)
{
    return trace_return(ctx, TRACE_READ);
}

int trace_write_return(struct pt_regs *ctx)
{
    return trace_return(ctx, TRACE_WRITE);
}

int trace_open_return(struct pt_regs *ctx)
{
    return trace_return(ctx, TRACE_OPEN);
}

int trace_fsync_return(struct pt_regs *ctx)
{
    return trace_return(ctx, TRACE_FSYNC);
}
"""

# code replacements
with open(kallsyms) as syms:
    ops = ''
    for line in syms:
        # /proc/kallsyms lines are "address type name", plus "\t[module]" for
        # module symbols; split on any whitespace so the symbol still matches
        # when ext4 is built as a module
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "ext4_file_operations":
            ops = "0x" + fields[0]
            break
    if ops == '':
        print("ERROR: no ext4_file_operations in /proc/kallsyms. Exiting.")
        exit(1)
bpf_text = bpf_text.replace('EXT4_FILE_OPERATIONS', ops)
if min_ms == 0:
    bpf_text = bpf_text.replace('FILTER_US', '0')
else:
    bpf_text = bpf_text.replace('FILTER_US',
        'delta_us <= %s' % str(min_ms * 1000))
if args.pid:
    bpf_text = bpf_text.replace('FILTER_PID', 'pid != %s' % pid)
else:
    bpf_text = bpf_text.replace('FILTER_PID', '0')
if debug:
    print(bpf_text)

# kernel->user event data: struct data_t
DNAME_INLINE_LEN = 32   # linux/dcache.h
TASK_COMM_LEN = 16      # linux/sched.h
class Data(ct.Structure):
    _fields_ = [
        ("ts_us", ct.c_ulonglong),
        ("type", ct.c_ulonglong),
        ("size", ct.c_ulonglong),
        ("offset", ct.c_ulonglong),
        ("delta_us", ct.c_ulonglong),
        ("pid", ct.c_ulonglong),
        ("task", ct.c_char * TASK_COMM_LEN),
        ("file", ct.c_char * DNAME_INLINE_LEN)
    ]

# process event
def print_event(cpu, data, size):
    event = ct.cast(data, ct.POINTER(Data)).contents

    type = 'R'
    if event.type == 1:
        type = 'W'
    elif event.type == 2:
        type = 'O'
    elif event.type == 3:
        type = 'S'

    if (csv):
        print("%d,%s,%d,%s,%d,%d,%d,%s" % (
            event.ts_us, event.task, event.pid, type, event.size,
            event.offset, event.delta_us, event.file))
        return
    print("%-8s %-14.14s %-6s %1s %-7s %-8d %7.2f %s" % (strftime("%H:%M:%S"),
        event.task, event.pid, type, event.size, event.offset / 1024,
        float(event.delta_us) / 1000, event.file))

# initialize BPF
b = BPF(text=bpf_text)

# Common file functions. See earlier comment about generic_file_read_iter().
b.attach_kprobe(event="generic_file_read_iter", fn_name="trace_read_entry")
b.attach_kprobe(event="ext4_file_write_iter", fn_name="trace_write_entry")
b.attach_kprobe(event="ext4_file_open", fn_name="trace_open_entry")
b.attach_kprobe(event="ext4_sync_file", fn_name="trace_fsync_entry")
b.attach_kretprobe(event="generic_file_read_iter", fn_name="trace_read_return")
b.attach_kretprobe(event="ext4_file_write_iter", fn_name="trace_write_return")
b.attach_kretprobe(event="ext4_file_open", fn_name="trace_open_return")
b.attach_kretprobe(event="ext4_sync_file", fn_name="trace_fsync_return")

# header
if (csv):
    print("ENDTIME_us,TASK,PID,TYPE,BYTES,OFFSET_b,LATENCY_us,FILE")
else:
    if min_ms == 0:
        print("Tracing ext4 operations")
    else:
        print("Tracing ext4 operations slower than %d ms" % min_ms)
    print("%-8s %-14s %-6s %1s %-7s %-8s %7s %s" % ("TIME", "COMM", "PID", "T",
        "BYTES", "OFF_KB", "LAT(ms)", "FILENAME"))

# read events
b["events"].open_perf_buffer(print_event)
while 1:
    b.kprobe_poll()
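The script resolves the address of ext4_file_operations from /proc/kallsyms so the BPF program can compare file->f_op against it. A minimal, testable sketch of that lookup as a standalone function follows; find_kallsyms_addr is a hypothetical helper name, not part of bcc. Splitting on any whitespace matters because symbols from loadable modules carry a trailing tab-separated [module] field, so an exact match on the third field still works when ext4 is built as a module.

```python
from typing import Iterable, Optional


def find_kallsyms_addr(lines: Iterable[str], symbol: str) -> Optional[str]:
    """Return "0x<address>" for symbol in /proc/kallsyms-format lines, else None."""
    for line in lines:
        # "<address> <type> <name>", plus "\t[module]" for module symbols;
        # split() with no argument splits on spaces and tabs alike.
        fields = line.split()
        if len(fields) >= 3 and fields[2] == symbol:
            return "0x" + fields[0]
    return None
```

On a live system this would be called as `find_kallsyms_addr(open("/proc/kallsyms"), "ext4_file_operations")`, run as root so the addresses are not hidden.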
tools/ext4slower_example.txt (new file, mode 100644)

The contents of this file were collapsed in the diff view and are not reproduced here.
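With -j, ext4slower prints one CSV line per event under the header ENDTIME_us,TASK,PID,TYPE,BYTES,OFFSET_b,LATENCY_us,FILE, which makes its output easy to post-process. A minimal sketch, assuming the output was captured to a file, that summarizes latency per operation type (R, W, O, S); summarize_csv is a hypothetical helper, not part of bcc. The tool does not quote fields, so this assumes TASK contains no commas (FILE is the last column, so commas there do not shift LATENCY_us).

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List


def summarize_csv(lines: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Summarize ext4slower -j output: count, max and mean LATENCY_us per TYPE."""
    latencies: Dict[str, List[int]] = defaultdict(list)
    # DictReader takes the field names from the header line the tool prints first.
    for row in csv.DictReader(lines):
        latencies[row["TYPE"]].append(int(row["LATENCY_us"]))
    return {
        op: {
            "count": len(lat),
            "max_us": max(lat),
            "avg_us": sum(lat) / len(lat),
        }
        for op, lat in latencies.items()
    }
```

For example, after `./ext4slower -j 1 > out.csv`, calling `summarize_csv(open("out.csv"))` returns one entry per operation type seen in the capture.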