Commit 2371b1bc authored by Brendan Gregg's avatar Brendan Gregg

add writeback tool

parent 06f28be0
......@@ -118,6 +118,7 @@ bpftrace contains various tools, which also serve as examples of programming in
- tools/[syscount.bt](tools/syscount.bt): Count system callls. [Examples](tools/syscount_example.txt).
- tools/[vfscount.bt](tools/vfscount.bt): Count VFS calls. [Examples](tools/vfscount_example.txt).
- tools/[vfsstat.bt](tools/vfsstat.bt): Count some VFS calls, with per-second summaries. [Examples](tools/vfsstat_example.txt).
- tools/[writeback.bt](tools/writeback.bt): Trace file system writeback events with details. [Examples](tools/writeback_example.txt).
- tools/[xfsdist.bt](tools/xfsdist.bt): Summarize XFS operation latency distribution as a histogram. [Examples](tools/xfsdist_example.txt).
For more eBPF observability tools, see [bcc tools](https://github.com/iovisor/bcc#tools).
......
.TH writeback 8 "2018-09-14" "USER COMMANDS"
.SH NAME
writeback.bt \- Trace file system writeback events with details. Uses bpftrace/eBPF.
.SH SYNOPSIS
.B writeback.bt
.SH DESCRIPTION
This traces when file system dirtied pages are flushed to disk by kernel
writeback, and prints details including when the event occurred, and the
duration of the event. This can be useful for correlating these times with
other performace problems, and if there is a match, it would be a clue
that the problem may be caused by writeback. How quickly the kernel does
writeback can be tuned: see the kernel docs, eg,
vm.dirty_writeback_centisecs.
This uses the tracepoint:writeback:writeback_start and
tracepoint:writeback:writeback_written tracepoints.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bpftrace.
.SH EXAMPLES
.TP
Trace all writeback events with timestamps and latency details:
#
.B writeback.bt
.SH FIELDS
.TP
TIME
Time that the writeback event completed, in %H:%M:%S format.
.TP
DEVICE
Device name in major:minor number format.
.TP
PAGES
Pages written during writeback.
.TP
REASON
Reason for the writeback event. This may be "background", "vmscan", "sync", "periodic", etc.
.TP
ms
Duration of the writeback event in milliseconds.
.SH OVERHEAD
Since writeback events are expected to be infrequent (<10/sec), the overhead
of this tool is expected to be negligible (near 0%).
.SH SOURCE
This is from bpftrace.
.IP
https://github.com/iovisor/bpftrace
.PP
Also look in the bpftrace distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
biosnoop(8)
/*
* writeback Trace file system writeback events with details.
* For Linux, uses bpftrace and eBPF.
*
* This traces when file system dirtied pages are flushed to disk by kernel
* writeback, and prints details including when the event occurred, and the
* duration of the event. This can be useful for correlating these times with
* other performace problems, and if there is a match, it would be a clue
* that the problem may be caused by writeback. How quickly the kernel does
* writeback can be tuned: see the kernel docs, eg,
* vm.dirty_writeback_centisecs.
*
* USAGE: writeback.bt
*
* Copyright 2018 Netflix, Inc.
* Licensed under the Apache License, Version 2.0 (the "License")
*
* 14-Sep-2018 Brendan Gregg Created this.
*/
BEGIN
{
printf("Tracing writeback... Hit Ctrl-C to end.\n");
printf("%-9s %-8s %-8s %-16s %s\n", "TIME", "DEVICE", "PAGES",
"REASON", "ms");
// see /sys/kernel/debug/tracing/events/writeback/writeback_start/format
@reason[0] = "background";
@reason[1] = "vmscan";
@reason[2] = "sync";
@reason[3] = "periodic";
@reason[4] = "laptop_timer";
@reason[5] = "free_more_memory";
@reason[6] = "fs_free_space";
@reason[7] = "forker_thread";
}
tracepoint:writeback:writeback_start
{
$sb_dev = *(ctx + 48) & 0xffffffff;
@start[$sb_dev] = nsecs;
}
tracepoint:writeback:writeback_written
{
$name = str(ctx + 8);
$nr_pages = *(ctx + 40) & 0xffff;
$sb_dev = *(ctx + 48) & 0xffffffff;
$r = *(ctx + 68) & 0xffffffff;
$s = @start[$sb_dev];
delete(@start[$sb_dev]);
$lat = $s ? (nsecs - $s) / 1000 : 0;
time("%H:%M:%S ");
printf("%-8s %-8d %-16s %d.%03d\n", $name, $nr_pages, @reason[$r],
$lat / 1000, $lat % 1000);
}
END
{
clear(@reason);
clear(@start);
}
Demonstrations of writeback, the Linux bpftrace/eBPF version.
This tool traces when the kernel writeback procedure is writing dirtied pages
to disk, and shows details such as the time, device numbers, reason for the
write back, and the duration. For example:
# bpftrace writeback.bt
Attaching 4 probes...
Tracing writeback... Hit Ctrl-C to end.
TIME DEVICE PAGES REASON ms
23:28:47 259:1 15791 periodic 0.005
23:28:48 259:0 15792 periodic 0.004
23:28:52 259:1 15784 periodic 0.003
23:28:53 259:0 18682 periodic 0.003
23:28:55 259:0 41970 background 326.663
23:28:56 259:0 18418 background 332.689
23:28:56 259:0 60402 background 362.446
23:28:57 259:1 18230 periodic 0.005
23:28:57 259:1 65492 background 3.343
23:28:57 259:1 65492 background 0.002
23:28:58 259:0 36850 background 0.000
23:28:58 259:0 13298 background 597.198
23:28:58 259:0 55282 background 322.050
23:28:59 259:0 31730 background 336.031
23:28:59 259:0 8178 background 357.119
23:29:01 259:0 50162 background 1803.146
23:29:02 259:0 27634 background 1311.876
23:29:03 259:0 6130 background 331.599
23:29:03 259:0 50162 background 293.968
23:29:03 259:0 28658 background 284.946
23:29:03 259:0 7154 background 286.572
[...]
By looking a the timestamps and latency, it can be seen that the system was
not spending much time in writeback until 23:28:55, when "background"
writeback began, taking over 300 milliseconds per flush.
If timestamps of heavy writeback coincide with times when applications suffered
performance issues, that would be a clue that they are correlated and there
is contention for the disk devices. There are various ways to tune this:
eg, vm.dirty_writeback_centisecs.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment