add writeback tool

2371b1bc · Brendan Gregg · 06f28be0 · 2371b1bc · 2371b1bc · 2371b1bc
Commit 2371b1bc authored Sep 15, 2018 by Brendan Gregg
Showing with 165 additions and 0 deletions

README.md README.md +1 -0

man/man8/writeback.8 man/man8/writeback.8 +59 -0

tools/writeback.bt tools/writeback.bt +63 -0

tools/writeback_example.txt tools/writeback_example.txt +42 -0

No files found.
--- a/README.md
+++ b/README.md
@@ -118,6 +118,7 @@ bpftrace contains various tools, which also serve as examples of programming in
 - tools/[syscount.bt](tools/syscount.bt): Count system callls. [Examples](tools/syscount_example.txt).
 - tools/[vfscount.bt](tools/vfscount.bt): Count VFS calls. [Examples](tools/vfscount_example.txt).
 - tools/[vfsstat.bt](tools/vfsstat.bt): Count some VFS calls, with per-second summaries. [Examples](tools/vfsstat_example.txt).
+- tools/[writeback.bt](tools/writeback.bt): Trace file system writeback events with details. [Examples](tools/writeback_example.txt).
 - tools/[xfsdist.bt](tools/xfsdist.bt): Summarize XFS operation latency distribution as a histogram. [Examples](tools/xfsdist_example.txt).

 For more eBPF observability tools, see [bcc tools](https://github.com/iovisor/bcc#tools).

--- a/man/man8/writeback.8
+++ b/man/man8/writeback.8
+.TH writeback 8  "2018-09-14" "USER COMMANDS"
+.SH NAME
+writeback.bt \- Trace file system writeback events with details. Uses bpftrace/eBPF.
+.SH SYNOPSIS
+.B writeback.bt
+.SH DESCRIPTION
+This traces when file system dirtied pages are flushed to disk by kernel
+writeback, and prints details including when the event occurred, and the
+duration of the event. This can be useful for correlating these times with
+other performace problems, and if there is a match, it would be a clue
+that the problem may be caused by writeback. How quickly the kernel does
+writeback can be tuned: see the kernel docs, eg,
+vm.dirty_writeback_centisecs.
+
+This uses the tracepoint:writeback:writeback_start and
+tracepoint:writeback:writeback_written tracepoints.
+
+Since this uses BPF, only the root user can use this tool.
+.SH REQUIREMENTS
+CONFIG_BPF and bpftrace.
+.SH EXAMPLES
+.TP
+Trace all writeback events with timestamps and latency details:
+#
+.B writeback.bt
+.SH FIELDS
+.TP
+TIME
+Time that the writeback event completed, in %H:%M:%S format.
+.TP
+DEVICE
+Device name in major:minor number format.
+.TP
+PAGES
+Pages written during writeback.
+.TP
+REASON
+Reason for the writeback event. This may be "background", "vmscan", "sync", "periodic", etc.
+.TP
+ms
+Duration of the writeback event in milliseconds.
+.SH OVERHEAD
+Since writeback events are expected to be infrequent (<10/sec), the overhead
+of this tool is expected to be negligible (near 0%).
+.SH SOURCE
+This is from bpftrace.
+.IP
+https://github.com/iovisor/bpftrace
+.PP
+Also look in the bpftrace distribution for a companion _examples.txt file containing
+example usage, output, and commentary for this tool.
+.SH OS
+Linux
+.SH STABILITY
+Unstable - in development.
+.SH AUTHOR
+Brendan Gregg
+.SH SEE ALSO
+biosnoop(8)
--- a/tools/writeback.bt
+++ b/tools/writeback.bt
+/*
+ * writeback	Trace file system writeback events with details.
+ * 		For Linux, uses bpftrace and eBPF.
+ *
+ * This traces when file system dirtied pages are flushed to disk by kernel
+ * writeback, and prints details including when the event occurred, and the
+ * duration of the event. This can be useful for correlating these times with
+ * other performace problems, and if there is a match, it would be a clue
+ * that the problem may be caused by writeback. How quickly the kernel does
+ * writeback can be tuned: see the kernel docs, eg,
+ * vm.dirty_writeback_centisecs.
+ *
+ * USAGE: writeback.bt
+ *
+ * Copyright 2018 Netflix, Inc.
+ * Licensed under the Apache License, Version 2.0 (the "License")
+ *
+ * 14-Sep-2018	Brendan Gregg	Created this.
+ */
+
+BEGIN
+{
+	printf("Tracing writeback... Hit Ctrl-C to end.\n");
+	printf("%-9s %-8s %-8s %-16s %s\n", "TIME", "DEVICE", "PAGES",
+	    "REASON", "ms");
+
+	// see /sys/kernel/debug/tracing/events/writeback/writeback_start/format
+	@reason[0] = "background";
+	@reason[1] = "vmscan";
+	@reason[2] = "sync";
+	@reason[3] = "periodic";
+	@reason[4] = "laptop_timer";
+	@reason[5] = "free_more_memory";
+	@reason[6] = "fs_free_space";
+	@reason[7] = "forker_thread";
+}
+
+tracepoint:writeback:writeback_start
+{
+	$sb_dev = *(ctx + 48) & 0xffffffff;
+	@start[$sb_dev] = nsecs;
+}
+
+tracepoint:writeback:writeback_written
+{
+	$name = str(ctx + 8);
+	$nr_pages = *(ctx + 40) & 0xffff;
+	$sb_dev = *(ctx + 48) & 0xffffffff;
+	$r = *(ctx + 68) & 0xffffffff;
+	$s = @start[$sb_dev];
+	delete(@start[$sb_dev]);
+	$lat = $s ? (nsecs - $s) / 1000 : 0;
+
+	time("%H:%M:%S  ");
+	printf("%-8s %-8d %-16s %d.%03d\n", $name, $nr_pages, @reason[$r],
+	    $lat / 1000, $lat % 1000);
+}
+
+END
+{
+	clear(@reason);
+	clear(@start);
+}
--- a/tools/writeback_example.txt
+++ b/tools/writeback_example.txt
+Demonstrations of writeback, the Linux bpftrace/eBPF version.
+
+
+This tool traces when the kernel writeback procedure is writing dirtied pages
+to disk, and shows details such as the time, device numbers, reason for the
+write back, and the duration. For example:
+
+# bpftrace writeback.bt
+Attaching 4 probes...
+Tracing writeback... Hit Ctrl-C to end.
+TIME      DEVICE   PAGES    REASON           ms
+23:28:47  259:1    15791    periodic         0.005
+23:28:48  259:0    15792    periodic         0.004
+23:28:52  259:1    15784    periodic         0.003
+23:28:53  259:0    18682    periodic         0.003
+23:28:55  259:0    41970    background       326.663
+23:28:56  259:0    18418    background       332.689
+23:28:56  259:0    60402    background       362.446
+23:28:57  259:1    18230    periodic         0.005
+23:28:57  259:1    65492    background       3.343
+23:28:57  259:1    65492    background       0.002
+23:28:58  259:0    36850    background       0.000
+23:28:58  259:0    13298    background       597.198
+23:28:58  259:0    55282    background       322.050
+23:28:59  259:0    31730    background       336.031
+23:28:59  259:0    8178     background       357.119
+23:29:01  259:0    50162    background       1803.146
+23:29:02  259:0    27634    background       1311.876
+23:29:03  259:0    6130     background       331.599
+23:29:03  259:0    50162    background       293.968
+23:29:03  259:0    28658    background       284.946
+23:29:03  259:0    7154     background       286.572
+[...]
+
+By looking a the timestamps and latency, it can be seen that the system was
+not spending much time in writeback until 23:28:55, when "background"
+writeback began, taking over 300 milliseconds per flush.
+
+If timestamps of heavy writeback coincide with times when applications suffered
+performance issues, that would be a clue that they are correlated and there
+is contention for the disk devices. There are various ways to tune this:
+eg, vm.dirty_writeback_centisecs.