Commit 64df2359 authored by Brendan Gregg

add dcsnoop tool

parent 269c7388
......@@ -144,6 +144,7 @@ bpftrace contains various tools, which also serve as examples of programming in
- tools/[bitesize.bt](tools/bitesize.bt): Show disk I/O size as a histogram. [Examples](tools/bitesize_example.txt).
- tools/[capable.bt](tools/capable.bt): Trace security capability checks. [Examples](tools/capable_example.txt).
- tools/[cpuwalk.bt](tools/cpuwalk.bt): Sample which CPUs are executing processes. [Examples](tools/cpuwalk_example.txt).
- tools/[dcsnoop.bt](tools/dcsnoop.bt): Trace directory entry cache (dcache) lookups. [Examples](tools/dcsnoop_example.txt).
- tools/[execsnoop.bt](tools/execsnoop.bt): Trace new processes via exec() syscalls. [Examples](tools/execsnoop_example.txt).
- tools/[gethostlatency.bt](tools/gethostlatency.bt): Show latency for getaddrinfo/gethostbyname[2] calls. [Examples](tools/gethostlatency_example.txt).
- tools/[killsnoop.bt](tools/killsnoop.bt): Trace signals issued by the kill() syscall. [Examples](tools/killsnoop_example.txt).
......
.TH dcsnoop 8 "2018-09-08" "USER COMMANDS"
.SH NAME
dcsnoop.bt \- Trace directory entry cache (dcache) lookups. Uses bpftrace/eBPF.
.SH SYNOPSIS
.B dcsnoop.bt
.SH DESCRIPTION
By default, this traces every dcache lookup, and shows the
process performing the lookup and the filename requested.
The output of this tool can be verbose, and is intended for further
investigations of dcache performance beyond dcstat(8), which prints
per-second summaries.
This uses kernel dynamic tracing of the lookup_fast() and d_lookup() functions,
and will need updating to match any changes to these functions.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bpftrace.
.SH EXAMPLES
.TP
Trace all dcache lookups:
#
.B dcsnoop.bt
.SH FIELDS
.TP
TIME(ms)
Time of lookup, in milliseconds.
.TP
PID
Process ID.
.TP
COMM
Process name.
.TP
T
Type: R == reference, M == miss. A miss will print two
lines, one for the reference, and one for the miss.
.TP
FILE
The file name component that was being looked up. This contains trailing
pathname components (after '/'), which will be the subject of subsequent
lookups.
.SH OVERHEAD
File name lookups can be frequent (depending on the workload), and this tool
prints a line for each reference, plus a second line for each miss. The
output may be verbose, and the incurred overhead, while optimized to some
extent, may still range from noticeable to significant. This is only really
intended for deeper investigations beyond dcstat(8), when absolutely necessary.
Measure and quantify the overhead in a test environment before use.
.SH SOURCE
This is from bpftrace.
.IP
https://github.com/iovisor/bpftrace
.PP
Also look in the bpftrace distribution for a companion _example.txt file containing
example usage, output, and commentary for this tool.
This is a bpftrace version of the bcc tool of the same name. The bcc tool
may provide more options and customizations.
.IP
https://github.com/iovisor/bcc
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
dcstat(8)
/*
 * dcsnoop	Trace directory entry cache (dcache) lookups.
 *		For Linux, uses bpftrace and eBPF.
 *
 * This uses kernel dynamic tracing of kernel functions, lookup_fast() and
 * d_lookup(), which will need to be modified to match kernel changes. See
 * code comments.
 *
 * USAGE: dcsnoop.bt
 *
 * Copyright 2018 Netflix, Inc.
 * Licensed under the Apache License, Version 2.0 (the "License")
 *
 * 08-Sep-2018	Brendan Gregg	Created this.
 */
#include <linux/fs.h>
#include <linux/sched.h>

// from fs/namei.c:
struct nameidata {
	struct path path;
	struct qstr last;
	// [...]
};

BEGIN
{
	printf("Tracing dcache lookups... Hit Ctrl-C to end.\n");
	printf("%-8s %-6s %-16s %1s %s\n", "TIME", "PID", "COMM", "T", "FILE");
	@epoch = nsecs;
}

// comment out this block to avoid showing hits:
kprobe:lookup_fast
{
	$nd = (nameidata *)arg0;
	printf("%-8d %-6d %-16s R %s\n", (nsecs - @epoch) / 1000000, pid, comm,
	    str($nd->last.name));
}

kprobe:d_lookup
{
	$name = (qstr *)arg1;
	@fname[tid] = $name->name;
}

kretprobe:d_lookup
/@fname[tid]/
{
	printf("%-8d %-6d %-16s M %s\n", (nsecs - @epoch) / 1000000, pid, comm,
	    str(@fname[tid]));
	delete(@fname[tid]);
}
Demonstrations of dcsnoop, the Linux bpftrace/eBPF version.
dcsnoop traces directory entry cache (dcache) lookups, and can be used for
further investigation beyond dcstat(8). The output is likely verbose, as
dcache lookups are likely frequent. For example:
# dcsnoop.bt
Attaching 4 probes...
Tracing dcache lookups... Hit Ctrl-C to end.
TIME     PID    COMM             T FILE
427      1518   irqbalance       R proc/interrupts
427      1518   irqbalance       R interrupts
427      1518   irqbalance       R proc/stat
427      1518   irqbalance       R stat
483      2440   snmp-pass        R proc/cpuinfo
483      2440   snmp-pass        R cpuinfo
486      2440   snmp-pass        R proc/stat
486      2440   snmp-pass        R stat
834      1744   snmpd            R proc/net/dev
834      1744   snmpd            R net/dev
834      1744   snmpd            R self/net
834      1744   snmpd            R 1744
834      1744   snmpd            R net
834      1744   snmpd            R dev
834      1744   snmpd            R proc/net/if_inet6
834      1744   snmpd            R net/if_inet6
834      1744   snmpd            R self/net
834      1744   snmpd            R 1744
834      1744   snmpd            R net
834      1744   snmpd            R if_inet6
835      1744   snmpd            R sys/class/net/docker0/device/vendor
835      1744   snmpd            R class/net/docker0/device/vendor
835      1744   snmpd            R net/docker0/device/vendor
835      1744   snmpd            R docker0/device/vendor
835      1744   snmpd            R devices/virtual/net/docker0
835      1744   snmpd            R virtual/net/docker0
835      1744   snmpd            R net/docker0
835      1744   snmpd            R docker0
835      1744   snmpd            R device/vendor
835      1744   snmpd            R proc/sys/net/ipv4/neigh/docker0/retrans_time_ms
835      1744   snmpd            R sys/net/ipv4/neigh/docker0/retrans_time_ms
835      1744   snmpd            R net/ipv4/neigh/docker0/retrans_time_ms
835      1744   snmpd            R ipv4/neigh/docker0/retrans_time_ms
835      1744   snmpd            R neigh/docker0/retrans_time_ms
835      1744   snmpd            R docker0/retrans_time_ms
835      1744   snmpd            R retrans_time_ms
835      1744   snmpd            R proc/sys/net/ipv6/neigh/docker0/retrans_time_ms
835      1744   snmpd            R sys/net/ipv6/neigh/docker0/retrans_time_ms
835      1744   snmpd            R net/ipv6/neigh/docker0/retrans_time_ms
835      1744   snmpd            R ipv6/neigh/docker0/retrans_time_ms
835      1744   snmpd            R neigh/docker0/retrans_time_ms
835      1744   snmpd            R docker0/retrans_time_ms
835      1744   snmpd            R retrans_time_ms
835      1744   snmpd            R proc/sys/net/ipv6/conf/docker0/forwarding
835      1744   snmpd            R sys/net/ipv6/conf/docker0/forwarding
835      1744   snmpd            R net/ipv6/conf/docker0/forwarding
835      1744   snmpd            R ipv6/conf/docker0/forwarding
835      1744   snmpd            R conf/docker0/forwarding
[...]
5154     934    cksum            R usr/bin/basename
5154     934    cksum            R bin/basename
5154     934    cksum            R basename
5154     934    cksum            R usr/bin/bashbug
5154     934    cksum            R bin/bashbug
5154     934    cksum            R bashbug
5154     934    cksum            M bashbug
5155     934    cksum            R usr/bin/batch
5155     934    cksum            R bin/batch
5155     934    cksum            R batch
5155     934    cksum            M batch
5155     934    cksum            R usr/bin/bc
5155     934    cksum            R bin/bc
5155     934    cksum            R bc
5155     934    cksum            M bc
5169     934    cksum            R usr/bin/bdftopcf
5169     934    cksum            R bin/bdftopcf
5169     934    cksum            R bdftopcf
5169     934    cksum            M bdftopcf
5173     934    cksum            R usr/bin/bdftruncate
5173     934    cksum            R bin/bdftruncate
5173     934    cksum            R bdftruncate
5173     934    cksum            M bdftruncate
The way the dcache is currently implemented, each component of a path is
checked in turn. The first line, showing "proc/interrupts" from irqbalance,
will be a lookup for "proc" in a directory (that isn't shown here). If it
finds "proc", it will then look up "interrupts" inside proc.
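The FILE strings in the output follow from this component-by-component walk:
each row shows the part of the path that is still unresolved at that point.
A minimal sketch in Python (not part of bpftrace; the function name
remaining_paths is hypothetical) that reproduces this sequence for a plain
path, assuming no symbolic links are followed along the way:

```python
def remaining_paths(path):
    """Return the unresolved path suffix seen at each component lookup.

    Illustration only: this models a plain walk over the components and
    does not account for symlinks, mount points, or "." and ".." entries.
    """
    parts = [p for p in path.split("/") if p]
    return ["/".join(parts[i:]) for i in range(len(parts))]


for name in remaining_paths("/proc/sys/net/ipv6/conf/docker0/forwarding"):
    print(name)
```

Extra rows such as "self/net" and "1744" in the snmpd trace are most likely
symbolic links being resolved (/proc/net points to self/net, and /proc/self
points to the PID directory), which this sketch does not model.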
The script is easily modifiable to only show misses, reducing the volume of
the output. Or use the bcc version of this tool, which only shows misses by
default: https://github.com/iovisor/bcc
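If the output has already been captured to a file, a third option is to
filter it afterwards. A minimal sketch in Python (the function name
only_misses is hypothetical), assuming the five whitespace-separated columns
shown above and a COMM value that contains no spaces:

```python
def only_misses(lines):
    """Keep dcsnoop.bt output rows whose T column is 'M' (miss)."""
    misses = []
    for line in lines:
        # Columns: TIME PID COMM T FILE; FILE is kept whole as the last field.
        fields = line.split(None, 4)
        if len(fields) == 5 and fields[3] == "M":
            misses.append(line.rstrip("\n"))
    return misses


sample = [
    "TIME PID COMM T FILE",
    "5154 934 cksum R bashbug",
    "5154 934 cksum M bashbug",
    "5155 934 cksum R usr/bin/batch",
]
for row in only_misses(sample):
    print(row)
```

This reduces what has to be read, but not the tracing overhead; for that,
the lookup_fast probe needs to be disabled in the script itself.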