Commit 7e2a0daa authored by Brenden Blanco

Merge pull request #276 from brendangregg/master

softirq and hardirq
parents 971e1c25 860b6497
......@@ -69,9 +69,11 @@ Tools:
- tools/[biosnoop](tools/biosnoop): Trace block device I/O with PID and latency. [Examples](tools/biosnoop_example.txt).
- tools/[funccount](tools/funccount): Count kernel function calls. [Examples](tools/funccount_example.txt).
- tools/[funclatency](tools/funclatency): Time kernel functions and show their latency distribution. [Examples](tools/funclatency_example.txt).
- tools/[hardirqs](tools/hardirqs): Measure hard IRQ (hard interrupt) event time. [Examples](tools/hardirqs_example.txt).
- tools/[killsnoop](tools/killsnoop): Trace signals issued by the kill() syscall. [Examples](tools/killsnoop_example.txt).
- tools/[opensnoop](tools/opensnoop): Trace open() syscalls. [Examples](tools/opensnoop_example.txt).
- tools/[pidpersec](tools/pidpersec): Count new processes (via fork). [Examples](tools/pidpersec_example.txt).
- tools/[softirqs](tools/softirqs): Measure soft IRQ (soft interrupt) event time. [Examples](tools/softirqs_example.txt).
- tools/[syncsnoop](tools/syncsnoop): Trace sync() syscall. [Examples](tools/syncsnoop_example.txt).
- tools/[tcpaccept](tools/tcpaccept): Trace TCP passive connections (accept()). [Examples](tools/tcpaccept_example.txt).
- tools/[tcpconnect](tools/tcpconnect): Trace TCP active connections (connect()). [Examples](tools/tcpconnect_example.txt).
......
.TH hardirqs 8 "2015-10-20" "USER COMMANDS"
.SH NAME
hardirqs \- Measure hard IRQ (hard interrupt) event time. Uses Linux eBPF/bcc.
.SH SYNOPSIS
.B hardirqs [\-h] [\-T] [\-N] [\-d] [interval] [count]
.SH DESCRIPTION
This summarizes the time spent servicing hard IRQs (hard interrupts), and can
show this time as either totals or histogram distributions. A system-wide
summary of this time is shown by the %irq column of mpstat(1), and event
counts (but not times) are shown by /proc/interrupts.
.PP
WARNING: This currently uses dynamic tracing of hard interrupts. You should
understand what this means before use. Try in a test environment. Future
versions should switch to tracepoints.
.PP
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bcc.
.SH OPTIONS
.TP
\-h
Print usage message.
.TP
\-T
Include timestamps on output.
.TP
\-N
Output in nanoseconds.
.TP
\-d
Show IRQ time distribution as histograms.
.SH EXAMPLES
.TP
Sum hard IRQ event time until Ctrl-C:
#
.B hardirqs
.TP
Show hard IRQ event time as histograms:
#
.B hardirqs \-d
.TP
Print 1 second summaries, 10 times:
#
.B hardirqs 1 10
.TP
1 second summaries, printed in nanoseconds, with timestamps:
#
.B hardirqs \-NT 1
.SH FIELDS
.TP
HARDIRQ
The irq action name for this hard IRQ.
.TP
TOTAL_usecs
Total time spent in this hard IRQ in microseconds.
.TP
TOTAL_nsecs
Total time spent in this hard IRQ in nanoseconds.
.TP
usecs
Range of microseconds for this bucket.
.TP
nsecs
Range of nanoseconds for this bucket.
.TP
count
Number of hard IRQs in this time range.
.TP
distribution
ASCII representation of the distribution (the count column).
.SH OVERHEAD
This traces kernel functions and maintains in-kernel counts, which
are asynchronously copied to user-space. While the rate of interrupts
may be very high (>1M/sec), this is a relatively efficient way to trace these
events, and so the overhead is expected to be small for normal workloads, but
could become noticeable for heavy workloads. Measure in a test environment
before use.
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.PP
Also look in the bcc distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
softirqs(8)
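The usecs and nsecs bucket ranges listed under FIELDS are power-of-two ranges. The following is a minimal Python sketch of that bucketing, assuming the slot equals the bit length of the duration and that ranges are labeled the way bcc's log2 histogram output usually appears. The helper names `log2_slot`, `slot_range`, and `histogram` are hypothetical and not part of bcc, and the sample durations are illustrative, not measured.

```python
def log2_slot(delta):
    """Return the power-of-two slot for a non-negative duration.

    Assumption: slot == bit length of delta, so 1 -> 1, 2..3 -> 2, 4..7 -> 3.
    """
    if delta < 0:
        raise ValueError("delta must be non-negative")
    return max(delta.bit_length(), 1)  # treat 0 like the first bucket


def slot_range(slot):
    """Return the inclusive (low, high) range covered by a slot."""
    low = (1 << slot) >> 1
    high = (1 << slot) - 1
    if low == high:  # the first bucket also covers 0
        low -= 1
    return low, high


def histogram(deltas):
    """Count how many durations fall into each slot."""
    counts = {}
    for d in deltas:
        slot = log2_slot(d)
        counts[slot] = counts.get(slot, 0) + 1
    return counts


if __name__ == "__main__":
    # Illustrative durations only, not measured data.
    for slot, count in sorted(histogram([0, 1, 3, 5, 6, 900]).items()):
        low, high = slot_range(slot)
        print("%10d -> %-10d : %d" % (low, high, count))
```

Each row printed by the sketch corresponds to one bucket line of the histogram output: the range columns come from `slot_range` and the count column from `histogram`.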
.TH softirqs 8 "2015-10-20" "USER COMMANDS"
.SH NAME
softirqs \- Measure soft IRQ (soft interrupt) event time. Uses Linux eBPF/bcc.
.SH SYNOPSIS
.B softirqs [\-h] [\-T] [\-N] [\-d] [interval] [count]
.SH DESCRIPTION
This summarizes the time spent servicing soft IRQs (soft interrupts), and can
show this time as either totals or histogram distributions. A system-wide
summary of this time is shown by the %soft column of mpstat(1), and soft IRQ
event counts (but not times) are available in /proc/softirqs.
.PP
WARNING: This currently uses dynamic tracing of various soft interrupt
functions, and may not work across different kernel versions. Check and
adjust the code as necessary. Also try in a test environment and ensure this
tool is safe before use. Future versions should switch to tracepoints.
.PP
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bcc.
.SH OPTIONS
.TP
\-h
Print usage message.
.TP
\-T
Include timestamps on output.
.TP
\-N
Output in nanoseconds.
.TP
\-d
Show IRQ time distribution as histograms.
.SH EXAMPLES
.TP
Sum soft IRQ event time until Ctrl-C:
#
.B softirqs
.TP
Show soft IRQ event time as histograms:
#
.B softirqs \-d
.TP
Print 1 second summaries, 10 times:
#
.B softirqs 1 10
.TP
1 second summaries, printed in nanoseconds, with timestamps:
#
.B softirqs \-NT 1
.SH FIELDS
.TP
SOFTIRQ
The kernel function name that performs the soft IRQ action.
.TP
TOTAL_usecs
Total time spent in this soft IRQ function in microseconds.
.TP
TOTAL_nsecs
Total time spent in this soft IRQ function in nanoseconds.
.TP
usecs
Range of microseconds for this bucket.
.TP
nsecs
Range of nanoseconds for this bucket.
.TP
count
Number of soft IRQs in this time range.
.TP
distribution
ASCII representation of the distribution (the count column).
.SH OVERHEAD
This traces kernel functions and maintains in-kernel counts, which
are asynchronously copied to user-space. While the rate of interrupts
may be very high (>1M/sec), this is a relatively efficient way to trace these
events, and so the overhead is expected to be small for normal workloads, but
could become noticeable for heavy workloads. Measure in a test environment
before use.
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.PP
Also look in the bcc distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
hardirqs(8)
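The softirqs man page notes that soft IRQ event counts (but not times) are available in /proc/softirqs. As a rough companion to the timing tool, here is a minimal Python sketch that sums the per-CPU counts for each soft IRQ name. The function names `parse_softirq_counts` and `read_softirq_counts` are hypothetical, and the sketch assumes the usual layout of a CPU header line followed by `NAME: count count ...` rows.

```python
def parse_softirq_counts(text):
    """Sum per-CPU counts for each soft IRQ name in /proc/softirqs-style text."""
    totals = {}
    lines = text.splitlines()
    for line in lines[1:]:  # the first line is the CPU header
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "NAME: n n n"
        totals[name.strip()] = sum(int(field) for field in rest.split())
    return totals


def read_softirq_counts(path="/proc/softirqs"):
    """Read and parse the counts file (Linux only)."""
    with open(path) as f:
        return parse_softirq_counts(f.read())
```

On a Linux host, calling `read_softirq_counts()` twice and subtracting the totals gives a per-interval event count, which can be set beside the per-interval times that softirqs reports.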
#!/usr/bin/python
#
# hardirqs Summarize hard IRQ (interrupt) event time.
# For Linux, uses BCC, eBPF.
#
# USAGE: hardirqs [-h] [-T] [-N] [-d] [interval] [count]
#
# Thanks Amer Ather for help understanding irq behavior.
#
# Copyright (c) 2015 Brendan Gregg.
# Licensed under the Apache License, Version 2.0 (the "License")
#
# 19-Oct-2015 Brendan Gregg Created this.
from __future__ import print_function
from bcc import BPF
from time import sleep, strftime
import argparse
### arguments
examples = """examples:
./hardirqs # sum hard irq event time
./hardirqs -d # show hard irq event time as histograms
./hardirqs 1 10 # print 1 second summaries, 10 times
./hardirqs -NT 1 # 1s summaries, nanoseconds, and timestamps
"""
parser = argparse.ArgumentParser(
    description="Summarize hard irq event time as histograms",
    formatter_class=argparse.RawDescriptionHelpFormatter,
    epilog=examples)
parser.add_argument("-T", "--timestamp", action="store_true",
    help="include timestamp on output")
parser.add_argument("-N", "--nanoseconds", action="store_true",
    help="output in nanoseconds")
parser.add_argument("-d", "--dist", action="store_true",
    help="show distributions as histograms")
parser.add_argument("interval", nargs="?", default=99999999,
    help="output interval, in seconds")
parser.add_argument("count", nargs="?", default=99999999,
    help="number of outputs")
args = parser.parse_args()
countdown = int(args.count)
if args.nanoseconds:
    factor = 1
    label = "nsecs"
else:
    factor = 1000
    label = "usecs"
debug = 0
### define BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/irq.h>
#include <linux/irqdesc.h>
#include <linux/interrupt.h>
typedef struct irq_key {
    char name[32];
    u64 slot;
} irq_key_t;
BPF_HASH(start, u32);
BPF_HASH(irqdesc, u32, struct irq_desc *);
BPF_HISTOGRAM(dist, irq_key_t);

// time IRQ
int trace_start(struct pt_regs *ctx, struct irq_desc *desc)
{
    u32 pid = bpf_get_current_pid_tgid();
    u64 ts = bpf_ktime_get_ns();
    start.update(&pid, &ts);
    irqdesc.update(&pid, &desc);
    return 0;
}

int trace_completion(struct pt_regs *ctx)
{
    u64 *tsp, delta;
    struct irq_desc **descp;
    u32 pid = bpf_get_current_pid_tgid();

    // fetch timestamp and calculate delta
    tsp = start.lookup(&pid);
    descp = irqdesc.lookup(&pid);
    if (tsp == 0 || descp == 0) {
        return 0;   // missed start
    }

    // Note: descp is a value from map, so '&' can be done without
    // probe_read, but the next level irqaction * needs a probe read.
    // Do these steps first after reading the map, otherwise some of these
    // pointers may get pushed onto the stack and verifier will fail.
    struct irqaction *action = 0;
    bpf_probe_read(&action, sizeof(action), &(*descp)->action);
    const char **namep = &action->name;
    char *name = 0;
    bpf_probe_read(&name, sizeof(name), namep);
    delta = bpf_ktime_get_ns() - *tsp;

    // store as sum or histogram
    STORE

    start.delete(&pid);
    irqdesc.delete(&pid);
    return 0;
}
"""
### code substitutions
if args.dist:
    bpf_text = bpf_text.replace('STORE',
        'irq_key_t key = {.slot = bpf_log2l(delta)};' +
        'bpf_probe_read(&key.name, sizeof(key.name), name);' +
        'dist.increment(key);')
else:
    bpf_text = bpf_text.replace('STORE',
        'irq_key_t key = {.slot = 0 /* ignore */};' +
        'bpf_probe_read(&key.name, sizeof(key.name), name);' +
        'u64 zero = 0, *vp = dist.lookup_or_init(&key, &zero);' +
        '(*vp) += delta;')
if debug:
    print(bpf_text)
### load BPF program
b = BPF(text=bpf_text)
# these should really use irq:irq_handler_entry/exit tracepoints:
b.attach_kprobe(event="handle_irq_event_percpu", fn_name="trace_start")
b.attach_kretprobe(event="handle_irq_event_percpu", fn_name="trace_completion")
print("Tracing hard irq event time... Hit Ctrl-C to end.")
### output
exiting = 0 if args.interval else 1
dist = b.get_table("dist")
while (1):
    try:
        sleep(int(args.interval))
    except KeyboardInterrupt:
        exiting = 1

    print()
    if args.timestamp:
        print("%-8s\n" % strftime("%H:%M:%S"), end="")

    if args.dist:
        dist.print_log2_hist(label, "hardirq")
    else:
        print("%-26s %11s" % ("HARDIRQ", "TOTAL_" + label))
        for k, v in sorted(dist.items(), key=lambda kv: kv[1].value):
            print("%-26s %11d" % (k.name, v.value / factor))
    dist.clear()

    countdown -= 1
    if exiting or countdown == 0:
        exit()
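The BPF program in hardirqs pairs an entry probe with a return probe: the entry stores a timestamp keyed by the current thread ID, and the return looks it up, computes the delta, and adds it to a per-name total. The following is a minimal pure-Python model of that bookkeeping with no bcc dependency. The class `IrqTimer` and its method signatures are hypothetical, and timestamps are passed in explicitly rather than read from a kernel clock.

```python
class IrqTimer(object):
    """User-space model of the trace_start/trace_completion bookkeeping."""

    def __init__(self):
        self.start = {}   # thread id -> (entry timestamp in ns, handler name)
        self.totals = {}  # handler name -> summed time in ns

    def trace_start(self, tid, name, now_ns):
        # Like the kprobe: remember when this thread entered the handler.
        self.start[tid] = (now_ns, name)

    def trace_completion(self, tid, now_ns):
        # Like the kretprobe: look up the entry and account the elapsed time.
        entry = self.start.pop(tid, None)
        if entry is None:
            return  # missed start, nothing to account
        ts, name = entry
        self.totals[name] = self.totals.get(name, 0) + (now_ns - ts)

    def summary(self, factor=1000):
        """Return (name, total) pairs sorted by total, integer-scaled by factor."""
        rows = sorted(self.totals.items(), key=lambda kv: kv[1])
        return [(name, total // factor) for name, total in rows]
```

A completion without a matching start is dropped, mirroring the "missed start" early return in the BPF code; this can happen when tracing begins while a handler is already running.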
#!/usr/bin/python
#
# softirqs Summarize soft IRQ (interrupt) event time.
# For Linux, uses BCC, eBPF.
#
# USAGE: softirqs [-h] [-T] [-N] [-d] [interval] [count]
#
# Copyright (c) 2015 Brendan Gregg.
# Licensed under the Apache License, Version 2.0 (the "License")
#
# 20-Oct-2015 Brendan Gregg Created this.
from __future__ import print_function
from bcc import BPF
from time import sleep, strftime
import argparse
### arguments
examples = """examples:
./softirqs # sum soft irq event time
./softirqs -d # show soft irq event time as histograms
./softirqs 1 10 # print 1 second summaries, 10 times
./softirqs -NT 1 # 1s summaries, nanoseconds, and timestamps
"""
parser = argparse.ArgumentParser(
    description="Summarize soft irq event time as histograms",
    formatter_class=argparse.RawDescriptionHelpFormatter,
    epilog=examples)
parser.add_argument("-T", "--timestamp", action="store_true",
    help="include timestamp on output")
parser.add_argument("-N", "--nanoseconds", action="store_true",
    help="output in nanoseconds")
parser.add_argument("-d", "--dist", action="store_true",
    help="show distributions as histograms")
parser.add_argument("interval", nargs="?", default=99999999,
    help="output interval, in seconds")
parser.add_argument("count", nargs="?", default=99999999,
    help="number of outputs")
args = parser.parse_args()
countdown = int(args.count)
if args.nanoseconds:
    factor = 1
    label = "nsecs"
else:
    factor = 1000
    label = "usecs"
debug = 0
### define BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
typedef struct irq_key {
    u64 ip;
    u64 slot;
} irq_key_t;
BPF_HASH(start, u32);
BPF_HASH(iptr, u32);
BPF_HISTOGRAM(dist, irq_key_t);

// time IRQ
int trace_start(struct pt_regs *ctx)
{
    u32 pid = bpf_get_current_pid_tgid();
    u64 ip = ctx->ip, ts = bpf_ktime_get_ns();
    start.update(&pid, &ts);
    iptr.update(&pid, &ip);
    return 0;
}

int trace_completion(struct pt_regs *ctx)
{
    u64 *tsp, delta, ip, *ipp;
    u32 pid = bpf_get_current_pid_tgid();

    // fetch timestamp and calculate delta
    tsp = start.lookup(&pid);
    ipp = iptr.lookup(&pid);
    if (tsp == 0 || ipp == 0) {
        return 0;   // missed start
    }
    delta = bpf_ktime_get_ns() - *tsp;
    ip = *ipp;

    // store as sum or histogram
    STORE

    start.delete(&pid);
    iptr.delete(&pid);
    return 0;
}
"""
### code substitutions
if args.dist:
    bpf_text = bpf_text.replace('STORE',
        'irq_key_t key = {.ip = ip, .slot = bpf_log2l(delta)};' +
        'dist.increment(key);')
else:
    bpf_text = bpf_text.replace('STORE',
        'irq_key_t key = {.ip = ip, .slot = 0 /* ignore */};' +
        'u64 zero = 0, *vp = dist.lookup_or_init(&key, &zero);' +
        '(*vp) += delta;')
if debug:
    print(bpf_text)
### load BPF program
b = BPF(text=bpf_text)
# this should really use irq:softirq_entry/exit tracepoints; for now the
# soft irq functions are individually traced (search your kernel for
# open_softirq() calls, and adjust the following list as needed).
for softirqfunc in ("blk_iopoll_softirq", "blk_done_softirq",
        "rcu_process_callbacks", "run_rebalance_domains", "tasklet_action",
        "tasklet_hi_action", "run_timer_softirq", "net_tx_action",
        "net_rx_action"):
    b.attach_kprobe(event=softirqfunc, fn_name="trace_start")
    b.attach_kretprobe(event=softirqfunc, fn_name="trace_completion")
print("Tracing soft irq event time... Hit Ctrl-C to end.")
### output
exiting = 0 if args.interval else 1
dist = b.get_table("dist")
while (1):
    try:
        sleep(int(args.interval))
    except KeyboardInterrupt:
        exiting = 1

    print()
    if args.timestamp:
        print("%-8s\n" % strftime("%H:%M:%S"), end="")

    if args.dist:
        dist.print_log2_hist(label, "softirq", section_print_fn=b.ksym)
    else:
        print("%-26s %11s" % ("SOFTIRQ", "TOTAL_" + label))
        for k, v in sorted(dist.items(), key=lambda kv: kv[1].value):
            print("%-26s %11d" % (b.ksym(k.ip), v.value / factor))
    dist.clear()

    countdown -= 1
    if exiting or countdown == 0:
        exit()
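Both scripts build the final BPF source by replacing a STORE placeholder with either histogram or sum statements, depending on the -d option. The sketch below isolates that step as a small function so it can be checked without loading BPF. The name `render_bpf_text` and the single-placeholder check are assumptions added for illustration, the template is abbreviated, and the two statement strings are copied from the softirqs substitutions.

```python
TEMPLATE = """
int trace_completion(struct pt_regs *ctx)
{
    // ... compute delta and ip as in the softirqs script ...
    STORE
    return 0;
}
"""

# Statements copied from the softirqs script's two substitutions.
STORE_HIST = (
    'irq_key_t key = {.ip = ip, .slot = bpf_log2l(delta)};'
    'dist.increment(key);'
)
STORE_SUM = (
    'irq_key_t key = {.ip = ip, .slot = 0 /* ignore */};'
    'u64 zero = 0, *vp = dist.lookup_or_init(&key, &zero);'
    '(*vp) += delta;'
)


def render_bpf_text(template, dist):
    """Return the template with STORE replaced for histogram or sum mode."""
    if template.count("STORE") != 1:
        raise ValueError("template must contain exactly one STORE placeholder")
    return template.replace("STORE", STORE_HIST if dist else STORE_SUM)
```

Checking for exactly one placeholder is a design choice of this sketch: it turns a typo in the template into an immediate Python error instead of a harder-to-read BPF compile failure.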