Commit 77772f51 authored by Brendan Gregg

add runqlen tool

parent 2a1e2dae
......@@ -153,6 +153,7 @@ bpftrace contains various tools, which also serve as examples of programming in
- tools/[opensnoop.bt](tools/opensnoop.bt): Trace open() syscalls showing filenames. [Examples](tools/opensnoop_example.txt).
- tools/[oomkill.bt](tools/oomkill.bt): Trace OOM killer. [Examples](tools/oomkill_example.txt).
- tools/[pidpersec.bt](tools/pidpersec.bt): Count new processes (via fork). [Examples](tools/pidpersec_example.txt).
- tools/[runqlen.bt](tools/runqlen.bt): CPU scheduler run queue length as a histogram. [Examples](tools/runqlen_example.txt).
- tools/[statsnoop.bt](tools/statsnoop.bt): Trace stat() syscalls for general debugging. [Examples](tools/statsnoop_example.txt).
- tools/[syncsnoop.bt](tools/syncsnoop.bt): Trace sync() variety of syscalls. [Examples](tools/syncsnoop_example.txt).
- tools/[syscount.bt](tools/syscount.bt): Count system calls. [Examples](tools/syscount_example.txt).
......
.TH runqlen 8 "2018-10-07" "USER COMMANDS"
.SH NAME
runqlen.bt \- CPU scheduler run queue length as a histogram. Uses bpftrace/eBPF.
.SH SYNOPSIS
.B runqlen.bt
.SH DESCRIPTION
This program summarizes scheduler queue length as a histogram, and can also
show run queue occupancy. It works by sampling the run queue length on all
CPUs at 99 Hertz.
This tool can be used to identify imbalances, e.g., when processes are bound
to CPUs causing queueing, or interrupt mappings causing the same.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bpftrace.
.SH EXAMPLES
.TP
Trace CPU run queue length system wide, printing a histogram on Ctrl-C:
#
.B runqlen.bt
.SH FIELDS
.TP
1st, 2nd
The run queue length is shown in the first field (after "[").
.TP
3rd
A column showing the count of samples for that length.
.TP
4th
This is an ASCII histogram representing the count column.
.SH OVERHEAD
This samples scheduler structs at 99 Hertz across all CPUs. Relatively,
this is a low rate of events, and the overhead of this tool is expected
to be near zero.
.SH SOURCE
This is from bpftrace.
.IP
https://github.com/iovisor/bpftrace
.PP
Also look in the bpftrace distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
This is a bpftrace version of the bcc tool of the same name. The bcc tool
may provide more options and customizations.
.IP
https://github.com/iovisor/bcc
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
mpstat(1), pidstat(1), uptime(1)
/*
* runqlen.bt CPU scheduler run queue length as a histogram.
* For Linux, uses bpftrace, eBPF.
*
* This is a bpftrace version of the bcc tool of the same name.
*
* Copyright 2018 Netflix, Inc.
* Licensed under the Apache License, Version 2.0 (the "License")
*
* 07-Oct-2018 Brendan Gregg Created this.
*/
#include <linux/sched.h>
// Until BTF is available, we'll need to declare some of this struct manually,
// since it isn't available to be #included. This will need maintenance to match
// your kernel version. It is from kernel/sched/sched.h:
struct cfs_rq_partial {
	struct load_weight load;
	unsigned long runnable_weight;
	unsigned int nr_running;
	unsigned int h_nr_running;
}
BEGIN
{
	printf("Sampling run queue length at 99 Hertz... Hit Ctrl-C to end.\n");
}
profile:hz:99
{
	$task = (task_struct *)curtask;
	$my_q = (cfs_rq_partial *)$task->se.cfs_rq;
	$len = $my_q->nr_running;
	$len = $len > 0 ? $len - 1 : 0;	// subtract currently running task
	@runqlen = lhist($len, 0, 100, 1);
}
Demonstrations of runqlen, the Linux BPF/bpftrace version.
This tool samples the length of the CPU scheduler run queues, showing these
sampled lengths as a histogram. This can be used to characterize demand for
CPU resources. For example:
# runqlen.bt
Attaching 2 probes...
Sampling run queue length at 99 Hertz... Hit Ctrl-C to end.
^C
@runqlen:
[0, 1) 1967 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1, 2) 0 | |
[2, 3) 0 | |
[3, 4) 306 |@@@@@@@@ |
This output shows that the run queue length was usually zero, except for some
samples where it was 3. This was caused by binding 4 CPU-bound threads to a
single CPU.
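An imbalance like this can be easier to attribute when the histogram is broken
down by CPU. The following is a minimal sketch, not part of this commit: it
assumes the same bpftrace syntax and the same hand-declared cfs_rq_partial
layout as runqlen.bt (which must match your kernel version), and only changes
the map so that it is keyed by the cpu builtin:

```bpftrace
#include <linux/sched.h>

// Assumption: same partial struct as runqlen.bt; it must match your kernel.
struct cfs_rq_partial {
	struct load_weight load;
	unsigned long runnable_weight;
	unsigned int nr_running;
	unsigned int h_nr_running;
}

profile:hz:99
{
	$task = (task_struct *)curtask;
	$my_q = (cfs_rq_partial *)$task->se.cfs_rq;
	$len = $my_q->nr_running;
	$len = $len > 0 ? $len - 1 : 0;	// exclude the task currently on-CPU
	// One linear histogram per sampling CPU instead of a single global one.
	@runqlen[cpu] = lhist($len, 0, 100, 1);
}
```

With a workload like the one above, the expectation would be that the non-zero
queue lengths show up under a single CPU's key while the other CPUs stay at
zero, although this has not been verified here.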
There is another version of this tool in bcc: https://github.com/iovisor/bcc
The bcc version provides options to customize the output.