Commit d96f6ca1 authored by affansyed's avatar affansyed

Merge branch 'master' into master

parents 7c83a343 8ddb79d3
![BCC Logo](images/logo2.png)
# BPF Compiler Collection (BCC)
This directory contains source code for BCC, a toolkit for creating small
programs that can be dynamically loaded into a Linux kernel.
BCC is a toolkit for creating efficient kernel tracing and manipulation
programs, and includes several useful tools and examples. It makes use of eBPF
(Extended Berkeley Packet Filters), a new feature that was first added to
Linux 3.15. Much of what BCC uses requires Linux 4.1 and above.
eBPF was [described by](https://lkml.org/lkml/2015/4/14/232) Ingo Molnár as:
> One of the more interesting features in this cycle is the ability to attach eBPF programs (user-defined, sandboxed bytecode executed by the kernel) to kprobes. This allows user-defined instrumentation on a live kernel image that can never crash, hang or interfere with the kernel negatively.
BCC makes eBPF programs easier to write, with kernel instrumentation in C
and a front-end in Python. It is suited for many tasks, including performance
analysis and network traffic control.
## Screenshot
This example traces a disk I/O kernel function, and populates an in-kernel
power-of-2 histogram of the I/O size. For efficiency, only the histogram
summary is returned to user-level.
```Shell
# ./bitehist.py
Tracing... Hit Ctrl-C to end.
^C
value : count distribution
0 -> 1 : 3 | |
2 -> 3 : 0 | |
4 -> 7 : 211 |********** |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 0 | |
64 -> 127 : 1 | |
128 -> 255 : 800 |**************************************|
```
The above output shows a bimodal distribution, where the largest mode of
800 I/O was between 128 and 255 Kbytes in size.
The compiler relies upon eBPF (Extended Berkeley Packet Filters), which is a
feature in Linux kernels starting from 3.15. Currently, this compiler leverages
features which are mostly available in Linux 4.1 and above.
See the source: [bitehist.c](examples/bitehist.c) and
[bitehist.py](examples/bitehist.py). What this traces, what this stores, and how
the data is presented, can be entirely customized. This shows only some of
many possible capabilities.
## Installing
See [INSTALL.md](INSTALL.md) for installation steps on your platform.
## Contents
Some of these are single files that contain both C and Python, others have a
pair of .c and .py files, and some are directories of files.
### Tracing
Examples:
- examples/[bitehist.py](examples/bitehist.py) examples/[bitehist.c](examples/bitehist.c): Block I/O size histogram. [Examples](examples/bitehist_example.txt).
- examples/[disksnoop.py](examples/disksnoop.py) examples/[disksnoop.c](examples/disksnoop.c): Trace block device I/O latency. [Examples](examples/disksnoop_example.txt).
- examples/[hello_world.py](examples/hello_world.py): Prints "Hello, World!" for new processes.
- examples/[trace_fields.py](examples/trace_fields.py): Simple example of printing fields from traced events.
- examples/[vfsreadlat.py](examples/vfsreadlat.py) examples/[vfsreadlat.c](examples/vfsreadlat.c): VFS read latency distribution. [Examples](examples/vfsreadlat_example.txt).
Tools:
- tools/[funccount](tools/funccount): Count kernel function calls. [Examples](tools/funccount_example.txt).
- tools/[pidpersec](tools/pidpersec): Count new processes (via fork). [Examples](tools/pidpersec_example.txt).
- tools/[syncsnoop](tools/syncsnoop): Trace sync() syscall. [Examples](tools/syncsnoop_example.txt).
- tools/[vfscount](tools/vfscount) tools/[vfscount.c](tools/vfscount.c): Count VFS calls. [Examples](tools/vfscount_example.txt).
- tools/[vfsstat](tools/vfsstat) tools/[vfsstat.c](tools/vfsstat.c): Count some VFS calls, with column output. [Examples](tools/vfsstat_example.txt).
### Networking
Examples:
- examples/[distributed_bridge/](examples/distributed_bridge): Distributed bridge example.
- examples/[simple_tc.py](examples/simple_tc.py): Simple traffic control example.
- examples/[simulation.py](examples/simulation.py): Simulation helper.
- examples/[tc_neighbor_sharing.py](examples/tc_neighbor_sharing.py) examples/[tc_neighbor_sharing.c](examples/tc_neighbor_sharing.c): Per-IP classification and rate limiting.
- examples/[tunnel_monitor/](examples/tunnel_monitor): Efficiently monitor traffic flows. [Example video](https://www.youtube.com/watch?v=yYy3Cwce02k).
- examples/[vlan_learning.py](examples/vlan_learning.py) examples/[vlan_learning.c](examples/vlan_learning.c): Demux Ethernet traffic into worker veth+namespaces.
## Motivation
BPF guarantees that the programs loaded into the kernel cannot crash, and
......@@ -46,11 +115,11 @@ The features of this toolkit include:
In the future, more bindings besides python will likely be supported. Feel free
to add support for the language of your choice and send a pull request!
## Examples
## Tutorial
This toolchain is currently composed of two parts: a C wrapper around LLVM, and
a Python API to interact with the running program. Later, we will go into more
detail of how this all works.
The BCC toolchain is currently composed of two parts: a C wrapper around LLVM,
and a Python API to interact with the running program. Later, we will go into
more detail of how this all works.
### Hello, World
......@@ -65,32 +134,20 @@ The BPF program always takes at least one argument, which is a pointer to the
context for this type of program. Different program types have different calling
conventions, but for this one we don't care so `void *` is fine.
```python
prog = """
int hello(void *ctx) {
bpf_trace_printk("Hello, World!\\n");
return 0;
};
"""
b = BPF(text=prog)
BPF(text='void kprobe__sys_clone(void *ctx) { bpf_trace_printk("Hello, World!\\n"); }').trace_print()
```
For this example, we will call the program every time `fork()` is called by a
userspace process. Underneath the hood, fork translates to the `clone` syscall,
so we will attach our program to the kernel symbol `sys_clone`.
```python
b.attach_kprobe(event="sys_clone", fn_name="hello")
```
userspace process. Underneath the hood, fork translates to the `clone` syscall.
BCC recognizes prefix `kprobe__`, and will auto attach our program to the kernel symbol `sys_clone`.
The python process will then print the trace printk circular buffer until ctrl-c
is pressed. The BPF program is removed from the kernel when the userspace
process that loaded it closes the fd (or exits).
```python
b.trace_print()
```
Output:
```
bcc/examples$ sudo python hello_world.py
bcc/examples$ sudo python hello_world.py
python-7282 [002] d... 3757.488508: : Hello, World!
```
......
......@@ -38,7 +38,7 @@ static unsigned int log2l(unsigned long v)
return log2(v) + 1;
}
int do_request(struct pt_regs *ctx, struct request *req)
int kprobe__blk_start_request(struct pt_regs *ctx, struct request *req)
{
int index = log2l(req->__data_len / 1024);
u64 *leaf = dist.lookup(&index);
......
......@@ -5,8 +5,6 @@
#
# Written as a basic example of using a histogram to show a distribution.
#
# USAGE: bitehist.py [interval [count]]
#
# The default interval is 5 seconds. A Ctrl-C will print the partially
# gathered histogram then exit.
#
......@@ -16,90 +14,18 @@
# 15-Aug-2015 Brendan Gregg Created this.
from bcc import BPF
from ctypes import c_ushort, c_int, c_ulonglong
from time import sleep
from sys import argv
def usage():
print("USAGE: %s [interval [count]]" % argv[0])
exit()
# arguments
interval = 5
count = -1
if len(argv) > 1:
try:
interval = int(argv[1])
if interval == 0:
raise
if len(argv) > 2:
count = int(argv[2])
except: # also catches -h, --help
usage()
# load BPF program
b = BPF(src_file = "bitehist.c")
b.attach_kprobe(event="blk_start_request", fn_name="do_request")
dist_max = 64
# header
print("Tracing... Hit Ctrl-C to end.")
# functions
stars_max = 38
def stars(val, val_max, width):
i = 0
text = ""
while (1):
if (i > (width * val / val_max) - 1) or (i > width - 1):
break
text += "*"
i += 1
if val > val_max:
text = text[:-1] + "+"
return text
def print_log2_hist(dist, val_type):
idx_max = -1
val_max = 0
for i in range(1, dist_max + 1):
try:
val = dist[c_int(i)].value
if (val > 0):
idx_max = i
if (val > val_max):
val_max = val
except:
break
if idx_max > 0:
print(" %-15s : count distribution" % val_type);
for i in range(1, idx_max + 1):
low = (1 << i) >> 1
high = (1 << i) - 1
if (low == high):
low -= 1
try:
val = dist[c_int(i)].value
print("%8d -> %-8d : %-8d |%-*s|" % (low, high, val,
stars_max, stars(val, val_max, stars_max)))
except:
break
# output
loop = 0
do_exit = 0
while (1):
if count > 0:
loop += 1
if loop > count:
exit()
try:
sleep(interval)
except KeyboardInterrupt:
pass; do_exit = 1
try:
sleep(99999999)
except KeyboardInterrupt:
print
print_log2_hist(b["dist"], "kbytes")
b["dist"].clear()
if do_exit:
exit()
b["dist"].print_log2_hist()
Demonstrations of bitehist.py, the Linux eBPF/bcc version.
This prints a power-of-2 histogram to show the block I/O size distribution.
By default, a summary is printed every five seconds:
A summary is printed after Ctrl-C is hit.
# ./bitehist.py
# ./bitehist.py
Tracing... Hit Ctrl-C to end.
kbytes : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 26 |************* |
8 -> 15 : 3 |* |
16 -> 31 : 5 |** |
32 -> 63 : 6 |*** |
64 -> 127 : 7 |*** |
128 -> 255 : 75 |**************************************|
kbytes : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 83 |**************************************|
8 -> 15 : 2 | |
16 -> 31 : 6 |** |
32 -> 63 : 6 |** |
64 -> 127 : 5 |** |
128 -> 255 : 21 |********* |
^C
kbytes : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 8 |**************************************|
The first output shows a bimodal distribution. The largest mode of 75 I/O were
between 128 and 255 Kbytes in size, and another mode of 26 I/O were between 4
and 7 Kbytes in size.
The next output summary shows the workload is doing more 4 - 7 Kbyte I/O.
The final output is partial, showing what was measured until Ctrl-C was hit.
For an output interval of one second, and three summaries:
# ./bitehist.py 1 3
Tracing... Hit Ctrl-C to end.
kbytes : count distribution
0 -> 1 : 0 | |
value : count distribution
0 -> 1 : 3 | |
2 -> 3 : 0 | |
4 -> 7 : 4 |**************************************|
kbytes : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 5 |**************************************|
4 -> 7 : 211 |********** |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 1 |******* |
kbytes : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 4 |**************************************|
32 -> 63 : 0 | |
64 -> 127 : 1 | |
128 -> 255 : 800 |**************************************|
Full usage:
This output shows a bimodal distribution. The largest mod of 800 I/O were
between 128 and 255 Kbytes in size, and another mode of 211 I/O were between
4 and 7 Kbytes in size.
# ./bitehist.py -h
USAGE: ./bitehist.py [interval [count]]
Understanding this distribution is useful for characterizing workloads and
understanding performance. The existance of this distribution is not visible
from averages alone.
......@@ -13,36 +13,23 @@
#include <uapi/linux/ptrace.h>
#include <linux/blkdev.h>
struct key_t {
struct request *req;
};
BPF_TABLE("hash", struct key_t, u64, start, 10240);
int do_request(struct pt_regs *ctx, struct request *req) {
struct key_t key = {};
u64 ts;
BPF_HASH(start, struct request *);
void kprobe__blk_start_request(struct pt_regs *ctx, struct request *req) {
// stash start timestamp by request ptr
ts = bpf_ktime_get_ns();
key.req = req;
start.update(&key, &ts);
u64 ts = bpf_ktime_get_ns();
return 0;
start.update(&req, &ts);
}
int do_completion(struct pt_regs *ctx, struct request *req) {
struct key_t key = {};
void kprobe__blk_update_request(struct pt_regs *ctx, struct request *req) {
u64 *tsp, delta;
key.req = req;
tsp = start.lookup(&key);
tsp = start.lookup(&req);
if (tsp != 0) {
delta = bpf_ktime_get_ns() - *tsp;
bpf_trace_printk("%d %x %d\n", req->__data_len,
req->cmd_flags, delta / 1000);
start.delete(&key);
start.delete(&req);
}
return 0;
}
......@@ -17,8 +17,6 @@ REQ_WRITE = 1 # from include/linux/blk_types.h
# load BPF program
b = BPF(src_file="disksnoop.c")
b.attach_kprobe(event="blk_start_request", fn_name="do_request")
b.attach_kprobe(event="blk_update_request", fn_name="do_completion")
# header
print("%-18s %-2s %-7s %8s" % ("TIME(s)", "T", "BYTES", "LAT(ms)"))
......
......@@ -4,15 +4,8 @@
# run in project examples directory with:
# sudo ./hello_world.py"
# see trace_fields.py for a longer example
from bcc import BPF
prog = """
int hello(void *ctx) {
bpf_trace_printk("Hello, World!\\n");
return 0;
}
"""
b = BPF(text=prog)
b.attach_kprobe(event="sys_clone", fn_name="hello")
b.trace_print(fmt="{1} {5}")
BPF(text='void kprobe__sys_clone(void *ctx) { bpf_trace_printk("Hello, World!\\n"); }').trace_print()
#!/usr/bin/env python
# Copyright (c) PLUMgrid, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
# This is an example of tracing an event and printing custom fields.
# run in project examples directory with:
# sudo ./trace_fields.py"
from bcc import BPF
prog = """
int hello(void *ctx) {
bpf_trace_printk("Hello, World!\\n");
return 0;
}
"""
b = BPF(text=prog)
b.attach_kprobe(event="sys_clone", fn_name="hello")
print "PID MESSAGE"
b.trace_print(fmt="{1} {5}")
......@@ -13,11 +13,7 @@
#include <uapi/linux/ptrace.h>
struct key_t {
u32 pid;
};
BPF_TABLE("hash", struct key_t, u64, start, 10240);
BPF_HASH(start, u32);
BPF_TABLE("array", int, u64, dist, 64);
static unsigned int log2(unsigned int v)
......@@ -44,29 +40,29 @@ static unsigned int log2l(unsigned long v)
int do_entry(struct pt_regs *ctx)
{
struct key_t key = {};
u64 ts, *val, zero = 0;
u32 pid;
u64 ts, *val;
key.pid = bpf_get_current_pid_tgid();
pid = bpf_get_current_pid_tgid();
ts = bpf_ktime_get_ns();
start.update(&key, &ts);
start.update(&pid, &ts);
return 0;
}
int do_return(struct pt_regs *ctx)
{
struct key_t key = {};
u32 pid;
u64 *tsp, delta;
key.pid = bpf_get_current_pid_tgid();
tsp = start.lookup(&key);
pid = bpf_get_current_pid_tgid();
tsp = start.lookup(&pid);
if (tsp != 0) {
delta = bpf_ktime_get_ns() - *tsp;
int index = log2l(delta / 1000);
u64 *leaf = dist.lookup(&index);
if (leaf) (*leaf)++;
start.delete(&key);
start.delete(&pid);
}
return 0;
......
......@@ -41,51 +41,10 @@ if len(argv) > 1:
b = BPF(src_file = "vfsreadlat.c")
b.attach_kprobe(event="vfs_read", fn_name="do_entry")
b.attach_kretprobe(event="vfs_read", fn_name="do_return")
dist_max = 64
# header
print("Tracing... Hit Ctrl-C to end.")
# functions
stars_max = 38
def stars(val, val_max, width):
i = 0
text = ""
while (1):
if (i > (width * val / val_max) - 1) or (i > width - 1):
break
text += "*"
i += 1
if val > val_max:
text = text[:-1] + "+"
return text
def print_log2_hist(dist, val_type):
idx_max = -1
val_max = 0
for i in range(1, dist_max + 1):
try:
val = dist[c_int(i)].value
if (val > 0):
idx_max = i
if (val > val_max):
val_max = val
except:
break
if idx_max > 0:
print(" %-15s : count distribution" % val_type);
for i in range(1, idx_max + 1):
low = (1 << i) >> 1
high = (1 << i) - 1
if (low == high):
low -= 1
try:
val = dist[c_int(i)].value
print("%8d -> %-8d : %-8d |%-*s|" % (low, high, val,
stars_max, stars(val, val_max, stars_max)))
except:
break
# output
loop = 0
do_exit = 0
......@@ -100,7 +59,7 @@ while (1):
pass; do_exit = 1
print
print_log2_hist(b["dist"], "usecs")
b["dist"].print_log2_hist("usecs")
b["dist"].clear()
if do_exit:
exit()
.TH funccount 8 "2015-08-18" "USER COMMANDS"
.SH NAME
funccount \- Count kernel function calls matching a pattern. Uses Linux eBPF/bcc.
.SH SYNOPSIS
.B funccount [\-h] [\-p PID] [\-i INTERVAL] [\-T] [\-r] pattern
.SH DESCRIPTION
This tool is a quick way to determine which kernel functions are being called,
and at what rate. It uses in-kernel eBPF maps to count function calls.
WARNING: This uses dynamic tracing of (what can be many) kernel functions, an
activity that has had issues on some kernel versions (risk of panics or
freezes). Test, and know what you are doing, before use.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bcc.
.SH OPTIONS
pattern
Search pattern. Supports "*" wildcards. See EXAMPLES. You can also use \-r for regular expressions.
\-h
Print usage message.
.TP
\-p PID
Trace this process ID only.
.TP
\-i INTERVAL
Print output every interval seconds.
.TP
\-T
Include timestamps on output.
.TP
\-r
Use regular expressions for the search pattern.
.SH EXAMPLES
.TP
Count kernel functions beginning with "vfs_", until Ctrl-C is hit:
#
.B funccount 'vfs_*'
.TP
Count kernel functions beginning with "tcp_send", until Ctrl-C is hit:
#
.B funccount 'tcp_send*'
.TP
Print kernel functions beginning with "vfs_", every second:
#
.B funccount \-i 1 'vfs_*'
.TP
Match kernel functions beginning with "vfs_", using regular expressions:
#
.B funccount \-r '^vfs_.*'
.TP
Count vfs calls for process ID 181 only:
#
.B funccount \-p 181 'vfs_*'
.SH FIELDS
.TP
ADDR
Address of the instruction pointer that was traced (only useful if the FUNC column is suspicious and you would like to double check the translation).
.TP
FUNC
Kernel function name
.TP
COUNT
Number of calls while tracing
.SH OVERHEAD
This traces kernel functions and maintains in-kernel counts, which
are asynchronously copied to user-space. While the rate of VFS operations can
be very high (>1M/sec), this is a relatively efficient way to trace these
events, and so the overhead is expected to be small for normal workloads.
Measure in a test environment, and if overheads are an issue, edit the script
to reduce the types of vfs functions traced (currently all beginning with
"vfs_").
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.PP
Also look in the bcc distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
vfscount(8)
......@@ -36,7 +36,7 @@ endif()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DBCC_INSTALL_PREFIX='\"${CMAKE_INSTALL_PREFIX}\"'")
add_library(bcc SHARED bpf_common.cc bpf_module.cc libbpf.c)
set_target_properties(bcc PROPERTIES VERSION ${REVISION} SOVERSION 0)
set_target_properties(bcc PROPERTIES VERSION ${REVISION_LAST} SOVERSION 0)
# BPF is still experimental otherwise it should be available
#llvm_map_components_to_libnames(llvm_libs bpf mcjit irreader passes)
......
......@@ -295,7 +295,7 @@ unique_ptr<ExecutionEngine> BPFModule::finalize_rw(unique_ptr<Module> m) {
// load an entire c file as a module
int BPFModule::load_cfile(const string &file, bool in_memory) {
clang_loader_ = make_unique<ClangLoader>(&*ctx_);
clang_loader_ = make_unique<ClangLoader>(&*ctx_, flags_);
if (clang_loader_->parse(&mod_, &tables_, file, in_memory))
return -1;
return 0;
......@@ -307,7 +307,7 @@ int BPFModule::load_cfile(const string &file, bool in_memory) {
// Load in a pre-built list of functions into the initial Module object, then
// build an ExecutionEngine.
int BPFModule::load_includes(const string &tmpfile) {
clang_loader_ = make_unique<ClangLoader>(&*ctx_);
clang_loader_ = make_unique<ClangLoader>(&*ctx_, flags_);
if (clang_loader_->parse(&mod_, &tables_, tmpfile, false))
return -1;
return 0;
......@@ -652,7 +652,7 @@ int BPFModule::load_b(const string &filename, const string &proto_filename) {
if (int rc = load_includes(BCC_INSTALL_PREFIX "/share/bcc/include/bcc/helpers.h"))
return rc;
b_loader_.reset(new BLoader);
b_loader_.reset(new BLoader(flags_));
if (int rc = b_loader_->parse(&*mod_, filename, proto_filename, &tables_))
return rc;
if (int rc = annotate())
......
......@@ -41,6 +41,20 @@ struct _name##_table_t { \
__attribute__((section("maps/" _table_type))) \
struct _name##_table_t _name
#define BPF_HASH1(_name) \
BPF_TABLE("hash", u64, u64, _name, 10240)
#define BPF_HASH2(_name, _key_type) \
BPF_TABLE("hash", _key_type, u64, _name, 10240)
#define BPF_HASH3(_name, _key_type, _leaf_type) \
BPF_TABLE("hash", _key_type, _leaf_type, _name, 10240)
// helper for default-variable macro function
#define BPF_HASHX(_1, _2, _3, NAME, ...) NAME
// Define a hash function, some arguments optional
// BPF_HASH(name, key_type=u64, leaf_type=u64, size=10240)
#define BPF_HASH(...) \
BPF_HASHX(__VA_ARGS__, BPF_HASH3, BPF_HASH2, BPF_HASH1)(__VA_ARGS__)
// packet parsing state machine helpers
#define cursor_advance(_cursor, _len) \
({ void *_tmp = _cursor; _cursor += _len; _tmp; })
......
......@@ -27,7 +27,8 @@ using std::vector;
namespace ebpf {
BLoader::BLoader() {
BLoader::BLoader(unsigned flags) : flags_(flags) {
(void)flags_;
}
BLoader::~BLoader() {
......
......@@ -35,11 +35,12 @@ class CodegenLLVM;
class BLoader {
public:
BLoader();
explicit BLoader(unsigned flags);
~BLoader();
int parse(llvm::Module *mod, const std::string &filename, const std::string &proto_filename,
std::unique_ptr<std::vector<TableDesc>> *tables);
private:
unsigned flags_;
std::unique_ptr<cc::Parser> parser_;
std::unique_ptr<cc::Parser> proto_parser_;
std::unique_ptr<cc::CodegenLLVM> codegen_;
......
......@@ -102,6 +102,7 @@ bool BTypeVisitor::VisitFunctionDecl(FunctionDecl *D) {
rewriter_.InsertText(D->getLocStart(), attr);
// remember the arg names of the current function...first one is the ctx
fn_args_.clear();
string preamble = "{";
for (auto arg : D->params()) {
if (arg->getName() == "") {
C.getDiagnostics().Report(arg->getLocEnd(), diag::err_expected)
......@@ -109,7 +110,15 @@ bool BTypeVisitor::VisitFunctionDecl(FunctionDecl *D) {
return false;
}
fn_args_.push_back(arg);
if (fn_args_.size() > 1) {
size_t d = fn_args_.size() - 2;
const char *reg = calling_conv_regs[d];
preamble += arg->getName().str() + " = " + fn_args_[0]->getName().str() + "->" + string(reg) + ";";
}
}
// for each trace argument, convert the variable from ptregs to something on stack
if (CompoundStmt *S = dyn_cast<CompoundStmt>(D->getBody()))
rewriter_.ReplaceText(S->getLBracLoc(), 1, preamble);
}
return true;
}
......@@ -268,7 +277,7 @@ bool BTypeVisitor::VisitMemberExpr(MemberExpr *E) {
string base_type = Ref->getType()->getPointeeType().getAsString();
string pre, post;
pre = "({ " + E->getType().getAsString() + " _val; memset(&_val, 0, sizeof(_val));";
pre += " bpf_probe_read(&_val, sizeof(_val), ";
pre += " bpf_probe_read(&_val, sizeof(_val), (u64)";
post = " + offsetof(" + base_type + ", " + F->getName().str() + ")";
post += "); _val; })";
rewriter_.InsertText(E->getLocStart(), pre);
......@@ -278,23 +287,6 @@ bool BTypeVisitor::VisitMemberExpr(MemberExpr *E) {
return true;
}
bool BTypeVisitor::VisitDeclRefExpr(DeclRefExpr *E) {
auto it = std::find(fn_args_.begin() + 1, fn_args_.end(), E->getDecl());
if (it != fn_args_.end()) {
if (!rewriter_.isRewritable(E->getLocStart())) {
C.getDiagnostics().Report(E->getLocStart(), diag::err_expected)
<< "use of probe argument not in a macro";
return false;
}
size_t d = std::distance(fn_args_.begin() + 1, it);
const char *reg = calling_conv_regs[d];
string text = "((u64)" + fn_args_[0]->getName().str() + "->" + string(reg) + ")";
rewriter_.ReplaceText(SourceRange(E->getLocStart(), E->getLocEnd()), text);
return true;
}
return true;
}
bool BTypeVisitor::VisitBinaryOperator(BinaryOperator *E) {
if (!E->isAssignmentOp())
return true;
......@@ -442,13 +434,14 @@ bool BTypeConsumer::HandleTopLevelDecl(DeclGroupRef D) {
return true;
}
BFrontendAction::BFrontendAction(llvm::raw_ostream &os)
: rewriter_(new Rewriter), os_(os), tables_(new vector<TableDesc>) {
BFrontendAction::BFrontendAction(llvm::raw_ostream &os, unsigned flags)
: os_(os), flags_(flags), rewriter_(new Rewriter), tables_(new vector<TableDesc>) {
}
void BFrontendAction::EndSourceFileAction() {
// uncomment to see rewritten source
//rewriter_->getEditBuffer(rewriter_->getSourceMgr().getMainFileID()).write(llvm::errs());
if (flags_ & 0x4)
rewriter_->getEditBuffer(rewriter_->getSourceMgr().getMainFileID()).write(llvm::errs());
rewriter_->getEditBuffer(rewriter_->getSourceMgr().getMainFileID()).write(os_);
os_.flush();
}
......
......@@ -67,7 +67,6 @@ class BTypeVisitor : public clang::RecursiveASTVisitor<BTypeVisitor> {
bool VisitCallExpr(clang::CallExpr *Call);
bool VisitVarDecl(clang::VarDecl *Decl);
bool VisitMemberExpr(clang::MemberExpr *E);
bool VisitDeclRefExpr(clang::DeclRefExpr *E);
bool VisitBinaryOperator(clang::BinaryOperator *E);
bool VisitImplicitCastExpr(clang::ImplicitCastExpr *E);
......@@ -96,7 +95,7 @@ class BFrontendAction : public clang::ASTFrontendAction {
public:
// Initialize with the output stream where the new source file contents
// should be written.
explicit BFrontendAction(llvm::raw_ostream &os);
BFrontendAction(llvm::raw_ostream &os, unsigned flags);
// Called by clang when the AST has been completed, here the output stream
// will be flushed.
......@@ -108,8 +107,9 @@ class BFrontendAction : public clang::ASTFrontendAction {
// take ownership of the table-to-fd mapping data structure
std::unique_ptr<std::vector<TableDesc>> take_tables() { return move(tables_); }
private:
std::unique_ptr<clang::Rewriter> rewriter_;
llvm::raw_ostream &os_;
unsigned flags_;
std::unique_ptr<clang::Rewriter> rewriter_;
std::unique_ptr<std::vector<TableDesc>> tables_;
};
......
......@@ -58,8 +58,8 @@ using std::vector;
namespace ebpf {
ClangLoader::ClangLoader(llvm::LLVMContext *ctx)
: ctx_(ctx)
ClangLoader::ClangLoader(llvm::LLVMContext *ctx, unsigned flags)
: ctx_(ctx), flags_(flags)
{}
ClangLoader::~ClangLoader() {}
......@@ -138,6 +138,13 @@ int ClangLoader::parse(unique_ptr<llvm::Module> *mod, unique_ptr<vector<TableDes
// Initialize a compiler invocation object from the clang (-cc1) arguments.
const driver::ArgStringList &ccargs = cmd.getArguments();
if (flags_ & 0x4) {
llvm::errs() << "clang";
for (auto arg : ccargs)
llvm::errs() << " " << arg;
llvm::errs() << "\n";
}
// first pass
auto invocation1 = make_unique<CompilerInvocation>();
if (!CompilerInvocation::CreateFromArgs(*invocation1, const_cast<const char **>(ccargs.data()),
......@@ -159,7 +166,7 @@ int ClangLoader::parse(unique_ptr<llvm::Module> *mod, unique_ptr<vector<TableDes
// capture the rewritten c file
string out_str;
llvm::raw_string_ostream os(out_str);
BFrontendAction bact(os);
BFrontendAction bact(os, flags_);
if (!compiler1.ExecuteAction(bact))
return -1;
// this contains the open FDs
......
......@@ -36,12 +36,13 @@ class CodegenLLVM;
class ClangLoader {
public:
explicit ClangLoader(llvm::LLVMContext *ctx);
explicit ClangLoader(llvm::LLVMContext *ctx, unsigned flags);
~ClangLoader();
int parse(std::unique_ptr<llvm::Module> *mod, std::unique_ptr<std::vector<TableDesc>> *tables,
const std::string &file, bool in_memory);
private:
llvm::LLVMContext *ctx_;
unsigned flags_;
};
} // namespace ebpf
......@@ -239,7 +239,9 @@ int bpf_attach_kprobe(int progfd, const char *event,
}
if (write(kfd, event_desc, strlen(event_desc)) < 0) {
perror("write(kprobe_events)");
fprintf(stderr, "write of \"%s\" into kprobe_events failed: %s\n", event_desc, strerror(errno));
if (errno == EINVAL)
fprintf(stderr, "check dmesg output for possible cause\n");
goto cleanup;
}
......
......@@ -38,6 +38,10 @@ lib.bpf_module_license.restype = ct.c_char_p
lib.bpf_module_license.argtypes = [ct.c_void_p]
lib.bpf_module_kern_version.restype = ct.c_uint
lib.bpf_module_kern_version.argtypes = [ct.c_void_p]
lib.bpf_num_functions.restype = ct.c_ulonglong
lib.bpf_num_functions.argtypes = [ct.c_void_p]
lib.bpf_function_name.restype = ct.c_char_p
lib.bpf_function_name.argtypes = [ct.c_void_p, ct.c_ulonglong]
lib.bpf_function_start.restype = ct.c_void_p
lib.bpf_function_start.argtypes = [ct.c_void_p, ct.c_char_p]
lib.bpf_function_size.restype = ct.c_size_t
......@@ -95,6 +99,7 @@ KALLSYMS = "/proc/kallsyms"
ksym_addrs = []
ksym_names = []
ksym_loaded = 0
stars_max = 38
@atexit.register
def cleanup_kprobes():
......@@ -220,6 +225,52 @@ class BPF(object):
else:
super(BPF.Table, self).clear()
@staticmethod
def _stars(val, val_max, width):
i = 0
text = ""
while (1):
if (i > (width * val / val_max) - 1) or (i > width - 1):
break
text += "*"
i += 1
if val > val_max:
text = text[:-1] + "+"
return text
def print_log2_hist(self, val_type="value"):
"""print_log2_hist(type=value)
Prints a table as a log2 histogram. The table must be stored as
log2. The type argument is optional, and is a column header.
"""
global stars_max
log2_dist_max = 64
idx_max = -1
val_max = 0
for i in range(1, log2_dist_max + 1):
try:
val = self[ct.c_int(i)].value
if (val > 0):
idx_max = i
if (val > val_max):
val_max = val
except:
break
if idx_max > 0:
print(" %-15s : count distribution" % val_type);
for i in range(1, idx_max + 1):
low = (1 << i) >> 1
high = (1 << i) - 1
if (low == high):
low -= 1
try:
val = self[ct.c_int(i)].value
print("%8d -> %-8d : %-8d |%-*s|" % (low, high, val,
stars_max, self._stars(val, val_max, stars_max)))
except:
break
def __iter__(self):
return BPF.Table.Iter(self, self.Key)
......@@ -297,6 +348,23 @@ class BPF(object):
if self.module == None:
raise Exception("Failed to compile BPF module %s" % src_file)
# If any "kprobe__" prefixed functions were defined, they will be
# loaded and attached here.
self._trace_autoload()
def load_funcs(self, prog_type=KPROBE):
"""load_funcs(prog_type=KPROBE)
Load all functions in this BPF module with the given type.
Returns a list of the function handles."""
fns = []
for i in range(0, lib.bpf_num_functions(self.module)):
func_name = lib.bpf_function_name(self.module, i).decode()
fns.append(self.load_func(func_name, prog_type))
return fns
def load_func(self, func_name, prog_type):
if func_name in self.funcs:
return self.funcs[func_name]
......@@ -409,7 +477,10 @@ class BPF(object):
p = Popen(["awk", "$1 ~ /%s/ { print $1 }" % event_re,
"%s/available_filter_functions" % TRACEFS], stdout=PIPE)
lines = p.communicate()[0].decode().split()
return [line.rstrip() for line in lines if line != "\n"]
with open("%s/../kprobes/blacklist" % TRACEFS) as f:
blacklist = [line.split()[1] for line in f.readlines()]
return [line.rstrip() for line in lines if
(line != "\n" and line not in blacklist)]
def attach_kprobe(self, event="", fn_name="", event_re="",
pid=0, cpu=-1, group_fd=-1):
......@@ -417,12 +488,15 @@ class BPF(object):
# allow the caller to glob multiple functions together
if event_re:
for line in BPF._get_kprobe_functions(event_re):
self.attach_kprobe(event=line, fn_name=fn_name, pid=pid,
cpu=cpu, group_fd=group_fd)
try:
self.attach_kprobe(event=line, fn_name=fn_name, pid=pid,
cpu=cpu, group_fd=group_fd)
except:
pass
return
fn = self.load_func(fn_name, BPF.KPROBE)
ev_name = "p_" + event.replace("+", "_")
ev_name = "p_" + event.replace("+", "_").replace(".", "_")
desc = "p:kprobes/%s %s" % (ev_name, event)
res = lib.bpf_attach_kprobe(fn.fd, ev_name.encode("ascii"),
desc.encode("ascii"), pid, cpu, group_fd)
......@@ -433,7 +507,7 @@ class BPF(object):
@staticmethod
def detach_kprobe(event):
ev_name = "p_" + event.replace("+", "_")
ev_name = "p_" + event.replace("+", "_").replace(".", "_")
if ev_name not in open_kprobes:
raise Exception("Kprobe %s is not attached" % event)
os.close(open_kprobes[ev_name])
......@@ -449,12 +523,15 @@ class BPF(object):
# allow the caller to glob multiple functions together
if event_re:
for line in BPF._get_kprobe_functions(event_re):
self.attach_kretprobe(event=line, fn_name=fn_name, pid=pid,
cpu=cpu, group_fd=group_fd)
try:
self.attach_kretprobe(event=line, fn_name=fn_name, pid=pid,
cpu=cpu, group_fd=group_fd)
except:
pass
return
fn = self.load_func(fn_name, BPF.KPROBE)
ev_name = "r_" + event.replace("+", "_")
ev_name = "r_" + event.replace("+", "_").replace(".", "_")
desc = "r:kprobes/%s %s" % (ev_name, event)
res = lib.bpf_attach_kprobe(fn.fd, ev_name.encode("ascii"),
desc.encode("ascii"), pid, cpu, group_fd)
......@@ -465,7 +542,7 @@ class BPF(object):
@staticmethod
def detach_kretprobe(event):
ev_name = "r_" + event.replace("+", "_")
ev_name = "r_" + event.replace("+", "_").replace(".", "_")
if ev_name not in open_kprobes:
raise Exception("Kretprobe %s is not attached" % event)
os.close(open_kprobes[ev_name])
......@@ -475,13 +552,24 @@ class BPF(object):
raise Exception("Failed to detach BPF from kprobe")
del open_kprobes[ev_name]
@staticmethod
def trace_open(nonblocking=False):
def _trace_autoload(self):
# Cater to one-liner case where attach_kprobe is omitted and C function
# name matches that of the kprobe.
if len(open_kprobes) == 0:
for i in range(0, lib.bpf_num_functions(self.module)):
func_name = lib.bpf_function_name(self.module, i).decode()
if func_name.startswith("kprobe__"):
fn = self.load_func(func_name, BPF.KPROBE)
self.attach_kprobe(event=fn.name[8:], fn_name=fn.name)
elif func_name.startswith("kretprobe__"):
fn = self.load_func(func_name, BPF.KPROBE)
self.attach_kretprobe(event=fn.name[11:], fn_name=fn.name)
def trace_open(self, nonblocking=False):
"""trace_open(nonblocking=False)
Open the trace_pipe if not already open
"""
global tracefile
if not tracefile:
tracefile = open("%s/trace_pipe" % TRACEFS)
......@@ -491,18 +579,18 @@ class BPF(object):
fcntl.fcntl(fd, fcntl.F_SETFL, fl | os.O_NONBLOCK)
return tracefile
@staticmethod
def trace_fields(nonblocking=False):
def trace_fields(self, nonblocking=False):
"""trace_fields(nonblocking=False)
Read from the kernel debug trace pipe and return a tuple of the
fields (task, pid, cpu, flags, timestamp, msg) or None if no
line was read (nonblocking=True)
"""
line = BPF.trace_readline(nonblocking)
if line:
while True:
line = self.trace_readline(nonblocking)
if not line and nonblocking: return (None,) * 6
# don't print messages related to lost events
if line.startswith("CPU:"): return
if line.startswith("CPU:"): continue
task = line[:16].lstrip()
line = line[17:]
ts_end = line.find(":")
......@@ -510,17 +598,15 @@ class BPF(object):
cpu = cpu[1:-1]
msg = line[ts_end + 4:]
return (task, int(pid), int(cpu), flags, float(ts), msg)
return
@staticmethod
def trace_readline(nonblocking=False):
def trace_readline(self, nonblocking=False):
"""trace_readline(nonblocking=False)
Read from the kernel debug trace pipe and return one line
If nonblocking is False, this will block until ctrl-C is pressed.
"""
trace = BPF.trace_open(nonblocking)
trace = self.trace_open(nonblocking)
line = None
try:
......@@ -531,9 +617,8 @@ class BPF(object):
exit()
return line
@staticmethod
def trace_print(fmt=None):
"""trace_print(fmt=None)
def trace_print(self, fmt=None):
"""trace_print(self, fmt=None)
Read from the kernel debug trace pipe and print on stdout.
If fmt is specified, apply as a format string to the output. See
......@@ -543,11 +628,11 @@ class BPF(object):
while True:
if fmt:
fields = BPF.trace_fields(nonblocking=False)
fields = self.trace_fields(nonblocking=False)
if not fields: continue
line = fmt.format(*fields)
else:
line = BPF.trace_readline(nonblocking=False)
line = self.trace_readline(nonblocking=False)
print(line)
sys.stdout.flush()
......
......@@ -46,6 +46,28 @@ int count_foo(struct pt_regs *ctx, unsigned long a, unsigned long b) {
b = BPF(text=text, debug=0)
fn = b.load_func("count_foo", BPF.KPROBE)
def test_probe_read_keys(self):
text = """
#include <uapi/linux/ptrace.h>
#include <linux/blkdev.h>
BPF_HASH(start, struct request *);
int do_request(struct pt_regs *ctx, struct request *req) {
u64 ts = bpf_ktime_get_ns();
start.update(&req, &ts);
return 0;
}
int do_completion(struct pt_regs *ctx, struct request *req) {
u64 *tsp = start.lookup(&req);
if (tsp != 0) {
start.delete(&req);
}
return 0;
}
"""
b = BPF(text=text, debug=0)
fns = b.load_funcs(BPF.KPROBE)
def test_sscanf(self):
text = """
BPF_TABLE("hash", int, struct { u64 a; u64 b; u64 c:36; u64 d:28; struct { u32 a; u32 b; } s; }, stats, 10);
......@@ -100,5 +122,13 @@ int do_request(struct pt_regs *ctx, int req) {
b = BPF(text=text, debug=0)
fn = b.load_func("do_request", BPF.KPROBE)
def test_bpf_hash(self):
text = """
BPF_HASH(table1);
BPF_HASH(table2, u32);
BPF_HASH(table3, u32, int);
"""
b = BPF(text=text, debug=0)
if __name__ == "__main__":
main()
......@@ -30,5 +30,12 @@ class TestKprobeRgx(TestCase):
k2 = self.b["stats"].Key(2)
self.assertEqual(self.b["stats"][k1].val, self.b["stats"][k2].val + 1)
class TestKprobeReplace(TestCase):
def setUp(self):
self.b = BPF(text="int empty(void *ctx) { return 0; }")
def test_periods(self):
self.b.attach_kprobe(event_re="^tcp_enter_cwr.*", fn_name="empty")
if __name__ == "__main__":
main()
#!/usr/bin/python
#
# funccount Count kernel function calls.
# For Linux, uses BCC, eBPF. See .c file.
#
# USAGE: funccount [-h] [-p PID] [-i INTERVAL] [-T] [-r] pattern
#
# The pattern is a string with optional '*' wildcards, similar to file globbing.
# If you'd prefer to use regular expressions, use the -r option.
#
# Copyright (c) 2015 Brendan Gregg.
# Licensed under the Apache License, Version 2.0 (the "License")
#
# 09-Sep-2015 Brendan Gregg Created this.
from __future__ import print_function
from bcc import BPF
from time import sleep, strftime
import argparse
import re
import signal
# arguments
examples = """examples:
./funccount 'vfs_*' # count kernel functions starting with "vfs"
./funccount 'tcp_send*' # count kernel funcs starting with "tcp_send"
./funccount -r '^vfs.*' # same as above, using regular expressions
./funccount -Ti 5 'vfs_*' # output every 5 seconds, with timestamps
./funccount -p 185 'vfs_*' # count vfs calls for PID 181 only
"""
parser = argparse.ArgumentParser(
description="Count kernel function calls",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=examples)
parser.add_argument("-p", "--pid",
help="trace this PID only")
parser.add_argument("-i", "--interval", default=99999999,
help="summary interval, seconds")
parser.add_argument("-T", "--timestamp", action="store_true",
help="include timestamp on output")
parser.add_argument("-r", "--regexp", action="store_true",
help="use regular expressions. Default is \"*\" wildcards only.")
parser.add_argument("pattern",
help="search expression for kernel functions")
args = parser.parse_args()
pattern = args.pattern
if not args.regexp:
pattern = pattern.replace('*', '.*')
pattern = '^' + pattern + '$'
debug = 0
# signal handler
def signal_ignore(signal, frame):
print()
# load BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
struct key_t {
u64 ip;
};
BPF_HASH(counts, struct key_t);
int trace_count(struct pt_regs *ctx) {
FILTER_START
struct key_t key = {};
u64 zero = 0, *val;
key.ip = ctx->ip;
val = counts.lookup_or_init(&key, &zero);
(*val)++;
FILTER_DONE
return 0;
}
"""
if args.pid:
bpf_text = bpf_text.replace('FILTER_START',
('u32 pid; pid = bpf_get_current_pid_tgid(); ' +
'if (pid == %s) {') % (args.pid))
bpf_text = bpf_text.replace('FILTER_DONE', '}')
else:
bpf_text = bpf_text.replace('FILTER_START', '')
bpf_text = bpf_text.replace('FILTER_DONE', '')
if debug:
print(bpf_text)
b = BPF(text=bpf_text)
b.attach_kprobe(event_re=pattern, fn_name="trace_count")
# header
print("Tracing... Ctrl-C to end.")
# output
exiting = 0 if args.interval else 1
while (1):
try:
sleep(int(args.interval))
except KeyboardInterrupt:
exiting=1
# as cleanup can take many seconds, trap Ctrl-C:
signal.signal(signal.SIGINT, signal_ignore)
print()
if args.timestamp:
print("%-8s\n" % strftime("%H:%M:%S"), end="")
print("%-16s %-26s %8s" % ("ADDR", "FUNC", "COUNT"))
counts = b.get_table("counts")
for k, v in sorted(counts.items(), key=lambda counts: counts[1].value):
print("%-16x %-26s %8d" % (k.ip, b.ksym(k.ip), v.value))
counts.clear()
if exiting:
print("Detaching...")
exit()
Demonstrations of funccount, the Linux eBPF/bcc version.
This program traces kernel functions that match a specified pattern, and when
Ctrl-C is hit prints a summary of their count while tracing. Eg, tracing all
functions that begin with "vfs_":
# ./funccount 'vfs_*'
Tracing... Ctrl-C to end.
^C
ADDR FUNC COUNT
ffffffff811efe81 vfs_create 1
ffffffff811f24a1 vfs_rename 1
ffffffff81215191 vfs_fsync_range 2
ffffffff81231df1 vfs_lock_file 30
ffffffff811e8dd1 vfs_fstatat 152
ffffffff811e8d71 vfs_fstat 154
ffffffff811e4381 vfs_write 166
ffffffff811e8c71 vfs_getattr_nosec 262
ffffffff811e8d41 vfs_getattr 262
ffffffff811e3221 vfs_open 264
ffffffff811e4251 vfs_read 470
Detaching...
The above output shows that while tracing the vfs_read() function was called 470
times, and vfs_open() 264 times, etc.
This is useful for exploring kernel code, to figure out which functions are in
use and which are not. This can narrow down an investigation to just a few
functions, whose counts are similar to the workload investigated.
Tracing all tcp functions:
# ./funccount 'tcp_*'
Tracing... Ctrl-C to end.
^C
ADDR FUNC COUNT
ffffffff816baf51 tcp_try_undo_recovery 1
ffffffff816cc431 tcp_twsk_destructor 1
ffffffff816bac51 tcp_enter_recovery 1
ffffffff816c6421 tcp_xmit_retransmit_queue 1
ffffffff816b95a1 tcp_update_scoreboard 1
ffffffff816b8921 tcp_verify_retransmit_hint 1
ffffffff816c4dd1 tcp_tsq_handler.part.31 1
ffffffff816bc721 tcp_sacktag_write_queue 1
ffffffff816b8eb1 tcp_match_skb_to_sack 1
ffffffff816cd4b1 tcp_time_wait 1
ffffffff816b8c91 tcp_mark_head_lost 1
ffffffff816b8a71 tcp_init_cwnd_reduction 1
ffffffff816b90e1 tcp_sacktag_one 1
ffffffff816ba7e1 tcp_sacktag_walk 1
ffffffff816c6321 tcp_retransmit_skb 1
ffffffff816c4ec1 tcp_tasklet_func 1
ffffffff816bed01 tcp_resume_early_retransmit 1
ffffffff816b9351 tcp_dsack_set 1
ffffffff816ca181 tcp_v4_syn_recv_sock 2
ffffffff816cd3d1 tcp_ca_openreq_child 2
ffffffff816cfa91 tcp_try_fastopen 2
ffffffff816cd221 tcp_openreq_init_rwin 2
ffffffff816c8931 tcp_v4_init_req 2
ffffffff816cc461 tcp_create_openreq_child 2
ffffffff816cb841 tcp_v4_send_synack 2
ffffffff816c8121 tcp_v4_init_sequence 2
ffffffff816c2ab1 tcp_fragment 2
ffffffff816c9421 tcp_v4_conn_request 2
ffffffff816b99e1 tcp_conn_request 2
ffffffff816c88f1 tcp_v4_route_req 2
ffffffff816c1ea1 tcp_fragment_tstamp 2
ffffffff816b9511 tcp_try_keep_open 2
ffffffff816c8221 tcp_v4_reqsk_destructor 2
ffffffff816c30e1 tcp_may_send_now 2
ffffffff816c24e1 tcp_make_synack 2
ffffffff816cc8f1 tcp_child_process 2
ffffffff816cc9d1 tcp_check_req 2
ffffffff816bbaf1 tcp_fastretrans_alert 2
ffffffff816c8071 tcp_set_keepalive 2
ffffffff816c0cd1 tcp_finish_connect 3
ffffffff816c1e11 tcp_connect_queue_skb 3
ffffffff816c9c51 tcp_v4_connect 3
ffffffff816b3911 tcp_init_sock 3
ffffffff816c9051 tcp_v4_init_sock 3
ffffffff816c5111 tcp_connect 3
ffffffff816b94e1 tcp_any_retrans_done.part.35 3
ffffffff816be881 tcp_clear_retrans 3
ffffffff816b6f21 tcp_setsockopt 4
ffffffff816cf321 tcp_update_metrics 5
ffffffff816b4ee1 tcp_done 5
ffffffff816b8831 tcp_initialize_rcv_mss 5
ffffffff816b8c01 tcp_sndbuf_expand 5
ffffffff816bb921 tcp_fin 5
ffffffff816c7151 tcp_init_xmit_timers 5
ffffffff816b8301 tcp_close 5
ffffffff816cdd91 tcp_init_congestion_control 5
ffffffff816cf4d1 tcp_init_metrics 5
ffffffff816d02b1 tcp_gro_complete 5
ffffffff816b81c1 tcp_free_fastopen_req 5
ffffffff816ca4e1 tcp_v4_destroy_sock 5
ffffffff816cddb1 tcp_cleanup_congestion_control 5
ffffffff816c67b1 tcp_send_fin 5
ffffffff816bd8e1 tcp_init_buffer_space 5
ffffffff816be801 tcp_init_cwnd 5
ffffffff816c1c51 tcp_select_initial_window 5
ffffffff816b8201 tcp_check_oom 5
ffffffff816c2a81 tcp_default_init_rwnd 5
ffffffff816cdc71 tcp_assign_congestion_control 5
ffffffff816b54b1 tcp_getsockopt 6
ffffffff816b3b21 tcp_ioctl 6
ffffffff816c2fe1 tcp_mtup_init 8
ffffffff816b96d1 tcp_parse_options 8
ffffffff816c2f91 tcp_mss_to_mtu 8
ffffffff816bd511 tcp_try_rmem_schedule 8
ffffffff816cf051 tcp_get_metrics 10
ffffffff816ba271 tcp_try_coalesce 10
ffffffff816c0de1 tcp_rcv_state_process 14
ffffffff816c2941 tcp_sync_mss 14
ffffffff816c7e31 tcp_write_timer_handler 15
ffffffff816c8001 tcp_write_timer 16
ffffffff816bb171 tcp_grow_window.isra.27 22
ffffffff816b45b1 tcp_set_state 23
ffffffff816c5921 tcp_send_ack 37
ffffffff816c7641 tcp_delack_timer 42
ffffffff816c7471 tcp_delack_timer_handler 42
ffffffff816c01f1 tcp_validate_incoming 91
ffffffff816b44f1 tcp_prequeue_process 112
ffffffff816cb8f1 tcp_v4_early_demux 117
ffffffff816d08b1 tcp_gro_receive 146
ffffffff816bb5e1 tcp_queue_rcv 167
ffffffff816bdb91 tcp_data_queue 215
ffffffff816ba321 tcp_urg 219
ffffffff816c6c81 tcp_send_delayed_ack 257
ffffffff816b3ee1 tcp_send_mss 275
ffffffff816b3dc1 tcp_push 275
ffffffff816b76c1 tcp_sendmsg 275
ffffffff816bb2a1 tcp_event_data_recv 275
ffffffff816c1d61 tcp_nagle_check 279
ffffffff816c3f11 tcp_write_xmit 282
ffffffff816c2341 tcp_event_new_data_sent 282
ffffffff816c3061 tcp_current_mss 284
ffffffff816c1db1 tcp_init_tso_segs 284
ffffffff816c2871 tcp_wfree 286
ffffffff816c3251 tcp_schedule_loss_probe 305
ffffffff816cb821 tcp_v4_send_check 323
ffffffff816c3581 tcp_transmit_skb 323
ffffffff816b54e1 tcp_recvmsg 323
ffffffff816c2111 tcp_options_write 325
ffffffff816bda61 tcp_rcv_space_adjust 328
ffffffff816bb721 tcp_check_space 332
ffffffff816c04a1 tcp_rcv_established 337
ffffffff816bee61 tcp_ack 337
ffffffff816b9611 tcp_parse_aligned_timestamp.part.43 345
ffffffff816cafc1 tcp_prequeue 346
ffffffff816cab21 tcp_v4_do_rcv 351
ffffffff816cba51 tcp_v4_rcv 351
ffffffff816b8b91 tcp_parse_md5sig_option 351
ffffffff816b3fb1 tcp_cleanup_rbuf 436
ffffffff816b64a1 tcp_poll 468
ffffffff816c1f01 tcp_established_options 604
ffffffff816c82f1 tcp_v4_md5_lookup 615
ffffffff816c4e11 tcp_release_cb 736
ffffffff816bec01 tcp_rearm_rto 843
ffffffff816c8261 tcp_md5_do_lookup 968
Detaching...
The current implementation can take many seconds to detach from tracing, after
Ctrl-C has been hit.
Couting all vfs functions for process ID 5276 only:
# ./funccount -p 5276 'vfs_*'
Tracing... Ctrl-C to end.
^C
ADDR FUNC COUNT
ffffffff811e8c71 vfs_getattr_nosec 7
ffffffff811e8d41 vfs_getattr 7
ffffffff811e8dd1 vfs_fstatat 11
ffffffff811e4251 vfs_read 12
ffffffff811e4381 vfs_write 16
Detaching...
This matches when that PID is on-CPU and the kernel function is called.
An interval can be provided. Eg, printing output every 1 second for vfs calls:
# ./funccount -i 1 'vfs_*'
Tracing... Ctrl-C to end.
ADDR FUNC COUNT
ffffffff811e8dd1 vfs_fstatat 1
ffffffff811e8d71 vfs_fstat 16
ffffffff811e8c71 vfs_getattr_nosec 17
ffffffff811e8d41 vfs_getattr 17
ffffffff811e4381 vfs_write 52
ffffffff811e4251 vfs_read 79
ffffffff811e3221 vfs_open 98
ADDR FUNC COUNT
ffffffff811e8dd1 vfs_fstatat 10
ffffffff811e8d71 vfs_fstat 10
ffffffff811e3221 vfs_open 13
ffffffff811e8c71 vfs_getattr_nosec 20
ffffffff811e8d41 vfs_getattr 20
ffffffff811e4381 vfs_write 28
ffffffff811e4251 vfs_read 39
ADDR FUNC COUNT
ffffffff81215191 vfs_fsync_range 2
ffffffff81231df1 vfs_lock_file 30
ffffffff811e4381 vfs_write 107
ffffffff811e8dd1 vfs_fstatat 129
ffffffff811e8d71 vfs_fstat 130
ffffffff811e3221 vfs_open 154
ffffffff811e8c71 vfs_getattr_nosec 222
ffffffff811e8d41 vfs_getattr 222
ffffffff811e4251 vfs_read 384
^C
ADDR FUNC COUNT
ffffffff811e4251 vfs_read 4
ffffffff811e4381 vfs_write 5
Detaching...
This can be useful for making some ad hoc tools, exposing new counts of
kernel activity that aren't visible in other metrics.
Include -T to print timestamps on output.
The "*" wildcard can be used multiple times. Eg, matching functions that contain
the word "readdir":
# ./funccount '*readdir*'
Tracing... Ctrl-C to end.
^C
ADDR FUNC COUNT
ffffffff81260911 ext4_readdir 4
Detaching...
Matching "tcp" then "send":
# ./funccount '*tcp*send*'
Tracing... Ctrl-C to end.
^C
ADDR FUNC COUNT
ffffffff816c5921 tcp_send_ack 4
ffffffff816c6c81 tcp_send_delayed_ack 19
ffffffff816b3ee1 tcp_send_mss 26
ffffffff816b76c1 tcp_sendmsg 26
ffffffff816cb821 tcp_v4_send_check 30
ffffffff816cb731 __tcp_v4_send_check 30
Detaching...
Regular expressions can also be used with the -r option. Eg:
# ./funccount -r '^vfs_[rw]'
Tracing... Ctrl-C to end.
^C
ADDR FUNC COUNT
ffffffff811e4381 vfs_write 26
ffffffff811e4251 vfs_read 42
Detaching...
Full USAGE:
# ./funccount -h
usage: funccount [-h] [-p PID] [-i INTERVAL] [-T] [-r] pattern
Count kernel function calls
positional arguments:
pattern search expression for kernel functions
optional arguments:
-h, --help show this help message and exit
-p PID, --pid PID trace this PID only
-i INTERVAL, --interval INTERVAL
summary interval, seconds
-T, --timestamp include timestamp on output
-r, --regexp use regular expressions. Default is "*" wildcards
only.
examples:
./funccount 'vfs_*' # count kernel functions starting with "vfs"
./funccount 'tcp_send*' # count kernel funcs starting with "tcp_send"
./funccount -r '^vfs.*' # same as above, using regular expressions
./funccount -Ti 5 'vfs_*' # output every 5 seconds, with timestamps
./funccount -p 185 'vfs_*' # count vfs calls for PID 181 only
......@@ -16,12 +16,10 @@ from bcc import BPF
# load BPF program
b = BPF(text = """
int do_sync(void *ctx) {
void kprobe__sys_sync(void *ctx) {
bpf_trace_printk("sync()\\n");
return 0;
};
""")
b.attach_kprobe(event="sys_sync", fn_name="do_sync")
# header
print("%-18s %s" % ("TIME(s)", "CALL"))
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment