Commit f1788962 authored by Brenden Blanco's avatar Brenden Blanco

Merge pull request #362 from goldshtn/memleak-enh

Enhancements to memleak.py
parents 849f83d2 43fa0419
......@@ -2,7 +2,8 @@
.SH NAME
memleak \- Print a summary of outstanding allocations and their call stacks to detect memory leaks. Uses Linux eBPF/bcc.
.SH SYNOPSIS
.B memleak [-h] [-p PID] [-t] [-i INTERVAL] [-a] [-o OLDER] [-c COMMAND] [-s SAMPLE_RATE]
.B memleak [-h] [-p PID] [-t] [-a] [-o OLDER] [-c COMMAND] [-s SAMPLE_RATE]
[-d STACK_DEPTH] [-T TOP] [-z MIN_SIZE] [-Z MAX_SIZE] [INTERVAL] [COUNT]
.SH DESCRIPTION
memleak traces and matches memory allocation and deallocation requests, and
collects call stacks for each allocation. memleak can then print a summary
......@@ -11,7 +12,11 @@ of which call stacks performed allocations that weren't subsequently freed.
When tracing a specific process, memleak instruments malloc and free from libc.
When tracing all processes, memleak instruments kmalloc and kfree.
The stack depth is currently limited to 10 (+1 for the current instruction pointer).
memleak may introduce significant overhead when tracing processes that allocate
and free many blocks very quickly. See the OVERHEAD section below.
The stack depth is limited to 10 by default (+1 for the current instruction pointer),
but it can be controlled using the \-d switch if deeper stacks are required.
This currently only works on x86_64. Check for future versions.
.SH REQUIREMENTS
......@@ -27,10 +32,6 @@ Trace this process ID only (filtered in-kernel). This traces malloc and free fro
\-t
Print a trace of all allocation and free requests and results.
.TP
\-i INTERVAL
Print a summary of oustanding allocations and their call stacks every INTERVAL seconds.
The default interval is 5 seconds.
.TP
\-a
Print a list of allocations that weren't freed (and their sizes) in addition to their call stacks.
.TP
......@@ -43,25 +44,56 @@ Run the specified command and trace its allocations only. This traces malloc and
.TP
\-s SAMPLE_RATE
Record roughly every SAMPLE_RATE-th allocation to reduce overhead.
.TP
\-d STACK_DEPTH
Capture STACK_DEPTH frames (or less) when obtaining allocation call stacks.
The default value is 10.
.TP
\-t TOP
Print only the top TOP stacks (sorted by size).
The default value is 10.
.TP
\-z MIN_SIZE
Capture only allocations that are larger than or equal to MIN_SIZE bytes.
.TP
\-Z MAX_SIZE
Capture only allocations that are smaller than or equal to MAX_SIZE bytes.
.TP
INTERVAL
Print a summary of oustanding allocations and their call stacks every INTERVAL seconds.
The default interval is 5 seconds.
.TP
COUNT
Print the outstanding allocations summary COUNT times and then exit.
.SH EXAMPLES
.TP
Print outstanding kernel allocation stacks every 3 seconds:
#
.B memleak -i 3
.B memleak 3
.TP
Print user outstanding allocation stacks and allocation details for the process 1005:
#
.B memleak -p 1005 -a
.TP
Sample roughly every 5th allocation (~20%) of the call stacks and print the top 5
stacks 10 times before quitting.
#
.B memleak -s 5 --top=5 10
.TP
Run ./allocs and print outstanding allocation stacks for that process:
#
.B memleak -c "./allocs"
.TP
Capture only allocations between 16 and 32 bytes in size:
#
.B memleak -z 16 -Z 32
.SH OVERHEAD
memleak can have significant overhead if the target process or kernel performs
allocations at a very high rate. Pathological cases may exhibit up to 100x
degradation in running time. Most of the time, however, memleak shouldn't cause
a significant slowdown. You can also use the \-s switch to reduce the overhead
further by capturing only every N-th allocation.
a significant slowdown. You can use the \-s switch to reduce the overhead
further by capturing only every N-th allocation. The \-z and \-Z switches can
also reduce overhead by capturing only allocations of specific sizes.
To determine the rate at which your application is calling malloc/free, or the
rate at which your kernel is calling kmalloc/kfree, place a probe with perf and
......@@ -73,6 +105,10 @@ placed in a typical period of 10 seconds:
#
.B perf stat -a -e 'probe:__kmalloc' -- sleep 10
Another setting that may help reduce overhead is lowering the number of stack
frames captured and parsed by memleak for each allocation, using the \-d switch.
.SH SOURCE
This is from bcc.
.IP
......
#include <uapi/linux/ptrace.h>
#define MAX_STACK_SIZE 10
struct alloc_info_t {
u64 size;
u64 timestamp_ns;
int num_frames;
u64 callstack[MAX_STACK_SIZE];
};
BPF_HASH(sizes, u64);
BPF_HASH(allocs, u64, struct alloc_info_t);
// Adapted from https://github.com/iovisor/bcc/tools/offcputime.py
static u64 get_frame(u64 *bp) {
if (*bp) {
// The following stack walker is x86_64 specific
u64 ret = 0;
if (bpf_probe_read(&ret, sizeof(ret), (void *)(*bp+8)))
return 0;
if (bpf_probe_read(bp, sizeof(*bp), (void *)*bp))
*bp = 0;
return ret;
}
return 0;
}
static int grab_stack(struct pt_regs *ctx, struct alloc_info_t *info)
{
int depth = 0;
u64 bp = ctx->bp;
if (!(info->callstack[depth++] = get_frame(&bp))) return depth;
if (!(info->callstack[depth++] = get_frame(&bp))) return depth;
if (!(info->callstack[depth++] = get_frame(&bp))) return depth;
if (!(info->callstack[depth++] = get_frame(&bp))) return depth;
if (!(info->callstack[depth++] = get_frame(&bp))) return depth;
if (!(info->callstack[depth++] = get_frame(&bp))) return depth;
if (!(info->callstack[depth++] = get_frame(&bp))) return depth;
if (!(info->callstack[depth++] = get_frame(&bp))) return depth;
if (!(info->callstack[depth++] = get_frame(&bp))) return depth;
if (!(info->callstack[depth++] = get_frame(&bp))) return depth;
return depth;
}
int alloc_enter(struct pt_regs *ctx, size_t size)
{
// Ideally, this should use a random number source, such as
// BPF_FUNC_get_prandom_u32, but that's currently not supported
// by the bcc front-end.
if (SAMPLE_EVERY_N > 1) {
u64 ts = bpf_ktime_get_ns();
if (ts % SAMPLE_EVERY_N != 0)
return 0;
}
u64 pid = bpf_get_current_pid_tgid();
u64 size64 = size;
sizes.update(&pid, &size64);
if (SHOULD_PRINT)
bpf_trace_printk("alloc entered, size = %u\n", size);
return 0;
}
int alloc_exit(struct pt_regs *ctx)
{
u64 address = ctx->ax;
u64 pid = bpf_get_current_pid_tgid();
u64* size64 = sizes.lookup(&pid);
struct alloc_info_t info = {0};
if (size64 == 0)
return 0; // missed alloc entry
info.size = *size64;
sizes.delete(&pid);
info.timestamp_ns = bpf_ktime_get_ns();
info.num_frames = grab_stack(ctx, &info) - 2;
allocs.update(&address, &info);
if (SHOULD_PRINT)
bpf_trace_printk("alloc exited, size = %lu, result = %lx, frames = %d\n", info.size, address, info.num_frames);
return 0;
}
int free_enter(struct pt_regs *ctx, void *address)
{
u64 addr = (u64)address;
struct alloc_info_t *info = allocs.lookup(&addr);
if (info == 0)
return 0;
allocs.delete(&addr);
if (SHOULD_PRINT)
bpf_trace_printk("free entered, address = %lx, size = %lu\n", address, info->size);
return 0;
}
#!/usr/bin/env python
#
# memleak.py Trace and display outstanding allocations to detect
# memory leaks in user-mode processes and the kernel.
#
# USAGE: memleak.py [-h] [-p PID] [-t] [-a] [-o OLDER] [-c COMMAND]
# [-s SAMPLE_RATE] [-d STACK_DEPTH] [-T TOP] [-z MIN_SIZE]
# [-Z MAX_SIZE]
# [interval] [count]
#
# Licensed under the Apache License, Version 2.0 (the "License")
# Copyright (C) 2016 Sasha Goldshtein.
from bcc import BPF
from time import sleep
from datetime import datetime
import argparse
import subprocess
import ctypes
......@@ -150,7 +162,7 @@ EXAMPLES:
allocations every 5 seconds
./memleak.py -p $(pidof allocs) -t
Trace allocations and display each individual call to malloc/free
./memleak.py -p $(pidof allocs) -a -i 10
./memleak.py -ap $(pidof allocs) 10
Trace allocations and display allocated addresses, sizes, and stacks
every 10 seconds for outstanding allocations
./memleak.py -c "./allocs"
......@@ -174,38 +186,161 @@ allocations made with kmalloc/kfree.
parser = argparse.ArgumentParser(description=description,
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=examples)
parser.add_argument("-p", "--pid",
parser.add_argument("-p", "--pid", type=int, default=-1,
help="the PID to trace; if not specified, trace kernel allocs")
parser.add_argument("-t", "--trace", action="store_true",
help="print trace messages for each alloc/free call")
parser.add_argument("-i", "--interval", default=5,
parser.add_argument("interval", nargs="?", default=5, type=int,
help="interval in seconds to print outstanding allocations")
parser.add_argument("count", nargs="?", type=int,
help="number of times to print the report before exiting")
parser.add_argument("-a", "--show-allocs", default=False, action="store_true",
help="show allocation addresses and sizes as well as call stacks")
parser.add_argument("-o", "--older", default=500,
parser.add_argument("-o", "--older", default=500, type=int,
help="prune allocations younger than this age in milliseconds")
parser.add_argument("-c", "--command",
help="execute and trace the specified command")
parser.add_argument("-s", "--sample-rate", default=1,
parser.add_argument("-s", "--sample-rate", default=1, type=int,
help="sample every N-th allocation to decrease the overhead")
parser.add_argument("-d", "--stack-depth", default=10, type=int,
help="maximum stack depth to capture")
parser.add_argument("-T", "--top", type=int, default=10,
help="display only this many top allocating stacks (by size)")
parser.add_argument("-z", "--min-size", type=int,
help="capture only allocations larger than this size")
parser.add_argument("-Z", "--max-size", type=int,
help="capture only allocations smaller than this size")
args = parser.parse_args()
pid = -1 if args.pid is None else int(args.pid)
pid = args.pid
command = args.command
kernel_trace = (pid == -1 and command is None)
trace_all = args.trace
interval = int(args.interval)
min_age_ns = 1e6 * int(args.older)
interval = args.interval
min_age_ns = 1e6 * args.older
sample_every_n = args.sample_rate
num_prints = args.count
max_stack_size = args.stack_depth + 2
top_stacks = args.top
min_size = args.min_size
max_size = args.max_size
if min_size is not None and max_size is not None and min_size > max_size:
print("min_size (-z) can't be greater than max_size (-Z)")
exit(1)
if command is not None:
print("Executing '%s' and tracing the resulting process." % command)
pid = run_command_get_pid(command)
bpf_source = open("memleak.c").read()
bpf_source = """
#include <uapi/linux/ptrace.h>
struct alloc_info_t {
u64 size;
u64 timestamp_ns;
int num_frames;
u64 callstack[MAX_STACK_SIZE];
};
BPF_HASH(sizes, u64);
BPF_HASH(allocs, u64, struct alloc_info_t);
// Adapted from https://github.com/iovisor/bcc/tools/offcputime.py
static u64 get_frame(u64 *bp) {
if (*bp) {
// The following stack walker is x86_64 specific
u64 ret = 0;
if (bpf_probe_read(&ret, sizeof(ret), (void *)(*bp+8)))
return 0;
if (bpf_probe_read(bp, sizeof(*bp), (void *)*bp))
*bp = 0;
return ret;
}
return 0;
}
static int grab_stack(struct pt_regs *ctx, struct alloc_info_t *info)
{
int depth = 0;
u64 bp = ctx->bp;
GRAB_ONE_FRAME
return depth;
}
int alloc_enter(struct pt_regs *ctx, size_t size)
{
SIZE_FILTER
if (SAMPLE_EVERY_N > 1) {
u64 ts = bpf_ktime_get_ns();
if (ts % SAMPLE_EVERY_N != 0)
return 0;
}
u64 pid = bpf_get_current_pid_tgid();
u64 size64 = size;
sizes.update(&pid, &size64);
if (SHOULD_PRINT)
bpf_trace_printk("alloc entered, size = %u\\n", size);
return 0;
}
int alloc_exit(struct pt_regs *ctx)
{
u64 address = ctx->ax;
u64 pid = bpf_get_current_pid_tgid();
u64* size64 = sizes.lookup(&pid);
struct alloc_info_t info = {0};
if (size64 == 0)
return 0; // missed alloc entry
info.size = *size64;
sizes.delete(&pid);
info.timestamp_ns = bpf_ktime_get_ns();
info.num_frames = grab_stack(ctx, &info) - 2;
allocs.update(&address, &info);
if (SHOULD_PRINT) {
bpf_trace_printk("alloc exited, size = %lu, result = %lx, frames = %d\\n",
info.size, address, info.num_frames);
}
return 0;
}
int free_enter(struct pt_regs *ctx, void *address)
{
u64 addr = (u64)address;
struct alloc_info_t *info = allocs.lookup(&addr);
if (info == 0)
return 0;
allocs.delete(&addr);
if (SHOULD_PRINT) {
bpf_trace_printk("free entered, address = %lx, size = %lu\\n",
address, info->size);
}
return 0;
}
"""
bpf_source = bpf_source.replace("SHOULD_PRINT", "1" if trace_all else "0")
bpf_source = bpf_source.replace("SAMPLE_EVERY_N", str(sample_every_n))
bpf_source = bpf_source.replace("GRAB_ONE_FRAME", max_stack_size *
"\tif (!(info->callstack[depth++] = get_frame(&bp))) return depth;\n")
bpf_source = bpf_source.replace("MAX_STACK_SIZE", str(max_stack_size))
size_filter = ""
if min_size is not None and max_size is not None:
size_filter = "if (size < %d || size > %d) return 0;" % \
(min_size, max_size)
elif min_size is not None:
size_filter = "if (size < %d) return 0;" % min_size
elif max_size is not None:
size_filter = "if (size > %d) return 0;" % max_size
bpf_source = bpf_source.replace("SIZE_FILTER", size_filter)
bpf_program = BPF(text=bpf_source)
......@@ -227,7 +362,8 @@ decoder = StackDecoder(pid, bpf_program)
def print_outstanding():
stacks = {}
print("*** Outstanding allocations:")
print("[%s] Top %d stacks with outstanding allocations:" %
(datetime.now().strftime("%H:%M:%S"), top_stacks))
allocs = bpf_program.get_table("allocs")
for address, info in sorted(allocs.items(), key=lambda a: a[1].size):
if Time.monotonic_time() - min_age_ns < info.timestamp_ns:
......@@ -241,11 +377,12 @@ def print_outstanding():
if args.show_allocs:
print("\taddr = %x size = %s" %
(address.value, info.size))
for stack, (count, size) in sorted(stacks.items(),
key=lambda s: s[1][1]):
to_show = sorted(stacks.items(), key=lambda s: s[1][1])[-top_stacks:]
for stack, (count, size) in to_show:
print("\t%d bytes in %d allocations from stack\n\t\t%s" %
(size, count, stack.replace(";", "\n\t\t")))
count_so_far = 0
while True:
if trace_all:
print bpf_program.trace_fields()
......@@ -256,3 +393,6 @@ while True:
exit()
decoder.refresh_code_ranges()
print_outstanding()
count_so_far += 1
if num_prints is not None and count_so_far >= num_prints:
exit()
......@@ -8,12 +8,12 @@ For example:
# ./memleak.py -p $(pidof allocs)
Attaching to malloc and free in pid 5193, Ctrl+C to quit.
*** Outstanding allocations:
[11:16:33] Top 2 stacks with outstanding allocations:
80 bytes in 5 allocations from stack
main+0x6d [/home/vagrant/allocs] (400862)
__libc_start_main+0xf0 [/usr/lib64/libc-2.21.so] (7fd460ac2790)
*** Outstanding allocations:
[11:16:34] Top 2 stacks with outstanding allocations:
160 bytes in 10 allocations from stack
main+0x6d [/home/vagrant/allocs] (400862)
__libc_start_main+0xf0 [/usr/lib64/libc-2.21.so] (7fd460ac2790)
......@@ -34,7 +34,7 @@ prevalent. Use the -a switch:
# ./memleak.py -p $(pidof allocs) -a
Attaching to malloc and free in pid 5193, Ctrl+C to quit.
*** Outstanding allocations:
[11:16:33] Top 2 stacks with outstanding allocations:
addr = 948cd0 size = 16
addr = 948d10 size = 16
addr = 948d30 size = 16
......@@ -43,7 +43,7 @@ Attaching to malloc and free in pid 5193, Ctrl+C to quit.
main+0x6d [/home/vagrant/allocs] (400862)
__libc_start_main+0xf0 [/usr/lib64/libc-2.21.so] (7fd460ac2790)
*** Outstanding allocations:
[11:16:34] Top 2 stacks with outstanding allocations:
addr = 948d50 size = 16
addr = 948cd0 size = 16
addr = 948d10 size = 16
......@@ -110,26 +110,62 @@ To avoid false positives, allocations younger than a certain age (500ms by
default) are not printed. To change this threshold, use the -o switch.
By default, memleak prints its output every 5 seconds. To change this
interval, use the -i switch.
interval, pass the interval as a positional parameter to memleak. You can
also control the number of times the output will be printed before exiting.
For example:
# ./memleak.py 1 10
... will print the outstanding allocation statistics every second, for ten
times, and then exit.
memleak may introduce considerable overhead if your application or kernel is
allocating and freeing memory at a very high rate. In that case, you can
control the overhead by sampling every N-th allocation. For example, to sample
roughly 10% of the allocations and print the outstanding allocations every 5
seconds, 3 times before quitting:
# ./memleak.py -p $(pidof allocs) -s 10 5 3
Attaching to malloc and free in pid 2614, Ctrl+C to quit.
[11:16:33] Top 2 stacks with outstanding allocations:
16 bytes in 1 allocations from stack
main+0x6d [/home/vagrant/allocs] (400862)
__libc_start_main+0xf0 [/usr/lib64/libc-2.21.so] (7fdc11ce8790)
[11:16:38] Top 2 stacks with outstanding allocations:
16 bytes in 1 allocations from stack
main+0x6d [/home/vagrant/allocs] (400862)
__libc_start_main+0xf0 [/usr/lib64/libc-2.21.so] (7fdc11ce8790)
[11:16:43] Top 2 stacks with outstanding allocations:
32 bytes in 2 allocations from stack
main+0x6d [/home/vagrant/allocs] (400862)
__libc_start_main+0xf0 [/usr/lib64/libc-2.21.so] (7fdc11ce8790)
Note that even though the application leaks 16 bytes of memory every second,
the report (printed every 5 seconds) doesn't "see" all the allocations because
of the sampling rate applied.
USAGE message:
# ./memleak.py -h
usage: memleak.py [-h] [-p PID] [-t] [-i INTERVAL] [-a] [-o OLDER]
[-c COMMAND] [-s SAMPLE_RATE]
usage: memleak.py [-h] [-p PID] [-t] [-a] [-o OLDER] [-c COMMAND]
[-s SAMPLE_RATE] [-d STACK_DEPTH] [-T TOP]
[interval] [count]
Trace outstanding memory allocations that weren't freed.
Supports both user-mode allocations made with malloc/free and kernel-mode
allocations made with kmalloc/kfree.
interval interval in seconds to print outstanding allocations
count number of times to print the report before exiting
optional arguments:
-h, --help show this help message and exit
-p PID, --pid PID the PID to trace; if not specified, trace kernel
allocs
-t, --trace print trace messages for each alloc/free call
-i INTERVAL, --interval INTERVAL
interval in seconds to print outstanding allocations
-a, --show-allocs show allocation addresses and sizes as well as call
stacks
-o OLDER, --older OLDER
......@@ -139,6 +175,13 @@ optional arguments:
execute and trace the specified command
-s SAMPLE_RATE, --sample-rate SAMPLE_RATE
sample every N-th allocation to decrease the overhead
-d STACK_DEPTH, --stack_depth STACK_DEPTH
maximum stack depth to capture
-T TOP, --top TOP display only this many top allocating stacks (by size)
-z MIN_SIZE, --min-size MIN_SIZE
capture only allocations larger than this size
-Z MAX_SIZE, --max-size MAX_SIZE
capture only allocations smaller than this size
EXAMPLES:
......@@ -147,7 +190,16 @@ EXAMPLES:
allocations every 5 seconds
./memleak.py -p $(pidof allocs) -t
Trace allocations and display each individual call to malloc/free
./memleak.py -p $(pidof allocs) -a -i 10 Trace allocations and display allocated addresses, sizes, and stacks every 10 seconds for outstanding allocations ./memleak.py -c "./allocs" Run the specified command and trace its allocations ./memleak.py Trace allocations in kernel mode and display a summary of outstanding allocations every 5 seconds ./memleak.py -o 60000 Trace allocations in kernel mode and display a summary of outstanding allocations that are at least one minute (60 seconds) old
./memleak.py -ap $(pidof allocs) 10
Trace allocations and display allocated addresses, sizes, and stacks
every 10 seconds for outstanding allocations
./memleak.py -c "./allocs"
Run the specified command and trace its allocations
./memleak.py
Trace allocations in kernel mode and display a summary of outstanding
allocations every 5 seconds
./memleak.py -o 60000
Trace allocations in kernel mode and display a summary of outstanding
allocations that are at least one minute (60 seconds) old
./memleak.py -s 5
Trace roughly every 5th allocation, to reduce overhead
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment