Commit d2241f4f authored by Sasha Goldshtein

Added --stack-depth switch to control the number of stack frames captured for each allocation

parent 75ba13f9
@@ -2,7 +2,7 @@
 .SH NAME
 memleak \- Print a summary of outstanding allocations and their call stacks to detect memory leaks. Uses Linux eBPF/bcc.
 .SH SYNOPSIS
-.B memleak [-h] [-p PID] [-t] [-a] [-o OLDER] [-c COMMAND] [-s SAMPLE_RATE] [INTERVAL] [COUNT]
+.B memleak [-h] [-p PID] [-t] [-a] [-o OLDER] [-c COMMAND] [-s SAMPLE_RATE] [-d STACK_DEPTH] [INTERVAL] [COUNT]
 .SH DESCRIPTION
 memleak traces and matches memory allocation and deallocation requests, and
 collects call stacks for each allocation. memleak can then print a summary
@@ -11,7 +11,8 @@ of which call stacks performed allocations that weren't subsequently freed.
 When tracing a specific process, memleak instruments malloc and free from libc.
 When tracing all processes, memleak instruments kmalloc and kfree.
-The stack depth is currently limited to 10 (+1 for the current instruction pointer).
+The stack depth is limited to 10 by default (+1 for the current instruction pointer),
+but it can be controlled using the \-d switch if deeper stacks are required.
 This currently only works on x86_64. Check for future versions.
 .SH REQUIREMENTS
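The man page says that memleak matches allocations against frees and reports the call stacks of allocations that are still outstanding. That bookkeeping can be modeled in a few lines of user-space Python. This is a hypothetical sketch for illustration only. The `LeakTracker` class, its `on_alloc`/`on_free`/`summary` methods, the sample addresses, and the `helper+0x10` symbol are all assumptions. The real tool does its tracking in its BPF program, not in a Python dictionary.

```python
from collections import defaultdict


class LeakTracker:
    """Hypothetical model of memleak-style alloc/free matching (not bcc code)."""

    def __init__(self):
        # address -> (size, call stack) for every allocation not yet freed
        self.outstanding = {}

    def on_alloc(self, address, size, stack):
        # Record the allocation under the address the allocator returned.
        self.outstanding[address] = (size, tuple(stack))

    def on_free(self, address):
        # A matching free drops the entry; frees of unknown addresses are ignored.
        self.outstanding.pop(address, None)

    def summary(self):
        # Aggregate outstanding allocations per call stack: stack -> [bytes, count].
        totals = defaultdict(lambda: [0, 0])
        for size, stack in self.outstanding.values():
            totals[stack][0] += size
            totals[stack][1] += 1
        # Return (bytes, count, stack) tuples, largest byte total first.
        return sorted(((b, c, s) for s, (b, c) in totals.items()), reverse=True)


if __name__ == "__main__":
    tracker = LeakTracker()
    tracker.on_alloc(0x1000, 16, ["main+0x6d"])
    tracker.on_alloc(0x2000, 16, ["main+0x6d"])
    tracker.on_alloc(0x3000, 64, ["helper+0x10", "main+0x80"])
    tracker.on_free(0x3000)
    for nbytes, count, stack in tracker.summary():
        print("%d bytes in %d allocations from stack" % (nbytes, count))
        for frame in stack:
            print("\t" + frame)
```

Keying each entry on the returned address is what lets a later free cancel the matching allocation. Anything still in the table when a summary is printed is reported as outstanding.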
@@ -27,13 +28,6 @@ Trace this process ID only (filtered in-kernel). This traces malloc and free fro
 \-t
 Print a trace of all allocation and free requests and results.
 .TP
-INTERVAL
-Print a summary of outstanding allocations and their call stacks every INTERVAL seconds.
-The default interval is 5 seconds.
-.TP
-COUNT
-Print the outstanding allocations summary COUNT times and then exit.
-.TP
 \-a
 Print a list of allocations that weren't freed (and their sizes) in addition to their call stacks.
 .TP
@@ -46,6 +40,17 @@ Run the specified command and trace its allocations only. This traces malloc and
 .TP
 \-s SAMPLE_RATE
 Record roughly every SAMPLE_RATE-th allocation to reduce overhead.
+.TP
+\-d STACK_DEPTH
+Capture STACK_DEPTH frames (or fewer) when obtaining allocation call stacks.
+The default value is 10.
+.TP
+INTERVAL
+Print a summary of outstanding allocations and their call stacks every INTERVAL seconds.
+The default interval is 5 seconds.
+.TP
+COUNT
+Print the outstanding allocations summary COUNT times and then exit.
 .SH EXAMPLES
 .TP
 Print outstanding kernel allocation stacks every 3 seconds:
@@ -76,6 +81,10 @@ placed in a typical period of 10 seconds:
 #
 .B perf stat -a -e 'probe:__kmalloc' -- sleep 10
+Another setting that may help reduce overhead is lowering the number of stack
+frames captured and parsed by memleak for each allocation, using the \-d switch.
 .SH SOURCE
 This is from bcc.
 .IP
......
 #include <uapi/linux/ptrace.h>
-#define MAX_STACK_SIZE 10
 struct alloc_info_t {
         u64 size;
         u64 timestamp_ns;
@@ -29,16 +27,7 @@ static int grab_stack(struct pt_regs *ctx, struct alloc_info_t *info)
 {
         int depth = 0;
         u64 bp = ctx->bp;
-        if (!(info->callstack[depth++] = get_frame(&bp))) return depth;
-        if (!(info->callstack[depth++] = get_frame(&bp))) return depth;
-        if (!(info->callstack[depth++] = get_frame(&bp))) return depth;
-        if (!(info->callstack[depth++] = get_frame(&bp))) return depth;
-        if (!(info->callstack[depth++] = get_frame(&bp))) return depth;
-        if (!(info->callstack[depth++] = get_frame(&bp))) return depth;
-        if (!(info->callstack[depth++] = get_frame(&bp))) return depth;
-        if (!(info->callstack[depth++] = get_frame(&bp))) return depth;
-        if (!(info->callstack[depth++] = get_frame(&bp))) return depth;
-        if (!(info->callstack[depth++] = get_frame(&bp))) return depth;
+GRAB_ONE_FRAME
         return depth;
 }
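Each `GRAB_ONE_FRAME` expansion calls `get_frame(&bp)`, and the body of that function is not part of this diff. The Python model below makes the following assumptions:

* It follows the conventional x86_64 frame-pointer layout, with the caller's saved frame pointer at `[bp]` and the return address at `[bp + 8]`.
* A dictionary stands in for process memory.
* `get_frame` and `grab_stack` are hypothetical re-creations that mirror the unrolled C statement. They are not bcc code.

The model shows why a walk can stop before the configured depth: a zero frame pointer or a zero return address ends it early.

```python
def get_frame(memory, bp):
    """Hypothetical model of one frame-pointer step on x86_64.

    memory maps addresses to 64-bit words. Returns (return_address, next_bp);
    a return address of 0 means the walk is over.
    """
    if bp == 0:
        return 0, 0
    ret = memory.get(bp + 8, 0)   # return address sits just above the saved bp
    next_bp = memory.get(bp, 0)   # saved frame pointer of the caller
    return ret, next_bp


def grab_stack(memory, bp, max_frames):
    """Mirror the unrolled C sequence: store each result, stop after a zero.

    As with 'info->callstack[depth++] = get_frame(&bp)', the terminating zero
    still occupies a slot, so it is included in the returned list.
    """
    callstack = []
    for _ in range(max_frames):
        ret, bp = get_frame(memory, bp)
        callstack.append(ret)
        if not ret:
            break
    return callstack


if __name__ == "__main__":
    # Two fake frames: 0x100 -> 0x200 -> end of chain (saved bp of 0).
    fake_memory = {0x100: 0x200, 0x108: 0x401000,
                   0x200: 0x0, 0x208: 0x402000}
    print([hex(a) for a in grab_stack(fake_memory, 0x100, max_frames=12)])
    # ['0x401000', '0x402000', '0x0']
```

When `max_frames` is smaller than the chain length, the walk simply stops after `max_frames` entries. This is the effect a lower `-d` value has on each captured stack.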
......
@@ -174,7 +174,7 @@ allocations made with kmalloc/kfree.
 parser = argparse.ArgumentParser(description=description,
         formatter_class=argparse.RawDescriptionHelpFormatter,
         epilog=examples)
-parser.add_argument("-p", "--pid", type=int,
+parser.add_argument("-p", "--pid", type=int, default=-1,
         help="the PID to trace; if not specified, trace kernel allocs")
 parser.add_argument("-t", "--trace", action="store_true",
         help="print trace messages for each alloc/free call")
@@ -190,10 +190,12 @@ parser.add_argument("-c", "--command",
         help="execute and trace the specified command")
 parser.add_argument("-s", "--sample-rate", default=1, type=int,
         help="sample every N-th allocation to decrease the overhead")
+parser.add_argument("-d", "--stack_depth", default=10, type=int,
+        help="maximum stack depth to capture")
 args = parser.parse_args()
-pid = -1 if args.pid is None else args.pid
+pid = args.pid
 command = args.command
 kernel_trace = (pid == -1 and command is None)
 trace_all = args.trace
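With `default=-1` on `-p`, argparse now supplies the sentinel that the old `pid = -1 if args.pid is None else args.pid` line computed by hand. The reduced parser below is a hypothetical sketch. It keeps only `-p`, `-c`, and the new `-d` option, with the definitions copied from the diff, to show how `kernel_trace` follows from the defaults. The helper names `build_parser` and `kernel_trace_for` are illustrative.

```python
import argparse


def build_parser():
    # Reduced, hypothetical parser: only the options that feed kernel_trace,
    # plus the new -d switch, with the definitions used in this commit.
    parser = argparse.ArgumentParser(description="memleak option sketch")
    parser.add_argument("-p", "--pid", type=int, default=-1,
                        help="the PID to trace; if not specified, trace kernel allocs")
    parser.add_argument("-c", "--command",
                        help="execute and trace the specified command")
    parser.add_argument("-d", "--stack_depth", default=10, type=int,
                        help="maximum stack depth to capture")
    return parser


def kernel_trace_for(argv):
    args = build_parser().parse_args(argv)
    # -1 now comes from argparse itself, so no "is None" check is needed.
    return args.pid == -1 and args.command is None


if __name__ == "__main__":
    print(kernel_trace_for([]))                  # True: kernel allocations
    print(kernel_trace_for(["-p", "2614"]))      # False: one user process
    print(kernel_trace_for(["-c", "./allocs"]))  # False: command mode
```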
@@ -201,6 +203,7 @@ interval = args.interval
 min_age_ns = 1e6 * args.older
 sample_every_n = args.sample_rate
 num_prints = args.count
+max_stack_size = args.stack_depth + 2
 if command is not None:
     print("Executing '%s' and tracing the resulting process." % command)
@@ -209,7 +212,9 @@ if command is not None:
 bpf_source = open("memleak.c").read()
 bpf_source = bpf_source.replace("SHOULD_PRINT", "1" if trace_all else "0")
 bpf_source = bpf_source.replace("SAMPLE_EVERY_N", str(sample_every_n))
+bpf_source = bpf_source.replace("GRAB_ONE_FRAME", max_stack_size *
+        "\tif (!(info->callstack[depth++] = get_frame(&bp))) return depth;\n")
+bpf_source = bpf_source.replace("MAX_STACK_SIZE", str(max_stack_size))
 bpf_program = BPF(text=bpf_source)
 if not kernel_trace:
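The two new `replace` calls treat `memleak.c` as a template. `GRAB_ONE_FRAME` becomes `max_stack_size` copies of the frame-grabbing statement, and `MAX_STACK_SIZE` becomes the array bound. Generating straight-line code this way keeps the walk unrolled, as the hand-written version was. The likely reason is that the eBPF verifier of that period rejected loops; bounded loops were only accepted starting with Linux 5.3. The `TEMPLATE` string below is a hypothetical, trimmed stand-in for the real file, and `expand` is an illustrative helper that repeats the same arithmetic and substitutions.

```python
# The statement the commit repeats max_stack_size times.
FRAME_STMT = "\tif (!(info->callstack[depth++] = get_frame(&bp))) return depth;\n"

# Hypothetical, trimmed stand-in for memleak.c; only the placeholders matter.
TEMPLATE = """struct alloc_info_t {
\tu64 size;
\tu64 timestamp_ns;
\tu64 callstack[MAX_STACK_SIZE];
};

static int grab_stack(struct pt_regs *ctx, struct alloc_info_t *info)
{
\tint depth = 0;
\tu64 bp = ctx->bp;
GRAB_ONE_FRAME
\treturn depth;
}
"""


def expand(source, stack_depth):
    # Same arithmetic and substitutions as memleak.py in this commit.
    max_stack_size = stack_depth + 2
    source = source.replace("GRAB_ONE_FRAME", max_stack_size * FRAME_STMT)
    source = source.replace("MAX_STACK_SIZE", str(max_stack_size))
    return source


if __name__ == "__main__":
    print(expand(TEMPLATE, stack_depth=3))
```

With the default `-d 10`, the same substitution emits 12 copies of the statement and a 12-entry `callstack` array.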
......
@@ -119,12 +119,39 @@ For example:
 ... will print the outstanding allocation statistics every second, for ten
 times, and then exit.
+memleak may introduce considerable overhead if your application or kernel is
+allocating and freeing memory at a very high rate. In that case, you can
+control the overhead by sampling every N-th allocation. For example, to sample
+roughly 10% of the allocations and print the outstanding allocations every 5
+seconds, 3 times before quitting:
+# ./memleak.py -p $(pidof allocs) -s 10 5 3
+Attaching to malloc and free in pid 2614, Ctrl+C to quit.
+*** Outstanding allocations:
+        16 bytes in 1 allocations from stack
+                 main+0x6d [/home/vagrant/allocs] (400862)
+                 __libc_start_main+0xf0 [/usr/lib64/libc-2.21.so] (7fdc11ce8790)
+*** Outstanding allocations:
+        16 bytes in 1 allocations from stack
+                 main+0x6d [/home/vagrant/allocs] (400862)
+                 __libc_start_main+0xf0 [/usr/lib64/libc-2.21.so] (7fdc11ce8790)
+*** Outstanding allocations:
+        32 bytes in 2 allocations from stack
+                 main+0x6d [/home/vagrant/allocs] (400862)
+                 __libc_start_main+0xf0 [/usr/lib64/libc-2.21.so] (7fdc11ce8790)
+Note that even though the application leaks 16 bytes of memory every second,
+the report (printed every 5 seconds) doesn't "see" all the allocations because
+of the sampling rate applied.
 USAGE message:
 # ./memleak.py -h
 usage: memleak.py [-h] [-p PID] [-t] [-a] [-o OLDER] [-c COMMAND]
-                  [-s SAMPLE_RATE]
+                  [-s SAMPLE_RATE] [-d STACK_DEPTH]
                   [interval] [count]
 Trace outstanding memory allocations that weren't freed.
@@ -148,6 +175,8 @@ optional arguments:
                         execute and trace the specified command
   -s SAMPLE_RATE, --sample-rate SAMPLE_RATE
                         sample every N-th allocation to decrease the overhead
+  -d STACK_DEPTH, --stack_depth STACK_DEPTH
+                        maximum stack depth to capture
 EXAMPLES:
......
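The `-s` note in the example file says that a sampled report does not "see" every leaked allocation. A toy simulation makes this under-reporting concrete. It assumes a simple deterministic 1-in-N counter, implemented in the hypothetical `sampled_outstanding` helper. This diff does not show how memleak's BPF code actually chooses which allocations to record, only that it records "roughly" every N-th one. The simulation therefore illustrates the effect, not memleak's exact numbers.

```python
def sampled_outstanding(num_allocs, size, sample_every_n):
    """Return (bytes, count) that a hypothetical 1-in-N sampler would report.

    Assumes every allocation leaks and that allocation number i (1-based) is
    recorded only when i is a multiple of sample_every_n.
    """
    recorded = sum(1 for i in range(1, num_allocs + 1) if i % sample_every_n == 0)
    return recorded * size, recorded


if __name__ == "__main__":
    # One 16-byte leak per second, a report every 5 seconds, sampled with -s 10.
    for seconds in (5, 10, 15):
        nbytes, count = sampled_outstanding(seconds, 16, 10)
        print("after %2ds: %3d of %3d leaked bytes seen (%d allocations)"
              % (seconds, nbytes, 16 * seconds, count))
```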