Commit c726b61c authored by Ingo Molnar

Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core
parents 7be79236 018378c5
What: /sys/kernel/debug/kmemtrace/
Date: July 2008
Contact: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Description:
In kmemtrace-enabled kernels, the following files are created:
/sys/kernel/debug/kmemtrace/
cpu<n> (0400) Per-CPU tracing data, see below. (binary)
total_overruns (0400) Total number of bytes which were dropped from
cpu<n> files because of full buffer condition,
non-binary. (text)
abi_version (0400) Kernel's kmemtrace ABI version. (text)
Each per-CPU file should be read according to the relay interface. That is,
the reader should set affinity to that specific CPU and, as currently done by
the userspace application (though there are other methods), use poll() with
an infinite timeout before every read(). Otherwise, erroneous data may be
read. The binary data has the following _core_ format:
Event ID (1 byte) Unsigned integer, one of:
0 - represents an allocation (KMEMTRACE_EVENT_ALLOC)
1 - represents a freeing of previously allocated memory
(KMEMTRACE_EVENT_FREE)
Type ID (1 byte) Unsigned integer, one of:
0 - this is a kmalloc() / kfree()
1 - this is a kmem_cache_alloc() / kmem_cache_free()
2 - this is a __get_free_pages() et al.
Event size (2 bytes) Unsigned integer representing the
size of this event. Used to extend
kmemtrace. Discard the bytes you
don't know about.
Sequence number (4 bytes) Signed integer used to reorder data
logged on SMP machines. Wraparound
must be taken into account, although
it is unlikely.
Caller address (8 bytes) Return address to the caller.
Pointer to mem (8 bytes) Pointer to target memory area. Can be
NULL, but not all such calls might be
recorded.
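The wraparound remark on the sequence number can be illustrated with a small comparison helper. This is only a sketch: the name seq_before is hypothetical, and it assumes two events being compared are fewer than 2^31 sequence numbers apart and that the compiler uses the usual two's complement conversion.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns nonzero if sequence number a was issued
 * before b. The subtraction is done on unsigned values so that the
 * wraparound is well defined; the difference is then reinterpreted as
 * signed (two's complement conversion assumed).
 */
static int seq_before(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a - (uint32_t)b) < 0;
}
```

A reader merging the per-CPU streams could sort events with this predicate instead of a plain `a < b`, which would misorder events around the wrap point.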
In case of KMEMTRACE_EVENT_ALLOC events, the next fields follow:
Requested bytes (8 bytes) Total number of requested bytes,
unsigned, must not be zero.
Allocated bytes (8 bytes) Total number of actually allocated
bytes, unsigned, must not be lower
than requested bytes.
Requested flags (4 bytes) GFP flags supplied by the caller.
Target CPU (4 bytes) Signed integer, valid for event id 1.
If equal to -1, target CPU is the same
as origin CPU, but the reverse might
not be true.
The data is made available in the same endianness the machine has.
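As an illustration of the layout above, a reader running on the traced machine might decode one event as sketched below. The struct, the kt_ prefix and the field names are hypothetical. The sketch also assumes that the fields are packed back to back in the listed order and that "Event size" counts the whole event including the core fields; the text above does not state either point explicitly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { KMEMTRACE_EVENT_ALLOC = 0, KMEMTRACE_EVENT_FREE = 1 };

#define KT_CORE_SIZE  24 /* 1 + 1 + 2 + 4 + 8 + 8 bytes */
#define KT_ALLOC_SIZE 24 /* 8 + 8 + 4 + 4 bytes */

/* Hypothetical in-memory view of one event; names are illustrative. */
struct kt_event {
    uint8_t  event_id;
    uint8_t  type_id;
    uint16_t event_size;
    int32_t  seq_num;
    uint64_t call_site;
    uint64_t ptr;
    /* Filled in for KMEMTRACE_EVENT_ALLOC only. */
    uint64_t bytes_req;
    uint64_t bytes_alloc;
    uint32_t gfp_flags;
    int32_t  target_cpu;
};

/*
 * Decode one event from buf (native endianness). Returns the number of
 * bytes to skip to reach the next event, or 0 if the buffer is too
 * short or the size field is inconsistent. Bytes beyond the known
 * fields are ignored, as the description asks.
 */
static size_t kt_decode(const unsigned char *buf, size_t len,
                        struct kt_event *ev)
{
    size_t need = KT_CORE_SIZE;

    memset(ev, 0, sizeof(*ev));
    if (len < need)
        return 0;

    ev->event_id = buf[0];
    ev->type_id = buf[1];
    memcpy(&ev->event_size, buf + 2, 2);
    memcpy(&ev->seq_num, buf + 4, 4);
    memcpy(&ev->call_site, buf + 8, 8);
    memcpy(&ev->ptr, buf + 16, 8);

    if (ev->event_id == KMEMTRACE_EVENT_ALLOC)
        need += KT_ALLOC_SIZE;
    if (ev->event_size < need || len < ev->event_size)
        return 0;

    if (ev->event_id == KMEMTRACE_EVENT_ALLOC) {
        memcpy(&ev->bytes_req, buf + 24, 8);
        memcpy(&ev->bytes_alloc, buf + 32, 8);
        memcpy(&ev->gfp_flags, buf + 40, 4);
        memcpy(&ev->target_cpu, buf + 44, 4);
    }
    return ev->event_size;
}
```

memcpy() is used instead of casting the buffer to a struct pointer so that the sketch does not depend on alignment or compiler padding. Analysing a trace on a machine with different endianness would need byte swapping on top of this.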
Other event ids and type ids may be defined and added. Other fields may be
added by increasing event size, but see below for details.
Every modification to the ABI, including new id definitions, is followed
by bumping the ABI version by one.
Adding new data to the packet (features) is done at the end of the mandatory
data:
Feature size (2 bytes)
Feature ID (1 byte)
Feature data (Feature size - 3 bytes)
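A reader that does not know a given feature can still step over it using the size field. The walker below is a sketch of that idea; kt_walk_features and kt_feature_fn are hypothetical names, and it assumes that "Feature size" covers the 3-byte feature header as well as the data, as the "Feature size - 3 bytes" entry suggests.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback type, invoked once per feature block. */
typedef void (*kt_feature_fn)(uint8_t id, const unsigned char *data,
                              size_t data_len, void *ctx);

/*
 * Walk the feature blocks that follow the mandatory data of one event.
 * p points at the first feature, len is the number of trailing bytes.
 * Returns the number of features seen, or -1 if a block is malformed
 * or runs past the end of the event. fn may be NULL to only count.
 */
static int kt_walk_features(const unsigned char *p, size_t len,
                            kt_feature_fn fn, void *ctx)
{
    int count = 0;

    while (len >= 3) {
        uint16_t fsize;

        memcpy(&fsize, p, 2); /* native endianness */
        if (fsize < 3 || fsize > len)
            return -1;
        if (fn)
            fn(p[2], p + 3, (size_t)fsize - 3, ctx);
        p += fsize;
        len -= fsize;
        count++;
    }
    return len == 0 ? count : -1;
}
```

Unknown feature IDs can simply be ignored in the callback, since the walker advances by the recorded size rather than by any knowledge of the feature contents.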
Users:
kmemtrace-user - git://repo.or.cz/kmemtrace-user.git
kmemtrace - Kernel Memory Tracer
by Eduard - Gabriel Munteanu
<eduard.munteanu@linux360.ro>
I. Introduction
===============
kmemtrace helps kernel developers figure out two things:
1) how different allocators (SLAB, SLUB etc.) perform
2) how kernel code allocates memory and how much
To do this, we trace every allocation and export information to the userspace
through the relay interface. We export things such as the number of requested
bytes, the number of bytes actually allocated (i.e. including internal
fragmentation), whether this is a slab allocation or a plain kmalloc() and so
on.
The actual analysis is performed by a userspace tool (see section III for
details on where to get it from). It logs the data exported by the kernel,
processes it and (as of writing this) can provide the following information:
- the total amount of memory allocated and fragmentation per call-site
- the amount of memory allocated and fragmentation per allocation
- total memory allocated and fragmentation in the collected dataset
- number of cross-CPU allocations and frees (makes sense in NUMA environments)
Moreover, it can potentially find inconsistent and erroneous behavior in
kernel code, such as using slab free functions on kmalloc'ed memory or
allocating less memory than requested (but not truly failed allocations).
kmemtrace also makes provisions for tracing on some arch and analysing the
data on another.
II. Design and goals
====================
kmemtrace was designed to handle rather large amounts of data. Thus, it uses
the relay interface to export whatever is logged to userspace, which then
stores it. Analysis and reporting is done asynchronously, that is, after the
data is collected and stored. By design, it allows one to log and analyse
on different machines and different arches.
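The userspace side of this design can be sketched as a read helper that follows the rule from the ABI description: wait in poll() with an infinite timeout before every read(). The name relay_read is hypothetical, and pinning the reader to the matching CPU (for example with sched_setaffinity()) is left out of the sketch.

```c
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: block until fd has data, then read once.
 * Returns what read() returns, or -1 if poll() fails.
 */
static ssize_t relay_read(int fd, void *buf, size_t len)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;

    /* Infinite timeout; retry if a signal interrupts the wait. */
    do {
        rc = poll(&pfd, 1, -1);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;
    return read(fd, buf, len);
}
```

A logger would call this in a loop on each cpu<n> file and append the returned bytes to a per-CPU output file, leaving all analysis to a later pass.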
As of writing this, the ABI is not considered stable, though it might not
change much. However, no guarantees are made about compatibility yet. When
deemed stable, the ABI should still allow easy extension while maintaining
backward compatibility. This is described further in Documentation/ABI.
Summary of design goals:
- allow logging and analysis to be done across different machines
- be fast and anticipate usage in high-load environments (*)
- be reasonably extensible
- make it possible for GNU/Linux distributions to have kmemtrace
included in their repositories
(*) - one of the reasons Pekka Enberg's original userspace data analysis
tool's code was rewritten from Perl to C (although this is more than a
simple conversion)
III. Quick usage guide
======================
1) Get a kernel that supports kmemtrace and build it accordingly (i.e. enable
CONFIG_KMEMTRACE).
2) Get the userspace tool and build it:
$ git clone git://repo.or.cz/kmemtrace-user.git # current repository
$ cd kmemtrace-user/
$ ./autogen.sh
$ ./configure
$ make
3) Boot the kmemtrace-enabled kernel if you haven't, preferably in the
'single' runlevel (so that relay buffers don't fill up easily), and run
kmemtrace:
# '$' does not mean user, but root here.
$ mount -t debugfs none /sys/kernel/debug
$ mount -t proc none /proc
$ cd path/to/kmemtrace-user/
$ ./kmemtraced
Wait a bit, then stop it with CTRL+C.
$ cat /sys/kernel/debug/kmemtrace/total_overruns # Check if we didn't
# overrun, should
# be zero.
$ (Optionally) [Run kmemtrace_check separately on each cpu[0-9]*.out file to
check its correctness]
$ ./kmemtrace-report
Now you should have a nice and short summary of how the allocator performs.
IV. FAQ and known issues
========================
Q: 'cat /sys/kernel/debug/kmemtrace/total_overruns' is non-zero, how do I fix
this? Should I worry?
A: If it's non-zero, this affects kmemtrace's accuracy, depending on how
large the number is. You can fix it by supplying a higher
'kmemtrace.subbufs=N' kernel parameter.
---
Q: kmemtrace_check reports errors, how do I fix this? Should I worry?
A: This is a bug and should be reported. It can occur for a variety of
reasons:
- possible bugs in relay code
- possible misuse of relay by kmemtrace
- timestamps being collected out of order
Or you may fix it yourself and send us a patch.
---
Q: kmemtrace_report shows many errors, how do I fix this? Should I worry?
A: This is a known issue and I'm working on it. These might be true errors
in kernel code, which may have inconsistent behavior (e.g. allocating memory
with kmem_cache_alloc() and freeing it with kfree()). Pekka Enberg pointed
out this behavior may work with SLAB, but may fail with other allocators.
It may also be due to lack of tracing in some unusual allocator functions.
We don't want bug reports regarding this issue yet.
---
V. See also
===========
Documentation/kernel-parameters.txt
Documentation/ABI/testing/debugfs-kmemtrace
......@@ -3368,13 +3368,6 @@ F: include/linux/kmemleak.h
F: mm/kmemleak.c
F: mm/kmemleak-test.c
KMEMTRACE
M: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
S: Maintained
F: Documentation/trace/kmemtrace.txt
F: include/linux/kmemtrace.h
F: kernel/trace/kmemtrace.c
KPROBES
M: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
M: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
......
......@@ -21,3 +21,15 @@
#ifdef CONFIG_FSL_EMB_PERF_EVENT
#include <asm/perf_event_fsl_emb.h>
#endif
#ifdef CONFIG_PERF_EVENTS
#include <asm/ptrace.h>
#include <asm/reg.h>
#define perf_arch_fetch_caller_regs(regs, __ip) \
do { \
(regs)->nip = __ip; \
(regs)->gpr[1] = *(unsigned long *)__get_SP(); \
asm volatile("mfmsr %0" : "=r" ((regs)->msr)); \
} while (0)
#endif
......@@ -127,29 +127,3 @@ _GLOBAL(__setup_cpu_power7)
_GLOBAL(__restore_cpu_power7)
/* place holder */
blr
/*
* Get a minimal set of registers for our caller's nth caller.
* r3 = regs pointer, r5 = n.
*
* We only get R1 (stack pointer), NIP (next instruction pointer)
* and LR (link register). These are all we can get in the
* general case without doing complicated stack unwinding, but
* fortunately they are enough to do a stack backtrace, which
* is all we need them for.
*/
_GLOBAL(perf_arch_fetch_caller_regs)
mr r6,r1
cmpwi r5,0
mflr r4
ble 2f
mtctr r5
1: PPC_LL r6,0(r6)
bdnz 1b
PPC_LL r4,PPC_LR_STKOFF(r6)
2: PPC_LL r7,0(r6)
PPC_LL r7,PPC_LR_STKOFF(r7)
PPC_STL r6,GPR1-STACK_FRAME_OVERHEAD(r3)
PPC_STL r4,_NIP-STACK_FRAME_OVERHEAD(r3)
PPC_STL r7,_LINK-STACK_FRAME_OVERHEAD(r3)
blr
......@@ -6,7 +6,15 @@ extern void set_perf_event_pending(void);
#define PERF_EVENT_INDEX_OFFSET 0
#ifdef CONFIG_PERF_EVENTS
#include <asm/ptrace.h>
extern void init_hw_perf_events(void);
extern void
__perf_arch_fetch_caller_regs(struct pt_regs *regs, unsigned long ip, int skip);
#define perf_arch_fetch_caller_regs(pt_regs, ip) \
__perf_arch_fetch_caller_regs(pt_regs, ip, 1);
#else
static inline void init_hw_perf_events(void) { }
#endif
......
......@@ -47,9 +47,9 @@ stack_trace_flush:
.size stack_trace_flush,.-stack_trace_flush
#ifdef CONFIG_PERF_EVENTS
.globl perf_arch_fetch_caller_regs
.type perf_arch_fetch_caller_regs,#function
perf_arch_fetch_caller_regs:
.globl __perf_arch_fetch_caller_regs
.type __perf_arch_fetch_caller_regs,#function
__perf_arch_fetch_caller_regs:
/* We always read the %pstate into %o5 since we will use
* that to construct a fake %tstate to store into the regs.
*/
......
......@@ -141,6 +141,19 @@ extern unsigned long perf_instruction_pointer(struct pt_regs *regs);
extern unsigned long perf_misc_flags(struct pt_regs *regs);
#define perf_misc_flags(regs) perf_misc_flags(regs)
#include <asm/stacktrace.h>
/*
* We abuse bit 3 from flags to pass exact information, see perf_misc_flags
* and the comment with PERF_EFLAGS_EXACT.
*/
#define perf_arch_fetch_caller_regs(regs, __ip) { \
(regs)->ip = (__ip); \
(regs)->bp = caller_frame_pointer(); \
(regs)->cs = __KERNEL_CS; \
regs->flags = 0; \
}
#else
static inline void init_hw_perf_events(void) { }
static inline void perf_events_lapic_init(void) { }
......
/*
* Copyright (C) 1991, 1992 Linus Torvalds
* Copyright (C) 2000, 2001, 2002 Andi Kleen, SuSE Labs
*/
#ifndef _ASM_X86_STACKTRACE_H
#define _ASM_X86_STACKTRACE_H
#include <linux/uaccess.h>
extern int kstack_depth_to_print;
struct thread_info;
......@@ -42,4 +49,46 @@ void dump_trace(struct task_struct *tsk, struct pt_regs *regs,
unsigned long *stack, unsigned long bp,
const struct stacktrace_ops *ops, void *data);
#ifdef CONFIG_X86_32
#define STACKSLOTS_PER_LINE 8
#define get_bp(bp) asm("movl %%ebp, %0" : "=r" (bp) :)
#else
#define STACKSLOTS_PER_LINE 4
#define get_bp(bp) asm("movq %%rbp, %0" : "=r" (bp) :)
#endif
extern void
show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
unsigned long *stack, unsigned long bp, char *log_lvl);
extern void
show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
unsigned long *sp, unsigned long bp, char *log_lvl);
extern unsigned int code_bytes;
/* The form of the top of the frame on the stack */
struct stack_frame {
struct stack_frame *next_frame;
unsigned long return_address;
};
struct stack_frame_ia32 {
u32 next_frame;
u32 return_address;
};
static inline unsigned long caller_frame_pointer(void)
{
struct stack_frame *frame;
get_bp(frame);
#ifdef CONFIG_FRAME_POINTER
frame = frame->next_frame;
#endif
return (unsigned long)frame;
}
#endif /* _ASM_X86_STACKTRACE_H */
......@@ -1613,8 +1613,6 @@ static const struct stacktrace_ops backtrace_ops = {
.walk_stack = print_context_stack_bp,
};
#include "../dumpstack.h"
static void
perf_callchain_kernel(struct pt_regs *regs, struct perf_callchain_entry *entry)
{
......@@ -1736,22 +1734,6 @@ struct perf_callchain_entry *perf_callchain(struct pt_regs *regs)
return entry;
}
void perf_arch_fetch_caller_regs(struct pt_regs *regs, unsigned long ip, int skip)
{
regs->ip = ip;
/*
* perf_arch_fetch_caller_regs adds another call, we need to increment
* the skip level
*/
regs->bp = rewind_frame_pointer(skip + 1);
regs->cs = __KERNEL_CS;
/*
* We abuse bit 3 to pass exact information, see perf_misc_flags
* and the comment with PERF_EFLAGS_EXACT.
*/
regs->flags = 0;
}
unsigned long perf_instruction_pointer(struct pt_regs *regs)
{
unsigned long ip;
......
......@@ -18,7 +18,6 @@
#include <asm/stacktrace.h>
#include "dumpstack.h"
int panic_on_unrecovered_nmi;
int panic_on_io_nmi;
......
/*
* Copyright (C) 1991, 1992 Linus Torvalds
* Copyright (C) 2000, 2001, 2002 Andi Kleen, SuSE Labs
*/
#ifndef DUMPSTACK_H
#define DUMPSTACK_H
#ifdef CONFIG_X86_32
#define STACKSLOTS_PER_LINE 8
#define get_bp(bp) asm("movl %%ebp, %0" : "=r" (bp) :)
#else
#define STACKSLOTS_PER_LINE 4
#define get_bp(bp) asm("movq %%rbp, %0" : "=r" (bp) :)
#endif
#include <linux/uaccess.h>
extern void
show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
unsigned long *stack, unsigned long bp, char *log_lvl);
extern void
show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
unsigned long *sp, unsigned long bp, char *log_lvl);
extern unsigned int code_bytes;
/* The form of the top of the frame on the stack */
struct stack_frame {
struct stack_frame *next_frame;
unsigned long return_address;
};
struct stack_frame_ia32 {
u32 next_frame;
u32 return_address;
};
static inline unsigned long rewind_frame_pointer(int n)
{
struct stack_frame *frame;
get_bp(frame);
#ifdef CONFIG_FRAME_POINTER
while (n--) {
if (probe_kernel_address(&frame->next_frame, frame))
break;
}
#endif
return (unsigned long)frame;
}
#endif /* DUMPSTACK_H */
......@@ -16,8 +16,6 @@
#include <asm/stacktrace.h>
#include "dumpstack.h"
void dump_trace(struct task_struct *task, struct pt_regs *regs,
unsigned long *stack, unsigned long bp,
......
......@@ -16,7 +16,6 @@
#include <asm/stacktrace.h>
#include "dumpstack.h"
#define N_EXCEPTION_STACKS_END \
(N_EXCEPTION_STACKS + DEBUG_STKSZ/EXCEPTION_STKSZ - 2)
......
......@@ -23,11 +23,16 @@ static int save_stack_stack(void *data, char *name)
return 0;
}
static void save_stack_address(void *data, unsigned long addr, int reliable)
static void
__save_stack_address(void *data, unsigned long addr, bool reliable, bool nosched)
{
struct stack_trace *trace = data;
#ifdef CONFIG_FRAME_POINTER
if (!reliable)
return;
#endif
if (nosched && in_sched_functions(addr))
return;
if (trace->skip > 0) {
trace->skip--;
return;
......@@ -36,20 +41,15 @@ static void save_stack_address(void *data, unsigned long addr, int reliable)
trace->entries[trace->nr_entries++] = addr;
}
static void save_stack_address(void *data, unsigned long addr, int reliable)
{
return __save_stack_address(data, addr, reliable, false);
}
static void
save_stack_address_nosched(void *data, unsigned long addr, int reliable)
{
struct stack_trace *trace = (struct stack_trace *)data;
if (!reliable)
return;
if (in_sched_functions(addr))
return;
if (trace->skip > 0) {
trace->skip--;
return;
}
if (trace->nr_entries < trace->max_entries)
trace->entries[trace->nr_entries++] = addr;
return __save_stack_address(data, addr, reliable, true);
}
static const struct stacktrace_ops save_stack_ops = {
......@@ -96,12 +96,13 @@ EXPORT_SYMBOL_GPL(save_stack_trace_tsk);
/* Userspace stacktrace - based on kernel/trace/trace_sysprof.c */
struct stack_frame {
struct stack_frame_user {
const void __user *next_fp;
unsigned long ret_addr;
};
static int copy_stack_frame(const void __user *fp, struct stack_frame *frame)
static int
copy_stack_frame(const void __user *fp, struct stack_frame_user *frame)
{
int ret;
......@@ -126,7 +127,7 @@ static inline void __save_stack_trace_user(struct stack_trace *trace)
trace->entries[trace->nr_entries++] = regs->ip;
while (trace->nr_entries < trace->max_entries) {
struct stack_frame frame;
struct stack_frame_user frame;
frame.next_fp = NULL;
frame.ret_addr = 0;
......
/*
* Copyright (C) 2008 Eduard - Gabriel Munteanu
*
* This file is released under GPL version 2.
*/
#ifndef _LINUX_KMEMTRACE_H
#define _LINUX_KMEMTRACE_H
#ifdef __KERNEL__
#include <trace/events/kmem.h>
#ifdef CONFIG_KMEMTRACE
extern void kmemtrace_init(void);
#else
static inline void kmemtrace_init(void)
{
}
#endif
#endif /* __KERNEL__ */
#endif /* _LINUX_KMEMTRACE_H */
......@@ -932,8 +932,10 @@ extern atomic_t perf_swevent_enabled[PERF_COUNT_SW_MAX];
extern void __perf_sw_event(u32, u64, int, struct pt_regs *, u64);
extern void
perf_arch_fetch_caller_regs(struct pt_regs *regs, unsigned long ip, int skip);
#ifndef perf_arch_fetch_caller_regs
static inline void
perf_arch_fetch_caller_regs(struct regs *regs, unsigned long ip) { }
#endif
/*
* Take a snapshot of the regs. Skip ip and frame pointer to
......@@ -943,31 +945,11 @@ perf_arch_fetch_caller_regs(struct pt_regs *regs, unsigned long ip, int skip);
* - bp for callchains
* - eflags, for future purposes, just in case
*/
static inline void perf_fetch_caller_regs(struct pt_regs *regs, int skip)
static inline void perf_fetch_caller_regs(struct pt_regs *regs)
{
unsigned long ip;
memset(regs, 0, sizeof(*regs));
switch (skip) {
case 1 :
ip = CALLER_ADDR0;
break;
case 2 :
ip = CALLER_ADDR1;
break;
case 3 :
ip = CALLER_ADDR2;
break;
case 4:
ip = CALLER_ADDR3;
break;
/* No need to support further for now */
default:
ip = 0;
}
return perf_arch_fetch_caller_regs(regs, ip, skip);
perf_arch_fetch_caller_regs(regs, CALLER_ADDR0);
}
static inline void
......@@ -977,7 +959,7 @@ perf_sw_event(u32 event_id, u64 nr, int nmi, struct pt_regs *regs, u64 addr)
struct pt_regs hot_regs;
if (!regs) {
perf_fetch_caller_regs(&hot_regs, 1);
perf_fetch_caller_regs(&hot_regs);
regs = &hot_regs;
}
__perf_sw_event(event_id, nr, nmi, regs, addr);
......
......@@ -14,7 +14,8 @@
#include <asm/page.h> /* kmalloc_sizes.h needs PAGE_SIZE */
#include <asm/cache.h> /* kmalloc_sizes.h needs L1_CACHE_BYTES */
#include <linux/compiler.h>
#include <linux/kmemtrace.h>
#include <trace/events/kmem.h>
#ifndef ARCH_KMALLOC_MINALIGN
/*
......
......@@ -10,9 +10,10 @@
#include <linux/gfp.h>
#include <linux/workqueue.h>
#include <linux/kobject.h>
#include <linux/kmemtrace.h>
#include <linux/kmemleak.h>
#include <trace/events/kmem.h>
enum stat_item {
ALLOC_FASTPATH, /* Allocation from cpu slab */
ALLOC_SLOWPATH, /* Allocation by getting a new cpu slab */
......
#ifndef _LINUX_TRACE_BOOT_H
#define _LINUX_TRACE_BOOT_H
#include <linux/module.h>
#include <linux/kallsyms.h>
#include <linux/init.h>
/*
* Structure which defines the trace of an initcall
* while it is called.
* You don't have to fill the func field since it is
* only used internally by the tracer.
*/
struct boot_trace_call {
pid_t caller;
char func[KSYM_SYMBOL_LEN];
};
/*
* Structure which defines the trace of an initcall
* while it returns.
*/
struct boot_trace_ret {
char func[KSYM_SYMBOL_LEN];
int result;
unsigned long long duration; /* nsecs */
};
#ifdef CONFIG_BOOT_TRACER
/* Append the traces on the ring-buffer */
extern void trace_boot_call(struct boot_trace_call *bt, initcall_t fn);
extern void trace_boot_ret(struct boot_trace_ret *bt, initcall_t fn);
/* Tells the tracer that smp_pre_initcall is finished.
* So we can start the tracing
*/
extern void start_boot_trace(void);
/* Resume the tracing of other necessary events
* such as sched switches
*/
extern void enable_boot_trace(void);
/* Suspend this tracing. Actually, only sched_switches tracing have
* to be suspended. Initcalls doesn't need it.)
*/
extern void disable_boot_trace(void);
#else
static inline
void trace_boot_call(struct boot_trace_call *bt, initcall_t fn) { }
static inline
void trace_boot_ret(struct boot_trace_ret *bt, initcall_t fn) { }
static inline void start_boot_trace(void) { }
static inline void enable_boot_trace(void) { }
static inline void disable_boot_trace(void) { }
#endif /* CONFIG_BOOT_TRACER */
#endif /* __LINUX_TRACE_BOOT_H */
......@@ -705,7 +705,7 @@ perf_trace_##call(void *__data, proto) \
int __data_size; \
int rctx; \
\
perf_fetch_caller_regs(&__regs, 1); \
perf_fetch_caller_regs(&__regs); \
\
__data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
__entry_size = ALIGN(__data_size + sizeof(*entry) + sizeof(u32),\
......
......@@ -66,11 +66,9 @@
#include <linux/ftrace.h>
#include <linux/async.h>
#include <linux/kmemcheck.h>
#include <linux/kmemtrace.h>
#include <linux/sfi.h>
#include <linux/shmem_fs.h>
#include <linux/slab.h>
#include <trace/boot.h>
#include <asm/io.h>
#include <asm/bugs.h>
......@@ -653,7 +651,6 @@ asmlinkage void __init start_kernel(void)
#endif
page_cgroup_init();
enable_debug_pagealloc();
kmemtrace_init();
kmemleak_init();
debug_objects_mem_init();
idr_init_cache();
......@@ -715,38 +712,33 @@ int initcall_debug;
core_param(initcall_debug, initcall_debug, bool, 0644);
static char msgbuf[64];
static struct boot_trace_call call;
static struct boot_trace_ret ret;
int do_one_initcall(initcall_t fn)
{
int count = preempt_count();
ktime_t calltime, delta, rettime;
unsigned long long duration;
int ret;
if (initcall_debug) {
call.caller = task_pid_nr(current);
printk("calling %pF @ %i\n", fn, call.caller);
printk("calling %pF @ %i\n", fn, task_pid_nr(current));
calltime = ktime_get();
trace_boot_call(&call, fn);
enable_boot_trace();
}
ret.result = fn();
ret = fn();
if (initcall_debug) {
disable_boot_trace();
rettime = ktime_get();
delta = ktime_sub(rettime, calltime);
ret.duration = (unsigned long long) ktime_to_ns(delta) >> 10;
trace_boot_ret(&ret, fn);
printk("initcall %pF returned %d after %Ld usecs\n", fn,
ret.result, ret.duration);
duration = (unsigned long long) ktime_to_ns(delta) >> 10;
printk("initcall %pF returned %d after %lld usecs\n", fn,
ret, duration);
}
msgbuf[0] = 0;
if (ret.result && ret.result != -ENODEV && initcall_debug)
sprintf(msgbuf, "error code %d ", ret.result);
if (ret && ret != -ENODEV && initcall_debug)
sprintf(msgbuf, "error code %d ", ret);
if (preempt_count() != count) {
strlcat(msgbuf, "preemption imbalance ", sizeof(msgbuf));
......@@ -760,7 +752,7 @@ int do_one_initcall(initcall_t fn)
printk("initcall %pF returned with %s\n", fn, msgbuf);
}
return ret.result;
return ret;
}
......@@ -880,7 +872,6 @@ static int __init kernel_init(void * unused)
smp_prepare_cpus(setup_max_cpus);
do_pre_smp_initcalls();
start_boot_trace();
smp_init();
sched_init_smp();
......
......@@ -2946,11 +2946,6 @@ __weak struct perf_callchain_entry *perf_callchain(struct pt_regs *regs)
return NULL;
}
__weak
void perf_arch_fetch_caller_regs(struct pt_regs *regs, unsigned long ip, int skip)
{
}
/*
* We assume there is only KVM supporting the callbacks.
......
......@@ -229,23 +229,6 @@ config FTRACE_SYSCALLS
help
Basic tracer to catch the syscall entry and exit events.
config BOOT_TRACER
bool "Trace boot initcalls"
select GENERIC_TRACER
select CONTEXT_SWITCH_TRACER
help
This tracer helps developers to optimize boot times: it records
the timings of the initcalls and traces key events and the identity
of tasks that can cause boot delays, such as context-switches.
Its aim is to be parsed by the scripts/bootgraph.pl tool to
produce pretty graphics about boot inefficiencies, giving a visual
representation of the delays during initcalls - but the raw
/debug/tracing/trace text output is readable too.
You must pass in initcall_debug and ftrace=initcall to the kernel
command line to enable this on bootup.
config TRACE_BRANCH_PROFILING
bool
select GENERIC_TRACER
......@@ -371,26 +354,6 @@ config STACK_TRACER
Say N if unsure.
config KMEMTRACE
bool "Trace SLAB allocations"
select GENERIC_TRACER
help
kmemtrace provides tracing for slab allocator functions, such as
kmalloc, kfree, kmem_cache_alloc, kmem_cache_free, etc. Collected
data is then fed to the userspace application in order to analyse
allocation hotspots, internal fragmentation and so on, making it
possible to see how well an allocator performs, as well as debug
and profile kernel code.
This requires an userspace application to use. See
Documentation/trace/kmemtrace.txt for more information.
Saying Y will make the kernel somewhat larger and slower. However,
if you disable kmemtrace at run-time or boot-time, the performance
impact is minimal (depending on the arch the kernel is built for).
If unsure, say N.
config WORKQUEUE_TRACER
bool "Trace workqueues"
select GENERIC_TRACER
......
......@@ -38,10 +38,8 @@ obj-$(CONFIG_SCHED_TRACER) += trace_sched_wakeup.o
obj-$(CONFIG_NOP_TRACER) += trace_nop.o
obj-$(CONFIG_STACK_TRACER) += trace_stack.o
obj-$(CONFIG_MMIOTRACE) += trace_mmiotrace.o
obj-$(CONFIG_BOOT_TRACER) += trace_boot.o
obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += trace_functions_graph.o
obj-$(CONFIG_TRACE_BRANCH_PROFILING) += trace_branch.o
obj-$(CONFIG_KMEMTRACE) += kmemtrace.o
obj-$(CONFIG_WORKQUEUE_TRACER) += trace_workqueue.o
obj-$(CONFIG_BLK_DEV_IO_TRACE) += blktrace.o
ifeq ($(CONFIG_BLOCK),y)
......
This diff is collapsed.
......@@ -4596,9 +4596,6 @@ __init static int tracer_alloc_buffers(void)
register_tracer(&nop_trace);
current_trace = &nop_trace;
#ifdef CONFIG_BOOT_TRACER
register_tracer(&boot_tracer);
#endif
/* All seems OK, enable tracing */
tracing_disabled = 0;
......
......@@ -9,10 +9,7 @@
#include <linux/mmiotrace.h>
#include <linux/tracepoint.h>
#include <linux/ftrace.h>
#include <trace/boot.h>
#include <linux/kmemtrace.h>
#include <linux/hw_breakpoint.h>
#include <linux/trace_seq.h>
#include <linux/ftrace_event.h>
......@@ -29,26 +26,15 @@ enum trace_type {
TRACE_MMIO_RW,
TRACE_MMIO_MAP,
TRACE_BRANCH,
TRACE_BOOT_CALL,
TRACE_BOOT_RET,
TRACE_GRAPH_RET,
TRACE_GRAPH_ENT,
TRACE_USER_STACK,
TRACE_KMEM_ALLOC,
TRACE_KMEM_FREE,
TRACE_BLK,
TRACE_KSYM,
__TRACE_LAST_TYPE,
};
enum kmemtrace_type_id {
KMEMTRACE_TYPE_KMALLOC = 0, /* kmalloc() or kfree(). */
KMEMTRACE_TYPE_CACHE, /* kmem_cache_*(). */
KMEMTRACE_TYPE_PAGES, /* __get_free_pages() and friends. */
};
extern struct tracer boot_tracer;
#undef __field
#define __field(type, item) type item;
......@@ -209,17 +195,11 @@ extern void __ftrace_bad_type(void);
TRACE_MMIO_RW); \
IF_ASSIGN(var, ent, struct trace_mmiotrace_map, \
TRACE_MMIO_MAP); \
IF_ASSIGN(var, ent, struct trace_boot_call, TRACE_BOOT_CALL);\
IF_ASSIGN(var, ent, struct trace_boot_ret, TRACE_BOOT_RET);\
IF_ASSIGN(var, ent, struct trace_branch, TRACE_BRANCH); \
IF_ASSIGN(var, ent, struct ftrace_graph_ent_entry, \
TRACE_GRAPH_ENT); \
IF_ASSIGN(var, ent, struct ftrace_graph_ret_entry, \
TRACE_GRAPH_RET); \
IF_ASSIGN(var, ent, struct kmemtrace_alloc_entry, \
TRACE_KMEM_ALLOC); \
IF_ASSIGN(var, ent, struct kmemtrace_free_entry, \
TRACE_KMEM_FREE); \
IF_ASSIGN(var, ent, struct ksym_trace_entry, TRACE_KSYM);\
__ftrace_bad_type(); \
} while (0)
......
/*
* ring buffer based initcalls tracer
*
* Copyright (C) 2008 Frederic Weisbecker <fweisbec@gmail.com>
*
*/
#include <linux/init.h>
#include <linux/debugfs.h>
#include <linux/ftrace.h>
#include <linux/kallsyms.h>
#include <linux/time.h>
#include "trace.h"
#include "trace_output.h"
static struct trace_array *boot_trace;
static bool pre_initcalls_finished;
/* Tells the boot tracer that the pre_smp_initcalls are finished.
* So we are ready .
* It doesn't enable sched events tracing however.
* You have to call enable_boot_trace to do so.
*/
void start_boot_trace(void)
{
pre_initcalls_finished = true;
}
void enable_boot_trace(void)
{
if (boot_trace && pre_initcalls_finished)
tracing_start_sched_switch_record();
}
void disable_boot_trace(void)
{
if (boot_trace && pre_initcalls_finished)
tracing_stop_sched_switch_record();
}
static int boot_trace_init(struct trace_array *tr)
{
boot_trace = tr;
if (!tr)
return 0;
tracing_reset_online_cpus(tr);
tracing_sched_switch_assign_trace(tr);
return 0;
}
static enum print_line_t
initcall_call_print_line(struct trace_iterator *iter)
{
struct trace_entry *entry = iter->ent;
struct trace_seq *s = &iter->seq;
struct trace_boot_call *field;
struct boot_trace_call *call;
u64 ts;
unsigned long nsec_rem;
int ret;
trace_assign_type(field, entry);
call = &field->boot_call;
ts = iter->ts;
nsec_rem = do_div(ts, NSEC_PER_SEC);
ret = trace_seq_printf(s, "[%5ld.%09ld] calling %s @ %i\n",
(unsigned long)ts, nsec_rem, call->func, call->caller);
if (!ret)
return TRACE_TYPE_PARTIAL_LINE;
else
return TRACE_TYPE_HANDLED;
}
static enum print_line_t
initcall_ret_print_line(struct trace_iterator *iter)
{
struct trace_entry *entry = iter->ent;
struct trace_seq *s = &iter->seq;
struct trace_boot_ret *field;
struct boot_trace_ret *init_ret;
u64 ts;
unsigned long nsec_rem;
int ret;
trace_assign_type(field, entry);
init_ret = &field->boot_ret;
ts = iter->ts;
nsec_rem = do_div(ts, NSEC_PER_SEC);
ret = trace_seq_printf(s, "[%5ld.%09ld] initcall %s "
"returned %d after %llu msecs\n",
(unsigned long) ts,
nsec_rem,
init_ret->func, init_ret->result, init_ret->duration);
if (!ret)
return TRACE_TYPE_PARTIAL_LINE;
else
return TRACE_TYPE_HANDLED;
}
static enum print_line_t initcall_print_line(struct trace_iterator *iter)
{
struct trace_entry *entry = iter->ent;
switch (entry->type) {
case TRACE_BOOT_CALL:
return initcall_call_print_line(iter);
case TRACE_BOOT_RET:
return initcall_ret_print_line(iter);
default:
return TRACE_TYPE_UNHANDLED;
}
}
struct tracer boot_tracer __read_mostly =
{
.name = "initcall",
.init = boot_trace_init,
.reset = tracing_reset_online_cpus,
.print_line = initcall_print_line,
};
void trace_boot_call(struct boot_trace_call *bt, initcall_t fn)
{
struct ftrace_event_call *call = &event_boot_call;
struct ring_buffer_event *event;
struct ring_buffer *buffer;
struct trace_boot_call *entry;
struct trace_array *tr = boot_trace;
if (!tr || !pre_initcalls_finished)
return;
/* Get its name now since this function could
* disappear because it is in the .init section.
*/
sprint_symbol(bt->func, (unsigned long)fn);
preempt_disable();
buffer = tr->buffer;
event = trace_buffer_lock_reserve(buffer, TRACE_BOOT_CALL,
sizeof(*entry), 0, 0);
if (!event)
goto out;
entry = ring_buffer_event_data(event);
entry->boot_call = *bt;
if (!filter_check_discard(call, entry, buffer, event))
trace_buffer_unlock_commit(buffer, event, 0, 0);
out:
preempt_enable();
}
void trace_boot_ret(struct boot_trace_ret *bt, initcall_t fn)
{
struct ftrace_event_call *call = &event_boot_ret;
struct ring_buffer_event *event;
struct ring_buffer *buffer;
struct trace_boot_ret *entry;
struct trace_array *tr = boot_trace;
if (!tr || !pre_initcalls_finished)
return;
sprint_symbol(bt->func, (unsigned long)fn);
preempt_disable();
buffer = tr->buffer;
event = trace_buffer_lock_reserve(buffer, TRACE_BOOT_RET,
sizeof(*entry), 0, 0);
if (!event)
goto out;
entry = ring_buffer_event_data(event);
entry->boot_ret = *bt;
if (!filter_check_discard(call, entry, buffer, event))
trace_buffer_unlock_commit(buffer, event, 0, 0);
out:
preempt_enable();
}
......@@ -271,33 +271,6 @@ FTRACE_ENTRY(mmiotrace_map, trace_mmiotrace_map,
__entry->map_id, __entry->opcode)
);
FTRACE_ENTRY(boot_call, trace_boot_call,
TRACE_BOOT_CALL,
F_STRUCT(
__field_struct( struct boot_trace_call, boot_call )
__field_desc( pid_t, boot_call, caller )
__array_desc( char, boot_call, func, KSYM_SYMBOL_LEN)
),
F_printk("%d %s", __entry->caller, __entry->func)
);
FTRACE_ENTRY(boot_ret, trace_boot_ret,
TRACE_BOOT_RET,
F_STRUCT(
__field_struct( struct boot_trace_ret, boot_ret )
__array_desc( char, boot_ret, func, KSYM_SYMBOL_LEN)
__field_desc( int, boot_ret, result )
__field_desc( unsigned long, boot_ret, duration )
),
F_printk("%s %d %lx",
__entry->func, __entry->result, __entry->duration)
);
#define TRACE_FUNC_SIZE 30
#define TRACE_FILE_SIZE 20
......@@ -318,41 +291,6 @@ FTRACE_ENTRY(branch, trace_branch,
__entry->func, __entry->file, __entry->correct)
);
FTRACE_ENTRY(kmem_alloc, kmemtrace_alloc_entry,
TRACE_KMEM_ALLOC,
F_STRUCT(
__field( enum kmemtrace_type_id, type_id )
__field( unsigned long, call_site )
__field( const void *, ptr )
__field( size_t, bytes_req )
__field( size_t, bytes_alloc )
__field( gfp_t, gfp_flags )
__field( int, node )
),
F_printk("type:%u call_site:%lx ptr:%p req:%zi alloc:%zi"
" flags:%x node:%d",
__entry->type_id, __entry->call_site, __entry->ptr,
__entry->bytes_req, __entry->bytes_alloc,
__entry->gfp_flags, __entry->node)
);
FTRACE_ENTRY(kmem_free, kmemtrace_free_entry,
TRACE_KMEM_FREE,
F_STRUCT(
__field( enum kmemtrace_type_id, type_id )
__field( unsigned long, call_site )
__field( const void *, ptr )
),
F_printk("type:%u call_site:%lx ptr:%p",
__entry->type_id, __entry->call_site, __entry->ptr)
);
FTRACE_ENTRY(ksym_trace, ksym_trace_entry,
TRACE_KSYM,
......
......@@ -9,8 +9,6 @@
#include <linux/kprobes.h>
#include "trace.h"
EXPORT_SYMBOL_GPL(perf_arch_fetch_caller_regs);
static char *perf_trace_buf[4];
/*
......
......@@ -33,12 +33,13 @@ static DEFINE_MUTEX(sample_timer_lock);
*/
static DEFINE_PER_CPU(struct hrtimer, stack_trace_hrtimer);
struct stack_frame {
struct stack_frame_user {
const void __user *next_fp;
unsigned long return_address;
};
static int copy_stack_frame(const void __user *fp, struct stack_frame *frame)
static int
copy_stack_frame(const void __user *fp, struct stack_frame_user *frame)
{
int ret;
......@@ -125,7 +126,7 @@ trace_kernel(struct pt_regs *regs, struct trace_array *tr,
static void timer_notify(struct pt_regs *regs, int cpu)
{
struct trace_array_cpu *data;
struct stack_frame frame;
struct stack_frame_user frame;
struct trace_array *tr;
const void __user *fp;
int is_user;
......
......@@ -102,7 +102,6 @@
#include <linux/cpu.h>
#include <linux/sysctl.h>
#include <linux/module.h>
#include <linux/kmemtrace.h>
#include <linux/rcupdate.h>
#include <linux/string.h>
#include <linux/uaccess.h>
......
......@@ -66,8 +66,10 @@
#include <linux/module.h>
#include <linux/rcupdate.h>
#include <linux/list.h>
#include <linux/kmemtrace.h>
#include <linux/kmemleak.h>
#include <trace/events/kmem.h>
#include <asm/atomic.h>
/*
......
......@@ -17,7 +17,6 @@
#include <linux/slab.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/kmemtrace.h>
#include <linux/kmemcheck.h>
#include <linux/cpu.h>
#include <linux/cpuset.h>
......