Commit 0d99b708 authored by Linus Torvalds's avatar Linus Torvalds

Merge branches 'perf-urgent-for-linus' and 'perf-core-for-linus' of...

Merge branches 'perf-urgent-for-linus' and 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf changes from Ingo Molnar:
 "As a first remark I'd like to point out that the obsolete '-f'
  (--force) option, which has not done anything for several releases,
  has been removed from 'perf record' and related utilities.  Everyone
  please update muscle memory accordingly! :-)

  Main changes on the perf kernel side:

   - Performance optimizations:
        . for trace events, by Steve Rostedt.
        . for time values, by Peter Zijlstra

   - New hardware support:
        . for Intel Silvermont (22nm Atom) CPUs, by Zheng Yan
        . for Intel SNB-EP uncore PMUs, by Zheng Yan

   - Enhanced hardware support:
        . for Intel uncore PMUs: add filter support for QPI boxes, by Zheng Yan

   - Core perf events code enhancements and fixes:
        . for full-nohz feature handling, by Frederic Weisbecker
        . for group events, by Jiri Olsa
        . for call chains, by Frederic Weisbecker
        . for event stream parsing, by Adrian Hunter

   - New ABI details:
        . Add attr->mmap2 attribute, by Stephane Eranian
        . Add PERF_EVENT_IOC_ID ioctl to return event ID, by Jiri Olsa
        . Export u64 time_zero on the mmap header page to allow TSC
          calculation, by Adrian Hunter
        . Add dummy software event, by Adrian Hunter.
        . Add a new PERF_SAMPLE_IDENTIFIER to make samples always
          parseable, by Adrian Hunter.
        . Make Power7 events available via sysfs, by Runzhen Wang.

   - Code cleanups and refactorings:
        . for nohz-full, by Frederic Weisbecker
        . for group events, by Jiri Olsa

   - Documentation updates:
        . for perf_event_type, by Peter Zijlstra

  Main changes on the perf tooling side (some of these tooling changes
  utilize the above kernel side changes):

   - Lots of 'perf trace' enhancements:

        . Make 'perf trace' command line arguments consistent with
          'perf record', by David Ahern.

        . Allow specifying syscalls a la strace, by Arnaldo Carvalho de Melo.

        . Add --verbose and -o/--output options, by Arnaldo Carvalho de Melo.

        . Support ! in -e expressions, to filter a list of syscalls,
          by Arnaldo Carvalho de Melo.

        . Arg formatting improvements to allow masking arguments in
          syscalls such as futex and open, where the some arguments are
          ignored and thus should not be printed depending on other args,
          by Arnaldo Carvalho de Melo.

        . Beautify futex open, openat, open_by_handle_at, lseek and futex
          syscalls, by Arnaldo Carvalho de Melo.

        . Add option to analyze events in a file versus live, so that
          one can do:

           [root@zoo ~]# perf record -a -e raw_syscalls:* sleep 1
           [ perf record: Woken up 0 times to write data ]
           [ perf record: Captured and wrote 25.150 MB perf.data (~1098836 samples) ]
           [root@zoo ~]# perf trace -i perf.data -e futex --duration 1
              17.799 ( 1.020 ms): 7127 futex(uaddr: 0x7fff3f6c6674, op: 393, val: 1, utime: 0x7fff3f6c6470, ua
             113.344 (95.429 ms): 7127 futex(uaddr: 0x7fff3f6c6674, op: 393, val: 1, utime: 0x7fff3f6c6470, uaddr2: 0x7fff3f6c6648, val3: 4294967
             133.778 ( 1.042 ms): 18004 futex(uaddr: 0x7fff3f6c6674, op: 393, val: 1, utime: 0x7fff3f6c6470, uaddr2: 0x7fff3f6c6648, val3: 429496
           [root@zoo ~]#

          By David Ahern.

        . Honor target pid / tid options when analyzing a file, by David Ahern.

        . Introduce better formatting of syscall arguments, including so
          far beautifiers for mmap, madvise, syscall return values,
          by Arnaldo Carvalho de Melo.

        . Handle HUGEPAGE defines in the mmap beautifier, by David Ahern.

   - 'perf report/top' enhancements:

        . Do annotation using /proc/kcore and /proc/kallsyms when
          available, removing the forced need for a vmlinux file kernel
          assembly annotation. This also improves this use case because
          vmlinux has just the initial kernel image, not what is actually
          in use after various code patchings by things like alternatives.
          By Adrian Hunter.

        . Add --ignore-callees=<regex> option to collapse undesired parts
          of call graphs, by Greg Price.

        . Simplify symbol filtering by doing it at machine class level,
          by Adrian Hunter.

        . Add support for callchains in the gtk UI, by Namhyung Kim.

        . Add --objdump option to 'perf top', by Sukadev Bhattiprolu.

   - 'perf kvm' enhancements:

        . Add option to print only events that exceed a specified time
          duration, by David Ahern.

        . Improve stack trace printing, by David Ahern.

        . Update documentation of the live command, by David Ahern

        . Add perf kvm stat live mode that combines aspects of 'perf kvm
          stat' record and report, by David Ahern.

        . Add option to analyze specific VM in perf kvm stat report, by
          David Ahern.

        . Do not require /lib/modules/* on a guest, by Jason Wessel.

   - 'perf script' enhancements:

        . Fix symbol offset computation for some dsos, by David Ahern.

        . Fix named threads support, by David Ahern.

        . Don't install scripting files files when perl/python support
          is disabled, by Arnaldo Carvalho de Melo.

   - 'perf test' enhancements:

        . Add various improvements and fixes to the "vmlinux matches
          kallsyms" 'perf test' entry, related to the /proc/kcore
          annotation feature. By Adrian Hunter.

        . Add sample parsing test, by Adrian Hunter.

        . Add test for reading object code, by Adrian Hunter.

        . Add attr record group sampling test, by Jiri Olsa.

        . Misc testing infrastructure improvements and other details,
          by Jiri Olsa.

   - 'perf list' enhancements:

        . Skip unsupported hardware events, by Namhyung Kim.

        . List pmu events, by Andi Kleen.

   - 'perf diff' enhancements:

        . Add support for more than two files comparison, by Jiri Olsa.

   - 'perf sched' enhancements:

        . Various improvements, including removing reliance on some
          scheduler tracepoints that provide the same information as the
          PERF_RECORD_{FORK,EXIT} events. By David Ahern.

        . Remove odd build stall by moving a large struct initialization
          from a local variable to a global one, by Namhyung Kim.

   - 'perf stat' enhancements:

        . Add --initial-delay option to skip measuring for a defined
          startup phase, by Andi Kleen.

   - Generic perf tooling infrastructure/plumbing changes:

        . Tidy up sample parsing validation, by Adrian Hunter.

        . Fix up jobserver setup in libtraceevent Makefile.
          by Arnaldo Carvalho de Melo.

        . Debug improvements, by Adrian Hunter.

        . Fix correlation of samples coming after PERF_RECORD_EXIT event,
          by David Ahern.

        . Improve robustness of the topology parsing code,
          by Stephane Eranian.

        . Add group leader sampling, that allows just one event in a group
          to sample while the other events have just its values read,
          by Jiri Olsa.

        . Add support for a new modifier "D", which requests that the
          event, or group of events, be pinned to the PMU.
          By Michael Ellerman.

        . Support callchain sorting based on addresses, by Andi Kleen

        . Prep work for multi perf data file storage, by Jiri Olsa.

        . libtraceevent cleanups, by Namhyung Kim.

  And lots and lots of other fixes and code reorganizations that did not
  make it into the list, see the shortlog, diffstat and the Git log for
  details!"

[ Also merge a leftover from the 3.11 cycle ]

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Prevent race in unthrottling code

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (237 commits)
  perf trace: Tell arg formatters the arg index
  perf trace: Add beautifier for open's flags arg
  perf trace: Add beautifier for lseek's whence arg
  perf tools: Fix symbol offset computation for some dsos
  perf list: Skip unsupported events
  perf tests: Add 'keep tracking' test
  perf tools: Add support for PERF_COUNT_SW_DUMMY
  perf: Add a dummy software event to keep tracking
  perf trace: Add beautifier for futex 'operation' parm
  perf trace: Allow syscall arg formatters to mask args
  perf: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node()
  perf: Export struct perf_branch_entry to userspace
  perf: Add attr->mmap2 attribute to an event
  perf/x86: Add Silvermont (22nm Atom) support
  perf/x86: use INTEL_UEVENT_EXTRA_REG to define MSR_OFFCORE_RSP_X
  perf trace: Handle missing HUGEPAGE defines
  perf trace: Honor target pid / tid options when analyzing a file
  perf trace: Add option to analyze events in a file versus live
  perf evlist: Add tracepoint lookup by name
  perf tests: Add a sample parsing test
  ...
......@@ -138,11 +138,11 @@ extern ssize_t power_events_sysfs_show(struct device *dev,
#define EVENT_PTR(_id, _suffix) &EVENT_VAR(_id, _suffix).attr.attr
#define EVENT_ATTR(_name, _id, _suffix) \
PMU_EVENT_ATTR(_name, EVENT_VAR(_id, _suffix), PME_PM_##_id, \
PMU_EVENT_ATTR(_name, EVENT_VAR(_id, _suffix), PME_##_id, \
power_events_sysfs_show)
#define GENERIC_EVENT_ATTR(_name, _id) EVENT_ATTR(_name, _id, _g)
#define GENERIC_EVENT_PTR(_id) EVENT_PTR(_id, _g)
#define POWER_EVENT_ATTR(_name, _id) EVENT_ATTR(PM_##_name, _id, _p)
#define POWER_EVENT_ATTR(_name, _id) EVENT_ATTR(_name, _id, _p)
#define POWER_EVENT_PTR(_id) EVENT_PTR(_id, _p)
This diff is collapsed.
......@@ -53,37 +53,13 @@
/*
* Power7 event codes.
*/
#define PME_PM_CYC 0x1e
#define PME_PM_GCT_NOSLOT_CYC 0x100f8
#define PME_PM_CMPLU_STALL 0x4000a
#define PME_PM_INST_CMPL 0x2
#define PME_PM_LD_REF_L1 0xc880
#define PME_PM_LD_MISS_L1 0x400f0
#define PME_PM_BRU_FIN 0x10068
#define PME_PM_BR_MPRED 0x400f6
#define PME_PM_CMPLU_STALL_FXU 0x20014
#define PME_PM_CMPLU_STALL_DIV 0x40014
#define PME_PM_CMPLU_STALL_SCALAR 0x40012
#define PME_PM_CMPLU_STALL_SCALAR_LONG 0x20018
#define PME_PM_CMPLU_STALL_VECTOR 0x2001c
#define PME_PM_CMPLU_STALL_VECTOR_LONG 0x4004a
#define PME_PM_CMPLU_STALL_LSU 0x20012
#define PME_PM_CMPLU_STALL_REJECT 0x40016
#define PME_PM_CMPLU_STALL_ERAT_MISS 0x40018
#define PME_PM_CMPLU_STALL_DCACHE_MISS 0x20016
#define PME_PM_CMPLU_STALL_STORE 0x2004a
#define PME_PM_CMPLU_STALL_THRD 0x1001c
#define PME_PM_CMPLU_STALL_IFU 0x4004c
#define PME_PM_CMPLU_STALL_BRU 0x4004e
#define PME_PM_GCT_NOSLOT_IC_MISS 0x2001a
#define PME_PM_GCT_NOSLOT_BR_MPRED 0x4001a
#define PME_PM_GCT_NOSLOT_BR_MPRED_IC_MISS 0x4001c
#define PME_PM_GRP_CMPL 0x30004
#define PME_PM_1PLUS_PPC_CMPL 0x100f2
#define PME_PM_CMPLU_STALL_DFU 0x2003c
#define PME_PM_RUN_CYC 0x200f4
#define PME_PM_RUN_INST_CMPL 0x400fa
#define EVENT(_name, _code) \
PME_##_name = _code,
enum {
#include "power7-events-list.h"
};
#undef EVENT
/*
* Layout of constraint bits:
......@@ -398,96 +374,36 @@ static int power7_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
};
GENERIC_EVENT_ATTR(cpu-cycles, CYC);
GENERIC_EVENT_ATTR(stalled-cycles-frontend, GCT_NOSLOT_CYC);
GENERIC_EVENT_ATTR(stalled-cycles-backend, CMPLU_STALL);
GENERIC_EVENT_ATTR(instructions, INST_CMPL);
GENERIC_EVENT_ATTR(cache-references, LD_REF_L1);
GENERIC_EVENT_ATTR(cache-misses, LD_MISS_L1);
GENERIC_EVENT_ATTR(branch-instructions, BRU_FIN);
GENERIC_EVENT_ATTR(branch-misses, BR_MPRED);
POWER_EVENT_ATTR(CYC, CYC);
POWER_EVENT_ATTR(GCT_NOSLOT_CYC, GCT_NOSLOT_CYC);
POWER_EVENT_ATTR(CMPLU_STALL, CMPLU_STALL);
POWER_EVENT_ATTR(INST_CMPL, INST_CMPL);
POWER_EVENT_ATTR(LD_REF_L1, LD_REF_L1);
POWER_EVENT_ATTR(LD_MISS_L1, LD_MISS_L1);
POWER_EVENT_ATTR(BRU_FIN, BRU_FIN)
POWER_EVENT_ATTR(BR_MPRED, BR_MPRED);
POWER_EVENT_ATTR(CMPLU_STALL_FXU, CMPLU_STALL_FXU);
POWER_EVENT_ATTR(CMPLU_STALL_DIV, CMPLU_STALL_DIV);
POWER_EVENT_ATTR(CMPLU_STALL_SCALAR, CMPLU_STALL_SCALAR);
POWER_EVENT_ATTR(CMPLU_STALL_SCALAR_LONG, CMPLU_STALL_SCALAR_LONG);
POWER_EVENT_ATTR(CMPLU_STALL_VECTOR, CMPLU_STALL_VECTOR);
POWER_EVENT_ATTR(CMPLU_STALL_VECTOR_LONG, CMPLU_STALL_VECTOR_LONG);
POWER_EVENT_ATTR(CMPLU_STALL_LSU, CMPLU_STALL_LSU);
POWER_EVENT_ATTR(CMPLU_STALL_REJECT, CMPLU_STALL_REJECT);
POWER_EVENT_ATTR(CMPLU_STALL_ERAT_MISS, CMPLU_STALL_ERAT_MISS);
POWER_EVENT_ATTR(CMPLU_STALL_DCACHE_MISS, CMPLU_STALL_DCACHE_MISS);
POWER_EVENT_ATTR(CMPLU_STALL_STORE, CMPLU_STALL_STORE);
POWER_EVENT_ATTR(CMPLU_STALL_THRD, CMPLU_STALL_THRD);
POWER_EVENT_ATTR(CMPLU_STALL_IFU, CMPLU_STALL_IFU);
POWER_EVENT_ATTR(CMPLU_STALL_BRU, CMPLU_STALL_BRU);
POWER_EVENT_ATTR(GCT_NOSLOT_IC_MISS, GCT_NOSLOT_IC_MISS);
POWER_EVENT_ATTR(GCT_NOSLOT_BR_MPRED, GCT_NOSLOT_BR_MPRED);
POWER_EVENT_ATTR(GCT_NOSLOT_BR_MPRED_IC_MISS, GCT_NOSLOT_BR_MPRED_IC_MISS);
POWER_EVENT_ATTR(GRP_CMPL, GRP_CMPL);
POWER_EVENT_ATTR(1PLUS_PPC_CMPL, 1PLUS_PPC_CMPL);
POWER_EVENT_ATTR(CMPLU_STALL_DFU, CMPLU_STALL_DFU);
POWER_EVENT_ATTR(RUN_CYC, RUN_CYC);
POWER_EVENT_ATTR(RUN_INST_CMPL, RUN_INST_CMPL);
GENERIC_EVENT_ATTR(cpu-cycles, PM_CYC);
GENERIC_EVENT_ATTR(stalled-cycles-frontend, PM_GCT_NOSLOT_CYC);
GENERIC_EVENT_ATTR(stalled-cycles-backend, PM_CMPLU_STALL);
GENERIC_EVENT_ATTR(instructions, PM_INST_CMPL);
GENERIC_EVENT_ATTR(cache-references, PM_LD_REF_L1);
GENERIC_EVENT_ATTR(cache-misses, PM_LD_MISS_L1);
GENERIC_EVENT_ATTR(branch-instructions, PM_BRU_FIN);
GENERIC_EVENT_ATTR(branch-misses, PM_BR_MPRED);
#define EVENT(_name, _code) POWER_EVENT_ATTR(_name, _name);
#include "power7-events-list.h"
#undef EVENT
#define EVENT(_name, _code) POWER_EVENT_PTR(_name),
static struct attribute *power7_events_attr[] = {
GENERIC_EVENT_PTR(CYC),
GENERIC_EVENT_PTR(GCT_NOSLOT_CYC),
GENERIC_EVENT_PTR(CMPLU_STALL),
GENERIC_EVENT_PTR(INST_CMPL),
GENERIC_EVENT_PTR(LD_REF_L1),
GENERIC_EVENT_PTR(LD_MISS_L1),
GENERIC_EVENT_PTR(BRU_FIN),
GENERIC_EVENT_PTR(BR_MPRED),
POWER_EVENT_PTR(CYC),
POWER_EVENT_PTR(GCT_NOSLOT_CYC),
POWER_EVENT_PTR(CMPLU_STALL),
POWER_EVENT_PTR(INST_CMPL),
POWER_EVENT_PTR(LD_REF_L1),
POWER_EVENT_PTR(LD_MISS_L1),
POWER_EVENT_PTR(BRU_FIN),
POWER_EVENT_PTR(BR_MPRED),
POWER_EVENT_PTR(CMPLU_STALL_FXU),
POWER_EVENT_PTR(CMPLU_STALL_DIV),
POWER_EVENT_PTR(CMPLU_STALL_SCALAR),
POWER_EVENT_PTR(CMPLU_STALL_SCALAR_LONG),
POWER_EVENT_PTR(CMPLU_STALL_VECTOR),
POWER_EVENT_PTR(CMPLU_STALL_VECTOR_LONG),
POWER_EVENT_PTR(CMPLU_STALL_LSU),
POWER_EVENT_PTR(CMPLU_STALL_REJECT),
POWER_EVENT_PTR(CMPLU_STALL_ERAT_MISS),
POWER_EVENT_PTR(CMPLU_STALL_DCACHE_MISS),
POWER_EVENT_PTR(CMPLU_STALL_STORE),
POWER_EVENT_PTR(CMPLU_STALL_THRD),
POWER_EVENT_PTR(CMPLU_STALL_IFU),
POWER_EVENT_PTR(CMPLU_STALL_BRU),
POWER_EVENT_PTR(GCT_NOSLOT_IC_MISS),
POWER_EVENT_PTR(GCT_NOSLOT_BR_MPRED),
POWER_EVENT_PTR(GCT_NOSLOT_BR_MPRED_IC_MISS),
POWER_EVENT_PTR(GRP_CMPL),
POWER_EVENT_PTR(1PLUS_PPC_CMPL),
POWER_EVENT_PTR(CMPLU_STALL_DFU),
POWER_EVENT_PTR(RUN_CYC),
POWER_EVENT_PTR(RUN_INST_CMPL),
GENERIC_EVENT_PTR(PM_CYC),
GENERIC_EVENT_PTR(PM_GCT_NOSLOT_CYC),
GENERIC_EVENT_PTR(PM_CMPLU_STALL),
GENERIC_EVENT_PTR(PM_INST_CMPL),
GENERIC_EVENT_PTR(PM_LD_REF_L1),
GENERIC_EVENT_PTR(PM_LD_MISS_L1),
GENERIC_EVENT_PTR(PM_BRU_FIN),
GENERIC_EVENT_PTR(PM_BR_MPRED),
#include "power7-events-list.h"
#undef EVENT
NULL
};
static struct attribute_group power7_pmu_events_group = {
.name = "events",
.attrs = power7_events_attr,
......
......@@ -82,7 +82,6 @@ config X86
select HAVE_USER_RETURN_NOTIFIER
select ARCH_BINFMT_ELF_RANDOMIZE_PIE
select HAVE_ARCH_JUMP_LABEL
select HAVE_TEXT_POKE_SMP
select HAVE_GENERIC_HARDIRQS
select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
select SPARSE_IRQ
......@@ -2333,10 +2332,6 @@ config HAVE_ATOMIC_IOMAP
def_bool y
depends on X86_32
config HAVE_TEXT_POKE_SMP
bool
select STOP_MACHINE if SMP
config X86_DEV_DMA_OPS
bool
depends on X86_64 || STA2X11
......
......@@ -5,6 +5,7 @@
#include <linux/stddef.h>
#include <linux/stringify.h>
#include <asm/asm.h>
#include <asm/ptrace.h>
/*
* Alternative inline assembly for SMP.
......@@ -220,20 +221,11 @@ extern void *text_poke_early(void *addr, const void *opcode, size_t len);
* no thread can be preempted in the instructions being modified (no iret to an
* invalid instruction possible) or if the instructions are changed from a
* consistent state to another consistent state atomically.
* More care must be taken when modifying code in the SMP case because of
* Intel's errata. text_poke_smp() takes care that errata, but still
* doesn't support NMI/MCE handler code modifying.
* On the local CPU you need to be protected again NMI or MCE handlers seeing an
* inconsistent instruction while you patch.
*/
struct text_poke_param {
void *addr;
const void *opcode;
size_t len;
};
extern void *text_poke(void *addr, const void *opcode, size_t len);
extern void *text_poke_smp(void *addr, const void *opcode, size_t len);
extern void text_poke_smp_batch(struct text_poke_param *params, int n);
extern int poke_int3_handler(struct pt_regs *regs);
extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler);
#endif /* _ASM_X86_ALTERNATIVE_H */
......@@ -49,6 +49,7 @@ extern void tsc_init(void);
extern void mark_tsc_unstable(char *reason);
extern int unsynchronized_tsc(void);
extern int check_tsc_unstable(void);
extern int check_tsc_disabled(void);
extern unsigned long native_calibrate_tsc(void);
extern int tsc_clocksource_reliable;
......
......@@ -11,6 +11,7 @@
#include <linux/memory.h>
#include <linux/stop_machine.h>
#include <linux/slab.h>
#include <linux/kdebug.h>
#include <asm/alternative.h>
#include <asm/sections.h>
#include <asm/pgtable.h>
......@@ -596,97 +597,93 @@ void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
return addr;
}
/*
* Cross-modifying kernel text with stop_machine().
* This code originally comes from immediate value.
*/
static atomic_t stop_machine_first;
static int wrote_text;
static void do_sync_core(void *info)
{
sync_core();
}
struct text_poke_params {
struct text_poke_param *params;
int nparams;
};
static bool bp_patching_in_progress;
static void *bp_int3_handler, *bp_int3_addr;
static int __kprobes stop_machine_text_poke(void *data)
int poke_int3_handler(struct pt_regs *regs)
{
struct text_poke_params *tpp = data;
struct text_poke_param *p;
int i;
/* bp_patching_in_progress */
smp_rmb();
if (atomic_xchg(&stop_machine_first, 0)) {
for (i = 0; i < tpp->nparams; i++) {
p = &tpp->params[i];
text_poke(p->addr, p->opcode, p->len);
}
smp_wmb(); /* Make sure other cpus see that this has run */
wrote_text = 1;
} else {
while (!wrote_text)
cpu_relax();
smp_mb(); /* Load wrote_text before following execution */
}
if (likely(!bp_patching_in_progress))
return 0;
for (i = 0; i < tpp->nparams; i++) {
p = &tpp->params[i];
flush_icache_range((unsigned long)p->addr,
(unsigned long)p->addr + p->len);
}
/*
* Intel Archiecture Software Developer's Manual section 7.1.3 specifies
* that a core serializing instruction such as "cpuid" should be
* executed on _each_ core before the new instruction is made visible.
*/
sync_core();
return 0;
}
if (user_mode_vm(regs) || regs->ip != (unsigned long)bp_int3_addr)
return 0;
/* set up the specified breakpoint handler */
regs->ip = (unsigned long) bp_int3_handler;
return 1;
/**
* text_poke_smp - Update instructions on a live kernel on SMP
* @addr: address to modify
* @opcode: source of the copy
* @len: length to copy
*
* Modify multi-byte instruction by using stop_machine() on SMP. This allows
* user to poke/set multi-byte text on SMP. Only non-NMI/MCE code modifying
* should be allowed, since stop_machine() does _not_ protect code against
* NMI and MCE.
*
* Note: Must be called under get_online_cpus() and text_mutex.
*/
void *__kprobes text_poke_smp(void *addr, const void *opcode, size_t len)
{
struct text_poke_params tpp;
struct text_poke_param p;
p.addr = addr;
p.opcode = opcode;
p.len = len;
tpp.params = &p;
tpp.nparams = 1;
atomic_set(&stop_machine_first, 1);
wrote_text = 0;
/* Use __stop_machine() because the caller already got online_cpus. */
__stop_machine(stop_machine_text_poke, (void *)&tpp, cpu_online_mask);
return addr;
}
/**
* text_poke_smp_batch - Update instructions on a live kernel on SMP
* @params: an array of text_poke parameters
* @n: the number of elements in params.
* text_poke_bp() -- update instructions on live kernel on SMP
* @addr: address to patch
* @opcode: opcode of new instruction
* @len: length to copy
* @handler: address to jump to when the temporary breakpoint is hit
*
* Modify multi-byte instruction by using stop_machine() on SMP. Since the
* stop_machine() is heavy task, it is better to aggregate text_poke requests
* and do it once if possible.
* Modify multi-byte instruction by using int3 breakpoint on SMP.
* We completely avoid stop_machine() here, and achieve the
* synchronization using int3 breakpoint.
*
* Note: Must be called under get_online_cpus() and text_mutex.
* The way it is done:
* - add a int3 trap to the address that will be patched
* - sync cores
* - update all but the first byte of the patched range
* - sync cores
* - replace the first byte (int3) by the first byte of
* replacing opcode
* - sync cores
*
* Note: must be called under text_mutex.
*/
void __kprobes text_poke_smp_batch(struct text_poke_param *params, int n)
void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
{
struct text_poke_params tpp = {.params = params, .nparams = n};
unsigned char int3 = 0xcc;
bp_int3_handler = handler;
bp_int3_addr = (u8 *)addr + sizeof(int3);
bp_patching_in_progress = true;
/*
* Corresponding read barrier in int3 notifier for
* making sure the in_progress flags is correctly ordered wrt.
* patching
*/
smp_wmb();
text_poke(addr, &int3, sizeof(int3));
atomic_set(&stop_machine_first, 1);
wrote_text = 0;
__stop_machine(stop_machine_text_poke, (void *)&tpp, cpu_online_mask);
on_each_cpu(do_sync_core, NULL, 1);
if (len - sizeof(int3) > 0) {
/* patch all but the first byte */
text_poke((char *)addr + sizeof(int3),
(const char *) opcode + sizeof(int3),
len - sizeof(int3));
/*
* According to Intel, this core syncing is very likely
* not necessary and we'd be safe even without it. But
* better safe than sorry (plus there's not only Intel).
*/
on_each_cpu(do_sync_core, NULL, 1);
}
/* patch the first byte */
text_poke(addr, opcode, sizeof(int3));
on_each_cpu(do_sync_core, NULL, 1);
bp_patching_in_progress = false;
smp_wmb();
return addr;
}
......@@ -1884,6 +1884,7 @@ static struct pmu pmu = {
void arch_perf_update_userpage(struct perf_event_mmap_page *userpg, u64 now)
{
userpg->cap_usr_time = 0;
userpg->cap_usr_time_zero = 0;
userpg->cap_usr_rdpmc = x86_pmu.attr_rdpmc;
userpg->pmc_width = x86_pmu.cntval_bits;
......@@ -1897,6 +1898,11 @@ void arch_perf_update_userpage(struct perf_event_mmap_page *userpg, u64 now)
userpg->time_mult = this_cpu_read(cyc2ns);
userpg->time_shift = CYC2NS_SCALE_FACTOR;
userpg->time_offset = this_cpu_read(cyc2ns_offset) - now;
if (sched_clock_stable && !check_tsc_disabled()) {
userpg->cap_usr_time_zero = 1;
userpg->time_zero = this_cpu_read(cyc2ns_offset);
}
}
/*
......
......@@ -641,6 +641,8 @@ extern struct event_constraint intel_core2_pebs_event_constraints[];
extern struct event_constraint intel_atom_pebs_event_constraints[];
extern struct event_constraint intel_slm_pebs_event_constraints[];
extern struct event_constraint intel_nehalem_pebs_event_constraints[];
extern struct event_constraint intel_westmere_pebs_event_constraints[];
......
......@@ -347,8 +347,7 @@ static struct amd_nb *amd_alloc_nb(int cpu)
struct amd_nb *nb;
int i;
nb = kmalloc_node(sizeof(struct amd_nb), GFP_KERNEL | __GFP_ZERO,
cpu_to_node(cpu));
nb = kzalloc_node(sizeof(struct amd_nb), GFP_KERNEL, cpu_to_node(cpu));
if (!nb)
return NULL;
......
......@@ -81,7 +81,8 @@ static struct event_constraint intel_nehalem_event_constraints[] __read_mostly =
static struct extra_reg intel_nehalem_extra_regs[] __read_mostly =
{
INTEL_EVENT_EXTRA_REG(0xb7, MSR_OFFCORE_RSP_0, 0xffff, RSP_0),
/* must define OFFCORE_RSP_X first, see intel_fixup_er() */
INTEL_UEVENT_EXTRA_REG(0x01b7, MSR_OFFCORE_RSP_0, 0xffff, RSP_0),
INTEL_UEVENT_PEBS_LDLAT_EXTRA_REG(0x100b),
EVENT_EXTRA_END
};
......@@ -143,8 +144,9 @@ static struct event_constraint intel_ivb_event_constraints[] __read_mostly =
static struct extra_reg intel_westmere_extra_regs[] __read_mostly =
{
INTEL_EVENT_EXTRA_REG(0xb7, MSR_OFFCORE_RSP_0, 0xffff, RSP_0),
INTEL_EVENT_EXTRA_REG(0xbb, MSR_OFFCORE_RSP_1, 0xffff, RSP_1),
/* must define OFFCORE_RSP_X first, see intel_fixup_er() */
INTEL_UEVENT_EXTRA_REG(0x01b7, MSR_OFFCORE_RSP_0, 0xffff, RSP_0),
INTEL_UEVENT_EXTRA_REG(0x01bb, MSR_OFFCORE_RSP_1, 0xffff, RSP_1),
INTEL_UEVENT_PEBS_LDLAT_EXTRA_REG(0x100b),
EVENT_EXTRA_END
};
......@@ -162,16 +164,27 @@ static struct event_constraint intel_gen_event_constraints[] __read_mostly =
EVENT_CONSTRAINT_END
};
static struct event_constraint intel_slm_event_constraints[] __read_mostly =
{
FIXED_EVENT_CONSTRAINT(0x00c0, 0), /* INST_RETIRED.ANY */
FIXED_EVENT_CONSTRAINT(0x003c, 1), /* CPU_CLK_UNHALTED.CORE */
FIXED_EVENT_CONSTRAINT(0x013c, 2), /* CPU_CLK_UNHALTED.REF */
FIXED_EVENT_CONSTRAINT(0x0300, 2), /* pseudo CPU_CLK_UNHALTED.REF */
EVENT_CONSTRAINT_END
};
static struct extra_reg intel_snb_extra_regs[] __read_mostly = {
INTEL_EVENT_EXTRA_REG(0xb7, MSR_OFFCORE_RSP_0, 0x3f807f8fffull, RSP_0),
INTEL_EVENT_EXTRA_REG(0xbb, MSR_OFFCORE_RSP_1, 0x3f807f8fffull, RSP_1),
/* must define OFFCORE_RSP_X first, see intel_fixup_er() */
INTEL_UEVENT_EXTRA_REG(0x01b7, MSR_OFFCORE_RSP_0, 0x3f807f8fffull, RSP_0),
INTEL_UEVENT_EXTRA_REG(0x01bb, MSR_OFFCORE_RSP_1, 0x3f807f8fffull, RSP_1),
INTEL_UEVENT_PEBS_LDLAT_EXTRA_REG(0x01cd),
EVENT_EXTRA_END
};
static struct extra_reg intel_snbep_extra_regs[] __read_mostly = {
INTEL_EVENT_EXTRA_REG(0xb7, MSR_OFFCORE_RSP_0, 0x3fffff8fffull, RSP_0),
INTEL_EVENT_EXTRA_REG(0xbb, MSR_OFFCORE_RSP_1, 0x3fffff8fffull, RSP_1),
/* must define OFFCORE_RSP_X first, see intel_fixup_er() */
INTEL_UEVENT_EXTRA_REG(0x01b7, MSR_OFFCORE_RSP_0, 0x3fffff8fffull, RSP_0),
INTEL_UEVENT_EXTRA_REG(0x01bb, MSR_OFFCORE_RSP_1, 0x3fffff8fffull, RSP_1),
INTEL_UEVENT_PEBS_LDLAT_EXTRA_REG(0x01cd),
EVENT_EXTRA_END
};
......@@ -882,6 +895,140 @@ static __initconst const u64 atom_hw_cache_event_ids
},
};
static struct extra_reg intel_slm_extra_regs[] __read_mostly =
{
/* must define OFFCORE_RSP_X first, see intel_fixup_er() */
INTEL_UEVENT_EXTRA_REG(0x01b7, MSR_OFFCORE_RSP_0, 0x768005ffff, RSP_0),
INTEL_UEVENT_EXTRA_REG(0x02b7, MSR_OFFCORE_RSP_1, 0x768005ffff, RSP_1),
EVENT_EXTRA_END
};
#define SLM_DMND_READ SNB_DMND_DATA_RD
#define SLM_DMND_WRITE SNB_DMND_RFO
#define SLM_DMND_PREFETCH (SNB_PF_DATA_RD|SNB_PF_RFO)
#define SLM_SNP_ANY (SNB_SNP_NONE|SNB_SNP_MISS|SNB_NO_FWD|SNB_HITM)
#define SLM_LLC_ACCESS SNB_RESP_ANY
#define SLM_LLC_MISS (SLM_SNP_ANY|SNB_NON_DRAM)
static __initconst const u64 slm_hw_cache_extra_regs
[PERF_COUNT_HW_CACHE_MAX]
[PERF_COUNT_HW_CACHE_OP_MAX]
[PERF_COUNT_HW_CACHE_RESULT_MAX] =
{
[ C(LL ) ] = {
[ C(OP_READ) ] = {
[ C(RESULT_ACCESS) ] = SLM_DMND_READ|SLM_LLC_ACCESS,
[ C(RESULT_MISS) ] = SLM_DMND_READ|SLM_LLC_MISS,
},
[ C(OP_WRITE) ] = {
[ C(RESULT_ACCESS) ] = SLM_DMND_WRITE|SLM_LLC_ACCESS,
[ C(RESULT_MISS) ] = SLM_DMND_WRITE|SLM_LLC_MISS,
},
[ C(OP_PREFETCH) ] = {
[ C(RESULT_ACCESS) ] = SLM_DMND_PREFETCH|SLM_LLC_ACCESS,
[ C(RESULT_MISS) ] = SLM_DMND_PREFETCH|SLM_LLC_MISS,
},
},
};
static __initconst const u64 slm_hw_cache_event_ids
[PERF_COUNT_HW_CACHE_MAX]
[PERF_COUNT_HW_CACHE_OP_MAX]
[PERF_COUNT_HW_CACHE_RESULT_MAX] =
{
[ C(L1D) ] = {
[ C(OP_READ) ] = {
[ C(RESULT_ACCESS) ] = 0,
[ C(RESULT_MISS) ] = 0x0104, /* LD_DCU_MISS */
},
[ C(OP_WRITE) ] = {
[ C(RESULT_ACCESS) ] = 0,
[ C(RESULT_MISS) ] = 0,
},
[ C(OP_PREFETCH) ] = {
[ C(RESULT_ACCESS) ] = 0,
[ C(RESULT_MISS) ] = 0,
},
},
[ C(L1I ) ] = {
[ C(OP_READ) ] = {
[ C(RESULT_ACCESS) ] = 0x0380, /* ICACHE.ACCESSES */
[ C(RESULT_MISS) ] = 0x0280, /* ICACGE.MISSES */
},
[ C(OP_WRITE) ] = {
[ C(RESULT_ACCESS) ] = -1,
[ C(RESULT_MISS) ] = -1,
},
[ C(OP_PREFETCH) ] = {
[ C(RESULT_ACCESS) ] = 0,
[ C(RESULT_MISS) ] = 0,
},
},
[ C(LL ) ] = {
[ C(OP_READ) ] = {
/* OFFCORE_RESPONSE.ANY_DATA.LOCAL_CACHE */
[ C(RESULT_ACCESS) ] = 0x01b7,
/* OFFCORE_RESPONSE.ANY_DATA.ANY_LLC_MISS */
[ C(RESULT_MISS) ] = 0x01b7,
},
[ C(OP_WRITE) ] = {
/* OFFCORE_RESPONSE.ANY_RFO.LOCAL_CACHE */
[ C(RESULT_ACCESS) ] = 0x01b7,
/* OFFCORE_RESPONSE.ANY_RFO.ANY_LLC_MISS */
[ C(RESULT_MISS) ] = 0x01b7,
},
[ C(OP_PREFETCH) ] = {
/* OFFCORE_RESPONSE.PREFETCH.LOCAL_CACHE */
[ C(RESULT_ACCESS) ] = 0x01b7,
/* OFFCORE_RESPONSE.PREFETCH.ANY_LLC_MISS */
[ C(RESULT_MISS) ] = 0x01b7,
},
},
[ C(DTLB) ] = {
[ C(OP_READ) ] = {
[ C(RESULT_ACCESS) ] = 0,
[ C(RESULT_MISS) ] = 0x0804, /* LD_DTLB_MISS */
},
[ C(OP_WRITE) ] = {
[ C(RESULT_ACCESS) ] = 0,
[ C(RESULT_MISS) ] = 0,
},
[ C(OP_PREFETCH) ] = {
[ C(RESULT_ACCESS) ] = 0,
[ C(RESULT_MISS) ] = 0,
},
},
[ C(ITLB) ] = {
[ C(OP_READ) ] = {
[ C(RESULT_ACCESS) ] = 0x00c0, /* INST_RETIRED.ANY_P */
[ C(RESULT_MISS) ] = 0x0282, /* ITLB.MISSES */
},
[ C(OP_WRITE) ] = {
[ C(RESULT_ACCESS) ] = -1,
[ C(RESULT_MISS) ] = -1,
},
[ C(OP_PREFETCH) ] = {
[ C(RESULT_ACCESS) ] = -1,
[ C(RESULT_MISS) ] = -1,
},
},
[ C(BPU ) ] = {
[ C(OP_READ) ] = {
[ C(RESULT_ACCESS) ] = 0x00c4, /* BR_INST_RETIRED.ANY */
[ C(RESULT_MISS) ] = 0x00c5, /* BP_INST_RETIRED.MISPRED */
},
[ C(OP_WRITE) ] = {
[ C(RESULT_ACCESS) ] = -1,
[ C(RESULT_MISS) ] = -1,
},
[ C(OP_PREFETCH) ] = {
[ C(RESULT_ACCESS) ] = -1,
[ C(RESULT_MISS) ] = -1,
},
},
};
static inline bool intel_pmu_needs_lbr_smpl(struct perf_event *event)
{
/* user explicitly requested branch sampling */
......@@ -1301,11 +1448,11 @@ static void intel_fixup_er(struct perf_event *event, int idx)
if (idx == EXTRA_REG_RSP_0) {
event->hw.config &= ~INTEL_ARCH_EVENT_MASK;
event->hw.config |= 0x01b7;
event->hw.config |= x86_pmu.extra_regs[EXTRA_REG_RSP_0].event;
event->hw.extra_reg.reg = MSR_OFFCORE_RSP_0;
} else if (idx == EXTRA_REG_RSP_1) {
event->hw.config &= ~INTEL_ARCH_EVENT_MASK;
event->hw.config |= 0x01bb;
event->hw.config |= x86_pmu.extra_regs[EXTRA_REG_RSP_1].event;
event->hw.extra_reg.reg = MSR_OFFCORE_RSP_1;
}
}
......@@ -2176,6 +2323,21 @@ __init int intel_pmu_init(void)
pr_cont("Atom events, ");
break;
case 55: /* Atom 22nm "Silvermont" */
memcpy(hw_cache_event_ids, slm_hw_cache_event_ids,
sizeof(hw_cache_event_ids));
memcpy(hw_cache_extra_regs, slm_hw_cache_extra_regs,
sizeof(hw_cache_extra_regs));
intel_pmu_lbr_init_atom();
x86_pmu.event_constraints = intel_slm_event_constraints;
x86_pmu.pebs_constraints = intel_slm_pebs_event_constraints;
x86_pmu.extra_regs = intel_slm_extra_regs;
x86_pmu.er_flags |= ERF_HAS_RSP_1;
pr_cont("Silvermont events, ");
break;
case 37: /* 32 nm nehalem, "Clarkdale" */
case 44: /* 32 nm nehalem, "Gulftown" */
case 47: /* 32 nm Xeon E7 */
......
......@@ -224,7 +224,7 @@ static int alloc_pebs_buffer(int cpu)
if (!x86_pmu.pebs)
return 0;
buffer = kmalloc_node(PEBS_BUFFER_SIZE, GFP_KERNEL | __GFP_ZERO, node);
buffer = kzalloc_node(PEBS_BUFFER_SIZE, GFP_KERNEL, node);
if (unlikely(!buffer))
return -ENOMEM;
......@@ -262,7 +262,7 @@ static int alloc_bts_buffer(int cpu)
if (!x86_pmu.bts)
return 0;
buffer = kmalloc_node(BTS_BUFFER_SIZE, GFP_KERNEL | __GFP_ZERO, node);
buffer = kzalloc_node(BTS_BUFFER_SIZE, GFP_KERNEL, node);
if (unlikely(!buffer))
return -ENOMEM;
......@@ -295,7 +295,7 @@ static int alloc_ds_buffer(int cpu)
int node = cpu_to_node(cpu);
struct debug_store *ds;
ds = kmalloc_node(sizeof(*ds), GFP_KERNEL | __GFP_ZERO, node);
ds = kzalloc_node(sizeof(*ds), GFP_KERNEL, node);
if (unlikely(!ds))
return -ENOMEM;
......@@ -517,6 +517,32 @@ struct event_constraint intel_atom_pebs_event_constraints[] = {
EVENT_CONSTRAINT_END
};
struct event_constraint intel_slm_pebs_event_constraints[] = {
INTEL_UEVENT_CONSTRAINT(0x0103, 0x1), /* REHABQ.LD_BLOCK_ST_FORWARD_PS */
INTEL_UEVENT_CONSTRAINT(0x0803, 0x1), /* REHABQ.LD_SPLITS_PS */
INTEL_UEVENT_CONSTRAINT(0x0204, 0x1), /* MEM_UOPS_RETIRED.L2_HIT_LOADS_PS */
INTEL_UEVENT_CONSTRAINT(0x0404, 0x1), /* MEM_UOPS_RETIRED.L2_MISS_LOADS_PS */
INTEL_UEVENT_CONSTRAINT(0x0804, 0x1), /* MEM_UOPS_RETIRED.DTLB_MISS_LOADS_PS */
INTEL_UEVENT_CONSTRAINT(0x2004, 0x1), /* MEM_UOPS_RETIRED.HITM_PS */
INTEL_UEVENT_CONSTRAINT(0x00c0, 0x1), /* INST_RETIRED.ANY_PS */
INTEL_UEVENT_CONSTRAINT(0x00c4, 0x1), /* BR_INST_RETIRED.ALL_BRANCHES_PS */
INTEL_UEVENT_CONSTRAINT(0x7ec4, 0x1), /* BR_INST_RETIRED.JCC_PS */
INTEL_UEVENT_CONSTRAINT(0xbfc4, 0x1), /* BR_INST_RETIRED.FAR_BRANCH_PS */
INTEL_UEVENT_CONSTRAINT(0xebc4, 0x1), /* BR_INST_RETIRED.NON_RETURN_IND_PS */
INTEL_UEVENT_CONSTRAINT(0xf7c4, 0x1), /* BR_INST_RETIRED.RETURN_PS */
INTEL_UEVENT_CONSTRAINT(0xf9c4, 0x1), /* BR_INST_RETIRED.CALL_PS */
INTEL_UEVENT_CONSTRAINT(0xfbc4, 0x1), /* BR_INST_RETIRED.IND_CALL_PS */
INTEL_UEVENT_CONSTRAINT(0xfdc4, 0x1), /* BR_INST_RETIRED.REL_CALL_PS */
INTEL_UEVENT_CONSTRAINT(0xfec4, 0x1), /* BR_INST_RETIRED.TAKEN_JCC_PS */
INTEL_UEVENT_CONSTRAINT(0x00c5, 0x1), /* BR_INST_MISP_RETIRED.ALL_BRANCHES_PS */
INTEL_UEVENT_CONSTRAINT(0x7ec5, 0x1), /* BR_INST_MISP_RETIRED.JCC_PS */
INTEL_UEVENT_CONSTRAINT(0xebc5, 0x1), /* BR_INST_MISP_RETIRED.NON_RETURN_IND_PS */
INTEL_UEVENT_CONSTRAINT(0xf7c5, 0x1), /* BR_INST_MISP_RETIRED.RETURN_PS */
INTEL_UEVENT_CONSTRAINT(0xfbc5, 0x1), /* BR_INST_MISP_RETIRED.IND_CALL_PS */
INTEL_UEVENT_CONSTRAINT(0xfec5, 0x1), /* BR_INST_MISP_RETIRED.TAKEN_JCC_PS */
EVENT_CONSTRAINT_END
};
struct event_constraint intel_nehalem_pebs_event_constraints[] = {
INTEL_PLD_CONSTRAINT(0x100b, 0xf), /* MEM_INST_RETIRED.* */
INTEL_EVENT_CONSTRAINT(0x0f, 0xf), /* MEM_UNCORE_RETIRED.* */
......
......@@ -12,6 +12,15 @@
#define UNCORE_PMC_IDX_FIXED UNCORE_PMC_IDX_MAX_GENERIC
#define UNCORE_PMC_IDX_MAX (UNCORE_PMC_IDX_FIXED + 1)
#define UNCORE_PCI_DEV_DATA(type, idx) ((type << 8) | idx)
#define UNCORE_PCI_DEV_TYPE(data) ((data >> 8) & 0xff)
#define UNCORE_PCI_DEV_IDX(data) (data & 0xff)
#define UNCORE_EXTRA_PCI_DEV 0xff
#define UNCORE_EXTRA_PCI_DEV_MAX 2
/* support up to 8 sockets */
#define UNCORE_SOCKET_MAX 8
#define UNCORE_EVENT_CONSTRAINT(c, n) EVENT_CONSTRAINT(c, n, 0xff)
/* SNB event control */
......@@ -108,6 +117,7 @@
(SNBEP_PMON_CTL_EV_SEL_MASK | \
SNBEP_PCU_MSR_PMON_CTL_OCC_SEL_MASK | \
SNBEP_PMON_CTL_EDGE_DET | \
SNBEP_PMON_CTL_EV_SEL_EXT | \
SNBEP_PMON_CTL_INVERT | \
SNBEP_PCU_MSR_PMON_CTL_TRESH_MASK | \
SNBEP_PCU_MSR_PMON_CTL_OCC_INVERT | \
......
......@@ -37,7 +37,19 @@ static void __jump_label_transform(struct jump_entry *entry,
} else
memcpy(&code, ideal_nops[NOP_ATOMIC5], JUMP_LABEL_NOP_SIZE);
(*poker)((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE);
/*
* Make text_poke_bp() a default fallback poker.
*
* At the time the change is being done, just ignore whether we
* are doing nop -> jump or jump -> nop transition, and assume
* always nop being the 'currently valid' instruction
*
*/
if (poker)
(*poker)((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE);
else
text_poke_bp((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE,
(void *)entry->code + JUMP_LABEL_NOP_SIZE);
}
void arch_jump_label_transform(struct jump_entry *entry,
......@@ -45,7 +57,7 @@ void arch_jump_label_transform(struct jump_entry *entry,
{
get_online_cpus();
mutex_lock(&text_mutex);
__jump_label_transform(entry, type, text_poke_smp);
__jump_label_transform(entry, type, NULL);
mutex_unlock(&text_mutex);
put_online_cpus();
}
......
......@@ -82,14 +82,9 @@ extern void synthesize_reljump(void *from, void *to);
extern void synthesize_relcall(void *from, void *to);
#ifdef CONFIG_OPTPROBES
extern int arch_init_optprobes(void);
extern int setup_detour_execution(struct kprobe *p, struct pt_regs *regs, int reenter);
extern unsigned long __recover_optprobed_insn(kprobe_opcode_t *buf, unsigned long addr);
#else /* !CONFIG_OPTPROBES */
static inline int arch_init_optprobes(void)
{
return 0;
}
static inline int setup_detour_execution(struct kprobe *p, struct pt_regs *regs, int reenter)
{
return 0;
......
......@@ -1068,7 +1068,7 @@ int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
int __init arch_init_kprobes(void)
{
return arch_init_optprobes();
return 0;
}
int __kprobes arch_trampoline_kprobe(struct kprobe *p)
......
......@@ -371,31 +371,6 @@ int __kprobes arch_prepare_optimized_kprobe(struct optimized_kprobe *op)
return 0;
}
#define MAX_OPTIMIZE_PROBES 256
static struct text_poke_param *jump_poke_params;
static struct jump_poke_buffer {
u8 buf[RELATIVEJUMP_SIZE];
} *jump_poke_bufs;
static void __kprobes setup_optimize_kprobe(struct text_poke_param *tprm,
u8 *insn_buf,
struct optimized_kprobe *op)
{
s32 rel = (s32)((long)op->optinsn.insn -
((long)op->kp.addr + RELATIVEJUMP_SIZE));
/* Backup instructions which will be replaced by jump address */
memcpy(op->optinsn.copied_insn, op->kp.addr + INT3_SIZE,
RELATIVE_ADDR_SIZE);
insn_buf[0] = RELATIVEJUMP_OPCODE;
*(s32 *)(&insn_buf[1]) = rel;
tprm->addr = op->kp.addr;
tprm->opcode = insn_buf;
tprm->len = RELATIVEJUMP_SIZE;
}
/*
* Replace breakpoints (int3) with relative jumps.
* Caller must call with locking kprobe_mutex and text_mutex.
......@@ -403,37 +378,38 @@ static void __kprobes setup_optimize_kprobe(struct text_poke_param *tprm,
void __kprobes arch_optimize_kprobes(struct list_head *oplist)
{
struct optimized_kprobe *op, *tmp;
int c = 0;
u8 insn_buf[RELATIVEJUMP_SIZE];
list_for_each_entry_safe(op, tmp, oplist, list) {
s32 rel = (s32)((long)op->optinsn.insn -
((long)op->kp.addr + RELATIVEJUMP_SIZE));
WARN_ON(kprobe_disabled(&op->kp));
/* Setup param */
setup_optimize_kprobe(&jump_poke_params[c],
jump_poke_bufs[c].buf, op);
/* Backup instructions which will be replaced by jump address */
memcpy(op->optinsn.copied_insn, op->kp.addr + INT3_SIZE,
RELATIVE_ADDR_SIZE);
insn_buf[0] = RELATIVEJUMP_OPCODE;
*(s32 *)(&insn_buf[1]) = rel;
text_poke_bp(op->kp.addr, insn_buf, RELATIVEJUMP_SIZE,
op->optinsn.insn);
list_del_init(&op->list);
if (++c >= MAX_OPTIMIZE_PROBES)
break;
}
/*
* text_poke_smp doesn't support NMI/MCE code modifying.
* However, since kprobes itself also doesn't support NMI/MCE
* code probing, it's not a problem.
*/
text_poke_smp_batch(jump_poke_params, c);
}
static void __kprobes setup_unoptimize_kprobe(struct text_poke_param *tprm,
u8 *insn_buf,
struct optimized_kprobe *op)
/* Replace a relative jump with a breakpoint (int3). */
void __kprobes arch_unoptimize_kprobe(struct optimized_kprobe *op)
{
u8 insn_buf[RELATIVEJUMP_SIZE];
/* Set int3 to first byte for kprobes */
insn_buf[0] = BREAKPOINT_INSTRUCTION;
memcpy(insn_buf + 1, op->optinsn.copied_insn, RELATIVE_ADDR_SIZE);
tprm->addr = op->kp.addr;
tprm->opcode = insn_buf;
tprm->len = RELATIVEJUMP_SIZE;
text_poke_bp(op->kp.addr, insn_buf, RELATIVEJUMP_SIZE,
op->optinsn.insn);
}
/*
......@@ -444,34 +420,11 @@ extern void arch_unoptimize_kprobes(struct list_head *oplist,
struct list_head *done_list)
{
struct optimized_kprobe *op, *tmp;
int c = 0;
list_for_each_entry_safe(op, tmp, oplist, list) {
/* Setup param */
setup_unoptimize_kprobe(&jump_poke_params[c],
jump_poke_bufs[c].buf, op);
arch_unoptimize_kprobe(op);
list_move(&op->list, done_list);
if (++c >= MAX_OPTIMIZE_PROBES)
break;
}
/*
* text_poke_smp doesn't support NMI/MCE code modifying.
* However, since kprobes itself also doesn't support NMI/MCE
* code probing, it's not a problem.
*/
text_poke_smp_batch(jump_poke_params, c);
}
/* Replace a relative jump with a breakpoint (int3). */
void __kprobes arch_unoptimize_kprobe(struct optimized_kprobe *op)
{
u8 buf[RELATIVEJUMP_SIZE];
/* Set int3 to first byte for kprobes */
buf[0] = BREAKPOINT_INSTRUCTION;
memcpy(buf + 1, op->optinsn.copied_insn, RELATIVE_ADDR_SIZE);
text_poke_smp(op->kp.addr, buf, RELATIVEJUMP_SIZE);
}
int __kprobes
......@@ -491,22 +444,3 @@ setup_detour_execution(struct kprobe *p, struct pt_regs *regs, int reenter)
}
return 0;
}
int __kprobes arch_init_optprobes(void)
{
/* Allocate code buffer and parameter array */
jump_poke_bufs = kmalloc(sizeof(struct jump_poke_buffer) *
MAX_OPTIMIZE_PROBES, GFP_KERNEL);
if (!jump_poke_bufs)
return -ENOMEM;
jump_poke_params = kmalloc(sizeof(struct text_poke_param) *
MAX_OPTIMIZE_PROBES, GFP_KERNEL);
if (!jump_poke_params) {
kfree(jump_poke_bufs);
jump_poke_bufs = NULL;
return -ENOMEM;
}
return 0;
}
......@@ -58,6 +58,7 @@
#include <asm/mce.h>
#include <asm/fixmap.h>
#include <asm/mach_traps.h>
#include <asm/alternative.h>
#ifdef CONFIG_X86_64
#include <asm/x86_init.h>
......@@ -327,6 +328,9 @@ dotraplinkage void __kprobes notrace do_int3(struct pt_regs *regs, long error_co
ftrace_int3_handler(regs))
return;
#endif
if (poke_int3_handler(regs))
return;
prev_state = exception_enter();
#ifdef CONFIG_KGDB_LOW_LEVEL_TRAP
if (kgdb_ll_trap(DIE_INT3, "int3", regs, error_code, X86_TRAP_BP,
......
......@@ -89,6 +89,12 @@ int check_tsc_unstable(void)
}
EXPORT_SYMBOL_GPL(check_tsc_unstable);
int check_tsc_disabled(void)
{
return tsc_disabled;
}
EXPORT_SYMBOL_GPL(check_tsc_disabled);
#ifdef CONFIG_X86_TSC
int __init notsc_setup(char *str)
{
......
......@@ -63,30 +63,6 @@ struct perf_raw_record {
void *data;
};
/*
* single taken branch record layout:
*
* from: source instruction (may not always be a branch insn)
* to: branch target
* mispred: branch target was mispredicted
* predicted: branch target was predicted
*
* support for mispred, predicted is optional. In case it
* is not supported mispred = predicted = 0.
*
* in_tx: running in a hardware transaction
* abort: aborting a hardware transaction
*/
struct perf_branch_entry {
__u64 from;
__u64 to;
__u64 mispred:1, /* target mispredicted */
predicted:1,/* target predicted */
in_tx:1, /* in transaction */
abort:1, /* transaction abort */
reserved:60;
};
/*
* branch stack layout:
* nr: number of taken branches stored in entries[]
......
......@@ -1034,6 +1034,9 @@ struct task_struct {
#ifdef CONFIG_SMP
struct llist_node wake_entry;
int on_cpu;
struct task_struct *last_wakee;
unsigned long wakee_flips;
unsigned long wakee_flip_decay_ts;
#endif
int on_rq;
......
......@@ -57,7 +57,7 @@ DECLARE_EVENT_CLASS(sched_wakeup_template,
TP_PROTO(struct task_struct *p, int success),
TP_ARGS(p, success),
TP_ARGS(__perf_task(p), success),
TP_STRUCT__entry(
__array( char, comm, TASK_COMM_LEN )
......@@ -73,9 +73,6 @@ DECLARE_EVENT_CLASS(sched_wakeup_template,
__entry->prio = p->prio;
__entry->success = success;
__entry->target_cpu = task_cpu(p);
)
TP_perf_assign(
__perf_task(p);
),
TP_printk("comm=%s pid=%d prio=%d success=%d target_cpu=%03d",
......@@ -313,7 +310,7 @@ DECLARE_EVENT_CLASS(sched_stat_template,
TP_PROTO(struct task_struct *tsk, u64 delay),
TP_ARGS(tsk, delay),
TP_ARGS(__perf_task(tsk), __perf_count(delay)),
TP_STRUCT__entry(
__array( char, comm, TASK_COMM_LEN )
......@@ -325,10 +322,6 @@ DECLARE_EVENT_CLASS(sched_stat_template,
memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
__entry->pid = tsk->pid;
__entry->delay = delay;
)
TP_perf_assign(
__perf_count(delay);
__perf_task(tsk);
),
TP_printk("comm=%s pid=%d delay=%Lu [ns]",
......@@ -372,11 +365,11 @@ DEFINE_EVENT(sched_stat_template, sched_stat_blocked,
* Tracepoint for accounting runtime (time the task is executing
* on a CPU).
*/
TRACE_EVENT(sched_stat_runtime,
DECLARE_EVENT_CLASS(sched_stat_runtime,
TP_PROTO(struct task_struct *tsk, u64 runtime, u64 vruntime),
TP_ARGS(tsk, runtime, vruntime),
TP_ARGS(tsk, __perf_count(runtime), vruntime),
TP_STRUCT__entry(
__array( char, comm, TASK_COMM_LEN )
......@@ -390,9 +383,6 @@ TRACE_EVENT(sched_stat_runtime,
__entry->pid = tsk->pid;
__entry->runtime = runtime;
__entry->vruntime = vruntime;
)
TP_perf_assign(
__perf_count(runtime);
),
TP_printk("comm=%s pid=%d runtime=%Lu [ns] vruntime=%Lu [ns]",
......@@ -401,6 +391,10 @@ TRACE_EVENT(sched_stat_runtime,
(unsigned long long)__entry->vruntime)
);
DEFINE_EVENT(sched_stat_runtime, sched_stat_runtime,
TP_PROTO(struct task_struct *tsk, u64 runtime, u64 vruntime),
TP_ARGS(tsk, runtime, vruntime));
/*
* Tracepoint for showing priority inheritance modifying a tasks
* priority.
......
......@@ -507,8 +507,14 @@ static inline notrace int ftrace_get_offsets_##call( \
#undef TP_fast_assign
#define TP_fast_assign(args...) args
#undef TP_perf_assign
#define TP_perf_assign(args...)
#undef __perf_addr
#define __perf_addr(a) (a)
#undef __perf_count
#define __perf_count(c) (c)
#undef __perf_task
#define __perf_task(t) (t)
#undef DECLARE_EVENT_CLASS
#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
......@@ -636,16 +642,13 @@ __attribute__((section("_ftrace_events"))) *__event_##call = &event_##call
#define __get_str(field) (char *)__get_dynamic_array(field)
#undef __perf_addr
#define __perf_addr(a) __addr = (a)
#define __perf_addr(a) (__addr = (a))
#undef __perf_count
#define __perf_count(c) __count = (c)
#define __perf_count(c) (__count = (c))
#undef __perf_task
#define __perf_task(t) __task = (t)
#undef TP_perf_assign
#define TP_perf_assign(args...) args
#define __perf_task(t) (__task = (t))
#undef DECLARE_EVENT_CLASS
#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
......@@ -663,15 +666,20 @@ perf_trace_##call(void *__data, proto) \
int __data_size; \
int rctx; \
\
perf_fetch_caller_regs(&__regs); \
\
__data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
\
head = this_cpu_ptr(event_call->perf_events); \
if (__builtin_constant_p(!__task) && !__task && \
hlist_empty(head)) \
return; \
\
__entry_size = ALIGN(__data_size + sizeof(*entry) + sizeof(u32),\
sizeof(u64)); \
__entry_size -= sizeof(u32); \
\
entry = (struct ftrace_raw_##call *)perf_trace_buf_prepare( \
__entry_size, event_call->event.type, &__regs, &rctx); \
perf_fetch_caller_regs(&__regs); \
entry = perf_trace_buf_prepare(__entry_size, \
event_call->event.type, &__regs, &rctx); \
if (!entry) \
return; \
\
......@@ -679,7 +687,6 @@ perf_trace_##call(void *__data, proto) \
\
{ assign; } \
\
head = this_cpu_ptr(event_call->perf_events); \
perf_trace_buf_submit(entry, __entry_size, rctx, __addr, \
__count, &__regs, head, __task); \
}
......
......@@ -109,6 +109,7 @@ enum perf_sw_ids {
PERF_COUNT_SW_PAGE_FAULTS_MAJ = 6,
PERF_COUNT_SW_ALIGNMENT_FAULTS = 7,
PERF_COUNT_SW_EMULATION_FAULTS = 8,
PERF_COUNT_SW_DUMMY = 9,
PERF_COUNT_SW_MAX, /* non-ABI */
};
......@@ -134,8 +135,9 @@ enum perf_event_sample_format {
PERF_SAMPLE_STACK_USER = 1U << 13,
PERF_SAMPLE_WEIGHT = 1U << 14,
PERF_SAMPLE_DATA_SRC = 1U << 15,
PERF_SAMPLE_IDENTIFIER = 1U << 16,
PERF_SAMPLE_MAX = 1U << 16, /* non-ABI */
PERF_SAMPLE_MAX = 1U << 17, /* non-ABI */
};
/*
......@@ -275,8 +277,9 @@ struct perf_event_attr {
exclude_callchain_kernel : 1, /* exclude kernel callchains */
exclude_callchain_user : 1, /* exclude user callchains */
mmap2 : 1, /* include mmap with inode data */
__reserved_1 : 41;
__reserved_1 : 40;
union {
__u32 wakeup_events; /* wakeup every n events */
......@@ -321,6 +324,7 @@ struct perf_event_attr {
#define PERF_EVENT_IOC_PERIOD _IOW('$', 4, __u64)
#define PERF_EVENT_IOC_SET_OUTPUT _IO ('$', 5)
#define PERF_EVENT_IOC_SET_FILTER _IOW('$', 6, char *)
#define PERF_EVENT_IOC_ID _IOR('$', 7, u64 *)
enum perf_event_ioc_flags {
PERF_IOC_FLAG_GROUP = 1U << 0,
......@@ -375,9 +379,12 @@ struct perf_event_mmap_page {
__u64 time_running; /* time event on cpu */
union {
__u64 capabilities;
__u64 cap_usr_time : 1,
cap_usr_rdpmc : 1,
cap_____res : 62;
struct {
__u64 cap_usr_time : 1,
cap_usr_rdpmc : 1,
cap_usr_time_zero : 1,
cap_____res : 61;
};
};
/*
......@@ -418,12 +425,29 @@ struct perf_event_mmap_page {
__u16 time_shift;
__u32 time_mult;
__u64 time_offset;
/*
* If cap_usr_time_zero, the hardware clock (e.g. TSC) can be calculated
* from sample timestamps.
*
* time = timestamp - time_zero;
* quot = time / time_mult;
* rem = time % time_mult;
* cyc = (quot << time_shift) + (rem << time_shift) / time_mult;
*
* And vice versa:
*
* quot = cyc >> time_shift;
* rem = cyc & ((1 << time_shift) - 1);
* timestamp = time_zero + quot * time_mult +
* ((rem * time_mult) >> time_shift);
*/
__u64 time_zero;
/*
* Hole for extension of the self monitor capabilities
*/
__u64 __reserved[120]; /* align to 1k */
__u64 __reserved[119]; /* align to 1k */
/*
* Control data for the mmap() data buffer.
......@@ -471,13 +495,28 @@ enum perf_event_type {
/*
* If perf_event_attr.sample_id_all is set then all event types will
* have the sample_type selected fields related to where/when
* (identity) an event took place (TID, TIME, ID, CPU, STREAM_ID)
* described in PERF_RECORD_SAMPLE below, it will be stashed just after
* the perf_event_header and the fields already present for the existing
* fields, i.e. at the end of the payload. That way a newer perf.data
* file will be supported by older perf tools, with these new optional
* fields being ignored.
* (identity) an event took place (TID, TIME, ID, STREAM_ID, CPU,
* IDENTIFIER) described in PERF_RECORD_SAMPLE below, it will be stashed
* just after the perf_event_header and the fields already present for
* the existing fields, i.e. at the end of the payload. That way a newer
* perf.data file will be supported by older perf tools, with these new
* optional fields being ignored.
*
* struct sample_id {
* { u32 pid, tid; } && PERF_SAMPLE_TID
* { u64 time; } && PERF_SAMPLE_TIME
* { u64 id; } && PERF_SAMPLE_ID
* { u64 stream_id;} && PERF_SAMPLE_STREAM_ID
* { u32 cpu, res; } && PERF_SAMPLE_CPU
* { u64 id; } && PERF_SAMPLE_IDENTIFIER
* } && perf_event_attr::sample_id_all
*
* Note that PERF_SAMPLE_IDENTIFIER duplicates PERF_SAMPLE_ID. The
* advantage of PERF_SAMPLE_IDENTIFIER is that its position is fixed
* relative to header.size.
*/
/*
* The MMAP events record the PROT_EXEC mappings so that we can
* correlate userspace IPs to code. They have the following structure:
*
......@@ -498,6 +537,7 @@ enum perf_event_type {
* struct perf_event_header header;
* u64 id;
* u64 lost;
* struct sample_id sample_id;
* };
*/
PERF_RECORD_LOST = 2,
......@@ -508,6 +548,7 @@ enum perf_event_type {
*
* u32 pid, tid;
* char comm[];
* struct sample_id sample_id;
* };
*/
PERF_RECORD_COMM = 3,
......@@ -518,6 +559,7 @@ enum perf_event_type {
* u32 pid, ppid;
* u32 tid, ptid;
* u64 time;
* struct sample_id sample_id;
* };
*/
PERF_RECORD_EXIT = 4,
......@@ -528,6 +570,7 @@ enum perf_event_type {
* u64 time;
* u64 id;
* u64 stream_id;
* struct sample_id sample_id;
* };
*/
PERF_RECORD_THROTTLE = 5,
......@@ -539,6 +582,7 @@ enum perf_event_type {
* u32 pid, ppid;
* u32 tid, ptid;
* u64 time;
* struct sample_id sample_id;
* };
*/
PERF_RECORD_FORK = 7,
......@@ -549,6 +593,7 @@ enum perf_event_type {
* u32 pid, tid;
*
* struct read_format values;
* struct sample_id sample_id;
* };
*/
PERF_RECORD_READ = 8,
......@@ -557,6 +602,13 @@ enum perf_event_type {
* struct {
* struct perf_event_header header;
*
* #
* # Note that PERF_SAMPLE_IDENTIFIER duplicates PERF_SAMPLE_ID.
* # The advantage of PERF_SAMPLE_IDENTIFIER is that its position
* # is fixed relative to header.
* #
*
* { u64 id; } && PERF_SAMPLE_IDENTIFIER
* { u64 ip; } && PERF_SAMPLE_IP
* { u32 pid, tid; } && PERF_SAMPLE_TID
* { u64 time; } && PERF_SAMPLE_TIME
......@@ -596,11 +648,32 @@ enum perf_event_type {
* u64 dyn_size; } && PERF_SAMPLE_STACK_USER
*
* { u64 weight; } && PERF_SAMPLE_WEIGHT
* { u64 data_src; } && PERF_SAMPLE_DATA_SRC
* { u64 data_src; } && PERF_SAMPLE_DATA_SRC
* };
*/
PERF_RECORD_SAMPLE = 9,
/*
* The MMAP2 records are an augmented version of MMAP, they add
* maj, min, ino numbers to be used to uniquely identify each mapping
*
* struct {
* struct perf_event_header header;
*
* u32 pid, tid;
* u64 addr;
* u64 len;
* u64 pgoff;
* u32 maj;
* u32 min;
* u64 ino;
* u64 ino_generation;
* char filename[];
* struct sample_id sample_id;
* };
*/
PERF_RECORD_MMAP2 = 10,
PERF_RECORD_MAX, /* non-ABI */
};
......@@ -685,4 +758,28 @@ union perf_mem_data_src {
#define PERF_MEM_S(a, s) \
(((u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT)
/*
* single taken branch record layout:
*
* from: source instruction (may not always be a branch insn)
* to: branch target
* mispred: branch target was mispredicted
* predicted: branch target was predicted
*
* support for mispred, predicted is optional. In case it
* is not supported mispred = predicted = 0.
*
* in_tx: running in a hardware transaction
* abort: aborting a hardware transaction
*/
struct perf_branch_entry {
__u64 from;
__u64 to;
__u64 mispred:1, /* target mispredicted */
predicted:1,/* target predicted */
in_tx:1, /* in transaction */
abort:1, /* transaction abort */
reserved:60;
};
#endif /* _UAPI_LINUX_PERF_EVENT_H */
......@@ -116,6 +116,9 @@ int get_callchain_buffers(void)
err = alloc_callchain_buffers();
exit:
if (err)
atomic_dec(&nr_callchain_events);
mutex_unlock(&callchain_mutex);
return err;
......
This diff is collapsed.
......@@ -5133,18 +5133,23 @@ static void destroy_sched_domains(struct sched_domain *sd, int cpu)
* two cpus are in the same cache domain, see cpus_share_cache().
*/
DEFINE_PER_CPU(struct sched_domain *, sd_llc);
DEFINE_PER_CPU(int, sd_llc_size);
DEFINE_PER_CPU(int, sd_llc_id);
static void update_top_cache_domain(int cpu)
{
struct sched_domain *sd;
int id = cpu;
int size = 1;
sd = highest_flag_domain(cpu, SD_SHARE_PKG_RESOURCES);
if (sd)
if (sd) {
id = cpumask_first(sched_domain_span(sd));
size = cpumask_weight(sched_domain_span(sd));
}
rcu_assign_pointer(per_cpu(sd_llc, cpu), sd);
per_cpu(sd_llc_size, cpu) = size;
per_cpu(sd_llc_id, cpu) = id;
}
......
......@@ -3018,6 +3018,23 @@ static unsigned long cpu_avg_load_per_task(int cpu)
return 0;
}
static void record_wakee(struct task_struct *p)
{
/*
* Rough decay (wiping) for cost saving, don't worry
* about the boundary, really active task won't care
* about the loss.
*/
if (jiffies > current->wakee_flip_decay_ts + HZ) {
current->wakee_flips = 0;
current->wakee_flip_decay_ts = jiffies;
}
if (current->last_wakee != p) {
current->last_wakee = p;
current->wakee_flips++;
}
}
static void task_waking_fair(struct task_struct *p)
{
......@@ -3038,6 +3055,7 @@ static void task_waking_fair(struct task_struct *p)
#endif
se->vruntime -= min_vruntime;
record_wakee(p);
}
#ifdef CONFIG_FAIR_GROUP_SCHED
......@@ -3156,6 +3174,28 @@ static inline unsigned long effective_load(struct task_group *tg, int cpu,
#endif
static int wake_wide(struct task_struct *p)
{
int factor = this_cpu_read(sd_llc_size);
/*
* Yeah, it's the switching-frequency, could means many wakee or
* rapidly switch, use factor here will just help to automatically
* adjust the loose-degree, so bigger node will lead to more pull.
*/
if (p->wakee_flips > factor) {
/*
* wakee is somewhat hot, it needs certain amount of cpu
* resource, so if waker is far more hot, prefer to leave
* it alone.
*/
if (current->wakee_flips > (factor * p->wakee_flips))
return 1;
}
return 0;
}
static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
{
s64 this_load, load;
......@@ -3165,6 +3205,13 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
unsigned long weight;
int balanced;
/*
* If we wake multiple tasks be careful to not bounce
* ourselves around too much.
*/
if (wake_wide(p))
return 0;
idx = sd->wake_idx;
this_cpu = smp_processor_id();
prev_cpu = task_cpu(p);
......@@ -4172,47 +4219,48 @@ static void update_blocked_averages(int cpu)
}
/*
* Compute the cpu's hierarchical load factor for each task group.
* Compute the hierarchical load factor for cfs_rq and all its ascendants.
* This needs to be done in a top-down fashion because the load of a child
* group is a fraction of its parents load.
*/
static int tg_load_down(struct task_group *tg, void *data)
{
unsigned long load;
long cpu = (long)data;
if (!tg->parent) {
load = cpu_rq(cpu)->avg.load_avg_contrib;
} else {
load = tg->parent->cfs_rq[cpu]->h_load;
load = div64_ul(load * tg->se[cpu]->avg.load_avg_contrib,
tg->parent->cfs_rq[cpu]->runnable_load_avg + 1);
}
tg->cfs_rq[cpu]->h_load = load;
return 0;
}
static void update_h_load(long cpu)
static void update_cfs_rq_h_load(struct cfs_rq *cfs_rq)
{
struct rq *rq = cpu_rq(cpu);
struct rq *rq = rq_of(cfs_rq);
struct sched_entity *se = cfs_rq->tg->se[cpu_of(rq)];
unsigned long now = jiffies;
unsigned long load;
if (rq->h_load_throttle == now)
if (cfs_rq->last_h_load_update == now)
return;
rq->h_load_throttle = now;
cfs_rq->h_load_next = NULL;
for_each_sched_entity(se) {
cfs_rq = cfs_rq_of(se);
cfs_rq->h_load_next = se;
if (cfs_rq->last_h_load_update == now)
break;
}
rcu_read_lock();
walk_tg_tree(tg_load_down, tg_nop, (void *)cpu);
rcu_read_unlock();
if (!se) {
cfs_rq->h_load = rq->avg.load_avg_contrib;
cfs_rq->last_h_load_update = now;
}
while ((se = cfs_rq->h_load_next) != NULL) {
load = cfs_rq->h_load;
load = div64_ul(load * se->avg.load_avg_contrib,
cfs_rq->runnable_load_avg + 1);
cfs_rq = group_cfs_rq(se);
cfs_rq->h_load = load;
cfs_rq->last_h_load_update = now;
}
}
static unsigned long task_h_load(struct task_struct *p)
{
struct cfs_rq *cfs_rq = task_cfs_rq(p);
update_cfs_rq_h_load(cfs_rq);
return div64_ul(p->se.avg.load_avg_contrib * cfs_rq->h_load,
cfs_rq->runnable_load_avg + 1);
}
......@@ -4221,10 +4269,6 @@ static inline void update_blocked_averages(int cpu)
{
}
static inline void update_h_load(long cpu)
{
}
static unsigned long task_h_load(struct task_struct *p)
{
return p->se.avg.load_avg_contrib;
......@@ -5114,7 +5158,6 @@ static int load_balance(int this_cpu, struct rq *this_rq,
env.src_rq = busiest;
env.loop_max = min(sysctl_sched_nr_migrate, busiest->nr_running);
update_h_load(env.src_cpu);
more_balance:
local_irq_save(flags);
double_rq_lock(env.dst_rq, busiest);
......
......@@ -285,7 +285,6 @@ struct cfs_rq {
/* Required to track per-cpu representation of a task_group */
u32 tg_runnable_contrib;
unsigned long tg_load_contrib;
#endif /* CONFIG_FAIR_GROUP_SCHED */
/*
* h_load = weight * f(tg)
......@@ -294,6 +293,9 @@ struct cfs_rq {
* this group.
*/
unsigned long h_load;
u64 last_h_load_update;
struct sched_entity *h_load_next;
#endif /* CONFIG_FAIR_GROUP_SCHED */
#endif /* CONFIG_SMP */
#ifdef CONFIG_FAIR_GROUP_SCHED
......@@ -429,9 +431,6 @@ struct rq {
#ifdef CONFIG_FAIR_GROUP_SCHED
/* list of leaf cfs_rq on this cpu: */
struct list_head leaf_cfs_rq_list;
#ifdef CONFIG_SMP
unsigned long h_load_throttle;
#endif /* CONFIG_SMP */
#endif /* CONFIG_FAIR_GROUP_SCHED */
#ifdef CONFIG_RT_GROUP_SCHED
......@@ -595,6 +594,7 @@ static inline struct sched_domain *highest_flag_domain(int cpu, int flag)
}
DECLARE_PER_CPU(struct sched_domain *, sd_llc);
DECLARE_PER_CPU(int, sd_llc_size);
DECLARE_PER_CPU(int, sd_llc_id);
struct sched_group_power {
......
......@@ -553,14 +553,6 @@ void __init lockup_detector_init(void)
{
set_sample_period();
#ifdef CONFIG_NO_HZ_FULL
if (watchdog_user_enabled) {
watchdog_user_enabled = 0;
pr_warning("Disabled lockup detectors by default for full dynticks\n");
pr_warning("You can reactivate it with 'sysctl -w kernel.watchdog=1'\n");
}
#endif
if (watchdog_user_enabled)
watchdog_enable_all_cpus();
}
......@@ -3,21 +3,6 @@ include ../../scripts/Makefile.include
CC = $(CROSS_COMPILE)gcc
AR = $(CROSS_COMPILE)ar
# Makefiles suck: This macro sets a default value of $(2) for the
# variable named by $(1), unless the variable has been set by
# environment or command line. This is necessary for CC and AR
# because make sets default values, so the simpler ?= approach
# won't work as expected.
define allow-override
$(if $(or $(findstring environment,$(origin $(1))),\
$(findstring command line,$(origin $(1)))),,\
$(eval $(1) = $(2)))
endef
# Allow setting CC and AR, or setting CROSS_COMPILE as a prefix.
$(call allow-override,CC,$(CROSS_COMPILE)gcc)
$(call allow-override,AR,$(CROSS_COMPILE)ar)
# guard against environment variables
LIB_H=
LIB_OBJS=
......
......@@ -39,13 +39,8 @@ bindir_relative = bin
bindir = $(prefix)/$(bindir_relative)
man_dir = $(prefix)/share/man
man_dir_SQ = '$(subst ','\'',$(man_dir))'
html_install = $(prefix)/share/kernelshark/html
html_install_SQ = '$(subst ','\'',$(html_install))'
img_install = $(prefix)/share/kernelshark/html/images
img_install_SQ = '$(subst ','\'',$(img_install))'
export man_dir man_dir_SQ html_install html_install_SQ INSTALL
export img_install img_install_SQ
export man_dir man_dir_SQ INSTALL
export DESTDIR DESTDIR_SQ
# copy a bit from Linux kbuild
......@@ -65,7 +60,7 @@ ifeq ($(BUILD_SRC),)
ifneq ($(BUILD_OUTPUT),)
define build_output
$(if $(VERBOSE:1=),@)$(MAKE) -C $(BUILD_OUTPUT) \
$(if $(VERBOSE:1=),@)+$(MAKE) -C $(BUILD_OUTPUT) \
BUILD_SRC=$(CURDIR) -f $(CURDIR)/Makefile $1
endef
......@@ -76,10 +71,7 @@ $(if $(BUILD_OUTPUT),, \
all: sub-make
gui: force
$(call build_output, all_cmd)
$(filter-out gui,$(MAKECMDGOALS)): sub-make
$(MAKECMDGOALS): sub-make
sub-make: force
$(call build_output, $(MAKECMDGOALS))
......@@ -189,6 +181,7 @@ $(obj)/%.o: $(src)/%.c
$(Q)$(call do_compile)
PEVENT_LIB_OBJS = event-parse.o trace-seq.o parse-filter.o parse-utils.o
PEVENT_LIB_OBJS += kbuffer-parse.o
ALL_OBJS = $(PEVENT_LIB_OBJS)
......@@ -258,9 +251,6 @@ define check_deps
$(RM) $@.$$$$
endef
$(gui_deps): ks_version.h
$(non_gui_deps): tc_version.h
$(all_deps): .%.d: $(src)/%.c
$(Q)$(call check_deps)
......@@ -300,7 +290,7 @@ define do_install
$(INSTALL) $1 '$(DESTDIR_SQ)$2'
endef
install_lib: all_cmd install_plugins install_python
install_lib: all_cmd
$(Q)$(call do_install,$(LIB_FILE),$(bindir_SQ))
install: install_lib
......
......@@ -5450,10 +5450,9 @@ int pevent_register_print_function(struct pevent *pevent,
* If @id is >= 0, then it is used to find the event.
* else @sys_name and @event_name are used.
*/
int pevent_register_event_handler(struct pevent *pevent,
int id, char *sys_name, char *event_name,
pevent_event_handler_func func,
void *context)
int pevent_register_event_handler(struct pevent *pevent, int id,
const char *sys_name, const char *event_name,
pevent_event_handler_func func, void *context)
{
struct event_format *event;
struct event_handler *handle;
......
......@@ -69,6 +69,7 @@ struct trace_seq {
};
void trace_seq_init(struct trace_seq *s);
void trace_seq_reset(struct trace_seq *s);
void trace_seq_destroy(struct trace_seq *s);
extern int trace_seq_printf(struct trace_seq *s, const char *fmt, ...)
......@@ -399,6 +400,7 @@ struct pevent {
int cpus;
int long_size;
int page_size;
struct cmdline *cmdlines;
struct cmdline_list *cmdlist;
......@@ -561,7 +563,8 @@ int pevent_print_num_field(struct trace_seq *s, const char *fmt,
struct event_format *event, const char *name,
struct pevent_record *record, int err);
int pevent_register_event_handler(struct pevent *pevent, int id, char *sys_name, char *event_name,
int pevent_register_event_handler(struct pevent *pevent, int id,
const char *sys_name, const char *event_name,
pevent_event_handler_func func, void *context);
int pevent_register_print_function(struct pevent *pevent,
pevent_func_handler func,
......@@ -619,6 +622,16 @@ static inline void pevent_set_long_size(struct pevent *pevent, int long_size)
pevent->long_size = long_size;
}
static inline int pevent_get_page_size(struct pevent *pevent)
{
return pevent->page_size;
}
static inline void pevent_set_page_size(struct pevent *pevent, int _page_size)
{
pevent->page_size = _page_size;
}
static inline int pevent_is_file_bigendian(struct pevent *pevent)
{
return pevent->file_bigendian;
......
This diff is collapsed.
/*
* Copyright (C) 2012 Red Hat Inc, Steven Rostedt <srostedt@redhat.com>
*
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
* License as published by the Free Software Foundation;
* version 2.1 of the License (not later!)
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public
* License along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
*
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/
#ifndef _KBUFFER_H
#define _KBUFFER_H
#ifndef TS_SHIFT
#define TS_SHIFT 27
#endif
enum kbuffer_endian {
KBUFFER_ENDIAN_BIG,
KBUFFER_ENDIAN_LITTLE,
};
enum kbuffer_long_size {
KBUFFER_LSIZE_4,
KBUFFER_LSIZE_8,
};
enum {
KBUFFER_TYPE_PADDING = 29,
KBUFFER_TYPE_TIME_EXTEND = 30,
KBUFFER_TYPE_TIME_STAMP = 31,
};
struct kbuffer;
struct kbuffer *kbuffer_alloc(enum kbuffer_long_size size, enum kbuffer_endian endian);
void kbuffer_free(struct kbuffer *kbuf);
int kbuffer_load_subbuffer(struct kbuffer *kbuf, void *subbuffer);
void *kbuffer_read_event(struct kbuffer *kbuf, unsigned long long *ts);
void *kbuffer_next_event(struct kbuffer *kbuf, unsigned long long *ts);
unsigned long long kbuffer_timestamp(struct kbuffer *kbuf);
void *kbuffer_translate_data(int swap, void *data, unsigned int *size);
void *kbuffer_read_at_offset(struct kbuffer *kbuf, int offset, unsigned long long *ts);
int kbuffer_curr_index(struct kbuffer *kbuf);
int kbuffer_curr_offset(struct kbuffer *kbuf);
int kbuffer_curr_size(struct kbuffer *kbuf);
int kbuffer_event_size(struct kbuffer *kbuf);
int kbuffer_missed_events(struct kbuffer *kbuf);
int kbuffer_subbuffer_size(struct kbuffer *kbuf);
void kbuffer_set_old_format(struct kbuffer *kbuf);
#endif /* _K_BUFFER_H */
......@@ -48,6 +48,19 @@ void trace_seq_init(struct trace_seq *s)
s->buffer = malloc_or_die(s->buffer_size);
}
/**
* trace_seq_reset - re-initialize the trace_seq structure
* @s: a pointer to the trace_seq structure to reset
*/
void trace_seq_reset(struct trace_seq *s)
{
if (!s)
return;
TRACE_SEQ_CHECK(s);
s->len = 0;
s->readpos = 0;
}
/**
* trace_seq_destroy - free up memory of a trace_seq
* @s: a pointer to the trace_seq to free the buffer
......
......@@ -3,17 +3,17 @@ perf-diff(1)
NAME
----
perf-diff - Read two perf.data files and display the differential profile
perf-diff - Read perf.data files and display the differential profile
SYNOPSIS
--------
[verse]
'perf diff' [oldfile] [newfile]
'perf diff' [baseline file] [data file1] [[data file2] ... ]
DESCRIPTION
-----------
This command displays the performance difference amongst two perf.data files
captured via perf record.
This command displays the performance difference amongst two or more perf.data
files captured via perf record.
If no parameters are passed it will assume perf.data.old and perf.data.
......@@ -75,8 +75,6 @@ OPTIONS
-c::
--compute::
Differential computation selection - delta,ratio,wdiff (default is delta).
If '+' is specified as a first character, the output is sorted based
on the computation results.
See COMPARISON METHODS section for more info.
-p::
......@@ -87,6 +85,63 @@ OPTIONS
--formula::
Show formula for given computation.
-o::
--order::
Specify compute sorting column number.
COMPARISON
----------
The comparison is governed by the baseline file. The baseline perf.data
file is iterated for samples. All other perf.data files specified on
the command line are searched for the baseline sample pair. If the pair
is found, specified computation is made and result is displayed.
All samples from non-baseline perf.data files, that do not match any
baseline entry, are displayed with empty space within baseline column
and possible computation results (delta) in their related column.
Example files samples:
- file A with samples f1, f2, f3, f4, f6
- file B with samples f2, f4, f5
- file C with samples f1, f2, f5
Example output:
x - computation takes place for pair
b - baseline sample percentage
- perf diff A B C
baseline/A compute/B compute/C samples
---------------------------------------
b x f1
b x x f2
b f3
b x f4
b f6
x x f5
- perf diff B A C
baseline/B compute/A compute/C samples
---------------------------------------
b x x f2
b x f4
b x f5
x x f1
x f3
x f6
- perf diff C B A
baseline/C compute/B compute/A samples
---------------------------------------
b x f1
b x x f2
b x f5
x f3
x x f4
x f6
COMPARISON METHODS
------------------
delta
......@@ -96,7 +151,7 @@ If specified the 'Delta' column is displayed with value 'd' computed as:
d = A->period_percent - B->period_percent
with:
- A/B being matching hist entry from first/second file specified
- A/B being matching hist entry from data/baseline file specified
(or perf.data/perf.data.old) respectively.
- period_percent being the % of the hist entry period value within
......@@ -109,24 +164,26 @@ If specified the 'Ratio' column is displayed with value 'r' computed as:
r = A->period / B->period
with:
- A/B being matching hist entry from first/second file specified
- A/B being matching hist entry from data/baseline file specified
(or perf.data/perf.data.old) respectively.
- period being the hist entry period value
wdiff
~~~~~
wdiff:WEIGHT-B,WEIGHT-A
~~~~~~~~~~~~~~~~~~~~~~~
If specified the 'Weighted diff' column is displayed with value 'd' computed as:
d = B->period * WEIGHT-A - A->period * WEIGHT-B
- A/B being matching hist entry from first/second file specified
- A/B being matching hist entry from data/baseline file specified
(or perf.data/perf.data.old) respectively.
- period being the hist entry period value
- WEIGHT-A/WEIGHT-B being user suplied weights in the the '-c' option
behind ':' separator like '-c wdiff:1,2'.
- WIEGHT-A being the weight of the data file
- WIEGHT-B being the weight of the baseline data file
SEE ALSO
--------
......
......@@ -13,6 +13,7 @@ SYNOPSIS
{top|record|report|diff|buildid-list}
'perf kvm' [--host] [--guest] [--guestkallsyms=<path> --guestmodules=<path>
| --guestvmlinux=<path>] {top|record|report|diff|buildid-list|stat}
'perf kvm stat [record|report|live] [<options>]
DESCRIPTION
-----------
......@@ -50,6 +51,10 @@ There are a couple of variants of perf kvm:
'perf kvm stat report' reports statistical data which includes events
handled time, samples, and so on.
'perf kvm stat live' reports statistical data in a live mode (similar to
record + report but with statistical data updated live at a given display
rate).
OPTIONS
-------
-i::
......@@ -85,13 +90,50 @@ STAT REPORT OPTIONS
--vcpu=<value>::
analyze events which occures on this vcpu. (default: all vcpus)
--events=<value>::
events to be analyzed. Possible values: vmexit, mmio, ioport.
--event=<value>::
event to be analyzed. Possible values: vmexit, mmio, ioport.
(default: vmexit)
-k::
--key=<value>::
Sorting key. Possible values: sample (default, sort by samples
number), time (sort by average time).
-p::
--pid=::
Analyze events only for given process ID(s) (comma separated list).
STAT LIVE OPTIONS
-----------------
-d::
--display::
Time in seconds between display updates
-m::
--mmap-pages=::
Number of mmap data pages. Must be a power of two.
-a::
--all-cpus::
System-wide collection from all CPUs.
-p::
--pid=::
Analyze events only for given process ID(s) (comma separated list).
--vcpu=<value>::
analyze events which occures on this vcpu. (default: all vcpus)
--event=<value>::
event to be analyzed. Possible values: vmexit, mmio, ioport.
(default: vmexit)
-k::
--key=<value>::
Sorting key. Possible values: sample (default, sort by samples
number), time (sort by average time).
--duration=<value>::
Show events other than HLT that take longer than duration usecs.
SEE ALSO
--------
......
......@@ -8,7 +8,7 @@ perf-list - List all symbolic event types
SYNOPSIS
--------
[verse]
'perf list' [hw|sw|cache|tracepoint|event_glob]
'perf list' [hw|sw|cache|tracepoint|pmu|event_glob]
DESCRIPTION
-----------
......@@ -29,6 +29,8 @@ counted. The following modifiers exist:
G - guest counting (in KVM guests)
H - host counting (not in KVM guests)
p - precise level
S - read sample value (PERF_SAMPLE_READ)
D - pin the event to the PMU
The 'p' modifier can be used for specifying how precise the instruction
address should be. The 'p' modifier can be specified multiple times:
......@@ -104,6 +106,8 @@ To limit the list use:
'subsys_glob:event_glob' to filter by tracepoint subsystems such as sched,
block, etc.
. 'pmu' to print the kernel supplied PMU events.
. If none of the above is matched, it will apply the supplied glob to all
events, printing the ones that match.
......
......@@ -115,7 +115,7 @@ OPTIONS
--dump-raw-trace::
Dump raw trace in ASCII.
-g [type,min[,limit],order]::
-g [type,min[,limit],order[,key]]::
--call-graph::
Display call chains using type, min percent threshold, optional print
limit and order.
......@@ -129,12 +129,21 @@ OPTIONS
- callee: callee based call graph.
- caller: inverted caller based call graph.
Default: fractal,0.5,callee.
key can be:
- function: compare on functions
- address: compare on individual code addresses
Default: fractal,0.5,callee,function.
-G::
--inverted::
alias for inverted caller based call graph.
--ignore-callees=<regex>::
Ignore callees of the function(s) matching the given regex.
This has the effect of collecting the callers of each such
function into one place in the call-graph tree.
--pretty=<key>::
Pretty printing style. key: normal, raw
......
......@@ -132,6 +132,11 @@ is a useful mode to detect imbalance between physical cores. To enable this mod
use --per-core in addition to -a. (system-wide). The output includes the
core number and the number of online logical processors on that physical processor.
-D msecs::
--initial-delay msecs::
After starting the program, wait msecs before measuring. This is useful to
filter out the startup phase of the program, which is often very different.
EXAMPLES
--------
......
......@@ -155,6 +155,11 @@ Default is to monitor all CPUS.
Default: fractal,0.5,callee.
--ignore-callees=<regex>::
Ignore callees of the function(s) matching the given regex.
This has the effect of collecting the callers of each such
function into one place in the call-graph tree.
--percent-limit::
Do not show entries which have an overhead under that percent.
(Default: 0).
......
......@@ -23,25 +23,45 @@ analysis phases.
OPTIONS
-------
-a::
--all-cpus::
System-wide collection from all CPUs.
-e::
--expr::
List of events to show, currently only syscall names.
Prefixing with ! shows all syscalls but the ones specified. You may
need to escape it.
-o::
--output=::
Output file name.
-p::
--pid=::
Record events on existing process ID (comma separated list).
-t::
--tid=::
Record events on existing thread ID (comma separated list).
-u::
--uid=::
Record events in threads owned by uid. Name or number.
-v::
--verbose=::
Verbosity level.
-i::
--no-inherit::
Child tasks do not inherit counters.
-m::
--mmap-pages=::
Number of mmap data pages. Must be a power of two.
-C::
--cpu::
Collect samples only on the list of CPUs provided. Multiple CPUs can be provided as a
comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2.
......@@ -54,6 +74,10 @@ the thread executes on the designated CPUs. Default is to monitor all CPUs.
--sched:
Accrue thread runtime and provide a summary at the end of the session.
-i
--input
Process events from a given perf data file.
SEE ALSO
--------
linkperf:perf-record[1], linkperf:perf-script[1]
......@@ -124,7 +124,7 @@ strip-libs = $(filter-out -l%,$(1))
ifneq ($(OUTPUT),)
TE_PATH=$(OUTPUT)
ifneq ($(subdir),)
LK_PATH=$(objtree)/lib/lk/
LK_PATH=$(OUTPUT)/../lib/lk/
else
LK_PATH=$(OUTPUT)
endif
......@@ -281,7 +281,7 @@ LIB_H += util/cpumap.h
LIB_H += util/top.h
LIB_H += $(ARCH_INCLUDE)
LIB_H += util/cgroup.h
LIB_H += $(TRACE_EVENT_DIR)event-parse.h
LIB_H += $(LIB_INCLUDE)traceevent/event-parse.h
LIB_H += util/target.h
LIB_H += util/rblist.h
LIB_H += util/intlist.h
......@@ -360,6 +360,7 @@ LIB_OBJS += $(OUTPUT)util/rblist.o
LIB_OBJS += $(OUTPUT)util/intlist.o
LIB_OBJS += $(OUTPUT)util/vdso.o
LIB_OBJS += $(OUTPUT)util/stat.o
LIB_OBJS += $(OUTPUT)util/record.o
LIB_OBJS += $(OUTPUT)ui/setup.o
LIB_OBJS += $(OUTPUT)ui/helpline.o
......@@ -389,6 +390,10 @@ LIB_OBJS += $(OUTPUT)tests/bp_signal.o
LIB_OBJS += $(OUTPUT)tests/bp_signal_overflow.o
LIB_OBJS += $(OUTPUT)tests/task-exit.o
LIB_OBJS += $(OUTPUT)tests/sw-clock.o
ifeq ($(ARCH),x86)
LIB_OBJS += $(OUTPUT)tests/perf-time-to-tsc.o
endif
LIB_OBJS += $(OUTPUT)tests/code-reading.o
BUILTIN_OBJS += $(OUTPUT)builtin-annotate.o
BUILTIN_OBJS += $(OUTPUT)builtin-bench.o
......@@ -434,6 +439,7 @@ PERFLIBS = $(LIB_FILE) $(LIBLK) $(LIBTRACEEVENT)
ifneq ($(OUTPUT),)
CFLAGS += -I$(OUTPUT)
endif
LIB_OBJS += $(OUTPUT)tests/sample-parsing.o
ifdef NO_LIBELF
EXTLIBS := $(filter-out -lelf,$(EXTLIBS))
......@@ -459,6 +465,7 @@ endif # NO_LIBELF
ifndef NO_LIBUNWIND
LIB_OBJS += $(OUTPUT)util/unwind.o
endif
LIB_OBJS += $(OUTPUT)tests/keep-tracking.o
ifndef NO_LIBAUDIT
BUILTIN_OBJS += $(OUTPUT)builtin-trace.o
......@@ -631,10 +638,10 @@ $(OUTPUT)util/parse-events.o: util/parse-events.c $(OUTPUT)PERF-CFLAGS
$(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) -Wno-redundant-decls $<
$(OUTPUT)util/scripting-engines/trace-event-perl.o: util/scripting-engines/trace-event-perl.c $(OUTPUT)PERF-CFLAGS
$(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) $(PERL_EMBED_CCOPTS) -Wno-redundant-decls -Wno-strict-prototypes -Wno-unused-parameter -Wno-shadow $<
$(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) $(PERL_EMBED_CCOPTS) -Wno-redundant-decls -Wno-strict-prototypes -Wno-unused-parameter -Wno-shadow -Wno-undef -Wno-switch-default $<
$(OUTPUT)scripts/perl/Perf-Trace-Util/Context.o: scripts/perl/Perf-Trace-Util/Context.c $(OUTPUT)PERF-CFLAGS
$(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) $(PERL_EMBED_CCOPTS) -Wno-redundant-decls -Wno-strict-prototypes -Wno-unused-parameter -Wno-nested-externs $<
$(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) $(PERL_EMBED_CCOPTS) -Wno-redundant-decls -Wno-strict-prototypes -Wno-unused-parameter -Wno-nested-externs -Wno-undef -Wno-switch-default $<
$(OUTPUT)util/scripting-engines/trace-event-python.o: util/scripting-engines/trace-event-python.c $(OUTPUT)PERF-CFLAGS
$(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) $(PYTHON_EMBED_CCOPTS) -Wno-redundant-decls -Wno-strict-prototypes -Wno-unused-parameter -Wno-shadow $<
......@@ -762,17 +769,21 @@ check: $(OUTPUT)common-cmds.h
install-bin: all
$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(bindir_SQ)'
$(INSTALL) $(OUTPUT)perf '$(DESTDIR_SQ)$(bindir_SQ)'
$(INSTALL) $(OUTPUT)perf-archive -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)'
ifndef NO_LIBPERL
$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/perl/Perf-Trace-Util/lib/Perf/Trace'
$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/perl/bin'
$(INSTALL) $(OUTPUT)perf-archive -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)'
$(INSTALL) scripts/perl/Perf-Trace-Util/lib/Perf/Trace/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/perl/Perf-Trace-Util/lib/Perf/Trace'
$(INSTALL) scripts/perl/*.pl -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/perl'
$(INSTALL) scripts/perl/bin/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/perl/bin'
endif
ifndef NO_LIBPYTHON
$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/Perf-Trace-Util/lib/Perf/Trace'
$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/bin'
$(INSTALL) scripts/python/Perf-Trace-Util/lib/Perf/Trace/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/Perf-Trace-Util/lib/Perf/Trace'
$(INSTALL) scripts/python/*.py -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python'
$(INSTALL) scripts/python/bin/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/bin'
endif
$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(sysconfdir_SQ)/bash_completion.d'
$(INSTALL) bash_completion '$(DESTDIR_SQ)$(sysconfdir_SQ)/bash_completion.d/perf'
$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests'
......
......@@ -6,3 +6,5 @@ ifndef NO_LIBUNWIND
LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/unwind.o
endif
LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/header.o
LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/tsc.o
LIB_H += arch/$(ARCH)/util/tsc.h
#include <stdbool.h>
#include <errno.h>
#include <linux/perf_event.h>
#include "../../perf.h"
#include "../../util/types.h"
#include "../../util/debug.h"
#include "tsc.h"
u64 perf_time_to_tsc(u64 ns, struct perf_tsc_conversion *tc)
{
u64 t, quot, rem;
t = ns - tc->time_zero;
quot = t / tc->time_mult;
rem = t % tc->time_mult;
return (quot << tc->time_shift) +
(rem << tc->time_shift) / tc->time_mult;
}
u64 tsc_to_perf_time(u64 cyc, struct perf_tsc_conversion *tc)
{
u64 quot, rem;
quot = cyc >> tc->time_shift;
rem = cyc & ((1 << tc->time_shift) - 1);
return tc->time_zero + quot * tc->time_mult +
((rem * tc->time_mult) >> tc->time_shift);
}
int perf_read_tsc_conversion(const struct perf_event_mmap_page *pc,
struct perf_tsc_conversion *tc)
{
bool cap_usr_time_zero;
u32 seq;
int i = 0;
while (1) {
seq = pc->lock;
rmb();
tc->time_mult = pc->time_mult;
tc->time_shift = pc->time_shift;
tc->time_zero = pc->time_zero;
cap_usr_time_zero = pc->cap_usr_time_zero;
rmb();
if (pc->lock == seq && !(seq & 1))
break;
if (++i > 10000) {
pr_debug("failed to get perf_event_mmap_page lock\n");
return -EINVAL;
}
}
if (!cap_usr_time_zero)
return -EOPNOTSUPP;
return 0;
}
#ifndef TOOLS_PERF_ARCH_X86_UTIL_TSC_H__
#define TOOLS_PERF_ARCH_X86_UTIL_TSC_H__
#include "../../util/types.h"
struct perf_tsc_conversion {
u16 time_shift;
u32 time_mult;
u64 time_zero;
};
struct perf_event_mmap_page;
int perf_read_tsc_conversion(const struct perf_event_mmap_page *pc,
struct perf_tsc_conversion *tc);
u64 perf_time_to_tsc(u64 ns, struct perf_tsc_conversion *tc);
u64 tsc_to_perf_time(u64 cyc, struct perf_tsc_conversion *tc);
#endif /* TOOLS_PERF_ARCH_X86_UTIL_TSC_H__ */
......@@ -117,6 +117,8 @@ static void alloc_mem(void **dst, void **src, size_t length)
*src = zalloc(length);
if (!*src)
die("memory allocation failed - maybe length is too large?\n");
/* Make sure to always replace the zero pages even if MMAP_THRESH is crossed */
memset(*src, 0, length);
}
static u64 do_memcpy_cycle(memcpy_t fn, size_t len, bool prefault)
......
......@@ -90,8 +90,7 @@ static int process_sample_event(struct perf_tool *tool,
struct perf_annotate *ann = container_of(tool, struct perf_annotate, tool);
struct addr_location al;
if (perf_event__preprocess_sample(event, machine, &al, sample,
symbol__annotate_init) < 0) {
if (perf_event__preprocess_sample(event, machine, &al, sample) < 0) {
pr_warning("problem processing %d event, skipping it.\n",
event->header.type);
return -1;
......@@ -195,6 +194,8 @@ static int __cmd_annotate(struct perf_annotate *ann)
if (session == NULL)
return -ENOMEM;
machines__set_symbol_filter(&session->machines, symbol__annotate_init);
if (ann->cpu_list) {
ret = perf_session__cpu_bitmap(session, ann->cpu_list,
ann->cpu_bitmap);
......
This diff is collapsed.
......@@ -38,8 +38,7 @@ struct event_entry {
};
static int perf_event__repipe_synth(struct perf_tool *tool,
union perf_event *event,
struct machine *machine __maybe_unused)
union perf_event *event)
{
struct perf_inject *inject = container_of(tool, struct perf_inject, tool);
uint32_t size;
......@@ -65,39 +64,28 @@ static int perf_event__repipe_op2_synth(struct perf_tool *tool,
struct perf_session *session
__maybe_unused)
{
return perf_event__repipe_synth(tool, event, NULL);
return perf_event__repipe_synth(tool, event);
}
static int perf_event__repipe_event_type_synth(struct perf_tool *tool,
union perf_event *event)
{
return perf_event__repipe_synth(tool, event, NULL);
}
static int perf_event__repipe_tracing_data_synth(union perf_event *event,
struct perf_session *session
__maybe_unused)
{
return perf_event__repipe_synth(NULL, event, NULL);
}
static int perf_event__repipe_attr(union perf_event *event,
struct perf_evlist **pevlist __maybe_unused)
static int perf_event__repipe_attr(struct perf_tool *tool,
union perf_event *event,
struct perf_evlist **pevlist)
{
int ret;
ret = perf_event__process_attr(event, pevlist);
ret = perf_event__process_attr(tool, event, pevlist);
if (ret)
return ret;
return perf_event__repipe_synth(NULL, event, NULL);
return perf_event__repipe_synth(tool, event);
}
static int perf_event__repipe(struct perf_tool *tool,
union perf_event *event,
struct perf_sample *sample __maybe_unused,
struct machine *machine)
struct machine *machine __maybe_unused)
{
return perf_event__repipe_synth(tool, event, machine);
return perf_event__repipe_synth(tool, event);
}
typedef int (*inject_handler)(struct perf_tool *tool,
......@@ -119,7 +107,7 @@ static int perf_event__repipe_sample(struct perf_tool *tool,
build_id__mark_dso_hit(tool, event, sample, evsel, machine);
return perf_event__repipe_synth(tool, event, machine);
return perf_event__repipe_synth(tool, event);
}
static int perf_event__repipe_mmap(struct perf_tool *tool,
......@@ -148,13 +136,14 @@ static int perf_event__repipe_fork(struct perf_tool *tool,
return err;
}
static int perf_event__repipe_tracing_data(union perf_event *event,
static int perf_event__repipe_tracing_data(struct perf_tool *tool,
union perf_event *event,
struct perf_session *session)
{
int err;
perf_event__repipe_synth(NULL, event, NULL);
err = perf_event__process_tracing_data(event, session);
perf_event__repipe_synth(tool, event);
err = perf_event__process_tracing_data(tool, event, session);
return err;
}
......@@ -209,7 +198,7 @@ static int perf_event__inject_buildid(struct perf_tool *tool,
cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
thread = machine__findnew_thread(machine, event->ip.pid);
thread = machine__findnew_thread(machine, sample->pid, sample->pid);
if (thread == NULL) {
pr_err("problem processing %d event, skipping it.\n",
event->header.type);
......@@ -217,7 +206,7 @@ static int perf_event__inject_buildid(struct perf_tool *tool,
}
thread__find_addr_map(thread, machine, cpumode, MAP__FUNCTION,
event->ip.ip, &al);
sample->ip, &al);
if (al.map != NULL) {
if (!al.map->dso->hit) {
......@@ -312,7 +301,9 @@ static int perf_inject__sched_stat(struct perf_tool *tool,
sample_sw.period = sample->period;
sample_sw.time = sample->time;
perf_event__synthesize_sample(event_sw, evsel->attr.sample_type,
&sample_sw, false);
evsel->attr.sample_regs_user,
evsel->attr.read_format, &sample_sw,
false);
build_id__mark_dso_hit(tool, event_sw, &sample_sw, evsel, machine);
return perf_event__repipe(tool, event_sw, &sample_sw, machine);
}
......@@ -407,8 +398,8 @@ int cmd_inject(int argc, const char **argv, const char *prefix __maybe_unused)
.throttle = perf_event__repipe,
.unthrottle = perf_event__repipe,
.attr = perf_event__repipe_attr,
.event_type = perf_event__repipe_event_type_synth,
.tracing_data = perf_event__repipe_tracing_data_synth,
.tracing_data = perf_event__repipe_op2_synth,
.finished_round = perf_event__repipe_op2_synth,
.build_id = perf_event__repipe_op2_synth,
},
.input_name = "-",
......
......@@ -305,7 +305,8 @@ static int process_sample_event(struct perf_tool *tool __maybe_unused,
struct perf_evsel *evsel,
struct machine *machine)
{
struct thread *thread = machine__findnew_thread(machine, event->ip.pid);
struct thread *thread = machine__findnew_thread(machine, sample->pid,
sample->pid);
if (thread == NULL) {
pr_debug("problem processing %d event, skipping it.\n",
......@@ -313,7 +314,7 @@ static int process_sample_event(struct perf_tool *tool __maybe_unused,
return -1;
}
dump_printf(" ... thread: %s:%d\n", thread->comm, thread->pid);
dump_printf(" ... thread: %s:%d\n", thread->comm, thread->tid);
if (evsel->handler.func != NULL) {
tracepoint_handler f = evsel->handler.func;
......
This diff is collapsed.
......@@ -13,6 +13,7 @@
#include "util/parse-events.h"
#include "util/cache.h"
#include "util/pmu.h"
int cmd_list(int argc, const char **argv, const char *prefix __maybe_unused)
{
......@@ -37,6 +38,8 @@ int cmd_list(int argc, const char **argv, const char *prefix __maybe_unused)
else if (strcmp(argv[i], "cache") == 0 ||
strcmp(argv[i], "hwcache") == 0)
print_hwcache_events(NULL, false);
else if (strcmp(argv[i], "pmu") == 0)
print_pmu_events(NULL, false);
else if (strcmp(argv[i], "--raw-dump") == 0)
print_events(NULL, true);
else {
......
......@@ -805,7 +805,8 @@ static int process_sample_event(struct perf_tool *tool __maybe_unused,
struct perf_evsel *evsel,
struct machine *machine)
{
struct thread *thread = machine__findnew_thread(machine, sample->tid);
struct thread *thread = machine__findnew_thread(machine, sample->pid,
sample->tid);
if (thread == NULL) {
pr_debug("problem processing %d event, skipping it.\n",
......
......@@ -14,7 +14,6 @@ static const char *mem_operation = MEM_OPERATION_LOAD;
struct perf_mem {
struct perf_tool tool;
char const *input_name;
symbol_filter_t annotate_init;
bool hide_unresolved;
bool dump_raw;
const char *cpu_list;
......@@ -69,8 +68,7 @@ dump_raw_samples(struct perf_tool *tool,
struct addr_location al;
const char *fmt;
if (perf_event__preprocess_sample(event, machine, &al, sample,
mem->annotate_init) < 0) {
if (perf_event__preprocess_sample(event, machine, &al, sample) < 0) {
fprintf(stderr, "problem processing %d event, skipping it.\n",
event->header.type);
return -1;
......@@ -96,7 +94,7 @@ dump_raw_samples(struct perf_tool *tool,
symbol_conf.field_sep,
sample->tid,
symbol_conf.field_sep,
event->ip.ip,
sample->ip,
symbol_conf.field_sep,
sample->addr,
symbol_conf.field_sep,
......
......@@ -474,13 +474,6 @@ static int __cmd_record(struct perf_record *rec, int argc, const char **argv)
goto out_delete_session;
}
err = perf_event__synthesize_event_types(tool, process_synthesized_event,
machine);
if (err < 0) {
pr_err("Couldn't synthesize event_types.\n");
goto out_delete_session;
}
if (have_tracepoints(&evsel_list->entries)) {
/*
* FIXME err <= 0 here actually means that
......@@ -904,7 +897,6 @@ const struct option record_options[] = {
int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
{
int err = -ENOMEM;
struct perf_evsel *pos;
struct perf_evlist *evsel_list;
struct perf_record *rec = &record;
char errbuf[BUFSIZ];
......@@ -968,11 +960,6 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
if (perf_evlist__create_maps(evsel_list, &rec->opts.target) < 0)
usage_with_options(record_usage, record_options);
list_for_each_entry(pos, &evsel_list->entries, node) {
if (perf_header__push_event(pos->attr.config, perf_evsel__name(pos)))
goto out_free_fd;
}
if (rec->opts.user_interval != ULLONG_MAX)
rec->opts.default_interval = rec->opts.user_interval;
if (rec->opts.user_freq != UINT_MAX)
......
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
......@@ -125,6 +125,9 @@
#ifndef NSEC_PER_SEC
# define NSEC_PER_SEC 1000000000ULL
#endif
#ifndef NSEC_PER_USEC
# define NSEC_PER_USEC 1000ULL
#endif
static inline unsigned long long rdclock(void)
{
......
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment