Commit 3a36281a authored by Linus Torvalds

Merge tag 'perf-tools-for-v5.12-2020-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tool updates from Arnaldo Carvalho de Melo:
 "New features:

   - Support instruction latency in 'perf report', with both memory
     latency (weight) and instruction latency information, users can
     locate expensive load instructions and understand time spent in
     different stages.

   - Extend 'perf c2c' to display the number of loads which were blocked
     by data or address conflict.

   - Add 'perf stat' support for L2 topdown events in systems such as
     Intel's Sapphire Rapids server.

   - Add support for PERF_SAMPLE_CODE_PAGE_SIZE in various tools, as a
     sort key, for instance:

        perf report --stdio --sort=comm,symbol,code_page_size

   - New 'perf daemon' command to run long running sessions while
     providing a way to control the enablement of events without
     restarting a traditional 'perf record' session.

   - Enable counting events for BPF programs in 'perf stat' just like
     for other targets (tid, cgroup, cpu, etc), e.g.:

        # perf stat -e ref-cycles,cycles -b 254 -I 1000
           1.487903822            115,200      ref-cycles
           1.487903822             86,012      cycles
           2.489147029             80,560      ref-cycles
           2.489147029             73,784      cycles
        ^C

     The example above counts 'cycles' and 'ref-cycles' of the BPF
     program with id 254. It is similar to the bpftool-prog-profile
     command, but more flexible.

   - Support the new layout for PERF_RECORD_MMAP2 to carry the DSO
     build-id using infrastructure generalised from the eBPF subsystem,
     removing the need for traversing the perf.data file to collect
     build-ids at the end of 'perf record' sessions and helping with
     long running sessions where binaries can get replaced in updates,
     leading to possible mis-resolution of symbols.

   - Support filtering by hex address in 'perf script'.

   - Support DSO filter in 'perf script', like in other perf tools.

   - Add namespaces support to 'perf inject'.

   - Add support for SDT (DTrace Style Markers) events on ARM64.

  perf record:

   - Fix handling of eventfd() when draining a buffer in 'perf record'.

   - Improvements to the generation of metadata events for pre-existing
     threads (mmaps, comm, etc), speeding up the work done at the start
     of system wide or per CPU 'perf record' sessions.

  Hardware tracing:

   - Initial support for tracing KVM with Intel PT.

   - Intel PT fixes for IPC.

   - Support Intel PT PSB (synchronization packets) events.

   - Automatically group aux-output events to overcome --filter syntax.

   - Enable PERF_SAMPLE_DATA_SRC on ARM's SPE.

   - Update ARM's CoreSight hardware tracing OpenCSD library to v1.0.0.

  perf annotate TUI:

   - Fix handling of 'k' ("show line number") hotkey

   - Fix jump parsing for C++ code.

  perf probe:

   - Add protection to avoid endless loop.

  cgroups:

   - Avoid reading cgroup mountpoint multiple times, caching it.

   - Fix handling of cgroup v1/v2 in mixed hierarchy.

  Symbol resolving:

   - Add OCaml symbol demangling.

   - Further fixes for handling PE executables when using perf with Wine
     and .exe/.dll files.

   - Fix 'perf unwind' DSO handling.

   - Resolve symbols against debug file first, to deal with artifacts
     related to LTO.

   - Fix gap between kernel end and module start on powerpc.

  Reporting tools:

   - The DSO filter shouldn't show samples in unresolved maps.

   - Improve debuginfod support in various tools.

  build ids:

   - Fix 16-byte build ids in 'perf buildid-cache', add a 'perf test'
     entry for that case.

  perf test:

   - Support for PERF_SAMPLE_WEIGHT_STRUCT.

   - Add test case for PERF_SAMPLE_CODE_PAGE_SIZE.

   - Shell based tests for 'perf daemon's commands ('start', 'stop',
     'reconfig', 'list', etc).

   - ARM cs-etm 'perf test' fixes.

   - Add parse-metric memory bandwidth testcase.

  Compiler related:

   - Fix 'perf probe' kretprobe issue caused by gcc 11 bug when used
     with -fpatchable-function-entry.

   - Fix ARM64 build with gcc 11's -Wformat-overflow.

   - Fix unaligned access in sample parsing test.

   - Fix printf conversion specifier for IP addresses on arm64, s390 and
     powerpc.

  Arch specific:

   - Support exposing Performance Monitor Counter SPRs as part of
     extended regs on powerpc.

   - Add JSON 'perf stat' metrics for ARM64's imx8mp, imx8mq and imx8mn
     DDR, fix imx8mm ones.

   - Fix common and uarch events for ARM64's A76 and Ampere eMag"

* tag 'perf-tools-for-v5.12-2020-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (148 commits)
  perf buildid-cache: Don't skip 16-byte build-ids
  perf buildid-cache: Add test for 16-byte build-id
  perf symbol: Remove redundant libbfd checks
  perf test: Output the sub testing result in cs-etm
  perf test: Suppress logs in cs-etm testing
  perf tools: Fix arm64 build error with gcc-11
  perf intel-pt: Add documentation for tracing virtual machines
  perf intel-pt: Split VM-Entry and VM-Exit branches
  perf intel-pt: Adjust sample flags for VM-Exit
  perf intel-pt: Allow for a guest kernel address filter
  perf intel-pt: Support decoding of guest kernel
  perf machine: Factor out machine__idle_thread()
  perf machine: Factor out machines__find_guest()
  perf intel-pt: Amend decoder to track the NR flag
  perf intel-pt: Retain the last PIP packet payload as is
  perf intel_pt: Add vmlaunch and vmresume as branches
  perf script: Add branch types for VM-Entry and VM-Exit
  perf auxtrace: Automatically group aux-output events
  perf test: Fix unaligned access in sample parsing test
  perf tools: Support arch specific PERF_SAMPLE_WEIGHT_STRUCT processing
  ...
parents 7c70f3a7 3027ce36
......@@ -55,17 +55,33 @@ enum perf_event_powerpc_regs {
PERF_REG_POWERPC_MMCR3,
PERF_REG_POWERPC_SIER2,
PERF_REG_POWERPC_SIER3,
PERF_REG_POWERPC_PMC1,
PERF_REG_POWERPC_PMC2,
PERF_REG_POWERPC_PMC3,
PERF_REG_POWERPC_PMC4,
PERF_REG_POWERPC_PMC5,
PERF_REG_POWERPC_PMC6,
/* Max regs without the extended regs */
PERF_REG_POWERPC_MAX = PERF_REG_POWERPC_MMCRA + 1,
};
#define PERF_REG_PMU_MASK ((1ULL << PERF_REG_POWERPC_MAX) - 1)
/* PERF_REG_EXTENDED_MASK value for CPU_FTR_ARCH_300 */
#define PERF_REG_PMU_MASK_300 (((1ULL << (PERF_REG_POWERPC_MMCR2 + 1)) - 1) - PERF_REG_PMU_MASK)
/* PERF_REG_EXTENDED_MASK value for CPU_FTR_ARCH_31 */
#define PERF_REG_PMU_MASK_31 (((1ULL << (PERF_REG_POWERPC_SIER3 + 1)) - 1) - PERF_REG_PMU_MASK)
/* Exclude MMCR3, SIER2, SIER3 for CPU_FTR_ARCH_300 */
#define PERF_EXCLUDE_REG_EXT_300 (7ULL << PERF_REG_POWERPC_MMCR3)
#define PERF_REG_MAX_ISA_300 (PERF_REG_POWERPC_MMCR2 + 1)
#define PERF_REG_MAX_ISA_31 (PERF_REG_POWERPC_SIER3 + 1)
/*
* PERF_REG_EXTENDED_MASK value for CPU_FTR_ARCH_300
* includes 9 SPRS from MMCR0 to PMC6 excluding the
* unsupported SPRS in PERF_EXCLUDE_REG_EXT_300.
*/
#define PERF_REG_PMU_MASK_300 ((0xfffULL << PERF_REG_POWERPC_MMCR0) - PERF_EXCLUDE_REG_EXT_300)
/*
* PERF_REG_EXTENDED_MASK value for CPU_FTR_ARCH_31
* includes 12 SPRs from MMCR0 to PMC6.
*/
#define PERF_REG_PMU_MASK_31 (0xfffULL << PERF_REG_POWERPC_MMCR0)
#define PERF_REG_EXTENDED_MAX (PERF_REG_POWERPC_PMC6 + 1)
#endif /* _UAPI_ASM_POWERPC_PERF_REGS_H */
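Reviewer note: the new PERF_REG_PMU_MASK_300 and PERF_REG_PMU_MASK_31 definitions in the hunk above can be sanity-checked with a small sketch. The enum below is a hypothetical stand-in for perf_event_powerpc_regs: only the relative order of the twelve extended registers is taken from the hunk, and the base index 45 for MMCR0 is an assumption. The helper names count_bits() and mask_has_reg() are illustrative and not part of the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the extended registers in the hunk.
 * Only the relative order matters; the base index 45 is an assumption.
 */
enum {
	REG_MMCR0 = 45,
	REG_MMCR1, REG_MMCR2, REG_MMCR3, REG_SIER2, REG_SIER3,
	REG_PMC1, REG_PMC2, REG_PMC3, REG_PMC4, REG_PMC5, REG_PMC6,
};

/* Same expressions as the new macros, using the stand-in names. */
#define EXCLUDE_REG_EXT_300	(7ULL << REG_MMCR3)
#define PMU_MASK_300		((0xfffULL << REG_MMCR0) - EXCLUDE_REG_EXT_300)
#define PMU_MASK_31		(0xfffULL << REG_MMCR0)

/* Count the set bits in a register mask. */
static int count_bits(uint64_t mask)
{
	int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/* Return 1 if the mask selects the given register index. */
static int mask_has_reg(uint64_t mask, int reg)
{
	assert(reg >= 0 && reg < 64);
	return (int)((mask >> reg) & 1);
}
```

Because the three excluded bits (MMCR3, SIER2, SIER3) are a subset of the twelve-bit window, the subtraction in PMU_MASK_300 behaves like clearing those bits, which gives the 9 and 12 register counts quoted in the comments of the hunk.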
......@@ -146,6 +146,8 @@ VMLINUX_BTF_PATHS ?= $(if $(O),$(O)/vmlinux) \
/boot/vmlinux-$(shell uname -r)
VMLINUX_BTF ?= $(abspath $(firstword $(wildcard $(VMLINUX_BTF_PATHS))))
bootstrap: $(BPFTOOL_BOOTSTRAP)
ifneq ($(VMLINUX_BTF)$(VMLINUX_H),)
ifeq ($(feature-clang-bpf-co-re),1)
......
......@@ -99,7 +99,9 @@ FEATURE_TESTS_EXTRA := \
clang \
libbpf \
libpfm4 \
libdebuginfod
libdebuginfod \
clang-bpf-co-re
FEATURE_TESTS ?= $(FEATURE_TESTS_BASIC)
......
......@@ -4,9 +4,9 @@
/*
* Check OpenCSD library version is sufficient to provide required features
*/
#define OCSD_MIN_VER ((0 << 16) | (14 << 8) | (0))
#define OCSD_MIN_VER ((1 << 16) | (0 << 8) | (0))
#if !defined(OCSD_VER_NUM) || (OCSD_VER_NUM < OCSD_MIN_VER)
#error "OpenCSD >= 0.14.0 is required"
#error "OpenCSD >= 1.0.0 is required"
#endif
int main(void)
......
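Reviewer note: the feature test above compares packed version numbers. A minimal sketch of that packing, following the (major << 16) | (minor << 8) | patch layout visible in OCSD_MIN_VER, might look like this. The function names ocsd_pack_version() and ocsd_version_ok() are hypothetical and are not part of the OpenCSD API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack a version triple the way OCSD_MIN_VER is built in the hunk.
 * Each component is assumed to fit in 8 bits for this sketch.
 */
static uint32_t ocsd_pack_version(unsigned int major, unsigned int minor,
				  unsigned int patch)
{
	assert(major <= 0xff && minor <= 0xff && patch <= 0xff);
	return ((uint32_t)major << 16) | ((uint32_t)minor << 8) | patch;
}

/* A plain integer comparison orders packed versions correctly. */
static int ocsd_version_ok(uint32_t have, uint32_t min)
{
	return have >= min;
}
```

With this layout the old minimum 0.14.0 packs to 0x000e00 and the new minimum 1.0.0 to 0x010000, so a library that only satisfies the old requirement fails the updated check.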
......@@ -145,12 +145,14 @@ enum perf_event_sample_format {
PERF_SAMPLE_CGROUP = 1U << 21,
PERF_SAMPLE_DATA_PAGE_SIZE = 1U << 22,
PERF_SAMPLE_CODE_PAGE_SIZE = 1U << 23,
PERF_SAMPLE_WEIGHT_STRUCT = 1U << 24,
PERF_SAMPLE_MAX = 1U << 24, /* non-ABI */
PERF_SAMPLE_MAX = 1U << 25, /* non-ABI */
__PERF_SAMPLE_CALLCHAIN_EARLY = 1ULL << 63, /* non-ABI; internal use */
};
#define PERF_SAMPLE_WEIGHT_TYPE (PERF_SAMPLE_WEIGHT | PERF_SAMPLE_WEIGHT_STRUCT)
/*
* values to program into branch_sample_type when PERF_SAMPLE_BRANCH is set
*
......@@ -386,7 +388,8 @@ struct perf_event_attr {
aux_output : 1, /* generate AUX records instead of events */
cgroup : 1, /* include cgroup events */
text_poke : 1, /* include text poke events */
__reserved_1 : 30;
build_id : 1, /* use build id in mmap2 events */
__reserved_1 : 29;
union {
__u32 wakeup_events; /* wakeup every n events */
......@@ -659,6 +662,22 @@ struct perf_event_mmap_page {
__u64 aux_size;
};
/*
* The current state of perf_event_header::misc bits usage:
* ('|' used bit, '-' unused bit)
*
* 012 CDEF
* |||---------||||
*
* Where:
* 0-2 CPUMODE_MASK
*
* C PROC_MAP_PARSE_TIMEOUT
* D MMAP_DATA / COMM_EXEC / FORK_EXEC / SWITCH_OUT
* E MMAP_BUILD_ID / EXACT_IP / SCHED_OUT_PREEMPT
* F (reserved)
*/
#define PERF_RECORD_MISC_CPUMODE_MASK (7 << 0)
#define PERF_RECORD_MISC_CPUMODE_UNKNOWN (0 << 0)
#define PERF_RECORD_MISC_KERNEL (1 << 0)
......@@ -690,6 +709,7 @@ struct perf_event_mmap_page {
*
* PERF_RECORD_MISC_EXACT_IP - PERF_RECORD_SAMPLE of precise events
* PERF_RECORD_MISC_SWITCH_OUT_PREEMPT - PERF_RECORD_SWITCH* events
* PERF_RECORD_MISC_MMAP_BUILD_ID - PERF_RECORD_MMAP2 event
*
*
* PERF_RECORD_MISC_EXACT_IP:
......@@ -699,9 +719,13 @@ struct perf_event_mmap_page {
*
* PERF_RECORD_MISC_SWITCH_OUT_PREEMPT:
* Indicates that thread was preempted in TASK_RUNNING state.
*
* PERF_RECORD_MISC_MMAP_BUILD_ID:
* Indicates that mmap2 event carries build id data.
*/
#define PERF_RECORD_MISC_EXACT_IP (1 << 14)
#define PERF_RECORD_MISC_SWITCH_OUT_PREEMPT (1 << 14)
#define PERF_RECORD_MISC_MMAP_BUILD_ID (1 << 14)
/*
* Reserve the last bit to indicate some extended misc field
*/
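Reviewer note: bit 14 of perf_event_header::misc is now shared by three flags, so a reader has to interpret it by record type. The sketch below illustrates that dispatch. The record type numbers are copied from the uapi enum perf_event_type as I recall them and should be treated as assumptions, and the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MISC_BIT14		(1 << 14)	/* EXACT_IP / SWITCH_OUT_PREEMPT / MMAP_BUILD_ID */

/* Assumed values of the PERF_RECORD_* types used in this sketch. */
#define REC_SAMPLE		9
#define REC_MMAP2		10
#define REC_SWITCH		14
#define REC_SWITCH_CPU_WIDE	15

enum bit14_meaning {
	BIT14_UNSET,
	BIT14_EXACT_IP,
	BIT14_MMAP_BUILD_ID,
	BIT14_SWITCH_OUT_PREEMPT,
	BIT14_UNKNOWN,
};

/* Interpret misc bit 14 according to the record type that carries it. */
static enum bit14_meaning misc_bit14_meaning(uint32_t type, uint16_t misc)
{
	assert(type != 0);	/* record types start at 1 */

	if (!(misc & MISC_BIT14))
		return BIT14_UNSET;

	switch (type) {
	case REC_SAMPLE:
		return BIT14_EXACT_IP;
	case REC_MMAP2:
		return BIT14_MMAP_BUILD_ID;
	case REC_SWITCH:
	case REC_SWITCH_CPU_WIDE:
		return BIT14_SWITCH_OUT_PREEMPT;
	default:
		return BIT14_UNKNOWN;
	}
}
```

The overlap is safe only because the three flags apply to disjoint record types, which is what the table in the header comment documents.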
......@@ -890,7 +914,24 @@ enum perf_event_type {
* char data[size];
* u64 dyn_size; } && PERF_SAMPLE_STACK_USER
*
* { u64 weight; } && PERF_SAMPLE_WEIGHT
* { union perf_sample_weight
* {
* u64 full; && PERF_SAMPLE_WEIGHT
* #if defined(__LITTLE_ENDIAN_BITFIELD)
* struct {
* u32 var1_dw;
* u16 var2_w;
* u16 var3_w;
* } && PERF_SAMPLE_WEIGHT_STRUCT
* #elif defined(__BIG_ENDIAN_BITFIELD)
* struct {
* u16 var3_w;
* u16 var2_w;
* u32 var1_dw;
* } && PERF_SAMPLE_WEIGHT_STRUCT
* #endif
* }
* }
* { u64 data_src; } && PERF_SAMPLE_DATA_SRC
* { u64 transaction; } && PERF_SAMPLE_TRANSACTION
* { u64 abi; # enum perf_sample_regs_abi
......@@ -915,10 +956,20 @@ enum perf_event_type {
* u64 addr;
* u64 len;
* u64 pgoff;
* u32 maj;
* u32 min;
* u64 ino;
* u64 ino_generation;
* union {
* struct {
* u32 maj;
* u32 min;
* u64 ino;
* u64 ino_generation;
* };
* struct {
* u8 build_id_size;
* u8 __reserved_1;
* u16 __reserved_2;
* u8 build_id[20];
* };
* };
* u32 prot, flags;
* char filename[];
* struct sample_id sample_id;
......@@ -1127,14 +1178,16 @@ union perf_mem_data_src {
mem_lvl_num:4, /* memory hierarchy level number */
mem_remote:1, /* remote */
mem_snoopx:2, /* snoop mode, ext */
mem_rsvd:24;
mem_blk:3, /* access blocked */
mem_rsvd:21;
};
};
#elif defined(__BIG_ENDIAN_BITFIELD)
union perf_mem_data_src {
__u64 val;
struct {
__u64 mem_rsvd:24,
__u64 mem_rsvd:21,
mem_blk:3, /* access blocked */
mem_snoopx:2, /* snoop mode, ext */
mem_remote:1, /* remote */
mem_lvl_num:4, /* memory hierarchy level number */
......@@ -1217,6 +1270,12 @@ union perf_mem_data_src {
#define PERF_MEM_TLB_OS 0x40 /* OS fault handler */
#define PERF_MEM_TLB_SHIFT 26
/* Access blocked */
#define PERF_MEM_BLK_NA 0x01 /* not available */
#define PERF_MEM_BLK_DATA 0x02 /* data could not be forwarded */
#define PERF_MEM_BLK_ADDR 0x04 /* address conflict */
#define PERF_MEM_BLK_SHIFT 40
#define PERF_MEM_S(a, s) \
(((__u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT)
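Reviewer note: the new mem_blk field occupies three bits starting at PERF_MEM_BLK_SHIFT (40), which matches the sum of the bitfield widths that precede it in the little-endian layout. A shift-and-mask sketch of reading it from a raw data_src value follows. The macro names mirror the ones added in the hunk with the PERF_ prefix dropped, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the hunk above. */
#define MEM_BLK_NA	0x01	/* not available */
#define MEM_BLK_DATA	0x02	/* data could not be forwarded */
#define MEM_BLK_ADDR	0x04	/* address conflict */
#define MEM_BLK_SHIFT	40
#define MEM_BLK_WIDTH	3

/* Extract the 3-bit "access blocked" field from a raw data_src value. */
static unsigned int mem_blk_from_data_src(uint64_t data_src)
{
	unsigned int blk = (unsigned int)((data_src >> MEM_BLK_SHIFT) &
					  ((1U << MEM_BLK_WIDTH) - 1));

	assert(blk <= 0x7);
	return blk;
}

/* Convenience predicates for the two conflict reasons. */
static int load_blocked_by_data(uint64_t data_src)
{
	return (mem_blk_from_data_src(data_src) & MEM_BLK_DATA) != 0;
}

static int load_blocked_by_addr(uint64_t data_src)
{
	return (mem_blk_from_data_src(data_src) & MEM_BLK_ADDR) != 0;
}
```

Working on the raw 64-bit value avoids depending on the endian-specific bitfield declarations.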
......@@ -1248,4 +1307,23 @@ struct perf_branch_entry {
reserved:40;
};
union perf_sample_weight {
__u64 full;
#if defined(__LITTLE_ENDIAN_BITFIELD)
struct {
__u32 var1_dw;
__u16 var2_w;
__u16 var3_w;
};
#elif defined(__BIG_ENDIAN_BITFIELD)
struct {
__u16 var3_w;
__u16 var2_w;
__u32 var1_dw;
};
#else
#error "Unknown endianness"
#endif
};
#endif /* _UAPI_LINUX_PERF_EVENT_H */
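Reviewer note: because the two struct layouts in union perf_sample_weight are mirrored by endianness, var1_dw always maps to the low 32 bits of 'full', var2_w to bits 32-47 and var3_w to bits 48-63. A portable sketch can therefore split the value with shifts instead of the endian-specific bitfield macros. The struct and function names below are hypothetical. The meaning of each field is architecture specific; on Intel, as far as I understand the series, var1_dw carries the memory latency and var2_w the instruction latency, but treat that as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side view of the three fields. */
struct weight_fields {
	uint32_t var1_dw;
	uint16_t var2_w;
	uint16_t var3_w;
};

/* Split the 64-bit 'full' value into its fields, independent of endianness. */
static struct weight_fields split_weight(uint64_t full)
{
	struct weight_fields w;

	/* Sanity check: the three fields together cover exactly 64 bits. */
	assert(sizeof(w) == sizeof(full));

	w.var1_dw = (uint32_t)(full & 0xffffffffULL);
	w.var2_w  = (uint16_t)((full >> 32) & 0xffff);
	w.var3_w  = (uint16_t)(full >> 48);
	return w;
}
```

A consumer would call split_weight() only when the sample was recorded with PERF_SAMPLE_WEIGHT_STRUCT; with plain PERF_SAMPLE_WEIGHT the whole 64-bit value is the weight.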
......@@ -251,5 +251,8 @@ struct prctl_mm_map {
#define PR_SET_SYSCALL_USER_DISPATCH 59
# define PR_SYS_DISPATCH_OFF 0
# define PR_SYS_DISPATCH_ON 1
/* The control values for the user space selector when dispatch is enabled */
# define SYSCALL_DISPATCH_FILTER_ALLOW 0
# define SYSCALL_DISPATCH_FILTER_BLOCK 1
#endif /* _LINUX_PRCTL_H */
......@@ -8,12 +8,29 @@
#include <string.h>
#include "fs.h"
struct cgroupfs_cache_entry {
char subsys[32];
char mountpoint[PATH_MAX];
};
/* just cache last used one */
static struct cgroupfs_cache_entry cached;
int cgroupfs_find_mountpoint(char *buf, size_t maxlen, const char *subsys)
{
FILE *fp;
char mountpoint[PATH_MAX + 1], tokens[PATH_MAX + 1], type[PATH_MAX + 1];
char path_v1[PATH_MAX + 1], path_v2[PATH_MAX + 2], *path;
char *token, *saved_ptr = NULL;
char *line = NULL;
size_t len = 0;
char *p, *path;
char mountpoint[PATH_MAX];
if (!strcmp(cached.subsys, subsys)) {
if (strlen(cached.mountpoint) < maxlen) {
strcpy(buf, cached.mountpoint);
return 0;
}
return -1;
}
fp = fopen("/proc/mounts", "r");
if (!fp)
......@@ -22,45 +39,63 @@ int cgroupfs_find_mountpoint(char *buf, size_t maxlen, const char *subsys)
/*
* in order to handle split hierarchy, we need to scan /proc/mounts
* and inspect every cgroupfs mount point to find one that has
* perf_event subsystem
* the given subsystem. If we found v1, just use it. If not we can
* use v2 path as a fallback.
*/
path_v1[0] = '\0';
path_v2[0] = '\0';
mountpoint[0] = '\0';
while (fscanf(fp, "%*s %"__stringify(PATH_MAX)"s %"__stringify(PATH_MAX)"s %"
__stringify(PATH_MAX)"s %*d %*d\n",
mountpoint, type, tokens) == 3) {
/*
* The /proc/mounts has the follow format:
*
* <devname> <mount point> <fs type> <options> ...
*
*/
while (getline(&line, &len, fp) != -1) {
/* skip devname */
p = strchr(line, ' ');
if (p == NULL)
continue;
/* save the mount point */
path = ++p;
p = strchr(p, ' ');
if (p == NULL)
continue;
if (!path_v1[0] && !strcmp(type, "cgroup")) {
*p++ = '\0';
token = strtok_r(tokens, ",", &saved_ptr);
/* check filesystem type */
if (strncmp(p, "cgroup", 6))
continue;
while (token != NULL) {
if (subsys && !strcmp(token, subsys)) {
strcpy(path_v1, mountpoint);
break;
}
token = strtok_r(NULL, ",", &saved_ptr);
}
if (p[6] == '2') {
/* save cgroup v2 path */
strcpy(mountpoint, path);
continue;
}
if (!path_v2[0] && !strcmp(type, "cgroup2"))
strcpy(path_v2, mountpoint);
/* now we have cgroup v1, check the options for subsystem */
p += 7;
if (path_v1[0] && path_v2[0])
break;
p = strstr(p, subsys);
if (p == NULL)
continue;
/* sanity check: it should be separated by a space or a comma */
if (!strchr(" ,", p[-1]) || !strchr(" ,", p[strlen(subsys)]))
continue;
strcpy(mountpoint, path);
break;
}
free(line);
fclose(fp);
if (path_v1[0])
path = path_v1;
else if (path_v2[0])
path = path_v2;
else
return -1;
strncpy(cached.subsys, subsys, sizeof(cached.subsys) - 1);
strcpy(cached.mountpoint, mountpoint);
if (strlen(path) < maxlen) {
strcpy(buf, path);
if (mountpoint[0] && strlen(mountpoint) < maxlen) {
strcpy(buf, mountpoint);
return 0;
}
return -1;
......
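Reviewer note: the rewritten loop in cgroupfs_find_mountpoint() parses each /proc/mounts line by hand instead of using fscanf(). The sketch below isolates that per-line logic so it can be exercised on a string. The function name classify_mount_line() and the CG_* return codes are hypothetical; the parsing steps follow the new code in the hunk.

```c
#include <assert.h>
#include <string.h>

enum { CG_NONE = 0, CG_V1 = 1, CG_V2 = 2 };

/*
 * Classify one "<devname> <mount point> <fs type> <options> ..." line.
 * The line is modified in place: the mount point is NUL-terminated and
 * returned through *mountpoint on a match.
 */
static int classify_mount_line(char *line, const char *subsys, char **mountpoint)
{
	char *p, *path;

	assert(line && subsys && mountpoint);

	p = strchr(line, ' ');		/* skip devname */
	if (p == NULL)
		return CG_NONE;

	path = ++p;			/* start of the mount point */
	p = strchr(p, ' ');
	if (p == NULL)
		return CG_NONE;
	*p++ = '\0';

	if (strncmp(p, "cgroup", 6))	/* check filesystem type */
		return CG_NONE;

	if (p[6] == '2') {		/* cgroup v2: usable as a fallback */
		*mountpoint = path;
		return CG_V2;
	}

	p += 7;				/* skip "cgroup " to reach the options */
	p = strstr(p, subsys);
	if (p == NULL)
		return CG_NONE;

	/* the match must be delimited by a space or a comma */
	if (!strchr(" ,", p[-1]) || !strchr(" ,", p[strlen(subsys)]))
		return CG_NONE;

	*mountpoint = path;
	return CG_V1;
}
```

As in the hunk, only the first occurrence of the subsystem name in the options is examined, so a v1 match ends the scan while a v2 path is kept only as a fallback.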
......@@ -23,10 +23,20 @@ struct perf_record_mmap2 {
__u64 start;
__u64 len;
__u64 pgoff;
__u32 maj;
__u32 min;
__u64 ino;
__u64 ino_generation;
union {
struct {
__u32 maj;
__u32 min;
__u64 ino;
__u64 ino_generation;
};
struct {
__u8 build_id_size;
__u8 __reserved_1;
__u16 __reserved_2;
__u8 build_id[20];
};
};
__u32 prot;
__u32 flags;
char filename[PATH_MAX];
......
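Reviewer note: with the new layout, a consumer of PERF_RECORD_MMAP2 has to check PERF_RECORD_MISC_MMAP_BUILD_ID in the header's misc field before deciding which half of the union is valid. The sketch below shows that check and formats the build-id as hex. The struct mmap2_id type is a hypothetical mirror of the union alone, not the real perf_record_mmap2, and mmap2_build_id_hex() is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MISC_MMAP_BUILD_ID	(1 << 14)

/* Hypothetical mirror of the anonymous union added to the mmap2 record. */
struct mmap2_id {
	union {
		struct {
			uint32_t maj;
			uint32_t min;
			uint64_t ino;
			uint64_t ino_generation;
		};
		struct {
			uint8_t  build_id_size;
			uint8_t  reserved_1;
			uint16_t reserved_2;
			uint8_t  build_id[20];
		};
	};
};

/*
 * Write the build-id as a hex string into 'out'.
 * Returns the build-id size in bytes, or -1 if the record does not
 * carry a build-id or the output buffer is too small.
 */
static int mmap2_build_id_hex(uint16_t misc, const struct mmap2_id *id,
			      char *out, size_t len)
{
	size_t i, size;

	assert(id && out);

	if (!(misc & MISC_MMAP_BUILD_ID))
		return -1;	/* maj/min/ino/ino_generation are valid instead */

	size = id->build_id_size;
	if (size > sizeof(id->build_id) || len < 2 * size + 1)
		return -1;

	for (i = 0; i < size; i++)
		snprintf(out + 2 * i, 3, "%02x", id->build_id[i]);
	out[2 * size] = '\0';
	return (int)size;
}
```

Both halves of the union are 24 bytes, so the record size and the offsets of prot, flags and filename do not change between the two layouts.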
......@@ -24,6 +24,7 @@ perf-y += builtin-mem.o
perf-y += builtin-data.o
perf-y += builtin-version.o
perf-y += builtin-c2c.o
perf-y += builtin-daemon.o
perf-$(CONFIG_TRACE) += builtin-trace.o
perf-$(CONFIG_LIBELF) += builtin-probe.o
......
......@@ -3,7 +3,7 @@
****** perf by examples ******
------------------------------
[ From an e-mail by Ingo Molnar, http://lkml.org/lkml/2009/8/4/346 ]
[ From an e-mail by Ingo Molnar, https://lore.kernel.org/lkml/20090804195717.GA5998@elte.hu ]
First, discovery/enumeration of available counters can be done via
......
......@@ -4,7 +4,7 @@
r synthesize branches events (returns only)
x synthesize transactions events
w synthesize ptwrite events
p synthesize power events
p synthesize power events (incl. PSB events for Intel PT)
o synthesize other events recorded due to the use
of aux-output (refer to perf record)
e synthesize error events
......
......@@ -74,6 +74,12 @@ OPTIONS
used when creating a uprobe for a process that resides in a
different mount namespace from the perf(1) utility.
--debuginfod=URLs::
Specify debuginfod URL to be used when retrieving perf.data binaries,
it follows the same syntax as the DEBUGINFOD_URLS variable, like:
buildid-cache.debuginfod=http://192.168.122.174:8002
SEE ALSO
--------
linkperf:perf-record[1], linkperf:perf-report[1], linkperf:perf-buildid-list[1]
......@@ -238,6 +238,13 @@ buildid.*::
cache location, or to disable it altogether. If you want to disable it,
set buildid.dir to /dev/null. The default is $HOME/.debug
buildid-cache.*::
buildid-cache.debuginfod=URLs
Specify debuginfod URLs to be used when retrieving perf.data binaries,
it follows the same syntax as the DEBUGINFOD_URLS variable, like:
buildid-cache.debuginfod=http://192.168.122.174:8002
annotate.*::
These are in control of addresses, jump function, source code
in lines of assembly code from a specific program.
......@@ -552,11 +559,12 @@ kmem.*::
record.*::
record.build-id::
This option can be 'cache', 'no-cache' or 'skip'.
This option can be 'cache', 'no-cache', 'skip' or 'mmap'.
'cache' is to post-process data and save/update the binaries into
the build-id cache (in ~/.debug). This is the default.
But if this option is 'no-cache', it will not update the build-id cache.
'skip' skips post-processing and does not update the cache.
'mmap' skips post-processing and reads build-ids from MMAP events.
record.call-graph::
This is identical to 'call-graph.record-mode', except it is
......@@ -695,6 +703,20 @@ auxtrace.*::
If the directory does not exist or has the wrong file type,
the current directory is used.
daemon.*::
daemon.base::
Base path for daemon data. All sessions data are stored under
this path.
session-<NAME>.*::
session-<NAME>.run::
Defines new record session for daemon. The value is record's
command line without the 'record' keyword.
SEE ALSO
--------
linkperf:perf[1]
perf-daemon(1)
==============
NAME
----
perf-daemon - Run record sessions on background
SYNOPSIS
--------
[verse]
'perf daemon'
'perf daemon' [<options>]
'perf daemon start' [<options>]
'perf daemon stop' [<options>]
'perf daemon signal' [<options>]
'perf daemon ping' [<options>]
DESCRIPTION
-----------
This command allows to run simple daemon process that starts and
monitors configured record sessions.
You can imagine 'perf daemon' of background process with several
'perf record' child tasks, like:
# ps axjf
...
1 916507 ... perf daemon start
916507 916508 ... \_ perf record --control=fifo:control,ack -m 10M -e cycles --overwrite --switch-output -a
916507 916509 ... \_ perf record --control=fifo:control,ack -m 20M -e sched:* --overwrite --switch-output -a
Not every 'perf record' session is suitable for running under daemon.
User need perf session that either produces data on query, like the
flight recorder sessions in above example or session that is configured
to produce data periodically, like with --switch-output configuration
for time and size.
Each session is started with control setup (with perf record --control
options).
Sessions are configured through config file, see CONFIG FILE section
with EXAMPLES.
OPTIONS
-------
-v::
--verbose::
Be more verbose.
--config=<PATH>::
Config file path. If not provided, perf will check system and default
locations (/etc/perfconfig, $HOME/.perfconfig).
--base=<PATH>::
Base directory path. Each daemon instance is running on top
of base directory. Only one instance of server can run on
top of one directory at the time.
All generic options are available also under commands.
START COMMAND
-------------
The start command creates the daemon process.
-f::
--foreground::
Do not put the process in background.
STOP COMMAND
------------
The stop command stops all the session and the daemon process.
SIGNAL COMMAND
--------------
The signal command sends signal to configured sessions.
--session::
Send signal to specific session.
PING COMMAND
------------
The ping command sends control ping to configured sessions.
--session::
Send ping to specific session.
CONFIG FILE
-----------
The daemon is configured within standard perf config file by
following new variables:
daemon.base:
Base path for daemon data. All sessions data are
stored under this path.
session-<NAME>.run:
Defines new record session. The value is record's command
line without the 'record' keyword.
Each perf record session is run in daemon.base/<NAME> directory.
EXAMPLES
--------
Example with 2 record sessions:
# cat ~/.perfconfig
[daemon]
base=/opt/perfdata
[session-cycles]
run = -m 10M -e cycles --overwrite --switch-output -a
[session-sched]
run = -m 20M -e sched:* --overwrite --switch-output -a
Starting the daemon:
# perf daemon start
Check sessions:
# perf daemon
[603349:daemon] base: /opt/perfdata
[603350:cycles] perf record -m 10M -e cycles --overwrite --switch-output -a
[603351:sched] perf record -m 20M -e sched:* --overwrite --switch-output -a
First line is daemon process info with configured daemon base.
Check sessions with more info:
# perf daemon -v
[603349:daemon] base: /opt/perfdata
output: /opt/perfdata/output
lock: /opt/perfdata/lock
up: 1 minutes
[603350:cycles] perf record -m 10M -e cycles --overwrite --switch-output -a
base: /opt/perfdata/session-cycles
output: /opt/perfdata/session-cycles/output
control: /opt/perfdata/session-cycles/control
ack: /opt/perfdata/session-cycles/ack
up: 1 minutes
[603351:sched] perf record -m 20M -e sched:* --overwrite --switch-output -a
base: /opt/perfdata/session-sched
output: /opt/perfdata/session-sched/output
control: /opt/perfdata/session-sched/control
ack: /opt/perfdata/session-sched/ack
up: 1 minutes
The 'base' path is daemon/session base.
The 'lock' file is daemon's lock file guarding that no other
daemon is running on top of the base.
The 'output' file is perf record output for specific session.
The 'control' and 'ack' files are perf control files.
The 'up' number shows minutes daemon/session is running.
Make sure control session is online:
# perf daemon ping
OK cycles
OK sched
Send USR2 signal to session 'cycles' to generate perf.data file:
# perf daemon signal --session cycles
signal 12 sent to session 'cycles [603452]'
# tail -2 /opt/perfdata/session-cycles/output
[ perf record: dump data: Woken up 1 times ]
[ perf record: Dump perf.data.2020123017013149 ]
Send USR2 signal to all sessions:
# perf daemon signal
signal 12 sent to session 'cycles [603452]'
signal 12 sent to session 'sched [603453]'
# tail -2 /opt/perfdata/session-cycles/output
[ perf record: dump data: Woken up 1 times ]
[ perf record: Dump perf.data.2020123017024689 ]
# tail -2 /opt/perfdata/session-sched/output
[ perf record: dump data: Woken up 1 times ]
[ perf record: Dump perf.data.2020123017024713 ]
Stop daemon:
# perf daemon stop
SEE ALSO
--------
linkperf:perf-record[1], linkperf:perf-config[1]
......@@ -858,7 +858,7 @@ The letters are:
b synthesize "branches" events
x synthesize "transactions" events
w synthesize "ptwrite" events
p synthesize "power" events
p synthesize "power" events (incl. PSB events)
c synthesize branches events (calls only)
r synthesize branches events (returns only)
e synthesize tracing error events
......@@ -913,6 +913,11 @@ Where:
For more details refer to the Intel 64 and IA-32 Architectures Software
Developer Manuals.
PSB events show when a PSB+ occurred and also the byte-offset in the trace.
Emitting a PSB+ can cause a CPU a slight delay. When doing timing analysis
of code with Intel PT, it is useful to know if a timing bubble was caused
by Intel PT or not.
Error events show where the decoder lost the trace. Error events
are quite important. Users must know if what they are seeing is a complete
picture or not. The "e" option may be followed by flags which affect what errors
......@@ -1141,6 +1146,88 @@ XED
include::build-xed.txt[]
Tracing Virtual Machines
------------------------
Currently, only kernel tracing is supported and only with "timeless" decoding
i.e. no TSC timestamps
Other limitations and caveats
VMX controls may suppress packets needed for decoding resulting in decoding errors
VMX controls may block the perf NMI to the host potentially resulting in lost trace data
Guest kernel self-modifying code (e.g. jump labels or JIT-compiled eBPF) will result in decoding errors
Guest thread information is unknown
Guest VCPU is unknown but may be able to be inferred from the host thread
Callchains are not supported
Example
Start VM
$ sudo virsh start kubuntu20.04
Domain kubuntu20.04 started
Mount the guest file system. Note sshfs needs -o direct_io to enable reading of proc files. root access is needed to read /proc/kcore.
$ mkdir vm0
$ sshfs -o direct_io root@vm0:/ vm0
Copy the guest /proc/kallsyms, /proc/modules and /proc/kcore
$ perf buildid-cache -v --kcore vm0/proc/kcore
kcore added to build-id cache directory /home/user/.debug/[kernel.kcore]/9600f316a53a0f54278885e8d9710538ec5f6a08/2021021807494306
$ KALLSYMS=/home/user/.debug/[kernel.kcore]/9600f316a53a0f54278885e8d9710538ec5f6a08/2021021807494306/kallsyms
Find the VM process
$ ps -eLl | grep 'KVM\|PID'
F S UID PID PPID LWP C PRI NI ADDR SZ WCHAN TTY TIME CMD
3 S 64055 1430 1 1440 1 80 0 - 1921718 - ? 00:02:47 CPU 0/KVM
3 S 64055 1430 1 1441 1 80 0 - 1921718 - ? 00:02:41 CPU 1/KVM
3 S 64055 1430 1 1442 1 80 0 - 1921718 - ? 00:02:38 CPU 2/KVM
3 S 64055 1430 1 1443 2 80 0 - 1921718 - ? 00:03:18 CPU 3/KVM
Start an open-ended perf record, tracing the VM process, do something on the VM, and then ctrl-C to stop.
TSC is not supported and tsc=0 must be specified. That means mtc is useless, so add mtc=0.
However, IPC can still be determined, hence cyc=1 can be added.
Only kernel decoding is supported, so 'k' must be specified.
Intel PT traces both the host and the guest so --guest and --host need to be specified.
Without timestamps, --per-thread must be specified to distinguish threads.
$ sudo perf kvm --guest --host --guestkallsyms $KALLSYMS record --kcore -e intel_pt/tsc=0,mtc=0,cyc=1/k -p 1430 --per-thread
^C
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 5.829 MB ]
perf script can be used to provide an instruction trace
$ perf script --guestkallsyms $KALLSYMS --insn-trace --xed -F+ipc | grep -C10 vmresume | head -21
CPU 0/KVM 1440 ffffffff82133cdd __vmx_vcpu_run+0x3d ([kernel.kallsyms]) movq 0x48(%rax), %r9
CPU 0/KVM 1440 ffffffff82133ce1 __vmx_vcpu_run+0x41 ([kernel.kallsyms]) movq 0x50(%rax), %r10
CPU 0/KVM 1440 ffffffff82133ce5 __vmx_vcpu_run+0x45 ([kernel.kallsyms]) movq 0x58(%rax), %r11
CPU 0/KVM 1440 ffffffff82133ce9 __vmx_vcpu_run+0x49 ([kernel.kallsyms]) movq 0x60(%rax), %r12
CPU 0/KVM 1440 ffffffff82133ced __vmx_vcpu_run+0x4d ([kernel.kallsyms]) movq 0x68(%rax), %r13
CPU 0/KVM 1440 ffffffff82133cf1 __vmx_vcpu_run+0x51 ([kernel.kallsyms]) movq 0x70(%rax), %r14
CPU 0/KVM 1440 ffffffff82133cf5 __vmx_vcpu_run+0x55 ([kernel.kallsyms]) movq 0x78(%rax), %r15
CPU 0/KVM 1440 ffffffff82133cf9 __vmx_vcpu_run+0x59 ([kernel.kallsyms]) movq (%rax), %rax
CPU 0/KVM 1440 ffffffff82133cfc __vmx_vcpu_run+0x5c ([kernel.kallsyms]) callq 0xffffffff82133c40
CPU 0/KVM 1440 ffffffff82133c40 vmx_vmenter+0x0 ([kernel.kallsyms]) jz 0xffffffff82133c46
CPU 0/KVM 1440 ffffffff82133c42 vmx_vmenter+0x2 ([kernel.kallsyms]) vmresume IPC: 0.11 (50/445)
:1440 1440 ffffffffbb678b06 native_write_msr+0x6 ([guest.kernel.kallsyms]) nopl %eax, (%rax,%rax,1)
:1440 1440 ffffffffbb678b0b native_write_msr+0xb ([guest.kernel.kallsyms]) retq IPC: 0.04 (2/41)
:1440 1440 ffffffffbb666646 lapic_next_deadline+0x26 ([guest.kernel.kallsyms]) data16 nop
:1440 1440 ffffffffbb666648 lapic_next_deadline+0x28 ([guest.kernel.kallsyms]) xor %eax, %eax
:1440 1440 ffffffffbb66664a lapic_next_deadline+0x2a ([guest.kernel.kallsyms]) popq %rbp
:1440 1440 ffffffffbb66664b lapic_next_deadline+0x2b ([guest.kernel.kallsyms]) retq IPC: 0.16 (4/25)
:1440 1440 ffffffffbb74607f clockevents_program_event+0x8f ([guest.kernel.kallsyms]) test %eax, %eax
:1440 1440 ffffffffbb746081 clockevents_program_event+0x91 ([guest.kernel.kallsyms]) jz 0xffffffffbb74603c IPC: 0.06 (2/30)
:1440 1440 ffffffffbb74603c clockevents_program_event+0x4c ([guest.kernel.kallsyms]) popq %rbx
:1440 1440 ffffffffbb74603d clockevents_program_event+0x4d ([guest.kernel.kallsyms]) popq %r12
SEE ALSO
--------
......
......@@ -63,6 +63,9 @@ OPTIONS
--phys-data::
Record/Report sample physical addresses
--data-page-size::
Record/Report sample data address page size
RECORD OPTIONS
--------------
-e::
......
......@@ -296,6 +296,9 @@ OPTIONS
--data-page-size::
Record the sampled data address data page size.
--code-page-size::
Record the sampled code address (ip) page size
-T::
--timestamp::
Record the sample timestamps. Use it with 'perf report -D' to see the
......@@ -485,6 +488,9 @@ Specify vmlinux path which has debuginfo.
--buildid-all::
Record build-id of all DSOs regardless whether it's actually hit or not.
--buildid-mmap::
Record build ids in mmap2 events, disables build id cache (implies --no-buildid).
--aio[=n]::
Use <n> control blocks in asynchronous (Posix AIO) trace writing mode (default: 1, max: 4).
Asynchronous mode is supported only when linking Perf tool with libc library
......@@ -640,9 +646,18 @@ ctl-fifo / ack-fifo are opened and used as ctl-fd / ack-fd as follows.
Listen on ctl-fd descriptor for command to control measurement.
Available commands:
'enable' : enable events
'disable' : disable events
'snapshot': AUX area tracing snapshot).
'enable' : enable events
'disable' : disable events
'enable name' : enable event 'name'
'disable name' : disable event 'name'
'snapshot' : AUX area tracing snapshot).
'stop' : stop perf record
'ping' : ping
'evlist [-v|-g|-F] : display all events
-F Show just the sample frequency used for each event.
-v Show all fields.
-g Show event group information.
Measurements can be started with events disabled using --delay=-1 option. Optionally
send control command completion ('ack\n') to ack-fd descriptor to synchronize with the
......
......@@ -108,6 +108,10 @@ OPTIONS
- period: Raw number of event count of sample
- time: Separate the samples by time stamp with the resolution specified by
--time-quantum (default 100ms). Specify with overhead and before it.
- code_page_size: the code page size of sampled code address (ip)
- ins_lat: Instruction latency in core cycles. This is the global instruction
latency
- local_ins_lat: Local instruction latency version
By default, comm, dso and symbol keys are used.
(i.e. --sort comm,dso,symbol)
......@@ -139,7 +143,7 @@ OPTIONS
If the --mem-mode option is used, the following sort keys are also available
(incompatible with --branch-stack):
symbol_daddr, dso_daddr, locked, tlb, mem, snoop, dcacheline.
symbol_daddr, dso_daddr, locked, tlb, mem, snoop, dcacheline, blocked.
- symbol_daddr: name of data symbol being executed on at the time of sample
- dso_daddr: name of library or module containing the data being executed
......@@ -151,9 +155,11 @@ OPTIONS
- dcacheline: the cacheline the data address is on at the time of the sample
- phys_daddr: physical address of data being executed on at the time of sample
- data_page_size: the data page size of data being executed on at the time of sample
- blocked: reason of blocked load access for the data at the time of the sample
And the default sort keys are changed to local_weight, mem, sym, dso,
symbol_daddr, dso_daddr, snoop, tlb, locked, see '--mem-mode'.
symbol_daddr, dso_daddr, snoop, tlb, locked, blocked, local_ins_lat,
see '--mem-mode'.
If the data file has tracepoint event(s), following (dynamic) sort keys
are also available:
......
......@@ -118,7 +118,7 @@ OPTIONS
comm, tid, pid, time, cpu, event, trace, ip, sym, dso, addr, symoff,
srcline, period, iregs, uregs, brstack, brstacksym, flags, bpf-output,
brstackinsn, brstackoff, callindent, insn, insnlen, synth, phys_addr,
metric, misc, srccode, ipc, data_page_size.
metric, misc, srccode, ipc, data_page_size, code_page_size.
Field list can be prepended with the type, trace, sw or hw,
to indicate to which event type the field list applies.
e.g., -F sw:comm,tid,time,ip,sym and -F trace:time,cpu,trace
......@@ -422,9 +422,32 @@ include::itrace.txt[]
Only consider the listed symbols. Symbols are typically a name
but they may also be hexadecimal address.
The hexadecimal address may be the start address of a symbol or
any other address to filter the trace records
For example, to select the symbol noploop or the address 0x4007a0:
perf script --symbols=noploop,0x4007a0
Support filtering trace records by symbol name, start address of
symbol, any hexadecimal address and address range.
The comparison order is:
1. symbol name comparison
2. symbol start address comparison.
3. any hexadecimal address comparison.
4. address range comparison (see --addr-range).
--addr-range::
Use with -S or --symbols to list traced records within address range.
For example, to list the traced records within the address range
[0x4007a0, 0x4007a9]:
perf script -S 0x4007a0 --addr-range 10
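The range in this example is inclusive at both ends, so a size of 10 starting at 0x4007a0 ends at 0x4007a9. A minimal sketch of that membership test follows; `addr_in_range` is a hypothetical helper written for illustration, not the function perf script uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if addr lies in [start, start + size - 1].
 * Written as a subtraction so that start + size cannot overflow.
 */
static bool addr_in_range(uint64_t addr, uint64_t start, uint64_t size)
{
	assert(size > 0);
	return addr >= start && addr - start < size;
}
```

With `start = 0x4007a0` and `size = 10`, the address 0x4007a9 is accepted and 0x4007aa is rejected.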
--dsos=::
Only consider symbols in these DSOs.
--call-trace::
Show call stream for intel_pt traces. The CPUs are interleaved, but
can be filtered with -C.
......
......@@ -75,6 +75,24 @@ report::
--tid=<tid>::
stat events on existing thread id (comma separated list)
-b::
--bpf-prog::
stat events on existing bpf program id (comma separated list),
requiring root rights. bpftool-prog can be used to find the program
ids of all bpf programs in the system. For example:
# bpftool prog | head -n 1
17247: tracepoint name sys_enter tag 192d548b9d754067 gpl
# perf stat -e cycles,instructions --bpf-prog 17247 --timeout 1000
Performance counter stats for 'BPF program(s) 17247':
85,967 cycles
28,982 instructions # 0.34 insn per cycle
1.102235068 seconds time elapsed
ifdef::HAVE_LIBPFM[]
--pfm-events events::
Select a PMU event using libpfm4 syntax (see http://perfmon2.sf.net)
......@@ -358,7 +376,7 @@ See perf list output for the possble metrics and metricgroups.
Do not aggregate counts across all monitored CPUs.
--topdown::
Print top down level 1 metrics if supported by the CPU. This allows to
Print complete top-down metrics supported by the CPU. This allows to
determine bottlenecks in the CPU pipeline for CPU bound workloads,
by breaking the cycles consumed down into frontend bound, backend bound,
bad speculation and retiring.
......@@ -393,6 +411,18 @@ To interpret the results it is usually needed to know on which
CPUs the workload runs on. If needed the CPUs can be forced using
taskset.
--td-level::
Print the top-down statistics that are equal to or lower than the input level.
It allows users to print only the top-down metrics levels of interest instead of
the complete top-down metrics.
The availability of the top-down metrics level depends on the hardware. For
example, Ice Lake only supports L1 top-down metrics. The Sapphire Rapids
supports both L1 and L2 top-down metrics.
Default: 0 means the max level that the current hardware supports.
Error out if the input is higher than the supported max level.
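The rules stated for --td-level (0 selects the hardware maximum, anything above the maximum is an error) can be sketched as below. `resolve_td_level` and its `hw_max_level` parameter are hypothetical names chosen for illustration; this is not the code in builtin-stat.c.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: resolve a requested --td-level against the highest
 * top-down level the hardware supports (1 on Ice Lake, 2 on Sapphire
 * Rapids, per the option text). Returns the effective level, or -EINVAL
 * if the request is negative or above the supported maximum.
 */
static int resolve_td_level(int requested, int hw_max_level)
{
	assert(hw_max_level >= 1);
	if (requested < 0 || requested > hw_max_level)
		return -EINVAL;
	return requested == 0 ? hw_max_level : requested;
}
```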
--no-merge::
Do not merge results from same PMUs.
......
......@@ -121,7 +121,7 @@ to read slots and the topdown metrics at different points of the program:
#define RDPMC_METRIC (1 << 29) /* return metric counters */
#define FIXED_COUNTER_SLOTS 3
#define METRIC_COUNTER_TOPDOWN_L1 0
#define METRIC_COUNTER_TOPDOWN_L1_L2 0
static inline uint64_t read_slots(void)
{
......@@ -130,7 +130,7 @@ static inline uint64_t read_slots(void)
static inline uint64_t read_metrics(void)
{
return _rdpmc(RDPMC_METRIC | METRIC_COUNTER_TOPDOWN_L1);
return _rdpmc(RDPMC_METRIC | METRIC_COUNTER_TOPDOWN_L1_L2);
}
Then the program can be instrumented to read these metrics at different
......@@ -152,11 +152,21 @@ The binary ratios in the metric value can be converted to float ratios:
#define GET_METRIC(m, i) (((m) >> (i*8)) & 0xff)
/* L1 Topdown metric events */
#define TOPDOWN_RETIRING(val) ((float)GET_METRIC(val, 0) / 0xff)
#define TOPDOWN_BAD_SPEC(val) ((float)GET_METRIC(val, 1) / 0xff)
#define TOPDOWN_FE_BOUND(val) ((float)GET_METRIC(val, 2) / 0xff)
#define TOPDOWN_BE_BOUND(val) ((float)GET_METRIC(val, 3) / 0xff)
/*
* L2 Topdown metric events.
* Available on Sapphire Rapids and later platforms.
*/
#define TOPDOWN_HEAVY_OPS(val) ((float)GET_METRIC(val, 4) / 0xff)
#define TOPDOWN_BR_MISPREDICT(val) ((float)GET_METRIC(val, 5) / 0xff)
#define TOPDOWN_FETCH_LAT(val) ((float)GET_METRIC(val, 6) / 0xff)
#define TOPDOWN_MEM_BOUND(val) ((float)GET_METRIC(val, 7) / 0xff)
and then converted to percent for printing.
The ratios in the metric accumulate for the time when the counter
......@@ -190,8 +200,8 @@ for that time period.
fe_bound_slots = GET_METRIC(metric_b, 2) * slots_b - fe_bound_slots_a
be_bound_slots = GET_METRIC(metric_b, 3) * slots_b - be_bound_slots_a
Later the individual ratios for the measurement period can be recreated
from these counts.
Later the individual ratios of L1 metric events for the measurement period can
be recreated from these counts.
slots_delta = slots_b - slots_a
retiring_ratio = (float)retiring_slots / slots_delta
......@@ -205,6 +215,48 @@ from these counts.
fe_bound_ratio * 100.,
be_bound_ratio * 100.);
The individual ratios of L2 metric events for the measurement period can be
recreated from L1 and L2 metric counters. (Available on Sapphire Rapids and
later platforms)
# compute scaled metrics for measurement a
heavy_ops_slots_a = GET_METRIC(metric_a, 4) * slots_a
br_mispredict_slots_a = GET_METRIC(metric_a, 5) * slots_a
fetch_lat_slots_a = GET_METRIC(metric_a, 6) * slots_a
mem_bound_slots_a = GET_METRIC(metric_a, 7) * slots_a
# compute delta scaled metrics between b and a
heavy_ops_slots = GET_METRIC(metric_b, 4) * slots_b - heavy_ops_slots_a
br_mispredict_slots = GET_METRIC(metric_b, 5) * slots_b - br_mispredict_slots_a
fetch_lat_slots = GET_METRIC(metric_b, 6) * slots_b - fetch_lat_slots_a
mem_bound_slots = GET_METRIC(metric_b, 7) * slots_b - mem_bound_slots_a
slots_delta = slots_b - slots_a
heavy_ops_ratio = (float)heavy_ops_slots / slots_delta
light_ops_ratio = retiring_ratio - heavy_ops_ratio;
br_mispredict_ratio = (float)br_mispredict_slots / slots_delta
machine_clears_ratio = bad_spec_ratio - br_mispredict_ratio;
fetch_lat_ratio = (float)fetch_lat_slots / slots_delta
fetch_bw_ratio = fe_bound_ratio - fetch_lat_ratio;
mem_bound_ratio = (float)mem_bound_slots / slots_delta
core_bound_ratio = be_bound_ratio - mem_bound_ratio;
printf("Heavy Operations %.2f%% Light Operations %.2f%% "
"Branch Mispredict %.2f%% Machine Clears %.2f%% "
"Fetch Latency %.2f%% Fetch Bandwidth %.2f%% "
"Mem Bound %.2f%% Core Bound %.2f%%\n",
heavy_ops_ratio * 100.,
light_ops_ratio * 100.,
br_mispredict_ratio * 100.,
machine_clears_ratio * 100.,
fetch_lat_ratio * 100.,
fetch_bw_ratio * 100.,
mem_bound_ratio * 100.,
core_bound_ratio * 100.);
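The pseudo-code above can be condensed into one C function that works for any of the eight metric fields. This is a sketch under stated assumptions: `topdown_interval_ratio` is a hypothetical name, and the division by 0xff is made explicit here so that the result is a fraction between 0 and 1, following the TOPDOWN_* conversion macros shown earlier.

```c
#include <assert.h>
#include <stdint.h>

#define GET_METRIC(m, i) (((m) >> ((i) * 8)) & 0xff)

/*
 * Fraction of the slots between snapshot a and snapshot b that is
 * attributed to metric field idx (0-3: level 1, 4-7: level 2).
 * Each snapshot is a (metric register, slots counter) pair.
 */
static double topdown_interval_ratio(uint64_t metric_a, uint64_t slots_a,
				     uint64_t metric_b, uint64_t slots_b,
				     unsigned int idx)
{
	double scaled_a, scaled_b;

	assert(idx < 8 && slots_b > slots_a);
	/* Scale each cumulative ratio by its slots count to get slots. */
	scaled_a = (double)GET_METRIC(metric_a, idx) / 0xff * (double)slots_a;
	scaled_b = (double)GET_METRIC(metric_b, idx) / 0xff * (double)slots_b;
	/* The difference in slots over the interval, as a fraction. */
	return (scaled_b - scaled_a) / (double)(slots_b - slots_a);
}
```

Calling it with `idx = 4` yields the Heavy Operations ratio; Light Operations then follows as the `idx = 0` result minus the `idx = 4` result.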
Resetting metrics counters
==========================
......@@ -248,6 +300,24 @@ a sampling read group. Since the SLOTS event must be the leader of a TopDown
group, the second event of the group is the sampling event.
For example, perf record -e '{slots, $sampling_event, topdown-retiring}:S'
Extension on Sapphire Rapids Server
===================================
The metrics counter is extended to support TMA method level 2 metrics.
The lower half of the register is the TMA level 1 metrics (legacy).
The upper half is also divided into four 8-bit fields for the new level 2
metrics. Four more TopDown metric events are exposed for the end-users,
topdown-heavy-ops, topdown-br-mispredict, topdown-fetch-lat and
topdown-mem-bound.
Each of the new level 2 metrics in the upper half is a subset of the
corresponding level 1 metric in the lower half. Software can deduce the
other four level 2 metrics by subtracting corresponding metrics as below.
Light_Operations = Retiring - Heavy_Operations
Machine_Clears = Bad_Speculation - Branch_Mispredicts
Fetch_Bandwidth = Frontend_Bound - Fetch_Latency
Core_Bound = Backend_Bound - Memory_Bound
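As an illustration of the four subtractions above, the sketch below decodes a single 64-bit metrics value into all eight level 2 ratios. The struct and function names are hypothetical, and the field order (bytes 0-3 for level 1, bytes 4-7 for level 2) is taken from the GET_METRIC-based macros shown earlier in this file.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the eight level 2 ratios (0.0 to 1.0). */
struct topdown_l2 {
	double heavy_ops, light_ops;
	double br_mispredict, machine_clears;
	double fetch_lat, fetch_bw;
	double mem_bound, core_bound;
};

/* Convert the 8-bit field idx of the metrics value to a ratio. */
static double metric_field(uint64_t val, unsigned int idx)
{
	assert(idx < 8);
	return (double)((val >> (idx * 8)) & 0xff) / 0xff;
}

/* Four ratios are read directly, four are derived by subtraction. */
static struct topdown_l2 topdown_decode_l2(uint64_t val)
{
	struct topdown_l2 t;

	t.heavy_ops      = metric_field(val, 4);
	t.light_ops      = metric_field(val, 0) - t.heavy_ops;
	t.br_mispredict  = metric_field(val, 5);
	t.machine_clears = metric_field(val, 1) - t.br_mispredict;
	t.fetch_lat      = metric_field(val, 6);
	t.fetch_bw       = metric_field(val, 2) - t.fetch_lat;
	t.mem_bound      = metric_field(val, 7);
	t.core_bound     = metric_field(val, 3) - t.mem_bound;
	return t;
}
```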
[1] https://software.intel.com/en-us/top-down-microarchitecture-analysis-method-win
[2] https://github.com/andikleen/pmu-tools/wiki/toplev-manual
......
......@@ -621,6 +621,15 @@ ifndef NO_LIBBPF
endif
endif
ifdef BUILD_BPF_SKEL
$(call feature_check,clang-bpf-co-re)
ifeq ($(feature-clang-bpf-co-re), 0)
dummy := $(error Error: clang too old. Please install recent clang)
endif
$(call detected,CONFIG_PERF_BPF_SKEL)
CFLAGS += -DHAVE_BPF_SKEL
endif
dwarf-post-unwind := 1
dwarf-post-unwind-text := BUG
......
......@@ -126,6 +126,8 @@ include ../scripts/utilities.mak
#
# Define NO_LIBDEBUGINFOD if you do not want support debuginfod
#
# Define BUILD_BPF_SKEL to enable BPF skeletons
#
# As per kernel Makefile, avoid funny character set dependencies
unexport LC_ALL
......@@ -175,6 +177,12 @@ endef
LD += $(EXTRA_LDFLAGS)
HOSTCC ?= gcc
HOSTLD ?= ld
HOSTAR ?= ar
CLANG ?= clang
LLVM_STRIP ?= llvm-strip
PKG_CONFIG = $(CROSS_COMPILE)pkg-config
RM = rm -f
......@@ -730,7 +738,8 @@ prepare: $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)common-cmds.h archheaders $(drm_ioc
$(x86_arch_prctl_code_array) \
$(rename_flags_array) \
$(arch_errno_name_array) \
$(sync_file_range_arrays)
$(sync_file_range_arrays) \
bpf-skel
$(OUTPUT)%.o: %.c prepare FORCE
$(Q)$(MAKE) -f $(srctree)/tools/build/Makefile.build dir=$(build-dir) $@
......@@ -1003,7 +1012,43 @@ config-clean:
python-clean:
$(python-clean)
clean:: $(LIBTRACEEVENT)-clean $(LIBAPI)-clean $(LIBBPF)-clean $(LIBSUBCMD)-clean $(LIBPERF)-clean config-clean fixdep-clean python-clean
SKEL_OUT := $(abspath $(OUTPUT)util/bpf_skel)
SKEL_TMP_OUT := $(abspath $(SKEL_OUT)/.tmp)
SKELETONS := $(SKEL_OUT)/bpf_prog_profiler.skel.h
ifdef BUILD_BPF_SKEL
BPFTOOL := $(SKEL_TMP_OUT)/bootstrap/bpftool
LIBBPF_SRC := $(abspath ../lib/bpf)
BPF_INCLUDE := -I$(SKEL_TMP_OUT)/.. -I$(BPF_PATH) -I$(LIBBPF_SRC)/..
$(SKEL_TMP_OUT):
$(Q)$(MKDIR) -p $@
$(BPFTOOL): | $(SKEL_TMP_OUT)
CFLAGS= $(MAKE) -C ../bpf/bpftool \
OUTPUT=$(SKEL_TMP_OUT)/ bootstrap
$(SKEL_TMP_OUT)/%.bpf.o: util/bpf_skel/%.bpf.c $(LIBBPF) | $(SKEL_TMP_OUT)
$(QUIET_CLANG)$(CLANG) -g -O2 -target bpf $(BPF_INCLUDE) \
-c $(filter util/bpf_skel/%.bpf.c,$^) -o $@ && $(LLVM_STRIP) -g $@
$(SKEL_OUT)/%.skel.h: $(SKEL_TMP_OUT)/%.bpf.o | $(BPFTOOL)
$(QUIET_GENSKEL)$(BPFTOOL) gen skeleton $< > $@
bpf-skel: $(SKELETONS)
.PRECIOUS: $(SKEL_TMP_OUT)/%.bpf.o
else # BUILD_BPF_SKEL
bpf-skel:
endif # BUILD_BPF_SKEL
bpf-skel-clean:
$(call QUIET_CLEAN, bpf-skel) $(RM) -r $(SKEL_TMP_OUT) $(SKELETONS)
clean:: $(LIBTRACEEVENT)-clean $(LIBAPI)-clean $(LIBBPF)-clean $(LIBSUBCMD)-clean $(LIBPERF)-clean config-clean fixdep-clean python-clean bpf-skel-clean
$(call QUIET_CLEAN, core-objs) $(RM) $(LIBPERF_A) $(OUTPUT)perf-archive $(OUTPUT)perf-with-kcore $(LANG_BINDINGS)
$(Q)find $(if $(OUTPUT),$(OUTPUT),.) -name '*.o' -delete -o -name '\.*.cmd' -delete -o -name '\.*.d' -delete
$(Q)$(RM) $(OUTPUT).config-detected
......
......@@ -15,7 +15,7 @@ void perf_regs_load(u64 *regs);
#define PERF_REG_IP PERF_REG_ARM_PC
#define PERF_REG_SP PERF_REG_ARM_SP
static inline const char *perf_reg_name(int id)
static inline const char *__perf_reg_name(int id)
{
switch (id) {
case PERF_REG_ARM_R0:
......
......@@ -15,7 +15,7 @@ void perf_regs_load(u64 *regs);
#define PERF_REG_IP PERF_REG_ARM64_PC
#define PERF_REG_SP PERF_REG_ARM64_SP
static inline const char *perf_reg_name(int id)
static inline const char *__perf_reg_name(int id)
{
switch (id) {
case PERF_REG_ARM64_X0:
......
// SPDX-License-Identifier: GPL-2.0
#include <inttypes.h>
#include <stdio.h>
#include <string.h>
#include "debug.h"
......@@ -23,5 +24,5 @@ void arch__symbols__fixup_end(struct symbol *p, struct symbol *c)
p->end += SYMBOL_LIMIT;
else
p->end = c->start;
pr_debug4("%s sym:%s end:%#lx\n", __func__, p->name, p->end);
pr_debug4("%s sym:%s end:%#" PRIx64 "\n", __func__, p->name, p->end);
}
// SPDX-License-Identifier: GPL-2.0
#include <errno.h>
#include <regex.h>
#include <string.h>
#include <linux/kernel.h>
#include <linux/zalloc.h>
#include "../../../util/debug.h"
#include "../../../util/event.h"
#include "../../../util/perf_regs.h"
const struct sample_reg sample_reg_masks[] = {
......@@ -37,3 +45,89 @@ const struct sample_reg sample_reg_masks[] = {
SMPL_REG(pc, PERF_REG_ARM64_PC),
SMPL_REG_END
};
/* %xNUM */
#define SDT_OP_REGEX1 "^(x([1-2]?[0-9]|3[0-1]))$"
/* [sp], [sp, NUM] */
#define SDT_OP_REGEX2 "^\\[sp(, )?([0-9]+)?\\]$"
static regex_t sdt_op_regex1, sdt_op_regex2;
static int sdt_init_op_regex(void)
{
static int initialized;
int ret = 0;
if (initialized)
return 0;
ret = regcomp(&sdt_op_regex1, SDT_OP_REGEX1, REG_EXTENDED);
if (ret)
goto error;
ret = regcomp(&sdt_op_regex2, SDT_OP_REGEX2, REG_EXTENDED);
if (ret)
goto free_regex1;
initialized = 1;
return 0;
free_regex1:
regfree(&sdt_op_regex1);
error:
pr_debug4("Regex compilation error.\n");
return ret;
}
/*
* SDT marker arguments on Arm64 use %xREG or [sp, NUM]; currently only
* these two formats are supported.
*/
int arch_sdt_arg_parse_op(char *old_op, char **new_op)
{
int ret, new_len;
regmatch_t rm[5];
ret = sdt_init_op_regex();
if (ret < 0)
return ret;
if (!regexec(&sdt_op_regex1, old_op, 3, rm, 0)) {
/* Extract xNUM */
new_len = 2; /* % NULL */
new_len += (int)(rm[1].rm_eo - rm[1].rm_so);
*new_op = zalloc(new_len);
if (!*new_op)
return -ENOMEM;
scnprintf(*new_op, new_len, "%%%.*s",
(int)(rm[1].rm_eo - rm[1].rm_so), old_op + rm[1].rm_so);
} else if (!regexec(&sdt_op_regex2, old_op, 5, rm, 0)) {
/* [sp], [sp, NUM] or [sp,NUM] */
new_len = 7; /* + ( % s p ) NULL */
/* If the argument is [sp], need to fill offset '0' */
if (rm[2].rm_so == -1)
new_len += 1;
else
new_len += (int)(rm[2].rm_eo - rm[2].rm_so);
*new_op = zalloc(new_len);
if (!*new_op)
return -ENOMEM;
if (rm[2].rm_so == -1)
scnprintf(*new_op, new_len, "+0(%%sp)");
else
scnprintf(*new_op, new_len, "+%.*s(%%sp)",
(int)(rm[2].rm_eo - rm[2].rm_so),
old_op + rm[2].rm_so);
} else {
pr_debug4("Skipping unsupported SDT argument: %s\n", old_op);
return SDT_ARG_SKIP;
}
return SDT_ARG_VALID;
}
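The conversion performed by arch_sdt_arg_parse_op() can be tried outside the perf tree with plain POSIX regex calls. The sketch below is a standalone approximation, not the kernel code: `sdt_arm64_op_to_uprobe` is a hypothetical name, the patterns are compiled on every call for brevity, and the register pattern groups the alternation so that x30 and x31 match, which is an assumption about the intended behaviour.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: rewrite an Arm64 SDT operand into uprobe syntax.
 *   "x0"      -> "%x0"
 *   "[sp]"    -> "+0(%sp)"
 *   "[sp, 8]" -> "+8(%sp)"
 * Returns 0 on success, -1 for unsupported operands or regex errors.
 */
static int sdt_arm64_op_to_uprobe(const char *op, char *out, size_t outsz)
{
	regex_t re_reg, re_sp;
	regmatch_t m[3];
	int ret = -1;

	assert(op != NULL && out != NULL && outsz > 0);

	if (regcomp(&re_reg, "^(x([1-2]?[0-9]|3[0-1]))$", REG_EXTENDED))
		return -1;
	if (regcomp(&re_sp, "^\\[sp(, ?([0-9]+))?\\]$", REG_EXTENDED)) {
		regfree(&re_reg);
		return -1;
	}

	if (!regexec(&re_reg, op, 3, m, 0)) {
		/* Group 1 holds the register name, e.g. "x12". */
		snprintf(out, outsz, "%%%.*s",
			 (int)(m[1].rm_eo - m[1].rm_so), op + m[1].rm_so);
		ret = 0;
	} else if (!regexec(&re_sp, op, 3, m, 0)) {
		/* Group 2 holds the offset; it is unset for plain "[sp]". */
		if (m[2].rm_so == -1)
			snprintf(out, outsz, "+0(%%sp)");
		else
			snprintf(out, outsz, "+%.*s(%%sp)",
				 (int)(m[2].rm_eo - m[2].rm_so),
				 op + m[2].rm_so);
		ret = 0;
	}

	regfree(&re_reg);
	regfree(&re_sp);
	return ret;
}
```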
......@@ -15,7 +15,7 @@
#define PERF_REG_IP PERF_REG_CSKY_PC
#define PERF_REG_SP PERF_REG_CSKY_SP
static inline const char *perf_reg_name(int id)
static inline const char *__perf_reg_name(int id)
{
switch (id) {
case PERF_REG_CSKY_A0:
......
......@@ -71,9 +71,15 @@ static const char *reg_names[] = {
[PERF_REG_POWERPC_MMCR3] = "mmcr3",
[PERF_REG_POWERPC_SIER2] = "sier2",
[PERF_REG_POWERPC_SIER3] = "sier3",
[PERF_REG_POWERPC_PMC1] = "pmc1",
[PERF_REG_POWERPC_PMC2] = "pmc2",
[PERF_REG_POWERPC_PMC3] = "pmc3",
[PERF_REG_POWERPC_PMC4] = "pmc4",
[PERF_REG_POWERPC_PMC5] = "pmc5",
[PERF_REG_POWERPC_PMC6] = "pmc6",
};
static inline const char *perf_reg_name(int id)
static inline const char *__perf_reg_name(int id)
{
return reg_names[id];
}
......
perf-y += header.o
perf-y += machine.o
perf-y += kvm-stat.o
perf-y += perf_regs.o
perf-y += mem-events.o
......
// SPDX-License-Identifier: GPL-2.0
#include <inttypes.h>
#include <stdio.h>
#include <string.h>
#include <internal/lib.h> // page_size
#include "debug.h"
#include "symbol.h"
/* On powerpc, the kernel text segment starts at memory address 0xc000000000000000,
* whereas the modules are located at very high memory addresses,
* for example 0xc00800000xxxxxxx. The gap between the end of the kernel text segment
* and the beginning of the first module's text segment is very large.
* Therefore do not fill this gap and do not assign it to the kernel dso map.
*/
void arch__symbols__fixup_end(struct symbol *p, struct symbol *c)
{
if (strchr(p->name, '[') == NULL && strchr(c->name, '['))
/* Limit the range of last kernel symbol */
p->end += page_size;
else
p->end = c->start;
pr_debug4("%s sym:%s end:%#" PRIx64 "\n", __func__, p->name, p->end);
}
......@@ -68,6 +68,12 @@ const struct sample_reg sample_reg_masks[] = {
SMPL_REG(mmcr3, PERF_REG_POWERPC_MMCR3),
SMPL_REG(sier2, PERF_REG_POWERPC_SIER2),
SMPL_REG(sier3, PERF_REG_POWERPC_SIER3),
SMPL_REG(pmc1, PERF_REG_POWERPC_PMC1),
SMPL_REG(pmc2, PERF_REG_POWERPC_PMC2),
SMPL_REG(pmc3, PERF_REG_POWERPC_PMC3),
SMPL_REG(pmc4, PERF_REG_POWERPC_PMC4),
SMPL_REG(pmc5, PERF_REG_POWERPC_PMC5),
SMPL_REG(pmc6, PERF_REG_POWERPC_PMC6),
SMPL_REG_END
};
......
......@@ -19,7 +19,7 @@
#define PERF_REG_IP PERF_REG_RISCV_PC
#define PERF_REG_SP PERF_REG_RISCV_SP
static inline const char *perf_reg_name(int id)
static inline const char *__perf_reg_name(int id)
{
switch (id) {
case PERF_REG_RISCV_PC:
......
......@@ -14,7 +14,7 @@ void perf_regs_load(u64 *regs);
#define PERF_REG_IP PERF_REG_S390_PC
#define PERF_REG_SP PERF_REG_S390_R15
static inline const char *perf_reg_name(int id)
static inline const char *__perf_reg_name(int id)
{
switch (id) {
case PERF_REG_S390_R0:
......
// SPDX-License-Identifier: GPL-2.0
#include <inttypes.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>
......@@ -48,5 +49,5 @@ void arch__symbols__fixup_end(struct symbol *p, struct symbol *c)
p->end = roundup(p->end, page_size);
else
p->end = c->start;
pr_debug4("%s sym:%s end:%#lx\n", __func__, p->name, p->end);
pr_debug4("%s sym:%s end:%#" PRIx64 "\n", __func__, p->name, p->end);
}
......@@ -23,7 +23,7 @@ void perf_regs_load(u64 *regs);
#define PERF_REG_IP PERF_REG_X86_IP
#define PERF_REG_SP PERF_REG_X86_SP
static inline const char *perf_reg_name(int id)
static inline const char *__perf_reg_name(int id)
{
switch (id) {
case PERF_REG_X86_AX:
......
......@@ -48,6 +48,7 @@ static int get_op(const char *op_str)
{"int", INTEL_PT_OP_INT},
{"syscall", INTEL_PT_OP_SYSCALL},
{"sysret", INTEL_PT_OP_SYSRET},
{"vmentry", INTEL_PT_OP_VMENTRY},
{NULL, 0},
};
struct val_data *val;
......
......@@ -66,8 +66,8 @@ struct test_data {
{7, {0x9d, 1, 2, 3, 4, 5, 6}, 0, {INTEL_PT_FUP, 4, 0x60504030201}, 0, 0 },
{9, {0xdd, 1, 2, 3, 4, 5, 6, 7, 8}, 0, {INTEL_PT_FUP, 6, 0x807060504030201}, 0, 0 },
/* Paging Information Packet */
{8, {0x02, 0x43, 2, 4, 6, 8, 10, 12}, 0, {INTEL_PT_PIP, 0, 0x60504030201}, 0, 0 },
{8, {0x02, 0x43, 3, 4, 6, 8, 10, 12}, 0, {INTEL_PT_PIP, 0, 0x60504030201 | (1ULL << 63)}, 0, 0 },
{8, {0x02, 0x43, 2, 4, 6, 8, 10, 12}, 0, {INTEL_PT_PIP, 0, 0xC0A08060402}, 0, 0 },
{8, {0x02, 0x43, 3, 4, 6, 8, 10, 12}, 0, {INTEL_PT_PIP, 0, 0xC0A08060403}, 0, 0 },
/* Mode Exec Packet */
{2, {0x99, 0x00}, 0, {INTEL_PT_MODE_EXEC, 0, 16}, 0, 0 },
{2, {0x99, 0x01}, 0, {INTEL_PT_MODE_EXEC, 0, 64}, 0, 0 },
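The updated Paging Information Packet expectations in this hunk are consistent with reading the six payload bytes that follow the two-byte header as one little-endian value. The sketch below only checks that arithmetic; `le_bytes_to_u64` is a hypothetical helper and makes no claim about how the Intel PT packet decoder is structured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read len bytes (at most 8) as an unsigned little-endian integer. */
static uint64_t le_bytes_to_u64(const unsigned char *buf, size_t len)
{
	uint64_t v = 0;
	size_t i;

	assert(buf != NULL && len <= 8);
	for (i = 0; i < len; i++)
		v |= (uint64_t)buf[i] << (8 * i);
	return v;
}
```

For the packet bytes {0x02, 0x43, 2, 4, 6, 8, 10, 12}, decoding the six bytes starting at offset 2 gives 0xC0A08060402, the value the new test entry expects.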
......
......@@ -6,6 +6,9 @@ perf-y += perf_regs.o
perf-y += topdown.o
perf-y += machine.o
perf-y += event.o
perf-y += evlist.o
perf-y += mem-events.o
perf-y += evsel.o
perf-$(CONFIG_DWARF) += dwarf-regs.o
perf-$(CONFIG_BPF_PROLOGUE) += dwarf-regs.o
......
......@@ -75,3 +75,28 @@ int perf_event__synthesize_extra_kmaps(struct perf_tool *tool,
}
#endif
void arch_perf_parse_sample_weight(struct perf_sample *data,
const __u64 *array, u64 type)
{
union perf_sample_weight weight;
weight.full = *array;
if (type & PERF_SAMPLE_WEIGHT)
data->weight = weight.full;
else {
data->weight = weight.var1_dw;
data->ins_lat = weight.var2_w;
}
}
void arch_perf_synthesize_sample_weight(const struct perf_sample *data,
__u64 *array, u64 type)
{
*array = data->weight;
if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
*array &= 0xffffffff;
*array |= ((u64)data->ins_lat << 32);
}
}
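The two functions above treat the 64-bit sample weight as a 32-bit weight in the low half plus a 16-bit instruction latency starting at bit 32. A minimal round-trip sketch with explicit shifts follows; the struct and function names are hypothetical, and the bit positions are inferred from the `<< 32` in arch_perf_synthesize_sample_weight() rather than from the union definition, which is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of the fields carried by PERF_SAMPLE_WEIGHT_STRUCT. */
struct weight_fields {
	uint32_t weight;  /* bits 0-31: memory latency (weight) */
	uint16_t ins_lat; /* bits 32-47: instruction latency */
};

/* Pack both fields into one 64-bit value. */
static uint64_t weight_struct_pack(uint32_t weight, uint16_t ins_lat)
{
	return (uint64_t)weight | ((uint64_t)ins_lat << 32);
}

/* Split a 64-bit value back into its two fields. */
static struct weight_fields weight_struct_unpack(uint64_t raw)
{
	struct weight_fields f = {
		.weight = (uint32_t)(raw & 0xffffffffULL),
		.ins_lat = (uint16_t)((raw >> 32) & 0xffff),
	};

	return f;
}
```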
// SPDX-License-Identifier: GPL-2.0
#include <stdio.h>
#include "util/pmu.h"
#include "util/evlist.h"
#include "util/parse-events.h"
#define TOPDOWN_L1_EVENTS "{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound}"
int arch_evlist__add_default_attrs(struct evlist *evlist)
{
if (!pmu_have_event("cpu", "slots"))
return 0;
return parse_events(evlist, TOPDOWN_L1_EVENTS, NULL);
}
// SPDX-License-Identifier: GPL-2.0
#include <stdio.h>
#include "util/evsel.h"
void arch_evsel__set_sample_weight(struct evsel *evsel)
{
evsel__set_sample_bit(evsel, WEIGHT_STRUCT);
}
// SPDX-License-Identifier: GPL-2.0
#include "util/pmu.h"
#include "map_symbol.h"
#include "mem-events.h"
static char mem_loads_name[100];
static bool mem_loads_name__init;
#define MEM_LOADS_AUX 0x8203
#define MEM_LOADS_AUX_NAME "{cpu/mem-loads-aux/,cpu/mem-loads,ldlat=%u/pp}:S"
bool is_mem_loads_aux_event(struct evsel *leader)
{
if (!pmu_have_event("cpu", "mem-loads-aux"))
return false;
return leader->core.attr.config == MEM_LOADS_AUX;
}
char *perf_mem_events__name(int i)
{
struct perf_mem_event *e = perf_mem_events__ptr(i);
if (!e)
return NULL;
if (i == PERF_MEM_EVENTS__LOAD) {
if (mem_loads_name__init)
return mem_loads_name;
mem_loads_name__init = true;
if (pmu_have_event("cpu", "mem-loads-aux")) {
scnprintf(mem_loads_name, sizeof(mem_loads_name),
MEM_LOADS_AUX_NAME, perf_mem_events__loads_ldlat);
} else {
scnprintf(mem_loads_name, sizeof(mem_loads_name),
e->name, perf_mem_events__loads_ldlat);
}
return mem_loads_name;
}
return (char *)e->name;
}
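The name selection in perf_mem_events__name() comes down to filling the load latency threshold into one of two event templates. The sketch below shows that step in isolation. `format_mem_loads_event` and its `has_aux` flag are hypothetical, and the fallback template string is an assumption, since the value of `e->name` is not part of this diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the load event string for a given ldlat.
 * has_aux stands in for pmu_have_event("cpu", "mem-loads-aux").
 * Returns the snprintf() result.
 */
static int format_mem_loads_event(char *buf, size_t size, bool has_aux,
				  unsigned int ldlat)
{
	assert(buf != NULL && size > 0);
	if (has_aux)
		return snprintf(buf, size,
				"{cpu/mem-loads-aux/,cpu/mem-loads,ldlat=%u/pp}:S",
				ldlat);
	/* Assumed fallback template; the real one comes from e->name. */
	return snprintf(buf, size, "cpu/mem-loads,ldlat=%u/P", ldlat);
}
```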
......@@ -21,7 +21,6 @@
#include <sys/resource.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <internal/cpumap.h>
#include <perf/cpumap.h>
#include "../util/stat.h"
......
......@@ -76,7 +76,6 @@
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <internal/cpumap.h>
#include <perf/cpumap.h>
#include "../util/stat.h"
......
......@@ -20,7 +20,6 @@
#include <linux/kernel.h>
#include <linux/zalloc.h>
#include <sys/time.h>
#include <internal/cpumap.h>
#include <perf/cpumap.h>
#include "../util/stat.h"
......
......@@ -14,7 +14,6 @@
#include <linux/kernel.h>
#include <linux/zalloc.h>
#include <errno.h>
#include <internal/cpumap.h>
#include <perf/cpumap.h>
#include "bench.h"
#include "futex.h"
......
......@@ -20,7 +20,6 @@
#include <linux/kernel.h>
#include <linux/time64.h>
#include <errno.h>
#include <internal/cpumap.h>
#include <perf/cpumap.h>
#include "bench.h"
#include "futex.h"
......
......@@ -29,7 +29,6 @@ int bench_futex_wake_parallel(int argc __maybe_unused, const char **argv __maybe
#include <linux/time64.h>
#include <errno.h>
#include "futex.h"
#include <internal/cpumap.h>
#include <perf/cpumap.h>
#include <err.h>
......
......@@ -20,7 +20,6 @@
#include <linux/kernel.h>
#include <linux/time64.h>
#include <errno.h>
#include <internal/cpumap.h>
#include <perf/cpumap.h>
#include "bench.h"
#include "futex.h"
......
......@@ -27,6 +27,7 @@
#include "util/time-utils.h"
#include "util/util.h"
#include "util/probe-file.h"
#include "util/config.h"
#include <linux/string.h>
#include <linux/err.h>
......@@ -348,12 +349,21 @@ static int build_id_cache__show_all(void)
return 0;
}
static int perf_buildid_cache_config(const char *var, const char *value, void *cb)
{
const char **debuginfod = cb;
if (!strcmp(var, "buildid-cache.debuginfod"))
*debuginfod = strdup(value);
return 0;
}
int cmd_buildid_cache(int argc, const char **argv)
{
struct strlist *list;
struct str_node *pos;
int ret = 0;
int ns_id = -1;
int ret, ns_id = -1;
bool force = false;
bool list_files = false;
bool opts_flag = false;
......@@ -363,7 +373,8 @@ int cmd_buildid_cache(int argc, const char **argv)
*purge_name_list_str = NULL,
*missing_filename = NULL,
*update_name_list_str = NULL,
*kcore_filename = NULL;
*kcore_filename = NULL,
*debuginfod = NULL;
char sbuf[STRERR_BUFSIZE];
struct perf_data data = {
......@@ -388,6 +399,8 @@ int cmd_buildid_cache(int argc, const char **argv)
OPT_BOOLEAN('f', "force", &force, "don't complain, do it"),
OPT_STRING('u', "update", &update_name_list_str, "file list",
"file(s) to update"),
OPT_STRING(0, "debuginfod", &debuginfod, "debuginfod url",
"set debuginfod url"),
OPT_INCR('v', "verbose", &verbose, "be more verbose"),
OPT_INTEGER(0, "target-ns", &ns_id, "target pid for namespace context"),
OPT_END()
......@@ -397,6 +410,10 @@ int cmd_buildid_cache(int argc, const char **argv)
NULL
};
ret = perf_config(perf_buildid_cache_config, &debuginfod);
if (ret)
return ret;
argc = parse_options(argc, argv, buildid_cache_options,
buildid_cache_usage, 0);
......@@ -408,6 +425,11 @@ int cmd_buildid_cache(int argc, const char **argv)
if (argc || !(list_files || opts_flag))
usage_with_options(buildid_cache_usage, buildid_cache_options);
if (debuginfod) {
pr_debug("DEBUGINFOD_URLS=%s\n", debuginfod);
setenv("DEBUGINFOD_URLS", debuginfod, 1);
}
/* -l is exclusive. It can not be used with other options. */
if (list_files && opts_flag) {
usage_with_options_msg(buildid_cache_usage,
......
......@@ -77,6 +77,9 @@ static int perf_session__list_build_ids(bool force, bool with_hits)
perf_header__has_feat(&session->header, HEADER_AUXTRACE))
with_hits = false;
if (!perf_header__has_feat(&session->header, HEADER_BUILD_ID))
with_hits = true;
/*
* in pipe-mode, the only way to get the buildids is to parse
* the record stream. Buildids are stored as RECORD_HEADER_BUILD_ID
......
......@@ -97,8 +97,8 @@ struct perf_c2c {
bool symbol_full;
bool stitch_lbr;
/* HITM shared clines stats */
struct c2c_stats hitm_stats;
/* Shared cache line stats */
struct c2c_stats shared_clines_stats;
int shared_clines;
int display;
......@@ -876,7 +876,7 @@ static struct c2c_stats *total_stats(struct hist_entry *he)
return &hists->stats;
}
static double percent(int st, int tot)
static double percent(u32 st, u32 tot)
{
return tot ? 100. * (double) st / (double) tot : 0;
}
......@@ -1048,6 +1048,19 @@ empty_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
return 0;
}
static int display_metrics(struct perf_hpp *hpp, u32 val, u32 sum)
{
int ret;
if (sum != 0)
ret = scnprintf(hpp->buf, hpp->size, "%5.1f%% ",
percent(val, sum));
else
ret = scnprintf(hpp->buf, hpp->size, "%6s ", "n/a");
return ret;
}
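The formatting rule in display_metrics() (a percentage when the total is non-zero, otherwise "n/a") can be exercised on its own. The following sketch uses snprintf in place of scnprintf and a hypothetical function name, so it mirrors the logic rather than the exact perf helper.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write val as a percentage of sum into buf,
 * or "n/a" when sum is zero. Returns the snprintf() result.
 */
static int format_share(char *buf, size_t size, uint32_t val, uint32_t sum)
{
	assert(buf != NULL && size > 0);
	if (sum != 0)
		return snprintf(buf, size, "%5.1f%% ",
				100.0 * (double)val / (double)sum);
	return snprintf(buf, size, "%6s ", "n/a");
}
```

Both branches produce a seven-character column, which keeps the c2c output aligned whether or not a total is available.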
static int
node_entry(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *hpp,
struct hist_entry *he)
......@@ -1091,29 +1104,23 @@ node_entry(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *hpp,
ret = scnprintf(hpp->buf, hpp->size, "%2d{%2d ", node, num);
advance_hpp(hpp, ret);
#define DISPLAY_HITM(__h) \
if (c2c_he->stats.__h> 0) { \
ret = scnprintf(hpp->buf, hpp->size, "%5.1f%% ", \
percent(stats->__h, c2c_he->stats.__h));\
} else { \
ret = scnprintf(hpp->buf, hpp->size, "%6s ", "n/a"); \
}
switch (c2c.display) {
case DISPLAY_RMT:
DISPLAY_HITM(rmt_hitm);
ret = display_metrics(hpp, stats->rmt_hitm,
c2c_he->stats.rmt_hitm);
break;
case DISPLAY_LCL:
DISPLAY_HITM(lcl_hitm);
ret = display_metrics(hpp, stats->lcl_hitm,
c2c_he->stats.lcl_hitm);
break;
case DISPLAY_TOT:
DISPLAY_HITM(tot_hitm);
ret = display_metrics(hpp, stats->tot_hitm,
c2c_he->stats.tot_hitm);
break;
default:
break;
}
#undef DISPLAY_HITM
advance_hpp(hpp, ret);
if (c2c_he->stats.store > 0) {
......@@ -1851,53 +1858,69 @@ static int c2c_hists__reinit(struct c2c_hists *c2c_hists,
#define DISPLAY_LINE_LIMIT 0.001
static u8 filter_display(u32 val, u32 sum)
{
if (sum == 0 || ((double)val / sum) < DISPLAY_LINE_LIMIT)
return HIST_FILTER__C2C;
return 0;
}
static bool he__display(struct hist_entry *he, struct c2c_stats *stats)
{
struct c2c_hist_entry *c2c_he;
double ld_dist;
if (c2c.show_all)
return true;
c2c_he = container_of(he, struct c2c_hist_entry, he);
#define FILTER_HITM(__h) \
if (stats->__h) { \
ld_dist = ((double)c2c_he->stats.__h / stats->__h); \
if (ld_dist < DISPLAY_LINE_LIMIT) \
he->filtered = HIST_FILTER__C2C; \
} else { \
he->filtered = HIST_FILTER__C2C; \
}
switch (c2c.display) {
case DISPLAY_LCL:
FILTER_HITM(lcl_hitm);
he->filtered = filter_display(c2c_he->stats.lcl_hitm,
stats->lcl_hitm);
break;
case DISPLAY_RMT:
FILTER_HITM(rmt_hitm);
he->filtered = filter_display(c2c_he->stats.rmt_hitm,
stats->rmt_hitm);
break;
case DISPLAY_TOT:
FILTER_HITM(tot_hitm);
he->filtered = filter_display(c2c_he->stats.tot_hitm,
stats->tot_hitm);
break;
default:
break;
}
#undef FILTER_HITM
return he->filtered == 0;
}
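The threshold applied by filter_display() is a plain ratio test against DISPLAY_LINE_LIMIT. The sketch below restates it as a boolean predicate for experimentation; `below_display_limit` is a hypothetical name, and it returns true where the perf code returns HIST_FILTER__C2C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DISPLAY_LINE_LIMIT 0.001

/*
 * Hypothetical predicate: true if an entry contributing val out of sum
 * should be hidden, i.e. there is no total or the share is under 0.1%.
 */
static bool below_display_limit(uint32_t val, uint32_t sum)
{
	return sum == 0 || ((double)val / sum) < DISPLAY_LINE_LIMIT;
}
```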
static inline int valid_hitm_or_store(struct hist_entry *he)
static inline bool is_valid_hist_entry(struct hist_entry *he)
{
struct c2c_hist_entry *c2c_he;
bool has_hitm;
bool has_record = false;
c2c_he = container_of(he, struct c2c_hist_entry, he);
has_hitm = c2c.display == DISPLAY_TOT ? c2c_he->stats.tot_hitm :
c2c.display == DISPLAY_LCL ? c2c_he->stats.lcl_hitm :
c2c_he->stats.rmt_hitm;
return has_hitm || c2c_he->stats.store;
/* It's a valid entry if it contains stores */
if (c2c_he->stats.store)
return true;
switch (c2c.display) {
case DISPLAY_LCL:
has_record = !!c2c_he->stats.lcl_hitm;
break;
case DISPLAY_RMT:
has_record = !!c2c_he->stats.rmt_hitm;
break;
case DISPLAY_TOT:
has_record = !!c2c_he->stats.tot_hitm;
break;
default:
break;
}
return has_record;
}
static void set_node_width(struct c2c_hist_entry *c2c_he, int len)
......@@ -1951,7 +1974,7 @@ static int filter_cb(struct hist_entry *he, void *arg __maybe_unused)
calc_width(c2c_he);
if (!valid_hitm_or_store(he))
if (!is_valid_hist_entry(he))
he->filtered = HIST_FILTER__C2C;
return 0;
......@@ -1961,7 +1984,7 @@ static int resort_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
{
struct c2c_hist_entry *c2c_he;
struct c2c_hists *c2c_hists;
bool display = he__display(he, &c2c.hitm_stats);
bool display = he__display(he, &c2c.shared_clines_stats);
c2c_he = container_of(he, struct c2c_hist_entry, he);
c2c_hists = c2c_he->hists;
......@@ -2048,14 +2071,14 @@ static int setup_nodes(struct perf_session *session)
#define HAS_HITMS(__h) ((__h)->stats.lcl_hitm || (__h)->stats.rmt_hitm)
static int resort_hitm_cb(struct hist_entry *he, void *arg __maybe_unused)
static int resort_shared_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
{
struct c2c_hist_entry *c2c_he;
c2c_he = container_of(he, struct c2c_hist_entry, he);
if (HAS_HITMS(c2c_he)) {
c2c.shared_clines++;
c2c_add_stats(&c2c.hitm_stats, &c2c_he->stats);
c2c_add_stats(&c2c.shared_clines_stats, &c2c_he->stats);
}
return 0;
......@@ -2111,6 +2134,8 @@ static void print_c2c__display_stats(FILE *out)
fprintf(out, " Load MESI State Exclusive : %10d\n", stats->ld_excl);
fprintf(out, " Load MESI State Shared : %10d\n", stats->ld_shared);
fprintf(out, " Load LLC Misses : %10d\n", llc_misses);
fprintf(out, " Load access blocked by data : %10d\n", stats->blk_data);
fprintf(out, " Load access blocked by address : %10d\n", stats->blk_addr);
fprintf(out, " LLC Misses to Local DRAM : %10.1f%%\n", ((double)stats->lcl_dram/(double)llc_misses) * 100.);
fprintf(out, " LLC Misses to Remote DRAM : %10.1f%%\n", ((double)stats->rmt_dram/(double)llc_misses) * 100.);
fprintf(out, " LLC Misses to Remote cache (HIT) : %10.1f%%\n", ((double)stats->rmt_hit /(double)llc_misses) * 100.);
......@@ -2126,7 +2151,7 @@ static void print_c2c__display_stats(FILE *out)
static void print_shared_cacheline_info(FILE *out)
{
struct c2c_stats *stats = &c2c.hitm_stats;
struct c2c_stats *stats = &c2c.shared_clines_stats;
int hitm_cnt = stats->lcl_hitm + stats->rmt_hitm;
fprintf(out, "=================================================\n");
......@@ -2139,6 +2164,7 @@ static void print_shared_cacheline_info(FILE *out)
fprintf(out, " L2D hits on shared lines : %10d\n", stats->ld_l2hit);
fprintf(out, " LLC hits on shared lines : %10d\n", stats->ld_llchit + stats->lcl_hitm);
fprintf(out, " Locked Access on shared lines : %10d\n", stats->locks);
fprintf(out, " Blocked Access on shared lines : %10d\n", stats->blk_data + stats->blk_addr);
fprintf(out, " Store HITs on shared lines : %10d\n", stats->store);
fprintf(out, " Store L1D hits on shared lines : %10d\n", stats->st_l1hit);
fprintf(out, " Total Merged records : %10d\n", hitm_cnt + stats->store);
......@@ -2176,16 +2202,17 @@ static void print_pareto(FILE *out)
struct perf_hpp_list hpp_list;
struct rb_node *nd;
int ret;
const char *cl_output;
cl_output = "cl_num,"
"cl_rmt_hitm,"
"cl_lcl_hitm,"
"cl_stores_l1hit,"
"cl_stores_l1miss,"
"dcacheline";
perf_hpp_list__init(&hpp_list);
ret = hpp_list__parse(&hpp_list,
"cl_num,"
"cl_rmt_hitm,"
"cl_lcl_hitm,"
"cl_stores_l1hit,"
"cl_stores_l1miss,"
"dcacheline",
NULL);
ret = hpp_list__parse(&hpp_list, cl_output, NULL);
if (WARN_ONCE(ret, "failed to setup sort entries\n"))
return;
......@@ -2729,6 +2756,7 @@ static int perf_c2c__report(int argc, const char **argv)
OPT_END()
};
int err = 0;
const char *output_str, *sort_str = NULL;
argc = parse_options(argc, argv, options, report_c2c_usage,
PARSE_OPT_STOP_AT_NON_OPTION);
......@@ -2805,29 +2833,34 @@ static int perf_c2c__report(int argc, const char **argv)
goto out_mem2node;
}
c2c_hists__reinit(&c2c.hists,
"cl_idx,"
"dcacheline,"
"dcacheline_node,"
"dcacheline_count,"
"percent_hitm,"
"tot_hitm,lcl_hitm,rmt_hitm,"
"tot_recs,"
"tot_loads,"
"tot_stores,"
"stores_l1hit,stores_l1miss,"
"ld_fbhit,ld_l1hit,ld_l2hit,"
"ld_lclhit,lcl_hitm,"
"ld_rmthit,rmt_hitm,"
"dram_lcl,dram_rmt",
c2c.display == DISPLAY_TOT ? "tot_hitm" :
c2c.display == DISPLAY_LCL ? "lcl_hitm" : "rmt_hitm"
);
output_str = "cl_idx,"
"dcacheline,"
"dcacheline_node,"
"dcacheline_count,"
"percent_hitm,"
"tot_hitm,lcl_hitm,rmt_hitm,"
"tot_recs,"
"tot_loads,"
"tot_stores,"
"stores_l1hit,stores_l1miss,"
"ld_fbhit,ld_l1hit,ld_l2hit,"
"ld_lclhit,lcl_hitm,"
"ld_rmthit,rmt_hitm,"
"dram_lcl,dram_rmt";
if (c2c.display == DISPLAY_TOT)
sort_str = "tot_hitm";
else if (c2c.display == DISPLAY_RMT)
sort_str = "rmt_hitm";
else if (c2c.display == DISPLAY_LCL)
sort_str = "lcl_hitm";
c2c_hists__reinit(&c2c.hists, output_str, sort_str);
ui_progress__init(&prog, c2c.hists.hists.nr_entries, "Sorting...");
hists__collapse_resort(&c2c.hists.hists, NULL);
hists__output_resort_cb(&c2c.hists.hists, &prog, resort_hitm_cb);
hists__output_resort_cb(&c2c.hists.hists, &prog, resort_shared_cl_cb);
hists__iterate_cb(&c2c.hists.hists, resort_cl_cb);
ui_progress__finish();
......
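The c2c hunks above switch the resort callback to accumulate into `shared_clines_stats` and add a combined "Blocked Access on shared lines" row. A minimal sketch of that accumulation pattern follows. It uses simplified stand-in structures rather than perf's own: `struct stats`, `struct totals`, `account_cacheline()` and `blocked_on_shared()` are hypothetical names, and the HITM test is assumed to mean "any local or remote HITM was seen".

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for per-cacheline counters; not perf's struct c2c_stats. */
struct stats {
	unsigned int lcl_hitm;
	unsigned int rmt_hitm;
	unsigned int blk_data;
	unsigned int blk_addr;
};

/* Hypothetical aggregate kept across all cache lines. */
struct totals {
	unsigned int shared_clines;
	struct stats shared;
};

/* Assumption: a line is "shared" when it saw any local or remote HITM. */
static int has_hitms(const struct stats *s)
{
	return s->lcl_hitm > 0 || s->rmt_hitm > 0;
}

/* Fold one cache line into the totals, but only if it was contended. */
static void account_cacheline(struct totals *t, const struct stats *s)
{
	assert(t != NULL && s != NULL);

	if (!has_hitms(s))
		return;

	t->shared_clines++;
	t->shared.lcl_hitm += s->lcl_hitm;
	t->shared.rmt_hitm += s->rmt_hitm;
	t->shared.blk_data += s->blk_data;
	t->shared.blk_addr += s->blk_addr;
}

/* Value for a "blocked access on shared lines" style row. */
static unsigned int blocked_on_shared(const struct totals *t)
{
	return t->shared.blk_data + t->shared.blk_addr;
}
```

Lines without any HITM are skipped entirely, so their blocked-load counts never reach the shared-line summary.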
......@@ -313,7 +313,7 @@ static int perf_event__jit_repipe_mmap(struct perf_tool *tool,
* if jit marker, then inject jit mmaps and generate ELF images
*/
ret = jit_process(inject->session, &inject->output, machine,
event->mmap.filename, event->mmap.pid, &n);
event->mmap.filename, event->mmap.pid, event->mmap.tid, &n);
if (ret < 0)
return ret;
if (ret) {
......@@ -413,7 +413,7 @@ static int perf_event__jit_repipe_mmap2(struct perf_tool *tool,
* if jit marker, then inject jit mmaps and generate ELF images
*/
ret = jit_process(inject->session, &inject->output, machine,
event->mmap2.filename, event->mmap2.pid, &n);
event->mmap2.filename, event->mmap2.pid, event->mmap2.tid, &n);
if (ret < 0)
return ret;
if (ret) {
......
......@@ -30,6 +30,7 @@ struct perf_mem {
bool dump_raw;
bool force;
bool phys_addr;
bool data_page_size;
int operation;
const char *cpu_list;
DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
......@@ -124,6 +125,9 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
if (mem->phys_addr)
rec_argv[i++] = "--phys-data";
if (mem->data_page_size)
rec_argv[i++] = "--data-page-size";
for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
e = perf_mem_events__ptr(j);
if (!e->record)
......@@ -172,7 +176,8 @@ dump_raw_samples(struct perf_tool *tool,
{
struct perf_mem *mem = container_of(tool, struct perf_mem, tool);
struct addr_location al;
const char *fmt;
const char *fmt, *field_sep;
char str[PAGE_SIZE_NAME_LEN];
if (machine__resolve(machine, &al, sample) < 0) {
fprintf(stderr, "problem processing %d event, skipping it.\n",
......@@ -186,60 +191,47 @@ dump_raw_samples(struct perf_tool *tool,
if (al.map != NULL)
al.map->dso->hit = 1;
if (mem->phys_addr) {
if (symbol_conf.field_sep) {
fmt = "%d%s%d%s0x%"PRIx64"%s0x%"PRIx64"%s0x%016"PRIx64
"%s%"PRIu64"%s0x%"PRIx64"%s%s:%s\n";
} else {
fmt = "%5d%s%5d%s0x%016"PRIx64"%s0x016%"PRIx64
"%s0x%016"PRIx64"%s%5"PRIu64"%s0x%06"PRIx64
"%s%s:%s\n";
symbol_conf.field_sep = " ";
}
field_sep = symbol_conf.field_sep;
if (field_sep) {
fmt = "%d%s%d%s0x%"PRIx64"%s0x%"PRIx64"%s";
} else {
fmt = "%5d%s%5d%s0x%016"PRIx64"%s0x016%"PRIx64"%s";
symbol_conf.field_sep = " ";
}
printf(fmt,
sample->pid,
symbol_conf.field_sep,
sample->tid,
symbol_conf.field_sep,
sample->ip,
symbol_conf.field_sep,
sample->addr,
symbol_conf.field_sep);
printf(fmt,
sample->pid,
symbol_conf.field_sep,
sample->tid,
symbol_conf.field_sep,
sample->ip,
symbol_conf.field_sep,
sample->addr,
symbol_conf.field_sep,
if (mem->phys_addr) {
printf("0x%016"PRIx64"%s",
sample->phys_addr,
symbol_conf.field_sep,
sample->weight,
symbol_conf.field_sep,
sample->data_src,
symbol_conf.field_sep,
al.map ? (al.map->dso ? al.map->dso->long_name : "???") : "???",
al.sym ? al.sym->name : "???");
} else {
if (symbol_conf.field_sep) {
fmt = "%d%s%d%s0x%"PRIx64"%s0x%"PRIx64"%s%"PRIu64
"%s0x%"PRIx64"%s%s:%s\n";
} else {
fmt = "%5d%s%5d%s0x%016"PRIx64"%s0x016%"PRIx64
"%s%5"PRIu64"%s0x%06"PRIx64"%s%s:%s\n";
symbol_conf.field_sep = " ";
}
symbol_conf.field_sep);
}
printf(fmt,
sample->pid,
symbol_conf.field_sep,
sample->tid,
symbol_conf.field_sep,
sample->ip,
symbol_conf.field_sep,
sample->addr,
symbol_conf.field_sep,
sample->weight,
symbol_conf.field_sep,
sample->data_src,
symbol_conf.field_sep,
al.map ? (al.map->dso ? al.map->dso->long_name : "???") : "???",
al.sym ? al.sym->name : "???");
if (mem->data_page_size) {
printf("%s%s",
get_page_size_name(sample->data_page_size, str),
symbol_conf.field_sep);
}
if (field_sep)
fmt = "%"PRIu64"%s0x%"PRIx64"%s%s:%s\n";
else
fmt = "%5"PRIu64"%s0x%06"PRIx64"%s%s:%s\n";
printf(fmt,
sample->weight,
symbol_conf.field_sep,
sample->data_src,
symbol_conf.field_sep,
al.map ? (al.map->dso ? al.map->dso->long_name : "???") : "???",
al.sym ? al.sym->name : "???");
out_put:
addr_location__put(&al);
return 0;
......@@ -287,10 +279,15 @@ static int report_raw_events(struct perf_mem *mem)
if (ret < 0)
goto out_delete;
printf("# PID, TID, IP, ADDR, ");
if (mem->phys_addr)
printf("# PID, TID, IP, ADDR, PHYS ADDR, LOCAL WEIGHT, DSRC, SYMBOL\n");
else
printf("# PID, TID, IP, ADDR, LOCAL WEIGHT, DSRC, SYMBOL\n");
printf("PHYS ADDR, ");
if (mem->data_page_size)
printf("DATA PAGE SIZE, ");
printf("LOCAL WEIGHT, DSRC, SYMBOL\n");
ret = perf_session__process_events(session);
......@@ -300,7 +297,7 @@ static int report_raw_events(struct perf_mem *mem)
}
static char *get_sort_order(struct perf_mem *mem)
{
bool has_extra_options = mem->phys_addr ? true : false;
bool has_extra_options = (mem->phys_addr | mem->data_page_size) ? true : false;
char sort[128];
/*
......@@ -312,13 +309,16 @@ static char *get_sort_order(struct perf_mem *mem)
"dso_daddr,tlb,locked");
} else if (has_extra_options) {
strcpy(sort, "--sort=local_weight,mem,sym,dso,symbol_daddr,"
"dso_daddr,snoop,tlb,locked");
"dso_daddr,snoop,tlb,locked,blocked");
} else
return NULL;
if (mem->phys_addr)
strcat(sort, ",phys_daddr");
if (mem->data_page_size)
strcat(sort, ",data_page_size");
return strdup(sort);
}
......@@ -464,6 +464,7 @@ int cmd_mem(int argc, const char **argv)
" between columns '.' is reserved."),
OPT_BOOLEAN('f', "force", &mem.force, "don't complain, do it"),
OPT_BOOLEAN('p', "phys-data", &mem.phys_addr, "Record/Report sample physical addresses"),
OPT_BOOLEAN(0, "data-page-size", &mem.data_page_size, "Record/Report sample data address page size"),
OPT_END()
};
const char *const mem_subcommands[] = { "record", "report", NULL };
......
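The `perf mem` changes above stop duplicating whole format strings per option combination. Fixed leading columns are printed first, then each optional column, then the fixed trailing columns. A rough sketch of that composition is below. `struct mem_opts`, `build_header()` and `append_key()` are hypothetical helpers, not perf functions, and the bounded `snprintf()` calls are a swapped-in alternative to the `printf()`/`strcat()` calls in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical option set mirroring the two booleans in the diff. */
struct mem_opts {
	bool phys_addr;
	bool data_page_size;
};

/*
 * Build the raw-dump header: fixed leading columns, optional columns in
 * the middle, fixed trailing columns.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int build_header(const struct mem_opts *o, char *buf, size_t len)
{
	int n;

	assert(o != NULL && buf != NULL);

	n = snprintf(buf, len,
		     "# PID, TID, IP, ADDR, %s%sLOCAL WEIGHT, DSRC, SYMBOL\n",
		     o->phys_addr ? "PHYS ADDR, " : "",
		     o->data_page_size ? "DATA PAGE SIZE, " : "");

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Append ",key" to a NUL-terminated sort string held in a buffer of
 * size len. Returns 0 on success, -1 if the result would not fit.
 */
static int append_key(char *sort, size_t len, const char *key)
{
	size_t used = strlen(sort);
	int n = snprintf(sort + used, len - used, ",%s", key);

	return (n < 0 || (size_t)n >= len - used) ? -1 : 0;
}
```

With this layout, a further optional column means one more conditional piece rather than another set of near-identical format strings.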
......@@ -102,6 +102,7 @@ struct record {
bool no_buildid_cache;
bool no_buildid_cache_set;
bool buildid_all;
bool buildid_mmap;
bool timestamp_filename;
bool timestamp_boundary;
struct switch_output switch_output;
......@@ -730,6 +731,8 @@ static int record__auxtrace_init(struct record *rec)
if (err)
return err;
auxtrace_regroup_aux_output(rec->evlist);
return auxtrace_parse_filters(rec->evlist);
}
......@@ -1663,7 +1666,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
status = -1;
goto out_delete_session;
}
err = evlist__add_pollfd(rec->evlist, done_fd);
err = evlist__add_wakeup_eventfd(rec->evlist, done_fd);
if (err < 0) {
pr_err("Failed to add wakeup eventfd to poll list\n");
status = err;
......@@ -1937,18 +1940,19 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
if (evlist__ctlfd_process(rec->evlist, &cmd) > 0) {
switch (cmd) {
case EVLIST_CTL_CMD_ENABLE:
pr_info(EVLIST_ENABLED_MSG);
break;
case EVLIST_CTL_CMD_DISABLE:
pr_info(EVLIST_DISABLED_MSG);
break;
case EVLIST_CTL_CMD_SNAPSHOT:
hit_auxtrace_snapshot_trigger(rec);
evlist__ctlfd_ack(rec->evlist);
break;
case EVLIST_CTL_CMD_STOP:
done = 1;
break;
case EVLIST_CTL_CMD_ACK:
case EVLIST_CTL_CMD_UNSUPPORTED:
case EVLIST_CTL_CMD_ENABLE:
case EVLIST_CTL_CMD_DISABLE:
case EVLIST_CTL_CMD_EVLIST:
case EVLIST_CTL_CMD_PING:
default:
break;
}
......@@ -2135,6 +2139,8 @@ static int perf_record_config(const char *var, const char *value, void *cb)
rec->no_buildid_cache = true;
else if (!strcmp(value, "skip"))
rec->no_buildid = true;
else if (!strcmp(value, "mmap"))
rec->buildid_mmap = true;
else
return -1;
return 0;
......@@ -2474,6 +2480,8 @@ static struct option __record_options[] = {
"Record the sample physical addresses"),
OPT_BOOLEAN(0, "data-page-size", &record.opts.sample_data_page_size,
"Record the sampled data address data page size"),
OPT_BOOLEAN(0, "code-page-size", &record.opts.sample_code_page_size,
"Record the sampled code address (ip) page size"),
OPT_BOOLEAN(0, "sample-cpu", &record.opts.sample_cpu, "Record the sample cpu"),
OPT_BOOLEAN_SET('T', "timestamp", &record.opts.sample_time,
&record.opts.sample_time_set,
......@@ -2552,6 +2560,8 @@ static struct option __record_options[] = {
"file", "vmlinux pathname"),
OPT_BOOLEAN(0, "buildid-all", &record.buildid_all,
"Record build-id of all DSOs regardless of hits"),
OPT_BOOLEAN(0, "buildid-mmap", &record.buildid_mmap,
"Record build-id in map events"),
OPT_BOOLEAN(0, "timestamp-filename", &record.timestamp_filename,
"append timestamp to output filename"),
OPT_BOOLEAN(0, "timestamp-boundary", &record.timestamp_boundary,
......@@ -2655,6 +2665,21 @@ int cmd_record(int argc, const char **argv)
}
if (rec->buildid_mmap) {
if (!perf_can_record_build_id()) {
pr_err("Failed: no support to record build id in mmap events, update your kernel.\n");
err = -EINVAL;
goto out_opts;
}
pr_debug("Enabling build id in mmap2 events.\n");
/* Enable mmap build id synthesizing. */
symbol_conf.buildid_mmap2 = true;
/* Enable perf_event_attr::build_id bit. */
rec->opts.build_id = true;
/* Disable build id cache. */
rec->no_buildid = true;
}
if (rec->opts.kcore)
rec->data.is_dir = true;
......
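The `--buildid-mmap` block in `cmd_record()` above is a small "one option implies several settings" step, guarded by a capability check. The sketch below restates that logic under simplifying assumptions. `struct rec_settings` and `apply_buildid_mmap()` are hypothetical, and the kernel capability is passed in as a plain boolean instead of being probed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flattened view of the settings touched by --buildid-mmap. */
struct rec_settings {
	bool buildid_mmap;        /* user asked for build ids in mmap events */
	bool synth_buildid_mmap2; /* synthesize build ids into MMAP2 records */
	bool attr_build_id;       /* request the build_id bit in the event attr */
	bool no_buildid;          /* set in the diff under "Disable build id cache" */
};

/*
 * Apply the implied settings. Returns 0 on success (including when the
 * option was not requested) and -EINVAL when the kernel lacks support.
 * Nothing is modified on failure.
 */
static int apply_buildid_mmap(struct rec_settings *s, bool kernel_supports_it)
{
	assert(s != NULL);

	if (!s->buildid_mmap)
		return 0;
	if (!kernel_supports_it)
		return -EINVAL;

	s->synth_buildid_mmap2 = true;
	s->attr_build_id = true;
	s->no_buildid = true;
	return 0;
}
```

Checking support before touching any flag keeps the settings unchanged on the error path, which matches the order of operations in the hunk.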
......@@ -117,6 +117,7 @@ enum perf_output_field {
PERF_OUTPUT_IPC = 1ULL << 31,
PERF_OUTPUT_TOD = 1ULL << 32,
PERF_OUTPUT_DATA_PAGE_SIZE = 1ULL << 33,
PERF_OUTPUT_CODE_PAGE_SIZE = 1ULL << 34,
};
struct perf_script {
......@@ -182,6 +183,7 @@ struct output_option {
{.str = "ipc", .field = PERF_OUTPUT_IPC},
{.str = "tod", .field = PERF_OUTPUT_TOD},
{.str = "data_page_size", .field = PERF_OUTPUT_DATA_PAGE_SIZE},
{.str = "code_page_size", .field = PERF_OUTPUT_CODE_PAGE_SIZE},
};
enum {
......@@ -256,7 +258,7 @@ static struct {
PERF_OUTPUT_DSO | PERF_OUTPUT_PERIOD |
PERF_OUTPUT_ADDR | PERF_OUTPUT_DATA_SRC |
PERF_OUTPUT_WEIGHT | PERF_OUTPUT_PHYS_ADDR |
PERF_OUTPUT_DATA_PAGE_SIZE,
PERF_OUTPUT_DATA_PAGE_SIZE | PERF_OUTPUT_CODE_PAGE_SIZE,
.invalid_fields = PERF_OUTPUT_TRACE | PERF_OUTPUT_BPF_OUTPUT,
},
......@@ -523,6 +525,10 @@ static int evsel__check_attr(struct evsel *evsel, struct perf_session *session)
evsel__check_stype(evsel, PERF_SAMPLE_DATA_PAGE_SIZE, "DATA_PAGE_SIZE", PERF_OUTPUT_DATA_PAGE_SIZE))
return -EINVAL;
if (PRINT_FIELD(CODE_PAGE_SIZE) &&
evsel__check_stype(evsel, PERF_SAMPLE_CODE_PAGE_SIZE, "CODE_PAGE_SIZE", PERF_OUTPUT_CODE_PAGE_SIZE))
return -EINVAL;
return 0;
}
......@@ -1531,6 +1537,8 @@ static struct {
{PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_TX_ABORT, "tx abrt"},
{PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_TRACE_BEGIN, "tr strt"},
{PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_TRACE_END, "tr end"},
{PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL | PERF_IP_FLAG_VMENTRY, "vmentry"},
{PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL | PERF_IP_FLAG_VMEXIT, "vmexit"},
{0, NULL}
};
......@@ -1760,6 +1768,18 @@ static int perf_sample__fprintf_synth_cbr(struct perf_sample *sample, FILE *fp)
return len + perf_sample__fprintf_pt_spacing(len, fp);
}
static int perf_sample__fprintf_synth_psb(struct perf_sample *sample, FILE *fp)
{
struct perf_synth_intel_psb *data = perf_sample__synth_ptr(sample);
int len;
if (perf_sample__bad_synth_size(sample, *data))
return 0;
len = fprintf(fp, " psb offs: %#" PRIx64, data->offset);
return len + perf_sample__fprintf_pt_spacing(len, fp);
}
static int perf_sample__fprintf_synth(struct perf_sample *sample,
struct evsel *evsel, FILE *fp)
{
......@@ -1776,6 +1796,8 @@ static int perf_sample__fprintf_synth(struct perf_sample *sample,
return perf_sample__fprintf_synth_pwrx(sample, fp);
case PERF_SYNTH_INTEL_CBR:
return perf_sample__fprintf_synth_cbr(sample, fp);
case PERF_SYNTH_INTEL_PSB:
return perf_sample__fprintf_synth_psb(sample, fp);
default:
break;
}
......@@ -2036,6 +2058,9 @@ static void process_event(struct perf_script *script,
if (PRINT_FIELD(DATA_PAGE_SIZE))
fprintf(fp, " %s", get_page_size_name(sample->data_page_size, str));
if (PRINT_FIELD(CODE_PAGE_SIZE))
fprintf(fp, " %s", get_page_size_name(sample->code_page_size, str));
perf_sample__fprintf_ipc(sample, attr, fp);
fprintf(fp, "\n");
......@@ -2786,7 +2811,7 @@ static int parse_output_fields(const struct option *opt __maybe_unused,
break;
}
if (i == imax && strcmp(tok, "flags") == 0) {
print_flags = change == REMOVE ? false : true;
print_flags = change != REMOVE;
continue;
}
if (i == imax) {
......@@ -3234,7 +3259,7 @@ static char *get_script_path(const char *script_root, const char *suffix)
static bool is_top_script(const char *script_path)
{
return ends_with(script_path, "top") == NULL ? false : true;
return ends_with(script_path, "top") != NULL;
}
static int has_required_arg(char *script_path)
......@@ -3535,12 +3560,16 @@ int cmd_script(int argc, const char **argv)
"addr,symoff,srcline,period,iregs,uregs,brstack,"
"brstacksym,flags,bpf-output,brstackinsn,brstackoff,"
"callindent,insn,insnlen,synth,phys_addr,metric,misc,ipc,tod,"
"data_page_size",
"data_page_size,code_page_size",
parse_output_fields),
OPT_BOOLEAN('a', "all-cpus", &system_wide,
"system-wide collection from all CPUs"),
OPT_STRING(0, "dsos", &symbol_conf.dso_list_str, "dso[,dso...]",
"only consider symbols in these DSOs"),
OPT_STRING('S', "symbols", &symbol_conf.sym_list_str, "symbol[,symbol...]",
"only consider these symbols"),
OPT_INTEGER(0, "addr-range", &symbol_conf.addr_range,
"Use with -S to list traced records within address range"),
OPT_CALLBACK_OPTARG(0, "insn-trace", &itrace_synth_opts, NULL, NULL,
"Decode instructions from itrace", parse_insn_trace),
OPT_CALLBACK_OPTARG(0, "xed", NULL, NULL, NULL,
......
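The `perf script` hunks above add `code_page_size` as bit 34 of the output-field mask and register its name in the option table. Because the bit index is above 31, the constant has to be built from a 64-bit `1ULL`. A minimal sketch of the name-to-bit lookup follows. The `OUT_*` macros, `struct field_name` and `field_from_name()` are illustrative names, with only the two bit positions taken from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative subset of output fields; bit positions follow the diff. */
#define OUT_DATA_PAGE_SIZE (1ULL << 33)
#define OUT_CODE_PAGE_SIZE (1ULL << 34)

struct field_name {
	const char *str;
	uint64_t field;
};

static const struct field_name field_names[] = {
	{ "data_page_size", OUT_DATA_PAGE_SIZE },
	{ "code_page_size", OUT_CODE_PAGE_SIZE },
};

/* Return the bit for a field name, or 0 when the name is unknown. */
static uint64_t field_from_name(const char *name)
{
	size_t i;

	assert(name != NULL);

	for (i = 0; i < sizeof(field_names) / sizeof(field_names[0]); i++) {
		if (strcmp(field_names[i].str, name) == 0)
			return field_names[i].field;
	}
	return 0;
}
```

A caller would OR the returned bits into a `uint64_t` mask and test it with `mask & OUT_CODE_PAGE_SIZE` before printing the column.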
......@@ -67,6 +67,7 @@
#include "util/top.h"
#include "util/affinity.h"
#include "util/pfm.h"
#include "util/bpf_counter.h"
#include "asm/bug.h"
#include <linux/time64.h>
......@@ -137,6 +138,19 @@ static const char *topdown_metric_attrs[] = {
NULL,
};
static const char *topdown_metric_L2_attrs[] = {
"slots",
"topdown-retiring",
"topdown-bad-spec",
"topdown-fe-bound",
"topdown-be-bound",
"topdown-heavy-ops",
"topdown-br-mispredict",
"topdown-fetch-lat",
"topdown-mem-bound",
NULL,
};
static const char *smi_cost_attrs = {
"{"
"msr/aperf/,"
......@@ -409,12 +423,32 @@ static int read_affinity_counters(struct timespec *rs)
return 0;
}
static int read_bpf_map_counters(void)
{
struct evsel *counter;
int err;
evlist__for_each_entry(evsel_list, counter) {
err = bpf_counter__read(counter);
if (err)
return err;
}
return 0;
}
static void read_counters(struct timespec *rs)
{
struct evsel *counter;
int err;
if (!stat_config.stop_read_counter && (read_affinity_counters(rs) < 0))
return;
if (!stat_config.stop_read_counter) {
if (target__has_bpf(&target))
err = read_bpf_map_counters();
else
err = read_affinity_counters(rs);
if (err < 0)
return;
}
evlist__for_each_entry(evsel_list, counter) {
if (counter->err)
......@@ -496,11 +530,22 @@ static bool handle_interval(unsigned int interval, int *times)
return false;
}
static void enable_counters(void)
static int enable_counters(void)
{
struct evsel *evsel;
int err;
if (target__has_bpf(&target)) {
evlist__for_each_entry(evsel_list, evsel) {
err = bpf_counter__enable(evsel);
if (err)
return err;
}
}
if (stat_config.initial_delay < 0) {
pr_info(EVLIST_DISABLED_MSG);
return;
return 0;
}
if (stat_config.initial_delay > 0) {
......@@ -518,6 +563,7 @@ static void enable_counters(void)
if (stat_config.initial_delay > 0)
pr_info(EVLIST_ENABLED_MSG);
}
return 0;
}
static void disable_counters(void)
......@@ -578,18 +624,19 @@ static void process_evlist(struct evlist *evlist, unsigned int interval)
if (evlist__ctlfd_process(evlist, &cmd) > 0) {
switch (cmd) {
case EVLIST_CTL_CMD_ENABLE:
pr_info(EVLIST_ENABLED_MSG);
if (interval)
process_interval();
break;
case EVLIST_CTL_CMD_DISABLE:
if (interval)
process_interval();
pr_info(EVLIST_DISABLED_MSG);
break;
case EVLIST_CTL_CMD_SNAPSHOT:
case EVLIST_CTL_CMD_ACK:
case EVLIST_CTL_CMD_UNSUPPORTED:
case EVLIST_CTL_CMD_EVLIST:
case EVLIST_CTL_CMD_STOP:
case EVLIST_CTL_CMD_PING:
default:
break;
}
......@@ -720,7 +767,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
const bool forks = (argc > 0);
bool is_pipe = STAT_RECORD ? perf_stat.data.is_pipe : false;
struct affinity affinity;
int i, cpu;
int i, cpu, err;
bool second_pass = false;
if (forks) {
......@@ -737,6 +784,13 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
if (affinity__setup(&affinity) < 0)
return -1;
if (target__has_bpf(&target)) {
evlist__for_each_entry(evsel_list, counter) {
if (bpf_counter__load(counter, &target))
return -1;
}
}
evlist__for_each_cpu (evsel_list, i, cpu) {
affinity__set(&affinity, cpu);
......@@ -850,7 +904,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
}
if (STAT_RECORD) {
int err, fd = perf_data__fd(&perf_stat.data);
int fd = perf_data__fd(&perf_stat.data);
if (is_pipe) {
err = perf_header__write_pipe(perf_data__fd(&perf_stat.data));
......@@ -876,7 +930,9 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
if (forks) {
evlist__start_workload(evsel_list);
enable_counters();
err = enable_counters();
if (err)
return -1;
if (interval || timeout || evlist__ctlfd_initialized(evsel_list))
status = dispatch_events(forks, timeout, interval, &times);
......@@ -895,7 +951,9 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
if (WIFSIGNALED(status))
psignal(WTERMSIG(status), argv[0]);
} else {
enable_counters();
err = enable_counters();
if (err)
return -1;
status = dispatch_events(forks, timeout, interval, &times);
}
......@@ -1085,6 +1143,10 @@ static struct option stat_options[] = {
"stat events on existing process id"),
OPT_STRING('t', "tid", &target.tid, "tid",
"stat events on existing thread id"),
#ifdef HAVE_BPF_SKEL
OPT_STRING('b', "bpf-prog", &target.bpf_str, "bpf-prog-id",
"stat events on existing bpf program id"),
#endif
OPT_BOOLEAN('a', "all-cpus", &target.system_wide,
"system-wide collection from all CPUs"),
OPT_BOOLEAN('g', "group", &group,
......@@ -1153,7 +1215,9 @@ static struct option stat_options[] = {
OPT_BOOLEAN(0, "metric-no-merge", &stat_config.metric_no_merge,
"don't try to share events between metrics in a group"),
OPT_BOOLEAN(0, "topdown", &topdown_run,
"measure topdown level 1 statistics"),
"measure top-down statistics"),
OPT_UINTEGER(0, "td-level", &stat_config.topdown_level,
"Set the metrics level for the top-down statistics (0: max level)"),
OPT_BOOLEAN(0, "smi-cost", &smi_cost,
"measure SMI cost"),
OPT_CALLBACK('M', "metrics", &evsel_list, "metric/metric group list",
......@@ -1706,17 +1770,30 @@ static int add_default_attributes(void)
}
if (topdown_run) {
const char **metric_attrs = topdown_metric_attrs;
unsigned int max_level = 1;
char *str = NULL;
bool warn = false;
if (!force_metric_only)
stat_config.metric_only = true;
if (topdown_filter_events(topdown_metric_attrs, &str, 1) < 0) {
if (pmu_have_event("cpu", topdown_metric_L2_attrs[5])) {
metric_attrs = topdown_metric_L2_attrs;
max_level = 2;
}
if (stat_config.topdown_level > max_level) {
pr_err("Invalid top-down metrics level. The max level is %u.\n", max_level);
return -1;
} else if (!stat_config.topdown_level)
stat_config.topdown_level = max_level;
if (topdown_filter_events(metric_attrs, &str, 1) < 0) {
pr_err("Out of memory\n");
return -1;
}
if (topdown_metric_attrs[0] && str) {
if (metric_attrs[0] && str) {
if (!stat_config.interval && !stat_config.metric_only) {
fprintf(stat_config.output,
"Topdown accuracy may decrease when measuring long periods.\n"
......@@ -1779,6 +1856,9 @@ static int add_default_attributes(void)
}
if (evlist__add_default_attrs(evsel_list, default_attrs1) < 0)
return -1;
if (arch_evlist__add_default_attrs(evsel_list) < 0)
return -1;
}
/* Detailed events get appended to the event list: */
......@@ -2064,11 +2144,12 @@ int cmd_stat(int argc, const char **argv)
"perf stat [<options>] [<command>]",
NULL
};
int status = -EINVAL, run_idx;
int status = -EINVAL, run_idx, err;
const char *mode;
FILE *output = stderr;
unsigned int interval, timeout;
const char * const stat_subcommands[] = { "record", "report" };
char errbuf[BUFSIZ];
setlocale(LC_ALL, "");
......@@ -2179,6 +2260,12 @@ int cmd_stat(int argc, const char **argv)
} else if (big_num_opt == 0) /* User passed --no-big-num */
stat_config.big_num = false;
err = target__validate(&target);
if (err) {
target__strerror(&target, err, errbuf, BUFSIZ);
pr_warning("%s\n", errbuf);
}
setup_system_wide(argc);
/*
......@@ -2252,8 +2339,6 @@ int cmd_stat(int argc, const char **argv)
}
}
target__validate(&target);
if ((stat_config.aggr_mode == AGGR_THREAD) && (target.system_wide))
target.per_thread = true;
......@@ -2384,9 +2469,10 @@ int cmd_stat(int argc, const char **argv)
* tools remain -acme
*/
int fd = perf_data__fd(&perf_stat.data);
int err = perf_event__synthesize_kernel_mmap((void *)&perf_stat,
process_synthesized_event,
&perf_stat.session->machines.host);
err = perf_event__synthesize_kernel_mmap((void *)&perf_stat,
process_synthesized_event,
&perf_stat.session->machines.host);
if (err) {
pr_warning("Couldn't synthesize the kernel mmap record, harmless, "
"older tools may produce warnings about this file\n.");
......
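The `--td-level` handling in `add_default_attributes()` earlier in this file's diff resolves a requested top-down level against the maximum the PMU supports, with 0 meaning "use the maximum". That rule can be sketched as one function. `resolve_topdown_level()` is a hypothetical helper, and the supported maximum is passed in rather than probed through `pmu_have_event()`.

```c
#include <assert.h>

/*
 * Resolve the requested top-down level.
 *   requested == 0  -> use max_level
 *   requested > max -> error (-1)
 *   otherwise       -> use requested
 */
static int resolve_topdown_level(unsigned int requested, unsigned int max_level)
{
	assert(max_level >= 1);

	if (requested > max_level)
		return -1;

	return requested ? (int)requested : (int)max_level;
}
```

For example, on a system that only exposes the level 1 events, a request for level 2 is rejected, while a request of 0 silently resolves to 1.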
......@@ -37,6 +37,7 @@ int cmd_inject(int argc, const char **argv);
int cmd_mem(int argc, const char **argv);
int cmd_data(int argc, const char **argv);
int cmd_ftrace(int argc, const char **argv);
int cmd_daemon(int argc, const char **argv);
int find_scripts(char **scripts_array, char **scripts_path_array, int num,
int pathlen);
......
......@@ -31,3 +31,4 @@ perf-timechart mainporcelain common
perf-top mainporcelain common
perf-trace mainporcelain audit
perf-version mainporcelain common
perf-daemon mainporcelain common
......@@ -88,6 +88,7 @@ static struct cmd_struct commands[] = {
{ "mem", cmd_mem, 0 },
{ "data", cmd_data, 0 },
{ "ftrace", cmd_ftrace, 0 },
{ "daemon", cmd_daemon, 0 },
};
struct pager_config {
......
......@@ -9,15 +9,11 @@
"ArchStdEvent": "BR_INDIRECT_SPEC"
},
{
"PublicDescription": "Mispredicted or not predicted branch speculatively executed",
"EventCode": "0x10",
"EventName": "BR_MIS_PRED",
"ArchStdEvent": "BR_MIS_PRED",
"BriefDescription": "Branch mispredicted"
},
{
"PublicDescription": "Predictable branch speculatively executed",
"EventCode": "0x12",
"EventName": "BR_PRED",
"ArchStdEvent": "BR_PRED",
"BriefDescription": "Predictable branch"
}
]
......@@ -18,9 +18,6 @@
"ArchStdEvent": "BUS_ACCESS_PERIPH"
},
{
"PublicDescription": "Bus access",
"EventCode": "0x19",
"EventName": "BUS_ACCESS",
"BriefDescription": "Bus access"
"ArchStdEvent": "BUS_ACCESS",
}
]
......@@ -39,70 +39,40 @@
"ArchStdEvent": "L2D_CACHE_INVAL"
},
{
"PublicDescription": "Level 1 instruction cache refill",
"EventCode": "0x01",
"EventName": "L1I_CACHE_REFILL",
"BriefDescription": "L1I cache refill"
"ArchStdEvent": "L1I_CACHE_REFILL",
},
{
"PublicDescription": "Level 1 instruction TLB refill",
"EventCode": "0x02",
"EventName": "L1I_TLB_REFILL",
"BriefDescription": "L1I TLB refill"
"ArchStdEvent": "L1I_TLB_REFILL",
},
{
"PublicDescription": "Level 1 data cache refill",
"EventCode": "0x03",
"EventName": "L1D_CACHE_REFILL",
"BriefDescription": "L1D cache refill"
"ArchStdEvent": "L1D_CACHE_REFILL",
},
{
"PublicDescription": "Level 1 data cache access",
"EventCode": "0x04",
"EventName": "L1D_CACHE_ACCESS",
"BriefDescription": "L1D cache access"
"ArchStdEvent": "L1D_CACHE",
},
{
"PublicDescription": "Level 1 data TLB refill",
"EventCode": "0x05",
"EventName": "L1D_TLB_REFILL",
"BriefDescription": "L1D TLB refill"
"ArchStdEvent": "L1D_TLB_REFILL",
},
{
"PublicDescription": "Level 1 instruction cache access",
"EventCode": "0x14",
"EventName": "L1I_CACHE_ACCESS",
"BriefDescription": "L1I cache access"
"ArchStdEvent": "L1I_CACHE",
},
{
"PublicDescription": "Level 2 data cache access",
"EventCode": "0x16",
"EventName": "L2D_CACHE_ACCESS",
"BriefDescription": "L2D cache access"
"ArchStdEvent": "L2D_CACHE",
},
{
"PublicDescription": "Level 2 data refill",
"EventCode": "0x17",
"EventName": "L2D_CACHE_REFILL",
"BriefDescription": "L2D cache refill"
"ArchStdEvent": "L2D_CACHE_REFILL",
},
{
"PublicDescription": "Level 2 data cache, Write-Back",
"EventCode": "0x18",
"EventName": "L2D_CACHE_WB",
"BriefDescription": "L2D cache Write-Back"
"ArchStdEvent": "L2D_CACHE_WB",
},
{
"PublicDescription": "Level 1 data TLB access. This event counts any load or store operation which accesses the data L1 TLB",
"EventCode": "0x25",
"EventName": "L1D_TLB_ACCESS",
"PublicDescription": "This event counts any load or store operation which accesses the data L1 TLB",
"ArchStdEvent": "L1D_TLB",
"BriefDescription": "L1D TLB access"
},
{
"PublicDescription": "Level 1 instruction TLB access. This event counts any instruction fetch which accesses the instruction L1 TLB",
"EventCode": "0x26",
"EventName": "L1I_TLB_ACCESS",
"BriefDescription": "L1I TLB access"
"PublicDescription": "This event counts any instruction fetch which accesses the instruction L1 TLB",
"ArchStdEvent": "L1I_TLB",
},
{
"PublicDescription": "Level 2 access to data TLB that caused a page table walk. This event counts on any data access which causes L2D_TLB_REFILL to count",
......@@ -114,7 +84,7 @@
"PublicDescription": "Level 2 access to instruciton TLB that caused a page table walk. This event counts on any instruciton access which causes L2I_TLB_REFILL to count",
"EventCode": "0x35",
"EventName": "L2I_TLB_ACCESS",
"BriefDescription": "L2D TLB access"
"BriefDescription": "L2I TLB access"
},
{
"PublicDescription": "Branch target buffer misprediction",
......
[
{
"PublicDescription": "The number of core clock cycles",
"EventCode": "0x11",
"EventName": "CPU_CYCLES",
"BriefDescription": "Clock cycles"
"ArchStdEvent": "CPU_CYCLES",
},
{
"PublicDescription": "FSU clocking gated off cycle",
......
......@@ -36,15 +36,9 @@
"ArchStdEvent": "EXC_TRAP_FIQ"
},
{
"PublicDescription": "Exception taken",
"EventCode": "0x09",
"EventName": "EXC_TAKEN",
"BriefDescription": "Exception taken"
"ArchStdEvent": "EXC_TAKEN",
},
{
"PublicDescription": "Instruction architecturally executed, condition check pass, exception return",
"EventCode": "0x0a",
"EventName": "EXC_RETURN",
"BriefDescription": "Exception return"
"ArchStdEvent": "EXC_RETURN",
}
]
......@@ -40,45 +40,29 @@
},
{
"PublicDescription": "Instruction architecturally executed, software increment",
"EventCode": "0x00",
"EventName": "SW_INCR",
"ArchStdEvent": "SW_INCR",
"BriefDescription": "Software increment"
},
{
"PublicDescription": "Instruction architecturally executed",
"EventCode": "0x08",
"EventName": "INST_RETIRED",
"BriefDescription": "Instruction retired"
"ArchStdEvent": "INST_RETIRED",
},
{
"PublicDescription": "Instruction architecturally executed, condition code check pass, write to CONTEXTIDR",
"EventCode": "0x0b",
"EventName": "CID_WRITE_RETIRED",
"ArchStdEvent": "CID_WRITE_RETIRED",
"BriefDescription": "Write to CONTEXTIDR"
},
{
"PublicDescription": "Operation speculatively executed",
"EventCode": "0x1b",
"EventName": "INST_SPEC",
"BriefDescription": "Speculatively executed"
"ArchStdEvent": "INST_SPEC",
},
{
"PublicDescription": "Instruction architecturally executed (condition check pass), write to TTBR",
"EventCode": "0x1c",
"EventName": "TTBR_WRITE_RETIRED",
"BriefDescription": "Instruction executed, TTBR write"
"ArchStdEvent": "TTBR_WRITE_RETIRED",
},
{
"PublicDescription": "Instruction architecturally executed, branch. This event counts all branches, taken or not. This excludes exception entries, debug entries and CCFAIL branches",
"EventCode": "0x21",
"EventName": "BR_RETIRED",
"BriefDescription": "Branch retired"
"PublicDescription": "This event counts all branches, taken or not. This excludes exception entries, debug entries and CCFAIL branches",
"ArchStdEvent": "BR_RETIRED",
},
{
"PublicDescription": "Instruction architecturally executed, mispredicted branch. This event counts any branch counted by BR_RETIRED which is not correctly predicted and causes a pipeline flush",
"EventCode": "0x22",
"EventName": "BR_MISPRED_RETIRED",
"BriefDescription": "Mispredicted branch retired"
"PublicDescription": "This event counts any branch counted by BR_RETIRED which is not correctly predicted and causes a pipeline flush",
"ArchStdEvent": "BR_MIS_PRED_RETIRED",
},
{
"PublicDescription": "Operation speculatively executed, NOP",
......
......@@ -15,15 +15,10 @@
"ArchStdEvent": "UNALIGNED_LDST_SPEC"
},
{
"PublicDescription": "Data memory access",
"EventCode": "0x13",
"EventName": "MEM_ACCESS",
"BriefDescription": "Memory access"
"ArchStdEvent": "MEM_ACCESS",
},
{
"PublicDescription": "Local memory error. This event counts any correctable or uncorrectable memory error (ECC or parity) in the protected core RAMs",
"EventCode": "0x1a",
"EventName": "MEM_ERROR",
"BriefDescription": "Memory error"
"PublicDescription": "This event counts any correctable or uncorrectable memory error (ECC or parity) in the protected core RAMs",
"ArchStdEvent": "MEMORY_ERROR",
}
]
[
{
"PublicDescription": "Mispredicted or not predicted branch speculatively executed. This event counts any predictable branch instruction which is mispredicted either due to dynamic misprediction or because the MMU is off and the branches are statically predicted not taken.",
"EventCode": "0x10",
"EventName": "BR_MIS_PRED",
"BriefDescription": "Mispredicted or not predicted branch speculatively executed."
"PublicDescription": "This event counts any predictable branch instruction which is mispredicted either due to dynamic misprediction or because the MMU is off and the branches are statically predicted not taken",
"ArchStdEvent": "BR_MIS_PRED",
},
{
"PublicDescription": "Predictable branch speculatively executed. This event counts all predictable branches.",
"EventCode": "0x12",
"EventName": "BR_PRED",
"BriefDescription": "Predictable branch speculatively executed."
"PublicDescription": "This event counts all predictable branches.",
"ArchStdEvent": "BR_PRED",
}
]
[
{
"EventCode": "0x11",
"EventName": "CPU_CYCLES",
"PublicDescription": "The number of core clock cycles"
"ArchStdEvent": "CPU_CYCLES",
"BriefDescription": "The number of core clock cycles."
},
{
"PublicDescription": "Bus access. This event counts for every beat of data transferred over the data channels between the core and the SCU. If both read and write data beats are transferred on a given cycle, this event is counted twice on that cycle. This event counts the sum of BUS_ACCESS_RD and BUS_ACCESS_WR.",
"EventCode": "0x19",
"EventName": "BUS_ACCESS",
"BriefDescription": "Bus access."
"PublicDescription": "This event counts for every beat of data transferred over the data channels between the core and the SCU. If both read and write data beats are transferred on a given cycle, this event is counted twice on that cycle. This event counts the sum of BUS_ACCESS_RD and BUS_ACCESS_WR.",
"ArchStdEvent": "BUS_ACCESS",
},
{
"EventCode": "0x1D",
"EventName": "BUS_CYCLES",
"BriefDescription": "Bus cycles. This event duplicates CPU_CYCLES."
"PublicDescription": "This event duplicates CPU_CYCLES."
"ArchStdEvent": "BUS_CYCLES",
},
{
"ArchStdEvent": "BUS_ACCESS_RD"
"ArchStdEvent": "BUS_ACCESS_RD",
},
{
"ArchStdEvent": "BUS_ACCESS_WR"
"ArchStdEvent": "BUS_ACCESS_WR",
}
]
[
{
"EventCode": "0x09",
"EventName": "EXC_TAKEN",
"BriefDescription": "Exception taken."
"ArchStdEvent": "EXC_TAKEN",
},
{
"PublicDescription": "Local memory error. This event counts any correctable or uncorrectable memory error (ECC or parity) in the protected core RAMs",
"EventCode": "0x1A",
"EventName": "MEMORY_ERROR",
"BriefDescription": "Local memory error."
"PublicDescription": "This event counts any correctable or uncorrectable memory error (ECC or parity) in the protected core RAMs",
"ArchStdEvent": "MEMORY_ERROR",
},
{
"ArchStdEvent": "EXC_DABORT"
......
[
{
"PublicDescription": "Software increment. Instruction architecturally executed (condition code check pass).",
"EventCode": "0x00",
"EventName": "SW_INCR",
"BriefDescription": "Software increment."
"ArchStdEvent": "SW_INCR",
},
{
"PublicDescription": "Instruction architecturally executed. This event counts all retired instructions, including those that fail their condition check.",
"EventCode": "0x08",
"EventName": "INST_RETIRED",
"BriefDescription": "Instruction architecturally executed."
"PublicDescription": "This event counts all retired instructions, including those that fail their condition check.",
"ArchStdEvent": "INST_RETIRED",
},
{
"EventCode": "0x0A",
"EventName": "EXC_RETURN",
"BriefDescription": "Instruction architecturally executed, condition code check pass, exception return."
"ArchStdEvent": "EXC_RETURN",
},
{
"PublicDescription": "Instruction architecturally executed, condition code check pass, write to CONTEXTIDR. This event only counts writes to CONTEXTIDR in AArch32 state, and via the CONTEXTIDR_EL1 mnemonic in AArch64 state.",
"EventCode": "0x0B",
"EventName": "CID_WRITE_RETIRED",
"BriefDescription": "Instruction architecturally executed, condition code check pass, write to CONTEXTIDR."
"PublicDescription": "This event only counts writes to CONTEXTIDR in AArch32 state, and via the CONTEXTIDR_EL1 mnemonic in AArch64 state.",
"ArchStdEvent": "CID_WRITE_RETIRED",
},
{
"EventCode": "0x1B",
"EventName": "INST_SPEC",
"BriefDescription": "Operation speculatively executed"
"ArchStdEvent": "INST_SPEC",
},
{
"PublicDescription": "Instruction architecturally executed, condition code check pass, write to TTBR. This event only counts writes to TTBR0/TTBR1 in AArch32 state and TTBR0_EL1/TTBR1_EL1 in AArch64 state.",
"EventCode": "0x1C",
"EventName": "TTBR_WRITE_RETIRED",
"BriefDescription": "Instruction architecturally executed, condition code check pass, write to TTBR"
"PublicDescription": "This event only counts writes to TTBR0/TTBR1 in AArch32 state and TTBR0_EL1/TTBR1_EL1 in AArch64 state.",
"ArchStdEvent": "TTBR_WRITE_RETIRED",
},
{
"PublicDescription": "Instruction architecturally executed, branch. This event counts all branches, taken or not. This excludes exception entries, debug entries and CCFAIL branches.",
"EventCode": "0x21",
"EventName": "BR_RETIRED",
"BriefDescription": "Instruction architecturally executed, branch."
{,
"PublicDescription": "This event counts all branches, taken or not. This excludes exception entries, debug entries and CCFAIL branches.",
"ArchStdEvent": "BR_RETIRED",
},
{
"PublicDescription": "Instruction architecturally executed, mispredicted branch. This event counts any branch counted by BR_RETIRED which is not correctly predicted and causes a pipeline flush.",
"EventCode": "0x22",
"EventName": "BR_MIS_PRED_RETIRED",
"BriefDescription": "Instruction architecturally executed, mispredicted branch."
"PublicDescription": "This event counts any branch counted by BR_RETIRED which is not correctly predicted and causes a pipeline flush.",
"ArchStdEvent": "BR_MIS_PRED_RETIRED",
},
{
"ArchStdEvent": "ASE_SPEC"
......
[
{
"PublicDescription": "Data memory access. This event counts memory accesses due to load or store instructions. This event counts the sum of MEM_ACCESS_RD and MEM_ACCESS_WR.",
"EventCode": "0x13",
"EventName": "MEM_ACCESS",
"BriefDescription": "Data memory access"
"PublicDescription": "This event counts memory accesses due to load or store instructions. This event counts the sum of MEM_ACCESS_RD and MEM_ACCESS_WR.",
"ArchStdEvent": "MEM_ACCESS",
},
{
"ArchStdEvent": "MEM_ACCESS_RD"
......
[
{
"EventCode": "0x31",
"EventName": "REMOTE_ACCESS",
"BriefDescription": "Access to another socket in a multi-socket system"
"ArchStdEvent": "REMOTE_ACCESS",
}
]
[
{
"PublicDescription": "No operation issued because of the frontend. The counter counts on any cycle when there are no fetched instructions available to dispatch.",
"EventCode": "0x23",
"EventName": "STALL_FRONTEND",
"BriefDescription": "No operation issued because of the frontend."
"PublicDescription": "The counter counts on any cycle when there are no fetched instructions available to dispatch.",
"ArchStdEvent": "STALL_FRONTEND",
},
{
"PublicDescription": "No operation issued because of the backend. The counter counts on any cycle fetched instructions are not dispatched due to resource constraints.",
"EventCode": "0x24",
"EventName": "STALL_BACKEND",
"BriefDescription": "No operation issued because of the backend."
"PublicDescription": "The counter counts on any cycle fetched instructions are not dispatched due to resource constraints.",
"ArchStdEvent": "STALL_BACKEND",
}
]
[
{
"PublicDescription": "Instruction architecturally executed, Condition code check pass, software increment",
"EventCode": "0x00",
"EventName": "SW_INCR",
"BriefDescription": "Instruction architecturally executed, Condition code check pass, software increment"
},
{
"PublicDescription": "Level 1 instruction cache refill",
"EventCode": "0x01",
"EventName": "L1I_CACHE_REFILL",
"BriefDescription": "Level 1 instruction cache refill"
},
{
"PublicDescription": "Attributable Level 1 instruction TLB refill",
"EventCode": "0x02",
"EventName": "L1I_TLB_REFILL",
"BriefDescription": "Attributable Level 1 instruction TLB refill"
},
{
"PublicDescription": "Level 1 data cache refill",
"EventCode": "0x03",
"EventName": "L1D_CACHE_REFILL",
"BriefDescription": "Level 1 data cache refill"
},
{
"PublicDescription": "Level 1 data cache access",
"EventCode": "0x04",
"EventName": "L1D_CACHE",
"BriefDescription": "Level 1 data cache access"
},
{
"PublicDescription": "Attributable Level 1 data TLB refill",
"EventCode": "0x05",
"EventName": "L1D_TLB_REFILL",
"BriefDescription": "Attributable Level 1 data TLB refill"
},
{
"PublicDescription": "Instruction architecturally executed",
"EventCode": "0x08",
"EventName": "INST_RETIRED",
"BriefDescription": "Instruction architecturally executed"
},
{
"PublicDescription": "Exception taken",
"EventCode": "0x09",
"EventName": "EXC_TAKEN",
"BriefDescription": "Exception taken"
},
{
"PublicDescription": "Instruction architecturally executed, condition check pass, exception return",
"EventCode": "0x0a",
"EventName": "EXC_RETURN",
"BriefDescription": "Instruction architecturally executed, condition check pass, exception return"
},
{
"PublicDescription": "Instruction architecturally executed, condition code check pass, write to CONTEXTIDR",
"EventCode": "0x0b",
"EventName": "CID_WRITE_RETIRED",
"BriefDescription": "Instruction architecturally executed, condition code check pass, write to CONTEXTIDR"
},
{
"PublicDescription": "Mispredicted or not predicted branch speculatively executed",
"EventCode": "0x10",
"EventName": "BR_MIS_PRED",
"BriefDescription": "Mispredicted or not predicted branch speculatively executed"
},
{
"PublicDescription": "Cycle",
"EventCode": "0x11",
"EventName": "CPU_CYCLES",
"BriefDescription": "Cycle"
},
{
"PublicDescription": "Predictable branch speculatively executed",
"EventCode": "0x12",
"EventName": "BR_PRED",
"BriefDescription": "Predictable branch speculatively executed"
},
{
"PublicDescription": "Data memory access",
"EventCode": "0x13",
"EventName": "MEM_ACCESS",
"BriefDescription": "Data memory access"
},
{
"PublicDescription": "Attributable Level 1 instruction cache access",
"EventCode": "0x14",
"EventName": "L1I_CACHE",
"BriefDescription": "Attributable Level 1 instruction cache access"
},
{
"PublicDescription": "Attributable Level 1 data cache write-back",
"EventCode": "0x15",
"EventName": "L1D_CACHE_WB",
"BriefDescription": "Attributable Level 1 data cache write-back"
},
{
"PublicDescription": "Level 2 data cache access",
"EventCode": "0x16",
"EventName": "L2D_CACHE",
"BriefDescription": "Level 2 data cache access"
},
{
"PublicDescription": "Level 2 data refill",
"EventCode": "0x17",
"EventName": "L2D_CACHE_REFILL",
"BriefDescription": "Level 2 data refill"
},
{
"PublicDescription": "Attributable Level 2 data cache write-back",
"EventCode": "0x18",
"EventName": "L2D_CACHE_WB",
"BriefDescription": "Attributable Level 2 data cache write-back"
},
{
"PublicDescription": "Attributable Bus access",
"EventCode": "0x19",
"EventName": "BUS_ACCESS",
"BriefDescription": "Attributable Bus access"
},
{
"PublicDescription": "Local memory error",
"EventCode": "0x1a",
"EventName": "MEMORY_ERROR",
"BriefDescription": "Local memory error"
},
{
"PublicDescription": "Operation speculatively executed",
"EventCode": "0x1b",
"EventName": "INST_SPEC",
"BriefDescription": "Operation speculatively executed"
},
{
"PublicDescription": "Instruction architecturally executed, Condition code check pass, write to TTBR",
"EventCode": "0x1c",
"EventName": "TTBR_WRITE_RETIRED",
"BriefDescription": "Instruction architecturally executed, Condition code check pass, write to TTBR"
},
{
"PublicDescription": "Bus cycle",
"EventCode": "0x1D",
"EventName": "BUS_CYCLES",
"BriefDescription": "Bus cycle"
},
{
"PublicDescription": "Attributable Level 2 data cache allocation without refill",
"EventCode": "0x20",
"EventName": "L2D_CACHE_ALLOCATE",
"BriefDescription": "Attributable Level 2 data cache allocation without refill"
},
{
"PublicDescription": "Instruction architecturally executed, branch",
"EventCode": "0x21",
"EventName": "BR_RETIRED",
"BriefDescription": "Instruction architecturally executed, branch"
},
{
"PublicDescription": "Instruction architecturally executed, mispredicted branch",
"EventCode": "0x22",
"EventName": "BR_MIS_PRED_RETIRED",
"BriefDescription": "Instruction architecturally executed, mispredicted branch"
},
{
"PublicDescription": "No operation issued because of the frontend",
"EventCode": "0x23",
"EventName": "STALL_FRONTEND",
"BriefDescription": "No operation issued because of the frontend"
},
{
"PublicDescription": "No operation issued due to the backend",
"EventCode": "0x24",
"EventName": "STALL_BACKEND",
"BriefDescription": "No operation issued due to the backend"
},
{
"PublicDescription": "Attributable Level 1 data or unified TLB access",
"EventCode": "0x25",
"EventName": "L1D_TLB",
"BriefDescription": "Attributable Level 1 data or unified TLB access"
},
{
"PublicDescription": "Attributable Level 1 instruction TLB access",
"EventCode": "0x26",
"EventName": "L1I_TLB",
"BriefDescription": "Attributable Level 1 instruction TLB access"
},
{
"PublicDescription": "Attributable Level 3 data cache allocation without refill",
"EventCode": "0x29",
"EventName": "L3D_CACHE_ALLOCATE",
"BriefDescription": "Attributable Level 3 data cache allocation without refill"
},
{
"PublicDescription": "Attributable Level 3 data cache refill",
"EventCode": "0x2A",
"EventName": "L3D_CACHE_REFILL",
"BriefDescription": "Attributable Level 3 data cache refill"
},
{
"PublicDescription": "Attributable Level 3 data cache access",
"EventCode": "0x2B",
"EventName": "L3D_CACHE",
"BriefDescription": "Attributable Level 3 data cache access"
},
{
"PublicDescription": "Attributable Level 2 data TLB refill",
"EventCode": "0x2D",
"EventName": "L2D_TLB_REFILL",
"BriefDescription": "Attributable Level 2 data TLB refill"
},
{
"PublicDescription": "Attributable Level 2 data or unified TLB access",
"EventCode": "0x2F",
"EventName": "L2D_TLB",
"BriefDescription": "Attributable Level 2 data or unified TLB access"
},
{
"PublicDescription": "Access to another socket in a multi-socket system",
"EventCode": "0x31",
"EventName": "REMOTE_ACCESS",
"BriefDescription": "Access to another socket in a multi-socket system"
},
{
"PublicDescription": "Access to data TLB causes a translation table walk",
"EventCode": "0x34",
"EventName": "DTLB_WALK",
"BriefDescription": "Access to data TLB causes a translation table walk"
},
{
"PublicDescription": "Access to instruction TLB that causes a translation table walk",
"EventCode": "0x35",
"EventName": "ITLB_WALK",
"BriefDescription": "Access to instruction TLB that causes a translation table walk"
},
{
"PublicDescription": "Attributable Last level cache memory read",
"EventCode": "0x36",
"EventName": "LL_CACHE_RD",
"BriefDescription": "Attributable Last level cache memory read"
},
{
"PublicDescription": "Last level cache miss, read",
"EventCode": "0x37",
"EventName": "LL_CACHE_MISS_RD",
"BriefDescription": "Last level cache miss, read"
}
]
@@ -6,7 +6,7 @@
"ScaleUnit": "9.765625e-4KB",
"Unit": "imx8_ddr",
"Compat": "i.MX8MM"
-},
+},
{
"BriefDescription": "bytes all masters write to ddr based on write-cycles event",
"MetricName": "imx8mm_ddr_write.all",
@@ -14,5 +14,5 @@
"ScaleUnit": "9.765625e-4KB",
"Unit": "imx8_ddr",
"Compat": "i.MX8MM"
-}
+}
]
[
{
"BriefDescription": "ddr cycles event",
"EventCode": "0x00",
"EventName": "imx8mn_ddr.cycles",
"Unit": "imx8_ddr",
"Compat": "i.MX8MN"
},
{
"BriefDescription": "ddr read-cycles event",
"EventCode": "0x2a",
"EventName": "imx8mn_ddr.read_cycles",
"Unit": "imx8_ddr",
"Compat": "i.MX8MN"
},
{
"BriefDescription": "ddr write-cycles event",
"EventCode": "0x2b",
"EventName": "imx8mn_ddr.write_cycles",
"Unit": "imx8_ddr",
"Compat": "i.MX8MN"
},
{
"BriefDescription": "ddr read event",
"EventCode": "0x35",
"EventName": "imx8mn_ddr.read",
"Unit": "imx8_ddr",
"Compat": "i.MX8MN"
},
{
"BriefDescription": "ddr write event",
"EventCode": "0x38",
"EventName": "imx8mn_ddr.write",
"Unit": "imx8_ddr",
"Compat": "i.MX8MN"
}
]
@@ -58,6 +58,7 @@ perf-y += time-utils-test.o
perf-y += genelf.o
perf-y += api-io.o
perf-y += demangle-java-test.o
+perf-y += demangle-ocaml-test.o
perf-y += pfm.o
perf-y += parse-metric.o
perf-y += pe-file-parsing.o
......
This diff is collapsed.