1. 12 Apr, 2024 18 commits
  2. 08 Apr, 2024 11 commits
  3. 05 Apr, 2024 2 commits
    • perf script: Add capstone support for '-F +brstackdisasm' · d8120446
      Andi Kleen authored
      Support capstone output for the '-F +brstackinsn' branch dump.
      
      The new output is enabled with the new field 'brstackdisasm'.
      
      This was already possible with --xed. Now it is also available to users
      who don't have xed, through the builtin capstone support.
      
      Before:
      
        perf record -b emacs -Q --batch '()'
        perf script -F +brstackinsn
        ...
                  emacs   55778 1814366.755945:     151564 cycles:P:      7f0ab2d17192 intel_check_word.constprop.0+0x162 (/usr/lib64/ld-linux-x86-64.s>        intel_check_word.constprop.0+237:
                00007f0ab2d1711d        insn: 75 e6                     # PRED 3 cycles [3]
                00007f0ab2d17105        insn: 73 51
                00007f0ab2d17107        insn: 48 89 c1
                00007f0ab2d1710a        insn: 48 39 ca
                00007f0ab2d1710d        insn: 73 96
                00007f0ab2d1710f        insn: 48 8d 04 11
                00007f0ab2d17113        insn: 48 d1 e8
                00007f0ab2d17116        insn: 49 8d 34 c1
                00007f0ab2d1711a        insn: 44 3a 06
                00007f0ab2d1711d        insn: 75 e6                     # PRED 3 cycles [6] 3.00 IPC
                00007f0ab2d17105        insn: 73 51                     # PRED 1 cycles [7] 1.00 IPC
                00007f0ab2d17158        insn: 48 8d 50 01
                00007f0ab2d1715c        insn: eb 92                     # PRED 1 cycles [8] 2.00 IPC
                00007f0ab2d170f0        insn: 48 39 ca
                00007f0ab2d170f3        insn: 73 b0                     # PRED 1 cycles [9] 2.00 IPC
      
      After (perf must be compiled with capstone):
      
        perf script -F +brstackdisasm
      
        ...
                   emacs   55778 1814366.755945:     151564 cycles:P:      7f0ab2d17192 intel_check_word.constprop.0+0x162 (/usr/lib64/ld-linux-x86-64.s>        intel_check_word.constprop.0+237:
                00007f0ab2d1711d        jne intel_check_word.constprop.0+0xd5   # PRED 3 cycles [3]
                00007f0ab2d17105        jae intel_check_word.constprop.0+0x128
                00007f0ab2d17107        movq %rax, %rcx
                00007f0ab2d1710a        cmpq %rcx, %rdx
                00007f0ab2d1710d        jae intel_check_word.constprop.0+0x75
                00007f0ab2d1710f        leaq (%rcx, %rdx), %rax
                00007f0ab2d17113        shrq $1, %rax
                00007f0ab2d17116        leaq (%r9, %rax, 8), %rsi
                00007f0ab2d1711a        cmpb (%rsi), %r8b
                00007f0ab2d1711d        jne intel_check_word.constprop.0+0xd5   # PRED 3 cycles [6] 3.00 IPC
                00007f0ab2d17105        jae intel_check_word.constprop.0+0x128  # PRED 1 cycles [7] 1.00 IPC
                00007f0ab2d17158        leaq 1(%rax), %rdx
                00007f0ab2d1715c        jmp intel_check_word.constprop.0+0xc0   # PRED 1 cycles [8] 2.00 IPC
                00007f0ab2d170f0        cmpq %rcx, %rdx
                00007f0ab2d170f3        jae intel_check_word.constprop.0+0x75   # PRED 1 cycles [9] 2.00 IPC
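
      To relate the raw bytes in the "Before" dump to the symbolic branch
      targets in the "After" dump, here is a minimal sketch that hand-decodes
      only 2-byte short branches (opcode plus signed 8-bit displacement). It
      is not how perf or capstone work internally, and both function names
      are made up for illustration. The symbol start address 0x7f0ab2d17030
      is derived from the sample line above (7f0ab2d17192 is
      intel_check_word.constprop.0+0x162).

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Decode the target of a 2-byte x86 short branch such as "75 e6" (jne) or
 * "eb 92" (jmp).  The signed 8-bit displacement is relative to the address
 * of the next instruction, i.e. ip + 2.
 */
static uint64_t short_branch_target(uint64_t ip, const uint8_t insn[2])
{
	int8_t rel8 = (int8_t)insn[1];

	return ip + 2 + (int64_t)rel8;
}

/* Format an address as "symbol+0xoff", given the symbol's start address. */
static int format_sym_offset(char *buf, size_t size, const char *sym,
			     uint64_t sym_start, uint64_t addr)
{
	return snprintf(buf, size, "%s+0x%llx", sym,
			(unsigned long long)(addr - sym_start));
}
```

      For "75 e6" at 00007f0ab2d1711d this gives 00007f0ab2d17105, which is
      intel_check_word.constprop.0+0xd5, matching the jne line above.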
      Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Link: https://lore.kernel.org/r/20240401210925.209671-3-ak@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf script: Support 32bit code under 64bit OS with capstone · 38ab6013
      Andi Kleen authored
      Use the DSO to resolve whether an IP is 32bit or 64bit and use that to
      configure capstone to the correct mode. This allows 32bit code to be
      disassembled correctly under a 64bit OS.
      
        % cat > loop.c
        volatile int var;
        int main(void)
        {
        	int i;
        	for (i = 0; i < 100000; i++)
        		var++;
        }
        % gcc -m32 -o loop loop.c
        % perf record -e cycles:u ./loop
        % perf script -F +disasm
          loop   82665 1833176.618023:      1 cycles:u:   f7eed500 _start+0x0 (/usr/lib/ld-linux.so.2)   movl %esp, %eax
          loop   82665 1833176.618029:      1 cycles:u:   f7eed500 _start+0x0 (/usr/lib/ld-linux.so.2)   movl %esp, %eax
          loop   82665 1833176.618031:      7 cycles:u:   f7eed500 _start+0x0 (/usr/lib/ld-linux.so.2)   movl %esp, %eax
          loop   82665 1833176.618034:     91 cycles:u:   f7eed500 _start+0x0 (/usr/lib/ld-linux.so.2)   movl %esp, %eax
          loop   82665 1833176.618036:   1242 cycles:u:   f7eed500 _start+0x0 (/usr/lib/ld-linux.so.2)   movl %esp, %eax
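
      One way to tell whether a binary holds 32bit or 64bit code is the class
      byte of its ELF header. The sketch below shows that idea only. How perf
      actually records this on the DSO is not shown in this commit message,
      and the enum and function names here are hypothetical. A real caller
      would map the result to capstone's CS_MODE_32 or CS_MODE_64.

```c
#include <stddef.h>

/* ELF identification constants, as defined by the ELF specification. */
#define EI_CLASS_IDX	4	/* index of the class byte in e_ident[] */
#define ELF_CLASS_32	1	/* ELFCLASS32 */
#define ELF_CLASS_64	2	/* ELFCLASS64 */

/* Hypothetical result type; stands in for a disassembler mode flag. */
enum disasm_mode { DISASM_MODE_UNKNOWN, DISASM_MODE_32, DISASM_MODE_64 };

/* Pick a disassembly mode from the first bytes of an ELF file. */
static enum disasm_mode mode_from_elf_ident(const unsigned char *ident,
					    size_t len)
{
	/* Need the 4-byte magic plus the class byte. */
	if (len <= EI_CLASS_IDX ||
	    ident[0] != 0x7f || ident[1] != 'E' ||
	    ident[2] != 'L' || ident[3] != 'F')
		return DISASM_MODE_UNKNOWN;

	switch (ident[EI_CLASS_IDX]) {
	case ELF_CLASS_32:
		return DISASM_MODE_32;
	case ELF_CLASS_64:
		return DISASM_MODE_64;
	default:
		return DISASM_MODE_UNKNOWN;
	}
}
```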
      Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
      Acked-by: Thomas Richter <tmricht@linux.ibm.com>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Link: https://lore.kernel.org/r/20240401210925.209671-2-ak@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 04 Apr, 2024 2 commits
    • perf stat: Do not fail on metrics on s390 z/VM systems · c2f3d7df
      Thomas Richter authored
      On s390 z/VM virtual machines, the 'perf list' command also displays metrics:
      
        # perf list | grep -A 20 'Metric Groups:'
        Metric Groups:
      
        No_group:
         cpi
              [Cycles per Instruction]
         est_cpi
              [Estimated Instruction Complexity CPI infinite Level 1]
         finite_cpi
              [Cycles per Instructions from Finite cache/memory]
         l1mp
              [Level One Miss per 100 Instructions]
         l2p
              [Percentage sourced from Level 2 cache]
         l3p
              [Percentage sourced from Level 3 on same chip cache]
         l4lp
              [Percentage sourced from Level 4 Local cache on same book]
         l4rp
              [Percentage sourced from Level 4 Remote cache on different book]
         memp
              [Percentage sourced from memory]
         ....
        #
      
      The command
      
        # perf stat -M cpi -- true
        event syntax error: '{CPU_CYCLES/metric-id=CPU_CYCLES/.....'
                              \___ Bad event or PMU
      
        Unable to find PMU or event on a PMU of 'CPU_CYCLES'
      
         event syntax error: '{CPU_CYCLES/metric-id=CPU_CYCLES/...'
                              \___ Cannot find PMU `CPU_CYCLES'.
                                   Missing kernel support?
       #
      
      fails. 'perf stat' should not fail on metrics when the referenced CPU
      Counter Measurement PMU is not available.
      
      Output after:
      
        # perf stat -M est_cpi -- sleep 1
      
        Performance counter stats for 'sleep 1':
      
           1,000,887,494 ns   duration_time   #     0.00 est_cpi
      
             1.000887494 seconds time elapsed
      
             0.000143000 seconds user
             0.000662000 seconds sys
      
       #
      
      Fixes: 7f76b311 ("perf list: Add IBM z16 event description for s390")
      Suggested-by: Ian Rogers <irogers@google.com>
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Link: https://lore.kernel.org/r/20240404064806.1362876-2-tmricht@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf report: Fix PAI counter names for s390 virtual machines · b74bc5a6
      Thomas Richter authored
      s390 introduced the Processor Activity Instrumentation (PAI) counter
      facility on LPAR and z/VM virtual machines for models 3931 and 3932.
      
      These counters are stored as raw data in the perf.data file and are
      displayed with:
      
       # perf report -i /tmp//perfout-635468 -D | grep Counter
      	Counter:007 <unknown> Value:0x00000000000186a0
      	Counter:032 <unknown> Value:0x0000000000000001
      	Counter:032 <unknown> Value:0x0000000000000001
      	Counter:032 <unknown> Value:0x0000000000000001
       #
      
      However on z/VM virtual machines, the counter names are not retrieved
      from the PMU and are shown as '<unknown>'.  This is caused by the CPU
      string saved in the mapfile.csv for this machine:
      
         ^IBM.393[12].*3\.7.[[:xdigit:]]+$,3,cf_z16,core
      
      This string contains the CPU Measurement facility first and second
      version number and authorization level (3\.7.[[:xdigit:]]+).  These
      numbers do not apply to the PAI counter facility.  In fact they can be
      omitted.
      
      Shorten the CPU identification string for this machine to manufacturer
      and model. This is sufficient for all PMU devices.
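
      The effect of the overly specific pattern can be reproduced with POSIX
      regular expressions. This is a sketch under assumptions: both cpuid
      strings in the usage note are made up for illustration, SHORT_PATTERN
      is a hypothetical shortened form (the exact replacement string is not
      quoted in this message), and cpuid_matches() is not a perf function.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Pattern quoted above: manufacturer, model, then version and authorization. */
#define OLD_PATTERN	"^IBM.393[12].*3\\.7.[[:xdigit:]]+$"
/* Hypothetical shortened pattern: manufacturer and model only. */
#define SHORT_PATTERN	"^IBM.393[12]"

/* Return true if cpuid matches the POSIX extended regular expression. */
static bool cpuid_matches(const char *pattern, const char *cpuid)
{
	regex_t re;
	bool match;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB))
		return false;	/* treat an invalid pattern as "no match" */

	match = regexec(&re, cpuid, 0, NULL, 0) == 0;
	regfree(&re);
	return match;
}
```

      With a made-up string such as "IBM,3931,704,A01,3.7,002f" both patterns
      match. With a made-up string that lacks the version fields, such as
      "IBM,3931,704,A01", only SHORT_PATTERN matches.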
      
      Output after:
      
       # perf report -i /tmp//perfout-635468 -D | grep Counter
      	Counter:007 km_aes_128 Value:0x00000000000186a0
      	Counter:032 kma_gcm_aes_256 Value:0x0000000000000001
      	Counter:032 kma_gcm_aes_256 Value:0x0000000000000001
      	Counter:032 kma_gcm_aes_256 Value:0x0000000000000001
       #
      
      Fixes: b539deaf ("perf report: Add s390 raw data interpretation for PAI counters")
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Link: https://lore.kernel.org/r/20240404064806.1362876-1-tmricht@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 03 Apr, 2024 7 commits
    • perf annotate: Initialize 'arch' variable not to trip some -Werror=maybe-uninitialized · b6347cb5
      Arnaldo Carvalho de Melo authored
      On some older distros the build fails due to
      -Werror=maybe-uninitialized. In this case the warning is a false
      positive, because 'arch' gets initialized by evsel__get_arch(). To
      silence it, make sure 'arch' is initialized to NULL before returning
      from evsel__get_arch(), as suggested by Ian Rogers.
      
      E.g.:
      
          32    17.12 opensuse:15.5                 : FAIL gcc version 7.5.0 (SUSE Linux)
              util/annotate.c: In function 'hist_entry__get_data_type':
          util/annotate.c:2269:15: error: 'arch' may be used uninitialized in this function [-Werror=maybe-uninitialized]
            struct arch *arch;
                         ^~~~
          cc1: all warnings being treated as errors
      
            43     7.30 ubuntu:18.04-x-powerpc64el    : FAIL gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
          util/annotate.c: In function 'hist_entry__get_data_type':
          util/annotate.c:2351:36: error: 'arch' may be used uninitialized in this function [-Werror=maybe-uninitialized]
             if (map__dso(ms->map)->kernel && arch__is(arch, "x86") &&
                                              ^~~~~~~~~~~~~~~~~~~~~
          cc1: all warnings being treated as errors
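
      The general pattern can be sketched as follows: a function that
      returns its result through an output pointer assigns that pointer on
      every path, including error paths, so the caller's variable is never
      left indeterminate. The names below (get_arch, struct arch with a
      single field, x86_arch) are simplified stand-ins, not the actual perf
      code.

```c
#include <stddef.h>

/* Simplified stand-in for perf's struct arch. */
struct arch {
	const char *name;
};

static struct arch x86_arch = { "x86" };

/*
 * Hypothetical helper: returns 0 and stores the arch in *parch on success,
 * or returns -1 on failure.  *parch is set to NULL up front, so it holds a
 * defined value no matter which return statement is taken.
 */
static int get_arch(int known, struct arch **parch)
{
	*parch = NULL;

	if (!known)
		return -1;

	*parch = &x86_arch;
	return 0;
}
```

      A caller can then declare 'struct arch *arch;' without an initializer
      and still read it safely after get_arch() returns, which is the
      situation older compilers fail to prove on their own.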
      Suggested-by: Ian Rogers <irogers@google.com>
      Reviewed-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/lkml/CAP-5=fUqtjxAsmdGrnkjhUTLHs-JvV10TtxyocpYDJK_+LYTiQ@mail.gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf build: Add LIBTRACEEVENT_DIR build option · baa2ca59
      Yang Jihong authored
      Currently, when libtraceevent is not linked, perf does not support
      tracepoints:
      
        # ./perf record -e sched:sched_switch -a sleep 10
        event syntax error: 'sched:sched_switch'
                             \___ unsupported tracepoint
      
        libtraceevent is necessary for tracepoint support
        Run 'perf list' for a list of valid events
      
         Usage: perf record [<options>] [<command>]
            or: perf record [<options>] -- <command> [<options>]
      
            -e, --event <event>   event selector. use 'perf list' to list available events
      
      In cross-compilation scenarios, the library may not be installed in the
      default system path. To handle this, add a LIBTRACEEVENT_DIR build
      option that specifies the path to libtraceevent.
      
      Example:
      
        1. Cross compile libtraceevent
        # cd /opt/libtraceevent
        # CROSS_COMPILE=aarch64-linux-gnu- make
      
        2. Cross compile perf
        # cd tools/perf
        # make VF=1 ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- NO_LIBELF=1 LDFLAGS=--static LIBTRACEEVENT_DIR=/opt/libtraceevent
        <SNIP>
        Auto-detecting system features:
        <SNIP>
        ...                       LIBTRACEEVENT_DIR: /opt/libtraceevent
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240314063000.2139877-1-yangjihong@bytedance.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf beauty: Fix AT_EACCESS undeclared build error for system with kernel versions lower than v5.8 · 089ef2f4
      Yang Jihong authored
      On Ubuntu 20.04 (kernel headers version 5.4), building perf fails with
      this error:
      
          CC      trace/beauty/fs_at_flags.o
        trace/beauty/fs_at_flags.c: In function ‘faccessat2__scnprintf_flags’:
        trace/beauty/fs_at_flags.c:35:14: error: ‘AT_EACCESS’ undeclared (first use in this function); did you mean ‘DN_ACCESS’?
           35 |  if (flags & AT_EACCESS) {
              |              ^~~~~~~~~~
              |              DN_ACCESS
        trace/beauty/fs_at_flags.c:35:14: note: each undeclared identifier is reported only once for each function it appears in
      
      Commit 8a1ad441 ("tools headers: Remove now unused copies of
      uapi/{fcntl,openat2}.h and asm/fcntl.h") removed fcntl.h from the tools
      headers directory, but fs_at_flags.c uses the 'AT_EACCESS' macro.

      This macro was introduced in kernel v5.8, so compilation fails on
      systems whose kernel headers are older than that.
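
      A common way to cope with a macro that older headers lack is a guarded
      fallback definition. The sketch below illustrates that technique.
      Whether this commit uses exactly this approach is not stated in the
      message, and faccessat2_flags_to_str() is a hypothetical helper, not
      the perf function from the error output. The value 0x200 is the one
      the kernel UAPI header and glibc use for AT_EACCESS.

```c
#include <fcntl.h>
#include <stdio.h>

/* Fallback for headers that predate AT_EACCESS. */
#ifndef AT_EACCESS
#define AT_EACCESS 0x200
#endif

/*
 * Hypothetical helper: render faccessat2() flags as text.  AT_EACCESS is
 * printed by name; any remaining bits are printed in hex.
 */
static int faccessat2_flags_to_str(unsigned long flags, char *buf, size_t size)
{
	if (flags & AT_EACCESS) {
		flags &= ~(unsigned long)AT_EACCESS;
		if (flags)
			return snprintf(buf, size, "EACCESS|%#lx", flags);
		return snprintf(buf, size, "EACCESS");
	}

	return snprintf(buf, size, "%#lx", flags);
}
```

      Because the definition is guarded by #ifndef, it is a no-op on systems
      whose headers already provide the macro.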
      
      Fixes: 8a1ad441 ("tools headers: Remove now unused copies of uapi/{fcntl,openat2}.h and asm/fcntl.h")
      Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240403122558.1438841-1-yangjihong@bytedance.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: Add symbol name when using capstone · 92dfc594
      Namhyung Kim authored
      This keeps the existing behavior of the objdump-based output, which
      shows symbol information for global variables, as below:
      
         Percent |      Source code & Disassembly of elf for cycles:P (1 samples, percent: local period)
        ------------------------------------------------------------------------------------------------
                 : 0                0xffffffff81338f70 <vm_normal_page>:
            0.00 :   ffffffff81338f70:       endbr64
            0.00 :   ffffffff81338f74:       callq   0xffffffff81083a40
            0.00 :   ffffffff81338f79:       movq    %rdi, %r8
            0.00 :   ffffffff81338f7c:       movq    %rdx, %rdi
            0.00 :   ffffffff81338f7f:       callq   *0x17021c3(%rip)   # ffffffff82a3b148 <pv_ops+0x1e8>
            0.00 :   ffffffff81338f85:       movq    0xffbf3c(%rip), %rdx       # ffffffff82334ec8 <physical_mask>
            0.00 :   ffffffff81338f8c:       testq   %rax, %rax                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
            0.00 :   ffffffff81338f8f:       je      0xffffffff81338fd0                         here
            0.00 :   ffffffff81338f91:       movq    %rax, %rcx
            0.00 :   ffffffff81338f94:       andl    $1, %ecx
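
      The address in each trailing comment follows from x86-64 RIP-relative
      addressing: the displacement is added to the address of the next
      instruction, and the result is looked up in the symbol table. Below is
      a minimal sketch of that arithmetic. The struct and function names are
      hypothetical, and the symbol sizes used in the usage note are invented
      for the example. It is not perf's implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical symbol table entry. */
struct sym {
	const char *name;
	uint64_t start;
	uint64_t size;
};

/* RIP-relative target: address of the next instruction plus displacement. */
static uint64_t rip_relative_target(uint64_t insn_addr, unsigned int insn_len,
				    int32_t disp)
{
	return insn_addr + insn_len + (int64_t)disp;
}

/* Linear search for the symbol that contains addr, or NULL if none does. */
static const struct sym *find_sym(const struct sym *syms, size_t n,
				  uint64_t addr)
{
	for (size_t i = 0; i < n; i++) {
		if (addr >= syms[i].start && addr - syms[i].start < syms[i].size)
			return &syms[i];
	}
	return NULL;
}

/* Format "addr <sym>" or "addr <sym+0xoff>", like the comments above. */
static int format_target(char *buf, size_t size, const struct sym *syms,
			 size_t n, uint64_t addr)
{
	const struct sym *s = find_sym(syms, n, addr);
	unsigned long long a = addr;

	if (!s)
		return snprintf(buf, size, "%llx", a);
	if (addr == s->start)
		return snprintf(buf, size, "%llx <%s>", a, s->name);
	return snprintf(buf, size, "%llx <%s+0x%llx>", a, s->name,
			(unsigned long long)(addr - s->start));
}
```

      For the movq at ffffffff81338f85 (7 bytes long, displacement 0xffbf3c)
      the target is ffffffff82334ec8. With a table that places physical_mask
      at that address, format_target() yields
      "ffffffff82334ec8 <physical_mask>".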
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240329215812.537846-6-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: Use libcapstone to disassemble · 6d17edc1
      Namhyung Kim authored
      perf can now use the capstone library to disassemble instructions.
      Use it (if available) in perf annotate to speed things up.  Currently
      only the x86 architecture is supported.  With this change I see a ~3x
      speedup in data type profiling.

      Note that capstone cannot provide source file and line number info.
      For now, users who need that should use the external objdump by
      specifying the --objdump option explicitly.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240329215812.537846-5-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: Split out util/disasm.c · 98f69a57
      Namhyung Kim authored
      The util/annotate.c code contains both disassembly and sample
      annotation code.  Factor out the disasm part so that it can be handled
      more easily.
      
      No functional changes intended.
      
      Committer notes:
      
      Add missing includes of env.h, util.h, bpf-event.h and bpf-util.h to
      disasm.c, to fix things like:
      
        util/disasm.c: In function ‘symbol__disassemble_bpf’:
        util/disasm.c:1203:9: error: implicit declaration of function ‘perf_exe’ [-Werror=implicit-function-declaration]
         1203 |         perf_exe(tpath, sizeof(tpath));
              |         ^~~~~~~~
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240329215812.537846-4-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: Add and use ins__is_nop() · 10adbf77
      Namhyung Kim authored
      Likewise, add ins__is_nop() to check if the current instruction is NOP.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240329215812.537846-3-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>