1. 12 Apr, 2024 13 commits
  2. 08 Apr, 2024 11 commits
  3. 05 Apr, 2024 2 commits
    • perf script: Add capstone support for '-F +brstackdisasm' · d8120446
      Andi Kleen authored
      Support capstone output for the '-F +brstackinsn' branch dump.
      
      The new output is enabled with the new field 'brstackdisasm'.
      
      This was already possible with --xed; now users who don't have xed can
      get the same output through the builtin capstone support.
      
      Before:
      
        perf record -b emacs -Q --batch '()'
        perf script -F +brstackinsn
        ...
                  emacs   55778 1814366.755945:     151564 cycles:P:      7f0ab2d17192 intel_check_word.constprop.0+0x162 (/usr/lib64/ld-linux-x86-64.s>        intel_check_word.constprop.0+237:
                00007f0ab2d1711d        insn: 75 e6                     # PRED 3 cycles [3]
                00007f0ab2d17105        insn: 73 51
                00007f0ab2d17107        insn: 48 89 c1
                00007f0ab2d1710a        insn: 48 39 ca
                00007f0ab2d1710d        insn: 73 96
                00007f0ab2d1710f        insn: 48 8d 04 11
                00007f0ab2d17113        insn: 48 d1 e8
                00007f0ab2d17116        insn: 49 8d 34 c1
                00007f0ab2d1711a        insn: 44 3a 06
                00007f0ab2d1711d        insn: 75 e6                     # PRED 3 cycles [6] 3.00 IPC
                00007f0ab2d17105        insn: 73 51                     # PRED 1 cycles [7] 1.00 IPC
                00007f0ab2d17158        insn: 48 8d 50 01
                00007f0ab2d1715c        insn: eb 92                     # PRED 1 cycles [8] 2.00 IPC
                00007f0ab2d170f0        insn: 48 39 ca
                00007f0ab2d170f3        insn: 73 b0                     # PRED 1 cycles [9] 2.00 IPC
      
      After (perf must be compiled with capstone):
      
        perf script -F +brstackdisasm
      
        ...
                   emacs   55778 1814366.755945:     151564 cycles:P:      7f0ab2d17192 intel_check_word.constprop.0+0x162 (/usr/lib64/ld-linux-x86-64.s>        intel_check_word.constprop.0+237:
                00007f0ab2d1711d        jne intel_check_word.constprop.0+0xd5   # PRED 3 cycles [3]
                00007f0ab2d17105        jae intel_check_word.constprop.0+0x128
                00007f0ab2d17107        movq %rax, %rcx
                00007f0ab2d1710a        cmpq %rcx, %rdx
                00007f0ab2d1710d        jae intel_check_word.constprop.0+0x75
                00007f0ab2d1710f        leaq (%rcx, %rdx), %rax
                00007f0ab2d17113        shrq $1, %rax
                00007f0ab2d17116        leaq (%r9, %rax, 8), %rsi
                00007f0ab2d1711a        cmpb (%rsi), %r8b
                00007f0ab2d1711d        jne intel_check_word.constprop.0+0xd5   # PRED 3 cycles [6] 3.00 IPC
                00007f0ab2d17105        jae intel_check_word.constprop.0+0x128  # PRED 1 cycles [7] 1.00 IPC
                00007f0ab2d17158        leaq 1(%rax), %rdx
                00007f0ab2d1715c        jmp intel_check_word.constprop.0+0xc0   # PRED 1 cycles [8] 2.00 IPC
                00007f0ab2d170f0        cmpq %rcx, %rdx
                00007f0ab2d170f3        jae intel_check_word.constprop.0+0x75   # PRED 1 cycles [9] 2.00 IPC
      
      Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Link: https://lore.kernel.org/r/20240401210925.209671-3-ak@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf script: Support 32bit code under 64bit OS with capstone · 38ab6013
      Andi Kleen authored
      Use the DSO to resolve whether an IP is 32bit or 64bit and use that to
      configure capstone to the correct mode. This allows 32bit code to be
      disassembled correctly under a 64bit OS.
      
        % cat > loop.c
        volatile int var;
        int main(void)
        {
        	int i;
        	for (i = 0; i < 100000; i++)
        		var++;
        }
        % gcc -m32 -o loop loop.c
        % perf record -e cycles:u ./loop
        % perf script -F +disasm
          loop   82665 1833176.618023:      1 cycles:u:   f7eed500 _start+0x0 (/usr/lib/ld-linux.so.2)   movl %esp, %eax
          loop   82665 1833176.618029:      1 cycles:u:   f7eed500 _start+0x0 (/usr/lib/ld-linux.so.2)   movl %esp, %eax
          loop   82665 1833176.618031:      7 cycles:u:   f7eed500 _start+0x0 (/usr/lib/ld-linux.so.2)   movl %esp, %eax
          loop   82665 1833176.618034:     91 cycles:u:   f7eed500 _start+0x0 (/usr/lib/ld-linux.so.2)   movl %esp, %eax
          loop   82665 1833176.618036:   1242 cycles:u:   f7eed500 _start+0x0 (/usr/lib/ld-linux.so.2)   movl %esp, %eax
      
      Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
      Acked-by: Thomas Richter <tmricht@linux.ibm.com>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Link: https://lore.kernel.org/r/20240401210925.209671-2-ak@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
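The idea of picking the disassembler mode from the bitness of the binary can be sketched roughly as below. This is a minimal illustration and not perf's code: `enum disasm_mode` and `pick_mode()` are hypothetical names, and the mode is a plain enum rather than a real capstone constant. The sketch only inspects the EI_CLASS byte of an ELF identification array, assuming a Linux system that provides `<elf.h>`.

```c
#include <elf.h>

/* Hypothetical stand-in for the mode a disassembler would be opened with. */
enum disasm_mode { DISASM_MODE_32, DISASM_MODE_64 };

/*
 * Choose a mode from the ELF identification bytes of a DSO.
 * Anything that is not ELFCLASS64 is treated as 32-bit here, which is
 * a simplification made for this sketch.
 */
static enum disasm_mode pick_mode(const unsigned char ident[EI_NIDENT])
{
	return ident[EI_CLASS] == ELFCLASS64 ? DISASM_MODE_64 : DISASM_MODE_32;
}
```

A caller would read the first EI_NIDENT bytes of the mapped file and pass them in; the result would then decide how the disassembler is configured for that address.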
  4. 04 Apr, 2024 2 commits
    • perf stat: Do not fail on metrics on s390 z/VM systems · c2f3d7df
      Thomas Richter authored
      On s390 z/VM virtual machines, the 'perf list' command also displays metrics:
      
        # perf list | grep -A 20 'Metric Groups:'
        Metric Groups:
      
        No_group:
         cpi
              [Cycles per Instruction]
         est_cpi
              [Estimated Instruction Complexity CPI infinite Level 1]
         finite_cpi
              [Cycles per Instructions from Finite cache/memory]
         l1mp
              [Level One Miss per 100 Instructions]
         l2p
              [Percentage sourced from Level 2 cache]
         l3p
              [Percentage sourced from Level 3 on same chip cache]
         l4lp
              [Percentage sourced from Level 4 Local cache on same book]
         l4rp
              [Percentage sourced from Level 4 Remote cache on different book]
         memp
              [Percentage sourced from memory]
         ....
        #
      
      The command
      
        # perf stat -M cpi -- true
        event syntax error: '{CPU_CYCLES/metric-id=CPU_CYCLES/.....'
                              \___ Bad event or PMU
      
        Unable to find PMU or event on a PMU of 'CPU_CYCLES'
      
         event syntax error: '{CPU_CYCLES/metric-id=CPU_CYCLES/...'
                              \___ Cannot find PMU `CPU_CYCLES'.
                                   Missing kernel support?
       #
      
      fails. 'perf stat' should not fail on metrics when the referenced CPU
      Counter Measurement PMU is not available.
      
      Output after:
      
        # perf stat -M est_cpi -- sleep 1
      
        Performance counter stats for 'sleep 1':
      
           1,000,887,494 ns   duration_time   #     0.00 est_cpi
      
             1.000887494 seconds time elapsed
      
             0.000143000 seconds user
             0.000662000 seconds sys
      
       #
      
      Fixes: 7f76b311 ("perf list: Add IBM z16 event description for s390")
      Suggested-by: Ian Rogers <irogers@google.com>
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Link: https://lore.kernel.org/r/20240404064806.1362876-2-tmricht@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf report: Fix PAI counter names for s390 virtual machines · b74bc5a6
      Thomas Richter authored
      s390 introduced the Processor Activity Instrumentation (PAI) counter
      facility on LPARs and z/VM virtual machines for models 3931 and 3932.
      
      These counters are stored as raw data in the perf.data file and are
      displayed with:
      
       # perf report -i /tmp//perfout-635468 -D | grep Counter
      	Counter:007 <unknown> Value:0x00000000000186a0
      	Counter:032 <unknown> Value:0x0000000000000001
      	Counter:032 <unknown> Value:0x0000000000000001
      	Counter:032 <unknown> Value:0x0000000000000001
       #
      
      However on z/VM virtual machines, the counter names are not retrieved
      from the PMU and are shown as '<unknown>'.  This is caused by the CPU
      string saved in the mapfile.csv for this machine:
      
         ^IBM.393[12].*3\.7.[[:xdigit:]]+$,3,cf_z16,core
      
      This string contains the CPU Measurement facility first and second
      version number and authorization level (3\.7.[[:xdigit:]]+).  These
      numbers do not apply to the PAI counter facility.  In fact they can be
      omitted.
      
      Shorten the CPU identification string for this machine to manufacturer
      and model. This is sufficient for all PMU devices.
      
      Output after:
      
       # perf report -i /tmp//perfout-635468 -D | grep Counter
      	Counter:007 km_aes_128 Value:0x00000000000186a0
      	Counter:032 kma_gcm_aes_256 Value:0x0000000000000001
      	Counter:032 kma_gcm_aes_256 Value:0x0000000000000001
      	Counter:032 kma_gcm_aes_256 Value:0x0000000000000001
       #
      
      Fixes: b539deaf ("perf report: Add s390 raw data interpretation for PAI counters")
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Link: https://lore.kernel.org/r/20240404064806.1362876-1-tmricht@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
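The effect of shortening the CPU identification pattern can be tried out with POSIX extended regular expressions, as in the sketch below. Several things here are assumptions made for illustration: the helper name `cpuid_matches()`, the shortened pattern `^IBM.393[12]`, and both CPU ID strings are hypothetical and are not taken from the commit or from a real machine. Only the long pattern is copied from the mapfile.csv entry quoted above.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Pattern quoted in the commit message (CPU-MF version and authorization included). */
#define LONG_PATTERN  "^IBM.393[12].*3\\.7.[[:xdigit:]]+$"
/* Hypothetical shortened pattern: manufacturer and model only. */
#define SHORT_PATTERN "^IBM.393[12]"

/* Return true if the POSIX extended regex 'pattern' matches 'cpuid'. */
static bool cpuid_matches(const char *pattern, const char *cpuid)
{
	regex_t re;
	bool match;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return false;	/* an invalid pattern is treated as "no match" */

	match = regexec(&re, cpuid, 0, NULL, 0) == 0;
	regfree(&re);
	return match;
}
```

With a made-up string such as "IBM,3931,704,A01" that lacks the CPU Measurement facility version fields, the long pattern does not match while the shortened one does, which is the kind of mismatch the commit describes.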
  5. 03 Apr, 2024 12 commits
    • perf annotate: Initialize 'arch' variable not to trip some -Werror=maybe-uninitialized · b6347cb5
      Arnaldo Carvalho de Melo authored
      On some older distros the build fails with
      -Werror=maybe-uninitialized. The warning is a false positive here,
      because 'arch' gets initialized by evsel__get_arch(). To silence it,
      make sure 'arch' is initialized to NULL before returning from
      evsel__get_arch(), as suggested by Ian Rogers.
      
      E.g.:
      
          32    17.12 opensuse:15.5                 : FAIL gcc version 7.5.0 (SUSE Linux)
              util/annotate.c: In function 'hist_entry__get_data_type':
          util/annotate.c:2269:15: error: 'arch' may be used uninitialized in this function [-Werror=maybe-uninitialized]
            struct arch *arch;
                         ^~~~
          cc1: all warnings being treated as errors
      
            43     7.30 ubuntu:18.04-x-powerpc64el    : FAIL gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
          util/annotate.c: In function 'hist_entry__get_data_type':
          util/annotate.c:2351:36: error: 'arch' may be used uninitialized in this function [-Werror=maybe-uninitialized]
             if (map__dso(ms->map)->kernel && arch__is(arch, "x86") &&
                                              ^~~~~~~~~~~~~~~~~~~~~
          cc1: all warnings being treated as errors
      
      Suggested-by: Ian Rogers <irogers@google.com>
      Reviewed-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/lkml/CAP-5=fUqtjxAsmdGrnkjhUTLHs-JvV10TtxyocpYDJK_+LYTiQ@mail.gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
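The general pattern of giving an output parameter a defined value on every return path can be sketched as follows. This is an illustration only: `struct arch` here is a reduced stand-in, and `get_arch()` and `known_arch` are hypothetical names that do not reproduce evsel__get_arch().

```c
#include <stddef.h>
#include <string.h>

/* Reduced, hypothetical stand-in for perf's struct arch. */
struct arch {
	const char *name;
};

static struct arch known_arch = { "x86" };

/*
 * Look up an architecture by name and return it through 'parch'.
 * '*parch' is set to NULL first, so the caller's variable has a defined
 * value even when the lookup fails, and older compilers have no reason
 * to warn about a possibly uninitialized use.
 */
static int get_arch(const char *name, struct arch **parch)
{
	*parch = NULL;

	if (name == NULL || strcmp(name, known_arch.name) != 0)
		return -1;

	*parch = &known_arch;
	return 0;
}
```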
    • perf build: Add LIBTRACEEVENT_DIR build option · baa2ca59
      Yang Jihong authored
      Currently, when libtraceevent is not linked,
      perf does not support tracepoints:
      
        # ./perf record -e sched:sched_switch -a sleep 10
        event syntax error: 'sched:sched_switch'
                             \___ unsupported tracepoint
      
        libtraceevent is necessary for tracepoint support
        Run 'perf list' for a list of valid events
      
         Usage: perf record [<options>] [<command>]
            or: perf record [<options>] -- <command> [<options>]
      
            -e, --event <event>   event selector. use 'perf list' to list available events
      
      In cross-compilation scenarios, the library may not be installed in the
      default system path. To handle this case, add a LIBTRACEEVENT_DIR build
      option that specifies the path of libtraceevent.
      
      Example:
      
        1. Cross compile libtraceevent
        # cd /opt/libtraceevent
        # CROSS_COMPILE=aarch64-linux-gnu- make
      
        2. Cross compile perf
        # cd tools/perf
        # make VF=1 ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- NO_LIBELF=1 LDFLAGS=--static LIBTRACEEVENT_DIR=/opt/libtraceevent
        <SNIP>
        Auto-detecting system features:
        <SNIP>
        ...                       LIBTRACEEVENT_DIR: /opt/libtraceevent
      
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240314063000.2139877-1-yangjihong@bytedance.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf beauty: Fix AT_EACCESS undeclared build error for system with kernel versions lower than v5.8 · 089ef2f4
      Yang Jihong authored
      On Ubuntu 20.04 (where the kernel headers are version 5.4), building
      perf fails with:
      
          CC      trace/beauty/fs_at_flags.o
        trace/beauty/fs_at_flags.c: In function ‘faccessat2__scnprintf_flags’:
        trace/beauty/fs_at_flags.c:35:14: error: ‘AT_EACCESS’ undeclared (first use in this function); did you mean ‘DN_ACCESS’?
           35 |  if (flags & AT_EACCESS) {
              |              ^~~~~~~~~~
              |              DN_ACCESS
        trace/beauty/fs_at_flags.c:35:14: note: each undeclared identifier is reported only once for each function it appears in
      
      commit 8a1ad441 ("tools headers: Remove now unused copies of
      uapi/{fcntl,openat2}.h and asm/fcntl.h") removes fcntl.h from tools
      headers directory, and fs_at_flags.c uses the 'AT_EACCESS' macro.
      
      This macro was introduced in kernel version v5.8.  On systems with a
      kernel version older than that, the macro is missing and compilation
      fails.
      
      Fixes: 8a1ad441 ("tools headers: Remove now unused copies of uapi/{fcntl,openat2}.h and asm/fcntl.h")
      Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240403122558.1438841-1-yangjihong@bytedance.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
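One common way to cope with a macro that older system headers lack is a guarded fallback definition, sketched below. The commit message does not spell out the change, so this is an assumption about the general technique and not a copy of the fix. The value 0x200 is the one the Linux uapi headers use for AT_EACCESS, and `eaccess_name()` is a hypothetical helper added only to show the macro in use.

```c
#include <fcntl.h>

/*
 * Fallback for build environments whose headers do not provide the
 * macro. 0x200 is the value used by the Linux uapi headers.
 */
#ifndef AT_EACCESS
#define AT_EACCESS 0x200
#endif

/* Hypothetical helper: name the flag if it is set, else an empty string. */
static const char *eaccess_name(int flags)
{
	return (flags & AT_EACCESS) ? "EACCESS" : "";
}
```

Because of the #ifndef guard, the fallback has no effect on systems whose headers already define the macro.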
    • perf annotate: Add symbol name when using capstone · 92dfc594
      Namhyung Kim authored
      This is to keep the existing behavior with objdump.  It needs to show
      symbol information of global variables like below:
      
         Percent |      Source code & Disassembly of elf for cycles:P (1 samples, percent: local period)
        ------------------------------------------------------------------------------------------------
                 : 0                0xffffffff81338f70 <vm_normal_page>:
            0.00 :   ffffffff81338f70:       endbr64
            0.00 :   ffffffff81338f74:       callq   0xffffffff81083a40
            0.00 :   ffffffff81338f79:       movq    %rdi, %r8
            0.00 :   ffffffff81338f7c:       movq    %rdx, %rdi
            0.00 :   ffffffff81338f7f:       callq   *0x17021c3(%rip)   # ffffffff82a3b148 <pv_ops+0x1e8>
            0.00 :   ffffffff81338f85:       movq    0xffbf3c(%rip), %rdx       # ffffffff82334ec8 <physical_mask>
            0.00 :   ffffffff81338f8c:       testq   %rax, %rax                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
            0.00 :   ffffffff81338f8f:       je      0xffffffff81338fd0                         here
            0.00 :   ffffffff81338f91:       movq    %rax, %rcx
            0.00 :   ffffffff81338f94:       andl    $1, %ecx
      
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240329215812.537846-6-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: Use libcapstone to disassemble · 6d17edc1
      Namhyung Kim authored
      perf can now use the capstone library to disassemble instructions.
      Use it (if available) in perf annotate to speed things up.  Currently
      only the x86 architecture is supported.  With this change I see a ~3x
      speedup in data type profiling.
      
      But note that capstone cannot give the source file and line number info.
      For now, users should use the external objdump for that by specifying
      the --objdump option explicitly.
      
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240329215812.537846-5-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: Split out util/disasm.c · 98f69a57
      Namhyung Kim authored
      The util/annotate.c file contains both disassembly and sample
      annotation code.  Factor out the disasm part so that it can be handled
      more easily.
      
      No functional changes intended.
      
      Committer notes:
      
      Add missing include env.h, util.h, bpf-event.h and bpf-util.h to
      disasm.c, to fix things like:
      
        util/disasm.c: In function ‘symbol__disassemble_bpf’:
        util/disasm.c:1203:9: error: implicit declaration of function ‘perf_exe’ [-Werror=implicit-function-declaration]
         1203 |         perf_exe(tpath, sizeof(tpath));
              |         ^~~~~~~~
      
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240329215812.537846-4-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: Add and use ins__is_nop() · 10adbf77
      Namhyung Kim authored
      Likewise, add ins__is_nop() to check if the current instruction is NOP.
      
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240329215812.537846-3-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: Use ins__is_xxx() if possible · ad399baa
      Namhyung Kim authored
      This is to prepare separation of disasm related code.  Use the public
      ins API instead of checking the internal data structure.
      
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240329215812.537846-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf evsel: Use evsel__name_is() helper · 09d2056e
      Yang Jihong authored
      Code cleanup: replace strcmp(evsel__name(evsel), {NAME}) with the
      evsel__name_is() helper.
      
      No functional change.
      
      Committer notes:
      
      Fix this build error:
      
                trace.syscalls.events.bpf_output = evlist__last(trace.evlist);
        -       assert(evsel__name_is(trace.syscalls.events.bpf_output), "__augmented_syscalls__");
        +       assert(evsel__name_is(trace.syscalls.events.bpf_output, "__augmented_syscalls__"));
      
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240401062724.1006010-3-yangjihong@bytedance.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
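The shape of such a name-comparison helper can be sketched as below. `struct event`, `event_name()` and `event_name_is()` are hypothetical stand-ins for the evsel type and its helpers, used so that the example compiles on its own.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, reduced stand-in for an evsel. */
struct event {
	const char *name;
};

static const char *event_name(const struct event *ev)
{
	return ev->name;
}

/* Wraps the strcmp() idiom so call sites read as a yes/no question. */
static bool event_name_is(const struct event *ev, const char *name)
{
	return strcmp(event_name(ev), name) == 0;
}
```

A boolean helper of this kind also keeps both arguments inside one call, so a misplaced parenthesis like the one in the committer note shows up as a build error instead of passing silently.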
    • perf sched timehist: Fix -g/--call-graph option failure · 6e4b3987
      Yang Jihong authored
      When 'perf sched' enables call-graph recording, the sample_type of the
      dummy event does not have PERF_SAMPLE_CALLCHAIN. timehist_check_attr()
      then sees an evsel without a callchain and sets show_callchain to 0.
      
      Currently 'perf sched timehist' only saves callchains when processing the
      'sched:sched_switch' event, so timehist_check_attr() only needs to
      determine whether that event has PERF_SAMPLE_CALLCHAIN.
      
      Before:
      
        # perf sched record -g true
        [ perf record: Woken up 0 times to write data ]
        [ perf record: Captured and wrote 4.153 MB perf.data (7536 samples) ]
        # perf sched timehist
        Samples do not have callchains.
                   time    cpu  task name                       wait time  sch delay   run time
                                [tid/pid]                          (msec)     (msec)     (msec)
        --------------- ------  ------------------------------  ---------  ---------  ---------
          147851.826019 [0000]  perf[285035]                        0.000      0.000      0.000
          147851.826029 [0000]  migration/0[15]                     0.000      0.003      0.009
          147851.826063 [0001]  perf[285035]                        0.000      0.000      0.000
          147851.826069 [0001]  migration/1[21]                     0.000      0.003      0.006
        <SNIP>
      
      After:
      
        # perf sched record -g true
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 2.572 MB perf.data (822 samples) ]
        # perf sched timehist
               time cpu task name        waittime  sch delay  runtime
                          [tid/pid]        (msec)  (msec)    (msec)
        ----------- --- ---------------  --------  --------  -----
        4193.035164 [0] perf[277062]        0.000     0.000   0.000 __traceiter_sched_switch <- __traceiter_sched_switch <- __sched_text_start <- preempt_schedule_common <- __cond_resched <- __wait_for_common <- wait_for_completion
        4193.035174 [0] migration/0[15]     0.000     0.003   0.009 __traceiter_sched_switch <- __traceiter_sched_switch <- __sched_text_start <- smpboot_thread_fn <- kthread <- ret_from_fork
        4193.035207 [1] perf[277062]        0.000     0.000   0.000 __traceiter_sched_switch <- __traceiter_sched_switch <- __sched_text_start <- preempt_schedule_common <- __cond_resched <- __wait_for_common <- wait_for_completion
        4193.035214 [1] migration/1[21]     0.000     0.003   0.007 __traceiter_sched_switch <- __traceiter_sched_switch <- __sched_text_start <- smpboot_thread_fn <- kthread <- ret_from_fork
        <SNIP>
      
      Fixes: 9c95e4ef ("perf evlist: Add evlist__findnew_tracking_event() helper")
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Link: https://lore.kernel.org/r/20240401062724.1006010-2-yangjihong@bytedance.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
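The check described in this commit, where only the sched:sched_switch event decides whether callchains can be shown, might look roughly like the sketch below. It is not the code from timehist_check_attr(): `struct event`, `can_show_callchain()` and the local `SAMPLE_CALLCHAIN` constant are hypothetical stand-ins, with the constant mirroring the bit position of PERF_SAMPLE_CALLCHAIN.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in mirroring the PERF_SAMPLE_CALLCHAIN bit. */
#define SAMPLE_CALLCHAIN (1U << 5)

/* Hypothetical, reduced stand-in for an evsel. */
struct event {
	const char *name;
	uint64_t sample_type;
};

/*
 * Return false only if a sched:sched_switch event lacks the callchain
 * bit. Other events, such as a dummy tracking event, are ignored, so
 * their missing bit no longer disables callchain output. In this sketch
 * an event list without sched:sched_switch also yields true.
 */
static bool can_show_callchain(const struct event *events, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		if (strcmp(events[i].name, "sched:sched_switch") != 0)
			continue;
		if (!(events[i].sample_type & SAMPLE_CALLCHAIN))
			return false;
	}
	return true;
}
```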
    • perf annotate: Honor output options with --data-type · bdeaf6ff
      Namhyung Kim authored
      Data type profiling output should be in sync with the normal output,
      so make it display a percentage for each field.  Also use the coloring
      scheme so that users can easily identify fields with big overhead.
      
      Users can use --show-total-period or --show-nr-samples to change the
      output style like in the normal perf annotate output.
      
      Before:
      
        $ perf annotate --data-type
        Annotate type: 'struct task_struct' in [kernel.kallsyms] (34 samples):
        ============================================================================
            samples     offset       size  field
                 34          0       9792  struct task_struct    {
                  2          0         24      struct thread_info       thread_info {
                  0          0          8          long unsigned int    flags;
                  1          8          8          long unsigned int    syscall_work;
                  0         16          4          u32  status;
                  1         20          4          u32  cpu;
                                               };
      
      After:
      
        $ perf annotate --data-type
        Annotate type: 'struct task_struct' in [kernel.kallsyms] (34 samples):
        ============================================================================
         Percent     offset       size  field
          100.00          0       9792  struct task_struct       {
            3.55          0         24      struct thread_info  thread_info {
            0.00          0          8          long unsigned int       flags;
            1.63          8          8          long unsigned int       syscall_work;
            0.00         16          4          u32     status;
            1.91         20          4          u32     cpu;
                                            };
      
      Committer testing:
      
      First collect a suitable perf.data file for use with 'perf annotate --data-type':
      
        root@number:~# perf mem record -a sleep 1s
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 11.047 MB perf.data (3466 samples) ]
        root@number:~#
      
      Then, before:
      
        root@number:~# perf annotate --data-type
        Annotate type: 'union ' in /usr/lib64/libc.so.6 (6 samples):
        ============================================================================
            samples     offset       size  field
                  6          0         40  union         {
                  6          0         40      struct __pthread_mutex_s __data {
                  2          0          4          int  __lock;
                  0          4          4          unsigned int __count;
                  0          8          4          int  __owner;
                  1         12          4          unsigned int __nusers;
                  2         16          4          int  __kind;
                  1         20          2          short int    __spins;
                  0         22          2          short int    __elision;
                  0         24         16          __pthread_list_t     __list {
                  0         24          8              struct __pthread_internal_list*  __prev;
                  0         32          8              struct __pthread_internal_list*  __next;
                                                   };
                                               };
                  0          0          0      char*    __size;
                  2          0          8      long int __align;
                                           };
        <SNIP>
      
      And after:
      
        Annotate type: 'union ' in /usr/lib64/libc.so.6 (6 samples):
        ============================================================================
         Percent     offset       size  field
          100.00          0         40  union    {
          100.00          0         40      struct __pthread_mutex_s    __data {
           31.27          0          4          int     __lock;
            0.00          4          4          unsigned int    __count;
            0.00          8          4          int     __owner;
            7.67         12          4          unsigned int    __nusers;
           53.10         16          4          int     __kind;
            7.96         20          2          short int       __spins;
            0.00         22          2          short int       __elision;
            0.00         24         16          __pthread_list_t        __list {
            0.00         24          8              struct __pthread_internal_list*     __prev;
            0.00         32          8              struct __pthread_internal_list*     __next;
                                                };
                                            };
            0.00          0          0      char*       __size;
           31.27          0          8      long int    __align;
                                        };
        <SNIP>
      
      
      Lines with percentages >= 7.67 have their percentages colored red.
      
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240322224313.423181-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: Get rid of duplicate --group option item · 374af9f1
      Namhyung Kim authored
      The options array in cmd_annotate() has duplicate --group options.
      Only one is needed, so get rid of the other.
      
        $ perf annotate -h 2>&1 | grep group
              --group           Show event group information together
              --group           Show event group information together
      
      Fixes: 7ebaf489 ("perf annotate: Support '--group' option")
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240322224313.423181-1-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>