1. 08 Sep, 2022 4 commits
    • perf script: Fix Cannot print 'iregs' field for hybrid systems · 82b2425f
      Zhengjun Xing authored
      Commit b91e5492 ("perf record: Add a dummy event on hybrid
      systems to collect metadata records") adds a dummy event on hybrid
      systems to fix the symbol "unknown" issue when the workload is created
      on a P-core but runs on an E-core. The added dummy event causes
      "perf script -F iregs" to fail: dummy events do not have the "iregs"
      attribute set, so the "iregs" check in evsel__check_attr() fails.
      
      Commit [1] fixed a similar issue by skipping the attr check for the
      dummy event, because it does not have any samples anyway. That works
      in normal mode, but the issue still occurs in pipe mode, which calls
      process_attr() and so still checks the attr for the dummy event. Fix
      this by skipping the attr check for the dummy event inside
      evsel__check_attr() itself; otherwise every caller of
      evsel__check_attr() would have to be patched.
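
The idea of placing the skip inside the shared check can be sketched as follows. This is a simplified model, not the perf source: the struct, the flag constant and all `sketch_` names are hypothetical stand-ins, and only the "return early for dummy events in one central place" pattern is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a sample_type bit; not the perf definition. */
#define SKETCH_SAMPLE_REGS_INTR (1ULL << 0)

/* Hypothetical, reduced event selector. */
struct sketch_evsel {
    const char *name;
    bool is_dummy;         /* dummy events carry metadata only, no samples */
    uint64_t sample_type;  /* bitmask of fields present in each sample */
};

/* Returns 0 if the requested field may be printed, -1 otherwise. */
int sketch_check_attr(const struct sketch_evsel *evsel, uint64_t required)
{
    assert(evsel != NULL);

    /* Central early return: every caller skips dummy events automatically. */
    if (evsel->is_dummy)
        return 0;

    return (evsel->sample_type & required) == required ? 0 : -1;
}

/* Hypothetical file-mode path: checks every event in the list. */
int sketch_check_file_mode(const struct sketch_evsel *evsels, size_t n,
                           uint64_t required)
{
    for (size_t i = 0; i < n; i++)
        if (sketch_check_attr(&evsels[i], required) != 0)
            return -1;
    return 0;
}

/* Hypothetical pipe-mode path: checks one event as its attr arrives. */
int sketch_check_pipe_mode(const struct sketch_evsel *evsel, uint64_t required)
{
    return sketch_check_attr(evsel, required);
}
```

Because both hypothetical call paths go through the same check, neither needs its own dummy-event special case.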
      
      Before:
      
        # ./perf record -o - --intr-regs=di,r8,dx,cx -e br_inst_retired.near_call:p -c 1000 --per-thread true 2>/dev/null|./perf script -F iregs |head -5
        Samples for 'dummy:HG' event do not have IREGS attribute set. Cannot print 'iregs' field.
        0x120 [0x90]: failed to process type: 64
        #
      
      After:
      
        # ./perf record -o - --intr-regs=di,r8,dx,cx -e br_inst_retired.near_call:p -c 1000 --per-thread true 2>/dev/null|./perf script -F iregs |head -5
        ABI:2    CX:0x55b8efa87000    DX:0x55b8efa7e000    DI:0xffffba5e625efbb0    R8:0xffff90e51f8ae100
        ABI:2    CX:0x7f1dae1e4000    DX:0xd0    DI:0xffff90e18c675ac0    R8:0x71
        ABI:2    CX:0xcc0    DX:0x1    DI:0xffff90e199880240    R8:0x0
        ABI:2    CX:0xffff90e180dd7500    DX:0xffff90e180dd7500    DI:0xffff90e180043500    R8:0x1
        ABI:2    CX:0x50    DX:0xffff90e18c583bd0    DI:0xffff90e1998803c0    R8:0x58
        #
      
      [1]https://lore.kernel.org/lkml/20220831124041.219925-1-jolsa@kernel.org/
      
      Fixes: b91e5492 ("perf record: Add a dummy event on hybrid systems to collect metadata records")
      Suggested-by: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220908070030.3455164-1-zhengjun.xing@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf lock: Remove redundant word 'contention' in help message · 3705a6ef
      Yang Jihong authored
      Before:
        # perf lock -h
      
         Usage: perf lock [<options>] {record|report|script|info|contention|contention}
      
            -D, --dump-raw-trace  dump raw trace in ASCII
            -f, --force           don't complain, do it
            -i, --input <file>    input file name
            -v, --verbose         be more verbose (show symbol address, etc)
                --kallsyms <file>
                                  kallsyms pathname
                --vmlinux <file>  vmlinux pathname
      
      After:
        # perf lock -h
      
         Usage: perf lock [<options>] {record|report|script|info|contention}
      
            -D, --dump-raw-trace  dump raw trace in ASCII
            -f, --force           don't complain, do it
            -i, --input <file>    input file name
            -v, --verbose         be more verbose (show symbol address, etc)
                --kallsyms <file>
                                  kallsyms pathname
                --vmlinux <file>  vmlinux pathname
      
      Fixes: 528b9cab ("perf lock: Add 'contention' subcommand")
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220908014854.151203-1-yangjihong1@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf dlfilter dlfilter-show-cycles: Fix types for print format · 1706623e
      Adrian Hunter authored
      Avoid compiler warning about format %llu that expects long long unsigned
      int but argument has type __u64.
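
Two common ways to silence this kind of warning are sketched below. The sketch uses `uint64_t` as a stand-in for `__u64`, and the function names are hypothetical; which approach the commit itself takes is not shown in this message.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Option 1: cast so the argument always matches %llu. */
int format_cycles(char *buf, size_t len, uint64_t cycles)
{
    assert(buf != NULL);
    return snprintf(buf, len, "%llu", (unsigned long long)cycles);
}

/* Option 2: let <inttypes.h> choose the conversion for uint64_t. */
int format_cycles_inttypes(char *buf, size_t len, uint64_t cycles)
{
    assert(buf != NULL);
    return snprintf(buf, len, "%" PRIu64, cycles);
}
```

The cast is the more portable choice when the value has a kernel-style type such as `__u64`, whose underlying type can be `unsigned long` on some architectures and `unsigned long long` on others.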
      Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Fixes: c3afd6e5 ("perf dlfilter: Add dlfilter-show-cycles")
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20220905074735.4513-1-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • libperf evlist: Fix per-thread mmaps for multi-threaded targets · 7864d8f7
      Adrian Hunter authored
      The offending commit removed mmap_per_thread() without considering the
      different set-output rules for per-thread mmaps, i.e. in the per-thread
      case set-output is used for file descriptors of the same thread, not the
      same CPU.
      
      This was not immediately noticed because it only happens with
      multi-threaded targets and we do not have a test for that yet.
      
      Reinstate mmap_per_thread(), extending it to also cover system-wide
      per-cpu events, i.e. to continue to allow the mixing of per-thread and
      per-cpu mmaps.
      
      Debug messages (with -vv) show the file descriptors that are opened with
      sys_perf_event_open. New debug messages are added (they need -vvv) that
      also show which file descriptors are mmapped and which are redirected
      with set-output.
      
      In the per-cpu case (cpu != -1) file descriptors for the same CPU are
      set-output to the first file descriptor for that CPU.
      
      In the per-thread case (cpu == -1) file descriptors for the same thread are
      set-output to the first file descriptor for that thread.
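
The two grouping rules above can be modelled with a small sketch. It is not libperf code: `struct fd_entry`, `same_buffer()` and `output_target()` are hypothetical names, and the sketch only decides which file descriptor each entry would write to, without opening or mmapping anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One opened perf event file descriptor (hypothetical record). */
struct fd_entry {
    int fd;
    int cpu;  /* -1 means "any CPU", i.e. the per-thread case */
    int tid;
};

/* True if a and b must share one ring buffer under the rules above. */
static bool same_buffer(const struct fd_entry *a, const struct fd_entry *b)
{
    if (a->cpu != -1)
        return b->cpu == a->cpu;              /* per-cpu: group by CPU */
    return b->cpu == -1 && b->tid == a->tid;  /* per-thread: group by thread */
}

/*
 * Returns the fd whose buffer entry i writes to. If the result equals
 * e[i].fd, entry i is the first of its group and is mmapped itself;
 * otherwise it is redirected (set-output) to the returned fd.
 */
int output_target(const struct fd_entry *e, size_t n, size_t i)
{
    assert(e != NULL && i < n);
    for (size_t j = 0; j < i; j++)
        if (same_buffer(&e[i], &e[j]))
            return e[j].fd;
    return e[i].fd;
}
```

Under this model, two threads opened with cpu -1 each keep their own buffer, while two threads opened on the same CPU share the buffer of the first file descriptor for that CPU.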
      
      Example (process 17489 has 2 threads):
      
       Before (but with new debug prints):
      
         $ perf record --no-bpf-event -vvv --per-thread -p 17489
         <SNIP>
         sys_perf_event_open: pid 17489  cpu -1  group_fd -1  flags 0x8 = 5
         sys_perf_event_open: pid 17490  cpu -1  group_fd -1  flags 0x8 = 6
         <SNIP>
         libperf: idx 0: mmapping fd 5
         libperf: idx 0: set output fd 6 -> 5
         failed to mmap with 22 (Invalid argument)
      
       After:
      
         $ perf record --no-bpf-event -vvv --per-thread -p 17489
         <SNIP>
         sys_perf_event_open: pid 17489  cpu -1  group_fd -1  flags 0x8 = 5
         sys_perf_event_open: pid 17490  cpu -1  group_fd -1  flags 0x8 = 6
         <SNIP>
         libperf: mmap_per_thread: nr cpu values (may include -1) 1 nr threads 2
         libperf: idx 0: mmapping fd 5
         libperf: idx 1: mmapping fd 6
         <SNIP>
         [ perf record: Woken up 2 times to write data ]
         [ perf record: Captured and wrote 0.018 MB perf.data (15 samples) ]
      
      Per-cpu example (process 20341 has 2 threads, same as above):
      
         $ perf record --no-bpf-event -vvv -p 20341
         <SNIP>
         sys_perf_event_open: pid 20341  cpu 0  group_fd -1  flags 0x8 = 5
         sys_perf_event_open: pid 20342  cpu 0  group_fd -1  flags 0x8 = 6
         sys_perf_event_open: pid 20341  cpu 1  group_fd -1  flags 0x8 = 7
         sys_perf_event_open: pid 20342  cpu 1  group_fd -1  flags 0x8 = 8
         sys_perf_event_open: pid 20341  cpu 2  group_fd -1  flags 0x8 = 9
         sys_perf_event_open: pid 20342  cpu 2  group_fd -1  flags 0x8 = 10
         sys_perf_event_open: pid 20341  cpu 3  group_fd -1  flags 0x8 = 11
         sys_perf_event_open: pid 20342  cpu 3  group_fd -1  flags 0x8 = 12
         sys_perf_event_open: pid 20341  cpu 4  group_fd -1  flags 0x8 = 13
         sys_perf_event_open: pid 20342  cpu 4  group_fd -1  flags 0x8 = 14
         sys_perf_event_open: pid 20341  cpu 5  group_fd -1  flags 0x8 = 15
         sys_perf_event_open: pid 20342  cpu 5  group_fd -1  flags 0x8 = 16
         sys_perf_event_open: pid 20341  cpu 6  group_fd -1  flags 0x8 = 17
         sys_perf_event_open: pid 20342  cpu 6  group_fd -1  flags 0x8 = 18
         sys_perf_event_open: pid 20341  cpu 7  group_fd -1  flags 0x8 = 19
         sys_perf_event_open: pid 20342  cpu 7  group_fd -1  flags 0x8 = 20
         <SNIP>
         libperf: mmap_per_cpu: nr cpu values 8 nr threads 2
         libperf: idx 0: mmapping fd 5
         libperf: idx 0: set output fd 6 -> 5
         libperf: idx 1: mmapping fd 7
         libperf: idx 1: set output fd 8 -> 7
         libperf: idx 2: mmapping fd 9
         libperf: idx 2: set output fd 10 -> 9
         libperf: idx 3: mmapping fd 11
         libperf: idx 3: set output fd 12 -> 11
         libperf: idx 4: mmapping fd 13
         libperf: idx 4: set output fd 14 -> 13
         libperf: idx 5: mmapping fd 15
         libperf: idx 5: set output fd 16 -> 15
         libperf: idx 6: mmapping fd 17
         libperf: idx 6: set output fd 18 -> 17
         libperf: idx 7: mmapping fd 19
         libperf: idx 7: set output fd 20 -> 19
         <SNIP>
         [ perf record: Woken up 7 times to write data ]
         [ perf record: Captured and wrote 0.020 MB perf.data (17 samples) ]
      
      Fixes: ae4f8ae1 ("libperf evlist: Allow mixing per-thread and per-cpu mmaps")
      Reported-by: Tomáš Trnka <trnka@scm.com>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216441
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220905114209.8389-1-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 06 Sep, 2022 4 commits
  3. 02 Sep, 2022 1 commit
    • perf stat: Fix L2 Topdown metrics disappear for raw events · f0c86a2b
      Zhengjun Xing authored
      In perf/Documentation/perf-stat.txt, the default "0" for "--td-level"
      means the maximum level that the current hardware supports.

      So stat_config.topdown_level needs to be initialized to TOPDOWN_MAX_LEVEL
      when "--td-level=0" is given or no "--td-level" option is given.
      Otherwise, on hardware whose maximum level is 2, the 2nd level metrics
      disappear for raw events.
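
A minimal sketch of treating 0 as "maximum level" follows. The constant value and both function names are assumptions made for illustration; they are not perf's definitions, and the real code may resolve or validate the level differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical value for this sketch; perf's TOPDOWN_MAX_LEVEL may differ. */
#define SKETCH_TOPDOWN_MAX_LEVEL 2

/* Maps the user-visible option value to the level used internally. */
int sketch_resolve_td_level(int requested)
{
    assert(requested >= 0);
    /* 0 (explicit or defaulted) means "as deep as the hardware supports". */
    return requested == 0 ? SKETCH_TOPDOWN_MAX_LEVEL : requested;
}

/* Level-2 metrics are shown only if both the level and the hardware reach 2. */
bool sketch_show_l2_metrics(int resolved_level, int hw_max_level)
{
    return resolved_level >= 2 && hw_max_level >= 2;
}
```

In this model, passing the unresolved value 0 straight to `sketch_show_l2_metrics()` hides the level-2 metrics, which mirrors the symptom described above; resolving it first makes them appear on hardware whose maximum level is 2.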
      
      The issue is not observed with the perf stat default events or with
      the "--topdown" option. This commit fixes the raw events issue and
      removes the duplicated code for the perf stat default.
      
      Before:
      
       # ./perf stat -e "cpu-clock,context-switches,cpu-migrations,page-faults,instructions,cycles,ref-cycles,branches,branch-misses,{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound,topdown-heavy-ops,topdown-br-mispredict,topdown-fetch-lat,topdown-mem-bound}" sleep 1
      
       Performance counter stats for 'sleep 1':
      
                    1.03 msec cpu-clock                        #    0.001 CPUs utilized
                       1      context-switches                 #  966.216 /sec
                       0      cpu-migrations                   #    0.000 /sec
                      60      page-faults                      #   57.973 K/sec
               1,132,112      instructions                     #    1.41  insn per cycle
                 803,872      cycles                           #    0.777 GHz
               1,909,120      ref-cycles                       #    1.845 G/sec
                 236,634      branches                         #  228.640 M/sec
                   6,367      branch-misses                    #    2.69% of all branches
               4,823,232      slots                            #    4.660 G/sec
               1,210,536      topdown-retiring                 #     25.1% Retiring
                 699,841      topdown-bad-spec                 #     14.5% Bad Speculation
               1,777,975      topdown-fe-bound                 #     36.9% Frontend Bound
               1,134,878      topdown-be-bound                 #     23.5% Backend Bound
                 189,146      topdown-heavy-ops                #  182.756 M/sec
                 662,012      topdown-br-mispredict            #  639.647 M/sec
               1,097,048      topdown-fetch-lat                #    1.060 G/sec
                 416,121      topdown-mem-bound                #  402.063 M/sec
      
             1.002423690 seconds time elapsed
      
             0.002494000 seconds user
             0.000000000 seconds sys
      
      After:
      
       # ./perf stat -e "cpu-clock,context-switches,cpu-migrations,page-faults,instructions,cycles,ref-cycles,branches,branch-misses,{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound,topdown-heavy-ops,topdown-br-mispredict,topdown-fetch-lat,topdown-mem-bound}" sleep 1
      
       Performance counter stats for 'sleep 1':
      
                    1.13 msec cpu-clock                        #    0.001 CPUs utilized
                       1      context-switches                 #  882.128 /sec
                       0      cpu-migrations                   #    0.000 /sec
                      61      page-faults                      #   53.810 K/sec
               1,137,612      instructions                     #    1.29  insn per cycle
                 881,477      cycles                           #    0.778 GHz
               2,093,496      ref-cycles                       #    1.847 G/sec
                 236,356      branches                         #  208.496 M/sec
                   7,090      branch-misses                    #    3.00% of all branches
               5,288,862      slots                            #    4.665 G/sec
               1,223,697      topdown-retiring                 #     23.1% Retiring
                 767,403      topdown-bad-spec                 #     14.5% Bad Speculation
               2,053,322      topdown-fe-bound                 #     38.8% Frontend Bound
               1,244,438      topdown-be-bound                 #     23.5% Backend Bound
                 186,665      topdown-heavy-ops                #      3.5% Heavy Operations       #     19.6% Light Operations
                 725,922      topdown-br-mispredict            #     13.7% Branch Mispredict      #      0.8% Machine Clears
               1,327,400      topdown-fetch-lat                #     25.1% Fetch Latency          #     13.7% Fetch Bandwidth
                 497,775      topdown-mem-bound                #      9.4% Memory Bound           #     14.1% Core Bound
      
             1.002701530 seconds time elapsed
      
             0.002744000 seconds user
             0.000000000 seconds sys
      
      Fixes: 63e39aa6 ("perf stat: Support L2 Topdown events")
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220826140057.3289401-1-zhengjun.xing@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 31 Aug, 2022 2 commits
    • perf script: Skip dummy event attr check · 35503ce1
      Jiri Olsa authored
      Hongtao Yu reported a problem when displaying uregs in perf script
      for a system-wide perf.data:
      
        # perf script -F uregs | head -10
        Samples for 'dummy:HG' event do not have UREGS attribute set. Cannot print 'uregs' field.
      
      The problem is the extra dummy event added for system-wide mode,
      which does not have a proper sample_type set up.

      Skip the attr check completely for the dummy event, as suggested
      by Namhyung, because it does not have any samples anyway.
      Reported-by: Hongtao Yu <hoy@fb.com>
      Suggested-by: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220831124041.219925-1-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf metric: Return early if no CPU PMU table exists · 3f5df3ac
      Ian Rogers authored
      The previous behavior was to segfault if there is no CPU PMU table and a
      metric is sought. To reproduce, compile with NO_JEVENTS=1 and then
      request a metric, for example, "perf stat -M IPC true".
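
The early-return pattern named in the subject line can be sketched as follows. The table type and lookup function are hypothetical and much simpler than perf's metric tables; the sketch only shows a NULL table being rejected before it is dereferenced.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced metric table: just a list of metric names. */
struct sketch_metric_table {
    const char *const *names;
    size_t nr;
};

/* Returns the metric's index, or a negative errno value on failure. */
int sketch_find_metric(const struct sketch_metric_table *table,
                       const char *name)
{
    assert(name != NULL);

    /* No table (e.g. a build without generated events): fail, don't crash. */
    if (table == NULL)
        return -EINVAL;

    for (size_t i = 0; i < table->nr; i++)
        if (strcmp(table->names[i], name) == 0)
            return (int)i;

    return -ENOENT;
}
```

A caller that receives a negative value can print a usage message and exit instead of dereferencing a missing table.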
      
      Committer testing:
      
      Before:
      
        $ make -k NO_JEVENTS=1 BUILD_BPF_SKEL=1 O=/tmp/build/perf-urgent -C tools/perf install-bin
        $ perf stat -M IPC true
        Segmentation fault (core dumped)
        $
      
      After:
      
        $ perf stat -M IPC true
      
         Usage: perf stat [<options>] [<command>]
      
            -M, --metrics <metric/metric group list>
                                  monitor specified metrics or metric groups (separated by ,)
        $
      
      Fixes: 00facc76 ("perf jevents: Switch build to use jevents.py")
      Signed-off-by: Ian Rogers <irogers@google.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Caleb Biggers <caleb.biggers@intel.com>
      Cc: Florian Fischer <florian.fischer@muhq.space>
      Cc: Ian Rogers <rogers.email@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Kshipra Bopardikar <kshipra.bopardikar@intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Miaoqian Lin <linmq006@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Perry Taylor <perry.taylor@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Link: https://lore.kernel.org/r/20220830164846.401143-3-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 29 Aug, 2022 2 commits
  6. 28 Aug, 2022 25 commits
  7. 27 Aug, 2022 2 commits