1. 02 May, 2024 10 commits
    • perf maps: Remove check_invariants() from maps__lock() · 3cdd98b4
      Namhyung Kim authored
      I found that the debug build was slowed down a lot by the maps lock
      code since it checks the invariants whenever it gets the pointer to the
      lock.  This means it checks the invariants twice, before and after the
      access.
      
      Instead, let's move the checking code within the lock area but after any
      modification and remove it from the read paths.  This would remove (more
      than) half of the maps lock overhead.
      
      Time for perf report with a huge data file (200k+ MMAP2 events):
      
        Non-debug     Before      After
        ---------   --------   --------
           2m 43s     6m 45s     4m 21s
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240429225738.1491791-1-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
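The "(more than) half" claim can be checked against the three timings in the table. The following is a small Python sketch of that arithmetic. It assumes the non-debug run is the baseline and that everything above it counts as debug overhead; the commit message gives only the timings, so that attribution is an assumption.

```python
def to_seconds(minutes, seconds):
    """Convert a 'Xm Ys' timing from the table into seconds."""
    return minutes * 60 + seconds


non_debug = to_seconds(2, 43)  # baseline, no invariant checks
before = to_seconds(6, 45)     # debug build before the patch
after = to_seconds(4, 21)      # debug build after the patch

# Assumption: extra time over the non-debug run is treated as debug overhead.
overhead_before = before - non_debug
overhead_after = after - non_debug
reduction = 1 - overhead_after / overhead_before

print(f"overhead before: {overhead_before}s, after: {overhead_after}s")
print(f"overhead removed: {reduction:.1%}")
```

Under that assumption the overhead drops from 242 s to 98 s, which is a bit under 60% removed and consistent with the commit message.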
    • perf cs-etm: Improve version detection and error reporting · e3123079
      James Clark authored
      When the config validation functions are warning about ETMv3, they do it
      based on "not ETMv4". If the drivers aren't all loaded or the hardware
      doesn't support Coresight it will appear as "not ETMv4" and then Perf
      will print the error message "... not supported in ETMv3 ..." which is
      wrong and confusing.
      
      cs_etm_is_etmv4() is also misnamed because it also returns true for
      ETE because ETE has a superset of the ETMv4 metadata files. Although
      this was always done in the correct order so it wasn't a bug.
      
      Improve all this by making a single get version function which also
      handles not present as a separate case. Change the ETMv3 error message
      to only print when ETMv3 is detected, and add a new error message for
      the not present case.
      Reviewed-by: Ian Rogers <irogers@google.com>
      Reviewed-by: Leo Yan <leo.yan@linux.dev>
      Signed-off-by: James Clark <james.clark@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20240501135753.508022-4-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
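The idea of a single version lookup with "not present" as its own case can be sketched as follows. This is an illustrative Python model, not perf's C code: the enum, the function names, the placeholder file names, and the message strings are all hypothetical. The one detail taken from the commit message is the ordering: ETE exposes a superset of the ETMv4 metadata files, so it has to be tested first.

```python
from enum import Enum, auto


class CsEtmVersion(Enum):
    """Hypothetical version enum; names are illustrative, not perf's."""
    NOT_PRESENT = auto()
    ETMV3 = auto()
    ETMV4 = auto()
    ETE = auto()


def get_version(files):
    """Classify a trace unit from the set of metadata files it exposes.

    The file names are placeholders, not real sysfs paths.
    """
    # ETE has a superset of the ETMv4 files, so check it before ETMv4.
    if "ete_only_file" in files:
        return CsEtmVersion.ETE
    if "etmv4_file" in files:
        return CsEtmVersion.ETMV4
    if "etmv3_file" in files:
        return CsEtmVersion.ETMV3
    return CsEtmVersion.NOT_PRESENT


def unsupported_option_error(version, option):
    """Return a message only for the case that was actually detected."""
    if version is CsEtmVersion.ETMV3:
        return f"{option} not supported in ETMv3"
    if version is CsEtmVersion.NOT_PRESENT:
        return f"{option} requested but no ETM/ETE was found"
    return None


print(get_version({"etmv4_file", "ete_only_file"}))
print(unsupported_option_error(get_version(set()), "timestamps"))
```

With a separate NOT_PRESENT value, a missing driver no longer falls into the "not ETMv4, therefore ETMv3" branch that produced the misleading message.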
    • perf cs-etm: Remove repeated fetches of the ETM PMU · bc5e0e1b
      James Clark authored
      Most functions already have cs_etm_pmu, so it's a bit neater to pass
      it through rather than itr only to convert it again.
      Signed-off-by: James Clark <james.clark@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20240501135753.508022-3-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf cs-etm: Use struct perf_cpu as much as possible · cbaf2c4f
      James Clark authored
      The perf_cpu struct makes some iterators simpler and avoids some
      mistakes with interchanging CPU IDs with indexes etc. At the moment in
      this file the conversion to an integer is done somewhere in the middle
      of the call tree. Change it to delay the conversion to an int until the
      leaf functions.
      
      Some of the usage patterns are duplicated, so instead of changing them
      all, make cs_etm_get_ro() more reusable and use that everywhere.
      cs_etm_get_ro() didn't return an error before, but return one now so
      that it can also be used where an error is needed. Continue to ignore
      the error where it was already ignored.
      
      Use cs_etm_pmu_path_exists() instead of cs_etm_get_ro() in
      cs_etm_is_etmv4() because cs_etm_get_ro() prints a warning, but path
      exists is sufficient for this use case.
      Signed-off-by: James Clark <james.clark@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20240501135753.508022-2-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate-data: Check kind of stack variables · b7d4aacf
      Namhyung Kim authored
      I sometimes see ("unknown type") in the result and it was because it
      didn't check the type of stack variables properly during the instruction
      tracking.  The stack can carry constant values (without type info) and
      if the target instruction is accessing the stack location, it resulted
      in the "unknown type".
      
      Maybe we could pick one of integer types for the constant, but it
      doesn't really mean anything useful.  Let's just drop the stack slot if
      it doesn't have a valid type info.
      
      Here's an example of how it got the unknown type.
      Note that 0xffffff48 = -0xb8.
        -----------------------------------------------------------
        find data type for 0xffffff48(reg6) at ...
        CU for ...
        frame base: cfa=0 fbreg=6
        scope: [2/2] (die:11cb97f)
        bb: [37 - 3a]
        var [37] reg15 type='int' size=0x4 (die:0x1180633)
        bb: [40 - 4b]
        mov [40] imm=0x1 -> reg13
        var [45] reg8 type='sigset_t*' size=0x8 (die:0x11a39ee)
        mov [45] imm=0x1 -> reg2                     <---  here reg2 has a constant
        bb: [215 - 237]
        mov [218] reg2 -> -0xb8(stack) constant      <---  and save it to the stack
        mov [225] reg13 -> -0xc4(stack) constant
        call [22f] find_task_by_vgpid
        call [22f] return -> reg0 type='struct task_struct*' size=0x8 (die:0x11881e8)
        bb: [5c8 - 5cf]
        bb: [2fb - 302]
        mov [2fb] -0xc4(stack) -> reg13 constant
        bb: [13b - 14d]
        mov [143] 0xd50(reg3) -> reg5 type='struct task_struct*' size=0x8 (die:0xa31f3c)
        bb: [153 - 153]
        chk [153] reg6 offset=0xffffff48 ok=0 kind=0 fbreg    <--- access here
        found by insn track: 0xffffff48(reg6) type-offset=0
         type='G<EF>^K<F6><AF>U' size=0 (die:0xffffffffffffffff)
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240502060011.1838090-7-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
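The note that 0xffffff48 = -0xb8 is ordinary 32-bit two's-complement arithmetic, and it is what ties the access at 0xffffff48(reg6) to the constant stored at -0xb8(stack). The Python sketch below works through it. The offsets and the "constant" kind come from the trace above; the dictionary layout and function names are assumptions made for illustration and do not mirror perf's internal structures.

```python
def as_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


# Stack slots recorded during instruction tracking, keyed by frame-base offset.
# Both slots hold constants, which carry no type information.
stack_slots = {
    -0xb8: {"kind": "constant", "type": None},
    -0xc4: {"kind": "constant", "type": None},
}


def lookup_stack_type(raw_offset):
    """Return the slot's type, or None if the slot has no valid type info."""
    slot = stack_slots.get(as_signed32(raw_offset))
    if slot is None or slot["type"] is None:
        return None  # treat untyped slots as "not found" instead of a type
    return slot["type"]


print(hex(as_signed32(0xffffff48)))
print(lookup_stack_type(0xffffff48))
```

Returning "no type" for a constant slot corresponds to the fix described in the commit message: the slot is dropped rather than reported with an invalid type.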
    • perf annotate-data: Handle multi regs in find_data_type_block() · af89e8f2
      Namhyung Kim authored
      The instruction tracking should be the same for both registers.
      
      Just do it once and compare the result with multi regs as with the
      previous patches.
      
      Then we don't need to call find_data_type_block() separately for each
      reg.
      
      Let's remove the 'reg' argument from the relevant functions.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240502060011.1838090-6-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate-data: Check memory access with two registers · eba1f853
      Namhyung Kim authored
      The following instruction pattern is used to access a global variable.
      
        mov     $0x231c0, %rax
        movslq  %edi, %rcx
        mov     -0x7dc94ae0(,%rcx,8), %rcx
        cmpl    $0x0, 0xa60(%rcx,%rax,1)     <<<--- here
      
      The first instruction sets the address of the per-cpu variable (here, it
      is 'runqueues' of type 'struct rq').  The second instruction seems to
      load a cpu number to index the per-cpu base.  The third instruction gets
      the base offset of the per-cpu area for that cpu.  The last instruction
      compares the value of the per-cpu variable at the offset of 0xa60.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240502060011.1838090-5-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
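The last instruction uses the x86 `disp(base,index,scale)` addressing form, so the accessed address is base + index * scale + disp. The Python sketch below applies that to `0xa60(%rcx,%rax,1)`. The value 0x231c0 comes from the first instruction in the commit message; the per-cpu base address is a made-up number used only to make the arithmetic concrete.

```python
MASK64 = (1 << 64) - 1


def effective_address(disp, base=0, index=0, scale=1):
    """Compute base + index * scale + disp, wrapped to 64 bits."""
    return (base + index * scale + disp) & MASK64


percpu_var_offset = 0x231c0        # %rax: offset of the per-cpu variable
percpu_base = 0xffff888000000000   # %rcx: made-up per-cpu base for one cpu

addr = effective_address(0xa60, base=percpu_base,
                         index=percpu_var_offset, scale=1)

# Subtracting where the variable starts for this cpu leaves the field offset.
field_offset = addr - (percpu_base + percpu_var_offset)
print(hex(field_offset))
```

Because two registers contribute to the address, a tracker that looks at only one of them cannot tell that the access lands at offset 0xa60 inside the per-cpu variable.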
    • perf annotate-data: Handle direct global variable access · 4449c904
      Namhyung Kim authored
      Like the per-cpu base offset array, sometimes it accesses a global
      variable directly using the offset.  Allow this type of instruction as
      long as it finds a global variable for the address.
      
        movslq  %edi, %rcx
        mov     -0x7dc94ae0(,%rcx,8), %rcx   <<<--- here
      
      As %rcx has a valid type (i.e. array index) from the first instruction,
      it will be checked by the first case in check_matching_type().  But as
      it's not a pointer type, the match will fail.  But in this case, it
      should check if it accesses the kernel global array variable.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240502060011.1838090-4-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
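In `-0x7dc94ae0(,%rcx,8)` there is no base register, so the negative 32-bit displacement, sign-extended to 64 bits, is itself the address of the global array, and %rcx selects an 8-byte element. The Python sketch below shows the conversion and the per-element address. The helper names are hypothetical; only the displacement and the scale of 8 come from the commit message.

```python
MASK64 = (1 << 64) - 1


def disp_to_address(disp):
    """Sign-extend a (negative) displacement into a 64-bit address."""
    return disp & MASK64


def element_address(array_base, index, scale=8):
    """Address of element 'index' in an array of 'scale'-byte entries."""
    return (array_base + index * scale) & MASK64


array_base = disp_to_address(-0x7dc94ae0)
print(hex(array_base))
print(hex(element_address(array_base, 3)))
```

The resulting value is what would be looked up among the known global variables to decide whether the instruction accesses a kernel global array.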
    • perf annotate-data: Collect global variables in advance · c1da8411
      Namhyung Kim authored
      Currently it looks up global variables from the current CU using address
      and name.  But it sometimes fails to find a variable as the variable can
      come from a different CU - but it's still strange it failed to find a
      declaration for some reason.
      
      Anyway, it can collect all global variables from all CUs once and then
      look them up later on.  This slightly improves the success rate of my
      test data set.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240502060011.1838090-3-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
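The collect-once, look-up-later approach can be sketched as below. This is a Python illustration under stated assumptions: the record layout, the function names, and the sample variables are invented for the example, and perf's actual data structure may differ. The point is only that variables from every CU end up in one address-ordered collection, so a lookup no longer depends on which CU the access came from.

```python
import bisect
from typing import NamedTuple, Optional


class GlobalVar(NamedTuple):
    addr: int
    size: int
    name: str


def collect_global_vars(compile_units):
    """Flatten per-CU variable lists into one address-sorted list (done once)."""
    all_vars = [var for cu in compile_units for var in cu]
    all_vars.sort(key=lambda var: var.addr)
    return all_vars


def find_global_var(sorted_vars, addr) -> Optional[GlobalVar]:
    """Find the variable whose [addr, addr + size) range contains addr."""
    # Rebuilding the key list per call keeps the sketch short; a real
    # implementation would keep it alongside the sorted list.
    starts = [var.addr for var in sorted_vars]
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0 and addr < sorted_vars[i].addr + sorted_vars[i].size:
        return sorted_vars[i]
    return None


# Made-up sample data: two CUs, each defining one global.
cu_a = [GlobalVar(0x2000, 0x40, "var_in_cu_a")]
cu_b = [GlobalVar(0x1000, 0x10, "var_in_cu_b")]
global_vars = collect_global_vars([cu_a, cu_b])

print(find_global_var(global_vars, 0x2008))
print(find_global_var(global_vars, 0x1800))
```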
    • perf dwarf-aux: Add die_collect_global_vars() · d7b60803
      Namhyung Kim authored
      This function searches for all global variables in the CU.  We want to
      have the list of global variables at once and match them later.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240502060011.1838090-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2. 27 Apr, 2024 30 commits
    • perf test: Reintroduce -p/--parallel and make -S/--sequential the default · 8c618b58
      Arnaldo Carvalho de Melo authored
      We can't default to doing parallel tests as there are tests that compete
      for the same resources and thus clash, for instance tests that put in
      place 'perf probe' probes, that clean the probes without regard to other
      tests' needs, ARM64 coresight tests, Intel PT ones, etc.
      
      So reintroduce -p/--parallel and make -S/--sequential the default.
      
      We need to come up with infrastructure that states which tests can't run
      in parallel because they need exclusive access to some resource,
      something as simple as "probes" that would then prevent 'perf probe'
      tests from running while another such test is running, or make the tests
      more resilient.  Till then we can't use parallel mode as the default.
      
      While at it, document all these options in the 'perf test' man page.
      Reported-by: Adrian Hunter <adrian.hunter@intel.com>
      Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Reported-by: James Clark <james.clark@arm.com>
      Reviewed-by: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/lkml/Ziwm18BqIn_vc1vn@x1
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • tools headers: Synchronize linux/bits.h with the kernel sources · 450f941e
      Arnaldo Carvalho de Melo authored
      To pick up the changes in this cset:
      
         3c7a8e19 ("uapi: introduce uapi-friendly macros for GENMASK")
      
      That just causes perf to rebuild. It's just some macros going to a uapi
      header that we now have to grab a copy into tools/ as well.
      
      This addresses this perf build warning:
      
        Warning: Kernel ABI header differences:
          diff -u tools/include/linux/bits.h include/linux/bits.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Link: https://lore.kernel.org/lkml/ZiwJsFOBez0MS4r9@x1
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • tools headers x86 cpufeatures: Sync with the kernel sources to pick BHI mitigation changes · 8f211643
      Arnaldo Carvalho de Melo authored
      To pick the changes from:
      
        95a6ccbd ("x86/bhi: Mitigate KVM by default")
        ec9404e4 ("x86/bhi: Add BHI mitigation knob")
        be482ff9 ("x86/bhi: Enumerate Branch History Injection (BHI) bug")
        0f4a8376 ("x86/bhi: Define SPEC_CTRL_BHI_DIS_S")
        7390db8a ("x86/bhi: Add support for clearing branch history at syscall entry")
      
      This causes these perf files to be rebuilt and brings some X86_FEATURE
      that will be used when updating the copies of
      tools/arch/x86/lib/mem{cpy,set}_64.S with the kernel sources:
      
            CC       /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
            CC       /tmp/build/perf/bench/mem-memset-x86-64-asm.o
      
      And addresses this perf build warning:
      
        Warning: Kernel ABI header differences:
          diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Daniel Sneddon <daniel.sneddon@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/lkml/ZirIx4kPtJwGFZS0@x1
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: Fix data type profiling on stdio · 2b87383c
      Namhyung Kim authored
      The loop in hists__find_annotations() never set the 'nd' pointer to NULL
      and that makes the stdio output repeat the last element forever.  I think
      it isn't set to NULL for the TUI to prevent it from exiting unexpectedly.
      But it should be set in stdio mode.
      
      Fixes: d001c7a7 ("perf annotate-data: Add hist_entry__annotate_data_tui()")
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240423020643.740029-1-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf build: Pretend scandirat is missing with msan · 8524d71c
      Ian Rogers authored
      Memory sanitizer lacks an interceptor for scandirat, reporting all
      memory it allocates as uninitialized. Memory sanitizer has a scandir
      interceptor so use the fallback function in this case. This allows
      'perf test' to run under memory sanitizer.
      
      Additional notes from Ian on running in this mode:
      
      Note, as msan needs to instrument memory allocations, libraries need to
      be compiled with it. I lacked the msan-built libraries and so built
      with:
      ```
      $ make -C tools/perf O=/tmp/perf DEBUG=1 EXTRA_CFLAGS="-O0 -g
      -fno-omit-frame-pointer -fsanitize=memory
      -fsanitize-memory-track-origins" CC=clang CXX=clang++ HOSTCC=clang
      NO_LIBTRACEEVENT=1 NO_LIBELF=1 BUILD_BPF_SKEL=0 NO_LIBPFM=1
      ```
      oh, I disabled libbpf here as the bpf system call also lacks msan interceptors.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240320163244.1287780-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf intel-pt: Fix unassigned instruction op (discovered by MemorySanitizer) · e101a05f
      Adrian Hunter authored
      MemorySanitizer discovered instances where the instruction op value was
      not assigned:
      
        WARNING: MemorySanitizer: use-of-uninitialized-value
          #0 0x5581c00a76b3 in intel_pt_sample_flags tools/perf/util/intel-pt.c:1527:17
        Uninitialized value was stored to memory at
          #0 0x5581c005ddf8 in intel_pt_walk_insn tools/perf/util/intel-pt-decoder/intel-pt-decoder.c:1256:25
      
      The op value is used to set branch flags for branch instructions
      encountered when walking the code, so fix by setting op to
      INTEL_PT_OP_OTHER in other cases.
      
      Fixes: 4c761d80 ("perf intel-pt: Fix intel_pt_fup_event() assumptions about setting state type")
      Reported-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Tested-by: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Closes: https://lore.kernel.org/linux-perf-users/20240320162619.1272015-1-irogers@google.com/
      Link: https://lore.kernel.org/r/20240326083223.10883-1-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf record: Fix comment misspellings · 7cc72090
      Howard Chu authored
      Fix comment misspellings
      Signed-off-by: Howard Chu <howardchu95@gmail.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240425060427.1800663-1-howardchu95@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: Update DSO binary type when trying build-id · 8f3ec810
      Namhyung Kim authored
      dso__disassemble_filename() tries to get the filename for objdump (or
      capstone) using build-id.  But I found sometimes it didn't disassemble
      some functions.
      
      It turned out that those functions belong to a DSO which has no binary
      type set.  It seems it sets the binary type for some special files only
      - like kernel (kallsyms or kcore) or BPF images.  And there's a logic to
      skip dso with DSO_BINARY_TYPE__NOT_FOUND.
      
      As it has checked the build-id cache link, it should set the binary type
      as DSO_BINARY_TYPE__BUILD_ID_CACHE.
      
      Fixes: 873a8373 ("perf annotate: Skip DSOs not found")
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240425005157.1104789-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: Fallback disassemble to objdump when capstone fails · f35847de
      Namhyung Kim authored
      I found some cases where capstone failed to disassemble.  Probably my
      capstone is an old version, but anyway there's a chance it can fail, and
      when it does it silently stops in the middle.  In my case, it didn't
      understand the "RDPKRU" instruction.
      
      Let's check if the capstone disassembly reached the end of the function
      and fall back to objdump if not.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240425005157.1104789-1-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
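The "did the disassembly reach the end of the function" check can be sketched as follows. This Python model is hypothetical throughout: the two disassemblers are stand-in callables that return (address, size, text) tuples, and none of the names correspond to perf, capstone, or objdump APIs. It only illustrates the control flow the commit message describes.

```python
def disassemble_with_fallback(start, end, primary, fallback):
    """Use primary's output only if it covers the whole [start, end) range."""
    insns = primary(start, end)
    # Address just past the last decoded instruction, or start if none.
    reached = insns[-1][0] + insns[-1][1] if insns else start
    if reached < end:
        # The primary disassembler stopped early; redo with the fallback.
        return fallback(start, end)
    return insns


def fake_primary(start, end):
    """Stand-in that gives up after one 4-byte instruction."""
    return [(start, 4, "insn_a")]


def fake_fallback(start, end):
    """Stand-in that decodes the full 8-byte function."""
    return [(start, 4, "insn_a"), (start + 4, 4, "insn_b")]


result = disassemble_with_fallback(0x1000, 0x1008, fake_primary, fake_fallback)
print(result)
```

Comparing the end of the last decoded instruction against the function end catches the silent early stop, since the primary disassembler reports no error in that case.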
    • perf annotate-data: Check if 'struct annotated_source' was allocated on 'perf report' TUI · 47557db9
      Namhyung Kim authored
      As it removed the sample accounting for code when no symbol sort key is
      given for 'perf report' TUI, it might not have allocated the
      'struct annotated_source' yet.  Let's check if it's NULL first.
      
      Fixes: 6cdd977e ("perf report: Do not collect sample histogram unnecessarily")
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240424230015.1054013-1-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf test: Add a new test for 'perf annotate' · 281bf8f6
      Namhyung Kim authored
      Add a basic 'perf annotate' test:
      
        $ ./perf test annotate -vv
         76: perf annotate basic tests:
        --- start ---
        test child forked, pid 846989
         fbcd0-fbd55 l noploop
        perf does have symbol 'noploop'
        Basic perf annotate test
                 : 0     0xfbcd0 <noploop>:
            0.00 :   fbcd0:       pushq   %rbp
            0.00 :   fbcd1:       movq    %rsp, %rbp
            0.00 :   fbcd4:       pushq   %r12
            0.00 :   fbcd6:       pushq   %rbx
            0.00 :   fbcd7:       movl    $1, %ebx
            0.00 :   fbcdc:       subq    $0x10, %rsp
            0.00 :   fbce0:       movq    %fs:0x28, %rax
            0.00 :   fbce9:       movq    %rax, -0x18(%rbp)
            0.00 :   fbced:       xorl    %eax, %eax
            0.00 :   fbcef:       testl   %edi, %edi
            0.00 :   fbcf1:       jle     0xfbd04
            0.00 :   fbcf3:       movq    (%rsi), %rdi
            0.00 :   fbcf6:       movl    $0xa, %edx
            0.00 :   fbcfb:       xorl    %esi, %esi
            0.00 :   fbcfd:       callq   0x41920
            0.00 :   fbd02:       movl    %eax, %ebx
            0.00 :   fbd04:       leaq    -0x7b(%rip), %r12	# fbc90 <sighandler>
            0.00 :   fbd0b:       movl    $2, %edi
            0.00 :   fbd10:       movq    %r12, %rsi
            0.00 :   fbd13:       callq   0x40a00
            0.00 :   fbd18:       movl    $0xe, %edi
            0.00 :   fbd1d:       movq    %r12, %rsi
            0.00 :   fbd20:       callq   0x40a00
            0.00 :   fbd25:       movl    %ebx, %edi
            0.00 :   fbd27:       callq   0x407c0
            0.10 :   fbd2c:       movl    0x89785e(%rip), %eax	# 993590 <done>
            0.00 :   fbd32:       testl   %eax, %eax
           99.90 :   fbd34:       je      0xfbd2c
            0.00 :   fbd36:       movq    -0x18(%rbp), %rax
            0.00 :   fbd3a:       subq    %fs:0x28, %rax
            0.00 :   fbd43:       jne     0xfbd50
            0.00 :   fbd45:       addq    $0x10, %rsp
            0.00 :   fbd49:       xorl    %eax, %eax
            0.00 :   fbd4b:       popq    %rbx
            0.00 :   fbd4c:       popq    %r12
            0.00 :   fbd4e:       popq    %rbp
            0.00 :   fbd4f:       retq
            0.00 :   fbd50:       callq   0x407e0
            0.00 :   fbcd0:       pushq   %rbp
            0.00 :   fbcd1:       movq    %rsp, %rbp
            0.00 :   fbcd4:       pushq   %r12
            0.00 :   fbcd0:  push   %rbp
            0.00 :   fbcd1:  mov    %rsp,%rbp
            0.00 :   fbcd4:  push   %r12
        Basic annotate test [Success]
        ---- end(0) ----
         76: perf annotate basic tests                                       : Ok
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240424001231.849972-1-namhyung@kernel.org
      [ Improved a bit the error messages ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf parse-events: Tidy the setting of the default event name · bb65ff78
      Ian Rogers authored
      Add comments. Pass ownership of the event name to save on a strdup.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Tested-by: Atish Patra <atishp@rivosinc.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Beeman Strong <beeman@rivosinc.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240416061533.921723-17-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf parse-events: Minor grouping tidy up · afd876bb
      Ian Rogers authored
      Add comments. Ensure leader->group_name is freed before overwriting
      it.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Tested-by: Atish Patra <atishp@rivosinc.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Beeman Strong <beeman@rivosinc.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240416061533.921723-16-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf parse-event: Constify event_symbol arrays · 4a20e793
      Ian Rogers authored
      Moves 352 bytes from .data to .data.rel.ro.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Tested-by: Atish Patra <atishp@rivosinc.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Beeman Strong <beeman@rivosinc.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240416061533.921723-15-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      4a20e793
    • Ian Rogers's avatar
      perf parse-events: Improvements to modifier parsing · e30a7912
      Ian Rogers authored
      Use a struct/bitmap rather than a copied string from lexer.
      
      In the lexer, give an improved error message when too many precise flags
      are given or modifiers are repeated.
      
      Before:
      
        $ perf stat -e 'cycles:kuk' true
        event syntax error: 'cycles:kuk'
                                    \___ Bad modifier
        ...
        $ perf stat -e 'cycles:pppp' true
        event syntax error: 'cycles:pppp'
                                    \___ Bad modifier
        ...
        $ perf stat -e '{instructions:p,cycles:pp}:pp' -a true
        event syntax error: '..cycles:pp}:pp'
                                          \___ Bad modifier
        ...
      
      After:
      
        $ perf stat -e 'cycles:kuk' true
        event syntax error: 'cycles:kuk'
                                      \___ Duplicate modifier 'k' (kernel)
        ...
        $ perf stat -e 'cycles:pppp' true
        event syntax error: 'cycles:pppp'
                                       \___ Maximum precise value is 3
        ...
        $ perf stat -e '{instructions:p,cycles:pp}:pp' true
        event syntax error: '..cycles:pp}:pp'
                                          \___ Maximum combined precise value is 3, adding precision to "cycles:pp"
        ...
      Signed-off-by: Ian Rogers <irogers@google.com>
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Tested-by: Atish Patra <atishp@rivosinc.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Beeman Strong <beeman@rivosinc.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240416061533.921723-14-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      e30a7912
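
The modifier handling described in the e30a7912 entry above can be illustrated with a small sketch. This is not the perf lexer (which is written in C/flex); it is a hypothetical Python model that stores modifiers as a set plus a precise count instead of a copied string, and reuses the error wording from the "After" output. The flag table `FLAG_NAMES`, the `Modifiers` class and `parse_modifiers()` are assumptions made for illustration only.

```python
from dataclasses import dataclass

MAX_PRECISE = 3  # limit quoted in the "After" error messages

# Hypothetical flag table: only 'k' (kernel) is named in the commit message;
# the other entries are assumptions for the sake of the example.
FLAG_NAMES = {"u": "user", "k": "kernel", "h": "hypervisor"}


@dataclass
class Modifiers:
    flags: frozenset  # single-letter flags seen, e.g. {"k", "u"}
    precise: int      # number of 'p' characters seen


def parse_modifiers(text: str) -> Modifiers:
    """Parse a modifier string such as 'kpp' into flags plus a precise count."""
    seen = set()
    precise = 0
    for ch in text:
        if ch == "p":
            precise += 1
            if precise > MAX_PRECISE:
                raise ValueError(f"Maximum precise value is {MAX_PRECISE}")
        elif ch in FLAG_NAMES:
            if ch in seen:
                raise ValueError(f"Duplicate modifier '{ch}' ({FLAG_NAMES[ch]})")
            seen.add(ch)
        else:
            raise ValueError(f"Bad modifier '{ch}'")
    return Modifiers(frozenset(seen), precise)
```

Because each flag is recorded as it is seen, a repeated flag or a fourth 'p' can be reported with a specific message rather than a generic "Bad modifier".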
    • Ian Rogers's avatar
      perf parse-events: Inline parse_events_evlist_error · e18601d8
      Ian Rogers authored
      Inline parse_events_evlist_error, which is only used in
      parse_events_error. Modify parse_events_error to report a parser
      error only when no other error has already been reported. Make it
      clearer that the latter case only happens for unrecognized input.
      
      Before:
      
        $ perf stat -e 'cycles/period=99999999999999999999/' true
        event syntax error: 'cycles/period=99999999999999999999/'
                                          \___ parser error
      
        event syntax error: '..les/period=99999999999999999999/'
                                          \___ Bad base 10 number "99999999999999999999"
        Run 'perf list' for a list of valid events
      
         Usage: perf stat [<options>] [<command>]
      
            -e, --event <event>   event selector. use 'perf list' to list available events
        $ perf stat -e 'cycles:xyz' true
        event syntax error: 'cycles:xyz'
                                   \___ parser error
        Run 'perf list' for a list of valid events
      
         Usage: perf stat [<options>] [<command>]
      
            -e, --event <event>   event selector. use 'perf list' to list available events
      
      After:
      
        $ perf stat -e 'cycles/period=99999999999999999999/xyz' true
        event syntax error: '..les/period=99999999999999999999/xyz'
                                          \___ Bad base 10 number "99999999999999999999"
        Run 'perf list' for a list of valid events
      
         Usage: perf stat [<options>] [<command>]
      
            -e, --event <event>   event selector. use 'perf list' to list available events
        $ perf stat -e 'cycles:xyz' true
        event syntax error: 'cycles:xyz'
                                   \___ Unrecognized input
        Run 'perf list' for a list of valid events
      
         Usage: perf stat [<options>] [<command>]
      
            -e, --event <event>   event selector. use 'perf list' to list available events
      Signed-off-by: Ian Rogers <irogers@google.com>
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Tested-by: Atish Patra <atishp@rivosinc.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Beeman Strong <beeman@rivosinc.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240416061533.921723-13-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      e18601d8
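
The reporting rule from the e18601d8 entry above, where the generic message is emitted only if nothing more specific was recorded, can be sketched as follows. The `ErrorLog` class and `finish_parse()` function are hypothetical names for illustration and do not correspond to perf's C API.

```python
class ErrorLog:
    """Collects error messages in the order they were reported."""

    def __init__(self):
        self.messages = []

    def report(self, message: str) -> None:
        self.messages.append(message)


def finish_parse(log: ErrorLog, parser_failed: bool) -> list:
    """Add the generic message only when the parser failed silently."""
    if parser_failed and not log.messages:
        # No specific error was recorded, so the input was simply not recognized.
        log.report("Unrecognized input")
    return log.messages
```

With this rule a specific message such as the "Bad base 10 number" one is no longer preceded by a redundant generic error.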
    • Ian Rogers's avatar
      perf parse-events: Improve error message for bad numbers · ba5c371e
      Ian Rogers authored
      Use the error handler from the parse_state to give a more informative
      error message.
      
      Before:
      
        $ perf stat -e 'cycles/period=99999999999999999999/' true
        event syntax error: 'cycles/period=99999999999999999999/'
                                          \___ parser error
        Run 'perf list' for a list of valid events
      
         Usage: perf stat [<options>] [<command>]
      
            -e, --event <event>   event selector. use 'perf list' to list available events
      
      After:
      
        $ perf stat -e 'cycles/period=99999999999999999999/' true
        event syntax error: 'cycles/period=99999999999999999999/'
                                          \___ parser error
      
        event syntax error: '..les/period=99999999999999999999/'
                                          \___ Bad base 10 number "99999999999999999999"
        Run 'perf list' for a list of valid events
      
         Usage: perf stat [<options>] [<command>]
      
            -e, --event <event>   event selector. use 'perf list' to list available events
      Signed-off-by: Ian Rogers <irogers@google.com>
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Tested-by: Atish Patra <atishp@rivosinc.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Beeman Strong <beeman@rivosinc.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240416061533.921723-12-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ba5c371e
    • Ian Rogers's avatar
      perf parse-events: Inline parse_events_update_lists · 4e5484b4
      Ian Rogers authored
      The helper function just wraps a splice and free. Making the free
      inline removes a comment, so then it just wraps a splice which we can
      make inline too.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Tested-by: Atish Patra <atishp@rivosinc.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Beeman Strong <beeman@rivosinc.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240416061533.921723-11-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      4e5484b4
    • Ian Rogers's avatar
      perf parse-events: Prefer sysfs/JSON hardware events over legacy · 617824a7
      Ian Rogers authored
      It was requested that RISC-V be able to add events to the perf tool so
      the PMU driver didn't need to map legacy events to config encodings:
      https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@rivosinc.com/
      
      This change makes the priority of events specified without a PMU the
      same as those specified with a PMU, namely sysfs and JSON events are
      checked first before using the legacy encoding.
      
      The hw_term is made more generic as a hardware_event that encodes a
      pair of string and int value, allowing parse_events_multi_pmu_add to
      fall back on a known encoding when the sysfs/JSON adding fails for
      core events. As this covers PE_VALUE_SYM_HW, that token is removed and
      related code simplified.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Tested-by: Atish Patra <atishp@rivosinc.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Beeman Strong <beeman@rivosinc.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      617824a7
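
The lookup order described in the 617824a7 entry above (sysfs/JSON first, legacy encoding as a fallback for core PMUs) can be modelled roughly as below. This is a simplified sketch, not perf's parser: `resolve_event()` and the dictionaries passed to it are hypothetical, and the "sysfs/JSON" encoding string in the usage note is a made-up placeholder.

```python
def resolve_event(name, sysfs_json_events, legacy_events, is_core_pmu):
    """Return (source, encoding) for an event name, or None if unknown.

    sysfs_json_events: events the PMU advertises via sysfs or JSON tables.
    legacy_events:     fixed legacy name -> config encodings.
    """
    # sysfs and JSON events are checked first, with or without a PMU prefix.
    if name in sysfs_json_events:
        return ("sysfs/json", sysfs_json_events[name])
    # Only core PMUs fall back to the known legacy encoding.
    if is_core_pmu and name in legacy_events:
        return ("legacy", legacy_events[name])
    return None
```

For example, with `legacy = {"cycles": 0, "instructions": 1}` and a PMU that advertises its own `cycles` event as `"event=0x3c"`, `resolve_event("cycles", {"cycles": "event=0x3c"}, legacy, True)` picks the sysfs/JSON definition, while `instructions` still resolves through the legacy table.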
    • Ian Rogers's avatar
      perf parse-events: Constify parse_events_add_numeric · 5ccc4edf
      Ian Rogers authored
      Allow the term list to be const so that other functions can pass const
      term lists. Add const as necessary to called functions.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Tested-by: Atish Patra <atishp@rivosinc.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Beeman Strong <beeman@rivosinc.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240416061533.921723-9-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      5ccc4edf
    • Ian Rogers's avatar
      perf parse-events: Handle PE_TERM_HW in name_or_raw · 9d0dba23
      Ian Rogers authored
      Avoid duplicate logic for name_or_raw and PE_TERM_HW by having a rule
      to turn PE_TERM_HW into a name_or_raw.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Tested-by: Atish Patra <atishp@rivosinc.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Beeman Strong <beeman@rivosinc.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240416061533.921723-8-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      9d0dba23
    • Ian Rogers's avatar
      perf parse-events: Legacy cache names on all PMUs and lower priority · 62593394
      Ian Rogers authored
      Prior behavior is to not look for legacy cache names in sysfs/JSON and
      to create events on all core PMUs. New behavior is to look for
      sysfs/JSON events first on all PMUs and, for core PMUs, to add a legacy
      event if the sysfs/JSON event isn't present.
      
      This is done so that there is consistency with how event names in
      terms are handled and their prioritization of sysfs/JSON over
      legacy. It may make sense to use a legacy cache event name as an event
      name on a non-core PMU so we should allow it.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Tested-by: Atish Patra <atishp@rivosinc.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Beeman Strong <beeman@rivosinc.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240416061533.921723-7-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      62593394
    • Ian Rogers's avatar
      perf tests parse-events: Use "branches" rather than "cache-references" · 78fae207
      Ian Rogers authored
      Switch from "cache-references" to "branches" in test as Intel has a
      sysfs event for "cache-references" and changing the priority for sysfs
      over legacy causes the test to fail.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Tested-by: Atish Patra <atishp@rivosinc.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Beeman Strong <beeman@rivosinc.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240416061533.921723-6-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      78fae207
    • Ian Rogers's avatar
      perf pmu: Refactor perf_pmu__match() · f91fa2ae
      Ian Rogers authored
      Move all implementation to pmu code. Don't allocate a fnmatch wildcard
      pattern, as matching that ignores the suffix already handles this; only
      use fnmatch if the given PMU name has a '*' in it.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Tested-by: Atish Patra <atishp@rivosinc.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Beeman Strong <beeman@rivosinc.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240416061533.921723-5-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      f91fa2ae
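
The matching strategy in the f91fa2ae entry above (plain comparison that ignores a suffix, with fnmatch reserved for names containing '*') might look roughly like the sketch below. It is a loose Python model, not perf's C code: `pmu_match()` and `strip_suffix()` are hypothetical, and the assumption that a suffix has the form "_<digits>" is mine rather than something stated in the commit message.

```python
import fnmatch
import re

# Assumption: a PMU instance suffix looks like "_0", "_12", and so on.
_SUFFIX = re.compile(r"_\d+$")


def strip_suffix(pmu_name: str) -> str:
    """Drop a trailing "_<digits>" instance suffix, if present."""
    return _SUFFIX.sub("", pmu_name)


def pmu_match(pmu_name: str, wanted: str) -> bool:
    """Check whether pmu_name satisfies the user-supplied name `wanted`."""
    if "*" in wanted:
        # Only pay for glob matching when the user actually wrote a wildcard.
        return fnmatch.fnmatchcase(pmu_name, wanted)
    # Otherwise an exact match or a match ignoring the suffix is enough, so
    # no "<wanted>*" pattern has to be built.
    return pmu_name == wanted or strip_suffix(pmu_name) == wanted
```

Under these assumptions `pmu_match("uncore_imc_0", "uncore_imc")` succeeds without constructing a wildcard pattern, while `pmu_match("uncore_imc_0", "uncore_*")` goes through fnmatch.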
    • Ian Rogers's avatar
      perf parse-events: Avoid copying an empty list · 90b2c210
      Ian Rogers authored
      In parse_events_add_pmu, delay copying the list of terms until it is
      known the list contains terms.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Tested-by: Atish Patra <atishp@rivosinc.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Beeman Strong <beeman@rivosinc.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240416061533.921723-4-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      90b2c210
    • Ian Rogers's avatar
      perf parse-events: Directly pass PMU to parse_events_add_pmu() · 63dfcde9
      Ian Rogers authored
      Avoid passing the name of a PMU then finding it again, just directly
      pass the PMU. parse_events_multi_pmu_add_or_add_pmu() is the only version
      that needs to find a PMU, so move the find there. Remove the error
      message as parse_events_multi_pmu_add_or_add_pmu() will give an error at
      the end when a name isn't either a PMU name or event name. Without the
      error message being created, the location in the input parameter (loc)
      can be removed.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Tested-by: Atish Patra <atishp@rivosinc.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Beeman Strong <beeman@rivosinc.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240416061533.921723-3-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      63dfcde9
    • Ian Rogers's avatar
      perf parse-events: Factor out '<event_or_pmu>/.../' parsing · 8b734eaa
      Ian Rogers authored
      Factor out the case of an event or PMU name followed by a slash based
      term list. This is with a view to sharing the code with new legacy
      hardware parsing. Use early return to reduce indentation in the code.
      Make parse_events_add_pmu static now it doesn't need sharing with
      parse-events.y.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Tested-by: Atish Patra <atishp@rivosinc.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Beeman Strong <beeman@rivosinc.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240416061533.921723-2-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      8b734eaa
    • Adrian Hunter's avatar
      perf scripts python: Add a script to run instances of 'perf script' in parallel · e0c48bf9
      Adrian Hunter authored
      Add a Python script to run a perf script command multiple times in
      parallel, using perf script options --cpu and --time so that each job
      processes a different chunk of the data.
      
      Extend the perf script tests to also test the new script.
      
      The script supports the use of normal 'perf script' options like
      --dlfilter and --script, so that the benefit of running parallel jobs
      naturally extends to them also. In addition, a command can be provided
      (refer to the --pipe-to option) to pipe standard output to a custom command.
      
      Refer to the script's own help text at the end of the patch for more
      details.
      
      The script is useful for Intel PT traces, which can be efficiently
      decoded by 'perf script' when split by CPU and/or time ranges. Running
      jobs in parallel can decrease the overall decoding time.
      
      Committer testing:
      
        Ian reported that shellcheck found some issues. I installed it, as
        there are no warnings about it not being available, but when it is
        available it fails the build with:
      
          TEST    /tmp/build/perf-tools-next/tests/shell/script.sh.shellcheck_log
          CC      /tmp/build/perf-tools-next/util/header.o
      
        In tests/shell/script.sh line 20:
                        rm -rf "${temp_dir}/"*
                               ^-------------^ SC2115 (warning): Use "${var:?}" to ensure this never expands to /* .
      
        In tests/shell/script.sh line 83:
                output1_dir="${temp_dir}/output1"
                ^---------^ SC2034 (warning): output1_dir appears unused. Verify use (or export if used externally).
      
        In tests/shell/script.sh line 84:
                output2_dir="${temp_dir}/output2"
                ^---------^ SC2034 (warning): output2_dir appears unused. Verify use (or export if used externally).
      
        In tests/shell/script.sh line 86:
                python3 "${pp}" -o "${output_dir}" --jobs 4 --verbose -- perf script -i "${perf_data}"
                                    ^-----------^ SC2154 (warning): output_dir is referenced but not assigned (did you mean 'output1_dir'?).
      
        For more information:
          https://www.shellcheck.net/wiki/SC2034 -- output1_dir appears unused. Verif...
          https://www.shellcheck.net/wiki/SC2115 -- Use "${var:?}" to ensure this nev...
          https://www.shellcheck.net/wiki/SC2154 -- output_dir is referenced but not ...
      
      Did these fixes:
      
        -               rm -rf "${temp_dir}/"*
        +               rm -rf "${temp_dir:?}/"*
      
      And:
      
         @@ -83,8 +83,8 @@ test_parallel_perf()
                output1_dir="${temp_dir}/output1"
                output2_dir="${temp_dir}/output2"
                perf record -o "${perf_data}" --sample-cpu uname
        -       python3 "${pp}" -o "${output_dir}" --jobs 4 --verbose -- perf script -i "${perf_data}"
        -       python3 "${pp}" -o "${output_dir}" --jobs 4 --verbose --per-cpu -- perf script -i "${perf_data}"
        +       python3 "${pp}" -o "${output1_dir}" --jobs 4 --verbose -- perf script -i "${perf_data}"
        +       python3 "${pp}" -o "${output2_dir}" --jobs 4 --verbose --per-cpu -- perf script -i "${perf_data}"
      
      After that:
      
        root@number:~# perf test -vv "perf script tests"
         97: perf script tests:
        --- start ---
        test child forked, pid 4084139
        DB test
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.032 MB /tmp/perf-test-script.T4MJDr0L6J/perf.data (7 samples) ]
        <SNIP>
        DB test [Success]
        parallel-perf test
        Linux
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.034 MB /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data (7 samples) ]
        Starting: perf script --time=,91898.301878499 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Starting: perf script --time=91898.301878500,91898.301905999 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Starting: perf script --time=91898.301906000,91898.301933499 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Starting: perf script --time=91898.301933500, -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Finished: perf script --time=91898.301878500,91898.301905999 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Finished: perf script --time=91898.301906000,91898.301933499 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        There are 4 jobs: 2 completed, 2 running
        Finished: perf script --time=,91898.301878499 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Finished: perf script --time=91898.301933500, -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        There are 4 jobs: 4 completed, 0 running
        All jobs finished successfully
        parallel-perf.py done
        Starting: perf script --cpu=0 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Starting: perf script --cpu=1 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Starting: perf script --cpu=2 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Starting: perf script --cpu=3 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Finished: perf script --cpu=0 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Finished: perf script --cpu=1 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Finished: perf script --cpu=2 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Finished: perf script --cpu=3 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        There are 28 jobs: 4 completed, 0 running
        Starting: perf script --cpu=4 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Starting: perf script --cpu=5 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Starting: perf script --cpu=6 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Starting: perf script --cpu=7 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Finished: perf script --cpu=4 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Finished: perf script --cpu=5 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Finished: perf script --cpu=6 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Finished: perf script --cpu=7 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        There are 28 jobs: 8 completed, 0 running
        Starting: perf script --cpu=8 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Starting: perf script --cpu=9 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Starting: perf script --cpu=10 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Starting: perf script --cpu=11 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Finished: perf script --cpu=8 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Finished: perf script --cpu=9 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Finished: perf script --cpu=10 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Finished: perf script --cpu=11 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        There are 28 jobs: 12 completed, 0 running
        Starting: perf script --cpu=12 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Starting: perf script --cpu=13 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Starting: perf script --cpu=14 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Starting: perf script --cpu=15 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Finished: perf script --cpu=12 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Finished: perf script --cpu=13 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Finished: perf script --cpu=14 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Finished: perf script --cpu=15 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        There are 28 jobs: 16 completed, 0 running
        Starting: perf script --cpu=16 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Starting: perf script --cpu=17 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Starting: perf script --cpu=18 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Starting: perf script --cpu=19 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Finished: perf script --cpu=16 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Finished: perf script --cpu=17 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Finished: perf script --cpu=18 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Finished: perf script --cpu=19 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        There are 28 jobs: 20 completed, 0 running
        Starting: perf script --cpu=20 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Starting: perf script --cpu=21 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Starting: perf script --cpu=22 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Starting: perf script --cpu=23 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Finished: perf script --cpu=20 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Finished: perf script --cpu=21 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Finished: perf script --cpu=22 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Finished: perf script --cpu=23 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        There are 28 jobs: 24 completed, 0 running
        Starting: perf script --cpu=24 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Starting: perf script --cpu=25 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Starting: perf script --cpu=26 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Starting: perf script --cpu=27 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Finished: perf script --cpu=25 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Finished: perf script --cpu=26 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        Finished: perf script --cpu=27 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        There are 28 jobs: 27 completed, 1 running
        Finished: perf script --cpu=24 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
        There are 28 jobs: 28 completed, 0 running
        All jobs finished successfully
        parallel-perf.py done
        parallel-perf test [Success]
        --- Cleaning up ---
        ---- end(0) ----
         97: perf script tests                                               : Ok
        root@number:~#
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240423133248.10206-1-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      e0c48bf9
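
The `--time=` options visible in the test log of the e0c48bf9 entry above suggest splitting a time span into contiguous, non-overlapping chunks with an open start on the first job and an open end on the last. The sketch below reproduces that shape. It is not taken from parallel-perf.py: `fmt_ns()` and `time_options()` are hypothetical helpers, and the start and end timestamps in the usage note were chosen by hand so that the boundaries line up with the log.

```python
def fmt_ns(ns: int) -> str:
    """Format nanoseconds as seconds with nine fractional digits."""
    return f"{ns // 1_000_000_000}.{ns % 1_000_000_000:09d}"


def time_options(start_ns: int, end_ns: int, jobs: int) -> list:
    """Build one '--time=<from>,<to>' option per job.

    The first chunk has no lower bound and the last has no upper bound, so
    samples at the very edges of the recording are not lost.
    """
    step = (end_ns - start_ns) // jobs
    # Interior boundaries between consecutive chunks.
    bounds = [start_ns + i * step for i in range(1, jobs)]
    options = []
    for i in range(jobs):
        lower = fmt_ns(bounds[i - 1]) if i > 0 else ""
        # Each chunk ends one nanosecond before the next one starts.
        upper = fmt_ns(bounds[i] - 1) if i < jobs - 1 else ""
        options.append(f"--time={lower},{upper}")
    return options
```

Calling `time_options(91898301851000, 91898301961000, 4)` yields four options whose boundaries match the `Starting:` lines in the log; each could then be appended to a `perf script` command line and run as a separate process.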
    • Arnaldo Carvalho de Melo's avatar
      tools lib rbtree: Pick some improvements from the kernel rbtree code · cd88c11c
      Arnaldo Carvalho de Melo authored
      The tools/lib/rbtree.c code came from the kernel, with the
      EXPORT_SYMBOL() lines that make sense only there removed. Unfortunately
      it is not being checked with tools/perf/check_headers.sh. I will try to
      remedy this; until then, pick the improvements from:
      
        b0687c11 ("lib/rbtree: use '+' instead of '|' for setting color.")
      
      That I noticed by doing:
      
        diff -u tools/lib/rbtree.c lib/rbtree.c
        diff -u tools/include/linux/rbtree_augmented.h include/linux/rbtree_augmented.h
      
      There is one other case, but let's pick it in a separate patch.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Noah Goldstein <goldstein.w.n@gmail.com>
      Link: https://lore.kernel.org/lkml/ZigZzeFoukzRKG1Q@x1
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      cd88c11c
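
The title of the referenced kernel commit, "use '+' instead of '|' for setting color", relies on the node's colour being stored in the low bits of an aligned parent pointer, where those bits are always zero. The sketch below checks that equivalence with plain integers standing in for pointers. The addresses are made up, and `pack_or()`, `pack_add()` and `unpack()` are hypothetical helpers, not the kernel's macros.

```python
RB_RED, RB_BLACK = 0, 1


def pack_or(parent_addr: int, color: int) -> int:
    """Combine an aligned parent address and a colour using bitwise OR."""
    return parent_addr | color


def pack_add(parent_addr: int, color: int) -> int:
    """Combine an aligned parent address and a colour using addition."""
    return parent_addr + color


def unpack(value: int) -> tuple:
    """Split a packed value back into (parent address, colour)."""
    return value & ~3, value & 1
```

For any address whose two low bits are clear, both packing functions return the same value, so the two operators are interchangeable there. They differ only for unaligned addresses, which do not occur for properly aligned nodes.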
    • Arnaldo Carvalho de Melo's avatar
      perf tests shell kprobes: Add missing description as used by 'perf test' output · 7255fcc8
      Arnaldo Carvalho de Melo authored
      Before:
      
        root@x1:~# perf test 76
         76: SPDX-License-Identifier: GPL-2.0                                : Ok
        root@x1:~#
      
      After:
      
        root@x1:~# perf test 76
         76: Add 'perf probe's, list and remove them.                        : Ok
        root@x1:~#
      Reviewed-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Veronika Molnarova <vmolnaro@redhat.com>
      Link: https://lore.kernel.org/lkml/ZigRDKUGkcDqD-yW@x1
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      7255fcc8