1. 07 Apr, 2023 12 commits
    • perf pmu: Improve name/comments, avoid a memory allocation · 240e6fd0
      Ian Rogers authored
      Improve documentation around perf_pmu_alias pmu_name and on
      functions.
      
      Reduce the scope of pmu_uncore_alias_match to just file.
      
      Rename perf_pmu__valid_suffix to the more revealing
      perf_pmu__match_ignoring_suffix.
      
      Add a short-cut to perf_pmu__match_ignoring_suffix for PMU names that
      don't also have a socket value, and can therefore avoid a memory
      allocation.
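
      A minimal sketch of suffix-insensitive matching done in place, without
      copying the name.  The helper name and the suffix grammar (an optional
      '_' followed by digits) are assumptions for illustration, not the perf
      implementation.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if name equals tok, optionally followed by [_]digits. */
static bool match_ignoring_suffix(const char *name, const char *tok)
{
	size_t len = strlen(tok);
	const char *p;

	if (strncmp(name, tok, len) != 0)
		return false;

	p = name + len;
	if (*p == '\0')
		return true;	/* exact match, no suffix at all */
	if (*p == '_')
		p++;
	if (*p == '\0')
		return false;	/* a bare trailing '_' is not a numeric suffix */
	for (; *p != '\0'; p++) {
		if (!isdigit((unsigned char)*p))
			return false;
	}
	return true;
}
```

      Because the comparison walks the original string, nothing has to be
      duplicated or freed on this path.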
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Link: https://lore.kernel.org/r/20230406235256.2768773-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf pmu: Fewer const casts · 330f40a0
      Ian Rogers authored
      struct pmu_event holds const char * members; only unit needs to be
      non-const, so that it can be passed as an out argument to strtod().
      
      Reduce the const casts from 4 down to 1.
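
      A small sketch of why only the strtod() out argument has to be
      non-const.  parse_scale() is a hypothetical helper, not code from this
      patch.

```c
#include <stdlib.h>

/* Hypothetical helper: split "<number><unit>" into a scale and a unit string. */
static double parse_scale(const char *s, const char **unit)
{
	char *end;	/* strtod() takes a non-const char ** for its out argument */
	double scale = strtod(s, &end);

	*unit = end;	/* char * converts to const char * without a cast */
	return scale;
}
```

      The input string and the returned unit both stay const; only the local
      'end' pointer is non-const.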
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Link: https://lore.kernel.org/r/20230406235256.2768773-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf lock contention: Do not try to update if hash map is full · 222de5e5
      Namhyung Kim authored
      Entries in the task_data and lock_stat maps are never deleted.  The data
      is kept there until it's consumed by userspace at the end.  But the BPF
      program calls bpf_map_update_elem() again and again, and new data is
      discarded once the map is full.  This is not good.
      
      Worse, bpf_map_update_elem() keeps trying to get a new node even when
      the map is full.  I guess that makes sense if some nodes get deleted,
      as in the tstamp map (that's why I didn't make the change there).
      
      In a pre-allocated hash map, that means it iterates over all CPUs to check
      the freelist, which has a bad performance impact on large machines.
      
      I've checked it on my 64 CPU machine with this.
      
        $ perf bench sched messaging -g 1000
        # Running 'sched/messaging' benchmark:
        # 20 sender and receiver processes per group
        # 1000 groups == 40000 processes run
      
             Total time: 2.825 [sec]
      
      I used the task mode to guarantee that the map gets full: the default
      number of map entries is 16K and this workload has 40K tasks.
      
      Before:
        $ sudo ./perf lock con -abt -E3 -- perf bench sched messaging -g 1000
        # Running 'sched/messaging' benchmark:
        # 20 sender and receiver processes per group
        # 1000 groups == 40000 processes run
      
             Total time: 11.299 [sec]
         contended   total wait     max wait     avg wait          pid   comm
      
             19284      3.51 s       3.70 ms    181.91 us      1305863   sched-messaging
               243     84.09 ms    466.67 us    346.04 us      1336608   sched-messaging
               177     66.35 ms     12.08 ms    374.88 us      1220416   node
      
      For some reason, it didn't report the data failures.  But you can see the
      total time in the workload is increased a lot (2.8 -> 11.3).  If it fails
      early when the map is full, it goes back to normal.
      
      After:
        $ sudo ./perf lock con -abt -E3 -- perf bench sched messaging -g 1000
        # Running 'sched/messaging' benchmark:
        # 20 sender and receiver processes per group
        # 1000 groups == 40000 processes run
      
             Total time: 3.044 [sec]
         contended   total wait     max wait     avg wait          pid   comm
      
             18743    591.92 ms    442.96 us     31.58 us      1431454   sched-messaging
                51    210.64 ms    207.45 ms      4.13 ms      1468724   sched-messaging
                81     68.61 ms     65.79 ms    847.07 us      1463183   sched-messaging
      
        === output for debug ===
      
        bad: 1164137, total: 2253341
        bad rate: 51.66 %
        histogram of failure reasons
               task: 0
              stack: 0
               time: 0
               data: 1164137
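
      The fail-early idea can be sketched in plain userspace C.  This is a
      simulation under stated assumptions, not the BPF code: the tiny table,
      its field names and table_insert() are all hypothetical stand-ins for
      the hash map and bpf_map_update_elem().

```c
#include <stdbool.h>
#include <stddef.h>

#define TABLE_CAPACITY 4	/* tiny on purpose; stands in for the map size */

struct table {
	int keys[TABLE_CAPACITY];
	size_t used;
	bool full;			/* set once an insert fails for lack of space */
	unsigned long data_fail;	/* updates that were dropped */
	unsigned long insert_attempts;	/* calls that reached the insert path */
};

/* Stand-in for an expensive insert that fails when no slot is free. */
static int table_insert(struct table *t, int key)
{
	t->insert_attempts++;
	if (t->used == TABLE_CAPACITY)
		return -1;
	t->keys[t->used++] = key;
	return 0;
}

static bool table_contains(const struct table *t, int key)
{
	for (size_t i = 0; i < t->used; i++) {
		if (t->keys[i] == key)
			return true;
	}
	return false;
}

/* Record a key; once the table is known to be full, skip the insert path. */
static void record(struct table *t, int key)
{
	if (table_contains(t, key))
		return;
	if (t->full) {
		t->data_fail++;
		return;
	}
	if (table_insert(t, key) < 0) {
		t->full = true;
		t->data_fail++;
	}
}
```

      Only the first failing insert pays the cost of searching for a free
      slot; later new keys are counted as data failures right away.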
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <song@kernel.org>
      Cc: bpf@vger.kernel.org
      Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf lock contention: Revise needs_callstack() condition · 0fba2265
      Namhyung Kim authored
      It needs callstacks for two reasons:
      
       * for stack aggregation mode, the map key is the stack id and it can
         also show the full stack traces when -v is used
      
       * for other aggregation modes, the stack filter can be used to limit
         lock contentions from known call paths
      
      The -v option is meaningful (in terms of stack traces) only for stack
      aggregation mode, so it should not set save_callstack for other modes,
      such as with the -t or -l options.
      
      I've noticed this with the following command line:
      
        $ sudo ./perf lock con -ablv -E 3 -M 16 -- ./perf bench sched messaging
        ...
         contended   total wait     max wait     avg wait            address   symbol
      
                88      4.59 ms    108.07 us     52.13 us   ffff935757f46ec0    (spinlock)
                33    905.22 us     73.67 us     27.43 us   ffff935757f41700    (spinlock)
                28    703.69 us     79.28 us     25.13 us   ffff938a3d9b0c80   rq_lock (spinlock)
      
        === output for debug ===
      
        bad: 12272, total: 12421
        bad rate: 98.80 %
        histogram of failure reasons
               task: 8285
              stack: 3987    <---------- here
               time: 0
               data: 0
      
      It should not have any failures on stacks since this mode doesn't use them.
      No functional change intended.
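
      The two reasons listed above can be sketched as a predicate.  The enum
      names and the function signature are assumptions for illustration; the
      real needs_callstack() in perf may be shaped differently.

```c
#include <stdbool.h>

/* Hypothetical aggregation modes; the names are illustrative only. */
enum aggr_mode { AGGR_ADDR, AGGR_TASK, AGGR_CALLER };

static bool needs_callstack_sketch(enum aggr_mode mode, bool has_stack_filter)
{
	/* Stack aggregation keys the map by stack id, so it always needs stacks. */
	if (mode == AGGR_CALLER)
		return true;
	/* Other modes need stacks only to apply a stack filter. */
	return has_stack_filter;
}
```

      Verbosity is deliberately not an input here, which matches the point
      that -v alone should not request callstacks outside stack aggregation.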
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <song@kernel.org>
      Cc: bpf@vger.kernel.org
      Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf lock contention: Update total/bad stats for hidden entries · aae7e453
      Namhyung Kim authored
      When the -E option is used, only the given number of entries is printed,
      but the event stats at the end should cover all entries.
      
      Likewise, the -S option hides entries that don't have the named
      function in the callstack.  Update the event stats for them as well.
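
      A sketch of the accounting, assuming hypothetical struct and function
      names: every entry contributes to the total, while only the first
      max_print entries would be shown.

```c
#include <stddef.h>

struct entry {
	unsigned long nr_contended;
};

struct totals {
	unsigned long total;	/* sum over all entries, printed or hidden */
	size_t printed;		/* how many entries would be shown */
};

static struct totals summarize(const struct entry *e, size_t n, size_t max_print)
{
	struct totals t = { 0, 0 };

	for (size_t i = 0; i < n; i++) {
		t.total += e[i].nr_contended;	/* count hidden entries too */
		if (t.printed < max_print)
			t.printed++;		/* an entry would be printed here */
	}
	return t;
}
```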
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <song@kernel.org>
      Cc: bpf@vger.kernel.org
      Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf lock contention: Add data failure stat · 954cdac7
      Namhyung Kim authored
      It's possible to fail to update the data when the lock_stat map is full.
      We should check that case and show the number at the end.
      
        $ sudo ./perf lock con -ablv -E3 -- ./perf bench sched messaging
        ...
         contended   total wait     max wait     avg wait            address   symbol
      
              6157    208.48 ms     69.29 us     33.86 us   ffff934c001c1f00    (spinlock)
              4030     72.04 ms     61.84 us     17.88 us   ffff934c000415c0    (spinlock)
              3201     50.30 ms     47.73 us     15.71 us   ffff934c2eead850    (spinlock)
      
        === output for debug ===
      
        bad: 0, total: 13388
        bad rate: 0.00 %
        histogram of failure reasons
               task: 0
              stack: 0
               time: 0
               data: 0      <----- added
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <song@kernel.org>
      Cc: bpf@vger.kernel.org
      Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf lock contention: Update default map size to 16384 · 2d8d0165
      Namhyung Kim authored
      The BPF hash map will align the map size to a power of 2.  So 10k would
      be 16k anyway.  Let's use the actual size to avoid confusion.
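
      The rounding can be checked with a small helper.  This is a sketch with
      a hypothetical function name; it assumes a 32-bit size and returns 0
      when the result would not fit.

```c
#include <stdint.h>

/* Round n up to the next power of two (n itself if already a power of two). */
static uint32_t roundup_pow2_u32(uint32_t n)
{
	uint32_t p = 1;

	if (n > (UINT32_C(1) << 31))
		return 0;	/* next power of two does not fit in 32 bits */
	while (p < n)
		p <<= 1;
	return p;
}
```

      Both 10000 and 10240 round up to 16384, so a "10k" request ends up with
      the same capacity as asking for 16384 directly.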
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <song@kernel.org>
      Cc: bpf@vger.kernel.org
      Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf lock contention: Use -M for --map-nr-entries · 84b91920
      Namhyung Kim authored
      Users often want to change the map size, so let's add a short option (-M)
      for that.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <song@kernel.org>
      Cc: bpf@vger.kernel.org
      Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf lock contention: Simplify parse_lock_type() · d783ea8f
      Namhyung Kim authored
      The get_type_flag() should check both str and name fields in the
      lock_type_table so that it can find the appropriate flag without retrying
      with ':R' or ':W' suffix from the caller.
      
      Also fix a typo in the rt-mutex.
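
      A sketch of a lookup that checks both fields in one pass.  The table
      contents, the flag values and lookup_type_flags() are hypothetical and
      only illustrate the idea; they are not the perf definitions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table; the flag values and entries are illustrative only. */
struct lock_type_entry {
	unsigned int flags;
	const char *str;	/* full form, e.g. "rwlock:R" */
	const char *name;	/* short form, e.g. "rwlock" */
};

static const struct lock_type_entry lock_type_table[] = {
	{ 0x1, "spinlock", "spinlock" },
	{ 0x2, "rwlock:R", "rwlock" },
	{ 0x4, "rwlock:W", "rwlock" },
	{ 0x8, "mutex",    "mutex" },
};

/* OR together the flags of every entry whose str or name matches s. */
static unsigned int lookup_type_flags(const char *s)
{
	size_t n = sizeof(lock_type_table) / sizeof(lock_type_table[0]);
	unsigned int flags = 0;

	for (size_t i = 0; i < n; i++) {
		if (!strcmp(lock_type_table[i].str, s) ||
		    !strcmp(lock_type_table[i].name, s))
			flags |= lock_type_table[i].flags;
	}
	return flags;	/* 0 means no entry matched */
}
```

      With both fields checked, a caller can pass "rwlock" and get the read
      and write variants without retrying with a ':R' or ':W' suffix.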
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <song@kernel.org>
      Cc: bpf@vger.kernel.org
      Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • tools: Rename __fallthrough to fallthrough · f7a858bf
      Liam Howlett authored
      Rename the fallthrough attribute to better align with the kernel
      version.  Copy the definition from include/linux/compiler_attributes.h
      including the #else clause.  Adding the #else clause allows the tools
      compiler.h header to drop the check for a definition entirely and keeps
      both definitions together.
      
      Change any __fallthrough statements to fallthrough anywhere it was used
      within perf.
      
      This allows other tools to use the same keyword as the kernel.
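
      A sketch of the macro with its #else clause, modeled on the kernel
      definition in include/linux/compiler_attributes.h; check that header for
      the authoritative text.  The __has_attribute guard and classify() are
      additions for illustration only.

```c
#ifndef __has_attribute
# define __has_attribute(x) 0	/* older compilers: treat as unsupported */
#endif

#if __has_attribute(__fallthrough__)
# define fallthrough	__attribute__((__fallthrough__))
#else
# define fallthrough	do {} while (0)  /* fallthrough */
#endif

/* Hypothetical user: case 2 intentionally falls into case 1. */
static int classify(int n)
{
	int score = 0;

	switch (n) {
	case 2:
		score += 10;
		fallthrough;
	case 1:
		score += 1;
		break;
	default:
		break;
	}
	return score;
}
```

      Because 'fallthrough' is an ordinary macro name, headers that use the
      same word as an identifier or macro argument have to be included before
      it is defined.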
      
      Committer notes:
      
      Did some missing conversions to:
      
        builtin-list.c
      
      Also included gtk.h before the 'fallthrough' definition in:
      
        tools/perf/ui/gtk/hists.c
        tools/perf/ui/gtk/helpline.c
        tools/perf/ui/gtk/browser.c
      
      As it is the arg name for a macro in glib.h:
      
        /var/home/acme/git/perf-tools-next/tools/include/linux/compiler-gcc.h:16:55: error: missing binary operator before token "("
           16 | # define fallthrough                    __attribute__((__fallthrough__))
              |                                                       ^
        /usr/include/glib-2.0/glib/gmacros.h:637:28: note: in expansion of macro ‘fallthrough’
          637 | #if g_macro__has_attribute(fallthrough)
      Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
      Signed-off-by: Liam Howlett <Liam.Howlett@oracle.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Tom Rix <trix@redhat.com>
      Cc: linux-sparse@vger.kernel.org <linux-sparse@vger.kernel.org>
      Cc: llvm@lists.linux.dev <llvm@lists.linux.dev>
      Link: https://lore.kernel.org/r/20221125154947.2163498-1-Liam.Howlett@oracle.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf pmu: Fix a few potential fd leaks · 0ea8920e
      Ian Rogers authored
      Ensure fd is closed on error paths.
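
      A generic sketch of the pattern, using a single exit label so that every
      path taken after a successful open() closes the descriptor.  file_size()
      is a hypothetical helper, not one of the functions touched here.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: size of a regular file, or -1 on any error. */
static long file_size(const char *path)
{
	struct stat st;
	long ret = -1;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;		/* nothing to close yet */
	if (fstat(fd, &st) < 0)
		goto out_close;		/* error after open: still close fd */
	if (!S_ISREG(st.st_mode))
		goto out_close;		/* not a regular file: still close fd */
	ret = (long)st.st_size;
out_close:
	close(fd);
	return ret;
}
```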
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Gaosheng Cui <cuigaosheng1@huawei.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Link: https://lore.kernel.org/r/20230406065224.2553640-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf pmu: Make parser reentrant · 3d88aec0
      Ian Rogers authored
      By default bison uses global state for compatibility with yacc. Make
      the parser reentrant so that it may be used in asynchronous and
      multithreaded situations.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Gaosheng Cui <cuigaosheng1@huawei.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Link: https://lore.kernel.org/r/20230406065224.2553640-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 04 Apr, 2023 28 commits
    • perf map: Add accessor for start and end · e5116f46
      Ian Rogers authored
      Later changes will add reference count checking for struct map, and start
      and end are frequently accessed variables. Add accessors so that the
      reference count check is only necessary in one place.
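
      A sketch of the accessor idea on a reduced struct.  The struct layout,
      the accessor names and map__contains_addr() are assumptions for
      illustration; the point is that callers never touch the fields directly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reduced stand-in for struct map; only the two fields discussed here. */
struct map {
	uint64_t start;
	uint64_t end;
};

/* Any future checking can be added here once instead of at every caller. */
static inline uint64_t map__start(const struct map *map)
{
	return map->start;
}

static inline uint64_t map__end(const struct map *map)
{
	return map->end;
}

/* Hypothetical caller that goes through the accessors only. */
static inline bool map__contains_addr(const struct map *map, uint64_t addr)
{
	return addr >= map__start(map) && addr < map__end(map);
}
```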
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Darren Hart <dvhart@infradead.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Dmitriy Vyukov <dvyukov@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Miaoqian Lin <linmq006@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
      Cc: Song Liu <song@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Yury Norov <yury.norov@gmail.com>
      Link: https://lore.kernel.org/r/20230320212248.1175731-2-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf map: Add accessor for dso · 63df0e4b
      Ian Rogers authored
      Later changes will add reference count checking for struct map, with
      dso being the most frequently accessed variable. Add an accessor so
      that the reference count check is only necessary in one place.
      
      Additional changes:
       - add a dso variable to avoid repeated map__dso calls.
       - in builtin-mem.c dump_raw_samples, the code only partially tested for
         dso == NULL. Handle the possibility of NULL consistently.
       - in thread.c thread__memcpy, replace spaces with tabs.
      
      Committer notes:
      
      Did missing conversions on these files:
      
         tools/perf/arch/powerpc/util/skip-callchain-idx.c
         tools/perf/arch/powerpc/util/sym-handling.c
         tools/perf/ui/browsers/hists.c
         tools/perf/ui/gtk/annotate.c
         tools/perf/util/cs-etm.c
         tools/perf/util/thread.c
         tools/perf/util/unwind-libunwind-local.c
         tools/perf/util/unwind-libunwind.c
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Darren Hart <dvhart@infradead.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Dmitriy Vyukov <dvyukov@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Miaoqian Lin <linmq006@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
      Cc: Song Liu <song@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Yury Norov <yury.norov@gmail.com>
      Link: https://lore.kernel.org/r/20230320212248.1175731-2-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf maps: Add functions to access maps · 5ab6d715
      Ian Rogers authored
      Introduce functions to access struct maps. These functions reduce the
      number of places reference counting is necessary. While tidying APIs, do
      some small const-ification, in particular to unwind_libunwind_ops.
      
      Committer notes:
      
      Fixed up tools/perf/util/unwind-libunwind.c:
      
      -               return ops->get_entries(cb, arg, thread, data, max_stack);
      +               return ops->get_entries(cb, arg, thread, data, max_stack, best_effort);
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Darren Hart <dvhart@infradead.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Dmitriy Vyukov <dvyukov@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Miaoqian Lin <linmq006@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
      Cc: Song Liu <song@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Yury Norov <yury.norov@gmail.com>
      Link: https://lore.kernel.org/r/20230320212248.1175731-2-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf maps: Remove rb_node from struct map · ff583dc4
      Ian Rogers authored
      struct map is reference counted; having it also be a node in a
      red-black tree complicates the reference counting. Switch to having a
      map_rb_node which is a red-black tree node but points at the reference
      counted struct map. This reference is responsible for a single reference
      count.
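
      A sketch of the ownership rule: the tree node holds a pointer to the
      map and accounts for exactly one reference.  Everything here is a
      reduced stand-in (a plain int refcount, a stub instead of struct
      rb_node, hypothetical helper names), not the perf code.

```c
#include <stdlib.h>

struct map {
	int refcnt;	/* plain int for the sketch; not thread safe */
};

struct rb_node_stub {	/* stand-in for the kernel-style struct rb_node */
	struct rb_node_stub *left, *right;
};

struct map_rb_node {
	struct rb_node_stub rb_node;
	struct map *map;	/* owns exactly one reference */
};

static struct map *map__get(struct map *map)
{
	if (map)
		map->refcnt++;
	return map;
}

static void map__put(struct map *map)
{
	if (map && --map->refcnt == 0)
		free(map);
}

/* Creating the node takes one reference on the map. */
static struct map_rb_node *map_rb_node__new(struct map *map)
{
	struct map_rb_node *node = calloc(1, sizeof(*node));

	if (node)
		node->map = map__get(map);
	return node;
}

/* Deleting the node gives that one reference back. */
static void map_rb_node__delete(struct map_rb_node *node)
{
	if (!node)
		return;
	map__put(node->map);
	free(node);
}
```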
      
      Committer notes:
      
      Fixed up tools/perf/util/unwind-libunwind-local.c to use map_rb_node as
      well.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Darren Hart <dvhart@infradead.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Dmitriy Vyukov <dvyukov@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Miaoqian Lin <linmq006@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
      Cc: Song Liu <song@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Yury Norov <yury.norov@gmail.com>
      Link: https://lore.kernel.org/r/20230320212248.1175731-2-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf map: Move map list node into symbol · 83720209
      Ian Rogers authored
      Using a perf map as a list node is only done in symbol. Move the
      list_node struct into symbol as a single pointer to the map. This makes
      reference count behavior more obvious and easy to check.
      
      Committer notes:
      
      Some changes to reduce the number of lines touched by keeping, for
      instance, the 'new_map' variable and setting it to new_node->map, so
      that we keep more of the project history in place and preserve as much
      as possible the value of the 'git blame' tool.
      
      Also use map__zput() when putting a struct member, so that after we free
      the container struct, use-after-free errors sometimes show up as NULL
      pointer derefs.
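
      The "put and zero" idiom can be sketched as follows.  The reduced
      struct map, the int refcount and the helper names are assumptions for
      illustration, not the perf definitions.

```c
#include <stdlib.h>

struct map {
	int refcnt;	/* plain int for the sketch; not thread safe */
};

struct container {
	struct map *map;
};

static void map__put(struct map *map)
{
	if (map && --map->refcnt == 0)
		free(map);
}

/* Drop the reference and clear the caller's pointer in one step. */
static void map__zput_sketch(struct map **mapp)
{
	map__put(*mapp);
	*mapp = NULL;
}
```

      After map__zput_sketch(&c.map), a stale access through c.map
      dereferences NULL instead of quietly reading freed memory.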
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Darren Hart <dvhart@infradead.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Dmitriy Vyukov <dvyukov@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Miaoqian Lin <linmq006@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
      Cc: Song Liu <song@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Yury Norov <yury.norov@gmail.com>
      Link: https://lore.kernel.org/r/20230320212248.1175731-2-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf jit: Fix a few memory leaks · dc67c783
      Ian Rogers authored
      As reported by leak sanitizer.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Brian Robbins <brianrob@linux.microsoft.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yuan Can <yuancan@huawei.com>
      Link: https://lore.kernel.org/r/20230403203545.1872196-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf build: Allow C++ demangle without libelf · 3ad45105
      Ian Rogers authored
      The cxa demangle support isn't dependent on libelf and so we no longer
      need to disable demangling if libelf isn't present.
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andrii Nakryiko <andrii@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20230403211021.1892231-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      3ad45105
    • perf srcline: Avoid addr2line SIGPIPEs · 75a616c6
      Ian Rogers authored
      Ignore SIGPIPEs when addr2line is configured.
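The effect of ignoring SIGPIPE can be sketched as follows. This is a minimal illustration, not perf's actual code: ignore_sigpipe() and write_to_closed_pipe() are hypothetical helpers, and a pipe with its read end closed stands in for an addr2line subprocess that has exited.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from perf): ignore SIGPIPE so that a
 * write to a pipe whose reader has gone away fails with EPIPE instead of
 * terminating the calling process.
 * Returns 0 on success, -1 if signal() fails.
 */
static int ignore_sigpipe(void)
{
	return signal(SIGPIPE, SIG_IGN) == SIG_ERR ? -1 : 0;
}

/*
 * Create a pipe, close the read end to mimic a subprocess that has
 * exited, then write to the pipe.
 * Returns the errno reported by write(), 0 if the write unexpectedly
 * succeeded, or -1 if the pipe could not be created.
 */
static int write_to_closed_pipe(void)
{
	int fds[2];
	int err = 0;

	if (pipe(fds) < 0)
		return -1;

	close(fds[0]);			/* no reader is left */
	if (write(fds[1], "x", 1) < 0)
		err = errno;		/* EPIPE once SIGPIPE is ignored */
	close(fds[1]);
	return err;
}
```

With the signal ignored, the caller sees an ordinary write error and can handle or report it, rather than being killed by the default SIGPIPE action.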
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tom Rix <trix@redhat.com>
      Cc: llvm@lists.linux.dev
      Link: https://lore.kernel.org/r/20230403184033.1836023-2-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      75a616c6
    • perf srcline: Support for llvm-addr2line · 2c4b9280
      Ian Rogers authored
      The sentinel value differs for llvm-addr2line. Configure this once and
      then detect when reading records.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tom Rix <trix@redhat.com>
      Cc: llvm@lists.linux.dev
      Link: https://lore.kernel.org/r/20230403184033.1836023-2-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      2c4b9280
    • perf srcline: Simplify addr2line subprocess · b3801e79
      Ian Rogers authored
      Don't wrap stdin and stdout of subprocess with streams, use the api/io
      library for buffering.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tom Rix <trix@redhat.com>
      Cc: llvm@lists.linux.dev
      Link: https://lore.kernel.org/r/20230403184033.1836023-2-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      b3801e79
    • tools api: Add io__getline · c9dc580c
      Ian Rogers authored
      Reads a line to allocated memory up to a newline following the getline
      API.
      
      Committer notes:
      
      It also adds this new function to the 'api io' 'perf test' entry:
      
        $ perf test "api io"
         64: Test api io                                                     : Ok
        $
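For reference, the getline API that io__getline is described as following can be sketched with the POSIX getline() itself. This is only an illustration of the calling convention (callee-allocated buffer, length returned, -1 at end of input); it does not show io__getline's own signature, and longest_line_len() is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Return the length of the longest line in 'fp', not counting the
 * trailing newline. Hypothetical helper for this sketch only.
 *
 * getline() (re)allocates *line as needed, stores the capacity in 'cap',
 * keeps the newline in the buffer, and returns the number of characters
 * read or -1 at end of file or on error.
 */
static size_t longest_line_len(FILE *fp)
{
	char *line = NULL;	/* allocated by the first getline() call */
	size_t cap = 0;		/* current capacity of 'line' */
	size_t longest = 0;
	ssize_t len;

	while ((len = getline(&line, &cap, fp)) != -1) {
		if (len > 0 && line[len - 1] == '\n')
			len--;	/* do not count the newline */
		if ((size_t)len > longest)
			longest = (size_t)len;
	}
	free(line);		/* the same buffer is reused across calls */
	return longest;
}
```

The caller owns the buffer and frees it once, after the loop, which is the main point of the getline-style contract.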
      Signed-off-by: Ian Rogers <irogers@google.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tom Rix <trix@redhat.com>
      Cc: llvm@lists.linux.dev
      Link: https://lore.kernel.org/r/20230403184033.1836023-2-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      c9dc580c
    • perf intel-pt: Use perf_pmu__scan_file_at() if possible · 98b7ce0e
      Namhyung Kim authored
      Intel-PT calls perf_pmu__scan_file() a lot, so use relative paths
      when it accesses multiple files in one place.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Adrian Hunter <adrian.hunter@intel.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20230331202949.810326-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      98b7ce0e
    • perf pmu: Add perf_pmu__{open,scan}_file_at() · 3a69672e
      Namhyung Kim authored
      These two helpers will also use openat() to reduce the overhead with
      relative pathnames.  Convert other functions in pmu_lookup() to use
      the new helpers.
      
      Committer testing:
      
      Before:
      
        ⬢[acme@toolbox perf-tools-next]$ perf bench internals pmu-scan
        # Running 'internals/pmu-scan' benchmark:
        Computing performance of sysfs PMU event scan for 100 times
          Average PMU scanning took: 2729.040 usec (+- 7.117 usec)
        ⬢[acme@toolbox perf-tools-next]$
      
      After:
      
        ⬢[acme@toolbox perf-tools-next]$ perf bench internals pmu-scan
        # Running 'internals/pmu-scan' benchmark:
        Computing performance of sysfs PMU event scan for 100 times
          Average PMU scanning took: 2419.870 usec (+- 9.057 usec)
        ⬢[acme@toolbox perf-tools-next]$
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Ian Rogers <irogers@google.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20230331202949.810326-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      3a69672e
    • perf pmu: Use relative path in setup_pmu_alias_list() · 46378665
      Namhyung Kim authored
      Likewise, x86 needs to traverse the PMU list to build aliases.
      Let's use the new helpers to use relative paths.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20230331202949.810326-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      46378665
    • perf pmu: Use relative path in perf_pmu__caps_parse() · b39094d3
      Namhyung Kim authored
      Likewise, it needs to traverse the pmu/caps directory, let's use
      openat() with the dirfd instead of open() using the absolute path.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Ian Rogers <irogers@google.com>
      Link: https://lore.kernel.org/r/20230331202949.810326-2-namhyung@kernel.org
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: LKML <linux-kernel@vger.kernel.org>
      Cc: linux-perf-users@vger.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      b39094d3
    • perf pmu: Use relative path for sysfs scan · e293a5e8
      Namhyung Kim authored
      The PMU information is in the kernel sysfs so it needs to scan the
      directory to get the whole information like event aliases, formats and
      so on.  During the traversal, it opens a lot of files and directories
      like below:
      
        dir = opendir("/sys/bus/event_source/devices");
        while (dentry = readdir(dir)) {
          char buf[PATH_MAX];
      
          snprintf(buf, sizeof(buf), "%s/%s",
                   "/sys/bus/event_source/devices", dentry->d_name);
          fd = open(buf, O_RDONLY);
          ...
        }
      
      But this is not good since it needs to copy the string to build the
      absolute pathname, and it repeats the pathname walk from /sys for
      every file.  We can use openat(2) to open the file relative to the
      given directory instead.  While this is not usually a problem, it can
      be one when the kernel has contention on sysfs.
      
      Add a couple of new helpers that return the file descriptor of the PMU
      directory so that it can be used with relative paths.
      
       * perf_pmu__event_source_devices_fd()
         - returns a fd for the PMU root ("/sys/bus/event_source/devices")
      
       * perf_pmu__pathname_fd()
         - returns a fd for "<pmu>/<file>" under the PMU root
      
      Now the above code can be converted to something like below:
      
        dirfd = perf_pmu__event_source_devices_fd();
        dir = fdopendir(dirfd);
        while (dentry = readdir(dir)) {
          fd = openat(dirfd, dentry->d_name, O_RDONLY);
          ...
        }
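A self-contained version of the converted loop might look like the sketch below. It is not the perf implementation: count_openable_entries() is a hypothetical helper, and it takes the directory as a parameter instead of calling perf_pmu__event_source_devices_fd(), so that it can run against any directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Count the entries under 'root' that can be opened read-only, opening
 * each one relative to the directory fd instead of building an absolute
 * pathname. Hypothetical helper for this sketch only.
 * Returns the count, or -1 if the directory cannot be opened.
 */
static int count_openable_entries(const char *root)
{
	struct dirent *dentry;
	int count = 0;
	DIR *dir;
	int dfd;

	dfd = open(root, O_RDONLY | O_DIRECTORY);
	if (dfd < 0)
		return -1;

	dir = fdopendir(dfd);		/* 'dir' now owns dfd */
	if (!dir) {
		close(dfd);
		return -1;
	}

	while ((dentry = readdir(dir)) != NULL) {
		int fd;

		if (dentry->d_name[0] == '.')
			continue;	/* skip ".", ".." and hidden entries */

		/* Only the final path component is resolved here. */
		fd = openat(dfd, dentry->d_name, O_RDONLY);
		if (fd < 0)
			continue;
		count++;
		close(fd);
	}
	closedir(dir);			/* also closes dfd */
	return count;
}
```

No PATH_MAX buffer or snprintf() is needed, and each openat() call resolves a single name relative to the already-open directory.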
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20230331202949.810326-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      e293a5e8
    • perf bench: Add pmu-scan benchmark · f6a7bbbf
      Namhyung Kim authored
      The pmu-scan benchmark will repeatedly scan the sysfs to get the
      available PMU information.
      
        $ ./perf bench internals pmu-scan
        # Running 'internals/pmu-scan' benchmark:
        Computing performance of sysfs PMU event scan for 100 times
          Average PMU scanning took: 6850.990 usec (+- 48.445 usec)
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Ian Rogers <irogers@google.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20230331202949.810326-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      f6a7bbbf
    • perf pmu: Add perf_pmu__destroy() function · eec11310
      Namhyung Kim authored
      It seems there's no function to delete the perf pmu struct.  Add the
      perf_pmu__destroy() to do the job.  While at it, add some more helper
      functions to delete pmu aliases and caps.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20230331202949.810326-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      eec11310
    • perf tools: Fix a asan issue in parse_events_multi_pmu_add() · 66c9598b
      Namhyung Kim authored
      In the parse_events_multi_pmu_add() it passes the 'config' variable
      twice to parse_events_term__num() - one for config and another for
      loc_term.  I'm not sure about the second one as it's converted to
      YYLTYPE variable.  Asan reports it like below:
      
        In function ‘parse_events_term__num’,
            inlined from ‘parse_events_multi_pmu_add’ at util/parse-events.c:1602:6:
        util/parse-events.c:2653:64: error: array subscript ‘YYLTYPE[0]’ is partly outside
                                            array bounds of ‘char[8]’ [-Werror=array-bounds]
         2653 |                 .err_term  = loc_term ? loc_term->first_column : 0,
              |                              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~
        util/parse-events.c: In function ‘parse_events_multi_pmu_add’:
        util/parse-events.c:1587:15: note: object ‘config’ of size 8
         1587 |         char *config;
              |               ^~~~~~
        cc1: all warnings being treated as errors
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20230331202949.810326-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      66c9598b
    • perf list: Use relative path for tracepoint scan · 00462d8e
      Namhyung Kim authored
      Committer notes:
      
      Added missing #include <unistd.h> for the close() prototype to fix this
      on Alma Linux 8:
      
         1    21.54 almalinux:8                   : FAIL gcc version 8.5.0 20210514 (Red Hat 8.5.0-16) (GCC)
          util/print-events.c: In function 'print_tracepoint_events':
          util/print-events.c:103:4: error: implicit declaration of function 'close'; did you mean 'clone'? [-Werror=implicit-function-declaration]
              close(evt_fd);
              ^~~~~
              clone
      
      Also use the newly added scandirat feature test to check if that
      function is available, providing a HAVE_SCANDIRAT_SUPPORT conditional
      warning to the user if it isn't available.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20230331202949.810326-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      00462d8e
    • tools build: Add a feature test for scandirat(), that is not implemented so far in musl and uclibc · 9e03608e
      Arnaldo Carvalho de Melo authored
      We use it just when listing tracepoint events, and for root, so just
      emit a warning about it to get users to ask the library maintainers to
      implement it, as suggested in this systemd ticket:
      
       https://github.com/systemd/casync/issues/129
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/lkml/ZCwv4z5Dh%2FdHUMG6@kernel.org/
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      9e03608e
    • perf intel-pt: Fix CYC timestamps after standalone CBR · 430635a0
      Adrian Hunter authored
      After a standalone CBR (not associated with TSC), update the cycles
      reference timestamp and reset the cycle count, so that CYC timestamps
      are calculated relative to that point with the new frequency.
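The re-basing described above can be sketched as follows. This is a simplified model under stated assumptions, not the Intel PT decoder's code: struct cyc_clock, its fields and all three functions are hypothetical names, and the cycles-to-timestamp ratio is modelled as a plain double.

```c
/*
 * Simplified, hypothetical model of a cycle-based clock. A timestamp is
 * the reference timestamp plus the cycles counted since that reference,
 * scaled by the current ticks-per-cycle ratio.
 */
struct cyc_clock {
	double ref_timestamp;		/* timestamp at the reference point */
	unsigned long long cycle_cnt;	/* cycles since the reference point */
	double cyc_to_tsc;		/* timestamp ticks per cycle */
};

/* Current timestamp estimate. */
static double cyc_clock_now(const struct cyc_clock *c)
{
	return c->ref_timestamp + (double)c->cycle_cnt * c->cyc_to_tsc;
}

/* Account for a cycle-count packet. */
static void cyc_clock_add_cycles(struct cyc_clock *c, unsigned long long cycles)
{
	c->cycle_cnt += cycles;
}

/*
 * Frequency change: fold the cycles counted so far into the reference
 * timestamp using the OLD ratio, reset the cycle count, then switch to
 * the new ratio. Later cycles are scaled from this point only.
 */
static void cyc_clock_set_ratio(struct cyc_clock *c, double new_cyc_to_tsc)
{
	c->ref_timestamp = cyc_clock_now(c);
	c->cycle_cnt = 0;
	c->cyc_to_tsc = new_cyc_to_tsc;
}
```

Without the re-base, cycles counted before the frequency change would be rescaled with the new ratio, and the timestamp would jump at the point of the change.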
      
      Fixes: cc336186 ("perf tools: Add Intel PT support for decoding CYC packets")
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20230403154831.8651-2-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      430635a0
    • perf auxtrace: Fix address filter entire kernel size · 1f9f33cc
      Adrian Hunter authored
      kallsyms is not completely in address order.
      
      In find_entire_kern_cb(), calculate the kernel end from the maximum
      address not the last symbol.
      
      Example:
      
       Before:
      
          $ sudo cat /proc/kallsyms | grep ' [twTw] ' | tail -1
          ffffffffc00b8bd0 t bpf_prog_6deef7357e7b4530    [bpf]
          $ sudo cat /proc/kallsyms | grep ' [twTw] ' | sort | tail -1
          ffffffffc15e0cc0 t iwl_mvm_exit [iwlmvm]
          $ perf.d093603a05aa record -v --kcore -e intel_pt// --filter 'filter *' -- uname |& grep filter
          Address filter: filter 0xffffffff93200000/0x2ceba000
      
       After:
      
          $ perf.8fb0f7a01f8e record -v --kcore -e intel_pt// --filter 'filter *' -- uname |& grep filter
          Address filter: filter 0xffffffff93200000/0x2e3e2000
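The idea of taking the end from the maximum address rather than the last symbol can be sketched as below. This is an illustration only, not the body of find_entire_kern_cb(): struct kern_range and kern_range_from_syms() are hypothetical, and the input is a plain array of symbol addresses in file order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch. */
struct kern_range {
	uint64_t start;	/* lowest symbol address seen */
	uint64_t end;	/* highest symbol address seen */
};

/*
 * Compute the address range covered by 'nr' symbol addresses that are
 * NOT guaranteed to be sorted. The end is the maximum address, which is
 * not necessarily the last element.
 * Returns 0 on success, -1 if there are no symbols.
 */
static int kern_range_from_syms(const uint64_t *addrs, size_t nr,
				struct kern_range *range)
{
	size_t i;

	if (nr == 0)
		return -1;

	range->start = addrs[0];
	range->end = addrs[0];
	for (i = 1; i < nr; i++) {
		if (addrs[i] < range->start)
			range->start = addrs[i];
		if (addrs[i] > range->end)
			range->end = addrs[i];
	}
	return 0;
}
```

For the unsorted input { 0x1000, 0x5000, 0x3000 }, using the last element would give an end of 0x3000, while the maximum gives 0x5000.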
      
      Fixes: 1b36c03e ("perf record: Add support for using symbols in address filters")
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20230403154831.8651-2-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      1f9f33cc
    • perf arm-spe: Add raw decoding for SPEv1.3 MTE and MOPS load/store · 34fb6040
      Rob Herring authored
      Arm SPEv1.3 adds new load/store operation subclasses for Memory Tagging
      Extension (MTE) and memory operations (MOPS). The memory operations
      are memcpy and memset. Add support for decoding these new subclasses in
      the raw decoding.
      
      Reviewed-by: Leo Yan <leo.yan@linaro.org>
      Signed-off-by: Rob Herring <robh@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20230327162057.4057188-1-robh@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      34fb6040
    • perf cs-etm: Handle PERF_RECORD_AUX_OUTPUT_HW_ID packet · b6521ea2
      Mike Leach authored
      When using dynamically assigned CoreSight trace IDs the drivers can output
      the ID / CPU association as a PERF_RECORD_AUX_OUTPUT_HW_ID packet.
      
      Update cs-etm decoder to handle this packet by setting the CPU/Trace ID
      mapping.
      Reviewed-by: James Clark <james.clark@arm.com>
      Signed-off-by: Mike Leach <mike.leach@linaro.org>
      Acked-by: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Darren Hart <darren@os.amperecomputing.com>
      Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20230331055645.26918-2-mike.leach@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      b6521ea2
    • perf cs-etm: Update record event to use new Trace ID protocol · e5fa5b41
      Mike Leach authored
      Trace IDs are now dynamically allocated.
      
      The static association algorithm used previously is no longer used.
      The 'cpu * 2 + seed' formula was outdated: it did not scale and was
      broken for systems with high core counts (>46).
      
      Trace ID will now be sent in PERF_RECORD_AUX_OUTPUT_HW_ID record.
      
      Legacy ID algorithm renamed and retained for limited backward
      compatibility use.
      Reviewed-by: James Clark <james.clark@arm.com>
      Signed-off-by: Mike Leach <mike.leach@linaro.org>
      Acked-by: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Darren Hart <darren@os.amperecomputing.com>
      Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20230331055645.26918-2-mike.leach@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      e5fa5b41
    • perf cs-etm: Move mapping of Trace ID and cpu into helper function · 09277295
      Mike Leach authored
      The information to associate Trace ID and CPU will be changing.
      
      Drivers will start outputting this as a hardware ID packet in the data
      file which, if present, will be used in preference to the AUXINFO values.
      
      To prepare for this, provide helper functions: one to do the individual
      ID mapping, and one to extract the IDs from the completed metadata blocks.
      Reviewed-by: James Clark <james.clark@arm.com>
      Signed-off-by: Mike Leach <mike.leach@linaro.org>
      Acked-by: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Darren Hart <darren@os.amperecomputing.com>
      Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20230331055645.26918-2-mike.leach@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      09277295
    • perf lock contention: Show detail failure reason for BPF · 84c3a2bb
      Namhyung Kim authored
      It can fail to collect lock stat from BPF for various reasons.  For
      example, I've got a report that sometimes time calculation seems wrong
      in case of contended spinlocks.  I suspect the time delta went negative
      for some reason.
      
      Count them separately and show in the output like below:
      
      $ sudo perf lock contention -abE5 sleep 10
       contended   total wait     max wait     avg wait         type   caller
      
              13    785.61 us     79.36 us     60.43 us     spinlock   remove_wait_queue+0x14
              10    469.02 us     87.51 us     46.90 us     spinlock   prepare_to_wait+0x27
               9    289.09 us     69.08 us     32.12 us     spinlock   finish_wait+0x36
             114    251.05 us      8.56 us      2.20 us     spinlock   try_to_wake_up+0x1f5
             132    188.63 us      5.01 us      1.43 us     spinlock   __wake_up_common_lock+0x62
      
      === output for debug ===
      
      bad: 1, total: 279
      bad rate: 0.36 %
      histogram of failure reasons
             task: 1
            stack: 0
             time: 0
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: bpf@vger.kernel.org
      Link: https://lore.kernel.org/r/20230327225711.245738-1-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      84c3a2bb