1. 09 Nov, 2023 1 commit
    • perf tools: Add branch counter knob · 9fbb4b02
      Kan Liang authored
      Add a new branch filter, "counter", for the branch counter option. It is
      used to mark the events which should be logged in the branch. If it is
      applied with the -j option, the counters of all the events should be
      logged in the branch. If the legacy kernel doesn't support the new
      branch sample type, switch off the branch counter filter.
      
      The stored counter values in each branch are displayed right after the
      regular branch stack information via perf report -D.
      
      Usage examples:
      
        # perf record -e "{branch-instructions,branch-misses}:S" -j any,counter
      
      Only the first event, branch-instructions, collects the LBR. Both
      branch-instructions and branch-misses are marked as logged events. Their
      occurrence information can be found in the branch stack
      extension space of each branch.
      
        # perf record -e "{cpu/branch-instructions,branch_type=any/,cpu/branch-misses,branch_type=counter/}"
      
      Only the first event, branch-instructions, collects the LBR. Only the
      branch-misses event is marked as a logged event.
      
      Committer notes:
      
      I noticed 'perf test "Sample parsing"' failing, reported to the list and
      Kan provided a patch that checks if the evsel has a leader and that
      evsel->evlist is set; the comment in the source code further explains
      it.
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tinghao Zhang <tinghao.zhang@intel.com>
      Link: https://lore.kernel.org/r/20231025201626.3000228-8-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 06 Nov, 2023 2 commits
  3. 03 Nov, 2023 2 commits
  4. 31 Oct, 2023 1 commit
  5. 30 Oct, 2023 1 commit
  6. 28 Oct, 2023 13 commits
  7. 26 Oct, 2023 3 commits
  8. 25 Oct, 2023 17 commits
    • perf vendor events intel: Fix broadwellde tma_info_system_dram_bw_use metric · 3779416e
      Ian Rogers authored
      Broadwell-de has a consumer core and server uncore. The uncore_arb PMU
      isn't present and the broadwellx style cbox PMU should be used
      instead. Fix the tma_info_system_dram_bw_use metric to use the server
      metric rather than client.
      
      The associated converter script fix is in:
      https://github.com/intel/perfmon/pull/111
      
      Fixes: 7d124303 ("perf vendor events intel: Update broadwell variant events/metrics")
      Signed-off-by: Ian Rogers <irogers@google.com>
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Caleb Biggers <caleb.biggers@intel.com>
      Cc: Perry Taylor <perry.taylor@intel.com>
      Link: https://lore.kernel.org/r/20230926031034.1201145-1-irogers@google.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf mem_info: Add and use map_symbol__exit and addr_map_symbol__exit · 56e144fe
      Ian Rogers authored
      Fix a leak where mem_info__put wouldn't release the maps/map as used by
      perf mem. Add exit functions and use them elsewhere that the maps and map
      are released.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: liuwenyu <liuwenyu7@huawei.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: Liam Howlett <liam.howlett@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Link: https://lore.kernel.org/r/20231024222353.3024098-12-irogers@google.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf callchain: Minor layout changes to callchain_list · dec07fe5
      Ian Rogers authored
      Avoid 6 byte hole for padding. Place more frequently used fields
      first in an attempt to use just 1 cacheline in the common case.
      
      Before:
      ```
      struct callchain_list {
              u64                        ip;                   /*     0     8 */
              struct map_symbol          ms;                   /*     8    24 */
              struct {
                      _Bool              unfolded;             /*    32     1 */
                      _Bool              has_children;         /*    33     1 */
              };                                               /*    32     2 */
      
              /* XXX 6 bytes hole, try to pack */
      
              u64                        branch_count;         /*    40     8 */
              u64                        from_count;           /*    48     8 */
              u64                        predicted_count;      /*    56     8 */
              /* --- cacheline 1 boundary (64 bytes) --- */
              u64                        abort_count;          /*    64     8 */
              u64                        cycles_count;         /*    72     8 */
              u64                        iter_count;           /*    80     8 */
              u64                        iter_cycles;          /*    88     8 */
              struct branch_type_stat *  brtype_stat;          /*    96     8 */
              const char  *              srcline;              /*   104     8 */
              struct list_head           list;                 /*   112    16 */
      
              /* size: 128, cachelines: 2, members: 13 */
              /* sum members: 122, holes: 1, sum holes: 6 */
      };
      ```
      
      After:
      ```
      struct callchain_list {
              struct list_head           list;                 /*     0    16 */
              u64                        ip;                   /*    16     8 */
              struct map_symbol          ms;                   /*    24    24 */
              const char  *              srcline;              /*    48     8 */
              u64                        branch_count;         /*    56     8 */
              /* --- cacheline 1 boundary (64 bytes) --- */
              u64                        from_count;           /*    64     8 */
              u64                        cycles_count;         /*    72     8 */
              u64                        iter_count;           /*    80     8 */
              u64                        iter_cycles;          /*    88     8 */
              struct branch_type_stat *  brtype_stat;          /*    96     8 */
              u64                        predicted_count;      /*   104     8 */
              u64                        abort_count;          /*   112     8 */
              struct {
                      _Bool              unfolded;             /*   120     1 */
                      _Bool              has_children;         /*   121     1 */
              };                                               /*   120     2 */
      
              /* size: 128, cachelines: 2, members: 13 */
              /* padding: 6 */
      };
      ```
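The effect of the reordering above can be reproduced in miniature. The following is only a sketch with hypothetical `demo_*` structs (not the real `callchain_list`): it shows how placing the two `_Bool` flags between 8-byte members creates a hole, while placing them last turns the same bytes into tail padding. Exact byte counts depend on the target ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature of the "before" layout: flags in the middle. */
struct demo_before {
	uint64_t ip;
	bool unfolded;
	bool has_children;
	/* the compiler pads here so that 'count' is suitably aligned */
	uint64_t count;
};

/* Hypothetical miniature of the "after" layout: flags moved to the end. */
struct demo_after {
	uint64_t ip;
	uint64_t count;
	bool unfolded;
	bool has_children;
	/* the unused bytes are now tail padding, not a hole between members */
};

/* Bytes of hole between the flags and 'count' in the "before" layout. */
static size_t demo_before_hole(void)
{
	return offsetof(struct demo_before, count) -
	       (offsetof(struct demo_before, has_children) + sizeof(bool));
}

/* Bytes of padding after the last member in the "after" layout. */
static size_t demo_after_tail_padding(void)
{
	return sizeof(struct demo_after) -
	       (offsetof(struct demo_after, has_children) + sizeof(bool));
}
```

The total size is unchanged in both the sketch and the pahole output above; what changes is which members share the leading bytes of the struct.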
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: liuwenyu <liuwenyu7@huawei.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: Liam Howlett <liam.howlett@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Link: https://lore.kernel.org/r/20231024222353.3024098-11-irogers@google.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf callchain: Make brtype_stat in callchain_list optional · 6ba29fbb
      Ian Rogers authored
      struct callchain_list is 352 bytes in size, 232 of which are
      brtype_stat. brtype_stat is only used for certain callchain_list
      items so make it optional, allocating when necessary. So that
      printing doesn't need to deal with an optional brtype_stat, pass
      an empty/zero version.
      
      Before:
      ```
      struct callchain_list {
              u64                        ip;                   /*     0     8 */
              struct map_symbol          ms;                   /*     8    24 */
              struct {
                      _Bool              unfolded;             /*    32     1 */
                      _Bool              has_children;         /*    33     1 */
              };                                               /*    32     2 */
      
              /* XXX 6 bytes hole, try to pack */
      
              u64                        branch_count;         /*    40     8 */
              u64                        from_count;           /*    48     8 */
              u64                        predicted_count;      /*    56     8 */
              /* --- cacheline 1 boundary (64 bytes) --- */
              u64                        abort_count;          /*    64     8 */
              u64                        cycles_count;         /*    72     8 */
              u64                        iter_count;           /*    80     8 */
              u64                        iter_cycles;          /*    88     8 */
              struct branch_type_stat    brtype_stat;          /*    96   232 */
              /* --- cacheline 5 boundary (320 bytes) was 8 bytes ago --- */
              const char  *              srcline;              /*   328     8 */
              struct list_head           list;                 /*   336    16 */
      
              /* size: 352, cachelines: 6, members: 13 */
              /* sum members: 346, holes: 1, sum holes: 6 */
              /* last cacheline: 32 bytes */
      };
      ```
      
      After:
      ```
      struct callchain_list {
              u64                        ip;                   /*     0     8 */
              struct map_symbol          ms;                   /*     8    24 */
              struct {
                      _Bool              unfolded;             /*    32     1 */
                      _Bool              has_children;         /*    33     1 */
              };                                               /*    32     2 */
      
              /* XXX 6 bytes hole, try to pack */
      
              u64                        branch_count;         /*    40     8 */
              u64                        from_count;           /*    48     8 */
              u64                        predicted_count;      /*    56     8 */
              /* --- cacheline 1 boundary (64 bytes) --- */
              u64                        abort_count;          /*    64     8 */
              u64                        cycles_count;         /*    72     8 */
              u64                        iter_count;           /*    80     8 */
              u64                        iter_cycles;          /*    88     8 */
              struct branch_type_stat *  brtype_stat;          /*    96     8 */
              const char  *              srcline;              /*   104     8 */
              struct list_head           list;                 /*   112    16 */
      
              /* size: 128, cachelines: 2, members: 13 */
              /* sum members: 122, holes: 1, sum holes: 6 */
      };
      ```
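A minimal sketch of the pattern described in this commit message, using hypothetical `demo_*` names rather than the actual perf code: the large statistics block is allocated only on first use, and read-only consumers such as printing are handed a shared all-zero instance when nothing was allocated.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the statistics block; sized to 232 bytes like
 * the figure quoted above, but the real members differ. */
struct demo_stat {
	uint64_t counts[29];
};

struct demo_node {
	uint64_t ip;
	struct demo_stat *stat;	/* NULL until branch statistics are needed */
};

/* Shared all-zero instance handed to read-only consumers. */
static const struct demo_stat demo_empty_stat;

/* Allocate the statistics on first use; returns NULL on allocation failure. */
static struct demo_stat *demo_node_stat_get(struct demo_node *node)
{
	if (!node->stat)
		node->stat = calloc(1, sizeof(*node->stat));
	return node->stat;
}

/* Display path: never sees NULL, so it needs no special case. */
static const struct demo_stat *demo_node_stat_view(const struct demo_node *node)
{
	return node->stat ? node->stat : &demo_empty_stat;
}

/* Release the optional allocation when the node goes away. */
static void demo_node_exit(struct demo_node *node)
{
	free(node->stat);
	node->stat = NULL;
}
```

Returning a `const` pointer from the view helper is what lets a single zeroed instance be shared safely between all nodes that never recorded branch statistics.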
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: liuwenyu <liuwenyu7@huawei.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: Liam Howlett <liam.howlett@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Link: https://lore.kernel.org/r/20231024222353.3024098-10-irogers@google.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf callchain: Make display use of branch_type_stat const · d47d876d
      Ian Rogers authored
      Display code doesn't modify the branch_type_stat so switch uses to
      const. This is done to aid refactoring struct callchain_list where
      currently the branch_type_stat is embedded even if not used.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: liuwenyu <liuwenyu7@huawei.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: Liam Howlett <liam.howlett@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Link: https://lore.kernel.org/r/20231024222353.3024098-9-irogers@google.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf offcpu: Add missed btf_free · 67a3ebf1
      Ian Rogers authored
      Caught by address/leak sanitizer.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: liuwenyu <liuwenyu7@huawei.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: Liam Howlett <liam.howlett@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Link: https://lore.kernel.org/r/20231024222353.3024098-8-irogers@google.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf threads: Remove unused dead thread list · 7b2e444b
      Ian Rogers authored
      Commit 40826c45 ("perf thread: Remove notion of dead threads")
      removed dead threads but the list head wasn't removed. Remove it here.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: liuwenyu <liuwenyu7@huawei.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: Liam Howlett <liam.howlett@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Link: https://lore.kernel.org/r/20231024222353.3024098-7-irogers@google.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf hist: Add missing puts to hist__account_cycles · c1149037
      Ian Rogers authored
      Caught using reference count checking on perf top with
      "--call-graph=lbr". After this no memory leaks were detected.
      
      Fixes: 57849998 ("perf report: Add processing for cycle histograms")
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: liuwenyu <liuwenyu7@huawei.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: Liam Howlett <liam.howlett@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Link: https://lore.kernel.org/r/20231024222353.3024098-6-irogers@google.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • libperf rc_check: Add RC_CHK_EQUAL · 78c32f4c
      Ian Rogers authored
      Comparing pointers under reference count checking without triggering a
      SEGV is tricky. Add a convenience macro to simplify this, and use it.
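To illustrate why such a comparison needs care, here is a sketch under the assumption that, with checking enabled, each reference is a separate wrapper that points at the shared underlying object. The `demo_*` names and the `DEMO_HANDLE_EQUAL` macro are hypothetical and only model the idea; they are not the libperf definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical underlying object. */
struct demo_obj {
	int id;
};

/* Hypothetical checking handle: every reference gets its own wrapper,
 * so two handles to the same object have different addresses. */
struct demo_handle {
	struct demo_obj *orig;
};

/* NULL-safe equality: either the very same handle (including both NULL),
 * or two non-NULL handles that wrap the same underlying object.
 * The NULL checks come first so that 'orig' is never read through NULL. */
#define DEMO_HANDLE_EQUAL(a, b) \
	((a) == (b) || ((a) && (b) && (a)->orig == (b)->orig))
```

A plain `a == b` would report two handles to the same object as different, and a plain `a->orig == b->orig` would fault when either side is NULL, which is the trap the commit message refers to.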
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: liuwenyu <liuwenyu7@huawei.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: Liam Howlett <liam.howlett@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Link: https://lore.kernel.org/r/20231024222353.3024098-5-irogers@google.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • libperf rc_check: Make implicit enabling work for GCC · 75265320
      Ian Rogers authored
      Make the implicit REFCOUNT_CHECKING robust when building with GCC.
      
      Fixes: 9be6ab18 ("libperf rc_check: Enable implicitly with sanitizers")
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: liuwenyu <liuwenyu7@huawei.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: Liam Howlett <liam.howlett@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Link: https://lore.kernel.org/r/20231024222353.3024098-4-irogers@google.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf machine: Avoid out of bounds LBR memory read · ab8ce150
      Ian Rogers authored
      Running perf top with address sanitizer and "--call-graph=lbr" fails
      due to reading sample 0 when no samples exist. Add a guard to prevent
      this.
      
      Fixes: e2b23483 ("perf machine: Factor out lbr_callchain_add_lbr_ip()")
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: liuwenyu <liuwenyu7@huawei.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: Liam Howlett <liam.howlett@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Link: https://lore.kernel.org/r/20231024222353.3024098-3-irogers@google.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf rwsem: Add debug mode that uses a mutex · 7a8f349e
      Ian Rogers authored
      Mutex error checking will catch attempts to take the lock recursively and
      other problems that a rwlock won't. At the expense of concurrency, add a
      debug mode that uses a mutex in place of a rwsem.
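The kind of error detection mentioned here can be sketched with a POSIX error-checking mutex. This is an illustration only: the `demo_debug_lock` type and its helpers are hypothetical and are not the perf rwsem code. With `PTHREAD_MUTEX_ERRORCHECK`, relocking from the owning thread returns `EDEADLK` instead of deadlocking, and unlocking a mutex the thread does not hold returns `EPERM`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical debug lock: an error-checking mutex standing in for a
 * reader/writer lock. */
struct demo_debug_lock {
	pthread_mutex_t mtx;
};

/* Initialise the mutex with the error-checking type; returns 0 or an errno. */
static int demo_debug_lock_init(struct demo_debug_lock *l)
{
	pthread_mutexattr_t attr;
	int err = pthread_mutexattr_init(&attr);

	if (err)
		return err;
	err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (!err)
		err = pthread_mutex_init(&l->mtx, &attr);
	pthread_mutexattr_destroy(&attr);
	return err;
}

/* Readers (and writers) both take the mutex, so concurrency is lost. */
static int demo_debug_lock_read(struct demo_debug_lock *l)
{
	return pthread_mutex_lock(&l->mtx);
}

/* Returns EPERM if the calling thread does not hold the lock. */
static int demo_debug_lock_unlock(struct demo_debug_lock *l)
{
	return pthread_mutex_unlock(&l->mtx);
}
```

The trade-off matches the commit message: readers no longer run concurrently, but misuse surfaces as an error code at the offending call site.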
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: liuwenyu <liuwenyu7@huawei.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: Liam Howlett <liam.howlett@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Link: https://lore.kernel.org/r/20231024222353.3024098-2-irogers@google.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf build: Address stray '\' before # that is warned about since grep 3.8 · b27778ed
      Arnaldo Carvalho de Melo authored
      To address this grep 3.8 warning:
      
        grep: warning: stray \ before #
      
      We needed to remove the '' around the grep expression and keep the \
      before # so that it is escaped by the $(shell grep ...) and thus doesn't
      get to grep.
      
      We need that \ before the #, otherwise we get this:
      
        Makefile.perf:364: *** unterminated call to function 'shell': missing ')'.  Stop.
      
      As everything after the # will be considered a comment.
      
      Removing the single quotes needs some more escaping so that _some_ of
      the escaped chars get to grep, like the '\|' that becomes '\\\|'.
      
      Running on debian:10, where there is no libtraceevent-devel available,
      we get:
      
        Makefile.perf:367: *** PYTHON_EXT_SRCS= util/python.c ../lib/ctype.c util/cap.c util/evlist.c util/evsel.c util/evsel_fprintf.c util/perf_event_attr_fprintf.c util/cpumap.c util/memswap.c util/mmap.c util/namespaces.c ../lib/bitmap.c ../lib/find_bit.c ../lib/list_sort.c ../lib/hweight.c ../lib/string.c ../lib/vsprintf.c util/thread_map.c util/util.c util/cgroup.c util/parse-branch-options.c util/rblist.c util/counts.c util/print_binary.c util/strlist.c ../lib/rbtree.c util/string.c util/symbol_fprintf.c util/units.c util/affinity.c util/rwsem.c util/hashmap.c util/perf_regs.c util/fncache.c util/perf-regs-arch/perf_regs_aarch64.c util/perf-regs-arch/perf_regs_arm.c util/perf-regs-arch/perf_regs_csky.c util/perf-regs-arch/perf_regs_loongarch.c util/perf-regs-arch/perf_regs_mips.c util/perf-regs-arch/perf_regs_powerpc.c util/perf-regs-arch/perf_regs_riscv.c util/perf-regs-arch/perf_regs_s390.c util/perf-regs-arch/perf_regs_x86.c.  Stop.
        make[1]: *** [Makefile.perf:242: sub-make] Error 2
      
      I.e. both the comments and the util/trace-event.c were removed.
      
      When using:
      
      msg := $(error PYTHON_EXT_SRCS=$(PYTHON_EXT_SRCS))
      
      While on the more recent fedora:38, with the new grep and make packages
      and libtraceevent-devel installed:
      
        Makefile.perf:367: *** PYTHON_EXT_SRCS= util/python.c ../lib/ctype.c util/cap.c util/evlist.c util/evsel.c util/evsel_fprintf.c util/perf_event_attr_fprintf.c util/cpumap.c util/memswap.c util/mmap.c util/namespaces.c ../lib/bitmap.c ../lib/find_bit.c ../lib/list_sort.c ../lib/hweight.c ../lib/string.c ../lib/vsprintf.c util/thread_map.c util/util.c util/cgroup.c util/parse-branch-options.c util/rblist.c util/counts.c util/print_binary.c util/strlist.c util/trace-event.c ../lib/rbtree.c util/string.c util/symbol_fprintf.c util/units.c util/affinity.c util/rwsem.c util/hashmap.c util/perf_regs.c util/fncache.c util/perf-regs-arch/perf_regs_aarch64.c util/perf-regs-arch/perf_regs_arm.c util/perf-regs-arch/perf_regs_csky.c util/perf-regs-arch/perf_regs_loongarch.c util/perf-regs-arch/perf_regs_mips.c util/perf-regs-arch/perf_regs_powerpc.c util/perf-regs-arch/perf_regs_riscv.c util/perf-regs-arch/perf_regs_s390.c util/perf-regs-arch/perf_regs_x86.c.  Stop.
        make[1]: *** [Makefile.perf:242: sub-make] Error 2
        make: *** [Makefile:113: install-bin] Error 2
        make: Leaving directory '/home/acme/git/perf-tools-next/tools/perf'
        $
      
      I.e. only the comments were removed.
      
      If we build it on the same fedora:38 system, but using NO_LIBTRACEEVENT=1
      
        $ make NO_LIBTRACEEVENT=1 CORESIGHT=1 O=/tmp/build/$(basename $PWD) -C tools/perf install-bin
        Makefile.perf:367: *** PYTHON_EXT_SRCS= util/python.c ../lib/ctype.c util/cap.c util/evlist.c util/evsel.c util/evsel_fprintf.c util/perf_event_attr_fprintf.c util/cpumap.c util/memswap.c util/mmap.c util/namespaces.c ../lib/bitmap.c ../lib/find_bit.c ../lib/list_sort.c ../lib/hweight.c ../lib/string.c ../lib/vsprintf.c util/thread_map.c util/util.c util/cgroup.c util/parse-branch-options.c util/rblist.c util/counts.c util/print_binary.c util/strlist.c ../lib/rbtree.c util/string.c util/symbol_fprintf.c util/units.c util/affinity.c util/rwsem.c util/hashmap.c util/perf_regs.c util/fncache.c util/perf-regs-arch/perf_regs_aarch64.c util/perf-regs-arch/perf_regs_arm.c util/perf-regs-arch/perf_regs_csky.c util/perf-regs-arch/perf_regs_loongarch.c util/perf-regs-arch/perf_regs_mips.c util/perf-regs-arch/perf_regs_powerpc.c util/perf-regs-arch/perf_regs_riscv.c util/perf-regs-arch/perf_regs_s390.c util/perf-regs-arch/perf_regs_x86.c.  Stop.
        make[1]: *** [Makefile.perf:242: sub-make] Error 2
        make: *** [Makefile:113: install-bin] Error 2
        make: Leaving directory '/home/acme/git/perf-tools-next/tools/perf'
        $
      
      Both comments and the util/trace-event.c file removed.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Link: https://lore.kernel.org/r/ZTj6mfM9UqY2DggC@kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf report: Fix hierarchy mode on pipe input · a6e4a4a1
      Namhyung Kim authored
      The hierarchy mode needs to set up output formats for each evsel.
      Normally setup_sorting() handles this at the beginning, but it cannot
      do that if data comes from a pipe since there's no evsel info before
      reading the data.  As a result, perf report cannot process the samples
      in hierarchy mode and behaves as if there are no samples.
      
      Let's check the condition and set up the output formats after reading
      data so that it can find evsels.
      
      Before:
      
        $ ./perf record -o- true | ./perf report -i- --hierarchy -q
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.000 MB - ]
        Error:
        The - data has no samples!
      
      After:
      
        $ ./perf record -o- true | ./perf report -i- --hierarchy -q
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.000 MB - ]
            94.76%        true
               94.76%        [kernel.kallsyms]
                  94.76%        [k] filemap_fault
             5.24%        perf-ex
                5.24%        [kernel.kallsyms]
                   5.06%        [k] __memset
                   0.18%        [k] native_write_msr
      Acked-by: Ian Rogers <irogers@google.com>
      Link: https://lore.kernel.org/r/20231025003121.2811738-1-namhyung@kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      a6e4a4a1
    • Namhyung Kim's avatar
      perf lock contention: Use per-cpu array map for spinlocks · b5711042
      Namhyung Kim authored
      Currently the lock contention timestamp is maintained in a hash map keyed
      by pid.  That means it needs to get and release a map element (which is
      protected by a spinlock!) on each contention begin and end pair.  This
      can hurt performance if there is a lot of contention (usually from
      spinlocks).
      
      It used to go with task local storage but it had an issue on memory
      allocation in some critical paths.  Although it's addressed in recent
      kernels IIUC, the tool should support old kernels too.  So it cannot
      simply switch to the task local storage at least for now.
      
      As spinlocks create lots of contention and they disable preemption
      during the spinning, it can use a per-cpu array to keep the timestamp
      and avoid the overhead of hashmap update and delete.
      
      In contention_begin, it's easy to check the lock type since it can see
      the flags.  But contention_end cannot see them.  So let's try the per-cpu
      array first (unconditionally) and see if it has an active element
      (lock != 0).  If so, it should be used, and the per-task tstamp map
      should not be used until the per-cpu array element is cleared, which
      means the nested spinlock contention (if any) has finished and it now
      sees the (outer) lock.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Song Liu <song@kernel.org>
      Cc: bpf@vger.kernel.org
      Link: https://lore.kernel.org/r/20231020204741.1869520-3-namhyung@kernel.org
      b5711042
    • Namhyung Kim's avatar
      perf lock contention: Check race in tstamp elem creation · 6a070573
      Namhyung Kim authored
      When pelem is NULL, it'd create a new entry with zero data.  But it
      might be preempted by an IRQ/NMI just before calling
      bpf_map_update_elem(), so there's a chance to call it twice for the
      same pid.  So it'd be better to use the BPF_NOEXIST flag and check the
      return value to prevent the race.
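      A userspace sketch of the create-if-absent pattern may help here.  It
      is an assumption-laden toy, not the BPF code from the patch:
      toy_map_update(), toy_map_lookup(), TOY_NOEXIST and before_update_hook
      are hypothetical stand-ins that only mimic the "fail with -EEXIST if the
      key already exists" behaviour of bpf_map_update_elem() with
      BPF_NOEXIST.  The hook simulates an IRQ/NMI landing between the lookup
      and the update.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ANY     0
#define TOY_NOEXIST 1	/* create only, fail if the key is already present */
#define TOY_MAX_PID 16

struct tstamp_data {
	uint64_t timestamp;
	uint64_t lock;
};

static struct tstamp_data slots[TOY_MAX_PID];
static int used[TOY_MAX_PID];

/* Test knob: called between lookup and update to simulate an interrupt. */
static void (*before_update_hook)(uint32_t pid);

static int toy_map_update(uint32_t pid, const struct tstamp_data *val, int flags)
{
	if (pid >= TOY_MAX_PID)
		return -EINVAL;
	if (flags == TOY_NOEXIST && used[pid])
		return -EEXIST;
	slots[pid] = *val;
	used[pid] = 1;
	return 0;
}

static struct tstamp_data *toy_map_lookup(uint32_t pid)
{
	return (pid < TOY_MAX_PID && used[pid]) ? &slots[pid] : NULL;
}

/* Simulated interrupt handler that creates the entry for the same pid. */
static void fake_irq(uint32_t pid)
{
	struct tstamp_data v = { .timestamp = 7, .lock = 0xB };

	toy_map_update(pid, &v, TOY_NOEXIST);
}

/* Returns the element if this caller created or found it, NULL if it lost the race. */
static struct tstamp_data *get_tstamp_elem(uint32_t pid)
{
	struct tstamp_data *pelem = toy_map_lookup(pid);

	if (pelem == NULL) {
		struct tstamp_data zero = { 0 };

		if (before_update_hook)
			before_update_hook(pid);

		/* NOEXIST makes the second creator fail instead of overwriting */
		if (toy_map_update(pid, &zero, TOY_NOEXIST) < 0)
			return NULL;

		pelem = toy_map_lookup(pid);
	}
	return pelem;
}
```

      Without the create-only flag the second update would silently
      overwrite the entry written by the interrupting context; with it, the
      loser of the race sees an error and backs off.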
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Song Liu <song@kernel.org>
      Cc: bpf@vger.kernel.org
      Link: https://lore.kernel.org/r/20231020204741.1869520-2-namhyung@kernel.org
      6a070573
    • Namhyung Kim's avatar
      perf lock contention: Clear lock addr after use · d99317f2
      Namhyung Kim authored
      It checks the current lock to calculate the delta of contention time.
      The address is saved in the tstamp map, which is allocated at the
      beginning of contention and released at the end of contention.
      
      But it's possible for bpf_map_delete_elem() to fail.  In that case, the
      element in the tstamp map is kept for the current lock and it makes the
      next contention for the same lock tracked incorrectly.  Specifically,
      the next contention begin will see the existing element for the task and
      it'd just return.  Then the next contention end will see the element and
      calculate the time using the timestamp from the previous begin.
      
      This can result in a large value for two small contentions that happen
      from time to time.  Let's clear the lock address so that it can be
      updated next time even if bpf_map_delete_elem() fails.
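      The failure mode can be reproduced with a small userspace model.  This
      is a sketch under assumptions, not the code from the patch: a single
      global element stands in for one task's entry in the tstamp map,
      toy_map_delete() and the delete_fails knob are hypothetical, and the
      clear_lock argument only exists to compare the behaviour with and
      without the fix.

```c
#include <assert.h>
#include <stdint.h>

struct tstamp_data {
	uint64_t timestamp;
	uint64_t lock;		/* 0 means "not tracking any lock" */
};

static struct tstamp_data elem;	/* one task's entry in the toy map */
static int elem_present;
static int delete_fails;	/* test knob: make the delete fail */

static int toy_map_delete(void)
{
	if (delete_fails)
		return -1;
	elem_present = 0;
	return 0;
}

static void contention_begin(uint64_t lock, uint64_t now)
{
	/* An element with a lock address means a contention is in flight. */
	if (elem_present && elem.lock != 0)
		return;

	elem_present = 1;
	elem.timestamp = now;
	elem.lock = lock;
}

static uint64_t contention_end(uint64_t lock, uint64_t now, int clear_lock)
{
	uint64_t delta;

	if (!elem_present || elem.lock != lock)
		return 0;

	delta = now - elem.timestamp;

	/* The fix: forget the lock address even if the delete below fails. */
	if (clear_lock)
		elem.lock = 0;

	toy_map_delete();
	return delta;
}
```

      With the delete forced to fail and no clearing, a second short
      contention on the same lock is measured from the first begin, so the
      two small waits show up as one large value.  With clearing, the stale
      element is reused and the second begin updates the timestamp.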
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Song Liu <song@kernel.org>
      Cc: bpf@vger.kernel.org
      Link: https://lore.kernel.org/r/20231020204741.1869520-1-namhyung@kernel.org
      d99317f2