- 10 Nov, 2022 1 commit
-
-
Eduard Zingerman authored
An update for libbpf's hashmap interface from void* -> void* to a polymorphic one, allowing both long and void* keys and values. This simplifies many use cases in libbpf as hashmaps there are mostly integer to integer. Perf copies hashmap implementation from libbpf and has to be updated as well. Changes to libbpf, selftests/bpf and perf are packed as a single commit to avoid compilation issues with any future bisect. Polymorphic interface is acheived by hiding hashmap interface functions behind auxiliary macros that take care of necessary type casts, for example: #define hashmap_cast_ptr(p) \ ({ \ _Static_assert((p) == NULL || sizeof(*(p)) == sizeof(long),\ #p " pointee should be a long-sized integer or a pointer"); \ (long *)(p); \ }) bool hashmap_find(const struct hashmap *map, long key, long *value); #define hashmap__find(map, key, value) \ hashmap_find((map), (long)(key), hashmap_cast_ptr(value)) - hashmap__find macro casts key and value parameters to long and long* respectively - hashmap_cast_ptr ensures that value pointer points to a memory of appropriate size. This hack was suggested by Andrii Nakryiko in [1]. This is a follow up for [2]. [1] https://lore.kernel.org/bpf/CAEf4BzZ8KFneEJxFAaNCCFPGqp20hSpS2aCj76uRk3-qZUH5xg@mail.gmail.com/ [2] https://lore.kernel.org/bpf/af1facf9-7bc8-8a3d-0db4-7b3f333589a2@meta.com/T/#m65b28f1d6d969fcd318b556db6a3ad499a42607d Signed-off-by:
Eduard Zingerman <eddyz87@gmail.com> Signed-off-by:
Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20221109142611.879983-2-eddyz87@gmail.com
-
- 06 Oct, 2022 1 commit
-
-
Namhyung Kim authored
For system-wide evsels, the thread map should be dummy - i.e. it has a single entry of -1. But the code guarantees such a thread map, so no need to handle it specially. No functional change intended. Reviewed-by:
Adrian Hunter <adrian.hunter@intel.com> Signed-off-by:
Namhyung Kim <namhyung@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20221003204647.1481128-6-namhyung@kernel.org Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 04 Oct, 2022 3 commits
-
-
Adrian Hunter authored
Add debug messages to enable scripts to track aspects of 'perf record' behaviour. The messages will be consumed after 'perf record' has run, with the exception of "perf record has started" which is consequently flushed. Put comments so developers know which messages are also being used by test scripts. Signed-off-by:
Adrian Hunter <adrian.hunter@intel.com> Acked-by:
Namhyung Kim <namhyung@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20220912083412.7058-11-adrian.hunter@intel.com Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
Namhyung Kim authored
As we want to see the number of lost samples in the perf report, set the LOST format when it configs evsel. On old kernels, it'd fallback to disable it. Reviewed-by:
Adrian Hunter <adrian.hunter@intel.com> Signed-off-by:
Namhyung Kim <namhyung@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220901195739.668604-3-namhyung@kernel.org Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
Ian Rogers authored
When libbpf is present the build uses definitions in libbpf hashmap.c, however, libbpf's hashmap.h wasn't being used. Switch to using the correct hashmap.h dependent on the define HAVE_LIBBPF_SUPPORT. This was the original intent in: https://lore.kernel.org/lkml/20200515221732.44078-8-irogers@google.com/ Signed-off-by:
Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20220824050604.352156-1-irogers@google.com Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 19 Aug, 2022 1 commit
-
-
Namhyung Kim authored
The recent kernel added lost count can be read from either read(2) or ring buffer data with PERF_SAMPLE_READ. As it's a variable length data we need to access it according to the format info. But for perf tools use cases, PERF_FORMAT_ID is always set. So we can only check PERF_FORMAT_LOST bit to determine the data format. Add sample_read_value_size() and next_sample_read_value() helpers to make it a bit easier to access. Use them in all places where it reads the struct sample_read_value. Signed-off-by:
Namhyung Kim <namhyung@kernel.org> Acked-by:
Jiri Olsa <jolsa@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220819003644.508916-5-namhyung@kernel.org Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 29 Jul, 2022 1 commit
-
-
Kan Liang authored
The commit 55bcf6ef ("perf: Extend PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE") extends the two types to become PMU aware types for a hybrid system. However, current evsel__hw_name doesn't take the PMU type into account. It mistakenly returns the "unknown-hardware" for the hardware event with a specific PMU type. Add an arch specific arch_evsel__hw_name() to specially handle the PMU aware hardware event. Currently, the extend PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE is only supported by X86. Only implement the specific arch_evsel__hw_name() for X86 in the patch. Nothing is changed for the other archs. Signed-off-by:
Kan Liang <kan.liang@linux.intel.com> Acked-by:
Ian Rogers <irogers@google.com> Acked-by:
Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220721065706.2886112-3-zhengjun.xing@linux.intel.com Signed-off-by:
Xing Zhengjun <zhengjun.xing@linux.intel.com> Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 20 Jul, 2022 2 commits
-
-
Adrian Hunter authored
When parsing a sample with a sample ID, copy machine_pid and vcpu from perf_sample_id to perf_sample. Note, machine_pid will be zero when unused, so only a non-zero value represents a guest machine. vcpu should be ignored if machine_pid is zero. Note also, machine_pid is used with events that have come from injecting a guest perf.data file, however guest events recorded on the host (i.e. using perf kvm) have the (QEMU) hypervisor process pid to identify them - refer machines__find_for_cpumode(). Signed-off-by:
Adrian Hunter <adrian.hunter@intel.com> Acked-by:
Ian Rogers <irogers@google.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: kvm@vger.kernel.org Link: https://lore.kernel.org/r/20220711093218.10967-14-adrian.hunter@intel.com Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
Factor out evsel__id_hdr_size() so it can be reused. This is needed by perf inject. When injecting events from a guest perf.data file, there is a possibility that the sample ID numbers conflict. To re-write an ID sample, the old one needs to be removed first, which means determining how big it is with evsel__id_hdr_size() and then subtracting that from the event size. Signed-off-by:
Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: kvm@vger.kernel.org Link: https://lore.kernel.org/r/20220711093218.10967-6-adrian.hunter@intel.com Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 28 Jun, 2022 1 commit
-
-
Namhyung Kim authored
As offcpu-time event is synthesized at the end, it could not get the all the sample info. Define OFFCPU_SAMPLE_TYPES for allowed ones and mask out others in evsel__config() to prevent parse errors. Because perf sample parsing assumes a specific ordering with the sample types, setting unsupported one would make it fail to read data like perf record -d/--data. Fixes: edc41a10 ("perf record: Enable off-cpu analysis with BPF") Signed-off-by:
Namhyung Kim <namhyung@kernel.org> Cc: Blake Jones <blakejones@google.com> Cc: Hao Luo <haoluo@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: bpf@vger.kernel.org Link: http://lore.kernel.org/lkml/20220624231313.367909-3-namhyung@kernel.org Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 24 Jun, 2022 1 commit
-
-
Ravi Bangoria authored
Samples without an L3 miss are discarded and counter is reset with random value (between 1-15 for fetch PMU and 1-127 for op PMU) when IBS L3 miss filtering is enabled. This causes a sampling period skew but there is no way to reconstruct aggregated sampling period. So print a warning at perf record if user sets l3missonly=1. Ex: # perf record -c 10000 -C 0 -e ibs_op/l3missonly=1/ WARNING: Hw internally resets sampling period when L3 Miss Filtering is enabled and tagged operation does not cause L3 Miss. This causes sampling period skew. Signed-off-by:
Ravi Bangoria <ravi.bangoria@amd.com> Acked-by:
Ian Rogers <irogers@google.com> Acked-by:
Namhyung Kim <namhyung@kernel.org> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <rrichter@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: like.xu.linux@gmail.com Cc: x86@kernel.org Link: http://lore.kernel.org/lkml/20220604044519.594-2-ravi.bangoria@amd.com Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 26 May, 2022 3 commits
-
-
James Clark authored
Architectures can detect availability of extra registers at runtime so use this more complete set for unwinding. This will include the VG register on arm64 in a later commit. If the function isn't implemented then PERF_REGS_MASK is returned and there is no change. Committer notes: Added util/perf_regs.c to tools/perf/util/python-ext-sources so that 'perf test python' passes, i.e. the perf python binding has all the symbols it needs, addressing: $ perf test -v python 19: 'import perf' in python : --- start --- test child forked, pid 2037817 python usage test: "echo "import sys ; sys.path.append('/tmp/build/perf/python'); import perf" | '/usr/bin/python3' " Traceback (most recent call last): File "<stdin>", line 1, in <module> ImportError: /tmp/build/perf/python/perf.cpython-310-x86_64-linux-gnu.so: undefined symbol: arch__user_reg_mask test child finished with -1 ---- end ---- 'import perf' in python: FAILED! $ Reviewed-by:
Leo Yan <leo.yan@linaro.org> Signed-off-by:
James Clark <james.clark@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Mark Brown <broonie@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20220525154114.718321-4-james.clark@arm.com Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
Namhyung Kim authored
Currently evsel__new_idx() sets more sample_type bits when it finds a BPF-output event. But it should honor what's recorded in the perf data file rather than blindly sets the bits. Otherwise it could lead to a parse error when it recorded with a modified sample_type. Signed-off-by:
Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Blake Jones <blakejones@google.com> Cc: Hao Luo <haoluo@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20220518224725.742882-2-namhyung@kernel.org Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
Uncore events require a CPU i.e. it cannot be -1. The evsel system_wide flag is intended for events that should be on every CPU, which does not make sense for uncore events because uncore events do not map one-to-one with CPUs. These 2 requirements are not exactly the same, so introduce a new flag 'requires_cpu' for the uncore case. Signed-off-by:
Adrian Hunter <adrian.hunter@intel.com> Acked-by:
Ian Rogers <irogers@google.com> Acked-by:
Namhyung Kim <namhyung@kernel.org> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Link: https://lore.kernel.org/r/20220524075436.29144-13-adrian.hunter@intel.com Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 20 May, 2022 1 commit
-
-
Kan Liang authored
If any member in a group has a different cpu mask than the other members, the current perf stat disables group. when the perf metrics topdown events are part of the group, the below <not supported> error will be triggered. $ perf stat -e "{slots,topdown-retiring,uncore_imc_free_running_0/dclk/}" -a sleep 1 WARNING: grouped events cpus do not match, disabling group: anon group { slots, topdown-retiring, uncore_imc_free_running_0/dclk/ } Performance counter stats for 'system wide': 141,465,174 slots <not supported> topdown-retiring 1,605,330,334 uncore_imc_free_running_0/dclk/ The perf metrics topdown events must always be grouped with a slots event as leader. Factor out evsel__remove_from_group() to only remove the regular events from the group. Remove evsel__must_be_in_group(), since no one use it anymore. With the patch, the topdown events aren't broken from the group for the splitting. $ perf stat -e "{slots,topdown-retiring,uncore_imc_free_running_0/dclk/}" -a sleep 1 WARNING: grouped events cpus do not match, disabling group: anon group { slots, topdown-retiring, uncore_imc_free_running_0/dclk/ } Performance counter stats for 'system wide': 346,110,588 slots 124,608,256 topdown-retiring 1,606,869,976 uncore_imc_free_running_0/dclk/ 1.003877592 seconds time elapsed Fixes: a9a17902 ("perf stat: Ensure group is defined on top of the same cpu mask") Signed-off-by:
Kan Liang <kan.liang@linux.intel.com> Acked-by:
Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20220518143900.1493980-3-kan.liang@linux.intel.com Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 17 May, 2022 1 commit
-
-
Ian Rogers authored
On Intel Icelake, topdown events must always be grouped with a slots event as leader. When a metric is parsed a weak group is formed and retried if perf_event_open fails. The retried events aren't grouped breaking the slots leader requirement. This change modifies the weak group "reset" behavior so that topdown events aren't broken from the group for the retry. $ perf stat -e '{slots,topdown-bad-spec,topdown-be-bound,topdown-fe-bound,topdown-retiring,branch-instructions,branch-misses,bus-cycles,cache-misses,cache-references,cpu-cycles,instructions,mem-loads,mem-stores,ref-cycles,baclears.any,ARITH.DIVIDER_ACTIVE}:W' -a sleep 1 Performance counter stats for 'system wide': 47,867,188,483 slots (92.27%) <not supported> topdown-bad-spec <not supported> topdown-be-bound <not supported> topdown-fe-bound <not supported> topdown-retiring 2,173,346,937 branch-instructions (92.27%) 10,540,253 branch-misses # 0.48% of all branches (92.29%) 96,291,140 bus-cycles (92.29%) 6,214,202 cache-misses # 20.120 % of all cache refs (92.29%) 30,886,082 cache-references (76.91%) 11,773,726,641 cpu-cycles (84.62%) 11,807,585,307 instructions # 1.00 insn per cycle (92.31%) 0 mem-loads (92.32%) 2,212,928,573 mem-stores (84.69%) 10,024,403,118 ref-cycles (92.35%) 16,232,978 baclears.any (92.35%) 23,832,633 ARITH.DIVIDER_ACTIVE (84.59%) 0.981070734 seconds time elapsed After: $ perf stat -e '{slots,topdown-bad-spec,topdown-be-bound,topdown-fe-bound,topdown-retiring,branch-instructions,branch-misses,bus-cycles,cache-misses,cache-references,cpu-cycles,instructions,mem-loads,mem-stores,ref-cycles,baclears.any,ARITH.DIVIDER_ACTIVE}:W' -a sleep 1 Performance counter stats for 'system wide': 31040189283 slots (92.27%) 8997514811 topdown-bad-spec # 28.2% bad speculation (92.27%) 10997536028 topdown-be-bound # 34.5% backend bound (92.27%) 4778060526 topdown-fe-bound # 15.0% frontend bound (92.27%) 7086628768 topdown-retiring # 22.2% retiring (92.27%) 1417611942 branch-instructions (92.26%) 5285529 branch-misses # 0.37% of all branches (92.28%) 62922469 bus-cycles (92.29%) 1440708 cache-misses # 8.292 % of all cache refs (92.30%) 17374098 cache-references (76.94%) 8040889520 cpu-cycles (84.63%) 7709992319 instructions # 0.96 insn per cycle (92.32%) 0 mem-loads (92.32%) 1515669558 mem-stores (84.68%) 6542411177 ref-cycles (92.35%) 4154149 baclears.any (92.35%) 20556152 ARITH.DIVIDER_ACTIVE (84.59%) 1.010799593 seconds time elapsed Reviewed-by:
Kan Liang <kan.liang@linux.intel.com> Signed-off-by:
Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Florian Fischer <florian.fischer@muhq.space> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com> Cc: Stephane Eranian <eranian@google.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20220517052724.283874-2-irogers@google.com Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 09 May, 2022 2 commits
-
-
Ian Rogers authored
Convert to and from a string. Fix evsel__tool_name() as array is off-by-1. Support more than just duration_time as a metric-id. Fixes: 75eafc97 ("perf list: Print all available tool events") Signed-off-by:
Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Florian Fischer <florian.fischer@muhq.space> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com> Cc: Stephane Eranian <eranian@google.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20220507053410.3798748-4-irogers@google.com Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
Ian Rogers authored
Remove public definition of evsel__tool_names(). Not used outside util/evsel.c. Signed-off-by:
Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Florian Fischer <florian.fischer@muhq.space> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com> Cc: Stephane Eranian <eranian@google.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20220507053410.3798748-3-irogers@google.com Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 20 Apr, 2022 1 commit
-
-
Florian Fischer authored
Introduce names for the new tool events 'user_time' and 'system_time'. $ perf list ... duration_time [Tool event] user_time [Tool event] system_time [Tool event] ... Committer testing: Before: $ perf list | grep Tool duration_time [Tool event] $ After: $ perf list | grep Tool duration_time [Tool event] user_time [Tool event] system_time [Tool event] $ Signed-off-by:
Florian Fischer <florian.fischer@muhq.space> Tested-by:
Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: http://lore.kernel.org/lkml/20220420174244.1741958-2-florian.fischer@muhq.space Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 26 Mar, 2022 1 commit
-
-
Kim Phillips authored
Improve the error message returned on failed perf_event_open() on AMD systems when using IBS (Instruction-Based Sampling). Output of executing 'perf record -e ibs_op// true' as a non root user BEFORE this patch (perf will add the 'u' modifier at the end to exclude kernel/hypervisor sampling): The sys_perf_event_open() syscall returned with 22 (Invalid argument)for event (ibs_op//u). /bin/dmesg | grep -i perf may provide additional information. Output after: AMD IBS can't exclude kernel events. Try running at a higher privilege level. Output of executing 'sudo perf record -e ibs_op// true' BEFORE this patch: Error: The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (ibs_op//). /bin/dmesg | grep -i perf may provide additional information. Output after: Error: Invalid event (ibs_op//) in per-thread mode, enable system wide with '-a'. Folowing the suggestion: $ sudo perf record -a -e ibs_op// true [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.664 MB perf.data (194 samples) ] $ Signed-off-by:
Kim Phillips <kim.phillips@amd.com> Tested-by:
Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: João Martins <joao.m.martins@oracle.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20220322221517.2510440-12-eranian@google.com Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 22 Mar, 2022 1 commit
-
-
Kim Phillips authored
It's possible to have an evsel and evsel->evlist populated without an evsel->evlist->env, when, e.g., cmd_record is in its error path. Future patches will add support for evsel__open_strerror to be able to customize error messaging based on perf_env__{arch,cpuid}, so let's have evsel__env return &perf_env instead of NULL in that case. Reviewed-by:
Kajol Jain <kjain@linux.ibm.com> Signed-off-by:
Kim Phillips <kim.phillips@amd.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20211004214114.188477-1-kim.phillips@amd.com Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 07 Mar, 2022 1 commit
-
-
James Clark authored
EOPNOTSUPP is a possible return value when branch stacks are requested but they aren't enabled in the kernel or hardware. It's also returned if they aren't supported on the specific event type. The currently printed error message about sampling/overflow-interrupts is not correct in this case. Add a check for branch stacks before sample_period is checked because sample_period is also set (to the default value) when using branch stacks. Before this change (when branch stacks aren't supported): perf record -j any Error: cycles: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat' After this change: perf record -j any Error: cycles: PMU Hardware or event type doesn't support branch stack sampling. Signed-off-by:
James Clark <james.clark@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20220307171917.2555829-2-james.clark@arm.com Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 22 Jan, 2022 2 commits
-
-
German Gomez authored
A previous patch preventing "attr->sample_period" values from being overridden in pfm events changed a related behaviour in arm-spe. Before said patch: perf record -c 10000 -e arm_spe_0// -- sleep 1 Would yield an SPE event with period=10000. After the patch, the period in "-c 10000" was being ignored because the arm-spe code initializes sample_period to a non-zero value. This patch restores the previous behaviour for non-libpfm4 events. Fixes: ae5dcc8a (“perf record: Prevent override of attr->sample_period for libpfm4 events”) Reported-by:
Chase Conklin <chase.conklin@arm.com> Signed-off-by:
German Gomez <german.gomez@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20220118144054.2541-1-german.gomez@arm.com Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
Ian Rogers authored
Switch from directly accessing the perf_cpu_map to using the appropriate libperf API when possible. Using the API simplifies the job of refactoring use of perf_cpu_map. Signed-off-by:
Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: André Almeida <andrealmeid@collabora.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miaoqian Lin <linmq006@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Brennan <stephen.s.brennan@oracle.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yury Norov <yury.norov@gmail.com> Link: http://lore.kernel.org/lkml/20220122045811.3402706-3-irogers@google.com Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 12 Jan, 2022 7 commits
-
-
Ian Rogers authored
A common problem is confusing CPU map indices with the CPU, by wrapping the CPU with a struct then this is avoided. This approach is similar to atomic_t. Committer notes: To make it build with BUILD_BPF_SKEL=1 these files needed the conversions to 'struct perf_cpu' usage: tools/perf/util/bpf_counter.c tools/perf/util/bpf_counter_cgroup.c tools/perf/util/bpf_ftrace.c Also perf_env__get_cpu() was removed back in "perf cpumap: Switch cpu_map__build_map to cpu function". Additionally these needed to be fixed for the ARM builds to complete: tools/perf/arch/arm/util/cs-etm.c tools/perf/arch/arm64/util/pmu.c Suggested-by:
John Garry <john.garry@huawei.com> Signed-off-by:
Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Vineet Singh <vineet.singh@intel.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: zhengjun.xing@intel.com Link: https://lore.kernel.org/r/20220105061351.120843-49-irogers@google.com Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
Ian Rogers authored
Make naming less error prone. Signed-off-by:
Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Vineet Singh <vineet.singh@intel.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: zhengjun.xing@intel.com Link: https://lore.kernel.org/r/20220105061351.120843-40-irogers@google.com Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
Ian Rogers authored
Move to being static. Signed-off-by:
Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Vineet Singh <vineet.singh@intel.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: zhengjun.xing@intel.com Link: https://lore.kernel.org/r/20220105061351.120843-39-irogers@google.com Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
Ian Rogers authored
CPU is really a cpu map index, change names to make code more intention revealing. Signed-off-by:
Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Vineet Singh <vineet.singh@intel.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: zhengjun.xing@intel.com Link: https://lore.kernel.org/r/20220105061351.120843-38-irogers@google.com Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
Ian Rogers authored
Switch from cpu to cpu_map_idx to reduce confusion. Signed-off-by:
Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Vineet Singh <vineet.singh@intel.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: zhengjun.xing@intel.com Link: https://lore.kernel.org/r/20220105061351.120843-37-irogers@google.com Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
Ian Rogers authored
Passing the number of CPUs and threads allows for an evsel's counts to be mismatched to its cpu map. To avoid this always derive the counts size from the cpu map. Change openat-syscall-all-cpus to set the cpus to allow for this to work. Signed-off-by:
Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Vineet Singh <vineet.singh@intel.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: zhengjun.xing@intel.com Link: https://lore.kernel.org/r/20220105061351.120843-27-irogers@google.com Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
Ian Rogers authored
When a group has multiple events and the leader fails it can yield errors like: $ perf stat -e '{uncore_imc/cas_count_read/},instructions' /bin/true Error: The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (uncore_imc/cas_count_read/). /bin/dmesg | grep -i perf may provide additional information. However, when not the group leader <not supported> is given: $ perf stat -e '{instructions,uncore_imc/cas_count_read/}' /bin/true ... 1,619,057 instructions <not supported> MiB uncore_imc/cas_count_read/ This is necessary because get_group_fd will fail if the leader fails and is the direct result of the check on line 750 of builtin-stat.c in stat_handle_error that returns COUNTER_SKIP for the latter case. This patch improves the error message to: $ perf stat -e '{uncore_imc/cas_count_read/},instructions' /bin/true Error: Invalid event (uncore_imc/cas_count_read/) in per-thread mode, enable system wide with '-a'. v2. Changed the test to use !target__has_cpu as suggested by Namhyung Kim. Signed-off-by:
Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20211223183948.3423989-2-irogers@google.com Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 08 Dec, 2021 1 commit
-
-
Shunsuke Nakamura authored
Move perf_counts_values__scale() from tools/perf/util to tools/lib/perf so that it can be used with libperf. Committer notes: As noted by Jiri, use __s8 instead of s8 on the exported function. Signed-off-by:
Shunsuke Nakamura <nakamura.shun@fujitsu.com> Acked-by:
Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20211109085831.3770594-2-nakamura.shun@fujitsu.com Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 06 Dec, 2021 1 commit
-
-
Masami Hiramatsu authored
Add new '__rel_loc' dynamic data location attribute support. This type attribute is similar to the '__data_loc' but records the offset from the field itself. The libtraceevent adds TEP_FIELD_IS_RELATIVE to the 'tep_format_field::flags' with TEP_FIELD_IS_DYNAMIC for'__rel_loc'. Link: https://lkml.kernel.org/r/163757344810.510314.12449413842136229871.stgit@devnote2 Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Tom Zanussi <zanussi@kernel.org> Signed-off-by:
Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org>
-
- 18 Nov, 2021 1 commit
-
-
Ian Rogers authored
unit may have a strdup pointer or be to a literal, consequently memory assocciated with it isn't freed. Change it so the unit is always strdup and so the memory can be safely freed. Fix related issue in perf_event__process_event_update() for name and own_cpus. Leaks were spotted by leak sanitizer. Signed-off-by:
Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20211118084749.2191447-1-irogers@google.com Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 13 Nov, 2021 1 commit
-
-
Ian Rogers authored
Events like uncore_imc/cas_count_read/ on Skylake open multiple events and then aggregate in the metric leader. To determine the average value per event the number of these events is needed. Add a source_count function that returns this value by counting the number of events with the given metric leader. For most events the value is 1 but for uncore_imc/cas_count_read/ it can yield values like 6. Add a generic test, but manually tested with a test metric that uses the function. Signed-off-by:
Ian Rogers <irogers@google.com> Acked-by:
Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul A . Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Song Liu <song@kernel.org> Cc: Wan Jiabing <wanjiabing@vivo.com> Cc: Yury Norov <yury.norov@gmail.com> Link: https://lore.kernel.org/r/20211111002109.194172-9-irogers@google.com Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 07 Nov, 2021 1 commit
-
-
Ravi Bangoria authored
Perf tool sets exclude_guest by default while calling perf_event_open(). Because IBS does not have filtering capability, it always gets rejected by IBS PMU driver and thus perf falls back to non-precise sampling. Fix it by not setting exclude_guest by default on AMD. Before: $ sudo ./perf record -C 0 -vvv true |& grep precise precise_ip 3 decreasing precise_ip by one (2) precise_ip 2 decreasing precise_ip by one (1) precise_ip 1 decreasing precise_ip by one (0) After: $ sudo ./perf record -C 0 -vvv true |& grep precise precise_ip 3 decreasing precise_ip by one (2) precise_ip 2 Committer notes: Fixup init to zero for perf_env in older compilers: arch/x86/util/evsel.c:15:26: error: missing field 'os_release' initializer [-Werror,-Wmissing-field-initializers] struct perf_env env = {0}; ^ Committer notes: Namhyung remarked: It'd be nice if it can cover explicit "-e cycles:pp" as well. Ravi clarified: For explicit :pp modifier, evsel->precise_max does not get set and thus perf does not try with different attr->precise_ip values while exclude_guest set. So no issue with explicit :pp: $ sudo ./perf record -C 0 -e cycles:pp -vvv |& grep "precise_ip\|exclude_guest" precise_ip 2 exclude_guest 1 precise_ip 2 exclude_guest 1 switching off exclude_guest, exclude_host precise_ip 2 ^C Also, with :P modifier, evsel->precise_max gets set but exclude_guest does not and thus :P also works fine: $ sudo ./perf record -C 0 -e cycles:P -vvv |& grep "precise_ip\|exclude_guest" precise_ip 3 decreasing precise_ip by one (2) precise_ip 2 ^C Reported-by:
Kim Phillips <kim.phillips@amd.com> Signed-off-by:
Ravi Bangoria <ravi.bangoria@amd.com> Acked-by:
Namhyung Kim <namhyung@kernel.org> Tested-by:
Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20211103072112.32312-1-ravi.bangoria@amd.com Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 06 Nov, 2021 1 commit
-
-
Namhyung Kim authored
The current logic for the perf missing feature has a bug that it can wrongly clear some modifiers like G or H. Actually some PMUs don't support any filtering or exclusion while others do. But we check it as a global feature. For example, the cycles event can have 'G' modifier to enable it only in the guest mode on x86. When you don't run any VMs it'll return 0. # perf stat -a -e cycles:G sleep 1 Performance counter stats for 'system wide': 0 cycles:G 1.000721670 seconds time elapsed But when it's used with other pmu events that don't support G modifier, it'll be reset and return non-zero values. # perf stat -a -e cycles:G,msr/tsc/ sleep 1 Performance counter stats for 'system wide': 538,029,960 cycles:G 16,924,010,738 msr/tsc/ 1.001815327 seconds time elapsed This is because of the missing feature detection logic being global. Add a hashmap to set pmu-specific exclude_host/guest features. Committer notes: Fix 'perf test python' by adding a stub for evsel__find_pmu() in tools/perf/util/python.c, document that it is used so far only for the above reasons so that if anybody needs this in the python binding usecases, we can revisit this. Reported-by:
Stephane Eranian <eranian@google.com> Signed-off-by:
Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: http://lore.kernel.org/lkml/20211105205847.120950-1-namhyung@kernel.org Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 28 Oct, 2021 1 commit
-
-
Madhavan Srinivasan authored
The branch_stack struct has bit field definition which produces different bit ordering for big/little endian. Because of this, when branch_stack sample is collected in a BE system and viewed/reported in a LE system, bit fields of the branch stack are not presented properly. To address this issue, a evsel__bitfield_swap_branch_stack() is defined and introduced in evsel__parse_sample. Signed-off-by:
Madhavan Srinivasan <maddy@linux.ibm.com> Acked-by:
Jiri Olsa <jolsa@redhat.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20211028113714.600549-1-maddy@linux.ibm.com Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 20 Oct, 2021 1 commit
-
-
Ian Rogers authored
Add a new "metric-id" term to events so that metric parsing can set an ID that can be reliably looked up. Metric parsing currently will turn a metric like "instructions/cycles" into a parse events string of "{instructions,cycles}:W". However, parse-events may change "instructions" into "instructions:u" if perf_event_paranoid=2. When this happens expr__resolve_id currently fails as stat-shadow adds the ID "instructions:u" to match with the counter value and the metric tries to look up the ID just "instructions". A later patch will use the new term. An example of the current problem: $ echo -1 > /proc/sys/kernel/perf_event_paranoid $ perf stat -M IPC /bin/true Performance counter stats for '/bin/true': 1,217,161 inst_retired.any # 0.97 IPC 1,250,389 cpu_clk_unhalted.thread 0.002064773 seconds time elapsed 0.002378000 seconds user 0.000000000 seconds sys $ echo 2 > /proc/sys/kernel/perf_event_paranoid $ perf stat -M IPC /bin/true Performance counter stats for '/bin/true': 150,298 inst_retired.any:u # nan IPC 187,095 cpu_clk_unhalted.thread:u 0.002042731 seconds time elapsed 0.000000000 seconds user 0.002377000 seconds sys Note: nan IPC is printed as an effect of "perf metric: Use NAN for missing event IDs." but earlier versions of perf just fail with a parse error and display no value. Signed-off-by:
Ian Rogers <irogers@google.com> Acked-by:
Andi Kleen <ak@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Antonov <alexander.antonov@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrew Kilroy <andrew.kilroy@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Changbin Du <changbin.du@intel.com> Cc: Denys Zagorui <dzagorui@cisco.com> Cc: Fabian Hemmer <copy@copy.sh> Cc: Felix Fietkau <nbd@nbd.name> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jacob Keller <jacob.e.keller@intel.com> Cc: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Joakim Zhang <qiangqing.zhang@nxp.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kees Kook <keescook@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nicholas Fraser <nfraser@codeweavers.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: ShihCheng Tu <mrtoastcheng@gmail.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Wan Jiabing <wanjiabing@vivo.com> Cc: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/r/20211015172132.1162559-15-irogers@google.com Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 11 Sep, 2021 1 commit
-
-
Adrian Hunter authored
Factor out copy_config_terms() and free_config_terms() so that they can be reused. Signed-off-by:
Adrian Hunter <adrian.hunter@intel.com> Acked-by:
Jiri Olsa <jolsa@redhat.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Link: https //lore.kernel.org/r/20210909125508.28693-2-adrian.hunter@intel.com Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-