- 19 Mar, 2020 1 commit
-
-
Ingo Molnar authored
Merge tag 'perf-core-for-mingo-5.7-20200310' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: perf stat: Jin Yao: - Show percore counts in per CPU output. perf report: Jin Yao: - Allow selecting which block info columns to report and its order. - Support color ops to print block percents in color. - Fix wrong block address comparison in block_info__cmp(). perf annotate: Ravi Bangoria: - Get rid of annotation->nr_jumps, unused. expr: Jiri Olsa: - Move expr lexer to flex. llvm: Arnaldo Carvalho de Melo: - Add debug hint message about missing kernel-devel package. core: Kan Liang: - Initial patches to support the recently added PERF_SAMPLE_BRANCH_HW_INDEX kernel feature. - Add check for unexpected use of reserved membrs in event attr, so that in the future older perf tools will complain instead of silently try to process unknown features. libapi: Namhyung Kim: - Adopt cgroupsfs_find_mountpoint() from tools/perf/util/. libperf: Michael Petlan: - Add counting example. libtraceevent: Steven Rostedt (VMware): - Remove extra '\n' in print_event_time(). Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 17 Mar, 2020 3 commits
-
-
Kim Phillips authored
Family 19h introduces change in slice, core and thread specification in its L3 Performance Event Select (ChL3PmcCfg) h/w register. The change is incompatible with Family 17h's version of the register. Introduce a new path in l3_thread_slice_mask() to do things differently for Family 19h vs. Family 17h, otherwise the new hardware doesn't get programmed correctly. Instead of a linear core--thread bitmask, Family 19h takes an encoded core number, and a separate thread mask. There are new bits that are set for all cores and all slices, of which only the latter is used, since the driver counts events for all slices on behalf of the specified CPU. Also update amd_uncore_init() to base its L2/NB vs. L3/Data Fabric mode decision based on Family 17h or above, not just 17h and 18h: the Family 19h Data Fabric PMC is compatible with the Family 17h DF PMC. [ bp: Touchups. ] Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200313231024.17601-3-kim.phillips@amd.com
-
Kim Phillips authored
Convert the l3_thread_slice_mask() function to use the more readable topology_* helper functions, more intuitive variable names like shift and thread_mask, and BIT_ULL(). No functional changes. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200313231024.17601-2-kim.phillips@amd.com
-
Kim Phillips authored
In order to better accommodate the upcoming Family 19h, given the 80-char line limit, move the existing code into a new l3_thread_slice_mask() function. No functional changes. [ bp: Touchups. ] Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200313231024.17601-1-kim.phillips@amd.com
-
- 10 Mar, 2020 12 commits
-
-
Jin Yao authored
It would be nice to print the block percents with colors. This patch supports the 'Sampled Cycles%' and 'Avg Cycles%' printed in colors. For example, perf record -b ... perf report --total-cycles or perf report --total-cycles --stdio percent > 5%, colored in red percent > 0.5%, colored in green percent < 0.5%, default color Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200202141655.32053-5-yao.jin@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jin Yao authored
Currently we use a predefined array to set the block info output formats, it's fixed and inflexible. This patch adds two parameters "block_hpps" and "nr_hpps" in block_info__create_report and other static functions, in order to let user decide which columns to report and with specified report ordering. It should be more flexible. Buffers will be allocated to contain the new fmts, of course, we need to release them before perf exits. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200202141655.32053-4-yao.jin@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jin Yao authored
'perf diff' uses block_pair_cmp() to compare two blocks. But block_info__cmp() has the similar functionality and it's a bit more complete. This patch removes block_pair_cmp() and uses __block_info__cmp() instead. __block_info__cmp() is wrapped by block_info__cmp() and it doesn't receives a perf_hpp_fmt parameter. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200202141655.32053-3-yao.jin@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jin Yao authored
Commit 60414418 ("perf block: Cleanup and refactor block info functions") introduces block_info__cmp(), which compares two blocks. But the issues are: 1. It should return the strcmp cmp value only if it's not 0. 2. When symbol names are matched, we need to compare the addresses of blocks further. But it wrongly uses the symbol addresses for comparison. 3. If the syms are both NULL, we can't consider these two blocks are matched. This patch fixes above 3 issues. Fixes: 60414418 ("perf block: Cleanup and refactor block info functions") Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200202141655.32053-2-yao.jin@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jiri Olsa authored
To match the error value of the expr__find_other function, so all exported expr functions return the same values: 0 on success, -1 on error. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200228093616.67125-6-jolsa@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jiri Olsa authored
Now that we have a flex parser we don't need to update the parsed string pointer, so the interface can just be passed the pointer to the expression instead of a pointer to pointer. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200228093616.67125-5-jolsa@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jiri Olsa authored
We have metrics that define more than 15 variables, like Branch_Misprediction_Cost. Increasing the allowed variables count to 20. As Andy pointed out, we can't go too high in here, because some of the code has O(n^2) complexity (already_seen) and we might want to do some other changes (like using hash tables) before increasing the maximum even more. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200228093616.67125-4-jolsa@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jiri Olsa authored
Adding expr flex code instead of the manual parser code. So it's easily extensible in upcoming changes. The new flex code is in flex.l object and gets compiled like all the other flexers we use. It's defined as flex reentrant parser. It's used by both expr__parse and expr__find_other interfaces by separating the starting point. There's no intended change of functionality ;-) the test expr is passing. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200228093616.67125-3-jolsa@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jiri Olsa authored
Add generic expr code into new expr.c object. The expr.c object will be mainly used in following change that will get rid of the manual flex code, Signed-off-by: Jiri Olsa <jolsa@kernel.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200228093616.67125-2-jolsa@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Kan Liang authored
The perf.data may be generated by a newer version of perf tool, which support new input bits in attr, e.g. new bit for branch_sample_type. The perf.data may be parsed by an older version of perf tool later. The old perf tool may parse the perf.data incorrectly. There is no warning message for this case. Current perf header never check for unknown input bits in attr. When read the event desc from header, check the stored event attr. The reserved bits, sample type, read format and branch sample type will be checked. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lkml.kernel.org/r/20200228163011.19358-4-kan.liang@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Kan Liang authored
A new branch sample type PERF_SAMPLE_BRANCH_HW_INDEX has been introduced in latest kernel. Enable HW_INDEX by default in LBR call stack mode. If kernel doesn't support the sample type, switching it off. Add HW_INDEX in attr_fprintf as well. User can check whether the branch sample type is set via debug information or header. Committer testing: First collect some samples with LBR callchains, system wide, for a few seconds: # perf record --call-graph lbr -a sleep 5 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.625 MB perf.data (224 samples) ] # Now lets use 'perf evlist -v' to look at the branch_sample_type: # perf evlist -v cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: USER|CALL_STACK|NO_FLAGS|NO_CYCLES|HW_INDEX # So the machine has the kernel feature, and it was correctly added to perf_event_attr.branch_sample_type, for the default 'cycles' event. If we do it in another machine, where the kernel lacks the HW_INDEX feature, we get: # perf record --call-graph lbr -a sleep 2s [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.690 MB perf.data (499 samples) ] # perf evlist -v cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: USER|CALL_STACK|NO_FLAGS|NO_CYCLES # No HW_INDEX in attr.branch_sample_type. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lore.kernel.org/lkml/20200228163011.19358-3-kan.liang@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Kan Liang authored
The low level index of raw branch records for the most recent branch can be recorded in a sample with PERF_SAMPLE_BRANCH_HW_INDEX branch_sample_type. Extend struct branch_stack to support it. However, if the PERF_SAMPLE_BRANCH_HW_INDEX is not applied, only nr and entries[] will be output by kernel. The pointer of entries[] could be wrong, since the output format is different with new struct branch_stack. Add a variable no_hw_idx in struct perf_sample to indicate whether the hw_idx is output. Add get_branch_entry() to return corresponding pointer of entries[0]. To make dummy branch sample consistent as new branch sample, add hw_idx in struct dummy_branch_stack for cs-etm and intel-pt. Apply the new struct branch_stack for synthetic events as well. Extend test case sample-parsing to support new struct branch_stack. Committer notes: Renamed get_branch_entries() to perf_sample__branch_entries() to have proper namespacing and pave the way for this to be moved to libperf, eventually. Add 'static' to that inline as it is in a header. Add 'hw_idx' to 'struct dummy_branch_stack' in cs-etm.c to fix the build on arm64. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lore.kernel.org/lkml/20200228163011.19358-2-kan.liang@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 06 Mar, 2020 9 commits
-
-
Ian Rogers authored
If one is monitoring 6 events on 20 cgroups the per-CPU RB tree will hold 120 events. The scheduling in of the events currently iterates over all events looking to see which events match the task's cgroup or its cgroup hierarchy. If a task is in 1 cgroup with 6 events, then 114 events are considered unnecessarily. This change orders events in the RB tree by cgroup id if it is present. This means scheduling in may go directly to events associated with the task's cgroup if one is present. The per-CPU iterator storage in visit_groups_merge is sized sufficent for an iterator per cgroup depth, where different iterators are needed for the task's cgroup and parent cgroups. By considering the set of iterators when visiting, the lowest group_index event may be selected and the insertion order group_index property is maintained. This also allows event rotation to function correctly, as although events are grouped into a cgroup, rotation always selects the lowest group_index event to rotate (delete/insert into the tree) and the min heap of iterators make it so that the group_index order is maintained. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20190724223746.153620-3-irogers@google.com
-
Ian Rogers authored
Allow the per-CPU min heap storage to have sufficient space for per-cgroup iterators. Based-on-work-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200214075133.181299-6-irogers@google.com
-
Ian Rogers authored
The storage required for visit_groups_merge's min heap needs to vary in order to support more iterators, such as when multiple nested cgroups' events are being visited. This change allows for 2 iterators and doesn't support growth. Based-on-work-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200214075133.181299-5-irogers@google.com
-
Ian Rogers authored
visit_groups_merge will pick the next event based on when it was inserted in to the context (perf_event group_index). Events may be per CPU or for any CPU, but in the future we'd also like to have per cgroup events to avoid searching all events for the events to schedule for a cgroup. Introduce a min heap for the events that maintains a property that the earliest inserted event is always at the 0th element. Initialize the heap with per-CPU and any-CPU events for the context. Based-on-work-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200214075133.181299-4-irogers@google.com
-
Ian Rogers authored
Supports push, pop and converting an array into a heap. If the sense of the compare function is inverted then it can provide a max-heap. Based-on-work-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200214075133.181299-3-irogers@google.com
-
Peter Zijlstra authored
Move perf_cgroup_connect() after perf_event_alloc(), such that we can find/use the PMU's cpu context. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200214075133.181299-2-irogers@google.com
-
Peter Zijlstra authored
We can deduce the ctx and cpuctx from the event, no need to pass them along. Remove the structure and pass in can_add_hw directly. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Peter Zijlstra authored
Less is more; unify the two very nearly identical function. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Ingo Molnar authored
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 05 Mar, 2020 1 commit
-
-
Arnaldo Carvalho de Melo authored
To get the changes in: bbfd5e4f ("perf/core: Add new branch sample type for HW index of raw branch records") This silences this perf tools build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/perf_event.h' differs from latest version at 'include/uapi/linux/perf_event.h' diff -u tools/include/uapi/linux/perf_event.h include/uapi/linux/perf_event.h This update is a prerequisite to adding support for the HW index of raw branch records. Acked-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lore.kernel.org/lkml/20200304134902.GB12612@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 04 Mar, 2020 9 commits
-
-
Steven Rostedt (VMware) authored
If the precision of print_event_time() is zero or greater than the timestamp, it uses a different format. But that format had an extra new line at the end, and caused the output to not look right: cpus=2 sleep-3946 [001]111264306005 : function: inotify_inode_queue_event sleep-3946 [001]111264307158 : function: __fsnotify_parent sleep-3946 [001]111264307637 : function: inotify_dentry_parent_queue_event sleep-3946 [001]111264307989 : function: fsnotify sleep-3946 [001]111264308401 : function: audit_syscall_exit Fixes: 38847db9 ("libtraceevent, perf tools: Changes in tep_print_event_* APIs") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20200303231852.6ab6882f@oasis.local.homeSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Michael Petlan authored
Current libperf man pages mention file counting.c "coming with libperf package", however, the file is missing. Add the file then. Fixes: 81de3bf3 ("libperf: Add man pages") Signed-off-by: Michael Petlan <mpetlan@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> LPU-Reference: 20200227194424.28210-1-mpetlan@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Ravi Bangoria authored
The 'nr_jumps' field in 'struct annotation' is not used since it's inception in commit 2402e4a9 ("perf annotate browser: Show 'jumpy' functions"). Get rid of it. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Link: http://lore.kernel.org/lkml/20200204045233.474937-7-ravi.bangoria@linux.ibm.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
To help in debugging, add this extra message: detect_kbuild_dir: Couldn't find "/lib/modules/5.4.20-200.fc31.x86_64/build/include/generated/autoconf.h", missing kernel-devel package?. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jin Yao authored
We have supported the event modifier "percore" which sums up the event counts for all hardware threads in a core and show the counts per core. For example, # perf stat -e cpu/event=cpu-cycles,percore/ -a -A -- sleep 1 Performance counter stats for 'system wide': S0-D0-C0 395,072 cpu/event=cpu-cycles,percore/ S0-D0-C1 851,248 cpu/event=cpu-cycles,percore/ S0-D0-C2 954,226 cpu/event=cpu-cycles,percore/ S0-D0-C3 1,233,659 cpu/event=cpu-cycles,percore/ This patch provides a new option "--percore-show-thread". It is used with event modifier "percore" together to sum up the event counts for all hardware threads in a core but show the counts per hardware thread. This is essentially a replacement for the any bit (which is gone in Icelake). Per core counts are useful for some formulas, e.g. CoreIPC. The original percore version was inconvenient to post process. This variant matches the output of the any bit. With this patch, for example, # perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -- sleep 1 Performance counter stats for 'system wide': CPU0 2,453,061 cpu/event=cpu-cycles,percore/ CPU1 1,823,921 cpu/event=cpu-cycles,percore/ CPU2 1,383,166 cpu/event=cpu-cycles,percore/ CPU3 1,102,652 cpu/event=cpu-cycles,percore/ CPU4 2,453,061 cpu/event=cpu-cycles,percore/ CPU5 1,823,921 cpu/event=cpu-cycles,percore/ CPU6 1,383,166 cpu/event=cpu-cycles,percore/ CPU7 1,102,652 cpu/event=cpu-cycles,percore/ We can see counts are duplicated in CPU pairs (CPU0/CPU4, CPU1/CPU5, CPU2/CPU6, CPU3/CPU7). The interval mode also works. For example, # perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -I 1000 # time CPU counts unit events 1.000425421 CPU0 925,032 cpu/event=cpu-cycles,percore/ 1.000425421 CPU1 430,202 cpu/event=cpu-cycles,percore/ 1.000425421 CPU2 436,843 cpu/event=cpu-cycles,percore/ 1.000425421 CPU3 1,192,504 cpu/event=cpu-cycles,percore/ 1.000425421 CPU4 925,032 cpu/event=cpu-cycles,percore/ 1.000425421 CPU5 430,202 cpu/event=cpu-cycles,percore/ 1.000425421 CPU6 436,843 cpu/event=cpu-cycles,percore/ 1.000425421 CPU7 1,192,504 cpu/event=cpu-cycles,percore/ If we offline CPU5, the result is: # perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -- sleep 1 Performance counter stats for 'system wide': CPU0 2,752,148 cpu/event=cpu-cycles,percore/ CPU1 1,009,312 cpu/event=cpu-cycles,percore/ CPU2 2,784,072 cpu/event=cpu-cycles,percore/ CPU3 2,427,922 cpu/event=cpu-cycles,percore/ CPU4 2,752,148 cpu/event=cpu-cycles,percore/ CPU6 2,784,072 cpu/event=cpu-cycles,percore/ CPU7 2,427,922 cpu/event=cpu-cycles,percore/ 1.001416041 seconds time elapsed v4: --- Ravi Bangoria reports an issue in v3. Once we offline a CPU, the output is not correct. The issue is we should use the cpu idx in print_percore_thread rather than using the cpu value. v3: --- 1. Fix the interval mode output error 2. Use cpu value (not cpu index) in config->aggr_get_id(). 3. Refine the code according to Jiri's comments. v2: --- Add the explanation in change log. This is essentially a replacement for the any bit. No code change. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200214080452.26402-1-yao.jin@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Namhyung Kim authored
Move it from tools/perf/util/cgroup.c as it can be used by other places. Note that cgroup filesystem is different from others since it's usually mounted separately (in v1) for each subsystem. I just copied the code with a little modification to pass a name of subsystem. Suggested-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20200127100031.1368732-1-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
To pick up fixes. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Nick Desaulniers authored
clang warns: util/block-info.c:298:18: error: result of comparison against a string literal is unspecified (use an explicit string comparison function instead) [-Werror,-Wstring-compare] if ((start_line != SRCLINE_UNKNOWN) && (end_line != SRCLINE_UNKNOWN)) { ^ ~~~~~~~~~~~~~~~ util/block-info.c:298:51: error: result of comparison against a string literal is unspecified (use an explicit string comparison function instead) [-Werror,-Wstring-compare] if ((start_line != SRCLINE_UNKNOWN) && (end_line != SRCLINE_UNKNOWN)) { ^ ~~~~~~~~~~~~~~~ util/block-info.c:298:18: error: result of comparison against a string literal is unspecified (use an explicit string comparison function instead) [-Werror,-Wstring-compare] if ((start_line != SRCLINE_UNKNOWN) && (end_line != SRCLINE_UNKNOWN)) { ^ ~~~~~~~~~~~~~~~ util/block-info.c:298:51: error: result of comparison against a string literal is unspecified (use an explicit string comparison function instead) [-Werror,-Wstring-compare] if ((start_line != SRCLINE_UNKNOWN) && (end_line != SRCLINE_UNKNOWN)) { ^ ~~~~~~~~~~~~~~~ util/map.c:434:15: error: result of comparison against a string literal is unspecified (use an explicit string comparison function instead) [-Werror,-Wstring-compare] if (srcline != SRCLINE_UNKNOWN) ^ ~~~~~~~~~~~~~~~ Reviewer Notes: Looks good to me. Some more context: https://clang.llvm.org/docs/DiagnosticsReference.html#wstring-compare The spec says: J.1 Unspecified behavior The following are unspecified: .. Whether two string literals result in distinct arrays (6.4.5). Signed-off-by: Nick Desaulniers <nick.desaulniers@gmail.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Changbin Du <changbin.du@intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Keeping <john@metanate.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: clang-built-linux@googlegroups.com Link: https://github.com/ClangBuiltLinux/linux/issues/900 Link: http://lore.kernel.org/lkml/20200223193456.25291-1-nick.desaulniers@gmail.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Ingo Molnar authored
Merge tag 'perf-urgent-for-mingo-5.6-20200303' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes from Arnaldo Carvalho de Melo: perf symbols: Arnaldo Carvalho de Melo: - Don't try to find a vmlinux file when looking for kernel modules, fixing symbol resolution in systems with compressed kernel modules. perf env: Arnaldo Carvalho de Melo: - Do not return pointers to local variables, fixing valid warning from gcc 10 for corner case that stops the build due to -Werror. perf tests: Arnaldo Carvalho de Melo: - Make global variable static in the bp_account entry to fix build with gcc 10. perf parse-events: Arnaldo Carvalho de Melo: - Use asprintf() instead of strncpy() to read tracepoint files, addressing compiler warning that stops the build as we use -Werror. perf bench: Arnaldo Carvalho de Melo: - Share some global variables to fix build with gcc 10. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 03 Mar, 2020 3 commits
-
-
git://git.samba.org/sfrench/cifs-2.6Linus Torvalds authored
Pull cifs fixes from Steve French: "Five small cifs/smb3 fixes, two for stable (one for a reconnect problem and the other fixes a use case when renaming an open file)" * tag '5.6-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: Use #define in cifs_dbg cifs: fix rename() by ensuring source handle opened with DELETE bit cifs: add missing mount option to /proc/mounts cifs: fix potential mismatch of UNC paths cifs: don't leak -EAGAIN for stat() during reconnect
-
Arnaldo Carvalho de Melo authored
The dso->kernel value is now set to everything that is in machine->kmaps, but that was being used to decide if vmlinux lookup is needed, which ended up making that lookup be made for kernel modules, that now have dso->kernel set, leading to these kinds of warnings when running on a machine with compressed kernel modules, like fedora:31: [root@five ~]# perf record -F 10000 -a sleep 2 [ perf record: Woken up 1 times to write data ] lzma: fopen failed on vmlinux: 'No such file or directory' lzma: fopen failed on /boot/vmlinux: 'No such file or directory' lzma: fopen failed on /boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory' lzma: fopen failed on /usr/lib/debug/boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory' lzma: fopen failed on /lib/modules/5.5.5-200.fc31.x86_64/build/vmlinux: 'No such file or directory' lzma: fopen failed on vmlinux: 'No such file or directory' lzma: fopen failed on /boot/vmlinux: 'No such file or directory' lzma: fopen failed on /boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory' lzma: fopen failed on /usr/lib/debug/boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory' lzma: fopen failed on /lib/modules/5.5.5-200.fc31.x86_64/build/vmlinux: 'No such file or directory' lzma: fopen failed on vmlinux: 'No such file or directory' lzma: fopen failed on /boot/vmlinux: 'No such file or directory' lzma: fopen failed on /boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory' lzma: fopen failed on /usr/lib/debug/boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory' lzma: fopen failed on /lib/modules/5.5.5-200.fc31.x86_64/build/vmlinux: 'No such file or directory' lzma: fopen failed on vmlinux: 'No such file or directory' lzma: fopen failed on /boot/vmlinux: 'No such file or directory' lzma: fopen failed on /boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory' lzma: fopen failed on /usr/lib/debug/boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory' lzma: fopen failed on /lib/modules/5.5.5-200.fc31.x86_64/build/vmlinux: 'No such file or directory' lzma: fopen failed on vmlinux: 'No such file or directory' lzma: fopen failed on /boot/vmlinux: 'No such file or directory' lzma: fopen failed on /boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory' lzma: fopen failed on /usr/lib/debug/boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory' lzma: fopen failed on /lib/modules/5.5.5-200.fc31.x86_64/build/vmlinux: 'No such file or directory' [ perf record: Captured and wrote 1.024 MB perf.data (1366 samples) ] [root@five ~]# This happens when collecting the buildid, when we find samples for kernel modules, fix it by checking if the looked up DSO is a kernel module by other means. Fixes: 02213cec ("perf maps: Mark module DSOs with kernel type") Tested-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200302191007.GD10335@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
Noticed with gcc 10 (fedora rawhide) that those variables were not being declared as static, so end up with: ld: /tmp/build/perf/bench/epoll-wait.o:/git/perf/tools/perf/bench/epoll-wait.c:93: multiple definition of `end'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here ld: /tmp/build/perf/bench/epoll-wait.o:/git/perf/tools/perf/bench/epoll-wait.c:93: multiple definition of `start'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here ld: /tmp/build/perf/bench/epoll-wait.o:/git/perf/tools/perf/bench/epoll-wait.c:93: multiple definition of `runtime'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here ld: /tmp/build/perf/bench/epoll-ctl.o:/git/perf/tools/perf/bench/epoll-ctl.c:38: multiple definition of `end'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here ld: /tmp/build/perf/bench/epoll-ctl.o:/git/perf/tools/perf/bench/epoll-ctl.c:38: multiple definition of `start'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here ld: /tmp/build/perf/bench/epoll-ctl.o:/git/perf/tools/perf/bench/epoll-ctl.c:38: multiple definition of `runtime'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here make[4]: *** [/git/perf/tools/build/Makefile.build:145: /tmp/build/perf/bench/perf-in.o] Error 1 Prefix those with bench__ and add them to bench/bench.h, so that we can share those on the tools needing to access those variables from signal handlers. Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20200303155811.GD13702@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 02 Mar, 2020 2 commits
-
-
Arnaldo Carvalho de Melo authored
Make the code more compact by using asprintf() instead of malloc()+strncpy() which also uses less memory and avoids these warnings with gcc 10: CC /tmp/build/perf/util/cloexec.o In file included from /usr/include/string.h:495, from util/parse-events.h:12, from util/parse-events.c:18: In function ‘strncpy’, inlined from ‘tracepoint_id_to_path’ at util/parse-events.c:271:5: /usr/include/bits/string_fortified.h:106:10: error: ‘__builtin_strncpy’ offset [275, 511] from the object at ‘sys_dirent’ is out of the bounds of referenced subobject ‘d_name’ with type ‘char[256]’ at offset 19 [-Werror=array-bounds] 106 | return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest)); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from /usr/include/dirent.h:61, from util/parse-events.c:5: util/parse-events.c: In function ‘tracepoint_id_to_path’: /usr/include/bits/dirent.h:33:10: note: subobject ‘d_name’ declared here 33 | char d_name[256]; /* We must not include limits.h! */ | ^~~~~~ In file included from /usr/include/string.h:495, from util/parse-events.h:12, from util/parse-events.c:18: In function ‘strncpy’, inlined from ‘tracepoint_id_to_path’ at util/parse-events.c:273:5: /usr/include/bits/string_fortified.h:106:10: error: ‘__builtin_strncpy’ offset [275, 511] from the object at ‘evt_dirent’ is out of the bounds of referenced subobject ‘d_name’ with type ‘char[256]’ at offset 19 [-Werror=array-bounds] 106 | return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest)); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from /usr/include/dirent.h:61, from util/parse-events.c:5: util/parse-events.c: In function ‘tracepoint_id_to_path’: /usr/include/bits/dirent.h:33:10: note: subobject ‘d_name’ declared here 33 | char d_name[256]; /* We must not include limits.h! */ | ^~~~~~ CC /tmp/build/perf/util/call-path.o Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20200302145535.GA28183@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
It is possible to return a pointer to a local variable when looking up the architecture name for the running system and no normalization is done on that value, i.e. we may end up returning the uts.machine local variable. While this doesn't happen on most arches, as normalization takes place, lets fix this by making that a static variable and optimize it a bit by not always running uname(), only the first time. Noticed in fedora rawhide running with: [perfbuilder@a5ff49d6e6e4 ~]$ gcc --version gcc (GCC) 10.0.1 20200216 (Red Hat 10.0.1-0.8) Reported-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-