1. 11 Mar, 2020 12 commits
  2. 10 Mar, 2020 18 commits
    • Kan Liang's avatar
      perf vendor events intel: Add NO_NMI_WATCHDOG metric constraint · b95fcd2c
      Kan Liang authored
      Add NO_NMI_WATCHDOG metric constraint to Page_Walks_Utilization for Sky Lake
      and Cascade Lake.
      
      Committer testing:
      
      On a Lenovo T480S, Intel(R) Core(TM) i7-8650U Kaby Lake, that looking at x86's
      mapfile.csv file is a:
      
        $ grep -w skylake tools/perf/pmu-events/arch/x86/mapfile.csv
        GenuineIntel-6-[4589]E,v24,skylake,core
        $
      
      So uses the constraint added in this patch in this file:
      
        tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json
      
      Before:
      
        # perf stat -a -M Page_Walks_Utilization sleep 2
      
         Performance counter stats for 'system wide':
      
             <not counted>      itlb_misses.walk_pending                                      (0.00%)
             <not counted>      dtlb_load_misses.walk_pending                                     (0.00%)
             <not counted>      dtlb_store_misses.walk_pending                                     (0.00%)
             <not counted>      ept.walk_pending                                              (0.00%)
             <not counted>      cycles                                                        (0.00%)
      
               2.001750514 seconds time elapsed
      
        Some events weren't counted. Try disabling the NMI watchdog:
        	echo 0 > /proc/sys/kernel/nmi_watchdog
        	perf stat ...
        	echo 1 > /proc/sys/kernel/nmi_watchdog
        The events in group usually have to be from the same PMU. Try reorganizing the group.
        #
      
      After:
      
        # perf stat -a -M Page_Walks_Utilization sleep 2
        Splitting metric group Page_Walks_Utilization into standalone metrics.
        Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric constraint:
            echo 0 > /proc/sys/kernel/nmi_watchdog
            perf stat ...
            echo 1 > /proc/sys/kernel/nmi_watchdog
        ,
         Performance counter stats for 'system wide':
      
                36,883,102      itlb_misses.walk_pending  #      0.1 Page_Walks_Utilization   (79.99%)
               123,104,146      dtlb_load_misses.walk_pending                                     (80.02%)
                13,720,795      dtlb_store_misses.walk_pending                                     (79.99%)
                         0      ept.walk_pending                                              (79.99%)
             1,519,948,400      cycles                                                        (80.01%)
      
               2.002170780 seconds time elapsed
      
        #
      
      Before and after, if we disable the nmi_watchdog we get:
      
        # echo 0 > /proc/sys/kernel/nmi_watchdog
        # perf stat -a -M Page_Walks_Utilization sleep 2
      
         Performance counter stats for 'system wide':
      
                33,721,658      itlb_misses.walk_pending  #      0.1 Page_Walks_Utilization
                84,070,996      dtlb_load_misses.walk_pending
                 9,816,071      dtlb_store_misses.walk_pending
                         0      ept.walk_pending
               704,920,899      cycles
      
               2.002331670 seconds time elapsed
      
        #
      
        More information about the metric expressions:
      
        # perf stat -v -a -M Page_Walks_Utilization sleep 2
        Using CPUID GenuineIntel-6-8E-A
        metric expr ( itlb_misses.walk_pending + dtlb_load_misses.walk_pending + dtlb_store_misses.walk_pending + ept.walk_pending ) / ( 2 * cycles ) for Page_Walks_Utilization
        found event itlb_misses.walk_pending
        found event dtlb_load_misses.walk_pending
        found event dtlb_store_misses.walk_pending
        found event ept.walk_pending
        found event cycles
        adding {itlb_misses.walk_pending,dtlb_load_misses.walk_pending,dtlb_store_misses.walk_pending,ept.walk_pending,cycles}:W
         -> cpu/umask=0x10,(null)=0x186a3,event=0x85/
         -> cpu/umask=0x10,(null)=0x1e8483,event=0x8/
         -> cpu/umask=0x10,(null)=0x1e8483,event=0x49/
         -> cpu/umask=0x10,(null)=0x1e8483,event=0x4f/
        itlb_misses.walk_pending: 8085772 16010162799 16010162799
        dtlb_load_misses.walk_pending: 28134579 16010162799 16010162799
        dtlb_store_misses.walk_pending: 7276535 16010162799 16010162799
        ept.walk_pending: 2 16010162799 16010162799
        cycles: 315140605 16010162799 16010162799
      
         Performance counter stats for 'system wide':
      
                 8,085,772      itlb_misses.walk_pending  #      0.1 Page_Walks_Utilization
                28,134,579      dtlb_load_misses.walk_pending
                 7,276,535      dtlb_store_misses.walk_pending
                         2      ept.walk_pending
               315,140,605      cycles
      
               2.002333181 seconds time elapsed
      
        #
      Signed-off-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@redhat.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/1582581564-184429-6-git-send-email-kan.liang@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      b95fcd2c
    • Kan Liang's avatar
      perf metricgroup: Support metric constraint · ab483d8b
      Kan Liang authored
      Some metric groups have metric constraints. A metric group can be
      scheduled as a group only when some constraints are applied.  For
      example, Page_Walks_Utilization has a metric constraint,
      "NO_NMI_WATCHDOG".
      
      When NMI watchdog is disabled, the metric group can be scheduled as a
      group. Otherwise, splitting the metric group into standalone metrics.
      
      Add a new function, metricgroup__has_constraint(), to check whether all
      constraints are applied. If not, splitting the metric group into
      standalone metrics.
      
      Currently, only one constraint, "NO_NMI_WATCHDOG", is checked. Print a
      warning for the metric group with the constraint, when NMI WATCHDOG is
      enabled.
      Signed-off-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/1582581564-184429-5-git-send-email-kan.liang@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      ab483d8b
    • Kan Liang's avatar
      perf util: Factor out sysctl__nmi_watchdog_enabled() · 2a14c1bf
      Kan Liang authored
      The NMI watchdog status is required for metric group constraint
      examination.  Factor out sysctl__nmi_watchdog_enabled() to retrieve the
      NMI watchdog status.
      
      Users may count more than one metric group each time. If so, the NMI
      watchdog status may be retrieved several times. To reduce the overhead,
      cache the NMI watchdog status.
      
      Replace the NMI watchdog status checking in print_footer() by
      sysctl__nmi_watchdog_enabled().
      Suggested-by: default avatarAndi Kleen <ak@linux.intel.com>
      Signed-off-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/1582581564-184429-4-git-send-email-kan.liang@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      2a14c1bf
    • Kan Liang's avatar
      perf metricgroup: Factor out metricgroup__add_metric_weak_group() · f742634a
      Kan Liang authored
      Factor out metricgroup__add_metric_weak_group() which add metrics into a
      weak group. The change can improve code readability. Because following
      patch will introduce a function which add standalone metrics.
      Signed-off-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/1582581564-184429-3-git-send-email-kan.liang@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      f742634a
    • Kan Liang's avatar
      perf jevents: Support metric constraint · 03fe02b1
      Kan Liang authored
      A new field "MetricConstraint" is introduced in JSON event list.
      
      Extend jevents to parse the field and save the value in
      metric_constraint.
      Signed-off-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/1582581564-184429-2-git-send-email-kan.liang@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      03fe02b1
    • Thomas Richter's avatar
      perf vendor events s390: Add new deflate counters for IBM z15 · e7950166
      Thomas Richter authored
      Add support for new deflate counters:
      
      - Counter 247: cycles CPU spent obtaining access to Deflate unit
      - Counter 252: cycles CPU is using Deflate unit
      - Counter 264: Increments by one for every DEFLATE CONVERSION CALL
      	    instruction executed.
      - Counter 265: Increments by one for every DEFLATE CONVERSION CALL
      	    instruction executed that ended in Condition Codes
      	    0, 1 or 2.
      
      Also adjust the some crypto counter description to latest documentation.
      Signed-off-by: default avatarThomas Richter <tmricht@linux.ibm.com>
      Reviewed-by: default avatarSumanth Korikkar <sumanthk@linux.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20200310142937.32045-1-tmricht@linux.ibm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      e7950166
    • Jin Yao's avatar
      perf block-info: Support color ops to print block percents in color · f787feff
      Jin Yao authored
      It would be nice to print the block percents with colors.
      
      This patch supports the 'Sampled Cycles%' and 'Avg Cycles%' printed in
      colors.
      
      For example,
      
      perf record -b ...
      perf report --total-cycles or perf report --total-cycles --stdio
      
      percent > 5%, colored in red
      percent > 0.5%, colored in green
      percent < 0.5%, default color
      Signed-off-by: default avatarJin Yao <yao.jin@linux.intel.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200202141655.32053-5-yao.jin@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      f787feff
    • Jin Yao's avatar
      perf block-info: Allow selecting which columns to report and its order · cca0cc76
      Jin Yao authored
      Currently we use a predefined array to set the block info output
      formats, it's fixed and inflexible.
      
      This patch adds two parameters "block_hpps" and "nr_hpps" in
      block_info__create_report and other static functions, in order to let
      user decide which columns to report and with specified report ordering.
      It should be more flexible.
      
      Buffers will be allocated to contain the new fmts, of course, we need to
      release them before perf exits.
      Signed-off-by: default avatarJin Yao <yao.jin@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200202141655.32053-4-yao.jin@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      cca0cc76
    • Jin Yao's avatar
      perf diff: Use __block_info__cmp() to replace block_pair_cmp() · a8a9f6dc
      Jin Yao authored
      'perf diff' uses block_pair_cmp() to compare two blocks. But
      block_info__cmp() has the similar functionality and it's a bit more
      complete.
      
      This patch removes block_pair_cmp() and uses __block_info__cmp()
      instead. __block_info__cmp() is wrapped by block_info__cmp() and it
      doesn't receives a perf_hpp_fmt parameter.
      Signed-off-by: default avatarJin Yao <yao.jin@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200202141655.32053-3-yao.jin@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      a8a9f6dc
    • Jin Yao's avatar
      perf block-info: Fix wrong block address comparison in block_info__cmp() · 3e152aa9
      Jin Yao authored
      Commit 60414418 ("perf block: Cleanup and refactor block info
      functions") introduces block_info__cmp(), which compares two blocks.
      
      But the issues are:
      
      1. It should return the strcmp cmp value only if it's not 0.
      
      2. When symbol names are matched, we need to compare the addresses
         of blocks further. But it wrongly uses the symbol addresses for
         comparison.
      
      3. If the syms are both NULL, we can't consider these two blocks are
         matched.
      
      This patch fixes above 3 issues.
      
      Fixes: 60414418 ("perf block: Cleanup and refactor block info functions")
      Signed-off-by: default avatarJin Yao <yao.jin@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200202141655.32053-2-yao.jin@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      3e152aa9
    • Jiri Olsa's avatar
      perf expr: Make expr__parse() return -1 on error · d942815a
      Jiri Olsa authored
      To match the error value of the expr__find_other function, so all
      exported expr functions return the same values:
      0 on success, -1 on error.
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Reviewed-by: default avatarAndi Kleen <ak@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20200228093616.67125-6-jolsa@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      d942815a
    • Jiri Olsa's avatar
      perf expr: Straighten expr__parse()/expr__find_other() interface · 0f9b1e12
      Jiri Olsa authored
      Now that we have a flex parser we don't need to update the parsed string
      pointer, so the interface can just be passed the pointer to the
      expression instead of a pointer to pointer.
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Reviewed-by: default avatarAndi Kleen <ak@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20200228093616.67125-5-jolsa@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      0f9b1e12
    • Jiri Olsa's avatar
      perf expr: Increase EXPR_MAX_OTHER to support metrics with more than 15 variables · 58ca7076
      Jiri Olsa authored
      We have metrics that define more than 15 variables, like
      Branch_Misprediction_Cost. Increasing the allowed variables count to 20.
      
      As Andy pointed out, we can't go too high in here, because some of the
      code has O(n^2) complexity (already_seen) and we might want to do some
      other changes (like using hash tables) before increasing the maximum
      even more.
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Reviewed-by: default avatarAndi Kleen <ak@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20200228093616.67125-4-jolsa@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      58ca7076
    • Jiri Olsa's avatar
      perf expr: Move expr lexer to flex · 26226a97
      Jiri Olsa authored
      Adding expr flex code instead of the manual parser code. So it's easily
      extensible in upcoming changes.
      
      The new flex code is in flex.l object and gets compiled like all the
      other flexers we use.  It's defined as flex reentrant parser.
      
      It's used by both expr__parse and expr__find_other interfaces by
      separating the starting point.
      
      There's no intended change of functionality ;-) the test expr is
      passing.
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Reviewed-by: default avatarAndi Kleen <ak@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20200228093616.67125-3-jolsa@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      26226a97
    • Jiri Olsa's avatar
      perf expr: Add expr.c object · 576a65b6
      Jiri Olsa authored
      Add generic expr code into new expr.c object.
      
      The expr.c object will be mainly used in following change that will get
      rid of the manual flex code,
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Reviewed-by: default avatarAndi Kleen <ak@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20200228093616.67125-2-jolsa@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      576a65b6
    • Kan Liang's avatar
      perf header: Add check for unexpected use of reserved membrs in event attr · 277ce1ef
      Kan Liang authored
      The perf.data may be generated by a newer version of perf tool, which
      support new input bits in attr, e.g. new bit for branch_sample_type.
      
      The perf.data may be parsed by an older version of perf tool later.  The
      old perf tool may parse the perf.data incorrectly. There is no warning
      message for this case.
      
      Current perf header never check for unknown input bits in attr.
      
      When read the event desc from header, check the stored event attr.  The
      reserved bits, sample type, read format and branch sample type will be
      checked.
      Signed-off-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
      Link: http://lkml.kernel.org/r/20200228163011.19358-4-kan.liang@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      277ce1ef
    • Kan Liang's avatar
      perf evsel: Support PERF_SAMPLE_BRANCH_HW_INDEX · d3f85437
      Kan Liang authored
      A new branch sample type PERF_SAMPLE_BRANCH_HW_INDEX has been introduced
      in latest kernel.
      
      Enable HW_INDEX by default in LBR call stack mode.
      
      If kernel doesn't support the sample type, switching it off.
      
      Add HW_INDEX in attr_fprintf as well. User can check whether the branch
      sample type is set via debug information or header.
      
      Committer testing:
      
      First collect some samples with LBR callchains, system wide, for a few
      seconds:
      
        # perf record --call-graph lbr -a sleep 5
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.625 MB perf.data (224 samples) ]
        #
      
      Now lets use 'perf evlist -v' to look at the branch_sample_type:
      
        # perf evlist -v
        cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: USER|CALL_STACK|NO_FLAGS|NO_CYCLES|HW_INDEX
        #
      
      So the machine has the kernel feature, and it was correctly added to
      perf_event_attr.branch_sample_type, for the default 'cycles' event.
      
      If we do it in another machine, where the kernel lacks the HW_INDEX
      feature, we get:
      
        # perf record --call-graph lbr -a sleep 2s
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 1.690 MB perf.data (499 samples) ]
        # perf evlist -v
        cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: USER|CALL_STACK|NO_FLAGS|NO_CYCLES
        #
      
      No HW_INDEX in attr.branch_sample_type.
      Signed-off-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
      Link: http://lore.kernel.org/lkml/20200228163011.19358-3-kan.liang@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      d3f85437
    • Kan Liang's avatar
      perf tools: Add hw_idx in struct branch_stack · 42bbabed
      Kan Liang authored
      The low level index of raw branch records for the most recent branch can
      be recorded in a sample with PERF_SAMPLE_BRANCH_HW_INDEX
      branch_sample_type. Extend struct branch_stack to support it.
      
      However, if the PERF_SAMPLE_BRANCH_HW_INDEX is not applied, only nr and
      entries[] will be output by kernel. The pointer of entries[] could be
      wrong, since the output format is different with new struct
      branch_stack.  Add a variable no_hw_idx in struct perf_sample to
      indicate whether the hw_idx is output.  Add get_branch_entry() to return
      corresponding pointer of entries[0].
      
      To make dummy branch sample consistent as new branch sample, add hw_idx
      in struct dummy_branch_stack for cs-etm and intel-pt.
      
      Apply the new struct branch_stack for synthetic events as well.
      
      Extend test case sample-parsing to support new struct branch_stack.
      
      Committer notes:
      
      Renamed get_branch_entries() to perf_sample__branch_entries() to have
      proper namespacing and pave the way for this to be moved to libperf,
      eventually.
      
      Add 'static' to that inline as it is in a header.
      
      Add 'hw_idx' to 'struct dummy_branch_stack' in cs-etm.c to fix the build
      on arm64.
      Signed-off-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
      Link: http://lore.kernel.org/lkml/20200228163011.19358-2-kan.liang@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      42bbabed
  3. 05 Mar, 2020 1 commit
    • Arnaldo Carvalho de Melo's avatar
      tools headers UAPI: Update tools's copy of linux/perf_event.h · 6339998d
      Arnaldo Carvalho de Melo authored
      To get the changes in:
      
        bbfd5e4f ("perf/core: Add new branch sample type for HW index of raw branch records")
      
      This silences this perf tools build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/perf_event.h' differs from latest version at 'include/uapi/linux/perf_event.h'
        diff -u tools/include/uapi/linux/perf_event.h include/uapi/linux/perf_event.h
      
      This update is a prerequisite to adding support for the HW index of raw
      branch records.
      Acked-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
      Link: http://lore.kernel.org/lkml/20200304134902.GB12612@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      6339998d
  4. 04 Mar, 2020 9 commits
    • Steven Rostedt (VMware)'s avatar
      tools lib traceevent: Remove extra '\n' in print_event_time() · 401d61cb
      Steven Rostedt (VMware) authored
      If the precision of print_event_time() is zero or greater than the
      timestamp, it uses a different format. But that format had an extra new
      line at the end, and caused the output to not look right:
      
      cpus=2
                 sleep-3946  [001]111264306005
      : function:             inotify_inode_queue_event
                 sleep-3946  [001]111264307158
      : function:             __fsnotify_parent
                 sleep-3946  [001]111264307637
      : function:             inotify_dentry_parent_queue_event
                 sleep-3946  [001]111264307989
      : function:             fsnotify
                 sleep-3946  [001]111264308401
      : function:             audit_syscall_exit
      
      Fixes: 38847db9 ("libtraceevent, perf tools: Changes in tep_print_event_* APIs")
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200303231852.6ab6882f@oasis.local.homeSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      401d61cb
    • Michael Petlan's avatar
      libperf: Add counting example · 76ce0265
      Michael Petlan authored
      Current libperf man pages mention file counting.c "coming with libperf package",
      however, the file is missing. Add the file then.
      
      Fixes: 81de3bf3 ("libperf: Add man pages")
      Signed-off-by: default avatarMichael Petlan <mpetlan@redhat.com>
      Acked-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LPU-Reference: 20200227194424.28210-1-mpetlan@redhat.com
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      76ce0265
    • Ravi Bangoria's avatar
      perf annotate: Get rid of annotation->nr_jumps · dabce16b
      Ravi Bangoria authored
      The 'nr_jumps' field in 'struct annotation' is not used since it's
      inception in commit 2402e4a9 ("perf annotate browser: Show 'jumpy'
      functions").  Get rid of it.
      Signed-off-by: default avatarRavi Bangoria <ravi.bangoria@linux.ibm.com>
      Acked-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Song Liu <songliubraving@fb.com>
      Link: http://lore.kernel.org/lkml/20200204045233.474937-7-ravi.bangoria@linux.ibm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      dabce16b
    • Arnaldo Carvalho de Melo's avatar
      perf llvm: Add debug hint message about missing kernel-devel package · 357a5d24
      Arnaldo Carvalho de Melo authored
      To help in debugging, add this extra message:
      
        detect_kbuild_dir: Couldn't find "/lib/modules/5.4.20-200.fc31.x86_64/build/include/generated/autoconf.h", missing kernel-devel package?.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      357a5d24
    • Jin Yao's avatar
      perf stat: Show percore counts in per CPU output · 1af62ce6
      Jin Yao authored
      We have supported the event modifier "percore" which sums up the event
      counts for all hardware threads in a core and show the counts per core.
      
      For example,
      
       # perf stat -e cpu/event=cpu-cycles,percore/ -a -A -- sleep 1
      
        Performance counter stats for 'system wide':
      
       S0-D0-C0                395,072      cpu/event=cpu-cycles,percore/
       S0-D0-C1                851,248      cpu/event=cpu-cycles,percore/
       S0-D0-C2                954,226      cpu/event=cpu-cycles,percore/
       S0-D0-C3              1,233,659      cpu/event=cpu-cycles,percore/
      
      This patch provides a new option "--percore-show-thread". It is used
      with event modifier "percore" together to sum up the event counts for
      all hardware threads in a core but show the counts per hardware thread.
      
      This is essentially a replacement for the any bit (which is gone in
      Icelake). Per core counts are useful for some formulas, e.g. CoreIPC.
      The original percore version was inconvenient to post process. This
      variant matches the output of the any bit.
      
      With this patch, for example,
      
       # perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread  -- sleep 1
      
        Performance counter stats for 'system wide':
      
       CPU0               2,453,061      cpu/event=cpu-cycles,percore/
       CPU1               1,823,921      cpu/event=cpu-cycles,percore/
       CPU2               1,383,166      cpu/event=cpu-cycles,percore/
       CPU3               1,102,652      cpu/event=cpu-cycles,percore/
       CPU4               2,453,061      cpu/event=cpu-cycles,percore/
       CPU5               1,823,921      cpu/event=cpu-cycles,percore/
       CPU6               1,383,166      cpu/event=cpu-cycles,percore/
       CPU7               1,102,652      cpu/event=cpu-cycles,percore/
      
      We can see counts are duplicated in CPU pairs (CPU0/CPU4, CPU1/CPU5,
      CPU2/CPU6, CPU3/CPU7).
      
      The interval mode also works. For example,
      
       # perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread  -I 1000
       #           time CPU                    counts unit events
            1.000425421 CPU0                 925,032      cpu/event=cpu-cycles,percore/
            1.000425421 CPU1                 430,202      cpu/event=cpu-cycles,percore/
            1.000425421 CPU2                 436,843      cpu/event=cpu-cycles,percore/
            1.000425421 CPU3               1,192,504      cpu/event=cpu-cycles,percore/
            1.000425421 CPU4                 925,032      cpu/event=cpu-cycles,percore/
            1.000425421 CPU5                 430,202      cpu/event=cpu-cycles,percore/
            1.000425421 CPU6                 436,843      cpu/event=cpu-cycles,percore/
            1.000425421 CPU7               1,192,504      cpu/event=cpu-cycles,percore/
      
      If we offline CPU5, the result is:
      
       # perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -- sleep 1
      
        Performance counter stats for 'system wide':
      
       CPU0               2,752,148      cpu/event=cpu-cycles,percore/
       CPU1               1,009,312      cpu/event=cpu-cycles,percore/
       CPU2               2,784,072      cpu/event=cpu-cycles,percore/
       CPU3               2,427,922      cpu/event=cpu-cycles,percore/
       CPU4               2,752,148      cpu/event=cpu-cycles,percore/
       CPU6               2,784,072      cpu/event=cpu-cycles,percore/
       CPU7               2,427,922      cpu/event=cpu-cycles,percore/
      
              1.001416041 seconds time elapsed
      
       v4:
       ---
       Ravi Bangoria reports an issue in v3. Once we offline a CPU,
       the output is not correct. The issue is we should use the cpu
       idx in print_percore_thread rather than using the cpu value.
      
       v3:
       ---
       1. Fix the interval mode output error
       2. Use cpu value (not cpu index) in config->aggr_get_id().
       3. Refine the code according to Jiri's comments.
      
       v2:
       ---
       Add the explanation in change log. This is essentially a replacement
       for the any bit. No code change.
      Signed-off-by: default avatarJin Yao <yao.jin@linux.intel.com>
      Tested-by: default avatarRavi Bangoria <ravi.bangoria@linux.ibm.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200214080452.26402-1-yao.jin@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      1af62ce6
    • Namhyung Kim's avatar
      tools lib api fs: Move cgroupsfs_find_mountpoint() · 7982a898
      Namhyung Kim authored
      Move it from tools/perf/util/cgroup.c as it can be used by other places.
      Note that cgroup filesystem is different from others since it's usually
      mounted separately (in v1) for each subsystem.
      
      I just copied the code with a little modification to pass a name of
      subsystem.
      Suggested-by: default avatarJiri Olsa <jolsa@redhat.com>
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200127100031.1368732-1-namhyung@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      7982a898
    • Arnaldo Carvalho de Melo's avatar
    • Nick Desaulniers's avatar
      perf diff: Fix undefined string comparison spotted by clang's -Wstring-compare · c395c355
      Nick Desaulniers authored
      clang warns:
      
        util/block-info.c:298:18: error: result of comparison against a string
        literal is unspecified (use an explicit string comparison function
        instead) [-Werror,-Wstring-compare]
                if ((start_line != SRCLINE_UNKNOWN) && (end_line != SRCLINE_UNKNOWN)) {
                                ^  ~~~~~~~~~~~~~~~
        util/block-info.c:298:51: error: result of comparison against a string
        literal is unspecified (use an explicit string comparison function
        instead) [-Werror,-Wstring-compare]
                if ((start_line != SRCLINE_UNKNOWN) && (end_line != SRCLINE_UNKNOWN)) {
                                                                 ^  ~~~~~~~~~~~~~~~
        util/block-info.c:298:18: error: result of comparison against a string
        literal is unspecified (use an explicit string
        comparison function instead) [-Werror,-Wstring-compare]
                if ((start_line != SRCLINE_UNKNOWN) && (end_line != SRCLINE_UNKNOWN)) {
                                ^  ~~~~~~~~~~~~~~~
        util/block-info.c:298:51: error: result of comparison against a string
        literal is unspecified (use an explicit string comparison function
        instead) [-Werror,-Wstring-compare]
                if ((start_line != SRCLINE_UNKNOWN) && (end_line != SRCLINE_UNKNOWN)) {
                                                                 ^  ~~~~~~~~~~~~~~~
        util/map.c:434:15: error: result of comparison against a string literal
        is unspecified (use an explicit string comparison function instead)
        [-Werror,-Wstring-compare]
                        if (srcline != SRCLINE_UNKNOWN)
                                    ^  ~~~~~~~~~~~~~~~
      
      Reviewer Notes:
      
      Looks good to me. Some more context:
      https://clang.llvm.org/docs/DiagnosticsReference.html#wstring-compare
      The spec says:
      J.1 Unspecified behavior
      The following are unspecified:
      .. Whether two string literals result in distinct arrays (6.4.5).
      Signed-off-by: default avatarNick Desaulniers <nick.desaulniers@gmail.com>
      Reviewed-by: default avatarIan Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Changbin Du <changbin.du@intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Keeping <john@metanate.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: clang-built-linux@googlegroups.com
      Link: https://github.com/ClangBuiltLinux/linux/issues/900
      Link: http://lore.kernel.org/lkml/20200223193456.25291-1-nick.desaulniers@gmail.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      c395c355
    • Ingo Molnar's avatar
      Merge tag 'perf-urgent-for-mingo-5.6-20200303' of... · b95b4d5e
      Ingo Molnar authored
      Merge tag 'perf-urgent-for-mingo-5.6-20200303' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
      
      Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
      
      perf symbols:
      
        Arnaldo Carvalho de Melo:
      
        - Don't try to find a vmlinux file when looking for kernel modules,
          fixing symbol resolution in systems with compressed kernel modules.
      
      perf env:
      
        Arnaldo Carvalho de Melo:
      
        - Do not return pointers to local variables, fixing valid warning from
          gcc 10 for corner case that stops the build due to -Werror.
      
      perf tests:
      
        Arnaldo Carvalho de Melo:
      
        - Make global variable static in the bp_account entry to fix build
          with gcc 10.
      
      perf parse-events:
      
        Arnaldo Carvalho de Melo:
      
        - Use asprintf() instead of strncpy() to read tracepoint files, addressing
          compiler warning that stops the build as we use -Werror.
      
      perf bench:
      
        Arnaldo Carvalho de Melo:
      
        - Share some global variables to fix build with gcc 10.
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      b95b4d5e