1. 08 Mar, 2016 19 commits
    • Kan Liang's avatar
      perf/x86/intel: Fix PEBS warning by only restoring active PMU in pmi · c3d266c8
      Kan Liang authored
      This patch tries to fix a PEBS warning found in my stress test. The
      following perf command can easily trigger the pebs warning or spurious
      NMI error on Skylake/Broadwell/Haswell platforms:
      
        sudo perf record -e 'cpu/umask=0x04,event=0xc4/pp,cycles,branches,ref-cycles,cache-misses,cache-references' --call-graph fp -b -c1000 -a
      
      Also the NMI watchdog must be enabled.
      
      For this case, the events number is larger than counter number. So
      perf has to do multiplexing.
      
      In perf_mux_hrtimer_handler, it does perf_pmu_disable(), schedule out
      old events, rotate_ctx, schedule in new events and finally
      perf_pmu_enable().
      
      If the old events include precise event, the MSR_IA32_PEBS_ENABLE
      should be cleared when perf_pmu_disable().  The MSR_IA32_PEBS_ENABLE
      should keep 0 until the perf_pmu_enable() is called and the new event is
      precise event.
      
      However, there is a corner case which could restore PEBS_ENABLE to
      stale value during the above period. In perf_pmu_disable(), GLOBAL_CTRL
      will be set to 0 to stop overflow and followed PMI. But there may be
      pending PMI from an earlier overflow, which cannot be stopped. So even
      GLOBAL_CTRL is cleared, the kernel still be possible to get PMI. At
      the end of the PMI handler, __intel_pmu_enable_all() will be called,
      which will restore the stale values if old events haven't scheduled
      out.
      
      Once the stale pebs value is set, it's impossible to be corrected if
      the new events are non-precise. Because the pebs_enabled will be set
      to 0. x86_pmu.enable_all() will ignore the MSR_IA32_PEBS_ENABLE
      setting. As a result, the following NMI with stale PEBS_ENABLE
      trigger pebs warning.
      
      The pending PMI after enabled=0 will become harmless if the NMI handler
      does not change the state. This patch checks cpuc->enabled in pmi and
      only restore the state when PMU is active.
      
      Here is the dump:
      
        Call Trace:
         <NMI>  [<ffffffff813c3a2e>] dump_stack+0x63/0x85
         [<ffffffff810a46f2>] warn_slowpath_common+0x82/0xc0
         [<ffffffff810a483a>] warn_slowpath_null+0x1a/0x20
         [<ffffffff8100fe2e>] intel_pmu_drain_pebs_nhm+0x2be/0x320
         [<ffffffff8100caa9>] intel_pmu_handle_irq+0x279/0x460
         [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40
         [<ffffffff811f290d>] ? vunmap_page_range+0x20d/0x330
         [<ffffffff811f2f11>] ?  unmap_kernel_range_noflush+0x11/0x20
         [<ffffffff8148379f>] ? ghes_copy_tofrom_phys+0x10f/0x2a0
         [<ffffffff814839c8>] ? ghes_read_estatus+0x98/0x170
         [<ffffffff81005a7d>] perf_event_nmi_handler+0x2d/0x50
         [<ffffffff810310b9>] nmi_handle+0x69/0x120
         [<ffffffff810316f6>] default_do_nmi+0xe6/0x100
         [<ffffffff810317f2>] do_nmi+0xe2/0x130
         [<ffffffff817aea71>] end_repeat_nmi+0x1a/0x1e
         [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40
         [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40
         [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40
         <<EOE>>  <IRQ>  [<ffffffff81006df8>] ?  x86_perf_event_set_period+0xd8/0x180
         [<ffffffff81006eec>] x86_pmu_start+0x4c/0x100
         [<ffffffff8100722d>] x86_pmu_enable+0x28d/0x300
         [<ffffffff811994d7>] perf_pmu_enable.part.81+0x7/0x10
         [<ffffffff8119cb70>] perf_mux_hrtimer_handler+0x200/0x280
         [<ffffffff8119c970>] ?  __perf_install_in_context+0xc0/0xc0
         [<ffffffff8110f92d>] __hrtimer_run_queues+0xfd/0x280
         [<ffffffff811100d8>] hrtimer_interrupt+0xa8/0x190
         [<ffffffff81199080>] ?  __perf_read_group_add.part.61+0x1a0/0x1a0
         [<ffffffff81051bd8>] local_apic_timer_interrupt+0x38/0x60
         [<ffffffff817af01d>] smp_apic_timer_interrupt+0x3d/0x50
         [<ffffffff817ad15c>] apic_timer_interrupt+0x8c/0xa0
         <EOI>  [<ffffffff81199080>] ?  __perf_read_group_add.part.61+0x1a0/0x1a0
         [<ffffffff81123de5>] ?  smp_call_function_single+0xd5/0x130
         [<ffffffff81123ddb>] ?  smp_call_function_single+0xcb/0x130
         [<ffffffff81199080>] ?  __perf_read_group_add.part.61+0x1a0/0x1a0
         [<ffffffff8119765a>] event_function_call+0x10a/0x120
         [<ffffffff8119c660>] ? ctx_resched+0x90/0x90
         [<ffffffff811971e0>] ? cpu_clock_event_read+0x30/0x30
         [<ffffffff811976d0>] ? _perf_event_disable+0x60/0x60
         [<ffffffff8119772b>] _perf_event_enable+0x5b/0x70
         [<ffffffff81197388>] perf_event_for_each_child+0x38/0xa0
         [<ffffffff811976d0>] ? _perf_event_disable+0x60/0x60
         [<ffffffff811a0ffd>] perf_ioctl+0x12d/0x3c0
         [<ffffffff8134d855>] ? selinux_file_ioctl+0x95/0x1e0
         [<ffffffff8124a3a1>] do_vfs_ioctl+0xa1/0x5a0
         [<ffffffff81036d29>] ? sched_clock+0x9/0x10
         [<ffffffff8124a919>] SyS_ioctl+0x79/0x90
         [<ffffffff817ac4b2>] entry_SYSCALL_64_fastpath+0x1a/0xa4
        ---[ end trace aef202839fe9a71d ]---
        Uhhuh. NMI received for unknown reason 2d on CPU 2.
        Do you have a strange power saving mode enabled?
      Signed-off-by: default avatarKan Liang <kan.liang@intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1457046448-6184-1-git-send-email-kan.liang@intel.com
      [ Fixed various typos and other small details. ]
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      c3d266c8
    • Jiri Olsa's avatar
      perf/x86/intel: Use PAGE_SIZE for PEBS buffer size on Core2 · e72daf3f
      Jiri Olsa authored
      Using PAGE_SIZE buffers makes the WRMSR to PERF_GLOBAL_CTRL in
      intel_pmu_enable_all() mysteriously hang on Core2. As a workaround, we
      don't do this.
      
      The hard lockup is easily triggered by running 'perf test attr'
      repeatedly. Most of the time it gets stuck on sample session with
      small periods.
      
        # perf test attr -vv
        14: struct perf_event_attr setup                             :
        --- start ---
        ...
          'PERF_TEST_ATTR=/tmp/tmpuEKz3B /usr/bin/perf record -o /tmp/tmpuEKz3B/perf.data -c 123 kill >/dev/null 2>&1' ret 1
      Reported-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: default avatarAndi Kleen <ak@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/20160301190352.GA8355@krava.redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      e72daf3f
    • Alexander Shishkin's avatar
      perf/core: Fix perf_sched_count derailment · 927a5570
      Alexander Shishkin authored
      The error path in perf_event_open() is such that asking for a sampling
      event on a PMU that doesn't generate interrupts will end up in dropping
      the perf_sched_count even though it hasn't been incremented for this
      event yet.
      
      Given a sufficient amount of these calls, we'll end up disabling
      scheduler's jump label even though we'd still have active events in the
      system, thereby facilitating the arrival of the infernal regions upon us.
      
      I'm fixing this by moving account_event() inside perf_event_alloc().
      Signed-off-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/1456917854-29427-1-git-send-email-alexander.shishkin@linux.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      927a5570
    • Ingo Molnar's avatar
      Merge branch 'email/acme' into perf/core · b9461ba8
      Ingo Molnar authored
      Merge perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
          User visible changes:
      
       - Allow grouping multiple sort keys per 'perf report/top --hierarchy'
         level (Namhyung Kim)
      
       - Document 'perf stat --detailed' option (Borislav Petkov)
      
      Infrastructure changes:
      
       - jitdump prep work for supporting it with Intel PT (Adrian Hunter)
      
       - Use 64-bit shifts with (TSC) time conversion (Adrian Hunter)
      
      Fixes:
      
       - Explicitly declare inc_group_count as a void function (Colin Ian King)
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      b9461ba8
    • Namhyung Kim's avatar
      perf report: Use hierarchy hpp list on gtk · 58ecd33b
      Namhyung Kim authored
      Now hpp formats are linked using perf_hpp_list_node when hierarchy is
      enabled.  Like in stdio, use this info to print entries with multiple
      sort keys in a single hierarchy properly.
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1457361308-514-8-git-send-email-namhyung@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      58ecd33b
    • Namhyung Kim's avatar
      perf hists browser: Use hierarchy hpp list · a61a22f6
      Namhyung Kim authored
      Now hpp formats are linked using perf_hpp_list_node when hierarchy is
      enabled.  Like in stdio, use this info to print entries with multiple
      sort keys in a single hierarchy properly.
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1457361308-514-7-git-send-email-namhyung@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      a61a22f6
    • Namhyung Kim's avatar
      perf report: Use hierarchy hpp list on stdio · f58c95e3
      Namhyung Kim authored
      Now hpp formats are linked using perf_hpp_list_node when hierarchy is
      enabled.  Use this info to print entries with multiple sort keys in a
      single hierarchy properly.
      
      For example, the below example shows using 4 sort keys with 2 levels.
      
        $ perf report --hierarchy -s '{prev_pid,prev_comm},{next_pid,next_comm}' \
         --percent-limit 1 -i perf.data.sched
        ...
        #    Overhead  prev_pid+prev_comm / next_pid+next_comm
        # ...........  .......................................
        #
            22.36%     0  swapper/0
                9.48%     17773  transmission-gt
                5.25%     109  kworker/0:1H
                1.53%     6524  Xephyr
            21.39%     17773  transmission-gt
                9.52%     0  swapper/0
                9.04%     0  swapper/2
                1.78%     0  swapper/3
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1457361308-514-6-git-send-email-namhyung@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      f58c95e3
    • Namhyung Kim's avatar
      perf hists: Fix indent for multiple hierarchy sort key · 2dbbe9f2
      Namhyung Kim authored
      When multiple sort keys are used in a single hierarchy, it should indent
      using number of hierarchy levels instead of number of sort keys.
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1457361308-514-5-git-send-email-namhyung@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      2dbbe9f2
    • Namhyung Kim's avatar
      perf hists: Support multiple sort keys in a hierarchy level · a23f37e8
      Namhyung Kim authored
      This implements having multiple sort keys in a single hierarchy level.
      Originally only single sort key is supported for each level, but now
      using the group syntax with '{ }', it can set more than one sort key in
      one level.  Note that now it needs to quote in order to prevent shell
      interpretation.
      
      For example:
      
        $ perf report --hierarchy -s '{comm,dso},sym'
        ...
        #       Overhead  Command / Shared Object / Symbol
        # ..............  ..........................................
        #
            48.67%        swapper          [kernel.vmlinux]
               34.42%        [k] intel_idle
                1.30%        [k] __tick_nohz_idle_enter
                1.03%        [k] cpuidle_reflect
             8.87%        firefox          libpthread-2.22.so
                6.60%        [.] __GI___libc_recvmsg
                1.18%        [.] pthread_cond_signal@@GLIBC_2.3.2
                1.09%        [.] 0x000000000000ff4b
             6.11%        Xorg             libc-2.22.so
                5.27%        [.] __memcpy_sse2_unaligned
      
      In the above example, the command name and the shared object name are
      shown on the same line but the symbol name is on the different line.
      Since the first two are grouped by '{}', they are in the same level.
      
      Suggested-and-Tested=by: Arnaldo Carvalho de Melo <acme@kernel.org>
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1457361308-514-4-git-send-email-namhyung@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      a23f37e8
    • Namhyung Kim's avatar
      perf hists: Use own hpp_list for hierarchy mode · 1b2dbbf4
      Namhyung Kim authored
      Now each hists has its own hpp lists in hierarchy.  So instead of having
      a pointer to a single perf_hpp_fmt in a hist entry, make it point the
      hpp_list for its level.  This will be used to support multiple sort keys
      in a single hierarchy level.
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1457361308-514-3-git-send-email-namhyung@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      1b2dbbf4
    • Namhyung Kim's avatar
      perf hists: Introduce perf_hpp__setup_hists_formats() · c3bc0c43
      Namhyung Kim authored
      The perf_hpp__setup_hists_formats() is to build hists-specific output
      formats (and sort keys).  Currently it's only used in order to build the
      output format in a hierarchy with same sort keys, but it could be used
      with different sort keys in non-hierarchy mode later.
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1457361308-514-2-git-send-email-namhyung@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      c3bc0c43
    • Borislav Petkov's avatar
      perf stat: Document --detailed option · f594bae0
      Borislav Petkov authored
      I'm surprised this remained undocumented since at least 2011. And it is
      actually a very useful switch, as Steve and I came to realize recently.
      
      Add the text from
      
        2cba3ffb ("perf stat: Add -d -d and -d -d -d options to show more CPU events")
      
      which added the incrementing aspect to -d.
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Davidlohr Bueso <dbueso@suse.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mel Gorman <mgorman@suse.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 2cba3ffb ("perf stat: Add -d -d and -d -d -d options to show more CPU events")
      Link: http://lkml.kernel.org/r/1457347294-32546-1-git-send-email-bp@alien8.deSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      f594bae0
    • Namhyung Kim's avatar
      perf hists: Add level field to struct perf_hpp_fmt · 4b633eba
      Namhyung Kim authored
      The level field is to distinguish levels in the hierarchy mode.
      Currently each column (perf_hpp_fmt) has a different level.
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1457103582-28396-2-git-send-email-namhyung@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      4b633eba
    • Adrian Hunter's avatar
      perf tools: Use 64-bit shifts with (TSC) time conversion · a23f96ee
      Adrian Hunter authored
      Commit b9511cd7 ("perf/x86: Fix time_shift in perf_event_mmap_page")
      altered the time conversion algorithms documented in the perf_event.h
      header file, to use 64-bit shifts.  That was done to make the code more
      future-proof (i.e. some time in the future a 32-bit shift could be
      allowed).  Reflect those changes in perf tools.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1457005856-6143-9-git-send-email-adrian.hunter@intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      a23f96ee
    • Adrian Hunter's avatar
      perf jit: Move clockid validation · 4a018cc4
      Adrian Hunter authored
      Move clockid validation into jit_process() so it can later be made
      conditional.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1457005856-6143-6-git-send-email-adrian.hunter@intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      4a018cc4
    • Adrian Hunter's avatar
      perf jit: Let jit_process() return errors · 570735b3
      Adrian Hunter authored
      In preparation for moving clockid validation into jit_process().
      
      Previously a return value of zero meant the processing had been done and
      non-zero meant either the processing was not done (i.e. not the jitdump
      file mmap event) or an error occurred.
      
      Change it so that zero means the processing was not done, one means the
      processing was done and successful, and negative values are an error.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1457005856-6143-5-git-send-email-adrian.hunter@intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      570735b3
    • Adrian Hunter's avatar
      perf session: Simplify tool stubs · 5fb0ac16
      Adrian Hunter authored
      Some of the stubs are identical so just have one function for them.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1457005856-6143-3-git-send-email-adrian.hunter@intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      5fb0ac16
    • Adrian Hunter's avatar
      perf inject: Hit all DSOs for AUX data in JIT and other cases · 640dad47
      Adrian Hunter authored
      Currently, when injecting build ids, if there is AUX data then 'perf
      inject' hits all DSOs because it is not known which DSOs the trace data
      would hit.
      
      That needs to be done for JIT injection also, and in fact there is no
      reason to distinguish what kind of injection is being done.  That is,
      any time there is AUX data and the HEADER_BUID_ID feature flag is set,
      and the AUX data is not being processed, then hit all DSOs.  This patch
      does that.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1457005856-6143-2-git-send-email-adrian.hunter@intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      640dad47
    • Colin Ian King's avatar
      perf tools: Explicitly declare inc_group_count as a void function · 07ef7574
      Colin Ian King authored
      The return type is not defined, so it defaults to int, however, the
      function is not returning anything, so this is clearly not correct. Make
      it a void function.
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1457008214-14393-1-git-send-email-colin.king@canonical.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      07ef7574
  2. 04 Mar, 2016 1 commit
    • Ingo Molnar's avatar
      Merge tag 'perf-core-for-mingo-20160303' of... · 00966852
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo-20160303' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes:
      
      User visible changes:
      
       - Check existence of frontend/backed stalled cycles in 'perf stat' (Andi Kleen)
      
       - Implement CSV metrics output in 'perf stat' (Andi Kleen)
      
       - Support metrics in 'perf stat' --per-core/socket mode (Andi Kleen)
      
       - Avoid installing .o files from tools/lib/ into the python extension (Jiri Olsa)
      
       - Rename the tracepoint '/format' field that carries the syscall ID from 'nr',
         that is also the name of some syscalls arguments, to "__syscall_nr", to
         avoid having multiple fields with the same name, that was breaking the
         python script skeleton generator from perf.data files (Taeung Song)
      
       - Support converting data from bpf events in 'perf data' (Wang Nan)
      
       - Fix segfault in 'perf test' hists related entries (Arnaldo Carvalho de Melo)
      
       - Fix output of %llu for 64 bit values read on 32 bit machines in libtraceevent (Steven Rostedt)
      
       - Fix time stamp rounding issue in libtraceevent (Chaos.Chen)
      
      Infrastructure changes:
      
       - Fix setlocale() breakage in the pmu parsing code (Jiri Olsa)
      
       - Split libtraceevent's pevent_print_event() (Steven Rostedt)
      
       - Librarize some 'perf record' bits to allow handling multiple perf.data
         files per session (Wang Nan)
      
       - Ensure return non-zero rc when mmap fails in 'perf record' (Wang Nan)
      
       - Fix double free on 'command_line' in a error path in 'perf script' (Colin Ian King)
      
       - Initialize struct sigaction 'sa_flags' field in a 'perf test' entry (Colin Ian King)
      
       - Fix various build warnings in turbostat, detected with gcc6 (Colin Ian King)
      
       - Use .s extension for preprocessed assembler code (Masahiro Yamada)
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      00966852
  3. 03 Mar, 2016 20 commits
    • Andi Kleen's avatar
      perf stat: Check for frontend stalled for metrics · fb4605ba
      Andi Kleen authored
      Add an extra check for frontend stalled in the metrics.  This avoids an
      extra column for the --metric-only case when the CPU does not support
      frontend stalled.
      
      v2: Add separate init function
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/1456858672-21594-8-git-send-email-andi@firstfloor.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      fb4605ba
    • Colin Ian King's avatar
      tools/power turbostat: fix various build warnings · 1b69317d
      Colin Ian King authored
      When building with gcc 6 we're getting various build warnings that just
      require some trivial function declaration and call fixes:
      
        turbostat.c: In function ‘dump_cstate_pstate_config_info’:
        turbostat.c:1973:1: warning: type of ‘family’ defaults to ‘int’
         dump_cstate_pstate_config_info(family, model)
        turbostat.c:1973:1: warning: type of ‘model’ defaults to ‘int’
        turbostat.c: In function ‘get_tdp’:
        turbostat.c:2145:8: warning: type of ‘model’ defaults to ‘int’
         double get_tdp(model)
        turbostat.c: In function ‘perf_limit_reasons_probe’:
        turbostat.c:2259:6: warning: type of ‘family’ defaults to ‘int’
         void perf_limit_reasons_probe(family, model)
        turbostat.c:2259:6: warning: type of ‘model’ defaults to ‘int’
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-wbicer8n0s9qe6ql8h9x478e@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      1b69317d
    • Colin Ian King's avatar
      perf tests: Initialize sa.sa_flags · e17a0e16
      Colin Ian King authored
      The sa_flags field is not being initialized, so a garbage value is being
      passed to sigaction.  Initialize it to zero.
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1456923322-29697-1-git-send-email-colin.king@canonical.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      e17a0e16
    • Arnaldo Carvalho de Melo's avatar
      perf test: Fix hists related entries · 9b240637
      Arnaldo Carvalho de Melo authored
      That got broken by d3a72fd8 ("perf report: Fix indentation of
      dynamic entries in hierarchy"), by using the evlist in setup_sorting()
      without checking if it is NULL, as done in some 'perf test' entries:
      
        $ find tools/ -name "*.c" | xargs grep 'setup_sorting(NULL);'
        tools/perf/tests/hists_output.c:      setup_sorting(NULL);
        tools/perf/tests/hists_output.c:      setup_sorting(NULL);
        tools/perf/tests/hists_output.c:      setup_sorting(NULL);
        tools/perf/tests/hists_output.c:      setup_sorting(NULL);
        tools/perf/tests/hists_output.c:      setup_sorting(NULL);
        tools/perf/tests/hists_cumulate.c:    setup_sorting(NULL);
        tools/perf/tests/hists_cumulate.c:    setup_sorting(NULL);
        tools/perf/tests/hists_cumulate.c:    setup_sorting(NULL);
        tools/perf/tests/hists_cumulate.c:    setup_sorting(NULL);
        $
      
      Fix it.
      
      Before:
      
        [root@jouet ~]# perf test
        <SNIP>
        15: Test matching and linking multiple hists                 : FAILED!
        16: Try 'import perf' in python, checking link problems      : Ok
        17: Test breakpoint overflow signal handler                  : Ok
        18: Test breakpoint overflow sampling                        : Ok
        19: Test number of exit event of a simple workload           : Ok
        20: Test software clock events have valid period values      : Ok
        21: Test object code reading                                 : Ok
        22: Test sample parsing                                      : Ok
        23: Test using a dummy software event to keep tracking       : Ok
        24: Test parsing with no sample_id_all bit set               : Ok
        25: Test filtering hist entries                              : FAILED!
        26: Test mmap thread lookup                                  : Ok
        27: Test thread mg sharing                                   : Ok
        28: Test output sorting of hist entries                      : FAILED!
        29: Test cumulation of child hist entries                    : FAILED!
        <SNIP>
      
      After the patch the above failed tests complete successfully.
      Acked-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: d3a72fd8 ("perf report: Fix indentation of dynamic entries in hierarchy")
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      9b240637
    • Steven Rostedt (Red Hat)'s avatar
      tools lib traceevent: Fix output of %llu for 64 bit values read on 32 bit machines · a66673a0
      Steven Rostedt (Red Hat) authored
      When a long value is read on 32 bit machines for 64 bit output, the
      parsing needs to change "%lu" into "%llu", as the value is read
      natively.
      
      Unfortunately, if "%llu" is already there, the code will add another "l"
      to it and fail to parse it properly.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20160209204237.337024613@goodmis.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      a66673a0
    • Steven Rostedt (Red Hat)'s avatar
      tools lib traceevent: Set int_array fields to NULL if freeing from error · 9ec72eaf
      Steven Rostedt (Red Hat) authored
      Had a bug where on error of parsing __print_array() where the fields are
      freed after they were allocated, but since they were not set to NULL,
      the freeing of the arg also tried to free the already freed fields
      causing a double free.
      
      Fix process_hex() while at it.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20160209204237.188327674@goodmis.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      9ec72eaf
    • Chaos.Chen's avatar
      tools lib traceevent: Fix time stamp rounding issue · 21a30100
      Chaos.Chen authored
      When rounding to microseconds, if the timestamp subsecond is between
      .999999500 and .999999999, it is rounded to .1000000, when it should
      instead increment the second counter due to the overflow.
      
      For example, if the timestamp is 1234.999999501 instead of seeing:
      
        1235.000000
      
      we see:
      
        1234.1000000
      Signed-off-by: default avatarChaos.Chen <rainboy1215@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20160209204236.824426460@goodmis.org
      [ fixed incrementing "secs" instead of decrementing it ]
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      21a30100
    • Colin Ian King's avatar
      perf script: Fix double free on command_line · 979ac257
      Colin Ian King authored
      The 'command_line' variable is free'd twice if db_export__branch_types()
      fails. To avoid this, defer the free'ing of 'command_line' to after this
      call so that the error return path will just free 'command_line' once.
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Javi Merino <javi.merino@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/1456875980-25606-1-git-send-email-colin.king@canonical.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      979ac257
    • Masahiro Yamada's avatar
      tools build: Use .s extension for preprocessed assembler code · 67678793
      Masahiro Yamada authored
      The "man gcc" says .i extension represents the file is C source code
      that should not be preprocessed.  Here, .s should be used.
      
      For clarification,
        .c  ---(preprocess)--->  .i
        .S  ---(preprocess)--->  .s
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Aaro Koskinen <aaro.koskinen@nokia.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Lukas Wunner <lukas@wunner.de>
      Link: http://lkml.kernel.org/r/1454263140-19670-1-git-send-email-yamada.masahiro@socionext.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      67678793
    • Andi Kleen's avatar
      perf stat: Support metrics in --per-core/socket mode · 44d49a60
      Andi Kleen authored
      Enable metrics printing in --per-core / --per-socket mode. We need to
      save the shadow metrics in a unique place. Always use the first CPU in
      the aggregation. Then use the same CPU to retrieve the shadow value
      later.
      
      Example output:
      
        % perf stat --per-core -a ./BC1s
      
         Performance counter stats for 'system wide':
      
        S0-C0 2   2966.020381 task-clock (msec) #   2.004 CPUs utilized  (100.00%)
        S0-C0 2            49 context-switches  #   0.017 K/sec          (100.00%)
        S0-C0 2             4 cpu-migrations    #   0.001 K/sec          (100.00%)
        S0-C0 2           467 page-faults       #   0.157 K/sec
        S0-C0 2 4,599,061,773 cycles            #   1.551 GHz            (100.00%)
        S0-C0 2 9,755,886,883 instructions      #   2.12  insn per cycle (100.00%)
        S0-C0 2 1,906,272,125 branches          # 642.704 M/sec          (100.00%)
        S0-C0 2    81,180,867 branch-misses     #   4.26% of all branches
        S0-C1 2   2965.995373 task-clock (msec) #   2.003 CPUs utilized  (100.00%)
        S0-C1 2            62 context-switches  #   0.021 K/sec          (100.00%)
        S0-C1 2             8 cpu-migrations    #   0.003 K/sec          (100.00%)
        S0-C1 2           281 page-faults       #   0.095 K/sec
        S0-C1 2     6,347,290 cycles            #   0.002 GHz            (100.00%)
        S0-C1 2     4,654,156 instructions      #   0.73  insn per cycle (100.00%)
        S0-C1 2       947,121 branches          #   0.319 M/sec          (100.00%)
        S0-C1 2        37,322 branch-misses     #   3.94% of all branches
      
               1.480409747 seconds time elapsed
      
      v2: Rebase to older patches
      v3: Document shadow cpus. Fix aggr_get_id argument. Fix -A shadows (Jiri)
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/1456785386-19481-4-git-send-email-andi@firstfloor.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      44d49a60
    • Andi Kleen's avatar
      perf stat: Implement CSV metrics output · 92a61f64
      Andi Kleen authored
      Now support CSV output for metrics. With the new output callbacks this
      is relatively straight forward by creating new callbacks.
      
      This allows to easily plot metrics from CSV files.
      
      The new line callback needs to know the number of fields to skip them
      correctly
      
      Example output before:
      
        % perf stat -x, true
        0.200687,,task-clock,200687,100.00
        0,,context-switches,200687,100.00
        0,,cpu-migrations,200687,100.00
        40,,page-faults,200687,100.00
        730871,,cycles,203601,100.00
        551056,,stalled-cycles-frontend,203601,100.00
        <not supported>,,stalled-cycles-backend,0,100.00
        385523,,instructions,203601,100.00
        78028,,branches,203601,100.00
        3946,,branch-misses,203601,100.00
      
      After:
      
        % perf stat -x, true
        .502457,,task-clock,502457,100.00,0.485,CPUs utilized
        0,,context-switches,502457,100.00,0.000,K/sec
        0,,cpu-migrations,502457,100.00,0.000,K/sec
        45,,page-faults,502457,100.00,0.090,M/sec
        644692,,cycles,509102,100.00,1.283,GHz
        423470,,stalled-cycles-frontend,509102,100.00,65.69,frontend cycles idle
        <not supported>,,stalled-cycles-backend,0,100.00,,,,
        492701,,instructions,509102,100.00,0.76,insn per cycle
        ,,,,,0.86,stalled cycles per insn
        97767,,branches,509102,100.00,194.578,M/sec
        4788,,branch-misses,509102,100.00,4.90,of all branches
      
      or easier readable
      
        $ perf stat  -x, -o x.csv true
        $ column -s, -t x.csv
        0.490635        task-clock              490635 100.00 0.489   CPUs utilized
        0               context-switches        490635 100.00 0.000   K/sec
        0               cpu-migrations          490635 100.00 0.000   K/sec
        45              page-faults             490635 100.00 0.092   M/sec
        629080          cycles                  497698 100.00 1.282   GHz
        409498          stalled-cycles-frontend 497698 100.00 65.09   frontend cycles idle
        <not supported> stalled-cycles-backend  0      100.00
        491424          instructions            497698 100.00 0.78    insn per cycle
                                                              0.83    stalled cycles per insn
        97278           branches                497698 100.00 198.270 M/sec
        4569            branch-misses           497698 100.00 4.70    of all branches
      
      Two new fields are added: metric value and metric name.
      
      v2: Split out function argument changes
      v3: Reenable metrics for real.
      v4: Fix wrong hunk from refactoring.
      v5: Remove extra "noise" printing (Jiri), but add it to the not counted case.
      Print empty metrics for not counted.
      v6: Avoid outputting metric on empty format.
      v7: Print metric at the end
      v8: Remove extra run, ena fields
      v9: Avoid extra new line for unsupported counters
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Link: http://lkml.kernel.org/r/1456785386-19481-3-git-send-email-andi@firstfloor.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      92a61f64
    • Wang Nan's avatar
      perf record: Ensure return non-zero rc when mmap fail · 95c36561
      Wang Nan authored
      perf_evlist__mmap_ex() can fail without setting errno (for example, fail
      in condition checking. In this case all syscall is success).
      
      If this happen, record__open() incorrectly returns 0. Force setting rc
      is a quick way to avoid this problem, or we have to follow all possible
      code path in perf_evlist__mmap_ex() to make sure there's at least one
      system call before returning an error.
      Signed-off-by: default avatarWang Nan <wangnan0@huawei.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1456479154-136027-30-git-send-email-wangnan0@huawei.comSigned-off-by: default avatarHe Kuang <hekuang@huawei.com>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      95c36561
    • Wang Nan's avatar
      perf record: Introduce record__finish_output() to finish a perf.data · e1ab48ba
      Wang Nan authored
      Move code for finalizing 'perf.data' to record__finish_output(). It will
      be used by following commits to split output to multiple files.
      Signed-off-by: default avatarHe Kuang <hekuang@huawei.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1456479154-136027-23-git-send-email-wangnan0@huawei.comSigned-off-by: default avatarWang Nan <wangnan0@huawei.com>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      e1ab48ba
    • Wang Nan's avatar
      perf record: Extract synthesize code to record__synthesize() · c45c86eb
      Wang Nan authored
      Create record__synthesize(). It can be used to create tracking events
      for each perf.data after perf supporting splitting into multiple
      outputs.
      Signed-off-by: default avatarHe Kuang <hekuang@huawei.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1456479154-136027-20-git-send-email-wangnan0@huawei.comSigned-off-by: default avatarWang Nan <wangnan0@huawei.com>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      c45c86eb
    • Wang Nan's avatar
      perf record: Use WARN_ONCE to replace 'if' condition · d8871ea7
      Wang Nan authored
      Commits in a BPF patchkit will extract kernel and module synthesizing
      code into a separated function and call it multiple times. This patch
      replace 'if (err < 0)' using WARN_ONCE, makes sure the error message
      show one time.
      Signed-off-by: default avatarWang Nan <wangnan0@huawei.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1456479154-136027-19-git-send-email-wangnan0@huawei.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      d8871ea7
    • Wang Nan's avatar
      perf data: Explicitly set byte order for integer types · f8dd2d5f
      Wang Nan authored
      After babeltrace commit 5cec03e402aa ("ir: copy variants and sequences
      when setting a field path"), 'perf data convert' gets incorrect result
      if there's bpf output data. For example:
      
       # perf data convert --to-ctf ./out.ctf
       # babeltrace ./out.ctf
       [10:44:31.186045346] (+?.?????????) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF810E7DD1, perf_tid = 23819, perf_pid = 23819, perf_id = 518, raw_len = 3, raw_data = [ [0] = 0xC028E32F, [1] = 0x815D0100, [2] = 0x1000000 ] }
       [10:44:31.286101003] (+0.100055657) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF8105B609, perf_tid = 23819, perf_pid = 23819, perf_id = 518, raw_len = 3, raw_data = [ [0] = 0x35D9F1EB, [1] = 0x15D81, [2] = 0x2 ] }
      
      The expected result of the first sample should be:
      
       raw_data = [ [0] = 0x2FE328C0, [1] = 0x15D81, [2] = 0x1 ] }
      
      however, 'perf data convert' output big endian value to resuling CTF
      file.
      
      The reason is a internal change (or a bug?) of babeltrace.
      
      Before this patch, at the first add_bpf_output_values(), byte order of
      all integer type is uncertain (is 0, neither 1234 (le) nor 4321 (be)).
      It would be fixed by:
      
      perf_evlist__deliver_sample
       -> process_sample_event
         -> ctf_stream
            ...
            ->bt_ctf_trace_add_stream_class
              ->bt_ctf_field_type_structure_set_byte_order
                ->bt_ctf_field_type_integer_set_byte_order
      
      during creating the stream.
      
      However, the babeltrace commit mentioned above duplicates types in
      sequence to prevent potential conflict in following call stack and link
      the newly allocated type into the 'raw_data' sequence:
      
      perf_evlist__deliver_sample
       -> process_sample_event
         -> ctf_stream
            ...
            -> bt_ctf_trace_add_stream_class
              -> bt_ctf_stream_class_resolve_types
                 ...
                 -> bt_ctf_field_type_sequence_copy
                   ->bt_ctf_field_type_integer_copy
      
      This happens before byte order setting, so only the newly allocated
      type is initialized, the byte order of original type perf choose to
      create the first raw_data is still uncertain.
      
      Byte order in CTF output is not related to byte order in perf.data.
      Setting it to anything other than BT_CTF_BYTE_ORDER_NATIVE solves this
      problem (only BT_CTF_BYTE_ORDER_NATIVE needs to be fixed). To reduce
      behavior changing, set byte order according to compiling options.
      Signed-off-by: default avatarWang Nan <wangnan0@huawei.com>
      Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jérémie Galarneau <jeremie.galarneau@efficios.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1456479154-136027-10-git-send-email-wangnan0@huawei.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      f8dd2d5f
    • Wang Nan's avatar
      perf data: Support converting data from bpf_perf_event_output() · 6122d57e
      Wang Nan authored
      bpf_perf_event_output() outputs data through sample->raw_data. This
      patch adds support to convert those data into CTF. A python script then
      can be used to process output data from BPF programs.
      
      Test result:
      
        # cat ./test_bpf_output_2.c
        /************************ BEGIN **************************/
        #include <uapi/linux/bpf.h>
        struct bpf_map_def {
       	unsigned int type;
       	unsigned int key_size;
       	unsigned int value_size;
       	unsigned int max_entries;
        };
        #define SEC(NAME) __attribute__((section(NAME), used))
        static u64 (*ktime_get_ns)(void) =
       	(void *)BPF_FUNC_ktime_get_ns;
        static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
       	(void *)BPF_FUNC_trace_printk;
        static int (*get_smp_processor_id)(void) =
       	(void *)BPF_FUNC_get_smp_processor_id;
        static int (*perf_event_output)(void *, struct bpf_map_def *, int, void *, unsigned long) =
       	(void *)BPF_FUNC_perf_event_output;
      
        struct bpf_map_def SEC("maps") channel = {
       	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
       	.key_size = sizeof(int),
       	.value_size = sizeof(u32),
       	.max_entries = __NR_CPUS__,
        };
      
        static inline int __attribute__((always_inline))
        func(void *ctx, int type)
        {
       	struct {
       		u64 ktime;
       		int type;
       	} __attribute__((packed)) output_data;
       	char error_data[] = "Error: failed to output\n";
       	int err;
      
       	output_data.type = type;
       	output_data.ktime = ktime_get_ns();
       	err = perf_event_output(ctx, &channel, get_smp_processor_id(),
       				&output_data, sizeof(output_data));
       	if (err)
       		trace_printk(error_data, sizeof(error_data));
       	return 0;
        }
        SEC("func_begin=sys_nanosleep")
        int func_begin(void *ctx) {return func(ctx, 1);}
        SEC("func_end=sys_nanosleep%return")
        int func_end(void *ctx) { return func(ctx, 2);}
        char _license[] SEC("license") = "GPL";
        int _version SEC("version") = LINUX_VERSION_CODE;
        /************************* END ***************************/
      
        # ./perf record -e bpf-output/no-inherit,name=evt/ \
                       -e ./test_bpf_output_2.c/map:channel.event=evt/ \
                       usleep 100000
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.012 MB perf.data (2 samples) ]
      
        # ./perf script
                usleep 14942 92503.198504: evt:  ffffffff810e0ba1 sys_nanosleep (/lib/modules/4.3.0....
                usleep 14942 92503.298562: evt:  ffffffff810585e9 kretprobe_trampoline_holder (/lib....
      
        # ./perf data convert --to-ctf ./out.ctf
        [ perf data convert: Converted 'perf.data' into CTF data './out.ctf' ]
        [ perf data convert: Converted and wrote 0.000 MB (2 samples) ]
      
        # babeltrace ./out.ctf
        [01:41:43.198504134] (+?.?????????) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF810E0BA1, perf_tid = 14942, perf_pid = 14942, perf_id = 1044, raw_len = 3, raw_data = [ [0] = 0x32C0C07B, [1] = 0x5421, [2] = 0x1 ] }
        [01:41:43.298562257] (+0.100058123) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF810585E9, perf_tid = 14942, perf_pid = 14942, perf_id = 1044, raw_len = 3, raw_data = [ [0] = 0x38B77FAA, [1] = 0x5421, [2] = 0x2 ] }
      
        # cat ./test_bpf_output_2.py
        from babeltrace import TraceCollection
        tc = TraceCollection()
        tc.add_trace('./out.ctf', 'ctf')
        d = {1:[], 2:[]}
        for event in tc.events:
           if not event.name.startswith('evt'):
               continue
           raw_data = event['raw_data']
           (time, type) = ((raw_data[0] + (raw_data[1] << 32)), raw_data[2])
           d[type].append(time)
        print(list(map(lambda i: d[2][i] - d[1][i], range(len(d[1])))));
      
        # python3 ./test_bpf_output_2.py
        [100056879]
      
      Committer note:
      
      Make sure you have python3-devel installed, not python-devel, which may
      be for python2, which will lead to some "PyInstance_Type" errors. Also
      make sure that you use the right libbabeltrace, because it is shipped
      in Fedora, for instance, but an older version.
      
      To build libbabeltrace's python binding one also needs to use:
      
       ./configure --enable-python-bindings
      
      And then set PYTHONPATH=/usr/local/lib64/python3.4/site-packages/.
      Signed-off-by: default avatarWang Nan <wangnan0@huawei.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1456479154-136027-9-git-send-email-wangnan0@huawei.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      6122d57e
    • Andi Kleen's avatar
      perf stat: Check existence of frontend/backed stalled cycles · 9dec4473
      Andi Kleen authored
      Only put the frontend/backend stalled cycles into the default perf stat
      events when the CPU actually supports them.
      
      This avoids empty columns with --metric-only on newer Intel CPUs.
      
      Committer note:
      
      Before:
      
        $ perf stat ls
      
          Performance counter stats for 'ls':
      
                1.080893     task-clock (msec)      #    0.619 CPUs utilized
                       0     context-switches       #    0.000 K/sec
                       0     cpu-migrations         #    0.000 K/sec
                      97     page-faults            #    0.090 M/sec
               3,327,741     cycles                 #    3.079 GHz
         <not supported>     stalled-cycles-frontend
         <not supported>     stalled-cycles-backend
               1,609,544     instructions           #    0.48  insn per cycle
                 319,117     branches               #  295.235 M/sec
                  12,246     branch-misses          #    3.84% of all branches
      
             0.001746508 seconds time elapsed
        $
      
      After:
      
        $ perf stat ls
      
          Performance counter stats for 'ls':
      
                0.693948     task-clock (msec)      #    0.662 CPUs utilized
                       0     context-switches       #    0.000 K/sec
                       0     cpu-migrations         #    0.000 K/sec
                      95     page-faults            #    0.137 M/sec
               1,792,509     cycles                 #    2.583 GHz
               1,599,047     instructions           #    0.89  insn per cycle
                 316,328     branches               #  455.838 M/sec
                  12,453     branch-misses          #    3.94% of all branches
      
             0.001048987 seconds time elapsed
        $
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1456532881-26621-2-git-send-email-andi@firstfloor.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      9dec4473
    • Jiri Olsa's avatar
      perf tools: Fix locale handling in pmu parsing · f9a5978a
      Jiri Olsa authored
      Ingo reported regression on display format of big numbers, which is
      missing separators (in default perf stat output).
      
       triton:~/tip> perf stat -a sleep 1
               ...
               127008602      cycles                    #    0.011 GHz
               279538533      stalled-cycles-frontend   #  220.09% frontend cycles idle
               119213269      instructions              #    0.94  insn per cycle
      
      This is caused by recent change:
      
        perf stat: Check existence of frontend/backed stalled cycles
      
      that added call to pmu_have_event, that subsequently calls
      perf_pmu__parse_scale, which has a bug in locale handling.
      
      The lc string returned from setlocale, that we use to store old locale
      value, may be allocated in static storage. Getting a dynamic copy to
      make it survive another setlocale call.
      
        $ perf stat ls
               ...
               2,360,602      cycles                    #    3.080 GHz
               2,703,090      instructions              #    1.15  insn per cycle
                 546,031      branches                  #  712.511 M/sec
      
      Committer note:
      
      Since the patch introducing the regression didn't made to perf/core,
      move it to just before where the regression was introduced, so that we
      don't break bisection for this feature.
      Reported-by: default avatarIngo Molnar <mingo@redhat.com>
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20160303095348.GA24511@krava.redhat.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      f9a5978a
    • Ingo Molnar's avatar
      perf/x86/uncore: Fix build on UP-IOAPIC configs · 6f6e1516
      Ingo Molnar authored
      Commit:
      
        cf6d445f ("perf/x86/uncore: Track packages, not per CPU data")
      
      reorganized the uncore code to track packages, and introduced a dependency
      on MAX_APIC_ID. This constant is not available on UP-IOAPIC builds:
      
        arch/x86/events/intel/uncore.c:1350:44: error: 'MAX_LOCAL_APIC' undeclared here (not in a function)
      
      Include asm/apicdef.h explicitly to pick it up.
      
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Harish Chegondi <harish.chegondi@intel.com>
      Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      6f6e1516