Commits · 285932a25879602407f207e862bc5b8416711f42 · Kirill Smelkov / linux

14 Nov, 2016 2 commits

tools build: Add jvmti feature detection support · 285932a2

Jiri Olsa authored Nov 02, 2016

Adding support to detect jvmti support. It is not plugged into the
FEATURE_TESTS machinery, because it's quite rare and will be used
separately from perf via feature_check call.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: William Cohen <wcohen@redhat.com>
Link: http://lkml.kernel.org/r/1478093749-5602-3-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

285932a2

tools build: Add CFLAGS_REMOVE_* support · 2ec8107d

Jiri Olsa authored Nov 02, 2016

Adding support to remove options from final CFLAGS for both object file
and build target. It's now possible to remove CFLAGS options like:

  CFLAGS_REMOVE_krava.o += -Wstrict-prototypes

Committer notes:

This comes from the kernel's kbuild infrastructure, the subset that is
supported in tools/ is being documented at tools/build/Documentation/Build.txt.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: William Cohen <wcohen@redhat.com>
Link: http://lkml.kernel.org/r/1478093749-5602-2-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

2ec8107d

11 Nov, 2016 1 commit

perf intel-pt: Update documentation about context switch events · 699c12a7

Arnaldo Carvalho de Melo authored Nov 09, 2016

Since the unprivileged sched switch event was added in perf, PT doesn't
need perf_event_paranoid=-1 anymore for per cpu decoding.

Add a note stating that that is only needed for kernels < 4.2.
Reported-by: Andi Kleen <ak@linux.intel.com>
Report-Link: http://lkml.kernel.org/r/http://lkml.kernel.org/n/tip-x2ybghpqxxn3zu0m8o7qi42r@git.kernel.org
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: 45ac1403 ("perf: Add PERF_RECORD_SWITCH to indicate context switches")
Link: http://lkml.kernel.org/n/tip-x2ybghpqxxn3zu0m8o7qi42r@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

699c12a7

08 Nov, 2016 1 commit

perf callchain: Fixup help/config for no-unwinding · c56cb33b

Rabin Vincent authored Aug 10, 2016

Since 841e3558 ("perf callchain: Recording 'dwarf' callchains do not
need DWARF unwinding support"), --call-graph dwarf is allowed in 'perf
record' even without unwind support. A couple of other places don't
reflect this yet though: the help text should list dwarf as a valid
record mode and the dump_size config should be respected too.
Signed-off-by: Rabin Vincent <rabinv@axis.com>
Cc: He Kuang <hekuang@huawei.com>
Fixes: 841e3558 ("perf callchain: Recording 'dwarf' callchains do not need DWARF unwinding support")
Link: http://lkml.kernel.org/r/1470837148-7642-1-git-send-email-rabin.vincent@axis.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

c56cb33b

28 Oct, 2016 9 commits

Merge tag 'perf-core-for-mingo-20161028' of... · 91a79e5f

Ingo Molnar authored Oct 28, 2016

Merge tag 'perf-core-for-mingo-20161028' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

perf/core improvements and fixes from Arnaldo Carvalho de Melo:

New features:

- Support matching by topic in 'perf list' (Andi Kleen)

User visible:

- Apply cpu color only when there was activity in 'perf sched map' (Namhyung Kim)

- Always show the task's COMM in 'perf sched map -v' (Namhyung Kim)

- Fix hierarchy column counts in the perf hist browser (top, report), avoiding
  showing nothing after pressing the RIGHT key a number of times (Namhyung Kim)

Infrastructure:

- Support cascading options in libsubcmd and use it to share common options in
  'perf sched' subcommands (Namhyung Kim)

- Avoid worker cacheline bouncing in 'perf bench futex' (Davidlohr Bueso)

- Sanitize numeric parameters in 'perf bench futex' (Davidlohr Bueso)

- Update copies of kernel files (Arnaldo Carvalho de Melo)

- Fix scripting (perl, python) setup to avoid leaks (Arnaldo Carvalho de Melo)

- Add missing object file to the python binding linkage list (Arnaldo Carvalho de Melo)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

91a79e5f

perf tools: Add missing object file to the python binding linkage list · 46cb25b1

Arnaldo Carvalho de Melo authored Oct 26, 2016

In ac12f676 ("perf tools: Implement branch_type event parameter") we
started using the parse_branch_str() function from one of the files used
in the python binding, which caused this entry in 'perf test' to fail:

  # perf test -v python
  16: Try 'import perf' in python, checking link problems      :
  --- start ---
  test child forked, pid 16667
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  ImportError: /tmp/build/perf/python/perf.so: undefined symbol:
  parse_branch_str
  test child finished with -1
  ---- end ----
  Try 'import perf' in python, checking link problems: FAILED!
  #

I must've made some mistake when running 'perf test' before sending the
pull request for the perf-core-for-mingo-20161024 tag, letting this
regression pass, sigh.

Just add tools/perf/util/parse-branch-options.c and switch from
ui__warning(), which is not available in the python binding, to
pr_warning(), which is good enough for this case.

Now:

  # perf test python
  16: Try 'import perf' in python, checking link problems      : Ok
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Andi Kleen <ak@linux.intel.com>
Fixes: ac12f676 ("perf tools: Implement branch_type event parameter")
Link: http://lkml.kernel.org/n/tip-9kn1ct1cx9ppwqlmzl6z0xhs@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

46cb25b1

perf scripting: Don't die if scripting can't be setup, disable it · 9a8860bb

Arnaldo Carvalho de Melo authored Oct 25, 2016

Removing one more set of die() calls.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-6pyil685m5i2tugg56gcy0tg@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

9a8860bb

perf scripting: Avoid leaking the scripting_context variable · cf346d5b

Arnaldo Carvalho de Melo authored Oct 25, 2016

Both register_perl_scripting() and register_python_scripting() allocate
this variable; fix the leak by checking whether it was already allocated.
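
A minimal sketch of the "allocate only once" check described above. The
struct layout, the scripting_context__alloc() helper and the *_sketch()
function names are assumptions made for illustration; they are not the
code from this commit.

```c
#include <stdlib.h>

/* Hypothetical layout; the real struct lives in the perf sources. */
struct scripting_context {
	void *event_data;
};

/* Shared by both scripting engines, as in the commit message. */
static struct scripting_context *scripting_context;

/* Hypothetical helper: allocate the shared context only if nobody did yet. */
static int scripting_context__alloc(void)
{
	if (scripting_context == NULL)
		scripting_context = calloc(1, sizeof(*scripting_context));

	return scripting_context ? 0 : -1;
}

/* Stand-ins for the two registration paths that both need the context. */
int register_perl_scripting_sketch(void)
{
	return scripting_context__alloc();
}

int register_python_scripting_sketch(void)
{
	return scripting_context__alloc();
}
```

Calling both registration functions in either order leaves exactly one
allocation behind, because the second call sees a non-NULL pointer and
returns without allocating again.
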

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: 7e4b21b8 ("perf/scripts: Add Python scripting engine")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

cf346d5b

perf tools: Update x86's syscall_64.tbl, adding pkey_(alloc,free,mprotect) · ca7202bf

Arnaldo Carvalho de Melo authored Oct 25, 2016

Introduced in commit f9afc619 ("x86: Wire up protection keys system
calls")

This will make 'perf trace' aware of them on x86_64.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-s1ta2ttv2xacecqogmd3a9p1@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

ca7202bf

tools: Update asm-generic/mman-common.h copy from the kernel · 0fb75c8c

Arnaldo Carvalho de Melo authored Oct 25, 2016

To get the defines introduced in the commit e8c24d3a ("x86/pkeys:
Allocation/free syscalls")

Silencing this perf build warning:

  Warning: tools/include/uapi/asm-generic/mman-common.h differs from kernel

'perf trace' still needs to be changed to beautify those syscalls, as
soon as a kernel that has them is booted.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-yev9rexu02cl7cjeozzmrl9t@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

0fb75c8c

perf bench mem: Ignore export.h related changes to mem{cpy,set}.S · e0c47582

Arnaldo Carvalho de Melo authored Oct 25, 2016

Ignore export.h and EXPORT_SYMBOL in:

  784d5699 ("x86: move exports to actual definitions")

We're not dragging this stuff, not useful in tools/

This silences the following warnings while building perf:

  Warning: tools/arch/x86/lib/memcpy_64.S differs from kernel
  Warning: tools/arch/x86/lib/memset_64.S differs from kernel

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-h9vw3pe0fq79zmyqsfr0s0mo@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

e0c47582

perf list: Support matching by topic · 67bdc35f

Andi Kleen authored Oct 19, 2016

Add support in 'perf list' for showing only the events that belong to a
specific vendor events topic. For example the following works now:

  % perf list frontend
  List of pre-defined events (to be used in -e):

    stalled-cycles-frontend OR idle-cycles-frontend    [Hardware event]

    stalled-cycles-frontend OR cpu/stalled-cycles-frontend/ [Kernel PMU event]

  frontend:
    dsb2mite_switches.count
         [Decode Stream Buffer (DSB)-to-MITE switches]
    dsb2mite_switches.penalty_cycles
         [Decode Stream Buffer (DSB)-to-MITE switch true penalty cycles]
    dsb_fill.exceed_dsb_lines
         [Cycles when Decode Stream Buffer (DSB) fill encounter more than 3 Decode Stream Buffer (DSB)
          lines]
    icache.hit
         [Number of Instruction Cache, Streaming Buffer and Victim Cache Reads. both cacheable and
          noncacheable, including UC fetches]
  ...
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/1476902724-9586-2-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

67bdc35f

perf tools: Introduce timestamp__scnprintf_usec() · 99620a5d

Namhyung Kim authored Oct 24, 2016

Joonwoo reported that there's a mismatch between timestamps in the script
and sched commands.  This was because they print the timestamp
differently.  Factor out the code and share it so that they stay in
sync.  Also I found that sched map has a similar problem, fix it too.

Committer notes:

Fixed the max_lat_at bug introduced by Namhyung's original patch, as
pointed out by Joonwoo, and made it a function following the scnprintf()
model, i.e. returning the number of bytes formatted, and receiving as
the first parameter the object from where the data to the formatting is
obtained, renaming it from:

   char *timestamp_in_usec(char *bf, size_t size, u64 timestamp)

to

   int timestamp__scnprintf_usec(u64 timestamp, char *bf, size_t size)
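
A userspace sketch of a function with the new signature, assuming the
timestamp is in nanoseconds and the output is "seconds.microseconds".
The kernel-style u64 is spelled uint64_t here, and the scnprintf()
return convention is emulated on top of snprintf(); both are choices of
this sketch, not the committed code.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_SEC  1000000000ULL
#define NSEC_PER_USEC 1000ULL

/*
 * Format a nanosecond timestamp as "sec.usec" into bf and return the
 * number of bytes actually stored (scnprintf() model), rather than the
 * number that would have been needed (snprintf() model).
 */
int timestamp__scnprintf_usec(uint64_t timestamp, char *bf, size_t size)
{
	uint64_t sec  = timestamp / NSEC_PER_SEC;
	uint64_t usec = (timestamp % NSEC_PER_SEC) / NSEC_PER_USEC;
	int n = snprintf(bf, size, "%" PRIu64 ".%06" PRIu64, sec, usec);

	if (n < 0 || size == 0)
		return 0;

	/* On truncation, report what fit, excluding the trailing NUL. */
	return (size_t)n >= size ? (int)(size - 1) : n;
}
```

Taking the data first and the buffer second lets callers chain several
such formatters into one buffer by advancing bf by each return value.
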
Reported-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20161024020246.14928-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

99620a5d

25 Oct, 2016 7 commits

perf sched map: Always show task comm with -v · e107f129

Namhyung Kim authored Oct 24, 2016

I'd like to see the names of tasks with perf sched map, but it only shows
the names of new tasks and uses short names after that.  This is not good
for long running tasks since it's hard for users to track the short
names.  This patch makes it show the names (except for the idle task) when
the -v option is used.  We may want to make this the default behavior.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20161024020246.14928-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

e107f129

perf sched map: Apply cpu color when there's an activity · 1208bb27

Namhyung Kim authored Oct 24, 2016

Always applying the cpu color doesn't help readability IMHO.  Instead it
might be better to apply the color only when there's activity on those
CPUs.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20161024020246.14928-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

1208bb27

perf sched: Make common options cascading · 77f02f44

Namhyung Kim authored Oct 24, 2016

The -i and -v options can be used in subcommands so enable cascading the
sched_options.  This fixes the following inconvenience in 'perf sched':

  $ perf sched -i perf.data.sched  map
  ... (it works well) ...

  $ perf sched map  -i perf.data.sched
    Error: unknown switch `i'

   Usage: perf sched map [<options>]

          --color-cpus <cpus>
                            highlight given CPUs in map
          --color-pids <pids>
                            highlight given pids in map
          --compact         map output in compact mode
          --cpus <cpus>     display given CPUs in map

With this patch, the second command line works with the perf.data.sched
data file.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20161024030003.28534-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

77f02f44

tools lib subcmd: Suppport cascading options · 369a2478

Namhyung Kim authored Oct 24, 2016

Sometimes subcommands have common options, and these can only be handled
in the upper level command unless each subcommand duplicates the options.

This patch adds a parent field and fallback to the parent if the given
argument was not found in the current options.
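
The parent fallback can be sketched roughly as below. The struct opt
type, its fields and find_short_opt() are hypothetical simplifications
for illustration, not libsubcmd's real struct option or parser; the
sketch also assumes the parent pointer is stored in the entry that
terminates a table.

```c
#include <stddef.h>

/* Hypothetical, simplified option entry. */
struct opt {
	char short_name;          /* 0 marks the end of a table */
	const char *help;
	const struct opt *parent; /* consulted only on the terminating entry */
};

/*
 * Look up a short option in the given table; when the table is
 * exhausted, retry in the parent table, if one was set.
 */
const struct opt *find_short_opt(const struct opt *options, char name)
{
	while (options) {
		for (; options->short_name; options++) {
			if (options->short_name == name)
				return options;
		}
		/* Not found here: fall back to the parent table, if any. */
		options = options->parent;
	}
	return NULL;
}
```

With a subcommand table whose terminator points at the top-level table,
a lookup of 'i' or 'v' in the subcommand succeeds without the
subcommand repeating those entries.
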
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20161024030003.28534-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

369a2478

perf hist browser: Fix hierarchy column counts · 8a06b0be

Namhyung Kim authored Oct 25, 2016

The perf report/top on TUI supports horizontal scrolling using LEFT and
RIGHT keys.

But it calculates the number of columns incorrectly when hierarchy mode
is enabled, so that repeatedly pressing the RIGHT key can make the output
disappear.

In the hierarchy mode, all sort keys are collapsed into a single column,
so it needs to be applied when calculating column numbers.
Reported-and-Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20161024162110.17918-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

8a06b0be

perf bench futex: Sanitize numeric parameters · 60758d66

Davidlohr Bueso authored Oct 24, 2016

This gets rid of oddities such as:

  perf bench futex hash -t -4
  perf: calloc: Cannot allocate memory

Runtime (and many more) are equally busted, i.e. run for bogus amounts of
time. Just use the abs, instead of, for example, erroring out.
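
As an illustration of "just use the abs", a sketch of a sanitizing
helper follows. The name sanitize_count() and the INT_MIN clamp are
assumptions of this sketch; the commit message only says that the
absolute value is used.

```c
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: turn a possibly negative numeric parameter
 * (threads, runtime, ...) into a usable non-negative value.
 */
unsigned int sanitize_count(int value)
{
	/* abs(INT_MIN) is undefined, so clamp it (assumption of this sketch). */
	if (value == INT_MIN)
		return (unsigned int)INT_MAX;

	return (unsigned int)abs(value);
}
```

With this, a "-t -4" style argument is treated the same as "-t 4".
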

Committer note:

After the patch:

  $ perf bench futex hash -t -4
  # Running 'futex/hash' benchmark:
  Run summary [PID 10178]: 4 threads, each operating on 1024 [private] futexes for 10 secs.

  [thread  0] futexes: 0x34f9fa0 ... 0x34faf9c [ 4702208 ops/sec ]
  [thread  1] futexes: 0x34fb140 ... 0x34fc13c [ 4707020 ops/sec ]
  [thread  2] futexes: 0x34fc2e0 ... 0x34fd2dc [ 4711526 ops/sec ]
  [thread  3] futexes: 0x34fd480 ... 0x34fe47c [ 4709683 ops/sec ]

  Averaged 4707609 operations/sec (+- 0.04%), total secs = 10
  $
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1477342613-9938-3-git-send-email-dave@stgolabs.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

60758d66

perf bench futex: Avoid worker cacheline bouncing · e2e1680f

Davidlohr Bueso authored Oct 24, 2016

Sebastian noted that the overhead of worker thread ops (throughput)
accounting was causing 'perf' to appear in the profiles, consuming a
non-trivial (i.e. 13%) amount of CPU.

This is due to cacheline bouncing due to the increment of w->ops.

We can easily fix this by just working on a local copy and updating the
actual worker once done running, and ready to show the program summary.
There is no danger of the worker being concurrent, so we can trust that
no stale value is being seen by another thread.

This also gets rid of the unnecessary cache alignment hack; it's not
worth it.
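
The "local copy" idea can be sketched as below. The struct worker
layout, the iters field and the loop body are placeholders assumed for
illustration; workerfn() has the shape of a pthread start routine, but
the sketch does not create threads itself.

```c
#include <stddef.h>

/* Hypothetical, reduced worker descriptor shared with the main thread. */
struct worker {
	unsigned long ops;   /* published once, when the worker is done */
	unsigned long iters; /* stand-in for "run until the benchmark ends" */
};

/* Has the signature of a pthread start routine. */
void *workerfn(void *arg)
{
	struct worker *w = arg;
	unsigned long ops = 0; /* local copy, private to this thread */
	unsigned long i;

	for (i = 0; i < w->iters; i++)
		ops++; /* the real benchmark would perform a futex op here */

	/* Single store to the shared struct instead of one per operation. */
	w->ops = ops;
	return NULL;
}
```

Because the counter is only written back once, neighbouring workers no
longer invalidate each other's cache lines on every operation, and the
main thread reads w->ops only after joining the worker.
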
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: http://lkml.kernel.org/r/1477342613-9938-2-git-send-email-dave@stgolabs.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

e2e1680f

24 Oct, 2016 20 commits

Merge tag 'perf-core-for-mingo-20161024' of... · 76e2d261

Ingo Molnar authored Oct 24, 2016

Merge tag 'perf-core-for-mingo-20161024' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

New features:

- Dynamically change verbosity level by pressing 'V' in the 'perf top/report'
  hists TUI browser (Alexis Berlemont)

- Implement 'perf trace --delay' in the same fashion as in 'perf record --delay',
  to skip sampling workload initialization events (Alexis Berlemont)

- Make vendor named events case insensitive in 'perf list', i.e.
  'perf list LONGEST_LAT' works just the same as  'perf list longest_lat' (Andi Kleen)

- Show instruction bytes and length in 'perf script' for Intel PT and BTS (Andi Kleen, Adrian Hunter)

   E.g:

    % perf record -e intel_pt// foo
    % perf script --itrace=i0ns -F ip,insn,insnlen
     ffffffff8101232f ilen: 5 insn: 0f 1f 44 00 00
     ffffffff81012334 ilen: 1 insn: 5b
     ffffffff81012335 ilen: 1 insn: 5d
     ffffffff81012336 ilen: 1 insn: c3
     ffffffff810123e3 ilen: 1 insn: 5b
     ffffffff810123e4 ilen: 2 insn: 41 5c
     ffffffff810123e6 ilen: 1 insn: 5d
     ffffffff810123e7 ilen: 1 insn: c3
     ffffffff810124a6 ilen: 2 insn: 31 c0
     ffffffff810124a8 ilen: 9 insn: 41 83 bc 24 a8 01 00 00 01
     ffffffff810124b1 ilen: 2 insn: 75 87

- Allow enabling the perf_event_attr.branch_type attribute member: (Andi Kleen)

  perf record -e sched:sched_switch,cpu/cpu-cycles,branch_type=any/ ...

- Add unwinding support for jitdump (Stefano Sanfilippo)

Fixes:

- Use raw_syscall:sys_enter timestamp in 'perf trace' (Arnaldo Carvalho de Melo)

Infrastructure:

- Allow jitdump to be built without libdwarf (Maciej Debski)

- Sync x86's syscall table tools/ copy (Arnaldo Carvalho de Melo)

- Fixes to avoid calling die() in library functions already propagating other
  errors (Arnaldo Carvalho de Melo)

- Improvements to allow libtraceevent to be properly installed in distro
  packages (Jiri Olsa)

- Removing coresight miscellaneous debug output (Mathieu Poirier)

- Cache align the 'perf bench futex' worker struct (Sebastian Andrzej Siewior)

Documentation:

- Minor improvements on the documentation of event parameters (Andi Kleen)

- Add jitdump format specification document (Stephane Eranian)

Spelling fixes:

- Fix typo "No enough" to "Not enough" (Alexander Alemayhu)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

76e2d261

perf coresight: Removing miscellaneous debug output · 04b553ad

Mathieu Poirier authored Oct 19, 2016

Printing the full path of the selected link is obviously not needed,
hence removing.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1476913323-6836-1-git-send-email-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

04b553ad

perf list: Make vendor event matching case insensitive · 38d14f0c

Andi Kleen authored Oct 19, 2016

Make the 'perf list' glob matching for vendor events case insensitive.
This allows using upper case vendor event names with perf list too.

Now the following works:

  % perf list LONGEST_LAT

  ...

  cache:
    longest_lat_cache.miss
         [Core-originated cacheable demand requests missed LLC]
    longest_lat_cache.reference
         [Core-originated cacheable demand requests that refer to LLC]
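
To illustrate the effect, here is a sketch of a case-insensitive
substring test. Note the swap: 'perf list' itself uses glob matching,
which is not reproduced here; contains_nocase() is a hypothetical helper
that only shows the case-folding comparison.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if needle occurs in haystack when
 * letter case is ignored, 0 otherwise.
 */
int contains_nocase(const char *haystack, const char *needle)
{
	size_t i;

	if (*needle == '\0')
		return 1;

	for (; *haystack; haystack++) {
		for (i = 0; needle[i]; i++) {
			if (haystack[i] == '\0' ||
			    tolower((unsigned char)haystack[i]) !=
			    tolower((unsigned char)needle[i]))
				break;
		}
		/* The whole needle matched at this position. */
		if (needle[i] == '\0')
			return 1;
	}
	return 0;
}
```

Under this comparison "LONGEST_LAT" matches longest_lat_cache.miss just
as "longest_lat" does.
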
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/1476899402-31460-1-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

38d14f0c

perf trace: Use the syscall raw_syscalls:sys_enter timestamp · ecf1e225

Arnaldo Carvalho de Melo authored Oct 18, 2016

Instead of the one when another syscall takes place while another is being
processed (in another CPU, but we show it serialized, so need to "interrupt"
the other), and also when finally showing the sys_enter + sys_exit + duration,
where we were showing the sample->time for the sys_exit, duh.

Before:

  # perf trace sleep 1
  <SNIP>
     0.373 (   0.001 ms): close(fd: 3                   ) = 0
  1000.626 (1000.211 ms): nanosleep(rqtp: 0x7ffd6ddddfb0) = 0
  1000.653 (   0.003 ms): close(fd: 1                   ) = 0
  1000.657 (   0.002 ms): close(fd: 2                   ) = 0
  1000.667 (   0.000 ms): exit_group(                   )
  #

After:

  # perf trace sleep 1
  <SNIP>
     0.336 (   0.001 ms): close(fd: 3                   ) = 0
     0.373 (1000.086 ms): nanosleep(rqtp: 0x7ffe303e9550) = 0
  1000.481 (   0.002 ms): close(fd: 1                   ) = 0
  1000.485 (   0.001 ms): close(fd: 2                   ) = 0
  1000.494 (   0.000 ms): exit_group(                   )
[root@jouet linux]#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-ecbzgmu2ni6glc6zkw8p1zmx@git.kernel.org
Fixes: 752fde44 ("perf trace: Support interrupted syscalls")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

ecf1e225

perf trace: Remove thread_trace->exit_time · 1f369460

Arnaldo Carvalho de Melo authored Oct 17, 2016

Not used at all, we need just the entry_time to calculate the syscall
duration.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-js6r09zdwlzecvaei7t4l3vd@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

1f369460

perf bench futex: Cache align the worker struct · 34b75300

Sebastian Andrzej Siewior authored Oct 16, 2016

It popped up in perf testing that the worker consumes some amount of
CPU. It boils down to the increment of `ops` which causes cache line
bouncing between the individual threads.

This patch aligns the struct to 256 bytes to ensure that no cache
line is shared among CPUs. 128 bytes is the x86 worst case and grep says
that L1_CACHE_SHIFT is set to 8 on s390.
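
A sketch of such an alignment using the GCC/Clang aligned attribute.
The struct members, the WORKER_ALIGN name and worker_stride() are
placeholders assumed for illustration; only the 256-byte figure comes
from the text above.

```c
#include <stddef.h>

#define WORKER_ALIGN 256 /* 1 << 8, covering the s390 case mentioned above */

/* Hypothetical, reduced worker struct; each instance gets its own 256 bytes. */
struct worker {
	int tid;
	unsigned long ops;
} __attribute__((aligned(WORKER_ALIGN)));

/* The attribute also rounds sizeof() up, so array elements never share a line. */
_Static_assert(sizeof(struct worker) % WORKER_ALIGN == 0,
	       "worker size must be a multiple of its alignment");

/* Distance in bytes between two consecutive workers in an array. */
size_t worker_stride(void)
{
	return sizeof(struct worker);
}
```

The cost is memory: every worker occupies a full 256 bytes regardless of
how small its members are.
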
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20161016190803.3392-1-bigeasy@linutronix.de
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

34b75300

perf tools: Use normal error reporting when processing PERF_RECORD_READ events · 89973506

Arnaldo Carvalho de Melo authored Oct 14, 2016

We already have handling for errors when processing PERF_RECORD_ events,
so instead of calling die() when not being able to alloc, propagate the
error, so that the normal UI exit sequence can take place, the user be
warned and possibly the terminal be properly reset to a sane mode.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-r90je3c009a125dvs3525yge@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

89973506

perf tools: Normalize sq_quote_argv() error reporting · e7b32d12

Arnaldo Carvalho de Melo authored Oct 14, 2016

It already returns whatever strbuf_(grow|addch)() returns in case of
failure, so just return -ENOSPC in the only case where it was die()ing.
When it returns, its only caller will call die() anyway, so no need to
be so eager, die later.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-as05b7mbogprlwi8iarwns8e@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

e7b32d12

perf bench mem: Move boilerplate memory allocation to the infrastructure · 47b5757b

Arnaldo Carvalho de Melo authored Oct 14, 2016

Instead of having all tests perform alloc/free, do it in the code that
calls the do_cycles() and do_gettimeofday() functions.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-lywj4mbdb1m9x1z9asivwuuy@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

47b5757b

perf trace: Implement --delay · e36b7821

Alexis Berlemont authored Oct 10, 2016

In the perf wiki todo-list[1], there is an entry regarding initial-delay
and 'perf trace'; the following small patch tries to fulfill this point.
It has been generated against the branch tip/perf/core.

It has only been implemented in the "trace__run" case.

Ex.:

  $ sudo strace -- ./perf trace --delay 5 sleep 1 2>&1
  ...
  fcntl(7, F_SETFL, O_RDONLY|O_NONBLOCK)  = 0
  ioctl(7, PERF_EVENT_IOC_ID, 0x7ffc8fd35718) = 0
  ioctl(11, PERF_EVENT_IOC_SET_OUTPUT, 0x7) = 0
  fcntl(11, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
  ioctl(11, PERF_EVENT_IOC_ID, 0x7ffc8fd35718) = 0
  write(6, "\0", 1)                       = 1
  close(6)                                = 0
  nanosleep({0, 5000000}, NULL)           = 0  # DELAY OF 5 MS BEFORE ENABLING THE EVENTS
  ioctl(3, PERF_EVENT_IOC_ENABLE, 0)      = 0
  ioctl(4, PERF_EVENT_IOC_ENABLE, 0)      = 0
  ioctl(5, PERF_EVENT_IOC_ENABLE, 0)      = 0
  ioctl(7, PERF_EVENT_IOC_ENABLE, 0)      = 0
  ...
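
The strace output above shows a 5 ms sleep for "--delay 5", i.e. the
value is taken as milliseconds. A sketch of that conversion follows;
delay_to_timespec() is a hypothetical helper for illustration, not a
function from the patch.

```c
#include <time.h>

/*
 * Hypothetical helper: convert a delay given in milliseconds into the
 * timespec that a nanosleep() call would receive.
 */
struct timespec delay_to_timespec(unsigned int msecs)
{
	struct timespec ts;

	ts.tv_sec  = msecs / 1000;
	ts.tv_nsec = (long)(msecs % 1000) * 1000000L;
	return ts;
}
```

For a delay of 5 this gives {0, 5000000}, matching the nanosleep() call
in the trace; the events are enabled only after that sleep returns.
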

[1]: https://perf.wiki.kernel.org/index.php/Todo
Signed-off-by: Alexis Berlemont <alexis.berlemont@gmail.com>
Suggested-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20161010054328.4028-2-alexis.berlemont@gmail.com
[ Add entry to the manpage, cut'n'pasted from stat's and record's ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

e36b7821

perf hists browser: Dynamically change verbosity level · 21e8c810

Alexis Berlemont authored Oct 12, 2016

Here is a small patch which tries to fulfill a point in the perf todo
list:

* Make pressing 'V' multiple times to go on cycling thru various
  verbosity levels in 'perf top', so that info that is present in
  'perf top -v' can be obtained without having to restart the tool
  (acme).

After a small grep in the code, the max verbosity level seems to be 3; so,
we cycle at 4; I did not dare define a MAX_VERBOSE_LEVEL constant.
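
The cycling described above amounts to a modulo step. A sketch, with
next_verbose_level() being a hypothetical helper name and the literal 4
taken from the text:

```c
/*
 * Hypothetical helper: advance the verbosity level on each 'V' press,
 * wrapping from 3 back to 0.
 */
int next_verbose_level(int verbose)
{
	return (verbose + 1) % 4;
}
```

Starting from 0, successive presses yield 1, 2, 3 and then 0 again.
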
Signed-off-by: Alexis Berlemont <alexis.berlemont@gmail.com>
Suggested-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20161012214823.14324-2-alexis.berlemont@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

21e8c810

perf tools: Fix typo "No enough" to "Not enough" · 042cfb5f

Alexander Alemayhu authored Oct 13, 2016

The latter version occurs much more when running git grep.
Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20161013161811.4939-1-alexander@alemayhu.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

042cfb5f

perf pmu: Only print Using CPUID message once · fb967063

Andi Kleen authored Oct 13, 2016

With uncore event aliases which are duplicated over multiple PMUs the
"Using CPUID" message with -v could be printed many times.  Only print
it once.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1476393332-20732-3-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

fb967063

perf jit: Add jitdump format specification document · b3151ea5

Stephane Eranian authored Oct 13, 2016

This patch adds a formal specification of the jitdump format. The goal
is to help jit runtime developers implement the jitdump support without
having to read the jvmti code.
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1476356383-30100-10-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

b3151ea5

perf jit: Check JITHEADER_VERSION · 6760d77b

Stefano Sanfilippo authored Oct 13, 2016

Check the version number when opening a jitdump file.  Accept older
versions, but not newer ones.
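
A sketch of the "older is fine, newer is not" rule. The value given to
JITHEADER_VERSION here is a placeholder and jit_version_supported() is a
hypothetical helper; neither is taken from the patch.

```c
#include <stdint.h>

#define JITHEADER_VERSION 1 /* placeholder value for this sketch */

/*
 * Hypothetical helper: a jitdump file is readable when its header
 * version is not newer than the one this reader was built for.
 */
int jit_version_supported(uint32_t file_version)
{
	return file_version <= JITHEADER_VERSION;
}
```

A reader would bail out of opening the file when this returns 0, since a
newer writer may use record layouts the reader does not know about.
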
Signed-off-by: Stefano Sanfilippo <ssanfilippo@chromium.org>
Signed-off-by: Ross McIlroy <rmcilroy@chromium.org>
Reviewed-by: Stephane Eranian <eranian@google.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1476356383-30100-9-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

6760d77b

perf jit: Generate .eh_frame/.eh_frame_hdr in DSO · 086f9f3d

Stefano Sanfilippo authored Oct 13, 2016

When the jit_buf_desc contains unwinding information, it is emitted as
eh_frame unwinding sections in the DSOs generated by perf inject.

The unwinding information is required to unwind JITed code which does
not maintain the frame pointer register during function calls.  It can
be emitted by V8 / Chromium when the --perf_prof_unwinding_info flag is
passed to V8.

The eh_frame and eh_frame_hdr sections are emitted immediately after the
.text.

The .eh_frame is aligned at an 8-byte boundary, and .eh_frame_hdr at a
4-byte one. Since the size of the .eh_frame is required to be a multiple
of the word size, there will never be additional padding between it and
the .eh_frame_hdr on machines where the word size is 4 or 8 bytes.

However, additional padding might be inserted between .text and
.eh_frame to reach the correct alignment, which will always be 8 bytes,
even on 32-bit machines. The reasoning behind this choice is that, in
the worst case, 4 extra bytes of padding are a small cost for the
advantage of removing word-size dependent offset calculations when
emitting the jitdump.
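
The offset arithmetic described above can be sketched as follows. This is a minimal illustration, not the perf implementation; the helper names are hypothetical and offsets are treated as plain byte offsets from the start of the generated DSO.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


def unwinding_layout(text_end, eh_frame_size):
    """Return (eh_frame_offset, eh_frame_hdr_offset) for sections placed
    right after .text.

    .eh_frame always starts on an 8-byte boundary, regardless of word size.
    .eh_frame_hdr needs 4-byte alignment; when eh_frame_size is a multiple
    of 4 or 8 the second align_up adds no padding.
    """
    eh_frame_off = align_up(text_end, 8)
    eh_frame_hdr_off = align_up(eh_frame_off + eh_frame_size, 4)
    return eh_frame_off, eh_frame_hdr_off
```

For example, with .text ending at 0x1003 and a 24-byte .eh_frame, 5 bytes of padding are inserted before .eh_frame and none before .eh_frame_hdr.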
Signed-off-by: Stefano Sanfilippo <ssanfilippo@chromium.org>
Signed-off-by: Ross McIlroy <rmcilroy@chromium.org>
Reviewed-by: Stephane Eranian <eranian@google.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1476356383-30100-8-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

086f9f3d

perf jit: Add unwinding support · 0284fecd

Stefano Sanfilippo authored Oct 13, 2016

This record is intended to provide unwinding information in the
eh_frame format. This is required to unwind JITed code which
does not maintain the frame pointer register during function calls.

The eh_frame unwinding information can be emitted by V8 / Chromium
when the --perf_prof_unwinding_info flag is passed.

A record of type jr_code_unwinding_info comes before the jr_code_load
it refers to and contains both the .eh_frame and .eh_frame_hdr.

The fields in the header have the following meaning:

  * unwinding_size: size of the eh_frame and eh_frame_hdr, necessary
    for distinguishing the content from the padding.

  * eh_frame_hdr_size: size of the .eh_frame_hdr, as the name says.

  * mapped_size: size of the payload that was in memory at runtime,
    typically unwinding_size if the .eh_frame_hdr and .eh_frame were
    mapped, or 0 if they weren't. It should always be the former case,
    since the .eh_frame is guaranteed to be mapped in memory. However,
    certain JITs might want to inject an .eh_frame_hdr with an empty LUT
    to trigger fp-based unwinding fallback in libunwind. The only part
    of the .eh_frame_hdr that libunwind reads from remote memory is the
    LUT, and since there is none, mapping the unwinding info in memory
    is not necessary, and 0 in this field signifies that it wasn't.
    This practical hack saves bytes in code memory for those JIT
    compilers that might or might not maintain a valid frame pointer.

The payload that follows is assumed to contain first the .eh_frame and
then the .eh_frame_hdr, with no padding between the two.
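
As a rough sketch of how a consumer might use these fields, the snippet below splits a payload into its two sections. The function name is hypothetical, and it assumes the caller has already decoded unwinding_size and eh_frame_hdr_size from the record header; it is not taken from perf.

```python
def split_unwinding_payload(payload, unwinding_size, eh_frame_hdr_size):
    """Split a jr_code_unwinding_info payload into (.eh_frame, .eh_frame_hdr).

    The payload holds .eh_frame first, then .eh_frame_hdr, with no padding
    in between. Any bytes past unwinding_size are trailing padding and are
    ignored.
    """
    if eh_frame_hdr_size > unwinding_size or unwinding_size > len(payload):
        raise ValueError("inconsistent unwinding record sizes")
    eh_frame_size = unwinding_size - eh_frame_hdr_size
    eh_frame = payload[:eh_frame_size]
    eh_frame_hdr = payload[eh_frame_size:unwinding_size]
    return eh_frame, eh_frame_hdr
```

This is why unwinding_size is needed in addition to the record length: without it, the reader could not tell section content from padding.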
Signed-off-by: Stefano Sanfilippo <ssanfilippo@chromium.org>
Signed-off-by: Ross McIlroy <rmcilroy@chromium.org>
Reviewed-by: Stephane Eranian <eranian@google.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1476356383-30100-7-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

0284fecd

perf jit: Do not assume pgoff is zero · eac05af2

Stefano Sanfilippo authored Oct 13, 2016

When calculating the .eh_frame_hdr base and LUT offsets, do not always
assume that pgoff is zero.

The assumption is false for DSOs built from the jitdump by perf inject,
because the ELF header did not exist in memory at sampling time.
Signed-off-by: Stefano Sanfilippo <ssanfilippo@chromium.org>
Signed-off-by: Ross McIlroy <rmcilroy@chromium.org>
Reviewed-by: Stephane Eranian <eranian@google.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1476356383-30100-6-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

eac05af2

perf jit: Make perf skip unknown records · 7354ec7a

Stefano Sanfilippo authored Oct 13, 2016

The behavior before this commit was to skip the remaining portion of the
jitdump in case an unknown record was found, including those records
that perf could handle.

With this change, parsing a record with an unknown id will cause a
warning to be emitted, the record will be skipped and parsing will
resume from the next (valid) one.
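
The skipping behavior can be sketched as a small parsing loop. This is an illustration under stated assumptions, not the perf code: it assumes a little-endian record prefix of id (u32), total_size (u32) and timestamp (u64), with total_size covering the prefix itself, and the set of known ids is likewise assumed.

```python
import struct
import sys

# Assumed record prefix layout: id (u32), total_size (u32), timestamp (u64).
PREFIX = struct.Struct("<IIQ")
# Assumed set of record ids the reader understands.
KNOWN_IDS = frozenset({0, 1, 2, 3, 4})


def iter_known_records(data, known_ids=KNOWN_IDS):
    """Yield (id, body) for each known record, skipping unknown ones."""
    offset = 0
    while offset + PREFIX.size <= len(data):
        rec_id, total_size, _timestamp = PREFIX.unpack_from(data, offset)
        if total_size < PREFIX.size or offset + total_size > len(data):
            break  # truncated or corrupt record, nothing more to parse
        if rec_id in known_ids:
            yield rec_id, data[offset + PREFIX.size:offset + total_size]
        else:
            # Warn, then fall through to the next record instead of stopping.
            print(f"warning: skipping unknown record id {rec_id}",
                  file=sys.stderr)
        offset += total_size
```

Because each record carries its own total size, the reader can step over a record it does not understand without knowing anything about its contents.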

The patch aims at making perf more future proof, by extracting as much
information as possible from jitdumps.
Signed-off-by: Stefano Sanfilippo <ssanfilippo@chromium.org>
Signed-off-by: Ross McIlroy <rmcilroy@chromium.org>
Reviewed-by: Stephane Eranian <eranian@google.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1476356383-30100-5-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

7354ec7a

perf jit: Remove unecessary padding in jitdump file · 13b9012a

Stephane Eranian authored Oct 13, 2016

This patch removes all the string padding generated in the jitdump file.
The padding is not necessary and was adding unnecessary complexity.
Modern processors can handle unaligned accesses quite well. The
perf.data/jitdump files are always post-processed, so there is no need
to add extra complexity for no real gain.
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1476356383-30100-4-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

13b9012a