Commits · 9e5b127d6f33468143d90c8a45ca12410e4c3fa7 · nexedi / linux

12 Mar, 2018 1 commit

perf/core: Fix perf_output_read_group() · 9e5b127d

Peter Zijlstra authored Mar 09, 2018

Mark reported his arm64 perf fuzzer runs sometimes splat like:

  armv8pmu_read_counter+0x1e8/0x2d8
  armpmu_event_update+0x8c/0x188
  armpmu_read+0xc/0x18
  perf_output_read+0x550/0x11e8
  perf_event_read_event+0x1d0/0x248
  perf_event_exit_task+0x468/0xbb8
  do_exit+0x690/0x1310
  do_group_exit+0xd0/0x2b0
  get_signal+0x2e8/0x17a8
  do_signal+0x144/0x4f8
  do_notify_resume+0x148/0x1e8
  work_pending+0x8/0x14

which asserts that we only call pmu::read() on ACTIVE events.

The above callchain does:

  perf_event_exit_task()
    perf_event_exit_task_context()
      task_ctx_sched_out() // INACTIVE
      perf_event_exit_event()
        perf_event_set_state(EXIT) // EXIT
        sync_child_event()
          perf_event_read_event()
            perf_output_read()
              perf_output_read_group()
                leader->pmu->read()

Which results in doing a pmu::read() on an !ACTIVE event.

I _think_ this is 'new' since we added attr.inherit_stat, which added
the perf_event_read_event() to the exit path, without that
perf_event_read_output() would only trigger from samples and for
@event to trigger a sample, it's leader _must_ be ACTIVE too.

Still, adding this check makes it consistent with the @sub case for
the siblings.
Reported-and-Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

9e5b127d

09 Mar, 2018 8 commits

Merge tag 'perf-core-for-mingo-4.17-20180308' of... · fbf8a1e1

Ingo Molnar authored Mar 09, 2018

Merge tag 'perf-core-for-mingo-4.17-20180308' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

- Support to display the IPC/Cycle in 'annotate' TUI, for systems
  where this info can be obtained, like Intel's >= Skylake (Jin Yao)

- Support wildcards on PMU name in dynamic PMU events (Agustin Vega-Frias)

- Display pmu name when printing unmerged events in stat (Agustin Vega-Frias)

- Auto-merge PMU events created by prefix or glob match (Agustin Vega-Frias)

- Fix s390 'call' operations target function annotation (Thomas Richter)

- Handle s390 PC relative load and store instruction in the augmented
  'annotate', code, used so far in the TUI modes of 'perf report' and
  'perf annotate' (Thomas Richter)

- Provide libtraceevent with a kernel symbol resolver, so that
  symbols in tracepoint fields can be resolved when showing them in
  tools such as 'perf report' (Wang YanQing)

- Refactor the cgroups code to look more like other code in tools/perf,
  using cgroup__{put,get} for refcount operations instead of its
  open-coded equivalent, breaking larger functions, etc (Arnaldo Carvalho de Melo)

- Implement support for the -G/--cgroup target in 'perf trace', allowing
  strace like tracing (plus other events, backtraces, etc) for cgroups
  (Arnaldo Carvalho de Melo)

- Update thread shortname in 'perf sched map' when the thread's COMM
  changes (Changbin Du)

- refcount 'struct mem_info', for better sharing it over several
  users, avoid duplicating structs and fixing crashes related to
  use after free (Jiri Olsa)

- Display perf.data version, offsets in 'perf report --header' (Jiri Olsa)

- Record the machine's memory topology information in a perf.data
  feature section, to be used by tools such as 'perf c2c' (Jiri Olsa)

- Fix output of forced groups in the header for 'perf report' --stdio
  and --tui (Jiri Olsa)

- Better support llvm, clang, cxx make tests in the build process (Jiri Olsa)

- Streamline the 'struct perf_mmap' methods, storing some info in the
  struct instead of passing it via various methods, shortening its
  signatures (Kan Liang)

- Update the quipper perf.data parser library site information (Stephane Eranian)

- Correct perf's man pages title markers for asciidoctor (Takashi Iwai)

- Intel PT fixes and refactorings paving the way for implementing
  support for AUX area sampling (Adrian Hunter)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

fbf8a1e1

perf/x86/intel: Disable userspace RDPMC usage for large PEBS · 1af22eba

Kan Liang authored Feb 12, 2018

Userspace RDPMC cannot possibly work for large PEBS, which was introduced in:

b8241d20699e ("perf/x86/intel: Implement batched PEBS interrupt handling (large PEBS interrupt threshold)")

When the PEBS interrupt threshold is larger than one, there is no way
to get exact auto-reload times and value for userspace RDPMC. Disable
the userspace RDPMC usage when large PEBS is enabled.

The only exception is when the PEBS interrupt threshold is 1, in which
case user-space RDPMC works well even with auto-reload events.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Fixes: b8241d20699e ("perf/x86/intel: Implement batched PEBS interrupt handling (large PEBS interrupt threshold)")
Link: http://lkml.kernel.org/r/1518474035-21006-6-git-send-email-kan.liang@linux.intel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>

1af22eba

perf/x86/intel: Fix PMU read for auto-reload · ceb90d9e

Kan Liang authored Feb 12, 2018

Auto-reload events needs to be specially handled in event count read.

Auto-reload is only available for intel_pmu.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Fixes: b8241d20699e ("perf/x86/intel: Implement batched PEBS interrupt handling (large PEBS interrupt threshold)")
Link: http://lkml.kernel.org/r/1518474035-21006-5-git-send-email-kan.liang@linux.intel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>

ceb90d9e

perf/x86/intel/ds: Introduce ->read() function for auto-reload events and... · 5bee2cc6

Kan Liang authored Feb 12, 2018

perf/x86/intel/ds: Introduce ->read() function for auto-reload events and flush the PEBS buffer there

There is no way to get exact auto-reload times and values which are needed
for event updates unless we flush the PEBS buffer.

Introduce intel_pmu_auto_reload_read() to drain the PEBS buffer for
auto reload event. To prevent races with the hardware, we can only
call drain_pebs() when the PMU is disabled.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Link: http://lkml.kernel.org/r/1518474035-21006-4-git-send-email-kan.liang@linux.intel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>

5bee2cc6

perf/x86: Introduce a ->read() callback in 'struct x86_pmu' · bcfbe5c4

Kan Liang authored Feb 12, 2018

Auto-reload needs to be specially handled when reading event counts.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Link: http://lkml.kernel.org/r/1518474035-21006-3-git-send-email-kan.liang@linux.intel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>

bcfbe5c4

perf/x86/intel: Fix event update for auto-reload · d31fc13f

Kan Liang authored Feb 12, 2018

There is a bug when reading event->count with large PEBS enabled.

Here is an example:

  # ./read_count
  0x71f0
  0x122c0
  0x1000000001c54
  0x100000001257d
  0x200000000bdc5

In fixed period mode, the auto-reload mechanism could be enabled for
PEBS events, but the calculation of event->count does not take the
auto-reload values into account.

Anyone who reads event->count will get the wrong result, e.g x86_pmu_read().

This bug was introduced with the auto-reload mechanism enabled since
commit:

  851559e3 ("perf/x86/intel: Use the PEBS auto reload mechanism when possible")

Introduce intel_pmu_save_and_restart_reload() to calculate the
event->count only for auto-reload.

Since the counter increments a negative counter value and overflows on
the sign switch, giving the interval:

        [-period, 0]

the difference between two consequtive reads is:

 A) value2 - value1;
    when no overflows have happened in between,
 B) (0 - value1) + (value2 - (-period));
    when one overflow happened in between,
 C) (0 - value1) + (n - 1) * (period) + (value2 - (-period));
    when @n overflows happened in between.

Here A) is the obvious difference, B) is the extension to the discrete
interval, where the first term is to the top of the interval and the
second term is from the bottom of the next interval and C) the extension
to multiple intervals, where the middle term is the whole intervals
covered.

The equation for all cases is:

    value2 - value1 + n * period

Previously the event->count is updated right before the sample output.
But for case A, there is no PEBS record ready. It needs to be specially
handled.

Remove the auto-reload code from x86_perf_event_set_period() since
we'll not longer call that function in this case.

Based-on-code-from: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Fixes: 851559e3 ("perf/x86/intel: Use the PEBS auto reload mechanism when possible")
Link: http://lkml.kernel.org/r/1518474035-21006-2-git-send-email-kan.liang@linux.intel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>

d31fc13f

perf/x86/intel: Properly save/restore the PMU state in the NMI handler · 82d71ed0

Kan Liang authored Feb 20, 2018

The PMU is disabled in intel_pmu_handle_irq(), but cpuc->enabled is not updated
accordingly.

This is fine in current usage because no-one checks it - but fix it
for future code: for example, the drain_pebs() will be modified to
fix an auto-reload bug.

Properly save/restore the old PMU state.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Cc: kernel test robot <fengguang.wu@intel.com>
Link: http://lkml.kernel.org/r/6f44ee84-56f8-79f1-559b-08e371eaeb78@linux.intel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>

82d71ed0

perf/x86/intel: Fix large period handling on Broadwell CPUs · f605cfca

Kan Liang authored Mar 01, 2018

Large fixed period values could be truncated on Broadwell, for example:

  perf record -e cycles -c 10000000000

Here the fixed period is 0x2540BE400, but the period which finally applied is
0x540BE400 - which is wrong.

The reason is that x86_pmu::limit_period() uses an u32 parameter, so the
high 32 bits of 'period' get truncated.

This bug was introduced in:

  commit 294fe0f5 ("perf/x86/intel: Add INST_RETIRED.ALL workarounds")

It's safe to use u64 instead of u32:

 - Although the 'left' is s64, the value of 'left' must be positive when
   calling limit_period().

 - bdw_limit_period() only modifies the lowest 6 bits, it doesn't touch
   the higher 32 bits.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 294fe0f5 ("perf/x86/intel: Add INST_RETIRED.ALL workarounds")
Link: http://lkml.kernel.org/r/1519926894-3520-1-git-send-email-kan.liang@linux.intel.com
[ Rewrote unacceptably bad changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

f605cfca

08 Mar, 2018 31 commits

perf tools: Update quipper information · 2427b432

Stephane Eranian authored Mar 07, 2018

This patch updates the links to the Quipper library. It is now
available from GitHub and has been updated.
Reported-by: Lakshman Annadorai <lakshmana@google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1520495985-2147-1-git-send-email-eranian@google.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

2427b432

perf annotate: Handle s390 PC relative load and store instruction. · 0b4b6b78

Thomas Richter authored Mar 08, 2018

S390 has several load and store instructions with target operand
addressing relative to the program counter, for example lrl, lgrl, strl,
stgrl.

These instructions are handled similar to x86. Objdump output displays
those instructions as:

   9595c: c4 2d 00 09 9c 54   lgrl   %r7,1c8540 <mp_+0x60>

This output is parsed (like on x86) and perf annotate shows those lines
as:

   lgrl   %r7,mp_+0x60

This patch handles the s390 specific instruction parsing for PC relative
load and store instructions.
Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Link: http://lkml.kernel.org/r/20180308120913.14802-1-tmricht@linux.vnet.ibm.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

0b4b6b78

perf annotate: Support to display the IPC/Cycle in TUI mode · bb848c14

Jin Yao authored Feb 27, 2018

Unlike the perf report interactive annotate mode, the perf annotate
doesn't display the IPC/Cycle even if branch info is recorded in perf
data file.

perf record -b ...
perf annotate function

It should show IPC/cycle, but it doesn't.

This patch lets perf annotate support the displaying of IPC/Cycle if
branch info is in perf data.

For example,

  perf annotate compute_flag

  Percent│ IPC Cycle
         │
         │
         │                Disassembly of section .text:
         │
         │                0000000000400640 <compute_flag>:
         │                compute_flag():
         │                volatile int count;
         │                static unsigned int s_randseed;
         │
         │                __attribute__((noinline))
         │                int compute_flag()
         │                {
   22.96 │1.18   584        sub    $0x8,%rsp
         │                        int i;
         │
         │                        i = rand() % 2;
   23.02 │1.18     1      → callq  rand@plt
         │
         │                        return i;
   27.05 │3.37              mov    %eax,%edx
         │                }
         │3.37              add    $0x8,%rsp
         │                {
         │                        int i;
         │
         │                        i = rand() % 2;
         │
         │                        return i;
         │3.37              shr    $0x1f,%edx
         │3.37              add    %edx,%eax
         │3.37              and    $0x1,%eax
         │3.37              sub    %edx,%eax
         │                }
   26.97 │3.37     2      ← retq

Note that, this patch only supports TUI mode. For stdio, now it just keeps
original behavior. Will support it in a follow-up patch.

  $ perf annotate compute_flag --stdio

   Percent |      Source code & Disassembly of div for cycles:ppp (7993 samples)
  ------------------------------------------------------------------------------
           :
           :
           :
           :            Disassembly of section .text:
           :
           :            0000000000400640 <compute_flag>:
           :            compute_flag():
           :            volatile int count;
           :            static unsigned int s_randseed;
           :
           :            __attribute__((noinline))
           :            int compute_flag()
           :            {
      0.29 :   400640:       sub    $0x8,%rsp     # +100.00%
           :                    int i;
           :
           :                    i = rand() % 2;
     42.93 :   400644:       callq  400490 <rand@plt>     # -100.00% (p:100.00%)
           :
           :                    return i;
      0.10 :   400649:       mov    %eax,%edx     # +100.00%
           :            }
      0.94 :   40064b:       add    $0x8,%rsp
           :            {
           :                    int i;
           :
           :                    i = rand() % 2;
           :
           :                    return i;
     27.02 :   40064f:       shr    $0x1f,%edx
      0.15 :   400652:       add    %edx,%eax
      1.24 :   400654:       and    $0x1,%eax
      2.08 :   400657:       sub    %edx,%eax
           :            }
     25.26 :   400659:       retq # -100.00% (p:100.00%)
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/20180223170210.GC7045@tassilo.jf.intel.com
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1519724327-7773-1-git-send-email-yao.jin@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

bb848c14

perf report: Provide libtraceevent with a kernel symbol resolver · ea85ab24

Wang YanQing authored Mar 08, 2018

So that beautifiers wanting to resolve kernel function addresses to
names can do its work, and when we use "perf report" for output of "perf
kmem record", we will get kernel symbol output.

This patch affect the output of "perf report" for the record data
generated by "perf kmem record" looks like below:

Before patch:
0.01% call_site=ffffffff814e5828 ptr=0x99bb000 bytes_req=3616 bytes_alloc=4096 gfp_flags=GFP_ATOMIC
0.01% call_site=ffffffff81370b87 ptr=0x428a3060 bytes_req=32 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO

After patch:
0.01% (aa_alloc_task_context+0x27) call_site=ffffffff81370b87 ptr=0x428a3060 bytes_req=32 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO
0.01% (__tty_buffer_request_room+0x88) call_site=ffffffff814e5828 ptr=0x99bb000 bytes_req=3616 bytes_alloc=4096 gfp_flags=GFP_ATOMIC
Signed-off-by: Wang YanQing <udknight@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180308032850.GA12383@udknight-ThinkPad-E550Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>