1. 12 Mar, 2018 5 commits
    • Peter Zijlstra's avatar
      perf/core: Remove perf_event::group_entry · 8343aae6
      Peter Zijlstra authored
      Now that all the grouping is done with RB trees, we no longer need
      group_entry and can replace the whole thing with sibling_list.
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: default avatarMark Rutland <mark.rutland@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Carrillo-Cisneros <davidcc@google.com>
      Cc: Dmitri Prokhorov <Dmitry.Prohorov@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Valery Cherepennikov <valery.cherepennikov@intel.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      8343aae6
    • Peter Zijlstra's avatar
      perf/core: Fix event schedule order · 1cac7b1a
      Peter Zijlstra authored
      Scheduling in events with cpu=-1 before events with cpu=# changes
      semantics and is undesirable in that it would priorize these events.
      
      Given that groups->index is across all groups we actually have an
      inter-group ordering, meaning we can merge-sort two groups, which is
      just what we need to preserve semantics.
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: default avatarMark Rutland <mark.rutland@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Carrillo-Cisneros <davidcc@google.com>
      Cc: Dmitri Prokhorov <Dmitry.Prohorov@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Valery Cherepennikov <valery.cherepennikov@intel.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      1cac7b1a
    • Peter Zijlstra's avatar
      perf/core: Cleanup the rb-tree code · 161c85fa
      Peter Zijlstra authored
      Trivial comment and code fixups..
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: default avatarMark Rutland <mark.rutland@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Carrillo-Cisneros <davidcc@google.com>
      Cc: Dmitri Prokhorov <Dmitry.Prohorov@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Valery Cherepennikov <valery.cherepennikov@intel.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      161c85fa
    • Alexey Budankov's avatar
      perf/cor: Use RB trees for pinned/flexible groups · 8e1a2031
      Alexey Budankov authored
      Change event groups into RB trees sorted by CPU and then by a 64bit
      index, so that multiplexing hrtimer interrupt handler would be able
      skipping to the current CPU's list and ignore groups allocated for the
      other CPUs.
      
      New API for manipulating event groups in the trees is implemented as well
      as adoption on the API in the current implementation.
      
      pinned_group_sched_in() and flexible_group_sched_in() API are
      introduced to consolidate code enabling the whole group from pinned
      and flexible groups appropriately.
      Signed-off-by: default avatarAlexey Budankov <alexey.budankov@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: default avatarMark Rutland <mark.rutland@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: David Carrillo-Cisneros <davidcc@google.com>
      Cc: Dmitri Prokhorov <Dmitry.Prohorov@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Valery Cherepennikov <valery.cherepennikov@intel.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/372f9c8b-0cfe-4240-e44d-83d863d40813@linux.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      8e1a2031
    • Peter Zijlstra's avatar
      perf/core: Fix perf_output_read_group() · 9e5b127d
      Peter Zijlstra authored
      Mark reported his arm64 perf fuzzer runs sometimes splat like:
      
        armv8pmu_read_counter+0x1e8/0x2d8
        armpmu_event_update+0x8c/0x188
        armpmu_read+0xc/0x18
        perf_output_read+0x550/0x11e8
        perf_event_read_event+0x1d0/0x248
        perf_event_exit_task+0x468/0xbb8
        do_exit+0x690/0x1310
        do_group_exit+0xd0/0x2b0
        get_signal+0x2e8/0x17a8
        do_signal+0x144/0x4f8
        do_notify_resume+0x148/0x1e8
        work_pending+0x8/0x14
      
      which asserts that we only call pmu::read() on ACTIVE events.
      
      The above callchain does:
      
        perf_event_exit_task()
          perf_event_exit_task_context()
            task_ctx_sched_out() // INACTIVE
            perf_event_exit_event()
              perf_event_set_state(EXIT) // EXIT
              sync_child_event()
                perf_event_read_event()
                  perf_output_read()
                    perf_output_read_group()
                      leader->pmu->read()
      
      Which results in doing a pmu::read() on an !ACTIVE event.
      
      I _think_ this is 'new' since we added attr.inherit_stat, which added
      the perf_event_read_event() to the exit path, without that
      perf_event_read_output() would only trigger from samples and for
      @event to trigger a sample, it's leader _must_ be ACTIVE too.
      
      Still, adding this check makes it consistent with the @sub case for
      the siblings.
      Reported-and-Tested-by: default avatarMark Rutland <mark.rutland@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      9e5b127d
  2. 09 Mar, 2018 8 commits
    • Ingo Molnar's avatar
      Merge tag 'perf-core-for-mingo-4.17-20180308' of... · fbf8a1e1
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo-4.17-20180308' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      - Support to display the IPC/Cycle in 'annotate' TUI, for systems
        where this info can be obtained, like Intel's >= Skylake (Jin Yao)
      
      - Support wildcards on PMU name in dynamic PMU events (Agustin Vega-Frias)
      
      - Display pmu name when printing unmerged events in stat (Agustin Vega-Frias)
      
      - Auto-merge PMU events created by prefix or glob match (Agustin Vega-Frias)
      
      - Fix s390 'call' operations target function annotation (Thomas Richter)
      
      - Handle s390 PC relative load and store instruction in the augmented
        'annotate', code, used so far in the TUI modes of 'perf report' and
        'perf annotate' (Thomas Richter)
      
      - Provide libtraceevent with a kernel symbol resolver, so that
        symbols in tracepoint fields can be resolved when showing them in
        tools such as 'perf report' (Wang YanQing)
      
      - Refactor the cgroups code to look more like other code in tools/perf,
        using cgroup__{put,get} for refcount operations instead of its
        open-coded equivalent, breaking larger functions, etc (Arnaldo Carvalho de Melo)
      
      - Implement support for the -G/--cgroup target in 'perf trace', allowing
        strace like tracing (plus other events, backtraces, etc) for cgroups
        (Arnaldo Carvalho de Melo)
      
      - Update thread shortname in 'perf sched map' when the thread's COMM
        changes (Changbin Du)
      
      - refcount 'struct mem_info', for better sharing it over several
        users, avoid duplicating structs and fixing crashes related to
        use after free (Jiri Olsa)
      
      - Display perf.data version, offsets in 'perf report --header' (Jiri Olsa)
      
      - Record the machine's memory topology information in a perf.data
        feature section, to be used by tools such as 'perf c2c' (Jiri Olsa)
      
      - Fix output of forced groups in the header for 'perf report' --stdio
        and --tui (Jiri Olsa)
      
      - Better support llvm, clang, cxx make tests in the build process (Jiri Olsa)
      
      - Streamline the 'struct perf_mmap' methods, storing some info in the
        struct instead of passing it via various methods, shortening its
        signatures (Kan Liang)
      
      - Update the quipper perf.data parser library site information (Stephane Eranian)
      
      - Correct perf's man pages title markers for asciidoctor (Takashi Iwai)
      
      - Intel PT fixes and refactorings paving the way for implementing
        support for AUX area sampling (Adrian Hunter)
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      fbf8a1e1
    • Kan Liang's avatar
      perf/x86/intel: Disable userspace RDPMC usage for large PEBS · 1af22eba
      Kan Liang authored
      Userspace RDPMC cannot possibly work for large PEBS, which was introduced in:
      
        b8241d20699e ("perf/x86/intel: Implement batched PEBS interrupt handling (large PEBS interrupt threshold)")
      
      When the PEBS interrupt threshold is larger than one, there is no way
      to get exact auto-reload times and value for userspace RDPMC.  Disable
      the userspace RDPMC usage when large PEBS is enabled.
      
      The only exception is when the PEBS interrupt threshold is 1, in which
      case user-space RDPMC works well even with auto-reload events.
      Signed-off-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Fixes: b8241d20699e ("perf/x86/intel: Implement batched PEBS interrupt handling (large PEBS interrupt threshold)")
      Link: http://lkml.kernel.org/r/1518474035-21006-6-git-send-email-kan.liang@linux.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      1af22eba
    • Kan Liang's avatar
      perf/x86/intel: Fix PMU read for auto-reload · ceb90d9e
      Kan Liang authored
      Auto-reload events needs to be specially handled in event count read.
      
      Auto-reload is only available for intel_pmu.
      Signed-off-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Fixes: b8241d20699e ("perf/x86/intel: Implement batched PEBS interrupt handling (large PEBS interrupt threshold)")
      Link: http://lkml.kernel.org/r/1518474035-21006-5-git-send-email-kan.liang@linux.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      ceb90d9e
    • Kan Liang's avatar
      perf/x86/intel/ds: Introduce ->read() function for auto-reload events and... · 5bee2cc6
      Kan Liang authored
      perf/x86/intel/ds: Introduce ->read() function for auto-reload events and flush the PEBS buffer there
      
      There is no way to get exact auto-reload times and values which are needed
      for event updates unless we flush the PEBS buffer.
      
      Introduce intel_pmu_auto_reload_read() to drain the PEBS buffer for
      auto reload event. To prevent races with the hardware, we can only
      call drain_pebs() when the PMU is disabled.
      Signed-off-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Link: http://lkml.kernel.org/r/1518474035-21006-4-git-send-email-kan.liang@linux.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      5bee2cc6
    • Kan Liang's avatar
      perf/x86: Introduce a ->read() callback in 'struct x86_pmu' · bcfbe5c4
      Kan Liang authored
      Auto-reload needs to be specially handled when reading event counts.
      Signed-off-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Link: http://lkml.kernel.org/r/1518474035-21006-3-git-send-email-kan.liang@linux.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      bcfbe5c4
    • Kan Liang's avatar
      perf/x86/intel: Fix event update for auto-reload · d31fc13f
      Kan Liang authored
      There is a bug when reading event->count with large PEBS enabled.
      
      Here is an example:
      
        # ./read_count
        0x71f0
        0x122c0
        0x1000000001c54
        0x100000001257d
        0x200000000bdc5
      
      In fixed period mode, the auto-reload mechanism could be enabled for
      PEBS events, but the calculation of event->count does not take the
      auto-reload values into account.
      
      Anyone who reads event->count will get the wrong result, e.g x86_pmu_read().
      
      This bug was introduced with the auto-reload mechanism enabled since
      commit:
      
        851559e3 ("perf/x86/intel: Use the PEBS auto reload mechanism when possible")
      
      Introduce intel_pmu_save_and_restart_reload() to calculate the
      event->count only for auto-reload.
      
      Since the counter increments a negative counter value and overflows on
      the sign switch, giving the interval:
      
              [-period, 0]
      
      the difference between two consequtive reads is:
      
       A) value2 - value1;
          when no overflows have happened in between,
       B) (0 - value1) + (value2 - (-period));
          when one overflow happened in between,
       C) (0 - value1) + (n - 1) * (period) + (value2 - (-period));
          when @n overflows happened in between.
      
      Here A) is the obvious difference, B) is the extension to the discrete
      interval, where the first term is to the top of the interval and the
      second term is from the bottom of the next interval and C) the extension
      to multiple intervals, where the middle term is the whole intervals
      covered.
      
      The equation for all cases is:
      
          value2 - value1 + n * period
      
      Previously the event->count is updated right before the sample output.
      But for case A, there is no PEBS record ready. It needs to be specially
      handled.
      
      Remove the auto-reload code from x86_perf_event_set_period() since
      we'll not longer call that function in this case.
      
      Based-on-code-from: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Fixes: 851559e3 ("perf/x86/intel: Use the PEBS auto reload mechanism when possible")
      Link: http://lkml.kernel.org/r/1518474035-21006-2-git-send-email-kan.liang@linux.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      d31fc13f
    • Kan Liang's avatar
      perf/x86/intel: Properly save/restore the PMU state in the NMI handler · 82d71ed0
      Kan Liang authored
      The PMU is disabled in intel_pmu_handle_irq(), but cpuc->enabled is not updated
      accordingly.
      
      This is fine in current usage because no-one checks it - but fix it
      for future code: for example, the drain_pebs() will be modified to
      fix an auto-reload bug.
      
      Properly save/restore the old PMU state.
      Signed-off-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: kernel test robot <fengguang.wu@intel.com>
      Link: http://lkml.kernel.org/r/6f44ee84-56f8-79f1-559b-08e371eaeb78@linux.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      82d71ed0
    • Kan Liang's avatar
      perf/x86/intel: Fix large period handling on Broadwell CPUs · f605cfca
      Kan Liang authored
      Large fixed period values could be truncated on Broadwell, for example:
      
        perf record -e cycles -c 10000000000
      
      Here the fixed period is 0x2540BE400, but the period which finally applied is
      0x540BE400 - which is wrong.
      
      The reason is that x86_pmu::limit_period() uses an u32 parameter, so the
      high 32 bits of 'period' get truncated.
      
      This bug was introduced in:
      
        commit 294fe0f5 ("perf/x86/intel: Add INST_RETIRED.ALL workarounds")
      
      It's safe to use u64 instead of u32:
      
       - Although the 'left' is s64, the value of 'left' must be positive when
         calling limit_period().
      
       - bdw_limit_period() only modifies the lowest 6 bits, it doesn't touch
         the higher 32 bits.
      Signed-off-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: 294fe0f5 ("perf/x86/intel: Add INST_RETIRED.ALL workarounds")
      Link: http://lkml.kernel.org/r/1519926894-3520-1-git-send-email-kan.liang@linux.intel.com
      [ Rewrote unacceptably bad changelog. ]
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      f605cfca
  3. 08 Mar, 2018 27 commits