1. 06 Jan, 2016 5 commits
    • perf/x86: Remove warning for zero PEBS status · 957ea1fd
      Andi Kleen authored
      The recent commit:
      
        75f80859 ("perf/x86/intel/pebs: Robustify PEBS buffer drain")
      
      causes lots of warnings on different CPUs before Skylake
      when running PEBS intensive workloads.
      
      They can have a zero status field in the PEBS record when
      PEBS is racing with clearing of GLOBAL_STATUS.
      
      This also can cause hangs (it seems there are still
      problems with printk in NMI).
      
      Disable the warning, but still ignore the record.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1449177740-5422-1-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/core: Collapse more IPI loops · 7b648018
      Peter Zijlstra authored
      This patch collapses the two 'hard' cases, which are
      perf_event_{dis,en}able().
      
      I cannot seem to convince myself the current code is correct.
      
      So starting with perf_event_disable(); we don't strictly need to test
      for event->state == ACTIVE, ctx->is_active is enough. If the event is
      not scheduled while the ctx is, __perf_event_disable() still does the
      right thing.  It's a little less efficient to IPI in that case, but
      simpler overall.
      
      For perf_event_enable(); the same goes, but I think that's actually
      broken in its current form. The current condition is: ctx->is_active
      && event->state == OFF, that means it doesn't do anything when
      !ctx->is_active && event->state == OFF. This is wrong; it should still
      mark the event INACTIVE in that case, otherwise we'll still not try
      and schedule the event once the context becomes active again.
      
      This patch implements the two functions using the new
      event_function_call() and does away with the tricky event->state
      tests.
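
The perf_event_enable() argument above can be illustrated with a toy Python model. This is not kernel code: the string states and the helper names `old_enable` and `new_enable` are hypothetical, and the model only captures the OFF -> INACTIVE marking as the message describes it, not actual scheduling or IPIs.

```python
# Toy model only: state names and helpers are illustrative, not kernel identifiers.
OFF, INACTIVE, ACTIVE = "OFF", "INACTIVE", "ACTIVE"


def old_enable(ctx_is_active, event_state):
    """Condition as described in the message: act only when the context
    is active and the event is OFF; otherwise leave the state untouched."""
    if ctx_is_active and event_state == OFF:
        return INACTIVE
    return event_state


def new_enable(ctx_is_active, event_state):
    """Behaviour the message argues for: an OFF event is always marked
    INACTIVE, so it can be scheduled once the context is active."""
    if event_state == OFF:
        return INACTIVE
    return event_state


# Print the resulting state for an OFF event under both rules.
for ctx_is_active in (True, False):
    print(ctx_is_active, old_enable(ctx_is_active, OFF), new_enable(ctx_is_active, OFF))
```

In this model the two rules differ only in the row where the context is inactive: the old rule leaves the event OFF, which is the case the message calls broken.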
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Alexander Shishkin <alexander.shishkin@intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • Ingo Molnar's avatar
    • perf: Fix race in swevent hash · 12ca6ad2
      Peter Zijlstra authored
      There's a race on CPU unplug where we free the swevent hash array
      while it can still have events on it. This results in a
      use-after-free, which is BAD.
      
      Simply do not free the hash array on unplug. This leaves the thing
      around and no use-after-free takes place.
      
      When the last swevent dies, we do a for_each_possible_cpu() iteration
      anyway to clean these up, at which time we'll free it, so no leakage
      will occur.
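
The lifetime rule described above can be sketched as a toy Python model. It is purely illustrative: the class `SweventHashes` and its methods are hypothetical names, and the per-CPU lists stand in for the real hash arrays.

```python
class SweventHashes:
    """Toy model: one hash-table slot per possible CPU (illustrative only)."""

    def __init__(self, possible_cpus):
        self.tables = {cpu: None for cpu in possible_cpus}
        self.nr_events = 0

    def add_event(self, cpu):
        # Allocate the table lazily and attach an event to it.
        if self.tables[cpu] is None:
            self.tables[cpu] = []
        self.tables[cpu].append("event")
        self.nr_events += 1

    def cpu_unplug(self, cpu):
        # The fix in this model: do NOT free the table here, because
        # events may still be attached to it.
        pass

    def remove_event(self, cpu):
        self.tables[cpu].pop()
        self.nr_events -= 1
        if self.nr_events == 0:
            # Analogue of the for_each_possible_cpu() cleanup: when the
            # last event dies, free every table, including those that
            # belong to unplugged CPUs.
            for c in self.tables:
                self.tables[c] = None
```

The point of the model is that freeing is tied to the last event going away rather than to the unplug, so nothing is freed while still referenced and nothing is leaked.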
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Tested-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf: Fix race in perf_event_exec() · c1274499
      Peter Zijlstra authored
      I managed to tickle this warning:
      
        [ 2338.884942] ------------[ cut here ]------------
        [ 2338.890112] WARNING: CPU: 13 PID: 35162 at ../kernel/events/core.c:2702 task_ctx_sched_out+0x6b/0x80()
        [ 2338.900504] Modules linked in:
        [ 2338.903933] CPU: 13 PID: 35162 Comm: bash Not tainted 4.4.0-rc4-dirty #244
        [ 2338.911610] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
        [ 2338.923071]  ffffffff81f1468e ffff8807c6457cb8 ffffffff815c680c 0000000000000000
        [ 2338.931382]  ffff8807c6457cf0 ffffffff810c8a56 ffffe8ffff8c1bd0 ffff8808132ed400
        [ 2338.939678]  0000000000000286 ffff880813170380 ffff8808132ed400 ffff8807c6457d00
        [ 2338.947987] Call Trace:
        [ 2338.950726]  [<ffffffff815c680c>] dump_stack+0x4e/0x82
        [ 2338.956474]  [<ffffffff810c8a56>] warn_slowpath_common+0x86/0xc0
        [ 2338.963195]  [<ffffffff810c8b4a>] warn_slowpath_null+0x1a/0x20
        [ 2338.969720]  [<ffffffff811a49cb>] task_ctx_sched_out+0x6b/0x80
        [ 2338.976244]  [<ffffffff811a62d2>] perf_event_exec+0xe2/0x180
        [ 2338.982575]  [<ffffffff8121fb6f>] setup_new_exec+0x6f/0x1b0
        [ 2338.988810]  [<ffffffff8126de83>] load_elf_binary+0x393/0x1660
        [ 2338.995339]  [<ffffffff811dc772>] ? get_user_pages+0x52/0x60
        [ 2339.001669]  [<ffffffff8121e297>] search_binary_handler+0x97/0x200
        [ 2339.008581]  [<ffffffff8121f8b3>] do_execveat_common.isra.33+0x543/0x6e0
        [ 2339.016072]  [<ffffffff8121fcea>] SyS_execve+0x3a/0x50
        [ 2339.021819]  [<ffffffff819fc165>] stub_execve+0x5/0x5
        [ 2339.027469]  [<ffffffff819fbeb2>] ? entry_SYSCALL_64_fastpath+0x12/0x71
        [ 2339.034860] ---[ end trace ee1337c59a0ddeac ]---
      
      Which is a WARN_ON_ONCE() indicating that cpuctx->task_ctx is not
      what we expected it to be.
      
      This is because context switches can swap the task_struct::perf_event_ctxp[]
      pointer around. Therefore you have to either disable preemption when looking
      at current, or hold ctx->lock.
      
      Fix perf_event_enable_on_exec(): it loads current->perf_event_ctxp[]
      before disabling interrupts, so a preemption in the right place can
      swap contexts around and leave us using the wrong one.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: syzkaller <syzkaller@googlegroups.com>
      Link: http://lkml.kernel.org/r/20151210195740.GG6357@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 18 Dec, 2015 4 commits
    • Merge tag 'perf-core-for-mingo-3' of... · d64fe8e6
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo-3' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull new perf tool feature from Arnaldo Carvalho de Melo:
      
      " User visible changes:
      
        - Generate perf.data files from 'perf stat', to tap into the scripting
          capabilities perf has instead of defining a 'perf stat' specific scripting
          support to calculate event ratios, etc. Simple example:
      
          $ perf stat record -e cycles usleep 1
      
           Performance counter stats for 'usleep 1':
      
                 1,134,996      cycles
      
               0.000670644 seconds time elapsed
      
          $ perf stat report
      
           Performance counter stats for '/home/acme/bin/perf stat record -e cycles usleep 1':
      
                 1,134,996      cycles
      
               0.000670644 seconds time elapsed
      
          $
      
          It generates PERF_RECORD_ userspace records to store the details:
      
          $ perf report -D | grep PERF_RECORD
          0xf0 [0x28]: PERF_RECORD_THREAD_MAP nr: 1 thread: 27637
          0x118 [0x12]: PERF_RECORD_CPU_MAP nr: 1 cpu: 65535
          0x12a [0x40]: PERF_RECORD_STAT_CONFIG
          0x16a [0x30]: PERF_RECORD_STAT
          -1 -1 0x19a [0x40]: PERF_RECORD_MMAP -1/0: [0xffffffff81000000(0x1f000000) @ 0xffffffff81000000]: x [kernel.kallsyms]_text
          0x1da [0x18]: PERF_RECORD_STAT_ROUND
          [acme@ssdandy linux]$
      
          An effort was made to ensure that perf.data files generated this way
          do not produce cryptic messages when processed by older tools.
      
          The 'perf script' bits need rebasing, will go up later.
      
        Jiri's cover letter for this series:
      
        The initial attempt defined its own formula language and allowed triggering
        a user's script at the end of the stat command:
      
          http://marc.info/?l=linux-kernel&m=136742146322273&w=2
      
        This patchset abandons the idea of a new formula language and instead adds
        support to:
      
          - store stat data into perf.data file
          - add python support to process stat events
      
        Basically, it allows storing stat data in perf.data and post-processing it
        with Python scripts, similar to what we do for sampling data.
      
        The stat data are stored in new stat, stat-round, stat-config user events.
          stat        - stored for each read syscall of the counter
          stat round  - stored for each interval or end of the command invocation
          stat config - stores all the config information needed to process data
                        so the report tool can reproduce the same output as record
      
        The python script can now define 'stat__<eventname>_<modifier>' functions
        to get stat events data and 'stat__interval' to get stat-round data.
      
        See CPI script example in scripts/python/stat-cpi.py."
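
A minimal sketch of what such a handler script might look like is given here. The callback names follow the 'stat__<eventname>_<modifier>' and 'stat__interval' convention described above, but the argument lists (cpu, thread, time, val, ena, run) and (time) are an assumption, as are the helper `store` and the `counts`/`results` containers; scripts/python/stat-cpi.py is the reference for the real signatures.

```python
# Sketch of a 'perf script' Python handler for stat events.
# ASSUMPTION: per-event handlers receive (cpu, thread, time, val, ena, run)
# and stat__interval receives (time). Verify against stat-cpi.py.

counts = {}   # (time, event, cpu, thread) -> counter value
results = []  # (time, cpu, thread, cpi) tuples, one per interval row


def store(time, event, cpu, thread, val):
    """Hypothetical helper: remember one counter reading."""
    counts[(time, event, cpu, thread)] = val


def stat__cycles_u(cpu, thread, time, val, ena, run):
    # Called for stat events of 'cycles' with the 'u' modifier.
    store(time, "cycles", cpu, thread, val)


def stat__instructions_u(cpu, thread, time, val, ena, run):
    # Called for stat events of 'instructions' with the 'u' modifier.
    store(time, "instructions", cpu, thread, val)


def stat__interval(time):
    # Called for stat-round data: compute cycles per instruction for
    # every (cpu, thread) pair seen at this timestamp.
    pairs = sorted({(c, t) for (tm, _ev, c, t) in counts if tm == time})
    for cpu, thread in pairs:
        cyc = counts.get((time, "cycles", cpu, thread), 0)
        ins = counts.get((time, "instructions", cpu, thread), 0)
        cpi = cyc / float(ins) if ins else 0.0
        results.append((time, cpu, thread, cpi))
        print("%d: cpu %d, thread %d -> cpi %f (%d/%d)"
              % (time, cpu, thread, cpi, cyc, ins))
```

Outside of perf, the handlers can be exercised by calling them directly with synthetic values, for example one cycles reading and one instructions reading followed by `stat__interval()` with the same timestamp.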
      
      Also a few other changes:
      
      User visible changes:
      
        - Make command line options always available, even when they
          depend on some feature being enabled, warning the user about
          use of such options (Wang Nan)
      
        - Support --vmlinux in perf record, useful, so far, for eBPF,
          where we will set up events that will be used in the record
          session (He Kuang)
      
        - Automatically disable collecting branch flags and cycles with
          --call-graph lbr. This allows avoiding a bunch of extra MSR
          reads in the PMI on Skylake.  (Andi Kleen)
      
      Infrastructure changes:
      
        - Dump the stack when a 'perf test -v' entry segfaults; so far we
          had to run it under gdb with 'set follow-fork-mode child'
          set to get a proper backtrace (Arnaldo Carvalho de Melo)
      
        - Initialize the refcnt in 'struct thread' to 1 and fixup its
          users accordingly, so that we try to have the same refcount
          model across the perf codebase (Arnaldo Carvalho de Melo)
      
        - More prep work for moving the subcmd infrastructure out of
          tools/perf/ and into tools/lib/subcmd/ to be used by other
          tools/ living utilities (Josh Poimboeuf)
      
        - Fix 'perf test' hist testcases when kptr_restrict is on (Namhyung Kim)
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • Merge branch 'perf/urgent' into perf/core, to make sure a cherry-picked commit... · 141a361e
      Ingo Molnar authored
      Merge branch 'perf/urgent' into perf/core, to make sure a cherry-picked commit does not create conflicts
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • Merge tag 'perf-urgent-for-mingo' of... · 2d2e7ac1
      Ingo Molnar authored
      Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
      
      Pull perf/urgent tooling fix from Arnaldo Carvalho de Melo:
      
        User visible changes:
      
          - Fix 'perf list' segfault due to lack of support for PERF_COUNT_SW_BPF_OUTPUT
            in an array used just for printing available events, robustify the code
            involved (Arnaldo Carvalho de Melo)
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • Merge tag 'perf-core-for-mingo-2.1' of... · b21daaed
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo-2.1' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements from Arnaldo Carvalho de Melo:
      
      User visible changes:
      
        - Add record.build-id config option to 'perf record', to allow configuring
          in the ~/.perfconfig file if and how build-ids should be processed, allowing
          a permanent setting for options such as -B and -N: (Namhyung Kim)
      
          $ perf record -h -B -N
      
           Usage: perf record [<options>] [<command>]
              or: perf record [<options>] -- <command> [<options>]
      
              -B, --no-buildid       do not collect buildids in perf.data
              -N, --no-buildid-cache do not update the buildid cache
      
          $
      
      Infrastructure changes:
      
        - Move code for options parsing and subcommand handling from tools/perf/
          to tools/lib/subcmd/, so that it can be used by other tools/ living
          utilities (Josh Poimboeuf)
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. 17 Dec, 2015 31 commits