1. 23 Jun, 2013 3 commits
    • x86: Add NMI duration tracepoints · 0c4df02d
      Dave Hansen authored
      This patch has been invaluable in my adventures finding
      issues in the perf NMI handler.  I'm as big a fan of
      printk() as anybody is, but using printk() in NMIs is
      deadly when they're happening frequently.
      
      Even hacking in trace_printk() ended up eating enough
      CPU to throw off some of the measurements I was making.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: paulus@samba.org
      Cc: acme@ghostprotocols.net
      Cc: Dave Hansen <dave@sr71.net>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
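The commit message does not include its code. As a rough userspace analogy (not the kernel's tracepoint machinery), the sketch below times a stand-in handler with `clock_gettime(CLOCK_MONOTONIC)` and stores raw (name, duration) pairs in a small ring buffer, leaving all text formatting for later, outside the timed region. Deferring formatting this way is part of why a tracepoint is far cheaper than printk() on a hot path. All names here (`record_duration`, `timed_call`, `fake_handler`, `dump_events`, `RING_SIZE`) are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical fixed-size event record: raw numbers only, no formatting. */
struct nmi_duration_event {
    const char *handler;   /* name of the handler that ran */
    uint64_t    delta_ns;  /* how long it ran, in nanoseconds */
};

#define RING_SIZE 64u      /* hypothetical; a power of two so masking works */

static struct nmi_duration_event ring[RING_SIZE];
static unsigned int ring_head;  /* total events recorded so far */

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Cheap "tracepoint": store two values, overwriting the oldest slot when full. */
static void record_duration(const char *handler, uint64_t delta_ns)
{
    struct nmi_duration_event *ev = &ring[ring_head & (RING_SIZE - 1)];
    ev->handler = handler;
    ev->delta_ns = delta_ns;
    ring_head++;
}

/* Stand-in for an NMI handler; returns 1 for "handled". */
static int fake_handler(void)
{
    volatile int x = 0;
    for (int i = 0; i < 1000; i++)
        x += i;
    return 1;
}

/* Time one call and record it; nothing is printed on this path. */
static int timed_call(const char *name, int (*fn)(void))
{
    uint64_t t0 = now_ns();
    int handled = fn();
    record_duration(name, now_ns() - t0);
    return handled;
}

/* Formatting happens here, long after the timed code has finished.
 * Once the ring has wrapped, slots are not in chronological order. */
static void dump_events(FILE *out)
{
    unsigned int n = ring_head < RING_SIZE ? ring_head : RING_SIZE;
    for (unsigned int i = 0; i < n; i++)
        fprintf(out, "%s took %llu ns\n", ring[i].handler,
                (unsigned long long)ring[i].delta_ns);
}
```

A caller would invoke `timed_call("fake_handler", fake_handler)` in its hot loop and call `dump_events(stdout)` only once measurement is over.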
    • perf: Drop sample rate when sampling is too slow · 14c63f17
      Dave Hansen authored
      This patch keeps track of how long perf's NMI handler is taking,
      and also calculates how many samples perf can take a second.  If
      the sample length times the expected max number of samples
      exceeds a configurable threshold, it drops the sample rate.
      
      This way, we don't have a runaway sampling process eating up the
      CPU.
      
      This patch can tend to drop the sample rate down to a level where
      perf doesn't work very well.  *BUT* the alternative is that my
      system hangs because it spends all of its time handling NMIs.
      
      I'll take a busted performance tool over an entire system that's
      busted and undebuggable any day.
      
      BTW, my suspicion is that there's still an underlying bug here.
      Using the HPET instead of the TSC is definitely a contributing
      factor, but I suspect there are some other things going on.
      But, I can't go dig down on a bug like that with my machine
      hanging all the time.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: paulus@samba.org
      Cc: acme@ghostprotocols.net
      Cc: Dave Hansen <dave@sr71.net>
      [ Prettified it a bit. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
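The throttling rule described in this commit (sample length times the expected number of samples, compared against a configurable threshold) can be sketched as follows. This is a minimal illustration that assumes the threshold is a percentage of one CPU-second and assumes a simple halve-the-rate back-off; the struct, its fields, and the exact policy are hypothetical, not perf's actual variables.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ull

/* Hypothetical throttling state; the real perf code keeps its own variables. */
struct sample_throttle {
    uint64_t     rate_hz;      /* current maximum samples per second */
    unsigned int max_cpu_pct;  /* configurable threshold: % of one CPU-second */
};

/*
 * Called with the measured duration of one sample (one NMI handler run).
 * If duration * rate would exceed the allowed share of a second, halve
 * the rate.  Returns 1 if the rate was lowered, 0 otherwise.
 */
static int throttle_update(struct sample_throttle *t, uint64_t sample_len_ns)
{
    uint64_t allowed_ns   = NSEC_PER_SEC * t->max_cpu_pct / 100;
    uint64_t projected_ns = sample_len_ns * t->rate_hz;

    /* Within budget, or already at the floor: leave the rate alone. */
    if (projected_ns <= allowed_ns || t->rate_hz <= 1)
        return 0;

    t->rate_hz /= 2;  /* assumed policy: back off by half each time */
    return 1;
}
```

For example, at 100000 samples/s with a 25% limit the budget is 250,000,000 ns per second. A 5000 ns sample projects to 500,000,000 ns, so the rate drops to 50000, at which point the same sample length fits the budget exactly and no further reduction happens.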
    • x86: Warn when NMI handlers take large amounts of time · 2ab00456
      Dave Hansen authored
      I have a system which is causing all kinds of problems.  It has
      8 NUMA nodes, and lots of cores that can fight over cachelines.
      If things are not working _perfectly_, then NMIs can take longer
      than expected.
      
      If we get too many of them backed up to each other, we can
      easily end up in a situation where we are doing nothing *but*
      running NMIs.  The biggest problem, though, is that this happens
      _silently_.  You might be lucky enough to get an hrtimer warning, but
      most of the time the system simply hangs.
      
      This patch should at least give us some warning before we fall
      off the cliff.  The warnings look like this:
      
      	nmi_handle: perf_event_nmi_handler() took: 26095071 ns
      
      The message is triggered whenever we notice the longest NMI
      we've seen to date.  You can always view and reset this value
      via the debugfs interface if you like.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: paulus@samba.org
      Cc: acme@ghostprotocols.net
      Cc: Dave Hansen <dave@sr71.net>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
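The "warn only on a new maximum" rule from this commit can be sketched as below. This is a hypothetical userspace model: the message format copies the sample warning quoted in the commit message, and `nmi_reset_max()` merely stands in for the debugfs reset; none of these names are taken from the kernel.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-handler bookkeeping, modeled on the commit's description. */
struct nmi_handler_stats {
    const char *name;
    uint64_t    max_duration_ns;  /* longest run seen so far; 0 = none yet */
};

/*
 * Record one run.  Warn only when it beats the previous maximum, so a
 * steady stream of slow NMIs produces a handful of messages, not a flood.
 * Returns 1 if a warning was printed.
 */
static int nmi_note_duration(struct nmi_handler_stats *s, uint64_t delta_ns)
{
    if (delta_ns <= s->max_duration_ns)
        return 0;

    s->max_duration_ns = delta_ns;
    fprintf(stderr, "nmi_handle: %s() took: %llu ns\n",
            s->name, (unsigned long long)delta_ns);
    return 1;
}

/* Stand-in for the debugfs reset mentioned in the commit (hypothetical). */
static void nmi_reset_max(struct nmi_handler_stats *s)
{
    s->max_duration_ns = 0;
}
```

Resetting the maximum re-arms the warning, so the next run of any length above zero reports again.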
  2. 20 Jun, 2013 9 commits
  3. 19 Jun, 2013 16 commits
  4. 31 May, 2013 1 commit
    • Merge tag 'perf-core-for-mingo' of... · afb71193
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      perf/core improvements and fixes:
      
       * Reset SIGTERM handler in workload child process, fix from David Ahern.
      
       * Handle death by SIGTERM in 'perf record', fix from David Ahern.
      
       * Fix printing of perf_event_paranoid message, from David Ahern.
      
       * Handle realloc failures in 'perf kvm', from David Ahern.
      
       * Fix divide by 0 in variance, from David Ahern.
      
       * Save parent pid in thread struct, from David Ahern.
      
       * Handle JITed code in shared memory, from Andi Kleen.
      
       * Makefile reorganization, prep work for Kconfig patches, from Jiri Olsa.
      
       * Fixes for 'perf diff', from Jiri Olsa.
      
       * Add automated make test suite, from Jiri Olsa.
      
       * 'perf tests' fixes from Jiri Olsa.
      
       * Remove some unused struct members, from Jiri Olsa.
      
       * Add missing liblk.a dependency for python/perf.so, fix from Jiri Olsa.
      
       * Respect CROSS_COMPILE in liblk.a, from Rabin Vincent.
      
       * Expand definition of sysfs format attribute, from Michael Ellerman.
      
       * No need to do locking when adding hists in perf report, only 'top'
         needs that, from Namhyung Kim.
      
       * Sorting improvements, from Namhyung Kim.
      
       * Fix alignment of symbol column in the hists browser (top, report)
         when -v is given, from Namhyung Kim.
      
       * Add --percent-limit option to 'top' and 'report', from Namhyung Kim.
      
       * Fix 'perf top' -E option behavior, from Namhyung Kim.
      
       * Fix bug in isupper() and islower(), from Sukadev Bhattiprolu.
      
       * Fix compile errors in bp_signal 'perf test', from Sukadev Bhattiprolu.
      
       * Make Power7 CPI stack events available in sysfs, from Sukadev Bhattiprolu.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
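The merge summary only names the "divide by 0 in variance" fix. As a hedged illustration of the kind of guard involved (not the actual patch; the struct and function names are hypothetical), here is a running-variance helper based on Welford's online algorithm that returns 0 instead of dividing by zero when fewer than two samples have been seen.

```c
/* Hypothetical running statistics, updated with Welford's online algorithm. */
struct run_stats {
    double n;     /* number of samples seen */
    double mean;  /* running mean */
    double m2;    /* running sum of squared deviations from the mean */
};

/* Fold one new sample into the running mean and m2. */
static void stats_update(struct run_stats *s, double x)
{
    double delta;

    s->n += 1.0;
    delta = x - s->mean;
    s->mean += delta / s->n;
    s->m2 += delta * (x - s->mean);
}

/* Sample variance m2 / (n - 1); guarded so n == 0 or n == 1 yields 0
 * rather than a division by zero. */
static double stats_variance(const struct run_stats *s)
{
    if (s->n < 2.0)
        return 0.0;
    return s->m2 / (s->n - 1.0);
}
```

With the samples 2 and 4, the running mean becomes 3 and `m2` becomes 2, so the sample variance is 2. With zero or one sample, the guard returns 0.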
  5. 30 May, 2013 9 commits
  6. 29 May, 2013 2 commits