1. 19 Feb, 2013 1 commit
    • ftrace: Call ftrace cleanup module notifier after all other notifiers · 8c189ea6
      Steven Rostedt (Red Hat) authored
      Commit c1bf08ac "ftrace: Be first to run code modification on modules"
      changed the ftrace module notifier's priority to INT_MAX in order to
      process the ftrace nops before anything else could touch them
      (namely kprobes). This was the correct thing to do.
      
      Unfortunately, the ftrace module notifier also contains the ftrace
      clean-up code. Unlike the setup code, this code should be run
      *after* all the other module notifiers have run, in case a module is
      doing correct clean-up and unregisters its ftrace hooks. Basically,
      ftrace needs to clean up on module removal, as it needs to know about
      code being removed so that it doesn't try to modify that code. But after
      it removes the module from its records, if an ftrace user tries to remove
      a probe, that removal will fail because the record of that code segment
      no longer exists.
      
      Nothing really bad happens if the probe removal is called after ftrace
      did the clean-up, but the ftrace removal function will return an error.
      Correct code (such as kprobes) will produce a WARN_ON() if it fails
      to remove the probe. As people get annoyed by frivolous warnings, it's
      best to do the ftrace clean-up after everything else.
      
      Splitting ftrace_module_notifier into two notifiers solves the problem:
      one does the module-load setup and runs at high priority, and the other
      does the module clean-up and runs at low priority.
      
      Cc: stable@vger.kernel.org
      Reported-by: Frank Ch. Eigler <fche@redhat.com>
      Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  2. 12 Feb, 2013 1 commit
    • tracing/syscalls: Allow archs to ignore tracing compat syscalls · f431b634
      Steven Rostedt authored
      The tracing of ia32 compat system calls has been a bit of a pain, as they
      use different system call numbers than their 64-bit equivalents.
      
      I wrote a simple 'lls' program that lists files. I compiled it as an i686
      ELF binary and ran it on an x86_64 box. This is the result:
      
      echo 0 > /debug/tracing/tracing_on
      echo 1 > /debug/tracing/events/syscalls/enable
      echo 1 > /debug/tracing/tracing_on ; ./lls ; echo 0 > /debug/tracing/tracing_on
      
      grep lls /debug/tracing/trace
      
      [.. skipping calls before TS_COMPAT is set ...]
      
                   lls-1127  [005] d...   936.409188: sys_recvfrom(fd: 0, ubuf: 4d560fc4, size: 0, flags: 8048034, addr: 8, addr_len: f7700420)
                   lls-1127  [005] d...   936.409190: sys_recvfrom -> 0x8a77000
                   lls-1127  [005] d...   936.409211: sys_lgetxattr(pathname: 0, name: 1000, value: 3, size: 22)
                   lls-1127  [005] d...   936.409215: sys_lgetxattr -> 0xf76ff000
                   lls-1127  [005] d...   936.409223: sys_dup2(oldfd: 4d55ae9b, newfd: 4)
                   lls-1127  [005] d...   936.409228: sys_dup2 -> 0xfffffffffffffffe
                   lls-1127  [005] d...   936.409236: sys_newfstat(fd: 4d55b085, statbuf: 80000)
                   lls-1127  [005] d...   936.409242: sys_newfstat -> 0x3
                   lls-1127  [005] d...   936.409243: sys_removexattr(pathname: 3, name: ffcd0060)
                   lls-1127  [005] d...   936.409244: sys_removexattr -> 0x0
                   lls-1127  [005] d...   936.409245: sys_lgetxattr(pathname: 0, name: 19614, value: 1, size: 2)
                   lls-1127  [005] d...   936.409248: sys_lgetxattr -> 0xf76e5000
                   lls-1127  [005] d...   936.409248: sys_newlstat(filename: 3, statbuf: 19614)
                   lls-1127  [005] d...   936.409249: sys_newlstat -> 0x0
                   lls-1127  [005] d...   936.409262: sys_newfstat(fd: f76fb588, statbuf: 80000)
                   lls-1127  [005] d...   936.409279: sys_newfstat -> 0x3
                   lls-1127  [005] d...   936.409279: sys_close(fd: 3)
                   lls-1127  [005] d...   936.421550: sys_close -> 0x200
                   lls-1127  [005] d...   936.421558: sys_removexattr(pathname: 3, name: ffcd00d0)
                   lls-1127  [005] d...   936.421560: sys_removexattr -> 0x0
                   lls-1127  [005] d...   936.421569: sys_lgetxattr(pathname: 4d564000, name: 1b1abc, value: 5, size: 802)
                   lls-1127  [005] d...   936.421574: sys_lgetxattr -> 0x4d564000
                   lls-1127  [005] d...   936.421575: sys_capget(header: 4d70f000, dataptr: 1000)
                   lls-1127  [005] d...   936.421580: sys_capget -> 0x0
                   lls-1127  [005] d...   936.421580: sys_lgetxattr(pathname: 4d710000, name: 3000, value: 3, size: 812)
                   lls-1127  [005] d...   936.421589: sys_lgetxattr -> 0x4d710000
                   lls-1127  [005] d...   936.426130: sys_lgetxattr(pathname: 4d713000, name: 2abc, value: 3, size: 32)
                   lls-1127  [005] d...   936.426141: sys_lgetxattr -> 0x4d713000
                   lls-1127  [005] d...   936.426145: sys_newlstat(filename: 3, statbuf: f76ff3f0)
                   lls-1127  [005] d...   936.426146: sys_newlstat -> 0x0
                   lls-1127  [005] d...   936.431748: sys_lgetxattr(pathname: 0, name: 1000, value: 3, size: 22)
      
      Obviously I'm not calling newfstat with an fd of 4d55b085. The decoded
      calls are incorrect and confusing, because the 32-bit system call
      numbers are being interpreted as if they were 64-bit ones.
      
      Other efforts have been made to fix this:
      
      https://lkml.org/lkml/2012/3/26/367
      
      But the real solution is to rewrite the syscall internals and come up
      with a proper fix, one that doesn't require all the kludges that the
      current solution has.
      
      So for now, instead of outputting incorrect data, simply ignore these calls.
      With this patch, the trace now shows:
      
       #> grep lls /debug/tracing/trace
       #>
      
      Compat system calls are simply not traced. Users who need to trace compat
      syscalls should use the raw syscall tracepoints instead.
      
      For an architecture to have its compat syscalls ignored, it must
      define ARCH_TRACE_IGNORE_COMPAT_SYSCALLS (done in asm/ftrace.h) and also
      define an arch_trace_is_compat_syscall() function that returns true
      if the current task's syscall should not be traced.
      
      I want to stress that this change does not affect actual syscalls in any
      way, shape or form. It is only used within the tracing system and
      doesn't interfere with the syscall logic at all. The changes are
      consolidated nicely into trace_syscalls.c and asm/ftrace.h.
      
      I had to make one small modification to asm/thread_info.h: removing
      its include of asm/ftrace.h. Because asm/ftrace.h requires
      current_thread_info(), that include was causing include hell. It was
      added back in 2008 when the function graph tracer was added:
      
       commit caf4b323 "tracing, x86: add low level support for ftrace return tracing"
      
      It does not need to be included there.
      
      Link: http://lkml.kernel.org/r/1360703939.21867.99.camel@gandalf.local.home
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  3. 06 Feb, 2013 24 commits
  4. 03 Feb, 2013 1 commit
  5. 01 Feb, 2013 2 commits
    • tracing: Init current_trace to nop_trace and remove NULL checks · d840f718
      Steven Rostedt (Red Hat) authored
      On early boot up, when the ftrace ring buffer is initialized, the
      static variable current_trace is initialized to &nop_trace.
      Before this initialization, current_trace is NULL; after it,
      current_trace never becomes NULL again, as it is only ever
      reassigned to another ftrace tracer.

      Several places check if current_trace is NULL before using
      it, and this check is frivolous: at the point in time when the
      checks are made, the only way current_trace could be NULL is if
      ftrace failed its allocations at boot up, and the code paths to
      these locations would probably not be reachable.
      
      By initializing current_trace to &nop_trace where it is declared,
      current_trace will never be NULL, and we can remove all these
      checks of current_trace being NULL which never needed to be
      checked in the first place.
      
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • Merge tag 'perf-core-for-mingo' of... · 9c4c5fd9
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      . Make some POWER7 events available in sysfs, equivalent to
        what was done on x86, from Sukadev Bhattiprolu.
      
      . Add event group view, from Namhyung Kim:
      
        To use it, 'perf record' should group events when recording. Then perf
        report parses the saved group relation from the file header and prints them
        together if the --group option is provided.  You can use the 'perf evlist'
        command to see event group information:
      
          $ perf record -e '{ref-cycles,cycles}' noploop 1
          [ perf record: Woken up 2 times to write data ]
          [ perf record: Captured and wrote 0.385 MB perf.data (~16807 samples) ]
      
          $ perf evlist --group
          {ref-cycles,cycles}
      
        With this example, default perf report will show you each event
        separately like this:
      
          $ perf report
          ...
          # group: {ref-cycles,cycles}
          # ========
          # Samples: 3K of event 'ref-cycles'
          # Event count (approx.): 3153797218
          #
          # Overhead  Command      Shared Object                      Symbol
          # ........  .......  .................  ..........................
              99.84%  noploop  noploop            [.] main
               0.07%  noploop  ld-2.15.so         [.] strcmp
               0.03%  noploop  [kernel.kallsyms]  [k] timerqueue_del
               0.03%  noploop  [kernel.kallsyms]  [k] sched_clock_cpu
               0.02%  noploop  [kernel.kallsyms]  [k] account_user_time
               0.01%  noploop  [kernel.kallsyms]  [k] __alloc_pages_nodemask
               0.00%  noploop  [kernel.kallsyms]  [k] native_write_msr_safe
      
          # Samples: 3K of event 'cycles'
          # Event count (approx.): 3722310525
          #
          # Overhead  Command      Shared Object                     Symbol
          # ........  .......  .................  .........................
              99.76%  noploop  noploop            [.] main
               0.11%  noploop  [kernel.kallsyms]  [k] _raw_spin_lock
               0.06%  noploop  [kernel.kallsyms]  [k] find_get_page
               0.03%  noploop  [kernel.kallsyms]  [k] sched_clock_cpu
               0.02%  noploop  [kernel.kallsyms]  [k] rcu_check_callbacks
               0.02%  noploop  [kernel.kallsyms]  [k] __current_kernel_time
               0.00%  noploop  [kernel.kallsyms]  [k] native_write_msr_safe
      
        In this case the event group information is shown at the end of the
        header area, so you can use the --group option to enable the event group view.
      
          $ perf report --group
          ...
          # group: {ref-cycles,cycles}
          # ========
          # Samples: 7K of event 'anon group { ref-cycles, cycles }'
          # Event count (approx.): 6876107743
          #
          #         Overhead  Command      Shared Object                      Symbol
          # ................  .......  .................  ..........................
              99.84%  99.76%  noploop  noploop            [.] main
               0.07%   0.00%  noploop  ld-2.15.so         [.] strcmp
               0.03%   0.00%  noploop  [kernel.kallsyms]  [k] timerqueue_del
               0.03%   0.03%  noploop  [kernel.kallsyms]  [k] sched_clock_cpu
               0.02%   0.00%  noploop  [kernel.kallsyms]  [k] account_user_time
               0.01%   0.00%  noploop  [kernel.kallsyms]  [k] __alloc_pages_nodemask
               0.00%   0.00%  noploop  [kernel.kallsyms]  [k] native_write_msr_safe
               0.00%   0.11%  noploop  [kernel.kallsyms]  [k] _raw_spin_lock
               0.00%   0.06%  noploop  [kernel.kallsyms]  [k] find_get_page
               0.00%   0.02%  noploop  [kernel.kallsyms]  [k] rcu_check_callbacks
               0.00%   0.02%  noploop  [kernel.kallsyms]  [k] __current_kernel_time
      
        As you can see, the Overhead column now contains both ref-cycles and
        cycles, and the header line also shows the group information: 'anon group {
        ref-cycles, cycles }'.  The output is sorted by the period of the group
        leader first.
      
        If the perf.data file doesn't contain group information, the --group
        option does nothing.  If you want to enable the event group view by
        default, you can set it in the ~/.perfconfig file:
      
          $ cat ~/.perfconfig
          [report]
          group = true
      
        It can be overridden on the command line if you want:
      
          $ perf report --no-group
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  6. 31 Jan, 2013 11 commits