1. 26 Jan, 2009 3 commits
    • Arnaldo Carvalho de Melo's avatar
      blktrace: add ftrace plugin · c71a8961
      Arnaldo Carvalho de Melo authored
      Impact: New way of using the blktrace infrastructure
      
      This drops the requirement of userspace utilities to use the blktrace
      facility.
      
      Configuration is done thru sysfs, adding a "trace" directory to the
      partition directory where blktrace can be enabled for the associated
      request_queue.
      
      The same filters present in the IOCTL interface are present as sysfs
      device attributes.
      
      The /sys/block/sdX/sdXN/trace/enable file allows tracing without any
      filters.
      
      The other files in this directory: pid, act_mask, start_lba and end_lba
      can be used with the same meaning as with the IOCTL interface.
      
      Using the sysfs interface will only setup the request_queue->blk_trace
      fields, tracing will only take place when the "blk" tracer is selected
      via the ftrace interface, as in the following example:
      
      To see the trace, one can use the /d/tracing/trace file or the
      /d/tracign/trace_pipe file, with semantics defined in the ftrace
      documentation in Documentation/ftrace.txt.
      
      [root@f10-1 ~]# cat /t/trace
             kjournald-305   [000]  3046.491224:   8,1    A WBS 6367 + 8 <- (8,1) 6304
             kjournald-305   [000]  3046.491227:   8,1    Q   R 6367 + 8 [kjournald]
             kjournald-305   [000]  3046.491236:   8,1    G  RB 6367 + 8 [kjournald]
             kjournald-305   [000]  3046.491239:   8,1    P  NS [kjournald]
             kjournald-305   [000]  3046.491242:   8,1    I RBS 6367 + 8 [kjournald]
             kjournald-305   [000]  3046.491251:   8,1    D  WB 6367 + 8 [kjournald]
             kjournald-305   [000]  3046.491610:   8,1    U  WS [kjournald] 1
                <idle>-0     [000]  3046.511914:   8,1    C  RS 6367 + 8 [6367]
      [root@f10-1 ~]#
      
      The default line context (prefix) format is the one described in the ftrace
      documentation, with the blktrace specific bits using its existing format,
      described in blkparse(8).
      
      If one wants to have the classic blktrace formatting, this is possible by
      using:
      
      [root@f10-1 ~]# echo blk_classic > /t/trace_options
      [root@f10-1 ~]# cat /t/trace
        8,1    0  3046.491224   305  A WBS 6367 + 8 <- (8,1) 6304
        8,1    0  3046.491227   305  Q   R 6367 + 8 [kjournald]
        8,1    0  3046.491236   305  G  RB 6367 + 8 [kjournald]
        8,1    0  3046.491239   305  P  NS [kjournald]
        8,1    0  3046.491242   305  I RBS 6367 + 8 [kjournald]
        8,1    0  3046.491251   305  D  WB 6367 + 8 [kjournald]
        8,1    0  3046.491610   305  U  WS [kjournald] 1
        8,1    0  3046.511914     0  C  RS 6367 + 8 [6367]
      [root@f10-1 ~]#
      
      Using the ftrace standard format allows more flexibility, such
      as the ability of asking for backtraces via trace_options:
      
      [root@f10-1 ~]# echo noblk_classic > /t/trace_options
      [root@f10-1 ~]# echo stacktrace > /t/trace_options
      
      [root@f10-1 ~]# cat /t/trace
             kjournald-305   [000]  3318.826779:   8,1    A WBS 6375 + 8 <- (8,1) 6312
             kjournald-305   [000]  3318.826782:
       <= submit_bio
       <= submit_bh
       <= sync_dirty_buffer
       <= journal_commit_transaction
       <= kjournald
       <= kthread
       <= child_rip
             kjournald-305   [000]  3318.826836:   8,1    Q   R 6375 + 8 [kjournald]
             kjournald-305   [000]  3318.826837:
       <= generic_make_request
       <= submit_bio
       <= submit_bh
       <= sync_dirty_buffer
       <= journal_commit_transaction
       <= kjournald
       <= kthread
      
      Please read the ftrace documentation to use aditional, standardized
      tracing filters such as /d/tracing/trace_cpumask, etc.
      
      See also /d/tracing/trace_mark to add comments in the trace stream,
      that is equivalent to the /d/block/sdaN/msg interface.
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      c71a8961
    • Arnaldo Carvalho de Melo's avatar
      ftrace: add ftrace_vprintk · 9011262a
      Arnaldo Carvalho de Melo authored
      Impact: new helper function
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      9011262a
    • Ingo Molnar's avatar
  2. 23 Jan, 2009 5 commits
    • Frederic Weisbecker's avatar
      tracing/function-graph-tracer: various fixes and features · 9005f3eb
      Frederic Weisbecker authored
      This patch brings various bugfixes:
      
      - Drop the first irrelevant task switch on the very beginning of a trace.
      
      - Drop the OVERHEAD word from the headers, the DURATION word is sufficient
        and will not overlap other columns.
      
      - Make the headers fit well their respective columns whatever the
        selected options.
      
      Ie, default options:
      
       # tracer: function_graph
       #
       # CPU  DURATION                  FUNCTION CALLS
       # |     |   |                     |   |   |   |
      
        1)   0.646 us    |                    }
        1)               |                    mem_cgroup_del_lru_list() {
        1)   0.624 us    |                      lookup_page_cgroup();
        1)   1.970 us    |                    }
      
       echo funcgraph-proc > trace_options
      
       # tracer: function_graph
       #
       # CPU  TASK/PID        DURATION                  FUNCTION CALLS
       # |    |    |           |   |                     |   |   |   |
      
        0)   bash-2937    |   0.895 us    |                }
        0)   bash-2937    |   0.888 us    |                __rcu_read_unlock();
        0)   bash-2937    |   0.864 us    |                conv_uni_to_pc();
        0)   bash-2937    |   1.015 us    |                __rcu_read_lock();
      
       echo nofuncgraph-cpu > trace_options
       echo nofuncgraph-proc > trace_options
      
       # tracer: function_graph
       #
       #   DURATION                  FUNCTION CALLS
       #    |   |                     |   |   |   |
      
         3.752 us    |                  native_pud_val();
         0.616 us    |                  native_pud_val();
         0.624 us    |                  native_pmd_val();
      
      About features, one can now disable the duration (this will hide the
      overhead too for convenient reasons and because on  doesn't need
      overhead if it hasn't the duration):
      
       echo nofuncgraph-duration > trace_options
      
       # tracer: function_graph
       #
       #                FUNCTION CALLS
       #                |   |   |   |
      
                 cap_vm_enough_memory() {
                   __vm_enough_memory() {
                     vm_acct_memory();
                   }
                 }
               }
      
      And at last, an option to print the absolute time:
      
       //Restart from default options
       echo funcgraph-abstime > trace_options
      
       # tracer: function_graph
       #
       #      TIME       CPU  DURATION                  FUNCTION CALLS
       #       |         |     |   |                     |   |   |   |
      
         261.339774 |   1) + 42.823 us   |    }
         261.339775 |   1)   1.045 us    |    _spin_lock_irq();
         261.339777 |   1)   0.940 us    |    _spin_lock_irqsave();
         261.339778 |   1)   0.752 us    |    _spin_unlock_irqrestore();
         261.339780 |   1)   0.857 us    |    _spin_unlock_irq();
         261.339782 |   1)               |    flush_to_ldisc() {
         261.339783 |   1)               |      tty_ldisc_ref() {
         261.339783 |   1)               |        tty_ldisc_try() {
         261.339784 |   1)   1.075 us    |          _spin_lock_irqsave();
         261.339786 |   1)   0.842 us    |          _spin_unlock_irqrestore();
         261.339788 |   1)   4.211 us    |        }
         261.339788 |   1)   5.662 us    |      }
      
      The format is seconds.usecs.
      
      I guess no one needs the nanosec precision here, the main goal is to have
      an overview about the general timings of events, and to see the place when
      the trace switches from one cpu to another.
      
      ie:
      
         274.874760 |   1)   0.676 us    |      _spin_unlock();
         274.874762 |   1)   0.609 us    |      native_load_sp0();
         274.874763 |   1)   0.602 us    |      native_load_tls();
         274.878739 |   0)   0.722 us    |                  }
         274.878740 |   0)   0.714 us    |                  native_pmd_val();
         274.878741 |   0)   0.730 us    |                  native_pmd_val();
      
      Here there is a 4000 usecs difference when we switch the cpu.
      
      Changes in V2:
      
      - Completely fix the first pointless task switch.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      9005f3eb
    • Steven Rostedt's avatar
      trace, lockdep: manual preempt count adding for local_bh_disable · 7e49fcce
      Steven Rostedt authored
      Impact: fix to preempt trace triggering lockdep check_flag failure
      
      In local_bh_disable, the use of add_preempt_count causes the
      preempt tracer to start recording the time preemption is off.
      But because it already modified the preempt_count to show
      softirqs disabled, and before it called the lockdep code to
      handle this, it causes a state that lockdep can not handle.
      
      The preempt tracer will reset the ring buffer on start of a trace,
      and the ring buffer reset code does a spin_lock_irqsave. This
      calls into lockdep and lockdep will fail when it detects the
      invalid state of having softirqs disabled but the internal
      current->softirqs_enabled is still set.
      
      The fix is to manually add the SOFTIRQ_OFFSET to preempt count
      and call the preempt tracer code outside the lockdep critical
      area.
      
      Thanks to Peter Zijlstra for suggesting this solution.
      Signed-off-by: default avatarSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      7e49fcce
    • Steven Rostedt's avatar
      trace: fix logic to start/stop counting · b06a8301
      Steven Rostedt authored
      The logic in the tracing_start/stop code prevents the WARN_ON
      from ever detecting if a start/stop pair was mismatched.
      Signed-off-by: default avatarSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      b06a8301
    • Steven Rostedt's avatar
      trace: remove internal irqsoff disabling for trace output · 94523e81
      Steven Rostedt authored
      Impact: cleanup of duplicate features
      
      The trace output disables the ring buffer and prevents tracing to
      occur. The code in irqsoff to do the same thing is no longer needed.
      This patch removes it.
      Signed-off-by: default avatarSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      94523e81
    • Ingo Molnar's avatar
      9b036389
  3. 22 Jan, 2009 6 commits
  4. 20 Jan, 2009 7 commits
  5. 18 Jan, 2009 2 commits
    • Steven Rostedt's avatar
      ftrace: test for running of recordmcount.pl twice on an object · b43f7093
      Steven Rostedt authored
      Impact: fix failure of dynamic function tracer selftest
      
      In a course of development, a developer does several makes on their
      kernel. Sometimes, the make might do something abnormal. In the
      case of running the recordmcount.pl script on an object twice,
      the script will duplicate all the calls to mcount in the __mcount_loc
      section.
      
      On boot up, the dynamic function tracer is careful when it modifies
      code, and performs several consistency checks. One is to not modify
      the call site if it is not what it expects it to be. If a function
      call site is listed twice, the first entry will convert the site
      to a nop, and the second will fail because it expected to see a
      call to mcount, but instead it sees a nop. Thus, the function tracer
      is disabled.
      
      Eric Sesterhenn reported seeing:
      
      [    1.055440] ftrace: converting mcount calls to 0f 1f 44 00 00
      [    1.055568] ftrace: allocating 29418 entries in 116 pages
      [    1.061000] ------------[ cut here ]------------
      [    1.061000] WARNING: at kernel/trace/ftrace.c:441
      
       [...]
      
      [    1.060000] ---[ end trace 4eaa2a86a8e2da23 ]---
      [    1.060000] ftrace failed to modify [<c0118072>] check_corruption+0x3/0x2d
      [    1.060000]  actual: 0f:1f:44:00:00
      
      This warning shows that check_corruption+0x3 already had a nop in
      its place (0x0f1f440000). After compiling another kernel the problem
      went away.
      
      Later Eric Paris notice the same type of issue. Luckily, he saved
      the vmlinux file that caused it. In the file we found a bunch of
      duplicate mcount call site records, which lead us to the script.
      
      Perhaps this problem only happens to people named Eric.
      
      This patch changes the script to test if the __mcount_loc already
      exists in the object file, and if it does, it will print out
      an error message and kill the compile.
      Reported-by: default avatarEric Sesterhenn <snakebyte@gmx.de>
      Reported-by: default avatarEric Paris <eparis@redhat.com>
      Signed-off-by: default avatarSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      b43f7093
    • Ingo Molnar's avatar
  6. 16 Jan, 2009 17 commits