1. 17 Feb, 2009 1 commit
    • ftrace: trace different functions with a different tracer · 59df055f
      Steven Rostedt authored
      Impact: new feature
      
      Currently, the function tracer only lets you hook a tracer to all
      functions being traced. The dynamic function tracer allows you to
      pick and choose which of those functions will be traced, but every
      traced function will call every tracer that registered with the
      function tracer.
      
      This patch adds a new feature that allows a tracer to hook to specific
      functions, even when all functions are being traced. It allows for
      different functions to call different tracer hooks.
      
      This is accomplished by a special function that hooks into the
      function tracer and sets up a hash table recording which tracer hook
      to call for which function. This is the most general and easiest
      method. Later, an arch may choose to supply its own method of
      changing a function's mcount call to call a different tracer, but
      that will be an exercise for the future.
      
      To register a function:
      
       struct ftrace_hook_ops {
      	void			(*func)(unsigned long ip,
      					unsigned long parent_ip,
      					void **data);
      	int			(*callback)(unsigned long ip, void **data);
      	void			(*free)(void **data);
       };
      
       int register_ftrace_function_hook(char *glob, struct ftrace_hook_ops *ops,
      				  void *data);
      
      glob is a simple glob used to search for the functions to hook.
      ops is a pointer to the operations (listed below).
      data is the default data passed to the hook functions when they are traced.
      
      ops:
       func is the hook function to call when the functions are traced.
       callback is called when setting up the hash, in case the tracer
         needs to do something special for each traced function and wants
         to give each function its own data. The address of the entry's
         data is passed to this callback, so that the callback may update
         the entry to whatever it likes.
       free is a callback for when the entry is freed. In case the tracer
         allocated any data, it is given the chance to free it.
      
      To unregister, there are three functions:
      
        void
        unregister_ftrace_function_hook(char *glob, struct ftrace_hook_ops *ops,
      				void *data)
      
      This will unregister all hooks that match glob, point to ops, and
      have data matching data. (Note: if glob is NULL, blank, or '*',
      all functions will be tested.)
      
        void
        unregister_ftrace_function_hook_func(char *glob,
      				 struct ftrace_hook_ops *ops)
      
      This will unregister all functions matching glob that have an entry
      pointing to ops.
      
        void unregister_ftrace_function_hook_all(char *glob)
      
      This simply unregisters all funcs.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
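To make the per-function dispatch concrete, here is a minimal userspace model, not kernel code: a hash table keyed by function address (ip), where each entry records which ftrace_hook_ops to call and carries its own data slot. The ops structure mirrors the one in the commit message; hook_entry, add_hook(), hook_dispatch(), the bucket hash and count_hits() are hypothetical illustrations, and treating a negative callback return as "do not hook this function" is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Same shape as the ops structure in the commit message. */
struct ftrace_hook_ops {
    void (*func)(unsigned long ip, unsigned long parent_ip, void **data);
    int  (*callback)(unsigned long ip, void **data);
    void (*free)(void **data);
};

/* Hypothetical hash entry: one per hooked function address. */
struct hook_entry {
    unsigned long ip;
    struct ftrace_hook_ops *ops;
    void *data;                    /* per-function data, seeded with the default */
    struct hook_entry *next;
};

#define HOOK_HASH_SIZE 16
static struct hook_entry *hook_hash[HOOK_HASH_SIZE];

static size_t hook_bucket(unsigned long ip)
{
    return (ip >> 4) % HOOK_HASH_SIZE;
}

/* Hook one address; the optional callback may rewrite the entry's data. */
static int add_hook(unsigned long ip, struct ftrace_hook_ops *ops, void *data)
{
    struct hook_entry *e = malloc(sizeof(*e));

    if (!e)
        return -1;
    e->ip = ip;
    e->ops = ops;
    e->data = data;
    if (ops->callback && ops->callback(ip, &e->data) < 0) {
        free(e);                   /* assumption: negative means "skip" */
        return -1;
    }
    e->next = hook_hash[hook_bucket(ip)];
    hook_hash[hook_bucket(ip)] = e;
    return 0;
}

/* What the single tracer attached to the function tracer does per call. */
static void hook_dispatch(unsigned long ip, unsigned long parent_ip)
{
    for (struct hook_entry *e = hook_hash[hook_bucket(ip)]; e; e = e->next)
        if (e->ip == ip)
            e->ops->func(ip, parent_ip, &e->data);
}

/* Example hook: each entry's data points at its own hit counter. */
static void count_hits(unsigned long ip, unsigned long parent_ip, void **data)
{
    (void)ip;
    (void)parent_ip;
    ++*(int *)*data;
}

static struct ftrace_hook_ops count_ops = { count_hits, NULL, NULL };
```

Only addresses present in the table reach a hook; every other traced function falls through the lookup, which is what lets different functions call different tracer hooks.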
  2. 16 Feb, 2009 9 commits
    • ftrace: consolidate mutexes · e6ea44e9
      Steven Rostedt authored
      Impact: clean up
      
      Now that ftrace_lock is a mutex, there is no reason to have three
      different mutexes protecting similar data. None of the mutex paths
      are hot paths, so having one mutex cover more data is not a
      problem.
      
      This patch removes the ftrace_sysctl_lock and ftrace_start_lock
      and uses the ftrace_lock to protect the locations that were protected
      by these locks. By doing so, this change also removes some of
      the lock nesting that was taking place.
      
      There are still more mutexes in ftrace.c that can probably be
      consolidated, but they can be dealt with later. We need to be careful
      about how the locks are nested, because consolidating them can cause
      a recursive deadlock.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
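The recursive-deadlock hazard can be shown with a userspace sketch using POSIX threads rather than the kernel mutex API. big_lock stands in for the consolidated ftrace_lock, and the helper names are hypothetical. An error-checking mutex turns the self-deadlock into an EDEADLK return instead of a hang; the usual fix is a helper variant that assumes the caller already holds the lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* One lock standing in for the consolidated ftrace_lock (userspace model). */
static pthread_mutex_t big_lock;
static int shared_a, shared_b;   /* data that used to have separate locks */

static void init_big_lock(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    /* ERRORCHECK: relocking by the owner returns EDEADLK instead of hanging. */
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&big_lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* Caller must already hold big_lock. */
static void update_b_locked(int v)
{
    shared_b = v;
}

/* Public entry point: takes big_lock itself. */
static int update_b(int v)
{
    int err = pthread_mutex_lock(&big_lock);

    if (err)
        return err;
    update_b_locked(v);
    pthread_mutex_unlock(&big_lock);
    return 0;
}

/* Wrong after consolidation: calls the locking entry point with big_lock held. */
static int update_both_nested(int a, int b)
{
    int err;

    pthread_mutex_lock(&big_lock);
    shared_a = a;
    err = update_b(b);   /* EDEADLK here; a PTHREAD_MUTEX_NORMAL lock would hang */
    pthread_mutex_unlock(&big_lock);
    return err;
}

/* Right: reuse the lock that is already held. */
static int update_both(int a, int b)
{
    pthread_mutex_lock(&big_lock);
    shared_a = a;
    update_b_locked(b);
    pthread_mutex_unlock(&big_lock);
    return 0;
}
```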
    • ftrace: convert ftrace_lock from a spinlock to mutex · 52baf119
      Steven Rostedt authored
      Impact: clean up
      
      The older versions of ftrace required searching the ftrace list
      in atomic context. Now all the calls are made in non-atomic
      context, so there is no reason to keep ftrace_lock as a spinlock.
      
      This patch converts it to a mutex.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • ftrace: add command interface for function selection · f6180773
      Steven Rostedt authored
      Allow other tracers to add their own commands for function
      selection. This interface gives a tracer the ability to name a
      command for function selection. Right now it is pretty limited
      in what it offers, but it is a building block for more features.
      
      The :mod: command is converted to this interface and also serves
      as a template for other implementations.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
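A rough userspace sketch of such a command interface: tracers register commands by name, and a filter line that names a command is dispatched to its handler. struct func_command, register_command() and run_command() are hypothetical names, not the ftrace API, and mod_command() merely records its parameter instead of selecting module functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical command descriptor: a name plus a handler for glob and param. */
struct func_command {
    const char *name;
    int (*func)(const char *glob, const char *param);
    struct func_command *next;
};

static struct func_command *commands;   /* registered commands, newest first */

static void register_command(struct func_command *cmd)
{
    cmd->next = commands;
    commands = cmd;
}

/* Look the command up by name and let it handle the glob and parameter. */
static int run_command(const char *name, const char *glob, const char *param)
{
    for (struct func_command *c = commands; c; c = c->next)
        if (strcmp(c->name, name) == 0)
            return c->func(glob, param);
    return -1;                           /* unknown command */
}

/* Stand-in for the converted :mod: command: only remembers the module name. */
static char last_module[32];

static int mod_command(const char *glob, const char *param)
{
    (void)glob;
    strncpy(last_module, param, sizeof(last_module) - 1);
    return 0;
}

static struct func_command mod_cmd = { "mod", mod_command, NULL };
```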
    • ftrace: enable filtering only when a function is filtered on · e68746a2
      Steven Rostedt authored
      Impact: fix to prevent empty set_ftrace_filter and no ftrace output
      
      The function filter is used to only trace a given set of functions.
      The filter is enabled when a function name is echoed into the
      set_ftrace_filter file. But if the name has a typo and the function
      is not found, the filter is enabled, but no function is listed.
      
      This creates a confusing situation where set_ftrace_filter is empty,
      yet no functions ever get enabled for tracing.
      
      For example:
      
       # cat /debug/tracing/set_ftrace_filter
      
        #### all functions enabled ####
      
       # echo bad_name > set_ftrace_filter
       # cat /debug/tracing/set_ftrace_filter
      
       # echo function > current_tracer
       # cat trace
      
        # tracer: nop
        #
        #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
        #              | |       |          |         |
      
      This patch changes that to only enable filtering if a function
      is set to be filtered on. Now, the filter is not enabled if
      a bad name is echoed into set_ftrace_filter.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
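The fix amounts to turning the filter on only after at least one record has matched. A userspace sketch under stated assumptions: a hypothetical three-entry record table, a boolean for "filtering is active", and POSIX fnmatch() in place of ftrace's own glob matching.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record table standing in for the dynamic ftrace records. */
struct rec { const char *name; bool filtered; };

static struct rec recs[] = {
    { "do_fork", false }, { "sched_clock", false }, { "schedule", false },
};
static const size_t nrecs = sizeof(recs) / sizeof(recs[0]);
static bool filtering_enabled;       /* models "the filter is active" */

/* Mark every record matching glob; enable filtering only if something matched. */
static size_t set_filter(const char *glob)
{
    size_t matched = 0;

    for (size_t i = 0; i < nrecs; i++) {
        if (fnmatch(glob, recs[i].name, 0) == 0) {
            recs[i].filtered = true;
            matched++;
        }
    }
    if (matched)
        filtering_enabled = true;    /* a typo such as "bad_name" changes nothing */
    return matched;
}

/* With filtering off every function is traced; otherwise only filtered ones. */
static bool is_traced(size_t i)
{
    return !filtering_enabled || recs[i].filtered;
}
```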
    • ftrace: add module command function filter selection · 64e7c440
      Steven Rostedt authored
      This patch adds a "command" syntax to the function filtering files:
      
        /debugfs/tracing/set_ftrace_filter
        /debugfs/tracing/set_ftrace_notrace
      
      Of the format:  <function>:<command>:<parameter>
      
      The command is optional, and whether parameters are needed depends
      on the command.
      
       echo do_fork > set_ftrace_filter
      
      Will only trace 'do_fork'.
      
       echo 'sched_*' > set_ftrace_filter
      
      Will only trace functions starting with the letters 'sched_'.
      
       echo '*:mod:ext3' > set_ftrace_filter
      
      Will trace only the ext3 module functions.
      
       echo '*write*:mod:ext3' > set_ftrace_notrace
      
      Will prevent the ext3 functions with the letters 'write' in
      the name from being traced.
      
       echo '!*_allocate:mod:ext3' > set_ftrace_filter
      
      Will remove the functions in ext3 that end with the letters
      '_allocate' from the ftrace filter.
      
      Although this patch implements the 'command' format, only the
      'mod' command is supported. More commands to follow.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
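Splitting a line in the <function>:<command>:<parameter> format is straightforward. A minimal sketch, assuming the hypothetical struct filter_cmd below and ignoring whitespace trimming: it handles the leading '!' and the optional command and parameter fields, and leaves glob matching to other code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parse result for one line written to set_ftrace_filter. */
struct filter_cmd {
    bool negate;      /* leading '!' removes matches instead of adding them */
    char *glob;       /* function glob, e.g. "*_allocate" */
    char *command;    /* optional, e.g. "mod"; NULL if absent */
    char *param;      /* optional, e.g. "ext3"; NULL if absent */
};

/* Split buf in place at the first two ':' separators. */
static void parse_filter(char *buf, struct filter_cmd *out)
{
    out->negate = (buf[0] == '!');
    if (out->negate)
        buf++;
    out->glob = buf;
    out->command = NULL;
    out->param = NULL;

    char *sep = strchr(buf, ':');
    if (!sep)
        return;                       /* plain glob, no command */
    *sep = '\0';
    out->command = sep + 1;

    sep = strchr(out->command, ':');
    if (sep) {
        *sep = '\0';
        out->param = sep + 1;
    }
}
```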
    • ftrace: break up ftrace_match_records into smaller components · 9f4801e3
      Steven Rostedt authored
      Impact: clean up
      
      ftrace_match_records does a lot of things that other features
      can use. This patch breaks up ftrace_match_records and pulls
      out ftrace_setup_glob and ftrace_match_record.
      
      ftrace_setup_glob prepares a simple glob expression for use with
      ftrace_match_record. ftrace_match_record compares a single record
      against that glob and its match type.
      
      Breaking this up will allow for more features to run on individual
      records.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
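A sketch of the split, assuming globs with at most one leading and/or one trailing '*'. The names setup_glob(), match_record() and the MATCH_* values are illustrative, not quoted from ftrace.c. The glob is classified once, and each record is then compared cheaply against the literal part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical glob classes for "foo", "foo*", "*foo*" and "*foo". */
enum match_type { MATCH_FULL, MATCH_FRONT_ONLY, MATCH_MIDDLE_ONLY, MATCH_END_ONLY };

/* Strip the '*'s in place, set *search to the literal part, return the class. */
static enum match_type setup_glob(char *buf, char **search)
{
    size_t len = strlen(buf);
    bool lead = len > 0 && buf[0] == '*';
    bool trail = len > (lead ? 1u : 0u) && buf[len - 1] == '*';

    if (trail)
        buf[len - 1] = '\0';
    *search = lead ? buf + 1 : buf;
    if (lead && trail)
        return MATCH_MIDDLE_ONLY;
    if (lead)
        return MATCH_END_ONLY;
    if (trail)
        return MATCH_FRONT_ONLY;
    return MATCH_FULL;
}

/* Compare a single record name against a prepared glob. */
static bool match_record(const char *name, const char *search, enum match_type type)
{
    size_t n = strlen(name), s = strlen(search);

    switch (type) {
    case MATCH_FULL:        return strcmp(name, search) == 0;
    case MATCH_FRONT_ONLY:  return strncmp(name, search, s) == 0;
    case MATCH_MIDDLE_ONLY: return strstr(name, search) != NULL;
    case MATCH_END_ONLY:    return n >= s && strcmp(name + n - s, search) == 0;
    }
    return false;
}
```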
    • ftrace: rename ftrace_match to ftrace_match_records · 7f24b31b
      Steven Rostedt authored
      Impact: clean up
      
      ftrace_match is too generic a name. What it really does is search
      all records, match them against the given string, and either set
      or unset the functions to be traced, depending on whether the
      parameter 'enable' is set.
      
      This allows us to make another function called ftrace_match that
      can be used to test a single record.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • ftrace: add do_for_each_ftrace_rec and while_for_each_ftrace_rec · 265c831c
      Steven Rostedt authored
      Impact: clean up
      
      Iterating over all the functions that dynamic ftrace knows about
      requires two for loops: one to iterate over the pages and another
      to iterate over the records within each page.
      
      There are several duplications of these loops in ftrace.c. This
      patch creates the macros do_for_each_ftrace_rec and
      while_for_each_ftrace_rec to handle this logic, and removes the
      duplicate code.
      
      While making this change, I also discovered and fixed a small bug:
      one of the iterations should have exited the loop after it found the
      record it was searching for. It used a break when it should have
      used a goto, since there were two loops it needed to break out
      of. No real harm was done by this bug, since it would only continue
      to search the other records, and the code was in a slow path anyway.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
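A userspace sketch of the macro pair and of the goto fix. The page and record structures, the four-record page size and find_rec() are hypothetical stand-ins for the ftrace data structures. The opening macro leaves two loops open and the closing macro closes both, so a plain break inside the body would only leave the inner loop; a goto leaves both.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for dynamic ftrace records grouped in pages. */
struct fn_rec { unsigned long ip; };
struct rec_page {
    struct rec_page *next;
    int index;                    /* number of records used in this page */
    struct fn_rec records[4];
};

/* Open two nested loops: over pages, then over records in each page. */
#define do_for_each_rec(pg, rec, start)                    \
    for ((pg) = (start); (pg); (pg) = (pg)->next) {        \
        int rec_i_;                                        \
        for (rec_i_ = 0; rec_i_ < (pg)->index; rec_i_++) { \
            (rec) = &(pg)->records[rec_i_];

/* Close both loops opened above. */
#define while_for_each_rec()                               \
        }                                                  \
    }

/* Find the record for ip, leaving both loops as soon as it is found. */
static struct fn_rec *find_rec(struct rec_page *start, unsigned long ip)
{
    struct rec_page *pg;
    struct fn_rec *rec;

    do_for_each_rec(pg, rec, start) {
        if (rec->ip == ip)
            goto found;           /* "break" would only exit the inner loop */
    } while_for_each_rec();
    return NULL;

found:
    return rec;
}
```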
    • ftrace: state that all functions are enabled in set_ftrace_filter · 0c75a3ed
      Steven Rostedt authored
      Impact: clean up, make set_ftrace_filter less confusing
      
      The set_ftrace_filter file shows only the functions that will be
      traced. But when it is empty, all functions are traced, which can
      be a bit confusing.
      
      This patch makes set_ftrace_filter show:
      
        #### all functions enabled ####
      
      when all functions will be traced, rather than only a select few.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
  3. 13 Feb, 2009 3 commits
  4. 12 Feb, 2009 7 commits
    • sched: do not account for NMIs · 2a7b8df0
      Steven Rostedt authored
      Impact: avoid corruption in system time accounting
      
      Martin Schwidefsky told me that there was an issue with NMIs and
      system accounting. The problem is that the accounting code is
      not reentrant, and if an NMI goes off after an interrupt, it can
      corrupt the accounting.
      
      For now, the best we can do is to treat NMIs like SMIs and not
      account for them.
      
      This patch changes nmi_enter to not call __irq_enter and to do
      the preempt-count and tracing calls directly.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • ring-buffer: rename label out_unlock to out_reset · 45141d46
      Steven Rostedt authored
      Impact: clean up
      
      While reviewing the ring buffer code, I thought I saw a bug with
      
      	if (!__raw_spin_trylock(&cpu_buffer->lock))
      		goto out_unlock;
      
      But I forgot that we use a variable "lock_taken" that is set if
      the spinlock is taken, and only unlock it if that variable is set.
      
      To avoid further confusion from other reviewers, this patch
      renames the label out_unlock to out_reset, which is the more
      appropriate name.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
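The pattern behind the rename, modeled in userspace with POSIX mutexes. PAGE_CAP, spare_pages, tail and reserve() are hypothetical; the point is that out_reset is reached both with and without the lock held, and lock_taken decides whether to unlock, so "reset" describes what the label does better than "unlock".

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define PAGE_CAP 8                      /* bytes per page in this model */

static pthread_mutex_t buf_lock = PTHREAD_MUTEX_INITIALIZER;
static int tail;                        /* write position in the current page */
static int spare_pages = 1;             /* pages left to move on to */

/* Reserve len bytes; returns the offset in the page, or -1 on failure. */
static int reserve(bool in_nmi, int len)
{
    bool lock_taken = false;
    int start = tail;

    tail += len;                        /* optimistic, lockless reservation */
    if (tail <= PAGE_CAP)
        return start;

    /* The event crosses the page end: moving to a new page needs the lock. */
    if (in_nmi) {
        if (pthread_mutex_trylock(&buf_lock) != 0)
            goto out_reset;             /* reached WITHOUT the lock */
    } else {
        pthread_mutex_lock(&buf_lock);
    }
    lock_taken = true;

    if (spare_pages == 0)
        goto out_reset;                 /* reached WITH the lock */
    spare_pages--;
    tail = len;                         /* event starts the next page */
    pthread_mutex_unlock(&buf_lock);
    return 0;

out_reset:
    tail = start;                       /* undo the reservation */
    if (lock_taken)
        pthread_mutex_unlock(&buf_lock);
    return -1;
}
```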
    • Linus Torvalds
    • preempt-count: force hardirq-count to max of 10 · 5a5fb7db
      Steven Rostedt authored
      When adding a bit to preempt_count to be set in NMI context, we
      found that some archs did not have enough bits to spare. This is
      because hardirq_count is a mask sized to hold NR_IRQS.
      
      Some archs allow for over 16000 IRQs, and that would require a mask
      of 14 bits. The softirq mask is 8 bits and the preempt disable mask
      is also 8 bits. The PREEMPT_ACTIVE bit is bit 30, and bit 31 would
      make the preempt_count (which is type int) a negative number.
      A negative preempt_count is a sign of failure.
      
      Add them up (14+8+8+1+1) and you get 32 bits. No room for the NMI bit.
      
      But hardirq_count is meant to track the number of nested IRQs, not
      the total number of IRQs. This originally took the paranoid approach
      of setting the max nesting to NR_IRQS. But when archs have over
      1000 IRQs, it is not practical to think they will ever all nest on
      a single CPU, not to mention that such nesting would most definitely
      cause a stack overflow.
      
      This patch sets a max of 10 bits to be used for IRQ nesting.
      I did a 'git grep HARDIRQ' to examine all users of HARDIRQ_BITS and
      HARDIRQ_MASK, and found that making it a max of 10 would not hurt
      anyone. I did find that m68k expects it to be 8 bits, so archs
      are allowed to set the number to less than 10.
      
      I removed the setting of HARDIRQ_BITS from the archs that set it
      to more than 10. This includes ALPHA, ia64 and avr32.
      
      This will always allow room for the NMI bit, and if we need to allow
      for NMI nesting, we have 4 bits to play with.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
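The bit arithmetic can be checked with a few constants. This sketch uses only the widths stated in the commit message (8-bit preempt and softirq fields, a 10-bit hardirq field, PREEMPT_ACTIVE at bit 30); the macro names and the assumption that the fields are packed upward from bit 0 in the order preempt, softirq, hardirq are for illustration.

```c
#include <assert.h>

/* Field widths from the commit message; packing order is assumed. */
#define PREEMPT_BITS        8
#define SOFTIRQ_BITS        8
#define HARDIRQ_BITS        10            /* the new maximum */

#define PREEMPT_SHIFT       0
#define SOFTIRQ_SHIFT       (PREEMPT_SHIFT + PREEMPT_BITS)     /* 8 */
#define HARDIRQ_SHIFT       (SOFTIRQ_SHIFT + SOFTIRQ_BITS)     /* 16 */
#define PREEMPT_ACTIVE_BIT  30

#define MASK(bits, shift)   (((1UL << (bits)) - 1) << (shift))
#define HARDIRQ_MASK        MASK(HARDIRQ_BITS, HARDIRQ_SHIFT)

/* Bits between the hardirq field and PREEMPT_ACTIVE are free for NMI use. */
#define FIRST_FREE_BIT      (HARDIRQ_SHIFT + HARDIRQ_BITS)      /* 26 */
#define FREE_BITS           (PREEMPT_ACTIVE_BIT - FIRST_FREE_BIT)

/* Smallest field width that can count up to n (the old NR_IRQS sizing). */
static int bits_needed(unsigned long n)
{
    int b = 0;

    while ((1UL << b) < n)
        b++;
    return b;
}
```

With 16000 IRQs the old sizing needs 14 bits, which fills all 32 bits; capping hardirq at 10 bits leaves bits 26 through 29 free, matching the "4 bits to play with" above.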
    • Fix page writeback thinko, causing Berkeley DB slowdown · 3a4c6800
      Nick Piggin authored
      A bug was introduced into write_cache_pages cyclic writeout by commit
      31a12666 ("mm: write_cache_pages cyclic fix"). The intention (and the
      comments) is that we should cycle back and look for more dirty pages
      at the beginning of the file if there is no more work to be done.
      
      But the !done condition was dropped from the test. This means that any
      time the page writeout loop breaks (e.g. due to nr_to_write == 0), we
      will set index to 0, then goto again. This will set done_index to
      index, then find done is set, so will proceed to the end of the
      function. When updating mapping->writeback_index for cyclic writeout,
      we now use done_index == 0, so we're always cycling back to 0.
      
      This seemed to cause random mmap writes (slapadd and iozone) to
      start writing more pages from the LRU, slowing down writeout, and
      led to bugzilla entry
      
      	http://bugzilla.kernel.org/show_bug.cgi?id=12604
      
      about Berkeley DB slowing down dramatically.
      
      With this patch, iozone random write performance is increased nearly
      5x on my system (iozone -B -r 4k -s 64k -s 512m -s 1200m on ext2).
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Reported-and-tested-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
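A simplified userspace model of the cyclic logic described above. The array of dirty flags, the page count and model_writeback() are hypothetical stand-ins for the page cache; the return value plays the role of the new mapping->writeback_index. Passing fixed = false reproduces the dropped !done test: once nr_to_write runs out, the function jumps back to "again", records done_index = 0, and returns 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns the next writeback_index; npages must be > 0. */
static size_t model_writeback(bool *dirty, size_t npages, size_t writeback_index,
                              long nr_to_write, bool fixed)
{
    size_t index = writeback_index;
    size_t end = npages - 1;           /* stands in for "to end of file" */
    size_t done_index;
    bool cycled = (index == 0);
    bool done = false;

again:
    done_index = index;
    while (!done && index <= end) {
        if (dirty[index]) {
            dirty[index] = false;      /* "write" the page */
            done_index = index + 1;
            if (--nr_to_write <= 0) {
                done = true;           /* out of budget: stop early */
                break;
            }
        }
        index++;
    }
    /* Fixed: wrap to the start only if we did not stop early (!done). */
    if (!cycled && (!done || !fixed)) {
        cycled = true;
        index = 0;
        end = writeback_index - 1;
        goto again;
    }
    return done_index;
}
```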
    • mm: Export symbol ksize() · b1aabecd
      Kirill A. Shutemov authored
      Commit 7b2cd92a ("crypto: api - Fix zeroing on free") added a modular
      user of ksize(). Export the symbol to fix crypto.ko compilation.
      
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
    • Linus Torvalds
  5. 11 Feb, 2009 20 commits