1. 22 May, 2011 3 commits
  2. 19 May, 2011 6 commits
    • core_kernel_data(): Fix architectures that do not define _sdata · c5fc4721
      Ingo Molnar authored
      Some architectures such as Alpha do not define _sdata but _data:
      
        kernel/built-in.o: In function `core_kernel_data':
        kernel/extable.c:77: undefined reference to `_sdata'
      
      So expand the scope of the data range to the text addresses too;
      this might be more correct anyway, because this way we can
      cover read-only variables as well.
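
      A minimal sketch of the widened check in kernel/extable.c, assuming the
      fix simply moves the lower bound from _sdata down to _stext:

        int core_kernel_data(unsigned long addr)
        {
                /* start at _stext so architectures that lack _sdata still link */
                if (addr >= (unsigned long)_stext &&
                    addr < (unsigned long)_edata)
                        return 1;
                return 0;
        }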
      
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/n/tip-i878c8a0e0g0ep4v7i6vxnhz@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      c5fc4721
    • Merge branch 'tip/perf/core' of... · 29510ec3
      Ingo Molnar authored
      Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core
      29510ec3
    • Merge branch 'tip/perf/core-3' of... · 398995ce
      Ingo Molnar authored
      Merge branch 'tip/perf/core-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core
      398995ce
    • perf stat: Add more cache-miss percentage printouts · c3305257
      Ingo Molnar authored
      Print out the cache-miss percentage as well if the cache refs were
      collected, for all the generic cache event types.
      
      Before:
      
         11,103,723,230 dTLB-loads                #  622.471 M/sec                    ( +-  0.30% )
             87,065,337 dTLB-load-misses          #    4.881 M/sec                    ( +-  0.90% )
      
      After:
      
         11,353,713,242 dTLB-loads                #  626.020 M/sec                    ( +-  0.35% )
            113,393,472 dTLB-load-misses          #    1.00% of all dTLB cache hits   ( +-  0.49% )
      
      Also, ASCII-color-highlight too-high percentages when the output is printed to a console.
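
      Roughly, the printout computes the miss count as a percentage of the
      corresponding cache references and colors it when it looks too high. A
      minimal sketch (threshold and helper names are illustrative, not perf's
      actual code):

        #include <stdio.h>

        /* print " #  X.XX% of all <name> hits", in red when the ratio is high */
        static void print_cache_miss_ratio(double misses, double refs, const char *name)
        {
                double ratio = refs ? misses / refs * 100.0 : 0.0;
                const char *color = ratio > 10.0 ? "\033[31m" : "";   /* red above 10% */

                fprintf(stderr, " #  %s%5.2f%%\033[0m of all %s hits", color, ratio, name);
        }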
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/n/tip-lkhwxsevdbd9a8nymx0vxc3y@git.kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@elte.hu>
      c3305257
    • perf stat: Add -d -d and -d -d -d options to show more CPU events · 2cba3ffb
      Ingo Molnar authored
      Print even more detailed statistics if requested via perf stat -d:
      
             -d:          detailed events, L1 and LLC data cache
          -d -d:     more detailed events, dTLB and iTLB events
       -d -d -d:     very detailed events, adding prefetch events
      
      Full output looks like this now:
      
       Performance counter stats for '/home/mingo/hackbench 10' (5 runs):
      
             1703.674707 task-clock                #    8.709 CPUs utilized            ( +-  4.19% )
                  49,068 context-switches          #    0.029 M/sec                    ( +- 16.66% )
                   8,303 CPU-migrations            #    0.005 M/sec                    ( +- 24.90% )
                  17,397 page-faults               #    0.010 M/sec                    ( +-  0.46% )
           2,345,389,239 cycles                    #    1.377 GHz                      ( +-  4.61% ) [55.90%]
           1,884,503,527 stalled-cycles-frontend   #   80.35% frontend cycles idle     ( +-  5.67% ) [50.39%]
             743,919,737 stalled-cycles-backend    #   31.72% backend  cycles idle     ( +-  8.75% ) [49.91%]
           1,314,416,379 instructions              #    0.56  insns per cycle
                                                   #    1.43  stalled cycles per insn  ( +-  2.53% ) [60.87%]
             272,592,567 branches                  #  160.003 M/sec                    ( +-  1.74% ) [56.56%]
               3,794,846 branch-misses             #    1.39% of all branches          ( +-  6.59% ) [58.50%]
             449,982,778 L1-dcache-loads           #  264.125 M/sec                    ( +-  2.47% ) [49.88%]
              22,404,961 L1-dcache-load-misses     #    4.98% of all L1-dcache hits    ( +-  6.08% ) [55.05%]
               6,204,750 LLC-loads                 #    3.642 M/sec                    ( +-  8.91% ) [43.75%]
               1,837,411 LLC-load-misses           #    1.078 M/sec                    ( +-  7.27% ) [12.07%]
             411,440,421 L1-icache-loads           #  241.502 M/sec                    ( +-  5.60% ) [36.52%]
              27,556,832 L1-icache-load-misses     #   16.175 M/sec                    ( +-  7.46% ) [46.72%]
             464,067,627 dTLB-loads                #  272.392 M/sec                    ( +-  4.46% ) [54.17%]
              10,765,648 dTLB-load-misses          #    6.319 M/sec                    ( +-  3.18% ) [48.68%]
           1,273,080,386 iTLB-loads                #  747.256 M/sec                    ( +-  3.38% ) [47.53%]
                 117,481 iTLB-load-misses          #    0.069 M/sec                    ( +- 14.99% ) [47.01%]
               4,590,653 L1-dcache-prefetches      #    2.695 M/sec                    ( +-  4.49% ) [46.19%]
               1,712,660 L1-dcache-prefetch-misses #    1.005 M/sec                    ( +-  3.75% ) [44.82%]
      
              0.195622057  seconds time elapsed  ( +-  6.84% )
      
      Also clean up the attribute construction code to be appending, and factor
      it out into add_default_attributes().
      
      Tweak the coverage percentage printout a bit, so that it's easier to view it
      alongside the +- stddev column.
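
      The appending setup, as a rough sketch (add_default_attributes() is the
      function name mentioned in this patch; the parameter and the event-set
      helpers below are hypothetical):

        /* each -d increments detail_level; every level appends one more set */
        static int add_default_attributes(int detail_level)
        {
                if (add_event_set(default_events))
                        return -1;
                if (detail_level >= 1 && add_event_set(l1_llc_cache_events))
                        return -1;                      /* -d       */
                if (detail_level >= 2 && add_event_set(tlb_events))
                        return -1;                      /* -d -d    */
                if (detail_level >= 3 && add_event_set(prefetch_events))
                        return -1;                      /* -d -d -d */
                return 0;
        }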
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      2cba3ffb
    • ftrace/kbuild: Add recordmcount files to force full build · d6971822
      Michal Marek authored
      When recordmcount is modified, it must be re-run on all object
      files to stay consistent with what the kernel code may expect.
      Add the recordmcount files to the main dependencies to make sure
      any change to them causes a full recompile.
      Signed-off-by: Michal Marek <mmarek@suse.cz>
      Link: http://lkml.kernel.org/r/20110517133646.GP13293@sepie.suse.cz
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      d6971822
  3. 18 May, 2011 17 commits
    • ftrace: Add self-tests for multiple function trace users · 95950c2e
      Steven Rostedt authored
      Add some basic sanity tests for multiple users of the function
      tracer at startup.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      95950c2e
    • ftrace: Modify ftrace_set_filter/notrace to take ops · 936e074b
      Steven Rostedt authored
      Since users of the function tracer can now pick and choose which
      functions they want to trace agnostically from other users of the
      function tracer, we need to pass the ops struct to the ftrace_set_filter()
      functions.
      
      The functions ftrace_set_global_filter() and ftrace_set_global_notrace()
      are added to keep the old filter interface, which is used to modify
      the generic function tracers.
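
      A minimal usage sketch (prototypes abbreviated; the exact signatures may
      differ slightly from what is shown here):

        #include <linux/ftrace.h>
        #include <linux/module.h>
        #include <linux/string.h>

        static void my_trace_func(unsigned long ip, unsigned long parent_ip)
        {
                /* called for every function this ops ends up tracing */
        }

        static struct ftrace_ops my_ops = {
                .func = my_trace_func,
        };

        static int __init my_tracer_init(void)
        {
                /* affects only my_ops; other ftrace users keep their own filters */
                ftrace_set_filter(&my_ops, "schedule", strlen("schedule"), 1);
                register_ftrace_function(&my_ops);

                /* old-style call, operating on the generic (global) tracers */
                ftrace_set_global_filter("schedule", strlen("schedule"), 1);
                return 0;
        }
        module_init(my_tracer_init);
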
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      936e074b
    • ftrace: Allow dynamically allocated function tracers · cdbe61bf
      Steven Rostedt authored
      Now that functions may be selected individually, it only makes sense
      that we should allow dynamically allocated trace structures to
      be traced. This will allow perf to allocate a ftrace_ops structure
      at runtime and use it to pick and choose which functions that
      structure will trace.
      
      Note, a dynamically allocated ftrace_ops will always be called
      indirectly instead of being called directly from the mcount in
      entry.S. This is because there's no safe way to prevent mcount
      from being preempted before calling the function, unless we
      modify every entry.S to do so (not likely). Thus, dynamically allocated
      functions will now be called by the ftrace_ops_list_func() that
      loops through the ops that are allocated if there are more than
      one op allocated at a time. This loop is protected with a
      preempt_disable.
      
      To determine whether an ftrace_ops structure is dynamically allocated or not,
      a new utility function, core_kernel_data(), was added to kernel/extable.c;
      it returns 1 if the address is between _sdata and _edata.
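
      A simplified sketch of the indirect dispatch described above (the real
      list walker also consults each ops' filter hashes; names abbreviated):

        /* every mcount call lands here when the ops list has to be walked */
        static void ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip)
        {
                struct ftrace_ops *op;

                preempt_disable_notrace();
                for (op = ftrace_ops_list; op != &ftrace_list_end; op = op->next)
                        op->func(ip, parent_ip);
                preempt_enable_notrace();
        }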
      
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      cdbe61bf
    • ftrace: Implement separate user function filtering · b848914c
      Steven Rostedt authored
      ftrace_ops that are registered to trace functions can now be
      agnostic to each other with respect to what functions they trace.
      Each ops has its own hash of the functions it wants to trace
      and a hash of the functions it does not want to trace. An empty hash
      for the functions to trace denotes that all functions should
      be traced that are not in the notrace hash.
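
      The per-ops decision this results in, as a rough sketch (helper names are
      hypothetical; the real checks live in kernel/trace/ftrace.c):

        /* should this ops trace the function at 'ip'? */
        static bool ops_traces_function(struct ftrace_ops *ops, unsigned long ip)
        {
                if (hash_lookup(ops->notrace_hash, ip))
                        return false;           /* explicitly excluded           */
                if (hash_is_empty(ops->filter_hash))
                        return true;            /* empty filter hash = trace all */
                return hash_lookup(ops->filter_hash, ip) != NULL;
        }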
      
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      b848914c
    • ftrace: Free hash with call_rcu_sched() · 07fd5515
      Steven Rostedt authored
      When a hash is modified and might still be in use, we need to free it
      via a sched-RCU grace period (call_rcu_sched()), as the hashes will soon
      be used directly in the function tracer callback.
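
      A minimal sketch of the deferred free, assuming the hash embeds a
      struct rcu_head (this mirrors the description rather than quoting the
      actual code):

        static void __free_ftrace_hash_rcu(struct rcu_head *rcu)
        {
                free_ftrace_hash(container_of(rcu, struct ftrace_hash, rcu));
        }

        static void free_ftrace_hash_rcu(struct ftrace_hash *hash)
        {
                /* readers run the callback with preemption disabled,
                 * so a sched-RCU grace period is sufficient */
                call_rcu_sched(&hash->rcu, __free_ftrace_hash_rcu);
        }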
      
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      07fd5515
    • ftrace: Have global_ops store the functions that are to be traced · 2b499381
      Steven Rostedt authored
      This is a step towards each ops structure defining its own set
      of functions to trace. As the current code for pids and such is specific
      to the global_ops, it is restructured to be used with the global ops.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      2b499381
    • ftrace: Add ops parameter to ftrace_startup/shutdown functions · bd69c30b
      Steven Rostedt authored
      In order to allow different ops to enable different functions,
      the ftrace_startup() and ftrace_shutdown() functions need the
      ops parameter passed to them.
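
      Roughly, the call shape becomes (prototypes sketched from the description;
      the command flags themselves are unchanged):

        static void ftrace_startup(struct ftrace_ops *ops, int command);
        static void ftrace_shutdown(struct ftrace_ops *ops, int command);
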
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      bd69c30b
    • ftrace: Add enabled_functions file · 647bcd03
      Steven Rostedt authored
      Add the enabled_functions file that is used to show all the
      functions that have been enabled for tracing as well as their
      ref counts. This helps in seeing whether any functions have been registered
      and which functions are being traced.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      647bcd03
    • ftrace: Use counters to enable functions to trace · ed926f9b
      Steven Rostedt authored
      Every function has its own record that stores the instruction
      pointer and flags for the function to be traced. There are only
      two flags: enabled and free. The enabled flag states that tracing
      for the function has been enabled (actively traced), and the free
      flag states that the record no longer points to a function and can
      be used by new functions (loaded modules).
      
      These flags are now moved to the MSBs of the flags field (actually just
      the top two bits). The rest of the bits (30 bits) are now used as
      a ref counter. Every time a tracer registers functions to trace,
      those functions will have their counters incremented.
      
      When tracing is enabled, to determine if a function should be traced,
      the counter is examined, and if it is non-zero it is set to trace.
      
      When a ftrace_ops is registered to trace functions, its hashes
      are examined. If the ftrace_ops filter_hash count is zero, then
      all functions are set to be traced, otherwise only the functions
      in the hash are to be traced. The exception to this is if a function
      is also in the ftrace_ops notrace_hash. Then that function's counter
      is not incremented for this ftrace_ops.
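
      A sketch of the layout described above (bit positions are illustrative,
      derived from the two-flags-plus-30-bit-counter description):

        #define FTRACE_FL_ENABLED       (1UL << 30)
        #define FTRACE_FL_FREE          (1UL << 31)
        #define FTRACE_FL_MASK          (3UL << 30)
        #define FTRACE_REF_MAX          ((1UL << 30) - 1)

        /* one more ftrace_ops wants this record traced */
        static void ftrace_ref_inc(struct dyn_ftrace *rec)
        {
                if ((rec->flags & ~FTRACE_FL_MASK) < FTRACE_REF_MAX)
                        rec->flags++;           /* low 30 bits hold the ref count */
        }
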
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      ed926f9b
    • ftrace: Separate hash allocation and assignment · 33dc9b12
      Steven Rostedt authored
      When filtering, allocate a hash to insert the function records.
      After the filtering is complete, assign it to the ftrace_ops structure.
      
      This allows the ftrace_ops structure to have a much smaller array of
      hash buckets instead of wasting a lot of memory.
      
      A read only empty_hash is created to be the minimum size that any ftrace_ops
      can point to.
      
      Creating a new hash goes through the following steps (sketched below):
      
      o Allocate a default hash.
      o Walk the function records assigning the filtered records to the hash
      o Allocate a new hash with the appropriate size buckets
      o Move the entries from the default hash to the new hash.
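
      As a rough sketch of that sequence (several helper names below are
      hypothetical; the real code lives in kernel/trace/ftrace.c):

        static struct ftrace_hash *build_filter_hash(int size_bits)
        {
                struct ftrace_hash *tmp, *hash;
                struct ftrace_page *pg;
                struct dyn_ftrace *rec;

                tmp = alloc_ftrace_hash(FTRACE_HASH_DEFAULT_BITS);     /* 1: default hash  */
                if (!tmp)
                        return NULL;

                do_for_each_ftrace_rec(pg, rec) {                      /* 2: walk records  */
                        if (record_matches_filter(rec))
                                add_hash_entry(tmp, rec->ip);
                } while_for_each_ftrace_rec();

                hash = alloc_ftrace_hash(size_bits);                   /* 3: sized buckets */
                if (hash)
                        move_hash_entries(tmp, hash);                  /* 4: move entries  */

                free_ftrace_hash(tmp);
                return hash;
        }
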
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      33dc9b12
    • ftrace: Create a global_ops to hold the filter and notrace hashes · f45948e8
      Steven Rostedt authored
      Combine the filter and notrace hashes to be accessed by a single entity,
      the global_ops. The global_ops is a ftrace_ops structure that is passed
      to different functions that can read or modify the filtering of the
      function tracer.
      
      The ftrace_ops structure was modified to hold a filter and notrace
      hashes so that later patches may allow each ftrace_ops to have its own
      set of rules for which functions may be filtered.
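
      The shape this gives the ops structure, roughly (other members omitted;
      the EMPTY_HASH initializers are illustrative):

        struct ftrace_ops {
                ftrace_func_t           func;
                struct ftrace_ops       *next;
                struct ftrace_hash      *notrace_hash;
                struct ftrace_hash      *filter_hash;
        };

        /* single entity the traditional filter/notrace files operate on */
        static struct ftrace_ops global_ops = {
                .func           = ftrace_stub,
                .notrace_hash   = EMPTY_HASH,
                .filter_hash    = EMPTY_HASH,
        };
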
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      f45948e8
    • ftrace: Use hash instead for FTRACE_FL_FILTER · 1cf41dd7
      Steven Rostedt authored
      When multiple users are allowed to have their own set of functions
      to trace, having the FTRACE_FL_FILTER flag will not be enough to
      handle the accounting of those users. Each user will need their own
      set of functions.
      
      Replace the FTRACE_FL_FILTER with a filter_hash instead. This is
      temporary until the rest of the function filtering accounting
      gets in.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      1cf41dd7
    • ftrace: Replace FTRACE_FL_NOTRACE flag with a hash of ignored functions · b448c4e3
      Steven Rostedt authored
      To prepare for the accounting system that will allow multiple users of
      the function tracer, having the FTRACE_FL_NOTRACE as a flag in the
      dyn_ftrace record does not make sense.
      
      All ftrace_ops will soon have a hash of functions they should trace
      and a hash of functions they should not trace. Making a global hash of
      functions not to trace makes this transition easier.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      b448c4e3
    • perf bench, x86: Add alternatives-asm.h wrapper · b3132072
      Ingo Molnar authored
      perf bench needs this to build the kernel's memcpy routine:
      
      In file included from bench/mem-memcpy-x86-64-asm.S:2:0:
      bench/../../../arch/x86/lib/memcpy_64.S:7:33: fatal error: asm/alternative-asm.h: No such file or directory
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/n/tip-c5d41xibgullk8h2280q4gv0@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      b3132072
    • Merge branch 'x86/mem' into perf/core · 01ed58ab
      Ingo Molnar authored
      Merge reason: memcpy_64.S changes an assumption perf bench has, so merge this
                    here so we can fix it.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      01ed58ab
    • Merge branch 'tip/perf/core-2' of... · af2d03d4
      Ingo Molnar authored
      Merge branch 'tip/perf/core-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core
      af2d03d4
    • x86, 64-bit: Fix copy_[to/from]_user() checks for the userspace address limit · 26afb7c6
      Jiri Olsa authored
      As reported in BZ #30352:
      
        https://bugzilla.kernel.org/show_bug.cgi?id=30352
      
      there's a kernel bug related to reading the last allowed page on x86_64.
      
      The _copy_to_user() and _copy_from_user() functions use the following
      check for address limit:
      
        if (buf + size >= limit)
      	fail();
      
      while it should be more permissive:
      
        if (buf + size > limit)
      	fail();
      
      That's because size represents the number of bytes being
      read/written from/to the buf address, with the buf address itself
      included in the count. So the copy function will actually never touch
      the limit address even if "buf + size == limit".
      
      The following program fails to use the last page as a buffer
      due to the wrong limit check:
      
       #include <stdio.h>
       #include <sys/mman.h>
       #include <sys/socket.h>
       #include <assert.h>
      
       #define PAGE_SIZE       (4096)
       #define LAST_PAGE       ((void*)(0x7fffffffe000))
      
       int main()
       {
              int fds[2], err;
              void * ptr = mmap(LAST_PAGE, PAGE_SIZE, PROT_READ | PROT_WRITE,
                                MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0);
              assert(ptr == LAST_PAGE);
              err = socketpair(AF_LOCAL, SOCK_STREAM, 0, fds);
              assert(err == 0);
              err = send(fds[0], ptr, PAGE_SIZE, 0);
              perror("send");
              assert(err == PAGE_SIZE);
              err = recv(fds[1], ptr, PAGE_SIZE, MSG_WAITALL);
              perror("recv");
              assert(err == PAGE_SIZE);
              return 0;
       }
      
      The other place checking the addr limit is the access_ok() function,
      which is working properly. There's just a misleading comment
      for the __range_not_ok() macro - which this patch fixes as well.
      
      The last page of the user-space address range is a guard page, and
      Brian Gerst observed that the guard page itself exists because of an
      erratum on K8 CPUs (#121: Sequential Execution Across Non-Canonical
      Boundary Causes Processor Hang).
      
      However, the test code is using the last valid page before the guard page.
      The bug is that the last byte before the guard page can't be read
      because of the off-by-one error. The guard page is left in place.
      
      This bug would normally not show up because the last page is
      part of the process stack and never accessed via syscalls.
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Acked-by: Brian Gerst <brgerst@gmail.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: <stable@kernel.org>
      Link: http://lkml.kernel.org/r/1305210630-7136-1-git-send-email-jolsa@redhat.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      26afb7c6
  4. 17 May, 2011 11 commits
  5. 16 May, 2011 3 commits