1. 19 May, 2011 2 commits
    • Ingo Molnar's avatar
      perf stat: Add more cache-miss percentage printouts · c3305257
      Ingo Molnar authored
      Print out the cache-miss percentage as well if the cache refs were
      collected, for all the generic cache event types.
      
      Before:
      
         11,103,723,230 dTLB-loads                #  622.471 M/sec                    ( +-  0.30% )
             87,065,337 dTLB-load-misses          #    4.881 M/sec                    ( +-  0.90% )
      
      After:
      
         11,353,713,242 dTLB-loads                #  626.020 M/sec                    ( +-  0.35% )
            113,393,472 dTLB-load-misses          #    1.00% of all dTLB cache hits   ( +-  0.49% )
      
      Also ASCII color highlight too high percentages, them when it's executed on the console.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/n/tip-lkhwxsevdbd9a8nymx0vxc3y@git.kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@elte.hu>
      c3305257
    • Ingo Molnar's avatar
      perf stat: Add -d -d and -d -d -d options to show more CPU events · 2cba3ffb
      Ingo Molnar authored
      Print even more detailed statistics if requested via perf stat -d:
      
             -d:          detailed events, L1 and LLC data cache
          -d -d:     more detailed events, dTLB and iTLB events
       -d -d -d:     very detailed events, adding prefetch events
      
      Full output looks like this now:
      
       Performance counter stats for '/home/mingo/hackbench 10' (5 runs):
      
             1703.674707 task-clock                #    8.709 CPUs utilized            ( +-  4.19% )
                  49,068 context-switches          #    0.029 M/sec                    ( +- 16.66% )
                   8,303 CPU-migrations            #    0.005 M/sec                    ( +- 24.90% )
                  17,397 page-faults               #    0.010 M/sec                    ( +-  0.46% )
           2,345,389,239 cycles                    #    1.377 GHz                      ( +-  4.61% ) [55.90%]
           1,884,503,527 stalled-cycles-frontend   #   80.35% frontend cycles idle     ( +-  5.67% ) [50.39%]
             743,919,737 stalled-cycles-backend    #   31.72% backend  cycles idle     ( +-  8.75% ) [49.91%]
           1,314,416,379 instructions              #    0.56  insns per cycle
                                                   #    1.43  stalled cycles per insn  ( +-  2.53% ) [60.87%]
             272,592,567 branches                  #  160.003 M/sec                    ( +-  1.74% ) [56.56%]
               3,794,846 branch-misses             #    1.39% of all branches          ( +-  6.59% ) [58.50%]
             449,982,778 L1-dcache-loads           #  264.125 M/sec                    ( +-  2.47% ) [49.88%]
              22,404,961 L1-dcache-load-misses     #    4.98% of all L1-dcache hits    ( +-  6.08% ) [55.05%]
               6,204,750 LLC-loads                 #    3.642 M/sec                    ( +-  8.91% ) [43.75%]
               1,837,411 LLC-load-misses           #    1.078 M/sec                    ( +-  7.27% ) [12.07%]
             411,440,421 L1-icache-loads           #  241.502 M/sec                    ( +-  5.60% ) [36.52%]
              27,556,832 L1-icache-load-misses     #   16.175 M/sec                    ( +-  7.46% ) [46.72%]
             464,067,627 dTLB-loads                #  272.392 M/sec                    ( +-  4.46% ) [54.17%]
              10,765,648 dTLB-load-misses          #    6.319 M/sec                    ( +-  3.18% ) [48.68%]
           1,273,080,386 iTLB-loads                #  747.256 M/sec                    ( +-  3.38% ) [47.53%]
                 117,481 iTLB-load-misses          #    0.069 M/sec                    ( +- 14.99% ) [47.01%]
               4,590,653 L1-dcache-prefetches      #    2.695 M/sec                    ( +-  4.49% ) [46.19%]
               1,712,660 L1-dcache-prefetch-misses #    1.005 M/sec                    ( +-  3.75% ) [44.82%]
      
              0.195622057  seconds time elapsed  ( +-  6.84% )
      
      Also clean up the attribute construction code to be appending, and factor
      it out into add_default_attributes().
      
      Tweak the coverage percentage printout a bit, so that it's easier to view it
      alongside the +- sttddev colum.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@elte.hu>
      2cba3ffb
  2. 18 May, 2011 4 commits
    • Ingo Molnar's avatar
      perf bench, x86: Add alternatives-asm.h wrapper · b3132072
      Ingo Molnar authored
      perf bench needs this to build the kernel's memcpy routine:
      
      In file included from bench/mem-memcpy-x86-64-asm.S:2:0:
      bench/../../../arch/x86/lib/memcpy_64.S:7:33: fatal error: asm/alternative-asm.h: No such file or directory
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/n/tip-c5d41xibgullk8h2280q4gv0@git.kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@elte.hu>
      b3132072
    • Ingo Molnar's avatar
      Merge branch 'x86/mem' into perf/core · 01ed58ab
      Ingo Molnar authored
      Merge reason: memcpy_64.S changes an assumption perf bench has, so merge this
                    here so we can fix it.
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      01ed58ab
    • Ingo Molnar's avatar
      Merge branch 'tip/perf/core-2' of... · af2d03d4
      Ingo Molnar authored
      Merge branch 'tip/perf/core-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core
      af2d03d4
    • Jiri Olsa's avatar
      x86, 64-bit: Fix copy_[to/from]_user() checks for the userspace address limit · 26afb7c6
      Jiri Olsa authored
      As reported in BZ #30352:
      
        https://bugzilla.kernel.org/show_bug.cgi?id=30352
      
      there's a kernel bug related to reading the last allowed page on x86_64.
      
      The _copy_to_user() and _copy_from_user() functions use the following
      check for address limit:
      
        if (buf + size >= limit)
      	fail();
      
      while it should be more permissive:
      
        if (buf + size > limit)
      	fail();
      
      That's because the size represents the number of bytes being
      read/write from/to buf address AND including the buf address.
      So the copy function will actually never touch the limit
      address even if "buf + size == limit".
      
      Following program fails to use the last page as buffer
      due to the wrong limit check:
      
       #include <sys/mman.h>
       #include <sys/socket.h>
       #include <assert.h>
      
       #define PAGE_SIZE       (4096)
       #define LAST_PAGE       ((void*)(0x7fffffffe000))
      
       int main()
       {
              int fds[2], err;
              void * ptr = mmap(LAST_PAGE, PAGE_SIZE, PROT_READ | PROT_WRITE,
                                MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0);
              assert(ptr == LAST_PAGE);
              err = socketpair(AF_LOCAL, SOCK_STREAM, 0, fds);
              assert(err == 0);
              err = send(fds[0], ptr, PAGE_SIZE, 0);
              perror("send");
              assert(err == PAGE_SIZE);
              err = recv(fds[1], ptr, PAGE_SIZE, MSG_WAITALL);
              perror("recv");
              assert(err == PAGE_SIZE);
              return 0;
       }
      
      The other place checking the addr limit is the access_ok() function,
      which is working properly. There's just a misleading comment
      for the __range_not_ok() macro - which this patch fixes as well.
      
      The last page of the user-space address range is a guard page and
      Brian Gerst observed that the guard page itself due to an erratum on K8 cpus
      (#121 Sequential Execution Across Non-Canonical Boundary Causes Processor
      Hang).
      
      However, the test code is using the last valid page before the guard page.
      The bug is that the last byte before the guard page can't be read
      because of the off-by-one error. The guard page is left in place.
      
      This bug would normally not show up because the last page is
      part of the process stack and never accessed via syscalls.
      Signed-off-by: default avatarJiri Olsa <jolsa@redhat.com>
      Acked-by: default avatarBrian Gerst <brgerst@gmail.com>
      Acked-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: <stable@kernel.org>
      Link: http://lkml.kernel.org/r/1305210630-7136-1-git-send-email-jolsa@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@elte.hu>
      26afb7c6
  3. 17 May, 2011 11 commits
  4. 16 May, 2011 14 commits
  5. 12 May, 2011 1 commit
  6. 10 May, 2011 4 commits
  7. 09 May, 2011 4 commits