    perf tools: Construct LBR call chain · 384b6055
    Kan Liang authored
    The LBR call stack contains only user-space callchains, and it is
    output in the PERF_SAMPLE_BRANCH_STACK data format. Kernel callchains
    are still delivered in the form of PERF_SAMPLE_CALLCHAIN.
    
    The perf tool has to handle both data sources to construct a
    complete callstack.
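
The merge described above can be sketched as follows. This is a minimal illustration, not the code in machine.c: `struct lbr_entry`, `CTX_USER` and `merge_lbr_callchain()` are hypothetical names, and the sketch assumes that each LBR record is a from/to pair with the most recent call first, and that the PERF_SAMPLE_CALLCHAIN array marks the start of its user part with the PERF_CONTEXT_USER value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Marker separating kernel entries from user entries in a
 * PERF_SAMPLE_CALLCHAIN array; same value as PERF_CONTEXT_USER (-512). */
#define CTX_USER ((uint64_t)-512)

/* Hypothetical stand-in for one LBR record: a call from `from` to `to`. */
struct lbr_entry {
	uint64_t from;
	uint64_t to;
};

/*
 * Build one callee-first chain: the kernel entries of the FP chain up to
 * and including the user marker, then the user part taken from the LBR.
 * Assumes lbr[0] is the most recent call. Returns the number of entries
 * written, or 0 if the inputs cannot be merged or `out` is too small.
 */
size_t merge_lbr_callchain(const uint64_t *fp, size_t fp_nr,
			   const struct lbr_entry *lbr, size_t lbr_nr,
			   uint64_t *out, size_t out_cap)
{
	size_t i, n = 0, user_pos;

	assert(out != NULL);

	/* Find the kernel/user boundary in the frame-pointer chain. */
	for (user_pos = 0; user_pos < fp_nr; user_pos++)
		if (fp[user_pos] == CTX_USER)
			break;
	if (user_pos == fp_nr || lbr_nr == 0)
		return 0;

	/* kernel part + marker, innermost callee, one caller per LBR record */
	if (user_pos + 1 + 1 + lbr_nr > out_cap)
		return 0;

	for (i = 0; i <= user_pos; i++)
		out[n++] = fp[i];
	out[n++] = lbr[0].to;
	for (i = 0; i < lbr_nr; i++)
		out[n++] = lbr[i].from;
	return n;
}
```

Under these assumptions only the first `to` address and every `from` address are needed, because each record's `from` is the call site inside the next caller up the stack.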
    
    With "perf report -D", both the LBR and the FP information are
    displayed.
    
    A new call chain recording option "lbr" is introduced into the perf
    tool for LBR call stack. The user can use --call-graph lbr to get
    the call stack information from hardware.
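
For orientation, the following sketch shows roughly what such an option has to request from the kernel, using constants from the `linux/perf_event.h` UAPI header. It is not the perf tool's own code: the helper name `lbr_callgraph_attr_init()` is hypothetical, and the event (CPU cycles) and sample period are arbitrary illustration values.

```c
#include <assert.h>
#include <string.h>
#include <linux/perf_event.h>

/*
 * Fill `attr` so that each sample carries both a regular callchain and
 * an LBR call-stack branch record. The event choice and the sample
 * period are arbitrary values for illustration.
 */
void lbr_callgraph_attr_init(struct perf_event_attr *attr)
{
	assert(attr != NULL);

	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
	attr->sample_period = 100000;

	/* Kernel frames arrive as a callchain, user frames as branch records. */
	attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID |
			    PERF_SAMPLE_CALLCHAIN | PERF_SAMPLE_BRANCH_STACK;

	/* Ask for the LBR in call-stack mode, restricted to user space. */
	attr->branch_sample_type = PERF_SAMPLE_BRANCH_USER |
				   PERF_SAMPLE_BRANCH_CALL_STACK;
}
```

The filled structure would then be passed to the `perf_event_open` system call. Whether the request succeeds depends on the CPU supporting LBR call-stack mode.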
    
    Here are some examples.
    
    When profiling bc(1) on Fedora 19:
    
      echo 'scale=2000; 4*a(1)' > cmd; perf record --call-graph lbr bc -l < cmd
    
    With LBR enabled, perf report output looks like:
    
        50.36%       bc  bc                 [.] bc_divide
                     |
                     --- bc_divide
                         execute
                         run_code
                         yyparse
                         main
                         __libc_start_main
                         _start
        33.66%       bc  bc                 [.] _one_mult
                     |
                     --- _one_mult
                         bc_divide
                         execute
                         run_code
                         yyparse
                         main
                         __libc_start_main
                         _start
         7.62%       bc  bc                 [.] _bc_do_add
                     |
                     --- _bc_do_add
                        |
                        |--99.89%-- 0x2000186a8
                         --0.11%-- [...]
         6.83%       bc  bc                 [.] _bc_do_sub
                     |
                     --- _bc_do_sub
                        |
                        |--99.94%-- bc_add
                        |          execute
                        |          run_code
                        |          yyparse
                        |          main
                        |          __libc_start_main
                        |          _start
                         --0.06%-- [...]
         0.46%       bc  libc-2.17.so       [.] __memset_sse2
                     |
                     --- __memset_sse2
                        |
                        |--54.13%-- bc_new_num
                        |          |
                        |          |--51.00%-- bc_divide
                        |          |          execute
                        |          |          run_code
                        |          |          yyparse
                        |          |          main
                        |          |          __libc_start_main
                        |          |          _start
                        |          |
                        |          |--30.46%-- _bc_do_sub
                        |          |          bc_add
                        |          |          execute
                        |          |          run_code
                        |          |          yyparse
                        |          |          main
                        |          |          __libc_start_main
                        |          |          _start
                        |          |
                        |           --18.55%-- _bc_do_add
                        |                     bc_add
                        |                     execute
                        |                     run_code
                        |                     yyparse
                        |                     main
                        |                     __libc_start_main
                        |                     _start
                        |
                         --45.87%-- bc_divide
                                   execute
                                   run_code
                                   yyparse
                                   main
                                   __libc_start_main
                                   _start
    
    With FP, the same workload is recorded with:
    
      echo 'scale=2000; 4*a(1)' > cmd; perf record --call-graph fp bc -l < cmd
    
    and perf report output looks like:
    
        50.49%       bc  bc                 [.] bc_divide
                     |
                     --- bc_divide
        33.57%       bc  bc                 [.] _one_mult
                     |
                     --- _one_mult
         7.61%       bc  bc                 [.] _bc_do_add
                     |
                     --- _bc_do_add
                         0x2000186a8
         6.88%       bc  bc                 [.] _bc_do_sub
                     |
                     --- _bc_do_sub
         0.42%       bc  libc-2.17.so       [.] __memcpy_ssse3_back
                     |
                     --- __memcpy_ssse3_back
    
    If using LBR, perf report -D output looks like:
    
    3458145275743 0x2fd750 [0xd8]: PERF_RECORD_SAMPLE(IP, 0x2): 9748/9748: 0x408ea8 period: 609644 addr: 0
    ... LBR call chain: nr:8
    .....  0: fffffffffffffe00
    .....  1: 0000000000408e50
    .....  2: 000000000040a458
    .....  3: 000000000040562e
    .....  4: 0000000000408590
    .....  5: 00000000004022c0
    .....  6: 00000000004015dd
    .....  7: 0000003d1cc21b43
    ... FP chain: nr:2
    .....  0: fffffffffffffe00
    .....  1: 0000000000408ea8
     ... thread: bc:9748
     ...... dso: /usr/bin/bc
    
    The LBR call stack has the following known limitations:
    
     - Zero length calls are not filtered out by the hardware
    
     - Exception handling such as setjmp/longjmp leaves calls and returns
       unmatched
    
     - Pushing a different return address onto the stack leaves calls and
       returns unmatched
    
     - If the call stack is deeper than the LBR, only the last entries are
       captured
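
The depth limitation can be illustrated with a toy model. This is only a sketch of the idea, not a description of how the hardware is implemented: `struct toy_lbr`, its functions and the depth of 4 are all made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_LBR_DEPTH 4	/* deliberately small, for illustration only */

/* Toy call-stack buffer: site[0] is the oldest call still recorded. */
struct toy_lbr {
	uint64_t site[TOY_LBR_DEPTH];
	size_t nr;
};

/* Record a call; once the buffer is full the oldest entry is discarded. */
void toy_lbr_call(struct toy_lbr *lbr, uint64_t call_site)
{
	assert(lbr != NULL);
	if (lbr->nr == TOY_LBR_DEPTH) {
		memmove(&lbr->site[0], &lbr->site[1],
			(TOY_LBR_DEPTH - 1) * sizeof(lbr->site[0]));
		lbr->nr--;
	}
	lbr->site[lbr->nr++] = call_site;
}

/* A matching return pops the most recent call, if any is left. */
void toy_lbr_return(struct toy_lbr *lbr)
{
	assert(lbr != NULL);
	if (lbr->nr > 0)
		lbr->nr--;
}
```

After six nested calls the toy buffer holds only the four innermost call sites, so the outermost frames are missing from the reconstructed chain. In the same model, a longjmp that skips several returns would leave stale entries behind, which is the calls/returns mismatch listed above.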
    
    Tested-by: Jiri Olsa <jolsa@kernel.org>
    Signed-off-by: Kan Liang <kan.liang@intel.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Adrian Hunter <adrian.hunter@intel.com>
    Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
    Cc: Borislav Petkov <bp@suse.de>
    Cc: David Ahern <dsahern@gmail.com>
    Cc: Don Zickus <dzickus@redhat.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Simon Que <sque@chromium.org>
    Cc: Stephane Eranian <eranian@google.com>
    Link: http://lkml.kernel.org/r/1420482185-29830-3-git-send-email-kan.liang@intel.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>