    perf report: Cache srclines for callchain nodes · 21ac9d54
    Milian Wolff authored
    On one hand this ensures that the memory is properly freed when the DSO
    gets freed. On the other hand this significantly speeds up the
    processing of the callchain nodes when lots of srclines are requested.
    For example, for one of my data files:
    
    Before:
    
     Performance counter stats for 'perf report -s srcline -g srcline --stdio':
    
          52496.495043      task-clock (msec)         #    0.999 CPUs utilized
                   634      context-switches          #    0.012 K/sec
                     2      cpu-migrations            #    0.000 K/sec
               191,561      page-faults               #    0.004 M/sec
       165,074,498,235      cycles                    #    3.144 GHz
       334,170,832,408      instructions              #    2.02  insn per cycle
        90,220,029,745      branches                  # 1718.591 M/sec
           654,525,177      branch-misses             #    0.73% of all branches
    
          52.533273822 seconds time elapsed

    Processed 236605 events and lost 40 chunks!
    
    After:
    
     Performance counter stats for 'perf report -s srcline -g srcline --stdio':
    
          22606.323706      task-clock (msec)         #    1.000 CPUs utilized
                    31      context-switches          #    0.001 K/sec
                     0      cpu-migrations            #    0.000 K/sec
               185,471      page-faults               #    0.008 M/sec
        71,188,113,681      cycles                    #    3.149 GHz
       133,204,943,083      instructions              #    1.87  insn per cycle
        34,886,384,979      branches                  # 1543.214 M/sec
           278,214,495      branch-misses             #    0.80% of all branches
    
          22.609857253 seconds time elapsed
    
    Note that the difference is only this large when `--inline` is not
    passed. When it is passed, we use the inliner cache instead and thus
    do not run this code path that often.
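
    The caching idea can be sketched as follows. This is a minimal,
    self-contained illustration and not the code from this patch: the
    names `toy_dso`, `srcline_entry` and `resolve_srcline_slow` are
    hypothetical, and a plain binary search tree stands in for whatever
    tree the real cache uses. It only shows the two properties described
    above: repeated lookups for the same address skip the expensive
    resolution, and all strings are freed together with their owner.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache node: one resolved srcline string per address. */
struct srcline_entry {
	uint64_t addr;
	char *srcline;
	struct srcline_entry *left, *right;
};

/* Hypothetical stand-in for a DSO that owns its srcline cache. */
struct toy_dso {
	struct srcline_entry *srclines;
	unsigned long slow_lookups; /* calls to the expensive resolver */
};

/* Stand-in for the expensive address-to-line resolution. */
static char *resolve_srcline_slow(struct toy_dso *dso, uint64_t addr)
{
	char buf[64];
	size_t len;
	char *s;

	dso->slow_lookups++;
	snprintf(buf, sizeof(buf), "file.c:%llu",
		 (unsigned long long)(addr % 1000));
	len = strlen(buf) + 1;
	s = malloc(len);
	if (s)
		memcpy(s, buf, len);
	return s;
}

/* Return the cached srcline for addr; resolve and cache it on a miss. */
static const char *toy_dso__srcline(struct toy_dso *dso, uint64_t addr)
{
	struct srcline_entry **p = &dso->srclines;

	while (*p) {
		if (addr == (*p)->addr)
			return (*p)->srcline;
		p = addr < (*p)->addr ? &(*p)->left : &(*p)->right;
	}

	*p = calloc(1, sizeof(**p));
	if (!*p)
		return NULL;
	(*p)->addr = addr;
	(*p)->srcline = resolve_srcline_slow(dso, addr);
	return (*p)->srcline;
}

/* Free every cached node and string below e. */
static void srcline_cache__free(struct srcline_entry *e)
{
	if (!e)
		return;
	srcline_cache__free(e->left);
	srcline_cache__free(e->right);
	free(e->srcline);
	free(e);
}

/* Called when the owning DSO goes away, so no srcline is leaked. */
static void toy_dso__exit(struct toy_dso *dso)
{
	srcline_cache__free(dso->srclines);
	dso->srclines = NULL;
}

/* Usage: three lookups, two distinct addresses, two slow resolutions. */
static unsigned long demo_cached_lookups(void)
{
	struct toy_dso dso = { 0 };
	const char *first = toy_dso__srcline(&dso, 0x1234);
	const char *again = toy_dso__srcline(&dso, 0x1234);
	unsigned long slow;

	toy_dso__srcline(&dso, 0x5678);
	assert(first && first == again); /* second lookup hit the cache */
	slow = dso.slow_lookups;
	toy_dso__exit(&dso);
	return slow;
}
```

    Tying the cache lifetime to the owning object is what makes the
    cleanup trivial: callers never free the returned string themselves.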
    
    I think that this cache should actually be used in other places, too.
    When looking at the valgrind leak report for perf report, we see tons of
    srclines being leaked, most notably from calls to
    hist_entry__get_srcline. The problem is that get_srcline has many
    different formatting options (show_sym, show_addr, potentially even
    unwind_inlines when calling __get_srcline directly). As such, the
    srcline cannot easily be cached for all calls, or we'd have to add
    caches for all formatting combinations (6 so far). An alternative would
    be to remove the formatting options and handle that on a different level
    - i.e. print the sym/addr on demand wherever we actually output
    something. And the unwind_inlines could be moved into a separate
    function that does not return the srcline.
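
    A rough sketch of that alternative, under assumptions: the cache
    would hold only the unformatted srcline, and a separate helper
    would add the symbol and address at output time. The helper name
    `format_srcline` and the output layout below are invented for
    illustration and do not match what get_srcline prints today.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical output-time formatter: srcline is the cached, unformatted
 * string; show_sym and show_addr only affect what is printed, so a single
 * cache entry serves every formatting combination.
 */
static int format_srcline(char *out, size_t size, const char *srcline,
			  const char *sym, uint64_t addr,
			  bool show_sym, bool show_addr)
{
	int n = snprintf(out, size, "%s", srcline);

	if (show_sym && sym && n >= 0 && (size_t)n < size)
		n += snprintf(out + n, size - n, " %s", sym);
	if (show_addr && n >= 0 && (size_t)n < size)
		n += snprintf(out + n, size - n, " [0x%" PRIx64 "]", addr);
	return n;
}

/* Usage: the same cached string rendered with two option sets. */
static int demo_format(void)
{
	char buf[64];

	format_srcline(buf, sizeof(buf), "main.c:42", "main", 0x4005d0,
		       false, false);
	assert(strcmp(buf, "main.c:42") == 0);

	format_srcline(buf, sizeof(buf), "main.c:42", "main", 0x4005d0,
		       true, true);
	return strcmp(buf, "main.c:42 main [0x4005d0]");
}
```

    With formatting pulled out like this, only one cache keyed by
    address would be needed instead of one per option combination.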

    Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
    Reviewed-by: Andi Kleen <ak@linux.intel.com>
    Cc: David Ahern <dsahern@gmail.com>
    Cc: Jin Yao <yao.jin@linux.intel.com>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Link: http://lkml.kernel.org/r/20171019113836.5548-4-milian.wolff@kdab.com
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
dso.c 34.2 KB