    perf report: Cache failed lookups of inlined frames · b38775cf
    Milian Wolff authored
    When no inlined frames could be found for a given address, we did not
    store this information anywhere. That means we potentially do the costly
    inliner lookup repeatedly for cases where we know it can never succeed.
    
    This patch makes dso__parse_addr_inlines always return a valid
    inline_node. It will be empty when no inlined frames are found. This
    enables us to cache the empty list in the DSO, thereby improving
    performance when the lookup fails for many addresses.
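
The caching idea described above can be sketched in isolation. This is a minimal illustration, not the code from the patch: `struct inline_entry`, `struct inline_cache`, `slow_parse_inlines` and `get_inlines` are hypothetical names, the linked list stands in for whatever structure the DSO really uses, and the "costly lookup" is a placeholder that pretends only even addresses have inlined frames.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an inline node: one cache entry per address. */
struct inline_entry {
	uint64_t addr;
	size_t nr_frames;		/* 0 means "looked up, nothing inlined here" */
	struct inline_entry *next;
};

/* Hypothetical per-DSO cache; a plain list keeps the sketch short. */
struct inline_cache {
	struct inline_entry *head;
	unsigned long slow_lookups;	/* how often the costly path ran */
};

/* Placeholder for the expensive lookup (assumption: even addresses only). */
static size_t slow_parse_inlines(struct inline_cache *cache, uint64_t addr)
{
	cache->slow_lookups++;
	return (addr % 2 == 0) ? 2 : 0;
}

static struct inline_entry *cache_find(struct inline_cache *cache, uint64_t addr)
{
	for (struct inline_entry *e = cache->head; e; e = e->next)
		if (e->addr == addr)
			return e;
	return NULL;
}

/*
 * Always returns an entry, possibly an empty one, so that failed lookups
 * are cached as well. Returns NULL only if the allocation fails.
 */
static struct inline_entry *get_inlines(struct inline_cache *cache, uint64_t addr)
{
	struct inline_entry *e;

	assert(cache);
	e = cache_find(cache, addr);
	if (e)
		return e;		/* hit, including cached "nothing found" */

	e = malloc(sizeof(*e));
	if (!e)
		return NULL;
	e->addr = addr;
	e->nr_frames = slow_parse_inlines(cache, addr);
	e->next = cache->head;		/* store the result even when it is empty */
	cache->head = e;
	return e;
}
```

With this shape, asking twice for an address that has no inlined frames runs the slow path once; the second call returns the cached empty entry.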
    
    For my trivial example, the performance impact is already quite
    significant:
    
    Before:
    
    ~~~~~
     Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs):
    
            594.804032      task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.07% )
                    53      context-switches          #    0.089 K/sec                    ( +-  4.09% )
                     0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
                 5,687      page-faults               #    0.010 M/sec                    ( +-  0.02% )
         2,300,918,213      cycles                    #    3.868 GHz                      ( +-  0.09% )
         4,395,839,080      instructions              #    1.91  insn per cycle           ( +-  0.00% )
           939,177,205      branches                  # 1578.969 M/sec                    ( +-  0.00% )
            11,824,633      branch-misses             #    1.26% of all branches          ( +-  0.10% )
    
           0.596246531 seconds time elapsed                                          ( +-  0.07% )
    ~~~~~
    
    After:
    
    ~~~~~
     Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs):
    
            113.111405      task-clock (msec)         #    0.990 CPUs utilized            ( +-  0.89% )
                    29      context-switches          #    0.255 K/sec                    ( +- 54.25% )
                     0      cpu-migrations            #    0.000 K/sec
                 5,380      page-faults               #    0.048 M/sec                    ( +-  0.01% )
           432,378,779      cycles                    #    3.823 GHz                      ( +-  0.75% )
           670,057,633      instructions              #    1.55  insn per cycle           ( +-  0.01% )
           141,001,247      branches                  # 1246.570 M/sec                    ( +-  0.01% )
             2,346,845      branch-misses             #    1.66% of all branches          ( +-  0.19% )
    
           0.114222393 seconds time elapsed                                          ( +-  1.19% )
    ~~~~~
    
    Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
    Reviewed-by: Andi Kleen <ak@linux.intel.com>
    Cc: David Ahern <dsahern@gmail.com>
    Cc: Jin Yao <yao.jin@linux.intel.com>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Link: http://lkml.kernel.org/r/20171019113836.5548-3-milian.wolff@kdab.com
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>