    perf script: Allow computing 'perf stat' style metrics · 4bd1bef8
    Andi Kleen authored
    Add support for computing 'perf stat' style metrics in 'perf script'.
    
    When using leader sampling we can get metrics for each sampling period
    by computing formulas over the values of the different group members.
    
    This allows things like fine grained IPC tracking through sampling, much
    more fine grained than with 'perf stat'.
    
    The metric is still averaged over the sampling period; it is not just
    for the sampling point.
    
    This patch adds a new metric output field for 'perf script' that uses
    the existing 'perf stat' metrics infrastructure to compute any metrics
    supported by 'perf stat'.
    
    For example, to sample IPC:
    
      $ perf record -e '{ref-cycles,cycles,instructions}:S' -a sleep 1
      $ perf script -F metric,ip,sym,time,cpu,comm
      ...
       alsa-sink-ALC32 [000] 42815.856074:      7fd65937d6cc [unknown]
       alsa-sink-ALC32 [000] 42815.856074:      7fd65937d6cc [unknown]
       alsa-sink-ALC32 [000] 42815.856074:      7fd65937d6cc [unknown]
       alsa-sink-ALC32 [000] 42815.856074:    metric:    0.13  insn per cycle
               swapper [000] 42815.857961:  ffffffff81655df0 __schedule
               swapper [000] 42815.857961:  ffffffff81655df0 __schedule
               swapper [000] 42815.857961:  ffffffff81655df0 __schedule
               swapper [000] 42815.857961:    metric:    0.23  insn per cycle
       qemu-system-x86 [000] 42815.858130:  ffffffff8165ad0e _raw_spin_unlock_irqrestore
       qemu-system-x86 [000] 42815.858130:  ffffffff8165ad0e _raw_spin_unlock_irqrestore
       qemu-system-x86 [000] 42815.858130:  ffffffff8165ad0e _raw_spin_unlock_irqrestore
       qemu-system-x86 [000] 42815.858130:    metric:    0.46  insn per cycle
                 :4972 [000] 42815.858312:  ffffffffa080e5f2 vmx_vcpu_run
                 :4972 [000] 42815.858312:  ffffffffa080e5f2 vmx_vcpu_run
                 :4972 [000] 42815.858312:  ffffffffa080e5f2 vmx_vcpu_run
                 :4972 [000] 42815.858312:    metric:    0.45  insn per cycle
    
    TopDown:
    
    This requires disabling SMT if you have it enabled, because SMT would
    require sampling per core, which is not supported.
    
      $ perf record -e '{ref-cycles,topdown-fetch-bubbles,\
                         topdown-recovery-bubbles,\
                         topdown-slots-retired,topdown-total-slots,\
                         topdown-slots-issued}:S' -a sleep 1
      $ perf script --header -I -F cpu,ip,sym,event,metric,period
      ...
      [000]     121108               ref-cycles:  ffffffff8165222e copy_user_enhanced_fast_string
      [000]     190350    topdown-fetch-bubbles:  ffffffff8165222e copy_user_enhanced_fast_string
      [000]       2055 topdown-recovery-bubbles:  ffffffff8165222e copy_user_enhanced_fast_string
      [000]     148729    topdown-slots-retired:  ffffffff8165222e copy_user_enhanced_fast_string
      [000]     144324      topdown-total-slots:  ffffffff8165222e copy_user_enhanced_fast_string
      [000]     160852     topdown-slots-issued:  ffffffff8165222e copy_user_enhanced_fast_string
      [000]   metric:     33.0% frontend bound
      [000]   metric:      3.5% bad speculation
      [000]   metric:     25.8% retiring
      [000]   metric:     37.7% backend bound
      [000]     112112               ref-cycles:  ffffffff8165aec8 _raw_spin_lock_irqsave
      [000]     357222    topdown-fetch-bubbles:  ffffffff8165aec8 _raw_spin_lock_irqsave
      [000]       3325 topdown-recovery-bubbles:  ffffffff8165aec8 _raw_spin_lock_irqsave
      [000]     323553    topdown-slots-retired:  ffffffff8165aec8 _raw_spin_lock_irqsave
      [000]     270507      topdown-total-slots:  ffffffff8165aec8 _raw_spin_lock_irqsave
      [000]     341226     topdown-slots-issued:  ffffffff8165aec8 _raw_spin_lock_irqsave
      [000]   metric:     33.0% frontend bound
      [000]   metric:      2.9% bad speculation
      [000]   metric:     29.9% retiring
      [000]   metric:     34.2% backend bound
    ...
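The TopDown percentages above can be reproduced from the group member values of one sample. The following is a minimal Python sketch, not the 'perf script' implementation. It treats the number in the period column as each member's count and uses the usual level-1 TopDown formulas. The scale factor of 4 for topdown-total-slots and topdown-recovery-bubbles is an assumption, chosen because it makes the results match the output shown; the function and dictionary names are hypothetical.

```python
# Counts of the first sample in the TopDown output above.
sample = {
    "topdown-fetch-bubbles": 190350,
    "topdown-recovery-bubbles": 2055,
    "topdown-slots-retired": 148729,
    "topdown-total-slots": 144324,
    "topdown-slots-issued": 160852,
}

# ASSUMPTION: per-event scale factors (SMT disabled). They are not shown in
# the output above; 4 is the value that reproduces the printed metrics.
SCALE = {"topdown-total-slots": 4.0, "topdown-recovery-bubbles": 4.0}


def topdown(counts, scale=SCALE):
    """Return the four level-1 TopDown fractions for one sampling period."""
    v = {name: n * scale.get(name, 1.0) for name, n in counts.items()}
    total = v["topdown-total-slots"]
    frontend = v["topdown-fetch-bubbles"] / total
    bad_spec = (v["topdown-slots-issued"] - v["topdown-slots-retired"]
                + v["topdown-recovery-bubbles"]) / total
    retiring = v["topdown-slots-retired"] / total
    # Backend bound is whatever is left of the slots.
    backend = 1.0 - frontend - bad_spec - retiring
    return {
        "frontend bound": frontend,
        "bad speculation": bad_spec,
        "retiring": retiring,
        "backend bound": backend,
    }


if __name__ == "__main__":
    for name, frac in topdown(sample).items():
        print(f"metric: {frac * 100:8.1f}% {name}")
```

Each fraction covers the whole sampling period that ended at this sample, not only the instruction at the sampled IP.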
    
    v2:
    Use evsel->priv for new fields
    Port to new baseline, support fp output.
    Handle stats in ->stats, not ->priv
    Minor cleanups
    
    Extra explanation about the use of the term 'averaging', from Andi in the
    thread in the Link: tag below:
    
    <quote Andi>
    The current sample contains the sum of event counts for a sampling period.
    
    EventA-1           EventA-2                EventA-3      EventA-4
    EventB-1     EventB-2                             EventB-3
    
                             gap with no events                overflow
    |-----------------------------------------------------------------|
    period-start                                             period-end
    ^                                                                 ^
    |                                                                 |
    previous sample                                      current sample
    
    So EventA = 4 and EventB = 3 at the sample point
    
    I generate a metric, let's say EventA / EventB. It applies to the whole period.
    
    But the metric is over a longer time which does not have the same behavior. For
    example the gap above doesn't have any events, while they are clustered at the
    beginning and end of the sample period.
    
    But we're summing everything together. The metric doesn't know that the gap is
    different than the busy period.
    
    That's what I'm trying to express with averaging.
    </quote>
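The averaging described in the quote can be illustrated with a short Python sketch. The per-slice counts are hypothetical and only loosely follow the diagram; the point is that a sample carries sums for the whole period, so a bursty period and an evenly spread one yield the same metric.

```python
def period_metric(event_a, event_b):
    """Metric for one sampling period: ratio of the summed event counts."""
    total_b = sum(event_b)
    return sum(event_a) / total_b if total_b else float("nan")


# Hypothetical counts per time slice within one sampling period.
# Bursty: events cluster at the start and end, with an idle gap between.
bursty_a = [2, 0, 0, 2]
bursty_b = [2, 0, 0, 1]

# Evenly spread: same totals (EventA = 4, EventB = 3), no idle gap.
even_a = [1, 1, 1, 1]
even_b = [1, 1, 1, 0]

if __name__ == "__main__":
    # Both periods report the same EventA / EventB value, because only
    # the sums reach the sample.
    print(period_metric(bursty_a, bursty_b))
    print(period_metric(even_a, even_b))
```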
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Acked-by: Jiri Olsa <jolsa@kernel.org>
    Link: http://lkml.kernel.org/r/20171117214300.32746-4-andi@firstfloor.org
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>