    perf/x86/amd: Add AMD Fam19h Branch Sampling support · ada54345
    Stephane Eranian authored
    Add support for the AMD Fam19h 16-deep branch sampling feature as
    described in the AMD PPR Fam19h Model 01h Revision B1.  This is a model
    specific extension. It is not an architected AMD feature.
    
    Branch Sampling (BRS) operates with a 16-deep saturating buffer in MSR
    registers. There is no branch type filtering. All control flow changes are
    captured. BRS relies on specific programming of the core PMU of Fam19h.  In
    particular, the following requirements must be met:
     - the sampling period must be greater than 16 (the BRS depth)
     - the sampling period must be fixed; frequency mode is not supported
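
    The two requirements above can be sketched as a small validity check.
    This is only an illustration, not the code added by this patch: the
    struct brs_event_cfg type, the BRS_DEPTH macro, and the function name
    brs_period_is_valid() are hypothetical names chosen for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* BRS buffer depth, as stated in the commit message. */
#define BRS_DEPTH 16

/*
 * Hypothetical, simplified view of the event settings that matter for BRS.
 * The real kernel works on struct perf_event; this stand-in is an assumption.
 */
struct brs_event_cfg {
    uint64_t sample_period; /* fixed period P, in event occurrences */
    bool     freq_mode;     /* true if frequency-based sampling is requested */
};

/* Returns true only if both BRS requirements listed above are met. */
static bool brs_period_is_valid(const struct brs_event_cfg *cfg)
{
    if (cfg->freq_mode)
        return false;                       /* period must be fixed */
    return cfg->sample_period > BRS_DEPTH;  /* period must exceed the depth */
}
```

    With this sketch, a fixed period of 1000037 passes, while a period of 16
    or any frequency-mode configuration is rejected.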
    
    BRS interacts with the NMI interrupt as well. Because enabling BRS is
    expensive, it is only activated after P event occurrences, where P is the
    desired sampling period.  At P occurrences of the event, the counter
    overflows, the CPU catches the interrupt, activates BRS for 16 branches until
    it saturates, and then delivers the NMI to the kernel.  Between the overflow
    and the time BRS activates, more branches may be executed, skewing the period.
    All along, the sampling event keeps counting. The skid may be attenuated by
    reducing the sampling period by 16 (subsequent patch).
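
    The arithmetic behind that attenuation is simple enough to show as a
    sketch. The helper below is hypothetical (brs_adjust_period() and
    BRS_DEPTH are names made up for this example); the actual adjustment is
    left to the subsequent patch and may differ in detail.

```c
#include <stdint.h>

/* BRS buffer depth, as stated in the commit message. */
#define BRS_DEPTH 16

/*
 * Hypothetical helper: program the counter BRS_DEPTH events early so that
 * the 16 branches captured after the overflow land close to the requested
 * period. Assumes the caller already verified period > BRS_DEPTH; otherwise
 * the period is returned unchanged.
 */
static uint64_t brs_adjust_period(uint64_t period)
{
    if (period <= BRS_DEPTH)
        return period;
    return period - BRS_DEPTH;
}
```

    For the period used in the example further down, 1000037, this sketch
    would program the counter with 1000021.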
    
    BRS is integrated into perf_events seamlessly via the same
    PERF_RECORD_BRANCH_STACK sample format. BRS generates perf_branch_entry
    records in the sampling buffer. No prediction information is supported. The
    branches are stored in reverse order of execution.  The most recent branch is
    the first entry in each record.
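
    Because entries are stored most recent first, a consumer that wants
    execution order has to walk each record backwards. A minimal sketch,
    assuming a simplified struct branch_pair that keeps only the from/to
    addresses (the real struct perf_branch_entry carries additional fields),
    and a hypothetical helper name:

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a branch record: source and target address only. */
struct branch_pair {
    uint64_t from;
    uint64_t to;
};

/*
 * Copy nr entries from 'in' (most recent branch first, as stored in the
 * sample) to 'out' in execution order (oldest branch first).
 * 'in' and 'out' must not overlap.
 */
static void branches_to_chronological(const struct branch_pair *in,
                                      struct branch_pair *out, size_t nr)
{
    for (size_t i = 0; i < nr; i++)
        out[i] = in[nr - 1 - i];
}
```

    After the copy, out[0] holds the oldest captured branch and out[nr - 1]
    the most recent one.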
    
    No modification to the perf tool is necessary.
    
    BRS can be used with any sampling event. However, it is recommended to use
    the RETIRED_BRANCH_INSTRUCTIONS event because it matches what the BRS
    captures.
    
    $ perf record -b -c 1000037 -e cpu/event=0xc2,name=ret_br_instructions/ test
    
    $ perf report -D
    56531696056126 0x193c000 [0x1a8]: PERF_RECORD_SAMPLE(IP, 0x2): 18122/18230: 0x401d24 period: 1000037 addr: 0
    ... branch stack: nr:16
    .....  0: 0000000000401d24 -> 0000000000401d5a 0 cycles      0
    .....  1: 0000000000401d5c -> 0000000000401d24 0 cycles      0
    .....  2: 0000000000401d22 -> 0000000000401d5c 0 cycles      0
    .....  3: 0000000000401d5e -> 0000000000401d22 0 cycles      0
    .....  4: 0000000000401d20 -> 0000000000401d5e 0 cycles      0
    .....  5: 0000000000401d3e -> 0000000000401d20 0 cycles      0
    .....  6: 0000000000401d42 -> 0000000000401d3e 0 cycles      0
    .....  7: 0000000000401d3c -> 0000000000401d42 0 cycles      0
    .....  8: 0000000000401d44 -> 0000000000401d3c 0 cycles      0
    .....  9: 0000000000401d3a -> 0000000000401d44 0 cycles      0
    ..... 10: 0000000000401d46 -> 0000000000401d3a 0 cycles      0
    ..... 11: 0000000000401d38 -> 0000000000401d46 0 cycles      0
    ..... 12: 0000000000401d48 -> 0000000000401d38 0 cycles      0
    ..... 13: 0000000000401d36 -> 0000000000401d48 0 cycles      0
    ..... 14: 0000000000401d4a -> 0000000000401d36 0 cycles      0
    ..... 15: 0000000000401d34 -> 0000000000401d4a 0 cycles      0
     ... thread: test:18230
     ...... dso: test
    Signed-off-by: Stephane Eranian <eranian@google.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lore.kernel.org/r/20220322221517.2510440-4-eranian@google.com