1. 05 Apr, 2022 6 commits
    • perf/x86/amd: Add AMD branch sampling period adjustment · ba2fe750
      Stephane Eranian authored
      Add code to adjust the sampling event period when used with the Branch
      Sampling feature (BRS). Given the depth of the BRS (16), the period is
      reduced by that depth such that in the best case scenario, BRS saturates at
      the desired sampling period. In practice, though, the processor may execute
      more branches. Given a desired period P and a depth D, the kernel programs
      the actual period at P - D. After P occurrences of the sampling event, the
      counter overflows. It then may take X branches (skid) before the NMI is
      caught and held by the hardware and BRS activates. Then, after D branches,
      BRS saturates and the NMI is delivered.  With no skid, the effective period
      would be (P - D) + D = P. In practice, however, it will likely be (P - D) +
      X + D. There is no way to eliminate X or predict X.
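
The arithmetic above can be sketched in C as follows. This is a minimal illustration, not the kernel's code: the function names and the `BRS_DEPTH` macro are hypothetical, and leaving periods no larger than the depth unchanged is an assumption.

```c
#include <stdint.h>

#define BRS_DEPTH 16 /* depth D of the BRS buffer, per the commit text */

/* Period programmed into the counter: P - D.
 * Assumption: a period not larger than D is left unchanged. */
uint64_t brs_programmed_period(uint64_t desired)
{
    return desired > BRS_DEPTH ? desired - BRS_DEPTH : desired;
}

/* Effective period for P > D: (P - D) + X + D, where X is the skid
 * between counter overflow and BRS activation. */
uint64_t brs_effective_period(uint64_t desired, uint64_t skid)
{
    return brs_programmed_period(desired) + skid + BRS_DEPTH;
}
```

With a desired period of 1000037 and no skid, the effective period equals the desired one; any skid X is added on top and cannot be compensated for.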
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220322221517.2510440-7-eranian@google.com
    • perf/x86/amd: Enable branch sampling priv level filtering · 8910075d
      Stephane Eranian authored
      The AMD Branch Sampling feature does not provide hardware filtering by
      privilege level. The associated PMU counter does, but the branch sampling
      itself does not. Given how BRS operates, it may capture kernel-level
      branches even though the event is programmed to count only at the user
      level.
      
      Implement a workaround in software by removing the branches which belong to
      the wrong privilege level. The privilege level is evaluated on the target of
      the branch and not the source so as to be compatible with other architectures.
      As a consequence of this patch, the number of entries in the
      PERF_RECORD_BRANCH_STACK buffer may be less than the maximum (16). It could
      even be zero. Another consequence is that consecutive entries in the branch
      stack may not reflect the actual code path and may have discontinuities
      where kernel branches were suppressed. This is no different from what
      happens on other architectures.
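
A software filter of this kind might look roughly like the sketch below. It is an illustration only: the struct and function names are hypothetical, and classifying an address as kernel by its top bit assumes the usual x86-64 layout in which kernel addresses occupy the upper half of the address space.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified branch record: source and target addresses. */
struct branch_rec {
    uint64_t from;
    uint64_t to;
};

/* Assumption: x86-64, kernel addresses have the top bit set. */
bool target_is_kernel(uint64_t to)
{
    return (to >> 63) != 0;
}

/* Compact the array in place, keeping only branches whose TARGET lies in
 * a requested privilege level. Returns the number of entries kept, which
 * may be smaller than n, or even zero. */
size_t filter_by_target_priv(struct branch_rec *br, size_t n,
                             bool keep_user, bool keep_kernel)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        bool kernel = target_is_kernel(br[i].to);

        if ((kernel && keep_kernel) || (!kernel && keep_user))
            br[kept++] = br[i];
    }
    return kept;
}
```

Because the decision is made on the target, a branch returning from kernel to user code is kept by a user-only filter, while the branch that entered the kernel is dropped, which is where the discontinuities come from.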
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220322221517.2510440-6-eranian@google.com
    • perf/x86/amd: Add branch-brs helper event for Fam19h BRS · 44175993
      Stephane Eranian authored
      Add a pseudo event called branch-brs to help use the AMD Fam19h
      Branch Sampling feature (BRS). BRS samples taken branches, so it is best used
      when sampling on the retired taken branch event (0xc4), which is what BRS
      captures. Instead of trying to remember the event code or actual event name,
      users can simply do:
      
      $ perf record -b -e cpu/branch-brs/ -c 1000037 .....
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220322221517.2510440-5-eranian@google.com
    • perf/x86/amd: Add AMD Fam19h Branch Sampling support · ada54345
      Stephane Eranian authored
      Add support for the AMD Fam19h 16-deep branch sampling feature as
      described in the AMD PPR Fam19h Model 01h Revision B1.  This is a model
      specific extension. It is not an architected AMD feature.
      
      Branch Sampling (BRS) operates with a 16-deep saturating buffer in MSR
      registers. There is no branch type filtering. All control flow changes are
      captured. BRS relies on specific programming of the core PMU of Fam19h. In
      particular, the following requirements must be met:
       - the sampling period must be greater than 16 (the BRS depth)
       - the sampling period must be fixed; frequency mode is not supported
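
The two requirements amount to a simple validity check on the event configuration. The sketch below is illustrative only; the function name, its parameters, and the `BRS_DEPTH` macro are hypothetical and do not reproduce the kernel's actual validation path.

```c
#include <stdbool.h>
#include <stdint.h>

#define BRS_DEPTH 16 /* BRS buffer depth, per the commit text */

/* Returns true when an event configuration satisfies both requirements:
 * fixed-period sampling, with a period strictly greater than the depth. */
bool brs_event_config_ok(uint64_t sample_period, bool freq_mode)
{
    if (freq_mode)
        return false; /* frequency mode is not allowed with BRS */

    return sample_period > BRS_DEPTH;
}
```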
      
      BRS interacts with the NMI interrupt as well. Because enabling BRS is
      expensive, it is only activated after P event occurrences, where P is the
      desired sampling period.  At P occurrences of the event, the counter
      overflows, the CPU catches the interrupt, activates BRS for 16 branches until
      it saturates, and then delivers the NMI to the kernel.  Between the overflow
      and the time BRS activates, more branches may be executed, skewing the period.
      All along, the sampling event keeps counting. The skid may be attenuated by
      reducing the sampling period by 16 (subsequent patch).
      
      BRS is integrated into perf_events seamlessly via the same
      PERF_RECORD_BRANCH_STACK sample format. BRS generates perf_branch_entry
      records in the sampling buffer. No prediction information is supported. The
      branches are stored in reverse order of execution.  The most recent branch is
      the first entry in each record.
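
Since each record lists the most recent branch first, a consumer that wants to walk the branches in execution order has to reverse them. A minimal sketch, using a hypothetical simplified record type rather than the real perf_branch_entry:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified branch record: source and target addresses. */
struct brs_pair {
    uint64_t from;
    uint64_t to;
};

/* Reverse the entries in place so that index 0 becomes the oldest branch
 * and index n - 1 the most recent one. */
void brs_to_chronological(struct brs_pair *br, size_t n)
{
    if (n < 2)
        return;

    for (size_t lo = 0, hi = n - 1; lo < hi; lo++, hi--) {
        struct brs_pair tmp = br[lo];

        br[lo] = br[hi];
        br[hi] = tmp;
    }
}
```

After reversal, the target of one entry leads into the code that contains the source of the next entry, provided no branches were dropped in between.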
      
      No modification to the perf tool is necessary.
      
      BRS can be used with any sampling event. However, it is recommended to use
      the RETIRED_BRANCH_INSTRUCTIONS event because it matches what the BRS
      captures.
      
      $ perf record -b -c 1000037 -e cpu/event=0xc2,name=ret_br_instructions/ test
      
      $ perf report -D
      56531696056126 0x193c000 [0x1a8]: PERF_RECORD_SAMPLE(IP, 0x2): 18122/18230: 0x401d24 period: 1000037 addr: 0
      ... branch stack: nr:16
      .....  0: 0000000000401d24 -> 0000000000401d5a 0 cycles      0
      .....  1: 0000000000401d5c -> 0000000000401d24 0 cycles      0
      .....  2: 0000000000401d22 -> 0000000000401d5c 0 cycles      0
      .....  3: 0000000000401d5e -> 0000000000401d22 0 cycles      0
      .....  4: 0000000000401d20 -> 0000000000401d5e 0 cycles      0
      .....  5: 0000000000401d3e -> 0000000000401d20 0 cycles      0
      .....  6: 0000000000401d42 -> 0000000000401d3e 0 cycles      0
      .....  7: 0000000000401d3c -> 0000000000401d42 0 cycles      0
      .....  8: 0000000000401d44 -> 0000000000401d3c 0 cycles      0
      .....  9: 0000000000401d3a -> 0000000000401d44 0 cycles      0
      ..... 10: 0000000000401d46 -> 0000000000401d3a 0 cycles      0
      ..... 11: 0000000000401d38 -> 0000000000401d46 0 cycles      0
      ..... 12: 0000000000401d48 -> 0000000000401d38 0 cycles      0
      ..... 13: 0000000000401d36 -> 0000000000401d48 0 cycles      0
      ..... 14: 0000000000401d4a -> 0000000000401d36 0 cycles      0
      ..... 15: 0000000000401d34 -> 0000000000401d4a 0 cycles      0
       ... thread: test:18230
       ...... dso: test
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220322221517.2510440-4-eranian@google.com
    • x86/cpufeatures: Add AMD Fam19h Branch Sampling feature · a77d41ac
      Stephane Eranian authored
      Add a CPU feature flag for the AMD Fam19h Branch Sampling feature, reported
      as bit 31 of EBX in CPUID leaf 0x80000008.
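
Testing that bit from a raw EBX value could look like the sketch below. The macro and function names are hypothetical; only the leaf number and the bit position come from the commit text.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_LEAF_BRS    0x80000008u /* CPUID leaf named in the commit */
#define CPUID_EBX_BRS_BIT 31          /* bit position within EBX */

/* Given the EBX value returned by CPUID leaf CPUID_LEAF_BRS, report
 * whether the Branch Sampling bit is set. */
bool ebx_reports_brs(uint32_t ebx)
{
    return ((ebx >> CPUID_EBX_BRS_BIT) & 1u) != 0;
}
```

In user space on x86 with GCC or Clang, the EBX value could be obtained with `__get_cpuid()` from `<cpuid.h>`; inside the kernel the feature is queried through the cpufeature infrastructure instead.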
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220322221517.2510440-3-eranian@google.com
    • perf/core: Add perf_clear_branch_entry_bitfields() helper · bfe4daf8
      Stephane Eranian authored
      Make it simpler to reset all the info fields on the
      perf_branch_entry by adding a helper inline function.
      
      The goal is to centralize the initialization to avoid missing
      a field in case more are added.
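
The idea can be sketched as below. The struct is a simplified stand-in modeled on perf_branch_entry; its field names and widths are assumptions made for illustration and should be checked against the kernel's uapi header. The helper name is likewise hypothetical.

```c
#include <stdint.h>

/* Simplified stand-in for struct perf_branch_entry (layout assumed).
 * 64-bit bitfields rely on GCC/Clang behavior. */
struct branch_entry_sketch {
    uint64_t from;
    uint64_t to;
    uint64_t mispred:1,
             predicted:1,
             in_tx:1,
             abort:1,
             cycles:16,
             type:4,
             reserved:40;
};

/* Reset every info bitfield in one place, leaving from/to untouched.
 * A newly added field only needs to be cleared here. */
void clear_branch_entry_bitfields(struct branch_entry_sketch *br)
{
    br->mispred = 0;
    br->predicted = 0;
    br->in_tx = 0;
    br->abort = 0;
    br->cycles = 0;
    br->type = 0;
    br->reserved = 0;
}
```

Callers that fill in branch records can then call the helper once and set only the fields their hardware actually provides.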
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220322221517.2510440-2-eranian@google.com
  2. 03 Apr, 2022 8 commits
  3. 02 Apr, 2022 26 commits