    perf pmu: Make the loading of formats lazy · 50402641
    Ian Rogers authored
    The sysfs format files are loaded eagerly in a PMU. Add a flag so that
    we create the format but only load the contents when necessary.
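
    The flag-based approach described above can be sketched as follows. This
    is a minimal illustration only, not the code from this patch: struct
    lazy_format, lazy_format__init() and lazy_format__get() are hypothetical
    names, and the real struct perf_pmu_format holds different members.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a PMU format entry; not perf's real struct. */
struct lazy_format {
	const char *path;   /* file holding the format description */
	char contents[64];  /* valid only when 'loaded' is true */
	bool loaded;        /* set once the file has been read */
};

/* Creating the entry only records where the data lives; no file is opened. */
static void lazy_format__init(struct lazy_format *fmt, const char *path)
{
	assert(fmt && path);
	fmt->path = path;
	fmt->contents[0] = '\0';
	fmt->loaded = false;
}

/* Read the file on first use; later calls reuse the cached contents. */
static const char *lazy_format__get(struct lazy_format *fmt)
{
	if (!fmt->loaded) {
		FILE *file = fopen(fmt->path, "r");

		if (!file)
			return NULL;
		if (!fgets(fmt->contents, sizeof(fmt->contents), file))
			fmt->contents[0] = '\0';
		fclose(file);
		fmt->loaded = true;
	}
	return fmt->contents;
}
```

    In this sketch an entry that is never queried never triggers an open,
    which is the kind of saving the openat counts in this message reflect.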
    
    Reduce the size of the value in struct perf_pmu_format and avoid holes
    so there is no additional space requirement.
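
    To illustrate how narrowing a member can make room for a new flag
    without growing a struct, here is a hedged sketch. The two layouts
    below are assumptions made up for the example (they are not the real
    struct perf_pmu_format), and the sizes in the comments assume a typical
    LP64 ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts for illustration; not the real struct perf_pmu_format. */
struct format_before {
	char *name;     /* 8 bytes on LP64 */
	int value;      /* 4 bytes, followed by a 4-byte hole */
	uint64_t bits;  /* 8 bytes */
};

struct format_after {
	char *name;      /* 8 bytes on LP64 */
	uint64_t bits;   /* 8 bytes */
	uint16_t value;  /* narrowed from int to 2 bytes */
	bool loaded;     /* new flag placed directly after 'value' */
};

/* Compile-time check that adding the flag did not grow the struct. */
static_assert(sizeof(struct format_after) <= sizeof(struct format_before),
	      "adding the flag must not grow the struct");
```

    A tool such as pahole can be used to inspect the real layout of a
    compiled struct and confirm where any holes are.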
    
    For "perf stat -e cycles true" this reduces the number of openat calls
    from 648 to 573 (about 12%). The benchmark pmu scan speed is improved
    by roughly 5%.
    
    Before:
    
      $ perf bench internals pmu-scan
      Computing performance of sysfs PMU event scan for 100 times
        Average core PMU scanning took: 1061.100 usec (+- 9.965 usec)
        Average PMU scanning took: 4725.300 usec (+- 260.599 usec)
    
    After:
    
      $ perf bench internals pmu-scan
      Computing performance of sysfs PMU event scan for 100 times
        Average core PMU scanning took: 989.170 usec (+- 6.873 usec)
        Average PMU scanning took: 4520.960 usec (+- 251.272 usec)
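
    The percentages quoted above can be rechecked from the raw figures with
    a small helper. This is only a convenience sketch; percent_reduction()
    and print_reductions() are hypothetical names, not part of perf.

```c
#include <stdio.h>

/* Percentage by which 'after' is lower than 'before'. */
static double percent_reduction(double before, double after)
{
	return (before - after) / before * 100.0;
}

/* Print the reductions for the figures quoted in this commit message. */
static void print_reductions(void)
{
	printf("openat calls:  %.1f%%\n", percent_reduction(648, 573));
	printf("core PMU scan: %.1f%%\n", percent_reduction(1061.100, 989.170));
	printf("all PMU scan:  %.1f%%\n", percent_reduction(4725.300, 4520.960));
}
```

    With these inputs the reductions come out at about 11.6% for openat
    calls, 6.8% for the core PMU scan and 4.3% for the full PMU scan.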
    
    Committer testing:
    
    On an AMD Ryzen 5950x:
    
    Before:
    
      $ perf bench internals pmu-scan -i1000
      # Running 'internals/pmu-scan' benchmark:
      Computing performance of sysfs PMU event scan for 1000 times
        Average core PMU scanning took: 563.466 usec (+- 1.008 usec)
        Average PMU scanning took: 1619.174 usec (+- 23.627 usec)
      $ perf stat -r5 perf bench internals pmu-scan -i1000
      # Running 'internals/pmu-scan' benchmark:
      Computing performance of sysfs PMU event scan for 1000 times
        Average core PMU scanning took: 583.401 usec (+- 2.098 usec)
        Average PMU scanning took: 1677.352 usec (+- 24.636 usec)
      # Running 'internals/pmu-scan' benchmark:
      Computing performance of sysfs PMU event scan for 1000 times
        Average core PMU scanning took: 553.254 usec (+- 0.825 usec)
        Average PMU scanning took: 1635.655 usec (+- 24.312 usec)
      # Running 'internals/pmu-scan' benchmark:
      Computing performance of sysfs PMU event scan for 1000 times
        Average core PMU scanning took: 557.733 usec (+- 0.980 usec)
        Average PMU scanning took: 1600.659 usec (+- 23.344 usec)
      # Running 'internals/pmu-scan' benchmark:
      Computing performance of sysfs PMU event scan for 1000 times
        Average core PMU scanning took: 554.906 usec (+- 0.774 usec)
        Average PMU scanning took: 1595.338 usec (+- 23.288 usec)
      # Running 'internals/pmu-scan' benchmark:
      Computing performance of sysfs PMU event scan for 1000 times
        Average core PMU scanning took: 551.798 usec (+- 0.967 usec)
        Average PMU scanning took: 1623.213 usec (+- 23.998 usec)
    
       Performance counter stats for 'perf bench internals pmu-scan -i1000' (5 runs):
    
                 3276.82 msec task-clock:u                     #    0.990 CPUs utilized               ( +-  0.82% )
                       0      context-switches:u               #    0.000 /sec
                       0      cpu-migrations:u                 #    0.000 /sec
                    1008      page-faults:u                    #  307.615 /sec                        ( +-  0.04% )
             12049614778      cycles:u                         #    3.677 GHz                         ( +-  0.07% )  (83.34%)
               117507478      stalled-cycles-frontend:u        #    0.98% frontend cycles idle        ( +-  0.33% )  (83.32%)
                27106761      stalled-cycles-backend:u         #    0.22% backend cycles idle         ( +-  9.55% )  (83.36%)
             33294953848      instructions:u                   #    2.76  insn per cycle
                                                               #    0.00  stalled cycles per insn     ( +-  0.03% )  (83.31%)
              6849825049      branches:u                       #    2.090 G/sec                       ( +-  0.03% )  (83.37%)
                71533903      branch-misses:u                  #    1.04% of all branches             ( +-  0.20% )  (83.30%)
    
                  3.3088 +- 0.0302 seconds time elapsed  ( +-  0.91% )
    
      $
    
    After:
    
      $ perf stat -r5 perf bench internals pmu-scan -i1000
      # Running 'internals/pmu-scan' benchmark:
      Computing performance of sysfs PMU event scan for 1000 times
        Average core PMU scanning took: 550.702 usec (+- 0.958 usec)
        Average PMU scanning took: 1566.577 usec (+- 22.747 usec)
      # Running 'internals/pmu-scan' benchmark:
      Computing performance of sysfs PMU event scan for 1000 times
        Average core PMU scanning took: 548.315 usec (+- 0.555 usec)
        Average PMU scanning took: 1565.499 usec (+- 22.760 usec)
      # Running 'internals/pmu-scan' benchmark:
      Computing performance of sysfs PMU event scan for 1000 times
        Average core PMU scanning took: 548.073 usec (+- 0.555 usec)
        Average PMU scanning took: 1586.097 usec (+- 23.299 usec)
      # Running 'internals/pmu-scan' benchmark:
      Computing performance of sysfs PMU event scan for 1000 times
        Average core PMU scanning took: 561.184 usec (+- 2.709 usec)
        Average PMU scanning took: 1567.153 usec (+- 22.548 usec)
      # Running 'internals/pmu-scan' benchmark:
      Computing performance of sysfs PMU event scan for 1000 times
        Average core PMU scanning took: 546.987 usec (+- 0.553 usec)
        Average PMU scanning took: 1562.814 usec (+- 22.729 usec)
    
       Performance counter stats for 'perf bench internals pmu-scan -i1000' (5 runs):
    
                 3170.86 msec task-clock:u                     #    0.992 CPUs utilized               ( +-  0.22% )
                       0      context-switches:u               #    0.000 /sec
                       0      cpu-migrations:u                 #    0.000 /sec
                    1010      page-faults:u                    #  318.526 /sec                        ( +-  0.04% )
             11890047674      cycles:u                         #    3.750 GHz                         ( +-  0.14% )  (83.27%)
               119090499      stalled-cycles-frontend:u        #    1.00% frontend cycles idle        ( +-  0.46% )  (83.40%)
                32502449      stalled-cycles-backend:u         #    0.27% backend cycles idle         ( +-  8.32% )  (83.30%)
             33119141261      instructions:u                   #    2.79  insn per cycle
                                                               #    0.00  stalled cycles per insn     ( +-  0.01% )  (83.37%)
              6812816561      branches:u                       #    2.149 G/sec                       ( +-  0.01% )  (83.29%)
                70157855      branch-misses:u                  #    1.03% of all branches             ( +-  0.28% )  (83.38%)
    
                 3.19710 +- 0.00826 seconds time elapsed  ( +-  0.26% )
    
      $
    Signed-off-by: Ian Rogers <irogers@google.com>
    Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Adrian Hunter <adrian.hunter@intel.com>
    Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
    Cc: Gaosheng Cui <cuigaosheng1@huawei.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: James Clark <james.clark@arm.com>
    Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: John Garry <john.g.garry@oracle.com>
    Cc: Kajol Jain <kjain@linux.ibm.com>
    Cc: Kan Liang <kan.liang@linux.intel.com>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Ravi Bangoria <ravi.bangoria@amd.com>
    Cc: Rob Herring <robh@kernel.org>
    Link: https://lore.kernel.org/r/20230824041330.266337-2-irogers@google.com
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>