1. 28 Jun, 2024 3 commits
    • perf pmu: Don't de-duplicate core PMUs · 7afbf90e
      James Clark authored
      Arm PMUs have a suffix, either a single decimal digit (armv8_pmuv3_0) or
      3 hex digits (armv8_cortex_a53), and Perf assumes both are strippable
      suffixes for the purposes of deduplication. S390 "cpum_cf" is a
      similarly suffixed core PMU, but its suffix is only two characters, so
      it is not treated as strippable because the rules require a minimum of
      3 hex characters or 1 decimal character.
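
      The suffix rule described above can be sketched roughly as follows.
      This is a minimal illustration, not Perf's actual code: the function
      name name_len_no_suffix() is hypothetical, and it assumes the suffix
      is whatever follows the last '_' in the PMU name.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not Perf's real function): return the length of
 * `name` with a strippable suffix removed. Assumption: the suffix is the
 * text after the last '_' and is strippable when it is made of 1 or more
 * decimal digits, or 3 or more hex digits.
 */
size_t name_len_no_suffix(const char *name)
{
	size_t len = strlen(name);
	const char *us = strrchr(name, '_');
	size_t suffix_len, i;
	bool all_decimal = true;

	if (!us || us == name)
		return len;		/* no '_' or nothing before it */

	suffix_len = len - (size_t)(us - name) - 1;
	if (suffix_len == 0)
		return len;		/* name ends in '_' */

	for (i = 0; i < suffix_len; i++) {
		unsigned char c = (unsigned char)us[1 + i];

		if (!isxdigit(c))
			return len;	/* not a numeric suffix at all */
		if (!isdigit(c))
			all_decimal = false;
	}

	if (all_decimal || suffix_len >= 3)
		return (size_t)(us - name);
	return len;			/* e.g. "cf": hex but too short */
}
```

      Under these assumptions "armv8_pmuv3_0" and "armv8_cortex_a53" both
      lose their suffix, while "cpum_cf" keeps its full name.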
      
      There are two paths involved in listing PMU events:
      
       * HW/cache event printing assumes core PMUs don't have suffixes so
         doesn't try to strip.
       * Sysfs PMU events share the printing function with uncore PMUs which
         strips.
      
      This results in slightly inconsistent Perf list behavior if a core PMU
      has a suffix:
      
        # perf list
        ...
        armv8_pmuv3_0/branch-load-misses/
        armv8_pmuv3/l3d_cache_wb/          [Kernel PMU event]
        ...
      
      Fix it by partially reverting back to the old list behavior where
      stripping was only done for uncore PMUs. For example commit 8d9f5146
      ("perf pmus: Sort pmus by name then suffix") mentions that only PMUs
      starting 'uncore_' are considered to have a potential suffix. This
      change doesn't go back that far, but does only strip PMUs that are
      !is_core. This keeps the desirable behavior where the many possibly
      duplicated uncore PMUs aren't repeated, but it doesn't break listing for
      core PMUs.
      
      Searching for a PMU continues to use the new stripped comparison
      functions, meaning that it's still possible to request an event by
      specifying the common part of a PMU name, or even open events on
      multiple similarly named PMUs. For example:
      
        # perf stat -e armv8_cortex/inst_retired/
      
        5777173628      armv8_cortex_a53/inst_retired/          (99.93%)
        7469626951      armv8_cortex_a57/inst_retired/          (49.88%)
      
      Fixes: 3241d46f ("perf pmus: Sort/merge/aggregate PMUs like mrvl_ddr_pmu")
      Suggested-by: Ian Rogers <irogers@google.com>
      Signed-off-by: James Clark <james.clark@arm.com>
      Reviewed-by: Ian Rogers <irogers@google.com>
      Cc: robin.murphy@arm.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240626145448.896746-3-james.clark@arm.com
    • perf pmu: Restore full PMU name wildcard support · 3e0bf9fd
      James Clark authored
      Commit b2b9d3a3 ("perf pmu: Support wildcards on pmu name in dynamic
      pmu events") gives the following example for wildcarding a subset of
      PMUs:
      
        E.g., in a system with the following dynamic pmus:
      
              mypmu_0
              mypmu_1
              mypmu_2
              mypmu_4
      
        perf stat -e mypmu_[01]/<config>/
      
      Since commit f91fa2ae ("perf pmu: Refactor perf_pmu__match()"), only
      "*" has been supported, removing the ability to subset PMUs, even though
      parse-events.l still supports ? and [] characters.
      
      Fix it by using fnmatch() when any glob character is detected and add a
      test which covers that and other scenarios of
      perf_pmu__match_ignoring_suffix().
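
      A rough sketch of the approach, assuming a hypothetical helper
      pmu_name_matches() rather than the real perf_pmu__match(): fall back
      to POSIX fnmatch() only when the pattern contains a glob character.
      The non-glob path is simplified here to an exact strcmp(); the real
      code uses a suffix-ignoring comparison instead.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if the pattern contains '*', '?' or '['. */
static bool has_glob_chars(const char *s)
{
	return strpbrk(s, "*?[") != NULL;
}

/*
 * Hypothetical matcher, for illustration only. Glob patterns go through
 * fnmatch(); plain names are compared exactly (a simplification).
 */
bool pmu_name_matches(const char *pmu_name, const char *pattern)
{
	if (has_glob_chars(pattern))
		return fnmatch(pattern, pmu_name, 0) == 0;
	return strcmp(pmu_name, pattern) == 0;
}
```

      With the PMUs from the example above, the pattern "mypmu_[01]"
      selects mypmu_0 and mypmu_1 but not mypmu_2 or mypmu_4.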
      
      Fixes: f91fa2ae ("perf pmu: Refactor perf_pmu__match()")
      Signed-off-by: James Clark <james.clark@arm.com>
      Reviewed-by: Ian Rogers <irogers@google.com>
      Cc: robin.murphy@arm.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240626145448.896746-2-james.clark@arm.com
    • perf report: Display progress bar on redirected pipe data · 4553c431
      Namhyung Kim authored
      It's possible to save pipe output of perf record into a file.
      
        $ perf record -o- ... > pipe.data
      
      You can then use the data the same way as normal perf data.
      
        $ perf report -i pipe.data
      
      In that case, perf tools will treat the input as a pipe, but it can get
      the total size of the input.  This means it can show the progress bar
      unlike the normal pipe input (which doesn't know the total size in
      advance).
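
      The distinction can be illustrated with a small sketch. The helper
      name input_total_size() is hypothetical and this is not the code in
      perf's session handling: it only shows that fstat() can report a
      total size when the pipe-format data comes from a regular file,
      whereas a real pipe has no size known in advance.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: return true and store the total size in *size when
 * `fd` refers to a regular file, false for pipes and other inputs whose
 * size cannot be known up front.
 */
bool input_total_size(int fd, off_t *size)
{
	struct stat st;

	if (fstat(fd, &st) != 0 || !S_ISREG(st.st_mode))
		return false;

	*size = st.st_size;
	return true;
}
```

      A caller could use the returned size as the progress bar total and
      skip the progress bar when the function returns false.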
      
      While at it, fix the string in __perf_session__process_dir_events().
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240627181916.1202110-1-namhyung@kernel.org
  2. 26 Jun, 2024 9 commits
    • perf test stat_bpf_counter.sh: Stabilize the test results · e8b86f03
      Veronika Molnarova authored
      The test compares the cycles event counts from two separate runs of
      a perf benchmark, one recorded with the --bpf-counters option and one
      without it, and it has been failing for some time. The two counts are
      expected to be within a certain range of each other: the allowed
      difference was first set to 10% and later raised to 20%. However, the
      test case keeps failing on certain architectures because recording the
      provided benchmark can produce completely different counts depending
      on the current load of the system.
      
      Sampling two separate runs on intel-eaglestream-spr-13 of "perf stat
      --no-big-num -e cycles -- perf bench sched messaging -g 1 -l 100 -t":
      
       Performance counter stats for 'perf bench sched messaging -g 1 -l 100 -t':
      
               396782898      cycles
      
             0.010051983 seconds time elapsed
      
             0.008664000 seconds user
             0.097058000 seconds sys
      
       Performance counter stats for 'perf bench sched messaging -g 1 -l 100 -t':
      
              1431133032      cycles
      
             0.021803714 seconds time elapsed
      
             0.023377000 seconds user
             0.349918000 seconds sys
      
      The counts range from roughly 400 million to 1,400 million cycles.
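
      To make the failure concrete, here is a minimal sketch of a
      percentage tolerance check. The function counts_within_pct() is
      hypothetical, and measuring the difference relative to the larger
      count is an assumption; the shell test may compute it differently.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: true when the two counts differ by at most `pct`
 * percent of the larger one. Assumes counts small enough that the
 * multiplications below do not overflow 64 bits.
 */
bool counts_within_pct(uint64_t a, uint64_t b, unsigned int pct)
{
	uint64_t hi = a > b ? a : b;
	uint64_t lo = a > b ? b : a;

	return (hi - lo) * 100 <= hi * pct;
}
```

      With the two cycles counts above (396782898 and 1431133032), a 20%
      tolerance is clearly exceeded, so a check of this kind fails.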
      
      Instead of recording cycles, use the instructions event, which
      provides more stable values. At the same time, change the tested
      workload to one of the testing workloads provided by perf that is not
      based on the scheduler, which can introduce another dependency on the
      current load.
      
      Sampling the instructions event with the new workload provides much
      more stable results on intel-eaglestream-spr-13 with "perf stat
      --no-big-num -e instructions -- perf test -w brstack":
      
       Performance counter stats for 'perf test -w brstack':
      
                64584494      instructions
      
             0.009173945 seconds time elapsed
      
             0.007262000 seconds user
             0.002071000 seconds sys
      
       Performance counter stats for 'perf test -w brstack':
      
                64672669      instructions
      
             0.008888135 seconds time elapsed
      
             0.005018000 seconds user
             0.004018000 seconds sys
      Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: mpetlan@redhat.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240625092001.10909-1-vmolnaro@redhat.com
    • perf python: Clean up build dependencies · e4b19e2c
      Ian Rogers authored
      The python build now depends on libraries and doesn't use
      python-ext-sources except for the util/python.c dependency. Switch to
      just directly depending on that file and util/setup.py. This allows
      the removal of python-ext-sources.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Reviewed-by: James Clark <james.clark@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Gary Guo <gary@garyguo.net>
      Cc: Alex Gaynor <alex.gaynor@gmail.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Wedson Almeida Filho <wedsonaf@gmail.com>
      Cc: Ze Gao <zegao2021@gmail.com>
      Cc: Alice Ryhl <aliceryhl@google.com>
      Cc: Andrei Vagin <avagin@google.com>
      Cc: Yicong Yang <yangyicong@hisilicon.com>
      Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Oliver Upton <oliver.upton@linux.dev>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Benno Lossin <benno.lossin@proton.me>
      Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
      Cc: Andreas Hindborg <a.hindborg@samsung.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240625214117.953777-9-irogers@google.com
    • perf python: Switch module to linking libraries from building source · 9dabf400
      Ian Rogers authored
      setup.py was building most perf sources, which forced it to mimic the
      Makefile logic and required flex/bison code to be stubbed out because
      of the complexity of building it. By using libraries, fewer functions
      are stubbed out, the build is faster, and the Makefile logic is reused,
      which should simplify updating. The libraries are passed through
      LDFLAGS to avoid complexity in python.
      
      Force the -fPIC flag for libbpf.a to ensure it is suitable for linking
      into the perf python module.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Reviewed-by: James Clark <james.clark@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Gary Guo <gary@garyguo.net>
      Cc: Alex Gaynor <alex.gaynor@gmail.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Wedson Almeida Filho <wedsonaf@gmail.com>
      Cc: Ze Gao <zegao2021@gmail.com>
      Cc: Alice Ryhl <aliceryhl@google.com>
      Cc: Andrei Vagin <avagin@google.com>
      Cc: Yicong Yang <yangyicong@hisilicon.com>
      Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Oliver Upton <oliver.upton@linux.dev>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Benno Lossin <benno.lossin@proton.me>
      Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
      Cc: Andreas Hindborg <a.hindborg@samsung.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240625214117.953777-8-irogers@google.com
    • perf util: Make util its own library · e467705a
      Ian Rogers authored
      Make the util directory into its own library. This is done to avoid
      compiling code twice, once for the perf tool and once for the perf
      python module. For convenience:
        arch/common.c
        scripts/perl/Perf-Trace-Util/Context.c
        scripts/python/Perf-Trace-Util/Context.c
      are made part of this library.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Reviewed-by: James Clark <james.clark@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Gary Guo <gary@garyguo.net>
      Cc: Alex Gaynor <alex.gaynor@gmail.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Wedson Almeida Filho <wedsonaf@gmail.com>
      Cc: Ze Gao <zegao2021@gmail.com>
      Cc: Alice Ryhl <aliceryhl@google.com>
      Cc: Andrei Vagin <avagin@google.com>
      Cc: Yicong Yang <yangyicong@hisilicon.com>
      Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Oliver Upton <oliver.upton@linux.dev>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Benno Lossin <benno.lossin@proton.me>
      Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
      Cc: Andreas Hindborg <a.hindborg@samsung.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240625214117.953777-7-irogers@google.com
    • perf bench: Make bench its own library · 21cc3bc0
      Ian Rogers authored
      Make the benchmark code into a library so it may be linked against
      things like the python module to avoid compiling code twice.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Reviewed-by: James Clark <james.clark@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Gary Guo <gary@garyguo.net>
      Cc: Alex Gaynor <alex.gaynor@gmail.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Wedson Almeida Filho <wedsonaf@gmail.com>
      Cc: Ze Gao <zegao2021@gmail.com>
      Cc: Alice Ryhl <aliceryhl@google.com>
      Cc: Andrei Vagin <avagin@google.com>
      Cc: Yicong Yang <yangyicong@hisilicon.com>
      Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Oliver Upton <oliver.upton@linux.dev>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Benno Lossin <benno.lossin@proton.me>
      Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
      Cc: Andreas Hindborg <a.hindborg@samsung.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240625214117.953777-6-irogers@google.com
    • perf test: Make tests its own library · 1dad99af
      Ian Rogers authored
      Make the tests code its own library. This is done to avoid compiling
      code twice, once for the perf tool and once for the perf python
      module.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Reviewed-by: James Clark <james.clark@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Gary Guo <gary@garyguo.net>
      Cc: Alex Gaynor <alex.gaynor@gmail.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Wedson Almeida Filho <wedsonaf@gmail.com>
      Cc: Ze Gao <zegao2021@gmail.com>
      Cc: Alice Ryhl <aliceryhl@google.com>
      Cc: Andrei Vagin <avagin@google.com>
      Cc: Yicong Yang <yangyicong@hisilicon.com>
      Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Oliver Upton <oliver.upton@linux.dev>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Benno Lossin <benno.lossin@proton.me>
      Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
      Cc: Andreas Hindborg <a.hindborg@samsung.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240625214117.953777-5-irogers@google.com
    • perf pmu-events: Make pmu-events a library · 49f4ac4b
      Ian Rogers authored
      Make pmu-events into a library so it may be linked against things like
      the python module and not built from source.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Reviewed-by: James Clark <james.clark@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Gary Guo <gary@garyguo.net>
      Cc: Alex Gaynor <alex.gaynor@gmail.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Wedson Almeida Filho <wedsonaf@gmail.com>
      Cc: Ze Gao <zegao2021@gmail.com>
      Cc: Alice Ryhl <aliceryhl@google.com>
      Cc: Andrei Vagin <avagin@google.com>
      Cc: Yicong Yang <yangyicong@hisilicon.com>
      Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Oliver Upton <oliver.upton@linux.dev>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Benno Lossin <benno.lossin@proton.me>
      Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
      Cc: Andreas Hindborg <a.hindborg@samsung.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240625214117.953777-4-irogers@google.com
    • perf ui: Make ui its own library · 39f3ce5c
      Ian Rogers authored
      Make the ui code its own library. This is done to avoid compiling code
      twice, once for the perf tool and once for the perf python module.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Reviewed-by: James Clark <james.clark@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Gary Guo <gary@garyguo.net>
      Cc: Alex Gaynor <alex.gaynor@gmail.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Wedson Almeida Filho <wedsonaf@gmail.com>
      Cc: Ze Gao <zegao2021@gmail.com>
      Cc: Alice Ryhl <aliceryhl@google.com>
      Cc: Andrei Vagin <avagin@google.com>
      Cc: Yicong Yang <yangyicong@hisilicon.com>
      Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Oliver Upton <oliver.upton@linux.dev>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Benno Lossin <benno.lossin@proton.me>
      Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
      Cc: Andreas Hindborg <a.hindborg@samsung.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240625214117.953777-3-irogers@google.com
    • perf build: Add '*.a' to clean targets · 7f240209
      Ian Rogers authored
      Fix some excessively long lines by deploying '\'.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Reviewed-by: James Clark <james.clark@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Gary Guo <gary@garyguo.net>
      Cc: Alex Gaynor <alex.gaynor@gmail.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Wedson Almeida Filho <wedsonaf@gmail.com>
      Cc: Ze Gao <zegao2021@gmail.com>
      Cc: Alice Ryhl <aliceryhl@google.com>
      Cc: Andrei Vagin <avagin@google.com>
      Cc: Yicong Yang <yangyicong@hisilicon.com>
      Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Oliver Upton <oliver.upton@linux.dev>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Benno Lossin <benno.lossin@proton.me>
      Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
      Cc: Andreas Hindborg <a.hindborg@samsung.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240625214117.953777-2-irogers@google.com
  3. 25 Jun, 2024 14 commits
    • perf mem: Fix a segfault with NULL event->name · c7a5592e
      Namhyung Kim authored
      Guilherme reported a crash in perf mem record.  It's because the
      perf_mem_event->name was NULL on his machine.  It should just return
      a NULL string when it has no format string in the name.
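
      The guard can be sketched as below. This is an illustration only:
      mem_event_name() is a hypothetical stand-in for the real lookup, and
      the single "%s" format argument is an assumption made to keep the
      example short.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for the event name lookup. `fmt` is the event's
 * name format string and may be NULL on machines that lack the event;
 * `pmu_name` fills a single "%s" in the format (an assumption here).
 */
const char *mem_event_name(char *buf, size_t size, const char *fmt,
			   const char *pmu_name)
{
	if (!fmt)
		return NULL;	/* never hand a NULL format to snprintf() */

	snprintf(buf, size, fmt, pmu_name);
	return buf;
}
```

      Callers then need to treat a NULL return as "no such event" instead
      of using the buffer.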
      
      The backtrace at the crash is below:
      
        Program received signal SIGSEGV, Segmentation fault.
        __strchrnul_avx2 () at ../sysdeps/x86_64/multiarch/strchr-avx2.S:67
        67              vmovdqu (%rdi), %ymm2
        (gdb) bt
        #0  __strchrnul_avx2 () at ../sysdeps/x86_64/multiarch/strchr-avx2.S:67
        #1  0x00007ffff6c982de in __find_specmb (format=0x0) at printf-parse.h:82
        #2  __printf_buffer (buf=buf@entry=0x7fffffffc760, format=format@entry=0x0, ap=ap@entry=0x7fffffffc880,
            mode_flags=mode_flags@entry=0) at vfprintf-internal.c:649
        #3  0x00007ffff6cb7840 in __vsnprintf_internal (string=<optimized out>, maxlen=<optimized out>, format=0x0,
            args=0x7fffffffc880, mode_flags=mode_flags@entry=0) at vsnprintf.c:96
        #4  0x00007ffff6cb787f in ___vsnprintf (string=<optimized out>, maxlen=<optimized out>, format=<optimized out>,
            args=<optimized out>) at vsnprintf.c:103
        #5  0x00005555557b9391 in scnprintf (buf=0x555555fe9320 <mem_loads_name> "", size=100, fmt=0x0)
            at ../lib/vsprintf.c:21
        #6  0x00005555557b74c3 in perf_pmu__mem_events_name (i=0, pmu=0x555556832180) at util/mem-events.c:106
        #7  0x00005555557b7ab9 in perf_mem_events__record_args (rec_argv=0x55555684c000, argv_nr=0x7fffffffca20)
            at util/mem-events.c:252
        #8  0x00005555555e370d in __cmd_record (argc=3, argv=0x7fffffffd760, mem=0x7fffffffcd80) at builtin-mem.c:156
        #9  0x00005555555e49c4 in cmd_mem (argc=4, argv=0x7fffffffd760) at builtin-mem.c:514
        #10 0x000055555569716c in run_builtin (p=0x555555fcde80 <commands+672>, argc=8, argv=0x7fffffffd760) at perf.c:349
        #11 0x0000555555697402 in handle_internal_command (argc=8, argv=0x7fffffffd760) at perf.c:402
        #12 0x0000555555697560 in run_argv (argcp=0x7fffffffd59c, argv=0x7fffffffd590) at perf.c:446
        #13 0x00005555556978a6 in main (argc=8, argv=0x7fffffffd760) at perf.c:562
      Reported-by: Guilherme Amadio <amadio@cern.ch>
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Closes: https://lore.kernel.org/linux-perf-users/Zlns_o_IE5L28168@cern.ch
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240621170528.608772-5-namhyung@kernel.org
    • perf tools: Fix a compiler warning of NULL pointer · 0eb739d8
      Namhyung Kim authored
      The compiler warns that the second argument of bsearch() should not be
      NULL, but there's a case where we might pass it.  Let's return early if
      we don't have any DSOs to search in __dsos__find_by_longname_id().
      
        util/dsos.c:184:8: runtime error: null pointer passed as argument 2, which is declared to never be null
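
      The early return can be sketched like this. The names find_name() and
      cmp_name() are hypothetical and the array of strings stands in for
      the DSO array; only the guard before bsearch() is the point.

```c
#include <stdlib.h>
#include <string.h>

/* Compare the search key with one element of a sorted string array. */
static int cmp_name(const void *key, const void *elem)
{
	return strcmp((const char *)key, *(const char *const *)elem);
}

/*
 * Hypothetical lookup: return the matching entry or NULL. The check on
 * `nr` and `sorted` keeps a NULL base pointer away from bsearch().
 */
const char *find_name(const char *const *sorted, size_t nr, const char *key)
{
	const char *const *found;

	if (nr == 0 || !sorted)
		return NULL;	/* nothing to search */

	found = bsearch(key, sorted, nr, sizeof(*sorted), cmp_name);
	return found ? *found : NULL;
}
```
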
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Closes: https://lore.kernel.org/oe-lkp/202406180932.84be448c-oliver.sang@intel.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240621170528.608772-4-namhyung@kernel.org
    • perf symbol: Simplify kernel module checking · e988a5b5
      Namhyung Kim authored
      In dso__load(), it checks if the dso is a kernel module by looking at
      the symtab type.  Actually, dso has an 'is_kmod' field to check that
      easily, and dso__set_module_info() sets both the symtab type and the
      is_kmod bit.  So checking the is_kmod bit should give the same result.
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240621170528.608772-3-namhyung@kernel.org
    • perf report: Fix condition in sort__sym_cmp() · cb39d05e
      Namhyung Kim authored
      Both hist entries are expected to be in the same hists when two are
      compared.  But the current code in the function checks one without the
      dso sort key and the other with the key.  This makes the condition true
      in any case.
      
      I guess the intention of the original commit was to add '!' for the
      right side too.  But as it should be the same, let's just remove it.
      
      Fixes: 69849fc5 ("perf hists: Move sort__has_dso into struct perf_hpp_list")
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240621170528.608772-2-namhyung@kernel.org
    • perf pmus: Fixes always false when compare duplicates aliases · dd9a426e
      Junhao He authored
      In the previous loop iteration, all the members of aliases[j-1] have
      been freed and set to NULL. But in the current iteration, the function
      pmu_alias_is_duplicate() compares aliases[j] with the already disposed
      aliases[j-1], so the function always returns false and duplicate
      aliases are never discarded.
      
      If duplicate aliases are found, the zfree of aliases[j] is skipped,
      which results in a memory leak.
      
      Use the next entry, aliases[j+1], to check for duplicate aliases, which
      fixes the NULL pointer dereference on aliases, then goto the zfree code
      snippet to release aliases[j].
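
      The loop shape described above can be sketched on a plain array of
      strings. This is not the perf code: emit_unique_and_free() is a
      hypothetical function, strcmp() stands in for
      pmu_alias_is_duplicate(), and the array is assumed to be sorted so
      that duplicates are adjacent.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch: walk a sorted array, call emit() once per distinct
 * string, and free every entry. Each entry is compared with the NEXT one,
 * which has not been freed yet, and duplicates still reach the free().
 * Returns the number of entries emitted.
 */
size_t emit_unique_and_free(char **items, size_t nr,
			    void (*emit)(const char *))
{
	size_t emitted = 0;
	size_t j;

	for (j = 0; j < nr; j++) {
		bool dup = j + 1 < nr && strcmp(items[j], items[j + 1]) == 0;

		if (!dup) {
			if (emit)
				emit(items[j]);
			emitted++;
		}
		/* Always release the entry, duplicate or not. */
		free(items[j]);
		items[j] = NULL;
	}
	return emitted;
}
```

      Comparing forward means the last copy of each run of duplicates is
      the one that gets emitted, and nothing is leaked.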
      
      After patch testing:
       $ perf list --unit=hisi_sicl,cpa pmu
      
       uncore cpa:
         cpa_p0_rd_dat_32b
              [Number of read ops transmitted by the P0 port which size is 32 bytes.
               Unit: hisi_sicl,cpa]
         cpa_p0_rd_dat_64b
              [Number of read ops transmitted by the P0 port which size is 64 bytes.
               Unit: hisi_sicl,cpa]
      
      Fixes: c3245d20 ("perf pmu: Abstract alias/event struct")
      Signed-off-by: Junhao He <hejunhao3@huawei.com>
      Cc: ravi.bangoria@amd.com
      Cc: james.clark@arm.com
      Cc: prime.zeng@hisilicon.com
      Cc: cuigaosheng1@huawei.com
      Cc: jonathan.cameron@huawei.com
      Cc: linuxarm@huawei.com
      Cc: yangyicong@huawei.com
      Cc: robh@kernel.org
      Cc: renyu.zj@linux.alibaba.com
      Cc: kjain@linux.ibm.com
      Cc: john.g.garry@oracle.com
      Cc: linux-arm-kernel@lists.infradead.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240614094318.11607-1-hejunhao3@huawei.com
    • perf unwind-libunwind: Add malloc() failure handling · 83da316a
      Yunseong Kim authored
      Add malloc() failure handling in unread_unwind_spec_debug_frame().
      This makes the caller find_proc_info() work well when the allocation
      fails.
      Signed-off-by: Yunseong Kim <yskelg@gmail.com>
      Reviewed-by: Ian Rogers <irogers@google.com>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Austin Kim <austindh.kim@gmail.com>
      Cc: shjy180909@gmail.com
      Cc: Ze Gao <zegao2021@gmail.com>
      Cc: Leo Yan <leo.yan@linux.dev>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240619204211.6438-2-yskelg@gmail.com
    • util: constant -1 with expression of type char · e9ffa312
      Yunseong Kim authored
      This patch resolves the following warning.
      
        tools/perf/util/evsel.c:1620:9: error: result of comparison of constant
         -1 with expression of type 'char' is always false
         -Werror,-Wtautological-constant-out-of-range-compare
         1620 |                 if (c == -1)
              |                     ~ ^  ~~
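
      The warning appears on targets where plain char is unsigned, because
      an unsigned char can never equal -1. A common way to avoid it, shown
      here as a general pattern and not as the exact change made in
      evsel.c, is to keep the value in an int. The function count_bytes()
      is a hypothetical example.

```c
#include <stdio.h>

/*
 * Hypothetical example: count the bytes in a stream. `c` is an int, so
 * the EOF value (-1) stays distinguishable from every valid byte even
 * where plain char is unsigned.
 */
size_t count_bytes(FILE *f)
{
	size_t n = 0;
	int c;

	while ((c = fgetc(f)) != EOF)
		n++;
	return n;
}
```
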
      Signed-off-by: Yunseong Kim <yskelg@gmail.com>
      Reviewed-by: Ian Rogers <irogers@google.com>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Austin Kim <austindh.kim@gmail.com>
      Cc: shjy180909@gmail.com
      Cc: Ze Gao <zegao2021@gmail.com>
      Cc: Leo Yan <leo.yan@linux.dev>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240619203428.6330-2-yskelg@gmail.com
    • perf: Timehist account sch delay for scheduled out running · d363c2a8
      Fernand Sieber authored
      When using perf timehist, sch delay is only computed for a waking task,
      not for a preempted task. This patch changes sch delay to account for
      both. This makes sense because testing a scheduling policy needs to
      consider the effect of scheduling delay globally, not only for waking
      tasks.
      
      Example of `perf timehist` report before the patch for `stress` tasks
      competing with each other.
      
      First column is wait time, second column sch delay, third column
      runtime.
      
      1.492060 [0000]  s    stress[81]                          1.999      0.000      2.000      R  next: stress[83]
      1.494060 [0000]  s    stress[83]                          2.000      0.000      2.000      R  next: stress[81]
      1.496060 [0000]  s    stress[81]                          2.000      0.000      2.000      R  next: stress[83]
      1.498060 [0000]  s    stress[83]                          2.000      0.000      1.999      R  next: stress[81]
      
      After the patch, it looks like this (note that sch delay is no longer
      zero):
      
      1.492060 [0000]  s    stress[81]                          1.999      1.999      2.000      R  next: stress[83]
      1.494060 [0000]  s    stress[83]                          2.000      2.000      2.000      R  next: stress[81]
      1.496060 [0000]  s    stress[81]                          2.000      2.000      2.000      R  next: stress[83]
      1.498060 [0000]  s    stress[83]                          2.000      2.000      1.999      R  next: stress[81]
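
The accounting change described above can be illustrated with a toy model. This is a minimal sketch, not perf's implementation: the function name `sch_delay_us` and its parameters are hypothetical, and times are integer microseconds taken from the sample rows.

```python
# Toy model of the sch delay accounting described in the commit message.
# Not perf's code: names and parameters here are assumptions for illustration.
# Times are integer microseconds to avoid floating-point rounding.

def sch_delay_us(sched_in, wakeup=None, preempted_at=None):
    """Return how long a runnable task waited for a CPU before sched-in."""
    if wakeup is not None:
        # Task was sleeping: the delay runs from wakeup to sched-in.
        return sched_in - wakeup
    if preempted_at is not None:
        # Task was switched out while still runnable (state R):
        # the delay runs from the preemption to the next sched-in.
        return sched_in - preempted_at
    # No wakeup and no preemption recorded: nothing to account.
    return 0

# stress[81] is switched out at 1.492060 s and gets the CPU back at
# 1.494060 s, when stress[83] is switched out in turn.
delay = sch_delay_us(sched_in=1_494_060, preempted_at=1_492_060)
print(delay / 1000)  # delay in milliseconds
```

Before the patch, only the `wakeup` branch of this model would apply, which is why the second column was 0.000 for tasks that never slept.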
      Signed-off-by: Fernand Sieber <sieberf@amazon.com>
      Reviewed-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240618090339.87482-1-sieberf@amazon.com
      d363c2a8
    • Adrian Hunter's avatar
      perf tests: Add APX and other new instructions to x86 instruction decoder test · fcd094e5
      Adrian Hunter authored
      Add samples of APX and other new instructions to the 'x86 instruction
      decoder - new instructions' test.
      
      Note the test is only available if the perf tool has been built with
      EXTRA_TESTS=1.
      
      Example:
      
        $ make EXTRA_TESTS=1 -C tools/perf
        $ tools/perf/perf test -F -v 'new ins' |& grep -i 'jmpabs\|popp\|pushp'
        Decoded ok: d5 00 a1 ef cd ab 90 78 56 34 12    jmpabs $0x1234567890abcdef
        Decoded ok: d5 08 53                    pushp  %rbx
        Decoded ok: d5 18 50                    pushp  %r16
        Decoded ok: d5 19 57                    pushp  %r31
        Decoded ok: d5 19 5f                    popp   %r31
        Decoded ok: d5 18 58                    popp   %r16
        Decoded ok: d5 08 5b                    popp   %rbx
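
The JMPABS sample line above can be checked by hand. The sketch below reads the layout off that one sample (prefix byte d5, one payload byte, opcode a1, then eight immediate bytes in little-endian order). It is an assumption-laden illustration, not the perf instruction decoder, and `decode_jmpabs` is a hypothetical name.

```python
# Minimal sketch: recover the jump target from the JMPABS sample bytes.
# Only the byte layout visible in the sample is checked; this is not a
# general x86 decoder and does not validate the prefix payload byte.

def decode_jmpabs(raw):
    """Return the 64-bit target if raw matches the sample layout, else None."""
    if len(raw) != 11 or raw[0] != 0xD5 or raw[2] != 0xA1:
        return None
    # The last eight bytes hold the absolute target, least significant first.
    return int.from_bytes(raw[3:], "little")

sample = bytes.fromhex("d5 00 a1 ef cd ab 90 78 56 34 12")
target = decode_jmpabs(sample)
print(hex(target))
```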
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chang S. Bae <chang.seok.bae@intel.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Nikolay Borisov <nik.borisov@suse.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: x86@kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240502105853.5338-11-adrian.hunter@intel.com
      fcd094e5
    • Adrian Hunter's avatar
      perf intel pt: Add new JMPABS instruction to the Intel PT instruction decoder · a44abd2c
      Adrian Hunter authored
      JMPABS is a 64-bit absolute direct jump instruction, encoded with a mandatory
      REX2 prefix. JMPABS is designed to be used in the procedure linkage table
      (PLT) to replace indirect jumps, because it has better performance. In that
      case the jump target will be amended at run time. To enable Intel PT to
      follow the code, a TIP packet is always emitted when JMPABS is traced under
      Intel PT.
      
      Refer to the Intel Advanced Performance Extensions (Intel APX) Architecture
      Specification for details.
      
      Decode JMPABS as an indirect jump, because it has an associated TIP packet
      the same as an indirect jump and the control flow should follow the TIP
      packet payload, and not assume it is the same as the on-file object code
      JMPABS target address.
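
The decoding choice above can be pictured with a toy control-flow model. This is a sketch under stated assumptions, not the Intel PT decoder: `next_ip`, `TIP_FOLLOWING`, and the mnemonic strings are hypothetical names used only for illustration.

```python
# Toy model: which address should a trace decoder continue from?
# Instructions in TIP_FOLLOWING take their destination from the TIP packet
# payload; everything else uses the target encoded in the object code.
# JMPABS is placed in this set even though its bytes contain a target,
# because that target may have been rewritten at run time.

TIP_FOLLOWING = {"jmpabs", "ret"}

def next_ip(mnemonic, encoded_target, tip_payload):
    """Pick the next instruction pointer for a taken branch."""
    if mnemonic in TIP_FOLLOWING:
        if tip_payload is None:
            raise ValueError("expected a TIP packet for " + mnemonic)
        return tip_payload
    return encoded_target

# On-file JMPABS target is 0x1000, but the trace reports 0x2000.
print(hex(next_ip("jmpabs", 0x1000, 0x2000)))
```

In this model a plain direct `jmp` would still follow its encoded target, which is the behavior the commit message contrasts JMPABS against.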
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chang S. Bae <chang.seok.bae@intel.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Nikolay Borisov <nik.borisov@suse.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: x86@kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240502105853.5338-10-adrian.hunter@intel.com
      a44abd2c
    • Chaitanya S Prakash's avatar
      perf test: Check output of the probe ... --funcs command · abc0f0c4
      Chaitanya S Prakash authored
      Test "perf probe of function from different CU" only checks if the perf
      command has failed and doesn't test the --funcs output. In the issue
      reported in the previous commit, the garbage output of the --funcs
      command was being ignored by the test when it could have been caught.
      
      The script first makes use of --funcs option with the perf probe command
      to check if the function "foo" exists in the testfile before adding a
      probe to it in the next command. The output of probe...--funcs command
      is redirected to stdout, therefore, add '| grep "foo"' to validate the
      result.
      Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
      Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Cc: anshuman.khandual@arm.com
      Cc: james.clark@arm.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240601125946.1741414-11-ChaitanyaS.Prakash@arm.com
      abc0f0c4
    • Athira Rajeev's avatar
      tools/perf: Fix parallel-perf python script to replace new python syntax ":=" usage · 7d49ced8
      Athira Rajeev authored
      perf test "perf script tests" fails as below in systems
      with python 3.6
      
      	File "/home/athira/linux/tools/perf/tests/shell/../../scripts/python/parallel-perf.py", line 442
      	if line := p.stdout.readline():
                   ^
      	SyntaxError: invalid syntax
      	--- Cleaning up ---
      	---- end(-1) ----
      	92: perf script tests: FAILED!
      
      This happens because ":=" is new syntax that assigns values
      to variables as part of a larger expression. It was introduced
      in Python 3.8 and hence fails in setups with Python 3.6.
      Address this by splitting the large expression and checking the
      value in two steps:
      Previous line: if line := p.stdout.readline():
      Current change:
      	line = p.stdout.readline()
      	if line:
      
      With patch
      
      	./perf test "perf script tests"
      	 93: perf script tests:  Ok
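
The two-step form can be tried in isolation. The sketch below is a hypothetical example rather than an excerpt from parallel-perf.py: the helper `read_lines` and the child command are assumptions, and only the assign-then-test pattern mirrors the change.

```python
# Minimal sketch: read a child's stdout line by line without ":=",
# so the code also parses on Python 3.6.
import subprocess
import sys

def read_lines(cmd):
    """Run cmd and return its stdout lines with newlines stripped."""
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, universal_newlines=True)
    lines = []
    while True:
        # Two steps instead of "if line := p.stdout.readline():".
        line = p.stdout.readline()
        if not line:
            break  # empty string means end of file
        lines.append(line.rstrip("\n"))
    p.stdout.close()
    p.wait()
    return lines

out = read_lines([sys.executable, "-c", "print('a'); print('b')"])
print(out)
```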
      Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Acked-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: akanksha@linux.ibm.com
      Cc: kjain@linux.ibm.com
      Cc: maddy@linux.ibm.com
      Cc: disgoel@linux.vnet.ibm.com
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240623064850.83720-3-atrajeev@linux.vnet.ibm.com
      7d49ced8
    • Athira Rajeev's avatar
      tools/perf: Use is_perf_pid_map_name helper function to check dso's of pattern /tmp/perf-%d.map · b9241f15
      Athira Rajeev authored
      commit 80d496be ("perf report: Add support for profiling JIT
      generated code") added support for profiling JIT generated code.
      That commit handles dso's of the form "/tmp/perf-$PID.map".
      
      Some of the references don't check for exactly the same pattern;
      some use "if (!strncmp(dso_name, "/tmp/perf-", 10))". Fix
      this by using the helper functions perf_pid_map_tid and
      is_perf_pid_map_name, which look for the proper pattern of the
      form "/tmp/perf-$PID.map", for these checks.
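
As a rough picture of what a tid-extracting helper does, the sketch below pulls the numeric part out of a "/tmp/perf-$PID.map" name. It is written in Python for illustration and is not the C helper: `pid_from_map_name` is a hypothetical name, and accepting only plain decimal digits is an assumption.

```python
# Minimal sketch: extract the PID from a "/tmp/perf-$PID.map" name.
# Returns None when the name does not have exactly that form.
import re

_PID_MAP = re.compile(r"/tmp/perf-([0-9]+)\.map")

def pid_from_map_name(name):
    """Return the PID encoded in a perf map file name, or None."""
    m = _PID_MAP.fullmatch(name)
    return int(m.group(1)) if m else None

print(pid_from_map_name("/tmp/perf-1234.map"))
print(pid_from_map_name("/tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile"))
```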
      Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: akanksha@linux.ibm.com
      Cc: kjain@linux.ibm.com
      Cc: maddy@linux.ibm.com
      Cc: disgoel@linux.vnet.ibm.com
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240623064850.83720-2-atrajeev@linux.vnet.ibm.com
      b9241f15
    • Athira Rajeev's avatar
      tools/perf: Fix the string match for "/tmp/perf-$PID.map" files in dso__load · b0979f00
      Athira Rajeev authored
      Perf test for perf probe of function from different CU fails
      as below:
      
      	./perf test -vv "test perf probe of function from different CU"
      	116: test perf probe of function from different CU:
      	--- start ---
      	test child forked, pid 2679
      	Failed to find symbol foo in /tmp/perf-uprobe-different-cu-sh.Msa7iy89bx/testfile
      	  Error: Failed to add events.
      	--- Cleaning up ---
      	"foo" does not hit any event.
      	  Error: Failed to delete events.
      	---- end(-1) ----
      	116: test perf probe of function from different CU                   : FAILED!
      
      The test does the following to probe function "foo":
      
      	# gcc -g -Og -flto -c /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile-foo.c
      	-o /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile-foo.o
      	# gcc -g -Og -c /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile-main.c
      	-o /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile-main.o
      	# gcc -g -Og -o /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile
      	/tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile-foo.o
      	/tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile-main.o
      
      	# ./perf probe -x /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile foo
      	Failed to find symbol foo in /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile
      	   Error: Failed to add events.
      
      Perf probe fails to find symbol foo in the executable placed in
      /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7
      
      Simple reproducer:
      
       # mktemp -d /tmp/perf-checkXXXXXXXXXX
         /tmp/perf-checkcWpuLRQI8j
      
       # gcc -g -o test test.c
       # cp test /tmp/perf-checkcWpuLRQI8j/
       # nm /tmp/perf-checkcWpuLRQI8j/test | grep foo
         00000000100006bc T foo
      
       # ./perf probe -x /tmp/perf-checkcWpuLRQI8j/test foo
         Failed to find symbol foo in /tmp/perf-checkcWpuLRQI8j/test
            Error: Failed to add events.
      
      But it works with other paths such as /tmp/perf/test. It fails only
      for paths that start with "/tmp/perf-".
      
      On further debugging, commit 80d496be ("perf report: Add support
      for profiling JIT generated code") added support for profiling JIT
      generated code. That commit handles dso's of the form
      "/tmp/perf-$PID.map".
      
      The check used "if (strncmp(self->name, "/tmp/perf-", 10) == 0)"
      to match "/tmp/perf-$PID.map". With that check, any dso whose
      path starts with "/tmp/perf-" is handled by the special case
      (not only JIT-created map files). Fix this by changing the
      string pattern to check for "/tmp/perf-%d.map". Add a helper
      function is_perf_pid_map_name to do this check. In "struct dso",
      dso->long_name holds the long name of the dso file. Since the
      /tmp/perf-$PID.map check uses the complete name, use dso__long_name for
      the string name.
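
The difference between the two checks can be seen on the paths from this report. The sketch below is a Python illustration, not the C code: `prefix_check` mirrors the 10-character strncmp comparison, and `strict_check` is a hypothetical stand-in for a full "/tmp/perf-%d.map" match that assumes plain decimal digits.

```python
# Minimal sketch: prefix comparison versus full-pattern match.
import re

def prefix_check(name):
    """Mirror of strncmp(name, "/tmp/perf-", 10) == 0."""
    return name[:10] == "/tmp/perf-"

def strict_check(name):
    """True only for names of the form /tmp/perf-<digits>.map."""
    return re.fullmatch(r"/tmp/perf-[0-9]+\.map", name) is not None

for path in ("/tmp/perf-1234.map",
             "/tmp/perf-checkcWpuLRQI8j/test",
             "/tmp/perf/test"):
    print(path, prefix_check(path), strict_check(path))
```

The second path is the failing case: the prefix comparison accepts it although it is an ordinary executable, while the full-pattern match rejects it.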
      
      With the fix,
      	# ./perf test "test perf probe of function from different CU"
      	117: test perf probe of function from different CU                   : Ok
      
      Fixes: 56cbeacf ("perf probe: Add test for regression introduced by switch to die_get_decl_file()")
      Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Reviewed-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
      Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: akanksha@linux.ibm.com
      Cc: kjain@linux.ibm.com
      Cc: maddy@linux.ibm.com
      Cc: disgoel@linux.vnet.ibm.com
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240623064850.83720-1-atrajeev@linux.vnet.ibm.com
      b0979f00
  4. 24 Jun, 2024 1 commit
  5. 21 Jun, 2024 4 commits
  6. 20 Jun, 2024 9 commits