1. 16 Feb, 2024 18 commits
  2. 15 Feb, 2024 4 commits
    • perf build: Cleanup perf register configuration · 81901fc0
      Leo Yan authored
      The goal is to let the tool always enable the perf register feature for
      both native parsing and cross parsing; the current code does not
      depend on the macro 'HAVE_PERF_REGS_SUPPORT'.
      
      This patch removes the variable 'NO_PERF_REGS' and the defined macro
      'HAVE_PERF_REGS_SUPPORT' from the Makefile.
      Signed-off-by: Leo Yan <leo.yan@linux.dev>
      Reviewed-by: Ian Rogers <irogers@google.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Ming Wang <wangming01@loongson.cn>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: linux-csky@vger.kernel.org
      Cc: linux-riscv@lists.infradead.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240214113947.240957-5-leo.yan@linux.dev
      81901fc0
    • perf parse-regs: Introduce a weak function arch__sample_reg_masks() · 9a4e47ef
      Leo Yan authored
      Every architecture can provide a register list for sampling. If an
      architecture doesn't support register sampling, it won't define the data
      structure 'sample_reg_masks'. Consequently, any code using this
      structure must be protected by the macro 'HAVE_PERF_REGS_SUPPORT'.
      
      This patch defines a weak function, arch__sample_reg_masks(), which will
      be replaced by an architecture-defined function for returning the
      architecture's register list. With this refactoring the function always
      exists, so the condition check for 'HAVE_PERF_REGS_SUPPORT' is no longer
      needed and is removed.
      Signed-off-by: Leo Yan <leo.yan@linux.dev>
      Reviewed-by: Ian Rogers <irogers@google.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Ming Wang <wangming01@loongson.cn>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: linux-csky@vger.kernel.org
      Cc: linux-riscv@lists.infradead.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240214113947.240957-4-leo.yan@linux.dev
      9a4e47ef
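The weak-function approach described in this commit can be sketched in a single C file. This is a minimal illustration, not the code from the patch: the `struct sample_reg` layout, the empty `no_sample_regs` list and the `count_sample_regs()` caller are assumptions made for the example, and `__attribute__((weak))` is the GCC/Clang spelling of a weak symbol.

```c
#include <stddef.h>

/* Hypothetical stand-in for a register name / bit mask pair. */
struct sample_reg {
	const char *name;
	unsigned long long mask;
};

/* Empty, terminated list used when an architecture provides nothing. */
static const struct sample_reg no_sample_regs[] = {
	{ NULL, 0 },
};

/*
 * Weak default: if another object file defines a non-weak function with
 * the same name, the linker uses that definition instead of this one.
 */
__attribute__((weak)) const struct sample_reg *arch__sample_reg_masks(void)
{
	return no_sample_regs;
}

/* A caller needs no #ifdef, because the function always exists. */
size_t count_sample_regs(void)
{
	const struct sample_reg *r;
	size_t n = 0;

	for (r = arch__sample_reg_masks(); r->name != NULL; r++)
		n++;
	return n;
}
```

With only the weak default linked in, `count_sample_regs()` walks the empty list. An architecture file that supplies its own `arch__sample_reg_masks()` would change the result without any change to the caller.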
    • perf parse-regs: Always build perf register functions · ec87c99d
      Leo Yan authored
      Currently, the macro HAVE_PERF_REGS_SUPPORT is used as a switch to turn
      the perf register code on or off. If an architecture does not support
      perf registers, the macro disables perf register parsing for both
      native parsing and cross parsing of other architectures.
      
      To support both the native parsing and cross parsing, the tool should
      always build the perf regs functions. Thus, this patch removes
      HAVE_PERF_REGS_SUPPORT from the perf regs files.
      Signed-off-by: Leo Yan <leo.yan@linux.dev>
      Reviewed-by: Ian Rogers <irogers@google.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Ming Wang <wangming01@loongson.cn>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: linux-csky@vger.kernel.org
      Cc: linux-riscv@lists.infradead.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240214113947.240957-3-leo.yan@linux.dev
      ec87c99d
    • perf build: Remove unused CONFIG_PERF_REGS · fca6af7b
      Leo Yan authored
      CONFIG_PERF_REGS is not used; remove it.
      Signed-off-by: Leo Yan <leo.yan@linux.dev>
      Reviewed-by: Ian Rogers <irogers@google.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Ming Wang <wangming01@loongson.cn>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: linux-csky@vger.kernel.org
      Cc: linux-riscv@lists.infradead.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240214113947.240957-2-leo.yan@linux.dev
      fca6af7b
  3. 13 Feb, 2024 4 commits
  4. 12 Feb, 2024 7 commits
    • perf maps: Locking tidy up of nr_maps · 923e4616
      Ian Rogers authored
      After this change maps__nr_maps is only used by tests; existing users
      are migrated to maps__empty. Compute maps__empty under the read lock.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Colin Ian King <colin.i.king@gmail.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Liam Howlett <liam.howlett@oracle.com>
      Cc: Artem Savkov <asavkov@redhat.com>
      Cc: bpf@vger.kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240210031746.4057262-7-irogers@google.com
      923e4616
    • perf maps: Hide maps internals · ff0bd799
      Ian Rogers authored
      Move the struct into the C file. Add maps__equal to work around
      exposing the struct for reference count checking. Add accessors for
      the unwind_libunwind_ops. Move maps_list_node to its only use in
      symbol.c.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Colin Ian King <colin.i.king@gmail.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Liam Howlett <liam.howlett@oracle.com>
      Cc: Artem Savkov <asavkov@redhat.com>
      Cc: bpf@vger.kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240210031746.4057262-6-irogers@google.com
      ff0bd799
    • perf maps: Get map before returning in maps__find_next_entry · 39a27325
      Ian Rogers authored
      Finding a map is done under a lock; returning the map without a
      reference count means it can be removed without notice, causing
      use-after-free. Grab a reference count to the map within the lock
      region and return this. Fix up locations that need a map__put
      following this.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Colin Ian King <colin.i.king@gmail.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Liam Howlett <liam.howlett@oracle.com>
      Cc: Artem Savkov <asavkov@redhat.com>
      Cc: bpf@vger.kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240210031746.4057262-5-irogers@google.com
      39a27325
    • perf maps: Get map before returning in maps__find_by_name · 107ef66c
      Ian Rogers authored
      Finding a map is done under a lock; returning the map without a
      reference count means it can be removed without notice, causing
      use-after-free. Grab a reference count to the map within the lock
      region and return this. Fix up locations that need a map__put
      following this. Also fix some reference counted pointer comparisons.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Colin Ian King <colin.i.king@gmail.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Liam Howlett <liam.howlett@oracle.com>
      Cc: Artem Savkov <asavkov@redhat.com>
      Cc: bpf@vger.kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240210031746.4057262-4-irogers@google.com
      107ef66c
    • perf maps: Get map before returning in maps__find · 42fd623b
      Ian Rogers authored
      Finding a map is done under a lock; returning the map without a
      reference count means it can be removed without notice, causing
      use-after-free. Grab a reference count to the map within the lock
      region and return this. Fix up locations that need a map__put
      following this.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Colin Ian King <colin.i.king@gmail.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Liam Howlett <liam.howlett@oracle.com>
      Cc: Artem Savkov <asavkov@redhat.com>
      Cc: bpf@vger.kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240210031746.4057262-3-irogers@google.com
      42fd623b
    • perf maps: Switch from rbtree to lazily sorted array for addresses · 659ad349
      Ian Rogers authored
      Maps is a collection of maps primarily sorted by the starting address
      of the map. Prior to this change the maps were held in an rbtree
      requiring 4 pointers per node. Prior to reference count checking, the
      rbnode was embedded in the map so 3 pointers per node were
      necessary. This change switches the rbtree to an array lazily sorted
      by address, much like the array that sorts nodes by name. One pointer
      is needed per node, but to avoid excessive resizing the backing array
      may be twice the number of used elements, meaning the memory overhead
      is roughly half that of the rbtree. For a perf record with
      "--no-bpf-event -g -a" of true, the memory overhead of perf inject is
      reduced from 3.3MB to 3MB, so roughly 10% or 300KB is saved.
      
      Map inserts always happen at the end of the array. The code tracks
      whether the insertion violates the sorting property. O(log n) rb-tree
      complexity is switched to O(1).
      
      Remove slides the array, so O(log n) rb-tree complexity is degraded to
      O(n).
      
      A find may need to sort the array using qsort which is O(n*log n), but
      in general the maps should be sorted and so average performance should
      be O(log n) as with the rbtree.
      
      An rbtree node consumes a cache line, but with the array 4 nodes fit
      on a cache line. Iteration is simplified to scanning an array rather
      than pointer chasing.
      
      Overall it is expected the performance after the change should be
      comparable to before, but with half of the memory consumed.
      
      To avoid a list and repeated logic around splitting maps,
      maps__merge_in is rewritten in terms of
      maps__fixup_overlap_and_insert. maps__merge_in splits the given mapping,
      inserting the remaining gaps. maps__fixup_overlap_and_insert splits the
      existing mappings, then adds the incoming mapping. By adding the new
      mapping first, then re-inserting the existing mappings the splitting
      behavior matches.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Colin Ian King <colin.i.king@gmail.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Liam Howlett <liam.howlett@oracle.com>
      Cc: Artem Savkov <asavkov@redhat.com>
      Cc: bpf@vger.kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240210031746.4057262-2-irogers@google.com
      659ad349
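A lazily sorted array of the kind described here can be sketched as follows. The names `struct range`, `struct range_set`, `range_set__insert()` and `range_set__find()` are hypothetical, the sketch stores ranges by value rather than pointers to maps, and it assumes ranges do not overlap. It illustrates the idea only and is not the perf implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical address range [start, end). */
struct range {
	unsigned long start, end;
};

/* Growable array plus a flag recording whether it is currently sorted. */
struct range_set {
	struct range *v;
	size_t nr, cap;
	bool sorted;
};

/* qsort() comparator: order ranges by start address. */
static int range_cmp(const void *a, const void *b)
{
	const struct range *ra = a, *rb = b;

	return (ra->start > rb->start) - (ra->start < rb->start);
}

/* bsearch() comparator: does the key address fall inside the range? */
static int addr_cmp(const void *key, const void *elem)
{
	unsigned long addr = *(const unsigned long *)key;
	const struct range *r = elem;

	if (addr < r->start)
		return -1;
	if (addr >= r->end)
		return 1;
	return 0;
}

/* Append at the end (amortized O(1)) and note whether order was broken. */
int range_set__insert(struct range_set *s, struct range r)
{
	if (s->nr == s->cap) {
		size_t cap = s->cap ? s->cap * 2 : 4;
		struct range *v = realloc(s->v, cap * sizeof(*v));

		if (!v)
			return -1;
		s->v = v;
		s->cap = cap;
	}
	if (s->nr == 0)
		s->sorted = true;
	else if (s->sorted && s->v[s->nr - 1].start > r.start)
		s->sorted = false;
	s->v[s->nr++] = r;
	return 0;
}

/* Sort only if an earlier insert broke the order, then binary search. */
struct range *range_set__find(struct range_set *s, unsigned long addr)
{
	if (s->nr == 0)
		return NULL;
	if (!s->sorted) {
		qsort(s->v, s->nr, sizeof(*s->v), range_cmp);
		s->sorted = true;
	}
	return bsearch(&addr, s->v, s->nr, sizeof(*s->v), addr_cmp);
}
```

When inserts arrive in address order the `sorted` flag stays set and a find costs only the binary search. The `qsort()` call is paid once after an out-of-order insert.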
    • Merge branch 'perf-tools' into perf-tools-next · 39d14c0d
      Namhyung Kim authored
      To get some fixes in the perf test and JSON metrics into the development
      branch.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      39d14c0d
  5. 10 Feb, 2024 1 commit
  6. 09 Feb, 2024 6 commits
    • perf stat: Support per-cluster aggregation · cbc917a1
      Yicong Yang authored
      Some platforms have 'cluster' topology and CPUs in the cluster will
      share resources like L3 Cache Tag (for HiSilicon Kunpeng SoC) or L2
      cache (for Intel Jacobsville). Parsing and building cluster topology
      has been supported since [1].
      
      perf stat already supports aggregation for other topologies like
      die or socket, etc. It'll be useful to aggregate per-cluster to find
      problems like L3T bandwidth contention.
      
      This patch adds support for the "--per-cluster" option for per-cluster
      aggregation. It also updates the docs and the related test. The output
      will look like:
      
      [root@localhost tmp]# perf stat -a -e LLC-load --per-cluster -- sleep 5
      
       Performance counter stats for 'system wide':
      
      S56-D0-CLS158    4      1,321,521,570      LLC-load
      S56-D0-CLS594    4        794,211,453      LLC-load
      S56-D0-CLS1030    4             41,623      LLC-load
      S56-D0-CLS1466    4             41,646      LLC-load
      S56-D0-CLS1902    4             16,863      LLC-load
      S56-D0-CLS2338    4             15,721      LLC-load
      S56-D0-CLS2774    4             22,671      LLC-load
      [...]
      
      On a legacy system without clusters or cluster support, the output will
      look like:
      [root@localhost perf]# perf stat -a -e cycles --per-cluster -- sleep 1
      
       Performance counter stats for 'system wide':
      
      S56-D0-CLS0   64         18,011,485      cycles
      S7182-D0-CLS0   64         16,548,835      cycles
      
      Note that this patch doesn't mix the cluster information in the outputs
      of --per-core to avoid breaking any tools/scripts using it.
      
      Note that perf recently added "--per-cache" aggregation, but it's not
      the same as cluster aggregation, although cluster CPUs may share some
      cache resources. For example on my machine all clusters within a die
      share the same L3 cache:
      $ cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list
      0-31
      $ cat /sys/devices/system/cpu/cpu0/topology/cluster_cpus_list
      0-3
      
      [1] commit c5e22fef ("topology: Represent clusters of CPUs within a die")
      Tested-by: Jie Zhan <zhanjie9@hisilicon.com>
      Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
      Cc: james.clark@arm.com
      Cc: 21cnbao@gmail.com
      Cc: prime.zeng@hisilicon.com
      Cc: Jonathan.Cameron@huawei.com
      Cc: fanghao11@huawei.com
      Cc: linuxarm@huawei.com
      Cc: tim.c.chen@intel.com
      Cc: linux-arm-kernel@lists.infradead.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240208024026.2691-1-yangyicong@huawei.com
      cbc917a1
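The two sysfs files quoted in this commit message hold CPU lists. As a rough illustration of why per-cache and per-cluster aggregation differ on that machine, the hypothetical helper below (`cpu_list__count()` is not part of perf) counts the CPUs named by such a list: "0-31" names the 32 CPUs sharing the L3 cache, while "0-3" names the 4 CPUs of one cluster.

```c
#include <stdlib.h>

/*
 * Count the CPUs named by a list such as "0-3" or "0,2-5".
 * A trailing newline, as read from a sysfs file, is accepted.
 * Returns -1 on malformed input.
 */
int cpu_list__count(const char *s)
{
	int count = 0;

	while (*s && *s != '\n') {
		char *end;
		long lo = strtol(s, &end, 10);
		long hi = lo;

		if (end == s || lo < 0)
			return -1;
		if (*end == '-') {
			/* Range form "lo-hi": parse the upper bound. */
			s = end + 1;
			hi = strtol(s, &end, 10);
			if (end == s || hi < lo)
				return -1;
		}
		count += (int)(hi - lo + 1);

		if (*end == ',')
			end++;
		else if (*end && *end != '\n')
			return -1;
		s = end;
	}
	return count;
}
```

With the values shown in the message, the L3 list covers eight times as many CPUs as the cluster list, so a per-cache row would lump together eight per-cluster rows.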
    • perf tools: Remove misleading comments on map functions · 9a440bb2
      Namhyung Kim authored
      Where the code converts a sample IP to or from an objdump-capable one,
      there's a comment saying that kernel modules have DSO_SPACE__USER. But
      commit 02213cec ("perf maps: Mark module DSOs with kernel type") changed
      that, which makes the comment confusing. Let's get rid of it.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Reviewed-by: Ian Rogers <irogers@google.com>
      Link: https://lore.kernel.org/r/20240208181025.1329645-1-namhyung@kernel.org
      9a440bb2
    • perf thread_map: Free strlist on normal path in thread_map__new_by_tid_str() · 1eb3d924
      Yang Jihong authored
      slist needs to be freed on both the error path and the normal path in
      thread_map__new_by_tid_str().
      
      Fixes: b52956c9 ("perf tools: Allow multiple threads or processes in record, stat, top")
      Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240206083228.172607-6-yangjihong1@huawei.com
      1eb3d924
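One common way to make sure a temporary is released on both paths is to route the normal and error exits through the same cleanup label. The sketch below assumes a hypothetical `parse_tids()` that uses a `strdup()` copy as its temporary. It is not the code of thread_map__new_by_tid_str(), and the strlist is only imitated by that copy.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/*
 * Parse a comma-separated list such as "1,22,333" into a newly allocated
 * int array stored in *out (the caller frees it). Returns the number of
 * entries, or -1 on error, in which case *out is left untouched.
 */
int parse_tids(const char *str, int **out)
{
	char *copy = strdup(str);	/* temporary, plays the role of slist */
	char *tok, *save;
	int *tids = NULL;
	int n = 0, ret = -1;

	if (!copy)
		return -1;

	for (tok = strtok_r(copy, ",", &save); tok;
	     tok = strtok_r(NULL, ",", &save)) {
		char *end;
		long v = strtol(tok, &end, 10);
		int *tmp;

		if (end == tok || *end != '\0')
			goto out_free;		/* error path */

		tmp = realloc(tids, (n + 1) * sizeof(*tids));
		if (!tmp)
			goto out_free;		/* error path */
		tids = tmp;
		tids[n++] = (int)v;
	}

	*out = tids;
	tids = NULL;		/* ownership moved to the caller */
	ret = n;
out_free:
	free(tids);		/* frees partial results on error, no-op on success */
	free(copy);		/* the temporary is freed on both paths */
	return ret;
}
```

Because the success path falls through into `out_free`, the temporary cannot be forgotten when a new early exit is added later.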
    • perf sched: Move curr_pid and cpu_last_switched initialization to perf_sched__{lat|map|replay}() · bd2cdf26
      Yang Jihong authored
      The curr_pid and cpu_last_switched are used only for
      'perf sched replay/latency/map'. Put their initialization in
      perf_sched__{lat|map|replay}() to reduce unnecessary actions in other
      commands.
      
      Simple functional testing:
      
        # perf sched record perf bench sched messaging
        # Running 'sched/messaging' benchmark:
        # 20 sender and receiver processes per group
        # 10 groups == 400 processes run
      
             Total time: 0.209 [sec]
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 16.456 MB perf.data (147907 samples) ]
      
        # perf sched lat
      
         -------------------------------------------------------------------------------------------------------------------------------------------
          Task                  |   Runtime ms  | Switches | Avg delay ms    | Max delay ms    | Max delay start           | Max delay end          |
         -------------------------------------------------------------------------------------------------------------------------------------------
          sched-messaging:(401) |   2990.699 ms |    38705 | avg:   0.661 ms | max:  67.046 ms | max start: 456532.624830 s | max end: 456532.691876 s
          qemu-system-x86:(7)   |    179.764 ms |     2191 | avg:   0.152 ms | max:  21.857 ms | max start: 456532.576434 s | max end: 456532.598291 s
          sshd:48125            |      0.522 ms |        2 | avg:   0.037 ms | max:   0.046 ms | max start: 456532.514610 s | max end: 456532.514656 s
        <SNIP>
          ksoftirqd/11:82       |      0.063 ms |        1 | avg:   0.005 ms | max:   0.005 ms | max start: 456532.769366 s | max end: 456532.769371 s
          kworker/9:0-mm_:34624 |      0.233 ms |       20 | avg:   0.004 ms | max:   0.007 ms | max start: 456532.690804 s | max end: 456532.690812 s
          migration/13:93       |      0.000 ms |        1 | avg:   0.004 ms | max:   0.004 ms | max start: 456532.512669 s | max end: 456532.512674 s
         -----------------------------------------------------------------------------------------------------------------
          TOTAL:                |   3180.750 ms |    41368 |
         ---------------------------------------------------
      
        # echo $?
        0
      
        # perf sched map
          *A0                                                               456532.510141 secs A0 => migration/0:15
          *.                                                                456532.510171 secs .  => swapper:0
           .  *B0                                                           456532.510261 secs B0 => migration/1:21
           .  *.                                                            456532.510279 secs
        <SNIP>
           L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7 *L7  .   .   .   .    456532.785979 secs
           L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7 *L7  .   .   .    456532.786054 secs
           L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7 *L7  .   .    456532.786127 secs
           L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7 *L7  .    456532.786197 secs
           L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7 *L7   456532.786270 secs
        # echo $?
        0
      
        # perf sched replay
        run measurement overhead: 108 nsecs
        sleep measurement overhead: 66473 nsecs
        the run test took 1000002 nsecs
        the sleep test took 1082686 nsecs
        nr_run_events:        49334
        nr_sleep_events:      50054
        nr_wakeup_events:     34701
        target-less wakeups:  165
        multi-target wakeups: 766
        task      0 (             swapper:         0), nr_events: 15419
        task      1 (             swapper:         1), nr_events: 1
        task      2 (             swapper:         2), nr_events: 1
        <SNIP>
        task    715 (     sched-messaging:    110248), nr_events: 1438
        task    716 (     sched-messaging:    110249), nr_events: 512
        task    717 (     sched-messaging:    110250), nr_events: 500
        task    718 (     sched-messaging:    110251), nr_events: 537
        task    719 (     sched-messaging:    110252), nr_events: 823
        ------------------------------------------------------------
        #1  : 1325.288, ravg: 1325.29, cpu: 7823.35 / 7823.35
        #2  : 1363.606, ravg: 1329.12, cpu: 7655.53 / 7806.56
        #3  : 1349.494, ravg: 1331.16, cpu: 7544.80 / 7780.39
        #4  : 1311.488, ravg: 1329.19, cpu: 7495.13 / 7751.86
        #5  : 1309.902, ravg: 1327.26, cpu: 7266.65 / 7703.34
        #6  : 1309.535, ravg: 1325.49, cpu: 7843.86 / 7717.39
        #7  : 1316.482, ravg: 1324.59, cpu: 7854.41 / 7731.09
        #8  : 1366.604, ravg: 1328.79, cpu: 7955.81 / 7753.57
        #9  : 1326.286, ravg: 1328.54, cpu: 7466.86 / 7724.90
        #10 : 1356.653, ravg: 1331.35, cpu: 7566.60 / 7709.07
        # echo $?
        0
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240206083228.172607-5-yangjihong1@huawei.com
      bd2cdf26
    • perf sched: Move curr_thread initialization to perf_sched__map() · 5e895278
      Yang Jihong authored
      The curr_thread is used only for 'perf sched map'. Put its initialization
      in perf_sched__map() to reduce unnecessary actions in other commands.
      
      Simple functional testing:
      
        # perf sched record perf bench sched messaging
        # Running 'sched/messaging' benchmark:
        # 20 sender and receiver processes per group
        # 10 groups == 400 processes run
      
             Total time: 0.197 [sec]
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 15.526 MB perf.data (140095 samples) ]
      
        # perf sched map
          *A0                                                               451264.532445 secs A0 => migration/0:15
          *.                                                                451264.532468 secs .  => swapper:0
           .  *B0                                                           451264.532537 secs B0 => migration/1:21
           .  *.                                                            451264.532560 secs
           .   .  *C0                                                       451264.532644 secs C0 => migration/2:27
           .   .  *.                                                        451264.532668 secs
           .   .   .  *D0                                                   451264.532753 secs D0 => migration/3:33
           .   .   .  *.                                                    451264.532778 secs
           .   .   .   .  *E0                                               451264.532861 secs E0 => migration/4:39
           .   .   .   .  *.                                                451264.532886 secs
           .   .   .   .   .  *F0                                           451264.532973 secs F0 => migration/5:45
        <SNIP>
           A7  A7  A7  A7  A7 *A7  .   .   .   .   .   .   .   .   .   .    451264.790785 secs
           A7  A7  A7  A7  A7  A7 *A7  .   .   .   .   .   .   .   .   .    451264.790858 secs
           A7  A7  A7  A7  A7  A7  A7 *A7  .   .   .   .   .   .   .   .    451264.790934 secs
           A7  A7  A7  A7  A7  A7  A7  A7 *A7  .   .   .   .   .   .   .    451264.791004 secs
           A7  A7  A7  A7  A7  A7  A7  A7  A7 *A7  .   .   .   .   .   .    451264.791075 secs
           A7  A7  A7  A7  A7  A7  A7  A7  A7  A7 *A7  .   .   .   .   .    451264.791143 secs
           A7  A7  A7  A7  A7  A7  A7  A7  A7  A7  A7 *A7  .   .   .   .    451264.791232 secs
           A7  A7  A7  A7  A7  A7  A7  A7  A7  A7  A7  A7 *A7  .   .   .    451264.791336 secs
           A7  A7  A7  A7  A7  A7  A7  A7  A7  A7  A7  A7  A7 *A7  .   .    451264.791407 secs
           A7  A7  A7  A7  A7  A7  A7  A7  A7  A7  A7  A7  A7  A7 *A7  .    451264.791484 secs
           A7  A7  A7  A7  A7  A7  A7  A7  A7  A7  A7  A7  A7  A7  A7 *A7   451264.791553 secs
        # echo $?
        0
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240206083228.172607-4-yangjihong1@huawei.com
      5e895278
    • perf sched: Fix memory leak in perf_sched__map() · ef76a5af
      Yang Jihong authored
      perf_sched__map() needs to free the memory of map_cpus, color_pids and
      color_cpus on the normal path and roll back allocated memory on the
      error path.
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240206083228.172607-3-yangjihong1@huawei.com
      ef76a5af
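Freeing on the normal path and rolling back on the error path follows a familiar C shape. In this sketch only the field names `map_cpus`, `color_pids` and `color_cpus` come from the commit message; `struct map_state`, the element types and both functions are assumptions for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical holder for the three allocations named in the commit. */
struct map_state {
	int *map_cpus;
	int *color_pids;
	int *color_cpus;
};

/* Allocate all three arrays, undoing earlier ones if a later one fails. */
int map_state__init(struct map_state *st, size_t n)
{
	st->map_cpus = calloc(n, sizeof(*st->map_cpus));
	if (!st->map_cpus)
		return -1;

	st->color_pids = calloc(n, sizeof(*st->color_pids));
	if (!st->color_pids)
		goto out_free_map_cpus;

	st->color_cpus = calloc(n, sizeof(*st->color_cpus));
	if (!st->color_cpus)
		goto out_free_color_pids;

	return 0;

out_free_color_pids:
	free(st->color_pids);
	st->color_pids = NULL;
out_free_map_cpus:
	free(st->map_cpus);
	st->map_cpus = NULL;
	return -1;
}

/* Normal-path teardown: release everything that init allocated. */
void map_state__exit(struct map_state *st)
{
	free(st->color_cpus);
	free(st->color_pids);
	free(st->map_cpus);
	st->color_cpus = NULL;
	st->color_pids = NULL;
	st->map_cpus = NULL;
}
```

The labels unwind in reverse allocation order, so each failure point frees exactly what was allocated before it.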