1. 17 Feb, 2023 3 commits
    • perf tests stat_all_metrics: Change true workload to sleep workload for system wide check · f9fa0778
      Kajol Jain authored
      Test case stat_all_metrics.sh fails on powerpc:
      
      98: perf all metrics test : FAILED!
      
      Logs with verbose:
      
        [command]# ./perf test 98 -vv
         98: perf all metrics test                                           :
         --- start ---
        test child forked, pid 13262
        Testing BRU_STALL_CPI
        Testing COMPLETION_STALL_CPI
         ----
        Testing TOTAL_LOCAL_NODE_PUMPS_P23
        Metric 'TOTAL_LOCAL_NODE_PUMPS_P23' not printed in:
        Error:
        Invalid event (hv_24x7/PM_PB_LNS_PUMP23,chip=3/) in per-thread mode, enable system wide with '-a'.
        Testing TOTAL_LOCAL_NODE_PUMPS_RETRIES_P01
        Metric 'TOTAL_LOCAL_NODE_PUMPS_RETRIES_P01' not printed in:
        Error:
        Invalid event (hv_24x7/PM_PB_RTY_LNS_PUMP01,chip=3/) in per-thread mode, enable system wide with '-a'.
         ----
      
      The logs above show that some of the hv-24x7 metric events fail, and
      the error message suggests running them with the -a option. This
      behavior changed after commit a4b8cfca ("perf stat: Delay metric
      parsing"), which delayed the metric parsing phase so that the perf tool
      now determines whether the target is system-wide before parsing
      metrics. With this change, perf_event_open fails, as expected, when
      uncore events are monitored with a per-thread workload.
      
      The perf all metrics test case fails because some of the hv-24x7 metric
      events may need a longer-running workload with system-wide monitoring
      to collect data. Fix this by changing the workload used in the
      system-wide check from true to sleep 0.01.
      
      Result with this patch on powerpc:
      
        98: perf all metrics test : Ok
      
      Fixes: a4b8cfca ("perf stat: Delay metric parsing")
      Suggested-by: Ian Rogers <irogers@google.com>
      Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
      Tested-by: Disha Goel <disgoel@linux.ibm.com>
      Tested-by: Ian Rogers <irogers@google.com>
      Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
      Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: https://lore.kernel.org/r/20230215093827.124921-1-kjain@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf vendor events power10: Add JSON metric events to present CPI stall cycles in powerpc · cf26e043
      Athira Rajeev authored
      The Power10 Performance Monitoring Unit (PMU) provides events for
      understanding the stall cycles of different pipeline stages. Together
      with completed instructions, these events provide useful metrics for
      application tuning.
      
      This patch implements the JSON changes needed to collect counter
      statistics and present high-level CPI stall breakdown metrics. The new
      metric group, named "CPI_STALL_RATIO", presents these stall metrics:
      
      - DISPATCHED_CPI ( Dispatch stall cycles per insn )
      - ISSUE_STALL_CPI ( Issue stall cycles per insn )
      - EXECUTION_STALL_CPI ( Execution stall cycles per insn )
      - COMPLETION_STALL_CPI ( Completion stall cycles per insn )
      
      To avoid multiplexing of events, the PM_RUN_INST_CMPL event has been
      modified to use PMC5 (performance monitoring counter 5) instead of
      PMC4. This change is needed because the completion stall event uses
      PMC4.
      
      Usage example:
      
       ./perf stat --metric-no-group -M CPI_STALL_RATIO <workload>
      
       Performance counter stats for 'workload':
      
          63,056,817,982      PM_CMPL_STALL                    #     0.28 COMPLETION_STALL_CPI
       1,743,988,038,896      PM_ISSUE_STALL                   #     7.73 ISSUE_STALL_CPI
         225,597,495,030      PM_RUN_INST_CMPL                 #     6.18 DISPATCHED_CPI
                                                        #    37.48 EXECUTION_STALL_CPI
       1,393,916,546,654      PM_DISP_STALL_CYC
       8,455,376,836,463      PM_EXEC_STALL

      "--metric-no-group" is used to force PM_RUN_INST_CMPL to be scheduled
      in all groups for better accuracy.
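      Judging from the counts in the usage example, each metric appears to be
      the corresponding stall event divided by PM_RUN_INST_CMPL (for example,
      63,056,817,982 / 225,597,495,030 ≈ 0.28 for COMPLETION_STALL_CPI). A
      minimal C sketch of that arithmetic follows; the helper name
      stall_cpi() is hypothetical, and the real metrics are defined in the
      JSON files, not in C.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: stall cycles per completed instruction, i.e. the
 * ratio the CPI_STALL_RATIO metrics appear to report.
 */
static double stall_cpi(uint64_t stall_cycles, uint64_t run_inst_cmpl)
{
	assert(run_inst_cmpl > 0);	/* no instructions: ratio undefined */
	return (double)stall_cycles / (double)run_inst_cmpl;
}
```

      For example, stall_cpi(8455376836463, 225597495030) evaluates to about
      37.48, which matches EXECUTION_STALL_CPI in the output above.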
      Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Disha Goel <disgoel@linux.ibm.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: https://lore.kernel.org/r/20230216061240.18067-1-atrajeev@linux.vnet.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf intel-pt: Synthesize cycle events · 7e55b956
      Steinar H. Gunderson authored
      There is no good reason why we cannot synthesize "cycle" events from
      Intel PT just as we can synthesize "instruction" events, in particular
      when CYC packets are available. This enables using PT to get much more
      accurate cycle profiles than regular sampling (record -e cycles) when
      the work lasts for very short periods (<10 ms). Thus, add support for
      this, based on the existing IPC calculation framework. The new option
      to --itrace is "y" (for cYcles), as c was taken for calls. Cycle and
      instruction events can be synthesized together, and are by default.
      
      The only real caveat is that CYC packets are only emitted whenever some
      other packet is, which in practice is when a branch instruction is
      encountered (and not even on all branches). Thus, even without
      subsampling (e.g. --itrace=y0ns), it is impossible to get more accuracy than a
      single basic block, and all cycles spent executing that block will get
      attributed to the branch instruction that ends the packet.  Thus, one
      cannot know whether the cycles came from e.g. a specific load, a
      mispredicted branch, or something else. When subsampling (which is the
      default), the cycle events will get smeared out even more, but will
      still be generally useful to attribute cycle counts to functions.
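      To make the attribution concrete, here is a minimal C sketch under
      simplified assumptions; it is not the intel-pt decoder's code. The
      types cyc_packet and cycle_sample and the function
      synth_cycle_samples() are hypothetical, and the subsampling period is
      modeled in cycles for simplicity (the --itrace example above uses a
      time-based period). All cycles accumulated since the last sample are
      charged to the IP of the branch whose packet carried the CYC count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded packet: CYC count seen when branch_ip emitted a packet. */
struct cyc_packet {
	uint64_t branch_ip;
	uint64_t cycles;	/* cycles elapsed since the previous CYC packet */
};

/* Hypothetical synthesized "cycles" sample. */
struct cycle_sample {
	uint64_t ip;
	uint64_t period;	/* cycles attributed to this sample */
};

/*
 * Accumulate cycles and emit a sample, attributed to the branch that ended
 * the block, once at least `period` cycles have built up. With period == 0
 * every non-empty packet becomes a sample (basic-block granularity).
 * Returns the number of samples written to out (at most max_out).
 */
static size_t synth_cycle_samples(const struct cyc_packet *pkts, size_t n,
				  uint64_t period, struct cycle_sample *out,
				  size_t max_out)
{
	uint64_t acc = 0;
	size_t nr = 0;

	for (size_t i = 0; i < n && nr < max_out; i++) {
		acc += pkts[i].cycles;
		if (acc > 0 && acc >= period) {
			/* Every accumulated cycle lands on this branch IP. */
			out[nr].ip = pkts[i].branch_ip;
			out[nr].period = acc;
			nr++;
			acc = 0;
		}
	}
	return nr;
}
```

      With a larger period, cycles from several blocks are lumped onto a
      single branch IP, which is the smearing described above.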
      Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Steinar H. Gunderson <sesse@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220322082452.1429091-1-sesse@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 16 Feb, 2023 1 commit
    • perf c2c: Add report option to show false sharing in adjacent cachelines · 1470a108
      Feng Tang authored
      Many platforms have an adjacent-cacheline prefetch feature. When it is
      enabled, data in RAM is handled at a granularity of two cachelines (2N
      and 2N+1): if one is fetched into the cache, the other is likely to be
      fetched too. This effectively doubles the cacheline size, so false
      sharing can occur across adjacent cachelines.
      
      0Day has captured performance changes related to this [1], and some
      commercial software explicitly aligns its hot global variables to 128
      bytes (2 cachelines) to avoid this kind of extended false sharing.
      
      So add a "--double-cl" option to 'perf c2c report' to show false
      sharing at double-cacheline granularity, which behaves as if the
      cacheline size were doubled. There is no change to 'perf c2c record'.
      The hardware events for a shared cacheline are still per cacheline;
      this option only changes how events are grouped and displayed.
      
      In the 'perf c2c report' output below (will-it-scale's 'pagefault2' case
      on an old kernel):
      
        ----------------------------------------------------------------------
           26       31        2        0        0        0  0xffff888103ec6000
        ----------------------------------------------------------------------
         35.48%   50.00%    0.00%    0.00%    0.00%   0x10     0       1  0xffffffff8133148b   1153   66    971   3748   74  [k] get_mem_cgroup_from_mm
          6.45%    0.00%    0.00%    0.00%    0.00%   0x10     0       1  0xffffffff813396e4    570    0   1531    879   75  [k] mem_cgroup_charge
         25.81%   50.00%    0.00%    0.00%    0.00%   0x54     0       1  0xffffffff81331472    949   70    593   3359   74  [k] get_mem_cgroup_from_mm
         19.35%    0.00%    0.00%    0.00%    0.00%   0x54     0       1  0xffffffff81339686   1352    0   1073   1022   74  [k] mem_cgroup_charge
          9.68%    0.00%    0.00%    0.00%    0.00%   0x54     0       1  0xffffffff813396d6   1401    0    863    768   74  [k] mem_cgroup_charge
          3.23%    0.00%    0.00%    0.00%    0.00%   0x54     0       1  0xffffffff81333106    618    0    804     11    9  [k] uncharge_batch
      
      Offsets 0x10 and 0x54 used to be displayed in two groups; now they are
      listed together, giving users a hint of extended false sharing.
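      The grouping change can be illustrated with a small C sketch, assuming
      64-byte cachelines; CACHELINE_SIZE and cl_group_base() are hypothetical
      names, not perf's code. An access is reported under its address rounded
      down to the group size, and with --double-cl the group spans two
      cachelines. In the example above, offsets 0x10 and 0x54 of
      0xffff888103ec6000 fall into different 64-byte lines but into the same
      128-byte group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHELINE_SIZE 64ULL	/* assumption: 64-byte cachelines */

/* Base address of the group an access is reported under. */
static uint64_t cl_group_base(uint64_t addr, bool double_cl)
{
	uint64_t size = double_cl ? 2 * CACHELINE_SIZE : CACHELINE_SIZE;

	assert((size & (size - 1)) == 0);	/* masking needs a power of two */
	return addr & ~(size - 1);
}
```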
      
      [1]. https://lore.kernel.org/lkml/20201102091543.GM31092@shao2-debian/
      
      Committer notes:
      
      Link: https://lore.kernel.org/r/Y+wvVNWqXb70l4uy@feng-clx
      
      Removed -a, leaving just --double-cl, as this option is probably not
      used frequently and may even be auto-detected if we manage to record
      the MSR where this is configured.
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Reviewed-by: Leo Yan <leo.yan@linaro.org>
      Signed-off-by: Feng Tang <feng.tang@intel.com>
      Tested-by: Leo Yan <leo.yan@linaro.org>
      Acked-by: Joe Mario <jmario@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tim Chen <tim.c.chen@intel.com>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Link: https://lore.kernel.org/r/20230214075823.246414-1-feng.tang@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 15 Feb, 2023 1 commit
    • perf record: Fix segfault with --overwrite and --max-size · 91621be6
      Yang Jihong authored
      When --overwrite and --max-size options of perf record are used
      together, a segmentation fault occurs. The following is an example:
      
        # perf record -e sched:sched* --overwrite --max-size 1K -a -- sleep 1
        [ perf record: Woken up 1 times to write data ]
        perf: Segmentation fault
        Obtained 12 stack frames.
        ./perf/perf(+0x197673) [0x55f99710b673]
        /lib/x86_64-linux-gnu/libc.so.6(+0x3ef0f) [0x7fa45f3cff0f]
        ./perf/perf(+0x8eb40) [0x55f997002b40]
        ./perf/perf(+0x1f6882) [0x55f99716a882]
        ./perf/perf(+0x794c2) [0x55f996fed4c2]
        ./perf/perf(+0x7b7c7) [0x55f996fef7c7]
        ./perf/perf(+0x9074b) [0x55f99700474b]
        ./perf/perf(+0x12e23c) [0x55f9970a223c]
        ./perf/perf(+0x12e54a) [0x55f9970a254a]
        ./perf/perf(+0x7db60) [0x55f996ff1b60]
        /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe6) [0x7fa45f3b2c86]
        ./perf/perf(+0x7dfe9) [0x55f996ff1fe9]
        Segmentation fault (core dumped)
      
      The backtrace from the core file is as follows:
      
        (gdb) bt
        #0  record__bytes_written (rec=0x55f99755a200 <record>) at builtin-record.c:234
        #1  record__output_max_size_exceeded (rec=0x55f99755a200 <record>) at builtin-record.c:242
        #2  record__write (map=0x0, size=12816, bf=0x55f9978da2e0, rec=0x55f99755a200 <record>) at builtin-record.c:263
        #3  process_synthesized_event (tool=tool@entry=0x55f99755a200 <record>, event=event@entry=0x55f9978da2e0, sample=sample@entry=0x0, machine=machine@entry=0x55f997893658) at builtin-record.c:618
        #4  0x000055f99716a883 in __perf_event__synthesize_id_index (tool=tool@entry=0x55f99755a200 <record>, process=process@entry=0x55f997002aa0 <process_synthesized_event>, evlist=0x55f9978928b0, machine=machine@entry=0x55f997893658,
            from=from@entry=0) at util/synthetic-events.c:1895
        #5  0x000055f99716a91f in perf_event__synthesize_id_index (tool=tool@entry=0x55f99755a200 <record>, process=process@entry=0x55f997002aa0 <process_synthesized_event>, evlist=<optimized out>, machine=machine@entry=0x55f997893658)
            at util/synthetic-events.c:1905
        #6  0x000055f996fed4c3 in record__synthesize (tail=tail@entry=true, rec=0x55f99755a200 <record>) at builtin-record.c:1997
        #7  0x000055f996fef7c8 in __cmd_record (argc=argc@entry=2, argv=argv@entry=0x7ffc67551260, rec=0x55f99755a200 <record>) at builtin-record.c:2802
        #8  0x000055f99700474c in cmd_record (argc=<optimized out>, argv=0x7ffc67551260) at builtin-record.c:4258
        #9  0x000055f9970a223d in run_builtin (p=0x55f997564d88 <commands+264>, argc=10, argv=0x7ffc67551260) at perf.c:330
        #10 0x000055f9970a254b in handle_internal_command (argc=10, argv=0x7ffc67551260) at perf.c:384
        #11 0x000055f996ff1b61 in run_argv (argcp=<synthetic pointer>, argv=<synthetic pointer>) at perf.c:428
        #12 main (argc=<optimized out>, argv=0x7ffc67551260) at perf.c:562
      
      The reason is that record__bytes_written() accesses rec->thread_data
      after it has been freed. The sequence is as follows:
        __cmd_record
          -> record__free_thread_data
            -> zfree(&rec->thread_data)         // free rec->thread_data
          -> record__synthesize
            -> perf_event__synthesize_id_index
              -> process_synthesized_event
                -> record__write
                  -> record__bytes_written      // access rec->thread_data
      
      Add a member variable "thread_bytes_written" to struct record to save
      the amount of data written by the threads.
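      The idea behind the fix can be modeled with a simplified, hypothetical
      C sketch; the struct and function names below are illustrative and are
      not the actual builtin-record.c code. The per-thread total is also
      accumulated in the record structure itself, so the size check never
      needs to dereference thread_data after it has been freed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified, hypothetical model of the structures involved. */
struct thread_data_sketch {
	uint64_t bytes_written;
};

struct record_sketch {
	struct thread_data_sketch *thread_data;	/* freed before tail synthesis */
	int nr_threads;
	uint64_t bytes_written;		/* written by the main thread */
	uint64_t thread_bytes_written;	/* total written by all threads */
};

/* Account a write by thread idx (idx < 0 means the main thread). */
static void account_write(struct record_sketch *rec, int idx, uint64_t size)
{
	assert(idx < rec->nr_threads);
	if (idx >= 0) {
		rec->thread_data[idx].bytes_written += size;
		rec->thread_bytes_written += size;	/* copy kept in rec itself */
	} else {
		rec->bytes_written += size;
	}
}

/* Safe after free_thread_data(): never touches thread_data. */
static uint64_t bytes_written(const struct record_sketch *rec)
{
	return rec->bytes_written + rec->thread_bytes_written;
}

static void free_thread_data(struct record_sketch *rec)
{
	free(rec->thread_data);
	rec->thread_data = NULL;	/* like zfree(): no dangling pointer */
	rec->nr_threads = 0;
}
```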
      
      Fixes: 6d575816 ("perf record: Add support for limit perf output file size")
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiwei Sun <jiwei.sun@windriver.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/CAM9d7ci_TRrqBQVQNW8=GwakUr7SsZpYxaaty-S4bxF8zJWyqw@mail.gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 09 Feb, 2023 1 commit
    • perf stat: Avoid merging/aggregating metric counts twice · 37f322cd
      Ian Rogers authored
      The newly added perf_stat_merge_counters() combines uncore counters.
      When metrics are enabled, the counts are also merged into a
      metric_leader via the stat-shadow saved_value logic. Since the leader
      is now passed an already aggregated count, all counters are added
      together twice and counts appear approximately doubled in metrics.
      
      This change disables the saved_value merging of counts for evsels that
      are merged. It is recommended that later changes remove the saved_value
      logic entirely, as the two layers of aggregation in the code are
      confusing.
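      A toy C model of the double counting follows; the names are
      hypothetical and the real stat-shadow code is considerably more
      involved. The merge step folds all uncore instances into the leader; if
      the metric path then adds each instance again on top of the aggregated
      leader value, the result doubles. Skipping counters that were already
      merged, analogous to what this change does for merged evsels, gives the
      correct total.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one uncore PMU instance's counter. */
struct counter_sketch {
	uint64_t count;
	bool merged;	/* already folded into the leader by the merge step */
};

/* Step 1: merge all instances into the leader's total. */
static uint64_t merge_counters(struct counter_sketch *c, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++) {
		total += c[i].count;
		c[i].merged = true;
	}
	return total;
}

/*
 * Step 2: value fed to the metric. Without skip_merged, each instance is
 * added again on top of the aggregated leader value (the bug).
 */
static uint64_t metric_input(uint64_t leader_total,
			     const struct counter_sketch *c, size_t n,
			     bool skip_merged)
{
	uint64_t v = leader_total;

	for (size_t i = 0; i < n; i++)
		if (!(skip_merged && c[i].merged))
			v += c[i].count;
	return v;
}
```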
      
      Fixes: 942c5593 ("perf stat: Add perf_stat_merge_counters()")
      Reported-by: Perry Taylor <perry.taylor@intel.com>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Eduard Zingerman <eddyz87@gmail.com>
      Cc: Florian Fischer <florian.fischer@muhq.space>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Link: https://lore.kernel.org/r/20230209064447.83733-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 08 Feb, 2023 5 commits
    • perf tools: Fix perf tool build error in util/pfm.c · 6a5558f1
      Thomas Richter authored
      I downloaded linux-next and built the perf tool using
      
        # make LIBPFM4=1
      
      to have libpfm4 support built into perf. The build fails:
      
       # make LIBPFM4=1
      ....
      INSTALL libbpf_headers
        CC      util/pfm.o
      util/pfm.c: In function ‘print_libpfm_event’:
      util/pfm.c:189:9: error: too many arguments to function ‘print_cb->print_event’
        189 |         print_cb->print_event(print_state,
            |         ^~~~~~~~
      util/pfm.c:220:25: error: too many arguments to function ‘print_cb->print_event’
        220 |                         print_cb->print_event(print_state,
      
      The build error is caused by commit d9dc8874 ("perf pmu-events:
      Remove now unused event and metric variables") which changes the
      function prototype of
      
        struct print_callbacks {
            ...
            void (*print_event)(...);  --> last two parameters removed.
        };
      
      but does not adjust the callers of this function in util/pfm.c. There,
      print_event() is still invoked with 13 parameters instead of 11, so
      compilation fails.
      
      When I adjust util/pfm.c as in this patch, the build works fine.
      Please check this patch for correctness; I have only fixed the compile
      issue.
      
      Fixes: d9dc8874 ("perf pmu-events: Remove now unused event and metric variables")
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Ian Rogers <irogers@google.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: egorenar@linux.ibm.com
      Cc: linux-kernel-next@vger.kernel.org
      Link: https://lore.kernel.org/r/20230207140447.1827741-1-tmricht@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Fix auto-complete on aarch64 · ffd1240e
      Yicong Yang authored
      On aarch64, CPU-related events are not under
      event_source/devices/cpu/events; on my machine they are under
      event_source/devices/armv8_pmuv3_0/events. The current auto-complete
      script therefore generates the error below:
      
        [root@localhost bin]# perf stat -e
        ls: cannot access '/sys/bus/event_source/devices/cpu/events': No such file or directory
      
      Fix this by not testing /sys/bus/event_source/devices/cpu/events on
      aarch64 machines.
      
      Fixes: 74cd5815 ("perf tool: Improve bash command line auto-complete for multiple events with comma")
      Reviewed-by: James Clark <james.clark@arm.com>
      Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linuxarm@huawei.com
      Cc: prime.zeng@hisilicon.com
      Link: https://lore.kernel.org/r/20230207035057.43394-1-yangyicong@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf lock contention: Support old rw_semaphore type · 1bece135
      Namhyung Kim authored
      Older kernels use a different type for the owner field in rwsem. We
      can check it using the bpf_core_type_matches() builtin in clang, but
      that needs its own version check too, since the builtin is only
      available in recent clang versions.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: bpf@vger.kernel.org
      Link: https://lore.kernel.org/r/20230207002403.63590-4-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf lock contention: Add -o/--lock-owner option · 3477f079
      Namhyung Kim authored
      When there is a lot of lock contention in the system, people sometimes
      want to know who caused it, IOW who owns the locks.
      
      The -o/--lock-owner option tries to follow the lock owners for the
      contended mutexes and rwsems from BPF, and then attributes the
      contention time to the owner instead of the waiter. It is a best-effort
      approach that gets the owner info at the time of the contention and
      does not guarantee precise tracking of owners if ownership changes
      over time.
      
      Currently it only handles mutexes and rwsems, which have an owner field
      in their struct that basically points to the task_struct owning the
      lock at that moment.
      
      Technically its type is atomic_long_t, and some of its low-order bits
      are used for other purposes, so they need to be cleared when casting it
      to a pointer to task_struct.
      
      Also, atomic_long_t is a typedef of the 32- or 64-bit atomic type,
      depending on the arch, which is a wrapper struct around the counter
      value. I'm not aware of a proper way to access those kernel atomic
      types from BPF, so I just read the internal counter value directly.
      Please let me know if there's a better way.
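      A minimal C sketch of the owner decoding described above, under two
      assumptions: the wrapper is modeled on the kernel's atomic64_t (a
      struct with a counter member), and the low three bits are treated as
      flags, as the kernel's MUTEX_FLAGS (0x07) does for mutexes. The exact
      mask used by perf's BPF program is not stated here, and
      atomic_long_sketch, OWNER_FLAGS_MASK and owner_task_addr() are
      hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Modeled on the kernel's atomic64_t: a struct wrapping the counter value. */
typedef struct {
	int64_t counter;
} atomic_long_sketch;

/* Assumption: low 3 bits carry flags, as with the kernel's MUTEX_FLAGS. */
#define OWNER_FLAGS_MASK 0x07UL

/*
 * Read the raw counter directly and strip the flag bits to get the
 * task_struct address (0 means the lock is unowned).
 */
static uintptr_t owner_task_addr(const atomic_long_sketch *owner)
{
	uintptr_t raw = (uintptr_t)owner->counter;

	return raw & ~(uintptr_t)OWNER_FLAGS_MASK;
}
```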
      
      When the -o/--lock-owner option is used, perf switches to the task
      aggregation mode, as the -t/--threads option does. However, it cannot
      get the owner for other lock types such as spinlocks, and sometimes not
      even for mutexes.
      
        $ sudo ./perf lock con -abo -- ./perf bench sched pipe
        # Running 'sched/pipe' benchmark:
        # Executed 1000000 pipe operations between two processes
      
             Total time: 4.766 [sec]
      
               4.766540 usecs/op
                 209795 ops/sec
         contended   total wait     max wait     avg wait          pid   owner
      
               403    565.32 us     26.81 us      1.40 us           -1   Unknown
                 4     27.99 us      8.57 us      7.00 us      1583145   sched-pipe
                 1      8.25 us      8.25 us      8.25 us      1583144   sched-pipe
                 1      2.03 us      2.03 us      2.03 us         5068   chrome
      
      As you can see, the owner is unknown in most cases. But if we filter
      for mutex locks only, it is more likely to get the owners.
      
        $ sudo ./perf lock con -abo -Y mutex -- ./perf bench sched pipe
        # Running 'sched/pipe' benchmark:
        # Executed 1000000 pipe operations between two processes
      
             Total time: 4.910 [sec]
      
               4.910435 usecs/op
                 203647 ops/sec
         contended   total wait     max wait     avg wait          pid   owner
      
                 2     15.50 us      8.29 us      7.75 us      1582852   sched-pipe
                 7      7.20 us      2.47 us      1.03 us           -1   Unknown
                 1      6.74 us      6.74 us      6.74 us      1582851   sched-pipe
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: bpf@vger.kernel.org
      Link: https://lore.kernel.org/r/20230207002403.63590-3-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf lock contention: Fix to save callstack for the default modified · 55e39185
      Namhyung Kim authored
      The previous change failed to set con->save_callstack for the
      LOCK_AGGR_CALLER mode, resulting in no caller information.
      
      Fixes: ebab2916 ("perf lock contention: Support filters for different aggregation")
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: bpf@vger.kernel.org
      Link: https://lore.kernel.org/r/20230207002403.63590-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  6. 06 Feb, 2023 8 commits
  7. 05 Feb, 2023 8 commits
    • Linux 6.2-rc7 · 4ec5183e
      Linus Torvalds authored
    • Merge tag 'usb-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · c608f6b5
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are some small USB fixes that resolve some reported problems.
        These include:
      
         - gadget driver fixes
      
         - dwc3 driver fix
      
         - typec driver fix
      
         - MAINTAINERS file update.
      
        All of these have been in linux-next with no reported problems"
      
      * tag 'usb-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        usb: typec: ucsi: Don't attempt to resume the ports before they exist
        usb: gadget: udc: do not clear gadget driver.bus
        usb: gadget: f_uac2: Fix incorrect increment of bNumEndpoints
        usb: gadget: f_fs: Fix unbalanced spinlock in __ffs_ep0_queue_wait
        usb: dwc3: qcom: enable vbus override when in OTG dr-mode
        MAINTAINERS: Add myself as UVC Gadget Maintainer
    • Merge tag 'tty-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · dc0ce181
      Linus Torvalds authored
      Pull tty/serial driver fixes from Greg KH:
       "Here are some small serial and vt fixes. These include:
      
         - 8250 driver fixes relating to dma issues
      
         - stm32 serial driver fix for threaded irqs
      
         - vc_screen bugfix for reported problems.
      
        All have been in linux-next for a while with no reported problems"
      
      * tag 'tty-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        vc_screen: move load of struct vc_data pointer in vcs_read() to avoid UAF
        serial: 8250_dma: Fix DMA Rx rearm race
        serial: 8250_dma: Fix DMA Rx completion race
        serial: stm32: Merge hard IRQ and threaded IRQ handling into single IRQ handler
    • Merge tag 'char-misc-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · d3feaff4
      Linus Torvalds authored
      Pull char/misc driver fixes from Greg KH:
       "Here are a number of small char/misc/whatever driver fixes. They
        include:
      
         - IIO driver fixes for some reported problems
      
         - nvmem driver fixes
      
         - fpga driver fixes
      
         - debugfs memory leak fix in the hv_balloon and irqdomain code
           (irqdomain change was acked by the maintainer)
      
        All have been in linux-next with no reported problems"
      
      * tag 'char-misc-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (33 commits)
        kernel/irq/irqdomain.c: fix memory leak with using debugfs_lookup()
        HV: hv_balloon: fix memory leak with using debugfs_lookup()
        nvmem: qcom-spmi-sdam: fix module autoloading
        nvmem: core: fix return value
        nvmem: core: fix cell removal on error
        nvmem: core: fix device node refcounting
        nvmem: core: fix registration vs use race
        nvmem: core: fix cleanup after dev_set_name()
        nvmem: core: remove nvmem_config wp_gpio
        nvmem: core: initialise nvmem->id early
        nvmem: sunxi_sid: Always use 32-bit MMIO reads
        nvmem: brcm_nvram: Add check for kzalloc
        iio: imu: fxos8700: fix MAGN sensor scale and unit
        iio: imu: fxos8700: remove definition FXOS8700_CTRL_ODR_MIN
        iio: imu: fxos8700: fix failed initialization ODR mode assignment
        iio: imu: fxos8700: fix incorrect ODR mode readback
        iio: light: cm32181: Fix PM support on system with 2 I2C resources
        iio: hid: fix the retval in gyro_3d_capture_sample
        iio: hid: fix the retval in accel_3d_capture_sample
        iio: imu: st_lsm6dsx: fix build when CONFIG_IIO_TRIGGERED_BUFFER=m
        ...
    • Merge tag 'fbdev-for-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev · 870c3a9a
      Linus Torvalds authored
      Pull fbdev fixes from Helge Deller:
      
       - fix fbcon to prevent fonts bigger than 32x32 pixels to avoid
         overflows reported by syzbot
      
       - switch omapfb to use kstrtobool()
      
       - switch some fbdev drivers to use the backlight helpers
      
      * tag 'fbdev-for-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
        fbcon: Check font dimension limits
        fbdev: omapfb: Use kstrtobool() instead of strtobool()
        fbdev: fbmon: fix function name in kernel-doc
        fbdev: atmel_lcdfb: Rework backlight status updates
        fbdev: riva: Use backlight helper
        fbdev: omapfb: panel-dsi-cm: Use backlight helper
        fbdev: nvidia: Use backlight helper
        fbdev: mx3fb: Use backlight helper
        fbdev: radeon: Use backlight helper
        fbdev: atyfb: Use backlight helper
        fbdev: aty128fb: Use backlight helper
    • Linus Torvalds's avatar
      Merge tag 'x86_urgent_for_v6.2_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9e482602
      Linus Torvalds authored
      Pull x86 fix from Borislav Petkov:
      
       - Prevent the compiler from reordering accesses to debug regs which
         could cause a #VC exception in SEV-ES guests at the wrong place in
         the NMI handling path
      
      * tag 'x86_urgent_for_v6.2_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/debug: Fix stack recursion caused by wrongly ordered DR7 accesses
    • Linus Torvalds's avatar
      Merge tag 'perf_urgent_for_v6.2_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · de506eec
      Linus Torvalds authored
      Pull perf fix from Borislav Petkov:
      
       - Lock the proper critical section when dealing with perf event context
      
      * tag 'perf_urgent_for_v6.2_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf: Fix perf_event_pmu_context serialization
    • Linus Torvalds's avatar
      Merge tag 'powerpc-6.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 837c07cf
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "It's a bit of a big batch for rc6, but just because I didn't send any
        fixes in the last week or two while I was on vacation; next week should
        be quieter:
      
         - Fix a few objtool warnings since we recently enabled objtool.
      
         - Fix a deadlock with the hash MMU vs perf record.
      
         - Fix perf profiling of asynchronous interrupt handlers.
      
         - Revert the IMC PMU nest_init_lock to being a mutex.
      
         - Two commits fixing problems with the kexec_file FDT size
           estimation.
      
         - Two commits fixing problems with strict RWX vs kernels running at
           a non-zero address.

         - Reconnect tlb_flush() to hash__tlb_flush().
      
        Thanks to Kajol Jain, Nicholas Piggin, Sachin Sant, Sathvika Vasireddy,
        and Sourabh Jain"
      
      * tag 'powerpc-6.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/64s: Reconnect tlb_flush() to hash__tlb_flush()
        powerpc/kexec_file: Count hot-pluggable memory in FDT estimate
        powerpc/64s/radix: Fix RWX mapping with relocated kernel
        powerpc/64s/radix: Fix crash with unaligned relocated kernel
        powerpc/kexec_file: Fix division by zero in extra size estimation
        powerpc/imc-pmu: Revert nest_init_lock to being a mutex
        powerpc/64: Fix perf profiling asynchronous interrupt handlers
        powerpc/64s: Fix local irq disable when PMIs are disabled
        powerpc/kvm: Fix unannotated intra-function call warning
        powerpc/85xx: Fix unannotated intra-function call warning
  8. 04 Feb, 2023 7 commits
  9. 03 Feb, 2023 6 commits
    • Kan Liang's avatar
      perf script: Support Retire Latency · 17f248aa
      Kan Liang authored
      The Retire Latency field is added to the var3_w field of
      PERF_SAMPLE_WEIGHT_STRUCT. The Retire Latency reports the number of
      elapsed core clocks between the retirement of the instruction indicated
      by the Instruction Pointer field of the PEBS record and the retirement
      of the prior instruction. It is useful to display this information
      with perf script.
      
      Add a new field retire_lat for the Retire Latency information.
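
      A minimal C sketch of how the Retire Latency could be pulled out of a
      raw PERF_SAMPLE_WEIGHT_STRUCT value. It assumes the little-endian layout
      of union perf_sample_weight in the uapi header (var1_dw in bits 0-31,
      var2_w in bits 32-47, var3_w in bits 48-63); struct weight_fields,
      decode_weight() and retire_lat() are hypothetical helpers, not perf's code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical decoded view of a PERF_SAMPLE_WEIGHT_STRUCT value.
 * Assumes the little-endian layout of union perf_sample_weight:
 * var1_dw = bits 0-31, var2_w = bits 32-47, var3_w = bits 48-63.
 */
struct weight_fields {
	uint32_t var1_dw;  /* e.g. memory access latency on x86 */
	uint16_t var2_w;   /* e.g. instruction latency on x86 */
	uint16_t var3_w;   /* Retire Latency on x86, p_stage_cyc on Power */
};

/* Split the 64-bit weight into its three sub-fields. */
static struct weight_fields decode_weight(uint64_t full)
{
	struct weight_fields w;

	w.var1_dw = (uint32_t)(full & 0xffffffffu);
	w.var2_w  = (uint16_t)((full >> 32) & 0xffffu);
	w.var3_w  = (uint16_t)((full >> 48) & 0xffffu);
	return w;
}

/* Retire Latency in core clocks, taken from var3_w. */
static uint16_t retire_lat(uint64_t full)
{
	return decode_weight(full).var3_w;
}
```

      For example, a raw weight of ((uint64_t)7 << 48) | ((uint64_t)3 << 32) | 100
      decodes to var1_dw = 100, var2_w = 3 and a retire latency of 7 core clocks.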
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20230104201349.1451191-9-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Kan Liang's avatar
      perf report: Support Retire Latency · d7d213e0
      Kan Liang authored
      The Retire Latency field is added to the var3_w field of
      PERF_SAMPLE_WEIGHT_STRUCT. The Retire Latency reports the pipeline
      stall of this instruction relative to the previous instruction, in
      cycles. It is useful to display this information with perf mem report.

      On Power, p_stage_cyc also comes from var3_w, so put p_stage_cyc and
      retire_lat in a union to share the code.

      Implement x86-specific code to display the x86-specific header.
      
      Add a new sort key retire_lat for the Retire Latency.
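
      A simplified C sketch of the two ideas above, assuming a hypothetical
      struct hist_sample rather than perf's real hist_entry: an anonymous
      union lets p_stage_cyc (Power) and retire_lat (x86) share the value
      decoded from var3_w, and cmp_retire_lat() is an illustrative qsort()
      comparator standing in for a retire_lat sort key.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified sample entry (not perf's struct hist_entry). */
struct hist_sample {
	uint64_t ip;
	union {
		uint16_t p_stage_cyc;  /* Power: pipeline stage cycles from var3_w */
		uint16_t retire_lat;   /* x86: Retire Latency from var3_w */
	};
};

/* qsort() comparator: order samples by retire_lat, largest first. */
static int cmp_retire_lat(const void *a, const void *b)
{
	const struct hist_sample *x = a;
	const struct hist_sample *y = b;

	if (x->retire_lat != y->retire_lat)
		return x->retire_lat < y->retire_lat ? 1 : -1;
	return 0;
}
```

      Because both union members have the same type and only one
      interpretation is meaningful on a given architecture, code that fills
      in or sorts by one name handles the other as well.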
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20230104201349.1451191-8-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Namhyung Kim's avatar
      perf lock contention: Support filters for different aggregation · ebab2916
      Namhyung Kim authored
      It'd be useful to filter on keys other than the current aggregation
      mode. For example, users may want to see callstacks for specific locks
      only, or tasks from a certain callstack.

      The tracepoints already collect this information, but the condition
      needs to be checked again when processing the event, and the BPF
      program needs to be changed to allow key combinations.
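
      The idea of aggregating on one key while filtering on another can be
      sketched in C as below. Everything here is hypothetical: struct
      contention_rec stands in for the collected tracepoint data, and an
      empty filter array means "no filter on this key". The filter is
      re-checked for every record, independently of the aggregation key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record carrying every key, not just the aggregated one. */
struct contention_rec {
	uint64_t lock_addr;  /* lock instance */
	int stack_id;        /* caller callstack */
	int pid;             /* contending task */
	uint64_t wait_ns;    /* time spent waiting */
};

/* Hypothetical filter; a zero-length array disables that key's filter. */
struct contention_filter {
	const uint64_t *addrs;
	size_t nr_addrs;
	const int *stacks;
	size_t nr_stacks;
};

static bool in_u64(const uint64_t *v, size_t n, uint64_t x)
{
	for (size_t i = 0; i < n; i++)
		if (v[i] == x)
			return true;
	return false;
}

static bool in_int(const int *v, size_t n, int x)
{
	for (size_t i = 0; i < n; i++)
		if (v[i] == x)
			return true;
	return false;
}

/* Check the non-aggregated keys again while processing each record. */
static bool rec_passes(const struct contention_rec *r,
		       const struct contention_filter *f)
{
	if (f->nr_addrs && !in_u64(f->addrs, f->nr_addrs, r->lock_addr))
		return false;
	if (f->nr_stacks && !in_int(f->stacks, f->nr_stacks, r->stack_id))
		return false;
	return true;
}

/* Aggregate by task: total wait of 'pid' over records passing the filter. */
static uint64_t task_wait(const struct contention_rec *recs, size_t n,
			  const struct contention_filter *f, int pid)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		if (recs[i].pid == pid && rec_passes(&recs[i], f))
			total += recs[i].wait_ns;
	return total;
}
```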
      
      Lock contention on the 'rcu_state' spinlock can be monitored like this:
      
        $ sudo perf lock con -abv -L rcu_state sleep 1
        ...
         contended   total wait     max wait     avg wait         type   caller
      
                 4    151.39 us     62.57 us     37.85 us     spinlock   rcu_core+0xcb
                                0xffffffff81fd1666  _raw_spin_lock_irqsave+0x46
                                0xffffffff8172d76b  rcu_core+0xcb
                                0xffffffff822000eb  __softirqentry_text_start+0xeb
                                0xffffffff816a0ba9  __irq_exit_rcu+0xc9
                                0xffffffff81fc0112  sysvec_apic_timer_interrupt+0xa2
                                0xffffffff82000e46  asm_sysvec_apic_timer_interrupt+0x16
                                0xffffffff81d49f78  cpuidle_enter_state+0xd8
                                0xffffffff81d4a259  cpuidle_enter+0x29
                 1     30.21 us     30.21 us     30.21 us     spinlock   rcu_core+0xcb
                                0xffffffff81fd1666  _raw_spin_lock_irqsave+0x46
                                0xffffffff8172d76b  rcu_core+0xcb
                                0xffffffff822000eb  __softirqentry_text_start+0xeb
                                0xffffffff816a0ba9  __irq_exit_rcu+0xc9
                                0xffffffff81fc00c4  sysvec_apic_timer_interrupt+0x54
                                0xffffffff82000e46  asm_sysvec_apic_timer_interrupt+0x16
                 1     28.84 us     28.84 us     28.84 us     spinlock   rcu_accelerate_cbs_unlocked+0x40
                                0xffffffff81fd1c60  _raw_spin_lock+0x30
                                0xffffffff81728cf0  rcu_accelerate_cbs_unlocked+0x40
                                0xffffffff8172da82  rcu_core+0x3e2
                                0xffffffff822000eb  __softirqentry_text_start+0xeb
                                0xffffffff816a0ba9  __irq_exit_rcu+0xc9
                                0xffffffff81fc0112  sysvec_apic_timer_interrupt+0xa2
                                0xffffffff82000e46  asm_sysvec_apic_timer_interrupt+0x16
                                0xffffffff81d49f78  cpuidle_enter_state+0xd8
        ...
      
      To see tasks calling the 'rcu_core' function:
      
        $ sudo perf lock con -abt -S rcu_core sleep 1
         contended   total wait     max wait     avg wait          pid   comm
      
                19     23.46 us      2.21 us      1.23 us            0   swapper
                 2     18.37 us     17.01 us      9.19 us      2061859   ThreadPoolForeg
                 3      5.76 us      1.97 us      1.92 us         3909   pipewire-pulse
                 1      2.26 us      2.26 us      2.26 us      1809271   MediaSu~isor #2
                 1      1.97 us      1.97 us      1.97 us      1514882   Chrome_ChildIOT
                 1       987 ns       987 ns       987 ns         3740   pipewire-pulse
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <song@kernel.org>
      Cc: bpf@vger.kernel.org
      Link: https://lore.kernel.org/r/20230203021324.143540-4-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Namhyung Kim's avatar
      perf lock contention: Use lock_stat_find{,new} · 16cad1d3
      Namhyung Kim authored
      This is preparatory work to support complex keys in BPF maps.
      Currently the map has a single-value key chosen by the aggregation
      mode, such as stack_id or pid, but we want to use a combination of
      those keys.

      lock_contention_read() should then still aggregate the result by the
      key the user requested; the other key information will be used for
      filtering.

      So instead of always creating a lock_stat entry, first check whether
      it already exists using lock_stat_find().
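
      A minimal find-or-create sketch in C of the pattern described above.
      stat_find() and stat_findnew() are hypothetical stand-ins for
      lock_stat_find() and lock_stat_findnew(); the fixed-size chained hash
      table and the Fibonacci hash are illustrative choices, not perf's
      implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define STAT_HASH_BITS 6
#define STAT_HASH_SIZE (1u << STAT_HASH_BITS)

/* Hypothetical, simplified stand-in for perf's struct lock_stat. */
struct stat_entry {
	uint64_t key;              /* aggregation key: lock addr, stack id or pid */
	uint64_t nr_contended;
	uint64_t wait_time_total;
	struct stat_entry *next;   /* hash chain */
};

static struct stat_entry *stat_hash[STAT_HASH_SIZE];

/* Fibonacci hashing: keep the top STAT_HASH_BITS bits of the product. */
static unsigned int stat_hash_fn(uint64_t key)
{
	return (unsigned int)((key * 0x9E3779B97F4A7C15ull) >> (64 - STAT_HASH_BITS));
}

/* Return the existing entry for 'key', or NULL if there is none. */
static struct stat_entry *stat_find(uint64_t key)
{
	for (struct stat_entry *e = stat_hash[stat_hash_fn(key)]; e; e = e->next)
		if (e->key == key)
			return e;
	return NULL;
}

/* Like stat_find(), but allocate and link a zeroed entry when missing. */
static struct stat_entry *stat_findnew(uint64_t key)
{
	struct stat_entry *e = stat_find(key);
	unsigned int h;

	if (e)
		return e;
	e = calloc(1, sizeof(*e));
	if (!e)
		return NULL;
	h = stat_hash_fn(key);
	e->key = key;
	e->next = stat_hash[h];
	stat_hash[h] = e;
	return e;
}
```

      With combined BPF map keys, several map entries can share the same
      aggregation key, so the read loop accumulates into the entry returned
      by the lookup instead of always allocating a new one.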
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <song@kernel.org>
      Cc: bpf@vger.kernel.org
      Link: https://lore.kernel.org/r/20230203021324.143540-3-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Namhyung Kim's avatar
      perf lock contention: Factor out lock_contention_get_name() · 492fef21
      Namhyung Kim authored
      lock_contention_get_name() returns a name for the lock stat entry
      based on the current aggregation mode. Since it is called sequentially
      from a single thread, it can return the address of a static buffer
      holding the caller's symbol and offset.
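
      A short C sketch of the static-buffer approach, assuming a hypothetical
      caller_name() that formats "symbol+offset" like the caller column in
      the perf lock output above. The returned pointer is overwritten by the
      next call, which is safe only under the sequential, single-thread use
      the message describes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical name helper returning a static buffer. Not reentrant or
 * thread-safe: each call overwrites the previous result, so callers must
 * use the name before calling again.
 */
static const char *caller_name(const char *sym, uint64_t offset)
{
	static char name_buf[128];

	assert(sym != NULL);
	/* "%#llx" prints a 0x-prefixed hex offset, e.g. "rcu_core+0xcb". */
	snprintf(name_buf, sizeof(name_buf), "%s+%#llx", sym,
		 (unsigned long long)offset);
	return name_buf;
}
```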
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <song@kernel.org>
      Cc: bpf@vger.kernel.org
      Link: https://lore.kernel.org/r/20230203021324.143540-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Rob Herring's avatar
      perf arm-spe: Add raw decoding for SPEv1.2 previous branch address · 7105311c
      Rob Herring authored
      Arm SPEv1.2 adds a new optional address packet type: previous branch
      target. The recorded address is the target virtual address of the most
      recently taken branch in program order.
      
      Add support for decoding the address packet in raw dumps.
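
      A hedged C sketch of a raw-dump formatter for address packets. The
      index values and the short labels (including "PBT" for the previous
      branch target) are assumptions made for illustration only; the real
      encodings are defined by the Arm SPE specification and perf's SPE
      packet decoder and should be checked there.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical address-packet index values; verify against the SPE spec. */
enum spe_addr_index {
	ADDR_IDX_INS         = 0x0,
	ADDR_IDX_BRANCH      = 0x1,
	ADDR_IDX_DATA_VIRT   = 0x2,
	ADDR_IDX_DATA_PHYS   = 0x3,
	ADDR_IDX_PREV_BRANCH = 0x4,  /* assumed: SPEv1.2 previous branch target */
};

/* Map an index to an illustrative dump label, or NULL if unknown. */
static const char *spe_addr_label(unsigned int idx)
{
	switch (idx) {
	case ADDR_IDX_INS:         return "PC";
	case ADDR_IDX_BRANCH:      return "TGT";
	case ADDR_IDX_DATA_VIRT:   return "VA";
	case ADDR_IDX_DATA_PHYS:   return "PA";
	case ADDR_IDX_PREV_BRANCH: return "PBT";
	default:                   return NULL;
	}
}

/* Format one address packet; returns snprintf()'s result, or -1 if unknown. */
static int spe_addr_dump(char *buf, size_t len, unsigned int idx, uint64_t addr)
{
	const char *label = spe_addr_label(idx);

	assert(buf != NULL);
	if (!label)
		return -1;  /* unknown index: let the caller report a bad packet */
	return snprintf(buf, len, "%s 0x%llx", label, (unsigned long long)addr);
}
```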
      Reviewed-by: Leo Yan <leo.yan@linaro.org>
      Signed-off-by: Rob Herring <robh@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20230203162401.132931-1-robh@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>