1. 17 Apr, 2024 4 commits
    • perf tools: Enable configs required for test_uprobe_from_different_cu.sh · 6b718ac6
      Chaitanya S Prakash authored
      Test "perf probe of function from different CU" fails due to certain
      configs not being enabled. Building the kernel with
      CONFIG_KPROBE_EVENTS=y and CONFIG_UPROBE_EVENTS=y fixes the issue. As
      CONFIG_KPROBE_EVENTS is dependent on CONFIG_KPROBES, enable it as well.
      Some platforms enable these configs as a part of their defconfig, so
      this change is only required for the ones that don't do so.
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: James Clark <james.clark@arm.com>
      Link: https://lore.kernel.org/r/20240408062230.1949882-1-ChaitanyaS.Prakash@arm.com
      Link: https://lore.kernel.org/r/20240408062230.1949882-7-ChaitanyaS.Prakash@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf report: Add weight[123] output fields · 7043dc52
      Namhyung Kim authored
      Add weight1, weight2 and weight3 fields to -F/--fields and their aliases
      like 'ins_lat', 'p_stage_cyc' and 'retire_lat'.  Note that they are in
      the sort keys too but the difference is that output fields will sum up
      the weight values and display the average.
      
      With the sort keys, users can see the distribution of weight values, and
      I think it's confusing that we have local vs. global weight for the same
      weight.
      
      For example, I experimented with mem-loads events to get the weights.  On
      my laptop, it seems only the weight1 field is supported.
      
        $ perf mem record -- perf test -w noploop
      
      Let's look at the noploop function only.  It has 7 samples.
      
        $ perf script -F event,ip,sym,weight | grep noploop
        # event                         weight     ip           sym
        cpu/mem-loads,ldlat=30/P:           43     55b3c122bffc noploop
        cpu/mem-loads,ldlat=30/P:           48     55b3c122bffc noploop
        cpu/mem-loads,ldlat=30/P:           38     55b3c122bffc noploop    <--- same weight
        cpu/mem-loads,ldlat=30/P:           38     55b3c122bffc noploop    <--- same weight
        cpu/mem-loads,ldlat=30/P:           59     55b3c122bffc noploop
        cpu/mem-loads,ldlat=30/P:           33     55b3c122bffc noploop
        cpu/mem-loads,ldlat=30/P:           38     55b3c122bffc noploop    <--- same weight
      
      When you use the 'weight' sort key, it shows entries with different
      weight values separately.  Also note that the first entry has 3 samples
      with weight value 38, so they are displayed together and the weight
      value is the sum of the 3 samples (114 = 38 * 3).
      
        $ perf report -n -s +weight | grep -e Weight -e noploop
        # Overhead  Samples  Command   Shared Object   Symbol         Weight
             0.53%        3     perf   perf            [.] noploop    114
             0.18%        1     perf   perf            [.] noploop    59
             0.18%        1     perf   perf            [.] noploop    48
             0.18%        1     perf   perf            [.] noploop    43
             0.18%        1     perf   perf            [.] noploop    33
      
      If you use the 'local_weight' sort key, you can see the actual weight.
      
        $ perf report -n -s +local_weight | grep -e Weight -e noploop
        # Overhead  Samples  Command   Shared Object   Symbol         Local Weight
             0.53%        3     perf   perf            [.] noploop    38
             0.18%        1     perf   perf            [.] noploop    59
             0.18%        1     perf   perf            [.] noploop    48
             0.18%        1     perf   perf            [.] noploop    43
             0.18%        1     perf   perf            [.] noploop    33
      
      But when you use the -F/--fields option instead, you can see the average
      weight for the whole noploop function (as it won't group samples by
      weight value and uses the default 'comm,dso,sym' sort keys).
      
        $ perf report -n -F +weight | grep -e Weight -e noploop
        Warning:
        --fields weight shows the average value unlike in the --sort key.
        # Overhead  Samples  Weight1  Command  Shared Object  Symbol
             1.23%        7     42.4  perf     perf           [.] noploop
      
      The weight1 field shows the average value:
        (38 * 3 + 59 + 48 + 43 + 33) / 7 = 42.4
      
      It also shows a warning that the 'weight' field has the average value.
      Using 'weight1' instead avoids the warning.
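
      The sum and average above can be reproduced with a small sketch.  The
      array holds the seven noploop sample weights from the 'perf script'
      output; the function and variable names are hypothetical and are not
      part of perf.

```c
#include <stddef.h>

/* Hypothetical helper: total weight of n samples. */
unsigned long weight_sum(const unsigned long *w, size_t n)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += w[i];
	return sum;
}

/* Hypothetical helper: average weight of n samples, 0 for an empty set. */
double weight_avg(const unsigned long *w, size_t n)
{
	return n ? (double)weight_sum(w, n) / (double)n : 0.0;
}

/* The seven noploop sample weights listed in the 'perf script' output. */
const unsigned long noploop_weights[] = { 43, 48, 38, 38, 59, 33, 38 };
```

      Over the seven samples, weight_sum() gives 297 and weight_avg() gives
      about 42.4, matching the Weight1 column.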
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20240411181718.2367948-3-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf hist: Add weight fields to hist entry stats · 6fcf1e65
      Namhyung Kim authored
      Like period and sample numbers, it'd be better to track weight values
      and display them in the output rather than having them as sort keys.
      
      This patch just adds a few more fields to save the weights in a hist
      entry.  They'll be displayed as new output fields in a later patch.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20240411181718.2367948-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf hist: Move histogram related code to hist.h · 0993d724
      Namhyung Kim authored
      It's strange that sort.h has the definition of struct hist_entry.  As
      sort.h already includes hist.h, let's move the data structure to hist.h.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20240411181718.2367948-1-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 16 Apr, 2024 4 commits
    • perf annotate-data: Handle RSP if it's not the FB register · a5a00497
      Namhyung Kim authored
      In some cases, the stack pointer on x86 (rsp = reg7) is used to point to
      variables on the stack even though it's not the frame base register.
      In that case it should be handled like a normal register (IOW not used
      to access the other stack variables by offset calculation), but without
      assuming that it holds a pointer.
      
      Before:
        -----------------------------------------------------------
        find data type for 0x7c(reg7) at tcp_getsockopt+0xb62
        CU for net/ipv4/tcp.c (die:0x7b5f516)
        frame base: cfa=0 fbreg=6
        no pointer or no type
        check variable "zc" failed (die: 0x7b9580a)
         variable location: base=reg7, offset=0x40
         type='struct tcp_zerocopy_receive' size=0x40 (die:0x7b947f4)
      
      After:
        -----------------------------------------------------------
        find data type for 0x7c(reg7) at tcp_getsockopt+0xb62
        CU for net/ipv4/tcp.c (die:0x7b5f516)
        frame base: cfa=0 fbreg=6
        found "zc" in scope=3/3 (die: 0x7b957fc) type_offset=0x3c
         variable location: base=reg7, offset=0x40
         type='struct tcp_zerocopy_receive' size=0x40 (die:0x7b947f4)
      
      Note that the type-offset was properly calculated to 0x3c as the
      variable starts at 0x40.
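
      The type offset in the note is the access offset minus the variable's
      start offset from the same register.  A tiny sketch with a hypothetical
      helper (not a perf function):

```c
/*
 * Hypothetical helper: offset of an access inside a stack variable's type,
 * when the access and the variable location use the same base register.
 */
long type_offset(long access_off, long var_off)
{
	return access_off - var_off;
}
```

      For the access 0x7c(reg7) and "zc" located at reg7 + 0x40, this gives
      0x7c - 0x40 = 0x3c.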
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240412183310.2518474-5-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf dwarf-aux: Check variable address range properly · 0519fadb
      Namhyung Kim authored
      In match_var_offset(), it just checked the end address of the variable
      with the given offset because it assumed the register holds a pointer
      to the data type and the offset starts from the base.
      
      But I found some cases where the stack pointer (rsp = reg7) register is
      used to point to a stack variable while the frame base is maintained by a
      different register (rbp = reg6).  In that case, it cannot simply use the
      stack pointer as it cannot guarantee that it points to the frame base.
      So it needs to check both boundaries of the variable location.
      
      Before:
        -----------------------------------------------------------
        find data type for 0x7c(reg7) at tcp_getsockopt+0xb62
        CU for net/ipv4/tcp.c (die:0x7b5f516)
        frame base: cfa=0 fbreg=6
        no pointer or no type
        check variable "tss" failed (die: 0x7b95801)
         variable location: base reg7, offset=0x110
         type='struct scm_timestamping_internal' size=0x30 (die:0x7b8c126)
      
      So the current code just checks the register number for the non-PC and
      non-FB registers, assuming the variable has offset 0.  But this variable
      has offset 0x110 so it should not match.
      
      After:
        -----------------------------------------------------------
        find data type for 0x7c(reg7) at tcp_getsockopt+0xb62
        CU for net/ipv4/tcp.c (die:0x7b5f516)
        frame base: cfa=0 fbreg=6
        no pointer or no type
        check variable "zc" failed (die: 0x7b9580a)
         variable location: base=reg7, offset=0x40
         type='struct tcp_zerocopy_receive' size=0x40 (die:7b947f4)
      
      Now it finds the correct variable "zc".  It is located at reg7 + 0x40
      and the size is 0x40, which means it covers [0x40, 0x80).  The access
      was for reg7 + 0x7c so it found the right one.  But it still failed to
      use the variable; that is handled in the next patch.
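
      A minimal sketch of the two-sided boundary check described above.  The
      function name and signature are hypothetical and do not reproduce
      perf's match_var_offset(); the sketch only models a variable occupying
      [var_off, var_off + var_size) relative to the same base register as
      the access.

```c
#include <stdbool.h>

/*
 * Hypothetical check: does an access at base + access_off fall inside a
 * variable that occupies [var_off, var_off + var_size) from the same base?
 * Both the start and the end of the variable are checked.
 */
bool access_in_var(long access_off, long var_off, long var_size)
{
	return access_off >= var_off && access_off < var_off + var_size;
}
```

      With the values from the logs, the access at 0x7c is inside "zc"
      (offset 0x40, size 0x40) but not inside "tss" (offset 0x110, size 0x30).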
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240412183310.2518474-4-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf dwarf-aux: Check pointer offset when checking variables · 645af3fb
      Namhyung Kim authored
      In match_var_offset(), it checks the offset range with the target type
      only for non-pointer types.  But it also needs to check the pointer
      types with the target type.
      
      This is because there can be more than one pointer variable located in
      the same register.  Let's look at the following example.  It's looking
      up a variable for reg3 at tcp_get_info+0x62.  It found the "sk" variable
      but it wasn't the right one since the access goes beyond the size of the
      target type ('struct sock' in this case).
      
        -----------------------------------------------------------
        find data type for 0x7bc(reg3) at tcp_get_info+0x62
        CU for net/ipv4/tcp.c (die:0x7b5f516)
        frame base: cfa=0 fbreg=6
        offset: 1980 is bigger than size: 760
        check variable "sk" failed (die: 0x7b92b2c)
         variable location: reg3
         type='struct sock' size=0x2f8 (die:0x7b63c3ab)
      
      Actually there was another variable "tp" in the function and it's
      located in the same register (reg3) because it's just type-casted like
      below.
      
        void tcp_get_info(struct sock *sk, struct tcp_info *info)
        {
            const struct tcp_sock *tp = tcp_sk(sk);
            ...
      
      The 'struct tcp_sock' contains the 'struct sock' at offset 0 so it can
      just use the same address as a pointer to tcp_sock.  That means it
      should match variables correctly by checking the offset and size.
      It cannot tell the two variables apart when the offset is smaller than
      the size of the original struct sock.  But I think that's fine as the
      two types are identical in that part.
      
      So let's check the target type size and retry if it doesn't match.
      Now it succeeds in finding the correct variable.
      
        -----------------------------------------------------------
        find data type for 0x7bc(reg3) at tcp_get_info+0x62
        CU for net/ipv4/tcp.c (die:0x7b5f516)
        frame base: cfa=0 fbreg=6
        found "tp" in scope=1/1 (die: 0x7b92b16) type_offset=0x7bc
         variable location: reg3
         type='struct tcp_sock' size=0xa68 (die:0x7b81380)
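
      The retry logic can be sketched as follows.  The struct and function
      here are hypothetical models for illustration, not perf's data
      structures: each candidate is a pointer variable held in a register,
      and a candidate is skipped when the access offset does not fit in the
      size of its target type.

```c
#include <stddef.h>

/* Hypothetical model of a pointer variable held in a register. */
struct ptr_var {
	const char *name;
	int reg;           /* register holding the pointer */
	long target_size;  /* size of the pointed-to type */
};

/*
 * Return the first variable in 'reg' whose target type covers 'offset'.
 * Candidates whose target type is too small are skipped so that the
 * search continues with the next variable in the same register.
 */
const struct ptr_var *find_ptr_var(const struct ptr_var *vars, size_t n,
				   int reg, long offset)
{
	for (size_t i = 0; i < n; i++) {
		if (vars[i].reg != reg)
			continue;
		if (offset < 0 || offset >= vars[i].target_size)
			continue;  /* e.g. 0x7bc is beyond struct sock (0x2f8) */
		return &vars[i];
	}
	return NULL;
}
```

      With "sk" (reg3, size 0x2f8) and "tp" (reg3, size 0xa68) as candidates,
      a lookup for offset 0x7bc in reg3 skips "sk" and returns "tp".  An
      offset below 0x2f8 returns whichever candidate comes first, which
      mirrors the ambiguity noted above.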
      
      Fixes: bc10db8e ("perf annotate-data: Support stack variables")
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240412183310.2518474-3-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate-data: Improve debug message with location info · 2bc3cf57
      Namhyung Kim authored
      To verify it found the correct variable, let's add the location
      expression to the debug message.
      
        $ perf --debug type-profile annotate --data-type
        ...
        -----------------------------------------------------------
        find data type for 0xaf0(reg15) at schedule+0xeb
        CU for kernel/sched/core.c (die:0x1180523)
        frame base: cfa=0 fbreg=6
        found "rq" in scope=3/4 (die: 0x11b6a00) type_offset=0xaf0
         variable location: reg15
         type='struct rq' size=0xfc0 (die:0x11892e2)
        -----------------------------------------------------------
        find data type for 0x7bc(reg3) at tcp_get_info+0x62
        CU for net/ipv4/tcp.c (die:0x7b5f516)
        frame base: cfa=0 fbreg=6
        offset: 1980 is bigger than size: 760
        check variable "sk" failed (die: 0x7b92b2c)
         variable location: reg3
         type='struct sock' size=0x2f8 (die:0x7b63c3ab)
        -----------------------------------------------------------
        ...
      
      The first case is fine.  It looked up a data type in r15 with offset of
      0xaf0 at schedule+0xeb.  It found the CU die and the frame base info and
      the variable "rq" was found in the scope 3/4.  Its location is the r15
      register and the type size is 0xfc0 which includes 0xaf0.
      
      But the second case is not good.  It looked up a data type in rbx (reg3)
      with offset 0x7bc.  It found a CU and the frame base which is good so
      far.  And it also found a variable "sk" but the access offset is bigger
      than the type size (1980 vs. 760 or 0x7bc vs. 0x2f8).  The variable has
      the right location (reg3) but I need to figure out why it accesses
      beyond what it's supposed to.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240412183310.2518474-2-namhyung@kernel.org
      [ Fix the build on 32-bit by casting Dwarf_Word to (long) in pr_debug_location() ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 12 Apr, 2024 29 commits
  4. 08 Apr, 2024 3 commits