1. 25 Jan, 2024 1 commit
    • perf pmu: Treat the msr pmu as software · 24852ef2
      Ian Rogers authored
      The msr PMU is a software one, meaning msr events may be grouped
      with events in a hardware context. As the msr PMU isn't marked as a
      software PMU by perf_pmu__is_software(), groups containing msr PMU
      events are broken up and the msr events are placed in a different
      group. This may lead to multiplexing errors where a hardware event
      isn't counted while the msr event, such as tsc, is. Fix all of this
      by marking the msr PMU as software, which agrees with the driver.
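      A minimal sketch of the grouping decision, assuming a simplified model:
      struct pmu_desc, pmu_is_software() and can_group() are hypothetical
      stand-ins, not perf's struct perf_pmu or perf_pmu__is_software(). It
      only illustrates why a PMU that is not recognized as software forces
      its events out of a hardware group.

      ```c
      #include <stdbool.h>
      #include <stddef.h>
      #include <string.h>

      /* Hypothetical, simplified PMU description (not perf's struct perf_pmu). */
      struct pmu_desc {
          const char *name;
          bool is_core;   /* CPU core PMU, e.g. "cpu" */
      };

      /* Non-core PMUs whose driver registers them as software; illustrative list. */
      static const char *const known_sw_pmus[] = { "msr" };

      static bool pmu_is_software(const struct pmu_desc *pmu)
      {
          if (pmu->is_core)
              return false;
          for (size_t i = 0; i < sizeof(known_sw_pmus) / sizeof(known_sw_pmus[0]); i++) {
              if (strcmp(pmu->name, known_sw_pmus[i]) == 0)
                  return true;
          }
          return false;
      }

      /*
       * Simplified grouping rule: a software event may join a group led by a
       * hardware event; otherwise members must come from the leader's PMU.
       * Without the "msr" entry above, {slots,tsc} would have to be split.
       */
      static bool can_group(const struct pmu_desc *leader, const struct pmu_desc *member)
      {
          return pmu_is_software(member) || strcmp(leader->name, member->name) == 0;
      }
      ```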
      
      Before:
      ```
      $ perf stat -e '{slots,tsc}' -a true
      WARNING: events were regrouped to match PMUs
      
       Performance counter stats for 'system wide':
      
               1,750,335      slots
               4,243,557      tsc
      
             0.001456717 seconds time elapsed
      ```
      
      After:
      ```
      $ perf stat -e '{slots,tsc}' -a true
       Performance counter stats for 'system wide':
      
              12,526,380      slots
               3,415,163      tsc
      
             0.001488360 seconds time elapsed
      ```
      
      Fixes: 251aa040 ("perf parse-events: Wildcard most "numeric" events")
      Signed-off-by: Ian Rogers <irogers@google.com>
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Caleb Biggers <caleb.biggers@intel.com>
      Cc: Edward Baker <edward.baker@intel.com>
      Cc: Perry Taylor <perry.taylor@intel.com>
      Cc: Samantha Alt <samantha.alt@intel.com>
      Cc: Weilin Wang <weilin.wang@intel.com>
      Link: https://lore.kernel.org/r/20240124234200.1510417-1-irogers@google.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
  2. 24 Jan, 2024 10 commits
  3. 23 Jan, 2024 4 commits
  4. 22 Jan, 2024 14 commits
    • perf data: Minor code style alignment cleanup · 57c8f107
      Yang Jihong authored
      Minor code style alignment cleanup for perf_data__switch() and
      perf_data__write().
      
      No functional change.
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240119040304.3708522-4-yangjihong1@huawei.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf record: Check conflict between '--timestamp-filename' option and pipe mode before recording · 02f9b50e
      Yang Jihong authored
      In pipe mode there is no need to switch the perf data output, so the
      '--timestamp-filename' option should not take effect. Check for this
      conflict before recording and print a WARNING. With that check in
      place, the pipe mode check in perf_data__switch() can be removed.
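      The ordering can be sketched as follows. struct record_opts_sketch and
      check_timestamp_filename() are hypothetical names, and whether perf
      also clears the option after warning is an assumption made here; only
      the warning text is taken from this commit's output.

      ```c
      #include <stdbool.h>
      #include <stdio.h>

      /* Hypothetical subset of perf record's options; field names are made up. */
      struct record_opts_sketch {
          bool timestamp_filename;   /* --timestamp-filename was given */
          bool pipe_output;          /* output goes to a pipe, e.g. -o- */
      };

      /*
       * Resolve the conflict once, before recording starts, so the code that
       * switches output files never has to special-case pipe mode.
       */
      static void check_timestamp_filename(struct record_opts_sketch *opts)
      {
          if (opts->timestamp_filename && opts->pipe_output) {
              fprintf(stderr,
                      "WARNING: --timestamp-filename option is not available in pipe mode.\n");
              opts->timestamp_filename = false;   /* assumption: option is dropped */
          }
      }
      ```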
      
      Before:
      
        # perf record --timestamp-filename -o- perf test -w noploop | perf report -i- --percent-limit=1
        # To display the perf.data header info, please use --header/--header-only options.
        #
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Dump -.2024011812110182 ]
        #
        # Total Lost Samples: 0
        #
        # Samples: 4K of event 'cycles:P'
        # Event count (approx.): 2176784359
        #
        # Overhead  Command  Shared Object         Symbol
        # ........  .......  ....................  ......................................
        #
            97.83%  perf     perf                  [.] noploop
      
        #
        # (Tip: Print event counts in CSV format with: perf stat -x,)
        #
      
      After:
      
        # perf record --timestamp-filename -o- perf test -w noploop | perf report -i- --percent-limit=1
        WARNING: --timestamp-filename option is not available in pipe mode.
        # To display the perf.data header info, please use --header/--header-only options.
        #
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.000 MB - ]
        #
        # Total Lost Samples: 0
        #
        # Samples: 4K of event 'cycles:P'
        # Event count (approx.): 2185575421
        #
        # Overhead  Command  Shared Object          Symbol
        # ........  .......  .....................  .............................................
        #
            97.75%  perf     perf                   [.] noploop
      
        #
        # (Tip: Profiling branch (mis)predictions with: perf record -b / perf report)
        #
      
      Fixes: ecfd7a9c ("perf record: Add '--timestamp-filename' option to append timestamp to output file name")
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240119040304.3708522-3-yangjihong1@huawei.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf record: Fix possible incorrect free in record__switch_output() · aff10a16
      Yang Jihong authored
      perf_data__switch() may not assign a valid value to 'new_filename'.
      In that case 'new_filename' keeps whatever value was on the stack,
      which may cause an incorrect free and unexpected results.
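      The general pattern behind this kind of fix, as a hedged sketch:
      initialize the pointer before calling a helper that may not assign it,
      so the later free() is safe on every path. switch_output_sketch() is a
      hypothetical stand-in for perf_data__switch(), not its real code.

      ```c
      #include <stdlib.h>
      #include <string.h>

      /*
       * Hypothetical stand-in for perf_data__switch(): on the failure path it
       * returns without writing *new_filename.
       */
      static int switch_output_sketch(int can_switch, char **new_filename)
      {
          static const char name[] = "perf.data.sketch";   /* made-up file name */
          char *p;

          if (!can_switch)
              return -1;                  /* *new_filename left untouched */
          p = malloc(sizeof(name));
          if (p == NULL)
              return -1;
          memcpy(p, name, sizeof(name));
          *new_filename = p;
          return 0;
      }

      static int record_switch_output_sketch(int can_switch)
      {
          /* Initialize, so the free() below is safe on every path. */
          char *new_filename = NULL;
          int ret = switch_output_sketch(can_switch, &new_filename);

          /* ... use new_filename when ret == 0 ... */
          free(new_filename);             /* free(NULL) is a no-op */
          return ret;
      }
      ```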
      
      Fixes: 03724b2e ("perf record: Allow to limit number of reported perf.data files")
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240119040304.3708522-2-yangjihong1@huawei.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf dwarf-aux: Check allowed DWARF Ops · 55442cc2
      Namhyung Kim authored
      The DWARF location expression can be fairly complex and it'd be hard
      to match it with the condition correctly.  So let's be conservative
      and only allow simple expressions.  For now it just checks the first
      operation in the list.  The following operations look ok:
      
       * DW_OP_stack_value
       * DW_OP_deref_size
       * DW_OP_deref
       * DW_OP_piece
      
      To refuse complex (and unsupported) location expressions, add
      check_allowed_ops() to check the rest of the list.  It seems earlier
      results contained such unsupported expressions.  For example, I found
      a local struct variable placed like below.
      
       <2><43d1517>: Abbrev Number: 62 (DW_TAG_variable)
          <43d1518>   DW_AT_location    : 15 byte block: 91 50 93 8 91 78 93 4 93 84 8 91 68 93 4
              (DW_OP_fbreg: -48; DW_OP_piece: 8;
               DW_OP_fbreg: -8; DW_OP_piece: 4;
               DW_OP_piece: 1028;
               DW_OP_fbreg: -24; DW_OP_piece: 4)
      
      Another example is something like this.
      
          0057c8be ffffffffffffffff ffffffff812109f0 (base address)
          0057c8ce ffffffff812112b5 ffffffff812112c8 (DW_OP_breg3 (rbx): 0;
                                                      DW_OP_constu: 18446744073709551612;
                                                      DW_OP_and;
                                                      DW_OP_stack_value)
      
      These should be refused.  After the change, the stats show:
      
        Annotate data type stats:
        total 294, ok 158 (53.7%), bad 136 (46.3%)
        -----------------------------------------------------------
                30 : no_sym
                32 : no_mem_ops
                53 : no_var
                14 : no_typeinfo
                 7 : bad_offset
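      A simplified sketch of the check described above. struct op_sketch
      stands in for libdw's Dwarf_Op and check_allowed_ops_sketch() is
      illustrative rather than the actual function; the opcode values are
      the standard DWARF encodings. Both location expressions quoted above
      are rejected, because DW_OP_fbreg and DW_OP_constu/DW_OP_and appear
      after the first operation.

      ```c
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>

      /* DWARF expression opcodes (values from the DWARF specification). */
      enum {
          DW_OP_deref       = 0x06,
          DW_OP_constu      = 0x10,
          DW_OP_and         = 0x1a,
          DW_OP_breg3       = 0x73,
          DW_OP_fbreg       = 0x91,
          DW_OP_piece       = 0x93,
          DW_OP_deref_size  = 0x94,
          DW_OP_stack_value = 0x9f,
      };

      /* Simplified stand-in for libdw's Dwarf_Op: opcode plus first operand. */
      struct op_sketch {
          uint8_t atom;
          uint64_t number;
      };

      /*
       * The first op (e.g. DW_OP_fbreg or DW_OP_bregN) is matched elsewhere;
       * every following op must be one that does not move the location.
       */
      static bool check_allowed_ops_sketch(const struct op_sketch *ops, size_t nops)
      {
          for (size_t i = 1; i < nops; i++) {
              switch (ops[i].atom) {
              case DW_OP_stack_value:
              case DW_OP_deref_size:
              case DW_OP_deref:
              case DW_OP_piece:
                  break;
              default:
                  return false;
              }
          }
          return true;
      }
      ```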
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Reviewed-by: Ian Rogers <irogers@google.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20240117062657.985479-10-namhyung@kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf annotate-data: Support stack variables · bc10db8e
      Namhyung Kim authored
      Local variables are allocated on the stack and their location lists
      should look like base register(s) plus an offset.  Extend
      die_find_variable_by_reg() to handle the following expressions:
      
       * DW_OP_breg{0..31}
       * DW_OP_bregx
       * DW_OP_fbreg
      
      Usually DWARF subprogram entries have frame base information and
      use it to locate stack variables like below:
      
       <2><43d1575>: Abbrev Number: 62 (DW_TAG_variable)
          <43d1576>   DW_AT_location    : 2 byte block: 91 7c         (DW_OP_fbreg: -4)  <--- here
          <43d1579>   DW_AT_name        : (indirect string, offset: 0x2c00c9): i
          <43d157d>   DW_AT_decl_file   : 1
          <43d157e>   DW_AT_decl_line   : 78
          <43d157f>   DW_AT_type        : <0x43d19d7>
      
      I found some differences between gcc and clang in how the frame base
      is saved.  gcc uses the CFA to get the base, so it needs to check the
      current frame's CFI info.  In this case, the stack offset needs to be
      adjusted from the start of the CFA.
      
       <1><1bb8d>: Abbrev Number: 102 (DW_TAG_subprogram)
          <1bb8e>   DW_AT_name        : (indirect string, offset: 0x74d41): kernel_init
          <1bb92>   DW_AT_decl_file   : 2
          <1bb92>   DW_AT_decl_line   : 1440
          <1bb94>   DW_AT_decl_column : 18
          <1bb95>   DW_AT_prototyped  : 1
          <1bb95>   DW_AT_type        : <0xcc>
          <1bb99>   DW_AT_low_pc      : 0xffffffff81bab9e0
          <1bba1>   DW_AT_high_pc     : 0x1b2
          <1bba9>   DW_AT_frame_base  : 1 byte block: 9c      (DW_OP_call_frame_cfa)  <------ here
          <1bbab>   DW_AT_call_all_calls: 1
          <1bbab>   DW_AT_sibling     : <0x1bf5a>
      
      Clang, on the other hand, sets it to a register directly, so the
      register and offset in the instruction can be checked directly.
      
       <1><43d1542>: Abbrev Number: 60 (DW_TAG_subprogram)
          <43d1543>   DW_AT_low_pc      : 0xffffffff816a7c60
          <43d154b>   DW_AT_high_pc     : 0x98
          <43d154f>   DW_AT_frame_base  : 1 byte block: 56    (DW_OP_reg6 (rbp))  <---------- here
          <43d1551>   DW_AT_GNU_all_call_sites: 1
          <43d1551>   DW_AT_name        : (indirect string, offset: 0x3bce91): foo
          <43d1555>   DW_AT_decl_file   : 1
          <43d1556>   DW_AT_decl_line   : 75
          <43d1557>   DW_AT_prototyped  : 1
          <43d1557>   DW_AT_type        : <0x43c7332>
          <43d155b>   DW_AT_external    : 1
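      To make the gcc/clang difference concrete, a hedged sketch of turning
      a DW_OP_fbreg offset into a register-relative offset. The types and
      fbreg_to_reg_offset() are hypothetical; the example values assume a
      typical x86-64 frame after "push %rbp; mov %rsp,%rbp", where
      CFA = rbp + 16 (DWARF register 6 is rbp).

      ```c
      #include <stdbool.h>

      /* Hypothetical summary of how a function's DW_AT_frame_base is expressed. */
      enum frame_base_kind {
          FB_CFA,     /* gcc: DW_OP_call_frame_cfa */
          FB_REG,     /* clang: DW_OP_regN, e.g. DW_OP_reg6 (rbp) */
      };

      struct frame_base {
          enum frame_base_kind kind;
          int reg;            /* CFA rule's base register, or the frame-base register */
          long cfa_offset;    /* FB_CFA only: CFA = reg + cfa_offset at this pc */
      };

      /*
       * Convert a DW_OP_fbreg offset into an offset from *reg, so it can be
       * compared with the base register and displacement of a memory operand.
       */
      static bool fbreg_to_reg_offset(const struct frame_base *fb, long fbreg_offset,
                                      int *reg, long *offset)
      {
          switch (fb->kind) {
          case FB_CFA:
              *reg = fb->reg;
              *offset = fb->cfa_offset + fbreg_offset;  /* adjust from the CFA */
              return true;
          case FB_REG:
              *reg = fb->reg;
              *offset = fbreg_offset;                   /* already register-relative */
              return true;
          }
          return false;
      }
      ```

      With a gcc-style frame base, DW_OP_fbreg: -48 becomes -32(%rbp); with
      clang's DW_OP_reg6 base, DW_OP_fbreg: -4 is simply -4(%rbp).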
      
      It also needs to update the offset after finding the type, as with
      global variables, since the offset was relative to the frame base.
      Factor out match_var_offset() to check global and local variables in
      the same way.
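      A simplified sketch of the range check such a helper performs,
      assuming the access offset and the variable start are already in the
      same space (absolute addresses for globals, frame-base offsets for
      locals). match_var_offset_sketch() and its signature are hypothetical.

      ```c
      #include <stdbool.h>
      #include <stdint.h>

      /*
       * Does an access at addr_offset fall inside a variable of var_size bytes
       * starting at var_start?  On success, report the offset into its type.
       */
      static bool match_var_offset_sketch(int64_t addr_offset, int64_t var_start,
                                          uint64_t var_size, int64_t *type_offset)
      {
          if (addr_offset < var_start)
              return false;
          if ((uint64_t)(addr_offset - var_start) >= var_size)
              return false;
          /* Offset relative to the start of the variable, not the frame base. */
          *type_offset = addr_offset - var_start;
          return true;
      }
      ```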
      
      The type stats are improved too:
      
        Annotate data type stats:
        total 294, ok 160 (54.4%), bad 134 (45.6%)
        -----------------------------------------------------------
                30 : no_sym
                32 : no_mem_ops
                51 : no_var
                14 : no_typeinfo
                 7 : bad_offset
      Reviewed-by: Ian Rogers <irogers@google.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Link: https://lore.kernel.org/r/20240117062657.985479-9-namhyung@kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf dwarf-aux: Add die_get_cfa() · 6fed025f
      Namhyung Kim authored
      die_get_cfa() gets the frame base register and offset at the given
      instruction address (pc).  This info will be used to locate stack
      variables whose location expressions use DW_OP_fbreg.
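      As a rough illustration of "frame base register and offset at a pc",
      here is a lookup over pre-decoded CFI rows. The row table and
      get_cfa_sketch() are hypothetical stand-ins; a real implementation
      would evaluate the frame's call frame information (for example through
      libdw) instead. The sample rows assume a standard x86-64 rbp prologue
      (DWARF register 7 is rsp, 6 is rbp).

      ```c
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical pre-decoded CFI row: for pc in [start, end), CFA = reg + offset. */
      struct cfa_row {
          uint64_t start, end;
          int reg;
          int64_t offset;
      };

      static bool get_cfa_sketch(const struct cfa_row *rows, size_t nrows, uint64_t pc,
                                 int *reg, int64_t *offset)
      {
          for (size_t i = 0; i < nrows; i++) {
              if (pc >= rows[i].start && pc < rows[i].end) {
                  *reg = rows[i].reg;
                  *offset = rows[i].offset;
                  return true;
              }
          }
          return false;   /* no CFI covers this pc */
      }
      ```

      For a function starting at 0x1000 with "push %rbp" (1 byte) and
      "mov %rsp,%rbp" (3 bytes), the rows would be rsp+8 at 0x1000, rsp+16
      at 0x1001, and rbp+16 from 0x1004 on.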
      Reviewed-by: Ian Rogers <irogers@google.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Link: https://lore.kernel.org/r/20240117062657.985479-8-namhyung@kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf annotate-data: Support global variables · 5f7cdde8
      Namhyung Kim authored
      Global variables are accessed using PC-relative addressing, so they
      need to be handled separately.  PC-relative addressing is detected
      using DWARF_REG_PC.  On x86, the %rip register would be used.
      
      The address can be calculated using the ip and the offset in the
      instruction.  But the offset is relative to the next instruction, so
      add calculate_pcrel_addr() to do it properly.
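      The arithmetic, as a minimal sketch assuming x86-64 RIP-relative
      operands with a signed 32-bit displacement; pcrel_addr_sketch() is a
      hypothetical helper, not calculate_pcrel_addr() itself. For a 7-byte
      instruction at 0xffffffff81000100 with displacement 0x1234, the target
      is 0xffffffff81000107 + 0x1234 = 0xffffffff8100133b.

      ```c
      #include <stdint.h>

      /*
       * A RIP-relative operand such as 0x1234(%rip) is relative to the address
       * of the *next* instruction, not the current one.
       */
      static uint64_t pcrel_addr_sketch(uint64_t insn_addr, unsigned int insn_len,
                                        int32_t disp)
      {
          uint64_t next_insn = insn_addr + insn_len;

          /* Sign-extend the displacement; unsigned wrap-around handles negatives. */
          return next_insn + (uint64_t)(int64_t)disp;
      }
      ```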
      
      But global variables defined in a different file would only have a
      declaration, which doesn't include a location list.  So it first tries
      to get the type info using the address, and then looks up the variable
      declarations using the name.  The names of global variables should be
      obtained from the symbol table.  The declaration would have the type
      info.
      
      So extend find_var_type() to take both address and name for global
      variables.
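      The two-step lookup can be sketched like this. struct var_info and
      find_global_type() are hypothetical, and the address match uses the
      exact start address for brevity.

      ```c
      #include <stddef.h>
      #include <stdint.h>
      #include <string.h>

      /* Hypothetical debug-info record for this sketch. */
      struct var_info {
          const char *name;
          uint64_t addr;          /* 0 for a declaration without a location */
          const char *type_name;
      };

      /*
       * First match a definition by address; if that fails, fall back to the
       * name from the symbol table, which also matches declarations from
       * other files.
       */
      static const char *find_global_type(const struct var_info *vars, size_t n,
                                          uint64_t addr, const char *sym_name)
      {
          for (size_t i = 0; i < n; i++) {
              if (vars[i].addr != 0 && vars[i].addr == addr)
                  return vars[i].type_name;
          }
          if (sym_name == NULL)
              return NULL;
          for (size_t i = 0; i < n; i++) {
              if (strcmp(vars[i].name, sym_name) == 0)
                  return vars[i].type_name;
          }
          return NULL;
      }
      ```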
      
      The stats now look like:
      
        Annotate data type stats:
        total 294, ok 153 (52.0%), bad 141 (48.0%)
        -----------------------------------------------------------
                30 : no_sym
                32 : no_mem_ops
                61 : no_var
                10 : no_typeinfo
                 8 : bad_offset
      Reviewed-by: Ian Rogers <irogers@google.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Link: https://lore.kernel.org/r/20240117062657.985479-7-namhyung@kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf annotate-data: Handle PC-relative addressing · 83bfa06d
      Namhyung Kim authored
      Extend find_data_type_die() to find the data type from a PC-relative
      address using die_find_variable_by_addr().  Users need to pass the
      address of the (global) variable.

      The offset for the variable should be updated after finding the type
      because the offset in the instruction is only used to calculate the
      address of the variable.  So pass a pointer to the offset instead and
      rename it to 'poffset'.
      
      First it searches variables in the CU DIE, as global variables are
      likely defined at file level.  Then it iterates over the scope DIEs to
      find a local (static) variable.
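      A hedged sketch of that search order and of the offset update through
      a pointer. The scope and variable structures are hypothetical
      simplifications of the DIE tree; on success *poffset changes from the
      instruction's displacement to the offset inside the variable.

      ```c
      #include <stddef.h>
      #include <stdint.h>

      struct var_sketch {
          uint64_t addr;
          uint64_t size;
          const char *type_name;
      };

      /* Hypothetical scope: the CU itself or a nested function/block scope. */
      struct scope_sketch {
          const struct var_sketch *vars;
          size_t nvars;
      };

      static const struct var_sketch *find_in_scope(const struct scope_sketch *s,
                                                    uint64_t addr)
      {
          for (size_t i = 0; i < s->nvars; i++) {
              const struct var_sketch *v = &s->vars[i];

              if (addr >= v->addr && addr - v->addr < v->size)
                  return v;
          }
          return NULL;
      }

      /* Search the CU first (file-level globals), then inner scopes (statics). */
      static const char *find_type_by_addr(const struct scope_sketch *cu,
                                           const struct scope_sketch *scopes,
                                           size_t nscopes, uint64_t addr, int *poffset)
      {
          const struct var_sketch *v = find_in_scope(cu, addr);

          for (size_t i = 0; v == NULL && i < nscopes; i++)
              v = find_in_scope(&scopes[i], addr);
          if (v == NULL)
              return NULL;
          *poffset = (int)(addr - v->addr);   /* now relative to the variable */
          return v->type_name;
      }
      ```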
      Reviewed-by: Ian Rogers <irogers@google.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Link: https://lore.kernel.org/r/20240117062657.985479-6-namhyung@kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf annotate-data: Add stack operation pseudo type · 7a54f1d8
      Namhyung Kim authored
      A typical function prologue and epilogue include multiple stack
      operations to save and restore the current value of registers.
      On x86, it looks like below:
      
        push  r15
        push  r14
        push  r13
        push  r12
      
        ...
      
        pop   r12
        pop   r13
        pop   r14
        pop   r15
        ret
      
      As these all touch the stack memory region, chances are high that they
      appear in memory profile data.  But they are not used for any real
      purpose yet, so no type would be returned for them.
      
      One of my profiles shows that a non-negligible portion of the data
      came from stack operations.  It also seems GCC generates more stack
      operations than clang.
      
      Annotate Instruction stats
      total 264, ok 169 (64.0%), bad 95 (36.0%)
      
          Name      :  Good   Bad
        -----------------------------------------------------------
          movq      :    49    27
          movl      :    24     9
          popq      :     0    19   <-- here
          cmpl      :    17     2
          addq      :    14     1
          cmpq      :    12     2
          cmpxchgl  :     3     7
      
      Instead of treating them as unknown, let's create a separate pseudo
      type to represent those stack operations.
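      One way to recognize these instructions, as a sketch: the pseudo type
      name and the helper functions are hypothetical. Matching only
      push/pop with an optional AT&T size suffix avoids misclassifying
      instructions such as popcnt.

      ```c
      #include <stdbool.h>
      #include <string.h>

      /* Hypothetical label for the pseudo type. */
      #define STACKOP_TYPE_NAME "(stack operation)"

      /* Match push/pop with an optional AT&T size suffix (pushq, popl, ...). */
      static bool is_stack_op(const char *mnemonic)
      {
          const char *rest;

          if (strncmp(mnemonic, "push", 4) == 0)
              rest = mnemonic + 4;
          else if (strncmp(mnemonic, "pop", 3) == 0)
              rest = mnemonic + 3;
          else
              return false;

          if (rest[0] == '\0')
              return true;
          return strchr("wlq", rest[0]) != NULL && rest[1] == '\0';
      }

      /* The pseudo type for stack ops, or NULL to fall back to a DWARF lookup. */
      static const char *stackop_pseudo_type(const char *mnemonic)
      {
          return is_stack_op(mnemonic) ? STACKOP_TYPE_NAME : NULL;
      }
      ```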
      Reviewed-by: Ian Rogers <irogers@google.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Link: https://lore.kernel.org/r/20240117062657.985479-5-namhyung@kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf annotate-data: Handle array style accesses · d3030191
      Namhyung Kim authored
      On x86, instructions for array access often look like below.
      
        mov  0x1234(%rax,%rbx,8), %rcx
      
      Usually the first register holds the type information and the second
      one has the index, and the current code only looks up a variable for
      the first register.  But the roles can be the other way around, so it
      needs to check the second register if the first one fails.
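      The fallback order, as a minimal sketch with hypothetical types; the
      register-to-type bindings stand in for the DWARF variable lookup.

      ```c
      #include <stddef.h>
      #include <string.h>

      /* Hypothetical memory operand: disp(base,index,scale), e.g. 0x1234(%rax,%rbx,8). */
      struct mem_operand {
          int disp;
          const char *base;
          const char *index;      /* NULL if there is no index register */
          int scale;
      };

      /* Hypothetical register -> variable-type binding valid at the sampled pc. */
      struct reg_binding {
          const char *reg;
          const char *type_name;
      };

      static const char *lookup_reg(const struct reg_binding *b, size_t n, const char *reg)
      {
          for (size_t i = 0; reg != NULL && i < n; i++) {
              if (strcmp(b[i].reg, reg) == 0)
                  return b[i].type_name;
          }
          return NULL;
      }

      /* Try the base register first; if no variable is found, try the index. */
      static const char *find_array_access_type(const struct mem_operand *op,
                                                const struct reg_binding *b, size_t n)
      {
          const char *type = lookup_reg(b, n, op->base);

          if (type == NULL)
              type = lookup_reg(b, n, op->index);
          return type;
      }
      ```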
      
      The stats changed like this:
      
        Annotate data type stats:
        total 294, ok 148 (50.3%), bad 146 (49.7%)
        -----------------------------------------------------------
                30 : no_sym
                32 : no_mem_ops
                66 : no_var
                10 : no_typeinfo
                 8 : bad_offset
      Reviewed-by: Ian Rogers <irogers@google.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Link: https://lore.kernel.org/r/20240117062657.985479-4-namhyung@kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf annotate-data: Handle macro fusion on x86 · 1cf4df03
      Namhyung Kim authored
      When a sample comes from a conditional branch without a memory
      operand, it could be due to macro fusion with the previous
      instruction.  So it needs to check the memory operand of the previous
      one.
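      A simplified sketch of the fallback, assuming instructions are already
      parsed into a hypothetical struct insn_sketch. Treating any "j"
      mnemonic other than jmp/jmpq as a conditional branch is an
      approximation made here.

      ```c
      #include <stdbool.h>
      #include <stddef.h>
      #include <string.h>

      /* Hypothetical, already-parsed instruction. */
      struct insn_sketch {
          const char *mnemonic;
          bool has_mem_operand;
      };

      /* Approximation: jcc mnemonics start with 'j'; jmp/jmpq are unconditional. */
      static bool is_cond_branch(const char *mnemonic)
      {
          return mnemonic[0] == 'j' && strncmp(mnemonic, "jmp", 3) != 0;
      }

      /*
       * Pick the instruction whose memory operand should be used for a sample
       * at insns[idx]; a conditional branch without one may have been fused
       * with the cmp/test right before it.
       */
      static const struct insn_sketch *mem_insn_for_sample(const struct insn_sketch *insns,
                                                           size_t idx)
      {
          const struct insn_sketch *in = &insns[idx];

          if (in->has_mem_operand)
              return in;
          if (is_cond_branch(in->mnemonic) && idx > 0 && insns[idx - 1].has_mem_operand)
              return &insns[idx - 1];
          return NULL;
      }
      ```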
      
      This improves the stat like below:
      
        Annotate data type stats:
        total 294, ok 147 (50.0%), bad 147 (50.0%)
        -----------------------------------------------------------
                30 : no_sym
                32 : no_mem_ops
                71 : no_var
                 6 : no_typeinfo
                 8 : bad_offset
      Reviewed-by: Ian Rogers <irogers@google.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Link: https://lore.kernel.org/r/20240117062657.985479-3-namhyung@kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf annotate-data: Parse 'lock' prefix from llvm-objdump · a3397d69
      Namhyung Kim authored
      For performance reasons, I prefer llvm-objdump over GNU objdump.  But
      I found that llvm-objdump puts the x86 lock prefix on a separate line
      like below.
      
        ffffffff81000695: f0                    lock
        ffffffff81000696: ff 83 54 0b 00 00     incl    2900(%rbx)
      
      This should be parsed properly, but for now I just changed it to find
      the insn at the next offset.
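      The workaround, as a sketch over hypothetical parsed objdump lines:
      when the sampled offset lands on a lone "lock" line, use the next line
      instead. find_insn_sketch() mirrors the prose, not perf's actual
      parser.

      ```c
      #include <stddef.h>
      #include <stdint.h>
      #include <string.h>

      /* Hypothetical parsed objdump line. */
      struct dline {
          uint64_t offset;
          const char *mnemonic;
          const char *operands;
      };

      /* Find the instruction at 'offset', skipping a lone "lock" prefix line. */
      static const struct dline *find_insn_sketch(const struct dline *lines, size_t n,
                                                  uint64_t offset)
      {
          for (size_t i = 0; i < n; i++) {
              if (lines[i].offset != offset)
                  continue;
              if (strcmp(lines[i].mnemonic, "lock") == 0 && i + 1 < n)
                  return &lines[i + 1];   /* the locked insn is on the next line */
              return &lines[i];
          }
          return NULL;
      }
      ```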
      
      This improves the statistics as it can process more instructions.
      
        Annotate data type stats:
        total 294, ok 144 (49.0%), bad 150 (51.0%)
        -----------------------------------------------------------
                30 : no_sym
                35 : no_mem_ops
                71 : no_var
                 6 : no_typeinfo
                 8 : bad_offset
      Reviewed-by: Ian Rogers <irogers@google.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Link: https://lore.kernel.org/r/20240117062657.985479-2-namhyung@kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf build: Check whether pkg-config is installed when libtraceevent is linked · 8462247f
      Yang Jihong authored
      If pkg-config is not installed when libtraceevent is linked, the build fails.
      
      The error information is as follows:
      
        $ make
        <SNIP>
        In file included from /home/yjh/projects_linux/perf-tool-next/linux/tools/perf/util/evsel.c:43:
        /home/yjh/projects_linux/perf-tool-next/linux/tools/perf/util/trace-event.h:149:62: error: operator '&&' has no right operand
          149 | #if defined(LIBTRACEEVENT_VERSION) &&  LIBTRACEEVENT_VERSION >= MAKE_LIBTRACEEVENT_VERSION(1, 5, 0)
              |                                                              ^~
        error: command '/usr/bin/gcc' failed with exit code 1
        cp: cannot stat 'python_ext_build/lib/perf*.so': No such file or directory
        make[2]: *** [Makefile.perf:668: python/perf.cpython-310-x86_64-linux-gnu.so] Error 1
        make[2]: *** Waiting for unfinished jobs....
      
      Because pkg-config is not installed, Makefile.config fails to get the
      libtraceevent version, so LIBTRACEEVENT_VERSION is empty.  The
      resulting error message, however, is not user-friendly.
      
      Catch this earlier by checking at build time that pkg-config is
      installed.
      
      The build results of various scenarios are as follows:
      
      1. build succeeds when libtraceevent is not linked and pkg-config is not installed
      
        $ pkg-config --version
        -bash: /usr/bin/pkg-config: No such file or directory
        $ make clean >/dev/null
        $ make NO_LIBTRACEEVENT=1 >/dev/null
        Makefile.config:1133: No alternatives command found, you need to set JDIR= to point to the root of your Java directory
          PERF_VERSION = 6.7.rc6.gd988c9f5
        $ echo $?
        0
      
      2. build fails with a clear error when libtraceevent is linked and pkg-config is not installed
      
        $ pkg-config --version
        -bash: /usr/bin/pkg-config: No such file or directory
        $ make clean >/dev/null
        $ make >/dev/null
        Makefile.config:221: *** Error: pkg-config needed by libtraceevent is missing on this system, please install it.  Stop.
        make[1]: *** [Makefile.perf:251: sub-make] Error 2
        make: *** [Makefile:70: all] Error 2
        $ echo $?
        2
      
      3. build succeeds when libtraceevent is linked and pkg-config is installed
      
        $ pkg-config --version
        0.29.2
        $ make clean >/dev/null
        $ make >/dev/null
        Makefile.config:1133: No alternatives command found, you need to set JDIR= to point to the root of your Java directory
          PERF_VERSION = 6.7.rc6.gd988c9f5
        $ echo $?
        0
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240112034019.3558584-1-yangjihong1@huawei.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf test: raise limit to 20 percent for perf_stat_--bpf-counters_test · 999eea92
      Thomas Richter authored
      This test case often fails on s390 (about 2 out of 10 runs) because
      the difference between --bpf-counters event counting and s390
      hardware counting exceeds the 10% limit in all failure cases.
      Raise the limit to 20% on s390 and the test case succeeds.
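      For illustration only, the relative-difference check expressed in C.
      The test itself is not shown here; within_limit() and its parameters
      are hypothetical.

      ```c
      #include <stdbool.h>

      /*
       * Is the --bpf-counters count within limit_pct percent of the count
       * obtained with regular (hardware) counting?
       */
      static bool within_limit(double base_count, double bpf_count, double limit_pct)
      {
          double diff = bpf_count > base_count ? bpf_count - base_count
                                               : base_count - bpf_count;

          if (base_count == 0.0)
              return bpf_count == 0.0;
          return diff * 100.0 / base_count <= limit_pct;
      }
      ```

      A run that differs by 15% fails with a 10% limit but passes with 20%.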
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: gor@linux.ibm.com
      Cc: hca@linux.ibm.com
      Cc: sumanthk@linux.ibm.com
      Cc: svens@linux.ibm.com
      Link: https://lore.kernel.org/r/20240108084009.3959211-1-tmricht@linux.ibm.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
  5. 21 Jan, 2024 11 commits