  1. 04 Oct, 2022 3 commits
    • perf report: Show per-event LOST SAMPLES stat · d7ba22d4
      Namhyung Kim authored
      Display lost samples with --stat (if not zero):
      
        $ perf report --stat
          Aggregated stats:
                   TOTAL events:         64
                    COMM events:          2  ( 3.1%)
                    EXIT events:          1  ( 1.6%)
                  SAMPLE events:         26  (40.6%)
                   MMAP2 events:          4  ( 6.2%)
            LOST_SAMPLES events:          1  ( 1.6%)
                    ATTR events:          2  ( 3.1%)
          FINISHED_ROUND events:          1  ( 1.6%)
                ID_INDEX events:          1  ( 1.6%)
              THREAD_MAP events:          1  ( 1.6%)
                 CPU_MAP events:          1  ( 1.6%)
            EVENT_UPDATE events:          2  ( 3.1%)
               TIME_CONV events:          1  ( 1.6%)
                 FEATURE events:         20  (31.2%)
           FINISHED_INIT events:          1  ( 1.6%)
        cycles:uH stats:
                  SAMPLE events:         14
            LOST_SAMPLES events:          1
        instructions:uH stats:
                  SAMPLE events:         12
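      A minimal sketch of the display rule described in this commit, assuming a
      simplified per-event counter struct. The names 'evsel_stats' and
      'format_evsel_stats' are hypothetical and are not perf's internal API. The
      LOST_SAMPLES line is printed for an event only when its count is non-zero,
      which is why the 'instructions:uH' block above has no such line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-event counters; not perf's actual hists_stats layout. */
struct evsel_stats {
	const char *name;             /* event name, e.g. "cycles:uH" */
	unsigned int nr_samples;      /* SAMPLE records attributed to this event */
	unsigned int nr_lost_samples; /* lost-sample count attributed to this event */
};

/* Print one event's stats block into buf, skipping the LOST_SAMPLES line
 * when its count is zero. Returns the number of characters that the full
 * output needs (>= size means it was truncated), or a negative value. */
static int format_evsel_stats(char *buf, size_t size, const struct evsel_stats *st)
{
	int n = snprintf(buf, size, "%s stats:\n%20s events: %10u\n",
			 st->name, "SAMPLE", st->nr_samples);

	if (n < 0 || (size_t)n >= size)
		return n;
	if (st->nr_lost_samples) {
		int m = snprintf(buf + n, size - (size_t)n, "%20s events: %10u\n",
				 "LOST_SAMPLES", st->nr_lost_samples);
		if (m < 0)
			return m;
		n += m;
	}
	return n;
}
```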
      Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220901195739.668604-6-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf hist: Add nr_lost_samples to hist_stats · 75b37db0
      Namhyung Kim authored
      This is in preparation for displaying accurate lost sample counts for
      each evsel.
      Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220901195739.668604-5-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf hist: Update use of pthread mutex · 8e03bb88
      Ian Rogers authored
      Switch to the use of mutex wrappers that provide better error checking.
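      A minimal sketch of what an error-checking mutex wrapper can look like,
      using only standard POSIX calls. The 'checked_mutex' names are
      hypothetical and are not perf's actual wrapper API; the idea shown is to
      request PTHREAD_MUTEX_ERRORCHECK and abort on any non-zero return code
      instead of silently ignoring it.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper type around a plain pthread mutex. */
struct checked_mutex {
	pthread_mutex_t lock;
};

/* Abort with a message if a pthread call returned an error number. */
static void check_err(const char *op, int err)
{
	if (err) {
		fprintf(stderr, "%s failed: %s\n", op, strerror(err));
		abort();
	}
}

static void checked_mutex_init(struct checked_mutex *mtx)
{
	pthread_mutexattr_t attr;

	check_err("pthread_mutexattr_init", pthread_mutexattr_init(&attr));
	/* An error-checking mutex reports relocking by the owner and unlocking
	 * by a non-owner as errors instead of deadlocking or corrupting state. */
	check_err("pthread_mutexattr_settype",
		  pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK));
	check_err("pthread_mutex_init", pthread_mutex_init(&mtx->lock, &attr));
	check_err("pthread_mutexattr_destroy", pthread_mutexattr_destroy(&attr));
}

static void checked_mutex_lock(struct checked_mutex *mtx)
{
	check_err("pthread_mutex_lock", pthread_mutex_lock(&mtx->lock));
}

static void checked_mutex_unlock(struct checked_mutex *mtx)
{
	check_err("pthread_mutex_unlock", pthread_mutex_unlock(&mtx->lock));
}

static void checked_mutex_destroy(struct checked_mutex *mtx)
{
	check_err("pthread_mutex_destroy", pthread_mutex_destroy(&mtx->lock));
}
```

      With the default mutex type, unlocking an already-unlocked mutex is
      undefined behavior; the error-checking type turns it into an EPERM return
      that a wrapper like this can catch.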
      Signed-off-by: Ian Rogers <irogers@google.com>
      Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexandre Truong <alexandre.truong@arm.com>
      Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andres Freund <andres@anarazel.de>
      Cc: Andrii Nakryiko <andrii@kernel.org>
      Cc: André Almeida <andrealmeid@igalia.com>
      Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Cc: Colin Ian King <colin.king@intel.com>
      Cc: Dario Petrillo <dario.pk1@gmail.com>
      Cc: Darren Hart <dvhart@infradead.org>
      Cc: Dave Marchevsky <davemarchevsky@fb.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Fangrui Song <maskray@google.com>
      Cc: Hewenliang <hewenliang4@huawei.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jason Wang <wangborong@cdjrlc.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kim Phillips <kim.phillips@amd.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin Liška <mliska@suse.cz>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Pavithra Gurushankar <gpavithrasha@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Monnet <quentin@isovalent.com>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Remi Bernon <rbernon@codeweavers.com>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Tom Rix <trix@redhat.com>
      Cc: Weiguo Li <liwg06@foxmail.com>
      Cc: Wenyu Liu <liuwenyu7@huawei.com>
      Cc: William Cohen <wcohen@redhat.com>
      Cc: Zechuan Chen <chenzechuan1@huawei.com>
      Cc: bpf@vger.kernel.org
      Cc: llvm@lists.linux.dev
      Cc: yaowenbin <yaowenbin1@huawei.com>
      Link: https://lore.kernel.org/r/20220826164242.43412-5-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 16 Feb, 2022 1 commit
    • perf report: Add "addr_from" and "addr_to" sort dimensions · 05274770
      Stephane Eranian authored
      With the existing symbol_from/symbol_to, branches captured in the same
      function would be collapsed into a single entry if the latencies
      (cycles) associated with each branch were all the same.  That is the
      case on Intel Broadwell, for instance. Since Intel Skylake, the latency
      is captured by hardware and is therefore used to disambiguate branches.
      
      Add addr_from/addr_to sort dimensions to sort branches based on their
      addresses rather than the function they are in. The output is still the
      function name, but the offset within the function is provided to
      uniquely identify each branch.  These new sort dimensions also help with
      annotate because they create different entries in the histogram which,
      in turn, generate proper branch annotations.
      
      Here is an example using AMD's branch sampling:
      
        $ perf record -a -b -c 1000037 -e cpu/branch-brs/ test_prg
      
        $ perf report
        Samples: 6M of event 'cpu/branch-brs/', Event count (approx.): 6901276
        Overhead  Command          Source Shared Object  Source Symbol                                   Target Symbol                                   Basic Block Cycle
          99.65%  test_prg         test_prg              [.] test_thread                                 [.] test_thread                                 -
           0.02%  test_prg         [kernel.vmlinux]      [k] asm_sysvec_apic_timer_interrupt             [k] error_entry                                 -
      
        $ perf report -F overhead,comm,dso,addr_from,addr_to
        Samples: 6M of event 'cpu/branch-brs/', Event count (approx.): 6901276
        Overhead  Command          Shared Object     Source Address          Target Address
           4.22%  test_prg         test_prg          [.] test_thread+0x3c    [.] test_thread+0x4
           4.13%  test_prg         test_prg          [.] test_thread+0x4     [.] test_thread+0x3a
           4.09%  test_prg         test_prg          [.] test_thread+0x3a    [.] test_thread+0x6
           4.08%  test_prg         test_prg          [.] test_thread+0x2     [.] test_thread+0x3c
           4.06%  test_prg         test_prg          [.] test_thread+0x3e    [.] test_thread+0x2
           3.87%  test_prg         test_prg          [.] test_thread+0x6     [.] test_thread+0x38
           3.84%  test_prg         test_prg          [.] test_thread         [.] test_thread+0x3e
           3.76%  test_prg         test_prg          [.] test_thread+0x1e    [.] test_thread
           3.76%  test_prg         test_prg          [.] test_thread+0x38    [.] test_thread+0x8
           3.56%  test_prg         test_prg          [.] test_thread+0x22    [.] test_thread+0x1e
           3.54%  test_prg         test_prg          [.] test_thread+0x8     [.] test_thread+0x36
           3.47%  test_prg         test_prg          [.] test_thread+0x1c    [.] test_thread+0x22
           3.45%  test_prg         test_prg          [.] test_thread+0x36    [.] test_thread+0xa
           3.28%  test_prg         test_prg          [.] test_thread+0x24    [.] test_thread+0x1c
           3.25%  test_prg         test_prg          [.] test_thread+0xa     [.] test_thread+0x34
           3.24%  test_prg         test_prg          [.] test_thread+0x1a    [.] test_thread+0x24
           3.20%  test_prg         test_prg          [.] test_thread+0x34    [.] test_thread+0xc
           3.04%  test_prg         test_prg          [.] test_thread+0x26    [.] test_thread+0x1a
           3.01%  test_prg         test_prg          [.] test_thread+0xc     [.] test_thread+0x32
           2.98%  test_prg         test_prg          [.] test_thread+0x18    [.] test_thread+0x26
           2.94%  test_prg         test_prg          [.] test_thread+0x32    [.] test_thread+0xe
           2.76%  test_prg         test_prg          [.] test_thread+0x28    [.] test_thread+0x18
           2.73%  test_prg         test_prg          [.] test_thread+0xe     [.] test_thread+0x30
           2.67%  test_prg         test_prg          [.] test_thread+0x30    [.] test_thread+0x10
           2.67%  test_prg         test_prg          [.] test_thread+0x16    [.] test_thread+0x28
           2.46%  test_prg         test_prg          [.] test_thread+0x10    [.] test_thread+0x2e
           2.44%  test_prg         test_prg          [.] test_thread+0x2a    [.] test_thread+0x16
           2.38%  test_prg         test_prg          [.] test_thread+0x14    [.] test_thread+0x2a
           2.32%  test_prg         test_prg          [.] test_thread+0x2e    [.] test_thread+0x12
           2.28%  test_prg         test_prg          [.] test_thread+0x12    [.] test_thread+0x2c
           2.16%  test_prg         test_prg          [.] test_thread+0x2c    [.] test_thread+0x14
           0.02%  test_prg         [kernel.vmlinux]  [k] asm_sysvec_apic_ti+0x5  [k] error_entry
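      The difference between the two kinds of sort keys can be sketched as two
      comparators over a simplified branch record. The struct and function
      names below are hypothetical and are not perf's internals. Two branches
      inside 'test_thread' compare equal under a symbol-only key, which is what
      collapses them, but stay distinct under an address key; the formatter
      renders the 'sym+0xoff' form seen in the report above.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified branch record. */
struct branch {
	const char *from_sym;    /* symbol containing the source address */
	uint64_t from_sym_start; /* start address of that symbol */
	uint64_t from_addr;
	const char *to_sym;      /* symbol containing the target address */
	uint64_t to_sym_start;
	uint64_t to_addr;
};

/* symbol_from/symbol_to-style key: only the function names matter. */
static int cmp_by_symbol(const struct branch *a, const struct branch *b)
{
	int r = strcmp(a->from_sym, b->from_sym);

	return r ? r : strcmp(a->to_sym, b->to_sym);
}

/* addr_from/addr_to-style key: the exact source and target addresses. */
static int cmp_by_addr(const struct branch *a, const struct branch *b)
{
	if (a->from_addr != b->from_addr)
		return a->from_addr < b->from_addr ? -1 : 1;
	if (a->to_addr != b->to_addr)
		return a->to_addr < b->to_addr ? -1 : 1;
	return 0;
}

/* Render "sym+0xoff", or just "sym" when the address is the symbol start. */
static void format_addr(char *buf, size_t size, const char *sym,
			uint64_t start, uint64_t addr)
{
	uint64_t off = addr - start;

	if (off)
		snprintf(buf, size, "%s+%#" PRIx64, sym, off);
	else
		snprintf(buf, size, "%s", sym);
}
```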
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kim Phillips <kim.phillips@amd.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Link: http://lore.kernel.org/lkml/20220208211637.2221872-13-eranian@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 10 Jan, 2022 1 commit
  4. 18 Nov, 2021 3 commits
    • perf sort: Fix the 'p_stage_cyc' sort key behavior · db4b2840
      Namhyung Kim authored
      Handle the 'p_stage_cyc' (pipeline stage cycles) sort key with the same
      rationale as 'weight' and 'local_weight'; see the 'weight' fix in this
      series for a full explanation.
      
      I'm not sure whether it also needs the local and global variants.
      
      I couldn't actually test it because I don't have access to such a machine.
      Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20211105225617.151364-3-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf sort: Fix the 'ins_lat' sort key behavior · 4d03c753
      Namhyung Kim authored
      Handle the 'ins_lat' (instruction latency) and 'local_ins_lat' sort keys
      with the same rationale as 'weight' and 'local_weight'; see the
      previous fix in this series for a full explanation.
      
      I couldn't actually test it, so it is only build-tested.
      Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20211105225617.151364-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf sort: Fix the 'weight' sort key behavior · 784e8add
      Namhyung Kim authored
      Currently, the 'weight' field in the perf sample has latency information
      for some instructions, such as memory accesses.  The perf tool has
      'weight' and 'local_weight' sort keys to display this info.
      
      But it's somewhat unclear what exactly it shows.  In my understanding,
      'local_weight' shows the weight of a single sample, and (global) 'weight'
      shows the sum of the weights in the hist_entry.
      
      For example:
      
        $ perf mem record -t load dd if=/dev/zero of=/dev/null bs=4k count=1M
      
        $ perf report --stdio -n -s +local_weight
        ...
        #
        # Overhead  Samples  Command  Shared Object     Symbol                     Local Weight
        # ........  .......  .......  ................  .........................  ............
        #
            21.23%      313  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   32
            12.43%      183  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   35
            11.97%      159  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   36
            10.40%      141  dd       [kernel.vmlinux]  [k] lockref_put_return     32
             7.63%      113  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   33
             6.37%       92  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   34
             6.15%       90  dd       [kernel.vmlinux]  [k] lockref_put_return     33
        ...
      
      So let's look at the 'lockref_get_not_zero' symbols.  The top entry
      shows that 313 samples were captured with 'local_weight' 32, so the
      total weight should be 313 x 32 = 10016.  But it's not the case:
      
        $ perf report --stdio -n -s +local_weight,weight -S lockref_get_not_zero
        ...
        #
        # Overhead  Samples  Command  Shared Object     Local Weight  Weight
        # ........  .......  .......  ................  ............  ......
        #
             1.36%        4  dd       [kernel.vmlinux]  36            144
             0.47%        4  dd       [kernel.vmlinux]  37            148
             0.42%        4  dd       [kernel.vmlinux]  32            128
             0.40%        4  dd       [kernel.vmlinux]  34            136
             0.35%        4  dd       [kernel.vmlinux]  36            144
             0.34%        4  dd       [kernel.vmlinux]  35            140
             0.30%        4  dd       [kernel.vmlinux]  36            144
             0.30%        4  dd       [kernel.vmlinux]  34            136
             0.30%        4  dd       [kernel.vmlinux]  32            128
             0.30%        4  dd       [kernel.vmlinux]  32            128
        ...
      
      With the 'weight' sort key, the samples are split into entries of 4,
      even with the same info ('comm', 'dso', 'sym' and 'local_weight').  I
      don't think this is what we want.
      
      I found that this is caused by the way it aggregates the 'weight' value.
      Since it's not a period, we should not add it to he->stat.  Otherwise,
      two entries with 'weight' 32 will create an entry with 'weight' 64.
      
      After that, new samples with 'weight' 32 don't have a matching entry, so
      they create a new entry, which becomes a 'weight' 64 entry again and
      again.  Later, during hists__collapse_resort(), these are merged into
      'weight' 128 entries with 4 samples each, multiple times, as shown above.
      
      Let's keep the weight and display it differently.  For 'local_weight',
      it can show the weight as is, and for (global) 'weight' it can display
      the number multiplied by the number of samples.
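      A minimal sketch of the fixed aggregation rule, assuming a simplified
      histogram entry ('hist_entry_sketch' and its helpers are hypothetical,
      not perf's struct hist_entry). The per-sample weight is kept as part of
      the entry and never summed, while the global 'weight' column is derived
      as weight times the number of samples. With the top entry above, 313
      samples of weight 32 give 10016.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified histogram entry; the per-sample weight is part
 * of the entry's identity, like the 'local_weight' sort key. */
struct hist_entry_sketch {
	uint64_t weight;    /* per-sample weight, stored once and never summed */
	uint64_t nr_events; /* number of samples merged into this entry */
	uint64_t period;    /* sum of sample periods */
};

/* Merge one sample into an entry. Only the period and the sample count
 * accumulate: adding the weight here would turn two weight-32 samples into
 * a weight-64 entry that later weight-32 samples could no longer match. */
static void entry_add_sample(struct hist_entry_sketch *he, uint64_t period)
{
	he->nr_events++;
	he->period += period;
}

/* 'local_weight' column: the weight of a single sample. */
static uint64_t local_weight(const struct hist_entry_sketch *he)
{
	return he->weight;
}

/* (global) 'weight' column: the per-sample weight times the sample count. */
static uint64_t global_weight(const struct hist_entry_sketch *he)
{
	return he->weight * he->nr_events;
}
```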
      
      With this change, I can see the expected numbers.
      
        $ perf report --stdio -n -s +local_weight,weight -S lockref_get_not_zero
        ...
        #
        # Overhead  Samples  Command  Shared Object     Local Weight  Weight
        # ........  .......  .......  ................  ............  ......
        #
            21.23%      313  dd       [kernel.vmlinux]  32            10016
            12.43%      183  dd       [kernel.vmlinux]  35            6405
            11.97%      159  dd       [kernel.vmlinux]  36            5724
             7.63%      113  dd       [kernel.vmlinux]  33            3729
             6.37%       92  dd       [kernel.vmlinux]  34            3128
             4.17%       59  dd       [kernel.vmlinux]  37            2183
             0.08%        1  dd       [kernel.vmlinux]  269           269
             0.08%        1  dd       [kernel.vmlinux]  38            38
      Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20211105225617.151364-1-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 29 Apr, 2021 2 commits
  6. 26 Mar, 2021 1 commit
    • perf tools: Support pipeline stage cycles for powerpc · 06e5ca74
      Athira Rajeev authored
      The pipeline stage cycle details can be recorded on powerpc from the
      contents of Performance Monitor Unit (PMU) registers. On ISA v3.1
      platforms, the sampling registers expose the cycles spent in different
      pipeline stages. This patch adds perf tools support to present two of
      these cycle counters along with memory latency (weight).
      
      Re-use the field 'ins_lat' for storing the first pipeline stage cycle.
      This is stored in 'var2_w' field of 'perf_sample_weight'.
      
      Add a new field 'p_stage_cyc' to store the second pipeline stage cycle
      which is stored in 'var3_w' field of perf_sample_weight.
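      For reference, the split of the 64-bit weight value into the three fields
      can be sketched with shifts, assuming the layout of 'union
      perf_sample_weight' in the kernel UAPI header, where 'var1_dw' occupies
      the low 32 bits of 'full', 'var2_w' bits 32-47 and 'var3_w' bits 48-63.
      The 'decoded_weight' struct and 'decode_weight()' helper are hypothetical;
      decoding by shifts instead of through the union keeps the sketch
      independent of host endianness.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded view of a 64-bit sample weight value. */
struct decoded_weight {
	uint32_t weight;      /* var1_dw: memory latency */
	uint16_t ins_lat;     /* var2_w: first pipeline stage cycle on powerpc */
	uint16_t p_stage_cyc; /* var3_w: second pipeline stage cycle */
};

/* Split the raw value into its three sub-fields (assumed layout above). */
static struct decoded_weight decode_weight(uint64_t full)
{
	struct decoded_weight d = {
		.weight      = (uint32_t)(full & 0xffffffffu),
		.ins_lat     = (uint16_t)((full >> 32) & 0xffffu),
		.p_stage_cyc = (uint16_t)((full >> 48) & 0xffffu),
	};

	return d;
}
```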
      
      Add a new sort function 'Pipeline Stage Cycle' and include it in
      default_mem_sort_order[]. This new sort function may be used to denote
      some other pipeline stage in another architecture, so add it to the list
      of sort entries that can have a dynamic header string.
      Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: https://lore.kernel.org/r/1616425047-1666-5-git-send-email-atrajeev@linux.vnet.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  7. 08 Feb, 2021 2 commits
    • perf report: Support instruction latency · 590db42d
      Kan Liang authored
      The instruction latency information can be recorded on some platforms,
      e.g., the Intel Sapphire Rapids server. With both memory latency
      (weight) and the new instruction latency information, users can easily
      locate the expensive load instructions, and also understand the time
      spent in different stages. The users can optimize their applications in
      different pipeline stages.
      
      The 'weight' field is shared among different architectures, so reusing
      it may impact other architectures. Add a new field to store the
      instruction latency.
      
      Like the 'weight' support, introduce an 'ins_lat' for the global
      instruction latency, and a 'local_ins_lat' for the local instruction
      latency version.
      
      Add new sort functions, INSTR Latency and Local INSTR Latency,
      accordingly.
      
      Add local_ins_lat to the default_mem_sort_order[].
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/1612296553-21962-7-git-send-email-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Support data block and addr block · a054c298
      Kan Liang authored
      Two new data source fields, indicating why a load instruction was
      blocked, are introduced on the Intel Sapphire Rapids server. The fields
      can be used for memory profiling.
      
      Add a new sort function, SORT_MEM_BLOCKED, for the two fields.
      
      For previous platforms, or when the block reason is unknown, print
      "N/A" for the block reason.
      
      Add blocked as a default mem sort key for perf report and perf mem
      report.
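      How the new 'Blocked' column could be rendered from the two block-reason
      bits can be sketched as below. The flag names and values and the
      "Data"/"Addr" labels are hypothetical placeholders, not the actual
      encoding in the perf UAPI header; the point shown is the fallback to
      "N/A" when neither reason is set, which is what older platforms produce
      in the committer testing output.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag bits for the two block reasons; the real encoding in
 * the perf UAPI header may differ. */
enum blk_flags {
	BLK_NA   = 1u << 0, /* reason reported as not available */
	BLK_DATA = 1u << 1, /* data block reason set */
	BLK_ADDR = 1u << 2, /* address block reason set */
};

/* Fill buf with the Blocked column text: "N/A" unless a reason bit is set. */
static void format_blocked(char *buf, size_t size, unsigned int flags)
{
	int data = (flags & BLK_DATA) != 0;
	int addr = (flags & BLK_ADDR) != 0;

	if (!data && !addr) {
		snprintf(buf, size, "N/A");
		return;
	}
	snprintf(buf, size, "%s%s%s", data ? "Data" : "",
		 (data && addr) ? "|" : "", addr ? "Addr" : "");
}
```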
      
      Committer testing:
      
      So on machines without this capability, "N/A" fills the new "Blocked"
      column:
      
        $ perf mem record ls
        arch     certs	 CREDITS  Documentation  include  ipc     Kconfig  lib       MAINTAINERS  mm   samples  security  usr    block
        COPYING	 crypto	 drivers  fs             init     Kbuild  kernel   LICENSES  Makefile     net  README   scripts   sound  tools
        virt
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.008 MB perf.data (17 samples) ]
        $
        $ perf mem report --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        # Total Lost Samples: 0
        #
        # Samples: 6  of event 'cpu/mem-loads,ldlat=30/Pu'
        # Total weight : 1381
        # Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked
        #
        # Overhead  Samples  Local Weight  Memory access         Symbol                   Shared Object  Data Symbol             Data Object   Snoop  TLB access    Locked  Blocked
        # ........  .......  ............  ....................  .......................  .............  ......................  ............  .....  ............  ......  .......
        #
            32.87%        1  454           Local RAM or RAM hit  [.] _dl_relocate_object  ld-2.31.so     [.] 0x00007fe91cef3078  libc-2.31.so  Hit    L1 or L2 hit  No       N/A
            25.56%        1  353           LFB or LFB hit        [.] strcmp               ld-2.31.so     [.] 0x00005586973855ca  ls            None   L1 or L2 hit  No       N/A
            22.59%        1  312           LFB or LFB hit        [.] _dl_cache_libcmp     ld-2.31.so     [.] 0x00007fe91d0e3b18  ld.so.cache   None   L1 or L2 hit  No       N/A
             8.47%        1  117           LFB or LFB hit        [.] _dl_relocate_object  ld-2.31.so     [.] 0x00007fe91ceee570  libc-2.31.so  None   L1 or L2 hit  No       N/A
             6.88%        1  95            LFB or LFB hit        [.] _dl_relocate_object  ld-2.31.so     [.] 0x00007fe91ceed490  libc-2.31.so  None   L1 or L2 hit  No       N/A
             3.62%        1  50            LFB or LFB hit        [.] _dl_cache_libcmp     ld-2.31.so     [.] 0x00007fe91d0ebe60  ld.so.cache   None   L1 or L2 hit  No       N/A
      
        # Samples: 11  of event 'cpu/mem-stores/Pu'
        # Total weight : 11
        # Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked
        #
        # Overhead  Samples  Local Weight  Memory access  Symbol                   Shared Object  Data Symbol             Data Object  Snoop  TLB access  Locked  Blocked
        # ........  .......  ............  .............  .......................  .............  ......................  ...........  .....  ..........  ......  .......
        #
             9.09%        1  0             L1 hit         [.] __strcoll_l          libc-2.31.so   [.] 0x00007fffe5648fc8  [stack]      N/A    N/A         N/A      N/A
             9.09%        1  0             L1 hit         [.] _dl_lookup_symbol_x  ld-2.31.so     [.] 0x00007fffe56490b8  [stack]      N/A    N/A         N/A      N/A
             9.09%        1  0             L1 hit         [.] _dl_name_match_p     ld-2.31.so     [.] 0x00007fffe56487d8  [stack]      N/A    N/A         N/A      N/A
             9.09%        1  0             L1 hit         [.] _dl_start            ld-2.31.so     [.] start_time+0x0      ld-2.31.so   N/A    N/A         N/A      N/A
             9.09%        1  0             L1 hit         [.] _dl_sysdep_start     ld-2.31.so     [.] 0x00007fffe56494b8  [stack]      N/A    N/A         N/A      N/A
             9.09%        1  0             L1 hit         [.] do_lookup_x          ld-2.31.so     [.] 0x00007fffe5648ff8  [stack]      N/A    N/A         N/A      N/A
             9.09%        1  0             L1 hit         [.] do_lookup_x          ld-2.31.so     [.] 0x00007fffe5649064  [stack]      N/A    N/A         N/A      N/A
             9.09%        1  0             L1 hit         [.] do_lookup_x          ld-2.31.so     [.] 0x00007fffe5649130  [stack]      N/A    N/A         N/A      N/A
             9.09%        1  0             L1 miss        [.] _dl_start            ld-2.31.so     [.] _rtld_global+0xaf8  ld-2.31.so   N/A    N/A         N/A      N/A
             9.09%        1  0             L1 miss        [.] _dl_start            ld-2.31.so     [.] _rtld_global+0xc28  ld-2.31.so   N/A    N/A         N/A      N/A
             9.09%        1  0             L1 miss        [.] _dl_start            ld-2.31.so     [.] 0x00007fffe56495b8  [stack]      N/A    N/A         N/A      N/A
      
        # (Tip: Show user configuration overrides: perf config --user --list)
        $
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/1612296553-21962-4-git-send-email-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  8. 20 Jan, 2021 1 commit
  9. 19 Dec, 2020 1 commit
    • perf sort: Add sort option for data page size · a50d03e3
      Kan Liang authored
      Add a new sort option "data_page_size" for --mem-mode sort.  With this
      option applied, perf can sort and report by sample's data page size.
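      A minimal sketch of how a data page size in bytes might be turned into
      the short labels shown in the report below ("4K", "2M") and compared for
      sorting. 'format_page_size()' and 'cmp_page_size()' are hypothetical
      helpers, not perf functions.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Render a size in bytes with the largest binary unit that divides it exactly. */
static void format_page_size(char *buf, size_t size, uint64_t bytes)
{
	static const char units[] = "BKMGT";
	size_t i = 0;

	while (bytes >= 1024 && bytes % 1024 == 0 && units[i + 1] != '\0') {
		bytes /= 1024;
		i++;
	}
	snprintf(buf, size, "%" PRIu64 "%c", bytes, units[i]);
}

/* Three-way comparison so entries can be ordered by page size. */
static int cmp_page_size(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}
```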
      
      Here is an example:
      
      $ perf report --stdio --mem-mode --sort=comm,symbol,phys_daddr,data_page_size

      # To display the perf.data header info, please use --header/--header-only options.
      #
      #
      # Total Lost Samples: 0
      #
      # Samples: 9K of event 'mem-loads:uP'
      # Total weight : 9028
      # Sort order   : comm,symbol,phys_daddr,data_page_size
      #
      # Overhead  Command  Symbol                        Data Physical Address   Data Page Size
      # ........  .......  ............................  ......................  ..............
      #
          11.19%  dtlb     [.] touch_buffer              [.] 0x00000003fec82ea8  4K
           8.61%  dtlb     [.] GetTickCount              [.] 0x00000003c4f2c8a8  4K
           4.52%  dtlb     [.] GetTickCount              [.] 0x00000003fec82f58  4K
           4.33%  dtlb     [.] __gettimeofday            [.] 0x00000003fec82f48  4K
           4.32%  dtlb     [.] GetTickCount              [.] 0x00000003fec82f78  4K
           4.28%  dtlb     [.] GetTickCount              [.] 0x00000003fec82f50  4K
           4.23%  dtlb     [.] GetTickCount              [.] 0x00000003fec82f70  4K
           4.11%  dtlb     [.] GetTickCount              [.] 0x00000003fec82f68  4K
           4.00%  dtlb     [.] Calibrate                 [.] 0x00000003fec82f98  4K
           3.91%  dtlb     [.] Calibrate                 [.] 0x00000003fec82f90  4K
           3.43%  dtlb     [.] touch_buffer              [.] 0x00000003fec82e98  4K
           3.42%  dtlb     [.] touch_buffer              [.] 0x00000003fec82e90  4K
           0.09%  dtlb     [.] DoDependentLoads          [.] 0x000000036ea084c0  2M
           0.08%  dtlb     [.] DoDependentLoads          [.] 0x000000032b010b80  2M
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Will Deacon <will@kernel.org>
      Link: http://lore.kernel.org/lkml/20201216185805.9981-3-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  10. 30 Nov, 2020 1 commit
  11. 28 May, 2020 2 commits
  12. 05 May, 2020 3 commits
  13. 18 Apr, 2020 1 commit
    • perf hist: Add fast path for duplicate entries check · 12e89e65
      Kan Liang authored
      Perf checks for duplicate entries in a callchain before adding an entry.
      However, the check is very slow, especially with deep call stacks.
      Roughly 50% of the elapsed time of perf report is spent on this check
      when the call stack always has a depth of 32.
      
      hist_entry__cmp() is used to compare the new entry with the old
      entries. It goes through all the available sort keys in the sort_list
      and calls the specific cmp of each one, which is very slow.
      
      In most cases, there are no duplicate entries in a callchain, because
      the symbols are usually different. It's much faster to do a quick check
      on the symbols first and only do the full cmp when the symbols are
      exactly the same.
      
      The quick check only compares symbols, not the dso. Export
      _sort__sym_cmp.
      
        $ perf record --call-graph lbr ./tchain_edit_64
      
        Without the patch
        $ time perf report --stdio
        real    0m21.142s
        user    0m21.110s
        sys     0m0.033s
      
        With the patch
        $ time perf report --stdio
        real    0m10.977s
        user    0m10.948s
        sys     0m0.027s
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
      Link: http://lore.kernel.org/lkml/20200319202517.23423-18-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  14. 03 Apr, 2020 1 commit
    • perf report: Add 'cgroup' sort key · b629f3e9
      Namhyung Kim authored
      The cgroup sort key shows the cgroup membership of each task.
      Currently it shows the full path in the cgroupfs (not relative to the
      root of the cgroup namespace), since that is more intuitive IMHO.
      Otherwise the root cgroups in different namespaces would all show the
      same name, "/".
      
      The cgroup sort key must come before cgroup_id; otherwise
      sort_dimension__add() will match "cgroup" to cgroup_id, since it only
      checks that the given string matches the beginning of a key name.
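      To illustrate why the table order matters, here is a small C sketch of
      a first-match prefix lookup. The key tables and match_key() are
      hypothetical; the matching rule strncasecmp(tok, name, strlen(tok)) is
      an assumption about how sort_dimension__add() compares a token,
      consistent with the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical tables; only the relative order of the cgroup keys matters. */
static const char *const keys_bad[]  = { "pid", "cgroup_id", "cgroup" };
static const char *const keys_good[] = { "pid", "cgroup", "cgroup_id" };

/*
 * Return the first key whose beginning matches `tok` (case-insensitive),
 * comparing only strlen(tok) characters (assumed matching rule).
 */
static const char *match_key(const char *const *keys, size_t n,
			     const char *tok)
{
	assert(tok != NULL);
	for (size_t i = 0; i < n; i++)
		if (strncasecmp(tok, keys[i], strlen(tok)) == 0)
			return keys[i];
	return NULL;
}
```

      With cgroup_id listed first, the token "cgroup" is swallowed by
      cgroup_id; with cgroup listed first, both tokens resolve correctly,
      because "cgroup_id" is longer than "cgroup" and cannot match it.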
      
      For example, it will look like the following. Note that the record
      patch adding --all-cgroups will come later.
      
        $ perf record -a --namespace --all-cgroups  cgtest
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.208 MB perf.data (4090 samples) ]
      
        $ perf report -s cgroup_id,cgroup,pid
        ...
        # Overhead  cgroup id (dev/inode)  Cgroup          Pid:Command
        # ........  .....................  ..........  ...............
        #
            93.96%  0/0x0                  /                 0:swapper
             1.25%  3/0xeffffffb           /               278:looper0
             0.86%  3/0xf000015f           /sub/cgrp1      280:cgtest
             0.37%  3/0xf0000160           /sub/cgrp2      281:cgtest
             0.34%  3/0xf0000163           /sub/cgrp3      282:cgtest
             0.22%  3/0xeffffffb           /sub            278:looper0
             0.20%  3/0xeffffffb           /               280:cgtest
             0.15%  3/0xf0000163           /sub/cgrp3      285:looper3
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200325124536.2800725-6-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  15. 10 Mar, 2020 1 commit
    • perf tools: Add hw_idx in struct branch_stack · 42bbabed
      Kan Liang authored
      The low-level index of raw branch records for the most recent branch
      can be recorded in a sample with the PERF_SAMPLE_BRANCH_HW_INDEX
      branch_sample_type. Extend struct branch_stack to support it.
      
      However, if PERF_SAMPLE_BRANCH_HW_INDEX is not applied, only nr and
      entries[] will be output by the kernel. The entries[] pointer could
      then be wrong, since that output format differs from the new struct
      branch_stack. Add a variable no_hw_idx to struct perf_sample to
      indicate whether hw_idx is output. Add get_branch_entry() to return
      the corresponding pointer to entries[0].
      
      To make the dummy branch sample consistent with the new branch sample,
      add hw_idx to struct dummy_branch_stack for cs-etm and intel-pt.
      
      Apply the new struct branch_stack for synthetic events as well.
      
      Extend the sample-parsing test case to support the new struct branch_stack.
      
      Committer notes:
      
      Renamed get_branch_entries() to perf_sample__branch_entries() to have
      proper namespacing and pave the way for this to be moved to libperf,
      eventually.
      
      Add 'static' to that inline as it is in a header.
      
      Add 'hw_idx' to 'struct dummy_branch_stack' in cs-etm.c to fix the build
      on arm64.
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
      Link: http://lore.kernel.org/lkml/20200228163011.19358-2-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  16. 26 Nov, 2019 2 commits
  17. 12 Nov, 2019 2 commits
  18. 07 Nov, 2019 4 commits
  19. 05 Nov, 2019 1 commit
  20. 01 Sep, 2019 3 commits
  21. 31 Aug, 2019 1 commit
  22. 29 Aug, 2019 1 commit
  23. 28 Aug, 2019 1 commit
  24. 26 Aug, 2019 1 commit
    • perf report: Fix --ns time sort key output · 3dab6ac0
      Andi Kleen authored
      If the user specified --ns, the column used to print the sort timestamp
      wasn't wide enough to print the full nanoseconds.
      
      Widen the time key column width when --ns is specified.
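      The truncation in the Before output below can be reproduced with a
      short C sketch: this timestamp printed as sec.nsec needs 16 characters,
      and clipping it to a 12-character column yields exactly 187851.10000.
      format_time_full(), time_col_needed() and format_time_col() are
      hypothetical helpers, not perf's code; the 12-character width is only
      inferred from the Before output.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NSEC_PER_SEC UINT64_C(1000000000)

/* Print ns as "seconds.nanoseconds" with all nine fractional digits. */
static void format_time_full(char *buf, size_t size, uint64_t ns)
{
	snprintf(buf, size, "%" PRIu64 ".%09" PRIu64,
		 ns / NSEC_PER_SEC, ns % NSEC_PER_SEC);
}

/* Width a column needs so that nothing is cut off. */
static size_t time_col_needed(uint64_t ns)
{
	char full[32];

	format_time_full(full, sizeof(full), ns);
	return strlen(full);
}

/* Print into a fixed-width column: left-aligned, clipped to `width`. */
static int format_time_col(char *buf, size_t size, uint64_t ns, int width)
{
	char full[32];

	assert(width > 0);
	format_time_full(full, sizeof(full), ns);
	return snprintf(buf, size, "%-*.*s", width, width, full);
}
```

      Sizing the column from time_col_needed() (or from a fixed width large
      enough for nine fractional digits) is the kind of widening the commit
      applies when --ns is given.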
      
      Before:
      
        % perf record -a sleep 1
        % perf report --sort time,overhead,symbol --stdio --ns
        ...
             2.39%  187851.10000  [k] smp_call_function_single   -      -
             1.53%  187851.10000  [k] intel_idle                 -      -
             0.59%  187851.10000  [.] __wcscmp_ifunc             -      -
             0.33%  187851.10000  [.] 0000000000000000           -      -
             0.28%  187851.10000  [k] cpuidle_enter_state        -      -
      
      After:
      
        % perf report --sort time,overhead,symbol --stdio --ns
        ...
             2.39%  187851.100000000  [k] smp_call_function_single   -      -
             1.53%  187851.100000000  [k] intel_idle                 -      -
             0.59%  187851.100000000  [.] __wcscmp_ifunc             -      -
             0.33%  187851.100000000  [.] 0000000000000000           -      -
             0.28%  187851.100000000  [k] cpuidle_enter_state        -      -
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20190823210338.12360-2-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>