  1. 22 Feb, 2019 2 commits
    • perf thread-stack: Hide x86 retpolines · 3c0cd952
      Adrian Hunter authored
      x86 retpoline functions pollute the call graph by showing up everywhere
      there is an indirect branch, but they do not really mean anything. Make
      changes so that the default retpoline functions will no longer appear in
      the call graph. Note this only affects the call graph, since all the
      original branches are left unchanged.
      
      This does not handle function return thunks, nor is there any
      improvement for the handling of inline thunks or extern thunks.
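      
      As a rough illustration of the idea (not the patch itself), a call-graph
      filter could recognise default retpoline thunks by symbol name and drop
      those frames. The sketch assumes the default thunks are named
      __x86_indirect_thunk_<reg>, as in this commit's example disassembly. The
      helper names looks_like_x86_retpoline() and drop_retpoline_frames() are
      hypothetical and are not taken from perf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if 'name' starts with the default thunk prefix. */
bool looks_like_x86_retpoline(const char *name)
{
	static const char prefix[] = "__x86_indirect_thunk_";

	assert(name != NULL);
	return strncmp(name, prefix, sizeof(prefix) - 1) == 0;
}

/*
 * Hypothetical filter: copy the call-chain entries in 'in' to 'out',
 * skipping frames that look like retpoline thunks. 'out' must have room
 * for 'n' entries. Returns the number of entries kept.
 */
size_t drop_retpoline_frames(const char *const *in, size_t n, const char **out)
{
	size_t kept = 0;

	assert(in != NULL && out != NULL);
	for (size_t i = 0; i < n; i++) {
		if (!looks_like_x86_retpoline(in[i]))
			out[kept++] = in[i];
	}
	return kept;
}
```

      Given the chain main, __x86_indirect_thunk_rax, __x86_indirect_thunk_rax,
      foo, bar, this keeps main, foo and bar. A name such as
      __x86_return_thunk does not match the prefix, in line with the note
      that return thunks are not handled.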
      
      Example:
      
        $ cat simple-retpoline.c
        __attribute__((noinline)) int bar(void)
        {
                return -1;
        }
      
        int foo(void)
        {
                return bar() + 1;
        }
      
        __attribute__((indirect_branch("thunk"))) int main()
        {
                int (*volatile fn)(void) = foo;
      
                fn();
                return fn();
        }
        $ gcc -ggdb3 -Wall -Wextra -O2 -o simple-retpoline simple-retpoline.c
        $ objdump -d simple-retpoline
        <SNIP>
        0000000000001040 <main>:
            1040:       48 83 ec 18             sub    $0x18,%rsp
            1044:       48 8d 05 25 01 00 00    lea    0x125(%rip),%rax        # 1170 <foo>
            104b:       48 89 44 24 08          mov    %rax,0x8(%rsp)
            1050:       48 8b 44 24 08          mov    0x8(%rsp),%rax
            1055:       e8 1f 01 00 00          callq  1179 <__x86_indirect_thunk_rax>
            105a:       48 8b 44 24 08          mov    0x8(%rsp),%rax
            105f:       48 83 c4 18             add    $0x18,%rsp
            1063:       e9 11 01 00 00          jmpq   1179 <__x86_indirect_thunk_rax>
        <SNIP>
        0000000000001160 <bar>:
            1160:       b8 ff ff ff ff          mov    $0xffffffff,%eax
            1165:       c3                      retq
        <SNIP>
        0000000000001170 <foo>:
            1170:       e8 eb ff ff ff          callq  1160 <bar>
            1175:       83 c0 01                add    $0x1,%eax
            1178:       c3                      retq
        0000000000001179 <__x86_indirect_thunk_rax>:
            1179:       e8 07 00 00 00          callq  1185 <__x86_indirect_thunk_rax+0xc>
            117e:       f3 90                   pause
            1180:       0f ae e8                lfence
            1183:       eb f9                   jmp    117e <__x86_indirect_thunk_rax+0x5>
            1185:       48 89 04 24             mov    %rax,(%rsp)
            1189:       c3                      retq
        <SNIP>
        $ perf record -o simple-retpoline.perf.data -e intel_pt/cyc/u ./simple-retpoline
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0,017 MB simple-retpoline.perf.data ]
        $ perf script -i simple-retpoline.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py simple-retpoline.db branches calls
        2019-01-08 14:03:37.851655 Creating database...
        2019-01-08 14:03:37.863256 Writing records...
        2019-01-08 14:03:38.069750 Adding indexes
        2019-01-08 14:03:38.078799 Done
        $ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py simple-retpoline.db
      
      Before:
      
          main
              -> __x86_indirect_thunk_rax
                  -> __x86_indirect_thunk_rax
                      -> foo
                          -> bar
      
      After:
      
          main
              -> foo
                  -> bar
      
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20190109091835.5570-7-adrian.hunter@intel.com
      [ Remove (sym->name != NULL) test, this is not a pointer and breaks the build with clang version 7.0.1 (Fedora 7.0.1-2.fc30) ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf thread-stack: Improve thread_stack__no_call_return() · 1f35cd65
      Adrian Hunter authored
      Improve thread_stack__no_call_return() to better handle 'returns' that
      do not match the stack, i.e. 'no call'. See the code comments for details.
      The example below shows how retpolines are affected:
      
      Example:
      
        $ cat simple-retpoline.c
        __attribute__((noinline)) int bar(void)
        {
                return -1;
        }
      
        int foo(void)
        {
                return bar() + 1;
        }
      
        __attribute__((indirect_branch("thunk"))) int main()
        {
                int (*volatile fn)(void) = foo;
      
                fn();
                return fn();
        }
        $ gcc -ggdb3 -Wall -Wextra -O2 -o simple-retpoline simple-retpoline.c
        $ objdump -d simple-retpoline
        <SNIP>
        0000000000001040 <main>:
            1040:       48 83 ec 18             sub    $0x18,%rsp
            1044:       48 8d 05 25 01 00 00    lea    0x125(%rip),%rax        # 1170 <foo>
            104b:       48 89 44 24 08          mov    %rax,0x8(%rsp)
            1050:       48 8b 44 24 08          mov    0x8(%rsp),%rax
            1055:       e8 1f 01 00 00          callq  1179 <__x86_indirect_thunk_rax>
            105a:       48 8b 44 24 08          mov    0x8(%rsp),%rax
            105f:       48 83 c4 18             add    $0x18,%rsp
            1063:       e9 11 01 00 00          jmpq   1179 <__x86_indirect_thunk_rax>
        <SNIP>
        0000000000001160 <bar>:
            1160:       b8 ff ff ff ff          mov    $0xffffffff,%eax
            1165:       c3                      retq
        <SNIP>
        0000000000001170 <foo>:
            1170:       e8 eb ff ff ff          callq  1160 <bar>
            1175:       83 c0 01                add    $0x1,%eax
            1178:       c3                      retq
        0000000000001179 <__x86_indirect_thunk_rax>:
            1179:       e8 07 00 00 00          callq  1185 <__x86_indirect_thunk_rax+0xc>
            117e:       f3 90                   pause
            1180:       0f ae e8                lfence
            1183:       eb f9                   jmp    117e <__x86_indirect_thunk_rax+0x5>
            1185:       48 89 04 24             mov    %rax,(%rsp)
            1189:       c3                      retq
        <SNIP>
        $ perf record -o simple-retpoline.perf.data -e intel_pt/cyc/u ./simple-retpoline
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0,017 MB simple-retpoline.perf.data ]
        $ perf script -i simple-retpoline.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py simple-retpoline.db branches calls
        2019-01-08 14:03:37.851655 Creating database...
        2019-01-08 14:03:37.863256 Writing records...
        2019-01-08 14:03:38.069750 Adding indexes
        2019-01-08 14:03:38.078799 Done
        $ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py simple-retpoline.db
      
      Before:
      
          main
              -> __x86_indirect_thunk_rax
                  -> __x86_indirect_thunk_rax
                      -> __x86_indirect_thunk_rax
                          -> bar
      
      After:
      
          main
              -> __x86_indirect_thunk_rax
                  -> __x86_indirect_thunk_rax
                      -> foo
                          -> bar
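      
      To see why a retpoline produces a 'return' with no matching 'call', the
      first fn() call from the disassembly in this commit can be replayed
      against a toy shadow stack. This is only a sketch of the situation, not
      perf's algorithm: struct branch, count_unmatched_returns() and the
      first_indirect_call trace are hypothetical names, and the addresses are
      the link-time offsets printed by objdump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum branch_kind { BRANCH_CALL, BRANCH_RETURN };

struct branch {
	enum branch_kind kind;
	uint64_t from;     /* address of the branch instruction */
	uint64_t to;       /* branch target */
	uint64_t ret_addr; /* calls only: address of the next instruction */
};

/*
 * Replay 'n' branches against a shadow stack of return addresses and count
 * the returns whose target is not found anywhere on that stack.
 */
size_t count_unmatched_returns(const struct branch *b, size_t n)
{
	uint64_t stack[64];
	size_t depth = 0, unmatched = 0;

	for (size_t i = 0; i < n; i++) {
		if (b[i].kind == BRANCH_CALL) {
			assert(depth < sizeof(stack) / sizeof(stack[0]));
			stack[depth++] = b[i].ret_addr;
			continue;
		}

		/* Search from the top of the stack for a matching return address. */
		size_t j = depth;
		while (j > 0 && stack[j - 1] != b[i].to)
			j--;

		if (j > 0)
			depth = j - 1; /* pop down to and including the match */
		else
			unmatched++;   /* 'no call': nothing on the stack matches */
	}
	return unmatched;
}

/* First fn() call in main, taken from the objdump listing. */
const struct branch first_indirect_call[] = {
	{ BRANCH_CALL,   0x1055, 0x1179, 0x105a }, /* main calls the thunk */
	{ BRANCH_CALL,   0x1179, 0x1185, 0x117e }, /* thunk calls into itself */
	{ BRANCH_RETURN, 0x1189, 0x1170, 0 },      /* thunk 'returns' to foo */
	{ BRANCH_CALL,   0x1170, 0x1160, 0x1175 }, /* foo calls bar */
	{ BRANCH_RETURN, 0x1165, 0x1175, 0 },      /* bar returns to foo */
	{ BRANCH_RETURN, 0x1178, 0x105a, 0 },      /* foo returns to main */
};
```

      In this trace only the thunk's retq at 0x1189 is unmatched: its target
      0x1170 (foo) was written over the return address on the real stack by
      the mov at 0x1185, so no recorded call ever expected a return there.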
      
      Committer testing:
      
      Choose "Reports", then "Context-Sensitive Call Graph", and keep
      expanding:
      
      Before:
      
      simple-retpolin
         PID:PID
            _start
               _start
                  __libc_start_main
                     main
                         __x86_indirect_thunk_rax
                            __x86_indirect_thunk_rax
                            bar
      
      After:
      
      Remove the "simple-retpoline.db" file, rerun the 'perf script' line
      to regenerate the .db file, and run exported-sql-viewer.py again. The
      output is the same all the way to 'main'; from there, including 'main':
      
                     main
                         __x86_indirect_thunk_rax
                             __x86_indirect_thunk_rax
                                 foo
                                     bar
      
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20190109091835.5570-6-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 21 Feb, 2019 1 commit
    • perf annotate: Fix getting source line failure · 11db1ad4
      Wei Li authored
      The output of "perf annotate -l --stdio xxx" changed when commit 425859ff
      ("perf annotate: No need to calculate notes->start twice") removed the
      notes->start assignment in symbol__calc_lines(). As a result,
      find_address_in_section(), reached from symbol__tty_annotate(), fails
      because a2l->addr is wrong, so the annotate summary does not report
      source line numbers correctly.
      
      Before fix:
      
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ cat common_while_1.c
        void hotspot_1(void)
        {
                volatile int i;
      
                for (i = 0; i < 0x10000000; i++);
                for (i = 0; i < 0x10000000; i++);
                for (i = 0; i < 0x10000000; i++);
        }
      
        int main(void)
        {
                hotspot_1();
      
                return 0;
        }
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ gcc common_while_1.c -g -o common_while_1
      
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf record ./common_while_1
        [ perf record: Woken up 2 times to write data ]
        [ perf record: Captured and wrote 0.488 MB perf.data (12498 samples) ]
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf annotate -l -s hotspot_1 --stdio
      
        Sorted summary for file /home/liwei/main_code/hulk_work/hulk/tools/perf/common_while_1
        ----------------------------------------------
      
         19.30 common_while_1[32]
         19.03 common_while_1[4e]
         19.01 common_while_1[16]
          5.04 common_while_1[13]
          4.99 common_while_1[4b]
          4.78 common_while_1[2c]
          4.77 common_while_1[10]
          4.66 common_while_1[2f]
          4.59 common_while_1[51]
          4.59 common_while_1[35]
          4.52 common_while_1[19]
          4.20 common_while_1[56]
          0.51 common_while_1[48]
         Percent |      Source code & Disassembly of common_while_1 for cycles:ppp (12480 samples, percent: local period)
        -----------------------------------------------------------------------------------------------------------------
               :
               :
               :
               :         Disassembly of section .text:
               :
               :         00000000000005fa <hotspot_1>:
               :         hotspot_1():
               :         void hotspot_1(void)
               :         {
          0.00 :   5fa:   push   %rbp
          0.00 :   5fb:   mov    %rsp,%rbp
               :                 volatile int i;
               :
               :                 for (i = 0; i < 0x10000000; i++);
          0.00 :   5fe:   movl   $0x0,-0x4(%rbp)
          0.00 :   605:   jmp    610 <hotspot_1+0x16>
          0.00 :   607:   mov    -0x4(%rbp),%eax
         common_while_1[10]    4.77 :   60a:   add    $0x1,%eax
         common_while_1[13]    5.04 :   60d:   mov    %eax,-0x4(%rbp)
         common_while_1[16]   19.01 :   610:   mov    -0x4(%rbp),%eax
         common_while_1[19]    4.52 :   613:   cmp    $0xfffffff,%eax
            0.00 :   618:   jle    607 <hotspot_1+0xd>
                 :                 for (i = 0; i < 0x10000000; i++);
        ...
      
      After fix:
      
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf record ./common_while_1
        [ perf record: Woken up 2 times to write data ]
        [ perf record: Captured and wrote 0.488 MB perf.data (12500 samples) ]
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf annotate -l -s hotspot_1 --stdio
      
        Sorted summary for file /home/liwei/main_code/hulk_work/hulk/tools/perf/common_while_1
        ----------------------------------------------
      
         33.34 common_while_1.c:5
         33.34 common_while_1.c:6
         33.32 common_while_1.c:7
         Percent |      Source code & Disassembly of common_while_1 for cycles:ppp (12482 samples, percent: local period)
        -----------------------------------------------------------------------------------------------------------------
               :
               :
               :
               :         Disassembly of section .text:
               :
               :         00000000000005fa <hotspot_1>:
               :         hotspot_1():
               :         void hotspot_1(void)
               :         {
          0.00 :   5fa:   push   %rbp
          0.00 :   5fb:   mov    %rsp,%rbp
               :                 volatile int i;
               :
               :                 for (i = 0; i < 0x10000000; i++);
          0.00 :   5fe:   movl   $0x0,-0x4(%rbp)
          0.00 :   605:   jmp    610 <hotspot_1+0x16>
          0.00 :   607:   mov    -0x4(%rbp),%eax
         common_while_1.c:5    4.70 :   60a:   add    $0x1,%eax
          4.89 :   60d:   mov    %eax,-0x4(%rbp)
         common_while_1.c:5   19.03 :   610:   mov    -0x4(%rbp),%eax
         common_while_1.c:5    4.72 :   613:   cmp    $0xfffffff,%eax
          0.00 :   618:   jle    607 <hotspot_1+0xd>
               :                 for (i = 0; i < 0x10000000; i++);
          0.00 :   61a:   movl   $0x0,-0x4(%rbp)
          0.00 :   621:   jmp    62c <hotspot_1+0x32>
          0.00 :   623:   mov    -0x4(%rbp),%eax
         common_while_1.c:6    4.54 :   626:   add    $0x1,%eax
          4.73 :   629:   mov    %eax,-0x4(%rbp)
         common_while_1.c:6   19.54 :   62c:   mov    -0x4(%rbp),%eax
         common_while_1.c:6    4.54 :   62f:   cmp    $0xfffffff,%eax
        ...
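      
      The per-line figures in the "After fix" summary can be cross-checked
      against the listing by summing the per-instruction percentages for one
      source line. The sketch below does this for the first loop. It assumes
      the unlabelled 4.89 at offset 60d also belongs to common_while_1.c:5,
      and struct insn_sample, percent_for_line() and first_loop are
      hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

struct insn_sample {
	int line;       /* source line the instruction maps to */
	double percent; /* share of samples attributed to the instruction */
};

/* Sum the percentages of all instructions attributed to 'line'. */
double percent_for_line(const struct insn_sample *s, size_t n, int line)
{
	double total = 0.0;

	assert(s != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (s[i].line == line)
			total += s[i].percent;
	}
	return total;
}

/* Non-zero entries for the first loop, copied from the "After fix" listing. */
const struct insn_sample first_loop[] = {
	{ 5, 4.70 },  /* 60a: add  $0x1,%eax */
	{ 5, 4.89 },  /* 60d: mov  %eax,-0x4(%rbp) (assumed to be line 5) */
	{ 5, 19.03 }, /* 610: mov  -0x4(%rbp),%eax */
	{ 5, 4.72 },  /* 613: cmp  $0xfffffff,%eax */
};
```

      Calling percent_for_line(first_loop, 4, 5) adds 4.70 + 4.89 + 19.03 +
      4.72, which agrees with the 33.34 reported for common_while_1.c:5 in
      the sorted summary.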
      
      Signed-off-by: Wei Li <liwei391@huawei.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Fixes: 425859ff ("perf annotate: No need to calculate notes->start twice")
      Link: http://lkml.kernel.org/r/20190221095716.39529-1-liwei391@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 20 Feb, 2019 6 commits
  4. 19 Feb, 2019 10 commits
  5. 15 Feb, 2019 2 commits
    • perf tests shell: Skip trace+probe_vfs_getname.sh if built without trace support · 83244772
      Tommi Rantala authored
      If perf was built without trace support, the trace+probe_vfs_getname.sh
      'perf test' entry fails:
      
        # perf trace -h
        perf: 'trace' is not a perf-command. See 'perf --help'
      
        # perf test 64
        64: Check open filename arg using perf trace + vfs_getname: FAILED!
      
      Check trace support, so that we'll skip the test in that case:
      
        # perf test 64
        64: Check open filename arg using perf trace + vfs_getname: Skip
      
      Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Hendrik Brueckner <brueckner@linux.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20190215134253.11454-1-tt.rantala@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Merge tag 'perf-core-for-mingo-5.1-20190214' of... · 43f4e627
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo-5.1-20190214' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      perf list:
      
        Jiri Olsa:
      
        - Display metric expressions for --details option
      
      perf record:
      
        Alexey Budankov:
      
        - Implement the --affinity=node|cpu option. This is a leftover: the
          other patches in this kit were already applied.
      
      perf trace:
      
        Arnaldo Carvalho de Melo:
      
        - Fix segfaults due to not properly handling negative file descriptor syscall args.
      
        - Fix segfault related to the 'waitid' 'options' prefix showing logic.
      
        - Filter out 'gnome-terminal*' if it is a parent of 'perf trace', to reduce the
          syscall feedback loop in system wide sessions.
      
      BPF:
      
        Song Liu:
      
        - Silence "Couldn't synthesize bpf events" warning for EPERM.
      
      Build system:
      
        Arnaldo Carvalho de Melo:
      
        - Fix the test-all.c feature detection fast path that was broken for
          quite a while leading to longer build times.
      
      Event parsing:
      
        Jiri Olsa:
      
        - Fix legacy events symbol separator parsing
      
      cs-etm:
      
        Mathieu Poirier:
      
        - Fix some error path return errors and plug some memory leaks.
      
        - Add proper header file for symbols
      
        - Remove unused structure fields.
      
        - Modularize auxtrace_buffer fetch, decoder and packet processing loop.
      
      Vendor events:
      
        Paul Clarke:
      
        - Add assorted metrics for the Power8 and Power9 architectures.
      
      perf report:
      
        Thomas Richter:
      
        - Add s390 diagnostic sampling descriptor size
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  6. 14 Feb, 2019 19 commits