1. 05 Dec, 2017 25 commits
    • perf mmap: Don't discard prev in backward mode · 7fb4b407
      Wang Nan authored
      'perf record' can switch its output data file. The new output should
      only store the data after switching. However, in overwrite backward
      mode, the new output still can have data from before switching. That
      also brings extra overhead.
      
      At the end of mmap_read(), the position of the processed ring buffer is
      saved in md->prev. The next mmap_read() should end at md->prev if it has
      not been overwritten, which avoids processing duplicate data. However,
      md->prev is discarded, so the next mmap_read() has to process the whole
      valid ring buffer, which probably includes data already processed.
      
      Avoid calling backward_rb_find_range() when md->prev is still
      available.
      Signed-off-by: Wang Nan <wangnan0@huawei.com>
      Tested-by: Kan Liang <kan.liang@intel.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mengting Zhang <zhangmengting@huawei.com>
      Link: http://lkml.kernel.org/r/20171204165107.95327-3-wangnan0@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf mmap: Fix perf backward recording · 71f566a3
      Wang Nan authored
      'perf record' backward recording doesn't work as expected: it never
      overwrites when the ring buffer gets full.
      
      Test:
      
      Run a busy Python printing task in the background, like this:
      
       while True:
           print 123
      
      Send SIGUSR2 to perf to capture a snapshot, then:
      
       # ./perf record --overwrite -e raw_syscalls:sys_enter -e raw_syscalls:sys_exit --exclude-perf -a --switch-output
       [ perf record: dump data: Woken up 1 times ]
       [ perf record: Dump perf.data.2017110101520743 ]
       [ perf record: dump data: Woken up 1 times ]
       [ perf record: Dump perf.data.2017110101521251 ]
       [ perf record: dump data: Woken up 1 times ]
       [ perf record: Dump perf.data.2017110101521692 ]
       ^C[ perf record: Woken up 1 times to write data ]
       [ perf record: Dump perf.data.2017110101521936 ]
       [ perf record: Captured and wrote 0.826 MB perf.data.<timestamp> ]
      
       # ./perf script -i ./perf.data.2017110101520743 | head -n3
                   perf  2717 [000] 12449.310785: raw_syscalls:sys_enter: NR 16 (5, 2400, 0, 59, 100, 0)
                   perf  2717 [000] 12449.310790: raw_syscalls:sys_enter: NR 7 (4112340, 2, ffffffff, 3df, 100, 0)
                 python  2545 [000] 12449.310800:  raw_syscalls:sys_exit: NR 1 = 4
       # ./perf script -i ./perf.data.2017110101521251 | head -n3
                   perf  2717 [000] 12449.310785: raw_syscalls:sys_enter: NR 16 (5, 2400, 0, 59, 100, 0)
                   perf  2717 [000] 12449.310790: raw_syscalls:sys_enter: NR 7 (4112340, 2, ffffffff, 3df, 100, 0)
                 python  2545 [000] 12449.310800:  raw_syscalls:sys_exit: NR 1 = 4
       # ./perf script -i ./perf.data.2017110101521692 | head -n3
                   perf  2717 [000] 12449.310785: raw_syscalls:sys_enter: NR 16 (5, 2400, 0, 59, 100, 0)
                   perf  2717 [000] 12449.310790: raw_syscalls:sys_enter: NR 7 (4112340, 2, ffffffff, 3df, 100, 0)
                 python  2545 [000] 12449.310800:  raw_syscalls:sys_exit: NR 1 = 4
      
      The timestamps never change, even though the background task is a busy
      loop that can easily overflow the ring buffer.
      
      This patch fixes it by always clearing PROT_WRITE for a backward ring
      buffer, so that all backward ring buffers become overwrite ring buffers.
      
      Test result:
      
       # ./perf record --overwrite -e raw_syscalls:sys_enter -e raw_syscalls:sys_exit --exclude-perf -a --switch-output
       [ perf record: dump data: Woken up 1 times ]
       [ perf record: Dump perf.data.2017110101285323 ]
       [ perf record: dump data: Woken up 1 times ]
       [ perf record: Dump perf.data.2017110101290053 ]
       [ perf record: dump data: Woken up 1 times ]
       [ perf record: Dump perf.data.2017110101290446 ]
       ^C[ perf record: Woken up 1 times to write data ]
       [ perf record: Dump perf.data.2017110101290837 ]
       [ perf record: Captured and wrote 0.826 MB perf.data.<timestamp> ]
       # ./perf script -i ./perf.data.2017110101285323 | head -n3
                 python  2545 [000] 11064.268083:  raw_syscalls:sys_exit: NR 1 = 4
                 python  2545 [000] 11064.268084: raw_syscalls:sys_enter: NR 1 (1, 12cc330, 4, 7fc237280370, 7fc2373d0700, 2c7b0)
                 python  2545 [000] 11064.268086:  raw_syscalls:sys_exit: NR 1 = 4
       # ./perf script -i ./perf.data.2017110101290 | head -n3
       failed to open ./perf.data.2017110101290: No such file or directory
       # ./perf script -i ./perf.data.2017110101290053 | head -n3
                 python  2545 [000] 11071.564062: raw_syscalls:sys_enter: NR 1 (1, 12cc330, 4, 7fc237280370, 7fc2373d0700, 2c7b0)
                 python  2545 [000] 11071.564064:  raw_syscalls:sys_exit: NR 1 = 4
                 python  2545 [000] 11071.564066: raw_syscalls:sys_enter: NR 1 (1, 12cc330, 4, 7fc237280370, 7fc2373d0700, 2c7b0)
       # ./perf script -i ./perf.data.2017110101290 | head -n3
       perf.data.2017110101290053  perf.data.2017110101290446  perf.data.2017110101290837
       # ./perf script -i ./perf.data.2017110101290446 | head -n3
                   sshd  1321 [000] 11075.499473:  raw_syscalls:sys_exit: NR 14 = 0
                   sshd  1321 [000] 11075.499474: raw_syscalls:sys_enter: NR 14 (2, 7ffe98899490, 0, 8, 0, 3000)
                   sshd  1321 [000] 11075.499474:  raw_syscalls:sys_exit: NR 14 = 0
       # ./perf script -i ./perf.data.2017110101290837 | head -n3
                 python  2545 [000] 11079.280844:  raw_syscalls:sys_exit: NR 1 = 4
                 python  2545 [000] 11079.280847: raw_syscalls:sys_enter: NR 1 (1, 12cc330, 4, 7fc237280370, 7fc2373d0700, 2c7b0)
                 python  2545 [000] 11079.280850:  raw_syscalls:sys_exit: NR 1 = 4
      Signed-off-by: Wang Nan <wangnan0@huawei.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Mengting Zhang <zhangmengting@huawei.com>
      Link: http://lkml.kernel.org/r/20171204165107.95327-2-wangnan0@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf report: Set browser mode right before setup_browser() · 712d36db
      Seokho Song authored
      Some code prints messages to the screen between the assignment of the
      use_browser variable and the call to setup_browser().

      Since the GUI browser is not initialized during that period, none of
      those messages are shown if the user passed the --gtk option to perf,
      because GTK is not initialized yet.

      Reorder the code to assign the use_browser variable right before
      setup_browser() is called.
      Signed-off-by: Seokho Song <0xdevssh@gmail.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20171204160244.6332-1-0xdevssh@gmail.com
      Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • x86/asm: Allow again using asm.h when building for the 'bpf' clang target · c343bade
      Arnaldo Carvalho de Melo authored
      Up to f5caf621 ("x86/asm: Fix inline asm call constraints for Clang")
      we were able to use x86 headers to build to the 'bpf' clang target, as
      done by the BPF code in tools/perf/.
      
      With that commit, 'perf test LLVM' started failing, because
      "clang ... -target bpf ..." fails: 4.0 does not have bpf inline asm
      support and 6.0 does not recognize the register 'esp'. Fix it by guarding
      that part with an #ifndef __BPF__, which is defined by clang when
      building to the "bpf" target.
      
        # perf test -v LLVM
        37: LLVM search and compile                               :
        37.1: Basic BPF llvm compile                              :
        --- start ---
        test child forked, pid 25526
        Kernel build dir is set to /lib/modules/4.14.0+/build
        set env: KBUILD_DIR=/lib/modules/4.14.0+/build
        unset env: KBUILD_OPTS
        include option is set to  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h
        set env: NR_CPUS=4
        set env: LINUX_VERSION_CODE=0x40e00
        set env: CLANG_EXEC=/usr/local/bin/clang
        set env: CLANG_OPTIONS=-xc
        set env: KERNEL_INC_OPTIONS= -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h
        set env: WORKING_DIR=/lib/modules/4.14.0+/build
        set env: CLANG_SOURCE=-
        llvm compiling command template: echo '/*
         * bpf-script-example.c
         * Test basic LLVM building
         */
        #ifndef LINUX_VERSION_CODE
        # error Need LINUX_VERSION_CODE
        # error Example: for 4.2 kernel, put 'clang-opt="-DLINUX_VERSION_CODE=0x40200" into llvm section of ~/.perfconfig'
        #endif
        #define BPF_ANY 0
        #define BPF_MAP_TYPE_ARRAY 2
        #define BPF_FUNC_map_lookup_elem 1
        #define BPF_FUNC_map_update_elem 2
      
        static void *(*bpf_map_lookup_elem)(void *map, void *key) =
      	  (void *) BPF_FUNC_map_lookup_elem;
        static void *(*bpf_map_update_elem)(void *map, void *key, void *value, int flags) =
      	  (void *) BPF_FUNC_map_update_elem;
      
        struct bpf_map_def {
      	  unsigned int type;
      	  unsigned int key_size;
      	  unsigned int value_size;
      	  unsigned int max_entries;
        };
      
        #define SEC(NAME) __attribute__((section(NAME), used))
        struct bpf_map_def SEC("maps") flip_table = {
      	  .type = BPF_MAP_TYPE_ARRAY,
      	  .key_size = sizeof(int),
      	  .value_size = sizeof(int),
      	  .max_entries = 1,
        };
      
        SEC("func=SyS_epoll_wait")
        int bpf_func__SyS_epoll_wait(void *ctx)
        {
      	  int ind =0;
      	  int *flag = bpf_map_lookup_elem(&flip_table, &ind);
      	  int new_flag;
      	  if (!flag)
      		  return 0;
      	  /* flip flag and store back */
      	  new_flag = !*flag;
      	  bpf_map_update_elem(&flip_table, &ind, &new_flag, BPF_ANY);
      	  return new_flag;
        }
        char _license[] SEC("license") = "GPL";
        int _version SEC("version") = LINUX_VERSION_CODE;
        ' | $CLANG_EXEC -D__KERNEL__ -D__NR_CPUS__=$NR_CPUS -DLINUX_VERSION_CODE=$LINUX_VERSION_CODE $CLANG_OPTIONS $KERNEL_INC_OPTIONS -Wno-unused-value -Wno-pointer-sign -working-directory $WORKING_DIR -c "$CLANG_SOURCE" -target bpf -O2 -o -
        test child finished with 0
        ---- end ----
        LLVM search and compile subtest 0: Ok
        37.2: kbuild searching                                    :
        --- start ---
        test child forked, pid 25950
        Kernel build dir is set to /lib/modules/4.14.0+/build
        set env: KBUILD_DIR=/lib/modules/4.14.0+/build
        unset env: KBUILD_OPTS
        include option is set to  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h
        set env: NR_CPUS=4
        set env: LINUX_VERSION_CODE=0x40e00
        set env: CLANG_EXEC=/usr/local/bin/clang
        set env: CLANG_OPTIONS=-xc
        set env: KERNEL_INC_OPTIONS= -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h
        set env: WORKING_DIR=/lib/modules/4.14.0+/build
        set env: CLANG_SOURCE=-
        llvm compiling command template: echo '/*
         * bpf-script-test-kbuild.c
         * Test include from kernel header
         */
        #ifndef LINUX_VERSION_CODE
        # error Need LINUX_VERSION_CODE
        # error Example: for 4.2 kernel, put 'clang-opt="-DLINUX_VERSION_CODE=0x40200" into llvm section of ~/.perfconfig'
        #endif
        #define SEC(NAME) __attribute__((section(NAME), used))
      
        #include <uapi/linux/fs.h>
        #include <uapi/asm/ptrace.h>
      
        SEC("func=vfs_llseek")
        int bpf_func__vfs_llseek(void *ctx)
        {
      	  return 0;
        }
      
        char _license[] SEC("license") = "GPL";
        int _version SEC("version") = LINUX_VERSION_CODE;
        ' | $CLANG_EXEC -D__KERNEL__ -D__NR_CPUS__=$NR_CPUS -DLINUX_VERSION_CODE=$LINUX_VERSION_CODE $CLANG_OPTIONS $KERNEL_INC_OPTIONS -Wno-unused-value -Wno-pointer-sign -working-directory $WORKING_DIR -c "$CLANG_SOURCE" -target bpf -O2 -o -
        In file included from <stdin>:12:
        In file included from /home/acme/git/linux/arch/x86/include/uapi/asm/ptrace.h:5:
        In file included from /home/acme/git/linux/include/linux/compiler.h:242:
        In file included from /home/acme/git/linux/arch/x86/include/asm/barrier.h:5:
        In file included from /home/acme/git/linux/arch/x86/include/asm/alternative.h:10:
        /home/acme/git/linux/arch/x86/include/asm/asm.h:145:50: error: unknown register name 'esp' in asm
        register unsigned long current_stack_pointer asm(_ASM_SP);
                                                         ^
        /home/acme/git/linux/arch/x86/include/asm/asm.h:44:18: note: expanded from macro '_ASM_SP'
        #define _ASM_SP         __ASM_REG(sp)
                                ^
        /home/acme/git/linux/arch/x86/include/asm/asm.h:27:32: note: expanded from macro '__ASM_REG'
        #define __ASM_REG(reg)         __ASM_SEL_RAW(e##reg, r##reg)
                                       ^
        /home/acme/git/linux/arch/x86/include/asm/asm.h:18:29: note: expanded from macro '__ASM_SEL_RAW'
        # define __ASM_SEL_RAW(a,b) __ASM_FORM_RAW(a)
                                    ^
        /home/acme/git/linux/arch/x86/include/asm/asm.h:11:32: note: expanded from macro '__ASM_FORM_RAW'
        # define __ASM_FORM_RAW(x)     #x
                                       ^
        <scratch space>:4:1: note: expanded from here
        "esp"
        ^
        1 error generated.
        ERROR:	unable to compile -
        Hint:	Check error message shown above.
        Hint:	You can also pre-compile it into .o using:
           		  clang -target bpf -O2 -c -
           	  with proper -I and -D options.
        Failed to compile test case: 'kbuild searching'
        test child finished with -1
        ---- end ----
        LLVM search and compile subtest 1: FAILED!
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Dmitriy Vyukov <dvyukov@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthias Kaehlcke <mka@chromium.org>
      Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Yonghong Song <yhs@fb.com>
      Link: https://lkml.kernel.org/r/20171128175948.GL3298@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf vendor events: Use more flexible pattern matching for CPU identification for mapfile.csv · fbc2844e
      William Cohen authored
      The powerpc cpuid information includes chip revision information.
      Changes between chip revisions are usually minor bug fixes and usually
      do not affect the operation of the performance monitoring hardware.
      
      The original mapfile.csv matching requires enumerating every possible
      cpuid string.  When a new minor chip revision is produced, a new entry
      has to be added to mapfile.csv and the code recompiled before perf has
      the implementation-specific perf events for that revision.  For users
      of various Linux distributions, having to wait for a new release of the
      kernel's perf tool to be built with these trivial patches is
      inconvenient.

      Using regular expressions rather than exact string matching of the
      entire cpuid string allows developers to write mapfile.csv files that do
      not require patches and recompiles for each of these minor version
      changes.  If special cases are needed for particular versions, they can
      be placed earlier in mapfile.csv, before the more general matches.
      Signed-off-by: William Cohen <wcohen@redhat.com>
      Tested-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Shriya <shriyak@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/20171204145728.16792-1-wcohen@redhat.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf c2c: Add a tip about cacheline events · 01251952
      Sangwon Hong authored
      Signed-off-by: Sangwon Hong <qpakzk@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/1512188201-14109-1-git-send-email-qpakzk@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf mmap: Remove overwrite and check_messup from mmap read · 8eb7a1fe
      Wang Nan authored
      All perf_mmap__read_forward() calls read from a read-write ring buffer,
      so there is no need for check_messup. Reading from a backward ring
      buffer doesn't require check_messup either, because it never messes up.
      Clean up the argument lists.
      Signed-off-by: Wang Nan <wangnan0@huawei.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Link: http://lkml.kernel.org/r/20171203020044.81680-6-wangnan0@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf mmap: Remove overwrite from arguments list of perf_mmap__push · ca6a9a05
      Wang Nan authored
      The 'overwrite' argument is always 'false'. Remove it from the argument
      list of perf_mmap__push().
      Signed-off-by: Wang Nan <wangnan0@huawei.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Link: http://lkml.kernel.org/r/20171203020044.81680-5-wangnan0@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf evlist: Remove evlist->overwrite · 144b9a4f
      Wang Nan authored
      evlist->overwrite is set to false in all users. It can be removed.
      Signed-off-by: Wang Nan <wangnan0@huawei.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Link: http://lkml.kernel.org/r/20171203020044.81680-4-wangnan0@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf evlist: Remove 'overwrite' parameter from perf_evlist__mmap_ex · 7a276ff6
      Wang Nan authored
      All users of perf_evlist__mmap_ex() pass !overwrite. Remove the
      parameter from its argument list.
      Signed-off-by: Wang Nan <wangnan0@huawei.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Link: http://lkml.kernel.org/r/20171203020044.81680-3-wangnan0@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf evlist: Remove 'overwrite' parameter from perf_evlist__mmap · f74b9d3a
      Wang Nan authored
      None of perf_evlist__mmap()'s users set 'overwrite' any more. Remove it
      from the argument list.
      Signed-off-by: Wang Nan <wangnan0@huawei.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Link: http://lkml.kernel.org/r/20171203020044.81680-2-wangnan0@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Fix up build in hardnened environments · c6707fde
      Jiri Olsa authored
      On Fedora systems the perl and python CFLAGS/LDFLAGS include the
      hardened specs from redhat-rpm-config package. We apply them only for
      perl/python objects, which makes them not compatible with the rest of
      the objects and the build fails with:
      
        /usr/bin/ld: perf-in.o: relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a shared object; recompile with -fPIC
        /usr/bin/ld: libperf.a(libperf-in.o): relocation R_X86_64_32S against `.text' can not be used when making a shared object; recompile with -fPIC
        /usr/bin/ld: final link failed: Nonrepresentable section on output
        collect2: error: ld returned 1 exit status
        make[2]: *** [Makefile.perf:507: perf] Error 1
        make[1]: *** [Makefile.perf:210: sub-make] Error 2
        make: *** [Makefile:69: all] Error 2
      
      Mainly it's caused by perl/python objects being compiled with:
      
        -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1
      
      which makes the final link impossible, because it checks for
      'proper' objects with the following option:
      
        -specs=/usr/lib/rpm/redhat/redhat-hardened-ld
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20171204082437.GC30564@krava
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf pmu: Add check for valid cpuid in perf_pmu__find_map() · de3d0f12
      Ganapatrao Kulkarni authored
      On some platforms (arm/arm64) which use the cpus map to get the
      corresponding cpuid string, cpuid can be NULL for PMUs other than core
      PMUs.  Add a check for a NULL cpuid in perf_pmu__find_map() to avoid a
      segmentation fault.
      Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Ganapatrao Kulkarni <gklkml16@gmail.com>
      Cc: Jayachandran C <jnair@caviumnetworks.com>
      Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Richter <robert.richter@cavium.com>
      Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lkml.kernel.org/r/20171016183222.25750-6-ganapatrao.kulkarni@cavium.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf vendor events arm64: Add ThunderX2 implementation defined pmu core events · d3964221
      Ganapatrao Kulkarni authored
      This is not a full event list, but a short list of useful events.
      Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Ganapatrao Kulkarni <gklkml16@gmail.com>
      Cc: Jayachandran C <jnair@caviumnetworks.com>
      Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Richter <robert.richter@cavium.com>
      Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lkml.kernel.org/r/20171016183222.25750-5-ganapatrao.kulkarni@cavium.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf pmu: Add helper function is_pmu_core to detect PMU CORE devices · 14b22ae0
      Ganapatrao Kulkarni authored
      On some platforms, the sysfs name of PMU core devices is not "cpu".
      Add the function is_pmu_core() to detect PMU core devices using
      core-device-specific hints in sysfs.

      For arm64 platforms, all core devices have a "cpus" file in sysfs.
      Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
      Tested-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
      Tested-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Link: https://lkml.kernel.org/n/tip-y1woxt1k2pqqwpprhonnft2s@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools arm64: Add support for get_cpuid_str function. · b57df288
      Ganapatrao Kulkarni authored
      The get_cpuid_str function returns the MIDR string of the first online
      cpu from the range of cpus associated with the PMU CORE device.
      Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Ganapatrao Kulkarni <gklkml16@gmail.com>
      Cc: Jayachandran C <jnair@caviumnetworks.com>
      Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Richter <robert.richter@cavium.com>
      Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lkml.kernel.org/r/20171016183222.25750-3-ganapatrao.kulkarni@cavium.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf pmu: Pass pmu as a parameter to get_cpuid_str() · 54e32dc0
      Ganapatrao Kulkarni authored
      The cpuid string will not be the same on all CPUs on heterogeneous
      platforms like ARM's big.LITTLE. Add a provision (using pmu->cpus) to
      find the cpuid string from the CPUs associated with the PMU core device.

      Also optimise the arguments to the function pmu_add_cpu_aliases().
      Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Jayachandran C <jnair@caviumnetworks.com>
      Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Richter <robert.richter@cavium.com>
      Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
      Link: http://lkml.kernel.org/r/20171016183222.25750-2-ganapatrao.kulkarni@cavium.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf s390: Always build with -fPIC · 1dc4ddf1
      Hendrik Brueckner authored
      On s390, object files must be compiled as position-independent code in
      order to be incrementally linked or linked to shared libraries.
      Therefore, add -fPIC to the CFLAGS for s390 to ensure each object file
      is built properly.
      Reported-by: Jonathan Hermann <jonathan.hermann@de.ibm.com>
      Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
      Cc: linux s390 list <linux-s390@vger.kernel.org>
      LPU-Reference: 1512031765-9382-1-git-send-email-brueckner@linux.vnet.ibm.com
      Link: https://lkml.kernel.org/n/tip-a8wga8hrl0d0r84cal96fmgv@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf thread_map: Add method to map all threads in the system · 8d3cd4c3
      Arnaldo Carvalho de Melo authored
      Reuse the thread_map__new_by_uid() proc scanning already in place to
      return a map with all threads in the system.
      Based-on-a-patch-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lkml.kernel.org/n/tip-khh28q0wwqbqtrk32bfe07hd@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Add rbtree node_delete op · b984aff7
      Jin Yao authored
      In the current stat-shadow.c, rbtree node deletion is not handled.

      This patch adds an implementation of the node_delete method of rblist.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1512125856-22056-5-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf rblist: Create rblist__exit() function · 33fec3e3
      Jin Yao authored
      Currently we have rblist__delete(), which is used to delete a rblist.
      However, rblist__delete() also frees the rblist pointer itself at the end.
      
      This is inconvenient when the rblist was not allocated by something like
      malloc(), for example when the rblist is embedded in a larger data
      structure.
      
      This patch creates a new function, rblist__exit(), which is similar to
      rblist__delete() but does not free the rblist pointer.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1512125856-22056-2-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      33fec3e3
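      The difference between an "exit" and a "delete" operation described in
      the rblist__exit() commit above can be sketched as follows. This is a
      minimal illustration and not perf's rblist code: `struct list`,
      `list__init()`, `list__add()`, `list__exit()`, `list__delete()` and
      `struct stats` are hypothetical names, and a singly linked list stands
      in for the rbtree.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an rblist: a singly linked list of entries. */
struct node {
	struct node *next;
	int value;
};

struct list {
	struct node *head;
	unsigned int nr_entries;
};

void list__init(struct list *l)
{
	l->head = NULL;
	l->nr_entries = 0;
}

int list__add(struct list *l, int value)
{
	struct node *n = malloc(sizeof(*n));

	if (n == NULL)
		return -1;
	n->value = value;
	n->next = l->head;
	l->head = n;
	l->nr_entries++;
	return 0;
}

/* Free every entry but leave the list object itself untouched. */
void list__exit(struct list *l)
{
	while (l->head != NULL) {
		struct node *n = l->head;

		l->head = n->next;
		free(n);
		l->nr_entries--;
	}
	assert(l->nr_entries == 0);
}

/* Free every entry and then the list object; only valid for heap-allocated lists. */
void list__delete(struct list *l)
{
	if (l != NULL) {
		list__exit(l);
		free(l);
	}
}

/* A larger structure that embeds the list by value, so it must use list__exit(). */
struct stats {
	int id;
	struct list values;
};
```

      Calling list__delete() on `&stats.values` would pass a pointer that
      never came from malloc() to free(), which is the situation the new
      function is meant to avoid.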
    • Thomas Richter's avatar
      perf annotate: Fix objdump comment parsing for Intel mov disassembly · 35a8a148
      Thomas Richter authored
      The command 'perf annotate' parses the output of objdump and also
      investigates the comments produced by objdump. For example the
      output of objdump produces (on x86):
      
      23eee:  4c 8b 3d 13 01 21 00 mov 0x210113(%rip),%r15
                                      # 234008 <stderr@@GLIBC_2.2.5+0x9a8>
      
      and the function mov__parse() is called to investigate the complete
      line. mov__parse() breaks this line into several parts and finally
      calls comment__symbol() to parse the data after the comment character
      '#'. comment__symbol() expects a hexadecimal address followed by a
      symbol in '<' and '>' brackets.
      
      However, the second parameter given to comment__symbol() always points
      to the comment character '#'. The address parsing always returns 0
      because the character '#' is not a hexadecimal digit, and the strtoull()
      failure goes unnoticed.
      
      Fix this by advancing the second parameter to function comment__symbol()
      by one byte before invocation and add an error check after strtoull()
      has been called.
      Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
      Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Acked-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Fixes: 6de783b6 ("perf annotate: Resolve symbols using objdump comment")
      Link: http://lkml.kernel.org/r/20171128075632.72182-1-tmricht@linux.vnet.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      35a8a148
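      The parsing problem described in the objdump comment commit above can
      be reproduced in isolation. The sketch below is not the perf source:
      `parse_comment_addr()` is a hypothetical helper that takes a pointer to
      the '#' character, steps over it, and checks whether strtoull()
      consumed any digits.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: 'comment' points at the '#' that starts an objdump
 * comment such as "# 234008 <stderr@@GLIBC_2.2.5+0x9a8>".
 * Returns 0 and stores the address on success, -1 if no hex digits follow.
 */
int parse_comment_addr(const char *comment, unsigned long long *addr)
{
	char *end;
	unsigned long long val;

	assert(addr != NULL);
	if (comment == NULL || *comment != '#')
		return -1;

	comment++;			/* step over '#' before parsing */
	errno = 0;
	val = strtoull(comment, &end, 16);
	if (end == comment || errno == ERANGE)
		return -1;		/* no digits consumed, or overflow */

	*addr = val;
	return 0;
}
```

      Without the increment, strtoull() is handed a string that starts with
      '#', converts nothing, and returns 0; comparing the end pointer with
      the start pointer is what makes that case detectable.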
    • Thomas Richter's avatar
      perf annotate: Fix unnecessary memory allocation for s390x · 36c26360
      Thomas Richter authored
      This patch fixes a bug introduced with commit d9f8dfa9 ("perf
      annotate s390: Implement jump types for perf annotate").
      
      'perf annotate' displays annotated assembler output by reading output of
      command objdump and parsing the disassembled lines. For each shown
      mnemonic this function sequence is executed:
      
        disasm_line__new()
        |
        +--> disasm_line__init_ins()
             |
             +--> ins__find()
                  |
                  +--> arch->associate_instruction_ops()
      
      On s390x, the function pointer associate_instruction_ops refers to
      s390__associate_ins_ops().
      
      This function checks for supported mnemonics and assigns a NULL pointer
      to unsupported mnemonics. However, even entries with a NULL pointer are
      added to the architecture-dependent instruction array.
      
      This leads to an extremely large architecture instruction array
      (due to the array resize logic in arch__grow_instructions()).
      
      Depending on the objdump output being parsed, the array can end up
      with tens of thousands of elements.
      
      This patch checks if a mnemonic is supported and only adds supported
      ones to the architecture instruction array. The array no longer contains
      elements with NULL pointers.
      
      Before the patch (with some debug printf output):
      
      [root@s35lp76 perf]# time ./perf annotate --stdio > /tmp/xxxbb
      
      real	8m49.679s
      user	7m13.008s
      sys	0m1.649s
      [root@s35lp76 perf]# fgrep '__ins__find sorted:1 nr_instructions:'
      			/tmp/xxxbb | tail -1
      __ins__find sorted:1 nr_instructions:87433 ins:0x341583c0
      [root@s35lp76 perf]#
      
      The number of different s390x branch/jump/call/return instructions
      entered into the array is 87433.
      
      After the patch (with some debug printf output):
      
      [root@s35lp76 perf]# time ./perf annotate --stdio > /tmp/xxxaa
      
      real	1m24.553s
      user	0m0.587s
      sys	0m1.530s
      [root@s35lp76 perf]# fgrep '__ins__find sorted:1 nr_instructions:'
      			/tmp/xxxaa | tail -1
      __ins__find sorted:1 nr_instructions:56 ins:0x3f406570
      [root@s35lp76 perf]#
      
      The number of different s390x branch/jump/call/return instructions
      entered into the array is 56 which is sensible.
      Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
      Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Acked-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Link: http://lkml.kernel.org/r/20171124094637.55558-1-tmricht@linux.vnet.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      36c26360
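      The idea behind the s390x fix above, recording a mnemonic only when it
      has instruction ops, can be sketched as follows. This is an
      illustration under stated assumptions, not the perf implementation:
      the types, the fixed-size array, `lookup_ops()` and
      `associate_ins_ops()` are hypothetical, and the two mnemonics treated
      as supported are chosen only for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_INS 8	/* fixed capacity keeps the sketch short */

struct ins_ops {
	const char *kind;
};

struct ins {
	const char *name;
	const struct ins_ops *ops;
};

struct arch {
	struct ins instructions[MAX_INS];
	size_t nr_instructions;
};

static const struct ins_ops jump_ops = { "jump" };

/* Hypothetical lookup: treat only these two mnemonics as supported. */
static const struct ins_ops *lookup_ops(const char *name)
{
	if (strcmp(name, "j") == 0 || strcmp(name, "brc") == 0)
		return &jump_ops;
	return NULL;
}

/* Record a mnemonic only when it has ops; unsupported ones leave the array alone. */
const struct ins_ops *associate_ins_ops(struct arch *arch, const char *name)
{
	const struct ins_ops *ops;

	assert(arch != NULL && name != NULL);
	ops = lookup_ops(name);
	if (ops == NULL)
		return NULL;
	if (arch->nr_instructions == MAX_INS)
		return NULL;	/* array full: this sketch does not resize */
	arch->instructions[arch->nr_instructions].name = name;
	arch->instructions[arch->nr_instructions].ops = ops;
	arch->nr_instructions++;
	return ops;
}
```

      The early return for a NULL result is the important part: the array
      grows with the number of supported mnemonics rather than with the
      number of disassembled lines.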
    • James Yang's avatar
      perf bench futex: Sync waker threads · 8085e5ab
      James Yang authored
      Waker threads in the futex wake-parallel benchmark are started by a loop
      using pthread_create().  However, there is no synchronization for when
      the waker threads wake the waiting threads.  Comparing the waker
      threads' measurement timestamps shows that they are not all running
      concurrently, because older waker threads finish their task before newer
      waker threads even start.
      
      This patch uses a barrier to better synchronize the waker threads.
      
      Signed-off-by: James Yang <james.yang@arm.com>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Link: http://lkml.kernel.org/r/20171127042101.3659-4-dave@stgolabs.net
      Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
      [ Disable the wake-parallel test for systems without pthread_barrier_t ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      8085e5ab
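      A barrier of the kind the futex wake-parallel commit above refers to
      can be sketched with pthread_barrier_t. This is a minimal sketch, not
      the benchmark code: `run_wakers()`, `waker_fn()` and
      `struct waker_ctx` are hypothetical, and the wake operation itself is
      replaced by a check that every thread had reached the barrier before
      any thread continued.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* expose pthread_barrier_t in strict ISO C mode */
#endif
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define MAX_WAKERS 16

struct waker_ctx {
	pthread_barrier_t *barrier;
	atomic_int *arrived;	/* number of threads that reached the barrier */
	int nthreads;
	int saw_all;		/* 1 if all threads had arrived when this one resumed */
};

static void *waker_fn(void *arg)
{
	struct waker_ctx *ctx = arg;

	assert(ctx != NULL);
	atomic_fetch_add(ctx->arrived, 1);
	pthread_barrier_wait(ctx->barrier);	/* block until every waker exists */
	/* The timed wake operation would start here. */
	ctx->saw_all = (atomic_load(ctx->arrived) == ctx->nthreads);
	return NULL;
}

/* Start nthreads wakers; return how many resumed only after all had arrived. */
int run_wakers(int nthreads)
{
	pthread_t tids[MAX_WAKERS];
	struct waker_ctx ctx[MAX_WAKERS];
	pthread_barrier_t barrier;
	atomic_int arrived;
	int i, ok = 0;

	if (nthreads < 1 || nthreads > MAX_WAKERS)
		return -1;
	atomic_init(&arrived, 0);
	if (pthread_barrier_init(&barrier, NULL, (unsigned int)nthreads) != 0)
		return -1;

	for (i = 0; i < nthreads; i++) {
		ctx[i].barrier = &barrier;
		ctx[i].arrived = &arrived;
		ctx[i].nthreads = nthreads;
		ctx[i].saw_all = 0;
		/* Sketch only: on failure, already-started threads would wait forever. */
		if (pthread_create(&tids[i], NULL, waker_fn, &ctx[i]) != 0)
			abort();
	}
	for (i = 0; i < nthreads; i++) {
		pthread_join(tids[i], NULL);
		ok += ctx[i].saw_all;
	}
	pthread_barrier_destroy(&barrier);
	return ok;
}
```

      Because no thread leaves pthread_barrier_wait() until the last one has
      entered it, an early thread cannot finish its work before a later one
      has even been created.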
    • Arnaldo Carvalho de Melo's avatar
      tools build feature: Check if pthread_barrier_t is available · 25ab5abf
      Arnaldo Carvalho de Melo authored
      'perf bench futex wake-parallel' will use pthread_barrier_t, which is not
      available on older systems such as the versions of the Android NDK used
      in my container build tests (r12b and r15c at the moment).
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: James Yang <james.yang@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-1i7iv54in4wj08lwo55b0pzv@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      25ab5abf
  2. 30 Nov, 2017 1 commit
  3. 29 Nov, 2017 12 commits
    • Arnaldo Carvalho de Melo's avatar
    • Adrian Hunter's avatar
      perf intel-pt: Improve build messages for files that differ from the kernel · c2653297
      Adrian Hunter authored
      Print file names of files that differ. For example, instead of:
      
        Warning: Intel PT: x86 instruction decoder differs from kernel
      
      print:
      
        Warning: Intel PT: x86 instruction decoder header at 'tools/perf/util/intel-pt-decoder/inat.h' differs from latest version at 'arch/x86/include/asm/inat.h'
      Reported-by: Ingo Molnar <mingo@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Link: http://lkml.kernel.org/r/1511253326-22308-2-git-send-email-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      c2653297
    • Arnaldo Carvalho de Melo's avatar
      perf report: Fix -D output for user metadata events · f250b09c
      Arnaldo Carvalho de Melo authored
      The PERF_RECORD_USER_ events are synthesized by the tool to assist in
      processing the PERF_RECORD_ ones generated by the kernel. Printing that
      information doesn't come with a perf_sample structure, so when dumping
      the event fields using 'perf report -D' some columns ended up not being
      printed.
      
      To tidy this up a bit, fake a perf_sample structure filled with zeroes so
      that the missing columns are printed, avoiding the occasional surprise.
      
      Before:
      
      0 0x45b8 [0x68]: PERF_RECORD_MMAP -1/0: [0xffffffffc12ec000(0x4000) @ 0]: x /lib/modules/4.14.0+/kernel/fs/nls/nls_utf8.ko
      0x4620 [0x28]: PERF_RECORD_THREAD_MAP nr: 1 thread: 27820
      0x4648 [0x18]: PERF_RECORD_CPU_MAP: 0-3
      0 0x4660 [0x28]: PERF_RECORD_COMM: perf:27820/27820
      0x4a58 [0x8]: PERF_RECORD_FINISHED_ROUND
      447723433020976 0x4688 [0x28]: PERF_RECORD_SAMPLE(IP, 0x4001): 27820/27820: 0xffffffff8f1b6d7a period: 1 addr: 0
      
      After:
      
        $ perf report -D | grep PERF_RECORD_ | head
        0 0xe8 [0x20]: PERF_RECORD_TIME_CONV: unhandled!
        0 0x108 [0x28]: PERF_RECORD_THREAD_MAP nr: 1 thread: 32555
        0 0x130 [0x18]: PERF_RECORD_CPU_MAP: 0-3
        0 0x148 [0x28]: PERF_RECORD_COMM: perf:32555/32555
        0 0x4e8 [0x8]: PERF_RECORD_FINISHED_ROUND
        448743409421205 0x170 [0x28]: PERF_RECORD_COMM exec: sleep:32555/32555
        448743409431883 0x198 [0x68]: PERF_RECORD_MMAP2 32555/32555: [0x55e11d75a000(0x208000) @ 0 fd:00 3147174 2566255743]: r-xp /usr/bin/sleep
        448743409443873 0x200 [0x70]: PERF_RECORD_MMAP2 32555/32555: [0x7f0ced316000(0x229000) @ 0 fd:00 3151761 2566238119]: r-xp /usr/lib64/ld-2.25.so
        448743409454790 0x270 [0x60]: PERF_RECORD_MMAP2 32555/32555: [0x7ffe84f6d000(0x2000) @ 0 00:00 0 0]: r-xp [vdso]
        448743409479500 0x2d0 [0x28]: PERF_RECORD_SAMPLE(IP, 0x4002): 32555/32555: 0xffffffff8f84c7e7 period: 1 addr: 0
        $
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 9aefcab0 ("perf session: Consolidate the dump code")
      Link: https://lkml.kernel.org/n/tip-todcu15x0cwgppkh1gi6uhru@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      f250b09c
    • Hansuk Hong's avatar
      perf buildid-cache: Document for Node.js USDT · 2e38e661
      Hansuk Hong authored
      Add a tip for Node.js USDT (User-level Statically Defined Tracing) probes
      to tips.txt.
      Signed-off-by: Hansuk Hong <flavono123@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/20171123160546.9722-1-flavono123@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      2e38e661
    • Andi Kleen's avatar
      perf script: Allow computing 'perf stat' style metrics · 4bd1bef8
      Andi Kleen authored
      Add support for computing 'perf stat' style metrics in 'perf script'.
      
      When using leader sampling we can get metrics for each sampling period
      by computing formulas over the values of the different group members.
      
      This allows things like fine grained IPC tracking through sampling, much
      more fine grained than with 'perf stat'.
      
      The metric is still averaged over the sampling period; it is not just
      for the sampling point.
      
      This patch adds a new metric output field for 'perf script' that uses
      the existing 'perf stat' metrics infrastructure to compute any metrics
      supported by 'perf stat'.
      
      For example to sample IPC:
      
        $ perf record -e '{ref-cycles,cycles,instructions}:S' -a sleep 1
        $ perf script -F metric,ip,sym,time,cpu,comm
        ...
         alsa-sink-ALC32 [000] 42815.856074:      7fd65937d6cc [unknown]
         alsa-sink-ALC32 [000] 42815.856074:      7fd65937d6cc [unknown]
         alsa-sink-ALC32 [000] 42815.856074:      7fd65937d6cc [unknown]
         alsa-sink-ALC32 [000] 42815.856074:    metric:    0.13  insn per cycle
                 swapper [000] 42815.857961:  ffffffff81655df0 __schedule
                 swapper [000] 42815.857961:  ffffffff81655df0 __schedule
                 swapper [000] 42815.857961:  ffffffff81655df0 __schedule
                 swapper [000] 42815.857961:    metric:    0.23  insn per cycle
         qemu-system-x86 [000] 42815.858130:  ffffffff8165ad0e _raw_spin_unlock_irqrestore
         qemu-system-x86 [000] 42815.858130:  ffffffff8165ad0e _raw_spin_unlock_irqrestore
         qemu-system-x86 [000] 42815.858130:  ffffffff8165ad0e _raw_spin_unlock_irqrestore
         qemu-system-x86 [000] 42815.858130:    metric:    0.46  insn per cycle
                   :4972 [000] 42815.858312:  ffffffffa080e5f2 vmx_vcpu_run
                   :4972 [000] 42815.858312:  ffffffffa080e5f2 vmx_vcpu_run
                   :4972 [000] 42815.858312:  ffffffffa080e5f2 vmx_vcpu_run
                   :4972 [000] 42815.858312:    metric:    0.45  insn per cycle
      
      TopDown:
      
      This requires disabling SMT if you have it enabled, because SMT would
      require sampling per core, which is not supported.
      
        $ perf record -e '{ref-cycles,topdown-fetch-bubbles,\
                           topdown-recovery-bubbles,\
                           topdown-slots-retired,topdown-total-slots,\
                           topdown-slots-issued}:S' -a sleep 1
        $ perf script --header -I -F cpu,ip,sym,event,metric,period
        ...
        [000]     121108               ref-cycles:  ffffffff8165222e copy_user_enhanced_fast_string
        [000]     190350    topdown-fetch-bubbles:  ffffffff8165222e copy_user_enhanced_fast_string
        [000]       2055 topdown-recovery-bubbles:  ffffffff8165222e copy_user_enhanced_fast_string
        [000]     148729    topdown-slots-retired:  ffffffff8165222e copy_user_enhanced_fast_string
        [000]     144324      topdown-total-slots:  ffffffff8165222e copy_user_enhanced_fast_string
        [000]     160852     topdown-slots-issued:  ffffffff8165222e copy_user_enhanced_fast_string
        [000]   metric:     33.0% frontend bound
        [000]   metric:      3.5% bad speculation
        [000]   metric:     25.8% retiring
        [000]   metric:     37.7% backend bound
        [000]     112112               ref-cycles:  ffffffff8165aec8 _raw_spin_lock_irqsave
        [000]     357222    topdown-fetch-bubbles:  ffffffff8165aec8 _raw_spin_lock_irqsave
        [000]       3325 topdown-recovery-bubbles:  ffffffff8165aec8 _raw_spin_lock_irqsave
        [000]     323553    topdown-slots-retired:  ffffffff8165aec8 _raw_spin_lock_irqsave
        [000]     270507      topdown-total-slots:  ffffffff8165aec8 _raw_spin_lock_irqsave
        [000]     341226     topdown-slots-issued:  ffffffff8165aec8 _raw_spin_lock_irqsave
        [000]   metric:     33.0% frontend bound
        [000]   metric:      2.9% bad speculation
        [000]   metric:     29.9% retiring
        [000]   metric:     34.2% backend bound
      ...
      
      v2:
      Use evsel->priv for new fields
      Port to new base line, support fp output.
      Handle stats in ->stats, not ->priv
      Minor cleanups
      
      Extra explanation about the use of the term 'averaging', from Andi in the
      thread in the Link: tag below:
      
      <quote Andi>
      The current samples contains the sum of event counts for a sampling period.
      
      EventA-1           EventA-2                EventA-3      EventA-4
      EventB-1     EventB-2                             EventB-3
      
                               gap with no events                overflow
      |-----------------------------------------------------------------|
      period-start                                             period-end
      ^                                                                 ^
      |                                                                 |
      previous sample                                      current sample
      
      So EventA = 4 and EventB = 3 at the sample point
      
      I generate a metric, let's say EventA / EventB. It applies to the whole period.
      
      But the metric is over a longer time which does not have the same behavior. For
      example the gap above doesn't have any events, while they are clustered at the
      beginning and end of the sample period.
      
      But we're summing everything together. The metric doesn't know that the gap is
      different than the busy period.
      
      That's what I'm trying to express with averaging.
      </quote>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20171117214300.32746-4-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      4bd1bef8
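      The "averaging" point quoted in the 'perf script' metrics commit above
      can be made concrete with a small calculation. The sketch is
      illustrative only: `period_ratio()` is a hypothetical function, and
      splitting the period into three slices (busy, gap, busy) is an
      assumption read off the diagram; a real sample carries only the two
      sums.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Ratio of two event counts summed over one sampling period.
 * a[i] and b[i] are hypothetical per-slice counts; since only the sums
 * enter the result, a slice with no events is invisible in the metric.
 */
double period_ratio(const unsigned int *a, const unsigned int *b, size_t nslices)
{
	unsigned long sum_a = 0, sum_b = 0;
	size_t i;

	assert(a != NULL && b != NULL);
	for (i = 0; i < nslices; i++) {
		sum_a += a[i];
		sum_b += b[i];
	}
	return sum_b ? (double)sum_a / (double)sum_b : 0.0;
}
```

      With per-slice counts {2, 0, 2} for EventA and {2, 0, 1} for EventB,
      the sums are 4 and 3, so the metric is 4/3 for the whole period, even
      though the middle slice saw no events at all.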
    • Andi Kleen's avatar
      perf record: Synthesize thread map and cpu map · 373565d2
      Andi Kleen authored
      Synthesize the per attr thread maps and cpu maps in 'perf record'.
      
      This allows code from 'perf stat' called from 'perf script' to access
      this information.
      
      Committer testing:
      
      Please see the PERF_RECORD_THREAD_MAP and PERF_RECORD_CPU_MAP records,
      added by this patch:
      
        $ perf record sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.001 MB perf.data (8 samples) ]
        $ perf report -D | grep PERF_RECORD_ | head
        0xe8 [0x20]: PERF_RECORD_TIME_CONV: unhandled!
        0x108 [0x28]: PERF_RECORD_THREAD_MAP nr: 1 thread: 23568
        0x130 [0x18]: PERF_RECORD_CPU_MAP: 0-3
        0 0x148 [0x28]: PERF_RECORD_COMM: perf:23568/23568
        0x570 [0x8]: PERF_RECORD_FINISHED_ROUND
        445342677837144 0x170 [0x28]: PERF_RECORD_COMM exec: sleep:23568/23568
        445342677847339 0x198 [0x68]: PERF_RECORD_MMAP2 23568/23568: [0x564c943a4000(0x208000) @ 0 fd:00 3147174 2566255743]: r-xp /usr/bin/sleep
        445342677862450 0x200 [0x70]: PERF_RECORD_MMAP2 23568/23568: [0x7f25968a8000(0x229000) @ 0 fd:00 3151761 2566238119]: r-xp /usr/lib64/ld-2.25.so
        445342677873174 0x270 [0x60]: PERF_RECORD_MMAP2 23568/23568: [0x7ffc98176000(0x2000) @ 0 00:00 0 0]: r-xp [vdso]
        445342677891928 0x2d0 [0x28]: PERF_RECORD_SAMPLE(IP, 0x4002): 23568/23568: 0xffffffff8f84c7e7 period: 1 addr: 0
        $
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Link: http://lkml.kernel.org/r/20171117214300.32746-3-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      373565d2
    • Andi Kleen's avatar
      perf record: Synthesize unit/scale/... in event update · bfd8f72c
      Andi Kleen authored
      Move the code to synthesize event updates for scale/unit/cpus to a
      common utility file, and use it both from stat and record.
      
      This allows 'perf script' to access scale and other extra qualifiers.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20171117214300.32746-2-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      bfd8f72c
    • Thomas Richter's avatar
      perf test: Disable test cases 19 and 20 on s390x · 4ca69ca9
      Thomas Richter authored
      The s390x CPU sampling and measurement facilities do not support perf
      events of type PERF_TYPE_BREAKPOINT. The test cases are executed and
      fail with -ENOENT due to missing hardware support.
      
      Disable the execution of both test cases based on a
      platform check. This is the same approach as done for
      PowerPC.
      Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
      Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      LPU-Reference: 20171123074623.20817-1-tmricht@linux.vnet.ibm.com
      Link: https://lkml.kernel.org/n/tip-uqvoy6a1tsu8jddo5jjg4h85@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      4ca69ca9
    • Ingo Molnar's avatar
      tools headers: Follow the upstream UAPI header version 100% differ from the kernel · 3f27bb5f
      Ingo Molnar authored
      Remove this from check-headers.sh:
      
        opts="--ignore-blank-lines --ignore-space-change"
      
      as the easiest policy is to just follow the upstream UAPI header version 100%.
      Pure space-only changes are comparatively rare.
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Link: http://lkml.kernel.org/r/20171121084111.y6p5zwqso2cbms5s@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      3f27bb5f
    • Ingo Molnar's avatar
      e4f57147
    • Ingo Molnar's avatar
      Merge branch 'perf/urgent' of... · 6e948c67
      Ingo Molnar authored
      Merge branch 'perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
      
      Pull perf tooling fixes from Arnaldo Carvalho de Melo:
      
      "- Fix window dimensions change handling in 'perf top' (Jiri Olsa)
      
      - Fix 'perf record -c/-F' options for CPU event aliases (Andi Kleen)
      
      - Generate PERF_RECORD_{MMAP,COMM,EXEC} with 'perf record --delay'
        fixing symbol resolution for processes created, maps put in place
        while --delay happens (Arnaldo Carvalho de Melo)
      
      - Fix up leftover perf_evsel_stat usage via evsel->priv, plugging
        a SEGV when using event groups as in:
      
           $ perf stat -e '{cpu-clock,instructions}' workload
      
      - Fix 'perf script --per-event-dump' for auxtrace synth evsels (Arnaldo Carvalho de Melo)
      
      - Ignore kptr_restrict when not sampling the kernel (Arnaldo Carvalho de Melo)
      
      - Synchronize kernel ABI headers wrt SPDX tags and ABI changes,
        taking minimal action to handle new syscall args and silencing
        perf build warnings (Arnaldo Carvalho de Melo, Ingo Molnar)
      
      - Fix header.size for namespace events (Jiri Olsa)
      
      - Fix a bug during strstart() conversion in 'perf help' (Namhyung Kim)
      
      - Do not truncate instruction names at 6 chars in 'perf annotate', there
        are really long instruction names in PPC (Ravi Bangoria)
      
      - Fixup discontiguous/sparse numa nodes in 'perf bench numa' (Satheesh Rajendran)
      
      - Fix an exit code of trace__symbols_init in 'perf trace' (Andrei Vagin)
      
      - Fix 'perf test' entries on s/390 (Thomas Richter)
      
      - Bring instruction decoder files used by Intel PT into line with the kernel,
        silencing build warning (Adrian Hunter)"
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6e948c67
    • Ingo Molnar's avatar
  4. 28 Nov, 2017 2 commits
    • Arnaldo Carvalho de Melo's avatar
      tools headers: Synchronize mman.h ABI header · 1b3b5219
      Arnaldo Carvalho de Melo authored
      To add support for the MAP_SYNC flag introduced in:
      
        b6fb293f ("mm: Define MAP_SYNC and VM_SYNC flags")
      
      Update tools/perf/trace/beauty/mmap.c to support that flag.
      
      This silences this perf build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman.h' differs from latest version at 'include/uapi/asm-generic/mman.h'
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-14zyk3iywrj37c7g1eagmzbo@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      1b3b5219
    • Arnaldo Carvalho de Melo's avatar
      tools headers: Synchronize prctl.h ABI header · d9744f94
      Arnaldo Carvalho de Melo authored
      To pick up changes from:
      
        2d2123bc ("arm64/sve: Add prctl controls for userspace vector length management")
        7582e220 ("arm64/sve: Backend logic for setting the vector length")
      
      That showed a limitation of the regexp used in tools/perf/trace/beauty/prctl_option.sh,
      that matches only PR_{SET,GET}_, but should match a few more, like
      PR_MPX_*, PR_CAP_* and the one added by the above commit, PR_SVE_SET_*.
      
      This silences this warning when building tools/perf:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/prctl.h' differs from latest version at 'include/uapi/linux/prctl.h'
      
      Support for those extra prctl options should be left for the next merge
      window, though.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Dave Martin <Dave.Martin@arm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: https://lkml.kernel.org/n/tip-r52dsyuzy04qzqyfcifjs35t@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      d9744f94