1. 20 Jul, 2017 13 commits
    • Arnaldo Carvalho de Melo's avatar
      perf trace beauty clone: Beautify syscall arguments · 33396a3a
      Arnaldo Carvalho de Melo authored
      Now, syswide tracing, selected entries:
      
        # trace -e clone
        24417.203 ( 0.158 ms): bash/11323 clone(flags: CHILD_CLEARTID|CHILD_SETTID|0x11, child_stack: 0, parent_tidptr: 0, child_tidptr: 0x7f0778e5c9d0, tls: 0x7f0778e5c700) = 11325 (bash)
                ? (     ?   ): bash/11325  ... [continued]: clone()) = 0
        24419.355 ( 0.093 ms): bash/10586 clone(flags: CHILD_CLEARTID|CHILD_SETTID|0x11, child_stack: 0, parent_tidptr: 0, child_tidptr: 0x7f0778e5c9d0, tls: 0x7f0778e5c700) = 11326 (bash)
                ? (     ?   ): bash/11326  ... [continued]: clone()) = 0
        24419.744 ( 0.102 ms): bash/11326 clone(flags: CHILD_CLEARTID|CHILD_SETTID|0x11, child_stack: 0, parent_tidptr: 0, child_tidptr: 0x7f0778e5c9d0, tls: 0x7f0778e5c700) = 11327 (bash)
                ? (     ?   ): bash/11327  ... [continued]: clone()) = 0
        24420.138 ( 0.105 ms): bash/11327 clone(flags: CHILD_CLEARTID|CHILD_SETTID|0x11, child_stack: 0, parent_tidptr: 0, child_tidptr: 0x7f0778e5c9d0, tls: 0x7f0778e5c700) = 11328 (bash)
                ? (     ?   ): bash/11328  ... [continued]: clone()) = 0
        35747.722 ( 0.044 ms): gpg-agent/18087 clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7ff0755f6ff0, parent_tidptr: 0x7ff0755f79d0, child_tidptr: 0x7ff0755f79d0, tls: 0x7ff0755f7700) = 11329 (gpg-agent)
                ? (     ?   ): gpg-agent/11329  ... [continued]: clone()) = 0
        35748.359 ( 0.022 ms): gpg-agent/18087 clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7ff075df7ff0, parent_tidptr: 0x7ff075df89d0, child_tidptr: 0x7ff075df89d0, tls: 0x7ff075df8700) = 11330 (gpg-agent)
                ? (     ?   ): gpg-agent/11330  ... [continued]: clone()) = 0
        35781.422 ( 0.452 ms): NetworkManager/1112 clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f2f1fffedb0, parent_tidptr: 0x7f2f1ffff9d0, child_tidptr: 0x7f2f1ffff9d0, tls: 0x7f2f1ffff700) = 11331 (NetworkManager)
                ? (     ?   ): NetworkManager/11331  ... [continued]: clone()) = 0
      
      Need to improve the formatting of the second return, to the child, this
      cset only focused on the argument formatting.
      
      If we trace just one pid:
      
        # trace -e clone -p 19863
           0.349 ( 0.025 ms): Chrome_IOThrea/19863 clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7ffb84eaac70, parent_tidptr: 0x7ffb84eab9d0, child_tidptr: 0x7ffb84eab9d0, tls: 0x7ffb84eab700) = 11637 (Chrome_IOThread)
           0.392 ( 0.013 ms): Chrome_IOThrea/19863 clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7ffb664b8c70, parent_tidptr: 0x7ffb664b99d0, child_tidptr: 0x7ffb664b99d0, tls: 0x7ffb664b9700) = 11638 (Chrome_IOThread)
           0.573 ( 0.015 ms): Chrome_IOThrea/19863 clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7ffb6046cc70, parent_tidptr: 0x7ffb6046d9d0, child_tidptr: 0x7ffb6046d9d0, tls: 0x7ffb6046d700) = 11639 (Chrome_IOThread)
           0.617 ( 0.014 ms): Chrome_IOThrea/19863 clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7ffb730dcc70, parent_tidptr: 0x7ffb730dd9d0, child_tidptr: 0x7ffb730dd9d0, tls: 0x7ffb730dd700) = 11640 (Chrome_IOThread)
           4.350 ( 0.065 ms): Chrome_IOThrea/19863 clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7ffb720d9c70, parent_tidptr: 0x7ffb720da9d0, child_tidptr: 0x7ffb720da9d0, tls: 0x7ffb720da700) = 11642 (Chrome_IOThread)
           5.642 ( 0.079 ms): Chrome_IOThrea/19863 clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7ffb718d8c70, parent_tidptr: 0x7ffb718d99d0, child_tidptr: 0x7ffb718d99d0, tls: 0x7ffb718d9700) = 11643 (Chrome_IOThread)
      ^C#
      
      We'll also have to fix the argument ordering in different arches,
      probably having multiple syscall_fmt entries with each possible order
      and then use perf_evsel__env_arch() (if dealing with a perf.data file)
      or the current system info, for live sessions.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-am068uyubgj83snepolwhbfe@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      33396a3a
    • Arnaldo Carvalho de Melo's avatar
      tools include uapi: Grab a copy of linux/sched.h · 450c86c9
      Arnaldo Carvalho de Melo authored
      So that we make sure we have recent enough defines for things
      such as 'perf trace' system call argument beautifiers.
      
      For instance, the 'clone' syscall argument 'flag' needs to use
      CLONE_NEWCGROUP, and that is not available in RHEL7.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-81sln0ng4a2lcxrth14vcov4@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      450c86c9
    • Arnaldo Carvalho de Melo's avatar
      perf trace: Allow specifying names to syscall arguments formatters · c51bdfec
      Arnaldo Carvalho de Melo authored
      For tracepointless syscalls, like clone, otherwise get them from the
      tracepoint's /format file.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-ml5qvv1w5k96ghwhxpzzsmm3@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      c51bdfec
    • Arnaldo Carvalho de Melo's avatar
      perf trace: Allow specifying number of syscall args for tracepointless syscalls · 332337da
      Arnaldo Carvalho de Melo authored
      When we don't have syscalls:sys_{enter,exit}_NAME, we had to resort to
      dumping all the 6 syscall arguments, fix it by providing that info for
      such syscalls, like 'clone'.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-dfq1jtrxj8dqvqoeqqpr3slu@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      332337da
    • Arnaldo Carvalho de Melo's avatar
      perf trace: Ditch __syscall__arg_val() variant, not needed anymore · 325f5091
      Arnaldo Carvalho de Melo authored
      All callers now can use syscall__arg_val(arg, idx), be it to iterate
      thru the syscall arguments while taking into account alignment, or to
      get values for other arguments that affect how the current argument
      should be formatted (think of fcntl's 'cmd' and 'arg' arguments).
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-wm5b156d8kro1r4y3b33eyta@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      325f5091
    • Arnaldo Carvalho de Melo's avatar
      perf trace: Use the syscall_fmt formatters without a tracepoint · d032d79e
      Arnaldo Carvalho de Melo authored
      Previously we only used the syscall_fmt when we had sc->tp_format set,
      i.e. when we found the (enter, exit) pair in tracefs/events/syscalls/.
      
      But we really only need to use what is in sc->arg_fmt to apply the arg
      beautifiers to the syscall argument values, so do it.
      
      With this we will be able to provide formatters to the "clone" syscall,
      which doesn't have entries in tracefs/events/syscalls/.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-y41nl41jrayjo5ucnde2peix@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      d032d79e
    • Arnaldo Carvalho de Melo's avatar
      perf trace: Allow allocating sc->arg_fmt even without the syscall tracepoint · 5e58fcfa
      Arnaldo Carvalho de Melo authored
      At least "clone" doesn't have (enter, exit) entries tracefs/events/syscalls/,
      but we can provide a syscall_fmt and use it instead, as will be done for
      "clone" in the next cset.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-o12kejgcxddyovn2hlg4gbim@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      5e58fcfa
    • Arnaldo Carvalho de Melo's avatar
      perf trace beauty mmap: Ignore 'fd' and 'offset' args for MAP_ANONYMOUS · d57da8c9
      Arnaldo Carvalho de Melo authored
      Just suppress them, not used by the kernel.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-atpt07y2x9a8ttlwja94ow3j@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      d57da8c9
    • Arnaldo Carvalho de Melo's avatar
      perf trace: Add missing ' = ' in the default formatting of syscall returns · 6f8fe61e
      Arnaldo Carvalho de Melo authored
      We lost it recently, put it back.
      
      Before:
      
        789.499 ( 0.001 ms): libvirtd/1175 lseek(fd: 22, whence: CUR) 4328
      
      After:
      
        789.499 ( 0.001 ms): libvirtd/1175 lseek(fd: 22, whence: CUR) = 4328
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: 1f63139c ("perf trace beauty: Simplify syscall return formatting")
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      6f8fe61e
    • Kan Liang's avatar
      perf intel-pt: Always set no branch for dummy event · 91a8c5b8
      Kan Liang authored
      An earlier kernel patch allowed enabling PT and LBR at the same time on
      Goldmont.
      
      commit ccbebba4 ("perf/x86/intel/pt: Bypass PT vs. LBR exclusivity
      if the core supports it")
      
      However, users still cannot use Intel PT and LBRs simultaneously.  $
      sudo perf record -e cycles,intel_pt//u -b  -- sleep 1 Error: PMU
      Hardware doesn't support sampling/overflow-interrupts.
      
      PT implicitly adds dummy event in perf tool. dummy event is software
      event which doesn't support LBR.
      
      Always setting no branch for dummy event in Intel PT.
      Signed-off-by: default avatarKan Liang <kan.liang@intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20170630141656.1626-2-kan.liang@intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      91a8c5b8
    • Kan Liang's avatar
      perf intel-pt: Set no_aux_samples for the tracking event · 69d8bd8a
      Kan Liang authored
      The reason of introducing the tracking event (a dummy software event) is
      to collect side-band information. Additional sampling is wasteful.
      no_aux_samples should be set for tracking event.
      Signed-off-by: default avatarKan Liang <kan.liang@intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20170630141656.1626-1-kan.liang@intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      69d8bd8a
    • Ingo Molnar's avatar
      Merge tag 'perf-core-for-mingo-4.13-20170718' of... · 510457ec
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo-4.13-20170718' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      User visible changes:
      
      - Initial support for namespaces, using setns to access files in
        namespaces, grabbing their build-ids, etc. We still need to work
        more to deal with namespaces that vanish before we can get the
        needed data to do analysis, but this should be as good as what is
        in bcc now (Krister Johansen)
      
      - Add header record types to pipe-mode, now this command:
      
        $ perf record -o - -e cycles sleep 1 | perf report --stdio --header
      
        Will show the same as in non-pipe mode, i.e. involving a perf.data
        file (David Carrillo-Cisneros)
      
      - Implement a visual marker for fused x86 instructions in the annotate
        TUI browser, available now in 'perf report', more work needed to have
        it available as well in 'perf top' (Jin Yao)
      
        Further explanation from one of Jin's patches:
      
             │   ┌──cmpl   $0x0,argp_program_version_hook
       81.93 │   ├──je     20
             │   │  lock   cmpxchg %esi,0x38a9a4(%rip)
             │   │↓ jne    29
             │   │↓ jmp    43
       11.47 │20:└─→cmpxch %esi,0x38a999(%rip)
      
        That means the cmpl+je is a fused instruction pair and they should be
        considered together.
      
      - Record the branch type and then show statistics and info about
        in callchain entries (Jin Yao)
      
        Example from one of Jin's patches:
      
        # perf record -g -j any,save_type
        # perf report --branch-history --stdio --no-children
      
        38.50%  div.c:45                [.] main                    div
                |
                ---main div.c:42 (RET CROSS_2M cycles:2)
                   compute_flag div.c:28 (cycles:2)
                   compute_flag div.c:27 (RET CROSS_2M cycles:1)
                   rand rand.c:28 (cycles:1)
                   rand rand.c:28 (RET CROSS_2M cycles:1)
                   __random random.c:298 (cycles:1)
                   __random random.c:297 (COND_BWD CROSS_2M cycles:1)
                   __random random.c:295 (cycles:1)
                   __random random.c:295 (COND_BWD CROSS_2M cycles:1)
                   __random random.c:295 (cycles:1)
                   __random random.c:295 (RET CROSS_2M cycles:9)
      
      - Beautify the fcntl syscall, which is an interesting one in the sense
        that infrastructure had to be put in place to change the formatters of
        some arguments according to the value in a previous one, i.e. cmd
        dictates how arg and the syscall return will be formatted.
        (Arnaldo Carvalho de Melo
      
      Infrastructure changes:
      
      - 'perf test attr' fixes (Jiri Olsa)
      
      Vendor events changes:
      
      - Add POWER9 PMU events Sukadev (Bhattiprolu)
      
      - Support additional POWER8+ PVR in PMU mapfile (Shriya)
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      510457ec
    • Alexander Shishkin's avatar
      perf/core: Fix scheduling regression of pinned groups · 3bda69c1
      Alexander Shishkin authored
      Vince Weaver reported:
      
      > I was tracking down some regressions in my perf_event_test testsuite.
      > Some of the tests broke in the 4.11-rc1 timeframe.
      >
      > I've bisected one of them, this report is about
      >	tests/overflow/simul_oneshot_group_overflow
      > This test creates an event group containing two sampling events, set
      > to overflow to a signal handler (which disables and then refreshes the
      > event).
      >
      > On a good kernel you get the following:
      > 	Event perf::instructions with period 1000000
      > 	Event perf::instructions with period 2000000
      > 		fd 3 overflows: 946 (perf::instructions/1000000)
      > 		fd 4 overflows: 473 (perf::instructions/2000000)
      > 	Ending counts:
      > 		Count 0: 946379875
      > 		Count 1: 946365218
      >
      > With the broken kernels you get:
      > 	Event perf::instructions with period 1000000
      > 	Event perf::instructions with period 2000000
      > 		fd 3 overflows: 938 (perf::instructions/1000000)
      > 		fd 4 overflows: 318 (perf::instructions/2000000)
      > 	Ending counts:
      > 		Count 0: 946373080
      > 		Count 1: 653373058
      
      The root cause of the bug is that the following commit:
      
        487f05e1 ("perf/core: Optimize event rescheduling on active contexts")
      
      erronously assumed that event's 'pinned' setting determines whether the
      event belongs to a pinned group or not, but in fact, it's the group
      leader's pinned state that matters.
      
      This was discovered by Vince in the test case described above, where two instruction
      counters are grouped, the group leader is pinned, but the other event is not;
      in the regressed case the counters were off by 33% (the difference between events'
      periods), but should be the same within the error margin.
      
      Fix the problem by looking at the group leader's pinning.
      Reported-by: default avatarVince Weaver <vincent.weaver@maine.edu>
      Tested-by: default avatarVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Fixes: 487f05e1 ("perf/core: Optimize event rescheduling on active contexts")
      Link: http://lkml.kernel.org/r/87lgnmvw7h.fsf@ashishki-desk.ger.corp.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      3bda69c1
  2. 19 Jul, 2017 27 commits