1. 08 Aug, 2018 19 commits
  2. 03 Aug, 2018 1 commit
    • perf auxtrace: Support for perf report -D for s390 · b96e6615
      Thomas Richter authored
      Add initial support for s390 auxiliary traces using the CPU-Measurement
      Sampling Facility.
      
      Support and ignore PERF_RECORD_AUXTRACE_INFO records in the perf data
      file. Later patches will show the contents of the auxiliary traces.

      Set up the auxtrace queues and data structures for s390.  A raw dump of
      the perf.data file no longer shows an error when an auxtrace event is
      encountered.
      
      Output before:
      
        [root@s35lp76 perf]# ./perf report -D -i perf.data.auxtrace
        0x128 [0x10]: failed to process type: 70
        Error:
        failed to process sample
      
        0x128 [0x10]: event: 70
        .
        . ... raw event: size 16 bytes
        .  0000:  00 00 00 46 00 00 00 10 00 00 00 00 00 00 00 00  ...F............
      
        0x128 [0x10]: PERF_RECORD_AUXTRACE_INFO type: 0
        [root@s35lp76 perf]#
      
      Output after:
      
         # ./perf report -D -i perf.data.auxtrace |fgrep PERF_RECORD_AUXTRACE
        0 0 0x128 [0x10]: PERF_RECORD_AUXTRACE_INFO type: 5
        0 0 0x25a66 [0x30]: PERF_RECORD_AUXTRACE size: 0x40000
      	   offset: 0  ref: 0  idx: 4  tid: -1  cpu: 4
        ....
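
      The 16-byte raw event in the "Output before" dump can be decoded by hand.
      A minimal sketch, assuming the big-endian byte order of s390 and the
      perf_event_header layout (u32 type, u16 misc, u16 size) followed by a u32
      auxtrace type; the helper name decode_auxtrace_info is made up for this
      illustration:

```python
import struct

# Bytes copied from the "raw event: size 16 bytes" line of the dump.
RAW = bytes.fromhex("00000046000000100000000000000000")

PERF_RECORD_AUXTRACE_INFO = 70  # record type number reported in the dump


def decode_auxtrace_info(raw, byteorder=">"):
    """Return (record type, misc, size, auxtrace type) for one raw record.

    Assumption: header is u32 type, u16 misc, u16 size, and the payload
    starts with a u32 auxtrace type. byteorder ">" means big-endian.
    """
    rec_type, misc, size = struct.unpack_from(byteorder + "IHH", raw, 0)
    (aux_type,) = struct.unpack_from(byteorder + "I", raw, 8)
    return rec_type, misc, size, aux_type


rec_type, misc, size, aux_type = decode_auxtrace_info(RAW)
print(rec_type == PERF_RECORD_AUXTRACE_INFO, size, aux_type)
```

      The decoded record type (70) and size (16) match the "event: 70" and
      "[0x10]" fields printed by perf report -D.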
      
      Additional notes about the underlying hardware and software
      implementation, provided by Hendrik Brueckner (see Link: below).
      
      =============================================================================
      
      The CPU-Measurement Facility (CPU-MF) provides a set of functions to obtain
      performance information on the mainframe.  It was introduced years ago
      with System z10 for the z/Architecture, that is, 64-bit.
      For Linux, there are two facilities of interest, counter facility and sampling
      facility.  The counter facility provides hardware counters for instructions,
      cycles, crypto-activities, and many more.
      
      The sampling facility is a hardware sampler that when started will write
      samples at a particular interval into a sampling buffer.  At some point,
      for example, if a sample block is full, it generates an interrupt to collect
      samples (while the sampler continues to run).
      
      A few years ago, I started to provide a perf PMU to use the counter
      and sampling facilities.  Recently, the device driver was updated to also
      "export" the sampling buffer into the AUX area.  Thomas has now completed the
      related perf work to interpret and process this AUX data.
      
      If people are more interested in the sampling facility, they can have a
      look into:
      
      - The Load-Program-Parameter and the CPU-Measurement Facilities, SA23-2260-05
        http://www-01.ibm.com/support/docview.wss?uid=isg26fcd1cc32246f4c8852574ce0044734a
      
      and to learn how to use it for Linux on Z, have a look at chapter 54,
      "Using the CPU-measurement facilities" in the:
      
      - Device Drivers, Features, and Commands, SC33-8411-34
        http://public.dhe.ibm.com/software/dw/linux390/docu/l416dd34.pdf
      
      =============================================================================
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
      Link: http://lkml.kernel.org/r/20180803100758.GA28475@linux.ibm.com
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Link: http://lkml.kernel.org/r/20180802074622.13641-2-tmricht@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 02 Aug, 2018 8 commits
    • perf trace: Use perf_evsel__sc_tp_{uint,ptr} for "id"/"args" handling syscalls:* events · f3acd886
      Arnaldo Carvalho de Melo authored
      Now it looks just about the same as for the trace__sys_{enter,exit}.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-y59may7zx1eccnp4m3qm4u0b@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Setup struct syscall_tp for syscalls:sys_{enter,exit}_NAME events · d32855fa
      Arnaldo Carvalho de Melo authored
      Mapping "__syscall_nr" to "id" and setting up "args" from the offset of
      "__syscall_nr" + sizeof(u64), as the payload for syscalls:* is the same
      as for raw_syscalls:*, just the fields have different names.
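
      The offset arithmetic described above can be sketched as follows. This
      is only an illustration, not the perf code: the helper names, the
      little-endian byte order, the 8-byte common-field prefix and the
      syscall number 35 are assumptions chosen to build a synthetic payload.

```python
import struct

U64_SIZE = 8  # sizeof(u64)


def make_field(offset, fmt):
    """Return an extractor that reads one value at a fixed offset."""
    def extract(raw):
        return struct.unpack_from(fmt, raw, offset)[0]
    return extract


def setup_syscall_tp(syscall_nr_offset, nr_args):
    """Build 'id' and 'args' extractors from the offset of __syscall_nr.

    'id' maps to __syscall_nr; 'args' starts sizeof(u64) bytes after it.
    """
    args_offset = syscall_nr_offset + U64_SIZE

    def extract_args(raw):
        return list(struct.unpack_from("<%dQ" % nr_args, raw, args_offset))

    return {"id": make_field(syscall_nr_offset, "<i"), "args": extract_args}


# Synthetic payload (assumed layout): 8 bytes of common fields, a 4-byte
# __syscall_nr padded to 8 bytes, then one 8-byte argument.
payload = struct.pack("<8xi4xQ", 35, 0x7FFC93D5ECC0)

tp = setup_syscall_tp(syscall_nr_offset=8, nr_args=1)
print(tp["id"](payload), [hex(a) for a in tp["args"](payload)])
```

      Looking the offsets up once and reusing the extractors is what avoids a
      field lookup by name for every event.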
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-ogeenrpviwcpwl3oy1l55f3m@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Allow setting up a syscall_tp struct without a format_field · aa823f58
      Arnaldo Carvalho de Melo authored
      To avoid having to ask libtraceevent to find a field by name when
      handling each tracepoint event, we set up a struct syscall_tp with
      a tp_field struct having an extractor function + the offset for the
      "id", "args" and "ret" raw_syscalls:sys_{enter,exit} tracepoints.

      Now that we want to do the same with syscalls:sys_{enter,exit}_NAME
      individual syscall tracepoints, where we have "id" as "__syscall_nr" and
      "args" as the actual series of per syscall parameters, we need more
      flexibility from the routines that set up these pre-looked up syscall
      tracepoint arg fields.
      
      The next cset will use it.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-v59q5e0jrlzkpl9a1c7t81ni@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Rename some syscall_tp methods to raw_syscall · 63f11c80
      Arnaldo Carvalho de Melo authored
      Because raw_syscalls have the field for the syscall number as 'id' while
      the syscalls:sys_{enter,exit}_NAME have it as __syscall_nr...

      Since we want to support both, to be able to enable just a
      syscalls:sys_{enter,exit}_NAME instead of asking for
      raw_syscalls:sys_{enter,exit} plus filters, make the method names for
      each kind of tracepoint more explicit.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-4rixbfzco6tsry0w9ghx3ktb@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Use beautifiers on syscalls:sys_enter_ handlers · a98392bb
      Arnaldo Carvalho de Melo authored
      We were using the beautifiers only when processing the
      raw_syscalls:sys_enter events, but we can as well use them for the
      syscalls:sys_enter_NAME events, as the layout is the same.
      
      Some more tweaking is needed as we're processing them straight away,
      i.e. there is no buffering in the sys_enter_NAME event to wait for
      things like vfs_getname to provide pointer contents and then flushing
      at sys_exit_NAME, so we need to state in the syscall_arg that this
      is unbuffered, just print the pointer values, beautifying just
      non-pointer syscall args.
      
      This just shows an alternative way of processing tracepoints, that we
      will end up using when creating "tracepoint" payloads that already copy
      pointer contents (or chunks of it, i.e. not the whole filename, but just
      the end of it, not all the bf for a read/write, but just the start,
      etc), directly in the kernel using eBPF.
      
      E.g.:
      
        # perf trace -e syscalls:*enter*sleep,*sleep sleep 1
           0.303 (         ): syscalls:sys_enter_nanosleep:rqtp: 0x7ffc93d5ecc0
           0.305 (1000.229 ms): sleep/8746 nanosleep(rqtp: 0x7ffc93d5ecc0) = 0
        # perf trace -e syscalls:*_*sleep,*sleep sleep 1
           0.288 (         ): syscalls:sys_enter_nanosleep:rqtp: 0x7ffecde87e40
           0.289 (         ): sleep/8748 nanosleep(rqtp: 0x7ffecde87e40) ...
        1000.479 (         ): syscalls:sys_exit_nanosleep:0x0
           0.289 (1000.208 ms): sleep/8748  ... [continued]: nanosleep()) = 0
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-jehyd2zwhw00z3p7v7mg9632@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Associate vfs_getname()'ed pathname with fd returned from 'openat' · 6a648b53
      Arnaldo Carvalho de Melo authored
      When the vfs_getname() wannabe tracepoint is in place:
      
        # perf probe -l
          probe:vfs_getname    (on getname_flags:73@acme/git/linux/fs/namei.c with pathname)
        #
      
      'perf trace' will use it to get the pathname when it is copied from
      userspace to the kernel, right after syscalls:sys_enter_open, as
      captured by 'probe:vfs_getname', stash it somewhere and then, at
      syscalls:sys_exit_open time, if the 'open' return is not -1, i.e. a
      successful open syscall, associate that pathname with this return, i.e.
      the fd.
      
      We were not doing this for the 'openat' syscall, which would cause 'perf
      trace' to fall back to using /proc to get the fd, change it so that we
      use what we got from probe:vfs_getname, reducing the 'openat'
      beautification process cost, ditching the syscalls performed to read
      procfs state and avoiding some possible races in the process.
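
      The stash-then-associate flow can be sketched as a tiny event handler.
      This is a simplified model, not the 'perf trace' implementation: the
      class and method names are hypothetical, state is keyed by thread id,
      and a failed open is modelled as any negative return value.

```python
class OpenTracker:
    """Associates pathnames seen by probe:vfs_getname with returned fds."""

    def __init__(self):
        self.pending = {}    # tid -> pathname stashed between enter and exit
        self.fd_paths = {}   # (tid, fd) -> pathname

    def on_vfs_getname(self, tid, pathname):
        # Fires after sys_enter_open/openat, once the string is in the kernel.
        self.pending[tid] = pathname

    def on_sys_exit(self, tid, syscall, ret):
        pathname = self.pending.pop(tid, None)
        # Only a successful open/openat yields an fd to associate.
        if syscall in ("open", "openat") and pathname is not None and ret >= 0:
            self.fd_paths[(tid, ret)] = pathname

    def path_of(self, tid, fd):
        return self.fd_paths.get((tid, fd))


tracker = OpenTracker()
tracker.on_vfs_getname(8746, "/etc/ld.so.cache")
tracker.on_sys_exit(8746, "openat", 3)        # success: fd 3 gets the path
tracker.on_vfs_getname(8746, "/nonexistent")
tracker.on_sys_exit(8746, "openat", -2)       # failure: nothing recorded
print(tracker.path_of(8746, 3))
```

      With the pathname already stashed, later syscalls on that fd can be
      beautified without reading procfs.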
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-xnp44ao3bkb6ejeczxfnjwsh@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Merge tag 'perf-core-for-mingo-4.19-20180801' of... · ec2cb7a5
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo-4.19-20180801' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      perf trace: (Arnaldo Carvalho de Melo)
      
      - Do not require --no-syscalls to suppress strace like output, i.e.
      
           # perf trace -e sched:*switch
      
        will show just sched:sched_switch events, not strace-like formatted
        syscall events, use --syscalls to get the previous behaviour.
      
        If instead:
      
           # perf trace
      
        is used, i.e. no events specified, then --syscalls is implied and
        system wide strace like formatting will be applied to all syscalls.
      
        The behaviour when just a syscall subset is used with '-e' is unchanged:
      
           # perf trace -e *sleep,sched:*switch
      
        will work as before: just the 'nanosleep' syscall will be strace-like
        formatted plus the sched:sched_switch tracepoint event, system wide.
      
      - Allow string table generators to use a default header dir, allowing
        use of them without parameters to see the table it generates on
        stdout, e.g.:
      
          $ tools/perf/trace/beauty/kvm_ioctl.sh
          static const char *kvm_ioctl_cmds[] = {
              [0x00] = "GET_API_VERSION",
              [0x01] = "CREATE_VM",
              [0x02] = "GET_MSR_INDEX_LIST",
              [0x03] = "CHECK_EXTENSION",
      <BIG SNIP>
              [0xe0] = "CREATE_DEVICE",
              [0xe1] = "SET_DEVICE_ATTR",
              [0xe2] = "GET_DEVICE_ATTR",
              [0xe3] = "HAS_DEVICE_ATTR",
          };
          $
      
        See 'ls tools/perf/trace/beauty/*.sh' to see the available string
        table generators.
      
      - Add a generator for IPPROTO_ socket's protocol constants.
      
      perf record: (Kan Liang)
      
      - Fix error out while applying initial delay and using LBR, due to
        the use of a PERF_TYPE_SOFTWARE/PERF_COUNT_SW_DUMMY event to track
        PERF_RECORD_MMAP events while waiting for the initial delay. Such
        events fail when configured asking PERF_SAMPLE_BRANCH_STACK in
        perf_event_attr.sample_type.
      
      perf c2c: (Jiri Olsa)
      
      - Fix report crash for empty browser, when processing a perf.data file
        without events of interest, either because not asked for in
        'perf record' or because the workload didn't trigger such events.
      
      perf list: (Michael Petlan)
      
      - Align metric group description format with PMU event description.
      
      perf tests: (Sandipan Das)
      
      - Fix indexing when invoking subtests, which caused BPF tests to
        get results for the next test in the list, with the last one
        reporting a failure.
      
      eBPF:
      
      - Fix installation directory for header files included from eBPF proggies,
        avoiding clashing with relative paths used to build other software projects
        such as glibc. (Thomas Richter)
      
      - Show better message when failing to load an object. (Arnaldo Carvalho de Melo)
      
      General: (Christophe Leroy)
      
      - Allow overriding MAX_NR_CPUS at compile time, to make the tooling
        usable in systems with less memory, in time this has to be changed
        to properly allocate based on _NPROCESSORS_ONLN.
      
      Architecture specific:
      
      - Update arm64's ThunderX2 implementation defined pmu core events (Ganapatrao Kulkarni)
      
      - Fix complex event name parsing in 'perf test' for PowerPC, where the 'umask' event
        modifier isn't present. (Sandipan Das)
      
      CoreSight ARM hardware tracing: (Leo Yan)
      
      - Fix start tracing packet handling.
      
      - Support dummy address value for CS_ETM_TRACE_ON packet.
      
      - Generate branch sample when receiving a CS_ETM_TRACE_ON packet.
      
      - Generate branch sample for CS_ETM_TRACE_ON packet.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • Ingo Molnar's avatar
      16e0e6a8
  4. 01 Aug, 2018 3 commits
    • perf trace: Do not require --no-syscalls to suppress strace like output · b912885a
      Arnaldo Carvalho de Melo authored
      So far the --syscalls option was the default, requiring an explicit
      --no-syscalls when wanting to process just some other event. Invert that
      and assume it only when no other event is specified, while still allowing
      it to be enabled explicitly to see all syscalls together with some
      other event.

      E.g.:
      
      The existing default is maintained for a single workload:
      
        # perf trace sleep 1
      <SNIP>
           0.264 ( 0.003 ms): sleep/12762 mmap(len: 113045344, prot: READ, flags: PRIVATE, fd: 3) = 0x7f62cbf04000
           0.271 ( 0.001 ms): sleep/12762 close(fd: 3) = 0
           0.295 (1000.130 ms): sleep/12762 nanosleep(rqtp: 0x7ffd15194fd0) = 0
        1000.469 ( 0.006 ms): sleep/12762 close(fd: 1) = 0
        1000.480 ( 0.004 ms): sleep/12762 close(fd: 2) = 0
        1000.502 (         ): sleep/12762 exit_group()
        #
      
      For a pid:
      
        # pidof ssh
        7826 3961 3226 2628 2493
        # perf trace -p 3961
               ? (         ):  ... [continued]: select()) = 1
           0.023 ( 0.005 ms): clock_gettime(which_clock: BOOTTIME, tp: 0x7ffcc8fce870               ) = 0
           0.036 ( 0.009 ms): read(fd: 5</dev/pts/7>, buf: 0x7ffcc8fca7b0, count: 16384             ) = 3
           0.060 ( 0.004 ms): getpid(                                                               ) = 3961 (ssh)
           0.079 ( 0.004 ms): clock_gettime(which_clock: BOOTTIME, tp: 0x7ffcc8fce8e0               ) = 0
           0.088 ( 0.003 ms): clock_gettime(which_clock: BOOTTIME, tp: 0x7ffcc8fce7c0               ) = 0
      <SNIP>
      
      For system wide, threads, cgroups, user, etc when no event is specified,
      the existing behaviour is maintained, i.e. --syscalls is selected.
      
      When some event is specified, then --no-syscalls doesn't need to be
      specified:
      
        # perf trace -e tcp:tcp_probe ssh localhost
           0.000 tcp:tcp_probe:src=[::1]:22 dest=[::1]:39074 mark=0 length=53 snd_nxt=0xb67ce8f7 snd_una=0xb67ce8f7 snd_cwnd=10 ssthresh=2147483647 snd_wnd=43776 srtt=18 rcv_wnd=43690
           0.010 tcp:tcp_probe:src=[::1]:39074 dest=[::1]:22 mark=0 length=32 snd_nxt=0xa8f9ef38 snd_una=0xa8f9ef23 snd_cwnd=10 ssthresh=2147483647 snd_wnd=43690 srtt=31 rcv_wnd=43776
           4.525 tcp:tcp_probe:src=[::1]:22 dest=[::1]:39074 mark=0 length=1240 snd_nxt=0xb67ce90c snd_una=0xb67ce90c snd_cwnd=10 ssthresh=2147483647 snd_wnd=43776 srtt=18 rcv_wnd=43776
           7.242 tcp:tcp_probe:src=[::1]:22 dest=[::1]:39074 mark=0 length=80 snd_nxt=0xb67ced44 snd_una=0xb67ce90c snd_cwnd=10 ssthresh=2147483647 snd_wnd=43776 srtt=18 rcv_wnd=174720
        The authenticity of host 'localhost (::1)' can't be established.
        ECDSA key fingerprint is SHA256:TKZS58923458203490asekfjaklskljmkjfgPMBfHzY.
        ECDSA key fingerprint is MD5:d8:29:54:40:71:fa:b8:44:89:52:64:8a:35:42:d0:e8.
        Are you sure you want to continue connecting (yes/no)?
      ^C
        #
      
      To get the previous behaviour just use --syscalls and get all syscalls formatted
      strace like + the specified extra events:
      
        # trace -e sched:*switch --syscalls sleep 1
        <SNIP>
           0.160 ( 0.003 ms): sleep/12877 mprotect(start: 0x7fdfe2361000, len: 4096, prot: READ) = 0
           0.164 ( 0.009 ms): sleep/12877 munmap(addr: 0x7fdfe2345000, len: 113155) = 0
           0.211 ( 0.001 ms): sleep/12877 brk() = 0x55d3ce68e000
           0.212 ( 0.002 ms): sleep/12877 brk(brk: 0x55d3ce6af000) = 0x55d3ce6af000
           0.215 ( 0.001 ms): sleep/12877 brk() = 0x55d3ce6af000
           0.219 ( 0.004 ms): sleep/12877 open(filename: 0xe1f07c00, flags: CLOEXEC) = 3
           0.225 ( 0.001 ms): sleep/12877 fstat(fd: 3, statbuf: 0x7fdfe2138aa0) = 0
           0.227 ( 0.003 ms): sleep/12877 mmap(len: 113045344, prot: READ, flags: PRIVATE, fd: 3) = 0x7fdfdb1b8000
           0.234 ( 0.001 ms): sleep/12877 close(fd: 3) = 0
           0.257 (         ): sleep/12877 nanosleep(rqtp: 0x7fffb36b6020) ...
           0.260 (         ): sched:sched_switch:prev_comm=sleep prev_pid=12877 prev_prio=120 prev_state=D ==> next_comm=swapper/3 next_pid=0 next_prio=120
           0.257 (1000.134 ms): sleep/12877  ... [continued]: nanosleep()) = 0
        1000.428 ( 0.006 ms): sleep/12877 close(fd: 1) = 0
        1000.440 ( 0.004 ms): sleep/12877 close(fd: 2) = 0
        1000.461 (         ): sleep/12877 exit_group()
        #
      
      When specifying just some syscalls, the behaviour doesn't change, i.e.:
      
        # trace -e nanosleep -e sched:*switch sleep 1
           0.000 (         ): sleep/14974 nanosleep(rqtp: 0x7ffc344ba9c0                                        ) ...
           0.007 (         ): sched:sched_switch:prev_comm=sleep prev_pid=14974 prev_prio=120 prev_state=D ==> next_comm=swapper/2 next_pid=0 next_prio=120
           0.000 (1000.139 ms): sleep/14974  ... [continued]: nanosleep()) = 0
        #
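
      The resulting rule can be summarised in a few lines. A minimal sketch of
      the decision described in this message, not the option-parsing code in
      'perf trace'; the function names are hypothetical, and treating any '-e'
      argument without a ':' as a syscall name or glob is an assumption drawn
      from the examples above.

```python
def wants_all_syscalls(events, syscalls_opt=None):
    """Decide whether every syscall gets strace-like output.

    events: list of '-e' arguments given on the command line.
    syscalls_opt: True for --syscalls, False for --no-syscalls,
                  None when neither option was given.
    """
    if syscalls_opt is not None:
        return syscalls_opt       # an explicit option always wins
    return len(events) == 0       # implied only when no event was specified


def syscall_subset(events):
    """Assumed split: entries without ':' name syscalls, the rest tracepoints."""
    return [e for e in events if ":" not in e]


print(wants_all_syscalls([]))                                  # perf trace
print(wants_all_syscalls(["tcp:tcp_probe"]))                   # -e tcp:tcp_probe
print(wants_all_syscalls(["sched:*switch"], syscalls_opt=True))
print(syscall_subset(["nanosleep", "sched:*switch"]))
```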
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-om2fulll97ytnxv40ler8jkf@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf bpf: Include uapi/linux/bpf.h from the 'perf trace' script's bpf.h · 822c2621
      Arnaldo Carvalho de Melo authored
      The next example scripts need the definition for the BPF functions, i.e.
      things like BPF_FUNC_probe_read, and in time will require lots of other
      definitions found in uapi/linux/bpf.h, so include it from the bpf.h file
      included from the eBPF scripts built with clang via '-e bpf_script.c'
      like in this example:
      
        $ tail -8 tools/perf/examples/bpf/5sec.c
        #include <bpf.h>
      
        int probe(hrtimer_nanosleep, rqtp->tv_sec)(void *ctx, int err, long sec)
        {
      	return sec == 5;
        }
      
        license(GPL);
        $
      
      That 'bpf.h' include in the 5sec.c eBPF example will come from a set of
      header files crafted for building eBPF objects, that in an end-user
      system will come from:
      
        /usr/lib/perf/include/bpf/bpf.h
      
      And will include <uapi/linux/bpf.h> either from the place where the
      kernel was built, or from a kernel-devel rpm package like:
      
        -working-directory /lib/modules/4.17.9-100.fc27.x86_64/build
      
      That is set up by tools/perf/util/llvm-utils.c, and can be overridden
      by setting the 'kbuild-dir' variable in the "llvm" ~/.perfconfig file,
      like:
      
        # cat ~/.perfconfig
        [llvm]
             kbuild-dir = /home/foo/git/build/linux
      
      This usually doesn't need any change, just documenting here my findings
      while working with this code.
      
      In the future we may want to instead just use what is in
      /usr/include/linux/bpf.h, which comes from the UAPI provided by the
      kernel sources. For now, to avoid getting the kernel's non-UAPI
      "linux/bpf.h" file, which will cause clang to fail and is not what we
      want anyway (no BPF function definitions, etc), do it explicitly by
      asking for "uapi/linux/bpf.h".
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-zd8zeyhr2sappevojdem9xxt@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Allow overriding MAX_NR_CPUS at compile time · 21b8732e
      Christophe Leroy authored
      After a kernel update, the perf tool doesn't run anymore on my 32MB RAM
      powerpc board, but still runs on a 128MB RAM board:
      
        ~# strace perf
        execve("/usr/sbin/perf", ["perf"], [/* 12 vars */]) = -1 ENOMEM (Cannot allocate memory)
        --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} ---
        +++ killed by SIGSEGV +++
        Segmentation fault
      
      objdump -x shows that .bss section has a huge size of 24Mbytes:
      
       27 .bss          016baca8  101cebb8  101cebb8  001cd988  2**3
      
      The following objects in particular are quite big:
      
        10205f80 l     O .bss	00140000     runtime_cycles_stats
        10345f80 l     O .bss	00140000     runtime_stalled_cycles_front_stats
        10485f80 l     O .bss	00140000     runtime_stalled_cycles_back_stats
        105c5f80 l     O .bss	00140000     runtime_branches_stats
        10705f80 l     O .bss	00140000     runtime_cacherefs_stats
        10845f80 l     O .bss	00140000     runtime_l1_dcache_stats
        10985f80 l     O .bss	00140000     runtime_l1_icache_stats
        10ac5f80 l     O .bss	00140000     runtime_ll_cache_stats
        10c05f80 l     O .bss	00140000     runtime_itlb_cache_stats
        10d45f80 l     O .bss	00140000     runtime_dtlb_cache_stats
        10e85f80 l     O .bss	00140000     runtime_cycles_in_tx_stats
        10fc5f80 l     O .bss	00140000     runtime_transaction_stats
        11105f80 l     O .bss	00140000     runtime_elision_stats
        11245f80 l     O .bss	00140000     runtime_topdown_total_slots
        11385f80 l     O .bss	00140000     runtime_topdown_slots_retired
        114c5f80 l     O .bss	00140000     runtime_topdown_slots_issued
        11605f80 l     O .bss	00140000     runtime_topdown_fetch_bubbles
        11745f80 l     O .bss	00140000     runtime_topdown_recovery_bubbles
      
      This is due to commit 4d255766 ("perf: Bump max number of cpus
      to 1024"), because many tables are sized with MAX_NR_CPUS.
      
      This patch gives the opportunity to redefine MAX_NR_CPUS via
      
        $ make EXTRA_CFLAGS=-DMAX_NR_CPUS=1
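
      The numbers in the objdump output above can be checked with a little
      arithmetic. The sketch below uses only the sizes listed in this message;
      that each table shrinks linearly with MAX_NR_CPUS is an assumption.

```python
TABLE_BYTES = 0x00140000   # size of each runtime_*_stats object listed above
NR_TABLES = 18             # number of such objects in the listing
BSS_BYTES = 0x016BACA8     # total .bss size reported by objdump -x
MAX_NR_CPUS = 1024         # value set by commit 4d255766

tables_total = TABLE_BYTES * NR_TABLES
per_cpu_bytes = TABLE_BYTES // MAX_NR_CPUS

# Assumption: table size scales linearly with MAX_NR_CPUS.
tables_total_one_cpu = per_cpu_bytes * NR_TABLES

print("each table:      %d bytes" % TABLE_BYTES)
print("18 tables:       %d bytes" % tables_total)
print(".bss:            %d bytes" % BSS_BYTES)
print("share of .bss:   %.1f%%" % (100.0 * tables_total / BSS_BYTES))
print("per CPU, table:  %d bytes" % per_cpu_bytes)
print("MAX_NR_CPUS=1:   %d bytes for all 18 tables" % tables_total_one_cpu)
```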
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/20170922112043.8349468C57@po15668-vm-win7.idsi0.si.c-s.fr
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 31 Jul, 2018 9 commits
    • perf bpf: Show better message when failing to load an object · 739e2edc
      Arnaldo Carvalho de Melo authored
      Before:
      
        libbpf: license of tools/perf/examples/bpf/etcsnoop.c is GPL
        libbpf: section(6) version, size 4, link 0, flags 3, type=1
        libbpf: kernel version of tools/perf/examples/bpf/etcsnoop.c is 41200
        libbpf: section(7) .symtab, size 120, link 1, flags 0, type=2
        bpf: config program 'syscalls:sys_enter_openat'
        libbpf: load bpf program failed: Operation not permitted
        libbpf: failed to load program 'syscalls:sys_enter_openat'
        libbpf: failed to load object 'tools/perf/examples/bpf/etcsnoop.c'
        bpf: load objects failed
      
      After: (just the last line changes)
      
        bpf: load objects failed: err=-4009: (Incorrect kernel version)
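
      The "kernel version ... is 41200" line in the log above is easier to
      read once decoded. A minimal sketch, assuming the value is printed in
      hexadecimal and packed the way the kernel's KERNEL_VERSION(a, b, c)
      macro does it, (a << 16) + (b << 8) + c; the function names are made up
      for this example.

```python
def kernel_version(major, minor, patch):
    """Pack a version the way KERNEL_VERSION(a, b, c) does."""
    return (major << 16) + (minor << 8) + patch


def decode_kernel_version(code):
    """Unpack a packed version code into (major, minor, patch)."""
    return (code >> 16, (code >> 8) & 0xFF, code & 0xFF)


logged = int("41200", 16)   # assumption: the log prints the value in hex
print(decode_kernel_version(logged))
```

      Under these assumptions the object was built for kernel 4.18.0, which
      would explain an "Incorrect kernel version" error on a different
      running kernel.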
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-wi44iid0yjfht3lcvplc75fm@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf list: Unify metric group description format with PMU event description · 95f04328
      Michael Petlan authored
      PMU event descriptions use 7 spaces + '[' or 8 spaces as indentation.
      Metric groups used a tab + '['. This patch unifies it to the way PMU
      event descriptions are indented.
      
      BEFORE:
      
        $ perf list
        [...]
        Metric Groups:
      
        DSB:
          DSB_Coverage
      	  [Fraction of Uops delivered by the DSB (aka Decoded Icache; or Uop Cache)]
        [...]
      
      AFTER:
      
        $ perf list
        [...]
        Metric Groups:
      
        DSB:
          DSB_Coverage
               [Fraction of Uops delivered by the DSB (aka Decoded Icache; or Uop Cache)]
        [...]
      Signed-off-by: Michael Petlan <mpetlan@redhat.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Kim Phillips <kim.phillips@arm.com>
      LPU-Reference: 771439042.22924766.1532986504631.JavaMail.zimbra@redhat.com
      Link: https://lkml.kernel.org/n/tip-mlo850517m6u1rbjndvd1bwr@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf vendor events arm64: Update ThunderX2 implementation defined pmu core events · b9b77222
      Ganapatrao Kulkarni authored
      Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ganapatrao Kulkarni <gklkml16@gmail.com>
      Cc: Jan Glauber <jan.glauber@cavium.com>
      Cc: Jayachandran C <jnair@caviumnetworks.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Richter <robert.richter@cavium.com>
      Cc: Vadim Lomovtsev <vadim.lomovtsev@cavium.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: http://lkml.kernel.org/r/20180731100251.23575-1-ganapatrao.kulkarni@cavium.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf cs-etm: Generate branch sample for CS_ETM_TRACE_ON packet · 14a85b1e
      Leo Yan authored
      A CS_ETM_TRACE_ON packet itself indicates that there is a
      discontinuity in the trace. This patch adds a branch sample for a
      CS_ETM_TRACE_ON packet if it is inserted in the middle of CS_ETM_RANGE
      packets; as a result we get a hint about the trace discontinuity.
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Walker <robert.walker@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lkml.kernel.org/r/1531295145-596-7-git-send-email-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf cs-etm: Generate branch sample when receiving a CS_ETM_TRACE_ON packet · d603b4e9
      Leo Yan authored
      If a CS_ETM_TRACE_ON packet is inserted, we fail to generate a branch
      sample for the previous CS_ETM_RANGE packet.

      This patch generates a branch sample when receiving a CS_ETM_TRACE_ON
      packet, so that complete info is saved for the CS_ETM_RANGE packet
      just before the CS_ETM_TRACE_ON packet.
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Walker <robert.walker@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lkml.kernel.org/r/1531295145-596-6-git-send-email-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Leo Yan's avatar
      perf cs-etm: Support dummy address value for CS_ETM_TRACE_ON packet · 6035b680
      Leo Yan authored
      For a CS_ETM_TRACE_ON packet, the fields 'packet->start_addr' and
      'packet->end_addr' are equal to 0xdeadbeefdeadbeefUL, which is emitted
      in the decoder layer as a dummy value. This dummy value is meaningless
      in a branch sample when we use the 'perf script' command to check
      program flow.
      
      This patch prepares support for the CS_ETM_TRACE_ON packet in branch
      samples: it converts the dummy address value to zero to make the output
      more readable. This is accomplished by cs_etm__last_executed_instr() and
      cs_etm__first_executed_instr(); the latter is a new function
      introduced by this patch.
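      
      A minimal sketch of the conversion, assuming simplified types: the
      sketch_ names and the struct layout below are hypothetical, and the real
      helpers may adjust the addresses further. Only the dummy value
      0xdeadbeefdeadbeef and the mapping to zero come from the description
      above.

```c
#include <stdint.h>

/* Dummy value that, per the commit message, the decoder layer emits. */
#define SKETCH_DUMMY_ADDR 0xdeadbeefdeadbeefULL

/* Hypothetical, simplified packet types; names are assumptions. */
enum sketch_sample_type {
    SKETCH_RANGE,
    SKETCH_TRACE_ON,
};

struct sketch_packet {
    enum sketch_sample_type sample_type;
    uint64_t start_addr;
    uint64_t end_addr;
};

/* Report 0 instead of the dummy address for a trace-on packet. */
static uint64_t sketch_first_executed_instr(const struct sketch_packet *p)
{
    if (p->sample_type == SKETCH_TRACE_ON)
        return 0;

    return p->start_addr;
}

/*
 * Same idea for the end of the packet. This sketch returns end_addr
 * unchanged for range packets; any further adjustment is left out.
 */
static uint64_t sketch_last_executed_instr(const struct sketch_packet *p)
{
    if (p->sample_type == SKETCH_TRACE_ON)
        return 0;

    return p->end_addr;
}
```

      Keying the check on the packet type rather than on the dummy value keeps
      the helpers independent of the exact constant the decoder uses.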
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Walker <robert.walker@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lkml.kernel.org/r/1531295145-596-5-git-send-email-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Leo Yan's avatar
      perf cs-etm: Fix start tracing packet handling · 3eb3e07b
      Leo Yan authored
      Usually the start tracing packet is a CS_ETM_TRACE_ON packet, which is
      passed to cs_etm__flush(). cs_etm__flush() checks the condition
      'prev_packet->sample_type == CS_ETM_RANGE', but 'prev_packet' is
      allocated by zalloc(), so 'prev_packet->sample_type' is zero after
      initialization and the condition is false. As a result, cs_etm__flush()
      bails out directly without handling the start tracing packet.
      
      This patch introduces a new sample type, CS_ETM_EMPTY, which indicates
      that the packet is empty. cs_etm__flush() swaps packets when it finds
      that the previous packet is empty, so the start tracing packet is
      recorded in 'etmq->prev_packet'.
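      
      The behaviour described above can be sketched as follows. This is a
      simplified illustration, not the patch: the sketch_ names, the queue
      struct and the 'flushed_ranges' counter are hypothetical, calloc()
      stands in for zalloc(), and the empty type is assumed to have the value
      zero so that freshly zeroed memory reads as "empty".

```c
#include <stdlib.h>

enum sketch_sample_type {
    SKETCH_EMPTY = 0,   /* assumed zero: zeroed memory means "empty" */
    SKETCH_RANGE,
    SKETCH_TRACE_ON,
};

struct sketch_packet {
    enum sketch_sample_type sample_type;
};

struct sketch_queue {
    struct sketch_packet *prev_packet;
    struct sketch_packet *packet;
    int flushed_ranges; /* stand-in for emitting pending samples */
};

/* Allocate both packets zeroed. Returns 0 on success, -1 on failure. */
static int sketch_queue_init(struct sketch_queue *q)
{
    q->prev_packet = calloc(1, sizeof(*q->prev_packet));
    q->packet = calloc(1, sizeof(*q->packet));
    q->flushed_ranges = 0;

    if (!q->prev_packet || !q->packet) {
        free(q->prev_packet);
        free(q->packet);
        return -1;
    }
    return 0;
}

static void sketch_flush(struct sketch_queue *q)
{
    struct sketch_packet *tmp;

    /* Only a previous range packet has pending data to flush. */
    if (q->prev_packet->sample_type == SKETCH_RANGE)
        q->flushed_ranges++;

    /*
     * Swap even when the previous packet is empty, so that the start
     * tracing packet is remembered as the previous packet.
     */
    tmp = q->packet;
    q->packet = q->prev_packet;
    q->prev_packet = tmp;
}
```

      After sketch_queue_init(), setting 'packet' to a trace-on packet and
      calling sketch_flush() leaves that packet in 'prev_packet' without
      flushing anything.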
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Walker <robert.walker@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lkml.kernel.org/r/1531295145-596-4-git-send-email-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Thomas Richter's avatar
      perf build: Fix installation directory for eBPF · 83868bf7
      Thomas Richter authored
      The perf tool build and install are controlled via a Makefile. The
      'install' rule creates directories and copies files. Among them are
      header files installed in /usr/lib/include/perf/bpf/.
      
      However, all listed examples install their header files in
      
        /usr/lib/<tool-name>/...[/include]/header.h
      
      and not in
      
        /usr/lib/include/<tool-name>/.../header.h.
      
      Background information:
      
      Building the Fedora 28 glibc RPM on s390x and s390 fails on s390 (gcc
      -m31) as gcc is not able to find header files like stdbool.h.
      
      In the glibc.spec file, you can see that glibc is configured with
      "--with-headers". In this case, first -nostdinc is added to the CFLAGS
      and then further include paths are added via -isystem.  One of those
      paths should contain header files like stdbool.h.
      
      In order to get this path, gcc is invoked with:
      
      - on Fedora 28 (with 4.18 kernel):
      
        $ gcc -print-file-name=include
        /usr/lib/gcc/s390x-redhat-linux/8/include
        $ gcc -m31 -print-file-name=include
        /usr/lib/gcc/s390x-redhat-linux/8/../../../../lib/include
        => If perf is installed, this is: /usr/lib/include
        On my machine this directory contains only the directory "perf".
        If perf is not installed gcc returns: /usr/lib/gcc/s390x-redhat-linux/8/include
      
      - on Ubuntu 18.04 (with 4.15 kernel):
      
        $ gcc  -print-file-name=include
        /usr/lib/gcc/s390x-linux-gnu/7/include
        $ gcc -m31 -print-file-name=include
        /usr/lib/gcc/s390x-linux-gnu/7/include
        => gcc returns the correct path even if perf is installed.
      
      In each case, the introduction of the subdirectory /usr/lib/include
      leads to the regression that one cannot build the glibc RPM for s390
      anymore, as gcc cannot find headers like stdbool.h.
      
      To remedy this, install bpf.h to /usr/lib/perf/include/bpf/bpf.h
      
      Output before using the command 'perf test -Fv 40':
      
        echo '...[bpf-program-source]...' | /usr/bin/clang ... \
      		   -I/root/lib/include/perf/bpf ...
                                     ^^^^^^^^^^^^
      ...
        [root@p23lp27 perf]# perf test -F 40
        40: BPF filter                                            :
        40.1: Basic BPF filtering                                 : Ok
        40.2: BPF pinning                                         : Ok
        40.3: BPF prologue generation                             : Ok
        40.4: BPF relocation checker                              : Ok
        [root@p23lp27 perf]#
      
      Output after using command 'perf test -Fv 40':
      
        echo '...[bpf-program-source]...' | /usr/bin/clang ... \
      		 -I/root/lib/perf/include/bpf ...
                                   ^^^^^^^^^^^^
      ...
        [root@p23lp27 perf]# perf test -F 40
        40: BPF filter                                            :
        40.1: Basic BPF filtering                                 : Ok
        40.2: BPF pinning                                         : Ok
        40.3: BPF prologue generation                             : Ok
        40.4: BPF relocation checker                              : Ok
        [root@p23lp27 perf]#
      
      Committer testing:
      
      While the above 'perf test -F 40' (or 'perf test bpf') allows us
      to see that the correct path is now added via -I, to actually test this
      we had better use a bpf script that includes files in the changed
      directory.
      
      We have the files that now reside in /root/lib/perf/examples/bpf/ to do
      just that:
      
        # tail -8 /root/lib/perf/examples/bpf/5sec.c
        #include <bpf.h>
      
        int probe(hrtimer_nanosleep, rqtp->tv_sec)(void *ctx, int err, long sec)
        {
      	  return sec == 5;
        }
      
        license(GPL);
        # perf trace -e *sleep -e /root/lib/perf/examples/bpf/5sec.c sleep 4
             0.333 (4000.086 ms): sleep/9248 nanosleep(rqtp: 0x7ffc155f3300) = 0
        # perf trace -e *sleep -e /root/lib/perf/examples/bpf/5sec.c sleep 5
             0.287 (         ): sleep/9659 nanosleep(rqtp: 0x7ffeafe38200) ...
             0.290 (         ): perf_bpf_probe:hrtimer_nanosleep:(ffffffff9911efe0) tv_sec=5
             0.287 (5000.059 ms): sleep/9659  ... [continued]: nanosleep()) = 0
        # perf trace -e *sleep -e /root/lib/perf/examples/bpf/5sec.c sleep 6
             0.247 (5999.951 ms): sleep/10068 nanosleep(rqtp: 0x7fff2086d900) = 0
        # perf trace -e *sleep -e /root/lib/perf/examples/bpf/5sec.c sleep 5.987
             0.293 (         ): sleep/10489 nanosleep(rqtp: 0x7ffdd4fc10e0) ...
             0.296 (         ): perf_bpf_probe:hrtimer_nanosleep:(ffffffff9911efe0) tv_sec=5
             0.293 (5986.912 ms): sleep/10489  ... [continued]: nanosleep()) = 0
        #
      Suggested-by: Stefan Liebler <stli@linux.ibm.com>
      Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Fixes: 1b16fffa ("perf llvm-utils: Add bpf include path to clang command line")
      Link: http://lkml.kernel.org/r/20180731073254.91090-1-tmricht@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Jiri Olsa's avatar
      perf c2c report: Fix crash for empty browser · 73978332
      Jiri Olsa authored
      'perf c2c' scans read/write accesses and tries to find false sharing
      cases, so when the events it wants were not asked for or ended up not
      taking place, we get no histograms.
      
      So do not try to display entry details if there are none. Currently
      this ends up in a crash:
      
        $ perf c2c report # then press 'd'
        perf: Segmentation fault
        $
      
      Committer testing:
      
      Before:
      
      Record a perf.data file without events of interest to 'perf c2c report',
      then call it and press 'd':
      
        # perf record sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.001 MB perf.data (6 samples) ]
        # perf c2c report
        perf: Segmentation fault
        -------- backtrace --------
        perf[0x5b1d2a]
        /lib64/libc.so.6(+0x346df)[0x7fcb566e36df]
        perf[0x46fcae]
        perf[0x4a9f1e]
        perf[0x4aa220]
        perf(main+0x301)[0x42c561]
        /lib64/libc.so.6(__libc_start_main+0xe9)[0x7fcb566cff29]
        perf(_start+0x29)[0x42c999]
        #
      
      After the patch the segfault doesn't take place. A follow-up patch to
      tell the user why nothing changes when 'd' is pressed would be good.
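      
      The kind of guard this fix describes can be sketched as below. The
      sketch_ names and the entry struct are hypothetical and do not mirror
      the perf source; the only point illustrated is returning early when no
      entry is selected instead of dereferencing it.

```c
#include <stddef.h>

/* Hypothetical stand-in for a selected browser entry. */
struct sketch_entry {
    const char *label;
};

/*
 * Returns 1 if details were shown (stubbed here as "the entry has a
 * label") and 0 if there was nothing to show.
 */
static int sketch_browse_details(const struct sketch_entry *selected)
{
    /* With no histogram entries there is no selection: do nothing. */
    if (selected == NULL)
        return 0;

    return selected->label != NULL;
}
```

      A caller handling the 'd' key could use the zero return value to tell
      the user that there is nothing to display.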
      
      Reported-by: rodia@autistici.org
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Joe Mario <jmario@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Fixes: f1c5fd4d ("perf c2c report: Add TUI cacheline browser")
      Link: http://lkml.kernel.org/r/20180724062008.26126-1-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>