    perf evsel: Fix probing of precise_ip level for default cycles event · 7a1ac110
    Arnaldo Carvalho de Melo authored
    Commit 18e7a45a ("perf/x86: Reject non sampling events with
    precise_ip") makes sys_perf_event_open() return -EINVAL for an attribute
    with (attr.precise_ip > 0 && attr.sample_period == 0), which is exactly
    what is passed by the routine used to probe the max precise level when
    no events were passed to 'perf record' or 'perf top', i.e.:
    
    	perf_evsel__new_cycles()
    		perf_event_attr__set_max_precise_ip()
    
    Starting with the aforementioned commit, the x86 code in
    x86_pmu_hw_config(), which is reached from sys_perf_event_open(), does:
    
                    /* There's no sense in having PEBS for non sampling events: */
                    if (!is_sampling_event(event))
                            return -EINVAL;
    
    This makes the probe fail for cycles:ppp, cycles:pp and cycles:p, so
    perf always ends up using just the non precise cycles variant.
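
The effect of that check can be modelled in user space. The following is a minimal sketch, not the kernel code: `struct mock_attr`, `mock_hw_config()` and the `max_precise` parameter are hypothetical names standing in for `struct perf_event_attr`, `x86_pmu_hw_config()` and the precise level the PMU supports.

```c
#include <errno.h>

/* Hypothetical stand-in for the two perf_event_attr fields that matter here. */
struct mock_attr {
	unsigned long long sample_period; /* 0 means a counting (non sampling) event */
	unsigned int precise_ip;          /* requested precise level, 0..3 */
};

/*
 * Sketch of the precise_ip checks quoted above: a level beyond what the
 * PMU supports is rejected with -EOPNOTSUPP, and any precise level on a
 * non sampling event is rejected with -EINVAL.
 */
int mock_hw_config(const struct mock_attr *attr, unsigned int max_precise)
{
	if (attr->precise_ip) {
		if (attr->precise_ip > max_precise)
			return -EOPNOTSUPP;

		/* There's no sense in having PEBS for non sampling events */
		if (attr->sample_period == 0)
			return -EINVAL;
	}

	return 0;
}
```

With sample_period left at zero, every non-zero precise_ip is rejected, which is what the probing routine runs into.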
    
    To make sure that this is the case, I tested it, before this patch,
    with:
    
      # perf probe -L x86_pmu_hw_config
      <x86_pmu_hw_config@/home/acme/git/linux/arch/x86/events/core.c:0>
            0  int x86_pmu_hw_config(struct perf_event *event)
            1  {
            2         if (event->attr.precise_ip) {
    <SNIP>
           17                 if (event->attr.precise_ip > precise)
           18                         return -EOPNOTSUPP;
    
                              /* There's no sense in having PEBS for non sampling events: */
           21                 if (!is_sampling_event(event))
           22                         return -EINVAL;
                      }
    <SNIP>
      # perf probe x86_pmu_hw_config:22
      Added new events:
        probe:x86_pmu_hw_config (on x86_pmu_hw_config:22)
        probe:x86_pmu_hw_config_1 (on x86_pmu_hw_config:22)
    
      You can now use it in all perf tools, such as:
    
            perf record -e probe:x86_pmu_hw_config_1 -aR sleep 1
    
      # perf trace -e perf_event_open,probe:x86_pmu_hw_config*/max-stack=16/ perf record usleep 1
         0.000 ( 0.015 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8ba110, cpu: -1, group_fd: -1      ) ...
         0.015 (         ): probe:x86_pmu_hw_config:(ffffffff9c0065e1))
                                           x86_pmu_hw_config ([kernel.kallsyms])
                                           hsw_hw_config ([kernel.kallsyms])
                                           x86_pmu_event_init ([kernel.kallsyms])
                                           perf_try_init_event ([kernel.kallsyms])
                                           perf_event_alloc ([kernel.kallsyms])
                                           SYSC_perf_event_open ([kernel.kallsyms])
                                           sys_perf_event_open ([kernel.kallsyms])
                                           do_syscall_64 ([kernel.kallsyms])
                                           return_from_SYSCALL_64 ([kernel.kallsyms])
                                           syscall (/usr/lib64/libc-2.24.so)
                                           perf_event_attr__set_max_precise_ip (/home/acme/bin/perf)
                                           perf_evsel__new_cycles (/home/acme/bin/perf)
                                           perf_evlist__add_default (/home/acme/bin/perf)
                                           cmd_record (/home/acme/bin/perf)
                                           run_builtin (/home/acme/bin/perf)
                                           handle_internal_command (/home/acme/bin/perf)
         0.000 ( 0.021 ms): perf/4150  ... [continued]: perf_event_open()) = -1 EINVAL Invalid argument
         0.023 ( 0.002 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8ba110, cpu: -1, group_fd: -1      ) ...
         0.025 (         ): probe:x86_pmu_hw_config:(ffffffff9c0065e1))
                                           x86_pmu_hw_config ([kernel.kallsyms])
                                           hsw_hw_config ([kernel.kallsyms])
                                           x86_pmu_event_init ([kernel.kallsyms])
                                           perf_try_init_event ([kernel.kallsyms])
                                           perf_event_alloc ([kernel.kallsyms])
                                           SYSC_perf_event_open ([kernel.kallsyms])
                                           sys_perf_event_open ([kernel.kallsyms])
                                           do_syscall_64 ([kernel.kallsyms])
                                           return_from_SYSCALL_64 ([kernel.kallsyms])
                                           syscall (/usr/lib64/libc-2.24.so)
                                           perf_event_attr__set_max_precise_ip (/home/acme/bin/perf)
                                           perf_evsel__new_cycles (/home/acme/bin/perf)
                                           perf_evlist__add_default (/home/acme/bin/perf)
                                           cmd_record (/home/acme/bin/perf)
                                           run_builtin (/home/acme/bin/perf)
                                           handle_internal_command (/home/acme/bin/perf)
         0.023 ( 0.004 ms): perf/4150  ... [continued]: perf_event_open()) = -1 EINVAL Invalid argument
         0.028 ( 0.002 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8ba110, cpu: -1, group_fd: -1      ) ...
         0.030 (         ): probe:x86_pmu_hw_config:(ffffffff9c0065e1))
                                           x86_pmu_hw_config ([kernel.kallsyms])
                                           hsw_hw_config ([kernel.kallsyms])
                                           x86_pmu_event_init ([kernel.kallsyms])
                                           perf_try_init_event ([kernel.kallsyms])
                                           perf_event_alloc ([kernel.kallsyms])
                                           SYSC_perf_event_open ([kernel.kallsyms])
                                           sys_perf_event_open ([kernel.kallsyms])
                                           do_syscall_64 ([kernel.kallsyms])
                                           return_from_SYSCALL_64 ([kernel.kallsyms])
                                           syscall (/usr/lib64/libc-2.24.so)
                                           perf_event_attr__set_max_precise_ip (/home/acme/bin/perf)
                                           perf_evsel__new_cycles (/home/acme/bin/perf)
                                           perf_evlist__add_default (/home/acme/bin/perf)
                                           cmd_record (/home/acme/bin/perf)
                                           run_builtin (/home/acme/bin/perf)
                                           handle_internal_command (/home/acme/bin/perf)
         0.028 ( 0.004 ms): perf/4150  ... [continued]: perf_event_open()) = -1 EINVAL Invalid argument
        41.018 ( 0.012 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8b5dd0, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
        41.065 ( 0.011 ms): perf/4150 perf_event_open(attr_uptr: 0x3c7db78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
        41.080 ( 0.006 ms): perf/4150 perf_event_open(attr_uptr: 0x3c7db78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
        41.103 ( 0.010 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), group_fd: -1, flags: FD_CLOEXEC) = 4
        41.115 ( 0.006 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), cpu: 1, group_fd: -1, flags: FD_CLOEXEC) = 5
        41.122 ( 0.004 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), cpu: 2, group_fd: -1, flags: FD_CLOEXEC) = 6
        41.128 ( 0.008 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), cpu: 3, group_fd: -1, flags: FD_CLOEXEC) = 8
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.017 MB perf.data (2 samples) ]
      #
    
    I.e. that return -EINVAL in x86_pmu_hw_config() is hit three times.
    
    So fix it by setting attr.sample_period to 1 while probing, so that the
    probe event is a sampling one.
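
The difference this makes to the probe can be sketched as follows. This is an illustration under assumptions, not the perf source: `struct mock_attr`, `mock_event_open()` and `mock_set_max_precise_ip()` are hypothetical stand-ins, and the loop shape (start at level 3, step down on failure) is inferred from the three failed opens in the trace above.

```c
#include <errno.h>

/* Hypothetical stand-in for the perf_event_attr fields used by the probe. */
struct mock_attr {
	unsigned long long sample_period;
	unsigned int precise_ip;
};

/* Hypothetical stand-in for sys_perf_event_open(): 0 on success, -errno on failure. */
int mock_event_open(const struct mock_attr *attr)
{
	/* Same rule as the x86 check: no precise level for non sampling events. */
	if (attr->precise_ip && attr->sample_period == 0)
		return -EINVAL;

	return 0;
}

/*
 * Sketch of the probe: start at the most precise level and step down
 * until an open succeeds or level 0 is reached.
 */
void mock_set_max_precise_ip(struct mock_attr *attr)
{
	attr->precise_ip = 3;

	while (attr->precise_ip != 0) {
		if (mock_event_open(attr) == 0)
			break;
		--attr->precise_ip;
	}
}
```

In this model, an attr with sample_period == 0 falls through all three levels and ends at precise_ip == 0, while an attr with sample_period == 1 succeeds at the first try and keeps precise_ip == 3.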
    
    Now, after this patch:
    
      # perf trace --max-stack=2 -e perf_event_open,probe:x86_pmu_hw_config* perf record usleep 1
      [ perf record: Woken up 1 times to write data ]
         0.000 ( 0.017 ms): perf/8469 perf_event_open(attr_uptr: 0x7ffe36c27d10, pid: -1, cpu: 3, group_fd: -1, flags: FD_CLOEXEC) = 4
                                           syscall (/usr/lib64/libc-2.24.so)
                                           perf_event_open_cloexec_flag (/home/acme/bin/perf)
         0.050 ( 0.031 ms): perf/8469 perf_event_open(attr_uptr: 0x24ebb78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
                                           syscall (/usr/lib64/libc-2.24.so)
                                           perf_evlist__config (/home/acme/bin/perf)
         0.092 ( 0.040 ms): perf/8469 perf_event_open(attr_uptr: 0x24ebb78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
                                           syscall (/usr/lib64/libc-2.24.so)
                                           perf_evlist__config (/home/acme/bin/perf)
         0.143 ( 0.007 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, cpu: -1, group_fd: -1           ) = 4
                                           syscall (/usr/lib64/libc-2.24.so)
                                           perf_event_attr__set_max_precise_ip (/home/acme/bin/perf)
         0.161 ( 0.007 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), group_fd: -1, flags: FD_CLOEXEC) = 4
                                           syscall (/usr/lib64/libc-2.24.so)
                                           perf_evsel__open (/home/acme/bin/perf)
         0.171 ( 0.005 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), cpu: 1, group_fd: -1, flags: FD_CLOEXEC) = 5
                                           syscall (/usr/lib64/libc-2.24.so)
                                           perf_evsel__open (/home/acme/bin/perf)
         0.180 ( 0.007 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), cpu: 2, group_fd: -1, flags: FD_CLOEXEC) = 6
                                           syscall (/usr/lib64/libc-2.24.so)
                                           perf_evsel__open (/home/acme/bin/perf)
         0.190 ( 0.005 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), cpu: 3, group_fd: -1, flags: FD_CLOEXEC) = 8
                                           syscall (/usr/lib64/libc-2.24.so)
                                           perf_evsel__open (/home/acme/bin/perf)
      [ perf record: Captured and wrote 0.017 MB perf.data (7 samples) ]
      #
    
    The probe call made from perf_event_attr__set_max_precise_ip() works
    the first time, with attr.precise_ip = 3, with the next ones being the
    per cpu ones for the cycles:ppp event.
    
    And here is the text from a report and alternative proposed patch by
    Thomas-Mich Richter:
    
     ---
    
    On s390 the counter and sampling facilities do not support a precise IP
    skid level and sometimes return EOPNOTSUPP when structure member
    precise_ip in struct perf_event_attr is not set to zero.
    
    On s390 the command 'perf record -- true' fails with error EOPNOTSUPP.
    This happens only when no events are specified on the command line.
    
    The functions called are
    ...
      --> perf_evlist__add_default
          --> perf_evsel__new_cycles
              --> perf_event_attr__set_max_precise_ip
    
    The last function determines the value of structure member precise_ip by
    invoking the perf_event_open() system call and checking the return code.
    The level used in the first successful open becomes the value for
    precise_ip.
    
    However, the value is determined without setting member sample_period,
    which indicates no sampling.
    
    On s390 the counter facility and sampling facility are different.  The
    above procedure determines a precise_ip value of 3 using the counter
    facility. Later it uses the sampling facility with a value of 3 and
    fails with EOPNOTSUPP.
    
     ---
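
The mismatch described in the report can be modelled as below. This is a hypothetical sketch and not s390 kernel code: `mock_s390_open()` assumes that non sampling opens go to a counter facility that accepts any precise_ip, and that sampling opens go to a sampling facility that rejects a non-zero precise_ip with EOPNOTSUPP. `mock_probe()` is likewise an invented helper.

```c
#include <errno.h>

/* Hypothetical stand-in for the perf_event_attr fields involved. */
struct mock_attr {
	unsigned long long sample_period;
	unsigned int precise_ip;
};

/* Assumed model of the two facilities from the report above. */
int mock_s390_open(const struct mock_attr *attr)
{
	if (attr->sample_period == 0)
		return 0;               /* counter facility: precise_ip not checked */

	if (attr->precise_ip != 0)
		return -EOPNOTSUPP;     /* sampling facility: no precise IP levels */

	return 0;
}

/* Probe the highest accepted precise_ip for a given sample_period. */
unsigned int mock_probe(unsigned long long sample_period)
{
	struct mock_attr attr = {
		.sample_period = sample_period,
		.precise_ip = 3,
	};

	while (attr.precise_ip != 0 && mock_s390_open(&attr) != 0)
		--attr.precise_ip;

	return attr.precise_ip;
}
```

Probing with sample_period == 0 yields 3 in this model, and a later sampling open with precise_ip == 3 fails with EOPNOTSUPP. Probing with a non-zero sample_period yields 0, which the sampling open accepts.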
    
    v2: Older compilers (e.g. gcc 4.4.7) don't support referencing members
        of unnamed union members in the container struct initialization, so
        move from:
    
    	struct perf_event_attr attr = {
    		...
    		.sample_period = 1,
    	};
    
    to right after it as:
    
    	struct perf_event_attr attr = {
    		...
    	};
    
    	attr.sample_period = 1;
    
    v3: We need to reset .sample_period to 0 to let the users of
    perf_evsel__new_cycles() properly set up attr.sample_period or
    attr.sample_freq. Reported by Ingo Molnar.
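
Taken together, the v2 and v3 notes suggest a flow along the following lines. This is a sketch under assumptions rather than the actual patch: `struct mock_attr`, `mock_event_open()` and `mock_new_cycles_attr()` are hypothetical, and the unnamed union only mirrors the layout the v2 note refers to.

```c
#include <errno.h>

/* Hypothetical stand-in for struct perf_event_attr. */
struct mock_attr {
	unsigned int type;
	union {                           /* unnamed union, as in the v2 note */
		unsigned long long sample_period;
		unsigned long long sample_freq;
	};
	unsigned int precise_ip;
};

/* Hypothetical open: rejects precise levels on non sampling events. */
int mock_event_open(const struct mock_attr *attr)
{
	if (attr->precise_ip && attr->sample_period == 0)
		return -EINVAL;

	return 0;
}

struct mock_attr mock_new_cycles_attr(void)
{
	struct mock_attr attr = {
		.type = 0,
	};

	/* v2: assigned after the initializer, for older compilers */
	attr.sample_period = 1;

	/* probe the max precise level while the event counts as sampling */
	attr.precise_ip = 3;
	while (attr.precise_ip != 0 && mock_event_open(&attr) != 0)
		--attr.precise_ip;

	/* v3: reset so callers can pick sample_period or sample_freq */
	attr.sample_period = 0;

	return attr;
}
```

The returned attr keeps the probed precise_ip while sample_period is back at 0, leaving the period or frequency choice to the caller.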
    
    Reported-and-Acked-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
    Acked-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
    Acked-by: Jiri Olsa <jolsa@redhat.com>
    Cc: Adrian Hunter <adrian.hunter@intel.com>
    Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
    Cc: David Ahern <dsahern@gmail.com>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Wang Nan <wangnan0@huawei.com>
    Fixes: 18e7a45a ("perf/x86: Reject non sampling events with precise_ip")
    Link: http://lkml.kernel.org/n/tip-yv6nnkl7tzqocrm0hl3x7vf1@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>