1. 03 Mar, 2016 3 commits
    • Wang Nan's avatar
      perf data: Support converting data from bpf_perf_event_output() · 6122d57e
      Wang Nan authored
      bpf_perf_event_output() outputs data through sample->raw_data. This
      patch adds support to convert those data into CTF. A python script then
      can be used to process output data from BPF programs.
      
      Test result:
      
        # cat ./test_bpf_output_2.c
        /************************ BEGIN **************************/
        #include <uapi/linux/bpf.h>
        struct bpf_map_def {
       	unsigned int type;
       	unsigned int key_size;
       	unsigned int value_size;
       	unsigned int max_entries;
        };
        #define SEC(NAME) __attribute__((section(NAME), used))
        static u64 (*ktime_get_ns)(void) =
       	(void *)BPF_FUNC_ktime_get_ns;
        static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
       	(void *)BPF_FUNC_trace_printk;
        static int (*get_smp_processor_id)(void) =
       	(void *)BPF_FUNC_get_smp_processor_id;
        static int (*perf_event_output)(void *, struct bpf_map_def *, int, void *, unsigned long) =
       	(void *)BPF_FUNC_perf_event_output;
      
        struct bpf_map_def SEC("maps") channel = {
       	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
       	.key_size = sizeof(int),
       	.value_size = sizeof(u32),
       	.max_entries = __NR_CPUS__,
        };
      
        static inline int __attribute__((always_inline))
        func(void *ctx, int type)
        {
       	struct {
       		u64 ktime;
       		int type;
       	} __attribute__((packed)) output_data;
       	char error_data[] = "Error: failed to output\n";
       	int err;
      
       	output_data.type = type;
       	output_data.ktime = ktime_get_ns();
       	err = perf_event_output(ctx, &channel, get_smp_processor_id(),
       				&output_data, sizeof(output_data));
       	if (err)
       		trace_printk(error_data, sizeof(error_data));
       	return 0;
        }
        SEC("func_begin=sys_nanosleep")
        int func_begin(void *ctx) {return func(ctx, 1);}
        SEC("func_end=sys_nanosleep%return")
        int func_end(void *ctx) { return func(ctx, 2);}
        char _license[] SEC("license") = "GPL";
        int _version SEC("version") = LINUX_VERSION_CODE;
        /************************* END ***************************/
      
        # ./perf record -e bpf-output/no-inherit,name=evt/ \
                       -e ./test_bpf_output_2.c/map:channel.event=evt/ \
                       usleep 100000
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.012 MB perf.data (2 samples) ]
      
        # ./perf script
                usleep 14942 92503.198504: evt:  ffffffff810e0ba1 sys_nanosleep (/lib/modules/4.3.0....
                usleep 14942 92503.298562: evt:  ffffffff810585e9 kretprobe_trampoline_holder (/lib....
      
        # ./perf data convert --to-ctf ./out.ctf
        [ perf data convert: Converted 'perf.data' into CTF data './out.ctf' ]
        [ perf data convert: Converted and wrote 0.000 MB (2 samples) ]
      
        # babeltrace ./out.ctf
        [01:41:43.198504134] (+?.?????????) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF810E0BA1, perf_tid = 14942, perf_pid = 14942, perf_id = 1044, raw_len = 3, raw_data = [ [0] = 0x32C0C07B, [1] = 0x5421, [2] = 0x1 ] }
        [01:41:43.298562257] (+0.100058123) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF810585E9, perf_tid = 14942, perf_pid = 14942, perf_id = 1044, raw_len = 3, raw_data = [ [0] = 0x38B77FAA, [1] = 0x5421, [2] = 0x2 ] }
      
        # cat ./test_bpf_output_2.py
        from babeltrace import TraceCollection
        tc = TraceCollection()
        tc.add_trace('./out.ctf', 'ctf')
        d = {1:[], 2:[]}
        for event in tc.events:
           if not event.name.startswith('evt'):
               continue
           raw_data = event['raw_data']
           (time, type) = ((raw_data[0] + (raw_data[1] << 32)), raw_data[2])
           d[type].append(time)
        print(list(map(lambda i: d[2][i] - d[1][i], range(len(d[1])))));
      
        # python3 ./test_bpf_output_2.py
        [100056879]
      
      Committer note:
      
      Make sure you have python3-devel installed, not python-devel, which may
      be for python2, which will lead to some "PyInstance_Type" errors. Also
      make sure that you use the right libbabeltrace, because it is shipped
      in Fedora, for instance, but an older version.
      
      To build libbabeltrace's python binding one also needs to use:
      
       ./configure --enable-python-bindings
      
      And then set PYTHONPATH=/usr/local/lib64/python3.4/site-packages/.
      Signed-off-by: default avatarWang Nan <wangnan0@huawei.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1456479154-136027-9-git-send-email-wangnan0@huawei.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      6122d57e
    • Andi Kleen's avatar
      perf stat: Check existence of frontend/backed stalled cycles · 9dec4473
      Andi Kleen authored
      Only put the frontend/backend stalled cycles into the default perf stat
      events when the CPU actually supports them.
      
      This avoids empty columns with --metric-only on newer Intel CPUs.
      
      Committer note:
      
      Before:
      
        $ perf stat ls
      
          Performance counter stats for 'ls':
      
                1.080893     task-clock (msec)      #    0.619 CPUs utilized
                       0     context-switches       #    0.000 K/sec
                       0     cpu-migrations         #    0.000 K/sec
                      97     page-faults            #    0.090 M/sec
               3,327,741     cycles                 #    3.079 GHz
         <not supported>     stalled-cycles-frontend
         <not supported>     stalled-cycles-backend
               1,609,544     instructions           #    0.48  insn per cycle
                 319,117     branches               #  295.235 M/sec
                  12,246     branch-misses          #    3.84% of all branches
      
             0.001746508 seconds time elapsed
        $
      
      After:
      
        $ perf stat ls
      
          Performance counter stats for 'ls':
      
                0.693948     task-clock (msec)      #    0.662 CPUs utilized
                       0     context-switches       #    0.000 K/sec
                       0     cpu-migrations         #    0.000 K/sec
                      95     page-faults            #    0.137 M/sec
               1,792,509     cycles                 #    2.583 GHz
               1,599,047     instructions           #    0.89  insn per cycle
                 316,328     branches               #  455.838 M/sec
                  12,453     branch-misses          #    3.94% of all branches
      
             0.001048987 seconds time elapsed
        $
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1456532881-26621-2-git-send-email-andi@firstfloor.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      9dec4473
    • Jiri Olsa's avatar
      perf tools: Fix locale handling in pmu parsing · f9a5978a
      Jiri Olsa authored
      Ingo reported regression on display format of big numbers, which is
      missing separators (in default perf stat output).
      
       triton:~/tip> perf stat -a sleep 1
               ...
               127008602      cycles                    #    0.011 GHz
               279538533      stalled-cycles-frontend   #  220.09% frontend cycles idle
               119213269      instructions              #    0.94  insn per cycle
      
      This is caused by recent change:
      
        perf stat: Check existence of frontend/backed stalled cycles
      
      that added call to pmu_have_event, that subsequently calls
      perf_pmu__parse_scale, which has a bug in locale handling.
      
      The lc string returned from setlocale, that we use to store old locale
      value, may be allocated in static storage. Getting a dynamic copy to
      make it survive another setlocale call.
      
        $ perf stat ls
               ...
               2,360,602      cycles                    #    3.080 GHz
               2,703,090      instructions              #    1.15  insn per cycle
                 546,031      branches                  #  712.511 M/sec
      
      Committer note:
      
      Since the patch introducing the regression didn't made to perf/core,
      move it to just before where the regression was introduced, so that we
      don't break bisection for this feature.
      Reported-by: default avatarIngo Molnar <mingo@redhat.com>
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20160303095348.GA24511@krava.redhat.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      f9a5978a
  2. 29 Feb, 2016 32 commits
  3. 28 Feb, 2016 5 commits