  1. 22 Dec, 2016 2 commits
    • perf/x86: Fix overlap counter scheduling bug · 1134c2b5
      Peter Zijlstra authored
      Jiri reported the overlap scheduling exceeding its max stack.
      
      Looking at the constraint that triggered this, it turns out the
      overlap marker isn't needed.
      
      The comment with EVENT_CONSTRAINT_OVERLAP states: "This is the case if
      the counter mask of such an event is not a subset of any other counter
      mask of a constraint with an equal or higher weight".
      
      Esp. that latter part is of interest here I think: our overlapping mask
      is 0x0e, which has 3 bits set and is the highest-weight mask on the
      PMU, therefore it will be placed last. Can we still create a scenario
      where we would need to rewind that?
      
      The scenario for AMD Fam15h is we're having masks like:
      
      	0x3F -- 111111
      	0x38 -- 111000
      	0x07 -- 000111
      
      	0x09 -- 001001
      
      And we mark 0x09 as overlapping, because it is not a direct subset of
      0x38 or 0x07 and has less weight than either of those. This means we'll
      first try and place the 0x09 event, then try and place 0x38/0x07 events.
      Now imagine we have:
      
      	3 * 0x07 + 0x09
      
      and the initial pick for the 0x09 event is counter 0, then we'll fail to
      place all 0x07 events. So we'll pop back, try counter 4 for the 0x09
      event, and then re-try all 0x07 events, which will now work.
      
      The masks on the PMU in question are:
      
        0x01 - 0001
        0x03 - 0011
        0x0e - 1110
        0x0c - 1100
      
      But since all the masks that have overlap (0xe -> {0xc,0x3}) and (0x3 ->
      0x1) are of heavier weight, it should all work out.
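
The rewind scenario described above can be reproduced with a small model. This is only an illustrative sketch, not the kernel's scheduler: the function names are hypothetical, events are assumed to be placed in ascending weight order, and each event is assumed to take the lowest free counter in its mask (counters are numbered from bit 0).

```python
def weight(mask):
    """Number of counters an event may use (bits set in its mask)."""
    return bin(mask).count("1")


def by_weight(masks):
    """Event indices in ascending weight order (stable for equal weights)."""
    return sorted(range(len(masks)), key=lambda i: weight(masks[i]))


def schedule_greedy(masks):
    """Place each event on its lowest free counter; never rewind."""
    used = 0
    assignment = {}
    for idx in by_weight(masks):
        free = masks[idx] & ~used
        if not free:
            return None  # no counter left for this event
        bit = free & -free  # lowest set bit
        used |= bit
        assignment[idx] = bit.bit_length() - 1
    return assignment


def schedule_backtrack(masks):
    """Same order, but retry earlier picks when a later event fails."""
    order = by_weight(masks)

    def place(pos, used):
        if pos == len(order):
            return {}
        idx = order[pos]
        free = masks[idx] & ~used
        while free:
            bit = free & -free
            rest = place(pos + 1, used | bit)
            if rest is not None:
                rest[idx] = bit.bit_length() - 1
                return rest
            free &= ~bit  # rewind: try the next counter for this event
        return None

    return place(0, 0)


fam15h = [0x07, 0x07, 0x07, 0x09]
print(schedule_greedy(fam15h))     # None: 0x09 took bit 0, one 0x07 is left over
print(schedule_backtrack(fam15h))  # succeeds once 0x09 is moved to bit 3

pmu = [0x01, 0x03, 0x0e, 0x0c]
print(schedule_greedy(pmu))        # a full assignment without any rewind
```

In this model the Fam15h mix needs the rewind, while one event of each mask from the PMU in question is placed by the plain weight-ordered pass, which matches the argument that the overlap marker is not needed there.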
      Reported-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Liang Kan <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vince@deater.net>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/20161109155153.GQ3142@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/pebs: Fix handling of PEBS buffer overflows · daa864b8
      Stephane Eranian authored
      This patch solves a race condition between PEBS and the PMU handler.
      
      In case multiple PEBS events are sampled at the same time,
      it is possible to have GLOBAL_STATUS bit 62 set, indicating
      PEBS buffer overflow, while also seeing at most 3 PEBS counters
      with their bits set in the status register. This is a sign
      that there was at least one PEBS record pending at the time
      of the PMU interrupt. PEBS counters must only be processed
      via the drain_pebs() calls, and not via the regular sample
      processing loop that comes after that function, otherwise
      phony regular samples, not marked with the EXACT tag, may be
      generated in the sampling buffer.
      
      Another possibility is to have one PEBS event and at least
      one non-PEBS event which overflows while PEBS is armed. In this
      case, bit 62 of GLOBAL_STATUS will not be set, yet the overflow
      status bit for the PEBS counter will be set on Skylake.
      
      To avoid this problem, we systematically ignore the PEBS-enabled
      counters from the GLOBAL_STATUS mask and we always process PEBS
      events via drain_pebs().
      
      The problem manifested itself by having non-exact samples when
      sampling only PEBS events, i.e., the PERF_SAMPLE_RECORD would
      not have the EXACT flag set.
      
      Note that this problem is only present on Skylake processors.
      This fix is harmless on older processors.
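
The masking step described above can be sketched as follows. This is a simplified model under stated assumptions, not the kernel handler: `regular_overflow_counters` is a hypothetical name, `pebs_enabled` is assumed to be a bitmask of the counters currently programmed for PEBS, and only bits 0..61 are treated as counter overflow bits.

```python
PEBS_BUFFER_OVF = 1 << 62  # GLOBAL_STATUS bit 62, as named in the commit message


def regular_overflow_counters(global_status, pebs_enabled):
    """Counters the regular sample loop may process for one PMU interrupt.

    PEBS-enabled counters are always removed from the status mask, so they
    are left to the drain_pebs() path whether or not bit 62 is set.
    """
    status = global_status & ~PEBS_BUFFER_OVF
    status &= ~pebs_enabled
    return [i for i in range(62) if (status >> i) & 1]


# Bit 62 set together with overflow bits of two PEBS counters (0 and 1)
# and one non-PEBS counter (2): only counter 2 reaches the regular loop.
print(regular_overflow_counters(PEBS_BUFFER_OVF | 0b0111, 0b0011))

# Bit 62 clear, yet the PEBS counter's overflow bit (0) is set alongside
# non-PEBS counter 2: the PEBS counter is still filtered out.
print(regular_overflow_counters(0b0101, 0b0001))
```

Filtering unconditionally, rather than only when bit 62 is set, is what covers the second case, where the PEBS counter's status bit is on without a buffer overflow indication.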
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1482395366-8992-1-git-send-email-eranian@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 20 Dec, 2016 11 commits
    • Merge tag 'perf-core-for-mingo-20161220' of... · 03756917
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo-20161220' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
      
      Pull perf/core improvements and fixes:
      
      New features:
      
       - Introduce 'perf sched timehist --idle', to analyse processes
         going to/from idle state (Namhyung Kim)
      
      Fixes:
      
       - Allow 'perf record -u user' to continue when facing races with threads
         going away after having scanned them via /proc (Jiri Olsa)
      
       - Fix 'perf mem' --all-user/--all-kernel options (Jiri Olsa)
      
       - Support jumps with multiple arguments (Ravi Bangoria)
      
       - Fix jumps to before the function where they are located (Ravi Bangoria)
      
       - Fix lock-pi help string (Davidlohr Bueso)
      
       - Fix build of 'perf trace' in odd systems such as a RHEL PPC one (Jiri Olsa)
      
       - Do not overwrite valid build id in 'perf diff' (Kan Liang)
      
       - Don't throw error for zero length symbols, allowing the use of the TUI
         in PowerPC, where such symbols became more common recently (Ravi Bangoria)
      
      Infrastructure changes:
      
       - Switch samples/bpf/ to use tools/lib/bpf, removing libbpf
         duplication (Joe Stringer)
      
       - Move headers check into bash script (Jiri Olsa)
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • samples/bpf: Move open_raw_sock to separate header · 9899694a
      Joe Stringer authored
      This function was declared in libbpf.c and was the only remaining
      function in this library, but has nothing to do with BPF. Shift it out
      into a new header, sock_example.h, and include it from the relevant
      samples.
      Signed-off-by: Joe Stringer <joe@ovn.org>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/20161209024620.31660-8-joe@ovn.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • samples/bpf: Remove perf_event_open() declaration · 205c8ada
      Joe Stringer authored
      This declaration was made in samples/bpf/libbpf.c for convenience, but
      there's already one in tools/perf/perf-sys.h. Reuse that one.
      
      Committer notes:
      
      Testing it:
      
        $ make -j4 O=../build/v4.9.0-rc8+ samples/bpf/
        make[1]: Entering directory '/home/build/v4.9.0-rc8+'
          CHK     include/config/kernel.release
          GEN     ./Makefile
          CHK     include/generated/uapi/linux/version.h
          Using /home/acme/git/linux as source for kernel
          CHK     include/generated/utsrelease.h
          CHK     include/generated/timeconst.h
          CHK     include/generated/bounds.h
          CHK     include/generated/asm-offsets.h
          CALL    /home/acme/git/linux/scripts/checksyscalls.sh
          HOSTCC  samples/bpf/test_verifier.o
          HOSTCC  samples/bpf/libbpf.o
          HOSTCC  samples/bpf/../../tools/lib/bpf/bpf.o
          HOSTCC  samples/bpf/test_maps.o
          HOSTCC  samples/bpf/sock_example.o
          HOSTCC  samples/bpf/bpf_load.o
      <SNIP>
          HOSTLD  samples/bpf/trace_event
          HOSTLD  samples/bpf/sampleip
          HOSTLD  samples/bpf/tc_l2_redirect
        make[1]: Leaving directory '/home/build/v4.9.0-rc8+'
        $
      
      Also tested the offwaketime resulting from the rebuild, seems to work as
      before.
      Signed-off-by: Joe Stringer <joe@ovn.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/20161209024620.31660-7-joe@ovn.org
      [ Use -I$(srctree)/tools/lib/ to support out of source code tree builds ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • samples/bpf: Be consistent with bpf_load_program bpf_insn parameter · 811b4f0d
      Arnaldo Carvalho de Melo authored
      Only one of the examples declares the bpf_insn BPF program as const:
      
        $ grep 'struct bpf_insn [a-z]' samples/bpf/*.c
        samples/bpf/fds_example.c:	static const struct bpf_insn insns[] = {
        samples/bpf/sock_example.c:	struct bpf_insn prog[] = {
        samples/bpf/test_cgrp2_attach2.c:	struct bpf_insn prog[] = {
        samples/bpf/test_cgrp2_attach.c:	struct bpf_insn prog[] = {
        samples/bpf/test_cgrp2_sock.c:	struct bpf_insn prog[] = {
        $
      
      Which causes this warning:
      
        [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux samples/bpf/
        <SNIP>
           HOSTCC  samples/bpf/fds_example.o
        /git/linux/samples/bpf/fds_example.c: In function 'bpf_prog_create':
        /git/linux/samples/bpf/fds_example.c:63:6: warning: passing argument 2 of 'bpf_load_program' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
              insns, insns_cnt, "GPL", 0,
              ^~~~~
        In file included from /git/linux/samples/bpf/libbpf.h:5:0,
                         from /git/linux/samples/bpf/bpf_load.h:4,
                         from /git/linux/samples/bpf/fds_example.c:15:
        /git/linux/tools/lib/bpf/bpf.h:31:5: note: expected 'struct bpf_insn *' but argument is of type 'const struct bpf_insn *'
         int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
             ^~~~~~~~~~~~~~~~
          HOSTCC  samples/bpf/sockex1_user.o
      
      So just ditch that 'const' to reduce build noise, leaving changing the
      bpf_load_program() bpf_insn parameter to const to a later patch, if deemed
      adequate.
      
      Cc: Joe Stringer <joe@ovn.org>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-1z5xee8n3oa66jf62bpv16ed@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • tools lib bpf: Add bpf_prog_{attach,detach} · 5dc880de
      Joe Stringer authored
      Commit d8c5b17f ("samples: bpf: add userspace example for attaching
      eBPF programs to cgroups") added these functions to samples/libbpf, but
      during this merge all of the samples libbpf functionality is shifting to
      tools/lib/bpf. Shift these functions there.
      
      Committer notes:
      
      Use bzero + attr.FIELD = value instead of 'attr = { .FIELD = value }', just
      like the other wrapper calls to sys_bpf with bpf_attr, to make this build
      in older toolchains, such as the ones in CentOS 5 and 6.
      Signed-off-by: Joe Stringer <joe@ovn.org>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-au2zvtsh55vqeo3v3uw7jr4c@git.kernel.org
      Link: https://github.com/joestringer/linux/commit/353e6f298c3d0a92fa8bfa61ff898c5050261a12.patch
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • samples/bpf: Switch over to libbpf · 43371c83
      Joe Stringer authored
      Now that libbpf under tools/lib/bpf/* is synced with the version from
      samples/bpf, we can get rid of most of the libbpf library here.
      
      Committer notes:
      
      Built it in a docker fedora rawhide container and ran it in the f25 host, seems
      to work just like it did before this patch, i.e. the switch to tools/lib/bpf/
      doesn't seem to have introduced problems and Joe said he tested it with
      all the entries in samples/bpf/ and other code he found:
      
        [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux headers_install
        <SNIP>
        [root@f5065a7d6272 linux]# rm -rf /tmp/build/linux/samples/bpf/
        [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux samples/bpf/
        make[1]: Entering directory '/tmp/build/linux'
          CHK     include/config/kernel.release
          HOSTCC  scripts/basic/fixdep
          GEN     ./Makefile
          CHK     include/generated/uapi/linux/version.h
          Using /git/linux as source for kernel
          CHK     include/generated/utsrelease.h
          HOSTCC  scripts/basic/bin2c
          HOSTCC  arch/x86/tools/relocs_32.o
          HOSTCC  arch/x86/tools/relocs_64.o
          LD      samples/bpf/built-in.o
        <SNIP>
          HOSTCC  samples/bpf/fds_example.o
          HOSTCC  samples/bpf/sockex1_user.o
        /git/linux/samples/bpf/fds_example.c: In function 'bpf_prog_create':
        /git/linux/samples/bpf/fds_example.c:63:6: warning: passing argument 2 of 'bpf_load_program' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
              insns, insns_cnt, "GPL", 0,
              ^~~~~
        In file included from /git/linux/samples/bpf/libbpf.h:5:0,
                         from /git/linux/samples/bpf/bpf_load.h:4,
                         from /git/linux/samples/bpf/fds_example.c:15:
        /git/linux/tools/lib/bpf/bpf.h:31:5: note: expected 'struct bpf_insn *' but argument is of type 'const struct bpf_insn *'
         int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
             ^~~~~~~~~~~~~~~~
          HOSTCC  samples/bpf/sockex2_user.o
        <SNIP>
          HOSTCC  samples/bpf/xdp_tx_iptunnel_user.o
        clang  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.2.1/include -I/git/linux/arch/x86/include -I./arch/x86/include/generated/uapi -I./arch/x86/include/generated  -I/git/linux/include -I./include -I/git/linux/arch/x86/include/uapi -I/git/linux/include/uapi -I./include/generated/uapi -include /git/linux/include/linux/kconfig.h  \
      	  -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
      	  -Wno-compare-distinct-pointer-types \
      	  -Wno-gnu-variable-sized-type-not-at-end \
      	  -Wno-address-of-packed-member -Wno-tautological-compare \
      	  -O2 -emit-llvm -c /git/linux/samples/bpf/sockex1_kern.c -o -| llc -march=bpf -filetype=obj -o samples/bpf/sockex1_kern.o
          HOSTLD  samples/bpf/tc_l2_redirect
        <SNIP>
          HOSTLD  samples/bpf/lwt_len_hist
          HOSTLD  samples/bpf/xdp_tx_iptunnel
        make[1]: Leaving directory '/tmp/build/linux'
        [root@f5065a7d6272 linux]#
      
      And then, in the host:
      
        [root@jouet bpf]# mount | grep "docker.*devicemapper\/"
        /dev/mapper/docker-253:0-1705076-9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9 on /var/lib/docker/devicemapper/mnt/9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9 type xfs (rw,relatime,context="system_u:object_r:container_file_t:s0:c73,c276",nouuid,attr2,inode64,sunit=1024,swidth=1024,noquota)
        [root@jouet bpf]# cd /var/lib/docker/devicemapper/mnt/9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9/rootfs/tmp/build/linux/samples/bpf/
        [root@jouet bpf]# file offwaketime
        offwaketime: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=f423d171e0487b2f802b6a792657f0f3c8f6d155, not stripped
        [root@jouet bpf]# readelf -SW offwaketime
        offwaketime         offwaketime_kern.o  offwaketime_user.o
        [root@jouet bpf]# readelf -SW offwaketime_kern.o
        There are 11 section headers, starting at offset 0x700:
      
        Section Headers:
          [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
          [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
          [ 1] .strtab           STRTAB          0000000000000000 000658 0000a8 00      0   0  1
          [ 2] .text             PROGBITS        0000000000000000 000040 000000 00  AX  0   0  4
          [ 3] kprobe/try_to_wake_up PROGBITS        0000000000000000 000040 0000d8 00  AX  0   0  8
          [ 4] .relkprobe/try_to_wake_up REL             0000000000000000 0005a8 000020 10     10   3  8
          [ 5] tracepoint/sched/sched_switch PROGBITS        0000000000000000 000118 000318 00  AX  0   0  8
          [ 6] .reltracepoint/sched/sched_switch REL             0000000000000000 0005c8 000090 10     10   5  8
          [ 7] maps              PROGBITS        0000000000000000 000430 000050 00  WA  0   0  4
          [ 8] license           PROGBITS        0000000000000000 000480 000004 00  WA  0   0  1
          [ 9] version           PROGBITS        0000000000000000 000484 000004 00  WA  0   0  4
          [10] .symtab           SYMTAB          0000000000000000 000488 000120 18      1   4  8
        Key to Flags:
          W (write), A (alloc), X (execute), M (merge), S (strings)
          I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
          O (extra OS processing required) o (OS specific), p (processor specific)
          [root@jouet bpf]# ./offwaketime | head -3
        qemu-system-x86;entry_SYSCALL_64_fastpath;sys_ppoll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;hrtimer_wakeup;__hrtimer_run_queues;hrtimer_interrupt;local_apic_timer_interrupt;smp_apic_timer_interrupt;__irqentry_text_start;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel;start_cpu;;swapper/0 4
        firefox;entry_SYSCALL_64_fastpath;sys_poll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;pollwake;__wake_up_common;__wake_up_sync_key;pipe_write;__vfs_write;vfs_write;sys_write;entry_SYSCALL_64_fastpath;;Timer 1
        swapper/2;start_cpu;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule;__schedule;-;---;; 61
        [root@jouet bpf]#
      Signed-off-by: Joe Stringer <joe@ovn.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: netdev@vger.kernel.org
      Link: https://github.com/joestringer/linux/commit/5c40f54a52b1f437123c81e21873f4b4b1f9bd55.patch
      Link: http://lkml.kernel.org/n/tip-xr8twtx7sjh5821g8qw47yxk@git.kernel.org
      [ Use -I$(srctree)/tools/lib/ to support out of source code tree builds, as noticed by Wang Nan ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf diff: Do not overwrite valid build id · ed6c166c
      Kan Liang authored
      Fixes a perf diff regression introduced by commit
      5baecbcd ("perf symbols: we can now read separate debug-info files
      based on a build ID").
      
      Different binaries may share the same name when running perf diff on
      them. The build id is used to distinguish between them.
      However, the previous patch assumes that the same binary name implies
      the same build id, so it overwrites the build id according to the
      binary name, regardless of whether the build id is already set.
      
      Check the has_build_id in dso__load. If the build id is already set, use
      it.
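
The check described above amounts to "only look a build id up when none is set". A minimal sketch of that idea follows; `Dso`, `resolve_build_id` and `lookup_by_name` are hypothetical stand-ins for illustration and do not mirror perf's actual structures or dso__load().

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Dso:
    """Hypothetical stand-in for a perf DSO; only the fields needed here."""
    name: str
    build_id: Optional[str] = None

    @property
    def has_build_id(self) -> bool:
        return self.build_id is not None


def resolve_build_id(dso: Dso,
                     lookup_by_name: Callable[[str], Optional[str]]) -> Optional[str]:
    """Keep a build id that is already set; look one up only when missing."""
    if not dso.has_build_id:
        dso.build_id = lookup_by_name(dso.name)
    return dso.build_id


# Two different binaries that share a name but carry distinct build ids.
old = Dso("tchain_edit", build_id="1111")
new = Dso("tchain_edit", build_id="2222")

# A name-based lookup would report the same id for both of them.
lookup = lambda name: "1111"

print(resolve_build_id(old, lookup))  # 1111
print(resolve_build_id(new, lookup))  # 2222, not overwritten by the lookup
```

Without the `has_build_id` guard both objects would end up with the id found by name, which is the kind of mix-up the "Before the fix" output shows.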
      
      Before the fix:
      
        $ perf diff 1.perf.data 2.perf.data
        # Event 'cycles'
        #
        # Baseline    Delta  Shared Object     Symbol
        # ........  .......  ................  .............................
        #
          99.83%  -99.80%  tchain_edit       [.] f2
           0.12%  +99.81%  tchain_edit       [.] f3
           0.02%   -0.01%  [ixgbe]           [k] ixgbe_read_reg
      
      After the fix:
      
        $ perf diff 1.perf.data 2.perf.data
        # Event 'cycles'
        #
        # Baseline    Delta  Shared Object     Symbol
        # ........  .......  ................  .............................
        #
          99.83%   +0.10%  tchain_edit       [.] f3
           0.12%   -0.08%  tchain_edit       [.] f2
      Signed-off-by: Kan Liang <kan.liang@intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Dima Kogan <dima@secretsauce.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Fixes: 5baecbcd ("perf symbols: we can now read separate debug-info files based on a build ID")
      Link: http://lkml.kernel.org/r/1481642984-13593-1-git-send-email-kan.liang@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: Don't throw error for zero length symbols · edee44be
      Ravi Bangoria authored
      'perf report --tui' exits with an error when it finds a sample in a
      zero-length symbol (i.e. addr == sym->start == sym->end). These are
      actually valid samples. Don't exit the TUI; show the report with such symbols.
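
The condition described above can be sketched as a range check with one extra case. This is an illustrative model only; `sample_in_symbol` is a hypothetical helper, and the half-open range [start, end) is an assumption about how ordinary symbols are bounded.

```python
def sample_in_symbol(addr, start, end):
    """True if a sample at addr should be accounted to the symbol."""
    if start <= addr < end:
        return True
    # Zero-length symbol: start == end, so the half-open range is empty,
    # yet a sample exactly at that address is still valid.
    return start == end and addr == start


print(sample_in_symbol(0x1000, 0x1000, 0x1010))  # True: ordinary symbol
print(sample_in_symbol(0x1010, 0x1000, 0x1010))  # False: one past the end
print(sample_in_symbol(0x1000, 0x1000, 0x1000))  # True: zero-length symbol
```

A plain range test alone rejects the third case, which corresponds to the error the TUI used to raise.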
      Reported-and-Tested-by: Anton Blanchard <anton@samba.org>
      Link: https://lkml.org/lkml/2016/10/8/189
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Chris Riyder <chris.ryder@arm.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@kernel.org # v4.9+
      Link: http://lkml.kernel.org/r/1479804050-5028-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf bench futex: Fix lock-pi help string · 9de3ffa1
      Davidlohr Bueso authored
      Obvious copy/paste typo from the requeue program.
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Link: http://lkml.kernel.org/r/1481830584-30909-1-git-send-email-dave@stgolabs.net
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Check if MAP_32BIT is defined (again) · 2bd42f3a
      Jiri Olsa authored
      There might be systems where MAP_32BIT is not defined, like some
      RHEL7 powerpc versions.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Kyle McMartin <kyle@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Fixes: 256763b0 ("perf trace beauty mmap: Add more conditional defines")
      Link: http://lkml.kernel.org/r/1481831814-23683-1-git-send-email-jolsa@kernel.org
      [ Changed the Fixes cset to the one removing the conditional switch case for MAP_32BIT ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • samples/bpf: Make perf_event_read() static · 96c2fb69
      Arnaldo Carvalho de Melo authored
      While testing Joe's conversion of samples/bpf/ to use tools/lib/bpf/ on a
      Fedora Rawhide container with clang/llvm 3.9, I noticed this warning when
      building samples/bpf/:
      
        [root@1e797fdfbf4f linux]# make -j4 O=/tmp/build/linux/ samples/bpf/
        make[1]: Entering directory '/tmp/build/linux'
          CHK     include/config/kernel.release
          GEN     ./Makefile
          CHK     include/generated/uapi/linux/version.h
          Using /git/linux as source for kernel
        <SNIP>
          HOSTCC  samples/bpf/trace_output_user.o
        /git/linux/samples/bpf/trace_output_user.c:64:6: warning: no previous
        prototype for 'perf_event_read' [-Wmissing-prototypes]
         void perf_event_read(print_fn fn)
              ^~~~~~~~~~~~~~~
          HOSTLD  samples/bpf/trace_output
        make[1]: Leaving directory '/tmp/build/linux'
      
      Shut up the compiler by making that function static.
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Joe Stringer <joe@ovn.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/20161215152927.GC6866@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 18 Dec, 2016 1 commit
    • uprobes: Fix uprobes on MIPS, allow for a cache flush after ixol breakpoint creation · 297e765e
      Marcin Nowakowski authored
      Commit:
      
        72e6ae28 ('ARM: 8043/1: uprobes need icache flush after xol write')
      
      ... has introduced an arch-specific method to ensure all caches are
      flushed appropriately after an instruction is written to an XOL page.
      
      However, when the XOL area is created and the out-of-line breakpoint
      instruction is copied, caches are not flushed at all and stale data may
      be found in icache.
      
      Replace a simple copy_to_page() with arch_uprobe_copy_ixol() to allow
      the arch to ensure all caches are updated accordingly.
      
      This change fixes uprobes on MIPS InterAptiv (tested on Creator Ci40).
      Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Victor Kamensky <victor.kamensky@linaro.org>
      Cc: linux-mips@linux-mips.org
      Link: http://lkml.kernel.org/r/1481625657-22850-1-git-send-email-marcin.nowakowski@imgtec.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  4. 15 Dec, 2016 19 commits
    • samples/bpf: Make samples more libbpf-centric · d40fc181
      Joe Stringer authored
      Switch all of the sample code to use the function names from
      tools/lib/bpf so that they're consistent with that, and to declare their
      own log buffers. This allows the next commit to be purely devoted to
      getting rid of the duplicate library in samples/bpf.
      
      Committer notes:
      
      Testing it:
      
      On a fedora rawhide container, with clang/llvm 3.9, sharing the host
      linux kernel git tree:
      
        # make O=/tmp/build/linux/ headers_install
        # make O=/tmp/build/linux -C samples/bpf/
      
      Since I forgot to make it privileged, just tested it outside the
      container, using what it generated:
      
        # uname -a
        Linux jouet 4.9.0-rc8+ #1 SMP Mon Dec 12 11:20:49 BRT 2016 x86_64 x86_64 x86_64 GNU/Linux
        # cd /var/lib/docker/devicemapper/mnt/c43e09a53ff56c86a07baf79847f00e2cc2a17a1e2220e1adbf8cbc62734feda/rootfs/tmp/build/linux/samples/bpf/
        # ls -la offwaketime
        -rwxr-xr-x. 1 root root 24200 Dec 15 12:19 offwaketime
        # file offwaketime
        offwaketime: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=c940d3f127d5e66cdd680e42d885cb0b64f8a0e4, not stripped
        # readelf -SW offwaketime_kern.o  | grep PROGBITS
        [ 2] .text             PROGBITS        0000000000000000 000040 000000 00  AX  0   0  4
        [ 3] kprobe/try_to_wake_up PROGBITS        0000000000000000 000040 0000d8 00  AX  0   0  8
        [ 5] tracepoint/sched/sched_switch PROGBITS        0000000000000000 000118 000318 00  AX  0   0  8
        [ 7] maps              PROGBITS        0000000000000000 000430 000050 00  WA  0   0  4
        [ 8] license           PROGBITS        0000000000000000 000480 000004 00  WA  0   0  1
        [ 9] version           PROGBITS        0000000000000000 000484 000004 00  WA  0   0  4
        # ./offwaketime | head -5
        swapper/1;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule;__schedule;-;---;; 106
        CPU 0/KVM;entry_SYSCALL_64_fastpath;sys_ioctl;do_vfs_ioctl;kvm_vcpu_ioctl;kvm_arch_vcpu_ioctl_run;kvm_vcpu_block;schedule;__schedule;-;try_to_wake_up;swake_up_locked;swake_up;apic_timer_expired;apic_timer_fn;__hrtimer_run_queues;hrtimer_interrupt;local_apic_timer_interrupt;smp_apic_timer_interrupt;__irqentry_text_start;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary;;swapper/3 2
        Compositor;entry_SYSCALL_64_fastpath;sys_futex;do_futex;futex_wait;futex_wait_queue_me;schedule;__schedule;-;try_to_wake_up;futex_requeue;do_futex;sys_futex;entry_SYSCALL_64_fastpath;;SoftwareVsyncTh 5
        firefox;entry_SYSCALL_64_fastpath;sys_poll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;pollwake;__wake_up_common;__wake_up_sync_key;pipe_write;__vfs_write;vfs_write;sys_write;entry_SYSCALL_64_fastpath;;Timer 13
        JS Helper;entry_SYSCALL_64_fastpath;sys_futex;do_futex;futex_wait;futex_wait_queue_me;schedule;__schedule;-;try_to_wake_up;do_futex;sys_futex;entry_SYSCALL_64_fastpath;;firefox 2
        #
      Signed-off-by: Joe Stringer <joe@ovn.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: netdev@vger.kernel.org
      Link: http://lkml.kernel.org/r/20161214224342.12858-2-joe@ovn.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • tools lib bpf: Add flags to bpf_create_map() · a5580c7f
      Joe Stringer authored
      Commit 6c905981 ("bpf: pre-allocate hash map elements") introduces
      map_flags to bpf_attr for BPF_MAP_CREATE command. Expose this new
      parameter in libbpf.
      
      By exposing it, users can access flags such as whether or not to
      preallocate the map.
      Signed-off-by: Joe Stringer <joe@ovn.org>
      Acked-by: Wang Nan <wangnan0@huawei.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Link: http://lkml.kernel.org/r/20161209024620.31660-4-joe@ovn.org
      [ Added clarifying comment made by Wang Nan ]
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      a5580c7f
    • Joe Stringer's avatar
      tools lib bpf: use __u32 from linux/types.h · 83d994d0
      Joe Stringer authored
      Fixes the following issue when building without access to 'u32' type:
      
      ./tools/lib/bpf/bpf.h:27:23: error: unknown type name ‘u32’
      Signed-off-by: Joe Stringer <joe@ovn.org>
      Acked-by: Wang Nan <wangnan0@huawei.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Link: http://lkml.kernel.org/r/20161209024620.31660-3-joe@ovn.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      83d994d0
    • Joe Stringer's avatar
      tools lib bpf: Sync {tools,}/include/uapi/linux/bpf.h · 0cb34dc2
      Joe Stringer authored
      The tools version of this header is out of date; update it to the latest
      version from the kernel headers.
      Signed-off-by: Joe Stringer <joe@ovn.org>
      Acked-by: Wang Nan <wangnan0@huawei.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Link: http://lkml.kernel.org/r/20161209024620.31660-2-joe@ovn.org
      [ Sync it harder, after merging with what was in net-next via perf/urgent via torvalds/master to get BPF_PROG_(AT|DE)TACH, etc ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      0cb34dc2
    • Ravi Bangoria's avatar
      perf annotate: Fix jump target outside of function address range · e216874c
      Ravi Bangoria authored
      If a jump target lies outside the function's address range, perf does not
      handle it correctly. In particular, when the target address is lower than
      the function start address, the target offset is negative. But the target
      address is declared unsigned, so the negative value wraps around to its
      two's complement representation. See the example below, where the target
      of the 'jmpq' instruction at 34cf8 is 34ac0, which is lower than the
      function start address (34cf0).
      
              34ac0 - 34cf0 = -0x230 = 0xfffffffffffffdd0
      
      Objdump output:
      
        0000000000034cf0 <__sigaction>:
        __GI___sigaction():
          34cf0: lea    -0x20(%rdi),%eax
          34cf3: cmp    $0x1,%eax
          34cf6: jbe    34d00 <__sigaction+0x10>
          34cf8: jmpq   34ac0 <__GI___libc_sigaction>
          34cfd: nopl   (%rax)
          34d00: mov    0x386161(%rip),%rax        # 3bae68 <_DYNAMIC+0x2e8>
          34d07: movl   $0x16,%fs:(%rax)
          34d0e: mov    $0xffffffff,%eax
          34d13: retq
      
      perf annotate before applying patch:
      
        __GI___sigaction  /usr/lib64/libc-2.22.so
                 lea    -0x20(%rdi),%eax
                 cmp    $0x1,%eax
              v  jbe    10
              v  jmpq   fffffffffffffdd0
                 nop
          10:    mov    _DYNAMIC+0x2e8,%rax
                 movl   $0x16,%fs:(%rax)
                 mov    $0xffffffff,%eax
                 retq
      
      perf annotate after applying patch:
      
        __GI___sigaction  /usr/lib64/libc-2.22.so
                 lea    -0x20(%rdi),%eax
                 cmp    $0x1,%eax
              v  jbe    10
              ^  jmpq   34ac0 <__GI___libc_sigaction>
                 nop
          10:    mov    _DYNAMIC+0x2e8,%rax
                 movl   $0x16,%fs:(%rax)
                 mov    $0xffffffff,%eax
                 retq
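
The wraparound described above can be reproduced outside perf. The following is a minimal Python sketch, not perf's code: the helper names are hypothetical, the function end address 0x34d14 assumes the one-byte retq at 34d13 is the last instruction, and 64-bit unsigned arithmetic is emulated with a mask.

```python
MASK64 = (1 << 64) - 1  # emulate a 64-bit unsigned integer


def unsigned_offset(target, start):
    """Offset of target from the function start, stored as an unsigned
    64-bit value. It wraps around when target < start."""
    return (target - start) & MASK64


def signed_offset(target, start):
    """The same offset reinterpreted as a signed 64-bit value."""
    off = (target - start) & MASK64
    return off - (1 << 64) if off >> 63 else off


def target_in_function(target, start, end):
    """True when the jump target lies inside [start, end)."""
    return start <= target < end


# Addresses from the objdump listing; FUNC_END is an assumption (see lead-in).
FUNC_START = 0x34cf0
FUNC_END = 0x34d14

print(hex(unsigned_offset(0x34ac0, FUNC_START)))
print(hex(signed_offset(0x34ac0, FUNC_START)))
print(target_in_function(0x34ac0, FUNC_START, FUNC_END))
```

Testing whether the target falls inside the function before computing an offset is one way to choose between printing an offset and printing the raw address with its symbol, as in the "after" output. How perf itself makes that decision is not shown in this message.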
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Chris Ryder <chris.ryder@arm.com>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Taeung Song <treeze.taeung@gmail.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/1480953407-7605-3-git-send-email-ravi.bangoria@linux.vnet.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      e216874c
    • Ravi Bangoria's avatar
      perf annotate: Support jump instruction with target as second operand · 3ee2eb6d
      Ravi Bangoria authored
      Architectures like PowerPC have jump instructions that include a target
      address as a second operand. For example, 'bne cr7,0xc0000000000f6154'.
      Add support for such instructions in perf annotate.
      
      objdump output:
        c0000000000f6140:   ld     r9,1032(r31)
        c0000000000f6144:   cmpdi  cr7,r9,0
        c0000000000f6148:   bne    cr7,0xc0000000000f6154
        c0000000000f614c:   ld     r9,2312(r30)
        c0000000000f6150:   std    r9,1032(r31)
        c0000000000f6154:   ld     r9,88(r31)
      
      Corresponding perf annotate output:
      
      Before patch:
               ld     r9,1032(r31)
               cmpdi  cr7,r9,0
            v  bne    3ffffffffff09f2c
               ld     r9,2312(r30)
               std    r9,1032(r31)
        74:    ld     r9,88(r31)
      
      After patch:
               ld     r9,1032(r31)
               cmpdi  cr7,r9,0
            v  bne    74
               ld     r9,2312(r30)
               std    r9,1032(r31)
        74:    ld     r9,88(r31)
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Chris Ryder <chris.ryder@arm.com>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Taeung Song <treeze.taeung@gmail.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/1480953407-7605-2-git-send-email-ravi.bangoria@linux.vnet.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      3ee2eb6d
    • Jiri Olsa's avatar
      perf record: Force ignore_missing_thread for uid option · 23dc4f15
      Jiri Olsa authored
      Enable perf_evsel::ignore_missing_thread for the -u option, so that perf
      does not fail completely if any of the user's processes dies between its
      enumeration and the time we open the event.
      
      Committer notes:
      
      While doing a 'make -j4 allmodconfig' we sometimes get into the race:
      
      Before:
      
        # perf record -u acme
        Error:
        The sys_perf_event_open() syscall returned with 3 (No such process) for event (cycles:ppp).
        /bin/dmesg may provide additional information.
        No CONFIG_PERF_EVENTS=y kernel support configured?
        #
      
      After:
      
        [root@jouet ~]# perf record -u acme
        WARNING: Ignored open failure for pid 9888
        WARNING: Ignored open failure for pid 18059
        [root@jouet ~]#
      
      Which is an improvement, with the races not preventing the remaining threads
      for the specified user from being monitored, but the message probably needs
      further clarification.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1481538943-21874-6-git-send-email-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      23dc4f15
    • Jiri Olsa's avatar
      perf evsel: Allow to ignore missing pid · a359c17a
      Jiri Olsa authored
      Add the perf_evsel::ignore_missing_thread bool.
      
      When set to true, it allows perf to ignore a missing-pid error from the
      perf event syscall.
      
      We remove the missing thread id from the thread_map, so the rest of the
      processing, like ioctl and mmap, won't be disturbed by a -1 fd.
      
      The reason for supporting this is to ease monitoring of a group of pids
      that 'disappear' before perf opens their event. This currently leads
      perf to report an error and exit, and makes perf record's -u option
      unusable under certain setups.
      
      With this change we allow this race and ignore such a failure with the
      following warning:
      
        WARNING: Ignored open failure for pid 8605
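
The behaviour described above can be illustrated with a small model. This is a simplified Python sketch under stated assumptions, not perf's implementation: `open_events` and the `open_event` callback are hypothetical names, a plain list stands in for the thread_map, and the only condition modelled is "the open failed with ESRCH".

```python
import errno


def open_events(pids, open_event, ignore_missing_thread=True):
    """Open one event per pid. When ignore_missing_thread is set, a pid whose
    open fails with ESRCH (No such process) is dropped from pids and a
    warning is recorded. Any other failure is propagated to the caller."""
    fds = {}
    warnings = []
    for pid in list(pids):  # iterate over a copy, pids is modified in place
        try:
            fds[pid] = open_event(pid)
        except OSError as err:
            if ignore_missing_thread and err.errno == errno.ESRCH:
                pids.remove(pid)  # later steps never see a pid without an fd
                warnings.append(f"WARNING: Ignored open failure for pid {pid}")
                continue
            raise
    return fds, warnings
```

Removing the vanished pid from the list, rather than storing a -1 placeholder, mirrors the reasoning in the message: later steps only iterate over entries that have a valid descriptor.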
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20161213074622.GA3084@krava
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      a359c17a
    • Jiri Olsa's avatar
      perf thread_map: Add thread_map__remove function · 38af91f0
      Jiri Olsa authored
      Add the thread_map__remove function to remove a thread from a thread map.
      
      Also add an automated test.
      
      Committer notes:
      
      Testing it:
      
        # perf test "Remove thread map"
        39: Remove thread map                          : Ok
        # perf test -v "Remove thread map"
        39: Remove thread map                          :
        --- start ---
        test child forked, pid 4483
        2 threads: 4482, 4483
        1 thread: 4483
        0 thread:
        test child finished with 0
        ---- end ----
        Remove thread map: Ok
        #
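
The test output above (two threads, then one, then none) corresponds to removing the entry at index 0 twice. A minimal Python sketch of that operation follows. It assumes a plain list in place of the thread map, and the function name and the 0 / -1 return convention are hypothetical rather than taken from the perf sources.

```python
def thread_map_remove(pids, idx):
    """Remove the entry at position idx from pids in place.
    Returns 0 on success, -1 when idx is out of range (assumed convention)."""
    if idx < 0 or idx >= len(pids):
        return -1
    del pids[idx]  # later entries shift down by one position
    return 0


def describe(pids):
    """Format a map in the style of the test output shown above."""
    noun = "threads" if len(pids) > 1 else "thread"
    return f"{len(pids)} {noun}: " + ", ".join(str(p) for p in pids)
```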
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1481538943-21874-4-git-send-email-jolsa@kernel.org
      [ Added stdlib.h, to get the free() declaration ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      38af91f0
    • Jiri Olsa's avatar
      perf evsel: Use variable instead of repeating lengthy FD macro · 83c2e4f3
      Jiri Olsa authored
      It's more readable and will ease the following patches.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1481538943-21874-3-git-send-email-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      83c2e4f3
    • Jiri Olsa's avatar
      perf mem: Fix --all-user/--all-kernel options · 631ac41b
      Jiri Olsa authored
      Removing extra '--' prefix.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Fixes: ad16511b ("perf mem: Add -U/-K (--all-user/--all-kernel) options")
      Link: http://lkml.kernel.org/r/1481538943-21874-2-git-send-email-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      631ac41b
    • Arnaldo Carvalho de Melo's avatar
      perf tools: Remove some needless __maybe_unused · 7e6a7998
      Arnaldo Carvalho de Melo authored
      I.e. those parameters/functions _are_ used, so ditch that misleading attribute.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-13cqtjh0yojg5gzvpq1zzpl0@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      7e6a7998
    • Namhyung Kim's avatar
      perf sched timehist: Show callchains for idle stat · ba957ebb
      Namhyung Kim authored
      When --idle-hist option is used with --summary, it now shows idle stats
      with callchains like below:
      
        Idle stats by callchain:
        CPU  0:   902.195 msec
        Idle time (msec)    Count Callchains
        ----------------  ------- --------------------------------------------------
                 370.589       69 futex_wait_queue_me <- futex_wait <- do_futex <- sys_futex <- entry_SYSCALL_64_fastpath
                 178.799       17 worker_thread <- kthread <- ret_from_fork
                 128.352       17 schedule_timeout <- rcu_gp_kthread <- kthread <- ret_from_fork
                 125.111       19 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_select <- core_sys_select
                  71.599       50 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_sys_poll <- sys_poll
                  23.146        1 rcu_gp_kthread <- kthread <- ret_from_fork
                   4.510        1 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- ep_poll <- sys_epoll_wait <- do_syscall_64
                   0.085        1 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_sys_poll <- do_restart_poll
        ...
      
      Committer notes:
      
      Extra testing:
      
        # uname -a
        Linux jouet 4.8.8-300.fc25.x86_64 #1 SMP Tue Nov 15 18:10:06 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
      
      1) Run 'perf sched record -g'
      
      2) Run 'perf sched timehist --idle --summary'
      
      <SNIP>
        Idle stats by callchain:
        CPU  0: 13456.840 msec
        Idle time (msec) Count Callchains
        ---------------- ----- --------------------------------------------------
                5386.637  3283 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_sys_poll <- sys_poll
                2750.238  2299 futex_wait_queue_me <- futex_wait <- do_futex <- sys_futex <- do_syscall_64
                1275.672  1287 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- ep_poll <- sys_epoll_wait <- entry_SYSCALL_64_fastpath
                 936.322   452 worker_thread <- kthread <- ret_from_fork
                 741.311   385 rcu_nocb_kthread <- kthread <- ret_from_fork
                 729.385   248 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_sys_poll <- sys_ppoll
                 365.386   229 irq_thread <- kthread <- ret_from_fork
                 338.934   265 futex_wait_queue_me <- futex_wait <- do_futex <- sys_futex <- entry_SYSCALL_64_fastpath
                 219.488   201 schedule_timeout <- rcu_gp_kthread <- kthread <- ret_from_fork
                 186.839   410 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- ep_poll <- sys_epoll_wait <- do_syscall_64
                 142.541    59 kvm_vcpu_block <- kvm_arch_vcpu_ioctl_run <- kvm_vcpu_ioctl <- do_vfs_ioctl <- sys_ioctl
                  83.887    92 smpboot_thread_fn <- kthread <- ret_from_fork
                  62.722    96 do_exit <- do_group_exit <- 0x2a5594 <- entry_SYSCALL_64_fastpath
                  47.894    83 pipe_wait <- pipe_read <- __vfs_read <- vfs_read <- sys_read
                  46.554    61 rcu_gp_kthread <- kthread <- ret_from_fork
                  34.337    21 schedule_timeout <- intel_fbc_work_fn <- process_one_work <- worker_thread <- kthread
                  29.521    14 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_select <- core_sys_select
                  20.274    10 schedule_timeout <- io_schedule_timeout <- bit_wait_io <- __wait_on_bit <- out_of_line_wait_on_bit
                  15.085    55 schedule_timeout <- unix_stream_read_generic <- unix_stream_recvmsg <- sock_recvmsg <- SYSC_recvfrom
      <SNIP>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20161208144755.16673-7-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ba957ebb
    • Namhyung Kim's avatar
      perf sched timehist: Add -I/--idle-hist option · 07235f84
      Namhyung Kim authored
      The --idle-hist option analyzes the system idle state, showing which
      process makes a cpu go idle.  If this option is specified, non-idle
      events are skipped and processes switching to/from idle are shown.
      
      This option is mostly useful when used with the --summary(-only) option.
      In the idle-time summary view, idle time is accounted to the previous
      thread, i.e. the one that ran just before the idle task.
      
      The example output looks like the following:
      
        Idle-time summary
                        comm parent sched-out idle-time min-idle avg-idle max-idle stddev migrations
                                      (count)    (msec)   (msec)   (msec)   (msec)      %
        --------------------------------------------------------------------------------------------
              rcu_preempt[7]      2        95   550.872    0.011    5.798   23.146   7.63      0
             migration/1[16]      2         1    15.558   15.558   15.558   15.558   0.00      0
              khugepaged[39]      2         1     3.062    3.062    3.062    3.062   0.00      0
           kworker/0:1H[124]      2         2     4.728    0.611    2.364    4.116  74.12      0
        systemd-journal[167]      1         1     4.510    4.510    4.510    4.510   0.00      0
          kworker/u16:3[558]      2        13    74.737    0.080    5.749   12.960  21.96      0
         irq/34-iwlwifi[628]      2        21   118.403    0.032    5.638   23.990  24.00      0
          kworker/u17:0[673]      2         1     3.523    3.523    3.523    3.523   0.00      0
            dbus-daemon[722]      1         1     6.743    6.743    6.743    6.743   0.00      0
                ifplugd[741]      1         1    58.826   58.826   58.826   58.826   0.00      0
        wpa_supplicant[1490]      1         1    13.302   13.302   13.302   13.302   0.00      0
           wpa_actiond[1492]      1         2     4.064    0.168    2.032    3.896  91.72      0
               dockerd[1500]      1         1     0.055    0.055    0.055    0.055   0.00      0
        ...
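
The accounting rule described above (idle time is charged to the thread that ran just before the idle task) can be sketched as follows. This is an illustrative Python model, not the perf implementation: the tuple layout, the function name and the millisecond timestamps are assumptions, and pid 0 stands for the idle task.

```python
from collections import defaultdict


def idle_time_by_thread(switches):
    """switches: iterable of (timestamp_msec, cpu, prev_pid, next_pid) tuples
    in time order, modelled on sched:sched_switch events.
    Returns {pid: total idle msec charged to that pid}."""
    entered = {}  # cpu -> (time idle started, pid that ran just before idle)
    idle = defaultdict(float)
    for ts, cpu, prev_pid, next_pid in switches:
        if next_pid == 0 and prev_pid != 0:
            # Switching to the idle task: remember who put the cpu to sleep.
            entered[cpu] = (ts, prev_pid)
        elif prev_pid == 0 and cpu in entered:
            # Leaving idle: charge the elapsed time to the remembered thread.
            start, pid = entered.pop(cpu)
            idle[pid] += ts - start
    return dict(idle)
```

Keeping the bookkeeping per cpu means concurrent idle periods on different cpus do not interfere with one another.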
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20161208144755.16673-6-namhyung@kernel.org
      Link: http://lkml.kernel.org/r/20161213080632.19099-2-namhyung@kernel.org
      [ Merged fix sent by Namhyung, as posted in the second Link: tag ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      07235f84
    • Namhyung Kim's avatar
      perf sched timehist: Skip non-idle events when necessary · a4b2b6f5
      Namhyung Kim authored
      Sometimes the analysis focuses only on idle-related events, as in the
      upcoming idle-hist feature.  In this case we don't want to see other
      events, to reduce noise.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: David Ahern <dsahern@gmail.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20161208144755.16673-5-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      a4b2b6f5
    • Namhyung Kim's avatar
      perf sched timehist: Save callchain when entering idle · 699b5b92
      Namhyung Kim authored
      In order to investigate the idleness reason, it is necessary to keep the
      callchains when entering idle.  This can be identified by the
      sched:sched_switch event having the next_pid field as 0.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: David Ahern <dsahern@gmail.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20161208144755.16673-4-namhyung@kernel.org
      Link: http://lkml.kernel.org/r/20161213080632.19099-1-namhyung@kernel.org
      [ Merged fix from Namhyung, see second Link: tag ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      699b5b92
    • Namhyung Kim's avatar
      perf sched timehist: Introduce struct idle_time_data · 3bc2fa9c
      Namhyung Kim authored
      The struct idle_time_data keeps idle stats together with the callchains
      entering the idle task.  The normal thread_runtime calculation is done
      transparently since it extends the struct thread_runtime.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: David Ahern <dsahern@gmail.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20161208144755.16673-3-namhyung@kernel.org
      [ Align struct field names ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      3bc2fa9c
    • Namhyung Kim's avatar
      perf sched timehist: Split is_idle_sample() · 96039c7c
      Namhyung Kim authored
      The is_idle_sample() function actually does more than determine
      whether a sample comes from the idle task.  Split the callchain part into
      save_task_callchain() to make it clearer.
      
      Also, checking prev_pid from the trace data is preferable to just
      checking sample->pid, since it's possible, although rare, to have an
      invalid 0 pid/tid when scheduling an exiting task.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: David Ahern <dsahern@gmail.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20161208144755.16673-2-namhyung@kernel.org
      [ Remove some needless () in some return statements ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      96039c7c
    • Jiri Olsa's avatar
      perf tools: Move headers check into bash script · aeafd623
      Jiri Olsa authored
      To make it nicer and more easily maintainable.
      
      Also move the check into the fixdep sub make, so its output is not
      scattered around the build output.
      
      Remove the extra $$ from the mman*.h checks.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1481030331-31944-5-git-send-email-jolsa@kernel.org
      [ Use /bin/sh, and 'function check() {' -> 'check () {' to make it work with busybox, in Alpine Linux, for instance ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      aeafd623
  5. 13 Dec, 2016 7 commits
    • Linus Torvalds's avatar
      Merge tag 'docs-4.10' of git://git.lwn.net/linux · e7aa8c2e
      Linus Torvalds authored
      Pull documentation update from Jonathan Corbet:
       "These are the documentation changes for 4.10.
      
        It's another busy cycle for the docs tree, as the sphinx conversion
        continues. Highlights include:
      
         - Further work on PDF output, which remains a bit of a pain but
           should be more solid now.
      
         - Five more DocBook template files converted to Sphinx. Only 27 to
           go... Lots of plain-text files have also been converted and
           integrated.
      
         - Images in binary formats have been replaced with more
           source-friendly versions.
      
         - Various bits of organizational work, including the renaming of
           various files discussed at the kernel summit.
      
         - New documentation for the device_link mechanism.
      
        ... and, of course, lots of typo fixes and small updates"
      
      * tag 'docs-4.10' of git://git.lwn.net/linux: (193 commits)
        dma-buf: Extract dma-buf.rst
        Update Documentation/00-INDEX
        docs: 00-INDEX: document directories/files with no docs
        docs: 00-INDEX: remove non-existing entries
        docs: 00-INDEX: add missing entries for documentation files/dirs
        docs: 00-INDEX: consolidate process/ and admin-guide/ description
        scripts: add a script to check if Documentation/00-INDEX is sane
        Docs: change sh -> awk in REPORTING-BUGS
        Documentation/core-api/device_link: Add initial documentation
        core-api: remove an unexpected unident
        ppc/idle: Add documentation for powersave=off
        Doc: Correct typo, "Introdution" => "Introduction"
        Documentation/atomic_ops.txt: convert to ReST markup
        Documentation/local_ops.txt: convert to ReST markup
        Documentation/assoc_array.txt: convert to ReST markup
        docs-rst: parse-headers.pl: cleanup the documentation
        docs-rst: fix media cleandocs target
        docs-rst: media/Makefile: reorganize the rules
        docs-rst: media: build SVG from graphviz files
        docs-rst: replace bayer.png by a SVG image
        ...
      e7aa8c2e
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · e34bac72
      Linus Torvalds authored
      Merge updates from Andrew Morton:
      
       - various misc bits
      
       - most of MM (quite a lot of MM material is awaiting the merge of
         linux-next dependencies)
      
       - kasan
      
       - printk updates
      
       - procfs updates
      
       - MAINTAINERS
      
       - /lib updates
      
       - checkpatch updates
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (123 commits)
        init: reduce rootwait polling interval time to 5ms
        binfmt_elf: use vmalloc() for allocation of vma_filesz
        checkpatch: don't emit unified-diff error for rename-only patches
        checkpatch: don't check c99 types like uint8_t under tools
        checkpatch: avoid multiple line dereferences
        checkpatch: don't check .pl files, improve absolute path commit log test
        scripts/checkpatch.pl: fix spelling
        checkpatch: don't try to get maintained status when --no-tree is given
        lib/ida: document locking requirements a bit better
        lib/rbtree.c: fix typo in comment of ____rb_erase_color
        lib/Kconfig.debug: make CONFIG_STRICT_DEVMEM depend on CONFIG_DEVMEM
        MAINTAINERS: add drm and drm/i915 irc channels
        MAINTAINERS: add "C:" for URI for chat where developers hang out
        MAINTAINERS: add drm and drm/i915 bug filing info
        MAINTAINERS: add "B:" for URI where to file bugs
        get_maintainer: look for arbitrary letter prefixes in sections
        printk: add Kconfig option to set default console loglevel
        printk/sound: handle more message headers
        printk/btrfs: handle more message headers
        printk/kdb: handle more message headers
        ...
      e34bac72
    • Joe Perches's avatar
      treewide: Make remaining source files non-executable · fe6bce8d
      Joe Perches authored
      .c and .h source files should not be executable; change
      the permissions to 0644.
      
      [ This would normally go through Andrew Morton, but his ancient
        patch-based toolchain doesn't do permission changes ]
      Signed-off-by: Joe Perches <joe@perches.com>
      Acked-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fe6bce8d
    • Linus Torvalds's avatar
      Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f082f02c
      Linus Torvalds authored
      Pull irq updates from Thomas Gleixner:
       "The irq department provides:
      
         - a major update to the auto affinity management code, which is used
           by multi-queue devices
      
         - move of the microblaze irq chip driver into the common driver code
           so it can be shared between microblaze, powerpc and MIPS
      
         - a series of updates to the ARM GICV3 interrupt controller
      
         - the usual pile of fixes and small improvements all over the place"
      
      * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
        powerpc/virtex: Use generic xilinx irqchip driver
        irqchip/xilinx: Try to fall back if xlnx,kind-of-intr not provided
        irqchip/xilinx: Add support for parent intc
        irqchip/xilinx: Rename get_irq to xintc_get_irq
        irqchip/xilinx: Restructure and use jump label api
        irqchip/xilinx: Clean up print messages
        microblaze/irqchip: Move intc driver to irqchip
        ARM: virt: Select ARM_GIC_V3_ITS
        ARM: gic-v3-its: Add 32bit support to GICv3 ITS
        irqchip/gic-v3-its: Specialise readq and writeq accesses
        irqchip/gic-v3-its: Specialise flush_dcache operation
        irqchip/gic-v3-its: Narrow down Entry Size when used as a divider
        irqchip/gic-v3-its: Change unsigned types for AArch32 compatibility
        irqchip/gic-v3: Use nops macro for Cavium ThunderX erratum 23154
        irqchip/gic-v3: Convert arm64 GIC accessors to {read,write}_sysreg_s
        genirq/msi: Drop artificial PCI dependency
        irqchip/bcm7038-l1: Implement irq_cpu_offline() callback
        genirq/affinity: Use default affinity mask for reserved vectors
        genirq/affinity: Take reserved vectors into account when spreading irqs
        PCI: Remove the irq_affinity mask from struct pci_dev
        ...
      f082f02c
    • Linus Torvalds's avatar
      Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9465d9cc
      Linus Torvalds authored
      Pull timer updates from Thomas Gleixner:
       "The time/timekeeping/timer folks deliver with this update:
      
         - Fix a reintroduced signed/unsigned issue and cleanup the whole
           signed/unsigned mess in the timekeeping core so this won't happen
           accidentally again.
      
         - Add a new trace clock based on boot time
      
         - Prevent injection of random sleep times when PM tracing abuses the
           RTC for storage
      
         - Make posix timers configurable for real tiny systems
      
         - Add tracepoints for the alarm timer subsystem so timer based
           suspend wakeups can be instrumented
      
         - The usual pile of fixes and updates to core and drivers"
      
      * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
        timekeeping: Use mul_u64_u32_shr() instead of open coding it
        timekeeping: Get rid of pointless typecasts
        timekeeping: Make the conversion call chain consistently unsigned
        timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion
        alarmtimer: Add tracepoints for alarm timers
        trace: Update documentation for mono, mono_raw and boot clock
        trace: Add an option for boot clock as trace clock
        timekeeping: Add a fast and NMI safe boot clock
        timekeeping/clocksource_cyc2ns: Document intended range limitation
        timekeeping: Ignore the bogus sleep time if pm_trace is enabled
        selftests/timers: Fix spelling mistake "Asyncrhonous" -> "Asynchronous"
        clocksource/drivers/bcm2835_timer: Unmap region obtained by of_iomap
        clocksource/drivers/arm_arch_timer: Map frame with of_io_request_and_map()
        arm64: dts: rockchip: Arch counter doesn't tick in system suspend
        clocksource/drivers/arm_arch_timer: Don't assume clock runs in suspend
        posix-timers: Make them configurable
        posix_cpu_timers: Move the add_device_randomness() call to a proper place
        timer: Move sys_alarm from timer.c to itimer.c
        ptp_clock: Allow for it to be optional
        Kconfig: Regenerate *.c_shipped files after previous changes
        ...
      9465d9cc
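
The first commit in the list above replaces an open-coded multiply-and-shift with `mul_u64_u32_shr()`. As a rough userspace sketch of why such a helper matters (the `_sketch` and `_naive` names are hypothetical, and this is not the kernel's source), the version below keeps the full intermediate product by splitting the 64-bit operand into two 32-bit halves. It assumes `shift` is at most 32.

```c
#include <stdint.h>

/*
 * Compute (a * mul) >> shift without losing the upper bits of the
 * intermediate product. Assumes 0 <= shift <= 32. The final result is
 * still truncated to 64 bits.
 */
static uint64_t mul_u64_u32_shr_sketch(uint64_t a, uint32_t mul,
				       unsigned int shift)
{
	uint32_t ah = (uint32_t)(a >> 32);	/* upper half of a */
	uint32_t al = (uint32_t)a;		/* lower half of a */
	uint64_t ret;

	/* Low half: the 32x32 product always fits in 64 bits. */
	ret = ((uint64_t)al * mul) >> shift;
	/* High half: carries a weight of 2^32, so shift left by the rest. */
	if (ah)
		ret += ((uint64_t)ah * mul) << (32 - shift);

	return ret;
}

/* Naive variant for contrast: the product wraps once it exceeds 64 bits. */
static uint64_t mul_shr_naive(uint64_t a, uint32_t mul, unsigned int shift)
{
	return (a * mul) >> shift;
}
```

With `a = 2^40`, `mul = 2^30` and `shift = 30`, the product is `2^70`: the naive variant wraps, while the split variant still returns `2^40`.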
    • Linus Torvalds's avatar
      Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e71c3978
      Linus Torvalds authored
      Pull smp hotplug updates from Thomas Gleixner:
       "This is the final round of converting the notifier mess to the state
        machine. The removal of the notifiers and the related infrastructure
        will happen around rc1, as there are conversions outstanding in other
        trees.
      
        The whole exercise removed about 2000 lines of code in total, and in
        the course of the conversion several dozen bugs got fixed. The new
        mechanism allows testing almost every hotplug step standalone, so
        usage sites can exercise all transitions extensively.
      
        There is more room for improvement, like integrating all the
        pointlessly different architecture mechanisms of synchronizing,
        setting cpus online etc into the core code"
      
      * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
        tracing/rb: Init the CPU mask on allocation
        soc/fsl/qbman: Convert to hotplug state machine
        soc/fsl/qbman: Convert to hotplug state machine
        zram: Convert to hotplug state machine
        KVM/PPC/Book3S HV: Convert to hotplug state machine
        arm64/cpuinfo: Convert to hotplug state machine
        arm64/cpuinfo: Make hotplug notifier symmetric
        mm/compaction: Convert to hotplug state machine
        iommu/vt-d: Convert to hotplug state machine
        mm/zswap: Convert pool to hotplug state machine
        mm/zswap: Convert dst-mem to hotplug state machine
        mm/zsmalloc: Convert to hotplug state machine
        mm/vmstat: Convert to hotplug state machine
        mm/vmstat: Avoid on each online CPU loops
        mm/vmstat: Drop get_online_cpus() from init_cpu_node_state/vmstat_cpu_dead()
        tracing/rb: Convert to hotplug state machine
        oprofile/nmi timer: Convert to hotplug state machine
        net/iucv: Use explicit clean up labels in iucv_init()
        x86/pci/amd-bus: Convert to hotplug state machine
        x86/oprofile/nmi: Convert to hotplug state machine
        ...
      e71c3978
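
The pull message above says the state machine lets almost every hotplug step be tested standalone. The toy userspace model below illustrates that idea only: each step has a startup and a teardown callback, and a CPU moves up or down one step at a time. Every name in it (`hp_step`, `hp_set_state`, the `hp_demo_*` callbacks) is hypothetical and is not the kernel's API or data layout.

```c
#include <stddef.h>

/* Hypothetical toy model of one hotplug step; not a kernel structure. */
struct hp_step {
	const char *name;
	int (*startup)(unsigned int cpu);	/* run when moving up, may fail */
	int (*teardown)(unsigned int cpu);	/* run when moving down */
};

/*
 * Move one CPU from state *cur to target, one step at a time.
 * State n means steps[0..n-1] have been started. If a startup callback
 * fails, the steps started by this call are torn down in reverse order
 * and the error is returned.
 */
static int hp_set_state(const struct hp_step *steps, size_t nsteps,
			unsigned int cpu, size_t *cur, size_t target)
{
	size_t start = *cur;

	if (target > nsteps)
		return -1;

	while (*cur < target) {
		const struct hp_step *s = &steps[*cur];
		int ret = s->startup ? s->startup(cpu) : 0;

		if (ret) {
			/* Roll back to where this call began. */
			while (*cur > start) {
				s = &steps[--*cur];
				if (s->teardown)
					s->teardown(cpu);
			}
			return ret;
		}
		++*cur;
	}

	while (*cur > target) {
		const struct hp_step *s = &steps[--*cur];

		if (s->teardown)
			s->teardown(cpu);
	}
	return 0;
}

/* Demo callbacks: count how many steps are currently started. */
static int hp_demo_active;
static int hp_demo_up(unsigned int cpu) { (void)cpu; hp_demo_active++; return 0; }
static int hp_demo_down(unsigned int cpu) { (void)cpu; hp_demo_active--; return 0; }
static int hp_demo_fail(unsigned int cpu) { (void)cpu; return -1; }
```

Because every step is symmetric, a test in this model can park a CPU at any intermediate state, take it one step up or down, and check that a failing step leaves earlier steps as they were.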
    • Jungseung Lee's avatar
      init: reduce rootwait polling interval time to 5ms · 39a0e975
      Jungseung Lee authored
      For several devices, the rootwait time is sensitive because it directly
      affects booting time.  The polling interval of rootwait is currently
      100ms.  To save unnecessary waiting time, reduce the polling interval to
      5 ms.
      
      [akpm@linux-foundation.org: remove used-once #define]
      Link: http://lkml.kernel.org/r/20161207060743.1728-1-js07.lee@samsung.com
      Signed-off-by: Jungseung Lee <js07.lee@samsung.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      39a0e975
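
To illustrate why the rootwait polling interval affects boot time, here is a small idealised model of a loop of the form "check, then sleep for the interval". It is not the kernel code; the function name `poll_detect_ms` is hypothetical, and the model ignores the cost of the check itself and any sleep overshoot.

```c
/*
 * Time (in ms) at which a polling loop that checks at t = 0, interval,
 * 2 * interval, ... first sees a device that becomes ready at ready_ms.
 * interval_ms must be non-zero.
 */
static unsigned int poll_detect_ms(unsigned int ready_ms,
				   unsigned int interval_ms)
{
	/* Number of sleeps needed: ready_ms / interval_ms, rounded up. */
	unsigned int sleeps = (ready_ms + interval_ms - 1) / interval_ms;

	return sleeps * interval_ms;
}
```

In this model a root device that appears 101 ms after polling starts is noticed at 200 ms with a 100 ms interval, but at 105 ms with a 5 ms interval. The worst-case extra wait is just under one interval.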