Commit bca13ce4 authored by Linus Torvalds

Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf updates from Ingo Molnar:
 "This update is pretty big and almost exclusively includes tooling
  changes, because v4.9's LTS status forced to completion most of the
  pending kernel side hardware enablement work and because we tried to
  freeze core perf work a bit to give a time window for the fuzzing
  efforts.

  The diff is large mostly due to the JSON hardware event tables added
  for Intel and Power8 CPUs. This was a popular feature request from
  people working close to hardware and from the HPC community.

  Tree size is big because this added the CPU event tables for over a
  decade of Intel CPUs. Future changes for an already supported CPU
  vendor should be much smaller, as events for new models are added.
  The new events are listed in 'perf list', for the CPU model the tool
  is running on. If you find an interesting event it can be used as-is:

      $ perf stat -a -e l2_lines_out.pf_clean sleep 1

      Performance counter stats for 'system wide':

            7,860,403      l2_lines_out.pf_clean

           1.000624918 seconds time elapsed

  The event lists can be searched in the usual 'perf list' fashion for
  (case-insensitive) substrings as well:

      $ perf list l2_lines_out

      List of pre-defined events (to be used in -e):

      cache:
        l2_lines_out.demand_clean
             [Clean L2 cache lines evicted by demand]
        l2_lines_out.demand_dirty
             [Dirty L2 cache lines evicted by demand]
        l2_lines_out.dirty_all
             [Dirty L2 cache lines filling the L2]
        l2_lines_out.pf_clean
             [Clean L2 cache lines evicted by L2 prefetch]
        l2_lines_out.pf_dirty
             [Dirty L2 cache lines evicted by L2 prefetch]

  etc.

  There are a few high-level categories as well that can be listed:
  'cache', 'floating point', 'frontend', 'memory', 'pipeline', 'virtual
  memory'.
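
  For instance, listing just one of those categories should work the
  same way as the substring searches above:

      $ perf list cache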

  Existing generic events and workflows should work as-is.

  The only kernel side change is a late breaking fix for an older
  regression, related to Intel BTS, LBR and PT feature interaction.

  On the tooling side there are three new tools / major features:

   - The new 'perf c2c' tool provides means for Shared Data C2C/HITM
     analysis.

     This allows you to track down cacheline contention. The tool is
     based on x86's load latency and precise store facility events
     provided by Intel CPUs.

     It was tested by Joe Mario and has proven to be useful, finding a
     number of cacheline contention issues. Joe also wrote a blog post
     about the c2c tool, with examples:

        https://joemario.github.io/blog/2016/09/01/c2c-blog/

     An excerpt of the content on that site:

         At a high level, “perf c2c” will show you:

          * The cachelines where false sharing was detected.
          * The readers and writers to those cachelines, and the offsets where those accesses occurred.
          * The pid, tid, instruction addr, function name, binary object name for those readers and writers.
          * The source file and line number for each reader and writer.
          * The average load latency for the loads to those cachelines.
          * Which NUMA nodes the samples for a cacheline came from and which CPUs were involved.

         Using perf c2c is similar to using the Linux perf tool today.
         First collect data with “perf c2c record”, then generate a
         report output with “perf c2c report”

     There one finds extensive details on using the tool, with tips on
     reducing the volume of samples while still capturing enough to do
     its job. (Dick Fowles, Joe Mario, Don Zickus, Jiri Olsa)
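
     A typical session, using the record options documented in the new
     perf-c2c man page added in this merge (callchains plus system wide
     monitoring), looks something like:

        $ perf c2c record -- -g -a
        $ perf c2c report --stdio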

   - The new 'perf sched timehist' tool provides tailored analysis of
     scheduling events.

     Example usage:

          perf sched record -- sleep 1
          perf sched timehist

     By default it shows the individual schedule events, including the
     wait time (time between sched-out and next sched-in events for the
     task), the task scheduling delay (time between wakeup and actually
     running) and run time for the task:

            time    cpu  task name         wait time  sch delay  run time
                         [tid/pid]            (msec)     (msec)    (msec)
        -------- ------  ----------------  ---------  ---------  --------
        1.874569 [0011]  gcc[31949]            0.014      0.000     1.148
        1.874591 [0010]  gcc[31951]            0.000      0.000     0.024
        1.874603 [0010]  migration/10[59]      3.350      0.004     0.011
        1.874604 [0011]  <idle>                1.148      0.000     0.035
        1.874723 [0005]  <idle>                0.016      0.000     1.383
        1.874746 [0005]  gcc[31949]            0.153      0.078     0.022
      ...

     Times are in msec.usec. (David Ahern, Namhyung Kim)

   - Add CPU vendor hardware event tables:

     Add JSON files with vendor event naming for Intel and Power8
     processors, allowing users of tools like oprofile to keep using the
     event names they are used to, as well as people reading vendor
     documentation, where such naming is used. (Andi Kleen, Sukadev
     Bhattiprolu)

     You should see all the new events with 'perf list' and you should
     be able to search them; for example 'perf list miss' will list the
     myriad miss events.

  Other tooling features added were:

   - Cross-arch annotation support:

     o Improve ARM support in the annotation code, affecting 'perf
       annotate', 'perf report' and live annotation in 'perf top' (Kim
       Phillips)

     o Initial support for PowerPC in the annotation code (Ravi
       Bangoria)

     o Support AArch64 in the 'annotate' code, native/local and
       cross-arch/remote (Kim Phillips)

   - Allow considering just events in a given time interval, via the
     '--time start,end' command line option (times are given as
     seconds.microseconds), added to 'perf kmem', 'perf report', 'perf
     sched timehist' and 'perf script' (David Ahern)
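
     For example, restricting a report to samples between two timestamps
     (in the seconds.microseconds form the tools print) might look like:

        $ perf report --time 10.500000,12.000000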

   - Add option to stop printing a callchain at one of a given group of
     symbol names (David Ahern)

   - Track memory freed in 'perf kmem stat' (David Ahern)

   - Allow querying and setting .perfconfig variables (Taeung Song)
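
     For example, as also shown in the updated perf-config documentation
     further below:

        % perf config annotate.hide_src_code=true
        % perf config report.queue-size call-graph.order report.children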

   - Show branch information in callchains (predicted, TSX aborts, loop
     iterations, etc) (Jin Yao)

   - Dynamically change the verbosity level by pressing 'V' in the 'perf
     top/report' hists TUI browser (Alexis Berlemont)

   - Implement 'perf trace --delay' in the same fashion as in 'perf
     record --delay', to skip sampling workload initialization events
     (Alexis Berlemont)
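
     E.g., assuming the delay value is in milliseconds as with 'perf
     record --delay', and with './workload' standing in for whatever is
     being traced:

        $ perf trace --delay 500 ./workload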

   - Make vendor named events case insensitive in 'perf list', i.e.
     'perf list LONGEST_LAT' works just the same as 'perf list
     longest_lat' (Andi Kleen)

   - Add unwinding support for jitdump (Stefano Sanfilippo)

  Tooling infrastructure changes:

   - Support linking perf with clang and LLVM libraries, initially only
     statically. This limitation will be lifted later, and shared
     libraries, when available, will be preferred to the static build.
     As with other features, the support has to be enabled explicitly
     (Wang Nan)
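
     A rough sketch of the opt-in build, assuming the LIBCLANGLLVM knob
     this series is expected to introduce (the feature tests added below
     do the actual probing of llvm-config and the clang libraries):

        $ make -C tools/perf LIBCLANGLLVM=1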

   - Add initial support (and a perf test entry) for tooling hooks,
     starting with 'record_start' and 'record_end'. Their initial user
     will be the eBPF infrastructure, where perf_ prefixed functions
     will be JITed and run when such hooks are called (Wang Nan)

   - Implement assorted libbpf improvements (Wang Nan)
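
     Among the libbpf changes in this pull, tools/lib/bpf grows plain
     wrappers for the BPF map access and object pinning syscalls (see
     the bpf.c/bpf.h hunks below). A minimal, illustrative sketch of
     walking a pinned map with them - the pin path, key/value types and
     starting key are hypothetical:

        #include <stdio.h>
        #include <stdint.h>
        #include "bpf.h"        /* tools/lib/bpf/bpf.h */

        int main(void)
        {
                uint32_t key = (uint32_t)-1, next_key; /* start key assumed absent */
                uint64_t value;
                int fd = bpf_obj_get("/sys/fs/bpf/my_map"); /* hypothetical pin path */

                if (fd < 0)
                        return 1;
                /* walk all keys, looking up each value as we go */
                while (!bpf_map_get_next_key(fd, &key, &next_key)) {
                        if (!bpf_map_lookup_elem(fd, &next_key, &value))
                                printf("%u -> %llu\n", next_key,
                                       (unsigned long long)value);
                        key = next_key;
                }
                return 0;
        }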

  ... and lots of other changes, features, cleanups and refactorings I
  did not list, see the shortlog and the git log for details"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (220 commits)
  perf/x86: Fix exclusion of BTS and LBR for Goldmont
  perf tools: Explicitly document that --children is enabled by default
  perf sched timehist: Cleanup idle_max_cpu handling
  perf sched timehist: Handle zero sample->tid properly
  perf callchain: Introduce callchain_cursor__copy()
  perf sched: Cleanup option processing
  perf sched timehist: Improve error message when analyzing wrong file
  perf tools: Move perf build related variables under non fixdep leg
  perf tools: Force fixdep compilation at the start of the build
  perf tools: Move PERF-VERSION-FILE target into rules area
  perf build: Check LLVM version in feature check
  perf annotate: Show raw form for jump instruction with indirect target
  perf tools: Add non config targets
  perf tools: Cleanup build directory before each test
  perf tools: Move python/perf.so target into rules area
  perf tools: Move install-gtk target into rules area
  tools build: Move tabs to spaces where suitable
  tools build: Make the .cmd file more readable
  perf clang: Compile BPF script using builtin clang support
  perf clang: Support compile IR to BPF object and add testcase
  ...
parents 0719dbf5 b0c1ef52
......@@ -365,7 +365,11 @@ int x86_add_exclusive(unsigned int what)
{
int i;
if (x86_pmu.lbr_pt_coexist)
/*
* When lbr_pt_coexist we allow PT to coexist with either LBR or BTS.
* LBR and BTS are still mutually exclusive.
*/
if (x86_pmu.lbr_pt_coexist && what == x86_lbr_exclusive_pt)
return 0;
if (!atomic_inc_not_zero(&x86_pmu.lbr_exclusive[what])) {
......@@ -388,7 +392,7 @@ int x86_add_exclusive(unsigned int what)
void x86_del_exclusive(unsigned int what)
{
if (x86_pmu.lbr_pt_coexist)
if (x86_pmu.lbr_pt_coexist && what == x86_lbr_exclusive_pt)
return;
atomic_dec(&x86_pmu.lbr_exclusive[what]);
......
......@@ -604,7 +604,7 @@ struct x86_pmu {
u64 lbr_sel_mask; /* LBR_SELECT valid bits */
const int *lbr_sel_map; /* lbr_select mappings */
bool lbr_double_abort; /* duplicated lbr aborts */
bool lbr_pt_coexist; /* LBR may coexist with PT */
bool lbr_pt_coexist; /* (LBR|BTS) may coexist with PT */
/*
* Intel PT/LBR/BTS are exclusive
......
......@@ -65,22 +65,22 @@ dep-cmd = $(if $(wildcard $(fixdep)),
printf '\# cannot find fixdep (%s)\n' $(fixdep) > $(dot-target).cmd; \
printf '\# using basic dep data\n\n' >> $(dot-target).cmd; \
cat $(depfile) >> $(dot-target).cmd; \
printf '%s\n' 'cmd_$@ := $(make-cmd)' >> $(dot-target).cmd)
printf '\n%s\n' 'cmd_$@ := $(make-cmd)' >> $(dot-target).cmd)
###
# if_changed_dep - execute command if any prerequisite is newer than
# target, or command line has changed and update
# dependencies in the cmd file
if_changed_dep = $(if $(strip $(any-prereq) $(arg-check)), \
@set -e; \
$(echo-cmd) $(cmd_$(1)) && $(dep-cmd))
@set -e; \
$(echo-cmd) $(cmd_$(1)) && $(dep-cmd))
# if_changed - execute command if any prerequisite is newer than
# target, or command line has changed
if_changed = $(if $(strip $(any-prereq) $(arg-check)), \
@set -e; \
$(echo-cmd) $(cmd_$(1)); \
printf '%s\n' 'cmd_$@ := $(make-cmd)' > $(dot-target).cmd)
if_changed = $(if $(strip $(any-prereq) $(arg-check)), \
@set -e; \
$(echo-cmd) $(cmd_$(1)); \
printf '%s\n' 'cmd_$@ := $(make-cmd)' > $(dot-target).cmd)
###
# C flags to be used in rule definitions, includes:
......@@ -89,10 +89,12 @@ if_changed = $(if $(strip $(any-prereq) $(arg-check)), \
# - per target C flags
# - per object C flags
# - BUILD_STR macro to allow '-D"$(variable)"' constructs
c_flags = -Wp,-MD,$(depfile),-MT,$@ $(CFLAGS) -D"BUILD_STR(s)=\#s" $(CFLAGS_$(basetarget).o) $(CFLAGS_$(obj))
cxx_flags = -Wp,-MD,$(depfile),-MT,$@ $(CXXFLAGS) -D"BUILD_STR(s)=\#s" $(CXXFLAGS_$(basetarget).o) $(CXXFLAGS_$(obj))
c_flags_1 = -Wp,-MD,$(depfile) -Wp,-MT,$@ $(CFLAGS) -D"BUILD_STR(s)=\#s" $(CFLAGS_$(basetarget).o) $(CFLAGS_$(obj))
c_flags_2 = $(filter-out $(CFLAGS_REMOVE_$(basetarget).o), $(c_flags_1))
c_flags = $(filter-out $(CFLAGS_REMOVE_$(obj)), $(c_flags_2))
cxx_flags = -Wp,-MD,$(depfile) -Wp,-MT,$@ $(CXXFLAGS) -D"BUILD_STR(s)=\#s" $(CXXFLAGS_$(basetarget).o) $(CXXFLAGS_$(obj))
###
## HOSTCC C flags
host_c_flags = -Wp,-MD,$(depfile),-MT,$@ $(CHOSTFLAGS) -D"BUILD_STR(s)=\#s" $(CHOSTFLAGS_$(basetarget).o) $(CHOSTFLAGS_$(obj))
host_c_flags = -Wp,-MD,$(depfile) -Wp,-MT,$@ $(CHOSTFLAGS) -D"BUILD_STR(s)=\#s" $(CHOSTFLAGS_$(basetarget).o) $(CHOSTFLAGS_$(obj))
......@@ -135,8 +135,10 @@ CFLAGS
It's possible to alter the standard object C flags in the following way:
CFLAGS_perf.o += '...' - alters CFLAGS for perf.o object
CFLAGS_gtk += '...' - alters CFLAGS for gtk build object
CFLAGS_perf.o += '...' - adds CFLAGS for perf.o object
CFLAGS_gtk += '...' - adds CFLAGS for gtk build object
CFLAGS_REMOVE_perf.o += '...' - removes CFLAGS for perf.o object
CFLAGS_REMOVE_gtk += '...' - removes CFLAGS for gtk build object
This C flags changes has the scope of the Build makefile they are defined in.
......
......@@ -27,58 +27,58 @@ endef
# the rule that uses them - an example for that is the 'bionic'
# feature check. ]
#
FEATURE_TESTS_BASIC := \
backtrace \
dwarf \
dwarf_getlocations \
fortify-source \
sync-compare-and-swap \
glibc \
gtk2 \
gtk2-infobar \
libaudit \
libbfd \
libelf \
libelf-getphdrnum \
libelf-gelf_getnote \
libelf-getshdrstrndx \
libelf-mmap \
libnuma \
numa_num_possible_cpus \
libperl \
libpython \
libpython-version \
libslang \
libcrypto \
libunwind \
libunwind-x86 \
libunwind-x86_64 \
libunwind-arm \
libunwind-aarch64 \
pthread-attr-setaffinity-np \
stackprotector-all \
timerfd \
libdw-dwarf-unwind \
zlib \
lzma \
get_cpuid \
bpf \
sdt
FEATURE_TESTS_BASIC := \
backtrace \
dwarf \
dwarf_getlocations \
fortify-source \
sync-compare-and-swap \
glibc \
gtk2 \
gtk2-infobar \
libaudit \
libbfd \
libelf \
libelf-getphdrnum \
libelf-gelf_getnote \
libelf-getshdrstrndx \
libelf-mmap \
libnuma \
numa_num_possible_cpus \
libperl \
libpython \
libpython-version \
libslang \
libcrypto \
libunwind \
libunwind-x86 \
libunwind-x86_64 \
libunwind-arm \
libunwind-aarch64 \
pthread-attr-setaffinity-np \
stackprotector-all \
timerfd \
libdw-dwarf-unwind \
zlib \
lzma \
get_cpuid \
bpf \
sdt
# FEATURE_TESTS_BASIC + FEATURE_TESTS_EXTRA is the complete list
# of all feature tests
FEATURE_TESTS_EXTRA := \
bionic \
compile-32 \
compile-x32 \
cplus-demangle \
hello \
libbabeltrace \
liberty \
liberty-z \
libunwind-debug-frame \
libunwind-debug-frame-arm \
libunwind-debug-frame-aarch64
FEATURE_TESTS_EXTRA := \
bionic \
compile-32 \
compile-x32 \
cplus-demangle \
hello \
libbabeltrace \
liberty \
liberty-z \
libunwind-debug-frame \
libunwind-debug-frame-arm \
libunwind-debug-frame-aarch64
FEATURE_TESTS ?= $(FEATURE_TESTS_BASIC)
......@@ -86,26 +86,26 @@ ifeq ($(FEATURE_TESTS),all)
FEATURE_TESTS := $(FEATURE_TESTS_BASIC) $(FEATURE_TESTS_EXTRA)
endif
FEATURE_DISPLAY ?= \
dwarf \
dwarf_getlocations \
glibc \
gtk2 \
libaudit \
libbfd \
libelf \
libnuma \
numa_num_possible_cpus \
libperl \
libpython \
libslang \
libcrypto \
libunwind \
libdw-dwarf-unwind \
zlib \
lzma \
get_cpuid \
bpf
FEATURE_DISPLAY ?= \
dwarf \
dwarf_getlocations \
glibc \
gtk2 \
libaudit \
libbfd \
libelf \
libnuma \
numa_num_possible_cpus \
libperl \
libpython \
libslang \
libcrypto \
libunwind \
libdw-dwarf-unwind \
zlib \
lzma \
get_cpuid \
bpf
# Set FEATURE_CHECK_(C|LD)FLAGS-all for all FEATURE_TESTS features.
# If in the future we need per-feature checks/flags for features not
......
FILES= \
test-all.bin \
test-backtrace.bin \
test-bionic.bin \
test-dwarf.bin \
test-dwarf_getlocations.bin \
test-fortify-source.bin \
test-sync-compare-and-swap.bin \
test-glibc.bin \
test-gtk2.bin \
test-gtk2-infobar.bin \
test-hello.bin \
test-libaudit.bin \
test-libbfd.bin \
test-liberty.bin \
test-liberty-z.bin \
test-cplus-demangle.bin \
test-libelf.bin \
test-libelf-getphdrnum.bin \
test-libelf-gelf_getnote.bin \
test-libelf-getshdrstrndx.bin \
test-libelf-mmap.bin \
test-libnuma.bin \
test-numa_num_possible_cpus.bin \
test-libperl.bin \
test-libpython.bin \
test-libpython-version.bin \
test-libslang.bin \
test-libcrypto.bin \
test-libunwind.bin \
test-libunwind-debug-frame.bin \
test-libunwind-x86.bin \
test-libunwind-x86_64.bin \
test-libunwind-arm.bin \
test-libunwind-aarch64.bin \
test-libunwind-debug-frame-arm.bin \
test-libunwind-debug-frame-aarch64.bin \
test-pthread-attr-setaffinity-np.bin \
test-stackprotector-all.bin \
test-timerfd.bin \
test-libdw-dwarf-unwind.bin \
test-libbabeltrace.bin \
test-compile-32.bin \
test-compile-x32.bin \
test-zlib.bin \
test-lzma.bin \
test-bpf.bin \
test-get_cpuid.bin \
test-sdt.bin \
test-cxx.bin
FILES= \
test-all.bin \
test-backtrace.bin \
test-bionic.bin \
test-dwarf.bin \
test-dwarf_getlocations.bin \
test-fortify-source.bin \
test-sync-compare-and-swap.bin \
test-glibc.bin \
test-gtk2.bin \
test-gtk2-infobar.bin \
test-hello.bin \
test-libaudit.bin \
test-libbfd.bin \
test-liberty.bin \
test-liberty-z.bin \
test-cplus-demangle.bin \
test-libelf.bin \
test-libelf-getphdrnum.bin \
test-libelf-gelf_getnote.bin \
test-libelf-getshdrstrndx.bin \
test-libelf-mmap.bin \
test-libnuma.bin \
test-numa_num_possible_cpus.bin \
test-libperl.bin \
test-libpython.bin \
test-libpython-version.bin \
test-libslang.bin \
test-libcrypto.bin \
test-libunwind.bin \
test-libunwind-debug-frame.bin \
test-libunwind-x86.bin \
test-libunwind-x86_64.bin \
test-libunwind-arm.bin \
test-libunwind-aarch64.bin \
test-libunwind-debug-frame-arm.bin \
test-libunwind-debug-frame-aarch64.bin \
test-pthread-attr-setaffinity-np.bin \
test-stackprotector-all.bin \
test-timerfd.bin \
test-libdw-dwarf-unwind.bin \
test-libbabeltrace.bin \
test-compile-32.bin \
test-compile-x32.bin \
test-zlib.bin \
test-lzma.bin \
test-bpf.bin \
test-get_cpuid.bin \
test-sdt.bin \
test-cxx.bin \
test-jvmti.bin
FILES := $(addprefix $(OUTPUT),$(FILES))
CC := $(CROSS_COMPILE)gcc -MD
CXX := $(CROSS_COMPILE)g++ -MD
PKG_CONFIG := $(CROSS_COMPILE)pkg-config
LLVM_CONFIG ?= llvm-config
all: $(FILES)
......@@ -225,6 +227,30 @@ $(OUTPUT)test-sdt.bin:
$(OUTPUT)test-cxx.bin:
$(BUILDXX) -std=gnu++11
$(OUTPUT)test-jvmti.bin:
$(BUILD)
$(OUTPUT)test-llvm.bin:
$(BUILDXX) -std=gnu++11 \
-I$(shell $(LLVM_CONFIG) --includedir) \
-L$(shell $(LLVM_CONFIG) --libdir) \
$(shell $(LLVM_CONFIG) --libs Core BPF) \
$(shell $(LLVM_CONFIG) --system-libs)
$(OUTPUT)test-llvm-version.bin:
$(BUILDXX) -std=gnu++11 \
-I$(shell $(LLVM_CONFIG) --includedir)
$(OUTPUT)test-clang.bin:
$(BUILDXX) -std=gnu++11 \
-I$(shell $(LLVM_CONFIG) --includedir) \
-L$(shell $(LLVM_CONFIG) --libdir) \
-Wl,--start-group -lclangBasic -lclangDriver \
-lclangFrontend -lclangEdit -lclangLex \
-lclangAST -Wl,--end-group \
$(shell $(LLVM_CONFIG) --libs Core option) \
$(shell $(LLVM_CONFIG) --system-libs)
-include $(OUTPUT)*.d
###############################
......
#include "clang/Basic/VirtualFileSystem.h"
#include "clang/Driver/Driver.h"
#include "clang/Frontend/TextDiagnosticPrinter.h"
#include "llvm/ADT/IntrusiveRefCntPtr.h"
#include "llvm/Support/ManagedStatic.h"
#include "llvm/Support/raw_ostream.h"
using namespace clang;
using namespace clang::driver;
int main()
{
IntrusiveRefCntPtr<DiagnosticIDs> DiagID(new DiagnosticIDs());
IntrusiveRefCntPtr<DiagnosticOptions> DiagOpts = new DiagnosticOptions();
DiagnosticsEngine Diags(DiagID, &*DiagOpts);
Driver TheDriver("test", "bpf-pc-linux", Diags);
llvm::llvm_shutdown();
return 0;
}
#include <jvmti.h>
#include <jvmticmlr.h>
int main(void)
{
JavaVM jvm __attribute__((unused));
jvmtiEventCallbacks cb __attribute__((unused));
jvmtiCapabilities caps __attribute__((unused));
jvmtiJlocationFormat format __attribute__((unused));
jvmtiEnv jvmti __attribute__((unused));
return 0;
}
#include <cstdio>
#include "llvm/Config/llvm-config.h"
#define NUM_VERSION (((LLVM_VERSION_MAJOR) << 16) + (LLVM_VERSION_MINOR << 8) + LLVM_VERSION_PATCH)
#define pass int main() {printf("%x\n", NUM_VERSION); return 0;}
#if NUM_VERSION >= 0x030900
pass
#else
# error This LLVM is not tested yet.
#endif
#include "llvm/Support/ManagedStatic.h"
#include "llvm/Support/raw_ostream.h"
#define NUM_VERSION (((LLVM_VERSION_MAJOR) << 16) + (LLVM_VERSION_MINOR << 8) + LLVM_VERSION_PATCH)
#if NUM_VERSION < 0x030900
# error "LLVM version too low"
#endif
int main()
{
llvm::errs() << "Hello World!\n";
llvm::llvm_shutdown();
return 0;
}
......@@ -49,7 +49,7 @@ static void parse_dep_file(void *map, size_t len)
char *end = m + len;
char *p;
char s[PATH_MAX];
int is_target;
int is_target, has_target = 0;
int saw_any_target = 0;
int is_first_dep = 0;
......@@ -67,7 +67,8 @@ static void parse_dep_file(void *map, size_t len)
if (is_target) {
/* The /next/ file is the first dependency */
is_first_dep = 1;
} else {
has_target = 1;
} else if (has_target) {
/* Save this token/filename */
memcpy(s, m, p-m);
s[p - m] = 0;
......
......@@ -13,6 +13,7 @@
*/
#include <asm-generic/bitops/__ffs.h>
#include <asm-generic/bitops/__ffz.h>
#include <asm-generic/bitops/fls.h>
#include <asm-generic/bitops/__fls.h>
#include <asm-generic/bitops/fls64.h>
......
#ifndef _ASM_GENERIC_BITOPS_FFZ_H_
#define _ASM_GENERIC_BITOPS_FFZ_H_
/*
* ffz - find first zero in word.
* @word: The word to search
*
* Undefined if no zero exists, so code should check against ~0UL first.
*/
#define ffz(x) __ffs(~(x))
#endif /* _ASM_GENERIC_BITOPS_FFZ_H_ */
......@@ -15,6 +15,21 @@ extern unsigned long find_next_bit(const unsigned long *addr, unsigned long
size, unsigned long offset);
#endif
#ifndef find_next_zero_bit
/**
* find_next_zero_bit - find the next cleared bit in a memory region
* @addr: The address to base the search on
* @offset: The bitnumber to start searching at
* @size: The bitmap size in bits
*
* Returns the bit number of the next zero bit
* If no bits are zero, returns @size.
*/
unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
unsigned long offset);
#endif
#ifndef find_first_bit
/**
......@@ -30,4 +45,17 @@ extern unsigned long find_first_bit(const unsigned long *addr,
#endif /* find_first_bit */
#ifndef find_first_zero_bit
/**
* find_first_zero_bit - find the first cleared bit in a memory region
* @addr: The address to start the search at
* @size: The maximum number of bits to search
*
* Returns the bit number of the first cleared bit.
* If no bits are zero, returns @size.
*/
unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size);
#endif
#endif /*_TOOLS_LINUX_ASM_GENERIC_BITOPS_FIND_H_ */
......@@ -39,6 +39,11 @@ extern unsigned long __sw_hweight64(__u64 w);
(bit) < (size); \
(bit) = find_next_bit((addr), (size), (bit) + 1))
#define for_each_clear_bit(bit, addr, size) \
for ((bit) = find_first_zero_bit((addr), (size)); \
(bit) < (size); \
(bit) = find_next_zero_bit((addr), (size), (bit) + 1))
/* same as for_each_set_bit() but use bit as value to start with */
#define for_each_set_bit_from(bit, addr, size) \
for ((bit) = find_next_bit((addr), (size), (bit)); \
......
......@@ -72,4 +72,9 @@
#define MAP_HUGE_SHIFT 26
#define MAP_HUGE_MASK 0x3f
#define PKEY_DISABLE_ACCESS 0x1
#define PKEY_DISABLE_WRITE 0x2
#define PKEY_ACCESS_MASK (PKEY_DISABLE_ACCESS |\
PKEY_DISABLE_WRITE)
#endif /* __ASM_GENERIC_MMAN_COMMON_H */
......@@ -110,3 +110,59 @@ int bpf_map_update_elem(int fd, void *key, void *value,
return sys_bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
}
int bpf_map_lookup_elem(int fd, void *key, void *value)
{
union bpf_attr attr;
bzero(&attr, sizeof(attr));
attr.map_fd = fd;
attr.key = ptr_to_u64(key);
attr.value = ptr_to_u64(value);
return sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
}
int bpf_map_delete_elem(int fd, void *key)
{
union bpf_attr attr;
bzero(&attr, sizeof(attr));
attr.map_fd = fd;
attr.key = ptr_to_u64(key);
return sys_bpf(BPF_MAP_DELETE_ELEM, &attr, sizeof(attr));
}
int bpf_map_get_next_key(int fd, void *key, void *next_key)
{
union bpf_attr attr;
bzero(&attr, sizeof(attr));
attr.map_fd = fd;
attr.key = ptr_to_u64(key);
attr.next_key = ptr_to_u64(next_key);
return sys_bpf(BPF_MAP_GET_NEXT_KEY, &attr, sizeof(attr));
}
int bpf_obj_pin(int fd, const char *pathname)
{
union bpf_attr attr;
bzero(&attr, sizeof(attr));
attr.pathname = ptr_to_u64((void *)pathname);
attr.bpf_fd = fd;
return sys_bpf(BPF_OBJ_PIN, &attr, sizeof(attr));
}
int bpf_obj_get(const char *pathname)
{
union bpf_attr attr;
bzero(&attr, sizeof(attr));
attr.pathname = ptr_to_u64((void *)pathname);
return sys_bpf(BPF_OBJ_GET, &attr, sizeof(attr));
}
......@@ -35,4 +35,11 @@ int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
int bpf_map_update_elem(int fd, void *key, void *value,
u64 flags);
int bpf_map_lookup_elem(int fd, void *key, void *value);
int bpf_map_delete_elem(int fd, void *key);
int bpf_map_get_next_key(int fd, void *key, void *next_key);
int bpf_obj_pin(int fd, const char *pathname);
int bpf_obj_get(const char *pathname);
#endif
......@@ -185,6 +185,7 @@ struct bpf_program {
struct bpf_map {
int fd;
char *name;
size_t offset;
struct bpf_map_def def;
void *priv;
bpf_map_clear_priv_t clear_priv;
......@@ -228,6 +229,10 @@ struct bpf_object {
* all objects.
*/
struct list_head list;
void *priv;
bpf_object_clear_priv_t clear_priv;
char path[];
};
#define obj_elf_valid(o) ((o)->efile.elf)
......@@ -513,57 +518,106 @@ bpf_object__init_kversion(struct bpf_object *obj,
}
static int
bpf_object__init_maps(struct bpf_object *obj, void *data,
size_t size)
bpf_object__validate_maps(struct bpf_object *obj)
{
size_t nr_maps;
int i;
nr_maps = size / sizeof(struct bpf_map_def);
if (!data || !nr_maps) {
pr_debug("%s doesn't need map definition\n",
obj->path);
/*
* If there's only 1 map, the only error case should have been
* catched in bpf_object__init_maps().
*/
if (!obj->maps || !obj->nr_maps || (obj->nr_maps == 1))
return 0;
}
pr_debug("maps in %s: %zd bytes\n", obj->path, size);
for (i = 1; i < obj->nr_maps; i++) {
const struct bpf_map *a = &obj->maps[i - 1];
const struct bpf_map *b = &obj->maps[i];
obj->maps = calloc(nr_maps, sizeof(obj->maps[0]));
if (!obj->maps) {
pr_warning("alloc maps for object failed\n");
return -ENOMEM;
if (b->offset - a->offset < sizeof(struct bpf_map_def)) {
pr_warning("corrupted map section in %s: map \"%s\" too small\n",
obj->path, a->name);
return -EINVAL;
}
}
obj->nr_maps = nr_maps;
for (i = 0; i < nr_maps; i++) {
struct bpf_map_def *def = &obj->maps[i].def;
return 0;
}
/*
* fill all fd with -1 so won't close incorrect
* fd (fd=0 is stdin) when failure (zclose won't close
* negative fd)).
*/
obj->maps[i].fd = -1;
static int compare_bpf_map(const void *_a, const void *_b)
{
const struct bpf_map *a = _a;
const struct bpf_map *b = _b;
/* Save map definition into obj->maps */
*def = ((struct bpf_map_def *)data)[i];
}
return 0;
return a->offset - b->offset;
}
static int
bpf_object__init_maps_name(struct bpf_object *obj)
bpf_object__init_maps(struct bpf_object *obj)
{
int i;
int i, map_idx, nr_maps = 0;
Elf_Scn *scn;
Elf_Data *data;
Elf_Data *symbols = obj->efile.symbols;
if (!symbols || obj->efile.maps_shndx < 0)
if (obj->efile.maps_shndx < 0)
return -EINVAL;
if (!symbols)
return -EINVAL;
scn = elf_getscn(obj->efile.elf, obj->efile.maps_shndx);
if (scn)
data = elf_getdata(scn, NULL);
if (!scn || !data) {
pr_warning("failed to get Elf_Data from map section %d\n",
obj->efile.maps_shndx);
return -EINVAL;
}
/*
* Count number of maps. Each map has a name.
* Array of maps is not supported: only the first element is
* considered.
*
* TODO: Detect array of map and report error.
*/
for (i = 0; i < symbols->d_size / sizeof(GElf_Sym); i++) {
GElf_Sym sym;
size_t map_idx;
if (!gelf_getsym(symbols, i, &sym))
continue;
if (sym.st_shndx != obj->efile.maps_shndx)
continue;
nr_maps++;
}
/* Alloc obj->maps and fill nr_maps. */
pr_debug("maps in %s: %d maps in %zd bytes\n", obj->path,
nr_maps, data->d_size);
if (!nr_maps)
return 0;
obj->maps = calloc(nr_maps, sizeof(obj->maps[0]));
if (!obj->maps) {
pr_warning("alloc maps for object failed\n");
return -ENOMEM;
}
obj->nr_maps = nr_maps;
/*
* fill all fd with -1 so won't close incorrect
* fd (fd=0 is stdin) when failure (zclose won't close
* negative fd)).
*/
for (i = 0; i < nr_maps; i++)
obj->maps[i].fd = -1;
/*
* Fill obj->maps using data in "maps" section.
*/
for (i = 0, map_idx = 0; i < symbols->d_size / sizeof(GElf_Sym); i++) {
GElf_Sym sym;
const char *map_name;
struct bpf_map_def *def;
if (!gelf_getsym(symbols, i, &sym))
continue;
......@@ -573,21 +627,27 @@ bpf_object__init_maps_name(struct bpf_object *obj)
map_name = elf_strptr(obj->efile.elf,
obj->efile.strtabidx,
sym.st_name);
map_idx = sym.st_value / sizeof(struct bpf_map_def);
if (map_idx >= obj->nr_maps) {
pr_warning("index of map \"%s\" is buggy: %zu > %zu\n",
map_name, map_idx, obj->nr_maps);
continue;
obj->maps[map_idx].offset = sym.st_value;
if (sym.st_value + sizeof(struct bpf_map_def) > data->d_size) {
pr_warning("corrupted maps section in %s: last map \"%s\" too small\n",
obj->path, map_name);
return -EINVAL;
}
obj->maps[map_idx].name = strdup(map_name);
if (!obj->maps[map_idx].name) {
pr_warning("failed to alloc map name\n");
return -ENOMEM;
}
pr_debug("map %zu is \"%s\"\n", map_idx,
pr_debug("map %d is \"%s\"\n", map_idx,
obj->maps[map_idx].name);
def = (struct bpf_map_def *)(data->d_buf + sym.st_value);
obj->maps[map_idx].def = *def;
map_idx++;
}
return 0;
qsort(obj->maps, obj->nr_maps, sizeof(obj->maps[0]), compare_bpf_map);
return bpf_object__validate_maps(obj);
}
static int bpf_object__elf_collect(struct bpf_object *obj)
......@@ -645,11 +705,9 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
err = bpf_object__init_kversion(obj,
data->d_buf,
data->d_size);
else if (strcmp(name, "maps") == 0) {
err = bpf_object__init_maps(obj, data->d_buf,
data->d_size);
else if (strcmp(name, "maps") == 0)
obj->efile.maps_shndx = idx;
} else if (sh.sh_type == SHT_SYMTAB) {
else if (sh.sh_type == SHT_SYMTAB) {
if (obj->efile.symbols) {
pr_warning("bpf: multiple SYMTAB in %s\n",
obj->path);
......@@ -698,7 +756,7 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
return LIBBPF_ERRNO__FORMAT;
}
if (obj->efile.maps_shndx >= 0)
err = bpf_object__init_maps_name(obj);
err = bpf_object__init_maps(obj);
out:
return err;
}
......@@ -807,7 +865,7 @@ bpf_object__create_maps(struct bpf_object *obj)
zclose(obj->maps[j].fd);
return err;
}
pr_debug("create map: fd=%d\n", *pfd);
pr_debug("create map %s: fd=%d\n", obj->maps[i].name, *pfd);
}
return 0;
......@@ -1175,6 +1233,9 @@ void bpf_object__close(struct bpf_object *obj)
if (!obj)
return;
if (obj->clear_priv)
obj->clear_priv(obj, obj->priv);
bpf_object__elf_finish(obj);
bpf_object__unload(obj);
......@@ -1228,6 +1289,22 @@ unsigned int bpf_object__kversion(struct bpf_object *obj)
return obj ? obj->kern_version : 0;
}
int bpf_object__set_priv(struct bpf_object *obj, void *priv,
bpf_object_clear_priv_t clear_priv)
{
if (obj->priv && obj->clear_priv)
obj->clear_priv(obj, obj->priv);
obj->priv = priv;
obj->clear_priv = clear_priv;
return 0;
}
void *bpf_object__priv(struct bpf_object *obj)
{
return obj ? obj->priv : ERR_PTR(-EINVAL);
}
struct bpf_program *
bpf_program__next(struct bpf_program *prev, struct bpf_object *obj)
{
......@@ -1447,3 +1524,15 @@ bpf_object__find_map_by_name(struct bpf_object *obj, const char *name)
}
return NULL;
}
struct bpf_map *
bpf_object__find_map_by_offset(struct bpf_object *obj, size_t offset)
{
int i;
for (i = 0; i < obj->nr_maps; i++) {
if (obj->maps[i].offset == offset)
return &obj->maps[i];
}
return ERR_PTR(-ENOENT);
}
......@@ -24,6 +24,7 @@
#include <stdio.h>
#include <stdbool.h>
#include <linux/err.h>
#include <sys/types.h> // for size_t
enum libbpf_errno {
__LIBBPF_ERRNO__START = 4000,
......@@ -79,6 +80,11 @@ struct bpf_object *bpf_object__next(struct bpf_object *prev);
(pos) != NULL; \
(pos) = (tmp), (tmp) = bpf_object__next(tmp))
typedef void (*bpf_object_clear_priv_t)(struct bpf_object *, void *);
int bpf_object__set_priv(struct bpf_object *obj, void *priv,
bpf_object_clear_priv_t clear_priv);
void *bpf_object__priv(struct bpf_object *prog);
/* Accessors of bpf_program. */
struct bpf_program;
struct bpf_program *bpf_program__next(struct bpf_program *prog,
......@@ -195,6 +201,13 @@ struct bpf_map;
struct bpf_map *
bpf_object__find_map_by_name(struct bpf_object *obj, const char *name);
/*
* Get bpf_map through the offset of corresponding struct bpf_map_def
* in the bpf object file.
*/
struct bpf_map *
bpf_object__find_map_by_offset(struct bpf_object *obj, size_t offset);
struct bpf_map *
bpf_map__next(struct bpf_map *map, struct bpf_object *obj);
#define bpf_map__for_each(pos, obj) \
......
......@@ -82,3 +82,28 @@ unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
return size;
}
#endif
#ifndef find_first_zero_bit
/*
* Find the first cleared bit in a memory region.
*/
unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
{
unsigned long idx;
for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
if (addr[idx] != ~0UL)
return min(idx * BITS_PER_LONG + ffz(addr[idx]), size);
}
return size;
}
#endif
#ifndef find_next_zero_bit
unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
unsigned long offset)
{
return _find_next_bit(addr, size, offset, ~0UL);
}
#endif
......@@ -314,12 +314,19 @@ static int get_value(struct parse_opt_ctx_t *p,
static int parse_short_opt(struct parse_opt_ctx_t *p, const struct option *options)
{
retry:
for (; options->type != OPTION_END; options++) {
if (options->short_name == *p->opt) {
p->opt = p->opt[1] ? p->opt + 1 : NULL;
return get_value(p, options, OPT_SHORT);
}
}
if (options->parent) {
options = options->parent;
goto retry;
}
return -2;
}
......@@ -333,6 +340,7 @@ static int parse_long_opt(struct parse_opt_ctx_t *p, const char *arg,
if (!arg_end)
arg_end = arg + strlen(arg);
retry:
for (; options->type != OPTION_END; options++) {
const char *rest;
int flags = 0;
......@@ -426,6 +434,12 @@ static int parse_long_opt(struct parse_opt_ctx_t *p, const char *arg,
}
if (abbrev_option)
return get_value(p, abbrev_option, abbrev_flags);
if (options->parent) {
options = options->parent;
goto retry;
}
return -2;
}
......
......@@ -109,11 +109,13 @@ struct option {
intptr_t defval;
bool *set;
void *data;
const struct option *parent;
};
#define check_vtype(v, type) ( BUILD_BUG_ON_ZERO(!__builtin_types_compatible_p(typeof(v), type)) + v )
#define OPT_END() { .type = OPTION_END }
#define OPT_PARENT(p) { .type = OPTION_END, .parent = (p) }
#define OPT_ARGUMENT(l, h) { .type = OPTION_ARGUMENT, .long_name = (l), .help = (h) }
#define OPT_GROUP(h) { .type = OPTION_GROUP, .help = (h) }
#define OPT_BIT(s, l, v, h, b) { .type = OPTION_BIT, .short_name = (s), .long_name = (l), .value = check_vtype(v, int *), .help = (h), .defval = (b) }
......
......@@ -99,8 +99,6 @@ libdir_SQ = $(subst ','\'',$(libdir))
libdir_relative_SQ = $(subst ','\'',$(libdir_relative))
plugin_dir_SQ = $(subst ','\'',$(plugin_dir))
LIB_FILE = libtraceevent.a libtraceevent.so
CONFIG_INCLUDES =
CONFIG_LIBS =
CONFIG_FLAGS =
......@@ -114,6 +112,9 @@ N =
EVENT_PARSE_VERSION = $(EP_VERSION).$(EP_PATCHLEVEL).$(EP_EXTRAVERSION)
LIB_TARGET = libtraceevent.a libtraceevent.so.$(EVENT_PARSE_VERSION)
LIB_INSTALL = libtraceevent.a libtraceevent.so*
INCLUDES = -I. -I $(srctree)/tools/include $(CONFIG_INCLUDES)
# Set compile option CFLAGS
......@@ -156,11 +157,11 @@ PLUGINS += plugin_cfg80211.so
PLUGINS := $(addprefix $(OUTPUT),$(PLUGINS))
PLUGINS_IN := $(PLUGINS:.so=-in.o)
TE_IN := $(OUTPUT)libtraceevent-in.o
LIB_FILE := $(addprefix $(OUTPUT),$(LIB_FILE))
TE_IN := $(OUTPUT)libtraceevent-in.o
LIB_TARGET := $(addprefix $(OUTPUT),$(LIB_TARGET))
DYNAMIC_LIST_FILE := $(OUTPUT)libtraceevent-dynamic-list
CMD_TARGETS = $(LIB_FILE) $(PLUGINS) $(DYNAMIC_LIST_FILE)
CMD_TARGETS = $(LIB_TARGET) $(PLUGINS) $(DYNAMIC_LIST_FILE)
TARGETS = $(CMD_TARGETS)
......@@ -171,8 +172,10 @@ all_cmd: $(CMD_TARGETS)
$(TE_IN): force
$(Q)$(MAKE) $(build)=libtraceevent
$(OUTPUT)libtraceevent.so: $(TE_IN)
$(QUIET_LINK)$(CC) --shared $^ -o $@
$(OUTPUT)libtraceevent.so.$(EVENT_PARSE_VERSION): $(TE_IN)
$(QUIET_LINK)$(CC) --shared $^ -Wl,-soname,libtraceevent.so.$(EP_VERSION) -o $@
@ln -sf $(@F) $(OUTPUT)libtraceevent.so
@ln -sf $(@F) $(OUTPUT)libtraceevent.so.$(EP_VERSION)
$(OUTPUT)libtraceevent.a: $(TE_IN)
$(QUIET_LINK)$(RM) $@; $(AR) rcs $@ $^
......@@ -236,11 +239,15 @@ TAGS: force
find . -name '*.[ch]' | xargs etags \
--regex='/_PE(\([^,)]*\).*/PEVENT_ERRNO__\1/'
define do_install_mkdir
if [ ! -d '$(DESTDIR_SQ)$1' ]; then \
$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$1'; \
fi
endef
define do_install
if [ ! -d '$(DESTDIR_SQ)$2' ]; then \
$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$2'; \
fi; \
$(INSTALL) $1 '$(DESTDIR_SQ)$2'
$(call do_install_mkdir,$2); \
$(INSTALL) $(if $3,-m $3,) $1 '$(DESTDIR_SQ)$2'
endef
define do_install_plugins
......@@ -257,13 +264,20 @@ define do_generate_dynamic_list_file
endef
install_lib: all_cmd install_plugins
$(call QUIET_INSTALL, $(LIB_FILE)) \
$(call do_install,$(LIB_FILE),$(libdir_SQ))
$(call QUIET_INSTALL, $(LIB_TARGET)) \
$(call do_install_mkdir,$(libdir_SQ)); \
cp -fpR $(LIB_INSTALL) $(DESTDIR)$(libdir_SQ)
install_plugins: $(PLUGINS)
$(call QUIET_INSTALL, trace_plugins) \
$(call do_install_plugins, $(PLUGINS))
install_headers:
$(call QUIET_INSTALL, headers) \
$(call do_install,event-parse.h,$(prefix)/include/traceevent,644); \
$(call do_install,event-utils.h,$(prefix)/include/traceevent,644); \
$(call do_install,kbuffer.h,$(prefix)/include/traceevent,644)
install: install_lib
clean:
......
......@@ -33,6 +33,7 @@
#include <stdint.h>
#include <limits.h>
#include <linux/string.h>
#include <linux/time64.h>
#include <netinet/in.h>
#include "event-parse.h"
......@@ -5191,17 +5192,43 @@ struct event_format *pevent_data_event_from_type(struct pevent *pevent, int type
}
/**
* pevent_data_pid - parse the PID from raw data
* pevent_data_pid - parse the PID from record
* @pevent: a handle to the pevent
* @rec: the record to parse
*
* This returns the PID from a raw data.
* This returns the PID from a record.
*/
int pevent_data_pid(struct pevent *pevent, struct pevent_record *rec)
{
return parse_common_pid(pevent, rec->data);
}
/**
* pevent_data_prempt_count - parse the preempt count from the record
* @pevent: a handle to the pevent
* @rec: the record to parse
*
* This returns the preempt count from a record.
*/
int pevent_data_prempt_count(struct pevent *pevent, struct pevent_record *rec)
{
return parse_common_pc(pevent, rec->data);
}
/**
* pevent_data_flags - parse the latency flags from the record
* @pevent: a handle to the pevent
* @rec: the record to parse
*
* This returns the latency flags from a record.
*
* Use trace_flag_type enum for the flags (see event-parse.h).
*/
int pevent_data_flags(struct pevent *pevent, struct pevent_record *rec)
{
return parse_common_flags(pevent, rec->data);
}
/**
* pevent_data_comm_from_pid - return the command line from PID
* @pevent: a handle to the pevent
......@@ -5424,8 +5451,8 @@ void pevent_print_event_time(struct pevent *pevent, struct trace_seq *s,
use_usec_format = is_timestamp_in_us(pevent->trace_clock,
use_trace_clock);
if (use_usec_format) {
secs = record->ts / NSECS_PER_SEC;
nsecs = record->ts - secs * NSECS_PER_SEC;
secs = record->ts / NSEC_PER_SEC;
nsecs = record->ts - secs * NSEC_PER_SEC;
}
if (pevent->latency_format) {
......@@ -5437,10 +5464,10 @@ void pevent_print_event_time(struct pevent *pevent, struct trace_seq *s,
usecs = nsecs;
p = 9;
} else {
usecs = (nsecs + 500) / NSECS_PER_USEC;
usecs = (nsecs + 500) / NSEC_PER_USEC;
/* To avoid usecs larger than 1 sec */
if (usecs >= 1000000) {
usecs -= 1000000;
if (usecs >= USEC_PER_SEC) {
usecs -= USEC_PER_SEC;
secs++;
}
p = 6;
......
......@@ -172,9 +172,6 @@ struct pevent_plugin_option {
#define PEVENT_PLUGIN_OPTIONS_NAME MAKE_STR(PEVENT_PLUGIN_OPTIONS)
#define PEVENT_PLUGIN_ALIAS_NAME MAKE_STR(PEVENT_PLUGIN_ALIAS)
#define NSECS_PER_SEC 1000000000ULL
#define NSECS_PER_USEC 1000ULL
enum format_flags {
FIELD_IS_ARRAY = 1,
FIELD_IS_POINTER = 2,
......@@ -712,6 +709,8 @@ void pevent_data_lat_fmt(struct pevent *pevent,
int pevent_data_type(struct pevent *pevent, struct pevent_record *rec);
struct event_format *pevent_data_event_from_type(struct pevent *pevent, int type);
int pevent_data_pid(struct pevent *pevent, struct pevent_record *rec);
int pevent_data_prempt_count(struct pevent *pevent, struct pevent_record *rec);
int pevent_data_flags(struct pevent *pevent, struct pevent_record *rec);
const char *pevent_data_comm_from_pid(struct pevent *pevent, int pid);
struct cmdline;
struct cmdline *pevent_data_pid_from_comm(struct pevent *pevent, const char *comm,
......
......@@ -21,6 +21,7 @@ perf-y += builtin-inject.o
perf-y += builtin-mem.o
perf-y += builtin-data.o
perf-y += builtin-version.o
perf-y += builtin-c2c.o
perf-$(CONFIG_AUDIT) += builtin-trace.o
perf-$(CONFIG_LIBELF) += builtin-probe.o
......
......@@ -550,6 +550,18 @@ Unless /proc/sys/kernel/perf_event_paranoid is set to -1, unprivileged users
have memory limits imposed upon them. That affects what buffer sizes they can
have as outlined above.
The v4.2 kernel introduced support for a context switch metadata event,
PERF_RECORD_SWITCH, which allows unprivileged users to see when their processes
are scheduled out and in, just not by whom, which is left for the
PERF_RECORD_SWITCH_CPU_WIDE, that is only accessible in system wide context,
which in turn requires CAP_SYS_ADMIN.
Please see the 45ac1403f564 ("perf: Add PERF_RECORD_SWITCH to indicate context
switches") commit, that introduces these metadata events for further info.
When working with kernels < v4.2, the following considerations must be taken,
as the sched:sched_switch tracepoints will be used to receive such information:
Unless /proc/sys/kernel/perf_event_paranoid is set to -1, unprivileged users are
not permitted to use tracepoints which means there is insufficient side-band
information to decode Intel PT in per-cpu mode, and potentially workload-only
......@@ -564,8 +576,11 @@ sched_switch tracepoint
-----------------------
The sched_switch tracepoint is used to provide side-band data for Intel PT
decoding. sched_switch events are automatically added. e.g. the second event
shown below
decoding in kernels where the PERF_RECORD_SWITCH metadata event isn't
available.
The sched_switch events are automatically added. e.g. the second event shown
below:
$ perf record -vv -e intel_pt//u uname
------------------------------------------------------------
......
JITDUMP specification version 2
Last Revised: 09/15/2016
Author: Stephane Eranian <eranian@gmail.com>
--------------------------------------------------------
| Revision | Date | Description |
--------------------------------------------------------
| 1 | 09/07/2016 | Initial revision |
--------------------------------------------------------
| 2 | 09/15/2016 | Add JIT_CODE_UNWINDING_INFO |
--------------------------------------------------------
I/ Introduction
This document describes the jitdump file format. The file is generated by Just-In-time compiler runtimes to save meta-data information about the generated code, such as address, size, and name of generated functions, the native code generated, the source line information. The data may then be used by performance tools, such as Linux perf to generate function and assembly level profiles.
The format is not specific to any particular programming language. It can be extended as need be.
The format of the file is binary. It is self-describing in terms of endianness and is portable across multiple processor architectures.
II/ Overview of the format
The format requires only sequential accesses, i.e., append only mode. The file starts with a fixed size file header describing the version of the specification, the endianness.
The header is followed by a series of records, each starting with a fixed size header describing the type of record and its size. It is, itself, followed by the payload for the record. Records can have a variable size even for a given type.
Each entry in the file is timestamped. All timestamps must use the same clock source. The CLOCK_MONOTONIC clock source is recommended.
III/ Jitdump file header format
Each jitdump file starts with a fixed size header containing the following fields in order:
* uint32_t magic : a magic number tagging the file type. The value is 4-byte long and represents the string "JiTD" in ASCII form. It is 0x4A695444 or 0x4454694a depending on the endianness. The field can be used to detect the endianness of the file
* uint32_t version : a 4-byte value representing the format version. It is currently set to 2
* uint32_t total_size: size in bytes of file header
* uint32_t elf_mach : ELF architecture encoding (ELF e_machine value as specified in /usr/include/elf.h)
* uint32_t pad1 : padding. Reserved for future use
* uint32_t pid : JIT runtime process identification (OS specific)
* uint64_t timestamp : timestamp of when the file was created
* uint64_t flags : a bitmask of flags
The flags currently defined are as follows:
* bit 0: JITDUMP_FLAGS_ARCH_TIMESTAMP : set if the jitdump file is using an architecture-specific timestamp clock source. For instance, on x86, one could use TSC directly
IV/ Record header
The file header is immediately followed by records. Each record starts with a fixed size header describing the record that follows.
The record header is specified in order as follows:
* uint32_t id : a value identifying the record type (see below)
* uint32_t total_size: the size in bytes of the record including the header.
* uint64_t timestamp : a timestamp of when the record was created.
The following record types are defined:
* Value 0 : JIT_CODE_LOAD : record describing a jitted function
* Value 1 : JIT_CODE_MOVE : record describing an already jitted function which is moved
* Value 2 : JIT_CODE_DEBUG_INFO: record describing the debug information for a jitted function
* Value 3 : JIT_CODE_CLOSE : record marking the end of the jit runtime (optional)
* Value 4 : JIT_CODE_UNWINDING_INFO: record describing a function unwinding information
The payload of the record must immediately follow the record header without padding.
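As a rough illustration only (not part of the specification), the two fixed-size headers described above map onto C structures along the following lines; the struct and field names here are illustrative, the definitions perf itself uses live in tools/perf/util/jitdump.h:

    #include <stdint.h>

    /* file header, written once at the start of the jitdump file */
    struct jitdump_file_header {
            uint32_t magic;         /* "JiTD": 0x4A695444 or 0x4454694a */
            uint32_t version;       /* currently 2 */
            uint32_t total_size;    /* size in bytes of this header */
            uint32_t elf_mach;      /* ELF e_machine of the jitted code */
            uint32_t pad1;          /* reserved */
            uint32_t pid;           /* pid of the JIT runtime */
            uint64_t timestamp;     /* file creation time */
            uint64_t flags;         /* e.g. JITDUMP_FLAGS_ARCH_TIMESTAMP */
    };

    /* fixed-size header preceding every record that follows */
    struct jitdump_record_header {
            uint32_t id;            /* JIT_CODE_LOAD, JIT_CODE_MOVE, ... */
            uint32_t total_size;    /* record size in bytes, header included */
            uint64_t timestamp;     /* record creation time */
    };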
V/ JIT_CODE_LOAD record
The record has the following fields following the fixed-size record header in order:
* uint32_t pid: OS process id of the runtime generating the jitted code
* uint32_t tid: OS thread identification of the runtime thread generating the jitted code
* uint64_t vma: virtual address of jitted code start
* uint64_t code_addr: code start address for the jitted code. By default vma = code_addr
* uint64_t code_size: size in bytes of the generated jitted code
* uint64_t code_index: unique identifier for the jitted code (see below)
* char[n]: function name in ASCII including the null termination
* native code: raw byte encoding of the jitted code
The record header total_size field is inclusive of all components:
* record header
* fixed-sized fields
* function name string, including termination
* native code length
* record specific variable data (e.g., array of data entries)
The code_index is used to uniquely identify each jitted function. The index can be a monotonically increasing 64-bit value. Each time a function is jitted it gets a new number. This value is used in case the code for a function is moved and avoids having to issue another JIT_CODE_LOAD record.
The format supports empty functions with no native code.
VI/ JIT_CODE_MOVE record
The record type is optional.
The record has the following fields following the fixed-size record header in order:
* uint32_t pid : OS process id of the runtime generating the jitted code
* uint32_t tid : OS thread identification of the runtime thread generating the jitted code
* uint64_t vma : new virtual address of jitted code start
* uint64_t old_code_addr: previous code address for the same function
* uint64_t new_code_addr: alternate new code started address for the jitted code. By default it should be equal to the vma address.
* uint64_t code_size : size in bytes of the jitted code
* uint64_t code_index : index referring to the JIT_CODE_LOAD code_index record of when the function was initially jitted
The MOVE record can be used in case an already jitted function is simply moved by the runtime inside the code cache.
The JIT_CODE_MOVE record cannot come before the JIT_CODE_LOAD record for the same function name. The function cannot have changed name, otherwise a new JIT_CODE_LOAD record must be emitted.
The code size of the function cannot change.
VII/ JIT_DEBUG_INFO record
The record type is optional.
The record contains source lines debug information, i.e., a way to map a code address back to a source line. This information may be used by the performance tool.
The record has the following fields following the fixed-size record header in order:
* uint64_t code_addr: address of function for which the debug information is generated
* uint64_t nr_entry : number of debug entries for the function
* debug_entry[n]: array of nr_entry debug entries for the function
The debug_entry describes the source line information. It is defined as follows in order:
* uint64_t code_addr: address of function for which the debug information is generated
* uint32_t line : source file line number (starting at 1)
* uint32_t discrim : column discriminator, 0 is default
* char name[n] : source file name in ASCII, including null termination
The debug_entry entries are saved in sequence but given that they have variable sizes due to the file name string, they cannot be indexed directly.
They need to be walked sequentially. The next debug_entry is found at sizeof(debug_entry) + strlen(name) + 1.
IMPORTANT:
The JIT_CODE_DEBUG for a given function must always be generated BEFORE the JIT_CODE_LOAD for the function. This facilitates greatly the parser for the jitdump file.
VIII/ JIT_CODE_CLOSE record
The record type is optional.
The record is used as a marker for the end of the jitted runtime. It can be replaced by the end of the file.
The JIT_CODE_CLOSE record does not have any specific fields, the record header contains all the information needed.
IX/ JIT_CODE_UNWINDING_INFO
The record type is optional.
The record is used to describe the unwinding information for a jitted function.
The record has the following fields following the fixed-size record header in order:
uint64_t unwind_data_size : the size in bytes of the unwinding data table at the end of the record
uint64_t eh_frame_hdr_size : the size in bytes of the DWARF EH Frame Header at the start of the unwinding data table at the end of the record
uint64_t mapped_size : the size of the unwinding data mapped in memory
const char unwinding_data[n]: an array of unwinding data, consisting of the EH Frame Header, followed by the actual EH Frame
The EH Frame header follows the Linux Standard Base (LSB) specification as described in the document at https://refspecs.linuxfoundation.org/LSB_1.3.0/gLSB/gLSB/ehframehdr.html
The EH Frame follows the LSB specification as described in the document at https://refspecs.linuxbase.org/LSB_3.0.0/LSB-PDA/LSB-PDA/ehframechpt.html
NOTE: The mapped_size is generally either the same as unwind_data_size (if the unwinding data was mapped in memory by the running process) or zero (if the unwinding data is not mapped by the process). If the unwinding data was not mapped, then only the EH Frame Header will be read, which can be used to specify FP based unwinding for a function which does not have unwinding information.
perf-c2c(1)
===========
NAME
----
perf-c2c - Shared Data C2C/HITM Analyzer.
SYNOPSIS
--------
[verse]
'perf c2c record' [<options>] <command>
'perf c2c record' [<options>] -- [<record command options>] <command>
'perf c2c report' [<options>]
DESCRIPTION
-----------
C2C stands for Cache To Cache.
The perf c2c tool provides means for Shared Data C2C/HITM analysis. It allows
you to track down the cacheline contentions.
The tool is based on x86's load latency and precise store facility events
provided by Intel CPUs. These events provide:
- memory address of the access
- type of the access (load and store details)
- latency (in cycles) of the load access
The c2c tool provides means to record this data and report back access details
for cachelines with highest contention - highest number of HITM accesses.
The basic workflow with this tool follows the standard record/report workflow:
the user runs the record command to record the event data and the report command
to display it.
RECORD OPTIONS
--------------
-e::
--event=::
Select the PMU event. Use 'perf mem record -e list'
to list available events.
-v::
--verbose::
Be more verbose (show counter open errors, etc).
-l::
--ldlat::
Configure mem-loads latency.
-k::
--all-kernel::
Configure all used events to run in kernel space.
-u::
--all-user::
Configure all used events to run in user space.
REPORT OPTIONS
--------------
-k::
--vmlinux=<file>::
vmlinux pathname
-v::
--verbose::
Be more verbose (show counter open errors, etc).
-i::
--input::
Specify the input file to process.
-N::
--node-info::
Show extra node info in report (see NODE INFO section)
-c::
--coalesce::
Specify sorting fields for single cacheline display.
Following fields are available: tid,pid,iaddr,dso
(see COALESCE)
-g::
--call-graph::
Setup callchains parameters.
Please refer to perf-report man page for details.
--stdio::
Force the stdio output (see STDIO OUTPUT)
--stats::
Display only statistic tables and force stdio mode.
--full-symbols::
Display full length of symbols.
--no-source::
Do not display Source:Line column.
--show-all::
Show all captured HITM lines, with no regard to HITM % 0.0005 limit.
-f::
--force::
Don't do ownership validation.
-d::
--display::
Switch the HITM type (rmt, lcl) to display and sort on. Total HITMs by default.
C2C RECORD
----------
The perf c2c record command setup options related to HITM cacheline analysis
and calls standard perf record command.
Following perf record options are configured by default:
(check perf record man page for details)
-W,-d,--sample-cpu
Unless specified otherwise with '-e' option, following events are monitored by
default:
cpu/mem-loads,ldlat=30/P
cpu/mem-stores/P
User can pass any 'perf record' option behind '--' mark, like (to enable
callchains and system wide monitoring):
$ perf c2c record -- -g -a
Please check RECORD OPTIONS section for specific c2c record options.
C2C REPORT
----------
The perf c2c report command displays shared data analysis. It comes in two
display modes: stdio and tui (default).
The report command workflow is as follows:
- sort all the data based on the cacheline address
- store access details for each cacheline
- sort all cachelines based on user settings
- display data
In general the perf c2c report output consists of 2 basic views:
1) most expensive cachelines list
2) offsets details for each cacheline
For each cacheline in the 1) list we display the following data:
(Both stdio and TUI modes follow the same fields output)
Index
- zero based index to identify the cacheline
Cacheline
- cacheline address (hex number)
Total records
- sum of all cachelines accesses
Rmt/Lcl Hitm
- cacheline percentage of all Remote/Local HITM accesses
LLC Load Hitm - Total, Lcl, Rmt
- count of Total/Local/Remote load HITMs
Store Reference - Total, L1Hit, L1Miss
Total - all store accesses
L1Hit - store accesses that hit L1
L1Miss - store accesses that missed L1
Load Dram
- count of local and remote DRAM accesses
LLC Ld Miss
- count of all accesses that missed LLC
Total Loads
- sum of all load accesses
Core Load Hit - FB, L1, L2
- count of load hits in FB (Fill Buffer), L1 and L2 cache
LLC Load Hit - Llc, Rmt
- count of LLC and Remote load hits
For each offset in the 2) list we display the following data:
HITM - Rmt, Lcl
- % of Remote/Local HITM accesses for given offset within cacheline
Store Refs - L1 Hit, L1 Miss
- % of store accesses that hit/missed L1 for given offset within cacheline
Data address - Offset
- offset address
Pid
- pid of the process responsible for the accesses
Tid
- tid of the process responsible for the accesses
Code address
- code address responsible for the accesses
cycles - rmt hitm, lcl hitm, load
- sum of cycles for given accesses - Remote/Local HITM and generic load
cpu cnt
- number of cpus that participated on the access
Symbol
- code symbol related to the 'Code address' value
Shared Object
- shared object name related to the 'Code address' value
Source:Line
- source information related to the 'Code address' value
Node
- nodes participating on the access (see NODE INFO section)
NODE INFO
---------
The 'Node' field displays the nodes that access a given cacheline
offset. Its output comes in 3 flavors:
- node IDs separated by ','
- node IDs with stats for each ID, in following format:
Node{cpus %hitms %stores}
- node IDs with list of affected CPUs in following format:
Node{cpu list}
User can switch between above flavors with -N option or
use 'n' key to interactively switch in TUI mode.
COALESCE
--------
User can specify how to sort offsets for a cacheline.
The following fields are available and govern the final
set of output fields for the cacheline offsets output:
tid - coalesced by process TIDs
pid - coalesced by process PIDs
iaddr - coalesced by code address, following fields are displayed:
Code address, Code symbol, Shared Object, Source line
dso - coalesced by shared object
By default the coalescing is set up with 'pid,tid,iaddr'.
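For example, to coalesce offsets by code address only (illustrative; this assumes
the report's coalesce switch is spelled -c, which is not shown in this excerpt):
$ perf c2c report -c iaddr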
STDIO OUTPUT
------------
The stdio output displays data on standard output.
The following tables are displayed:
Trace Event Information
- overall statistics of memory accesses
Global Shared Cache Line Event Information
- overall statistics on shared cachelines
Shared Data Cache Line Table
- list of most expensive cachelines
Shared Cache Line Distribution Pareto
- list of all accessed offsets for each cacheline
TUI OUTPUT
----------
The TUI output provides an interactive interface to navigate
through the cacheline list and to display offset details.
For details please refer to the help window by pressing the '?' key.
CREDITS
-------
Although Don Zickus, Dick Fowles and Joe Mario worked together
to get this implemented, we got lots of early help from Arnaldo
Carvalho de Melo, Stephane Eranian, Jiri Olsa and Andi Kleen.
C2C BLOG
--------
Check Joe's blog on c2c tool for detailed use case explanation:
https://joemario.github.io/blog/2016/09/01/c2c-blog/
SEE ALSO
--------
linkperf:perf-record[1], linkperf:perf-mem[1]
......@@ -8,6 +8,8 @@ perf-config - Get and set variables in a configuration file.
SYNOPSIS
--------
[verse]
'perf config' [<file-option>] [section.name[=value] ...]
or
'perf config' [<file-option>] -l | --list
DESCRIPTION
......@@ -118,6 +120,39 @@ Given a $HOME/.perfconfig like this:
children = true
group = true
You can hide the source code in the annotate view by setting the config to false with
% perf config annotate.hide_src_code=true
If you want to add or modify several config items, you can do so like this:
% perf config ui.show-headers=false kmem.default=slab
To modify the sort order of the report functionality in the user config file (i.e. `~/.perfconfig`), do
% perf config --user report sort-order=srcline
To change the colors of the selected line to other foreground and background colors
in the system config file (i.e. `$(sysconf)/perfconfig`), do
% perf config --system colors.selected=yellow,green
To query the record mode of call graph, do
% perf config call-graph.record-mode
If you want to query multiple config key/value pairs, you can do so like this:
% perf config report.queue-size call-graph.order report.children
To query the config value of the call graph sort order in the user config file (i.e. `~/.perfconfig`), do
% perf config --user call-graph.sort-order
To query the config value of the buildid directory in the system config file (i.e. `$(sysconf)/perfconfig`), do
% perf config --system buildid.dir
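To list all configured key/value pairs (see the synopsis above), do
% perf config -l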
Variables
~~~~~~~~~
......
......@@ -61,6 +61,13 @@ OPTIONS
default, but this option shows live (currently allocated) pages
instead. (This option works with --page option only)
--time::
Only analyze samples within given time window: <start>,<stop>. Times
have the format seconds.microseconds. If start is not given (i.e., time
string is ',x.y') then analysis starts at the beginning of the file. If
stop time is not given (i.e., time string is 'x.y,') then analysis goes
to end of file.
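For example, to analyze only samples with timestamps up to 12345.678901
(illustrative values, in the format described above):
perf kmem stat --time ,12345.678901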
SEE ALSO
--------
linkperf:perf-record[1]
......@@ -45,9 +45,9 @@ OPTIONS
param1 and param2 are defined as formats for the PMU in:
/sys/bus/event_source/devices/<pmu>/format/*
There are also some params which are not defined in .../<pmu>/format/*.
There are also some parameters which are not defined in .../<pmu>/format/*.
These params can be used to overload default config values per event.
Here is a list of the params.
Here are some common parameters:
- 'period': Set event sampling period
- 'freq': Set event sampling frequency
- 'time': Disable/enable time stamping. Acceptable values are 1 for
......@@ -57,8 +57,11 @@ OPTIONS
FP mode, "dwarf" for DWARF mode, "lbr" for LBR mode and
"no" for disable callgraph.
- 'stack-size': user stack size for dwarf mode
See the linkperf:perf-list[1] man page for more parameters.
Note: If user explicitly sets options which conflict with the params,
the value set by the params will be overridden.
the value set by the parameters will be overridden.
Also not defined in .../<pmu>/format/* are PMU driver specific
configuration parameters. Any configuration parameter preceded by
......
......@@ -239,7 +239,8 @@ OPTIONS
Accumulate callchain of children to parent entry so that they can
show up in the output. The output will have a new "Children" column
and will be sorted on the data. It requires callchains are recorded.
See the `overhead calculation' section for more details.
See the `overhead calculation' section for more details. Enabled by
default, disable with --no-children.
--max-stack::
Set the stack depth limit when parsing the callchain, anything
......@@ -382,6 +383,13 @@ OPTIONS
--header-only::
Show only perf.data header (forces --stdio).
--time::
Only analyze samples within given time window: <start>,<stop>. Times
have the format seconds.microseconds. If start is not given (i.e., time
string is ',x.y') then analysis starts at the beginning of the file. If
stop time is not given (i.e., time string is 'x.y,') then analysis goes
to end of file.
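For example, to restrict the report to samples between two timestamps
(illustrative values, in the seconds.microseconds format described above):
perf report --time 12340.000000,12345.500000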
--itrace::
Options for decoding instruction tracing data. The options are:
......
......@@ -8,11 +8,11 @@ perf-sched - Tool to trace/measure scheduler properties (latencies)
SYNOPSIS
--------
[verse]
'perf sched' {record|latency|map|replay|script}
'perf sched' {record|latency|map|replay|script|timehist}
DESCRIPTION
-----------
There are five variants of perf sched:
There are several variants of 'perf sched':
'perf sched record <command>' to record the scheduling events
of an arbitrary workload.
......@@ -36,6 +36,30 @@ There are five variants of perf sched:
are running on a CPU. A '*' denotes the CPU that had the event, and
a dot signals an idle CPU.
'perf sched timehist' provides an analysis of scheduling events.
Example usage:
perf sched record -- sleep 1
perf sched timehist
By default it shows the individual schedule events, including the wait
time (time between sched-out and next sched-in events for the task), the
task scheduling delay (time between wakeup and actually running) and run
time for the task:
time cpu task name wait time sch delay run time
[tid/pid] (msec) (msec) (msec)
-------------- ------ -------------------- --------- --------- ---------
79371.874569 [0011] gcc[31949] 0.014 0.000 1.148
79371.874591 [0010] gcc[31951] 0.000 0.000 0.024
79371.874603 [0010] migration/10[59] 3.350 0.004 0.011
79371.874604 [0011] <idle> 1.148 0.000 0.035
79371.874723 [0005] <idle> 0.016 0.000 1.383
79371.874746 [0005] gcc[31949] 0.153 0.078 0.022
...
Times are in msec.usec.
OPTIONS
-------
-i::
......@@ -66,6 +90,56 @@ OPTIONS for 'perf sched map'
--color-pids::
Highlight the given pids.
OPTIONS for 'perf sched timehist'
---------------------------------
-k::
--vmlinux=<file>::
vmlinux pathname
--kallsyms=<file>::
kallsyms pathname
-g::
--no-call-graph::
Do not display call chains if present.
--max-stack::
Maximum number of functions to display in backtrace, default 5.
-s::
--summary::
Show only a summary of scheduling by thread with min, max, and average
run times (in sec) and relative stddev.
-S::
--with-summary::
Show all scheduling events followed by a summary by thread with min,
max, and average run times (in sec) and relative stddev.
--symfs=<directory>::
Look for files with symbols relative to this directory.
-V::
--cpu-visual::
Show visual aid for sched switches by CPU: 'i' marks idle time,
's' marks scheduler events.
-w::
--wakeups::
Show wakeup events.
-M::
--migrations::
Show migration events.
--time::
Only analyze samples within given time window: <start>,<stop>. Times
have the format seconds.microseconds. If start is not given (i.e., time
string is ',x.y') then analysis starts at the beginning of the file. If
stop time is not given (i.e., time string is 'x.y,') then analysis goes
to end of file.
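For example, to show wakeup and migration events within a narrow time window taken
from the sample output above (illustrative values, combining the options listed here):
perf sched timehist -w -M --time 79371.874000,79371.875000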
SEE ALSO
--------
linkperf:perf-record[1]
......@@ -117,7 +117,7 @@ OPTIONS
Comma separated list of fields to print. Options are:
comm, tid, pid, time, cpu, event, trace, ip, sym, dso, addr, symoff,
srcline, period, iregs, brstack, brstacksym, flags, bpf-output,
callindent. Field list can be prepended with the type, trace, sw or hw,
callindent, insn, insnlen. Field list can be prepended with the type, trace, sw or hw,
to indicate to which event type the field list applies.
e.g., -F sw:comm,tid,time,ip,sym and -F trace:time,cpu,trace
......@@ -181,6 +181,10 @@ OPTIONS
Instruction Trace decoding. For calls and returns, it will display the
name of the symbol indented with spaces to reflect the stack depth.
When doing instruction trace decoding, insn and insnlen give the
instruction bytes and the instruction length of the current
instruction.
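For example, on a perf.data file that contains instruction trace data (e.g. Intel PT),
an illustrative invocation that prints the instruction bytes and length along with the
usual fields is:
perf script -F comm,tid,time,ip,sym,insn,insnlen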
Finally, a user may not set fields to none for all event types.
i.e., -F "" is not allowed.
......@@ -208,6 +212,9 @@ OPTIONS
--hide-call-graph::
When printing symbols do not display call chain.
--stop-bt::
Stop display of callgraph at these symbols
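For example, to truncate printed call chains at main (illustrative symbol name):
perf script --stop-bt main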
-C::
--cpu:: Only report samples for the list of CPUs provided. Multiple CPUs can
be provided as a comma-separated list with no space: 0,1. Ranges of
......@@ -285,6 +292,13 @@ include::itrace.txt[]
--force::
Don't do ownership validation.
--time::
Only analyze samples within given time window: <start>,<stop>. Times
have the format seconds.microseconds. If start is not given (i.e., time
string is ',x.y') then analysis starts at the beginning of the file. If
stop time is not given (i.e., time string is 'x.y,') then analysis goes
to end of file.
SEE ALSO
--------
linkperf:perf-record[1], linkperf:perf-script-perl[1],
......
......@@ -170,6 +170,7 @@ Default is to monitor all CPUS.
show up in the output. The output will have a new "Children" column
and will be sorted on the data. It requires -g/--call-graph option
enabled. See the `overhead calculation' section for more details.
Enabled by default, disable with --no-children.
--max-stack::
Set the stack depth limit when parsing the callchain, anything
......
......@@ -39,6 +39,11 @@ OPTIONS
Prefixing with ! shows all syscalls but the ones specified. You may
need to escape it.
-D msecs::
--delay msecs::
After starting the program, wait msecs before measuring. This is useful to
filter out the startup phase of the program, which is often very different.
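For example, to skip the first 500 msecs after program start (illustrative workload):
perf trace -D 500 sleep 1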
-o::
--output=::
Output file name.
......
......@@ -51,6 +51,7 @@ tools/include/asm-generic/bitops/arch_hweight.h
tools/include/asm-generic/bitops/atomic.h
tools/include/asm-generic/bitops/const_hweight.h
tools/include/asm-generic/bitops/__ffs.h
tools/include/asm-generic/bitops/__ffz.h
tools/include/asm-generic/bitops/__fls.h
tools/include/asm-generic/bitops/find.h
tools/include/asm-generic/bitops/fls64.h
......
......@@ -136,6 +136,7 @@ endif
# Treat warnings as errors unless directed not to
ifneq ($(WERROR),0)
CFLAGS += -Werror
CXXFLAGS += -Werror
endif
ifndef DEBUG
......@@ -182,6 +183,13 @@ CFLAGS += -Wall
CFLAGS += -Wextra
CFLAGS += -std=gnu99
CXXFLAGS += -std=gnu++11 -fno-exceptions -fno-rtti
CXXFLAGS += -Wall
CXXFLAGS += -fno-omit-frame-pointer
CXXFLAGS += -ggdb3
CXXFLAGS += -funwind-tables
CXXFLAGS += -Wno-strict-aliasing
# Enforce a non-executable stack, as we may regress (again) in the future by
# adding assembler files missing the .GNU-stack linker note.
LDFLAGS += -Wl,-z,noexecstack
......@@ -204,24 +212,27 @@ ifeq ($(DEBUG),0)
endif
endif
CFLAGS += -I$(src-perf)/util/include
CFLAGS += -I$(src-perf)/arch/$(ARCH)/include
CFLAGS += -I$(srctree)/tools/include/uapi
CFLAGS += -I$(srctree)/tools/include/
CFLAGS += -I$(srctree)/tools/arch/$(ARCH)/include/uapi
CFLAGS += -I$(srctree)/tools/arch/$(ARCH)/include/
CFLAGS += -I$(srctree)/tools/arch/$(ARCH)/
INC_FLAGS += -I$(src-perf)/util/include
INC_FLAGS += -I$(src-perf)/arch/$(ARCH)/include
INC_FLAGS += -I$(srctree)/tools/include/uapi
INC_FLAGS += -I$(srctree)/tools/include/
INC_FLAGS += -I$(srctree)/tools/arch/$(ARCH)/include/uapi
INC_FLAGS += -I$(srctree)/tools/arch/$(ARCH)/include/
INC_FLAGS += -I$(srctree)/tools/arch/$(ARCH)/
# $(obj-perf) for generated common-cmds.h
# $(obj-perf)/util for generated bison/flex headers
ifneq ($(OUTPUT),)
CFLAGS += -I$(obj-perf)/util
CFLAGS += -I$(obj-perf)
INC_FLAGS += -I$(obj-perf)/util
INC_FLAGS += -I$(obj-perf)
endif
CFLAGS += -I$(src-perf)/util
CFLAGS += -I$(src-perf)
CFLAGS += -I$(srctree)/tools/lib/
INC_FLAGS += -I$(src-perf)/util
INC_FLAGS += -I$(src-perf)
INC_FLAGS += -I$(srctree)/tools/lib/
CFLAGS += $(INC_FLAGS)
CXXFLAGS += $(INC_FLAGS)
CFLAGS += -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE
......@@ -366,7 +377,7 @@ ifndef NO_SDT
endif
ifdef PERF_HAVE_JITDUMP
ifndef NO_DWARF
ifndef NO_LIBELF
$(call detected,CONFIG_JITDUMP)
CFLAGS += -DHAVE_JITDUMP
endif
......@@ -758,6 +769,62 @@ ifndef NO_AUXTRACE
endif
endif
ifndef NO_JVMTI
ifneq (,$(wildcard /usr/sbin/update-java-alternatives))
JDIR=$(shell /usr/sbin/update-java-alternatives -l | head -1 | awk '{print $$3}')
else
ifneq (,$(wildcard /usr/sbin/alternatives))
JDIR=$(shell alternatives --display java | tail -1 | cut -d' ' -f 5 | sed 's%/jre/bin/java.%%g')
endif
endif
ifndef JDIR
$(warning No alternatives command found, you need to set JDIR= to point to the root of your Java directory)
NO_JVMTI := 1
endif
endif
ifndef NO_JVMTI
FEATURE_CHECK_CFLAGS-jvmti := -I$(JDIR)/include -I$(JDIR)/include/linux
$(call feature_check,jvmti)
ifeq ($(feature-jvmti), 1)
$(call detected_var,JDIR)
else
$(warning No openjdk development package found, please install JDK package)
NO_JVMTI := 1
endif
endif
USE_CXX = 0
USE_CLANGLLVM = 0
ifdef LIBCLANGLLVM
$(call feature_check,cxx)
ifneq ($(feature-cxx), 1)
msg := $(warning No g++ found, disabling clang and llvm support. Please install g++)
else
$(call feature_check,llvm)
$(call feature_check,llvm-version)
ifneq ($(feature-llvm), 1)
msg := $(warning No suitable libLLVM found, disabling builtin clang and LLVM support. Please install llvm-dev(el) (>= 3.9.0))
else
$(call feature_check,clang)
ifneq ($(feature-clang), 1)
msg := $(warning No suitable libclang found, disabling builtin clang and LLVM support. Please install libclang-dev(el) (>= 3.9.0))
else
CFLAGS += -DHAVE_LIBCLANGLLVM_SUPPORT
CXXFLAGS += -DHAVE_LIBCLANGLLVM_SUPPORT -I$(shell $(LLVM_CONFIG) --includedir)
$(call detected,CONFIG_CXX)
$(call detected,CONFIG_CLANGLLVM)
USE_CXX = 1
USE_LLVM = 1
USE_CLANG = 1
ifneq ($(feature-llvm-version),1)
msg := $(warning This version of LLVM is not tested. May cause build errors)
endif
endif
endif
endif
endif
# Among the variables below, these:
# perfexecdir
# template_dir
......@@ -850,6 +917,7 @@ ifeq ($(VF),1)
$(call print_var,sysconfdir)
$(call print_var,LIBUNWIND_DIR)
$(call print_var,LIBDW_DIR)
$(call print_var,JDIR)
ifeq ($(dwarf-post-unwind),1)
$(call feature_print_text,"DWARF post unwind library", $(dwarf-post-unwind-text))
......
#include <sys/types.h>
#include <regex.h>
struct arm_annotate {
regex_t call_insn,
jump_insn;
};
static struct ins_ops *arm__associate_instruction_ops(struct arch *arch, const char *name)
{
struct arm_annotate *arm = arch->priv;
struct ins_ops *ops;
regmatch_t match[2];
if (!regexec(&arm->call_insn, name, 2, match, 0))
ops = &call_ops;
else if (!regexec(&arm->jump_insn, name, 2, match, 0))
ops = &jump_ops;
else
return NULL;
arch__associate_ins_ops(arch, name, ops);
return ops;
}
static int arm__annotate_init(struct arch *arch)
{
struct arm_annotate *arm;
int err;
if (arch->initialized)
return 0;
arm = zalloc(sizeof(*arm));
if (!arm)
return -1;
#define ARM_CONDS "(cc|cs|eq|ge|gt|hi|le|ls|lt|mi|ne|pl|vc|vs)"
err = regcomp(&arm->call_insn, "^blx?" ARM_CONDS "?$", REG_EXTENDED);
if (err)
goto out_free_arm;
err = regcomp(&arm->jump_insn, "^bx?" ARM_CONDS "?$", REG_EXTENDED);
if (err)
goto out_free_call;
#undef ARM_CONDS
arch->initialized = true;
arch->priv = arm;
arch->associate_instruction_ops = arm__associate_instruction_ops;
arch->objdump.comment_char = ';';
arch->objdump.skip_functions_char = '+';
return 0;
out_free_call:
regfree(&arm->call_insn);
out_free_arm:
free(arm);
return -1;
}
......@@ -575,8 +575,6 @@ static FILE *cs_device__open_file(const char *name)
snprintf(path, PATH_MAX,
"%s" CS_BUS_DEVICE_PATH "%s", sysfs, name);
printf("path: %s\n", path);
if (stat(path, &st) < 0)
return NULL;
......
#include <sys/types.h>
#include <regex.h>
struct arm64_annotate {
regex_t call_insn,
jump_insn;
};
static struct ins_ops *arm64__associate_instruction_ops(struct arch *arch, const char *name)
{
struct arm64_annotate *arm = arch->priv;
struct ins_ops *ops;
regmatch_t match[2];
if (!regexec(&arm->jump_insn, name, 2, match, 0))
ops = &jump_ops;
else if (!regexec(&arm->call_insn, name, 2, match, 0))
ops = &call_ops;
else if (!strcmp(name, "ret"))
ops = &ret_ops;
else
return NULL;
arch__associate_ins_ops(arch, name, ops);
return ops;
}
static int arm64__annotate_init(struct arch *arch)
{
struct arm64_annotate *arm;
int err;
if (arch->initialized)
return 0;
arm = zalloc(sizeof(*arm));
if (!arm)
return -1;
/* bl, blr */
err = regcomp(&arm->call_insn, "^blr?$", REG_EXTENDED);
if (err)
goto out_free_arm;
/* b, b.cond, br, cbz/cbnz, tbz/tbnz */
err = regcomp(&arm->jump_insn, "^[ct]?br?\\.?(cc|cs|eq|ge|gt|hi|le|ls|lt|mi|ne|pl)?n?z?$",
REG_EXTENDED);
if (err)
goto out_free_call;
arch->initialized = true;
arch->priv = arm;
arch->associate_instruction_ops = arm64__associate_instruction_ops;
arch->objdump.comment_char = ';';
arch->objdump.skip_functions_char = '+';
return 0;
out_free_call:
regfree(&arm->call_insn);
out_free_arm:
free(arm);
return -1;
}
static struct ins_ops *powerpc__associate_instruction_ops(struct arch *arch, const char *name)
{
int i;
struct ins_ops *ops;
/*
* - Interested only if instruction starts with 'b'.
* - A few start with 'b', but aren't branch instructions.
*/
if (name[0] != 'b' ||
!strncmp(name, "bcd", 3) ||
!strncmp(name, "brinc", 5) ||
!strncmp(name, "bper", 4))
return NULL;
ops = &jump_ops;
i = strlen(name) - 1;
if (i < 0)
return NULL;
/* ignore optional hints at the end of the instructions */
if (name[i] == '+' || name[i] == '-')
i--;
if (name[i] == 'l' || (name[i] == 'a' && name[i-1] == 'l')) {
/*
* if the instruction ends with 'l' or 'la', then
* those are considered 'calls' since they update LR.
* ... except for 'bnl' which is branch if not less than
* and the absolute form of the same.
*/
if (strcmp(name, "bnl") && strcmp(name, "bnl+") &&
strcmp(name, "bnl-") && strcmp(name, "bnla") &&
strcmp(name, "bnla+") && strcmp(name, "bnla-"))
ops = &call_ops;
}
if (name[i] == 'r' && name[i-1] == 'l')
/*
* instructions ending with 'lr' are considered to be
* return instructions
*/
ops = &ret_ops;
arch__associate_ins_ops(arch, name, ops);
return ops;
}
static int powerpc__annotate_init(struct arch *arch)
{
if (!arch->initialized) {
arch->initialized = true;
arch->associate_instruction_ops = powerpc__associate_instruction_ops;
arch->objdump.comment_char = '#';
}
return 0;
}
static struct ins x86__instructions[] = {
{ .name = "add", .ops = &mov_ops, },
{ .name = "addl", .ops = &mov_ops, },
{ .name = "addq", .ops = &mov_ops, },
{ .name = "addw", .ops = &mov_ops, },
{ .name = "and", .ops = &mov_ops, },
{ .name = "bts", .ops = &mov_ops, },
{ .name = "call", .ops = &call_ops, },
{ .name = "callq", .ops = &call_ops, },
{ .name = "cmp", .ops = &mov_ops, },
{ .name = "cmpb", .ops = &mov_ops, },
{ .name = "cmpl", .ops = &mov_ops, },
{ .name = "cmpq", .ops = &mov_ops, },
{ .name = "cmpw", .ops = &mov_ops, },
{ .name = "cmpxch", .ops = &mov_ops, },
{ .name = "dec", .ops = &dec_ops, },
{ .name = "decl", .ops = &dec_ops, },
{ .name = "imul", .ops = &mov_ops, },
{ .name = "inc", .ops = &dec_ops, },
{ .name = "incl", .ops = &dec_ops, },
{ .name = "ja", .ops = &jump_ops, },
{ .name = "jae", .ops = &jump_ops, },
{ .name = "jb", .ops = &jump_ops, },
{ .name = "jbe", .ops = &jump_ops, },
{ .name = "jc", .ops = &jump_ops, },
{ .name = "jcxz", .ops = &jump_ops, },
{ .name = "je", .ops = &jump_ops, },
{ .name = "jecxz", .ops = &jump_ops, },
{ .name = "jg", .ops = &jump_ops, },
{ .name = "jge", .ops = &jump_ops, },
{ .name = "jl", .ops = &jump_ops, },
{ .name = "jle", .ops = &jump_ops, },
{ .name = "jmp", .ops = &jump_ops, },
{ .name = "jmpq", .ops = &jump_ops, },
{ .name = "jna", .ops = &jump_ops, },
{ .name = "jnae", .ops = &jump_ops, },
{ .name = "jnb", .ops = &jump_ops, },
{ .name = "jnbe", .ops = &jump_ops, },
{ .name = "jnc", .ops = &jump_ops, },
{ .name = "jne", .ops = &jump_ops, },
{ .name = "jng", .ops = &jump_ops, },
{ .name = "jnge", .ops = &jump_ops, },
{ .name = "jnl", .ops = &jump_ops, },
{ .name = "jnle", .ops = &jump_ops, },
{ .name = "jno", .ops = &jump_ops, },
{ .name = "jnp", .ops = &jump_ops, },
{ .name = "jns", .ops = &jump_ops, },
{ .name = "jnz", .ops = &jump_ops, },
{ .name = "jo", .ops = &jump_ops, },
{ .name = "jp", .ops = &jump_ops, },
{ .name = "jpe", .ops = &jump_ops, },
{ .name = "jpo", .ops = &jump_ops, },
{ .name = "jrcxz", .ops = &jump_ops, },
{ .name = "js", .ops = &jump_ops, },
{ .name = "jz", .ops = &jump_ops, },
{ .name = "lea", .ops = &mov_ops, },
{ .name = "lock", .ops = &lock_ops, },
{ .name = "mov", .ops = &mov_ops, },
{ .name = "movb", .ops = &mov_ops, },
{ .name = "movdqa", .ops = &mov_ops, },
{ .name = "movl", .ops = &mov_ops, },
{ .name = "movq", .ops = &mov_ops, },
{ .name = "movslq", .ops = &mov_ops, },
{ .name = "movzbl", .ops = &mov_ops, },
{ .name = "movzwl", .ops = &mov_ops, },
{ .name = "nop", .ops = &nop_ops, },
{ .name = "nopl", .ops = &nop_ops, },
{ .name = "nopw", .ops = &nop_ops, },
{ .name = "or", .ops = &mov_ops, },
{ .name = "orl", .ops = &mov_ops, },
{ .name = "test", .ops = &mov_ops, },
{ .name = "testb", .ops = &mov_ops, },
{ .name = "testl", .ops = &mov_ops, },
{ .name = "xadd", .ops = &mov_ops, },
{ .name = "xbeginl", .ops = &jump_ops, },
{ .name = "xbeginq", .ops = &jump_ops, },
{ .name = "retq", .ops = &ret_ops, },
};
......@@ -335,6 +335,9 @@
326 common copy_file_range sys_copy_file_range
327 64 preadv2 sys_preadv2
328 64 pwritev2 sys_pwritev2
329 common pkey_mprotect sys_pkey_mprotect
330 common pkey_alloc sys_pkey_alloc
331 common pkey_free sys_pkey_free
#
# x32-specific system call numbers start at 512 to avoid cache impact
......@@ -374,5 +377,5 @@
543 x32 io_setup compat_sys_io_setup
544 x32 io_submit compat_sys_io_submit
545 x32 execveat compat_sys_execveat/ptregs
534 x32 preadv2 compat_sys_preadv2
535 x32 pwritev2 compat_sys_pwritev2
546 x32 preadv2 compat_sys_preadv64v2
547 x32 pwritev2 compat_sys_pwritev64v2
......@@ -4,27 +4,27 @@
struct test arch_tests[] = {
{
.desc = "x86 rdpmc test",
.desc = "x86 rdpmc",
.func = test__rdpmc,
},
{
.desc = "Test converting perf time to TSC",
.desc = "Convert perf time to TSC",
.func = test__perf_time_to_tsc,
},
#ifdef HAVE_DWARF_UNWIND_SUPPORT
{
.desc = "Test dwarf unwind",
.desc = "DWARF unwind",
.func = test__dwarf_unwind,
},
#endif
#ifdef HAVE_AUXTRACE_SUPPORT
{
.desc = "Test x86 instruction decoder - new instructions",
.desc = "x86 instruction decoder - new instructions",
.func = test__insn_x86,
},
#endif
{
.desc = "Test intel cqm nmi context read",
.desc = "Intel cqm nmi context read",
.func = test__intel_cqm_count_nmi_context,
},
{
......
......@@ -63,8 +63,9 @@ static const char * const bench_futex_hash_usage[] = {
static void *workerfn(void *arg)
{
int ret;
unsigned int i;
struct worker *w = (struct worker *) arg;
unsigned int i;
unsigned long ops = w->ops; /* avoid cacheline bouncing */
pthread_mutex_lock(&thread_lock);
threads_starting--;
......@@ -74,7 +75,7 @@ static void *workerfn(void *arg)
pthread_mutex_unlock(&thread_lock);
do {
for (i = 0; i < nfutexes; i++, w->ops++) {
for (i = 0; i < nfutexes; i++, ops++) {
/*
* We want the futex calls to fail in order to stress
* the hashing of uaddr and not measure other steps,
......@@ -88,6 +89,7 @@ static void *workerfn(void *arg)
}
} while (!done);
w->ops = ops;
return NULL;
}
......@@ -128,6 +130,8 @@ int bench_futex_hash(int argc, const char **argv,
}
ncpus = sysconf(_SC_NPROCESSORS_ONLN);
nsecs = futexbench_sanitize_numeric(nsecs);
nfutexes = futexbench_sanitize_numeric(nfutexes);
sigfillset(&act.sa_mask);
act.sa_sigaction = toggle_done;
......@@ -135,6 +139,8 @@ int bench_futex_hash(int argc, const char **argv,
if (!nthreads) /* default to the number of CPUs */
nthreads = ncpus;
else
nthreads = futexbench_sanitize_numeric(nthreads);
worker = calloc(nthreads, sizeof(*worker));
if (!worker)
......
......@@ -75,6 +75,7 @@ static void toggle_done(int sig __maybe_unused,
static void *workerfn(void *arg)
{
struct worker *w = (struct worker *) arg;
unsigned long ops = w->ops;
pthread_mutex_lock(&thread_lock);
threads_starting--;
......@@ -103,9 +104,10 @@ static void *workerfn(void *arg)
if (ret && !silent)
warn("thread %d: Could not unlock pi-lock for %p (%d)",
w->tid, w->futex, ret);
w->ops++; /* account for thread's share of work */
ops++; /* account for thread's share of work */
} while (!done);
w->ops = ops;
return NULL;
}
......@@ -150,6 +152,7 @@ int bench_futex_lock_pi(int argc, const char **argv,
goto err;
ncpus = sysconf(_SC_NPROCESSORS_ONLN);
nsecs = futexbench_sanitize_numeric(nsecs);
sigfillset(&act.sa_mask);
act.sa_sigaction = toggle_done;
......@@ -157,6 +160,8 @@ int bench_futex_lock_pi(int argc, const char **argv,
if (!nthreads)
nthreads = ncpus;
else
nthreads = futexbench_sanitize_numeric(nthreads);
worker = calloc(nthreads, sizeof(*worker));
if (!worker)
......
......@@ -128,6 +128,8 @@ int bench_futex_requeue(int argc, const char **argv,
if (!nthreads)
nthreads = ncpus;
else
nthreads = futexbench_sanitize_numeric(nthreads);
worker = calloc(nthreads, sizeof(*worker));
if (!worker)
......
......@@ -217,8 +217,12 @@ int bench_futex_wake_parallel(int argc, const char **argv,
sigaction(SIGINT, &act, NULL);
ncpus = sysconf(_SC_NPROCESSORS_ONLN);
nwaking_threads = futexbench_sanitize_numeric(nwaking_threads);
if (!nblocked_threads)
nblocked_threads = ncpus;
else
nblocked_threads = futexbench_sanitize_numeric(nblocked_threads);
/* some sanity checks */
if (nwaking_threads > nblocked_threads || !nwaking_threads)
......
......@@ -129,6 +129,7 @@ int bench_futex_wake(int argc, const char **argv,
}
ncpus = sysconf(_SC_NPROCESSORS_ONLN);
nwakes = futexbench_sanitize_numeric(nwakes);
sigfillset(&act.sa_mask);
act.sa_sigaction = toggle_done;
......@@ -136,6 +137,8 @@ int bench_futex_wake(int argc, const char **argv,
if (!nthreads)
nthreads = ncpus;
else
nthreads = futexbench_sanitize_numeric(nthreads);
worker = calloc(nthreads, sizeof(*worker));
if (!worker)
......
......@@ -7,6 +7,7 @@
#ifndef _FUTEX_H
#define _FUTEX_H
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/types.h>
......@@ -99,4 +100,7 @@ static inline int pthread_attr_setaffinity_np(pthread_attr_t *attr,
}
#endif
/* User input sanitization */
#define futexbench_sanitize_numeric(__n) abs((__n))
#endif /* _FUTEX_H */
......@@ -106,9 +106,10 @@ static double timeval2double(struct timeval *ts)
struct bench_mem_info {
const struct function *functions;
u64 (*do_cycles)(const struct function *r, size_t size);
double (*do_gettimeofday)(const struct function *r, size_t size);
u64 (*do_cycles)(const struct function *r, size_t size, void *src, void *dst);
double (*do_gettimeofday)(const struct function *r, size_t size, void *src, void *dst);
const char *const *usage;
bool alloc_src;
};
static void __bench_mem_function(struct bench_mem_info *info, int r_idx, size_t size, double size_total)
......@@ -116,16 +117,26 @@ static void __bench_mem_function(struct bench_mem_info *info, int r_idx, size_t
const struct function *r = &info->functions[r_idx];
double result_bps = 0.0;
u64 result_cycles = 0;
void *src = NULL, *dst = zalloc(size);
printf("# function '%s' (%s)\n", r->name, r->desc);
if (dst == NULL)
goto out_alloc_failed;
if (info->alloc_src) {
src = zalloc(size);
if (src == NULL)
goto out_alloc_failed;
}
if (bench_format == BENCH_FORMAT_DEFAULT)
printf("# Copying %s bytes ...\n\n", size_str);
if (use_cycles) {
result_cycles = info->do_cycles(r, size);
result_cycles = info->do_cycles(r, size, src, dst);
} else {
result_bps = info->do_gettimeofday(r, size);
result_bps = info->do_gettimeofday(r, size, src, dst);
}
switch (bench_format) {
......@@ -149,6 +160,14 @@ static void __bench_mem_function(struct bench_mem_info *info, int r_idx, size_t
BUG_ON(1);
break;
}
out_free:
free(src);
free(dst);
return;
out_alloc_failed:
printf("# Memory allocation failed - maybe size (%s) is too large?\n", size_str);
goto out_free;
}
static int bench_mem_common(int argc, const char **argv, struct bench_mem_info *info)
......@@ -201,28 +220,14 @@ static int bench_mem_common(int argc, const char **argv, struct bench_mem_info *
return 0;
}
static void memcpy_alloc_mem(void **dst, void **src, size_t size)
{
*dst = zalloc(size);
if (!*dst)
die("memory allocation failed - maybe size is too large?\n");
*src = zalloc(size);
if (!*src)
die("memory allocation failed - maybe size is too large?\n");
/* Make sure to always prefault zero pages even if MMAP_THRESH is crossed: */
memset(*src, 0, size);
}
static u64 do_memcpy_cycles(const struct function *r, size_t size)
static u64 do_memcpy_cycles(const struct function *r, size_t size, void *src, void *dst)
{
u64 cycle_start = 0ULL, cycle_end = 0ULL;
void *src = NULL, *dst = NULL;
memcpy_t fn = r->fn.memcpy;
int i;
memcpy_alloc_mem(&dst, &src, size);
/* Make sure to always prefault zero pages even if MMAP_THRESH is crossed: */
memset(src, 0, size);
/*
* We prefault the freshly allocated memory range here,
......@@ -235,20 +240,15 @@ static u64 do_memcpy_cycles(const struct function *r, size_t size)
fn(dst, src, size);
cycle_end = get_cycles();
free(src);
free(dst);
return cycle_end - cycle_start;
}
static double do_memcpy_gettimeofday(const struct function *r, size_t size)
static double do_memcpy_gettimeofday(const struct function *r, size_t size, void *src, void *dst)
{
struct timeval tv_start, tv_end, tv_diff;
memcpy_t fn = r->fn.memcpy;
void *src = NULL, *dst = NULL;
int i;
memcpy_alloc_mem(&dst, &src, size);
/*
* We prefault the freshly allocated memory range here,
* to not measure page fault overhead:
......@@ -262,9 +262,6 @@ static double do_memcpy_gettimeofday(const struct function *r, size_t size)
timersub(&tv_end, &tv_start, &tv_diff);
free(src);
free(dst);
return (double)(((double)size * nr_loops) / timeval2double(&tv_diff));
}
......@@ -294,27 +291,18 @@ int bench_mem_memcpy(int argc, const char **argv, const char *prefix __maybe_unu
.do_cycles = do_memcpy_cycles,
.do_gettimeofday = do_memcpy_gettimeofday,
.usage = bench_mem_memcpy_usage,
.alloc_src = true,
};
return bench_mem_common(argc, argv, &info);
}
static void memset_alloc_mem(void **dst, size_t size)
{
*dst = zalloc(size);
if (!*dst)
die("memory allocation failed - maybe size is too large?\n");
}
static u64 do_memset_cycles(const struct function *r, size_t size)
static u64 do_memset_cycles(const struct function *r, size_t size, void *src __maybe_unused, void *dst)
{
u64 cycle_start = 0ULL, cycle_end = 0ULL;
memset_t fn = r->fn.memset;
void *dst = NULL;
int i;
memset_alloc_mem(&dst, size);
/*
* We prefault the freshly allocated memory range here,
* to not measure page fault overhead:
......@@ -326,19 +314,15 @@ static u64 do_memset_cycles(const struct function *r, size_t size)
fn(dst, i, size);
cycle_end = get_cycles();
free(dst);
return cycle_end - cycle_start;
}
static double do_memset_gettimeofday(const struct function *r, size_t size)
static double do_memset_gettimeofday(const struct function *r, size_t size, void *src __maybe_unused, void *dst)
{
struct timeval tv_start, tv_end, tv_diff;
memset_t fn = r->fn.memset;
void *dst = NULL;
int i;
memset_alloc_mem(&dst, size);
/*
* We prefault the freshly allocated memory range here,
* to not measure page fault overhead:
......@@ -352,7 +336,6 @@ static double do_memset_gettimeofday(const struct function *r, size_t size)
timersub(&tv_end, &tv_start, &tv_diff);
free(dst);
return (double)(((double)size * nr_loops) / timeval2double(&tv_diff));
}
......
......@@ -17,7 +17,7 @@
static bool use_system_config, use_user_config;
static const char * const config_usage[] = {
"perf config [<file-option>] [options]",
"perf config [<file-option>] [options] [section.name[=value] ...]",
NULL
};
......@@ -33,6 +33,73 @@ static struct option config_options[] = {
OPT_END()
};
static int set_config(struct perf_config_set *set, const char *file_name,
const char *var, const char *value)
{
struct perf_config_section *section = NULL;
struct perf_config_item *item = NULL;
const char *first_line = "# this file is auto-generated.";
FILE *fp;
if (set == NULL)
return -1;
fp = fopen(file_name, "w");
if (!fp)
return -1;
perf_config_set__collect(set, file_name, var, value);
fprintf(fp, "%s\n", first_line);
/* overwrite config variables */
perf_config_items__for_each_entry(&set->sections, section) {
if (!use_system_config && section->from_system_config)
continue;
fprintf(fp, "[%s]\n", section->name);
perf_config_items__for_each_entry(&section->items, item) {
if (!use_system_config && section->from_system_config)
continue;
if (item->value)
fprintf(fp, "\t%s = %s\n",
item->name, item->value);
}
}
fclose(fp);
return 0;
}
static int show_spec_config(struct perf_config_set *set, const char *var)
{
struct perf_config_section *section;
struct perf_config_item *item;
if (set == NULL)
return -1;
perf_config_items__for_each_entry(&set->sections, section) {
if (prefixcmp(var, section->name) != 0)
continue;
perf_config_items__for_each_entry(&section->items, item) {
const char *name = var + strlen(section->name) + 1;
if (strcmp(name, item->name) == 0) {
char *value = item->value;
if (value) {
printf("%s=%s\n", var, value);
return 0;
}
}
}
}
return 0;
}
static int show_config(struct perf_config_set *set)
{
struct perf_config_section *section;
......@@ -52,9 +119,44 @@ static int show_config(struct perf_config_set *set)
return 0;
}
static int parse_config_arg(char *arg, char **var, char **value)
{
const char *last_dot = strchr(arg, '.');
/*
* Since "var" actually contains the section name and the real
* config variable name separated by a dot, we have to know where the dot is.
*/
if (last_dot == NULL || last_dot == arg) {
pr_err("The config variable does not contain a section name: %s\n", arg);
return -1;
}
if (!last_dot[1]) {
pr_err("The config variable does not contain a variable name: %s\n", arg);
return -1;
}
*value = strchr(arg, '=');
if (*value == NULL)
*var = arg;
else if (!strcmp(*value, "=")) {
pr_err("The config variable does not contain a value: %s\n", arg);
return -1;
} else {
*value = *value + 1; /* excluding a first character '=' */
*var = strsep(&arg, "=");
if (*var[0] == '\0') {
pr_err("invalid config variable: %s\n", arg);
return -1;
}
}
return 0;
}
int cmd_config(int argc, const char **argv, const char *prefix __maybe_unused)
{
int ret = 0;
int i, ret = 0;
struct perf_config_set *set;
char *user_config = mkpath("%s/.perfconfig", getenv("HOME"));
......@@ -100,7 +202,36 @@ int cmd_config(int argc, const char **argv, const char *prefix __maybe_unused)
}
break;
default:
usage_with_options(config_usage, config_options);
if (argc) {
for (i = 0; argv[i]; i++) {
char *var, *value;
char *arg = strdup(argv[i]);
if (!arg) {
pr_err("%s: strdup failed\n", __func__);
ret = -1;
break;
}
if (parse_config_arg(arg, &var, &value) < 0) {
free(arg);
ret = -1;
break;
}
if (value == NULL)
ret = show_spec_config(set, var);
else {
const char *config_filename = config_exclusive_filename;
if (!config_exclusive_filename)
config_filename = user_config;
ret = set_config(set, config_filename, var, value);
}
free(arg);
}
} else
usage_with_options(config_usage, config_options);
}
perf_config_set__delete(set);
......
......@@ -11,6 +11,7 @@
#include "util/session.h"
#include "util/tool.h"
#include "util/callchain.h"
#include "util/time-utils.h"
#include <subcmd/parse-options.h>
#include "util/trace-event.h"
......@@ -49,6 +50,7 @@ struct alloc_stat {
u64 ptr;
u64 bytes_req;
u64 bytes_alloc;
u64 last_alloc;
u32 hit;
u32 pingpong;
......@@ -62,9 +64,13 @@ static struct rb_root root_alloc_sorted;
static struct rb_root root_caller_stat;
static struct rb_root root_caller_sorted;
static unsigned long total_requested, total_allocated;
static unsigned long total_requested, total_allocated, total_freed;
static unsigned long nr_allocs, nr_cross_allocs;
/* filters for controlling start and stop of time of analysis */
static struct perf_time_interval ptime;
const char *time_str;
static int insert_alloc_stat(unsigned long call_site, unsigned long ptr,
int bytes_req, int bytes_alloc, int cpu)
{
......@@ -105,6 +111,8 @@ static int insert_alloc_stat(unsigned long call_site, unsigned long ptr,
}
data->call_site = call_site;
data->alloc_cpu = cpu;
data->last_alloc = bytes_alloc;
return 0;
}
......@@ -223,6 +231,8 @@ static int perf_evsel__process_free_event(struct perf_evsel *evsel,
if (!s_alloc)
return 0;
total_freed += s_alloc->last_alloc;
if ((short)sample->cpu != s_alloc->alloc_cpu) {
s_alloc->pingpong++;
......@@ -907,6 +917,15 @@ static int perf_evsel__process_page_free_event(struct perf_evsel *evsel,
return 0;
}
static bool perf_kmem__skip_sample(struct perf_sample *sample)
{
/* skip sample based on time? */
if (perf_time__skip_sample(&ptime, sample->time))
return true;
return false;
}
typedef int (*tracepoint_handler)(struct perf_evsel *evsel,
struct perf_sample *sample);
......@@ -926,6 +945,9 @@ static int process_sample_event(struct perf_tool *tool __maybe_unused,
return -1;
}
if (perf_kmem__skip_sample(sample))
return 0;
dump_printf(" ... thread: %s:%d\n", thread__comm_str(thread), thread->tid);
if (evsel->handler != NULL) {
......@@ -1128,6 +1150,11 @@ static void print_slab_summary(void)
printf("\n========================\n");
printf("Total bytes requested: %'lu\n", total_requested);
printf("Total bytes allocated: %'lu\n", total_allocated);
printf("Total bytes freed: %'lu\n", total_freed);
if (total_allocated > total_freed) {
printf("Net total bytes allocated: %'lu\n",
total_allocated - total_freed);
}
printf("Total bytes wasted on internal fragmentation: %'lu\n",
total_allocated - total_requested);
printf("Internal fragmentation: %f%%\n",
......@@ -1884,6 +1911,8 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
OPT_CALLBACK_NOOPT(0, "page", NULL, NULL, "Analyze page allocator",
parse_page_opt),
OPT_BOOLEAN(0, "live", &live_page, "Show live page stat"),
OPT_STRING(0, "time", &time_str, "str",
"Time span of interest (start,stop)"),
OPT_END()
};
const char *const kmem_subcommands[] = { "record", "stat", NULL };
......@@ -1944,6 +1973,11 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
symbol__init(&session->header.env);
if (perf_time__parse_str(&ptime, time_str) != 0) {
pr_err("Invalid time string\n");
return -EINVAL;
}
if (!strcmp(argv[0], "stat")) {
setlocale(LC_ALL, "");
......
......@@ -37,6 +37,7 @@
#include "util/llvm-utils.h"
#include "util/bpf-loader.h"
#include "util/trigger.h"
#include "util/perf-hooks.h"
#include "asm/bug.h"
#include <unistd.h>
......@@ -206,6 +207,12 @@ static void sig_handler(int sig)
done = 1;
}
static void sigsegv_handler(int sig)
{
perf_hooks__recover();
sighandler_dump_stack(sig);
}
static void record__sig_exit(void)
{
if (signr == -1)
......@@ -833,6 +840,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
signal(SIGCHLD, sig_handler);
signal(SIGINT, sig_handler);
signal(SIGTERM, sig_handler);
signal(SIGSEGV, sigsegv_handler);
if (rec->opts.auxtrace_snapshot_mode || rec->switch_output) {
signal(SIGUSR2, snapshot_sig_handler);
......@@ -970,6 +978,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
trigger_ready(&auxtrace_snapshot_trigger);
trigger_ready(&switch_output_trigger);
perf_hooks__invoke_record_start();
for (;;) {
unsigned long long hits = rec->samples;
......@@ -1114,6 +1123,8 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
}
}
perf_hooks__invoke_record_end();
if (!err && !quiet) {
char samples[128];
const char *postfix = rec->timestamp_filename ?
......
......@@ -36,7 +36,7 @@
#include "util/hist.h"
#include "util/data.h"
#include "arch/common.h"
#include "util/time-utils.h"
#include "util/auxtrace.h"
#include <dlfcn.h>
......@@ -59,6 +59,8 @@ struct report {
const char *pretty_printing_style;
const char *cpu_list;
const char *symbol_filter_str;
const char *time_str;
struct perf_time_interval ptime;
float min_percent;
u64 nr_entries;
u64 queue_size;
......@@ -158,6 +160,9 @@ static int process_sample_event(struct perf_tool *tool,
};
int ret = 0;
if (perf_time__skip_sample(&rep->ptime, sample->time))
return 0;
if (machine__resolve(machine, &al, sample) < 0) {
pr_debug("problem processing %d event, skipping it.\n",
event->header.type);
......@@ -207,11 +212,14 @@ static int process_read_event(struct perf_tool *tool,
if (rep->show_threads) {
const char *name = evsel ? perf_evsel__name(evsel) : "unknown";
perf_read_values_add_value(&rep->show_threads_values,
int err = perf_read_values_add_value(&rep->show_threads_values,
event->read.pid, event->read.tid,
event->read.id,
name,
event->read.value);
if (err)
return err;
}
dump_printf(": %d %d %s %" PRIu64 "\n", event->read.pid, event->read.tid,
......@@ -539,8 +547,11 @@ static int __cmd_report(struct report *rep)
}
}
if (rep->show_threads)
perf_read_values_init(&rep->show_threads_values);
if (rep->show_threads) {
ret = perf_read_values_init(&rep->show_threads_values);
if (ret)
return ret;
}
ret = report__setup_sample_type(rep);
if (ret) {
......@@ -824,6 +835,8 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
OPT_CALLBACK_DEFAULT(0, "stdio-color", NULL, "mode",
"'always' (default), 'never' or 'auto' only applicable to --stdio mode",
stdio__config_color, "always"),
OPT_STRING(0, "time", &report.time_str, "str",
"Time span of interest (start,stop)"),
OPT_END()
};
struct perf_data_file file = {
......@@ -905,6 +918,9 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
if (itrace_synth_opts.last_branch)
has_br_stack = true;
if (has_br_stack && branch_call_mode)
symbol_conf.show_branchflag_count = true;
/*
* Branch mode is a tristate:
* -1 means default, so decide based on the file having branch data.
......@@ -1006,6 +1022,11 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
if (symbol__init(&session->header.env) < 0)
goto error;
if (perf_time__parse_str(&report.ptime, report.time_str) != 0) {
pr_err("Invalid time string\n");
return -EINVAL;
}
sort__setup_elide(stdout);
ret = __cmd_report(&report);
......
......@@ -22,6 +22,7 @@
#include "util/thread_map.h"
#include "util/stat.h"
#include "util/thread-stack.h"
#include "util/time-utils.h"
#include <linux/bitmap.h>
#include <linux/stringify.h>
#include <linux/time64.h>
......@@ -66,6 +67,8 @@ enum perf_output_field {
PERF_OUTPUT_WEIGHT = 1U << 18,
PERF_OUTPUT_BPF_OUTPUT = 1U << 19,
PERF_OUTPUT_CALLINDENT = 1U << 20,
PERF_OUTPUT_INSN = 1U << 21,
PERF_OUTPUT_INSNLEN = 1U << 22,
};
struct output_option {
......@@ -93,6 +96,8 @@ struct output_option {
{.str = "weight", .field = PERF_OUTPUT_WEIGHT},
{.str = "bpf-output", .field = PERF_OUTPUT_BPF_OUTPUT},
{.str = "callindent", .field = PERF_OUTPUT_CALLINDENT},
{.str = "insn", .field = PERF_OUTPUT_INSN},
{.str = "insnlen", .field = PERF_OUTPUT_INSNLEN},
};
/* default set to maintain compatibility with current format */
......@@ -437,7 +442,6 @@ static void print_sample_start(struct perf_sample *sample,
{
struct perf_event_attr *attr = &evsel->attr;
unsigned long secs;
unsigned long usecs;
unsigned long long nsecs;
if (PRINT_FIELD(COMM)) {
......@@ -467,11 +471,14 @@ static void print_sample_start(struct perf_sample *sample,
nsecs = sample->time;
secs = nsecs / NSEC_PER_SEC;
nsecs -= secs * NSEC_PER_SEC;
usecs = nsecs / NSEC_PER_USEC;
if (nanosecs)
printf("%5lu.%09llu: ", secs, nsecs);
else
printf("%5lu.%06lu: ", secs, usecs);
else {
char sample_time[32];
timestamp__scnprintf_usec(sample->time, sample_time, sizeof(sample_time));
printf("%12s: ", sample_time);
}
}
}
......@@ -624,6 +631,20 @@ static void print_sample_callindent(struct perf_sample *sample,
printf("%*s", spacing - len, "");
}
static void print_insn(struct perf_sample *sample,
struct perf_event_attr *attr)
{
if (PRINT_FIELD(INSNLEN))
printf(" ilen: %d", sample->insn_len);
if (PRINT_FIELD(INSN)) {
int i;
printf(" insn:");
for (i = 0; i < sample->insn_len; i++)
printf(" %02x", (unsigned char)sample->insn[i]);
}
}
static void print_sample_bts(struct perf_sample *sample,
struct perf_evsel *evsel,
struct thread *thread,
......@@ -668,6 +689,8 @@ static void print_sample_bts(struct perf_sample *sample,
if (print_srcline_last)
map__fprintf_srcline(al->map, al->addr, "\n ", stdout);
print_insn(sample, attr);
printf("\n");
}
......@@ -811,6 +834,8 @@ struct perf_script {
struct cpu_map *cpus;
struct thread_map *threads;
int name_width;
const char *time_str;
struct perf_time_interval ptime;
};
static int perf_evlist__max_name_len(struct perf_evlist *evlist)
......@@ -911,7 +936,7 @@ static void process_event(struct perf_script *script,
if (perf_evsel__is_bpf_output(evsel) && PRINT_FIELD(BPF_OUTPUT))
print_sample_bpf_output(sample);
print_insn(sample, attr);
printf("\n");
}
......@@ -992,6 +1017,9 @@ static int process_sample_event(struct perf_tool *tool,
struct perf_script *scr = container_of(tool, struct perf_script, tool);
struct addr_location al;
if (perf_time__skip_sample(&scr->ptime, sample->time))
return 0;
if (debug_mode) {
if (sample->time < last_timestamp) {
pr_err("Samples misordered, previous: %" PRIu64
......@@ -2124,11 +2152,13 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
"Valid types: hw,sw,trace,raw. "
"Fields: comm,tid,pid,time,cpu,event,trace,ip,sym,dso,"
"addr,symoff,period,iregs,brstack,brstacksym,flags,"
"bpf-output,callindent", parse_output_fields),
"bpf-output,callindent,insn,insnlen", parse_output_fields),
OPT_BOOLEAN('a', "all-cpus", &system_wide,
"system-wide collection from all CPUs"),
OPT_STRING('S', "symbols", &symbol_conf.sym_list_str, "symbol[,symbol...]",
"only consider these symbols"),
OPT_STRING(0, "stop-bt", &symbol_conf.bt_stop_list_str, "symbol[,symbol...]",
"Stop display of callgraph at these symbols"),
OPT_STRING('C', "cpu", &cpu_list, "cpu", "list of cpus to profile"),
OPT_STRING('c', "comms", &symbol_conf.comm_list_str, "comm[,comm...]",
"only display events for these comms"),
......@@ -2162,7 +2192,8 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
"Enable symbol demangling"),
OPT_BOOLEAN(0, "demangle-kernel", &symbol_conf.demangle_kernel,
"Enable kernel symbol demangling"),
OPT_STRING(0, "time", &script.time_str, "str",
"Time span of interest (start,stop)"),
OPT_END()
};
const char * const script_subcommands[] = { "record", "report", NULL };
......@@ -2441,6 +2472,12 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
if (err < 0)
goto out_delete;
/* needs to be parsed after looking up reference time */
if (perf_time__parse_str(&script.ptime, script.time_str) != 0) {
pr_err("Invalid time string\n");
return -EINVAL;
}
err = __cmd_script(&script);
flush_scripting();
......
......@@ -130,7 +130,7 @@ static int perf_top__parse_source(struct perf_top *top, struct hist_entry *he)
return err;
}
err = symbol__disassemble(sym, map, 0);
err = symbol__disassemble(sym, map, NULL, 0);
if (err == 0) {
out_assign:
top->sym_filter_entry = he;
......
......@@ -74,8 +74,6 @@ struct trace {
size_t nr;
int *entries;
} ev_qualifier_ids;
struct intlist *tid_list;
struct intlist *pid_list;
struct {
size_t nr;
pid_t *entries;
......@@ -843,7 +841,6 @@ static size_t fprintf_duration(unsigned long t, FILE *fp)
*/
struct thread_trace {
u64 entry_time;
u64 exit_time;
bool entry_pending;
unsigned long nr_events;
unsigned long pfmaj, pfmin;
......@@ -1452,7 +1449,7 @@ static int trace__printf_interrupted_entry(struct trace *trace, struct perf_samp
duration = sample->time - ttrace->entry_time;
printed = trace__fprintf_entry_head(trace, trace->current, duration, sample->time, trace->output);
printed = trace__fprintf_entry_head(trace, trace->current, duration, ttrace->entry_time, trace->output);
printed += fprintf(trace->output, "%-70s) ...\n", ttrace->entry_str);
ttrace->entry_pending = false;
......@@ -1499,7 +1496,7 @@ static int trace__sys_enter(struct trace *trace, struct perf_evsel *evsel,
if (sc->is_exit) {
if (!(trace->duration_filter || trace->summary_only || trace->min_stack)) {
trace__fprintf_entry_head(trace, thread, 1, sample->time, trace->output);
trace__fprintf_entry_head(trace, thread, 1, ttrace->entry_time, trace->output);
fprintf(trace->output, "%-70s)\n", ttrace->entry_str);
}
} else {
......@@ -1571,8 +1568,6 @@ static int trace__sys_exit(struct trace *trace, struct perf_evsel *evsel,
++trace->stats.vfs_getname;
}
ttrace->exit_time = sample->time;
if (ttrace->entry_time) {
duration = sample->time - ttrace->entry_time;
if (trace__filter_duration(trace, duration))
......@@ -1592,7 +1587,7 @@ static int trace__sys_exit(struct trace *trace, struct perf_evsel *evsel,
if (trace->summary_only)
goto out;
trace__fprintf_entry_head(trace, thread, duration, sample->time, trace->output);
trace__fprintf_entry_head(trace, thread, duration, ttrace->entry_time, trace->output);
if (ttrace->entry_pending) {
fprintf(trace->output, "%-70s", ttrace->entry_str);
......@@ -1893,18 +1888,6 @@ static int trace__pgfault(struct trace *trace,
return err;
}
static bool skip_sample(struct trace *trace, struct perf_sample *sample)
{
if ((trace->pid_list && intlist__find(trace->pid_list, sample->pid)) ||
(trace->tid_list && intlist__find(trace->tid_list, sample->tid)))
return false;
if (trace->pid_list || trace->tid_list)
return true;
return false;
}
static void trace__set_base_time(struct trace *trace,
struct perf_evsel *evsel,
struct perf_sample *sample)
......@@ -1929,11 +1912,13 @@ static int trace__process_sample(struct perf_tool *tool,
struct machine *machine __maybe_unused)
{
struct trace *trace = container_of(tool, struct trace, tool);
struct thread *thread;
int err = 0;
tracepoint_handler handler = evsel->handler;
if (skip_sample(trace, sample))
thread = machine__findnew_thread(trace->host, sample->pid, sample->tid);
if (thread && thread__is_filtered(thread))
return 0;
trace__set_base_time(trace, evsel, sample);
......@@ -1946,27 +1931,6 @@ static int trace__process_sample(struct perf_tool *tool,
return err;
}
static int parse_target_str(struct trace *trace)
{
if (trace->opts.target.pid) {
trace->pid_list = intlist__new(trace->opts.target.pid);
if (trace->pid_list == NULL) {
pr_err("Error parsing process id string\n");
return -EINVAL;
}
}
if (trace->opts.target.tid) {
trace->tid_list = intlist__new(trace->opts.target.tid);
if (trace->tid_list == NULL) {
pr_err("Error parsing thread id string\n");
return -EINVAL;
}
}
return 0;
}
static int trace__record(struct trace *trace, int argc, const char **argv)
{
unsigned int rec_argc, i, j;
......@@ -2310,12 +2274,17 @@ static int trace__run(struct trace *trace, int argc, const char **argv)
if (err < 0)
goto out_error_mmap;
if (!target__none(&trace->opts.target))
if (!target__none(&trace->opts.target) && !trace->opts.initial_delay)
perf_evlist__enable(evlist);
if (forks)
perf_evlist__start_workload(evlist);
if (trace->opts.initial_delay) {
usleep(trace->opts.initial_delay * 1000);
perf_evlist__enable(evlist);
}
trace->multiple_threads = thread_map__pid(evlist->threads, 0) == -1 ||
evlist->threads->nr > 1 ||
perf_evlist__first(evlist)->attr.inherit;
......@@ -2458,6 +2427,12 @@ static int trace__replay(struct trace *trace)
if (session == NULL)
return -1;
if (trace->opts.target.pid)
symbol_conf.pid_list_str = strdup(trace->opts.target.pid);
if (trace->opts.target.tid)
symbol_conf.tid_list_str = strdup(trace->opts.target.tid);
if (symbol__init(&session->header.env) < 0)
goto out;
......@@ -2501,10 +2476,6 @@ static int trace__replay(struct trace *trace)
evsel->handler = trace__pgfault;
}
err = parse_target_str(trace);
if (err != 0)
goto out;
setup_pager();
err = perf_session__process_events(session);
......@@ -2816,6 +2787,9 @@ int cmd_trace(int argc, const char **argv, const char *prefix __maybe_unused)
"Default: kernel.perf_event_max_stack or " __stringify(PERF_MAX_STACK_DEPTH)),
OPT_UINTEGER(0, "proc-map-timeout", &trace.opts.proc_map_timeout,
"per thread proc mmap processing timeout in ms"),
OPT_UINTEGER('D', "delay", &trace.opts.initial_delay,
"ms to wait before starting measurement after program "
"start"),
OPT_END()
};
bool __maybe_unused max_stack_user_set = true;
......
......@@ -18,6 +18,7 @@ int cmd_bench(int argc, const char **argv, const char *prefix);
int cmd_buildid_cache(int argc, const char **argv, const char *prefix);
int cmd_buildid_list(int argc, const char **argv, const char *prefix);
int cmd_config(int argc, const char **argv, const char *prefix);
int cmd_c2c(int argc, const char **argv, const char *prefix);
int cmd_diff(int argc, const char **argv, const char *prefix);
int cmd_evlist(int argc, const char **argv, const char *prefix);
int cmd_help(int argc, const char **argv, const char *prefix);
......
jvmti-y += libjvmti.o
jvmti-y += jvmti_agent.o
CFLAGS_jvmti = -fPIC -DPIC -I$(JDIR)/include -I$(JDIR)/include/linux
CFLAGS_REMOVE_jvmti = -Wmissing-declarations
CFLAGS_REMOVE_jvmti += -Wstrict-prototypes
CFLAGS_REMOVE_jvmti += -Wextra
CFLAGS_REMOVE_jvmti += -Wwrite-strings
ARCH=$(shell uname -m)
ifeq ($(ARCH), x86_64)
JARCH=amd64
endif
ifeq ($(ARCH), armv7l)
JARCH=armhf
endif
ifeq ($(ARCH), armv6l)
JARCH=armhf
endif
ifeq ($(ARCH), aarch64)
JARCH=aarch64
endif
ifeq ($(ARCH), ppc64)
JARCH=powerpc
endif
ifeq ($(ARCH), ppc64le)
JARCH=powerpc
endif
DESTDIR=/usr/local
VERSION=1
REVISION=0
AGE=0
LN=ln -sf
RM=rm
SLIBJVMTI=libjvmti.so.$(VERSION).$(REVISION).$(AGE)
VLIBJVMTI=libjvmti.so.$(VERSION)
SLDFLAGS=-shared -Wl,-soname -Wl,$(VLIBJVMTI)
SOLIBEXT=so
# The following works at least on fedora 23, you may need the next
# line for other distros.
ifneq (,$(wildcard /usr/sbin/update-java-alternatives))
JDIR=$(shell /usr/sbin/update-java-alternatives -l | head -1 | awk '{print $$3}')
else
ifneq (,$(wildcard /usr/sbin/alternatives))
JDIR=$(shell alternatives --display java | tail -1 | cut -d' ' -f 5 | sed 's%/jre/bin/java.%%g')
endif
endif
ifndef JDIR
$(error Could not find alternatives command, you need to set JDIR= to point to the root of your Java directory)
else
ifeq (,$(wildcard $(JDIR)/include/jvmti.h))
$(error the openjdk development package appears to be missing, install it and try again)
endif
endif
$(info Using Java from $(JDIR))
# -lrt required in 32-bit mode for clock_gettime()
LIBS=-lelf -lrt
INCDIR=-I $(JDIR)/include -I $(JDIR)/include/linux
TARGETS=$(SLIBJVMTI)
SRCS=libjvmti.c jvmti_agent.c
OBJS=$(SRCS:.c=.o)
SOBJS=$(OBJS:.o=.lo)
OPT=-O2 -g -Werror -Wall
CFLAGS=$(INCDIR) $(OPT)
all: $(TARGETS)
.c.o:
$(CC) $(CFLAGS) -c $*.c
.c.lo:
$(CC) -fPIC -DPIC $(CFLAGS) -c $*.c -o $*.lo
$(OBJS) $(SOBJS): Makefile jvmti_agent.h ../util/jitdump.h
$(SLIBJVMTI): $(SOBJS)
$(CC) $(CFLAGS) $(SLDFLAGS) -o $@ $(SOBJS) $(LIBS)
$(LN) $@ libjvmti.$(SOLIBEXT)
clean:
$(RM) -f *.o *.so.* *.so *.lo
install:
-mkdir -p $(DESTDIR)/lib
install -m 755 $(SLIBJVMTI) $(DESTDIR)/lib/
(cd $(DESTDIR)/lib; $(LN) $(SLIBJVMTI) $(VLIBJVMTI))
(cd $(DESTDIR)/lib; $(LN) $(SLIBJVMTI) libjvmti.$(SOLIBEXT))
ldconfig
.SUFFIXES: .c .S .o .lo
......@@ -43,6 +43,7 @@ static struct cmd_struct commands[] = {
{ "buildid-cache", cmd_buildid_cache, 0 },
{ "buildid-list", cmd_buildid_list, 0 },
{ "config", cmd_config, 0 },
{ "c2c", cmd_c2c, 0 },
{ "diff", cmd_diff, 0 },
{ "evlist", cmd_evlist, 0 },
{ "help", cmd_help, 0 },
......
# Format:
# PVR,Version,JSON/file/pathname,Type
#
# where
# PVR Processor version
# Version could be used to track the version of the JSON file,
# but is currently unused.
# JSON/file/pathname is the path to JSON file, relative
# to tools/perf/pmu-events/arch/powerpc/.
# Type is core, uncore etc
#
# Multiple PVRs could map to a single JSON file.
#
# Power8 entries
004b0000,1,power8.json,core
004b0201,1,power8.json,core
004c0000,1,power8.json,core
004d0000,1,power8.json,core
004d0100,1,power8.json,core
004d0200,1,power8.json,core