- 01 Sep, 2017 2 commits
-
-
Arnaldo Carvalho de Melo authored
So now we can use:

  # perf trace -e pkey_*
     532.784 ( 0.006 ms): pkey/16018 pkey_alloc(init_val: DISABLE_WRITE) = -1 EINVAL Invalid argument
     532.795 ( 0.004 ms): pkey/16018 pkey_mprotect(start: 0x7f380d0a6000, len: 4096, prot: READ|WRITE, pkey: -1) = 0
     532.801 ( 0.002 ms): pkey/16018 pkey_free(pkey: -1) = -1 EINVAL Invalid argument
  ^C[root@jouet ~]#

Or '-e epoll*', '-e *msg*', etc.

Combining syscall names with perf events, tracepoints, etc. continues to be valid, i.e. this is possible:

  # perf probe -L sys_nanosleep
  <SyS_nanosleep@/home/acme/git/linux/kernel/time/hrtimer.c:0>
        0  SYSCALL_DEFINE2(nanosleep, struct timespec __user *, rqtp,
                           struct timespec __user *, rmtp)
           {
                   struct timespec64 tu;

        5          if (get_timespec64(&tu, rqtp))
        6                  return -EFAULT;

                   if (!timespec64_valid(&tu))
        9                  return -EINVAL;

       11          current->restart_block.nanosleep.type = rmtp ? TT_NATIVE : TT_NONE;
       12          current->restart_block.nanosleep.rmtp = rmtp;
       13          return hrtimer_nanosleep(&tu, HRTIMER_MODE_REL, CLOCK_MONOTONIC);
           }

  # perf probe my_probe="sys_nanosleep:12 rmtp"
  Added new event:
    probe:my_probe       (on sys_nanosleep:12 with rmtp)

  You can now use it in all perf tools, such as:

          perf record -e probe:my_probe -aR sleep 1

  # perf trace -e probe:my_probe/max-stack=5/,*sleep sleep 1
     0.427 ( 0.003 ms): sleep/16690 nanosleep(rqtp: 0x7ffefc245090) ...
     0.430 (         ): probe:my_probe:(ffffffffbd112923) rmtp=0)
                                        sys_nanosleep ([kernel.kallsyms])
                                        do_syscall_64 ([kernel.kallsyms])
                                        return_from_SYSCALL_64 ([kernel.kallsyms])
                                        __nanosleep_nocancel (/usr/lib64/libc-2.25.so)
     0.427 (1000.208 ms): sleep/16690  ... [continued]: nanosleep()) = 0
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-elycoi8wy6y0w9dkj7ox1mzz@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
Add two new methods: one to find the first match, returning its syscall id and its index in whatever internal database the syscalls are kept in, and one to find the next match, if any. These are implemented only on arches where we actually read the syscall table from the kernel sources, i.e. x86-64 for now; all the others use the libaudit method, for which only stubs returning -1 were added, the actual implementation being left to use whatever libaudit functions for matching may be available.

Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-i0sj4rxk1a63pfe9gl8z8irs@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
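A rough sketch of the shape of this first/next API (the function and struct names follow tools/perf/util/syscalltbl.c, but this approximation substitutes fnmatch(3) for perf's internal strglobmatch(), so treat it as illustrative rather than the actual implementation):

    #include <fnmatch.h>
    #include <stddef.h>

    struct syscall_entry {
            int id;                 /* syscall number */
            const char *name;
    };

    struct syscalltbl {
            struct syscall_entry *entries;  /* sorted by name */
            size_t nr_entries;
    };

    /* Return the id of the next entry past *idx whose name matches the
     * glob, updating *idx, or -1 when there are no more matches. */
    int syscalltbl__strglobmatch_next(struct syscalltbl *tbl,
                                      const char *glob, int *idx)
    {
            int i;

            for (i = *idx + 1; i < (int)tbl->nr_entries; i++) {
                    if (fnmatch(glob, tbl->entries[i].name, 0) == 0) {
                            *idx = i;
                            return tbl->entries[i].id;
                    }
            }
            return -1;
    }

    /* First match: start the scan from before the table's beginning. */
    int syscalltbl__strglobmatch_first(struct syscalltbl *tbl,
                                       const char *glob, int *idx)
    {
            *idx = -1;
            return syscalltbl__strglobmatch_next(tbl, glob, idx);
    }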
-
- 30 Aug, 2017 1 commit
-
-
Jin Yao authored
The branch history code has a loop detection function. With this, we can get the number of iterations by counting the removed loops, but it would also be nice to know the average number of cycles per iteration. This patch adds up the cycles in the branch entries of removed loops and saves the result to the next branch entry (e.g. branch entry A). Finally it displays the iteration number and average cycles at the "from" of branch entry A. For example:

  perf record -g -j any,save_type ./div
  perf report --branch-history --no-children --stdio

  --22.63%--main div.c:42 (RET CROSS_2M)
             compute_flag div.c:28 (cycles:2 iter:173115 avg_cycles:2)
             |
             --10.73%--compute_flag div.c:27 (RET CROSS_2M)
                        rand rand.c:28 (cycles:1)
                        rand rand.c:28 (RET CROSS_2M)
                        __random random.c:298 (cycles:1)
                        __random random.c:297 (COND_BWD CROSS_2M)
                        __random random.c:295 (cycles:1)
                        __random random.c:295 (COND_BWD CROSS_2M)
                        __random random.c:295 (cycles:1)
                        __random random.c:295 (RET CROSS_2M)

Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1502111115-18305-1-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 29 Aug, 2017 9 commits
-
-
Ingo Molnar authored
Merge tag 'perf-core-for-mingo-4.14-20170829' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

- Fix remote HITM detection for Skylake in 'perf c2c' (Jiri Olsa)

- Fixes for the handling of PERF_RECORD_READ records (Jiri Olsa)

- Fix kprobes blacklist symbol lookup in 'perf probe' (Li Bin)

- The PLT header and entry sizes are not the same in !x86, fix it for ARM and AARCH64 (Li Bin)

- Beautify pkey_{alloc,free,mprotect} arguments in 'perf trace' (Arnaldo Carvalho de Melo)

- Fix CC, AR, LD external definition, allow flex and bison to be externally defined and other related Makefile fixes (David Carrillo-Cisneros)

- Sync CPU features kernel ABI headers with tooling headers (Arnaldo Carvalho de Melo)

- Fix path to PMU formats in 'perf stat' documentation (Jack Henschel)

- Fix static build with newer toolchains (Jiri Olsa)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Li Bin authored
On x86, the PLT header size is the same as the PLT entry size, and both can be identified from the sh_entsize of the .plt section header. But we can't assume that the sh_entsize of the .plt section header is always the PLT entry size on all architectures, and the PLT header size may not be the same as the PLT entry size on some architectures. On ARM, the PLT header size is 20 bytes and the PLT entry size is 12 bytes (not considering the FOUR_WORD_PLT case), per the binutils implementation. The .plt section is as follows:

  Disassembly of section .plt:

  000004a0 <__cxa_finalize@plt-0x14>:
   4a0:  e52de004  push  {lr}            ; (str lr, [sp, #-4]!)
   4a4:  e59fe004  ldr   lr, [pc, #4]    ; 4b0 <_init+0x1c>
   4a8:  e08fe00e  add   lr, pc, lr
   4ac:  e5bef008  ldr   pc, [lr, #8]!
   4b0:  00008424  .word 0x00008424

  000004b4 <__cxa_finalize@plt>:
   4b4:  e28fc600  add   ip, pc, #0, 12
   4b8:  e28cca08  add   ip, ip, #8, 20  ; 0x8000
   4bc:  e5bcf424  ldr   pc, [ip, #1060]!        ; 0x424

  000004c0 <printf@plt>:
   4c0:  e28fc600  add   ip, pc, #0, 12
   4c4:  e28cca08  add   ip, ip, #8, 20  ; 0x8000
   4c8:  e5bcf41c  ldr   pc, [ip, #1052]!        ; 0x41c

On AARCH64, the PLT header size is 32 bytes and the PLT entry size is 16 bytes. The .plt section is as follows:

  Disassembly of section .plt:

  0000000000000560 <__cxa_finalize@plt-0x20>:
   560:  a9bf7bf0  stp   x16, x30, [sp,#-16]!
   564:  90000090  adrp  x16, 10000 <__FRAME_END__+0xf8a8>
   568:  f944be11  ldr   x17, [x16,#2424]
   56c:  9125e210  add   x16, x16, #0x978
   570:  d61f0220  br    x17
   574:  d503201f  nop
   578:  d503201f  nop
   57c:  d503201f  nop

  0000000000000580 <__cxa_finalize@plt>:
   580:  90000090  adrp  x16, 10000 <__FRAME_END__+0xf8a8>
   584:  f944c211  ldr   x17, [x16,#2432]
   588:  91260210  add   x16, x16, #0x980
   58c:  d61f0220  br    x17

  0000000000000590 <__gmon_start__@plt>:
   590:  90000090  adrp  x16, 10000 <__FRAME_END__+0xf8a8>
   594:  f944c611  ldr   x17, [x16,#2440]
   598:  91262210  add   x16, x16, #0x988
   59c:  d61f0220  br    x17

NOTES: In addition to ARM and AARCH64, other architectures, such as s390/alpha/mips/parisc/powerpc/sh/sparc/xtensa, also need to consider this issue.

Signed-off-by: Li Bin <huawei.libin@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexis Berlemont <alexis.berlemont@gmail.com>
Cc: David Tolnay <dtolnay@gmail.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Hemant Kumar <hemant@linux.vnet.ibm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: zhangmengting@huawei.com
Link: http://lkml.kernel.org/r/1496622849-21877-1-git-send-email-huawei.libin@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
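The practical consequence for symbol synthesis condenses into one formula. The sketch below is illustrative (the names are invented here, not perf's), with the sizes taken from the disassembly above; note that it reproduces the first real entries, 0x4a0 + 20 = 0x4b4 on ARM and 0x560 + 32 = 0x580 on AARCH64:

    #include <stdint.h>

    /* Per-arch PLT geometry: on x86 header size == entry size (both from
     * sh_entsize), on ARM 20/12 bytes, on AARCH64 32/16 bytes. */
    struct plt_geometry {
            uint64_t header_size;
            uint64_t entry_size;
    };

    /* Address of the i-th real PLT entry (the header is not an entry),
     * given the start address of the .plt section. */
    static uint64_t plt_entry_addr(uint64_t plt_start,
                                   const struct plt_geometry *g, uint64_t i)
    {
            return plt_start + g->header_size + i * g->entry_size;
    }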
-
Li Bin authored
Since commit 9aaf5a5f ("perf probe: Check kprobes blacklist when adding new events"), 'perf probe' supports checking the blacklist of functions which can not be probed. But the checking condition is wrong: the end_addr of a symbol, which is the start_addr of the next symbol, must not be treated as part of the blacklisted range.

Committer notes:

IOW, make it match its kernel counterpart in kernel/kprobes.c:

  bool within_kprobe_blacklist(unsigned long addr)

Each entry has as its end address not the last address of its symbol, but the first address _outside_ that symbol, which for adjacent functions is the first address of the next symbol, like these from kernel/trace/trace_probe.c:

  0xffffffffbd198df0-0xffffffffbd198e40  print_type_u8
  0xffffffffbd198e40-0xffffffffbd198e90  print_type_u16
  0xffffffffbd198e90-0xffffffffbd198ee0  print_type_u32
  0xffffffffbd198ee0-0xffffffffbd198f30  print_type_u64
  0xffffffffbd198f30-0xffffffffbd198f80  print_type_s8
  0xffffffffbd198f80-0xffffffffbd198fd0  print_type_s16
  0xffffffffbd198fd0-0xffffffffbd199020  print_type_s32
  0xffffffffbd199020-0xffffffffbd199070  print_type_s64
  0xffffffffbd199070-0xffffffffbd1990c0  print_type_x8
  0xffffffffbd1990c0-0xffffffffbd199110  print_type_x16
  0xffffffffbd199110-0xffffffffbd199160  print_type_x32
  0xffffffffbd199160-0xffffffffbd1991b0  print_type_x64

But not always:

  0xffffffffbd1997b0-0xffffffffbd1997c0  fetch_kernel_stack_address (kernel/trace/trace_probe.c)
  0xffffffffbd1c57f0-0xffffffffbd1c58b0  __context_tracking_enter (kernel/context_tracking.c)

Signed-off-by: Li Bin <huawei.libin@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: zhangmengting@huawei.com
Fixes: 9aaf5a5f ("perf probe: Check kprobes blacklist when adding new events")
Link: http://lkml.kernel.org/r/1504011443-7269-1-git-send-email-huawei.libin@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
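A minimal sketch of the corrected condition (types simplified here; the point is the half-open interval, matching within_kprobe_blacklist() in kernel/kprobes.c):

    #include <stdbool.h>

    struct blacklist_range {
            unsigned long start;    /* first address of the symbol */
            unsigned long end;      /* first address _outside_ the symbol */
    };

    /* The buggy check used 'addr <= r->end', wrongly blacklisting the
     * first address of the next symbol; the ranges are half-open. */
    static bool within_blacklist(const struct blacklist_range *r,
                                 unsigned long addr)
    {
            return addr >= r->start && addr < r->end;
    }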
-
Peter Zijlstra authored
Move the 'max_precise' capability into generic x86 code where it belongs. This fixes a sysfs splat on !Intel systems where we fail to set x86_pmu_caps_group.atts. Reported-and-tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: hpa@zytor.com Fixes: 22688d1c20f5 ("x86/perf: Export some PMU attributes in caps/ directory")
Link: http://lkml.kernel.org/r/20170828104650.2u3rsim4jafyjzv2@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Kan Liang authored
For understanding how the workload maps to memory channels and hardware behavior, it's very important to collect address maps with physical addresses. For example, 3D XPoint access can only be found by filtering the physical address.

Add a new sample type for the physical address.

perf already has a facility to collect data virtual addresses. This patch introduces a function to convert the virtual address to the physical address. The function is quite generic and can be extended to any architecture as long as a virtual address is provided.

- For kernel direct mapping addresses, virt_to_phys is used to convert the virtual addresses to physical addresses.

- For user virtual addresses, __get_user_pages_fast is used to walk the page tables for the user physical address.

- This does not work for vmalloc addresses right now. These are not resolved, but code to do that could be added.

The new sample type requires collecting the virtual address. The virtual address will not be output unless SAMPLE_ADDR is applied. For security, the physical address can only be exposed to root or a privileged user.

Tested-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Cc: mpe@ellerman.id.au
Link: http://lkml.kernel.org/r/1503967969-48278-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
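A simplified sketch of such a conversion function, modeled on what this patch adds to kernel/events/core.c (details like the exact vmalloc exclusion are abbreviated here, so take it as an approximation):

    static u64 perf_virt_to_phys(u64 virt)
    {
            u64 phys_addr = 0;
            struct page *p = NULL;

            if (!virt)
                    return 0;

            if (virt >= TASK_SIZE) {
                    /* Kernel direct-map address: virt_to_phys() works,
                     * but leave vmalloc addresses unresolved (0). */
                    if (virt_addr_valid((void *)(uintptr_t)virt) &&
                        !(virt >= VMALLOC_START && virt < VMALLOC_END))
                            phys_addr = (u64)virt_to_phys((void *)(uintptr_t)virt);
            } else {
                    /* User address: walk the page tables with the
                     * IRQ-safe fast path; on failure leave 0. */
                    if (current->mm &&
                        __get_user_pages_fast(virt, 1, 0, &p) == 1)
                            phys_addr = page_to_phys(p) + virt % PAGE_SIZE;
                    if (p)
                            put_page(p);
            }

            return phys_addr;
    }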
-
Alexander Shishkin authored
I just noticed that hw.itrace_started and hw.config are aliased to the same location. Now, the PT driver happens to use both, which works out fine by sheer luck:

- STORE(hw.itrace_started) is ordered before STORE(hw.config), in the program order, although there are no compiler barriers to ensure that,

- to perf_log_itrace_start(), hw.itrace_started looks set at the same time as when it is intended to be set because both stores happen in the same path,

- hw.config is never reset to zero in the PT driver.

Now, the use of hw.config by the PT driver makes more sense (it being a HW PMU) than messing around with itrace_started, which is an awkward API to begin with.

This patch replaces hw.itrace_started with an attach_state bit and an API call for the PMU drivers to use to communicate the condition.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20170330153956.25994-1-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Ingo Molnar authored
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Zhou Chengming authored
When running perf on the ftrace:function tracepoint, there is a bug which can be reproduced by:

  perf record -e ftrace:function -a sleep 20 &
  perf record -e ftrace:function ls
  perf script

  ls 10304 [005]   171.853235: ftrace:function: perf_output_begin
  ls 10304 [005]   171.853237: ftrace:function: perf_output_begin
  ls 10304 [005]   171.853239: ftrace:function: task_tgid_nr_ns
  ls 10304 [005]   171.853240: ftrace:function: task_tgid_nr_ns
  ls 10304 [005]   171.853242: ftrace:function: __task_pid_nr_ns
  ls 10304 [005]   171.853244: ftrace:function: __task_pid_nr_ns

We can see that all the function traces are doubled.

The problem is caused by the inconsistency of the register function perf_ftrace_event_register() with the probe function perf_ftrace_function_call(): the former registers one probe for every perf_event, while the latter handles all perf_events on the current cpu. So when there are two perf_events on the current cpu, their traces will be doubled.

So this patch adds an extra parameter "event" to perf_tp_event, and only sends sample data to this event when it's not NULL.

Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: alexander.shishkin@linux.intel.com
Cc: huawei.libin@huawei.com
Link: http://lkml.kernel.org/r/1503668977-12526-1-git-send-email-zhouchengming1@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
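The resulting dispatch logic in perf_tp_event() then looks roughly like this (abridged sketch; the perf_sample_data setup and surrounding context are omitted):

    /* If a specific event was passed in (the ftrace:function case),
     * deliver only to it; otherwise walk the hlist as before. */
    if (event) {
            if (perf_tp_event_match(event, &data, regs))
                    perf_swevent_event(event, count, &data, regs);
    } else {
            hlist_for_each_entry_rcu(event, head, hlist_entry) {
                    if (perf_tp_event_match(event, &data, regs))
                            perf_swevent_event(event, count, &data, regs);
            }
    }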
-
Meng Xu authored
While examining the kernel source code, I found a dangerous operation that could turn into a double-fetch situation (a race condition bug) where the same userspace memory region is fetched twice into the kernel, with sanity checks after the first fetch but missing checks after the second fetch.

1. The first fetch happens in line 9573 get_user(size, &uattr->size).

2. Subsequently the 'size' variable undergoes a few sanity checks and transformations (lines 9577 to 9584).

3. The second fetch happens in line 9610 copy_from_user(attr, uattr, size).

4. Given that 'uattr' can be fully controlled in userspace, an attacker can race to override 'uattr->size' with an arbitrary value (say, 0xFFFFFFFF) after the first fetch but before the second fetch. The changed value will be copied to 'attr->size'.

5. There are no further checks on 'attr->size' until the end of this function, and once the function returns, we lose the context to verify that 'attr->size' conforms to the sanity checks performed in step 2 (lines 9577 to 9584).

6. My manual analysis shows that 'attr->size' is not used elsewhere later, so there is no working exploit against it right now. However, this could easily turn into an exploitable one if careless developers start to use 'attr->size' later.

To fix this, override 'attr->size' after the second fetch with the value from the first fetch, regardless of what was actually copied in. In this way, it is assured that 'attr->size' is consistent with the checks performed after the first fetch.

Signed-off-by: Meng Xu <mengxu.gatech@gmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: alexander.shishkin@linux.intel.com
Cc: meng.xu@gatech.edu
Cc: sanidhya@gatech.edu
Cc: taesoo@gatech.edu
Link: http://lkml.kernel.org/r/1503522470-35531-1-git-send-email-meng.xu@gatech.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
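The pattern of the fix, sketched against a simplified perf_copy_attr() (the real function performs more checks between the two fetches, including the size == 0 fallback to PERF_ATTR_SIZE_VER0; this shows only the essential ordering):

    static int perf_copy_attr_sketch(struct perf_event_attr __user *uattr,
                                     struct perf_event_attr *attr)
    {
            u32 size;

            /* first fetch: read and sanity-check the size */
            if (get_user(size, &uattr->size))
                    return -EFAULT;
            if (size > PAGE_SIZE || size < PERF_ATTR_SIZE_VER0)
                    return -E2BIG;

            /* second fetch: userspace may have raced and changed
             * uattr->size in the meantime */
            if (copy_from_user(attr, uattr, size))
                    return -EFAULT;

            /* the fix: force the value validated after the first fetch,
             * regardless of what the second fetch copied in */
            attr->size = size;

            return 0;
    }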
-
- 28 Aug, 2017 26 commits
-
-
Linus Torvalds authored
Commit 3510ca20 ("Minor page waitqueue cleanups") made the page queue code always add new waiters to the back of the queue, which helps upcoming patches to batch the wakeups for some horrid loads where the wait queues grow to thousands of entries. However, I forgot about the nasty add_page_wait_queue() special case code that is only used by the cachefiles code. That one still continued to add the new wait queue entries at the beginning of the list. Fix it, because any sane batched wakeup will require that we don't suddenly start getting new entries at the beginning of the list that we already handled in a previous batch. [ The current code always does the whole list while holding the lock, so wait queue ordering doesn't matter for correctness, but even then it's better to add later entries at the end from a fairness standpoint ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
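The fix itself is essentially one line, switching the special case to tail insertion; a sketch against the 4.13-era wait queue API (the surrounding function mirrors mm/filemap.c):

    void add_page_wait_queue(struct page *page, wait_queue_entry_t *waiter)
    {
            wait_queue_head_t *q = page_waitqueue(page);
            unsigned long flags;

            spin_lock_irqsave(&q->lock, flags);
            /* was: __add_wait_queue(q, waiter), i.e. head insertion */
            __add_wait_queue_entry_tail(q, waiter);
            SetPageWaiters(page);
            spin_unlock_irqrestore(&q->lock, flags);
    }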
-
Tejun Heo authored
When !NUMA, cpumask_of_node(@node) equals cpu_online_mask regardless of @node. The assumption seems to be that if !NUMA, there shouldn't be more than one node and thus reporting cpu_online_mask regardless of @node is correct. However, that assumption was broken years ago to support DISCONTIGMEM, and whether a system has multiple nodes or not is separately controlled by NEED_MULTIPLE_NODES. This means that, on a system with !NUMA && NEED_MULTIPLE_NODES, cpumask_of_node() will report cpu_online_mask for all possible nodes, indicating that the CPUs are associated with multiple nodes, which is an impossible configuration. This bug has been around forever but doesn't look like it has caused any noticeable symptoms. However, it triggers a WARN recently added to workqueue to verify NUMA affinity configuration. Fix it by reporting an empty cpumask on non-zero nodes if !NUMA. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
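The fix, as described, is tiny; a sketch of the corrected asm-generic fallback (cpu_none_mask is the kernel's always-empty mask):

    /* !NUMA: only node 0 exists, so only node 0 may report CPUs;
     * any other node id gets an empty mask instead of cpu_online_mask. */
    #ifndef CONFIG_NUMA
    #define cpumask_of_node(node)   ((node) == 0 ? cpu_online_mask : cpu_none_mask)
    #endif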
-
Alexey Brodkin authored
Recent commit a8ec3ee8 "arc: Mask individual IRQ lines during core INTC init" breaks interrupt handling on ARCv2 SMP systems. That commit masked all interrupts at onset, as some controllers on some boards (customer as well as internal) would assert interrupts early, before any handlers were installed. For SMP systems, the masking was done at each cpu's core-intc. Later, when the IRQ was actually requested, it was unmasked, but only on the requesting cpu. For "common" interrupts, which were wired up from the 2nd level IDU intc, this was an issue, as they needed to be enabled on ALL the cpus (given that IDU IRQs are by default served Round Robin across cpus). So fix that by NOT masking "common" interrupts at core-intc, but instead at the 2nd level IDU intc (the latter already being done in idu_of_init()). Fixes: a8ec3ee8 ("arc: Mask individual IRQ lines during core INTC init") Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> [vgupta: reworked changelog, removed the extraneous idu_irq_mask_raw()] Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Helge Deller authored
Commit 464d6242 ("select: switch compat_{get,put}_fd_set() to compat_{get,put}_bitmap()") changed the calculation of how many bytes need to be zeroed when userspace handed over a NULL pointer for a fdset array in the select syscall. The calculation was wrongly changed in compat_get_fd_set() from

  memset(fdset, 0, ((nr + 1) & ~1)*sizeof(compat_ulong_t));

to

  memset(fdset, 0, ALIGN(nr, BITS_PER_LONG));

ALIGN(nr, BITS_PER_LONG) calculates the number of _bits_ which need to be zeroed in the target fdset array (rounded up to the next full unsigned long's worth of bits). But the memset() call expects the number of _bytes_ to be zeroed. This leads to clearing more memory than wanted (on the stack area or even at kmalloc()ed memory areas) and to random kernel crashes as we have seen them on the parisc platform. The correct change should have been

  memset(fdset, 0, (ALIGN(nr, BITS_PER_LONG) / BITS_PER_LONG) * BYTES_PER_LONG);

which is the same as can be achieved with a call to zero_fd_set(nr, fdset).

Fixes: 464d6242 ("select: switch compat_{get,put}_fd_set() to compat_{get,put}_bitmap()")
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
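As a quick sanity check of the arithmetic, a standalone userspace sketch (not kernel code): for nr = 1 fd on a 64-bit box, the bit-count version clears 64 bytes where 8 were intended, an 8x over-clear:

    #include <stdio.h>

    #define BITS_PER_LONG  (8 * sizeof(long))
    #define ALIGN(x, a)    (((x) + (a) - 1) & ~((a) - 1))

    int main(void)
    {
            unsigned long nr = 1;   /* select() on a single fd */

            /* buggy size: a count of bits handed to memset() as bytes */
            unsigned long wrong = ALIGN(nr, BITS_PER_LONG);

            /* fixed size: whole longs, converted to bytes */
            unsigned long right = ALIGN(nr, BITS_PER_LONG) / BITS_PER_LONG
                                  * sizeof(long);

            /* prints "wrong: 64 bytes, right: 8 bytes" on LP64 */
            printf("wrong: %lu bytes, right: %lu bytes\n", wrong, right);
            return 0;
    }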
-
Arnaldo Carvalho de Melo authored
Reuse the 'mprotect' beautifiers for 'pkey_mprotect'. System wide tracing of pkey_alloc, pkey_free and pkey_mprotect calls, with backtraces:

  # perf trace -e pkey_alloc,pkey_mprotect,pkey_free --max-stack=5
     0.000 ( 0.011 ms): pkey/7818 pkey_alloc(init_val: DISABLE_ACCESS|DISABLE_WRITE) = -1 EINVAL Invalid argument
                                       syscall (/usr/lib64/libc-2.25.so)
                                       pkey_alloc (/home/acme/c/pkey)
     0.022 ( 0.003 ms): pkey/7818 pkey_mprotect(start: 0x7f28c3890000, len: 4096, prot: READ|WRITE, pkey: -1) = 0
                                       syscall (/usr/lib64/libc-2.25.so)
                                       pkey_mprotect (/home/acme/c/pkey)
     0.030 ( 0.002 ms): pkey/7818 pkey_free(pkey: -1) = -1 EINVAL Invalid argument
                                       syscall (/usr/lib64/libc-2.25.so)
                                       pkey_free (/home/acme/c/pkey)

The tools/include/uapi/asm-generic/mman-common.h file is used to find the access rights defines for the pkey_alloc syscall's second argument. Since we have the detector of changes for the tools/include header files versus their kernel origin (include/uapi/asm-generic/mman-common.h), we'll get whatever new flag appears for that argument automatically. This method should be used in other cases where it is easy to generate those flags tables, because the header has properly namespaced defines like PKEY_DISABLE_ACCESS and PKEY_DISABLE_WRITE.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-3xq5312qlks7wtfzv2sk3nct@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
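A toy version of such a table-driven beautifier (the function name and table here are invented for illustration; perf generates the real table from the header, which is how new PKEY_DISABLE_* flags get picked up automatically):

    #include <stdio.h>

    #define PKEY_DISABLE_ACCESS 0x1  /* as in uapi/asm-generic/mman-common.h */
    #define PKEY_DISABLE_WRITE  0x2

    /* Render "DISABLE_ACCESS|DISABLE_WRITE"-style strings for the
     * pkey_alloc() access_rights argument. */
    static size_t scnprintf_pkey_access_rights(char *bf, size_t size,
                                               unsigned long rights)
    {
            static const char * const names[] = { "ACCESS", "WRITE" };
            size_t printed = 0;
            unsigned int i;

            for (i = 0; i < 2; i++) {
                    if (rights & (1UL << i))
                            printed += snprintf(bf + printed, size - printed,
                                                "%sDISABLE_%s",
                                                printed ? "|" : "", names[i]);
            }
            return printed;
    }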
-
Arnaldo Carvalho de Melo authored
These changes made the tools/arch/x86/include/ headers drift from their kernel origins:

  910448bb ("perf/x86/amd/uncore: Rename cpufeatures macro for cache counters")
  5442c269 ("x86/cpufeature, kvm/svm: Rename (shorten) the new "virtualized VMSAVE/VMLOAD" CPUID flag")
  cba4671a ("x86/mm: Disable PCID on 32-bit kernels")

Which was detected while building perf:

  make: Entering directory '/home/acme/git/linux/tools/perf'
    BUILD:   Doing 'make -j4' parallel build
  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/disabled-features.h' differs from latest version at 'arch/x86/include/asm/disabled-features.h'
  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'

This sync causes just these perf object files to be rebuilt:

  CC       /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
  CC       /tmp/build/perf/bench/mem-memset-x86-64-asm.o

And the changes in the above changesets don't entail any need for change in the above 'perf bench' files.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-456aafouj911a4x4zwt8stkm@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
David Carrillo-Cisneros authored
When building with an external FEATURES_DUMP, bpf complains that the features dump file is not found. Fix it by passing the full file path. Signed-off-by: David Carrillo-Cisneros <davidcc@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Paul Turner <pjt@google.com> Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20170827075442.108534-7-davidcc@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
David Carrillo-Cisneros authored
Prior to this patch, the make scripts tested for CLANG with ifeq ($(CC), clang), failing to detect CLANG binaries with different names. Fix it by testing for the existence of the __clang__ macro in the list of compiler-defined macros. Signed-off-by: David Carrillo-Cisneros <davidcc@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Paul Turner <pjt@google.com> Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20170827075442.108534-5-davidcc@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
David Carrillo-Cisneros authored
Use already defined values for CC, AR and LD when available. Signed-off-by: David Carrillo-Cisneros <davidcc@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Paul Turner <pjt@google.com> Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20170827075442.108534-4-davidcc@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
David Carrillo-Cisneros authored
Allow the user to define the flex and bison binary names by passing FLEX and BISON variables. Signed-off-by: David Carrillo-Cisneros <davidcc@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Paul Turner <pjt@google.com> Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20170827075442.108534-3-davidcc@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
David Carrillo-Cisneros authored
Use $(CC) instead of the hardcoded gcc binary name. Signed-off-by: David Carrillo-Cisneros <davidcc@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Paul Turner <pjt@google.com> Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20170827075442.108534-2-davidcc@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jiri Olsa authored
There's no big value in displaying counts for every event ID, of which there is one per CPU. Rather than that, display the whole sum for the event.

  $ perf record -c 100000 -e cycles:u -s test
  $ perf report -T

Before:

  #  PID   TID  cycles:u  cycles:u  cycles:u  cycles:u ... [20 more columns of 'cycles:u']
   3339  3339         0         0         0         0
   3340  3340         0         0         0         0
   3341  3341         0         0         0         0
   3342  3342         0         0         0         0

Now:

  #  PID   TID  cycles:u
   3339  3339     19678
   3340  3340     18744
   3341  3341     17335
   3342  3342     26414

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170824162737.7813-10-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jiri Olsa authored
We need to make sure the array of value pointers is zero-initialized, because we use them in realloc later on, and an uninitialized non-zero value will cause an allocation error and aborted execution. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170824162737.7813-9-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
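The underlying C rule, as a sketch with hypothetical names: realloc(NULL, n) behaves like malloc(n), while realloc() on an uninitialized garbage pointer is undefined behavior (and typically aborts), so slots that will be grown later must start out NULL:

    #include <errno.h>
    #include <stdlib.h>

    /* allocation site: calloc, not malloc, so every slot starts NULL */
    static double **alloc_slots(size_t nthreads)
    {
            return calloc(nthreads, sizeof(double *));
    }

    /* Grow the value array of slot i to new_n entries; safe only because
     * the first realloc() of each slot sees NULL, not stack garbage. */
    static int grow_slot(double **slots, size_t i, size_t new_n)
    {
            double *tmp = realloc(slots[i], new_n * sizeof(*tmp));

            if (!tmp)
                    return -ENOMEM;
            slots[i] = tmp;
            return 0;
    }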
-
Jiri Olsa authored
Bailing out in case the allocation failed, not the other way round. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170824162737.7813-8-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jiri Olsa authored
We are taking the wrong index (+1) for the first thread, which leaves the thread with index 0 unused and uninitialized. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170824162737.7813-7-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jiri Olsa authored
Adding a dump_read function to gather all the dump output of the read function. Also adding output of the enabled and running times and the id if enabled (3 new lines with '...' prefix below).

  $ perf record -s ...
  $ perf report -D

  958358311769 0x91f8 [0x40]: PERF_RECORD_READ: 3339 3339 cycles:u 0
  ... time enabled : 958358313731
  ... time running : 958358313731
  ... id           : 80

Committer note:

Do not use 'read' as a variable name as it breaks the build on older systems, such as RHEL6:

    CC       /tmp/build/perf/util/session.o
  cc1: warnings being treated as errors
  util/session.c: In function 'dump_read':
  util/session.c:1132: error: declaration of 'read' shadows a global declaration
  /usr/include/bits/unistd.h:35: error: shadowed declaration is here
  mv: cannot stat `/tmp/build/perf/util/.session.o.tmp': No such file or directory

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170824162737.7813-6-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Linus Torvalds authored
Pull c6x tweaks from Mark Salter. * tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming: c6x: Convert to using %pOF instead of full_name c6x: defconfig: Cleanup from old Kconfig options
-
Jiri Olsa authored
Set read_format for what we expect to get from the read event generated by perf_event_attr::inherit_stat. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170824162737.7813-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
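Concretely, the expectation amounts to setting these uapi bits when opening the event (a sketch; the rest of the attr setup is elided, and the exact set of flags perf uses here is inferred from the dump_read output above):

    #include <linux/perf_event.h>
    #include <string.h>

    static void setup_inherit_stat(struct perf_event_attr *attr)
    {
            memset(attr, 0, sizeof(*attr));
            attr->size         = sizeof(*attr);
            attr->inherit      = 1;
            attr->inherit_stat = 1;   /* kernel emits PERF_RECORD_READ */
            /* match what those read events are expected to carry */
            attr->read_format  = PERF_FORMAT_TOTAL_TIME_ENABLED |
                                 PERF_FORMAT_TOTAL_TIME_RUNNING |
                                 PERF_FORMAT_ID;
    }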
-
Jiri Olsa authored
Skylake introduced a new mem_remote bit in union perf_mem_data_src [1]. It applies to any other memory level, to express a Remote unknown level, as is reported by Skylake. Adding this extra check to c2c_decode_stats to properly decode remote HITMs on Skylake.

[1] http://lkml.kernel.org/r/20170816222156.19953-4-andi@firstfloor.org

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170824085732.28481-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jiri Olsa authored
We can't pass --dynamic-list into a static build anymore, because compilers start to complain about that. Fedora 26 started to fail the build with the following error:

  $ make LDFLAGS=-static
  ...
  /usr/bin/ld: dynamic STT_GNU_IFUNC symbol `strcmp' with pointer equality in
  `/usr/lib/gcc/x86_64-redhat-linux/7/../../../../lib64/libc.a(strcmp.o)' can not
  be used when making an executable; recompile with -fPIE and relink with -pie

There's no sense in --dynamic-list for a static build, because there's no .dynsym table in a static binary. Consequently the traceevent plugins have never worked with a static build, but this was quietly passed by. To fix this in the future, I think we should add support to compile the plugins into the perf binary directly for static builds.

Reported-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/n/tip-jeg6a7ff9j9hlqn8k4gllzvv@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jack Henschel authored
As defined in tools/perf/util/pmu.c, the EVENT_SOURCE_DEVICE_PATH is /sys/bus/event_source/devices/ (no trailing 's' in event_source). This patch corrects the path in the perf stat documentation. Signed-off-by: Jack Henschel <jackdev@mailbox.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jack Henschel <jackdev@mailbox.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: trivial@kernel.org
Link: http://lkml.kernel.org/r/20170824132022.10934-1-jackdev@mailbox.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Linus Torvalds authored
-
Linus Torvalds authored
Pull IOMMU fix from Joerg Roedel: "Another fix, this time in common IOMMU sysfs code. In the conversion from the old iommu sysfs-code to the iommu_device_register interface, I missed updating the release path for the struct device associated with an IOMMU. It freed the 'struct device', which was a pointer before, but is now embedded in another struct. Freeing from the middle of allocated memory had all kinds of nasty side effects when an IOMMU was unplugged. Unfortunately nobody unplugged an IOMMU until now, so this was not discovered earlier. The fix is to make the 'struct device' a pointer again" * tag 'iommu-fixes-v4.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu: Fix wrong freeing of iommu_device->dev
-
Linus Torvalds authored
Pull char/misc fix from Greg KH: "Here is a single misc driver fix for 4.13-rc7. It resolves a reported problem in the Android binder driver due to previous patches in 4.13-rc. It's been in linux-next with no reported issues" * tag 'char-misc-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: ANDROID: binder: fix proc->tsk check.
-
Linus Torvalds authored
Pull staging/iio fixes from Greg KH: "Here are a few small staging driver fixes, and some more IIO driver fixes for 4.13-rc7. Nothing major, just resolutions for some reported problems. All of these have been in linux-next with no reported problems" * tag 'staging-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: iio: magnetometer: st_magn: remove ihl property for LSM303AGR iio: magnetometer: st_magn: fix status register address for LSM303AGR iio: hid-sensor-trigger: Fix the race with user space powering up sensors iio: trigger: stm32-timer: fix get trigger mode iio: imu: adis16480: Fix acceleration scale factor for adis16480 PATCH] iio: Fix some documentation warnings staging: rtl8188eu: add RNX-N150NUB support Revert "staging: fsl-mc: be consistent when checking strcmp() return" iio: adc: stm32: fix common clock rate iio: adc: ina219: Avoid underflow for sleeping time iio: trigger: stm32-timer: add enable attribute iio: trigger: stm32-timer: fix get/set down count direction iio: trigger: stm32-timer: fix write_raw return value iio: trigger: stm32-timer: fix quadrature mode get routine iio: bmp280: properly initialize device for humidity reading
-
Linus Torvalds authored
Pull NTB fixes from Jon Mason: "NTB bug fixes to address an incorrect ntb_mw_count reference in the NTB transport, improperly bringing down the link if SPADs are corrupted, and an out-of-order issue regarding link negotiation and data passing" * tag 'ntb-4.13-bugfixes' of git://github.com/jonmason/ntb: ntb: ntb_test: ensure the link is up before trying to configure the mws ntb: transport shouldn't disable link due to bogus values in SPADs ntb: use correct mw_count function in ntb_tool and ntb_transport
-
- 27 Aug, 2017 2 commits
-
-
Linus Torvalds authored
The "lock_page_killable()" function waits for exclusive access to the page lock bit using the WQ_FLAG_EXCLUSIVE bit in the waitqueue entry set. That means that if it gets woken up, other waiters may have been skipped. That, in turn, means that if it sees the page being unlocked, it *must* take that lock and return success, even if a lethal signal is also pending. So instead of checking for lethal signals first, we need to check for them after we've checked the actual bit that we were waiting for. Even if that might then delay the killing of the process. This matches the order of the old "wait_on_bit_lock()" infrastructure that the page locking used to use (and is still used in a few other areas). Note that if we still return an error after having unsuccessfully tried to acquire the page lock, that is ok: that means that some other thread was able to get ahead of us and lock the page, and when that other thread then unlocks the page, the wakeup event will be repeated. So any other pending waiters will now get properly woken up. Fixes: 62906027 ("mm: add PageWaiters indicating tasks are waiting for a page bit") Cc: Nick Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Jan Kara <jack@suse.cz> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Linus Torvalds authored
Tim Chen and Kan Liang have been battling a customer load that shows extremely long page wakeup lists. The cause seems to be constant NUMA migration of a hot page that is shared across a lot of threads, but the actual root cause for the exact behavior has not been found. Tim has a patch that batches the wait list traversal at wakeup time, so that we at least don't get long uninterruptible cases where we traverse and wake up thousands of processes and get nasty latency spikes. That is likely 4.14 material, but we're still discussing the page waitqueue specific parts of it. In the meantime, I've tried to look at making the page wait queues less expensive, and failing miserably. If you have thousands of threads waiting for the same page, it will be painful. We'll need to try to figure out the NUMA balancing issue some day, in addition to avoiding the excessive spinlock hold times. That said, having tried to rewrite the page wait queues, I can at least fix up some of the braindamage in the current situation. In particular: (a) we don't want to continue walking the page wait list if the bit we're waiting for already got set again (which seems to be one of the patterns of the bad load). That makes no progress and just causes pointless cache pollution chasing the pointers. (b) we don't want to put the non-locking waiters always on the front of the queue, and the locking waiters always on the back. Not only is that unfair, it means that we wake up thousands of reading threads that will just end up being blocked by the writer later anyway. Also add a comment about the layout of 'struct wait_page_key' - there is an external user of it in the cachefiles code that means that it has to match the layout of 'struct wait_bit_key' in the two first members. It so happens to match, because 'struct page *' and 'unsigned long *' end up having the same values simply because the page flags are the first member in struct page. Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Christopher Lameter <cl@linux.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
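For reference, the layout constraint described in that comment reads like this (a sketch of both structs as in 4.13-era kernels; the two leading members must line up):

    /* the generic bit-wait key */
    struct wait_bit_key {
            void            *flags;
            int             bit_nr;
            unsigned long   timeout;
    };

    /* the page waitqueue key: the first two members alias wait_bit_key's,
     * which the cachefiles code relies on; 'struct page *' lands where
     * 'flags' is because page->flags is the first member of struct page */
    struct wait_page_key {
            struct page     *page;
            int             bit_nr;
            int             page_match;
    };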
-