- 16 Apr, 2020 15 commits
-
-
Alexey Budankov authored
Open access to monitoring for CAP_PERFMON privileged process. Providing the access under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes operation more secure. CAP_PERFMON implements the principle of least privilege for performance monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle of least privilege: A security design principle that states that a process or program be granted only those privileges (e.g., capabilities) necessary to accomplish its legitimate function, and only for the time that such privileges are actually required) For backward compatibility reasons access to the monitoring remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure monitoring is discouraged with respect to CAP_PERFMON capability. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Acked-by: Will Deacon <will@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Igor Lubashev <ilubashe@akamai.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Serge Hallyn <serge@hallyn.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: intel-gfx@lists.freedesktop.org Cc: linux-doc@vger.kernel.org Cc: linux-man@vger.kernel.org Cc: linux-security-module@vger.kernel.org Cc: selinux@vger.kernel.org Link: http://lore.kernel.org/lkml/4ec1d6f7-548c-8d1c-f84a-cebeb9674e4e@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Alexey Budankov authored
Open access to monitoring for CAP_PERFMON privileged process. Providing the access under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes operation more secure. CAP_PERFMON implements the principle of least privilege for performance monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle of least privilege: A security design principle that states that a process or program be granted only those privileges (e.g., capabilities) necessary to accomplish its legitimate function, and only for the time that such privileges are actually required) For backward compatibility reasons access to the monitoring remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure monitoring is discouraged with respect to CAP_PERFMON capability. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Acked-by: Helge Deller <deller@gmx.de> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Igor Lubashev <ilubashe@akamai.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Serge Hallyn <serge@hallyn.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: intel-gfx@lists.freedesktop.org Cc: linux-doc@vger.kernel.org Cc: linux-man@vger.kernel.org Cc: linux-security-module@vger.kernel.org Cc: selinux@vger.kernel.org Link: http://lore.kernel.org/lkml/8cc98809-d35b-de0f-de02-4cf554f3cf62@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Alexey Budankov authored
Open access to monitoring for CAP_PERFMON privileged process. Providing the access under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes operation more secure. CAP_PERFMON implements the principle of least privilege for performance monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle of least privilege: A security design principle that states that a process or program be granted only those privileges (e.g., capabilities) necessary to accomplish its legitimate function, and only for the time that such privileges are actually required) For backward compatibility reasons access to the monitoring remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure monitoring is discouraged with respect to CAP_PERFMON capability. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Acked-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Igor Lubashev <ilubashe@akamai.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Serge Hallyn <serge@hallyn.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: intel-gfx@lists.freedesktop.org Cc: linux-doc@vger.kernel.org Cc: linux-man@vger.kernel.org Cc: linux-security-module@vger.kernel.org Cc: selinux@vger.kernel.org Link: http://lore.kernel.org/lkml/ac98cd9f-b59e-673c-c70d-180b3e7695d2@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Alexey Budankov authored
Open access to bpf_trace monitoring for CAP_PERFMON privileged process. Providing the access under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes operation more secure. CAP_PERFMON implements the principle of least privilege for performance monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle of least privilege: A security design principle that states that a process or program be granted only those privileges (e.g., capabilities) necessary to accomplish its legitimate function, and only for the time that such privileges are actually required) For backward compatibility reasons access to bpf_trace monitoring remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure bpf_trace monitoring is discouraged with respect to CAP_PERFMON capability. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Acked-by: Song Liu <songliubraving@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Igor Lubashev <ilubashe@akamai.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Serge Hallyn <serge@hallyn.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: intel-gfx@lists.freedesktop.org Cc: linux-doc@vger.kernel.org Cc: linux-man@vger.kernel.org Cc: linux-security-module@vger.kernel.org Cc: selinux@vger.kernel.org Link: http://lore.kernel.org/lkml/c0a0ae47-8b6e-ff3e-416b-3cd1faaf71c0@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Alexey Budankov authored
Open access to i915_perf monitoring for CAP_PERFMON privileged process. Providing the access under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes operation more secure. CAP_PERFMON implements the principle of least privilege for performance monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle of least privilege: A security design principle that states that a process or program be granted only those privileges (e.g., capabilities) necessary to accomplish its legitimate function, and only for the time that such privileges are actually required) For backward compatibility reasons access to i915_events subsystem remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure i915_events monitoring is discouraged with respect to CAP_PERFMON capability. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Igor Lubashev <ilubashe@akamai.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Serge Hallyn <serge@hallyn.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: intel-gfx@lists.freedesktop.org Cc: linux-doc@vger.kernel.org Cc: linux-man@vger.kernel.org Cc: linux-security-module@vger.kernel.org Cc: selinux@vger.kernel.org Link: http://lore.kernel.org/lkml/e3e3292f-f765-ea98-e59c-fbe2db93fd34@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Alexey Budankov authored
Extend error messages to mention CAP_PERFMON capability as an option to substitute CAP_SYS_ADMIN capability for secure system performance monitoring and observability operations. Make perf_event_paranoid_check() and __cmd_ftrace() to be aware of CAP_PERFMON capability. CAP_PERFMON implements the principle of least privilege for performance monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle of least privilege: A security design principle that states that a process or program be granted only those privileges (e.g., capabilities) necessary to accomplish its legitimate function, and only for the time that such privileges are actually required) For backward compatibility reasons access to perf_events subsystem remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure perf_events monitoring is discouraged with respect to CAP_PERFMON capability. Committer testing: Using a libcap with this patch: diff --git a/libcap/include/uapi/linux/capability.h b/libcap/include/uapi/linux/capability.h index 78b2fd4c8a95..89b5b0279b60 100644 --- a/libcap/include/uapi/linux/capability.h +++ b/libcap/include/uapi/linux/capability.h @@ -366,8 +366,9 @@ struct vfs_ns_cap_data { #define CAP_AUDIT_READ 37 +#define CAP_PERFMON 38 -#define CAP_LAST_CAP CAP_AUDIT_READ +#define CAP_LAST_CAP CAP_PERFMON #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP) Note that using '38' in place of 'cap_perfmon' works to some degree with an old libcap, its only when cap_get_flag() is called that libcap performs an error check based on the maximum value known for capabilities that it will fail. This makes determining the default of perf_event_attr.exclude_kernel to fail, as it can't determine if CAP_PERFMON is in place. Using 'perf top -e cycles' avoids the default check and sets perf_event_attr.exclude_kernel to 1. As root, with a libcap supporting CAP_PERFMON: # groupadd perf_users # adduser perf -g perf_users # mkdir ~perf/bin # cp ~acme/bin/perf ~perf/bin/ # chgrp perf_users ~perf/bin/perf # setcap "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" ~perf/bin/perf # getcap ~perf/bin/perf /home/perf/bin/perf = cap_sys_ptrace,cap_syslog,cap_perfmon+ep # ls -la ~perf/bin/perf -rwxr-xr-x. 1 root perf_users 16968552 Apr 9 13:10 /home/perf/bin/perf As the 'perf' user in the 'perf_users' group: $ perf top -a --stdio Error: Failed to mmap with 1 (Operation not permitted) $ Either add the cap_ipc_lock capability to the perf binary or reduce the ring buffer size to some smaller value: $ perf top -m10 -a --stdio rounding mmap pages size to 64K (16 pages) Error: Failed to mmap with 1 (Operation not permitted) $ perf top -m4 -a --stdio Error: Failed to mmap with 1 (Operation not permitted) $ perf top -m2 -a --stdio PerfTop: 762 irqs/sec kernel:49.7% exact: 100.0% lost: 0/0 drop: 0/0 [4000Hz cycles], (all, 4 CPUs) ------------------------------------------------------------------------------------------------------ 9.83% perf [.] __symbols__insert 8.58% perf [.] rb_next 5.91% [kernel] [k] module_get_kallsym 5.66% [kernel] [k] kallsyms_expand_symbol.constprop.0 3.98% libc-2.29.so [.] __GI_____strtoull_l_internal 3.66% perf [.] rb_insert_color 2.34% [kernel] [k] vsnprintf 2.30% [kernel] [k] string_nocheck 2.16% libc-2.29.so [.] _IO_getdelim 2.15% [kernel] [k] number 2.13% [kernel] [k] format_decode 1.58% libc-2.29.so [.] _IO_feof 1.52% libc-2.29.so [.] __strcmp_avx2 1.50% perf [.] rb_set_parent_color 1.47% libc-2.29.so [.] __libc_calloc 1.24% [kernel] [k] do_syscall_64 1.17% [kernel] [k] __x86_indirect_thunk_rax $ perf record -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.552 MB perf.data (74 samples) ] $ perf evlist cycles $ perf evlist -v cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 $ perf report | head -20 # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 74 of event 'cycles' # Event count (approx.): 15694834 # # Overhead Command Shared Object Symbol # ........ ............... .......................... ...................................... # 19.62% perf [kernel.vmlinux] [k] strnlen_user 13.88% swapper [kernel.vmlinux] [k] intel_idle 13.83% ksoftirqd/0 [kernel.vmlinux] [k] pfifo_fast_dequeue 13.51% swapper [kernel.vmlinux] [k] kmem_cache_free 6.31% gnome-shell [kernel.vmlinux] [k] kmem_cache_free 5.66% kworker/u8:3+ix [kernel.vmlinux] [k] delay_tsc 4.42% perf [kernel.vmlinux] [k] __set_cpus_allowed_ptr 3.45% kworker/2:1-eve [kernel.vmlinux] [k] shmem_truncate_range 2.29% gnome-shell libgobject-2.0.so.0.6000.7 [.] g_closure_ref $ Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Igor Lubashev <ilubashe@akamai.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Serge Hallyn <serge@hallyn.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: intel-gfx@lists.freedesktop.org Cc: linux-doc@vger.kernel.org Cc: linux-man@vger.kernel.org Cc: linux-security-module@vger.kernel.org Cc: selinux@vger.kernel.org Link: http://lore.kernel.org/lkml/a66d5648-2b8e-577e-e1f2-1d56c017ab5e@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Alexey Budankov authored
Open access to monitoring via kprobes and uprobes and eBPF tracing for CAP_PERFMON privileged process. Providing the access under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes operation more secure. perf kprobes and uprobes are used by ftrace and eBPF. perf probe uses ftrace to define new kprobe events, and those events are treated as tracepoint events. eBPF defines new probes via perf_event_open interface and then the probes are used in eBPF tracing. CAP_PERFMON implements the principle of least privilege for performance monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle of least privilege: A security design principle that states that a process or program be granted only those privileges (e.g., capabilities) necessary to accomplish its legitimate function, and only for the time that such privileges are actually required) For backward compatibility reasons access to perf_events subsystem remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure perf_events monitoring is discouraged with respect to CAP_PERFMON capability. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Igor Lubashev <ilubashe@akamai.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Serge Hallyn <serge@hallyn.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: intel-gfx@lists.freedesktop.org Cc: linux-doc@vger.kernel.org Cc: linux-security-module@vger.kernel.org Cc: selinux@vger.kernel.org Cc: linux-man@vger.kernel.org Link: http://lore.kernel.org/lkml/3c129d9a-ba8a-3483-ecc5-ad6c8e7c203f@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Alexey Budankov authored
Open access to monitoring of kernel code, CPUs, tracepoints and namespaces data for a CAP_PERFMON privileged process. Providing the access under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes operation more secure. CAP_PERFMON implements the principle of least privilege for performance monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle of least privilege: A security design principle that states that a process or program be granted only those privileges (e.g., capabilities) necessary to accomplish its legitimate function, and only for the time that such privileges are actually required) For backward compatibility reasons the access to perf_events subsystem remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure perf_events monitoring is discouraged with respect to CAP_PERFMON capability. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Igor Lubashev <ilubashe@akamai.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: linux-man@vger.kernel.org Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Serge Hallyn <serge@hallyn.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: intel-gfx@lists.freedesktop.org Cc: linux-doc@vger.kernel.org Cc: linux-security-module@vger.kernel.org Cc: selinux@vger.kernel.org Link: http://lore.kernel.org/lkml/471acaef-bb8a-5ce2-923f-90606b78eef9@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Alexey Budankov authored
Introduce the CAP_PERFMON capability designed to secure system performance monitoring and observability operations so that CAP_PERFMON can assist CAP_SYS_ADMIN capability in its governing role for performance monitoring and observability subsystems. CAP_PERFMON hardens system security and integrity during performance monitoring and observability operations by decreasing attack surface that is available to a CAP_SYS_ADMIN privileged process [2]. Providing the access to system performance monitoring and observability operations under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes the operation more secure. Thus, CAP_PERFMON implements the principle of least privilege for performance monitoring and observability operations (POSIX IEEE 1003.1e: 2.2.2.39 principle of least privilege: A security design principle that states that a process or program be granted only those privileges (e.g., capabilities) necessary to accomplish its legitimate function, and only for the time that such privileges are actually required) CAP_PERFMON meets the demand to secure system performance monitoring and observability operations for adoption in security sensitive, restricted, multiuser production environments (e.g. HPC clusters, cloud and virtual compute environments), where root or CAP_SYS_ADMIN credentials are not available to mass users of a system, and securely unblocks applicability and scalability of system performance monitoring and observability operations beyond root and CAP_SYS_ADMIN use cases. CAP_PERFMON takes over CAP_SYS_ADMIN credentials related to system performance monitoring and observability operations and balances amount of CAP_SYS_ADMIN credentials following the recommendations in the capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability is overloaded; see Notes to kernel developers, below." For backward compatibility reasons access to system performance monitoring and observability subsystems of the kernel remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN capability usage for secure system performance monitoring and observability operations is discouraged with respect to the designed CAP_PERFMON capability. Although the software running under CAP_PERFMON can not ensure avoidance of related hardware issues, the software can still mitigate these issues following the official hardware issues mitigation procedure [2]. The bugs in the software itself can be fixed following the standard kernel development process [3] to maintain and harden security of system performance monitoring and observability operations. [1] http://man7.org/linux/man-pages/man7/capabilities.7.html [2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html [3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.htmlSigned-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Acked-by: James Morris <jamorris@linux.microsoft.com> Acked-by: Serge E. Hallyn <serge@hallyn.com> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Igor Lubashev <ilubashe@akamai.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: intel-gfx@lists.freedesktop.org Cc: linux-doc@vger.kernel.org Cc: linux-man@vger.kernel.org Cc: linux-security-module@vger.kernel.org Cc: selinux@vger.kernel.org Link: http://lore.kernel.org/lkml/5590d543-82c6-490a-6544-08e6a5517db0@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jiri Olsa authored
Add the DSO_BINARY_TYPE__BPF_IMAGE dso binary type to recognize BPF images that carry trampoline or dispatcher. Upcoming patches will add support to read the image data, store it within the BPF feature in perf.data and display it for annotation purposes. Currently we only display following message: # ./perf annotate bpf_trampoline_24456 --stdio Percent | Source code & Disassembly of . for cycles (504 ... --------------------------------------------------------------- ... : to be implemented Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andriin@fb.com> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@redhat.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jesper Dangaard Brouer <hawk@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200312195610.346362-16-jolsa@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jiri Olsa authored
There's no special load action for ksymbol data on map__load/dso__load action, where the kernel is getting loaded. It only gets confused with kernel kallsyms/vmlinux load for bpf object, which fails and could mess up with the map. Disabling any further load of the map for ksymbol related dso/map. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andriin@fb.com> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@redhat.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jesper Dangaard Brouer <hawk@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200312195610.346362-15-jolsa@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jiri Olsa authored
Synthesize bpf images (trampolines/dispatchers) on start, as ksymbol events from /proc/kallsyms. Having this perf can recognize samples from those images and perf report and top shows them correctly. The rest of the ksymbol handling is already in place from for the bpf programs monitoring, so only the initial state was needed. perf report output: # Overhead Command Shared Object Symbol 12.37% test_progs [kernel.vmlinux] [k] entry_SYSCALL_64 11.80% test_progs [kernel.vmlinux] [k] syscall_return_via_sysret 9.63% test_progs bpf_prog_bcf7977d3b93787c_prog2 [k] bpf_prog_bcf7977d3b93787c_prog2 6.90% test_progs bpf_trampoline_24456 [k] bpf_trampoline_24456 6.36% test_progs [kernel.vmlinux] [k] memcpy_erms Committer notes: Use scnprintf() instead of strncpy() to overcome this on fedora:32, rawhide and OpenMandriva Cooker: CC /tmp/build/perf/util/bpf-event.o In file included from /usr/include/string.h:495, from /git/linux/tools/lib/bpf/libbpf_common.h:12, from /git/linux/tools/lib/bpf/bpf.h:31, from util/bpf-event.c:4: In function 'strncpy', inlined from 'process_bpf_image' at util/bpf-event.c:323:2, inlined from 'kallsyms_process_symbol' at util/bpf-event.c:358:9: /usr/include/bits/string_fortified.h:106:10: error: '__builtin_strncpy' specified bound 256 equals destination size [-Werror=stringop-truncation] 106 | return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest)); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andriin@fb.com> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@redhat.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jesper Dangaard Brouer <hawk@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200312195610.346362-14-jolsa@kernel.org/Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
When --timeout is used and a workload is specified to be started by 'perf stat', i.e. $ perf stat --timeout 1000 sleep 1h The --timeout wasn't being honoured, i.e. the workload, 'sleep 1h' in the above example, should be terminated after 1000ms, but it wasn't, 'perf stat' was waiting for it to finish. Fix it by sending a SIGTERM when the timeout expires. Now it works: # perf stat -e cycles --timeout 1234 sleep 1h sleep: Terminated Performance counter stats for 'sleep 1h': 1,066,692 cycles 1.234314838 seconds time elapsed 0.000750000 seconds user 0.000000000 seconds sys # Fixes: f1f8ad52 ("perf stat: Add support to print counts after a period of time") Reported-by: Konstantin Kharlamov <hi-angel@yandex.ru> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207243Tested-by: Konstantin Kharlamov <hi-angel@yandex.ru> Cc: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: yuzhoujian <yuzhoujian@didichuxing.com> Link: https://lore.kernel.org/lkml/20200415153803.GB20324@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Ingo Molnar authored
Merge tag 'perf-urgent-for-mingo-5.7-20200414' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes from Arnaldo Carvalho de Melo: perf stat: Jin Yao: - Fix no metric header if --per-socket and --metric-only set build system: - Fix python building when built with clang, that was failing if the clang version doesn't support -fno-semantic-interposition. tools UAPI headers: Arnaldo Carvalho de Melo: - Update various copies of kernel headers, some ended up automatically updating build-time generated tables to enable tools such as 'perf trace' to decode syscalls and tracepoints arguments. Now the tools/perf build is free of UAPI drift warnings. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull EFI fixes from Ingo Molnar: "Misc EFI fixes, including the boot failure regression caused by the BSS section not being cleared by the loaders" * tag 'efi-urgent-2020-04-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/x86: Revert struct layout change to fix kexec boot regression efi/x86: Don't remap text<->rodata gap read-only for mixed mode efi/x86: Fix the deletion of variables in mixed mode efi/libstub/file: Merge file name buffers to reduce stack usage Documentation/x86, efi/x86: Clarify EFI handover protocol and its requirements efi/arm: Deal with ADR going out of range in efi_enter_kernel() efi/x86: Always relocate the kernel for EFI handover entry efi/x86: Move efi stub globals from .bss to .data efi/libstub/x86: Remove redundant assignment to pointer hdr efi/cper: Use scnprintf() for avoiding potential buffer overflow
-
- 14 Apr, 2020 25 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linuxLinus Torvalds authored
Pull hyperv fixes from Wei Liu: - a series from Tianyu Lan to fix crash reporting on Hyper-V - three miscellaneous cleanup patches * tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: x86/Hyper-V: Report crash data in die() when panic_on_oops is set x86/Hyper-V: Report crash register data when sysctl_record_panic_msg is not set x86/Hyper-V: Report crash register data or kmsg before running crash kernel x86/Hyper-V: Trigger crash enlightenment only once during system crash. x86/Hyper-V: Free hv_panic_page when fail to register kmsg dump x86/Hyper-V: Unload vmbus channel in hv panic callback x86: hyperv: report value of misc_features hv_debugfs: Make hv_debug_root static hv: hyperv_vmbus.h: Replace zero-length array with flexible-array member
-
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linuxLinus Torvalds authored
Pull btrfs fixes from David Sterba: "We have a few regressions and one fix for stable: - revert fsync optimization - fix lost i_size update - fix a space accounting leak - build fix, add back definition of a deprecated ioctl flag - fix search condition for old roots in relocation" * tag 'for-5.7-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: re-instantiate the removed BTRFS_SUBVOL_CREATE_ASYNC definition btrfs: fix reclaim counter leak of space_info objects btrfs: make full fsyncs always operate on the entire file again btrfs: fix lost i_size update after cloning inline extent btrfs: check commit root generation in should_ignore_root
-
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fsLinus Torvalds authored
Pull AFS fixes from David Howells: - Fix the decoding of fetched file status records so that the xdr pointer is advanced under all circumstances. - Fix the decoding of a fetched file status record that indicates an inline abort (ie. an error) so that it sets the flag saying the decoder stored the abort code. - Fix the decoding of the result of the rename operation so that it doesn't skip the decoding of the second fetched file status (ie. that of the dest dir) in the case that the source and dest dirs were the same as this causes the xdr pointer not to be advanced, leading to incorrect decoding of subsequent parts of the reply. - Fix the dump of a bad YFSFetchStatus record to dump the full length. - Fix a race between local editing of directory contents and accessing the dir for reading or d_revalidate by using the same lock in both. - Fix afs_d_revalidate() to not accidentally reverse the version on a dentry when it's meant to be bringing it forward. * tag 'afs-fixes-20200413' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Fix afs_d_validate() to set the right directory version afs: Fix race between post-modification dir edit and readdir/d_revalidate afs: Fix length of dump of bad YFSFetchStatus record afs: Fix rename operation status delivery afs: Fix decoding of inline abort codes from version 1 status records afs: Fix missing XDR advance in xdr_decode_{AFS,YFS}FSFetchStatus()
-
Arnaldo Carvalho de Melo authored
To pick up the changes in these csets: 295bcca8 ("linux/bits.h: add compile time sanity check of GENMASK inputs") 3945ff37 ("linux/bits.h: Extract common header for vDSO") To address this tools/perf build warning: Warning: Kernel ABI header at 'tools/include/linux/bits.h' differs from latest version at 'include/linux/bits.h' diff -u tools/include/linux/bits.h include/linux/bits.h This clashes with usage of userspace's static_assert(), that, at least on glibc, is guarded by a ifnded/endif pair, do the same to our copy of build_bug.h and avoid that diff in check_headers.sh so that we continue checking for drifts with the kernel sources master copy. This will all be tested with the set of build containers that includes uCLibc, musl libc, lots of glibc versions in lots of distros and cross build environments. The tools/objtool, tools/bpf, etc were tested as well. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Rikard Falkeborn <rikard.falkeborn@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
Will be needed when syncing the linux/bits.h header, in the next cset. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
To pick the changes from: d3b1b776 ("x86/entry/64: Remove ptregs qualifier from syscall table") cab56d34 ("x86/entry: Remove ABI prefixes from functions in syscall tables") 27dd84fa ("x86/entry/64: Use syscall wrappers for x32_rt_sigreturn") Addressing this tools/perf build warning: Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl' diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl That didn't result in any tooling changes, as what is extracted are just the first two columns, and these patches touched only the third. $ cp /tmp/build/perf/arch/x86/include/generated/asm/syscalls_64.c /tmp $ cp arch/x86/entry/syscalls/syscall_64.tbl tools/perf/arch/x86/entry/syscalls/syscall_64.tbl $ make -C tools/perf O=/tmp/build/perf install-bin make: Entering directory '/home/acme/git/perf/tools/perf' BUILD: Doing 'make -j12' parallel build DESCEND plugins CC /tmp/build/perf/util/syscalltbl.o INSTALL trace_plugins LD /tmp/build/perf/util/perf-in.o LD /tmp/build/perf/perf-in.o LINK /tmp/build/perf/perf $ diff -u /tmp/build/perf/arch/x86/include/generated/asm/syscalls_64.c /tmp/syscalls_64.c $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
To pick the change in: 88be76cd ("drm/i915: Allow userspace to specify ringsize on construction") That don't result in any changes in tooling, just silences this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h' diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
Picking the changes from: 455e00f1 ("drm: Add getfb2 ioctl") Silencing these perf build warnings: Warning: Kernel ABI header at 'tools/include/uapi/drm/drm.h' differs from latest version at 'include/uapi/drm/drm.h' diff -u tools/include/uapi/drm/drm.h include/uapi/drm/drm.h Now 'perf trace' and other code that might use the tools/perf/trace/beauty autogenerated tables will be able to translate this new ioctl code into a string: $ tools/perf/trace/beauty/drm_ioctl.sh > before $ cp include/uapi/drm/drm.h tools/include/uapi/drm/drm.h $ tools/perf/trace/beauty/drm_ioctl.sh > after $ diff -u before after --- before 2020-04-14 09:28:45.461821077 -0300 +++ after 2020-04-14 09:28:53.594782685 -0300 @@ -107,6 +107,7 @@ [0xCB] = "SYNCOBJ_QUERY", [0xCC] = "SYNCOBJ_TRANSFER", [0xCD] = "SYNCOBJ_TIMELINE_SIGNAL", + [0xCE] = "MODE_GETFB2", [DRM_COMMAND_BASE + 0x00] = "I915_INIT", [DRM_COMMAND_BASE + 0x01] = "I915_FLUSH", [DRM_COMMAND_BASE + 0x02] = "I915_FLIP", $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Daniel Stone <daniels@collabora.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Lyude Paul <lyude@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
To pick up the changes from: 9a5788c6 ("KVM: PPC: Book3S HV: Add a capability for enabling secure guests") 3c9bd400 ("KVM: x86: enable dirty log gradually in small chunks") 13da9ae1 ("KVM: s390: protvirt: introduce and enable KVM_CAP_S390_PROTECTED") e0d2773d ("KVM: s390: protvirt: UV calls in support of diag308 0, 1") 19e12277 ("KVM: S390: protvirt: Introduce instruction data area bounce buffer") 29b40f10 ("KVM: s390: protvirt: Add initial vm and cpu lifecycle handling") So far we're ignoring those arch specific ioctls, we need to revisit this at some time to have arch specific tables, etc: $ grep S390 tools/perf/trace/beauty/kvm_ioctl.sh egrep -v " ((ARM|PPC|S390)_|[GS]ET_(DEBUGREGS|PIT2|XSAVE|TSC_KHZ)|CREATE_SPAPR_TCE_64)" | \ $ This addresses these tools/perf build warnings: Warning: Kernel ABI header at 'tools/arch/arm/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm/include/uapi/asm/kvm.h' diff -u tools/arch/arm/include/uapi/asm/kvm.h arch/arm/include/uapi/asm/kvm.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jay Zhou <jianjay.zhou@huawei.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
To pick the changes from: e98ad464 ("fscrypt: add FS_IOC_GET_ENCRYPTION_NONCE ioctl") That don't trigger any changes in tooling. This silences this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/fscrypt.h' differs from latest version at 'include/uapi/linux/fscrypt.h' diff -u tools/include/uapi/linux/fscrypt.h include/uapi/linux/fscrypt.h In time we should come up with something like: $ tools/perf/trace/beauty/fsconfig.sh static const char *fsconfig_cmds[] = { [0] = "SET_FLAG", [1] = "SET_STRING", [2] = "SET_BINARY", [3] = "SET_PATH", [4] = "SET_PATH_EMPTY", [5] = "SET_FD", [6] = "CMD_CREATE", [7] = "CMD_RECONFIGURE", }; $ And: $ tools/perf/trace/beauty/drm_ioctl.sh | head #ifndef DRM_COMMAND_BASE #define DRM_COMMAND_BASE 0x40 #endif static const char *drm_ioctl_cmds[] = { [0x00] = "VERSION", [0x01] = "GET_UNIQUE", [0x02] = "GET_MAGIC", [0x03] = "IRQ_BUSID", [0x04] = "GET_MAP", [0x05] = "GET_CLIENT", $ For fscrypt's ioctls. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
To get the changes in: 4c8cf318 ("vhost: introduce vDPA-based backend") Silencing this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/vhost.h' differs from latest version at 'include/uapi/linux/vhost.h' diff -u tools/include/uapi/linux/vhost.h include/uapi/linux/vhost.h This automatically picks these new ioctls, making tools such as 'perf trace' aware of them and possibly allowing to use the strings in filters, etc: $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > before $ cp include/uapi/linux/vhost.h tools/include/uapi/linux/vhost.h $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > after $ diff -u before after --- before 2020-04-14 09:12:28.559748968 -0300 +++ after 2020-04-14 09:12:38.781696242 -0300 @@ -24,9 +24,16 @@ [0x44] = "SCSI_GET_EVENTS_MISSED", [0x60] = "VSOCK_SET_GUEST_CID", [0x61] = "VSOCK_SET_RUNNING", + [0x72] = "VDPA_SET_STATUS", + [0x74] = "VDPA_SET_CONFIG", + [0x75] = "VDPA_SET_VRING_ENABLE", }; static const char *vhost_virtio_ioctl_read_cmds[] = { [0x00] = "GET_FEATURES", [0x12] = "GET_VRING_BASE", [0x26] = "GET_BACKEND_FEATURES", + [0x70] = "VDPA_GET_DEVICE_ID", + [0x71] = "VDPA_GET_STATUS", + [0x73] = "VDPA_GET_CONFIG", + [0x76] = "VDPA_GET_VRING_NUM", }; $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
To pick up the changes from: 077168e2 ("x86/mce/amd: Add PPIN support for AMD MCE") 753039ef ("x86/cpu/amd: Call init_amd_zn() om Family 19h processors too") 6650cdd9 ("x86/split_lock: Enable split lock detection by kernel") These don't cause any changes in tooling, just silences this perf build warning: Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h' diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Wei Huang <wei.huang2@amd.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
To get the changes in: e346b381 ("mm/mremap: add MREMAP_DONTUNMAP to mremap()") Add that to 'perf trace's mremap 'flags' decoder. This silences this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/mman.h' differs from latest version at 'include/uapi/linux/mman.h' diff -u tools/include/uapi/linux/mman.h include/uapi/linux/mman.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Brian Geffon <bgeffon@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
To get the changes in: ef2c41cf ("clone3: allow spawning processes into cgroups") Add that to 'perf trace's clone 'flags' decoder. This silences this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/sched.h' differs from latest version at 'include/uapi/linux/sched.h' diff -u tools/include/uapi/linux/sched.h include/uapi/linux/sched.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
To get in line with: 8165b57b ("linux/const.h: Extract common header for vDSO") And silence this tools/perf/ build warning: Warning: Kernel ABI header at 'tools/include/linux/const.h' differs from latest version at 'include/linux/const.h' diff -u tools/include/linux/const.h include/linux/const.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jin Yao authored
We received a report that was no metric header displayed if --per-socket and --metric-only were both set. It's hard for script to parse the perf-stat output. This patch fixes this issue. Before: root@kbl-ppc:~# perf stat -a -M CPI --metric-only --per-socket ^C Performance counter stats for 'system wide': S0 8 2.6 2.215270071 seconds time elapsed root@kbl-ppc:~# perf stat -a -M CPI --metric-only --per-socket -I1000 # time socket cpus 1.000411692 S0 8 2.2 2.001547952 S0 8 3.4 3.002446511 S0 8 3.4 4.003346157 S0 8 4.0 5.004245736 S0 8 0.3 After: root@kbl-ppc:~# perf stat -a -M CPI --metric-only --per-socket ^C Performance counter stats for 'system wide': CPI S0 8 2.1 1.813579830 seconds time elapsed root@kbl-ppc:~# perf stat -a -M CPI --metric-only --per-socket -I1000 # time socket cpus CPI 1.000415122 S0 8 3.2 2.001630051 S0 8 2.9 3.002612278 S0 8 4.3 4.003523594 S0 8 3.0 5.004504256 S0 8 3.7 Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200331180226.25915-1-yao.jin@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
The set of C compiler options used by distros to build python bindings may include options that are unknown to clang, we check for a variety of such options, add -fno-semantic-interposition to that mix: This fixes the build on, among others, Manjaro Linux: GEN /tmp/build/perf/python/perf.so clang-9: error: unknown argument: '-fno-semantic-interposition' error: command 'clang' failed with exit status 1 make: Leaving directory '/git/perf/tools/perf' [perfbuilder@602aed1c266d ~]$ gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/9.3.0/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: /build/gcc/src/gcc/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-pkgversion='Arch Linux 9.3.0-1' --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++,d --enable-shared --enable-threads=posix --with-system-zlib --with-isl --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object --enable-linker-build-id --enable-lto --enable-plugin --enable-install-libiberty --with-linker-hash-style=gnu --enable-gnu-indirect-function --enable-multilib --disable-werror --enable-checking=release --enable-default-pie --enable-default-ssp --enable-cet=auto gdc_include_dir=/usr/include/dlang/gdc Thread model: posix gcc version 9.3.0 (Arch Linux 9.3.0-1) [perfbuilder@602aed1c266d ~]$ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
To pick up the changes in: 6650cdd9 ("x86/split_lock: Enable split lock detection by kernel") Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h' diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h Which causes these changes in tooling: $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after $ diff -u before after --- before 2020-04-01 12:11:14.789344795 -0300 +++ after 2020-04-01 12:11:56.907798879 -0300 @@ -10,6 +10,7 @@ [0x00000029] = "KNC_EVNTSEL1", [0x0000002a] = "IA32_EBL_CR_POWERON", [0x0000002c] = "EBC_FREQUENCY_ID", + [0x00000033] = "TEST_CTRL", [0x00000034] = "SMI_COUNT", [0x0000003a] = "IA32_FEAT_CTL", [0x0000003b] = "IA32_TSC_ADJUST", @@ -27,6 +28,7 @@ [0x000000c2] = "IA32_PERFCTR1", [0x000000cd] = "FSB_FREQ", [0x000000ce] = "PLATFORM_INFO", + [0x000000cf] = "IA32_CORE_CAPS", [0x000000e2] = "PKG_CST_CONFIG_CONTROL", [0x000000e7] = "IA32_MPERF", [0x000000e8] = "IA32_APERF", $ $ make -C tools/perf O=/tmp/build/perf install-bin <SNIP> CC /tmp/build/perf/trace/beauty/tracepoints/x86_msr.o LD /tmp/build/perf/trace/beauty/tracepoints/perf-in.o LD /tmp/build/perf/trace/beauty/perf-in.o LD /tmp/build/perf/perf-in.o LINK /tmp/build/perf/perf <SNIP> Now one can do: perf trace -e msr:* --filter=msr==IA32_CORE_CAPS or: perf trace -e msr:* --filter='msr==IA32_CORE_CAPS || msr==TEST_CTRL' And see only those MSRs being accessed via: # perf trace -v -e msr:* --filter='msr==IA32_CORE_CAPS || msr==TEST_CTRL' New filter for msr:read_msr: (msr==0xcf || msr==0x33) && (common_pid != 8263 && common_pid != 23250) New filter for msr:write_msr: (msr==0xcf || msr==0x33) && (common_pid != 8263 && common_pid != 23250) New filter for msr:rdpmc: (msr==0xcf || msr==0x33) && (common_pid != 8263 && common_pid != 23250) Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/lkml/20200401153325.GC12534@kernel.org/Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Ard Biesheuvel authored
Commit 0a67361d ("efi/x86: Remove runtime table address from kexec EFI setup data") removed the code that retrieves the non-remapped UEFI runtime services pointer from the data structure provided by kexec, as it was never really needed on the kexec boot path: mapping the runtime services table at its non-remapped address is only needed when calling SetVirtualAddressMap(), which never happens during a kexec boot in the first place. However, dropping the 'runtime' member from struct efi_setup_data was a mistake. That struct is shared ABI between the kernel and the kexec tooling for x86, and so we cannot simply change its layout. So let's put back the removed field, but call it 'unused' to reflect the fact that we never look at its contents. While at it, add a comment to remind our future selves that the layout is external ABI. Fixes: 0a67361d ("efi/x86: Remove runtime table address from kexec EFI setup data") Reported-by: Theodore Ts'o <tytso@mit.edu> Tested-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dave Young <dyoung@redhat.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Ard Biesheuvel authored
Commit d9e3d2c4 ("efi/x86: Don't map the entire kernel text RW for mixed mode") updated the code that creates the 1:1 memory mapping to use read-only attributes for the 1:1 alias of the kernel's text and rodata sections, to protect it from inadvertent modification. However, it failed to take into account that the unused gap between text and rodata is given to the page allocator for general use. If the vmap'ed stack happens to be allocated from this region, any by-ref output arguments passed to EFI runtime services that are allocated on the stack (such as the 'datasize' argument taken by GetVariable() when invoked from efivar_entry_size()) will be referenced via a read-only mapping, resulting in a page fault if the EFI code tries to write to it: BUG: unable to handle page fault for address: 00000000386aae88 #PF: supervisor write access in kernel mode #PF: error_code(0x0003) - permissions violation PGD fd61063 P4D fd61063 PUD fd62063 PMD 386000e1 Oops: 0003 [#1] SMP PTI CPU: 2 PID: 255 Comm: systemd-sysv-ge Not tainted 5.6.0-rc4-default+ #22 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0008:0x3eaeed95 Code: ... <89> 03 be 05 00 00 80 a1 74 63 b1 3e 83 c0 48 e8 44 d2 ff ff eb 05 RSP: 0018:000000000fd73fa0 EFLAGS: 00010002 RAX: 0000000000000001 RBX: 00000000386aae88 RCX: 000000003e9f1120 RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000001 RBP: 000000000fd73fd8 R08: 00000000386aae88 R09: 0000000000000000 R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000000 R13: ffffc0f040220000 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f21160ac940(0000) GS:ffff9cf23d500000(0000) knlGS:0000000000000000 CS: 0008 DS: 0018 ES: 0018 CR0: 0000000080050033 CR2: 00000000386aae88 CR3: 000000000fd6c004 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: Modules linked in: CR2: 00000000386aae88 ---[ end trace a8bfbd202e712834 ]--- Let's fix this by remapping text and rodata individually, and leave the gaps mapped read-write. Fixes: d9e3d2c4 ("efi/x86: Don't map the entire kernel text RW for mixed mode") Reported-by: Jiri Slaby <jslaby@suse.cz> Tested-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200409130434.6736-10-ardb@kernel.org
-
Gary Lin authored
efi_thunk_set_variable() treated the NULL "data" pointer as an invalid parameter, and this broke the deletion of variables in mixed mode. This commit fixes the check of data so that the userspace program can delete a variable in mixed mode. Fixes: 8319e9d5 ("efi/x86: Handle by-ref arguments covering multiple pages in mixed mode") Signed-off-by: Gary Lin <glin@suse.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200408081606.1504-1-glin@suse.com Link: https://lore.kernel.org/r/20200409130434.6736-9-ardb@kernel.org
-
Ard Biesheuvel authored
Arnd reports that commit 9302c1bb ("efi/libstub: Rewrite file I/O routine") reworks the file I/O routines in a way that triggers the following warning: drivers/firmware/efi/libstub/file.c:240:1: warning: the frame size of 1200 bytes is larger than 1024 bytes [-Wframe-larger-than=] We can work around this issue dropping an instance of efi_char16_t[256] from the stack frame, and reusing the 'filename' field of the file info struct that we use to obtain file information from EFI (which contains the file name even though we already know it since we used it to open the file in the first place) Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200409130434.6736-8-ardb@kernel.org
-
Ard Biesheuvel authored
The EFI handover protocol was introduced on x86 to permit the boot loader to pass a populated boot_params structure as an additional function argument to the entry point. This allows the bootloader to pass the base and size of a initrd image, which is more flexible than relying on the EFI stub's file I/O routines, which can only access the file system from which the kernel image itself was loaded from firmware. This approach requires a fair amount of internal knowledge regarding the layout of the boot_params structure on the part of the boot loader, as well as knowledge regarding the allowed placement of the initrd in memory, and so it has been deprecated in favour of a new initrd loading method that is based on existing UEFI protocols and best practices. So update the x86 boot protocol documentation to clarify that the EFI handover protocol has been deprecated, and while at it, add a note that invoking the EFI handover protocol still requires the PE/COFF image to be loaded properly (as opposed to simply being copied into memory). Also, drop the code32_start header field from the list of values that need to be provided, as this is no longer required. Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200409130434.6736-7-ardb@kernel.org
-
Ard Biesheuvel authored
Commit 0698fac4 ("efi/arm: Clean EFI stub exit code from cache instead of avoiding it") introduced a PC-relative reference to 'call_cache_fn' into efi_enter_kernel(), which lives way at the end of head.S. In some cases, the ARM version of the ADR instruction does not have sufficient range, resulting in a build error: arch/arm/boot/compressed/head.S:1453: Error: invalid constant (fffffffffffffbe4) after fixup ARM defines an alternative with a wider range, called ADRL, but this does not exist for Thumb-2. At the same time, the ADR instruction in Thumb-2 has a wider range, and so it does not suffer from the same issue. So let's switch to ADRL for ARM builds, and keep the ADR for Thumb-2 builds. Reported-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200409130434.6736-6-ardb@kernel.org
-
Arvind Sankar authored
Commit d5cdf4cf ("efi/x86: Don't relocate the kernel unless necessary") tries to avoid relocating the kernel in the EFI stub as far as possible. However, when systemd-boot is used to boot a unified kernel image [1], the image is constructed by embedding the bzImage as a .linux section in a PE executable that contains a small stub loader from systemd that will call the EFI stub handover entry, together with additional sections and potentially an initrd. When this image is constructed, by for example dracut, the initrd is placed after the bzImage without ensuring that at least init_size bytes are available for the bzImage. If the kernel is not relocated by the EFI stub, this could result in the compressed kernel's startup code in head_{32,64}.S overwriting the initrd. To prevent this, unconditionally relocate the kernel if the EFI stub was entered via the handover entry point. [1] https://systemd.io/BOOT_LOADER_SPECIFICATION/#type-2-efi-unified-kernel-images Fixes: d5cdf4cf ("efi/x86: Don't relocate the kernel unless necessary") Reported-by: Sergey Shatunov <me@prok.pw> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200406180614.429454-2-nivedita@alum.mit.edu Link: https://lore.kernel.org/r/20200409130434.6736-5-ardb@kernel.org
-