1. 17 Oct, 2019 1 commit
      perf_event: Add support for LSM and SELinux checks · da97e184
      Joel Fernandes (Google) authored
      In current mainline, the degree of access to the perf_event_open(2) system
      call depends on the perf_event_paranoid sysctl.  This has a number of
      limitations:
      
      1. The sysctl is only a single value. Many types of access are controlled
         by that one value, which makes the control very limited and
         coarse-grained.
      2. The sysctl is global, so loosening it grants access to
         perf_event_open(2) to every process at once, opening the door to
         security issues.
      
      This patch adds LSM and SELinux access checking, which will be used in
      Android to control access to perf_event_open(2) for attaching BPF
      programs to tracepoints, perf profiling and other operations from
      userspace. These operations are intended for production systems.
      
      5 new LSM hooks are added:
      1. perf_event_open: This controls access during the perf_event_open(2)
         syscall itself. The hook is called from every place where the
         perf_event_paranoid sysctl is checked, to keep it consistent with the
         sysctl. The hook gets passed a 'type' argument which controls CPU,
         kernel and tracepoint accesses (in this context, CPU, kernel and
         tracepoint have the same semantics as the perf_event_paranoid sysctl).
         Additionally, I added an 'open' type, which is similar to the
         perf_event_paranoid == 3 patch carried in Android and several other
         distros but rejected in mainline [1] in 2016.
      
      2. perf_event_alloc: This allocates a new security object for the event,
         which stores the SID of the current task within the event. This is
         useful when the perf event's FD is passed through IPC to another
         process that may try to read the FD; appropriate security checks then
         limit access.
      
      3. perf_event_free: Called when the event is closed.
      
      4. perf_event_read: Called from the read(2) and mmap(2) syscalls for the event.
      
      5. perf_event_write: Called from the ioctl(2) syscalls for the event.
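      The placement rule for hook 1 (consult the LSM wherever the sysctl is
      consulted) can be modeled in userspace as a minimal sketch. This is not
      kernel code: the enum, the function names and the example policies are
      hypothetical. The thresholds follow the documented perf_event_paranoid
      levels (raw tracepoint access refused above -1, CPU events above 0,
      kernel profiling above 1); the 'open' threshold of 2 models the
      out-of-tree paranoid == 3 patch, not mainline behavior.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical access types mirroring the hook's 'type' argument. */
enum perf_access { ACCESS_OPEN, ACCESS_CPU, ACCESS_KERNEL, ACCESS_TRACEPOINT };

/* Hypothetical LSM callback: 0 allows the access, -EACCES denies it. */
typedef int (*lsm_open_hook)(enum perf_access type);

/* Highest perf_event_paranoid value at which an unprivileged caller
 * still passes the sysctl check for each access type. */
static int paranoid_limit(enum perf_access type)
{
    switch (type) {
    case ACCESS_OPEN:
        return 2; /* models the out-of-tree "paranoid == 3" patch */
    case ACCESS_KERNEL:
        return 1;
    case ACCESS_CPU:
        return 0;
    case ACCESS_TRACEPOINT:
        return -1;
    }
    return -1;
}

/* The sysctl check runs first (privileged callers bypass it), then the
 * LSM hook, which can only narrow what the sysctl allows. */
static int check_access(int paranoid, bool privileged, enum perf_access type,
                        lsm_open_hook hook)
{
    if (paranoid > paranoid_limit(type) && !privileged)
        return -EACCES;
    return hook ? hook(type) : 0;
}

/* Example policies (hypothetical). */
static int allow_all(enum perf_access type)
{
    (void)type;
    return 0;
}

static int deny_kernel(enum perf_access type)
{
    return type == ACCESS_KERNEL ? -EACCES : 0;
}
```

      With the sysctl at -1, as the commit proposes, every request passes the
      sysctl and the LSM alone decides: check_access(-1, false, ACCESS_KERNEL,
      deny_kernel) is denied while the same call with ACCESS_CPU is allowed.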
      
      [1] https://lwn.net/Articles/696240/
      
      Since Peter had suggested LSM hooks in 2016 [1], I am adding his
      Suggested-by tag below.
      
      To use this patch, we set the perf_event_paranoid sysctl to -1 and then
      apply SELinux checking as appropriate (default deny everything, and then
      add policy rules to give access to domains that need it). In the future
      we can remove the perf_event_paranoid sysctl altogether.
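      The default-deny setup, together with the SID recorded by
      perf_event_alloc, can be sketched as a small userspace model. Everything
      here is hypothetical: the SID type, permission bits and rule table are
      stand-ins, not SELinux data structures or APIs. The event remembers its
      creator's SID, and a later read(2)/mmap(2) or ioctl(2), possibly by a
      process that received the FD over IPC, is checked against that SID;
      anything not explicitly allowed is denied.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t model_sid_t; /* hypothetical security identifier */

/* Hypothetical permission bits, one per access type or hook. */
enum perf_perm {
    PERM_OPEN = 1u << 0,
    PERM_CPU = 1u << 1,
    PERM_KERNEL = 1u << 2,
    PERM_TRACEPOINT = 1u << 3,
    PERM_READ = 1u << 4,
    PERM_WRITE = 1u << 5,
};

/* "subject may use perms on events created by object" */
struct allow_rule {
    model_sid_t subject;
    model_sid_t object;
    unsigned int perms;
};

/* Security object attached to each event by the alloc hook. */
struct event_security {
    model_sid_t sid;
};

/* Default deny: succeed only if some rule grants every requested bit. */
static int policy_check(const struct allow_rule *rules, size_t n,
                        model_sid_t subject, model_sid_t object,
                        unsigned int perm)
{
    for (size_t i = 0; i < n; i++)
        if (rules[i].subject == subject && rules[i].object == object &&
            (rules[i].perms & perm) == perm)
            return 0;
    return -EACCES;
}

/* perf_event_alloc-style hook: remember who created the event. */
static void event_alloc(struct event_security *sec, model_sid_t current_sid)
{
    sec->sid = current_sid;
}

/* perf_event_read-style hook (read(2), mmap(2)). */
static int event_read(const struct allow_rule *rules, size_t n,
                      model_sid_t current_sid, const struct event_security *sec)
{
    return policy_check(rules, n, current_sid, sec->sid, PERM_READ);
}

/* perf_event_write-style hook (ioctl(2)). */
static int event_write(const struct allow_rule *rules, size_t n,
                       model_sid_t current_sid, const struct event_security *sec)
{
    return policy_check(rules, n, current_sid, sec->sid, PERM_WRITE);
}
```

      In this model a domain granted only read on another domain's events can
      consume an FD handed to it, but its ioctl(2) calls are refused until a
      write rule is added.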
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Co-developed-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: James Morris <jmorris@namei.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: rostedt@goodmis.org
      Cc: Yonghong Song <yhs@fb.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: jeffv@google.com
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: primiano@google.com
      Cc: Song Liu <songliubraving@fb.com>
      Cc: rsavitski@google.com
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Matthew Garrett <matthewgarrett@google.com>
      Link: https://lkml.kernel.org/r/20191014170308.70668-1-joel@joelfernandes.org
  2. 15 Oct, 2019 1 commit
      Merge tag 'perf-core-for-mingo-5.5-20191011' of... · 39b656ee
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo-5.5-20191011' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      perf trace:
      
        Arnaldo Carvalho de Melo:
      
        - Reuse the strace-like syscall_arg_fmt->scnprintf() beautification routines
          (which convert integer arguments into strings, like open flags, etc) for
          tracepoint arguments.
      
          For now the type-based scnprintf routines (pid_t, umode_t, etc) and the
          ones based on well-known arg names ("fd", etc) get associated with
          tracepoint args of that type or name.
      
          A tracepoint-only arg, "msr", for msr:{write,read}_msr is added as
          an initial step.
      
        - Introduce syscall_arg_fmt->strtoul() methods to be the reverse operation
          of ->scnprintf(), i.e. to go from a string to an integer.
      
        - Implement --filter, just like in 'perf record', which affects the tracepoint
          events specified thus far on the command line, using the ->strtoul() methods
          to convert strings in the tables associated with beautifiers into the integers
          that the in-kernel tracepoint filters (and later eBPF ones) expect, e.g.:
      
           # perf trace --max-events 1 -e sched:*ipi --filter="cpu==1 || cpu==2"
            0.000 as/24630 sched:sched_wake_idle_without_ipi(cpu: 1)
           #
      
           # perf trace --max-events 1 --max-stack=32 -e msr:* --filter="msr==IA32_TSC_DEADLINE"
            207.000 cc1/19963 msr:write_msr(msr: IA32_TSC_DEADLINE, val: 5442316760822)
                                              do_trace_write_msr ([kernel.kallsyms])
                                              do_trace_write_msr ([kernel.kallsyms])
                                              lapic_next_deadline ([kernel.kallsyms])
                                              clockevents_program_event ([kernel.kallsyms])
                                              hrtimer_interrupt ([kernel.kallsyms])
                                              smp_apic_timer_interrupt ([kernel.kallsyms])
                                              apic_timer_interrupt ([kernel.kallsyms])
                                              [0x6ff66c] (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1)
                                              [0x7047c3] (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1)
                                              [0x707708] (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1)
                                              execute_one_pass (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1)
                                              [0x4f3d37] (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1)
                                              [0x4f3d49] (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1)
                                              execute_pass_list (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1)
                                              cgraph_node::expand (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1)
                                              [0x2625b4] (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1)
                                              symbol_table::finalize_compilation_unit (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1)
                                              [0x5ae8b9] (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1)
                                              toplev::main (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1)
                                              main (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1)
                                              [0x26b6a] (/usr/lib/x86_64-linux-gnu/libc-2.29.so)
           #
           # perf trace --max-events 8 -e msr:* --filter="msr==IA32_SPEC_CTRL"
               0.000 :13281/13281 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
               0.063 migration/3/25 msr:write_msr(msr: IA32_SPEC_CTRL)
               0.217 kworker/u16:1-/4826 msr:write_msr(msr: IA32_SPEC_CTRL)
               0.687 rcu_sched/11 msr:write_msr(msr: IA32_SPEC_CTRL)
               0.696 :13280/13280 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
               0.305 :13281/13281 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
               0.355 :13274/13274 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
               2.743 kworker/u16:0-/6711 msr:write_msr(msr: IA32_SPEC_CTRL)
           #
           # perf trace --max-events 8 --cpu 1 -e msr:* --filter="msr!=IA32_SPEC_CTRL && msr!=IA32_TSC_DEADLINE && msr != FS_BASE"
                 0.000 mtr-packet/30819 msr:write_msr(msr: 0x830, val: 68719479037)
                 0.096 :0/0 msr:read_msr(msr: IA32_TSC_ADJUST)
               238.925 mtr-packet/30819 msr:write_msr(msr: 0x830, val: 8589936893)
               511.010 :0/0 msr:write_msr(msr: 0x830, val: 68719479037)
              1005.052 :0/0 msr:read_msr(msr: IA32_TSC_ADJUST)
              1235.131 CPU 0/KVM/3750 msr:write_msr(msr: 0x830, val: 4294969595)
              1235.195 CPU 0/KVM/3750 msr:read_msr(msr: IA32_SYSENTER_ESP, val: -2199023037952)
              1235.201 CPU 0/KVM/3750 msr:read_msr(msr: IA32_APICBASE, val: 4276096000)
           #
      
        - Default to not using libtraceevent and its plugins for beautifying
          tracepoint arguments, since we now reuse the strace-like beautifiers.
          Use --libtraceevent_print (just --libtrace is unambiguous and can
          be used as a shorthand) to go back to those beautifiers.
      
          This will help in the transition, as can be seen in some of the sched tracepoints
          that still need some work in the libbeauty based mode:
      
          # trace --no-inherit -e msr:*,*sleep,sched:* sleep 1
               0.000 (         ): sched:sched_waking(comm: "trace", pid: 3319 (trace), prio: 120, success: 1)
               0.006 (         ): sched:sched_wakeup(comm: "trace", pid: 3319 (trace), prio: 120, success: 1)
               0.348 (         ): sched:sched_process_exec(filename: 140212596720100, pid: 3319 (sleep), old_pid: 3319 (sleep))
               0.490 (         ): msr:write_msr(msr: FS_BASE, val: 139631189321088)
               0.670 (         ): nanosleep(rqtp: 0x7ffc52c23bc0)                                    ...
               0.674 (         ): sched:sched_stat_runtime(comm: "sleep", pid: 3319 (sleep), runtime: 659259, vruntime: 78942418342)
               0.675 (         ): sched:sched_switch(prev_comm: "sleep", prev_pid: 3319 (sleep), prev_prio: 120, prev_state: 1, next_comm: "swapper/0", next_prio: 120)
            1001.059 (         ): sched:sched_waking(comm: "sleep", pid: 3319 (sleep), prio: 120, success: 1)
            1001.098 (         ): sched:sched_wakeup(comm: "sleep", pid: 3319 (sleep), prio: 120, success: 1)
               0.670 (1000.504 ms):  ... [continued]: nanosleep())                                        = 0
            1001.456 (         ): sched:sched_process_exit(comm: "sleep", pid: 3319 (sleep), prio: 120)
          # trace --libtrace --no-inherit -e msr:*,*sleep,sched:* sleep 1
               0.000 (         ): sched:sched_waking(comm=trace pid=3323 prio=120 target_cpu=000)
               0.007 (         ): sched:sched_wakeup(comm=trace pid=3323 prio=120 target_cpu=000)
               0.382 (         ): sched:sched_process_exec(filename=/usr/bin/sleep pid=3323 old_pid=3323)
               0.525 (         ): msr:write_msr(c0000100, value 7f5d508a0580)
               0.713 (         ): nanosleep(rqtp: 0x7fff487fb4a0)                                    ...
               0.717 (         ): sched:sched_stat_runtime(comm=sleep pid=3323 runtime=617722 [ns] vruntime=78957731636 [ns])
               0.719 (         ): sched:sched_switch(prev_comm=sleep prev_pid=3323 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120)
            1001.117 (         ): sched:sched_waking(comm=sleep pid=3323 prio=120 target_cpu=000)
            1001.157 (         ): sched:sched_wakeup(comm=sleep pid=3323 prio=120 target_cpu=000)
               0.713 (1000.522 ms):  ... [continued]: nanosleep())                                        = 0
            1001.538 (         ): sched:sched_process_exit(comm=sleep pid=3323 prio=120)
          #
      
        - Make -v (verbose) mode be honoured for .perfconfig based trace.add_events,
          to help in diagnosing problems with building eBPF events (-e source.c).
      
        - When using eBPF syscall payload augmentation, do not show strace-like
          syscalls when all the user specified was some tracepoint event, bringing
          the behaviour in line with that when not using eBPF augmentation.
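      The pairing of ->scnprintf() (integer to string) with ->strtoul() (string
      to integer), and the rewriting that lets --filter accept names, can be
      illustrated with a small table-driven sketch. The table, function names
      and simplified signatures are hypothetical, not perf's internal API. The
      MSR numbers are the architectural values for the names in the examples
      above (FS_BASE as 0xc0000100 matches the libtraceevent output), and
      unknown values fall back to hex, like the 0x830 entries.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct name_val {
    const char *name;
    unsigned long val;
};

/* Hypothetical beautifier table: MSR name <-> architectural MSR number. */
static const struct name_val msr_names[] = {
    { "IA32_APICBASE", 0x1b },
    { "IA32_TSC_ADJUST", 0x3b },
    { "IA32_SPEC_CTRL", 0x48 },
    { "IA32_SYSENTER_ESP", 0x175 },
    { "IA32_TSC_DEADLINE", 0x6e0 },
    { "FS_BASE", 0xc0000100 },
};
#define N_MSR (sizeof(msr_names) / sizeof(msr_names[0]))

/* scnprintf direction: integer to name, falling back to hex. */
static int msr_scnprintf(char *buf, size_t size, unsigned long val)
{
    for (size_t i = 0; i < N_MSR; i++)
        if (msr_names[i].val == val)
            return snprintf(buf, size, "%s", msr_names[i].name);
    return snprintf(buf, size, "%#lx", val);
}

/* strtoul direction: name (not NUL-terminated) to integer. */
static bool msr_strtoul(const char *s, size_t len, unsigned long *val)
{
    for (size_t i = 0; i < N_MSR; i++)
        if (strlen(msr_names[i].name) == len &&
            strncmp(msr_names[i].name, s, len) == 0) {
            *val = msr_names[i].val;
            return true;
        }
    return false;
}

/* Replace every identifier found in the table with its number, e.g.
 * "msr==IA32_TSC_DEADLINE" -> "msr==0x6e0". False if out is too small. */
static bool rewrite_filter(const char *in, char *out, size_t size)
{
    size_t o = 0;

    if (size == 0)
        return false;
    while (*in) {
        if (isalpha((unsigned char)*in) || *in == '_') {
            const char *start = in;
            unsigned long val;

            while (isalnum((unsigned char)*in) || *in == '_')
                in++;
            size_t len = (size_t)(in - start);
            int n = msr_strtoul(start, len, &val)
                        ? snprintf(out + o, size - o, "%#lx", val)
                        : snprintf(out + o, size - o, "%.*s", (int)len, start);
            if (n < 0 || (size_t)n >= size - o)
                return false;
            o += (size_t)n;
        } else {
            if (o + 1 >= size)
                return false;
            out[o++] = *in++;
        }
    }
    out[o] = '\0';
    return true;
}
```

      For example, rewrite_filter("msr!=IA32_SPEC_CTRL && msr != FS_BASE", ...)
      produces "msr!=0x48 && msr != 0xc0000100", the purely numeric form an
      in-kernel tracepoint filter can evaluate.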
      
      Intel PT:
      
        exported-sql-viewer GUI:
      
        Adrian Hunter:
      
        - Add LookupModel, HBoxLayout, VBoxLayout and global time range calculations
          in preparation for adding a time chart by CPU.
      
      perf script:
      
        Andi Kleen:
      
        - Allow --time (to specify a time span of interest) with --reltime
          (timestamps relative to start).
      
      perf diff:
      
        Jin Yao:
      
        - Report noise for cycles diff, i.e. a histogram + stddev.
      
      perf annotate:
      
        Arnaldo Carvalho de Melo:
      
        - Initialize env->cpuid when running in live mode (perf top), as it
          is used in some of the per arch annotation init routines.
      
      samples bpf:
      
        Björn Töpel:
      
        - Fix up the fallout of using tools/perf/perf-sys. from outside tools/perf.
      
      Core:
      
        Ian Rogers:
      
        - Avoid 'sample_reg_masks' being const + weak, as this breaks with some
          compilers that constant-propagate from the weak symbol.
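      The sample_reg_masks hazard comes from combining a weak default with
      const: as the entry notes, some compilers constant-propagate from the
      weak symbol, so code can keep using the default contents even when a
      strong, arch-specific definition wins at link time. A minimal sketch of
      one way to sidestep it, dropping const from the weak default, follows;
      the struct layout and count_regs() are hypothetical, and this is not
      necessarily how the patch itself resolves it.

```c
#include <stddef.h>

/* Hypothetical register description; the table ends with a NULL name. */
struct sample_reg {
    const char *name;
    unsigned long mask;
};

/*
 * Risky form, kept as a comment: with const, a compiler may fold reads of
 * the weak default and ignore a strong definition from another object:
 *
 *   const struct sample_reg __attribute__((weak)) sample_reg_masks[] = {
 *       { NULL, 0 },
 *   };
 */

/* Non-const weak default: the compiler cannot treat the initializer as a
 * known constant, so reads see whichever definition the linker selected. */
struct sample_reg __attribute__((weak)) sample_reg_masks[] = {
    { NULL, 0 },
};

/* Count entries up to the NULL-name sentinel. */
static size_t count_regs(void)
{
    size_t n = 0;

    while (sample_reg_masks[n].name != NULL)
        n++;
    return n;
}
```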
      
      libperf:
      
        - First part of moving the perf_mmap class from tools/perf to libperf.
      
        - Propagate CFLAGS to libperf from the tools/perf Makefile.
      
      Vendor events:
      
        John Garry:
      
        - Add an entry in MAINTAINERS with reviewers for the perf tool arm64
          pmu-events files.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. 13 Oct, 2019 16 commits
  4. 12 Oct, 2019 22 commits