1. 12 Aug, 2016 3 commits
    • Kan Liang's avatar
      perf/x86/intel/uncore: Add enable_box for client MSR uncore · 95f3be79
      Kan Liang authored
      There are bug reports about miscounting uncore counters on some
      client machines like Sandybridge, Broadwell and Skylake. It is
      very likely to be observed on idle systems.
      
      This issue is caused by a hardware issue. PERF_GLOBAL_CTL could be
      cleared after Package C7, and nothing will be count.
      The related errata (HSD 158) could be found in:
      
        www.intel.com/content/dam/www/public/us/en/documents/specification-updates/4th-gen-core-family-desktop-specification-update.pdf
      
      This patch tries to work around this issue by re-enabling PERF_GLOBAL_CTL
      in ->enable_box(). The workaround does not cover all cases. It helps for new
      events after returning from C7. But it cannot prevent C7, it will still
      miscount if a counter is already active.
      
      There is no drawback in leaving it enabled, so it does not need
      disable_box() here.
      Signed-off-by: default avatarKan Liang <kan.liang@intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1470925874-59943-1-git-send-email-kan.liang@intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      95f3be79
    • Kan Liang's avatar
      perf/x86/intel/uncore: Fix uncore num_counters · 10e9e7bd
      Kan Liang authored
      Some uncore boxes' num_counters value for Haswell server and
      Broadwell server are not correct (too large, off by one).
      
      This issue was found by comparing the code with the document. Although
      there is no bug report from users yet, accessing non-existent counters
      is dangerous and the behavior is undefined: it may cause miscounting or
      even crashes.
      
      This patch makes them consistent with the uncore document.
      Reported-by: default avatarLukasz Odzioba <lukasz.odzioba@intel.com>
      Signed-off-by: default avatarKan Liang <kan.liang@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1470925820-59847-1-git-send-email-kan.liang@intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      10e9e7bd
    • Denys Vlasenko's avatar
      uprobes/x86: Fix RIP-relative handling of EVEX-encoded instructions · 68187872
      Denys Vlasenko authored
      Since instruction decoder now supports EVEX-encoded instructions, two fixes
      are needed to correctly handle them in uprobes.
      
      Extended bits for MODRM.rm field need to be sanitized just like we do it
      for VEX3, to avoid encoding wrong register for register-relative access.
      
      EVEX has _two_ extended bits: b and x. Theoretically, EVEX.x should be
      ignored by the CPU (since GPRs go only up to 15, not 31), but let's be
      paranoid here: proper encoding for register-relative access
      should have EVEX.x = 1.
      
      Secondly, we should fetch vex.vvvv for EVEX too.
      This is now super easy because instruction decoder populates
      vex_prefix.bytes[2] for all flavors of (e)vex encodings, even for VEX2.
      Signed-off-by: default avatarDenys Vlasenko <dvlasenk@redhat.com>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Acked-by: default avatarSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jim Keniston <jkenisto@us.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Cc: <stable@vger.kernel.org> # v4.1+
      Fixes: 8a764a87 ("x86/asm/decoder: Create artificial 3rd byte for 2-byte VEX")
      Link: http://lkml.kernel.org/r/20160811154521.20469-1-dvlasenk@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      68187872
  2. 10 Aug, 2016 2 commits
    • David Carrillo-Cisneros's avatar
      perf/core: Set cgroup in CPU contexts for new cgroup events · db4a8356
      David Carrillo-Cisneros authored
      There's a perf stat bug easy to observer on a machine with only one cgroup:
      
        $ perf stat -e cycles -I 1000 -C 0 -G /
        #          time             counts unit events
            1.000161699      <not counted>      cycles                    /
            2.000355591      <not counted>      cycles                    /
            3.000565154      <not counted>      cycles                    /
            4.000951350      <not counted>      cycles                    /
      
      We'd expect some output there.
      
      The underlying problem is that there is an optimization in
      perf_cgroup_sched_{in,out}() that skips the switch of cgroup events
      if the old and new cgroups in a task switch are the same.
      
      This optimization interacts with the current code in two ways
      that cause a CPU context's cgroup (cpuctx->cgrp) to be NULL even if a
      cgroup event matches the current task. These are:
      
        1. On creation of the first cgroup event in a CPU: In current code,
        cpuctx->cpu is only set in perf_cgroup_sched_in, but due to the
        aforesaid optimization, perf_cgroup_sched_in will run until the next
        cgroup switches in that CPU. This may happen late or never happen,
        depending on system's number of cgroups, CPU load, etc.
      
        2. On deletion of the last cgroup event in a cpuctx: In list_del_event,
        cpuctx->cgrp is set NULL. Any new cgroup event will not be sched in
        because cpuctx->cgrp == NULL until a cgroup switch occurs and
        perf_cgroup_sched_in is executed (updating cpuctx->cgrp).
      
      This patch fixes both problems by setting cpuctx->cgrp in list_add_event,
      mirroring what list_del_event does when removing a cgroup event from CPU
      context, as introduced in:
      
        commit 68cacd29 ("perf_events: Fix stale ->cgrp pointer in update_cgrp_time_from_cpuctx()")
      
      With this patch, cpuctx->cgrp is always set/clear when installing/removing
      the first/last cgroup event in/from the CPU context. With cpuctx->cgrp
      correctly set, event_filter_match works as intended when events are
      sched in/out.
      
      After the fix, the output is as expected:
      
        $ perf stat -e cycles -I 1000 -a -G /
        #         time             counts unit events
           1.004699159          627342882      cycles                    /
           2.007397156          615272690      cycles                    /
           3.010019057          616726074      cycles                    /
      Signed-off-by: default avatarDavid Carrillo-Cisneros <davidcc@google.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1470124092-113192-1-git-send-email-davidcc@google.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      db4a8356
    • Peter Zijlstra's avatar
      perf/core: Fix sideband list-iteration vs. event ordering NULL pointer deference crash · 0b8f1e2e
      Peter Zijlstra authored
      Vegard Nossum reported that perf fuzzing generates a NULL
      pointer dereference crash:
      
      > Digging a bit deeper into this, it seems the event itself is getting
      > created by perf_event_open() and it gets added to the pmu_event_list
      > through:
      >
      > perf_event_open()
      >  - perf_event_alloc()
      >     - account_event()
      >        - account_pmu_sb_event()
      >           - attach_sb_event()
      >
      > so at this point the event is being attached but its ->ctx is still
      > NULL. It seems like ->ctx is set just a bit later in
      > perf_event_open(), though.
      >
      > But before that, __schedule() comes along and creates a stack trace
      > similar to the one above:
      >
      > __schedule()
      >  - __perf_event_task_sched_out()
      >    - perf_iterate_sb()
      >      - perf_iterate_sb_cpu()
      >         - event_filter_match()
      >           - perf_cgroup_match()
      >             - __get_cpu_context()
      >               - (dereference ctx which is NULL)
      >
      > So I guess the question is... should the event be attached (= put on
      > the list) before ->ctx gets set? Or should the cgroup code check for a
      > NULL ->ctx?
      
      The latter seems like the simplest solution. Moving the list-add later
      creates a bit of a mess.
      Reported-by: default avatarVegard Nossum <vegard.nossum@gmail.com>
      Tested-by: default avatarVegard Nossum <vegard.nossum@gmail.com>
      Tested-by: default avatarVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Carrillo-Cisneros <davidcc@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: f2fb6bef ("perf/core: Optimize side-band event delivery")
      Link: http://lkml.kernel.org/r/20160804123724.GN6862@twins.programming.kicks-ass.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      0b8f1e2e
  3. 09 Aug, 2016 12 commits
    • Ingo Molnar's avatar
      Merge tag 'perf-urgent-for-mingo-20160809' of... · 69766c40
      Ingo Molnar authored
      Merge tag 'perf-urgent-for-mingo-20160809' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
      
      Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
      
      User visible fixes:
      
      - Fix the lookup for a kernel module in 'perf probe', fixing for instance, the
        erroneous return of "[raid10]" when looking for "[raid1]"  (Konstantin Khlebnikov)
      
      - Disable counters in a group before reading them in 'perf stat', to avoid skew (Mark Rutland)
      
      - Fix adding probes to function aliases in systems using kaslr (Masami Hiramatsu)
      
      - Trip libtraceevent trace_seq buffers, removing unnecessary memory usage that could
        bring a system using tracepoint events with 'perf top' to a crawl, as the trace_seq
        buffers start at a whooping 4 KB, which is very rarely used in perf's usecases,
        so realloc it to the really used space as a last measure after using libtraceevent
        functions to format the fields of tracepoint events (Arnaldo Carvalho de Melo)
      
      - Fix 'perf probe' location when using DWARF on ppc64le (Ravi Bangoria)
      
      - Allow specifying signedness casts to a 'perf probe' variable, to shorten
        the number of steps to see signed values that otherwise would always appear
        as hex values (Naohiro Aota)
      
      Documentation fixes:
      
      - Add 'bpf-output' field to 'perf script' usage message (Brendan Gregg)
      
      Infrastructure fixes:
      
      - Sync kernel header files: cpufeatures.h, {disabled,required}-features.h,
        bpf.h and vmx.h, so that we get a clean build, without warnings about files
        being different from the kernel counterparts.
      
        A verification of the need or desirability of changes in tools/ based on what
        was done in the kernel changesets was made and documented in the respective
        file sync changesets (Arnaldo Carvalho de Melo)
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      69766c40
    • Ravi Bangoria's avatar
      perf probe ppc64le: Fix probe location when using DWARF · 99e608b5
      Ravi Bangoria authored
      Powerpc has Global Entry Point and Local Entry Point for functions.  LEP
      catches call from both the GEP and the LEP. Symbol table of ELF contains
      GEP and Offset from which we can calculate LEP, but debuginfo does not
      have LEP info.
      
      Currently, perf prioritize symbol table over dwarf to probe on LEP for
      ppc64le. But when user tries to probe with function parameter, we fall
      back to using dwarf(i.e. GEP) and when function called via LEP, probe
      will never hit.
      
      For example:
      
        $ objdump -d vmlinux
          ...
          do_sys_open():
          c0000000002eb4a0:       e8 00 4c 3c     addis   r2,r12,232
          c0000000002eb4a4:       60 00 42 38     addi    r2,r2,96
          c0000000002eb4a8:       a6 02 08 7c     mflr    r0
          c0000000002eb4ac:       d0 ff 41 fb     std     r26,-48(r1)
      
        $ sudo ./perf probe do_sys_open
        $ sudo cat /sys/kernel/debug/tracing/kprobe_events
          p:probe/do_sys_open _text+3060904
      
        $ sudo ./perf probe 'do_sys_open filename:string'
        $ sudo cat /sys/kernel/debug/tracing/kprobe_events
          p:probe/do_sys_open _text+3060896 filename_string=+0(%gpr4):string
      
      For second case, perf probed on GEP. So when function will be called via
      LEP, probe won't hit.
      
        $ sudo ./perf record -a -e probe:do_sys_open ls
          [ perf record: Woken up 1 times to write data ]
          [ perf record: Captured and wrote 0.195 MB perf.data ]
      
      To resolve this issue, let's not prioritize symbol table, let perf
      decide what it wants to use. Perf is already converting GEP to LEP when
      it uses symbol table. When perf uses debuginfo, let it find LEP offset
      form symbol table. This way we fall back to probe on LEP for all cases.
      
      After patch:
      
        $ sudo ./perf probe 'do_sys_open filename:string'
        $ sudo cat /sys/kernel/debug/tracing/kprobe_events
          p:probe/do_sys_open _text+3060904 filename_string=+0(%gpr4):string
      
        $ sudo ./perf record -a -e probe:do_sys_open ls
          [ perf record: Woken up 1 times to write data ]
          [ perf record: Captured and wrote 0.197 MB perf.data (11 samples) ]
      Signed-off-by: default avatarRavi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1470723805-5081-2-git-send-email-ravi.bangoria@linux.vnet.ibm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      99e608b5
    • Ravi Bangoria's avatar
      perf probe: Add function to post process kernel trace events · d820456d
      Ravi Bangoria authored
      Instead of inline code, introduce function to post process kernel
      probe trace events.
      Signed-off-by: default avatarRavi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1470723805-5081-1-git-send-email-ravi.bangoria@linux.vnet.ibm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      d820456d
    • Arnaldo Carvalho de Melo's avatar
      tools: Sync cpufeatures headers with the kernel · 840b49ba
      Arnaldo Carvalho de Melo authored
      Due to:
      
        1e61f78b ("x86/cpufeature: Make sure DISABLED/REQUIRED macros are updated")
      
      No changes to tools using those headers (tools/arch/x86/lib/mem{set,cpu}_64.S)
      seems necessary.
      
      Detected by the tools build header drift checker:
      
        $ make -C tools/perf O=/tmp/build/perf
        make: Entering directory '/home/acme/git/linux/tools/perf'
          BUILD:   Doing 'make -j4' parallel build
          GEN      /tmp/build/perf/common-cmds.h
        Warning: tools/arch/x86/include/asm/disabled-features.h differs from kernel
        Warning: tools/arch/x86/include/asm/required-features.h differs from kernel
        Warning: tools/arch/x86/include/asm/cpufeatures.h differs from kernel
          CC       /tmp/build/perf/util/probe-finder.o
          CC       /tmp/build/perf/builtin-help.o
        <SNIP>
        ^C$
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-ja75m7zk8j0jkzmrv16i5ehw@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      840b49ba
    • Arnaldo Carvalho de Melo's avatar
      toops: Sync tools/include/uapi/linux/bpf.h with the kernel · 791cceb8
      Arnaldo Carvalho de Melo authored
      The way we're using kernel headers in tools/ now, with a copy that is
      made to the same path prefixed by "tools/" plus checking if that copy
      got stale, i.e. if the kernel counterpart changed, helps in keeping
      track with new features that may be useful for tools to exploit.
      
      For instance, looking at all the changes to bpf.h since it was last
      copied to tools/include brings this to toolers' attention:
      
      Need to investigate this one to check how to run a program via perf, setting up
      a BPF event, that will take advantage of the way perf already calls clang/LLVM,
      sets up the event and runs the workload in a single command line, helping in
      debugging such semi cooperative programs:
      
        96ae5227 ("bpf: Add bpf_probe_write_user BPF helper to be called in tracers")
      
      This one needs further investigation about using the feature it improves
      in 'perf trace' to do some tcpdumpin' mixed with syscalls, tracepoints,
      probe points, callgraphs, etc:
      
        555c8a86 ("bpf: avoid stack copy and use skb ctx for event output")
      
      Add tracing just packets that are related to some container to that mix:
      
        4a482f34 ("cgroup: bpf: Add bpf_skb_in_cgroup_proto")
        4ed8ec52 ("cgroup: bpf: Add BPF_MAP_TYPE_CGROUP_ARRAY")
      
      Definetely needs to have example programs accessing task_struct from a bpf proggie
      started from 'perf trace':
      
        606274c5 ("bpf: introduce bpf_get_current_task() helper")
      
      Core networking related, XDP:
      
        6ce96ca3 ("bpf: add XDP_TX xdp_action for direct forwarding")
        6a773a15 ("bpf: add XDP prog type for early driver filter")
        13c5c240 ("bpf: add bpf_get_hash_recalc helper")
        d2485c42 ("bpf: add bpf_skb_change_type helper")
        6578171a ("bpf: add bpf_skb_change_proto helper")
      
      Changes detected by the tools build system:
      
        $ make -C tools/perf O=/tmp/build/perf install-bin
        make: Entering directory '/home/acme/git/linux/tools/perf'
          BUILD:   Doing 'make -j4' parallel build
        Warning: tools/include/uapi/linux/bpf.h differs from kernel
          INSTALL  GTK UI
          CC       /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
        <SNIP>
        $
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Brenden Blanco <bblanco@plumgrid.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Sargun Dhillon <sargun@sargun.me>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-difq4ts1xvww6eyfs9e7zlft@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      791cceb8
    • Arnaldo Carvalho de Melo's avatar
      tools: Sync cpufeatures.h and vmx.h with the kernel · bebfb730
      Arnaldo Carvalho de Melo authored
      There were changes related to the deprecation of the "pcommit"
      instruction:
      
        fd1d961d ("x86/insn: remove pcommit")
        dfa169bb ("Revert "KVM: x86: add pcommit support"")
      
      No need to update anything in the tools, as "pcommit" wasn't being
      listed on the VMX_EXIT_REASONS in the tools/perf/arch/x86/util/kvm-stat.c
      file.
      
      Just grab fresh copies of these files to silence the file cache
      coherency detector:
      
        $ make -C tools/perf O=/tmp/build/perf install-bin
        make: Entering directory '/home/acme/git/linux/tools/perf'
          BUILD:   Doing 'make -j4' parallel build
        Warning: tools/arch/x86/include/asm/cpufeatures.h differs from kernel
        Warning: tools/arch/x86/include/uapi/asm/vmx.h differs from kernel
          INSTALL  GTK UI
        <SNIP>
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
      Link: http://lkml.kernel.org/n/tip-07pmcc1ysydhyyxbmp1vt0l4@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      bebfb730
    • Naohiro Aota's avatar
      perf probe: Support signedness casting · 19f00b01
      Naohiro Aota authored
      The 'perf probe' tool detects a variable's type and use the detected
      type to add a new probe. Then, kprobes prints its variable in
      hexadecimal format if the variable is unsigned and prints in decimal if
      it is signed.
      
      We sometimes want to see unsigned variable in decimal format (i.e.
      sector_t or size_t). In that case, we need to investigate the variable's
      size manually to specify just signedness.
      
      This patch add signedness casting support. By specifying "s" or "u" as a
      type, perf-probe will investigate variable size as usual and use the
      specified signedness.
      
      E.g. without this:
      
        $ perf probe -a 'submit_bio bio->bi_iter.bi_sector'
        Added new event:
          probe:submit_bio     (on submit_bio with bi_sector=bio->bi_iter.bi_sector)
        You can now use it in all perf tools, such as:
                perf record -e probe:submit_bio -aR sleep 1
        $ cat trace_pipe|head
                dbench-9692  [003] d..1   971.096633: submit_bio: (submit_bio+0x0/0x140) bi_sector=0x3a3d00
                dbench-9692  [003] d..1   971.096685: submit_bio: (submit_bio+0x0/0x140) bi_sector=0x1a3d80
                dbench-9692  [003] d..1   971.096687: submit_bio: (submit_bio+0x0/0x140) bi_sector=0x3a3d80
      ...
        // need to investigate the variable size
        $ perf probe -a 'submit_bio bio->bi_iter.bi_sector:s64'
        Added new event:
          probe:submit_bio     (on submit_bio with bi_sector=bio->bi_iter.bi_sector:s64)
        You can now use it in all perf tools, such as:
              perf record -e probe:submit_bio -aR sleep 1
      
        With this:
      
        // just use "s" to cast its signedness
        $ perf probe -v -a 'submit_bio bio->bi_iter.bi_sector:s'
        Added new event:
          probe:submit_bio     (on submit_bio with bi_sector=bio->bi_iter.bi_sector:s)
        You can now use it in all perf tools, such as:
                perf record -e probe:submit_bio -aR sleep 1
        $ cat trace_pipe|head
                dbench-9689  [001] d..1  1212.391237: submit_bio: (submit_bio+0x0/0x140) bi_sector=128
                dbench-9689  [001] d..1  1212.391252: submit_bio: (submit_bio+0x0/0x140) bi_sector=131072
                dbench-9697  [006] d..1  1212.398611: submit_bio: (submit_bio+0x0/0x140) bi_sector=30208
      
        This commit also update perf-probe.txt to describe "types". Most parts
        are based on existing documentation: Documentation/trace/kprobetrace.txt
      
      Committer note:
      
      Testing using 'perf trace':
      
        # perf probe -a 'submit_bio bio->bi_iter.bi_sector'
        Added new event:
          probe:submit_bio     (on submit_bio with bi_sector=bio->bi_iter.bi_sector)
      
        You can now use it in all perf tools, such as:
      
      	perf record -e probe:submit_bio -aR sleep 1
      
        # trace --no-syscalls --ev probe:submit_bio
            0.000 probe:submit_bio:(ffffffffac3aee00) bi_sector=0xc133c0)
         3181.861 probe:submit_bio:(ffffffffac3aee00) bi_sector=0x6cffb8)
         3181.881 probe:submit_bio:(ffffffffac3aee00) bi_sector=0x6cffc0)
         3184.488 probe:submit_bio:(ffffffffac3aee00) bi_sector=0x6cffc8)
      <SNIP>
         4717.927 probe:submit_bio:(ffffffffac3aee00) bi_sector=0x4dc7a88)
         4717.970 probe:submit_bio:(ffffffffac3aee00) bi_sector=0x4dc7880)
        ^C[root@jouet ~]#
      
      Now, using this new feature:
      
      [root@jouet ~]# perf probe -a 'submit_bio bio->bi_iter.bi_sector:s'
      Added new event:
        probe:submit_bio     (on submit_bio with bi_sector=bio->bi_iter.bi_sector:s)
      
      You can now use it in all perf tools, such as:
      
      	perf record -e probe:submit_bio -aR sleep 1
      
        [root@jouet ~]# trace --no-syscalls --ev probe:submit_bio
           0.000 probe:submit_bio:(ffffffffac3aee00) bi_sector=7145704)
           0.017 probe:submit_bio:(ffffffffac3aee00) bi_sector=7145712)
           0.019 probe:submit_bio:(ffffffffac3aee00) bi_sector=7145720)
           2.567 probe:submit_bio:(ffffffffac3aee00) bi_sector=7145728)
        5631.919 probe:submit_bio:(ffffffffac3aee00) bi_sector=0)
        5631.941 probe:submit_bio:(ffffffffac3aee00) bi_sector=8)
        5631.945 probe:submit_bio:(ffffffffac3aee00) bi_sector=16)
        5631.948 probe:submit_bio:(ffffffffac3aee00) bi_sector=24)
        ^C#
      
      With callchains:
      
        # trace --no-syscalls --ev probe:submit_bio/max-stack=10/
           0.000 probe:submit_bio:(ffffffffac3aee00) bi_sector=50662544)
                                             submit_bio+0xa8200001 ([kernel.kallsyms])
                                             submit_bh+0xa8200013 ([kernel.kallsyms])
                                             jbd2_journal_commit_transaction+0xa8200691 ([kernel.kallsyms])
                                             kjournald2+0xa82000ca ([kernel.kallsyms])
                                             kthread+0xa82000d8 ([kernel.kallsyms])
                                             ret_from_fork+0xa820001f ([kernel.kallsyms])
           0.023 probe:submit_bio:(ffffffffac3aee00) bi_sector=50662552)
                                             submit_bio+0xa8200001 ([kernel.kallsyms])
                                             submit_bh+0xa8200013 ([kernel.kallsyms])
                                             jbd2_journal_commit_transaction+0xa8200691 ([kernel.kallsyms])
                                             kjournald2+0xa82000ca ([kernel.kallsyms])
                                             kthread+0xa82000d8 ([kernel.kallsyms])
                                             ret_from_fork+0xa820001f ([kernel.kallsyms])
           0.027 probe:submit_bio:(ffffffffac3aee00) bi_sector=50662560)
                                             submit_bio+0xa8200001 ([kernel.kallsyms])
                                             submit_bh+0xa8200013 ([kernel.kallsyms])
                                             jbd2_journal_commit_transaction+0xa8200691 ([kernel.kallsyms])
                                             kjournald2+0xa82000ca ([kernel.kallsyms])
                                             kthread+0xa82000d8 ([kernel.kallsyms])
                                             ret_from_fork+0xa820001f ([kernel.kallsyms])
           2.593 probe:submit_bio:(ffffffffac3aee00) bi_sector=50662568)
                                             submit_bio+0xa8200001 ([kernel.kallsyms])
                                             submit_bh+0xa8200013 ([kernel.kallsyms])
                                             journal_submit_commit_record+0xa82001ac ([kernel.kallsyms])
                                             jbd2_journal_commit_transaction+0xa82012e8 ([kernel.kallsyms])
                                             kjournald2+0xa82000ca ([kernel.kallsyms])
                                             kthread+0xa82000d8 ([kernel.kallsyms])
                                             ret_from_fork+0xa820001f ([kernel.kallsyms])
        ^C#
      Signed-off-by: default avatarNaohiro Aota <naohiro.aota@hgst.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Hemant Kumar <hemant@linux.vnet.ibm.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1470710408-23515-1-git-send-email-naohiro.aota@hgst.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      19f00b01
    • Mark Rutland's avatar
      perf stat: Avoid skew when reading events · 3df33eff
      Mark Rutland authored
      When we don't have a tracee (i.e. we're attaching to a task or CPU),
      counters can still be running after our workload finishes, and can still
      be running as we read their values. As we read events one-by-one, there
      can be arbitrary skew between values of events, even within a group.
      This means that ratios within an event group are not reliable.
      
      This skew can be seen if measuring a group of identical events, e.g:
      
        # perf stat -a -C0 -e '{cycles,cycles}' sleep 1
      
      To avoid this, we must stop groups from counting before we read the
      values of any constituent events. This patch adds and makes use of a new
      disable_counters() helper, which disables group leaders (and thus each
      group as a whole). This mirrors the use of enable_counters() for
      starting event groups in the absence of a tracee.
      
      Closing a group leader splits the group, and without a disabled group
      leader the newly split events will begin counting. Thus to ensure counts
      are reliable we must defer closing group leaders until all counts have
      been read. To do so this patch removes the event closing logic from the
      read_counters() helper, explicitly closes the events using
      perf_evlist__close(), which also aids legibility.
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1470747869-3567-1-git-send-email-mark.rutland@arm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      3df33eff
    • Konstantin Khlebnikov's avatar
      perf probe: Fix module name matching · cb3f3378
      Konstantin Khlebnikov authored
      If module is "module" then dso->short_name is "[module]".  Substring
      comparing is't enough: "raid10" matches to "[raid1]".  This patch also
      checks terminating zero in module name.
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Link: http://lkml.kernel.org/r/147039975648.715620.12985971832789032159.stgit@buzzSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      cb3f3378
    • Masami Hiramatsu's avatar
      perf probe: Adjust map->reloc offset when finding kernel symbol from map · 8e34189b
      Masami Hiramatsu authored
      Adjust map->reloc offset for the unmapped address when finding
      alternative symbol address from map, because KASLR can relocate the
      kernel symbol address.
      
      The same adjustment has been done when finding appropriate kernel symbol
      address from map which was introduced by commit f90acac7 ("perf
      probe: Find given address from offline dwarf")
      Reported-by: default avatarArnaldo Carvalho de Melo <acme@kernel.org>
      Signed-off-by: default avatarMasami Hiramatsu <masami.hiramatsu@linaro.org>
      Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/20160806192948.e366f3fbc4b194de600f8326@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      8e34189b
    • Arnaldo Carvalho de Melo's avatar
      perf hists: Trim libtraceevent trace_seq buffers · 887fa86d
      Arnaldo Carvalho de Melo authored
      When we use libtraceevent to format trace event fields into printable
      strings to use in hist entries it is important to trim it from the
      default 4 KiB it starts with to what is really used, to reduce the
      memory footprint, so use realloc(seq.buffer, seq.len + 1) when returning
      the seq.buffer formatted with the fields contents.
      Reported-and-Tested-by: default avatarWang Nan <wangnan0@huawei.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/n/tip-t3hl7uxmilrkigzmc90rlhk2@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      887fa86d
    • Brendan Gregg's avatar
      perf script: Add 'bpf-output' field to usage message · bcdc09af
      Brendan Gregg authored
      This adds the 'bpf-output' field to the perf script usage message, and docs.
      Signed-off-by: default avatarBrendan Gregg <bgregg@netflix.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1470192469-11910-4-git-send-email-bgregg@netflix.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      bcdc09af
  4. 08 Aug, 2016 6 commits
    • Linus Torvalds's avatar
      Merge tag 'lkdtm-v4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 81abf252
      Linus Torvalds authored
      Pull lkdtm update from Kees Cook:
       "Fix rebuild problem with LKDTM's rodata test"
      
      [ This, and the usercopy branch, both came in before the merge window
        closed, but ended up in my 'need to look more' queue and thus got
        merged only after rc1 was out ]
      
      * tag 'lkdtm-v4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        lkdtm: Fix targets for objcopy usage
        lkdtm: fix false positive warning from -Wmaybe-uninitialized
      81abf252
    • Linus Torvalds's avatar
      Merge tag 'usercopy-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 1eccfa09
      Linus Torvalds authored
      Pull usercopy protection from Kees Cook:
       "Tbhis implements HARDENED_USERCOPY verification of copy_to_user and
        copy_from_user bounds checking for most architectures on SLAB and
        SLUB"
      
      * tag 'usercopy-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        mm: SLUB hardened usercopy support
        mm: SLAB hardened usercopy support
        s390/uaccess: Enable hardened usercopy
        sparc/uaccess: Enable hardened usercopy
        powerpc/uaccess: Enable hardened usercopy
        ia64/uaccess: Enable hardened usercopy
        arm64/uaccess: Enable hardened usercopy
        ARM: uaccess: Enable hardened usercopy
        x86/uaccess: Enable hardened usercopy
        mm: Hardened usercopy
        mm: Implement stack frame object validation
        mm: Add is_migrate_cma_page
      1eccfa09
    • Linus Torvalds's avatar
      unsafe_[get|put]_user: change interface to use a error target label · 1bd4403d
      Linus Torvalds authored
      When I initially added the unsafe_[get|put]_user() helpers in commit
      5b24a7a2 ("Add 'unsafe' user access functions for batched
      accesses"), I made the mistake of modeling the interface on our
      traditional __[get|put]_user() functions, which return zero on success,
      or -EFAULT on failure.
      
      That interface is fairly easy to use, but it's actually fairly nasty for
      good code generation, since it essentially forces the caller to check
      the error value for each access.
      
      In particular, since the error handling is already internally
      implemented with an exception handler, and we already use "asm goto" for
      various other things, we could fairly easily make the error cases just
      jump directly to an error label instead, and avoid the need for explicit
      checking after each operation.
      
      So switch the interface to pass in an error label, rather than checking
      the error value in the caller.  Best do it now before we start growing
      more users (the signal handling code in particular would be a good place
      to use the new interface).
      
      So rather than
      
      	if (unsafe_get_user(x, ptr))
      		... handle error ..
      
      the interface is now
      
      	unsafe_get_user(x, ptr, label);
      
      where an error during the user mode fetch will now just cause a jump to
      'label' in the caller.
      
      Right now the actual _implementation_ of this all still ends up being a
      "if (err) goto label", and does not take advantage of any exception
      label tricks, but for "unsafe_put_user()" in particular it should be
      fairly straightforward to convert to using the exception table model.
      
      Note that "unsafe_get_user()" is much harder to convert to a clever
      exception table model, because current versions of gcc do not allow the
      use of "asm goto" (for the exception) with output values (for the actual
      value to be fetched).  But that is hopefully not a limitation in the
      long term.
      
      [ Also note that it might be a good idea to switch unsafe_get_user() to
        actually _return_ the value it fetches from user space, but this
        commit only changes the error handling semantics ]
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1bd4403d
    • Andreas Ziegler's avatar
      printk: Remove unnecessary #ifdef CONFIG_PRINTK · 574673c2
      Andreas Ziegler authored
      In commit 874f9c7d ("printk: create pr_<level> functions"), new
      pr_level defines were added to printk.c.
      
      These new defines are guarded by an #ifdef CONFIG_PRINTK - however,
      there is already a surrounding #ifdef CONFIG_PRINTK starting a lot
      earlier in line 249 which means the newly introduced #ifdef is
      unnecessary.
      
      Let's remove it to avoid confusion.
      Signed-off-by: default avatarAndreas Ziegler <andreas.ziegler@fau.de>
      Cc: Joe Perches <joe@perches.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      574673c2
    • Ville Syrjälä's avatar
      x86/hweight: Don't clobber %rdi · 65ea11ec
      Ville Syrjälä authored
      The caller expects %rdi to remain intact, push+pop it make that happen.
      
      Fixes the following kind of explosions on my core2duo machine when
      trying to reboot or shut down:
      
        general protection fault: 0000 [#1] PREEMPT SMP
        Modules linked in: i915 i2c_algo_bit drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea drm netconsole configfs binfmt_misc iTCO_wdt psmouse pcspkr snd_hda_codec_idt e100 coretemp hwmon snd_hda_codec_generic i2c_i801 mii i2c_smbus lpc_ich mfd_core snd_hda_intel uhci_hcd snd_hda_codec snd_hwdep snd_hda_core ehci_pci 8250 ehci_hcd snd_pcm 8250_base usbcore evdev serial_core usb_common parport_pc parport snd_timer snd soundcore
        CPU: 0 PID: 3070 Comm: reboot Not tainted 4.8.0-rc1-perf-dirty #69
        Hardware name:                  /D946GZIS, BIOS TS94610J.86A.0087.2007.1107.1049 11/07/2007
        task: ffff88012a0b4080 task.stack: ffff880123850000
        RIP: 0010:[<ffffffff81003c92>]  [<ffffffff81003c92>] x86_perf_event_update+0x52/0xc0
        RSP: 0018:ffff880123853b60  EFLAGS: 00010087
        RAX: 0000000000000001 RBX: ffff88012fc0a3c0 RCX: 000000000000001e
        RDX: 0000000000000000 RSI: 0000000040000000 RDI: ffff88012b014800
        RBP: ffff880123853b88 R08: ffffffffffffffff R09: 0000000000000000
        R10: ffffea0004a012c0 R11: ffffea0004acedc0 R12: ffffffff80000001
        R13: ffff88012b0149c0 R14: ffff88012b014800 R15: 0000000000000018
        FS:  00007f8b155cd700(0000) GS:ffff88012fc00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007f8b155f5000 CR3: 000000012a2d7000 CR4: 00000000000006f0
        Stack:
         ffff88012fc0a3c0 ffff88012b014800 0000000000000004 0000000000000001
         ffff88012fc1b750 ffff880123853bb0 ffffffff81003d59 ffff88012b014800
         ffff88012fc0a3c0 ffff88012b014800 ffff880123853bd8 ffffffff81003e13
        Call Trace:
         [<ffffffff81003d59>] x86_pmu_stop+0x59/0xd0
         [<ffffffff81003e13>] x86_pmu_del+0x43/0x140
         [<ffffffff8111705d>] event_sched_out.isra.105+0xbd/0x260
         [<ffffffff8111738d>] __perf_remove_from_context+0x2d/0xb0
         [<ffffffff8111745d>] __perf_event_exit_context+0x4d/0x70
         [<ffffffff810c8826>] generic_exec_single+0xb6/0x140
         [<ffffffff81117410>] ? __perf_remove_from_context+0xb0/0xb0
         [<ffffffff81117410>] ? __perf_remove_from_context+0xb0/0xb0
         [<ffffffff810c898f>] smp_call_function_single+0xdf/0x140
         [<ffffffff81113d27>] perf_event_exit_cpu_context+0x87/0xc0
         [<ffffffff81113d73>] perf_reboot+0x13/0x40
         [<ffffffff8107578a>] notifier_call_chain+0x4a/0x70
         [<ffffffff81075ad7>] __blocking_notifier_call_chain+0x47/0x60
         [<ffffffff81075b06>] blocking_notifier_call_chain+0x16/0x20
         [<ffffffff81076a1d>] kernel_restart_prepare+0x1d/0x40
         [<ffffffff81076ae2>] kernel_restart+0x12/0x60
         [<ffffffff81076d56>] SYSC_reboot+0xf6/0x1b0
         [<ffffffff811a823c>] ? mntput_no_expire+0x2c/0x1b0
         [<ffffffff811a83e4>] ? mntput+0x24/0x40
         [<ffffffff811894fc>] ? __fput+0x16c/0x1e0
         [<ffffffff811895ae>] ? ____fput+0xe/0x10
         [<ffffffff81072fc3>] ? task_work_run+0x83/0xa0
         [<ffffffff81001623>] ? exit_to_usermode_loop+0x53/0xc0
         [<ffffffff8100105a>] ? trace_hardirqs_on_thunk+0x1a/0x1c
         [<ffffffff81076e6e>] SyS_reboot+0xe/0x10
         [<ffffffff814c4ba5>] entry_SYSCALL_64_fastpath+0x18/0xa3
        Code: 7c 4c 8d af c0 01 00 00 49 89 fe eb 10 48 09 c2 4c 89 e0 49 0f b1 55 00 4c 39 e0 74 35 4d 8b a6 c0 01 00 00 41 8b 8e 60 01 00 00 <0f> 33 8b 35 6e 02 8c 00 48 c1 e2 20 85 f6 7e d2 48 89 d3 89 cf
        RIP  [<ffffffff81003c92>] x86_perf_event_update+0x52/0xc0
         RSP <ffff880123853b60>
        ---[ end trace 7ec95181faf211be ]---
        note: reboot[3070] exited with preempt_count 2
      
      Cc: Borislav Petkov <bp@suse.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Fixes: f5967101 ("x86/hweight: Get rid of the special calling convention")
      Signed-off-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      65ea11ec
    • Linus Torvalds's avatar
      Linux 4.8-rc1 · 29b4817d
      Linus Torvalds authored
      29b4817d
  5. 07 Aug, 2016 10 commits
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 857953d7
      Linus Torvalds authored
      Pull more block fixes from Jens Axboe:
       "As mentioned in the pull the other day, a few more fixes for this
        round, all related to the bio op changes in this series.
      
        Two fixes, and then a cleanup, renaming bio->bi_rw to bio->bi_opf.  I
        wanted to do that change right after or right before -rc1, so that
        risk of conflict was reduced.  I just rebased the series on top of
        current master, and no new ->bi_rw usage has snuck in"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        block: rename bio bi_rw to bi_opf
        target: iblock_execute_sync_cache() should use bio_set_op_attrs()
        mm: make __swap_writepage() use bio_set_op_attrs()
        block/mm: make bdev_ops->rw_page() take a bool for read/write
      857953d7
    • Linus Torvalds's avatar
      Merge tag 'drm-for-v4.8-zpos' of git://people.freedesktop.org/~airlied/linux · 635a4ba1
      Linus Torvalds authored
      Pull drm zpos property support from Dave Airlie:
       "This tree was waiting on some media stuff I hadn't had time to get a
        stable branchpoint off, so I just waited until it was all in your tree
        first.
      
        It's been around a bit on the list and shouldn't affect anything
        outside adding the generic API and moving some ARM drivers to using
        it"
      
      * tag 'drm-for-v4.8-zpos' of git://people.freedesktop.org/~airlied/linux:
        drm: rcar: use generic code for managing zpos plane property
        drm/exynos: use generic code for managing zpos plane property
        drm: sti: use generic zpos for plane
        drm: add generic zpos property
      635a4ba1
    • Jens Axboe's avatar
      block: rename bio bi_rw to bi_opf · 1eff9d32
      Jens Axboe authored
      Since commit 63a4cc24, bio->bi_rw contains flags in the lower
      portion and the op code in the higher portions. This means that
      old code that relies on manually setting bi_rw is most likely
      going to be broken. Instead of letting that brokeness linger,
      rename the member, to force old and out-of-tree code to break
      at compile time instead of at runtime.
      
      No intended functional changes in this commit.
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      1eff9d32
    • Jens Axboe's avatar
      target: iblock_execute_sync_cache() should use bio_set_op_attrs() · 31c64f78
      Jens Axboe authored
      The original commit missed this function, it needs to mark it a
      write flush.
      
      Cc: Mike Christie <mchristi@redhat.com>
      Fixes: e742fc32 ("target: use bio op accessors")
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      31c64f78
    • Jens Axboe's avatar
      mm: make __swap_writepage() use bio_set_op_attrs() · ba13e83e
      Jens Axboe authored
      Cleaner than manipulating bio->bi_rw flags directly.
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      ba13e83e
    • Jens Axboe's avatar
      block/mm: make bdev_ops->rw_page() take a bool for read/write · c11f0c0b
      Jens Axboe authored
      Commit abf54548 changed it from an 'rw' flags type to the
      newer ops based interface, but now we're effectively leaking
      some bdev internals to the rest of the kernel. Since we only
      care about whether it's a read or a write at that level, just
      pass in a bool 'is_write' parameter instead.
      
      Then we can also move op_is_write() and friends back under
      CONFIG_BLOCK protection.
      Reviewed-by: default avatarMike Christie <mchristi@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      c11f0c0b
    • Linus Torvalds's avatar
      Merge tag 'doc-4.8-fixes' of git://git.lwn.net/linux · 52ddb7e9
      Linus Torvalds authored
      Pull documentation fixes from Jonathan Corbet:
       "Three fixes for the docs build, including removing an annoying warning
        on 'make help' if sphinx isn't present"
      
      * tag 'doc-4.8-fixes' of git://git.lwn.net/linux:
        DocBook: use DOCBOOKS="" to ignore DocBooks instead of IGNORE_DOCBOOKS=1
        Documenation: update cgroup's document path
        Documentation/sphinx: do not warn about missing tools in 'make help'
      52ddb7e9
    • Linus Torvalds's avatar
      Merge tag 'binfmt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/binfmt_misc · e9d488c3
      Linus Torvalds authored
      Pull binfmt_misc update from James Bottomley:
       "This update is to allow architecture emulation containers to function
        such that the emulation binary can be housed outside the container
        itself.  The container and fs parts both have acks from relevant
        experts.
      
        To use the new feature you have to add an F option to your binfmt_misc
        configuration"
      
      From the docs:
       "The usual behaviour of binfmt_misc is to spawn the binary lazily when
        the misc format file is invoked.  However, this doesn't work very well
        in the face of mount namespaces and changeroots, so the F mode opens
        the binary as soon as the emulation is installed and uses the opened
        image to spawn the emulator, meaning it is always available once
        installed, regardless of how the environment changes"
      
      * tag 'binfmt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/binfmt_misc:
        binfmt_misc: add F option description to documentation
        binfmt_misc: add persistent opened binary handler for containers
        fs: add filp_clone_open API
      e9d488c3
    • Eryu Guan's avatar
      fs: return EPERM on immutable inode · 337684a1
      Eryu Guan authored
      In most cases, EPERM is returned on immutable inode, and there're only a
      few places returning EACCES. I noticed this when running LTP on
      overlayfs, setxattr03 failed due to unexpected EACCES on immutable
      inode.
      
      So converting all EACCES to EPERM on immutable inode.
      Acked-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarEryu Guan <guaneryu@gmail.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      337684a1
    • Linus Torvalds's avatar
      Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · fe64f328
      Linus Torvalds authored
      Pull more vfs updates from Al Viro:
       "Assorted cleanups and fixes.
      
        In the "trivial API change" department - ->d_compare() losing 'parent'
        argument"
      
      * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        cachefiles: Fix race between inactivating and culling a cache object
        9p: use clone_fid()
        9p: fix braino introduced in "9p: new helper - v9fs_parent_fid()"
        vfs: make dentry_needs_remove_privs() internal
        vfs: remove file_needs_remove_privs()
        vfs: fix deadlock in file_remove_privs() on overlayfs
        get rid of 'parent' argument of ->d_compare()
        cifs, msdos, vfat, hfs+: don't bother with parent in ->d_compare()
        affs ->d_compare(): don't bother with ->d_inode
        fold _d_rehash() and __d_rehash() together
        fold dentry_rcuwalk_invalidate() into its only remaining caller
      fe64f328
  6. 06 Aug, 2016 7 commits
    • Linus Torvalds's avatar
      Merge tag 'xfs-rmap-for-linus-4.8-rc1' of... · 0cbbc422
      Linus Torvalds authored
      Merge tag 'xfs-rmap-for-linus-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs
      
      Pull more xfs updates from Dave Chinner:
       "This is the second part of the XFS updates for this merge cycle, and
        contains the new reverse block mapping feature for XFS.
      
        Reverse mapping allows us to track the owner of a specific block on
        disk precisely.  It is implemented as a set of btrees (one per
        allocation group) that track the owners of allocated extents.
        Effectively it is a "used space tree" that is updated when we allocate
        or free extents.  i.e. it is coherent with the free space btrees we
        already maintain and never overlaps with them.
      
        This reverse mapping infrastructure is the building block of several
        upcoming features - reflink, copy-on-write data, dedupe, online
        metadata and data scrubbing, highly accurate bad sector/data loss
        reporting to users, and significantly improved reconstruction of
        damaged and corrupted filesystems.  There's a lot of new stuff coming
        along in the next couple of cycles,a nd it all builds in the rmap
        infrastructure.
      
        As such, it's a huge chunk of new code with new on-disk format
        features and internal infrastructure.  It warns at mount time as an
        experimental feature and that it may eat data (as we do with all new
        on-disk features until they stabilise).  We have not released
        userspace suport for it yet - userspace support currently requires
        download from Darrick's xfsprogs repo and build from source, so the
        access to this feature is really developer/tester only at this point.
        Initial userspace support will be released at the same time kernel
        with this code in it is released.
      
        The new rmap enabled code regresses 3 xfstests - all are ENOSPC
        related corner cases, one of which Darrick posted a fix for a few
        hours ago.  The other two are fixed by infrastructure that is part of
        the upcoming reflink patchset.  This new ENOSPC infrastructure
        requires a on-disk format tweak required to keep mount times in
        check - we need to keep an on-disk count of allocated rmapbt blocks so
        we don't have to scan the entire btrees at mount time to count them.
      
        This is currently being tested and will be part of the fixes sent in
        the next week or two so users will not be exposed to this change"
      
      * tag 'xfs-rmap-for-linus-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (52 commits)
        xfs: move (and rename) the deferred bmap-free tracepoints
        xfs: collapse single use static functions
        xfs: remove unnecessary parentheses from log redo item recovery functions
        xfs: remove the extents array from the rmap update done log item
        xfs: in btree_lshift, only allocate temporary cursor when needed
        xfs: remove unnecesary lshift/rshift key initialization
        xfs: remove the get*keys and update_keys btree ops pointers
        xfs: enable the rmap btree functionality
        xfs: don't update rmapbt when fixing agfl
        xfs: disable XFS_IOC_SWAPEXT when rmap btree is enabled
        xfs: add rmap btree block detection to log recovery
        xfs: add rmap btree geometry feature flag
        xfs: propagate bmap updates to rmapbt
        xfs: enable the xfs_defer mechanism to process rmaps to update
        xfs: log rmap intent items
        xfs: create rmap update intent log items
        xfs: add rmap btree insert and delete helpers
        xfs: convert unwritten status of reverse mappings
        xfs: remove an extent from the rmap btree
        xfs: add an extent to the rmap btree
        ...
      0cbbc422
    • Linus Torvalds's avatar
      Merge branch 'work.const-qstr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 835c92d4
      Linus Torvalds authored
      Pull qstr constification updates from Al Viro:
       "Fairly self-contained bunch - surprising lot of places passes struct
        qstr * as an argument when const struct qstr * would suffice; it
        complicates analysis for no good reason.
      
        I'd prefer to feed that separately from the assorted fixes (those are
        in #for-linus and with somewhat trickier topology)"
      
      * 'work.const-qstr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        qstr: constify instances in adfs
        qstr: constify instances in lustre
        qstr: constify instances in f2fs
        qstr: constify instances in ext2
        qstr: constify instances in vfat
        qstr: constify instances in procfs
        qstr: constify instances in fuse
        qstr constify instances in fs/dcache.c
        qstr: constify instances in nfs
        qstr: constify instances in ocfs2
        qstr: constify instances in autofs4
        qstr: constify instances in hfs
        qstr: constify instances in hfsplus
        qstr: constify instances in logfs
        qstr: constify dentry_init_security
      835c92d4
    • Linus Torvalds's avatar
      Merge tag 'media/v4.8-6' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · ce804bf5
      Linus Torvalds authored
      Pull mailcap fixlets from Mauro Carvalho Chehab:
       "A small fixup for my and Shuah's entries in .mailcap.
      
        Basically, those entries were with a syntax that makes
        get_maintainer.pl to do the wrong thing"
      
      * tag 'media/v4.8-6' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
        .mailmap: Correct entries for Mauro Carvalho Chehab and Shuah Khan
      ce804bf5
    • Linus Torvalds's avatar
      Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 0803e040
      Linus Torvalds authored
      Pull virtio/vhost updates from Michael Tsirkin:
      
       - new vsock device support in host and guest
      
       - platform IOMMU support in host and guest, including compatibility
         quirks for legacy systems.
      
       - misc fixes and cleanups.
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        VSOCK: Use kvfree()
        vhost: split out vringh Kconfig
        vhost: detect 32 bit integer wrap around
        vhost: new device IOTLB API
        vhost: drop vringh dependency
        vhost: convert pre sorted vhost memory array to interval tree
        vhost: introduce vhost memory accessors
        VSOCK: Add Makefile and Kconfig
        VSOCK: Introduce vhost_vsock.ko
        VSOCK: Introduce virtio_transport.ko
        VSOCK: Introduce virtio_vsock_common.ko
        VSOCK: defer sock removal to transports
        VSOCK: transport-specific vsock_transport functions
        vhost: drop vringh dependency
        vop: pull in vhost Kconfig
        virtio: new feature to detect IOMMU device quirk
        balloon: check the number of available pages in leak balloon
        vhost: lockless enqueuing
        vhost: simplify work flushing
      0803e040
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 80fac0f5
      Linus Torvalds authored
      Pull more KVM updates from Paolo Bonzini:
       - ARM bugfix and MSI injection support
       - x86 nested virt tweak and OOPS fix
       - Simplify pvclock code (vdso bits acked by Andy Lutomirski).
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        nvmx: mark ept single context invalidation as supported
        nvmx: remove comment about missing nested vpid support
        KVM: lapic: fix access preemption timer stuff even if kernel_irqchip=off
        KVM: documentation: fix KVM_CAP_X2APIC_API information
        x86: vdso: use __pvclock_read_cycles
        pvclock: introduce seqcount-like API
        arm64: KVM: Set cpsr before spsr on fault injection
        KVM: arm: vgic-irqfd: Workaround changing kvm_set_routing_entry prototype
        KVM: arm/arm64: Enable MSI routing
        KVM: arm/arm64: Enable irqchip routing
        KVM: Move kvm_setup_default/empty_irq_routing declaration in arch specific header
        KVM: irqchip: Convey devid to kvm_set_msi
        KVM: Add devid in kvm_kernel_irq_routing_entry
        KVM: api: Pass the devid in the msi routing entry
      80fac0f5
    • Linus Torvalds's avatar
      Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus · 4305f424
      Linus Torvalds authored
      Pull MIPS updates from Ralf Baechle:
       "This is the main pull request for MIPS for 4.8.  Also includes is a
        minor SSB cleanup as SSB code traditionally is merged through the MIPS
        tree:
      
        ATH25:
          - MIPS: Add default configuration for ath25
      
        Boot:
          - For zboot, copy appended dtb to the end of the kernel
          - store the appended dtb address in a variable
      
        BPF:
          - Fix off by one error in offset allocation
      
        Cobalt code:
          - Fix typos
      
        Core code:
          - debugfs_create_file returns NULL on error, so don't use IS_ERR for
            testing for errors.
          - Fix double locking issue in RM7000 S-cache code.  This would only
            affect RM7000 ARC systems on reboot.
          - Fix page table corruption on THP permission changes.
          - Use compat_sys_keyctl for 32 bit userspace on 64 bit kernels.
            David says, there are no compatibility issues raised by this fix.
          - Move some signal code around.
          - Rewrite r4k count/compare clockevent device registration such that
            min_delta_ticks/max_delta_ticks files are guaranteed to be
            initialized.
          - Only register r4k count/compare as clockevent device if we can
            assume the clock to be constant.
          - Fix MSA asm warnings in control reg accessors
          - uasm and tlbex fixes and tweaking.
          - Print segment physical address when EU=1.
          - Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFO.
          - CP: Allow booting by VP other than VP 0
          - Cache handling fixes and optimizations for r4k class caches
          - Add hotplug support for R6 processors
          - Cleanup hotplug bits in kconfig
          - traps: return correct si code for accessing nonmapped addresses
          - Remove cpu_has_safe_index_cacheops
      
        Lantiq:
          - Register IRQ handler for virtual IRQ number
          - Fix EIU interrupt loading code
          - Use the real EXIN count
          - Fix build error.
      
        Loongson 3:
          - Increase HPET_MIN_PROG_DELTA and decrease HPET_MIN_CYCLES
      
        Octeon:
          - Delete built-in DTB pruning code for D-Link DSR-1000N.
          - Clean up GPIO definitions in dlink_dsr-1000n.dts.
          - Add more LEDs to the DSR-100n DTS
          - Fix off by one in octeon_irq_gpio_map()
          - Typo fixes
          - Enable SATA by default in cavium_octeon_defconfig
          - Support readq/writeq()
          - Remove forced mappings of USB interrupts.
          - Ensure DMA descriptors are always in the low 4GB
          - Improve USB reset code for OCTEON II.
      
        Pistachio:
          - Add maintainers entry for pistachio SoC Support
          - Remove plat_setup_iocoherency
      
        Ralink:
          - Fix pwm UART in spis group pinmux.
      
        SSB:
          - Change bare unsigned to unsigned int to suit coding style
      
        Tools:
          - Fix reloc tool compiler warnings.
      
        Other:
          - Delete use of ARCH_WANT_OPTIONAL_GPIOLIB"
      
      * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (61 commits)
        MIPS: mm: Fix definition of R6 cache instruction
        MIPS: tools: Fix relocs tool compiler warnings
        MIPS: Cobalt: Fix typo
        MIPS: Octeon: Fix typo
        MIPS: Lantiq: Fix build failure
        MIPS: Use CPHYSADDR to implement mips32 __pa
        MIPS: Octeon: Dlink_dsr-1000n.dts: add more leds.
        MIPS: Octeon: Clean up GPIO definitions in dlink_dsr-1000n.dts.
        MIPS: Octeon: Delete built-in DTB pruning code for D-Link DSR-1000N.
        MIPS: store the appended dtb address in a variable
        MIPS: ZBOOT: copy appended dtb to the end of the kernel
        MIPS: ralink: fix spis group pinmux
        MIPS: Factor o32 specific code into signal_o32.c
        MIPS: non-exec stack & heap when non-exec PT_GNU_STACK is present
        MIPS: Use per-mm page to execute branch delay slot instructions
        MIPS: Modify error handling
        MIPS: c-r4k: Use SMP calls for CM indexed cache ops
        MIPS: c-r4k: Avoid small flush_icache_range SMP calls
        MIPS: c-r4k: Local flush_icache_range cache op override
        MIPS: c-r4k: Split r4k_flush_kernel_vmap_range()
        ...
      4305f424
    • Linus Torvalds's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · db826278
      Linus Torvalds authored
      Pull perf updates from Ingo Molnar:
       "Mostly tooling fixes and some late tooling updates, plus two perf
        related printk message fixes"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf tests bpf: Use SyS_epoll_wait alias
        perf tests: objdump output can contain multi byte chunks
        perf record: Add --sample-cpu option
        perf hists: Introduce output_resort_cb method
        perf tools: Move config/Makefile into Makefile.config
        perf tests: Add test for bitmap_scnprintf function
        tools lib: Add bitmap_and function
        tools lib: Add bitmap_scnprintf function
        tools lib: Add bitmap_alloc function
        tools lib traceevent: Ignore generated library files
        perf tools: Fix build failure on perl script context
        perf/core: Change log level for duration warning to KERN_INFO
        perf annotate: Plug filename string leak
        perf annotate: Introduce strerror for handling symbol__disassemble() errors
        perf annotate: Rename symbol__annotate() to symbol__disassemble()
        perf/x86: Modify error message in virtualized environment
        perf target: str_error_r() always returns the buffer it receives
        perf annotate: Use pipe + fork instead of popen
        perf evsel: Introduce constructor for cycles event
      db826278