1. 25 Nov, 2016 7 commits
    • Namhyung Kim's avatar
      perf sched timehist: Mark schedule function in callchains · cdeb01bf
      Namhyung Kim authored
      The sched_switch event always captured from the scheduler function.  So
      it'd be great omit them from the callchain.  This patch marks the
      functions to be omitted by later patch.
      
      Committer notes:
      
      Testing it:
      
      Before:
      
        [root@jouet experimental]# perf sched record -g ls
        Dockerfile  perf.data  x-mips64
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 1.355 MB perf.data (29 samples) ]
        [root@jouet experimental]# perf sched timehist
            time  cpu  task name         wait time sch delay run time
                       [tid/pid]             (msec) (msec) (msec)
        ----------- -----  ----------------- ------ ------ ------
        6.494998 [001] <idle>                0.000  0.000  0.000
        6.495027 [002] perf[519]             0.000  0.000  0.000 __schedule <- schedule <- schedule_hrtimeout_range_clock <- schedule_hrtimeou
        6.495096 [003] <idle>                0.000  0.000  0.000
        6.495100 [003] rcuos/0[9]            0.000  0.005  0.003 __schedule <- schedule <- rcu_nocb_kthread <- kthread <- ret_from_fork
        6.495113 [001] perf[520]             0.000  0.008  0.114 __schedule <- preempt_schedule_common <- _cond_resched <- wait_for_completion
        6.495121 [000] <idle>                0.000  0.000  0.000
        6.495129 [001] migration/1[17]       0.000  0.003  0.016 __schedule <- schedule <- smpboot_thread_fn <- kthread <- ret_from_fork
        6.496085 [002] <idle>                0.000  0.000  1.057
        6.496096 [002] kworker/u16:1[31169]  0.000  0.004  0.011 __schedule <- schedule <- worker_thread <- kthread <- ret_from_fork
        6.496096 [003] <idle>                0.003  0.000  0.996
        6.496169 [002] <idle>                0.011  0.000  0.072
        6.496171 [000] ls[520]               0.008  0.000  1.049 __schedule <- schedule <- do_exit <- do_group_exit <- [unknown]
        6.496172 [003] gnome-terminal-[4391] 0.000  0.003  0.076 __schedule <- schedule <- schedule_hrtimeout_range_clock <- schedule_hrtimeo
      
      After:
      
        [root@jouet experimental]# perf sched timehist
            time  cpu  task name         wait time sch delay run time
                       [tid/pid]            (msec)  (msec)  (msec)
        ----------- -----  ----------------- -----  -----  ------
        6.494998 [001] <idle>                0.000  0.000  0.000
        6.495027 [002] perf[519]             0.000  0.000  0.000 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_t
        6.495096 [003] <idle>                0.000  0.000  0.000
        6.495100 [003] rcuos/0[9]            0.000  0.005  0.003 rcu_nocb_kthread <- kthread <- ret_from_fork
        6.495113 [001] perf[520]             0.000  0.008  0.114 preempt_schedule_common <- _cond_resched <- wait_for_completion <- stop_one_c
        6.495121 [000] <idle>                0.000  0.000  0.000
        6.495129 [001] migration/1[17]       0.000  0.003  0.016 smpboot_thread_fn <- kthread <- ret_from_fork
        6.496085 [002] <idle>                0.000  0.000  1.057
        6.496096 [002] kworker/u16:1[31169]  0.000  0.004  0.011 worker_thread <- kthread <- ret_from_fork
        6.496096 [003] <idle>                0.003  0.000  0.996
        6.496169 [002] <idle>                0.011  0.000  0.072
        6.496171 [000] ls[520]               0.008  0.000  1.049 do_exit <- do_group_exit <- [unknown]
        6.496172 [003] gnome-terminal-[4391] 0.000  0.003  0.076 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_
        [root@jouet experimental]#
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/20161124011114.7102-1-namhyung@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      cdeb01bf
    • Namhyung Kim's avatar
      perf callchain: Add option to skip ignore symbol when printing callchains · 2d9bbf6e
      Namhyung Kim authored
      For tracepoint events, callchains always contain certain functions.
      Sometimes it'd be better to skip those functions as they have no value.
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/20161124011114.7102-2-namhyung@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      2d9bbf6e
    • Ravi Bangoria's avatar
      perf annotate: Initial PowerPC support · dbdebdc5
      Ravi Bangoria authored
      Support the PowerPC architecture using the ins_ops association
      method.
      
      Committer notes:
      
      Testing it with a perf.data file collected on a PowerPC machine and
      cross-annotated on a x86_64 workstation, using the associated vmlinux
      file:
      
      $ perf report -i perf.data.f22vm.powerdev --vmlinux vmlinux.powerpc
        .ktime_get  vmlinux.powerpc
              │      clrldi r9,r28,63
         8.57 │   ┌──bne    e0                   <- TUI cursor positioned here
              │54:│  lwsync
         2.86 │   │  std    r2,40(r1)
              │   │  ld     r9,144(r31)
              │   │  ld     r3,136(r31)
              │   │  ld     r30,184(r31)
              │   │  ld     r10,0(r9)
              │   │  mtctr  r10
              │   │  ld     r2,8(r9)
         8.57 │   │→ bctrl
              │   │  ld     r2,40(r1)
              │   │  ld     r10,160(r31)
              │   │  ld     r5,152(r31)
              │   │  lwz    r7,168(r31)
              │   │  ld     r9,176(r31)
         8.57 │   │  lwz    r6,172(r31)
              │   │  lwsync
         2.86 │   │  lwz    r8,128(r31)
              │   │  cmpw   cr7,r8,r28
         2.86 │   │↑ bne    48
              │   │  subf   r10,r10,r3
              │   │  mr     r3,r29
              │   │  and    r10,r10,r5
         2.86 │   │  mulld  r10,r10,r7
              │   │  add    r9,r10,r9
              │   │  srd    r9,r9,r6
              │   │  add    r9,r9,r30
              │   │  std    r9,0(r29)
              │   │  addi   r1,r1,144
              │   │  ld     r0,16(r1)
              │   │  ld     r28,-32(r1)
              │   │  ld     r29,-24(r1)
              │   │  ld     r30,-16(r1)
              │   │  mtlr   r0
              │   │  ld     r31,-8(r1)
              │   │← blr
         5.71 │e0:└─→mr     r1,r1
        11.43 │      mr     r2,r2
        11.43 │      lwz    r28,128(r31)
        Press 'h' for help on key bindings
      
        $ perf report -i perf.data.f22vm.powerdev --header-only
        # ========
        # captured on: Thu Nov 24 12:40:38 2016
        # hostname : pdev-f22-qemu
        # os release : 4.4.10-200.fc22.ppc64
        # perf version : 4.9.rc1.g6298ce
        # arch : ppc64
        # nrcpus online : 48
        # nrcpus avail : 48
        # cpudesc : POWER7 (architected), altivec supported
        # cpuid : 74,513
        # total memory : 4158976 kB
        # cmdline : /home/ravi/Workspace/linux/tools/perf/perf record -a
        # event : name = cycles:ppp, , size = 112, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|CPU|PERIOD, disabled = 1, inherit = 1, mmap = 1, comm = 1, freq = 1, task = 1, precise_ip = 3, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1
        # HEADER_CPU_TOPOLOGY info available, use -I to display
        # HEADER_NUMA_TOPOLOGY info available, use -I to display
        # pmu mappings: cpu = 4, software = 1, tracepoint = 2, breakpoint = 5
        # missing features: HEADER_TRACING_DATA HEADER_BRANCH_STACK HEADER_GROUP_DESC HEADER_AUXTRACE HEADER_STAT HEADER_CACHE
        # ========
        #
        $
      Signed-off-by: default avatarRavi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Signed-off-by: default avatarNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Link: http://lkml.kernel.org/n/tip-tbjnp40ddoxxl474uvhwi6g4@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      dbdebdc5
    • Arnaldo Carvalho de Melo's avatar
      perf annotate: Improve support for ARM · acc9bfb5
      Arnaldo Carvalho de Melo authored
      By using arch->init() to set up some regular expressions to associate
      ins_ops to ARM instructions, ditching that old table that has
      instructions not present on ARM.
      
      Take advantage of having an arch->init() to hide more arm specific stuff
      from the common code, like the objdump details.
      
      The regular expressions comes from a patch written by Kim Phillips.
      Reviewed-by: default avatarRavi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Chris Riyder <chris.ryder@arm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Pawel Moll <pawel.moll@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Taeung Song <treeze.taeung@gmail.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-77m7lufz9ajjimkrebtg5ead@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      acc9bfb5
    • Arnaldo Carvalho de Melo's avatar
      perf annotate: Allow arches to have a init routine and a priv area · 0781ea92
      Arnaldo Carvalho de Melo authored
      Arches like ARM will want to use regular expressions when deciding what
      instructions to associate with what ins_ops, provide infrastructure for
      that.
      Reviewed-by: default avatarRavi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Chris Riyder <chris.ryder@arm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Pawel Moll <pawel.moll@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Taeung Song <treeze.taeung@gmail.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-7dmnk9el2ipu3nxog092k9z5@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      0781ea92
    • Arnaldo Carvalho de Melo's avatar
      perf annotate: Introduce alternative method of keeping instructions table · 2a1ff812
      Arnaldo Carvalho de Melo authored
      Some arches may want to dynamically populate the table using regular
      expressions on the instruction names to associate them with a set of
      parsing/formatting/etc functions (struct ins_ops), so provide a fallback
      for when the ins__find() method fails.
      
      That fall back will be able to resize the arch->instructions, setting
      arch->nr_instructions appropriately, helper functions to associate an
      ins_ops to an instruction name, growing the arch->instructions if needed
      and resorting it are provided, all the arch specific callback needs to
      do is to decide if the missing instruction should be added to
      arch->instructions with a ins_ops association.
      Reviewed-by: default avatarRavi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Chris Riyder <chris.ryder@arm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Pawel Moll <pawel.moll@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Taeung Song <treeze.taeung@gmail.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-auu13yradxf7g5dgtpnzt97a@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      2a1ff812
    • Arnaldo Carvalho de Melo's avatar
      perf annotate: Remove duplicate 'name' field from disasm_line · 75b49202
      Arnaldo Carvalho de Melo authored
      The disasm_line::name field is always equal to ins::name, being used
      just to locate the instruction's ins_ops from the per-arch instructions
      table.
      
      Eliminate this duplication, nuking that field and instead make
      ins__find() return an ins_ops, store it in disasm_line::ins.ops, and
      keep just in disasm_line::ins.name what was in disasm_line::name, this
      way we end up not keeping a reference to entries in the per-arch
      instructions table.
      
      This in turn will help supporting multiple ways to manage the per-arch
      instructions table, allowing resorting that array, for instance, when
      the entries will move after references to its addresses were made. The
      same problem is avoided when one grows the array with realloc.
      
      So architectures simply keeping a constant array will work as well as
      architectures building the table using regular expressions or other
      logic that involves resorting the table.
      Reviewed-by: default avatarRavi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Chris Riyder <chris.ryder@arm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Pawel Moll <pawel.moll@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Taeung Song <treeze.taeung@gmail.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-vr899azvabnw9gtuepuqfd9t@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      75b49202
  2. 24 Nov, 2016 2 commits
    • Ingo Molnar's avatar
      Merge tag 'perf-core-for-mingo-20161123' of... · 47414424
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo-20161123' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      New tool:
      
      - 'perf sched timehist' provides an analysis of scheduling events.
      
        Example usage:
            perf sched record -- sleep 1
            perf sched timehist
      
        By default it shows the individual schedule events, including the wait
        time (time between sched-out and next sched-in events for the task), the
        task scheduling delay (time between wakeup and actually running) and run
        time for the task:
      
              time    cpu  task name         wait time  sch delay  run time
                           [tid/pid]            (msec)     (msec)    (msec)
          -------- ------  ----------------  ---------  ---------  --------
          1.874569 [0011]  gcc[31949]            0.014      0.000     1.148
          1.874591 [0010]  gcc[31951]            0.000      0.000     0.024
          1.874603 [0010]  migration/10[59]      3.350      0.004     0.011
          1.874604 [0011]  <idle>                1.148      0.000     0.035
          1.874723 [0005]  <idle>                0.016      0.000     1.383
          1.874746 [0005]  gcc[31949]            0.153      0.078     0.022
        ...
      
        Times are in msec.usec. (David Ahern, Namhyung Kim)
      
      Improvements:
      
      - Make 'perf c2c report' support -f/--force, to allow skipping the
        ownership check for root users, for instance, just like the other
        tools (Jiri Olsa)
      
      - Allow sorting cachelines by total number of HITMs, in addition to
        local and remote numbers (Jiri Olsa)
      
      Fixes:
      
      - Make sure errors aren't suppressed by the TUI reset at the end of
        a 'perf c2c report' session (Jiri Olsa)
      
      Infrastructure changes:
      
      - Initial work on having the annotate code better support multiple
        architectures, including the ability to cross-annotate, i.e. to
        annotate perf.data files collected on an ARM system on a x86_64
        workstation (Arnaldo Carvalho de Melo, Ravi Bangoria, Kim Phillips)
      
      - Use USECS_PER_SEC instead of hard coded number in libtraceevent (Steven Rostedt)
      
      - Add retrieval of preempt count and latency flags in libtraceevent (Steven Rostedt)
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      47414424
    • Ingo Molnar's avatar
      69e6cdd0
  3. 23 Nov, 2016 20 commits
  4. 22 Nov, 2016 11 commits
    • Linus Torvalds's avatar
      Merge branch 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux · 23400ac9
      Linus Torvalds authored
      Pull thermal management fix from Zhang Rui:
       "We only have one urgent fix this time.
      
        Commit 3105f234 ("thermal/powerclamp: correct cpu support check"),
        which is shipped in 4.9-rc3, fixed a problem introduced by commit
        b721ca0d ("thermal/powerclamp: remove cpu whitelist").
      
        But unfortunately, it broke intel_powerclamp driver module auto-
        loading at the same time. Thus we need this change to add back module
        auto-loading for 4.9"
      
      * 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
        thermal/powerclamp: add back module device table
      23400ac9
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · b66c08ba
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Two small fixes.
      
        One prevents timeouts on mpt3sas when trying to use the secure erase
        protocol which causes the erase protocol to be aborted. The second is
        a regression in a prior fix which causes all commands to abort during
        PCI extended error recovery, which is incorrect because PCI EEH is
        independent from what's happening on the FC transport"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: qla2xxx: do not abort all commands in the adapter during EEH recovery
        scsi: mpt3sas: Fix secure erase premature termination
      b66c08ba
    • Linus Torvalds's avatar
      Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 57527ed1
      Linus Torvalds authored
      Pull clk fixes from Stephen Boyd:
       "A handful of driver fixes.
      
        The sunxi fixes are for an incorrect clk tree configuration and a bad
        frequency calculation. The other two are fixes for passing the wrong
        pointer in drivers recently converted to clk_hw style registration"
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        clk: efm32gg: Pass correct type to hw provider registration
        clk: berlin: Pass correct type to hw provider registration
        clk: sunxi: Fix M factor computation for APB1
        clk: sunxi-ng: sun6i-a31: Force AHB1 clock to use PLL6 as parent
      57527ed1
    • Arnd Bergmann's avatar
      NFSv4.x: hide array-bounds warning · d55b352b
      Arnd Bergmann authored
      A correct bugfix introduced a harmless warning that shows up with gcc-7:
      
      fs/nfs/callback.c: In function 'nfs_callback_up':
      fs/nfs/callback.c:214:14: error: array subscript is outside array bounds [-Werror=array-bounds]
      
      What happens here is that the 'minorversion == 0' check tells the
      compiler that we assume minorversion can be something other than 0,
      but when CONFIG_NFS_V4_1 is disabled that would be invalid and
      result in an out-of-bounds access.
      
      The added check for IS_ENABLED(CONFIG_NFS_V4_1) tells gcc that this
      really can't happen, which makes the code slightly smaller and also
      avoids the warning.
      
      The bugfix that introduced the warning is marked for stable backports,
      we want this one backported to the same releases.
      
      Fixes: 98b0f80c ("NFSv4.x: Fix a refcount leak in nfs_callback_up_net")
      Cc: stable@vger.kernel.org # v3.7+
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
      d55b352b
    • Linus Torvalds's avatar
      Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 000b8949
      Linus Torvalds authored
      Pull scheduler fixes from Ingo Molnar:
       "Two fixes for autogroup scheduling, for races when turning the feature
        on/off via /proc/sys/kernel/sched_autogroup_enabled"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/autogroup: Do not use autogroup->tg in zombie threads
        sched/autogroup: Fix autogroup_move_group() to never skip sched_move_task()
      000b8949
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7cfc4317
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "Misc fixes:
         - two fixes to make (very) old Intel CPUs boot reliably
         - fix the intel-mid driver and rename it
         - two KASAN false positive fixes
         - an FPU fix
         - two sysfb fixes
         - two build fixes related to new toolchain versions"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/platform/intel-mid: Rename platform_wdt to platform_mrfld_wdt
        x86/build: Build compressed x86 kernels as PIE when !CONFIG_RELOCATABLE as well
        x86/platform/intel-mid: Register watchdog device after SCU
        x86/fpu: Fix invalid FPU ptrace state after execve()
        x86/boot: Fail the boot if !M486 and CPUID is missing
        x86/traps: Ignore high word of regs->cs in early_fixup_exception()
        x86/dumpstack: Prevent KASAN false positive warnings
        x86/unwind: Prevent KASAN false positive warnings in guess unwinder
        x86/boot: Avoid warning for zero-filling .bss
        x86/sysfb: Fix lfb_size calculation
        x86/sysfb: Add support for 64bit EFI lfb_base
      7cfc4317
    • Peter Zijlstra's avatar
      perf/x86/intel/uncore: Allow only a single PMU/box within an events group · 033ac60c
      Peter Zijlstra authored
      Group validation expects all events to be of the same PMU; however
      is_uncore_pmu() is too wide, it matches _all_ uncore events, even
      across PMUs.
      
      This triggers failure when we group different events from different
      uncore PMUs, like:
      
        perf stat -vv -e '{uncore_cbox_0/config=0x0334/,uncore_qpi_0/event=1/}' -a sleep 1
      
      Fix is_uncore_pmu() by only matching events to the box at hand.
      
      Note that generic code; ran after this step; will disallow this
      mixture of PMU events.
      Reported-by: default avatarJiri Olsa <jolsa@redhat.com>
      Tested-by: default avatarJiri Olsa <jolsa@redhat.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vince@deater.net>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/20161118125354.GQ3117@twins.programming.kicks-ass.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      033ac60c
    • Peter Zijlstra's avatar
      perf/x86/intel: Cure bogus unwind from PEBS entries · b8000586
      Peter Zijlstra authored
      Vince Weaver reported that perf_fuzzer + KASAN detects that PEBS event
      unwinds sometimes do 'weird' things. In particular, we seemed to be
      ending up unwinding from random places on the NMI stack.
      
      While it was somewhat expected that the event record BP,SP would not
      match the interrupt BP,SP in that the interrupt is strictly later than
      the record event, it was overlooked that it could be on an already
      overwritten stack.
      
      Therefore, don't copy the recorded BP,SP over the interrupted BP,SP
      when we need stack unwinds.
      
      Note that its still possible the unwind doesn't full match the actual
      event, as its entirely possible to have done an (I)RET between record
      and interrupt, but on average it should still point in the general
      direction of where the event came from. Also, it's the best we can do,
      considering.
      
      The particular scenario that triggered the bogus NMI stack unwind was
      a PEBS event with very short period, upon enabling the event at the
      tail of the PMI handler (FREEZE_ON_PMI is not used), it instantly
      triggers a record (while still on the NMI stack) which in turn
      triggers the next PMI. This then causes back-to-back NMIs and we'll
      try and unwind the stack-frame from the last NMI, which obviously is
      now overwritten by our own.
      Analyzed-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Reported-by: default avatarVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davej@codemonkey.org.uk <davej@codemonkey.org.uk>
      Cc: dvyukov@google.com <dvyukov@google.com>
      Cc: stable@vger.kernel.org
      Fixes: ca037701 ("perf, x86: Add PEBS infrastructure")
      Link: http://lkml.kernel.org/r/20161117171731.GV3157@twins.programming.kicks-ass.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      b8000586
    • Johannes Weiner's avatar
      perf/x86: Restore TASK_SIZE check on frame pointer · ae31fe51
      Johannes Weiner authored
      The following commit:
      
        75925e1a ("perf/x86: Optimize stack walk user accesses")
      
      ... switched from copy_from_user_nmi() to __copy_from_user_nmi() with a manual
      access_ok() check.
      
      Unfortunately, copy_from_user_nmi() does an explicit check against TASK_SIZE,
      whereas the access_ok() uses whatever the current address limit of the task is.
      
      We are getting NMIs when __probe_kernel_read() has switched to KERNEL_DS, and
      then see vmalloc faults when we access what looks like pointers into vmalloc
      space:
      
        [] WARNING: CPU: 3 PID: 3685731 at arch/x86/mm/fault.c:435 vmalloc_fault+0x289/0x290
        [] CPU: 3 PID: 3685731 Comm: sh Tainted: G        W       4.6.0-5_fbk1_223_gdbf0f40 #1
        [] Call Trace:
        []  <NMI>  [<ffffffff814717d1>] dump_stack+0x4d/0x6c
        []  [<ffffffff81076e43>] __warn+0xd3/0xf0
        []  [<ffffffff81076f2d>] warn_slowpath_null+0x1d/0x20
        []  [<ffffffff8104a899>] vmalloc_fault+0x289/0x290
        []  [<ffffffff8104b5a0>] __do_page_fault+0x330/0x490
        []  [<ffffffff8104b70c>] do_page_fault+0xc/0x10
        []  [<ffffffff81794e82>] page_fault+0x22/0x30
        []  [<ffffffff81006280>] ? perf_callchain_user+0x100/0x2a0
        []  [<ffffffff8115124f>] get_perf_callchain+0x17f/0x190
        []  [<ffffffff811512c7>] perf_callchain+0x67/0x80
        []  [<ffffffff8114e750>] perf_prepare_sample+0x2a0/0x370
        []  [<ffffffff8114e840>] perf_event_output+0x20/0x60
        []  [<ffffffff8114aee7>] ? perf_event_update_userpage+0xc7/0x130
        []  [<ffffffff8114ea01>] __perf_event_overflow+0x181/0x1d0
        []  [<ffffffff8114f484>] perf_event_overflow+0x14/0x20
        []  [<ffffffff8100a6e3>] intel_pmu_handle_irq+0x1d3/0x490
        []  [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10
        []  [<ffffffff81197191>] ? vunmap_page_range+0x1a1/0x2f0
        []  [<ffffffff811972f1>] ? unmap_kernel_range_noflush+0x11/0x20
        []  [<ffffffff814f2056>] ? ghes_copy_tofrom_phys+0x116/0x1f0
        []  [<ffffffff81040d1d>] ? x2apic_send_IPI_self+0x1d/0x20
        []  [<ffffffff8100411d>] perf_event_nmi_handler+0x2d/0x50
        []  [<ffffffff8101ea31>] nmi_handle+0x61/0x110
        []  [<ffffffff8101ef94>] default_do_nmi+0x44/0x110
        []  [<ffffffff8101f13b>] do_nmi+0xdb/0x150
        []  [<ffffffff81795187>] end_repeat_nmi+0x1a/0x1e
        []  [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10
        []  [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10
        []  [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10
        []  <<EOE>>  <IRQ>  [<ffffffff8115d05e>] ? __probe_kernel_read+0x3e/0xa0
      
      Fix this by moving the valid_user_frame() check to before the uaccess
      that loads the return address and the pointer to the next frame.
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Fixes: 75925e1a ("perf/x86: Optimize stack walk user accesses")
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      ae31fe51
    • Oleg Nesterov's avatar
      sched/autogroup: Do not use autogroup->tg in zombie threads · 8e5bfa8c
      Oleg Nesterov authored
      Exactly because for_each_thread() in autogroup_move_group() can't see it
      and update its ->sched_task_group before _put() and possibly free().
      
      So the exiting task needs another sched_move_task() before exit_notify()
      and we need to re-introduce the PF_EXITING (or similar) check removed by
      the previous change for another reason.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: hartsjc@redhat.com
      Cc: vbendel@redhat.com
      Cc: vlovejoy@redhat.com
      Link: http://lkml.kernel.org/r/20161114184612.GA15968@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      8e5bfa8c
    • Oleg Nesterov's avatar
      sched/autogroup: Fix autogroup_move_group() to never skip sched_move_task() · 18f649ef
      Oleg Nesterov authored
      The PF_EXITING check in task_wants_autogroup() is no longer needed. Remove
      it, but see the next patch.
      
      However the comment is correct in that autogroup_move_group() must always
      change task_group() for every thread so the sysctl_ check is very wrong;
      we can race with cgroups and even sys_setsid() is not safe because a task
      running with task_group() == ag->tg must participate in refcounting:
      
      	int main(void)
      	{
      		int sctl = open("/proc/sys/kernel/sched_autogroup_enabled", O_WRONLY);
      
      		assert(sctl > 0);
      		if (fork()) {
      			wait(NULL); // destroy the child's ag/tg
      			pause();
      		}
      
      		assert(pwrite(sctl, "1\n", 2, 0) == 2);
      		assert(setsid() > 0);
      		if (fork())
      			pause();
      
      		kill(getppid(), SIGKILL);
      		sleep(1);
      
      		// The child has gone, the grandchild runs with kref == 1
      		assert(pwrite(sctl, "0\n", 2, 0) == 2);
      		assert(setsid() > 0);
      
      		// runs with the freed ag/tg
      		for (;;)
      			sleep(1);
      
      		return 0;
      	}
      
      crashes the kernel. It doesn't really need sleep(1), it doesn't matter if
      autogroup_move_group() actually frees the task_group or this happens later.
      Reported-by: default avatarVern Lovejoy <vlovejoy@redhat.com>
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: hartsjc@redhat.com
      Cc: vbendel@redhat.com
      Link: http://lkml.kernel.org/r/20161114184609.GA15965@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      18f649ef