1. 13 May, 2016 15 commits
  2. 12 May, 2016 25 commits
    • mm: thp: calculate the mapcount correctly for THP pages during WP faults · 6d0a07ed
      Andrea Arcangeli authored
      This provides full accuracy for the mapcount calculation in
      write-protect faults, so page pinning will not get broken by false
      positive copy-on-writes.
      
      total_mapcount() isn't the right calculation for
      reuse_swap_page(), so this introduces page_trans_huge_mapcount(),
      which is effectively the fully accurate return value for page_mapcount()
      when dealing with Transparent Hugepages. However, we only use
      page_trans_huge_mapcount() during COW faults, where it is strictly needed,
      due to its higher runtime cost.
      
      This also provides, at practically zero cost, the total_mapcount
      information which is needed to know if we can still relocate the page
      anon_vma to the local vma. If page_trans_huge_mapcount() returns 1 we
      can reuse the page no matter if it's a pte or a pmd_trans_huge
      triggering the fault, but we can only relocate the page anon_vma to
      the local vma->anon_vma if we're sure it's only this "vma" mapping the
      whole THP physical range.
      
      Kirill A. Shutemov discovered the problem with moving the page
      anon_vma to the local vma->anon_vma in a previous version of this
      patch and another problem in the way page_move_anon_rmap() was called.
      
      Andrew Morton discovered that CONFIG_SWAP=n wouldn't build in a
      previous version, because reuse_swap_page must be a macro to call
      page_trans_huge_mapcount from swap.h, so this uses a macro again
      instead of an inline function. With this change at least it's a less
      dangerous usage than it was before, because "page" is used only once
      now, while with the previous code reuse_swap_page(page++) would have
      called page_mapcount on page+1 and it would have increased page twice
      instead of just once.
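
      The double-evaluation hazard described above can be reproduced in
      plain C. This is only a sketch: fake_mapcount() and both macros are
      hypothetical stand-ins, not the kernel's reuse_swap_page() or
      page_mapcount().

```c
#include <stddef.h>

/* Hypothetical stand-in for a per-page mapcount lookup (not kernel code). */
static int fake_mapcount(const int *page)
{
	return *page;
}

/* "page" appears twice, so an argument with side effects runs twice. */
#define REUSE_EVAL_TWICE(page) (fake_mapcount(page) == 1 && *(page) == 1)

/* "page" appears once, so page++ advances the pointer exactly once. */
#define REUSE_EVAL_ONCE(page) (fake_mapcount(page) == 1)

/* Returns how many elements the pointer advanced after one macro call. */
ptrdiff_t steps_with_double_use(void)
{
	int pages[3] = { 1, 1, 1 };
	int *p = pages;

	(void)REUSE_EVAL_TWICE(p++);
	return p - pages;
}

ptrdiff_t steps_with_single_use(void)
{
	int pages[3] = { 1, 1, 1 };
	int *p = pages;

	(void)REUSE_EVAL_ONCE(p++);
	return p - pages;
}
```

      With the macro that names its argument twice the pointer moves two
      elements and the second check reads page+1; with the single-use
      macro it moves one.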
      
      Dean Luick noticed an uninitialized variable that could result in a
      rmap inefficiency for the non-THP case in a previous version.
      
      Mike Marciniszyn said:
      
      : Our RDMA tests are seeing an issue with memory locking that bisects to
      : commit 61f5d698 ("mm: re-enable THP")
      :
      : The test program registers two rather large MRs (512M) and RDMA
      : writes data to a passive peer using the first and RDMA reads it back
      : into the second MR and compares that data.  The sizes are chosen randomly
      : between 0 and 1024 bytes.
      :
      : The test will get through a few (<= 4 iterations) and then gets a
      : compare error.
      :
      : Tracing indicates the kernel logical addresses associated with the individual
      : pages at registration ARE correct, and the data in the "RDMA read response only"
      : packets ARE correct.
      :
      : The "corruption" occurs when the packet crosses two pages that are not physically
      : contiguous.  The second page reads back as zero in the program.
      :
      : It looks like the user VA at the point of the compare error no longer points to
      : the same physical address as was registered.
      :
      : This patch totally resolves the issue!
      
      Link: http://lkml.kernel.org/r/1462547040-1737-2-git-send-email-aarcange@redhat.com
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: "Kirill A. Shutemov" <kirill@shutemov.name>
      Reviewed-by: Dean Luick <dean.luick@intel.com>
      Tested-by: Alex Williamson <alex.williamson@redhat.com>
      Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Tested-by: Josh Collier <josh.d.collier@intel.com>
      Cc: Marc Haber <mh+linux-kernel@zugschlus.de>
      Cc: <stable@vger.kernel.org>	[4.5]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ksm: fix conflict between mmput and scan_get_next_rmap_item · 7496fea9
      Zhou Chengming authored
      This fixes a concurrency issue in KSM's scan_get_next_rmap_item().
      
      task A (ksmd):				|task B (the mm's task):
      					|
      mm = slot->mm;				|
      down_read(&mm->mmap_sem);		|
      					|
      ...					|
      					|
      spin_lock(&ksm_mmlist_lock);		|
      					|
      ksm_scan.mm_slot go to the next slot;	|
      					|
      spin_unlock(&ksm_mmlist_lock);		|
      					|mmput() ->
      					|	ksm_exit():
      					|
      					|spin_lock(&ksm_mmlist_lock);
      					|if (mm_slot && ksm_scan.mm_slot != mm_slot) {
      					|	if (!mm_slot->rmap_list) {
      					|		easy_to_free = 1;
      					|		...
      					|
      					|if (easy_to_free) {
      					|	mmdrop(mm);
      					|	...
      					|
      					|So this mm_struct may be freed in the mmput().
      					|
      up_read(&mm->mmap_sem);			|
      
      As we can see above, the ksmd thread may access an mm_struct that has
      already been freed to the kmem_cache.  Suppose a fork gets this mm_struct
      from the kmem_cache; when the ksmd thread then calls
      up_read(&mm->mmap_sem), it will cause mmap_sem.count to become -1.
      
      As suggested by Andrea Arcangeli, unmerge_and_remove_all_rmap_items has
      the same SMP race condition, so fix it too.  My previous fix in
      scan_get_next_rmap_item would introduce a different SMP race condition, so
      just invert the up_read/spin_unlock order as Andrea Arcangeli said.
      
      Link: http://lkml.kernel.org/r/1462708815-31301-1-git-send-email-zhouchengming1@huawei.com
      Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com>
      Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Geliang Tang <geliangtang@163.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Hanjun Guo <guohanjun@huawei.com>
      Cc: Ding Tianhong <dingtianhong@huawei.com>
      Cc: Li Bin <huawei.libin@huawei.com>
      Cc: Zhen Lei <thunder.leizhen@huawei.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: fix posix_acl_create deadlock · c25a1e06
      Junxiao Bi authored
      Commit 702e5bc6 ("ocfs2: use generic posix ACL infrastructure")
      refactored code to use posix_acl_create.  The problem with this function
      is that it is not mindful of the cluster wide inode lock making it
      unsuitable for use with ocfs2 inode creation with ACLs.  For example,
      when used in ocfs2_mknod, this function can cause deadlock as follows.
      The parent dir inode lock is taken when calling posix_acl_create ->
      get_acl -> ocfs2_iop_get_acl which takes the inode lock again.  This can
      cause deadlock if there is a blocked remote lock request waiting for the
      lock to be downconverted.  The same deadlock happens in ocfs2_reflink.
      The fix is to revert to using ocfs2_init_acl.
      
      Fixes: 702e5bc6 ("ocfs2: use generic posix ACL infrastructure")
      Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com>
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang · 5ee0fbd5
      Junxiao Bi authored
      Commit 743b5f14 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
      introduced this issue.  ocfs2_setattr called by chmod command holds
      cluster wide inode lock when calling posix_acl_chmod.  This latter
      function in turn calls ocfs2_iop_get_acl and ocfs2_iop_set_acl.  These
      two are also called directly from vfs layer for getfacl/setfacl commands
      and therefore acquire the cluster wide inode lock.  If a remote
      conversion request comes after the first inode lock in ocfs2_setattr,
      OCFS2_LOCK_BLOCKED will be set.  This will cause the second call to
      inode lock from ocfs2_iop_get_acl() to block indefinitely.
      
      The deleted version of ocfs2_acl_chmod() calls __posix_acl_chmod() which
      does not call back into the filesystem.  Therefore, we restore
      ocfs2_acl_chmod(), modify it slightly for locking as needed, and use that
      instead.
      
      Fixes: 743b5f14 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
      Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com>
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'keys-fixes-20160512' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 02c9c0e9
      Linus Torvalds authored
      Pull keyring fix from David Howells:
       "Fix ASN.1 indefinite length object parsing"
      
      * tag 'keys-fixes-20160512' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
        KEYS: Fix ASN.1 indefinite length object parsing
    • Merge tag 'sound-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · e5ad8b6d
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "This is a pretty boring pull request as you wish: including a few
        small and trivial HD-audio and USB-audio quirks and a couple of small
        regression fixes in HD-audio"
      
      * tag 'sound-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: usb-audio: Yet another Phoneix Audio device quirk
        ALSA: hda - Fix regression on ATI HDMI audio
        ALSA: hda - Fix subwoofer pin on ASUS N751 and N551
        ALSA: hda - Fix broken reconfig
        ALSA: hda - Fix white noise on Asus UX501VW headset
        ALSA: usb-audio: Quirk for yet another Phoenix Audio devices (v2)
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · ed1e33dd
      Linus Torvalds authored
      Pull input subsystem fixes from Dmitry Torokhov.
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: twl6040-vibra - fix DT node memory management
        Input: max8997-haptic - fix NULL pointer dereference
        Input: byd - update copyright header
    • perf stat: Fallback to user only counters when perf_event_paranoid > 1 · 42ef8a78
      Arnaldo Carvalho de Melo authored
      After 0161028b ("perf/core: Change the default paranoia level to 2")
      'perf stat' fails for users without CAP_SYS_ADMIN, so just use
      'perf_evsel__fallback()' to have the same behaviour as 'perf record',
      i.e. set perf_event_attr.exclude_kernel to 1.
      
      Now:
      
        [acme@jouet linux]$ perf stat usleep 1
      
         Performance counter stats for 'usleep 1':
      
                0.352536      task-clock:u (msec)  #   0.423 CPUs utilized
                       0      context-switches:u   #   0.000 K/sec
                       0      cpu-migrations:u     #   0.000 K/sec
                      49      page-faults:u        #   0.139 M/sec
                 309,407      cycles:u             #   0.878 GHz
                 243,791      instructions:u       #   0.79  insn per cycle
                  49,622      branches:u           # 140.757 M/sec
                   3,884      branch-misses:u      #   7.83% of all branches
      
             0.000834174 seconds time elapsed
      
        [acme@jouet linux]$
      Reported-by: Ingo Molnar <mingo@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-b20jmx4dxt5hpaa9t2rroi0o@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf evsel: Handle EACCESS + perf_event_paranoid=2 in fallback() · 08094828
      Arnaldo Carvalho de Melo authored
      Now with the default for the kernel.perf_event_paranoid sysctl being 2 [1]
      we need to fall back to :u, i.e. to set perf_event_attr.exclude_kernel
      to 1.
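
      The fallback decision can be sketched roughly as follows. This is
      an illustration under stated assumptions, not perf's own code: the
      helper name fallback_to_user_only() and the way errno and the
      sysctl value are passed in are hypothetical. Only struct
      perf_event_attr and its exclude_kernel bit come from
      <linux/perf_event.h>.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <linux/perf_event.h>

/*
 * Hypothetical helper: decide whether a failed perf_event_open() attempt
 * should be retried with kernel samples excluded (the ":u" modifier).
 * "err" is the errno of the failed attempt, "paranoid" is the current
 * value of kernel.perf_event_paranoid.
 */
bool fallback_to_user_only(struct perf_event_attr *attr, int err, int paranoid)
{
	if (err == EACCES && !attr->exclude_kernel && paranoid > 1) {
		attr->exclude_kernel = 1;	/* count user space only */
		return true;			/* caller should retry the open */
	}
	return false;				/* nothing left to relax */
}
```

      A caller would retry the open once when this returns true. A second
      failure returns false, because exclude_kernel is already set.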
      
      Before:
      
        [acme@jouet linux]$ perf record usleep 1
        Error:
        You may not have permission to collect stats.
      
        Consider tweaking /proc/sys/kernel/perf_event_paranoid,
        which controls use of the performance events system by
        unprivileged users (without CAP_SYS_ADMIN).
      
        The current value is 2:
      
          -1: Allow use of (almost) all events by all users
        >= 0: Disallow raw tracepoint access by users without CAP_IOC_LOCK
        >= 1: Disallow CPU event access by users without CAP_SYS_ADMIN
        >= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN
        [acme@jouet linux]$
      
      After:
      
        [acme@jouet linux]$ perf record usleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.016 MB perf.data (7 samples) ]
        [acme@jouet linux]$ perf evlist
        cycles:u
        [acme@jouet linux]$ perf evlist -v
        cycles:u: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, exclude_kernel: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
        [acme@jouet linux]$
      
      And if the user turns on verbose mode, an explanation will appear:
      
        [acme@jouet linux]$ perf record -v usleep 1
        Warning:
        kernel.perf_event_paranoid=2, trying to fall back to excluding kernel samples
        mmap size 528384B
        [ perf record: Woken up 1 times to write data ]
        Looking at the vmlinux_path (8 entries long)
        Using /lib/modules/4.6.0-rc7+/build/vmlinux for symbols
        [ perf record: Captured and wrote 0.016 MB perf.data (7 samples) ]
        [acme@jouet linux]$
      
      [1] 0161028b ("perf/core: Change the default paranoia level to 2")
      Reported-by: Ingo Molnar <mingo@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-b20jmx4dxt5hpaa9t2rroi0o@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • drm/amdgpu: fix DP mode validation · c47b9e09
      Alex Deucher authored
      Switch the order of the loops to walk the rates in the outer loop
      so we exhaust all DP 1.1 rate/lane combinations before trying
      DP 1.2 rate/lane combos.
      
      This avoids selecting rates that are supported by the monitor
      but not the connector, which leads to valid modes getting rejected.
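
      The loop order can be sketched as below. This is an illustration,
      not the driver code: struct dp_link and pick_dp_link() are
      hypothetical names, the rate table holds the three common DP link
      clocks in kHz, and the capacity formula assumes 8 data bits per
      lane per link clock.

```c
#include <stddef.h>

/* Hypothetical result type for this sketch. */
struct dp_link {
	unsigned int rate_khz;
	unsigned int lanes;
};

/*
 * Walk link rates in the outer loop and lane counts in the inner loop, so
 * every lane count at a lower rate is tried before any higher rate.
 * Returns 0 and fills *out on success, -1 if no combination fits.
 */
int pick_dp_link(unsigned int pix_clock_khz, unsigned int bpp,
		 unsigned int max_rate_khz, unsigned int max_lanes,
		 struct dp_link *out)
{
	static const unsigned int rates[] = { 162000, 270000, 540000 };
	size_t i;
	unsigned int lanes;

	for (i = 0; i < sizeof(rates) / sizeof(rates[0]); i++) {
		if (rates[i] > max_rate_khz)
			break;
		for (lanes = 1; lanes <= max_lanes; lanes <<= 1) {
			/* Assumed capacity: 8 data bits per lane per link clock. */
			unsigned int max_pix = rates[i] * lanes * 8 / bpp;

			if (max_pix >= pix_clock_khz) {
				out->rate_khz = rates[i];
				out->lanes = lanes;
				return 0;
			}
		}
	}
	return -1;
}
```

      For a 148500 kHz pixel clock at 24 bpp this picks 162000 kHz on 4
      lanes. With the lane count in the outer loop, the first fit would
      be 540000 kHz on 1 lane, a rate a DP 1.1 connector cannot drive.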
      
      bug:
      https://bugs.freedesktop.org/show_bug.cgi?id=95206
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org
    • drm/radeon: fix DP mode validation · ff0bd441
      Alex Deucher authored
      Switch the order of the loops to walk the rates in the outer loop
      so we exhaust all DP 1.1 rate/lane combinations before trying
      DP 1.2 rate/lane combos.
      
      This avoids selecting rates that are supported by the monitor
      but not the connector, which leads to valid modes getting rejected.
      
      bug:
      https://bugs.freedesktop.org/show_bug.cgi?id=95206
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org
    • perf evsel: Improve EPERM error handling in open_strerror() · 7d173913
      Arnaldo Carvalho de Melo authored
      We were showing a hardcoded default value for the kernel.perf_event_paranoid
      sysctl. Now that the default became more paranoid (1 -> 2 [1]), this would
      need to be updated; instead, show the current value:
      
        [acme@jouet linux]$ perf record ls
        Error:
        You may not have permission to collect stats.
      
        Consider tweaking /proc/sys/kernel/perf_event_paranoid,
        which controls use of the performance events system by
        unprivileged users (without CAP_SYS_ADMIN).
      
        The current value is 2:
      
          -1: Allow use of (almost) all events by all users
        >= 0: Disallow raw tracepoint access by users without CAP_IOC_LOCK
        >= 1: Disallow CPU event access by users without CAP_SYS_ADMIN
        >= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN
        [acme@jouet linux]$
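
      Reading the current value rather than hardcoding it could look
      roughly like this. It is a minimal sketch: read_int_file() is a
      hypothetical helper, not the function perf uses. The path to pass
      is the one named in the message above,
      /proc/sys/kernel/perf_event_paranoid.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read one decimal integer from a file such as
 * /proc/sys/kernel/perf_event_paranoid.
 * Returns 0 on success and stores the value in *value, -1 on failure.
 */
int read_int_file(const char *path, int *value)
{
	FILE *fp = fopen(path, "r");
	int ok;

	if (!fp)
		return -1;
	ok = fscanf(fp, "%d", value) == 1;
	fclose(fp);
	return ok ? 0 : -1;
}
```

      The error message can then print whatever the running kernel
      reports, so it stays correct if the default changes again.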
      
      [1] 0161028b ("perf/core: Change the default paranoia level to 2")
      Reported-by: Ingo Molnar <mingo@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-0gc4rdpg8d025r5not8s8028@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Merge tag 'pinctrl-v4.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 422ce5a9
      Linus Torvalds authored
      Pull pinctrl fix from Linus Walleij:
       "A single last pin control fix for v4.6.  It's tagged for stable and
        only hits a single driver with two added lines so should be safe.
        Tested in linux-next.
      
         - The pull up/down logic for the AT91 PIO4 controller was tilted: we
           need to mask the reverse pull when unmasking a pull direction.
      
           Setting both pull up & pull down is illegal and makes no sense"
      
      * tag 'pinctrl-v4.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        pinctrl: at91-pio4: fix pull-up/down logic
    • workqueue: fix rebind bound workers warning · f7c17d26
      Wanpeng Li authored
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 16 at kernel/workqueue.c:4559 rebind_workers+0x1c0/0x1d0
      Modules linked in:
      CPU: 0 PID: 16 Comm: cpuhp/0 Not tainted 4.6.0-rc4+ #31
      Hardware name: IBM IBM System x3550 M4 Server -[7914IUW]-/00Y8603, BIOS -[D7E128FUS-1.40]- 07/23/2013
       0000000000000000 ffff881037babb58 ffffffff8139d885 0000000000000010
       0000000000000000 0000000000000000 0000000000000000 ffff881037babba8
       ffffffff8108505d ffff881037ba0000 000011cf3e7d6e60 0000000000000046
      Call Trace:
       dump_stack+0x89/0xd4
       __warn+0xfd/0x120
       warn_slowpath_null+0x1d/0x20
       rebind_workers+0x1c0/0x1d0
       workqueue_cpu_up_callback+0xf5/0x1d0
       notifier_call_chain+0x64/0x90
       ? trace_hardirqs_on_caller+0xf2/0x220
       ? notify_prepare+0x80/0x80
       __raw_notifier_call_chain+0xe/0x10
       __cpu_notify+0x35/0x50
       notify_down_prepare+0x5e/0x80
       ? notify_prepare+0x80/0x80
       cpuhp_invoke_callback+0x73/0x330
       ? __schedule+0x33e/0x8a0
       cpuhp_down_callbacks+0x51/0xc0
       cpuhp_thread_fun+0xc1/0xf0
       smpboot_thread_fn+0x159/0x2a0
       ? smpboot_create_threads+0x80/0x80
       kthread+0xef/0x110
       ? wait_for_completion+0xf0/0x120
       ? schedule_tail+0x35/0xf0
       ret_from_fork+0x22/0x50
       ? __init_kthread_worker+0x70/0x70
      ---[ end trace eb12ae47d2382d8f ]---
      notify_down_prepare: attempt to take down CPU 0 failed
      
      This bug can be reproduced with the config below and nohz_full= covering all CPUs:
      
      CONFIG_BOOTPARAM_HOTPLUG_CPU0=y
      CONFIG_DEBUG_HOTPLUG_CPU0=y
      CONFIG_NO_HZ_FULL=y
      
      As Thomas pointed out:
      
      | If a down prepare callback fails, then DOWN_FAILED is invoked for all
      | callbacks which have successfully executed DOWN_PREPARE.
      |
      | But, workqueue has actually two notifiers. One which handles
      | UP/DOWN_FAILED/ONLINE and one which handles DOWN_PREPARE.
      |
      | Now look at the priorities of those callbacks:
      |
      | CPU_PRI_WORKQUEUE_UP        = 5
      | CPU_PRI_WORKQUEUE_DOWN      = -5
      |
      | So the call order on DOWN_PREPARE is:
      |
      | CB 1
      | CB ...
      | CB workqueue_up() -> Ignores DOWN_PREPARE
      | CB ...
      | CB X ---> Fails
      |
      | So we call up to CB X with DOWN_FAILED
      |
      | CB 1
      | CB ...
      | CB workqueue_up() -> Handles DOWN_FAILED
      | CB ...
      | CB X-1
      |
      | So the problem is that the workqueue stuff handles DOWN_FAILED in the up
      | callback, while it should do it in the down callback. Which is not a good idea
      | either because it wants to be called early on rollback...
      |
      | Brilliant stuff, isn't it? The hotplug rework will solve this problem because
      | the callbacks become symmetric, but for the existing mess, we need some
      | workaround in the workqueue code.
      
      The boot CPU handles housekeeping duty (unbound timers, workqueues,
      timekeeping, ...) on behalf of full dynticks CPUs. It must remain
      online when nohz full is enabled. Every notifier_block has a
      priority:
      
      workqueue_cpu_up > tick_nohz_cpu_down > workqueue_cpu_down
      
      So the tick_nohz_cpu_down callback fails on down prepare of CPU 0, and
      the notifier_blocks behind tick_nohz_cpu_down are not called any
      more, which means the workers are never actually unbound. The hotplug
      state machine then falls back to undo and onlines CPU 0 again. Workers
      are rebound unconditionally even though they were never unbound, which
      triggers the warning in the process.
      
      This patch fixes it by checking for !DISASSOCIATED to avoid rebinding
      workers that are still bound.
      
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: stable@vger.kernel.org
      Suggested-by: Lai Jiangshan <jiangshanlai@gmail.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
    • Merge tag 'at91-fixes2' of... · 8a934ccb
      Arnd Bergmann authored
      Merge tag 'at91-fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/nferre/linux-at91 into fixes
      
      Merge "Second AT91 fix PR for 4.6" from Nicolas Ferre:
      
      - fix a regression on the clock subsystem while switching to syscon/regmap
        due to a stricter check of the register map.
      
      * tag 'at91-fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/nferre/linux-at91:
        ARM: dts: at91: sam9x5: Fix the memory range assigned to the PMC
    • tools lib traceevent: Do not reassign parg after collapse_tree() · 106b816c
      Steven Rostedt authored
      At the end of process_filter(), collapse_tree() was changed to update
      the parg parameter, but the reassignment after the call wasn't removed.
      
      What happens is that the "current_op" gets modified and freed and parg
      is assigned to the newly allocated argument. But after the call to
      collapse_tree(), parg is assigned again to the just freed "current_op",
      and this causes the tool to crash.
      
      The current_op variable must also be assigned to NULL in case of error,
      otherwise it will cause it to be free()ed twice.
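
      The ownership pattern behind this fix can be sketched in isolation.
      The names below (struct arg, collapse(), process()) are
      hypothetical and much simpler than the real process_filter() and
      collapse_tree(). The sketch assumes the callee frees the old node
      in every case and hands back a new one through parg.

```c
#include <stdlib.h>

/* Hypothetical, simplified argument node. */
struct arg {
	int value;
};

/*
 * Hypothetical callee: always consumes "old" and, on success, stores a
 * newly allocated replacement through parg.
 */
static int collapse(struct arg *old, struct arg **parg)
{
	struct arg *fresh = malloc(sizeof(*fresh));
	int value = old->value;

	free(old);			/* consumed on success and on failure */
	if (!fresh)
		return -1;
	fresh->value = value;
	*parg = fresh;
	return 0;
}

int process(int value, struct arg **parg)
{
	struct arg *current_op = malloc(sizeof(*current_op));
	int ret;

	if (!current_op)
		return -1;
	current_op->value = value;

	ret = collapse(current_op, parg);
	current_op = NULL;		/* already freed; avoids a double free below */
	if (ret < 0)
		goto fail;

	/* No "*parg = current_op" here: that would publish a freed pointer. */
	return 0;
fail:
	free(current_op);		/* free(NULL) is a no-op */
	return ret;
}
```

      After process() succeeds, *parg points at the replacement node, and
      the caller owns it.
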
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: stable@vger.kernel.org # 3.14+
      Fixes: 42d6194d ("tools lib traceevent: Refactor process_filter()")
      Link: http://lkml.kernel.org/r/20160511150936.678c18a1@gandalf.local.home
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf probe: Check if dwarf_getlocations() is available · 49247345
      Arnaldo Carvalho de Melo authored
      If not, tell the user that:
      
        config/Makefile:273: Old libdw.h, finding variables at given 'perf probe' point will not work, install elfutils-devel/libdw-dev >= 0.157
      
      And return -ENOTSUPP in die_get_var_range(), failing features that
      need it, like the one pointed out above.
      
      This fixes the build on older systems, such as Ubuntu 12.04.5.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Vinson Lee <vlee@freedesktop.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-9l7luqkq4gfnx7vrklkq4obs@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf dwarf: Guard !x86_64 definitions under #ifdef else clause · 62aa0e17
      Arnaldo Carvalho de Melo authored
      To fix the build on Fedora Rawhide (gcc 6.0.0 20160311 (Red Hat 6.0.0-0.17)):
      
          CC       /tmp/build/perf/arch/x86/util/dwarf-regs.o
        arch/x86/util/dwarf-regs.c:66:36: error: 'x86_32_regoffset_table' defined but not used [-Werror=unused-const-variable=]
         static const struct pt_regs_offset x86_32_regoffset_table[] = {
                                            ^~~~~~~~~~~~~~~~~~~~~~
        cc1: all warnings being treated as errors
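
      The general shape of such a guard is sketched below with
      hypothetical names (reg_names, reg_name()) and illustrative table
      contents, not the real pt_regs_offset tables. Only one table is
      compiled per build, so nothing is left defined but unused.

```c
#include <stddef.h>

/*
 * Only the table for the target being built is defined. Defining both
 * unconditionally would leave one of them unused and trip
 * -Werror=unused-const-variable on newer compilers.
 */
#ifdef __x86_64__
static const char *const reg_names[] = { "rax", "rdx", "rcx", "rbx" };
#else
static const char *const reg_names[] = { "eax", "ecx", "edx", "ebx" };
#endif

#define NR_REG_NAMES (sizeof(reg_names) / sizeof(reg_names[0]))

/* Returns the register name for index n, or NULL if n is out of range. */
const char *reg_name(unsigned int n)
{
	return n < NR_REG_NAMES ? reg_names[n] : NULL;
}
```

      Code that uses the table goes through reg_name() and never refers
      to an architecture-specific symbol directly.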
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-fghuksc1u8ln82bof4lwcj0o@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Use readdir() instead of deprecated readdir_r() · 22a9f41b
      Arnaldo Carvalho de Melo authored
      The readdir() function is thread safe as long as just one thread uses a
      DIR, which is the case when parsing tracepoint event definitions. To
      avoid breaking the build with glibc-2.23.90 (upcoming 2.24), use it
      instead of readdir_r().
      
      See: http://man7.org/linux/man-pages/man3/readdir.3.html
      
      "However, in modern implementations (including the glibc implementation),
      concurrent calls to readdir() that specify different directory streams
      are thread-safe.  In cases where multiple threads must read from the
      same directory stream, using readdir() with external synchronization is
      still preferable to the use of the deprecated readdir_r(3) function."
      
      Noticed while building on a Fedora Rawhide docker container.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-wddn49r6bz6wq4ee3dxbl7lo@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf thread_map: Use readdir() instead of deprecated readdir_r() · 7839b9f3
      Arnaldo Carvalho de Melo authored
      The readdir() function is thread safe as long as just one thread uses a
      DIR, which is the case in thread_map, so, to avoid breaking the build
      with glibc-2.23.90 (upcoming 2.24), use it instead of readdir_r().
      
      See: http://man7.org/linux/man-pages/man3/readdir.3.html
      
      "However, in modern implementations (including the glibc implementation),
      concurrent calls to readdir() that specify different directory streams
      are thread-safe.  In cases where multiple threads must read from the
      same directory stream, using readdir() with external synchronization is
      still preferable to the use of the deprecated readdir_r(3) function."
      
      Noticed while building on a Fedora Rawhide docker container.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-del8h2a0f40z75j4r42l96l0@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf script: Use readdir() instead of deprecated readdir_r() · 9a5f3bf3
      Arnaldo Carvalho de Melo authored
      The readdir() function is thread safe as long as just one thread uses a
      DIR, which is the case in 'perf script', so, to avoid breaking the build
      with glibc-2.23.90 (upcoming 2.24), use it instead of readdir_r().
      
      See: http://man7.org/linux/man-pages/man3/readdir.3.html
      
      "However, in modern implementations (including the glibc implementation),
      concurrent calls to readdir() that specify different directory streams
      are thread-safe.  In cases where multiple threads must read from the
      same directory stream, using readdir() with external synchronization is
      still preferable to the use of the deprecated readdir_r(3) function."
      
      Noticed while building on a Fedora Rawhide docker container.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-mt3xz7n2hl49ni2vx7kuq74g@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      9a5f3bf3
    • Arnaldo Carvalho de Melo's avatar
      perf tools: Use readdir() instead of deprecated readdir_r() · 2515e614
      Arnaldo Carvalho de Melo authored
      The readdir() function is thread safe as long as just one thread uses a
      DIR, which is the case when synthesizing events for pre-existing threads
      by traversing /proc, so, to avoid breaking the build with glibc-2.23.90
      (upcoming 2.24), use it instead of readdir_r().
      
      See: http://man7.org/linux/man-pages/man3/readdir.3.html
      
      "However, in modern implementations (including the glibc implementation),
      concurrent calls to readdir() that specify different directory streams
      are thread-safe.  In cases where multiple threads must read from the
      same directory stream, using readdir() with external synchronization is
      still preferable to the use of the deprecated readdir_r(3) function."
      
      Noticed while building on a Fedora Rawhide docker container.
      
         CC       /tmp/build/perf/util/event.o
        util/event.c: In function '__event__synthesize_thread':
        util/event.c:466:2: error: 'readdir_r' is deprecated [-Werror=deprecated-declarations]
          while (!readdir_r(tasks, &dirent, &next) && next) {
          ^~~~~
        In file included from /usr/include/features.h:368:0,
                         from /usr/include/stdint.h:25,
                         from /usr/lib/gcc/x86_64-redhat-linux/6.0.0/include/stdint.h:9,
                         from /git/linux/tools/include/linux/types.h:6,
                         from util/event.c:1:
        /usr/include/dirent.h:189:12: note: declared here
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-i1vj7nyjp2p750rirxgrfd3c@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      2515e614
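The readdir() pattern described in the three commits above can be sketched as follows. This is a minimal illustration, not the code from the perf patches: the helper name count_dir_entries() is hypothetical, and it assumes that only one thread uses the DIR stream, which is the condition the commit messages rely on. Because readdir() returns NULL both at end of stream and on error, errno is cleared before each call so the two cases can be told apart.

```c
#include <dirent.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from perf): count the entries of a
 * directory, skipping "." and "..". Returns -1 on error.
 *
 * The DIR stream is opened, read and closed by a single caller, so no
 * external locking is needed around readdir().
 */
long count_dir_entries(const char *path)
{
	DIR *dir = opendir(path);
	struct dirent *ent;
	long count = 0;
	int saved_errno;

	if (dir == NULL)
		return -1;

	for (;;) {
		errno = 0;		/* distinguish end of stream from error */
		ent = readdir(dir);
		if (ent == NULL)
			break;
		if (strcmp(ent->d_name, ".") == 0 ||
		    strcmp(ent->d_name, "..") == 0)
			continue;
		count++;
	}

	saved_errno = errno;		/* closedir() may overwrite errno */
	closedir(dir);

	return saved_errno ? -1 : count;
}
```

Unlike readdir_r(), this needs no caller-allocated struct dirent, so the buffer sizing question for d_name does not arise.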
    • Alexander Shishkin's avatar
      perf/core: Disable the event on a truncated AUX record · 9f448cd3
      Alexander Shishkin authored
      When the PMU driver reports a truncated AUX record, it effectively means
      that there is no more usable room in the event's AUX buffer (even though
      there may still be some room, so that perf_aux_output_begin() doesn't take
      action). At this point the consumer still has to be woken up and the event
      has to be disabled, otherwise the event will just keep spinning between
      perf_aux_output_begin() and perf_aux_output_end() until its context gets
      unscheduled.
      
      Again, for cpu-wide events this means never, so once in this condition,
      they will be forever losing data.
      
      Fix this by disabling the event and waking up the consumer in case of a
      truncated AUX record.
      
      Reported-by: Markus Metzger <markus.t.metzger@intel.com>
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/1462886313-13660-3-git-send-email-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9f448cd3
    • Alexander Shishkin's avatar
      perf/x86/intel/pt: Generate PMI in the STOP region as well · ab92b232
      Alexander Shishkin authored
      Currently, the PT driver always sets the PMI bit one region (page) before
      the STOP region so that we can wake up the consumer before we run out of
      room in the buffer and have to disable the event. However, we also need
      an interrupt in the last output region, so that we actually get to disable
      the event (if no more room for new data is available at that point),
      otherwise hardware just quietly refuses to start, but the event is
      scheduled in and we end up losing trace data till the event gets removed.
      
      For a cpu-wide event it is even worse since there may not be any
      re-scheduling at all and no chance for the ring buffer code to notice
      that its buffer is filled up and the event needs to be disabled (so that
      the consumer can re-enable it when it finishes reading the data out). In
      other words, all the trace data will be lost after the buffer gets filled
      up.
      
      This patch makes PT also generate a PMI when the last output region is
      full.
      
      Reported-by: Markus Metzger <markus.t.metzger@intel.com>
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/1462886313-13660-2-git-send-email-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ab92b232
    • David Howells's avatar
      KEYS: Fix ASN.1 indefinite length object parsing · 23c8a812
      David Howells authored
      This fixes CVE-2016-0758.
      
      In the ASN.1 decoder, when the length field of an ASN.1 value is extracted,
      it isn't validated against the remaining amount of data before being added
      to the cursor.  With a sufficiently large size indicated, the check:
      
      	datalen - dp < 2
      
      may then fail due to integer overflow.
      
      Fix this by checking the length indicated against the amount of remaining
      data in both places a definite length is determined.
      
      Whilst we're at it, make the following changes:
      
       (1) Check the maximum size of extended length does not exceed the capacity
           of the variable it's being stored in (len) rather than the type that
           variable is assumed to be (size_t).
      
       (2) Compare the EOC tag to the symbolic constant ASN1_EOC rather than the
           integer 0.
      
       (3) To reduce confusion, move the initialisation of len outside of:
      
      	for (len = 0; n > 0; n--) {
      
           since it doesn't have anything to do with the loop counter n.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Acked-by: David Woodhouse <David.Woodhouse@intel.com>
      Acked-by: Peter Jones <pjones@redhat.com>
      23c8a812
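The core idea of the ASN.1 fix above, validating a decoded length against the remaining data before advancing the cursor, can be sketched as below. This is an illustration under assumptions, not the kernel's decoder: asn1_len_fits() is a hypothetical helper, and the names datalen, dp and len are borrowed from the commit message. The comparison is done by subtraction on the remaining-data side, so an oversized len is rejected directly and dp + len is never computed, which is what would otherwise wrap around.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not the kernel code): report whether a value of
 * `len` bytes starting at cursor `dp` fits inside a buffer holding
 * `datalen` bytes.
 *
 * dp + len is deliberately never evaluated, because that sum could
 * overflow size_t for a sufficiently large `len`.
 */
bool asn1_len_fits(size_t datalen, size_t dp, size_t len)
{
	if (dp > datalen)
		return false;		/* cursor already out of bounds */

	return len <= datalen - dp;	/* cannot underflow: dp <= datalen */
}
```

A decoder would call such a check at each point where a definite length has been determined, and only then add len to dp.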