1. 17 May, 2016 40 commits
    • propogate_mnt: Handle the first propogated copy being a slave · 60f7e3a2
      Eric W. Biederman authored
      [ Upstream commit 5ec0811d ]
      
      When the first propagated copy was a slave, the following oops would result:
      > BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
      > IP: [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0
      > PGD bacd4067 PUD bac66067 PMD 0
      > Oops: 0000 [#1] SMP
      > Modules linked in:
      > CPU: 1 PID: 824 Comm: mount Not tainted 4.6.0-rc5userns+ #1523
      > Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
      > task: ffff8800bb0a8000 ti: ffff8800bac3c000 task.ti: ffff8800bac3c000
      > RIP: 0010:[<ffffffff811fba4e>]  [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0
      > RSP: 0018:ffff8800bac3fd38  EFLAGS: 00010283
      > RAX: 0000000000000000 RBX: ffff8800bb77ec00 RCX: 0000000000000010
      > RDX: 0000000000000000 RSI: ffff8800bb58c000 RDI: ffff8800bb58c480
      > RBP: ffff8800bac3fd48 R08: 0000000000000001 R09: 0000000000000000
      > R10: 0000000000001ca1 R11: 0000000000001c9d R12: 0000000000000000
      > R13: ffff8800ba713800 R14: ffff8800bac3fda0 R15: ffff8800bb77ec00
      > FS:  00007f3c0cd9b7e0(0000) GS:ffff8800bfb00000(0000) knlGS:0000000000000000
      > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      > CR2: 0000000000000010 CR3: 00000000bb79d000 CR4: 00000000000006e0
      > Stack:
      >  ffff8800bb77ec00 0000000000000000 ffff8800bac3fd88 ffffffff811fbf85
      >  ffff8800bac3fd98 ffff8800bb77f080 ffff8800ba713800 ffff8800bb262b40
      >  0000000000000000 0000000000000000 ffff8800bac3fdd8 ffffffff811f1da0
      > Call Trace:
      >  [<ffffffff811fbf85>] propagate_mnt+0x105/0x140
      >  [<ffffffff811f1da0>] attach_recursive_mnt+0x120/0x1e0
      >  [<ffffffff811f1ec3>] graft_tree+0x63/0x70
      >  [<ffffffff811f1f6b>] do_add_mount+0x9b/0x100
      >  [<ffffffff811f2c1a>] do_mount+0x2aa/0xdf0
      >  [<ffffffff8117efbe>] ? strndup_user+0x4e/0x70
      >  [<ffffffff811f3a45>] SyS_mount+0x75/0xc0
      >  [<ffffffff8100242b>] do_syscall_64+0x4b/0xa0
      >  [<ffffffff81988f3c>] entry_SYSCALL64_slow_path+0x25/0x25
      > Code: 00 00 75 ec 48 89 0d 02 22 22 01 8b 89 10 01 00 00 48 89 05 fd 21 22 01 39 8e 10 01 00 00 0f 84 e0 00 00 00 48 8b 80 d8 00 00 00 <48> 8b 50 10 48 89 05 df 21 22 01 48 89 15 d0 21 22 01 8b 53 30
      > RIP  [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0
      >  RSP <ffff8800bac3fd38>
      > CR2: 0000000000000010
      > ---[ end trace 2725ecd95164f217 ]---
      
      This oops happens with the namespace_sem held and can be triggered by
      non-root users.  An all around not pleasant experience.
      
      To avoid this scenario, when finding the appropriate source mount to
      copy, stop the walk up the mnt_master chain when the first source mount
      is encountered.
      
      Further rewrite the walk up the last_source mnt_master chain so that
      it is clear what is going on.
      
      The reason why the first source mount is special is that its
      mnt_parent is not a mount in the dest_mnt propagation tree, and as
      such termination conditions based upon the dest_mnt mount propagation
      tree do not make sense.
      
      To avoid other kinds of confusion last_dest is not changed when
      computing last_source.  last_dest is only used once in propagate_one
      and that is above the point of the code being modified, so changing
      the global variable is meaningless and confusing.
      
      Cc: stable@vger.kernel.org
      fixes: f2ebb3a9 ("smarter propagate_mnt()")
      Reported-by: Tycho Andersen <tycho.andersen@canonical.com>
      Reviewed-by: Seth Forshee <seth.forshee@canonical.com>
      Tested-by: Seth Forshee <seth.forshee@canonical.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • fs/pnode.c: treat zero mnt_group_id-s as unequal · 2d7405bf
      Maxim Patlasov authored
      [ Upstream commit 7ae8fd03 ]
      
      propagate_one(m) calculates "type" argument for copy_tree() like this:
      
      >    if (m->mnt_group_id == last_dest->mnt_group_id) {
      >        type = CL_MAKE_SHARED;
      >    } else {
      >        type = CL_SLAVE;
      >        if (IS_MNT_SHARED(m))
      >            type |= CL_MAKE_SHARED;
      >    }
      
      The "type" argument then governs clone_mnt() behavior with respect to flags
      and mnt_master of new mount. When we iterate through a slave group, it is
      possible that both current "m" and "last_dest" are not shared (although,
      both are slaves, i.e. have non-NULL mnt_master-s). Then the comparison
      above erroneously makes new mount shared and sets its mnt_master to
      last_source->mnt_master. The patch fixes the problem by handling zero
      mnt_group_id-s as though they are unequal.
      
      A similar problem exists in the implementation of the "else" clause above,
      when we have to ascend upward in the master/slave tree by calling:
      
      >    last_source = last_source->mnt_master;
      >    last_dest = last_source->mnt_parent;
      
      proper number of times. The last step is governed by
      "n->mnt_group_id != last_dest->mnt_group_id" condition that may lie if
      both are zero. The patch fixes this case in the same way as the former one.
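
The comparison rule described above can be sketched in isolation. This is a minimal illustration, not the patch itself: `struct toy_mount` and `toy_peers()` are hypothetical names, and the only assumption taken from the text is that a zero mnt_group_id means "not in any peer group".

```c
#include <stdbool.h>

/* Toy stand-in for the kernel's struct mount; only the field we need. */
struct toy_mount {
	int mnt_group_id;	/* 0 means "not in any peer group" */
};

/*
 * Two mounts count as peers only if they share a non-zero group id.
 * A plain equality test would wrongly treat two unshared mounts
 * (both with id 0) as peers.
 */
static bool toy_peers(const struct toy_mount *a, const struct toy_mount *b)
{
	return a->mnt_group_id != 0 && a->mnt_group_id == b->mnt_group_id;
}
```

With a helper of this shape, both comparisons mentioned in the message can use the same test instead of repeating the zero check at each site.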
      
      [AV: don't open-code an obvious helper...]
      Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • x86/sysfb_efi: Fix valid BAR address range check · ade3716f
      Wang YanQing authored
      [ Upstream commit c10fcb14 ]
      
      The code for checking whether a BAR address range is valid will break
      out of the loop when a start address of 0x0 is encountered.
      
      This behaviour is wrong since by breaking out of the loop we may miss
      the BAR that describes the EFI frame buffer in a later iteration.
      
      Because of this bug I can't use video=efifb: boot parameter to get
      efifb on my new ThinkPad E550 for my old linux system hard disk with
      3.10 kernel. In 3.10, efifb is the only choice due to DRM/I915 not
      supporting the GPU.
      
      This patch also adds a trivial optimization: break out of the loop once
      the frame buffer address range is found, without testing later BARs.
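
The loop behaviour described above can be sketched as follows. This is a simplified model under stated assumptions: `struct toy_bar` and `toy_fb_in_bars()` are hypothetical, BAR ends are treated as inclusive, and `size` is assumed to be non-zero.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical BAR description: start of 0 means "unused", end is inclusive. */
struct toy_bar {
	uint64_t start;
	uint64_t end;
};

/* Return true if [base, base + size) lies inside any populated BAR. */
static bool toy_fb_in_bars(const struct toy_bar *bars, size_t n,
			   uint64_t base, uint64_t size)
{
	bool found = false;

	for (size_t i = 0; i < n; i++) {
		if (bars[i].start == 0)
			continue;	/* unused BAR: skip it, do not stop the scan */

		if (base >= bars[i].start && base + size - 1 <= bars[i].end) {
			found = true;
			break;		/* match found: later BARs need no test */
		}
	}
	return found;
}
```

The `continue` is the point of the fix: an empty BAR early in the list must not hide a valid one later on.
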
      Signed-off-by: Wang YanQing <udknight@gmail.com>
      [ Rewrote changelog. ]
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Reviewed-by: Peter Jones <pjones@redhat.com>
      Cc: <stable@vger.kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: David Herrmann <dh.herrmann@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/1462454061-21561-2-git-send-email-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • crypto: hash - Fix page length clamping in hash walk · 16695408
      Herbert Xu authored
      [ Upstream commit 13f4bb78 ]
      
      The crypto hash walk code is broken when supplied with an offset
      greater than or equal to PAGE_SIZE.  This patch fixes it by adjusting
      walk->pg and walk->offset when this happens.
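
The adjustment can be pictured with a small model. This is only a sketch: `struct toy_walk`, the page index stored as a plain number, and the 4 KiB page size are assumptions made for illustration, not the kernel's data structures.

```c
#include <stddef.h>

#define TOY_PAGE_SHIFT 12			/* assumed 4 KiB pages */
#define TOY_PAGE_SIZE  ((size_t)1 << TOY_PAGE_SHIFT)

/* Hypothetical walk state: a page index plus a byte offset. */
struct toy_walk {
	size_t pg;
	size_t offset;
};

/* Fold whole pages out of the offset so that offset < TOY_PAGE_SIZE. */
static void toy_walk_normalize(struct toy_walk *w)
{
	w->pg += w->offset >> TOY_PAGE_SHIFT;
	w->offset &= TOY_PAGE_SIZE - 1;
}

/* Bytes that can be processed before the current page ends. */
static size_t toy_walk_chunk(const struct toy_walk *w, size_t remaining)
{
	size_t room = TOY_PAGE_SIZE - w->offset;

	return remaining < room ? remaining : room;
}
```

Without the normalization step, `TOY_PAGE_SIZE - w->offset` would wrap around for offsets of a page or more, so the clamp would no longer limit anything.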
      
      Cc: <stable@vger.kernel.org>
      Reported-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • ACPICA: Dispatcher: Update thread ID for recursive method calls · a10c059a
      Prarit Bhargava authored
      [ Upstream commit 93d68841 ]
      
      ACPICA commit 7a3bd2d962f221809f25ddb826c9e551b916eb25
      
      Set the mutex owner thread ID.
      Original patch from: Prarit Bhargava <prarit@redhat.com>
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=115121
      Link: https://github.com/acpica/acpica/commit/7a3bd2d9
      Signed-off-by: Prarit Bhargava <prarit@redhat.com>
      Tested-by: Andy Lutomirski <luto@kernel.org> # On a Dell XPS 13 9350
      Signed-off-by: Bob Moore <robert.moore@intel.com>
      Signed-off-by: Lv Zheng <lv.zheng@intel.com>
      Cc: All applicable <stable@vger.kernel.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • MAINTAINERS: Remove asterisk from EFI directory names · d15451da
      Matt Fleming authored
      [ Upstream commit e8dfe6d8 ]
      
      Mark reported that having asterisks on the end of directory names
      confuses get_maintainer.pl when it encounters subdirectories, and that
      my name does not appear when run on drivers/firmware/efi/libstub.
      Reported-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: <stable@vger.kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/1462303781-8686-2-git-send-email-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • drm/radeon: make sure vertical front porch is at least 1 · d102342c
      Alex Deucher authored
      [ Upstream commit 3104b812 ]
      
      hw doesn't like a 0 value.
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • tracing: Don't display trigger file for events that can't be enabled · 3216eb22
      Chunyu Hu authored
      [ Upstream commit 854145e0 ]
      
      Currently register functions for events will be called
      through the 'reg' field of event class directly without
      any check when setting up triggers.
      
      Triggers for events that don't support register through
      debug fs (events under events/ftrace are for trace-cmd to
      read event format, and most of them don't have a register
      function except events/ftrace/functionx) can't be enabled
      at all, and an oops will be hit when setting up trigger
      for those events, so just not creating them is an easy way
      to avoid the oops.
      
      Link: http://lkml.kernel.org/r/1462275274-3911-1-git-send-email-chuhu@redhat.com
      
      Cc: stable@vger.kernel.org # 3.14+
      Fixes: 85f2b082 ("tracing: Add basic event trigger framework")
      Signed-off-by: Chunyu Hu <chuhu@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • Minimal fix-up of bad hashing behavior of hash_64() · 9abc9e72
      Linus Torvalds authored
      [ Upstream commit 689de1d6 ]
      
      This is a fairly minimal fixup to the horribly bad behavior of hash_64()
      with certain input patterns.
      
      In particular, because the multiplicative value used for the 64-bit hash
      was intentionally bit-sparse (so that the multiply could be done with
      shifts and adds on architectures without hardware multipliers), some
      bits did not get spread out very much.  In particular, certain fairly
      common bit ranges in the input (roughly bits 12-20: commonly with the
      most information in them when you hash things like byte offsets in files
      or memory that have block factors that mean that the low bits are often
      zero) would not necessarily show up much in the result.
      
      There's a bigger patch-series brewing to fix up things more completely,
      but this is the fairly minimal fix for the 64-bit hashing problem.  It
      simply picks a much better constant multiplier, spreading the bits out a
      lot better.
      
      NOTE! For 32-bit architectures, the bad old hash_64() remains the same
      for now, since 64-bit multiplies are expensive.  The bigger hashing
      cleanup will replace the 32-bit case with something better.
      
      The new constants were picked by George Spelvin who wrote that bigger
      cleanup series.  I just picked out the constants and part of the comment
      from that series.
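
A multiplicative hash of the kind discussed above can be sketched as below. The constant is an assumption: it is the golden-ratio-derived 64-bit multiplier as I recall it from the kernel's hash header, and it is not quoted anywhere in this commit message. `toy_hash_64()` is a hypothetical name.

```c
#include <stdint.h>

/* Assumed multiplier (not taken from the text above): a dense 64-bit
 * golden-ratio constant, in contrast to the old bit-sparse one. */
#define TOY_GOLDEN_RATIO_64 0x61C8864680B583EBull

/*
 * Multiply, then keep the top `bits` bits of the 64-bit product.
 * `bits` must be in the range 1..64.
 */
static uint64_t toy_hash_64(uint64_t val, unsigned int bits)
{
	return (val * TOY_GOLDEN_RATIO_64) >> (64 - bits);
}
```

Because the result is taken from the high bits of the product, a multiplier with well-spread bits lets input bits in the middle of the word (such as page-aligned offsets) influence the bucket index.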
      
      Cc: stable@vger.kernel.org
      Cc: George Spelvin <linux@horizon.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • powerpc: Fix bad inline asm constraint in create_zero_mask() · 420b214c
      Anton Blanchard authored
      [ Upstream commit b4c11211 ]
      
      In create_zero_mask() we have:
      
      	addi	%1,%2,-1
      	andc	%1,%1,%2
      	popcntd	%0,%1
      
      using the "r" constraint for %2. r0 is a valid register in the "r" set,
      but addi X,r0,X turns it into an li:
      
      	li	r7,-1
      	andc	r7,r7,r0
      	popcntd	r4,r7
      
      Fix this by using the "b" constraint, for which r0 is not a valid
      register.
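
The r0 pitfall can be modelled in portable C. This is an emulation for illustration only: `struct toy_cpu`, `toy_addi()` and `toy_low_mask()` are hypothetical, and the sketch relies on the documented Power ISA rule that an RA field of 0 in addi means the constant zero rather than the contents of r0.

```c
#include <stdint.h>

/* Toy register file; regs[0] plays the role of r0. */
struct toy_cpu {
	uint64_t regs[32];
};

/*
 * addi rt,ra,imm: when the RA field is 0 the base operand is the
 * constant 0, not the contents of r0 (the instruction becomes li).
 */
static void toy_addi(struct toy_cpu *c, int rt, int ra, int64_t imm)
{
	uint64_t base = (ra == 0) ? 0 : c->regs[ra];

	c->regs[rt] = base + (uint64_t)imm;
}

/* Compute (x - 1) & ~x with x held in register `src`, using r7 as scratch. */
static uint64_t toy_low_mask(struct toy_cpu *c, int src)
{
	toy_addi(c, 7, src, -1);		/* addi r7,src,-1 */
	return c->regs[7] & ~c->regs[src];	/* andc r7,r7,src */
}
```

If the compiler allocates r0 for the input, the first step yields -1 instead of x - 1, which is why a constraint that excludes r0 is needed.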
      
      This was found with a kernel build using gcc trunk, narrowed down to
      when -frename-registers was enabled at -O2. It is just luck however
      that we aren't seeing this on older toolchains.
      
      Thanks to Segher for working with me to find this issue.
      
      Cc: stable@vger.kernel.org
      Fixes: d0cebfa6 ("powerpc: word-at-a-time optimization for 64-bit Little Endian")
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • Drivers: hv: vmbus: Fix signaling logic in hv_need_to_signal_on_read() · c1721139
      K. Y. Srinivasan authored
      [ Upstream commit 1db488d1 ]
      
      On the consumer side, we have interrupt driven flow management of the
      producer. It is sufficient to base the signaling decision on the
      amount of space that is available to write after the read is complete.
      The current code samples the previous available space and uses this
      in making the signaling decision. This state can be stale and is
      unnecessary. Since the state can be stale, we end up not signaling
      the host (when we should) and this can result in a hang. Fix this
      problem by removing the unnecessary check. I would like to thank
      Arseney Romanenko <arseneyr@microsoft.com> for pointing out this issue.
      
      Also, issue a full memory barrier before making the signaling decision
      to correctly deal with potential reordering of the write (read index)
      followed by the read of pending_sz.
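
The decision described above can be sketched as follows. This is a single-threaded model under stated assumptions: `struct toy_ring` and both helpers are hypothetical, the field layout is invented, and the C11 fence only marks where the full barrier sits; it is not a complete concurrent implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ring state shared between a producer and a consumer. */
struct toy_ring {
	uint32_t read_index;
	uint32_t write_index;
	uint32_t size;
	uint32_t pending_send_sz;	/* bytes the producer is waiting to write */
};

/* Free space available to the producer right now. */
static uint32_t toy_bytes_to_write(const struct toy_ring *r)
{
	return r->write_index >= r->read_index
		? r->size - (r->write_index - r->read_index)
		: r->read_index - r->write_index;
}

/* Called by the consumer after it has advanced read_index. */
static bool toy_need_to_signal_on_read(const struct toy_ring *r)
{
	/* Order the earlier read_index update before sampling pending_send_sz. */
	atomic_thread_fence(memory_order_seq_cst);

	uint32_t pending = r->pending_send_sz;

	if (pending == 0)
		return false;		/* producer is not blocked */

	/* Decide only from the space available now, not from a stale sample. */
	return toy_bytes_to_write(r) >= pending;
}
```
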
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Tested-by: Dexuan Cui <decui@microsoft.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • Drivers: hv_vmbus: Fix signal to host condition · 26ac029f
      Christopher Oo authored
      [ Upstream commit a5cca686 ]
      
      Fixes a bug where previously hv_ringbuffer_read would pass in the old
      number of bytes available to read instead of the expected old read index
      when calculating when to signal to the host that the ringbuffer is empty.
      Since the previous write size is already saved, also changes the
      hv_need_to_signal_on_read to use the previously read value rather than
      recalculating it.
      Signed-off-by: Christopher Oo <t-chriso@microsoft.com>
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • Drivers: hv: ring_buffer.c: fix comment style · 3807acbd
      Vitaly Kuznetsov authored
      [ Upstream commit 822f18d4 ]
      
      Convert 6+-string comments repeating function names to normal kernel-style
      comments and fix a couple of other comment style issues. No textual or
      functional changes intended.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • atomic_open(): fix the handling of create_error · 9ddd8340
      Al Viro authored
      [ Upstream commit 10c64cea ]
      
      * if we have a hashed negative dentry and either CREAT|EXCL on
      r/o filesystem, or CREAT|TRUNC on r/o filesystem, or CREAT|EXCL
      with failing may_o_create(), we should fail with EROFS or the
      error may_o_create() has returned, but not ENOENT.  Which is what
      the current code ends up returning.
      
      * if we have CREAT|TRUNC hitting a regular file on a read-only
      filesystem, we can't fail with EROFS here.  At the very least,
      not until we'd done follow_managed() - we might have a writable
      file (or a device, for that matter) bound on top of that one.
      Moreover, the code downstream will see that O_TRUNC and attempt
      to grab the write access (*after* following possible mount), so
      if we really should fail with EROFS, it will happen.  No need
      to do that inside atomic_open().
      
      The real logic is much simpler than what the current code is
      trying to do - if we decided to go for simple lookup, ended
      up with a negative dentry *and* had create_error set, fail with
      create_error.  No matter whether we'd got that negative dentry
      from lookup_real() or had found it in dcache.
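
The simplified rule in the last paragraph can be written down as a tiny decision function. This is a hypothetical sketch: `toy_finish_lookup()` is not a kernel function, and a return value of 0 here just means "no error from this rule, carry on".

```c
#include <errno.h>
#include <stdbool.h>

/*
 * After a plain lookup: if the dentry is negative and an error was
 * recorded for the create attempt, report that error. Where the
 * negative dentry came from does not matter.
 */
static int toy_finish_lookup(bool dentry_is_negative, int create_error)
{
	if (dentry_is_negative && create_error)
		return create_error;	/* e.g. -EROFS rather than -ENOENT */

	return 0;
}
```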
      
      Cc: stable@vger.kernel.org # v3.6+
      Acked-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • EDAC: i7core, sb_edac: Don't return NOTIFY_BAD from mce_decoder callback · cb4a26d1
      Tony Luck authored
      [ Upstream commit c4fc1956 ]
      
      Both of these drivers can return NOTIFY_BAD, but this terminates
      processing other callbacks that were registered later on the chain.
      Since the driver did nothing to log the error it seems wrong to prevent
      other interested parties from seeing it. E.g. neither of them had even
      bothered to check the type of the error to see if it was a memory error
      before the return NOTIFY_BAD.
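
The effect of stopping a callback chain early can be shown with a toy chain. Everything here is hypothetical: the `toy_` names and the two-value verdict enum only loosely mirror NOTIFY_DONE and NOTIFY_BAD, and this is not the kernel notifier API.

```c
#include <stddef.h>

/* Hypothetical verdicts, modelled loosely on NOTIFY_DONE / NOTIFY_BAD. */
enum toy_verdict { TOY_DONE, TOY_BAD };

typedef enum toy_verdict (*toy_callback)(int event, int *seen);

/* A decoder that does not handle the event and lets the chain continue. */
static enum toy_verdict toy_decoder_done(int event, int *seen)
{
	(void)event;
	(void)seen;
	return TOY_DONE;
}

/* The pre-fix behaviour: not handling the event, yet stopping the chain. */
static enum toy_verdict toy_decoder_bad(int event, int *seen)
{
	(void)event;
	(void)seen;
	return TOY_BAD;
}

/* A later consumer that records that it saw the event. */
static enum toy_verdict toy_logger(int event, int *seen)
{
	(void)event;
	(*seen)++;
	return TOY_DONE;
}

/* Call each callback in registration order; stop at the first TOY_BAD. */
static void toy_call_chain(const toy_callback *chain, size_t n,
			   int event, int *seen)
{
	for (size_t i = 0; i < n; i++)
		if (chain[i](event, seen) == TOY_BAD)
			break;
}
```

With `toy_decoder_bad` registered first, `toy_logger` never runs; swapping in `toy_decoder_done` lets the later callback see the event.
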
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Acked-by: Aristeu Rozanski <aris@redhat.com>
      Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/72937355dd92318d2630979666063f8a2853495b.1461864507.git.tony.luck@intel.com
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • ALSA: usb-audio: Quirk for yet another Phoenix Audio devices (v2) · 74e15f5d
      Takashi Iwai authored
      [ Upstream commit 2d2c038a ]
      
      Phoenix Audio MT202pcs (1de7:0114) and MT202exe (1de7:0013) need the
      same workaround as TMX320 for avoiding the firmware bug.  It fixes the
      frequent error about the sample rate inquiries and the slow device
      probe as consequence.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=117321
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • mm: check __PG_HWPOISON separately from PAGE_FLAGS_CHECK_AT_* · 9de27bd7
      Naoya Horiguchi authored
      [ Upstream commit f4c18e6f ]
      
      The race condition addressed in commit add05cec ("mm: soft-offline:
      don't free target page in successful page migration") was not closed
      completely, because that can happen not only for soft-offline, but also
      for hard-offline.  Consider that a slab page is about to be freed into
      buddy pool, and then an uncorrected memory error hits the page just
      after entering __free_one_page(), then VM_BUG_ON_PAGE(page->flags &
      PAGE_FLAGS_CHECK_AT_PREP) is triggered, despite the fact that it's not
      necessary because the data on the affected page is not consumed.
      
      To solve it, this patch drops __PG_HWPOISON from page flag checks at
      allocation/free time.  I think it's justified because __PG_HWPOISON
      flags is defined to prevent the page from being reused, and setting it
      outside the page's alloc-free cycle is a designed behavior (not a bug.)
      
      For recent months, I was annoyed about BUG_ON when soft-offlined page
      remains on lru cache list for a while, which is avoided by calling
      put_page() instead of putback_lru_page() in page migration's success
      path.  This means that this patch reverts a major change from commit
      add05cec about the new refcounting rule of soft-offlined pages, so
      "reuse window" revives.  This will be closed by a subsequent patch.
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Dean Nelson <dnelson@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • mm: soft-offline: don't free target page in successful page migration · 6936c167
      Naoya Horiguchi authored
      [ Upstream commit add05cec ]
      
      Stress testing showed that soft offline events for a process iterating
      "mmap-pagefault-munmap" loop can trigger
      VM_BUG_ON(PAGE_FLAGS_CHECK_AT_PREP) in __free_one_page():
      
        Soft offlining page 0x70fe1 at 0x70100008d000
        Soft offlining page 0x705fb at 0x70300008d000
        page:ffffea0001c3f840 count:0 mapcount:0 mapping:          (null) index:0x2
        flags: 0x1fffff80800000(hwpoison)
        page dumped because: VM_BUG_ON_PAGE(page->flags & ((1 << 25) - 1))
        ------------[ cut here ]------------
        kernel BUG at /src/linux-dev/mm/page_alloc.c:585!
        invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
        Modules linked in: cfg80211 rfkill crc32c_intel microcode ppdev parport_pc pcspkr serio_raw virtio_balloon parport i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi floppy
        CPU: 3 PID: 1779 Comm: test_base_madv_ Not tainted 4.0.0-v4.0-150511-1451-00009-g82360a3730e6 #139
        RIP: free_pcppages_bulk+0x52a/0x6f0
        Call Trace:
          drain_pages_zone+0x3d/0x50
          drain_local_pages+0x1d/0x30
          on_each_cpu_mask+0x46/0x80
          drain_all_pages+0x14b/0x1e0
          soft_offline_page+0x432/0x6e0
          SyS_madvise+0x73c/0x780
          system_call_fastpath+0x12/0x17
        Code: ff 89 45 b4 48 8b 45 c0 48 83 b8 a8 00 00 00 00 0f 85 e3 fb ff ff 0f 1f 00 0f 0b 48 8b 7d 90 48 c7 c6 e8 95 a6 81 e8 e6 32 02 00 <0f> 0b 8b 45 cc 49 89 47 30 41 8b 47 18 83 f8 ff 0f 85 10 ff ff
        RIP  [<ffffffff811a806a>] free_pcppages_bulk+0x52a/0x6f0
         RSP <ffff88007a117d28>
        ---[ end trace 53926436e76d1f35 ]---
      
      When soft offline successfully migrates page, the source page is supposed
      to be freed.  But there is a race condition where a source page looks
      isolated (i.e.  the refcount is 0 and the PageHWPoison is set) but
      somewhat linked to pcplist.  Then another soft offline event calls
      drain_all_pages() and tries to free such hwpoisoned page, which is
      forbidden.
      
      This odd page state seems to happen due to the race between put_page() in
      putback_lru_page() and __pagevec_lru_add_fn().  But I don't want to play
      with tweaking drain code as done in commit 9ab3b598 "mm: hwpoison:
      drop lru_add_drain_all() in __soft_offline_page()", or to change page
      freeing code for this soft offline's purpose.
      
      Instead, let's think about the difference between hard offline and soft
      offline.  There is an interesting difference in how to isolate the in-use
      page between these, that is, hard offline marks PageHWPoison of the target
      page at first, and doesn't free it by keeping its refcount 1.  OTOH, soft
      offline tries to free the target page then marks PageHWPoison.  This
      difference might be the source of complexity and result in bugs like the
      above.  So making soft offline isolate with keeping refcount can be a
      solution for this problem.
      
      We can pass to page migration code the "reason" which shows the caller, so
      let's use this more to avoid calling putback_lru_page() when called from
      soft offline, which effectively does the isolation for soft offline.  With
      this change, target pages of soft offline are never reused without changing
      migratetype, so this patch also removes the related code.
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • mm: vmscan: reclaim highmem zone if buffer_heads is over limit · 978d733f
      Minchan Kim authored
      [ Upstream commit 7bf52fb8 ]
      
      We used to reclaim the highmem zone when buffer_heads was over the limit,
      but commit 6b4f7799 ("mm: vmscan: invoke slab shrinkers from
      shrink_zone()") changed the behavior so that the highmem zone is no longer
      reclaimed even when buffer_heads is over the limit.  This patch restores
      the logic.
      
      Fixes: 6b4f7799 ("mm: vmscan: invoke slab shrinkers from shrink_zone()")
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • mm/huge_memory: replace VM_NO_THP VM_BUG_ON with actual VMA check · 9684dc00
      Konstantin Khlebnikov authored
      [ Upstream commit 3486b85a ]
      
      Khugepaged detects own VMAs by checking vm_file and vm_ops but this way
      it cannot distinguish private /dev/zero mappings from other special
      mappings like /dev/hpet which has no vm_ops and populates PTEs in mmap.
      
      This fixes false-positive VM_BUG_ON and prevents installing THP where
      they are not expected.
      
      Link: http://lkml.kernel.org/r/CACT4Y+ZmuZMV5CjSFOeXviwQdABAgT7T+StKfTqan9YDtgEi5g@mail.gmail.com
      Fixes: 78f11a25 ("mm: thp: fix /dev/zero MAP_PRIVATE and vm_flags cleanups")
      Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • IB/security: Restrict use of the write() interface · 5d43a619
      Jason Gunthorpe authored
      [ Upstream commit e6bd18f5 ]
      
      The drivers/infiniband stack uses write() as a replacement for
      bi-directional ioctl().  This is not safe. There are ways to
      trigger write calls that result in the return structure that
      is normally written to user space being shunted off to user
      specified kernel memory instead.
      
      For the immediate repair, detect and deny suspicious accesses to
      the write API.
      
      For long term, update the user space libraries and the kernel API
      to something that doesn't present the same security vulnerabilities
      (likely a structured ioctl() interface).
      
      The impacted uAPI interfaces are generally only available if
      hardware from drivers/infiniband is installed in the system.
      Reported-by: Jann Horn <jann@thejh.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
      [ Expanded check to all known write() entry points ]
      Cc: stable@vger.kernel.org
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • ARM: cpuidle: Pass on arm_cpuidle_suspend()'s return value · 5e17ef78
      James Morse authored
      [ Upstream commit 625fe4f8 ]
      
      arm_cpuidle_suspend() may return -EOPNOTSUPP, or any value returned
      by the cpu_ops/cpuidle_ops suspend call. arm_enter_idle_state() doesn't
      update 'ret' with this value, meaning we always signal success to
      cpuidle_enter_state(), causing it to update the usage counters as if we
      succeeded.
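
The return-value handling can be sketched with a stand-in for the suspend hook. The `toy_` names are hypothetical, and the convention assumed here (return the state index on success, -1 on failure) is an illustration of the idea rather than a copy of the driver.

```c
/* Hypothetical stand-ins for a platform suspend hook. */
static int toy_suspend_ok(int idx)
{
	(void)idx;
	return 0;
}

static int toy_suspend_unsupported(int idx)
{
	(void)idx;
	return -95;	/* value of -EOPNOTSUPP on Linux */
}

/*
 * Enter idle state `idx`. The result of the suspend call is kept in
 * `ret` so that a failure is reported to the caller instead of being
 * silently turned into success.
 */
static int toy_enter_idle_state(int idx, int (*suspend)(int))
{
	int ret = suspend(idx);

	return ret ? -1 : idx;
}
```

Dropping the assignment to `ret` is exactly the kind of slip that makes every call look successful to the caller.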
      
      Fixes: 191de17a ("ARM64: cpuidle: Replace cpu_suspend by the common ARM/ARM64 function")
      Signed-off-by: James Morse <james.morse@arm.com>
      Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: 4.1+ <stable@vger.kernel.org> # 4.1+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • ARM: SoCFPGA: Fix secondary CPU startup in thumb2 kernel · 5bde3f2a
      Sascha Hauer authored
      [ Upstream commit 5616f367 ]
      
      The secondary CPU starts up in ARM mode. When the kernel is compiled in
      thumb2 mode we have to explicitly compile the secondary startup
      trampoline in ARM mode, otherwise the CPU will go to Nirvana.
      Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
      Reported-by: Steffen Trumtrar <s.trumtrar@pengutronix.de>
      Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Dinh Nguyen <dinguyen@opensource.altera.com>
      Signed-off-by: Kevin Hilman <khilman@baylibre.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      5bde3f2a
    • Vitaly Prosyak's avatar
      drm/radeon: fix vertical bars appear on monitor (v2) · 3b54e5f0
      Vitaly Prosyak authored
      [ Upstream commit 5d5b7803 ]
      
      When crtc/timing is disabled on boot, the dig block
      should be stopped in order to ignore timing from the crtc,
      and the steering fifo should be reset; otherwise we get
      display corruption or a hang in DP SST mode.
      
      v2: agd: fix coding style
      Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      3b54e5f0
    • Ville Syrjälä's avatar
      drm/i915: Make RPS EI/thresholds multiple of 25 on SNB-BDW · 344a144c
      Ville Syrjälä authored
      [ Upstream commit 4ea39590 ]
      
      Somehow my SNB GT1 (Dell XPS 8300) gets very unhappy around
      GPU hangs if the RPS EI/thresholds aren't suitably aligned.
      It seems like scheduling/timer interrupts stop working somehow
      and things get stuck eg. in usleep_range().
      
      I bisected the problem down to
      commit 8a586437 ("drm/i915/skl: Restructured the gen6_set_rps_thresholds function")
      I observed that before, all the values were at least multiples of 25,
      but afterwards they are not. And rounding things up to the next multiple
      of 25 does seem to help, so let's do that. I also tried roundup(..., 5)
      but that wasn't sufficient. Also I have no idea if we might need this sort of
      thing on gen9+ as well.
      
      These are the original EI/thresholds:
       LOW_POWER
        GEN6_RP_UP_EI          12500
        GEN6_RP_UP_THRESHOLD   11800
        GEN6_RP_DOWN_EI        25000
        GEN6_RP_DOWN_THRESHOLD 21250
       BETWEEN
        GEN6_RP_UP_EI          10250
        GEN6_RP_UP_THRESHOLD    9225
        GEN6_RP_DOWN_EI        25000
        GEN6_RP_DOWN_THRESHOLD 18750
       HIGH_POWER
        GEN6_RP_UP_EI           8000
        GEN6_RP_UP_THRESHOLD    6800
        GEN6_RP_DOWN_EI        25000
        GEN6_RP_DOWN_THRESHOLD 15000
      
      These are after 8a586437:
       LOW_POWER
        GEN6_RP_UP_EI          12500
        GEN6_RP_UP_THRESHOLD   11875
        GEN6_RP_DOWN_EI        25000
        GEN6_RP_DOWN_THRESHOLD 21250
       BETWEEN
        GEN6_RP_UP_EI          10156
        GEN6_RP_UP_THRESHOLD    9140
        GEN6_RP_DOWN_EI        25000
        GEN6_RP_DOWN_THRESHOLD 18750
       HIGH_POWER
        GEN6_RP_UP_EI           7812
        GEN6_RP_UP_THRESHOLD    6640
        GEN6_RP_DOWN_EI        25000
        GEN6_RP_DOWN_THRESHOLD 15000
      
      And these are what we have after this patch:
       LOW_POWER
        GEN6_RP_UP_EI          12500
        GEN6_RP_UP_THRESHOLD   11875
        GEN6_RP_DOWN_EI        25000
        GEN6_RP_DOWN_THRESHOLD 21250
       BETWEEN
        GEN6_RP_UP_EI          10175
        GEN6_RP_UP_THRESHOLD    9150
        GEN6_RP_DOWN_EI        25000
        GEN6_RP_DOWN_THRESHOLD 18750
       HIGH_POWER
        GEN6_RP_UP_EI           7825
        GEN6_RP_UP_THRESHOLD    6650
        GEN6_RP_DOWN_EI        25000
        GEN6_RP_DOWN_THRESHOLD 15000
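
The three tables can be checked with a few lines of arithmetic. The sketch below is Python rather than driver code, and the `roundup` helper is a stand-in written for this check. It takes the four values that differ between the second and third tables and confirms that rounding up to a multiple of 25 produces the post-patch numbers. Where exactly the driver applies the rounding is not shown here.

```python
def roundup(x, multiple):
    """Round a non-negative integer up to the next multiple."""
    return ((x + multiple - 1) // multiple) * multiple


# Values copied from the tables above: (after 8a586437, after this patch).
changed = {
    "BETWEEN GEN6_RP_UP_EI": (10156, 10175),
    "BETWEEN GEN6_RP_UP_THRESHOLD": (9140, 9150),
    "HIGH_POWER GEN6_RP_UP_EI": (7812, 7825),
    "HIGH_POWER GEN6_RP_UP_THRESHOLD": (6640, 6650),
}

for name, (before, after) in changed.items():
    assert roundup(before, 25) == after, name
    print(name, before, "->", roundup(before, 25))

# Rounding to a multiple of 5 still leaves a value that is not a multiple of 25.
print(roundup(10156, 5), roundup(10156, 5) % 25)
```

The remaining entries (12500, 11875, 25000, 21250, 18750, 15000) are already multiples of 25, so rounding leaves them unchanged.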
      
      Cc: stable@vger.kernel.org
      Cc: Akash Goel <akash.goel@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Testcase: igt/kms_pipe_crc_basic/hang-read-crc-pipe-B
      Fixes: 8a586437 ("drm/i915/skl: Restructured the gen6_set_rps_thresholds function")
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1461159836-9108-1-git-send-email-ville.syrjala@linux.intel.com
      Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
      (cherry picked from commit 8a292d01)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      344a144c
    • Imre Deak's avatar
      drm/i915/ddi: Fix eDP VDD handling during booting and suspend/resume · 39161f8a
      Imre Deak authored
      [ Upstream commit 5eaa60c7 ]
      
      The driver's VDD on/off logic assumes that whenever the VDD is on we
      also hold an AUX power domain reference. Since BIOS can leave the VDD on
      during booting and resuming and on DDI platforms we won't take a
      corresponding power reference, the above assumption won't hold on those
      platforms and an eventual delayed VDD off work will do an extraneous AUX
      power domain put resulting in a refcount underflow. Fix this the same
      way we did this for non-DDI DP encoders:
      
      commit 6d93c0c4 ("drm/i915: fix VDD state tracking after system
      resume")
      
      At the same time call the DP encoder suspend handler the same way as the
      non-DDI DP encoders do to flush any pending VDD off work. Leaving the
      work running may cause a HW access where we don't expect this (at a point
      where power domains are suspended already).
      
      While at it remove an unnecessary function call indirection.
      
      This fixed for me AUX refcount underflow problems on BXT during
      suspend/resume.
      
      CC: Ville Syrjälä <ville.syrjala@linux.intel.com>
      CC: stable@vger.kernel.org
      Signed-off-by: Imre Deak <imre.deak@intel.com>
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1460963062-13211-4-git-send-email-imre.deak@intel.com
      (cherry picked from commit bf93ba67)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      39161f8a
    • Michael Neuling's avatar
      cxl: Keep IRQ mappings on context teardown · 86128809
      Michael Neuling authored
      [ Upstream commit d6776bba ]
      
      Keep IRQ mappings on context teardown.  This won't leak IRQs as if we
      allocate the mapping again, the generic code will give the same
      mapping used last time.
      
      Doing this works around a race in the generic code. Masking the
      interrupt introduces a race which can crash the kernel or result in an
      IRQ that is never EOIed. The loss of the EOI results in all subsequent
      mappings to the same HW IRQ never receiving an interrupt.
      
      We've seen this race with cxl test cases which are doing heavy context
      startup and teardown at the same time as heavy interrupt load.
      
      A fix to the generic code is being investigated also.
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Cc: stable@vger.kernel.org # 3.8
      Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Acked-by: Ian Munsie <imunsie@au1.ibm.com>
      Tested-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      86128809
    • Lyude's avatar
      drm/dp/mst: Restore primary hub guid on resume · b61a5d54
      Lyude authored
      [ Upstream commit 9dc0487d ]
      
      Some hubs are forgetful, and end up forgetting whatever GUID we set
      previously after we do a suspend/resume cycle. This can lead to
      hotplugging breaking (along with probably other things) since the hub
      will start sending connection notifications with the wrong GUID. As
      such, we need to check on resume whether or not the GUID the hub is
      giving us is valid.
      Signed-off-by: Lyude <cpaul@redhat.com>
      Reviewed-by: Harry Wentland <harry.wentland@amd.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: http://patchwork.freedesktop.org/patch/msgid/1460580618-7421-1-git-send-email-cpaul@redhat.com
      Cc: stable@vger.kernel.org
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      b61a5d54
    • cpaul@redhat.com's avatar
      drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1() · 85489830
      cpaul@redhat.com authored
      [ Upstream commit 263efde3 ]
      
      We can thank KASAN for finding this, otherwise I probably would have spent
      hours on it. This fixes a somewhat harder to trigger kernel panic, occurring
      while enabling MST where the port we were currently updating the payload on
      would have all of its refs dropped before we finished what we were doing:
      
      ==================================================================
      BUG: KASAN: use-after-free in drm_dp_update_payload_part1+0xb3f/0xdb0 [drm_kms_helper] at addr ffff8800d29de018
      Read of size 4 by task Xorg/973
      =============================================================================
      BUG kmalloc-2048 (Tainted: G    B   W      ): kasan: bad access detected
      -----------------------------------------------------------------------------
      
      INFO: Allocated in drm_dp_add_port+0x1aa/0x1ed0 [drm_kms_helper] age=16477 cpu=0 pid=2175
      	___slab_alloc+0x472/0x490
      	__slab_alloc+0x20/0x40
      	kmem_cache_alloc_trace+0x151/0x190
      	drm_dp_add_port+0x1aa/0x1ed0 [drm_kms_helper]
      	drm_dp_send_link_address+0x526/0x960 [drm_kms_helper]
      	drm_dp_check_and_send_link_address+0x1ac/0x210 [drm_kms_helper]
      	drm_dp_mst_link_probe_work+0x77/0xd0 [drm_kms_helper]
      	process_one_work+0x562/0x1350
      	worker_thread+0xd9/0x1390
      	kthread+0x1c5/0x260
      	ret_from_fork+0x22/0x40
      INFO: Freed in drm_dp_free_mst_port+0x50/0x60 [drm_kms_helper] age=7521 cpu=0 pid=2175
      	__slab_free+0x17f/0x2d0
      	kfree+0x169/0x180
      	drm_dp_free_mst_port+0x50/0x60 [drm_kms_helper]
      	drm_dp_destroy_connector_work+0x2b8/0x490 [drm_kms_helper]
      	process_one_work+0x562/0x1350
      	worker_thread+0xd9/0x1390
      	kthread+0x1c5/0x260
      	ret_from_fork+0x22/0x40
      
      which on this T460s, would eventually lead to kernel panics in somewhat
      random places later in intel_mst_enable_dp() if we got lucky enough.
      Signed-off-by: Lyude <cpaul@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      85489830
    • Roman Pen's avatar
      workqueue: fix ghost PENDING flag while doing MQ IO · 14794cfb
      Roman Pen authored
      [ Upstream commit 346c09f8 ]
      
      The bug in a workqueue leads to a stalled IO request in MQ ctx->rq_list
      with the following backtrace:
      
      [  601.347452] INFO: task kworker/u129:5:1636 blocked for more than 120 seconds.
      [  601.347574]       Tainted: G           O    4.4.5-1-storage+ #6
      [  601.347651] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  601.348142] kworker/u129:5  D ffff880803077988     0  1636      2 0x00000000
      [  601.348519] Workqueue: ibnbd_server_fileio_wq ibnbd_dev_file_submit_io_worker [ibnbd_server]
      [  601.348999]  ffff880803077988 ffff88080466b900 ffff8808033f9c80 ffff880803078000
      [  601.349662]  ffff880807c95000 7fffffffffffffff ffffffff815b0920 ffff880803077ad0
      [  601.350333]  ffff8808030779a0 ffffffff815b01d5 0000000000000000 ffff880803077a38
      [  601.350965] Call Trace:
      [  601.351203]  [<ffffffff815b0920>] ? bit_wait+0x60/0x60
      [  601.351444]  [<ffffffff815b01d5>] schedule+0x35/0x80
      [  601.351709]  [<ffffffff815b2dd2>] schedule_timeout+0x192/0x230
      [  601.351958]  [<ffffffff812d43f7>] ? blk_flush_plug_list+0xc7/0x220
      [  601.352208]  [<ffffffff810bd737>] ? ktime_get+0x37/0xa0
      [  601.352446]  [<ffffffff815b0920>] ? bit_wait+0x60/0x60
      [  601.352688]  [<ffffffff815af784>] io_schedule_timeout+0xa4/0x110
      [  601.352951]  [<ffffffff815b3a4e>] ? _raw_spin_unlock_irqrestore+0xe/0x10
      [  601.353196]  [<ffffffff815b093b>] bit_wait_io+0x1b/0x70
      [  601.353440]  [<ffffffff815b056d>] __wait_on_bit+0x5d/0x90
      [  601.353689]  [<ffffffff81127bd0>] wait_on_page_bit+0xc0/0xd0
      [  601.353958]  [<ffffffff81096db0>] ? autoremove_wake_function+0x40/0x40
      [  601.354200]  [<ffffffff81127cc4>] __filemap_fdatawait_range+0xe4/0x140
      [  601.354441]  [<ffffffff81127d34>] filemap_fdatawait_range+0x14/0x30
      [  601.354688]  [<ffffffff81129a9f>] filemap_write_and_wait_range+0x3f/0x70
      [  601.354932]  [<ffffffff811ced3b>] blkdev_fsync+0x1b/0x50
      [  601.355193]  [<ffffffff811c82d9>] vfs_fsync_range+0x49/0xa0
      [  601.355432]  [<ffffffff811cf45a>] blkdev_write_iter+0xca/0x100
      [  601.355679]  [<ffffffff81197b1a>] __vfs_write+0xaa/0xe0
      [  601.355925]  [<ffffffff81198379>] vfs_write+0xa9/0x1a0
      [  601.356164]  [<ffffffff811c59d8>] kernel_write+0x38/0x50
      
      The underlying device is a null_blk, with default parameters:
      
        queue_mode    = MQ
        submit_queues = 1
      
      Verification that nullb0 has something inflight:
      
      root@pserver8:~# cat /sys/block/nullb0/inflight
             0        1
      root@pserver8:~# find /sys/block/nullb0/mq/0/cpu* -name rq_list -print -exec cat {} \;
      ...
      /sys/block/nullb0/mq/0/cpu2/rq_list
      CTX pending:
              ffff8838038e2400
      ...
      
      During debugging it became clear that the stalled request is always
      inserted in the rq_list from the following path:
      
         save_stack_trace_tsk + 34
         blk_mq_insert_requests + 231
         blk_mq_flush_plug_list + 281
         blk_flush_plug_list + 199
         wait_on_page_bit + 192
         __filemap_fdatawait_range + 228
         filemap_fdatawait_range + 20
         filemap_write_and_wait_range + 63
         blkdev_fsync + 27
         vfs_fsync_range + 73
         blkdev_write_iter + 202
         __vfs_write + 170
         vfs_write + 169
         kernel_write + 56
      
      So blk_flush_plug_list() was called with from_schedule == true.
      
      If from_schedule is true, that means that finally blk_mq_insert_requests()
      offloads execution of __blk_mq_run_hw_queue() and uses kblockd workqueue,
      i.e. it calls kblockd_schedule_delayed_work_on().
      
      That means that we race with another CPU, which is about to execute
      the __blk_mq_run_hw_queue() work.
      
      Further debugging shows the following traces from different CPUs:
      
        CPU#0                                  CPU#1
        ----------------------------------     -------------------------------
        request A inserted
        STORE hctx->ctx_map[0] bit marked
        kblockd_schedule...() returns 1
        <schedule to kblockd workqueue>
                                               request B inserted
                                               STORE hctx->ctx_map[1] bit marked
                                               kblockd_schedule...() returns 0
        *** WORK PENDING bit is cleared ***
        flush_busy_ctxs() is executed, but
        bit 1, set by CPU#1, is not observed
      
      As a result, request B stays pending forever.
      
      This behaviour can be explained by speculative LOAD of hctx->ctx_map on
      CPU#0, which is reordered with clear of PENDING bit and executed _before_
      actual STORE of bit 1 on CPU#1.
      
      The proper fix is an explicit full barrier <mfence>, which guarantees
      that clear of PENDING bit is to be executed before all possible
      speculative LOADS or STORES inside actual work function.
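
The interleaving in the trace can be replayed with a small deterministic model. This is a Python sketch under stated assumptions, not kernel code: `HwQueue`, `insert_request`, and `simulate` are hypothetical names, and a memory barrier is represented simply by forcing the "clear PENDING" step to run before the "load ctx_map" step.

```python
class HwQueue:
    """Toy stand-in for a hardware queue fed by two per-CPU software queues."""

    def __init__(self):
        self.ctx_map = [False, False]  # "this CPU has queued requests" bits
        self.pending = False           # models the work item's PENDING bit


def insert_request(q, cpu):
    """Mark the CPU's bit, then try to queue the work (True if newly queued)."""
    q.ctx_map[cpu] = True
    if q.pending:
        return False   # work already pending: rely on the queued run
    q.pending = True
    return True


def simulate(order, insert_at):
    """Run the work once; CPU#1 inserts request B just before step insert_at.

    order is the sequence in which CPU#0 performs "clear" (clear PENDING)
    and "load" (read ctx_map); insert_at == 2 means after both steps.
    Returns True if a request is left behind with no work queued for it.
    """
    q = HwQueue()
    insert_request(q, cpu=0)            # request A queues the work
    snapshot = []
    for step in range(3):
        if step == insert_at:
            insert_request(q, cpu=1)    # request B arrives on CPU#1
        if step < 2:
            if order[step] == "clear":
                q.pending = False
            else:
                snapshot = list(q.ctx_map)
    for cpu, busy in enumerate(snapshot):   # flush only what was observed
        if busy:
            q.ctx_map[cpu] = False
    return any(q.ctx_map) and not q.pending


# Load hoisted above the clear, request B lands in between: B is stranded.
print(simulate(("load", "clear"), insert_at=1))
# Clear ordered before the load: no insertion point strands a request.
print([simulate(("clear", "load"), insert_at=k) for k in range(3)])
```

In this model the only losing schedule is the one from the trace: the stale load misses bit 1 while CPU#1 still sees PENDING set and therefore does not queue the work again.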
      Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
      Cc: Gioh Kim <gi-oh.kim@profitbricks.com>
      Cc: Michael Wang <yun.wang@profitbricks.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-block@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: stable@vger.kernel.org
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      14794cfb
    • Conrad Kostecki's avatar
      ALSA: hda - Add dock support for ThinkPad X260 · 2519c9fc
      Conrad Kostecki authored
      [ Upstream commit 037e1197 ]
      
      Fixes audio output on a ThinkPad X260, when using Lenovo CES 2013
      docking station series (basic, pro, ultra).
      Signed-off-by: Conrad Kostecki <ck+linuxkernel@bl4ckb0x.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      2519c9fc
    • Shaohua Li's avatar
      MD: make bio mergeable · cdfac06e
      Shaohua Li authored
      [ Upstream commit 9c573de3 ]
      
      blk_queue_split marks a bio unmergeable, which makes sense for a normal bio.
      But once the bio is dispatched to the underlying disk, the blk_queue_split
      checks no longer apply, so the bio may become mergeable again.
      
      In the reported bug, this causes a sharp drop in trim performance
      against raid0:
      https://bugzilla.kernel.org/show_bug.cgi?id=117051
      Reported-and-tested-by: Park Ju Hyung <qkrwngud825@gmail.com>
      Fixes: 6ac45aeb(block: avoid to merge splitted bio)
      Cc: stable@vger.kernel.org (v4.3+)
      Cc: Ming Lei <ming.lei@canonical.com>
      Cc: Neil Brown <neilb@suse.de>
      Reviewed-by: Jens Axboe <axboe@fb.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      cdfac06e
    • Hans Verkuil's avatar
      [media] v4l2-dv-timings.h: fix polarity for 4k formats · 4f194898
      Hans Verkuil authored
      [ Upstream commit 3020ca71 ]
      
      The VSync polarity was negative instead of positive for the 4k CEA formats.
      I probably copy-and-pasted these from the DMT 4k format, which does have a
      negative VSync polarity.
      Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
      Reported-by: Martin Bugge <marbugge@cisco.com>
      Cc: <stable@vger.kernel.org>      # for v4.1 and up
      Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      4f194898
    • Jasem Mutlaq's avatar
      USB: serial: cp210x: add Straizona Focusers device ids · b4782b68
      Jasem Mutlaq authored
      [ Upstream commit 613ac23a ]
      
      Adding VID:PID for Straizona Focusers to cp210x driver.
      Signed-off-by: Jasem Mutlaq <mutlaqja@ikarustech.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      b4782b68
    • Mike Manning's avatar
      USB: serial: cp210x: add ID for Link ECU · 35f45c8a
      Mike Manning authored
      [ Upstream commit 1d377f4d ]
      
      The Link ECU is an aftermarket ECU computer for vehicles that provides
      full tuning abilities as well as datalogging and displaying capabilities
      via the USB to Serial adapter built into the device.
      Signed-off-by: Mike Manning <michael@bsch.com.au>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      35f45c8a
    • Laszlo Ersek's avatar
      efi: Fix out-of-bounds read in variable_matches() · 83619523
      Laszlo Ersek authored
      [ Upstream commit 630ba0cc ]
      
      The variable_matches() function can currently read "var_name[len]", for
      example when:
      
       - var_name[0] == 'a',
       - len == 1
       - match_name points to the NUL-terminated string "ab".
      
      This function is supposed to accept "var_name" inputs that are not
      NUL-terminated (hence the "len" parameter). Document the function, and
      access "var_name[*match]" only if "*match" is smaller than "len".
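
The bounded comparison can be sketched as follows. This is a Python model written for illustration, not the kernel function: the tuple return value and the treatment of `*` in the match string as a wildcard are assumptions that the commit message does not spell out. Python raises IndexError on an out-of-range index, so the example from the message doubles as a bounds check.

```python
def variable_matches(var_name, length, match_name):
    """Compare the first `length` bytes of var_name against match_name.

    var_name need not be NUL-terminated; match_name is treated as a
    NUL-terminated string. Returns (matched, index_reached).
    """
    match = 0
    while True:
        # The end of the Python bytes object stands in for the NUL terminator.
        c = match_name[match] if match < len(match_name) else 0
        if c == ord("*"):
            return True, match             # assumed wildcard: rest matches
        if c == 0:
            return match == length, match  # match_name ended; did var_name?
        # Only touch var_name[match] while match is still below length.
        if match < length and c == var_name[match]:
            match += 1
            continue
        return False, match


# The case from the commit message: var_name is the single byte 'a', len == 1.
print(variable_matches(b"a", 1, b"ab"))
print(variable_matches(b"ab", 2, b"ab"))
```

With the `match < length` guard the first call stops at index 1 without reading `var_name[1]`. Dropping the guard would make the same call raise IndexError in this model.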
      Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Laszlo Ersek <lersek@redhat.com>
      Cc: Peter Jones <pjones@redhat.com>
      Cc: Matthew Garrett <mjg59@coreos.com>
      Cc: Jason Andryuk <jandryuk@gmail.com>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: <stable@vger.kernel.org> # v3.10+
      Link: http://thread.gmane.org/gmane.comp.freedesktop.xorg.drivers.intel/86906
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      83619523
    • Krzysztof Kozlowski's avatar
      iio: ak8975: Fix NULL pointer exception on early interrupt · e7e16bb6
      Krzysztof Kozlowski authored
      [ Upstream commit 07d2390e ]
      
      In certain probe conditions the interrupt came right after registering
      the handler, causing a NULL pointer exception because of an uninitialized
      waitqueue:
      
      $ udevadm trigger
      i2c-gpio i2c-gpio-1: using pins 143 (SDA) and 144 (SCL)
      i2c-gpio i2c-gpio-3: using pins 53 (SDA) and 52 (SCL)
      Unable to handle kernel NULL pointer dereference at virtual address 00000000
      pgd = e8b38000
      [00000000] *pgd=00000000
      Internal error: Oops: 5 [#1] SMP ARM
      Modules linked in: snd_soc_i2s(+) i2c_gpio(+) snd_soc_idma snd_soc_s3c_dma snd_soc_core snd_pcm_dmaengine snd_pcm snd_timer snd soundcore ac97_bus spi_s3c64xx pwm_samsung dwc2 exynos_adc phy_exynos_usb2 exynosdrm exynos_rng rng_core rtc_s3c
      CPU: 0 PID: 717 Comm: data-provider-m Not tainted 4.6.0-rc1-next-20160401-00011-g1b8d87473b9e-dirty #101
      Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
      (...)
      (__wake_up_common) from [<c0379624>] (__wake_up+0x38/0x4c)
      (__wake_up) from [<c0a41d30>] (ak8975_irq_handler+0x28/0x30)
      (ak8975_irq_handler) from [<c0386720>] (handle_irq_event_percpu+0x88/0x140)
      (handle_irq_event_percpu) from [<c038681c>] (handle_irq_event+0x44/0x68)
      (handle_irq_event) from [<c0389c40>] (handle_edge_irq+0xf0/0x19c)
      (handle_edge_irq) from [<c0385e04>] (generic_handle_irq+0x24/0x34)
      (generic_handle_irq) from [<c05ee360>] (exynos_eint_gpio_irq+0x50/0x68)
      (exynos_eint_gpio_irq) from [<c0386720>] (handle_irq_event_percpu+0x88/0x140)
      (handle_irq_event_percpu) from [<c038681c>] (handle_irq_event+0x44/0x68)
      (handle_irq_event) from [<c0389a70>] (handle_fasteoi_irq+0xb4/0x194)
      (handle_fasteoi_irq) from [<c0385e04>] (generic_handle_irq+0x24/0x34)
      (generic_handle_irq) from [<c03860b4>] (__handle_domain_irq+0x5c/0xb4)
      (__handle_domain_irq) from [<c0301774>] (gic_handle_irq+0x54/0x94)
      (gic_handle_irq) from [<c030c910>] (__irq_usr+0x50/0x80)
      
      The bug was reproduced on exynos4412-trats2 (with a max77693 device also
      using i2c-gpio) after building max77693 as a module.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 94a6d5cf ("iio:ak8975 Implement data ready interrupt handling")
      Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Tested-by: Gregor Boirie <gregor.boirie@parrot.com>
      Signed-off-by: Jonathan Cameron <jic23@kernel.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      e7e16bb6
    • Jack Pham's avatar
      regmap: spmi: Fix regmap_spmi_ext_read in multi-byte case · 24a50739
      Jack Pham authored
      [ Upstream commit dec8e8f6 ]
      
      Specifically for the case of reads that use the Extended Register
      Read Long command, a multi-byte read operation is broken up into
      8-byte chunks.  However the call to spmi_ext_register_readl() is
      incorrectly passing 'val_size', which if greater than 8 will
      always fail.  The argument should instead be 'len'.
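
The chunking described above can be modelled in a few lines. This is a Python sketch, not the regmap code: `fake_ext_readl` is a hypothetical stand-in for a single bus transaction that rejects anything longer than 8 bytes, and the `use_chunk_len` switch exists only to contrast the two arguments.

```python
MAX_CHUNK = 8  # per the commit message, long reads are split into 8-byte chunks


def fake_ext_readl(registers, addr, length):
    """Hypothetical single bus transaction; over-long requests are rejected."""
    if not 1 <= length <= MAX_CHUNK:
        raise ValueError("transaction length out of range")
    return registers[addr:addr + length]


def ext_read(registers, addr, val_size, use_chunk_len=True):
    """Read val_size bytes starting at addr, one chunk per transaction."""
    out = bytearray()
    while val_size:
        chunk = min(val_size, MAX_CHUNK)
        # Passing the chunk length is the fixed behaviour; passing the
        # full remaining size models the bug.
        requested = chunk if use_chunk_len else val_size
        out += fake_ext_readl(registers, addr, requested)
        addr += chunk
        val_size -= chunk
    return bytes(out)


registers = bytes(range(32))
print(ext_read(registers, 4, 20).hex())
try:
    ext_read(registers, 4, 20, use_chunk_len=False)
except ValueError as err:
    print("buggy variant failed:", err)
```

Reads of 8 bytes or fewer behave the same in both variants, which is consistent with the message's remark that only sizes greater than 8 fail.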
      
      Fixes: c9afbb05 ("regmap: spmi: support base and extended register spaces")
      Signed-off-by: Jack Pham <jackp@codeaurora.org>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      24a50739
    • Srinivas Kandagatla's avatar
      ata: ahci-platform: Add ports-implemented DT bindings. · eab51598
      Srinivas Kandagatla authored
      [ Upstream commit 17dcc37e ]
      
      On some SoCs the PORTS_IMPL register value is never programmed by the
      firmware and is left at zero, which means that no SATA ports are
      available to software. The AHCI driver used to cope with this by
      fabricating the port_map if the PORTS_IMPL register read as zero,
      but a recent patch broke this workaround because a zero value is valid
      for NVMe disks.
      
      This patch adds ports-implemented DT bindings as a workaround for this
      issue, so that the DT can override the PORTS_IMPL register in cases where
      the firmware did not program it already.
      
      Fixes: 566d1827 ("libata: disable forced PORTS_IMPL for >= AHCI 1.3")
      Cc: stable@vger.kernel.org # v4.5+
      Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Andy Gross <andy.gross@linaro.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      eab51598
    • Srinivas Kandagatla's avatar
      libahci: save port map for forced port map · a5d2af4c
      Srinivas Kandagatla authored
      [ Upstream commit 2fd0f46c ]
      
      In use cases where force_port_map is used, saved_port_map is never set,
      so the PORTS_IMPL register is not programmed as part of the initial
      config. This patch fixes this by setting it to port_map even when
      force_port_map is used, making it more in line with other parts of
      the code.
      
      Fixes: 566d1827 ("libata: disable forced PORTS_IMPL for >= AHCI 1.3")
      Cc: stable@vger.kernel.org # v4.5+
      Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Andy Gross <andy.gross@linaro.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      a5d2af4c