1. 04 Mar, 2016 40 commits
    • Eric Dumazet's avatar
      af_unix: fix struct pid memory leak · 8d988538
      Eric Dumazet authored
      [ Upstream commit fa0dc04d ]
      
      Dmitry reported a struct pid leak detected by a syzkaller program.
      
      Bug happens in unix_stream_recvmsg() when we break the loop when a
      signal is pending, without properly releasing scm.
      
      Fixes: b3ca9b02 ("net: fix multithreaded signal handling in unix recv routines")
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Rainer Weikusat <rweikusat@mobileactivedefense.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      8d988538
    • Eric Dumazet's avatar
      tcp: fix NULL deref in tcp_v4_send_ack() · 93493f5a
      Eric Dumazet authored
      [ Upstream commit e62a123b ]
      
      Neal reported crashes with this stack trace :
      
       RIP: 0010:[<ffffffff8c57231b>] tcp_v4_send_ack+0x41/0x20f
      ...
       CR2: 0000000000000018 CR3: 000000044005c000 CR4: 00000000001427e0
      ...
        [<ffffffff8c57258e>] tcp_v4_reqsk_send_ack+0xa5/0xb4
        [<ffffffff8c1a7caa>] tcp_check_req+0x2ea/0x3e0
        [<ffffffff8c19e420>] tcp_rcv_state_process+0x850/0x2500
        [<ffffffff8c1a6d21>] tcp_v4_do_rcv+0x141/0x330
        [<ffffffff8c56cdb2>] sk_backlog_rcv+0x21/0x30
        [<ffffffff8c098bbd>] tcp_recvmsg+0x75d/0xf90
        [<ffffffff8c0a8700>] inet_recvmsg+0x80/0xa0
        [<ffffffff8c17623e>] sock_aio_read+0xee/0x110
        [<ffffffff8c066fcf>] do_sync_read+0x6f/0xa0
        [<ffffffff8c0673a1>] SyS_read+0x1e1/0x290
        [<ffffffff8c5ca262>] system_call_fastpath+0x16/0x1b
      
      The problem here is the skb we provide to tcp_v4_send_ack() had to
      be parked in the backlog of a new TCP fastopen child because this child
      was owned by the user at the time an out of window packet arrived.
      
      Before queuing a packet, TCP has to set skb->dev to NULL as the device
      could disappear before packet is removed from the queue.
      
      Fix this issue by using the net pointer provided by the socket (being a
      timewait or a request socket).
      
      IPv6 is immune to the bug : tcp_v6_send_response() already gets the net
      pointer from the socket if provided.
      
      Fixes: 168a8f58 ("tcp: TCP Fast Open Server - main code path")
      Reported-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Jerry Chu <hkchu@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      93493f5a
    • Manfred Rudigier's avatar
      net: dp83640: Fix tx timestamp overflow handling. · 0c0c5254
      Manfred Rudigier authored
      [ Upstream commit 81e8f2e9 ]
      
      PHY status frames are not reliable, the PHY may not be able to send them
      during heavy receive traffic. This overflow condition is signaled by the
      PHY in the next status frame, but the driver did not make use of it.
      Instead it always reported wrong tx timestamps to user space after an
      overflow happened because it assigned newly received tx timestamps to old
      packets in the queue.
      
      This commit fixes this issue by clearing the tx timestamp queue every time
      an overflow happens, so that no timestamps are delivered for overflow
      packets. This way time stamping will continue correctly after an overflow.
      Signed-off-by: default avatarManfred Rudigier <manfred.rudigier@omicron.at>
      Acked-by: default avatarRichard Cochran <richardcochran@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      0c0c5254
    • Ursula Braun's avatar
    • Gavin Shan's avatar
      powerpc/eeh: Fix build error caused by pci_dn · 35cc8146
      Gavin Shan authored
      eeh.h could be included when we have following condition. Then we
      run into build error as below: (CONFIG_PPC64 && !CONFIG_EEH) ||
      (!CONFIG_PPC64 && !CONFIG_EEH)
      
      In file included from arch/powerpc/kernel/of_platform.c:30:0:
      ./arch/powerpc/include/asm/eeh.h:344:48: error: ‘struct pci_dn’ \
      declared inside parameter list [-Werror]
          :
      In file included from arch/powerpc/mm/hash_utils_64.c:49:0:
      ./arch/powerpc/include/asm/eeh.h:344:48: error: ‘struct pci_dn’ \
      declared inside parameter list [-Werror]
      
      This fixes the issue by replacing those empty inline functions
      with macro so that we don't rely on @pci_dn when CONFIG_EEH is
      disabled.
      
      Cc: stable@vger.kernel.org # v4.1+
      Fixes: ff57b454 ("powerpc/eeh: Do probe on pci_dn")
      Reported-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      35cc8146
    • Jan Kara's avatar
      ext4: fix crashes in dioread_nolock mode · 62adae8f
      Jan Kara authored
      [ Upstream commit 74dae427 ]
      
      Competing overwrite DIO in dioread_nolock mode will just overwrite
      pointer to io_end in the inode. This may result in data corruption or
      extent conversion happening from IO completion interrupt because we
      don't properly set buffer_defer_completion() when unlocked DIO races
      with locked DIO to unwritten extent.
      
      Since unlocked DIO doesn't need io_end for anything, just avoid
      allocating it and corrupting pointer from inode for locked DIO.
      A cleaner fix would be to avoid these games with io_end pointer from the
      inode but that requires more intrusive changes so we leave that for
      later.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      62adae8f
    • Kirill A. Shutemov's avatar
      ipc/shm: handle removed segments gracefully in shm_mmap() · 5d0e8394
      Kirill A. Shutemov authored
      [ Upstream commit 15db15e2 ]
      
      commit 1ac0b6de upstream.
      
      remap_file_pages(2) emulation can reach file which represents removed
      IPC ID as long as a memory segment is mapped.  It breaks expectations of
      IPC subsystem.
      
      Test case (rewritten to be more human readable, originally autogenerated
      by syzkaller[1]):
      
      	#define _GNU_SOURCE
      	#include <stdlib.h>
      	#include <sys/ipc.h>
      	#include <sys/mman.h>
      	#include <sys/shm.h>
      
      	#define PAGE_SIZE 4096
      
      	int main()
      	{
      		int id;
      		void *p;
      
      		id = shmget(IPC_PRIVATE, 3 * PAGE_SIZE, 0);
      		p = shmat(id, NULL, 0);
      		shmctl(id, IPC_RMID, NULL);
      		remap_file_pages(p, 3 * PAGE_SIZE, 0, 7, 0);
      
      	        return 0;
      	}
      
      The patch changes shm_mmap() and code around shm_lock() to propagate
      locking error back to caller of shm_mmap().
      
      [1] http://github.com/google/syzkallerSigned-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      5d0e8394
    • Davidlohr Bueso's avatar
      ipc: convert invalid scenarios to use WARN_ON · 6e82212c
      Davidlohr Bueso authored
      [ Upstream commit d0edd852 ]
      
      Considering Linus' past rants about the (ab)use of BUG in the kernel, I
      took a look at how we deal with such calls in ipc.  Given that any errors
      or corruption in ipc code are most likely contained within the set of
      processes participating in the broken mechanisms, there aren't really many
      strong fatal system failure scenarios that would require a BUG call.
      Also, if something is seriously wrong, ipc might not be the place for such
      a BUG either.
      
      1. For example, recently, a customer hit one of these BUG_ONs in shm
         after failing shm_lock().  A busted ID imho does not merit a BUG_ON,
         and WARN would have been better.
      
      2. MSG_COPY functionality of posix msgrcv(2) for checkpoint/restore.
         I don't see how we can hit this anyway -- at least it should be IS_ERR.
          The 'copy' arg from do_msgrcv is always set by calling prepare_copy()
         first and foremost.  We could also probably drop this check altogether.
          Either way, it does not merit a BUG_ON.
      
      3. No ->fault() callback for the fs getting the corresponding page --
         seems selfish to make the system unusable.
      Signed-off-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      6e82212c
    • Davidlohr Bueso's avatar
      ipc,shm: move BUG_ON check into shm_lock · 0eba52da
      Davidlohr Bueso authored
      [ Upstream commit c5c8975b ]
      
      Upon every shm_lock call, we BUG_ON if an error was returned, indicating
      racing either in idr or in shm_destroy.  Move this logic into the locking.
      
      [akpm@linux-foundation.org: simplify code]
      Signed-off-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      0eba52da
    • Kirill A. Shutemov's avatar
      mm: fix regression in remap_file_pages() emulation · aff5e598
      Kirill A. Shutemov authored
      [ Upstream commit 48f7df32 ]
      
      Grazvydas Ignotas has reported a regression in remap_file_pages()
      emulation.
      
      Testcase:
      	#define _GNU_SOURCE
      	#include <assert.h>
      	#include <stdlib.h>
      	#include <stdio.h>
      	#include <sys/mman.h>
      
      	#define SIZE    (4096 * 3)
      
      	int main(int argc, char **argv)
      	{
      		unsigned long *p;
      		long i;
      
      		p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
      				MAP_SHARED | MAP_ANONYMOUS, -1, 0);
      		if (p == MAP_FAILED) {
      			perror("mmap");
      			return -1;
      		}
      
      		for (i = 0; i < SIZE / 4096; i++)
      			p[i * 4096 / sizeof(*p)] = i;
      
      		if (remap_file_pages(p, 4096, 0, 1, 0)) {
      			perror("remap_file_pages");
      			return -1;
      		}
      
      		if (remap_file_pages(p, 4096 * 2, 0, 1, 0)) {
      			perror("remap_file_pages");
      			return -1;
      		}
      
      		assert(p[0] == 1);
      
      		munmap(p, SIZE);
      
      		return 0;
      	}
      
      The second remap_file_pages() fails with -EINVAL.
      
      The reason is that remap_file_pages() emulation assumes that the target
      vma covers whole area we want to over map.  That assumption is broken by
      first remap_file_pages() call: it split the area into two vma.
      
      The solution is to check next adjacent vmas, if they map the same file
      with the same flags.
      
      Fixes: c8d78c18 ("mm: replace remap_file_pages() syscall with emulation")
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: default avatarGrazvydas Ignotas <notasas@gmail.com>
      Tested-by: default avatarGrazvydas Ignotas <notasas@gmail.com>
      Cc: <stable@vger.kernel.org>	[4.0+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      aff5e598
    • Takashi Iwai's avatar
      ALSA: pcm: Fix rwsem deadlock for non-atomic PCM stream · cab48a09
      Takashi Iwai authored
      [ Upstream commit 67ec1072 ]
      
      A non-atomic PCM stream may take snd_pcm_link_rwsem rw semaphore twice
      in the same code path, e.g. one in snd_pcm_action_nonatomic() and
      another in snd_pcm_stream_lock().  Usually this is OK, but when a
      write lock is issued between these two read locks, the problem
      happens: the write lock is blocked due to the first reade lock, and
      the second read lock is also blocked by the write lock.  This
      eventually deadlocks.
      
      The reason is the way rwsem manages waiters; it's queued like FIFO, so
      even if the writer itself doesn't take the lock yet, it blocks all the
      waiters (including reads) queued after it.
      
      As a workaround, in this patch, we replace the standard down_write()
      with an spinning loop.  This is far from optimal, but it's good
      enough, as the spinning time is supposed to be relatively short for
      normal PCM operations, and the code paths requiring the write lock
      aren't called so often.
      Reported-by: default avatarVinod Koul <vinod.koul@intel.com>
      Tested-by: default avatarRamesh Babu <ramesh.babu@intel.com>
      Cc: <stable@vger.kernel.org> # v3.18+
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      cab48a09
    • Toshi Kani's avatar
      x86/mm: Fix vmalloc_fault() to handle large pages properly · 468eeda0
      Toshi Kani authored
      [ Upstream commit f4eafd8b ]
      
      A kernel page fault oops with the callstack below was observed
      when a read syscall was made to a pmem device after a huge amount
      (>512GB) of vmalloc ranges was allocated by ioremap() on a x86_64
      system:
      
           BUG: unable to handle kernel paging request at ffff880840000ff8
           IP: vmalloc_fault+0x1be/0x300
           PGD c7f03a067 PUD 0
           Oops: 0000 [#1] SM
           Call Trace:
              __do_page_fault+0x285/0x3e0
              do_page_fault+0x2f/0x80
              ? put_prev_entity+0x35/0x7a0
              page_fault+0x28/0x30
              ? memcpy_erms+0x6/0x10
              ? schedule+0x35/0x80
              ? pmem_rw_bytes+0x6a/0x190 [nd_pmem]
              ? schedule_timeout+0x183/0x240
              btt_log_read+0x63/0x140 [nd_btt]
               :
              ? __symbol_put+0x60/0x60
              ? kernel_read+0x50/0x80
              SyS_finit_module+0xb9/0xf0
              entry_SYSCALL_64_fastpath+0x1a/0xa4
      
      Since v4.1, ioremap() supports large page (pud/pmd) mappings in
      x86_64 and PAE.  vmalloc_fault() however assumes that the vmalloc
      range is limited to pte mappings.
      
      vmalloc faults do not normally happen in ioremap'd ranges since
      ioremap() sets up the kernel page tables, which are shared by
      user processes.  pgd_ctor() sets the kernel's PGD entries to
      user's during fork().  When allocation of the vmalloc ranges
      crosses a 512GB boundary, ioremap() allocates a new pud table
      and updates the kernel PGD entry to point it.  If user process's
      PGD entry does not have this update yet, a read/write syscall
      to the range will cause a vmalloc fault, which hits the Oops
      above as it does not handle a large page properly.
      
      Following changes are made to vmalloc_fault().
      
      64-bit:
      
       - No change for the PGD sync operation as it handles large
         pages already.
       - Add pud_huge() and pmd_huge() to the validation code to
         handle large pages.
       - Change pud_page_vaddr() to pud_pfn() since an ioremap range
         is not directly mapped (while the if-statement still works
         with a bogus addr).
       - Change pmd_page() to pmd_pfn() since an ioremap range is not
         backed by struct page (while the if-statement still works
         with a bogus addr).
      
      32-bit:
       - No change for the sync operation since the index3 PGD entry
         covers the entire vmalloc range, which is always valid.
         (A separate change to sync PGD entry is necessary if this
          memory layout is changed regardless of the page size.)
       - Add pmd_huge() to the validation code to handle large pages.
         This is for completeness since vmalloc_fault() won't happen
         in ioremap'd ranges as its PGD entry is always valid.
      Reported-by: default avatarHenning Schild <henning.schild@siemens.com>
      Signed-off-by: default avatarToshi Kani <toshi.kani@hpe.com>
      Acked-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: <stable@vger.kernel.org> # 4.1+
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: linux-mm@kvack.org
      Cc: linux-nvdimm@lists.01.org
      Link: http://lkml.kernel.org/r/1455758214-24623-1-git-send-email-toshi.kani@hpe.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      468eeda0
    • Gerd Hoffmann's avatar
      drm/qxl: use kmalloc_array to alloc reloc_info in qxl_process_single_command · 1d1338cf
      Gerd Hoffmann authored
      [ Upstream commit 34855706 ]
      
      This avoids integer overflows on 32bit machines when calculating
      reloc_info size, as reported by Alan Cox.
      
      Cc: stable@vger.kernel.org
      Cc: gnomes@lxorguk.ukuu.org.uk
      Signed-off-by: default avatarGerd Hoffmann <kraxel@redhat.com>
      Reviewed-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      1d1338cf
    • Rasmus Villemoes's avatar
      drm/radeon: use post-decrement in error handling · 1a82e899
      Rasmus Villemoes authored
      [ Upstream commit bc3f5d8c ]
      
      We need to use post-decrement to get the pci_map_page undone also for
      i==0, and to avoid some very unpleasant behaviour if pci_map_page
      failed already at i==0.
      Reviewed-by: default avatarChristian König <christian.koenig@amd.com>
      Signed-off-by: default avatarRasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      1a82e899
    • Takashi Iwai's avatar
      ALSA: seq: Fix double port list deletion · ad4ad1ac
      Takashi Iwai authored
      [ Upstream commit 13d5e5d4 ]
      
      The commit [7f0973e9: ALSA: seq: Fix lockdep warnings due to
      double mutex locks] split the management of two linked lists (source
      and destination) into two individual calls for avoiding the AB/BA
      deadlock.  However, this may leave the possible double deletion of one
      of two lists when the counterpart is being deleted concurrently.
      It ends up with a list corruption, as revealed by syzkaller fuzzer.
      
      This patch fixes it by checking the list emptiness and skipping the
      deletion and the following process.
      
      BugLink: http://lkml.kernel.org/r/CACT4Y+bay9qsrz6dQu31EcGaH9XwfW7o3oBzSQUG9fMszoh=Sg@mail.gmail.com
      Fixes: 7f0973e9 ('ALSA: seq: Fix lockdep warnings due to 'double mutex locks)
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Tested-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      ad4ad1ac
    • Arnd Bergmann's avatar
      tracing: Fix freak link error caused by branch tracer · a77b1f3d
      Arnd Bergmann authored
      [ Upstream commit b33c8ff4 ]
      
      In my randconfig tests, I came across a bug that involves several
      components:
      
      * gcc-4.9 through at least 5.3
      * CONFIG_GCOV_PROFILE_ALL enabling -fprofile-arcs for all files
      * CONFIG_PROFILE_ALL_BRANCHES overriding every if()
      * The optimized implementation of do_div() that tries to
        replace a library call with an division by multiplication
      * code in drivers/media/dvb-frontends/zl10353.c doing
      
              u32 adc_clock = 450560; /* 45.056 MHz */
              if (state->config.adc_clock)
                      adc_clock = state->config.adc_clock;
              do_div(value, adc_clock);
      
      In this case, gcc fails to determine whether the divisor
      in do_div() is __builtin_constant_p(). In particular, it
      concludes that __builtin_constant_p(adc_clock) is false, while
      __builtin_constant_p(!!adc_clock) is true.
      
      That in turn throws off the logic in do_div() that also uses
      __builtin_constant_p(), and instead of picking either the
      constant- optimized division, and the code in ilog2() that uses
      __builtin_constant_p() to figure out whether it knows the answer at
      compile time. The result is a link error from failing to find
      multiple symbols that should never have been called based on
      the __builtin_constant_p():
      
      dvb-frontends/zl10353.c:138: undefined reference to `____ilog2_NaN'
      dvb-frontends/zl10353.c:138: undefined reference to `__aeabi_uldivmod'
      ERROR: "____ilog2_NaN" [drivers/media/dvb-frontends/zl10353.ko] undefined!
      ERROR: "__aeabi_uldivmod" [drivers/media/dvb-frontends/zl10353.ko] undefined!
      
      This patch avoids the problem by changing __trace_if() to check
      whether the condition is known at compile-time to be nonzero, rather
      than checking whether it is actually a constant.
      
      I see this one link error in roughly one out of 1600 randconfig builds
      on ARM, and the patch fixes all known instances.
      
      Link: http://lkml.kernel.org/r/1455312410-1058841-1-git-send-email-arnd@arndb.deAcked-by: default avatarNicolas Pitre <nico@linaro.org>
      Fixes: ab3c9c68 ("branch tracer, intel-iommu: fix build with CONFIG_BRANCH_TRACER=y")
      Cc: stable@vger.kernel.org # v2.6.30+
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      a77b1f3d
    • Steven Rostedt (Red Hat)'s avatar
      tracepoints: Do not trace when cpu is offline · 999c2cea
      Steven Rostedt (Red Hat) authored
      [ Upstream commit f3775549 ]
      
      The tracepoint infrastructure uses RCU sched protection to enable and
      disable tracepoints safely. There are some instances where tracepoints are
      used in infrastructure code (like kfree()) that get called after a CPU is
      going offline, and perhaps when it is coming back online but hasn't been
      registered yet.
      
      This can probuce the following warning:
      
       [ INFO: suspicious RCU usage. ]
       4.4.0-00006-g0fe53e8-dirty #34 Tainted: G S
       -------------------------------
       include/trace/events/kmem.h:141 suspicious rcu_dereference_check() usage!
      
       other info that might help us debug this:
      
       RCU used illegally from offline CPU!  rcu_scheduler_active = 1, debug_locks = 1
       no locks held by swapper/8/0.
      
       stack backtrace:
        CPU: 8 PID: 0 Comm: swapper/8 Tainted: G S              4.4.0-00006-g0fe53e8-dirty #34
        Call Trace:
        [c0000005b76c78d0] [c0000000008b9540] .dump_stack+0x98/0xd4 (unreliable)
        [c0000005b76c7950] [c00000000010c898] .lockdep_rcu_suspicious+0x108/0x170
        [c0000005b76c79e0] [c00000000029adc0] .kfree+0x390/0x440
        [c0000005b76c7a80] [c000000000055f74] .destroy_context+0x44/0x100
        [c0000005b76c7b00] [c0000000000934a0] .__mmdrop+0x60/0x150
        [c0000005b76c7b90] [c0000000000e3ff0] .idle_task_exit+0x130/0x140
        [c0000005b76c7c20] [c000000000075804] .pseries_mach_cpu_die+0x64/0x310
        [c0000005b76c7cd0] [c000000000043e7c] .cpu_die+0x3c/0x60
        [c0000005b76c7d40] [c0000000000188d8] .arch_cpu_idle_dead+0x28/0x40
        [c0000005b76c7db0] [c000000000101e6c] .cpu_startup_entry+0x50c/0x560
        [c0000005b76c7ed0] [c000000000043bd8] .start_secondary+0x328/0x360
        [c0000005b76c7f90] [c000000000008a6c] start_secondary_prolog+0x10/0x14
      
      This warning is not a false positive either. RCU is not protecting code that
      is being executed while the CPU is offline.
      
      Instead of playing "whack-a-mole(TM)" and adding conditional statements to
      the tracepoints we find that are used in this instance, simply add a
      cpu_online() test to the tracepoint code where the tracepoint will be
      ignored if the CPU is offline.
      
      Use of raw_smp_processor_id() is fine, as there should never be a case where
      the tracepoint code goes from running on a CPU that is online and suddenly
      gets migrated to a CPU that is offline.
      
      Link: http://lkml.kernel.org/r/1455387773-4245-1-git-send-email-kda@linux-powerpc.orgReported-by: default avatarDenis Kirjanov <kda@linux-powerpc.org>
      Fixes: 97e1c18e ("tracing: Kernel Tracepoints")
      Cc: stable@vger.kernel.org # v2.6.28+
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      999c2cea
    • Andy Shevchenko's avatar
      dmaengine: dw: disable BLOCK IRQs for non-cyclic xfer · 6bdec2b3
      Andy Shevchenko authored
      [ Upstream commit ee1cdcda ]
      
      The commit 2895b2ca ("dmaengine: dw: fix cyclic transfer callbacks")
      re-enabled BLOCK interrupts with regard to make cyclic transfers work. However,
      this change becomes a regression for non-cyclic transfers as interrupt counters
      under stress test had been grown enormously (approximately per 4-5 bytes in the
      UART loop back test).
      
      Taking into consideration above enable BLOCK interrupts if and only if channel
      is programmed to perform cyclic transfer.
      
      Fixes: 2895b2ca ("dmaengine: dw: fix cyclic transfer callbacks")
      Signed-off-by: default avatarAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Acked-by: default avatarMans Rullgard <mans@mansr.com>
      Tested-by: default avatarMans Rullgard <mans@mansr.com>
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarVinod Koul <vinod.koul@intel.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      6bdec2b3
    • Takashi Iwai's avatar
      ALSA: hda - Cancel probe work instead of flush at remove · 800b2663
      Takashi Iwai authored
      [ Upstream commit 0b8c8219 ]
      
      The commit [991f86d7: ALSA: hda - Flush the pending probe work at
      remove] introduced the sync of async probe work at remove for fixing
      the race.  However, this may lead to another hangup when the module
      removal is performed quickly before starting the probe work, because
      it issues flush_work() and it's blocked forever.
      
      The workaround is to use cancel_work_sync() instead of flush_work()
      there.
      
      Fixes: 991f86d7 ('ALSA: hda - Flush the pending probe work at remove')
      Cc: <stable@vger.kernel.org> # v3.17+
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      800b2663
    • Takashi Iwai's avatar
      ALSA: seq: Fix leak of pool buffer at concurrent writes · d26d8b48
      Takashi Iwai authored
      [ Upstream commit d99a36f4 ]
      
      When multiple concurrent writes happen on the ALSA sequencer device
      right after the open, it may try to allocate vmalloc buffer for each
      write and leak some of them.  It's because the presence check and the
      assignment of the buffer is done outside the spinlock for the pool.
      
      The fix is to move the check and the assignment into the spinlock.
      
      (The current implementation is suboptimal, as there can be multiple
       unnecessary vmallocs because the allocation is done before the check
       in the spinlock.  But the pool size is already checked beforehand, so
       this isn't a big problem; that is, the only possible path is the
       multiple writes before any pool assignment, and practically seen, the
       current coverage should be "good enough".)
      
      The issue was triggered by syzkaller fuzzer.
      
      BugLink: http://lkml.kernel.org/r/CACT4Y+bSzazpXNvtAr=WXaL8hptqjHwqEyFA+VN2AWEx=aurkg@mail.gmail.comReported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Tested-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      d26d8b48
    • Gavin Shan's avatar
      powerpc/eeh: Fix stale cached primary bus · 71239452
      Gavin Shan authored
      [ Upstream commit 05ba75f8 ]
      
      When PE is created, its primary bus is cached to pe->bus. At later
      point, the cached primary bus is returned from eeh_pe_bus_get().
      However, we could get stale cached primary bus and run into kernel
      crash in one case: full hotplug as part of fenced PHB error recovery
      releases all PCI busses under the PHB at unplugging time and recreate
      them at plugging time. pe->bus is still dereferencing the PCI bus
      that was released.
      
      This adds another PE flag (EEH_PE_PRI_BUS) to represent the validity
      of pe->bus. pe->bus is updated when its first child EEH device is
      online and the flag is set. Before unplugging in full hotplug for
      error recovery, the flag is cleared.
      
      Fixes: 8cdb2833 ("powerpc/eeh: Trace PCI bus from PE")
      Cc: stable@vger.kernel.org #v3.11+
      Reported-by: default avatarAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Reported-by: default avatarPradipta Ghosh <pradghos@in.ibm.com>
      Signed-off-by: default avatarGavin Shan <gwshan@linux.vnet.ibm.com>
      Tested-by: default avatarAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      71239452
    • Andrey Konovalov's avatar
      ALSA: usb-audio: avoid freeing umidi object twice · 1ea63b62
      Andrey Konovalov authored
      [ Upstream commit 07d86ca9 ]
      
      The 'umidi' object will be free'd on the error path by snd_usbmidi_free()
      when tearing down the rawmidi interface. So we shouldn't try to free it
      in snd_usbmidi_create() after having registered the rawmidi interface.
      
      Found by KASAN.
      Signed-off-by: default avatarAndrey Konovalov <andreyknvl@gmail.com>
      Acked-by: default avatarClemens Ladisch <clemens@ladisch.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      1ea63b62
    • Hannes Reinecke's avatar
      bio: return EINTR if copying to user space got interrupted · f3dd341b
      Hannes Reinecke authored
      [ Upstream commit 2d99b55d ]
      
      Commit 35dc2483 introduced a check for
      current->mm to see if we have a user space context and only copies data
      if we do. Now if an IO gets interrupted by a signal data isn't copied
      into user space any more (as we don't have a user space context) but
      user space isn't notified about it.
      
      This patch modifies the behaviour to return -EINTR from bio_uncopy_user()
      to notify userland that a signal has interrupted the syscall, otherwise
      it could lead to a situation where the caller may get a buffer with
      no data returned.
      
      This can be reproduced by issuing SG_IO ioctl()s in one thread while
      constantly sending signals to it.
      
      Fixes: 35dc2483 [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal
      Signed-off-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: default avatarHannes Reinecke <hare@suse.de>
      Cc: stable@vger.kernel.org # v.3.11+
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      f3dd341b
    • Ryan Ware's avatar
      EVM: Use crypto_memneq() for digest comparisons · d185fa45
      Ryan Ware authored
      [ Upstream commit 613317bd ]
      
      This patch fixes vulnerability CVE-2016-2085.  The problem exists
      because the vm_verify_hmac() function includes a use of memcmp().
      Unfortunately, this allows timing side channel attacks; specifically
      a MAC forgery complexity drop from 2^128 to 2^12.  This patch changes
      the memcmp() to the cryptographically safe crypto_memneq().
      Reported-by: default avatarXiaofei Rex Guo <xiaofei.rex.guo@intel.com>
      Signed-off-by: default avatarRyan Ware <ware@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMimi Zohar <zohar@linux.vnet.ibm.com>
      Signed-off-by: default avatarJames Morris <james.l.morris@oracle.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      d185fa45
    • Eryu Guan's avatar
      ext4: don't read blocks from disk after extents being swapped · a285eee1
      Eryu Guan authored
      [ Upstream commit bcff2488 ]
      
      I notice ext4/307 fails occasionally on ppc64 host, reporting md5
      checksum mismatch after moving data from original file to donor file.
      
      The reason is that move_extent_per_page() calls __block_write_begin()
      and block_commit_write() to write saved data from original inode blocks
      to donor inode blocks, but __block_write_begin() not only maps buffer
      heads but also reads block content from disk if the size is not block
      size aligned.  At this time the physical block number in mapped buffer
      head is pointing to the donor file not the original file, and that
      results in reading wrong data to page, which get written to disk in
      following block_commit_write call.
      
      This also can be reproduced by the following script on 1k block size ext4
      on x86_64 host:
      
          mnt=/mnt/ext4
          donorfile=$mnt/donor
          testfile=$mnt/testfile
          e4compact=~/xfstests/src/e4compact
      
          rm -f $donorfile $testfile
      
          # reserve space for donor file, written by 0xaa and sync to disk to
          # avoid EBUSY on EXT4_IOC_MOVE_EXT
          xfs_io -fc "pwrite -S 0xaa 0 1m" -c "fsync" $donorfile
      
          # create test file written by 0xbb
          xfs_io -fc "pwrite -S 0xbb 0 1023" -c "fsync" $testfile
      
          # compute initial md5sum
          md5sum $testfile | tee md5sum.txt
          # drop cache, force e4compact to read data from disk
          echo 3 > /proc/sys/vm/drop_caches
      
          # test defrag
          echo "$testfile" | $e4compact -i -v -f $donorfile
          # check md5sum
          md5sum -c md5sum.txt
      
      Fix it by creating & mapping buffer heads only but not reading blocks
      from disk, because all the data in page is guaranteed to be up-to-date
      in mext_page_mkuptodate().
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarEryu Guan <guaneryu@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      a285eee1
    • Insu Yun's avatar
      ext4: fix potential integer overflow · 9d767979
      Insu Yun authored
      [ Upstream commit 46901760 ]
      
      Since sizeof(ext_new_group_data) > sizeof(ext_new_flex_group_data),
      integer overflow could be happened.
      Therefore, need to fix integer overflow sanitization.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarInsu Yun <wuninsu@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      9d767979
    • David Sterba's avatar
      btrfs: properly set the termination value of ctx->pos in readdir · 3a68deb6
      David Sterba authored
      [ Upstream commit bc4ef759 ]
      
      The value of ctx->pos in the last readdir call is supposed to be set to
      INT_MAX due to 32bit compatibility, unless 'pos' is intentially set to a
      larger value, then it's LLONG_MAX.
      
      There's a report from PaX SIZE_OVERFLOW plugin that "ctx->pos++"
      overflows (https://forums.grsecurity.net/viewtopic.php?f=1&t=4284), on a
      64bit arch, where the value is 0x7fffffffffffffff ie. LLONG_MAX before
      the increment.
      
      We can get to that situation like that:
      
      * emit all regular readdir entries
      * still in the same call to readdir, bump the last pos to INT_MAX
      * next call to readdir will not emit any entries, but will reach the
        bump code again, finds pos to be INT_MAX and sets it to LLONG_MAX
      
      Normally this is not a problem, but if we call readdir again, we'll find
      'pos' set to LLONG_MAX and the unconditional increment will overflow.
      
      The report from Victor at
      (http://thread.gmane.org/gmane.comp.file-systems.btrfs/49500) with debugging
      print shows that pattern:
      
       Overflow: e
       Overflow: 7fffffff
       Overflow: 7fffffffffffffff
       PAX: size overflow detected in function btrfs_real_readdir
         fs/btrfs/inode.c:5760 cicus.935_282 max, count: 9, decl: pos; num: 0;
         context: dir_context;
       CPU: 0 PID: 2630 Comm: polkitd Not tainted 4.2.3-grsec #1
       Hardware name: Gigabyte Technology Co., Ltd. H81ND2H/H81ND2H, BIOS F3 08/11/2015
        ffffffff81901608 0000000000000000 ffffffff819015e6 ffffc90004973d48
        ffffffff81742f0f 0000000000000007 ffffffff81901608 ffffc90004973d78
        ffffffff811cb706 0000000000000000 ffff8800d47359e0 ffffc90004973ed8
       Call Trace:
        [<ffffffff81742f0f>] dump_stack+0x4c/0x7f
        [<ffffffff811cb706>] report_size_overflow+0x36/0x40
        [<ffffffff812ef0bc>] btrfs_real_readdir+0x69c/0x6d0
        [<ffffffff811dafc8>] iterate_dir+0xa8/0x150
        [<ffffffff811e6d8d>] ? __fget_light+0x2d/0x70
        [<ffffffff811dba3a>] SyS_getdents+0xba/0x1c0
       Overflow: 1a
        [<ffffffff811db070>] ? iterate_dir+0x150/0x150
        [<ffffffff81749b69>] entry_SYSCALL_64_fastpath+0x12/0x83
      
      The jump from 7fffffff to 7fffffffffffffff happens when new dir entries
      are not yet synced and are processed from the delayed list. Then the code
      could go to the bump section again even though it might not emit any new
      dir entries from the delayed list.
      
      The fix avoids entering the "bump" section again once we've finished
      emitting the entries, both for synced and delayed entries.
      
      References: https://forums.grsecurity.net/viewtopic.php?f=1&t=4284Reported-by: default avatarVictor <services@swwu.com>
      CC: stable@vger.kernel.org
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Tested-by: default avatarHolger Hoffstätte <holger.hoffstaette@googlemail.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      3a68deb6
    • Linus Walleij's avatar
      ARM: 8519/1: ICST: try other dividends than 1 · 1c35d875
      Linus Walleij authored
      [ Upstream commit e972c374 ]
      
      Since the dawn of time the ICST code has only supported divide
      by one or hang in an eternal loop. Luckily we were always dividing
      by one because the reference frequency for the systems using
      the ICSTs is 24MHz and the [min,max] values for the PLL input
      if [10,320] MHz for ICST307 and [6,200] for ICST525, so the loop
      will always terminate immediately without assigning any divisor
      for the reference frequency.
      
      But for the code to make sense, let's insert the missing i++
      Reported-by: default avatarDavid Binderman <dcb314@hotmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      1c35d875
    • Stefan Haberland's avatar
      s390/dasd: fix refcount for PAV reassignment · 5ae3f5ad
      Stefan Haberland authored
      [ Upstream commit 9d862aba ]
      
      Add refcount to the DASD device when a summary unit check worker is
      scheduled. This prevents that the device is set offline with worker
      in place.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarStefan Haberland <stefan.haberland@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      5ae3f5ad
    • Stefan Haberland's avatar
      s390/dasd: prevent incorrect length error under z/VM after PAV changes · 2e35e9c4
      Stefan Haberland authored
      [ Upstream commit 020bf042 ]
      
      The channel checks the specified length and the provided amount of
      data for CCWs and provides an incorrect length error if the size does
      not match. Under z/VM with simulation activated the length may get
      changed. Having the suppress length indication bit set is stated as
      good CCW coding practice and avoids errors under z/VM.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarStefan Haberland <stefan.haberland@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      2e35e9c4
    • Anton Protopopov's avatar
      cifs: fix erroneous return value · 4f55047b
      Anton Protopopov authored
      [ Upstream commit 4b550af5 ]
      
      The setup_ntlmv2_rsp() function may return positive value ENOMEM instead
      of -ENOMEM in case of kmalloc failure.
      Signed-off-by: default avatarAnton Protopopov <a.s.protopopov@gmail.com>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      4f55047b
    • Nicolai Hähnle's avatar
      drm/radeon: hold reference to fences in radeon_sa_bo_new · 7071cc9b
      Nicolai Hähnle authored
      [ Upstream commit f6ff4f67 ]
      
      An arbitrary amount of time can pass between spin_unlock and
      radeon_fence_wait_any, so we need to ensure that nobody frees the
      fences from under us.
      
      Based on the analogous fix for amdgpu.
      Signed-off-by: default avatarNicolai Hähnle <nicolai.haehnle@amd.com>
      Reviewed-by: default avatarChristian König <christian.koenig@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      7071cc9b
    • Tejun Heo's avatar
      workqueue: handle NUMA_NO_NODE for unbound pool_workqueue lookup · f3d69fd8
      Tejun Heo authored
      [ Upstream commit d6e022f1 ]
      
      When looking up the pool_workqueue to use for an unbound workqueue,
      workqueue assumes that the target CPU is always bound to a valid NUMA
      node.  However, currently, when a CPU goes offline, the mapping is
      destroyed and cpu_to_node() returns NUMA_NO_NODE.
      
      This has always been broken but hasn't triggered often enough before
      874bbfe6 ("workqueue: make sure delayed work run in local cpu").
      After the commit, workqueue forcifully assigns the local CPU for
      delayed work items without explicit target CPU to fix a different
      issue.  This widens the window where CPU can go offline while a
      delayed work item is pending causing delayed work items dispatched
      with target CPU set to an already offlined CPU.  The resulting
      NUMA_NO_NODE mapping makes workqueue try to queue the work item on a
      NULL pool_workqueue and thus crash.
      
      While 874bbfe6 has been reverted for a different reason making the
      bug less visible again, it can still happen.  Fix it by mapping
      NUMA_NO_NODE to the default pool_workqueue from unbound_pwq_by_node().
      This is a temporary workaround.  The long term solution is keeping CPU
      -> NODE mapping stable across CPU off/online cycles which is being
      worked on.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarMike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Len Brown <len.brown@intel.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/g/1454424264.11183.46.camel@gmail.com
      Link: http://lkml.kernel.org/g/1453702100-2597-1-git-send-email-tangchen@cn.fujitsu.comSigned-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      f3d69fd8
    • Lai Jiangshan's avatar
      workqueue: wq_pool_mutex protects the attrs-installation · d3c4dd88
      Lai Jiangshan authored
      [ Upstream commit 5b95e1af ]
      
      Current wq_pool_mutex doesn't proctect the attrs-installation, it results
      that ->unbound_attrs, ->numa_pwq_tbl[] and ->dfl_pwq can only be accessed
      under wq->mutex and causes some inconveniences. Example, wq_update_unbound_numa()
      has to acquire wq->mutex before fetching the wq->unbound_attrs->no_numa
      and the old_pwq.
      
      attrs-installation is a short operation, so this change will no cause any
      latency for other operations which also acquire the wq_pool_mutex.
      
      The only unprotected attrs-installation code is in apply_workqueue_attrs(),
      so this patch touches code less than comments.
      
      It is also a preparation patch for next several patches which read
      wq->unbound_attrs, wq->numa_pwq_tbl[] and wq->dfl_pwq with
      only wq_pool_mutex held.
      Signed-off-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      d3c4dd88
    • Lai Jiangshan's avatar
      workqueue: split apply_workqueue_attrs() into 3 stages · 9e1a3771
      Lai Jiangshan authored
      [ Upstream commit 2d5f0764 ]
      
      Current apply_workqueue_attrs() includes pwqs-allocation and pwqs-installation,
      so when we batch multiple apply_workqueue_attrs()s as a transaction, we can't
      ensure the transaction must succeed or fail as a complete unit.
      
      To solve this, we split apply_workqueue_attrs() into three stages.
      The first stage does the preparation: allocation memory, pwqs.
      The second stage does the attrs-installaion and pwqs-installation.
      The third stage frees the allocated memory and (old or unused) pwqs.
      
      As the result, batching multiple apply_workqueue_attrs()s can
      succeed or fail as a complete unit:
      	1) batch do all the first stage for all the workqueues
      	2) only commit all when all the above succeed.
      
      This patch is a preparation for the next patch ("Allow modifying low level
      unbound workqueue cpumask") which will do a multiple apply_workqueue_attrs().
      
      The patch doesn't have functionality changed except two minor adjustment:
      	1) free_unbound_pwq() for the error path is removed, we use the
      	   heavier version put_pwq_unlocked() instead since the error path
      	   is rare. this adjustment simplifies the code.
      	2) the memory-allocation is also moved into wq_pool_mutex.
      	   this is needed to avoid to do the further splitting.
      
      tj: minor updates to comments.
      Suggested-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Mike Galbraith <bitbucket@online.de>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      9e1a3771
    • Alexandra Yates's avatar
      ahci: Intel DNV device IDs SATA · de7b555f
      Alexandra Yates authored
      [ Upstream commit 342decff ]
      
      Adding Intel codename DNV platform device IDs for SATA.
      Signed-off-by: default avatarAlexandra Yates <alexandra.yates@linux.intel.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      de7b555f
    • Tony Lindgren's avatar
      phy: twl4030-usb: Fix unbalanced pm_runtime_enable on module reload · bb789d04
      Tony Lindgren authored
      [ Upstream commit 58a66dba ]
      
      If we reload phy-twl4030-usb, we get a warning about unbalanced
      pm_runtime_enable. Let's fix the issue and also fix idling of the
      device on unload before we attempt to shut it down.
      
      If we don't properly idle the PHY before shutting it down on removal,
      the twl4030 ends up consuming about 62mW of extra power compared to
      running idle with the module loaded.
      
      Cc: stable@vger.kernel.org
      Cc: Bin Liu <b-liu@ti.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Kishon Vijay Abraham I <kishon@ti.com>
      Cc: NeilBrown <neil@brown.name>
      Signed-off-by: default avatarTony Lindgren <tony@atomide.com>
      Signed-off-by: default avatarKishon Vijay Abraham I <kishon@ti.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      bb789d04
    • Tony Lindgren's avatar
      phy: twl4030-usb: Relase usb phy on unload · feebbdd7
      Tony Lindgren authored
      [ Upstream commit b241d31e ]
      
      Otherwise rmmod omap2430; rmmod phy-twl4030-usb; modprobe omap2430
      will try to use a non-existing phy and oops:
      
      Unable to handle kernel paging request at virtual address b6f7c1f0
      ...
      [<c048a284>] (devm_usb_get_phy_by_node) from [<bf0758ac>]
      (omap2430_musb_init+0x44/0x2b4 [omap2430])
      [<bf0758ac>] (omap2430_musb_init [omap2430]) from [<bf055ec0>]
      (musb_init_controller+0x194/0x878 [musb_hdrc])
      
      Cc: stable@vger.kernel.org
      Cc: Bin Liu <b-liu@ti.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Kishon Vijay Abraham I <kishon@ti.com>
      Cc: NeilBrown <neil@brown.name>
      Signed-off-by: default avatarTony Lindgren <tony@atomide.com>
      Signed-off-by: default avatarKishon Vijay Abraham I <kishon@ti.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      feebbdd7
    • Shawn Lin's avatar
      phy: core: fix wrong err handle for phy_power_on · 7822d8eb
      Shawn Lin authored
      [ Upstream commit b82fcabe ]
      
      If phy_pm_runtime_get_sync failed but we already
      enable regulator, current code return directly without
      doing regulator_disable. This patch fix this problem
      and cleanup err handle of phy_power_on to be more readable.
      
      Fixes: 3be88125 ("phy: core: Support regulator ...")
      Cc: <stable@vger.kernel.org> # v3.18+
      Cc: Roger Quadros <rogerq@ti.com>
      Cc: Axel Lin <axel.lin@ingics.com>
      Signed-off-by: default avatarShawn Lin <shawn.lin@rock-chips.com>
      Signed-off-by: default avatarKishon Vijay Abraham I <kishon@ti.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      7822d8eb
    • Tejun Heo's avatar
      Revert "workqueue: make sure delayed work run in local cpu" · 68fce03b
      Tejun Heo authored
      [ Upstream commit 041bd12e ]
      
      This reverts commit 874bbfe6.
      
      Workqueue used to implicity guarantee that work items queued without
      explicit CPU specified are put on the local CPU.  Recent changes in
      timer broke the guarantee and led to vmstat breakage which was fixed
      by 176bed1d ("vmstat: explicitly schedule per-cpu work on the CPU
      we need it to run on").
      
      vmstat is the most likely to expose the issue and it's quite possible
      that there are other similar problems which are a lot more difficult
      to trigger.  As a preventive measure, 874bbfe6 ("workqueue: make
      sure delayed work run in local cpu") was applied to restore the local
      CPU guarnatee.  Unfortunately, the change exposed a bug in timer code
      which got fixed by 22b886dd ("timers: Use proper base migration in
      add_timer_on()").  Due to code restructuring, the commit couldn't be
      backported beyond certain point and stable kernels which only had
      874bbfe6 started crashing.
      
      The local CPU guarantee was accidental more than anything else and we
      want to get rid of it anyway.  As, with the vmstat case fixed,
      874bbfe6 is causing more problems than it's fixing, it has been
      decided to take the chance and officially break the guarantee by
      reverting the commit.  A debug feature will be added to force foreign
      CPU assignment to expose cases relying on the guarantee and fixes for
      the individual cases will be backported to stable as necessary.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Fixes: 874bbfe6 ("workqueue: make sure delayed work run in local cpu")
      Link: http://lkml.kernel.org/g/20160120211926.GJ10810@quack.suse.cz
      Cc: stable@vger.kernel.org
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
      Cc: Daniel Bilik <daniel.bilik@neosystem.cz>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Daniel Bilik <daniel.bilik@neosystem.cz>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      68fce03b