1. 12 Feb, 2016 26 commits
    • Tejun Heo's avatar
      Revert "workqueue: make sure delayed work run in local cpu" · 237c785f
      Tejun Heo authored
      commit 041bd12e upstream.
      
      This reverts commit 874bbfe6.
      
      Workqueue used to implicity guarantee that work items queued without
      explicit CPU specified are put on the local CPU.  Recent changes in
      timer broke the guarantee and led to vmstat breakage which was fixed
      by 176bed1d ("vmstat: explicitly schedule per-cpu work on the CPU
      we need it to run on").
      
      vmstat is the most likely to expose the issue and it's quite possible
      that there are other similar problems which are a lot more difficult
      to trigger.  As a preventive measure, 874bbfe6 ("workqueue: make
      sure delayed work run in local cpu") was applied to restore the local
      CPU guarnatee.  Unfortunately, the change exposed a bug in timer code
      which got fixed by 22b886dd ("timers: Use proper base migration in
      add_timer_on()").  Due to code restructuring, the commit couldn't be
      backported beyond certain point and stable kernels which only had
      874bbfe6 started crashing.
      
      The local CPU guarantee was accidental more than anything else and we
      want to get rid of it anyway.  As, with the vmstat case fixed,
      874bbfe6 is causing more problems than it's fixing, it has been
      decided to take the chance and officially break the guarantee by
      reverting the commit.  A debug feature will be added to force foreign
      CPU assignment to expose cases relying on the guarantee and fixes for
      the individual cases will be backported to stable as necessary.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Fixes: 874bbfe6 ("workqueue: make sure delayed work run in local cpu")
      Link: http://lkml.kernel.org/g/20160120211926.GJ10810@quack.suse.cz
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
      Cc: Daniel Bilik <daniel.bilik@neosystem.cz>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Daniel Bilik <daniel.bilik@neosystem.cz>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      237c785f
    • Linus Torvalds's avatar
      vmstat: explicitly schedule per-cpu work on the CPU we need it to run on · 933f407f
      Linus Torvalds authored
      commit 176bed1d upstream.
      
      The vmstat code uses "schedule_delayed_work_on()" to do the initial
      startup of the delayed work on the right CPU, but then once it was
      started it would use the non-cpu-specific "schedule_delayed_work()" to
      re-schedule it on that CPU.
      
      That just happened to schedule it on the same CPU historically (well, in
      almost all situations), but the code _requires_ this work to be per-cpu,
      and should say so explicitly rather than depend on the non-cpu-specific
      scheduling to schedule on the current CPU.
      
      The timer code is being changed to not be as single-minded in always
      running things on the calling CPU.
      
      See also commit 874bbfe6 ("workqueue: make sure delayed work run in
      local cpu") that for now maintains the local CPU guarantees just in case
      there are other broken users that depended on the accidental behavior.
      
      js: 3.12 backport
      
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Mike Galbraith <mgalbraith@suse.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      933f407f
    • Andrew Morton's avatar
      openrisc: fix CONFIG_UID16 setting · 6b443a1b
      Andrew Morton authored
      commit 04ea1e91 upstream.
      
      openrisc-allnoconfig:
      
        kernel/uid16.c: In function 'SYSC_setgroups16':
        kernel/uid16.c:184:2: error: implicit declaration of function 'groups_alloc'
        kernel/uid16.c:184:13: warning: assignment makes pointer from integer without a cast
      
      openrisc shouldn't be setting CONFIG_UID16 when CONFIG_MULTIUSER=n.
      
      Fixes: 2813893f ("kernel: conditionally support non-root users, groups and capabilities")
      Reported-by: default avatarFengguang Wu <fengguang.wu@gmail.com>
      Cc: Iulia Manda <iulia.manda21@gmail.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      6b443a1b
    • Jiri Slaby's avatar
      x86: vvar, fix excessive gcc-6 DECLARE_VVAR warnings · d6ace935
      Jiri Slaby authored
      On 3.12, with gcc-6, I see a lot of:
      arch/x86/include/asm/vvar.h:33:28: warning: ‘vvaraddr_jiffies’ defined but not used [-Wunused-const-variable]
        static type const * const vvaraddr_ ## name =   \
                                  ^
      arch/x86/include/asm/vvar.h:46:1: note: in expansion of macro ‘DECLARE_VVAR’
       DECLARE_VVAR(0, volatile unsigned long, jiffies)
       ^~~~~~~~~~~~
      
      In upstream, this is fixed by ef721987 (x86, vdso: Introduce VVAR
      marco for vdso32) and f40c3300 (x86, vdso: Move the vvar and hpet
      mappings next to the 64-bit vDSO). But this is not applicable to
      stable.
      
      So mark the vvar declaration as __maybe_unused and be done with it.
      This will generate it to the code only if it is used. I.e. the same as
      with gcc < 6.
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Cc: Andy Lutomirski <luto@amacapital.net>
      d6ace935
    • Joe Perches's avatar
      compiler-gcc: integrate the various compiler-gcc[345].h files · 1fa9b58c
      Joe Perches authored
      commit cb984d10 upstream.
      
      As gcc major version numbers are going to advance rather rapidly in the
      future, there's no real value in separate files for each compiler
      version.
      
      Deduplicate some of the macros #defined in each file too.
      
      Neaten comments using normal kernel commenting style.
      Signed-off-by: default avatarJoe Perches <joe@perches.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Segher Boessenkool <segher@kernel.crashing.org>
      Cc: Sasha Levin <levinsasha928@gmail.com>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Alan Modra <amodra@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      1fa9b58c
    • Steven Noonan's avatar
      compiler/gcc4+: Remove inaccurate comment about 'asm goto' miscompiles · e3e923ee
      Steven Noonan authored
      commit 5631b8fb upstream.
      
      The bug referenced by the comment in this commit was not
      completely fixed in GCC 4.8.2, as I mentioned in a thread back
      in February:
      
         https://lkml.org/lkml/2014/2/12/797
      
      The conclusion at that time was to make the quirk unconditional
      until the bug could be found and fixed in GCC. Unfortunately,
      when I submitted the patch (commit a9f18034) I left a comment
      in that claimed the bug was fixed in GCC 4.8.2+.
      
      This comment is inaccurate, and should be removed.
      Signed-off-by: default avatarSteven Noonan <steven@uplinklabs.net>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/1414274982-14040-1-git-send-email-steven@uplinklabs.net
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      e3e923ee
    • Yang Shi's avatar
      arm64: restore bogomips information in /proc/cpuinfo · 0b3f87b4
      Yang Shi authored
      commit 92e788b7 upstream.
      
      As previously reported, some userspace applications depend on bogomips
      showed by /proc/cpuinfo. Although there is much less legacy impact on
      aarch64 than arm, it does break libvirt.
      
      This patch reverts commit 326b16db ("arm64: delay: don't bother
      reporting bogomips in /proc/cpuinfo"), but with some tweak due to
      context change and without the pr_info().
      
      Fixes: 326b16db ("arm64: delay: don't bother reporting bogomips in /proc/cpuinfo")
      Signed-off-by: default avatarYang Shi <yang.shi@linaro.org>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      0b3f87b4
    • Guenter Roeck's avatar
      mn10300: Select CONFIG_HAVE_UID16 to fix build failure · f5e77f5a
      Guenter Roeck authored
      commit c86576ea upstream.
      
      mn10300 builds fail with
      
      fs/stat.c: In function 'cp_old_stat':
      fs/stat.c:163:2: error: 'old_uid_t' undeclared
      
      ipc/util.c: In function 'ipc64_perm_to_ipc_perm':
      ipc/util.c:540:2: error: 'old_uid_t' undeclared
      
      Select CONFIG_HAVE_UID16 and remove local definition of CONFIG_UID16
      to fix the problem.
      
      Fixes: fbc416ff ("arm64: fix building without CONFIG_UID16")
      Cc: Arnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarAcked-by: David Howells <dhowells@redhat.com>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      f5e77f5a
    • Richard Purdie's avatar
      HID: core: Avoid uninitialized buffer access · 19a3cb32
      Richard Purdie authored
      commit 79b568b9 upstream.
      
      hid_connect adds various strings to the buffer but they're all
      conditional. You can find circumstances where nothing would be written
      to it but the kernel will still print the supposedly empty buffer with
      printk. This leads to corruption on the console/in the logs.
      
      Ensure buf is initialized to an empty string.
      Signed-off-by: default avatarRichard Purdie <richard.purdie@linuxfoundation.org>
      [dvhart: Initialize string to "" rather than assign buf[0] = NULL;]
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: linux-input@vger.kernel.org
      Signed-off-by: default avatarDarren Hart <dvhart@linux.intel.com>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      19a3cb32
    • Mikulas Patocka's avatar
      parisc iommu: fix panic due to trying to allocate too large region · 01bdbe95
      Mikulas Patocka authored
      commit e46e31a3 upstream.
      
      When using the Promise TX2+ SATA controller on PA-RISC, the system often
      crashes with kernel panic, for example just writing data with the dd
      utility will make it crash.
      
      Kernel panic - not syncing: drivers/parisc/sba_iommu.c: I/O MMU @ 000000000000a000 is out of mapping resources
      
      CPU: 0 PID: 18442 Comm: mkspadfs Not tainted 4.4.0-rc2 #2
      Backtrace:
       [<000000004021497c>] show_stack+0x14/0x20
       [<0000000040410bf0>] dump_stack+0x88/0x100
       [<000000004023978c>] panic+0x124/0x360
       [<0000000040452c18>] sba_alloc_range+0x698/0x6a0
       [<0000000040453150>] sba_map_sg+0x260/0x5b8
       [<000000000c18dbb4>] ata_qc_issue+0x264/0x4a8 [libata]
       [<000000000c19535c>] ata_scsi_translate+0xe4/0x220 [libata]
       [<000000000c19a93c>] ata_scsi_queuecmd+0xbc/0x320 [libata]
       [<0000000040499bbc>] scsi_dispatch_cmd+0xfc/0x130
       [<000000004049da34>] scsi_request_fn+0x6e4/0x970
       [<00000000403e95a8>] __blk_run_queue+0x40/0x60
       [<00000000403e9d8c>] blk_run_queue+0x3c/0x68
       [<000000004049a534>] scsi_run_queue+0x2a4/0x360
       [<000000004049be68>] scsi_end_request+0x1a8/0x238
       [<000000004049de84>] scsi_io_completion+0xfc/0x688
       [<0000000040493c74>] scsi_finish_command+0x17c/0x1d0
      
      The cause of the crash is not exhaustion of the IOMMU space, there is
      plenty of free pages. The function sba_alloc_range is called with size
      0x11000, thus the pages_needed variable is 0x11. The function
      sba_search_bitmap is called with bits_wanted 0x11 and boundary size is
      0x10 (because dma_get_seg_boundary(dev) returns 0xffff).
      
      The function sba_search_bitmap attempts to allocate 17 pages that must not
      cross 16-page boundary - it can't satisfy this requirement
      (iommu_is_span_boundary always returns true) and fails even if there are
      many free entries in the IOMMU space.
      
      How did it happen that we try to allocate 17 pages that don't cross
      16-page boundary? The cause is in the function iommu_coalesce_chunks. This
      function tries to coalesce adjacent entries in the scatterlist. The
      function does several checks if it may coalesce one entry with the next,
      one of those checks is this:
      
      	if (startsg->length + dma_len > max_seg_size)
      		break;
      
      When it finishes coalescing adjacent entries, it allocates the mapping:
      
      sg_dma_len(contig_sg) = dma_len;
      dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE);
      sg_dma_address(contig_sg) =
      	PIDE_FLAG
      	| (iommu_alloc_range(ioc, dev, dma_len) << IOVP_SHIFT)
      	| dma_offset;
      
      It is possible that (startsg->length + dma_len > max_seg_size) is false
      (we are just near the 0x10000 max_seg_size boundary), so the funcion
      decides to coalesce this entry with the next entry. When the coalescing
      succeeds, the function performs
      	dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE);
      And now, because of non-zero dma_offset, dma_len is greater than 0x10000.
      iommu_alloc_range (a pointer to sba_alloc_range) is called and it attempts
      to allocate 17 pages for a device that must not cross 16-page boundary.
      
      To fix the bug, we must make sure that dma_len after addition of
      dma_offset and alignment doesn't cross the segment boundary. I.e. change
      	if (startsg->length + dma_len > max_seg_size)
      		break;
      to
      	if (ALIGN(dma_len + dma_offset + startsg->length, IOVP_SIZE) > max_seg_size)
      		break;
      
      This patch makes this change (it precalculates max_seg_boundary at the
      beginning of the function iommu_coalesce_chunks). I also added a check
      that the mapping length doesn't exceed dma_get_seg_boundary(dev) (it is
      not needed for Promise TX2+ SATA, but it may be needed for other devices
      that have dma_get_seg_boundary lower than dma_get_max_seg_size).
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      01bdbe95
    • Will Deacon's avatar
      arm64: mm: ensure that the zero page is visible to the page table walker · 13d8053e
      Will Deacon authored
      commit 32d63978 upstream.
      
      In paging_init, we allocate the zero page, memset it to zero and then
      point TTBR0 to it in order to avoid speculative fetches through the
      identity mapping.
      
      In order to guarantee that the freshly zeroed page is indeed visible to
      the page table walker, we need to execute a dsb instruction prior to
      writing the TTBR.
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      13d8053e
    • John Blackwood's avatar
      arm64: Clear out any singlestep state on a ptrace detach operation · 8abc0d5b
      John Blackwood authored
      commit 5db4fd8c upstream.
      
      Make sure to clear out any ptrace singlestep state when a ptrace(2)
      PTRACE_DETACH call is made on arm64 systems.
      
      Otherwise, the previously ptraced task will die off with a SIGTRAP
      signal if the debugger just previously singlestepped the ptraced task.
      Signed-off-by: default avatarJohn Blackwood <john.blackwood@ccur.com>
      [will: added comment to justify why this is in the arch code]
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      8abc0d5b
    • Arnd Bergmann's avatar
      arm64: fix building without CONFIG_UID16 · 4f29293d
      Arnd Bergmann authored
      commit fbc416ff upstream.
      
      As reported by Michal Simek, building an ARM64 kernel with CONFIG_UID16
      disabled currently fails because the system call table still needs to
      reference the individual function entry points that are provided by
      kernel/sys_ni.c in this case, and the declarations are hidden inside
      of #ifdef CONFIG_UID16:
      
      arch/arm64/include/asm/unistd32.h:57:8: error: 'sys_lchown16' undeclared here (not in a function)
       __SYSCALL(__NR_lchown, sys_lchown16)
      
      I believe this problem only exists on ARM64, because older architectures
      tend to not need declarations when their system call table is built
      in assembly code, while newer architectures tend to not need UID16
      support. ARM64 only uses these system calls for compatibility with
      32-bit ARM binaries.
      
      This changes the CONFIG_UID16 check into CONFIG_HAVE_UID16, which is
      set unconditionally on ARM64 with CONFIG_COMPAT, so we see the
      declarations whenever we need them, but otherwise the behavior is
      unchanged.
      
      Fixes: af1839eb ("Kconfig: clean up the long arch list for the UID16 config option")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      4f29293d
    • Marc Zyngier's avatar
      arm64: KVM: Fix AArch32 to AArch64 register mapping · 9bd3d70a
      Marc Zyngier authored
      commit c0f09634 upstream.
      
      When running a 32bit guest under a 64bit hypervisor, the ARMv8
      architecture defines a mapping of the 32bit registers in the 64bit
      space. This includes banked registers that are being demultiplexed
      over the 64bit ones.
      
      On exceptions caused by an operation involving a 32bit register, the
      HW exposes the register number in the ESR_EL2 register. It was so
      far understood that SW had to distinguish between AArch32 and AArch64
      accesses (based on the current AArch32 mode and register number).
      
      It turns out that I misinterpreted the ARM ARM, and the clue is in
      D1.20.1: "For some exceptions, the exception syndrome given in the
      ESR_ELx identifies one or more register numbers from the issued
      instruction that generated the exception. Where the exception is
      taken from an Exception level using AArch32 these register numbers
      give the AArch64 view of the register."
      
      Which means that the HW is already giving us the translated version,
      and that we shouldn't try to interpret it at all (for example, doing
      an MMIO operation from the IRQ mode using the LR register leads to
      very unexpected behaviours).
      
      The fix is thus not to perform a call to vcpu_reg32() at all from
      vcpu_reg(), and use whatever register number is supplied directly.
      The only case we need to find out about the mapping is when we
      actively generate a register access, which only occurs when injecting
      a fault in a guest.
      Reviewed-by: default avatarRobin Murphy <robin.murphy@arm.com>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      9bd3d70a
    • Ulrich Weigand's avatar
      scripts/recordmcount.pl: support data in text section on powerpc · a69c779e
      Ulrich Weigand authored
      commit 2e50c4be upstream.
      
      If a text section starts out with a data blob before the first
      function start label, disassembly parsing doing in recordmcount.pl
      gets confused on powerpc, leading to creation of corrupted module
      objects.
      
      This was not a problem so far since the compiler would never create
      such text sections.  However, this has changed with a recent change
      in GCC 6 to support distances of > 2GB between a function and its
      assoicated TOC in the ELFv2 ABI, exposing this problem.
      
      There is already code in recordmcount.pl to handle such data blobs
      on the sparc64 platform.  This patch uses the same method to handle
      those on powerpc as well.
      Acked-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarUlrich Weigand <ulrich.weigand@de.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      a69c779e
    • Boqun Feng's avatar
      powerpc: Make {cmp}xchg* and their atomic_ versions fully ordered · 166947a7
      Boqun Feng authored
      commit 81d7a329 upstream.
      
      According to memory-barriers.txt, xchg*, cmpxchg* and their atomic_
      versions all need to be fully ordered, however they are now just
      RELEASE+ACQUIRE, which are not fully ordered.
      
      So also replace PPC_RELEASE_BARRIER and PPC_ACQUIRE_BARRIER with
      PPC_ATOMIC_ENTRY_BARRIER and PPC_ATOMIC_EXIT_BARRIER in
      __{cmp,}xchg_{u32,u64} respectively to guarantee fully ordered semantics
      of atomic{,64}_{cmp,}xchg() and {cmp,}xchg(), as a complement of commit
      b97021f8 ("powerpc: Fix atomic_xxx_return barrier semantics")
      
      This patch depends on patch "powerpc: Make value-returning atomics fully
      ordered" for PPC_ATOMIC_ENTRY_BARRIER definition.
      Signed-off-by: default avatarBoqun Feng <boqun.feng@gmail.com>
      Reviewed-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      166947a7
    • Boqun Feng's avatar
      powerpc: Make value-returning atomics fully ordered · 432c20b3
      Boqun Feng authored
      commit 49e9cf3f upstream.
      
      According to memory-barriers.txt:
      
      > Any atomic operation that modifies some state in memory and returns
      > information about the state (old or new) implies an SMP-conditional
      > general memory barrier (smp_mb()) on each side of the actual
      > operation ...
      
      Which mean these operations should be fully ordered. However on PPC,
      PPC_ATOMIC_ENTRY_BARRIER is the barrier before the actual operation,
      which is currently "lwsync" if SMP=y. The leading "lwsync" can not
      guarantee fully ordered atomics, according to Paul Mckenney:
      
      https://lkml.org/lkml/2015/10/14/970
      
      To fix this, we define PPC_ATOMIC_ENTRY_BARRIER as "sync" to guarantee
      the fully-ordered semantics.
      
      This also makes futex atomics fully ordered, which can avoid possible
      memory ordering problems if userspace code relies on futex system call
      for fully ordered semantics.
      
      Fixes: b97021f8 ("powerpc: Fix atomic_xxx_return barrier semantics")
      Signed-off-by: default avatarBoqun Feng <boqun.feng@gmail.com>
      Reviewed-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      432c20b3
    • Michael Neuling's avatar
      powerpc/tm: Block signal return setting invalid MSR state · e9214d10
      Michael Neuling authored
      commit d2b9d2a5 upstream.
      
      Currently we allow both the MSR T and S bits to be set by userspace on
      a signal return.  Unfortunately this is a reserved configuration and
      will cause a TM Bad Thing exception if attempted (via rfid).
      
      This patch checks for this case in both the 32 and 64 bit signals
      code.  If both T and S are set, we mark the context as invalid.
      
      Found using a syscall fuzzer.
      
      Fixes: 2b0a576d ("powerpc: Add new transactional memory state to the signal context")
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      e9214d10
    • Dan Streetman's avatar
      xfrm: dst_entries_init() per-net dst_ops · 3e29fa5b
      Dan Streetman authored
      [ Upstream commit a8a572a6 ]
      
      Remove the dst_entries_init/destroy calls for xfrm4 and xfrm6 dst_ops
      templates; their dst_entries counters will never be used.  Move the
      xfrm dst_ops initialization from the common xfrm/xfrm_policy.c to
      xfrm4/xfrm4_policy.c and xfrm6/xfrm6_policy.c, and call dst_entries_init
      and dst_entries_destroy for each net namespace.
      
      The ipv4 and ipv6 xfrms each create dst_ops template, and perform
      dst_entries_init on the templates.  The template values are copied to each
      net namespace's xfrm.xfrm*_dst_ops.  The problem there is the dst_ops
      pcpuc_entries field is a percpu counter and cannot be used correctly by
      simply copying it to another object.
      
      The result of this is a very subtle bug; changes to the dst entries
      counter from one net namespace may sometimes get applied to a different
      net namespace dst entries counter.  This is because of how the percpu
      counter works; it has a main count field as well as a pointer to the
      percpu variables.  Each net namespace maintains its own main count
      variable, but all point to one set of percpu variables.  When any net
      namespace happens to change one of the percpu variables to outside its
      small batch range, its count is moved to the net namespace's main count
      variable.  So with multiple net namespaces operating concurrently, the
      dst_ops entries counter can stray from the actual value that it should
      be; if counts are consistently moved from one net namespace to another
      (which my testing showed is likely), then one net namespace winds up
      with a negative dst_ops count while another winds up with a continually
      increasing count, eventually reaching its gc_thresh limit, which causes
      all new traffic on the net namespace to fail with -ENOBUFS.
      Signed-off-by: default avatarDan Streetman <dan.streetman@canonical.com>
      Signed-off-by: default avatarDan Streetman <ddstreet@ieee.org>
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      3e29fa5b
    • Ido Schimmel's avatar
      team: Replace rcu_read_lock with a mutex in team_vlan_rx_kill_vid · a319d186
      Ido Schimmel authored
      [ Upstream commit 60a6531b ]
      
      We can't be within an RCU read-side critical section when deleting
      VLANs, as underlying drivers might sleep during the hardware operation.
      Therefore, replace the RCU critical section with a mutex. This is
      consistent with team_vlan_rx_add_vid.
      
      Fixes: 3d249d4c ("net: introduce ethernet teaming device")
      Acked-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      a319d186
    • Eric Dumazet's avatar
      ipv6: update skb->csum when CE mark is propagated · 28882454
      Eric Dumazet authored
      [ Upstream commit 34ae6a1a ]
      
      When a tunnel decapsulates the outer header, it has to comply
      with RFC 6080 and eventually propagate CE mark into inner header.
      
      It turns out IP6_ECN_set_ce() does not correctly update skb->csum
      for CHECKSUM_COMPLETE packets, triggering infamous "hw csum failure"
      messages and stack traces.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      28882454
    • Eric Dumazet's avatar
      phonet: properly unshare skbs in phonet_rcv() · 2f962373
      Eric Dumazet authored
      [ Upstream commit 7aaed57c ]
      
      Ivaylo Dimitrov reported a regression caused by commit 7866a621
      ("dev: add per net_device packet type chains").
      
      skb->dev becomes NULL and we crash in __netif_receive_skb_core().
      
      Before above commit, different kind of bugs or corruptions could happen
      without major crash.
      
      But the root cause is that phonet_rcv() can queue skb without checking
      if skb is shared or not.
      
      Many thanks to Ivaylo Dimitrov for his help, diagnosis and tests.
      Reported-by: default avatarIvaylo Dimitrov <ivo.g.dimitrov.75@gmail.com>
      Tested-by: default avatarIvaylo Dimitrov <ivo.g.dimitrov.75@gmail.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Remi Denis-Courmont <courmisch@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      2f962373
    • Neal Cardwell's avatar
      tcp_yeah: don't set ssthresh below 2 · 113cc80b
      Neal Cardwell authored
      [ Upstream commit 83d15e70 ]
      
      For tcp_yeah, use an ssthresh floor of 2, the same floor used by Reno
      and CUBIC, per RFC 5681 (equation 4).
      
      tcp_yeah_ssthresh() was sometimes returning a 0 or negative ssthresh
      value if the intended reduction is as big or bigger than the current
      cwnd. Congestion control modules should never return a zero or
      negative ssthresh. A zero ssthresh generally results in a zero cwnd,
      causing the connection to stall. A negative ssthresh value will be
      interpreted as a u32 and will set a target cwnd for PRR near 4
      billion.
      
      Oleksandr Natalenko reported that a system using tcp_yeah with ECN
      could see a warning about a prior_cwnd of 0 in
      tcp_cwnd_reduction(). Testing verified that this was due to
      tcp_yeah_ssthresh() misbehaving in this way.
      Reported-by: default avatarOleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      113cc80b
    • Eric Dumazet's avatar
      ipv6: tcp: add rcu locking in tcp_v6_send_synack() · ea28b18b
      Eric Dumazet authored
      [ Upstream commit 3e4006f0 ]
      
      When first SYNACK is sent, we already hold rcu_read_lock(), but this
      is not true if a SYNACK is retransmitted, as a timer (soft) interrupt
      does not hold rcu_read_lock()
      
      Fixes: 45f6fad8 ("ipv6: add complete rcu protection around np->opt")
      Reported-by: default avatarDave Jones <davej@codemonkey.org.uk>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      ea28b18b
    • Sasha Levin's avatar
      net: sctp: prevent writes to cookie_hmac_alg from accessing invalid memory · 9a449270
      Sasha Levin authored
      [ Upstream commit 320f1a4a ]
      
      proc_dostring() needs an initialized destination string, while the one
      provided in proc_sctp_do_hmac_alg() contains stack garbage.
      
      Thus, writing to cookie_hmac_alg would strlen() that garbage and end up
      accessing invalid memory.
      
      Fixes: 3c68198e ("sctp: Make hmac algorithm selection for cookie generation dynamic")
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      9a449270
    • Hannes Frederic Sowa's avatar
      bridge: Only call /sbin/bridge-stp for the initial network namespace · b2e4e6d2
      Hannes Frederic Sowa authored
      [ Upstream commit ff621985 ]
      
      [I stole this patch from Eric Biederman. He wrote:]
      
      > There is no defined mechanism to pass network namespace information
      > into /sbin/bridge-stp therefore don't even try to invoke it except
      > for bridge devices in the initial network namespace.
      >
      > It is possible for unprivileged users to cause /sbin/bridge-stp to be
      > invoked for any network device name which if /sbin/bridge-stp does not
      > guard against unreasonable arguments or being invoked twice on the
      > same network device could cause problems.
      
      [Hannes: changed patch using netns_eq]
      
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      b2e4e6d2
  2. 28 Jan, 2016 4 commits
    • Florian Westphal's avatar
      connector: bump skb->users before callback invocation · 20f16f7e
      Florian Westphal authored
      [ Upstream commit 55285bf0 ]
      
      Dmitry reports memleak with syskaller program.
      Problem is that connector bumps skb usecount but might not invoke callback.
      
      So move skb_get to where we invoke the callback.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      20f16f7e
    • Xin Long's avatar
      sctp: sctp should release assoc when sctp_make_abort_user return NULL in sctp_close · 7bb8b4ae
      Xin Long authored
      [ Upstream commit 068d8bd3 ]
      
      In sctp_close, sctp_make_abort_user may return NULL because of memory
      allocation failure. If this happens, it will bypass any state change
      and never free the assoc. The assoc has no chance to be freed and it
      will be kept in memory with the state it had even after the socket is
      closed by sctp_close().
      
      So if sctp_make_abort_user fails to allocate memory, we should abort
      the asoc via sctp_primitive_ABORT as well. Just like the annotation in
      sctp_sf_cookie_wait_prm_abort and sctp_sf_do_9_1_prm_abort said,
      "Even if we can't send the ABORT due to low memory delete the TCB.
      This is a departure from our typical NOMEM handling".
      
      But then the chunk is NULL (low memory) and the SCTP_CMD_REPLY cmd would
      dereference the chunk pointer, and system crash. So we should add
      SCTP_CMD_REPLY cmd only when the chunk is not NULL, just like other
      places where it adds SCTP_CMD_REPLY cmd.
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      7bb8b4ae
    • Andrey Ryabinin's avatar
      ipv6/addrlabel: fix ip6addrlbl_get() · 3877fb02
      Andrey Ryabinin authored
      [ Upstream commit e459dfee ]
      
      ip6addrlbl_get() has never worked. If ip6addrlbl_hold() succeeded,
      ip6addrlbl_get() will exit with '-ESRCH'. If ip6addrlbl_hold() failed,
      ip6addrlbl_get() will use about to be free ip6addrlbl_entry pointer.
      
      Fix this by inverting ip6addrlbl_hold() check.
      
      Fixes: 2a8cc6c8 ("[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.")
      Signed-off-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Reviewed-by: default avatarCong Wang <cwang@twopensource.com>
      Acked-by: default avatarYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      3877fb02
    • Vijay Pandurangan's avatar
      veth: don’t modify ip_summed; doing so treats packets with bad checksums as good. · 0f915418
      Vijay Pandurangan authored
      [ Upstream commit ce8c839b ]
      
      Packets that arrive from real hardware devices have ip_summed ==
      CHECKSUM_UNNECESSARY if the hardware verified the checksums, or
      CHECKSUM_NONE if the packet is bad or it was unable to verify it. The
      current version of veth will replace CHECKSUM_NONE with
      CHECKSUM_UNNECESSARY, which causes corrupt packets routed from hardware to
      a veth device to be delivered to the application. This caused applications
      at Twitter to receive corrupt data when network hardware was corrupting
      packets.
      
      We believe this was added as an optimization to skip computing and
      verifying checksums for communication between containers. However, locally
      generated packets have ip_summed == CHECKSUM_PARTIAL, so the code as
      written does nothing for them. As far as we can tell, after removing this
      code, these packets are transmitted from one stack to another unmodified
      (tcpdump shows invalid checksums on both sides, as expected), and they are
      delivered correctly to applications. We didn’t test every possible network
      configuration, but we tried a few common ones such as bridging containers,
      using NAT between the host and a container, and routing from hardware
      devices to containers. We have effectively deployed this in production at
      Twitter (by disabling RX checksum offloading on veth devices).
      
      This code dates back to the first version of the driver, commit
      <e314dbdc> ("[NET]: Virtual ethernet device driver"), so I
      suspect this bug occurred mostly because the driver API has evolved
      significantly since then. Commit <0b796750> ("net/veth: Fix
      packet checksumming") (in December 2010) fixed this for packets that get
      created locally and sent to hardware devices, by not changing
      CHECKSUM_PARTIAL. However, the same issue still occurs for packets coming
      in from hardware devices.
      Co-authored-by: default avatarEvan Jones <ej@evanjones.ca>
      Signed-off-by: default avatarEvan Jones <ej@evanjones.ca>
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Cc: Phil Sutter <phil@nwl.cc>
      Cc: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Cc: netdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarVijay Pandurangan <vijayp@vijayp.ca>
      Acked-by: default avatarCong Wang <cwang@twopensource.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      0f915418
  3. 27 Jan, 2016 10 commits