1. 20 Jul, 2016 2 commits
    • xenbus: Add proper handling of XS_ERROR from Xenbus for transactions. · 12d75ae7
      Jennifer Herbert authored
      [ Upstream commit a2e75bc2 ]
      
      If Xenstore sends back an XS_ERROR for TRANSACTION_END, the driver BUGs
      because it cannot find the matching transaction in the list.  For
      TRANSACTION_START, it leaks memory.
      
      Check the message as returned from xenbus_dev_request_and_reply(), and
      clean up for TRANSACTION_START or discard the error for
      TRANSACTION_END.

      Signed-off-by: Jennifer Herbert <Jennifer.Herbert@citrix.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
    • MIPS: Fix page table corruption on THP permission changes. · aad82778
      David Daney authored
      [ Upstream commit 88d02a2b ]
      
      When the core THP code is modifying the permissions of a huge page it
      calls pmd_modify(), which unfortunately was clearing the _PAGE_HUGE bit
      of the page table entry.  The result can be kernel messages like:
      
      mm/memory.c:397: bad pmd 000000040080004d.
      mm/memory.c:397: bad pmd 00000003ff00004d.
      mm/memory.c:397: bad pmd 000000040100004d.
      
      or:
      
      ------------[ cut here ]------------
      WARNING: at mm/mmap.c:3200 exit_mmap+0x150/0x158()
      Modules linked in: ipv6 at24 octeon3_ethernet octeon_srio_nexus m25p80
      CPU: 12 PID: 1295 Comm: pmderr Not tainted 3.10.87-rt80-Cavium-Octeon #4
      Stack : 0000000040808000 0000000014009ce1 0000000000400004 ffffffff81076ba0
                0000000000000000 0000000000000000 ffffffff85110000 0000000000000119
                0000000000000004 0000000000000000 0000000000000119 43617669756d2d4f
                0000000000000000 ffffffff850fda40 ffffffff85110000 0000000000000000
                0000000000000000 0000000000000009 ffffffff809207a0 0000000000000c80
                ffffffff80f1bf20 0000000000000001 000000ffeca36828 0000000000000001
                0000000000000000 0000000000000001 000000ffeca7e700 ffffffff80886924
                80000003fd7a0000 80000003fd7a39b0 80000003fdea8000 ffffffff80885780
                80000003fdea8000 ffffffff80f12218 000000000000000c 000000000000050f
                0000000000000000 ffffffff80865c4c 0000000000000000 0000000000000000
                ...
      Call Trace:
      [<ffffffff80865c4c>] show_stack+0x6c/0xf8
      [<ffffffff80885780>] warn_slowpath_common+0x78/0xa8
      [<ffffffff809207a0>] exit_mmap+0x150/0x158
      [<ffffffff80882d44>] mmput+0x5c/0x110
      [<ffffffff8088b450>] do_exit+0x230/0xa68
      [<ffffffff8088be34>] do_group_exit+0x54/0x1d0
      [<ffffffff8088bfc0>] __wake_up_parent+0x0/0x18
      
      ---[ end trace c7b38293191c57dc ]---
      BUG: Bad rss-counter state mm:80000003fa168000 idx:1 val:1536
      
      Fix by not clearing the _PAGE_HUGE bit.
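
      A minimal sketch of the idea, using made-up bit values rather than the
      real MIPS page table layout: when the permission bits of an entry are
      replaced, the huge-page flag has to be among the bits that survive.
      DEMO_PAGE_HUGE, DEMO_PROT_MASK, DEMO_PFN_MASK and demo_pmd_modify() are
      hypothetical names for illustration only.

```c
#include <stdint.h>

/* Hypothetical bit layout for illustration; not the MIPS definitions. */
#define DEMO_PAGE_HUGE  (UINT64_C(1) << 4)
#define DEMO_PROT_MASK  UINT64_C(0x0f)       /* permission bits */
#define DEMO_PFN_MASK   (~UINT64_C(0xfff))   /* address bits */

/*
 * Replace the permission bits of an entry. Bits set in keep_mask are
 * carried over from the old entry; all other bits come from newprot.
 */
static uint64_t demo_pmd_modify(uint64_t entry, uint64_t newprot,
				uint64_t keep_mask)
{
	return (entry & keep_mask) | (newprot & ~keep_mask);
}
```

      With keep_mask set to DEMO_PFN_MASK alone the huge flag is dropped;
      adding DEMO_PAGE_HUGE to keep_mask preserves it.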

      Signed-off-by: David Daney <david.daney@cavium.com>
      Tested-by: Aaro Koskinen <aaro.koskinen@nokia.com>
      Cc: stable@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/13687/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
  2. 19 Jul, 2016 7 commits
    • qeth: delete napi struct when removing a qeth device · 8c2cb775
      Ursula Braun authored
      [ Upstream commit 7831b4ff ]
      
      A qeth_card contains a napi_struct linked to the net_device during
      device probing. This struct must be deleted when removing the qeth
      device, otherwise a panic on oops can occur when qeth devices are
      repeatedly removed and added.
      
      Fixes: a1c3ed4c ("qeth: NAPI support for l2 and l3 discipline")
      Cc: stable@vger.kernel.org # v2.6.37+
      Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
      Tested-by: Alexander Klein <ALKL@de.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
    • ALSA: timer: Fix negative queue usage by racy accesses · a851e568
      Takashi Iwai authored
      [ Upstream commit 3fa6993f ]
      
      The user timer tu->qused counter may go to a negative value when
      multiple concurrent reads are performed, since the check and the
      decrement of tu->qused are done in two separate locked contexts.
      This results in bogus readouts and an endless loop on the
      user-space side.
      
      The fix is to move the decrement of the tu->qused counter into the
      same spinlock context as the zero-check of the counter.
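
      The pattern can be sketched in user space as follows. This is an
      illustration only: it uses a pthread mutex in place of the kernel
      spinlock, and struct demo_queue and demo_queue_pop() are hypothetical
      names, not the ALSA timer code.

```c
#include <pthread.h>
#include <stdbool.h>

#define DEMO_QUEUE_SIZE 8

/* Hypothetical stand-in for the user timer queue. */
struct demo_queue {
	pthread_mutex_t lock;
	int qused;			/* number of queued events */
	int qhead;			/* index of the next event to read */
	int events[DEMO_QUEUE_SIZE];
};

/*
 * Take one event. The emptiness check and the decrement share a single
 * critical section, so two readers cannot both consume the last event
 * and push qused below zero.
 */
static bool demo_queue_pop(struct demo_queue *q, int *out)
{
	bool ok = false;

	pthread_mutex_lock(&q->lock);
	if (q->qused > 0) {
		*out = q->events[q->qhead];
		q->qhead = (q->qhead + 1) % DEMO_QUEUE_SIZE;
		q->qused--;
		ok = true;
	}
	pthread_mutex_unlock(&q->lock);
	return ok;
}
```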
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
    • block: fix use-after-free in sys_ioprio_get() · b86ef7ef
      Omar Sandoval authored
      [ Upstream commit 8ba86821 ]
      
      get_task_ioprio() accesses the task->io_context without holding the task
      lock and thus can race with exit_io_context(), leading to a
      use-after-free. The reproducer below hits this within a few seconds on
      my 4-core QEMU VM:
      
      #define _GNU_SOURCE
      #include <assert.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      #include <sys/wait.h>
      
      int main(int argc, char **argv)
      {
      	pid_t pid, child;
      	long nproc, i;
      
      	/* ioprio_set(IOPRIO_WHO_PROCESS, 0, IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0)); */
      	syscall(SYS_ioprio_set, 1, 0, 0x6000);
      
      	nproc = sysconf(_SC_NPROCESSORS_ONLN);
      
      	for (i = 0; i < nproc; i++) {
      		pid = fork();
      		assert(pid != -1);
      		if (pid == 0) {
      			for (;;) {
      				pid = fork();
      				assert(pid != -1);
      				if (pid == 0) {
      					_exit(0);
      				} else {
      					child = wait(NULL);
      					assert(child == pid);
      				}
      			}
      		}
      
      		pid = fork();
      		assert(pid != -1);
      		if (pid == 0) {
      			for (;;) {
      				/* ioprio_get(IOPRIO_WHO_PGRP, 0); */
      				syscall(SYS_ioprio_get, 2, 0);
      			}
      		}
      	}
      
      	for (;;) {
      		/* ioprio_get(IOPRIO_WHO_PGRP, 0); */
      		syscall(SYS_ioprio_get, 2, 0);
      	}
      
      	return 0;
      }
      
      This gets us KASAN dumps like this:
      
      [   35.526914] ==================================================================
      [   35.530009] BUG: KASAN: out-of-bounds in get_task_ioprio+0x7b/0x90 at addr ffff880066f34e6c
      [   35.530009] Read of size 2 by task ioprio-gpf/363
      [   35.530009] =============================================================================
      [   35.530009] BUG blkdev_ioc (Not tainted): kasan: bad access detected
      [   35.530009] -----------------------------------------------------------------------------
      
      [   35.530009] Disabling lock debugging due to kernel taint
      [   35.530009] INFO: Allocated in create_task_io_context+0x2b/0x370 age=0 cpu=0 pid=360
      [   35.530009] 	___slab_alloc+0x55d/0x5a0
      [   35.530009] 	__slab_alloc.isra.20+0x2b/0x40
      [   35.530009] 	kmem_cache_alloc_node+0x84/0x200
      [   35.530009] 	create_task_io_context+0x2b/0x370
      [   35.530009] 	get_task_io_context+0x92/0xb0
      [   35.530009] 	copy_process.part.8+0x5029/0x5660
      [   35.530009] 	_do_fork+0x155/0x7e0
      [   35.530009] 	SyS_clone+0x19/0x20
      [   35.530009] 	do_syscall_64+0x195/0x3a0
      [   35.530009] 	return_from_SYSCALL_64+0x0/0x6a
      [   35.530009] INFO: Freed in put_io_context+0xe7/0x120 age=0 cpu=0 pid=1060
      [   35.530009] 	__slab_free+0x27b/0x3d0
      [   35.530009] 	kmem_cache_free+0x1fb/0x220
      [   35.530009] 	put_io_context+0xe7/0x120
      [   35.530009] 	put_io_context_active+0x238/0x380
      [   35.530009] 	exit_io_context+0x66/0x80
      [   35.530009] 	do_exit+0x158e/0x2b90
      [   35.530009] 	do_group_exit+0xe5/0x2b0
      [   35.530009] 	SyS_exit_group+0x1d/0x20
      [   35.530009] 	entry_SYSCALL_64_fastpath+0x1a/0xa4
      [   35.530009] INFO: Slab 0xffffea00019bcd00 objects=20 used=4 fp=0xffff880066f34ff0 flags=0x1fffe0000004080
      [   35.530009] INFO: Object 0xffff880066f34e58 @offset=3672 fp=0x0000000000000001
      [   35.530009] ==================================================================
      
      Fix it by grabbing the task lock while we poke at the io_context.
      
      Cc: stable@vger.kernel.org
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
    • x86/amd_nb: Fix boot crash on non-AMD systems · 82736d51
      Borislav Petkov authored
      [ Upstream commit 1ead852d ]
      
      Fix boot crash that triggers if this driver is built into a kernel and
      run on non-AMD systems.
      
      AMD northbridge users call amd_cache_northbridges() and it returns
      a negative value to signal that we weren't able to cache/detect any
      northbridges on the system.
      
      At least, it should do so, as all its callers expect it to. But it
      returns a negative value only when kmalloc() fails.
      
      Fix it to return -ENODEV if there are no NBs cached, as otherwise amd_nb
      users like amd64_edac, which rely on it to know whether they should
      load or not, get loaded on systems like Intel Xeons where they
      shouldn't.
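
      A small sketch of the contract described above, under assumed names:
      demo_cache_northbridges() and demo_edac_init() are hypothetical
      stand-ins, and the count of detected devices is passed in as a
      parameter purely so that the example is self-contained.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical cache helper: report "no such device" explicitly when
 * nothing was detected, instead of returning success with an empty cache.
 */
static int demo_cache_northbridges(size_t detected)
{
	if (detected == 0)
		return -ENODEV;
	return 0;
}

/* Hypothetical user of the cache: refuses to load when the helper fails. */
static int demo_edac_init(size_t detected)
{
	int err = demo_cache_northbridges(detected);

	if (err < 0)
		return err;
	return 0;
}
```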

      Reported-and-tested-by: Tony Battersby <tonyb@cybernetics.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1466097230-5333-2-git-send-email-bp@alien8.de
      Link: https://lkml.kernel.org/r/5761BEB0.9000807@cybernetics.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
    • ALSA: au88x0: Fix calculation in vortex_wtdma_bufshift() · fa48403a
      Takashi Iwai authored
      [ Upstream commit 62db7152 ]
      
      The vortex_wtdma_bufshift() function calculates the page index
      wrongly: it masks first and then shifts, which always results in zero.
      The proper computation is to first shift, then mask.
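
      The difference between the two orders can be seen in a small sketch.
      The field position and width here (DEMO_PAGE_SHIFT, DEMO_PAGE_MASK) are
      made up for illustration and are not the au88x0 register layout.

```c
#include <stdint.h>

/* Hypothetical layout: a 2-bit page index stored in bits 12..13. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_MASK  0x3u

/* Buggy order: the low-bit mask throws the field away before the shift. */
static uint32_t page_index_mask_then_shift(uint32_t reg)
{
	return (reg & DEMO_PAGE_MASK) >> DEMO_PAGE_SHIFT;
}

/* Correct order: move the field down to bit 0, then mask it. */
static uint32_t page_index_shift_then_mask(uint32_t reg)
{
	return (reg >> DEMO_PAGE_SHIFT) & DEMO_PAGE_MASK;
}
```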

      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
    • ipr: Clear interrupt on croc/crocodile when running with LSI · 9f0a4f41
      Brian King authored
      [ Upstream commit 54e430bb ]
      
      If we fall back to using LSI on the Croc or Crocodile chip we need to
      clear the interrupt so we don't hang the system.
      
      Cc: <stable@vger.kernel.org>
      Tested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
    • ALSA: echoaudio: Fix memory allocation · 9d171576
      Christophe JAILLET authored
      [ Upstream commit 9c6795a9 ]
      
      'commpage_bak' is allocated with 'sizeof(struct echoaudio)' bytes.
      We then copy 'sizeof(struct comm_page)' bytes in it.
      On my system, smatch complains because one is 2960 and the other is 3072.
      
      This would result in memory corruption or an oops.
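
      One common way to keep the allocation and the copy in agreement is to
      size both from the destination pointer. The sketch below is a
      user-space illustration: the two structs are hypothetical and merely
      borrow the two sizes quoted above (which size belongs to which
      structure is an assumption), and malloc() stands in for the kernel
      allocator.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical structures; not the driver's real definitions. */
struct demo_chip      { unsigned char state[2960]; };
struct demo_comm_page { unsigned char data[3072]; };

/*
 * Back up a comm page. sizeof(*bak) ties both the allocation and the
 * copy to the type actually being copied, so they cannot drift apart.
 */
static struct demo_comm_page *demo_backup_comm_page(const struct demo_comm_page *src)
{
	struct demo_comm_page *bak = malloc(sizeof(*bak));

	if (!bak)
		return NULL;
	memcpy(bak, src, sizeof(*bak));
	return bak;
}
```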

      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
  3. 13 Jul, 2016 1 commit
  4. 12 Jul, 2016 30 commits
    • tmpfs: fix regression hang in fallocate undo · 52a8202d
      Hugh Dickins authored
      [ Upstream commit 7f556567 ]
      
      The well-spotted fallocate undo fix is good in most cases, but not when
      fallocate failed on the very first page.  index 0 then passes lend -1
      to shmem_undo_range(), and that has two bad effects: (a) that it will
      undo every fallocation throughout the file, unrestricted by the current
      range; but more importantly (b) it can cause the undo to hang, because
      lend -1 is treated as truncation, which makes it keep on retrying until
      every page has gone, but those already fully instantiated will never go
      away.  Big thank you to xfstests generic/269 which demonstrates this.
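
      The arithmetic behind the hang can be sketched as follows. This is an
      illustration under assumed names (demo_undo_lend(), demo_should_undo())
      and an assumed 4 KiB page size; the guard shown is one way to avoid the
      index 0 case and is not claimed to be the exact upstream change.

```c
#include <stdbool.h>

#define DEMO_PAGE_SHIFT 12	/* assumes 4 KiB pages */

/* Last byte to undo when fallocate stopped at page `index`. */
static long long demo_undo_lend(unsigned long index)
{
	return ((long long)index << DEMO_PAGE_SHIFT) - 1;
}

/* Only undo when at least one page before `index` was instantiated. */
static bool demo_should_undo(unsigned long start, unsigned long index)
{
	return index > start;
}
```

      For index 0 the computed end is -1, the value that is treated as
      truncation, which is why the undo has to be skipped in that case.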
      
      Fixes: b9b4bb26 ("tmpfs: don't undo fallocate past its last page")
      Cc: stable@vger.kernel.org
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • netfilter: x_tables: introduce and use xt_copy_counters_from_user · 6a9f9d4e
      Florian Westphal authored
      [ Upstream commit d7591f0c ]
      
      The three variants use the same copy&pasted code; condense this into a
      helper and use that.
      
      Make sure info.name is 0-terminated.

      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • netfilter: x_tables: do compat validation via translate_table · aaa155f8
      Florian Westphal authored
      [ Upstream commit 09d96860 ]
      
      This looks like refactoring, but it's also a bug fix.
      
      The problem is that the compat path (32bit iptables, 64bit kernel) lacks a few
      sanity tests that are done in the normal path.
      
      For example, we do not check for underflows and the base chain policies.
      
      While it's possible to also add such checks to the compat path, it's more
      copy&pastry; for instance, we cannot reuse the check_underflow() helper as
      e->target_offset differs in the compat case.
      
      Another problem is that it makes auditing for validation errors harder; two
      places need to be checked and kept in sync.
      
      At a high level 32 bit compat works like this:
      1- initial pass over blob:
         validate match/entry offsets, bounds checking
         lookup all matches and targets
         do bookkeeping wrt. size delta of 32/64bit structures
         assign match/target.u.kernel pointer (points at kernel
         implementation, needed to access ->compatsize etc.)
      
      2- allocate memory according to the total bookkeeping size to
         contain the translated ruleset
      
      3- second pass over original blob:
         for each entry, copy the 32bit representation to the newly allocated
         memory.  This also does any special match translations (e.g.
         adjust 32bit to 64bit longs, etc).
      
      4- check if ruleset is free of loops (chase all jumps)
      
      5- first pass over translated blob:
         call the checkentry function of all matches and targets.
      
      The alternative implemented by this patch is to drop steps 3&4 from the
      compat process; the translation is changed into an intermediate step
      rather than a full 1:1 translate_table replacement.
      
      In the 2nd pass (step #3), change the 64bit ruleset back to a kernel
      representation, i.e. put() the kernel pointer and restore ->u.user.name .
      
      This gets us a 64bit ruleset that is in the format generated by a 64bit
      iptables userspace -- we can then use translate_table() to get the
      'native' sanity checks.
      
      This has two drawbacks:
      
      1. we re-validate all the match and target entry structure sizes even
      though compat translation is supposed to never generate bogus offsets.
      2. we put and then re-lookup each match and target.
      
      The upside is that we get all sanity tests and ruleset validations
      provided by the normal path and can remove some duplicated compat code.
      
      iptables-restore time of autogenerated ruleset with 300k chains of form
      -A CHAIN0001 -m limit --limit 1/s -j CHAIN0002
      -A CHAIN0002 -m limit --limit 1/s -j CHAIN0003
      
      shows no noticeable differences in restore times:
      old:   0m30.796s
      new:   0m31.521s
      64bit: 0m25.674s

      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • netfilter: x_tables: xt_compat_match_from_user doesn't need a retval · edd8ae36
      Florian Westphal authored
      [ Upstream commit 0188346f ]
      
      Always returned 0.

      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • Florian Westphal
      accd9c00
    • Florian Westphal
      ad4ef8cc
    • Florian Westphal
      cae1c6c2
    • netfilter: x_tables: don't reject valid target size on some architectures · bbb7ecc8
      Florian Westphal authored
      [ Upstream commit 7b7eba0f ]
      
      Quoting John Stultz:
        In updating a 32bit arm device from 4.6 to Linus' current HEAD, I
        noticed I was having some trouble with networking, and realized that
        /proc/net/ip_tables_names was suddenly empty.
        Digging through the registration process, it seems we're catching on the:
      
         if (strcmp(t->u.user.name, XT_STANDARD_TARGET) == 0 &&
             target_offset + sizeof(struct xt_standard_target) != next_offset)
               return -EINVAL;
      
        Where next_offset seems to be 4 bytes larger than the
        offset + standard_target struct size.
      
      next_offset needs to be aligned via XT_ALIGN (so we can access all members
      of ip(6)t_entry struct).
      
      This problem didn't show up on i686 as it only needs 4-byte alignment for
      u64, but iptables userspace on other 32bit arches does insert extra padding.
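
      The effect of the padding can be sketched with plain arithmetic.
      demo_align_up() and demo_standard_target_fits() are hypothetical
      helpers, and the offsets and sizes used with them are made-up numbers
      chosen only to reproduce a 4-byte difference; they are not taken from
      the kernel structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Round n up to a multiple of align (align must be a power of two). */
static size_t demo_align_up(size_t n, size_t align)
{
	return (n + align - 1) & ~(align - 1);
}

/*
 * Does a target of target_size bytes at target_offset end exactly at
 * next_offset once alignment padding is taken into account?
 */
static bool demo_standard_target_fits(size_t target_offset, size_t target_size,
				      size_t next_offset, size_t align)
{
	return demo_align_up(target_offset + target_size, align) == next_offset;
}
```

      With 4-byte alignment a target ending at byte 148 needs no padding;
      with 8-byte alignment the next entry starts at 152, so a comparison
      without the rounding step rejects a well-formed rule.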

      Reported-by: John Stultz <john.stultz@linaro.org>
      Tested-by: John Stultz <john.stultz@linaro.org>
      Fixes: 7ed2abdd ("netfilter: x_tables: check standard target size too")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • netfilter: x_tables: validate all offsets and sizes in a rule · 3b94ada4
      Florian Westphal authored
      [ Upstream commit 13631bfc ]
      
      Validate that all matches (if any) add up to the beginning of
      the target and that each match covers at least the base structure size.
      
      The compat path should be able to safely re-use the function
      as the structures only differ in alignment; added a
      BUILD_BUG_ON just in case we have an arch that adds padding as well.

      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • netfilter: x_tables: check for bogus target offset · 7ba6a7df
      Florian Westphal authored
      [ Upstream commit ce683e5f ]
      
      We're currently asserting that targetoff + targetsize <= nextoff.
      
      Extend it to also check that targetoff is >= sizeof(xt_entry).
      Since this is generic code, add an argument pointing to the start of the
      match/target, we can then derive the base structure size from the delta.
      
      We also need the e->elems pointer in a followup change to validate matches.
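
      The two bounds described above can be sketched as a standalone check.
      demo_target_offsets_ok() and its parameters are hypothetical and work
      on plain numbers instead of the kernel's entry structures; base_size
      stands in for the size of the base structure.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Check that the target starts at or after the end of the base structure
 * and that target_offset + target_size does not run past next_offset.
 * The subtraction form avoids overflow in the addition.
 */
static bool demo_target_offsets_ok(size_t base_size, size_t target_offset,
				   size_t target_size, size_t next_offset)
{
	if (target_offset < base_size)
		return false;	/* target would overlap the base structure */
	if (target_offset > next_offset)
		return false;	/* target would start beyond the rule */
	if (target_size > next_offset - target_offset)
		return false;	/* target would extend past the rule */
	return true;
}
```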

      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • netfilter: x_tables: check standard target size too · c1380ecb
      Florian Westphal authored
      [ Upstream commit 7ed2abdd ]
      
      We have targets and standard targets -- the latter carries a verdict.
      
      The ip/ip6tables validation functions will access t->verdict for the
      standard targets to fetch the jump offset or verdict for chainloop
      detection, but this happens before the targets get checked/validated.
      
      Thus we also need to check for verdict presence here, else t->verdict
      can point right after a blob.
      
      Spotted with UBSAN while testing malformed blobs.

      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • netfilter: x_tables: add compat version of xt_check_entry_offsets · 7ef13f49
      Florian Westphal authored
      [ Upstream commit fc1221b3 ]
      
      32bit rulesets have different layout and alignment requirements, so once
      more integrity checks get added to xt_check_entry_offsets it will reject
      well-formed 32bit rulesets.

      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • netfilter: x_tables: assert minimum target size · 37a6fed6
      Florian Westphal authored
      [ Upstream commit a08e4e19 ]
      
      The target size includes the size of the xt_entry_target struct.

      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • netfilter: x_tables: kill check_entry helper · dda46754
      Florian Westphal authored
      [ Upstream commit aa412ba2 ]
      
      Once we add more sanity testing to xt_check_entry_offsets it
      becomes relevant if we're expecting a 32bit 'config_compat' blob
      or a normal one.
      
      Since we already have a lot of similar-named functions (check_entry,
      compat_check_entry, find_and_check_entry, etc.) and the current
      incarnation is short, just fold its contents into the callers.

      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • netfilter: x_tables: add and use xt_check_entry_offsets · 62e6fd20
      Florian Westphal authored
      [ Upstream commit 7d35812c ]
      
      Currently arp/ip and ip6tables each implement a short helper to check that
      the target offset is large enough to hold one xt_entry_target struct and
      that t->u.target_size fits within the current rule.
      
      Unfortunately these checks are not sufficient.
      
      To avoid adding new tests to all of ip/ip6/arptables, move the current
      checks into a helper, then extend this helper in followup patches.

      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • netfilter: x_tables: validate targets of jumps · 674c7e17
      Florian Westphal authored
      [ Upstream commit 36472341 ]
      
      When we see a jump, also check that the offset gets us to the beginning of
      a rule (an ipt_entry).
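
      A simplified sketch of such a check: walk the rules from the start of
      the blob and accept the jump only if its offset coincides with the
      start of one of them. The flat array of rule sizes and the name
      demo_jump_hits_rule_start() are assumptions made for this example; the
      kernel walks the entries themselves.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * rule_sizes[i] is the size in bytes of rule i, so rule i starts at the
 * sum of all earlier sizes. Return true only if jump_offset is exactly
 * the start of some rule.
 */
static bool demo_jump_hits_rule_start(const size_t *rule_sizes, size_t n,
				      size_t jump_offset)
{
	size_t off = 0;

	for (size_t i = 0; i < n; i++) {
		if (off == jump_offset)
			return true;
		off += rule_sizes[i];
	}
	return false;	/* mid-rule, or at/after the end of the blob */
}
```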
      
      The extra overhead is negligible, even with absurd cases.
      
      300k custom rules, 300k jumps to 'next' user chain:
      [ plus one jump from INPUT to first userchain ]:
      
      Before:
      real    0m24.874s
      user    0m7.532s
      sys     0m16.076s
      
      After:
      real    0m27.464s
      user    0m7.436s
      sys     0m18.840s

      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • netfilter: x_tables: don't move to non-existent next rule · d1fe825e
      Florian Westphal authored
      [ Upstream commit f24e230d ]
      
      Ben Hawkes says:
      
       In the mark_source_chains function (net/ipv4/netfilter/ip_tables.c) it
       is possible for a user-supplied ipt_entry structure to have a large
       next_offset field. This field is not bounds checked prior to writing a
       counter value at the supplied offset.
      
      Base chains enforce absolute verdict.
      
      User defined chains are supposed to end with an unconditional return,
      xtables userspace adds them automatically.
      
      But if such a return is missing we will move to a non-existent next rule.

      Reported-by: Ben Hawkes <hawkes@google.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • netfilter: x_tables: fix unconditional helper · c2a1b8ee
      Florian Westphal authored
      [ Upstream commit 54d83fc7 ]
      
      Ben Hawkes says:
      
       In the mark_source_chains function (net/ipv4/netfilter/ip_tables.c) it
       is possible for a user-supplied ipt_entry structure to have a large
       next_offset field. This field is not bounds checked prior to writing a
       counter value at the supplied offset.
      
      The problem is that mark_source_chains should not have been called --
      the rule doesn't have a next entry, so it's supposed to return
      an absolute verdict of either ACCEPT or DROP.
      
      However, the function unconditional() doesn't work as the name implies.
      It only checks that the rule is using wildcard address matching.
      
      However, an unconditional rule must also not be using any matches
      (no -m args).
      
      The underflow validator only checked the addresses, therefore
      passing the 'unconditional absolute verdict' test, while
      mark_source_chains also tested for presence of matches, and thus
      proceeded to the next (non-existent) rule.
      
      Unify this so that all the callers have the same idea of 'unconditional rule'.

      Reported-by: Ben Hawkes <hawkes@google.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • netfilter: x_tables: make sure e->next_offset covers remaining blob size · 66b7376b
      Florian Westphal authored
      [ Upstream commit 6e94e0cf ]
      
      Otherwise this function may read data beyond the ruleset blob.

      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • netfilter: x_tables: validate e->target_offset early · 6a401cf3
      Florian Westphal authored
      [ Upstream commit bdf533de ]
      
      We should check that e->target_offset is sane before
      mark_source_chains gets called since it will fetch the target entry
      for loop detection.

      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • MIPS: Fix 64k page support for 32 bit kernels. · 62b251ba
      Ralf Baechle authored
      [ Upstream commit d7de4134 ]
      
      TASK_SIZE was defined as 0x7fff8000UL which for 64k pages is not a
      multiple of the page size.  Somewhere further down the math fails
      such that executing an ELF binary fails.
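
      The alignment problem is easy to check by hand. The sketch below uses
      the TASK_SIZE value quoted above; DEMO_TASK_SIZE and
      demo_is_page_multiple() are names made up for this example.

```c
#include <stdbool.h>

/* The value quoted in the commit message. */
#define DEMO_TASK_SIZE 0x7fff8000UL

/* True if size is a whole number of pages (page_size is a power of two). */
static bool demo_is_page_multiple(unsigned long size, unsigned long page_size)
{
	return (size & (page_size - 1)) == 0;
}
```

      0x7fff8000 is a multiple of 4k and 16k, but with 64k pages the
      remainder is 0x8000, i.e. half a page.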

      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      Tested-by: Joshua Henderson <joshua.henderson@microchip.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • sparc64: Fix return from trap window fill crashes. · 78c026dc
      David S. Miller authored
      [ Upstream commit 7cafc0b8 ]
      
      We must handle data access exception as well as memory address unaligned
      exceptions from return from trap window fill faults, not just normal
      TLB misses.
      
      Otherwise we can get an OOPS that looks like this:
      
      ld-linux.so.2(36808): Kernel bad sw trap 5 [#1]
      CPU: 1 PID: 36808 Comm: ld-linux.so.2 Not tainted 4.6.0 #34
      task: fff8000303be5c60 ti: fff8000301344000 task.ti: fff8000301344000
      TSTATE: 0000004410001601 TPC: 0000000000a1a784 TNPC: 0000000000a1a788 Y: 00000002    Not tainted
      TPC: <do_sparc64_fault+0x5c4/0x700>
      g0: fff8000024fc8248 g1: 0000000000db04dc g2: 0000000000000000 g3: 0000000000000001
      g4: fff8000303be5c60 g5: fff800030e672000 g6: fff8000301344000 g7: 0000000000000001
      o0: 0000000000b95ee8 o1: 000000000000012b o2: 0000000000000000 o3: 0000000200b9b358
      o4: 0000000000000000 o5: fff8000301344040 sp: fff80003013475c1 ret_pc: 0000000000a1a77c
      RPC: <do_sparc64_fault+0x5bc/0x700>
      l0: 00000000000007ff l1: 0000000000000000 l2: 000000000000005f l3: 0000000000000000
      l4: fff8000301347e98 l5: fff8000024ff3060 l6: 0000000000000000 l7: 0000000000000000
      i0: fff8000301347f60 i1: 0000000000102400 i2: 0000000000000000 i3: 0000000000000000
      i4: 0000000000000000 i5: 0000000000000000 i6: fff80003013476a1 i7: 0000000000404d4c
      I7: <user_rtt_fill_fixup+0x6c/0x7c>
      Call Trace:
       [0000000000404d4c] user_rtt_fill_fixup+0x6c/0x7c
      
      The window trap handlers are slightly clever: the trap table entries for them are
      composed of two pieces of code.  First comes the code that actually performs
      the window fill or spill trap handling, and then there are three instructions at
      the end which are for exception processing.
      
      The userland register window fill handler is:
      
      	add	%sp, STACK_BIAS + 0x00, %g1;		\
      	ldxa	[%g1 + %g0] ASI, %l0;			\
      	mov	0x08, %g2;				\
      	mov	0x10, %g3;				\
      	ldxa	[%g1 + %g2] ASI, %l1;			\
      	mov	0x18, %g5;				\
      	ldxa	[%g1 + %g3] ASI, %l2;			\
      	ldxa	[%g1 + %g5] ASI, %l3;			\
      	add	%g1, 0x20, %g1;				\
      	ldxa	[%g1 + %g0] ASI, %l4;			\
      	ldxa	[%g1 + %g2] ASI, %l5;			\
      	ldxa	[%g1 + %g3] ASI, %l6;			\
      	ldxa	[%g1 + %g5] ASI, %l7;			\
      	add	%g1, 0x20, %g1;				\
      	ldxa	[%g1 + %g0] ASI, %i0;			\
      	ldxa	[%g1 + %g2] ASI, %i1;			\
      	ldxa	[%g1 + %g3] ASI, %i2;			\
      	ldxa	[%g1 + %g5] ASI, %i3;			\
      	add	%g1, 0x20, %g1;				\
      	ldxa	[%g1 + %g0] ASI, %i4;			\
      	ldxa	[%g1 + %g2] ASI, %i5;			\
      	ldxa	[%g1 + %g3] ASI, %i6;			\
      	ldxa	[%g1 + %g5] ASI, %i7;			\
      	restored;					\
      	retry; nop; nop; nop; nop;			\
      	b,a,pt	%xcc, fill_fixup_dax;			\
      	b,a,pt	%xcc, fill_fixup_mna;			\
      	b,a,pt	%xcc, fill_fixup;
      
      And the way this works is that if any of those memory accesses
      generate an exception, the exception handler can revector to one of
      those final three branch instructions depending upon which kind of
      exception the memory access took.  In this way, the fault handler
      doesn't have to know if it was a spill or a fill that it's handling
      the fault for.  It just always branches to the last instruction in
      the parent trap's handler.
      
      For example, for a regular fault, the code goes:
      
      winfix_trampoline:
      	rdpr	%tpc, %g3
      	or	%g3, 0x7c, %g3
      	wrpr	%g3, %tnpc
      	done
      
      All window trap handlers are 0x80 aligned, so if we "or" 0x7c into the
      trap time program counter, we'll get that final instruction in the
      trap handler.
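The address arithmetic above can be sketched in C. This is only an illustration of the "or 0x7c" trick, not kernel code: the function names are hypothetical, and it assumes 4-byte aligned instruction addresses and 0x80-aligned handler slots as described in the text. The 0x74 and 0x78 offsets are derived from the fill handler listing above, whose last three of 32 instruction slots hold the fixup branches.

```c
#include <stdint.h>

/* Each window trap handler occupies one 0x80-byte aligned slot,
 * i.e. 32 instructions of 4 bytes each. */
#define WINDOW_HANDLER_SIZE 0x80u

/* Offsets of the three trailing branch slots (derived from the listing). */
#define FIXUP_DAX_OFFSET 0x74u /* data access exception */
#define FIXUP_MNA_OFFSET 0x78u /* memory address unaligned */
#define FIXUP_OFFSET     0x7cu /* regular fault: last instruction */

/* Start of the handler slot that contains tpc (hypothetical helper). */
static uint64_t handler_base(uint64_t tpc)
{
    return tpc & ~(uint64_t)(WINDOW_HANDLER_SIZE - 1);
}

/* Models "or %g3, 0x7c, %g3": because tpc is 4-byte aligned and the
 * slot is 0x80 aligned, OR-ing in 0x7c always yields the address of
 * the final instruction of the same slot. */
static uint64_t winfix_target(uint64_t tpc)
{
    return tpc | FIXUP_OFFSET;
}
```

For any faulting instruction inside a slot, `winfix_target()` equals `handler_base() + 0x7c`, which is why the fault handler does not need to know whether it interrupted a spill or a fill.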
      
      On return from trap, we have to pull the register window in, but we do
      this by hand instead of just executing a "restore" instruction for
      several reasons.  The largest is that from Niagara onward we
      simply don't have enough levels in the trap stack to fully resolve all
      possible exception cases of a window fault when we are already at
      trap level 1 (which we enter to get ready to return from the original
      trap).
      
      This is executed inline via the FILL_*_RTRAP handlers.  rtrap_64.S's
      code branches directly to these to do the window fill by hand if
      necessary.  If you look at them, you'll see at the end:
      
      	    ba,a,pt    %xcc, user_rtt_fill_fixup;
      	    ba,a,pt    %xcc, user_rtt_fill_fixup;
      	    ba,a,pt    %xcc, user_rtt_fill_fixup;
      
      And oops, all three cases are handled like a fault.
      
      This doesn't work because each of these trap types (data access
      exception, memory address unaligned, and faults) stores its auxiliary
      info in different registers to pass on to the C handler which does the
      real work.
      
      So in the case where the stack was unaligned, the unaligned trap
      handler sets up the arg registers one way, and then we branch to
      the fault handler, which expects them set up another way.
      
      So the FAULT_TYPE_* value ends up basically being garbage, and
      would randomly generate the backtrace seen above.
      Reported-by: Nick Alcock <nix@esperi.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      78c026dc
    • David S. Miller's avatar
      sparc: Harden signal return frame checks. · 16c19336
      David S. Miller authored
      [ Upstream commit d11c2a0d ]
      
      All signal frames must be at least 16-byte aligned, because that is
      the alignment we explicitly create when we build signal return stack
      frames.
      
      All stack pointers must be at least 8-byte aligned.
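The two alignment rules above amount to simple mask checks. The following is a minimal sketch, not the kernel's implementation: the helper names are hypothetical, and it assumes plain (unbiased) addresses, ignoring the sparc64 stack bias.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of align; align must be a power of two. */
static bool is_aligned(uint64_t addr, uint64_t align)
{
    return (addr & (align - 1)) == 0;
}

/* Signal frames are built 16-byte aligned, so anything else is rejected. */
static bool signal_frame_ok(uint64_t frame_addr)
{
    return is_aligned(frame_addr, 16);
}

/* Stack pointers must be at least 8-byte aligned. */
static bool stack_pointer_ok(uint64_t sp)
{
    return is_aligned(sp, 8);
}
```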
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      16c19336
    • David S. Miller's avatar
      sparc64: Take ctx_alloc_lock properly in hugetlb_setup(). · 705de0f2
      David S. Miller authored
      [ Upstream commit 9ea46abe ]
      
      On cheetahplus chips we take the ctx_alloc_lock in order to
      modify the TLB lookup parameters for the indexed TLBs, which
      are stored in the context register.
      
      This is called with interrupts disabled; however, ctx_alloc_lock
      is an IRQ-safe lock, therefore we must acquire/release it
      properly with spin_{lock,unlock}_irq().
      Reported-by: Meelis Roos <mroos@linux.ee>
      Tested-by: Meelis Roos <mroos@linux.ee>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      705de0f2
    • Babu Moger's avatar
      sparc/PCI: Fix for panic while enabling SR-IOV · 2a3e4b3c
      Babu Moger authored
      [ Upstream commit d0c31e02 ]
      
      We noticed this panic while enabling SR-IOV on sparc:
      
      mlx4_core: Mellanox ConnectX core driver v2.2-1 (Jan  1 2015)
      mlx4_core: Initializing 0007:01:00.0
      mlx4_core 0007:01:00.0: Enabling SR-IOV with 5 VFs
      mlx4_core: Initializing 0007:01:00.1
      Unable to handle kernel NULL pointer dereference
      insmod(10010): Oops [#1]
      CPU: 391 PID: 10010 Comm: insmod Not tainted
      		4.1.12-32.el6uek.kdump2.sparc64 #1
      TPC: <dma_supported+0x20/0x80>
      I7: <__mlx4_init_one+0x324/0x500 [mlx4_core]>
      Call Trace:
       [00000000104c5ea4] __mlx4_init_one+0x324/0x500 [mlx4_core]
       [00000000104c613c] mlx4_init_one+0xbc/0x120 [mlx4_core]
       [0000000000725f14] local_pci_probe+0x34/0xa0
       [0000000000726028] pci_call_probe+0xa8/0xe0
       [0000000000726310] pci_device_probe+0x50/0x80
       [000000000079f700] really_probe+0x140/0x420
       [000000000079fa24] driver_probe_device+0x44/0xa0
       [000000000079fb5c] __device_attach+0x3c/0x60
       [000000000079d85c] bus_for_each_drv+0x5c/0xa0
       [000000000079f588] device_attach+0x88/0xc0
       [000000000071acd0] pci_bus_add_device+0x30/0x80
       [0000000000736090] virtfn_add.clone.1+0x210/0x360
       [00000000007364a4] sriov_enable+0x2c4/0x520
       [000000000073672c] pci_enable_sriov+0x2c/0x40
       [00000000104c2d58] mlx4_enable_sriov+0xf8/0x180 [mlx4_core]
       [00000000104c49ac] mlx4_load_one+0x42c/0xd40 [mlx4_core]
      Disabling lock debugging due to kernel taint
      Caller[00000000104c5ea4]: __mlx4_init_one+0x324/0x500 [mlx4_core]
      Caller[00000000104c613c]: mlx4_init_one+0xbc/0x120 [mlx4_core]
      Caller[0000000000725f14]: local_pci_probe+0x34/0xa0
      Caller[0000000000726028]: pci_call_probe+0xa8/0xe0
      Caller[0000000000726310]: pci_device_probe+0x50/0x80
      Caller[000000000079f700]: really_probe+0x140/0x420
      Caller[000000000079fa24]: driver_probe_device+0x44/0xa0
      Caller[000000000079fb5c]: __device_attach+0x3c/0x60
      Caller[000000000079d85c]: bus_for_each_drv+0x5c/0xa0
      Caller[000000000079f588]: device_attach+0x88/0xc0
      Caller[000000000071acd0]: pci_bus_add_device+0x30/0x80
      Caller[0000000000736090]: virtfn_add.clone.1+0x210/0x360
      Caller[00000000007364a4]: sriov_enable+0x2c4/0x520
      Caller[000000000073672c]: pci_enable_sriov+0x2c/0x40
      Caller[00000000104c2d58]: mlx4_enable_sriov+0xf8/0x180 [mlx4_core]
      Caller[00000000104c49ac]: mlx4_load_one+0x42c/0xd40 [mlx4_core]
      Caller[00000000104c5f90]: __mlx4_init_one+0x410/0x500 [mlx4_core]
      Caller[00000000104c613c]: mlx4_init_one+0xbc/0x120 [mlx4_core]
      Caller[0000000000725f14]: local_pci_probe+0x34/0xa0
      Caller[0000000000726028]: pci_call_probe+0xa8/0xe0
      Caller[0000000000726310]: pci_device_probe+0x50/0x80
      Caller[000000000079f700]: really_probe+0x140/0x420
      Caller[000000000079fa24]: driver_probe_device+0x44/0xa0
      Caller[000000000079fb08]: __driver_attach+0x88/0xa0
      Caller[000000000079d90c]: bus_for_each_dev+0x6c/0xa0
      Caller[000000000079f29c]: driver_attach+0x1c/0x40
      Caller[000000000079e35c]: bus_add_driver+0x17c/0x220
      Caller[00000000007a02d4]: driver_register+0x74/0x120
      Caller[00000000007263fc]: __pci_register_driver+0x3c/0x60
      Caller[00000000104f62bc]: mlx4_init+0x60/0xcc [mlx4_core]
      Kernel panic - not syncing: Fatal exception
      Press Stop-A (L1-A) to return to the boot prom
      ---[ end Kernel panic - not syncing: Fatal exception
      
      Details:
      Here is the call sequence:
      virtfn_add->__mlx4_init_one->dma_set_mask->dma_supported
      
      The panic happened at line 760 (file arch/sparc/kernel/iommu.c):
      
      758 int dma_supported(struct device *dev, u64 device_mask)
      759 {
      760         struct iommu *iommu = dev->archdata.iommu;
      761         u64 dma_addr_mask = iommu->dma_addr_mask;
      762
      763         if (device_mask >= (1UL << 32UL))
      764                 return 0;
      765
      766         if ((device_mask & dma_addr_mask) == dma_addr_mask)
      767                 return 1;
      768
      769 #ifdef CONFIG_PCI
      770         if (dev_is_pci(dev))
      771                 return pci64_dma_supported(to_pci_dev(dev), device_mask);
      772 #endif
      773
      774         return 0;
      775 }
      776 EXPORT_SYMBOL(dma_supported);
      
      The same panic also happened with the Intel ixgbe driver.
      
      The SR-IOV code looks for arch-specific data while enabling
      VFs. When a VF device is added, the driver probe function makes a set
      of calls to initialize the PCI device. Because the VF device is
      added in a different way than the normal PF device (which happens via
      of_create_pci_dev for sparc), some of the arch-specific initialization
      does not happen for the VF device.  That causes a panic when archdata is
      accessed.
      
      To fix this, I have used the already defined weak function
      pcibios_setup_device to copy archdata from the PF to the VF.
      I have also verified the fix.
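The idea of the fix can be modeled in user space as follows. This is a simplified sketch, not the patch itself: all struct and function names ending in `_model` are hypothetical stand-ins for the kernel types, and the NULL guard in `dma_supported_model()` is an addition for the sketch (the kernel function quoted above dereferences the pointer unconditionally).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct iommu, dev_archdata and pci_dev. */
struct iommu_model { uint64_t dma_addr_mask; };
struct archdata_model { struct iommu_model *iommu; };
struct pci_dev_model {
    int is_virtfn;                   /* non-zero for a VF */
    struct pci_dev_model *physfn;    /* parent PF, valid for a VF */
    struct archdata_model archdata;  /* arch-specific per-device data */
};

/* Models the per-device setup hook: a VF never went through the normal
 * PF creation path, so it inherits archdata from its parent PF. */
static void setup_device_model(struct pci_dev_model *dev)
{
    if (dev->is_virtfn && dev->physfn != NULL)
        dev->archdata = dev->physfn->archdata;
}

/* Models the mask check; returns 0 instead of crashing when the
 * archdata was never filled in. */
static int dma_supported_model(const struct pci_dev_model *dev,
                               uint64_t device_mask)
{
    const struct iommu_model *iommu = dev->archdata.iommu;

    if (iommu == NULL)
        return 0;
    return (device_mask & iommu->dma_addr_mask) == iommu->dma_addr_mask;
}
```

Before `setup_device_model()` runs, a VF in this model has a NULL iommu pointer, which corresponds to the dereference that panicked at line 760.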
      Signed-off-by: Babu Moger <babu.moger@oracle.com>
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      2a3e4b3c
    • David S. Miller's avatar
      sparc64: Fix sparc64_set_context stack handling. · b9dcd3de
      David S. Miller authored
      [ Upstream commit 397d1533 ]
      
      Like a signal return, we should use synchronize_user_stack() rather
      than flush_user_windows().
      Reported-by: Ilya Malakhov <ilmalakhovthefirst@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      b9dcd3de
    • Nitin Gupta's avatar
      sparc64: Fix numa node distance initialization · 0396a871
      Nitin Gupta authored
      [ Upstream commit 36beca65 ]
      
      Orabug: 22495713
      
      Currently, the NUMA node distance matrix is initialized only
      when a machine descriptor (MD) exists. However, sun4u
      machines (e.g. Sun Blade 2500) do not have an MD and thus
      distance values were left uninitialized. The initialization
      is now moved so that it happens on both sun4u and sun4v.
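A default initialization that does not depend on a machine descriptor could look roughly like this. It is a sketch under assumptions: the array and function names are hypothetical, the node count is arbitrary, and the values 10 and 20 follow the generic Linux LOCAL_DISTANCE and REMOTE_DISTANCE defaults.

```c
#define MAX_NODES_MODEL 4   /* arbitrary size for the sketch */
#define LOCAL_DISTANCE  10  /* distance of a node to itself */
#define REMOTE_DISTANCE 20  /* default distance to any other node */

static int node_distance_model[MAX_NODES_MODEL][MAX_NODES_MODEL];

/* Fill in defaults unconditionally, so that a platform without an MD
 * (sun4u) still ends up with a fully initialized matrix.  A platform
 * with an MD (sun4v) could overwrite entries with measured values
 * afterwards. */
static void init_node_distance_model(void)
{
    for (int from = 0; from < MAX_NODES_MODEL; from++)
        for (int to = 0; to < MAX_NODES_MODEL; to++)
            node_distance_model[from][to] =
                (from == to) ? LOCAL_DISTANCE : REMOTE_DISTANCE;
}
```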
      Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
      Tested-by: Mikael Pettersson <mikpelinux@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      0396a871
    • David S. Miller's avatar
      sparc64: Fix bootup regressions on some Kconfig combinations. · df7136e7
      David S. Miller authored
      [ Upstream commit 49fa5230 ]
      
      The system call tracing bug fix mentioned in the Fixes tag
      below increased the amount of assembler code in the sequence
      of assembler files included by head_64.S.
      
      This caused the total set of code to exceed 0x4000 bytes in
      size, which overflows the expression in head_64.S that works
      to place swapper_tsb at address 0x408000.
      
      When this is violated, neither the TSB nor the trap table is
      properly aligned.  All of this together results in failed boots.
      
      So, do two things:
      
      1) Simplify some code by using ba,a instead of ba/nop to get
         those bytes back.
      
      2) Add a linker script assertion to make sure that if this
         happens again the build will fail.
      
      Fixes: 1a40b953 ("sparc: Fix system call tracing register handling.")
      Reported-by: Meelis Roos <mroos@linux.ee>
      Reported-by: Joerg Abraham <joerg.abraham@nokia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      df7136e7
    • Mike Frysinger's avatar
      sparc: Fix system call tracing register handling. · 2129cf58
      Mike Frysinger authored
      [ Upstream commit 1a40b953 ]
      
      A system call trace trigger on entry allows the tracing
      process to inspect and potentially change the traced
      process's registers.
      
      Account for that by reloading the %g1 (syscall number)
      and %i0-%i5 (syscall argument) values.  We need to be
      careful to revalidate the range of %g1, and reload the
      system call table entry it corresponds to into %l7.
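The reload-and-revalidate step can be illustrated with a small user-space model. Everything here is hypothetical (the register struct, the table, the handler functions); in particular, returning NULL for an out-of-range number is a choice made for this sketch, not a statement about what the sparc entry code does in that case.

```c
#include <stddef.h>

#define NR_SYSCALLS_MODEL 3

typedef long (*syscall_fn)(long arg0);

/* Hypothetical register snapshot: g1 is the syscall number,
 * i[0..5] are the six syscall arguments. */
struct regs_model {
    unsigned long g1;
    long i[6];
};

static long sys_zero(long a) { return a; }
static long sys_one(long a)  { return a + 1; }
static long sys_two(long a)  { return a + 2; }

static const syscall_fn syscall_table_model[NR_SYSCALLS_MODEL] = {
    sys_zero, sys_one, sys_two,
};

/* Called after the tracer has had its chance to modify regs.  The
 * number is read again from regs rather than from a cached copy,
 * range-checked again, and only then used to index the table. */
static syscall_fn resolve_after_trace(const struct regs_model *regs)
{
    unsigned long nr = regs->g1;

    if (nr >= NR_SYSCALLS_MODEL)
        return NULL;
    return syscall_table_model[nr];
}
```

The point of the sketch is the ordering: a table entry looked up before the trace hook ran may no longer match the syscall number the tracer left in the registers.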
      Reported-by: Mike Frysinger <vapier@gentoo.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Tested-by: Mike Frysinger <vapier@gentoo.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      2129cf58
    • Yuchung Cheng's avatar
      tcp: record TLP and ER timer stats in v6 stats · 25b37ef7
      Yuchung Cheng authored
      [ Upstream commit ce3cf4ec ]
      
      The v6 tcp stats scan does not provide TLP and ER timer information
      correctly, unlike the v4 version.  This patch fixes that.
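One way to keep the two stats paths consistent is to derive the reported timer code from a single mapping. The sketch below is an assumption-laden model, not the patch: the enum and function names are hypothetical, and it assumes that the v4 path reports the early retransmit (ER) and tail loss probe (TLP) timers with the same code as the ordinary retransmit timer.

```c
/* Hypothetical model of the pending-timer kinds of a connection. */
enum pending_timer_model {
    TIMER_NONE,
    TIMER_RETRANS,        /* ordinary retransmit timer */
    TIMER_PROBE0,         /* zero window probe timer */
    TIMER_EARLY_RETRANS,  /* ER */
    TIMER_LOSS_PROBE,     /* TLP */
};

/* Maps a pending timer to the "timer active" code shown in the stats
 * output.  ER and TLP are treated like the retransmit timer (assumed
 * to match the v4 behaviour); sharing this helper between the v4 and
 * v6 formatters would keep them from drifting apart. */
static int timer_active_code(enum pending_timer_model pending)
{
    switch (pending) {
    case TIMER_RETRANS:
    case TIMER_EARLY_RETRANS:
    case TIMER_LOSS_PROBE:
        return 1;
    case TIMER_PROBE0:
        return 4;
    default:
        return 0;
    }
}
```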
      
      Fixes: 6ba8a3b1 ("tcp: Tail loss probe (TLP)")
      Fixes: eed530b6 ("tcp: early retransmit")
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      25b37ef7