1. 17 Jun, 2017 8 commits
  2. 14 Jun, 2017 32 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.4.72 · 30c9187f
      Greg Kroah-Hartman authored
      30c9187f
    • Mark Rutland's avatar
      arm64: ensure extension of smp_store_release value · 4e528eb9
      Mark Rutland authored
      commit 994870be upstream.
      
      When an inline assembly operand's type is narrower than the register it
      is allocated to, the least significant bits of the register (up to the
      operand type's width) are valid, and any other bits are permitted to
      contain any arbitrary value. This aligns with the AAPCS64 parameter
      passing rules.
      
      Our __smp_store_release() implementation does not account for this, and
      implicitly assumes that operands have been zero-extended to the width of
      the type being stored to. Thus, we may store unknown values to memory
      when the value type is narrower than the pointer type (e.g. when storing
      a char to a long).
      
      This patch fixes the issue by casting the value operand to the same
      width as the pointer operand in all cases, which ensures that the value
      is zero-extended as we expect. We use the same union trickery as
      __smp_load_acquire and {READ,WRITE}_ONCE() to avoid GCC complaining that
      pointers are potentially cast to narrower width integers in unreachable
      paths.
      
      A whitespace issue at the top of __smp_store_release() is also
      corrected.
      
      No changes are necessary for __smp_load_acquire(). Load instructions
      implicitly clear any upper bits of the register, and the compiler will
      only consider the least significant bits of the register as valid
      regardless.
      
      Fixes: 47933ad4 ("arch: Introduce smp_load_acquire(), smp_store_release()")
      Fixes: 878a84d5 ("arm64: add missing data types in smp_load_acquire/smp_store_release")
      Cc: <stable@vger.kernel.org> # 3.14.x-
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Cc: Matthias Kaehlcke <mka@chromium.org>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4e528eb9
    • Mark Rutland's avatar
      arm64: armv8_deprecated: ensure extension of addr · 01ce16f4
      Mark Rutland authored
      commit 55de49f9 upstream.
      
      Our compat swp emulation holds the compat user address in an unsigned
      int, which it passes to __user_swpX_asm(). When a 32-bit value is passed
      in a register, the upper 32 bits of the register are unknown, and we
      must extend the value to 64 bits before we can use it as a base address.
      
      This patch casts the address to unsigned long to ensure it has been
      suitably extended, avoiding the potential issue, and silencing a related
      warning from clang.
      
      Fixes: bd35a4ad ("arm64: Port SWP/SWPB emulation support from arm")
      Cc: <stable@vger.kernel.org> # 3.19.x-
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      01ce16f4
    • Kees Cook's avatar
      usercopy: Adjust tests to deal with SMAP/PAN · 51ff10e7
      Kees Cook authored
      commit f5f893c5 upstream.
      
      Under SMAP/PAN/etc, we cannot write directly to userspace memory, so
      this rearranges the test bytes to get written through copy_to_user().
      Additionally drops the bad copy_from_user() test that would trigger a
      memcpy() against userspace on failure.
      
      [arnd: the test module was added in 3.14, and this backported patch
             should apply cleanly on all version from 3.14 to 4.10.
             The original patch was in 4.11 on top of a context change
             I saw the bug triggered with kselftest on a 4.4.y stable kernel]
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      51ff10e7
    • Mike Marciniszyn's avatar
      RDMA/qib,hfi1: Fix MR reference count leak on write with immediate · 746d4893
      Mike Marciniszyn authored
      commit 1feb4006 upstream.
      
      The handling of IB_RDMA_WRITE_ONLY_WITH_IMMEDIATE will leak a memory
      reference when a buffer cannot be allocated for returning the immediate
      data.
      
      The issue is that the rkey validation has already occurred and the RNR
      nak fails to release the reference that was fruitlessly gotten.  The
      the peer will send the identical single packet request when its RNR
      timer pops.
      
      The fix is to release the held reference prior to the rnr nak exit.
      This is the only sequence the requires both rkey validation and the
      buffer allocation on the same packet.
      
      Cc: Stable <stable@vger.kernel.org> # 4.7+
      Tested-by: default avatarTadeusz Struk <tadeusz.struk@intel.com>
      Reviewed-by: default avatarDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: default avatarMike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: default avatarDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      746d4893
    • Kristina Martsenko's avatar
      arm64: entry: improve data abort handling of tagged pointers · 3ccf6956
      Kristina Martsenko authored
      commit 276e9327 upstream.
      
      This backport has a minor difference from the upstream commit: it adds
      the asm-uaccess.h file, which is not present in 4.4, because 4.4 does
      not have commit b4b8664d ("arm64: don't pull uaccess.h into *.S").
      
      Original patch description:
      
      When handling a data abort from EL0, we currently zero the top byte of
      the faulting address, as we assume the address is a TTBR0 address, which
      may contain a non-zero address tag. However, the address may be a TTBR1
      address, in which case we should not zero the top byte. This patch fixes
      that. The effect is that the full TTBR1 address is passed to the task's
      signal handler (or printed out in the kernel log).
      
      When handling a data abort from EL1, we leave the faulting address
      intact, as we assume it's either a TTBR1 address or a TTBR0 address with
      tag 0x00. This is true as far as I'm aware, we don't seem to access a
      tagged TTBR0 address anywhere in the kernel. Regardless, it's easy to
      forget about address tags, and code added in the future may not always
      remember to remove tags from addresses before accessing them. So add tag
      handling to the EL1 data abort handler as well. This also makes it
      consistent with the EL0 data abort handler.
      
      Fixes: d50240a5 ("arm64: mm: permit use of tagged pointers at EL0")
      Reviewed-by: default avatarDave Martin <Dave.Martin@arm.com>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarKristina Martsenko <kristina.martsenko@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3ccf6956
    • Kristina Martsenko's avatar
      arm64: hw_breakpoint: fix watchpoint matching for tagged pointers · 4eaef365
      Kristina Martsenko authored
      commit 7dcd9dd8 upstream.
      
      This backport has a few small differences from the upstream commit:
       - The address tag is removed in watchpoint_handler() instead of
         get_distance_from_watchpoint(), because 4.4 does not have commit
         fdfeff0f ("arm64: hw_breakpoint: Handle inexact watchpoint
         addresses").
       - A macro is backported (untagged_addr), as it is not present in 4.4.
      
      Original patch description:
      
      When we take a watchpoint exception, the address that triggered the
      watchpoint is found in FAR_EL1. We compare it to the address of each
      configured watchpoint to see which one was hit.
      
      The configured watchpoint addresses are untagged, while the address in
      FAR_EL1 will have an address tag if the data access was done using a
      tagged address. The tag needs to be removed to compare the address to
      the watchpoints.
      
      Currently we don't remove it, and as a result can report the wrong
      watchpoint as being hit (specifically, always either the highest TTBR0
      watchpoint or lowest TTBR1 watchpoint). This patch removes the tag.
      
      Fixes: d50240a5 ("arm64: mm: permit use of tagged pointers at EL0")
      Acked-by: default avatarMark Rutland <mark.rutland@arm.com>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarKristina Martsenko <kristina.martsenko@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4eaef365
    • Artem Savkov's avatar
      Make __xfs_xattr_put_listen preperly report errors. · bc5f31d3
      Artem Savkov authored
      commit 791cc43b upstream.
      
      Commit 2a6fba6d "xfs: only return -errno or success from attr ->put_listent"
      changes the returnvalue of __xfs_xattr_put_listen to 0 in case when there is
      insufficient space in the buffer assuming that setting context->count to -1
      would be enough, but all of the ->put_listent callers only check seen_enough.
      This results in a failed assertion:
      XFS: Assertion failed: context->count >= 0, file: fs/xfs/xfs_xattr.c, line: 175
      in insufficient buffer size case.
      
      This is only reproducible with at least 2 xattrs and only when the buffer
      gets depleted before the last one.
      
      Furthermore if buffersize is such that it is enough to hold the last xattr's
      name, but not enough to hold the sum of preceeding xattr names listxattr won't
      fail with ERANGE, but will suceed returning last xattr's name without the
      first character. The first character end's up overwriting data stored at
      (context->alist - 1).
      Signed-off-by: default avatarArtem Savkov <asavkov@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Cc: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bc5f31d3
    • Trond Myklebust's avatar
      NFSv4: Don't perform cached access checks before we've OPENed the file · e8a1086a
      Trond Myklebust authored
      commit 762674f8 upstream.
      
      Donald Buczek reports that a nfs4 client incorrectly denies
      execute access based on outdated file mode (missing 'x' bit).
      After the mode on the server is 'fixed' (chmod +x) further execution
      attempts continue to fail, because the nfs ACCESS call updates
      the access parameter but not the mode parameter or the mode in
      the inode.
      
      The root cause is ultimately that the VFS is calling may_open()
      before the NFS client has a chance to OPEN the file and hence revalidate
      the access and attribute caches.
      
      Al Viro suggests:
      >>> Make nfs_permission() relax the checks when it sees MAY_OPEN, if you know
      >>> that things will be caught by server anyway?
      >>
      >> That can work as long as we're guaranteed that everything that calls
      >> inode_permission() with MAY_OPEN on a regular file will also follow up
      >> with a vfs_open() or dentry_open() on success. Is this always the
      >> case?
      >
      > 1) in do_tmpfile(), followed by do_dentry_open() (not reachable by NFS since
      > it doesn't have ->tmpfile() instance anyway)
      >
      > 2) in atomic_open(), after the call of ->atomic_open() has succeeded.
      >
      > 3) in do_last(), followed on success by vfs_open()
      >
      > That's all.  All calls of inode_permission() that get MAY_OPEN come from
      > may_open(), and there's no other callers of that puppy.
      Reported-by: default avatarDonald Buczek <buczek@molgen.mpg.de>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=109771
      Link: http://lkml.kernel.org/r/1451046656-26319-1-git-send-email-buczek@molgen.mpg.de
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarPaul Menzel <pmenzel@molgen.mpg.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e8a1086a
    • Trond Myklebust's avatar
      NFS: Ensure we revalidate attributes before using execute_ok() · 53302082
      Trond Myklebust authored
      commit 5c5fc09a upstream.
      
      Donald Buczek reports that NFS clients can also report incorrect
      results for access() due to lack of revalidation of attributes
      before calling execute_ok().
      Looking closely, it seems chdir() is afflicted with the same problem.
      
      Fix is to ensure we call nfs_revalidate_inode_rcu() or
      nfs_revalidate_inode() as appropriate before deciding to trust
      execute_ok().
      Reported-by: default avatarDonald Buczek <buczek@molgen.mpg.de>
      Link: http://lkml.kernel.org/r/1451331530-3748-1-git-send-email-buczek@molgen.mpg.deSigned-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarPaul Menzel <pmenzel@molgen.mpg.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      53302082
    • Michal Hocko's avatar
      mm: consider memblock reservations for deferred memory initialization sizing · cb1fb15c
      Michal Hocko authored
      commit 864b9a39 upstream.
      
      We have seen an early OOM killer invocation on ppc64 systems with
      crashkernel=4096M:
      
      	kthreadd invoked oom-killer: gfp_mask=0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), nodemask=7, order=0, oom_score_adj=0
      	kthreadd cpuset=/ mems_allowed=7
      	CPU: 0 PID: 2 Comm: kthreadd Not tainted 4.4.68-1.gd7fe927-default #1
      	Call Trace:
      	  dump_stack+0xb0/0xf0 (unreliable)
      	  dump_header+0xb0/0x258
      	  out_of_memory+0x5f0/0x640
      	  __alloc_pages_nodemask+0xa8c/0xc80
      	  kmem_getpages+0x84/0x1a0
      	  fallback_alloc+0x2a4/0x320
      	  kmem_cache_alloc_node+0xc0/0x2e0
      	  copy_process.isra.25+0x260/0x1b30
      	  _do_fork+0x94/0x470
      	  kernel_thread+0x48/0x60
      	  kthreadd+0x264/0x330
      	  ret_from_kernel_thread+0x5c/0xa4
      
      	Mem-Info:
      	active_anon:0 inactive_anon:0 isolated_anon:0
      	 active_file:0 inactive_file:0 isolated_file:0
      	 unevictable:0 dirty:0 writeback:0 unstable:0
      	 slab_reclaimable:5 slab_unreclaimable:73
      	 mapped:0 shmem:0 pagetables:0 bounce:0
      	 free:0 free_pcp:0 free_cma:0
      	Node 7 DMA free:0kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:52428800kB managed:110016kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:320kB slab_unreclaimable:4672kB kernel_stack:1152kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
      	lowmem_reserve[]: 0 0 0 0
      	Node 7 DMA: 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 0kB
      	0 total pagecache pages
      	0 pages in swap cache
      	Swap cache stats: add 0, delete 0, find 0/0
      	Free swap  = 0kB
      	Total swap = 0kB
      	819200 pages RAM
      	0 pages HighMem/MovableOnly
      	817481 pages reserved
      	0 pages cma reserved
      	0 pages hwpoisoned
      
      the reason is that the managed memory is too low (only 110MB) while the
      rest of the the 50GB is still waiting for the deferred intialization to
      be done.  update_defer_init estimates the initial memoty to initialize
      to 2GB at least but it doesn't consider any memory allocated in that
      range.  In this particular case we've had
      
      	Reserving 4096MB of memory at 128MB for crashkernel (System RAM: 51200MB)
      
      so the low 2GB is mostly depleted.
      
      Fix this by considering memblock allocations in the initial static
      initialization estimation.  Move the max_initialise to
      reset_deferred_meminit and implement a simple memblock_reserved_memory
      helper which iterates all reserved blocks and sums the size of all that
      start below the given address.  The cumulative size is than added on top
      of the initial estimation.  This is still not ideal because
      reset_deferred_meminit doesn't consider holes and so reservation might
      be above the initial estimation whihch we ignore but let's make the
      logic simpler until we really need to handle more complicated cases.
      
      Fixes: 3a80a7fa ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
      Link: http://lkml.kernel.org/r/20170531104010.GI27783@dhcp22.suse.czSigned-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarMel Gorman <mgorman@suse.de>
      Tested-by: default avatarSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      cb1fb15c
    • Eric Dumazet's avatar
      net: better skb->sender_cpu and skb->napi_id cohabitation · 52d8b8ad
      Eric Dumazet authored
      commit 52bd2d62 upstream.
      
      skb->sender_cpu and skb->napi_id share a common storage,
      and we had various bugs about this.
      
      We had to call skb_sender_cpu_clear() in some places to
      not leave a prior skb->napi_id and fool netdev_pick_tx()
      
      As suggested by Alexei, we could split the space so that
      these errors can not happen.
      
      0 value being reserved as the common (not initialized) value,
      let's reserve [1 .. NR_CPUS] range for valid sender_cpu,
      and [NR_CPUS+1 .. ~0U] for valid napi_id.
      
      This will allow proper busy polling support over tunnels.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Suggested-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Cc: Paul Menzel <pmenzel@molgen.mpg.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      52d8b8ad
    • Takatoshi Akiyama's avatar
      serial: sh-sci: Fix panic when serial console and DMA are enabled · 3c0fcb52
      Takatoshi Akiyama authored
      commit 3c910176 upstream.
      
      This patch fixes an issue that kernel panic happens when DMA is enabled
      and we press enter key while the kernel booting on the serial console.
      
      * An interrupt may occur after sci_request_irq().
      * DMA transfer area is initialized by setup_timer() in sci_request_dma()
        and used in interrupt.
      
      If an interrupt occurred between sci_request_irq() and setup_timer() in
      sci_request_dma(), DMA transfer area has not been initialized yet.
      So, this patch changes the order of sci_request_irq() and
      sci_request_dma().
      
      Fixes: 73a19e4c ("serial: sh-sci: Add DMA support.")
      Signed-off-by: default avatarTakatoshi Akiyama <takatoshi.akiyama.kj@ps.hitachi-solutions.com>
      [Shimoda changes the commit log]
      Signed-off-by: default avatarYoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3c0fcb52
    • Peter Hurley's avatar
      tty: Drop krefs for interrupted tty lock · cc04a143
      Peter Hurley authored
      commit e9036d06 upstream.
      
      When the tty lock is interrupted on attempted re-open, 2 tty krefs
      are still held. Drop extra kref before returning failure from
      tty_lock_interruptible(), and drop lookup kref before returning
      failure from tty_open().
      
      Fixes: 0bfd464d ("tty: Wait interruptibly for tty lock on reopen")
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarPeter Hurley <peter@hurleysoftware.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cc04a143
    • Julius Werner's avatar
      drivers: char: mem: Fix wraparound check to allow mappings up to the end · 983c09eb
      Julius Werner authored
      commit 32829da5 upstream.
      
      A recent fix to /dev/mem prevents mappings from wrapping around the end
      of physical address space. However, the check was written in a way that
      also prevents a mapping reaching just up to the end of physical address
      space, which may be a valid use case (especially on 32-bit systems).
      This patch fixes it by checking the last mapped address (instead of the
      first address behind that) for overflow.
      
      Fixes: b299cde2 ("drivers: char: mem: Check for address space wraparound with mmap()")
      Reported-by: default avatarNico Huber <nico.h@gmx.de>
      Signed-off-by: default avatarJulius Werner <jwerner@chromium.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      983c09eb
    • Takashi Iwai's avatar
      ASoC: Fix use-after-free at card unregistration · 9a938895
      Takashi Iwai authored
      commit 4efda5f2 upstream.
      
      soc_cleanup_card_resources() call snd_card_free() at the last of its
      procedure.  This turned out to lead to a use-after-free.
      PCM runtimes have been already removed via soc_remove_pcm_runtimes(),
      while it's dereferenced later in soc_pcm_free() called via
      snd_card_free().
      
      The fix is simple: just move the snd_card_free() call to the beginning
      of the whole procedure.  This also gives another benefit: it
      guarantees that all operations have been shut down before actually
      releasing the resources, which was racy until now.
      Reported-and-tested-by: default avatarRobert Jarzmik <robert.jarzmik@free.fr>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9a938895
    • Takashi Iwai's avatar
      ALSA: timer: Fix missing queue indices reset at SNDRV_TIMER_IOCTL_SELECT · 54d12fbf
      Takashi Iwai authored
      commit ba3021b2 upstream.
      
      snd_timer_user_tselect() reallocates the queue buffer dynamically, but
      it forgot to reset its indices.  Since the read may happen
      concurrently with ioctl and snd_timer_user_tselect() allocates the
      buffer via kmalloc(), this may lead to the leak of uninitialized
      kernel-space data, as spotted via KMSAN:
      
        BUG: KMSAN: use of unitialized memory in snd_timer_user_read+0x6c4/0xa10
        CPU: 0 PID: 1037 Comm: probe Not tainted 4.11.0-rc5+ #2739
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
        Call Trace:
         __dump_stack lib/dump_stack.c:16
         dump_stack+0x143/0x1b0 lib/dump_stack.c:52
         kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:1007
         kmsan_check_memory+0xc2/0x140 mm/kmsan/kmsan.c:1086
         copy_to_user ./arch/x86/include/asm/uaccess.h:725
         snd_timer_user_read+0x6c4/0xa10 sound/core/timer.c:2004
         do_loop_readv_writev fs/read_write.c:716
         __do_readv_writev+0x94c/0x1380 fs/read_write.c:864
         do_readv_writev fs/read_write.c:894
         vfs_readv fs/read_write.c:908
         do_readv+0x52a/0x5d0 fs/read_write.c:934
         SYSC_readv+0xb6/0xd0 fs/read_write.c:1021
         SyS_readv+0x87/0xb0 fs/read_write.c:1018
      
      This patch adds the missing reset of queue indices.  Together with the
      previous fix for the ioctl/read race, we cover the whole problem.
      Reported-by: default avatarAlexander Potapenko <glider@google.com>
      Tested-by: default avatarAlexander Potapenko <glider@google.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      54d12fbf
    • Takashi Iwai's avatar
      ALSA: timer: Fix race between read and ioctl · f5bc9187
      Takashi Iwai authored
      commit d11662f4 upstream.
      
      The read from ALSA timer device, the function snd_timer_user_tread(),
      may access to an uninitialized struct snd_timer_user fields when the
      read is concurrently performed while the ioctl like
      snd_timer_user_tselect() is invoked.  We have already fixed the races
      among ioctls via a mutex, but we seem to have forgotten the race
      between read vs ioctl.
      
      This patch simply applies (more exactly extends the already applied
      range of) tu->ioctl_lock in snd_timer_user_tread() for closing the
      race window.
      Reported-by: default avatarAlexander Potapenko <glider@google.com>
      Tested-by: default avatarAlexander Potapenko <glider@google.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f5bc9187
    • Ben Skeggs's avatar
      drm/nouveau/tmr: fully separate alarm execution/pending lists · 5dffc1be
      Ben Skeggs authored
      commit b4e382ca upstream.
      
      Reusing the list_head for both is a bad idea.  Callback execution is done
      with the lock dropped so that alarms can be rescheduled from the callback,
      which means that with some unfortunate timing, lists can get corrupted.
      
      The execution list should not require its own locking, the single function
      that uses it can only be called from a single context.
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5dffc1be
    • Sinclair Yeh's avatar
      drm/vmwgfx: Make sure backup_handle is always valid · 74276868
      Sinclair Yeh authored
      commit 07678eca upstream.
      
      When vmw_gb_surface_define_ioctl() is called with an existing buffer,
      we end up returning an uninitialized variable in the backup_handle.
      
      The fix is to first initialize backup_handle to 0 just to be sure, and
      second, when a user-provided buffer is found, we will use the
      req->buffer_handle as the backup_handle.
      Reported-by: default avatarMurray McAllister <murray.mcallister@insomniasec.com>
      Signed-off-by: default avatarSinclair Yeh <syeh@vmware.com>
      Reviewed-by: default avatarDeepak Rawat <drawat@vmware.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      74276868
    • Vladis Dronov's avatar
      drm/vmwgfx: limit the number of mip levels in vmw_gb_surface_define_ioctl() · 619cc02f
      Vladis Dronov authored
      commit ee9c4e68 upstream.
      
      The 'req->mip_levels' parameter in vmw_gb_surface_define_ioctl() is
      a user-controlled 'uint32_t' value which is used as a loop count limit.
      This can lead to a kernel lockup and DoS. Add check for 'req->mip_levels'.
      
      References:
      https://bugzilla.redhat.com/show_bug.cgi?id=1437431Signed-off-by: default avatarVladis Dronov <vdronov@redhat.com>
      Reviewed-by: default avatarSinclair Yeh <syeh@vmware.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      619cc02f
    • Dan Carpenter's avatar
      drm/vmwgfx: Handle vmalloc() failure in vmw_local_fifo_reserve() · e4c05b3a
      Dan Carpenter authored
      commit f0c62e98 upstream.
      
      If vmalloc() fails then we need to a bit of cleanup before returning.
      
      Fixes: fb1d9738 ("drm/vmwgfx: Add DRM driver for VMware Virtual GPU")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: default avatarSinclair Yeh <syeh@vmware.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e4c05b3a
    • Jin Yao's avatar
      perf/core: Drop kernel samples even though :u is specified · e582b82c
      Jin Yao authored
      commit cc1582c2 upstream.
      
      When doing sampling, for example:
      
        perf record -e cycles:u ...
      
      On workloads that do a lot of kernel entry/exits we see kernel
      samples, even though :u is specified. This is due to skid existing.
      
      This might be a security issue because it can leak kernel addresses even
      though kernel sampling support is disabled.
      
      The patch drops the kernel samples if exclude_kernel is specified.
      
      For example, test on Haswell desktop:
      
        perf record -e cycles:u <mgen>
        perf report --stdio
      
      Before patch applied:
      
          99.77%  mgen     mgen              [.] buf_read
           0.20%  mgen     mgen              [.] rand_buf_init
           0.01%  mgen     [kernel.vmlinux]  [k] apic_timer_interrupt
           0.00%  mgen     mgen              [.] last_free_elem
           0.00%  mgen     libc-2.23.so      [.] __random_r
           0.00%  mgen     libc-2.23.so      [.] _int_malloc
           0.00%  mgen     mgen              [.] rand_array_init
           0.00%  mgen     [kernel.vmlinux]  [k] page_fault
           0.00%  mgen     libc-2.23.so      [.] __random
           0.00%  mgen     libc-2.23.so      [.] __strcasestr
           0.00%  mgen     ld-2.23.so        [.] strcmp
           0.00%  mgen     ld-2.23.so        [.] _dl_start
           0.00%  mgen     libc-2.23.so      [.] sched_setaffinity@@GLIBC_2.3.4
           0.00%  mgen     ld-2.23.so        [.] _start
      
      We can see kernel symbols apic_timer_interrupt and page_fault.
      
      After patch applied:
      
          99.79%  mgen     mgen           [.] buf_read
           0.19%  mgen     mgen           [.] rand_buf_init
           0.00%  mgen     libc-2.23.so   [.] __random_r
           0.00%  mgen     mgen           [.] rand_array_init
           0.00%  mgen     mgen           [.] last_free_elem
           0.00%  mgen     libc-2.23.so   [.] vfprintf
           0.00%  mgen     libc-2.23.so   [.] rand
           0.00%  mgen     libc-2.23.so   [.] __random
           0.00%  mgen     libc-2.23.so   [.] _int_malloc
           0.00%  mgen     libc-2.23.so   [.] _IO_doallocbuf
           0.00%  mgen     ld-2.23.so     [.] do_lookup_x
           0.00%  mgen     ld-2.23.so     [.] open_verify.constprop.7
           0.00%  mgen     ld-2.23.so     [.] _dl_important_hwcaps
           0.00%  mgen     libc-2.23.so   [.] sched_setaffinity@@GLIBC_2.3.4
           0.00%  mgen     ld-2.23.so     [.] _start
      
      There are only userspace symbols.
      Signed-off-by: default avatarJin Yao <yao.jin@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Cc: kan.liang@intel.com
      Cc: mark.rutland@arm.com
      Cc: will.deacon@arm.com
      Cc: yao.jin@intel.com
      Link: http://lkml.kernel.org/r/1495706947-3744-1-git-send-email-yao.jin@linux.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e582b82c
    • Michael Bringmann's avatar
      powerpc/hotplug-mem: Fix missing endian conversion of aa_index · 1cfe1e9d
      Michael Bringmann authored
      commit dc421b20 upstream.
      
      When adding or removing memory, the aa_index (affinity value) for the
      memblock must also be converted to match the endianness of the rest
      of the 'ibm,dynamic-memory' property.  Otherwise, subsequent retrieval
      of the attribute will likely lead to non-existent nodes, followed by
      using the default node in the code inappropriately.
      
      Fixes: 5f97b2a0 ("powerpc/pseries: Implement memory hotplug add in the kernel")
      Signed-off-by: default avatarMichael Bringmann <mwb@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1cfe1e9d
    • Michael Ellerman's avatar
      powerpc/numa: Fix percpu allocations to be NUMA aware · 8c92870b
      Michael Ellerman authored
      commit ba4a648f upstream.
      
      In commit 8c272261 ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"), we
      switched to the generic implementation of cpu_to_node(), which uses a percpu
      variable to hold the NUMA node for each CPU.
      
      Unfortunately we neglected to notice that we use cpu_to_node() in the allocation
      of our percpu areas, leading to a chicken and egg problem. In practice what
      happens is when we are setting up the percpu areas, cpu_to_node() reports that
      all CPUs are on node 0, so we allocate all percpu areas on node 0.
      
      This is visible in the dmesg output, as all pcpu allocs being in group 0:
      
        pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
        pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
        pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
        pcpu-alloc: [0] 24 25 26 27 [0] 28 29 30 31
        pcpu-alloc: [0] 32 33 34 35 [0] 36 37 38 39
        pcpu-alloc: [0] 40 41 42 43 [0] 44 45 46 47
      
      To fix it we need an early_cpu_to_node() which can run prior to percpu being
      setup. We already have the numa_cpu_lookup_table we can use, so just plumb it
      in. With the patch dmesg output shows two groups, 0 and 1:
      
        pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
        pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
        pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
        pcpu-alloc: [1] 24 25 26 27 [1] 28 29 30 31
        pcpu-alloc: [1] 32 33 34 35 [1] 36 37 38 39
        pcpu-alloc: [1] 40 41 42 43 [1] 44 45 46 47
      
      We can also check the data_offset in the paca of various CPUs, with the fix we
      see:
      
        CPU 0:  data_offset = 0x0ffe8b0000
        CPU 24: data_offset = 0x1ffe5b0000
      
      And we can see from dmesg that CPU 24 has an allocation on node 1:
      
        node   0: [mem 0x0000000000000000-0x0000000fffffffff]
        node   1: [mem 0x0000001000000000-0x0000001fffffffff]
      
      Fixes: 8c272261 ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID")
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8c92870b
    • Russell Currey's avatar
      powerpc/eeh: Avoid use after free in eeh_handle_special_event() · fc7fb943
      Russell Currey authored
      commit daeba295 upstream.
      
      eeh_handle_special_event() is called when an EEH event is detected but
      can't be narrowed down to a specific PE.  This function looks through
      every PE to find one in an erroneous state, then calls the regular event
      handler eeh_handle_normal_event() once it knows which PE has an error.
      
      However, if eeh_handle_normal_event() found that the PE cannot possibly
      be recovered, it will free it, rendering the passed PE stale.
      This leads to a use after free in eeh_handle_special_event() as it attempts to
      clear the "recovering" state on the PE after eeh_handle_normal_event() returns.
      
      Thus, make sure the PE is valid when attempting to clear state in
      eeh_handle_special_event().
      
      Fixes: 8a6b1bc7 ("powerpc/eeh: EEH core to handle special event")
      Reported-by: default avatarAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: default avatarRussell Currey <ruscur@russell.cc>
      Reviewed-by: default avatarGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      fc7fb943
    • Johannes Thumshirn's avatar
      scsi: qla2xxx: don't disable a not previously enabled PCI device · 93d03807
      Johannes Thumshirn authored
      commit ddff7ed4 upstream.
      
      When pci_enable_device() or pci_enable_device_mem() fail in
      qla2x00_probe_one() we bail out but do a call to
      pci_disable_device(). This causes the dev_WARN_ON() in
      pci_disable_device() to trigger, as the device wasn't enabled
      previously.
      
      So instead of taking the 'probe_out' error path we can directly return
      *iff* one of the pci_enable_device() calls fails.
      
      Additionally rename the 'probe_out' goto label's name to the more
      descriptive 'disable_device'.
      Signed-off-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Fixes: e315cd28 ("[SCSI] qla2xxx: Code changes for qla data structure refactoring")
      Reviewed-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Reviewed-by: default avatarGiridhar Malavali <giridhar.malavali@cavium.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      93d03807
    • Marc Zyngier's avatar
      KVM: arm/arm64: Handle possible NULL stage2 pud when ageing pages · f267b064
      Marc Zyngier authored
      commit d6dbdd3c upstream.
      
      Under memory pressure, we start ageing pages, which amounts to parsing
      the page tables. Since we don't want to allocate any extra level,
      we pass NULL for our private allocation cache. Which means that
      stage2_get_pud() is allowed to fail. This results in the following
      splat:
      
      [ 1520.409577] Unable to handle kernel NULL pointer dereference at virtual address 00000008
      [ 1520.417741] pgd = ffff810f52fef000
      [ 1520.421201] [00000008] *pgd=0000010f636c5003, *pud=0000010f56f48003, *pmd=0000000000000000
      [ 1520.429546] Internal error: Oops: 96000006 [#1] PREEMPT SMP
      [ 1520.435156] Modules linked in:
      [ 1520.438246] CPU: 15 PID: 53550 Comm: qemu-system-aar Tainted: G        W       4.12.0-rc4-00027-g1885c397eaec #7205
      [ 1520.448705] Hardware name: FOXCONN R2-1221R-A4/C2U4N_MB, BIOS G31FB12A 10/26/2016
      [ 1520.463726] task: ffff800ac5fb4e00 task.stack: ffff800ce04e0000
      [ 1520.469666] PC is at stage2_get_pmd+0x34/0x110
      [ 1520.474119] LR is at kvm_age_hva_handler+0x44/0xf0
      [ 1520.478917] pc : [<ffff0000080b137c>] lr : [<ffff0000080b149c>] pstate: 40000145
      [ 1520.486325] sp : ffff800ce04e33d0
      [ 1520.489644] x29: ffff800ce04e33d0 x28: 0000000ffff40064
      [ 1520.494967] x27: 0000ffff27e00000 x26: 0000000000000000
      [ 1520.500289] x25: ffff81051ba65008 x24: 0000ffff40065000
      [ 1520.505618] x23: 0000ffff40064000 x22: 0000000000000000
      [ 1520.510947] x21: ffff810f52b20000 x20: 0000000000000000
      [ 1520.516274] x19: 0000000058264000 x18: 0000000000000000
      [ 1520.521603] x17: 0000ffffa6fe7438 x16: ffff000008278b70
      [ 1520.526940] x15: 000028ccd8000000 x14: 0000000000000008
      [ 1520.532264] x13: ffff7e0018298000 x12: 0000000000000002
      [ 1520.537582] x11: ffff000009241b93 x10: 0000000000000940
      [ 1520.542908] x9 : ffff0000092ef800 x8 : 0000000000000200
      [ 1520.548229] x7 : ffff800ce04e36a8 x6 : 0000000000000000
      [ 1520.553552] x5 : 0000000000000001 x4 : 0000000000000000
      [ 1520.558873] x3 : 0000000000000000 x2 : 0000000000000008
      [ 1520.571696] x1 : ffff000008fd5000 x0 : ffff0000080b149c
      [ 1520.577039] Process qemu-system-aar (pid: 53550, stack limit = 0xffff800ce04e0000)
      [...]
      [ 1521.510735] [<ffff0000080b137c>] stage2_get_pmd+0x34/0x110
      [ 1521.516221] [<ffff0000080b149c>] kvm_age_hva_handler+0x44/0xf0
      [ 1521.522054] [<ffff0000080b0610>] handle_hva_to_gpa+0xb8/0xe8
      [ 1521.527716] [<ffff0000080b3434>] kvm_age_hva+0x44/0xf0
      [ 1521.532854] [<ffff0000080a58b0>] kvm_mmu_notifier_clear_flush_young+0x70/0xc0
      [ 1521.539992] [<ffff000008238378>] __mmu_notifier_clear_flush_young+0x88/0xd0
      [ 1521.546958] [<ffff00000821eca0>] page_referenced_one+0xf0/0x188
      [ 1521.552881] [<ffff00000821f36c>] rmap_walk_anon+0xec/0x250
      [ 1521.558370] [<ffff000008220f78>] rmap_walk+0x78/0xa0
      [ 1521.563337] [<ffff000008221104>] page_referenced+0x164/0x180
      [ 1521.569002] [<ffff0000081f1af0>] shrink_active_list+0x178/0x3b8
      [ 1521.574922] [<ffff0000081f2058>] shrink_node_memcg+0x328/0x600
      [ 1521.580758] [<ffff0000081f23f4>] shrink_node+0xc4/0x328
      [ 1521.585986] [<ffff0000081f2718>] do_try_to_free_pages+0xc0/0x340
      [ 1521.592000] [<ffff0000081f2a64>] try_to_free_pages+0xcc/0x240
      [...]
      
      The trivial fix is to handle this NULL pud value early, rather than
      dereferencing it blindly.
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: default avatarChristoffer Dall <cdall@linaro.org>
      Signed-off-by: default avatarChristoffer Dall <cdall@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f267b064
    • Jeff Mahoney's avatar
      btrfs: fix memory leak in update_space_info failure path · 5c7955c8
      Jeff Mahoney authored
      commit 896533a7 upstream.
      
      If we fail to add the space_info kobject, we'll leak the memory
      for the percpu counter.
      
      Fixes: 6ab0a202 (btrfs: publish allocation data in sysfs)
      Signed-off-by: default avatarJeff Mahoney <jeffm@suse.com>
      Reviewed-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5c7955c8
    • David Sterba's avatar
      btrfs: use correct types for page indices in btrfs_page_exists_in_range · cc8c67ca
      David Sterba authored
      commit cc2b702c upstream.
      
      Variables start_idx and end_idx are supposed to hold a page index
      derived from the file offsets. The int type is not the right one though,
      offsets larger than 1 << 44 will get silently trimmed off the high bits.
      (1 << 44 is 16TiB)
      
      What can go wrong, if start is below the boundary and end gets trimmed:
      - if there's a page after start, we'll find it (radix_tree_gang_lookup_slot)
      - the final check "if (page->index <= end_idx)" will unexpectedly fail
      
      The function will return false, ie. "there's no page in the range",
      although there is at least one.
      
      btrfs_page_exists_in_range is used to prevent races in:
      
      * in hole punching, where we make sure there are not pages in the
        truncated range, otherwise we'll wait for them to finish and redo
        truncation, but we're going to replace the pages with holes anyway so
        the only problem is the intermediate state
      
      * lock_extent_direct: we want to make sure there are no pages before we
        lock and start DIO, to prevent stale data reads
      
      For practical occurence of the bug, there are several constaints.  The
      file must be quite large, the affected range must cross the 16TiB
      boundary and the internal state of the file pages and pending operations
      must match.  Also, we must not have started any ordered data in the
      range, otherwise we don't even reach the buggy function check.
      
      DIO locking tries hard in several places to avoid deadlocks with
      buffered IO and avoids waiting for ranges. The worst consequence seems
      to be stale data read.
      
      CC: Liu Bo <bo.li.liu@oracle.com>
      Fixes: fc4adbff ("btrfs: Drop EXTENT_UPTODATE check in hole punching and direct locking")
      Reviewed-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cc8c67ca
    • Frederic Barrat's avatar
      cxl: Fix error path on bad ioctl · 8fe4345d
      Frederic Barrat authored
      commit cec422c1 upstream.
      
      Fix error path if we can't copy user structure on CXL_IOCTL_START_WORK
      ioctl. We shouldn't unlock the context status mutex as it was not
      locked (yet).
      
      Fixes: 0712dc7e ("cxl: Fix issues when unmapping contexts")
      Signed-off-by: default avatarFrederic Barrat <fbarrat@linux.vnet.ibm.com>
      Reviewed-by: default avatarVaibhav Jain <vaibhav@linux.vnet.ibm.com>
      Reviewed-by: default avatarAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8fe4345d
    • Al Viro's avatar