1. 29 Jun, 2017 10 commits
  2. 28 Jun, 2017 2 commits
  3. 26 Jun, 2017 2 commits
  4. 25 Jun, 2017 6 commits
  5. 24 Jun, 2017 11 commits
  6. 23 Jun, 2017 9 commits
    • Merge branch 'akpm' (patches from Andrew) · 337c6ba2
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "8 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        fs/exec.c: account for argv/envp pointers
        ocfs2: fix deadlock caused by recursive locking in xattr
        slub: make sysfs file removal asynchronous
        lib/cmdline.c: fix get_options() overflow while parsing ranges
        fs/dax.c: fix inefficiency in dax_writeback_mapping_range()
        autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAIL
        mm/vmalloc.c: huge-vmap: fail gracefully on unexpected huge vmap mappings
        mm, thp: remove cond_resched from __collapse_huge_page_copy
    • fs/exec.c: account for argv/envp pointers · 98da7d08
      Kees Cook authored
      When limiting the argv/envp strings during exec to 1/4 of the stack limit,
      the storage of the pointers to the strings was not included.  This means
      that an exec with huge numbers of tiny strings could eat 1/4 of the stack
      limit in strings and then additional space would be later used by the
      pointers to the strings.
      
      For example, on 32-bit with an 8MB stack rlimit, an exec with 1677721
      single-byte strings would consume less than 2MB of stack, the max (8MB /
      4) amount allowed, but the pointers to the strings would consume the
      remaining additional stack space (1677721 * 4 == 6710884).
      
      The result (1677721 + 6710884 == 8388605) would exhaust stack space
      entirely.  Controlling this stack exhaustion could result in
      pathological behavior in setuid binaries (CVE-2017-1000365).
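
      The arithmetic above can be reproduced with a small sketch.  This is
      not the kernel's code: string_bytes(), pointer_bytes() and
      fits_limit() are hypothetical helpers that only mirror the figures
      quoted in this message (string bytes plus one pointer per string,
      compared against 1/4 of the stack rlimit).

```c
#include <stdbool.h>
#include <stdint.h>

/* Bytes used by `count` strings of `len` bytes each (illustrative only). */
uint64_t string_bytes(uint64_t count, uint64_t len)
{
    return count * len;
}

/* Bytes used by the pointers to those strings. */
uint64_t pointer_bytes(uint64_t count, uint64_t ptr_size)
{
    return count * ptr_size;
}

/*
 * Hypothetical check: does the argv/envp payload fit in 1/4 of the stack
 * rlimit?  With count_pointers == false only the strings are charged,
 * which is the accounting gap described above.
 */
bool fits_limit(uint64_t count, uint64_t len, uint64_t ptr_size,
                uint64_t stack_rlimit, bool count_pointers)
{
    uint64_t limit = stack_rlimit / 4;
    uint64_t used = string_bytes(count, len);

    if (count_pointers)
        used += pointer_bytes(count, ptr_size);
    return used <= limit;
}
```

      With the figures from the example (1677721 one-byte strings, 4-byte
      pointers, 8MB rlimit), the strings alone pass the check, while
      charging the pointers as well makes it fail.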
      
      [akpm@linux-foundation.org: additional commenting from Kees]
      Fixes: b6a2fea3 ("mm: variable length argument support")
      Link: http://lkml.kernel.org/r/20170622001720.GA32173@beast
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Acked-by: Rik van Riel <riel@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Qualys Security Advisory <qsa@qualys.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: fix deadlock caused by recursive locking in xattr · 8818efaa
      Eric Ren authored
      Another deadlock path caused by recursive locking has been reported.  This
      kind of issue has existed since commit 743b5f14 ("ocfs2: take
      inode lock in ocfs2_iop_set/get_acl()").  Two deadlock paths were
      fixed by commit b891fa50 ("ocfs2: fix deadlock issue when taking
      inode lock at vfs entry points").  We intend to fix this kind of
      case incrementally, because it is hard to find all possible
      paths at once.
      
      This one can be reproduced as follows.  On node1, cp a large file from
      the home directory to the ocfs2 mountpoint.  Meanwhile, on node2, run
      setfacl/getfacl.  Both nodes will hang.  The backtraces:
      
      On node1:
        __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
        ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
        ocfs2_write_begin+0x43/0x1a0 [ocfs2]
        generic_perform_write+0xa9/0x180
        __generic_file_write_iter+0x1aa/0x1d0
        ocfs2_file_write_iter+0x4f4/0xb40 [ocfs2]
        __vfs_write+0xc3/0x130
        vfs_write+0xb1/0x1a0
        SyS_write+0x46/0xa0
      
      On node2:
        __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
        ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
        ocfs2_xattr_set+0x12e/0xe80 [ocfs2]
        ocfs2_set_acl+0x22d/0x260 [ocfs2]
        ocfs2_iop_set_acl+0x65/0xb0 [ocfs2]
        set_posix_acl+0x75/0xb0
        posix_acl_xattr_set+0x49/0xa0
        __vfs_setxattr+0x69/0x80
        __vfs_setxattr_noperm+0x72/0x1a0
        vfs_setxattr+0xa7/0xb0
        setxattr+0x12d/0x190
        path_setxattr+0x9f/0xb0
        SyS_setxattr+0x14/0x20
      
      Fix this one by using ocfs2_inode_{lock|unlock}_tracker, which is
      exported by commit 439a36b8 ("ocfs2/dlmglue: prepare tracking logic
      to avoid recursive cluster lock").
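
      The general idea of tracking the holder to avoid recursive locking
      can be sketched in userspace as follows.  This is only an
      illustration under simplifying assumptions (single-threaded, integer
      caller ids); struct tracked_lock and both functions are hypothetical
      and are not the ocfs2 dlmglue API.

```c
#include <stdbool.h>

struct tracked_lock {
    int holder;        /* id of the current holder, or -1 when free */
    int acquisitions;  /* count of real acquisitions, for illustration */
};

/*
 * Returns true if `caller` already held the lock, in which case nothing
 * is acquired.  Returns false after taking the lock for the first time.
 */
bool tracked_lock_acquire(struct tracked_lock *l, int caller)
{
    if (l->holder == caller)
        return true;   /* recursive entry: skip the second acquisition */

    /* A real lock would block here until it becomes free. */
    l->holder = caller;
    l->acquisitions++;
    return false;
}

/* Release only if the matching acquire call actually took the lock. */
void tracked_lock_release(struct tracked_lock *l, int caller, bool had_lock)
{
    if (!had_lock && l->holder == caller)
        l->holder = -1;
}
```

      A nested acquire by the same caller reports that the lock was
      already held, so the inner release is a no-op and only the outermost
      release drops the lock.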
      
      Link: http://lkml.kernel.org/r/20170622014746.5815-1-zren@suse.com
      Fixes: 743b5f14 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
      Signed-off-by: Eric Ren <zren@suse.com>
      Reported-by: Thomas Voegtle <tv@lio96.de>
      Tested-by: Thomas Voegtle <tv@lio96.de>
      Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • slub: make sysfs file removal asynchronous · 3b7b3140
      Tejun Heo authored
      Commit bf5eb3de ("slub: separate out sysfs_slab_release() from
      sysfs_slab_remove()") made slub sysfs file removals synchronous to
      kmem_cache shutdown.
      
      Unfortunately, this created a possible ABBA deadlock between slab_mutex
      and the sysfs draining mechanism, triggering the following lockdep warning:
      
        ======================================================
        [ INFO: possible circular locking dependency detected ]
        4.10.0-test+ #48 Not tainted
        -------------------------------------------------------
        rmmod/1211 is trying to acquire lock:
         (s_active#120){++++.+}, at: [<ffffffff81308073>] kernfs_remove+0x23/0x40
      
        but task is already holding lock:
         (slab_mutex){+.+.+.}, at: [<ffffffff8120f691>] kmem_cache_destroy+0x41/0x2d0
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #1 (slab_mutex){+.+.+.}:
      	 lock_acquire+0xf6/0x1f0
      	 __mutex_lock+0x75/0x950
      	 mutex_lock_nested+0x1b/0x20
      	 slab_attr_store+0x75/0xd0
      	 sysfs_kf_write+0x45/0x60
      	 kernfs_fop_write+0x13c/0x1c0
      	 __vfs_write+0x28/0x120
      	 vfs_write+0xc8/0x1e0
      	 SyS_write+0x49/0xa0
      	 entry_SYSCALL_64_fastpath+0x1f/0xc2
      
        -> #0 (s_active#120){++++.+}:
      	 __lock_acquire+0x10ed/0x1260
      	 lock_acquire+0xf6/0x1f0
      	 __kernfs_remove+0x254/0x320
      	 kernfs_remove+0x23/0x40
      	 sysfs_remove_dir+0x51/0x80
      	 kobject_del+0x18/0x50
      	 __kmem_cache_shutdown+0x3e6/0x460
      	 kmem_cache_destroy+0x1fb/0x2d0
      	 kvm_exit+0x2d/0x80 [kvm]
      	 vmx_exit+0x19/0xa1b [kvm_intel]
      	 SyS_delete_module+0x198/0x1f0
      	 entry_SYSCALL_64_fastpath+0x1f/0xc2
      
        other info that might help us debug this:
      
         Possible unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(slab_mutex);
      				 lock(s_active#120);
      				 lock(slab_mutex);
          lock(s_active#120);
      
         *** DEADLOCK ***
      
        2 locks held by rmmod/1211:
         #0:  (cpu_hotplug.dep_map){++++++}, at: [<ffffffff810a7877>] get_online_cpus+0x37/0x80
         #1:  (slab_mutex){+.+.+.}, at: [<ffffffff8120f691>] kmem_cache_destroy+0x41/0x2d0
      
        stack backtrace:
        CPU: 3 PID: 1211 Comm: rmmod Not tainted 4.10.0-test+ #48
        Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
        Call Trace:
         print_circular_bug+0x1be/0x210
         __lock_acquire+0x10ed/0x1260
         lock_acquire+0xf6/0x1f0
         __kernfs_remove+0x254/0x320
         kernfs_remove+0x23/0x40
         sysfs_remove_dir+0x51/0x80
         kobject_del+0x18/0x50
         __kmem_cache_shutdown+0x3e6/0x460
         kmem_cache_destroy+0x1fb/0x2d0
         kvm_exit+0x2d/0x80 [kvm]
         vmx_exit+0x19/0xa1b [kvm_intel]
         SyS_delete_module+0x198/0x1f0
         ? SyS_delete_module+0x5/0x1f0
         entry_SYSCALL_64_fastpath+0x1f/0xc2
      
      It'd be the cleanest to deal with the issue by removing sysfs files
      without holding slab_mutex before the rest of shutdown; however, given
      the current code structure, it is pretty difficult to do so.
      
      This patch punts sysfs file removal to a work item.  Before commit
      bf5eb3de, the removal was punted to an RCU-delayed work item which is
      executed after release.  Now, we're punting to a different work item on
      shutdown, which still maintains the goal of removing the sysfs files earlier
      when destroying kmem_caches.
      
      Link: http://lkml.kernel.org/r/20170620204512.GI21326@htj.duckdns.org
      Fixes: bf5eb3de ("slub: separate out sysfs_slab_release() from sysfs_slab_remove()")
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • lib/cmdline.c: fix get_options() overflow while parsing ranges · a91e0f68
      Ilya Matveychikov authored
      When using get_options() it's possible to specify a range of numbers,
      like 1-100500.  The problem is that it doesn't track the array size when
      it internally calls get_range(), which iterates over the range and
      fills the memory with numbers.
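
      A bounded range expansion can be sketched as below.  This is a
      minimal userspace illustration, not the kernel's get_range();
      fill_range() and its parameters are hypothetical.  The point is only
      that the loop stops once the caller's array is full.

```c
#include <stddef.h>

/*
 * Hypothetical helper: write the integers lo..hi into out[], storing at
 * most `cap` values.  Returns the number of values actually written.
 */
size_t fill_range(int lo, int hi, int *out, size_t cap)
{
    size_t n = 0;

    /* A wider counter avoids signed overflow when hi is INT_MAX. */
    for (long long x = lo; x <= hi && n < cap; x++)
        out[n++] = (int)x;
    return n;
}
```

      With a four-element array, a range such as 1-100500 fills only the
      four available slots instead of writing past the end of the array.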
      
      Link: http://lkml.kernel.org/r/2613C75C-B04D-4BFF-82A6-12F97BA0F620@gmail.com
      Signed-off-by: Ilya V. Matveychikov <matvejchikov@gmail.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/dax.c: fix inefficiency in dax_writeback_mapping_range() · 1eb643d0
      Jan Kara authored
      dax_writeback_mapping_range() fails to update the iteration index when
      searching the radix tree for entries needing cache flushing.  Thus each
      pagevec worth of entries is searched starting from the start, which is
      inefficient and prone to livelocks.  Update the index properly.
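
      The batched-lookup pattern can be sketched in userspace as below.
      This is an illustration only: a sorted array stands in for the radix
      tree, BATCH stands in for a pagevec's capacity, and lookup_batch()
      and visit_range() are hypothetical names.  The step that matters is
      advancing `index` past the last entry of each batch.

```c
#include <limits.h>
#include <stddef.h>

#define BATCH 4  /* stand-in for a pagevec's capacity; value is arbitrary */

/* Copy up to BATCH indices >= start from the sorted array into out[]. */
static size_t lookup_batch(const unsigned long *sorted, size_t n,
                           unsigned long start, unsigned long *out)
{
    size_t got = 0;

    for (size_t i = 0; i < n && got < BATCH; i++)
        if (sorted[i] >= start)
            out[got++] = sorted[i];
    return got;
}

/* Visit every stored index in [start, end]; returns how many were seen. */
size_t visit_range(const unsigned long *sorted, size_t n,
                   unsigned long start, unsigned long end)
{
    unsigned long batch[BATCH];
    unsigned long index = start;
    size_t visited = 0, got;
    int done = 0;

    while (!done && (got = lookup_batch(sorted, n, index, batch)) > 0) {
        for (size_t i = 0; i < got; i++) {
            if (batch[i] > end) {
                done = 1;
                break;
            }
            visited++;
        }
        if (batch[got - 1] == ULONG_MAX)
            break;  /* cannot advance further without wrapping */
        /* Resume after the last entry seen, not from `start` again. */
        index = batch[got - 1] + 1;
    }
    return visited;
}
```

      If the assignment to `index` were dropped, every call to
      lookup_batch() would return the same first batch and the loop would
      never make progress.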
      
      Link: http://lkml.kernel.org/r/20170619124531.21491-1-jack@suse.cz
      Fixes: 9973c98e ("dax: add support for fsync/sync")
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAIL · 9fa4eb8e
      NeilBrown authored
      If a positive status is passed with the AUTOFS_DEV_IOCTL_FAIL ioctl,
      autofs4_d_automount() will return
      
         ERR_PTR(status)
      
      with that status to follow_automount(), which will then dereference an
      invalid pointer.
      
      So treat a positive status the same as zero, and map to ENOENT.
      
      See comment in systemd src/core/automount.c::automount_send_ready().
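
      The mapping described above can be sketched as a small helper.
      This is an assumption-laden illustration, not the autofs code:
      sanitize_fail_status() is a hypothetical name, and it only encodes
      the rule that zero or positive statuses become -ENOENT so that the
      value is always a valid negative errno.

```c
#include <errno.h>

/*
 * Hypothetical helper: normalise a failure status reported from user
 * space.  Zero and positive values are not valid error codes, so they
 * are mapped to -ENOENT; negative values pass through unchanged.
 */
int sanitize_fail_status(int status)
{
    if (status >= 0)
        return -ENOENT;
    return status;
}
```

      A caller that later wraps the result in an error pointer then only
      ever sees a negative errno value.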
      
      Link: http://lkml.kernel.org/r/871sqwczx5.fsf@notabene.neil.brown.name
      Signed-off-by: NeilBrown <neilb@suse.com>
      Cc: Ian Kent <raven@themaw.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/vmalloc.c: huge-vmap: fail gracefully on unexpected huge vmap mappings · 029c54b0
      Ard Biesheuvel authored
      Existing code that uses vmalloc_to_page() may assume that any address
      for which is_vmalloc_addr() returns true may be passed into
      vmalloc_to_page() to retrieve the associated struct page.
      
      This is not an unreasonable assumption to make, but on architectures
      that have CONFIG_HAVE_ARCH_HUGE_VMAP=y, it no longer holds, and we need
      to ensure that vmalloc_to_page() does not go off into the weeds trying
      to dereference huge PUDs or PMDs as table entries.
      
      Given that vmalloc() and vmap() themselves never create huge mappings or
      deal with compound pages at all, there is no correct answer in this
      case, so return NULL instead, and issue a warning.
      
      When reading /proc/kcore on arm64, you will hit an oops as soon as you
      hit the huge mappings used for the various segments that make up the
      mapping of vmlinux.  With this patch applied, you will no longer hit the
      oops, but the kcore contents will be incorrect (these regions will be
      zeroed out).
      
      We are fixing this for kcore specifically, so it avoids vread() for
      those regions.  At least one other problematic user exists, i.e.,
      /dev/kmem, but that is currently broken on arm64 for other reasons.
      
      Link: http://lkml.kernel.org/r/20170609082226.26152-1-ard.biesheuvel@linaro.org
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Laura Abbott <labbott@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: zhong jiang <zhongjiang@huawei.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, thp: remove cond_resched from __collapse_huge_page_copy · c891d9f6
      David Rientjes authored
      This is a partial revert of commit 338a16ba ("mm, thp: copying user
      pages must schedule on collapse") which added a cond_resched() to
      __collapse_huge_page_copy().
      
      On x86 with CONFIG_HIGHPTE, __collapse_huge_page_copy is called in
      atomic context and thus scheduling is not possible.  This is only a
      possible config on arm and i386.
      
      Although need_resched has been shown to be set for over 100 jiffies
      while doing the iteration in __collapse_huge_page_copy, this is better
      than doing
      
      	if (!in_atomic())
      		cond_resched();
      
      to cover only non-CONFIG_HIGHPTE configs.
      
      Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1706191341550.97821@chino.kir.corp.google.com
      Signed-off-by: David Rientjes <rientjes@google.com>
      Reported-by: Larry Finger <Larry.Finger@lwfinger.net>
      Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>