1. 07 Jun, 2017 22 commits
    • Johan Hovold's avatar
      USB: serial: kl5kusb105: fix open error path · 6017ea9d
      Johan Hovold authored
      commit 6774d5f5 upstream.
      
      Kill urbs and disable read before returning from open on failure to
      retrieve the line state.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      6017ea9d
    • Robbie Ko's avatar
      Btrfs: fix tree search logic when replaying directory entry deletes · 6e97d31a
      Robbie Ko authored
      commit 2a7bf53f upstream.
      
      If a log tree has a layout like the following:
      
      leaf N:
              ...
              item 240 key (282 DIR_LOG_ITEM 0) itemoff 8189 itemsize 8
                      dir log end 1275809046
      leaf N + 1:
              item 0 key (282 DIR_LOG_ITEM 3936149215) itemoff 16275 itemsize 8
                      dir log end 18446744073709551615
              ...
      
      When we pass the value 1275809046 + 1 as the parameter start_ret to the
      function tree-log.c:find_dir_range() (done by replay_dir_deletes()), we
      end up with path->slots[0] having the value 239 (points to the last item
      of leaf N, item 240). Because the dir log item in that position has an
      offset value smaller than *start_ret (1275809046 + 1) we need to move on
      to the next leaf, however the logic for that is wrong since it compares
      the current slot to the number of items in the leaf, which is smaller
      and therefore we don't lookup for the next leaf but instead we set the
      slot to point to an item that does not exist, at slot 240, and we later
      operate on that slot which has unexpected content or in the worst case
      can result in an invalid memory access (accessing beyond the last page
      of leaf N's extent buffer).
      
      So fix the logic that checks when we need to lookup at the next leaf
      by first incrementing the slot and only after to check if that slot
      is beyond the last item of the current leaf.
      Signed-off-by: default avatarRobbie Ko <robbieko@synology.com>
      Reviewed-by: default avatarFilipe Manana <fdmanana@suse.com>
      Fixes: e02119d5 (Btrfs: Add a write ahead tree log to optimize synchronous operations)
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      [Modified changelog for clarity and correctness]
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      6e97d31a
    • Michal Hocko's avatar
      hotplug: Make register and unregister notifier API symmetric · 57bf12f4
      Michal Hocko authored
      commit 777c6e0d upstream.
      
      Yu Zhao has noticed that __unregister_cpu_notifier only unregisters its
      notifiers when HOTPLUG_CPU=y while the registration might succeed even
      when HOTPLUG_CPU=n if MODULE is enabled. This means that e.g. zswap
      might keep a stale notifier on the list on the manual clean up during
      the pool tear down and thus corrupt the list. Resulting in the following
      
      [  144.964346] BUG: unable to handle kernel paging request at ffff880658a2be78
      [  144.971337] IP: [<ffffffffa290b00b>] raw_notifier_chain_register+0x1b/0x40
      <snipped>
      [  145.122628] Call Trace:
      [  145.125086]  [<ffffffffa28e5cf8>] __register_cpu_notifier+0x18/0x20
      [  145.131350]  [<ffffffffa2a5dd73>] zswap_pool_create+0x273/0x400
      [  145.137268]  [<ffffffffa2a5e0fc>] __zswap_param_set+0x1fc/0x300
      [  145.143188]  [<ffffffffa2944c1d>] ? trace_hardirqs_on+0xd/0x10
      [  145.149018]  [<ffffffffa2908798>] ? kernel_param_lock+0x28/0x30
      [  145.154940]  [<ffffffffa2a3e8cf>] ? __might_fault+0x4f/0xa0
      [  145.160511]  [<ffffffffa2a5e237>] zswap_compressor_param_set+0x17/0x20
      [  145.167035]  [<ffffffffa2908d3c>] param_attr_store+0x5c/0xb0
      [  145.172694]  [<ffffffffa290848d>] module_attr_store+0x1d/0x30
      [  145.178443]  [<ffffffffa2b2b41f>] sysfs_kf_write+0x4f/0x70
      [  145.183925]  [<ffffffffa2b2a5b9>] kernfs_fop_write+0x149/0x180
      [  145.189761]  [<ffffffffa2a99248>] __vfs_write+0x18/0x40
      [  145.194982]  [<ffffffffa2a9a412>] vfs_write+0xb2/0x1a0
      [  145.200122]  [<ffffffffa2a9a732>] SyS_write+0x52/0xa0
      [  145.205177]  [<ffffffffa2ff4d97>] entry_SYSCALL_64_fastpath+0x12/0x17
      
      This can be even triggered manually by changing
      /sys/module/zswap/parameters/compressor multiple times.
      
      Fix this issue by making unregister APIs symmetric to the register so
      there are no surprises.
      
      [js] backport to 3.12
      
      Fixes: 47e627bc ("[PATCH] hotplug: Allow modules to use the cpu hotplug notifiers even if !CONFIG_HOTPLUG_CPU")
      Reported-and-tested-by: default avatarYu Zhao <yuzhao@google.com>
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: linux-mm@kvack.org
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Link: http://lkml.kernel.org/r/20161207135438.4310-1-mhocko@kernel.orgSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      57bf12f4
    • Boris Brezillon's avatar
      m68k: Fix ndelay() macro · 3d1b965c
      Boris Brezillon authored
      commit 7e251bb2 upstream.
      
      The current ndelay() macro definition has an extra semi-colon at the
      end of the line thus leading to a compilation error when ndelay is used
      in a conditional block without curly braces like this one:
      
      	if (cond)
      		ndelay(t);
      	else
      		...
      
      which, after the preprocessor pass gives:
      
      	if (cond)
      		m68k_ndelay(t);;
      	else
      		...
      
      thus leading to the following gcc error:
      
      	error: 'else' without a previous 'if'
      
      Remove this extra semi-colon.
      Signed-off-by: default avatarBoris Brezillon <boris.brezillon@free-electrons.com>
      Fixes: c8ee038b ("m68k: Implement ndelay() based on the existing udelay() logic")
      Signed-off-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      3d1b965c
    • Thomas Gleixner's avatar
      locking/rtmutex: Prevent dequeue vs. unlock race · 07f03860
      Thomas Gleixner authored
      commit dbb26055 upstream.
      
      David reported a futex/rtmutex state corruption. It's caused by the
      following problem:
      
      CPU0		CPU1		CPU2
      
      l->owner=T1
      		rt_mutex_lock(l)
      		lock(l->wait_lock)
      		l->owner = T1 | HAS_WAITERS;
      		enqueue(T2)
      		boost()
      		  unlock(l->wait_lock)
      		schedule()
      
      				rt_mutex_lock(l)
      				lock(l->wait_lock)
      				l->owner = T1 | HAS_WAITERS;
      				enqueue(T3)
      				boost()
      				  unlock(l->wait_lock)
      				schedule()
      		signal(->T2)	signal(->T3)
      		lock(l->wait_lock)
      		dequeue(T2)
      		deboost()
      		  unlock(l->wait_lock)
      				lock(l->wait_lock)
      				dequeue(T3)
      				  ===> wait list is now empty
      				deboost()
      				 unlock(l->wait_lock)
      		lock(l->wait_lock)
      		fixup_rt_mutex_waiters()
      		  if (wait_list_empty(l)) {
      		    owner = l->owner & ~HAS_WAITERS;
      		    l->owner = owner
      		     ==> l->owner = T1
      		  }
      
      				lock(l->wait_lock)
      rt_mutex_unlock(l)		fixup_rt_mutex_waiters()
      				  if (wait_list_empty(l)) {
      				    owner = l->owner & ~HAS_WAITERS;
      cmpxchg(l->owner, T1, NULL)
       ===> Success (l->owner = NULL)
      				    l->owner = owner
      				     ==> l->owner = T1
      				  }
      
      That means the problem is caused by fixup_rt_mutex_waiters() which does the
      RMW to clear the waiters bit unconditionally when there are no waiters in
      the rtmutexes rbtree.
      
      This can be fatal: A concurrent unlock can release the rtmutex in the
      fastpath because the waiters bit is not set. If the cmpxchg() gets in the
      middle of the RMW operation then the previous owner, which just unlocked
      the rtmutex is set as the owner again when the write takes place after the
      successfull cmpxchg().
      
      The solution is rather trivial: verify that the owner member of the rtmutex
      has the waiters bit set before clearing it. This does not require a
      cmpxchg() or other atomic operations because the waiters bit can only be
      set and cleared with the rtmutex wait_lock held. It's also safe against the
      fast path unlock attempt. The unlock attempt via cmpxchg() will either see
      the bit set and take the slowpath or see the bit cleared and release it
      atomically in the fastpath.
      
      It's remarkable that the test program provided by David triggers on ARM64
      and MIPS64 really quick, but it refuses to reproduce on x86-64, while the
      problem exists there as well. That refusal might explain that this got not
      discovered earlier despite the bug existing from day one of the rtmutex
      implementation more than 10 years ago.
      
      Thanks to David for meticulously instrumenting the code and providing the
      information which allowed to decode this subtle problem.
      Reported-by: default avatarDavid Daney <ddaney@caviumnetworks.com>
      Tested-by: default avatarDavid Daney <david.daney@cavium.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Fixes: 23f78d4a ("[PATCH] pi-futex: rt mutex core")
      Link: http://lkml.kernel.org/r/20161130210030.351136722@linutronix.deSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      [wt: s/{READ,WRITE}_ONCE/ACCESS_ONCE/]
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      07f03860
    • Jan Kara's avatar
      ext4: fix data exposure after a crash · 9488a477
      Jan Kara authored
      commit 06bd3c36 upstream.
      
      Huang has reported that in his powerfail testing he is seeing stale
      block contents in some of recently allocated blocks although he mounts
      ext4 in data=ordered mode. After some investigation I have found out
      that indeed when delayed allocation is used, we don't add inode to
      transaction's list of inodes needing flushing before commit. Originally
      we were doing that but commit f3b59291 removed the logic with a
      flawed argument that it is not needed.
      
      The problem is that although for delayed allocated blocks we write their
      contents immediately after allocating them, there is no guarantee that
      the IO scheduler or device doesn't reorder things and thus transaction
      allocating blocks and attaching them to inode can reach stable storage
      before actual block contents. Actually whenever we attach freshly
      allocated blocks to inode using a written extent, we should add inode to
      transaction's ordered inode list to make sure we properly wait for block
      contents to be written before committing the transaction. So that is
      what we do in this patch. This also handles other cases where stale data
      exposure was possible - like filling hole via mmap in
      data=ordered,nodelalloc mode.
      
      The only exception to the above rule are extending direct IO writes where
      blkdev_direct_IO() waits for IO to complete before increasing i_size and
      thus stale data exposure is not possible. For now we don't complicate
      the code with optimizing this special case since the overhead is pretty
      low. In case this is observed to be a performance problem we can always
      handle it using a special flag to ext4_map_blocks().
      
      Fixes: f3b59291Reported-by: default avatar"HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com>
      Tested-by: default avatar"HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      9488a477
    • Eric Biggers's avatar
      KEYS: fix keyctl_set_reqkey_keyring() to not leak thread keyrings · d19182e2
      Eric Biggers authored
      commit c9f838d1 upstream.
      
      This fixes CVE-2017-7472.
      
      Running the following program as an unprivileged user exhausts kernel
      memory by leaking thread keyrings:
      
      	#include <keyutils.h>
      
      	int main()
      	{
      		for (;;)
      			keyctl_set_reqkey_keyring(KEY_REQKEY_DEFL_THREAD_KEYRING);
      	}
      
      Fix it by only creating a new thread keyring if there wasn't one before.
      To make things more consistent, make install_thread_keyring_to_cred()
      and install_process_keyring_to_cred() both return 0 if the corresponding
      keyring is already present.
      
      Fixes: d84f4f99 ("CRED: Inaugurate COW credentials")
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      d19182e2
    • David Howells's avatar
      KEYS: Change the name of the dead type to ".dead" to prevent user access · 5b22a8a5
      David Howells authored
      commit c1644fe0 upstream.
      
      This fixes CVE-2017-6951.
      
      Userspace should not be able to do things with the "dead" key type as it
      doesn't have some of the helper functions set upon it that the kernel
      needs.  Attempting to use it may cause the kernel to crash.
      
      Fix this by changing the name of the type to ".dead" so that it's rejected
      up front on userspace syscalls by key_get_type_from_user().
      
      Though this doesn't seem to affect recent kernels, it does affect older
      ones, certainly those prior to:
      
      	commit c06cfb08
      	Author: David Howells <dhowells@redhat.com>
      	Date:   Tue Sep 16 17:36:06 2014 +0100
      	KEYS: Remove key_type::match in favour of overriding default by match_preparse
      
      which went in before 3.18-rc1.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      5b22a8a5
    • David Howells's avatar
      KEYS: Disallow keyrings beginning with '.' to be joined as session keyrings · 8e728d26
      David Howells authored
      commit ee8f844e upstream.
      
      This fixes CVE-2016-9604.
      
      Keyrings whose name begin with a '.' are special internal keyrings and so
      userspace isn't allowed to create keyrings by this name to prevent
      shadowing.  However, the patch that added the guard didn't fix
      KEYCTL_JOIN_SESSION_KEYRING.  Not only can that create dot-named keyrings,
      it can also subscribe to them as a session keyring if they grant SEARCH
      permission to the user.
      
      This, for example, allows a root process to set .builtin_trusted_keys as
      its session keyring, at which point it has full access because now the
      possessor permissions are added.  This permits root to add extra public
      keys, thereby bypassing module verification.
      
      This also affects kexec and IMA.
      
      This can be tested by (as root):
      
      	keyctl session .builtin_trusted_keys
      	keyctl add user a a @s
      	keyctl list @s
      
      which on my test box gives me:
      
      	2 keys in keyring:
      	180010936: ---lswrv     0     0 asymmetric: Build time autogenerated kernel key: ae3d4a31b82daa8e1a75b49dc2bba949fd992a05
      	801382539: --alswrv     0     0 user: a
      
      Fix this by rejecting names beginning with a '.' in the keyctl.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarMimi Zohar <zohar@linux.vnet.ibm.com>
      cc: linux-ima-devel@lists.sourceforge.net
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      8e728d26
    • Andy Whitcroft's avatar
      xfrm_user: validate XFRM_MSG_NEWAE incoming ESN size harder · cbd95d10
      Andy Whitcroft authored
      commit f843ee6d upstream.
      
      Kees Cook has pointed out that xfrm_replay_state_esn_len() is subject to
      wrapping issues.  To ensure we are correctly ensuring that the two ESN
      structures are the same size compare both the overall size as reported
      by xfrm_replay_state_esn_len() and the internal length are the same.
      
      CVE-2017-7184
      Signed-off-by: default avatarAndy Whitcroft <apw@canonical.com>
      Acked-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      cbd95d10
    • Andy Whitcroft's avatar
      xfrm_user: validate XFRM_MSG_NEWAE XFRMA_REPLAY_ESN_VAL replay_window · ccf85441
      Andy Whitcroft authored
      commit 677e806d upstream.
      
      When a new xfrm state is created during an XFRM_MSG_NEWSA call we
      validate the user supplied replay_esn to ensure that the size is valid
      and to ensure that the replay_window size is within the allocated
      buffer.  However later it is possible to update this replay_esn via a
      XFRM_MSG_NEWAE call.  There we again validate the size of the supplied
      buffer matches the existing state and if so inject the contents.  We do
      not at this point check that the replay_window is within the allocated
      memory.  This leads to out-of-bounds reads and writes triggered by
      netlink packets.  This leads to memory corruption and the potential for
      priviledge escalation.
      
      We already attempt to validate the incoming replay information in
      xfrm_new_ae() via xfrm_replay_verify_len().  This confirms that the user
      is not trying to change the size of the replay state buffer which
      includes the replay_esn.  It however does not check the replay_window
      remains within that buffer.  Add validation of the contained
      replay_window.
      
      CVE-2017-7184
      Signed-off-by: default avatarAndy Whitcroft <apw@canonical.com>
      Acked-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      ccf85441
    • Eric Dumazet's avatar
      tcp: avoid infinite loop in tcp_splice_read() · 452655e7
      Eric Dumazet authored
      commit ccf7abb9 upstream.
      
      Splicing from TCP socket is vulnerable when a packet with URG flag is
      received and stored into receive queue.
      
      __tcp_splice_read() returns 0, and sk_wait_data() immediately
      returns since there is the problematic skb in queue.
      
      This is a nice way to burn cpu (aka infinite loop) and trigger
      soft lockups.
      
      Again, this gem was found by syzkaller tool.
      
      Fixes: 9c55e01c ("[TCP]: Splice receive support.")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov  <dvyukov@google.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      452655e7
    • Stephen Smalley's avatar
      selinux: fix off-by-one in setprocattr · a71b4196
      Stephen Smalley authored
      commit 0c461cb7 upstream.
      
      SELinux tries to support setting/clearing of /proc/pid/attr attributes
      from the shell by ignoring terminating newlines and treating an
      attribute value that begins with a NUL or newline as an attempt to
      clear the attribute.  However, the test for clearing attributes has
      always been wrong; it has an off-by-one error, and this could further
      lead to reading past the end of the allocated buffer since commit
      bb646cdb ("proc_pid_attr_write():
      switch to memdup_user()").  Fix the off-by-one error.
      
      Even with this fix, setting and clearing /proc/pid/attr attributes
      from the shell is not straightforward since the interface does not
      support multiple write() calls (so shells that write the value and
      newline separately will set and then immediately clear the attribute,
      requiring use of echo -n to set the attribute), whereas trying to use
      echo -n "" to clear the attribute causes the shell to skip the
      write() call altogether since POSIX says that a zero-length write
      causes no side effects. Thus, one must use echo -n to set and echo
      without -n to clear, as in the following example:
      $ echo -n unconfined_u:object_r:user_home_t:s0 > /proc/$$/attr/fscreate
      $ cat /proc/$$/attr/fscreate
      unconfined_u:object_r:user_home_t:s0
      $ echo "" > /proc/$$/attr/fscreate
      $ cat /proc/$$/attr/fscreate
      
      Note the use of /proc/$$ rather than /proc/self, as otherwise
      the cat command will read its own attribute value, not that of the shell.
      
      There are no users of this facility to my knowledge; possibly we
      should just get rid of it.
      
      UPDATE: Upon further investigation it appears that a local process
      with the process:setfscreate permission can cause a kernel panic as a
      result of this bug.  This patch fixes CVE-2017-2618.
      Signed-off-by: default avatarStephen Smalley <sds@tycho.nsa.gov>
      [PM: added the update about CVE-2017-2618 to the commit description]
      Signed-off-by: default avatarPaul Moore <paul@paul-moore.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarJames Morris <james.l.morris@oracle.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      a71b4196
    • Kees Cook's avatar
      fbdev: color map copying bounds checking · e39ab836
      Kees Cook authored
      commit 2dc705a9 upstream.
      
      Copying color maps to userspace doesn't check the value of to->start,
      which will cause kernel heap buffer OOB read due to signedness wraps.
      
      CVE-2016-8405
      
      Link: http://lkml.kernel.org/r/20170105224249.GA50925@beast
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Reported-by: Peter Pi (@heisecode) of Trend Micro
      Cc: Min Chong <mchong@google.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      e39ab836
    • Gu Zheng's avatar
      tmpfs: clear S_ISGID when setting posix ACLs · 1026a8ce
      Gu Zheng authored
      commit 497de07d upstream.
      
      This change was missed the tmpfs modification in In CVE-2016-7097
      commit 07393101 ("posix_acl: Clear SGID bit when setting
      file permissions")
      It can test by xfstest generic/375, which failed to clear
      setgid bit in the following test case on tmpfs:
      
        touch $testfile
        chown 100:100 $testfile
        chmod 2755 $testfile
        _runas -u 100 -g 101 -- setfacl -m u::rwx,g::rwx,o::rwx $testfile
      Signed-off-by: default avatarGu Zheng <guzheng1@huawei.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      1026a8ce
    • Jan Kara's avatar
      posix_acl: Clear SGID bit when setting file permissions · dd2421b5
      Jan Kara authored
      commit 07393101 upstream.
      
      When file permissions are modified via chmod(2) and the user is not in
      the owning group or capable of CAP_FSETID, the setgid bit is cleared in
      inode_change_ok().  Setting a POSIX ACL via setxattr(2) sets the file
      permissions as well as the new ACL, but doesn't clear the setgid bit in
      a similar way; this allows to bypass the check in chmod(2).  Fix that.
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      [wt: dropped hfsplus changes : no xattr in 3.10]
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      dd2421b5
    • Steve Rutherford's avatar
      KVM: x86: Introduce segmented_write_std · 27bf4b6e
      Steve Rutherford authored
      commit 129a72a0 upstream.
      
      Introduces segemented_write_std.
      
      Switches from emulated reads/writes to standard read/writes in fxsave,
      fxrstor, sgdt, and sidt.  This fixes CVE-2017-2584, a longstanding
      kernel memory leak.
      
      Since commit 283c95d0 ("KVM: x86: emulate FXSAVE and FXRSTOR",
      2016-11-09), which is luckily not yet in any final release, this would
      also be an exploitable kernel memory *write*!
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Fixes: 96051572
      Fixes: 283c95d0Suggested-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarSteve Rutherford <srutherford@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      27bf4b6e
    • Paolo Bonzini's avatar
      KVM: x86: fix emulation of "MOV SS, null selector" · 2dd7d7e4
      Paolo Bonzini authored
      commit 33ab9110 upstream.
      
      This is CVE-2017-2583.  On Intel this causes a failed vmentry because
      SS's type is neither 3 nor 7 (even though the manual says this check is
      only done for usable SS, and the dmesg splat says that SS is unusable!).
      On AMD it's worse: svm.c is confused and sets CPL to 0 in the vmcb.
      
      The fix fabricates a data segment descriptor when SS is set to a null
      selector, so that CPL and SS.DPL are set correctly in the VMCS/vmcb.
      Furthermore, only allow setting SS to a NULL selector if SS.RPL < 3;
      this in turn ensures CPL < 3 because RPL must be equal to CPL.
      
      Thanks to Andy Lutomirski and Willy Tarreau for help in analyzing
      the bug and deciphering the manuals.
      
      [js] backport to 3.12
      Reported-by: default avatarXiaohan Zhang <zhangxiaohan1@huawei.com>
      Fixes: 79d5b4c3Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      2dd7d7e4
    • Ilya Dryomov's avatar
      libceph: don't set weight to IN when OSD is destroyed · dd7d4be1
      Ilya Dryomov authored
      commit b581a585 upstream.
      
      Since ceph.git commit 4e28f9e63644 ("osd/OSDMap: clear osd_info,
      osd_xinfo on osd deletion"), weight is set to IN when OSD is deleted.
      This changes the result of applying an incremental for clients, not
      just OSDs.  Because CRUSH computations are obviously affected,
      pre-4e28f9e63644 servers disagree with post-4e28f9e63644 clients on
      object placement, resulting in misdirected requests.
      
      Mirrors ceph.git commit a6009d1039a55e2c77f431662b3d6cc5a8e8e63f.
      
      Fixes: 930c5328 ("libceph: apply new_state before new_up_client on incrementals")
      Link: http://tracker.ceph.com/issues/19122Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: default avatarSage Weil <sage@redhat.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      dd7d4be1
    • Ryan Ware's avatar
      EVM: Use crypto_memneq() for digest comparisons · 37510515
      Ryan Ware authored
      commit 613317bd upstream.
      
      This patch fixes vulnerability CVE-2016-2085.  The problem exists
      because the vm_verify_hmac() function includes a use of memcmp().
      Unfortunately, this allows timing side channel attacks; specifically
      a MAC forgery complexity drop from 2^128 to 2^12.  This patch changes
      the memcmp() to the cryptographically safe crypto_memneq().
      Reported-by: default avatarXiaofei Rex Guo <xiaofei.rex.guo@intel.com>
      Signed-off-by: default avatarRyan Ware <ware@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMimi Zohar <zohar@linux.vnet.ibm.com>
      Signed-off-by: default avatarJames Morris <james.l.morris@oracle.com>
      Cc: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      37510515
    • James Yonan's avatar
      crypto: crypto_memneq - add equality testing of memory regions w/o timing leaks · 5fd53819
      James Yonan authored
      commit 6bf37e5a upstream.
      
      When comparing MAC hashes, AEAD authentication tags, or other hash
      values in the context of authentication or integrity checking, it
      is important not to leak timing information to a potential attacker,
      i.e. when communication happens over a network.
      
      Bytewise memory comparisons (such as memcmp) are usually optimized so
      that they return a nonzero value as soon as a mismatch is found. E.g,
      on x86_64/i5 for 512 bytes this can be ~50 cyc for a full mismatch
      and up to ~850 cyc for a full match (cold). This early-return behavior
      can leak timing information as a side channel, allowing an attacker to
      iteratively guess the correct result.
      
      This patch adds a new method crypto_memneq ("memory not equal to each
      other") to the crypto API that compares memory areas of the same length
      in roughly "constant time" (cache misses could change the timing, but
      since they don't reveal information about the content of the strings
      being compared, they are effectively benign). Iow, best and worst case
      behaviour take the same amount of time to complete (in contrast to
      memcmp).
      
      Note that crypto_memneq (unlike memcmp) can only be used to test for
      equality or inequality, NOT for lexicographical order. This, however,
      is not an issue for its use-cases within the crypto API.
      
      We tried to locate all of the places in the crypto API where memcmp was
      being used for authentication or integrity checking, and convert them
      over to crypto_memneq.
      
      crypto_memneq is declared noinline, placed in its own source file,
      and compiled with optimizations that might increase code size disabled
      ("Os") because a smart compiler (or LTO) might notice that the return
      value is always compared against zero/nonzero, and might then
      reintroduce the same early-return optimization that we are trying to
      avoid.
      
      Using #pragma or __attribute__ optimization annotations of the code
      for disabling optimization was avoided as it seems to be considered
      broken or unmaintained for long time in GCC [1]. Therefore, we work
      around that by specifying the compile flag for memneq.o directly in
      the Makefile. We found that this seems to be most appropriate.
      
      As we use ("Os"), this patch also provides a loop-free "fast-path" for
      frequently used 16 byte digests. Similarly to kernel library string
      functions, leave an option for future even further optimized architecture
      specific assembler implementations.
      
      This was a joint work of James Yonan and Daniel Borkmann. Also thanks
      for feedback from Florian Weimer on this and earlier proposals [2].
      
        [1] http://gcc.gnu.org/ml/gcc/2012-07/msg00211.html
        [2] https://lkml.org/lkml/2013/2/10/131Signed-off-by: default avatarJames Yonan <james@openvpn.net>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Cc: Florian Weimer <fw@deneb.enyo.de>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Cc: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      5fd53819
    • Philip Pettersson's avatar
      packet: fix race condition in packet_set_ring · 4bfb6dd7
      Philip Pettersson authored
      commit 84ac7260 upstream.
      
      When packet_set_ring creates a ring buffer it will initialize a
      struct timer_list if the packet version is TPACKET_V3. This value
      can then be raced by a different thread calling setsockopt to
      set the version to TPACKET_V1 before packet_set_ring has finished.
      
      This leads to a use-after-free on a function pointer in the
      struct timer_list when the socket is closed as the previously
      initialized timer will not be deleted.
      
      The bug is fixed by taking lock_sock(sk) in packet_setsockopt when
      changing the packet version while also taking the lock at the start
      of packet_set_ring.
      
      Fixes: f6fb8f10 ("af-packet: TPACKET_V3 flexible buffer implementation.")
      Signed-off-by: default avatarPhilip Pettersson <philip.pettersson@gmail.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Cc: gregkh@linuxfoundation.org
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      4bfb6dd7
  2. 10 Feb, 2017 18 commits
    • Willy Tarreau's avatar
      Linux 3.10.105 · ec55e7c2
      Willy Tarreau authored
      ec55e7c2
    • Guenter Roeck's avatar
      metag: Only define atomic_dec_if_positive conditionally · 63d7f335
      Guenter Roeck authored
      commit 35d04077 upstream.
      
      The definition of atomic_dec_if_positive() assumes that
      atomic_sub_if_positive() exists, which is only the case if
      metag specific atomics are used. This results in the following
      build error when trying to build metag1_defconfig.
      
      kernel/ucount.c: In function 'dec_ucount':
      kernel/ucount.c:211: error:
      	implicit declaration of function 'atomic_sub_if_positive'
      
      Moving the definition of atomic_dec_if_positive() into the metag
      conditional code fixes the problem.
      
      Fixes: 6006c0d8 ("metag: Atomics, locks and bitops")
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      63d7f335
    • Max Staudt's avatar
      fbdev/efifb: Fix 16 color palette entry calculation · 16654a56
      Max Staudt authored
      commit d50b3f43 upstream.
      
      When using efifb with a 16-bit (5:6:5) visual, fbcon's text is rendered
      in the wrong colors - e.g. text gray (#aaaaaa) is rendered as green
      (#50bc50) and neighboring pixels have slightly different values
      (such as #50bc78).
      
      The reason is that fbcon loads its 16 color palette through
      efifb_setcolreg(), which in turn calculates a 32-bit value to write
      into memory for each palette index.
      Until now, this code could only handle 8-bit visuals and didn't mask
      overlapping values when ORing them.
      
      With this patch, fbcon displays the correct colors when a qemu VM is
      booted in 16-bit mode (in GRUB: "set gfxpayload=800x600x16").
      
      Fixes: 7c83172b ("x86_64 EFI boot support: EFI frame buffer driver")  # v2.6.24+
      Signed-off-by: default avatarMax Staudt <mstaudt@suse.de>
      Acked-By: default avatarPeter Jones <pjones@redhat.com>
      Signed-off-by: default avatarTomi Valkeinen <tomi.valkeinen@ti.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      16654a56
    • Bart Van Assche's avatar
      dm: mark request_queue dead before destroying the DM device · fa88cd79
      Bart Van Assche authored
      commit 3b785fbc upstream.
      
      This avoids that new requests are queued while __dm_destroy() is in
      progress.
      
      [js] use md->queue instead of non-present helper
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      fa88cd79
    • Jan Remmet's avatar
      regulator: tps65910: Work around silicon erratum SWCZ010 · 7fbbad13
      Jan Remmet authored
      commit 8f9165c9 upstream.
      
      http://www.ti.com/lit/pdf/SWCZ010:
        DCDC o/p voltage can go higher than programmed value
      
      Impact:
      VDDI, VDD2, and VIO output programmed voltage level can go higher than
      expected or crash, when coming out of PFM to PWM mode or using DVFS.
      
      Description:
      When DCDC CLK SYNC bits are 11/01:
      * VIO 3-MHz oscillator is the source clock of the digital core and input
        clock of VDD1 and VDD2
      * Turn-on of VDD1 and VDD2 HSD PFETis synchronized or at a constant
        phase shift
      * Current pulled though VCC1+VCC2 is Iload(VDD1) + Iload(VDD2)
      * The 3 HSD PFET will be turned-on at the same time, causing the highest
        possible switching noise on the application. This noise level depends
        on the layout, the VBAT level, and the load current. The noise level
        increases with improper layout.
      
      When DCDC CLK SYNC bits are 00:
      * VIO 3-MHz oscillator is the source clock of digital core
      * VDD1 and VDD2 are running on their own 3-MHz oscillator
      * Current pulled though VCC1+VCC2 average of Iload(VDD1) + Iload(VDD2)
      * The switching noise of the 3 SMPS will be randomly spread over time,
        causing lower overall switching noise.
      
      Workaround:
      Set DCDCCTRL_REG[1:0]= 00.
      Signed-off-by: default avatarJan Remmet <j.remmet@phytec.de>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      7fbbad13
    • Peter Ujfalusi's avatar
      ASoC: omap-mcpdm: Fix irq resource handling · bc90bffd
      Peter Ujfalusi authored
      commit a8719670 upstream.
      
      Fixes: ddd17531 ("ASoC: omap-mcpdm: Clean up with devm_* function")
      
      Managed irq request will not doing any good in ASoC probe level as it is
      not going to free up the irq when the driver is unbound from the sound
      card.
      Signed-off-by: default avatarPeter Ujfalusi <peter.ujfalusi@ti.com>
      Reported-by: default avatarRussell King <linux@armlinux.org.uk>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      bc90bffd
    • Dan Carpenter's avatar
      mfd: 88pm80x: Double shifting bug in suspend/resume · a28d8358
      Dan Carpenter authored
      commit 9a6dc644 upstream.
      
      set_bit() and clear_bit() take the bit number so this code is really
      doing "1 << (1 << irq)" which is a double shift bug.  It's done
      consistently so it won't cause a problem unless "irq" is more than 4.
      
      Fixes: 70c6cce0 ('mfd: Support 88pm80x in 80x driver')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarLee Jones <lee.jones@linaro.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      a28d8358
    • Andrey Ryabinin's avatar
      mpi: Fix NULL ptr dereference in mpi_powm() [ver #3] · 940d1175
      Andrey Ryabinin authored
      commit f5527fff upstream.
      
      This fixes CVE-2016-8650.
      
      If mpi_powm() is given a zero exponent, it wants to immediately return
      either 1 or 0, depending on the modulus.  However, if the result was
      initalised with zero limb space, no limbs space is allocated and a
      NULL-pointer exception ensues.
      
      Fix this by allocating a minimal amount of limb space for the result when
      the 0-exponent case when the result is 1 and not touching the limb space
      when the result is 0.
      
      This affects the use of RSA keys and X.509 certificates that carry them.
      
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
      PGD 0
      Oops: 0002 [#1] SMP
      Modules linked in:
      CPU: 3 PID: 3014 Comm: keyctl Not tainted 4.9.0-rc6-fscache+ #278
      Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
      task: ffff8804011944c0 task.stack: ffff880401294000
      RIP: 0010:[<ffffffff8138ce5d>]  [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
      RSP: 0018:ffff880401297ad8  EFLAGS: 00010212
      RAX: 0000000000000000 RBX: ffff88040868bec0 RCX: ffff88040868bba0
      RDX: ffff88040868b260 RSI: ffff88040868bec0 RDI: ffff88040868bee0
      RBP: ffff880401297ba8 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000047 R11: ffffffff8183b210 R12: 0000000000000000
      R13: ffff8804087c7600 R14: 000000000000001f R15: ffff880401297c50
      FS:  00007f7a7918c700(0000) GS:ffff88041fb80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 0000000401250000 CR4: 00000000001406e0
      Stack:
       ffff88040868bec0 0000000000000020 ffff880401297b00 ffffffff81376cd4
       0000000000000100 ffff880401297b10 ffffffff81376d12 ffff880401297b30
       ffffffff81376f37 0000000000000100 0000000000000000 ffff880401297ba8
      Call Trace:
       [<ffffffff81376cd4>] ? __sg_page_iter_next+0x43/0x66
       [<ffffffff81376d12>] ? sg_miter_get_next_page+0x1b/0x5d
       [<ffffffff81376f37>] ? sg_miter_next+0x17/0xbd
       [<ffffffff8138ba3a>] ? mpi_read_raw_from_sgl+0xf2/0x146
       [<ffffffff8132a95c>] rsa_verify+0x9d/0xee
       [<ffffffff8132acca>] ? pkcs1pad_sg_set_buf+0x2e/0xbb
       [<ffffffff8132af40>] pkcs1pad_verify+0xc0/0xe1
       [<ffffffff8133cb5e>] public_key_verify_signature+0x1b0/0x228
       [<ffffffff8133d974>] x509_check_for_self_signed+0xa1/0xc4
       [<ffffffff8133cdde>] x509_cert_parse+0x167/0x1a1
       [<ffffffff8133d609>] x509_key_preparse+0x21/0x1a1
       [<ffffffff8133c3d7>] asymmetric_key_preparse+0x34/0x61
       [<ffffffff812fc9f3>] key_create_or_update+0x145/0x399
       [<ffffffff812fe227>] SyS_add_key+0x154/0x19e
       [<ffffffff81001c2b>] do_syscall_64+0x80/0x191
       [<ffffffff816825e4>] entry_SYSCALL64_slow_path+0x25/0x25
      Code: 56 41 55 41 54 53 48 81 ec a8 00 00 00 44 8b 71 04 8b 42 04 4c 8b 67 18 45 85 f6 89 45 80 0f 84 b4 06 00 00 85 c0 75 2f 41 ff ce <49> c7 04 24 01 00 00 00 b0 01 75 0b 48 8b 41 18 48 83 38 01 0f
      RIP  [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
       RSP <ffff880401297ad8>
      CR2: 0000000000000000
      ---[ end trace d82015255d4a5d8d ]---
      
      Basically, this is a backport of a libgcrypt patch:
      
      	http://git.gnupg.org/cgi-bin/gitweb.cgi?p=libgcrypt.git;a=patch;h=6e1adb05d290aeeb1c230c763970695f4a538526
      
      Fixes: cdec9cb5 ("crypto: GnuPG based MPI lib - source files (part 1)")
      Signed-off-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
      cc: linux-ima-devel@lists.sourceforge.net
      Signed-off-by: default avatarJames Morris <james.l.morris@oracle.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      940d1175
    • Michael Walle's avatar
      hwmon: (adt7411) set bit 3 in CFG1 register · 2c8c5e67
      Michael Walle authored
      commit b53893aa upstream.
      
      According to the datasheet you should only write 1 to this bit. If it is
      not set, at least AIN3 will return bad values on newer silicon revisions.
      
      Fixes: d84ca5b3 ("hwmon: Add driver for ADT7411 voltage and temperature sensor")
      Signed-off-by: default avatarMichael Walle <michael@walle.cc>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      2c8c5e67
    • Sergei Miroshnichenko's avatar
      can: dev: fix deadlock reported after bus-off · ebee2e2f
      Sergei Miroshnichenko authored
      commit 9abefcb1 upstream.
      
      A timer was used to restart after the bus-off state, leading to a
      relatively large can_restart() executed in an interrupt context,
      which in turn sets up pinctrl. When this happens during system boot,
      there is a high probability of grabbing the pinctrl_list_mutex,
      which is locked already by the probe() of other device, making the
      kernel suspect a deadlock condition [1].
      
      To resolve this issue, the restart_timer is replaced by a delayed
      work.
      
      [1] https://github.com/victronenergy/venus/issues/24Signed-off-by: default avatarSergei Miroshnichenko <sergeimir@emcraft.com>
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      ebee2e2f
    • zhong jiang's avatar
      mm,ksm: fix endless looping in allocating memory when ksm enable · 3419dc51
      zhong jiang authored
      commit 5b398e41 upstream.
      
      I hit the following hung task when runing a OOM LTP test case with 4.1
      kernel.
      
      Call trace:
      [<ffffffc000086a88>] __switch_to+0x74/0x8c
      [<ffffffc000a1bae0>] __schedule+0x23c/0x7bc
      [<ffffffc000a1c09c>] schedule+0x3c/0x94
      [<ffffffc000a1eb84>] rwsem_down_write_failed+0x214/0x350
      [<ffffffc000a1e32c>] down_write+0x64/0x80
      [<ffffffc00021f794>] __ksm_exit+0x90/0x19c
      [<ffffffc0000be650>] mmput+0x118/0x11c
      [<ffffffc0000c3ec4>] do_exit+0x2dc/0xa74
      [<ffffffc0000c46f8>] do_group_exit+0x4c/0xe4
      [<ffffffc0000d0f34>] get_signal+0x444/0x5e0
      [<ffffffc000089fcc>] do_signal+0x1d8/0x450
      [<ffffffc00008a35c>] do_notify_resume+0x70/0x78
      
      The oom victim cannot terminate because it needs to take mmap_sem for
      write while the lock is held by ksmd for read which loops in the page
      allocator
      
      ksm_do_scan
      	scan_get_next_rmap_item
      		down_read
      		get_next_rmap_item
      			alloc_rmap_item   #ksmd will loop permanently.
      
      There is no way forward because the oom victim cannot release any memory
      in 4.1 based kernel.  Since 4.6 we have the oom reaper which would solve
      this problem because it would release the memory asynchronously.
      Nevertheless we can relax alloc_rmap_item requirements and use
      __GFP_NORETRY because the allocation failure is acceptable as ksm_do_scan
      would just retry later after the lock got dropped.
      
      Such a patch would be also easy to backport to older stable kernels which
      do not have oom_reaper.
      
      While we are at it add GFP_NOWARN so the admin doesn't have to be alarmed
      by the allocation failure.
      
      Link: http://lkml.kernel.org/r/1474165570-44398-1-git-send-email-zhongjiang@huawei.comSigned-off-by: default avatarzhong jiang <zhongjiang@huawei.com>
      Suggested-by: default avatarHugh Dickins <hughd@google.com>
      Suggested-by: default avatarMichal Hocko <mhocko@suse.cz>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      3419dc51
    • Mike Snitzer's avatar
      dm flakey: fix reads to be issued if drop_writes configured · 46049e5c
      Mike Snitzer authored
      commit 299f6230 upstream.
      
      v4.8-rc3 commit 99f3c90d ("dm flakey: error READ bios during the
      down_interval") overlooked the 'drop_writes' feature, which is meant to
      allow reads to be issued rather than errored, during the down_interval.
      
      Fixes: 99f3c90d ("dm flakey: error READ bios during the down_interval")
      Reported-by: default avatarQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      46049e5c
    • Chris Metcalf's avatar
      tile: avoid using clocksource_cyc2ns with absolute cycle count · 04be31f5
      Chris Metcalf authored
      commit e658a6f1 upstream.
      
      For large values of "mult" and long uptimes, the intermediate
      result of "cycles * mult" can overflow 64 bits.  For example,
      the tile platform calls clocksource_cyc2ns with a 1.2 GHz clock;
      we have mult = 853, and after 208.5 days, we overflow 64 bits.
      
      Since clocksource_cyc2ns() is intended to be used for relative
      cycle counts, not absolute cycle counts, performance is more
      importance than accepting a wider range of cycle values.  So,
      just use mult_frac() directly in tile's sched_clock().
      
      Commit 4cecf6d4 ("sched, x86: Avoid unnecessary overflow
      in sched_clock") by Salman Qazi results in essentially the same
      generated code for x86 as this change does for tile.  In fact,
      a follow-on change by Salman introduced mult_frac() and switched
      to using it, so the C code was largely identical at that point too.
      
      Peter Zijlstra then added mul_u64_u32_shr() and switched x86
      to use it.  This is, in principle, better; by optimizing the
      64x64->64 multiplies to be 32x32->64 multiplies we can potentially
      save some time.  However, the compiler piplines the 64x64->64
      multiplies pretty well, and the conditional branch in the generic
      mul_u64_u32_shr() causes some bubbles in execution, with the
      result that it's pretty much a wash.  If tilegx provided its own
      implementation of mul_u64_u32_shr() without the conditional branch,
      we could potentially save 3 cycles, but that seems like small gain
      for a fair amount of additional build scaffolding; no other platform
      currently provides a mul_u64_u32_shr() override, and tile doesn't
      currently have an <asm/div64.h> header to put the override in.
      
      Additionally, gcc currently has an optimization bug that prevents
      it from recognizing the opportunity to use a 32x32->64 multiply,
      and so the result would be no better than the existing mult_frac()
      until such time as the compiler is fixed.
      
      For now, just using mult_frac() seems like the right answer.
      Signed-off-by: default avatarChris Metcalf <cmetcalf@mellanox.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      04be31f5
    • Myron Stowe's avatar
      PCI: Handle read-only BARs on AMD CS553x devices · 569d62a3
      Myron Stowe authored
      commit 06cf35f9 upstream.
      
      Some AMD CS553x devices have read-only BARs because of a firmware or
      hardware defect.  There's a workaround in quirk_cs5536_vsa(), but it no
      longer works after 36e81648 ("PCI: Restore detection of read-only
      BARs").  Prior to 36e81648, we filled in res->start; afterwards we
      leave it zeroed out.  The quirk only updated the size, so the driver tried
      to use a region starting at zero, which didn't work.
      
      Expand quirk_cs5536_vsa() to read the base addresses from the BARs and
      hard-code the sizes.
      
      On Nix's system BAR 2's read-only value is 0x6200.  Prior to 36e81648,
      we interpret that as a 512-byte BAR based on the lowest-order bit set.  Per
      datasheet sec 5.6.1, that BAR (MFGPT) requires only 64 bytes; use that to
      avoid clearing any address bits if a platform uses only 64-byte alignment.
      
      [js] pcibios_bus_to_resource takes pdev, not bus in 3.12
      
      [bhelgaas: changelog, reduce BAR 2 size to 64]
      Fixes: 36e81648 ("PCI: Restore detection of read-only BARs")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=85991#c4
      Link: http://support.amd.com/TechDocs/31506_cs5535_databook.pdf
      Link: http://support.amd.com/TechDocs/33238G_cs5536_db.pdfReported-and-tested-by: default avatarNix <nix@esperi.org.uk>
      Signed-off-by: default avatarMyron Stowe <myron.stowe@redhat.com>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      569d62a3
    • Punit Agrawal's avatar
      ACPI / APEI: Fix incorrect return value of ghes_proc() · bdb29c7b
      Punit Agrawal authored
      commit 806487a8 upstream.
      
      Although ghes_proc() tests for errors while reading the error status,
      it always return success (0). Fix this by propagating the return
      value.
      
      Fixes: d334a491 (ACPI, APEI, Generic Hardware Error Source memory error support)
      Signed-of-by: default avatarPunit Agrawal <punit.agrawa.@arm.com>
      Tested-by: default avatarTyler Baicar <tbaicar@codeaurora.org>
      Reviewed-by: default avatarBorislav Petkov <bp@suse.de>
      [ rjw: Subject ]
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      bdb29c7b
    • Alexander Usyskin's avatar
      mei: bus: fix received data size check in NFC fixup · 2ef72292
      Alexander Usyskin authored
      commit 582ab27a upstream.
      
      NFC version reply size checked against only header size, not against
      full message size. That may lead potentially to uninitialized memory access
      in version data.
      
      That leads to warnings when version data is accessed:
      drivers/misc/mei/bus-fixup.c: warning: '*((void *)&ver+11)' may be used uninitialized in this function [-Wuninitialized]:  => 212:2
      
      Reported in
      Build regressions/improvements in v4.9-rc3
      https://lkml.org/lkml/2016/10/30/57
      
      [js] the check is in 3.12 only once
      
      Fixes: 59fcd7c6 (mei: nfc: Initial nfc implementation)
      Signed-off-by: default avatarAlexander Usyskin <alexander.usyskin@intel.com>
      Signed-off-by: default avatarTomas Winkler <tomas.winkler@intel.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      2ef72292
    • Arnd Bergmann's avatar
      staging: iio: ad5933: avoid uninitialized variable in error case · 0b0d4fc3
      Arnd Bergmann authored
      commit 34eee70a upstream.
      
      The ad5933_i2c_read function returns an error code to indicate
      whether it could read data or not. However ad5933_work() ignores
      this return code and just accesses the data unconditionally,
      which gets detected by gcc as a possible bug:
      
      drivers/staging/iio/impedance-analyzer/ad5933.c: In function 'ad5933_work':
      drivers/staging/iio/impedance-analyzer/ad5933.c:649:16: warning: 'status' may be used uninitialized in this function [-Wmaybe-uninitialized]
      
      This adds minimal error handling so we only evaluate the
      data if it was correctly read.
      
      Link: https://patchwork.kernel.org/patch/8110281/Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarLars-Peter Clausen <lars@metafoo.de>
      Signed-off-by: default avatarJonathan Cameron <jic23@kernel.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      0b0d4fc3
    • Long Li's avatar
      hv: do not lose pending heartbeat vmbus packets · c670aaed
      Long Li authored
      commit 407a3aee upstream.
      
      The host keeps sending heartbeat packets independent of the
      guest responding to them.  Even though we respond to the heartbeat messages at
      interrupt level, we can have situations where there maybe multiple heartbeat
      messages pending that have not been responded to. For instance this occurs when the
      VM is paused and the host continues to send the heartbeat messages.
      Address this issue by draining and responding to all
      the heartbeat messages that maybe pending.
      Signed-off-by: default avatarLong Li <longli@microsoft.com>
      Signed-off-by: default avatarK. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      c670aaed