Commits · 1171afc4a34e2926e6e8e27c896cf328c8825ac3 · nexedi / linux

23 Dec, 2016 11 commits

mnt: Add a per mount namespace limit on the number of mounts · 1171afc4

Eric W. Biederman authored Dec 14, 2016

[ Upstream commit d2921684 ]

CAI Qian <caiqian@redhat.com> pointed out that the semantics
of shared subtrees make it possible to create an exponentially
increasing number of mounts in a mount namespace.

    mkdir /tmp/1 /tmp/2
    mount --make-rshared /
    for i in $(seq 1 20) ; do mount --bind /tmp/1 /tmp/2 ; done

Will create create 2^20 or 1048576 mounts, which is a practical problem
as some people have managed to hit this by accident.

As such CVE-2016-6213 was assigned.

Ian Kent <raven@themaw.net> described the situation for autofs users
as follows:

> The number of mounts for direct mount maps is usually not very large because of
> the way they are implemented, large direct mount maps can have performance
> problems. There can be anywhere from a few (likely case a few hundred) to less
> than 10000, plus mounts that have been triggered and not yet expired.
>
> Indirect mounts have one autofs mount at the root plus the number of mounts that
> have been triggered and not yet expired.
>
> The number of autofs indirect map entries can range from a few to the common
> case of several thousand and in rare cases up to between 30000 and 50000. I've
> not heard of people with maps larger than 50000 entries.
>
> The larger the number of map entries the greater the possibility for a large
> number of active mounts so it's not hard to expect cases of a 1000 or somewhat
> more active mounts.

So I am setting the default number of mounts allowed per mount
namespace at 100,000.  This is more than enough for any use case I
know of, but small enough to quickly stop an exponential increase
in mounts.  Which should be perfect to catch misconfigurations and
malfunctioning programs.

For anyone who needs a higher limit this can be changed by writing
to the new /proc/sys/fs/mount-max sysctl.
Tested-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Conflicts:
	fs/namespace.c
	kernel/sysctl.c
Signed-off-by: Philipp Hahn <hahn@univention.de>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

1171afc4

posix_acl: Clear SGID bit when setting file permissions · 62fa696b

Jan Kara authored Dec 14, 2016

[ Upstream commit 07393101 ]

When file permissions are modified via chmod(2) and the user is not in
the owning group or capable of CAP_FSETID, the setgid bit is cleared in
inode_change_ok().  Setting a POSIX ACL via setxattr(2) sets the file
permissions as well as the new ACL, but doesn't clear the setgid bit in
a similar way; this allows to bypass the check in chmod(2).  Fix that.

References: CVE-2016-7097
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Philipp Hahn <hahn@univention.de>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

62fa696b

fs: Avoid premature clearing of capabilities · de42b955

Jan Kara authored Dec 14, 2016

[ Upstream commit 030b533c ]

Currently, notify_change() clears capabilities or IMA attributes by
calling security_inode_killpriv() before calling into ->setattr. Thus it
happens before any other permission checks in inode_change_ok() and user
is thus allowed to trigger clearing of capabilities or IMA attributes
for any file he can look up e.g. by calling chown for that file. This is
unexpected and can lead to user DoSing a system.

Fix the problem by calling security_inode_killpriv() at the end of
inode_change_ok() instead of from notify_change(). At that moment we are
sure user has permissions to do the requested change.

References: CVE-2015-1350
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Philipp Hahn <hahn@univention.de>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

de42b955

fs: Give dentry to inode_change_ok() instead of inode · cb8e1eef

Jan Kara authored Dec 14, 2016

[ Upstream commit 31051c85 ]

inode_change_ok() will be resposible for clearing capabilities and IMA
extended attributes and as such will need dentry. Give it as an argument
to inode_change_ok() instead of an inode. Also rename inode_change_ok()
to setattr_prepare() to better relect that it does also some
modifications in addition to checks.

References: CVE-2015-1350
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Philipp Hahn <hahn@univention.de>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

cb8e1eef

nfsd: Disable NFSv2 timestamp workaround for NFSv3+ · 2ee3ceec

Andreas Gruenbacher authored Dec 14, 2016

NFSv2 can set the atime and/or mtime of a file to specific timestamps but not
to the server's current time.  To implement the equivalent of utimes("file",
NULL), it uses a heuristic.

NFSv3 and later do support setting the atime and/or mtime to the server's
current time directly.  The NFSv2 heuristic is still enabled, and causes
timestamps to be set wrong sometimes.

Fix this by moving the heuristic into the NFSv2 specific code.  We can leave it
out of the create code path: the owner can always set timestamps arbitrarily,
and the workaround would never trigger.

References: CVE-2015-1350
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Philipp Hahn <hahn@univention.de>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

2ee3ceec

fuse: Propagate dentry down to inode_change_ok() · 820bc458

Jan Kara authored Dec 14, 2016

[ Upstream commit 62490330 ]

To avoid clearing of capabilities or security related extended
attributes too early, inode_change_ok() will need to take dentry instead
of inode. Propagate it down to fuse_do_setattr().

References: CVE-2015-1350
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>

Conflicts: Missing file_dentry() from d101a125Signed-off-by: Philipp Hahn <hahn@univention.de>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

820bc458

xfs: Propagate dentry down to inode_change_ok() · 89bc54c5

Jan Kara authored Dec 14, 2016

[ upstream commit 69bca807 ]

To avoid clearing of capabilities or security related extended
attributes too early, inode_change_ok() will need to take dentry instead
of inode. Propagate dentry down to functions calling inode_change_ok().
This is rather straightforward except for xfs_set_mode() function which
does not have dentry easily available. Luckily that function does not
call inode_change_ok() anyway so we just have to do a little dance with
function prototypes.

References: CVE-2015-1350
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>

Conflicts: Missing file_dentry() from d101a125Signed-off-by: Philipp Hahn <hahn@univention.de>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

89bc54c5

xattr: Option to disable meta-data block cache · 1b364dc9

Andreas Dilger authored Dec 14, 2016

mbcache provides absolutely no value for Lustre xattrs (because
they are unique and cannot be shared between files) and as we can
see it has a noticable overhead in some cases. In the past there
was a CONFIG_MBCACHE option that would allow it to be disabled,
but this was removed in newer kernels, so we will need to patch
ldiskfs to fix this.

References: <https://bugzilla.kernel.org/show_bug.cgi?id=107301>
References: <https://git.hpdd.intel.com/fs/lustre-release.git/blob_plain/HEAD:/ldiskfs/kernel_patches/patches/rhel7/ext4-disable-mb-cache.patch>
References: CVE-2015-8952

On 13.12.2016 at 15:58 Ben Hutchings wrote:
> I decided not to apply this as it's a userland ABI extension that we
> would then need to carry indefinitely.
Signed-off-by: Philipp Hahn <hahn@univention.de>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

1b364dc9

tcp: fix use after free in tcp_xmit_retransmit_queue() · 9a66bc6e

Eric Dumazet authored Dec 14, 2016

[ Upstream commit bb1fceca ]

When tcp_sendmsg() allocates a fresh and empty skb, it puts it at the
tail of the write queue using tcp_add_write_queue_tail()

Then it attempts to copy user data into this fresh skb.

If the copy fails, we undo the work and remove the fresh skb.

Unfortunately, this undo lacks the change done to tp->highest_sack and
we can leave a dangling pointer (to a freed skb)

Later, tcp_xmit_retransmit_queue() can dereference this pointer and
access freed memory. For regular kernels where memory is not unmapped,
this might cause SACK bugs because tcp_highest_sack_seq() is buggy,
returning garbage instead of tp->snd_nxt, but with various debug
features like CONFIG_DEBUG_PAGEALLOC, this can crash the kernel.

This bug was found by Marco Grassi thanks to syzkaller.

Fixes: 6859d494 ("[TCP]: Abstract tp->highest_sack accessing & point to next skb")
Reported-by: Marco Grassi <marco.gra@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
References: CVE-2016-6828
Signed-off-by: Philipp Hahn <hahn@univention.de>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

9a66bc6e

x86/kexec: add -fno-PIE · ebdb88b8

Sebastian Andrzej Siewior authored Nov 04, 2016

[ Upstream commit 90944e40 ]

If the gcc is configured to do -fPIE by default then the build aborts
later with:
| Unsupported relocation type: unknown type rel type name (29)

Tagging it stable so it is possible to compile recent stable kernels as
well.

Cc: stable@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Michal Marek <mmarek@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

ebdb88b8

scripts/has-stack-protector: add -fno-PIE · 672612a2

Sebastian Andrzej Siewior authored Nov 04, 2016

[ Upstream commit 82031ea2 ]

Adding -no-PIE to the fstack protector check. -no-PIE was introduced
before -fstack-protector so there is no need for a runtime check.

Without it the build stops:
|Cannot use CONFIG_CC_STACKPROTECTOR_STRONG: -fstack-protector-strong available but compiler is broken

due to -mcmodel=kernel + -fPIE if -fPIE is enabled by default.

Tagging it stable so it is possible to compile recent stable kernels as
well.

Cc: stable@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Michal Marek <mmarek@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

672612a2

22 Dec, 2016 9 commits

x86/init: Fix cr4_init_shadow() on CR4-less machines · e06ded86

Andy Lutomirski authored Sep 28, 2016

[ Upstream commit e1bfc11c ]

cr4_init_shadow() will panic on 486-like machines without CR4.  Fix
it using __read_cr4_safe().

Reported-by: david@saggiorato.net
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: 1e02ce4c ("x86: Store a per-cpu shadow copy of CR4")
Link: http://lkml.kernel.org/r/43a20f81fb504013bf613913dc25574b45336a61.1475091074.git.luto@kernel.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

e06ded86

ARM: 8617/1: dma: fix dma_max_pfn() · eec74693

Roger Quadros authored Sep 29, 2016

[ Upstream commit d248220f ]

Since commit 6ce0d200 ("ARM: dma: Use dma_pfn_offset for dma address translation"),
dma_to_pfn() already returns the PFN with the physical memory start offset
so we don't need to add it again.

This fixes USB mass storage lock-up problem on systems that can't do DMA
over the entire physical memory range (e.g.) Keystone 2 systems with 4GB RAM
can only do DMA over the first 2GB. [K2E-EVM].

What happens there is that without this patch SCSI layer sets a wrong
bounce buffer limit in scsi_calculate_bounce_limit() for the USB mass
storage device. dma_max_pfn() evaluates to 0x8fffff and bounce_limit
is set to 0x8fffff000 whereas maximum DMA'ble physical memory on Keystone 2
is 0x87fffffff. This results in non DMA'ble pages being given to the
USB controller and hence the lock-up.

NOTE: in the above case, USB-SCSI-device's dma_pfn_offset was showing as 0.
This should have really been 0x780000 as on K2e, LOWMEM_START is 0x80000000
and HIGHMEM_START is 0x800000000. DMA zone is 2GB so dma_max_pfn should be
0x87ffff. The incorrect dma_pfn_offset for the USB storage device is because
USB devices are not correctly inheriting the dma_pfn_offset from the
USB host controller. This will be fixed by a separate patch.

Fixes: 6ce0d200 ("ARM: dma: Use dma_pfn_offset for dma address translation")
Cc: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Olof Johansson <olof@lixom.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Reported-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

eec74693

mm,ksm: fix endless looping in allocating memory when ksm enable · 58024f82

zhong jiang authored Sep 28, 2016

[ Upstream commit 5b398e41 ]

I hit the following hung task when runing a OOM LTP test case with 4.1
kernel.

Call trace:
[<ffffffc000086a88>] __switch_to+0x74/0x8c
[<ffffffc000a1bae0>] __schedule+0x23c/0x7bc
[<ffffffc000a1c09c>] schedule+0x3c/0x94
[<ffffffc000a1eb84>] rwsem_down_write_failed+0x214/0x350
[<ffffffc000a1e32c>] down_write+0x64/0x80
[<ffffffc00021f794>] __ksm_exit+0x90/0x19c
[<ffffffc0000be650>] mmput+0x118/0x11c
[<ffffffc0000c3ec4>] do_exit+0x2dc/0xa74
[<ffffffc0000c46f8>] do_group_exit+0x4c/0xe4
[<ffffffc0000d0f34>] get_signal+0x444/0x5e0
[<ffffffc000089fcc>] do_signal+0x1d8/0x450
[<ffffffc00008a35c>] do_notify_resume+0x70/0x78

The oom victim cannot terminate because it needs to take mmap_sem for
write while the lock is held by ksmd for read which loops in the page
allocator

ksm_do_scan
	scan_get_next_rmap_item
		down_read
		get_next_rmap_item
			alloc_rmap_item   #ksmd will loop permanently.

There is no way forward because the oom victim cannot release any memory
in 4.1 based kernel.  Since 4.6 we have the oom reaper which would solve
this problem because it would release the memory asynchronously.
Nevertheless we can relax alloc_rmap_item requirements and use
__GFP_NORETRY because the allocation failure is acceptable as ksm_do_scan
would just retry later after the lock got dropped.

Such a patch would be also easy to backport to older stable kernels which
do not have oom_reaper.

While we are at it add GFP_NOWARN so the admin doesn't have to be alarmed
by the allocation failure.

Link: http://lkml.kernel.org/r/1474165570-44398-1-git-send-email-zhongjiang@huawei.comSigned-off-by: zhong jiang <zhongjiang@huawei.com>
Suggested-by: Hugh Dickins <hughd@google.com>
Suggested-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

58024f82

can: dev: fix deadlock reported after bus-off · d427d645

Sergei Miroshnichenko authored Sep 07, 2016

[ Upstream commit 9abefcb1 ]

A timer was used to restart after the bus-off state, leading to a
relatively large can_restart() executed in an interrupt context,
which in turn sets up pinctrl. When this happens during system boot,
there is a high probability of grabbing the pinctrl_list_mutex,
which is locked already by the probe() of other device, making the
kernel suspect a deadlock condition [1].

To resolve this issue, the restart_timer is replaced by a delayed
work.

[1] https://github.com/victronenergy/venus/issues/24Signed-off-by: Sergei Miroshnichenko <sergeimir@emcraft.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

d427d645

cpuset: handle race between CPU hotplug and cpuset_hotplug_work · 791a9289

Joonwoo Park authored Sep 11, 2016

[ Upstream commit 28b89b9e ]

A discrepancy between cpu_online_mask and cpuset's effective_cpus
mask is inevitable during hotplug since cpuset defers updating of
effective_cpus mask using a workqueue, during which time nothing
prevents the system from more hotplug operations.  For that reason
guarantee_online_cpus() walks up the cpuset hierarchy until it finds
an intersection under the assumption that top cpuset's effective_cpus
mask intersects with cpu_online_mask even with such a race occurring.

However a sequence of CPU hotplugs can open a time window, during which
none of the effective CPUs in the top cpuset intersect with
cpu_online_mask.

For example when there are 4 possible CPUs 0-3 and only CPU0 is online:

  ========================  ===========================
   cpu_online_mask           top_cpuset.effective_cpus
  ========================  ===========================
   echo 1 > cpu2/online.
   CPU hotplug notifier woke up hotplug work but not yet scheduled.
      [0,2]                     [0]

   echo 0 > cpu0/online.
   The workqueue is still runnable.
      [2]                       [0]
  ========================  ===========================

  Now there is no intersection between cpu_online_mask and
  top_cpuset.effective_cpus.  Thus invoking sys_sched_setaffinity() at
  this moment can cause following:

   Unable to handle kernel NULL pointer dereference at virtual address 000000d0
   ------------[ cut here ]------------
   Kernel BUG at ffffffc0001389b0 [verbose debug info unavailable]
   Internal error: Oops - BUG: 96000005 [#1] PREEMPT SMP
   Modules linked in:
   CPU: 2 PID: 1420 Comm: taskset Tainted: G        W       4.4.8+ #98
   task: ffffffc06a5c4880 ti: ffffffc06e124000 task.ti: ffffffc06e124000
   PC is at guarantee_online_cpus+0x2c/0x58
   LR is at cpuset_cpus_allowed+0x4c/0x6c
   <snip>
   Process taskset (pid: 1420, stack limit = 0xffffffc06e124020)
   Call trace:
   [<ffffffc0001389b0>] guarantee_online_cpus+0x2c/0x58
   [<ffffffc00013b208>] cpuset_cpus_allowed+0x4c/0x6c
   [<ffffffc0000d61f0>] sched_setaffinity+0xc0/0x1ac
   [<ffffffc0000d6374>] SyS_sched_setaffinity+0x98/0xac
   [<ffffffc000085cb0>] el0_svc_naked+0x24/0x28

The top cpuset's effective_cpus are guaranteed to be identical to
cpu_online_mask eventually.  Hence fall back to cpu_online_mask when
there is no intersection between top cpuset's effective_cpus and
cpu_online_mask.
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: cgroups@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: <stable@vger.kernel.org> # 3.17+
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

791a9289

mtd: nand: davinci: Reinitialize the HW ECC engine in 4bit hwctl · 6b82b060

Karl Beldan authored Aug 29, 2016

[ Upstream commit f6d7c1b5 ]

This fixes subpage writes when using 4-bit HW ECC.

There has been numerous reports about ECC errors with devices using this
driver for a while.  Also the 4-bit ECC has been reported as broken with
subpages in [1] and with 16 bits NANDs in the driver and in mach* board
files both in mainline and in the vendor BSPs.

What I saw with 4-bit ECC on a 16bits NAND (on an LCDK) which got me to
try reinitializing the ECC engine:
- R/W on whole pages properly generates/checks RS code
- try writing the 1st subpage only of a blank page, the subpage is well
  written and the RS code properly generated, re-reading the same page
  the HW detects some ECC error, reading the same page again no ECC
  error is detected

Note that the ECC engine is already reinitialized in the 1-bit case.

Tested on my LCDK with UBI+UBIFS using subpages.
This could potentially get rid of the issue workarounded in [1].

[1] 28c015a9 ("mtd: davinci-nand: disable subpage write for keystone-nand")

Fixes: 6a4123e5 ("mtd: nand: davinci_nand, 4-bit ECC for smallpage")
Cc: <stable@vger.kernel.org>
Signed-off-by: Karl Beldan <kbeldan@baylibre.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

6b82b060

drm/msm: fix use of copy_from_user() while holding spinlock · e537a097

Rob Clark authored Aug 22, 2016

[ Upstream commit 89f82cbb ]

Use instead __copy_from_user_inatomic() and fallback to slow-path where
we drop and re-aquire the lock in case of fault.

Cc: stable@vger.kernel.org
Reported-by: Vaishali Thakkar <vaishali.thakkar@oracle.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

e537a097

bus: arm-ccn: Fix PMU handling of MN · b56eb9cd

Pawel Moll authored Aug 02, 2016

[ Upstream commit 4e486cba ]

The "Miscellaneous Node" fell through cracks of node initialisation,
as its ID is shared with HN-I.

This patch treats MN as a special case (which it is), adding separate
validation check for it and pre-defining the node ID in relevant events
descriptions. That way one can simply run:

	# perf stat -a -e ccn/mn_ecbarrier/ <workload>

Additionally, direction in the MN pseudo-events XP watchpoint
definitions is corrected to be "TX" (1) as they are defined from the
crosspoint point of view (thus barriers are transmitted from XP to MN).

Cc: stable@vger.kernel.org # 3.17+
Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

b56eb9cd

bus: arm-ccn: Provide required event arguments · 7298a8bf

Pawel Moll authored Apr 02, 2015

[ Upstream commit 8f06c51f ]

Since 688d4dfc "perf tools: Support
parsing parameterized events" the perf userspace tools understands
"argument=?" syntax in the events file, making sure that required
arguments are provided by the user and not defaulting to 0, causing
confusion.

This patch adds the required arguments lists for CCN events.
Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

7298a8bf

29 Nov, 2016 1 commit
- Linux 4.1.36 · 8576fa45
  Sasha Levin authored Nov 29, 2016
```
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
```
  8576fa45
26 Nov, 2016 19 commits

kbuild: add -fno-PIE · 39f99860

Sebastian Andrzej Siewior authored Nov 04, 2016

[ Upstream commit 8ae94224 ]

Debian started to build the gcc with -fPIE by default so the kernel
build ends before it starts properly with:
|kernel/bounds.c:1:0: error: code model kernel does not support PIC mode

Also add to KBUILD_AFLAGS due to:

|gcc -Wp,-MD,arch/x86/entry/vdso/vdso32/.note.o.d … -mfentry -DCC_USING_FENTRY … vdso/vdso32/note.S
|arch/x86/entry/vdso/vdso32/note.S:1:0: sorry, unimplemented: -mfentry isn’t supported for 32-bit in combination with -fpic

Tagging it stable so it is possible to compile recent stable kernels as
well.

Cc: stable@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Michal Marek <mmarek@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

39f99860

firewire: net: fix fragmented datagram_size off-by-one · bf5d3d29

Stefan Richter authored Oct 30, 2016

[ Upstream commit e9300a4b ]

RFC 2734 defines the datagram_size field in fragment encapsulation
headers thus:

    datagram_size:  The encoded size of the entire IP datagram.  The
    value of datagram_size [...] SHALL be one less than the value of
    Total Length in the datagram's IP header (see STD 5, RFC 791).

Accordingly, the eth1394 driver of Linux 2.6.36 and older set and got
this field with a -/+1 offset:

    ether1394_tx() /* transmit */
        ether1394_encapsulate_prep()
            hdr->ff.dg_size = dg_size - 1;

    ether1394_data_handler() /* receive */
        if (hdr->common.lf == ETH1394_HDR_LF_FF)
            dg_size = hdr->ff.dg_size + 1;
        else
            dg_size = hdr->sf.dg_size + 1;

Likewise, I observe OS X 10.4 and Windows XP Pro SP3 to transmit 1500
byte sized datagrams in fragments with datagram_size=1499 if link
fragmentation is required.

Only firewire-net sets and gets datagram_size without this offset.  The
result is lacking interoperability of firewire-net with OS X, Windows
XP, and presumably Linux' eth1394.  (I did not test with the latter.)
For example, FTP data transfers to a Linux firewire-net box with max_rec
smaller than the 1500 bytes MTU
  - from OS X fail entirely,
  - from Win XP start out with a bunch of fragmented datagrams which
    time out, then continue with unfragmented datagrams because Win XP
    temporarily reduces the MTU to 576 bytes.

So let's fix firewire-net's datagram_size accessors.

Note that firewire-net thereby loses interoperability with unpatched
firewire-net, but only if link fragmentation is employed.  (This happens
with large broadcast datagrams, and with large datagrams on several
FireWire CardBus cards with smaller max_rec than equivalent PCI cards,
and it can be worked around by setting a small enough MTU.)

Cc: stable@vger.kernel.org
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

bf5d3d29

firewire: net: guard against rx buffer overflows · c604dec3

Stefan Richter authored Oct 29, 2016

[ Upstream commit 667121ac ]

The IP-over-1394 driver firewire-net lacked input validation when
handling incoming fragmented datagrams.  A maliciously formed fragment
with a respectively large datagram_offset would cause a memcpy past the
datagram buffer.

So, drop any packets carrying a fragment with offset + length larger
than datagram_size.

In addition, ensure that
  - GASP header, unfragmented encapsulation header, or fragment
    encapsulation header actually exists before we access it,
  - the encapsulated datagram or fragment is of nonzero size.
Reported-by: Eyal Itkin <eyal.itkin@gmail.com>
Reviewed-by: Eyal Itkin <eyal.itkin@gmail.com>
Fixes: CVE 2016-8633
Cc: stable@vger.kernel.org
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

c604dec3

parisc: Ensure consistent state when switching to kernel stack at syscall entry · 9fe6256c

John David Anglin authored Oct 28, 2016

[ Upstream commit 6ed51832 ]

We have one critical section in the syscall entry path in which we switch from
the userspace stack to kernel stack. In the event of an external interrupt, the
interrupt code distinguishes between those two states by analyzing the value of
sr7. If sr7 is zero, it uses the kernel stack. Therefore it's important, that
the value of sr7 is in sync with the currently enabled stack.

This patch now disables interrupts while executing the critical section.  This
prevents the interrupt handler to possibly see an inconsistent state which in
the worst case can lead to crashes.

Interestingly, in the syscall exit path interrupts were already disabled in the
critical section which switches back to the userspace stack.

Cc: <stable@vger.kernel.org>
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

9fe6256c

ovl: fsync after copy-up · 83a474ed

Miklos Szeredi authored Oct 31, 2016

[ Upstream commit 641089c1 ]

Make sure the copied up file hits the disk before renaming to the final
destination.  If this is not done then the copy-up may corrupt the data in
the file in case of a crash.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

83a474ed

virtio: console: Unlock vqs while freeing buffers · c0b309f1

Matt Redfearn authored Oct 11, 2016

[ Upstream commit 34563769 ]

Commit c6017e79 ("virtio: console: add locks around buffer removal
in port unplug path") added locking around the freeing of buffers in the
vq. However, when free_buf() is called with can_sleep = true and rproc
is enabled, it calls dma_free_coherent() directly, requiring interrupts
to be enabled. Currently a WARNING is triggered due to the spin locking
around free_buf, with a call stack like this:

WARNING: CPU: 3 PID: 121 at ./include/linux/dma-mapping.h:433
free_buf+0x1a8/0x288
Call Trace:
[<8040c538>] show_stack+0x74/0xc0
[<80757240>] dump_stack+0xd0/0x110
[<80430d98>] __warn+0xfc/0x130
[<80430ee0>] warn_slowpath_null+0x2c/0x3c
[<807e7c6c>] free_buf+0x1a8/0x288
[<807ea590>] remove_port_data+0x50/0xac
[<807ea6a0>] unplug_port+0xb4/0x1bc
[<807ea858>] virtcons_remove+0xb0/0xfc
[<807b6734>] virtio_dev_remove+0x58/0xc0
[<807f918c>] __device_release_driver+0xac/0x134
[<807f924c>] device_release_driver+0x38/0x50
[<807f7edc>] bus_remove_device+0xfc/0x130
[<807f4b74>] device_del+0x17c/0x21c
[<807f4c38>] device_unregister+0x24/0x38
[<807b6b50>] unregister_virtio_device+0x28/0x44

Fix this by restructuring the loops to allow the locks to only be taken
where it is necessary to protect the vqs, and release it while the
buffer is being freed.

Fixes: c6017e79 ("virtio: console: add locks around buffer removal in port unplug path")
Cc: stable@vger.kernel.org
Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

c0b309f1

md: be careful not lot leak internal curr_resync value into metadata. -- (all) · 4fe9ae4d

NeilBrown authored Oct 28, 2016

[ Upstream commit 1217e1d1 ]

mddev->curr_resync usually records where the current resync is up to,
but during the starting phase it has some "magic" values.

 1 - means that the array is trying to start a resync, but has yielded
     to another array which shares physical devices, and also needs to
     start a resync
 2 - means the array is trying to start resync, but has found another
     array which shares physical devices and has already started resync.

 3 - means that resync has commensed, but it is possible that nothing
     has actually been resynced yet.

It is important that this value not be visible to user-space and
particularly that it doesn't get written to the metadata, as the
resync or recovery checkpoint.  In part, this is because it may be
slightly higher than the correct value, though this is very rare.
In part, because it is not a multiple of 4K, and some devices only
support 4K aligned accesses.

There are two places where this value is propagates into either
->curr_resync_completed or ->recovery_cp or ->recovery_offset.
These currently avoid the propagation of values 1 and 3, but will
allow 3 to leak through.

Change them to only propagate the value if it is > 3.

As this can cause an array to fail, the patch is suitable for -stable.

Cc: stable@vger.kernel.org (v3.7+)
Reported-by: Viswesh <viswesh.vichu@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

4fe9ae4d

md: sync sync_completed has correct value as recovery finishes. · e1e5cab9

NeilBrown authored Jul 24, 2015

[ Upstream commit 5ed1df2e ]

There can be a small window between the moment that recovery
actually writes the last block and the time when various sysfs
and /proc/mdstat attributes report that it has finished.
During this time, 'sync_completed' can have the wrong value.
This can confuse monitoring software.

So:
 - don't set curr_resync_completed beyond the end of the devices,
 - set it correctly when resync/recovery has completed.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

e1e5cab9

scsi: arcmsr: Send SYNCHRONIZE_CACHE command to firmware · 97d53c4d

Ching Huang authored Oct 19, 2016

[ Upstream commit 2bf7dc84 ]

The arcmsr driver failed to pass SYNCHRONIZE CACHE to controller
firmware. Depending on how drive caches are handled internally by
controller firmware this could potentially lead to data integrity
problems.

Ensure that cache flushes are passed to the controller.

[mkp: applied by hand and removed unused vars]

Cc: <stable@vger.kernel.org>
Signed-off-by: Ching Huang <ching2048@areca.com.tw>
Reported-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

97d53c4d

scsi: scsi_debug: Fix memory leak if LBP enabled and module is unloaded · d207c660

Ewan D. Milne authored Oct 26, 2016

[ Upstream commit 4d2b496f ]

map_storep was not being vfree()'d in the module_exit call.

Cc: <stable@vger.kernel.org>
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

d207c660

drm/radeon/si_dpm: workaround for SI kickers · 169eb57c

Alex Deucher authored Oct 14, 2016

[ Upstream commit 7dc86ef5 ]

Consolidate existing quirks. Fixes stability issues
on some kickers.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

169eb57c

drm/dp/mst: Check peer device type before attempting EDID read · c1593e5d

Ville Syrjälä authored Oct 26, 2016

[ Upstream commit 4da5caa6 ]

Only certain types of pdts have the DDC bus registered, so check for
that before we attempt the EDID read. Othwewise we risk playing around
with an i2c adapter that doesn't actually exist.

Cc: stable@vger.kernel.org
Cc: Carlos Santa <carlos.santa@intel.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Tested-by: Carlos Santa <carlos.santa@intel.com>
Tested-by: Kirill A. Shutemov <kirill@shutemov.name>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97666Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1477472755-15288-5-git-send-email-ville.syrjala@linux.intel.comSigned-off-by: Sasha Levin <alexander.levin@verizon.com>

c1593e5d

drm/dp/mst: add some defines for logical/physical ports · e5c6bbbc

Dave Airlie authored Oct 01, 2015

[ Upstream commit ccf03d69 ]

This just removes the magic number.
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

e5c6bbbc

drm/dp/mst: Clear port->pdt when tearing down the i2c adapter · dadd5803

Ville Syrjälä authored Oct 26, 2016

[ Upstream commit 36e3fa6a ]

The i2c adapter is only relevant for some peer device types, so
let's clear the pdt if it's still the same as the old_pdt when we
tear down the i2c adapter.

I don't really like this design pattern of updating port->whatever
before doing the accompanying changes and passing around old_whatever
to figure stuff out. Would make much more sense to me to the pass the
new value around and only update the port->whatever when things are
consistent. But let's try to work with what we have right now.

Quoting a follow-up from Ville:

"And naturally I forgot to amend the commit message w.r.t. this guy
[the change in drm_dp_destroy_port].  We don't really need to do this
here, but I figured I'd try to be a bit more consistent by having it,
just to avoid accidental mistakes if/when someone changes this stuff
again later."

v2: Clear port->pdt in the caller, if needed (Daniel)

Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: stable@vger.kernel.org
Cc: Carlos Santa <carlos.santa@intel.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Tested-by: Carlos Santa <carlos.santa@intel.com> (v1)
Tested-by: Kirill A. Shutemov <kirill@shutemov.name> (v1)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97666Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1477488633-16544-1-git-send-email-ville.syrjala@linux.intel.comSigned-off-by: Sasha Levin <alexander.levin@verizon.com>

dadd5803

KVM: MIPS: Precalculate MMIO load resume PC · a2d4bd9c

James Hogan authored Oct 25, 2016

[ Upstream commit e1e575f6 ]

The advancing of the PC when completing an MMIO load is done before
re-entering the guest, i.e. before restoring the guest ASID. However if
the load is in a branch delay slot it may need to access guest code to
read the prior branch instruction. This isn't safe in TLB mapped code at
the moment, nor in the future when we'll access unmapped guest segments
using direct user accessors too, as it could read the branch from host
user memory instead.

Therefore calculate the resume PC in advance while we're still in the
right context and save it in the new vcpu->arch.io_pc (replacing the no
longer needed vcpu->arch.pending_load_cause), and restore it on MMIO
completion.

Fixes: e685c689 ("KVM/MIPS32: Privileged instruction/target branch emulation.")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: <stable@vger.kernel.org> # 3.10.x-
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

a2d4bd9c

KVM: MIPS: Make ERET handle ERL before EXL · b05ff0cb

James Hogan authored Oct 25, 2016

[ Upstream commit ede5f3e7 ]

The ERET instruction to return from exception is used for returning from
exception level (Status.EXL) and error level (Status.ERL). If both bits
are set however we should be returning from ERL first, as ERL can
interrupt EXL, for example when an NMI is taken. KVM however checks EXL
first.

Fix the order of the checks to match the pseudocode in the instruction
set manual.

Fixes: e685c689 ("KVM/MIPS32: Privileged instruction/target branch emulation.")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: <stable@vger.kernel.org> # 3.10.x-
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

b05ff0cb

drm/radeon: drop register readback in cayman_cp_int_cntl_setup · 90a107c0

Lucas Stach authored Oct 24, 2016

[ Upstream commit 537b4b46 ]

The read is taking a considerable amount of time (about 50us on this
machine). The register does not ever hold anything other than the ring
ID that is updated in this exact function, so there is no need for
the read modify write cycle.

This chops off a big chunk of the time spent in hardirq disabled
context, as this function is called multiple times in the interrupt
handler. With this change applied radeon won't show up in the list
of the worst IRQ latency offenders anymore, where it was a regular
before.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

90a107c0

scsi: megaraid_sas: Fix data integrity failure for JBOD (passthrough) devices · 9a9a2373

Kashyap Desai authored Oct 21, 2016

[ Upstream commit 1e793f6f ]

Commit 02b01e01 ("megaraid_sas: return sync cache call with
success") modified the driver to successfully complete SYNCHRONIZE_CACHE
commands without passing them to the controller. Disk drive caches are
only explicitly managed by controller firmware when operating in RAID
mode. So this commit effectively disabled writeback cache flushing for
any drives used in JBOD mode, leading to data integrity failures.

[mkp: clarified patch description]

Fixes: 02b01e01
CC: stable@vger.kernel.org
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

9a9a2373

Revert "drm/radeon: fix DP link training issue with second 4K monitor" · 1b15bd73

Michel Dänzer authored Oct 24, 2016

[ Upstream commit 9dc79965 ]

This reverts commit 1a738347.

It caused at least some Kaveri laptops to incorrectly report DisplayPort
connectors as connected.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97857
Cc: stable@vger.kernel.org
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>

1b15bd73