1. 06 Jun, 2016 40 commits
    • Oleg Nesterov's avatar
      wait/ptrace: assume __WALL if the child is traced · 40623922
      Oleg Nesterov authored
      [ Upstream commit bf959931 ]
      
      The following program (a simplified version of one generated by syzkaller)
      
      	#include <pthread.h>
      	#include <unistd.h>
      	#include <sys/ptrace.h>
      	#include <stdio.h>
      	#include <signal.h>
      
      	void *thread_func(void *arg)
      	{
      		ptrace(PTRACE_TRACEME, 0,0,0);
      		return 0;
      	}
      
      	int main(void)
      	{
      		pthread_t thread;
      
      		if (fork())
      			return 0;
      
      		while (getppid() != 1)
      			;
      
      		pthread_create(&thread, NULL, thread_func, NULL);
      		pthread_join(thread, NULL);
      		return 0;
      	}
      
      creates an unreapable zombie if /sbin/init doesn't use __WALL.
      
      This is not a kernel bug, at least in the sense that everything works as
      expected: the debugger should reap a traced sub-thread before it can reap
      the leader, but without __WALL/__WCLONE do_wait() ignores sub-threads.
      
      Unfortunately, it seems that /sbin/init in most (all?) distributions
      doesn't use it, so we have to change the kernel to avoid the problem.
      Note also that most inits use sys_waitid(), which doesn't allow __WALL,
      so the necessary user-space fix is not that trivial.
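
      For reference, a reaping loop that does pick up traced clone children has
      to use waitpid()/wait4() with __WALL rather than waitid(); a minimal
      user-space sketch (illustrative only, not taken from any init
      implementation):

      	#define _GNU_SOURCE
      	#include <sys/types.h>
      	#include <sys/wait.h>
      	#include <unistd.h>
      	#include <stdio.h>

      	int main(void)
      	{
      		pid_t pid, reaped;
      		int status;

      		pid = fork();
      		if (pid == 0)
      			_exit(0);

      		/* __WALL is accepted by waitpid()/wait4() but not by waitid(),
      		 * and makes do_wait() consider clone/traced children too. */
      		reaped = waitpid(-1, &status, __WALL);
      		printf("reaped %d (child was %d)\n", (int)reaped, (int)pid);
      		return 0;
      	}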
      
      This patch just adds a "ptrace" check to eligible_child().  To some
      degree this matches the "tsk->ptrace" check in exit_notify(): ->exit_signal
      is mostly ignored when the tracee reports to the debugger.  Compare
      WSTOPPED, which the tracer likewise doesn't need to set to wait for a
      stopped tracee.
      
      This obviously means a user-visible change: __WCLONE and __WALL no
      longer have any meaning for a debugger.  I can only hope that this won't
      break anything, but at least strace/gdb won't suffer.
      
      We could make a more conservative change.  Say, we can take __WCLONE into
      account, or !thread_group_leader().  But it would be nice to not
      complicate these historical/confusing checks.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Roland McGrath <roland@hack.frob.com>
      Cc: <syzkaller@googlegroups.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      40623922
    • Tomáš Trnka's avatar
      sunrpc: fix stripping of padded MIC tokens · b36205bd
      Tomáš Trnka authored
      [ Upstream commit c0cb8bf3 ]
      
      The length of the GSS MIC token need not be a multiple of four bytes.
      It is then padded by XDR to a multiple of 4 B, but unwrap_integ_data()
      would previously only trim mic.len + 4 B. The remaining up to three
      bytes would then trigger a check in nfs4svc_decode_compoundargs(),
      leading to a "garbage args" error and mount failure:
      
      nfs4svc_decode_compoundargs: compound not properly padded!
      nfsd: failed to decode arguments!
      
      This would prevent older clients using the pre-RFC 4121 MIC format
      (37-byte MIC including a 9-byte OID) from mounting exports from v3.9+
      servers using krb5i.
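
      As an illustration of the arithmetic (using the kernel's XDR_QUADLEN()
      definition; the 37-byte figure is the pre-RFC 4121 MIC mentioned above):
      a 37-byte MIC is padded by XDR to 40 bytes, so the unwrap code must drop
      the 4-byte length word plus the padded length, not just mic.len + 4:

      	#include <stdio.h>

      	/* XDR pads opaque data to a multiple of four bytes. */
      	#define XDR_QUADLEN(l)	(((l) + 3) >> 2)

      	int main(void)
      	{
      		unsigned int mic_len = 37;	/* pre-RFC 4121 MIC with 9-byte OID */
      		unsigned int old_trim = mic_len + 4;			/* 41: leaves 3 stray pad bytes */
      		unsigned int new_trim = XDR_QUADLEN(mic_len) * 4 + 4;	/* 44: whole padded token */

      		printf("old trim %u, new trim %u\n", old_trim, new_trim);
      		return 0;
      	}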
      
      The trimming was introduced by commit 4c190e2f ("sunrpc: trim off
      trailing checksum before returning decrypted or integrity authenticated
      buffer").
      
      Fixes: 4c190e2f "unrpc: trim off trailing checksum..."
      Signed-off-by: default avatarTomáš Trnka <ttrnka@mail.muni.cz>
      Cc: stable@vger.kernel.org
      Acked-by: default avatarJeff Layton <jlayton@poochiereds.net>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      b36205bd
    • Adrian Hunter's avatar
      mmc: sdhci-acpi: Remove MMC_CAP_BUS_WIDTH_TEST for Intel controllers · 4433e375
      Adrian Hunter authored
      [ Upstream commit 265984b3 ]
      
      The CMD19/CMD14 bus width test has been found to be unreliable in
      some cases.  It is not essential, so simply remove it.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      4433e375
    • Adrian Hunter's avatar
      mmc: sdhci-acpi: Add two host capabilities for Intel · 15d0023e
      Adrian Hunter authored
      [ Upstream commit 9d65cb88 ]
      
      Intel host controllers are capable of doing the bus
      width test and of waiting while busy, so add the
      capability flags.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      15d0023e
    • Matt Gumbel's avatar
      mmc: longer timeout for long read time quirk · ee0fc86c
      Matt Gumbel authored
      [ Upstream commit 32ecd320 ]
      
      008GE0 Toshiba mmc in some Intel Baytrail tablets responds to
      MMC_SEND_EXT_CSD in 450-600ms.
      
      This patch will...
      
      () Increase the long read time quirk timeout from 300ms to 600ms. Original
         author of that quirk says 300ms was only a guess and that the number
         may need to be raised in the future.
      
      () Add this specific MMC to the quirk
      Signed-off-by: default avatarMatt Gumbel <matthew.k.gumbel@intel.com>
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      ee0fc86c
    • Ville Syrjälä's avatar
      drm/i915: Don't leave old junk in ilk active watermarks on readout · f4502e6e
      Ville Syrjälä authored
      [ Upstream commit 7045c368 ]
      
      When we read out the watermark state from the hardware we're supposed to
      transfer that into the active watermarks, but currently we fail to clear
      any part of the active watermarks that isn't explicitly written. Let's
      clear it all upfront.
      
      Looks like this has been like this since the beginning, when I added the
      readout. No idea why I didn't clear it up.
      
      Cc: Matt Roper <matthew.d.roper@intel.com>
      Fixes: 243e6a44 ("drm/i915: Init HSW watermark tracking in intel_modeset_setup_hw_state()")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Reviewed-by: default avatarMatt Roper <matthew.d.roper@intel.com>
      Signed-off-by: default avatarMatt Roper <matthew.d.roper@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1463151318-14719-2-git-send-email-ville.syrjala@linux.intel.com
      (cherry picked from commit 15606534)
      Signed-off-by: default avatarJani Nikula <jani.nikula@intel.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      f4502e6e
    • Rafael J. Wysocki's avatar
      PM / sleep: Handle failures in device_suspend_late() consistently · 732468f7
      Rafael J. Wysocki authored
      [ Upstream commit 3a17fb32 ]
      
      Grygorii Strashko reports:
      
       The PM runtime will be left disabled for the device if its
       .suspend_late() callback fails and async suspend is not allowed
       for this device. In this case the device will not be added to
       dpm_late_early_list and dpm_resume_early() will ignore this
       device, so as a result PM runtime will be disabled for it forever
       (side effect: after 8 subsequent failures for the same device
       the PM runtime will be re-enabled due to disable_depth overflow).
      
      To fix this problem, add devices to dpm_late_early_list regardless
      of whether or not device_suspend_late() returns errors for them.
      
      That ensures failures in there are handled consistently for
      all devices regardless of their async suspend/resume status.
      Reported-by: default avatarGrygorii Strashko <grygorii.strashko@ti.com>
      Tested-by: default avatarGrygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: All applicable <stable@vger.kernel.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      732468f7
    • Ricky Liang's avatar
      Input: uinput - handle compat ioctl for UI_SET_PHYS · 3cfc8e5f
      Ricky Liang authored
      [ Upstream commit affa80bd ]
      
      When running a 32-bit userspace on a 64-bit kernel, the UI_SET_PHYS
      ioctl needs to be treated with special care, as it has the pointer
      size encoded in the command.
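
      The argument size (here a pointer) is part of the _IOW() encoding, so the
      same source yields different UI_SET_PHYS command values for 32-bit and
      64-bit builds. A minimal user-space illustration of that (not the kernel
      fix itself):

      	#include <stdio.h>
      	#include <linux/uinput.h>

      	int main(void)
      	{
      		/* Compile as 32-bit and 64-bit to see the value change,
      		 * because sizeof(char *) is encoded in the command. */
      		printf("UI_SET_PHYS = %#lx (arg size %zu)\n",
      		       (unsigned long)UI_SET_PHYS, sizeof(char *));
      		return 0;
      	}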
      Signed-off-by: default avatarRicky Liang <jcliang@chromium.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      3cfc8e5f
    • Sachin Prabhu's avatar
      cifs: Create dedicated keyring for spnego operations · 4ccd5ccb
      Sachin Prabhu authored
      [ Upstream commit b74cb9a8 ]
      
      The session keyring is the default keyring used for request_key operations.
      This session keyring is revoked when the user owning the session logs out.
      Any long-running daemon processes started by this session end up with a
      revoked session keyring, which prevents these processes from using the
      request_key mechanism to obtain the krb5 keys.
      
      The problem has been reported by a large number of autofs users. The
      problem is also seen with multiuser mounts where the share may be used
      by processes run by a user who has since logged out. A reproducer using
      automount is available on the Red Hat bz.
      
      The patch creates a new keyring which is used to cache cifs spnego
      upcalls.
      
      Red Hat bz: 1267754
      Signed-off-by: default avatarSachin Prabhu <sprabhu@redhat.com>
      Reported-by: default avatarScott Mayhew <smayhew@redhat.com>
      Reviewed-by: default avatarShirish Pargaonkar <shirishpargaonkar@gmail.com>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      4ccd5ccb
    • Mark Brown's avatar
      ASoC: ak4642: Enable cache usage to fix crashes on resume · 50c86076
      Mark Brown authored
      [ Upstream commit d3030d11 ]
      
      The ak4642 driver is using a regmap cache sync to restore the
      configuration of the chip on resume but (as Peter observed) does not
      actually define a register cache which means that the resume is never
      going to work and we trigger asserts in regmap.  Fix this by enabling
      caching.
      Reported-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Reported-by: default avatarPeter Ujfalusi <peter.ujfalusi@ti.com>
      Tested-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      50c86076
    • Axel Lin's avatar
      ASoC: ak4642: Fix up max_register setting · 8f4c6107
      Axel Lin authored
      [ Upstream commit f8ea6ceb ]
      
      The max_register settings for ak4642, ak4643 and ak4648 are wrong; fix them.
      
      According to the datasheet:
              the maximum valid register for ak4642 is 0x1f
              the maximum valid register for ak4643 is 0x24
              the maximum valid register for ak4648 is 0x27
      
      The default settings for ak4642 and ak4643 are the same for the 0x0 ~ 0x1f
      registers, so it's fine to use the same reg_default table with different
      num_reg_defaults settings.
      Signed-off-by: default avatarAxel Lin <axel.lin@ingics.com>
      Tested-by: default avatarKuninori Morimoto <kuninori.morimoto.gx@renesas.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      8f4c6107
    • Dave Chinner's avatar
      xfs: skip stale inodes in xfs_iflush_cluster · 94f1ab98
      Dave Chinner authored
      [ Upstream commit 7d3aa7fe ]
      
      We don't write back stale inodes so we should skip them in
      xfs_iflush_cluster, too.
      
      cc: <stable@vger.kernel.org> # 3.10.x-
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      94f1ab98
    • Dave Chinner's avatar
      xfs: fix inode validity check in xfs_iflush_cluster · 7c43f418
      Dave Chinner authored
      [ Upstream commit 51b07f30 ]
      
      Some careless idiot(*) wrote crap code in commit 1a3e8f3d ("xfs:
      convert inode cache lookups to use RCU locking") back in late 2010,
      and so xfs_iflush_cluster checks the wrong inode for whether it is
      still valid under RCU protection. Fix it to lock and check the
      correct inode.
      
      (*) Careless-idiot: Dave Chinner <dchinner@redhat.com>
      
      cc: <stable@vger.kernel.org> # 3.10.x-
      Discovered-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      7c43f418
    • Dave Chinner's avatar
      xfs: xfs_iflush_cluster fails to abort on error · 2712bb6a
      Dave Chinner authored
      [ Upstream commit b1438f47 ]
      
      When a failure due to an inode buffer occurs, the error handling
      fails to abort the inode writeback correctly. This can result in the
      inode being reclaimed whilst still in the AIL, leading to
      use-after-free situations as well as filesystems that cannot be
      unmounted as the inode log items left in the AIL never get removed.
      
      Fix this by ensuring fatal errors from xfs_imap_to_bp() result in
      the inode flush being aborted correctly.
      
      cc: <stable@vger.kernel.org> # 3.10.x-
      Reported-by: default avatarShyam Kaushik <shyam@zadarastorage.com>
      Diagnosed-by: default avatarShyam Kaushik <shyam@zadarastorage.com>
      Tested-by: default avatarShyam Kaushik <shyam@zadarastorage.com>
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      2712bb6a
    • Daniel Lezcano's avatar
      cpuidle: Fix cpuidle_state_is_coupled() argument in cpuidle_enter() · 2506cfa4
      Daniel Lezcano authored
      [ Upstream commit e7387da5 ]
      
      Commit 0b89e9aa (cpuidle: delay enabling interrupts until all
      coupled CPUs leave idle) rightfully fixed a regression by letting
      the coupled idle state framework handle local interrupt enabling
      when the CPU is exiting an idle state.

      The current code checks if the idle state is coupled and, if so,
      lets the coupled code enable interrupts. This way, it can
      decrement the ready-count before handling the interrupt. This
      mechanism prevents the other CPUs from waiting for a CPU which is
      handling interrupts.
      
      But the check is done against the state index returned by the back
      end driver's ->enter functions which could be different from the
      initial index passed as parameter to the cpuidle_enter_state()
      function.
      
       entered_state = target_state->enter(dev, drv, index);
      
       [ ... ]
      
       if (!cpuidle_state_is_coupled(drv, entered_state))
      	local_irq_enable();
      
       [ ... ]
      
      If 'index' refers to a coupled idle state but 'entered_state' is
      *not* coupled, then the interrupts are enabled again. All CPUs
      blocked on the sync barrier may busy-loop longer if the CPU has
      interrupts to handle before decrementing the ready-count. That
      consumes more energy than it saves.
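
      A sketch of the corrected check, against the fragment quoted above
      (assuming the fix simply tests the requested 'index' instead of the
      returned 'entered_state'):

       entered_state = target_state->enter(dev, drv, index);

       [ ... ]

       /* check the state that was requested, not the one reported back */
       if (!cpuidle_state_is_coupled(drv, index))
      	local_irq_enable();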
      
      Fixes: 0b89e9aa (cpuidle: delay enabling interrupts until all coupled CPUs leave idle)
      Signed-off-by: default avatarDaniel Lezcano <daniel.lezcano@linaro.org>
      Cc: 3.15+ <stable@vger.kernel.org> # 3.15+
      [ rjw: Subject & changelog ]
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      2506cfa4
    • Steve French's avatar
      remove directory incorrectly tries to set delete on close on non-empty directories · ed49188a
      Steve French authored
      [ Upstream commit 897fba11 ]
      
      The wrong return code was being returned on SMB3 rmdir of a
      non-empty directory.

      For SMB3 (unlike for cifs), we attempt to delete a directory by
      setting the delete-on-close flag on the open. Windows clients set
      this flag via a set info call (SET_FILE_DISPOSITION), which
      properly checks whether the directory is empty.
      
      With this patch on smb3 mounts we correctly return
       "DIRECTORY NOT EMPTY"
      on attempts to remove a non-empty directory.
      Signed-off-by: default avatarSteve French <steve.french@primarydata.com>
      CC: Stable <stable@vger.kernel.org>
      Acked-by: default avatarSachin Prabhu <sprabhu@redhat.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      ed49188a
    • Stefan Metzmacher's avatar
      fs/cifs: correctly handle anonymous authentication for the NTLM(v2) authentication · 2acb5949
      Stefan Metzmacher authored
      [ Upstream commit 1a967d6c ]
      
      Only servers which map unknown users to guest will allow
      access using a non-null NTLMv2_Response.
      
      For Samba it's the "map to guest = bad user" option.
      
      BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913
      Signed-off-by: default avatarStefan Metzmacher <metze@samba.org>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      2acb5949
    • Stefan Metzmacher's avatar
      fs/cifs: correctly handle anonymous authentication for the NTLM(v1) authentication · 8da81561
      Stefan Metzmacher authored
      [ Upstream commit 777f69b8 ]
      
      Only servers which map unknown users to guest will allow
      access using a non-null NTChallengeResponse.
      
      For Samba it's the "map to guest = bad user" option.
      
      BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913
      Signed-off-by: default avatarStefan Metzmacher <metze@samba.org>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      8da81561
    • Stefan Metzmacher's avatar
      fs/cifs: correctly handle anonymous authentication for the LANMAN authentication · 345a226f
      Stefan Metzmacher authored
      [ Upstream commit fa8f3a35 ]
      
      Only servers which map unknown users to guest will allow
      access using a non-null LMChallengeResponse.
      
      For Samba it's the "map to guest = bad user" option.
      
      BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913
      Signed-off-by: default avatarStefan Metzmacher <metze@samba.org>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      345a226f
    • Stefan Metzmacher's avatar
      fs/cifs: correctly handle anonymous authentication via NTLMSSP · b9cdb796
      Stefan Metzmacher authored
      [ Upstream commit cfda35d9 ]
      
      See [MS-NLMP] 3.2.5.1.2 Server Receives an AUTHENTICATE_MESSAGE from the Client:
      
         ...
         Set NullSession to FALSE
         If (AUTHENTICATE_MESSAGE.UserNameLen == 0 AND
            AUTHENTICATE_MESSAGE.NtChallengeResponse.Length == 0 AND
            (AUTHENTICATE_MESSAGE.LmChallengeResponse == Z(1)
             OR
             AUTHENTICATE_MESSAGE.LmChallengeResponse.Length == 0))
             -- Special case: client requested anonymous authentication
             Set NullSession to TRUE
         ...
      
      Only servers which map unknown users to guest will allow
      access using a non-null NTChallengeResponse.
      
      For Samba it's the "map to guest = bad user" option.
      
      BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913
      
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: default avatarStefan Metzmacher <metze@samba.org>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      b9cdb796
    • Lyude's avatar
      drm/fb_helper: Fix references to dev->mode_config.num_connector · 31c6ce3a
      Lyude authored
      [ Upstream commit 255f0e7c ]
      
      During boot, MST hotplugs are generally expected (even if no physical
      hotplugging occurs) and result in DRM's connector topology changing.
      This means that using num_connector from the current mode configuration
      can lead to the number of connectors changing under us. This can result in
      some nasty scenarios in fbcon:
      
      - We allocate an array to the size of dev->mode_config.num_connectors.
      - MST hotplug occurs, dev->mode_config.num_connectors gets incremented.
      - We try to loop through each element in the array using the new value
        of dev->mode_config.num_connectors, and end up going out of bounds
        since dev->mode_config.num_connectors is now larger than the array we
        allocated.
      
      fb_helper->connector_count, however, will always remain consistent while
      we do a modeset in fb_helper.
      
      Note: This is just polish for 4.7, Dave Airlie's drm_connector
      refcounting fixed these bugs for real. But it's good enough duct-tape
      for stable kernel backporting, since backporting the refcounting
      changes is way too invasive.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarLyude <cpaul@redhat.com>
      [danvet: Clarify why we need this. Also remove the now unused "dev"
      local variable to appease gcc.]
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Link: http://patchwork.freedesktop.org/patch/msgid/1463065021-18280-3-git-send-email-cpaul@redhat.com
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      31c6ce3a
    • Maciej W. Rozycki's avatar
      MIPS: MSA: Fix a link error on `_init_msa_upper' with older GCC · 9a4687f4
      Maciej W. Rozycki authored
      [ Upstream commit e49d3848 ]
      
      Fix a build regression from commit c9017757 ("MIPS: init upper 64b
      of vector registers when MSA is first used"):
      
      arch/mips/built-in.o: In function `enable_restore_fp_context':
      traps.c:(.text+0xbb90): undefined reference to `_init_msa_upper'
      traps.c:(.text+0xbb90): relocation truncated to fit: R_MIPS_26 against `_init_msa_upper'
      traps.c:(.text+0xbef0): undefined reference to `_init_msa_upper'
      traps.c:(.text+0xbef0): relocation truncated to fit: R_MIPS_26 against `_init_msa_upper'
      
      to !CONFIG_CPU_HAS_MSA configurations with older GCC versions, which are
      unable to figure out that calls to `_init_msa_upper' are indeed dead.
      Of the many ways to tackle this failure, choose the approach we have
      already taken in `thread_msa_context_live'.
      
      [ralf@linux-mips.org: Drop patch segment to junk file.]
      Signed-off-by: default avatarMaciej W. Rozycki <macro@imgtec.com>
      Cc: stable@vger.kernel.org # v3.16+
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/13271/
      Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      9a4687f4
    • Prarit Bhargava's avatar
      PCI: Disable all BAR sizing for devices with non-compliant BARs · cf2e0092
      Prarit Bhargava authored
      [ Upstream commit ad67b437 ]
      
      b84106b4 ("PCI: Disable IO/MEM decoding for devices with non-compliant
      BARs") disabled BAR sizing for BARs 0-5 of devices that don't comply with
      the PCI spec.  But it didn't do anything for expansion ROM BARs, so we
      still try to size them, resulting in warnings like this on Broadwell-EP:
      
        pci 0000:ff:12.0: BAR 6: failed to assign [mem size 0x00000001 pref]
      
      Move the non-compliant BAR check from __pci_read_base() up to
      pci_read_bases() so it applies to the expansion ROM BAR as well as
      to BARs 0-5.
      
      Note that direct callers of __pci_read_base(), like sriov_init(), will now
      bypass this check.  We haven't had reports of devices with broken SR-IOV
      BARs yet.
      
      [bhelgaas: changelog]
      Fixes: b84106b4 ("PCI: Disable IO/MEM decoding for devices with non-compliant BARs")
      Signed-off-by: default avatarPrarit Bhargava <prarit@redhat.com>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      CC: stable@vger.kernel.org
      CC: Thomas Gleixner <tglx@linutronix.de>
      CC: Ingo Molnar <mingo@redhat.com>
      CC: "H. Peter Anvin" <hpa@zytor.com>
      CC: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      cf2e0092
    • Adrian Hunter's avatar
      mmc: mmc: Fix partition switch timeout for some eMMCs · ae96721f
      Adrian Hunter authored
      [ Upstream commit 1c447116 ]
      
      Some eMMCs set the partition switch timeout too low.
      
      Now typically eMMCs are considered a critical component (e.g. because
      they store the root file system) and consequently are expected to be
      reliable.  Thus we can neglect the use case where eMMCs can't switch
      reliably and we might want a lower timeout to facilitate speedy
      recovery.
      
      Although we could employ a quirk for the cards that are affected (if
      we could identify them all), as described above, there is little
      benefit to having a low timeout, so instead simply set a minimum
      timeout.
      
      The minimum is set to 300ms somewhat arbitrarily - the examples that
      have been seen had a timeout of 10ms but were sometimes taking 60-70ms.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      ae96721f
    • Steven Rostedt (Red Hat)'s avatar
      ring-buffer: Prevent overflow of size in ring_buffer_resize() · 180fbec3
      Steven Rostedt (Red Hat) authored
      [ Upstream commit 59643d15 ]
      
      If the size passed to ring_buffer_resize() is greater than ULONG_MAX - BUF_PAGE_SIZE
      then DIV_ROUND_UP() will return zero.
      
      Here's the details:
      
        # echo 18014398509481980 > /sys/kernel/debug/tracing/buffer_size_kb
      
      tracing_entries_write() processes this and converts kb to bytes.
      
       18014398509481980 << 10 = 18446744073709547520
      
      and this is passed to ring_buffer_resize() as unsigned long size.
      
       size = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
      
      Where DIV_ROUND_UP(a, b) is (a + b - 1)/b
      
      BUF_PAGE_SIZE is 4080 and here
      
       18446744073709547520 + 4080 - 1 = 18446744073709551599
      
      where 18446744073709551599 is still smaller than 2^64
      
       2^64 - 18446744073709551599 = 17
      
      But now 18446744073709551599 / 4080 = 4521260802379792
      
      and size = nr_pages * 4080 = 18446744073709551360
      
      This is checked to make sure its still greater than 2 * 4080,
      which it is.
      
      Then we convert to the number of buffer pages needed.
      
       nr_page = DIV_ROUND_UP(size, BUF_PAGE_SIZE)
      
      but this time size is 18446744073709551360 and
      
       2^64 - (18446744073709551360 + 4080 - 1) = -3823
      
      Thus it overflows and the resulting number is less than 4080, which makes
      
        3823 / 4080 = 0
      
      and nr_pages is set to this. As we already checked against the minimum that
      nr_pages may be, this causes the logic to fail as well, and we crash the
      kernel.
      
      There's no reason to have the two DIV_ROUND_UP() calls (that's just a result
      of historical code changes); clean up the code and fix this bug.
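
      On an LP64 system the two successive round-ups can be reproduced in user
      space; a standalone illustration of the arithmetic above, using the
      kernel's BUF_PAGE_SIZE of 4080:

      	#include <stdio.h>

      	#define BUF_PAGE_SIZE	4080UL
      	#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

      	int main(void)
      	{
      		/* assumes unsigned long is 64 bits (LP64) */
      		unsigned long size = 18014398509481980UL << 10;

      		unsigned long nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
      		size = nr_pages * BUF_PAGE_SIZE;	/* 18446744073709551360 */

      		/* size + 4079 wraps past 2^64, so the second round-up gives 0 */
      		nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
      		printf("nr_pages = %lu\n", nr_pages);	/* prints 0 */
      		return 0;
      	}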
      
      Cc: stable@vger.kernel.org # 3.5+
      Fixes: 83f40318 ("ring-buffer: Make removal of ring buffer pages atomic")
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      180fbec3
    • Steven Rostedt (Red Hat)'s avatar
      ring-buffer: Use long for nr_pages to avoid overflow failures · a2d04c9e
      Steven Rostedt (Red Hat) authored
      [ Upstream commit 9b94a8fb ]
      
      The size variable used to change the ring buffer in ftrace is a long. The
      nr_pages used to update the ring buffer based on the size is an int. On 64-bit
      machines this can cause an overflow problem.
      
      For example, the following will cause the ring buffer to crash:
      
       # cd /sys/kernel/debug/tracing
       # echo 10 > buffer_size_kb
       # echo 8556384240 > buffer_size_kb
      
      Then you get the warning of:
      
       WARNING: CPU: 1 PID: 318 at kernel/trace/ring_buffer.c:1527 rb_update_pages+0x22f/0x260
      
      Which is:
      
        RB_WARN_ON(cpu_buffer, nr_removed);
      
      Note each ring buffer page holds 4080 bytes.
      
      This is because:
      
       1) 10 causes the ring buffer to have 3 pages.
          (10kb requires 3 pages of 4080 bytes each to hold)
      
       2) (2^31 / 2^10  + 1) * 4080 = 8556384240
          The value written into buffer_size_kb is shifted by 10 and then passed
          to ring_buffer_resize(). 8556384240 * 2^10 = 8761737461760
      
       3) The size passed to ring_buffer_resize() is then divided by BUF_PAGE_SIZE
          which is 4080. 8761737461760 / 4080 = 2147484672
      
       4) The current nr_pages (3) is subtracted from the new nr_pages and we get:
          2147484669. This value is saved in a signed integer nr_pages_to_update

       5) 2147484669 is greater than 2^31 but smaller than 2^32, so as a signed
          int it turns into the value -2147482627
      
       6) As the value is a negative number, in update_pages_handler() it is
          negated and passed to rb_remove_pages() and 2147482627 pages will
          be removed, which is much larger than 3 and it causes the warning
          because not all the pages asked to be removed were removed.
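
      The int truncation in step 5 can be reproduced in user space (LP64
      assumed; BUF_PAGE_SIZE and the reproducer value are taken from above):

      	#include <stdio.h>

      	#define BUF_PAGE_SIZE	4080UL
      	#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

      	int main(void)
      	{
      		unsigned long size = 8556384240UL << 10;	/* buffer_size_kb in bytes */
      		unsigned long nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);	/* 2147484672 */

      		/* storing 2147484669 in an int wraps to -2147482627 on common systems */
      		int nr_pages_to_update = nr_pages - 3;
      		printf("nr_pages=%lu nr_pages_to_update=%d\n", nr_pages, nr_pages_to_update);
      		return 0;
      	}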
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=118001
      
      Cc: stable@vger.kernel.org # 2.6.28+
      Fixes: 7a8e76a3 ("tracing: unified trace buffer")
      Reported-by: default avatarHao Qin <QEver.cn@gmail.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      a2d04c9e
    • Steven Rostedt (Red Hat)'s avatar
      ring-buffer: Move recursive check to per_cpu descriptor · 654052ee
      Steven Rostedt (Red Hat) authored
      [ Upstream commit 58a09ec6 ]
      
      Instead of using a global per_cpu variable to perform the recursive
      checks into the ring buffer, use the already existing per_cpu descriptor
      that is part of the ring buffer itself.
      
      Not only does this simplify the code, it also allows one ring buffer
      to be used within the guts of the use of another ring buffer. For example,
      trace_printk() can now be used within the ring buffer code to record changes
      done by an instance into the main ring buffer. The recursion checks
      will prevent the trace_printk() itself from causing recursive issues
      with the main ring buffer (it is just ignored), but the recursive
      checks won't prevent the trace_printk() from recording into other ring buffers.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      654052ee
    • Steven Rostedt (Red Hat)'s avatar
      ring-buffer: Remove duplicate use of '&' in recursive code · 09024348
      Steven Rostedt (Red Hat) authored
      [ Upstream commit d631c8cc ]
      
      A clean up of the recursive protection code changed
      
        val = this_cpu_read(current_context);
        val--;
        val &= this_cpu_read(current_context);
      
      to
      
        val = this_cpu_read(current_context);
        val &= val & (val - 1);
      
      Which has a duplicate use of '&' as the above is the same as
      
        val = val & (val - 1);
      
      Actually, it would be best to remove that line altogether and
      just add it to where it is used.
      
      And Christoph even mentioned that it can be further compacted to
      just a single line:
      
        __this_cpu_and(current_context, __this_cpu_read(current_context) - 1);
      
      Link: http://lkml.kernel.org/alpine.DEB.2.11.1503271423580.23114@gentwo.org
      Suggested-by: default avatarChristoph Lameter <cl@linux.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      09024348
    • Steven Rostedt (Red Hat)'s avatar
      ring-buffer: Add unlikelys to make fast path the default · 4baf9733
      Steven Rostedt (Red Hat) authored
      [ Upstream commit 3205f806 ]
      
      I was running the trace_event benchmark and noticed that the times
      to record a trace_event were all over the place. I looked at the assembly
      of ring_buffer_lock_reserve() and saw this:
      
       <ring_buffer_lock_reserve>:
             31 c0                   xor    %eax,%eax
             48 83 3d 76 47 bd 00    cmpq   $0x1,0xbd4776(%rip)        # ffffffff81d10d60 <ring_buffer_flags>
             01
             55                      push   %rbp
             48 89 e5                mov    %rsp,%rbp
             75 1d                   jne    ffffffff8113c60d <ring_buffer_lock_reserve+0x2d>
             65 ff 05 69 e3 ec 7e    incl   %gs:0x7eece369(%rip)        # a960 <__preempt_count>
             8b 47 08                mov    0x8(%rdi),%eax
             85 c0                   test   %eax,%eax
       +---- 74 12                   je     ffffffff8113c610 <ring_buffer_lock_reserve+0x30>
       |     65 ff 0d 5b e3 ec 7e    decl   %gs:0x7eece35b(%rip)        # a960 <__preempt_count>
       |     0f 84 85 00 00 00       je     ffffffff8113c690 <ring_buffer_lock_reserve+0xb0>
       |     31 c0                   xor    %eax,%eax
       |     5d                      pop    %rbp
       |     c3                      retq
       |     90                      nop
       +---> 65 44 8b 05 48 e3 ec    mov    %gs:0x7eece348(%rip),%r8d        # a960 <__preempt_count>
             7e
             41 81 e0 ff ff ff 7f    and    $0x7fffffff,%r8d
             b0 08                   mov    $0x8,%al
             65 8b 0d 58 36 ed 7e    mov    %gs:0x7eed3658(%rip),%ecx        # fc80 <current_context>
             41 f7 c0 00 ff 1f 00    test   $0x1fff00,%r8d
             74 1e                   je     ffffffff8113c64f <ring_buffer_lock_reserve+0x6f>
             41 f7 c0 00 00 10 00    test   $0x100000,%r8d
             b0 01                   mov    $0x1,%al
             75 13                   jne    ffffffff8113c64f <ring_buffer_lock_reserve+0x6f>
             41 81 e0 00 00 0f 00    and    $0xf0000,%r8d
             49 83 f8 01             cmp    $0x1,%r8
             19 c0                   sbb    %eax,%eax
             83 e0 02                and    $0x2,%eax
             83 c0 02                add    $0x2,%eax
             85 c8                   test   %ecx,%eax
             75 ab                   jne    ffffffff8113c5fe <ring_buffer_lock_reserve+0x1e>
             09 c8                   or     %ecx,%eax
             65 89 05 24 36 ed 7e    mov    %eax,%gs:0x7eed3624(%rip)        # fc80 <current_context>
      
      The arrow is the fast path.
      
      After adding the unlikely's, the fast path looks a bit better:
      
       <ring_buffer_lock_reserve>:
             31 c0                   xor    %eax,%eax
             48 83 3d 76 47 bd 00    cmpq   $0x1,0xbd4776(%rip)        # ffffffff81d10d60 <ring_buffer_flags>
             01
             55                      push   %rbp
             48 89 e5                mov    %rsp,%rbp
             75 7b                   jne    ffffffff8113c66b <ring_buffer_lock_reserve+0x8b>
             65 ff 05 69 e3 ec 7e    incl   %gs:0x7eece369(%rip)        # a960 <__preempt_count>
             8b 47 08                mov    0x8(%rdi),%eax
             85 c0                   test   %eax,%eax
             0f 85 9f 00 00 00       jne    ffffffff8113c6a1 <ring_buffer_lock_reserve+0xc1>
             65 8b 0d 57 e3 ec 7e    mov    %gs:0x7eece357(%rip),%ecx        # a960 <__preempt_count>
             81 e1 ff ff ff 7f       and    $0x7fffffff,%ecx
             b0 08                   mov    $0x8,%al
             65 8b 15 68 36 ed 7e    mov    %gs:0x7eed3668(%rip),%edx        # fc80 <current_context>
             f7 c1 00 ff 1f 00       test   $0x1fff00,%ecx
             75 50                   jne    ffffffff8113c670 <ring_buffer_lock_reserve+0x90>
             85 d0                   test   %edx,%eax
             75 7d                   jne    ffffffff8113c6a1 <ring_buffer_lock_reserve+0xc1>
             09 d0                   or     %edx,%eax
             65 89 05 53 36 ed 7e    mov    %eax,%gs:0x7eed3653(%rip)        # fc80 <current_context>
             65 8b 05 fc da ec 7e    mov    %gs:0x7eecdafc(%rip),%eax        # a130 <cpu_number>
             89 c2                   mov    %eax,%edx
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      4baf9733
    • Paul Burton's avatar
      MIPS: math-emu: Fix jalr emulation when rd == $0 · de66b0f0
      Paul Burton authored
      [ Upstream commit ab4a92e6 ]
      
      When emulating a jalr instruction with rd == $0, the code in
      isBranchInstr was incorrectly writing to GPR $0 which should actually
      always remain zeroed. This would lead to any further instructions
      emulated which use $0 operating on a bogus value until the task is next
      context switched, at which point the value of $0 in the task context
      would be restored to the correct zero by a store in SAVE_SOME. Fix this
      by not writing to rd if it is $0.
      
      Fixes: 102cedc3 ("MIPS: microMIPS: Floating point support.")
      Signed-off-by: default avatarPaul Burton <paul.burton@imgtec.com>
      Cc: Maciej W. Rozycki <macro@imgtec.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Cc: linux-kernel@vger.kernel.org
      Cc: stable <stable@vger.kernel.org> # v3.10
      Patchwork: https://patchwork.linux-mips.org/patch/13160/
      Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      de66b0f0
    • Gavin Shan's avatar
      powerpc/eeh: Restore initial state in eeh_pe_reset_and_recover() · 9365bd6c
      Gavin Shan authored
      [ Upstream commit 5a0cdbfd ]
      
      The function eeh_pe_reset_and_recover() is used to recover EEH
      errors when a passthrough device is transferred to a guest and
      back. The content of the device's config space will be lost
      on the PE reset issued in the middle of the recovery. The function
      saves/restores it before/after the reset. However, config access
      to some adapters like the Broadcom BCM5719 at this point causes a
      fenced PHB. The config space is always blocked and we save 0xFF's
      that are restored at a later point. The memory BARs are totally
      corrupted, causing another EEH error upon access to one of the
      memory BARs.
      
      This restores the config space on adapters like the BCM5719
      from the content saved to the EEH device when it was populated,
      to resolve the above issue.
      
      Fixes: 5cfb20b9 ("powerpc/eeh: Emulate EEH recovery for VFIO devices")
      Cc: stable@vger.kernel.org #v3.18+
      Signed-off-by: default avatarGavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: default avatarRussell Currey <ruscur@russell.cc>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      9365bd6c
    • Gavin Shan's avatar
      powerpc/eeh: Don't report error in eeh_pe_reset_and_recover() · 606232ca
      Gavin Shan authored
      [ Upstream commit affeb0f2 ]
      
      The function eeh_pe_reset_and_recover() is used to recover EEH
      errors when a passthrough device is transferred to a guest and
      back, meaning the device's driver is vfio-pci or none.
      When the driver is vfio-pci, which provides only an error_detected()
      error handler, the handler simply stops the guest, and that is not
      the expected behaviour. On the other hand, no error handlers will
      be called if we don't have a bound driver.

      This skips the error handler call in eeh_pe_reset_and_recover()
      that reports the error to the device driver, to avoid the exceptional
      behaviour.
      
      Fixes: 5cfb20b9 ("powerpc/eeh: Emulate EEH recovery for VFIO devices")
      Cc: stable@vger.kernel.org #v3.18+
      Signed-off-by: default avatarGavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: default avatarRussell Currey <ruscur@russell.cc>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      606232ca
    • Sasha Levin's avatar
      sched/loadavg: Fix loadavg artifacts on fully idle and on fully loaded systems · 8d2ba3ff
      Sasha Levin authored
      [ Upstream commit 20878232 ]
      
      Systems show a minimal load average of 0.00, 0.01, 0.05 even when they
      have no load at all.
      
      Uptime and /proc/loadavg on all systems with kernels released during the
      last five years up until kernel version 4.6-rc5 show a 5- and 15-minute
      minimum loadavg of 0.01 and 0.05 respectively. This should be 0.00 on
      idle systems, but the way the kernel calculates this value prevents it
      from getting lower than the mentioned values.
      
      Likewise, but not as obviously noticeable, a fully loaded system with no
      processes waiting shows a maximum 1/5/15 loadavg of 1.00, 0.99, 0.95
      (multiplied by the number of cores).
      
      Once the (old) load becomes 93 or higher, it mathematically can never
      get lower than 93, even when the active (load) remains 0 forever.
      This results in the strange 0.00, 0.01, 0.05 uptime values on idle
      systems.  Note: 93/2048 = 0.0454..., which rounds up to 0.05.
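
      A user-space rendering of the old fixed-point update (with the
      unconditional +0.5 rounding added by 0f004f5a) shows that floor directly;
      EXP_15 = 2037 and FSHIFT = 11 are the kernel's 15-minute decay constant
      and fixed-point shift:

      	#include <stdio.h>

      	#define FSHIFT	11
      	#define FIXED_1	(1UL << FSHIFT)		/* 2048 */
      	#define EXP_15	2037			/* 15-minute decay factor */

      	/* old behaviour: unconditional +0.5 rounding on every update */
      	static unsigned long calc_load_old(unsigned long load, unsigned long active)
      	{
      		load *= EXP_15;
      		load += active * (FIXED_1 - EXP_15);
      		load += 1UL << (FSHIFT - 1);	/* +0.5 */
      		return load >> FSHIFT;
      	}

      	int main(void)
      	{
      		unsigned long load = FIXED_1;	/* start at 1.00 */
      		int i;

      		for (i = 0; i < 5000; i++)	/* fully idle from now on */
      			load = calc_load_old(load, 0);

      		/* settles at 93, i.e. 0.05, and never reaches 0.00 */
      		printf("idle steady state: %lu/2048 = %.2f\n", load, (double)load / FIXED_1);
      		return 0;
      	}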
      
      It is not correct to add a 0.5 rounding (=1024/2048) here, since the
      result from this function is fed back into the next iteration again,
      so the result of that +0.5 rounding value then gets multiplied by
      (2048-2037), and then rounded again, so there is a virtual "ghost"
      load created, next to the old and active load terms.
      
      By changing the way the internally kept value is rounded, that internal
      value can now reach 0.00 on idle and 1.00 on full load. Upon
      increasing load, the internally kept load value is rounded up; when the
      load is decreasing, the load value is rounded down.
      
      The modified code was tested on nohz=off and nohz kernels. It was tested
      on vanilla kernel 4.6-rc5 and on centos 7.1 kernel 3.10.0-327. It was
      tested on single-, dual-, and octa-core systems. It was tested on virtual
      hosts and bare hardware. No unwanted effects have been observed, and the
      problems that the patch intended to fix were indeed gone.
      Tested-by: default avatarDamien Wyart <damien.wyart@free.fr>
      Signed-off-by: default avatarVik Heyndrickx <vik.heyndrickx@veribox.net>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Doug Smythies <dsmythies@telus.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 0f004f5a ("sched: Cure more NO_HZ load average woes")
      Link: http://lkml.kernel.org/r/e8d32bff-d544-7748-72b5-3c86cc71f09f@veribox.net
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      8d2ba3ff
    • Sasha Levin's avatar
      rtlwifi: pci: use dev_kfree_skb_irq instead of kfree_skb in rtl_pci_reset_trx_ring · 23b229af
      Sasha Levin authored
      [ Upstream commit cf968937 ]
      
      We can't use kfree_skb() in irq-disabled context. Because spin_lock_irqsave()
      makes sure we are always in irq-disabled context here, using dev_kfree_skb_irq()
      instead of kfree_skb() is better than dev_kfree_skb_any().

      This patch fixes the kernel warning below:
      [ 7612.095528] ------------[ cut here ]------------
      [ 7612.095546] WARNING: CPU: 3 PID: 4460 at kernel/softirq.c:150 __local_bh_enable_ip+0x58/0x80()
      [ 7612.095550] Modules linked in: rtl8723be x86_pkg_temp_thermal btcoexist rtl_pci rtlwifi rtl8723_common
      [ 7612.095567] CPU: 3 PID: 4460 Comm: ifconfig Tainted: G        W       4.4.0+ #4
      [ 7612.095570] Hardware name: LENOVO 20DFA04FCD/20DFA04FCD, BIOS J5ET48WW (1.19 ) 08/27/2015
      [ 7612.095574]  00000000 00000000 da37fc70 c12ce7c5 00000000 da37fca0 c104cc59 c19d4454
      [ 7612.095584]  00000003 0000116c c19d4784 00000096 c10508a8 c10508a8 00000200 c1b42400
      [ 7612.095594]  f29be780 da37fcb0 c104ccad 00000009 00000000 da37fcbc c10508a8 f21f08b8
      [ 7612.095604] Call Trace:
      [ 7612.095614]  [<c12ce7c5>] dump_stack+0x41/0x5c
      [ 7612.095620]  [<c104cc59>] warn_slowpath_common+0x89/0xc0
      [ 7612.095628]  [<c10508a8>] ? __local_bh_enable_ip+0x58/0x80
      [ 7612.095634]  [<c10508a8>] ? __local_bh_enable_ip+0x58/0x80
      [ 7612.095640]  [<c104ccad>] warn_slowpath_null+0x1d/0x20
      [ 7612.095646]  [<c10508a8>] __local_bh_enable_ip+0x58/0x80
      [ 7612.095653]  [<c16b7d34>] destroy_conntrack+0x64/0xa0
      [ 7612.095660]  [<c16b300f>] nf_conntrack_destroy+0xf/0x20
      [ 7612.095665]  [<c1677565>] skb_release_head_state+0x55/0xa0
      [ 7612.095670]  [<c16775bb>] skb_release_all+0xb/0x20
      [ 7612.095674]  [<c167760b>] __kfree_skb+0xb/0x60
      [ 7612.095679]  [<c16776f0>] kfree_skb+0x30/0x70
      [ 7612.095686]  [<f81b869d>] ? rtl_pci_reset_trx_ring+0x22d/0x370 [rtl_pci]
      [ 7612.095692]  [<f81b869d>] rtl_pci_reset_trx_ring+0x22d/0x370 [rtl_pci]
      [ 7612.095698]  [<f81b87f9>] rtl_pci_start+0x19/0x190 [rtl_pci]
      [ 7612.095705]  [<f81970e6>] rtl_op_start+0x56/0x90 [rtlwifi]
      [ 7612.095712]  [<c17e3f16>] drv_start+0x36/0xc0
      [ 7612.095717]  [<c17f5ab3>] ieee80211_do_open+0x2d3/0x890
      [ 7612.095725]  [<c16820fe>] ? call_netdevice_notifiers_info+0x2e/0x60
      [ 7612.095730]  [<c17f60bd>] ieee80211_open+0x4d/0x50
      [ 7612.095736]  [<c16891b3>] __dev_open+0xa3/0x130
      [ 7612.095742]  [<c183fa53>] ? _raw_spin_unlock_bh+0x13/0x20
      [ 7612.095748]  [<c1689499>] __dev_change_flags+0x89/0x140
      [ 7612.095753]  [<c127c70d>] ? selinux_capable+0xd/0x10
      [ 7612.095759]  [<c1689589>] dev_change_flags+0x29/0x60
      [ 7612.095765]  [<c1700b93>] devinet_ioctl+0x553/0x670
      [ 7612.095772]  [<c12db758>] ? _copy_to_user+0x28/0x40
      [ 7612.095777]  [<c17018b5>] inet_ioctl+0x85/0xb0
      [ 7612.095783]  [<c166e647>] sock_ioctl+0x67/0x260
      [ 7612.095788]  [<c166e5e0>] ? sock_fasync+0x80/0x80
      [ 7612.095795]  [<c115c99b>] do_vfs_ioctl+0x6b/0x550
      [ 7612.095800]  [<c127c812>] ? selinux_file_ioctl+0x102/0x1e0
      [ 7612.095807]  [<c10a8914>] ? timekeeping_suspend+0x294/0x320
      [ 7612.095813]  [<c10a256a>] ? __hrtimer_run_queues+0x14a/0x210
      [ 7612.095820]  [<c1276e24>] ? security_file_ioctl+0x34/0x50
      [ 7612.095827]  [<c115cef0>] SyS_ioctl+0x70/0x80
      [ 7612.095832]  [<c1001804>] do_fast_syscall_32+0x84/0x120
      [ 7612.095839]  [<c183ff91>] sysenter_past_esp+0x36/0x55
      [ 7612.095844] ---[ end trace 97e9c637a20e8348 ]---
      Signed-off-by: default avatarWang YanQing <udknight@gmail.com>
      Cc: Stable <stable@vger.kernel.org>
      Acked-by: default avatarLarry Finger <Larry.Finger@lwfinger.net>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      23b229af
    • Sasha Levin's avatar
      rtlwifi: Fix logic error in enter/exit power-save mode · 0bae003f
      Sasha Levin authored
      [ Upstream commit 873ffe15 ]
      
      In commit a269913c ("rtlwifi: Rework rtl_lps_leave() and
      rtl_lps_enter() to use work queue"), the tests for enter/exit
      power-save mode were inverted. With this change applied, the
      wifi connection becomes much more stable.
      
      Fixes: a269913c ("rtlwifi: Rework rtl_lps_leave() and rtl_lps_enter() to use work queue")
      Signed-off-by: default avatarWang YanQing <udknight@gmail.com>
      CC: Stable <stable@vger.kernel.org> [3.10+]
      Acked-by: default avatarLarry Finger <Larry.Finger@lwfinger.net>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      0bae003f
    • Arnd Bergmann's avatar
      kbuild: move -Wunused-const-variable to W=1 warning level · 0a11bd16
      Arnd Bergmann authored
      [ Upstream commit c9c6837d ]
      
      gcc-6 started warning by default about variables that are not
      used anywhere and that are marked 'const', generating many
      false positives in an allmodconfig build, e.g.:
      
      arch/arm/mach-davinci/board-da830-evm.c:282:20: warning: 'da830_evm_emif25_pins' defined but not used [-Wunused-const-variable=]
      arch/arm/plat-omap/dmtimer.c:958:34: warning: 'omap_timer_match' defined but not used [-Wunused-const-variable=]
      drivers/bluetooth/hci_bcm.c:625:39: warning: 'acpi_bcm_default_gpios' defined but not used [-Wunused-const-variable=]
      drivers/char/hw_random/omap-rng.c:92:18: warning: 'reg_map_omap4' defined but not used [-Wunused-const-variable=]
      drivers/devfreq/exynos/exynos5_bus.c:381:32: warning: 'exynos5_busfreq_int_pm' defined but not used [-Wunused-const-variable=]
      drivers/dma/mv_xor.c:1139:34: warning: 'mv_xor_dt_ids' defined but not used [-Wunused-const-variable=]
      
      This is similar to the existing -Wunused-but-set-variable warning
      that was added in an earlier release and that we disable by default
      now and only enable when W=1 is set, so it makes sense to do
      the same here. Once we have eliminated the majority of the
      warnings for both, we can put them back into the default list.
      
      We probably want this in backport kernels as well, to allow building
      them with gcc-6 without introducing extra warnings.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarOlof Johansson <olof@lixom.net>
      Acked-by: default avatarLee Jones <lee.jones@linaro.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMichal Marek <mmarek@suse.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      0a11bd16
    • Will Deacon's avatar
      irqchip/gic: Ensure ordering between read of INTACK and shared data · 91c4ed35
      Will Deacon authored
      [ Upstream commit f86c4fbd ]
      
      When an IPI is generated by a CPU, the pattern looks roughly like:
      
        <write shared data>
        smp_wmb();
        <write to GIC to signal SGI>
      
      On the receiving CPU we rely on the fact that, once we've taken the
      interrupt, then the freshly written shared data must be visible to us.
      Put another way, the CPU isn't going to speculate taking an interrupt.
      
      Unfortunately, this assumption turns out to be broken.
      
      Consider that CPUx wants to send an IPI to CPUy, which will cause CPUy
      to read some shared_data. Before CPUx has done anything, a random
      peripheral raises an IRQ to the GIC and the IRQ line on CPUy is raised.
      CPUy then takes the IRQ and starts executing the entry code, heading
      towards gic_handle_irq. Furthermore, let's assume that a bunch of the
      previous interrupts handled by CPUy were SGIs, so the branch predictor
      kicks in and speculates that irqnr will be <16 and we're likely to
      head into handle_IPI. The prefetcher then grabs a speculative copy of
      shared_data which contains a stale value.
      
      Meanwhile, CPUx gets round to updating shared_data and asking the GIC
      to send an SGI to CPUy. Internally, the GIC decides that the SGI is
      more important than the peripheral interrupt (which hasn't yet been
      ACKed) but doesn't need to do anything to CPUy, because the IRQ line
      is already raised.
      
      CPUy then reads the ACK register on the GIC, sees the SGI value which
      confirms the branch prediction and we end up with a stale shared_data
      value.
      
      This patch fixes the problem by adding an smp_rmb() to the IPI entry
      code in gic_handle_irq. As it turns out, the combination of a control
      dependency and an ISB instruction from the EOI in the GICv3 driver is
      enough to provide the ordering we need, so we add a comment there
      justifying the absence of an explicit smp_rmb().
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      91c4ed35
    • Arnd Bergmann's avatar
      gcov: disable tree-loop-im to reduce stack usage · d3091460
      Arnd Bergmann authored
      [ Upstream commit c87bf431 ]
      
      Enabling CONFIG_GCOV_PROFILE_ALL produces a lot of warnings like
      
      lib/lz4/lz4hc_compress.c: In function 'lz4_compresshcctx':
      lib/lz4/lz4hc_compress.c:514:1: warning: the frame size of 1504 bytes is larger than 1024 bytes [-Wframe-larger-than=]
      
      After some investigation, I found that this behavior started with gcc-4.9,
      and opened https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69702.
      A suggested workaround for it is to use the -fno-tree-loop-im
      flag that turns off one of the optimization stages in gcc, so the
      code runs a little slower but does not use excessive amounts
      of stack.
      
      We could make this conditional on the gcc version, but I could not
      find an easy way to do this in Kbuild and the benefit would be
      fairly small, given that most of the gcc versions in production are
      affected now.
      
      I'm marking this for 'stable' backports because it addresses a bug
      with code generation in gcc that exists in all kernel versions
      with the affected gcc releases.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarPeter Oberparleiter <oberpar@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMichal Marek <mmarek@suse.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      d3091460
    • James Hogan's avatar
      MIPS: KVM: Fix timer IRQ race when writing CP0_Compare · f1df969d
      James Hogan authored
      [ Upstream commit b45bacd2 ]
      
      Writing CP0_Compare clears the timer interrupt pending bit
      (CP0_Cause.TI), but this wasn't being done atomically. If a timer
      interrupt raced with the write of the guest CP0_Compare, the timer
      interrupt could end up being pending even though the new CP0_Compare is
      nowhere near CP0_Count.
      
      We were already updating the hrtimer expiry with
      kvm_mips_update_hrtimer(), which used both kvm_mips_freeze_hrtimer() and
      kvm_mips_resume_hrtimer(). Close the race window by expanding out
      kvm_mips_update_hrtimer(), and clearing CP0_Cause.TI and setting
      CP0_Compare between the freeze and resume. Since the pending timer
      interrupt should not be cleared when CP0_Compare is written via the KVM
      user API, an ack argument is added to distinguish the source of the
      write.
      
      Fixes: e30492bb ("MIPS: KVM: Rewrite count/compare timer emulation")
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim KrčmáÅ" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Cc: <stable@vger.kernel.org> # 3.16.x-
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      f1df969d
    • James Hogan's avatar
      MIPS: KVM: Fix timer IRQ race when freezing timer · f255eae4
      James Hogan authored
      [ Upstream commit 4355c44f ]
      
      There's a particularly narrow and subtle race condition when the
      software emulated guest timer is frozen which can allow a guest timer
      interrupt to be missed.
      
      This happens due to the hrtimer expiry being inexact, so very
      occasionally the freeze time will be after the moment when the emulated
      CP0_Count transitions to the same value as CP0_Compare (so an IRQ should
      be generated), but before the moment when the hrtimer is due to expire
      (so no IRQ is generated). The IRQ won't be generated when the timer is
      resumed either, since the resume CP0_Count will already match CP0_Compare.
      
      With VZ guests in particular this is far more likely to happen, since
      the soft timer may be frozen frequently in order to restore the timer
      state to the hardware guest timer. This happens after 5-10 hours of
      guest soak testing, resulting in an overflow in guest kernel timekeeping
      calculations, hanging the guest. A more focussed test case to
      intentionally hit the race (with the help of a new hypcall to cause the
      timer state to migrated between hardware & software) hits the condition
      fairly reliably within around 30 seconds.
      
      Instead of relying purely on the inexact hrtimer expiry to determine
      whether an IRQ should be generated, read the guest CP0_Compare and
      directly check whether the freeze time is before or after it. Only if
      CP0_Count is on or after CP0_Compare do we check the hrtimer expiry to
      determine whether the last IRQ has already been generated (which will
      have pushed back the expiry by one timer period).
      
      Fixes: e30492bb ("MIPS: KVM: Rewrite count/compare timer emulation")
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim KrčmáÅ" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Cc: <stable@vger.kernel.org> # 3.16.x-
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      f255eae4