- 16 Jan, 2015 40 commits
-
-
Lorenzo Pieralisi authored
commit 714f5992 upstream.

CPU suspend is the standard kernel interface to be used to enter low-power states on ARM64 systems. The current cpu_suspend implementation by default assumes that all low power states lose the CPU context, so the CPU registers must be saved and cleaned to DRAM upon state entry. Furthermore, the current cpu_suspend() implementation assumes that if the CPU suspend back-end method returns when called, this has to be considered an error regardless of the return code (which can be successful), since the CPU was not expected to return from a code path that is different from the cpu_resume code path - eg returning from the reset vector.

All in all this means that the current API does not cope well with low-power states that preserve the CPU context when entered (ie retention states), since first of all the context is saved for nothing on state entry for those states, and a successful state entry can return as a normal function return, which is considered an error by the current CPU suspend implementation.

This patch refactors the cpu_suspend() API so that it can be split in two separate functionalities. The arm64 cpu_suspend API just provides a wrapper around the CPU operations suspend hook. A new function is introduced (for architecture code use only) for states that require context saving upon entry:

    __cpu_suspend(unsigned long arg, int (*fn)(unsigned long))

__cpu_suspend() saves the context on function entry and calls the so called suspend finisher (ie fn) to complete the suspend operation. The finisher is not expected to return, unless it fails, in which case the error is propagated back to the __cpu_suspend caller.

The API refactoring results in the following pseudo code call sequence for a suspending CPU, when triggered from a kernel subsystem:

    /*
     * int cpu_suspend(unsigned long idx)
     * @idx: idle state index
     */
    {
    -> cpu_suspend(idx)
        |---> CPU operations suspend hook called, if present
            |--> if (retention_state)
                |--> direct suspend back-end call (eg PSCI suspend)
                else
                |--> __cpu_suspend(idx, &back_end_finisher);
    }

By refactoring the cpu_suspend API this way, the CPU operations back-end has a chance to detect whether idle states require state saving or not and can call the required suspend operations accordingly, either through a simple function call or indirectly through __cpu_suspend(), which carries out state saving and suspend finisher dispatching to complete idle state entry.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Lorenzo Pieralisi authored
commit 18ab7db6 upstream. Suspend init function must be marked as __init, since it is not needed after the kernel has booted. This patch moves the cpu_suspend_init() function to the __init section. Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Rafael J. Wysocki authored
commit 1b1f3e16 upstream.

If an ACPI device object whose _STA returns 0 (not present and not functional) has _PR0 or _PS0, its power_manageable flag will be set and acpi_bus_init_power() will return 0 for it. Consequently, if such a device object is passed to the ACPI device PM functions, they will attempt to carry out the requested operation on the device, although they should not do that for devices that are not present.

To fix that problem, make acpi_bus_init_power() return an error code for devices that are not present, which will cause power_manageable to be cleared for them as appropriate in acpi_bus_get_power_flags(). However, the lists of power resources should not be freed for the device in that case, so modify acpi_bus_get_power_flags() to keep those lists even if acpi_bus_init_power() returns an error.

Accordingly, when deciding whether or not the lists of power resources need to be freed, acpi_free_power_resources_lists() should check the power.flags.power_resources flag instead of flags.power_manageable, so make that change too.

Furthermore, if acpi_bus_attach() sees that flags.initialized is unset for the given device, it should reset the power management settings of the device and re-initialize them from scratch instead of relying on the previous settings (the device may have appeared after being not present previously, for example), so make it use the 'valid' flag of the D0 power state as the initial value of flags.power_manageable for it and call acpi_bus_init_power() to discover its current power state.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Thomas Petazzoni authored
commit e5535545 upstream.

Enabling the hardware I/O coherency on Armada 370, Armada 375, Armada 38x and Armada XP requires a certain number of conditions:

 - On Armada 370, the cache policy must be set to write-allocate.

 - On Armada 375, 38x and XP, the cache policy must be set to write-allocate, the pages must be mapped with the shareable attribute, and the SMP bit must be set.

Currently, on Armada XP, when CONFIG_SMP is enabled, those conditions are met. However, when Armada XP is used in a !CONFIG_SMP kernel, none of these conditions are met. With Armada 370, the situation is worse: since the processor is single core, the cache policy will be set to write-back by the kernel and not write-allocate, regardless of whether CONFIG_SMP or !CONFIG_SMP is used.

Since solving this problem turns out to be quite complicated, and we don't want to leave users of mainline kernels exposed to infrequent but real data corruption, this commit simply disables hardware I/O coherency in situations where it is known not to work.

Basically, the kernel's is_smp() function tells us whether it is OK to enable hardware I/O coherency or not, so this commit slightly refactors the coherency_type() function to return COHERENCY_FABRIC_TYPE_NONE when is_smp() is false, or the appropriate type of coherency fabric otherwise.

Thanks to this, the I/O coherency fabric will no longer be used at all in !CONFIG_SMP configurations. It will continue to be used in CONFIG_SMP configurations on Armada XP, Armada 375 and Armada 38x (which are multi-core processors), but will no longer be used on Armada 370 (which is a single-core processor).

In the process, it simplifies the implementation of the coherency_type() function, and adds a missing call to of_node_put().

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Fixes: e60304f8 ("arm: mvebu: Add hardware I/O Coherency support")
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Link: https://lkml.kernel.org/r/1415871540-20302-3-git-send-email-thomas.petazzoni@free-electrons.com
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pavel Machek authored
commit 4bf9636c upstream.

Commit 9fc2105a ("ARM: 7830/1: delay: don't bother reporting bogomips in /proc/cpuinfo") breaks audio in python, and probably elsewhere, with the message:

    FATAL: cannot locate cpu MHz in /proc/cpuinfo

I'm not the first one to hit it, see for example:

    https://theredblacktree.wordpress.com/2014/08/10/fatal-cannot-locate-cpu-mhz-in-proccpuinfo/
    https://devtalk.nvidia.com/default/topic/765800/workaround-for-fatal-cannot-locate-cpu-mhz-in-proc-cpuinf/?offset=1

Reading the original changelog, I have to say "Stop breaking working setups. You know who you are!".

Signed-off-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nishanth Menon authored
commit 9008d83f upstream.

Commit 705814b5 ("ARM: OMAP4+: PM: Consolidate OMAP4 PM code to re-use it for OMAP5") moved logic generic to OMAP5+ into the init routine by introducing omap4_pm_init. However, the patch left the powerdomain initial setup, an unused omap4430 es1.0 check and a spurious "Power Management for TI OMAP4." log in the original code.

Remove the duplicate code from omap4_init_static_deps, since it is already present in omap4_pm_init. As part of this change, also move the u-boot version print out of the static-dependency function and into omap4_pm_init.

Fixes: 705814b5 ("ARM: OMAP4+: PM: Consolidate OMAP4 PM code to re-use it for OMAP5")
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Tomasz Figa authored
commit 5e794de5 upstream.

The PWM block is required for the system clock source, so it must always be enabled. This patch fixes boot issues on SMDK6410, which did not have the node enabled explicitly for other purposes.

Fixes: eeb93d02 ("clocksource: of: Respect device tree node status")
Signed-off-by: Tomasz Figa <tomasz.figa@gmail.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Lokesh Vutla authored
commit be668835 upstream.

The OMAP watchdog driver supports only the ti,omap3-wdt compatible. In the DRA7 dts, the wdt compatible property was mistakenly defined as ti,omap4-wdt instead of ti,omap3-wdt. Correct the typo.

Fixes: 6e58b8f1 ("ARM: dts: DRA7: Add the dts files for dra7 SoC and dra7-evm board")
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Luca Abeni authored
commit 269ad801 upstream.

The dl_runtime_exceeded() function is supposed to check if a SCHED_DEADLINE task must be throttled, by checking if its current runtime is <= 0. However, it also checks if the scheduling deadline has been missed (the current time is larger than the current scheduling deadline), further decreasing the runtime if this happens.

This "double accounting" is wrong:

 - In case of partitioned scheduling (or single CPU), this happens if task_tick_dl() has been called later than expected (due to small HZ values). In this case, the current runtime is also negative, and replenish_dl_entity() can take care of the deadline miss by recharging the current runtime to a value smaller than dl_runtime.

 - In case of global scheduling on multiple CPUs, scheduling deadlines can be missed even if the task did not consume more runtime than expected, hence penalizing the task is wrong.

This patch fixes the problem by throttling a SCHED_DEADLINE task only when its runtime becomes negative, and by not modifying the runtime.

Signed-off-by: Luca Abeni <luca.abeni@unitn.it>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@gmail.com>
Cc: Dario Faggioli <raistlin@linux.it>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1418813432-20797-3-git-send-email-luca.abeni@unitn.it
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
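A minimal standalone sketch of the throttling rule described above, with illustrative types and names (this is not the scheduler's actual code): the task is throttled purely on its remaining budget, and a missed deadline is left to the replenishment path instead of being charged against the runtime here.

    #include <stdbool.h>

    /* Illustrative stand-in for the scheduler's deadline entity. */
    struct dl_entity {
        long long runtime;    /* remaining budget, in ns; may go negative */
        long long deadline;   /* absolute scheduling deadline, in ns */
    };

    /* Throttle only when the budget is exhausted; do not also "pay" for a
     * missed deadline by decreasing the runtime further. */
    static bool dl_runtime_exceeded_sketch(const struct dl_entity *dl_se)
    {
        return dl_se->runtime <= 0;
    }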
-
Luca Abeni authored
commit 6a503c3b upstream.

According to global EDF, tasks should be migrated between runqueues without checking if their scheduling deadlines and runtimes are valid. However, SCHED_DEADLINE currently performs such a check: a migration happens doing:

    deactivate_task(rq, next_task, 0);
    set_task_cpu(next_task, later_rq->cpu);
    activate_task(later_rq, next_task, 0);

which ends up calling dequeue_task_dl(), setting the new CPU, and then calling enqueue_task_dl().

enqueue_task_dl() then calls enqueue_dl_entity(), which calls update_dl_entity(), which can modify scheduling deadline and runtime, breaking global EDF scheduling.

As a result, some of the properties of global EDF are not respected: for example, a taskset {(30, 80), (40, 80), (120, 170)} scheduled on two cores can have unbounded response times for the third task even if

    30/80 + 40/80 + 120/170 = 1.5809 < 2

This can be fixed by invoking update_dl_entity() only in case of wakeup, or if this is a new SCHED_DEADLINE task.

Signed-off-by: Luca Abeni <luca.abeni@unitn.it>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@gmail.com>
Cc: Dario Faggioli <raistlin@linux.it>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1418813432-20797-2-git-send-email-luca.abeni@unitn.it
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Johannes Berg authored
commit 7b990789 upstream.

The change from \d+ to .+ inside __aligned() means that the following structure:

    struct test {
        u8 a __aligned(2);
        u8 b __aligned(2);
    };

essentially gets modified to

    struct test {
        u8 a;
    };

for purposes of kernel-doc, thus dropping a struct member, which in turn causes warnings and invalid kernel-doc generation.

Fix this by replacing the catch-all (".") with anything that's not a semicolon ("[^;]").

Fixes: 9dc30918 ("scripts/kernel-doc: handle struct member __aligned without numbers")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Cc: Nishanth Menon <nm@ti.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ryusuke Konishi authored
commit 705304a8 upstream.

Same story as in commit 41080b5a ("nfsd race fixes: ext2") (a similar ext2 fix), except that nilfs2 needs to use insert_inode_locked4() instead of insert_inode_locked(), and a bug in a check for dead inodes needs to be fixed.

If nilfs_iget() is called from nfsd after nilfs_new_inode() calls insert_inode_locked4(), nilfs_iget() will wait for unlock_new_inode() at the end of nilfs_mkdir()/nilfs_create()/etc to unlock the inode.

If nilfs_iget() is called before nilfs_new_inode() calls insert_inode_locked4(), it will create an in-core inode and read its data from the on-disk inode. But nilfs_iget() will find i_nlink equals zero and fail at nilfs_read_inode_common(), which will lead it to call iget_failed() and cleanly fail.

However, this sanity check doesn't work as expected for reused on-disk inodes, because they leave a non-zero value in the i_mode field which hinders the test of i_nlink. This patch also fixes that issue by removing the test on i_mode, which nilfs2 doesn't need.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Brian Norris authored
commit 68f29815 upstream.

The torture test should quit once it actually induces an error in the flash. This step was accidentally removed during refactoring. Without this fix, the torturetest just continues infinitely, or until the maximum cycle count is reached, e.g.:

    ...
    [ 7619.218171] mtd_test: error -5 while erasing EB 100
    [ 7619.297981] mtd_test: error -5 while erasing EB 100
    [ 7619.377953] mtd_test: error -5 while erasing EB 100
    [ 7619.457998] mtd_test: error -5 while erasing EB 100
    [ 7619.537990] mtd_test: error -5 while erasing EB 100
    ...

Fixes: 6cf78358 ("mtd: mtd_torturetest: use mtd_test helpers")
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Yan, Zheng authored
commit 00bd8edb upstream.

send_mds_reconnect() may call discard_cap_releases() after all release messages have been dropped by cleanup_cap_releases().

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Cc: Markus Blank-Burian <burian@muenster.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dan Carpenter authored
commit 021b77be upstream.

Probably this code was syncing a lot more often than intended because the do_sync variable wasn't set to zero.

Fixes: c62988ec ('ceph: avoid meaningless calling ceph_caps_revoking if sync_mode == WB_SYNC_ALL.')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Benjamin Coddington authored
commit 5a64e569 upstream. Fix a bug where nfsd4_encode_components_esc() includes the esc_end char as an additional string encoding. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Fixes: e7a0444a "nfsd: add IPv6 addr escaping to fs_location hosts" Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Rasmus Villemoes authored
commit ef17af2a upstream.

Bugs similar to the one in acbbe6fb ("kcmp: fix standard comparison bug") are in rich supply.

In this variant, the problem is that struct xdr_netobj::len has type unsigned int, so the expression o1->len - o2->len _also_ has type unsigned int; it has completely well-defined semantics, and the result is some non-negative integer, which is always representable in a long long. But this means that if the conditional triggers, we are guaranteed to return a positive value from compare_blob.

In this case it could be fixed by

    -	res = o1->len - o2->len;
    +	res = (long long)o1->len - (long long)o2->len;

but I'd rather eliminate the usually broken 'return a - b;' idiom.

Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
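A standalone illustration of why the 'return a - b;' idiom is broken for unsigned lengths (illustrative code, not the nfsd source):

    #include <stdio.h>

    struct blob { unsigned int len; };

    /* Broken: the subtraction is done in unsigned int, so it can never be
     * negative; a shorter first blob still compares as "greater". */
    static long long cmp_len_broken(const struct blob *a, const struct blob *b)
    {
        return a->len - b->len;          /* unsigned result, then widened */
    }

    /* Fixed: compare explicitly instead of subtracting. */
    static int cmp_len_fixed(const struct blob *a, const struct blob *b)
    {
        if (a->len != b->len)
            return a->len < b->len ? -1 : 1;
        return 0;
    }

    int main(void)
    {
        struct blob a = { .len = 1 }, b = { .len = 2 };

        /* Prints "broken: 4294967295, fixed: -1". */
        printf("broken: %lld, fixed: %d\n",
               cmp_len_broken(&a, &b), cmp_len_fixed(&a, &b));
        return 0;
    }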
-
Vitaly Kuznetsov authored
commit 04a258c1 upstream.

When built with Debug, the following crash is sometimes observed:

    Call Trace:
     [<ffffffff812b9600>] string+0x40/0x100
     [<ffffffff812bb038>] vsnprintf+0x218/0x5e0
     [<ffffffff810baf7d>] ? trace_hardirqs_off+0xd/0x10
     [<ffffffff812bb4c1>] vscnprintf+0x11/0x30
     [<ffffffff8107a2f0>] vprintk+0xd0/0x5c0
     [<ffffffffa0051ea0>] ? vmbus_process_rescind_offer+0x0/0x110 [hv_vmbus]
     [<ffffffff8155c71c>] printk+0x41/0x45
     [<ffffffffa004ebac>] vmbus_device_unregister+0x2c/0x40 [hv_vmbus]
     [<ffffffffa0051ecb>] vmbus_process_rescind_offer+0x2b/0x110 [hv_vmbus]
     ...

This happens due to the following race: between the 'if (channel->device_obj)' check in vmbus_process_rescind_offer() and the pr_debug() in vmbus_device_unregister(), the device can disappear. Fix the issue by taking an additional reference to the device before proceeding to vmbus_device_unregister().

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
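A hedged sketch of the "take an additional reference" pattern described above: get_device()/put_device() are the standard driver-core helpers, while the unregister helper name and the device pointer here are placeholders, not the actual vmbus structures.

    /* 'device_obj' is an illustrative struct device pointer; the real code
     * works on the channel's device object. */
    struct device *dev = get_device(device_obj);   /* returns its argument, or NULL */
    if (dev) {
        unregister_rescinded_device(dev);          /* placeholder for the unregister step */
        put_device(dev);                           /* drop the extra reference afterwards */
    }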
-
Christian Riesch authored
commit 8bfbe2de upstream.

Commit 19e2ad6a ("n_tty: Remove overflow tests from receive_buf() path") moved the increment of read_head into the argument list of read_buf_addr(). A function call is a sequence point in C, so read_head is incremented before the character c is placed in the buffer. Because the circular read buffer has been a lock-less design since commit 6d76bd26 ("n_tty: Make N_TTY ldisc receive path lockless"), this creates a race condition that leads to communication errors.

This patch modifies the code to increment read_head _after_ the data is placed in the buffer and thus fixes the race for non-SMP machines. To fix the problem for SMP machines, memory barriers must be added in a separate patch.

Signed-off-by: Christian Riesch <christian.riesch@omicron.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
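A self-contained illustration of the ordering problem and the fix, assuming a simple power-of-two ring buffer in the spirit of the n_tty read buffer (this is not the driver code itself):

    #include <stddef.h>

    #define BUF_SIZE 4096                 /* power of two, like N_TTY_BUF_SIZE */

    struct ring {
        unsigned char buf[BUF_SIZE];
        size_t read_head;                 /* observed by a lock-less reader */
    };

    static unsigned char *buf_addr(struct ring *r, size_t i)
    {
        return &r->buf[i & (BUF_SIZE - 1)];
    }

    /* Racy form: the post-increment takes effect before the call, so the new
     * head is published before the byte is stored through the returned
     * pointer; a lock-less reader can consume a slot that is not filled yet. */
    static void put_char_racy(struct ring *r, unsigned char c)
    {
        *buf_addr(r, r->read_head++) = c;
    }

    /* Fixed form (sufficient on non-SMP): store the byte first, then advance
     * the head; SMP additionally needs memory barriers, as the changelog notes. */
    static void put_char_fixed(struct ring *r, unsigned char c)
    {
        *buf_addr(r, r->read_head) = c;
        r->read_head++;
    }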
-
Robert Baldyga authored
commit 1ff383a4 upstream.

This patch adds waiting until the transmit buffer and shifter are empty before disabling the clock. Without this fix it is possible for the clock to be disabled while data has not been transmitted yet, which leaves the TX line in an improper state and causes problems in subsequent data transfers.

Signed-off-by: Robert Baldyga <r.baldyga@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Steven Rostedt (Red Hat) authored
commit aee4e5f3 upstream.

When recording the state of a task for the sched_switch tracepoint, a check of task_preempt_count() is performed to see if PREEMPT_ACTIVE is set. This is because, technically, a task being preempted is really in the TASK_RUNNING state, and that is what should be recorded when tracing a sched_switch, even if the task put itself into another state (it hasn't scheduled out in that state yet).

But with the change to use per_cpu preempt counts, task_thread_info(p)->preempt_count is no longer used, and instead task_preempt_count(p) is used.

The problem is that this does not use the current preempt count but a stale one from a previous sched_switch. task_preempt_count(p) uses saved_preempt_count and not preempt_count(). But for tracing sched_switch, if p is current, we really want preempt_count().

I hit this bug when I was tracing sleep and the call from do_nanosleep() scheduled out in the "RUNNING" state:

    sleep-4290  [000] 537272.259992: sched_switch:  sleep:4290 [120] R ==> swapper/0:0 [120]
    sleep-4290  [000] 537272.260015: kernel_stack:  <stack trace>
    => __schedule (ffffffff8150864a)
    => schedule (ffffffff815089f8)
    => do_nanosleep (ffffffff8150b76c)
    => hrtimer_nanosleep (ffffffff8108d66b)
    => SyS_nanosleep (ffffffff8108d750)
    => return_to_handler (ffffffff8150e8e5)
    => tracesys_phase2 (ffffffff8150c844)

After a bit of hair pulling, I found that the state was really TASK_INTERRUPTIBLE, but the saved_preempt_count had an old PREEMPT_ACTIVE set and caused the sched_switch tracepoint to show it as RUNNING.

Link: http://lkml.kernel.org/r/20141210174428.3cb7542a@gandalf.local.home
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 01028747 ("sched: Create more preempt_count accessors")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Tejun Heo authored
commit 9c6ac78e upstream.

After invoking ->dirty_inode(), __mark_inode_dirty() does smp_mb() and tests inode->i_state locklessly to see whether it already has all the necessary I_DIRTY bits set. The comment above the barrier doesn't contain any useful information - memory barriers can't ensure "changes are seen by all cpus" by themselves. And sure enough it was broken. Please consider the following scenario:

    CPU 0                                      CPU 1
    -------------------------------------------------------------------------------
    enters __writeback_single_inode()
    grabs inode->i_lock
    tests PAGECACHE_TAG_DIRTY which is clear
                                               enters __set_page_dirty()
                                               grabs mapping->tree_lock
                                               sets PAGECACHE_TAG_DIRTY
                                               releases mapping->tree_lock
                                               leaves __set_page_dirty()
                                               enters __mark_inode_dirty()
                                               smp_mb()
                                               sees I_DIRTY_PAGES set
                                               leaves __mark_inode_dirty()
    clears I_DIRTY_PAGES
    releases inode->i_lock

Now @inode has dirty pages w/ I_DIRTY_PAGES clear. This doesn't seem to lead to an immediately critical problem because requeue_inode() later checks PAGECACHE_TAG_DIRTY instead of I_DIRTY_PAGES when deciding whether the inode needs to be requeued for IO, and there are enough unintentional memory barriers in between, so while the inode ends up with an inconsistent I_DIRTY_PAGES flag, it doesn't fall off the IO list.

The lack of an explicit barrier may also theoretically affect the other I_DIRTY bits which deal with metadata dirtiness. There is no guarantee that a strong enough barrier exists between I_DIRTY_[DATA]SYNC clearing and write_inode() writing out the dirtied inode. The filesystem inode writeout path likely has enough stuff which can behave as a full barrier, but it's theoretically possible that the writeout may not see all the updates from ->dirty_inode().

Fix it by adding an explicit smp_mb() after I_DIRTY clearing. Note that I_DIRTY_PAGES needs special treatment, as it always needs to be cleared to be interlocked with the lockless test on the __mark_inode_dirty() side. It's cleared unconditionally and reinstated after smp_mb() if the mapping still has dirty pages.

Also add comments explaining how and why the barriers are paired.

Lightly tested.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
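A self-contained sketch of the barrier pairing described above, using C11 fences in place of the kernel's smp_mb(); the two flags stand in for I_DIRTY_PAGES and PAGECACHE_TAG_DIRTY, and the code is an illustration of the pattern, not the fs-writeback implementation.

    #include <stdatomic.h>
    #include <stdbool.h>

    static atomic_bool i_dirty_pages;   /* "inode has dirty pages" flag */
    static atomic_bool tag_dirty;       /* "mapping is tagged dirty" state */

    /* Writeback side: clear the flag, full barrier, then re-check the tag and
     * reinstate the flag if pages were dirtied concurrently. */
    static void writeback_clear(void)
    {
        atomic_store_explicit(&i_dirty_pages, false, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);      /* pairs with the fence below */
        if (atomic_load_explicit(&tag_dirty, memory_order_relaxed))
            atomic_store_explicit(&i_dirty_pages, true, memory_order_relaxed);
    }

    /* Dirtying side: tag first, full barrier, then skip the slow path only if
     * the flag is still observed as set. With both fences in place, at least
     * one side must see the other's store, so the flag cannot stay clear while
     * dirty pages remain. */
    static void mark_dirty_pages(void)
    {
        atomic_store_explicit(&tag_dirty, true, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);      /* pairs with the fence above */
        if (atomic_load_explicit(&i_dirty_pages, memory_order_relaxed))
            return;                                     /* already flagged */
        /* otherwise: take the lock and set the flag (omitted) */
    }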
-
Oliver Neukum authored
commit d908f847 upstream.

If probe() fails, not only do the attributes need to be removed, but the memory must also be freed.

Reported-by: Ahmed Tamrawi <ahmedtamrawi@gmail.com>
Signed-off-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jens Axboe authored
commit 5fabcb4c upstream.

We can get here from blkdev_ioctl() -> blkpg_ioctl() -> add_partition() with a user-passed-in partno value. If we pass in 0x7fffffff, the new target in disk_expand_part_tbl() overflows the 'int' and we access beyond the end of ptbl->part[] and even write to it when we do the rcu_assign_pointer() to assign the new partition.

Reported-by: David Ramos <daramos@stanford.edu>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
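A standalone demonstration of the failure mode and the kind of range check described above (illustrative bound; not the block layer's code):

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        int partno = INT_MAX;              /* user-controlled, e.g. 0x7fffffff from the ioctl */

        /* Without a check like this, computing 'partno + 1' as the new table
         * size overflows 'int' (undefined behaviour, in practice wrapping to
         * INT_MIN) and later indexing walks far outside the partition table. */
        if (partno < 0 || partno >= 256) { /* 256: illustrative stand-in for the disk's limit */
            fprintf(stderr, "invalid partition number %d\n", partno);
            return 1;
        }

        printf("new table size would be %d\n", partno + 1);
        return 0;
    }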
-
Steev Klimaszewski authored
commit 007487f1 upstream.

Currently we enable Exynos devices in the multi v7 defconfig; however, when testing on my ODROID-U3, I noticed that USB was not working. Enabling this option causes USB to work, which enables networking support as well, since the ODROID-U3 has networking on the USB bus.

[arnd] Support for odroid-u3 was added in 3.10, so it would be nice to backport this fix at least that far.

Signed-off-by: Steev Klimaszewski <steev@gentoo.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Greg Kroah-Hartman authored
commit 403dff4e upstream.

We need to check that we have both a valid data and control interface for both types of headers (union and not union).

References: https://bugzilla.kernel.org/show_bug.cgi?id=83551
Reported-by: Simon Schubert <2+kernel@0x2c.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Takashi Iwai authored
commit c507de88 upstream.

stac_store_hints() gets the masking of the gpio_dir and gpio_data values utterly wrong, likely due to copy&paste errors. Fortunately, this feature is used very rarely, so the impact must be really small.

Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dan Carpenter authored
commit 69eba10e upstream. In olden times the snd_hda_param_read() function always set "*start_id" but in 2007 we introduced a new return and it causes uninitialized data bugs in a couple of the callers: print_codec_info() and hdmi_parse_codec(). Fixes: e8a7f136 ('[ALSA] hda-intel - Improve HD-audio codec probing robustness') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jiri Jaburek authored
commit d70a1b98 upstream.

The Arcam rPAC seems to have the same problem - whenever anything (alsamixer, udevd, a 3.9+ kernel from 60af3d03, ..) attempts to access the mixer / control interface of the card, the firmware "locks up" the entire device, resulting in

    SNDRV_PCM_IOCTL_HW_PARAMS failed (-5): Input/output error

from alsa-lib. Other operating systems can somehow read the mixer (there seems to be playback volume/mute), but any manipulation is ignored by the device (which has hardware volume controls).

Signed-off-by: Jiri Jaburek <jjaburek@redhat.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ian Abbott authored
commit cf35d6e0 upstream. `genwqe_user_vmap()` calls `get_user_pages_fast()` and if the return value is less than the number of pages requested, it frees the pages and returns an error (`-EFAULT`). However, it fails to consider a negative error return value from `get_user_pages_fast()`. In that case, the test `if (rc < m->nr_pages)` will be false (due to promotion of `rc` to a large `unsigned int`) and the code will continue on to call `genwqe_map_pages()` with an invalid list of page pointers. Fix it by bailing out if `get_user_pages_fast()` returns a negative error value. Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
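A standalone demonstration of the signed/unsigned promotion trap described above (illustrative values, not the genwqe source):

    #include <stdio.h>

    int main(void)
    {
        int rc = -14;                 /* e.g. -EFAULT from get_user_pages_fast() */
        unsigned int nr_pages = 32;

        /* 'rc' is converted to unsigned int for the comparison, so -14 becomes
         * a huge value and the "too few pages pinned" test never fires for a
         * negative error return. */
        if (rc < nr_pages)
            printf("short pin detected\n");
        else
            printf("negative error slips through as 'enough pages'\n");

        /* The fix: handle a negative return explicitly, before the count check. */
        if (rc < 0)
            printf("bail out with the error code first\n");
        return 0;
    }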
-
Alex Williamson authored
commit bb34cb6b upstream.

bus_find_device_by_name() acquires a device reference which is never released. This results in an object leak, which on older kernels results in failure to release all resources of PCI devices. libvirt uses drivers_probe to re-attach devices to the host after assignment and is therefore a common trigger for this leak.

Example:

    # cd /sys/bus/pci/
    # dmesg -C
    # echo 1 > devices/0000\:01\:00.0/sriov_numvfs
    # echo 0 > devices/0000\:01\:00.0/sriov_numvfs
    # dmesg | grep 01:10
    pci 0000:01:10.0: [8086:10ca] type 00 class 0x020000
    kobject: '0000:01:10.0' (ffff8801d79cd0a8): kobject_add_internal: parent: '0000:00:01.0', set: 'devices'
    kobject: '0000:01:10.0' (ffff8801d79cd0a8): kobject_uevent_env
    kobject: '0000:01:10.0' (ffff8801d79cd0a8): fill_kobj_path: path = '/devices/pci0000:00/0000:00:01.0/0000:01:10.0'
    kobject: '0000:01:10.0' (ffff8801d79cd0a8): kobject_uevent_env
    kobject: '0000:01:10.0' (ffff8801d79cd0a8): fill_kobj_path: path = '/devices/pci0000:00/0000:00:01.0/0000:01:10.0'
    kobject: '0000:01:10.0' (ffff8801d79cd0a8): kobject_uevent_env
    kobject: '0000:01:10.0' (ffff8801d79cd0a8): fill_kobj_path: path = '/devices/pci0000:00/0000:00:01.0/0000:01:10.0'
    kobject: '0000:01:10.0' (ffff8801d79cd0a8): kobject_cleanup, parent (null)
    kobject: '0000:01:10.0' (ffff8801d79cd0a8): calling ktype release
    kobject: '0000:01:10.0': free name

    [kobject freed as expected]

    # dmesg -C
    # echo 1 > devices/0000\:01\:00.0/sriov_numvfs
    # echo 0000:01:10.0 > drivers_probe
    # echo 0 > devices/0000\:01\:00.0/sriov_numvfs
    # dmesg | grep 01:10
    pci 0000:01:10.0: [8086:10ca] type 00 class 0x020000
    kobject: '0000:01:10.0' (ffff8801d79ce0a8): kobject_add_internal: parent: '0000:00:01.0', set: 'devices'
    kobject: '0000:01:10.0' (ffff8801d79ce0a8): kobject_uevent_env
    kobject: '0000:01:10.0' (ffff8801d79ce0a8): fill_kobj_path: path = '/devices/pci0000:00/0000:00:01.0/0000:01:10.0'
    kobject: '0000:01:10.0' (ffff8801d79ce0a8): kobject_uevent_env
    kobject: '0000:01:10.0' (ffff8801d79ce0a8): fill_kobj_path: path = '/devices/pci0000:00/0000:00:01.0/0000:01:10.0'
    kobject: '0000:01:10.0' (ffff8801d79ce0a8): kobject_uevent_env
    kobject: '0000:01:10.0' (ffff8801d79ce0a8): fill_kobj_path: path = '/devices/pci0000:00/0000:00:01.0/0000:01:10.0'

    [no free]

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
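A hedged sketch of the balanced lookup pattern the fix establishes (simplified fragment; the surrounding driver-core code and error handling are omitted): every successful bus_find_device_by_name() must be paired with put_device().

    struct device *dev;

    dev = bus_find_device_by_name(bus, NULL, buf);  /* takes a reference on success */
    if (!dev)
        return -ENODEV;

    /* ... bind a driver to dev ... */

    put_device(dev);                                /* release the lookup's reference */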
-
Andy Lutomirski authored
commit 1ddf0b1b upstream.

In Linux 3.18 and below, GCC hoists the lsl instructions in the pvclock code all the way to the beginning of __vdso_clock_gettime, slowing the non-paravirt case significantly. For unknown reasons, presumably related to the removal of a branch, the performance issue is gone as of commit e76b027e ("x86,vdso: Use LSL unconditionally for vgetcpu"), but I don't trust GCC enough to expect the problem to stay fixed.

There should be no correctness issue, because the __getcpu calls in __vdso_clock_gettime were never necessary in the first place.

Note to stable maintainers: in 3.18 and below, depending on configuration, gcc 4.9.2 generates code like this:

    9c3:  44 0f 03 e8    lsl    %ax,%r13d
    9c7:  45 89 eb       mov    %r13d,%r11d
    9ca:  0f 03 d8       lsl    %ax,%ebx

This patch won't apply as is to any released kernel, but I'll send a trivial backported version if needed.

[ Backported by Andy Lutomirski. Should apply to all affected versions.

  This fixes a functionality bug as well as a performance bug: buggy kernels can infinite loop in __vdso_clock_gettime on affected compilers. See, for example:

  https://bugzilla.redhat.com/show_bug.cgi?id=1178975 ]

Fixes: 51c19b4f ("x86: vdso: pvclock gettime support")
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andy Lutomirski authored
commit 394f56fe upstream.

The theory behind vdso randomization is that it's mapped at a random offset above the top of the stack. To avoid wasting a page of memory for an extra page table, the vdso isn't supposed to extend past the lowest PMD into which it can fit. Other than that, the address should be a uniformly distributed address that meets all of the alignment requirements.

The current algorithm is buggy: the vdso has about a 50% probability of being at the very end of a PMD. The current algorithm also has a decent chance of failing outright due to incorrect handling of the case where the top of the stack is near the top of its PMD.

This fixes the implementation. The paxtest estimate of vdso "randomisation" improves from 11 bits to 18 bits. (Disclaimer: I don't know what the paxtest code is actually calculating.)

It's worth noting that this algorithm is inherently biased: the vdso is more likely to end up near the end of its PMD than near the beginning. Ideally we would either nix the PMD sharing requirement or jointly randomize the vdso and the stack to reduce the bias.

In the mean time, this is a considerable improvement with basically no risk of compatibility issues, since the allowed outputs of the algorithm are unchanged.

As an easy test, doing this:

    for i in `seq 10000`
      do grep -P vdso /proc/self/maps |cut -d- -f1
    done |sort |uniq -d

used to produce lots of output (1445 lines on my most recent run). A tiny subset looks like this:

    7fffdfffe000
    7fffe01fe000
    7fffe05fe000
    7fffe07fe000
    7fffe09fe000
    7fffe0bfe000
    7fffe0dfe000

Note the suspicious fe000 endings. With the fix, I get a much more palatable 76 repeated addresses.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
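A rough userspace sketch of the placement rule described above - a page-aligned random base that must not extend past the end of the PMD it starts in. This only illustrates the constraint under simplifying assumptions (page-aligned start and image size, single candidate PMD); it is not the kernel's actual implementation.

    #include <stdint.h>
    #include <stdlib.h>

    #define PAGE_SIZE 4096ULL
    #define PMD_SIZE  (2ULL << 20)                 /* 2 MiB mapped by one PMD entry */

    /* Consider only page-aligned bases between 'start' and the end of start's
     * PMD minus the image size, so the mapping never needs an extra page
     * table, and pick one of those slots uniformly at random. */
    static uint64_t pick_base(uint64_t start, uint64_t image_size)
    {
        uint64_t pmd_end = (start + PMD_SIZE - 1) & ~(PMD_SIZE - 1);
        uint64_t addr = start;

        if (pmd_end - start >= image_size) {
            uint64_t slots = (pmd_end - image_size - start) / PAGE_SIZE + 1;
            addr = start + ((uint64_t)rand() % slots) * PAGE_SIZE;
        }
        return addr;
    }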
-
Paolo Bonzini authored
commit a629df7e upstream.

Since most virtual machines raise this message once, it is a bit annoying. Make it KERN_DEBUG severity.

Fixes: 7a2e8aaf
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Giedrius Statkevičius authored
commit 2bacedad upstream.

New Genius MousePen i608X devices have a new id 0x501a instead of the old 0x5011, so add a new #define with "_2" appended and change the required places. The remaining two checkpatch warnings about line length being over 80 characters are present in the original files too, and this patch was made in the same style (no line break).

Just adding a new id and changing the required places should make the new device work without any issues, according to the bug report this patch was made against and fixes:

    https://bugzilla.kernel.org/show_bug.cgi?id=67111

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Karl Relton authored
commit da940db4 upstream. Apple bluetooth wireless keyboard (sold in UK) has always reported zero for battery strength no matter what condition the batteries are actually in. With this patch applied (applying same quirk as other Apple keyboards), the battery strength is now correctly reported. Signed-off-by: Karl Relton <karllinuxtest.relton@ntlworld.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dan Carpenter authored
commit 606185b2 upstream. This is a static checker fix. We write some binary settings to the sysfs file. One of the settings is the "->startup_profile". There isn't any checking to make sure it fits into the pyra->profile_settings[] array in the profile_activated() function. I added a check to pyra_sysfs_write_settings() in both places because I wasn't positive that the other callers were correct. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
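A minimal standalone sketch of the kind of bounds check described above (illustrative profile count and struct layout; the real driver's definitions differ):

    #include <errno.h>
    #include <stdint.h>

    #define PROFILE_COUNT 5                          /* illustrative array size */

    struct settings { uint8_t startup_profile; };    /* stand-in for the sysfs blob */

    /* Validate the user-supplied profile index before it is ever used to index
     * a per-profile array, both when writing settings and when activating one. */
    static int check_startup_profile(const struct settings *s)
    {
        if (s->startup_profile >= PROFILE_COUNT)
            return -EINVAL;
        return 0;
    }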
-
Gwendal Grignou authored
commit d1c7e29e upstream.

Before ->start() is called, bufsize is set to HID_MIN_BUFFER_SIZE, 64 bytes. While processing the IRQ, we were asking to receive up to wMaxInputLength bytes, which can be bigger than 64 bytes. Later, when ->start() is run, a proper bufsize will be calculated.

Given that wMaxInputLength is said to be unreliable in other parts of the code, receive only what the current buffer can hold, even if that results in truncated reports.

Signed-off-by: Gwendal Grignou <gwendal@chromium.org>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jean-Baptiste Maneyrol authored
commit 6296f4a8 upstream.

The current driver uses a common buffer for reading reports, either synchronously in i2c_hid_get_raw_report() or asynchronously in the interrupt handler. There is a race condition if an interrupt arrives immediately after a report is received in i2c_hid_get_raw_report(): the common buffer is overwritten by the interrupt handler with the new report, and i2c_hid_get_raw_report() then proceeds using the wrong data.

Fix it by using a separate buffer for synchronous reports.

Signed-off-by: Jean-Baptiste Maneyrol <jmaneyrol@invensense.com>
[Antonio Borneo: cleanup, rebase to v3.17, submit mainline]
Signed-off-by: Antonio Borneo <borneo.antonio@gmail.com>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jens Axboe authored
commit a33c1ba2 upstream. We currently use num_possible_cpus(), but that breaks on sparc64 where the CPU ID space is discontig. Use nr_cpu_ids as the highest CPU ID instead, so we don't end up reading from invalid memory. Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
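A hedged, kernel-style fragment of the sizing rule described above (illustrative table name; not blk-mq's actual structures): anything indexed by raw CPU id must have nr_cpu_ids entries, because on machines with a sparse CPU id space the largest id can exceed num_possible_cpus().

    void **cpu_table;

    cpu_table = kcalloc(nr_cpu_ids, sizeof(*cpu_table), GFP_KERNEL);  /* not num_possible_cpus() */
    if (!cpu_table)
        return -ENOMEM;
    /* later: cpu_table[raw_cpu_id] is always in bounds */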
-