Commits · ec9f4ebc155134da356db56303adb84c8f2972a0 · Kirill Smelkov / linux

25 Jan, 2016 19 commits

powercap / RAPL: fix BIOS lock check · ec9f4ebc

Prarit Bhargava authored Dec 09, 2015

commit 79a21dbf upstream.

Intel RAPL initialized on several systems where the BIOS lock bit (msr
0x610, bit 63) was set.  This occured because the return value of
rapl_read_data_raw() was being checked, rather than the value of the variable
passed in, locked.

This patch properly implments the rapl_read_data_raw() call to check the
variable locked, and now the Intel RAPL driver outputs the warning:

	intel_rapl: RAPL package 0 domain package locked by BIOS

and does not initialize for the package.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

ec9f4ebc

ses: fix additional element traversal bug · 84c40f73

James Bottomley authored Dec 11, 2015

commit 5e103356 upstream.

KASAN found that our additional element processing scripts drop off
the end of the VPD page into unallocated space.  The reason is that
not every element has additional information but our traversal
routines think they do, leading to them expecting far more additional
information than is present.  Fix this by adding a gate to the
traversal routine so that it only processes elements that are expected
to have additional information (list is in SES-2 section 6.1.13.1:
Additional Element Status diagnostic page overview)
Reported-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Tested-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

84c40f73

Revert "SCSI: Fix NULL pointer dereference in runtime PM" · f711f9cb

Ken Xue authored Dec 01, 2015

commit 1c69d3b6 upstream.

This reverts commit 49718f0f ("SCSI: Fix NULL pointer dereference in
runtime PM")

The old commit may lead to a issue that blk_{pre|post}_runtime_suspend and
blk_{pre|post}_runtime_resume may not be called in pairs.

Take sr device as example, when sr device goes to runtime suspend,
blk_{pre|post}_runtime_suspend will be called since sr device defined
pm->runtime_suspend. But blk_{pre|post}_runtime_resume will not be called
since sr device doesn't have pm->runtime_resume. so, sr device can not
resume correctly anymore.

More discussion can be found from below link.
http://marc.info/?l=linux-scsi&m=144163730531875&w=2Signed-off-by: Ken Xue <Ken.Xue@amd.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Cc: Xiangliang Yu <Xiangliang.Yu@amd.com>
Cc: James E.J. Bottomley <JBottomley@odin.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Michael Terry <Michael.terry@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

f711f9cb

ses: Fix problems with simple enclosures · d0112a83

James Bottomley authored Dec 08, 2015

commit 3417c1b5 upstream.

Simple enclosure implementations (mostly USB) are allowed to return only
page 8 to every diagnostic query.  That really confuses our
implementation because we assume the return is the page we asked for and
end up doing incorrect offsets based on bogus information leading to
accesses outside of allocated ranges.  Fix that by checking the page
code of the return and giving an error if it isn't the one we asked for.
This should fix reported bugs with USB storage by simply refusing to
attach to enclosures that behave like this.  It's also good defensive
practise now that we're starting to see more USB enclosures.
Reported-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

d0112a83

rfkill: copy the name into the rfkill struct · 819da758

Johannes Berg authored Dec 10, 2015

commit b7bb1100 upstream.

Some users of rfkill, like NFC and cfg80211, use a dynamic name when
allocating rfkill, in those cases dev_name(). Therefore, the pointer
passed to rfkill_alloc() might not be valid forever, I specifically
found the case that the rfkill name was quite obviously an invalid
pointer (or at least garbage) when the wiphy had been renamed.

Fix this by making a copy of the rfkill name in rfkill_alloc().
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

819da758

crypto: skcipher - Copy iv from desc even for 0-len walks · 2b4c87db

Jason A. Donenfeld authored Dec 06, 2015

commit 70d906bc upstream.

Some ciphers actually support encrypting zero length plaintexts. For
example, many AEAD modes support this. The resulting ciphertext for
those winds up being only the authentication tag, which is a result of
the key, the iv, the additional data, and the fact that the plaintext
had zero length. The blkcipher constructors won't copy the IV to the
right place, however, when using a zero length input, resulting in
some significant problems when ciphers call their initialization
routines, only to find that the ->iv parameter is uninitialized. One
such example of this would be using chacha20poly1305 with a zero length
input, which then calls chacha20, which calls the key setup routine,
which eventually OOPSes due to the uninitialized ->iv member.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

2b4c87db

video: fbdev: fsl: Fix kernel crash when diu_ops is not implemented · be55f7df

Wang Dongsheng authored Dec 03, 2015

commit acfc1cc1 upstream.

If diu_ops is not implemented on platform, kernel will access a NULL
pointer. We need to check this pointer in DIU initialization.
Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
Acked-by: Timur Tabi <timur@tabi.org>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

be55f7df

xen/events/fifo: Consume unprocessed events when a CPU dies · dff790f9

Ross Lagerwall authored Jun 19, 2015

commit 3de88d62 upstream.

When a CPU is offlined, there may be unprocessed events on a port for
that CPU.  If the port is subsequently reused on a different CPU, it
could be in an unexpected state with the link bit set, resulting in
interrupts being missed. Fix this by consuming any unprocessed events
for a particular CPU when that CPU dies.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
[ luis: backported to 3.16: adjusted context ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

dff790f9

i2c: mv64xxx: The n clockdiv factor is 0 based on sunxi SoCs · 50affdc4

Hans de Goede authored Sep 27, 2015

commit bba61f50 upstream.

According to the datasheets the n factor for dividing the tclk is
2 to the power n on Allwinner SoCs, not 2 to the power n + 1 as it is
on other mv64xxx implementations.

I've contacted Allwinner about this and they have confirmed that the
datasheet is correct.

This commit fixes the clk-divider calculations for Allwinner SoCs
accordingly.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Tested-by: Olliver Schinagl <oliver@schinagl.nl>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

50affdc4

tools: Add a "make all" rule · 603f9acd

Kamal Mostafa authored Jan 06, 2016

commit f6ba98c5 upstream.
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Pali Rohar <pali.rohar@gmail.com>
Cc: Roberta Dobrescu <roberta.dobrescu@gmail.com>
Link: http://lkml.kernel.org/r/1447280736-2161-2-git-send-email-kamal@canonical.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
[ kamal: backport to 3.16-stable: build all tools for this version ]
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

603f9acd

MIPS: uaccess: Take EVA into account in [__]clear_user · fbd67708

James Hogan authored Jan 04, 2016

commit d6a428fb upstream.

__clear_user() (and clear_user() which uses it), always access the user
mode address space, which results in EVA store instructions when EVA is
enabled even if the current user address limit is KERNEL_DS.

Fix this by adding a new symbol __bzero_kernel for the normal kernel
address space bzero in EVA mode, and call that from __clear_user() if
eva_kernel_access().
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10844/Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
[james.hogan@imgtec.com: backport]
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

fbd67708

MIPS: uaccess: Take EVA into account in __copy_from_user() · 029692a0

James Hogan authored Jan 04, 2016

commit 6f06a2c4 upstream.

When EVA is in use, __copy_from_user() was unconditionally using the EVA
instructions to read the user address space, however this can also be
used for kernel access. If the address isn't a valid user address it
will cause an address error or TLB exception, and if it is then user
memory may be read instead of kernel memory.

For example in the following stack trace from Linux v3.10 (changes since
then will prevent this particular one still happening) kernel_sendmsg()
set the user address limit to KERNEL_DS, and tcp_sendmsg() goes on to
use __copy_from_user() with a kernel address in KSeg0.

[<8002d434>] __copy_fromuser_common+0x10c/0x254
[<805710e0>] tcp_sendmsg+0x5f4/0xf00
[<804e8e3c>] sock_sendmsg+0x78/0xa0
[<804e8f28>] kernel_sendmsg+0x24/0x38
[<804ee0f8>] sock_no_sendpage+0x70/0x7c
[<8017c820>] pipe_to_sendpage+0x80/0x98
[<8017c6b0>] splice_from_pipe_feed+0xa8/0x198
[<8017cc54>] __splice_from_pipe+0x4c/0x8c
[<8017e844>] splice_from_pipe+0x58/0x78
[<8017e884>] generic_splice_sendpage+0x20/0x2c
[<8017d690>] do_splice_from+0xb4/0x110
[<8017d710>] direct_splice_actor+0x24/0x30
[<8017d394>] splice_direct_to_actor+0xd8/0x208
[<8017d51c>] do_splice_direct+0x58/0x7c
[<8014eaf4>] do_sendfile+0x1dc/0x39c
[<8014f82c>] SyS_sendfile+0x90/0xf8

Add the eva_kernel_access() check in __copy_from_user() like the one in
copy_from_user().
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10843/Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
[james.hogan@imgtec.com: backport]
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

029692a0

efi: Disable interrupts around EFI calls, not in the epilog/prolog calls · 4692cf46

Ingo Molnar authored Mar 03, 2015

commit 23a0d4e8 upstream.

Tapasweni Pathak reported that we do a kmalloc() in efi_call_phys_prolog()
on x86-64 while having interrupts disabled, which is a big no-no, as
kmalloc() can sleep.

Solve this by removing the irq disabling from the prolog/epilog calls
around EFI calls: it's unnecessary, as in this stage we are single
threaded in the boot thread, and we don't ever execute this from
interrupt contexts.
Reported-by: Tapasweni Pathak <tapaswenipathak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
[ luis: backported to 3.16: adjusted context ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

4692cf46

usb: musb: USB_TI_CPPI41_DMA requires dmaengine support · 9d1d87b2

Arnd Bergmann authored Nov 18, 2015

commit 183e53e8 upstream.

The CPPI-4.1 driver selects TI_CPPI41, which is a dmaengine
driver and that may not be available when CONFIG_DMADEVICES
is not set:

warning: (USB_TI_CPPI41_DMA) selects TI_CPPI41 which has unmet direct dependencies (DMADEVICES && ARCH_OMAP)

This adds an extra dependency to avoid generating warnings in randconfig
builds. Ideally we'd remove the 'select' statement, but that has the
potential to break defconfig files.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 411dd19c ("usb: musb: Kconfig: Select the DMA driver if DMA mode of MUSB is enabled")
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

9d1d87b2

sh64: fix __NR_fgetxattr · 915d5098

Dmitry V. Levin authored Dec 11, 2015

commit 2d33fa10 upstream.

According to arch/sh/kernel/syscalls_64.S and common sense, __NR_fgetxattr
has to be defined to 259, but it doesn't.  Instead, it's defined to 269,
which is of course used by another syscall, __NR_sched_setaffinity in this
case.

This bug was found by strace test suite.
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

915d5098

ocfs2: fix SGID not inherited issue · 980386d5

Junxiao Bi authored Dec 11, 2015

commit 854ee2e9 upstream.

Commit 8f1eb487 ("ocfs2: fix umask ignored issue") introduced an
issue, SGID of sub dir was not inherited from its parents dir.  It is
because SGID is set into "inode->i_mode" in ocfs2_get_init_inode(), but
is overwritten by "mode" which don't have SGID set later.

Fixes: 8f1eb487 ("ocfs2: fix umask ignored issue")
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Acked-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

980386d5

drivers/base/memory.c: prohibit offlining of memory blocks with missing sections · 79d27276

Seth Jennings authored Dec 11, 2015

commit 26bbe7ef upstream.

Commit bdee237c ("x86: mm: Use 2GB memory block size on large-memory
x86-64 systems") and 982792c7 ("x86, mm: probe memory block size for
generic x86 64bit") introduced large block sizes for x86.  This made it
possible to have multiple sections per memory block where previously,
there was a only every one section per block.

Since blocks consist of contiguous ranges of section, there can be holes
in the blocks where sections are not present.  If one attempts to
offline such a block, a crash occurs since the code is not designed to
deal with this.

This patch is a quick fix to gaurd against the crash by not allowing
blocks with non-present sections to be offlined.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=107781Signed-off-by: Seth Jennings <sjennings@variantweb.net>
Reported-by: Andrew Banman <abanman@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Greg KH <greg@kroah.com>
Cc: Russ Anderson <rja@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

79d27276

mm: hugetlb: call huge_pte_alloc() only if ptep is null · 41d25989

Naoya Horiguchi authored Dec 11, 2015

commit 0d777df5 upstream.

Currently at the beginning of hugetlb_fault(), we call huge_pte_offset()
and check whether the obtained *ptep is a migration/hwpoison entry or
not.  And if not, then we get to call huge_pte_alloc().  This is racy
because the *ptep could turn into migration/hwpoison entry after the
huge_pte_offset() check.  This race results in BUG_ON in
huge_pte_alloc().

We don't have to call huge_pte_alloc() when the huge_pte_offset()
returns non-NULL, so let's fix this bug with moving the code into else
block.

Note that the *ptep could turn into a migration/hwpoison entry after
this block, but that's not a problem because we have another
!pte_present check later (we never go into hugetlb_no_page() in that
case.)

Fixes: 290408d4 ("hugetlb: hugepage migration core")
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

41d25989

mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress · f98ab7a1

Michal Hocko authored Dec 11, 2015

commit 373ccbe5 upstream.

Tetsuo Handa has reported that the system might basically livelock in
OOM condition without triggering the OOM killer.

The issue is caused by internal dependency of the direct reclaim on
vmstat counter updates (via zone_reclaimable) which are performed from
the workqueue context.  If all the current workers get assigned to an
allocation request, though, they will be looping inside the allocator
trying to reclaim memory but zone_reclaimable can see stalled numbers so
it will consider a zone reclaimable even though it has been scanned way
too much.  WQ concurrency logic will not consider this situation as a
congested workqueue because it relies that worker would have to sleep in
such a situation.  This also means that it doesn't try to spawn new
workers or invoke the rescuer thread if the one is assigned to the
queue.

In order to fix this issue we need to do two things.  First we have to
let wq concurrency code know that we are in trouble so we have to do a
short sleep.  In order to prevent from issues handled by 0e093d99
("writeback: do not sleep on the congestion queue if there are no
congested BDIs or if significant congestion is not being encountered in
the current zone") we limit the sleep only to worker threads which are
the ones of the interest anyway.

The second thing to do is to create a dedicated workqueue for vmstat and
mark it WQ_MEM_RECLAIM to note it participates in the reclaim and to
have a spare worker thread for it.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Tejun Heo <tj@kernel.org>
Cc: Cristopher Lameter <clameter@sgi.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Arkadiusz Miskiewicz <arekm@maven.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
[ luis: backported to 3.16, based on Ben's backport to 3.2:
  - use queue_delayed_work instead of queue_delayed_work_on in function
    vmstat_update()
  - change start_cpu_timer() instead of vmstat_shepherd()
  - adjusted context ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

f98ab7a1

18 Jan, 2016 13 commits

mm: hugetlb: fix hugepage memory leak caused by wrong reserve count · 60d786c5

Naoya Horiguchi authored Dec 11, 2015

commit a88c7695 upstream.

When dequeue_huge_page_vma() in alloc_huge_page() fails, we fall back on
alloc_buddy_huge_page() to directly create a hugepage from the buddy
allocator.

In that case, however, if alloc_buddy_huge_page() succeeds we don't
decrement h->resv_huge_pages, which means that successful
hugetlb_fault() returns without releasing the reserve count.  As a
result, subsequent hugetlb_fault() might fail despite that there are
still free hugepages.

This patch simply adds decrementing code on that code path.

I reproduced this problem when testing v4.3 kernel in the following situation:
 - the test machine/VM is a NUMA system,
 - hugepage overcommiting is enabled,
 - most of hugepages are allocated and there's only one free hugepage
   which is on node 0 (for example),
 - another program, which calls set_mempolicy(MPOL_BIND) to bind itself to
   node 1, tries to allocate a hugepage,
 - the allocation should fail but the reserve count is still hold.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ luis: backported to 3.16:
  - use 'chg' instead of 'gbl_chg'
  - adjusted context ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

60d786c5

parisc iommu: fix panic due to trying to allocate too large region · 03dd7ad3

Mikulas Patocka authored Nov 30, 2015

commit e46e31a3 upstream.

When using the Promise TX2+ SATA controller on PA-RISC, the system often
crashes with kernel panic, for example just writing data with the dd
utility will make it crash.

Kernel panic - not syncing: drivers/parisc/sba_iommu.c: I/O MMU @ 000000000000a000 is out of mapping resources

CPU: 0 PID: 18442 Comm: mkspadfs Not tainted 4.4.0-rc2 #2
Backtrace:
 [<000000004021497c>] show_stack+0x14/0x20
 [<0000000040410bf0>] dump_stack+0x88/0x100
 [<000000004023978c>] panic+0x124/0x360
 [<0000000040452c18>] sba_alloc_range+0x698/0x6a0
 [<0000000040453150>] sba_map_sg+0x260/0x5b8
 [<000000000c18dbb4>] ata_qc_issue+0x264/0x4a8 [libata]
 [<000000000c19535c>] ata_scsi_translate+0xe4/0x220 [libata]
 [<000000000c19a93c>] ata_scsi_queuecmd+0xbc/0x320 [libata]
 [<0000000040499bbc>] scsi_dispatch_cmd+0xfc/0x130
 [<000000004049da34>] scsi_request_fn+0x6e4/0x970
 [<00000000403e95a8>] __blk_run_queue+0x40/0x60
 [<00000000403e9d8c>] blk_run_queue+0x3c/0x68
 [<000000004049a534>] scsi_run_queue+0x2a4/0x360
 [<000000004049be68>] scsi_end_request+0x1a8/0x238
 [<000000004049de84>] scsi_io_completion+0xfc/0x688
 [<0000000040493c74>] scsi_finish_command+0x17c/0x1d0

The cause of the crash is not exhaustion of the IOMMU space, there is
plenty of free pages. The function sba_alloc_range is called with size
0x11000, thus the pages_needed variable is 0x11. The function
sba_search_bitmap is called with bits_wanted 0x11 and boundary size is
0x10 (because dma_get_seg_boundary(dev) returns 0xffff).

The function sba_search_bitmap attempts to allocate 17 pages that must not
cross 16-page boundary - it can't satisfy this requirement
(iommu_is_span_boundary always returns true) and fails even if there are
many free entries in the IOMMU space.

How did it happen that we try to allocate 17 pages that don't cross
16-page boundary? The cause is in the function iommu_coalesce_chunks. This
function tries to coalesce adjacent entries in the scatterlist. The
function does several checks if it may coalesce one entry with the next,
one of those checks is this:

	if (startsg->length + dma_len > max_seg_size)
		break;

When it finishes coalescing adjacent entries, it allocates the mapping:

sg_dma_len(contig_sg) = dma_len;
dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE);
sg_dma_address(contig_sg) =
	PIDE_FLAG
	| (iommu_alloc_range(ioc, dev, dma_len) << IOVP_SHIFT)
	| dma_offset;

It is possible that (startsg->length + dma_len > max_seg_size) is false
(we are just near the 0x10000 max_seg_size boundary), so the funcion
decides to coalesce this entry with the next entry. When the coalescing
succeeds, the function performs
	dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE);
And now, because of non-zero dma_offset, dma_len is greater than 0x10000.
iommu_alloc_range (a pointer to sba_alloc_range) is called and it attempts
to allocate 17 pages for a device that must not cross 16-page boundary.

To fix the bug, we must make sure that dma_len after addition of
dma_offset and alignment doesn't cross the segment boundary. I.e. change
	if (startsg->length + dma_len > max_seg_size)
		break;
to
	if (ALIGN(dma_len + dma_offset + startsg->length, IOVP_SIZE) > max_seg_size)
		break;

This patch makes this change (it precalculates max_seg_boundary at the
beginning of the function iommu_coalesce_chunks). I also added a check
that the mapping length doesn't exceed dma_get_seg_boundary(dev) (it is
not needed for Promise TX2+ SATA, but it may be needed for other devices
that have dma_get_seg_boundary lower than dma_get_max_seg_size).
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

03dd7ad3

USB: add quirk for devices with broken LPM · 23b42ce5

Alan Stern authored Dec 10, 2015

commit ad87e032 upstream.

Some USB device / host controller combinations seem to have problems
with Link Power Management.  For example, Steinar found that his xHCI
controller wouldn't handle bandwidth calculations correctly for two
video cards simultaneously when LPM was enabled, even though the bus
had plenty of bandwidth available.

This patch introduces a new quirk flag for devices that should remain
disabled for LPM, and creates quirk entries for Steinar's devices.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ luis: backported to 3.16: adjusted context ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

23b42ce5

xhci: fix usb2 resume timing and races. · 30f8b511

Mathias Nyman authored Dec 11, 2015

commit f69115fd upstream.

According to USB 2 specs ports need to signal resume for at least 20ms,
in practice even longer, before moving to U0 state.
Both host and devices can initiate resume.

On device initiated resume, a port status interrupt with the port in resume
state in issued. The interrupt handler tags a resume_done[port]
timestamp with current time + USB_RESUME_TIMEOUT, and kick roothub timer.
Root hub timer requests for port status, finds the port in resume state,
checks if resume_done[port] timestamp passed, and set port to U0 state.

On host initiated resume, current code sets the port to resume state,
sleep 20ms, and finally sets the port to U0 state. This should also
be changed to work in a similar way as the device initiated resume, with
timestamp tagging, but that is not yet tested and will be a separate
fix later.

There are a few issues with this approach

1. A host initiated resume will also generate a resume event. The event
   handler will find the port in resume state, believe it's a device
   initiated resume, and act accordingly.

2. A port status request might cut the resume signalling short if a
   get_port_status request is handled during the host resume signalling.
   The port will be found in resume state. The timestamp is not set leading
   to time_after_eq(jiffies, timestamp) returning true, as timestamp = 0.
   get_port_status will proceed with moving the port to U0.

3. If an error, or anything else happens to the port during device
   initiated resume signalling it will leave all the device resume
   parameters hanging uncleared, preventing further suspend, returning
   -EBUSY, and cause the pm thread to busyloop trying to enter suspend.

Fix this by using the existing resuming_ports bitfield to indicate that
resume signalling timing is taken care of.
Check if the resume_done[port] is set before using it for timestamp
comparison, and also clear out any resume signalling related variables
if port is not in U0 or Resume state

This issue was discovered when a PM thread busylooped, trying to runtime
suspend the xhci USB 2 roothub on a Dell XPS
Reported-by: Daniel J Blueman <daniel@quora.org>
Tested-by: Daniel J Blueman <daniel@quora.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

30f8b511

vgaarb: fix signal handling in vga_get() · ccebddfb

Kirill A. Shutemov authored Nov 30, 2015

commit 9f5bd308 upstream.

There are few defects in vga_get() related to signal hadning:

  - we shouldn't check for pending signals for TASK_UNINTERRUPTIBLE
    case;

  - if we found pending signal we must remove ourself from wait queue
    and change task state back to running;

  - -ERESTARTSYS is more appropriate, I guess.
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

ccebddfb

dm btree: fix bufio buffer leaks in dm_btree_del() error path · a94064d4

Joe Thornber authored Dec 10, 2015

commit ed8b45a3 upstream.

If dm_btree_del()'s call to push_frame() fails, e.g. due to
btree_node_validator finding invalid metadata, the dm_btree_del() error
path must unlock all frames (which have active dm-bufio buffers) that
were pushed onto the del_stack.

Otherwise, dm_bufio_client_destroy() will BUG_ON() because dm-bufio
buffers have leaked, e.g.:
  device-mapper: bufio: leaked buffer 3, hold count 1, list 0
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

a94064d4

ipmi: move timer init to before irq is setup · 35f6d738

Jan Stancek authored Dec 08, 2015

commit 27f972d3 upstream.

We encountered a panic on boot in ipmi_si on a dell per320 due to an
uninitialized timer as follows.

static int smi_start_processing(void       *send_info,
                                ipmi_smi_t intf)
{
        /* Try to claim any interrupts. */
        if (new_smi->irq_setup)
                new_smi->irq_setup(new_smi);

 --> IRQ arrives here and irq handler tries to modify uninitialized timer

    which triggers BUG_ON(!timer->function) in __mod_timer().

 Call Trace:
   <IRQ>
   [<ffffffffa0532617>] start_new_msg+0x47/0x80 [ipmi_si]
   [<ffffffffa053269e>] start_check_enables+0x4e/0x60 [ipmi_si]
   [<ffffffffa0532bd8>] smi_event_handler+0x1e8/0x640 [ipmi_si]
   [<ffffffff810f5584>] ? __rcu_process_callbacks+0x54/0x350
   [<ffffffffa053327c>] si_irq_handler+0x3c/0x60 [ipmi_si]
   [<ffffffff810efaf0>] handle_IRQ_event+0x60/0x170
   [<ffffffff810f245e>] handle_edge_irq+0xde/0x180
   [<ffffffff8100fc59>] handle_irq+0x49/0xa0
   [<ffffffff8154643c>] do_IRQ+0x6c/0xf0
   [<ffffffff8100ba53>] ret_from_intr+0x0/0x11

        /* Set up the timer that drives the interface. */
        setup_timer(&new_smi->si_timer, smi_timeout, (long)new_smi);

The following patch fixes the problem.

To: Openipmi-developer@lists.sourceforge.net
To: Corey Minyard <minyard@acm.org>
CC: linux-kernel@vger.kernel.org
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Tony Camuso <tcamuso@redhat.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

35f6d738

dm space map metadata: fix ref counting bug when bootstrapping a new space map · b934c4ec

Joe Thornber authored Dec 09, 2015

commit 50dd842a upstream.

When applying block operations (BOPs) do not remove them from the
uncommitted BOP ring-buffer until after they've been applied -- in case
we recurse.

Also, perform BOP_INC operation, in dm_sm_metadata_create() and
sm_metadata_extend(), in terms of the uncommitted BOP ring-buffer rather
than using direct calls to sm_ll_inc().
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

b934c4ec

dm thin metadata: fix bug when taking a metadata snapshot · b8e3468d

Joe Thornber authored Dec 09, 2015

commit 49e99fc7 upstream.

When you take a metadata snapshot the btree roots for the mapping and
details tree need to have their reference counts incremented so they
persist for the lifetime of the metadata snap.

The roots being incremented were those currently written in the
superblock, which could possibly be out of date if concurrent IO is
triggering new mappings, breaking of sharing, etc.

Fix this by performing a commit with the metadata lock held while taking
a metadata snapshot.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

b8e3468d

ALSA: hda - Fix noise problems on Thinkpad T440s · b7caa158

Takashi Iwai authored Dec 09, 2015

commit 9a811230 upstream.

Lenovo Thinkpad T440s suffers from constant background noises, and it
seems to be a generic hardware issue on this model:
  https://forums.lenovo.com/t5/ThinkPad-T400-T500-and-newer-T/T440s-speaker-noise/td-p/1339883

As the noise comes from the analog loopback path, disabling the path
is the easy workaround.

Also, the machine gives significant cracking noises at PM suspend.  A
workaround found by trial-and-error is to disable the shutup callback
currently used for ALC269-variant.

This patch addresses these noise issues by introducing a new fixup
chain.  Although the same workaround might be applicable to other
Thinkpad models, it's applied only to T440s (17aa:220c) in this patch,
so far, just to be safe (you chicken!).  As a compromise, a new model
option string "tp440" is provided now, though, so that owners of other
Thinkpad models can test it more easily.

Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=958504Reported-and-tested-by: Tim Hardeck <thardeck@suse.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[ luis: backported to 3.16: adjusted context ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

b7caa158

radeon: Fix VCE IB test on Big-Endian systems · 4b328046

Oded Gabbay authored Dec 04, 2015

commit 361c32d3 upstream.

This patch makes the VCE IB test pass on Big-Endian systems. It converts
to little-endian the contents of the VCE message.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[ luis: backported to 3.16: adjusted context ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

4b328046

radeon: Fix VCE ring test for Big-Endian systems · ec3c6a0f

Oded Gabbay authored Dec 04, 2015

commit 687f4b98 upstream.

This patch fixes the VCE ring test when running on Big-Endian machines.
Every write to the ring needs to be translated to little-endian.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[ luis: backported to 3.16: adjusted context ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

ec3c6a0f

radeon/cik: Fix GFX IB test on Big-Endian · 1fe8b129

Oded Gabbay authored Dec 04, 2015

commit 5f3e226f upstream.

This patch makes the IB test on the GFX ring pass for CI-based cards
installed in Big-Endian machines.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

1fe8b129

11 Jan, 2016 8 commits

9p: ->evict_inode() should kick out ->i_data, not ->i_mapping · 19d1f432

Al Viro authored Dec 08, 2015

commit 4ad78628 upstream.

For block devices the pagecache is associated with the inode
on bdevfs, not with the aliasing ones on the mountable filesystems.
The latter have its own ->i_data empty and ->i_mapping pointing
to the (unique per major/minor) bdevfs inode.  That guarantees
cache coherence between all block device inodes with the same
device number.

Eviction of an alias inode has no business trying to evict the
pages belonging to bdevfs one; moreover, ->i_mapping is only
safe to access when the thing is opened.  At the time of
->evict_inode() the victim is definitely *not* opened.  We are
about to kill the address space embedded into struct inode
(inode->i_data) and that's what we need to empty of any pages.

9p instance tries to empty inode->i_mapping instead, which is
both unsafe and bogus - if we have several device nodes with
the same device number in different places, closing one of them
should not try to empty the (shared) page cache.

Fortunately, other instances in the tree are OK; they are
evicting from &inode->i_data instead, as 9p one should.
Reported-by: "Suzuki K. Poulose" <Suzuki.Poulose@arm.com>
Tested-by: "Suzuki K. Poulose" <Suzuki.Poulose@arm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

19d1f432

ALSA: hda - Fixing speaker noise on the two latest thinkpad models · f8ca884d

Hui Wang authored Dec 08, 2015

commit 23adc192 upstream.

We have two latest thinkpad laptop models which are all based on the
Intel skylake platforms, and all of them have the codec alc293 on
them. When the machines boot to the desktop, an greeting dialogue
shows up with the notification sound. But on these two models, there
is noise with the notification sound. We have 3 SKUs for each of
the models, all of them have this problem.

So far, this problem is only specific to these two thinkpad models,
we did not find this problem on the old thinkpad models with the
codec alc293 or alc292.

A workaround for this problem is disabling the aamix.

BugLink: https://bugs.launchpad.net/bugs/1523517Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

f8ca884d

IB/srp: Fix possible send queue overflow · bb087aec

Sagi Grimberg authored Dec 01, 2015

commit 09c0c0be upstream.

When using work request based memory registration (fast_reg)
we must reserve SQ entries for registration and invalidation
in addition to send operations. Each IO consumes 3 SQ entries
(registration, send, invalidation) so we need to allocate 3x
larger send-queue instead of 2x.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
[ luis: backported to 3.16:
  - change function srp_create_target_ib() instead of srp_create_ch_ib()
  - adjusted context ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

bb087aec

SUNRPC: Fix callback channel · ea312c49

Trond Myklebust authored Dec 07, 2015

commit 756b9b37 upstream.

The NFSv4.1 callback channel is currently broken because the receive
message will keep shrinking because the backchannel receive buffer size
never gets reset.
The easiest solution to this problem is instead of changing the receive
buffer, to rather adjust the copied request.

Fixes: 38b7631f ("nfs4: limit callback decoding to received bytes")
Cc: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

ea312c49

nfs4: limit callback decoding to received bytes · 169c69de

Benjamin Coddington authored Nov 20, 2015

commit 38b7631f upstream.

A truncated cb_compound request will cause the client to decode null or
data from a previous callback for nfs4.1 backchannel case, or uninitialized
data for the nfs4.0 case. This is because the path through
svc_process_common() advances the request's iov_base and decrements iov_len
without adjusting the overall xdr_buf's len field.  That causes
xdr_init_decode() to set up the xdr_stream with an incorrect length in
nfs4_callback_compound().

Fixing this for the nfs4.1 backchannel case first requires setting the
correct iov_len and page_len based on the length of received data in the
same manner as the nfs4.0 case.

Then the request's xdr_buf length can be adjusted for both cases based upon
the remaining iov_len and page_len.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

169c69de

virtio: fix memory leak of virtio ida cache layers · 1962eb80

Suman Anna authored Sep 16, 2015

commit c13f99b7 upstream.

The virtio core uses a static ida named virtio_index_ida for
assigning index numbers to virtio devices during registration.
The ida core may allocate some internal idr cache layers and
an ida bitmap upon any ida allocation, and all these layers are
truely freed only upon the ida destruction. The virtio_index_ida
is not destroyed at present, leading to a memory leak when using
the virtio core as a module and atleast one virtio device is
registered and unregistered.

Fix this by invoking ida_destroy() in the virtio core module
exit.
Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

1962eb80

ALSA: hda - Add inverted dmic for Packard Bell DOTS · 75fff031

David Henningsson authored Dec 07, 2015

commit 02f6ff90 upstream.

On the internal mic of the Packard Bell DOTS, one channel
has an inverted signal. Add a quirk to fix this up.

BugLink: https://bugs.launchpad.net/bugs/1523232Signed-off-by: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

75fff031

ALSA: rme96: Fix unexpected volume reset after rate changes · e09f1248

Takashi Iwai authored Dec 04, 2015

commit a74a8216 upstream.

rme96 driver needs to reset DAC depending on the sample rate, and this
results in resetting to the max volume suddenly.  It's because of the
missing call of snd_rme96_apply_dac_volume().

However, calling this function right after the DAC reset still may not
work, and we need some delay before this call.  Since the DAC reset
and the procedure after that are performed in the spinlock, we delay
the DAC volume restore at the end after the spinlock.
Reported-and-tested-by: Sylvain LABOISNE <maeda1@free.fr>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

e09f1248