Commits · 8d00e98edfd2e2d9bb1b1b7c9eaa7c2ad23b737a · Kirill Smelkov / linux

15 Jan, 2016 7 commits

mac80211: mesh: fix call_rcu() usage · 8d00e98e

Johannes Berg authored Nov 17, 2015

commit c2e703a5 upstream.

When using call_rcu(), the called function may be delayed quite
significantly, and without a matching rcu_barrier() there's no
way to be sure it has finished.
Therefore, global state that could be gone/freed/reused should
never be touched in the callback.

Fix this in mesh by moving the atomic_dec() into the caller;
that's not really a problem since we already unlinked the path
and it will be destroyed anyway.

This fixes a crash Jouni observed when running certain tests in
a certain order, in which the mesh interface was torn down, the
memory reused for a function pointer (work struct) and running
that then crashed since the pointer had been decremented by 1,
resulting in an invalid instruction byte stream.

Fixes: eb2b9311 ("mac80211: mesh path table implementation")
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

8d00e98e

rtlwifi: rtl8821ae: Fix lockups on boot · 6006b89e

Larry Finger authored Nov 10, 2015

commit eeec5d0e upstream.

In commit 54328e64 ("rtlwifi: rtl8821ae: Fix system lockups on boot"),
an attempt was made to fix a regression introduced in commit 1277fa2a
("rtlwifi: Remove the clear interrupt routine from all drivers").
Unfortunately, there were logic errors in that patch that prevented
affected boxes from booting even after that patch was applied.

The actual cause of the original problem is unknown as none of the
developers have systems that are affected.

Fixes: 54328e64 ("rtlwifi: rtl8821ae: Fix system lockups on boot")
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

6006b89e

ASoC: wm8962: correct addresses for HPF_C_0/1 · ad63c8c9

Sachin Pandhare authored Nov 10, 2015

commit e9f96bc5 upstream.

From datasheet:
R17408 (4400h) HPF_C_1
R17409 (4401h) HPF_C_0
17048 -> 17408 (0x4400)
17049 -> 17409 (0x4401)
Signed-off-by: Sachin Pandhare <sachinpandhare@gmail.com>
Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

ad63c8c9

crypto: talitos - Fix timing leak in ESP ICV verification · 08a4178c

David Gstir authored Nov 15, 2015

commit 79960943 upstream.

Using non-constant time memcmp() makes the verification of the authentication
tag in the decrypt path vulnerable to timing attacks. Fix this by using
crypto_memneq() instead.
Signed-off-by: David Gstir <david@sigma-star.at>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[ kamal: backport to 4.2-stable: context ]
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

08a4178c

crypto: nx - Fix timing leak in GCM and CCM decryption · 05c4b320

David Gstir authored Nov 15, 2015

commit cb8affb5 upstream.

Using non-constant time memcmp() makes the verification of the authentication
tag in the decrypt path vulnerable to timing attacks. Fix this by using
crypto_memneq() instead.
Signed-off-by: David Gstir <david@sigma-star.at>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

05c4b320

Bluetooth: Fix l2cap_chan leak in SMP · c0960c6f

Johan Hedberg authored Nov 11, 2015

commit 7883746b upstream.

The L2CAP core expects channel implementations to manage the reference
returned by the new_connection callback. With sockets this is already
handled with each channel being tied to the corresponding socket. With
SMP however there's no context to tie the pointer to in the
smp_new_conn_cb function. The function can also not just drop the
reference since it's the only one at that point.

For fixed channels (like SMP) the code path inside the L2CAP core from
new_connection() to ready() is short and straight-forwards. The
crucial difference is that in ready() the implementation has access to
the l2cap_conn that SMP needs associate its l2cap_chan. Instead of
taking a new reference in smp_ready_cb() we can simply assume to
already own the reference created in smp_new_conn_cb(), i.e. there is
no need to call l2cap_chan_hold().
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

c0960c6f

ASoC: rsnd: fixup SCU_SYS_INT_EN1 address · a2b6f57e

Kuninori Morimoto authored Nov 05, 2015

commit 021c5d94 upstream.

cfcefe01 ("ASoC: rsnd: add recovery support for under/over flow
error on SRC") added SCU_SYS_INT_EN1 address, but it should be
0x1d4, not 0x1c4. This patch fixup it.

Fixes: cfcefe01 ("ASoC: rsnd: add recovery support for under/over flow error on SRC")
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

a2b6f57e

14 Jan, 2016 33 commits

ARM: dts: Kirkwood: Fix QNAP TS219 power-off · a1498d82

Helmut Klein authored Nov 11, 2015

commit 5442f0ea upstream.

The "reg" entry in the "poweroff" section of "kirkwood-ts219.dtsi"
addressed the wrong uart (0 = console). This patch changes the address
to select uart 1, which is the uart connected to the pic
microcontroller, which can switch the device off.
Signed-off-by: Helmut Klein <hgkr.klein@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 4350a47b ("ARM: Kirkwood: Make use of the QNAP Power off driver.")
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

a1498d82

ARM: 8454/1: OF implies OF_FLATTREE · a76abfec

Arnd Bergmann authored Nov 19, 2015

commit aa7d5f18 upstream.

On the ARM architecture, individual platforms select CONFIG_USE_OF if they
need it, but all device tree code is keyed off CONFIG_OF. When building
a platform without DT support and manually enabling CONFIG_OF, we now
get a number of build errors, e.g.

arch/arm/kernel/devtree.c: In function 'setup_machine_fdt':
arch/arm/kernel/devtree.c:215:19: error: implicit declaration of function 'early_init_dt_verify' [-Werror=implicit-function-declaration]

We could now try to separate the use case of booting from DT vs. the
case of using the dynamic implementation, but that seems more complicated
than it can gain us.

This simply changes the ARM Kconfig file to always enable OF_RESERVED_MEM
and OF_EARLY_FLATTREE when CONFIG_OF is enabled. These options add a little
extra code when we just want the dynamic OF implementation, but that seems
like a rather obscure case, and this version solves all CONFIG_OF related
randconfig regressions.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 0166dc11 ("of: make CONFIG_OF user selectable")
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

a76abfec

thermal: fix thermal_zone_bind_cooling_device prototype · f6ecbe07

Arnd Bergmann authored Nov 17, 2015

commit c86b3de8 upstream.

When the prototype for thermal_zone_bind_cooling_device
changed, the static inline wrapper function was left alone,
which in theory can cause build warnings:

I have seen this error in the past:
drivers/thermal/db8500_thermal.c: In function 'db8500_cdev_bind':
drivers/thermal/db8500_thermal.c:78:9: error: too many arguments to function 'thermal_zone_bind_cooling_device'
   ret = thermal_zone_bind_cooling_device(thermal, i, cdev,

while this one no longer shows up, there is no doubt that
the prototype is still wrong, so let's just fix it anyway.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 6cd9e9f6 ("thermal: of: fix cooling device weights in device tree")
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

f6ecbe07

target: fix COMPARE_AND_WRITE non zero SGL offset data corruption · 0a49477f

Jan Engelhardt authored Nov 23, 2015

commit d94e5a61 upstream.

target_core_sbc's compare_and_write functionality suffers from taking
data at the wrong memory location when writing a CAW request to disk
when a SGL offset is non-zero.

This can happen with loopback and vhost-scsi fabric drivers when
SCF_PASSTHROUGH_SG_TO_MEM_NOALLOC is used to map existing user-space
SGL memory into COMPARE_AND_WRITE READ/WRITE payload buffers.

Given the following sample LIO subtopology,

% targetcli ls /loopback/
o- loopback ................................. [1 Target]
  o- naa.6001405ebb8df14a ....... [naa.60014059143ed2b3]
    o- luns ................................... [2 LUNs]
      o- lun0 ................ [iblock/ram0 (/dev/ram0)]
      o- lun1 ................ [iblock/ram1 (/dev/ram1)]
% lsscsi -g
[3:0:1:0]    disk    LIO-ORG  IBLOCK           4.0   /dev/sdc   /dev/sg3
[3:0:1:1]    disk    LIO-ORG  IBLOCK           4.0   /dev/sdd   /dev/sg4

the following bug can be observed in Linux 4.3 and 4.4~rc1:

% perl -e 'print chr$_ for 0..255,reverse 0..255' >rand
% perl -e 'print "\0" x 512' >zero
% cat rand >/dev/sdd
% sg_compare_and_write -i rand -D zero --lba 0 /dev/sdd
% sg_compare_and_write -i zero -D rand --lba 0 /dev/sdd
Miscompare reported
% hexdump -Cn 512 /dev/sdd
00000000  0f 0e 0d 0c 0b 0a 09 08  07 06 05 04 03 02 01 00
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
*
00000200

Rather than writing all-zeroes as instructed with the -D file, it
corrupts the data in the sector by splicing some of the original
bytes in. The page of the first entry of cmd->t_data_sg includes the
CDB, and sg->offset is set to a position past the CDB. I presume that
sg->offset is also the right choice to use for subsequent sglist
members.
Signed-off-by: Jan Engelhardt <jengelh@netitwork.de>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

0a49477f

target: Fix race for SCF_COMPARE_AND_WRITE_POST checking · c0ad0376

Nicholas Bellinger authored Nov 05, 2015

commit 057085e5 upstream.

This patch addresses a race + use after free where the first
stage of COMPARE_AND_WRITE in compare_and_write_callback()
is rescheduled after the backend sends the secondary WRITE,
resulting in second stage compare_and_write_post() callback
completing in target_complete_ok_work() before the first
can return.

Because current code depends on checking se_cmd->se_cmd_flags
after return from se_cmd->transport_complete_callback(),
this results in first stage having SCF_COMPARE_AND_WRITE_POST
set, which incorrectly falls through into second stage CAW
processing code, eventually triggering a NULL pointer
dereference due to use after free.

To address this bug, pass in a new *post_ret parameter into
se_cmd->transport_complete_callback(), and depend upon this
value instead of ->se_cmd_flags to determine when to return
or fall through into ->queue_status() code for CAW.

Cc: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

c0ad0376

iscsi-target: Fix rx_login_comp hang after login failure · 562a8cee

Nicholas Bellinger authored Nov 05, 2015

commit ca82c2bd upstream.

This patch addresses a case where iscsi_target_do_tx_login_io()
fails sending the last login response PDU, after the RX/TX
threads have already been started.

The case centers around iscsi_target_rx_thread() not invoking
allow_signal(SIGINT) before the send_sig(SIGINT, ...) occurs
from the failure path, resulting in RX thread hanging
indefinately on iscsi_conn->rx_login_comp.

Note this bug is a regression introduced by:

  commit e5419865
  Author: Nicholas Bellinger <nab@linux-iscsi.org>
  Date:   Wed Jul 22 23:14:19 2015 -0700

      iscsi-target: Fix iscsit_start_kthreads failure OOPs

To address this bug, complete ->rx_login_complete for good
measure in the failure path, and immediately return from
RX thread context if connection state did not actually reach
full feature phase (TARG_CONN_STATE_LOGGED_IN).

Cc: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

562a8cee

xen/gntdev: Grant maps should not be subject to NUMA balancing · 4ed156b8

Boris Ostrovsky authored Nov 10, 2015

commit 9c17d965 upstream.

Doing so will cause the grant to be unmapped and then, during
fault handling, the fault to be mistakenly treated as NUMA hint
fault.

In addition, even if those maps could partcipate in NUMA
balancing, it wouldn't provide any benefit since we are unable
to determine physical page's node (even if/when VNUMA is
implemented).

Marking grant maps' VMAs as VM_IO will exclude them from being
part of NUMA balancing.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

4ed156b8

nfs4: resend LAYOUTGET when there is a race that changes the seqid · d72013cc

Jeff Layton authored Nov 25, 2015

commit 4f2e9dce upstream.

pnfs_layout_process will check the returned layout stateid against what
the kernel has in-core. If it turns out that the stateid we received is
older, then we should resend the LAYOUTGET instead of falling back to
MDS I/O.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

d72013cc

NFSv4.1/pNFS: Don't request a minimal read layout beyond the end of file · 73a9184c

Trond Myklebust authored Aug 31, 2015

commit 2d89a1d3 upstream.

If we have a read layout, then sanity check the minimal layout length
so that it does not extend beyond the end of file.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

73a9184c

drm/radeon: make some dpm errors debug only · 71cc72fd

Alex Deucher authored Nov 23, 2015

commit 9c565e33 upstream.

"Could not force DPM to low", etc. is usually harmless and
just confuses users.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

71cc72fd

ARM: orion5x: Fix legacy get_irqnr_and_base · 765eb02e

Nicolas Pitre authored Nov 22, 2015

commit 4d2ec7e2 upstream.

Commit 5be9fc23 ("ARM: orion5x: fix legacy orion5x IRQ numbers") shifted
IRQ numbers by one but didn't update the get_irqnr_and_base macro
accordingly.  This macro is involved when CONFIG_MULTI_IRQ_HANDLER
is not defined.

[jac: 5d6bed2a went in to v4.2, but was backported to v3.18]
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Fixes: 5be9fc23 ("ARM: orion5x: fix legacy orion5x IRQ numbers")
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

765eb02e

ARM: dove: Fix legacy get_irqnr_and_base · ac4a44d4

Nicolas Pitre authored Nov 22, 2015

commit c1c90728 upstream.

Commit 5d6bed2a ("ARM: dove: fix legacy dove IRQ numbers") shifted
IRQ numbers by one but didn't update the get_irqnr_and_base macro
accordingly.  This macro is involved when CONFIG_MULTI_IRQ_HANDLER
is not defined.

[jac: 5d6bed2a went in to v4.2, but was backported to v3.18]
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Fixes: 5d6bed2a ("ARM: dove: fix legacy dove IRQ numbers")
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

ac4a44d4

ALSA: hda - Fix noise on Gigabyte Z170X mobo · 23a343f5

Takashi Iwai authored Nov 24, 2015

commit 0c25ad80 upstream.

Gigabyte Z710X mobo with ALC1150 codec gets significant noises from
the analog loopback routes even if their inputs are all muted.
Simply kill the aamix for fixing it.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=108301Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

23a343f5

PCI: Prevent out of bounds access in numa_node override · 2a937a1e

Mathias Krause authored Nov 09, 2015

commit 3dcc8d39 upstream.

Commit 12669631 ("PCI: Prevent out of bounds access in numa_node
override") missed that the user-provided node could also be negative.
Handle this case as well to avoid out-of-bounds accesses to the
node_states[] array.  However, allow the special value -1, i.e.
NUMA_NO_NODE, to be able to set the 'no specific node' configuration.

Fixes: 12669631 ("PCI: Prevent out of bounds access in numa_node override")
Fixes: 63692df1 ("PCI: Allow numa_node override via sysfs")
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Sasha Levin <sasha.levin@oracle.com>
CC: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

2a937a1e

drm/radeon: make rv770_set_sw_state failures non-fatal · fd7c64e8

Alex Deucher authored Nov 23, 2015

commit 4e7697ed upstream.

On some cards it takes a relatively long time for the change
to take place.  Make a timeout non-fatal.

bug:
https://bugs.freedesktop.org/show_bug.cgi?id=76130Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

fd7c64e8

arm64: KVM: Fix AArch32 to AArch64 register mapping · 88438afd

Marc Zyngier authored Nov 16, 2015

commit c0f09634 upstream.

When running a 32bit guest under a 64bit hypervisor, the ARMv8
architecture defines a mapping of the 32bit registers in the 64bit
space. This includes banked registers that are being demultiplexed
over the 64bit ones.

On exceptions caused by an operation involving a 32bit register, the
HW exposes the register number in the ESR_EL2 register. It was so
far understood that SW had to distinguish between AArch32 and AArch64
accesses (based on the current AArch32 mode and register number).

It turns out that I misinterpreted the ARM ARM, and the clue is in
D1.20.1: "For some exceptions, the exception syndrome given in the
ESR_ELx identifies one or more register numbers from the issued
instruction that generated the exception. Where the exception is
taken from an Exception level using AArch32 these register numbers
give the AArch64 view of the register."

Which means that the HW is already giving us the translated version,
and that we shouldn't try to interpret it at all (for example, doing
an MMIO operation from the IRQ mode using the LR register leads to
very unexpected behaviours).

The fix is thus not to perform a call to vcpu_reg32() at all from
vcpu_reg(), and use whatever register number is supplied directly.
The only case we need to find out about the mapping is when we
actively generate a register access, which only occurs when injecting
a fault in a guest.
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

88438afd

ARM/arm64: KVM: test properly for a PTE's uncachedness · 6cbe0242

Ard Biesheuvel authored Nov 10, 2015

commit e6fab544 upstream.

The open coded tests for checking whether a PTE maps a page as
uncached use a flawed '(pte_val(xxx) & CONST) != CONST' pattern,
which is not guaranteed to work since the type of a mapping is
not a set of mutually exclusive bits

For HYP mappings, the type is an index into the MAIR table (i.e, the
index itself does not contain any information whatsoever about the
type of the mapping), and for stage-2 mappings it is a bit field where
normal memory and device types are defined as follows:

    #define MT_S2_NORMAL            0xf
    #define MT_S2_DEVICE_nGnRE      0x1

I.e., masking *and* comparing with the latter matches on the former,
and we have been getting lucky merely because the S2 device mappings
also have the PTE_UXN bit set, or we would misidentify memory mappings
as device mappings.

Since the unmap_range() code path (which contains one instance of the
flawed test) is used both for HYP mappings and stage-2 mappings, and
considering the difference between the two, it is non-trivial to fix
this by rewriting the tests in place, as it would involve passing
down the type of mapping through all the functions.

However, since HYP mappings and stage-2 mappings both deal with host
physical addresses, we can simply check whether the mapping is backed
by memory that is managed by the host kernel, and only perform the
D-cache maintenance if this is the case.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Pavel Fedin <p.fedin@samsung.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

6cbe0242

ARM: dts: vfxxx: Fix dspi[01] spi-num-chipselects. · 4ae4d53c

Cory Tusar authored Nov 18, 2015

commit 897ed0ca upstream.

Per the Vybrid Reference Manual (section 3.8.6.1), dspi0 has 6 chip
select signals associated with it, while dspi1 has only 4.
Signed-off-by: Cory Tusar <cory.tusar@pid1solutions.com>
Acked-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

4ae4d53c

ALSA: hda - Fix headphone noise after Dell XPS 13 resume back from S3 · eed0fe0e

Hui Wang authored Nov 24, 2015

commit 8c69729b upstream.

We have a machine Dell XPS 13 with the codec alc256, after resume back
from S3, the headphone has noise when play sound.

Through comparing with the coeff vaule before and after S3, we found
restoring a coeff register will help remove noise.

BugLink: https://bugs.launchpad.net/bugs/1519168
Cc: Kailang Yang <kailang@realtek.com>
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

eed0fe0e

nfs4: limit callback decoding to received bytes · 19a86cbc

Benjamin Coddington authored Nov 20, 2015

commit 38b7631f upstream.

A truncated cb_compound request will cause the client to decode null or
data from a previous callback for nfs4.1 backchannel case, or uninitialized
data for the nfs4.0 case. This is because the path through
svc_process_common() advances the request's iov_base and decrements iov_len
without adjusting the overall xdr_buf's len field.  That causes
xdr_init_decode() to set up the xdr_stream with an incorrect length in
nfs4_callback_compound().

Fixing this for the nfs4.1 backchannel case first requires setting the
correct iov_len and page_len based on the length of received data in the
same manner as the nfs4.0 case.

Then the request's xdr_buf length can be adjusted for both cases based upon
the remaining iov_len and page_len.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

19a86cbc

vfs: Avoid softlockups with sendfile(2) · 32f04380

Jan Kara authored Nov 23, 2015

commit c2489e07 upstream.

The following test program from Dmitry can cause softlockups or RCU
stalls as it copies 1GB from tmpfs into eventfd and we don't have any
scheduling point at that path in sendfile(2) implementation:

        int r1 = eventfd(0, 0);
        int r2 = memfd_create("", 0);
        unsigned long n = 1<<30;
        fallocate(r2, 0, 0, n);
        sendfile(r1, r2, 0, n);

Add cond_resched() into __splice_from_pipe() to fix the problem.

CC: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

32f04380

vfs: Make sendfile(2) killable even better · e392e07a

Jan Kara authored Nov 23, 2015

commit c725bfce upstream.

Commit 296291cd (mm: make sendfile(2) killable) fixed an issue where
sendfile(2) was doing a lot of tiny writes into a filesystem and thus
was unkillable for a long time. However sendfile(2) can be (mis)used to
issue lots of writes into arbitrary file descriptor such as evenfd or
similar special file descriptors which never hit the standard filesystem
write path and thus are still unkillable. E.g. the following example
from Dmitry burns CPU for ~16s on my test system without possibility to
be killed:

        int r1 = eventfd(0, 0);
        int r2 = memfd_create("", 0);
        unsigned long n = 1<<30;
        fallocate(r2, 0, 0, n);
        sendfile(r1, r2, 0, n);

There are actually quite a few tests for pending signals in sendfile
code however we data to write is always available none of them seems to
trigger. So fix the problem by adding a test for pending signal into
splice_from_pipe_next() also before the loop waiting for pipe buffers to
be available. This should fix all the lockup issues with sendfile of the
do-ton-of-tiny-writes nature.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

e392e07a

fix sysvfs symlinks · dbc9f70b

Al Viro authored Nov 23, 2015

commit 0ebf7f10 upstream.

The thing got broken back in 2002 - sysvfs does *not* have inline
symlinks; even short ones have bodies stored in the first block
of file.  sysv_symlink() handles that correctly; unfortunately,
attempting to look an existing symlink up will end up confusing
them for inline symlinks, and interpret the block number containing
the body as the body itself.

Nobody has noticed until now, which says something about the level
of testing sysvfs gets ;-/
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

dbc9f70b

dm thin: fix regression in advertised discard limits · ecef95fe

Mike Snitzer authored Nov 23, 2015

commit 0fcb04d5 upstream.

When establishing a thin device's discard limits we cannot rely on the
underlying thin-pool device's discard capabilities (which are inherited
from the thin-pool's underlying data device) given that DM thin devices
must provide discard support even when the thin-pool's underlying data
device doesn't support discards.

Users were exposed to this thin device discard limits regression if
their thin-pool's underlying data device does _not_ support discards.
This regression caused all upper-layers that called the
blkdev_issue_discard() interface to not be able to issue discards to
thin devices (because discard_granularity was 0). This regression
wasn't caught earlier because the device-mapper-test-suite's extensive
'thin-provisioning' discard tests are only ever performed against
thin-pool's with data devices that support discards.

Fix is to have thin_io_hints() test the pool's 'discard_enabled' feature
rather than inferring whether or not a thin device's discard support
should be enabled by looking at the thin-pool's discard_granularity.

Fixes: 21607670 ("dm thin: disable discard support for thin devices if pool's is disabled")
Reported-by: Mike Gerber <mike@sprachgewalt.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

ecef95fe

ARC: dw2 unwind: Remove falllback linear search thru FDE entries · 4a38fa76

Vineet Gupta authored Nov 23, 2015

commit 2e22502c upstream.

Fixes STAR 9000953410: "perf callgraph profiling causing RCU stalls"

| perf record -g -c 15000 -e cycles /sbin/hackbench
|
| INFO: rcu_preempt self-detected stall on CPU
| 1: (1 GPs behind) idle=609/140000000000002/0 softirq=2914/2915 fqs=603
| Task dump for CPU 1:

in-kernel dwarf unwinder has a fast binary lookup and a fallback linear
search (which iterates thru each of ~11K entries) thus takes 2 orders of
magnitude longer (~3 million cycles vs. 2000). Routines written in hand
assembler lack dwarf info (as we don't support assembler CFI pseudo-ops
yet) fail the unwinder binary lookup, hit linear search, failing
nevertheless in the end.

However the linear search is pointless as binary lookup tables are created
from it in first place. It is impossible to have binary lookup fail while
succeed the linear search. It is pure waste of cycles thus removed by
this patch.

This manifested as RCU stalls / NMI watchdog splat when running
hackbench under perf with callgraph profiling. The triggering condition
was perf counter overflowing in routine lacking dwarf info (like memset)
leading to patheic 3 million cycle unwinder slow path and by the time it
returned new interrupts were already pending (Timer, IPI) and taken
rightaway. The original memset didn't make forward progress, system kept
accruing more interrupts and more unwinder delayes in a vicious feedback
loop, ultimately triggering the NMI diagnostic.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

4a38fa76

powerpc/tm: Check for already reclaimed tasks · 74a55442

Michael Neuling authored Nov 19, 2015

commit 7f821fc9 upstream.

Currently we can hit a scenario where we'll tm_reclaim() twice.  This
results in a TM bad thing exception because the second reclaim occurs
when not in suspend mode.

The scenario in which this can happen is the following.  We attempt to
deliver a signal to userspace.  To do this we need obtain the stack
pointer to write the signal context.  To get this stack pointer we
must tm_reclaim() in case we need to use the checkpointed stack
pointer (see get_tm_stackpointer()).  Normally we'd then return
directly to userspace to deliver the signal without going through
__switch_to().

Unfortunatley, if at this point we get an error (such as a bad
userspace stack pointer), we need to exit the process.  The exit will
result in a __switch_to().  __switch_to() will attempt to save the
process state which results in another tm_reclaim().  This
tm_reclaim() now causes a TM Bad Thing exception as this state has
already been saved and the processor is no longer in TM suspend mode.
Whee!

This patch checks the state of the MSR to ensure we are TM suspended
before we attempt the tm_reclaim().  If we've already saved the state
away, we should no longer be in TM suspend mode.  This has the
additional advantage of checking for a potential TM Bad Thing
exception.

Found using syscall fuzzer.

Fixes: fb09692e ("powerpc: Add reclaim and recheckpoint functions for context switching transactional memory processes")
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

74a55442

powerpc/tm: Block signal return setting invalid MSR state · dc74bbc2

Michael Neuling authored Nov 19, 2015

commit d2b9d2a5 upstream.

Currently we allow both the MSR T and S bits to be set by userspace on
a signal return.  Unfortunately this is a reserved configuration and
will cause a TM Bad Thing exception if attempted (via rfid).

This patch checks for this case in both the 32 and 64 bit signals
code.  If both T and S are set, we mark the context as invalid.

Found using a syscall fuzzer.

Fixes: 2b0a576d ("powerpc: Add new transactional memory state to the signal context")
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

dc74bbc2

watchdog: omap_wdt: fix null pointer dereference · efccab69

Peter Robinson authored Nov 02, 2015

commit de55acd1 upstream.

Fix issue from two patches overlapping causing a kernel oops

[ 3569.297449] Unable to handle kernel NULL pointer dereference at virtual address 00000088
[ 3569.306272] pgd = dc894000
[ 3569.309287] [00000088] *pgd=00000000
[ 3569.313104] Internal error: Oops: 5 [#1] SMP ARM
[ 3569.317986] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_filter ebtable_nat ebtable_broute bridge stp llc ebtables ip6table_security ip6table_raw ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_filter ip6_tables iptable_security iptable_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle musb_dsps cppi41 musb_hdrc phy_am335x udc_core phy_generic phy_am335x_control omap_sham omap_aes omap_rng omap_hwspinlock omap_mailbox hwspinlock_core musb_am335x omap_wdt at24 8250_omap leds_gpio cpufreq_dt smsc davinci_mdio mmc_block ti_cpsw cpsw_common ptp pps_core cpsw_ale davinci_cpdma omap_hsmmc omap_dma mmc_core i2c_dev
[ 3569.386293] CPU: 0 PID: 1429 Comm: wdctl Not tainted 4.3.0-0.rc7.git0.1.fc24.armv7hl #1
[ 3569.394740] Hardware name: Generic AM33XX (Flattened Device Tree)
[ 3569.401179] task: dbd11a00 ti: dbaac000 task.ti: dbaac000
[ 3569.406917] PC is at omap_wdt_get_timeleft+0xc/0x20 [omap_wdt]
[ 3569.413106] LR is at watchdog_ioctl+0x3cc/0x42c
[ 3569.417902] pc : [<bf0ab138>]    lr : [<c0739c54>]    psr: 600f0013
[ 3569.417902] sp : dbaadf18  ip : 00000003  fp : 7f5d3bbe
[ 3569.430014] r10: 00000000  r9 : 00000003  r8 : bef21ab8
[ 3569.435535] r7 : dbbc0f7c  r6 : dbbc0f18  r5 : bef21ab8  r4 : 00000000
[ 3569.442427] r3 : 00000000  r2 : 00000000  r1 : 8004570a  r0 : dbbc0f18
[ 3569.449323] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[ 3569.456858] Control: 10c5387d  Table: 9c894019  DAC: 00000051
[ 3569.462927] Process wdctl (pid: 1429, stack limit = 0xdbaac220)
[ 3569.469179] Stack: (0xdbaadf18 to 0xdbaae000)
[ 3569.473790] df00:                                                       bef21ab8 dbf60e38
[ 3569.482441] df20: dc91b840 8004570a bef21ab8 c03988a4 dbaadf48 dc854000 00000000 dd313850
[ 3569.491092] df40: ddf033b8 0000570a dc91b80b dbaadf3c dbf60e38 00000020 c0df9250 c0df6c48
[ 3569.499741] df60: dc91b840 8004570a 00000000 dc91b840 dc91b840 8004570a bef21ab8 00000003
[ 3569.508389] df80: 00000000 c03989d4 bef21b74 7f5d3bad 00000003 00000036 c020fcc4 dbaac000
[ 3569.517037] dfa0: 00000000 c020fb00 bef21b74 7f5d3bad 00000003 8004570a bef21ab8 00000001
[ 3569.525685] dfc0: bef21b74 7f5d3bad 00000003 00000036 00000001 00000000 7f5e4eb0 7f5d3bbe
[ 3569.534334] dfe0: 7f5e4f10 bef21a3c 7f5d0a54 b6e97e0c a00f0010 00000003 00000000 00000000
[ 3569.543038] [<bf0ab138>] (omap_wdt_get_timeleft [omap_wdt]) from [<c0739c54>] (watchdog_ioctl+0x3cc/0x42c)
[ 3569.553266] [<c0739c54>] (watchdog_ioctl) from [<c03988a4>] (do_vfs_ioctl+0x5bc/0x698)
[ 3569.561648] [<c03988a4>] (do_vfs_ioctl) from [<c03989d4>] (SyS_ioctl+0x54/0x7c)
[ 3569.569400] [<c03989d4>] (SyS_ioctl) from [<c020fb00>] (ret_fast_syscall+0x0/0x3c)
[ 3569.577413] Code: e12fff1e e52de004 e8bd4000 e5903060 (e5933088)
[ 3569.584089] ---[ end trace cec3039bd3ae610a ]---
Signed-off-by: Peter Robinson <pbrobinson@gmail.com>
Acked-by: Lars Poeschel <poeschel@lemonage.de>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

efccab69

ARM: imx: add platform irq type setting in gpc · 4cca9906

Anson Huang authored Oct 20, 2015

commit 4699ccbf upstream.

GPC irq domain is a child domain of GIC, now all of platform irqs
are inside GPC domain, during the module populate, all devices irq
should have correct type setting in GIC, however, there is no
.irq_set_type callback setting in GPC, so the irq_set_type will be
skipped and cause all irqs' type in /proc/interrupt are "edge" which
mismatch with irq type setting in dtb file. Since GPC has no irq
type setting, so just tell kernel to use irq_chip_set_type_parent.
Signed-off-by: Anson Huang <Anson.Huang@freescale.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

4cca9906

blk-mq: fix calling unplug callbacks with preempt disabled · d0304c3c

Jens Axboe authored Nov 20, 2015

commit b094f89c upstream.

Liu reported that running certain parts of xfstests threw the
following error:

BUG: sleeping function called from invalid context at mm/page_alloc.c:3190
in_atomic(): 1, irqs_disabled(): 0, pid: 6, name: kworker/u16:0
3 locks held by kworker/u16:0/6:
 #0:  ("writeback"){++++.+}, at: [<ffffffff8107f083>] process_one_work+0x173/0x730
 #1:  ((&(&wb->dwork)->work)){+.+.+.}, at: [<ffffffff8107f083>] process_one_work+0x173/0x730
 #2:  (&type->s_umount_key#44){+++++.}, at: [<ffffffff811e6805>] trylock_super+0x25/0x60
CPU: 5 PID: 6 Comm: kworker/u16:0 Tainted: G           OE   4.3.0+ #3
Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
Workqueue: writeback wb_workfn (flush-btrfs-108)
 ffffffff81a3abab ffff88042e282ba8 ffffffff8130191b ffffffff81a3abab
 0000000000000c76 ffff88042e282ba8 ffff88042e27c180 ffff88042e282bd8
 ffffffff8108ed95 ffff880400000004 0000000000000000 0000000000000c76
Call Trace:
 [<ffffffff8130191b>] dump_stack+0x4f/0x74
 [<ffffffff8108ed95>] ___might_sleep+0x185/0x240
 [<ffffffff8108eea2>] __might_sleep+0x52/0x90
 [<ffffffff811817e8>] __alloc_pages_nodemask+0x268/0x410
 [<ffffffff8109a43c>] ? sched_clock_local+0x1c/0x90
 [<ffffffff8109a6d1>] ? local_clock+0x21/0x40
 [<ffffffff810b9eb0>] ? __lock_release+0x420/0x510
 [<ffffffff810b534c>] ? __lock_acquired+0x16c/0x3c0
 [<ffffffff811ca265>] alloc_pages_current+0xc5/0x210
 [<ffffffffa0577105>] ? rbio_is_full+0x55/0x70 [btrfs]
 [<ffffffff810b7ed8>] ? mark_held_locks+0x78/0xa0
 [<ffffffff81666d50>] ? _raw_spin_unlock_irqrestore+0x40/0x60
 [<ffffffffa0578c0a>] full_stripe_write+0x5a/0xc0 [btrfs]
 [<ffffffffa0578ca9>] __raid56_parity_write+0x39/0x60 [btrfs]
 [<ffffffffa0578deb>] run_plug+0x11b/0x140 [btrfs]
 [<ffffffffa0578e33>] btrfs_raid_unplug+0x23/0x70 [btrfs]
 [<ffffffff812d36c2>] blk_flush_plug_list+0x82/0x1f0
 [<ffffffff812e0349>] blk_sq_make_request+0x1f9/0x740
 [<ffffffff812ceba2>] ? generic_make_request_checks+0x222/0x7c0
 [<ffffffff812cf264>] ? blk_queue_enter+0x124/0x310
 [<ffffffff812cf1d2>] ? blk_queue_enter+0x92/0x310
 [<ffffffff812d0ae2>] generic_make_request+0x172/0x2c0
 [<ffffffff812d0ad4>] ? generic_make_request+0x164/0x2c0
 [<ffffffff812d0ca0>] submit_bio+0x70/0x140
 [<ffffffffa0577b29>] ? rbio_add_io_page+0x99/0x150 [btrfs]
 [<ffffffffa0578a89>] finish_rmw+0x4d9/0x600 [btrfs]
 [<ffffffffa0578c4c>] full_stripe_write+0x9c/0xc0 [btrfs]
 [<ffffffffa057ab7f>] raid56_parity_write+0xef/0x160 [btrfs]
 [<ffffffffa052bd83>] btrfs_map_bio+0xe3/0x2d0 [btrfs]
 [<ffffffffa04fbd6d>] btrfs_submit_bio_hook+0x8d/0x1d0 [btrfs]
 [<ffffffffa05173c4>] submit_one_bio+0x74/0xb0 [btrfs]
 [<ffffffffa0517f55>] submit_extent_page+0xe5/0x1c0 [btrfs]
 [<ffffffffa0519b18>] __extent_writepage_io+0x408/0x4c0 [btrfs]
 [<ffffffffa05179c0>] ? alloc_dummy_extent_buffer+0x140/0x140 [btrfs]
 [<ffffffffa051dc88>] __extent_writepage+0x218/0x3a0 [btrfs]
 [<ffffffff810b7ed8>] ? mark_held_locks+0x78/0xa0
 [<ffffffffa051e2c9>] extent_write_cache_pages.clone.0+0x2f9/0x400 [btrfs]
 [<ffffffffa051e422>] extent_writepages+0x52/0x70 [btrfs]
 [<ffffffffa05001f0>] ? btrfs_set_inode_index+0x70/0x70 [btrfs]
 [<ffffffffa04fcc17>] btrfs_writepages+0x27/0x30 [btrfs]
 [<ffffffff81184df3>] do_writepages+0x23/0x40
 [<ffffffff81212229>] __writeback_single_inode+0x89/0x4d0
 [<ffffffff81212a60>] ? writeback_sb_inodes+0x260/0x480
 [<ffffffff81212a60>] ? writeback_sb_inodes+0x260/0x480
 [<ffffffff8121295f>] ? writeback_sb_inodes+0x15f/0x480
 [<ffffffff81212ad2>] writeback_sb_inodes+0x2d2/0x480
 [<ffffffff810b1397>] ? down_read_trylock+0x57/0x60
 [<ffffffff811e6805>] ? trylock_super+0x25/0x60
 [<ffffffff810d629f>] ? rcu_read_lock_sched_held+0x4f/0x90
 [<ffffffff81212d0c>] __writeback_inodes_wb+0x8c/0xc0
 [<ffffffff812130b5>] wb_writeback+0x2b5/0x500
 [<ffffffff810b7ed8>] ? mark_held_locks+0x78/0xa0
 [<ffffffff810660a8>] ? __local_bh_enable_ip+0x68/0xc0
 [<ffffffff81213362>] ? wb_do_writeback+0x62/0x310
 [<ffffffff812133c1>] wb_do_writeback+0xc1/0x310
 [<ffffffff8107c3d9>] ? set_worker_desc+0x79/0x90
 [<ffffffff81213842>] wb_workfn+0x92/0x330
 [<ffffffff8107f133>] process_one_work+0x223/0x730
 [<ffffffff8107f083>] ? process_one_work+0x173/0x730
 [<ffffffff8108035f>] ? worker_thread+0x18f/0x430
 [<ffffffff810802ed>] worker_thread+0x11d/0x430
 [<ffffffff810801d0>] ? maybe_create_worker+0xf0/0xf0
 [<ffffffff810801d0>] ? maybe_create_worker+0xf0/0xf0
 [<ffffffff810858df>] kthread+0xef/0x110
 [<ffffffff8108f74e>] ? schedule_tail+0x1e/0xd0
 [<ffffffff810857f0>] ? __init_kthread_worker+0x70/0x70
 [<ffffffff816673bf>] ret_from_fork+0x3f/0x70
 [<ffffffff810857f0>] ? __init_kthread_worker+0x70/0x70

The issue is that we've got the software context pinned while
calling blk_flush_plug_list(), which flushes callbacks that
are allowed to sleep. btrfs and raid has such callbacks.

Flip the checks around a bit, so we can enable preempt a bit
earlier and flush plugs without having preempt disabled.

This only affects blk-mq driven devices, and only those that
register a single queue.
Reported-by: Liu Bo <bo.li.liu@oracle.com>
Tested-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

d0304c3c

ALSA: hda - Apply HP headphone fixups more generically · 37a6f23f

Takashi Iwai authored Nov 09, 2015

commit 196543d5 upstream.

It turned out that many HP laptops suffer from the same problem as
fixed in commit [c932b98c: ALSA: hda - Apply pin fixup for HP
ProBook 6550b].  But, it's tiresome to list up all such PCI SSIDs, as
there are really lots of HP machines.

Instead, we do a bit more clever, try to check the supposedly dock and
built-in headphone pins, and apply the fixup when both seem valid.
This rule can be applied generically to all models using the same
quirk, so we'll fix all in a shot.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=107491Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

37a6f23f

mac: validate mac_partition is within sector · 44f685e9

Kees Cook authored Nov 19, 2015

commit 02e2a5bf upstream.

If md->signature == MAC_DRIVER_MAGIC and md->block_size == 1023, a single
512 byte sector would be read (secsize / 512). However the partition
structure would be located past the end of the buffer (secsize % 512).
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

44f685e9

dm crypt: fix a possible hang due to race condition on exit · aebe441c

Mikulas Patocka authored Nov 19, 2015

commit bcbd94ff upstream.

A kernel thread executes __set_current_state(TASK_INTERRUPTIBLE),
__add_wait_queue, spin_unlock_irq and then tests kthread_should_stop().
It is possible that the processor reorders memory accesses so that
kthread_should_stop() is executed before __set_current_state().  If such
reordering happens, there is a possible race on thread termination:

CPU 0:
calls kthread_should_stop()
	it tests KTHREAD_SHOULD_STOP bit, returns false
CPU 1:
calls kthread_stop(cc->write_thread)
	sets the KTHREAD_SHOULD_STOP bit
	calls wake_up_process on the kernel thread, that sets the thread
	state to TASK_RUNNING
CPU 0:
sets __set_current_state(TASK_INTERRUPTIBLE)
spin_unlock_irq(&cc->write_thread_wait.lock)
schedule() - and the process is stuck and never terminates, because the
	state is TASK_INTERRUPTIBLE and wake_up_process on CPU 1 already
	terminated

Fix this race condition by using a new flag DM_CRYPT_EXIT_THREAD to
signal that the kernel thread should exit.  The flag is set and tested
while holding cc->write_thread_wait.lock, so there is no possibility of
racy access to the flag.

Also, remove the unnecessary set_task_state(current, TASK_RUNNING)
following the schedule() call.  When the process was woken up, its state
was already set to TASK_RUNNING.  Other kernel code also doesn't set the
state to TASK_RUNNING following schedule() (for example,
do_wait_for_common in completion.c doesn't do it).

Fixes: dc267621 ("dm crypt: offload writes to thread")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

aebe441c