Commits · 3cf02960bc4def449050e7a65802611f550e20fb · nexedi / linux

28 Oct, 2016 40 commits

Linux 4.8.5 · 3cf02960
Greg Kroah-Hartman authored Oct 28, 2016

3cf02960

Revert "target: Fix residual overflow handling in target_complete_cmd_with_length" · 7979d3c6

Nicholas Bellinger authored Oct 16, 2016

commit 61f36166 upstream.

This reverts commit c1ccbfe0.

Reverting this patch, as it incorrectly assumes the additional length
for INQUIRY in target_complete_cmd_with_length() is SCSI allocation
length, which breaks existing user-space code when SCSI allocation
length is smaller than additional length.

  root@scsi-mq:~# sg_inq --len=4 -vvvv /dev/sdb
  found bsg_major=253
  open /dev/sdb with flags=0x800
      inquiry cdb: 12 00 00 00 04 00
        duration=0 ms
      inquiry: pass-through requested 4 bytes (data-in) but got -28 bytes
      inquiry: pass-through can't get negative bytes, say it got none
      inquiry: got too few bytes (0)
  INQUIRY resid (32) should never exceed requested len=4
      inquiry: failed requesting 4 byte response: Malformed response to
               SCSI command [resid=32]

AFAICT the original change was not to address a specific host issue,
so go ahead and revert to original logic for now.

Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Sumit Rai <sumitrai96@gmail.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

7979d3c6

target: Don't override EXTENDED_COPY xcopy_pt_cmd SCSI status code · c24cc973

Dinesh Israni authored Oct 10, 2016

commit 926317de upstream.

This patch addresses a bug where a local EXTENDED_COPY WRITE or READ
backend I/O request would always return SAM_STAT_CHECK_CONDITION,
even if underlying xcopy_pt_cmd->se_cmd generated a different
SCSI status code.

ESX host environments expect to hit SAM_STAT_RESERVATION_CONFLICT
for certain scenarios, and SAM_STAT_CHECK_CONDITION results in
non-retriable status for these cases.

Tested on v4.1.y with ESX v5.5u2+ with local IBLOCK backend copy.
Reported-by: Nixon Vincent <nixon.vincent@calsoftinc.com>
Tested-by: Nixon Vincent <nixon.vincent@calsoftinc.com>
Cc: Nixon Vincent <nixon.vincent@calsoftinc.com>
Tested-by: Dinesh Israni <ddi@datera.io>
Signed-off-by: Dinesh Israni <ddi@datera.io>
Cc: Dinesh Israni <ddi@datera.io>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

c24cc973

target: Make EXTENDED_COPY 0xe4 failure return COPY TARGET DEVICE NOT REACHABLE · 1f0b8fc2

Nicholas Bellinger authored Oct 08, 2016

commit 449a1378 upstream.

This patch addresses a bug where EXTENDED_COPY across multiple LUNs
results in a CHECK_CONDITION when the source + destination are not
located on the same physical node.

ESX Host environments expect sense COPY_ABORTED w/ COPY TARGET DEVICE
NOT REACHABLE to be returned when this occurs, in order to signal
fallback to local copy method.

As described in section 6.3.3 of spc4r22:

  "If it is not possible to complete processing of a segment because the
   copy manager is unable to establish communications with a copy target
   device, because the copy target device does not respond to INQUIRY,
   or because the data returned in response to INQUIRY indicates
   an unsupported logical unit, then the EXTENDED COPY command shall be
   terminated with CHECK CONDITION status, with the sense key set to
   COPY ABORTED, and the additional sense code set to COPY TARGET DEVICE
   NOT REACHABLE."

Tested on v4.1.y with ESX v5.5u2+ with BlockCopy across multiple nodes.
Reported-by: Nixon Vincent <nixon.vincent@calsoftinc.com>
Tested-by: Nixon Vincent <nixon.vincent@calsoftinc.com>
Cc: Nixon Vincent <nixon.vincent@calsoftinc.com>
Tested-by: Dinesh Israni <ddi@datera.io>
Signed-off-by: Dinesh Israni <ddi@datera.io>
Cc: Dinesh Israni <ddi@datera.io>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

1f0b8fc2

target: Re-add missing SCF_ACK_KREF assignment in v4.1.y · cb5fbb20

Nicholas Bellinger authored Oct 04, 2016

commit 527268df upstream.

This patch fixes a regression in >= v4.1.y code where the original
SCF_ACK_KREF assignment in target_get_sess_cmd() was dropped upstream
in commit 054922bb, but the series for addressing TMR ABORT_TASK +
LUN_RESET with fabric session reinstatement in commit febe562c still
depends on this code in transport_cmd_finish_abort().

The regression manifests itself as a se_cmd->cmd_kref +1 leak, where
ABORT_TASK + LUN_RESET can hang indefinately for a specific I_T session
for drivers using SCF_ACK_KREF, resulting in hung kthreads.

This patch has been verified with v4.1.y code.
Reported-by: Vaibhav Tandon <vst@datera.io>
Tested-by: Vaibhav Tandon <vst@datera.io>
Cc: Vaibhav Tandon <vst@datera.io>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cb5fbb20

target/tcm_fc: use CPU affinity for responses · 57339009

Hannes Reinecke authored Aug 22, 2016

commit 1ba0158f upstream.

The libfc stack assigns exchange IDs based on the CPU the request
was received on, so we need to send the responses via the same CPU.
Otherwise the send logic gets confuses and responses will be delayed,
causing exchange timeouts on the initiator side.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

57339009

ubifs: Fix xattr_names length in exit paths · e435e0e3

Richard Weinberger authored Sep 20, 2016

commit 843741c5 upstream.

When the operation fails we also have to undo the changes
we made to ->xattr_names. Otherwise listxattr() will report
wrong lengths.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

e435e0e3

jbd2: fix incorrect unlock on j_list_lock · 58f620cf

Taesoo Kim authored Oct 12, 2016

commit 559cce69 upstream.

When 'jh->b_transaction == transaction' (asserted by below)

  J_ASSERT_JH(jh, (jh->b_transaction == transaction || ...

'journal->j_list_lock' will be incorrectly unlocked, since
the the lock is aquired only at the end of if / else-if
statements (missing the else case).
Signed-off-by: Taesoo Kim <tsgatesv@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Fixes: 6e4862a5Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

58f620cf

ext4: do not advertise encryption support when disabled · 224c1a19

Eric Biggers authored Oct 12, 2016

commit c4704a4f upstream.

The sysfs file /sys/fs/ext4/features/encryption was present on kernels
compiled with CONFIG_EXT4_FS_ENCRYPTION=n.  This was misleading because
such kernels do not actually support ext4 encryption.  Therefore, only
provide this file on kernels compiled with CONFIG_EXT4_FS_ENCRYPTION=y.

Note: since the ext4 feature files are all hardcoded to have a contents
of "supported", it really is the presence or absence of the file that is
significant, not the contents (and this change reflects that).
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

224c1a19

fscrypto: lock inode while setting encryption policy · deb197e9

Eric Biggers authored Oct 15, 2016

commit 8906a822 upstream.

i_rwsem needs to be acquired while setting an encryption policy so that
concurrent calls to FS_IOC_SET_ENCRYPTION_POLICY are correctly
serialized (especially the ->get_context() + ->set_context() pair), and
so that new files cannot be created in the directory during or after the
->empty_dir() check.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

deb197e9

fscrypto: make XTS tweak initialization endian-independent · 8b722a45

Eric Biggers authored Oct 12, 2016

commit fb445437 upstream.

The XTS tweak (or IV) was initialized differently on little endian and
big endian systems.  Because the ciphertext depends on the XTS tweak, it
was not possible to use an encrypted filesystem created by a little
endian system on a big endian system and vice versa, even if they shared
the same PAGE_SIZE.  Fix this by always using little endian.

This will break hypothetical big endian users of ext4 or f2fs
encryption.  However, all users we are aware of are little endian, and
it's believed that "real" big endian users are unlikely to exist yet.
So this might as well be fixed now before it's too late.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

8b722a45

KVM: s390: reject invalid modes for runtime instrumentation · 628ab14b

Christian Borntraeger authored Sep 28, 2016

commit a5efb6b6 upstream.

Usually a validity intercept is a programming error of the host
because of invalid entries in the state description.
We can get a validity intercept if the mode of the runtime
instrumentation control block is wrong. As the host does not know
which modes are valid, this can be used by userspace to trigger
a WARN.
Instead of printing a WARN let's return an error to userspace as
this can only happen if userspace provides a malformed initial
value (e.g. on migration). The kernel should never warn on bogus
input. Instead let's log it into the s390 debug feature.

While at it, let's return -EINVAL for all validity intercepts as
this will trigger an error in QEMU like

error: kvm run failed Invalid argument
PSW=mask 0404c00180000000 addr 000000000063c226 cc 00
R00=000000000000004f R01=0000000000000004 R02=0000000000760005 R03=000000007fe0a000
R04=000000000064ba2a R05=000000049db73dd0 R06=000000000082c4b0 R07=0000000000000041
R08=0000000000000002 R09=000003e0804042a8 R10=0000000496152c42 R11=000000007fe0afb0
[...]

This will avoid an endless loop of validity intercepts.

Fixes: c6e5f166 ("KVM: s390: implement the RI support of guest")
Acked-by: Fan Zhang <zhangfan@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

628ab14b

mmc: rtsx_usb_sdmmc: Handle runtime PM while changing the led · 9d74d0b0

Ulf Hansson authored Sep 15, 2016

commit 4f48aa7a upstream.

Accesses of the rtsx sdmmc's parent device, which is the rtsx usb device,
must be done when it's runtime resumed. Currently this isn't case when
changing the led, so let's fix this by adding a pm_runtime_get_sync() and
a pm_runtime_put() around those operations.
Reported-by: Ritesh Raj Sarraf <rrs@researchut.com>
Tested-by: Ritesh Raj Sarraf <rrs@researchut.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

9d74d0b0

mmc: rtsx_usb_sdmmc: Avoid keeping the device runtime resumed when unused · 2b545660

Ulf Hansson authored Sep 27, 2016

commit 31cf742f upstream.

The rtsx_usb_sdmmc driver may bail out in its ->set_ios() callback when no
SD card is inserted. This is wrong, as it could cause the device to remain
runtime resumed when it's unused. Fix this behaviour.
Tested-by: Ritesh Raj Sarraf <rrs@researchut.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

2b545660

mmc: core: switch to 1V8 or 1V2 for hs400es mode · 509424fd

Shawn Lin authored Sep 30, 2016

commit 1720d354 upstream.

When introducing hs400es, I didn't notice that we haven't
switched voltage to 1V2 or 1V8 for it. That happens to work
as the first controller claiming to support hs400es, arasan(5.1),
which is designed to only support 1V8. So the voltage is fixed to 1V8.
But it actually is wrong, and will not fit for other host controllers.
Let's fix it.

Fixes: commit 81ac2af6 ("mmc: core: implement enhanced strobe support")
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

509424fd

mmc: core: Annotate cmd_hdr as __le32 · 8573b7f6

Jiri Slaby authored Oct 03, 2016

commit 3f2d2664 upstream.

Commit f68381a7 (mmc: block: fix packed command header endianness)
correctly fixed endianness handling of packed_cmd_hdr in
mmc_blk_packed_hdr_wrq_prep.

But now, sparse complains about incorrect types:
drivers/mmc/card/block.c:1613:27: sparse: incorrect type in assignment (different base types)
drivers/mmc/card/block.c:1613:27:    expected unsigned int [unsigned] [usertype] <noident>
drivers/mmc/card/block.c:1613:27:    got restricted __le32 [usertype] <noident>
...

So annotate cmd_hdr properly using __le32 to make everyone happy.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Fixes: f68381a7 (mmc: block: fix packed command header endianness)
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

8573b7f6

powerpc/mm: Prevent unlikely crash in copro_calculate_slb() · 389db4fe

Frederic Barrat authored Jun 17, 2016

commit d2cf909c upstream.

If a cxl adapter faults on an invalid address for a kernel context, we
may enter copro_calculate_slb() with a NULL mm pointer (kernel
context) and an effective address which looks like a user
address. Which will cause a crash when dereferencing mm. It is clearly
an AFU bug, but there's no reason to crash either. So return an error,
so that cxl can ack the interrupt with an address error.

Fixes: 73d16a6e ("powerpc/cell: Move data segment faulting code out of cell platform")
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

389db4fe

ceph: fix error handling in ceph_read_iter · d3f1ec54

Nikolay Borisov authored Oct 10, 2016

commit 0d7718f6 upstream.

In case __ceph_do_getattr returns an error and the retry_op in
ceph_read_iter is not READ_INLINE, then it's possible to invoke
__free_page on a page which is NULL, this naturally leads to a crash.
This can happen when, for example, a process waiting on a MDS reply
receives sigterm.

Fix this by explicitly checking whether the page is set or not.
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

d3f1ec54

arm64: KVM: Take S1 walks into account when determining S2 write faults · d08622e1

Will Deacon authored Sep 29, 2016

commit 60e21a0e upstream.

The WnR bit in the HSR/ESR_EL2 indicates whether a data abort was
generated by a read or a write instruction. For stage 2 data aborts
generated by a stage 1 translation table walk (i.e. the actual page
table access faults at EL2), the WnR bit therefore reports whether the
instruction generating the walk was a load or a store, *not* whether the
page table walker was reading or writing the entry.

For page tables marked as read-only at stage 2 (e.g. due to KSM merging
them with the tables from another guest), this could result in livelock,
where a page table walk generated by a load instruction attempts to
set the access flag in the stage 1 descriptor, but fails to trigger
CoW in the host since only a read fault is reported.

This patch modifies the arm64 kvm_vcpu_dabt_iswrite function to
take into account stage 2 faults in stage 1 walks. Since DBM cannot be
disabled at EL2 for CPUs that implement it, we assume that these faults
are always causes by writes, avoiding the livelock situation at the
expense of occasional, spurious CoWs.

We could, in theory, do a bit better by checking the guest TCR
configuration and inspecting the page table to see why the PTE faulted.
However, I doubt this is measurable in practice, and the threat of
livelock is real.

Cc: Julien Grall <julien.grall@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

d08622e1

arm64: Cortex-A53 errata workaround: check for kernel addresses · ac591c10

Andre Przywara authored Oct 19, 2016

commit 87261d19 upstream.

Commit 7dd01aef ("arm64: trap userspace "dc cvau" cache operation on
errata-affected core") adds code to execute cache maintenance instructions
in the kernel on behalf of userland on CPUs with certain ARM CPU errata.
It turns out that the address hasn't been checked to be a valid user
space address, allowing userland to clean cache lines in kernel space.
Fix this by introducing an address check before executing the
instructions on behalf of userland.

Since the address doesn't come via a syscall parameter, we can't just
reject tagged pointers and instead have to remove the tag when checking
against the user address limit.

Fixes: 7dd01aef ("arm64: trap userspace "dc cvau" cache operation on errata-affected core")
Reported-by: Kristina Martsenko <kristina.martsenko@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
[will: rework commit message + replace access_ok with max_user_addr()]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ac591c10

arm64: kernel: Init MDCR_EL2 even in the absence of a PMU · 32c0a66f

Marc Zyngier authored Oct 17, 2016

commit 85054035 upstream.

Commit f436b2ac ("arm64: kernel: fix architected PMU registers
unconditional access") made sure we wouldn't access unimplemented
PMU registers, but also left MDCR_EL2 uninitialized in that case,
leading to trap bits being potentially left set.

Make sure we always write something in that register.

Fixes: f436b2ac ("arm64: kernel: fix architected PMU registers unconditional access")
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

32c0a66f

arm64: percpu: rewrite ll/sc loops in assembly · 69e1b2d4

Will Deacon authored Jul 04, 2016

commit 1e6e57d9 upstream.

Writing the outer loop of an LL/SC sequence using do {...} while
constructs potentially allows the compiler to hoist memory accesses
between the STXR and the branch back to the LDXR. On CPUs that do not
guarantee forward progress of LL/SC loops when faced with memory
accesses to the same ERG (up to 2k) between the failed STXR and the
branch back, we may end up livelocking.

This patch avoids this issue in our percpu atomics by rewriting the
outer loop as part of the LL/SC inline assembly block.

Fixes: f97fc810 ("arm64: percpu: Implement this_cpu operations")
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

69e1b2d4

arm64: kaslr: fix breakage with CONFIG_MODVERSIONS=y · f1816cea

Ard Biesheuvel authored Oct 13, 2016

commit 9c0e83c3 upstream.

As it turns out, the KASLR code breaks CONFIG_MODVERSIONS, since the
kcrctab has an absolute address field that is relocated at runtime
when the kernel offset is randomized.

This has been fixed already for PowerPC in the past, so simply wire up
the existing code dealing with this issue.

Fixes: f80fb3a3 ("arm64: add support for kernel ASLR")
Tested-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f1816cea

arm64: swp emulation: bound LL/SC retries before rescheduling · 44870b14

Will Deacon authored Jul 04, 2016

commit 1c5b51df upstream.

If a CPU does not implement a global monitor for certain memory types,
then userspace can attempt a kernel DoS by issuing SWP instructions
targetting the problematic memory (for example, a framebuffer mapped
with non-cacheable attributes).

The SWP emulation code protects against these sorts of attacks by
checking for pending signals and potentially rescheduling when the STXR
instruction fails during the emulation. Whilst this is good for avoiding
livelock, it harms emulation of legitimate SWP instructions on CPUs
where forward progress is not guaranteed if there are memory accesses to
the same reservation granule (up to 2k) between the failing STXR and
the retry of the LDXR.

This patch solves the problem by retrying the STXR a bounded number of
times (4) before breaking out of the LL/SC loop and looking for
something else to do.

Fixes: bd35a4ad ("arm64: Port SWP/SWPB emulation support from arm")
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

44870b14

memstick: rtsx_usb_ms: Manage runtime PM when accessing the device · 5e09db27

Ulf Hansson authored Sep 28, 2016

commit 9158cb29 upstream.

Accesses to the rtsx usb device, which is the parent of the rtsx memstick
device, must not be done unless it's runtime resumed. This is currently not
the case and it could trigger various errors.

Fix this by properly deal with runtime PM in this regards. This means
making sure the device is runtime resumed, when serving requests via the
->request() callback or changing settings via the ->set_param() callbacks.

Cc: Ritesh Raj Sarraf <rrs@researchut.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

5e09db27

memstick: rtsx_usb_ms: Runtime resume the device when polling for cards · 959288b7

Alan Stern authored Sep 26, 2016

commit 796aa46a upstream.

Accesses to the rtsx usb device, which is the parent of the rtsx memstick
device, must not be done unless it's runtime resumed.

Therefore when the rtsx_usb_ms driver polls for inserted memstick cards,
let's add pm_runtime_get|put*() to make sure accesses is done when the
rtsx usb device is runtime resumed.
Reported-by: Ritesh Raj Sarraf <rrs@researchut.com>
Tested-by: Ritesh Raj Sarraf <rrs@researchut.com>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

959288b7

isofs: Do not return EACCES for unknown filesystems · d0ee9157

Jan Kara authored Oct 04, 2016

commit a2ed0b39 upstream.

When isofs_mount() is called to mount a device read-write, it returns
EACCES even before it checks that the device actually contains an isofs
filesystem. This may confuse mount(8) which then tries to mount all
subsequent filesystem types in read-only mode.

Fix the problem by returning EACCES only once we verify that the device
indeed contains an iso9660 filesystem.

Fixes: 17b7f7cfReported-by: Kent Overstreet <kent.overstreet@gmail.com>
Reported-by: Karel Zak <kzak@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

d0ee9157

cxl: Prevent adapter reset if an active context exists · 627da039

Vaibhav Jain authored Oct 14, 2016

commit 70b565bb upstream.

This patch prevents resetting the cxl adapter via sysfs in presence of
one or more active cxl_context on it. This protects against an
unrecoverable error caused by PSL owning a dirty cache line even after
reset and host tries to touch the same cache line. In case a force reset
of the card is required irrespective of any active contexts, the int
value -1 can be stored in the 'reset' sysfs attribute of the card.

The patch introduces a new atomic_t member named contexts_num inside
struct cxl that holds the number of active context attached to the card
, which is checked against '0' before proceeding with the reset. To
prevent against a race condition where a context is activated just after
reset check is performed, the contexts_num is atomically set to '-1'
after reset-check to indicate that no more contexts can be activated on
the card anymore.

Before activating a context we atomically test if contexts_num is
non-negative and if so, increment its value by one. In case the value of
contexts_num is negative then it indicates that the card is about to be
reset and context activation is error-ed out at that point.

Fixes: 62fa19d4 ("cxl: Add ability to reset the card")
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

627da039

irqchip/gic-v3-its: Fix entry size mask for GITS_BASER · ee13e8a1

Vladimir Murzin authored Oct 17, 2016

commit 9224eb77 upstream.

Entry Size in GITS_BASER<n> occupies 5 bits [52:48], but we mask out 8
bits.

Fixes: cc2d3216 ("irqchip: GICv3: ITS command queue")
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ee13e8a1

irqchip/eznps: Acknowledge NPS_IPI before calling the handler · eff4f765

Noam Camus authored Oct 13, 2016

commit c0ca8df7 upstream.

IPI_IRQ (also TIMER0_IRQ) should be acked before the action->handler is called
in handle_percpu_devid_irq.

The IPI irq is edge sensitive and we might miss an IPI interrupt if it is
triggered again while the handler runs.

Fixes: 44df427c ("irqchip: add nps Internal and external irqchips")
Signed-off-by: Noam Camus <noamca@mellanox.com>
Cc: marc.zyngier@arm.com
Cc: jason@lakedaemon.net
Link: http://lkml.kernel.org/r/1476364532-12634-1-git-send-email-noamca@mellanox.comSigned-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eff4f765

irqchip/gicv3: Handle loop timeout proper · 82301da7

Dan Carpenter authored Oct 14, 2016

commit d102eb5c upstream.

The timeout loop terminates when the loop count is zero, but the decrement
of the count variable is post check. So count is -1 when we check for the
timeout and therefor the error message is supressed.

Change it to predecrement, so the error message is emitted.

[ tglx: Massaged changelog ]

Fixes: a2c22510 ("irqchip: gic-v3: Refactor gic_enable_redist to support both enabling and disabling")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: kernel-janitors@vger.kernel.org
Cc: Jason Cooper <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/20161014072534.GA15168@mwandaSigned-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

82301da7

sched/fair: Fix min_vruntime tracking · 2bb293cb

Peter Zijlstra authored Sep 20, 2016

commit b60205c7 upstream.

While going through enqueue/dequeue to review the movement of
set_curr_task() I noticed that the (2nd) update_min_vruntime() call in
dequeue_entity() is suspect.

It turns out, its actually wrong because it will consider
cfs_rq->curr, which could be the entry we just normalized. This mixes
different vruntime forms and leads to fail.

The purpose of the second update_min_vruntime() is to move
min_vruntime forward if the entity we just removed is the one that was
holding it back; _except_ for the DEQUEUE_SAVE case, because then we
know its a temporary removal and it will come back.

However, since we do put_prev_task() _after_ dequeue(), cfs_rq->curr
will still be set (and per the above, can be tranformed into a
different unit), so update_min_vruntime() should also consider
curr->on_rq. This also fixes another corner case where the enqueue
(which also does update_curr()->update_min_vruntime()) happens on the
rq->lock break in schedule(), between dequeue and put_prev_task.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Fixes: 1e876231 ("sched: Fix ->min_vruntime calculation in dequeue_entity()")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

2bb293cb

sched/fair: Fix incorrect task group ->load_avg · f6ba2614

Vincent Guittot authored Oct 19, 2016

commit b5a9b340 upstream.

A scheduler performance regression has been reported by Joseph Salisbury,
which he bisected back to:

  3d30544f ("sched/fair: Apply more PELT fixes)

The regression triggers when several levels of task groups are involved
(read: SystemD) and cpu_possible_mask != cpu_present_mask.

The root cause is that group entity's load (tg_child->se[i]->avg.load_avg)
is initialized to scale_load_down(se->load.weight). During the creation of
a child task group, its group entities on possible CPUs are attached to
parent's cfs_rq (tg_parent) and their loads are added to the parent's load
(tg_parent->load_avg) with update_tg_load_avg().

But only the load on online CPUs will then be updated to reflect real load,
whereas load on other CPUs will stay at the initial value.

The result is a tg_parent->load_avg that is higher than the real load, the
weight of group entities (tg_parent->se[i]->load.weight) on online CPUs is
smaller than it should be, and the task group gets a less running time than
what it could expect.

( This situation can be detected with /proc/sched_debug. The ".tg_load_avg"
  of the task group will be much higher than sum of ".tg_load_avg_contrib"
  of online cfs_rqs of the task group. )

The load of group entities don't have to be intialized to something else
than 0 because their load will increase when an entity is attached.
Reported-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: joonwoop@codeaurora.org
Fixes: 3d30544f ("sched/fair: Apply more PELT fixes)
Link: http://lkml.kernel.org/r/1476881123-10159-1-git-send-email-vincent.guittot@linaro.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f6ba2614

pinctrl: baytrail: Fix lockdep · cbc82f17

Ville Syrjälä authored Oct 03, 2016

commit a171bc51 upstream.

Initialize the spinlock before using it.

INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.8.0-dwc-bisect #4
Hardware name: Intel Corp. VALLEYVIEW C0 PLATFORM/BYT-T FFD8, BIOS BLAKFF81.X64.0088.R10.1403240443 FFD8_X64_R_2014_13_1_00 03/24/2014
 0000000000000000 ffff8800788ff770 ffffffff8133d597 0000000000000000
 0000000000000000 ffff8800788ff7e0 ffffffff810cfb9e 0000000000000002
 ffff8800788ff7d0 ffffffff8205b600 0000000000000002 ffff8800788ff7f0
Call Trace:
 [<ffffffff8133d597>] dump_stack+0x67/0x90
 [<ffffffff810cfb9e>] register_lock_class+0x52e/0x540
 [<ffffffff810d2081>] __lock_acquire+0x81/0x16b0
 [<ffffffff810cede1>] ? save_trace+0x41/0xd0
 [<ffffffff810d33b2>] ? __lock_acquire+0x13b2/0x16b0
 [<ffffffff810cf05a>] ? __lock_is_held+0x4a/0x70
 [<ffffffff810d3b1a>] lock_acquire+0xba/0x220
 [<ffffffff8136f1fe>] ? byt_gpio_get_direction+0x3e/0x80
 [<ffffffff81631567>] _raw_spin_lock_irqsave+0x47/0x60
 [<ffffffff8136f1fe>] ? byt_gpio_get_direction+0x3e/0x80
 [<ffffffff8136f1fe>] byt_gpio_get_direction+0x3e/0x80
 [<ffffffff813740a9>] gpiochip_add_data+0x319/0x7d0
 [<ffffffff81631723>] ? _raw_spin_unlock_irqrestore+0x43/0x70
 [<ffffffff8136fe3b>] byt_pinctrl_probe+0x2fb/0x620
 [<ffffffff8142fb0c>] platform_drv_probe+0x3c/0xa0
...

Based on the diff it looks like the problem was introduced in
commit 71e6ca61 ("pinctrl: baytrail: Register pin control handling")
but I wasn't able to verify that empirically as the parent commit
just oopsed when I tried to boot it.

Cc: Cristina Ciocan <cristina.ciocan@intel.com>
Fixes: 71e6ca61 ("pinctrl: baytrail: Register pin control handling")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cbc82f17

pinctrl: intel: Only restore pins that are used by the driver · 2e141f62

Mika Westerberg authored Oct 10, 2016

commit c538b943 upstream.

Dell XPS 13 (and maybe some others) uses a GPIO (CPU_GP_1) during suspend
to explicitly disable USB touchscreen interrupt. This is done to prevent
situation where the lid is closed the touchscreen is left functional.

The pinctrl driver (wrongly) assumes it owns all pins which are owned by
host and not locked down. It is perfectly fine for BIOS to use those pins
as it is also considered as host in this context.

What happens is that when the lid of Dell XPS 13 is closed, the BIOS
configures CPU_GP_1 low disabling the touchscreen interrupt. During resume
we restore all host owned pins to the known state which includes CPU_GP_1
and this overwrites what the BIOS has programmed there causing the
touchscreen to fail as no interrupts are reaching the CPU anymore.

Fix this by restoring only those pins we know are explicitly requested by
the kernel one way or other.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=176361Reported-by: AceLan Kao <acelan.kao@canonical.com>
Tested-by: AceLan Kao <acelan.kao@canonical.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

2e141f62

x86/boot/smp: Don't try to poke disabled/non-existent APIC · 85533a1a

Ville Syrjälä authored Oct 22, 2016

commit ff856051 upstream.

Apparently trying to poke a disabled or non-existent APIC
leads to a box that doesn't even boot. Let's not do that.

No real clue if this is the right fix, but at least my
P3 machine boots again.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: dyoung@redhat.com
Cc: kexec@lists.infradead.org
Fixes: 2a51fe08 ("arch/x86: Handle non enumerated CPU after physical hotplug")
Link: http://lkml.kernel.org/r/1477102684-5092-1-git-send-email-ville.syrjala@linux.intel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

85533a1a

x86/platform/UV: Fix support for EFI_OLD_MEMMAP after BIOS callback updates · 5f874b4b

Alex Thorlton authored Oct 19, 2016

commit caef78b6 upstream.

Some time ago, we brought our UV BIOS callback code up to speed with the
new EFI memory mapping scheme, in commit:

    d1be84a2 ("x86/uv: Update uv_bios_call() to use efi_call_virt_pointer()")

By leveraging some changes that I made to a few of the EFI runtime
callback mechanisms, in commit:

    80e75596 ("efi: Convert efi_call_virt() to efi_call_virt_pointer()")

This got everything running smoothly on UV, with the new EFI mapping
code.  However, this left one, small loose end, in that EFI_OLD_MEMMAP
(a.k.a. efi=old_map) will no longer work on UV, on kernels that include
the aforementioned changes.

At the time this was not a major issue (in fact, it still really isn't),
but there's no reason that EFI_OLD_MEMMAP *shouldn't* work on our
systems.  This commit adds a check into uv_bios_call(), to see if we have
the EFI_OLD_MEMMAP bit set in efi.flags.  If it is set, we fall back to
using our old callback method, which uses efi_call() directly on the __va()
of our function pointer.
Signed-off-by: Alex Thorlton <athorlton@sgi.com>
Acked-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1476928131-170101-1-git-send-email-athorlton@sgi.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

5f874b4b

kvm: x86: memset whole irq_eoi · 05574eae

Jiri Slaby authored Oct 13, 2016

commit 8678654e upstream.

gcc 7 warns:
arch/x86/kvm/ioapic.c: In function 'kvm_ioapic_reset':
arch/x86/kvm/ioapic.c:597:2: warning: 'memset' used with length equal to number of elements without multiplication by element size [-Wmemset-elt-size]

And it is right. Memset whole array using sizeof operator.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
[Added x86 subject tag]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

05574eae

x86/e820: Don't merge consecutive E820_PRAM ranges · 9c50927f

Dan Williams authored Oct 12, 2016

commit 23446cb6 upstream.

Commit:

  917db484 ("x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulation")

... fixed up the broken manipulations of max_pfn in the presence of
E820_PRAM ranges.

However, it also broke the sanitize_e820_map() support for not merging
E820_PRAM ranges.

Re-introduce the enabling to keep resource boundaries between
consecutive defined ranges. Otherwise, for example, an environment that
boots with memmap=2G!8G,2G!10G will end up with a single 4G /dev/pmem0
device instead of a /dev/pmem0 and /dev/pmem1 device 2G in size.
Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Zhang Yi <yizhan@redhat.com>
Cc: linux-nvdimm@lists.01.org
Fixes: 917db484 ("x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulation")
Link: http://lkml.kernel.org/r/147629530854.10618.10383744751594021268.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

9c50927f

blkcg: Unlock blkcg_pol_mutex only once when cpd == NULL · f974fb26

Bart Van Assche authored Sep 29, 2016

commit bbb427e3 upstream.

Unlocking a mutex twice is wrong. Hence modify blkcg_policy_register()
such that blkcg_pol_mutex is unlocked once if cpd == NULL. This patch
avoids that smatch reports the following error:

block/blk-cgroup.c:1378: blkcg_policy_register() error: double unlock 'mutex:&blkcg_pol_mutex'

Fixes: 06b285bd ("blkcg: fix blkcg_policy_data allocation bug")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f974fb26