Commits · 10257d719686706aa669b348309cfd9fd9783ad9 · nexedi / linux

13 Feb, 2017 3 commits

EXPORT_SYMBOL radix_tree_replace_slot · 10257d71

Song Liu authored Jan 11, 2017

It will be used in drivers/md/raid5-cache.c
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>

10257d71

raid5: only dispatch IO from raid5d for harddisk raid · 765d704d

Shaohua Li authored Jan 04, 2017

We made raid5 stripe handling multi-thread before. It works well for
SSD. But for harddisk, the multi-threading creates more disk seek, so
not always improve performance. For several hard disks based raid5,
multi-threading is required as raid5d becames a bottleneck especially
for sequential write.

To overcome the disk seek issue, we only dispatch IO from raid5d if the
array is harddisk based. Other threads can still handle stripes, but
can't dispatch IO.

Idealy, we should control IO dispatching order according to IO position
interrnally. Right now we still depend on block layer, which isn't very
efficient sometimes though.

My setup has 9 harddisks, each disk can do around 180M/s sequential
write. So in theory, the raid5 can do 180 * 8 = 1440M/s sequential
write. The test machine uses an ATOM CPU. I measure sequential write
with large iodepth bandwidth to raid array:

without patch: ~600M/s
without patch and group_thread_cnt=4: 750M/s
with patch and group_thread_cnt=4: 950M/s
with patch, group_thread_cnt=4, skip_copy=1: 1150M/s

We are pretty close to the maximum bandwidth in the large iodepth
iodepth case. The performance gap of small iodepth sequential write
between software raid and theory value is still very big though, because
we don't have an efficient pipeline.

Cc: NeilBrown <neilb@suse.com>
Cc: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>

765d704d

md linear: fix a race between linear_add() and linear_congested() · 03a9e24e

colyli@suse.de authored Jan 28, 2017

Recently I receive a bug report that on Linux v3.0 based kerenl, hot add
disk to a md linear device causes kernel crash at linear_congested(). From
the crash image analysis, I find in linear_congested(), mddev->raid_disks
contains value N, but conf->disks[] only has N-1 pointers available. Then
a NULL pointer deference crashes the kernel.

There is a race between linear_add() and linear_congested(), RCU stuffs
used in these two functions cannot avoid the race. Since Linuv v4.0
RCU code is replaced by introducing mddev_suspend().  After checking the
upstream code, it seems linear_congested() is not called in
generic_make_request() code patch, so mddev_suspend() cannot provent it
from being called. The possible race still exists.

Here I explain how the race still exists in current code.  For a machine
has many CPUs, on one CPU, linear_add() is called to add a hard disk to a
md linear device; at the same time on other CPU, linear_congested() is
called to detect whether this md linear device is congested before issuing
an I/O request onto it.

Now I use a possible code execution time sequence to demo how the possible
race happens,

seq    linear_add()                linear_congested()
 0                                 conf=mddev->private
 1   oldconf=mddev->private
 2   mddev->raid_disks++
 3                              for (i=0; i<mddev->raid_disks;i++)
 4                                bdev_get_queue(conf->disks[i].rdev->bdev)
 5   mddev->private=newconf

In linear_add() mddev->raid_disks is increased in time seq 2, and on
another CPU in linear_congested() the for-loop iterates conf->disks[i] by
the increased mddev->raid_disks in time seq 3,4. But conf with one more
element (which is a pointer to struct dev_info type) to conf->disks[] is
not updated yet, accessing its structure member in time seq 4 will cause a
NULL pointer deference fault.

To fix this race, there are 2 parts of modification in the patch,
 1) Add 'int raid_disks' in struct linear_conf, as a copy of
    mddev->raid_disks. It is initialized in linear_conf(), always being
    consistent with pointers number of 'struct dev_info disks[]'. When
    iterating conf->disks[] in linear_congested(), use conf->raid_disks to
    replace mddev->raid_disks in the for-loop, then NULL pointer deference
    will not happen again.
 2) RCU stuffs are back again, and use kfree_rcu() in linear_add() to
    free oldconf memory. Because oldconf may be referenced as mddev->private
    in linear_congested(), kfree_rcu() makes sure that its memory will not
    be released until no one uses it any more.
Also some code comments are added in this patch, to make this modification
to be easier understandable.

This patch can be applied for kernels since v4.0 after commit:
3be260cc ("md/linear: remove rcu protections in favour of
suspend/resume"). But this bug is reported on Linux v3.0 based kernel, for
people who maintain kernels before Linux v4.0, they need to do some back
back port to this patch.

Changelog:
 - V3: add 'int raid_disks' in struct linear_conf, and use kfree_rcu() to
       replace rcu_call() in linear_add().
 - v2: add RCU stuffs by suggestion from Shaohua and Neil.
 - v1: initial effort.
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Shaohua Li <shli@fb.com>
Cc: Neil Brown <neilb@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: Shaohua Li <shli@fb.com>

03a9e24e

12 Feb, 2017 1 commit
- Linux 4.10-rc8 · 7089db84
  Linus Torvalds authored Feb 12, 2017
  
  7089db84
11 Feb, 2017 8 commits

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1ce42845

Linus Torvalds authored Feb 11, 2017

Pull x86 fixes from Ingo Molnar:
 "Last minute x86 fixes:

   - Fix a softlockup detector warning and long delays if using ptdump
     with KASAN enabled.

   - Two more TSC-adjust fixes for interesting firmware interactions.

   - Two commits to fix an AMD CPU topology enumeration bug that caused
     a measurable gaming performance regression"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm/ptdump: Fix soft lockup in page table walker
  x86/tsc: Make the TSC ADJUST sanitizing work for tsc_reliable
  x86/tsc: Avoid the large time jump when sanitizing TSC ADJUST
  x86/CPU/AMD: Fix Zen SMT topology
  x86/CPU/AMD: Bring back Compute Unit ID

1ce42845

Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · fdb0ee7c

Linus Torvalds authored Feb 11, 2017

Pull timer fix from Ingo Molnar:
 "Fix a sporadic missed timer hw reprogramming bug that can result in
  random delays"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tick/nohz: Fix possible missing clock reprog after tick soft restart

fdb0ee7c

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d5b76bef

Linus Torvalds authored Feb 11, 2017

Pull perf fixes from Ingo Molnar:
 "A kernel crash fix plus three tooling fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Fix crash in perf_event_read()
  perf callchain: Reference count maps
  perf diff: Fix -o/--order option behavior (again)
  perf diff: Fix segfault on 'perf diff -o N' option

d5b76bef

Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4e4f74a7

Linus Torvalds authored Feb 11, 2017

Pull lockdep fix from Ingo Molnar:
 "This fixes an ugly lockdep stack trace output regression. (But also
  affects other stacktrace users such as kmemleak, KASAN, etc)"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  stacktrace, lockdep: Fix address, newline ugliness

4e4f74a7

Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 21a7061c

Linus Torvalds authored Feb 11, 2017

Pull irq fixes from Ingo Molnar:
 "Two last minute ARM irqchip driver fixes"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/mxs: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND
  irqchip/keystone: Fix "scheduling while atomic" on rt

21a7061c

Merge branch 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · 2b95550a

Linus Torvalds authored Feb 11, 2017

Pull btrfs fixes from Chris Mason:
 "This has two last minute fixes. The highest priority here is a
  regression fix for the decompression code, but we also fixed up a
  problem with the 32-bit compat ioctls.

  The decompression bug could hand back the wrong data on big reads when
  zlib was used. I have a larger cleanup to make the math here less
  error prone, but at this stage in the release Omar's patch is the best
  choice"

* 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: fix btrfs_decompress_buf2page()
  btrfs: fix btrfs_compat_ioctl failures on non-compat ioctls

2b95550a

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 13ebfd06

Linus Torvalds authored Feb 11, 2017

Pull SCSI fixes from James Bottomley:
 "Six fairly small fixes. None is a real show stopper, two automation
  detected problems: one memory leak, one use after free and four others
  each of which fixes something that has been a significant source of
  annoyance to someone"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: zfcp: fix use-after-free by not tracing WKA port open/close on failed send
  scsi: aacraid: Fix INTx/MSI-x issue with older controllers
  scsi: mpt3sas: disable ASPM for MPI2 controllers
  scsi: mpt3sas: Force request partial completion alignment
  scsi: qla2xxx: Avoid that issuing a LIP triggers a kernel crash
  scsi: qla2xxx: Fix a recently introduced memory leak

13ebfd06

Btrfs: fix btrfs_decompress_buf2page() · 6e78b3f7

Omar Sandoval authored Feb 10, 2017

If btrfs_decompress_buf2page() is handed a bio with its page in the
middle of the working buffer, then we adjust the offset into the working
buffer. After we copy into the bio, we advance the iterator by the
number of bytes we copied. Then, we have some logic to handle the case
of discontiguous pages and adjust the offset into the working buffer
again. However, if we didn't advance the bio to a new page, we may enter
this case in error, essentially repeating the adjustment that we already
made when we entered the function. The end result is bogus data in the
bio.

Previously, we only checked for this case when we advanced to a new
page, but the conversion to bio iterators changed that. This restores
the old, correct behavior.

A case I saw when testing with zlib was:

    buf_start = 42769
    total_out = 46865
    working_bytes = total_out - buf_start = 4096
    start_byte = 45056

The condition (total_out > start_byte && buf_start < start_byte) is
true, so we adjust the offset:

    buf_offset = start_byte - buf_start = 2287
    working_bytes -= buf_offset = 1809
    current_buf_start = buf_start = 42769

Then, we copy

    bytes = min(bvec.bv_len, PAGE_SIZE - buf_offset, working_bytes) = 1809
    buf_offset += bytes = 4096
    working_bytes -= bytes = 0
    current_buf_start += bytes = 44578

After bio_advance(), we are still in the same page, so start_byte is the
same. Then, we check (total_out > start_byte && current_buf_start < start_byte),
which is true! So, we adjust the values again:

    buf_offset = start_byte - buf_start = 2287
    working_bytes = total_out - start_byte = 1809
    current_buf_start = buf_start + buf_offset = 45056

But note that working_bytes was already zero before this, so we should
have stopped copying.

Fixes: 974b1adc ("btrfs: use bio iterators for the decompression handlers")
Reported-by: Pat Erley <pat-lkml@erley.org>
Reviewed-by: Chris Mason <clm@fb.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Tested-by: Liu Bo <bo.li.liu@oracle.com>

6e78b3f7

10 Feb, 2017 25 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 1ee18329

Linus Torvalds authored Feb 10, 2017

Pull networking fixes from David Miller:

 1) If the timing is wrong we can indefinitely stop generating new ipv6
    temporary addresses, from Marcus Huewe.

 2) Don't double free per-cpu stats in ipv6 SIT tunnel driver, from Cong
    Wang.

 3) Put protections in place so that AF_PACKET is not able to submit
    packets which don't even have a link level header to drivers. From
    Willem de Bruijn.

 4) Fix memory leaks in ipv4 and ipv6 multicast code, from Hangbin Liu.

 5) Don't use udp_ioctl() in l2tp code, UDP version expects a UDP socket
    and that doesn't go over very well when it is passed an L2TP one.
    Fix from Eric Dumazet.

 6) Don't crash on NULL pointer in phy_attach_direct(), from Florian
    Fainelli.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  l2tp: do not use udp_ioctl()
  xen-netfront: Delete rx_refill_timer in xennet_disconnect_backend()
  NET: mkiss: Fix panic
  net: hns: Fix the device being used for dma mapping during TX
  net: phy: Initialize mdio clock at probe function
  igmp, mld: Fix memory leak in igmpv3/mld_del_delrec()
  xen-netfront: Improve error handling during initialization
  sierra_net: Skip validating irrelevant fields for IDLE LSIs
  sierra_net: Add support for IPv6 and Dual-Stack Link Sense Indications
  kcm: fix 0-length case for kcm_sendmsg()
  xen-netfront: Rework the fix for Rx stall during OOM and network stress
  net: phy: Fix PHY module checks and NULL deref in phy_attach_direct()
  net: thunderx: Fix PHY autoneg for SGMII QLM mode
  net: dsa: Do not destroy invalid network devices
  ping: fix a null pointer dereference
  packet: round up linear to header len
  net: introduce device min_header_len
  sit: fix a double free on error path
  lwtunnel: valid encap attr check should return 0 when lwtunnel is disabled
  ipv6: addrconf: fix generation of new temporary addresses

1ee18329

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma · a9dbf5c8

Linus Torvalds authored Feb 10, 2017

Pull rdma fixes from Doug Ledford:
 "Third round of -rc fixes for 4.10 kernel:

   - two security related issues in the rxe driver

   - one compile issue in the RDMA uapi header"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  RDMA: Don't reference kernel private header from UAPI header
  IB/rxe: Fix mem_check_range integer overflow
  IB/rxe: Fix resid update

a9dbf5c8

Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · aca9fa0c

Linus Torvalds authored Feb 10, 2017

Pull i2c bugfixes from Wolfram Sang:
 "Two bugfixes (proper IO mapping and use of mutex) for a driver feature
  we introduced in this cycle"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: piix4: Request the SMBUS semaphore inside the mutex
  i2c: piix4: Fix request_region size

aca9fa0c

Merge tag 'mmc-v4.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · fc6f41ba

Linus Torvalds authored Feb 10, 2017

Pull MMC host fix from Ulf Hansson:
 "mmci: Fix hang while waiting for busy-end interrupt"

* tag 'mmc-v4.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: mmci: avoid clearing ST Micro busy end interrupt mistakenly

fc6f41ba

Merge tag 'sound-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 1f369d16

Linus Torvalds authored Feb 10, 2017

Pull sound fixes from Takashi Iwai:
 "Here are some last-minute fixes: two fixes for races in ALSA sequencer
  queue spotted by syzkaller, a revert for a regression of LINE6 driver
  (since 4.9), and a trivial new codec ID addition for Nvidia HDMI"

* tag 'sound-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - adding a new NV HDMI/DP codec ID in the driver
  ALSA: seq: Fix race at creating a queue
  Revert "ALSA: line6: Only determine control port properties if needed"
  ALSA: seq: Don't handle loop timeout at snd_seq_pool_done()

1f369d16

Merge tag 'nfsd-4.10-3' of git://linux-nfs.org/~bfields/linux · 7fe654dc

Linus Torvalds authored Feb 10, 2017

Pull nfsd revert from Bruce Fields:
 "This patch turned out to have a couple problems. The problems are
  fixable, but at least one of the fixes is a little ugly. The original
  bug has always been there, so we can wait another week or two to get
  this right"

* tag 'nfsd-4.10-3' of git://linux-nfs.org/~bfields/linux:
  nfsd: Revert "nfsd: special case truncates some more"

7fe654dc

Merge tag 'powerpc-4.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 3ebc7033

Linus Torvalds authored Feb 10, 2017

Pull powerpc fixes friom Michael Ellerman:
 "Apologies for the late pull request, but Ben has been busy finding bugs.

   - Userspace was semi-randomly segfaulting on radix due to us
     incorrectly handling a fault triggered by autonuma, caused by a
     patch we merged earlier in v4.10 to prevent the kernel executing
     userspace.

   - We weren't marking host IPIs properly for KVM in the OPAL ICP
     backend.

   - The ERAT flushing on radix was missing an isync and was incorrectly
     marked as DD1 only.

   - The powernv CPU hotplug code was missing a wakeup type and failing
     to flush the interrupt correctly when using OPAL ICP

  Thanks to Benjamin Herrenschmidt"

* tag 'powerpc-4.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/powernv: Properly set "host-ipi" on IPIs
  powerpc/powernv: Fix CPU hotplug to handle waking on HVI
  powerpc/mm/radix: Update ERAT flushes when invalidating TLB
  powerpc/mm: Fix spurrious segfaults on radix with autonuma

3ebc7033

l2tp: do not use udp_ioctl() · 72fb96e7

Eric Dumazet authored Feb 09, 2017

udp_ioctl(), as its name suggests, is used by UDP protocols,
but is also used by L2TP :(

L2TP should use its own handler, because it really does not
look the same.

SIOCINQ for instance should not assume UDP checksum or headers.

Thanks to Andrey and syzkaller team for providing the report
and a nice reproducer.

While crashes only happen on recent kernels (after commit
7c13f97f ("udp: do fwd memory scheduling on dequeue")), this
probably needs to be backported to older kernels.

Fixes: 7c13f97f ("udp: do fwd memory scheduling on dequeue")
Fixes: 85584672 ("udp: Fix udp_poll() and ioctl()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

72fb96e7

Merge branch 'for-chris' of... · f3c7bfbd

Chris Mason authored Feb 10, 2017

Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.10

f3c7bfbd

xen-netfront: Delete rx_refill_timer in xennet_disconnect_backend() · 74470954

Boris Ostrovsky authored Jan 30, 2017

rx_refill_timer should be deleted as soon as we disconnect from the
backend since otherwise it is possible for the timer to go off before
we get to xennet_destroy_queues(). If this happens we may dereference
queue->rx.sring which is set to NULL in xennet_disconnect_backend().
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
CC: stable@vger.kernel.org
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

74470954

NET: mkiss: Fix panic · 7ba1b689

Ralf Baechle authored Feb 09, 2017

If a USB-to-serial adapter is unplugged, the driver re-initializes, with
dev->hard_header_len and dev->addr_len set to zero, instead of the correct
values.  If then a packet is sent through the half-dead interface, the
kernel will panic due to running out of headroom in the skb when pushing
for the AX.25 headers resulting in this panic:

[<c0595468>] (skb_panic) from [<c0401f70>] (skb_push+0x4c/0x50)
[<c0401f70>] (skb_push) from [<bf0bdad4>] (ax25_hard_header+0x34/0xf4 [ax25])
[<bf0bdad4>] (ax25_hard_header [ax25]) from [<bf0d05d4>] (ax_header+0x38/0x40 [mkiss])
[<bf0d05d4>] (ax_header [mkiss]) from [<c041b584>] (neigh_compat_output+0x8c/0xd8)
[<c041b584>] (neigh_compat_output) from [<c043e7a8>] (ip_finish_output+0x2a0/0x914)
[<c043e7a8>] (ip_finish_output) from [<c043f948>] (ip_output+0xd8/0xf0)
[<c043f948>] (ip_output) from [<c043f04c>] (ip_local_out_sk+0x44/0x48)

This patch makes mkiss behave like the 6pack driver. 6pack does not
panic.  In 6pack.c sp_setup() (same function name here) the values for
dev->hard_header_len and dev->addr_len are set to the same values as in
my mkiss patch.

[ralf@linux-mips.org: Massages original submission to conform to the usual
standards for patch submissions.]
Signed-off-by: Thomas Osterried <thomas@osterried.de>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

7ba1b689

net: hns: Fix the device being used for dma mapping during TX · b85ea006

Kejian Yan authored Feb 09, 2017

This patch fixes the device being used to DMA map skb->data.
Erroneous device assignment causes the crash when SMMU is enabled.
This happens during TX since buffer gets DMA mapped with device
correspondign to net_device and gets unmapped using the device
related to DSAF.
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b85ea006

Merge tag 'irqchip-fixes-4.10' of git://git.infradead.org/users/jcooper/linux into irq/urgent · d128dfb5

Thomas Gleixner authored Feb 10, 2017

Pull irqchip fixes for v4.10 from Jason Cooper

- keystone: Fix scheduling while atomic for realtime
- mxs: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND

d128dfb5

x86/mm/ptdump: Fix soft lockup in page table walker · 146fbb76

Andrey Ryabinin authored Feb 10, 2017

CONFIG_KASAN=y needs a lot of virtual memory mapped for its shadow.
In that case ptdump_walk_pgd_level_core() takes a lot of time to
walk across all page tables and doing this without
a rescheduling causes soft lockups:

 NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [swapper/0:1]
 ...
 Call Trace:
  ptdump_walk_pgd_level_core+0x40c/0x550
  ptdump_walk_pgd_level_checkwx+0x17/0x20
  mark_rodata_ro+0x13b/0x150
  kernel_init+0x2f/0x120
  ret_from_fork+0x2c/0x40

I guess that this issue might arise even without KASAN on huge machines
with several terabytes of RAM.

Stick cond_resched() in pgd loop to fix this.
Reported-by: Tobias Regnery <tobias.regnery@gmail.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: kasan-dev@googlegroups.com
Cc: Alexander Potapenko <glider@google.com>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170210095405.31802-1-aryabinin@virtuozzo.comSigned-off-by: Thomas Gleixner <tglx@linutronix.de>

146fbb76

x86/tsc: Make the TSC ADJUST sanitizing work for tsc_reliable · 5f2e71e7

Thomas Gleixner authored Feb 09, 2017

When the TSC is marked reliable then the synchronization check is skipped,
but that also skips the TSC ADJUST sanitizing code. So on a machine with a
wreckaged BIOS the TSC deviation between CPUs might go unnoticed.

Let the TSC adjust sanitizing code run unconditionally and just skip the
expensive synchronization checks when TSC is marked reliable.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Olof Johansson <olof@lixom.net>
Link: http://lkml.kernel.org/r/20170209151231.491189912@linutronix.deSigned-off-by: Thomas Gleixner <tglx@linutronix.de>

5f2e71e7

x86/tsc: Avoid the large time jump when sanitizing TSC ADJUST · f2e04214

Thomas Gleixner authored Feb 09, 2017

Olof reported that on a machine which has a BIOS wreckaged TSC the
timestamps in dmesg are making a large jump because the TSC value is
jumping forward after resetting the TSC ADJUST register to a sane value.

This can be avoided by calling the TSC ADJUST saniziting function before
initializing the per cpu sched clock machinery. That takes the offset into
account and avoid the time jump.

What cannot be avoided is that the 'Firmware Bug' warnings on the secondary
CPUs are printed with the large time offsets because it would be too much
effort and ugly hackery to print those warnings into a buffer and emit them
after the adjustemt on the starting CPUs. It's a firmware bug and should be
fixed in firmware. The weird timestamps are collateral damage and just
illustrate the sillyness of the BIOS folks:

[    0.397445] smp: Bringing up secondary CPUs ...
[    0.402100] x86: Booting SMP configuration:
[    0.406343] .... node  #0, CPUs:      #1
[1265776479.930667] [Firmware Bug]: TSC ADJUST differs: Reference CPU0: -2978888639075328 CPU1: -2978888639183101
[1265776479.944664] TSC ADJUST synchronize: Reference CPU0: 0 CPU1: -2978888639183101
[    0.508119]  #2
[1265776480.032346] [Firmware Bug]: TSC ADJUST differs: Reference CPU0: -2978888639075328 CPU2: -2978888639183677
[1265776480.044192] TSC ADJUST synchronize: Reference CPU0: 0 CPU2: -2978888639183677
[    0.607643]  #3
[1265776480.131874] [Firmware Bug]: TSC ADJUST differs: Reference CPU0: -2978888639075328 CPU3: -2978888639184530
[1265776480.143720] TSC ADJUST synchronize: Reference CPU0: 0 CPU3: -2978888639184530
[    0.707108] smp: Brought up 1 node, 4 CPUs
[    0.711271] smpboot: Total of 4 processors activated (21698.88 BogoMIPS)
Reported-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170209151231.411460506@linutronix.deSigned-off-by: Thomas Gleixner <tglx@linutronix.de>

f2e04214

tick/nohz: Fix possible missing clock reprog after tick soft restart · 7bdb59f1

Frederic Weisbecker authored Feb 07, 2017

ts->next_tick keeps track of the next tick deadline in order to optimize
clock programmation on irq exit and avoid redundant clock device writes.

Now if ts->next_tick missed an update, we may spuriously miss a clock
reprog later as the nohz code is fooled by an obsolete next_tick value.

This is what happens here on a specific path: when we observe an
expired timer from the nohz update code on irq exit, we perform a soft
tick restart which simply fires the closest possible tick without
actually exiting the nohz mode and restoring a periodic state. But we
forget to update ts->next_tick accordingly.

As a result, after the next tick resulting from such soft tick restart,
the nohz code sees a stale value on ts->next_tick which doesn't match
the clock deadline that just expired. If that obsolete ts->next_tick
value happens to collide with the actual next tick deadline to be
scheduled, we may spuriously bypass the clock reprogramming. In the
worst case, the tick may never fire again.

Fix this with a ts->next_tick reset on soft tick restart.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed: Wanpeng Li <wanpeng.li@hotmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1486485894-29173-1-git-send-email-fweisbec@gmail.comSigned-off-by: Thomas Gleixner <tglx@linutronix.de>

7bdb59f1

perf/core: Fix crash in perf_event_read() · 451d24d1

Peter Zijlstra authored Jan 31, 2017

Alexei had his box explode because doing read() on a package
(rapl/uncore) event that isn't currently scheduled in ends up doing an
out-of-bounds load.

Rework the code to more explicitly deal with event->oncpu being -1.
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: David Carrillo-Cisneros <davidcc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: eranian@google.com
Fixes: d6a2f903 ("perf/core: Introduce PMU_EV_CAP_READ_ACTIVE_PKG")
Link: http://lkml.kernel.org/r/20170131102710.GL6515@twins.programming.kicks-ass.netSigned-off-by: Ingo Molnar <mingo@kernel.org>

451d24d1

Merge remote-tracking branch 'mkp-scsi/4.10/scsi-fixes' into fixes · ed6de456
James Bottomley authored Feb 09, 2017

ed6de456

nfsd: Revert "nfsd: special case truncates some more" · 0839ffb8

J. Bruce Fields authored Feb 09, 2017

This patch incorrectly attempted nested mnt_want_write, and incorrectly
disabled nfsd's owner override for truncate.  We'll fix those problems
and make another attempt soon, for the moment I think the safest is to
revert.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

0839ffb8

Merge tag 'drm-fixes-for-v4.10-rc8' of git://people.freedesktop.org/~airlied/linux · 3d88460d

Linus Torvalds authored Feb 09, 2017

Pull drm fixes from Dave Airlie:
 "This should be the final set of drm fixes for 4.10: one vmwgfx boot
  fix, one vc4 fix, and a few i915 fixes:

* tag 'drm-fixes-for-v4.10-rc8' of git://people.freedesktop.org/~airlied/linux:
  drm: vc4: adapt to new behaviour of drm_crtc.c
  drm/i915: Always convert incoming exec offsets to non-canonical
  drm/i915: Remove overzealous fence warn on runtime suspend
  drm/i915/bxt: Add MST support when do DPLL calculation
  drm/i915: don't warn about Skylake CPU - KabyPoint PCH combo
  drm/i915: fix i915 running as dom0 under Xen
  drm/i915: Flush untouched framebuffers before display on !llc
  drm/i915: fix use-after-free in page_flip_completed()
  drm/vmwgfx: Fix depth input into drm_mode_legacy_fb_format

3d88460d

scsi: zfcp: fix use-after-free by not tracing WKA port open/close on failed send · 2dfa6688

Steffen Maier authored Feb 08, 2017

Dan Carpenter kindly reported:
<quote>
The patch d27a7cb9: "zfcp: trace on request for open and close of
WKA port" from Aug 10, 2016, leads to the following static checker
warning:

	drivers/s390/scsi/zfcp_fsf.c:1615 zfcp_fsf_open_wka_port()
	warn: 'req' was already freed.

drivers/s390/scsi/zfcp_fsf.c
  1609          zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
  1610          retval = zfcp_fsf_req_send(req);
  1611          if (retval)
  1612                  zfcp_fsf_req_free(req);
                                          ^^^
Freed.

  1613  out:
  1614          spin_unlock_irq(&qdio->req_q_lock);
  1615          if (req && !IS_ERR(req))
  1616                  zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req->req_id);
                                                                  ^^^^^^^^^^^
Use after free.

  1617          return retval;
  1618  }

Same thing for zfcp_fsf_close_wka_port() as well.
</quote>

Rather than relying on req being NULL (or ERR_PTR) for all cases where
we don't want to trace or should not trace,
simply check retval which is unconditionally initialized with -EIO != 0
and it can only become 0 on successful retval = zfcp_fsf_req_send(req).
With that we can also remove the then again unnecessary unconditional
initialization of req which was introduced with that earlier commit.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Suggested-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: d27a7cb9 ("zfcp: trace on request for open and close of WKA port")
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

2dfa6688

scsi: aacraid: Fix INTx/MSI-x issue with older controllers · 8af8e1c2

Dave Carroll authored Feb 09, 2017

commit 78cbccd3 ("aacraid: Fix for KDUMP driver hang")

caused a problem on older controllers which do not support MSI-x (namely
ASR3405,ASR3805). This patch conditionalizes the previous patch to
controllers which support MSI-x

Cc: <stable@vger.kernel.org> # v4.7+
Fixes: 78cbccd3 ("aacraid: Fix for KDUMP driver hang")
Reported-by: Arkadiusz Miskiewicz <a.miskiewicz@gmail.com>
Signed-off-by: Dave Carroll <david.carroll@microsemi.com>
Reviewed-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

8af8e1c2

Merge tag 'drm-intel-fixes-2017-02-09' of... · 697d3a21

Dave Airlie authored Feb 10, 2017

Merge tag 'drm-intel-fixes-2017-02-09' of git://anongit.freedesktop.org/git/drm-intel into drm-fixes

Hopefully final fixes for v4.10, about half of them stable material.

* tag 'drm-intel-fixes-2017-02-09' of git://anongit.freedesktop.org/git/drm-intel:
  drm/i915: Always convert incoming exec offsets to non-canonical
  drm/i915: Remove overzealous fence warn on runtime suspend
  drm/i915/bxt: Add MST support when do DPLL calculation
  drm/i915: don't warn about Skylake CPU - KabyPoint PCH combo
  drm/i915: fix i915 running as dom0 under Xen
  drm/i915: Flush untouched framebuffers before display on !llc
  drm/i915: fix use-after-free in page_flip_completed()

697d3a21

Merge tag 'drm-misc-fixes-2017-02-09' of git://anongit.freedesktop.org/git/drm-misc into drm-fixes · 811b40c8

Dave Airlie authored Feb 10, 2017

Last-minute vc4 fix for 4.10.

* tag 'drm-misc-fixes-2017-02-09' of git://anongit.freedesktop.org/git/drm-misc:
  drm: vc4: adapt to new behaviour of drm_crtc.c

811b40c8

09 Feb, 2017 3 commits

scsi: mpt3sas: disable ASPM for MPI2 controllers · ffdadd68

ojab authored Dec 28, 2016

MPI2 controllers sometimes got lost (i.e. disappear from
/sys/bus/pci/devices) if ASMP is enabled.
Signed-off-by: Slava Kardakov <ojab@ojab.ru>
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=60644
Cc: <stable@vger.kernel.org>
Acked-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

ffdadd68

net: phy: Initialize mdio clock at probe function · bb1a6197

Yendapally Reddy Dhananjaya Reddy authored Feb 08, 2017

USB PHYs need the MDIO clock divisor enabled earlier to work.
Initialize mdio clock divisor in probe function. The ext bus
bit available in the same register will be used by mdio mux
to enable external mdio.
Signed-off-by: Yendapally Reddy Dhananjaya Reddy <yendapally.reddy@broadcom.com>
Fixes: ddc24ae1 ("net: phy: Broadcom iProc MDIO bus driver")
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jon Mason <jon.mason@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bb1a6197

igmp, mld: Fix memory leak in igmpv3/mld_del_delrec() · 9c8bb163

Hangbin Liu authored Feb 08, 2017

In function igmpv3/mld_add_delrec() we allocate pmc and put it in
idev->mc_tomb, so we should free it when we don't need it in del_delrec().
But I removed kfree(pmc) incorrectly in latest two patches. Now fix it.

Fixes: 24803f38 ("igmp: do not remove igmp souce list info when ...")
Fixes: 1666d49e ("mld: do not remove mld souce list info when ...")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9c8bb163