1. 16 Jan, 2019 16 commits
  2. 13 Jan, 2019 24 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.4.170 · b83b3fa7
      Greg Kroah-Hartman authored
      b83b3fa7
    • Lubomir Rintel's avatar
      power: supply: olpc_battery: correct the temperature units · 1bd63edb
      Lubomir Rintel authored
      commit ed54ffbe upstream.
      
      According to [1] and [2], the temperature values are in tenths of degree
      Celsius. Exposing the Celsius value makes the battery appear on fire:
      
        $ upower -i /org/freedesktop/UPower/devices/battery_olpc_battery
        ...
            temperature:         236.9 degrees C
      
      Tested on OLPC XO-1 and OLPC XO-1.75 laptops.
      
      [1] include/linux/power_supply.h
      [2] Documentation/power/power_supply_class.txt
      
      Fixes: fb972873 ("[BATTERY] One Laptop Per Child power/battery driver")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarLubomir Rintel <lkundrak@v3.sk>
      Acked-by: default avatarPavel Machek <pavel@ucw.cz>
      Signed-off-by: default avatarSebastian Reichel <sebastian.reichel@collabora.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1bd63edb
    • Alexander Shishkin's avatar
      intel_th: msu: Fix an off-by-one in attribute store · b1892669
      Alexander Shishkin authored
      commit ec5b5ad6 upstream.
      
      The 'nr_pages' attribute of the 'msc' subdevices parses a comma-separated
      list of window sizes, passed from userspace. However, there is a bug in
      the string parsing logic wherein it doesn't exclude the comma character
      from the range of characters as it consumes them. This leads to an
      out-of-bounds access given a sufficiently long list. For example:
      
      > # echo 8,8,8,8 > /sys/bus/intel_th/devices/0-msc0/nr_pages
      > ==================================================================
      > BUG: KASAN: slab-out-of-bounds in memchr+0x1e/0x40
      > Read of size 1 at addr ffff8803ffcebcd1 by task sh/825
      >
      > CPU: 3 PID: 825 Comm: npktest.sh Tainted: G        W         4.20.0-rc1+
      > Call Trace:
      >  dump_stack+0x7c/0xc0
      >  print_address_description+0x6c/0x23c
      >  ? memchr+0x1e/0x40
      >  kasan_report.cold.5+0x241/0x308
      >  memchr+0x1e/0x40
      >  nr_pages_store+0x203/0xd00 [intel_th_msu]
      
      Fix this by accounting for the comma character.
      Signed-off-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Fixes: ba82664c ("intel_th: Add Memory Storage Unit driver")
      Cc: stable@vger.kernel.org # v4.4+
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b1892669
    • Christian Borntraeger's avatar
      genwqe: Fix size check · 21c7b137
      Christian Borntraeger authored
      commit fdd66968 upstream.
      
      Calling the test program genwqe_cksum with the default buffer size of
      2MB triggers the following kernel warning on s390:
      
      WARNING: CPU: 30 PID: 9311 at mm/page_alloc.c:3189 __alloc_pages_nodemask+0x45c/0xbe0
      CPU: 30 PID: 9311 Comm: genwqe_cksum Kdump: loaded Not tainted 3.10.0-957.el7.s390x #1
      task: 00000005e5d13980 ti: 00000005e7c6c000 task.ti: 00000005e7c6c000
      Krnl PSW : 0704c00180000000 00000000002780ac (__alloc_pages_nodemask+0x45c/0xbe0)
                 R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
      Krnl GPRS: 00000000002932b8 0000000000b73d7c 0000000000000010 0000000000000009
                 0000000000000041 00000005e7c6f9b8 0000000000000001 00000000000080d0
                 0000000000000000 0000000000b70500 0000000000000001 0000000000000000
                 0000000000b70528 00000000007682c0 0000000000277df2 00000005e7c6f9a0
      Krnl Code: 000000000027809e: de7195001000	ed	1280(114,%r9),0(%r1)
      	   00000000002780a4: a774fead		brc	7,277dfe
      	  #00000000002780a8: a7f40001		brc	15,2780aa
      	  >00000000002780ac: 92011000		mvi	0(%r1),1
      	   00000000002780b0: a7f4fea7		brc	15,277dfe
      	   00000000002780b4: 9101c6b6		tm	1718(%r12),1
      	   00000000002780b8: a784ff3a		brc	8,277f2c
      	   00000000002780bc: a7f4fe2e		brc	15,277d18
      Call Trace:
      ([<0000000000277df2>] __alloc_pages_nodemask+0x1a2/0xbe0)
       [<000000000013afae>] s390_dma_alloc+0xfe/0x310
       [<000003ff8065f362>] __genwqe_alloc_consistent+0xfa/0x148 [genwqe_card]
       [<000003ff80658f7a>] genwqe_mmap+0xca/0x248 [genwqe_card]
       [<00000000002b2712>] mmap_region+0x4e2/0x778
       [<00000000002b2c54>] do_mmap+0x2ac/0x3e0
       [<0000000000292d7e>] vm_mmap_pgoff+0xd6/0x118
       [<00000000002b081c>] SyS_mmap_pgoff+0xdc/0x268
       [<00000000002b0a34>] SyS_old_mmap+0x8c/0xb0
       [<000000000074e518>] sysc_tracego+0x14/0x1e
       [<000003ffacf87dc6>] 0x3ffacf87dc6
      
      turns out the check in __genwqe_alloc_consistent uses "> MAX_ORDER"
      while the mm code uses ">= MAX_ORDER". Fix genwqe.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarFrank Haverkamp <haver@linux.vnet.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      21c7b137
    • Yan, Zheng's avatar
      ceph: don't update importing cap's mseq when handing cap export · 5f6ce5ea
      Yan, Zheng authored
      commit 3c1392d4 upstream.
      
      Updating mseq makes client think importer mds has accepted all prior
      cap messages and importer mds knows what caps client wants. Actually
      some cap messages may have been dropped because of mseq mismatch.
      
      If mseq is left untouched, importing cap's mds_wanted later will get
      reset by cap import message.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatar"Yan, Zheng" <zyan@redhat.com>
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f6ce5ea
    • Sohil Mehta's avatar
      iommu/vt-d: Handle domain agaw being less than iommu agaw · 0216bf65
      Sohil Mehta authored
      commit 3569dd07 upstream.
      
      The Intel IOMMU driver opportunistically skips a few top level page
      tables from the domain paging directory while programming the IOMMU
      context entry. However there is an implicit assumption in the code that
      domain's adjusted guest address width (agaw) would always be greater
      than IOMMU's agaw.
      
      The IOMMU capabilities in an upcoming platform cause the domain's agaw
      to be lower than IOMMU's agaw. The issue is seen when the IOMMU supports
      both 4-level and 5-level paging. The domain builds a 4-level page table
      based on agaw of 2. However the IOMMU's agaw is set as 3 (5-level). In
      this case the code incorrectly tries to skip page page table levels.
      This causes the IOMMU driver to avoid programming the context entry. The
      fix handles this case and programs the context entry accordingly.
      
      Fixes: de24e553 ("iommu/vt-d: Simplify domain_context_mapping_one")
      Cc: <stable@vger.kernel.org>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
      Cc: Lu Baolu <baolu.lu@linux.intel.com>
      Reviewed-by: default avatarLu Baolu <baolu.lu@linux.intel.com>
      Reported-by: default avatarRamos Falcon, Ernesto R <ernesto.r.ramos.falcon@intel.com>
      Tested-by: default avatarRicardo Neri <ricardo.neri-calderon@linux.intel.com>
      Signed-off-by: default avatarSohil Mehta <sohil.mehta@intel.com>
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0216bf65
    • Dominique Martinet's avatar
      9p/net: put a lower bound on msize · 2d750144
      Dominique Martinet authored
      commit 574d356b upstream.
      
      If the requested msize is too small (either from command line argument
      or from the server version reply), we won't get any work done.
      If it's *really* too small, nothing will work, and this got caught by
      syzbot recently (on a new kmem_cache_create_usercopy() call)
      
      Just set a minimum msize to 4k in both code paths, until someone
      complains they have a use-case for a smaller msize.
      
      We need to check in both mount option and server reply individually
      because the msize for the first version request would be unchecked
      with just a global check on clnt->msize.
      
      Link: http://lkml.kernel.org/r/1541407968-31350-1-git-send-email-asmadeus@codewreck.org
      Reported-by: syzbot+0c1d61e4db7db94102ca@syzkaller.appspotmail.com
      Signed-off-by: default avatarDominique Martinet <dominique.martinet@cea.fr>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Latchesar Ionkov <lucho@ionkov.net>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2d750144
    • Larry Finger's avatar
      b43: Fix error in cordic routine · c820ac33
      Larry Finger authored
      commit 8ea3819c upstream.
      
      The cordic routine for calculating sines and cosines that was added in
      commit 6f98e62a ("b43: update cordic code to match current specs")
      contains an error whereby a quantity declared u32 can in fact go negative.
      
      This problem was detected by Priit Laes who is switching b43 to use the
      routine in the library functions of the kernel.
      
      Fixes: 98650454 ("b43: make cordic common (LP-PHY and N-PHY need it)")
      Reported-by: default avatarPriit Laes <plaes@plaes.org>
      Cc: Rafał Miłecki <zajec5@gmail.com>
      Cc: Stable <stable@vger.kernel.org> # 2.6.34
      Signed-off-by: default avatarLarry Finger <Larry.Finger@lwfinger.net>
      Signed-off-by: default avatarPriit Laes <plaes@plaes.org>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c820ac33
    • Andreas Gruenbacher's avatar
      gfs2: Fix loop in gfs2_rbm_find · 18451898
      Andreas Gruenbacher authored
      commit 2d29f6b9 upstream.
      
      Fix the resource group wrap-around logic in gfs2_rbm_find that commit
      e579ed4f broke.  The bug can lead to unnecessary repeated scanning of the
      same bitmaps; there is a risk that future changes will turn this into an
      endless loop.
      
      Fixes: e579ed4f ("GFS2: Introduce rbm field bii")
      Cc: stable@vger.kernel.org # v3.13+
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      18451898
    • Vasily Averin's avatar
      dlm: memory leaks on error path in dlm_user_request() · bf72973c
      Vasily Averin authored
      commit d47b41ac upstream.
      
      According to comment in dlm_user_request() ua should be freed
      in dlm_free_lkb() after successful attach to lkb.
      
      However ua is attached to lkb not in set_lock_args() but later,
      inside request_lock().
      
      Fixes 597d0cae ("[DLM] dlm: user locks")
      Cc: stable@kernel.org # 2.6.19
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarDavid Teigland <teigland@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bf72973c
    • Vasily Averin's avatar
      dlm: lost put_lkb on error path in receive_convert() and receive_unlock() · 3ed774e5
      Vasily Averin authored
      commit c0174726 upstream.
      
      Fixes 6d40c4a7 ("dlm: improve error and debug messages")
      Cc: stable@kernel.org # 3.5
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarDavid Teigland <teigland@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3ed774e5
    • Vasily Averin's avatar
      dlm: possible memory leak on error path in create_lkb() · 27f4aa2a
      Vasily Averin authored
      commit 23851e97 upstream.
      
      Fixes 3d6aa675 ("dlm: keep lkbs in idr")
      Cc: stable@kernel.org # 3.1
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarDavid Teigland <teigland@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      27f4aa2a
    • Vasily Averin's avatar
      dlm: fixed memory leaks after failed ls_remove_names allocation · a09b8db2
      Vasily Averin authored
      commit b982896c upstream.
      
      If allocation fails on last elements of array need to free already
      allocated elements.
      
      v2: just move existing out_rsbtbl label to right place
      
      Fixes 789924ba635f ("dlm: fix race between remove and lookup")
      Cc: stable@kernel.org # 3.6
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarDavid Teigland <teigland@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a09b8db2
    • Hui Peng's avatar
      ALSA: usb-audio: Fix an out-of-bound read in create_composite_quirks · 11e04713
      Hui Peng authored
      commit cbb2ebf7 upstream.
      
      In `create_composite_quirk`, the terminating condition of for loops is
      `quirk->ifnum < 0`. So any composite quirks should end with `struct
      snd_usb_audio_quirk` object with ifnum < 0.
      
          for (quirk = quirk_comp->data; quirk->ifnum >= 0; ++quirk) {
      
          	.....
          }
      
      the data field of Bower's & Wilkins PX headphones usb device device quirks
      do not end with {.ifnum = -1}, wihch may result in out-of-bound read.
      
      This Patch fix the bug by adding an ending quirk object.
      
      Fixes: 240a8af9 ("ALSA: usb-audio: Add a quirck for B&W PX headphones")
      Signed-off-by: default avatarHui Peng <benquike@163.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      11e04713
    • Takashi Iwai's avatar
      ALSA: usb-audio: Avoid access before bLength check in build_audio_procunit() · a5e09a90
      Takashi Iwai authored
      commit f4351a19 upstream.
      
      The parser for the processing unit reads bNrInPins field before the
      bLength sanity check, which may lead to an out-of-bound access when a
      malformed descriptor is given.  Fix it by assignment after the bLength
      check.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a5e09a90
    • Dan Carpenter's avatar
      ALSA: cs46xx: Potential NULL dereference in probe · 83f470eb
      Dan Carpenter authored
      commit 1524f4e4 upstream.
      
      The "chip->dsp_spos_instance" can be NULL on some of the ealier error
      paths in snd_cs46xx_create().
      Reported-by: default avatar"Yavuz, Tuba" <tuba@ece.ufl.edu>
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      83f470eb
    • Eric Biggers's avatar
      crypto: x86/chacha20 - avoid sleeping with preemption disabled · 557f16c7
      Eric Biggers authored
      In chacha20-simd, clear the MAY_SLEEP flag in the blkcipher_desc to
      prevent sleeping with preemption disabled, under kernel_fpu_begin().
      
      This was fixed upstream incidentally by a large refactoring,
      commit 9ae433bc ("crypto: chacha20 - convert generic and x86
      versions to skcipher").  But syzkaller easily trips over this when
      running on older kernels, as it's easily reachable via AF_ALG.
      Therefore, this patch makes the minimal fix for older kernels.
      
      Fixes: c9320b6d ("crypto: chacha20 - Add a SSSE3 SIMD variant for x86_64")
      Cc: linux-crypto@vger.kernel.org
      Cc: Martin Willi <martin@strongswan.org>
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Acked-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      557f16c7
    • Vasily Averin's avatar
      sunrpc: use SVC_NET() in svcauth_gss_* functions · 69c1fd10
      Vasily Averin authored
      commit b8be5674 upstream.
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      69c1fd10
    • Vasily Averin's avatar
      sunrpc: fix cache_head leak due to queued request · 192f7ca0
      Vasily Averin authored
      commit 4ecd55ea upstream.
      
      After commit d202cce8, an expired cache_head can be removed from the
      cache_detail's hash.
      
      However, the expired cache_head may be waiting for a reply from a
      previously submitted request. Such a cache_head has an increased
      refcounter and therefore it won't be freed after cache_put(freeme).
      
      Because the cache_head was removed from the hash it cannot be found
      during cache_clean() and can be leaked forever, together with stalled
      cache_request and other taken resources.
      
      In our case we noticed it because an entry in the export cache was
      holding a reference on a filesystem.
      
      Fixes d202cce8 ("sunrpc: never return expired entries in sunrpc_cache_lookup")
      Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
      Cc: stable@kernel.org # 2.6.35
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Reviewed-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      192f7ca0
    • Dan Williams's avatar
      mm, devm_memremap_pages: kill mapping "System RAM" support · 6331b9d7
      Dan Williams authored
      commit 06489cfb upstream.
      
      Given the fact that devm_memremap_pages() requires a percpu_ref that is
      torn down by devm_memremap_pages_release() the current support for mapping
      RAM is broken.
      
      Support for remapping "System RAM" has been broken since the beginning and
      there is no existing user of this this code path, so just kill the support
      and make it an explicit error.
      
      This cleanup also simplifies a follow-on patch to fix the error path when
      setting a devm release action for devm_memremap_pages_release() fails.
      
      Link: http://lkml.kernel.org/r/154275557997.76910.14689813630968180480.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reviewed-by: default avatar"Jérôme Glisse" <jglisse@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6331b9d7
    • Dan Williams's avatar
      mm, devm_memremap_pages: mark devm_memremap_pages() EXPORT_SYMBOL_GPL · a93d56de
      Dan Williams authored
      commit 808153e1 upstream.
      
      devm_memremap_pages() is a facility that can create struct page entries
      for any arbitrary range and give drivers the ability to subvert core
      aspects of page management.
      
      Specifically the facility is tightly integrated with the kernel's memory
      hotplug functionality.  It injects an altmap argument deep into the
      architecture specific vmemmap implementation to allow allocating from
      specific reserved pages, and it has Linux specific assumptions about page
      structure reference counting relative to get_user_pages() and
      get_user_pages_fast().  It was an oversight and a mistake that this was
      not marked EXPORT_SYMBOL_GPL from the outset.
      
      Again, devm_memremap_pagex() exposes and relies upon core kernel internal
      assumptions and will continue to evolve along with 'struct page', memory
      hotplug, and support for new memory types / topologies.  Only an in-kernel
      GPL-only driver is expected to keep up with this ongoing evolution.  This
      interface, and functionality derived from this interface, is not suitable
      for kernel-external drivers.
      
      Link: http://lkml.kernel.org/r/154275557457.76910.16923571232582744134.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a93d56de
    • Michal Hocko's avatar
      hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined · 060853fd
      Michal Hocko authored
      commit b15c8726 upstream.
      
      We have received a bug report that an injected MCE about faulty memory
      prevents memory offline to succeed on 4.4 base kernel.  The underlying
      reason was that the HWPoison page has an elevated reference count and the
      migration keeps failing.  There are two problems with that.  First of all
      it is dubious to migrate the poisoned page because we know that accessing
      that memory is possible to fail.  Secondly it doesn't make any sense to
      migrate a potentially broken content and preserve the memory corruption
      over to a new location.
      
      Oscar has found out that 4.4 and the current upstream kernels behave
      slightly differently with his simply testcase
      
      ===
      
      int main(void)
      {
              int ret;
              int i;
              int fd;
              char *array = malloc(4096);
              char *array_locked = malloc(4096);
      
              fd = open("/tmp/data", O_RDONLY);
              read(fd, array, 4095);
      
              for (i = 0; i < 4096; i++)
                      array_locked[i] = 'd';
      
              ret = mlock((void *)PAGE_ALIGN((unsigned long)array_locked), sizeof(array_locked));
              if (ret)
                      perror("mlock");
      
              sleep (20);
      
              ret = madvise((void *)PAGE_ALIGN((unsigned long)array_locked), 4096, MADV_HWPOISON);
              if (ret)
                      perror("madvise");
      
              for (i = 0; i < 4096; i++)
                      array_locked[i] = 'd';
      
              return 0;
      }
      ===
      
      + offline this memory.
      
      In 4.4 kernels he saw the hwpoisoned page to be returned back to the LRU
      list
      kernel:  [<ffffffff81019ac9>] dump_trace+0x59/0x340
      kernel:  [<ffffffff81019e9a>] show_stack_log_lvl+0xea/0x170
      kernel:  [<ffffffff8101ac71>] show_stack+0x21/0x40
      kernel:  [<ffffffff8132bb90>] dump_stack+0x5c/0x7c
      kernel:  [<ffffffff810815a1>] warn_slowpath_common+0x81/0xb0
      kernel:  [<ffffffff811a275c>] __pagevec_lru_add_fn+0x14c/0x160
      kernel:  [<ffffffff811a2eed>] pagevec_lru_move_fn+0xad/0x100
      kernel:  [<ffffffff811a334c>] __lru_cache_add+0x6c/0xb0
      kernel:  [<ffffffff81195236>] add_to_page_cache_lru+0x46/0x70
      kernel:  [<ffffffffa02b4373>] extent_readpages+0xc3/0x1a0 [btrfs]
      kernel:  [<ffffffff811a16d7>] __do_page_cache_readahead+0x177/0x200
      kernel:  [<ffffffff811a18c8>] ondemand_readahead+0x168/0x2a0
      kernel:  [<ffffffff8119673f>] generic_file_read_iter+0x41f/0x660
      kernel:  [<ffffffff8120e50d>] __vfs_read+0xcd/0x140
      kernel:  [<ffffffff8120e9ea>] vfs_read+0x7a/0x120
      kernel:  [<ffffffff8121404b>] kernel_read+0x3b/0x50
      kernel:  [<ffffffff81215c80>] do_execveat_common.isra.29+0x490/0x6f0
      kernel:  [<ffffffff81215f08>] do_execve+0x28/0x30
      kernel:  [<ffffffff81095ddb>] call_usermodehelper_exec_async+0xfb/0x130
      kernel:  [<ffffffff8161c045>] ret_from_fork+0x55/0x80
      
      And that latter confuses the hotremove path because an LRU page is
      attempted to be migrated and that fails due to an elevated reference
      count.  It is quite possible that the reuse of the HWPoisoned page is some
      kind of fixed race condition but I am not really sure about that.
      
      With the upstream kernel the failure is slightly different.  The page
      doesn't seem to have LRU bit set but isolate_movable_page simply fails and
      do_migrate_range simply puts all the isolated pages back to LRU and
      therefore no progress is made and scan_movable_pages finds same set of
      pages over and over again.
      
      Fix both cases by explicitly checking HWPoisoned pages before we even try
      to get reference on the page, try to unmap it if it is still mapped.  As
      explained by Naoya:
      
      : Hwpoison code never unmapped those for no big reason because
      : Ksm pages never dominate memory, so we simply didn't have strong
      : motivation to save the pages.
      
      Also put WARN_ON(PageLRU) in case there is a race and we can hit LRU
      HWPoison pages which shouldn't happen but I couldn't convince myself about
      that.  Naoya has noted the following:
      
      : Theoretically no such gurantee, because try_to_unmap() doesn't have a
      : guarantee of success and then memory_failure() returns immediately
      : when hwpoison_user_mappings fails.
      : Or the following code (comes after hwpoison_user_mappings block) also impli=
      : es
      : that the target page can still have PageLRU flag.
      :
      :         /*
      :          * Torn down by someone else?
      :          */
      :         if (PageLRU(p) && !PageSwapCache(p) && p->mapping =3D=3D NULL) {
      :                 action_result(pfn, MF_MSG_TRUNCATED_LRU, MF_IGNORED);
      :                 res =3D -EBUSY;
      :                 goto out;
      :         }
      :
      : So I think it's OK to keep "if (WARN_ON(PageLRU(page)))" block in
      : current version of your patch.
      
      Link: http://lkml.kernel.org/r/20181206120135.14079-1-mhocko@kernel.orgSigned-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.com>
      Debugged-by: default avatarOscar Salvador <osalvador@suse.com>
      Tested-by: default avatarOscar Salvador <osalvador@suse.com>
      Acked-by: default avatarDavid Hildenbrand <david@redhat.com>
      Acked-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      060853fd
    • David Herrmann's avatar
      fork: record start_time late · d447cf0c
      David Herrmann authored
      commit 7b558513 upstream.
      
      This changes the fork(2) syscall to record the process start_time after
      initializing the basic task structure but still before making the new
      process visible to user-space.
      
      Technically, we could record the start_time anytime during fork(2).  But
      this might lead to scenarios where a start_time is recorded long before
      a process becomes visible to user-space.  For instance, with
      userfaultfd(2) and TLS, user-space can delay the execution of fork(2)
      for an indefinite amount of time (and will, if this causes network
      access, or similar).
      
      By recording the start_time late, it much closer reflects the point in
      time where the process becomes live and can be observed by other
      processes.
      
      Lastly, this makes it much harder for user-space to predict and control
      the start_time they get assigned.  Previously, user-space could fork a
      process and stall it in copy_thread_tls() before its pid is allocated,
      but after its start_time is recorded.  This can be misused to later-on
      cycle through PIDs and resume the stalled fork(2) yielding a process
      that has the same pid and start_time as a process that existed before.
      This can be used to circumvent security systems that identify processes
      by their pid+start_time combination.
      
      Even though user-space was always aware that start_time recording is
      flaky (but several projects are known to still rely on start_time-based
      identification), changing the start_time to be recorded late will help
      mitigate existing attacks and make it much harder for user-space to
      control the start_time a process gets assigned.
      Reported-by: default avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarTom Gundersen <teg@jklm.no>
      Signed-off-by: default avatarDavid Herrmann <dh.herrmann@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d447cf0c
    • Steffen Maier's avatar
      scsi: zfcp: fix posting too many status read buffers leading to adapter shutdown · d15d1677
      Steffen Maier authored
      commit 60a161b7 upstream.
      
      Suppose adapter (open) recovery is between opened QDIO queues and before
      (the end of) initial posting of status read buffers (SRBs). This time
      window can be seconds long due to FSF_PROT_HOST_CONNECTION_INITIALIZING
      causing by design looping with exponential increase sleeps in the function
      performing exchange config data during recovery
      [zfcp_erp_adapter_strat_fsf_xconf()]. Recovery triggered by local link up.
      
      Suppose an event occurs for which the FCP channel would send an unsolicited
      notification to zfcp by means of a previously posted SRB.  We saw it with
      local cable pull (link down) in multi-initiator zoning with multiple
      NPIV-enabled subchannels of the same shared FCP channel.
      
      As soon as zfcp_erp_adapter_strategy_open_fsf() starts posting the initial
      status read buffers from within the adapter's ERP thread, the channel does
      send an unsolicited notification.
      
      Since v2.6.27 commit d26ab06e ("[SCSI] zfcp: receiving an unsolicted
      status can lead to I/O stall"), zfcp_fsf_status_read_handler() schedules
      adapter->stat_work to re-fill the just consumed SRB from a work item.
      
      Now the ERP thread and the work item post SRBs in parallel.  Both contexts
      call the helper function zfcp_status_read_refill().  The tracking of
      missing (to be posted / re-filled) SRBs is not thread-safe due to separate
      atomic_read() and atomic_dec(), in order to depend on posting
      success. Hence, both contexts can see
      atomic_read(&adapter->stat_miss) == 1. One of the two contexts posts
      one too many SRB. Zfcp gets QDIO_ERROR_SLSB_STATE on the output queue
      (trace tag "qdireq1") leading to zfcp_erp_adapter_shutdown() in
      zfcp_qdio_handler_error().
      
      An obvious and seemingly clean fix would be to schedule stat_work from the
      ERP thread and wait for it to finish. This would serialize all SRB
      re-fills. However, we already have another work item wait on the ERP
      thread: adapter->scan_work runs zfcp_fc_scan_ports() which calls
      zfcp_fc_eval_gpn_ft(). The latter calls zfcp_erp_wait() to wait for all the
      open port recoveries during zfcp auto port scan, but in fact it waits for
      any pending recovery including an adapter recovery. This approach leads to
      a deadlock.  [see also v3.19 commit 18f87a67 ("zfcp: auto port scan
      resiliency"); v2.6.37 commit d3e1088d
      ("[SCSI] zfcp: No ERP escalation on gpn_ft eval");
      v2.6.28 commit fca55b6f
      ("[SCSI] zfcp: fix deadlock between wq triggered port scan and ERP")
      fixing v2.6.27 commit c57a39a4
      ("[SCSI] zfcp: wait until adapter is finished with ERP during auto-port");
      v2.6.27 commit cc8c2829
      ("[SCSI] zfcp: Automatically attach remote ports")]
      
      Instead make the accounting of missing SRBs atomic for parallel execution
      in both the ERP thread and adapter->stat_work.
      Signed-off-by: default avatarSteffen Maier <maier@linux.ibm.com>
      Fixes: d26ab06e ("[SCSI] zfcp: receiving an unsolicted status can lead to I/O stall")
      Cc: <stable@vger.kernel.org> #2.6.27+
      Reviewed-by: default avatarJens Remus <jremus@linux.ibm.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d15d1677