1. 13 Jun, 2017 13 commits
  2. 08 Jun, 2017 27 commits
    • Yisheng Xie's avatar
      mlock: fix mlock count can not decrease in race condition · 00fc586e
      Yisheng Xie authored
      [ Upstream commit 70feee0e ]
      
      Kefeng reported that when running the follow test, the mlock count in
      meminfo will increase permanently:
      
       [1] testcase
       linux:~ # cat test_mlockal
       grep Mlocked /proc/meminfo
        for j in `seq 0 10`
        do
       	for i in `seq 4 15`
       	do
       		./p_mlockall >> log &
       	done
       	sleep 0.2
       done
       # wait some time to let mlock counter decrease and 5s may not enough
       sleep 5
       grep Mlocked /proc/meminfo
      
       linux:~ # cat p_mlockall.c
       #include <sys/mman.h>
       #include <stdlib.h>
       #include <stdio.h>
      
       #define SPACE_LEN	4096
      
       int main(int argc, char ** argv)
       {
      	 	int ret;
      	 	void *adr = malloc(SPACE_LEN);
      	 	if (!adr)
      	 		return -1;
      
      	 	ret = mlockall(MCL_CURRENT | MCL_FUTURE);
      	 	printf("mlcokall ret = %d\n", ret);
      
      	 	ret = munlockall();
      	 	printf("munlcokall ret = %d\n", ret);
      
      	 	free(adr);
      	 	return 0;
      	 }
      
      In __munlock_pagevec() we should decrement NR_MLOCK for each page where
      we clear the PageMlocked flag.  Commit 1ebb7cc6 ("mm: munlock: batch
      NR_MLOCK zone state updates") has introduced a bug where we don't
      decrement NR_MLOCK for pages where we clear the flag, but fail to
      isolate them from the lru list (e.g.  when the pages are on some other
      cpu's percpu pagevec).  Since PageMlocked stays cleared, the NR_MLOCK
      accounting gets permanently disrupted by this.
      
      Fix it by counting the number of page whose PageMlock flag is cleared.
      
      Fixes: 1ebb7cc6 (" mm: munlock: batch NR_MLOCK zone state updates")
      Link: http://lkml.kernel.org/r/1495678405-54569-1-git-send-email-xieyisheng1@huawei.comSigned-off-by: default avatarYisheng Xie <xieyisheng1@huawei.com>
      Reported-by: default avatarKefeng Wang <wangkefeng.wang@huawei.com>
      Tested-by: default avatarKefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: zhongjiang <zhongjiang@huawei.com>
      Cc: Hanjun Guo <guohanjun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      00fc586e
    • Naoya Horiguchi's avatar
      mm/memory-failure: introduce get_hwpoison_page() for consistent refcount handling · 00136071
      Naoya Horiguchi authored
      [ Upstream commit ead07f6a ]
      
      memory_failure() can run in 2 different mode (specified by
      MF_COUNT_INCREASED) in page refcount perspective.  When
      MF_COUNT_INCREASED is set, memory_failure() assumes that the caller
      takes a refcount of the target page.  And if cleared, memory_failure()
      takes it in it's own.
      
      In current code, however, refcounting is done differently in each caller.
      For example, madvise_hwpoison() uses get_user_pages_fast() and
      hwpoison_inject() uses get_page_unless_zero().  So this inconsistent
      refcounting causes refcount failure especially for thp tail pages.
      Typical user visible effects are like memory leak or
      VM_BUG_ON_PAGE(!page_count(page)) in isolate_lru_page().
      
      To fix this refcounting issue, this patch introduces get_hwpoison_page()
      to handle thp tail pages in the same manner for each caller of hwpoison
      code.
      
      memory_failure() might fail to split thp and in such case it returns
      without completing page isolation.  This is not good because PageHWPoison
      on the thp is still set and there's no easy way to unpoison such thps.  So
      this patch try to roll back any action to the thp in "non anonymous thp"
      case and "thp split failed" case, expecting an MCE(SRAR) generated by
      later access afterward will properly free such thps.
      
      [akpm@linux-foundation.org: fix CONFIG_HWPOISON_INJECT=m]
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      00136071
    • Naoya Horiguchi's avatar
      mm/memory-failure: split thp earlier in memory error handling · da7cbd0c
      Naoya Horiguchi authored
      [ Upstream commit 415c64c1 ]
      
      memory_failure() doesn't handle thp itself at this time and need to split
      it before doing isolation.  Currently thp is split in the middle of
      hwpoison_user_mappings(), but there're corner cases where memory_failure()
      wrongly tries to handle thp without splitting.
      
      1) "non anonymous" thp, which is not a normal operating mode of thp,
         but a memory error could hit a thp before anon_vma is initialized.  In
         such case, split_huge_page() fails and me_huge_page() (intended for
         hugetlb) is called for thp, which triggers BUG_ON in page_hstate().
      
      2) !PageLRU case, where hwpoison_user_mappings() returns with
         SWAP_SUCCESS and the result is the same as case 1.
      
      memory_failure() can't avoid splitting, so let's split it more earlier,
      which also reduces code which are prepared for both of normal page and
      thp.
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      da7cbd0c
    • Thomas Gleixner's avatar
      slub/memcg: cure the brainless abuse of sysfs attributes · aeb3435b
      Thomas Gleixner authored
      [ Upstream commit 478fe303 ]
      
      memcg_propagate_slab_attrs() abuses the sysfs attribute file functions
      to propagate settings from the root kmem_cache to a newly created
      kmem_cache.  It does that with:
      
           attr->show(root, buf);
           attr->store(new, buf, strlen(bug);
      
      Aside of being a lazy and absurd hackery this is broken because it does
      not check the return value of the show() function.
      
      Some of the show() functions return 0 w/o touching the buffer.  That
      means in such a case the store function is called with the stale content
      of the previous show().  That causes nonsense like invoking
      kmem_cache_shrink() on a newly created kmem_cache.  In the worst case it
      would cause handing in an uninitialized buffer.
      
      This should be rewritten proper by adding a propagate() callback to
      those slub_attributes which must be propagated and avoid that insane
      conversion to and from ASCII, but that's too large for a hot fix.
      
      Check at least the return value of the show() function, so calling
      store() with stale content is prevented.
      
      Steven said:
       "It can cause a deadlock with get_online_cpus() that has been uncovered
        by recent cpu hotplug and lockdep changes that Thomas and Peter have
        been doing.
      
           Possible unsafe locking scenario:
      
                 CPU0                    CPU1
                 ----                    ----
            lock(cpu_hotplug.lock);
                                         lock(slab_mutex);
                                         lock(cpu_hotplug.lock);
            lock(slab_mutex);
      
           *** DEADLOCK ***"
      
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1705201244540.2255@nanosSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reported-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      aeb3435b
    • Tejun Heo's avatar
      blkcg: use blkg_free() in blkcg_init_queue() failure path · afc6ec14
      Tejun Heo authored
      [ Upstream commit 994b7832 ]
      
      When blkcg_init_queue() fails midway after creating a new blkg, it
      performs kfree() directly; however, this doesn't free the policy data
      areas.  Make it use blkg_free() instead.  In turn, blkg_free() is
      updated to handle root request_list special case.
      
      While this fixes a possible memory leak, it's on an unlikely failure
      path of an already cold path and the size leaked per occurrence is
      miniscule too.  I don't think it needs to be tagged for -stable.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      afc6ec14
    • Tejun Heo's avatar
      blkcg: always create the blkcg_gq for the root blkcg · f9fac98f
      Tejun Heo authored
      [ Upstream commit ec13b1d6 ]
      
      Currently, blkcg does a minor optimization where the root blkcg is
      created when the first blkcg policy is activated on a queue and
      destroyed on the deactivation of the last.  On systems where blkcg is
      configured but not used, this saves one blkcg_gq struct per queue.  On
      systems where blkcg is actually used, there's no difference.  The only
      case where this can lead to any meaninful, albeit still minute, save
      in memory consumption is when all blkcg policies are deactivated after
      being widely used in the system, which is a hihgly unlikely scenario.
      
      The conditional existence of root blkcg_gq has already created several
      bugs in blkcg and became an issue once again for the new per-cgroup
      wb_congested mechanism for cgroup writeback support leading to a NULL
      dereference when no blkcg policy is active.  This is really not worth
      bothering with.  This patch makes blkcg always allocate and link the
      root blkcg_gq and release it only on queue destruction.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      f9fac98f
    • Herbert Xu's avatar
      iscsi-target: Use shash and ahash · 712b6a6d
      Herbert Xu authored
      [ Upstream commit 69110e3c ]
      
      This patch replaces uses of the long obsolete hash interface with
      either shash (for non-SG users) or ahash.
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      712b6a6d
    • Alexei Potashnik's avatar
      target/iscsi: Use proper SGL accessors for digest computation · 1bd31de3
      Alexei Potashnik authored
      [ Upstream commit aa75679c ]
      
      Current implementation assumes that all the buffers of an IO are linked
      with a single SG list, which is OK because target-core is only allocating
      a contigious scatterlist region.  However, this assumption is wrong for
      se_cmd descriptors that want to use chaining across multiple SGL regions.
      
      Fix this up by using proper SGL accessors for digest payload computation.
      Signed-off-by: default avatarAlexei Potashnik <alexei@purestorage.com>
      Cc: Roland Dreier <roland@purestorage.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      1bd31de3
    • Nicholas Bellinger's avatar
      iscsi-target: Fix initial login PDU asynchronous socket close OOPs · 89ff28d0
      Nicholas Bellinger authored
      [ Upstream commit 25cdda95 ]
      
      This patch fixes a OOPs originally introduced by:
      
         commit bb048357
         Author: Nicholas Bellinger <nab@linux-iscsi.org>
         Date:   Thu Sep 5 14:54:04 2013 -0700
      
         iscsi-target: Add sk->sk_state_change to cleanup after TCP failure
      
      which would trigger a NULL pointer dereference when a TCP connection
      was closed asynchronously via iscsi_target_sk_state_change(), but only
      when the initial PDU processing in iscsi_target_do_login() from iscsi_np
      process context was blocked waiting for backend I/O to complete.
      
      To address this issue, this patch makes the following changes.
      
      First, it introduces some common helper functions used for checking
      socket closing state, checking login_flags, and atomically checking
      socket closing state + setting login_flags.
      
      Second, it introduces a LOGIN_FLAGS_INITIAL_PDU bit to know when a TCP
      connection has dropped via iscsi_target_sk_state_change(), but the
      initial PDU processing within iscsi_target_do_login() in iscsi_np
      context is still running.  For this case, it sets LOGIN_FLAGS_CLOSED,
      but doesn't invoke schedule_delayed_work().
      
      The original NULL pointer dereference case reported by MNC is now handled
      by iscsi_target_do_login() doing a iscsi_target_sk_check_close() before
      transitioning to FFP to determine when the socket has already closed,
      or iscsi_target_start_negotiation() if the login needs to exchange
      more PDUs (eg: iscsi_target_do_login returned 0) but the socket has
      closed.  For both of these cases, the cleanup up of remaining connection
      resources will occur in iscsi_target_start_negotiation() from iscsi_np
      process context once the failure is detected.
      
      Finally, to handle to case where iscsi_target_sk_state_change() is
      called after the initial PDU procesing is complete, it now invokes
      conn->login_work -> iscsi_target_do_login_rx() to perform cleanup once
      existing iscsi_target_sk_check_close() checks detect connection failure.
      For this case, the cleanup of remaining connection resources will occur
      in iscsi_target_do_login_rx() from delayed workqueue process context
      once the failure is detected.
      Reported-by: default avatarMike Christie <mchristi@redhat.com>
      Reviewed-by: default avatarMike Christie <mchristi@redhat.com>
      Tested-by: default avatarMike Christie <mchristi@redhat.com>
      Cc: Mike Christie <mchristi@redhat.com>
      Reported-by: default avatarHannes Reinecke <hare@suse.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: Varun Prakash <varun@chelsio.com>
      Cc: <stable@vger.kernel.org> # v3.12+
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      89ff28d0
    • Bart Van Assche's avatar
      target/iscsi: Fix indentation in iscsi_target_start_negotiation() · 09cb399b
      Bart Van Assche authored
      [ Upstream commit 1efaa949 ]
      
      This patch avoids that smatch complains about inconsistent
      indentation in iscsi_target_start_negotiation().
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      09cb399b
    • Nicholas Bellinger's avatar
      iscsi-target: Fix early sk_data_ready LOGIN_FLAGS_READY race · 68185cb1
      Nicholas Bellinger authored
      [ Upstream commit 8f0dfb3d ]
      
      There is a iscsi-target/tcp login race in LOGIN_FLAGS_READY
      state assignment that can result in frequent errors during
      iscsi discovery:
      
            "iSCSI Login negotiation failed."
      
      To address this bug, move the initial LOGIN_FLAGS_READY
      assignment ahead of iscsi_target_do_login() when handling
      the initial iscsi_target_start_negotiation() request PDU
      during connection login.
      
      As iscsi_target_do_login_rx() work_struct callback is
      clearing LOGIN_FLAGS_READ_ACTIVE after subsequent calls
      to iscsi_target_do_login(), the early sk_data_ready
      ahead of the first iscsi_target_do_login() expects
      LOGIN_FLAGS_READY to also be set for the initial
      login request PDU.
      
      As reported by Maged, this was first obsered using an
      MSFT initiator running across multiple VMWare host
      virtual machines with iscsi-target/tcp.
      Reported-by: default avatarMaged Mokhtar <mmokhtar@binarykinetics.com>
      Tested-by: default avatarMaged Mokhtar <mmokhtar@binarykinetics.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      68185cb1
    • David Arcari's avatar
      cpufreq: cpufreq_register_driver() should return -ENODEV if init fails · 5df474e6
      David Arcari authored
      [ Upstream commit 6c770036 ]
      
      For a driver that does not set the CPUFREQ_STICKY flag, if all of the
      ->init() calls fail, cpufreq_register_driver() should return an error.
      This will prevent the driver from loading.
      
      Fixes: ce1bcfe9 (cpufreq: check cpufreq_policy_list instead of scanning policies for all CPUs)
      Cc: 4.0+ <stable@vger.kernel.org> # 4.0+
      Signed-off-by: default avatarDavid Arcari <darcari@redhat.com>
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      5df474e6
    • Eric Anholt's avatar
      drm/msm: Expose our reservation object when exporting a dmabuf. · 7e144ca4
      Eric Anholt authored
      [ Upstream commit 43523eba ]
      
      Without this, polling on the dma-buf (and presumably other devices
      synchronizing against our rendering) would return immediately, even
      while the BO was busy.
      Signed-off-by: default avatarEric Anholt <eric@anholt.net>
      Reviewed-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Cc: stable@vger.kernel.org
      Cc: Rob Clark <robdclark@gmail.com>
      Cc: linux-arm-msm@vger.kernel.org
      Cc: freedreno@lists.freedesktop.org
      Reviewed-by: default avatarRob Clark <robdclark@gmail.com>
      Signed-off-by: default avatarRob Clark <robdclark@gmail.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      7e144ca4
    • Jan Kara's avatar
      xfs: Fix missed holes in SEEK_HOLE implementation · 7e185e00
      Jan Kara authored
      [ Upstream commit 5375023a ]
      
      XFS SEEK_HOLE implementation could miss a hole in an unwritten extent as
      can be seen by the following command:
      
      xfs_io -c "falloc 0 256k" -c "pwrite 0 56k" -c "pwrite 128k 8k"
             -c "seek -h 0" file
      wrote 57344/57344 bytes at offset 0
      56 KiB, 14 ops; 0.0000 sec (49.312 MiB/sec and 12623.9856 ops/sec)
      wrote 8192/8192 bytes at offset 131072
      8 KiB, 2 ops; 0.0000 sec (70.383 MiB/sec and 18018.0180 ops/sec)
      Whence	Result
      HOLE	139264
      
      Where we can see that hole at offset 56k was just ignored by SEEK_HOLE
      implementation. The bug is in xfs_find_get_desired_pgoff() which does
      not properly detect the case when pages are not contiguous.
      
      Fix the problem by properly detecting when found page has larger offset
      than expected.
      
      CC: stable@vger.kernel.org
      Fixes: d126d43fSigned-off-by: default avatarJan Kara <jack@suse.cz>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      7e185e00
    • Eryu Guan's avatar
      xfs: fix off-by-one on max nr_pages in xfs_find_get_desired_pgoff() · 59acce81
      Eryu Guan authored
      [ Upstream commit 8affebe1 ]
      
      xfs_find_get_desired_pgoff() is used to search for offset of hole or
      data in page range [index, end] (both inclusive), and the max number
      of pages to search should be at least one, if end == index.
      Otherwise the only page is missed and no hole or data is found,
      which is not correct.
      
      When block size is smaller than page size, this can be demonstrated
      by preallocating a file with size smaller than page size and writing
      data to the last block. E.g. run this xfs_io command on a 1k block
      size XFS on x86_64 host.
      
        # xfs_io -fc "falloc 0 3k" -c "pwrite 2k 1k" \
        	    -c "seek -d 0" /mnt/xfs/testfile
        wrote 1024/1024 bytes at offset 2048
        1 KiB, 1 ops; 0.0000 sec (33.675 MiB/sec and 34482.7586 ops/sec)
        Whence  Result
        DATA    EOF
      
      Data at offset 2k was missed, and lseek(2) returned ENXIO.
      
      This is uncovered by generic/285 subtest 07 and 08 on ppc64 host,
      where pagesize is 64k. Because a recent change to generic/285
      reduced the preallocated file size to smaller than 64k.
      
      Cc: stable@vger.kernel.org # v3.7+
      Signed-off-by: default avatarEryu Guan <eguan@redhat.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      59acce81
    • Lyude's avatar
      drm/radeon: Unbreak HPD handling for r600+ · b96e5f18
      Lyude authored
      [ Upstream commit 3d18e337 ]
      
      We end up reading the interrupt register for HPD5, and then writing it
      to HPD6 which on systems without anything using HPD5 results in
      permanently disabling hotplug on one of the display outputs after the
      first time we acknowledge a hotplug interrupt from the GPU.
      
      This code is really bad. But for now, let's just fix this. I will
      hopefully have a large patch series to refactor all of this soon.
      Reviewed-by: default avatarChristian König <christian.koenig@amd.com>
      Signed-off-by: default avatarLyude <lyude@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      b96e5f18
    • Alexander Sverdlin's avatar
      dmaengine: ep93xx: Don't drain the transfers in terminate_all() · 81402e40
      Alexander Sverdlin authored
      [ Upstream commit 98f9de36 ]
      
      Draining the transfers in terminate_all callback happens with IRQs disabled,
      therefore induces huge latency:
      
       irqsoff latency trace v1.1.5 on 4.11.0
       --------------------------------------------------------------------
       latency: 39770 us, #57/57, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0)
          -----------------
          | task: process-129 (uid:0 nice:0 policy:2 rt_prio:50)
          -----------------
        => started at: _snd_pcm_stream_lock_irqsave
        => ended at:   snd_pcm_stream_unlock_irqrestore
      
                        _------=> CPU#
                       / _-----=> irqs-off
                      | / _----=> need-resched
                      || / _---=> hardirq/softirq
                      ||| / _--=> preempt-depth
                      |||| /     delay
        cmd     pid   ||||| time  |   caller
           \   /      |||||  \    |   /
      process-129     0d.s.    3us : _snd_pcm_stream_lock_irqsave
      process-129     0d.s1    9us : snd_pcm_stream_lock <-_snd_pcm_stream_lock_irqsave
      process-129     0d.s1   15us : preempt_count_add <-snd_pcm_stream_lock
      process-129     0d.s2   22us : preempt_count_add <-snd_pcm_stream_lock
      process-129     0d.s3   32us : snd_pcm_update_hw_ptr0 <-snd_pcm_period_elapsed
      process-129     0d.s3   41us : soc_pcm_pointer <-snd_pcm_update_hw_ptr0
      process-129     0d.s3   50us : dmaengine_pcm_pointer <-soc_pcm_pointer
      process-129     0d.s3   58us+: snd_dmaengine_pcm_pointer_no_residue <-dmaengine_pcm_pointer
      process-129     0d.s3   96us : update_audio_tstamp <-snd_pcm_update_hw_ptr0
      process-129     0d.s3  103us : snd_pcm_update_state <-snd_pcm_update_hw_ptr0
      process-129     0d.s3  112us : xrun <-snd_pcm_update_state
      process-129     0d.s3  119us : snd_pcm_stop <-xrun
      process-129     0d.s3  126us : snd_pcm_action <-snd_pcm_stop
      process-129     0d.s3  134us : snd_pcm_action_single <-snd_pcm_action
      process-129     0d.s3  141us : snd_pcm_pre_stop <-snd_pcm_action_single
      process-129     0d.s3  150us : snd_pcm_do_stop <-snd_pcm_action_single
      process-129     0d.s3  157us : soc_pcm_trigger <-snd_pcm_do_stop
      process-129     0d.s3  166us : snd_dmaengine_pcm_trigger <-soc_pcm_trigger
      process-129     0d.s3  175us : ep93xx_dma_terminate_all <-snd_dmaengine_pcm_trigger
      process-129     0d.s3  182us : preempt_count_add <-ep93xx_dma_terminate_all
      process-129     0d.s4  189us*: m2p_hw_shutdown <-ep93xx_dma_terminate_all
      process-129     0d.s4 39472us : m2p_hw_setup <-ep93xx_dma_terminate_all
      
       ... rest skipped...
      
      process-129     0d.s. 40080us : <stack trace>
       => ep93xx_dma_tasklet
       => tasklet_action
       => __do_softirq
       => irq_exit
       => __handle_domain_irq
       => vic_handle_irq
       => __irq_usr
       => 0xb66c6668
      
      Just abort the transfers and warn if the HW state is not what we expect.
      Move draining into device_synchronize callback.
      Signed-off-by: default avatarAlexander Sverdlin <alexander.sverdlin@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarVinod Koul <vinod.koul@intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      81402e40
    • Alexander Sverdlin's avatar
      dmaengine: ep93xx: Always start from BASE0 · 1a45b842
      Alexander Sverdlin authored
      [ Upstream commit 0037ae47 ]
      
      The current buffer is being reset to zero on device_free_chan_resources()
      but not on device_terminate_all(). It could happen that HW is restarted and
      expects BASE0 to be used, but the driver is not synchronized and will start
      from BASE1. One solution is to reset the buffer explicitly in
      m2p_hw_setup().
      Signed-off-by: default avatarAlexander Sverdlin <alexander.sverdlin@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarVinod Koul <vinod.koul@intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      1a45b842
    • Patrik Jakobsson's avatar
      drm/gma500/psb: Actually use VBT mode when it is found · 72a5ed83
      Patrik Jakobsson authored
      [ Upstream commit 82bc9a42 ]
      
      With LVDS we were incorrectly picking the pre-programmed mode instead of
      the prefered mode provided by VBT. Make sure we pick the VBT mode if
      one is provided. It is likely that the mode read-out code is still wrong
      but this patch fixes the immediate problem on most machines.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78562
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarPatrik Jakobsson <patrik.r.jakobsson@gmail.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170418114332.12183-1-patrik.r.jakobsson@gmail.comSigned-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      72a5ed83
    • Rafael J. Wysocki's avatar
      PCI / PM: Avoid resuming more devices during system suspend · 4f268a10
      Rafael J. Wysocki authored
      [ Upstream commit 2cef548a ]
      
      Commit bac2a909 (PCI / PM: Avoid resuming PCI devices during
      system suspend) introduced a mechanism by which some PCI devices that
      were runtime-suspended at the system suspend time might be left in
      that state for the duration of the system suspend-resume cycle.
      However, it overlooked devices that were marked as capable of waking
      up the system just because PME support was detected in their PCI
      config space.
      
      Namely, in that case, device_can_wakeup(dev) returns 'true' for the
      device and if the device is not configured for system wakeup,
      device_may_wakeup(dev) returns 'false' and it will be resumed during
      system suspend even though configuring it for system wakeup may not
      really make sense at all.
      
      To avoid this problem, simply disable PME for PCI devices that have
      not been configured for system wakeup and are runtime-suspended at
      the system suspend time for the duration of the suspend-resume cycle.
      
      If the device is in D3cold, its config space is not available and it
      shouldn't be written to, but that's only possible if the device
      has platform PM support and the platform code is responsible for
      checking whether or not the device's configuration is suitable for
      system suspend in that case.
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      4f268a10
    • Tadeusz Struk's avatar
      PCI: Add quirk for Intel DH895xCC VF PCI config erratum · b060ae49
      Tadeusz Struk authored
      [ Upstream commit 3388a614 ]
      
      The PCI capabilities list for Intel DH895xCC VFs (device id 0x0443) with
      QuickAssist Technology is prematurely terminated in hardware.
      Workaround the issue by hard-coding the known expected next capability
      pointer and saving the PCIE cap into internal buffer.
      
      Patch generated against cryptodev-2.6
      Signed-off-by: default avatarTadeusz Struk <tadeusz.struk@intel.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      b060ae49
    • Alexander Tsoy's avatar
      ALSA: hda - apply STAC_9200_DELL_M22 quirk for Dell Latitude D430 · e0bda32c
      Alexander Tsoy authored
      [ Upstream commit 1fc2e41f ]
      
      This model is actually called 92XXM2-8 in Windows driver. But since pin
      configs for M22 and M28 are identical, just reuse M22 quirk.
      
      Fixes external microphone (tested) and probably docking station ports
      (not tested).
      Signed-off-by: default avatarAlexander Tsoy <alexander@tsoy.me>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      e0bda32c
    • Srinath Mannam's avatar
      mmc: sdhci-iproc: suppress spurious interrupt with Multiblock read · 9dbe42c5
      Srinath Mannam authored
      [ Upstream commit f5f968f2 ]
      
      The stingray SDHCI hardware supports ACMD12 and automatically
      issues after multi block transfer completed.
      
      If ACMD12 in SDHCI is disabled, spurious tx done interrupts are seen
      on multi block read command with below error message:
      
      Got data interrupt 0x00000002 even though no data
      operation was in progress.
      
      This patch uses SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12 to enable
      ACM12 support in SDHCI hardware and suppress spurious interrupt.
      Signed-off-by: default avatarSrinath Mannam <srinath.mannam@broadcom.com>
      Reviewed-by: default avatarRay Jui <ray.jui@broadcom.com>
      Reviewed-by: default avatarScott Branden <scott.branden@broadcom.com>
      Acked-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Fixes: b580c52d ("mmc: sdhci-iproc: add IPROC SDHCI driver")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      9dbe42c5
    • Sebastian Reichel's avatar
      i2c: i2c-tiny-usb: fix buffer not being DMA capable · 0210333e
      Sebastian Reichel authored
      [ Upstream commit 5165da59 ]
      
      Since v4.9 i2c-tiny-usb generates the below call trace
      and longer works, since it can't communicate with the
      USB device. The reason is, that since v4.9 the USB
      stack checks, that the buffer it should transfer is DMA
      capable. This was a requirement since v2.2 days, but it
      usually worked nevertheless.
      
      [   17.504959] ------------[ cut here ]------------
      [   17.505488] WARNING: CPU: 0 PID: 93 at drivers/usb/core/hcd.c:1587 usb_hcd_map_urb_for_dma+0x37c/0x570
      [   17.506545] transfer buffer not dma capable
      [   17.507022] Modules linked in:
      [   17.507370] CPU: 0 PID: 93 Comm: i2cdetect Not tainted 4.11.0-rc8+ #10
      [   17.508103] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
      [   17.509039] Call Trace:
      [   17.509320]  ? dump_stack+0x5c/0x78
      [   17.509714]  ? __warn+0xbe/0xe0
      [   17.510073]  ? warn_slowpath_fmt+0x5a/0x80
      [   17.510532]  ? nommu_map_sg+0xb0/0xb0
      [   17.510949]  ? usb_hcd_map_urb_for_dma+0x37c/0x570
      [   17.511482]  ? usb_hcd_submit_urb+0x336/0xab0
      [   17.511976]  ? wait_for_completion_timeout+0x12f/0x1a0
      [   17.512549]  ? wait_for_completion_timeout+0x65/0x1a0
      [   17.513125]  ? usb_start_wait_urb+0x65/0x160
      [   17.513604]  ? usb_control_msg+0xdc/0x130
      [   17.514061]  ? usb_xfer+0xa4/0x2a0
      [   17.514445]  ? __i2c_transfer+0x108/0x3c0
      [   17.514899]  ? i2c_transfer+0x57/0xb0
      [   17.515310]  ? i2c_smbus_xfer_emulated+0x12f/0x590
      [   17.515851]  ? _raw_spin_unlock_irqrestore+0x11/0x20
      [   17.516408]  ? i2c_smbus_xfer+0x125/0x330
      [   17.516876]  ? i2c_smbus_xfer+0x125/0x330
      [   17.517329]  ? i2cdev_ioctl_smbus+0x1c1/0x2b0
      [   17.517824]  ? i2cdev_ioctl+0x75/0x1c0
      [   17.518248]  ? do_vfs_ioctl+0x9f/0x600
      [   17.518671]  ? vfs_write+0x144/0x190
      [   17.519078]  ? SyS_ioctl+0x74/0x80
      [   17.519463]  ? entry_SYSCALL_64_fastpath+0x1e/0xad
      [   17.519959] ---[ end trace d047c04982f5ac50 ]---
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarSebastian Reichel <sebastian.reichel@collabora.co.uk>
      Reviewed-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: default avatarTill Harbaum <till@harbaum.org>
      Signed-off-by: default avatarWolfram Sang <wsa@the-dreams.de>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      0210333e
    • Chen, Gong's avatar
      x86/mce: Don't use percpu workqueues · 8852d28b
      Chen, Gong authored
      [ Upstream commit 061120ae ]
      
      An MCE is a rare event. Therefore, there's no need to have
      per-CPU instances of both normal and IRQ workqueues. Make them
      both global.
      Signed-off-by: default avatarChen, Gong <gong.chen@linux.intel.com>
      [ Fold in subsequent patch from Rui/Boris/Tony for early boot logging. ]
      Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
      [ Massage commit message. ]
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1439396985-12812-4-git-send-email-bp@alien8.deSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      8852d28b
    • Al Viro's avatar
      osf_wait4(): fix infoleak · 94d42e88
      Al Viro authored
      [ Upstream commit a8c39544 ]
      
      failing sys_wait4() won't fill struct rusage...
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      94d42e88
    • Wanpeng Li's avatar
      KVM: X86: Fix read out-of-bounds vulnerability in kvm pio emulation · 156c18c7
      Wanpeng Li authored
      [ Upstream commit cbfc6c91 ]
      
      Huawei folks reported a read out-of-bounds vulnerability in kvm pio emulation.
      
      - "inb" instruction to access PIT Mod/Command register (ioport 0x43, write only,
        a read should be ignored) in guest can get a random number.
      - "rep insb" instruction to access PIT register port 0x43 can control memcpy()
        in emulator_pio_in_emulated() to copy max 0x400 bytes but only read 1 bytes,
        which will disclose the unimportant kernel memory in host but no crash.
      
      The similar test program below can reproduce the read out-of-bounds vulnerability:
      
      void hexdump(void *mem, unsigned int len)
      {
              unsigned int i, j;
      
              for(i = 0; i < len + ((len % HEXDUMP_COLS) ? (HEXDUMP_COLS - len % HEXDUMP_COLS) : 0); i++)
              {
                      /* print offset */
                      if(i % HEXDUMP_COLS == 0)
                      {
                              printf("0x%06x: ", i);
                      }
      
                      /* print hex data */
                      if(i < len)
                      {
                              printf("%02x ", 0xFF & ((char*)mem)[i]);
                      }
                      else /* end of block, just aligning for ASCII dump */
                      {
                              printf("   ");
                      }
      
                      /* print ASCII dump */
                      if(i % HEXDUMP_COLS == (HEXDUMP_COLS - 1))
                      {
                              for(j = i - (HEXDUMP_COLS - 1); j <= i; j++)
                              {
                                      if(j >= len) /* end of block, not really printing */
                                      {
                                              putchar(' ');
                                      }
                                      else if(isprint(((char*)mem)[j])) /* printable char */
                                      {
                                              putchar(0xFF & ((char*)mem)[j]);
                                      }
                                      else /* other char */
                                      {
                                              putchar('.');
                                      }
                              }
                              putchar('\n');
                      }
              }
      }
      
      int main(void)
      {
      	int i;
      	if (iopl(3))
      	{
      		err(1, "set iopl unsuccessfully\n");
      		return -1;
      	}
      	static char buf[0x40];
      
      	/* test ioport 0x40,0x41,0x42,0x43,0x44,0x45 */
      
      	memset(buf, 0xab, sizeof(buf));
      
      	asm volatile("push %rdi;");
      	asm volatile("mov %0, %%rdi;"::"q"(buf));
      
      	asm volatile ("mov $0x40, %rdx;");
      	asm volatile ("in %dx,%al;");
      	asm volatile ("stosb;");
      
      	asm volatile ("mov $0x41, %rdx;");
      	asm volatile ("in %dx,%al;");
      	asm volatile ("stosb;");
      
      	asm volatile ("mov $0x42, %rdx;");
      	asm volatile ("in %dx,%al;");
      	asm volatile ("stosb;");
      
      	asm volatile ("mov $0x43, %rdx;");
      	asm volatile ("in %dx,%al;");
      	asm volatile ("stosb;");
      
      	asm volatile ("mov $0x44, %rdx;");
      	asm volatile ("in %dx,%al;");
      	asm volatile ("stosb;");
      
      	asm volatile ("mov $0x45, %rdx;");
      	asm volatile ("in %dx,%al;");
      	asm volatile ("stosb;");
      
      	asm volatile ("pop %rdi;");
      	hexdump(buf, 0x40);
      
      	printf("\n");
      
      	/* ins port 0x40 */
      
      	memset(buf, 0xab, sizeof(buf));
      
      	asm volatile("push %rdi;");
      	asm volatile("mov %0, %%rdi;"::"q"(buf));
      
      	asm volatile ("mov $0x20, %rcx;");
      	asm volatile ("mov $0x40, %rdx;");
      	asm volatile ("rep insb;");
      
      	asm volatile ("pop %rdi;");
      	hexdump(buf, 0x40);
      
      	printf("\n");
      
      	/* ins port 0x43 */
      
      	memset(buf, 0xab, sizeof(buf));
      
      	asm volatile("push %rdi;");
      	asm volatile("mov %0, %%rdi;"::"q"(buf));
      
      	asm volatile ("mov $0x20, %rcx;");
      	asm volatile ("mov $0x43, %rdx;");
      	asm volatile ("rep insb;");
      
      	asm volatile ("pop %rdi;");
      	hexdump(buf, 0x40);
      
      	printf("\n");
      	return 0;
      }
      
      The vcpu->arch.pio_data buffer is used by both in/out instrutions emulation
      w/o clear after using which results in some random datas are left over in
      the buffer. Guest reads port 0x43 will be ignored since it is write only,
      however, the function kernel_pio() can't distigush this ignore from successfully
      reads data from device's ioport. There is no new data fill the buffer from
      port 0x43, however, emulator_pio_in_emulated() will copy the stale data in
      the buffer to the guest unconditionally. This patch fixes it by clearing the
      buffer before in instruction emulation to avoid to grant guest the stale data
      in the buffer.
      
      In addition, string I/O is not supported for in kernel device. So there is no
      iteration to read ioport %RCX times for string I/O. The function kernel_pio()
      just reads one round, and then copy the io size * %RCX to the guest unconditionally,
      actually it copies the one round ioport data w/ other random datas which are left
      over in the vcpu->arch.pio_data buffer to the guest. This patch fixes it by
      introducing the string I/O support for in kernel device in order to grant the right
      ioport datas to the guest.
      
      Before the patch:
      
      0x000000: fe 38 93 93 ff ff ab ab .8......
      0x000008: ab ab ab ab ab ab ab ab ........
      0x000010: ab ab ab ab ab ab ab ab ........
      0x000018: ab ab ab ab ab ab ab ab ........
      0x000020: ab ab ab ab ab ab ab ab ........
      0x000028: ab ab ab ab ab ab ab ab ........
      0x000030: ab ab ab ab ab ab ab ab ........
      0x000038: ab ab ab ab ab ab ab ab ........
      
      0x000000: f6 00 00 00 00 00 00 00 ........
      0x000008: 00 00 00 00 00 00 00 00 ........
      0x000010: 00 00 00 00 4d 51 30 30 ....MQ00
      0x000018: 30 30 20 33 20 20 20 20 00 3
      0x000020: ab ab ab ab ab ab ab ab ........
      0x000028: ab ab ab ab ab ab ab ab ........
      0x000030: ab ab ab ab ab ab ab ab ........
      0x000038: ab ab ab ab ab ab ab ab ........
      
      0x000000: f6 00 00 00 00 00 00 00 ........
      0x000008: 00 00 00 00 00 00 00 00 ........
      0x000010: 00 00 00 00 4d 51 30 30 ....MQ00
      0x000018: 30 30 20 33 20 20 20 20 00 3
      0x000020: ab ab ab ab ab ab ab ab ........
      0x000028: ab ab ab ab ab ab ab ab ........
      0x000030: ab ab ab ab ab ab ab ab ........
      0x000038: ab ab ab ab ab ab ab ab ........
      
      After the patch:
      
      0x000000: 1e 02 f8 00 ff ff ab ab ........
      0x000008: ab ab ab ab ab ab ab ab ........
      0x000010: ab ab ab ab ab ab ab ab ........
      0x000018: ab ab ab ab ab ab ab ab ........
      0x000020: ab ab ab ab ab ab ab ab ........
      0x000028: ab ab ab ab ab ab ab ab ........
      0x000030: ab ab ab ab ab ab ab ab ........
      0x000038: ab ab ab ab ab ab ab ab ........
      
      0x000000: d2 e2 d2 df d2 db d2 d7 ........
      0x000008: d2 d3 d2 cf d2 cb d2 c7 ........
      0x000010: d2 c4 d2 c0 d2 bc d2 b8 ........
      0x000018: d2 b4 d2 b0 d2 ac d2 a8 ........
      0x000020: ab ab ab ab ab ab ab ab ........
      0x000028: ab ab ab ab ab ab ab ab ........
      0x000030: ab ab ab ab ab ab ab ab ........
      0x000038: ab ab ab ab ab ab ab ab ........
      
      0x000000: 00 00 00 00 00 00 00 00 ........
      0x000008: 00 00 00 00 00 00 00 00 ........
      0x000010: 00 00 00 00 00 00 00 00 ........
      0x000018: 00 00 00 00 00 00 00 00 ........
      0x000020: ab ab ab ab ab ab ab ab ........
      0x000028: ab ab ab ab ab ab ab ab ........
      0x000030: ab ab ab ab ab ab ab ab ........
      0x000038: ab ab ab ab ab ab ab ab ........
      Reported-by: default avatarMoguofang <moguofang@huawei.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Moguofang <moguofang@huawei.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      156c18c7