1. 13 Jun, 2017 24 commits
  2. 08 Jun, 2017 16 commits
    • Yisheng Xie's avatar
      mlock: fix mlock count can not decrease in race condition · 00fc586e
      Yisheng Xie authored
      [ Upstream commit 70feee0e ]
      
      Kefeng reported that when running the follow test, the mlock count in
      meminfo will increase permanently:
      
       [1] testcase
       linux:~ # cat test_mlockal
       grep Mlocked /proc/meminfo
        for j in `seq 0 10`
        do
       	for i in `seq 4 15`
       	do
       		./p_mlockall >> log &
       	done
       	sleep 0.2
       done
       # wait some time to let mlock counter decrease and 5s may not enough
       sleep 5
       grep Mlocked /proc/meminfo
      
       linux:~ # cat p_mlockall.c
       #include <sys/mman.h>
       #include <stdlib.h>
       #include <stdio.h>
      
       #define SPACE_LEN	4096
      
       int main(int argc, char ** argv)
       {
      	 	int ret;
      	 	void *adr = malloc(SPACE_LEN);
      	 	if (!adr)
      	 		return -1;
      
      	 	ret = mlockall(MCL_CURRENT | MCL_FUTURE);
      	 	printf("mlcokall ret = %d\n", ret);
      
      	 	ret = munlockall();
      	 	printf("munlcokall ret = %d\n", ret);
      
      	 	free(adr);
      	 	return 0;
      	 }
      
      In __munlock_pagevec() we should decrement NR_MLOCK for each page where
      we clear the PageMlocked flag.  Commit 1ebb7cc6 ("mm: munlock: batch
      NR_MLOCK zone state updates") has introduced a bug where we don't
      decrement NR_MLOCK for pages where we clear the flag, but fail to
      isolate them from the lru list (e.g.  when the pages are on some other
      cpu's percpu pagevec).  Since PageMlocked stays cleared, the NR_MLOCK
      accounting gets permanently disrupted by this.
      
      Fix it by counting the number of page whose PageMlock flag is cleared.
      
      Fixes: 1ebb7cc6 (" mm: munlock: batch NR_MLOCK zone state updates")
      Link: http://lkml.kernel.org/r/1495678405-54569-1-git-send-email-xieyisheng1@huawei.comSigned-off-by: default avatarYisheng Xie <xieyisheng1@huawei.com>
      Reported-by: default avatarKefeng Wang <wangkefeng.wang@huawei.com>
      Tested-by: default avatarKefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: zhongjiang <zhongjiang@huawei.com>
      Cc: Hanjun Guo <guohanjun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      00fc586e
    • Naoya Horiguchi's avatar
      mm/memory-failure: introduce get_hwpoison_page() for consistent refcount handling · 00136071
      Naoya Horiguchi authored
      [ Upstream commit ead07f6a ]
      
      memory_failure() can run in 2 different mode (specified by
      MF_COUNT_INCREASED) in page refcount perspective.  When
      MF_COUNT_INCREASED is set, memory_failure() assumes that the caller
      takes a refcount of the target page.  And if cleared, memory_failure()
      takes it in it's own.
      
      In current code, however, refcounting is done differently in each caller.
      For example, madvise_hwpoison() uses get_user_pages_fast() and
      hwpoison_inject() uses get_page_unless_zero().  So this inconsistent
      refcounting causes refcount failure especially for thp tail pages.
      Typical user visible effects are like memory leak or
      VM_BUG_ON_PAGE(!page_count(page)) in isolate_lru_page().
      
      To fix this refcounting issue, this patch introduces get_hwpoison_page()
      to handle thp tail pages in the same manner for each caller of hwpoison
      code.
      
      memory_failure() might fail to split thp and in such case it returns
      without completing page isolation.  This is not good because PageHWPoison
      on the thp is still set and there's no easy way to unpoison such thps.  So
      this patch try to roll back any action to the thp in "non anonymous thp"
      case and "thp split failed" case, expecting an MCE(SRAR) generated by
      later access afterward will properly free such thps.
      
      [akpm@linux-foundation.org: fix CONFIG_HWPOISON_INJECT=m]
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      00136071
    • Naoya Horiguchi's avatar
      mm/memory-failure: split thp earlier in memory error handling · da7cbd0c
      Naoya Horiguchi authored
      [ Upstream commit 415c64c1 ]
      
      memory_failure() doesn't handle thp itself at this time and need to split
      it before doing isolation.  Currently thp is split in the middle of
      hwpoison_user_mappings(), but there're corner cases where memory_failure()
      wrongly tries to handle thp without splitting.
      
      1) "non anonymous" thp, which is not a normal operating mode of thp,
         but a memory error could hit a thp before anon_vma is initialized.  In
         such case, split_huge_page() fails and me_huge_page() (intended for
         hugetlb) is called for thp, which triggers BUG_ON in page_hstate().
      
      2) !PageLRU case, where hwpoison_user_mappings() returns with
         SWAP_SUCCESS and the result is the same as case 1.
      
      memory_failure() can't avoid splitting, so let's split it more earlier,
      which also reduces code which are prepared for both of normal page and
      thp.
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      da7cbd0c
    • Thomas Gleixner's avatar
      slub/memcg: cure the brainless abuse of sysfs attributes · aeb3435b
      Thomas Gleixner authored
      [ Upstream commit 478fe303 ]
      
      memcg_propagate_slab_attrs() abuses the sysfs attribute file functions
      to propagate settings from the root kmem_cache to a newly created
      kmem_cache.  It does that with:
      
           attr->show(root, buf);
           attr->store(new, buf, strlen(bug);
      
      Aside of being a lazy and absurd hackery this is broken because it does
      not check the return value of the show() function.
      
      Some of the show() functions return 0 w/o touching the buffer.  That
      means in such a case the store function is called with the stale content
      of the previous show().  That causes nonsense like invoking
      kmem_cache_shrink() on a newly created kmem_cache.  In the worst case it
      would cause handing in an uninitialized buffer.
      
      This should be rewritten proper by adding a propagate() callback to
      those slub_attributes which must be propagated and avoid that insane
      conversion to and from ASCII, but that's too large for a hot fix.
      
      Check at least the return value of the show() function, so calling
      store() with stale content is prevented.
      
      Steven said:
       "It can cause a deadlock with get_online_cpus() that has been uncovered
        by recent cpu hotplug and lockdep changes that Thomas and Peter have
        been doing.
      
           Possible unsafe locking scenario:
      
                 CPU0                    CPU1
                 ----                    ----
            lock(cpu_hotplug.lock);
                                         lock(slab_mutex);
                                         lock(cpu_hotplug.lock);
            lock(slab_mutex);
      
           *** DEADLOCK ***"
      
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1705201244540.2255@nanosSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reported-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      aeb3435b
    • Tejun Heo's avatar
      blkcg: use blkg_free() in blkcg_init_queue() failure path · afc6ec14
      Tejun Heo authored
      [ Upstream commit 994b7832 ]
      
      When blkcg_init_queue() fails midway after creating a new blkg, it
      performs kfree() directly; however, this doesn't free the policy data
      areas.  Make it use blkg_free() instead.  In turn, blkg_free() is
      updated to handle root request_list special case.
      
      While this fixes a possible memory leak, it's on an unlikely failure
      path of an already cold path and the size leaked per occurrence is
      miniscule too.  I don't think it needs to be tagged for -stable.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      afc6ec14
    • Tejun Heo's avatar
      blkcg: always create the blkcg_gq for the root blkcg · f9fac98f
      Tejun Heo authored
      [ Upstream commit ec13b1d6 ]
      
      Currently, blkcg does a minor optimization where the root blkcg is
      created when the first blkcg policy is activated on a queue and
      destroyed on the deactivation of the last.  On systems where blkcg is
      configured but not used, this saves one blkcg_gq struct per queue.  On
      systems where blkcg is actually used, there's no difference.  The only
      case where this can lead to any meaninful, albeit still minute, save
      in memory consumption is when all blkcg policies are deactivated after
      being widely used in the system, which is a hihgly unlikely scenario.
      
      The conditional existence of root blkcg_gq has already created several
      bugs in blkcg and became an issue once again for the new per-cgroup
      wb_congested mechanism for cgroup writeback support leading to a NULL
      dereference when no blkcg policy is active.  This is really not worth
      bothering with.  This patch makes blkcg always allocate and link the
      root blkcg_gq and release it only on queue destruction.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      f9fac98f
    • Herbert Xu's avatar
      iscsi-target: Use shash and ahash · 712b6a6d
      Herbert Xu authored
      [ Upstream commit 69110e3c ]
      
      This patch replaces uses of the long obsolete hash interface with
      either shash (for non-SG users) or ahash.
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      712b6a6d
    • Alexei Potashnik's avatar
      target/iscsi: Use proper SGL accessors for digest computation · 1bd31de3
      Alexei Potashnik authored
      [ Upstream commit aa75679c ]
      
      Current implementation assumes that all the buffers of an IO are linked
      with a single SG list, which is OK because target-core is only allocating
      a contigious scatterlist region.  However, this assumption is wrong for
      se_cmd descriptors that want to use chaining across multiple SGL regions.
      
      Fix this up by using proper SGL accessors for digest payload computation.
      Signed-off-by: default avatarAlexei Potashnik <alexei@purestorage.com>
      Cc: Roland Dreier <roland@purestorage.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      1bd31de3
    • Nicholas Bellinger's avatar
      iscsi-target: Fix initial login PDU asynchronous socket close OOPs · 89ff28d0
      Nicholas Bellinger authored
      [ Upstream commit 25cdda95 ]
      
      This patch fixes a OOPs originally introduced by:
      
         commit bb048357
         Author: Nicholas Bellinger <nab@linux-iscsi.org>
         Date:   Thu Sep 5 14:54:04 2013 -0700
      
         iscsi-target: Add sk->sk_state_change to cleanup after TCP failure
      
      which would trigger a NULL pointer dereference when a TCP connection
      was closed asynchronously via iscsi_target_sk_state_change(), but only
      when the initial PDU processing in iscsi_target_do_login() from iscsi_np
      process context was blocked waiting for backend I/O to complete.
      
      To address this issue, this patch makes the following changes.
      
      First, it introduces some common helper functions used for checking
      socket closing state, checking login_flags, and atomically checking
      socket closing state + setting login_flags.
      
      Second, it introduces a LOGIN_FLAGS_INITIAL_PDU bit to know when a TCP
      connection has dropped via iscsi_target_sk_state_change(), but the
      initial PDU processing within iscsi_target_do_login() in iscsi_np
      context is still running.  For this case, it sets LOGIN_FLAGS_CLOSED,
      but doesn't invoke schedule_delayed_work().
      
      The original NULL pointer dereference case reported by MNC is now handled
      by iscsi_target_do_login() doing a iscsi_target_sk_check_close() before
      transitioning to FFP to determine when the socket has already closed,
      or iscsi_target_start_negotiation() if the login needs to exchange
      more PDUs (eg: iscsi_target_do_login returned 0) but the socket has
      closed.  For both of these cases, the cleanup up of remaining connection
      resources will occur in iscsi_target_start_negotiation() from iscsi_np
      process context once the failure is detected.
      
      Finally, to handle to case where iscsi_target_sk_state_change() is
      called after the initial PDU procesing is complete, it now invokes
      conn->login_work -> iscsi_target_do_login_rx() to perform cleanup once
      existing iscsi_target_sk_check_close() checks detect connection failure.
      For this case, the cleanup of remaining connection resources will occur
      in iscsi_target_do_login_rx() from delayed workqueue process context
      once the failure is detected.
      Reported-by: default avatarMike Christie <mchristi@redhat.com>
      Reviewed-by: default avatarMike Christie <mchristi@redhat.com>
      Tested-by: default avatarMike Christie <mchristi@redhat.com>
      Cc: Mike Christie <mchristi@redhat.com>
      Reported-by: default avatarHannes Reinecke <hare@suse.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: Varun Prakash <varun@chelsio.com>
      Cc: <stable@vger.kernel.org> # v3.12+
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      89ff28d0
    • Bart Van Assche's avatar
      target/iscsi: Fix indentation in iscsi_target_start_negotiation() · 09cb399b
      Bart Van Assche authored
      [ Upstream commit 1efaa949 ]
      
      This patch avoids that smatch complains about inconsistent
      indentation in iscsi_target_start_negotiation().
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      09cb399b
    • Nicholas Bellinger's avatar
      iscsi-target: Fix early sk_data_ready LOGIN_FLAGS_READY race · 68185cb1
      Nicholas Bellinger authored
      [ Upstream commit 8f0dfb3d ]
      
      There is a iscsi-target/tcp login race in LOGIN_FLAGS_READY
      state assignment that can result in frequent errors during
      iscsi discovery:
      
            "iSCSI Login negotiation failed."
      
      To address this bug, move the initial LOGIN_FLAGS_READY
      assignment ahead of iscsi_target_do_login() when handling
      the initial iscsi_target_start_negotiation() request PDU
      during connection login.
      
      As iscsi_target_do_login_rx() work_struct callback is
      clearing LOGIN_FLAGS_READ_ACTIVE after subsequent calls
      to iscsi_target_do_login(), the early sk_data_ready
      ahead of the first iscsi_target_do_login() expects
      LOGIN_FLAGS_READY to also be set for the initial
      login request PDU.
      
      As reported by Maged, this was first obsered using an
      MSFT initiator running across multiple VMWare host
      virtual machines with iscsi-target/tcp.
      Reported-by: default avatarMaged Mokhtar <mmokhtar@binarykinetics.com>
      Tested-by: default avatarMaged Mokhtar <mmokhtar@binarykinetics.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      68185cb1
    • David Arcari's avatar
      cpufreq: cpufreq_register_driver() should return -ENODEV if init fails · 5df474e6
      David Arcari authored
      [ Upstream commit 6c770036 ]
      
      For a driver that does not set the CPUFREQ_STICKY flag, if all of the
      ->init() calls fail, cpufreq_register_driver() should return an error.
      This will prevent the driver from loading.
      
      Fixes: ce1bcfe9 (cpufreq: check cpufreq_policy_list instead of scanning policies for all CPUs)
      Cc: 4.0+ <stable@vger.kernel.org> # 4.0+
      Signed-off-by: default avatarDavid Arcari <darcari@redhat.com>
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      5df474e6
    • Eric Anholt's avatar
      drm/msm: Expose our reservation object when exporting a dmabuf. · 7e144ca4
      Eric Anholt authored
      [ Upstream commit 43523eba ]
      
      Without this, polling on the dma-buf (and presumably other devices
      synchronizing against our rendering) would return immediately, even
      while the BO was busy.
      Signed-off-by: default avatarEric Anholt <eric@anholt.net>
      Reviewed-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Cc: stable@vger.kernel.org
      Cc: Rob Clark <robdclark@gmail.com>
      Cc: linux-arm-msm@vger.kernel.org
      Cc: freedreno@lists.freedesktop.org
      Reviewed-by: default avatarRob Clark <robdclark@gmail.com>
      Signed-off-by: default avatarRob Clark <robdclark@gmail.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      7e144ca4
    • Jan Kara's avatar
      xfs: Fix missed holes in SEEK_HOLE implementation · 7e185e00
      Jan Kara authored
      [ Upstream commit 5375023a ]
      
      XFS SEEK_HOLE implementation could miss a hole in an unwritten extent as
      can be seen by the following command:
      
      xfs_io -c "falloc 0 256k" -c "pwrite 0 56k" -c "pwrite 128k 8k"
             -c "seek -h 0" file
      wrote 57344/57344 bytes at offset 0
      56 KiB, 14 ops; 0.0000 sec (49.312 MiB/sec and 12623.9856 ops/sec)
      wrote 8192/8192 bytes at offset 131072
      8 KiB, 2 ops; 0.0000 sec (70.383 MiB/sec and 18018.0180 ops/sec)
      Whence	Result
      HOLE	139264
      
      Where we can see that hole at offset 56k was just ignored by SEEK_HOLE
      implementation. The bug is in xfs_find_get_desired_pgoff() which does
      not properly detect the case when pages are not contiguous.
      
      Fix the problem by properly detecting when found page has larger offset
      than expected.
      
      CC: stable@vger.kernel.org
      Fixes: d126d43fSigned-off-by: default avatarJan Kara <jack@suse.cz>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      7e185e00
    • Eryu Guan's avatar
      xfs: fix off-by-one on max nr_pages in xfs_find_get_desired_pgoff() · 59acce81
      Eryu Guan authored
      [ Upstream commit 8affebe1 ]
      
      xfs_find_get_desired_pgoff() is used to search for offset of hole or
      data in page range [index, end] (both inclusive), and the max number
      of pages to search should be at least one, if end == index.
      Otherwise the only page is missed and no hole or data is found,
      which is not correct.
      
      When block size is smaller than page size, this can be demonstrated
      by preallocating a file with size smaller than page size and writing
      data to the last block. E.g. run this xfs_io command on a 1k block
      size XFS on x86_64 host.
      
        # xfs_io -fc "falloc 0 3k" -c "pwrite 2k 1k" \
        	    -c "seek -d 0" /mnt/xfs/testfile
        wrote 1024/1024 bytes at offset 2048
        1 KiB, 1 ops; 0.0000 sec (33.675 MiB/sec and 34482.7586 ops/sec)
        Whence  Result
        DATA    EOF
      
      Data at offset 2k was missed, and lseek(2) returned ENXIO.
      
      This is uncovered by generic/285 subtest 07 and 08 on ppc64 host,
      where pagesize is 64k. Because a recent change to generic/285
      reduced the preallocated file size to smaller than 64k.
      
      Cc: stable@vger.kernel.org # v3.7+
      Signed-off-by: default avatarEryu Guan <eguan@redhat.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      59acce81
    • Lyude's avatar
      drm/radeon: Unbreak HPD handling for r600+ · b96e5f18
      Lyude authored
      [ Upstream commit 3d18e337 ]
      
      We end up reading the interrupt register for HPD5, and then writing it
      to HPD6 which on systems without anything using HPD5 results in
      permanently disabling hotplug on one of the display outputs after the
      first time we acknowledge a hotplug interrupt from the GPU.
      
      This code is really bad. But for now, let's just fix this. I will
      hopefully have a large patch series to refactor all of this soon.
      Reviewed-by: default avatarChristian König <christian.koenig@amd.com>
      Signed-off-by: default avatarLyude <lyude@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      b96e5f18