1. 30 Sep, 2016 3 commits
    • Eric Ren's avatar
      ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock() · c33f0785
      Eric Ren authored
      The testcase "mmaptruncate" of ocfs2-test deadlocks occasionally.
      
      In this testcase, we create a 2*CLUSTER_SIZE file and mmap() on it;
      there are 2 process repeatedly performing the following operations
      respectively: one is doing memset(mmaped_addr + 2*CLUSTER_SIZE - 1, 'a',
      1), while the another is playing ftruncate(fd, 2*CLUSTER_SIZE) and then
      ftruncate(fd, CLUSTER_SIZE) again and again.
      
      This is the backtrace when the deadlock happens:
      
         __wait_on_bit_lock+0x50/0xa0
         __lock_page+0xb7/0xc0
         ocfs2_write_begin_nolock+0x163f/0x1790 [ocfs2]
         ocfs2_page_mkwrite+0x1c7/0x2a0 [ocfs2]
         do_page_mkwrite+0x66/0xc0
         handle_mm_fault+0x685/0x1350
         __do_page_fault+0x1d8/0x4d0
         trace_do_page_fault+0x37/0xf0
         do_async_page_fault+0x19/0x70
         async_page_fault+0x28/0x30
      
      In ocfs2_write_begin_nolock(), we first grab the pages and then allocate
      disk space for this write; ocfs2_try_to_free_truncate_log() will be
      called if -ENOSPC is returned; if we're lucky to get enough clusters,
      which is usually the case, we start over again.
      
      But in ocfs2_free_write_ctxt() the target page isn't unlocked, so we
      will deadlock when trying to grab the target page again.
      
      Also, -ENOMEM might be returned in ocfs2_grab_pages_for_write().
      Another deadlock will happen in __do_page_mkwrite() if
      ocfs2_page_mkwrite() returns non-VM_FAULT_LOCKED, and along with a
      locked target page.
      
      These two errors fail on the same path, so fix them by unlocking the
      target page manually before ocfs2_free_write_ctxt().
      
      Jan Kara helps me clear out the JBD2 part, and suggest the hint for root
      cause.
      
      Changes since v1:
      1. Also put ENOMEM error case into consideration.
      
      Link: http://lkml.kernel.org/r/1474173902-32075-1-git-send-email-zren@suse.comSigned-off-by: default avatarEric Ren <zren@suse.com>
      Reviewed-by: default avatarHe Gang <ghe@suse.com>
      Acked-by: default avatarJoseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c33f0785
    • Johannes Weiner's avatar
      mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page() · 22f2ac51
      Johannes Weiner authored
      Antonio reports the following crash when using fuse under memory pressure:
      
        kernel BUG at /build/linux-a2WvEb/linux-4.4.0/mm/workingset.c:346!
        invalid opcode: 0000 [#1] SMP
        Modules linked in: all of them
        CPU: 2 PID: 63 Comm: kswapd0 Not tainted 4.4.0-36-generic #55-Ubuntu
        Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
        task: ffff88040cae6040 ti: ffff880407488000 task.ti: ffff880407488000
        RIP: shadow_lru_isolate+0x181/0x190
        Call Trace:
          __list_lru_walk_one.isra.3+0x8f/0x130
          list_lru_walk_one+0x23/0x30
          scan_shadow_nodes+0x34/0x50
          shrink_slab.part.40+0x1ed/0x3d0
          shrink_zone+0x2ca/0x2e0
          kswapd+0x51e/0x990
          kthread+0xd8/0xf0
          ret_from_fork+0x3f/0x70
      
      which corresponds to the following sanity check in the shadow node
      tracking:
      
        BUG_ON(node->count & RADIX_TREE_COUNT_MASK);
      
      The workingset code tracks radix tree nodes that exclusively contain
      shadow entries of evicted pages in them, and this (somewhat obscure)
      line checks whether there are real pages left that would interfere with
      reclaim of the radix tree node under memory pressure.
      
      While discussing ways how fuse might sneak pages into the radix tree
      past the workingset code, Miklos pointed to replace_page_cache_page(),
      and indeed there is a problem there: it properly accounts for the old
      page being removed - __delete_from_page_cache() does that - but then
      does a raw raw radix_tree_insert(), not accounting for the replacement
      page.  Eventually the page count bits in node->count underflow while
      leaving the node incorrectly linked to the shadow node LRU.
      
      To address this, make sure replace_page_cache_page() uses the tracked
      page insertion code, page_cache_tree_insert().  This fixes the page
      accounting and makes sure page-containing nodes are properly unlinked
      from the shadow node LRU again.
      
      Also, make the sanity checks a bit less obscure by using the helpers for
      checking the number of pages and shadows in a radix tree node.
      
      Fixes: 449dd698 ("mm: keep page cache radix tree nodes in check")
      Link: http://lkml.kernel.org/r/20160919155822.29498-1-hannes@cmpxchg.orgSigned-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Reported-by: default avatarAntonio SJ Musumeci <trapexit@spawn.link>
      Debugged-by: default avatarMiklos Szeredi <miklos@szeredi.hu>
      Cc: <stable@vger.kernel.org>	[3.15+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      22f2ac51
    • Linus Torvalds's avatar
      Merge tag 'drm-fixes-for-v4.8-final' of git://people.freedesktop.org/~airlied/linux · e3b3656c
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "drm fixes for final 4.8.
      
        One big regression fix for udl, along with two amdgpu fixes and two
        nouveau fixes.
      
        All seems pretty safe and useful"
      
      * tag 'drm-fixes-for-v4.8-final' of git://people.freedesktop.org/~airlied/linux:
        drm/udl: fix line iterator in damage handling
        drm/radeon/si/dpm: add workaround for for Jet parts
        drm/amdgpu: disable CRTCs before teardown
        drm/nouveau: Revert "bus: remove cpu_coherent flag"
        drm/nouveau/fifo/nv04: avoid ramht race against cookie insertion
      e3b3656c
  2. 29 Sep, 2016 1 commit
    • Linus Torvalds's avatar
      Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · c6169de7
      Linus Torvalds authored
      Pull libnvdimm fixes from Dan Williams:
      
       - Four fixes for "flush hint" support.
      
         Flush hints are addresses advertised by the ACPI 6+ NFIT (NVDIMM
         Firmware Interface Table) that when written and fenced guarantee that
         writes pending in platform write buffers (outside the cpu) have been
         flushed to media.  They might also be used by hypervisors as a
         trigger condition to flush guest-persistent memory ranges to storage.
      
          Fix a potential data corruption issue, a broken definition of the
          hint array, a wrong allocation size for the unit test implementation
          of the flush hint table, and missing NULL check in an error path.
      
          The unit test, while it did not prevent these bugs from being
          merged, at least triggered occasional crashes in advance of
          production usages.
      
       - Fix handling of ACPI DSM error status results.  The DSM mechanism
         allows communication with platform and memory device firmware.  We
         correctly parse known errors, but were silently ignoring others.
      
         Fix it to consistently fail any command with a non-zero status return
         that we otherwise do not interpret / handle.
      
      * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        libnvdimm, region: fix flush hint table thinko
        nfit: fail DSMs that return non-zero status by default
        libnvdimm: fix devm_nvdimm_memremap() error path
        tools/testing/nvdimm: fix allocation range for mock flush hint tables
        nvdimm: fix PHYS_PFN/PFN_PHYS mixup
      c6169de7
  3. 28 Sep, 2016 10 commits
  4. 27 Sep, 2016 3 commits
  5. 26 Sep, 2016 3 commits
  6. 25 Sep, 2016 7 commits
    • Lorenzo Stoakes's avatar
      mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing · 38e08854
      Lorenzo Stoakes authored
      The NUMA balancing logic uses an arch-specific PROT_NONE page table flag
      defined by pte_protnone() or pmd_protnone() to mark PTEs or huge page
      PMDs respectively as requiring balancing upon a subsequent page fault.
      User-defined PROT_NONE memory regions which also have this flag set will
      not normally invoke the NUMA balancing code as do_page_fault() will send
      a segfault to the process before handle_mm_fault() is even called.
      
      However if access_remote_vm() is invoked to access a PROT_NONE region of
      memory, handle_mm_fault() is called via faultin_page() and
      __get_user_pages() without any access checks being performed, meaning
      the NUMA balancing logic is incorrectly invoked on a non-NUMA memory
      region.
      
      A simple means of triggering this problem is to access PROT_NONE mmap'd
      memory using /proc/self/mem which reliably results in the NUMA handling
      functions being invoked when CONFIG_NUMA_BALANCING is set.
      
      This issue was reported in bugzilla (issue 99101) which includes some
      simple repro code.
      
      There are BUG_ON() checks in do_numa_page() and do_huge_pmd_numa_page()
      added at commit c0e7cad9 to avoid accidentally provoking strange
      behaviour by attempting to apply NUMA balancing to pages that are in
      fact PROT_NONE.  The BUG_ON()'s are consistently triggered by the repro.
      
      This patch moves the PROT_NONE check into mm/memory.c rather than
      invoking BUG_ON() as faulting in these pages via faultin_page() is a
      valid reason for reaching the NUMA check with the PROT_NONE page table
      flag set and is therefore not always a bug.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=99101Reported-by: default avatarTrevor Saunders <tbsaunde@tbsaunde.org>
      Signed-off-by: default avatarLorenzo Stoakes <lstoakes@gmail.com>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      38e08854
    • Linus Torvalds's avatar
      Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus · 831e45d8
      Linus Torvalds authored
      Pull MIPS fixes from Ralf Baechle:
       "A round of 4.8 fixes:
      
        MIPS generic code:
         - Add a missing ".set pop" in an early commit
         - Fix memory regions reaching top of physical
         - MAAR: Fix address alignment
         - vDSO: Fix Malta EVA mapping to vDSO page structs
         - uprobes: fix incorrect uprobe brk handling
         - uprobes: select HAVE_REGS_AND_STACK_ACCESS_API
         - Avoid a BUG warning during PR_SET_FP_MODE prctl
         - SMP: Fix possibility of deadlock when bringing CPUs online
         - R6: Remove compact branch policy Kconfig entries
         - Fix size calc when avoiding IPIs for small icache flushes
         - Fix pre-r6 emulation FPU initialisation
         - Fix delay slot emulation count in debugfs
      
        ATH79:
         - Fix test for error return of clk_register_fixed_factor.
      
        Octeon:
         - Fix kernel header to work for VDSO build.
         - Fix initialization of platform device probing.
      
        paravirt:
         - Fix undefined reference to smp_bootstrap"
      
      * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
        MIPS: Fix delay slot emulation count in debugfs
        MIPS: SMP: Fix possibility of deadlock when bringing CPUs online
        MIPS: Fix pre-r6 emulation FPU initialisation
        MIPS: vDSO: Fix Malta EVA mapping to vDSO page structs
        MIPS: Select HAVE_REGS_AND_STACK_ACCESS_API
        MIPS: Octeon: Fix platform bus probing
        MIPS: Octeon: mangle-port: fix build failure with VDSO code
        MIPS: Avoid a BUG warning during prctl(PR_SET_FP_MODE, ...)
        MIPS: c-r4k: Fix size calc when avoiding IPIs for small icache flushes
        MIPS: Add a missing ".set pop" in an early commit
        MIPS: paravirt: Fix undefined reference to smp_bootstrap
        MIPS: Remove compact branch policy Kconfig entries
        MIPS: MAAR: Fix address alignment
        MIPS: Fix memory regions reaching top of physical
        MIPS: uprobes: fix incorrect uprobe brk handling
        MIPS: ath79: Fix test for error return of clk_register_fixed_factor().
      831e45d8
    • Linus Torvalds's avatar
      Merge tag 'powerpc-4.8-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 751b9a5d
      Linus Torvalds authored
      Pull one more powerpc fix from Michael Ellerman:
       "powernv/pci: Fix m64 checks for SR-IOV and window alignment from
        Russell Currey"
      
      * tag 'powerpc-4.8-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/powernv/pci: Fix m64 checks for SR-IOV and window alignment
      751b9a5d
    • Linus Torvalds's avatar
      radix tree: fix sibling entry handling in radix_tree_descend() · 8d2c0d36
      Linus Torvalds authored
      The fixes to the radix tree test suite show that the multi-order case is
      broken.  The basic reason is that the radix tree code uses tagged
      pointers with the "internal" bit in the low bits, and calculating the
      pointer indices was supposed to mask off those bits.  But gcc will
      notice that we then use the index to re-create the pointer, and will
      avoid doing the arithmetic and use the tagged pointer directly.
      
      This cleans the code up, using the existing is_sibling_entry() helper to
      validate the sibling pointer range (instead of open-coding it), and
      using entry_to_node() to mask off the low tag bit from the pointer.  And
      once you do that, you might as well just use the now cleaned-up pointer
      directly.
      
      [ Side note: the multi-order code isn't actually ever used in the kernel
        right now, and the only reason I didn't just delete all that code is
        that Kirill Shutemov piped up and said:
      
          "Well, my ext4-with-huge-pages patchset[1] uses multi-order entries.
           It also converts shmem-with-huge-pages and hugetlb to them.
      
           I'm okay with converting it to other mechanism, but I need
           something.  (I looked into Konstantin's RFC patchset[2].  It looks
           okay, but I don't feel myself qualified to review it as I don't
           know much about radix-tree internals.)"
      
        [1] http://lkml.kernel.org/r/20160915115523.29737-1-kirill.shutemov@linux.intel.com
        [2] http://lkml.kernel.org/r/147230727479.9957.1087787722571077339.stgit@zurg ]
      Reported-by: default avatarMatthew Wilcox <mawilcox@microsoft.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Cedric Blancher <cedric.blancher@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8d2c0d36
    • Matthew Wilcox's avatar
      radix tree test suite: Test radix_tree_replace_slot() for multiorder entries · 62fd5258
      Matthew Wilcox authored
      When we replace a multiorder entry, check that all indices reflect the
      new value.
      
      Also, compile the test suite with -O2, which shows other problems with
      the code due to some dodgy pointer operations in the radix tree code.
      Signed-off-by: default avatarMatthew Wilcox <mawilcox@microsoft.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      62fd5258
    • Al Viro's avatar
      fix memory leaks in tracing_buffers_splice_read() · 1ae2293d
      Al Viro authored
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      1ae2293d
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Move mutex to protect against resetting of seq data · 1245800c
      Steven Rostedt (Red Hat) authored
      The iter->seq can be reset outside the protection of the mutex. So can
      reading of user data. Move the mutex up to the beginning of the function.
      
      Fixes: d7350c3f ("tracing/core: make the read callbacks reentrants")
      Cc: stable@vger.kernel.org # 2.6.30+
      Reported-by: default avatarAl Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      1245800c
  7. 24 Sep, 2016 11 commits
  8. 23 Sep, 2016 2 commits