1. 19 Feb, 2015 29 commits
  2. 24 Jan, 2015 11 commits
    • Sasha Levin's avatar
      Linux 3.8.13.38 · 69368f50
      Sasha Levin authored
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      69368f50
    • Chris Mason's avatar
      Btrfs: don't delay inode ref updates during log replay · e79b64e2
      Chris Mason authored
      Commit 1d52c78a (Btrfs: try not to ENOSPC on log replay) added a
      check to skip delayed inode updates during log replay because it
      confuses the enospc code.  But the delayed processing will end up
      ignoring delayed refs from log replay because the inode itself wasn't
      put through the delayed code.
      
      This can end up triggering a warning at commit time:
      
      WARNING: CPU: 2 PID: 778 at fs/btrfs/delayed-inode.c:1410 btrfs_assert_delayed_root_empty+0x32/0x34()
      
      Which is repeated for each commit because we never process the delayed
      inode ref update.
      
      The fix used here is to change btrfs_delayed_delete_inode_ref to return
      an error if we're currently in log replay.  The caller will do the ref
      deletion immediately and everything will work properly.
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      cc: stable@vger.kernel.org # v3.18 and any stable series that picked 1d52c78a
      
      (cherry picked from commit 6f896054)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      e79b64e2
    • Thomas Petazzoni's avatar
      ARM: mvebu: disable I/O coherency on non-SMP situations on Armada 370/375/38x/XP · d7b5e1a4
      Thomas Petazzoni authored
      Enabling the hardware I/O coherency on Armada 370, Armada 375, Armada
      38x and Armada XP requires a certain number of conditions:
      
       - On Armada 370, the cache policy must be set to write-allocate.
      
       - On Armada 375, 38x and XP, the cache policy must be set to
         write-allocate, the pages must be mapped with the shareable
         attribute, and the SMP bit must be set
      
      Currently, on Armada XP, when CONFIG_SMP is enabled, those conditions
      are met. However, when Armada XP is used in a !CONFIG_SMP kernel, none
      of these conditions are met. With Armada 370, the situation is worse:
      since the processor is single core, regardless of whether CONFIG_SMP
      or !CONFIG_SMP is used, the cache policy will be set to write-back by
      the kernel and not write-allocate.
      
      Since solving this problem turns out to be quite complicated, and we
      don't want to let users with a mainline kernel known to have
      infrequent but existing data corruptions, this commit proposes to
      simply disable hardware I/O coherency in situations where it is known
      not to work.
      
      And basically, the is_smp() function of the kernel tells us whether it
      is OK to enable hardware I/O coherency or not, so this commit slightly
      refactors the coherency_type() function to return
      COHERENCY_FABRIC_TYPE_NONE when is_smp() is false, or the appropriate
      type of the coherency fabric in the other case.
      
      Thanks to this, the I/O coherency fabric will no longer be used at all
      in !CONFIG_SMP configurations. It will continue to be used in
      CONFIG_SMP configurations on Armada XP, Armada 375 and Armada 38x
      (which are multiple cores processors), but will no longer be used on
      Armada 370 (which is a single core processor).
      
      In the process, it simplifies the implementation of the
      coherency_type() function, and adds a missing call to of_node_put().
      Signed-off-by: default avatarThomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Fixes: e60304f8 ("arm: mvebu: Add hardware I/O Coherency support")
      Cc: <stable@vger.kernel.org> # v3.8+
      Acked-by: default avatarGregory CLEMENT <gregory.clement@free-electrons.com>
      Link: https://lkml.kernel.org/r/1415871540-20302-3-git-send-email-thomas.petazzoni@free-electrons.comSigned-off-by: default avatarJason Cooper <jason@lakedaemon.net>
      
      (cherry picked from commit e5535545)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      d7b5e1a4
    • Rob Herring's avatar
      pstore-ram: Fix hangs by using write-combine mappings · f388c700
      Rob Herring authored
      Currently trying to use pstore on at least ARMs can hang as we're
      mapping the peristent RAM with pgprot_noncached().
      
      On ARMs, pgprot_noncached() will actually make the memory strongly
      ordered, and as the atomic operations pstore uses are implementation
      defined for strongly ordered memory, they may not work. So basically
      atomic operations have undefined behavior on ARM for device or strongly
      ordered memory types.
      
      Let's fix the issue by using write-combine variants for mappings. This
      corresponds to normal, non-cacheable memory on ARM. For many other
      architectures, this change does not change the mapping type as by
      default we have:
      
      #define pgprot_writecombine pgprot_noncached
      
      The reason why pgprot_noncached() was originaly used for pstore
      is because Colin Cross <ccross@android.com> had observed lost
      debug prints right before a device hanging write operation on some
      systems. For the platforms supporting pgprot_noncached(), we can
      add a an optional configuration option to support that. But let's
      get pstore working first before adding new features.
      
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Anton Vorontsov <cbouatmailru@gmail.com>
      Cc: Colin Cross <ccross@android.com>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: linux-kernel@vger.kernel.org
      Cc: stable@vger.kernel.org
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarRob Herring <rob.herring@calxeda.com>
      [tony@atomide.com: updated description]
      Signed-off-by: default avatarTony Lindgren <tony@atomide.com>
      Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
      
      (cherry picked from commit 7ae9cb81)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      f388c700
    • Kirill A. Shutemov's avatar
      thp: close race between split and zap huge pages · 33eff6f8
      Kirill A. Shutemov authored
      Sasha Levin has reported two THP BUGs[1][2].  I believe both of them
      have the same root cause.  Let's look to them one by one.
      
      The first bug[1] is "kernel BUG at mm/huge_memory.c:1829!".  It's
      BUG_ON(mapcount != page_mapcount(page)) in __split_huge_page().  From my
      testing I see that page_mapcount() is higher than mapcount here.
      
      I think it happens due to race between zap_huge_pmd() and
      page_check_address_pmd().  page_check_address_pmd() misses PMD which is
      under zap:
      
      	CPU0						CPU1
      						zap_huge_pmd()
      						  pmdp_get_and_clear()
      __split_huge_page()
        anon_vma_interval_tree_foreach()
          __split_huge_page_splitting()
            page_check_address_pmd()
              mm_find_pmd()
      	  /*
      	   * We check if PMD present without taking ptl: no
      	   * serialization against zap_huge_pmd(). We miss this PMD,
      	   * it's not accounted to 'mapcount' in __split_huge_page().
      	   */
      	  pmd_present(pmd) == 0
      
        BUG_ON(mapcount != page_mapcount(page)) // CRASH!!!
      
      						  page_remove_rmap(page)
      						    atomic_add_negative(-1, &page->_mapcount)
      
      The second bug[2] is "kernel BUG at mm/huge_memory.c:1371!".
      It's VM_BUG_ON_PAGE(!PageHead(page), page) in zap_huge_pmd().
      
      This happens in similar way:
      
      	CPU0						CPU1
      						zap_huge_pmd()
      						  pmdp_get_and_clear()
      						  page_remove_rmap(page)
      						    atomic_add_negative(-1, &page->_mapcount)
      __split_huge_page()
        anon_vma_interval_tree_foreach()
          __split_huge_page_splitting()
            page_check_address_pmd()
              mm_find_pmd()
      	  pmd_present(pmd) == 0	/* The same comment as above */
        /*
         * No crash this time since we already decremented page->_mapcount in
         * zap_huge_pmd().
         */
        BUG_ON(mapcount != page_mapcount(page))
      
        /*
         * We split the compound page here into small pages without
         * serialization against zap_huge_pmd()
         */
        __split_huge_page_refcount()
      						VM_BUG_ON_PAGE(!PageHead(page), page); // CRASH!!!
      
      So my understanding the problem is pmd_present() check in mm_find_pmd()
      without taking page table lock.
      
      The bug was introduced by me commit with commit 117b0791. Sorry for
      that. :(
      
      Let's open code mm_find_pmd() in page_check_address_pmd() and do the
      check under page table lock.
      
      Note that __page_check_address() does the same for PTE entires
      if sync != 0.
      
      I've stress tested split and zap code paths for 36+ hours by now and
      don't see crashes with the patch applied. Before it took <20 min to
      trigger the first bug and few hours for second one (if we ignore
      first).
      
      [1] https://lkml.kernel.org/g/<53440991.9090001@oracle.com>
      [2] https://lkml.kernel.org/g/<5310C56C.60709@oracle.com>
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Tested-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Cc: Bob Liu <lliubbo@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <stable@vger.kernel.org>	[3.13+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      
      (cherry picked from commit b5a8cad3)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      33eff6f8
    • Peter Zijlstra's avatar
      perf/x86: Correctly use FEATURE_PDCM · c8191779
      Peter Zijlstra authored
      The current code simply assumes Intel Arch PerfMon v2+ to have
      the IA32_PERF_CAPABILITIES MSR; the SDM specifies that we should check
      CPUID[1].ECX[15] (aka, FEATURE_PDCM) instead.
      
      This was found by KVM which implements v2+ but didn't provide the
      capabilities MSR. Change the code to DTRT; KVM will also implement the
      MSR and return 0.
      
      Cc: pbonzini@redhat.com
      Reported-by: default avatar"Michael S. Tsirkin" <mst@redhat.com>
      Suggested-by: default avatarEduardo Habkost <ehabkost@redhat.com>
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20140203132903.GI8874@twins.programming.kicks-ass.netSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      
      (cherry picked from commit c9b08884)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      c8191779
    • Hugh Dickins's avatar
      mm: let mm_find_pmd fix buggy race with THP fault · 4619ceab
      Hugh Dickins authored
      Trinity has reported:
      
          BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
          IP: __lock_acquire (kernel/locking/lockdep.c:3070 (discriminator 1))
          CPU: 6 PID: 16173 Comm: trinity-c364 Tainted: G        W
                                  3.15.0-rc1-next-20140415-sasha-00020-gaa90d09 #398
          lock_acquire (arch/x86/include/asm/current.h:14
                        kernel/locking/lockdep.c:3602)
          _raw_spin_lock (include/linux/spinlock_api_smp.h:143
                          kernel/locking/spinlock.c:151)
          remove_migration_pte (mm/migrate.c:137)
          rmap_walk (mm/rmap.c:1628 mm/rmap.c:1699)
          remove_migration_ptes (mm/migrate.c:224)
          migrate_pages (mm/migrate.c:922 mm/migrate.c:960 mm/migrate.c:1126)
          migrate_misplaced_page (mm/migrate.c:1733)
          __handle_mm_fault (mm/memory.c:3762 mm/memory.c:3812 mm/memory.c:3925)
          handle_mm_fault (mm/memory.c:3948)
          __get_user_pages (mm/memory.c:1851)
          __mlock_vma_pages_range (mm/mlock.c:255)
          __mm_populate (mm/mlock.c:711)
          SyS_mlockall (include/linux/mm.h:1799 mm/mlock.c:817 mm/mlock.c:791)
      
      I believe this comes about because, whereas collapsing and splitting THP
      functions take anon_vma lock in write mode (which excludes concurrent
      rmap walks), faulting THP functions (write protection and misplaced
      NUMA) do not - and mostly they do not need to.
      
      But they do use a pmdp_clear_flush(), set_pmd_at() sequence which, for
      an instant (indeed, for a long instant, given the inter-CPU TLB flush in
      there), leaves *pmd neither present not trans_huge.
      
      Which can confuse a concurrent rmap walk, as when removing migration
      ptes, seen in the dumped trace.  Although that rmap walk has a 4k page
      to insert, anon_vmas containing THPs are in no way segregated from
      4k-page anon_vmas, so the 4k-intent mm_find_pmd() does need to cope with
      that instant when a trans_huge pmd is temporarily absent.
      
      I don't think we need strengthen the locking at the THP end: it's easily
      handled with an ACCESS_ONCE() before testing both conditions.
      
      And since mm_find_pmd() had only one caller who wanted a THP rather than
      a pmd, let's slightly repurpose it to fail when it hits a THP or
      non-present pmd, and open code split_huge_page_address() again.
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Christoph Lameter <cl@gentwo.org>
      Cc: Dave Jones <davej@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      
      (cherry picked from commit f72e7dcd)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      4619ceab
    • Johan Hovold's avatar
      mfd: viperboard: Fix platform-device id collision · 7812dfbe
      Johan Hovold authored
      Allow more than one viperboard to be connected by registering with
      PLATFORM_DEVID_AUTO instead of PLATFORM_DEVID_NONE.
      
      The subdevices are currently registered with PLATFORM_DEVID_NONE, which
      will cause a name collision on the platform bus when a second viperboard
      is plugged in:
      
      viperboard 1-2.4:1.0: version 0.00 found at bus 001 address 004
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 181 at /home/johan/work/omicron/src/linux/fs/sysfs/dir.c:31 sysfs_warn_dup+0x74/0x84()
      sysfs: cannot create duplicate filename '/bus/platform/devices/viperboard-gpio'
      Modules linked in: i2c_viperboard viperboard netconsole [last unloaded: viperboard]
      CPU: 0 PID: 181 Comm: bash Tainted: G        W      3.17.0-rc6 #1
      [<c0016bf4>] (unwind_backtrace) from [<c0013860>] (show_stack+0x20/0x24)
      [<c0013860>] (show_stack) from [<c04305f8>] (dump_stack+0x24/0x28)
      [<c04305f8>] (dump_stack) from [<c0040fb4>] (warn_slowpath_common+0x80/0x98)
      [<c0040fb4>] (warn_slowpath_common) from [<c004100c>] (warn_slowpath_fmt+0x40/0x48)
      [<c004100c>] (warn_slowpath_fmt) from [<c016f1bc>] (sysfs_warn_dup+0x74/0x84)
      [<c016f1bc>] (sysfs_warn_dup) from [<c016f548>] (sysfs_do_create_link_sd.isra.2+0xcc/0xd0)
      [<c016f548>] (sysfs_do_create_link_sd.isra.2) from [<c016f588>] (sysfs_create_link+0x3c/0x48)
      [<c016f588>] (sysfs_create_link) from [<c02867ec>] (bus_add_device+0x12c/0x1e0)
      [<c02867ec>] (bus_add_device) from [<c0284820>] (device_add+0x410/0x584)
      [<c0284820>] (device_add) from [<c0289440>] (platform_device_add+0xd8/0x26c)
      [<c0289440>] (platform_device_add) from [<c02a5ae4>] (mfd_add_device+0x240/0x344)
      [<c02a5ae4>] (mfd_add_device) from [<c02a5ce0>] (mfd_add_devices+0xb8/0x110)
      [<c02a5ce0>] (mfd_add_devices) from [<bf00d1c8>] (vprbrd_probe+0x160/0x1b0 [viperboard])
      [<bf00d1c8>] (vprbrd_probe [viperboard]) from [<c030c000>] (usb_probe_interface+0x1bc/0x2a8)
      [<c030c000>] (usb_probe_interface) from [<c028768c>] (driver_probe_device+0x14c/0x3ac)
      [<c028768c>] (driver_probe_device) from [<c02879e4>] (__driver_attach+0xa4/0xa8)
      [<c02879e4>] (__driver_attach) from [<c0285698>] (bus_for_each_dev+0x70/0xa4)
      [<c0285698>] (bus_for_each_dev) from [<c0287030>] (driver_attach+0x2c/0x30)
      [<c0287030>] (driver_attach) from [<c030a288>] (usb_store_new_id+0x170/0x1ac)
      [<c030a288>] (usb_store_new_id) from [<c030a2f8>] (new_id_store+0x34/0x3c)
      [<c030a2f8>] (new_id_store) from [<c02853ec>] (drv_attr_store+0x30/0x3c)
      [<c02853ec>] (drv_attr_store) from [<c016eaa8>] (sysfs_kf_write+0x5c/0x60)
      [<c016eaa8>] (sysfs_kf_write) from [<c016dc68>] (kernfs_fop_write+0xd4/0x194)
      [<c016dc68>] (kernfs_fop_write) from [<c010fe40>] (vfs_write+0xb4/0x1c0)
      [<c010fe40>] (vfs_write) from [<c01104a8>] (SyS_write+0x4c/0xa0)
      [<c01104a8>] (SyS_write) from [<c000f900>] (ret_fast_syscall+0x0/0x48)
      ---[ end trace 98e8603c22d65817 ]---
      viperboard 1-2.4:1.0: Failed to add mfd devices to core.
      viperboard: probe of 1-2.4:1.0 failed with error -17
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarLee Jones <lee.jones@linaro.org>
      
      (cherry picked from commit b6684228)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      7812dfbe
    • Linus Walleij's avatar
      mfd: stmpe: Fix STMPE24xx GPMR LSB · 3f81d65f
      Linus Walleij authored
      The least significat byte of the GPIO value read register
      on the STMPE24xx series is on addres 0xA4 not 0xA5. Correct
      against datasheet and tested on the STMPE2401 hardware.
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarLee Jones <lee.jones@linaro.org>
      
      (cherry picked from commit 871c3cf4)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      3f81d65f
    • Eric W. Biederman's avatar
      umount: Disallow unprivileged mount force · 3e58285f
      Eric W. Biederman authored
      Forced unmount affects not just the mount namespace but the underlying
      superblock as well.  Restrict forced unmount to the global root user
      for now.  Otherwise it becomes possible a user in a less privileged
      mount namespace to force the shutdown of a superblock of a filesystem
      in a more privileged mount namespace, allowing a DOS attack on root.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      
      (cherry picked from commit b2f5d4dc)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      3e58285f
    • Francesco Ruggeri's avatar
      tty: Fix pty master poll() after slave closes v2 · 47bbcc35
      Francesco Ruggeri authored
      commit c4dc3046 upstream.
      
      Commit f95499c3 ("n_tty: Don't wait for buffer work in read() loop")
      introduces a race window where a pty master can be signalled that the pty
      slave was closed before all the data that the slave wrote is delivered.
      Commit f8747d4a ("tty: Fix pty master read() after slave closes") fixed the
      problem in case of n_tty_read, but the problem still exists for n_tty_poll.
      This can be seen by running 'for ((i=0; i<100;i++));do ./test.py ;done'
      where test.py is:
      
      import os, select, pty
      
      (pid, pty_fd) = pty.fork()
      
      if pid == 0:
         os.write(1, 'This string should be received by parent')
      else:
         poller = select.epoll()
         poller.register( pty_fd, select.EPOLLIN )
         ready = poller.poll( 1 * 1000 )
         for fd, events in ready:
            if not events & select.EPOLLIN:
               print 'missed POLLIN event'
            else:
               print os.read(fd, 100)
         poller.close()
      
      The string from the slave is missed several times.
      This patch takes the same approach as the fix for read and special cases
      this condition for poll.
      Tested on 3.16.
      Signed-off-by: default avatarFrancesco Ruggeri <fruggeri@arista.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      (cherry picked from commit 6b96727b)
      47bbcc35