1. 19 Jan, 2017 24 commits
    • Wanpeng Li's avatar
      KVM: eventfd: fix NULL deref irqbypass consumer · 7caf473f
      Wanpeng Li authored
      commit 4f3dbdf4 upstream.
      
      Reported syzkaller:
      
          BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
          IP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass]
          PGD 0
      
          Oops: 0002 [#1] SMP
          CPU: 1 PID: 125 Comm: kworker/1:1 Not tainted 4.9.0+ #1
          Workqueue: kvm-irqfd-cleanup irqfd_shutdown [kvm]
          task: ffff9bbe0dfbb900 task.stack: ffffb61802014000
          RIP: 0010:irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass]
          Call Trace:
           irqfd_shutdown+0x66/0xa0 [kvm]
           process_one_work+0x16b/0x480
           worker_thread+0x4b/0x500
           kthread+0x101/0x140
           ? process_one_work+0x480/0x480
           ? kthread_create_on_node+0x60/0x60
           ret_from_fork+0x25/0x30
          RIP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass] RSP: ffffb61802017e20
          CR2: 0000000000000008
      
      The syzkaller folks reported a NULL pointer dereference that due to
      unregister an consumer which fails registration before. The syzkaller
      creates two VMs w/ an equal eventfd occasionally. So the second VM
      fails to register an irqbypass consumer. It will make irqfd as inactive
      and queue an workqueue work to shutdown irqfd and unregister the irqbypass
      consumer when eventfd is closed. However, the second consumer has been
      initialized though it fails registration. So the token(same as the first
      VM's) is taken to unregister the consumer through the workqueue, the
      consumer of the first VM is found and unregistered, then NULL deref incurred
      in the path of deleting consumer from the consumers list.
      
      This patch fixes it by making irq_bypass_register/unregister_consumer()
      looks for the consumer entry based on consumer pointer itself instead of
      token matching.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Suggested-by: default avatarAlex Williamson <alex.williamson@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7caf473f
    • Paolo Bonzini's avatar
      KVM: x86: fix emulation of "MOV SS, null selector" · 7718ffcf
      Paolo Bonzini authored
      commit 33ab9110 upstream.
      
      This is CVE-2017-2583.  On Intel this causes a failed vmentry because
      SS's type is neither 3 nor 7 (even though the manual says this check is
      only done for usable SS, and the dmesg splat says that SS is unusable!).
      On AMD it's worse: svm.c is confused and sets CPL to 0 in the vmcb.
      
      The fix fabricates a data segment descriptor when SS is set to a null
      selector, so that CPL and SS.DPL are set correctly in the VMCS/vmcb.
      Furthermore, only allow setting SS to a NULL selector if SS.RPL < 3;
      this in turn ensures CPL < 3 because RPL must be equal to CPL.
      
      Thanks to Andy Lutomirski and Willy Tarreau for help in analyzing
      the bug and deciphering the manuals.
      Reported-by: default avatarXiaohan Zhang <zhangxiaohan1@huawei.com>
      Fixes: 79d5b4c3Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7718ffcf
    • Mike Kravetz's avatar
      mm/hugetlb.c: fix reservation race when freeing surplus pages · 1e26cec6
      Mike Kravetz authored
      commit e5bbc8a6 upstream.
      
      return_unused_surplus_pages() decrements the global reservation count,
      and frees any unused surplus pages that were backing the reservation.
      
      Commit 7848a4bf ("mm/hugetlb.c: add cond_resched_lock() in
      return_unused_surplus_pages()") added a call to cond_resched_lock in the
      loop freeing the pages.
      
      As a result, the hugetlb_lock could be dropped, and someone else could
      use the pages that will be freed in subsequent iterations of the loop.
      This could result in inconsistent global hugetlb page state, application
      api failures (such as mmap) failures or application crashes.
      
      When dropping the lock in return_unused_surplus_pages, make sure that
      the global reservation count (resv_huge_pages) remains sufficiently
      large to prevent someone else from claiming pages about to be freed.
      
      Analyzed by Paul Cassella.
      
      Fixes: 7848a4bf ("mm/hugetlb.c: add cond_resched_lock() in return_unused_surplus_pages()")
      Link: http://lkml.kernel.org/r/1483991767-6879-1-git-send-email-mike.kravetz@oracle.comSigned-off-by: default avatarMike Kravetz <mike.kravetz@oracle.com>
      Reported-by: default avatarPaul Cassella <cassella@cray.com>
      Suggested-by: default avatarMichal Hocko <mhocko@kernel.org>
      Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1e26cec6
    • John Sperbeck's avatar
      mm/slab.c: fix SLAB freelist randomization duplicate entries · 8315c22e
      John Sperbeck authored
      commit c4e490cf upstream.
      
      This patch fixes a bug in the freelist randomization code.  When a high
      random number is used, the freelist will contain duplicate entries.  It
      will result in different allocations sharing the same chunk.
      
      It will result in odd behaviours and crashes.  It should be uncommon but
      it depends on the machines.  We saw it happening more often on some
      machines (every few hours of running tests).
      
      Fixes: c7ce4f60 ("mm: SLAB freelist randomization")
      Link: http://lkml.kernel.org/r/20170103181908.143178-1-thgarnie@google.comSigned-off-by: default avatarJohn Sperbeck <jsperbeck@google.com>
      Signed-off-by: default avatarThomas Garnier <thgarnie@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8315c22e
    • Minchan Kim's avatar
      mm: support anonymous stable page · 6ca29ee3
      Minchan Kim authored
      commit f0571429 upstream.
      
      During developemnt for zram-swap asynchronous writeback, I found strange
      corruption of compressed page, resulting in:
      
        Modules linked in: zram(E)
        CPU: 3 PID: 1520 Comm: zramd-1 Tainted: G            E   4.8.0-mm1-00320-ge0d4894c9c38-dirty #3274
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
        task: ffff88007620b840 task.stack: ffff880078090000
        RIP: set_freeobj.part.43+0x1c/0x1f
        RSP: 0018:ffff880078093ca8  EFLAGS: 00010246
        RAX: 0000000000000018 RBX: ffff880076798d88 RCX: ffffffff81c408c8
        RDX: 0000000000000018 RSI: 0000000000000000 RDI: 0000000000000246
        RBP: ffff880078093cb0 R08: 0000000000000000 R09: 0000000000000000
        R10: ffff88005bc43030 R11: 0000000000001df3 R12: ffff880076798d88
        R13: 000000000005bc43 R14: ffff88007819d1b8 R15: 0000000000000001
        FS:  0000000000000000(0000) GS:ffff88007e380000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007fc934048f20 CR3: 0000000077b01000 CR4: 00000000000406e0
        Call Trace:
          obj_malloc+0x22b/0x260
          zs_malloc+0x1e4/0x580
          zram_bvec_rw+0x4cd/0x830 [zram]
          page_requests_rw+0x9c/0x130 [zram]
          zram_thread+0xe6/0x173 [zram]
          kthread+0xca/0xe0
          ret_from_fork+0x25/0x30
      
      With investigation, it reveals currently stable page doesn't support
      anonymous page.  IOW, reuse_swap_page can reuse the page without waiting
      writeback completion so it can overwrite page zram is compressing.
      
      Unfortunately, zram has used per-cpu stream feature from v4.7.
      It aims for increasing cache hit ratio of scratch buffer for
      compressing. Downside of that approach is that zram should ask
      memory space for compressed page in per-cpu context which requires
      stricted gfp flag which could be failed. If so, it retries to
      allocate memory space out of per-cpu context so it could get memory
      this time and compress the data again, copies it to the memory space.
      
      In this scenario, zram assumes the data should never be changed
      but it is not true unless stable page supports. So, If the data is
      changed under us, zram can make buffer overrun because second
      compression size could be bigger than one we got in previous trial
      and blindly, copy bigger size object to smaller buffer which is
      buffer overrun. The overrun breaks zsmalloc free object chaining
      so system goes crash like above.
      
      I think below is same problem.
      https://bugzilla.suse.com/show_bug.cgi?id=997574
      
      Unfortunately, reuse_swap_page should be atomic so that we cannot wait on
      writeback in there so the approach in this patch is simply return false if
      we found it needs stable page.  Although it increases memory footprint
      temporarily, it happens rarely and it should be reclaimed easily althoug
      it happened.  Also, It would be better than waiting of IO completion,
      which is critial path for application latency.
      
      Fixes: da9556a2 ("zram: user per-cpu compression streams")
      Link: http://lkml.kernel.org/r/20161120233015.GA14113@bbox
      Link: http://lkml.kernel.org/r/1482366980-3782-2-git-send-email-minchan@kernel.orgSigned-off-by: default avatarMinchan Kim <minchan@kernel.org>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Hyeoncheol Lee <cheol.lee@lge.com>
      Cc: <yjay.kim@lge.com>
      Cc: Sangseok Lee <sangseok.lee@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6ca29ee3
    • Michal Hocko's avatar
      mm, memcg: fix the active list aging for lowmem requests when memcg is enabled · 07fc9575
      Michal Hocko authored
      commit b4536f0c upstream.
      
      Nils Holland and Klaus Ethgen have reported unexpected OOM killer
      invocations with 32b kernel starting with 4.8 kernels
      
      	kworker/u4:5 invoked oom-killer: gfp_mask=0x2400840(GFP_NOFS|__GFP_NOFAIL), nodemask=0, order=0, oom_score_adj=0
      	kworker/u4:5 cpuset=/ mems_allowed=0
      	CPU: 1 PID: 2603 Comm: kworker/u4:5 Not tainted 4.9.0-gentoo #2
      	[...]
      	Mem-Info:
      	active_anon:58685 inactive_anon:90 isolated_anon:0
      	 active_file:274324 inactive_file:281962 isolated_file:0
      	 unevictable:0 dirty:649 writeback:0 unstable:0
      	 slab_reclaimable:40662 slab_unreclaimable:17754
      	 mapped:7382 shmem:202 pagetables:351 bounce:0
      	 free:206736 free_pcp:332 free_cma:0
      	Node 0 active_anon:234740kB inactive_anon:360kB active_file:1097296kB inactive_file:1127848kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:29528kB dirty:2596kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 184320kB anon_thp: 808kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
      	DMA free:3952kB min:788kB low:984kB high:1180kB active_anon:0kB inactive_anon:0kB active_file:7316kB inactive_file:0kB unevictable:0kB writepending:96kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:3200kB slab_unreclaimable:1408kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
      	lowmem_reserve[]: 0 813 3474 3474
      	Normal free:41332kB min:41368kB low:51708kB high:62048kB active_anon:0kB inactive_anon:0kB active_file:532748kB inactive_file:44kB unevictable:0kB writepending:24kB present:897016kB managed:836248kB mlocked:0kB slab_reclaimable:159448kB slab_unreclaimable:69608kB kernel_stack:1112kB pagetables:1404kB bounce:0kB free_pcp:528kB local_pcp:340kB free_cma:0kB
      	lowmem_reserve[]: 0 0 21292 21292
      	HighMem free:781660kB min:512kB low:34356kB high:68200kB active_anon:234740kB inactive_anon:360kB active_file:557232kB inactive_file:1127804kB unevictable:0kB writepending:2592kB present:2725384kB managed:2725384kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:800kB local_pcp:608kB free_cma:0kB
      
      the oom killer is clearly pre-mature because there there is still a lot
      of page cache in the zone Normal which should satisfy this lowmem
      request.  Further debugging has shown that the reclaim cannot make any
      forward progress because the page cache is hidden in the active list
      which doesn't get rotated because inactive_list_is_low is not memcg
      aware.
      
      The code simply subtracts per-zone highmem counters from the respective
      memcg's lru sizes which doesn't make any sense.  We can simply end up
      always seeing the resulting active and inactive counts 0 and return
      false.  This issue is not limited to 32b kernels but in practice the
      effect on systems without CONFIG_HIGHMEM would be much harder to notice
      because we do not invoke the OOM killer for allocations requests
      targeting < ZONE_NORMAL.
      
      Fix the issue by tracking per zone lru page counts in mem_cgroup_per_node
      and subtract per-memcg highmem counts when memcg is enabled.  Introduce
      helper lruvec_zone_lru_size which redirects to either zone counters or
      mem_cgroup_get_zone_lru_size when appropriate.
      
      We are losing empty LRU but non-zero lru size detection introduced by
      ca707239 ("mm: update_lru_size warn and reset bad lru_size") because
      of the inherent zone vs. node discrepancy.
      
      Fixes: f8d1a311 ("mm: consider whether to decivate based on eligible zones inactive ratio")
      Link: http://lkml.kernel.org/r/20170104100825.3729-1-mhocko@kernel.orgSigned-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Reported-by: default avatarNils Holland <nholland@tisys.org>
      Tested-by: default avatarNils Holland <nholland@tisys.org>
      Reported-by: default avatarKlaus Ethgen <Klaus@Ethgen.de>
      Acked-by: default avatarMinchan Kim <minchan@kernel.org>
      Acked-by: default avatarMel Gorman <mgorman@suse.de>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: default avatarVladimir Davydov <vdavydov.dev@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      07fc9575
    • Eric Ren's avatar
      ocfs2: fix crash caused by stale lvb with fsdlm plugin · 6c9bd81c
      Eric Ren authored
      commit e7ee2c08 upstream.
      
      The crash happens rather often when we reset some cluster nodes while
      nodes contend fiercely to do truncate and append.
      
      The crash backtrace is below:
      
         dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover_grant 1 locks on 971 resources
         dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover 9 generation 5 done: 4 ms
         ocfs2: Begin replay journal (node 318952601, slot 2) on device (253,18)
         ocfs2: End replay journal (node 318952601, slot 2) on device (253,18)
         ocfs2: Beginning quota recovery on device (253,18) for slot 2
         ocfs2: Finishing quota recovery on device (253,18) for slot 2
         (truncate,30154,1):ocfs2_truncate_file:470 ERROR: bug expression: le64_to_cpu(fe->i_size) != i_size_read(inode)
         (truncate,30154,1):ocfs2_truncate_file:470 ERROR: Inode 290321, inode i_size = 732 != di i_size = 937, i_flags = 0x1
         ------------[ cut here ]------------
         kernel BUG at /usr/src/linux/fs/ocfs2/file.c:470!
         invalid opcode: 0000 [#1] SMP
         Modules linked in: ocfs2_stack_user(OEN) ocfs2(OEN) ocfs2_nodemanager ocfs2_stackglue(OEN) quota_tree dlm(OEN) configfs fuse sd_mod    iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi af_packet iscsi_ibft iscsi_boot_sysfs softdog xfs libcrc32c ppdev parport_pc pcspkr parport      joydev virtio_balloon virtio_net i2c_piix4 acpi_cpufreq button processor ext4 crc16 jbd2 mbcache ata_generic cirrus virtio_blk ata_piix               drm_kms_helper ahci syscopyarea libahci sysfillrect sysimgblt fb_sys_fops ttm floppy libata drm virtio_pci virtio_ring uhci_hcd virtio ehci_hcd       usbcore serio_raw usb_common sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
         Supported: No, Unsupported modules are loaded
         CPU: 1 PID: 30154 Comm: truncate Tainted: G           OE   N  4.4.21-69-default #1
         Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20151112_172657-sheep25 04/01/2014
         task: ffff88004ff6d240 ti: ffff880074e68000 task.ti: ffff880074e68000
         RIP: 0010:[<ffffffffa05c8c30>]  [<ffffffffa05c8c30>] ocfs2_truncate_file+0x640/0x6c0 [ocfs2]
         RSP: 0018:ffff880074e6bd50  EFLAGS: 00010282
         RAX: 0000000000000074 RBX: 000000000000029e RCX: 0000000000000000
         RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000246
         RBP: ffff880074e6bda8 R08: 000000003675dc7a R09: ffffffff82013414
         R10: 0000000000034c50 R11: 0000000000000000 R12: ffff88003aab3448
         R13: 00000000000002dc R14: 0000000000046e11 R15: 0000000000000020
         FS:  00007f839f965700(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000
         CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
         CR2: 00007f839f97e000 CR3: 0000000036723000 CR4: 00000000000006e0
         Call Trace:
           ocfs2_setattr+0x698/0xa90 [ocfs2]
           notify_change+0x1ae/0x380
           do_truncate+0x5e/0x90
           do_sys_ftruncate.constprop.11+0x108/0x160
           entry_SYSCALL_64_fastpath+0x12/0x6d
         Code: 24 28 ba d6 01 00 00 48 c7 c6 30 43 62 a0 8b 41 2c 89 44 24 08 48 8b 41 20 48 c7 c1 78 a3 62 a0 48 89 04 24 31 c0 e8 a0 97 f9 ff <0f> 0b 3d 00 fe ff ff 0f 84 ab fd ff ff 83 f8 fc 0f 84 a2 fd ff
         RIP  [<ffffffffa05c8c30>] ocfs2_truncate_file+0x640/0x6c0 [ocfs2]
      
      It's because ocfs2_inode_lock() get us stale LVB in which the i_size is
      not equal to the disk i_size.  We mistakenly trust the LVB because the
      underlaying fsdlm dlm_lock() doesn't set lkb_sbflags with
      DLM_SBF_VALNOTVALID properly for us.  But, why?
      
      The current code tries to downconvert lock without DLM_LKF_VALBLK flag
      to tell o2cb don't update RSB's LVB if it's a PR->NULL conversion, even
      if the lock resource type needs LVB.  This is not the right way for
      fsdlm.
      
      The fsdlm plugin behaves different on DLM_LKF_VALBLK, it depends on
      DLM_LKF_VALBLK to decide if we care about the LVB in the LKB.  If
      DLM_LKF_VALBLK is not set, fsdlm will skip recovering RSB's LVB from
      this lkb and set the right DLM_SBF_VALNOTVALID appropriately when node
      failure happens.
      
      The following diagram briefly illustrates how this crash happens:
      
      RSB1 is inode metadata lock resource with LOCK_TYPE_USES_LVB;
      
      The 1st round:
      
                   Node1                                    Node2
      RSB1: PR
                                                        RSB1(master): NULL->EX
      ocfs2_downconvert_lock(PR->NULL, set_lvb==0)
        ocfs2_dlm_lock(no DLM_LKF_VALBLK)
      
      - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
      
      dlm_lock(no DLM_LKF_VALBLK)
        convert_lock(overwrite lkb->lkb_exflags
                     with no DLM_LKF_VALBLK)
      
      RSB1: NULL                                        RSB1: EX
                                                        reset Node2
      dlm_recover_rsbs()
        recover_lvb()
      
      /* The LVB is not trustable if the node with EX fails and
       * no lock >= PR is left. We should set RSB_VALNOTVALID for RSB1.
       */
      
       if(!(kb_exflags & DLM_LKF_VALBLK)) /* This means we miss the chance to
                 return;                   * to invalid the LVB here.
                                           */
      
      The 2nd round:
      
               Node 1                                Node2
      RSB1(become master from recovery)
      
      ocfs2_setattr()
        ocfs2_inode_lock(NULL->EX)
          /* dlm_lock() return the stale lvb without setting DLM_SBF_VALNOTVALID */
          ocfs2_meta_lvb_is_trustable() return 1 /* so we don't refresh inode from disk */
        ocfs2_truncate_file()
            mlog_bug_on_msg(disk isize != i_size_read(inode))  /* crash! */
      
      The fix is quite straightforward.  We keep to set DLM_LKF_VALBLK flag
      for dlm_lock() if the lock resource type needs LVB and the fsdlm plugin
      is uesed.
      
      Link: http://lkml.kernel.org/r/1481275846-6604-1-git-send-email-zren@suse.comSigned-off-by: default avatarEric Ren <zren@suse.com>
      Reviewed-by: default avatarJoseph Qi <jiangqi903@gmail.com>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6c9bd81c
    • Dan Williams's avatar
      mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done} · 692755b1
      Dan Williams authored
      commit f931ab47 upstream.
      
      Both arch_add_memory() and arch_remove_memory() expect a single threaded
      context.
      
      For example, arch/x86/mm/init_64.c::kernel_physical_mapping_init() does
      not hold any locks over this check and branch:
      
          if (pgd_val(*pgd)) {
          	pud = (pud_t *)pgd_page_vaddr(*pgd);
          	paddr_last = phys_pud_init(pud, __pa(vaddr),
          				   __pa(vaddr_end),
          				   page_size_mask);
          	continue;
          }
      
          pud = alloc_low_page();
          paddr_last = phys_pud_init(pud, __pa(vaddr), __pa(vaddr_end),
          			   page_size_mask);
      
      The result is that two threads calling devm_memremap_pages()
      simultaneously can end up colliding on pgd initialization.  This leads
      to crash signatures like the following where the loser of the race
      initializes the wrong pgd entry:
      
          BUG: unable to handle kernel paging request at ffff888ebfff0000
          IP: memcpy_erms+0x6/0x10
          PGD 2f8e8fc067 PUD 0 /* <---- Invalid PUD */
          Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
          CPU: 54 PID: 3818 Comm: systemd-udevd Not tainted 4.6.7+ #13
          task: ffff882fac290040 ti: ffff882f887a4000 task.ti: ffff882f887a4000
          RIP: memcpy_erms+0x6/0x10
          [..]
          Call Trace:
            ? pmem_do_bvec+0x205/0x370 [nd_pmem]
            ? blk_queue_enter+0x3a/0x280
            pmem_rw_page+0x38/0x80 [nd_pmem]
            bdev_read_page+0x84/0xb0
      
      Hold the standard memory hotplug mutex over calls to
      arch_{add,remove}_memory().
      
      Fixes: 41e94a85 ("add devm_memremap_pages")
      Link: http://lkml.kernel.org/r/148357647831.9498.12606007370121652979.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      692755b1
    • Minchan Kim's avatar
      mm: pmd dirty emulation in page fault handler · 8edd365e
      Minchan Kim authored
      commit 20f664aa upstream.
      
      Andreas reported [1] made a test in jemalloc hang in THP mode in arm64:
      
        http://lkml.kernel.org/r/mvmmvfy37g1.fsf@hawking.suse.de
      
      The problem is currently page fault handler doesn't supports dirty bit
      emulation of pmd for non-HW dirty-bit architecture so that application
      stucks until VM marked the pmd dirty.
      
      How the emulation work depends on the architecture.  In case of arm64,
      when it set up pte firstly, it sets pte PTE_RDONLY to get a chance to
      mark the pte dirty via triggering page fault when store access happens.
      Once the page fault occurs, VM marks the pmd dirty and arch code for
      setting pmd will clear PTE_RDONLY for application to proceed.
      
      IOW, if VM doesn't mark the pmd dirty, application hangs forever by
      repeated fault(i.e., store op but the pmd is PTE_RDONLY).
      
      This patch enables pmd dirty-bit emulation for those architectures.
      
      [1] b8d3c4c3, mm/huge_memory.c: don't split THP page when MADV_FREE syscall is called
      
      Fixes: b8d3c4c3 ("mm/huge_memory.c: don't split THP page when MADV_FREE syscall is called")
      Link: http://lkml.kernel.org/r/1482506098-6149-1-git-send-email-minchan@kernel.orgSigned-off-by: default avatarMinchan Kim <minchan@kernel.org>
      Reported-by: default avatarAndreas Schwab <schwab@suse.de>
      Tested-by: default avatarAndreas Schwab <schwab@suse.de>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Jason Evans <je@fb.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8edd365e
    • Ross Zwisler's avatar
      dax: fix deadlock with DAX 4k holes · 87fa6f37
      Ross Zwisler authored
      commit 965d004a upstream.
      
      Currently in DAX if we have three read faults on the same hole address we
      can end up with the following:
      
      Thread 0		Thread 1		Thread 2
      --------		--------		--------
      dax_iomap_fault
       grab_mapping_entry
        lock_slot
         <locks empty DAX entry>
      
        			dax_iomap_fault
      			 grab_mapping_entry
      			  get_unlocked_mapping_entry
      			   <sleeps on empty DAX entry>
      
      						dax_iomap_fault
      						 grab_mapping_entry
      						  get_unlocked_mapping_entry
      						   <sleeps on empty DAX entry>
        dax_load_hole
         find_or_create_page
         ...
          page_cache_tree_insert
           dax_wake_mapping_entry_waiter
            <wakes one sleeper>
           __radix_tree_replace
            <swaps empty DAX entry with 4k zero page>
      
      			<wakes>
      			get_page
      			lock_page
      			...
      			put_locked_mapping_entry
      			unlock_page
      			put_page
      
      						<sleeps forever on the DAX
      						 wait queue>
      
      The crux of the problem is that once we insert a 4k zero page, all
      locking from then on is done in terms of that 4k zero page and any
      additional threads sleeping on the empty DAX entry will never be woken.
      
      Fix this by waking all sleepers when we replace the DAX radix tree entry
      with a 4k zero page.  This will allow all sleeping threads to
      successfully transition from locking based on the DAX empty entry to
      locking on the 4k zero page.
      
      With the test case reported by Xiong this happens very regularly in my
      test setup, with some runs resulting in 9+ threads in this deadlocked
      state.  With this fix I've been able to run that same test dozens of
      times in a loop without issue.
      
      Fixes: ac401cc7 ("dax: New fault locking")
      Link: http://lkml.kernel.org/r/1483479365-13607-1-git-send-email-ross.zwisler@linux.intel.comSigned-off-by: default avatarRoss Zwisler <ross.zwisler@linux.intel.com>
      Reported-by: default avatarXiong Zhou <xzhou@redhat.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      87fa6f37
    • Minchan Kim's avatar
      zram: support BDI_CAP_STABLE_WRITES · 2e264fb5
      Minchan Kim authored
      commit b09ab054 upstream.
      
      zram has used per-cpu stream feature from v4.7.  It aims for increasing
      cache hit ratio of scratch buffer for compressing.  Downside of that
      approach is that zram should ask memory space for compressed page in
      per-cpu context which requires stricted gfp flag which could be failed.
      If so, it retries to allocate memory space out of per-cpu context so it
      could get memory this time and compress the data again, copies it to the
      memory space.
      
      In this scenario, zram assumes the data should never be changed but it is
      not true without stable page support.  So, If the data is changed under
      us, zram can make buffer overrun so that zsmalloc free object chain is
      broken so system goes crash like below
      
         https://bugzilla.suse.com/show_bug.cgi?id=997574
      
      This patch adds BDI_CAP_STABLE_WRITES to zram for declaring "I am block
      device needing *stable write*".
      
      Fixes: da9556a2 ("zram: user per-cpu compression streams")
      Link: http://lkml.kernel.org/r/1482366980-3782-4-git-send-email-minchan@kernel.orgSigned-off-by: default avatarMinchan Kim <minchan@kernel.org>
      Reviewed-by: default avatarSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Hyeoncheol Lee <cheol.lee@lge.com>
      Cc: <yjay.kim@lge.com>
      Cc: Sangseok Lee <sangseok.lee@lge.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2e264fb5
    • Minchan Kim's avatar
      zram: revalidate disk under init_lock · ad4764b4
      Minchan Kim authored
      commit e7ccfc4c upstream.
      
      Commit b4c5c609 ("zram: avoid lockdep splat by revalidate_disk")
      moved revalidate_disk call out of init_lock to avoid lockdep
      false-positive splat.  However, commit 08eee69f ("zram: remove
      init_lock in zram_make_request") removed init_lock in IO path so there
      is no worry about lockdep splat.  So, let's restore it.
      
      This patch is needed to set BDI_CAP_STABLE_WRITES atomically in next
      patch.
      
      Fixes: da9556a2 ("zram: user per-cpu compression streams")
      Link: http://lkml.kernel.org/r/1482366980-3782-3-git-send-email-minchan@kernel.orgSigned-off-by: default avatarMinchan Kim <minchan@kernel.org>
      Reviewed-by: default avatarSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Hyeoncheol Lee <cheol.lee@lge.com>
      Cc: <yjay.kim@lge.com>
      Cc: Sangseok Lee <sangseok.lee@lge.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ad4764b4
    • Rolf Eike Beer's avatar
      selftests: do not require bash for the generated test · 057ac442
      Rolf Eike Beer authored
      commit a2b1e8a2 upstream.
      
      Nothing in this minimal script seems to require bash. We often run these
      tests on embedded devices where the only shell available is the busybox
      ash. Use sh instead.
      Signed-off-by: default avatarRolf Eike Beer <eb@emlix.com>
      Signed-off-by: default avatarShuah Khan <shuahkh@osg.samsung.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      057ac442
    • Rolf Eike Beer's avatar
      selftests: do not require bash to run netsocktests testcase · 91ee732c
      Rolf Eike Beer authored
      commit 3659f98b upstream.
      
      Nothing in this minimal script seems to require bash. We often run these
      tests on embedded devices where the only shell available is the busybox
      ash. Use sh instead.
      Signed-off-by: default avatarRolf Eike Beer <eb@emlix.com>
      Signed-off-by: default avatarShuah Khan <shuahkh@osg.samsung.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      91ee732c
    • Dan Carpenter's avatar
      drm/savage: dereferencing an error pointer · d65146c7
      Dan Carpenter authored
      commit f7741aa7 upstream.
      
      A recent cleanup changed the kmalloc() + copy_from_user() to
      memdup_user() but the error handling wasn't updated so we might call
      kfree(-EFAULT) and crash.
      
      Fixes: a6e3918b ('GPU-DRM-Savage: Use memdup_user() rather than duplicating')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161012062227.GU12841@mwandaSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d65146c7
    • Dan Carpenter's avatar
      drm/vc4: Fix a couple error codes in vc4_cl_lookup_bos() · c730a84a
      Dan Carpenter authored
      commit b2cdeb19 upstream.
      
      If the allocation fails the current code returns success.  If
      copy_from_user() fails it returns the number of bytes remaining instead
      of -EFAULT.
      
      Fixes: d5b1a78a ("drm/vc4: Add support for drawing 3D frames.")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: default avatarEric Anholt <eric@anholt.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c730a84a
    • Christophe Jaillet's avatar
      drm/tegra: dpaux: Fix error handling · a63bb198
      Christophe Jaillet authored
      commit 9376cad2 upstream.
      
      The devm_pinctrl_register() function returns an error pointer or a valid
      handle. So checking for NULL here is pointless and can never trigger.
      
      Check the returned value with IS_ERR instead and propagate this value as
      done in the other functions which call devm_pinctrl_register().
      
      Fixes: 0751bb5c ("drm/tegra: dpaux: Add pinctrl support")
      Signed-off-by: default avatarChristophe JAILLET <christophe.jaillet@wanadoo.fr>
      Acked-by: default avatarJon Hunter <jonathanh@nvidia.com>
      Signed-off-by: default avatarThierry Reding <treding@nvidia.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a63bb198
    • Chen-Yu Tsai's avatar
      regulator: axp20x: Fix axp809 ldo_io registration error on cold boot · 6b94626c
      Chen-Yu Tsai authored
      commit 618c8089 upstream.
      
      The maximum supported voltage for ldo_io# is 3.3V, but on cold boot
      the selector comes up at 0x1f, which maps to 3.8V. This was previously
      corrected by Allwinner's U-boot, which set all regulators on the PMICs
      to some pre-configured voltage. With recent progress in U-boot SPL
      support, this is no longer the case. In any case we should handle
      this quirk in the kernel driver as well.
      
      This invalid setting causes _regulator_get_voltage() to fail with -EINVAL
      which causes regulator registration to fail when constrains are used:
      
      [    1.054181] vcc-pg: failed to get the current voltage(-22)
      [    1.059670] axp20x-regulator axp20x-regulator.0: Failed to register ldo_io0
      [    1.069749] axp20x-regulator: probe of axp20x-regulator.0 failed with error -22
      
      This commits makes the axp20x regulator driver accept the 0x1f register
      value, fixing this.
      
      The datasheet does not guarantee reliable operation above 3.3V, so on
      boards where this regulator is used the regulator-max-microvolt setting
      must be 3.3V or less.
      
      This is essentially the same as the commit f40d4896 ("regulator:
      axp20x: Fix axp22x ldo_io registration error on cold boot") for AXP22x
      PMICs.
      
      Fixes: a51f9f46 ("regulator: axp20x: support AXP809 variant")
      Signed-off-by: default avatarChen-Yu Tsai <wens@csie.org>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6b94626c
    • Andrew F. Davis's avatar
      regulator: tps65086: Fix 25mV ranges for BUCK regulators · 8ac055af
      Andrew F. Davis authored
      commit d8ca5bd1 upstream.
      
      The BUCK regulators 3, 4, and 5 also have a 10mV step mode,
      adjust the tables and logic to reflect the data-sheet for
      these regulators.
      
      fixes: d2a2e729 ("regulator: tps65086: Add regulator driver for the TPS65086 PMIC")
      Signed-off-by: default avatarAndrew F. Davis <afd@ti.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8ac055af
    • Niklas Söderlund's avatar
      pinctrl: sh-pfc: Add helper to handle bias lookup table · 92293368
      Niklas Söderlund authored
      commit c314c9f1 upstream.
      
      On some SoC there are no simple mapping of pins to bias register bits
      and a lookup table is needed. This logic is already implemented in some
      SoC specific drivers that could benefit from a generic implementation.
      
      Add helpers to deal with the lookup which later can be used by the SoC
      specific drivers. The logic used to lookup are different from the one it
      aims to replace, this is intentional. This new method reduces the memory
      consumption at the cost of increased CPU usage and fix a bug where a
      WARN() would incorrectly be triggered if the register offset is 0.
      Signed-off-by: default avatarNiklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
      Reviewed-by: default avatarLaurent Pinchart <laurent.pinchart@ideasonboard.com>
      Signed-off-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      92293368
    • Niklas Söderlund's avatar
      pinctrl: sh-pfc: r8a7795: Use lookup function for bias data · 5e159522
      Niklas Söderlund authored
      commit d3b861bc upstream.
      
      There is a bug in the r8a7795 bias code where a WARN() is trigged
      anytime a pin from PUEN0/PUD0 is accessed.
      
       # cat /sys/kernel/debug/pinctrl/e6060000.pfc/pinconf-pins
      
       WARNING: CPU: 2 PID: 2391 at drivers/pinctrl/sh-pfc/pfc-r8a7795.c:5364 r8a7795_pinmux_get_bias+0xbc/0xc8
       [..]
       Call trace:
       [<ffff0000083c442c>] r8a7795_pinmux_get_bias+0xbc/0xc8
       [<ffff0000083c37f4>] sh_pfc_pinconf_get+0x194/0x270
       [<ffff0000083b0768>] pin_config_get_for_pin+0x20/0x30
       [<ffff0000083b11e8>] pinconf_generic_dump_one+0x168/0x188
       [<ffff0000083b144c>] pinconf_generic_dump_pins+0x5c/0x98
       [<ffff0000083b0628>] pinconf_pins_show+0xc8/0x128
       [<ffff0000081fe3bc>] seq_read+0x16c/0x420
       [<ffff00000831a110>] full_proxy_read+0x58/0x88
       [<ffff0000081d7ad4>] __vfs_read+0x1c/0xf8
       [<ffff0000081d8874>] vfs_read+0x84/0x148
       [<ffff0000081d9d64>] SyS_read+0x44/0xa0
       [<ffff000008082f4c>] __sys_trace_return+0x0/0x4
      
      This is due to the WARN() check if the reg field of the pullups struct
      is zero, and this should be 0 for pins controlled by the PUEN0/PUD0
      registers since PU0 is defined as 0. Change the data structure and use
      the generic sh_pfc_pin_to_bias_info() function to get the register
      offset and bit information.
      
      Fixes: 56065524 ("pinctrl: sh-pfc: r8a7795: Add bias pinconf support")
      Signed-off-by: default avatarNiklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
      Reviewed-by: default avatarLaurent Pinchart <laurent.pinchart@ideasonboard.com>
      Signed-off-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5e159522
    • Gary Bisson's avatar
      pinctrl: imx: fix imx_pinctrl_desc initialization · b01bbf22
      Gary Bisson authored
      commit 8f5983ad upstream.
      
      Fixes: 6e408ed8 ("pinctrl: imx: fix initialization of imx_pinctrl_desc")
      Reviewed-by: default avatarVladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
      Reviewed-by: default avatarPeng Fan <peng.fan@nxp.com>
      Signed-off-by: default avatarGary Bisson <gary.bisson@boundarydevices.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      b01bbf22
    • Marcos Paulo de Souza's avatar
      Input: i8042 - add Pegatron touchpad to noloop table · f34fbb92
      Marcos Paulo de Souza authored
      commit 41c567a5 upstream.
      
      Avoid AUX loopback in Pegatron C15B touchpad, so input subsystem is able
      to recognize a Synaptics touchpad in the AUX port.
      
      Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=93791
      (Touchpad is not detected on DNS 0801480 notebook (PEGATRON C15B))
      Suggested-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarMarcos Paulo de Souza <marcos.souza.org@gmail.com>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f34fbb92
    • Pavel Rojtberg's avatar
      Input: xpad - use correct product id for x360w controllers · 5975358b
      Pavel Rojtberg authored
      commit b6fc513d upstream.
      
      currently the controllers get the same product id as the wireless
      receiver. However the controllers actually have their own product id.
      
      The patch makes the driver expose the same product id as the windows
      driver.
      
      This improves compatibility when running applications with WINE.
      
      see https://github.com/paroj/xpad/issues/54Signed-off-by: default avatarPavel Rojtberg <rojtberg@gmail.com>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5975358b
  2. 15 Jan, 2017 16 commits