1. 26 May, 2018 8 commits
    • Michal Hocko's avatar
      mm, memory_hotplug: make has_unmovable_pages more robust · 15c30bc0
      Michal Hocko authored
      Oscar has reported:
      : Due to an unfortunate setting with movablecore, memblocks containing bootmem
      : memory (pages marked by get_page_bootmem()) ended up marked in zone_movable.
      : So while trying to remove that memory, the system failed in do_migrate_range
      : and __offline_pages never returned.
      :
      : This can be reproduced by running
      : qemu-system-x86_64 -m 6G,slots=8,maxmem=8G -numa node,mem=4096M -numa node,mem=2048M
      : and movablecore=4G kernel command line
      :
      : linux kernel: BIOS-provided physical RAM map:
      : linux kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
      : linux kernel: BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
      : linux kernel: BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
      : linux kernel: BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
      : linux kernel: BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
      : linux kernel: BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
      : linux kernel: BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
      : linux kernel: BIOS-e820: [mem 0x0000000100000000-0x00000001bfffffff] usable
      : linux kernel: NX (Execute Disable) protection: active
      : linux kernel: SMBIOS 2.8 present.
      : linux kernel: DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org
      : linux kernel: Hypervisor detected: KVM
      : linux kernel: e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
      : linux kernel: e820: remove [mem 0x000a0000-0x000fffff] usable
      : linux kernel: last_pfn = 0x1c0000 max_arch_pfn = 0x400000000
      :
      : linux kernel: SRAT: PXM 0 -> APIC 0x00 -> Node 0
      : linux kernel: SRAT: PXM 1 -> APIC 0x01 -> Node 1
      : linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
      : linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
      : linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x13fffffff]
      : linux kernel: ACPI: SRAT: Node 1 PXM 1 [mem 0x140000000-0x1bfffffff]
      : linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x1c0000000-0x43fffffff] hotplug
      : linux kernel: NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0xbfffffff] -> [mem 0x0
      : linux kernel: NUMA: Node 0 [mem 0x00000000-0xbfffffff] + [mem 0x100000000-0x13fffffff] -> [mem 0
      : linux kernel: NODE_DATA(0) allocated [mem 0x13ffd6000-0x13fffffff]
      : linux kernel: NODE_DATA(1) allocated [mem 0x1bffd3000-0x1bfffcfff]
      :
      : zoneinfo shows that the zone movable is placed into both numa nodes:
      : Node 0, zone  Movable
      :   pages free     160140
      :         min      1823
      :         low      2278
      :         high     2733
      :         spanned  262144
      :         present  262144
      :         managed  245670
      : Node 1, zone  Movable
      :   pages free     448427
      :         min      3827
      :         low      4783
      :         high     5739
      :         spanned  524288
      :         present  524288
      :         managed  515766
      
      Note how only Node 0 has a hutplugable memory region which would rule it
      out from the early memblock allocations (most likely memmap).  Node1
      will surely contain memmaps on the same node and those would prevent
      offlining to succeed.  So this is arguably a configuration issue.
      Although one could argue that we should be more clever and rule early
      allocations from the zone movable.  This would be correct but probably
      not worth the effort considering what a hack movablecore is.
      
      Anyway, We could do better for those cases though.  We rely on
      start_isolate_page_range resp.  has_unmovable_pages to do their job.
      The first one isolates the whole range to be offlined so that we do not
      allocate from it anymore and the later makes sure we are not stumbling
      over non-migrateable pages.
      
      has_unmovable_pages is overly optimistic, however.  It doesn't check all
      the pages if we are withing zone_movable because we rely that those
      pages will be always migrateable.  As it turns out we are still not
      perfect there.  While bootmem pages in zonemovable sound like a clear
      bug which should be fixed let's remove the optimization for now and warn
      if we encounter unmovable pages in zone_movable in the meantime.  That
      should help for now at least.
      
      Btw.  this wasn't a real problem until commit 72b39cfc ("mm,
      memory_hotplug: do not fail offlining too early") because we used to
      have a small number of retries and then failed.  This turned out to be
      too fragile though.
      
      Link: http://lkml.kernel.org/r/20180523125555.30039-2-mhocko@kernel.orgSigned-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Reported-by: default avatarOscar Salvador <osalvador@techadventures.net>
      Tested-by: default avatarOscar Salvador <osalvador@techadventures.net>
      Reviewed-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      15c30bc0
    • Andrey Ryabinin's avatar
      mm/kasan: don't vfree() nonexistent vm_area · 0f901dcb
      Andrey Ryabinin authored
      KASAN uses different routines to map shadow for hot added memory and
      memory obtained in boot process.  Attempt to offline memory onlined by
      normal boot process leads to this:
      
          Trying to vfree() nonexistent vm area (000000005d3b34b9)
          WARNING: CPU: 2 PID: 13215 at mm/vmalloc.c:1525 __vunmap+0x147/0x190
      
          Call Trace:
           kasan_mem_notifier+0xad/0xb9
           notifier_call_chain+0x166/0x260
           __blocking_notifier_call_chain+0xdb/0x140
           __offline_pages+0x96a/0xb10
           memory_subsys_offline+0x76/0xc0
           device_offline+0xb8/0x120
           store_mem_state+0xfa/0x120
           kernfs_fop_write+0x1d5/0x320
           __vfs_write+0xd4/0x530
           vfs_write+0x105/0x340
           SyS_write+0xb0/0x140
      
      Obviously we can't call vfree() to free memory that wasn't allocated via
      vmalloc().  Use find_vm_area() to see if we can call vfree().
      
      Unfortunately it's a bit tricky to properly unmap and free shadow
      allocated during boot, so we'll have to keep it.  If memory will come
      online again that shadow will be reused.
      
      Matthew asked: how can you call vfree() on something that isn't a
      vmalloc address?
      
        vfree() is able to free any address returned by
        __vmalloc_node_range().  And __vmalloc_node_range() gives you any
        address you ask.  It doesn't have to be an address in [VMALLOC_START,
        VMALLOC_END] range.
      
        That's also how the module_alloc()/module_memfree() works on
        architectures that have designated area for modules.
      
      [aryabinin@virtuozzo.com: improve comments]
        Link: http://lkml.kernel.org/r/dabee6ab-3a7a-51cd-3b86-5468718e0390@virtuozzo.com
      [akpm@linux-foundation.org: fix typos, reflow comment]
      Link: http://lkml.kernel.org/r/20180201163349.8700-1-aryabinin@virtuozzo.com
      Fixes: fa69b598 ("mm/kasan: add support for memory hotplug")
      Signed-off-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Reported-by: default avatarPaul Menzel <pmenzel+linux-kasan-dev@molgen.mpg.de>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0f901dcb
    • Mike Kravetz's avatar
      MAINTAINERS: change hugetlbfs maintainer and update files · b9ddff9b
      Mike Kravetz authored
      The current hugetlbfs maintainer has not been active for more than a few
      years.  I have been been active in this area for more than two years and
      plan to remain active in the foreseeable future.
      
      Also, update the hugetlbfs entry to include linux-mm mail list and
      additional hugetlbfs related files.  hugetlb.c and hugetlb.h are not
      100% hugetlbfs, but a majority of their content is hugetlbfs related.
      
      Link: http://lkml.kernel.org/r/20180518225236.19079-1-mike.kravetz@oracle.comSigned-off-by: default avatarMike Kravetz <mike.kravetz@oracle.com>
      Acked-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Nadia Yvette Chambers <nyc@holomorphy.com>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b9ddff9b
    • Davidlohr Bueso's avatar
      ipc/shm: fix shmat() nil address after round-down when remapping · 8f89c007
      Davidlohr Bueso authored
      shmat()'s SHM_REMAP option forbids passing a nil address for; this is in
      fact the very first thing we check for.  Andrea reported that for
      SHM_RND|SHM_REMAP cases we can end up bypassing the initial addr check,
      but we need to check again if the address was rounded down to nil.  As
      of this patch, such cases will return -EINVAL.
      
      Link: http://lkml.kernel.org/r/20180503204934.kk63josdu6u53fbd@linux-n805Signed-off-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Reported-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Cc: Joe Lawrence <joe.lawrence@redhat.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8f89c007
    • Davidlohr Bueso's avatar
      Revert "ipc/shm: Fix shmat mmap nil-page protection" · a73ab244
      Davidlohr Bueso authored
      Patch series "ipc/shm: shmat() fixes around nil-page".
      
      These patches fix two issues reported[1] a while back by Joe and Andrea
      around how shmat(2) behaves with nil-page.
      
      The first reverts a commit that it was incorrectly thought that mapping
      nil-page (address=0) was a no no with MAP_FIXED.  This is not the case,
      with the exception of SHM_REMAP; which is address in the second patch.
      
      I chose two patches because it is easier to backport and it explicitly
      reverts bogus behaviour.  Both patches ought to be in -stable and ltp
      testcases need updated (the added testcase around the cve can be
      modified to just test for SHM_RND|SHM_REMAP).
      
      [1] lkml.kernel.org/r/20180430172152.nfa564pvgpk3ut7p@linux-n805
      
      This patch (of 2):
      
      Commit 95e91b83 ("ipc/shm: Fix shmat mmap nil-page protection")
      worked on the idea that we should not be mapping as root addr=0 and
      MAP_FIXED.  However, it was reported that this scenario is in fact
      valid, thus making the patch both bogus and breaks userspace as well.
      
      For example X11's libint10.so relies on shmat(1, SHM_RND) for lowmem
      initialization[1].
      
      [1] https://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/os-support/linux/int10/linux.c#n347
      Link: http://lkml.kernel.org/r/20180503203243.15045-2-dave@stgolabs.net
      Fixes: 95e91b83 ("ipc/shm: Fix shmat mmap nil-page protection")
      Signed-off-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Reported-by: default avatarJoe Lawrence <joe.lawrence@redhat.com>
      Reported-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a73ab244
    • Matthew Wilcox's avatar
      idr: fix invalid ptr dereference on item delete · 7a4deea1
      Matthew Wilcox authored
      If the radix tree underlying the IDR happens to be full and we attempt
      to remove an id which is larger than any id in the IDR, we will call
      __radix_tree_delete() with an uninitialised 'slot' pointer, at which
      point anything could happen.  This was easiest to hit with a single
      entry at id 0 and attempting to remove a non-0 id, but it could have
      happened with 64 entries and attempting to remove an id >= 64.
      
      Roman said:
      
        The syzcaller test boils down to opening /dev/kvm, creating an
        eventfd, and calling a couple of KVM ioctls. None of this requires
        superuser. And the result is dereferencing an uninitialized pointer
        which is likely a crash. The specific path caught by syzbot is via
        KVM_HYPERV_EVENTD ioctl which is new in 4.17. But I guess there are
        other user-triggerable paths, so cc:stable is probably justified.
      
      Matthew added:
      
        We have around 250 calls to idr_remove() in the kernel today. Many of
        them pass an ID which is embedded in the object they're removing, so
        they're safe. Picking a few likely candidates:
      
        drivers/firewire/core-cdev.c looks unsafe; the ID comes from an ioctl.
        drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c is similar
        drivers/atm/nicstar.c could be taken down by a handcrafted packet
      
      Link: http://lkml.kernel.org/r/20180518175025.GD6361@bombadil.infradead.org
      Fixes: 0a835c4f ("Reimplement IDR and IDA using the radix tree")
      Reported-by: <syzbot+35666cba7f0a337e2e79@syzkaller.appspotmail.com>
      Debugged-by: default avatarRoman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: default avatarMatthew Wilcox <mawilcox@microsoft.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7a4deea1
    • Changwei Ge's avatar
      ocfs2: revert "ocfs2/o2hb: check len for bio_add_page() to avoid getting incorrect bio" · 3373de20
      Changwei Ge authored
      This reverts commit ba16ddfb ("ocfs2/o2hb: check len for
      bio_add_page() to avoid getting incorrect bio").
      
      In my testing, this patch introduces a problem that mkfs can't have
      slots more than 16 with 4k block size.
      
      And the original logic is safe actually with the situation it mentions
      so revert this commit.
      
      Attach test log:
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 0, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 1, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 2, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 3, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 4, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 5, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 6, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 7, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 8, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 9, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 10, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 11, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 12, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 13, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 14, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 15, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 16, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:471 ERROR: Adding page[16] to bio failed, page ffffea0002d7ed40, len 0, vec_len 4096, vec_start 0,bi_sector 8192
        (mkfs.ocfs2,27479,2):o2hb_read_slots:500 ERROR: status = -5
        (mkfs.ocfs2,27479,2):o2hb_populate_slot_data:1911 ERROR: status = -5
        (mkfs.ocfs2,27479,2):o2hb_region_dev_write:2012 ERROR: status = -5
      
      Link: http://lkml.kernel.org/r/SIXPR06MB0461721F398A5A92FC68C39ED5920@SIXPR06MB0461.apcprd06.prod.outlook.comSigned-off-by: default avatarChangwei Ge <ge.changwei@h3c.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: Yiwen Jiang <jiangyiwen@huawei.com>
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3373de20
    • Omar Sandoval's avatar
      mm: fix nr_rotate_swap leak in swapon() error case · 7cbf3192
      Omar Sandoval authored
      If swapon() fails after incrementing nr_rotate_swap, we don't decrement
      it and thus effectively leak it.  Make sure we decrement it if we
      incremented it.
      
      Link: http://lkml.kernel.org/r/b6fe6b879f17fa68eee6cbd876f459f6e5e33495.1526491581.git.osandov@fb.com
      Fixes: 81a0298b ("mm, swap: don't use VMA based swap readahead if HDD is used as swap")
      Signed-off-by: default avatarOmar Sandoval <osandov@fb.com>
      Reviewed-by: default avatarRik van Riel <riel@surriel.com>
      Reviewed-by: default avatar"Huang, Ying" <ying.huang@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7cbf3192
  2. 25 May, 2018 5 commits
  3. 24 May, 2018 12 commits
    • Dave Airlie's avatar
      Merge branch 'vmwgfx-fixes-4.17' of git://people.freedesktop.org/~thomash/linux into drm-fixes · 4bc6f777
      Dave Airlie authored
      Three fixes for vmwgfx. Two are cc'd stable and fix host logging and its
      error paths on 32-bit VMs. One is a fix for a hibernate flaw
      introduced with the 4.17 merge window.
      
      * 'vmwgfx-fixes-4.17' of git://people.freedesktop.org/~thomash/linux:
        drm/vmwgfx: Schedule an fb dirty update after resume
        drm/vmwgfx: Fix host logging / guestinfo reading error paths
        drm/vmwgfx: Fix 32-bit VMW_PORT_HB_[IN|OUT] macros
      4bc6f777
    • Linus Torvalds's avatar
      Merge branch 'stable/for-linus-4.17' of... · b5069438
      Linus Torvalds authored
      Merge branch 'stable/for-linus-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb
      
      Pull swiotlb fix from Konrad Rzeszutek Wilk:
       "One single fix in here: under Xen the DMA32 heap (in the hypervisor)
        would end up looking like swiss cheese.
      
        The reason being that for every coherent DMA allocation we didn't do
        the proper hypercall to tell Xen to return the page back to the DMA32
        heap. End result was (eventually) no DMA32 space if you (for example)
        continously unloaded and loaded modules"
      
      * 'stable/for-linus-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
        xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent
      b5069438
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · 34b48b87
      Linus Torvalds authored
      Pull rdma fixes from Jason Gunthorpe:
       "This is pretty much just the usual array of smallish driver bugs.
      
         - remove bouncing addresses from the MAINTAINERS file
      
         - kernel oops and bad error handling fixes for hfi, i40iw, cxgb4, and
           hns drivers
      
         - various small LOC behavioral/operational bugs in mlx5, hns, qedr
           and i40iw drivers
      
         - two fixes for patches already sent during the merge window
      
         - a long-standing bug related to not decreasing the pinned pages
           count in the right MM was found and fixed"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (28 commits)
        RDMA/hns: Move the location for initializing tmp_len
        RDMA/hns: Bugfix for cq record db for kernel
        IB/uverbs: Fix uverbs_attr_get_obj
        RDMA/qedr: Fix doorbell bar mapping for dpi > 1
        IB/umem: Use the correct mm during ib_umem_release
        iw_cxgb4: Fix an error handling path in 'c4iw_get_dma_mr()'
        RDMA/i40iw: Avoid panic when reading back the IRQ affinity hint
        RDMA/i40iw: Avoid reference leaks when processing the AEQ
        RDMA/i40iw: Avoid panic when objects are being created and destroyed
        RDMA/hns: Fix the bug with NULL pointer
        RDMA/hns: Set NULL for __internal_mr
        RDMA/hns: Enable inner_pa_vld filed of mpt
        RDMA/hns: Set desc_dma_addr for zero when free cmq desc
        RDMA/hns: Fix the bug with rq sge
        RDMA/hns: Not support qp transition from reset to reset for hip06
        RDMA/hns: Add return operation when configured global param fail
        RDMA/hns: Update convert function of endian format
        RDMA/hns: Load the RoCE dirver automatically
        RDMA/hns: Bugfix for rq record db for kernel
        RDMA/hns: Add rq inline flags judgement
        ...
      34b48b87
    • Linus Torvalds's avatar
      Merge tag 'for-4.17-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · d7b66b4a
      Linus Torvalds authored
      Pull btrfs fix from David Sterba:
       "A one-liner that prevents leaking an internal error value 1 out of the
        ftruncate syscall.
      
        This has been observed in practice. The steps to reproduce make a
        common pattern (open/write/fync/ftruncate) but also need the
        application to not check only for negative values and happens only for
        compressed inlined files.
      
        The conditions are narrow but as this could break userspace I think
        it's better to merge it now and not wait for the merge window"
      
      * tag 'for-4.17-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        Btrfs: fix error handling in btrfs_truncate()
      d7b66b4a
    • Lukas Wunner's avatar
      ALSA: hda - Fix runtime PM · 009f8c90
      Lukas Wunner authored
      Before commit 3b5b899c ("ALSA: hda: Make use of core codec functions
      to sync power state"), hda_set_power_state() returned the response to
      the Get Power State verb, a 32-bit unsigned integer whose expected value
      is 0x233 after transitioning a codec to D3, and 0x0 after transitioning
      it to D0.
      
      The response value is significant because hda_codec_runtime_suspend()
      does not clear the codec's bit in the codec_powered bitmask unless the
      AC_PWRST_CLK_STOP_OK bit (0x200) is set in the response value.  That in
      turn prevents the HDA controller from runtime suspending because
      azx_runtime_idle() checks that the codec_powered bitmask is zero.
      
      Since commit 3b5b899c, hda_set_power_state() only returns 0x0 or
      0x1, thereby breaking runtime PM for any HDA controller.  That's because
      an inline function introduced by the commit returns a bool instead of a
      32-bit unsigned int.  The change was likely erroneous and resulted from
      copying and pasting snd_hda_check_power_state(), which is immediately
      preceding the newly introduced inline function.  Fix it.
      
      Link: https://bugs.freedesktop.org/show_bug.cgi?id=106597
      Fixes: 3b5b899c ("ALSA: hda: Make use of core codec functions to sync power state")
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Abhijeet Kumar <abhijeet.kumar@intel.com>
      Reported-and-tested-by: default avatarGunnar Krüger <taijian@posteo.de>
      Signed-off-by: default avatarLukas Wunner <lukas@wunner.de>
      Acked-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      009f8c90
    • Joonsoo Kim's avatar
      Revert "mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE" · d883c6cf
      Joonsoo Kim authored
      This reverts the following commits that change CMA design in MM.
      
       3d2054ad ("ARM: CMA: avoid double mapping to the CMA area if CONFIG_HIGHMEM=y")
      
       1d47a3ec ("mm/cma: remove ALLOC_CMA")
      
       bad8c6c0 ("mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE")
      
      Ville reported a following error on i386.
      
        Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
        microcode: microcode updated early to revision 0x4, date = 2013-06-28
        Initializing CPU#0
        Initializing HighMem for node 0 (000377fe:00118000)
        Initializing Movable for node 0 (00000001:00118000)
        BUG: Bad page state in process swapper  pfn:377fe
        page:f53effc0 count:0 mapcount:-127 mapping:00000000 index:0x0
        flags: 0x80000000()
        raw: 80000000 00000000 00000000 ffffff80 00000000 00000100 00000200 00000001
        page dumped because: nonzero mapcount
        Modules linked in:
        CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0-rc5-elk+ #145
        Hardware name: Dell Inc. Latitude E5410/03VXMC, BIOS A15 07/11/2013
        Call Trace:
         dump_stack+0x60/0x96
         bad_page+0x9a/0x100
         free_pages_check_bad+0x3f/0x60
         free_pcppages_bulk+0x29d/0x5b0
         free_unref_page_commit+0x84/0xb0
         free_unref_page+0x3e/0x70
         __free_pages+0x1d/0x20
         free_highmem_page+0x19/0x40
         add_highpages_with_active_regions+0xab/0xeb
         set_highmem_pages_init+0x66/0x73
         mem_init+0x1b/0x1d7
         start_kernel+0x17a/0x363
         i386_start_kernel+0x95/0x99
         startup_32_smp+0x164/0x168
      
      The reason for this error is that the span of MOVABLE_ZONE is extended
      to whole node span for future CMA initialization, and, normal memory is
      wrongly freed here.  I submitted the fix and it seems to work, but,
      another problem happened.
      
      It's so late time to fix the later problem so I decide to reverting the
      series.
      Reported-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Acked-by: default avatarLaura Abbott <labbott@redhat.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d883c6cf
    • Linus Torvalds's avatar
      Merge branch 'for-4.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata · 577e75e0
      Linus Torvalds authored
      Pull libata fixes from Tejun Heo:
       "Nothing too interesting.  Four patches to update the blacklist and
        add a controller ID"
      
      * 'for-4.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
        ahci: Add PCI ID for Cannon Lake PCH-LP AHCI
        libata: blacklist Micron 500IT SSD with MU01 firmware
        libata: Apply NOLPM quirk for SAMSUNG PM830 CXM13D1Q.
        libata: Blacklist some Sandisk SSDs for NCQ
      577e75e0
    • Linus Torvalds's avatar
      Merge tag 'for-linus-20180524' of git://git.kernel.dk/linux-block · b68ea0ee
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "Two fixes that should go into this release:
      
         - a loop writeback error clearing fix from Jeff
      
         - the sr sense fix from myself"
      
      * tag 'for-linus-20180524' of git://git.kernel.dk/linux-block:
        loop: clear wb_err in bd_inode when detaching backing file
        sr: pass down correctly sized SCSI sense buffer
      b68ea0ee
    • Linus Torvalds's avatar
      Merge tag 'pm-4.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 9ca5a2ae
      Linus Torvalds authored
      Pull power management fix from Rafael Wysocki:
       "Fix a regression from the 4.15 cycle that caused the system suspend
        and resume overhead to increase on many systems and triggered more
        serious problems on some of them (Rafael Wysocki)"
      
      * tag 'pm-4.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        PM / core: Fix direct_complete handling for devices with no callbacks
      9ca5a2ae
    • Mika Westerberg's avatar
      ahci: Add PCI ID for Cannon Lake PCH-LP AHCI · 4544e403
      Mika Westerberg authored
      This one should be using the default LPM policy for mobile chipsets so
      add the PCI ID to the driver list of supported revices.
      Signed-off-by: default avatarMika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org
      4544e403
    • Laura Abbott's avatar
      arm64: Make sure permission updates happen for pmd/pud · 82034c23
      Laura Abbott authored
      Commit 15122ee2 ("arm64: Enforce BBM for huge IO/VMAP mappings")
      disallowed block mappings for ioremap since that code does not honor
      break-before-make. The same APIs are also used for permission updating
      though and the extra checks prevent the permission updates from happening,
      even though this should be permitted. This results in read-only permissions
      not being fully applied. Visibly, this can occasionaly be seen as a failure
      on the built in rodata test when the test data ends up in a section or
      as an odd RW gap on the page table dump. Fix this by using
      pgattr_change_is_safe instead of p*d_present for determining if the
      change is permitted.
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      Tested-by: default avatarPeter Robinson <pbrobinson@gmail.com>
      Reported-by: default avatarPeter Robinson <pbrobinson@gmail.com>
      Fixes: 15122ee2 ("arm64: Enforce BBM for huge IO/VMAP mappings")
      Signed-off-by: default avatarLaura Abbott <labbott@redhat.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      82034c23
    • Omar Sandoval's avatar
      Btrfs: fix error handling in btrfs_truncate() · d5014738
      Omar Sandoval authored
      Jun Wu at Facebook reported that an internal service was seeing a return
      value of 1 from ftruncate() on Btrfs in some cases. This is coming from
      the NEED_TRUNCATE_BLOCK return value from btrfs_truncate_inode_items().
      
      btrfs_truncate() uses two variables for error handling, ret and err.
      When btrfs_truncate_inode_items() returns non-zero, we set err to the
      return value. However, NEED_TRUNCATE_BLOCK is not an error. Make sure we
      only set err if ret is an error (i.e., negative).
      
      To reproduce the issue: mount a filesystem with -o compress-force=zstd
      and the following program will encounter return value of 1 from
      ftruncate:
      
      int main(void) {
              char buf[256] = { 0 };
              int ret;
              int fd;
      
              fd = open("test", O_CREAT | O_WRONLY | O_TRUNC, 0666);
              if (fd == -1) {
                      perror("open");
                      return EXIT_FAILURE;
              }
      
              if (write(fd, buf, sizeof(buf)) != sizeof(buf)) {
                      perror("write");
                      close(fd);
                      return EXIT_FAILURE;
              }
      
              if (fsync(fd) == -1) {
                      perror("fsync");
                      close(fd);
                      return EXIT_FAILURE;
              }
      
              ret = ftruncate(fd, 128);
              if (ret) {
                      printf("ftruncate() returned %d\n", ret);
                      close(fd);
                      return EXIT_FAILURE;
              }
      
              close(fd);
              return EXIT_SUCCESS;
      }
      
      Fixes: ddfae63c ("btrfs: move btrfs_truncate_block out of trans handle")
      CC: stable@vger.kernel.org # 4.15+
      Reported-by: default avatarJun Wu <quark@fb.com>
      Signed-off-by: default avatarOmar Sandoval <osandov@fb.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      d5014738
  4. 23 May, 2018 13 commits
  5. 22 May, 2018 2 commits
    • Peter Maydell's avatar
      arm64: fault: Don't leak data in ESR context for user fault on kernel VA · cc198460
      Peter Maydell authored
      If userspace faults on a kernel address, handing them the raw ESR
      value on the sigframe as part of the delivered signal can leak data
      useful to attackers who are using information about the underlying hardware
      fault type (e.g. translation vs permission) as a mechanism to defeat KASLR.
      
      However there are also legitimate uses for the information provided
      in the ESR -- notably the GCC and LLVM sanitizers use this to report
      whether wild pointer accesses by the application are reads or writes
      (since a wild write is a more serious bug than a wild read), so we
      don't want to drop the ESR information entirely.
      
      For faulting addresses in the kernel, sanitize the ESR. We choose
      to present userspace with the illusion that there is nothing mapped
      in the kernel's part of the address space at all, by reporting all
      faults as level 0 translation faults taken to EL1.
      
      These fields are safe to pass through to userspace as they depend
      only on the instruction that userspace used to provoke the fault:
       EC IL (always)
       ISV CM WNR (for all data aborts)
      All the other fields in ESR except DFSC are architecturally RES0
      for an L0 translation fault taken to EL1, so can be zeroed out
      without confusing userspace.
      
      The illusion is not entirely perfect, as there is a tiny wrinkle
      where we will report an alignment fault that was not due to the memory
      type (for instance a LDREX to an unaligned address) as a translation
      fault, whereas if you do this on real unmapped memory the alignment
      fault takes precedence. This is not likely to trip anybody up in
      practice, as the only users we know of for the ESR information who
      care about the behaviour for kernel addresses only really want to
      know about the WnR bit.
      Signed-off-by: default avatarPeter Maydell <peter.maydell@linaro.org>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      cc198460
    • Rafael J. Wysocki's avatar
      PM / core: Fix direct_complete handling for devices with no callbacks · c62ec461
      Rafael J. Wysocki authored
      Commit 08810a41 (PM / core: Add NEVER_SKIP and SMART_PREPARE
      driver flags) inadvertently prevented the power.direct_complete flag
      from being set for devices without PM callbacks and with disabled
      runtime PM which also prevents power.direct_complete from being set
      for their parents.  That led to problems including a resume crash on
      HP ZBook 14u.
      
      Restore the previous behavior by causing power.direct_complete to be
      set for those devices again, but do that in a more direct way to
      avoid overlooking that case in the future.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=199693
      Fixes: 08810a41 (PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags)
      Reported-by: default avatarThomas Martitz <kugel@rockbox.org>
      Tested-by: default avatarThomas Martitz <kugel@rockbox.org>
      Cc: 4.15+ <stable@vger.kernel.org> # 4.15+
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Reviewed-by: default avatarJohan Hovold <johan@kernel.org>
      c62ec461