1. 26 May, 2018 18 commits
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · bc2dbc54
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "16 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        kasan: fix memory hotplug during boot
        kasan: free allocated shadow memory on MEM_CANCEL_ONLINE
        checkpatch: fix macro argument precedence test
        init/main.c: include <linux/mem_encrypt.h>
        kernel/sys.c: fix potential Spectre v1 issue
        mm/memory_hotplug: fix leftover use of struct page during hotplug
        proc: fix smaps and meminfo alignment
        mm: do not warn on offline nodes unless the specific node is explicitly requested
        mm, memory_hotplug: make has_unmovable_pages more robust
        mm/kasan: don't vfree() nonexistent vm_area
        MAINTAINERS: change hugetlbfs maintainer and update files
        ipc/shm: fix shmat() nil address after round-down when remapping
        Revert "ipc/shm: Fix shmat mmap nil-page protection"
        idr: fix invalid ptr dereference on item delete
        ocfs2: revert "ocfs2/o2hb: check len for bio_add_page() to avoid getting incorrect bio"
        mm: fix nr_rotate_swap leak in swapon() error case
      bc2dbc54
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 03250e10
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "Let's begin the holiday weekend with some networking fixes:
      
         1) Whoops need to restrict cfg80211 wiphy names even more to 64
            bytes. From Eric Biggers.
      
         2) Fix flags being ignored when using kernel_connect() with SCTP,
            from Xin Long.
      
         3) Use after free in DCCP, from Alexey Kodanev.
      
         4) Need to check rhltable_init() return value in ipmr code, from Eric
            Dumazet.
      
         5) XDP handling fixes in virtio_net from Jason Wang.
      
         6) Missing RTA_TABLE in rtm_ipv4_policy[], from Roopa Prabhu.
      
         7) Need to use IRQ disabling spinlocks in mlx4_qp_lookup(), from Jack
            Morgenstein.
      
         8) Prevent out-of-bounds speculation using indexes in BPF, from
            Daniel Borkmann.
      
         9) Fix regression added by AF_PACKET link layer cure, from Willem de
            Bruijn.
      
        10) Correct ENIC dma mask, from Govindarajulu Varadarajan.
      
        11) Missing config options for PMTU tests, from Stefano Brivio"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (48 commits)
        ibmvnic: Fix partial success login retries
        selftests/net: Add missing config options for PMTU tests
        mlx4_core: allocate ICM memory in page size chunks
        enic: set DMA mask to 47 bit
        ppp: remove the PPPIOCDETACH ioctl
        ipv4: remove warning in ip_recv_error
        net : sched: cls_api: deal with egdev path only if needed
        vhost: synchronize IOTLB message with dev cleanup
        packet: fix reserve calculation
        net/mlx5: IPSec, Fix a race between concurrent sandbox QP commands
        net/mlx5e: When RXFCS is set, add FCS data into checksum calculation
        bpf: properly enforce index mask to prevent out-of-bounds speculation
        net/mlx4: Fix irq-unsafe spinlock usage
        net: phy: broadcom: Fix bcm_write_exp()
        net: phy: broadcom: Fix auxiliary control register reads
        net: ipv4: add missing RTA_TABLE to rtm_ipv4_policy
        net/mlx4: fix spelling mistake: "Inrerface" -> "Interface" and rephrase message
        ibmvnic: Only do H_EOI for mobility events
        tuntap: correctly set SOCKWQ_ASYNC_NOSPACE
        virtio-net: fix leaking page for gso packet during mergeable XDP
        ...
      03250e10
    • David Hildenbrand's avatar
      kasan: fix memory hotplug during boot · 3f195972
      David Hildenbrand authored
      Using module_init() is wrong.  E.g.  ACPI adds and onlines memory before
      our memory notifier gets registered.
      
      This makes sure that ACPI memory detected during boot up will not result
      in a kernel crash.
      
      Easily reproducible with QEMU, just specify a DIMM when starting up.
      
      Link: http://lkml.kernel.org/r/20180522100756.18478-3-david@redhat.com
      Fixes: 786a8959 ("kasan: disable memory hotplug")
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Acked-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3f195972
    • David Hildenbrand's avatar
      kasan: free allocated shadow memory on MEM_CANCEL_ONLINE · ed1596f9
      David Hildenbrand authored
      We have to free memory again when we cancel onlining, otherwise a later
      onlining attempt will fail.
      
      Link: http://lkml.kernel.org/r/20180522100756.18478-2-david@redhat.com
      Fixes: fa69b598 ("mm/kasan: add support for memory hotplug")
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Acked-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ed1596f9
    • Joe Perches's avatar
      checkpatch: fix macro argument precedence test · d41362ed
      Joe Perches authored
      checkpatch's macro argument precedence test is broken so fix it.
      
      Link: http://lkml.kernel.org/r/5dd900e9197febc1995604bb33c23c136d8b33ce.camel@perches.comSigned-off-by: default avatarJoe Perches <joe@perches.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d41362ed
    • Mathieu Malaterre's avatar
      init/main.c: include <linux/mem_encrypt.h> · ae67d58d
      Mathieu Malaterre authored
      In commit c7753208 ("x86, swiotlb: Add memory encryption support") a
      call to function `mem_encrypt_init' was added.  Include prototype
      defined in header <linux/mem_encrypt.h> to prevent a warning reported
      during compilation with W=1:
      
        init/main.c:494:20: warning: no previous prototype for `mem_encrypt_init' [-Wmissing-prototypes]
      
      Link: http://lkml.kernel.org/r/20180522195533.31415-1-malat@debian.orgSigned-off-by: default avatarMathieu Malaterre <malat@debian.org>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Acked-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Laura Abbott <lauraa@codeaurora.org>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Gargi Sharma <gs051095@gmail.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ae67d58d
    • Gustavo A. R. Silva's avatar
      kernel/sys.c: fix potential Spectre v1 issue · 23d6aef7
      Gustavo A. R. Silva authored
      `resource' can be controlled by user-space, hence leading to a potential
      exploitation of the Spectre variant 1 vulnerability.
      
      This issue was detected with the help of Smatch:
      
        kernel/sys.c:1474 __do_compat_sys_old_getrlimit() warn: potential spectre issue 'get_current()->signal->rlim' (local cap)
        kernel/sys.c:1455 __do_sys_old_getrlimit() warn: potential spectre issue 'get_current()->signal->rlim' (local cap)
      
      Fix this by sanitizing *resource* before using it to index
      current->signal->rlim
      
      Notice that given that speculation windows are large, the policy is to
      kill the speculation on the first load and not worry if it can be
      completed with a dependent load/store [1].
      
      [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
      
      Link: http://lkml.kernel.org/r/20180515030038.GA11822@embeddedor.comSigned-off-by: default avatarGustavo A. R. Silva <gustavo@embeddedor.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      23d6aef7
    • Jonathan Cameron's avatar
      mm/memory_hotplug: fix leftover use of struct page during hotplug · a2155861
      Jonathan Cameron authored
      The case of a new numa node got missed in avoiding using the node info
      from page_struct during hotplug.  In this path we have a call to
      register_mem_sect_under_node (which allows us to specify it is hotplug
      so don't change the node), via link_mem_sections which unfortunately
      does not.
      
      Fix is to pass check_nid through link_mem_sections as well and disable
      it in the new numa node path.
      
      Note the bug only 'sometimes' manifests depending on what happens to be
      in the struct page structures - there are lots of them and it only needs
      to match one of them.
      
      The result of the bug is that (with a new memory only node) we never
      successfully call register_mem_sect_under_node so don't get the memory
      associated with the node in sysfs and meminfo for the node doesn't
      report it.
      
      It came up whilst testing some arm64 hotplug patches, but appears to be
      universal.  Whilst I'm triggering it by removing then reinserting memory
      to a node with no other elements (thus making the node disappear then
      appear again), it appears it would happen on hotplugging memory where
      there was none before and it doesn't seem to be related the arm64
      patches.
      
      These patches call __add_pages (where most of the issue was fixed by
      Pavel's patch).  If there is a node at the time of the __add_pages call
      then all is well as it calls register_mem_sect_under_node from there
      with check_nid set to false.  Without a node that function returns
      having not done the sysfs related stuff as there is no node to use.
      This is expected but it is the resulting path that fails...
      
      Exact path to the problem is as follows:
      
       mm/memory_hotplug.c: add_memory_resource()
      
         The node is not online so we enter the 'if (new_node)' twice, on the
         second such block there is a call to link_mem_sections which calls
         into
      
        drivers/node.c: link_mem_sections() which calls
      
        drivers/node.c: register_mem_sect_under_node() which calls
           get_nid_for_pfn and keeps trying until the output of that matches
           the expected node (passed all the way down from
           add_memory_resource)
      
      It is effectively the same fix as the one referred to in the fixes tag
      just in the code path for a new node where the comments point out we
      have to rerun the link creation because it will have failed in
      register_new_memory (as there was no node at the time).  (actually that
      comment is wrong now as we don't have register_new_memory any more it
      got renamed to hotplug_memory_register in Pavel's patch).
      
      Link: http://lkml.kernel.org/r/20180504085311.1240-1-Jonathan.Cameron@huawei.com
      Fixes: fc44f7f9 ("mm/memory_hotplug: don't read nid from struct page during hotplug")
      Signed-off-by: default avatarJonathan Cameron <Jonathan.Cameron@huawei.com>
      Reviewed-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a2155861
    • Hugh Dickins's avatar
      proc: fix smaps and meminfo alignment · 6c04ab0e
      Hugh Dickins authored
      The 4.17-rc /proc/meminfo and /proc/<pid>/smaps look ugly: single-digit
      numbers (commonly 0) are misaligned.
      
      Remove seq_put_decimal_ull_width()'s leftover optimization for single
      digits: it's wrong now that num_to_str() takes care of the width.
      
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1805241554210.1326@eggly.anvils
      Fixes: d1be35cb ("proc: add seq_put_decimal_ull_width to speed up /proc/pid/smaps")
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Andrei Vagin <avagin@openvz.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6c04ab0e
    • Michal Hocko's avatar
      mm: do not warn on offline nodes unless the specific node is explicitly requested · 8addc2d0
      Michal Hocko authored
      Oscar has noticed that we splat
      
         WARNING: CPU: 0 PID: 64 at ./include/linux/gfp.h:467 vmemmap_alloc_block+0x4e/0xc9
         [...]
         CPU: 0 PID: 64 Comm: kworker/u4:1 Tainted: G        W   E     4.17.0-rc5-next-20180517-1-default+ #66
         Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
         Workqueue: kacpi_hotplug acpi_hotplug_work_fn
         Call Trace:
          vmemmap_populate+0xf2/0x2ae
          sparse_mem_map_populate+0x28/0x35
          sparse_add_one_section+0x4c/0x187
          __add_pages+0xe7/0x1a0
          add_pages+0x16/0x70
          add_memory_resource+0xa3/0x1d0
          add_memory+0xe4/0x110
          acpi_memory_device_add+0x134/0x2e0
          acpi_bus_attach+0xd9/0x190
          acpi_bus_scan+0x37/0x70
          acpi_device_hotplug+0x389/0x4e0
          acpi_hotplug_work_fn+0x1a/0x30
          process_one_work+0x146/0x340
          worker_thread+0x47/0x3e0
          kthread+0xf5/0x130
          ret_from_fork+0x35/0x40
      
      when adding memory to a node that is currently offline.
      
      The VM_WARN_ON is just too loud without a good reason.  In this
      particular case we are doing
      
      	alloc_pages_node(node, GFP_KERNEL|__GFP_RETRY_MAYFAIL|__GFP_NOWARN, order)
      
      so we do not insist on allocating from the given node (it is more a
      hint) so we can fall back to any other populated node and moreover we
      explicitly ask to not warn for the allocation failure.
      
      Soften the warning only to cases when somebody asks for the given node
      explicitly by __GFP_THISNODE.
      
      Link: http://lkml.kernel.org/r/20180523125555.30039-3-mhocko@kernel.orgSigned-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Reported-by: default avatarOscar Salvador <osalvador@techadventures.net>
      Tested-by: default avatarOscar Salvador <osalvador@techadventures.net>
      Reviewed-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8addc2d0
    • Michal Hocko's avatar
      mm, memory_hotplug: make has_unmovable_pages more robust · 15c30bc0
      Michal Hocko authored
      Oscar has reported:
      : Due to an unfortunate setting with movablecore, memblocks containing bootmem
      : memory (pages marked by get_page_bootmem()) ended up marked in zone_movable.
      : So while trying to remove that memory, the system failed in do_migrate_range
      : and __offline_pages never returned.
      :
      : This can be reproduced by running
      : qemu-system-x86_64 -m 6G,slots=8,maxmem=8G -numa node,mem=4096M -numa node,mem=2048M
      : and movablecore=4G kernel command line
      :
      : linux kernel: BIOS-provided physical RAM map:
      : linux kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
      : linux kernel: BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
      : linux kernel: BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
      : linux kernel: BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
      : linux kernel: BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
      : linux kernel: BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
      : linux kernel: BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
      : linux kernel: BIOS-e820: [mem 0x0000000100000000-0x00000001bfffffff] usable
      : linux kernel: NX (Execute Disable) protection: active
      : linux kernel: SMBIOS 2.8 present.
      : linux kernel: DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org
      : linux kernel: Hypervisor detected: KVM
      : linux kernel: e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
      : linux kernel: e820: remove [mem 0x000a0000-0x000fffff] usable
      : linux kernel: last_pfn = 0x1c0000 max_arch_pfn = 0x400000000
      :
      : linux kernel: SRAT: PXM 0 -> APIC 0x00 -> Node 0
      : linux kernel: SRAT: PXM 1 -> APIC 0x01 -> Node 1
      : linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
      : linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
      : linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x13fffffff]
      : linux kernel: ACPI: SRAT: Node 1 PXM 1 [mem 0x140000000-0x1bfffffff]
      : linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x1c0000000-0x43fffffff] hotplug
      : linux kernel: NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0xbfffffff] -> [mem 0x0
      : linux kernel: NUMA: Node 0 [mem 0x00000000-0xbfffffff] + [mem 0x100000000-0x13fffffff] -> [mem 0
      : linux kernel: NODE_DATA(0) allocated [mem 0x13ffd6000-0x13fffffff]
      : linux kernel: NODE_DATA(1) allocated [mem 0x1bffd3000-0x1bfffcfff]
      :
      : zoneinfo shows that the zone movable is placed into both numa nodes:
      : Node 0, zone  Movable
      :   pages free     160140
      :         min      1823
      :         low      2278
      :         high     2733
      :         spanned  262144
      :         present  262144
      :         managed  245670
      : Node 1, zone  Movable
      :   pages free     448427
      :         min      3827
      :         low      4783
      :         high     5739
      :         spanned  524288
      :         present  524288
      :         managed  515766
      
      Note how only Node 0 has a hutplugable memory region which would rule it
      out from the early memblock allocations (most likely memmap).  Node1
      will surely contain memmaps on the same node and those would prevent
      offlining to succeed.  So this is arguably a configuration issue.
      Although one could argue that we should be more clever and rule early
      allocations from the zone movable.  This would be correct but probably
      not worth the effort considering what a hack movablecore is.
      
      Anyway, We could do better for those cases though.  We rely on
      start_isolate_page_range resp.  has_unmovable_pages to do their job.
      The first one isolates the whole range to be offlined so that we do not
      allocate from it anymore and the later makes sure we are not stumbling
      over non-migrateable pages.
      
      has_unmovable_pages is overly optimistic, however.  It doesn't check all
      the pages if we are withing zone_movable because we rely that those
      pages will be always migrateable.  As it turns out we are still not
      perfect there.  While bootmem pages in zonemovable sound like a clear
      bug which should be fixed let's remove the optimization for now and warn
      if we encounter unmovable pages in zone_movable in the meantime.  That
      should help for now at least.
      
      Btw.  this wasn't a real problem until commit 72b39cfc ("mm,
      memory_hotplug: do not fail offlining too early") because we used to
      have a small number of retries and then failed.  This turned out to be
      too fragile though.
      
      Link: http://lkml.kernel.org/r/20180523125555.30039-2-mhocko@kernel.orgSigned-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Reported-by: default avatarOscar Salvador <osalvador@techadventures.net>
      Tested-by: default avatarOscar Salvador <osalvador@techadventures.net>
      Reviewed-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      15c30bc0
    • Andrey Ryabinin's avatar
      mm/kasan: don't vfree() nonexistent vm_area · 0f901dcb
      Andrey Ryabinin authored
      KASAN uses different routines to map shadow for hot added memory and
      memory obtained in boot process.  Attempt to offline memory onlined by
      normal boot process leads to this:
      
          Trying to vfree() nonexistent vm area (000000005d3b34b9)
          WARNING: CPU: 2 PID: 13215 at mm/vmalloc.c:1525 __vunmap+0x147/0x190
      
          Call Trace:
           kasan_mem_notifier+0xad/0xb9
           notifier_call_chain+0x166/0x260
           __blocking_notifier_call_chain+0xdb/0x140
           __offline_pages+0x96a/0xb10
           memory_subsys_offline+0x76/0xc0
           device_offline+0xb8/0x120
           store_mem_state+0xfa/0x120
           kernfs_fop_write+0x1d5/0x320
           __vfs_write+0xd4/0x530
           vfs_write+0x105/0x340
           SyS_write+0xb0/0x140
      
      Obviously we can't call vfree() to free memory that wasn't allocated via
      vmalloc().  Use find_vm_area() to see if we can call vfree().
      
      Unfortunately it's a bit tricky to properly unmap and free shadow
      allocated during boot, so we'll have to keep it.  If memory will come
      online again that shadow will be reused.
      
      Matthew asked: how can you call vfree() on something that isn't a
      vmalloc address?
      
        vfree() is able to free any address returned by
        __vmalloc_node_range().  And __vmalloc_node_range() gives you any
        address you ask.  It doesn't have to be an address in [VMALLOC_START,
        VMALLOC_END] range.
      
        That's also how the module_alloc()/module_memfree() works on
        architectures that have designated area for modules.
      
      [aryabinin@virtuozzo.com: improve comments]
        Link: http://lkml.kernel.org/r/dabee6ab-3a7a-51cd-3b86-5468718e0390@virtuozzo.com
      [akpm@linux-foundation.org: fix typos, reflow comment]
      Link: http://lkml.kernel.org/r/20180201163349.8700-1-aryabinin@virtuozzo.com
      Fixes: fa69b598 ("mm/kasan: add support for memory hotplug")
      Signed-off-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Reported-by: default avatarPaul Menzel <pmenzel+linux-kasan-dev@molgen.mpg.de>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0f901dcb
    • Mike Kravetz's avatar
      MAINTAINERS: change hugetlbfs maintainer and update files · b9ddff9b
      Mike Kravetz authored
      The current hugetlbfs maintainer has not been active for more than a few
      years.  I have been been active in this area for more than two years and
      plan to remain active in the foreseeable future.
      
      Also, update the hugetlbfs entry to include linux-mm mail list and
      additional hugetlbfs related files.  hugetlb.c and hugetlb.h are not
      100% hugetlbfs, but a majority of their content is hugetlbfs related.
      
      Link: http://lkml.kernel.org/r/20180518225236.19079-1-mike.kravetz@oracle.comSigned-off-by: default avatarMike Kravetz <mike.kravetz@oracle.com>
      Acked-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Nadia Yvette Chambers <nyc@holomorphy.com>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b9ddff9b
    • Davidlohr Bueso's avatar
      ipc/shm: fix shmat() nil address after round-down when remapping · 8f89c007
      Davidlohr Bueso authored
      shmat()'s SHM_REMAP option forbids passing a nil address for; this is in
      fact the very first thing we check for.  Andrea reported that for
      SHM_RND|SHM_REMAP cases we can end up bypassing the initial addr check,
      but we need to check again if the address was rounded down to nil.  As
      of this patch, such cases will return -EINVAL.
      
      Link: http://lkml.kernel.org/r/20180503204934.kk63josdu6u53fbd@linux-n805Signed-off-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Reported-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Cc: Joe Lawrence <joe.lawrence@redhat.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8f89c007
    • Davidlohr Bueso's avatar
      Revert "ipc/shm: Fix shmat mmap nil-page protection" · a73ab244
      Davidlohr Bueso authored
      Patch series "ipc/shm: shmat() fixes around nil-page".
      
      These patches fix two issues reported[1] a while back by Joe and Andrea
      around how shmat(2) behaves with nil-page.
      
      The first reverts a commit that it was incorrectly thought that mapping
      nil-page (address=0) was a no no with MAP_FIXED.  This is not the case,
      with the exception of SHM_REMAP; which is address in the second patch.
      
      I chose two patches because it is easier to backport and it explicitly
      reverts bogus behaviour.  Both patches ought to be in -stable and ltp
      testcases need updated (the added testcase around the cve can be
      modified to just test for SHM_RND|SHM_REMAP).
      
      [1] lkml.kernel.org/r/20180430172152.nfa564pvgpk3ut7p@linux-n805
      
      This patch (of 2):
      
      Commit 95e91b83 ("ipc/shm: Fix shmat mmap nil-page protection")
      worked on the idea that we should not be mapping as root addr=0 and
      MAP_FIXED.  However, it was reported that this scenario is in fact
      valid, thus making the patch both bogus and breaks userspace as well.
      
      For example X11's libint10.so relies on shmat(1, SHM_RND) for lowmem
      initialization[1].
      
      [1] https://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/os-support/linux/int10/linux.c#n347
      Link: http://lkml.kernel.org/r/20180503203243.15045-2-dave@stgolabs.net
      Fixes: 95e91b83 ("ipc/shm: Fix shmat mmap nil-page protection")
      Signed-off-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Reported-by: default avatarJoe Lawrence <joe.lawrence@redhat.com>
      Reported-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a73ab244
    • Matthew Wilcox's avatar
      idr: fix invalid ptr dereference on item delete · 7a4deea1
      Matthew Wilcox authored
      If the radix tree underlying the IDR happens to be full and we attempt
      to remove an id which is larger than any id in the IDR, we will call
      __radix_tree_delete() with an uninitialised 'slot' pointer, at which
      point anything could happen.  This was easiest to hit with a single
      entry at id 0 and attempting to remove a non-0 id, but it could have
      happened with 64 entries and attempting to remove an id >= 64.
      
      Roman said:
      
        The syzcaller test boils down to opening /dev/kvm, creating an
        eventfd, and calling a couple of KVM ioctls. None of this requires
        superuser. And the result is dereferencing an uninitialized pointer
        which is likely a crash. The specific path caught by syzbot is via
        KVM_HYPERV_EVENTD ioctl which is new in 4.17. But I guess there are
        other user-triggerable paths, so cc:stable is probably justified.
      
      Matthew added:
      
        We have around 250 calls to idr_remove() in the kernel today. Many of
        them pass an ID which is embedded in the object they're removing, so
        they're safe. Picking a few likely candidates:
      
        drivers/firewire/core-cdev.c looks unsafe; the ID comes from an ioctl.
        drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c is similar
        drivers/atm/nicstar.c could be taken down by a handcrafted packet
      
      Link: http://lkml.kernel.org/r/20180518175025.GD6361@bombadil.infradead.org
      Fixes: 0a835c4f ("Reimplement IDR and IDA using the radix tree")
      Reported-by: <syzbot+35666cba7f0a337e2e79@syzkaller.appspotmail.com>
      Debugged-by: default avatarRoman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: default avatarMatthew Wilcox <mawilcox@microsoft.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7a4deea1
    • Changwei Ge's avatar
      ocfs2: revert "ocfs2/o2hb: check len for bio_add_page() to avoid getting incorrect bio" · 3373de20
      Changwei Ge authored
      This reverts commit ba16ddfb ("ocfs2/o2hb: check len for
      bio_add_page() to avoid getting incorrect bio").
      
      In my testing, this patch introduces a problem that mkfs can't have
      slots more than 16 with 4k block size.
      
      And the original logic is safe actually with the situation it mentions
      so revert this commit.
      
      Attach test log:
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 0, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 1, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 2, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 3, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 4, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 5, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 6, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 7, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 8, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 9, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 10, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 11, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 12, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 13, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 14, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 15, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 16, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:471 ERROR: Adding page[16] to bio failed, page ffffea0002d7ed40, len 0, vec_len 4096, vec_start 0,bi_sector 8192
        (mkfs.ocfs2,27479,2):o2hb_read_slots:500 ERROR: status = -5
        (mkfs.ocfs2,27479,2):o2hb_populate_slot_data:1911 ERROR: status = -5
        (mkfs.ocfs2,27479,2):o2hb_region_dev_write:2012 ERROR: status = -5
      
      Link: http://lkml.kernel.org/r/SIXPR06MB0461721F398A5A92FC68C39ED5920@SIXPR06MB0461.apcprd06.prod.outlook.comSigned-off-by: default avatarChangwei Ge <ge.changwei@h3c.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: Yiwen Jiang <jiangyiwen@huawei.com>
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3373de20
    • Omar Sandoval's avatar
      mm: fix nr_rotate_swap leak in swapon() error case · 7cbf3192
      Omar Sandoval authored
      If swapon() fails after incrementing nr_rotate_swap, we don't decrement
      it and thus effectively leak it.  Make sure we decrement it if we
      incremented it.
      
      Link: http://lkml.kernel.org/r/b6fe6b879f17fa68eee6cbd876f459f6e5e33495.1526491581.git.osandov@fb.com
      Fixes: 81a0298b ("mm, swap: don't use VMA based swap readahead if HDD is used as swap")
      Signed-off-by: default avatarOmar Sandoval <osandov@fb.com>
      Reviewed-by: default avatarRik van Riel <riel@surriel.com>
      Reviewed-by: default avatar"Huang, Ying" <ying.huang@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7cbf3192
  2. 25 May, 2018 17 commits
    • Thomas Falcon's avatar
      ibmvnic: Fix partial success login retries · eb110410
      Thomas Falcon authored
      In its current state, the driver will handle backing device
      login in a loop for a certain number of retries while the
      device returns a partial success, indicating that the driver
      may need to try again using a smaller number of resources.
      
      The variable it checks to continue retrying may change
      over the course of operations, resulting in reallocation
      of resources but exits without sending the login attempt.
      Guard against this by introducing a boolean variable that
      will retain the state indicating that the driver needs to
      reattempt login with backing device firmware.
      Signed-off-by: default avatarThomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      eb110410
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · d2f30f51
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2018-05-24
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix a bug in the original fix to prevent out of bounds speculation when
         multiple tail call maps from different branches or calls end up at the
         same tail call helper invocation, from Daniel.
      
      2) Two selftest fixes, one in reuseport_bpf_numa where test is skipped in
         case of missing numa support and another one to update kernel config to
         properly support xdp_meta.sh test, from Anders.
      
       ...
      
      Would be great if you have a chance to merge net into net-next after that.
      
      The verifier fix would be needed later as a dependency in bpf-next for
      upcomig work there. When you do the merge there's a trivial conflict on
      BPF side with 849fa506 ("bpf/verifier: refine retval R0 state for
      bpf_get_stack helper"): Resolution is to keep both functions, the
      do_refine_retval_range() and record_func_map().
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d2f30f51
    • Stefano Brivio's avatar
      selftests/net: Add missing config options for PMTU tests · 24e4b075
      Stefano Brivio authored
      PMTU tests in pmtu.sh need support for VTI, VTI6 and dummy
      interfaces: add them to config file.
      Reported-by: default avatarNaresh Kamboju <naresh.kamboju@linaro.org>
      Fixes: d1f1b9cb ("selftests: net: Introduce first PMTU test")
      Signed-off-by: default avatarStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      24e4b075
    • David S. Miller's avatar
      Merge tag 'batadv-net-for-davem-20180524' of git://git.open-mesh.org/linux-merge · e3ffec48
      David S. Miller authored
      Simon Wunderlich says:
      
      ====================
      Here are some batman-adv bugfixes:
      
       - prevent hardif_put call with NULL parameter, by Colin Ian King
      
       - Avoid race in Translation Table allocator, by Sven Eckelmann
      
       - Fix Translation Table sync flags for intermediate Responses,
         by Linus Luessing
      
       - prevent sending inconsistent Translation Table TVLVs,
         by Marek Lindner
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e3ffec48
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 62d18ecf
      Linus Torvalds authored
      Pull more arm64 fixes from Will Deacon:
      
       - fix application of read-only permissions to kernel section mappings
      
       - sanitise reported ESR values for signals delivered on a kernel
         address
      
       - ensure tishift GCC helpers are exported to modules
      
       - fix inline asm constraints for some LSE atomics
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: Make sure permission updates happen for pmd/pud
        arm64: fault: Don't leak data in ESR context for user fault on kernel VA
        arm64: export tishift functions to modules
        arm64: lse: Add early clobbers to some input/output asm operands
      62d18ecf
    • Linus Torvalds's avatar
      Merge tag 'powerpc-4.17-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · b133ef6e
      Linus Torvalds authored
      Pull powerpc fix from Michael Ellerman:
       "Just one fix, to make sure the PCR (Processor Compatibility Register)
        is reset on boot.
      
        Otherwise if we're running in compat mode in a guest (eg. pretending a
        Power9 is a Power8) and the host kernel oopses and kdumps then the
        kdump kernel's userspace will be running in Power8 mode, and will
        SIGILL if it uses Power9-only instructions.
      
        Thanks to Michael Neuling"
      
      * tag 'powerpc-4.17-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/64s: Clear PCR on boot
      b133ef6e
    • Linus Torvalds's avatar
      Merge tag 'mmc-v4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · f287fe35
      Linus Torvalds authored
      Pull MMC fixes from Ulf Hansson:
       "MMC core:
         - Propagate correct error code for RPMB requests
      
        MMC host:
         - sdhci-iproc: Drop hard coded cap for 1.8v
         - sdhci-iproc: Fix 32bit writes for transfer mode
         - sdhci-iproc: Enable SDHCI_QUIRK2_HOST_OFF_CARD_ON for cygnus"
      
      * tag 'mmc-v4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
        mmc: sdhci-iproc: add SDHCI_QUIRK2_HOST_OFF_CARD_ON for cygnus
        mmc: sdhci-iproc: fix 32bit writes for TRANSFER_MODE register
        mmc: sdhci-iproc: remove hard coded mmc cap 1.8v
        mmc: block: propagate correct returned value in mmc_rpmb_ioctl
      f287fe35
    • Linus Torvalds's avatar
      Merge tag 'drm-fixes-for-v4.17-rc7' of git://people.freedesktop.org/~airlied/linux · b9f57019
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Only two sets of drivers fixes: one rcar-du lvds regression fix, and a
        group of fixes for vmwgfx"
      
      * tag 'drm-fixes-for-v4.17-rc7' of git://people.freedesktop.org/~airlied/linux:
        drm/vmwgfx: Schedule an fb dirty update after resume
        drm/vmwgfx: Fix host logging / guestinfo reading error paths
        drm/vmwgfx: Fix 32-bit VMW_PORT_HB_[IN|OUT] macros
        drm: rcar-du: lvds: Fix crash in .atomic_check when disabling connector
      b9f57019
    • Linus Torvalds's avatar
      Merge tag 'sound-4.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · a1a9f537
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "Two fixes:
      
         - a timer pause event notification was garbled upon the recent
           hardening work; corrected now
      
         - HD-audio runtime PM regression fix due to the incorrect return
           type"
      
      * tag 'sound-4.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: hda - Fix runtime PM
        ALSA: timer: Fix pause event notification
      a1a9f537
    • Qing Huang's avatar
      mlx4_core: allocate ICM memory in page size chunks · 1383cb81
      Qing Huang authored
      When a system is under memory presure (high usage with fragments),
      the original 256KB ICM chunk allocations will likely trigger kernel
      memory management to enter slow path doing memory compact/migration
      ops in order to complete high order memory allocations.
      
      When that happens, user processes calling uverb APIs may get stuck
      for more than 120s easily even though there are a lot of free pages
      in smaller chunks available in the system.
      
      Syslog:
      ...
      Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
      oracle_205573_e:205573 blocked for more than 120 seconds.
      ...
      
      With 4KB ICM chunk size on x86_64 arch, the above issue is fixed.
      
      However in order to support smaller ICM chunk size, we need to fix
      another issue in large size kcalloc allocations.
      
      E.g.
      Setting log_num_mtt=30 requires 1G mtt entries. With the 4KB ICM chunk
      size, each ICM chunk can only hold 512 mtt entries (8 bytes for each mtt
      entry). So we need a 16MB allocation for a table->icm pointer array to
      hold 2M pointers which can easily cause kcalloc to fail.
      
      The solution is to use kvzalloc to replace kcalloc which will fall back
      to vmalloc automatically if kmalloc fails.
      Signed-off-by: default avatarQing Huang <qing.huang@oracle.com>
      Acked-by: default avatarDaniel Jurgens <danielj@mellanox.com>
      Reviewed-by: default avatarZhu Yanjun <yanjun.zhu@oracle.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1383cb81
    • Govindarajulu Varadarajan's avatar
      enic: set DMA mask to 47 bit · 322eaa06
      Govindarajulu Varadarajan authored
      In commit 624dbf55 ("driver/net: enic: Try DMA 64 first, then
      failover to DMA") DMA mask was changed from 40 bits to 64 bits.
      Hardware actually supports only 47 bits.
      
      Fixes: 624dbf55 ("driver/net: enic: Try DMA 64 first, then failover to DMA")
      Signed-off-by: default avatarGovindarajulu Varadarajan <gvaradar@cisco.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      322eaa06
    • Eric Biggers's avatar
      ppp: remove the PPPIOCDETACH ioctl · af8d3c7c
      Eric Biggers authored
      The PPPIOCDETACH ioctl effectively tries to "close" the given ppp file
      before f_count has reached 0, which is fundamentally a bad idea.  It
      does check 'f_count < 2', which excludes concurrent operations on the
      file since they would only be possible with a shared fd table, in which
      case each fdget() would take a file reference.  However, it fails to
      account for the fact that even with 'f_count == 1' the file can still be
      linked into epoll instances.  As reported by syzbot, this can trivially
      be used to cause a use-after-free.
      
      Yet, the only known user of PPPIOCDETACH is pppd versions older than
      ppp-2.4.2, which was released almost 15 years ago (November 2003).
      Also, PPPIOCDETACH apparently stopped working reliably at around the
      same time, when the f_count check was added to the kernel, e.g. see
      https://lkml.org/lkml/2002/12/31/83.  Also, the current 'f_count < 2'
      check makes PPPIOCDETACH only work in single-threaded applications; it
      always fails if called from a multithreaded application.
      
      All pppd versions released in the last 15 years just close() the file
      descriptor instead.
      
      Therefore, instead of hacking around this bug by exporting epoll
      internals to modules, and probably missing other related bugs, just
      remove the PPPIOCDETACH ioctl and see if anyone actually notices.  Leave
      a stub in place that prints a one-time warning and returns EINVAL.
      
      Reported-by: syzbot+16363c99d4134717c05b@syzkaller.appspotmail.com
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Acked-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Reviewed-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Tested-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      af8d3c7c
    • Willem de Bruijn's avatar
      ipv4: remove warning in ip_recv_error · 730c54d5
      Willem de Bruijn authored
      A precondition check in ip_recv_error triggered on an otherwise benign
      race. Remove the warning.
      
      The warning triggers when passing an ipv6 socket to this ipv4 error
      handling function. RaceFuzzer was able to trigger it due to a race
      in setsockopt IPV6_ADDRFORM.
      
        ---
        CPU0
          do_ipv6_setsockopt
            sk->sk_socket->ops = &inet_dgram_ops;
      
        ---
        CPU1
          sk->sk_prot->recvmsg
            udp_recvmsg
              ip_recv_error
                WARN_ON_ONCE(sk->sk_family == AF_INET6);
      
        ---
        CPU0
          do_ipv6_setsockopt
            sk->sk_family = PF_INET;
      
      This socket option converts a v6 socket that is connected to a v4 peer
      to an v4 socket. It updates the socket on the fly, changing fields in
      sk as well as other structs. This is inherently non-atomic. It races
      with the lockless udp_recvmsg path.
      
      No other code makes an assumption that these fields are updated
      atomically. It is benign here, too, as ip_recv_error cares only about
      the protocol of the skbs enqueued on the error queue, for which
      sk_family is not a precise predictor (thanks to another isue with
      IPV6_ADDRFORM).
      
      Link: http://lkml.kernel.org/r/20180518120826.GA19515@dragonet.kaist.ac.kr
      Fixes: 7ce875e5 ("ipv4: warn once on passing AF_INET6 socket to ip_recv_error")
      Reported-by: default avatarDaeRyong Jeong <threeearcat@gmail.com>
      Suggested-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      730c54d5
    • Or Gerlitz's avatar
      net : sched: cls_api: deal with egdev path only if needed · f8f4bef3
      Or Gerlitz authored
      When dealing with ingress rule on a netdev, if we did fine through the
      conventional path, there's no need to continue into the egdev route,
      and we can stop right there.
      
      Not doing so may cause a 2nd rule to be added by the cls api layer
      with the ingress being the egdev.
      
      For example, under sriov switchdev scheme, a user rule of VFR A --> VFR B
      will end up with two HW rules (1) VF A --> VF B and (2) uplink --> VF B
      
      Fixes: 208c0f4b ('net: sched: use tc_setup_cb_call to call per-block callbacks')
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f8f4bef3
    • Jason Wang's avatar
      vhost: synchronize IOTLB message with dev cleanup · 1b15ad68
      Jason Wang authored
      DaeRyong Jeong reports a race between vhost_dev_cleanup() and
      vhost_process_iotlb_msg():
      
      Thread interleaving:
      CPU0 (vhost_process_iotlb_msg)			CPU1 (vhost_dev_cleanup)
      (In the case of both VHOST_IOTLB_UPDATE and
      VHOST_IOTLB_INVALIDATE)
      
      =====						=====
      						vhost_umem_clean(dev->iotlb);
      if (!dev->iotlb) {
      	        ret = -EFAULT;
      		        break;
      }
      						dev->iotlb = NULL;
      
      The reason is we don't synchronize between them, fixing by protecting
      vhost_process_iotlb_msg() with dev mutex.
      Reported-by: default avatarDaeRyong Jeong <threeearcat@gmail.com>
      Fixes: 6b1e6cc7 ("vhost: new device IOTLB API")
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1b15ad68
    • David S. Miller's avatar
      Merge tag 'mlx5-fixes-2018-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · d681bc02
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      Mellanox, mlx5 fixes 2018-05-24
      
      This series includes two mlx5 fixes.
      
      1) add FCS data to checksum complete when required, from Eran Ben
      Elisha.
      
      2) Fix A race in IPSec sandbox QP commands, from Yossi Kuperman.
      
      Please pull and let me know if there's any problem.
      
      for -stable v4.15
      ("net/mlx5e: When RXFCS is set, add FCS data into checksum calculation")
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d681bc02
    • Willem de Bruijn's avatar
      packet: fix reserve calculation · 9aad13b0
      Willem de Bruijn authored
      Commit b84bbaf7 ("packet: in packet_snd start writing at link
      layer allocation") ensures that packet_snd always starts writing
      the link layer header in reserved headroom allocated for this
      purpose.
      
      This is needed because packets may be shorter than hard_header_len,
      in which case the space up to hard_header_len may be zeroed. But
      that necessary padding is not accounted for in skb->len.
      
      The fix, however, is buggy. It calls skb_push, which grows skb->len
      when moving skb->data back. But in this case packet length should not
      change.
      
      Instead, call skb_reserve, which moves both skb->data and skb->tail
      back, without changing length.
      
      Fixes: b84bbaf7 ("packet: in packet_snd start writing at link layer allocation")
      Reported-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9aad13b0
  3. 24 May, 2018 5 commits
    • Dave Airlie's avatar
      Merge branch 'vmwgfx-fixes-4.17' of git://people.freedesktop.org/~thomash/linux into drm-fixes · 4bc6f777
      Dave Airlie authored
      Three fixes for vmwgfx. Two are cc'd stable and fix host logging and its
      error paths on 32-bit VMs. One is a fix for a hibernate flaw
      introduced with the 4.17 merge window.
      
      * 'vmwgfx-fixes-4.17' of git://people.freedesktop.org/~thomash/linux:
        drm/vmwgfx: Schedule an fb dirty update after resume
        drm/vmwgfx: Fix host logging / guestinfo reading error paths
        drm/vmwgfx: Fix 32-bit VMW_PORT_HB_[IN|OUT] macros
      4bc6f777
    • Linus Torvalds's avatar
      Merge branch 'stable/for-linus-4.17' of... · b5069438
      Linus Torvalds authored
      Merge branch 'stable/for-linus-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb
      
      Pull swiotlb fix from Konrad Rzeszutek Wilk:
       "One single fix in here: under Xen the DMA32 heap (in the hypervisor)
        would end up looking like swiss cheese.
      
        The reason being that for every coherent DMA allocation we didn't do
        the proper hypercall to tell Xen to return the page back to the DMA32
        heap. End result was (eventually) no DMA32 space if you (for example)
        continously unloaded and loaded modules"
      
      * 'stable/for-linus-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
        xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent
      b5069438
    • Yossi Kuperman's avatar
      net/mlx5: IPSec, Fix a race between concurrent sandbox QP commands · 1dcbc01f
      Yossi Kuperman authored
      Sandbox QP Commands are retired in the order they are sent. Outstanding
      commands are stored in a linked-list in the order they appear. Once a
      response is received and the callback gets called, we pull the first
      element off the pending list, assuming they correspond.
      
      Sending a message and adding it to the pending list is not done atomically,
      hence there is an opportunity for a race between concurrent requests.
      
      Bind both send and add under a critical section.
      
      Fixes: bebb23e6 ("net/mlx5: Accel, Add IPSec acceleration interface")
      Signed-off-by: default avatarYossi Kuperman <yossiku@mellanox.com>
      Signed-off-by: default avatarAdi Nissim <adin@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      1dcbc01f
    • Eran Ben Elisha's avatar
      net/mlx5e: When RXFCS is set, add FCS data into checksum calculation · 902a5459
      Eran Ben Elisha authored
      When RXFCS feature is enabled, the HW do not strip the FCS data,
      however it is not present in the checksum calculated by the HW.
      
      Fix that by manually calculating the FCS checksum and adding it to the SKB
      checksum field.
      
      Add helper function to find the FCS data for all SKB forms (linear,
      one fragment or more).
      
      Fixes: 102722fc ("net/mlx5e: Add support for RXFCS feature flag")
      Signed-off-by: default avatarEran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      902a5459
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · 34b48b87
      Linus Torvalds authored
      Pull rdma fixes from Jason Gunthorpe:
       "This is pretty much just the usual array of smallish driver bugs.
      
         - remove bouncing addresses from the MAINTAINERS file
      
         - kernel oops and bad error handling fixes for hfi, i40iw, cxgb4, and
           hns drivers
      
         - various small LOC behavioral/operational bugs in mlx5, hns, qedr
           and i40iw drivers
      
         - two fixes for patches already sent during the merge window
      
         - a long-standing bug related to not decreasing the pinned pages
           count in the right MM was found and fixed"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (28 commits)
        RDMA/hns: Move the location for initializing tmp_len
        RDMA/hns: Bugfix for cq record db for kernel
        IB/uverbs: Fix uverbs_attr_get_obj
        RDMA/qedr: Fix doorbell bar mapping for dpi > 1
        IB/umem: Use the correct mm during ib_umem_release
        iw_cxgb4: Fix an error handling path in 'c4iw_get_dma_mr()'
        RDMA/i40iw: Avoid panic when reading back the IRQ affinity hint
        RDMA/i40iw: Avoid reference leaks when processing the AEQ
        RDMA/i40iw: Avoid panic when objects are being created and destroyed
        RDMA/hns: Fix the bug with NULL pointer
        RDMA/hns: Set NULL for __internal_mr
        RDMA/hns: Enable inner_pa_vld filed of mpt
        RDMA/hns: Set desc_dma_addr for zero when free cmq desc
        RDMA/hns: Fix the bug with rq sge
        RDMA/hns: Not support qp transition from reset to reset for hip06
        RDMA/hns: Add return operation when configured global param fail
        RDMA/hns: Update convert function of endian format
        RDMA/hns: Load the RoCE dirver automatically
        RDMA/hns: Bugfix for rq record db for kernel
        RDMA/hns: Add rq inline flags judgement
        ...
      34b48b87