1. 22 Nov, 2019 3 commits
    • Andrey Ryabinin's avatar
      mm/ksm.c: don't WARN if page is still mapped in remove_stable_node() · 9a63236f
      Andrey Ryabinin authored
      It's possible to hit the WARN_ON_ONCE(page_mapped(page)) in
      remove_stable_node() when it races with __mmput() and squeezes in
      between ksm_exit() and exit_mmap().
      
        WARNING: CPU: 0 PID: 3295 at mm/ksm.c:888 remove_stable_node+0x10c/0x150
      
        Call Trace:
         remove_all_stable_nodes+0x12b/0x330
         run_store+0x4ef/0x7b0
         kernfs_fop_write+0x200/0x420
         vfs_write+0x154/0x450
         ksys_write+0xf9/0x1d0
         do_syscall_64+0x99/0x510
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Remove the warning as there is nothing scary going on.
      
      Link: http://lkml.kernel.org/r/20191119131850.5675-1-aryabinin@virtuozzo.com
      Fixes: cbf86cfe ("ksm: remove old stable nodes more thoroughly")
      Signed-off-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9a63236f
    • David Hildenbrand's avatar
      mm/memory_hotplug: don't access uninitialized memmaps in shrink_zone_span() · 7ce700bf
      David Hildenbrand authored
      Let's limit shrinking to !ZONE_DEVICE so we can fix the current code.
      We should never try to touch the memmap of offline sections where we
      could have uninitialized memmaps and could trigger BUGs when calling
      page_to_nid() on poisoned pages.
      
      There is no reliable way to distinguish an uninitialized memmap from an
      initialized memmap that belongs to ZONE_DEVICE, as we don't have
      anything like SECTION_IS_ONLINE we can use similar to
      pfn_to_online_section() for !ZONE_DEVICE memory.
      
      E.g., set_zone_contiguous() similarly relies on pfn_to_online_section()
      and will therefore never set a ZONE_DEVICE zone consecutive.  Stopping
      to shrink the ZONE_DEVICE therefore results in no observable changes,
      besides /proc/zoneinfo indicating different boundaries - something we
      can totally live with.
      
      Before commit d0dc12e8 ("mm/memory_hotplug: optimize memory
      hotplug"), the memmap was initialized with 0 and the node with the right
      value.  So the zone might be wrong but not garbage.  After that commit,
      both the zone and the node will be garbage when touching uninitialized
      memmaps.
      
      Toshiki reported a BUG (race between delayed initialization of
      ZONE_DEVICE memmaps without holding the memory hotplug lock and
      concurrent zone shrinking).
      
        https://lkml.org/lkml/2019/11/14/1040
      
      "Iteration of create and destroy namespace causes the panic as below:
      
            kernel BUG at mm/page_alloc.c:535!
            CPU: 7 PID: 2766 Comm: ndctl Not tainted 5.4.0-rc4 #6
            Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
            RIP: 0010:set_pfnblock_flags_mask+0x95/0xf0
            Call Trace:
             memmap_init_zone_device+0x165/0x17c
             memremap_pages+0x4c1/0x540
             devm_memremap_pages+0x1d/0x60
             pmem_attach_disk+0x16b/0x600 [nd_pmem]
             nvdimm_bus_probe+0x69/0x1c0
             really_probe+0x1c2/0x3e0
             driver_probe_device+0xb4/0x100
             device_driver_attach+0x4f/0x60
             bind_store+0xc9/0x110
             kernfs_fop_write+0x116/0x190
             vfs_write+0xa5/0x1a0
             ksys_write+0x59/0xd0
             do_syscall_64+0x5b/0x180
             entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        While creating a namespace and initializing memmap, if you destroy the
        namespace and shrink the zone, it will initialize the memmap outside
        the zone and trigger VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page),
        pfn), page) in set_pfnblock_flags_mask()."
      
      This BUG is also mitigated by this commit, where we for now stop to
      shrink the ZONE_DEVICE zone until we can do it in a safe and clean way.
      
      Link: http://lkml.kernel.org/r/20191006085646.5768-5-david@redhat.com
      Fixes: f1dd2cd1 ("mm, memory_hotplug: do not associate hotadded memory to zones until online")	[visible after d0dc12e8]
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reported-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Reported-by: default avatarToshiki Fukasawa <t-fukasawa@vx.jp.nec.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Damian Tometzki <damian.tometzki@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Halil Pasic <pasic@linux.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Jun Yao <yaojun8558363@gmail.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Pankaj Gupta <pagupta@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Steve Capper <steve.capper@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: <stable@vger.kernel.org>	[4.13+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7ce700bf
    • Joseph Qi's avatar
      Revert "fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry()" · 94b07b6f
      Joseph Qi authored
      This reverts commit 56e94ea1.
      
      Commit 56e94ea1 ("fs: ocfs2: fix possible null-pointer dereferences
      in ocfs2_xa_prepare_entry()") introduces a regression that fail to
      create directory with mount option user_xattr and acl.  Actually the
      reported NULL pointer dereference case can be correctly handled by
      loc->xl_ops->xlo_add_entry(), so revert it.
      
      Link: http://lkml.kernel.org/r/1573624916-83825-1-git-send-email-joseph.qi@linux.alibaba.com
      Fixes: 56e94ea1 ("fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry()")
      Signed-off-by: default avatarJoseph Qi <joseph.qi@linux.alibaba.com>
      Reported-by: default avatarThomas Voegtle <tv@lio96.de>
      Acked-by: default avatarChangwei Ge <gechangwei@live.cn>
      Cc: Jia-Ju Bai <baijiaju1990@gmail.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      94b07b6f
  2. 21 Nov, 2019 4 commits
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 81429eb8
      Linus Torvalds authored
      Pull arm64 fix from Will Deacon:
       "Ensure PAN is re-enabled following user fault in uaccess routines.
      
        After I thought we were done for 5.4, we had a report this week of a
        nasty issue that has been shown to leak data between different user
        address spaces thanks to corruption of entries in the TLB. In
        hindsight, we should have spotted this in review when the PAN code was
        merged back in v4.3, but hindsight is 20/20 and I'm trying not to beat
        myself up too much about it despite being fairly miserable.
      
        Anyway, the fix is "obvious" but the actual failure is more more
        subtle, and is described in the commit message. I've included a fairly
        mechanical follow-up patch here as well, which moves this checking out
        into the C wrappers which is what we do for {get,put}_user() already
        and allows us to remove these bloody assembly macros entirely. The
        patches have passed kernelci [1] [2] [3] and CKI [4] tests over night,
        as well as some targetted testing [5] for this particular issue.
      
        The first patch is tagged for stable and should be applied to 4.14,
        4.19 and 5.3. I have separate backports for 4.4 and 4.9, which I'll
        send out once this has landed in your tree (although the original
        patch applies cleanly, it won't build for those two trees).
      
        Thanks to Pavel Tatashin for reporting this and Mark Rutland for
        helping to diagnose the issue and review/test the solution"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: uaccess: Remove uaccess_*_not_uao asm macros
        arm64: uaccess: Ensure PAN is re-enabled after unhandled uaccess fault
      81429eb8
    • Linus Torvalds's avatar
      Merge tag 'for-linus-20191121' of git://git.kernel.dk/linux-block · be5fa3aa
      Linus Torvalds authored
      Pull block fix from Jens Axboe:
       "Just a single fix for an issue in nbd introduced in this cycle"
      
      * tag 'for-linus-20191121' of git://git.kernel.dk/linux-block:
        nbd:fix memory leak in nbd_get_socket()
      be5fa3aa
    • Linus Torvalds's avatar
      Merge tag 'gpio-v5.4-5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio · cec353f6
      Linus Torvalds authored
      Pull GPIO fixes from Linus Walleij:
       "A last set of small fixes for GPIO, this cycle was quite busy.
      
         - Fix debounce delays on the MAX77620 GPIO expander
      
         - Use the correct unit for debounce times on the BD70528 GPIO expander
      
         - Get proper deps for parallel builds of the GPIO tools
      
         - Add a specific ACPI quirk for the Terra Pad 1061"
      
      * tag 'gpio-v5.4-5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
        gpiolib: acpi: Add Terra Pad 1061 to the run_edge_events_on_boot_blacklist
        tools: gpio: Correctly add make dependencies for gpio_utils
        gpio: bd70528: Use correct unit for debounce times
        gpio: max77620: Fixup debounce delays
      cec353f6
    • Linus Torvalds's avatar
      Merge tag 'for-linus-2019-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux · d324810a
      Linus Torvalds authored
      Pull pidfd fixlet from Christian Brauner:
       "This contains a simple fix for the pidfd poll method. In the original
        patchset pidfd_poll() was made to return an unsigned int. However, the
        poll method is defined to return a __poll_t. While the unsigned int is
        not a huge deal it's just nicer to return a __poll_t.
      
        I've decided to send it right before the 5.4 release mainly so that
        stable doesn't need to backport it to both 5.4 and 5.3"
      
      * tag 'for-linus-2019-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
        fork: fix pidfd_poll()'s return type
      d324810a
  3. 20 Nov, 2019 3 commits
    • Pavel Tatashin's avatar
      arm64: uaccess: Remove uaccess_*_not_uao asm macros · e50be648
      Pavel Tatashin authored
      It is safer and simpler to drop the uaccess assembly macros in favour of
      inline C functions. Although this bloats the Image size slightly, it
      aligns our user copy routines with '{get,put}_user()' and generally
      makes the code a lot easier to reason about.
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: default avatarMark Rutland <mark.rutland@arm.com>
      Tested-by: default avatarMark Rutland <mark.rutland@arm.com>
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@soleen.com>
      [will: tweaked commit message and changed temporary variable names]
      Signed-off-by: default avatarWill Deacon <will@kernel.org>
      e50be648
    • Pavel Tatashin's avatar
      arm64: uaccess: Ensure PAN is re-enabled after unhandled uaccess fault · 94bb804e
      Pavel Tatashin authored
      A number of our uaccess routines ('__arch_clear_user()' and
      '__arch_copy_{in,from,to}_user()') fail to re-enable PAN if they
      encounter an unhandled fault whilst accessing userspace.
      
      For CPUs implementing both hardware PAN and UAO, this bug has no effect
      when both extensions are in use by the kernel.
      
      For CPUs implementing hardware PAN but not UAO, this means that a kernel
      using hardware PAN may execute portions of code with PAN inadvertently
      disabled, opening us up to potential security vulnerabilities that rely
      on userspace access from within the kernel which would usually be
      prevented by this mechanism. In other words, parts of the kernel run the
      same way as they would on a CPU without PAN implemented/emulated at all.
      
      For CPUs not implementing hardware PAN and instead relying on software
      emulation via 'CONFIG_ARM64_SW_TTBR0_PAN=y', the impact is unfortunately
      much worse. Calling 'schedule()' with software PAN disabled means that
      the next task will execute in the kernel using the page-table and ASID
      of the previous process even after 'switch_mm()', since the actual
      hardware switch is deferred until return to userspace. At this point, or
      if there is a intermediate call to 'uaccess_enable()', the page-table
      and ASID of the new process are installed. Sadly, due to the changes
      introduced by KPTI, this is not an atomic operation and there is a very
      small window (two instructions) where the CPU is configured with the
      page-table of the old task and the ASID of the new task; a speculative
      access in this state is disastrous because it would corrupt the TLB
      entries for the new task with mappings from the previous address space.
      
      As Pavel explains:
      
        | I was able to reproduce memory corruption problem on Broadcom's SoC
        | ARMv8-A like this:
        |
        | Enable software perf-events with PERF_SAMPLE_CALLCHAIN so userland's
        | stack is accessed and copied.
        |
        | The test program performed the following on every CPU and forking
        | many processes:
        |
        |	unsigned long *map = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE,
        |				  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        |	map[0] = getpid();
        |	sched_yield();
        |	if (map[0] != getpid()) {
        |		fprintf(stderr, "Corruption detected!");
        |	}
        |	munmap(map, PAGE_SIZE);
        |
        | From time to time I was getting map[0] to contain pid for a
        | different process.
      
      Ensure that PAN is re-enabled when returning after an unhandled user
      fault from our uaccess routines.
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: default avatarMark Rutland <mark.rutland@arm.com>
      Tested-by: default avatarMark Rutland <mark.rutland@arm.com>
      Cc: <stable@vger.kernel.org>
      Fixes: 338d4f49 ("arm64: kernel: Add support for Privileged Access Never")
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@soleen.com>
      [will: rewrote commit message]
      Signed-off-by: default avatarWill Deacon <will@kernel.org>
      94bb804e
    • Luc Van Oostenryck's avatar
      fork: fix pidfd_poll()'s return type · 9e77716a
      Luc Van Oostenryck authored
      pidfd_poll() is defined as returning 'unsigned int' but the
      .poll method is declared as returning '__poll_t', a bitwise type.
      
      Fix this by using the proper return type and using the EPOLL
      constants instead of the POLL ones, as required for __poll_t.
      
      Fixes: b53b0b9d ("pidfd: add polling support")
      Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
      Cc: stable@vger.kernel.org # 5.3
      Signed-off-by: default avatarLuc Van Oostenryck <luc.vanoostenryck@gmail.com>
      Reviewed-by: default avatarChristian Brauner <christian.brauner@ubuntu.com>
      Link: https://lore.kernel.org/r/20191120003320.31138-1-luc.vanoostenryck@gmail.comSigned-off-by: default avatarChristian Brauner <christian.brauner@ubuntu.com>
      9e77716a
  4. 19 Nov, 2019 3 commits
  5. 17 Nov, 2019 8 commits
  6. 16 Nov, 2019 19 commits
    • Linus Torvalds's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5ffaf037
      Linus Torvalds authored
      Pull perf fixes from Ingo Molnar:
       "Misc fixes: a handful of AUX event handling related fixes, a Sparse
        fix and two ABI fixes"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/core: Fix missing static inline on perf_cgroup_switch()
        perf/core: Consistently fail fork on allocation failures
        perf/aux: Disallow aux_output for kernel events
        perf/core: Reattach a misplaced comment
        perf/aux: Fix the aux_output group inheritance fix
        perf/core: Disallow uncore-cgroup events
      5ffaf037
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 8be636dd
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix memory leak in xfrm_state code, from Steffen Klassert.
      
       2) Fix races between devlink reload operations and device
          setup/cleanup, from Jiri Pirko.
      
       3) Null deref in NFC code, from Stephan Gerhold.
      
       4) Refcount fixes in SMC, from Ursula Braun.
      
       5) Memory leak in slcan open error paths, from Jouni Hogander.
      
       6) Fix ETS bandwidth validation in hns3, from Yonglong Liu.
      
       7) Info leak on short USB request answers in ax88172a driver, from
          Oliver Neukum.
      
       8) Release mem region properly in ep93xx_eth, from Chuhong Yuan.
      
       9) PTP config timestamp flags validation, from Richard Cochran.
      
      10) Dangling pointers after SKB data realloc in seg6, from Andrea Mayer.
      
      11) Missing free_netdev() in gemini driver, from Chuhong Yuan.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (56 commits)
        ipmr: Fix skb headroom in ipmr_get_route().
        net: hns3: cleanup of stray struct hns3_link_mode_mapping
        net/smc: fix fastopen for non-blocking connect()
        rds: ib: update WR sizes when bringing up connection
        net: gemini: add missed free_netdev
        net: dsa: tag_8021q: Fix dsa_8021q_restore_pvid for an absent pvid
        seg6: fix skb transport_header after decap_and_validate()
        seg6: fix srh pointer in get_srh()
        net: stmmac: Use the correct style for SPDX License Identifier
        octeontx2-af: Use the correct style for SPDX License Identifier
        ptp: Extend the test program to check the external time stamp flags.
        mlx5: Reject requests to enable time stamping on both edges.
        igb: Reject requests that fail to enable time stamping on both edges.
        dp83640: Reject requests to enable time stamping on both edges.
        mv88e6xxx: Reject requests to enable time stamping on both edges.
        ptp: Introduce strict checking of external time stamp options.
        renesas: reject unsupported external timestamp flags
        mlx5: reject unsupported external timestamp flags
        igb: reject unsupported external timestamp flags
        dp83640: reject unsupported external timestamp flags
        ...
      8be636dd
    • Guillaume Nault's avatar
      ipmr: Fix skb headroom in ipmr_get_route(). · 7901cd97
      Guillaume Nault authored
      In route.c, inet_rtm_getroute_build_skb() creates an skb with no
      headroom. This skb is then used by inet_rtm_getroute() which may pass
      it to rt_fill_info() and, from there, to ipmr_get_route(). The later
      might try to reuse this skb by cloning it and prepending an IPv4
      header. But since the original skb has no headroom, skb_push() triggers
      skb_under_panic():
      
      skbuff: skb_under_panic: text:00000000ca46ad8a len:80 put:20 head:00000000cd28494e data:000000009366fd6b tail:0x3c end:0xec0 dev:veth0
      ------------[ cut here ]------------
      kernel BUG at net/core/skbuff.c:108!
      invalid opcode: 0000 [#1] SMP KASAN PTI
      CPU: 6 PID: 587 Comm: ip Not tainted 5.4.0-rc6+ #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
      RIP: 0010:skb_panic+0xbf/0xd0
      Code: 41 a2 ff 8b 4b 70 4c 8b 4d d0 48 c7 c7 20 76 f5 8b 44 8b 45 bc 48 8b 55 c0 48 8b 75 c8 41 54 41 57 41 56 41 55 e8 75 dc 7a ff <0f> 0b 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
      RSP: 0018:ffff888059ddf0b0 EFLAGS: 00010286
      RAX: 0000000000000086 RBX: ffff888060a315c0 RCX: ffffffff8abe4822
      RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88806c9a79cc
      RBP: ffff888059ddf118 R08: ffffed100d9361b1 R09: ffffed100d9361b0
      R10: ffff88805c68aee3 R11: ffffed100d9361b1 R12: ffff88805d218000
      R13: ffff88805c689fec R14: 000000000000003c R15: 0000000000000ec0
      FS:  00007f6af184b700(0000) GS:ffff88806c980000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ffc8204a000 CR3: 0000000057b40006 CR4: 0000000000360ee0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       skb_push+0x7e/0x80
       ipmr_get_route+0x459/0x6fa
       rt_fill_info+0x692/0x9f0
       inet_rtm_getroute+0xd26/0xf20
       rtnetlink_rcv_msg+0x45d/0x630
       netlink_rcv_skb+0x1a5/0x220
       rtnetlink_rcv+0x15/0x20
       netlink_unicast+0x305/0x3a0
       netlink_sendmsg+0x575/0x730
       sock_sendmsg+0xb5/0xc0
       ___sys_sendmsg+0x497/0x4f0
       __sys_sendmsg+0xcb/0x150
       __x64_sys_sendmsg+0x48/0x50
       do_syscall_64+0xd2/0xac0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Actually the original skb used to have enough headroom, but the
      reserve_skb() call was lost with the introduction of
      inet_rtm_getroute_build_skb() by commit 404eb77e ("ipv4: support
      sport, dport and ip_proto in RTM_GETROUTE").
      
      We could reserve some headroom again in inet_rtm_getroute_build_skb(),
      but this function shouldn't be responsible for handling the special
      case of ipmr_get_route(). Let's handle that directly in
      ipmr_get_route() by calling skb_realloc_headroom() instead of
      skb_clone().
      
      Fixes: 404eb77e ("ipv4: support sport, dport and ip_proto in RTM_GETROUTE")
      Signed-off-by: default avatarGuillaume Nault <gnault@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7901cd97
    • Salil Mehta's avatar
      net: hns3: cleanup of stray struct hns3_link_mode_mapping · b696083d
      Salil Mehta authored
      This patch cleans-up the stray left over code. It has no
      functionality impact.
      Signed-off-by: default avatarSalil Mehta <salil.mehta@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b696083d
    • Ursula Braun's avatar
      net/smc: fix fastopen for non-blocking connect() · 8204df72
      Ursula Braun authored
      FASTOPEN does not work with SMC-sockets. Since SMC allows fallback to
      TCP native during connection start, the FASTOPEN setsockopts trigger
      this fallback, if the SMC-socket is still in state SMC_INIT.
      But if a FASTOPEN setsockopt is called after a non-blocking connect(),
      this is broken, and fallback does not make sense.
      This change complements
      commit cd206360 ("net/smc: avoid fallback in case of non-blocking connect")
      and fixes the syzbot reported problem "WARNING in smc_unhash_sk".
      
      Reported-by: syzbot+8488cc4cf1c9e09b8b86@syzkaller.appspotmail.com
      Fixes: e1bbdd57 ("net/smc: reduce sock_put() for fallback sockets")
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8204df72
    • Dag Moxnes's avatar
      rds: ib: update WR sizes when bringing up connection · a36e629e
      Dag Moxnes authored
      Currently WR sizes are updated from rds_ib_sysctl_max_send_wr and
      rds_ib_sysctl_max_recv_wr when a connection is shut down. As a result,
      a connection being down while rds_ib_sysctl_max_send_wr or
      rds_ib_sysctl_max_recv_wr are updated, will not update the sizes when
      it comes back up.
      
      Move resizing of WRs to rds_ib_setup_qp so that connections will be setup
      with the most current WR sizes.
      Signed-off-by: default avatarDag Moxnes <dag.moxnes@oracle.com>
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a36e629e
    • Chuhong Yuan's avatar
      net: gemini: add missed free_netdev · 18d647ae
      Chuhong Yuan authored
      This driver forgets to free allocated netdev in remove like
      what is done in probe failure.
      Add the free to fix it.
      Signed-off-by: default avatarChuhong Yuan <hslester96@gmail.com>
      Reviewed-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      18d647ae
    • Vladimir Oltean's avatar
      net: dsa: tag_8021q: Fix dsa_8021q_restore_pvid for an absent pvid · c80ed84e
      Vladimir Oltean authored
      This sequence of operations:
      ip link set dev br0 type bridge vlan_filtering 1
      bridge vlan del dev swp2 vid 1
      ip link set dev br0 type bridge vlan_filtering 1
      ip link set dev br0 type bridge vlan_filtering 0
      
      apparently fails with the message:
      
      [   31.305716] sja1105 spi0.1: Reset switch and programmed static config. Reason: VLAN filtering
      [   31.322161] sja1105 spi0.1: Couldn't determine PVID attributes (pvid 0)
      [   31.328939] sja1105 spi0.1: Failed to setup VLAN tagging for port 1: -2
      [   31.335599] ------------[ cut here ]------------
      [   31.340215] WARNING: CPU: 1 PID: 194 at net/switchdev/switchdev.c:157 switchdev_port_attr_set_now+0x9c/0xa4
      [   31.349981] br0: Commit of attribute (id=6) failed.
      [   31.354890] Modules linked in:
      [   31.357942] CPU: 1 PID: 194 Comm: ip Not tainted 5.4.0-rc6-01792-gf4f632e07665-dirty #2062
      [   31.366167] Hardware name: Freescale LS1021A
      [   31.370437] [<c03144dc>] (unwind_backtrace) from [<c030e184>] (show_stack+0x10/0x14)
      [   31.378153] [<c030e184>] (show_stack) from [<c11d1c1c>] (dump_stack+0xe0/0x10c)
      [   31.385437] [<c11d1c1c>] (dump_stack) from [<c034c730>] (__warn+0xf4/0x10c)
      [   31.392373] [<c034c730>] (__warn) from [<c034c7bc>] (warn_slowpath_fmt+0x74/0xb8)
      [   31.399827] [<c034c7bc>] (warn_slowpath_fmt) from [<c11ca204>] (switchdev_port_attr_set_now+0x9c/0xa4)
      [   31.409097] [<c11ca204>] (switchdev_port_attr_set_now) from [<c117036c>] (__br_vlan_filter_toggle+0x6c/0x118)
      [   31.418971] [<c117036c>] (__br_vlan_filter_toggle) from [<c115d010>] (br_changelink+0xf8/0x518)
      [   31.427637] [<c115d010>] (br_changelink) from [<c0f8e9ec>] (__rtnl_newlink+0x3f4/0x76c)
      [   31.435613] [<c0f8e9ec>] (__rtnl_newlink) from [<c0f8eda8>] (rtnl_newlink+0x44/0x60)
      [   31.443329] [<c0f8eda8>] (rtnl_newlink) from [<c0f89f20>] (rtnetlink_rcv_msg+0x2cc/0x51c)
      [   31.451477] [<c0f89f20>] (rtnetlink_rcv_msg) from [<c1008df8>] (netlink_rcv_skb+0xb8/0x110)
      [   31.459796] [<c1008df8>] (netlink_rcv_skb) from [<c1008648>] (netlink_unicast+0x17c/0x1f8)
      [   31.468026] [<c1008648>] (netlink_unicast) from [<c1008980>] (netlink_sendmsg+0x2bc/0x3b4)
      [   31.476261] [<c1008980>] (netlink_sendmsg) from [<c0f43858>] (___sys_sendmsg+0x230/0x250)
      [   31.484408] [<c0f43858>] (___sys_sendmsg) from [<c0f44c84>] (__sys_sendmsg+0x50/0x8c)
      [   31.492209] [<c0f44c84>] (__sys_sendmsg) from [<c0301000>] (ret_fast_syscall+0x0/0x28)
      [   31.500090] Exception stack(0xedf47fa8 to 0xedf47ff0)
      [   31.505122] 7fa0:                   00000002 b6f2e060 00000003 beabd6a4 00000000 00000000
      [   31.513265] 7fc0: 00000002 b6f2e060 5d6e3213 00000128 00000000 00000001 00000006 000619c4
      [   31.521405] 7fe0: 00086078 beabd658 0005edbc b6e7ce68
      
      The reason is the implementation of br_get_pvid:
      
      static inline u16 br_get_pvid(const struct net_bridge_vlan_group *vg)
      {
      	if (!vg)
      		return 0;
      
      	smp_rmb();
      	return vg->pvid;
      }
      
      Since VID 0 is an invalid pvid from the bridge's point of view, let's
      add this check in dsa_8021q_restore_pvid to avoid restoring a pvid that
      doesn't really exist.
      
      Fixes: 5f33183b ("net: dsa: tag_8021q: Restore bridge VLANs when enabling vlan_filtering")
      Signed-off-by: default avatarVladimir Oltean <olteanv@gmail.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c80ed84e
    • David S. Miller's avatar
      Merge branch 'seg6-fixes-to-Segment-Routing-in-IPv6' · e84fa0ae
      David S. Miller authored
      Andrea Mayer says:
      
      ====================
      seg6: fixes to Segment Routing in IPv6
      
      This patchset is divided in 2 patches and it introduces some fixes
      to Segment Routing in IPv6, which are:
      
      - in function get_srh() fix the srh pointer after calling
        pskb_may_pull();
      
      - fix the skb->transport_header after calling decap_and_validate()
        function;
      
      Any comments on the patchset are welcome.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e84fa0ae
    • Andrea Mayer's avatar
      seg6: fix skb transport_header after decap_and_validate() · c71644d0
      Andrea Mayer authored
      in the receive path (more precisely in ip6_rcv_core()) the
      skb->transport_header is set to skb->network_header + sizeof(*hdr). As a
      consequence, after routing operations, destination input expects to find
      skb->transport_header correctly set to the next protocol (or extension
      header) that follows the network protocol. However, decap behaviors (DX*,
      DT*) remove the outer IPv6 and SRH extension and do not set again the
      skb->transport_header pointer correctly. For this reason, the patch sets
      the skb->transport_header to the skb->network_header + sizeof(hdr) in each
      DX* and DT* behavior.
      Signed-off-by: default avatarAndrea Mayer <andrea.mayer@uniroma2.it>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c71644d0
    • Andrea Mayer's avatar
      seg6: fix srh pointer in get_srh() · 7f91ed8c
      Andrea Mayer authored
      pskb_may_pull may change pointers in header. For this reason, it is
      mandatory to reload any pointer that points into skb header.
      Signed-off-by: default avatarAndrea Mayer <andrea.mayer@uniroma2.it>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7f91ed8c
    • Nishad Kamdar's avatar
      net: stmmac: Use the correct style for SPDX License Identifier · acb9bdc1
      Nishad Kamdar authored
      This patch corrects the SPDX License Identifier style in
      header files related to STMicroelectronics based Multi-Gigabit
      Ethernet driver. For C header files Documentation/process/license-rules.rst
      mandates C-like comments (opposed to C source files where
      C++ style should be used).
      
      Changes made by using a script provided by Joe Perches here:
      https://lkml.org/lkml/2019/2/7/46.
      Suggested-by: default avatarJoe Perches <joe@perches.com>
      Signed-off-by: default avatarNishad Kamdar <nishadkamdar@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      acb9bdc1
    • Nishad Kamdar's avatar
      octeontx2-af: Use the correct style for SPDX License Identifier · 26b3f3cc
      Nishad Kamdar authored
      This patch corrects the SPDX License Identifier style in
      header files related to Marvell OcteonTX2 network devices.
      It uses an expilict block comment for the SPDX License
      Identifier.
      
      Changes made by using a script provided by Joe Perches here:
      https://lkml.org/lkml/2019/2/7/46.
      Suggested-by: default avatarJoe Perches <joe@perches.com>
      Signed-off-by: default avatarNishad Kamdar <nishadkamdar@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      26b3f3cc
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · bec8b6e9
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "11 fixes"
      
      MM fixes and one xz decompressor fix.
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm/debug.c: PageAnon() is true for PageKsm() pages
        mm/debug.c: __dump_page() prints an extra line
        mm/page_io.c: do not free shared swap slots
        mm/memory_hotplug: fix try_offline_node()
        mm,thp: recheck each page before collapsing file THP
        mm: slub: really fix slab walking for init_on_free
        mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup()
        mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm()
        lib/xz: fix XZ_DYNALLOC to avoid useless memory reallocations
        mm: fix trying to reclaim unevictable lru page when calling madvise_pageout
        mm: mempolicy: fix the wrong return value and potential pages leak of mbind
      bec8b6e9
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 6c9594bd
      Linus Torvalds authored
      Pull more input fixes from Dmitry Torokhov:
       "A couple of fixes in driver teardown paths and another ID for
        Synaptics RMI mode"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: synaptics - enable RMI mode for X1 Extreme 2nd Generation
        Input: synaptics-rmi4 - destroy F54 poller workqueue when removing
        Input: ff-memless - kill timer in destroy()
      6c9594bd
    • Ralph Campbell's avatar
      mm/debug.c: PageAnon() is true for PageKsm() pages · 6855ac4a
      Ralph Campbell authored
      PageAnon() and PageKsm() use the low two bits of the page->mapping
      pointer to indicate the page type.  PageAnon() only checks the LSB while
      PageKsm() checks the least significant 2 bits are equal to 3.
      
      Therefore, PageAnon() is true for KSM pages.  __dump_page() incorrectly
      will never print "ksm" because it checks PageAnon() first.  Fix this by
      checking PageKsm() first.
      
      Link: http://lkml.kernel.org/r/20191113000651.20677-1-rcampbell@nvidia.com
      Fixes: 1c6fb1d8 ("mm: print more information about mapping in __dump_page")
      Signed-off-by: default avatarRalph Campbell <rcampbell@nvidia.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6855ac4a
    • Ralph Campbell's avatar
      mm/debug.c: __dump_page() prints an extra line · 76a1850e
      Ralph Campbell authored
      When dumping struct page information, __dump_page() prints the page type
      with a trailing blank followed by the page flags on a separate line:
      
        anon
        flags: 0x100000000090034(uptodate|lru|active|head|swapbacked)
      
      It looks like the intent was to use pr_cont() for printing "flags:" but
      pr_cont() usage is discouraged so fix this by extending the format to
      include the flags into a single line:
      
        anon flags: 0x100000000090034(uptodate|lru|active|head|swapbacked)
      
      If the page is file backed, the name might be long so use two lines:
      
        shmem_aops name:"dev/zero"
        flags: 0x10000000008000c(uptodate|dirty|swapbacked)
      
      Eliminate pr_conf() usage as well for appending compound_mapcount.
      
      Link: http://lkml.kernel.org/r/20191112012608.16926-1-rcampbell@nvidia.comSigned-off-by: default avatarRalph Campbell <rcampbell@nvidia.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      76a1850e
    • Vinayak Menon's avatar
      mm/page_io.c: do not free shared swap slots · 5df373e9
      Vinayak Menon authored
      The following race is observed due to which a processes faulting on a
      swap entry, finds the page neither in swapcache nor swap.  This causes
      zram to give a zero filled page that gets mapped to the process,
      resulting in a user space crash later.
      
      Consider parent and child processes Pa and Pb sharing the same swap slot
      with swap_count 2.  Swap is on zram with SWP_SYNCHRONOUS_IO set.
      Virtual address 'VA' of Pa and Pb points to the shared swap entry.
      
      Pa                                       Pb
      
      fault on VA                              fault on VA
      do_swap_page                             do_swap_page
      lookup_swap_cache fails                  lookup_swap_cache fails
                                               Pb scheduled out
      swapin_readahead (deletes zram entry)
      swap_free (makes swap_count 1)
                                               Pb scheduled in
                                               swap_readpage (swap_count == 1)
                                               Takes SWP_SYNCHRONOUS_IO path
                                               zram enrty absent
                                               zram gives a zero filled page
      
      Fix this by making sure that swap slot is freed only when swap count
      drops down to one.
      
      Link: http://lkml.kernel.org/r/1571743294-14285-1-git-send-email-vinmenon@codeaurora.org
      Fixes: aa8d22a1 ("mm: swap: SWP_SYNCHRONOUS_IO: skip swapcache only if swapped page has no other reference")
      Signed-off-by: default avatarVinayak Menon <vinmenon@codeaurora.org>
      Suggested-by: default avatarMinchan Kim <minchan@google.com>
      Acked-by: default avatarMinchan Kim <minchan@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5df373e9
    • David Hildenbrand's avatar
      mm/memory_hotplug: fix try_offline_node() · 2c91f8fc
      David Hildenbrand authored
      try_offline_node() is pretty much broken right now:
      
       - The node span is updated when onlining memory, not when adding it. We
         ignore memory that was mever onlined. Bad.
      
       - We touch possible garbage memmaps. The pfn_to_nid(pfn) can easily
         trigger a kernel panic. Bad for memory that is offline but also bad
         for subsection hotadd with ZONE_DEVICE, whereby the memmap of the
         first PFN of a section might contain garbage.
      
       - Sections belonging to mixed nodes are not properly considered.
      
      As memory blocks might belong to multiple nodes, we would have to walk
      all pageblocks (or at least subsections) within present sections.
      However, we don't have a way to identify whether a memmap that is not
      online was initialized (relevant for ZONE_DEVICE).  This makes things
      more complicated.
      
      Luckily, we can piggy pack on the node span and the nid stored in memory
      blocks.  Currently, the node span is grown when calling
      move_pfn_range_to_zone() - e.g., when onlining memory, and shrunk when
      removing memory, before calling try_offline_node().  Sysfs links are
      created via link_mem_sections(), e.g., during boot or when adding
      memory.
      
      If the node still spans memory or if any memory block belongs to the
      nid, we don't set the node offline.  As memory blocks that span multiple
      nodes cannot get offlined, the nid stored in memory blocks is reliable
      enough (for such online memory blocks, the node still spans the memory).
      
      Introduce for_each_memory_block() to efficiently walk all memory blocks.
      
      Note: We will soon stop shrinking the ZONE_DEVICE zone and the node span
      when removing ZONE_DEVICE memory to fix similar issues (access of
      garbage memmaps) - until we have a reliable way to identify whether
      these memmaps were properly initialized.  This implies later, that once
      a node had ZONE_DEVICE memory, we won't be able to set a node offline -
      which should be acceptable.
      
      Since commit f1dd2cd1 ("mm, memory_hotplug: do not associate
      hotadded memory to zones until online") memory that is added is not
      assoziated with a zone/node (memmap not initialized).  The introducing
      commit 60a5a19e ("memory-hotplug: remove sysfs file of node")
      already missed that we could have multiple nodes for a section and that
      the zone/node span is updated when onlining pages, not when adding them.
      
      I tested this by hotplugging two DIMMs to a memory-less and cpu-less
      NUMA node.  The node is properly onlined when adding the DIMMs.  When
      removing the DIMMs, the node is properly offlined.
      
      Masayoshi Mizuma reported:
      
      : Without this patch, memory hotplug fails as panic:
      :
      :  BUG: kernel NULL pointer dereference, address: 0000000000000000
      :  ...
      :  Call Trace:
      :   remove_memory_block_devices+0x81/0xc0
      :   try_remove_memory+0xb4/0x130
      :   __remove_memory+0xa/0x20
      :   acpi_memory_device_remove+0x84/0x100
      :   acpi_bus_trim+0x57/0x90
      :   acpi_bus_trim+0x2e/0x90
      :   acpi_device_hotplug+0x2b2/0x4d0
      :   acpi_hotplug_work_fn+0x1a/0x30
      :   process_one_work+0x171/0x380
      :   worker_thread+0x49/0x3f0
      :   kthread+0xf8/0x130
      :   ret_from_fork+0x35/0x40
      
      [david@redhat.com: v3]
        Link: http://lkml.kernel.org/r/20191102120221.7553-1-david@redhat.com
      Link: http://lkml.kernel.org/r/20191028105458.28320-1-david@redhat.com
      Fixes: 60a5a19e ("memory-hotplug: remove sysfs file of node")
      Fixes: f1dd2cd1 ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visiable after d0dc12e8Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Tested-by: default avatarMasayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
      Cc: Jani Nikula <jani.nikula@intel.com>
      Cc: Nayna Jain <nayna@linux.ibm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2c91f8fc