1. 10 Dec, 2016 40 commits
    • Eli Cooper's avatar
      ipv6: Set skb->protocol properly for local output · e67bd82f
      Eli Cooper authored
      commit b4e479a9 upstream.
      
      When xfrm is applied to TSO/GSO packets, it follows this path:
      
          xfrm_output() -> xfrm_output_gso() -> skb_gso_segment()
      
      where skb_gso_segment() relies on skb->protocol to function properly.
      
      This patch sets skb->protocol to ETH_P_IPV6 before dst_output() is called,
      fixing a bug where GSO packets sent through an ipip6 tunnel are dropped
      when xfrm is involved.
      Signed-off-by: default avatarEli Cooper <elicooper@gmx.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e67bd82f
    • Linus Torvalds's avatar
      Don't feed anything but regular iovec's to blk_rq_map_user_iov · 22d94c32
      Linus Torvalds authored
      commit a0ac402c upstream.
      
      In theory we could map other things, but there's a reason that function
      is called "user_iov".  Using anything else (like splice can do) just
      confuses it.
      Reported-and-tested-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      22d94c32
    • Al Viro's avatar
      constify iov_iter_count() and iter_is_iovec() · e4a6c61c
      Al Viro authored
      commit b57332b4 upstream.
      
      [stable note, need this to prevent build warning in commit
      a0ac402c]
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e4a6c61c
    • Andreas Larsson's avatar
    • Thomas Tai's avatar
      sparc64: fix compile warning section mismatch in find_node() · 360e257f
      Thomas Tai authored
      [ Upstream commit 87a349f9 ]
      
      A compile warning is introduced by a commit to fix the find_node().
      This patch fix the compile warning by moving find_node() into __init
      section. Because find_node() is only used by memblock_nid_range() which
      is only used by a __init add_node_ranges(). find_node() and
      memblock_nid_range() should also be inside __init section.
      Signed-off-by: default avatarThomas Tai <thomas.tai@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      360e257f
    • Thomas Tai's avatar
      sparc64: Fix find_node warning if numa node cannot be found · 2f02dcb6
      Thomas Tai authored
      [ Upstream commit 74a5ed5c ]
      
      When booting up LDOM, find_node() warns that a physical address
      doesn't match a NUMA node.
      
      WARNING: CPU: 0 PID: 0 at arch/sparc/mm/init_64.c:835
      find_node+0xf4/0x120 find_node: A physical address doesn't
      match a NUMA node rule. Some physical memory will be
      owned by node 0.Modules linked in:
      
      CPU: 0 PID: 0 Comm: swapper Not tainted 4.9.0-rc3 #4
      Call Trace:
       [0000000000468ba0] __warn+0xc0/0xe0
       [0000000000468c74] warn_slowpath_fmt+0x34/0x60
       [00000000004592f4] find_node+0xf4/0x120
       [0000000000dd0774] add_node_ranges+0x38/0xe4
       [0000000000dd0b1c] numa_parse_mdesc+0x268/0x2e4
       [0000000000dd0e9c] bootmem_init+0xb8/0x160
       [0000000000dd174c] paging_init+0x808/0x8fc
       [0000000000dcb0d0] setup_arch+0x2c8/0x2f0
       [0000000000dc68a0] start_kernel+0x48/0x424
       [0000000000dcb374] start_early_boot+0x27c/0x28c
       [0000000000a32c08] tlb_fixup_done+0x4c/0x64
       [0000000000027f08] 0x27f08
      
      It is because linux use an internal structure node_masks[] to
      keep the best memory latency node only. However, LDOM mdesc can
      contain single latency-group with multiple memory latency nodes.
      
      If the address doesn't match the best latency node within
      node_masks[], it should check for an alternative via mdesc.
      The warning message should only be printed if the address
      doesn't match any node_masks[] nor within mdesc. To minimize
      the impact of searching mdesc every time, the last matched
      mask and index is stored in a variable.
      Signed-off-by: default avatarThomas Tai <thomas.tai@oracle.com>
      Reviewed-by: default avatarChris Hyser <chris.hyser@oracle.com>
      Reviewed-by: default avatarLiam Merwick <liam.merwick@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2f02dcb6
    • Alexander Duyck's avatar
      ipv4: Drop suffix update from resize code · ad02ec7d
      Alexander Duyck authored
      [ Upstream commit a52ca62c ]
      
      It has been reported that update_suffix can be expensive when it is called
      on a large node in which most of the suffix lengths are the same.  The time
      required to add 200K entries had increased from around 3 seconds to almost
      49 seconds.
      
      In order to address this we need to move the code for updating the suffix
      out of resize and instead just have it handled in the cases where we are
      pushing a node that increases the suffix length, or will decrease the
      suffix length.
      
      Fixes: 5405afd1 ("fib_trie: Add tracking value for suffix length")
      Reported-by: default avatarRobert Shearman <rshearma@brocade.com>
      Signed-off-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      Reviewed-by: default avatarRobert Shearman <rshearma@brocade.com>
      Tested-by: default avatarRobert Shearman <rshearma@brocade.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ad02ec7d
    • Alexander Duyck's avatar
      ipv4: Drop leaf from suffix pull/push functions · 0b1c601d
      Alexander Duyck authored
      [ Upstream commit 1a239173 ]
      
      It wasn't necessary to pass a leaf in when doing the suffix updates so just
      drop it.  Instead just pass the suffix and work with that.
      
      Since we dropped the leaf there is no need to include that in the name so
      the names are updated to node_push_suffix and node_pull_suffix.
      
      Finally I noticed that the logic for pulling the suffix length back
      actually had some issues.  Specifically it would stop prematurely if there
      was a longer suffix, but it was not as long as the original suffix.  I
      updated the code to address that in node_pull_suffix.
      
      Fixes: 5405afd1 ("fib_trie: Add tracking value for suffix length")
      Suggested-by: default avatarRobert Shearman <rshearma@brocade.com>
      Signed-off-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      Reviewed-by: default avatarRobert Shearman <rshearma@brocade.com>
      Tested-by: default avatarRobert Shearman <rshearma@brocade.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0b1c601d
    • Alexander Duyck's avatar
      ipv4: Fix memory leak in exception case for splitting tries · cd8a6c0e
      Alexander Duyck authored
      [ Upstream commit 3114cdfe ]
      
      Fix a small memory leak that can occur where we leak a fib_alias in the
      event of us not being able to insert it into the local table.
      
      Fixes: 0ddcf43d ("ipv4: FIB Local/MAIN table collapse")
      Reported-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cd8a6c0e
    • Alexander Duyck's avatar
      ipv4: Restore fib_trie_flush_external function and fix call ordering · a8780378
      Alexander Duyck authored
      [ Upstream commit 3b709334, the FIB offload
        removal didn't occur in 4.8 so that part of this patch isn't here.  However
        we still need to fib_unmerge() bits. ]
      
      The patch that removed the FIB offload infrastructure was a bit too
      aggressive and also removed code needed to clean up us splitting the table
      if additional rules were added.  Specifically the function
      fib_trie_flush_external was called at the end of a new rule being added to
      flush the foreign trie entries from the main trie.
      
      I updated the code so that we only call fib_trie_flush_external on the main
      table so that we flush the entries for local from main.  This way we don't
      call it for every rule change which is what was happening previously.
      
      Fixes: 347e3b28 ("switchdev: remove FIB offload infrastructure")
      Reported-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a8780378
    • Kees Cook's avatar
      net: ping: check minimum size on ICMP header length · 5ff5e5c0
      Kees Cook authored
      [ Upstream commit 0eab121e ]
      
      Prior to commit c0371da6 ("put iov_iter into msghdr") in v3.19, there
      was no check that the iovec contained enough bytes for an ICMP header,
      and the read loop would walk across neighboring stack contents. Since the
      iov_iter conversion, bad arguments are noticed, but the returned error is
      EFAULT. Returning EINVAL is a clearer error and also solves the problem
      prior to v3.19.
      
      This was found using trinity with KASAN on v3.18:
      
      BUG: KASAN: stack-out-of-bounds in memcpy_fromiovec+0x60/0x114 at addr ffffffc071077da0
      Read of size 8 by task trinity-c2/9623
      page:ffffffbe034b9a08 count:0 mapcount:0 mapping:          (null) index:0x0
      flags: 0x0()
      page dumped because: kasan: bad access detected
      CPU: 0 PID: 9623 Comm: trinity-c2 Tainted: G    BU         3.18.0-dirty #15
      Hardware name: Google Tegra210 Smaug Rev 1,3+ (DT)
      Call trace:
      [<ffffffc000209c98>] dump_backtrace+0x0/0x1ac arch/arm64/kernel/traps.c:90
      [<ffffffc000209e54>] show_stack+0x10/0x1c arch/arm64/kernel/traps.c:171
      [<     inline     >] __dump_stack lib/dump_stack.c:15
      [<ffffffc000f18dc4>] dump_stack+0x7c/0xd0 lib/dump_stack.c:50
      [<     inline     >] print_address_description mm/kasan/report.c:147
      [<     inline     >] kasan_report_error mm/kasan/report.c:236
      [<ffffffc000373dcc>] kasan_report+0x380/0x4b8 mm/kasan/report.c:259
      [<     inline     >] check_memory_region mm/kasan/kasan.c:264
      [<ffffffc00037352c>] __asan_load8+0x20/0x70 mm/kasan/kasan.c:507
      [<ffffffc0005b9624>] memcpy_fromiovec+0x5c/0x114 lib/iovec.c:15
      [<     inline     >] memcpy_from_msg include/linux/skbuff.h:2667
      [<ffffffc000ddeba0>] ping_common_sendmsg+0x50/0x108 net/ipv4/ping.c:674
      [<ffffffc000dded30>] ping_v4_sendmsg+0xd8/0x698 net/ipv4/ping.c:714
      [<ffffffc000dc91dc>] inet_sendmsg+0xe0/0x12c net/ipv4/af_inet.c:749
      [<     inline     >] __sock_sendmsg_nosec net/socket.c:624
      [<     inline     >] __sock_sendmsg net/socket.c:632
      [<ffffffc000cab61c>] sock_sendmsg+0x124/0x164 net/socket.c:643
      [<     inline     >] SYSC_sendto net/socket.c:1797
      [<ffffffc000cad270>] SyS_sendto+0x178/0x1d8 net/socket.c:1761
      
      CVE-2016-8399
      Reported-by: default avatarQidan He <i@flanker017.me>
      Fixes: c319b4d7 ("net: ipv4: add IPPROTO_ICMP socket kind")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5ff5e5c0
    • Eric Dumazet's avatar
      net: avoid signed overflows for SO_{SND|RCV}BUFFORCE · f818e5d8
      Eric Dumazet authored
      [ Upstream commit b98b0bc8 ]
      
      CAP_NET_ADMIN users should not be allowed to set negative
      sk_sndbuf or sk_rcvbuf values, as it can lead to various memory
      corruptions, crashes, OOM...
      
      Note that before commit 82981930 ("net: cleanups in
      sock_setsockopt()"), the bug was even more serious, since SO_SNDBUF
      and SO_RCVBUF were vulnerable.
      
      This needs to be backported to all known linux kernels.
      
      Again, many thanks to syzkaller team for discovering this gem.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f818e5d8
    • Sabrina Dubroca's avatar
      geneve: avoid use-after-free of skb->data · bfecf901
      Sabrina Dubroca authored
      [ Upstream commit 5b010147 ]
      
      geneve{,6}_build_skb can end up doing a pskb_expand_head(), which
      makes the ip_hdr(skb) reference we stashed earlier stale. Since it's
      only needed as an argument to ip_tunnel_ecn_encap(), move this
      directly in the function call.
      
      Fixes: 08399efc ("geneve: ensure ECN info is handled properly in all tx/rx paths")
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: default avatarJohn W. Linville <linville@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bfecf901
    • Michal Kubeček's avatar
      tipc: check minimum bearer MTU · 4daa2c73
      Michal Kubeček authored
      [ Upstream commit 3de81b75 ]
      
      Qian Zhang (张谦) reported a potential socket buffer overflow in
      tipc_msg_build() which is also known as CVE-2016-8632: due to
      insufficient checks, a buffer overflow can occur if MTU is too short for
      even tipc headers. As anyone can set device MTU in a user/net namespace,
      this issue can be abused by a regular user.
      
      As agreed in the discussion on Ben Hutchings' original patch, we should
      check the MTU at the moment a bearer is attached rather than for each
      processed packet. We also need to repeat the check when bearer MTU is
      adjusted to new device MTU. UDP case also needs a check to avoid
      overflow when calculating bearer MTU.
      
      Fixes: b97bf3fd ("[TIPC] Initial merge")
      Signed-off-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Reported-by: default avatarQian Zhang (张谦) <zhangqian-c@360.cn>
      Acked-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4daa2c73
    • Chris Brandt's avatar
      sh_eth: remove unchecked interrupts for RZ/A1 · 1ff3209a
      Chris Brandt authored
      [ Upstream commit 33d446db ]
      
      When streaming a lot of data and the RZ/A1 can't keep up, some status bits
      will get set that are not being checked or cleared which cause the
      following messages and the Ethernet driver to stop working. This
      patch fixes that issue.
      
      irq 21: nobody cared (try booting with the "irqpoll" option)
      handlers:
      [<c036b71c>] sh_eth_interrupt
      Disabling IRQ #21
      
      Fixes: db893473 ("sh_eth: Add support for r7s72100")
      Signed-off-by: default avatarChris Brandt <chris.brandt@renesas.com>
      Acked-by: default avatarSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1ff3209a
    • Florian Fainelli's avatar
      net: bcmgenet: Utilize correct struct device for all DMA operations · bbf913d7
      Florian Fainelli authored
      [ Upstream commit 8c4799ac ]
      
      __bcmgenet_tx_reclaim() and bcmgenet_free_rx_buffers() are not using the
      same struct device during unmap that was used for the map operation,
      which makes DMA-API debugging warn about it. Fix this by always using
      &priv->pdev->dev throughout the driver, using an identical device
      reference for all map/unmap calls.
      
      Fixes: 1c1008c7 ("net: bcmgenet: add main driver file")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bbf913d7
    • Kristian Evensen's avatar
      cdc_ether: Fix handling connection notification · accb7c99
      Kristian Evensen authored
      [ Upstream commit d5c83d0d ]
      
      Commit bfe9b9d2 ("cdc_ether: Improve ZTE MF823/831/910 handling")
      introduced a work-around in usbnet_cdc_status() for devices that exported
      cdc carrier on twice on connect. Before the commit, this behavior caused
      the link state to be incorrect. It was assumed that all CDC Ethernet
      devices would either export this behavior, or send one off and then one on
      notification (which seems to be the default behavior).
      
      Unfortunately, it turns out multiple devices sends a connection
      notification multiple times per second (via an interrupt), even when
      connection state does not change. This has been observed with several
      different USB LAN dongles (at least), for example 13b1:0041 (Linksys).
      After bfe9b9d2, the link state has been set as down and then up for
      each notification. This has caused a flood of Netlink NEWLINK messages and
      syslog to be flooded with messages similar to:
      
      cdc_ether 2-1:2.0 eth1: kevent 12 may have been dropped
      
      This commit fixes the behavior by reverting usbnet_cdc_status() to how it
      was before bfe9b9d2. The work-around has been moved to a separate
      status-function which is only called when a known, affect device is
      detected.
      
      v1->v2:
      
      * Do not open-code netif_carrier_ok() (thanks Henning Schild).
      * Call netif_carrier_off() instead of usb_link_change(). This prevents
      calling schedule_work() twice without giving the work queue a chance to be
      processed (thanks Bjørn Mork).
      
      Fixes: bfe9b9d2 ("cdc_ether: Improve ZTE MF823/831/910 handling")
      Reported-by: default avatarHenning Schild <henning.schild@siemens.com>
      Signed-off-by: default avatarKristian Evensen <kristian.evensen@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      accb7c99
    • Artem Savkov's avatar
      ip6_offload: check segs for NULL in ipv6_gso_segment. · 34457543
      Artem Savkov authored
      [ Upstream commit 6b6ebb6b ]
      
      segs needs to be checked for being NULL in ipv6_gso_segment() before calling
      skb_shinfo(segs), otherwise kernel can run into a NULL-pointer dereference:
      
      [   97.811262] BUG: unable to handle kernel NULL pointer dereference at 00000000000000cc
      [   97.819112] IP: [<ffffffff816e52f9>] ipv6_gso_segment+0x119/0x2f0
      [   97.825214] PGD 0 [   97.827047]
      [   97.828540] Oops: 0000 [#1] SMP
      [   97.831678] Modules linked in: vhost_net vhost macvtap macvlan nfsv3 rpcsec_gss_krb5
      nfsv4 dns_resolver nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4
      iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
      ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter
      bridge stp llc snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel
      snd_hda_codec edac_mce_amd snd_hda_core edac_core snd_hwdep kvm_amd snd_seq kvm snd_seq_device
      snd_pcm irqbypass snd_timer ppdev parport_serial snd parport_pc k10temp pcspkr soundcore parport
      sp5100_tco shpchp sg wmi i2c_piix4 acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc
      ip_tables xfs libcrc32c sr_mod cdrom sd_mod ata_generic pata_acpi amdkfd amd_iommu_v2 radeon
      broadcom bcm_phy_lib i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
      ttm ahci serio_raw tg3 firewire_ohci libahci pata_atiixp drm ptp libata firewire_core pps_core
      i2c_core crc_itu_t fjes dm_mirror dm_region_hash dm_log dm_mod
      [   97.927721] CPU: 1 PID: 3504 Comm: vhost-3495 Not tainted 4.9.0-7.el7.test.x86_64 #1
      [   97.935457] Hardware name: AMD Snook/Snook, BIOS ESK0726A 07/26/2010
      [   97.941806] task: ffff880129a1c080 task.stack: ffffc90001bcc000
      [   97.947720] RIP: 0010:[<ffffffff816e52f9>]  [<ffffffff816e52f9>] ipv6_gso_segment+0x119/0x2f0
      [   97.956251] RSP: 0018:ffff88012fc43a10  EFLAGS: 00010207
      [   97.961557] RAX: 0000000000000000 RBX: ffff8801292c8700 RCX: 0000000000000594
      [   97.968687] RDX: 0000000000000593 RSI: ffff880129a846c0 RDI: 0000000000240000
      [   97.975814] RBP: ffff88012fc43a68 R08: ffff880129a8404e R09: 0000000000000000
      [   97.982942] R10: 0000000000000000 R11: ffff880129a84076 R12: 00000020002949b3
      [   97.990070] R13: ffff88012a580000 R14: 0000000000000000 R15: ffff88012a580000
      [   97.997198] FS:  0000000000000000(0000) GS:ffff88012fc40000(0000) knlGS:0000000000000000
      [   98.005280] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   98.011021] CR2: 00000000000000cc CR3: 0000000126c5d000 CR4: 00000000000006e0
      [   98.018149] Stack:
      [   98.020157]  00000000ffffffff ffff88012fc43ac8 ffffffffa017ad0a 000000000000000e
      [   98.027584]  0000001300000000 0000000077d59998 ffff8801292c8700 00000020002949b3
      [   98.035010]  ffff88012a580000 0000000000000000 ffff88012a580000 ffff88012fc43a98
      [   98.042437] Call Trace:
      [   98.044879]  <IRQ> [   98.046803]  [<ffffffffa017ad0a>] ? tg3_start_xmit+0x84a/0xd60 [tg3]
      [   98.053156]  [<ffffffff815eeee0>] skb_mac_gso_segment+0xb0/0x130
      [   98.059158]  [<ffffffff815eefd3>] __skb_gso_segment+0x73/0x110
      [   98.064985]  [<ffffffff815ef40d>] validate_xmit_skb+0x12d/0x2b0
      [   98.070899]  [<ffffffff815ef5d2>] validate_xmit_skb_list+0x42/0x70
      [   98.077073]  [<ffffffff81618560>] sch_direct_xmit+0xd0/0x1b0
      [   98.082726]  [<ffffffff815efd86>] __dev_queue_xmit+0x486/0x690
      [   98.088554]  [<ffffffff8135c135>] ? cpumask_next_and+0x35/0x50
      [   98.094380]  [<ffffffff815effa0>] dev_queue_xmit+0x10/0x20
      [   98.099863]  [<ffffffffa09ce057>] br_dev_queue_push_xmit+0xa7/0x170 [bridge]
      [   98.106907]  [<ffffffffa09ce161>] br_forward_finish+0x41/0xc0 [bridge]
      [   98.113430]  [<ffffffff81627cf2>] ? nf_iterate+0x52/0x60
      [   98.118735]  [<ffffffff81627d6b>] ? nf_hook_slow+0x6b/0xc0
      [   98.124216]  [<ffffffffa09ce32c>] __br_forward+0x14c/0x1e0 [bridge]
      [   98.130480]  [<ffffffffa09ce120>] ? br_dev_queue_push_xmit+0x170/0x170 [bridge]
      [   98.137785]  [<ffffffffa09ce4bd>] br_forward+0x9d/0xb0 [bridge]
      [   98.143701]  [<ffffffffa09cfbb7>] br_handle_frame_finish+0x267/0x560 [bridge]
      [   98.150834]  [<ffffffffa09d0064>] br_handle_frame+0x174/0x2f0 [bridge]
      [   98.157355]  [<ffffffff8102fb89>] ? sched_clock+0x9/0x10
      [   98.162662]  [<ffffffff810b63b2>] ? sched_clock_cpu+0x72/0xa0
      [   98.168403]  [<ffffffff815eccf5>] __netif_receive_skb_core+0x1e5/0xa20
      [   98.174926]  [<ffffffff813659f9>] ? timerqueue_add+0x59/0xb0
      [   98.180580]  [<ffffffff815ed548>] __netif_receive_skb+0x18/0x60
      [   98.186494]  [<ffffffff815ee625>] process_backlog+0x95/0x140
      [   98.192145]  [<ffffffff815edccd>] net_rx_action+0x16d/0x380
      [   98.197713]  [<ffffffff8170cff1>] __do_softirq+0xd1/0x283
      [   98.203106]  [<ffffffff8170b2bc>] do_softirq_own_stack+0x1c/0x30
      [   98.209107]  <EOI> [   98.211029]  [<ffffffff8108a5c0>] do_softirq+0x50/0x60
      [   98.216166]  [<ffffffff815ec853>] netif_rx_ni+0x33/0x80
      [   98.221386]  [<ffffffffa09eeff7>] tun_get_user+0x487/0x7f0 [tun]
      [   98.227388]  [<ffffffffa09ef3ab>] tun_sendmsg+0x4b/0x60 [tun]
      [   98.233129]  [<ffffffffa0b68932>] handle_tx+0x282/0x540 [vhost_net]
      [   98.239392]  [<ffffffffa0b68c25>] handle_tx_kick+0x15/0x20 [vhost_net]
      [   98.245916]  [<ffffffffa0abacfe>] vhost_worker+0x9e/0xf0 [vhost]
      [   98.251919]  [<ffffffffa0abac60>] ? vhost_umem_alloc+0x40/0x40 [vhost]
      [   98.258440]  [<ffffffff81003a47>] ? do_syscall_64+0x67/0x180
      [   98.264094]  [<ffffffff810a44d9>] kthread+0xd9/0xf0
      [   98.268965]  [<ffffffff810a4400>] ? kthread_park+0x60/0x60
      [   98.274444]  [<ffffffff8170a4d5>] ret_from_fork+0x25/0x30
      [   98.279836] Code: 8b 93 d8 00 00 00 48 2b 93 d0 00 00 00 4c 89 e6 48 89 df 66 89 93 c2 00 00 00 ff 10 48 3d 00 f0 ff ff 49 89 c2 0f 87 52 01 00 00 <41> 8b 92 cc 00 00 00 48 8b 80 d0 00 00 00 44 0f b7 74 10 06 66
      [   98.299425] RIP  [<ffffffff816e52f9>] ipv6_gso_segment+0x119/0x2f0
      [   98.305612]  RSP <ffff88012fc43a10>
      [   98.309094] CR2: 00000000000000cc
      [   98.312406] ---[ end trace 726a2c7a2d2d78d0 ]---
      Signed-off-by: default avatarArtem Savkov <asavkov@redhat.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      34457543
    • Philip Pettersson's avatar
      packet: fix race condition in packet_set_ring · cef222d4
      Philip Pettersson authored
      [ Upstream commit 84ac7260 ]
      
      When packet_set_ring creates a ring buffer it will initialize a
      struct timer_list if the packet version is TPACKET_V3. This value
      can then be raced by a different thread calling setsockopt to
      set the version to TPACKET_V1 before packet_set_ring has finished.
      
      This leads to a use-after-free on a function pointer in the
      struct timer_list when the socket is closed as the previously
      initialized timer will not be deleted.
      
      The bug is fixed by taking lock_sock(sk) in packet_setsockopt when
      changing the packet version while also taking the lock at the start
      of packet_set_ring.
      
      Fixes: f6fb8f10 ("af-packet: TPACKET_V3 flexible buffer implementation.")
      Signed-off-by: default avatarPhilip Pettersson <philip.pettersson@gmail.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cef222d4
    • Arnaldo Carvalho de Melo's avatar
      GSO: Reload iph after pskb_may_pull · 17941a9d
      Arnaldo Carvalho de Melo authored
      [ Upstream commit a5108878 ]
      
      As it may get stale and lead to use after free.
      Acked-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Cc: Alexander Duyck <aduyck@mirantis.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Fixes: cbc53e08 ("GSO: Add GSO type for fixed IPv4 ID")
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      17941a9d
    • Eric Dumazet's avatar
      net/dccp: fix use-after-free in dccp_invalid_packet · ff0d7874
      Eric Dumazet authored
      [ Upstream commit 648f0c28 ]
      
      pskb_may_pull() can reallocate skb->head, we need to reload dh pointer
      in dccp_invalid_packet() or risk use after free.
      
      Bug found by Andrey Konovalov using syzkaller.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ff0d7874
    • Cyrille Pitchen's avatar
      net: macb: fix the RX queue reset in macb_rx() · 023cd33e
      Cyrille Pitchen authored
      [ Upstream commit a0b44eea ]
      
      On macb only (not gem), when a RX queue corruption was detected from
      macb_rx(), the RX queue was reset: during this process the RX ring
      buffer descriptor was initialized by macb_init_rx_ring() but we forgot
      to also set bp->rx_tail to 0.
      
      Indeed, when processing the received frames, bp->rx_tail provides the
      macb driver with the index in the RX ring buffer of the next buffer to
      process. So when the whole ring buffer is reset we must also reset
      bp->rx_tail so the driver is synchronized again with the hardware.
      
      Since macb_init_rx_ring() is called from many locations, currently from
      macb_rx() and macb_init_rings(), we'd rather add the "bp->rx_tail = 0;"
      line inside macb_init_rx_ring() than add the very same line after each
      call of this function.
      
      Without this fix, the rx queue is not reset properly to recover from
      queue corruption and connection drop may occur.
      Signed-off-by: default avatarCyrille Pitchen <cyrille.pitchen@atmel.com>
      Fixes: 9ba723b0 ("net: macb: remove BUG_ON() and reset the queue to handle RX errors")
      Acked-by: default avatarNicolas Ferre <nicolas.ferre@atmel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      023cd33e
    • Herbert Xu's avatar
      netlink: Do not schedule work from sk_destruct · 25d9b4bb
      Herbert Xu authored
      [ Upstream commit ed5d7788 ]
      
      It is wrong to schedule a work from sk_destruct using the socket
      as the memory reserve because the socket will be freed immediately
      after the return from sk_destruct.
      
      Instead we should do the deferral prior to sk_free.
      
      This patch does just that.
      
      Fixes: 707693c8 ("netlink: Call cb->done from a worker thread")
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Tested-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      25d9b4bb
    • Herbert Xu's avatar
      netlink: Call cb->done from a worker thread · f5dad347
      Herbert Xu authored
      [ Upstream commit 707693c8 ]
      
      The cb->done interface expects to be called in process context.
      This was broken by the netlink RCU conversion.  This patch fixes
      it by adding a worker struct to make the cb->done call where
      necessary.
      
      Fixes: 21e4902a ("netlink: Lockless lookup with RCU grace...")
      Reported-by: default avatarSubash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f5dad347
    • Amir Vadai's avatar
      net/sched: pedit: make sure that offset is valid · 360d6a23
      Amir Vadai authored
      [ Upstream commit 95c2027b ]
      
      Add a validation function to make sure offset is valid:
      1. Not below skb head (could happen when offset is negative).
      2. Validate both 'offset' and 'at'.
      Signed-off-by: default avatarAmir Vadai <amir@vadai.me>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      360d6a23
    • Nikita Yushchenko's avatar
      net: dsa: fix unbalanced dsa_switch_tree reference counting · aa239369
      Nikita Yushchenko authored
      [ Upstream commit 7a99cd6e ]
      
      _dsa_register_switch() gets a dsa_switch_tree object either via
      dsa_get_dst() or via dsa_add_dst(). Former path does not increase kref
      in returned object (resulting into caller not owning a reference),
      while later path does create a new object (resulting into caller owning
      a reference).
      
      The rest of _dsa_register_switch() assumes that it owns a reference, and
      calls dsa_put_dst().
      
      This causes a memory breakage if first switch in the tree initialized
      successfully, but second failed to initialize. In particular, freed
      dsa_swith_tree object is left referenced by switch that was initialized,
      and later access to sysfs attributes of that switch cause OOPS.
      
      To fix, need to add kref_get() call to dsa_get_dst().
      
      Fixes: 83c0afae ("net: dsa: Add new binding implementation")
      Signed-off-by: default avatarNikita Yushchenko <nikita.yoush@cogentembedded.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aa239369
    • Daniel Borkmann's avatar
      net, sched: respect rcu grace period on cls destruction · 9a747927
      Daniel Borkmann authored
      [ Upstream commit d9363774 ]
      
      Roi reported a crash in flower where tp->root was NULL in ->classify()
      callbacks. Reason is that in ->destroy() tp->root is set to NULL via
      RCU_INIT_POINTER(). It's problematic for some of the classifiers, because
      this doesn't respect RCU grace period for them, and as a result, still
      outstanding readers from tc_classify() will try to blindly dereference
      a NULL tp->root.
      
      The tp->root object is strictly private to the classifier implementation
      and holds internal data the core such as tc_ctl_tfilter() doesn't know
      about. Within some classifiers, such as cls_bpf, cls_basic, etc, tp->root
      is only checked for NULL in ->get() callback, but nowhere else. This is
      misleading and seemed to be copied from old classifier code that was not
      cleaned up properly. For example, d3fa76ee ("[NET_SCHED]: cls_basic:
      fix NULL pointer dereference") moved tp->root initialization into ->init()
      routine, where before it was part of ->change(), so ->get() had to deal
      with tp->root being NULL back then, so that was indeed a valid case, after
      d3fa76ee, not really anymore. We used to set tp->root to NULL long
      ago in ->destroy(), see 47a1a1d4 ("pkt_sched: remove unnecessary xchg()
      in packet classifiers"); but the NULLifying was reintroduced with the
      RCUification, but it's not correct for every classifier implementation.
      
      In the cases that are fixed here with one exception of cls_cgroup, tp->root
      object is allocated and initialized inside ->init() callback, which is always
      performed at a point in time after we allocate a new tp, which means tp and
      thus tp->root was not globally visible in the tp chain yet (see tc_ctl_tfilter()).
      Also, on destruction tp->root is strictly kfree_rcu()'ed in ->destroy()
      handler, same for the tp which is kfree_rcu()'ed right when we return
      from ->destroy() in tcf_destroy(). This means, the head object's lifetime
      for such classifiers is always tied to the tp lifetime. The RCU callback
      invocation for the two kfree_rcu() could be out of order, but that's fine
      since both are independent.
      
      Dropping the RCU_INIT_POINTER(tp->root, NULL) for these classifiers here
      means that 1) we don't need a useless NULL check in fast-path and, 2) that
      outstanding readers of that tp in tc_classify() can still execute under
      respect with RCU grace period as it is actually expected.
      
      Things that haven't been touched here: cls_fw and cls_route. They each
      handle tp->root being NULL in ->classify() path for historic reasons, so
      their ->destroy() implementation can stay as is. If someone actually
      cares, they could get cleaned up at some point to avoid the test in fast
      path. cls_u32 doesn't set tp->root to NULL. For cls_rsvp, I just added a
      !head should anyone actually be using/testing it, so it at least aligns with
      cls_fw and cls_route. For cls_flower we additionally need to defer rhashtable
      destruction (to a sleepable context) after RCU grace period as concurrent
      readers might still access it. (Note that in this case we need to hold module
      reference to keep work callback address intact, since we only wait on module
      unload for all call_rcu()s to finish.)
      
      This fixes one race to bring RCU grace period guarantees back. Next step
      as worked on by Cong however is to fix 1e052be6 ("net_sched: destroy
      proto tp when all filters are gone") to get the order of unlinking the tp
      in tc_ctl_tfilter() for the RTM_DELTFILTER case right by moving
      RCU_INIT_POINTER() before tcf_destroy() and let the notification for
      removal be done through the prior ->delete() callback. Both are independant
      issues. Once we have that right, we can then clean tp->root up for a number
      of classifiers by not making them RCU pointers, which requires a new callback
      (->uninit) that is triggered from tp's RCU callback, where we just kfree()
      tp->root from there.
      
      Fixes: 1f947bf1 ("net: sched: rcu'ify cls_bpf")
      Fixes: 9888faef ("net: sched: cls_basic use RCU")
      Fixes: 70da9f0b ("net: sched: cls_flow use RCU")
      Fixes: 77b9900e ("tc: introduce Flower classifier")
      Fixes: bf3994d2 ("net/sched: introduce Match-all classifier")
      Fixes: 952313bd ("net: sched: cls_cgroup use RCU")
      Reported-by: default avatarRoi Dayan <roid@mellanox.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Roi Dayan <roid@mellanox.com>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Acked-by: default avatarJohn Fastabend <john.r.fastabend@intel.com>
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9a747927
    • Florian Fainelli's avatar
      net: dsa: bcm_sf2: Ensure we re-negotiate EEE during after link change · a9437ebc
      Florian Fainelli authored
      [ Upstream commit 76da8706 ]
      
      In case the link change and EEE is enabled or disabled, always try to
      re-negotiate this with the link partner.
      
      Fixes: 450b05c1 ("net: dsa: bcm_sf2: add support for controlling EEE")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a9437ebc
    • Eric Dumazet's avatar
      udplite: call proper backlog handlers · ddf05343
      Eric Dumazet authored
      [ Upstream commit 30c7be26 ]
      
      In commits 93821778 ("udp: Fix rcv socket locking") and
      f7ad74fe ("net/ipv6/udp: UDP encapsulation: break backlog_rcv into
      __udpv6_queue_rcv_skb") UDP backlog handlers were renamed, but UDPlite
      was forgotten.
      
      This leads to crashes if UDPlite header is pulled twice, which happens
      starting from commit e6afc8ac ("udp: remove headers from UDP packets
      before queueing")
      
      Bug found by syzkaller team, thanks a lot guys !
      
      Note that backlog use in UDP/UDPlite is scheduled to be removed starting
      from linux-4.10, so this patch is only needed up to linux-4.9
      
      Fixes: 93821778 ("udp: Fix rcv socket locking")
      Fixes: f7ad74fe ("net/ipv6/udp: UDP encapsulation: break backlog_rcv into __udpv6_queue_rcv_skb")
      Fixes: e6afc8ac ("udp: remove headers from UDP packets before queueing")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ddf05343
    • Paolo Abeni's avatar
      ipv6: bump genid when the IFA_F_TENTATIVE flag is clear · 7b0aa75b
      Paolo Abeni authored
      [ Upstream commit 764d3be6 ]
      
      When an ipv6 address has the tentative flag set, it can't be
      used as source for egress traffic, while the associated route,
      if any, can be looked up and even stored into some dst_cache.
      
      In the latter scenario, the source ipv6 address selected and
      stored in the cache is most probably wrong (e.g. with
      link-local scope) and the entity using the dst_cache will
      experience lack of ipv6 connectivity until said cache is
      cleared or invalidated.
      
      Overall this may cause lack of connectivity over most IPv6 tunnels
      (comprising geneve and vxlan), if the first egress packet reaches
      the tunnel before the DaD is completed for the used ipv6
      address.
      
      This patch bumps a new genid after that the IFA_F_TENTATIVE flag
      is cleared, so that dst_cache will be invalidated on
      next lookup and ipv6 connectivity restored.
      
      Fixes: 0c1d70af ("net: use dst_cache for vxlan device")
      Fixes: 468dfffc ("geneve: add dst caching support")
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7b0aa75b
    • Zhang Shengju's avatar
      rtnl: fix the loop index update error in rtnl_dump_ifinfo() · 58c8cc33
      Zhang Shengju authored
      [ Upstream commit 3f0ae05d ]
      
      If the link is filtered out, loop index should also be updated. If not,
      loop index will not be correct.
      
      Fixes: dc599f76 ("net: Add support for filtering link dump by master device and kind")
      Signed-off-by: default avatarZhang Shengju <zhangshengju@cmss.chinamobile.com>
      Acked-by: default avatarDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      58c8cc33
    • Guillaume Nault's avatar
      l2tp: fix racy SOCK_ZAPPED flag check in l2tp_ip{,6}_bind() · 84df5674
      Guillaume Nault authored
      [ Upstream commit 32c23116 ]
      
      Lock socket before checking the SOCK_ZAPPED flag in l2tp_ip6_bind().
      Without lock, a concurrent call could modify the socket flags between
      the sock_flag(sk, SOCK_ZAPPED) test and the lock_sock() call. This way,
      a socket could be inserted twice in l2tp_ip6_bind_table. Releasing it
      would then leave a stale pointer there, generating use-after-free
      errors when walking through the list or modifying adjacent entries.
      
      BUG: KASAN: use-after-free in l2tp_ip6_close+0x22e/0x290 at addr ffff8800081b0ed8
      Write of size 8 by task syz-executor/10987
      CPU: 0 PID: 10987 Comm: syz-executor Not tainted 4.8.0+ #39
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
       ffff880031d97838 ffffffff829f835b ffff88001b5a1640 ffff8800081b0ec0
       ffff8800081b15a0 ffff8800081b6d20 ffff880031d97860 ffffffff8174d3cc
       ffff880031d978f0 ffff8800081b0e80 ffff88001b5a1640 ffff880031d978e0
      Call Trace:
       [<ffffffff829f835b>] dump_stack+0xb3/0x118 lib/dump_stack.c:15
       [<ffffffff8174d3cc>] kasan_object_err+0x1c/0x70 mm/kasan/report.c:156
       [<     inline     >] print_address_description mm/kasan/report.c:194
       [<ffffffff8174d666>] kasan_report_error+0x1f6/0x4d0 mm/kasan/report.c:283
       [<     inline     >] kasan_report mm/kasan/report.c:303
       [<ffffffff8174db7e>] __asan_report_store8_noabort+0x3e/0x40 mm/kasan/report.c:329
       [<     inline     >] __write_once_size ./include/linux/compiler.h:249
       [<     inline     >] __hlist_del ./include/linux/list.h:622
       [<     inline     >] hlist_del_init ./include/linux/list.h:637
       [<ffffffff8579047e>] l2tp_ip6_close+0x22e/0x290 net/l2tp/l2tp_ip6.c:239
       [<ffffffff850b2dfd>] inet_release+0xed/0x1c0 net/ipv4/af_inet.c:415
       [<ffffffff851dc5a0>] inet6_release+0x50/0x70 net/ipv6/af_inet6.c:422
       [<ffffffff84c4581d>] sock_release+0x8d/0x1d0 net/socket.c:570
       [<ffffffff84c45976>] sock_close+0x16/0x20 net/socket.c:1017
       [<ffffffff817a108c>] __fput+0x28c/0x780 fs/file_table.c:208
       [<ffffffff817a1605>] ____fput+0x15/0x20 fs/file_table.c:244
       [<ffffffff813774f9>] task_work_run+0xf9/0x170
       [<ffffffff81324aae>] do_exit+0x85e/0x2a00
       [<ffffffff81326dc8>] do_group_exit+0x108/0x330
       [<ffffffff81348cf7>] get_signal+0x617/0x17a0 kernel/signal.c:2307
       [<ffffffff811b49af>] do_signal+0x7f/0x18f0
       [<ffffffff810039bf>] exit_to_usermode_loop+0xbf/0x150 arch/x86/entry/common.c:156
       [<     inline     >] prepare_exit_to_usermode arch/x86/entry/common.c:190
       [<ffffffff81006060>] syscall_return_slowpath+0x1a0/0x1e0 arch/x86/entry/common.c:259
       [<ffffffff85e4d726>] entry_SYSCALL_64_fastpath+0xc4/0xc6
      Object at ffff8800081b0ec0, in cache L2TP/IPv6 size: 1448
      Allocated:
      PID = 10987
       [ 1116.897025] [<ffffffff811ddcb6>] save_stack_trace+0x16/0x20
       [ 1116.897025] [<ffffffff8174c736>] save_stack+0x46/0xd0
       [ 1116.897025] [<ffffffff8174c9ad>] kasan_kmalloc+0xad/0xe0
       [ 1116.897025] [<ffffffff8174cee2>] kasan_slab_alloc+0x12/0x20
       [ 1116.897025] [<     inline     >] slab_post_alloc_hook mm/slab.h:417
       [ 1116.897025] [<     inline     >] slab_alloc_node mm/slub.c:2708
       [ 1116.897025] [<     inline     >] slab_alloc mm/slub.c:2716
       [ 1116.897025] [<ffffffff817476a8>] kmem_cache_alloc+0xc8/0x2b0 mm/slub.c:2721
       [ 1116.897025] [<ffffffff84c4f6a9>] sk_prot_alloc+0x69/0x2b0 net/core/sock.c:1326
       [ 1116.897025] [<ffffffff84c58ac8>] sk_alloc+0x38/0xae0 net/core/sock.c:1388
       [ 1116.897025] [<ffffffff851ddf67>] inet6_create+0x2d7/0x1000 net/ipv6/af_inet6.c:182
       [ 1116.897025] [<ffffffff84c4af7b>] __sock_create+0x37b/0x640 net/socket.c:1153
       [ 1116.897025] [<     inline     >] sock_create net/socket.c:1193
       [ 1116.897025] [<     inline     >] SYSC_socket net/socket.c:1223
       [ 1116.897025] [<ffffffff84c4b46f>] SyS_socket+0xef/0x1b0 net/socket.c:1203
       [ 1116.897025] [<ffffffff85e4d685>] entry_SYSCALL_64_fastpath+0x23/0xc6
      Freed:
      PID = 10987
       [ 1116.897025] [<ffffffff811ddcb6>] save_stack_trace+0x16/0x20
       [ 1116.897025] [<ffffffff8174c736>] save_stack+0x46/0xd0
       [ 1116.897025] [<ffffffff8174cf61>] kasan_slab_free+0x71/0xb0
       [ 1116.897025] [<     inline     >] slab_free_hook mm/slub.c:1352
       [ 1116.897025] [<     inline     >] slab_free_freelist_hook mm/slub.c:1374
       [ 1116.897025] [<     inline     >] slab_free mm/slub.c:2951
       [ 1116.897025] [<ffffffff81748b28>] kmem_cache_free+0xc8/0x330 mm/slub.c:2973
       [ 1116.897025] [<     inline     >] sk_prot_free net/core/sock.c:1369
       [ 1116.897025] [<ffffffff84c541eb>] __sk_destruct+0x32b/0x4f0 net/core/sock.c:1444
       [ 1116.897025] [<ffffffff84c5aca4>] sk_destruct+0x44/0x80 net/core/sock.c:1452
       [ 1116.897025] [<ffffffff84c5ad33>] __sk_free+0x53/0x220 net/core/sock.c:1460
       [ 1116.897025] [<ffffffff84c5af23>] sk_free+0x23/0x30 net/core/sock.c:1471
       [ 1116.897025] [<ffffffff84c5cb6c>] sk_common_release+0x28c/0x3e0 ./include/net/sock.h:1589
       [ 1116.897025] [<ffffffff8579044e>] l2tp_ip6_close+0x1fe/0x290 net/l2tp/l2tp_ip6.c:243
       [ 1116.897025] [<ffffffff850b2dfd>] inet_release+0xed/0x1c0 net/ipv4/af_inet.c:415
       [ 1116.897025] [<ffffffff851dc5a0>] inet6_release+0x50/0x70 net/ipv6/af_inet6.c:422
       [ 1116.897025] [<ffffffff84c4581d>] sock_release+0x8d/0x1d0 net/socket.c:570
       [ 1116.897025] [<ffffffff84c45976>] sock_close+0x16/0x20 net/socket.c:1017
       [ 1116.897025] [<ffffffff817a108c>] __fput+0x28c/0x780 fs/file_table.c:208
       [ 1116.897025] [<ffffffff817a1605>] ____fput+0x15/0x20 fs/file_table.c:244
       [ 1116.897025] [<ffffffff813774f9>] task_work_run+0xf9/0x170
       [ 1116.897025] [<ffffffff81324aae>] do_exit+0x85e/0x2a00
       [ 1116.897025] [<ffffffff81326dc8>] do_group_exit+0x108/0x330
       [ 1116.897025] [<ffffffff81348cf7>] get_signal+0x617/0x17a0 kernel/signal.c:2307
       [ 1116.897025] [<ffffffff811b49af>] do_signal+0x7f/0x18f0
       [ 1116.897025] [<ffffffff810039bf>] exit_to_usermode_loop+0xbf/0x150 arch/x86/entry/common.c:156
       [ 1116.897025] [<     inline     >] prepare_exit_to_usermode arch/x86/entry/common.c:190
       [ 1116.897025] [<ffffffff81006060>] syscall_return_slowpath+0x1a0/0x1e0 arch/x86/entry/common.c:259
       [ 1116.897025] [<ffffffff85e4d726>] entry_SYSCALL_64_fastpath+0xc4/0xc6
      Memory state around the buggy address:
       ffff8800081b0d80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff8800081b0e00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      >ffff8800081b0e80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
                                                          ^
       ffff8800081b0f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8800081b0f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      ==================================================================
      
      The same issue exists with l2tp_ip_bind() and l2tp_ip_bind_table.
      
      Fixes: c51ce497 ("l2tp: fix oops in L2TP IP sockets for connect() AF_UNSPEC case")
      Reported-by: default avatarBaozeng Ding <sploving1@gmail.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Tested-by: default avatarBaozeng Ding <sploving1@gmail.com>
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      84df5674
    • Sabrina Dubroca's avatar
      rtnetlink: fix FDB size computation · 7f8b251a
      Sabrina Dubroca authored
      [ Upstream commit f82ef3e1 ]
      
      Add missing NDA_VLAN attribute's size.
      
      Fixes: 1e53d5bb ("net: Pass VLAN ID to rtnl_fdb_notify.")
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7f8b251a
    • WANG Cong's avatar
      af_unix: conditionally use freezable blocking calls in read · c39caa8f
      WANG Cong authored
      [ Upstream commit 06a77b07 ]
      
      Commit 2b15af6f ("af_unix: use freezable blocking calls in read")
      converts schedule_timeout() to its freezable version, it was probably
      correct at that time, but later, commit 2b514574
      ("net: af_unix: implement splice for stream af_unix sockets") breaks
      the strong requirement for a freezable sleep, according to
      commit 0f9548ca:
      
          We shouldn't try_to_freeze if locks are held.  Holding a lock can cause a
          deadlock if the lock is later acquired in the suspend or hibernate path
          (e.g.  by dpm).  Holding a lock can also cause a deadlock in the case of
          cgroup_freezer if a lock is held inside a frozen cgroup that is later
          acquired by a process outside that group.
      
      The pipe_lock is still held at that point.
      
      So use freezable version only for the recvmsg call path, avoid impact for
      Android.
      
      Fixes: 2b514574 ("net: af_unix: implement splice for stream af_unix sockets")
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Colin Cross <ccross@android.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c39caa8f
    • Jeremy Linton's avatar
      net: sky2: Fix shutdown crash · bdc5c63e
      Jeremy Linton authored
      [ Upstream commit 06ba3b21 ]
      
      The sky2 frequently crashes during machine shutdown with:
      
      sky2_get_stats+0x60/0x3d8 [sky2]
      dev_get_stats+0x68/0xd8
      rtnl_fill_stats+0x54/0x140
      rtnl_fill_ifinfo+0x46c/0xc68
      rtmsg_ifinfo_build_skb+0x7c/0xf0
      rtmsg_ifinfo.part.22+0x3c/0x70
      rtmsg_ifinfo+0x50/0x5c
      netdev_state_change+0x4c/0x58
      linkwatch_do_dev+0x50/0x88
      __linkwatch_run_queue+0x104/0x1a4
      linkwatch_event+0x30/0x3c
      process_one_work+0x140/0x3e0
      worker_thread+0x60/0x44c
      kthread+0xdc/0xf0
      ret_from_fork+0x10/0x50
      
      This is caused by the sky2 being called after it has been shutdown.
      A previous thread about this can be found here:
      
      https://lkml.org/lkml/2016/4/12/410
      
      An alternative fix is to assure that IFF_UP gets cleared by
      calling dev_close() during shutdown. This is similar to what the
      bnx2/tg3/xgene and maybe others are doing to assure that the driver
      isn't being called following _shutdown().
      Signed-off-by: default avatarJeremy Linton <jeremy.linton@arm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bdc5c63e
    • Paolo Abeni's avatar
      ip6_tunnel: disable caching when the traffic class is inherited · a75684ab
      Paolo Abeni authored
      [ Upstream commit b5c2d495 ]
      
      If an ip6 tunnel is configured to inherit the traffic class from
      the inner header, the dst_cache must be disabled or it will foul
      the policy routing.
      
      The issue is apprently there since at leat Linux-2.6.12-rc2.
      Reported-by: default avatarLiam McBirnie <liam.mcbirnie@boeing.com>
      Cc: Liam McBirnie <liam.mcbirnie@boeing.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a75684ab
    • WANG Cong's avatar
      net: check dead netns for peernet2id_alloc() · 1b079d5b
      WANG Cong authored
      [ Upstream commit cfc44a4d ]
      
      Andrei reports we still allocate netns ID from idr after we destroy
      it in cleanup_net().
      
      cleanup_net():
        ...
        idr_destroy(&net->netns_ids);
        ...
        list_for_each_entry_reverse(ops, &pernet_list, list)
          ops_exit_list(ops, &net_exit_list);
            -> rollback_registered_many()
              -> rtmsg_ifinfo_build_skb()
               -> rtnl_fill_ifinfo()
                 -> peernet2id_alloc()
      
      After that point we should not even access net->netns_ids, we
      should check the death of the current netns as early as we can in
      peernet2id_alloc().
      
      For net-next we can consider to avoid sending rtmsg totally,
      it is a good optimization for netns teardown path.
      
      Fixes: 0c7aecd4 ("netns: add rtnl cmd to add and get peer netns ids")
      Reported-by: default avatarAndrei Vagin <avagin@gmail.com>
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: default avatarAndrei Vagin <avagin@openvz.org>
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1b079d5b
    • Florian Fainelli's avatar
      net: dsa: b53: Fix VLAN usage and how we treat CPU port · 65dfc8b4
      Florian Fainelli authored
      [ Upstream commit e47112d9 ]
      
      We currently have a fundamental problem in how we treat the CPU port and
      its VLAN membership. As soon as a second VLAN is configured to be
      untagged, the CPU automatically becomes untagged for that VLAN as well,
      and yet, we don't gracefully make sure that the CPU becomes tagged in
      the other VLANs it could be a member of. This results in only one VLAN
      being effectively usable from the CPU's perspective.
      
      Instead of having some pretty complex logic which tries to maintain the
      CPU port's default VLAN and its untagged properties, just do something
      very simple which consists in neither altering the CPU port's PVID
      settings, nor its untagged settings:
      
      - whenever a VLAN is added, the CPU is automatically a member of this
        VLAN group, as a tagged member
      - PVID settings for downstream ports do not alter the CPU port's PVID
        since it now is part of all VLANs in the system
      
      This means that a typical example where e.g: LAN ports are in VLAN1, and
      WAN port is in VLAN2, now require having two VLAN interfaces for the
      host to properly terminate and send traffic from/to.
      
      Fixes: Fixes: a2482d2c ("net: dsa: b53: Plug in VLAN support")
      Reported-by: default avatarHartmut Knaack <knaack.h@gmx.de>
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      65dfc8b4
    • Eric Dumazet's avatar
      virtio-net: add a missing synchronize_net() · f959eb50
      Eric Dumazet authored
      [ Upstream commit 963abe5c ]
      
      It seems many drivers do not respect napi_hash_del() contract.
      
      When napi_hash_del() is used before netif_napi_del(), an RCU grace
      period is needed before freeing NAPI object.
      
      Fixes: 91815639 ("virtio-net: rx busy polling support")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f959eb50
    • Eric Dumazet's avatar
      gro_cells: mark napi struct as not busy poll candidates · 8070f33b
      Eric Dumazet authored
      [ Upstream commit e88a2766 ]
      
      Rolf Neugebauer reported very long delays at netns dismantle.
      
      Eric W. Biederman was kind enough to look at this problem
      and noticed synchronize_net() occurring from netif_napi_del() that was
      added in linux-4.5
      
      Busy polling makes no sense for tunnels NAPI.
      If busy poll is used for sessions over tunnels, the poller will need to
      poll the physical device queue anyway.
      
      netif_tx_napi_add() could be used here, but function name is misleading,
      and renaming it is not stable material, so set NAPI_STATE_NO_BUSY_POLL
      bit directly.
      
      This will avoid inserting gro_cells napi structures in napi_hash[]
      and avoid the problematic synchronize_net() (per possible cpu) that
      Rolf reported.
      
      Fixes: 93d05d4a ("net: provide generic busy polling to all NAPI drivers")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarRolf Neugebauer <rolf.neugebauer@docker.com>
      Reported-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Tested-by: default avatarRolf Neugebauer <rolf.neugebauer@docker.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8070f33b