1. 10 Dec, 2016 18 commits
    • Herbert Xu's avatar
      netlink: Do not schedule work from sk_destruct · 25d9b4bb
      Herbert Xu authored
      [ Upstream commit ed5d7788 ]
      
      It is wrong to schedule a work from sk_destruct using the socket
      as the memory reserve because the socket will be freed immediately
      after the return from sk_destruct.
      
      Instead we should do the deferral prior to sk_free.
      
      This patch does just that.
      
      Fixes: 707693c8 ("netlink: Call cb->done from a worker thread")
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Tested-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      25d9b4bb
    • Herbert Xu's avatar
      netlink: Call cb->done from a worker thread · f5dad347
      Herbert Xu authored
      [ Upstream commit 707693c8 ]
      
      The cb->done interface expects to be called in process context.
      This was broken by the netlink RCU conversion.  This patch fixes
      it by adding a worker struct to make the cb->done call where
      necessary.
      
      Fixes: 21e4902a ("netlink: Lockless lookup with RCU grace...")
      Reported-by: default avatarSubash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f5dad347
    • Amir Vadai's avatar
      net/sched: pedit: make sure that offset is valid · 360d6a23
      Amir Vadai authored
      [ Upstream commit 95c2027b ]
      
      Add a validation function to make sure offset is valid:
      1. Not below skb head (could happen when offset is negative).
      2. Validate both 'offset' and 'at'.
      Signed-off-by: default avatarAmir Vadai <amir@vadai.me>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      360d6a23
    • Nikita Yushchenko's avatar
      net: dsa: fix unbalanced dsa_switch_tree reference counting · aa239369
      Nikita Yushchenko authored
      [ Upstream commit 7a99cd6e ]
      
      _dsa_register_switch() gets a dsa_switch_tree object either via
      dsa_get_dst() or via dsa_add_dst(). Former path does not increase kref
      in returned object (resulting into caller not owning a reference),
      while later path does create a new object (resulting into caller owning
      a reference).
      
      The rest of _dsa_register_switch() assumes that it owns a reference, and
      calls dsa_put_dst().
      
      This causes a memory breakage if first switch in the tree initialized
      successfully, but second failed to initialize. In particular, freed
      dsa_swith_tree object is left referenced by switch that was initialized,
      and later access to sysfs attributes of that switch cause OOPS.
      
      To fix, need to add kref_get() call to dsa_get_dst().
      
      Fixes: 83c0afae ("net: dsa: Add new binding implementation")
      Signed-off-by: default avatarNikita Yushchenko <nikita.yoush@cogentembedded.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aa239369
    • Daniel Borkmann's avatar
      net, sched: respect rcu grace period on cls destruction · 9a747927
      Daniel Borkmann authored
      [ Upstream commit d9363774 ]
      
      Roi reported a crash in flower where tp->root was NULL in ->classify()
      callbacks. Reason is that in ->destroy() tp->root is set to NULL via
      RCU_INIT_POINTER(). It's problematic for some of the classifiers, because
      this doesn't respect RCU grace period for them, and as a result, still
      outstanding readers from tc_classify() will try to blindly dereference
      a NULL tp->root.
      
      The tp->root object is strictly private to the classifier implementation
      and holds internal data the core such as tc_ctl_tfilter() doesn't know
      about. Within some classifiers, such as cls_bpf, cls_basic, etc, tp->root
      is only checked for NULL in ->get() callback, but nowhere else. This is
      misleading and seemed to be copied from old classifier code that was not
      cleaned up properly. For example, d3fa76ee ("[NET_SCHED]: cls_basic:
      fix NULL pointer dereference") moved tp->root initialization into ->init()
      routine, where before it was part of ->change(), so ->get() had to deal
      with tp->root being NULL back then, so that was indeed a valid case, after
      d3fa76ee, not really anymore. We used to set tp->root to NULL long
      ago in ->destroy(), see 47a1a1d4 ("pkt_sched: remove unnecessary xchg()
      in packet classifiers"); but the NULLifying was reintroduced with the
      RCUification, but it's not correct for every classifier implementation.
      
      In the cases that are fixed here with one exception of cls_cgroup, tp->root
      object is allocated and initialized inside ->init() callback, which is always
      performed at a point in time after we allocate a new tp, which means tp and
      thus tp->root was not globally visible in the tp chain yet (see tc_ctl_tfilter()).
      Also, on destruction tp->root is strictly kfree_rcu()'ed in ->destroy()
      handler, same for the tp which is kfree_rcu()'ed right when we return
      from ->destroy() in tcf_destroy(). This means, the head object's lifetime
      for such classifiers is always tied to the tp lifetime. The RCU callback
      invocation for the two kfree_rcu() could be out of order, but that's fine
      since both are independent.
      
      Dropping the RCU_INIT_POINTER(tp->root, NULL) for these classifiers here
      means that 1) we don't need a useless NULL check in fast-path and, 2) that
      outstanding readers of that tp in tc_classify() can still execute under
      respect with RCU grace period as it is actually expected.
      
      Things that haven't been touched here: cls_fw and cls_route. They each
      handle tp->root being NULL in ->classify() path for historic reasons, so
      their ->destroy() implementation can stay as is. If someone actually
      cares, they could get cleaned up at some point to avoid the test in fast
      path. cls_u32 doesn't set tp->root to NULL. For cls_rsvp, I just added a
      !head should anyone actually be using/testing it, so it at least aligns with
      cls_fw and cls_route. For cls_flower we additionally need to defer rhashtable
      destruction (to a sleepable context) after RCU grace period as concurrent
      readers might still access it. (Note that in this case we need to hold module
      reference to keep work callback address intact, since we only wait on module
      unload for all call_rcu()s to finish.)
      
      This fixes one race to bring RCU grace period guarantees back. Next step
      as worked on by Cong however is to fix 1e052be6 ("net_sched: destroy
      proto tp when all filters are gone") to get the order of unlinking the tp
      in tc_ctl_tfilter() for the RTM_DELTFILTER case right by moving
      RCU_INIT_POINTER() before tcf_destroy() and let the notification for
      removal be done through the prior ->delete() callback. Both are independant
      issues. Once we have that right, we can then clean tp->root up for a number
      of classifiers by not making them RCU pointers, which requires a new callback
      (->uninit) that is triggered from tp's RCU callback, where we just kfree()
      tp->root from there.
      
      Fixes: 1f947bf1 ("net: sched: rcu'ify cls_bpf")
      Fixes: 9888faef ("net: sched: cls_basic use RCU")
      Fixes: 70da9f0b ("net: sched: cls_flow use RCU")
      Fixes: 77b9900e ("tc: introduce Flower classifier")
      Fixes: bf3994d2 ("net/sched: introduce Match-all classifier")
      Fixes: 952313bd ("net: sched: cls_cgroup use RCU")
      Reported-by: default avatarRoi Dayan <roid@mellanox.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Roi Dayan <roid@mellanox.com>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Acked-by: default avatarJohn Fastabend <john.r.fastabend@intel.com>
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9a747927
    • Florian Fainelli's avatar
      net: dsa: bcm_sf2: Ensure we re-negotiate EEE during after link change · a9437ebc
      Florian Fainelli authored
      [ Upstream commit 76da8706 ]
      
      In case the link change and EEE is enabled or disabled, always try to
      re-negotiate this with the link partner.
      
      Fixes: 450b05c1 ("net: dsa: bcm_sf2: add support for controlling EEE")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a9437ebc
    • Eric Dumazet's avatar
      udplite: call proper backlog handlers · ddf05343
      Eric Dumazet authored
      [ Upstream commit 30c7be26 ]
      
      In commits 93821778 ("udp: Fix rcv socket locking") and
      f7ad74fe ("net/ipv6/udp: UDP encapsulation: break backlog_rcv into
      __udpv6_queue_rcv_skb") UDP backlog handlers were renamed, but UDPlite
      was forgotten.
      
      This leads to crashes if UDPlite header is pulled twice, which happens
      starting from commit e6afc8ac ("udp: remove headers from UDP packets
      before queueing")
      
      Bug found by syzkaller team, thanks a lot guys !
      
      Note that backlog use in UDP/UDPlite is scheduled to be removed starting
      from linux-4.10, so this patch is only needed up to linux-4.9
      
      Fixes: 93821778 ("udp: Fix rcv socket locking")
      Fixes: f7ad74fe ("net/ipv6/udp: UDP encapsulation: break backlog_rcv into __udpv6_queue_rcv_skb")
      Fixes: e6afc8ac ("udp: remove headers from UDP packets before queueing")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ddf05343
    • Paolo Abeni's avatar
      ipv6: bump genid when the IFA_F_TENTATIVE flag is clear · 7b0aa75b
      Paolo Abeni authored
      [ Upstream commit 764d3be6 ]
      
      When an ipv6 address has the tentative flag set, it can't be
      used as source for egress traffic, while the associated route,
      if any, can be looked up and even stored into some dst_cache.
      
      In the latter scenario, the source ipv6 address selected and
      stored in the cache is most probably wrong (e.g. with
      link-local scope) and the entity using the dst_cache will
      experience lack of ipv6 connectivity until said cache is
      cleared or invalidated.
      
      Overall this may cause lack of connectivity over most IPv6 tunnels
      (comprising geneve and vxlan), if the first egress packet reaches
      the tunnel before the DaD is completed for the used ipv6
      address.
      
      This patch bumps a new genid after that the IFA_F_TENTATIVE flag
      is cleared, so that dst_cache will be invalidated on
      next lookup and ipv6 connectivity restored.
      
      Fixes: 0c1d70af ("net: use dst_cache for vxlan device")
      Fixes: 468dfffc ("geneve: add dst caching support")
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7b0aa75b
    • Zhang Shengju's avatar
      rtnl: fix the loop index update error in rtnl_dump_ifinfo() · 58c8cc33
      Zhang Shengju authored
      [ Upstream commit 3f0ae05d ]
      
      If the link is filtered out, loop index should also be updated. If not,
      loop index will not be correct.
      
      Fixes: dc599f76 ("net: Add support for filtering link dump by master device and kind")
      Signed-off-by: default avatarZhang Shengju <zhangshengju@cmss.chinamobile.com>
      Acked-by: default avatarDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      58c8cc33
    • Guillaume Nault's avatar
      l2tp: fix racy SOCK_ZAPPED flag check in l2tp_ip{,6}_bind() · 84df5674
      Guillaume Nault authored
      [ Upstream commit 32c23116 ]
      
      Lock socket before checking the SOCK_ZAPPED flag in l2tp_ip6_bind().
      Without lock, a concurrent call could modify the socket flags between
      the sock_flag(sk, SOCK_ZAPPED) test and the lock_sock() call. This way,
      a socket could be inserted twice in l2tp_ip6_bind_table. Releasing it
      would then leave a stale pointer there, generating use-after-free
      errors when walking through the list or modifying adjacent entries.
      
      BUG: KASAN: use-after-free in l2tp_ip6_close+0x22e/0x290 at addr ffff8800081b0ed8
      Write of size 8 by task syz-executor/10987
      CPU: 0 PID: 10987 Comm: syz-executor Not tainted 4.8.0+ #39
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
       ffff880031d97838 ffffffff829f835b ffff88001b5a1640 ffff8800081b0ec0
       ffff8800081b15a0 ffff8800081b6d20 ffff880031d97860 ffffffff8174d3cc
       ffff880031d978f0 ffff8800081b0e80 ffff88001b5a1640 ffff880031d978e0
      Call Trace:
       [<ffffffff829f835b>] dump_stack+0xb3/0x118 lib/dump_stack.c:15
       [<ffffffff8174d3cc>] kasan_object_err+0x1c/0x70 mm/kasan/report.c:156
       [<     inline     >] print_address_description mm/kasan/report.c:194
       [<ffffffff8174d666>] kasan_report_error+0x1f6/0x4d0 mm/kasan/report.c:283
       [<     inline     >] kasan_report mm/kasan/report.c:303
       [<ffffffff8174db7e>] __asan_report_store8_noabort+0x3e/0x40 mm/kasan/report.c:329
       [<     inline     >] __write_once_size ./include/linux/compiler.h:249
       [<     inline     >] __hlist_del ./include/linux/list.h:622
       [<     inline     >] hlist_del_init ./include/linux/list.h:637
       [<ffffffff8579047e>] l2tp_ip6_close+0x22e/0x290 net/l2tp/l2tp_ip6.c:239
       [<ffffffff850b2dfd>] inet_release+0xed/0x1c0 net/ipv4/af_inet.c:415
       [<ffffffff851dc5a0>] inet6_release+0x50/0x70 net/ipv6/af_inet6.c:422
       [<ffffffff84c4581d>] sock_release+0x8d/0x1d0 net/socket.c:570
       [<ffffffff84c45976>] sock_close+0x16/0x20 net/socket.c:1017
       [<ffffffff817a108c>] __fput+0x28c/0x780 fs/file_table.c:208
       [<ffffffff817a1605>] ____fput+0x15/0x20 fs/file_table.c:244
       [<ffffffff813774f9>] task_work_run+0xf9/0x170
       [<ffffffff81324aae>] do_exit+0x85e/0x2a00
       [<ffffffff81326dc8>] do_group_exit+0x108/0x330
       [<ffffffff81348cf7>] get_signal+0x617/0x17a0 kernel/signal.c:2307
       [<ffffffff811b49af>] do_signal+0x7f/0x18f0
       [<ffffffff810039bf>] exit_to_usermode_loop+0xbf/0x150 arch/x86/entry/common.c:156
       [<     inline     >] prepare_exit_to_usermode arch/x86/entry/common.c:190
       [<ffffffff81006060>] syscall_return_slowpath+0x1a0/0x1e0 arch/x86/entry/common.c:259
       [<ffffffff85e4d726>] entry_SYSCALL_64_fastpath+0xc4/0xc6
      Object at ffff8800081b0ec0, in cache L2TP/IPv6 size: 1448
      Allocated:
      PID = 10987
       [ 1116.897025] [<ffffffff811ddcb6>] save_stack_trace+0x16/0x20
       [ 1116.897025] [<ffffffff8174c736>] save_stack+0x46/0xd0
       [ 1116.897025] [<ffffffff8174c9ad>] kasan_kmalloc+0xad/0xe0
       [ 1116.897025] [<ffffffff8174cee2>] kasan_slab_alloc+0x12/0x20
       [ 1116.897025] [<     inline     >] slab_post_alloc_hook mm/slab.h:417
       [ 1116.897025] [<     inline     >] slab_alloc_node mm/slub.c:2708
       [ 1116.897025] [<     inline     >] slab_alloc mm/slub.c:2716
       [ 1116.897025] [<ffffffff817476a8>] kmem_cache_alloc+0xc8/0x2b0 mm/slub.c:2721
       [ 1116.897025] [<ffffffff84c4f6a9>] sk_prot_alloc+0x69/0x2b0 net/core/sock.c:1326
       [ 1116.897025] [<ffffffff84c58ac8>] sk_alloc+0x38/0xae0 net/core/sock.c:1388
       [ 1116.897025] [<ffffffff851ddf67>] inet6_create+0x2d7/0x1000 net/ipv6/af_inet6.c:182
       [ 1116.897025] [<ffffffff84c4af7b>] __sock_create+0x37b/0x640 net/socket.c:1153
       [ 1116.897025] [<     inline     >] sock_create net/socket.c:1193
       [ 1116.897025] [<     inline     >] SYSC_socket net/socket.c:1223
       [ 1116.897025] [<ffffffff84c4b46f>] SyS_socket+0xef/0x1b0 net/socket.c:1203
       [ 1116.897025] [<ffffffff85e4d685>] entry_SYSCALL_64_fastpath+0x23/0xc6
      Freed:
      PID = 10987
       [ 1116.897025] [<ffffffff811ddcb6>] save_stack_trace+0x16/0x20
       [ 1116.897025] [<ffffffff8174c736>] save_stack+0x46/0xd0
       [ 1116.897025] [<ffffffff8174cf61>] kasan_slab_free+0x71/0xb0
       [ 1116.897025] [<     inline     >] slab_free_hook mm/slub.c:1352
       [ 1116.897025] [<     inline     >] slab_free_freelist_hook mm/slub.c:1374
       [ 1116.897025] [<     inline     >] slab_free mm/slub.c:2951
       [ 1116.897025] [<ffffffff81748b28>] kmem_cache_free+0xc8/0x330 mm/slub.c:2973
       [ 1116.897025] [<     inline     >] sk_prot_free net/core/sock.c:1369
       [ 1116.897025] [<ffffffff84c541eb>] __sk_destruct+0x32b/0x4f0 net/core/sock.c:1444
       [ 1116.897025] [<ffffffff84c5aca4>] sk_destruct+0x44/0x80 net/core/sock.c:1452
       [ 1116.897025] [<ffffffff84c5ad33>] __sk_free+0x53/0x220 net/core/sock.c:1460
       [ 1116.897025] [<ffffffff84c5af23>] sk_free+0x23/0x30 net/core/sock.c:1471
       [ 1116.897025] [<ffffffff84c5cb6c>] sk_common_release+0x28c/0x3e0 ./include/net/sock.h:1589
       [ 1116.897025] [<ffffffff8579044e>] l2tp_ip6_close+0x1fe/0x290 net/l2tp/l2tp_ip6.c:243
       [ 1116.897025] [<ffffffff850b2dfd>] inet_release+0xed/0x1c0 net/ipv4/af_inet.c:415
       [ 1116.897025] [<ffffffff851dc5a0>] inet6_release+0x50/0x70 net/ipv6/af_inet6.c:422
       [ 1116.897025] [<ffffffff84c4581d>] sock_release+0x8d/0x1d0 net/socket.c:570
       [ 1116.897025] [<ffffffff84c45976>] sock_close+0x16/0x20 net/socket.c:1017
       [ 1116.897025] [<ffffffff817a108c>] __fput+0x28c/0x780 fs/file_table.c:208
       [ 1116.897025] [<ffffffff817a1605>] ____fput+0x15/0x20 fs/file_table.c:244
       [ 1116.897025] [<ffffffff813774f9>] task_work_run+0xf9/0x170
       [ 1116.897025] [<ffffffff81324aae>] do_exit+0x85e/0x2a00
       [ 1116.897025] [<ffffffff81326dc8>] do_group_exit+0x108/0x330
       [ 1116.897025] [<ffffffff81348cf7>] get_signal+0x617/0x17a0 kernel/signal.c:2307
       [ 1116.897025] [<ffffffff811b49af>] do_signal+0x7f/0x18f0
       [ 1116.897025] [<ffffffff810039bf>] exit_to_usermode_loop+0xbf/0x150 arch/x86/entry/common.c:156
       [ 1116.897025] [<     inline     >] prepare_exit_to_usermode arch/x86/entry/common.c:190
       [ 1116.897025] [<ffffffff81006060>] syscall_return_slowpath+0x1a0/0x1e0 arch/x86/entry/common.c:259
       [ 1116.897025] [<ffffffff85e4d726>] entry_SYSCALL_64_fastpath+0xc4/0xc6
      Memory state around the buggy address:
       ffff8800081b0d80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff8800081b0e00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      >ffff8800081b0e80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
                                                          ^
       ffff8800081b0f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8800081b0f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      ==================================================================
      
      The same issue exists with l2tp_ip_bind() and l2tp_ip_bind_table.
      
      Fixes: c51ce497 ("l2tp: fix oops in L2TP IP sockets for connect() AF_UNSPEC case")
      Reported-by: default avatarBaozeng Ding <sploving1@gmail.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Tested-by: default avatarBaozeng Ding <sploving1@gmail.com>
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      84df5674
    • Sabrina Dubroca's avatar
      rtnetlink: fix FDB size computation · 7f8b251a
      Sabrina Dubroca authored
      [ Upstream commit f82ef3e1 ]
      
      Add missing NDA_VLAN attribute's size.
      
      Fixes: 1e53d5bb ("net: Pass VLAN ID to rtnl_fdb_notify.")
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7f8b251a
    • WANG Cong's avatar
      af_unix: conditionally use freezable blocking calls in read · c39caa8f
      WANG Cong authored
      [ Upstream commit 06a77b07 ]
      
      Commit 2b15af6f ("af_unix: use freezable blocking calls in read")
      converts schedule_timeout() to its freezable version, it was probably
      correct at that time, but later, commit 2b514574
      ("net: af_unix: implement splice for stream af_unix sockets") breaks
      the strong requirement for a freezable sleep, according to
      commit 0f9548ca:
      
          We shouldn't try_to_freeze if locks are held.  Holding a lock can cause a
          deadlock if the lock is later acquired in the suspend or hibernate path
          (e.g.  by dpm).  Holding a lock can also cause a deadlock in the case of
          cgroup_freezer if a lock is held inside a frozen cgroup that is later
          acquired by a process outside that group.
      
      The pipe_lock is still held at that point.
      
      So use freezable version only for the recvmsg call path, avoid impact for
      Android.
      
      Fixes: 2b514574 ("net: af_unix: implement splice for stream af_unix sockets")
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Colin Cross <ccross@android.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c39caa8f
    • Jeremy Linton's avatar
      net: sky2: Fix shutdown crash · bdc5c63e
      Jeremy Linton authored
      [ Upstream commit 06ba3b21 ]
      
      The sky2 frequently crashes during machine shutdown with:
      
      sky2_get_stats+0x60/0x3d8 [sky2]
      dev_get_stats+0x68/0xd8
      rtnl_fill_stats+0x54/0x140
      rtnl_fill_ifinfo+0x46c/0xc68
      rtmsg_ifinfo_build_skb+0x7c/0xf0
      rtmsg_ifinfo.part.22+0x3c/0x70
      rtmsg_ifinfo+0x50/0x5c
      netdev_state_change+0x4c/0x58
      linkwatch_do_dev+0x50/0x88
      __linkwatch_run_queue+0x104/0x1a4
      linkwatch_event+0x30/0x3c
      process_one_work+0x140/0x3e0
      worker_thread+0x60/0x44c
      kthread+0xdc/0xf0
      ret_from_fork+0x10/0x50
      
      This is caused by the sky2 being called after it has been shutdown.
      A previous thread about this can be found here:
      
      https://lkml.org/lkml/2016/4/12/410
      
      An alternative fix is to assure that IFF_UP gets cleared by
      calling dev_close() during shutdown. This is similar to what the
      bnx2/tg3/xgene and maybe others are doing to assure that the driver
      isn't being called following _shutdown().
      Signed-off-by: default avatarJeremy Linton <jeremy.linton@arm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bdc5c63e
    • Paolo Abeni's avatar
      ip6_tunnel: disable caching when the traffic class is inherited · a75684ab
      Paolo Abeni authored
      [ Upstream commit b5c2d495 ]
      
      If an ip6 tunnel is configured to inherit the traffic class from
      the inner header, the dst_cache must be disabled or it will foul
      the policy routing.
      
      The issue is apprently there since at leat Linux-2.6.12-rc2.
      Reported-by: default avatarLiam McBirnie <liam.mcbirnie@boeing.com>
      Cc: Liam McBirnie <liam.mcbirnie@boeing.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a75684ab
    • WANG Cong's avatar
      net: check dead netns for peernet2id_alloc() · 1b079d5b
      WANG Cong authored
      [ Upstream commit cfc44a4d ]
      
      Andrei reports we still allocate netns ID from idr after we destroy
      it in cleanup_net().
      
      cleanup_net():
        ...
        idr_destroy(&net->netns_ids);
        ...
        list_for_each_entry_reverse(ops, &pernet_list, list)
          ops_exit_list(ops, &net_exit_list);
            -> rollback_registered_many()
              -> rtmsg_ifinfo_build_skb()
               -> rtnl_fill_ifinfo()
                 -> peernet2id_alloc()
      
      After that point we should not even access net->netns_ids, we
      should check the death of the current netns as early as we can in
      peernet2id_alloc().
      
      For net-next we can consider to avoid sending rtmsg totally,
      it is a good optimization for netns teardown path.
      
      Fixes: 0c7aecd4 ("netns: add rtnl cmd to add and get peer netns ids")
      Reported-by: default avatarAndrei Vagin <avagin@gmail.com>
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: default avatarAndrei Vagin <avagin@openvz.org>
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1b079d5b
    • Florian Fainelli's avatar
      net: dsa: b53: Fix VLAN usage and how we treat CPU port · 65dfc8b4
      Florian Fainelli authored
      [ Upstream commit e47112d9 ]
      
      We currently have a fundamental problem in how we treat the CPU port and
      its VLAN membership. As soon as a second VLAN is configured to be
      untagged, the CPU automatically becomes untagged for that VLAN as well,
      and yet, we don't gracefully make sure that the CPU becomes tagged in
      the other VLANs it could be a member of. This results in only one VLAN
      being effectively usable from the CPU's perspective.
      
      Instead of having some pretty complex logic which tries to maintain the
      CPU port's default VLAN and its untagged properties, just do something
      very simple which consists in neither altering the CPU port's PVID
      settings, nor its untagged settings:
      
      - whenever a VLAN is added, the CPU is automatically a member of this
        VLAN group, as a tagged member
      - PVID settings for downstream ports do not alter the CPU port's PVID
        since it now is part of all VLANs in the system
      
      This means that a typical example where e.g: LAN ports are in VLAN1, and
      WAN port is in VLAN2, now require having two VLAN interfaces for the
      host to properly terminate and send traffic from/to.
      
      Fixes: Fixes: a2482d2c ("net: dsa: b53: Plug in VLAN support")
      Reported-by: default avatarHartmut Knaack <knaack.h@gmx.de>
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      65dfc8b4
    • Eric Dumazet's avatar
      virtio-net: add a missing synchronize_net() · f959eb50
      Eric Dumazet authored
      [ Upstream commit 963abe5c ]
      
      It seems many drivers do not respect napi_hash_del() contract.
      
      When napi_hash_del() is used before netif_napi_del(), an RCU grace
      period is needed before freeing NAPI object.
      
      Fixes: 91815639 ("virtio-net: rx busy polling support")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f959eb50
    • Eric Dumazet's avatar
      gro_cells: mark napi struct as not busy poll candidates · 8070f33b
      Eric Dumazet authored
      [ Upstream commit e88a2766 ]
      
      Rolf Neugebauer reported very long delays at netns dismantle.
      
      Eric W. Biederman was kind enough to look at this problem
      and noticed synchronize_net() occurring from netif_napi_del() that was
      added in linux-4.5
      
      Busy polling makes no sense for tunnels NAPI.
      If busy poll is used for sessions over tunnels, the poller will need to
      poll the physical device queue anyway.
      
      netif_tx_napi_add() could be used here, but function name is misleading,
      and renaming it is not stable material, so set NAPI_STATE_NO_BUSY_POLL
      bit directly.
      
      This will avoid inserting gro_cells napi structures in napi_hash[]
      and avoid the problematic synchronize_net() (per possible cpu) that
      Rolf reported.
      
      Fixes: 93d05d4a ("net: provide generic busy polling to all NAPI drivers")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarRolf Neugebauer <rolf.neugebauer@docker.com>
      Reported-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Tested-by: default avatarRolf Neugebauer <rolf.neugebauer@docker.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8070f33b
  2. 08 Dec, 2016 22 commits