1. 24 Nov, 2017 6 commits
    • Xin Long's avatar
      sctp: do not peel off an assoc from one netns to another one · 2a0e6090
      Xin Long authored
      
      [ Upstream commit df80cd9b ]
      
      Now when peeling off an association to the sock in another netns, all
      transports in this assoc are not to be rehashed and keep use the old
      key in hashtable.
      
      As a transport uses sk->net as the hash key to insert into hashtable,
      it would miss removing these transports from hashtable due to the new
      netns when closing the sock and all transports are being freeed, then
      later an use-after-free issue could be caused when looking up an asoc
      and dereferencing those transports.
      
      This is a very old issue since very beginning, ChunYu found it with
      syzkaller fuzz testing with this series:
      
        socket$inet6_sctp()
        bind$inet6()
        sendto$inet6()
        unshare(0x40000000)
        getsockopt$inet_sctp6_SCTP_GET_ASSOC_ID_LIST()
        getsockopt$inet_sctp6_SCTP_SOCKOPT_PEELOFF()
      
      This patch is to block this call when peeling one assoc off from one
      netns to another one, so that the netns of all transport would not
      go out-sync with the key in hashtable.
      
      Note that this patch didn't fix it by rehashing transports, as it's
      difficult to handle the situation when the tuple is already in use
      in the new netns. Besides, no one would like to peel off one assoc
      to another netns, considering ipaddrs, ifaces, etc. are usually
      different.
      Reported-by: default avatarChunYu Wang <chunwang@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2a0e6090
    • Jason A. Donenfeld's avatar
      af_netlink: ensure that NLMSG_DONE never fails in dumps · 4cfc0b41
      Jason A. Donenfeld authored
      
      [ Upstream commit 0642840b ]
      
      The way people generally use netlink_dump is that they fill in the skb
      as much as possible, breaking when nla_put returns an error. Then, they
      get called again and start filling out the next skb, and again, and so
      forth. The mechanism at work here is the ability for the iterative
      dumping function to detect when the skb is filled up and not fill it
      past the brim, waiting for a fresh skb for the rest of the data.
      
      However, if the attributes are small and nicely packed, it is possible
      that a dump callback function successfully fills in attributes until the
      skb is of size 4080 (libmnl's default page-sized receive buffer size).
      The dump function completes, satisfied, and then, if it happens to be
      that this is actually the last skb, and no further ones are to be sent,
      then netlink_dump will add on the NLMSG_DONE part:
      
        nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
      
      It is very important that netlink_dump does this, of course. However, in
      this example, that call to nlmsg_put_answer will fail, because the
      previous filling by the dump function did not leave it enough room. And
      how could it possibly have done so? All of the nla_put variety of
      functions simply check to see if the skb has enough tailroom,
      independent of the context it is in.
      
      In order to keep the important assumptions of all netlink dump users, it
      is therefore important to give them an skb that has this end part of the
      tail already reserved, so that the call to nlmsg_put_answer does not
      fail. Otherwise, library authors are forced to find some bizarre sized
      receive buffer that has a large modulo relative to the common sizes of
      messages received, which is ugly and buggy.
      
      This patch thus saves the NLMSG_DONE for an additional message, for the
      case that things are dangerously close to the brim. This requires
      keeping track of the errno from ->dump() across calls.
      Signed-off-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4cfc0b41
    • Cong Wang's avatar
      vlan: fix a use-after-free in vlan_device_event() · ef206ea7
      Cong Wang authored
      
      [ Upstream commit 052d41c0 ]
      
      After refcnt reaches zero, vlan_vid_del() could free
      dev->vlan_info via RCU:
      
      	RCU_INIT_POINTER(dev->vlan_info, NULL);
      	call_rcu(&vlan_info->rcu, vlan_info_rcu_free);
      
      However, the pointer 'grp' still points to that memory
      since it is set before vlan_vid_del():
      
              vlan_info = rtnl_dereference(dev->vlan_info);
              if (!vlan_info)
                      goto out;
              grp = &vlan_info->grp;
      
      Depends on when that RCU callback is scheduled, we could
      trigger a use-after-free in vlan_group_for_each_dev()
      right following this vlan_vid_del().
      
      Fix it by moving vlan_vid_del() before setting grp. This
      is also symmetric to the vlan_vid_add() we call in
      vlan_device_event().
      Reported-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Fixes: efc73f4b ("net: Fix memory leak - vlan_info struct")
      Cc: Alexander Duyck <alexander.duyck@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Girish Moodalbail <girish.moodalbail@oracle.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Reviewed-by: default avatarGirish Moodalbail <girish.moodalbail@oracle.com>
      Tested-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ef206ea7
    • Hangbin Liu's avatar
      bonding: discard lowest hash bit for 802.3ad layer3+4 · 3bb6245e
      Hangbin Liu authored
      
      [ Upstream commit b5f86218 ]
      
      After commit 07f4c900 ("tcp/dccp: try to not exhaust ip_local_port_range
      in connect()"), we will try to use even ports for connect(). Then if an
      application (seen clearly with iperf) opens multiple streams to the same
      destination IP and port, each stream will be given an even source port.
      
      So the bonding driver's simple xmit_hash_policy based on layer3+4 addressing
      will always hash all these streams to the same interface. And the total
      throughput will limited to a single slave.
      
      Change the tcp code will impact the whole tcp behavior, only for bonding
      usage. Paolo Abeni suggested fix this by changing the bonding code only,
      which should be more reasonable, and less impact.
      
      Fix this by discarding the lowest hash bit because it contains little entropy.
      After the fix we can re-balance between slaves.
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3bb6245e
    • Ye Yin's avatar
      netfilter/ipvs: clear ipvs_property flag when SKB net namespace changed · 001e9cbe
      Ye Yin authored
      
      [ Upstream commit 2b5ec1a5 ]
      
      When run ipvs in two different network namespace at the same host, and one
      ipvs transport network traffic to the other network namespace ipvs.
      'ipvs_property' flag will make the second ipvs take no effect. So we should
      clear 'ipvs_property' when SKB network namespace changed.
      
      Fixes: 621e84d6 ("dev: introduce skb_scrub_packet()")
      Signed-off-by: default avatarYe Yin <hustcat@gmail.com>
      Signed-off-by: default avatarWei Zhou <chouryzhou@gmail.com>
      Signed-off-by: default avatarJulian Anastasov <ja@ssi.bg>
      Signed-off-by: default avatarSimon Horman <horms@verge.net.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      001e9cbe
    • Eric Dumazet's avatar
      tcp: do not mangle skb->cb[] in tcp_make_synack() · 0c1282c7
      Eric Dumazet authored
      
      [ Upstream commit 3b117750 ]
      
      Christoph Paasch sent a patch to address the following issue :
      
      tcp_make_synack() is leaving some TCP private info in skb->cb[],
      then send the packet by other means than tcp_transmit_skb()
      
      tcp_transmit_skb() makes sure to clear skb->cb[] to not confuse
      IPv4/IPV6 stacks, but we have no such cleanup for SYNACK.
      
      tcp_make_synack() should not use tcp_init_nondata_skb() :
      
      tcp_init_nondata_skb() really should be limited to skbs put in write/rtx
      queues (the ones that are only sent via tcp_transmit_skb())
      
      This patch fixes the issue and should even save few cpu cycles ;)
      
      Fixes: 971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarChristoph Paasch <cpaasch@apple.com>
      Reviewed-by: default avatarChristoph Paasch <cpaasch@apple.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0c1282c7
  2. 21 Nov, 2017 34 commits