1. 06 Jul, 2015 21 commits
  2. 02 Jul, 2015 2 commits
    • Doug Ledford's avatar
      IB/ipoib: change init sequence ordering · 32eb7ef5
      Doug Ledford authored
      commit be7aa663 upstream.
      
      In preparation for using per device work queues, we need to move the
      start of the neighbor thread task to after ipoib_ib_dev_init and move
      the destruction of the neighbor task to before ipoib_ib_dev_cleanup.
      Otherwise we will end up freeing our workqueue with work possibly
      still on it.
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Cc: Joseph Salisbury <joseph.salisbury@canonical.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      32eb7ef5
    • Doug Ledford's avatar
      IB/ipoib: factor out ah flushing · 7294ca57
      Doug Ledford authored
      commit e135106f upstream.
      
      Create a an ipoib_flush_ah and ipoib_stop_ah routines to use at
      appropriate times to flush out all remaining ah entries before we shut
      the device down.
      
      Because neighbors and mcast entries can each have a reference on any
      given ah, we must make sure to free all of those first before our ah
      will actually have a 0 refcount and be able to be reaped.
      
      This factoring is needed in preparation for having per-device work
      queues.  The original per-device workqueue code resulted in the following
      error message:
      
      <ibdev>: ib_dealloc_pd failed
      
      That error was tracked down to this issue.  With the changes to which
      workqueues were flushed when, there were no flushes of the per device
      workqueue after the last ah's were freed, resulting in an attempt to
      dealloc the pd with outstanding resources still allocated.  This code
      puts the explicit flushes in the needed places to avoid that problem.
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Cc: Joseph Salisbury <joseph.salisbury@canonical.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      7294ca57
  3. 30 Jun, 2015 6 commits
  4. 22 Jun, 2015 1 commit
  5. 18 Jun, 2015 1 commit
  6. 17 Jun, 2015 9 commits
    • Sriharsha Basavapatna's avatar
      be2net: Replace dma/pci_alloc_coherent() calls with dma_zalloc_coherent() · 868cbcb6
      Sriharsha Basavapatna authored
      [ Upstream commit e51000db ]
      
      There are several places in the driver (all in control paths) where
      coherent dma memory is being allocated using either dma_alloc_coherent()
      or the deprecated pci_alloc_consistent(). All these calls should be
      changed to use dma_zalloc_coherent() to avoid uninitialized fields in
      data structures backed by this memory.
      Reported-by: default avatarJoerg Roedel <jroedel@suse.de>
      Tested-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarSriharsha Basavapatna <sriharsha.basavapatna@avagotech.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      868cbcb6
    • Neal Cardwell's avatar
      tcp: fix child sockets to use system default congestion control if not set · 18320675
      Neal Cardwell authored
      [ Upstream commit 9f950415 ]
      
      Linux 3.17 and earlier are explicitly engineered so that if the app
      doesn't specifically request a CC module on a listener before the SYN
      arrives, then the child gets the system default CC when the connection
      is established. See tcp_init_congestion_control() in 3.17 or earlier,
      which says "if no choice made yet assign the current value set as
      default". The change ("net: tcp: assign tcp cong_ops when tcp sk is
      created") altered these semantics, so that children got their parent
      listener's congestion control even if the system default had changed
      after the listener was created.
      
      This commit returns to those original semantics from 3.17 and earlier,
      since they are the original semantics from 2007 in 4d4d3d1e ("[TCP]:
      Congestion control initialization."), and some Linux congestion
      control workflows depend on that.
      
      In summary, if a listener socket specifically sets TCP_CONGESTION to
      "x", or the route locks the CC module to "x", then the child gets
      "x". Otherwise the child gets current system default from
      net.ipv4.tcp_congestion_control. That's the behavior in 3.17 and
      earlier, and this commit restores that.
      
      Fixes: 55d8694f ("net: tcp: assign tcp cong_ops when tcp sk is created")
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Daniel Borkmann <dborkman@redhat.com>
      Cc: Glenn Judd <glenn.judd@morganstanley.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      18320675
    • Nikolay Aleksandrov's avatar
      bridge: disable softirqs around br_fdb_update to avoid lockup · 2ac739e4
      Nikolay Aleksandrov authored
      [ Upstream commit c4c832f8 ]
      
      br_fdb_update() can be called in process context in the following way:
      br_fdb_add() -> __br_fdb_add() -> br_fdb_update() (if NTF_USE flag is set)
      so we need to disable softirqs because there are softirq users of the
      hash_lock. One easy way to reproduce this is to modify the bridge utility
      to set NTF_USE, enable stp and then set maxageing to a low value so
      br_fdb_cleanup() is called frequently and then just add new entries in
      a loop. This happens because br_fdb_cleanup() is called from timer/softirq
      context. The spin locks in br_fdb_update were _bh before commit f8ae737d
      ("[BRIDGE]: forwarding remove unneeded preempt and bh diasables")
      and at the time that commit was correct because br_fdb_update() couldn't be
      called from process context, but that changed after commit:
      292d1398 ("bridge: add NTF_USE support")
      Using local_bh_disable/enable around br_fdb_update() allows us to keep
      using the spin_lock/unlock in br_fdb_update for the fast-path.
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Fixes: 292d1398 ("bridge: add NTF_USE support")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      2ac739e4
    • Shawn Bohrer's avatar
      ipv4/udp: Verify multicast group is ours in upd_v4_early_demux() · e540794d
      Shawn Bohrer authored
      [ Upstream commit 6e540309 ]
      
      421b3885 "udp: ipv4: Add udp early
      demux" introduced a regression that allowed sockets bound to INADDR_ANY
      to receive packets from multicast groups that the socket had not joined.
      For example a socket that had joined 224.168.2.9 could also receive
      packets from 225.168.2.9 despite not having joined that group if
      ip_early_demux is enabled.
      
      Fix this by calling ip_check_mc_rcu() in udp_v4_early_demux() to verify
      that the multicast packet is indeed ours.
      Signed-off-by: default avatarShawn Bohrer <sbohrer@rgmadvisors.com>
      Reported-by: default avatarYurij M. Plotnikov <Yurij.Plotnikov@oktetlabs.ru>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      e540794d
    • Ian Campbell's avatar
      xen: netback: read hotplug script once at start of day. · 94c15d0c
      Ian Campbell authored
      [ Upstream commit 31a41898 ]
      
      When we come to tear things down in netback_remove() and generate the
      uevent it is possible that the xenstore directory has already been
      removed (details below).
      
      In such cases netback_uevent() won't be able to read the hotplug
      script and will write a xenstore error node.
      
      A recent change to the hypervisor exposed this race such that we now
      sometimes lose it (where apparently we didn't ever before).
      
      Instead read the hotplug script configuration during setup and use it
      for the lifetime of the backend device.
      
      The apparently more obvious fix of moving the transition to
      state=Closed in netback_remove() to after the uevent does not work
      because it is possible that we are already in state=Closed (in
      reaction to the guest having disconnected as it shutdown). Being
      already in Closed means the toolstack is at liberty to start tearing
      down the xenstore directories. In principal it might be possible to
      arrange to unregister the device sooner (e.g on transition to Closing)
      such that xenstore would still be there but this state machine is
      fragile and prone to anger...
      
      A modern Xen system only relies on the hotplug uevent for driver
      domains, when the backend is in the same domain as the toolstack it
      will run the necessary setup/teardown directly in the correct sequence
      wrt xenstore changes.
      Signed-off-by: default avatarIan Campbell <ian.campbell@citrix.com>
      Acked-by: default avatarWei Liu <wei.liu2@citrix.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      94c15d0c
    • Eric Dumazet's avatar
      udp: fix behavior of wrong checksums · 5609714e
      Eric Dumazet authored
      [ Upstream commit beb39db5 ]
      
      We have two problems in UDP stack related to bogus checksums :
      
      1) We return -EAGAIN to application even if receive queue is not empty.
         This breaks applications using edge trigger epoll()
      
      2) Under UDP flood, we can loop forever without yielding to other
         processes, potentially hanging the host, especially on non SMP.
      
      This patch is an attempt to make things better.
      
      We might in the future add extra support for rt applications
      wanting to better control time spent doing a recv() in a hostile
      environment. For example we could validate checksums before queuing
      packets in socket receive queue.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      5609714e
    • Eric Dumazet's avatar
      bridge: fix br_multicast_query_expired() bug · 1f4a6996
      Eric Dumazet authored
      [ Upstream commit 71d9f614 ]
      
      br_multicast_query_expired() querier argument is a pointer to
      a struct bridge_mcast_querier :
      
      struct bridge_mcast_querier {
              struct br_ip addr;
              struct net_bridge_port __rcu    *port;
      };
      
      Intent of the code was to clear port field, not the pointer to querier.
      
      Fixes: 2cd41431 ("bridge: memorize and export selected IGMP/MLD querier port")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarThadeu Lima de Souza Cascardo <cascardo@redhat.com>
      Acked-by: default avatarLinus Lüssing <linus.luessing@c0d3.blue>
      Cc: Linus Lüssing <linus.luessing@web.de>
      Cc: Steinar H. Gunderson <sesse@samfundet.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      1f4a6996
    • Jason Gunthorpe's avatar
      sctp: Fix mangled IPv4 addresses on a IPv6 listening socket · 4138d766
      Jason Gunthorpe authored
      [ Upstream commit 9302d7bb ]
      
      sctp_v4_map_v6 was subtly writing and reading from members
      of a union in a way the clobbered data it needed to read before
      it read it.
      
      Zeroing the v6 flowinfo overwrites the v4 sin_addr with 0, meaning
      that every place that calls sctp_v4_map_v6 gets ::ffff:0.0.0.0 as the
      result.
      
      Reorder things to guarantee correct behaviour no matter what the
      union layout is.
      
      This impacts user space clients that open an IPv6 SCTP socket and
      receive IPv4 connections. Prior to 299ee user space would see a
      sockaddr with AF_INET and a correct address, after 299ee the sockaddr
      is AF_INET6, but the address is wrong.
      
      Fixes: 299ee123 (sctp: Fixup v4mapped behaviour to comply with Sock API)
      Signed-off-by: default avatarJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      4138d766
    • WANG Cong's avatar
      net_sched: invoke ->attach() after setting dev->qdisc · 1eee29d2
      WANG Cong authored
      [ Upstream commit 86e363dc ]
      
      For mq qdisc, we add per tx queue qdisc to root qdisc
      for display purpose, however, that happens too early,
      before the new dev->qdisc is finally set, this causes
      q->list points to an old root qdisc which is going to be
      freed right before assigning with a new one.
      
      Fix this by moving ->attach() after setting dev->qdisc.
      
      For the record, this fixes the following crash:
      
       ------------[ cut here ]------------
       WARNING: CPU: 1 PID: 975 at lib/list_debug.c:59 __list_del_entry+0x5a/0x98()
       list_del corruption. prev->next should be ffff8800d1998ae8, but was 6b6b6b6b6b6b6b6b
       CPU: 1 PID: 975 Comm: tc Not tainted 4.1.0-rc4+ #1019
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
        0000000000000009 ffff8800d73fb928 ffffffff81a44e7f 0000000047574756
        ffff8800d73fb978 ffff8800d73fb968 ffffffff810790da ffff8800cfc4cd20
        ffffffff814e725b ffff8800d1998ae8 ffffffff82381250 0000000000000000
       Call Trace:
        [<ffffffff81a44e7f>] dump_stack+0x4c/0x65
        [<ffffffff810790da>] warn_slowpath_common+0x9c/0xb6
        [<ffffffff814e725b>] ? __list_del_entry+0x5a/0x98
        [<ffffffff81079162>] warn_slowpath_fmt+0x46/0x48
        [<ffffffff81820eb0>] ? dev_graft_qdisc+0x5e/0x6a
        [<ffffffff814e725b>] __list_del_entry+0x5a/0x98
        [<ffffffff814e72a7>] list_del+0xe/0x2d
        [<ffffffff81822f05>] qdisc_list_del+0x1e/0x20
        [<ffffffff81820cd1>] qdisc_destroy+0x30/0xd6
        [<ffffffff81822676>] qdisc_graft+0x11d/0x243
        [<ffffffff818233c1>] tc_get_qdisc+0x1a6/0x1d4
        [<ffffffff810b5eaf>] ? mark_lock+0x2e/0x226
        [<ffffffff817ff8f5>] rtnetlink_rcv_msg+0x181/0x194
        [<ffffffff817ff72e>] ? rtnl_lock+0x17/0x19
        [<ffffffff817ff72e>] ? rtnl_lock+0x17/0x19
        [<ffffffff817ff774>] ? __rtnl_unlock+0x17/0x17
        [<ffffffff81855dc6>] netlink_rcv_skb+0x4d/0x93
        [<ffffffff817ff756>] rtnetlink_rcv+0x26/0x2d
        [<ffffffff818544b2>] netlink_unicast+0xcb/0x150
        [<ffffffff81161db9>] ? might_fault+0x59/0xa9
        [<ffffffff81854f78>] netlink_sendmsg+0x4fa/0x51c
        [<ffffffff817d6e09>] sock_sendmsg_nosec+0x12/0x1d
        [<ffffffff817d8967>] sock_sendmsg+0x29/0x2e
        [<ffffffff817d8cf3>] ___sys_sendmsg+0x1b4/0x23a
        [<ffffffff8100a1b8>] ? native_sched_clock+0x35/0x37
        [<ffffffff810a1d83>] ? sched_clock_local+0x12/0x72
        [<ffffffff810a1fd4>] ? sched_clock_cpu+0x9e/0xb7
        [<ffffffff810def2a>] ? current_kernel_time+0xe/0x32
        [<ffffffff810b4bc5>] ? lock_release_holdtime.part.29+0x71/0x7f
        [<ffffffff810ddebf>] ? read_seqcount_begin.constprop.27+0x5f/0x76
        [<ffffffff810b6292>] ? trace_hardirqs_on_caller+0x17d/0x199
        [<ffffffff811b14d5>] ? __fget_light+0x50/0x78
        [<ffffffff817d9808>] __sys_sendmsg+0x42/0x60
        [<ffffffff817d9838>] SyS_sendmsg+0x12/0x1c
        [<ffffffff81a50e97>] system_call_fastpath+0x12/0x6f
       ---[ end trace ef29d3fb28e97ae7 ]---
      
      For long term, we probably need to clean up the qdisc_graft() code
      in case it hides other bugs like this.
      
      Fixes: 95dc1929 ("pkt_sched: give visibility to mq slave qdiscs")
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      1eee29d2