1. 10 Jul, 2015 9 commits
    • Christoph Paasch's avatar
      tcp: Do not call tcp_fastopen_reset_cipher from interrupt context · 1df46092
      Christoph Paasch authored
      [ Upstream commit dfea2aa6 ]
      
      tcp_fastopen_reset_cipher really cannot be called from interrupt
      context. It allocates the tcp_fastopen_context with GFP_KERNEL and
      calls crypto_alloc_cipher, which allocates all kind of stuff with
      GFP_KERNEL.
      
      Thus, we might sleep when the key-generation is triggered by an
      incoming TFO cookie-request which would then happen in interrupt-
      context, as shown by enabling CONFIG_DEBUG_ATOMIC_SLEEP:
      
      [   36.001813] BUG: sleeping function called from invalid context at mm/slub.c:1266
      [   36.003624] in_atomic(): 1, irqs_disabled(): 0, pid: 1016, name: packetdrill
      [   36.004859] CPU: 1 PID: 1016 Comm: packetdrill Not tainted 4.1.0-rc7 #14
      [   36.006085] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
      [   36.008250]  00000000000004f2 ffff88007f8838a8 ffffffff8171d53a ffff880075a084a8
      [   36.009630]  ffff880075a08000 ffff88007f8838c8 ffffffff810967d3 ffff88007f883928
      [   36.011076]  0000000000000000 ffff88007f8838f8 ffffffff81096892 ffff88007f89be00
      [   36.012494] Call Trace:
      [   36.012953]  <IRQ>  [<ffffffff8171d53a>] dump_stack+0x4f/0x6d
      [   36.014085]  [<ffffffff810967d3>] ___might_sleep+0x103/0x170
      [   36.015117]  [<ffffffff81096892>] __might_sleep+0x52/0x90
      [   36.016117]  [<ffffffff8118e887>] kmem_cache_alloc_trace+0x47/0x190
      [   36.017266]  [<ffffffff81680d82>] ? tcp_fastopen_reset_cipher+0x42/0x130
      [   36.018485]  [<ffffffff81680d82>] tcp_fastopen_reset_cipher+0x42/0x130
      [   36.019679]  [<ffffffff81680f01>] tcp_fastopen_init_key_once+0x61/0x70
      [   36.020884]  [<ffffffff81680f2c>] __tcp_fastopen_cookie_gen+0x1c/0x60
      [   36.022058]  [<ffffffff816814ff>] tcp_try_fastopen+0x58f/0x730
      [   36.023118]  [<ffffffff81671788>] tcp_conn_request+0x3e8/0x7b0
      [   36.024185]  [<ffffffff810e3872>] ? __module_text_address+0x12/0x60
      [   36.025327]  [<ffffffff8167b2e1>] tcp_v4_conn_request+0x51/0x60
      [   36.026410]  [<ffffffff816727e0>] tcp_rcv_state_process+0x190/0xda0
      [   36.027556]  [<ffffffff81661f97>] ? __inet_lookup_established+0x47/0x170
      [   36.028784]  [<ffffffff8167c2ad>] tcp_v4_do_rcv+0x16d/0x3d0
      [   36.029832]  [<ffffffff812e6806>] ? security_sock_rcv_skb+0x16/0x20
      [   36.030936]  [<ffffffff8167cc8a>] tcp_v4_rcv+0x77a/0x7b0
      [   36.031875]  [<ffffffff816af8c3>] ? iptable_filter_hook+0x33/0x70
      [   36.032953]  [<ffffffff81657d22>] ip_local_deliver_finish+0x92/0x1f0
      [   36.034065]  [<ffffffff81657f1a>] ip_local_deliver+0x9a/0xb0
      [   36.035069]  [<ffffffff81657c90>] ? ip_rcv+0x3d0/0x3d0
      [   36.035963]  [<ffffffff81657569>] ip_rcv_finish+0x119/0x330
      [   36.036950]  [<ffffffff81657ba7>] ip_rcv+0x2e7/0x3d0
      [   36.037847]  [<ffffffff81610652>] __netif_receive_skb_core+0x552/0x930
      [   36.038994]  [<ffffffff81610a57>] __netif_receive_skb+0x27/0x70
      [   36.040033]  [<ffffffff81610b72>] process_backlog+0xd2/0x1f0
      [   36.041025]  [<ffffffff81611482>] net_rx_action+0x122/0x310
      [   36.042007]  [<ffffffff81076743>] __do_softirq+0x103/0x2f0
      [   36.042978]  [<ffffffff81723e3c>] do_softirq_own_stack+0x1c/0x30
      
      This patch moves the call to tcp_fastopen_init_key_once to the places
      where a listener socket creates its TFO-state, which always happens in
      user-context (either from the setsockopt, or implicitly during the
      listen()-call)
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Fixes: 222e83d2 ("tcp: switch tcp_fastopen key generation to net_get_random_once")
      Signed-off-by: default avatarChristoph Paasch <cpaasch@apple.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1df46092
    • Julian Anastasov's avatar
      neigh: do not modify unlinked entries · f171360a
      Julian Anastasov authored
      [ Upstream commit 2c51a97f ]
      
      The lockless lookups can return entry that is unlinked.
      Sometimes they get reference before last neigh_cleanup_and_release,
      sometimes they do not need reference. Later, any
      modification attempts may result in the following problems:
      
      1. entry is not destroyed immediately because neigh_update
      can start the timer for dead entry, eg. on change to NUD_REACHABLE
      state. As result, entry lives for some time but is invisible
      and out of control.
      
      2. __neigh_event_send can run in parallel with neigh_destroy
      while refcnt=0 but if timer is started and expired refcnt can
      reach 0 for second time leading to second neigh_destroy and
      possible crash.
      
      Thanks to Eric Dumazet and Ying Xue for their work and analyze
      on the __neigh_event_send change.
      
      Fixes: 767e97e1 ("neigh: RCU conversion of struct neighbour")
      Fixes: a263b309 ("ipv4: Make neigh lookups directly in output packet path.")
      Fixes: 6fd6ce20 ("ipv6: Do not depend on rt->n in ip6_finish_output2().")
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarJulian Anastasov <ja@ssi.bg>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f171360a
    • Willem de Bruijn's avatar
      packet: avoid out of bounds read in round robin fanout · 4ce9d474
      Willem de Bruijn authored
      [ Upstream commit 468479e6 ]
      
      PACKET_FANOUT_LB computes f->rr_cur such that it is modulo
      f->num_members. It returns the old value unconditionally, but
      f->num_members may have changed since the last store. Ensure
      that the return value is always < num.
      
      When modifying the logic, simplify it further by replacing the loop
      with an unconditional atomic increment.
      
      Fixes: dc99f600 ("packet: Add fanout support.")
      Suggested-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4ce9d474
    • Eric Dumazet's avatar
      packet: read num_members once in packet_rcv_fanout() · 06f6b596
      Eric Dumazet authored
      [ Upstream commit f98f4514 ]
      
      We need to tell compiler it must not read f->num_members multiple
      times. Otherwise testing if num is not zero is flaky, and we could
      attempt an invalid divide by 0 in fanout_demux_cpu()
      
      Note bug was present in packet_rcv_fanout_hash() and
      packet_rcv_fanout_lb() but final 3.1 had a simple location
      after commit 95ec3eb4 ("packet: Add 'cpu' fanout policy.")
      
      Fixes: dc99f600 ("packet: Add fanout support.")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      06f6b596
    • Nikolay Aleksandrov's avatar
      bridge: fix br_stp_set_bridge_priority race conditions · 6a3334dd
      Nikolay Aleksandrov authored
      [ Upstream commit 2dab80a8 ]
      
      After the ->set() spinlocks were removed br_stp_set_bridge_priority
      was left running without any protection when used via sysfs. It can
      race with port add/del and could result in use-after-free cases and
      corrupted lists. Tested by running port add/del in a loop with stp
      enabled while setting priority in a loop, crashes are easily
      reproducible.
      The spinlocks around sysfs ->set() were removed in commit:
      14f98f25 ("bridge: range check STP parameters")
      There's also a race condition in the netlink priority support that is
      fixed by this change, but it was introduced recently and the fixes tag
      covers it, just in case it's needed the commit is:
      af615762 ("bridge: add ageing_time, stp_state, priority over netlink")
      Signed-off-by: default avatarNikolay Aleksandrov <razor@blackwall.org>
      Fixes: 14f98f25 ("bridge: range check STP parameters")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6a3334dd
    • Marcelo Ricardo Leitner's avatar
      sctp: fix ASCONF list handling · 21eceec5
      Marcelo Ricardo Leitner authored
      [ Upstream commit 2d45a02d ]
      
      ->auto_asconf_splist is per namespace and mangled by functions like
      sctp_setsockopt_auto_asconf() which doesn't guarantee any serialization.
      
      Also, the call to inet_sk_copy_descendant() was backuping
      ->auto_asconf_list through the copy but was not honoring
      ->do_auto_asconf, which could lead to list corruption if it was
      different between both sockets.
      
      This commit thus fixes the list handling by using ->addr_wq_lock
      spinlock to protect the list. A special handling is done upon socket
      creation and destruction for that. Error handlig on sctp_init_sock()
      will never return an error after having initialized asconf, so
      sctp_destroy_sock() can be called without addrq_wq_lock. The lock now
      will be take on sctp_close_sock(), before locking the socket, so we
      don't do it in inverse order compared to sctp_addr_wq_timeout_handler().
      
      Instead of taking the lock on sctp_sock_migrate() for copying and
      restoring the list values, it's preferred to avoid rewritting it by
      implementing sctp_copy_descendant().
      
      Issue was found with a test application that kept flipping sysctl
      default_auto_asconf on and off, but one could trigger it by issuing
      simultaneous setsockopt() calls on multiple sockets or by
      creating/destroying sockets fast enough. This is only triggerable
      locally.
      
      Fixes: 9f7d653b ("sctp: Add Auto-ASCONF support (core).")
      Reported-by: default avatarJi Jianwen <jiji@redhat.com>
      Suggested-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Suggested-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      21eceec5
    • Shaohua Li's avatar
      net: don't wait for order-3 page allocation · 00764c52
      Shaohua Li authored
      [ Upstream commit fb05e7a8 ]
      
      We saw excessive direct memory compaction triggered by skb_page_frag_refill.
      This causes performance issues and add latency. Commit 5640f768
      introduces the order-3 allocation. According to the changelog, the order-3
      allocation isn't a must-have but to improve performance. But direct memory
      compaction has high overhead. The benefit of order-3 allocation can't
      compensate the overhead of direct memory compaction.
      
      This patch makes the order-3 page allocation atomic. If there is no memory
      pressure and memory isn't fragmented, the alloction will still success, so we
      don't sacrifice the order-3 benefit here. If the atomic allocation fails,
      direct memory compaction will not be triggered, skb_page_frag_refill will
      fallback to order-0 immediately, hence the direct memory compaction overhead is
      avoided. In the allocation failure case, kswapd is waken up and doing
      compaction, so chances are allocation could success next time.
      
      alloc_skb_with_frags is the same.
      
      The mellanox driver does similar thing, if this is accepted, we must fix
      the driver too.
      
      V3: fix the same issue in alloc_skb_with_frags as pointed out by Eric
      V2: make the changelog clearer
      
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Chris Mason <clm@fb.com>
      Cc: Debabrata Banerjee <dbavatar@gmail.com>
      Signed-off-by: default avatarShaohua Li <shli@fb.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      00764c52
    • Nikolay Aleksandrov's avatar
      bridge: fix multicast router rlist endless loop · 1a3c14f4
      Nikolay Aleksandrov authored
      [ Upstream commit 1a040eac ]
      
      Since the addition of sysfs multicast router support if one set
      multicast_router to "2" more than once, then the port would be added to
      the hlist every time and could end up linking to itself and thus causing an
      endless loop for rlist walkers.
      So to reproduce just do:
      echo 2 > multicast_router; echo 2 > multicast_router;
      in a bridge port and let some igmp traffic flow, for me it hangs up
      in br_multicast_flood().
      Fix this by adding a check in br_multicast_add_router() if the port is
      already linked.
      The reason this didn't happen before the addition of multicast_router
      sysfs entries is because there's a !hlist_unhashed check that prevents
      it.
      Signed-off-by: default avatarNikolay Aleksandrov <razor@blackwall.org>
      Fixes: 0909e117 ("bridge: Add multicast_router sysfs entries")
      Acked-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1a3c14f4
    • Sowmini Varadhan's avatar
      sparc: Use GFP_ATOMIC in ldc_alloc_exp_dring() as it can be called in softirq context · 1996810d
      Sowmini Varadhan authored
      Upstream commit 671d7732
      
      Since it is possible for vnet_event_napi to end up doing
      vnet_control_pkt_engine -> ... -> vnet_send_attr ->
      vnet_port_alloc_tx_ring -> ldc_alloc_exp_dring -> kzalloc()
      (i.e., in softirq context), kzalloc() should be called with
      GFP_ATOMIC from ldc_alloc_exp_dring.
      Signed-off-by: default avatarSowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1996810d
  2. 04 Jul, 2015 31 commits