1. 12 Jul, 2012 2 commits
    • Eric Dumazet's avatar
      tcp: TCP Small Queues · 46d3ceab
      Eric Dumazet authored
      This introduce TSQ (TCP Small Queues)
      
      TSQ goal is to reduce number of TCP packets in xmit queues (qdisc &
      device queues), to reduce RTT and cwnd bias, part of the bufferbloat
      problem.
      
      sk->sk_wmem_alloc not allowed to grow above a given limit,
      allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a
      given time.
      
      TSO packets are sized/capped to half the limit, so that we have two
      TSO packets in flight, allowing better bandwidth use.
      
      As a side effect, setting the limit to 40000 automatically reduces the
      standard gso max limit (65536) to 40000/2 : It can help to reduce
      latencies of high prio packets, having smaller TSO packets.
      
      This means we divert sock_wfree() to a tcp_wfree() handler, to
      queue/send following frames when skb_orphan() [2] is called for the
      already queued skbs.
      
      Results on my dev machines (tg3/ixgbe nics) are really impressive,
      using standard pfifo_fast, and with or without TSO/GSO.
      
      Without reduction of nominal bandwidth, we have reduction of buffering
      per bulk sender :
      < 1ms on Gbit (instead of 50ms with TSO)
      < 8ms on 100Mbit (instead of 132 ms)
      
      I no longer have 4 MBytes backlogged in qdisc by a single netperf
      session, and both side socket autotuning no longer use 4 Mbytes.
      
      As skb destructor cannot restart xmit itself ( as qdisc lock might be
      taken at this point ), we delegate the work to a tasklet. We use one
      tasklest per cpu for performance reasons.
      
      If tasklet finds a socket owned by the user, it sets TSQ_OWNED flag.
      This flag is tested in a new protocol method called from release_sock(),
      to eventually send new segments.
      
      [1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable
      [2] skb_orphan() is usually called at TX completion time,
        but some drivers call it in their start_xmit() handler.
        These drivers should at least use BQL, or else a single TCP
        session can still fill the whole NIC TX ring, since TSQ will
        have no effect.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Dave Taht <dave.taht@bufferbloat.net>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Matt Mathis <mattmathis@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Nandita Dukkipati <nanditad@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      46d3ceab
    • Alexander Duyck's avatar
      tcp: Fix out of bounds access to tcpm_vals · 2100844c
      Alexander Duyck authored
      The recent patch "tcp: Maintain dynamic metrics in local cache." introduced
      an out of bounds access due to what appears to be a typo.   I believe this
      change should resolve the issue by replacing the access to RTAX_CWND with
      TCP_METRIC_CWND.
      Signed-off-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2100844c
  2. 11 Jul, 2012 38 commits