1. 29 Sep, 2014 4 commits
    • net: tcp: add flag for ca to indicate that ECN is required · 30e502a3
      Daniel Borkmann authored
      This patch adds a flag to TCP congestion algorithms that lets a
      congestion algorithm request that IPv4/IPv6 sockets be marked as
      ECN-capable transport, that is, ECT(0), when it requires this.
      
      It is currently used and needed in DataCenter TCP (DCTCP), as it
      requires both peers to assert ECT on all IP packets sent - it
      uses ECN feedback (i.e. CE, Congestion Encountered information)
      from switches inside the data center to derive feedback to the
      end hosts.
      
      Therefore, simply add a new flag to icsk_ca_ops. Note that DCTCP's
      algorithm/behaviour slightly diverges from RFC3168, therefore this
      is enabled if and only if the assigned congestion control ops module
      has requested it. That way, the logic stays tightly coupled to the
      provided congestion control ops and nothing else.
      
      Joint work with Florian Westphal and Glenn Judd.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com>
      Acked-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
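
      The flag check described in the commit above (30e502a3) can be
      sketched in user-space C as follows. This is a minimal illustration,
      not the kernel code: the names CA_FLAG_NEEDS_ECN, struct ca_ops,
      struct sock_sketch and mark_ect0_if_required are all hypothetical.
      Only the ECN codepoint values (ECT(0) = binary 10 in the two low
      bits of the TOS / traffic class byte) come from RFC 3168.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical flag bit; the real kernel name and value may differ. */
      #define CA_FLAG_NEEDS_ECN 0x1u

      /* ECN field: the two low-order bits of the TOS byte (RFC 3168). */
      #define ECN_MASK  0x03u
      #define ECN_ECT_0 0x02u   /* ECT(0), binary 10 */

      /* Stand-in for a congestion control ops table. */
      struct ca_ops {
          const char *name;
          uint32_t    flags;
      };

      /* Stand-in for a socket: its assigned ops and its TOS byte. */
      struct sock_sketch {
          const struct ca_ops *ca_ops;
          uint8_t              tos;
      };

      /* True only if the assigned congestion algorithm asked for ECN. */
      static bool ca_needs_ecn(const struct sock_sketch *sk)
      {
          return sk->ca_ops != NULL &&
                 (sk->ca_ops->flags & CA_FLAG_NEEDS_ECN) != 0;
      }

      /* Set ECT(0) on the socket iff its congestion algorithm requires it;
       * sockets using any other algorithm are left untouched. */
      static void mark_ect0_if_required(struct sock_sketch *sk)
      {
          assert(sk != NULL);
          if (!ca_needs_ecn(sk))
              return;
          sk->tos = (uint8_t)((sk->tos & ~ECN_MASK) | ECN_ECT_0);
      }
      ```

      Keeping the decision behind a per-algorithm flag means the marking
      never applies to sockets whose congestion control did not ask for
      it, which matches the tight coupling the commit message describes.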
    • net: tcp: assign tcp cong_ops when tcp sk is created · 55d8694f
      Florian Westphal authored
      Split assignment and initialization from one into two functions.
      
      This is required by followup patches that add Datacenter TCP
      (DCTCP) congestion control algorithm - we need to be able to
      determine if the connection is moderated by DCTCP before the
      3WHS has finished.
      
      As we walk the available congestion control list during the
      assignment, we are always guaranteed to have Reno present as
      it's fixed compiled-in. Therefore, since we're doing the
      early assignment, we don't have a real use for the Reno alias
      tcp_init_congestion_ops anymore and can thus remove it.
      
      Actual use of the congestion control operations only happens
      after the 3WHS has finished. In some cases, however, get_info()
      can be reached via diag if implemented, so we need to zero out
      the private area for those modules.
      
      Joint work with Daniel Borkmann and Glenn Judd.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com>
      Acked-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
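
      The split into an early "assign" step and a later "init" step, as
      described in the commit above (55d8694f), might look roughly like
      this in user-space C. Everything here is a hypothetical sketch:
      struct cong_ops, struct conn, CA_PRIV_SIZE, the available field and
      both function names are illustrative assumptions, not the kernel's
      definitions.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <string.h>

      #define CA_PRIV_SIZE 64   /* assumed size of the private area */

      struct conn;

      /* Stand-in for a congestion control ops table. */
      struct cong_ops {
          const char *name;
          int         available;             /* assumed: module can be used */
          void      (*init)(struct conn *c); /* optional, run after handshake */
      };

      /* Stand-in for a connection with a per-algorithm private area. */
      struct conn {
          const struct cong_ops *ca_ops;
          unsigned char          ca_priv[CA_PRIV_SIZE];
      };

      /* Step 1, at socket creation: pick the first usable algorithm.
       * The caller guarantees the list contains an always-available
       * fallback (Reno in the commit message), so a match always exists.
       * The private area is zeroed because it may be read (e.g. by a
       * diagnostics hook) before init has run. */
      static void assign_cong_ops(struct conn *c,
                                  const struct cong_ops *list, size_t n)
      {
          size_t i;

          c->ca_ops = NULL;
          for (i = 0; i < n; i++) {
              if (list[i].available) {
                  c->ca_ops = &list[i];
                  break;
              }
          }
          assert(c->ca_ops != NULL);
          memset(c->ca_priv, 0, sizeof(c->ca_priv));
      }

      /* Step 2, after the handshake: run the algorithm's own init hook. */
      static void init_cong_ops(struct conn *c)
      {
          if (c->ca_ops != NULL && c->ca_ops->init != NULL)
              c->ca_ops->init(c);
      }

      /* Example init hook: records that initialization happened. */
      static void demo_init(struct conn *c)
      {
          c->ca_priv[0] = 1;
      }
      ```

      With this shape, code that runs before the handshake completes can
      already inspect c->ca_ops to learn which algorithm governs the
      connection, while the algorithm's state is only set up later.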
    • net: sched: cls_rcvp, complete rcu conversion · 53dfd501
      John Fastabend authored
      This completes the cls_rsvp conversion to RCU-safe
      copy/update semantics.
      
      As a result all cases of tcf_exts_change occur on
      empty lists now.
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dql: dql_queued() should write first to reduce bus transactions · 3d9a0d2f
      Eric Dumazet authored
      While running a high-throughput test on a BQL-enabled NIC,
      I found a very high cost in ndo_start_xmit() when accessing BQL data.
      
      It turned out the problem was caused by the compiler trying to be
      smart, which resulted in a bad MESI transaction:
      
        0.05 │  mov    0xc0(%rax),%edi    // LOAD dql->num_queued
        0.48 │  mov    %edx,0xc8(%rax)    // STORE dql->last_obj_cnt = count
       58.23 │  add    %edx,%edi
        0.58 │  cmp    %edi,0xc4(%rax)
        0.76 │  mov    %edi,0xc0(%rax)    // STORE dql->num_queued += count
        0.72 │  js     bd8
      
      I got an incredible 10% gain [1] by making sure the CPU does not
      attempt to get the cache line in Shared mode, but directly requests
      ownership.
      
      New code:
      	mov    %edx,0xc8(%rax)  // STORE dql->last_obj_cnt = count
      	add    %edx,0xc0(%rax)  // RMW   dql->num_queued += count
      	mov    0xc4(%rax),%ecx  // LOAD dql->adj_limit
      	mov    0xc0(%rax),%edx  // LOAD dql->num_queued
      	cmp    %edx,%ecx
      
      The TX completion was running on another CPU, with a high interrupt
      rate.
      
      Note that I am using barrier() as a soft hint, as mb() here could be
      too costly.
      
      [1] This was a netperf TCP_STREAM with TSO disabled, but GSO enabled.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
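
      The "write first" ordering described in the commit above (3d9a0d2f)
      can be sketched in user-space C as shown here. This is an
      illustration under stated assumptions, not the kernel source: the
      struct layout only mirrors the three fields named in the
      disassembly, SKETCH_MAX_OBJECT is a hypothetical bound, and
      compiler_barrier() is a GCC/Clang-style stand-in for the kernel's
      barrier().

      ```c
      #include <assert.h>
      #include <limits.h>

      /* Compiler-only barrier (GCC/Clang syntax assumed). It emits no
       * instruction; it only stops the compiler from reordering memory
       * accesses across this point. */
      #define compiler_barrier() __asm__ __volatile__("" ::: "memory")

      /* Hypothetical sanity bound on a single queued count. */
      #define SKETCH_MAX_OBJECT (UINT_MAX / 16)

      /* Three fields assumed to share one cache line, in the order the
       * disassembly offsets suggest (0xc0, 0xc4, 0xc8). */
      struct dql_sketch {
          unsigned int num_queued;    /* total ever queued */
          unsigned int adj_limit;     /* limit + completed so far */
          unsigned int last_obj_cnt;  /* count of the last queued object */
      };

      /* Record that count objects were queued. The plain store comes
       * first so the first access to the cache line is a write, which
       * lets the CPU request ownership directly instead of first
       * fetching the line in Shared state for a load. */
      static inline void dql_queued_sketch(struct dql_sketch *dql,
                                           unsigned int count)
      {
          assert(count <= SKETCH_MAX_OBJECT);

          dql->last_obj_cnt = count;   /* STORE first */
          compiler_barrier();          /* hint: keep the store ahead */
          dql->num_queued += count;    /* read-modify-write second */
      }

      /* Remaining room: negative once the queue is over its limit. */
      static inline int dql_avail_sketch(const struct dql_sketch *dql)
      {
          return (int)(dql->adj_limit - dql->num_queued);
      }
      ```

      A compiler barrier is cheaper than a full memory barrier because it
      constrains only the compiler's instruction ordering, which is all
      the hint described in the commit message needs.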
  2. 28 Sep, 2014 32 commits
  3. 26 Sep, 2014 4 commits