1. 12 May, 2015 11 commits
  2. 11 May, 2015 20 commits
  3. 10 May, 2015 9 commits
    • Eric Dumazet's avatar
      codel: add ce_threshold attribute · 80ba92fa
      Eric Dumazet authored
      For DCTCP or similar ECN based deployments on fabrics with shallow
      buffers, hosts are responsible for a good part of the buffering.
      
      This patch adds an optional ce_threshold to codel & fq_codel qdiscs,
      so that DCTCP can have feedback from queuing in the host.
      
      A DCTCP enabled egress port simply have a queue occupancy threshold
      above which ECT packets get CE mark.
      
      In codel language this translates to a sojourn time, so that one doesn't
      have to worry about bytes or bandwidth but delays.
      
      This makes the host an active participant in the health of the whole
      network.
      
      This also helps experimenting DCTCP in a setup without DCTCP compliant
      fabric.
      
      On following example, ce_threshold is set to 1ms, and we can see from
      'ldelay xxx us' that TCP is not trying to go around the 5ms codel
      target.
      
      Queue has more capacity to absorb inelastic bursts (say from UDP
      traffic), as queues are maintained to an optimal level.
      
      lpaa23:~# ./tc -s -d qd sh dev eth1
      qdisc mq 1: dev eth1 root
       Sent 87910654696 bytes 58065331 pkt (dropped 0, overlimits 0 requeues 42961)
       backlog 3108242b 364p requeues 42961
      qdisc codel 8063: dev eth1 parent 1:1 limit 1000p target 5.0ms ce_threshold 1.0ms interval 100.0ms
       Sent 7363778701 bytes 4863809 pkt (dropped 0, overlimits 0 requeues 5503)
       rate 2348Mbit 193919pps backlog 255866b 46p requeues 5503
        count 0 lastcount 0 ldelay 1.0ms drop_next 0us
        maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 72384
      qdisc codel 8064: dev eth1 parent 1:2 limit 1000p target 5.0ms ce_threshold 1.0ms interval 100.0ms
       Sent 7636486190 bytes 5043942 pkt (dropped 0, overlimits 0 requeues 5186)
       rate 2319Mbit 191538pps backlog 207418b 64p requeues 5186
        count 0 lastcount 0 ldelay 694us drop_next 0us
        maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 69873
      qdisc codel 8065: dev eth1 parent 1:3 limit 1000p target 5.0ms ce_threshold 1.0ms interval 100.0ms
       Sent 11569360142 bytes 7641602 pkt (dropped 0, overlimits 0 requeues 5554)
       rate 3041Mbit 251096pps backlog 210446b 59p requeues 5554
        count 0 lastcount 0 ldelay 889us drop_next 0us
        maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 37780
      ...
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Glenn Judd <glenn.judd@morganstanley.com>
      Cc: Nandita Dukkipati <nanditad@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      80ba92fa
    • Varka Bhadram's avatar
      ethernet: qualcomm: use spi instead of spi_device · cf9d0dcc
      Varka Bhadram authored
      All spi based drivers have an instance of struct spi_device
      as spi. This patch renames spi_device to spi to synchronize
      with all the drivers.
      Signed-off-by: default avatarVarka Bhadram <varkab@cdac.in>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cf9d0dcc
    • David S. Miller's avatar
      Merge branch 'pktgen-next' · 3e3b3468
      David S. Miller authored
      Jesper Dangaard Brouer says:
      
      ====================
      The following series introduce some pktgen changes
      
      Patch01:
       Cleanup my own work when I introduced NO_TIMESTAMP.
      
      Patch02:
       Took over patch from Alexei, and addressed my own concerns, as Alexie
       is too busy with other work, and this will provide an easy tool for
       measuring ingress path performance, which is a hot topic ATM.
      
       Changes were primarily user interface related.  Introduced a separate
       "xmit_mode" setting, instead of stealing one of the dev flags like
       Alexei did.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3e3b3468
    • Alexei Starovoitov's avatar
      pktgen: introduce xmit_mode '<start_xmit|netif_receive>' · 62f64aed
      Alexei Starovoitov authored
      Introduce xmit_mode 'netif_receive' for pktgen which generates the
      packets using familiar pktgen commands, but feeds them into
      netif_receive_skb() instead of ndo_start_xmit().
      
      Default mode is called 'start_xmit'.
      
      It is designed to test netif_receive_skb and ingress qdisc
      performace only. Make sure to understand how it works before
      using it for other rx benchmarking.
      
      Sample script 'pktgen.sh':
      \#!/bin/bash
      function pgset() {
        local result
      
        echo $1 > $PGDEV
      
        result=`cat $PGDEV | fgrep "Result: OK:"`
        if [ "$result" = "" ]; then
          cat $PGDEV | fgrep Result:
        fi
      }
      
      [ -z "$1" ] && echo "Usage: $0 DEV" && exit 1
      ETH=$1
      
      PGDEV=/proc/net/pktgen/kpktgend_0
      pgset "rem_device_all"
      pgset "add_device $ETH"
      
      PGDEV=/proc/net/pktgen/$ETH
      pgset "xmit_mode netif_receive"
      pgset "pkt_size 60"
      pgset "dst 198.18.0.1"
      pgset "dst_mac 90:e2:ba:ff:ff:ff"
      pgset "count 10000000"
      pgset "burst 32"
      
      PGDEV=/proc/net/pktgen/pgctrl
      echo "Running... ctrl^C to stop"
      pgset "start"
      echo "Done"
      cat /proc/net/pktgen/$ETH
      
      Usage:
      $ sudo ./pktgen.sh eth2
      ...
      Result: OK: 232376(c232372+d3) usec, 10000000 (60byte,0frags)
        43033682pps 20656Mb/sec (20656167360bps) errors: 10000000
      
      Raw netif_receive_skb speed should be ~43 million packet
      per second on 3.7Ghz x86 and 'perf report' should look like:
        37.69%  kpktgend_0   [kernel.vmlinux]  [k] __netif_receive_skb_core
        25.81%  kpktgend_0   [kernel.vmlinux]  [k] kfree_skb
         7.22%  kpktgend_0   [kernel.vmlinux]  [k] ip_rcv
         5.68%  kpktgend_0   [pktgen]          [k] pktgen_thread_worker
      
      If fib_table_lookup is seen on top, it means skb was processed
      by the stack. To benchmark netif_receive_skb only make sure
      that 'dst_mac' of your pktgen script is different from
      receiving device mac and it will be dropped by ip_rcv
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      62f64aed
    • Jesper Dangaard Brouer's avatar
      pktgen: adjust flag NO_TIMESTAMP to be more pktgen compliant · f1f00d8f
      Jesper Dangaard Brouer authored
      Allow flag NO_TIMESTAMP to turn timestamping on again, like other flags,
      with a negation of the flag like !NO_TIMESTAMP.
      
      Also document the option flag NO_TIMESTAMP.
      
      Fixes: afb84b62 ("pktgen: add flag NO_TIMESTAMP to disable timestamping")
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f1f00d8f
    • David S. Miller's avatar
      Merge branch 'netns-scalability' · 4d95b72f
      David S. Miller authored
      Nicolas Dichtel says:
      
      ====================
      netns: ease netlink use with a lot of netns
      
      This idea was informally discussed in Ottawa / netdev0.1. The goal is to
      ease the use/scalability of netns, from a userland point of view.
      Today, users need to open one netlink socket per family and per netns.
      Thus, when the number of netns inscreases (for example 5K or more), the
      number of sockets needed to manage them grows a lot.
      
      The goal of this series is to be able to monitor netlink events, for a
      specified family, for a set of netns, with only one netlink socket. For
      this purpose, a netlink socket option is added: NETLINK_LISTEN_ALL_NSID.
      When this option is set on a netlink socket, this socket will receive
      netlink notifications from all netns that have a nsid assigned into the
      netns where the socket has been opened.
      The nsid is sent to userland via an anscillary data.
      
      Here is an example with a patched iproute2. vxlan10 is created in the
      current netns (netns0, nsid 0) and then moved to another netns (netns1,
      nsid 1):
      
      $ ip netns exec netns0 ip monitor all-nsid label
      [nsid 0][NSID]nsid 1 (iproute2 netns name: netns1)
      [nsid 0][NEIGH]??? lladdr 00:00:00:00:00:00 REACHABLE,PERMANENT
      [nsid 0][LINK]5: vxlan10@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default
          link/ether 92:33:17:e6:e7:1d brd ff:ff:ff:ff:ff:ff
      [nsid 0][LINK]Deleted 5: vxlan10@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default
          link/ether 92:33:17:e6:e7:1d brd ff:ff:ff:ff:ff:ff
      [nsid 1][NSID]nsid 0 (iproute2 netns name: netns0)
      [nsid 1][LINK]5: vxlan10@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default
          link/ether 92:33:17:e6:e7:1d brd ff:ff:ff:ff:ff:ff link-netnsid 0
      [nsid 1][ADDR]5: vxlan10    inet 192.168.0.249/24 brd 192.168.0.255 scope global vxlan10
             valid_lft forever preferred_lft forever
      [nsid 1][ROUTE]local 192.168.0.249 dev vxlan10  table local  proto kernel  scope host  src 192.168.0.249
      [nsid 1][ROUTE]ff00::/8 dev vxlan10  table local  metric 256  pref medium
      [nsid 1][ROUTE]2001:123::/64 dev vxlan10  proto kernel  metric 256  pref medium
      [nsid 1][LINK]5: vxlan10@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
          link/ether 92:33:17:e6:e7:1d brd ff:ff:ff:ff:ff:ff link-netnsid 0
      [nsid 1][ROUTE]broadcast 192.168.0.255 dev vxlan10  table local  proto kernel  scope link  src 192.168.0.249
      [nsid 1][ROUTE]192.168.0.0/24 dev vxlan10  proto kernel  scope link  src 192.168.0.249
      [nsid 1][ROUTE]broadcast 192.168.0.0 dev vxlan10  table local  proto kernel  scope link  src 192.168.0.249
      [nsid 1][ROUTE]fe80::/64 dev vxlan10  proto kernel  metric 256  pref medium
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4d95b72f
    • Nicolas Dichtel's avatar
      netlink: allow to listen "all" netns · 59324cf3
      Nicolas Dichtel authored
      More accurately, listen all netns that have a nsid assigned into the netns
      where the netlink socket is opened.
      For this purpose, a netlink socket option is added:
      NETLINK_LISTEN_ALL_NSID. When this option is set on a netlink socket, this
      socket will receive netlink notifications from all netns that have a nsid
      assigned into the netns where the socket has been opened. The nsid is sent
      to userland via an anscillary data.
      
      With this patch, a daemon needs only one socket to listen many netns. This
      is useful when the number of netns is high.
      
      Because 0 is a valid value for a nsid, the field nsid_is_set indicates if
      the field nsid is valid or not. skb->cb is initialized to 0 on skb
      allocation, thus we are sure that we will never send a nsid 0 by error to
      the userland.
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      59324cf3
    • Nicolas Dichtel's avatar
      netlink: rename private flags and states · cc3a572f
      Nicolas Dichtel authored
      These flags and states have the same prefix (NETLINK_) that netlink socket
      options. To avoid confusion and to be able to name a flag like a socket
      option, let's use an other prefix: NETLINK_[S|F]_.
      
      Note: a comment has been fixed, it was talking about
      NETLINK_RECV_NO_ENOBUFS socket option instead of NETLINK_NO_ENOBUFS.
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cc3a572f
    • Nicolas Dichtel's avatar
      netns: use a spin_lock to protect nsid management · 95f38411
      Nicolas Dichtel authored
      Before this patch, nsid were protected by the rtnl lock. The goal of this
      patch is to be able to find a nsid without needing to hold the rtnl lock.
      
      The next patch will introduce a netlink socket option to listen to all
      netns that have a nsid assigned into the netns where the socket is opened.
      Thus, it's important to call rtnl_net_notifyid() outside the spinlock, to
      avoid a recursive lock (nsid are notified via rtnl). This was the main
      reason of the previous patch.
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      95f38411