1. 11 May, 2015 20 commits
  2. 10 May, 2015 14 commits
    • Eric Dumazet's avatar
      codel: add ce_threshold attribute · 80ba92fa
      Eric Dumazet authored
      For DCTCP or similar ECN based deployments on fabrics with shallow
      buffers, hosts are responsible for a good part of the buffering.
      
      This patch adds an optional ce_threshold to codel & fq_codel qdiscs,
      so that DCTCP can have feedback from queuing in the host.
      
      A DCTCP enabled egress port simply have a queue occupancy threshold
      above which ECT packets get CE mark.
      
      In codel language this translates to a sojourn time, so that one doesn't
      have to worry about bytes or bandwidth but delays.
      
      This makes the host an active participant in the health of the whole
      network.
      
      This also helps experimenting DCTCP in a setup without DCTCP compliant
      fabric.
      
      On following example, ce_threshold is set to 1ms, and we can see from
      'ldelay xxx us' that TCP is not trying to go around the 5ms codel
      target.
      
      Queue has more capacity to absorb inelastic bursts (say from UDP
      traffic), as queues are maintained to an optimal level.
      
      lpaa23:~# ./tc -s -d qd sh dev eth1
      qdisc mq 1: dev eth1 root
       Sent 87910654696 bytes 58065331 pkt (dropped 0, overlimits 0 requeues 42961)
       backlog 3108242b 364p requeues 42961
      qdisc codel 8063: dev eth1 parent 1:1 limit 1000p target 5.0ms ce_threshold 1.0ms interval 100.0ms
       Sent 7363778701 bytes 4863809 pkt (dropped 0, overlimits 0 requeues 5503)
       rate 2348Mbit 193919pps backlog 255866b 46p requeues 5503
        count 0 lastcount 0 ldelay 1.0ms drop_next 0us
        maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 72384
      qdisc codel 8064: dev eth1 parent 1:2 limit 1000p target 5.0ms ce_threshold 1.0ms interval 100.0ms
       Sent 7636486190 bytes 5043942 pkt (dropped 0, overlimits 0 requeues 5186)
       rate 2319Mbit 191538pps backlog 207418b 64p requeues 5186
        count 0 lastcount 0 ldelay 694us drop_next 0us
        maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 69873
      qdisc codel 8065: dev eth1 parent 1:3 limit 1000p target 5.0ms ce_threshold 1.0ms interval 100.0ms
       Sent 11569360142 bytes 7641602 pkt (dropped 0, overlimits 0 requeues 5554)
       rate 3041Mbit 251096pps backlog 210446b 59p requeues 5554
        count 0 lastcount 0 ldelay 889us drop_next 0us
        maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 37780
      ...
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Glenn Judd <glenn.judd@morganstanley.com>
      Cc: Nandita Dukkipati <nanditad@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      80ba92fa
    • Varka Bhadram's avatar
      ethernet: qualcomm: use spi instead of spi_device · cf9d0dcc
      Varka Bhadram authored
      All spi based drivers have an instance of struct spi_device
      as spi. This patch renames spi_device to spi to synchronize
      with all the drivers.
      Signed-off-by: default avatarVarka Bhadram <varkab@cdac.in>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cf9d0dcc
    • David S. Miller's avatar
      Merge branch 'pktgen-next' · 3e3b3468
      David S. Miller authored
      Jesper Dangaard Brouer says:
      
      ====================
      The following series introduce some pktgen changes
      
      Patch01:
       Cleanup my own work when I introduced NO_TIMESTAMP.
      
      Patch02:
       Took over patch from Alexei, and addressed my own concerns, as Alexie
       is too busy with other work, and this will provide an easy tool for
       measuring ingress path performance, which is a hot topic ATM.
      
       Changes were primarily user interface related.  Introduced a separate
       "xmit_mode" setting, instead of stealing one of the dev flags like
       Alexei did.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3e3b3468
    • Alexei Starovoitov's avatar
      pktgen: introduce xmit_mode '<start_xmit|netif_receive>' · 62f64aed
      Alexei Starovoitov authored
      Introduce xmit_mode 'netif_receive' for pktgen which generates the
      packets using familiar pktgen commands, but feeds them into
      netif_receive_skb() instead of ndo_start_xmit().
      
      Default mode is called 'start_xmit'.
      
      It is designed to test netif_receive_skb and ingress qdisc
      performace only. Make sure to understand how it works before
      using it for other rx benchmarking.
      
      Sample script 'pktgen.sh':
      \#!/bin/bash
      function pgset() {
        local result
      
        echo $1 > $PGDEV
      
        result=`cat $PGDEV | fgrep "Result: OK:"`
        if [ "$result" = "" ]; then
          cat $PGDEV | fgrep Result:
        fi
      }
      
      [ -z "$1" ] && echo "Usage: $0 DEV" && exit 1
      ETH=$1
      
      PGDEV=/proc/net/pktgen/kpktgend_0
      pgset "rem_device_all"
      pgset "add_device $ETH"
      
      PGDEV=/proc/net/pktgen/$ETH
      pgset "xmit_mode netif_receive"
      pgset "pkt_size 60"
      pgset "dst 198.18.0.1"
      pgset "dst_mac 90:e2:ba:ff:ff:ff"
      pgset "count 10000000"
      pgset "burst 32"
      
      PGDEV=/proc/net/pktgen/pgctrl
      echo "Running... ctrl^C to stop"
      pgset "start"
      echo "Done"
      cat /proc/net/pktgen/$ETH
      
      Usage:
      $ sudo ./pktgen.sh eth2
      ...
      Result: OK: 232376(c232372+d3) usec, 10000000 (60byte,0frags)
        43033682pps 20656Mb/sec (20656167360bps) errors: 10000000
      
      Raw netif_receive_skb speed should be ~43 million packet
      per second on 3.7Ghz x86 and 'perf report' should look like:
        37.69%  kpktgend_0   [kernel.vmlinux]  [k] __netif_receive_skb_core
        25.81%  kpktgend_0   [kernel.vmlinux]  [k] kfree_skb
         7.22%  kpktgend_0   [kernel.vmlinux]  [k] ip_rcv
         5.68%  kpktgend_0   [pktgen]          [k] pktgen_thread_worker
      
      If fib_table_lookup is seen on top, it means skb was processed
      by the stack. To benchmark netif_receive_skb only make sure
      that 'dst_mac' of your pktgen script is different from
      receiving device mac and it will be dropped by ip_rcv
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      62f64aed
    • Jesper Dangaard Brouer's avatar
      pktgen: adjust flag NO_TIMESTAMP to be more pktgen compliant · f1f00d8f
      Jesper Dangaard Brouer authored
      Allow flag NO_TIMESTAMP to turn timestamping on again, like other flags,
      with a negation of the flag like !NO_TIMESTAMP.
      
      Also document the option flag NO_TIMESTAMP.
      
      Fixes: afb84b62 ("pktgen: add flag NO_TIMESTAMP to disable timestamping")
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f1f00d8f
    • David S. Miller's avatar
      Merge branch 'netns-scalability' · 4d95b72f
      David S. Miller authored
      Nicolas Dichtel says:
      
      ====================
      netns: ease netlink use with a lot of netns
      
      This idea was informally discussed in Ottawa / netdev0.1. The goal is to
      ease the use/scalability of netns, from a userland point of view.
      Today, users need to open one netlink socket per family and per netns.
      Thus, when the number of netns inscreases (for example 5K or more), the
      number of sockets needed to manage them grows a lot.
      
      The goal of this series is to be able to monitor netlink events, for a
      specified family, for a set of netns, with only one netlink socket. For
      this purpose, a netlink socket option is added: NETLINK_LISTEN_ALL_NSID.
      When this option is set on a netlink socket, this socket will receive
      netlink notifications from all netns that have a nsid assigned into the
      netns where the socket has been opened.
      The nsid is sent to userland via an anscillary data.
      
      Here is an example with a patched iproute2. vxlan10 is created in the
      current netns (netns0, nsid 0) and then moved to another netns (netns1,
      nsid 1):
      
      $ ip netns exec netns0 ip monitor all-nsid label
      [nsid 0][NSID]nsid 1 (iproute2 netns name: netns1)
      [nsid 0][NEIGH]??? lladdr 00:00:00:00:00:00 REACHABLE,PERMANENT
      [nsid 0][LINK]5: vxlan10@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default
          link/ether 92:33:17:e6:e7:1d brd ff:ff:ff:ff:ff:ff
      [nsid 0][LINK]Deleted 5: vxlan10@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default
          link/ether 92:33:17:e6:e7:1d brd ff:ff:ff:ff:ff:ff
      [nsid 1][NSID]nsid 0 (iproute2 netns name: netns0)
      [nsid 1][LINK]5: vxlan10@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default
          link/ether 92:33:17:e6:e7:1d brd ff:ff:ff:ff:ff:ff link-netnsid 0
      [nsid 1][ADDR]5: vxlan10    inet 192.168.0.249/24 brd 192.168.0.255 scope global vxlan10
             valid_lft forever preferred_lft forever
      [nsid 1][ROUTE]local 192.168.0.249 dev vxlan10  table local  proto kernel  scope host  src 192.168.0.249
      [nsid 1][ROUTE]ff00::/8 dev vxlan10  table local  metric 256  pref medium
      [nsid 1][ROUTE]2001:123::/64 dev vxlan10  proto kernel  metric 256  pref medium
      [nsid 1][LINK]5: vxlan10@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
          link/ether 92:33:17:e6:e7:1d brd ff:ff:ff:ff:ff:ff link-netnsid 0
      [nsid 1][ROUTE]broadcast 192.168.0.255 dev vxlan10  table local  proto kernel  scope link  src 192.168.0.249
      [nsid 1][ROUTE]192.168.0.0/24 dev vxlan10  proto kernel  scope link  src 192.168.0.249
      [nsid 1][ROUTE]broadcast 192.168.0.0 dev vxlan10  table local  proto kernel  scope link  src 192.168.0.249
      [nsid 1][ROUTE]fe80::/64 dev vxlan10  proto kernel  metric 256  pref medium
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4d95b72f
    • Nicolas Dichtel's avatar
      netlink: allow to listen "all" netns · 59324cf3
      Nicolas Dichtel authored
      More accurately, listen all netns that have a nsid assigned into the netns
      where the netlink socket is opened.
      For this purpose, a netlink socket option is added:
      NETLINK_LISTEN_ALL_NSID. When this option is set on a netlink socket, this
      socket will receive netlink notifications from all netns that have a nsid
      assigned into the netns where the socket has been opened. The nsid is sent
      to userland via an anscillary data.
      
      With this patch, a daemon needs only one socket to listen many netns. This
      is useful when the number of netns is high.
      
      Because 0 is a valid value for a nsid, the field nsid_is_set indicates if
      the field nsid is valid or not. skb->cb is initialized to 0 on skb
      allocation, thus we are sure that we will never send a nsid 0 by error to
      the userland.
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      59324cf3
    • Nicolas Dichtel's avatar
      netlink: rename private flags and states · cc3a572f
      Nicolas Dichtel authored
      These flags and states have the same prefix (NETLINK_) that netlink socket
      options. To avoid confusion and to be able to name a flag like a socket
      option, let's use an other prefix: NETLINK_[S|F]_.
      
      Note: a comment has been fixed, it was talking about
      NETLINK_RECV_NO_ENOBUFS socket option instead of NETLINK_NO_ENOBUFS.
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cc3a572f
    • Nicolas Dichtel's avatar
      netns: use a spin_lock to protect nsid management · 95f38411
      Nicolas Dichtel authored
      Before this patch, nsid were protected by the rtnl lock. The goal of this
      patch is to be able to find a nsid without needing to hold the rtnl lock.
      
      The next patch will introduce a netlink socket option to listen to all
      netns that have a nsid assigned into the netns where the socket is opened.
      Thus, it's important to call rtnl_net_notifyid() outside the spinlock, to
      avoid a recursive lock (nsid are notified via rtnl). This was the main
      reason of the previous patch.
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      95f38411
    • Nicolas Dichtel's avatar
      netns: notify new nsid outside __peernet2id() · 3138dbf8
      Nicolas Dichtel authored
      There is no functional change with this patch. It will ease the refactoring
      of the locking system that protects nsids and the support of the netlink
      socket option NETLINK_LISTEN_ALL_NSID.
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3138dbf8
    • Nicolas Dichtel's avatar
      netns: rename peernet2id() to peernet2id_alloc() · 7a0877d4
      Nicolas Dichtel authored
      In a following commit, a new function will be introduced to only lookup for
      a nsid (no allocation if the nsid doesn't exist). To avoid confusion, the
      existing function is renamed.
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7a0877d4
    • Nicolas Dichtel's avatar
      netns: always provide the id to rtnl_net_fill() · cab3c8ec
      Nicolas Dichtel authored
      The goal of this commit is to prepare the rework of the locking of nsnid
      protection.
      After this patch, rtnl_net_notifyid() will not call anymore __peernet2id(),
      ie no idr_* operation into this function.
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cab3c8ec
    • Nicolas Dichtel's avatar
      netns: returns always an id in __peernet2id() · 109582af
      Nicolas Dichtel authored
      All callers of this function expect a nsid, not an error.
      Thus, returns NETNSA_NSID_NOT_ASSIGNED in case of error so that callers
      don't have to convert the error to NETNSA_NSID_NOT_ASSIGNED.
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      109582af
    • David S. Miller's avatar
      Merge tag 'linux-can-next-for-4.2-20150506' of... · 43996fdd
      David S. Miller authored
      Merge tag 'linux-can-next-for-4.2-20150506' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can-next 2015-05-06
      
      this is a pull request of a seven patches for net-next/master.
      
      Andreas Gröger contributes two patches for the janz-ican3 driver. In
      the first patch, the documentation for already existing sysfs entries
      is added, the second patch adds support for another module/firmware
      variant. A patch by Shawn Landden makes the padding in the struct
      can_frame explicit. The next 4 patches target the flexcan driver, the
      first one is by David Jander adding some documentation, the reaming
      three by me add more documentation and two small code cleanups.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      43996fdd
  3. 09 May, 2015 6 commits
    • Harini Katakam's avatar
      net: macb: Add change_mtu callback with jumbo support · a5898ea0
      Harini Katakam authored
      Add macb_change_mtu callback; if jumbo frame support is present allow
      mtu size changes upto (jumbo max length allowed - headers).
      Signed-off-by: default avatarHarini Katakam <harinik@xilinx.com>
      Reviewed-by: default avatarPunnaiah Choudary Kalluri <punnaia@xilinx.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a5898ea0
    • Harini Katakam's avatar
      net: macb: Add support for jumbo frames · 98b5a0f4
      Harini Katakam authored
      Enable jumbo frame support for Zynq Ultrascale+ MPSoC.
      Update the NWCFG register and descriptor length masks accordingly.
      Jumbo max length register should be set according to support in SoC; it is
      set to 10240 for Zynq Ultrascale+ MPSoC.
      Signed-off-by: default avatarHarini Katakam <harinik@xilinx.com>
      Reviewed-by: default avatarPunnaiah Choudary Kalluri <punnaia@xilinx.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      98b5a0f4
    • Harini Katakam's avatar
      net: macb: Add compatible string for Zynq Ultrascale+ MPSoC · 7b61f9c1
      Harini Katakam authored
      Add compatible string and config structure for Zynq Ultrascale+ MPSoC
      Signed-off-by: default avatarHarini Katakam <harinik@xilinx.com>
      Reviewed-by: default avatarPunnaiah Choudary Kalluri <punnaia@xilinx.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7b61f9c1
    • Harini Katakam's avatar
      devicetree: Add compatible string for Zynq Ultrascale+ MPSoC · 988d6f07
      Harini Katakam authored
      Add "cdns,zynqmp-gem" to be used for Zynq Ultrascale+ MPSoC.
      Signed-off-by: default avatarHarini Katakam <harinik@xilinx.com>
      Reviewed-by: default avatarPunnaiah Choudary Kalluri <punnaia@xilinx.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      988d6f07
    • Jason Baron's avatar
      tcp: set SOCK_NOSPACE under memory pressure · 790ba456
      Jason Baron authored
      Under tcp memory pressure, calling epoll_wait() in edge triggered
      mode after -EAGAIN, can result in an indefinite hang in epoll_wait(),
      even when there is sufficient memory available to continue making
      progress. The problem is that when __sk_mem_schedule() returns 0
      under memory pressure, we do not set the SOCK_NOSPACE flag in the
      tcp write paths (tcp_sendmsg() or do_tcp_sendpages()). Then, since
      SOCK_NOSPACE is used to trigger wakeups when incoming acks create
      sufficient new space in the write queue, all outstanding packets
      are acked, but we never wake up with the the EPOLLOUT that we are
      expecting from epoll_wait().
      
      This issue is currently limited to epoll() when used in edge trigger
      mode, since 'tcp_poll()', does in fact currently set SOCK_NOSPACE.
      This is sufficient for poll()/select() and epoll() in level trigger
      mode. However, in edge trigger mode, epoll() is relying on the write
      path to set SOCK_NOSPACE. EPOLL(7) says that in edge-trigger mode we
      can only call epoll_wait() after read/write return -EAGAIN. Thus, in
      the case of the socket write, we are relying on the fact that
      tcp_sendmsg()/network write paths are going to issue a wakeup for
      us at some point in the future when we get -EAGAIN.
      
      Normally, epoll() edge trigger works fine when we've exceeded the
      sk->sndbuf because in that case we do set SOCK_NOSPACE. However, when
      we return -EAGAIN from the write path b/c we are over the tcp memory
      limits and not b/c we are over the sndbuf, we are never going to get
      another wakeup.
      
      I can reproduce this issue, using SO_SNDBUF, since __sk_mem_schedule()
      will return 0, or failure more readily with SO_SNDBUF:
      
      1) create socket and set SO_SNDBUF to N
      2) add socket as edge trigger
      3) write to socket and block in epoll on -EAGAIN
      4) cause tcp mem pressure via: echo "<small val>" > net.ipv4.tcp_mem
      
      The fix here is simply to set SOCK_NOSPACE in sk_stream_wait_memory()
      when the socket is non-blocking. Note that SOCK_NOSPACE, in addition
      to waking up outstanding waiters is also used to expand the size of
      the sk->sndbuf. However, we will not expand it by setting it in this
      case because tcp_should_expand_sndbuf(), ensures that no expansion
      occurs when we are under tcp memory pressure.
      
      Note that we could still hang if sk->sk_wmem_queue is 0, when we get
      the -EAGAIN. In this case the SOCK_NOSPACE bit will not help, since we
      are waiting for and event that will never happen. I believe
      that this case is harder to hit (and did not hit in my testing),
      in that over the tcp 'soft' memory limits, we continue to guarantee a
      minimum write buffer size. Perhaps, we could return -ENOSPC in this
      case, or maybe we simply issue a wakeup in this case, such that we
      keep retrying the write. Note that this case is not specific to
      epoll() ET, but rather would affect blocking sockets as well. So I
      view this patch as bringing epoll() edge-trigger into sync with the
      current poll()/select()/epoll() level trigger and blocking sockets
      behavior.
      Signed-off-by: default avatarJason Baron <jbaron@akamai.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      790ba456
    • Claudiu Manoil's avatar
      gianfar: Enable changing mac addr when if up · 3d23a05c
      Claudiu Manoil authored
      Use device flag IFF_LIVE_ADDR_CHANGE to signal that
      the device supports changing the hardware address when
      the device is running.
      This allows eth_mac_addr() to change the mac address
      also when the network device's interface is open.
      This capability is required by certain applications,
      like bonding mode 6 (Adaptive Load Balancing).
      Signed-off-by: default avatarClaudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3d23a05c