1. 07 Jun, 2015 7 commits
    • Sven Eckelmann's avatar
      batman-adv: Use common Jenkins Hash implementation · 36fd61cb
      Sven Eckelmann authored
      An unoptimized version of the Jenkins one-at-a-time hash function is used
      and partially copied all over the code wherever an hashtable is used.
      Instead the optimized version shared between the whole kernel should be
      used to reduce code duplication and use better optimized code.
      
      Only the DAT code must use the old implementation because it is used as
      distributed hash function which has to be common for all nodes.
      Signed-off-by: default avatarSven Eckelmann <sven@narfation.org>
      Signed-off-by: default avatarMarek Lindner <mareklindner@neomailbox.ch>
      36fd61cb
    • Alexei Starovoitov's avatar
      bpf: allow programs to write to certain skb fields · d691f9e8
      Alexei Starovoitov authored
      allow programs read/write skb->mark, tc_index fields and
      ((struct qdisc_skb_cb *)cb)->data.
      
      mark and tc_index are generically useful in TC.
      cb[0]-cb[4] are primarily used to pass arguments from one
      program to another called via bpf_tail_call() which can
      be seen in sockex3_kern.c example.
      
      All fields of 'struct __sk_buff' are readable to socket and tc_cls_act progs.
      mark, tc_index are writeable from tc_cls_act only.
      cb[0]-cb[4] are writeable by both sockets and tc_cls_act.
      
      Add verifier tests and improve sample code.
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d691f9e8
    • Alexei Starovoitov's avatar
      bpf: make programs see skb->data == L2 for ingress and egress · 3431205e
      Alexei Starovoitov authored
      eBPF programs attached to ingress and egress qdiscs see inconsistent skb->data.
      For ingress L2 header is already pulled, whereas for egress it's present.
      This is known to program writers which are currently forced to use
      BPF_LL_OFF workaround.
      Since programs don't change skb internal pointers it is safe to do
      pull/push right around invocation of the program and earlier taps and
      later pt->func() will not be affected.
      Multiple taps via packet_rcv(), tpacket_rcv() are doing the same trick
      around run_filter/BPF_PROG_RUN even if skb_shared.
      
      This fix finally allows programs to use optimized LD_ABS/IND instructions
      without BPF_LL_OFF for higher performance.
      tc ingress + cls_bpf + samples/bpf/tcbpf1_kern.o
             w/o JIT   w/JIT
      before  20.5     23.6 Mpps
      after   21.8     26.6 Mpps
      
      Old programs with BPF_LL_OFF will still work as-is.
      
      We can now undo most of the earlier workaround commit:
      a166151c ("bpf: fix bpf helpers to use skb->mac_header relative offsets")
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3431205e
    • Eric Dumazet's avatar
      tcp: remove redundant checks II · 98da81a4
      Eric Dumazet authored
      For same reasons than in commit 12e25e10 ("tcp: remove redundant
      checks"), we can remove redundant checks done for timewait sockets.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      98da81a4
    • Colin Ian King's avatar
      fddi: print an address with %p format specifier rather than %x · 908e80d6
      Colin Ian King authored
      The debug is printing the struct smt_header * address using
      the %x format specifier. Fix it to use %p instead.
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      908e80d6
    • Nicholas Mc Guire's avatar
      wan: dscc4: fix build warning Wunused-but-set-variable · c5726d26
      Nicholas Mc Guire authored
      Fix:
      drivers/net/wan/dscc4.c: In function 'dscc4_open':
      drivers/net/wan/dscc4.c:1049:25: warning: variable 'ppriv' set but not used
      [-Wunused-but-set-variable]
      
      This has been in there unused since 1da177e4 (Linux-2.6.12-rc2) simply
      remove it.
      Signed-off-by: default avatarNicholas Mc Guire <hofrat@osadl.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c5726d26
    • Eric Dumazet's avatar
      inet: add IP_BIND_ADDRESS_NO_PORT to overcome bind(0) limitations · 90c337da
      Eric Dumazet authored
      When an application needs to force a source IP on an active TCP socket
      it has to use bind(IP, port=x).
      
      As most applications do not want to deal with already used ports, x is
      often set to 0, meaning the kernel is in charge to find an available
      port.
      But kernel does not know yet if this socket is going to be a listener or
      be connected.
      It has very limited choices (no full knowledge of final 4-tuple for a
      connect())
      
      With limited ephemeral port range (about 32K ports), it is very easy to
      fill the space.
      
      This patch adds a new SOL_IP socket option, asking kernel to ignore
      the 0 port provided by application in bind(IP, port=0) and only
      remember the given IP address.
      
      The port will be automatically chosen at connect() time, in a way
      that allows sharing a source port as long as the 4-tuples are unique.
      
      This new feature is available for both IPv4 and IPv6 (Thanks Neal)
      
      Tested:
      
      Wrote a test program and checked its behavior on IPv4 and IPv6.
      
      strace(1) shows sequences of bind(IP=127.0.0.2, port=0) followed by
      connect().
      Also getsockname() show that the port is still 0 right after bind()
      but properly allocated after connect().
      
      socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 5
      setsockopt(5, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
      bind(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, 16) = 0
      getsockname(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0
      connect(5, {sa_family=AF_INET, sin_port=htons(53174), sin_addr=inet_addr("127.0.0.3")}, 16) = 0
      getsockname(5, {sa_family=AF_INET, sin_port=htons(38050), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0
      
      IPv6 test :
      
      socket(PF_INET6, SOCK_STREAM, IPPROTO_IP) = 7
      setsockopt(7, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
      bind(7, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
      getsockname(7, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
      connect(7, {sa_family=AF_INET6, sin6_port=htons(57300), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
      getsockname(7, {sa_family=AF_INET6, sin6_port=htons(60964), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
      
      I was able to bind()/connect() a million concurrent IPv4 sockets,
      instead of ~32000 before patch.
      
      lpaa23:~# ulimit -n 1000010
      lpaa23:~# ./bind --connect --num-flows=1000000 &
      1000000 sockets
      
      lpaa23:~# grep TCP /proc/net/sockstat
      TCP: inuse 2000063 orphan 0 tw 47 alloc 2000157 mem 66
      
      Check that a given source port is indeed used by many different
      connections :
      
      lpaa23:~# ss -t src :40000 | head -10
      State      Recv-Q Send-Q   Local Address:Port          Peer Address:Port
      ESTAB      0      0           127.0.0.2:40000         127.0.202.33:44983
      ESTAB      0      0           127.0.0.2:40000         127.2.27.240:44983
      ESTAB      0      0           127.0.0.2:40000           127.2.98.5:44983
      ESTAB      0      0           127.0.0.2:40000        127.0.124.196:44983
      ESTAB      0      0           127.0.0.2:40000         127.2.139.38:44983
      ESTAB      0      0           127.0.0.2:40000          127.1.59.80:44983
      ESTAB      0      0           127.0.0.2:40000          127.3.6.228:44983
      ESTAB      0      0           127.0.0.2:40000          127.0.38.53:44983
      ESTAB      0      0           127.0.0.2:40000         127.1.197.10:44983
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      90c337da
  2. 06 Jun, 2015 8 commits
  3. 04 Jun, 2015 25 commits