1. 25 Jan, 2017 13 commits
    • net/tcp-fastopen: Add new API support · 19f6d3f3
      Wei Wang authored
      This patch adds a new socket option, TCP_FASTOPEN_CONNECT, as an
      alternative way to perform Fast Open on the active side (client). Prior
      to this patch, a client had to replace the connect() call with
      sendto(MSG_FASTOPEN). This can be cumbersome for applications that want
      to use Fast Open: these socket operations are often done in lower-layer
      libraries used by many other applications. Changing these libraries
      and/or the socket call sequences is not trivial. A more convenient
      approach is to perform Fast Open by simply enabling a socket option when
      the socket is created, without changing the rest of the socket call sequence:
        s = socket()
          create a new socket
        setsockopt(s, IPPROTO_TCP, TCP_FASTOPEN_CONNECT …);
          newly introduced sockopt
          If set, new functionality described below will be used.
          Return ENOTSUPP if TFO is not supported or not enabled in the
          kernel.
      
        connect()
          With cookie present, return 0 immediately.
          With no cookie, initiate 3WHS with TFO cookie-request option and
          return -1 with errno = EINPROGRESS.
      
        write()/sendmsg()
          With cookie present, send out SYN with data and return the number of
          bytes buffered.
          With no cookie, and 3WHS not yet completed, return -1 with errno =
          EINPROGRESS.
          No MSG_FASTOPEN flag is needed.
      
        read()
          Return -1 with errno = EWOULDBLOCK/EAGAIN if connect() is called but
          write() is not called yet.
          Return -1 with errno = EWOULDBLOCK/EAGAIN if connection is
          established but no msg is received yet.
          Return number of bytes read if socket is established and there is
          msg received.
      
      The new API simplifies life for applications that always perform a write()
      immediately after a successful connect(). Such applications can now take
      advantage of Fast Open by merely making one new setsockopt() call at the time
      of creating the socket. Nothing else about the application's socket call
      sequence needs to change.
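
The call sequence described above can be sketched from user space as follows. This is a minimal illustration, not code from the patch: the helper names `enable_tfo_connect` and `send_request` are hypothetical, and the fallback value 30 for `TCP_FASTOPEN_CONNECT` is taken from `<linux/tcp.h>` and used only if the Python `socket` module does not export the constant.

```python
import socket

# Assumption: 30 is the value of TCP_FASTOPEN_CONNECT in <linux/tcp.h>;
# it is used only when the socket module does not provide the constant.
TCP_FASTOPEN_CONNECT = getattr(socket, "TCP_FASTOPEN_CONNECT", 30)


def enable_tfo_connect(sock):
    """Try to enable TCP_FASTOPEN_CONNECT; return True if the kernel accepted it."""
    try:
        sock.setsockopt(socket.IPPROTO_TCP, TCP_FASTOPEN_CONNECT, 1)
    except OSError:
        # Option unknown, or Fast Open not supported/enabled on this system.
        return False
    return True


def send_request(host, port, payload):
    """Hypothetical client: the only TFO-specific line is the setsockopt call."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        enable_tfo_connect(sock)      # the one new call, made right after socket()
        sock.connect((host, port))    # unchanged application code
        sock.sendall(payload)         # no MSG_FASTOPEN flag needed
        return sock.recv(4096)
```

If `enable_tfo_connect()` returns False, the rest of the sequence still works as an ordinary TCP connection, which is the point of the new option: the application's connect()/write()/read() calls stay the same either way.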
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Remove __sk_dst_reset() in tcp_v6_connect() · 25776aa9
      Wei Wang authored
      Remove __sk_dst_reset() in the failure handling because __sk_dst_reset()
      will eventually get called when sk is released. No need to handle it in
      the protocol specific connect call.
      This is also to make the code path consistent with ipv4.
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/tcp-fastopen: refactor cookie check logic · 065263f4
      Wei Wang authored
      Refactor the cookie check logic in tcp_send_syn_data() into a function.
      This function will be called elsewhere in later changes.
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8152: fix the wrong spelling · a9c54ad2
      hayeswang authored
      Replace rumtime with runtime.
      Signed-off-by: Hayes Wang <hayeswang@realtek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Doc: DT: bindings: net: dsa: marvell.txt: Tabification · d2345599
      Andrew Lunn authored
      Replace spaces with tabs. Fix indentation to be multiples of tabs, not
      a mixture of tabs and spaces.
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bpf-tracepoints' · cca316f3
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      BPF tracepoints
      
      This set adds tracepoints to BPF for better introspection and
      debugging. The first two patches are prerequisites for the actual
      third patch that adds the tracepoints. I think the first two are
      small and straightforward enough that they could ideally go via
      net-next, but I'm also open to other suggestions on how to route
      them in case that's not applicable (it would reduce potential
      merge conflicts on the BPF side, though). For details, please see
      the individual patches.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: add initial bpf tracepoints · a67edbf4
      Daniel Borkmann authored
      This work adds a number of tracepoints to paths that are either
      considered slow-path or exception-like states, where monitoring or
      inspecting them would be desirable.
      
      For the bpf(2) syscall, tracepoints have been placed for the main commands
      when they succeed. In the XDP case, the tracepoint is for exceptions, that
      is, for example on abnormal BPF program exit such as an unknown or XDP_ABORTED
      return code, or when an error occurs during the XDP_TX action and the packet
      could not be forwarded.

      Both have been split into separate event headers, and can be further
      extended. Worst case, if they unexpectedly get in our way in the
      future, they can also be removed [1]. Of course, these tracepoints (like
      any other) can be analyzed by eBPF itself, etc. Example output:
      
        # ./perf record -a -e bpf:* sleep 10
        # ./perf script
        sock_example  6197 [005]   283.980322:      bpf:bpf_map_create: map type=ARRAY ufd=4 key=4 val=8 max=256 flags=0
        sock_example  6197 [005]   283.980721:       bpf:bpf_prog_load: prog=a5ea8fa30ea6849c type=SOCKET_FILTER ufd=5
        sock_example  6197 [005]   283.988423:   bpf:bpf_prog_get_type: prog=a5ea8fa30ea6849c type=SOCKET_FILTER
        sock_example  6197 [005]   283.988443: bpf:bpf_map_lookup_elem: map type=ARRAY ufd=4 key=[06 00 00 00] val=[00 00 00 00 00 00 00 00]
        [...]
        sock_example  6197 [005]   288.990868: bpf:bpf_map_lookup_elem: map type=ARRAY ufd=4 key=[01 00 00 00] val=[14 00 00 00 00 00 00 00]
             swapper     0 [005]   289.338243:    bpf:bpf_prog_put_rcu: prog=a5ea8fa30ea6849c type=SOCKET_FILTER
      
        [1] https://lwn.net/Articles/705270/
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • lib, traceevent: add PRINT_HEX_STR variant · 0fe05591
      Daniel Borkmann authored
      Add support for the __print_hex_str() macro that was added for
      tracing, so that user space tools such as perf can understand
      it as well.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • trace: add variant without spacing in trace_print_hex_seq · 2acae0d5
      Daniel Borkmann authored
      For upcoming tracepoint support for BPF, we want to dump the program's
      tag. Format should be similar to __print_hex(), but without spacing.
      Add a __print_hex_str() variant for exactly that purpose that reuses
      trace_print_hex_seq().
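
The difference between the two output formats can be illustrated with a small user-space sketch. The function names `hex_spaced` and `hex_compact` are hypothetical stand-ins for the formatting behavior of __print_hex() and __print_hex_str(); they are not kernel code. The sample bytes are the program tag that appears in the perf output of the bpf tracepoints commit.

```python
def hex_spaced(buf):
    """Format bytes as two-digit hex values separated by spaces (__print_hex() style)."""
    return " ".join(f"{b:02x}" for b in buf)


def hex_compact(buf):
    """Format bytes as one contiguous hex string (__print_hex_str() style)."""
    return buf.hex()


tag = bytes.fromhex("a5ea8fa30ea6849c")
print(hex_spaced(tag))   # a5 ea 8f a3 0e a6 84 9c
print(hex_compact(tag))  # a5ea8fa30ea6849c
```

The compact form is what makes a program tag usable as a single token in trace output, for example `prog=a5ea8fa30ea6849c`.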
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: reduce skb overhead in selected places · 60b1af33
      Eric Dumazet authored
      tcp_add_backlog() can use skb_condense() helper to get better
      gains and less SKB_TRUESIZE() magic. This only happens when socket
      backlog has to be used.
      
      Some attacks involve specially crafted out of order tiny TCP packets,
      clogging the ofo queue of (many) sockets.
      Then later, expensive collapse happens, trying to copy all these skbs
      into single ones.
      This unfortunately does not work if each skb has no neighbor in TCP
      sequence order.
      
      By using skb_condense() if the skb could not be coalesced to a prior
      one, we defeat this kind of threat, potentially saving 4K per skb
      (or more, since this is one page fragment).
      
      A typical NAPI driver allocates gro packets with GRO_MAX_HEAD bytes
      in skb->head, meaning the copy done by skb_condense() is limited to
      about 200 bytes.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mlx5-updates-2017-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 716dcaeb
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5-updates-2017-24-01
      
      The first seven patches from Or Gerlitz in this series further enhance
      the mlx5 SRIOV switchdev mode to support offloading IPv6 tunnels using the
      TC tunnel key set (encap) and unset (decap) actions.
      
      Or Gerlitz says:
      ========================
      As part of this change, a few cleanups are done in the IPv4 code.
      Later we move to using the full tunnel key info provided to the driver as
      the key for our internal hashing, which is used to identify cases where
      the same tunnel is used for encapsulating multiple flows. As in the
      IPv4 case, the control path for offloading IPv6 tunnels uses route/neigh
      lookups and construction of the IPv6 tunnel headers on the encap path and
      matching on the outer headers in the decap path.

      The last patch of the series enlarges the HW FDB size for switchdev mode,
      so it now has room for as many offloaded flows as min(max number
      of HW flow counters supported, max HW table size supported).
      ========================
      
      Alongside Or's series there are several patches covering other topics.
      
      From Mohamad, add support for SRIOV VF min rate guarantee by using the
      TSAR BW share weights mechanism.
      
      From Or, two patches to enable Eth VFs to query their min-inline value for
      user space. For that, we move an mlx5 low-level min-inline helper function
      from the mlx5 ethernet driver into the core driver and then use it in
      mlx5_ib to expose the inline mode to RDMA applications through libmlx5.
      
      From Kamal Heib, Reduce memory consumption on kdump kernel.
      
      From Shaker Daibes, code reuse in CQE compression control logic
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: uninitialized return code in tipc_setsockopt() · a08ef476
      Dan Carpenter authored
      We shuffled some code around and added some new case statements here and
      now "res" isn't initialized on all paths.
      
      Fixes: 01fd12bb ("tipc: make replicast a user selectable option")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net sched actions: Add support for user cookies · 1045ba77
      Jamal Hadi Salim authored
      Introduce optional 128-bit action cookie.
      Like all other cookie schemes in the networking world (e.g. in protocols
      like HTTP or the existing kernel fib protocol field, etc.) the idea is to save
      user state that, when retrieved, serves as a correlator. The kernel
      _should not_ interpret it.  The user can store whatever they wish in the
      128 bits.
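
As a rough user-space illustration of the size constraint, a tool could check a hex cookie string before handing it to tc. This is only a sketch under stated assumptions: `parse_action_cookie` is a hypothetical helper, not part of iproute2 or the kernel, and the rule that the cookie is written as an even number of hex digits is an assumption of this example. Only the 128-bit upper bound and the variable-length use come from the commit message.

```python
MAX_COOKIE_BYTES = 16  # 128 bits, per the commit message


def parse_action_cookie(text):
    """Convert a hex string such as 'a1b2c3d4' to bytes, enforcing the 128-bit limit."""
    cookie = bytes.fromhex(text)  # raises ValueError on non-hex or odd-length input
    if not cookie:
        raise ValueError("cookie must not be empty")
    if len(cookie) > MAX_COOKIE_BYTES:
        raise ValueError("cookie longer than 128 bits")
    return cookie


cookie = parse_action_cookie("a1b2c3d4")
print(len(cookie), cookie.hex())  # 4 a1b2c3d4
```

Because the kernel treats the value as opaque, any further structure (for example, packing an application-level flow ID into the bytes) is entirely up to the user.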
      
      Sample exercise (showing variable-length use of the cookie)
      
      .. create an accept action with cookie a1b2c3d4
      sudo $TC actions add action ok index 1 cookie a1b2c3d4
      
      .. dump all gact actions..
      sudo $TC -s actions ls action gact
      
          action order 0: gact action pass
           random type none pass val 0
           index 1 ref 1 bind 0 installed 5 sec used 5 sec
          Action statistics:
          Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
          backlog 0b 0p requeues 0
          cookie a1b2c3d4
      
      .. bind the accept action to a filter..
      sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
      u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1
      
      ... send some traffic..
      $ ping 127.0.0.1 -c 3
      PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
      64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms
      64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
      64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 24 Jan, 2017 27 commits