1. 11 Feb, 2016 12 commits
    • Merge branch 'tcp-fast-so_reuseport' · fd1914b2
      David S. Miller authored
      Craig Gallek says:
      
      ====================
      Faster SO_REUSEPORT for TCP
      
      This patch series complements an earlier series (6a5ef90c)
      which added faster SO_REUSEPORT lookup for UDP sockets by
      extending the feature to TCP sockets.  It uses the same
      array-based data structure which allows for socket selection
      after finding the first listening socket that matches an incoming
      packet.  Prior to this feature, every socket in the reuseport
      group needed to be found and examined before a selection could be
      made.
      
      With this series the SO_ATTACH_REUSEPORT_CBPF and
      SO_ATTACH_REUSEPORT_EBPF socket options now work for TCP sockets
      as well.  The test at the end of the series includes an example of
      how to use these options to select a reuseport socket based on the
      cpu core id handling the incoming packet.
      
      There are several refactoring patches that precede the feature
      implementation.  Only the last two patches in this series
      should result in any behavioral changes.
      
      v4:
      - Fix build issue when compiling IPv6 as a module.  This required
        moving the ipv6_rcv_saddr_equal into an object that is included as a
        built-in object.  I included this change in the second patch which
        adds inet6_hash since that is where ipv6_rcv_saddr_equal will
        later be called from non-module code.
      
      v3:
      - Another warning in the first patch caught by a build bot.  Return 0 in
        the no-op UDP hash function.
      
      v2:
      - In the first patch I missed a couple of hash functions that should now be
        returning int instead of void.  I missed these the first time through as it
        only generated a warning and not an error :\
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fd1914b2
    • soreuseport: BPF selection functional test for TCP · 4b2a6aed
      Craig Gallek authored
      Unfortunately the existing test relied on packet payload in order to
      map incoming packets to sockets.  In order to get this to work with TCP,
      TCP_FASTOPEN needed to be used.
      
      Since the fast open path is slightly different than the standard TCP path,
      I created a second test which sends to reuseport group members based
      on receiving cpu core id.  This will probably serve as a better
      real-world example use as well.
      Signed-off-by: Craig Gallek <kraig@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4b2a6aed
    • soreuseport: fast reuseport TCP socket selection · c125e80b
      Craig Gallek authored
      This change extends the fast SO_REUSEPORT socket lookup implemented
      for UDP to TCP.  Listener sockets with SO_REUSEPORT and the same
      receive address are additionally added to an array for faster
      random access.  This means that only a single socket from the group
      must be found in the listener list before any socket in the group can
      be used to receive a packet.  Previously, every socket in the group
      needed to be considered before handing off the incoming packet.
      
      This feature also exposes the ability to use a BPF program when
      selecting a socket from a reuseport group.
      Signed-off-by: Craig Gallek <kraig@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c125e80b
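The array-based selection described above can be modelled in userspace roughly as follows. This is a sketch under stated assumptions, not the kernel code: `struct reuse_group`, `scale_hash` and `select_sock` are hypothetical names, and the multiply-and-shift mapping from hash to index is assumed here for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a reuseport group:
 * members are stored in an array in the order they joined. */
struct reuse_group {
    size_t num_socks;
    int socks[8];
};

/* Map a 32-bit hash onto [0, n) with a multiply and shift (no division). */
uint32_t scale_hash(uint32_t hash, uint32_t n)
{
    return (uint32_t)(((uint64_t)hash * n) >> 32);
}

/* Once any member of the group has been found, every member is reachable
 * by index, so one array access picks the receiver. Returns -1 if empty. */
int select_sock(const struct reuse_group *g, uint32_t pkt_hash)
{
    if (g->num_socks == 0)
        return -1;
    return g->socks[scale_hash(pkt_hash, (uint32_t)g->num_socks)];
}
```

The point of the array is that selection cost no longer grows with the number of listeners that must be walked in the hash list.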
    • soreuseport: Prep for fast reuseport TCP socket selection · fa463497
      Craig Gallek authored
      Both of the lines in this patch probably should have been included
      in the initial implementation of this code for generic socket
      support, but weren't technically necessary since only UDP sockets
      were supported.
      
      First, the sk_reuseport_cb points to a structure which assumes
      each socket in the group has this pointer assigned at the same
      time it's added to the array in the structure.  The sk_clone_lock
      function breaks this assumption.  Since a child socket shouldn't
      implicitly be in a reuseport group, the simple fix is to clear
      the field in the clone.
      
      Second, the SO_ATTACH_REUSEPORT_xBPF socket options require that
      SO_REUSEPORT also be set first.  For UDP sockets, this is easily
      enforced at bind-time since that process both puts the socket in
      the appropriate receive hlist and updates the reuseport structures.
      Since these operations can happen at two different times for TCP
      sockets (bind and listen) it must be explicitly checked to enforce
      the use of SO_REUSEPORT with SO_ATTACH_REUSEPORT_xBPF in the
      setsockopt call.
      Signed-off-by: Craig Gallek <kraig@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fa463497
    • inet: refactor inet[6]_lookup functions to take skb · a583636a
      Craig Gallek authored
      This is a preliminary step to allow fast socket lookup of SO_REUSEPORT
      groups.  Doing so with a BPF filter will require access to the
      skb in question.  This change plumbs the skb (and offset to payload
      data) through the call stack to the listening socket lookup
      implementations where it will be used in a following patch.
      Signed-off-by: Craig Gallek <kraig@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a583636a
    • tcp: __tcp_hdrlen() helper · d9b3fca2
      Craig Gallek authored
      tcp_hdrlen is wasteful if you already have a pointer to struct tcphdr.
      This splits the size calculation into a helper function that can be
      used if a struct tcphdr is already available.
      Signed-off-by: Craig Gallek <kraig@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d9b3fca2
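The split described above can be illustrated with a userspace sketch that works on raw header bytes instead of kernel structures. The names `hdrlen_from_tcp` and `hdrlen_from_packet` are hypothetical; the only fact relied on is that the TCP data offset sits in the upper four bits of byte 12 and counts 32-bit words.

```c
#include <stddef.h>
#include <stdint.h>

/* Helper for callers that already hold a pointer to the TCP header:
 * the data offset field counts 32-bit words, so multiply by 4. */
unsigned int hdrlen_from_tcp(const uint8_t *th)
{
    return (unsigned int)(th[12] >> 4) * 4;
}

/* Wrapper for callers that only have the packet: locate the TCP
 * header first, then delegate to the helper above. */
unsigned int hdrlen_from_packet(const uint8_t *pkt, size_t tcp_offset)
{
    return hdrlen_from_tcp(pkt + tcp_offset);
}
```

A caller that has already located the header uses the first function directly and skips the lookup step.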
    • inet: create IPv6-equivalent inet_hash function · 496611d7
      Craig Gallek authored
      In order to support fast lookups for TCP sockets with SO_REUSEPORT,
      the function that adds sockets to the listening hash set needs
      to be able to check receive address equality.  Since this equality
      check is different for IPv4 and IPv6, we will need two different
      socket hashing functions.
      
      This patch adds inet6_hash identical to the existing inet_hash function
      and updates the appropriate references.  A following patch will
      differentiate the two by passing different comparison functions to
      __inet_hash.
      
      Additionally, in order to use the IPv6 address equality function from
      inet6_hashtables (which is compiled as a built-in object when IPv6 is
      enabled) it also needs to be in a built-in object file as well.  This
      moves ipv6_rcv_saddr_equal into inet_hashtables to accomplish this.
      Signed-off-by: Craig Gallek <kraig@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      496611d7
    • sock: struct proto hash function may error · 086c653f
      Craig Gallek authored
      In order to support fast reuseport lookups in TCP, the hash function
      defined in struct proto must be capable of returning an error code.
      This patch changes the function signature of all related hash functions
      to return an integer and handles or propagates this return value at
      all call sites.
      Signed-off-by: Craig Gallek <kraig@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      086c653f
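The shape of this signature change can be sketched with a small userspace model. All types and functions here (`struct proto_model`, `struct sock_model`, `model_hash`, `model_listen`) are hypothetical, and the allocation failure is only an assumed example of why hashing might fail.

```c
#include <errno.h>

struct sock_model;

/* Hypothetical protocol descriptor: hash now returns int instead of void. */
struct proto_model {
    int (*hash)(struct sock_model *sk);
};

struct sock_model {
    const struct proto_model *prot;
    int fail_hash;   /* test knob: simulate a failure while hashing */
    int hashed;      /* set once the socket is in the (imaginary) table */
};

/* A hash implementation that can report failure to its caller. */
int model_hash(struct sock_model *sk)
{
    if (sk->fail_hash)
        return -ENOMEM;   /* assumed failure cause, for illustration */
    sk->hashed = 1;
    return 0;
}

/* A call site that propagates the error instead of ignoring it. */
int model_listen(struct sock_model *sk)
{
    int err = sk->prot->hash(sk);

    if (err)
        return err;
    return 0;
}
```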
    • Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge · 30c1de08
      David S. Miller authored
      Antonio Quartulli says:
      
      ====================
      Here you have a batch of patches by Sven Eckelmann that
      drops our private reference counting implementation and
      substitutes it with the kref objects/functions.
      
      Then you have a patch, by Simon Wunderlich, that
      makes the broadcast protection window code more generic so
      that it can be re-used in the future by other components
      with different requirements.
      
      Lastly, Sven is also introducing two lockdep asserts in
      functions operating on our TVLV container list, to make
      sure that the proper lock is always acquired by the users.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      30c1de08
    • Merge branch 'be2net-next' · dba6cf55
      David S. Miller authored
      Ajit Khaparde says:
      
      ====================
      be2net Patch series
      
      Please consider applying these two patches to net-next
      
        Patch-1: Request RSS capability of Rx interface depending on number of
          Rx rings
        Patch-2: Interpret and log new data that's added to the port
          misconfigure async event
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dba6cf55
    • be2net: Interpret and log new data that's added to the port misconfigure async event · 51d1f98a
      Ajit Khaparde authored
      From FW version 11.0. onwards, the PORT_MISCONFIG event generated by the FW
      will carry more information about the event in the "data_word1"
      and "data_word2" fields. This patch adds support in the driver to parse the
      new information and log it accordingly. This patch also changes some of the
      messages that are being logged currently.
      Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>
      Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
      Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      51d1f98a
    • be2net: Request RSS capability of Rx interface depending on number of Rx rings · 62219066
      Ajit Khaparde authored
      Currently we request RSS capability even if a single Rx ring is created.
      As a result, in a few cases we unnecessarily consume an RSS-capable interface,
      which is a limited resource in the chip.
      This patch enables RSS on an interface only if more than one Rx ring
      is created.
      Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      62219066
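The rule described above amounts to a single conditional. A minimal sketch, with hypothetical flag names and values that are not taken from the be2net driver:

```c
#include <stdint.h>

/* Hypothetical capability bits, for illustration only. */
#define IF_CAP_BASE 0x1u   /* capabilities every Rx interface requests */
#define IF_CAP_RSS  0x4u   /* RSS, a limited resource in the chip */

/* Request RSS only when there is more than one Rx ring to spread over. */
uint32_t rx_if_cap_flags(unsigned int num_rx_rings)
{
    uint32_t flags = IF_CAP_BASE;

    if (num_rx_rings > 1)
        flags |= IF_CAP_RSS;
    return flags;
}
```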
  2. 10 Feb, 2016 25 commits
  3. 09 Feb, 2016 3 commits
    • Merge branch 'tpacket-gso-csum-offload' · ef5c0e25
      David S. Miller authored
      Willem de Bruijn says:
      
      ====================
      packet: tpacket gso and csum offload
      
      Extend PACKET_VNET_HDR socket option support to packet sockets with
      memory mapped rings.
      
      Patches 2 and 4 add support to tpacket_rcv and tpacket_snd.
      
      Patch 1 prepares for this by moving the relevant virtio_net_hdr
      logic out of packet_snd and packet_rcv into helper functions.
      
      GSO transmission requires all headers in the skb linear section.
      Patch 3 moves parsing of tx_ring slot headers before skb allocation
      to enable allocation with sufficient linear size.
      
      Changes
        v1->v2:
          - fix bounds checks:
            - subtract sizeof(vnet_hdr) before comparing tp_len to size_max
            - compare tp_len to size_max also with GSO, just do not truncate to MTU
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ef5c0e25
    • packet: tpacket_snd gso and checksum offload · 1d036d25
      Willem de Bruijn authored
      Support socket option PACKET_VNET_HDR together with PACKET_TX_RING.
      
      When enabled, a struct virtio_net_hdr is expected to precede the data
      in the ring. The vnet option must be set before the ring is created.
      
      The implementation reuses the existing skb_copy_bits code that is used
      when dev->hard_header_len is non-zero. Move this ll_header check to
      before the skb alloc and combine it with a test for vnet_hdr->hdr_len.
      Allocate and copy the max of the two.
      
      Verified with test program at
      github.com/wdebruij/kerneltools/blob/master/tests/psock_txring_vnet.c
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1d036d25
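From userspace, the ordering requirement above (vnet option before ring creation) might look like the following sketch. The helper names `fill_ring_req` and `setup_vnet_tx_ring` are hypothetical, the ring uses the default TPACKET version with `struct tpacket_req`, and `fd` is assumed to be an `AF_PACKET` socket of type `SOCK_RAW`, which normally requires CAP_NET_RAW to open.

```c
#include <linux/if_packet.h>  /* PACKET_VNET_HDR, PACKET_TX_RING, struct tpacket_req */
#include <sys/socket.h>

#ifndef SOL_PACKET
#define SOL_PACKET 263
#endif

/* Hypothetical helper: describe a ring of block_nr blocks, each split
 * into frames of frame_size bytes. */
void fill_ring_req(struct tpacket_req *req, unsigned int block_size,
                   unsigned int block_nr, unsigned int frame_size)
{
    req->tp_block_size = block_size;
    req->tp_block_nr = block_nr;
    req->tp_frame_size = frame_size;
    req->tp_frame_nr = (block_size / frame_size) * block_nr;
}

/* Hypothetical helper: enable the virtio_net_hdr option, then create the
 * TX ring. Returns 0 on success, -1 on failure (errno set by setsockopt). */
int setup_vnet_tx_ring(int fd, const struct tpacket_req *req)
{
    int one = 1;

    /* The vnet option is set first, before the ring exists. */
    if (setsockopt(fd, SOL_PACKET, PACKET_VNET_HDR, &one, sizeof(one)) < 0)
        return -1;
    return setsockopt(fd, SOL_PACKET, PACKET_TX_RING, req, sizeof(*req));
}
```

After this setup, each frame written to the ring would carry a `struct virtio_net_hdr` ahead of the packet data, as the commit message describes.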
    • packet: parse tpacket header before skb alloc · 8d39b4a6
      Willem de Bruijn authored
      GSO packet headers must be stored in the linear skb segment.
      Move tpacket header parsing before sock_alloc_send_skb. The GSO
      follow-on patch will later increase the skb linear argument to
      sock_alloc_send_skb if needed for large packets.
      
      The header parsing code does not require an allocated skb, so is
      safe to move. Later pass to tpacket_fill_skb the computed data
      start and length.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8d39b4a6