Commits · fd1914b2901badd942f008ce57bf4a938d29fde4 · Kirill Smelkov / linux

11 Feb, 2016 12 commits

Merge branch 'tcp-fast-so_reuseport' · fd1914b2

David S. Miller authored Feb 11, 2016

Craig Gallek says:

====================
Faster SO_REUSEPORT for TCP

This patch series complements an earlier series (6a5ef90c), which
added faster SO_REUSEPORT lookup for UDP sockets, by extending the
feature to TCP sockets.  It uses the same
array-based data structure which allows for socket selection
after finding the first listening socket that matches an incoming
packet.  Prior to this feature, every socket in the reuseport
group needed to be found and examined before a selection could be
made.

With this series the SO_ATTACH_REUSEPORT_CBPF and
SO_ATTACH_REUSEPORT_EBPF socket options now work for TCP sockets
as well.  The test at the end of the series includes an example of
how to use these options to select a reuseport socket based on the
CPU core ID handling the incoming packet.

There are several refactoring patches that precede the feature
implementation.  Only the last two patches in this series
should result in any behavioral changes.

v4:
- Fix build issue when compiling IPv6 as a module.  This required
  moving the ipv6_rcv_saddr_equal into an object that is included as a
  built-in object.  I included this change in the second patch which
  adds inet6_hash since that is where ipv6_rcv_saddr_equal will
  later be called from non-module code.

v3:
- Another warning in the first patch caught by a build bot.  Return 0 in
  the no-op UDP hash function.

v2:
- In the first patch I missed a couple of hash functions that should now be
  returning int instead of void.  I missed these the first time through as it
  only generated a warning and not an error :\
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

soreuseport: BPF selection functional test for TCP · 4b2a6aed

Craig Gallek authored Feb 10, 2016

Unfortunately the existing test relied on packet payload in order to
map incoming packets to sockets.  In order to get this to work with TCP,
TCP_FASTOPEN needed to be used.

Since the fast open path is slightly different from the standard TCP path,
I created a second test which steers packets to reuseport group members
based on the ID of the CPU core receiving them.  This will probably also
serve as a better real-world example.
Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

soreuseport: fast reuseport TCP socket selection · c125e80b

Craig Gallek authored Feb 10, 2016

This change extends the fast SO_REUSEPORT socket lookup implemented
for UDP to TCP.  Listener sockets with SO_REUSEPORT and the same
receive address are additionally added to an array for faster
random access.  This means that only a single socket from the group
must be found in the listener list before any socket in the group can
be used to receive a packet.  Previously, every socket in the group
needed to be considered before handing off the incoming packet.
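
A user-space model of the O(1) selection step might look like the following. This is a simplified sketch, not the kernel's code. struct reuseport_group and select_sock() are hypothetical stand-ins, and file descriptors substitute for socket pointers. The model assumes that once one matching listener is found, the flow hash is scaled into the group's array with a multiply-and-shift in the style of reciprocal_scale(). An index returned by a BPF program is used when it is in range.

```c
#include <stdint.h>

#define MAX_GROUP 16

/* Hypothetical stand-in for the per-group array described above. */
struct reuseport_group {
	unsigned int num_socks;
	int socks[MAX_GROUP];	/* fds stand in for struct sock pointers */
};

/* Map a 32-bit hash onto [0, n) without a division, in the same
 * spirit as the kernel's reciprocal_scale(). */
static uint32_t scale_hash(uint32_t hash, uint32_t n)
{
	return (uint32_t)(((uint64_t)hash * n) >> 32);
}

/* Pick a group member in O(1).  bpf_index < 0 means "no program";
 * an out-of-range program result falls back to hash selection. */
int select_sock(const struct reuseport_group *g, uint32_t hash,
		long bpf_index)
{
	if (g->num_socks == 0)
		return -1;
	if (bpf_index >= 0 && (unsigned long)bpf_index < g->num_socks)
		return g->socks[bpf_index];
	return g->socks[scale_hash(hash, g->num_socks)];
}
```

Because the array is indexed directly, the cost no longer grows with the number of sockets in the group, unlike walking every group member in the listener list.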

This feature also exposes the ability to use a BPF program when
selecting a socket from a reuseport group.
Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

soreuseport: Prep for fast reuseport TCP socket selection · fa463497

Craig Gallek authored Feb 10, 2016

Both of the lines in this patch probably should have been included
in the initial implementation of this code for generic socket
support, but weren't technically necessary since only UDP sockets
were supported.

First, the sk_reuseport_cb points to a structure which assumes
each socket in the group has this pointer assigned at the same
time it's added to the array in the structure.  The sk_clone_lock
function breaks this assumption.  Since a child socket shouldn't
implicitly be in a reuseport group, the simple fix is to clear
the field in the clone.
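
The clone fix amounts to resetting one field after the copy. A user-space model follows, in which the hypothetical struct fake_sock and fake_sock_clone() stand in for struct sock and sk_clone_lock(). It only illustrates the invariant and is not the kernel change itself.

```c
#include <stdlib.h>
#include <string.h>

struct reuseport_cb;			/* opaque group structure */

/* Hypothetical, heavily reduced stand-in for struct sock. */
struct fake_sock {
	int reuseport;			/* SO_REUSEPORT was set */
	struct reuseport_cb *reuseport_cb;	/* group membership, if any */
};

/* Copy the parent wholesale, as a clone does, then drop the group
 * pointer: the child was never added to the group's array, so it must
 * not claim membership. */
struct fake_sock *fake_sock_clone(const struct fake_sock *parent)
{
	struct fake_sock *child = malloc(sizeof(*child));

	if (!child)
		return NULL;
	memcpy(child, parent, sizeof(*child));
	child->reuseport_cb = NULL;
	return child;
}
```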

Second, the SO_ATTACH_REUSEPORT_xBPF socket options require that
SO_REUSEPORT also be set first.  For UDP sockets, this is easily
enforced at bind-time since that process both puts the socket in
the appropriate receive hlist and updates the reuseport structures.
Since these operations can happen at two different times for TCP
sockets (bind and listen) it must be explicitly checked to enforce
the use of SO_REUSEPORT with SO_ATTACH_REUSEPORT_xBPF in the
setsockopt call.
Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: refactor inet[6]_lookup functions to take skb · a583636a

Craig Gallek authored Feb 10, 2016

This is a preliminary step to allow fast socket lookup of SO_REUSEPORT
groups.  Doing so with a BPF filter will require access to the
skb in question.  This change plumbs the skb (and offset to payload
data) through the call stack to the listening socket lookup
implementations where it will be used in a following patch.
Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: __tcp_hdrlen() helper · d9b3fca2

Craig Gallek authored Feb 10, 2016

tcp_hdrlen is wasteful if you already have a pointer to struct tcphdr.
This splits the size calculation into a helper function that can be
used if a struct tcphdr is already available.
Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
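
The calculation being split out is small, because doff counts 32-bit words. The following user-space sketch uses the uapi struct tcphdr from <linux/tcp.h>. hdrlen_from_tcphdr() is a hypothetical name for a re-creation of the helper, not the kernel's own definition.

```c
#include <linux/tcp.h>

/* doff is the header length in 32-bit words, so bytes = doff * 4.
 * Given a header pointer, no skb lookup is needed to compute it. */
static unsigned int hdrlen_from_tcphdr(const struct tcphdr *th)
{
	return th->doff * 4;
}
```

A header without options has doff = 5 (20 bytes); the 4-bit field caps it at 15 (60 bytes).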

inet: create IPv6-equivalent inet_hash function · 496611d7

Craig Gallek authored Feb 10, 2016

In order to support fast lookups for TCP sockets with SO_REUSEPORT,
the function that adds sockets to the listening hash set needs
to be able to check receive address equality.  Since this equality
check is different for IPv4 and IPv6, we will need two different
socket hashing functions.

This patch adds inet6_hash, identical to the existing inet_hash function,
and updates the appropriate references.  A following patch will
differentiate the two by passing different comparison functions to
__inet_hash.
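
The following sketch shows how one shared insertion routine can stay family-agnostic by taking the address comparison as a function pointer. Every name in it (struct listener, find_group(), the two comparison functions) is hypothetical, and the wildcard handling only loosely follows the kernel helpers. It ignores v4-mapped addresses and IPV6_V6ONLY.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical reduced listener: family, port, bound address. */
struct listener {
	sa_family_t family;
	unsigned short port;
	union {
		struct in_addr v4;
		struct in6_addr v6;
	} addr;
};

typedef bool (*saddr_equal_fn)(const struct listener *,
			       const struct listener *);

/* A wildcard bind matches any address in this sketch. */
static bool v4_saddr_equal(const struct listener *a, const struct listener *b)
{
	return a->addr.v4.s_addr == htonl(INADDR_ANY) ||
	       b->addr.v4.s_addr == htonl(INADDR_ANY) ||
	       a->addr.v4.s_addr == b->addr.v4.s_addr;
}

static bool v6_is_any(const struct in6_addr *a)
{
	return memcmp(a, &in6addr_any, sizeof(*a)) == 0;
}

static bool v6_saddr_equal(const struct listener *a, const struct listener *b)
{
	return v6_is_any(&a->addr.v6) || v6_is_any(&b->addr.v6) ||
	       memcmp(&a->addr.v6, &b->addr.v6, sizeof(struct in6_addr)) == 0;
}

/* Shared logic: find an existing listener whose group the new socket
 * may join.  Only the address comparison differs between families. */
static int find_group(const struct listener *list, int n,
		      const struct listener *sk, saddr_equal_fn eq)
{
	for (int i = 0; i < n; i++)
		if (list[i].family == sk->family &&
		    list[i].port == sk->port && eq(&list[i], sk))
			return i;
	return -1;	/* no match: the socket would start its own group */
}
```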

Additionally, ipv6_rcv_saddr_equal will be called from inet6_hashtables,
which is compiled as a built-in object when IPv6 is enabled, so the
function itself must also live in a built-in object file.  This patch
moves ipv6_rcv_saddr_equal into inet_hashtables to accomplish this.
Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sock: struct proto hash function may error · 086c653f

Craig Gallek authored Feb 10, 2016

In order to support fast reuseport lookups in TCP, the hash function
defined in struct proto must be capable of returning an error code.
This patch changes the function signature of all related hash functions
to return an integer and handles or propagates this return value at
all call sites.
Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
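
A minimal model of the signature change is shown below. The hash hook returns int, and the call site propagates the error instead of assuming success. struct fake_proto, fake_listen_start() and both hash functions are hypothetical names, and -ENOMEM is just an example error.

```c
#include <errno.h>
#include <stddef.h>

struct fake_sock;	/* opaque for this sketch */

/* Hypothetical reduced "struct proto": the hash hook now returns int. */
struct fake_proto {
	int (*hash)(struct fake_sock *sk);
};

/* A protocol whose hash cannot fail simply returns 0. */
static int noop_hash(struct fake_sock *sk)
{
	(void)sk;
	return 0;
}

/* A hash that can fail, e.g. if group state cannot be allocated. */
static int failing_hash(struct fake_sock *sk)
{
	(void)sk;
	return -ENOMEM;
}

/* Call site: hand the error back instead of ignoring it. */
static int fake_listen_start(const struct fake_proto *prot,
			     struct fake_sock *sk)
{
	int err = prot->hash(sk);

	if (err)
		return err;	/* the caller unwinds its own state here */
	return 0;
}
```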

Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge · 30c1de08

David S. Miller authored Feb 11, 2016

Antonio Quartulli says:

====================
Here you have a batch of patches by Sven Eckelmann that
drops our private reference counting implementation and
substitutes it with the kref objects/functions.

Then you have a patch, by Simon Wunderlich, that
makes the broadcast protection window code more generic so
that it can be re-used in the future by other components
with different requirements.

Lastly, Sven is also introducing two lockdep asserts in
functions operating on our TVLV container list, to make
sure that the proper lock is always acquired by the users.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'be2net-next' · dba6cf55

David S. Miller authored Feb 11, 2016

Ajit Khaparde says:

====================
be2net Patch series

Please consider applying these two patches to net-next

  Patch-1: Request RSS capability of Rx interface depending on number of
    Rx rings
  Patch-2: Interpret and log new data that's added to the port
    misconfigure async event
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: Interpret and log new data that's added to the port misconfigure async event · 51d1f98a

Ajit Khaparde authored Feb 10, 2016

From FW version 11.0 onwards, the PORT_MISCONFIG event generated by the FW
will carry more information about the event in the "data_word1"
and "data_word2" fields. This patch adds support in the driver to parse the
new information and log it accordingly. This patch also changes some of the
messages that are being logged currently.
Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: Request RSS capability of Rx interface depending on number of Rx rings · 62219066

Ajit Khaparde authored Feb 10, 2016

Currently we request RSS capability even if only a single Rx ring is created.
As a result, in a few cases we unnecessarily consume an RSS-capable interface,
which is a limited resource in the chip.
This patch enables RSS on an interface only if more than one Rx ring
is created.
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

10 Feb, 2016 25 commits

batman-adv: Convert batadv_tt_common_entry to kref · 92dcdf09

Sven Eckelmann authored Jan 16, 2016

batman-adv uses a self-written reference counting implementation which is
based only on atomic_t. This is less obvious when reading the code than kref
and therefore increases the chance that the reference counting will be missed.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
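
The following user-space model illustrates the kref pattern the conversions move to. It reuses the kernel names kref_init(), kref_get() and kref_put() for readability, but it is a C11-atomics sketch, not the kernel implementation. struct tt_entry and its release function are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

struct kref {
	atomic_uint refcount;
};

static void kref_init(struct kref *k)
{
	atomic_init(&k->refcount, 1);
}

static void kref_get(struct kref *k)
{
	atomic_fetch_add(&k->refcount, 1);
}

/* Drop a reference; on the last one call release() and return 1. */
static int kref_put(struct kref *k, void (*release)(struct kref *k))
{
	if (atomic_fetch_sub(&k->refcount, 1) == 1) {
		release(k);
		return 1;
	}
	return 0;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical refcounted object in the style of the converted structs. */
struct tt_entry {
	int id;
	struct kref refcount;
	int released;	/* lets the sketch observe the release */
};

static void tt_entry_release(struct kref *ref)
{
	struct tt_entry *e = container_of(ref, struct tt_entry, refcount);

	e->released = 1;	/* a real release function would free e */
}
```

The release callback gives each object one obvious place where its teardown happens, which is the readability gain the message describes.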

batman-adv: Convert batadv_orig_node to kref · 7c124391

Sven Eckelmann authored Jan 16, 2016

batman-adv: Convert batadv_orig_node_vlan to kref · 161a3be9

Sven Eckelmann authored Jan 16, 2016

batman-adv: Convert batadv_hard_iface to kref · 7a659d56

Sven Eckelmann authored Jan 16, 2016

batman-adv: Convert batadv_neigh_node to kref · 77ae32e8

Sven Eckelmann authored Jan 16, 2016

batman-adv: Convert batadv_orig_ifinfo to kref · a6ba0d34

Sven Eckelmann authored Jan 16, 2016

batman-adv: Convert batadv_neigh_ifinfo to kref · 962c6832

Sven Eckelmann authored Jan 16, 2016

batman-adv: Convert batadv_tt_orig_list_entry to kref · 6e8ef69d

Sven Eckelmann authored Jan 16, 2016

batman-adv: Convert batadv_tvlv_handler to kref · 32836f56

Sven Eckelmann authored Jan 16, 2016

batman-adv: Convert batadv_tvlv_container to kref · f7157dd1

Sven Eckelmann authored Jan 16, 2016

batman-adv: Convert batadv_dat_entry to kref · 68a6722c

Sven Eckelmann authored Jan 16, 2016

batman-adv: Convert batadv_nc_path to kref · 727e0cd5

Sven Eckelmann authored Jan 16, 2016

batman-adv: Convert batadv_nc_node to kref · daf99b48

Sven Eckelmann authored Jan 16, 2016

batman-adv: Convert batadv_bla_claim to kref · 71b7e3d3

Sven Eckelmann authored Jan 16, 2016

batman-adv: Convert batadv_bla_backbone_gw to kref · 06e56ded

Sven Eckelmann authored Jan 16, 2016

batman-adv: Convert batadv_softif_vlan to kref · 6be4d30c

Sven Eckelmann authored Jan 16, 2016

batman-adv: Convert batadv_gw_node to kref · e7aed321

Sven Eckelmann authored Jan 16, 2016

batman-adv: Convert batadv_hardif_neigh_node to kref · 90f564df

Sven Eckelmann authored Jan 16, 2016

batman-adv: Add lockdep assert for container_list_lock · dded0692

Sven Eckelmann authored Dec 20, 2015

The batadv_tvlv_container* functions state in their kernel-doc that they
require tvlv.container_list_lock. Add an assert to automatically detect
when this might have been ignored by the caller.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
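
In user space, the same "assert the caller holds the lock" idea can be approximated with a per-thread stack of held locks, loosely mirroring lockdep's per-task held-lock tracking. This is an analogue only: tracked_lock(), lock_is_held(), container_register() and the fixed MAX_HELD limit are hypothetical, and none of it is the kernel's lockdep_assert_held().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_HELD 8

/* Per-thread record of held locks; thread-local, so no extra locking.
 * Locks beyond MAX_HELD are not tracked (a real checker would warn). */
static _Thread_local pthread_mutex_t *held_locks[MAX_HELD];
static _Thread_local int nr_held;

static void tracked_lock(pthread_mutex_t *m)
{
	pthread_mutex_lock(m);
	if (nr_held < MAX_HELD)
		held_locks[nr_held++] = m;
}

static void tracked_unlock(pthread_mutex_t *m)
{
	for (int i = nr_held - 1; i >= 0; i--) {
		if (held_locks[i] == m) {
			held_locks[i] = held_locks[--nr_held];
			break;
		}
	}
	pthread_mutex_unlock(m);
}

static bool lock_is_held(pthread_mutex_t *m)
{
	for (int i = 0; i < nr_held; i++)
		if (held_locks[i] == m)
			return true;
	return false;
}

#define assert_lock_held(m) assert(lock_is_held(m))

/* Hypothetical helper whose documentation says "caller holds the lock". */
static pthread_mutex_t container_list_lock = PTHREAD_MUTEX_INITIALIZER;
static int container_count;

static void container_register(void)
{
	assert_lock_held(&container_list_lock);	/* catch careless callers */
	container_count++;
}
```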

batman-adv: add seqno maximum age and protection start flag parameters · 81f02683

Simon Wunderlich authored Nov 23, 2015

To allow future use of the window protected function with different
maximum sequence numbers, add a parameter to set this value which
was previously hardcoded. Another parameter added for future use is a
flag to return whether the protection window has started.

While at it, also fix the kerneldoc.
Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
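
The following sketch makes the two new parameters concrete with a simplified protection-window check. The function name, the millisecond clock argument and both constants are assumptions for illustration, and batman-adv's own helper uses its own timebase and values. An out-of-range sequence number is ignored while the window is open. Once the window has expired, the packet is accepted and a new window starts, which is reported through the optional flag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Both values are assumptions for this sketch. */
#define EXPECTED_SEQNO_RANGE	65536
#define RESET_PROTECTION_MS	30000

/* Returns true if the packet should be ignored.
 * seq_old_max_diff: how far behind a seqno may lag before it is treated
 *                   as a restart (the previously hardcoded value).
 * protection_started: optional; set to true when a new window begins. */
bool window_protected(int32_t seq_num_diff, int32_t seq_old_max_diff,
		      uint64_t now_ms, uint64_t *last_reset_ms,
		      bool *protection_started)
{
	if (seq_num_diff <= -seq_old_max_diff ||
	    seq_num_diff >= EXPECTED_SEQNO_RANGE) {
		if (now_ms - *last_reset_ms < RESET_PROTECTION_MS)
			return true;	/* window still open: ignore */

		/* window expired: accept and start a new one */
		*last_reset_ms = now_ms;
		if (protection_started)
			*protection_started = true;
	}
	return false;
}
```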

batman-adv: Drop reference to netdevice on last reference · 140ed8e8

Sven Eckelmann authored Jan 05, 2016

The references to the network device should be dropped inside the release
function for batadv_hard_iface similar to what is done with the batman-adv
internal data structures.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <a@unstable.cc>

sxgbe: remove unused code · aaa56720

Jean Sacren authored Feb 09, 2016

Remove the unused code of sxgbe_xpcs.
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Cc: Byungho An <bh74.an@samsung.com>
Cc: Girish K S <ks.giri@samsung.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1601191918470.2531@hadrien
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'renesas-bit-twiddling' · 12f08412

David S. Miller authored Feb 10, 2016

Sergei Shtylyov says:

====================
Factor out register bit twiddling in the Renesas Ethernet drivers

   Here's a set of 2 patches against DaveM's 'net-next.git' repo. We factor out
the often repeated pattern of reading a register, AND'ing and/or OR'ing some
bits, and then writing the value back.
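
The factored-out pattern is a read-modify-write with a clear mask and a set mask. The following user-space sketch shows it. The array-backed register file and the names reg_read(), reg_write() and reg_modify() are hypothetical, whereas the drivers' real helpers work on a net_device's registers.

```c
#include <stdint.h>

/* Hypothetical register file standing in for device MMIO space. */
static uint32_t regs[16];

static uint32_t reg_read(unsigned int idx)
{
	return regs[idx];
}

static void reg_write(unsigned int idx, uint32_t val)
{
	regs[idx] = val;
}

/* Read, clear the bits in 'clear', set the bits in 'set', write back.
 * Clearing happens first, so a bit present in both masks ends up set. */
static void reg_modify(unsigned int idx, uint32_t clear, uint32_t set)
{
	reg_write(idx, (reg_read(idx) & ~clear) | set);
}
```

Each open-coded "read, mask, or, write" sequence then collapses into a single call, which is where the code size savings come from.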

[1/2] ravb: factor out register bit twiddling code
[2/2] sh_eth: factor out register bit twiddling code
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

sh_eth: factor out register bit twiddling code · b2b14d2f

Sergei Shtylyov authored Feb 10, 2016

The driver has an often repeated pattern of reading a register, AND'ing and/or
OR'ing some bits and writing the value back. Factor the pattern out into
sh_eth_modify() -- this saves 84 bytes of code with ARM gcc 4.7.3.

While at it, update Cogent Embedded's copyright.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ravb: factor out register bit twiddling code · 568b3ce7

Sergei Shtylyov authored Feb 10, 2016

The driver has an often repeated pattern of reading a register, AND'ing and/or
OR'ing some bits and writing the value back. Factor the pattern out into
ravb_modify() -- this saves 260 bytes of code with ARM gcc 4.7.3.

While at it, update Cogent Embedded's copyrights.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

09 Feb, 2016 3 commits

Merge branch 'tpacket-gso-csum-offload' · ef5c0e25

David S. Miller authored Feb 09, 2016

Willem de Bruijn says:

====================
packet: tpacket gso and csum offload

Extend PACKET_VNET_HDR socket option support to packet sockets with
memory mapped rings.

Patches 2 and 4 add support to tpacket_rcv and tpacket_snd.

Patch 1 prepares for this by moving the relevant virtio_net_hdr
logic out of packet_snd and packet_rcv into helper functions.

GSO transmission requires all headers in the skb linear section.
Patch 3 moves parsing of tx_ring slot headers before skb allocation
to enable allocation with sufficient linear size.

Changes
  v1->v2:
    - fix bounds checks:
      - subtract sizeof(vnet_hdr) before comparing tp_len to size_max
      - compare tp_len to size_max also with GSO, just do not truncate to MTU
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

packet: tpacket_snd gso and checksum offload · 1d036d25

Willem de Bruijn authored Feb 03, 2016

Support socket option PACKET_VNET_HDR together with PACKET_TX_RING.

When enabled, a struct virtio_net_hdr is expected to precede the data
in the ring. The vnet option must be set before the ring is created.
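
The following user-space sketch shows the setup order and one TX frame. It assumes TPACKET_V2 and the default frame layout without PACKET_TX_HAS_OFF, where data starts at TPACKET2_HDRLEN - sizeof(struct sockaddr_ll). It also assumes tp_len covers the virtio_net_hdr plus the packet, which is consistent with the cover letter's note about subtracting sizeof(vnet_hdr) from tp_len. setup_tx_ring() and fill_tx_frame() are hypothetical helpers, and the mmap() and bind() steps are omitted.

```c
#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <linux/virtio_net.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Order matters: PACKET_VERSION and PACKET_VNET_HDR must be set before
 * PACKET_TX_RING creates the ring.  Requires CAP_NET_RAW. */
int setup_tx_ring(struct tpacket_req *req)
{
	int one = 1, ver = TPACKET_V2;
	int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

	if (fd < 0)
		return -1;
	if (setsockopt(fd, SOL_PACKET, PACKET_VERSION, &ver, sizeof(ver)) ||
	    setsockopt(fd, SOL_PACKET, PACKET_VNET_HDR, &one, sizeof(one)) ||
	    setsockopt(fd, SOL_PACKET, PACKET_TX_RING, req, sizeof(*req))) {
		close(fd);
		return -1;
	}
	return fd;	/* caller then mmap()s the ring and binds */
}

/* Fill one TX frame: the virtio_net_hdr comes first in the data area,
 * directly followed by the packet; tp_len covers both. */
void fill_tx_frame(void *frame, const struct virtio_net_hdr *vh,
		   const void *pkt, unsigned int len)
{
	struct tpacket2_hdr *hdr = frame;
	char *data = (char *)frame + TPACKET2_HDRLEN -
		     sizeof(struct sockaddr_ll);

	memcpy(data, vh, sizeof(*vh));
	memcpy(data + sizeof(*vh), pkt, len);
	hdr->tp_len = sizeof(*vh) + len;
	hdr->tp_status = TP_STATUS_SEND_REQUEST;	/* hand to kernel */
}
```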

The implementation reuses the existing skb_copy_bits code that is used
when dev->hard_header_len is non-zero. Move this ll_header check to
before the skb alloc and combine it with a test for vnet_hdr->hdr_len.
Allocate and copy the max of the two.

Verified with test program at
github.com/wdebruij/kerneltools/blob/master/tests/psock_txring_vnet.c
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

packet: parse tpacket header before skb alloc · 8d39b4a6

Willem de Bruijn authored Feb 03, 2016

GSO packet headers must be stored in the linear skb segment.
Move tpacket header parsing before sock_alloc_send_skb. The GSO
follow-on patch will later increase the skb linear argument to
sock_alloc_send_skb if needed for large packets.

The header parsing code does not require an allocated skb, so it is
safe to move. The computed data start and length are then passed to
tpacket_fill_skb.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
