- 11 Nov, 2014 10 commits
-
-
Shani Michaeli authored
When processing received traffic, pass CHECKSUM_COMPLETE status to the stack, with the calculated checksum, for non-TCP/UDP packets (such as GRE or ICMP). Although the stack expects a checksum that does not include the pseudo header, the HW adds it. To address that, we subtract the pseudo-header checksum from the checksum value provided by the HW. In the IPv6 case, we also compute/add the IP header checksum, which is not added by the HW for such packets. Cc: Jerry Chu <hkchu@google.com> Signed-off-by: Shani Michaeli <shanim@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
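A minimal sketch of the IPv4 pseudo-header subtraction, for illustration only (this is not the actual mlx4 hunk, and the helper name is made up):

    #include <linux/ip.h>
    #include <net/checksum.h>

    /* Remove the IPv4 pseudo-header contribution that the HW folded into
     * its checksum, so the stack gets a CHECKSUM_COMPLETE value without
     * the pseudo header. */
    static __wsum strip_pseudo_header(const struct iphdr *iph, __wsum hw_checksum)
    {
            __wsum pseudo = csum_tcpudp_nofold(iph->saddr, iph->daddr,
                                               ntohs(iph->tot_len) - (iph->ihl << 2),
                                               iph->protocol, 0);

            return csum_sub(hw_checksum, pseudo);
    }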
-
Shani Michaeli authored
We can call napi_gro_frags for all the received traffic regardless of the checksum status. Specifically, received packets whose status is CHECKSUM_NONE (and the soon-to-be-added CHECKSUM_COMPLETE) are eligible for napi_gro_frags as well. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Shani Michaeli <shanim@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Eric Dumazet says: ==================== net: SO_INCOMING_CPU support The SO_INCOMING_CPU socket option (read via getsockopt()) provides an alternative to RPS/RFS for high-performance servers using multi-queue NICs. TCP should use sk_mark_napi_id() for established sockets only. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
An alternative to RPS/RFS is to use hardware support for multiple queues. Then split a set of millions of sockets into worker threads, each one using epoll() to manage events on its own socket pool. Ideally, we want one thread per RX/TX queue/cpu, but we have no way to know after accept() or connect() on which queue/cpu a socket is managed. We normally use one cpu per RX queue (IRQ smp_affinity being properly set), so remembering on the socket structure which cpu delivered the last packet is enough to solve the problem. After accept(), connect(), or even file descriptor passing between processes, applications can use: int cpu; socklen_t len = sizeof(cpu); getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len); and use this information to put the socket into the right silo for optimal performance, as all networking stack processing then runs on the appropriate cpu, without the need to send IPIs (as RPS/RFS do). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
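A hedged userspace sketch of the intended pattern (the worker hand-off is only hinted at; SO_INCOMING_CPU may need a local define when building against pre-3.19 headers):

    #include <stdio.h>
    #include <sys/socket.h>

    #ifndef SO_INCOMING_CPU
    #define SO_INCOMING_CPU 49      /* value used by asm-generic/socket.h */
    #endif

    static int accept_and_report_cpu(int listen_fd)
    {
            int fd = accept(listen_fd, NULL, NULL);
            int cpu = -1;
            socklen_t len = sizeof(cpu);

            if (fd >= 0 && getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len) == 0)
                    fprintf(stderr, "fd %d is handled on cpu %d\n", fd, cpu);
            /* A real server would now hand fd to the epoll worker pinned to 'cpu'. */
            return fd;
    }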
-
Eric Dumazet authored
sk_mark_napi_id() is used to record, for a flow, the napi id of incoming packets, for busy-polling purposes. We should do this only on established flows, not on listeners. This was 'working' by virtue of the socket cloning, but doing it on SYN packets causes unnecessary cache line dirtying. Even if we moved sk_napi_id into the same cache line as sk_lock, we are working to make SYN processing lockless, so it is desirable to set sk_napi_id only for established flows. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
After commit 1a288172 ("mlx4: use napi_complete_done()") we ended up calling napi_complete_done() even in the case where the NAPI poll consumed all of its budget. This added extra interrupt pressure; this patch restores the proper behavior. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: 1a288172 ("mlx4: use napi_complete_done()") Signed-off-by: David S. Miller <davem@davemloft.net>
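For reference, a hedged sketch of the intended NAPI poll pattern (driver-agnostic; process_rx_ring() is a made-up placeholder for the driver's RX work):

    static int my_napi_poll(struct napi_struct *napi, int budget)
    {
            int work_done = process_rx_ring(napi, budget);   /* placeholder */

            /* Only report completion when the budget was NOT exhausted;
             * otherwise the core keeps polling and the driver must not
             * re-enable its interrupts yet. */
            if (work_done < budget)
                    napi_complete_done(napi, work_done);
            return work_done;
    }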
-
David S. Miller authored
Sowmini Varadhan says: ==================== sunvnet: edge-case/race-condition bug fixes This patch series contains fixes for race conditions in sunvnet that can be encountered when there is a difference in latency between producer and consumer. Patch 1 addresses the case where the STOPPED LDC ack from a peer is processed before vnet_start_xmit can finish updating the dr->prod state. Patch 2 fixes the edge case where outgoing data and an incoming stopped-ack cross each other in flight. Patch 3 adds a missing rcu_read_unlock(), found by code inspection. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sowmini Varadhan authored
The out_dropped label will only do rcu_read_unlock() for a non-null port, so add the missing rcu_read_unlock() for the path that bails out with a null port. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sowmini Varadhan authored
As per the comments in vnet_start_xmit(), for the edge case where outgoing data from vnet_start_xmit() and an incoming STOPPED ACK cross each other in flight, we may need to send the missed START trigger from maybe_tx_wakeup() after checking for a false value of start_cons. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sowmini Varadhan authored
When vnet_start_xmit() is concurrent with vnet_ack(), we may have a race that looks like this: thread 1 (vnet_start_xmit): __vnet_tx_trigger for some desc X; at this point dr->prod == X. thread 2 (vnet_event_napi -> vnet_rx): the peer sends back a stopped ack for X; we process X, but X == dr->prod, so we bail out in vnet_ack with !idx_is_pending. thread 1: update dr->prod. Because we never processed the stopped ack for X, the Tx path is led to incorrectly believe that the peer is still "started" and reading, but the peer has stopped reading, which will ultimately end in flow-control assertions. The fix is to synchronize the above two paths on the netif_tx_lock. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
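A rough sketch (not the actual sunvnet diff; the function name and argument list are assumptions) of what serializing the ack path on the tx lock looks like:

    /* Take the same per-queue tx lock that vnet_start_xmit() holds while it
     * publishes dr->prod, so the ack path cannot observe a stale prod value. */
    static void vnet_ack_under_tx_lock(struct net_device *dev)
    {
            struct netdev_queue *txq = netdev_get_tx_queue(dev, 0);

            __netif_tx_lock(txq, smp_processor_id());
            /* ... compare the acked index against dr->prod / idx_is_pending
             *     and record the STOPPED state here ... */
            __netif_tx_unlock(txq);
    }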
-
- 10 Nov, 2014 15 commits
-
-
Alban Bedel authored
This driver allows an MTU of up to 1518 bytes, which is not enough to run batman-adv. Simply raise the maximum packet size up to the maximum allowed by the transmit descriptor, 1792 bytes, giving a maximum MTU of 1774 bytes. Signed-off-by: Alban Bedel <albeu@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
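For reference, the 1774-byte figure presumably comes from the descriptor limit minus the Ethernet overhead that the MTU does not count: 1792 - 14 (Ethernet header) - 4 (FCS) = 1774 bytes.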
-
Alban Bedel authored
Replace the default ndo_change_mtu callback with one that allows setting any MTU that the driver can handle. Signed-off-by: Alban Bedel <albeu@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
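A generic, hedged sketch of such a callback (the names and the 1774 limit are illustrative, not the driver's actual code):

    #define MY_MAX_MTU 1774

    static int my_change_mtu(struct net_device *dev, int new_mtu)
    {
            /* 68 is the historical IPv4 minimum also used by eth_change_mtu() */
            if (new_mtu < 68 || new_mtu > MY_MAX_MTU)
                    return -EINVAL;
            dev->mtu = new_mtu;
            return 0;
    }

    /* hooked up via  .ndo_change_mtu = my_change_mtu  in the net_device_ops */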
-
David S. Miller authored
Merge tag 'master-2014-11-04' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next John W. Linville says: ==================== pull request: wireless-next 2014-11-07 Please pull this batch of updates intended for the 3.19 stream! For the mac80211 bits, Johannes says: "This relatively large batch of changes is comprised of the following: * large mac80211-hwsim changes from Ben, Jukka and a bit myself * OCB/WAVE/11p support from Rostislav on behalf of the Czech Technical University in Prague and Volkswagen Group Research * minstrel VHT work from Karl * more CSA work from Luca * WMM admission control support in mac80211 (myself) * various smaller fixes, spelling corrections, and minor API additions" For the Bluetooth bits, Johan says: "Here's the first bluetooth-next pull request for 3.19. The vast majority of patches are for ieee802154 from Alexander Aring with various fixes and cleanups. There are also several LE/SMP fixes as well as improved support for handling LE devices that have lost their pairing information (the patches from Alfonso). Jukka provides a couple of stability fixes for 6lowpan and Szymon conformance fixes for RFCOMM. For the HCI drivers we have one new USB ID for an Acer controller as well as a reset handling fix for H5." For the Atheros bits, Kalle says: "Major changes are: o ethtool support (Ben) o print dev string prefix with debug hex buffers dump (Michal) o debugfs file to read calibration data from the firmware verification purposes (me) o fix fw_stats debugfs file, now results are more reliable (Michal) o firmware crash counters via debugfs (Ben&me) o various tracing points to debug firmware (Rajkumar) o make it possible to provide firmware calibration data via a file (me) And we have quite a lot of smaller fixes and clean up." For the iwlwifi bits, Emmanuel says: "The big new thing here is netdetect which allows the firmware to wake up the platform when a specific network is detected. Along with that I have fixes for d3 operation. The usual amount of rate scaling stuff - we now support STBC. The other commit that stands out is Johannes's work on devcoredump. He basically starts to use the standard infrastructure he built." Along with that are the usual sort of updates and such for ath9k, brcmfmac, wil6210, and a handful of other bits here and there... Please let me know if there are problems! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Herbert Xu says: ==================== ipv4: Simplify raw_probe_proto_opt and avoid reading user iov twice This series rewrites the function raw_probe_proto_opt in a more readable fashion, and then fixes the long-standing bug where we read the probed bytes twice, which means that what we're using to probe may in fact be invalid. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Herbert Xu authored
Ever since raw_probe_proto_opt was added it had the problem of causing the user iov to be read twice, once during the probe for the protocol header and once again in ip_append_data. This is a potential security problem since it means that whatever we're probing may be invalid. This patch plugs the hole by firstly advancing the iov so we don't read the same spot again, and secondly saving what we read the first time around for use by ip_append_data. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Herbert Xu authored
The function raw_probe_proto_opt tries to extract the first two bytes from the user input in order to seed the IPsec lookup for ICMP packets. In doing so it processes the iovec by hand and overcomplicates things. This patch replaces the manual iovec processing with a call to memcpy_fromiovecend. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
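Roughly, the simplification amounts to something like this (a sketch of the idea, not the exact raw.c hunk):

    /* Pull the first two bytes of the user payload (e.g. ICMP type/code)
     * straight out of the iovec instead of walking the segments by hand. */
    unsigned char probe[2];
    int err = memcpy_fromiovecend(probe, msg->msg_iov, 0, sizeof(probe));

    if (err)
            return err;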
-
David S. Miller authored
This way, drivers like cxgb4 don't need to do ugly relative includes. Reported-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joe Perches authored
Just use the address of the in6_addr. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Hariprasad Shenai says: ==================== RDMA/cxgb4,cxgb4vf,cxgb4i,csiostor: Cleanup macros This series moves the debugfs code to a new file debugfs.c and cleans up macros/register defines. Various patches have ended up changing the style of the symbolic macros/register defines, and some of them used the macros/register defines that match the output of the script from the hardware team. As a result, the current kernel.org files are a mix of different macro styles. Since these macro/register defines are used by five different drivers, a few patch series have ended up adding duplicate macro/register define entries with different styles. This makes these register define/macro files a complete mess, and we want to make them clean and consistent. We will post a few more series to cover all the macros so that they all follow the same style and stay consistent. The patch series is created against the 'net-next' tree and includes patches for the cxgb4, cxgb4vf, iw_cxgb4, csiostor and cxgb4i drivers. We have included all the maintainers of the respective drivers. Kindly review the change and let us know in case of any review comments. V3: Use suffix instead of prefix for macros/register defines V2: Changed the description and cover-letter content to answer David Miller's question ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hariprasad Shenai authored
Various patches have ended up changing the style of the symbolic macros/register defines to different styles. As a result, the current kernel.org files are a mix of different macro styles. Since these macro/register defines are used by different drivers, a few patch series have ended up adding duplicate macro/register define entries with different styles. This makes these register define/macro files a complete mess, and we want to make them clean and consistent. This patch cleans up a part of it. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hariprasad Shenai authored
Various patches have ended up changing the style of the symbolic macros/register defines to different styles. As a result, the current kernel.org files are a mix of different macro styles. Since these macro/register defines are used by different drivers, a few patch series have ended up adding duplicate macro/register define entries with different styles. This makes these register define/macro files a complete mess, and we want to make them clean and consistent. This patch cleans up a part of it. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hariprasad Shenai authored
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
To enable gro_flush_timeout, a driver has to use napi_complete_done() instead of napi_complete(). Tested: Ran 200 netperf TCP_STREAM from A to B (10GbE mlx4 link, 8 RX queues) Without this feature, we send back about 305,000 ACKs per second. The GRO aggregation ratio is low (811/305 = 2.65 segments per GRO packet). Setting a timer of 2000 nsec is enough to increase GRO packet sizes and reduce the number of ACK packets (811/19.2 = 42). The receiver performs fewer calls to the upper stacks and fewer wakeups. This also reduces cpu usage on the sender, as it receives fewer ACK packets. Note that reducing the number of wakeups increases cpu efficiency, but can decrease QPS, as applications won't have the chance to warm up cpu caches by doing a partial read of RPC requests/answers if they fit in one skb. B:~# sar -n DEV 1 10 | grep eth0 | tail -1 Average: eth0 811269.80 305732.30 1199462.57 19705.72 0.00 0.00 0.50 B:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout B:~# sar -n DEV 1 10 | grep eth0 | tail -1 Average: eth0 811577.30 19230.80 1199916.51 1239.80 0.00 0.00 0.50 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Tuning coalescing parameters on a NIC can be really hard. Servers can handle both bulk and RPC-like traffic, with conflicting goals: bulk flows want GRO packets as big as possible, RPC traffic wants minimal latencies. To reach big GRO packets on a 10GbE NIC, one can use: ethtool -C eth0 rx-usecs 4 rx-frames 44 But this penalizes rpc sessions, with an increase in latencies of up to 50% in some cases, as NICs generally do not force an interrupt when a packet with the TCP Push flag is received. Some NICs do not have an absolute timer, only a timer rearmed for every incoming packet. This patch uses a different strategy: let the GRO stack decide what to do, based on the traffic pattern. Packets with the Push flag won't be delayed. Packets without the Push flag might be held in the GRO engine, if we keep receiving data. This new mechanism is off by default, and can be enabled by setting /sys/class/net/ethX/gro_flush_timeout to a value in nanoseconds. To fully enable this mechanism, drivers should use napi_complete_done() instead of napi_complete(). Tested: Ran 200 netperf TCP_STREAM from A to B (10GbE mlx4 link, 8 RX queues) Without this feature, we send back about 305,000 ACKs per second. The GRO aggregation ratio is low (811/305 = 2.65 segments per GRO packet). Setting a timer of 2000 nsec is enough to increase GRO packet sizes and reduce the number of ACK packets (811/19.2 = 42). The receiver performs fewer calls to the upper stacks and fewer wakeups. This also reduces cpu usage on the sender, as it receives fewer ACK packets. Note that reducing the number of wakeups increases cpu efficiency, but can decrease QPS, as applications won't have the chance to warm up cpu caches by doing a partial read of RPC requests/answers if they fit in one skb. B:~# sar -n DEV 1 10 | grep eth0 | tail -1 Average: eth0 811269.80 305732.30 1199462.57 19705.72 0.00 0.00 0.50 B:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout B:~# sar -n DEV 1 10 | grep eth0 | tail -1 Average: eth0 811577.30 19230.80 1199916.51 1239.80 0.00 0.00 0.50 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dave Taht authored
Babel uses rt_proto 42. Add it to the userspace-visible header file. Signed-off-by: Dave Taht <dave.taht@bufferbloat.net> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
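The change is presumably a one-liner in include/uapi/linux/rtnetlink.h along these lines:

    #define RTPROT_BABEL    42      /* Babel daemon */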
-
- 09 Nov, 2014 1 commit
-
-
Joe Perches authored
Remove the dependency on the "warning" sysctl (net_msg_warn) which is only used by the LIMIT_NETDEBUG macro. Convert the LIMIT_NETDEBUG use in DCCP_WARN to the more common net_warn_ratelimited mechanism. This still ratelimits based on the net_ratelimit() function, but removes the check for the sysctl. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
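A hedged sketch of the direction of the conversion (not the literal net/dccp/dccp.h diff):

    /* before: gated on the net_msg_warn sysctl via LIMIT_NETDEBUG
     * after:  plain ratelimiting through net_warn_ratelimited() */
    #define DCCP_WARN(fmt, ...) \
            net_warn_ratelimited("%s: " fmt, __func__, ##__VA_ARGS__)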
-
- 07 Nov, 2014 14 commits
-
-
Rick Jones authored
As NIC multicast filtering isn't perfect, and some platforms are quite content to spew broadcasts, we should not trigger an event for skb:kfree_skb when we do not have a match for such an incoming datagram. We do though want to avoid sweeping the matter under the rug entirely, so increment a suitable statistic. This incorporates feedback from David L. Stevens, Karl Neiss and Eric Dumazet. V3 - use bool per David Miller Signed-off-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Oliver Neukum authored
Olivier having laid the groundwork, this patch transmits the multicast flag to the device to save some bus traffic. Signed-off-by: Oliver Neukum <oneukum@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
stephen hemminger authored
The entries in the Kbuild files are incorrectly sorted. Matters for aesthetics only. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andy Shevchenko authored
This patch fixes the following sparse warnings. One is fixed by casting the return value to the return type of the function. The others are fixed by creating a specific stmmac_platform.h which provides the bits related to the platform driver. drivers/net/ethernet/stmicro/stmmac/dwmac-meson.c:59:29: warning: incorrect type in return expression (different address spaces) drivers/net/ethernet/stmicro/stmmac/dwmac-meson.c:59:29: expected void * drivers/net/ethernet/stmicro/stmmac/dwmac-meson.c:59:29: got void [noderef] <asn:2>*reg drivers/net/ethernet/stmicro/stmmac/dwmac-meson.c:64:29: warning: symbol 'meson6_dwmac_data' was not declared. Should it be static? drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c:354:29: warning: symbol 'stih4xx_dwmac_data' was not declared. Should it be static? drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c:361:29: warning: symbol 'stid127_dwmac_data' was not declared. Should it be static? drivers/net/ethernet/stmicro/stmmac/dwmac-sunxi.c:133:29: warning: symbol 'sun7i_gmac_data' was not declared. Should it be static? Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andy Shevchenko authored
There is a kernel helper to dump buffers in a hexadecimal format. This patch replaces the open-coded function with a call to that helper. The output is slightly changed: - no leading space - the ASCII part will be printed along with the dump - the offset is longer than 3 characters (now 8) Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
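The helper in question is presumably print_hex_dump(); typical usage looks like this (the "pkt: " prefix is illustrative):

    /* 16 bytes per row, 1-byte groups, 8-character offset prefix, ASCII column */
    print_hex_dump(KERN_DEBUG, "pkt: ", DUMP_PREFIX_OFFSET, 16, 1,
                   buf, len, true);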
-
David S. Miller authored
Herbert Xu says: ==================== Replace skb_copy_datagram_const_iovec with iterator version This patch series adds the helper skb_copy_datagram_iter, which is meant to replace both skb_copy_datagram_iovec and its evil twin skb_copy_datagram_const_iovec. It then converts tun and macvtap over to the new helper and finally removes skb_copy_datagram_const_iovec which is only used by tun and macvtap. The copy_to_iter return value issue pointed out by Al has now been fixed. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Herbert Xu authored
Now that both macvtap and tun are using skb_copy_datagram_iter, we can kill the abomination that is skb_copy_datagram_const_iovec. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Herbert Xu authored
This patch removes the use of skb_copy_datagram_const_iovec in favour of the iovec iterator-based skb_copy_datagram_iter. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Herbert Xu authored
This patch removes the use of skb_copy_datagram_const_iovec in favour of the iovec iterator-based skb_copy_datagram_iter. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Herbert Xu authored
This patch adds skb_copy_datagram_iter, which is identical to skb_copy_datagram_iovec except that it operates on iov_iter instead of iovec. Eventually all users of skb_copy_datagram_iovec should switch over to iov_iter and then we can remove skb_copy_datagram_iovec. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
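A hedged usage sketch, assuming the caller has already set up an iov_iter describing the destination buffer:

    /* Copy 'len' bytes of the skb payload, starting at 'offset', into the
     * buffer described by the iov_iter 'to'. */
    int err = skb_copy_datagram_iter(skb, offset, to, len);

    if (err)
            return err;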
-
Tom Herbert authored
Add definition to vxlan nla_policy for UDP checksum. This is necessary to enable UDP checksums on VXLAN. In some instances, enabling UDP checksums can improve performance on receive for devices that return legacy checksum-unnecessary for UDP/IP. Also, UDP checksum provides some protection against VNI corruption. Testing: Ran 200 instances of TCP_STREAM and TCP_RR on bnx2x. TCP_STREAM IPv4, without UDP checksums 14.41% TX CPU utilization 25.71% RX CPU utilization 9083.4 Mbps IPv4, with UDP checksums 13.99% TX CPU utilization 13.40% RX CPU utilization 9095.65 Mbps TCP_RR IPv4, without UDP checksums 94.08% TX CPU utilization 156/248/462 90/95/99% latencies 1.12743e+06 tps IPv4, with UDP checksums 94.43% TX CPU utilization 158/250/462 90/95/99% latencies 1.13345e+06 tps Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
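The policy addition is presumably a single entry in the vxlan netlink policy table, along these lines:

    [IFLA_VXLAN_UDP_CSUM] = { .type = NLA_U8 },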
-
David S. Miller authored
Tom Lendacky says: ==================== amd-xgbe: AMD XGBE driver updates 2014-11-06 The following series of patches fixes a couple of bugs that slipped through my last series. - Free the channel structure after freeing the per-channel interrupts - If an skb allocation error occurs during receive processing, check whether more descriptors are associated with the packet or whether to start on a new packet This patch series is based on net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Lendacky, Thomas authored
If the skb allocation fails during receive processing, the driver would continue reading descriptors without first determining if there were any more descriptors for the current packet. Update the code to check whether more descriptors are associated with the current packet or whether to move on to the next descriptor as a new packet. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-