Commits · a16a3361927f75e6032df110920c83ca6c5045d3 · nexedi / linux

08 Jul, 2014 40 commits

enic: fix return values in enic_set_coalesce · a16a3361

Govindarajulu Varadarajan authored Jul 02, 2014

enic_set_coalesce() has two problems.

* It should return -EINVAL and not -EOPNOTSUPP for invalid coalesce values.

* In case of MSIX, enic_set_coalesce return error after applying requested
  coalescing setting partially. We should either apply all the setting requeste
  and return success or apply non and return error.

* This patch also simplifies the algo.

This was introduced by
'7c2ce6e6 enic: Add support for adaptive interrupt coalescing'

These changes were suggested by Ben Hutchings here
http://www.spinics.net/lists/netdev/msg283972.html

Also change enic driver version.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a16a3361

bonding: remove no longer relevant vlan warnings · e721f87d

Jiri Pirko authored Jul 02, 2014

These warnings are no longer relevant. Even when last slave is
removed, there is a valid address assigned to bond (random).
The correct functionality of vlans is ensured by maintaining unicast
list in vlan_sync_address().
Suggested-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e721f87d

Merge branch 'at86rf230-next' · 7cb9e6bf

David S. Miller authored Jul 07, 2014

Alexander Aring says:

====================
at86rf230: rework driver implementation

this patch series includes a rework of the at86rf230 driver.

There are several changes:

 - Add regmap support.
 - Merge at86rf212 operations with generic at86rf2xx operations, all chips
   supports these operations.
 - Drop of irqworker. This is a workqueue which will scheduled by an irq to
   handle synchronous spi handling. Instead using asynchronous spi handling,
   then no scheduler is involved at irq handling.
 - Also detected some bugs by receiving frame like CRC can be correct and a
   802.15.4 frame length could be above 127 bytes. This would crash the whole
   kernel (but should be handled by the mac layer). Another bug is the handling
   with RX_SAFE_MODE which protect the frame buffer after a readout. This is
   currently not working because we read out the buffer twice and the first one
   to get the frame size. Solution is to readout always the whole frame buffer.
 - Added some timing relevants things from the datasheet for state changes And
   IEEE 802.15.4 standard like interframe spacing. Interframe spacing is needed
   to insert some receiving space time between frame transmitting. This should be
   also handled by MAC layer, but it's currently a workaround to add this inside
   the driver layer.
 - Add some callback setting for chip specific handling, instead of runtime decisions
   if (is_chip_type()). Callbacks are set only once at probe time.
 - We don't using a force state change anymore. A force state change will do a
   abort of receiving frames while we want to transmit a new frame. This should
   decrease the drop rate of packets.
 - And many others changes and bug fixes...

changes since v3:
 - fix irq polarity in patch ("at86rf230: rework irq_pol setting").

changes since v2:
 - add check if necessary functions are implemented when hw flags are set in patch
   ("mac802154: at86rf230: add hw flags and merge ops"). I choosed the second variant.
 - remove unnecessary includes for workqueue and mutex in patch
   ("at86rf230: rework transmit and receive").
 - remove unnecessary cast in patch ("at86rf230: rework transmit and receive").
 - acivate regmap cache with REGCACHE_RBTREE in patch
   ("at86rf230: add regmap support").
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

7cb9e6bf

at86rf230: add new author · 01ebd60b

Alexander Aring authored Jul 03, 2014

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

01ebd60b

at86rf230: add sleep cycle timing · 7a4ef918

Alexander Aring authored Jul 03, 2014

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7a4ef918

at86rf230: add timing for channel switch · 984e0c68

Alexander Aring authored Jul 03, 2014

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

984e0c68

at86rf230: rework reset to trx_off state change · 09e536cd

Alexander Aring authored Jul 03, 2014

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

09e536cd

at86rf230: rework state change and start/stop · 2e0571c0

Alexander Aring authored Jul 03, 2014

This patch removes the current synchron state change function and add a
new function for a state assert. Change the start and stop callbacks to
use this new synchron state change behaviour. It's a wrapper around the
async state change function.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2e0571c0

at86rf230: rework irq_pol setting · 1db0558e

Alexander Aring authored Jul 03, 2014

This patch rework the irq_pol register setting for rising and falling
interrupt settings only. The default behaviour should be rising flag.

Also use IRQ_TYPE_* defines instead of IRQF_* defines. There is no
functionality change but irq_get_trigger_type returns IRQ_TYPE_* defines.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1db0558e

at86rf230: move RX_SAFE_MODE setting to hw_init · 6bd2b132

Alexander Aring authored Jul 03, 2014

There is no need to set this bit in start callback which could be
called more than once.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6bd2b132

at86rf230: rework transmit and receive handling · 1d15d6b5

Alexander Aring authored Jul 03, 2014

This patch is a complete reimplementation of transmit and receive
handling for the at86rf230 driver.

It solves also six bugs:

First:

The RX_SAFE_MODE is enabled and the transceiver doesn't leave the
receive state while the framebuffer isn't read by a CMD_FB command.
This is useful to read out the frame and don't get into another receive
or transmit state, otherwise the frame would be overwritten.
The current driver do twice CMD_FB calls, the first one leaves this
protection.

Second:

Sometimes the CRC calculation is correct and the length field is greater
127. The current mac802154 layer and filter of a at86rf2xx doesn't check
on this and the kernel crashes. In this case the frame is corrupted, we
send the whole receive buffer to the next layer which can be useful for
sniffing.

Thrid:
There is a undocumented race condition. When we are go into the
RX_AACK_ON state the transceiver could be changed into RX_AACK_BUSY
state. This is a normal behaviour. In this case the transceiver received
a SHR while assert wasn't finished.

Fourth:
It also handle some more "correct" state changes. In aret mode the
transceiver need to go to TX_ON before the transceiver go into
RX_AACK_ON.

Fifth:
The programming model [0] describes also a error handling in ARET mode
if the trac status is different than zero. This is patch adds support
for handling this.

Sixth:
In receive handling the transceiver should also get the trac status
according [0]. The driver could use the trac status as error statistic
handling, but the driver doesn't use this currently. There is maybe some
timing behaviour or the read of this register change some transceiver
states.

In addition the irqworker is removed. Instead we do async spi calls and
no scheduling is involved anymore. The transmit function is also
asynchron but with a wait_for_completion handling. The mac802154 layer
doesn't support asynchron transmit handling right now.

The state change behaviour is now changes, before it was:

1. assert while(!STATE_TRANSITION_IN_PROGRESS)
2. state change
3. assert while(!STATE_TRANSITION_IN_PROGRESS)
4. assert once(wanted state != current state)

Sometimes a unexcepted state change occurs when 4. assert was violated.
The new state change behaviour is:

1. assert while(!STATE_TRANSITION_IN_PROGRESS)
2. state change
3. wait state change timing according datasheet
4. assert once(wanted state != current state)

This behaviour is described in the at86rf231 software programming model [0].
The state change documentation in this programming guide should also valid for
at86rf212 and at86rf233 chips.

The transceiver don't do a FORCE_TX_ON while we want to transmit a PDU.
The new behaviour is a TX_ON and wait a receiving time (tFrame + tPAck).
If we are still in RX_AACK_BUSY then we transmit a FORCE_TX_ON as timeout
handling. The different is that FORCE_TX_ON aborts receiving and TX_ON
waits if RX_AACK_BUSY is finished. This should decrease the drop rate of
packets.

[0] http://www.atmel.com/Images/AVR2022_swpm231-2.0.zipSigned-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1d15d6b5

at86rf230: add support for at86rf23x desense · a7d7eda9

Alexander Aring authored Jul 03, 2014

To set the CCA_ED_THRES register the calculation for at86rf23x is
different than for at86rf212. This patch adds a new callback for this
calculation in chip data struct.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a7d7eda9

at86rf230: remove is212 and add driver data · a53d1f7c

Alexander Aring authored Jul 03, 2014

This patch adds a new at86rf2xx_chip_data structure which holds device
specific attributes. Instead of runtime decisions "if (is212())" we set
callbacks/attributes while device detection.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a53d1f7c

at86rf230: rework detect device handling · c8ee0f56

Alexander Aring authored Jul 03, 2014

This patch drops the current lowlevel spi calls for the detect device
function instead we handle this via regmap. Also put the detection of
in a seperate function and set all device specific attributes while detection.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c8ee0f56

at86rf230: add regmap support · f76014f7

Alexander Aring authored Jul 03, 2014

This patch adds regmap support for the at86rf230 driver and drop the
lowlevel spi access functions and use the regmap access functions.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f76014f7

mac802154: at86rf230: add hw flags and merge ops · 640985ec

Alexander Aring authored Jul 03, 2014

This patch adds new mac802154 hw flags for transmit power, csma and
listen before transmit (lbt). These flags indicates that the transceiver
supports these features. If the flags are set and the driver doesn't
implement the necessary functions, then ieee802154_register_device
returns -ENOSYS "Function not implemented".

This patch merges also all at86rf230 operations into one operations structure
and set the right hw flags for the at86rf230 transceivers.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

640985ec

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next · 1598c36a

David S. Miller authored Jul 07, 2014

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2014-07-02

This series contains updates to i40e and i40evf.

Anjali fixes a possible race where we were trying to free the dummy packet
buffer in the function that created it, so cleanup the dummy packet buffer
in i40e_clean_tx_ring() instead.  Also fixes an issue where the filter
program routine was not checking if there were descriptors available for
programming a filter.

Mitch fixes unnecessary delays when sending the admin queue commands by
moving a declaration up one level so we do not dereference it out of scope.
Fixes an issue with the VF where if the admin queue interrupts get lost for
some reason, the VF communication will stall as the VFs have no way of
reaching the PF.  To alleviate this condition, go ahead and check the ARQ
every time we run the service task.  Updates i40evf to allow the watchdog
to fire vector 0 via software, which makes the driver tolerant of dropped
interrupts on that vector.

Paul fixes a shifted '1' to be unsigned to avoid shifting a signed integer.

Jesse disables TPH by default since it is currently not enabled in the
current hardware.  Also finishes the i40e implementation of get_settings
for ethtool.

Catherine adds a new variable (hw.phy.link_info.an_enabled) to track whether
auto-negotiation is enabled, along with the functionality to update the
variable.  Adds the functionality to set the requested flow control mode.
Adds i40e implementation of setpauseparam and set_settings to ethtool.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

1598c36a

Merge branch 'fec-next' · dbaaca81

David S. Miller authored Jul 07, 2014

Russell King says:

====================
Freescale ethernet driver updates

Here's the first batch of patches for the Freescale FEC ethernet driver.
They require the previously applied "net: fec: Don't clear IPV6 header
checksum field when IP accelerator enable" patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

dbaaca81

net: fec: fix missing kmalloc() failure check in fec_enet_alloc_buffers() · ffdce2cc

Russell King authored Jul 08, 2014

fec_enet_alloc_buffers() assumes that kmalloc() will never fail, which
is an invalid assumption.  Fix this by implementing a common error
cleanup path, and use it to also clean up after failed bounce buffer
allocation.
Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

ffdce2cc

net: fec: ensure fec_enet_free_buffers() properly cleans the rings · 8b7c9efa

Russell King authored Jul 08, 2014

Ensure that we do not double-free any allocations, and that any transmit
skbuffs are properly freed when we clean up the rings.
Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

8b7c9efa

net: fec: clean up transmit descriptor setup · d6bf3143

Russell King authored Jul 08, 2014

Avoid writing any state until we're certain we can proceed with the
transmission: this avoids writing mapping error address values to the
descriptors, or setting the skbuff pointer until we have successfully
mapped the skb.
Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

d6bf3143

net: fec: make rx skb handling more robust · 730ee360

Russell King authored Jul 08, 2014

Allocate, and then map the receive skb before writing any data to the
ring descriptor or storing the skb.  When freeing the receive ring
entries, unmap and free the skb, and then clear the stored skb pointer.

This means we have ring data and skb pointer in one of two states:
either both fully setup, or nothing setup.

This simplifies the cleanup, as we can use just the skb pointer to
indicate whether the descriptor is setup, and thus avoids potentially
calling dma_unmap_single() on a DMA error value.
Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

730ee360

net: fec: remove useless fep->opened · 5d165c55

Russell King authored Jul 08, 2014

napi_disable() waits until the NAPI processing has completed, and then
prevents any further polls. At this point, the driver then clears
fep->opened. The NAPI poll function uses this to stop processing in
the receive path. Hence, it will never see this variable cleared,
because the NAPI poll has to complete before it will be cleared.

Therefore, this variable serves no purpose, so let's remove it.
Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

5d165c55

net: fec: stop the phy before shutting down the MAC · d76cfae9

Russell King authored Jul 08, 2014

When the network interface goes down, stop the phy to prevent further
link up status changes before taking the MAC or netif sections down.
This prevents further reception of link up events which could
potentially call fec_restart().

Since phy_stop() takes the mutex which adjust_link() runs under, we
also ensure that adjust_link() will not already be processing a link
up event.

We also need to do this when suspending as well - we don't want a
mis-timed phy state change to restart the MAC after we have stopped
it for suspend, and thus need to restart the phy when resuming.
Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

d76cfae9

net: fec: ensure that a disconnected phy isn't configured · 0b146ca8

Russell King authored Jul 08, 2014

When we disconnect from a phy, we should forget our pointer to it so we
don't accidentally try to configure it.  We handle a NULL phy pointer
correctly in most places, except fec_enet_set_pauseparam().  Fix this
too.
Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

0b146ca8

net: fec: remove checking for NULL phy_dev in fec_enet_close() · 635cf17c

Russell King authored Jul 08, 2014

fep->phy_dev can not be NULL here for two reasons:
- fec_enet_open() will have successfully connected the phy, or will have
  failed.
- fec_enet_open() will have called phy_start(fep->phy_dev), which
  unconditionally dereferences this pointer.

If it were to be NULL here, then fec_enet_open() will have already
oopsed.
Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

635cf17c

net: fec: use netif_tx_disable() rather than netif_stop_queue() · b49cd504

Russell King authored Jul 08, 2014

We use netif_stop_queue() in several places where we want to ensure that
the start_xmit function is not running.  netif_stop_queue() is not
sufficient to achieve that - it merely sets a flag to indicate that the
transmit queue(s) should not be run.

netif_tx_disable() gives this guarantee, since it takes the transmit
queue lock while marking the queue stopped.  This will wait for the
transmit function to complete before returning.
Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

b49cd504

net: fec: fix interrupt handling races · 7a16807c

Russell King authored Jul 08, 2014

While running: while :; do iperf -c <HOST> -P 4; done, transmit timeouts
are regularly reported.  With the tx ring dumping in place, we can see
that all entries are in use, and the hardware has finished transmitting
these packets.  However, the driver has not reclaimed these ring
entries.

This can occur if the interrupt handler is invoked at the wrong moment -
eg:

	CPU0				CPU1
	fec_enet_tx()
					interrupt, IEVENT = FEC_ENET_TXF
					FEC_ENET_TXF cleared
					napi_schedule_prep()
	napi_complete()

The result is that we clear the transmit interrupt, but we don't trigger
any cleaning of the transmit ring.  Instead, use a different strategy:

- When receiving a transmit or receive interrupt, disable both tx and rx
  interrupts, but do not acknowledge them.  Schedule a napi poll.  Don't
  loop.

- When we are polled, read IEVENT, acknowledging the pending transmit
  and receive interrupts, before then going on to process the
  appropriate rings.

This allows us to avoid the race, and has a number of other advantages:
- we cut down on the number of transmit interrupts we have to process.
- we only look at the rings which have pending events.
- we gain additional throughput: the iperf total bandwidth increases
  from about 180Mbps to 240Mbps:

[  3]  0.0-10.0 sec  68.1 MBytes  57.0 Mbits/sec
[  5]  0.0-10.0 sec  72.4 MBytes  60.5 Mbits/sec
[  4]  0.0-10.1 sec  76.1 MBytes  63.5 Mbits/sec
[  6]  0.0-10.1 sec  71.9 MBytes  59.9 Mbits/sec
[SUM]  0.0-10.1 sec   288 MBytes   241 Mbits/sec
Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

7a16807c

net: fec: fix ethtool set_pauseparam duplex bug · 9671a42e

Russell King authored Jul 08, 2014

Setting the pause parameters causes a running network interface to be
restarted.  However, the restart forces the FEC into half-duplex mode,
whether or not the remote end is in half-duplex mode.  Misconfigured
duplex mode is a known source of problems on a link.

Fix this by always preserving the duplex mode on configuration changes.
Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

9671a42e

net: fec: iMX6 FEC does not support half-duplex gigabit · b44592ff

Russell King authored Jul 08, 2014

The iMX6 gigabit FEC does not support half-duplex gigabit operation.
Phys attacked to the FEC may support this, and we currently do nothing
to disable this feature.  This may result in an invalid configuration.
Mask out phy support for gigabit half-duplex operation.
Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

b44592ff

Merge branch 'net-hash-tx' · 6c035ea0

David S. Miller authored Jul 07, 2014

Tom Herbert says:

====================
net: Improvements and applications of packet flow hash in transmit path

This patch series includes some patches which improve and make use
of skb->hash in the transmit path.

What is included:

- Infrastructure to save a precomputed hash in the sock structure.
  For connected TCP and UDP sockets we only need to compute the
  flow hash once and not once for every packet.
- Call skb_get_hash in get_xps_queue and __skb_tx_hash. This eliminates
  the awkward access to skb->sk->sk_hash in the lower transmit path.
- Move UDP source port generation into a common function in udp.h This
  implementation is mostly based on vxlan_src_port.
- Use non-zero IPv6 flow labels in flow_dissector as port information
  for flow hash calculation.
- Implement automatic flow label generation on transmit (per RFC 6438).
- Don't repeatedly try to compute an L4 hash in skb_get_hash if we've
  already tried to find one in software stack calculation.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

6c035ea0

net: Only do flow_dissector hash computation once per packet · a3b18ddb

Tom Herbert authored Jul 01, 2014

Add sw_hash flag to skbuff to indicate that skb->hash was computed
from flow_dissector. This flag is checked in skb_get_hash to avoid
repeatedly trying to compute the hash (ie. in the case that no L4 hash
can be computed).
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a3b18ddb

ipv6: Implement automatic flow label generation on transmit · cb1ce2ef

Tom Herbert authored Jul 01, 2014

Automatically generate flow labels for IPv6 packets on transmit.
The flow label is computed based on skb_get_hash. The flow label will
only automatically be set when it is zero otherwise (i.e. flow label
manager hasn't set one). This supports the transmit side functionality
of RFC 6438.

Added an IPv6 sysctl auto_flowlabels to enable/disable this behavior
system wide, and added IPV6_AUTOFLOWLABEL socket option to enable this
functionality per socket.

By default, auto flowlabels are disabled to avoid possible conflicts
with flow label manager, however if this feature proves useful we
may want to enable it by default.

It should also be noted that FreeBSD has already implemented automatic
flow labels (including the sysctl and socket option). In FreeBSD,
automatic flow labels default to enabled.

Performance impact:

Running super_netperf with 200 flows for TCP_RR and UDP_RR for
IPv6. Note that in UDP case, __skb_get_hash will be called for
every packet with explains slight regression. In the TCP case
the hash is saved in the socket so there is no regression.

Automatic flow labels disabled:

  TCP_RR:
    86.53% CPU utilization
    127/195/322 90/95/99% latencies
    1.40498e+06 tps

  UDP_RR:
    90.70% CPU utilization
    118/168/243 90/95/99% latencies
    1.50309e+06 tps

Automatic flow labels enabled:

  TCP_RR:
    85.90% CPU utilization
    128/199/337 90/95/99% latencies
    1.40051e+06

  UDP_RR
    92.61% CPU utilization
    115/164/236 90/95/99% latencies
    1.4687e+06
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cb1ce2ef

flow_dissector: Use IPv6 flow label in flow_dissector · 19469a87

Tom Herbert authored Jul 01, 2014

This patch implements the receive side to support RFC 6438 which is to
use the flow label as an ECMP hash. If an IPv6 flow label is set
in a packet we can use this as input for computing an L4-hash. There
should be no need to parse any transport headers in this case.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

19469a87

vxlan: Call udp_flow_src_port · 535fb8d0

Tom Herbert authored Jul 01, 2014

In vxlan and OVS vport-vxlan call common function to get source port
for a UDP tunnel. Removed vxlan_src_port since the functionality is
now in udp_flow_src_port.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

535fb8d0

udp: Add function to make source port for UDP tunnels · b8f1a556

Tom Herbert authored Jul 01, 2014

This patch adds udp_flow_src_port function which is intended to be
a common function that UDP tunnel implementations call to set the source
port. The source port is chosen so that a hash over the outer headers
(IP addresses and UDP ports) acts as suitable hash for the flow of the
encapsulated packet. In this manner, UDP encapsulation works with RSS
and ECMP based wrt the inner flow.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b8f1a556

net: Call skb_get_hash in get_xps_queue and __skb_tx_hash · 0e001614

Tom Herbert authored Jul 01, 2014

Call standard function to get a packet hash instead of taking this from
skb->sk->sk_hash or only using skb->protocol.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0e001614

net: Save TX flow hash in sock and set in skbuf on xmit · b73c3d0e

Tom Herbert authored Jul 01, 2014

For a connected socket we can precompute the flow hash for setting
in skb->hash on output. This is a performance advantage over
calculating the skb->hash for every packet on the connection. The
computation is done using the common hash algorithm to be consistent
with computations done for packets of the connection in other states
where thers is no socket (e.g. time-wait, syn-recv, syn-cookies).

This patch adds sk_txhash to the sock structure. inet_set_txhash and
ip6_set_txhash functions are added which are called from points in
TCP and UDP where socket moves to established state.

skb_set_hash_from_sk is a function which sets skb->hash from the
sock txhash value. This is called in UDP and TCP transmit path when
transmitting within the context of a socket.

Tested: ran super_netperf with 200 TCP_RR streams over a vxlan
interface (in this case skb_get_hash called on every TX packet to
create a UDP source port).

Before fix:

  95.02% CPU utilization
  154/256/505 90/95/99% latencies
  1.13042e+06 tps

  Time in functions:
    0.28% skb_flow_dissect
    0.21% __skb_get_hash

After fix:

  94.95% CPU utilization
  156/254/485 90/95/99% latencies
  1.15447e+06

  Neither __skb_get_hash nor skb_flow_dissect appear in perf
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b73c3d0e

flow_dissector: Abstract out hash computation · 5ed20a68

Tom Herbert authored Jul 01, 2014

Move the hash computation located in __skb_get_hash to be a separate
function which takes flow_keys as input. This will allow flow hash
computation in other contexts where we only have addresses and ports.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5ed20a68

Merge branch 'systemport-next' · 081a20ff

David S. Miller authored Jul 07, 2014

Florian Fainelli says:

====================
net: systemport: PM and Wake-on-LAN support

This patchset brings Power Management and Wake-on-LAN support to the
Broadcom SYSTEM PORT driver.

S2 and S3 modes are supported, while we only support Wake-on-LAN using
MagicPackets for now
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

081a20ff