Commits · ae21888f9ef34fc2584b6caceb93f0b496dd21d5 · Kirill Smelkov / linux

22 May, 2014 21 commits

Documentation: devicetree: net: refer to fixed-link.txt · ae21888f

Florian Fainelli authored May 22, 2014

Update the Freescale TSEC PHY, Broadcom GENET & SYSTEMPORT Device Tree
binding documentation to refer to the fixed-link Device Tree binding in
fixed-link.txt.
Reviewed-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ae21888f

Documentation: devicetree: add old and deprecated 'fixed-link' · 91c1d980

Florian Fainelli authored May 22, 2014

Update the fixed-link Device Tree binding documentation to contain
information about the old and deprecated 5-digit 'fixed-link' property.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

91c1d980

Merge branch 'vlan_features' · a33fa47e

David S. Miller authored May 22, 2014

Michal Kubecek says:

====================
net: device features handling fixes

> I think we need to think more about what exact behavior is desired in
> mixed csum feature set cases.  Probably whatever csum offload type is
> the most prevelant should be the one advertised by the master.  It
> means we will do that software checksum fixup for the oddball slaves,
> but it seems the best we can do.

I've been thinking about it some more and I believe what I proposed is
correct: features that are or-ed (ONE_FOR_ALL) should start with 0,
those which are and-ed (ALL_FOR_ALL) with 1.

But once we do that, we have to fix another problem in vlan module:
features of a vlan device are mostly computed as bitwise AND of features
and vlan_features of its underlying device. The problem is checksumming
features don't really behave like independent flags as their main
purpose is to be used in can_checksum_protocol() whose logic is that
HW_CSUM (GEN_CSUM) means "can checksum everything" so that it kind of
contains IP(V6)_CSUM functionality. Therefore intersection of HW_CSUM
and IP(V6)_CSUM should result in the latter.

An alternative approach would be to always set IP(V6)_CSUM once HW_CSUM
(any of GEN_CSUM) is set but as for long time people were taught not to
do that and we even have a warning for it, switching to the exact
opposite could cause a lot of problems.

> Also, like Jay, I agree that whatever we do in bonding needs to be
> done in team too and I'd like the bug fixed at the same time in both
> places.

Agreed. Added a similar patch for teaming driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

a33fa47e

teaming: fix vlan_features computing · 3625920b

Michal Kubeček authored May 20, 2014

__team_compute_features() uses netdev_increment_features() to
combine vlan_features of slaves into vlan_features of the team.
As netdev_increment_features() only adds most features and we
start with TEAM_VLAN_FEATURES, we can end up with features none
of the slaves provided.

Initialize vlan_features only with the flags which are both in
TEAM_VLAN_FEATURES and NETIF_F_ALL_FOR_ALL. Right now there is
no such feature so that we actually initialize vlan_features
with zero but stating it explicitely will make the code more
future proof.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

3625920b

bonding: fix vlan_features computing · a9b3ace4

Michal Kubeček authored May 20, 2014

bond_compute_features() uses netdev_increment_features() to
combine vlan_features of slaves into vlan_features of the bond.
As netdev_increment_features() only adds most features and we
start with BOND_VLAN_FEATURES, we can end up with features none
of the slaves provided.

If there is at least one slave, initialize vlan_features only
with the flags in NETIF_F_ALL_FOR_ALL. Right now there is none
in BOND_VLAN_FEATURES but stating it explicitely will make the
code more future proof.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

a9b3ace4

vlan: more careful checksum features handling · da08143b

Michal Kubeček authored May 20, 2014

When combining real_dev's features and vlan_features, simple
bitwise AND is used. This doesn't work well for checksum
offloading features as if one set has NETIF_F_HW_CSUM and the
other NETIF_F_IP_CSUM and/or NETIF_F_IPV6_CSUM, we end up with
no checksum offloading. However, from the logical point of view
(how can_checksum_protocol() works), NETIF_F_HW_CSUM contains
the functionality of NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM so
that the result should be IP/IPV6.

Add helper function netdev_intersect_features() implementing
this logic and use it in vlan_dev_fix_features().
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

da08143b

net: fec: correct the MDIO clock source · 98a6eeb8

Nimrod Andy authored May 20, 2014

Since imx serials FEC/ENET MDIO clock source is internal ipg clock,
and "ahb" clock is defined as FEC/ENET bus clock, so the patch just
correct the fec driver MDIO clock source.
Signed-off-by: Fugang Duan <B38611@freescale.com>
Acked-by: Frank Li <frank.li@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

98a6eeb8

net: fec: optimize the clock management to save power · e8fcfcd5

Nimrod Andy authored May 20, 2014

Add below clock management to save fec power:
- After probe, disable all clocks incluing ipg, ahb, enet_out, ptp clock.
- Open ethx interface enable necessary clocks.
  Close ethx interface disable all clocks.

The patch also encapsulates the all enet clocks enable/disable to
.fec_enet_clk_enable(), which can reduce the repetitional code in
driver.
Signed-off-by: Fugang Duan <B38611@freescale.com>
Acked-by: Frank Li <Frank.li@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e8fcfcd5

Merge branch 'sw_tso' · fddb8872

David S. Miller authored May 22, 2014

Ezequiel Garcia says:

====================
net: Introduce a software TSO helper API

Here's a first proposal for a generic software TSO helper API, following
David's suggestion.

There are at least two drivers that currently implement some form of software
TSO: Solarflare network driver (drivers/net/ethernet/sfc) and Tilera GX
network driver (drivers/net/ethernet/tile/tilegx.c).

The rationale behind adding a generic API is to provide a boiler plate with the
segmentation and other common tasks, making this support easier to add in other
drivers.

When designing the API, I've considered mainly two design choices:

  1. Implement a series of callbacks that each driver would implement
     and the net core code would call to fill in descriptors and egress
     that data.

  2. Implement an API for drivers to use in a driver's specific tx_tso
     function. This functions would exhaust a sk_buff payload, and use the
     API as helper for building the headers and segmented data.

I've chosen (2), to avoid function pointers (which was Willy's concern) and
because it seemed less fragile. Of course, this is argueable.

The API is by no means complete, and lacks some features, however it allows
to support TSO in mv643xx_eth and mvneta network drivers with some very
good performance results. I've added this support as an example of the API
in action.

In particular the following needs some revisiting:

  1. IPv6 support is lacking.

  2. The required descriptor counting needs some verification. The current
     implementation might be too "sketchy". The tilegx one can be a good
     starting point.

  3. The implemenation assumes the hardware can compute the TCP and IP
     checksums for the built headers. However, some controllers may need
     some initial calculation (such as tilegx, for instance).

Despite this, I hope this proposal is good enough to trigger some discussion
and to check if I'm on the right track. Feedback is much appreciated!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

fddb8872

net: mv643xx_eth: Implement software TSO · 3ae8f4e0

Ezequiel Garcia authored May 19, 2014

Now that the TSO helper API has been introduced, this commit makes use
of it to add support for software TSO in this driver.

This feature allows to improve outbound throughput performance significantly.
Running iperf tests shows a 30% improvement, tested on a Kirkwood Openblocks
A6 board.

$ ethtool -K eth0 tso off
$ iperf -c 192.168.0.45 -t 3
------------------------------------------------------------
Client connecting to 192.168.0.45, TCP port 5001
TCP window size: 43.8 KByte (default)
------------------------------------------------------------
[  3] local 192.168.0.159 port 46389 connected with 192.168.0.45 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 3.0 sec   217 MBytes   607 Mbits/sec

$ ethtool -K eth0 tso on
$ iperf -c 192.168.0.45 -t 3
------------------------------------------------------------
Client connecting to 192.168.0.45, TCP port 5001
TCP window size: 43.8 KByte (default)
------------------------------------------------------------
[  3] local 192.168.0.159 port 46390 connected with 192.168.0.45 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 3.0 sec   336 MBytes   938 Mbits/sec

This commit is just an example of the usage of the TSO API, it works fine
but needs some more work. In particular, the descriptor unmapping path must
avoid unmapping the TSO headers.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3ae8f4e0

net: mv643xx_eth: Use dma_map_single() to map the skb fragments · 69ad0dd7

Ezequiel Garcia authored May 19, 2014

Using dma_map_single() instead of skb_frag_dma_map() allows to unmap
all the descriptors using dma_unmap_single(). This change allows
to introduce software TSO in a less intrusive way.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

69ad0dd7

net: mv643xx_eth: Factorize feature setting · 4d48d589

Ezequiel Garcia authored May 19, 2014

In order to ease the addition of new features, let's factorize the
feature list.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4d48d589

net: mv643xx_eth: Avoid setting the initial TCP checksum · 84411f73

Ezequiel Garcia authored May 19, 2014

As specified in the datasheet, the driver can set the "L4Chk_Mode" flag
(bit 10) in the Tx descriptor command/status to specify that a frame is not
IP fragmented and that the controller is in charge of generating the TCP/IP
checksum. This must be used together with the "GL4chk" flag (bit 17).

These two flags allow to avoid setting the initial TCP checksum in the l4i_chk
field of the Tx descriptor, which is needed to support software TSO.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

84411f73

net: mv643xx_eth: Factorize initial checksum and command preparation · 0a8fa933

Ezequiel Garcia authored May 19, 2014

Make the code more readable by moving the initial checksum setup
and the command/status preparation to its own function.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0a8fa933

net: mvneta: Implement software TSO · 2adb719d

Ezequiel Garcia authored May 19, 2014

Now that the TSO helper API has been introduced, this commit makes use
of it to implement the TSO in this driver.

Using iperf to test and vmstat to check the CPU usage, shows a substantial
CPU usage drop when TSO is on (~15% vs. ~25%). HTTP-based tests performed
by Willy Tarreau have shown performance improvements.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2adb719d

net: mvneta: Clean mvneta_tx() sk_buff handling · e19d2dda

Ezequiel Garcia authored May 19, 2014

Rework mvneta_tx() so that the code that performs the final handling
before a sk_buff is transmitted is done only if the numbers of fragments
processed if positive.

This is preparation work to add the support for software TSO.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e19d2dda

net: mvneta: Factorize feature setting · 01ef26ca

Ezequiel Garcia authored May 19, 2014

In order to ease the addition of new features, let's factorize the
feature list.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

01ef26ca

net: Add a software TSO helper API · e876f208

Ezequiel Garcia authored May 19, 2014

Although the implementation probably needs a lot of work, this initial API
allows to implement software TSO in mvneta and mv643xx_eth drivers in a not
so intrusive way.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e876f208

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables · 8af750d7

David S. Miller authored May 22, 2014

Pablo Neira Ayuso says:

====================
Netfilter/nftables updates for net-next

The following patchset contains Netfilter/nftables updates for net-next,
most relevantly they are:

1) Add set element update notification via netlink, from Arturo Borrero.

2) Put all object updates in one single message batch that is sent to
   kernel-space. Before this patch only rules where included in the batch.
   This series also introduces the generic transaction infrastructure so
   updates to all objects (tables, chains, rules and sets) are applied in
   an all-or-nothing fashion, these series from me.

3) Defer release of objects via call_rcu to reduce the time required to
   commit changes. The assumption is that all objects are destroyed in
   reverse order to ensure that dependencies betweem them are fulfilled
   (ie. rules and sets are destroyed first, then chains, and finally
   tables).

4) Allow to match by bridge port name, from Tomasz Bursztyka. This series
   include two patches to prepare this new feature.

5) Implement the proper set selection based on the characteristics of the
   data. The new infrastructure also allows you to specify your preferences
   in terms of memory and computational complexity so the underlying set
   type is also selected according to your needs, from Patrick McHardy.

6) Several cleanup patches for nft expressions, including one minor possible
   compilation breakage due to missing mark support, also from Patrick.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

8af750d7

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next · 758bd61a

David S. Miller authored May 22, 2014

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates

This series contains updates to i40e and i40evf.

Shannon makes minor changes to the AdminQ interface to bring it up to
date.  Removes the hard coding of stats struct size in ethtool, in prep
for adding data fields which are configuration dependent.

Catherine removes some unused and unneeded PCI bus defines.

Jesse fixes the copyright headers and finishes up the removal of the PTP
Tx work functionality which allows us to rely on the Tx timesync interrupt.

Mitch provides a number of fixes and cleanups for i40e/i40evf based on
suggestions from Ben Hutchings.  First is to use a macro parameter for
ethtool stats instead of just assuming that a valid netdev variable
exists.  Second is not to tell ethtool that the VF can do 10GbaseT, when
it really has no idea what its link speed is, so set the supported value
to 0 instead.  Make the ethtool_ops structure constant since it is
extremely unlikely to change at runtime.  Ethtool consistently reports
0 values for our ITR settings because we never actually use them, so
fix this by setting the default values to the specified default values.

Greg avoids a compile error by wrapping the call to i40e_alloc_vfs() in
CONFIG_PCI_IOV because the function itself is wrapped in the same
conditional compile block.

Alexander Gordeev updates the driver to use the new pci_enable_msi_range()
and pci_enable_msix_range() or pci_enable_msi_exact() and
pci_enable_msix_exact().

Jean Sacren provides a fix where the wrong error code was being passed to
i40e_open().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

758bd61a

tcp: make cwnd-limited checks measurement-based, and gentler · ca8a2263

Neal Cardwell authored May 22, 2014

Experience with the recent e114a710 ("tcp: fix cwnd limited
checking to improve congestion control") has shown that there are
common cases where that commit can cause cwnd to be much larger than
necessary. This leads to TSO autosizing cooking skbs that are too
large, among other things.

The main problems seemed to be:

(1) That commit attempted to predict the future behavior of the
connection by looking at the write queue (if TSO or TSQ limit
sending). That prediction sometimes overestimated future outstanding
packets.

(2) That commit always allowed cwnd to grow to twice the number of
outstanding packets (even in congestion avoidance, where this is not
needed).

This commit improves both of these, by:

(1) Switching to a measurement-based approach where we explicitly
track the largest number of packets in flight during the past window
("max_packets_out"), and remember whether we were cwnd-limited at the
moment we finished sending that flight.

(2) Only allowing cwnd to grow to twice the number of outstanding
packets ("max_packets_out") in slow start. In congestion avoidance
mode we now only allow cwnd to grow if it was fully utilized.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ca8a2263

21 May, 2014 19 commits

wimax/i2400m: make return of 0 explicit · aff4b974

Julia Lawall authored May 20, 2014

Delete unnecessary local variable whose value is always 0 and that hides
the fact that the result is always 0.

A simplified version of the semantic patch that fixes this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@r exists@
local idexpression ret;
expression e;
position p;
@@

-ret = 0;
... when != ret = e
return
- ret
+ 0
  ;
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

aff4b974

net: filter: cleanup invocation of internal BPF · 5fe821a9

Alexei Starovoitov authored May 19, 2014

Kernel API for classic BPF socket filters is:

sk_unattached_filter_create() - validate classic BPF, convert, JIT
SK_RUN_FILTER() - run it
sk_unattached_filter_destroy() - destroy socket filter

Cleanup internal BPF kernel API as following:

sk_filter_select_runtime() - final step of internal BPF creation.
  Try to JIT internal BPF program, if JIT is not available select interpreter
SK_RUN_FILTER() - run it
sk_filter_free() - free internal BPF program

Disallow direct calls to BPF interpreter. Execution of the BPF program should
be done with SK_RUN_FILTER() macro.

Example of internal BPF create, run, destroy:

  struct sk_filter *fp;

  fp = kzalloc(sk_filter_size(prog_len), GFP_KERNEL);
  memcpy(fp->insni, prog, prog_len * sizeof(fp->insni[0]));
  fp->len = prog_len;

  sk_filter_select_runtime(fp);

  SK_RUN_FILTER(fp, ctx);

  sk_filter_free(fp);

Sockets, seccomp, testsuite, tracing are using different ways to populate
sk_filter, so first steps of program creation are not common.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5fe821a9

Merge branch 'enic-next' · 21ea04fa

David S. Miller authored May 21, 2014

Govindarajulu Varadarajan says:

====================
enic: Add adaptive coalescing interrupt support

This series add support for adaptive coalescing interrupt and updates
enic Maintainers.

v1->v2:
	* Add commit log
	* do vnic_intr_coalescing_timer_set only while enabling intr
	* use ktime_get instead of hrtimer
	* make enic_set_rx_coal_setting return type void
	* change func name enic_apply_int_moderation to enic_calc_int_moderation
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

21ea04fa

MAINTAINERS: Update enic maintainers · c327e8f4

Govindarajulu Varadarajan authored May 20, 2014

Cc: Sujith Sankar <ssujith@cisco.com>
Cc: Christian Benvenuti <benve@cisco.com>
Cc: Neel Patel <neepatel@cisco.com>
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c327e8f4

enic: Add support for adaptive interrupt coalescing · 7c2ce6e6

Sujith Sankar authored May 20, 2014

This patch adds support for adaptive interrupt coalescing.

For small pkts with low pkt rate, we can decrease the coalescing interrupt
dynamically which decreases the latency. This however increases the cpu
utilization. Based on testing with different coal intr and pkt rate we came up
with a table(mod_table) with rx_rate and coalescing interrupt value where we
get low latency without significant increase in cpu. mod_table table stores
the coalescing timer percentage value for different throughputs.

Function enic_calc_int_moderation() calculates the desired coalescing intr timer
value. This function is called in driver rx napi_poll. The actual value is set
by enic_set_int_moderation() which is called when napi_poll is complete. i.e
when we unmask the rx intr.

Adaptive coal intr is support only when driver is using msix intr. Because
intr is not shared.

Struct mod_range is used to store only the default adaptive coalescing intr
value.

Adaptive coal intr calue is calculated by

timer = range_start + ((rx_coal->range_end - range_start) *
		       mod_table[index].range_percent / 100);

rx_coal->range_end is the rx-usecs-high value set using ethtool.
range_start is rx-usecs-low, set using ethtool, if rx_small_pkt_bytes_cnt is
greater than 2 * rx_large_pkt_bytes_cnt. i.e small pkts are dominant. Else its
rx-usecs-low + 3.

Cc: Christian Benvenuti <benve@cisco.com>
Cc: Neel Patel <neepatel@cisco.com>
Signed-off-by: Sujith Sankar <ssujith@cisco.com>
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7c2ce6e6

vxge: Use time_before() · f6e92d10

Manuel Schölling authored May 19, 2014

To be future-proof and for better readability the time comparisons are modified
to use time_before() instead of plain, error-prone math.
Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

f6e92d10

ieee802154: Introduce the use of the managed version of kzalloc · 12b5c38f

Himangi Saraogi authored May 19, 2014

This patch moves data allocated using kzalloc to managed data allocated
using devm_kzalloc and cleans now unnecessary kfrees in probe and remove
functions. An explicit linux/device.h include is added to make sure
the devm_*() routine declarations are unambiguously available.

The following Coccinelle semantic patch was used for making the change:

@platform@
identifier p, probefn, removefn;
@@
struct platform_driver p = {
  .probe = probefn,
  .remove = removefn,
};

@prb@
identifier platform.probefn, pdev;
expression e, e1, e2;
@@
probefn(struct platform_device *pdev, ...) {
  <+...
- e = kzalloc(e1, e2)
+ e = devm_kzalloc(&pdev->dev, e1, e2)
  ...
?-kfree(e);
  ...+>
}

@rem depends on prb@
identifier platform.removefn;
expression e;
@@
removefn(...) {
  <...
- kfree(e);
  ...>
}
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

12b5c38f

atm: idt77252: Remove redundant error check · 7e910357

Peter Senna Tschudin authored May 19, 2014

Remove double checks, convert printk to pr_warn, and move the call to
pr_warn to the first check. The simplified version of the coccinelle
semantic patch that find this issue is as follows:

// <smpl>
@@
expression E; identifier pr; expression list es;
@@
while(...){
...
-       if (E) break;
+       if (E){
+               pr(es);
+               break;
+       }
...
}
- if(E) pr(es);
// </smpl>

Tested by compilation only.
Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7e910357

ipv6: slight optimization in ip6_dst_gc · 14956643

Li RongQing authored May 19, 2014

entries is always greater than rt_max_size here, since if entries is less
than rt_max_size, the fib6_run_gc function will be skipped
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

14956643

net-tun: restructure tun_do_read for better sleep/wakeup efficiency · 9e641bdc

Xi Wang authored May 16, 2014

tun_do_read always adds current thread to wait queue, even if a packet
is ready to read. This is inefficient because both sleeper and waker
want to acquire the wait queue spin lock when packet rate is high.

We restructure the read function and use common kernel networking
routines to handle receive, sleep and wakeup. With the change
available packets are checked first before the reading thread is added
to the wait queue.

Ran performance tests with the following configuration:

 - my packet generator -> tap1 -> br0 -> tap0 -> my packet consumer
 - sender pinned to one core and receiver pinned to another core
 - sender send small UDP packets (64 bytes total) as fast as it can
 - sandy bridge cores
 - throughput are receiver side goodput numbers

The results are

baseline: 731k pkts/sec, cpu utilization at 1.50 cpus
 changed: 783k pkts/sec, cpu utilization at 1.53 cpus

The performance difference is largely determined by packet rate and
inter-cpu communication cost. For example, if the sender and
receiver are pinned to different cpu sockets, the results are

baseline: 558k pkts/sec, cpu utilization at 1.71 cpus
 changed: 690k pkts/sec, cpu utilization at 1.67 cpus
Co-authored-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Xi Wang <xii@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9e641bdc

net: tunnels - enable module autoloading · f98f89a0

Tom Gundersen authored May 15, 2014

Enable the module alias hookup to allow tunnel modules to be autoloaded on demand.

This is in line with how most other netdev kinds work, and will allow userspace
to create tunnels without having CAP_SYS_MODULE.
Signed-off-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

f98f89a0

i40e: fix passing wrong error code to i40e_open() · ce9ccb17

Jean Sacren authored May 01, 2014

The commit 6c167f58 ("i40e: Refactor and cleanup i40e_open(),
adding i40e_vsi_open()") introduced a new function i40e_vsi_open()
with the regression by a typo. Due to the commit, the wrong error
code would be passed to i40e_open(). Fix this error in
i40e_vsi_open() by turning the macro into a negative value so that
i40e_open() could return the pertinent error code correctly.

Fixes: 6c167f58 ("i40e: Refactor and cleanup i40e_open(), adding i40e_vsi_open()")
Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ce9ccb17

i40evf: Use pci_enable_msix_range() instead of pci_enable_msix() · fc2f2f5d

Alexander Gordeev authored Apr 28, 2014

As result of deprecation of MSI-X/MSI enablement functions
pci_enable_msix() and pci_enable_msi_block() all drivers
using these two interfaces need to be updated to use the
new pci_enable_msi_range()  or pci_enable_msi_exact()
and pci_enable_msix_range() or pci_enable_msix_exact()
interfaces.

Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

fc2f2f5d

i40e: Check PCI_IOV config to avoid compile error · df805f62

Greg Rose authored Apr 04, 2014

The call to i40e_alloc_vfs needs to be wrapped in CONFIG_PCI_IOV because
the function itself is wrapped in the same conditional compile block.

Change-ID: I663c5f1b85e5cfba0b36da8966f7db1a034f408b
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

df805f62

i40e: remove Tx work for ptp · c0c8a202

Jesse Brandeburg authored Apr 04, 2014

The previous removal of the PTP Tx work functionality was
incomplete as noted by Jake Keller. This removal allows
us to rely on the Tx timesync interrupt.

CC: Jacob Keller <jacob.e.keller@intel.com>
Change-ID: Id4faaf275a3688053ebbf07bef08072f9fd11aa9
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

c0c8a202

i40e: Don't disable SR-IOV when VFs are assigned · 9e5634df

Mitch Williams authored Apr 04, 2014

When VFs are assigned to active VMs and we disable SR-IOV out from under them,
bad things happen. Currently, the VM does not crash, but the VFs lose all
resources and have no way to get them back.

Add an additional check for when the user is disabling through sysfs, and add a
comment to clarify why we check twice.

Change-ID: Icad78eef516e4e1e4a87874d59132bc3baa058d4
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

9e5634df

i40e: remove hardcode of stats struct size in ethtool · 31cd840e

Shannon Nelson authored Apr 04, 2014

Base the queue stats length on the queue stats struct rather than
assuming it is 2 fields.  This is in prep for adding data fields
which are configuration dependent.

Change-ID: I937f471f389d2e0f8cec733960c5d9a06b14f3ec
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

31cd840e

i40e/i40evf: control auto ITR through ethtool · 32f5f54a

Mitch Williams authored Apr 04, 2014

For all of our supported kernels, ethtool allows us to directly control
adaptive ITR instead of just faking it with an ITR value. Support this
capability so that user knows explicitly when ITR is being controlled
dynamically. Suggested by Ben Hutchings.

CC: Ben Hutchings <ben@decadent.org.uk>
Change-ID: Iae6b79c5db767a63d22ecd9a9c24acaff02a096e
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

32f5f54a

i40e/i40evf: set proper default for ITR registers · ca99eb99

Mitch Williams authored Apr 04, 2014

Ethtool consistently reports 0 values for our ITR settings because
we never actually set them. Fix this by setting the default values
to the specified default values.

Change-ID: I2832406a66f7140f2b1230945d6ff6cbf77467c8
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ca99eb99