- 22 May, 2014 29 commits
-
-
David S. Miller authored
Florian Fainelli says: ==================== net: of_phy_connect_fixed_link removal This patch set removes of_phy_connect_fixed_link() from the tree now that we have a better solution for dealing with fixed PHY (emulated PHY) devices for drivers that require them. First two patches update the 'fixed-link' Device Tree binding and drivers to refere to it. Patches 3 to 7 update the in-tree network drivers that use of_phy_connect_fixed_link() Patch 8 removes of_phy_connect_fixed_link Patch 9 removes the PowerPC code that parsed the 'fixed-link' property. Patch 9 can be merged via the net-next tree if the PowerPC folks ack it, but it really has to be merged after the first 8 patches in order to avoid breakage. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Fainelli authored
Parsing and registration of fixed PHY devices was needed with the use of of_phy_connect_fixed_link() because this function was using the designated PHY address identifier (first cell of the property) as the address to bind the PHY on the emulated bus. Since commit 3be2a49e ("of: provide a binding for fixed link PHYs") a new pair of functions has been introduced which allows for dynamic address allocation of these fixed PHY devices, but also parses the old 'fixed-link' 5-digit property. Registration of fixed PHY early in platform code was needed because we could not issue a fixed MDIO bus re-scan within network drivers. The fixed PHYs had to be registered before the network drivers would call of_phy_connect_fixed_link(). All of these caveats are solved now, such that we can safely remove of_add_fixed_phys() now. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Fainelli authored
All in-tree drivers have been converted to use the new pair of functions: of_is_fixed_phy_link() plus of_phy_register_fixed_link(), we can now safely remove of_phy_connect_fixed_link. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Fainelli authored
of_phy_connect_fixed_link() is becoming obsolete, and also required platform code to register the fixed PHYs at the specified addresses for those to be usable. Get rid of it and use the new of_phy_is_fixed_link() plus of_phy_register_fixed_link() helpers to transition over the new scheme. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Fainelli authored
of_phy_connect_fixed_link() is becoming obsolete, and also required platform code to register the fixed PHYs at the specified addresses for those to be usable. Get rid of it and use the new of_phy_is_fixed_link() plus of_phy_register_fixed_link() helpers to transition over the new scheme. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Fainelli authored
of_phy_connect_fixed_link() is becoming obsolete, and also required platform code to register the fixed PHYs at the specified addresses for those to be usable. Get rid of it and use the new of_phy_is_fixed_link() plus of_phy_register_fixed_link() helpers to transition over the new scheme. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Fainelli authored
of_phy_connect_fixed_link() is becoming obsolete, and also required platform code to register the fixed PHYs at the specified addresses for those to be usable. Get rid of it and use the new of_phy_is_fixed_link() plus of_phy_register_fixed_link() helpers to transition over the new scheme. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Fainelli authored
of_phy_connect_fixed_link() is becoming obsolete, and also required platform code to register the fixed PHYs at the specified addresses for those to be usable. Get rid of it and use the new of_phy_is_fixed_link() plus of_phy_register_fixed_link() helpers to transition over the new scheme. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Fainelli authored
Update the Freescale TSEC PHY, Broadcom GENET & SYSTEMPORT Device Tree binding documentation to refer to the fixed-link Device Tree binding in fixed-link.txt. Reviewed-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Fainelli authored
Update the fixed-link Device Tree binding documentation to contain information about the old and deprecated 5-digit 'fixed-link' property. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Michal Kubecek says: ==================== net: device features handling fixes > I think we need to think more about what exact behavior is desired in > mixed csum feature set cases. Probably whatever csum offload type is > the most prevelant should be the one advertised by the master. It > means we will do that software checksum fixup for the oddball slaves, > but it seems the best we can do. I've been thinking about it some more and I believe what I proposed is correct: features that are or-ed (ONE_FOR_ALL) should start with 0, those which are and-ed (ALL_FOR_ALL) with 1. But once we do that, we have to fix another problem in vlan module: features of a vlan device are mostly computed as bitwise AND of features and vlan_features of its underlying device. The problem is checksumming features don't really behave like independent flags as their main purpose is to be used in can_checksum_protocol() whose logic is that HW_CSUM (GEN_CSUM) means "can checksum everything" so that it kind of contains IP(V6)_CSUM functionality. Therefore intersection of HW_CSUM and IP(V6)_CSUM should result in the latter. An alternative approach would be to always set IP(V6)_CSUM once HW_CSUM (any of GEN_CSUM) is set but as for long time people were taught not to do that and we even have a warning for it, switching to the exact opposite could cause a lot of problems. > Also, like Jay, I agree that whatever we do in bonding needs to be > done in team too and I'd like the bug fixed at the same time in both > places. Agreed. Added a similar patch for teaming driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michal Kubeček authored
__team_compute_features() uses netdev_increment_features() to combine vlan_features of slaves into vlan_features of the team. As netdev_increment_features() only adds most features and we start with TEAM_VLAN_FEATURES, we can end up with features none of the slaves provided. Initialize vlan_features only with the flags which are both in TEAM_VLAN_FEATURES and NETIF_F_ALL_FOR_ALL. Right now there is no such feature so that we actually initialize vlan_features with zero but stating it explicitely will make the code more future proof. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michal Kubeček authored
bond_compute_features() uses netdev_increment_features() to combine vlan_features of slaves into vlan_features of the bond. As netdev_increment_features() only adds most features and we start with BOND_VLAN_FEATURES, we can end up with features none of the slaves provided. If there is at least one slave, initialize vlan_features only with the flags in NETIF_F_ALL_FOR_ALL. Right now there is none in BOND_VLAN_FEATURES but stating it explicitely will make the code more future proof. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michal Kubeček authored
When combining real_dev's features and vlan_features, simple bitwise AND is used. This doesn't work well for checksum offloading features as if one set has NETIF_F_HW_CSUM and the other NETIF_F_IP_CSUM and/or NETIF_F_IPV6_CSUM, we end up with no checksum offloading. However, from the logical point of view (how can_checksum_protocol() works), NETIF_F_HW_CSUM contains the functionality of NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM so that the result should be IP/IPV6. Add helper function netdev_intersect_features() implementing this logic and use it in vlan_dev_fix_features(). Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nimrod Andy authored
Since imx serials FEC/ENET MDIO clock source is internal ipg clock, and "ahb" clock is defined as FEC/ENET bus clock, so the patch just correct the fec driver MDIO clock source. Signed-off-by: Fugang Duan <B38611@freescale.com> Acked-by: Frank Li <frank.li@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nimrod Andy authored
Add below clock management to save fec power: - After probe, disable all clocks incluing ipg, ahb, enet_out, ptp clock. - Open ethx interface enable necessary clocks. Close ethx interface disable all clocks. The patch also encapsulates the all enet clocks enable/disable to .fec_enet_clk_enable(), which can reduce the repetitional code in driver. Signed-off-by: Fugang Duan <B38611@freescale.com> Acked-by: Frank Li <Frank.li@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Ezequiel Garcia says: ==================== net: Introduce a software TSO helper API Here's a first proposal for a generic software TSO helper API, following David's suggestion. There are at least two drivers that currently implement some form of software TSO: Solarflare network driver (drivers/net/ethernet/sfc) and Tilera GX network driver (drivers/net/ethernet/tile/tilegx.c). The rationale behind adding a generic API is to provide a boiler plate with the segmentation and other common tasks, making this support easier to add in other drivers. When designing the API, I've considered mainly two design choices: 1. Implement a series of callbacks that each driver would implement and the net core code would call to fill in descriptors and egress that data. 2. Implement an API for drivers to use in a driver's specific tx_tso function. This functions would exhaust a sk_buff payload, and use the API as helper for building the headers and segmented data. I've chosen (2), to avoid function pointers (which was Willy's concern) and because it seemed less fragile. Of course, this is argueable. The API is by no means complete, and lacks some features, however it allows to support TSO in mv643xx_eth and mvneta network drivers with some very good performance results. I've added this support as an example of the API in action. In particular the following needs some revisiting: 1. IPv6 support is lacking. 2. The required descriptor counting needs some verification. The current implementation might be too "sketchy". The tilegx one can be a good starting point. 3. The implemenation assumes the hardware can compute the TCP and IP checksums for the built headers. However, some controllers may need some initial calculation (such as tilegx, for instance). Despite this, I hope this proposal is good enough to trigger some discussion and to check if I'm on the right track. Feedback is much appreciated! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ezequiel Garcia authored
Now that the TSO helper API has been introduced, this commit makes use of it to add support for software TSO in this driver. This feature allows to improve outbound throughput performance significantly. Running iperf tests shows a 30% improvement, tested on a Kirkwood Openblocks A6 board. $ ethtool -K eth0 tso off $ iperf -c 192.168.0.45 -t 3 ------------------------------------------------------------ Client connecting to 192.168.0.45, TCP port 5001 TCP window size: 43.8 KByte (default) ------------------------------------------------------------ [ 3] local 192.168.0.159 port 46389 connected with 192.168.0.45 port 5001 [ ID] Interval Transfer Bandwidth [ 3] 0.0- 3.0 sec 217 MBytes 607 Mbits/sec $ ethtool -K eth0 tso on $ iperf -c 192.168.0.45 -t 3 ------------------------------------------------------------ Client connecting to 192.168.0.45, TCP port 5001 TCP window size: 43.8 KByte (default) ------------------------------------------------------------ [ 3] local 192.168.0.159 port 46390 connected with 192.168.0.45 port 5001 [ ID] Interval Transfer Bandwidth [ 3] 0.0- 3.0 sec 336 MBytes 938 Mbits/sec This commit is just an example of the usage of the TSO API, it works fine but needs some more work. In particular, the descriptor unmapping path must avoid unmapping the TSO headers. Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ezequiel Garcia authored
Using dma_map_single() instead of skb_frag_dma_map() allows to unmap all the descriptors using dma_unmap_single(). This change allows to introduce software TSO in a less intrusive way. Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ezequiel Garcia authored
In order to ease the addition of new features, let's factorize the feature list. Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ezequiel Garcia authored
As specified in the datasheet, the driver can set the "L4Chk_Mode" flag (bit 10) in the Tx descriptor command/status to specify that a frame is not IP fragmented and that the controller is in charge of generating the TCP/IP checksum. This must be used together with the "GL4chk" flag (bit 17). These two flags allow to avoid setting the initial TCP checksum in the l4i_chk field of the Tx descriptor, which is needed to support software TSO. Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ezequiel Garcia authored
Make the code more readable by moving the initial checksum setup and the command/status preparation to its own function. Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ezequiel Garcia authored
Now that the TSO helper API has been introduced, this commit makes use of it to implement the TSO in this driver. Using iperf to test and vmstat to check the CPU usage, shows a substantial CPU usage drop when TSO is on (~15% vs. ~25%). HTTP-based tests performed by Willy Tarreau have shown performance improvements. Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ezequiel Garcia authored
Rework mvneta_tx() so that the code that performs the final handling before a sk_buff is transmitted is done only if the numbers of fragments processed if positive. This is preparation work to add the support for software TSO. Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ezequiel Garcia authored
In order to ease the addition of new features, let's factorize the feature list. Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ezequiel Garcia authored
Although the implementation probably needs a lot of work, this initial API allows to implement software TSO in mvneta and mv643xx_eth drivers in a not so intrusive way. Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftablesDavid S. Miller authored
Pablo Neira Ayuso says: ==================== Netfilter/nftables updates for net-next The following patchset contains Netfilter/nftables updates for net-next, most relevantly they are: 1) Add set element update notification via netlink, from Arturo Borrero. 2) Put all object updates in one single message batch that is sent to kernel-space. Before this patch only rules where included in the batch. This series also introduces the generic transaction infrastructure so updates to all objects (tables, chains, rules and sets) are applied in an all-or-nothing fashion, these series from me. 3) Defer release of objects via call_rcu to reduce the time required to commit changes. The assumption is that all objects are destroyed in reverse order to ensure that dependencies betweem them are fulfilled (ie. rules and sets are destroyed first, then chains, and finally tables). 4) Allow to match by bridge port name, from Tomasz Bursztyka. This series include two patches to prepare this new feature. 5) Implement the proper set selection based on the characteristics of the data. The new infrastructure also allows you to specify your preferences in terms of memory and computational complexity so the underlying set type is also selected according to your needs, from Patrick McHardy. 6) Several cleanup patches for nft expressions, including one minor possible compilation breakage due to missing mark support, also from Patrick. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-nextDavid S. Miller authored
Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates This series contains updates to i40e and i40evf. Shannon makes minor changes to the AdminQ interface to bring it up to date. Removes the hard coding of stats struct size in ethtool, in prep for adding data fields which are configuration dependent. Catherine removes some unused and unneeded PCI bus defines. Jesse fixes the copyright headers and finishes up the removal of the PTP Tx work functionality which allows us to rely on the Tx timesync interrupt. Mitch provides a number of fixes and cleanups for i40e/i40evf based on suggestions from Ben Hutchings. First is to use a macro parameter for ethtool stats instead of just assuming that a valid netdev variable exists. Second is not to tell ethtool that the VF can do 10GbaseT, when it really has no idea what its link speed is, so set the supported value to 0 instead. Make the ethtool_ops structure constant since it is extremely unlikely to change at runtime. Ethtool consistently reports 0 values for our ITR settings because we never actually use them, so fix this by setting the default values to the specified default values. Greg avoids a compile error by wrapping the call to i40e_alloc_vfs() in CONFIG_PCI_IOV because the function itself is wrapped in the same conditional compile block. Alexander Gordeev updates the driver to use the new pci_enable_msi_range() and pci_enable_msix_range() or pci_enable_msi_exact() and pci_enable_msix_exact(). Jean Sacren provides a fix where the wrong error code was being passed to i40e_open(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Neal Cardwell authored
Experience with the recent e114a710 ("tcp: fix cwnd limited checking to improve congestion control") has shown that there are common cases where that commit can cause cwnd to be much larger than necessary. This leads to TSO autosizing cooking skbs that are too large, among other things. The main problems seemed to be: (1) That commit attempted to predict the future behavior of the connection by looking at the write queue (if TSO or TSQ limit sending). That prediction sometimes overestimated future outstanding packets. (2) That commit always allowed cwnd to grow to twice the number of outstanding packets (even in congestion avoidance, where this is not needed). This commit improves both of these, by: (1) Switching to a measurement-based approach where we explicitly track the largest number of packets in flight during the past window ("max_packets_out"), and remember whether we were cwnd-limited at the moment we finished sending that flight. (2) Only allowing cwnd to grow to twice the number of outstanding packets ("max_packets_out") in slow start. In congestion avoidance mode we now only allow cwnd to grow if it was fully utilized. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 21 May, 2014 11 commits
-
-
Julia Lawall authored
Delete unnecessary local variable whose value is always 0 and that hides the fact that the result is always 0. A simplified version of the semantic patch that fixes this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r exists@ local idexpression ret; expression e; position p; @@ -ret = 0; ... when != ret = e return - ret + 0 ; // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexei Starovoitov authored
Kernel API for classic BPF socket filters is: sk_unattached_filter_create() - validate classic BPF, convert, JIT SK_RUN_FILTER() - run it sk_unattached_filter_destroy() - destroy socket filter Cleanup internal BPF kernel API as following: sk_filter_select_runtime() - final step of internal BPF creation. Try to JIT internal BPF program, if JIT is not available select interpreter SK_RUN_FILTER() - run it sk_filter_free() - free internal BPF program Disallow direct calls to BPF interpreter. Execution of the BPF program should be done with SK_RUN_FILTER() macro. Example of internal BPF create, run, destroy: struct sk_filter *fp; fp = kzalloc(sk_filter_size(prog_len), GFP_KERNEL); memcpy(fp->insni, prog, prog_len * sizeof(fp->insni[0])); fp->len = prog_len; sk_filter_select_runtime(fp); SK_RUN_FILTER(fp, ctx); sk_filter_free(fp); Sockets, seccomp, testsuite, tracing are using different ways to populate sk_filter, so first steps of program creation are not common. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Govindarajulu Varadarajan says: ==================== enic: Add adaptive coalescing interrupt support This series add support for adaptive coalescing interrupt and updates enic Maintainers. v1->v2: * Add commit log * do vnic_intr_coalescing_timer_set only while enabling intr * use ktime_get instead of hrtimer * make enic_set_rx_coal_setting return type void * change func name enic_apply_int_moderation to enic_calc_int_moderation ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Govindarajulu Varadarajan authored
Cc: Sujith Sankar <ssujith@cisco.com> Cc: Christian Benvenuti <benve@cisco.com> Cc: Neel Patel <neepatel@cisco.com> Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sujith Sankar authored
This patch adds support for adaptive interrupt coalescing. For small pkts with low pkt rate, we can decrease the coalescing interrupt dynamically which decreases the latency. This however increases the cpu utilization. Based on testing with different coal intr and pkt rate we came up with a table(mod_table) with rx_rate and coalescing interrupt value where we get low latency without significant increase in cpu. mod_table table stores the coalescing timer percentage value for different throughputs. Function enic_calc_int_moderation() calculates the desired coalescing intr timer value. This function is called in driver rx napi_poll. The actual value is set by enic_set_int_moderation() which is called when napi_poll is complete. i.e when we unmask the rx intr. Adaptive coal intr is support only when driver is using msix intr. Because intr is not shared. Struct mod_range is used to store only the default adaptive coalescing intr value. Adaptive coal intr calue is calculated by timer = range_start + ((rx_coal->range_end - range_start) * mod_table[index].range_percent / 100); rx_coal->range_end is the rx-usecs-high value set using ethtool. range_start is rx-usecs-low, set using ethtool, if rx_small_pkt_bytes_cnt is greater than 2 * rx_large_pkt_bytes_cnt. i.e small pkts are dominant. Else its rx-usecs-low + 3. Cc: Christian Benvenuti <benve@cisco.com> Cc: Neel Patel <neepatel@cisco.com> Signed-off-by: Sujith Sankar <ssujith@cisco.com> Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Manuel Schölling authored
To be future-proof and for better readability the time comparisons are modified to use time_before() instead of plain, error-prone math. Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Himangi Saraogi authored
This patch moves data allocated using kzalloc to managed data allocated using devm_kzalloc and cleans now unnecessary kfrees in probe and remove functions. An explicit linux/device.h include is added to make sure the devm_*() routine declarations are unambiguously available. The following Coccinelle semantic patch was used for making the change: @platform@ identifier p, probefn, removefn; @@ struct platform_driver p = { .probe = probefn, .remove = removefn, }; @prb@ identifier platform.probefn, pdev; expression e, e1, e2; @@ probefn(struct platform_device *pdev, ...) { <+... - e = kzalloc(e1, e2) + e = devm_kzalloc(&pdev->dev, e1, e2) ... ?-kfree(e); ...+> } @rem depends on prb@ identifier platform.removefn; expression e; @@ removefn(...) { <... - kfree(e); ...> } Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Peter Senna Tschudin authored
Remove double checks, convert printk to pr_warn, and move the call to pr_warn to the first check. The simplified version of the coccinelle semantic patch that find this issue is as follows: // <smpl> @@ expression E; identifier pr; expression list es; @@ while(...){ ... - if (E) break; + if (E){ + pr(es); + break; + } ... } - if(E) pr(es); // </smpl> Tested by compilation only. Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Li RongQing authored
entries is always greater than rt_max_size here, since if entries is less than rt_max_size, the fib6_run_gc function will be skipped Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xi Wang authored
tun_do_read always adds current thread to wait queue, even if a packet is ready to read. This is inefficient because both sleeper and waker want to acquire the wait queue spin lock when packet rate is high. We restructure the read function and use common kernel networking routines to handle receive, sleep and wakeup. With the change available packets are checked first before the reading thread is added to the wait queue. Ran performance tests with the following configuration: - my packet generator -> tap1 -> br0 -> tap0 -> my packet consumer - sender pinned to one core and receiver pinned to another core - sender send small UDP packets (64 bytes total) as fast as it can - sandy bridge cores - throughput are receiver side goodput numbers The results are baseline: 731k pkts/sec, cpu utilization at 1.50 cpus changed: 783k pkts/sec, cpu utilization at 1.53 cpus The performance difference is largely determined by packet rate and inter-cpu communication cost. For example, if the sender and receiver are pinned to different cpu sockets, the results are baseline: 558k pkts/sec, cpu utilization at 1.71 cpus changed: 690k pkts/sec, cpu utilization at 1.67 cpus Co-authored-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Xi Wang <xii@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tom Gundersen authored
Enable the module alias hookup to allow tunnel modules to be autoloaded on demand. This is in line with how most other netdev kinds work, and will allow userspace to create tunnels without having CAP_SYS_MODULE. Signed-off-by: Tom Gundersen <teg@jklm.no> Signed-off-by: David S. Miller <davem@davemloft.net>
-