- 06 Oct, 2014 40 commits
-
-
David S. Miller authored
John Fastabend says: ==================== net sched rcu updates This fixes the use of tcf_proto from RCU callbacks it requires moving the unbind calls out of the callbacks and removing the tcf_proto argument from the tcf_em_tree_destroy(). This is a rework of two previous series and addresses comments from Cong. And should apply against latest net-next. The previous series links below for reference: (1/2) net: sched: do not use tcf_proto 'tp' argument from call_rcu http://patchwork.ozlabs.org/patch/396149/ (2/2) net: sched: replace ematch calls to use struct net http://patchwork.ozlabs.org/patch/396150/ net: sched: cls_cgroup tear down exts and ematch from rcu callback http://patchwork.ozlabs.org/patch/396307/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Fastabend authored
Using the tcf_proto pointer 'tp' from inside the classifiers callback is not valid because it may have been cleaned up by another call_rcu occuring on another CPU. 'tp' is currently being used by tcf_unbind_filter() in this patch we move instances of tcf_unbind_filter outside of the call_rcu() context. This is safe to do because any running schedulers will either read the valid class field or it will be zeroed. And all schedulers today when the class is 0 do a lookup using the same call used by the tcf_exts_bind(). So even if we have a running classifier hit the null class pointer it will do a lookup and get to the same result. This is particularly fragile at the moment because the only way to verify this is to audit the schedulers call sites. Reported-by: Cong Wang <xiyou.wangconf@gmail.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Fastabend authored
It is not RCU safe to destroy the action chain while there is a possibility of readers accessing it. Move this code into the rcu callback using the same rcu callback used in the code patch to make a change to head. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Fastabend authored
This removes the tcf_proto argument from the ematch code paths that only need it to reference the net namespace. This allows simplifying qdisc code paths especially when we need to tear down the ematch from an RCU callback. In this case we can not guarentee that the tcf_proto structure is still valid. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Some TSO engines might have a too heavy setup cost, that impacts performance on hosts sending small bursts (2 MSS per packet). This patch adds a device gso_min_segs, allowing drivers to set a minimum segment size for TSO packets, according to the NIC performance. Tested on a mlx4 NIC, this allows to get a ~110% increase of throughput when sending 2 MSS per packet. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
hayeswang authored
Restart autonegotiation is necessary after setting EEE. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
In case we find a general query with non-zero number of sources, we are dropping the skb as it's malformed. RFC3376, section 4.1.8. Number of Sources (N): This number is zero in a General Query or a Group-Specific Query, and non-zero in a Group-and-Source-Specific Query. Therefore, reflect that by using kfree_skb() instead of consume_skb(). Fixes: d679c532 ("igmp: avoid drop_monitor false positives") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Mahesh Bandewar authored
Earlier change to use usable slave array for TLB mode had an additional performance advantage. So extending the same logic to all other modes that use xmit-hash for slave selection (viz 802.3AD, and XOR modes). Also consolidating this with the earlier TLB change. The main idea is to build the usable slaves array in the control path and use that array for slave selection during xmit operation. Measured performance in a setup with a bond of 4x1G NICs with 200 instances of netperf for the modes involved (3ad, xor, tlb) cmd: netperf -t TCP_RR -H <TargetHost> -l 60 -s 5 Mode TPS-Before TPS-After 802.3ad : 468,694 493,101 TLB (lb=0): 392,583 392,965 XOR : 475,696 484,517 Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Mahesh Bandewar authored
It's a trivial fix to display xmit_hash_policy for this new TLB mode since it uses transmit-hash-poilicy as part of bonding-master info (/proc/net/bonding/<bonding-interface). Signed-off-by: Mahesh Bandewar <maheshb@google.com> Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Amir Vadai says: ==================== net/mlx4_en: Optimizations to TX flow This patchset contains optimizations to TX flow in mlx4_en driver. It also introduce setting/getting tx copybreak, to enable controlling inline threshold dynamically. TX flow optimizations was authored and posted to the mailing list by Eric Dumazet [1] as a single patch. I splitted this patch to smaller patches, Reviewed it and tested. Changed from original patch: - s/iowrite32be/iowrite32/, since ring->doorbell_qpn is stored as be32 The tx copybreak patch was also suggested by Eric Dumazet, and was edited and reviewed by me. User space patch will be sent after kernel code is ready. I am sending this patchset now since the merge window is near and don't want to miss it. More work need to do: - Disable BF when xmit_more is in use - Make TSO use xmit_more too. Maybe by splitting small TSO packets in the driver itself, to avoid extra cpu/memory costs of GSO before the driver - Fix mlx4_en_xmit buggy handling of queue full in the middle of a burst partially posted to send queue using xmit_more Eric, I edited the patches to have you as the Author and the first signed-off-by. I hope it is ok with you (I wasn't sure if it is ok to sign by you), anyway all the credit to those changes should go to you. Patchset was tested and applied over commit 1e203c1a "(net: sched: suspicious RCU usage in qdisc_watchdog") [1] - https://patchwork.ozlabs.org/patch/394256/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Instead of setting inline threshold using module parameter only on driver load, use set_tunable() to set it dynamically. No need to store the threshold per ring, using instead the netdev global priv->prof->inline_thold Initial value still is set using the module parameter, therefore backward compatability is kept. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Use new ethtool [sg]et_tunable() to set tx_copybread (inline threshold) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Reorganize code to call is_inline() once, so compiler can inline it Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Properly clear tx_info->ts_requested Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Access skb_headlen() once in tx flow Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Acces skb_shinfo(skb) once in tx flow. Also, rename @i variable to @i_frag to avoid confusion, as the "goto tx_drop_unmap;" relied on this @i variable. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
mlx4_en_process_tx_cq() carefully fetches and writes ring->last_nr_txbb and ring->cons only one time to avoid false sharing Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
mlx4_en_free_tx_desc() uses a prefetchw(&skb->users) to speed up consume_skb() prefetchw(&ring->tx_queue->dql) to speed up BQL update Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Add frag0_dma/frag0_byte_count into mlx4_en_tx_info to avoid a cache line miss in TX completion for frames having one dma element. (We avoid reading back the tx descriptor) Note this could be extended to 2/3 dma elements later, as we have free room in mlx4_en_tx_info Also, mlx4_en_free_tx_desc() no longer accesses skb_shinfo(). We use a new nr_maps fields in mlx4_en_tx_info to avoid 2 or 3 cache misses. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Try to allocate using kmalloc_node() first, only on failure use vmalloc() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
- doorbell_qpn is stored in the cpu_to_be32() way to avoid bswap() in fast path. - mdev->mr.key stored in ring->mr_key to also avoid bswap() and access to cold cache line. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Reorganize struct mlx4_en_tx_ring to have: - One cache line containing last_nr_txbb & cons & wake_queue, used by tx completion. - One cache line containing fields dirtied by mlx4_en_xmit() - Following part is read mostly and shared by cpus. Align struct mlx4_en_tx_info to a cache line Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
- Remove unused variable ring->poll_cnt - No need to set some fields if using blueflame - Add missing const's - Use unlikely - Remove unneeded new line - Make some comments more precise - struct mlx4_bf @offset field reduced to unsigned int to save space Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Standard qdisc API to setup a timer implies an atomic operation on every packet dequeue : qdisc_unthrottled() It turns out this is not really needed for FQ, as FQ has no concept of global qdisc throttling, being a qdisc handling many different flows, some of them can be throttled, while others are not. Fix is straightforward : add a 'bool throttle' to qdisc_watchdog_schedule_ns(), and remove calls to qdisc_unthrottled() in sch_fq. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Fabio Estevam authored
When fec_enet_alloc_buffers() fails we should better undo the previous actions, which consists of: disabling the FEC clocks and putting the FEC pins into inactive state. The error path for fec_enet_mii_probe() is kept unchanged. Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Chen Gang authored
MDIO_BCM_UNIMAC needs HAS_IOMEM, so depend on it, the related error ( with allmodconfig under um): MODPOST 1205 modules ERROR: "devm_ioremap" [drivers/net/phy/mdio-bcm-unimac.ko] undefined! Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Its unfortunate we have to walk again skb list to find the tail after segmentation, even if data is probably hot in cpu caches. skb_segment() can store the tail of the list into segs->prev, and validate_xmit_skb_list() can immediately get the tail. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Andy Zhou says: ==================== Add Geneve tunnel protocol support This patch series adds kernel support for Geneve (Generic Network Virtualization Encapsulation) based on Geneve IETF draft: http://www.ietf.org/id/draft-gross-geneve-01.txt Patch 1 implements Geneve tunneling protocol driver Patch 2-6 adds openvswitch support for creating and using Geneve tunnels by OVS user space. v1->v2: Style fixes: use tab instead space for Kconfig Patch 2-6 are reviewed by Pravin Shetty, add him to acked-by Patch 6 was reviewed by Thomas Graf when commiting to openvswitch.org, add him to acked-by. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jesse Gross authored
The Openvswitch implementation is completely agnostic to the options that are in use and can handle newly defined options without further work. It does this by simply matching on a byte array of options and allowing userspace to setup flows on this array. Signed-off-by: Jesse Gross <jesse@nicira.com> Singed-off-by: Ansis Atteka <aatteka@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Thomas Graf <tgraf@noironetworks.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jesse Gross authored
As the size of the flow key grows, it can put some pressure on the stack. This is particularly true in ovs_flow_cmd_set(), which needs several copies of the key on the stack. One of those uses is logically separate, so this factors it out to reduce stack pressure and improve readibility. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jesse Gross authored
Currently, the flow information that is matched for tunnels and the tunnel data passed around with packets is the same. However, as additional information is added this is not necessarily desirable, as in the case of pointers. This adds a new structure for tunnel metadata which currently contains only the existing struct. This change is purely internal to the kernel since the current OVS_KEY_ATTR_IPV4_TUNNEL is simply a compressed version of OVS_KEY_ATTR_TUNNEL that is translated at flow setup. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jesse Gross authored
Some tunnel formats have mechanisms for indicating that packets are OAM frames that should be handled specially (either as high priority or not forwarded beyond an endpoint). This provides support for allowing those types of packets to be matched. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jesse Gross authored
As new protocols are added, the size of the flow key tends to increase although few protocols care about all of the fields. In order to optimize this for hashing and matching, OVS uses a variable length portion of the key. However, when fields are extracted from the packet we must still zero out the entire key. This is no longer necessary now that OVS implements masking. Any fields (or holes in the structure) which are not part of a given protocol will be by definition not part of the mask and zeroed out during lookup. Furthermore, since masking already uses variable length keys this zeroing operation automatically benefits as well. In principle, the only thing that needs to be done at this point is remove the memset() at the beginning of flow. However, some fields assume that they are initialized to zero, which now must be done explicitly. In addition, in the event of an error we must also zero out corresponding fields to signal that there is no valid data present. These increase the total amount of code but very little of it is executed in non-error situations. Removing the memset() reduces the profile of ovs_flow_extract() from 0.64% to 0.56% when tested with large packets on a 10G link. Suggested-by: Pravin Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andy Zhou authored
This adds a device level support for Geneve -- Generic Network Virtualization Encapsulation. The protocol is documented at http://tools.ietf.org/html/draft-gross-geneve-01 Only protocol layer Geneve support is provided by this driver. Openvswitch can be used for configuring, set up and tear down functional Geneve tunnels. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Frank Li authored
reproduce: wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross git checkout 1b7bde6d make.cross ARCH=m68k m5275evb_defconfig make.cross ARCH=m68k All error/warnings: drivers/net/ethernet/freescale/fec_main.c: In function 'fec_enet_rx_queue': >> drivers/net/ethernet/freescale/fec_main.c:1470:3: error: implicit declaration of function 'prefetch' [-Werror=implicit-function-declaration] prefetch(skb->data - NET_IP_ALIGN); ^ cc1: some warnings being treated as errors missed included prefetch.h Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Frank Li <Frank.Li@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petri Gynther authored
bcmgenet_mii_setup() is called from the PHY state machine every 1-2 seconds when the PHYs are in PHY_POLL mode. Improve bcmgenet_mii_setup() so that it touches the MAC registers only when the link is up and there was a change to link, speed, duplex, or pause status. Signed-off-by: Petri Gynther <pgynther@google.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Walter Lozano says: ==================== Altera TSE with no PHY In some scenarios there is no PHY chip present, for example in optical links. This serie of patches moves PHY get addr and MDIO create to a new function and avoids PHY and MDIO probing in these cases. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Walter Lozano authored
This patch avoids PHY and MDIO probing if no PHY chip is present. This is the case mainly in optical links where there is no need for PHY chip, and therefore no need of MDIO. In this scenario Ethernet MAC is directly connected to an optical module through an external SFP transceiver. Signed-off-by: Walter Lozano <walter@vanguardiasur.com.ar> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Walter Lozano authored
Move PHY get addr and MDIO create to a new function to improve readability and make it easier to avoid its usage. This will be useful for example in the case where there is no PHY chip. Signed-off-by: Walter Lozano <walter@vanguardiasur.com.ar> Signed-off-by: David S. Miller <davem@davemloft.net>
-