- 24 Sep, 2016 5 commits
-
-
Moshe Shemesh authored
Move the vf to VST 802.1ad mode (mlx4 VST QinQ mode) by setting vf vlan protocol to 802.1ad. VST 802.1ad mode in mlx4, is used for STAG strip/insertion by PF, while the CTAG is set by the VF. Read current vlan protocol as part of the vf configuration state. Upon setting vf vlan protocol to 802.1ad, we use a mechanism of handshake to verify that both the vf and the pf driver version support it. The handshake uses the command QUERY_FUNC_CAP: - The vf sets a pre-defined support bit in input modifier. - A pf that supports the feature sends the request to the vf through a pre-defined field in the output mailbox. - In case vf does not support the feature, the pf will fail the control command (in this case, IP link tool command to set the vf vlan protocol to 802.1ad). No change in VST 802.1Q mode. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Moshe Shemesh authored
Introduce new rtnl UAPI that exposes a list of vlans per VF, giving the ability for user-space application to specify it for the VF, as an option to support 802.1ad. We adjusted IP Link tool to support this option. For future use cases, the new UAPI supports multiple vlans. For now we limit the list size to a single vlan in kernel. Add IFLA_VF_VLAN_LIST in addition to IFLA_VF_VLAN to keep backward compatibility with older versions of IP Link tool. Add a vlan protocol parameter to the ndo_set_vf_vlan callback. We kept 802.1Q as the drivers' default vlan protocol. Suitable ip link tool command examples: Set vf vlan protocol 802.1ad: ip link set eth0 vf 1 vlan 100 proto 802.1ad Set vf to VST (802.1Q) mode: ip link set eth0 vf 1 vlan 100 proto 802.1Q Or by omitting the new parameter ip link set eth0 vf 1 vlan 100 Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Moshe Shemesh authored
In Ethernet VF, disable vlan HW acceleration on VF while it is set to VF vlan protocol 802.1ad mode. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Moshe Shemesh authored
Check device capability to support VF vlan protocol 802.1ad mode. Add vport attribute vlan protocol. Init vport vlan protocol by default to 802.1Q. Add update QP support for VF vlan protocol 802.1ad. Add func capability vlan_offload_disable to disable all vlan HW acceleration on VF while the VF is set to VF vlan protocol 802.1ad mode. No change in VF vlan protocol 802.1Q (VST) mode. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Moshe Shemesh authored
Separate QUERY_FUNC_CAP flags0 from QUERY_FUNC_CAP flags, as 'flags' is already used for another set of flags in FUNC CAP, while phv bit should be part of a different set of flags. Remove QUERY_FUNC_CAP port_flags field, as it is not in use. Fixes: 77fc29c4 ('net/mlx4_core: Preparations for 802.1ad VLAN support') Fixes: 5cc914f1 ('mlx4_core: Added FW commands and their wrappers for supporting SRIOV') Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 23 Sep, 2016 35 commits
-
-
David S. Miller authored
This reverts commit c0c64c15. There is no vif->ctrl_task member, so this change broke the build. Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Daniel Borkmann says: ==================== Few minor BPF helper improvements Just a few minor improvements around BPF helpers, first one is a fix but given this late stage and that it's not really a critical one, I think net-next is just fine. For details please see the individual patches. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
Add a small helper that complements 36bbef52 ("bpf: direct packet write and access for helpers for clsact progs") for invalidating the current skb->hash after mangling on headers via direct packet write. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
Same motivation as in commit 80b48c44 ("bpf: don't use raw processor id in generic helper"), but this time for XDP typed programs. Thus, allow for preemption checks when we have DEBUG_PREEMPT enabled, and otherwise use the raw variant. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
We need to use skb_to_full_sk() helper introduced in commit bd5eb35f ("xfrm: take care of request sockets") as otherwise we miss tcp synack messages, since ownership is on request socket and therefore it would miss the sk_fullsock() check. Use skb_to_full_sk() as also done similarly in the bpf_get_cgroup_classid() helper via 2309236c ("cls_cgroup: get sk_classid only from full sockets") fix to not let this fall through. Fixes: 4a482f34 ("cgroup: bpf: Add bpf_skb_in_cgroup_proto") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Stephen Hemminger says: ==================== hv_netvsc changes These are mostly about improving the handling of interaction between the virtual network device (netvsc) and the SR-IOV VF network device. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stephen Hemminger authored
Useful for debugging issues with multicast and SR-IOV to keep track of number of received multicast packets. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stephen Hemminger authored
Since VF reference is now protected by RCU, no longer need the VF usage counter and can use device flags to see whether to inject or not. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stephen Hemminger authored
The vf_netdev pointer in the netvsc device context can simply be protected by RCU because network device destruction is already RCU synchronized. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stephen Hemminger authored
The code to associate netvsc and VF devices can be made less error prone by using a better matching algorithms. On registration, use the permanent address which avoids any possible issues caused by device MAC address being changed. For all other callbacks, search by the netdevice pointer value to ensure getting the correct network device. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stephen Hemminger authored
The callback handler for netlink events can be simplified: * Consolidate check for netlink callback events about this driver itself. * Ignore non-Ethernet devices. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stephen Hemminger authored
The netvsc driver holds a pointer to the virtual function network device if managing SR-IOV association. In order to ensure that the VF network device does not disappear, it should be using dev_hold/dev_put to get a reference count. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stephen Hemminger authored
Packets that are transmitted in normal path should use consume_skb instead of kfree_skb. This allows for better tracing of packet drops. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Vivien Didelot says: ==================== net: dsa: add port fast ageing Today the DSA drivers are in charge of flushing the MAC addresses associated to a port when its STP state changes from Learning or Forwarding, to Disabled or Blocking or Listening. This makes the drivers more complex and hides this generic switch logic. This patchset introduces a new optional port_fast_age operation to dsa_switch_ops, to move this logic to the DSA layer and keep drivers simple. b53 and mv88e6xxx are updated accordingly. ==================== Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vivien Didelot authored
Now that the DSA layer handles port fast ageing on correct STP change, simplify _mv88e6xxx_port_state and implement mv88e6xxx_port_fast_age. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vivien Didelot authored
Remove the fast ageing logic from b53_br_set_stp_state and implement the new DSA switch port_fast_age operation instead. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vivien Didelot authored
Today the DSA drivers are in charge of flushing the MAC addresses associated to a port when its STP state changes from Learning or Forwarding, to Disabled or Blocking or Listening. This makes the drivers more complex and hides the generic switch logic. Introduce a new optional port_fast_age operation to dsa_switch_ops, to move this logic to the DSA layer and keep drivers simple. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vivien Didelot authored
Add a void helper to set the STP state of a port, checking first if the required routine is provided by the driver. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Iyappan Subramanian authored
Current driver programs static value of MSS in hardware register for TSO offload engine to segment the TCP payload regardless the MSS value provided by network stack. This patch fixes this by programming hardware registers with the stack provided MSS value. Since the hardware has the limitation of having only 4 MSS registers, this patch uses reference count of mss values being used. Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: Toan Le <toanle@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Colin Ian King authored
Change predecrement compare to post decrement compare to avoid an unsigned integer wrap-around comparison when decrementing idx in the while loop. For example, when idx is zero, the current situation will predecrement idx in the while loop, wrapping idx to the maximum signed integer and cause out of bounds reads on rxq_info->msix_tbl[idx]. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Saeed Mahameed says: ==================== Mellanox 100G SRIOV offloads vlan push/pop From Or Gerlitz: This series further enhances the SRIOV TC offloads of mlx5 to handle the TC vlan push and pop actions. This serves a common use-case in virtualization systems where the virtual switch add (push) vlan tags to packets sent from VMs and removes (pop) vlan tags before the packet is received by the VM. We use the new E-Switch switchdev mode and the TC vlan action to achieve that also in SW defined SRIOV environments by offloading TC rules that contain this action along with forwarding (TC mirred/redirect action) the packet. In the first patch we add some helpers to access the TC vlan action info by offloading drivers. The next five patches don't add any new functionality, they do some refactoring and cleanups in the current code to be used next. The seventh patch deals with supporting vlans by the mlx5 e-switch in switchdev mode. The eighth patch does the vlan action offload from TC and the last patch adds matching for vlans as typically required by TC flows that involve vlan pop action. The series was applied on top of commit 524605e5 "cxgb4: Convert to use simple_open()" ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Or Gerlitz authored
Enhance the parsing of offloaded TC rules matches to handle vlans. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Or Gerlitz authored
Parse TC vlan actions and set the required elements to allow offloading. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Or Gerlitz authored
Many virtualization systems use a policy under which a vlan tag is pushed to packets sent by guests, and popped before the packet is forwarded to the VM. The current generation of the mlx5 HW doesn't fully support that on a per flow level. As such, we are addressing the above common use case with the SRIOV e-Switch abilities to push vlan into packets sent by VFs and pop vlan from packets forwarded to VFs. The HW can match on the correct vlan being present in packets forwarded to VFs (eSwitch steering is done before stripping the tag), so this part is offloaded as is. A common practice for vlans is to avoid both push vlan and pop vlan for inter-host VM/VM (east-west) communication because in this case, push on egress cancels out with pop on ingress. For supporting that, we use a global eswitch vlan pop policy, hence allowing guest A to communicate with both remote VM B and local VM C. This works since the HW pops the vlan only if it exists (e.g for C --> A packets but not for B --> A packets). On the slow path, when a VF vport has an offloaded flow which involves pushing vlans, wheres another flow is not currently offloaded, the packets from the 2nd flow seen by the VF representor on the host have vlan. The VF rep driver removes such vlan before calling into the host networking stack. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Or Gerlitz authored
Factor the relevant code into a static inline helper (skb_from_cqe) doing that. Move the call to napi_gro_receive to be carried out just after mlx5e_complete_rx_cqe returns. Both changes are to be used for the VF representor as well in the next commit. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Or Gerlitz authored
Put the representors related to the source and dest vports and the action in struct mlx5_esw_flow_attr which is used while setting the FDB rule. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Or Gerlitz authored
The HW can be programmed to push vlan, pop vlan or both. A factorization step towards using the push/pop capabilties in the eswitch offloads mode. This patch doesn't add new functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Or Gerlitz authored
The structure we use for the eswitch vport representor (mlx5_eswitch_rep) has some fields which are set from upper layers in the driver when they register the rep. Use explicit setting on registration time for them and avoid global memcpy. This patch doesn't add new functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Or Gerlitz authored
Set the vport value in the PF entry to be that of the uplink so we can use it blindly over the tc / eswitch offload code without translating it each time we deal with the uplink representor. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Or Gerlitz authored
Needed e.g for offloading drivers to pick the relevant attributes. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
It looks like the following patch can make FQ very precise, even in VM or stressed hosts. It matters at high pacing rates. We take into account the difference between the time that was programmed when last packet was sent, and current time (a drift of tens of usecs is often observed) Add an EWMA of the unthrottle latency to help diagnostics. This latency is the difference between current time and oldest packet in delayed RB-tree. This accounts for the high resolution timer latency, but can be different under stress, as fq_check_throttled() can be opportunistically be called from a dequeue() called after an enqueue() for a different flow. Tested: // Start a 10Gbit flow $ netperf --google-pacing-rate 1250000000 -H lpaa24 -l 10000 -- -K bbr & Before patch : $ sar -n DEV 10 5 | grep eth0 | grep Average Average: eth0 17106.04 756876.84 1102.75 1119049.02 0.00 0.00 0.52 After patch : $ sar -n DEV 10 5 | grep eth0 | grep Average Average: eth0 17867.00 800245.90 1151.77 1183172.12 0.00 0.00 0.52 A new iproute2 tc can output the 'unthrottle latency' : $ tc -s qd sh dev eth0 | grep latency 0 gc, 0 highprio, 32490767 throttled, 2382 ns latency Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Bert Kenward authored
Add a NULL check before calling asynchronous MCDI completion functions during device removal. Fixes: 7014d7f6 ("sfc: allow asynchronous MCDI without completion function") Signed-off-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Marcelo Ricardo Leitner says: ==================== sctp: fix the handling of SACK Gap Ack blocks sctp_acked() is using 32bit arithmetics on 16bits vars, via TSN_lte() macros, which is weird and confusing. Once the offset to ctsn is calculated, all wrapping is already handled and thus to verify the Gap Ack blocks we can just use pure less/big-or-equal than checks. Also, rename gap variable to tsn_offset, so it's more meaningful, as it doesn't point to any gap at all. Even so, I don't think this discrepancy resulted in any practical bug. This patch is a preparation for the next one, which will introduce typecheck() for TSN_lte() macros and would cause a compile error here. Suggested-by: David Laight <David.Laight@ACULAB.COM> Reported-by: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Marcelo Ricardo Leitner authored
Make it similar to time_before() macros: - easier to understand - make use of typecheck() to avoid working on unexpected variable types (made the issue on previous patch visible) - for _[lg]te versions, slighly faster, as the compiler used to generate a sequence of cmp/je/cmp/js instructions and now it's sub/test/jle (for _lte): Before, for sctp_outq_sack: if (primary->cacc.changeover_active) { 1f01: 80 b9 84 02 00 00 00 cmpb $0x0,0x284(%rcx) 1f08: 74 6e je 1f78 <sctp_outq_sack+0xe8> u8 clear_cycling = 0; if (TSN_lte(primary->cacc.next_tsn_at_change, sack_ctsn)) { 1f0a: 8b 81 80 02 00 00 mov 0x280(%rcx),%eax return ((s) - (t)) & TSN_SIGN_BIT; } static inline int TSN_lte(__u32 s, __u32 t) { return ((s) == (t)) || (((s) - (t)) & TSN_SIGN_BIT); 1f10: 8b 7d bc mov -0x44(%rbp),%edi 1f13: 39 c7 cmp %eax,%edi 1f15: 74 25 je 1f3c <sctp_outq_sack+0xac> 1f17: 39 f8 cmp %edi,%eax 1f19: 78 21 js 1f3c <sctp_outq_sack+0xac> primary->cacc.changeover_active = 0; After: if (primary->cacc.changeover_active) { 1ee7: 80 b9 84 02 00 00 00 cmpb $0x0,0x284(%rcx) 1eee: 74 73 je 1f63 <sctp_outq_sack+0xf3> u8 clear_cycling = 0; if (TSN_lte(primary->cacc.next_tsn_at_change, sack_ctsn)) { 1ef0: 8b 81 80 02 00 00 mov 0x280(%rcx),%eax 1ef6: 2b 45 b4 sub -0x4c(%rbp),%eax 1ef9: 85 c0 test %eax,%eax 1efb: 7e 26 jle 1f23 <sctp_outq_sack+0xb3> primary->cacc.changeover_active = 0; *_lt() generated pretty much the same code. Tested with gcc (GCC) 6.1.1 20160621. This patch also removes SSN_lte as it is not used and cleanups some comments. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Marcelo Ricardo Leitner authored
sctp_acked() is using 32bit arithmetics on 16bits vars, via TSN_lte() macros, which is weird and confusing. Once the offset to ctsn is calculated, all wrapping is already handled and thus to verify the Gap Ack blocks we can just use pure less/big-or-equal than checks. Also, rename gap variable to tsn_offset, so it's more meaningful, as it doesn't point to any gap at all. Even so, I don't think this discrepancy resulted in any practical bug. This patch is a preparation for the next one, which will introduce typecheck() for TSN_lte() macros and would cause a compile error here. Suggested-by: David Laight <David.Laight@ACULAB.COM> Reported-by: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-