- 31 Aug, 2015 36 commits
-
-
David S. Miller authored
Andrew Lunn says: ==================== DSA port configuration and status This patchset allows various switch port settings to be configured and port status to be sampled. Some of these patches have been posted before. The first three patches provide infrastructure for configuring a switch ports link speed and duplex from a fixed_link phy. Patch four then uses this infrastructure to allow the CPU and DSA ports of a switch to be configured using a fixed-link property in the device tree. Patches five and six allow a phy-mode property to be specified in the device tree, and allow this to be used for configuring RGMII delays. Patches seven through nine allow link status, for example that of an SFP module, to be read from a gpio. Changes since v1: Rewrite 9/9 so that it hopefully does not regression on 868a4215 ("net: phy: fixed_phy: handle link-down case") ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrew Lunn authored
What features a phy supports is masked in genphy_config_init() by looking at the PHYs BMSR register. If the link is down, fixed_phy_update_regs() will only set the auto- negotiation capable bit in BMSR. Thus genphy_config_init() comes to the conclusion the PHY can only perform 10/Half, and masks out the higher speed features. If however the link it up, BMSR is set to indicate the speed the PHY is capable of auto-negotiating, and genphy_config_init() does not mask out the high speed features. To fix this, when the link is down, have fixed_phy_update_regs() leave the link status, auto-negotiation complete, and link partner capabilities unset, but set all the local capabilities depending on the fixed phy speed. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrew Lunn authored
An SFP module may have a link up/down status pin which can be connection to a GPIO line of the host. Add support for reading such an GPIO in the fixed_phy driver. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrew Lunn authored
When polling for link status, don't consider ports which have a forced link. Such ports don't monitor their phy or may not even have a phy. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrew Lunn authored
Some Marvell switches allow the RGMII Rx and Tx clock to be delayed when the port is using RGMII. Have the adjust_link function look at the phy interface type and enable this delay as requested. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrew Lunn authored
It can be useful for DSA and CPU ports to have a phy-mode property, in particular to specify RGMII delays. Parse the property and set it in the fixed-link phydev. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrew Lunn authored
By default, DSA and CPU ports are configured to the maximum speed the switch supports. However there can be use cases where the peer devices port is slower. Allow a fixed-link property to be used with the DSA and CPU port in the device tree, and use this information to configure the port. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrew Lunn authored
Set the supported field of the phydev to indicate the speed features of the phy. If the phy is never attached to a netdev, but used in an adjust_link() function, the speed will be incorrectly evaluated to 10/half rather than the correct speed/duplex. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrew Lunn authored
The current code sets user ports to perform auto negotiation using the phy. CPU and DSA ports are configured to full duplex and maximum speed the switch supports. There are however use cases where the CPU has a slower port, and when user ports have SFP modules with fixed speed. In these cases, port settings to be read from a fixed_phy devices. The switch driver then needs to implement the adjust_link op, so the port settings can be set. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Fainelli authored
Some Ethernet MAC drivers using the PHY library require the hardcoding of link parameters when interfaced to a switch device, SFP module, switch to switch port, etc. This has typically lead to various ad-hoc implementations looking like this: - using a "fixed PHY" emulated device, which will provide link indication towards the Ethernet MAC driver and hardware - pretend there is no PHY and hardcode link parameters, ala mv643x_eth Based on that, it is desireable to have the PHY drivers advertise the correct link parameters, just like regular Ethernet PHYs towards their CPU Ethernet MAC drivers, however, Ethernet MAC drivers should be able to tell whether this link should be monitored or not. In the context of an Ethernet switch, SFP module, switch to switch link, we do not need to monitor this link since it should be always up. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
Fix a memory leak in the mpls netns init function in case of failure. If register_net_sysctl fails then we need to free the ctl_table. Fixes: 7720c01f ("mpls: Add a sysctl to control the size of the mpls label table") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Ahern authored
TOS is another key aspect of the lookup passed to fib_validate_source. Add it to the tracepoint. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexei Starovoitov authored
To fix build errors: kernel/built-in.o: In function `bpf_trace_printk': bpf_trace.c:(.text+0x11a254): undefined reference to `strncpy_from_unsafe' kernel/built-in.o: In function `fetch_memory_string': trace_kprobe.c:(.text+0x11acf8): undefined reference to `strncpy_from_unsafe' move strncpy_from_unsafe() next to probe_kernel_read/write() which use the same memory access style. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Reported-by: Guenter Roeck <linux@roeck-us.net> Fixes: 1a6877b9 ("lib: introduce strncpy_from_unsafe()") Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Daniel Borkmann says: ==================== tcp: receive-side per route dctcp handling Original cover letter: Currently, the following case doesn't use DCTCP, even if it should: - responder has f.e. cubic as system wide default - 'ip route congctl dctcp $src' was set Then, DCTCP is NOT used if a DCTCP sender attempts to connect from a host in the $src range: ECT(0) is set, but listen_sk is not dctcp, so we fail the INET_ECN_is_not_ect sanity check. We also have to examine the dst used for the SYN/ACK reply to make this case work. In order to minimize additional cost, store the 'ecn is must have' information is the dst_features field. The set targets -next instead of -net since this doesn't seem to be a serious bug and to give the change more soak time until it hits linus tree. v1 -> v2: - Addressed Dave's feedback, not exposing any bits to user space - Added patch 3 to reject incorrect configurations - Rest as is, rebased and retested ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
Currently, the following case doesn't use DCTCP, even if it should: A responder has f.e. Cubic as system wide default, but for a specific route to the initiating host, DCTCP is being set in RTAX_CC_ALGO. The initiating host then uses DCTCP as congestion control, but since the initiator sets ECT(0), tcp_ecn_create_request() doesn't set ecn_ok, and we have to fall back to Reno after 3WHS completes. We were thinking on how to solve this in a minimal, non-intrusive way without bloating tcp_ecn_create_request() needlessly: lets cache the CA ecn option flag in RTAX_FEATURES. In other words, when ECT(0) is set on the SYN packet, set ecn_ok=1 iff route RTAX_FEATURES contains the unexposed (internal-only) DST_FEATURE_ECN_CA. This allows to only do a single metric feature lookup inside tcp_ecn_create_request(). Joint work with Florian Westphal. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
Feature bits that are invalid should not be accepted by the kernel, only the lower 4 bits may be configured, but not the remaining ones. Even from these 4, 2 of them are unused. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
Reduce the identation a bit, there's no need to artificically have it increased. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Westphal authored
fib_create_info() is already quite large, so before adding more code to the metrics section move that to a helper, similar to ip6_convert_metrics. Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Philip Downey authored
Document the addition of a new sysctl variable which controls the generation of IGMP reports for link local multicast groups in the 224.0.0.X range. IGMP reports for local multicast groups can now be optionally inhibited by setting the value to zero e.g.: echo 0 > /proc/sys/net/ipv4/igmp_link_local_mcast_reports To retain backwards compatibility the previous behaviour is retained by default on system boot or reverted by setting the value back to non-zero. Signed-off-by: Philip Downey <pdowney@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pravin B Shelar authored
Currently tun-info options pointer is used in few cases to pass options around. But tunnel options can be accessed using ip_tunnel_info_opts() API without using the pointer. Following patch removes the redundant pointer and consistently make use of API. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Reviewed-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Madalin Bucur authored
Address remaining issue after 80ec1927. Signed-off-by: Madalin Bucur <madalin.bucur@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
net/ipv4/af_inet.c: In function 'snmp_get_cpu_field64': >> net/ipv4/af_inet.c:1486:26: error: 'offt' undeclared (first use in this function) v = *(((u64 *)bhptr) + offt); ^ net/ipv4/af_inet.c:1486:26: note: each undeclared identifier is reported only once for each function it appears in net/ipv4/af_inet.c: In function 'snmp_fold_field64': >> net/ipv4/af_inet.c:1499:39: error: 'offct' undeclared (first use in this function) res += snmp_get_cpu_field(mib, cpu, offct, syncp_offset); ^ >> net/ipv4/af_inet.c:1499:10: error: too many arguments to function 'snmp_get_cpu_field' res += snmp_get_cpu_field(mib, cpu, offct, syncp_offset); ^ net/ipv4/af_inet.c:1455:5: note: declared here u64 snmp_get_cpu_field(void __percpu *mib, int cpu, int offt) ^ Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ken-ichirou MATSUZAWA authored
Poll() returns immediately after setting the kernel current frame (ring->head) to SKIP from user space even though there is no new frame. And in a case of all frames is VALID, user space program unintensionally sets (only) kernel current frame to UNUSED, then calls poll(), it will not return immediately even though there are VALID frames. To avoid situations like above, I think we need to scan all frames to find VALID frames at poll() like netlink_alloc_skb(), netlink_forward_ring() finding an UNUSED frame at skb allocation. Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Aleksey Makarov says: ==================== net: thunderx: New features and fixes v2: - The unused affinity_mask field of the structure cmp_queue has been deleted. (thanks to David Miller) - The unneeded initializers have been dropped. (thanks to Alexey Klimov) - The commit message "net: thunderx: Rework interrupt handling" has been fixed. (thanks to Alexey Klimov) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sunil Goutham authored
Support for setting VF's corresponding BGX LMAC in internal loopback mode. This mode can be used for verifying basic HW functionality such as packet I/O, RX checksum validation, CQ/RBDR interrupts, stats e.t.c. Useful when DUT has no external network connectivity. 'loopback' mode can be enabled or disabled via ethtool. Note: This feature is not supported when no of VFs enabled are morethan no of physical interfaces i.e active BGX LMACs Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sunil Goutham authored
This patch adds support for handling multiple qsets assigned to a single VF. There by increasing no of queues from earlier 8 to max no of CPUs in the system i.e 48 queues on a single node and 96 on dual node system. User doesn't have option to assign which Qsets/VFs to be merged. Upon request from VF, PF assigns next free Qsets as secondary qsets. To maintain current behavior no of queues is kept to 8 by default which can be increased via ethtool. If user wants to unbind NICVF driver from a secondary Qset then it should be done after tearing down primary VF's interface. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sunil Goutham authored
Rework interrupt handler to avoid checking IRQ affinity of CQ interrupts. Now separate handlers are registered for each IRQ including RBDR. Register interrupt handlers for only those which are being used. Add nicvf_dump_intr_status() and use it in irq handlers. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sunil Goutham authored
This patch configures HW to strip 802.1Q header if found in a receiving packet. The stripped VLAN ID and TCI information is passed on to software via CQE_RX. Also sets netdev's 'vlan_features' so that other HW offload features can be used for tagged packets. This offload feature can be enabled or disabled via ethtool. Network stack normally ignores RPS for 802.1Q packets and hence low throughput. With this offload enabled throughput for tagged packets will be almost same as normal packets. Note: This patch doesn't enable HW VLAN insertion for transmit packets. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sunil Goutham authored
Adding support for receive hashing HW offload by using RSS_ALG and RSS_TAG fields of CQE_RX descriptor. Also removed dependency on minimum receive queue count to configure RSS so that hash is always generated. This hash is used by RPS logic to distribute flows across multiple CPUs. Offload can be disabled via ethtool. Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sunil Goutham authored
Use the nicvf_send_msg_to_pf() function in the mailbox code. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sunil Goutham authored
Added ethtool support to dump receive packet error statistics reported in CQE. Also made some small fixes Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Aleksey Makarov authored
The liquidio and thunder drivers have different maintainers. Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Raghavendra K T says: ==================== Optimize the snmp stat aggregation for large cpus While creating 1000 containers, perf is showing lot of time spent in snmp_fold_field on a large cpu system. The current patch tries to improve by reordering the statistics gathering. Please note that similar overhead was also reported while creating veth pairs https://lkml.org/lkml/2013/3/19/556 Changes in V4: - remove 'item' variable and use IPSTATS_MIB_MAX to avoid sparse warning (Eric) also remove 'item' parameter (Joe) - add missing memset of padding. Changes in V3: - use memset to initialize temp buffer in leaf function. (David) - use memcpy to copy the buffer data to stat instead of unalign_pu (Joe) - Move buffer definition to leaf function __snmp6_fill_stats64() (Eric) - Changes in V2: - Allocate the stat calculation buffer in stack. (Eric) Setup: 160 cpu (20 core) baremetal powerpc system with 1TB memory 1000 docker containers was created with command docker run -itd ubuntu:15.04 /bin/bash in loop observation: Docker container creation linearly increased from around 1.6 sec to 7.5 sec (at 1000 containers) perf data showed, creating veth interfaces resulting in the below code path was taking more time. rtnl_fill_ifinfo -> inet6_fill_link_af -> inet6_fill_ifla6_attrs -> snmp_fold_field proposed idea: currently __snmp6_fill_stats64 calls snmp_fold_field that walks through per cpu data to of an item (iteratively for around 36 items). The patch tries to aggregate the statistics by going through all the items of each cpu sequentially which is reducing cache misses. Performance of docker creation improved by around more than 2x after the patch. before the patch: ================ 3f45ba571a42e925c4ec4aaee0e48d7610a9ed82a4c931f83324d41822cf6617 real 0m6.836s user 0m0.095s sys 0m0.011s perf record -a docker run -itd ubuntu:15.04 /bin/bash ======================================================= 50.73% docker [kernel.kallsyms] [k] snmp_fold_field 9.07% swapper [kernel.kallsyms] [k] snooze_loop 3.49% docker [kernel.kallsyms] [k] veth_stats_one 2.85% swapper [kernel.kallsyms] [k] _raw_spin_lock 1.37% docker docker [.] backtrace_qsort 1.31% docker docker [.] strings.FieldsFunc cache-misses: 2.7% after the patch: ============= 9178273e9df399c8290b6c196e4aef9273be2876225f63b14a60cf97eacfafb5 real 0m3.249s user 0m0.088s sys 0m0.020s perf record -a docker run -itd ubuntu:15.04 /bin/bash ======================================================= 10.57% docker docker [.] scanblock 8.37% swapper [kernel.kallsyms] [k] snooze_loop 6.91% docker [kernel.kallsyms] [k] snmp_get_cpu_field 6.67% docker [kernel.kallsyms] [k] veth_stats_one 3.96% docker docker [.] runtime_MSpan_Sweep 2.47% docker docker [.] strings.FieldsFunc cache-misses: 1.41 % Please let me know if you have suggestions/comments. Thanks Eric, Joe and David for the comments. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Raghavendra K T authored
Docker container creation linearly increased from around 1.6 sec to 7.5 sec (at 1000 containers) and perf data showed 50% ovehead in snmp_fold_field. reason: currently __snmp6_fill_stats64 calls snmp_fold_field that walks through per cpu data of an item (iteratively for around 36 items). idea: This patch tries to aggregate the statistics by going through all the items of each cpu sequentially which is reducing cache misses. Docker creation got faster by more than 2x after the patch. Result: Before After Docker creation time 6.836s 3.25s cache miss 2.7% 1.41% perf before: 50.73% docker [kernel.kallsyms] [k] snmp_fold_field 9.07% swapper [kernel.kallsyms] [k] snooze_loop 3.49% docker [kernel.kallsyms] [k] veth_stats_one 2.85% swapper [kernel.kallsyms] [k] _raw_spin_lock perf after: 10.57% docker docker [.] scanblock 8.37% swapper [kernel.kallsyms] [k] snooze_loop 6.91% docker [kernel.kallsyms] [k] snmp_get_cpu_field 6.67% docker [kernel.kallsyms] [k] veth_stats_one changes/ideas suggested: Using buffer in stack (Eric), Usage of memset (David), Using memcpy in place of unaligned_put (Joe). Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Raghavendra K T authored
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
-
- 30 Aug, 2015 4 commits
-
-
David S. Miller authored
Pravin B Shelar says: ==================== openvswitch: Cleanup post vport conversion. After converting all vport to netdev implmentations there is no need for some of vport functionality. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pravin B Shelar authored
This structure is not used anymore. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pravin B Shelar authored
Since all vport types are now backed by netdev, we can directly use netdev stats. Following patch removes redundant stat from vport. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pravin B Shelar authored
tun info is passed using skb-dst pointer. Now we have converted all vports to netdev based implementation so Now we can remove redundant pointer to tun-info from OVS_CB. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-