- 14 Jan, 2017 11 commits
-
-
Yuchung Cheng authored
Forward retransmit is an esoteric feature in RFC3517 (condition(3) in the NextSeg()). Basically if a packet is not considered lost by the current criteria (# of dupacks etc), but the congestion window has room for more packets, then retransmit this packet. However it actually conflicts with the rest of recovery design. For example, when reordering is detected we want to be conservative in retransmitting packets but forward-retransmit feature would break that to force more retransmission. Also the implementation is fairly complicated inside the retransmission logic inducing extra iterations in the write queue. With RACK losses are being detected timely and this heuristic is no longer necessary. There this patch removes the feature. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yuchung Cheng authored
Current F-RTO reverts cwnd reset whenever a never-retransmitted packet was (s)acked. The timeout can be declared spurious because the packets acknoledged with this ACK was transmitted before the timeout, so clearly not all the packets are lost to reset the cwnd. This nice detection does not really depend F-RTO internals. This patch applies the detection universally. On Google servers this change detected 20% more spurious timeouts. Suggested-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yuchung Cheng authored
This patch changes two things: 1. Start fast recovery with RACK in addition to other heuristics (e.g., DUPACK threshold, FACK). Prior to this change RACK is enabled to detect losses only after the recovery has started by other algorithms. 2. Disable TCP early retransmit. RACK subsumes the early retransmit with the new reordering timer feature. A latter patch in this series removes the early retransmit code. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yuchung Cheng authored
Currently RACK would mark loss before the undo operations in TCP loss recovery. This could incorrectly identify real losses as spurious. For example a sender first experiences a delay spike and then eventually some packets were lost due to buffer overrun. In this case, the sender should perform fast recovery b/c not all the packets were lost. But the sender may first trigger a (spurious) RTO and reset cwnd to 1. The following ACKs may used to mark real losses by tcp_rack_mark_lost. Then in tcp_process_loss this ACK could trigger F-RTO undo condition and unmark real losses and revert the cwnd reduction. If there are no more ACKs coming back, eventually the sender would timeout again instead of performing fast recovery. The patch fixes this incorrect process by always performing the undo checks before detecting losses. Fixes: 4f41b1c5 ("tcp: use RACK to detect losses") Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yuchung Cheng authored
The packets inside a jumbo skb (e.g., TSO) share the same skb timestamp, even though they are sent sequentially on the wire. Since RACK is based on time, it can not detect some packets inside the same skb are lost. However, we can leverage the packet sequence numbers as extended timestamps to detect losses. Therefore, when RACK timestamp is identical to skb's timestamp (i.e., one of the packets of the skb is acked or sacked), we use the sequence numbers of the acked and unacked packets to break ties. We can use the same sequence logic to advance RACK xmit time as well to detect more losses and avoid timeout. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yuchung Cheng authored
This patch makes RACK install a reordering timer when it suspects some packets might be lost, but wants to delay the decision a little bit to accomodate reordering. It does not create a new timer but instead repurposes the existing RTO timer, because both are meant to retransmit packets. Specifically it arms a timer ICSK_TIME_REO_TIMEOUT when the RACK timing check fails. The wait time is set to RACK.RTT + RACK.reo_wnd - (NOW - Packet.xmit_time) + fudge This translates to expecting a packet (Packet) should take (RACK.RTT + RACK.reo_wnd + fudge) to deliver after it was sent. When there are multiple packets that need a timer, we use one timer with the maximum timeout. Therefore the timer conservatively uses the maximum window to expire N packets by one timeout, instead of N timeouts to expire N packets sent at different times. The fudge factor is 2 jiffies to ensure when the timer fires, all the suspected packets would exceed the deadline and be marked lost by tcp_rack_detect_loss(). It has to be at least 1 jiffy because the clock may tick between calling icsk_reset_xmit_timer(timeout) and actually hang the timer. The next jiffy is to lower-bound the timeout to 2 jiffies when reo_wnd is < 1ms. When the reordering timer fires (tcp_rack_reo_timeout): If we aren't in Recovery we'll enter fast recovery and force fast retransmit. This is very similar to the early retransmit (RFC5827) except RACK is not constrained to only enter recovery for small outstanding flights. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yuchung Cheng authored
Record the most recent RTT in RACK. It is often identical to the "ca_rtt_us" values in tcp_clean_rtx_queue. But when the packet has been retransmitted, RACK choses to believe the ACK is for the (latest) retransmitted packet if the RTT is over minimum RTT. This requires passing the arrival time of the most recent ACK to RACK routines. The timestamp is now recorded in the "ack_time" in tcp_sacktag_state during the ACK processing. This patch does not change the RACK algorithm itself. It only adds the RTT variable to prepare the next main patch. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yuchung Cheng authored
Create a new helper tcp_rack_detect_loss to prepare the upcoming RACK reordering timer patch. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yuchung Cheng authored
Create a new helper tcp_rack_mark_skb_lost to prepare the upcoming RACK reordering timer support. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Satanand Burla authored
Remove assignment to ndo_select_queue so that fallback is used for selecting txq. Also remove the now-useless function that used to be assigned to ndo_select_queue. Signed-off-by: Satanand Burla <satananda.burla@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vivien Didelot authored
The Marvell 6352 chip has a 8-bit address/16-bit data EEPROM access. The Marvell 6390 chip has a 16-bit address/8-bit data EEPROM access. This patch implements the 8-bit data EEPROM access in the mv88e6xxx driver and adds its support to chips of the 6390 family. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 12 Jan, 2017 29 commits
-
-
Eric Dumazet authored
Current allocations are not NUMA aware, and lack proper cleanup in case of error. It is perfectly fine to use static per cpu allocations for 256 bytes per cpu. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Lebrun <david.lebrun@uclouvain.be> Acked-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
Recently we started using ipmr with thousands of entries and easily hit soft lockups on smaller devices. The reason is that the hash function uses the high order bits from the src and dst, but those don't change in many common cases, also the hash table is only 64 elements so with thousands it doesn't scale at all. This patch migrates the hash table to rhashtable, and in particular the rhl interface which allows for duplicate elements to be chained because of the MFC_PROXY support (*,G; *,*,oif cases) which allows for multiple duplicate entries to be added with different interfaces (IMO wrong, but it's been in for a long time). And here are some results from tests I've run in a VM: mr_table size (default, allocated for all namespaces): Before After 49304 bytes 2400 bytes Add 65000 routes (the diff is much larger on smaller devices): Before After 1m42s 58s Forwarding 256 byte packets with 65000 routes (test done in a VM): Before After 3 Mbps / ~1465 pps 122 Mbps / ~59000 pps As a bonus we no longer see the soft lockups on smaller devices which showed up even with 2000 entries before. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Fixes following warnings : net/core/secure_seq.c:125:28: warning: incorrect type in argument 1 (different base types) net/core/secure_seq.c:125:28: expected unsigned int const [unsigned] [usertype] a net/core/secure_seq.c:125:28: got restricted __be32 [usertype] saddr net/core/secure_seq.c:125:35: warning: incorrect type in argument 2 (different base types) net/core/secure_seq.c:125:35: expected unsigned int const [unsigned] [usertype] b net/core/secure_seq.c:125:35: got restricted __be32 [usertype] daddr net/core/secure_seq.c:125:43: warning: cast from restricted __be16 net/core/secure_seq.c:125:61: warning: restricted __be16 degrades to integer Fixes: 7cd23e53 ("secure_seq: use SipHash in place of MD5") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Prasad Kanneganti authored
Reduce the load time of the VF driver by decreasing the wait time between iterations of the loop that polls for a mailbox response from the PF. Also change the wait time units from jiffies to milliseconds. Signed-off-by: Prasad Kanneganti <prasad.kanneganti@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com> Signed-off-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Satanand Burla <satananda.burla@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Felix Manlunas authored
Remove code that's no longer needed. It used to serve a purpose, which was to fix a link-related bug. For a while now, the NIC firmware has had a more elegant fix for that bug. Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Satanand Burla <satananda.burla@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joe Perches authored
commit bc1f4470 ("net: make ndo_get_stats64 a void function") mistakenly used a return value for this void conversion. Fix it. Signed-off-by: Joe Perches <joe@perches.com> cc: stephen hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Florian Fainelli says: ==================== net: mdio-gpio: Use modern GPIO helpers This patch series modernizes the mdio-gpio and makes it switch to the latest and greatest API for manipulating GPIO lines, thus allowing some simplifications in the driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Guenter Roeck authored
gpiod functions support handling low-active pins, so we can move thos code out of this driver into the gpio subsystem and simplify the code a bit. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Guenter Roeck authored
Using gpiod functions lets us use functionality which is not available with gpio functions. There is no gpiod function to match devm_gpio_request_one, so leave it in place and use gpio_to_desc() to convert absolute pin numbers to gpio descriptors. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Guenter Roeck authored
Using devm_gpio_request_one lets us request gpio pins with initial state in one go. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wei Yongjun authored
Fixes the following sparse warning: drivers/net/usb/cdc_ether.c:469:6: warning: symbol 'usbnet_cdc_zte_status' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sowmini Varadhan authored
The filter added by sock_setfilter is intended to only permit packets matching the pattern set up by create_payload(), but we only check the ip_len, and a single test-character in the IP packet to ensure this condition. Harden the filter by adding additional constraints so that we only permit UDP/IPv4 packets that meet the ip_len and test-character requirements. Include the bpf_asm src as a comment, in case this needs to be enhanced in the future Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wei Yongjun authored
Fixes the following sparse warning: net/core/lwt_bpf.c:355:5: warning: symbol 'bpf_lwt_prog_cmp' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Ursula Braun says: ==================== s390: qeth patches yesterday I came up with 13 qeth patches. Since you have not been happy with the 13th patch, I want to make sure that at least the remaining 12 qeth patches can be applied to net-next. Here is the resend of them. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ursula Braun authored
qeth devices in layer3 mode need a separate handling of vipa and proxy-arp addresses. vipa and proxy-arp addresses processed by qeth can be read from userspace. Introduced with commit 5f78e29c ("qeth: optimize IP handling in rx_mode callback") the retrieval of vipa and proxy-arp addresses is broken, if more than one vipa or proxy-arp address are set. The qeth code used local variable "int i" for 2 different purposes. This patch now spends 2 separate local variables of type "int". While touching these functions hash_for_each_safe() is converted to hash_for_each(), since there is no removal of hash entries. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Reference-ID: RQM 3524 Signed-off-by: David S. Miller <davem@davemloft.net>
-
Julian Wiedmann authored
STARTLAN needs to be the first IPA command after MPC initialization completes. So move the qeth_send_startlan() call from the layer disciplines into the core path, right after the MPC handshake. While at it, replace the magic LAN OFFLINE return code with the existing enum. Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Julian Wiedmann authored
Move all MAC utility functions in one place, and drop the forward declarations. Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Julian Wiedmann authored
This matches qeth_l2_write_mac(). Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Julian Wiedmann authored
Consolidate errno handling for MAC management: Instead of doing this in every caller, do it in one place. Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Suggested-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Julian Wiedmann authored
qeth_l2_send_groupmac() already translates the return code, so calling qeth_setdel_makerc() a second time only produces garbage. Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Julian Wiedmann authored
The only caller passes del = 0, so remove both the parameter and the code that handles != 0. Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Acked-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Julian Wiedmann authored
Remove unused define QETH_IP_HEADER_SIZE. Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Acked-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Julian Wiedmann authored
Accessing the current hsuid via card->options.hsuid is perfectly fine, even when the card is DOWN. Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Acked-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Richter authored
When RX/TX checksum offloading is turned on and the adapter is an OSA 3 card in layer 3 mode, the checksum offloading is only performed when both peers use different adapters. If both peers share an OSA 3 card, communication is a memory copy and checksum offloading is not performed. This patch adds a warning to inform the administrator. OSA 3 in layer 2 mode does not offer the RX/TX checksum offload feature. Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Richter authored
Turning on receive and/or transmit checksum offload support on the OSA card requires 2 commands: 1. start command which replies with available features 2. enable command to turn on selected features. The current version does not check the reply of the start command and simply uses the returned value to enable offload features. When the start command returns zero, this leads to a situation where no checksum offload is turned on by the hardware. Even worse no error indication is returned. The Linux kernel assumes the OSA card performs RX/TX checksum offload, but the hardware does not perform any checksum verification at all. This patch checks the return of the start and enable command responses from the hardware and turns off checksum offloading if the commands fails or does not respond with the correct bit setting. Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Richter authored
Rework the RX/TX checksum offloading command sequence to use the provided function call back mechanims to return card data to the device driver. Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Daniel Borkmann says: ==================== More flexible BPF cb access This patch improves BPF's cb access by allowing b/h/w/dw access variants on it. For details, please see individual patches. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
When structs are used to store temporary state in cb[] buffer that is used with programs and among tail calls, then the generated code will not always access the buffer in bpf_w chunks. We can ease programming of it and let this act more natural by allowing for aligned b/h/w/dw sized access for cb[] ctx member. Various test cases are attached as well for the selftest suite. Potentially, this can also be reused for other program types to pass data around. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
Currently, when calling convert_ctx_access() callback for the various program types, we pass in insn->dst_reg, insn->src_reg, insn->off from the original instruction. This information is needed to rewrite the instruction that is based on the user ctx structure into a kernel representation for the ctx. As we'd like to allow access size beyond just BPF_W, we'd need also insn->code for that in order to decode the original access size. Given that, lets just pass insn directly to the convert_ctx_access() callback and work on that to not clutter the callback with even more arguments we need to pass when everything is already contained in insn. So lets go through that once, no functional change. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-