- 18 May, 2018 12 commits
-
-
Jose Abreu authored
We can remove the if condition and check if return code is different than -EINVAL, meaning callback is present. Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Joao Pinto <jpinto@synopsys.com> Cc: Vitor Soares <soares@synopsys.com> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jose Abreu authored
Stop using if conditions depending on the GMAC version for getting the descriptor skbuff address and use instead a helper implemented in the descriptor files. Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Joao Pinto <jpinto@synopsys.com> Cc: Vitor Soares <soares@synopsys.com> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jose Abreu authored
Currently an if condition is used to select the correct callback to set rx_onwer in descriptor. Lets keep this simple and always use the same callback. Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Joao Pinto <jpinto@synopsys.com> Cc: Vitor Soares <soares@synopsys.com> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jose Abreu authored
We either have .enable_dma_transmission or .set_tx_tail_ptr in the HW table callbacks, we can never have both so there is no need to check for GMAC version. Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Joao Pinto <jpinto@synopsys.com> Cc: Vitor Soares <soares@synopsys.com> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jose Abreu authored
Instead of relying on the GMAC version for choosing if we need to use dma_init or dma_init_{rx/tx}_chan callback, lets uniformize this and always use the dma_init_{rx/tx}_chan callbacks. While at it, fix the use of dma_init_chan callback, which shall be called for as many channels as the max of rx/tx channels. Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Joao Pinto <jpinto@synopsys.com> Cc: Vitor Soares <soares@synopsys.com> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jose Abreu authored
PTP and MMC modules base address can depend on the GMAC version. As this is HW specific lets move this base address calculation to hwif.c. Also, add an entry in the HW table so that we can specify the module offset. This can later be extended to more modules, if deemed necessary. Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Joao Pinto <jpinto@synopsys.com> Cc: Vitor Soares <soares@synopsys.com> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jose Abreu authored
With the introducion of callbacks check in hwif.h we only call the callback if HW supports it so there is no longer need to check for GMAC version. Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Joao Pinto <jpinto@synopsys.com> Cc: Vitor Soares <soares@synopsys.com> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jose Abreu authored
Instead of relying on the GMAC version for choosing if we need to use dma_{rx/tx}_mode or just dma_mode callback lets uniformize this and always use the dma_{rx/tx}_mode callbacks. Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Joao Pinto <jpinto@synopsys.com> Cc: Vitor Soares <soares@synopsys.com> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jose Abreu authored
Stop using if conditions depending on the GMAC version for clearing the descriptor and use instead a helper implemented in the descriptor files. Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Joao Pinto <jpinto@synopsys.com> Cc: Vitor Soares <soares@synopsys.com> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jose Abreu authored
Stop using if conditions depending on the GMAC version for setting the the descriptor skbuff address and use instead a helper implemented in the descriptor files. Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Joao Pinto <jpinto@synopsys.com> Cc: Vitor Soares <soares@synopsys.com> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jose Abreu authored
This is cutting down performance. Once the timer is armed it should run after the time expires for the first packet sent and not the last one. After this change, running iperf, the performance gain is +/- 24%. Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Joao Pinto <jpinto@synopsys.com> Cc: Vitor Soares <soares@synopsys.com> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jose Abreu authored
This enables OSP (Operate on Second Packet) for GMAC4. The feature allows DMA to fetch second descriptor while its still processing the first one. Running iperf, the performance gain is +/- 38%. Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Joao Pinto <jpinto@synopsys.com> Cc: Vitor Soares <soares@synopsys.com> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 17 May, 2018 28 commits
-
-
David S. Miller authored
Florian Fainelli says: ==================== net: Allow more drivers with COMPILE_TEST This patch series includes more drivers to be build tested with COMPILE_TEST enabled. This helps cover some of the issues I just ran into with missing a driver *sigh*. Chanves in v3: - drop the TI Keystone NETCP driver from the COMPILE_TEST additions Changes in v2: - allow FEC to build outside of CONFIG_ARM/ARM64 by defining a layout of registers, this is not meant to run, so this is not a real issue if we are not matching the correct register layout ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Fainelli authored
Those drivers build just fine with COMPILE_TEST, so make that possible. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Fainelli authored
The Freescale FEC driver builds fine with COMPILE_TEST, so make that possible. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Fainelli authored
Most of the TI drivers build just fine with COMPILE_TEST, cpmac (AR7) is the exception because it uses a header file from arch/mips/include/asm/mach-ar7/ar7.h and keystone netcp which requires help from drivers/soc/ti/ for queue management helpers. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Ahern authored
Add informative messages for error paths related to adding a VLAN to a device. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Manish Chopra authored
This patch makes use of build_skb() throughout in driver's receieve data path [HW gro flow and non HW gro flow]. With this, driver can build skb directly from the page segments which are already mapped to the hardware instead of allocating new SKB via netdev_alloc_skb() and memcpy the data which is quite costly. This really improves performance (keeping same or slight gain in rx throughput) in terms of CPU utilization which is significantly reduced [almost half] in non HW gro flow where for every incoming MTU sized packet driver had to allocate skb, memcpy headers etc. Additionally in that flow, it also gets rid of bunch of additional overheads [eth_get_headlen() etc.] to split headers and data in the skb. Tested with: system: 2 sockets, 4 cores per socket, hyperthreading, 2x4x2=16 cores iperf [server]: iperf -s iperf [client]: iperf -c <server_ip> -t 500 -i 10 -P 32 HW GRO off – w/o build_skb(), throughput: 36.8 Gbits/sec Average: CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle Average: all 0.59 0.00 32.93 0.00 0.00 43.07 0.00 0.00 23.42 HW GRO off - with build_skb(), throughput: 36.9 Gbits/sec Average: CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle Average: all 0.70 0.00 31.70 0.00 0.00 25.68 0.00 0.00 41.92 HW GRO on - w/o build_skb(), throughput: 36.9 Gbits/sec Average: CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle Average: all 0.86 0.00 24.14 0.00 0.00 6.59 0.00 0.00 68.41 HW GRO on - with build_skb(), throughput: 37.5 Gbits/sec Average: CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle Average: all 0.87 0.00 23.75 0.00 0.00 6.19 0.00 0.00 69.19 Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: Manish Chopra <manish.chopra@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queueDavid S. Miller authored
Jeff Kirsher says: ==================== 10GbE Intel Wired LAN Driver Updates 2018-05-17 This series contains updates to ixgbe, ixgbevf and ice drivers. Cathy Zhou resolves sparse warnings by using the force attribute. Mauro S M Rodrigues fixes a bug where IRQs were not freed if a PCI error recovery system opts to remove the device which causes ixgbe_io_error_detected() to return PCI_ERS_RESULT_DISCONNECT before calling ixgbe_close_suspend() which results in IRQs not freed and crashing when the remove handler calls pci_disable_device(). Resolved this by calling ixgbe_close_suspend() before evaluating the PCI channel state. Pavel Tatashin releases the rtnl_lock during the call to ixgbe_close_suspend() to allow scaling if device_shutdown() is multi-threaded. Emil modifies ixgbe to not validate the MAC address during a reset, unless the MAC was set on the host so that the VF will get a new MAC address every time it reloads. Also updates ixgbevf to set hw->mac.perm_addr in order to retain the custom MAC on a reset. Anirudh updates the ice NVM read/erase/update AQ commands to align with the latest specification. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Roman Mashak authored
Reported-by: Vlad Buslov <vladbu@mellanox.com> Reported-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dan Carpenter authored
We recently refactored this code and introduced a static checker warning. Smatch complains that if cmd->index is zero then we would underflow the arrays. That's obviously true. The question is whether we prevent cmd->index from being zero at a different level. I've looked at the code and I don't immediately see a check for that. Fixes: 062b3e1b ("net/ncsi: Refactor MAC, VLAN filters") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
syzkaller found that following program crashes the host : { int fd = socket(AF_SMC, SOCK_STREAM, 0); int val = 1; listen(fd, 0); shutdown(fd, SHUT_RDWR); setsockopt(fd, 6, TCP_NODELAY, &val, 4); } Simply initialize conn.tx_work & conn.send_lock at socket creation, rather than deeper in the stack. ODEBUG: assert_init not available (active state 0) object type: timer_list hint: (null) WARNING: CPU: 1 PID: 13988 at lib/debugobjects.c:329 debug_print_object+0x16a/0x210 lib/debugobjects.c:326 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 13988 Comm: syz-executor0 Not tainted 4.17.0-rc4+ #46 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 panic+0x22f/0x4de kernel/panic.c:184 __warn.cold.8+0x163/0x1b3 kernel/panic.c:536 report_bug+0x252/0x2d0 lib/bug.c:186 fixup_bug arch/x86/kernel/traps.c:178 [inline] do_error_trap+0x1de/0x490 arch/x86/kernel/traps.c:296 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992 RIP: 0010:debug_print_object+0x16a/0x210 lib/debugobjects.c:326 RSP: 0018:ffff880197a37880 EFLAGS: 00010086 RAX: 0000000000000061 RBX: 0000000000000005 RCX: ffffc90001ed0000 RDX: 0000000000004aaf RSI: ffffffff8160f6f1 RDI: 0000000000000001 RBP: ffff880197a378c0 R08: ffff8801aa7a0080 R09: ffffed003b5e3eb2 R10: ffffed003b5e3eb2 R11: ffff8801daf1f597 R12: 0000000000000001 R13: ffffffff88d96980 R14: ffffffff87fa19a0 R15: ffffffff81666ec0 debug_object_assert_init+0x309/0x500 lib/debugobjects.c:692 debug_timer_assert_init kernel/time/timer.c:724 [inline] debug_assert_init kernel/time/timer.c:776 [inline] del_timer+0x74/0x140 kernel/time/timer.c:1198 try_to_grab_pending+0x439/0x9a0 kernel/workqueue.c:1223 mod_delayed_work_on+0x91/0x250 kernel/workqueue.c:1592 mod_delayed_work include/linux/workqueue.h:541 [inline] smc_setsockopt+0x387/0x6d0 net/smc/af_smc.c:1367 __sys_setsockopt+0x1bd/0x390 net/socket.c:1903 __do_sys_setsockopt net/socket.c:1914 [inline] __se_sys_setsockopt net/socket.c:1911 [inline] __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1911 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: 01d2f7e2 ("net/smc: sockopts TCP_NODELAY and TCP_CORK") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ursula Braun <ubraun@linux.ibm.com> Cc: linux-s390@vger.kernel.org Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Don't store repr pointer to reprs array until the representor is successfully created. This avoids message about "representor destruction" even when it was never created. Also it cleans-up the flow. Also, check return value after port alloc. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Antoine Tenart says: ==================== net: mvpp2: small improvements Those 3 patches are small improvements to the Marvell PPv2 driver. The series does not conflict with the one sent about phylink and 1000/2500baseX support, so the two series can live in parallel. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yan Markman authored
Prevent flood of RX error prints during heavy traffic with weak signal in link by checking net_ratelimit() before using netdev_err(). Signed-off-by: Yan Markman <ymarkman@marvell.com> [Antoine: small rework, commit message] Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yan Markman authored
Remove special stop/start handling from the set_mac_address callback. All this special care is not needed, and can be removed. It also simplifies the up/down status in the driver and helps avoiding possible link status mismatch issues. Signed-off-by: Yan Markman <ymarkman@marvell.com> [Antoine: commit message] Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yan Markman authored
Avoid repeating the check for free aggregated descriptors when it already failed at the beginning of the function. Signed-off-by: Yan Markman <ymarkman@marvell.com> [Antoine: commit message] Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Antoine Tenart says: ==================== net: mvpp2: phylink conversion This series convert the Marvell PPv2 driver to phylink (models the MAC to PHY link). One important point is the PPv2 driver supports two probe modes: device tree and ACPI. This series only brings phylink support for the device tree mode, as the ACPI one will need further work. Still, the driver should be working as before when using ACPI. This split should be temporary, and was discussed with Marcin (in Cc.) who added ACPI support to the driver. Also as the SFP cages on both DB boards can be considered as non-wired. We thus chose not to describe those SFP cages and we use fixed-link. The rest of the series uses phylink to add support for 1000BaseX and 2500BaseX modes in the PPv2 driver. To do this, two patches are needed in the common PHY framework (patches 3 and 4). The last 4 patches modify the device tree to use the new PPv2 functionalities. The series has been tested for the device tree mode on the 7040-db, 8040-db and 8040-mcbin boards, to ensure all the interface where working as expected. @Dave: patches 7 to 10 should go through the mvebu tree (Gregory in Cc.) to avoid any conflict with the other mvebu dt patches taken during this cycle. The series is based on today's net-next. Since v2: - Removed the SFP description from the DB boards, as their SFP cages are wired properly. We now use fixed-link. - Because of this rework, split the series in two, so that the SFP part is reviewed separately. - Small fixes in the phylink patch. - Rebased on the latest net-next branch. Since v1: - Chose a different approach to the SFP changes, as the previous ones weren't valid and reworked both BD boards device trees. - Misc fixes. - Added Kishon's acked-by on one patch. - Rebaed on latest net-next branch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Antoine Tenart authored
This patch adds the 2500Base-X PHY mode support in the Marvell PPv2 driver. 2500Base-X is quite close to 1000Base-X and SGMII modes and uses nearly the same code path. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Antoine Tenart authored
This patch adds the 1000Base-X PHY mode support in the Marvell PPv2 driver. 1000Base-X is quite close the SGMII and uses nearly the same code path. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Antoine Tenart authored
This patch allow the CP110 comphy to configure some lanes in the 2.5G SGMII mode. This mode is quite close to SGMII and uses nearly the same code path. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Antoine Tenart authored
This patch adds one more generic PHY mode to the phy_mode enum, to allow configuring generic PHYs to the 2.5G SGMII mode by using the set_mode callback. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Acked-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Antoine Tenart authored
Convert the PPv2 driver to implement phylink helpers, and use phylink in DT mode. The other mode supported is ACPI, which will need further work in order to be entirely compatible with phylink. The MAC and GoP configuration functions were completely moved to fit into the phylink helpers. When a PHY is always present between the MAC and the physical port, phylink only is used, but when this is not the case (the MAC directly is connected to the physical port) the link IRQ is used to detect changes in the link state and call phylink_mac_change. The ACPI mode do not uses phylink as of now, and the changes shouldn't impact its use. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Antoine Tenart authored
Cosmetic patch to align the ethtool functions to ops definitions. This patch does not change in any way the driver's behaviour. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Merge tag 'wireless-drivers-next-for-davem-2018-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Kalle Valo says: ==================== wireless-drivers-next patches for 4.18 The first pull request for 4.18. As usual new features and bug fixes but nothing really special. I also merged wireless-drivers due to an iwlwifi patch dependency. Major changes: iwlwifi * implement Traffic Condition Monitor and use it for scan, BT coex and to detect when the AP doesn't support UAPSD properly * some more work for the 22000 family of devices; * introduce AMSDU rate control offload qtnfmac * DFS offload support rsi * roaming enhancements * increase max supported aggregation subframes * don't advertise 5 GHz support if the device doesn't support it brcmfmac * add support for BCM4366E chipset * add support for bcm43364 wireless chipset ath10k * enable temperature reads for QCA6174 and QCA9377 * add firmware memory dump support for QCA9984 * continue adding WCN3990 support via SNOC bus ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
YueHaibing authored
As documented in Documentation/timers/timers-howto.txt, replace msleep(1) with usleep_range(). Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tonghao Zhang authored
Introduce an new common helper to avoid redundancy. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Yuchung Cheng says: ==================== tcp: default RACK loss recovery This patch set implements the features correspond to the draft-ietf-tcpm-rack-03 version of the RACK draft. https://datatracker.ietf.org/meeting/101/materials/slides-101-tcpm-update-on-tcp-rack-00 1. SACK: implement equivalent DUPACK threshold heuristic in RACK to replace existing RFC6675 recovery (tcp_mark_head_lost). 2. Non-SACK: simplify RFC6582 NewReno implementation 3. RTO: apply RACK's time-based approach to avoid spuriouly marking very recently sent packets lost. 4. with (1)(2)(3), make RACK the exclusive fast recovery mechanism to mark losses based on time on S/ACK. Tail loss probe and F-RTO remain enabled by default as complementary mechanisms to send probes in CA_Open and CA_Loss states. The probes would solicit S/ACKs to trigger RACK time-based loss detection. All Google web and internal servers have been running RACK-only mode (4) for a while now. a/b experiments indicate RACK/TLP on average reduces recovery latency by 10% compared to RFC6675. RFC6675 is default-off now but can be enabled by disabling RACK (sysctl net.ipv4.tcp_recovery=0) for unseen issues. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yuchung Cheng authored
An RTO event indicates the head has not been acked for a long time after its last (re)transmission. But the other packets are not necessarily lost if they have been only sent recently (for example due to application limit). This patch would prohibit marking packets sent within an RTT to be lost on RTO event, using similar logic in TCP RACK detection. Normally the head (SND.UNA) would be marked lost since RTO should fire strictly after the head was sent. An exception is when the most recent RACK RTT measurement is larger than the (previous) RTO. To address this exception the head is always marked lost. Congestion control interaction: since we may not mark every packet lost, the congestion window may be more than 1 (inflight plus 1). But only one packet will be retransmitted after RTO, since tcp_retransmit_timer() calls tcp_retransmit_skb(...,segs=1). The connection still performs slow start from one packet (with Cubic congestion control). This commit was tested in an A/B test with Google web servers, and showed a reduction of 2% in (spurious) retransmits post timeout (SlowStartRetrans), and correspondingly reduced DSACKs (DSACKIgnoredOld) by 7%. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yuchung Cheng authored
Create and export a new helper tcp_rack_skb_timeout and move tcp_is_rack to prepare the final RTO change. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-