- 08 Nov, 2017 21 commits
-
-
Nogah Frankel authored
Add more statistics to be collected from the HW periodically. These stats are tclass based (beside ECN marked packet, that exist only port based). They are needed to expose RED qdisc stats and xstats correctly. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yuval Mintz authored
This adds the counter group definitions for 2 new counter groups which are necessary for gaining ECN & wred counters. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nogah Frankel authored
Add support for ndo_setup_tc with enum tc_setup_type value of TC_SETUP_RED. This call sets RED qdisc on a traffic class. This patch supports RED qdisc only as a root qdisc and set in on the default tclass. It can be set with or without ECN. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nogah Frankel authored
This patch adds 2 new registers: - Congestion WRED ECN TClass Profile Register [CWTP] - Congestion WRED ECN TClass and Pool Mapping Register [CWTPM] These registers would later be needed to offload RED-related functionality to the HW. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nogah Frankel authored
Change TC_SETUP_CBS to TC_SETUP_QDISC_CBS to match the new convention.. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nogah Frankel authored
Change TC_SETUP_MQPRIO to TC_SETUP_QDISC_MQPRIO to match the new convention. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nogah Frankel authored
Add the ability to offload RED qdisc by using ndo_setup_tc. There are four commands for RED offloading: * TC_RED_SET: handles set and change. * TC_RED_DESTROY: handle qdisc destroy. * TC_RED_STATS: update the qdiscs counters (given as reference) * TC_RED_XSTAT: returns red xstats. Whether RED is being offloaded is being determined every time dump action is being called because parent change of this qdisc could change its offload state but doesn't require any RED function to be called. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Lawrence Brakmo authored
The original patch had the wrong filename. Fixes: bfdf7569 ("bpf: create samples/bpf/tcp_bpf.readme") Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Tom Herbert says: ==================== ila: make identifier format optional and other fixes The identifier type and checksum neutral mapping bits are optional in identifier formats. This patch set fixes the implementation to make them optional and configurable. Specific items: - Clean up checksum diff code in ILA - Add checksum neutral mapping auto so that checksum neutral mapping can be configured without requiring use of the C-bit - Add identifier type configuration and allow identifier type to be configured so that the identifier type field does not need to be present - Added ILA documention: ila.txt I have fixes for ILA in iproute2 that will be poseted separately. Tested: Ran netperf TCP_RR on various combinations of checksum mode and the two supported identifier types. v2: - Add proper sign off - In ILA LWT, only check prefix length includes identifier type if identifier type is enabled (ILA_ATYPE_USE_FORMAT). - Add a hook type so that it can be specified whether ILA translation is done on input or output route funciton in LWT. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tom Herbert authored
Add documenation for kernel ILA. This describes ILA, features, configuration gives some examples. Signed-off-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tom Herbert authored
In LWT tunnels both an input and output route method is defined. If both of these are executed in the same path then double translation happens and the effect is not correct. This patch adds a new attribute that indicates the hook type. Two values are defined for route output and route output. ILA translation is only done for the one that is set. The default is to enable ILA on route output. Signed-off-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tom Herbert authored
Allow identifier to be explicitly configured for a mapping. This can either be one of the identifier types specified in the ILA draft or a value of ILA_ATYPE_USE_FORMAT which means the identifier type is inferred from the identifier type field. If a value other than ILA_ATYPE_USE_FORMAT is set for a mapping then it is assumed that the identifier type field is not present in an identifier. Signed-off-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tom Herbert authored
Add checksum neutral auto that performs checksum neutral mapping without using the C-bit. This is enabled by configuration of a mapping. The checksum neutral function has been split into ila_csum_do_neutral_fmt and ila_csum_do_neutral_nofmt. The former handles the C-bit and includes it in the adjustment value. The latter just sets the adjustment value on the locator diff only. Added configuration for checksum neutral map aut in ila_lwt and ila_xlat. Signed-off-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tom Herbert authored
Consolidate computing checksum diff into one function. Add get_csum_diff_iaddr that computes the checksum diff between an address argument and locator being written. get_csum_diff calls this using the destination address in the IP header as the argument. Also moved ila_init_saved_csum to be close to the checksum diff functions. Signed-off-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christina Jacob authored
Implements port to port forwarding with route table and arp table lookup for ipv4 packets using bpf_redirect helper function and lpm_trie map. Signed-off-by: Christina Jacob <Christina.Jacob@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Troy Kisky authored
This is better for code locality and should slightly speed up normal interrupts. This also allows PPS clock output to start working for i.mx7. This is because i.mx7 was already using the limit of 3 interrupts, and needed another. Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com> Acked-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Vitaly Kuznetsov says: ==================== hv_netvsc: fix a hang on channel/mtu changes It was found that netvsc driver doesn't survive e.g. test. I was able to identify a hang in guest/host communication, it is fixed by PATCH1 of this series. PATCH2 is a cosmetic change masking unneeded messages. Changes since v1: - Throw away patches 2 and 3 of the original series as one is unneeded and the other is not justified [Eric Dumazet, Stephen Hemminger] so I'm only fixing the hang now, the crash doesn't reproduce. Will keep an eye on it. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vitaly Kuznetsov authored
Hyper-V hosts are known to send RNDIS messages even after we halt the device in rndis_filter_halt_device(). Remove user visible messages as they are not really useful. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vitaly Kuznetsov authored
It was found that in some cases host refuses to teardown GPADL for send/ receive buffers (probably when some work with these buffere is scheduled or ongoing). Change the teardown logic to be: 1) Send NVSP_MSG1_TYPE_REVOKE_* messages 2) Close the channel 3) Teardown GPADLs. This seems to work reliably. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Maciej S. Szmigiero authored
Currently, we create a LED trigger for any link speed known to a PHY. These triggers only fire when their exact link speed had been negotiated (they aren't cumulative, that is, they don't fire for "their or any higher" link speed). What we are missing, however, is a trigger which will fire on any link speed known to the PHY. Such trigger can then be used for implementing a poor man's substitute of the "link" LED on boards that lack it. Let's add it. Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Maciej S. Szmigiero authored
Currently, phy_led_trigger_change_speed() is handling a "no link" condition like it was some kind of an error (using "goto" to a code at the function end). However, having no link at PHY is an ordinary operational state, so let's handle it in an appropriately named separate function so it is more obvious what the code is doing. Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 07 Nov, 2017 1 commit
-
-
Eric Dumazet authored
Fixes a case where GFP_ATOMIC allocation must be used instead of GFP_KERNEL one. [ 54.891146] lock_acquire+0xb3/0x2f0 [ 54.891153] ? fs_reclaim_acquire.part.60+0x5/0x30 [ 54.891165] fs_reclaim_acquire.part.60+0x29/0x30 [ 54.891170] ? fs_reclaim_acquire.part.60+0x5/0x30 [ 54.891178] kmem_cache_alloc_trace+0x3f/0x500 [ 54.891186] ? cyc2ns_read_end+0x1e/0x30 [ 54.891196] ipv6_add_addr+0x15a/0xc30 [ 54.891217] ? ipv6_create_tempaddr+0x2ea/0x5d0 [ 54.891223] ipv6_create_tempaddr+0x2ea/0x5d0 [ 54.891238] ? manage_tempaddrs+0x195/0x220 [ 54.891249] ? addrconf_prefix_rcv_add_addr+0x1c0/0x4f0 [ 54.891255] addrconf_prefix_rcv_add_addr+0x1c0/0x4f0 [ 54.891268] addrconf_prefix_rcv+0x2e5/0x9b0 [ 54.891279] ? neigh_update+0x446/0xb90 [ 54.891298] ? ndisc_router_discovery+0x5ab/0xf00 [ 54.891303] ndisc_router_discovery+0x5ab/0xf00 [ 54.891311] ? retint_kernel+0x2d/0x2d [ 54.891331] ndisc_rcv+0x1b6/0x270 [ 54.891340] icmpv6_rcv+0x6aa/0x9f0 [ 54.891345] ? ipv6_chk_mcast_addr+0x176/0x530 [ 54.891351] ? do_csum+0x17b/0x260 [ 54.891360] ip6_input_finish+0x194/0xb20 [ 54.891372] ip6_input+0x5b/0x2c0 [ 54.891380] ? ip6_rcv_finish+0x320/0x320 [ 54.891389] ip6_mc_input+0x15a/0x250 [ 54.891396] ipv6_rcv+0x772/0x1050 [ 54.891403] ? consume_skb+0xbe/0x2d0 [ 54.891412] ? ip6_make_skb+0x2a0/0x2a0 [ 54.891418] ? ip6_input+0x2c0/0x2c0 [ 54.891425] __netif_receive_skb_core+0xa0f/0x1600 [ 54.891436] ? process_backlog+0xac/0x400 [ 54.891441] process_backlog+0xfa/0x400 [ 54.891448] ? net_rx_action+0x145/0x1130 [ 54.891456] net_rx_action+0x310/0x1130 [ 54.891524] ? RTUSBBulkReceive+0x11d/0x190 [mt7610u_sta] [ 54.891538] __do_softirq+0x140/0xaba [ 54.891553] irq_exit+0x10b/0x160 [ 54.891561] do_IRQ+0xbb/0x1b0 Fixes: f3d9832e ("ipv6: addrconf: cleanup locking in ipv6_add_addr") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Acked-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 05 Nov, 2017 18 commits
-
-
David S. Miller authored
Roman Gushchin says: ==================== eBPF-based device cgroup controller This patchset introduces an eBPF-based device controller for cgroup v2. Patches (1) and (2) are a preparational work required to share some code with the existing device controller implementation. Patch (3) is the main patch, which introduces a new bpf prog type and all necessary infrastructure. Patch (4) moves cgroup_helpers.c/h to use them by patch (4). Patch (5) implements an example of eBPF program which controls access to device files and corresponding userspace test. v3: Renamed constants introduced by patch (3) to BPF_DEVCG_* v2: Added patch (1). v1: https://lkml.org/lkml/2017/11/1/363 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Roman Gushchin authored
Add a test for device cgroup controller. The test loads a simple bpf program which logs all device access attempts using trace_printk() and forbids all operations except operations with /dev/zero and /dev/urandom. Then the test creates and joins a test cgroup, and attaches the bpf program to it. Then it tries to perform some simple device operations and checks the result: create /dev/null (should fail) create /dev/zero (should pass) copy data from /dev/urandom to /dev/zero (should pass) copy data from /dev/urandom to /dev/full (should fail) copy data from /dev/random to /dev/zero (should fail) Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Roman Gushchin authored
The purpose of this move is to use these files in bpf tests. Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Roman Gushchin authored
Cgroup v2 lacks the device controller, provided by cgroup v1. This patch adds a new eBPF program type, which in combination of previously added ability to attach multiple eBPF programs to a cgroup, will provide a similar functionality, but with some additional flexibility. This patch introduces a BPF_PROG_TYPE_CGROUP_DEVICE program type. A program takes major and minor device numbers, device type (block/character) and access type (mknod/read/write) as parameters and returns an integer which defines if the operation should be allowed or terminated with -EPERM. Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Roman Gushchin authored
This is non-functional change to prepare the device cgroup code for adding eBPF-based controller for cgroups v2. The patch performs the following changes: 1) __devcgroup_inode_permission() and devcgroup_inode_mknod() are moving to the device-cgroup.h and converting into static inline. 2) __devcgroup_check_permission() is exported. 3) devcgroup_check_permission() wrapper is introduced to be used by both existing and new bpf-based implementations. Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Roman Gushchin authored
Rename device type and access type constants defined in security/device_cgroup.c by adding the DEVCG_ prefix. The reason behind this renaming is to make them global namespace friendly, as they will be moved to the corresponding header file by following patches. Signed-off-by: Roman Gushchin <guro@fb.com> Cc: David S. Miller <davem@davemloft.net> Cc: Tejun Heo <tj@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linuxDavid S. Miller authored
Saeed Mahameed says: ==================== mlx5-updates-2017-11-04 This series includes: From Huy: dscp to priority mapping for Ethernet packet. =================================================== First six patches enable differentiated services code point (dscp) to priority mapping for Ethernet packet. Once this feature is enabled, the packet is routed to the corresponding priority based on its dscp. User can combine this feature with priority flow control (pfc) feature to have priority flow control based on the dscp. Firmware interface: Mellanox firmware provides two control knobs for this feature: QPTS register allow changing the trust state between dscp and pcp mode. The default is pcp mode. Once in dscp mode, firmware will route the packet based on its dscp value if the dscp field exists. QPDPM register allow mapping a specific dscp (0 to 63) to a specific priority (0 to 7). By default, all the dscps are mapped to priority zero. Software interface: This feature is controlled via application priority TLV. IEEE specification P802.1Qcd/D2.1 defines priority selector id 5 for application priority TLV. This APP TLV selector defines DSCP to priority map. This APP TLV can be sent by the switch or can be set locally using software such as lldptool. In mlx5 drivers, we add the support for net dcb's getapp and setapp call back. Mlx5 driver only handles the selector id 5 application entry (dscp application priority application entry). If user sends multiple dscp to priority APP TLV entries on the same dscp, the last sent one will take effect. All the previous sent will be deleted. The firmware trust state (in QPTS register) is changed based on the number of dscp to priority application entries. When the first dscp to priority application entry is added by the user, the trust state is changed to dscp. When the last dscp to priority application entry is deleted by the user, the trust state is changed to pcp. When the port is in DSCP trust state, the transmit queue is selected based on the dscp of the skb. When the port is in DSCP trust state and vport inline mode is not NONE, firmware requires mlx5 driver to copy the IP header to the wqe ethernet segment inline header if the skb has it. This is done by changing the transmit queue sq's min inline mode to L3. Note that the min inline mode of sqs that belong to other features such as xdpsq, icosq are not modified. =================================================== Plus to the dscp series, some small misc changes are include as well: From Inbar, Ethtool msglvl support and some debug prints in DCBNL logic From Or Gerlitz, Enlarge the NIC TC offload table size From Rabie, Initialize destination_flow struct to 0 From Feras, Add inner TTC table to IPoIB flow steering From Tal, Enable CQE based moderation on TX CQ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Simon Horman says: ==================== nfp: ethtool and related improvements Dirk van der Merwe says: This patch series throws a couple of loosely related items into a single series. Patch 1: Clang compilation fix reported by Matthias Kaehlcke <mka@chromium.org> Patch 2: Driver can now do MAC reinit on load when there has been a media override set in the NSP. Patch 3: Refactor the nfp_app_reprs_set API. Patch 4: Similar to vNICs, representors must be able to deal with media override changes in the NSP. Patch 5: Since representors can now handle media overrides, we can allocate the get/set link ndo's to them. Patch 6 & 7: Add support for FEC mode modification. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dirk van der Merwe authored
Add support in the driver ethtool ops to modify the NFP FEC modes. The FEC modes can be set for vNIC associated with physical ports or for MAC representor netdevs. Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dirk van der Merwe authored
Implement helpers to determine and modify FEC modes via the NSP. The NSP advertises FEC capabilities on a per port basis and provides support for: * Auto mode selection * Reed Solomon * BaseR * None/Off Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dirk van der Merwe authored
Since it is now safe to modify link settings for representors, we can attach the get/set link settings ndos to it. The get/set link settings are nfp_port based operations. If a port becomes invalid, the representor will be removed in the same way a vnic would be. Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dirk van der Merwe authored
If the NSP port table has been refreshed, resync the representor state with the new port information. At the moment, this only entails looking for invalid ports and killing off representors associated with them. The repr instance becomes NULL which is safe since the app accessor function for reprs returns NULL when it cannot access a repr. Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dirk van der Merwe authored
The criteria that reprs cannot be replaced with another new set of reprs has been removed. This check is not needed since the only use case that could exercise this at the moment, would be to modify the number of SRIOV VFs without first disabling them. This case is explicitly disallowed in any case and subsequent patches in this series need to be able to replace the running set of reprs. All cases where the return code used to be checked for the nfp_app_reprs_set function have been removed. As stated above, it is not possible for the current code to encounter a case where reprs exist and need to be replaced. Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Recent management FW images can perform full reinit of MAC cores without requiring a reboot. When loading the driver check if there are changes pending and if so call NSP MAC reinit. Full application FW reload is still required, and all MACs need to be reinited at the same time (not only the ones which have been reconfigured, and thus potentially causing disruption to unrelated netdevs) therefore for now changing MAC config without reloading the driver still remains future work. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Tested-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Matthias reports: nfp_eth_set_bit_config() is marked as __always_inline to allow gcc to identify the 'mask' parameter as known to be constant at compile time, which is required to use the FIELD_GET() macro. The forced inlining does the trick for gcc, but for kernel builds with clang it results in undefined symbols: drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.o: In function `__nfp_eth_set_aneg': drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c:(.text+0x787): undefined reference to `__compiletime_assert_492' drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c:(.text+0x7b1): undefined reference to `__compiletime_assert_496' These __compiletime_assert_xyx() calls would have been optimized away if the compiler had seen 'mask' as a constant. Add a macro to extract the mask and shift and pass those to nfp_eth_set_bit_config() separately. Reported-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Tested-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Priyaranjan Jha authored
Currently TCP RACK loss detection does not work well if packets are being reordered beyond its static reordering window (min_rtt/4).Under such reordering it may falsely trigger loss recoveries and reduce TCP throughput significantly. This patch improves that by increasing and reducing the reordering window based on DSACK, which is now supported in major TCP implementations. It makes RACK's reo_wnd adaptive based on DSACK and no. of recoveries. - If DSACK is received, increment reo_wnd by min_rtt/4 (upper bounded by srtt), since there is possibility that spurious retransmission was due to reordering delay longer than reo_wnd. - Persist the current reo_wnd value for TCP_RACK_RECOVERY_THRESH (16) no. of successful recoveries (accounts for full DSACK-based loss recovery undo). After that, reset it to default (min_rtt/4). - At max, reo_wnd is incremented only once per rtt. So that the new DSACK on which we are reacting, is due to the spurious retx (approx) after the reo_wnd has been updated last time. - reo_wnd is tracked in terms of steps (of min_rtt/4), rather than absolute value to account for change in rtt. In our internal testing, we observed significant increase in throughput, in scenarios where reordering exceeds min_rtt/4 (previous static value). Signed-off-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Vivien Didelot says: ==================== net: dsa: parsing stage When registering a DSA switch, there is basically two stages. The first stage is the parsing of the switch device, from either device tree or platform data. It fetches the DSA tree to which it belongs, and validates its ports. The switch device is then added to the tree, and the second stage is called if this was the last switch of the tree. The second stage is the setup of the tree, which validates that the tree is complete, sets up the routing tables, the default CPU port for user ports, sets up the switch drivers and finally the master interfaces, which makes the whole switch fabric functional. This patch series covers the first parsing stage. It fixes the type of the switch and tree indexes to unsigned int, simplifies the tree reference counting and the switch and CPU ports parsing. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vivien Didelot authored
Extend the dsa_port_parse_cpu() function to resolve the tagging protocol at port parsing time, instead of waiting for the whole tree to be complete. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-