- 27 Feb, 2014 7 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wirelessDavid S. Miller authored
John W. Linville says: ==================== Regarding the mac80211 bits, Johannes says: "This time, I have a fix from Arik for scheduled scan recovery (something that only recently went into the tree), a memory leak fix from Eytan and a small regulatory bugfix from Inbal. The EAPOL change from Felix makes rekeying more stable while lots of traffic is flowing, and there's Emmanuel's and my fixes for a race in the code handling powersaving clients." Regarding the NFC bits, Samuel says: "We only have one candidate for 3.14 fixes, and this is a NCI NULL pointer dereference introduced during the 3.14 merge window." Regarding the iwlwifi bits, Emmanuel says: "This should fix an issue raised in iwldvm when we have lots of association failures. There is a bugzilla for this bug - it hasn't been validated by the user, but I hope it will do the trick." Beyond that... Amitkumar Karwar brings two mwifiex fixes, one to avoid a NULL pointer dereference and another to address an improperly timed interrupt. Arend van Spriel gives us a brcmfmac fix to avoid a crash during scatter-gather packet transfers. Avinash Patila offers an mwifiex to avoid an invalid memory access when a device is removed. Bing Zhao delivers a simple fix to avoid a naming conflict between libertas and mwifiex. Felix Fietkau provides a trio of ath9k fixes that properly account for sequence numbering in ps-poll frames, reduce the rate for false positives during baseband hang detection, and fix a regression related to rx descriptor handling. James Cameron shows us a libertas fix to ignore zero-length IEs when processing scan results. Kirill Tkhai brings a hostap fix to avoid prematurely freeing a timer. Stanislaw Gruszka fixes an ath9k locking problem. Sujith Manoharan addresses ETSI compliance for a device handled by ath9k by adjusting the minimum CCA power threshold values. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yuval Mintz authored
Commit c14db202 "bnx2x: Correct default Tx switching behaviour" supposedly changed the default Tx switching behaviour, but was missing the fastpath change required for FW to pass packets from PFs to VFs. Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsecDavid S. Miller authored
Steffen Klassert says: ==================== 1) Build fix for ip_vti when NET_IP_TUNNEL is not set. We need this set to have ip_tunnel_get_stats64() available. 2) Fix a NULL pointer dereference on sub policy usage. We try to access a xfrm_state from the wrong array. 3) Take xfrm_state_lock in xfrm_migrate_state_find(), we need it to traverse through the state lists. 4) Clone states properly on migration, otherwise we crash when we migrate a state with aead algorithm attached. 5) Fix unlink race when between thread context and timer when policies are deleted. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Lorenzo Colitti authored
Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
John W. Linville authored
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
-
Hiroaki SHIMODA authored
The allocated child qdisc is not freed in error conditions. Defer the allocation after user configuration turns out to be valid and acceptable. Fixes: cc106e44 ("net: sched: tbf: fix the calculation of max_size") Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> Cc: Yang Yingliang <yangyingliang@huawei.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Bohac authored
Enslaving a bond to itself leads to an endless loop and hangs the kernel. Signed-off-by: Jiri Bohac <jbohac@suse.cz> Tested-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 26 Feb, 2014 9 commits
-
-
Nikolay Aleksandrov authored
There's a bug in the slave release function which leads the transmit functions which use the bond->slave_cnt to a div by 0 because we might just have released our last slave and made slave_cnt == 0 but at the same time we may have a transmitter after the check for an empty list which will fetch it and use it in the slave id calculation. Fix it by moving the slave_cnt after synchronize_rcu so if this was our last slave any new transmitters will see an empty slave list which is checked after rcu lock but before calling the mode transmit functions which rely on bond->slave_cnt. Fixes: 278b2083 ("bonding: initial RCU conversion") CC: Veaceslav Falico <vfalico@redhat.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: Jay Vosburgh <fubar@us.ibm.com> CC: David S. Miller <davem@davemloft.net> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Freddy Xin authored
Add VID:DID for Lenovo OneLinkDock Gigabit LAN Signed-off-by: Freddy Xin <freddy@asix.com.tw> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Ding Tianhong says: ==================== Fix RTNL: assertion failed at net/core/rtnetlink.c The commit 1d3ee88a (bonding: add netlink attributes to slave link dev) make the bond_set_active_slave() and bond_set_backup_slave() use rtmsg_ifinfo to send slave's states and this functions should be called in RTNL. But the 902.3ad and ARP monitor did not hold the RTNL when calling thses two functions, so fix them. v1->v2: Add new micro to indicate that the notification should be send later, not never. And add a new patch to fix the same problem for ARP mode. v2->v3: modify the bond_should_notify to should_notify_rtnl, it is more reasonable, and use bool for should_notify_rtnl. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
dingtianhong authored
Veaceslav has reported and fix this problem by commit f2ebd477 (bonding: restructure locking of bond_ab_arp_probe()). According Jay's opinion, the current solution is not very well, because the notification is to indicate that the interface has actually changed state in a meaningful way, but these calls in the ab ARP monitor are internal settings of the flags to allow the ARP monitor to search for a slave to become active when there are no active slaves. The flag setting to active or backup is to permit the ARP monitor's response logic to do the right thing when deciding if the test slave (current_arp_slave) is up or not. So the best way to fix the problem is that we should not send a notification when the slave is in testing state, and check the state at the end of the monitor, if the slave's state recover, avoid to send pointless notification twice. And RTNL is really a big lock, hold it regardless the slave's state changed or not when the current_active_slave is null will loss performance (every 100ms), so we should hold it only when the slave's state changed and need to notify. I revert the old commit and add new modifications. Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Veaceslav Falico <vfalico@redhat.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
dingtianhong authored
The problem was introduced by the commit 1d3ee88a (bonding: add netlink attributes to slave link dev). The bond_set_active_slave() and bond_set_backup_slave() will use rtmsg_ifinfo to send slave's states, so these two functions should be called in RTNL. In 802.3ad mode, acquiring RTNL for the __enable_port and __disable_port cases is difficult, as those calls generally already hold the state machine lock, and cannot unconditionally call rtnl_lock because either they already hold RTNL (for calls via bond_3ad_unbind_slave) or due to the potential for deadlock with bond_3ad_adapter_speed_changed, bond_3ad_adapter_duplex_changed, bond_3ad_link_change, or bond_3ad_update_lacp_rate. All four of those are called with RTNL held, and acquire the state machine lock second. The calling contexts for __enable_port and __disable_port already hold the state machine lock, and may or may not need RTNL. According to the Jay's opinion, I don't think it is a problem that the slave don't send notify message synchronously when the status changed, normally the state machine is running every 100 ms, send the notify message at the end of the state machine if the slave's state changed should be better. I fix the problem through these steps: 1). add a new function bond_set_slave_state() which could change the slave's state and call rtmsg_ifinfo() according to the input parameters called notify. 2). Add a new slave parameter which called should_notify, if the slave's state changed and don't notify yet, the parameter will be set to 1, and then if the slave's state changed again, the param will be set to 0, it indicate that the slave's state has been restored, no need to notify any one. 3). the __enable_port and __disable_port should not call rtmsg_ifinfo in the state machine lock, any change in the state of slave could set a flag in the slave, it will indicated that an rtmsg_ifinfo should be called at the end of the state machine. Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Veaceslav Falico <vfalico@redhat.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joe Perches authored
Add a new F: line for the intel subdirectories. This allows get_maintainers to avoid using git log and cc'ing people that have submitted clean-up style patches for all first level directories under drivers/net/ethernet/intel/ This does not make e100.c maintained. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Edward Cree authored
If we receive a PTP event from the NIC when we haven't set up PTP state in the driver, we attempt to read through a NULL pointer efx->ptp_data, triggering a panic. Signed-off-by: Edward Cree <ecree@solarflare.com> Acked-by: Shradha Shah <sshah@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
While LINUX_MIB_TCPSPURIOUS_RTX_HOSTQUEUES can only be incremented in tcp_transmit_skb() from softirq (incoming message or timer activation), it is better to use NET_INC_STATS() instead of NET_INC_STATS_BH() as tcp_transmit_skb() can be called from process context. This will avoid copy/paste confusion when/if we want to add other SNMP counters in tcp_transmit_skb() Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Steffen Klassert authored
When a policy is unlinked from the lists in thread context, the xfrm timer can fire before we can mark this policy as dead. So reinitialize the bydst hlist, then hlist_unhashed() will notice that this policy is not linked and will avoid a doulble unlink of that policy. Reported-by: Xianpeng Zhao <673321875@qq.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
-
- 25 Feb, 2014 15 commits
-
-
Cristian Bercaru authored
Masking the link partner's capabilities with local capabilities can be misleading in autonegotiation scenarios such as PAUSE frame autonegotiation. This patch calculates the join between the local capabilities and the link parner capabilities, when it determines the speed and duplex settings, but does not mask any of the link partner capabilities when it calculates PAUSE frame settings. Signed-off-by: Cristian Bercaru <cristian.bercaru@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Mike Pecovnik authored
netlink_sendmsg() was changed to prevent non-root processes from sending messages with dst_pid != 0. netlink_connect() however still only checks if nladdr->nl_groups is set. This patch modifies netlink_connect() to check for the same condition. Signed-off-by: Mike Pecovnik <mike.pecovnik@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thadeu Lima de Souza Cascardo authored
Without a shutdown handler, T4 cards behave very badly after a kexec. Some firmware calls return errors indicating allocation failures, for example. This is probably because thouse resources were not released by a BYE message to the firmware, for example. Using the remove handler guarantees we will use a well tested path. With this patch I applied, I managed to use kexec multiple times and probe and iSCSI login worked every time. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Shahed Shaikh says: ==================== qlcnic: Bug fixes This patch series includes following bug fixes, * Fix for return value handling of function qlcnic_enable_msi_legacy(). * Fix for the usage of module parameters for interrupt mode. Driver should use flags while checking for driver's interrupt mode instead of module parameters. * Revert commit 1414abea (qlcnic: Restrict VF from configuring any VLAN mode), in order to save some multicast filters. * Fix a bug where driver was not re-setting sds ring count to 1 when it falls back from MSI-x mode to legacy interrupt mode. Please apply to net. Change in v2 - Dropped patch "qlcnic: reset firmware API lock during driver load" for further rework. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Rajesh Borundia authored
o Driver was not re-setting sds ring count to 1 after failing to allocate msi-x interrupts. Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sucheta Chakraborty authored
o This patch reverts commit 1414abea (qlcnic: Restrict VF from configuring any VLAN mode.) This will allow same multicast address to be used with any VLAN instead of programming seperate (MAC, VLAN) tuples in adapter. This will help save some multicast filters. Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Shahed Shaikh authored
Once interrupts are enabled, instead of using module parameters, use flags (QLCNIC_MSI_ENABLED and QLCNIC_MSIX_ENABLED) set by driver to check interrupt mode. Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Shahed Shaikh authored
Driver was treating -ve return value as success in case of qlcnic_enable_msi_legacy() failure Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ursula Braun authored
To guarantee that a qdio ccw_device no longer touches the qdio memory shared with Linux, the qdio ccw_device should be offline when freeing the qdio memory. Thus this patch postpones freeing of qdio memory. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Reviewed-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hannes Frederic Sowa authored
Currently the UFO fragmentation process does not correctly handle inner UDP frames. (The following tcpdumps are captured on the parent interface with ufo disabled while tunnel has ufo enabled, 2000 bytes payload, mtu 1280, both sit device): IPv6: 16:39:10.031613 IP (tos 0x0, ttl 64, id 3208, offset 0, flags [DF], proto IPv6 (41), length 1300) 192.168.122.151 > 1.1.1.1: IP6 (hlim 64, next-header Fragment (44) payload length: 1240) 2001::1 > 2001::8: frag (0x00000001:0|1232) 44883 > distinct: UDP, length 2000 16:39:10.031709 IP (tos 0x0, ttl 64, id 3209, offset 0, flags [DF], proto IPv6 (41), length 844) 192.168.122.151 > 1.1.1.1: IP6 (hlim 64, next-header Fragment (44) payload length: 784) 2001::1 > 2001::8: frag (0x00000001:0|776) 58979 > 46366: UDP, length 5471 We can see that fragmentation header offset is not correctly updated. (fragmentation id handling is corrected by 916e4cf4 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data")). IPv4: 16:39:57.737761 IP (tos 0x0, ttl 64, id 3209, offset 0, flags [DF], proto IPIP (4), length 1296) 192.168.122.151 > 1.1.1.1: IP (tos 0x0, ttl 64, id 57034, offset 0, flags [none], proto UDP (17), length 1276) 192.168.99.1.35961 > 192.168.99.2.distinct: UDP, length 2000 16:39:57.738028 IP (tos 0x0, ttl 64, id 3210, offset 0, flags [DF], proto IPIP (4), length 792) 192.168.122.151 > 1.1.1.1: IP (tos 0x0, ttl 64, id 57035, offset 0, flags [none], proto UDP (17), length 772) 192.168.99.1.13531 > 192.168.99.2.20653: UDP, length 51109 In this case fragmentation id is incremented and offset is not updated. First, I aligned inet_gso_segment and ipv6_gso_segment: * align naming of flags * ipv6_gso_segment: setting skb->encapsulation is unnecessary, as we always ensure that the state of this flag is left untouched when returning from upper gso segmenation function * ipv6_gso_segment: move skb_reset_inner_headers below updating the fragmentation header data, we don't care for updating fragmentation header data * remove currently unneeded comment indicating skb->encapsulation might get changed by upper gso_segment callback (gre and udp-tunnel reset encapsulation after segmentation on each fragment) If we encounter an IPIP or SIT gso skb we now check for the protocol == IPPROTO_UDP and that we at least have already traversed another ip(6) protocol header. The reason why we have to special case GSO_IPIP and GSO_SIT is that we reset skb->encapsulation to 0 while skb_mac_gso_segment the inner protocol of GSO_UDP_TUNNEL or GSO_GRE packets. Reported-by: Wolfgang Walter <linux@stwm.de> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Tom Herbert <therbert@google.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amir Vadai authored
Bump all Mellanox driver versions. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Felix Fietkau authored
Only set sc->rx.discard_next to rx_stats->rs_more when actually discarding the current descriptor. Also, fix a detection of broken descriptors: First the code checks if the current descriptor is not done. Then it checks if the next descriptor is done. Add a check that afterwards checks the first descriptor again, because it might have been completed in the mean time. This fixes a regression introduced in commit 723e7113 "ath9k: fix handling of broken descriptors" Cc: stable@vger.kernel.org Reported-by: Marco André Dinis <marcoandredinis@gmail.com> Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
-
Felix Fietkau authored
Check if the baseband state remains stable, and add a small delay between register reads. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
-
Kyle McMartin authored
Boris reports he's seeing: > [ 9.195943] INFO: trying to register non-static key. > [ 9.196031] the code is fine but needs lockdep annotation. > [ 9.196031] turning off the locking correctness validator. > [ 9.196031] CPU: 1 PID: 933 Comm: modprobe Not tainted 3.14.0-rc4+ #1 with the r8169 driver. These are occuring because the seqcount embedded in u64_stats_sync on 32-bit SMP is uninitialized which is making lockdep unhappy. Signed-off-by: Kyle McMartin <kyle@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
tcp_is_cwnd_limited() allows GSO/TSO enabled flows to increase their cwnd to allow a full size (64KB) TSO packet to be sent. Non GSO flows only allow an extra room of 3 MSS. For most flows with a BDP below 10 MSS, this results in a bloat of cwnd reaching 90, and an inflate of RTT. Thanks to TSO auto sizing, we can restrict the bloat to the number of MSS contained in a TSO packet (tp->xmit_size_goal_segs), to keep original intent without performance impact. Because we keep cwnd small, it helps to keep TSO packet size to their optimal value. Example for a 10Mbit flow, with low TCP Small queue limits (no more than 2 skb in qdisc/device tx ring) Before patch : lpk51:~# ./ss -i dst lpk52:44862 | grep cwnd cubic wscale:6,6 rto:215 rtt:15.875/2.5 mss:1448 cwnd:96 ssthresh:96 send 70.1Mbps unacked:14 rcv_space:29200 After patch : lpk51:~# ./ss -i dst lpk52:52916 | grep cwnd cubic wscale:6,6 rto:206 rtt:5.206/0.036 mss:1448 cwnd:15 ssthresh:14 send 33.4Mbps unacked:4 rcv_space:29200 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Cc: Van Jacobson <vanj@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 24 Feb, 2014 6 commits
-
-
Tobias Klauser authored
alloc_dma_desc_resources() returns an error value and the next line actually checks for it, so assign the return value properly. Found by the coverity scanner. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jason Wang authored
We should alloc big buffers also when guest can receive UFO packets to let the big packets fit into guest rx buffer. Fixes 5c516751 (virtio-net: Allow UFO feature to be set and advertised.) Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Felix Fietkau authored
When passing tx frames to the U-APSD queue for powersave poll responses, the ath_atx_tid pointer needs to be passed to ath_tx_setup_buffer for proper sequence number accounting. This fixes high latency and connection stability issues with ath9k running as AP and a few kinds of mobile phones as client, when PS-Poll is heavily used Cc: stable@vger.kernel.org Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
-
Bing Zhao authored
Both libertas USB driver and mwifiex_usb driver are registerring with name 'usb8xxx'. The following conflict happens while trying to load both drivers. [6.211307] Error: Driver 'usb8xxx' is already registered... [6.217261] mwifiex_usb: Driver register failed! Fix it by renaming mwifiex_usb driver's name. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Bing Zhao <bzhao@marvell.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-fixesJohn W. Linville authored
Samuel Ortiz <sameo@linux.intel.com> says: "NFC: 3.14: First pull request We only have one candidate for 3.14 fixes, and this is a NCI NULL pointer dereference introduced during the 3.14 merge window." Signed-off-by: John W. Linville <linville@tuxdriver.com>
-
- 23 Feb, 2014 1 commit
-
-
Amitkumar Karwar authored
The check should be for setup function pointer. This patch fixes NULL pointer dereference issue for NCI based NFC driver which doesn't define setup handler. Signed-off-by: Amitkumar Karwar <akarwar@marvell.com> Signed-off-by: Bing Zhao <bzhao@marvell.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
-
- 22 Feb, 2014 2 commits
-
-
Hannes Frederic Sowa authored
Currently we generate a new fragmentation id on UFO segmentation. It is pretty hairy to identify the correct net namespace and dst there. Especially tunnels use IFF_XMIT_DST_RELEASE and thus have no skb_dst available at all. This causes unreliable or very predictable ipv6 fragmentation id generation while segmentation. Luckily we already have pregenerated the ip6_frag_id in ip6_ufo_append_data and can use it here. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
Problem statement: 1) both paths (primary path1 and alternate path2) are up after the association has been established i.e., HB packets are normally exchanged, 2) path2 gets inactive after path_max_retrans * max_rto timed out (i.e. path2 is down completely), 3) now, if a transmission times out on the only surviving/active path1 (any ~1sec network service impact could cause this like a channel bonding failover), then the retransmitted packets are sent over the inactive path2; this happens with partial failover and without it. Besides not being optimal in the above scenario, a small failure or timeout in the only existing path has the potential to cause long delays in the retransmission (depending on RTO_MAX) until the still active path is reselected. Further, when the T3-timeout occurs, we have active_patch == retrans_path, and even though the timeout occurred on the initial transmission of data, not a retransmit, we end up updating retransmit path. RFC4960, section 6.4. "Multi-Homed SCTP Endpoints" states under 6.4.1. "Failover from an Inactive Destination Address" the following: Some of the transport addresses of a multi-homed SCTP endpoint may become inactive due to either the occurrence of certain error conditions (see Section 8.2) or adjustments from the SCTP user. When there is outbound data to send and the primary path becomes inactive (e.g., due to failures), or where the SCTP user explicitly requests to send data to an inactive destination transport address, before reporting an error to its ULP, the SCTP endpoint should try to send the data to an alternate __active__ destination transport address if one exists. When retransmitting data that timed out, if the endpoint is multihomed, it should consider each source-destination address pair in its retransmission selection policy. When retransmitting timed-out data, the endpoint should attempt to pick the most divergent source-destination pair from the original source-destination pair to which the packet was transmitted. Note: Rules for picking the most divergent source-destination pair are an implementation decision and are not specified within this document. So, we should first reconsider to take the current active retransmission transport if we cannot find an alternative active one. If all of that fails, we can still round robin through unkown, partial failover, and inactive ones in the hope to find something still suitable. Commit 4141ddc0 ("sctp: retran_path update bug fix") broke that behaviour by selecting the next inactive transport when no other active transport was found besides the current assoc's peer.retran_path. Before commit 4141ddc0, we would have traversed through the list until we reach our peer.retran_path again, and in case that is still in state SCTP_ACTIVE, we would take it and return. Only if that is not the case either, we take the next inactive transport. Besides all that, another issue is that transports in state SCTP_UNKNOWN could be preferred over transports in state SCTP_ACTIVE in case a SCTP_ACTIVE transport appears after SCTP_UNKNOWN in the transport list yielding a weaker transport state to be used in retransmission. This patch mostly reverts 4141ddc0, but also rewrites this function to introduce more clarity and strictness into the code. A strict priority of transport states is enforced in this patch, hence selection is active > unkown > partial failover > inactive. Fixes: 4141ddc0 ("sctp: retran_path update bug fix") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Acked-by: Vlad Yasevich <yasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-