Commit 8ceaec02 authored by Daniel Borkmann, committed by Willy Tarreau

net: sctp: rework multihoming retransmission path selection to rfc4960

commit 4c47af4d upstream.

Problem statement: 1) both paths (primary path1 and alternate
path2) are up after the association has been established, i.e.,
HB packets are normally exchanged; 2) path2 becomes inactive after
path_max_retrans * max_rto has timed out (i.e., path2 is completely
down); 3) now, if a transmission times out on the only
surviving/active path1 (any ~1 sec network service impact, such as
a channel bonding failover, could cause this), the retransmitted
packets are sent over the inactive path2. This happens both with
and without partial failover.

Besides not being optimal in the above scenario, a small failure
or timeout on the only existing path has the potential to cause
long delays in the retransmission (depending on RTO_MAX) until
the still active path is reselected. Further, when the T3 timeout
occurs, we have active_path == retran_path, and even though the
timeout occurred on the initial transmission of data, not on a
retransmit, we end up updating the retransmit path.

RFC 4960, section 6.4 "Multi-Homed SCTP Endpoints", states the
following under 6.4.1 "Failover from an Inactive Destination
Address":

  Some of the transport addresses of a multi-homed SCTP endpoint
  may become inactive due to either the occurrence of certain
  error conditions (see Section 8.2) or adjustments from the
  SCTP user.

  When there is outbound data to send and the primary path
  becomes inactive (e.g., due to failures), or where the SCTP
  user explicitly requests to send data to an inactive
  destination transport address, before reporting an error to
  its ULP, the SCTP endpoint should try to send the data to an
  alternate __active__ destination transport address if one
  exists.

  When retransmitting data that timed out, if the endpoint is
  multihomed, it should consider each source-destination address
  pair in its retransmission selection policy. When retransmitting
  timed-out data, the endpoint should attempt to pick the most
  divergent source-destination pair from the original
  source-destination pair to which the packet was transmitted.

  Note: Rules for picking the most divergent source-destination
  pair are an implementation decision and are not specified
  within this document.

So, if we cannot find an alternative active transport, we should
first reconsider taking the current retransmission transport if
it is still active. If all of that fails, we can still round-robin
through unknown, partial failover, and inactive ones in the
hope of finding something still suitable.

Commit 4141ddc0 ("sctp: retran_path update bug fix") broke
that behaviour by selecting the next inactive transport when
no other active transport was found besides the current assoc's
peer.retran_path. Before commit 4141ddc0, we would traverse
the list until we reached our peer.retran_path again and, if
it was still in state SCTP_ACTIVE, take it and return. Only if
that was not the case either would we take the next inactive
transport.

Besides all that, another issue is that transports in state
SCTP_UNKNOWN could be preferred over transports in state
SCTP_ACTIVE whenever an SCTP_ACTIVE transport appears after an
SCTP_UNKNOWN one in the transport list, so that a weaker
transport state is used for retransmission.
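The ordering problem can be seen in isolation with a sketch of just the old acceptance test. It assumes the same array model of the transport list; `old_first_match()` and the `ST_*` states are hypothetical, and the fallback to non-active transports is deliberately omitted because it does not affect this case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in transport states; numeric values are arbitrary. */
enum st { ST_INACTIVE, ST_PF, ST_ACTIVE, ST_UNCONFIRMED, ST_UNKNOWN };

/* The pre-patch acceptance test: the first successor of retran that is
 * ACTIVE or UNKNOWN is taken, with no preference between the two states.
 * Returns n if no successor qualifies. */
static size_t old_first_match(const enum st *s, size_t n, size_t retran)
{
	assert(retran < n);
	for (size_t i = 1; i < n; i++) {
		size_t t = (retran + i) % n;

		if (s[t] == ST_ACTIVE || s[t] == ST_UNKNOWN)
			return t;
	}
	return n;
}
```

For the lists { ACTIVE, UNKNOWN, ACTIVE } and { ACTIVE, ACTIVE, UNKNOWN }, starting from index 0, both calls return index 1, but the first one selects an UNKNOWN transport even though an ACTIVE one exists further on: the outcome depends only on list order.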

This patch mostly reverts 4141ddc0, but also rewrites
this function to introduce more clarity and strictness into
the code. A strict priority of transport states is enforced
in this patch, hence the selection order is active > unknown >
partial failover > inactive.
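The strict priority can be sketched as a scoring function plus an election step modelled on `sctp_trans_score()` and `sctp_trans_elect_best()` from the patch. This sketch assumes hypothetical `ST_*` states and integer indices instead of transport pointers, and uses a `switch` where the patch uses a lookup table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in transport states; numeric values are arbitrary. */
enum st { ST_INACTIVE, ST_PF, ST_ACTIVE, ST_UNCONFIRMED, ST_UNKNOWN };

/* Strict priority: active > unknown > partial failover > inactive. */
static unsigned trans_score(enum st s)
{
	switch (s) {
	case ST_ACTIVE:
		return 3;	/* best case */
	case ST_UNKNOWN:
		return 2;
	case ST_PF:
		return 1;
	default:
		/* ST_INACTIVE (worst case); ST_UNCONFIRMED also lands here,
		 * but the caller skips unconfirmed transports anyway. */
		return 0;
	}
}

/* Modelled on sctp_trans_elect_best(): curr replaces best only if it
 * scores strictly higher, so on a tie the earlier candidate is kept.
 * best < 0 means "no candidate yet". */
static int elect_best(const enum st *s, int curr, int best)
{
	assert(curr >= 0);
	if (best < 0)
		return curr;
	return trans_score(s[curr]) > trans_score(s[best]) ? curr : best;
}
```

Because the comparison is a strict `>`, a candidate with an equal score never displaces the current best. Scanning from the successor of retran_path therefore keeps the round-robin property among transports in the same state.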

Fixes: 4141ddc0 ("sctp: retran_path update bug fix")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Acked-by: Vlad Yasevich <yasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[wt: picked updated function from 3.12 except the debug statement]
Signed-off-by: Willy Tarreau <w@1wt.eu>
parent ad1336d2
```diff
@@ -1301,82 +1301,111 @@ void sctp_assoc_update(struct sctp_association *asoc,
 }
 
 /* Update the retran path for sending a retransmitted packet.
- * Round-robin through the active transports, else round-robin
- * through the inactive transports as this is the next best thing
- * we can try.
+ * See also RFC4960, 6.4. Multi-Homed SCTP Endpoints:
+ *
+ *   When there is outbound data to send and the primary path
+ *   becomes inactive (e.g., due to failures), or where the
+ *   SCTP user explicitly requests to send data to an
+ *   inactive destination transport address, before reporting
+ *   an error to its ULP, the SCTP endpoint should try to send
+ *   the data to an alternate active destination transport
+ *   address if one exists.
+ *
+ *   When retransmitting data that timed out, if the endpoint
+ *   is multihomed, it should consider each source-destination
+ *   address pair in its retransmission selection policy.
+ *   When retransmitting timed-out data, the endpoint should
+ *   attempt to pick the most divergent source-destination
+ *   pair from the original source-destination pair to which
+ *   the packet was transmitted.
+ *
+ *   Note: Rules for picking the most divergent source-destination
+ *   pair are an implementation decision and are not specified
+ *   within this document.
+ *
+ * Our basic strategy is to round-robin transports in priorities
+ * according to sctp_state_prio_map[] e.g., if no such
+ * transport with state SCTP_ACTIVE exists, round-robin through
+ * SCTP_UNKNOWN, etc. You get the picture.
  */
-void sctp_assoc_update_retran_path(struct sctp_association *asoc)
+static const u8 sctp_trans_state_to_prio_map[] = {
+	[SCTP_ACTIVE]	= 3,	/* best case */
+	[SCTP_UNKNOWN]	= 2,
+	[SCTP_PF]	= 1,
+	[SCTP_INACTIVE] = 0,	/* worst case */
+};
+
+static u8 sctp_trans_score(const struct sctp_transport *trans)
 {
-	struct sctp_transport *t, *next;
-	struct list_head *head = &asoc->peer.transport_addr_list;
-	struct list_head *pos;
+	return sctp_trans_state_to_prio_map[trans->state];
+}
 
-	if (asoc->peer.transport_count == 1)
-		return;
+static struct sctp_transport *sctp_trans_elect_best(struct sctp_transport *curr,
+						    struct sctp_transport *best)
+{
+	if (best == NULL)
+		return curr;
 
-	/* Find the next transport in a round-robin fashion. */
-	t = asoc->peer.retran_path;
-	pos = &t->transports;
-	next = NULL;
+	return sctp_trans_score(curr) > sctp_trans_score(best) ? curr : best;
+}
 
-	while (1) {
-		/* Skip the head. */
-		if (pos->next == head)
-			pos = head->next;
-		else
-			pos = pos->next;
+void sctp_assoc_update_retran_path(struct sctp_association *asoc)
+{
+	struct sctp_transport *trans = asoc->peer.retran_path;
+	struct sctp_transport *trans_next = NULL;
 
-		t = list_entry(pos, struct sctp_transport, transports);
+	/* We're done as we only have the one and only path. */
+	if (asoc->peer.transport_count == 1)
+		return;
+	/* If active_path and retran_path are the same and active,
+	 * then this is the only active path. Use it.
+	 */
+	if (asoc->peer.active_path == asoc->peer.retran_path &&
+	    asoc->peer.active_path->state == SCTP_ACTIVE)
+		return;
 
-		/* We have exhausted the list, but didn't find any
-		 * other active transports.  If so, use the next
-		 * transport.
-		 */
-		if (t == asoc->peer.retran_path) {
-			t = next;
-			break;
-		}
+	/* Iterate from retran_path's successor back to retran_path. */
+	for (trans = list_next_entry(trans, transports); 1;
+	     trans = list_next_entry(trans, transports)) {
+		/* Manually skip the head element. */
+		if (&trans->transports == &asoc->peer.transport_addr_list)
+			continue;
+		if (trans->state == SCTP_UNCONFIRMED)
+			continue;
+		trans_next = sctp_trans_elect_best(trans, trans_next);
+		/* Active is good enough for immediate return. */
+		if (trans_next->state == SCTP_ACTIVE)
+			break;
+		/* We've reached the end, time to update path. */
+		if (trans == asoc->peer.retran_path)
+			break;
+	}
 
-		/* Try to find an active transport. */
-
-		if ((t->state == SCTP_ACTIVE) ||
-		    (t->state == SCTP_UNKNOWN)) {
-			break;
-		} else {
-			/* Keep track of the next transport in case
-			 * we don't find any active transport.
-			 */
-			if (t->state != SCTP_UNCONFIRMED && !next)
-				next = t;
-		}
-	}
-
-	if (t)
-		asoc->peer.retran_path = t;
-	else
-		t = asoc->peer.retran_path;
+	if (trans_next != NULL)
+		asoc->peer.retran_path = trans_next;
 
 	SCTP_DEBUG_PRINTK_IPADDR("sctp_assoc_update_retran_path:association"
-				 " %p addr: ",
+				 " %p updated new path to addr: ",
 				 " port: %d\n",
 				 asoc,
-				 (&t->ipaddr),
-				 ntohs(t->ipaddr.v4.sin_port));
+				 (&asoc->peer.retran_path->ipaddr),
+				 ntohs(asoc->peer.retran_path->ipaddr.v4.sin_port));
 }
 
-/* Choose the transport for sending retransmit packet.  */
-struct sctp_transport *sctp_assoc_choose_alter_transport(
-	struct sctp_association *asoc, struct sctp_transport *last_sent_to)
+struct sctp_transport *
+sctp_assoc_choose_alter_transport(struct sctp_association *asoc,
+				  struct sctp_transport *last_sent_to)
 {
 	/* If this is the first time packet is sent, use the active path,
 	 * else use the retran path. If the last packet was sent over the
 	 * retran path, update the retran path and use it.
 	 */
-	if (!last_sent_to)
+	if (last_sent_to == NULL) {
 		return asoc->peer.active_path;
-	else {
+	} else {
 		if (last_sent_to == asoc->peer.retran_path)
 			sctp_assoc_update_retran_path(asoc);
+
 		return asoc->peer.retran_path;
 	}
 }
```
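The patched selection and its caller can be modelled end to end in user space. This is a sketch under assumptions: `struct assoc`, `update_retran_path()`, `choose_alter_transport()` and the `ST_*` states are hypothetical stand-ins, array indices replace `struct sctp_transport` pointers, and the value `n` plays the role of `NULL` for `last_sent_to`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in transport states; numeric values are arbitrary. */
enum st { ST_INACTIVE, ST_PF, ST_ACTIVE, ST_UNCONFIRMED, ST_UNKNOWN };

/* Hypothetical association: n transports in list order plus the
 * indices of active_path and retran_path. */
struct assoc {
	const enum st *state;
	size_t n;
	size_t active_path;
	size_t retran_path;
};

/* Strict priority: active > unknown > partial failover > inactive. */
static unsigned score(enum st s)
{
	switch (s) {
	case ST_ACTIVE:  return 3;
	case ST_UNKNOWN: return 2;
	case ST_PF:      return 1;
	default:         return 0;	/* inactive (unconfirmed is skipped) */
	}
}

/* Models the patched sctp_assoc_update_retran_path(). */
static void update_retran_path(struct assoc *a)
{
	size_t best = a->n;	/* a->n means "no candidate yet" */
	size_t t = a->retran_path;

	assert(a->n > 0 && a->retran_path < a->n);
	if (a->n == 1)
		return;
	/* The only active path is already the retran path: keep it. */
	if (a->active_path == a->retran_path &&
	    a->state[a->active_path] == ST_ACTIVE)
		return;

	/* Visit retran_path's successors, ending with retran_path itself.
	 * `continue` in a do/while still evaluates the condition, so the
	 * scan always terminates after one full lap. */
	do {
		t = (t + 1) % a->n;
		if (a->state[t] == ST_UNCONFIRMED)
			continue;
		/* Strict '>' keeps the earlier candidate on ties. */
		if (best == a->n || score(a->state[t]) > score(a->state[best]))
			best = t;
		if (a->state[best] == ST_ACTIVE)
			break;	/* active is good enough */
	} while (t != a->retran_path);

	if (best != a->n)
		a->retran_path = best;
}

/* Models sctp_assoc_choose_alter_transport(). */
static size_t choose_alter_transport(struct assoc *a, size_t last_sent_to)
{
	if (last_sent_to == a->n)	/* first transmission */
		return a->active_path;
	if (last_sent_to == a->retran_path)
		update_retran_path(a);
	return a->retran_path;
}
```

In the scenario from the problem statement (path1 is active and is both active_path and retran_path, path2 is inactive), a retransmission after a timeout on path1 stays on path1. When several transports are active, the retran path advances to the next active successor, and when none is active, the best-scored one (unknown before partial failover before inactive) is chosen.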