Commit 36a6503f authored by Eric Dumazet, committed by David S. Miller

tcp: refine tcp_prune_ofo_queue() to not drop all packets

Over the years, the TCP BDP (bandwidth-delay product) has increased a lot,
and is typically on the order of 10 Mbytes with the help of clever
Congestion Control modules.

In the presence of packet losses, TCP stores incoming packets in an
out-of-order queue, and the number of skbs sitting there waiting for the
missing packets to be received can match the BDP (~10 Mbytes).

In some cases, TCP needs to make room for incoming skbs, and the current
strategy can simply remove all skbs in the out-of-order queue as a last
resort, incurring a huge penalty for both receiver and sender.

Unfortunately, these 'last resort events' are quite frequent, forcing the
sender to send all packets again, stalling the flow and wasting a lot of
resources.

This patch removes only as much of the out-of-order queue as is needed
to meet the memory constraints.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: C. Stephen Gun <csg@google.com>
Cc: Van Jacobson <vanj@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
parent e2d8f646
@@ -4392,12 +4392,9 @@ static int tcp_try_rmem_schedule(struct sock *sk, struct sk_buff *skb,
 		if (tcp_prune_queue(sk) < 0)
 			return -1;
-		if (!sk_rmem_schedule(sk, skb, size)) {
+		while (!sk_rmem_schedule(sk, skb, size)) {
 			if (!tcp_prune_ofo_queue(sk))
 				return -1;
-			if (!sk_rmem_schedule(sk, skb, size))
-				return -1;
 		}
 	}
 	return 0;
@@ -4874,17 +4871,32 @@ static void tcp_collapse_ofo_queue(struct sock *sk)
 }
 /*
- * Purge the out-of-order queue.
- * Return true if queue was pruned.
+ * Clean the out-of-order queue to make room.
+ * We drop high sequences packets to :
+ * 1) Let a chance for holes to be filled.
+ * 2) not add too big latencies if thousands of packets sit there.
+ *    (But if application shrinks SO_RCVBUF, we could still end up
+ *     freeing whole queue here)
+ *
+ * Return true if queue has shrunk.
  */
 static bool tcp_prune_ofo_queue(struct sock *sk)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
-	bool res = false;
+	struct sk_buff *skb;
-	if (!skb_queue_empty(&tp->out_of_order_queue)) {
+	if (skb_queue_empty(&tp->out_of_order_queue))
+		return false;
 	NET_INC_STATS(sock_net(sk), LINUX_MIB_OFOPRUNED);
-	__skb_queue_purge(&tp->out_of_order_queue);
+	while ((skb = __skb_dequeue_tail(&tp->out_of_order_queue)) != NULL) {
+		tcp_drop(sk, skb);
+		sk_mem_reclaim(sk);
+		if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf &&
+		    !tcp_under_memory_pressure(sk))
+			break;
+	}
 	/* Reset SACK state. A conforming SACK implementation will
 	 * do the same at a timeout based retransmit. When a connection
@@ -4893,10 +4905,7 @@ static bool tcp_prune_ofo_queue(struct sock *sk)
 	 */
 	if (tp->rx_opt.sack_ok)
 		tcp_sack_reset(&tp->rx_opt);
-	sk_mem_reclaim(sk);
+	return true;
-	res = true;
-	}
-	return res;
 }
 /* Reduce allocated memory if we can, trying to get
......
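
To make the new control flow easier to follow outside the kernel, here is a minimal userspace sketch of the same idea: drop segments from the tail (highest sequence) of the out-of-order queue only until memory usage fits the limit, and let the caller retry admission in a loop. Every name in it (toy_sock, toy_prune_ofo_queue, toy_rmem_schedule, toy_try_rmem_schedule, QUEUE_CAP) is hypothetical and is not kernel API. The memory accounting is a plain byte counter that only loosely stands in for sk_rmem_alloc and sk_rcvbuf, and the memory-pressure check and SACK reset are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in the kernel. */
#define QUEUE_CAP 16

struct toy_sock {
	size_t seg_len[QUEUE_CAP]; /* out-of-order segments, lowest sequence first */
	size_t nsegs;              /* number of queued out-of-order segments */
	size_t rmem_alloc;         /* bytes currently charged to the socket */
	size_t rcvbuf;             /* receive buffer limit, in bytes */
};

/* Drop segments from the tail (highest sequence) until usage fits the limit.
 * Returns true if the queue has shrunk, false if it was already empty.
 */
static bool toy_prune_ofo_queue(struct toy_sock *sk)
{
	if (sk->nsegs == 0)
		return false;

	while (sk->nsegs > 0) {
		size_t len = sk->seg_len[--sk->nsegs];

		/* Accounting invariant of this toy model: queued bytes are charged. */
		assert(sk->rmem_alloc >= len);
		sk->rmem_alloc -= len;

		/* Stop as soon as we are back under the limit. */
		if (sk->rmem_alloc <= sk->rcvbuf)
			break;
	}
	return true;
}

/* Toy admission check: would a new segment of `size` bytes fit? */
static bool toy_rmem_schedule(const struct toy_sock *sk, size_t size)
{
	return sk->rmem_alloc + size <= sk->rcvbuf;
}

/* Retry loop modeled on the patched caller: keep pruning while admission
 * fails, and give up only when there is nothing left to prune.
 */
static int toy_try_rmem_schedule(struct toy_sock *sk, size_t size)
{
	while (!toy_rmem_schedule(sk, size)) {
		if (!toy_prune_ofo_queue(sk))
			return -1;
	}
	return 0;
}
```

In this sketch, a socket holding four 300-byte out-of-order segments against a 1000-byte limit admits a new 200-byte segment after losing only its two highest-sequence segments, whereas purging the whole queue would have discarded all four. Since each prune call may free just enough to get back under the limit, the caller has to loop rather than retry once, which is why the first hunk turns the `if` into a `while`.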