    tcp: defer regular ACK while processing socket backlog · 133c4c0d
    Eric Dumazet authored
    This idea came after a particular workload requested
    the quickack attribute set on routes, and a performance
    drop was noticed for large bulk transfers.
    
    For high throughput flows, it is best to use one cpu
    running the user thread issuing socket system calls,
    and a separate cpu to process incoming packets from BH context.
    (With TSO/GRO, the bottleneck is usually the 'user' cpu.)
    
    The problem is that the user thread can spend a lot of time
    holding the socket lock, forcing the BH handler to queue most
    incoming packets in the socket backlog.
    
    Whenever the user thread releases the socket lock, it must first
    process all packets accumulated in the backlog, potentially
    adding latency spikes. Due to flood mitigation, having too many
    packets in the backlog increases the chance of unexpected drops.
    
    Backlog processing unfortunately shifts a fair amount of cpu cycles
    from the BH cpu to the 'user' cpu, thus reducing max throughput.
    
    This patch takes advantage of the backlog processing,
    and of the fact that ACKs are mostly cumulative.
    
    The idea is to detect that we are in backlog processing
    and coalesce all eligible ACKs into a single one,
    sent from tcp_release_cb().
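    
    The coalescing idea can be illustrated with a small user-space
    model. This is only a sketch, not the kernel code: every name in
    it (model_sock, model_rcv_segment(), model_release(), ...) is
    hypothetical, and it assumes for simplicity that each received
    segment would otherwise trigger an immediate ACK, as with quickack.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel symbols. */
struct model_sock {
	bool defer_enabled;     /* stands in for the tcp_backlog_ack_defer sysctl */
	bool in_backlog;        /* true while the lock owner drains the backlog */
	bool ack_deferred;      /* an ACK is owed but has been postponed */
	unsigned int rcv_nxt;   /* next expected sequence number */
	unsigned int last_ack;  /* sequence number carried by the last ACK sent */
	unsigned int acks_sent; /* number of ACKs put on the wire */
};

/* "Send" a cumulative ACK covering everything received so far. */
static void model_send_ack(struct model_sock *sk)
{
	sk->last_ack = sk->rcv_nxt;
	sk->acks_sent++;
}

/* Receive one in-order segment of len bytes. */
static void model_rcv_segment(struct model_sock *sk, unsigned int len)
{
	sk->rcv_nxt += len;

	/* While draining the backlog, only remember that an ACK is owed. */
	if (sk->defer_enabled && sk->in_backlog) {
		sk->ack_deferred = true;
		return;
	}
	model_send_ack(sk);
}

/*
 * Model of the lock release path: drain the queued segments, then let
 * the release callback emit at most one ACK for the whole batch.
 */
static void model_release(struct model_sock *sk,
			  const unsigned int *lens, size_t n)
{
	sk->in_backlog = true;
	for (size_t i = 0; i < n; i++)
		model_rcv_segment(sk, lens[i]);
	sk->in_backlog = false;

	if (sk->ack_deferred) {
		sk->ack_deferred = false;
		model_send_ack(sk);
	}
}
```

    In this model, draining four queued segments with defer_enabled
    set sends one ACK instead of four, and that single ACK still
    acknowledges the same final sequence number, because ACKs are
    cumulative.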
    
    This saves cpu cycles on both sides, and network resources.
    
    Performance of a single TCP flow on a 200Gbit NIC:
    
    - Throughput increases by 20% (100Gbit -> 120Gbit).
    - Number of generated ACKs per second shrinks from 240,000 to 40,000.
    - Number of backlog drops per second shrinks from 230 to 0.
    
    Benchmark context:
     - Regular netperf TCP_STREAM (no zerocopy)
     - Intel(R) Xeon(R) Platinum 8481C (Sapphire Rapids)
     - MAX_SKB_FRAGS = 17 (~60KB per GRO packet)
    
    This feature is guarded by a new sysctl, and enabled by default:
     /proc/sys/net/ipv4/tcp_backlog_ack_defer
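    
    As a sketch, the current setting can be read from user space
    like any other integer sysctl file. The helper name
    read_sysctl_int() is hypothetical; only the /proc path comes
    from this patch.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read one integer from a sysctl-style file.
 * Returns 0 on success, -1 if the file is missing or unparsable
 * (for example on a kernel that does not have this sysctl).
 */
static int read_sysctl_int(const char *path, int *value)
{
	FILE *f = fopen(path, "r");
	int parsed;

	if (!f)
		return -1;
	parsed = (fscanf(f, "%d", value) == 1);
	fclose(f);
	return parsed ? 0 : -1;
}
```

    Calling read_sysctl_int("/proc/sys/net/ipv4/tcp_backlog_ack_defer", &v)
    is expected to store a nonzero value in v when the feature is enabled.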
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Yuchung Cheng <ycheng@google.com>
    Acked-by: Neal Cardwell <ncardwell@google.com>
    Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
    Acked-by: Dave Taht <dave.taht@gmail.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
tcp_input.c 204 KB