• Eric Dumazet's avatar
    tcp: allow splice() to build full TSO packets · 2f533844
    Eric Dumazet authored
    vmsplice()/splice(pipe, socket) call do_tcp_sendpages() one page at a
    time, adding at most 4096 bytes to an skb. (assuming PAGE_SIZE=4096)
    
    The call to tcp_push() at the end of do_tcp_sendpages() forces an
    immediate xmit when pipe is not already filled, and tso_fragment() try
    to split these skb to MSS multiples.
    
    4096 bytes are usually split in a skb with 2 MSS, and a remaining
    sub-mss skb (assuming MTU=1500)
    
    This makes slow start suboptimal because many small frames are sent to
    qdisc/driver layers instead of big ones (constrained by cwnd and packets
    in flight of course)
    
    In fact, applications using sendmsg() (adding an additional memory copy)
    instead of vmsplice()/splice()/sendfile() are a bit faster because of
    this anomaly, especially if serving small files in environments with
    large initial [c]wnd.
    
    Call tcp_push() only if MSG_MORE is not set in the flags parameter.
    
    This bit is automatically provided by splice() internals but for the
    last page, or on all pages if user specified SPLICE_F_MORE splice()
    flag.
    
    In some workloads, this can reduce number of sent logical packets by an
    order of magnitude, making zero-copy TCP actually faster than
    one-copy :)
    Reported-by: default avatarTom Herbert <therbert@google.com>
    Cc: Nandita Dukkipati <nanditad@google.com>
    Cc: Neal Cardwell <ncardwell@google.com>
    Cc: Tom Herbert <therbert@google.com>
    Cc: Yuchung Cheng <ycheng@google.com>
    Cc: H.K. Jerry Chu <hkchu@google.com>
    Cc: Maciej Żenczykowski <maze@google.com>
    Cc: Mahesh Bandewar <maheshb@google.com>
    Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
    Signed-off-by: default avatarEric Dumazet <eric.dumazet@gmail&gt;com>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    2f533844
tcp.c 86.5 KB