    tcp: schedule EPOLLOUT after a partial sendmsg · afb83012
    Soheil Hassas Yeganeh authored
    With EPOLLET, applications must call sendmsg until they get EAGAIN.
    Otherwise, there is no guarantee that EPOLLOUT is sent if the send
    stopped short because of a memory allocation failure.
    
    As a result, on high-speed NICs, userspace observes multiple small
    sendmsgs after a partial sendmsg until EAGAIN, since TCP can send
    1-2 TSOs between two sendmsg syscalls:
    
    // One large partial send due to memory allocation failure.
    sendmsg(20MB)   = 2MB
    // Many small sends until EAGAIN.
    sendmsg(18MB)   = 64KB
    sendmsg(17.9MB) = 128KB
    sendmsg(17.8MB) = 64KB
    ...
    sendmsg(...)    = EAGAIN
    // At this point, userspace can assume an EPOLLOUT.
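
The drain-until-EAGAIN pattern described above can be sketched in userspace as follows. This is a minimal illustration, not code from the commit: `send_until_eagain` and its `would_block` out-parameter are hypothetical names, and the socket is assumed to be a connected, non-blocking stream socket registered with EPOLLOUT | EPOLLET.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: keep sending from buf until everything is queued
 * or the kernel reports EAGAIN/EWOULDBLOCK.
 * Returns the number of bytes queued, or -1 on a hard error.
 * *would_block is set to 1 only when the loop stopped on EAGAIN, which is
 * the point where an edge-triggered EPOLLOUT can be expected.
 */
static ssize_t send_until_eagain(int fd, const char *buf, size_t len,
                                 int *would_block)
{
    size_t off = 0;

    *would_block = 0;
    while (off < len) {
        ssize_t n = send(fd, buf + off, len - off,
                         MSG_DONTWAIT | MSG_NOSIGNAL);
        if (n > 0) {
            /* Possibly a partial send: try again right away. */
            off += (size_t)n;
            continue;
        }
        if (n == 0)
            break;                      /* nothing accepted, stop */
        if (errno == EINTR)
            continue;                   /* interrupted, retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            *would_block = 1;           /* wait for EPOLLOUT now */
            break;
        }
        return -1;                      /* hard error, errno is set */
    }
    return (ssize_t)off;
}
```

Each pass through the loop after a partial send corresponds to one of the small sendmsg calls in the trace above.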
    
    To fix this, set SOCK_NOSPACE in all partial sendmsg scenarios
    to guarantee that EPOLLOUT is sent after a partial sendmsg.
    
    After this commit, userspace can assume that it will receive an EPOLLOUT
    after the first partial sendmsg. This EPOLLOUT will benefit from the
    sk_stream_write_space() logic, which delays the EPOLLOUT until significant
    space is available in the write queue.
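
With that guarantee, a sender may stop at the first partial send instead of draining to EAGAIN. The sketch below assumes a TCP socket on a kernel that includes this commit; on older kernels the until-EAGAIN loop is still needed. `struct tx_state`, `tx_watch_writable` and `tx_try_send` are hypothetical names used only for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical per-connection transmit state. */
struct tx_state {
    const char *buf;   /* data to send */
    size_t len;        /* total length of buf */
    size_t off;        /* bytes already queued in the socket */
};

/* Register fd for edge-triggered EPOLLOUT. Returns 0 on success, -1 on error. */
static int tx_watch_writable(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLOUT | EPOLLET, .data.fd = fd };

    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/*
 * Try to make progress on a non-blocking socket.
 * Returns 1 when all data is queued, 0 when the caller should go back to
 * epoll_wait() and resume on EPOLLOUT, and -1 on a hard error.
 */
static int tx_try_send(int fd, struct tx_state *tx)
{
    while (tx->off < tx->len) {
        size_t want = tx->len - tx->off;
        ssize_t n = send(fd, tx->buf + tx->off, want,
                         MSG_DONTWAIT | MSG_NOSIGNAL);
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, retry */
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                return 0;               /* wait for EPOLLOUT */
            return -1;                  /* hard error, errno is set */
        }
        tx->off += (size_t)n;
        if ((size_t)n < want)
            return 0;   /* partial send: stop here and wait for EPOLLOUT */
    }
    return 1;
}
```

A caller would invoke `tx_try_send` once up front and again each time `epoll_wait` reports EPOLLOUT for the socket, which avoids the run of small sends shown in the trace.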

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>