1. 15 Sep, 2020 38 commits
  2. 14 Sep, 2020 2 commits
      tcp: schedule EPOLLOUT after a partial sendmsg · afb83012
      Soheil Hassas Yeganeh authored
      For EPOLLET, applications must call sendmsg until they get EAGAIN.
      Otherwise, there is no guarantee that EPOLLOUT is sent if there was
      a failure upon memory allocation.
      
      As a result, on high-speed NICs userspace observes multiple small
      sendmsgs after a partial sendmsg until EAGAIN, since TCP can send
      1-2 TSOs in between two sendmsg syscalls:
      
      // One large partial send due to memory allocation failure.
      sendmsg(20MB)   = 2MB
      // Many small sends until EAGAIN.
      sendmsg(18MB)   = 64KB
      sendmsg(17.9MB) = 128KB
      sendmsg(17.8MB) = 64KB
      ...
      sendmsg(...)    = EAGAIN
      // At this point, userspace can assume an EPOLLOUT.
      
      To fix this, set SOCK_NOSPACE in all partial sendmsg scenarios
      to guarantee that we send EPOLLOUT after a partial sendmsg.
      
      After this commit userspace can assume that it will receive an EPOLLOUT
      after the first partial sendmsg. This EPOLLOUT will benefit from
      sk_stream_write_space() logic delaying the EPOLLOUT until significant
      space is available in write queue.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
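
The "call sendmsg until EAGAIN" rule for EPOLLET described in this commit message can be sketched from the userspace side roughly as follows. This is a minimal illustration, not code from the commit: the helper name `send_until_eagain` and its `would_block` out-parameter are hypothetical, and it uses `send()` with `MSG_DONTWAIT` rather than `sendmsg()` for brevity.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not part of the commit): queue as much of buf as
 * the socket accepts right now without blocking. Returns the number of
 * bytes queued, or -1 on a hard error. *would_block is set to 1 once
 * send() reports EAGAIN, the point at which an edge-triggered caller
 * can safely go back to waiting for EPOLLOUT.
 */
static ssize_t send_until_eagain(int fd, const char *buf, size_t len,
                                 int *would_block)
{
    size_t off = 0;

    *would_block = 0;
    while (off < len) {
        ssize_t n = send(fd, buf + off, len - off,
                         MSG_DONTWAIT | MSG_NOSIGNAL);
        if (n > 0) {
            off += (size_t)n;   /* full or partial send: try again */
            continue;
        }
        if (n == 0)
            break;              /* nothing accepted, nothing to retry */
        if (errno == EINTR)
            continue;           /* interrupted by a signal: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            *would_block = 1;   /* socket buffer is full */
            break;
        }
        return -1;              /* hard error, errno is set by send() */
    }
    return (ssize_t)off;
}
```

Per the commit message, on a kernel that includes this change a TCP caller could instead stop at the first short return and wait for EPOLLOUT. The loop above stays correct either way; it just costs the extra small sends the message describes.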
      tcp: return EPOLLOUT from tcp_poll only when notsent_bytes is half the limit · 8ba3c9d1
      Soheil Hassas Yeganeh authored
      If there was any event available on the TCP socket, tcp_poll()
      will be called to retrieve all the events.  In tcp_poll(), we call
      sk_stream_is_writeable() which returns true as long as we are at least
      one byte below notsent_lowat.  This results in quite a few
      spurious EPOLLOUT events and, in turn, frequent tiny sendmsg() calls.
      
      Similar to sk_stream_write_space(), use __sk_stream_is_writeable
      with a wake value of 1, so that we set EPOLLOUT only if half the
      space is available for write.
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
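
The effect of the wake value on the not-sent-bytes threshold can be pictured with a small userspace model. This is a simplified sketch, not kernel code: the function name `model_notsent_ok` is hypothetical, and the shift-based comparison is an assumption about how the kernel's TCP check is expressed. It shows only the arithmetic: with wake 0 the socket counts as writeable when the not-sent bytes are below the limit, and with wake 1 only when they are below half the limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified model (an assumption, not the kernel source) of the
 * notsent_lowat test. Shifting notsent_bytes left by wake doubles it
 * when wake == 1, so the comparison passes only below half the limit.
 * The 64-bit widening avoids overflow in this model.
 */
static bool model_notsent_ok(uint32_t notsent_bytes,
                             uint32_t notsent_lowat, int wake)
{
    assert(wake == 0 || wake == 1);
    return ((uint64_t)notsent_bytes << wake) < notsent_lowat;
}
```

With a limit of 128 KiB, for instance, a socket holding one byte less than the limit passes with wake 0 but fails with wake 1, which is the spurious-EPOLLOUT case this commit describes.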