1. 15 Sep, 2020 36 commits
  2. 14 Sep, 2020 4 commits
    • tcp: schedule EPOLLOUT after a partial sendmsg · afb83012
      Soheil Hassas Yeganeh authored
      For EPOLLET, applications must call sendmsg until they get EAGAIN.
      Otherwise, there is no guarantee that EPOLLOUT is sent if there was
      a failure upon memory allocation.
      
      As a result on high-speed NICs, userspace observes multiple small
      sendmsgs after a partial sendmsg until EAGAIN, since TCP can send
      1-2 TSOs in between two sendmsg syscalls:
      
      // One large partial send due to memory allocation failure.
      sendmsg(20MB)   = 2MB
      // Many small sends until EAGAIN.
      sendmsg(18MB)   = 64KB
      sendmsg(17.9MB) = 128KB
      sendmsg(17.8MB) = 64KB
      ...
      sendmsg(...)    = EAGAIN
      // At this point, userspace can assume an EPOLLOUT.
      
      To fix this, set the SOCK_NOSPACE on all partial sendmsg scenarios
      to guarantee that we send EPOLLOUT after partial sendmsg.
      
      After this commit userspace can assume that it will receive an EPOLLOUT
      after the first partial sendmsg. This EPOLLOUT will benefit from
      sk_stream_write_space() logic delaying the EPOLLOUT until significant
      space is available in write queue.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
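A minimal userspace sketch of the two edge-triggered send strategies this commit message contrasts. The helper name `et_send` and its `stop_on_partial` flag are hypothetical and exist only for illustration; they are not part of the kernel change. With `stop_on_partial = 0` the caller keeps issuing sendmsg() until EAGAIN (the rule stated for EPOLLET before this commit); with `stop_on_partial = 1` it stops at the first short write and relies on a later EPOLLOUT, which the commit message says is guaranteed afterwards.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: queue as much of buf as the socket accepts.
 * Returns the number of bytes queued, or -1 on a hard error.
 * *calls is set to the number of sendmsg() syscalls issued. */
static ssize_t et_send(int fd, const char *buf, size_t len,
                       int stop_on_partial, unsigned *calls)
{
    size_t off = 0;

    *calls = 0;
    while (off < len) {
        struct iovec iov = { .iov_base = (void *)(buf + off),
                             .iov_len = len - off };
        struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
        ssize_t n = sendmsg(fd, &msg, MSG_DONTWAIT | MSG_NOSIGNAL);

        (*calls)++;
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, retry */
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                break;                  /* socket full: wait for EPOLLOUT */
            return -1;                  /* hard error */
        }
        off += (size_t)n;
        if (stop_on_partial && off < len)
            break;                      /* short write: wait for EPOLLOUT */
    }
    return (ssize_t)off;
}
```

In the second mode, the caller would return to epoll_wait() after the short write instead of issuing the run of small sendmsg() calls shown in the trace above.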
    • tcp: return EPOLLOUT from tcp_poll only when notsent_bytes is half the limit · 8ba3c9d1
      Soheil Hassas Yeganeh authored
      If there was any event available on the TCP socket, tcp_poll()
      will be called to retrieve all the events.  In tcp_poll(), we call
      sk_stream_is_writeable() which returns true as long as we are at least
      one byte below notsent_lowat.  This results in quite a few
      spurious EPOLLOUT events and frequent tiny sendmsg() calls.
      
      Similar to sk_stream_write_space(), use __sk_stream_is_writeable
      with a wake value of 1, so that we set EPOLLOUT only if half the
      space is available for write.
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
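The effect of the wake value can be illustrated with a small standalone model. This is a sketch, not kernel code: it assumes the underlying check compares the unsent byte count, shifted left by `wake`, against notsent_lowat, and the function name `model_stream_memory_free` is made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model (assumption): the socket counts as writeable when
 * (notsent_bytes << wake) < notsent_lowat.
 *   wake = 0: writeable as soon as we are one byte below the limit.
 *   wake = 1: writeable only once less than half the limit is unsent. */
static bool model_stream_memory_free(uint32_t notsent_bytes,
                                     uint32_t notsent_lowat, int wake)
{
    return (notsent_bytes << wake) < notsent_lowat;
}
```

Under this model, with a limit of 131072 bytes and 131071 bytes unsent, a wake value of 0 reports the socket as writeable while a wake value of 1 does not; the latter only reports writeable once fewer than 65536 bytes remain unsent.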
    • ionic: fix up debugfs after queue swap · ed6d9b02
      Shannon Nelson authored
      Clean and rebuild the debugfs info for the queues being swapped.
      
      Fixes: a34e25ab ("ionic: change the descriptor ring length without full reset")
      Signed-off-by: Shannon Nelson <snelson@pensando.io>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • __netif_receive_skb_core: don't untag vlan from skb on DSA master · b14a9fc4
      Vladimir Oltean authored
      A DSA master interface has upper network devices, each representing an
      Ethernet switch port attached to it. Demultiplexing the source ports and
      setting skb->dev accordingly is done through the catch-all ETH_P_XDSA
      packet_type handler. Catch-all because DSA vendors have various header
      implementations, which can be placed anywhere in the frame: before the
      DMAC, before the EtherType, before the FCS, etc. So, the ETH_P_XDSA
      handler acts like an rx_handler more than anything.
      
      It is unlikely for the DSA master interface to have any other upper than
      the DSA switch interfaces themselves. Only maybe a bridge upper*, but it
      is very likely that the DSA master will have no 8021q upper. So
      __netif_receive_skb_core() will try to untag the VLAN, despite the fact
      that the DSA switch interface might have an 8021q upper. So the skb will
      never reach that.
      
      So far, this hasn't been a problem because most of the possible
      placements of the DSA switch header mentioned in the first paragraph
      will displace the VLAN header when the DSA master receives the frame, so
      __netif_receive_skb_core() will not actually execute any VLAN-specific
      code for it. This only becomes a problem when the DSA switch header does
      not displace the VLAN header (for example with a tail tag).
      
      What the patch does is it bypasses the untagging of the skb when there
      is a DSA switch attached to this net device. So, DSA is the only
      packet_type handler which requires seeing the VLAN header. Once skb->dev
      has been changed, __netif_receive_skb_core() will be invoked again and
      untagging, or delivery to an 8021q upper, will happen in the RX of the
      DSA switch interface itself.
      
      *see commit 9eb8eff0 ("net: bridge: allow enslaving some DSA master
      network devices"). This is actually the reason why I prefer keeping DSA
      as a packet_type handler of ETH_P_XDSA rather than converting to an
      rx_handler. Currently the rx_handler code doesn't support chaining, and
      this is a problem because a DSA master might be bridged.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
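The decision described in this commit message can be modelled in a few lines of standalone C. This is only a sketch of the rule, not the actual patch: the types `model_dev` and `model_skb`, the field `uses_dsa`, and the function `should_untag_vlan` are hypothetical stand-ins, and the protocol is kept in host byte order for simplicity.

```c
#include <stdbool.h>
#include <stdint.h>

#define ETH_P_8021Q  0x8100  /* 802.1Q VLAN EtherType */
#define ETH_P_8021AD 0x88A8  /* 802.1ad (QinQ) EtherType */

/* Hypothetical stand-ins for the kernel's net_device and sk_buff. */
struct model_dev {
    bool uses_dsa;           /* true when a DSA switch is attached */
};

struct model_skb {
    uint16_t protocol;       /* EtherType, host byte order in this model */
    const struct model_dev *dev;
};

/* Untag only VLAN frames, and only when the receiving device is not a
 * DSA master; on a DSA master the tag is left for the switch port's RX. */
static bool should_untag_vlan(const struct model_skb *skb)
{
    bool is_vlan = skb->protocol == ETH_P_8021Q ||
                   skb->protocol == ETH_P_8021AD;

    return is_vlan && !skb->dev->uses_dsa;
}
```

In this model a VLAN-tagged frame arriving on a DSA master keeps its tag on the first pass; once the DSA handler has changed the device to the switch port, the same check allows untagging or delivery to an 8021q upper there.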