1. 15 Sep, 2020 20 commits
  2. 14 Sep, 2020 20 commits
    • tcp: schedule EPOLLOUT after a partial sendmsg · afb83012
      Soheil Hassas Yeganeh authored
      For EPOLLET, applications must call sendmsg until they get EAGAIN.
      Otherwise, there is no guarantee that EPOLLOUT is sent if there was
      a failure upon memory allocation.
      
      As a result on high-speed NICs, userspace observes multiple small
      sendmsgs after a partial sendmsg until EAGAIN, since TCP can send
      1-2 TSOs in between two sendmsg syscalls:
      
      // One large partial send due to memory allocation failure.
      sendmsg(20MB)   = 2MB
      // Many small sends until EAGAIN.
      sendmsg(18MB)   = 64KB
      sendmsg(17.9MB) = 128KB
      sendmsg(17.8MB) = 64KB
      ...
      sendmsg(...)    = EAGAIN
      // At this point, userspace can assume an EPOLLOUT.
      
      To fix this, set the SOCK_NOSPACE on all partial sendmsg scenarios
      to guarantee that we send EPOLLOUT after partial sendmsg.
      
      After this commit userspace can assume that it will receive an EPOLLOUT
      after the first partial sendmsg. This EPOLLOUT will benefit from
      sk_stream_write_space() logic delaying the EPOLLOUT until significant
      space is available in write queue.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
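The behaviour described in this commit suggests that an edge-triggered sender can stop at the first short write and wait for EPOLLOUT, rather than looping until EAGAIN. A minimal userspace sketch, assuming a non-blocking stream socket registered with EPOLLET on a kernel that carries this change; `send_once` and its out-parameter are hypothetical names, not part of any API:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: issue a single non-blocking send.
 * Returns the number of bytes the kernel accepted (0 if none),
 * or -1 on a hard error. *wait_for_epollout is set to true when
 * the send was partial or would block, i.e. when the caller
 * should stop sending and wait for EPOLLOUT.
 */
ssize_t send_once(int fd, const char *buf, size_t len,
                  bool *wait_for_epollout)
{
    ssize_t n;

    assert(wait_for_epollout != NULL);
    *wait_for_epollout = false;

    n = send(fd, buf, len, MSG_DONTWAIT | MSG_NOSIGNAL);
    if (n < 0) {
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            /* Nothing was queued: wait for EPOLLOUT. */
            *wait_for_epollout = true;
            return 0;
        }
        return -1;
    }
    if ((size_t)n < len) {
        /* Partial send: with this commit an EPOLLOUT should follow. */
        *wait_for_epollout = true;
    }
    return n;
}
```

On kernels without the change, the caller would still have to repeat the call until it sees EAGAIN before relying on EPOLLOUT.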
    • tcp: return EPOLLOUT from tcp_poll only when notsent_bytes is half the limit · 8ba3c9d1
      Soheil Hassas Yeganeh authored
      If there was any event available on the TCP socket, tcp_poll()
      will be called to retrieve all the events.  In tcp_poll(), we call
      sk_stream_is_writeable() which returns true as long as we are at least
      one byte below notsent_lowat.  This results in quite a few
      spurious EPOLLOUT events and frequent tiny sendmsg() calls.
      
      Similar to sk_stream_write_space(), use __sk_stream_is_writeable
      with a wake value of 1, so that we set EPOLLOUT only if half the
      space is available for write.
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
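The effect of the wake value can be illustrated with a standalone model of the comparison. This is a sketch and not the kernel source: it assumes the wake value acts as a left shift on the number of unsent bytes before comparing against notsent_lowat, and `notsent_below_limit` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of the writeability check (assumed form).
 * wake = 0: writable as soon as unsent bytes drop below the limit.
 * wake = 1: writable only once unsent bytes drop below half the limit.
 */
bool notsent_below_limit(uint32_t notsent_bytes, uint32_t notsent_lowat,
                         unsigned int wake)
{
    /* Widen before shifting so the shift cannot overflow 32 bits. */
    return ((uint64_t)notsent_bytes << wake) < notsent_lowat;
}
```

With a limit of 100 bytes, 99 unsent bytes count as writable for wake = 0 but not for wake = 1, which is the difference between the old and new tcp_poll() behaviour described above.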
    • ionic: fix up debugfs after queue swap · ed6d9b02
      Shannon Nelson authored
      Clean and rebuild the debugfs info for the queues being swapped.
      
      Fixes: a34e25ab ("ionic: change the descriptor ring length without full reset")
      Signed-off-by: Shannon Nelson <snelson@pensando.io>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • __netif_receive_skb_core: don't untag vlan from skb on DSA master · b14a9fc4
      Vladimir Oltean authored
      A DSA master interface has upper network devices, each representing an
      Ethernet switch port attached to it. Demultiplexing the source ports and
      setting skb->dev accordingly is done through the catch-all ETH_P_XDSA
      packet_type handler. Catch-all because DSA vendors have various header
      implementations, which can be placed anywhere in the frame: before the
      DMAC, before the EtherType, before the FCS, etc. So, the ETH_P_XDSA
      handler acts like an rx_handler more than anything.
      
      It is unlikely for the DSA master interface to have any upper other than
      the DSA switch interfaces themselves, except perhaps a bridge upper*. In
      particular, it is very likely that the DSA master will have no 8021q
      upper. So __netif_receive_skb_core() will try to untag the VLAN, despite
      the fact that the DSA switch interface might have an 8021q upper, and the
      skb will never reach that upper.
      
      So far, this hasn't been a problem because most of the possible
      placements of the DSA switch header mentioned in the first paragraph
      will displace the VLAN header when the DSA master receives the frame, so
      __netif_receive_skb_core() will not actually execute any VLAN-specific
      code for it. This only becomes a problem when the DSA switch header does
      not displace the VLAN header (for example with a tail tag).
      
      The patch bypasses the untagging of the skb when there is a DSA switch
      attached to this net device. So, DSA is the only packet_type handler
      which requires seeing the VLAN header. Once skb->dev is changed,
      __netif_receive_skb_core() will be invoked again and untagging, or
      delivery to an 8021q upper, will happen in the RX of the DSA switch
      interface itself.
      
      *see commit 9eb8eff0 ("net: bridge: allow enslaving some DSA master
      network devices"). This is actually the reason why I prefer keeping DSA
      as a packet_type handler of ETH_P_XDSA rather than converting to an
      rx_handler. Currently the rx_handler code doesn't support chaining, and
      this is a problem because a DSA master might be bridged.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
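The decision described in this commit can be sketched as a small standalone model. It is an illustration only and does not reproduce the kernel code: `struct rx_dev_model`, its `uses_dsa` flag, and `should_untag_vlan` are hypothetical names, and the two EtherType constants are the standard 802.1Q and 802.1ad values.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_ETH_P_8021Q  0x8100u  /* 802.1Q VLAN EtherType */
#define MODEL_ETH_P_8021AD 0x88A8u  /* 802.1ad service VLAN EtherType */

/* Hypothetical stand-in for the receiving net device. */
struct rx_dev_model {
    bool uses_dsa;  /* true when a DSA switch is attached (DSA master) */
};

/*
 * Untag in the core receive path only for VLAN frames on devices
 * that are not DSA masters. On a DSA master the tag is left in
 * place so that the DSA handler can see it; untagging then happens
 * on the second pass, in the RX of the DSA switch interface.
 */
bool should_untag_vlan(const struct rx_dev_model *dev, uint16_t ethertype)
{
    bool is_vlan = ethertype == MODEL_ETH_P_8021Q ||
                   ethertype == MODEL_ETH_P_8021AD;

    return is_vlan && !dev->uses_dsa;
}
```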
    • Merge branch 'net-next-dsa-mt7530-add-support-for-MT7531' · 0ca6d8b7
      David S. Miller authored
      Landen Chao says:
      
      ====================
      net-next: dsa: mt7530: add support for MT7531
      
      This patch series adds support for MT7531.
      
      MT7531 is the next generation of MT7530 and can be found on MediaTek
      router platforms such as MT7622 or MT7629.

      It is also a 7-port switch with 5 embedded gigabit PHYs, 2 CPU ports, and
      the same MAC logic as MT7530. CPU port 6 only supports the SGMII interface.
      CPU port 5 supports either RGMII or SGMII depending on the HW SKU, but
      cannot be muxed to the PHY of port 0/4 as on MT7530. Because of the SGMII
      interface support, the PLL and pad settings differ from MT7530.
      
      The MT7531 SGMII interface can be configured in the following modes:
      - 'SGMII AN mode' with in-band negotiation capability
          which is compatible with PHY_INTERFACE_MODE_SGMII.
      - 'SGMII force mode' without in-band negotiation
          which is compatible with 8B/10B encoding of
          PHY_INTERFACE_MODE_1000BASEX with fixed full-duplex and fixed pause.
      - 2.5 times faster clocked 'SGMII force mode' without in-band negotiation
          which is compatible with 8B/10B encoding of
          PHY_INTERFACE_MODE_2500BASEX with fixed full-duplex and fixed pause.
      
      v4 -> v5
      - Add fixed-link node to dsa cpu port in dts file by suggestion of
        Vladimir Oltean.
      
      v3 -> v4
      - Adjust the coding style by suggestion of Jakub Kicinski.
        Remove unnecessary jumping label, merge continuous numeric 'switch
        cases' into one line, and keep the variables longest to shortest
        (reverse xmas tree).
      
      v2 -> v3
      - Keep the same setup logic of mt7530/mt7621 because this series of
        patches is for adding mt7531 hardware.
      - Do not adjust rgmii delay when a vendor phy driver is present, in order
        to prevent double adjustment, by suggestion of Andrew Lunn.
      - Remove redundant 'Example 4' from dt-bindings by suggestion of
        Rob Herring.
      - Fix typo.
      
      v1 -> v2
      - change phylink_validate callback function to support full-duplex
        gigabit only to match hardware capability.
      - add description of SGMII interface.
      - configure mt7531 cpu port in fastest speed by default.
      - parse SGMII control word for in-band negotiation mode.
      - configure RGMII delay based on phy.rst.
      - Rename the definition in the header file to avoid potential conflicts.
      - Add wrapper function for mdio read/write to support both C22 and C45.
      - correct fixed-link speed of 2500base-x in dts.
      - add MT7531 port mirror setting.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • arm64: dts: mt7622: add mt7531 dsa to bananapi-bpi-r64 board · 79a675e6
      Landen Chao authored
      Add mt7531 dsa to bananapi-bpi-r64 board for 5 giga Ethernet ports support.
      Signed-off-by: Landen Chao <landen.chao@mediatek.com>
      Tested-By: Frank Wunderlich <frank-w@public-files.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • arm64: dts: mt7622: add mt7531 dsa to mt7622-rfb1 board · 6af06448
      Landen Chao authored
      Add mt7531 dsa to mt7622-rfb1 board for 5 giga Ethernet ports support.
      mt7622 only supports 1 sgmii interface, so either gmac0 or gmac1 can be
      configured as sgmii interface. In this patch, change to connect mt7622
      gmac0 and mt7531 port6 through sgmii interface.
      Signed-off-by: Landen Chao <landen.chao@mediatek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mt7530: Add the support of MT7531 switch · c288575f
      Landen Chao authored
      Add new support for MT7531:

      MT7531 is the next generation of MT7530. It is also a 7-port switch with
      5 embedded gigabit PHYs, 2 CPU ports, and the same MAC logic as MT7530.
      CPU port 6 only supports the SGMII interface. CPU port 5 supports either
      RGMII or SGMII depending on the HW SKU, but cannot be muxed to the PHY of
      port 0/4 as on MT7530. Because of the SGMII interface support, the PLL and
      pad settings differ from MT7530. This patch adds the different initial
      settings and the SGMII phylink handlers of MT7531.
      
      The MT7531 SGMII interface can be configured in the following modes:
      - 'SGMII AN mode' with in-band negotiation capability
          which is compatible with PHY_INTERFACE_MODE_SGMII.
      - 'SGMII force mode' without in-band negotiation
          which is compatible with 8B/10B encoding of
          PHY_INTERFACE_MODE_1000BASEX with fixed full-duplex and fixed pause.
      - 2.5 times faster clocked 'SGMII force mode' without in-band negotiation
          which is compatible with 8B/10B encoding of
          PHY_INTERFACE_MODE_2500BASEX with fixed full-duplex and fixed pause.
      Signed-off-by: Landen Chao <landen.chao@mediatek.com>
      Signed-off-by: Sean Wang <sean.wang@mediatek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
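The three SGMII modes listed in this commit can be summarised as a mapping from the requested interface mode to two configuration flags. The following is a sketch under stated assumptions rather than the driver's code: the enum, the struct, and `model_sgmii_cfg_for` are hypothetical names that only mirror the list above.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the PHY_INTERFACE_MODE_* values named above. */
enum model_phy_mode {
    MODEL_MODE_SGMII,
    MODEL_MODE_1000BASEX,
    MODEL_MODE_2500BASEX,
    MODEL_MODE_OTHER
};

struct model_sgmii_cfg {
    bool supported;   /* mode can be served by the SGMII block */
    bool in_band_an;  /* 'SGMII AN mode' vs. 'SGMII force mode' */
    bool clock_2_5x;  /* force mode clocked 2.5 times faster */
};

struct model_sgmii_cfg model_sgmii_cfg_for(enum model_phy_mode mode)
{
    struct model_sgmii_cfg cfg = { false, false, false };

    switch (mode) {
    case MODEL_MODE_SGMII:      /* AN mode with in-band negotiation */
        cfg.supported = true;
        cfg.in_band_an = true;
        break;
    case MODEL_MODE_1000BASEX:  /* force mode, no in-band negotiation */
        cfg.supported = true;
        break;
    case MODEL_MODE_2500BASEX:  /* force mode at 2.5x clock */
        cfg.supported = true;
        cfg.clock_2_5x = true;
        break;
    default:                    /* not an SGMII-block mode in this model */
        break;
    }
    return cfg;
}
```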
    • dt-bindings: net: dsa: add new MT7531 binding to support MT7531 · 27834b02
      Landen Chao authored
      Add devicetree binding to support the compatible mt7531 switch as used
      in the MediaTek MT7531 switch.
      Signed-off-by: Sean Wang <sean.wang@mediatek.com>
      Signed-off-by: Landen Chao <landen.chao@mediatek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mt7530: Extend device data ready for adding a new hardware · 88bdef8b
      Landen Chao authored
      Add a structure holding required operations for each device such as device
      initialization, PHY port read or write, a checker whether PHY interface is
      supported on a certain port, MAC port setup for either bus pad or a
      specific PHY interface.
      
      This patch prepares for adding the new MT7531 hardware and keeps the
      setup logic of the existing hardware unchanged.
      Signed-off-by: Landen Chao <landen.chao@mediatek.com>
      Signed-off-by: Sean Wang <sean.wang@mediatek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
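A per-device operations structure of the kind described in this commit might look roughly like the sketch below. All names are hypothetical and the supported-mode rules are placeholders chosen for illustration; the driver's actual structure and callbacks are not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_switch;

/* Per-device operations; every name in this block is hypothetical. */
struct model_switch_ops {
    int  (*setup)(struct model_switch *sw);
    bool (*phy_mode_supported)(const struct model_switch *sw,
                               int port, int mode);
};

struct model_switch {
    const struct model_switch_ops *ops;
    int setup_calls;
};

/* Placeholder interface modes. */
enum { MODEL_IF_A, MODEL_IF_B };

static int model_setup_common(struct model_switch *sw)
{
    sw->setup_calls++;
    return 0;
}

/* Placeholder rule: the existing device accepts only mode A. */
static bool model_old_mode_ok(const struct model_switch *sw, int port, int mode)
{
    (void)sw;
    (void)port;
    return mode == MODEL_IF_A;
}

/* Placeholder rule: the new device accepts modes A and B. */
static bool model_new_mode_ok(const struct model_switch *sw, int port, int mode)
{
    (void)sw;
    (void)port;
    return mode == MODEL_IF_A || mode == MODEL_IF_B;
}

const struct model_switch_ops model_old_ops = {
    .setup = model_setup_common,
    .phy_mode_supported = model_old_mode_ok,
};

const struct model_switch_ops model_new_ops = {
    .setup = model_setup_common,
    .phy_mode_supported = model_new_mode_ok,
};

/* Common code dispatches through the table instead of testing a chip ID. */
bool model_port_mode_ok(const struct model_switch *sw, int port, int mode)
{
    return sw->ops != NULL && sw->ops->phy_mode_supported != NULL &&
           sw->ops->phy_mode_supported(sw, port, mode);
}
```

The point of such a table is that adding another device means adding one more ops instance, while the common code paths stay the same.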
    • net: dsa: mt7530: Refine message in Kconfig · dc8ef938
      Landen Chao authored
      Refine the Kconfig message by fixing a typo and mentioning MT7621 support explicitly.
      Signed-off-by: Landen Chao <landen.chao@mediatek.com>
      Signed-off-by: Sean Wang <sean.wang@mediatek.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drivers/net/wan/x25_asy: Remove an unnecessary x25_type_trans call · 4b468385
      Xie He authored
      x25_type_trans only needs to be called before we call netif_rx to pass
      the skb to upper layers.
      
      It does not need to be called before lapb_data_received. The LAPB module
      does not need the fields that are set by calling it.
      
      In the other two X.25 drivers, lapbether and hdlc_x25, x25_type_trans
      is only called before netif_rx and not before lapb_data_received.
      
      Cc: Martin Schiller <ms@dev.tdt.de>
      Signed-off-by: Xie He <xie.he.0141@gmail.com>
      Acked-by: Martin Schiller <ms@dev.tdt.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: try to avoid unneeded backlog flush · 2de79ee2
      Paolo Abeni authored
      flush_all_backlogs() may cause deadlock on systems
      running processes with FIFO scheduling policy.
      
      The above is critical in -RT scenarios, where user space
      specifically ensures that no network activity is scheduled on
      the CPU running the mentioned FIFO process, but still gets
      stuck.

      This commit tries to address the problem by checking the
      backlog status on the remote CPUs before scheduling the
      flush operation. If the backlog is empty, we can skip it.
      
      v1 -> v2:
       - explicitly clear flushed cpu mask - Eric
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
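The idea of skipping CPUs whose backlog is empty can be sketched with a simple model. This is an illustration only: the array of per-CPU queue lengths and the function name `cpus_needing_flush` are hypothetical, and a 64-bit integer stands in for a CPU mask.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Model: return a bitmask of the CPUs whose backlog is non-empty.
 * Only those CPUs would have flush work scheduled on them; the
 * others are skipped. Handles at most 64 CPUs in this sketch.
 */
uint64_t cpus_needing_flush(const unsigned int *backlog_len, size_t ncpus)
{
    uint64_t mask = 0;  /* start from an explicitly cleared mask */
    size_t cpu;

    for (cpu = 0; cpu < ncpus && cpu < 64; cpu++) {
        if (backlog_len[cpu] != 0)
            mask |= (uint64_t)1 << cpu;
    }
    return mask;
}
```

A CPU dedicated to a FIFO task with no queued packets never appears in the mask, so nothing needs to run on it.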
    • Merge branch 'mlxsw-Derive-SBIB-from-maximum-port-speed-and-MTU' · 7b2d1b8d
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      mlxsw: Derive SBIB from maximum port speed & MTU
      
      Petr says:
      
      Internal buffer is a part of port headroom used for packets that are
      mirrored due to triggers that the Spectrum ASIC considers "egress". Besides
      ACL mirroring on port egress, this also includes packets mirrored due to
      ECN marking.
      
      This patchset changes the way the internal mirroring buffer is reserved.
      Currently the buffer reflects port MTU and speed accurately. In the future,
      mlxsw should support dcbnl_setbuffer hook to allow the users to set buffer
      sizes by hand. In that case, there might not be enough space for growth of
      the internal mirroring buffer due to MTU and speed changes. While vetoing
      MTU changes would be merely confusing, port speed changes cannot be vetoed,
      and such change would simply lead to issues in packet mirroring.
      
      For these reasons, with these patches the internal mirroring buffer is
      derived from maximum MTU and maximum speed achievable on the port.
      
      Patches #1 and #2 introduce a new callback to determine the maximum speed a
      given port can achieve.
      
      With patches #3 and #4, the information about, respectively, maximum MTU
      and maximum port speed, is kept in struct mlxsw_sp_port.
      
      In patch #5, maximum MTU and maximum speed are used to determine the size
      of the internal buffer. MTU update and speed update hooks are dropped,
      because they are no longer necessary.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_span: Derive SBIB from maximum port speed & MTU · 532b49e4
      Petr Machata authored
      The SBIB register configures the size of an internal buffer that the
      Spectrum ASICs use when mirroring traffic on egress. This size should be
      taken into account when validating that the port headroom buffers are not
      larger than the chip can handle. Up until now this was not done, which is
      incidentally not a problem, because the priority group buffers that mlxsw
      auto-configures are small enough that the boundary condition could not be
      violated.
      
      However, when dcbnl_setbuffer is implemented, the user has control over
      sizes of PG buffers, and they might overshoot the headroom capacity.
      Moreover, the size of the SBIB buffer depends on port speed, and that
      cannot be vetoed. Therefore SBIB size should be deduced from maximum port
      speed.
      
      Additionally, once the buffers are configured by hand, the user could get
      into an uncomfortable situation where their MTU change requests get vetoed,
      because the SBIB does not fit anymore. Therefore derive SBIB size from
      maximum permissible MTU as well.
      
      Remove all the code that adjusted the SBIB size whenever speed or MTU
      changed.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Keep maximum speed around · 3232e8c6
      Petr Machata authored
      The maximum port speed depends on link modes supported by the port, and for
      Ethernet ports is constant. The maximum speed will be handy when setting
      SBIB, the internal buffer used for traffic mirroring. Therefore, keep it in
      struct mlxsw_sp_port for easy access.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Keep maximum MTU around · 2ecf87ae
      Petr Machata authored
      The maximum port MTU depends on port type. On Spectrum, mlxsw configures
      all ports as Ethernet ports, and the maximum MTU therefore never changes.
      Besides checking MTU configuration, maximum MTU will also be handy when
      setting SBIB, the internal buffer used for traffic mirroring. Therefore,
      keep it in struct mlxsw_sp_port for easy access.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_ethtool: Introduce ptys_max_speed callback · 60fbc521
      Petr Machata authored
      The SBIB register configures the size of an internal buffer that the
      Spectrum ASICs use when mirroring traffic on egress. This size should be
      taken into account when validating that the port headroom buffers are not
      larger than the chip can handle. Up until now this was not done, which is
      incidentally not a problem, because the priority group buffers that mlxsw
      auto-configures are small enough that the boundary condition could not be
      violated.
      
      When dcbnl_setbuffer is implemented, the user gets control over sizes of PG
      buffers, and they might overshoot the headroom capacity. However the size
      of the SBIB buffer depends on port speed, which cannot be vetoed. There is
      obviously no way to retroactively push back on requests for overlarge PG
      buffers, or reject an overlarge MTU, or cancel losslessness of a certain
      PG.
      
      Therefore, instead of taking into account the current speed when
      calculating SBIB buffer size, take into account the maximum speed that a
      port with given Ethernet protocol capabilities can have.
      
      To that end, add a new ethtool callback, ptys_max_speed, which determines
      this maximum speed.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
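Determining the maximum speed from a port's protocol capabilities can be sketched as a scan over a table of link modes. This is a model under stated assumptions and not the driver's callback: the table contents, the bit assignments, and the name `model_max_speed` are hypothetical placeholders.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical link-mode table: capability bit -> speed in Mb/s. */
struct model_link_mode {
    uint32_t mask_bit;
    uint32_t speed_mbps;
};

static const struct model_link_mode model_link_modes[] = {
    { 1u << 0,   1000 },
    { 1u << 1,  10000 },
    { 1u << 2,  25000 },
    { 1u << 3, 100000 },
};

/*
 * Return the highest speed among the link modes set in cap_mask,
 * or 0 if no known mode is set. The result depends only on the
 * capabilities, not on the currently negotiated speed.
 */
uint32_t model_max_speed(uint32_t cap_mask)
{
    uint32_t max = 0;
    size_t i;

    for (i = 0; i < sizeof(model_link_modes) / sizeof(model_link_modes[0]); i++) {
        if ((cap_mask & model_link_modes[i].mask_bit) &&
            model_link_modes[i].speed_mbps > max)
            max = model_link_modes[i].speed_mbps;
    }
    return max;
}
```

Because the value is derived from capabilities alone, a buffer sized from it does not need to be recomputed when the link renegotiates to a different speed.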
    • mlxsw: spectrum_ethtool: Extract a helper to get Ethernet attributes · d24ca6c0
      Petr Machata authored
      In order to allow reusing the logic, extract from
      mlxsw_sp_port_get_link_ksettings() the code to obtain Ethernet protocol
      attributes, mlxsw_sp_port_ptys_query().
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 7952d7ed
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      40GbE Intel Wired LAN Driver Updates 2020-09-14
      
      This series contains updates to i40e driver only.
      
      Li RongQing removes binding affinity mask to a fixed CPU and sets
      prefetch of Rx buffer page to occur conditionally.
      
      Björn provides AF_XDP performance improvements by not prefetching HW
      descriptors, using 16 byte descriptors, and moving buffer allocation
      out of Rx processing loop.
      
      v2: Define prefetch_page_address in a common header for patch 2.
      Dropped previous patch 5 as it is being reworked to be more
      generalized.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>