1. 11 Feb, 2022 9 commits
    • Merge branch 'smc-optimizations' · 1ea59b5e
      David S. Miller authored
      D. Wythe says:
      
      ====================
      net/smc: Optimizing performance in short-lived scenarios
      
      This patch set aims to optimize the performance of SMC in short-lived
      connection scenarios, which is currently quite unsatisfactory.
      
      In our benchmark, we test it with the following script:
      
      ./wrk -c 10000 -t 4 -H 'Connection: Close' -d 20 http://smc-server
      
      Current performance figures look like this:
      
      Running 20s test @ http://11.213.45.6
        4 threads and 10000 connections
        4956 requests in 20.06s, 3.24MB read
        Socket errors: connect 0, read 0, write 672, timeout 0
      Requests/sec:    247.07
      Transfer/sec:    165.28KB
      
      There are many reasons for this phenomenon. This patch set doesn't
      solve all of them, but it alleviates the problem considerably.
      
      Patch 1/5 (Make smc_tcp_listen_work() independent):

      Separate smc_tcp_listen_work() from smc_listen_work() and make them
      independent of each other, so that a busy SMC handshake can no longer
      hold up the acceptance of new TCP connections. This avoids dropping
      large numbers of TCP connections once the listen queue overflows,
      which would undoubtedly raise the connection establishment time.
      
      Patch 2/5 (Limit SMC backlog connections):
      
      Since patch 1 separates smc_tcp_listen_work() from smc_listen_work(),
      TCP accept is no longer restricted. This patch puts a limit on SMC
      backlog connections, following the TCP implementation.
      
      Patch 3/5 (Limit SMC visits when handshake workqueue congested):
      
      Given the current complexity of the SMC handshake, SMC performance in
      short-lived connection scenarios is still quite poor, even though these
      may not be SMC's main use case. This patch constrains the SMC handshake
      when the handshake workqueue is congested, which in our opinion is a
      sign that SMC handshakes are piling up.
      
      Patch 4/5 (Dynamic control handshake limitation by socket options):

      This patch allows applications to dynamically control SMC handshake
      limitation. Since SMC did not support SMC-level socket options before,
      this patch also adds support for SMC's own socket options.
      
      Patch 5/5 (Add global configure for handshake limitation by netlink):

      This patch provides a way to benefit from handshake limitation without
      modifying any application code, which is quite useful for most
      existing applications.
      
      After this patch set, the performance figures look like this:
      
      Running 20s test @ http://11.213.45.6
        4 threads and 10000 connections
        693253 requests in 20.10s, 452.88MB read
      Requests/sec:  34488.13
      Transfer/sec:     22.53MB
      
      That's a substantial performance improvement: in my environment,
      Requests/sec rose from 247.07 to 34488.13, roughly 140 times.
      ---
      changelog:
      v1 -> v2:
      - fix compile warning
      - fix invalid dependencies in kconfig
      v2 -> v3:
      - correct spelling mistakes
      - remove useless variable declaration
      v3 -> v4:
      - make smc_tcp_ls_wq static
      v4 -> v5:
      - add dynamic control for SMC auto fallback by socket options
      - add global configuration for SMC auto fallback through netlink
      v5 -> v6:
      - move auto fallback to net namespace scope
      - remove auto fallback attribute in SMC_GEN_SYS_INFO
      - add independent attributes for auto fallback
      v6 -> v7:
      - fix wording and naming issues, rename 'auto fallback' to
        'handshake limitation'
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: Add global configure for handshake limitation by netlink · f9496b7c
      D. Wythe authored
      Although SMC handshake limitation can be controlled through socket
      options, applications that need it must then modify their code, which
      is quite troublesome for many existing applications. This patch allows
      the global default value of SMC handshake limitation to be modified
      through netlink, providing a way to constrain the handshake without
      modifying any application code.
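The relationship between the netlink default and the per-socket option can be sketched as a small model. This is our reading of the series, not kernel code. We assume each new SMC socket copies its network namespace's default when it is created; the changelog says the setting was moved to net namespace scope. We also assume SMC_LIMIT_HS then overrides that default per socket. The struct and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-netns default set via netlink. */
struct netns_model {
    bool limit_hs_default;
};

/* Hypothetical stand-in for an SMC socket's own setting. */
struct smc_sock_model {
    bool limit_hs;
};

/* Assumption: a new socket inherits the namespace default at creation,
 * so later netlink changes only affect sockets created afterwards. */
static void smc_sock_model_init(struct smc_sock_model *sk,
                                const struct netns_model *ns)
{
    sk->limit_hs = ns->limit_hs_default;
}

/* Per-socket override, standing in for setsockopt(SOL_SMC, SMC_LIMIT_HS). */
static void smc_sock_model_set_limit_hs(struct smc_sock_model *sk, int val)
{
    sk->limit_hs = (val != 0);
}
```

Under this model, an operator enables the default once for the namespace. A single application that does not want the limitation on one listener clears it on that socket only, and every other socket keeps the default.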
      Suggested-by: Tony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
      Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: Dynamic control handshake limitation by socket options · a6a6fe27
      D. Wythe authored
      This patch adds dynamic, per-socket control of SMC handshake
      limitation. In production environments, the same application may
      handle different service types, which may call for different SMC
      handshake limitation settings.

      This patch implements it with socket options. Since there is no socket
      option level for SMC yet, this patch has to add one at the same time.
      
      This patch does the following:
      
      - add new socket option level: SOL_SMC.
      - add new SMC socket option: SMC_LIMIT_HS.
      - provide getter/setter for SMC socket options.
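From an application's point of view, the new option level and option would be used through the standard setsockopt()/getsockopt() calls. The sketch below falls back to SOL_SMC = 286 and SMC_LIMIT_HS = 1 in case the installed headers predate this series. These values are assumptions: check them against include/linux/socket.h and include/uapi/linux/smc.h for your kernel. The helper names are hypothetical.

```c
#include <errno.h>      /* errno values (e.g. EBADF, ENOPROTOOPT) for callers */
#include <sys/socket.h>

/* Fallback values, assumed to match the kernel headers; verify them. */
#ifndef SOL_SMC
#define SOL_SMC 286
#endif
#ifndef SMC_LIMIT_HS
#define SMC_LIMIT_HS 1
#endif

/* Enable (on != 0) or disable handshake limitation on an SMC socket.
 * Returns 0 on success, -1 with errno set on failure. */
static int smc_set_limit_hs(int fd, int on)
{
    int val = on ? 1 : 0;

    return setsockopt(fd, SOL_SMC, SMC_LIMIT_HS, &val, sizeof(val));
}

/* Read the current setting into *on. Returns 0 on success, -1 with errno
 * set on failure; *on is left untouched on failure. */
static int smc_get_limit_hs(int fd, int *on)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, SOL_SMC, SMC_LIMIT_HS, &val, &len) < 0)
        return -1;
    *on = val;
    return 0;
}
```

A caller would create the listening socket with socket(AF_SMC, SOCK_STREAM, 0) and call smc_set_limit_hs(fd, 1) before listen(), which we assume is the safe order. On kernels without this series the call is expected to fail, so check the return value and errno rather than assuming success.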
      
      Link: https://lore.kernel.org/all/20f504f961e1a803f85d64229ad84260434203bd.1644323503.git.alibuda@linux.alibaba.com/
      Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: Limit SMC visits when handshake workqueue congested · 48b6190a
      D. Wythe authored
      This patch provides a mechanism to constrain incoming SMC connections
      according to the pressure on the SMC handshake process. At present,
      frequent connection attempts cause incoming connections to back up in
      the SMC handshake queue, raising the connection establishment time,
      which is unacceptable for applications based on short-lived
      connections.
      
      There are two ways to implement this mechanism:
      
      1. Put limitation after TCP established.
      2. Put limitation before TCP established.
      
      With the first approach, we need to wait for and receive the CLC
      messages that the client may send, and then actively reply with a
      decline message. In a sense, this is also a kind of SMC handshake, and
      it adds to the connection establishment time along the way.

      With the second approach, the only problem is that we need to inject
      SMC logic into TCP when it is about to reply to the incoming SYN. Since
      we already do that, this is no longer a real problem. The advantage is
      obvious: little additional processing is required to enforce the
      constraint.
      
      This patch uses the second approach. After this patch, connections
      beyond the constraint will not receive any SMC indication, and SMC will
      not be involved in any of their subsequent processing.
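The decision this commit adds at SYN time can be modelled as a pure function. This is a simplified sketch, not the kernel code:

- The kernel asks whether the handshake workqueue itself is congested. Here, congestion is approximated by a hypothetical queue-depth threshold.
- The limit_hs flag stands in for the per-socket switch (SMC_LIMIT_HS) added later in this series.

A false result corresponds to a SYN-ACK without the SMC indication, so the connection continues as plain TCP.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of a listener's handshake state. */
struct hs_state {
    size_t queued_handshakes;   /* SMC handshakes waiting to be processed */
    size_t congestion_thresh;   /* hypothetical stand-in for "workqueue congested" */
    bool limit_hs;              /* per-socket handshake limitation switch */
};

static bool hs_congested(const struct hs_state *st)
{
    return st->queued_handshakes >= st->congestion_thresh;
}

/* Decide at SYN time whether the SYN-ACK carries the SMC indication.
 * false: the peer sees a plain TCP listener and SMC stays out of the
 * connection entirely; true: the SMC handshake may follow. */
static bool syn_ack_offers_smc(const struct hs_state *st,
                               bool client_offered_smc)
{
    if (!client_offered_smc)
        return false;               /* nothing to offer back */
    if (st->limit_hs && hs_congested(st))
        return false;               /* beyond the constraint: stay plain TCP */
    return true;
}
```

The design point from the commit message shows up directly here: the cost of refusing SMC is a single branch before the SYN-ACK is built. Rejecting after TCP is established would instead require a CLC decline exchange.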
      
      Link: https://lore.kernel.org/all/1641301961-59331-1-git-send-email-alibuda@linux.alibaba.com/
      Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: Limit backlog connections · 8270d9c2
      D. Wythe authored
      The current implementation does not handle backlog semantics. One
      potential risk is that the server can be flooded by an unlimited
      number of connections, even if the client is SMC-incapable.

      This patch puts a limit on backlog connections. Following the TCP
      implementation, we divide SMC connections into two categories:
      
      1. Half SMC connections, which include all connections where TCP is
         established but SMC is not.

      2. Full SMC connections, which include all connections where SMC is
         established.

      Since every half SMC connection starts once TCP is established, we can
      limit half SMC connections by putting a limit before TCP is
      established. Following the TCP implementation, this limit is based not
      only on half SMC connections but also on full connections, so it also
      constrains full SMC connections.
      
      For full SMC connections, although we know exactly where they start,
      it's quite hard to put a limit before that point. The easiest way would
      be to block and wait before receiving the SMC confirm CLC message, but
      that step is protected by smc_server_lgr_pending, a global lock, so the
      limit would apply to the entire host instead of a single listen socket.
      Another way is to drop full connections, but considering the cost of
      SMC connections, we prefer to keep them.

      Even so, full SMC connections are still limited, through the half SMC
      connection limit described above.

      After this patch, the limits on backend connections are as follows:
      
      For SMC:
      
      1. A client with SMC capability can make at most 2 * backlog full SMC
         connections, or 1 * backlog half SMC connections plus 1 * backlog
         full SMC connections.

      2. A client without SMC capability can make at most 1 * backlog half
         TCP connections and 1 * backlog full TCP connections.
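A minimal model of the admission check described above follows. It is a sketch, not the exact kernel code: the field names are hypothetical, and the comparison operators are an assumption (a strict '>' in the style of TCP's accept-queue check). It only illustrates how half and full connections are counted before TCP is established.

```c
#include <stdbool.h>

/* Hypothetical snapshot of an SMC listener's queues. */
struct listen_model {
    unsigned int max_backlog;   /* backlog passed to listen() */
    unsigned int tcp_accept_q;  /* TCP-established, not yet picked up by SMC */
    unsigned int smc_hs_q;      /* half SMC connections (handshake in progress) */
    unsigned int smc_accept_q;  /* full SMC connections awaiting accept() */
};

/* Evaluated before TCP is established: true means drop the new SYN. */
static bool drop_new_connection(const struct listen_model *l)
{
    /* Half SMC limit, counted together with TCP-queued connections. */
    if (l->tcp_accept_q + l->smc_hs_q > l->max_backlog)
        return true;

    /* Full SMC accept queue check (assumed strict '>'). */
    if (l->smc_accept_q > l->max_backlog)
        return true;

    return false;
}
```

The check only runs when a new connection arrives. Up to a backlog's worth of half SMC connections admitted while the full queue still had room can later complete their handshakes. That is how the full SMC connection count can approach 2 * backlog in the list above.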
      Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: Make smc_tcp_listen_work() independent · 3079e342
      D. Wythe authored
      In a multithreaded benchmark with 10K connections, backend TCP
      connections are established very slowly, and many TCP connections
      remain in the SYN_SENT state.
      
      Client: smc_run wrk -c 10000 -t 4 http://server
      
      netstat on the server host shows:
          145042 times the listen queue of a socket overflowed
          145042 SYNs to LISTEN sockets dropped
      
      One reason for this issue is that smc_tcp_listen_work() shares the
      same workqueue (smc_hs_wq) with smc_listen_work(), and
      smc_listen_work() blocks while waiting for the SMC connection to be
      established. Once the workqueue becomes congested, accept() on the TCP
      listen socket is blocked as well.

      This patch creates an independent workqueue (smc_tcp_ls_wq) for
      smc_tcp_listen_work(), separating it from smc_listen_work(). This is
      quite acceptable, considering that smc_tcp_listen_work() runs very
      fast.
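The head-of-line blocking described above can be illustrated with a toy model: a FIFO workqueue served by a bounded number of workers, where every job arrives at time zero. The costs are made-up numbers, and the model ignores how kernel workqueues actually manage concurrency. It only shows why an accept job queued behind blocking handshake jobs starts late, while the same job on a dedicated queue starts immediately.

```c
#include <stddef.h>

#define MAX_WORKERS 16

/* Toy model: all jobs arrive at t = 0 and are taken in FIFO order by
 * `workers` concurrent workers; each job goes to the worker that becomes
 * free first. Returns the start time (ms) of the last job in cost_ms[],
 * or 0 for invalid arguments. */
static unsigned int last_job_start_ms(const unsigned int *cost_ms, size_t n,
                                      size_t workers)
{
    unsigned int free_at[MAX_WORKERS] = {0};
    unsigned int start = 0;

    if (n == 0 || workers == 0 || workers > MAX_WORKERS)
        return 0;

    for (size_t i = 0; i < n; i++) {
        size_t w = 0;

        /* Pick the worker that is free earliest (lowest index on ties). */
        for (size_t j = 1; j < workers; j++)
            if (free_at[j] < free_at[w])
                w = j;

        start = free_at[w];          /* job i starts when worker w frees up */
        free_at[w] += cost_ms[i];    /* worker w is busy until start + cost */
    }
    return start;
}
```

With one worker and three 100 ms handshakes ahead of it, a 1 ms accept job in the shared queue ({100, 100, 100, 1}) starts at 300 ms. On its own queue ({1}) it starts at 0 ms, which is the effect of giving smc_tcp_listen_work() its own workqueue.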
      Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dt-bindings: net: dsa: realtek: convert to YAML schema, add MDIO · 429c83c7
      Luiz Angelo Daros de Luca authored
      Schema changes:
      
      - support for mdio-connected switches (mdio driver), recognized by
        checking the presence of property "reg"
      - new compatible strings for rtl8367s and rtl8367rb
      - "interrupt-controller" was not added as a required property. The
        switch may still work without it by polling the ports.
      
      Example changes:
      
      - renamed "switch_intc" to make it unique between examples
      - removed "dsa-mdio" from mdio compatible property
      - renamed phy@0 to ethernet-phy@0 (not tested with real HW), since
        phy@ requires #phy-cells
      Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 5b91c5cc
      Jakub Kicinski authored
      No conflicts.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'net-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · f1baf68e
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from netfilter and can.
      
        Current release - new code bugs:
      
         - sparx5: fix get_stat64 out-of-bound access and crash
      
         - smc: fix netdev ref tracker misuse
      
        Previous releases - regressions:
      
         - eth: ixgbevf: require large buffers for build_skb on 82599VF, avoid
           overflows
      
         - eth: ocelot: fix all IP traffic getting trapped to CPU with PTP
           over IP
      
         - bonding: fix rare link activation misses in 802.3ad mode
      
        Previous releases - always broken:
      
         - tcp: fix tcp sock mem accounting in zero-copy corner cases
      
         - remove the cached dst when uncloning an skb dst and its metadata,
           since we only have one ref it'd lead to an UaF
      
         - netfilter:
            - conntrack: don't refresh sctp entries in closed state
            - conntrack: re-init state for retransmitted syn-ack, avoid
              connection establishment getting stuck with strange stacks
            - ctnetlink: disable helper autoassign, avoid it getting lost
            - nft_payload: don't allow transport header access for fragments
      
         - dsa: fix use of devres for mdio throughout drivers
      
         - eth: amd-xgbe: disable interrupts during pci removal
      
         - eth: dpaa2-eth: unregister netdev before disconnecting the PHY
      
         - eth: ice: fix IPIP and SIT TSO offload"
      
      * tag 'net-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits)
        net: dsa: mv88e6xxx: fix use-after-free in mv88e6xxx_mdios_unregister
        net: mscc: ocelot: fix mutex lock error during ethtool stats read
        ice: Avoid RTNL lock when re-creating auxiliary device
        ice: Fix KASAN error in LAG NETDEV_UNREGISTER handler
        ice: fix IPIP and SIT TSO offload
        ice: fix an error code in ice_cfg_phy_fec()
        net: mpls: Fix GCC 12 warning
        dpaa2-eth: unregister the netdev before disconnecting from the PHY
        skbuff: cleanup double word in comment
        net: macb: Align the dma and coherent dma masks
        mptcp: netlink: process IPv6 addrs in creating listening sockets
        selftests: mptcp: add missing join check
        net: usb: qmi_wwan: Add support for Dell DW5829e
        vlan: move dev_put into vlan_dev_uninit
        vlan: introduce vlan_dev_free_egress_priority
        ax25: fix UAF bugs of net_device caused by rebinding operation
        net: dsa: fix panic when DSA master device unbinds on shutdown
        net: amd-xgbe: disable interrupts during pci removal
        tipc: rate limit warning for received illegal binding update
        net: mdio: aspeed: Add missing MODULE_DEVICE_TABLE
        ...
  2. 10 Feb, 2022 31 commits