• Michael S. Tsirkin's avatar
    IPoIB: Connected mode experimental support · 839fcaba
    Michael S. Tsirkin authored
    The following patch adds experimental support for IPoIB connected
    mode, as defined by the draft from the IETF ipoib working group.  The
    idea is to increase performance by increasing the MTU from the maximum
    of 2K (theoretically 4K) supported by IPoIB on top of UD.  With this
    code, I'm able to get 800MByte/sec or more with netperf without
    options on a Mellanox 4x back-to-back DDR system.
    
    Some notes on code:
    1. SRQ is used for scalability to large cluster sizes
    2. Only RC connections are used (UC does not support SRQ now)
    3. Retry count is set to 0 since spec draft warns against retries
    4. Each connection is used for data transfers in only 1 direction, so
       each connection is either active(TX) or passive (RX).  2 sides that
       want to communicate create 2 connections.
    5. Each active (TX) connection has a separate CQ for send completions -
       this keeps the code simple without CQ resize and other tricks
    6. To detect stale passive side connections (where the remote side is
       down), we keep an LRU list of passive connections (updated once per
       second per connection) and destroy a connection after it has been
       unused for several seconds. The LRU rule makes it possible to avoid
       scanning connections that have recently been active.
    Signed-off-by: default avatarMichael S. Tsirkin <mst@mellanox.co.il>
    Signed-off-by: default avatarRoland Dreier <rolandd@cisco.com>
    839fcaba
ipoib_main.c 30.7 KB