    xprtrdma: Limit number of RDMA segments in RPC-over-RDMA headers · 94931746
    Chuck Lever authored
    Send buffer space is shared between the RPC-over-RDMA header and
    an RPC message. A large RPC-over-RDMA header means less space is
    available for the associated RPC message, which then has to be
    moved via an RDMA Read or Write.
    
    As more segments are added to the chunk lists, the header increases
    in size.  Typical modern hardware needs only a few segments to
    convey the maximum payload size, but some devices and registration
    modes may need many segments to convey a data payload. Sometimes
    so many are needed that the remaining space in the Send buffer is
    not enough for the RPC message. Sending such a message usually
    fails.
    
    To ensure a transport can always make forward progress, cap the
    number of RDMA segments that are allowed in chunk lists. This
    prevents less-capable devices and memory registration modes from
    consuming a large portion of the Send buffer by reducing the
    maximum data payload that can be conveyed with such devices.
    
    For now, I've chosen an arbitrary maximum of 8 RDMA segments. This
    allows a maximum-size RPC-over-RDMA header to fit comfortably within
    the current 1024-byte inline threshold, with over 700 bytes
    remaining for an inline RPC message.
    
    The current maximum data payload of NFS READ or WRITE requests is
    one megabyte. To convey that payload on a client with 4KB pages
    (256 pages in all), each of the 8 chunk segments would need to
    handle 32 or more data pages. This is well within the capabilities
    of FMR. For physical registration, where each segment covers a
    single page, the maximum payload size on platforms with 4KB pages
    is reduced to 32KB (8 segments of 4KB each).
    
    For FRWR, a device's maximum page list depth would need to be at
    least 34 to support the maximum 1MB payload. On a device with a
    smaller maximum page list depth, the maximum data payload is
    reduced accordingly.

    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Tested-by: Steve Wise <swise@opengridcomputing.com>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>