    RDMA/rxe: Fix incomplete state save in rxe_requester · 5d122db2
    Bob Pearson authored
    If a send packet is dropped by the IP layer in rxe_requester()
    the call to rxe_xmit_packet() can fail with err == -EAGAIN.
    To recover, the state of the wqe is restored to the state before
    the packet was sent so it can be resent. However, the routines
    that save and restore the state miss a significant part of the
    variable state in the wqe: the dma struct, which is used to step
    through the sge table. In addition, the state is not saved before
    the packet is built, and building the packet modifies the dma struct.
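
    The save/restore ordering described above can be sketched in plain
    user-space C. This is only an illustration, not the driver code: the
    struct layouts, the field names, and the helpers build_packet() and
    send_one() are simplified assumptions, and the transmit result is
    passed in as a parameter instead of calling rxe_xmit_packet().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the driver's structures. */
struct dma_info {
	uint32_t resid;      /* bytes of the message still to be sent */
	uint32_t cur_sge;    /* index of the current sge */
	uint32_t sge_offset; /* byte offset into the current sge */
};

struct send_wqe {
	int state;
	uint32_t first_psn;
	uint32_t last_psn;
	struct dma_info dma;
};

struct rollback {
	struct send_wqe wqe;
	uint32_t psn;
};

/* Save everything that building a packet may change, including dma. */
static void save_state(const struct send_wqe *wqe, uint32_t psn,
		       struct rollback *rb)
{
	rb->wqe.state = wqe->state;
	rb->wqe.first_psn = wqe->first_psn;
	rb->wqe.last_psn = wqe->last_psn;
	rb->wqe.dma = wqe->dma; /* the part the description says was missed */
	rb->psn = psn;
}

static void restore_state(struct send_wqe *wqe, uint32_t *psn,
			  const struct rollback *rb)
{
	wqe->state = rb->wqe.state;
	wqe->first_psn = rb->wqe.first_psn;
	wqe->last_psn = rb->wqe.last_psn;
	wqe->dma = rb->wqe.dma;
	*psn = rb->psn;
}

/* Assumed model: building a packet consumes payload and advances dma.
 * All sges are taken to have the same length, sge_len. */
static void build_packet(struct send_wqe *wqe, uint32_t *psn,
			 uint32_t payload, uint32_t sge_len)
{
	assert(payload <= wqe->dma.resid);
	wqe->dma.resid -= payload;
	wqe->dma.sge_offset += payload;
	while (wqe->dma.sge_offset >= sge_len) {
		wqe->dma.sge_offset -= sge_len;
		wqe->dma.cur_sge++;
	}
	wqe->state = 2; /* hypothetical "processing" state */
	wqe->last_psn = *psn;
	(*psn)++;
}

/* Save BEFORE building; roll back if the transmit reports -EAGAIN. */
static int send_one(struct send_wqe *wqe, uint32_t *psn, uint32_t payload,
		    uint32_t sge_len, int xmit_err)
{
	struct rollback rb;

	save_state(wqe, *psn, &rb);
	build_packet(wqe, psn, payload, sge_len);
	if (xmit_err == -EAGAIN) {
		restore_state(wqe, psn, &rb);
		return -EAGAIN;
	}
	return xmit_err;
}
```

    In this sketch, a send_one() call that ends in -EAGAIN leaves the wqe
    and psn exactly as they were before the call, so a retry rebuilds the
    packet from the same position in the sge table.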
    
    Under heavy stress testing with many QPs on a fast node sending
    large messages to a slow node, dropped packets are observed and
    the resent packets are corrupted because the dma struct was not
    restored. This patch fixes this behavior and allows the test cases
    to succeed.
    
    Fixes: 3050b998 ("IB/rxe: Fix race condition between requester and completer")
    Link: https://lore.kernel.org/r/20230721200748.4604-1-rpearsonhpe@gmail.com
    
    Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
rxe_req.c 20.8 KB