• Benjamin Coddington's avatar
    SUNRPC: Always drop the XPRT_LOCK on XPRT_CLOSE_WAIT · 2440f3ce
    Benjamin Coddington authored
    This patch is only appropriate for stable kernels v4.16 - v4.19
    
    Since commit 9b30889c ("SUNRPC: Ensure we always close the socket after
    a connection shuts down"), and until commit c544577d ("SUNRPC: Clean up
    transport write space handling"), it is possible for the NFS client to spin
    in the following tight loop:
    
    269.964083: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=0 action=call_bind [sunrpc]
    269.964083: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=0 action=call_connect [sunrpc]
    269.964083: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=0 action=call_transmit [sunrpc]
    269.964085: xprt_transmit: peer=[10.0.1.82]:2049 xid=0x761d3f77 status=-32
    269.964085: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=-32 action=call_transmit_status [sunrpc]
    269.964085: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=-32 action=call_status [sunrpc]
    269.964085: rpc_call_status: task:43@0 status=-32
    
    The issue is that the path through call_transmit_status does not release
    the XPRT_LOCK when the transmit result is -EPIPE, so the socket cannot be
    properly shut down.
    
    The below commit fixed things up in mainline by unconditionally calling
    xprt_end_transmit() and releasing the XPRT_LOCK after every pass through
    call_transmit.  However, the entirety of this commit is not appropriate for
    stable kernels because its original inclusion was part of a series that
    modifies the sunrpc code to use a different queueing model.  As a result,
    there are machinations within this patch that are not needed for a stable
    fix and will not make sense without a larger backport of the mainline
    series.
    
    In this patch, we take the slightly modified bit of the mainline patch
    below, which is to release the XPRT_LOCK on transmission error should we
    detect that the transport is waiting to close.
    
    commit c544577d upstream
    Author: Trond Myklebust <trond.myklebust@hammerspace.com>
    Date:   Mon Sep 3 23:39:27 2018 -0400
    
        SUNRPC: Clean up transport write space handling
    
        Treat socket write space handling in the same way we now treat transport
        congestion: by denying the XPRT_LOCK until the transport signals that it
        has free buffer space.
    Signed-off-by: default avatarTrond Myklebust <trond.myklebust@hammerspace.com>
    
    The original discussion of the problem is here:
    
        https://lore.kernel.org/linux-nfs/20181212135157.4489-1-dwysocha@redhat.com/T/#t
    
    This passes my usual cthon and xfstests on NFS as applied on v4.19 mainline.
    Reported-by: default avatarDave Wysochanski <dwysocha@redhat.com>
    Suggested-by: default avatarTrond Myklebust <trondmy@hammerspace.com>
    Signed-off-by: default avatarBenjamin Coddington <bcodding@redhat.com>
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    2440f3ce
clnt.c 69.2 KB