    SUNRPC: close a rare race in xs_tcp_setup_socket. · 93dc41bd
    NeilBrown authored
    We have one report of a crash in xs_tcp_setup_socket.
    The call path to the crash is:
    
      xs_tcp_setup_socket -> inet_stream_connect -> lock_sock_nested.
    
    The 'sock' passed to that last function is NULL.
    
    The only way I can see this happening is a concurrent call to
    xs_close:
    
      xs_close -> xs_reset_transport -> sock_release -> inet_release
    
    inet_release sets:
       sock->sk = NULL;
    inet_stream_connect calls
       lock_sock(sock->sk);
    which gets NULL.
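
    The ordering described above can be replayed in a small userspace
    model. This is only a sketch: model_sock, model_socket,
    model_release(), model_connect() and model_race() are hypothetical
    stand-ins, not kernel code, and the model returns -1 at the point
    where the kernel would dereference NULL in lock_sock_nested().

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's struct sock and struct socket. */
struct model_sock {
    int locked;
};

struct model_socket {
    struct model_sock *sk;
};

/* Models the one effect of inet_release() that matters here. */
static void model_release(struct model_socket *sock)
{
    sock->sk = NULL;
}

/*
 * Models the start of inet_stream_connect(), which hands sock->sk to
 * lock_sock(). Returns -1 where the kernel would oops, 0 otherwise.
 */
static int model_connect(struct model_socket *sock)
{
    struct model_sock *sk = sock->sk;

    if (sk == NULL)
        return -1;      /* lock_sock_nested() would dereference NULL here */
    sk->locked = 1;
    return 0;
}

/* Replays the bad ordering: the close path runs, then the connect worker. */
static int model_race(void)
{
    struct model_sock sk = { 0 };
    struct model_socket sock = { &sk };

    model_release(&sock);
    return model_connect(&sock);
}
```

    In the model, calling model_connect() before model_release()
    succeeds; only the reversed ordering hits the NULL pointer.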
    
    All calls to xs_close are protected by XPRT_LOCKED as are most
    activations of the workqueue which runs xs_tcp_setup_socket.
    The exception is xs_tcp_schedule_linger_timeout.
    
    So presumably the timeout queued by the latter fires exactly when some
    other code runs xs_close().
    
    To protect against this we can move the cancel_delayed_work_sync()
    call from xs_destroy() to xs_close().
    
    As xs_close is never called from the worker scheduled on
    ->connect_worker, this can never deadlock.
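
    The effect of cancelling the delayed work on the close path can be
    sketched with a single-threaded userspace model. Everything named
    model_* is hypothetical and exists only for illustration; the model
    covers the "queued but not yet running" case and does not capture
    the part of cancel_delayed_work_sync() that waits for a worker that
    is already executing.

```c
#include <stddef.h>

/* Hypothetical model of the transport state relevant to the race. */
struct model_xprt {
    void *sock;           /* non-NULL while the transport owns a socket */
    int connect_pending;  /* 1 while the delayed connect work is queued */
    int worker_saw_null;  /* set if the worker ran with no socket */
};

/* Models cancelling the queued connect work. */
static void model_cancel_work(struct model_xprt *x)
{
    x->connect_pending = 0;
}

/* Models the close path, with or without the cancel moved into it. */
static void model_close(struct model_xprt *x, int cancel_first)
{
    if (cancel_first)
        model_cancel_work(x);
    x->sock = NULL;
}

/* Models the linger timeout firing and running the connect worker. */
static void model_timer_fires(struct model_xprt *x)
{
    if (!x->connect_pending)
        return;
    x->connect_pending = 0;
    if (x->sock == NULL)
        x->worker_saw_null = 1;
}

/* Returns 1 if the worker ran against a closed transport, else 0. */
static int model_run(int cancel_first)
{
    static int dummy_socket;
    struct model_xprt x = { &dummy_socket, 1, 0 };

    model_close(&x, cancel_first);
    model_timer_fires(&x);
    return x.worker_saw_null;
}
```

    With cancel_first set to 0 the worker observes the missing socket;
    with it set to 1 the queued work is gone before the socket is
    released, so the worker never runs.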

    Signed-off-by: NeilBrown <neilb@suse.de>
    [Trond: Make it safe to call cancel_delayed_work_sync() on AF_LOCAL sockets]
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
xprtsock.c 79.9 KB