• Sowmini Varadhan's avatar
    RDS: TCP: fix race windows in send-path quiescence by rds_tcp_accept_one() · 9c79440e
    Sowmini Varadhan authored
    The send path needs to be quiesced before resetting callbacks from
    rds_tcp_accept_one(), and commit eb192840 ("RDS:TCP: Synchronize
    rds_tcp_accept_one with rds_send_xmit when resetting t_sock") achieves
    this using the c_state and RDS_IN_XMIT bit following the pattern
    used by rds_conn_shutdown(). However this leaves the possibility
    of a race window as shown in the sequence below
        take t_conn_lock in rds_tcp_conn_connect
        send outgoing syn to peer
        drop t_conn_lock in rds_tcp_conn_connect
        incoming from peer triggers rds_tcp_accept_one, conn is
    	marked CONNECTING
        wait for RDS_IN_XMIT to quiesce any rds_send_xmit threads
        call rds_tcp_reset_callbacks
        [.. race-window where incoming syn-ack can cause the conn
    	to be marked UP from rds_tcp_state_change ..]
        lock_sock called from rds_tcp_reset_callbacks, and we set
    	t_sock to null
    As soon as the conn is marked UP in the race-window above, rds_send_xmit()
    threads will proceed to rds_tcp_xmit and may encounter a null-pointer
    deref on the t_sock.
    
    Given that rds_tcp_state_change() is invoked in softirq context, whereas
    rds_tcp_reset_callbacks() is in workq context, and testing for RDS_IN_XMIT
    after lock_sock could result in a deadlock with tcp_sendmsg, this
    commit fixes the race by using a new c_state, RDS_TCP_RESETTING, which
    will prevent a transition to RDS_CONN_UP from rds_tcp_state_change().
    Signed-off-by: default avatarSowmini Varadhan <sowmini.varadhan@oracle.com>
    Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    9c79440e
rds.h 25.3 KB