    af_unix: fix 'poll for write'/connected DGRAM sockets · ec0d215f
    Rainer Weikusat authored
    For n:1 'datagram connections' (e.g. /dev/log), the unix_dgram_sendmsg
    routine implements a form of receiver-imposed flow control by
    comparing the length of the receive queue of the 'peer socket' with
    the max_ack_backlog value stored in the corresponding sock structure,
    either blocking the thread which caused the send-routine to be called
    or returning EAGAIN. This routine is used by both SOCK_DGRAM and
    SOCK_SEQPACKET sockets. The poll-implementation for these socket types
    is datagram_poll from core/datagram.c. A socket is deemed to be
    writeable by this routine when the memory presently consumed by
    datagrams owned by it is less than the configured socket send buffer
    size. This is always wrong for PF_UNIX non-stream sockets connected to
    server sockets dealing with (potentially) multiple clients if the
    abovementioned receive queue is currently considered to be full.
    'poll' will then return, indicating that the socket is writeable, but
    a subsequent write will result in EAGAIN, effectively causing a (typical)
    application to 'poll for writeability by repeated send requests with
    O_NONBLOCK set' until it has consumed its time quantum.
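
The application-side pattern described above can be sketched in userspace as follows. This is a minimal illustration, not code from the patch: the helper name send_when_writable and its parameters are hypothetical. It waits for poll to report POLLOUT, attempts a non-blocking send, and counts how often the send still fails with EAGAIN. Without a peer-aware poll routine, that counter could grow on every iteration while the peer's receive queue stays full.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of the kernel change): wait until poll()
 * reports fd as writeable, then try a non-blocking send.
 *
 * Returns the number of bytes sent, or -1 with errno set. *spurious counts
 * how many times poll() reported POLLOUT but send() still returned EAGAIN.
 */
static ssize_t send_when_writable(int fd, const void *buf, size_t len,
                                  int timeout_ms, int max_attempts,
                                  int *spurious)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };

    *spurious = 0;
    for (int i = 0; i < max_attempts; i++) {
        int rc = poll(&pfd, 1, timeout_ms);

        if (rc < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, try again */
            return -1;
        }
        if (rc == 0) {
            errno = ETIMEDOUT;      /* nothing became ready in time */
            return -1;
        }
        if (!(pfd.revents & POLLOUT)) {
            errno = EIO;            /* only POLLERR/POLLHUP/POLLNVAL set */
            return -1;
        }

        ssize_t n = send(fd, buf, len, MSG_DONTWAIT);
        if (n >= 0)
            return n;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;

        /* poll() said "writeable", send() disagreed: a wasted wakeup. */
        (*spurious)++;
    }
    errno = EAGAIN;
    return -1;
}
```

The max_attempts bound is a design choice of this sketch: it keeps the caller from spinning through its whole time quantum when poll and send disagree.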
    
    The change below uses a suitably modified variant of the datagram_poll
    routine for both types of PF_UNIX sockets, which tests if the
    recv-queue of the peer a socket is connected to is presently
    considered to be 'full' as part of the 'is this socket
    writeable'-checking code. The socket being polled is additionally
    put onto the peer_wait wait queue associated with its peer, because the
    unix_dgram_recvmsg routine does a wake up on this queue after a
    datagram was received and the 'other wakeup call' is done implicitly
    as part of skb destruction, meaning that a process blocked in poll
    because of a full peer receive queue could otherwise sleep forever
    if no datagram owned by its socket was already sitting on this queue.
    The change also adds a small (inline) helper routine named
    'unix_recvq_full', which consolidates the actual testing code (in three
    different places) into a single location.
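
The writeability logic described above can be modelled in plain userspace C. This is a simplified sketch, not the kernel code: struct model_sock and the model_* functions are hypothetical stand-ins for the kernel's sock structure and routines. The send-buffer test follows the wording of this message, and the queue test assumes "full" means the queue length exceeds max_ack_backlog. Locking, wait-queue registration on peer_wait, and any further conditions the real routine checks are omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the relevant sock fields. */
struct model_sock {
    unsigned int recv_queue_len;    /* datagrams queued for receiving    */
    unsigned int max_ack_backlog;   /* receiver-imposed queue limit      */
    unsigned int wmem_alloc;        /* send memory currently in use      */
    unsigned int sndbuf;            /* configured send buffer size       */
    const struct model_sock *peer;  /* socket this one is connected to   */
};

/* Models the consolidated 'is the receive queue full' test. */
static inline bool model_recvq_full(const struct model_sock *sk)
{
    return sk->recv_queue_len > sk->max_ack_backlog;
}

/* Models the generic check: only the sender's own memory is considered. */
static bool model_generic_writable(const struct model_sock *sk)
{
    return sk->wmem_alloc < sk->sndbuf;
}

/* Models the modified check: also refuse when the peer's queue is full. */
static bool model_dgram_writable(const struct model_sock *sk)
{
    if (!model_generic_writable(sk))
        return false;
    if (sk->peer && model_recvq_full(sk->peer))
        return false;
    return true;
}
```

With a server whose queue holds 11 datagrams against a backlog of 10, a connected client with an empty send buffer is writeable according to model_generic_writable but not according to model_dgram_writable, which is the mismatch this change addresses.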
    Signed-off-by: Rainer Weikusat <rweikusat@mssgmbh.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
af_unix.c 51.7 KB