    net, sk_msg: Clear sk_user_data pointer on clone if tagged · f1ff5ce2
    Jakub Sitnicki authored
    sk_user_data can hold a pointer to an object that is not intended to be
    shared between the parent socket and the child that gets a pointer copy on
    clone. This is the case when sk_user_data points at a reference-counted
    object, like struct sk_psock.
    
    One way to resolve it is to tag the pointer with a no-copy flag by
    repurposing its lowest bit. Based on the bit-flag value we clear the child
    sk_user_data pointer after cloning the parent socket.
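    
    As a rough userspace illustration of the tagging idea (not the kernel
    code itself), the sketch below keeps the no-copy flag in bit 0 of the
    stored pointer and clears the child's copy on clone. All names here
    (toy_sock, USER_DATA_NOCOPY, toy_clone, and so on) are hypothetical and
    chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; not the kernel's definitions. */
#define USER_DATA_NOCOPY  ((uintptr_t)1)        /* flag lives in bit 0 */
#define USER_DATA_PTRMASK (~USER_DATA_NOCOPY)   /* strips the flag bit */

struct toy_sock {
    uintptr_t user_data;  /* pointer bits plus the no-copy flag */
};

/* Store a pointer that the child is allowed to inherit. */
static void toy_set_user_data(struct toy_sock *sk, void *ptr)
{
    sk->user_data = (uintptr_t)ptr;
}

/* Store a pointer tagged as "do not copy to the child". */
static void toy_set_user_data_nocopy(struct toy_sock *sk, void *ptr)
{
    sk->user_data = (uintptr_t)ptr | USER_DATA_NOCOPY;
}

/* Return the pointer with the flag bit masked off. */
static void *toy_get_user_data(const struct toy_sock *sk)
{
    return (void *)(sk->user_data & USER_DATA_PTRMASK);
}

/* Copy the parent, then drop the pointer if it was tagged no-copy. */
static void toy_clone(struct toy_sock *child, const struct toy_sock *parent)
{
    *child = *parent;
    if (child->user_data & USER_DATA_NOCOPY)
        child->user_data = 0;
}
```

    In this sketch a tagged pointer reads back unchanged through the getter
    on the parent, while the clone sees NULL; an untagged pointer is copied
    to the clone as before.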
    
    The no-copy flag is stored in the pointer itself as opposed to externally,
    say in socket flags, to guarantee that the pointer and the flag are copied
    from parent to child socket in an atomic fashion. Parent socket state is
    subject to change while copying, because we don't hold any locks at that
    time.
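    
    To illustrate why keeping the flag inside the pointer word helps, here
    is a minimal userspace sketch using C11 atomics. It assumes a
    hypothetical toy_sock type and a hypothetical toy_clone_user_data
    helper; the kernel's actual clone path is different. The point is only
    that one load yields the pointer and its flag together, so a concurrent
    writer cannot make them disagree.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; not the kernel's definitions. */
#define USER_DATA_NOCOPY ((uintptr_t)1)

struct toy_sock {
    _Atomic uintptr_t user_data;  /* pointer and flag share one word */
};

/*
 * Take a single snapshot of the parent's word. The flag tested below
 * always belongs to the pointer value in the same snapshot, even if the
 * parent is updated concurrently and no lock is held.
 */
static void toy_clone_user_data(struct toy_sock *child,
                                struct toy_sock *parent)
{
    uintptr_t snap = atomic_load_explicit(&parent->user_data,
                                          memory_order_relaxed);

    if (snap & USER_DATA_NOCOPY)
        snap = 0;  /* tagged: the child must not inherit the pointer */

    atomic_store_explicit(&child->user_data, snap, memory_order_relaxed);
}
```

    Had the flag lived in a separate field, the copy would need two reads,
    and the parent could change between them.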
    
    This approach relies on an assumption that sk_user_data holds a pointer to
    an object aligned to at least 2 bytes. A manual audit of existing users of
    the rcu_dereference_sk_user_data helper confirms our assumption.
    
    Also, an RCU-protected sk_user_data is not likely to hold a pointer to a
    char value or a pathological case of "struct { char c; }". To be safe, warn
    if the flag bit is already set in the pointer being assigned to
    sk_user_data, to catch any future misuse.
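    
    The alignment assumption and the warning can be sketched in userspace
    as follows. The types tiny and counted and the helper
    toy_check_user_data_ptr are hypothetical, and the warning is a plain
    fprintf rather than a kernel warning macro. The sketch assumes a common
    ABI where int is aligned to at least 2 bytes.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names for illustration; not the kernel's definitions. */
#define USER_DATA_NOCOPY ((uintptr_t)1)

struct tiny    { char c; };    /* byte-aligned: may sit at an odd address */
struct counted { int refs; };  /* aligned to >= 2 bytes on common ABIs */

/* Compile-time check that bit 0 of a struct counted pointer is free. */
static_assert(alignof(struct counted) >= 2, "low pointer bit must be free");

/*
 * Return 1 and print a warning if the pointer already has the flag bit
 * set, which would make it indistinguishable from a tagged pointer.
 */
static int toy_check_user_data_ptr(const void *ptr)
{
    if ((uintptr_t)ptr & USER_DATA_NOCOPY) {
        fprintf(stderr, "warning: user data pointer %p has bit 0 set\n",
                (void *)ptr);
        return 1;
    }
    return 0;
}
```

    With two adjacent struct tiny objects in an array, one of the two
    addresses is odd and would trigger the warning, which is the misuse the
    check is meant to catch.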
    
    It is worth considering why clearing sk_user_data unconditionally is not an
    option. There exist users, DRBD, NVMe, and Xen drivers being among them,
    that rely on the pointer being copied when cloning the listening socket.
    
    Potentially, we could distinguish these users by checking if the listening
    socket has been created in kernel-space via sock_create_kern, and hence has
    the sk_kern_sock flag set. However, this is not the case for NVMe and Xen
    drivers, which create sockets without marking them as belonging to the
    kernel.
    
    Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Link: https://lore.kernel.org/bpf/20200218171023.844439-3-jakub@cloudflare.com
sock.h 73.8 KB