• John Fastabend's avatar
    bpf, sockmap: Fix incorrect fwd_alloc accounting · 144748eb
    John Fastabend authored
    Incorrect accounting fwd_alloc can result in a warning when the socket
    is torn down,
    
     [18455.319240] WARNING: CPU: 0 PID: 24075 at net/core/stream.c:208 sk_stream_kill_queues+0x21f/0x230
     [...]
     [18455.319543] Call Trace:
     [18455.319556]  inet_csk_destroy_sock+0xba/0x1f0
     [18455.319577]  tcp_rcv_state_process+0x1b4e/0x2380
     [18455.319593]  ? lock_downgrade+0x3a0/0x3a0
     [18455.319617]  ? tcp_finish_connect+0x1e0/0x1e0
     [18455.319631]  ? sk_reset_timer+0x15/0x70
     [18455.319646]  ? tcp_schedule_loss_probe+0x1b2/0x240
     [18455.319663]  ? lock_release+0xb2/0x3f0
     [18455.319676]  ? __release_sock+0x8a/0x1b0
     [18455.319690]  ? lock_downgrade+0x3a0/0x3a0
     [18455.319704]  ? lock_release+0x3f0/0x3f0
     [18455.319717]  ? __tcp_close+0x2c6/0x790
     [18455.319736]  ? tcp_v4_do_rcv+0x168/0x370
     [18455.319750]  tcp_v4_do_rcv+0x168/0x370
     [18455.319767]  __release_sock+0xbc/0x1b0
     [18455.319785]  __tcp_close+0x2ee/0x790
     [18455.319805]  tcp_close+0x20/0x80
    
    This currently happens because on redirect case we do skb_set_owner_r()
    with the original sock. This increments the fwd_alloc memory accounting
    on the original sock. Then on redirect we may push this into the queue
    of the psock we are redirecting to. When the skb is flushed from the
    queue we give the memory back to the original sock. The problem is if
    the original sock is destroyed/closed with skbs on another psocks queue
    then the original sock will not have a way to reclaim the memory before
    being destroyed. Then above warning will be thrown
    
      sockA                          sockB
    
      sk_psock_strp_read()
       sk_psock_verdict_apply()
         -- SK_REDIRECT --
         sk_psock_skb_redirect()
                                    skb_queue_tail(psock_other->ingress_skb..)
    
      sk_close()
       sock_map_unref()
         sk_psock_put()
           sk_psock_drop()
             sk_psock_zap_ingress()
    
    At this point we have torn down our own psock, but have the outstanding
    skb in psock_other. Note that SK_PASS doesn't have this problem because
    the sk_psock_drop() logic releases the skb, its still associated with
    our psock.
    
    To resolve lets only account for sockets on the ingress queue that are
    still associated with the current socket. On the redirect case we will
    check memory limits per 6fa9201a, but will omit fwd_alloc accounting
    until skb is actually enqueued. When the skb is sent via skb_send_sock_locked
    or received with sk_psock_skb_ingress memory will be claimed on psock_other.
    
    Fixes: 6fa9201a ("bpf, sockmap: Avoid returning unneeded EAGAIN when redirecting to self")
    Reported-by: default avatarAndrii Nakryiko <andrii@kernel.org>
    Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
    Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
    Link: https://lore.kernel.org/bpf/161731444013.68884.4021114312848535993.stgit@john-XPS-13-9370
    144748eb
skmsg.c 24.5 KB