• Geliang Tang's avatar
    skmsg: Skip zero length skb in sk_msg_recvmsg · f0c18025
    Geliang Tang authored
    When running BPF selftests (./test_progs -t sockmap_basic) on a Loongarch
    platform, the following kernel panic occurs:
    
      [...]
      Oops[#1]:
      CPU: 22 PID: 2824 Comm: test_progs Tainted: G           OE  6.10.0-rc2+ #18
      Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS Loongson-UDK2018
         ... ...
         ra: 90000000048bf6c0 sk_msg_recvmsg+0x120/0x560
        ERA: 9000000004162774 copy_page_to_iter+0x74/0x1c0
       CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
       PRMD: 0000000c (PPLV0 +PIE +PWE)
       EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
       ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
      ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
       BADV: 0000000000000040
       PRID: 0014c011 (Loongson-64bit, Loongson-3C5000)
      Modules linked in: bpf_testmod(OE) xt_CHECKSUM xt_MASQUERADE xt_conntrack
      Process test_progs (pid: 2824, threadinfo=0000000000863a31, task=...)
      Stack : ...
      Call Trace:
      [<9000000004162774>] copy_page_to_iter+0x74/0x1c0
      [<90000000048bf6c0>] sk_msg_recvmsg+0x120/0x560
      [<90000000049f2b90>] tcp_bpf_recvmsg_parser+0x170/0x4e0
      [<90000000049aae34>] inet_recvmsg+0x54/0x100
      [<900000000481ad5c>] sock_recvmsg+0x7c/0xe0
      [<900000000481e1a8>] __sys_recvfrom+0x108/0x1c0
      [<900000000481e27c>] sys_recvfrom+0x1c/0x40
      [<9000000004c076ec>] do_syscall+0x8c/0xc0
      [<9000000003731da4>] handle_syscall+0xc4/0x160
      Code: ...
      ---[ end trace 0000000000000000 ]---
      Kernel panic - not syncing: Fatal exception
      Kernel relocated by 0x3510000
       .text @ 0x9000000003710000
       .data @ 0x9000000004d70000
       .bss  @ 0x9000000006469400
      ---[ end Kernel panic - not syncing: Fatal exception ]---
      [...]
    
    This crash happens every time when running sockmap_skb_verdict_shutdown
    subtest in sockmap_basic.
    
    This crash is because a NULL pointer is passed to page_address() in the
    sk_msg_recvmsg(). Due to the different implementations depending on the
    architecture, page_address(NULL) will trigger a panic on Loongarch
    platform but not on x86 platform. So this bug was hidden on x86 platform
    for a while, but now it is exposed on Loongarch platform. The root cause
    is that a zero length skb (skb->len == 0) was put on the queue.
    
    This zero length skb is a TCP FIN packet, which was sent by shutdown(),
    invoked in test_sockmap_skb_verdict_shutdown():
    
    	shutdown(p1, SHUT_WR);
    
    In this case, in sk_psock_skb_ingress_enqueue(), num_sge is zero, and no
    page is put to this sge (see sg_set_page in sg_set_page), but this empty
    sge is queued into ingress_msg list.
    
    And in sk_msg_recvmsg(), this empty sge is used, and a NULL page is got by
    sg_page(sge). Pass this NULL page to copy_page_to_iter(), which passes it
    to kmap_local_page() and to page_address(), then kernel panics.
    
    To solve this, we should skip this zero length skb. So in sk_msg_recvmsg(),
    if copy is zero, that means it's a zero length skb, skip invoking
    copy_page_to_iter(). We are using the EFAULT return triggered by
    copy_page_to_iter to check for is_fin in tcp_bpf.c.
    
    Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
    Suggested-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
    Signed-off-by: default avatarGeliang Tang <tanggeliang@kylinos.cn>
    Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
    Reviewed-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
    Link: https://lore.kernel.org/bpf/e3a16eacdc6740658ee02a33489b1b9d4912f378.1719992715.git.tanggeliang@kylinos.cn
    f0c18025
skmsg.c 28.9 KB