    xsk: check IFF_UP earlier in Tx path · 1596dae2
    Maciej Fijalkowski authored
    Xsk Tx can be triggered via either the sendmsg() or the poll() syscall.
    Both paths call the common function xsk_xmit(), which contains two
    sanity checks. Pseudo code showing the two paths:
    
    __xsk_sendmsg() :                       xsk_poll():
    if (unlikely(!xsk_is_bound(xs)))        if (unlikely(!xsk_is_bound(xs)))
        return -ENXIO;                          return mask;
    if (unlikely(need_wait))                (...)
        return -EOPNOTSUPP;                 xsk_xmit()
    mark napi id
    (...)
    xsk_xmit()
    
    xsk_xmit():
    if (unlikely(!(xs->dev->flags & IFF_UP)))
    	return -ENETDOWN;
    if (unlikely(!xs->tx))
    	return -ENOBUFS;
    
    As can be observed above, in sendmsg() the napi id can be marked on an
    interface that was not brought up, because the IFF_UP check only
    happens later in xsk_xmit(). This causes a NULL pointer dereference:
    
    [31757.505631] BUG: kernel NULL pointer dereference, address: 0000000000000018
    [31757.512710] #PF: supervisor read access in kernel mode
    [31757.517936] #PF: error_code(0x0000) - not-present page
    [31757.523149] PGD 0 P4D 0
    [31757.525726] Oops: 0000 [#1] PREEMPT SMP NOPTI
    [31757.530154] CPU: 26 PID: 95641 Comm: xdpsock Not tainted 6.2.0-rc5+ #40
    [31757.536871] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
    [31757.547457] RIP: 0010:xsk_sendmsg+0xde/0x180
    [31757.551799] Code: 00 75 a2 48 8b 00 a8 04 75 9b 84 d2 74 69 8b 85 14 01 00 00 85 c0 75 1b 48 8b 85 28 03 00 00 48 8b 80 98 00 00 00 48 8b 40 20 <8b> 40 18 89 85 14 01 00 00 8b bd 14 01 00 00 81 ff 00 01 00 00 0f
    [31757.570840] RSP: 0018:ffffc90034f27dc0 EFLAGS: 00010246
    [31757.576143] RAX: 0000000000000000 RBX: ffffc90034f27e18 RCX: 0000000000000000
    [31757.583389] RDX: 0000000000000001 RSI: ffffc90034f27e18 RDI: ffff88984cf3c100
    [31757.590631] RBP: ffff88984714a800 R08: ffff88984714a800 R09: 0000000000000000
    [31757.597877] R10: 0000000000000001 R11: 0000000000000000 R12: 00000000fffffffa
    [31757.605123] R13: 0000000000000000 R14: 0000000000000003 R15: 0000000000000000
    [31757.612364] FS:  00007fb4c5931180(0000) GS:ffff88afdfa00000(0000) knlGS:0000000000000000
    [31757.620571] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [31757.626406] CR2: 0000000000000018 CR3: 000000184b41c003 CR4: 00000000007706e0
    [31757.633648] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [31757.640894] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [31757.648139] PKRU: 55555554
    [31757.650894] Call Trace:
    [31757.653385]  <TASK>
    [31757.655524]  sock_sendmsg+0x8f/0xa0
    [31757.659077]  ? sockfd_lookup_light+0x12/0x70
    [31757.663416]  __sys_sendto+0xfc/0x170
    [31757.667051]  ? do_sched_setscheduler+0xdb/0x1b0
    [31757.671658]  __x64_sys_sendto+0x20/0x30
    [31757.675557]  do_syscall_64+0x38/0x90
    [31757.679197]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
    [31757.687969] Code: 8e f6 ff 44 8b 4c 24 2c 4c 8b 44 24 20 41 89 c4 44 8b 54 24 28 48 8b 54 24 18 b8 2c 00 00 00 48 8b 74 24 10 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 3a 44 89 e7 48 89 44 24 08 e8 b5 8e f6 ff 48
    [31757.707007] RSP: 002b:00007ffd49c73c70 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
    [31757.714694] RAX: ffffffffffffffda RBX: 000055a996565380 RCX: 00007fb4c5727c16
    [31757.721939] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
    [31757.729184] RBP: 0000000000000040 R08: 0000000000000000 R09: 0000000000000000
    [31757.736429] R10: 0000000000000040 R11: 0000000000000293 R12: 0000000000000000
    [31757.743673] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
    [31757.754940]  </TASK>
    
    To fix this, make xsk_xmit() responsible only for generic Tx, with RCU
    handled accordingly, and pull the sanity checks and xs->zc handling out
    of it. Move the sanity checks into __xsk_sendmsg() and xsk_poll(), so
    that in sendmsg() they run before the napi id is marked.
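
    The reordering described above can be sketched as a small user-space
    model. This is not the kernel code: every model_* name, the struct
    layout, and the idea that the napi pointer is NULL while the interface
    is down are assumptions made for illustration only, and the exact order
    of the individual checks in the real patch may differ.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for IFF_UP; not the kernel definition. */
#define MODEL_IFF_UP 0x1u

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct model_napi {
	unsigned int napi_id;
};

struct model_dev {
	unsigned int flags;
	struct model_napi *napi;	/* assumed NULL while the device is down */
};

struct model_sock {
	bool bound;
	bool has_tx;			/* models xs->tx != NULL */
	struct model_dev *dev;
	unsigned int napi_id;
};

/* Sanity checks that both Tx entry points run before doing any work. */
static int model_tx_checks(const struct model_sock *xs)
{
	if (!xs->bound)
		return -ENXIO;
	if (!(xs->dev->flags & MODEL_IFF_UP))
		return -ENETDOWN;
	if (!xs->has_tx)
		return -ENOBUFS;
	return 0;
}

/* Models the sendmsg() path with the checks moved ahead of napi marking. */
static int model_sendmsg(struct model_sock *xs, bool need_wait)
{
	int err = model_tx_checks(xs);

	if (err)
		return err;
	if (need_wait)
		return -EOPNOTSUPP;

	/* Only reached when the device is up, so napi is non-NULL here. */
	xs->napi_id = xs->dev->napi->napi_id;

	/* The generic Tx work would follow here; it is omitted in this model. */
	return 0;
}
```

    With the checks in the old position (after the napi id is marked), a
    call on a socket whose device is down would dereference the NULL napi
    pointer in this model, which mirrors the oops shown above. With the
    checks first, the same call returns -ENETDOWN and leaves the socket
    untouched.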
    
    Fixes: ca2e1a62 ("xsk: Mark napi_id on sendmsg()")
    Fixes: 18b1ab7a ("xsk: Fix race at socket teardown")
    Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Link: https://lore.kernel.org/r/20230215143309.13145-1-maciej.fijalkowski@intel.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>