    bpf: Make bpf_refcount_acquire fallible for non-owning refs · 7793fc3b
    Dave Marchevsky authored
    This patch fixes an incorrect assumption made in the original
    bpf_refcount series [0], specifically that the BPF program calling
    bpf_refcount_acquire on some node can always guarantee that the node is
    alive. In that series, the patch adding failure behavior to rbtree_add
    and list_push_{front, back} breaks this assumption for non-owning
    references.
    
    Consider the following program:
    
      n = bpf_kptr_xchg(&mapval, NULL);
      /* skip error checking */
    
      bpf_spin_lock(&l);
      if (bpf_rbtree_add(&t, &n->rb, less)) {
        bpf_refcount_acquire(n);
        /* Failed to add, do something else with the node */
      }
      bpf_spin_unlock(&l);
    
    It's incorrect to assume that bpf_refcount_acquire will always succeed in this
    scenario. bpf_refcount_acquire is being called in a critical section
    here, but the lock being held is associated with rbtree t, which isn't
    necessarily the lock associated with the tree that the node is already
    in. So after bpf_rbtree_add fails to add the node and calls bpf_obj_drop
    on it, the program no longer owns a reference that keeps the node alive.
    Therefore the node's refcount can be decremented to 0 at any time after
    the failing rbtree_add. If this happens before the refcount_acquire
    above, the node might already be freed, and in either case
    refcount_acquire would be incrementing a 0 refcount.
    
    Later patches in the series exercise this scenario, resulting in the
    expected complaint from the kernel (without this patch's changes):
    
      refcount_t: addition on 0; use-after-free.
      WARNING: CPU: 1 PID: 207 at lib/refcount.c:25 refcount_warn_saturate+0xbc/0x110
      Modules linked in: bpf_testmod(O)
      CPU: 1 PID: 207 Comm: test_progs Tainted: G           O       6.3.0-rc7-02231-g723de1a718a2-dirty #371
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
      RIP: 0010:refcount_warn_saturate+0xbc/0x110
      Code: 6f 64 f6 02 01 e8 84 a3 5c ff 0f 0b eb 9d 80 3d 5e 64 f6 02 00 75 94 48 c7 c7 e0 13 d2 82 c6 05 4e 64 f6 02 01 e8 64 a3 5c ff <0f> 0b e9 7a ff ff ff 80 3d 38 64 f6 02 00 0f 85 6d ff ff ff 48 c7
      RSP: 0018:ffff88810b9179b0 EFLAGS: 00010082
      RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000
      RDX: 0000000000000202 RSI: 0000000000000008 RDI: ffffffff857c3680
      RBP: ffff88810027d3c0 R08: ffffffff8125f2a4 R09: ffff88810b9176e7
      R10: ffffed1021722edc R11: 746e756f63666572 R12: ffff88810027d388
      R13: ffff88810027d3c0 R14: ffffc900005fe030 R15: ffffc900005fe048
      FS:  00007fee0584a700(0000) GS:ffff88811b280000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00005634a96f6c58 CR3: 0000000108ce9002 CR4: 0000000000770ee0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       <TASK>
       bpf_refcount_acquire_impl+0xb5/0xc0
    
      (rest of output snipped)
    
    The patch addresses this by changing bpf_refcount_acquire_impl to use
    refcount_inc_not_zero instead of refcount_inc and marking
    bpf_refcount_acquire KF_RET_NULL.
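
    The difference between the two increment flavors can be sketched in
    userspace with C11 atomics. This is only an illustrative model, not the
    kernel implementation: struct model_node and every model_* function
    below are hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for a refcounted node; not the kernel's layout. */
struct model_node {
	atomic_uint refcount;
	int payload;
};

/* Increment only if the count is non-zero; report failure otherwise. */
bool model_refcount_inc_not_zero(struct model_node *n)
{
	unsigned int old = atomic_load(&n->refcount);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&n->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Fallible acquire: the node on success, NULL if it is already dead. */
struct model_node *model_refcount_acquire(struct model_node *n)
{
	return model_refcount_inc_not_zero(n) ? n : NULL;
}

/* Drop one reference; returns true if this was the last one. */
bool model_refcount_put(struct model_node *n)
{
	unsigned int old = atomic_fetch_sub(&n->refcount, 1);

	assert(old > 0); /* underflow would be a caller bug */
	return old == 1;
}
```

    In this model a caller has to NULL-check the result of
    model_refcount_acquire, since an acquire that races with the final put
    returns NULL and leaves the count at 0.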
    
    For owning references, though, we know the above scenario is not possible
    and thus that bpf_refcount_acquire will always succeed. Some verifier
    bookkeeping is added to track "is input owning ref?" for bpf_refcount_acquire
    calls and return false from is_kfunc_ret_null for bpf_refcount_acquire on
    owning refs despite it being marked KF_RET_NULL.
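
    The decision described above can be modeled roughly as follows. This is
    a simplified sketch and not the verifier code: struct model_call_meta,
    its fields, MODEL_KF_RET_NULL and model_ret_may_be_null are all
    hypothetical names, and the flag value is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the "may return NULL" kfunc flag. */
#define MODEL_KF_RET_NULL (1u << 0)

/* Hypothetical per-call metadata gathered while checking arguments. */
struct model_call_meta {
	unsigned int kfunc_flags;   /* flags the kfunc was registered with */
	bool is_refcount_acquire;   /* is this call a refcount acquire? */
	bool arg_owning_ref;        /* was the input an owning reference? */
};

/* Should the caller be forced to NULL-check the return value? */
bool model_ret_may_be_null(const struct model_call_meta *meta)
{
	assert(meta != NULL);

	/* An owning ref keeps the node alive, so acquire cannot fail. */
	if (meta->is_refcount_acquire && meta->arg_owning_ref)
		return false;

	return (meta->kfunc_flags & MODEL_KF_RET_NULL) != 0;
}
```

    With this shape, only programs that pass a non-owning reference pay
    for the extra NULL check.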
    
    Existing selftests using bpf_refcount_acquire are modified where
    necessary to NULL-check its return value.
    
      [0]: https://lore.kernel.org/bpf/20230415201811.343116-1-davemarchevsky@fb.com/
    
    Fixes: d2dcc67d ("bpf: Migrate bpf_rbtree_add and bpf_list_push_{front,back} to possibly fail")
    Reported-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
    Link: https://lore.kernel.org/r/20230602022647.1571784-5-davemarchevsky@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
refcounted_kptr.c 9.42 KB