1. 15 May, 2023 5 commits
    • bpf: Remove bpf trampoline selector · 47e79cbe
      Yafang Shao authored
      After commit e21aa341 ("bpf: Fix fexit trampoline."), the selector is only
      used to indicate how many times the bpf trampoline image has been updated, and
      it is displayed in the trampoline ksym name. After the trampoline is freed, the
      selector starts from 0 again. So the selector is a useless value to the
      user, and we can remove it.
      
      If users want to check whether the bpf trampoline image has been updated,
      they can compare the address: each time the trampoline image is
      updated, the address changes accordingly. Jiri also pointed out another
      issue: perf is still using the old name format "bpf_trampoline_%lu", so this
      change also fixes that issue in perf.
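      As a rough illustration of that address check (everything below is a hypothetical
      userspace sketch, not part of this patch; the key 12345 in the symbol name is made up):

      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>

      /* Look up a ksym's address in /proc/kallsyms. Reading it before and
       * after an update and comparing the two addresses tells whether the
       * trampoline image was re-generated. */
      static unsigned long ksym_addr(const char *name)
      {
              unsigned long addr = 0, a;
              char sym[128], *line = NULL;
              size_t len = 0;
              FILE *f = fopen("/proc/kallsyms", "r");

              if (!f)
                      return 0;
              while (getline(&line, &len, f) > 0) {
                      if (sscanf(line, "%lx %*s %127s", &a, sym) == 2 &&
                          !strcmp(sym, name)) {
                              addr = a;
                              break;
                      }
              }
              free(line);
              fclose(f);
              return addr;
      }

      int main(void)
      {
              printf("%lx\n", ksym_addr("bpf_trampoline_12345"));
              return 0;
      }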
      
      Fixes: e21aa341 ("bpf: Fix fexit trampoline.")
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <song@kernel.org>
      Cc: Jiri Olsa <olsajiri@gmail.com>
      Link: https://lore.kernel.org/bpf/ZFvOOlrmHiY9AgXE@krava
      Link: https://lore.kernel.org/bpf/20230515130849.57502-3-laoar.shao@gmail.com
    • bpf, arm64: Support struct arguments in the BPF trampoline · 90564f1e
      Florent Revest authored
      This extends the BPF trampoline JIT to support attachment to functions
      that take small structures (up to 128 bits) as arguments. This is trivially
      achieved by saving/restoring a number of "argument registers" rather
      than a number of arguments.

      AAPCS64 section 6.8.2 describes the parameter passing ABI. "Composite
      types" (like C structs) of up to 16 bytes (as enforced by the BPF verifier)
      are passed in the 8 argument registers, as explained in section C.12.
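      A minimal sketch of what this enables from the BPF side (the traced function
      do_pair() and its struct are invented for this example; BPF_PROG2 is the
      bpf_tracing.h macro intended for struct-by-value arguments):

      #include <linux/bpf.h>
      #include <linux/types.h>
      #include <bpf/bpf_helpers.h>
      #include <bpf/bpf_tracing.h>

      /* hypothetical kernel function for this sketch:
       *   struct pair { u64 a; u64 b; };          // 16 bytes == 128 bits
       *   int do_pair(struct pair p, int flags);
       */
      struct pair {
              __u64 a;
              __u64 b;
      };

      SEC("fexit/do_pair")
      int BPF_PROG2(handle_do_pair, struct pair, p, int, flags, int, ret)
      {
              bpf_printk("a=%llu b=%llu ret=%d", p.a, p.b, ret);
              return 0;
      }

      char LICENSE[] SEC("license") = "GPL";
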
      Signed-off-by: Florent Revest <revest@chromium.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Yonghong Song <yhs@fb.com>
      Acked-by: Xu Kuohai <xukuohai@huawei.com>
      Link: https://lore.kernel.org/bpf/20230511140507.514888-1-revest@chromium.org
    • bpftool: JIT limited misreported as negative value on aarch64 · 04cb8453
      Alan Maguire authored
      On aarch64, "bpftool feature" reports an incorrect BPF JIT limit:
      
      $ sudo /sbin/bpftool feature
      Scanning system configuration...
      bpf() syscall restricted to privileged users
      JIT compiler is enabled
      JIT compiler hardening is disabled
      JIT compiler kallsyms exports are enabled for root
      skipping kernel config, can't open file: No such file or directory
      Global memory limit for JIT compiler for unprivileged users is -201326592 bytes
      
      This is because /proc/sys/net/core/bpf_jit_limit reports
      
      $ sudo cat /proc/sys/net/core/bpf_jit_limit
      68169519595520
      
      ...and read_procfs() assumes an int. Change read_procfs()
      to return a long to avoid reporting a negative value.
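      A standalone sketch of the shape of the fix (this is not bpftool's literal
      code, just the same idea: parse the value as a long instead of an int):

      #include <errno.h>
      #include <stdio.h>
      #include <stdlib.h>

      static long read_procfs(const char *path)
      {
              char *endptr, *line = NULL;
              size_t len = 0;
              FILE *fd;
              long res;

              fd = fopen(path, "r");
              if (!fd)
                      return -1;
              res = getline(&line, &len, fd);
              fclose(fd);
              if (res < 0) {
                      free(line);
                      return -1;
              }

              errno = 0;
              res = strtol(line, &endptr, 10);
              if (endptr == line || *endptr != '\n' || errno != 0)
                      res = -1;
              free(line);
              return res;
      }

      int main(void)
      {
              printf("bpf_jit_limit: %ld\n",
                     read_procfs("/proc/sys/net/core/bpf_jit_limit"));
              return 0;
      }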
      
      Fixes: 7a4522bb ("tools: bpftool: add probes for /proc/ eBPF parameters")
      Reported-by: Nicky Veitch <nicky.veitch@oracle.com>
      Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Quentin Monnet <quentin@isovalent.com>
      Link: https://lore.kernel.org/bpf/20230512113134.58996-1-alan.maguire@oracle.com
    • bpf: fix calculation of subseq_idx during precision backtracking · d84b1a67
      Andrii Nakryiko authored
      Subsequent instruction index (subseq_idx) is the index of the instruction
      that was verified/executed by the verifier after the currently processed
      instruction. It is maintained during precision backtracking processing
      and is used to detect various subprog-calling conditions.

      This patch fixes a bug where subseq_idx was incorrectly reset to -1
      when going from a child state to its parent state during backtracking. If we
      don't maintain a correct subseq_idx we can misidentify subprog calls,
      leading to precision tracking bugs.

      One such case was triggered by the test_global_funcs/global_func9 test, where
      a global subprog call happened to be the very last instruction in the parent
      state, leading to subseq_idx == -1 and triggering the WARN_ONCE:
      
        [   36.045754] verifier backtracking bug
        [   36.045764] WARNING: CPU: 13 PID: 2073 at kernel/bpf/verifier.c:3503 __mark_chain_precision+0xcc6/0xde0
        [   36.046819] Modules linked in: aesni_intel(E) crypto_simd(E) cryptd(E) kvm_intel(E) kvm(E) irqbypass(E) i2c_piix4(E) serio_raw(E) i2c_core(E) crc32c_intel)
        [   36.048040] CPU: 13 PID: 2073 Comm: test_progs Tainted: G        W  OE      6.3.0-07976-g4d585f48-dirty #972
        [   36.048783] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
        [   36.049648] RIP: 0010:__mark_chain_precision+0xcc6/0xde0
        [   36.050038] Code: 3d 82 c6 05 bb 35 32 02 01 e8 66 21 ec ff 0f 0b b8 f2 ff ff ff e9 30 f5 ff ff 48 c7 c7 f3 61 3d 82 4c 89 0c 24 e8 4a 21 ec ff <0f> 0b 4c0
      
      With the fix, precision tracking across multiple states now works correctly:
      
      mark_precise: frame0: last_idx 45 first_idx 38 subseq_idx -1
      mark_precise: frame0: regs=r8 stack= before 44: (61) r7 = *(u32 *)(r10 -4)
      mark_precise: frame0: regs=r8 stack= before 43: (85) call pc+41
      mark_precise: frame0: regs=r8 stack= before 42: (07) r1 += -48
      mark_precise: frame0: regs=r8 stack= before 41: (bf) r1 = r10
      mark_precise: frame0: regs=r8 stack= before 40: (63) *(u32 *)(r10 -48) = r1
      mark_precise: frame0: regs=r8 stack= before 39: (b4) w1 = 0
      mark_precise: frame0: regs=r8 stack= before 38: (85) call pc+38
      mark_precise: frame0: parent state regs=r8 stack=:  R0_w=scalar() R1_w=map_value(off=4,ks=4,vs=8,imm=0) R6=1 R7_w=scalar() R8_r=P0 R10=fpm
      mark_precise: frame0: last_idx 36 first_idx 28 subseq_idx 38
      mark_precise: frame0: regs=r8 stack= before 36: (18) r1 = 0xffff888104f2ed14
      mark_precise: frame0: regs=r8 stack= before 35: (85) call pc+33
      mark_precise: frame0: regs=r8 stack= before 33: (18) r1 = 0xffff888104f2ed10
      mark_precise: frame0: regs=r8 stack= before 32: (85) call pc+36
      mark_precise: frame0: regs=r8 stack= before 31: (07) r1 += -4
      mark_precise: frame0: regs=r8 stack= before 30: (bf) r1 = r10
      mark_precise: frame0: regs=r8 stack= before 29: (63) *(u32 *)(r10 -4) = r7
      mark_precise: frame0: regs=r8 stack= before 28: (4c) w7 |= w0
      mark_precise: frame0: parent state regs=r8 stack=:  R0_rw=scalar() R6=1 R7_rw=scalar() R8_rw=P0 R10=fp0 fp-48_r=mmmmmmmm
      mark_precise: frame0: last_idx 27 first_idx 16 subseq_idx 28
      mark_precise: frame0: regs=r8 stack= before 27: (85) call pc+31
      mark_precise: frame0: regs=r8 stack= before 26: (b7) r1 = 0
      mark_precise: frame0: regs=r8 stack= before 25: (b7) r8 = 0
      
      Note how subseq_idx starts out as -1, then is preserved as 38 and then 28 as we
      go up the parent state chain.
      Reported-by: Alexei Starovoitov <ast@kernel.org>
      Fixes: fde2a388 ("bpf: support precision propagation in the presence of subprogs")
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230515180710.1535018-1-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Remove anonymous union in bpf_kfunc_call_arg_meta · 4d585f48
      Dave Marchevsky authored
      For kfuncs like bpf_obj_drop and bpf_refcount_acquire - which take
      user-defined types as input - the verifier needs to track the specific
      type passed in when checking a particular kfunc call. This requires
      tracking a (btf, btf_id) tuple. In commit 7c50b1cb
      ("bpf: Add bpf_refcount_acquire kfunc") I added an anonymous union with
      inner structs named after the specific kfuncs tracking this information,
      with the goal of making it more obvious which kfunc this data was being
      tracked / expected to be tracked on behalf of.
      
      In a recent series adding a new user of this tuple, Alexei mentioned
      that he didn't like this union usage as it doesn't really help with
      readability or bug-proofing ([0]). In an offline conversation we agreed to
      make the tuple plain fields (arg_btf, arg_btf_id), with comments in the
      bpf_kfunc_call_arg_meta definition enumerating the uses of the fields by
      kfunc-specific handling logic. Such a pattern is used by struct
      bpf_reg_state without trouble.
      
      Accordingly, this patch removes the anonymous union in favor of arg_btf
      and arg_btf_id fields, with a comment enumerating their current uses. The
      patch also removes struct btf_and_id, which was only used by the
      removed union's inner structs.

      This is a mechanical change; the existing linked_list and rbtree tests will
      validate that the correct (btf, btf_id) is being passed.
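      The resulting shape is roughly the following (a simplified kernel-side sketch
      of just the relevant fields, not the full verifier struct):

      struct bpf_kfunc_call_arg_meta {
              /* ... */
              /* arg_{btf,btf_id} record the type a kfunc argument refers to,
               * e.g. bpf_obj_drop tracks the type of the object being dropped
               * and bpf_refcount_acquire the type whose refcount is taken; see
               * the comment in the source for the authoritative list of users.
               */
              struct btf *arg_btf;
              u32 arg_btf_id;
              /* ... */
      };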
      
        [0]: https://lore.kernel.org/bpf/20230505021707.vlyiwy57vwxglbka@dhcp-172-26-102-232.dhcp.thefacebook.com
      Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
      Link: https://lore.kernel.org/r/20230510213047.1633612-1-davemarchevsky@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  2. 14 May, 2023 1 commit
    • Merge branch 'bpf: Don't EFAULT for {g,s}setsockopt with wrong optlen' · 79b3604d
      Martin KaFai Lau authored
      Stanislav Fomichev says:
      
      ====================
      An optval larger than PAGE_SIZE leads to EFAULT if the BPF program
      isn't careful enough. This is often overlooked and might break
      completely unrelated socket options. Instead of returning EFAULT,
      let's ignore the BPF program's buffer changes. See the first patch for
      more info.

      In addition, clearly document this corner case and reset optlen
      in our selftests (in case somebody copy-pastes from them).
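      The pattern being documented (and mirrored in the selftests) looks roughly like
      the sketch below; the hard-coded 4096 stands in for the page size here only,
      which real code should not hard-code (see the v3 note below):

      #include <linux/bpf.h>
      #include <bpf/bpf_helpers.h>

      #define OPTVAL_COPY_MAX 4096 /* assumed page size, for this sketch only */

      SEC("cgroup/getsockopt")
      int getsockopt_no_efault(struct bpf_sockopt *ctx)
      {
              /* The kernel exposes at most a page of optval to BPF. If the
               * user buffer is bigger, don't pretend we changed it: reset
               * optlen to 0 and let the original data pass through. */
              if (ctx->optlen > OPTVAL_COPY_MAX) {
                      ctx->optlen = 0;
                      return 1;
              }

              /* ... otherwise inspect/adjust ctx->optval..ctx->optval_end ... */
              return 1;
      }

      char _license[] SEC("license") = "GPL";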
      
      v6:
      - no changes; resending due to screwing up v5 series with the unrelated
        patch
      
      v5:
      - goto in the selftest (Martin)
      - set IP_TOS to zero to avoid endianness complications (Martin)
      
      v4:
      - ignore retval as well when optlen > PAGE_SIZE (Martin)
      
      v3:
      - don't hard-code PAGE_SIZE (Martin)
      - reset orig_optlen in getsockopt when kernel part succeeds (Martin)
      ====================
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
  3. 13 May, 2023 4 commits
  4. 12 May, 2023 3 commits
    • libbpf: fix offsetof() and container_of() to work with CO-RE · bdeeed34
      Andrii Nakryiko authored
      It seems like __builtin_offsetof() doesn't preserve CO-RE field
      relocations properly. So if the offsetof() macro is defined through
      __builtin_offsetof(), CO-RE-enabled BPF code using container_of() will be
      subtly and silently broken.

      To avoid this problem, redefine offsetof() and container_of() in a
      form that works with CO-RE relocations more reliably.
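      The redefinition is along these lines (a sketch in the spirit of the
      bpf_helpers.h change, using plain pointer arithmetic that stays visible to
      CO-RE):

      #ifndef offsetof
      #define offsetof(TYPE, MEMBER)  ((unsigned long)&((TYPE *)0)->MEMBER)
      #endif

      #ifndef container_of
      #define container_of(ptr, type, member)                         \
              ({                                                      \
                      void *__mptr = (void *)(ptr);                   \
                      ((type *)(__mptr - offsetof(type, member)));    \
              })
      #endif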
      
      Fixes: 5fbc2208 ("tools/libpf: Add offsetof/container_of macro in bpf_helpers.h")
      Reported-by: Lennart Poettering <lennart@poettering.net>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20230509065502.2306180-1-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Address KCSAN report on bpf_lru_list · ee9fd0ac
      Martin KaFai Lau authored
      KCSAN reported a data race when accessing node->ref.
      Although node->ref does not have to be accurate,
      take this chance to use the more common READ_ONCE() and WRITE_ONCE()
      pattern instead of data_race().

      There are existing bpf_lru_node_is_ref() and bpf_lru_node_set_ref() helpers.
      This patch also adds bpf_lru_node_clear_ref() to do the
      WRITE_ONCE(node->ref, 0).
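      Kernel-side sketch of the resulting accessor shape (simplified, not the
      literal patch):

      /* node->ref is only a hint, so the intentional race is kept but made
       * explicit with READ_ONCE()/WRITE_ONCE() instead of plain accesses */
      static bool bpf_lru_node_is_ref(const struct bpf_lru_node *node)
      {
              return READ_ONCE(node->ref);
      }

      static void bpf_lru_node_set_ref(struct bpf_lru_node *node)
      {
              WRITE_ONCE(node->ref, 1);
      }

      static void bpf_lru_node_clear_ref(struct bpf_lru_node *node)
      {
              WRITE_ONCE(node->ref, 0);
      }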
      
      ==================================================================
      BUG: KCSAN: data-race in __bpf_lru_list_rotate / __htab_lru_percpu_map_update_elem
      
      write to 0xffff888137038deb of 1 bytes by task 11240 on cpu 1:
      __bpf_lru_node_move kernel/bpf/bpf_lru_list.c:113 [inline]
      __bpf_lru_list_rotate_active kernel/bpf/bpf_lru_list.c:149 [inline]
      __bpf_lru_list_rotate+0x1bf/0x750 kernel/bpf/bpf_lru_list.c:240
      bpf_lru_list_pop_free_to_local kernel/bpf/bpf_lru_list.c:329 [inline]
      bpf_common_lru_pop_free kernel/bpf/bpf_lru_list.c:447 [inline]
      bpf_lru_pop_free+0x638/0xe20 kernel/bpf/bpf_lru_list.c:499
      prealloc_lru_pop kernel/bpf/hashtab.c:290 [inline]
      __htab_lru_percpu_map_update_elem+0xe7/0x820 kernel/bpf/hashtab.c:1316
      bpf_percpu_hash_update+0x5e/0x90 kernel/bpf/hashtab.c:2313
      bpf_map_update_value+0x2a9/0x370 kernel/bpf/syscall.c:200
      generic_map_update_batch+0x3ae/0x4f0 kernel/bpf/syscall.c:1687
      bpf_map_do_batch+0x2d9/0x3d0 kernel/bpf/syscall.c:4534
      __sys_bpf+0x338/0x810
      __do_sys_bpf kernel/bpf/syscall.c:5096 [inline]
      __se_sys_bpf kernel/bpf/syscall.c:5094 [inline]
      __x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5094
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      read to 0xffff888137038deb of 1 bytes by task 11241 on cpu 0:
      bpf_lru_node_set_ref kernel/bpf/bpf_lru_list.h:70 [inline]
      __htab_lru_percpu_map_update_elem+0x2f1/0x820 kernel/bpf/hashtab.c:1332
      bpf_percpu_hash_update+0x5e/0x90 kernel/bpf/hashtab.c:2313
      bpf_map_update_value+0x2a9/0x370 kernel/bpf/syscall.c:200
      generic_map_update_batch+0x3ae/0x4f0 kernel/bpf/syscall.c:1687
      bpf_map_do_batch+0x2d9/0x3d0 kernel/bpf/syscall.c:4534
      __sys_bpf+0x338/0x810
      __do_sys_bpf kernel/bpf/syscall.c:5096 [inline]
      __se_sys_bpf kernel/bpf/syscall.c:5094 [inline]
      __x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5094
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      value changed: 0x01 -> 0x00
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 11241 Comm: syz-executor.3 Not tainted 6.3.0-rc7-syzkaller-00136-g6a66fdd2 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/30/2023
      ==================================================================
      
      Reported-by: syzbot+ebe648a84e8784763f82@syzkaller.appspotmail.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20230511043748.1384166-1-martin.lau@linux.dev
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Add --skip_encoding_btf_inconsistent_proto, --btf_gen_optimized to pahole flags for v1.25 · 7b99f759
      Alan Maguire authored
      pahole v1.25 supports filtering out functions with multiple inconsistent
      function prototypes or optimized-out parameters from the BTF representation.
      These present problems because there is no additional info in BTF saying which
      inconsistent prototype matches which function instance to help guide attachment,
      and functions with optimized-out parameters can lead to incorrect assumptions
      about register contents.

      So for now, filter out such functions while adding BTF representations for
      functions that have "."-suffixes (e.g. foo.isra.0) but no optimized-out
      parameters. This patch assumes that the below-linked changes land in pahole
      for v1.25.

      Issues with pahole filtering being too aggressive in removing functions
      appear to be resolved now, but CI and further testing will confirm.
      Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: https://lore.kernel.org/r/20230510130241.1696561-1-alan.maguire@oracle.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  5. 06 May, 2023 9 commits
  6. 05 May, 2023 14 commits
    • samples/bpf: Fix buffer overflow in tcp_basertt · f4dea968
      Pengcheng Yang authored
      Using sizeof(nv) or strlen(nv)+1 as the option length is correct.
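      A minimal sockops sketch of the corrected call shape (not the full sample;
      SOL_TCP/TCP_CONGESTION are defined locally just to keep the snippet
      self-contained):

      #include <linux/bpf.h>
      #include <bpf/bpf_helpers.h>

      #ifndef SOL_TCP
      #define SOL_TCP 6
      #endif
      #ifndef TCP_CONGESTION
      #define TCP_CONGESTION 13
      #endif

      SEC("sockops")
      int basertt_sketch(struct bpf_sock_ops *skops)
      {
              char nv[] = "nv";

              /* pass the option string's own size (strlen + NUL terminator),
               * never the size of some larger scratch buffer, so the helper
               * does not read past the end of nv */
              bpf_setsockopt(skops, SOL_TCP, TCP_CONGESTION, nv, sizeof(nv));
              return 1;
      }

      char _license[] SEC("license") = "GPL";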
      
      Fixes: c890063e ("bpf: sample BPF_SOCKET_OPS_BASE_RTT program")
      Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
      Link: https://lore.kernel.org/r/1683276658-2860-1-git-send-email-yangpc@wangsu.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf, docs: Update llvm_relocs.rst with typo fixes · 69535186
      Will Hawkins authored
      Correct a few typographical errors and fix some mistakes in examples.
      Signed-off-by: Will Hawkins <hawkinsw@obs.cr>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20230428023015.1698072-2-hawkinsw@obs.cr
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Merge branch 'Add precision propagation for subprogs and callbacks' · fbc0b025
      Alexei Starovoitov authored
      Andrii Nakryiko says:
      
      ====================
      As more and more real-world BPF programs become more complex
      and increasingly use subprograms (both static and global), scalar precision
      tracking and its (previously weak) support for BPF subprograms (and callbacks
      as a special case of that) is becoming more and more of a limitation.
      Couple that with increasing reliance on state equivalence (BPF
      open-coded iterators have a hard requirement on state equivalence for loops to
      converge and validate successfully), and it becomes pretty critical to address
      this limitation and make precision tracking universally supported for BPF
      programs of any complexity and composition.

      This patch set teaches the BPF verifier to support SCALAR precision
      backpropagation across multiple frames (for subprogram calls and callback
      simulations) and addresses most practical situations (SCALAR stack
      loads/stores using registers other than r10 are the last remaining
      limitation, though thankfully rarely used in practice).
      
      The main logic is explained in detail in patch #8. The rest are preliminary
      preparations, refactorings, clean-ups, and fixes. See the respective patches
      for details.

      Patch #8 also has a veristat comparison of results for selftests, Cilium, and
      some of Meta's production BPF programs before and after these changes.
      
      v2->v3:
        - drop bitcnt and ifs from bt_xxx() helpers (Alexei);
      v1->v2:
        - addressed review feedback from Alexei, adjusted commit messages, comments,
          added verbose(), WARN_ONCE(), etc;
        - re-ran all the tests and veristat on selftests, cilium, and meta-internal
          code: no new changes and no kernel warnings.
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: revert iter test subprog precision workaround · c91ab90c
      Andrii Nakryiko authored
      Now that precision propagation is fully supported in the presence of
      subprogs, there is no need to work around the iter test. Revert the
      original workaround.

      This reverts be7dbd27 ("selftests/bpf: avoid mark_all_scalars_precise() trigger in one of iter tests").
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230505043317.3629845-11-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: add precision propagation tests in the presence of subprogs · 3ef3d217
      Andrii Nakryiko authored
      Add a bunch of tests validating verifier's precision backpropagation
      logic in the presence of subprog calls and/or callback-calling
      helpers/kfuncs.
      
      We validate the following conditions:
        - subprog_result_precise: static subprog r0 result precision handling;
        - global_subprog_result_precise: global subprog r0 precision
          shortcutting, similar to BPF helper handling;
        - callback_result_precise: similarly, r0 marked precise for
          callback-calling helpers;
        - parent_callee_saved_reg_precise, parent_callee_saved_reg_precise_global:
          propagation of precision for callee-saved registers bypassing
          static/global subprogs;
        - parent_callee_saved_reg_precise_with_callback: same as above, but in
          the presence of callback-calling helper;
        - parent_stack_slot_precise, parent_stack_slot_precise_global:
          similar to above, but instead propagating precision of stack slot
          (spilled SCALAR reg);
        - parent_stack_slot_precise_with_callback: same as above, but in the
          presence of callback-calling helper;
        - subprog_arg_precise: propagation of precision of static subprog's
          input argument back to caller;
        - subprog_spill_into_parent_stack_slot_precise: negative test
          validating that the verifier currently can't support backtracking of stack
          access with a non-r10 register; we validate that we fall back to
          forcing precision for all SCALARs.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230505043317.3629845-10-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: support precision propagation in the presence of subprogs · fde2a388
      Andrii Nakryiko authored
      Add support for precision backtracking in the presence of subprogram frames
      in jump history.

      This means supporting a few different kinds of subprogram invocation
      situations, all requiring slightly different handling in the precision
      backtracking logic:
        - static subprogram calls;
        - global subprogram calls;
        - callback-calling helpers/kfuncs.
      
      For each of those we need to handle a few precision propagation cases:
        - what to do with precision of subprog returns (r0);
        - what to do with precision of input arguments;
        - for all of them, callee-saved registers in the caller function should be
          propagated while ignoring the subprog/callback part of jump history.
      
      N.B. Async callback-calling helpers (currently only
      bpf_timer_set_callback()) are transparent to all this because they set up
      a separate async callback environment, so the callback's history is not
      shared with the main program's history. As far as the changes in this
      commit go, such a helper is just a regular helper.
      
      Let's look at all these situations in more detail. Let's start with
      a static subprogram being called, using an excerpt of a simple main
      program and its static subprog, indenting the subprog's frame slightly to
      make everything clear.
      
      frame 0				frame 1			precision set
      =======				=======			=============
      
       9: r6 = 456;
      10: r1 = 123;						fr0: r6
      11: call pc+10;						fr0: r1, r6
      				22: r0 = r1;		fr0: r6;     fr1: r1
      				23: exit		fr0: r6;     fr1: r0
      12: r1 = <map_pointer>					fr0: r0, r6
      13: r1 += r0;						fr0: r0, r6
      14: r1 += r6;						fr0: r6
      15: exit
      
      As can be seen above, the main function passes 123 as the single argument to
      an identity (`return x;`) subprog. The returned value is used to adjust the map
      pointer offset, which forces r0 to be marked as precise. Then
      instruction #14 does the same for callee-saved r6, which has to be
      backtracked all the way to instruction #9. For brevity, the precision sets
      for instructions #13 and #14 are combined in the diagram above.
      
      First, for subprog calls, r0 returned from the subprog (in frame 0) has to
      go into the subprog's frame 1 and should be cleared from frame 0. So we go
      back into the subprog's frame knowing we need to mark r0 precise. We then
      see that insn #22 sets r0 from r1, so now we care about marking r1
      precise. When we pop up from the subprog's frame back into the caller at
      insn #11 we keep r1, as it's an argument-passing register, so we eventually
      find `10: r1 = 123;` and satisfy the precision propagation chain for insn #13.
      
      This example demonstrates two sets of rules:
        - r0 returned after subprog call has to be moved into subprog's r0 set;
        - *static* subprog arguments (r1-r5) are moved back to caller precision set.
      
      Let's look at what happens with callee-saved precision propagation. Insn #14
      marks r6 as precise. When we get into the subprog's frame, we keep r6 in
      frame 0's precision set *only*. The subprog itself has its own set of
      independent r6-r10 registers and is not affected. When we eventually
      make our way out of the subprog's frame we keep r6 in the precision set until
      we reach `9: r6 = 456;`, satisfying propagation. r6-r10 propagation is
      perhaps the simplest aspect: it always stays in its original frame.
      
      That's pretty much all we have to do to support precision propagation
      across *static subprog* invocation.
      
      Let's look at what happens when we have global subprog invocation.
      
      frame 0				frame 1			precision set
      =======				=======			=============
      
       9: r6 = 456;
      10: r1 = 123;						fr0: r6
      11: call pc+10; # global subprog			fr0: r6
      12: r1 = <map_pointer>					fr0: r0, r6
      13: r1 += r0;						fr0: r0, r6
      14: r1 += r6;						fr0: r6;
      15: exit
      
      Starting from insn #13, r0 has to be precise. We backtrack all the way
      to insn #11 (call pc+10) and see that the subprog is global, so it was already
      validated in isolation. As opposed to a static subprog, a global subprog
      always returns an unknown scalar in r0, so that satisfies precision
      propagation and we drop r0 from the precision set. We are done with insn #13.
      
      Now for insn #14. r6 is in the precision set; we backtrack to `call pc+10;`.
      Here we need to recognize that this is effectively both an exit from and an
      entry to the global subprog, which means we stay in the caller's frame. So we
      carry on with r6 still in the precision set, until we satisfy it at insn #9.
      The only hard part with global subprogs is just knowing when it's a global func.
      
      Lastly, callback-calling helpers and kfuncs do simulate subprog calls,
      so jump history will have subprog instructions in between the caller
      program's instructions, but the rules for propagating r0 and r1-r5
      differ, because we don't actually call the callback directly. We actually
      call the helper/kfunc, which at runtime will call the subprog, so the only
      difference from normal helper/kfunc handling is that we need to make
      sure to skip the callback-simulating part of jump history.
      Let's look at an example to make this clearer.
      
      frame 0				frame 1			precision set
      =======				=======			=============
      
       8: r6 = 456;
       9: r1 = 123;						fr0: r6
      10: r2 = &callback;					fr0: r6
      11: call bpf_loop;					fr0: r6
      				22: r0 = r1;		fr0: r6      fr1:
      				23: exit		fr0: r6      fr1:
      12: r1 = <map_pointer>					fr0: r0, r6
      13: r1 += r0;						fr0: r0, r6
      14: r1 += r6;						fr0: r6;
      15: exit
      
      Again, insn #13 forces r0 to be precise. As soon as we get to `23: exit`
      we see that this isn't actually a static subprog call (it's a `call
      bpf_loop;` helper call instead). So we clear r0 from the precision set.
      
      For callee-saved registers, there is no difference: r6 stays in frame 0's
      precision set; we go through insns #22 and #23, ignoring them until we
      get back to caller frame 0, eventually satisfying the precision backtracking
      logic at insn #8 (`r6 = 456;`).
      
      Assuming callback needed to set r0 as precise at insn #23, we'd
      backtrack to insn #22, switching from r0 to r1, and then at the point
      when we pop back to frame 0 at insn #11, we'll clear r1-r5 from
      precision set, as we don't really do a subprog call directly, so there
      is no input argument precision propagation.
      
      That's pretty much it. With these changes, it seems like the only still
      unsupported situation for precision backpropagation is the case when the
      program accesses the stack through registers other than r10. This is
      still left as an unsupported (though rare) case for now.
      
      As for the results: for selftests, there are a few positive changes for bigger
      programs, with cls_redirect in its dynptr variant benefitting the most:
      
      [vmuser@archvm bpf]$ ./veristat -C ~/subprog-precise-before-results.csv ~/subprog-precise-after-results.csv -f @veristat.cfg -e file,prog,insns -f 'insns_diff!=0'
      File                                      Program        Insns (A)  Insns (B)  Insns     (DIFF)
      ----------------------------------------  -------------  ---------  ---------  ----------------
      pyperf600_bpf_loop.bpf.linked1.o          on_event            2060       2002      -58 (-2.82%)
      test_cls_redirect_dynptr.bpf.linked1.o    cls_redirect       15660       2914  -12746 (-81.39%)
      test_cls_redirect_subprogs.bpf.linked1.o  cls_redirect       61620      59088    -2532 (-4.11%)
      xdp_synproxy_kern.bpf.linked1.o           syncookie_tc      109980      86278  -23702 (-21.55%)
      xdp_synproxy_kern.bpf.linked1.o           syncookie_xdp      97716      85147  -12569 (-12.86%)
      
      Cilium programs don't really regress. They don't use subprogs and are
      mostly unaffected, but some other fixes and improvements could have
      changed something. This doesn't appear to be the case:
      
      [vmuser@archvm bpf]$ ./veristat -C ~/subprog-precise-before-results-cilium.csv ~/subprog-precise-after-results-cilium.csv -e file,prog,insns -f 'insns_diff!=0'
      File           Program                         Insns (A)  Insns (B)  Insns (DIFF)
      -------------  ------------------------------  ---------  ---------  ------------
      bpf_host.o     tail_nodeport_nat_ingress_ipv6       4983       5003  +20 (+0.40%)
      bpf_lxc.o      tail_nodeport_nat_ingress_ipv6       4983       5003  +20 (+0.40%)
      bpf_overlay.o  tail_nodeport_nat_ingress_ipv6       4983       5003  +20 (+0.40%)
      bpf_xdp.o      tail_handle_nat_fwd_ipv6            12475      12504  +29 (+0.23%)
      bpf_xdp.o      tail_nodeport_nat_ingress_ipv6       6363       6371   +8 (+0.13%)
      
      Looking at (somewhat anonymized) Meta production programs, we see mostly
      insignificant variation in number of instructions, with one program
      (syar_bind6_protect6) benefitting the most at -17%.
      
      [vmuser@archvm bpf]$ ./veristat -C ~/subprog-precise-before-results-fbcode.csv ~/subprog-precise-after-results-fbcode.csv -e prog,insns -f 'insns_diff!=0'
      Program                   Insns (A)  Insns (B)  Insns     (DIFF)
      ------------------------  ---------  ---------  ----------------
      on_request_context_event        597        585      -12 (-2.01%)
      read_async_py_stack           43789      43657     -132 (-0.30%)
      read_sync_py_stack            35041      37599    +2558 (+7.30%)
      rrm_usdt                        946        940       -6 (-0.63%)
      sysarmor_inet6_bind           28863      28249     -614 (-2.13%)
      sysarmor_inet_bind            28845      28240     -605 (-2.10%)
      syar_bind4_protect4          154145     147640    -6505 (-4.22%)
      syar_bind6_protect6          165242     137088  -28154 (-17.04%)
      syar_task_exit_setgid         21289      19720    -1569 (-7.37%)
      syar_task_exit_setuid         21290      19721    -1569 (-7.37%)
      do_uprobe                     19967      19413     -554 (-2.77%)
      tw_twfw_ingress              215877     204833   -11044 (-5.12%)
      tw_twfw_tc_in                215877     204833   -11044 (-5.12%)
      
      But checking duration (wall clock) differences, that is, the actual time taken
      by the verifier to validate programs, we see sometimes dramatic improvements,
      all the way to about a 16x improvement:
      
      [vmuser@archvm bpf]$ ./veristat -C ~/subprog-precise-before-results-meta.csv ~/subprog-precise-after-results-meta.csv -e prog,duration -s duration_diff^ | head -n20
      Program                                   Duration (us) (A)  Duration (us) (B)  Duration (us) (DIFF)
      ----------------------------------------  -----------------  -----------------  --------------------
      tw_twfw_ingress                                     4488374             272836    -4215538 (-93.92%)
      tw_twfw_tc_in                                       4339111             268175    -4070936 (-93.82%)
      tw_twfw_egress                                      3521816             270751    -3251065 (-92.31%)
      tw_twfw_tc_eg                                       3472878             284294    -3188584 (-91.81%)
      balancer_ingress                                     343119             291391      -51728 (-15.08%)
      syar_bind6_protect6                                   78992              64782      -14210 (-17.99%)
      ttls_tc_ingress                                       11739               8176       -3563 (-30.35%)
      kprobe__security_inode_link                           13864              11341       -2523 (-18.20%)
      read_sync_py_stack                                    21927              19442       -2485 (-11.33%)
      read_async_py_stack                                   30444              28136        -2308 (-7.58%)
      syar_task_exit_setuid                                 10256               8440       -1816 (-17.71%)
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230505043317.3629845-9-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: fix mark_all_scalars_precise use in mark_chain_precision · c50c0b57
      Andrii Nakryiko authored
      When precision backtracking bails out due to some unsupported sequence
      of instructions (e.g., stack access through a register other than r10), we
      need to mark all SCALAR registers as precise to be safe. Currently,
      though, we mark SCALARs precise only starting from the state in which we
      detected the unsupported condition, which could be one of the parent states
      of the actual current state. This can leave some registers potentially not
      marked as precise, even though they should be. So make sure we start
      marking scalars as precise from the current state (env->cur_state).

      Further, we don't currently detect the situation where we end up with some
      stack slots marked as needing precision, but we have run out of available
      states to find the instructions that populate those stack slots. This is
      akin to the `i >= func->allocated_stack / BPF_REG_SIZE` check and should be
      handled similarly by falling back to marking all SCALARs precise. Add
      this check when we run out of states.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230505043317.3629845-8-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: fix propagate_precision() logic for inner frames · f655badf
      Andrii Nakryiko authored
      Fix propagate_precision() logic to perform propagation of all necessary
      registers and stack slots across all active frames *in one batch step*.

      Doing this for each register/slot in each individual frame is wasteful,
      but the main problem is that backtracking of instructions in any frame
      except the deepest one just doesn't work. This is because the backtracking
      logic relies on jump history, and the available jump history always starts
      (or ends, depending on how you view it) in the current frame. So, if
      prog A (frame #0) called subprog B (frame #1) and we need to propagate
      precision of, say, register R6 (callee-saved) within frame #0, we
      don't actually even know where the jump history that corresponds to prog
      A starts. We'd need to skip the subprog part of jump history first to
      be able to do this.
      
      Luckily, with struct backtrack_state and __mark_chain_precision()
      handling bitmasks tracking/propagation across all active frames at the
      same time (added in previous patch), propagate_precision() can be both
      fixed and sped up by setting all the necessary bits across all frames
      and then performing one __mark_chain_precision() pass. This makes it
      unnecessary to skip subprog parts of jump history.
      
      We also improve logging along the way, to clearly specify which
      registers' and slots' precision markings are propagated within which
      frame. Each frame will have a dedicated line, and all registers and stack
      slots from that frame will be reported in a format similar to precision
      backtracking regs/stack logging. E.g.:
      
      frame 1: propagating r1,r2,r3,fp-8,fp-16
      frame 0: propagating r3,r9,fp-120
      
      Fixes: 529409ea ("bpf: propagate precision across all frames, not just the last one")
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230505043317.3629845-7-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: maintain bitmasks across all active frames in __mark_chain_precision · 1ef22b68
      Andrii Nakryiko authored
      Teach the __mark_chain_precision logic to maintain register/stack masks
      across all active frames when going from a child state to its parent state.
      Currently this should be mostly a no-op, as precision backtracking usually
      bails out when encountering subprog entry/exit.

      It's not very apparent from the diff due to increased indentation, but
      the logic remains the same, except everything is done for a specific `fr`
      frame index. Calls to bt_clear_reg() and bt_clear_slot() are replaced
      with the frame-specific bt_clear_frame_reg() and bt_clear_frame_slot(),
      where the frame index is passed explicitly instead of using the current
      frame number.
      
      We also adjust logging to emit the affected frame number, and we add
      better logging of human-readable register and stack slot masks, similar
      to the previous patch.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230505043317.3629845-6-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: improve precision backtrack logging · d9439c21
      Andrii Nakryiko authored
      Add a helper to format register and stack masks in a more human-readable
      format. Adjust logging a bit during backtrack propagation, and especially
      during the forced precision fallback logic, to make it clearer what's going
      on (with log_level=2, of course), and also start reporting the affected
      frame depth. This is in preparation for having more than one active
      frame later, when precision propagation between subprog calls is added.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230505043317.3629845-5-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: encapsulate precision backtracking bookkeeping · 407958a0
      Andrii Nakryiko authored
      Add struct backtrack_state and a straightforward API around it to keep
      track of the register and stack masks used and maintained during the
      precision backtracking process. Keeping this logic separate makes the
      high-level backtracking algorithm cleaner, and it also sets us up to
      cleanly keep track of register and stack masks per frame, allowing (with
      some further logic adjustments) precision backpropagation across multiple
      frames (i.e., subprog calls).
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230505043317.3629845-4-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: mark relevant stack slots scratched for register read instructions · e0bf4622
      Andrii Nakryiko authored
      When handling instructions that read register slots, mark the relevant stack
      slots as scratched so that the verifier log contains those slots' states, in
      addition to the currently emitted registers with stack slot offsets.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230505043317.3629845-3-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • veristat: add -t flag for adding BPF_F_TEST_STATE_FREQ program flag · 5956f301
      Andrii Nakryiko authored
      Sometimes during debugging it is important that a BPF program is loaded
      with the BPF_F_TEST_STATE_FREQ flag set to force the verifier to do frequent
      state checkpointing. Teach veristat to do this when the -t ("test state")
      flag is specified.
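      For reference, setting the flag via libbpf amounts to roughly the following
      sketch (error handling trimmed; the surrounding function is illustrative):

      #include <errno.h>
      #include <linux/bpf.h>
      #include <bpf/libbpf.h>

      static int load_with_test_state_freq(const char *path)
      {
              struct bpf_program *prog;
              struct bpf_object *obj;
              int err;

              obj = bpf_object__open_file(path, NULL);
              if (!obj)
                      return -errno;

              /* force frequent verifier state checkpointing on every program */
              bpf_object__for_each_program(prog, obj)
                      bpf_program__set_flags(prog, bpf_program__flags(prog) |
                                                   BPF_F_TEST_STATE_FREQ);

              err = bpf_object__load(obj);
              bpf_object__close(obj);
              return err;
      }
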
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230505043317.3629845-2-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • libbpf: Fix comment about arc and riscv arch in bpf_tracing.h · 7866fc6a
      Kenjiro Nakayama authored
      To make the comments about the arc and riscv architectures in bpf_tracing.h
      accurate, this patch fixes the comment about arc and adds a comment for riscv.
      Signed-off-by: Kenjiro Nakayama <nakayamakenjiro@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20230504035443.427927-1-nakayamakenjiro@gmail.com
  7. 02 May, 2023 2 commits
  8. 01 May, 2023 2 commits