1. 27 Jun, 2024 1 commit
  2. 26 Jun, 2024 5 commits
    • Alan Maguire's avatar
      libbpf: Fix clang compilation error in btf_relocate.c · 0f31c2c6
      Alan Maguire authored
      When building with clang for ARCH=i386, the following errors are
      observed:
      
        CC      kernel/bpf/btf_relocate.o
      ./tools/lib/bpf/btf_relocate.c:206:23: error: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Werror,-Wsingle-bit-bitfield-constant-conversion]
        206 |                 info[id].needs_size = true;
            |                                     ^ ~
      ./tools/lib/bpf/btf_relocate.c:256:25: error: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Werror,-Wsingle-bit-bitfield-constant-conversion]
        256 |                         base_info.needs_size = true;
            |                                              ^ ~
      2 errors generated.
      
      The problem is we use 1-bit, 31-bit bitfields in a signed int.
      Changing to
      
      	bool needs_size: 1;
      	unsigned int size:31;
      
      ...resolves the error and pahole reports that 4 bytes are used
      for the underlying representation:
      
      $ pahole btf_name_info tools/lib/bpf/btf_relocate.o
      struct btf_name_info {
      	const char  *              name;                 /*     0     8 */
      	unsigned int               needs_size:1;         /*     8: 0  4 */
      	unsigned int               size:31;              /*     8: 1  4 */
      	__u32                      id;                   /*    12     4 */
      
      	/* size: 16, cachelines: 1, members: 4 */
      	/* last cacheline: 16 bytes */
      };
      Signed-off-by: default avatarAlan Maguire <alan.maguire@oracle.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20240624192903.854261-1-alan.maguire@oracle.com
      0f31c2c6
    • Ma Ke's avatar
      selftests/bpf: Don't close(-1) in serial_test_fexit_stress() · d07980f7
      Ma Ke authored
      Guard close() with extra link_fd[i] > 0 and fexit_fd[i] > 0
      check to prevent close(-1).
      Signed-off-by: default avatarMa Ke <make24@iscas.ac.cn>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20240623131753.2133829-1-make24@iscas.ac.cn
      d07980f7
    • Matt Bobrowski's avatar
      bpf: add new negative selftests to cover missing check_func_arg_reg_off() and reg->type check · aa293983
      Matt Bobrowski authored
      Add new negative selftests which are intended to cover the
      out-of-bounds memory access that could be performed on a
      CONST_PTR_TO_DYNPTR within functions taking a ARG_PTR_TO_DYNPTR |
      MEM_RDONLY as an argument, and acceptance of invalid register types
      i.e. PTR_TO_BTF_ID within functions taking a ARG_PTR_TO_DYNPTR |
      MEM_RDONLY.
      Reported-by: default avatarKumar Kartikeya Dwivedi <memxor@gmail.com>
      Acked-by: default avatarKumar Kartikeya Dwivedi <memxor@gmail.com>
      Acked-by: default avatarEduard Zingerman <eddyz87@gmail.com>
      Signed-off-by: default avatarMatt Bobrowski <mattbobrowski@google.com>
      Link: https://lore.kernel.org/r/20240625062857.92760-2-mattbobrowski@google.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      aa293983
    • Matt Bobrowski's avatar
      bpf: add missing check_func_arg_reg_off() to prevent out-of-bounds memory accesses · ec2b9a5e
      Matt Bobrowski authored
      Currently, it's possible to pass in a modified CONST_PTR_TO_DYNPTR to
      a global function as an argument. The adverse effects of this is that
      BPF helpers can continue to make use of this modified
      CONST_PTR_TO_DYNPTR from within the context of the global function,
      which can unintentionally result in out-of-bounds memory accesses and
      therefore compromise overall system stability i.e.
      
      [  244.157771] BUG: KASAN: slab-out-of-bounds in bpf_dynptr_data+0x137/0x140
      [  244.161345] Read of size 8 at addr ffff88810914be68 by task test_progs/302
      [  244.167151] CPU: 0 PID: 302 Comm: test_progs Tainted: G O E 6.10.0-rc3-00131-g66b58671 #533
      [  244.174318] Call Trace:
      [  244.175787]  <TASK>
      [  244.177356]  dump_stack_lvl+0x66/0xa0
      [  244.179531]  print_report+0xce/0x670
      [  244.182314]  ? __virt_addr_valid+0x200/0x3e0
      [  244.184908]  kasan_report+0xd7/0x110
      [  244.187408]  ? bpf_dynptr_data+0x137/0x140
      [  244.189714]  ? bpf_dynptr_data+0x137/0x140
      [  244.192020]  bpf_dynptr_data+0x137/0x140
      [  244.194264]  bpf_prog_b02a02fdd2bdc5fa_global_call_bpf_dynptr_data+0x22/0x26
      [  244.198044]  bpf_prog_b0fe7b9d7dc3abde_callback_adjust_bpf_dynptr_reg_off+0x1f/0x23
      [  244.202136]  bpf_user_ringbuf_drain+0x2c7/0x570
      [  244.204744]  ? 0xffffffffc0009e58
      [  244.206593]  ? __pfx_bpf_user_ringbuf_drain+0x10/0x10
      [  244.209795]  bpf_prog_33ab33f6a804ba2d_user_ringbuf_callback_const_ptr_to_dynptr_reg_off+0x47/0x4b
      [  244.215922]  bpf_trampoline_6442502480+0x43/0xe3
      [  244.218691]  __x64_sys_prlimit64+0x9/0xf0
      [  244.220912]  do_syscall_64+0xc1/0x1d0
      [  244.223043]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
      [  244.226458] RIP: 0033:0x7ffa3eb8f059
      [  244.228582] Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8f 1d 0d 00 f7 d8 64 89 01 48
      [  244.241307] RSP: 002b:00007ffa3e9c6eb8 EFLAGS: 00000206 ORIG_RAX: 000000000000012e
      [  244.246474] RAX: ffffffffffffffda RBX: 00007ffa3e9c7cdc RCX: 00007ffa3eb8f059
      [  244.250478] RDX: 00007ffa3eb162b4 RSI: 0000000000000000 RDI: 00007ffa3e9c7fb0
      [  244.255396] RBP: 00007ffa3e9c6ed0 R08: 00007ffa3e9c76c0 R09: 0000000000000000
      [  244.260195] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffff80
      [  244.264201] R13: 000000000000001c R14: 00007ffc5d6b4260 R15: 00007ffa3e1c7000
      [  244.268303]  </TASK>
      
      Add a check_func_arg_reg_off() to the path in which the BPF verifier
      verifies the arguments of global function arguments, specifically
      those which take an argument of type ARG_PTR_TO_DYNPTR |
      MEM_RDONLY. Also, process_dynptr_func() doesn't appear to perform any
      explicit and strict type matching on the supplied register type, so
      let's also enforce that a register either type PTR_TO_STACK or
      CONST_PTR_TO_DYNPTR is by the caller.
      Reported-by: default avatarKumar Kartikeya Dwivedi <memxor@gmail.com>
      Acked-by: default avatarKumar Kartikeya Dwivedi <memxor@gmail.com>
      Acked-by: default avatarEduard Zingerman <eddyz87@gmail.com>
      Signed-off-by: default avatarMatt Bobrowski <mattbobrowski@google.com>
      Link: https://lore.kernel.org/r/20240625062857.92760-1-mattbobrowski@google.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      ec2b9a5e
    • Leon Hwang's avatar
      bpf: Fix tailcall cases in test_bpf · d65f3767
      Leon Hwang authored
      Since f663a03c ("bpf, x64: Remove tail call detection"),
      tail_call_reachable won't be detected in x86 JIT. And, tail_call_reachable
      is provided by verifier.
      
      Therefore, in test_bpf, the tail_call_reachable must be provided in test
      cases before running.
      
      Fix and test:
      
      [  174.828662] test_bpf: #0 Tail call leaf jited:1 170 PASS
      [  174.829574] test_bpf: #1 Tail call 2 jited:1 244 PASS
      [  174.830363] test_bpf: #2 Tail call 3 jited:1 296 PASS
      [  174.830924] test_bpf: #3 Tail call 4 jited:1 719 PASS
      [  174.831863] test_bpf: #4 Tail call load/store leaf jited:1 197 PASS
      [  174.832240] test_bpf: #5 Tail call load/store jited:1 326 PASS
      [  174.832240] test_bpf: #6 Tail call error path, max count reached jited:1 2214 PASS
      [  174.835713] test_bpf: #7 Tail call count preserved across function calls jited:1 609751 PASS
      [  175.446098] test_bpf: #8 Tail call error path, NULL target jited:1 472 PASS
      [  175.447597] test_bpf: #9 Tail call error path, index out of range jited:1 206 PASS
      [  175.448833] test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]
      Reported-by: default avatarkernel test robot <oliver.sang@intel.com>
      Closes: https://lore.kernel.org/oe-lkp/202406251415.c51865bc-oliver.sang@intel.com
      Fixes: f663a03c ("bpf, x64: Remove tail call detection")
      Signed-off-by: default avatarLeon Hwang <hffilwlqm@gmail.com>
      Link: https://lore.kernel.org/r/20240625145351.40072-1-hffilwlqm@gmail.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      d65f3767
  3. 24 Jun, 2024 1 commit
  4. 23 Jun, 2024 2 commits
  5. 21 Jun, 2024 30 commits
  6. 17 Jun, 2024 1 commit
    • Andrii Nakryiko's avatar
      Merge branch 'bpf-support-resilient-split-btf' · f6afdaf7
      Andrii Nakryiko authored
      Alan Maguire says:
      
      ====================
      bpf: support resilient split BTF
      
      Split BPF Type Format (BTF) provides huge advantages in that kernel
      modules only have to provide type information for types that they do not
      share with the core kernel; for core kernel types, split BTF refers to
      core kernel BTF type ids.  So for a STRUCT sk_buff, a module that
      uses that structure (or a pointer to it) simply needs to refer to the
      core kernel type id, saving the need to define the structure and its many
      dependents.  This cuts down on duplication and makes BTF as compact
      as possible.
      
      However, there is a downside.  This scheme requires the references from
      split BTF to base BTF to be valid not just at encoding time, but at use
      time (when the module is loaded).  Even a small change in kernel types
      can perturb the type ids in core kernel BTF, and - if the new reproducible
      BTF option is not used - pahole's parallel processing of compilation units
      can lead to different type ids for the same kernel if the BTF is
      regenerated.
      
      So we have a robustness problem for split BTF for cases where a module is
      not always compiled at the same time as the kernel.  This problem is
      particularly acute for distros which generally want module builders to be
      able to compile a module for the lifetime of a Linux stable-based release,
      and have it continue to be valid over the lifetime of that release, even
      as changes in data structures (and hence BTF types) accrue.  Today it's not
      possible to generate BTF for modules that works beyond the initial
      kernel it is compiled against - kernel bugfixes etc invalidate the split
      BTF references to vmlinux BTF, and BTF is no longer usable for the
      module.
      
      The goal of this series is to provide options to provide additional
      context for cases like this.  That context comes in the form of
      distilled base BTF; it stands in for the base BTF, and contains
      information about the types referenced from split BTF, but not their
      full descriptions.  The modified split BTF will refer to type ids in
      this .BTF.base section, and when the kernel loads such modules it
      will use that .BTF.base to map references from split BTF to the
      equivalent current vmlinux base BTF types.  Once this relocation
      process has succeeded, the module BTF available in /sys/kernel/btf
      will look exactly as if it was built with the current vmlinux;
      references to base types will be fixed up etc.
      
      A module builder - using this series along with the pahole changes -
      can then build a module with distilled base BTF via an out-of-tree
      module build, i.e.
      
      make -C . M=path/2/module
      
      The module will have a .BTF section (the split BTF) and a
      .BTF.base section.  The latter is small in size - distilled base
      BTF does not need full struct/union/enum information for named
      types for example.  For 2667 modules built with distilled base BTF,
      the average size observed was 1556 bytes (stddev 1563).  The overall
      size added to this 2667 modules was 5.3Mb.
      
      Note that for the in-tree modules, this approach is not needed as
      split and base BTF in the case of in-tree modules are always built
      and re-built together.
      
      The series first focuses on generating split BTF with distilled base
      BTF; then relocation support is added to allow split BTF with
      an associated distlled base to be relocated with a new base BTF.
      
      Next Eduard's patch allows BTF ELF parsing to work with both
      .BTF and .BTF.base sections; this ensures that bpftool will be
      able to dump BTF for a module with a .BTF.base section for example,
      or indeed dump relocated BTF where a module and a "-B vmlinux"
      is supplied.
      
      Then we add support to resolve_btfids to ignore base BTF - i.e.
      to avoid relocation - if a .BTF.base section is found.  This ensures
      the .BTF.ids section is populated with ids relative to the distilled
      base (these will be relocated as part of module load).
      
      Finally the series supports storage of .BTF.base data/size in modules
      and supports sharing of relocation code with the kernel to allow
      relocation of module BTF.  For the kernel, this relocation
      process happens at module load time, and we relocate split BTF
      references to point at types in the current vmlinux BTF.  As part of
      this, .BTF.ids references need to be mapped also.
      
      So concretely, what happens is
      
      - we generate split BTF in the .BTF section of a module that refers to
        types in the .BTF.base section as base types; the latter are not full
        type descriptions but provide information about the base type.  So
        a STRUCT sk_buff would be represented as a FWD struct sk_buff in
        distilled base BTF for example.
      - when the module is loaded, the split BTF is relocated with vmlinux
        BTF; in the case of the FWD struct sk_buff, we find the STRUCT sk_buff
        in vmlinux BTF and map all split BTF references to the distilled base
        FWD sk_buff, replacing them with references to the vmlinux BTF
        STRUCT sk_buff.
      
      A previous approach to this problem [1] utilized standalone BTF for such
      cases - where the BTF is not defined relative to base BTF so there is no
      relocation required.  The problem with that approach is that from
      the verifier perspective, some types are special, and having a custom
      representation of a core kernel type that did not necessarily match the
      current representation is not tenable.  So the approach taken here was
      to preserve the split BTF model while minimizing the representation of
      the context needed to relocate split and current vmlinux BTF.
      
      To generate distilled .BTF.base sections the associated dwarves
      patch (to be applied on the "next" branch there) is needed [3]
      Without it, things will still work but modules will not be built
      with a .BTF.base section.
      
      Changes since v5[4]:
      
      - Update search of distilled types to return the first occurrence
        of a string (or a string+size pair); this allows us to iterate
        over all matches in distilled base BTF (Andrii, patch 3)
      - Update to use BTF field iterators (Andrii, patches 1, 3 and 8)
      - Update tests to cover multiple match and associated error cases
        (Eduard, patch 4)
      - Rename elf_sections_info to btf_elf_secs, remove use of
        libbpf_get_error(), reset btf->owns_base when relocation
        succeeds (Andrii, patch 5)
      
      Changes since v4[5]:
      
      - Moved embeddedness, duplicate name checks to relocation time
        and record struct/union size for all distilled struct/unions
        instead of using forwards.  This allows us to carry out
        type compatibility checks based on the base BTF we want to
        relocate with (Eduard, patches 1, 3)
      - Moved to using qsort() instead of qsort_r() as support for
        qsort_r() appears to be missing in Android libc (Andrii, patch 3)
      - Sorting/searching now incorporates size matching depending
        on BTF kind and embeddedness of struct/union (Eduard, Andrii,
        patch 3)
      - Improved naming of various types during relocation to avoid
        confusion (Andrii, patch 3)
      - Incorporated Eduard's patch (patch 5) which handles .BTF.base
        sections internally in btf_parse_elf().  This makes ELF parsing
        work with split BTF, split BTF with a distilled base, split
        BTF with a distilled base _and_ base BTF (by relocating) etc.
        Having this avoids the need for bpftool changes; it will work
        as-is with .BTF.base sections (Eduard, patch 4)
      - Updated resolve_btfids to _not_ relocate BTF for modules
        where a .BTF.base section is present; in that one case we
        do not want to relocate BTF as the .BTF.ids section should
        reflect ids in .BTF.base which will later be relocated on
        module load (Eduard, Andrii, patch 5)
      
      Changes since v3[6]:
      
      - distill now checks for duplicate-named struct/unions and records
        them as a sized struct/union to help identify which of the
        multiple base BTF structs/unions it refers to (Eduard, patch 1)
      - added test support for multiple name handling (Eduard, patch 2)
      - simplified the string mapping when updating split BTF to use
        base BTF instead of distilled base.  Since the only string
        references split BTF can make to base BTF are the names of
        the base types, create a string map from distilled string
        offset -> base BTF string offset and update string offsets
        by visiting all strings in split BTF; this saves having to
        do costly searches of base BTF (Eduard, patch 7,10)
      - fixed bpftool manpage and indentation issues (Quentin, patch 11)
      
      Also explored Eduard's suggestion of doing an implicit fallback
      to checking for .BTF.base section in btf__parse() when it is
      called to get base BTF.  However while it is doable, it turned
      out to be difficult operationally.  Since fallback is implicit
      we do not know the source of the BTF - was it from .BTF or
      .BTF.base? In bpftool, we want to try first standalone BTF,
      then split, then split with distilled base.  Having a way
      to explicitly request .BTF.base via btf__parse_opts() fits
      that model better.
      
      Changes since v2[7]:
      
      - submitted patch to use --btf_features in Makefile.btf for pahole
        v1.26 and later separately (Andrii).  That has landed in bpf-next
        now.
      - distilled base now encodes ENUM64 as fwd ENUM (size 8), eliminating
        the need for support for ENUM64 in btf__add_fwd (patch 1, Andrii)
      - moved to distilling only named types, augmenting split BTF with
        associated reference types; this simplifies greatly the distilled
        base BTF and the mapping operation between distilled and base
        BTF when relocating (most of the series changes, Andrii)
      - relocation now iterates over base BTF, looking for matches based
        on name in distilled BTF.  Distilled BTF is pre-sorted by name
        (Andrii, patch 8)
      - removed most redundant compabitiliby checks aside from struct
        size for base types/embedded structs and kind compatibility
        (since we only match on name) (Andrii, patch 8)
      - btf__parse_opts() now replaces btf_parse() internally in libbpf
        (Eduard, patch 3)
      
      Changes since RFC [8]:
      
      - updated terminology; we replace clunky "base reference" BTF with
        distilling base BTF into a .BTF.base section. Similarly BTF
        reconcilation becomes BTF relocation (Andrii, most patches)
      - add distilled base BTF by default for out-of-tree modules
        (Alexei, patch 8)
      - distill algorithm updated to record size of embedded struct/union
        by recording it as a 0-vlen STRUCT/UNION with size preserved
        (Andrii, patch 2)
      - verify size match on relocation for such STRUCT/UNIONs (Andrii,
        patch 9)
      - with embedded STRUCT/UNION recording size, we can have bpftool
        dump a header representation using .BTF.base + .BTF sections
        rather than special-casing and refusing to use "format c" for
        that case (patch 5)
      - match enum with enum64 and vice versa (Andrii, patch 9)
      - ensure that resolve_btfids works with BTF without .BTF.base
        section (patch 7)
      - update tests to cover embedded types, arrays and function
        prototypes (patches 3, 12)
      
      [1] https://lore.kernel.org/bpf/20231112124834.388735-14-alan.maguire@oracle.com/
      [2] https://lore.kernel.org/bpf/20240501175035.2476830-1-alan.maguire@oracle.com/
      [3] https://lore.kernel.org/bpf/20240517102714.4072080-1-alan.maguire@oracle.com/
      [4] https://lore.kernel.org/bpf/20240528122408.3154936-1-alan.maguire@oracle.com/
      [5] https://lore.kernel.org/bpf/20240517102246.4070184-1-alan.maguire@oracle.com/
      [6] https://lore.kernel.org/bpf/20240510103052.850012-1-alan.maguire@oracle.com/
      [7] https://lore.kernel.org/bpf/20240424154806.3417662-1-alan.maguire@oracle.com/
      [8] https://lore.kernel.org/bpf/20240322102455.98558-1-alan.maguire@oracle.com/
      ====================
      
      Link: https://lore.kernel.org/r/20240613095014.357981-1-alan.maguire@oracle.comSigned-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      f6afdaf7