1. 29 Sep, 2020: 24 commits
    • bpf, selftests: Fix cast to smaller integer type 'int' warning in raw_tp · 00e8c44a
      John Fastabend authored
      Fix the following warning in the bpf selftests:
      
      progs/test_raw_tp_test_run.c:18:10: warning: cast to smaller integer type 'int' from 'struct task_struct *' [-Wpointer-to-int-cast]
      
      Fix it by changing the int cast to a long cast. Discovered with gcc-9 and
      llvm-11+, where llvm was built from a recent main branch.
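For illustration, a minimal userspace sketch of the warning and the fix, assuming an LP64 target (BPF and x86-64 both use 64-bit long and 64-bit pointers). The struct is a hypothetical stand-in; this is not the code from test_raw_tp_test_run.c.

```c
#include <assert.h>

/* Hypothetical stand-in; the real test uses the kernel's struct task_struct. */
struct task_struct { int pid; };

/*
 * On LP64 targets pointers are 64 bits and int is 32 bits, so (int)task
 * truncates and triggers -Wpointer-to-int-cast. long is pointer-sized on
 * LP64, so (long)task keeps every bit and does not warn.
 */
static long task_to_long(struct task_struct *task)
{
	return (long)task;
}
```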
      
      Fixes: 09d8ad16 ("selftests/bpf: Add raw_tp_test_run")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/160134424745.11199.13841922833336698133.stgit@john-Precision-5820-Tower
    • Merge branch 'libbpf: BTF writer APIs' · bc600908
      Alexei Starovoitov authored
      Andrii Nakryiko says:
      
      ====================
      This patch set introduces a new set of BTF APIs to libbpf that make it
      convenient to produce BTF types and strings. These APIs will allow libbpf to do
      more intrusive modifications of a program's BTF (by rewriting it, at least as of
      right now), which is necessary for the upcoming libbpf static linking. But
      they are complete and generic, so they can be adopted by anyone who needs to
      produce BTF type information.
      
      One such example outside of libbpf is pahole, which was actually converted to
      these APIs completely (locally, pending landing of these changes in libbpf).
      The conversion reduces the amount of custom pahole code needed and brings nice
      savings in memory usage (about 370MB reduction at peak for my kernel
      configuration) and even in BTF deduplication time (one second reduction,
      23.7s -> 22.7s). Memory savings come from avoiding pahole's own copy of
      "uncompressed" raw BTF data. Time reduction comes from faster string
      search and deduplication by relying on a hashmap instead of the BST used by
      pahole's own code. Consequently, these APIs are already tested on real-world
      complicated kernel BTF, but there is also a fairly extensive selftest doing
      extra validations.
      
      Selftests in patch #3 add a set of generic ASSERT_{EQ,STREQ,ERR,OK} macros
      that are useful for writing shorter and less repetitive selftests. I decided
      to keep them local to that selftest for now, but if they prove to be useful in
      more contexts we should move them to test_progs.h. A few more macros (e.g.,
      inequality tests) are probably necessary to have a more complete set.
      
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      
      v2->v3:
        - resending original patches #7-9 as patches #1-3 due to merge conflict;
      
      v1->v2:
        - fixed comments (John);
        - renamed btf__append_xxx() into btf__add_xxx() (Alexei);
        - added btf__find_str() in addition to btf__add_str();
        - btf__new_empty() now sets kernel FD to -1 initially.
      ====================
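A minimal sketch of what such ASSERT_* helpers might look like. The names match the cover letter, but the bodies, the `test_failures` counter and the message format are assumptions, not the macros from patch #3 or test_progs.h.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical failure counter; test_progs has its own bookkeeping. */
static int test_failures;

static int check_eq(long long actual, long long expected, const char *name,
		    const char *func)
{
	if (actual == expected)
		return 1;
	fprintf(stderr, "%s: %s: got %lld, expected %lld\n",
		func, name, actual, expected);
	test_failures++;
	return 0;
}

static int check_streq(const char *actual, const char *expected,
		       const char *name, const char *func)
{
	if (strcmp(actual, expected) == 0)
		return 1;
	fprintf(stderr, "%s: %s: got \"%s\", expected \"%s\"\n",
		func, name, actual, expected);
	test_failures++;
	return 0;
}

/* Each macro evaluates to 1 on success and 0 on failure, so a test can
 * bail out early, e.g.: if (!ASSERT_OK(err, "load")) goto cleanup; */
#define ASSERT_EQ(actual, expected, name) \
	check_eq((actual), (expected), (name), __func__)
#define ASSERT_STREQ(actual, expected, name) \
	check_streq((actual), (expected), (name), __func__)
#define ASSERT_OK(res, name)  check_eq((res) == 0, 1, (name), __func__)
#define ASSERT_ERR(res, name) check_eq((res) < 0, 1, (name), __func__)
```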
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • libbpf: Add btf__str_by_offset() as a more generic variant of name_by_offset · f86ed050
      Andrii Nakryiko authored
      BTF strings are used not just for names; they can be arbitrary strings used
      for CO-RE relocations, line/func infos, etc. Thus the "name_by_offset"
      terminology is too specific and might be misleading. Instead, introduce a
      btf__str_by_offset() API which uses generic string terminology.
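A BTF string section is a sequence of NUL-terminated strings with offset 0 holding the empty string, so a lookup by offset is a bounds check plus pointer arithmetic. The sketch below models that with a hypothetical `toy_str_by_offset()`; it is not libbpf's implementation, and returning NULL for an out-of-range offset is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy string section: "" at offset 0, "int" at 1, "task_struct" at 5. */
static const char toy_strs[] = "\0int\0task_struct";

/* Return the string starting at off, or NULL if off is outside the section. */
static const char *toy_str_by_offset(const char *strs, size_t strs_len,
				     size_t off)
{
	return off < strs_len ? strs + off : NULL;
}
```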
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200929020533.711288-3-andriin@fb.com
    • libbpf: Add BTF writing APIs · 4a3b33f8
      Andrii Nakryiko authored
      Add APIs for appending new BTF types at the end of a BTF object.
      
      Each BTF kind has one API of the form btf__add_<kind>(). For types that
      have a variable number of additional items (struct/union, enum, func_proto,
      datasec), an additional API is provided to emit each such item. E.g., for
      emitting a struct, one would use the following sequence of API calls:
      
      btf__add_struct(...);
      btf__add_field(...);
      ...
      btf__add_field(...);
      
      Each btf__add_field() will ensure that the last BTF type is of STRUCT or
      UNION kind and will automatically increment that type's vlen field.
      
      All the strings are provided as C strings (const char *), not as string
      offsets. This significantly improves usability of the BTF writer APIs. All
      such strings will be automatically appended to the string section, or an
      existing string will be reused if it was already added previously.
      
      Each API attempts to do all reasonable validations, like enforcing
      non-empty names for entities with required names, proper value bounds, various
      bit offset restrictions, etc.
      
      Type ID validation is minimal because it's possible to emit a type that refers
      to a type that will be emitted later, so libbpf has no way to enforce such
      cases. Users must be careful to properly emit all the necessary types and
      specify type IDs that will be valid in the finally generated BTF.
      
      Each btf__add_<kind>() API returns the new type ID on success or a negative
      value on error. APIs like btf__add_field() that emit additional items
      return zero on success and a negative value on error.
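To make the append protocol concrete, here is a heavily simplified userspace model (all `toy_*` names are hypothetical, and members are counted rather than stored). It mirrors the rules described above: type-adding calls return the new type ID (IDs start at 1, 0 being void), and the field-adding call only succeeds when the last type is a struct or union, bumping its vlen.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum toy_kind { TOY_INT, TOY_STRUCT, TOY_UNION };

struct toy_type {
	enum toy_kind kind;
	const char *name;
	unsigned int vlen;	/* number of members for struct/union */
};

#define TOY_MAX_TYPES 16

struct toy_btf {
	struct toy_type types[TOY_MAX_TYPES + 1]; /* index 0 reserved for void */
	int nr_types;
};

static int toy_append(struct toy_btf *btf, enum toy_kind kind, const char *name)
{
	struct toy_type *t;

	if (btf->nr_types >= TOY_MAX_TYPES)
		return -E2BIG;
	t = &btf->types[++btf->nr_types];
	t->kind = kind;
	t->name = name;
	t->vlen = 0;
	return btf->nr_types;	/* new type ID */
}

/* Ints require a non-empty name. */
static int toy_add_int(struct toy_btf *btf, const char *name)
{
	if (!name || !*name)
		return -EINVAL;
	return toy_append(btf, TOY_INT, name);
}

static int toy_add_struct(struct toy_btf *btf, const char *name)
{
	return toy_append(btf, TOY_STRUCT, name);
}

/* Returns 0 on success; fails unless the last type is a struct/union. */
static int toy_add_field(struct toy_btf *btf, const char *name)
{
	struct toy_type *last;

	(void)name;	/* member details are not modelled here */
	if (btf->nr_types == 0)
		return -EINVAL;
	last = &btf->types[btf->nr_types];
	if (last->kind != TOY_STRUCT && last->kind != TOY_UNION)
		return -EINVAL;
	last->vlen++;
	return 0;
}
```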
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200929020533.711288-2-andriin@fb.com
    • Merge branch 'bpf: add helpers to support BTF-based kernel' · 98b972d2
      Alexei Starovoitov authored
      Alan Maguire says:
      
      ====================
      This series attempts to provide a simple way for BPF programs (and in
      future other consumers) to utilize BPF Type Format (BTF) information
      to display kernel data structures in-kernel.  The use case this
      functionality is applied to here is to support a snprintf()-like
      helper to copy a BTF representation of kernel data to a string,
      and a BPF seq file helper to display BTF data for an iterator.
      
      There is already support in kernel/bpf/btf.c for "show" functionality;
      the changes here generalize that support from seq-file specific
      verifier display to the more generic case and add another specific
      use case; rather than seq_printf()ing the show data, it is copied
      to a supplied string using a snprintf()-like function.  Other future
      consumers of the show functionality could include a bpf_printk_btf()
      function which printk()ed the data instead.  Oops messaging in
      particular would be an interesting application for such functionality.
      
      The above potential use case hints at a potential reply to
      a reasonable objection that such typed display should be
      solved by tracing programs, where the in-kernel tracing records
      data and the userspace program prints it out.  While this
      is certainly the recommended approach for most cases, I
      believe having an in-kernel mechanism would be valuable
      also.  Critically, in BPF programs it reduces debugging
      and tracing of such data to invoking a simple helper.
      
      One challenge raised in an earlier iteration of this work -
      where the BTF printing was implemented as a printk() format
      specifier - was that the amount of data printed per
      printk() was large, and other format specifiers were far
      simpler.  Here we sidestep that concern by printing
      components of the BTF representation as we go for the
      seq file case, and in the string case the snprintf()-like
      operation is intended to be a basis for perf event or
      ringbuf output.  The reasons for avoiding bpf_trace_printk
      are that
      
      1. bpf_trace_printk() strings are restricted in size and
      cannot display anything beyond trivial data structures; and
      2. bpf_trace_printk() is for debugging purposes only.
      
      As Alexei suggested, a bpf_trace_puts() helper could solve
      this in the future but it still would be limited by the
      1000-byte limit for traced strings.
      
      Default output for an sk_buff looks like this (zeroed fields
      are omitted):
      
      (struct sk_buff){
       .transport_header = (__u16)65535,
       .mac_header = (__u16)65535,
       .end = (sk_buff_data_t)192,
       .head = (unsigned char *)0x000000007524fd8b,
       .data = (unsigned char *)0x000000007524fd8b,
       .truesize = (unsigned int)768,
       .users = (refcount_t){
        .refs = (atomic_t){
         .counter = (int)1,
        },
       },
      }
      
      Flags can modify aspects of output format; see patch 3
      for more details.
      
      Changes since v6:
      
      - Updated safe data size to 32, object name size to 80.
        This increases the number of safe copies done, but performance is
        not a key goal here. WRT name size the largest type name length
        in bpf-next according to "pahole -s" is 64 bytes, so that still gives
        room for additional type qualifiers, parens etc within the name limit
        (Alexei, patch 2)
      - Removed inlines and converted as many #defines to functions as was
        possible.  In a few cases - btf_show_type_value[s]() specifically -
        I left these as macros as btf_show_type_value[s]() prepends and
        appends format strings to the format specifier (in order to include
        indentation, delimiters etc.), so a macro makes that simpler (Alexei,
        patch 2)
      - Handle btf_resolve_size() error in btf_show_obj_safe() (Alexei, patch 2)
      - Removed clang loop unroll in BTF snprintf test (Alexei)
      - switched to using bpf_core_type_id_kernel(type) as suggested by Andrii,
        and Alexei noted that __builtin_btf_type_id(,1) should be used (patch 4)
      - Added skip logic if __builtin_btf_type_id is not available (patches 4,8)
      - Bumped limits on bpf iters to support printing larger structures (Alexei,
        patch 5)
      - Updated overflow bpf_iter tests to reflect new iter max size (patch 6)
      - Updated seq helper to use type id only (Alexei, patch 7)
      - Updated BTF task iter test to use task struct instead of struct fs_struct
        since new limits allow a task_struct to be displayed (patch 8)
      - Fixed E2BIG handling in iter task (Alexei, patch 8)
      
      Changes since v5:
      
      - Moved btf print prepare into patch 3, type show seq
        with flags into patch 2 (Alexei, patches 2,3)
      - Fixed build bot warnings around static declarations
        and printf attributes
      - Renamed functions to snprintf_btf/seq_printf_btf
        (Alexei, patches 3-6)
      
      Changes since v4:
      
      - Changed approach from a BPF trace event-centric design to one
        utilizing a snprintf()-like helper and an iter helper (Alexei,
        patches 3,5)
      - Added tests to verify BTF output (patch 4)
      - Added support to tests for verifying BTF type_id-based display
        as well as type name via __builtin_btf_type_id (Andrii, patch 4).
      - Augmented task iter tests to cover the BTF-based seq helper.
        Because a task_struct's BTF-based representation would overflow
        the PAGE_SIZE limit on iterator data, the "struct fs_struct"
        (task->fs) is displayed for each task instead (Alexei, patch 6).
      
      Changes since v3:
      
      - Moved to RFC since the approach is different (and bpf-next is
        closed)
      - Rather than using a printk() format specifier as the means
        of invoking BTF-enabled display, a dedicated BPF helper is
        used.  This solves the issue of printk() having to output
        large amounts of data using a complex mechanism such as
        BTF traversal, but still provides a way for the display of
        such data to be achieved via BPF programs.  Future work could
        include a bpf_printk_btf() function to invoke display via
        printk() where the elements of a data structure are printk()ed
        one at a time.  Thanks to Petr Mladek, Andy Shevchenko and
        Rasmus Villemoes who took time to look at the earlier printk()
        format-specifier-focused version of this and provided feedback
        clarifying the problems with that approach.
      - Added trace id to the bpf_trace_printk events as a means of
        separating output from standard bpf_trace_printk() events,
        ensuring it can be easily parsed by the reader.
      - Added bpf_trace_btf() helper tests which do simple verification
        of the various display options.
      
      Changes since v2:
      
      - Alexei and Yonghong suggested it would be good to use
        probe_kernel_read() on to-be-shown data to ensure safety
        during operation.  Safe copy via probe_kernel_read() to a
        buffer object in "struct btf_show" is used to support
        this.  A few different approaches were explored
        including dynamic allocation and per-cpu buffers. The
        downside of dynamic allocation is that it would be done
        during BPF program execution for bpf_trace_printk()s using
        %pT format specifiers. The problem with per-cpu buffers
        is we'd have to manage preemption and since the display
        of an object occurs over an extended period and in printk
        context where we'd rather not change preemption status,
        it seemed tricky to manage buffer safety while considering
        preemption.  The approach of utilizing stack buffer space
        via the "struct btf_show" seemed like the simplest approach.
        The stack size of the associated functions which have a
        "struct btf_show" on their stack to support show operation
        (btf_type_snprintf_show() and btf_type_seq_show()) stays
        under 500 bytes. The compromise here is the safe buffer we
        use is small - 256 bytes - and as a result multiple
        probe_kernel_read()s are needed for larger objects. Most
        objects of interest are smaller than this (e.g.
        "struct sk_buff" is 224 bytes), and while task_struct is a
        notable exception at ~8K, performance is not the priority for
        BTF-based display. (Alexei and Yonghong, patch 2).
      - safe buffer use is the default behaviour (and is mandatory
        for BPF) but unsafe display - meaning no safe copy is done
        and we operate on the object itself - is supported via a
        'u' option.
      - pointers are prefixed with 0x for clarity (Alexei, patch 2)
      - added additional comments and explanations around BTF show
        code, especially around determining whether objects are
        zeroed. Also tried to comment the safe object scheme used. (Yonghong,
        patch 2)
      - added late_initcall() to initialize vmlinux BTF so that it would
        not have to be initialized during printk operation (Alexei,
        patch 5)
      - removed CONFIG_BTF_PRINTF config option as it is not needed;
        CONFIG_DEBUG_INFO_BTF can be used to gate test behaviour and
        determining behaviour of type-based printk can be done via
        retrieval of BTF data; if it's not there BTF was unavailable
        or broken (Alexei, patches 4,6)
      - fix bpf_trace_printk test to use vmlinux.h and globals via
        skeleton infrastructure, removing need for perf events
        (Andrii, patch 8)
      
      Changes since v1:
      
      - changed format to be more drgn-like, rendering indented type info
        along with type names by default (Alexei)
      - zeroed values are omitted (Arnaldo) by default unless the '0'
        modifier is specified (Alexei)
      - added an option to print pointer values without obfuscation.
        The reason to do this is the sysctls controlling pointer display
        are likely to be irrelevant in many if not most tracing contexts.
        Some questions on this in the outstanding questions section below...
      - reworked printk format specifier so that we no longer rely on format
        %pT<type> but instead use a struct * which contains type information
        (Rasmus). This simplifies the printk parsing, makes use more dynamic
        and also allows specification by BTF id as well as name.
      - removed incorrect patch which tried to fix dereferencing of resolved
        BTF info for vmlinux; instead we skip modifiers for the relevant
        case (array element type determination) (Alexei).
      - fixed issues with negative snprintf format length (Rasmus)
      - added test cases for various data structure formats; base types,
        typedefs, structs, etc.
      - tests now iterate through all typedef, enum, struct and unions
        defined for vmlinux BTF and render a version of the target dummy
        value which is either all zeros or all 0xff values; the idea is this
        exercises the "skip if zero" and "print everything" cases.
      - added support in BPF for using the %pT format specifier in
        bpf_trace_printk()
      - added BPF tests which ensure %pT format specifier use works (Alexei).
      ====================
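The v2 notes describe showing large objects through a small safe buffer, which means one read per buffer-sized chunk: with the 256-byte buffer mentioned there, a 224-byte sk_buff needs one copy and an ~8K task_struct (assumed to be 8192 bytes here) needs 32. The sketch below shows only that chunking arithmetic in userspace; `fake_probe_read()` is a hypothetical memcpy() stand-in for probe_kernel_read(), and this is not the kernel's btf_show code (the v6->v7 notes say the safe data size later became 32).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SAFE_BUF_SIZE 256	/* size quoted in the v2 notes */

/* Userspace stand-in for probe_kernel_read(); the kernel version tolerates
 * faults, plain memcpy() does not. */
static int fake_probe_read(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
	return 0;
}

/*
 * Walk obj through a small on-stack "safe" buffer one chunk at a time,
 * summing the bytes as a stand-in for displaying them.
 * Returns the number of copies (reads) performed.
 */
static size_t show_via_safe_buffer(const unsigned char *obj, size_t obj_size,
				   unsigned long *sum)
{
	unsigned char safe[SAFE_BUF_SIZE];
	size_t off, i, copies = 0;

	*sum = 0;
	for (off = 0; off < obj_size; off += SAFE_BUF_SIZE) {
		size_t len = obj_size - off;

		if (len > SAFE_BUF_SIZE)
			len = SAFE_BUF_SIZE;
		if (fake_probe_read(safe, obj + off, len))
			break;
		copies++;
		for (i = 0; i < len; i++)
			*sum += safe[i];
	}
	return copies;
}
```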
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Add test for bpf_seq_printf_btf helper · b72091bd
      Alan Maguire authored
      Add a test verifying that iterating over tasks and displaying the BTF
      representation of task_struct succeeds.
      Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
      Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/1601292670-1616-9-git-send-email-alan.maguire@oracle.com
    • bpf: Add bpf_seq_printf_btf helper · eb411377
      Alan Maguire authored
      A helper is added to allow seq file writing of kernel data
      structures using vmlinux BTF.  Its signature is
      
      long bpf_seq_printf_btf(struct seq_file *m, struct btf_ptr *ptr,
                              u32 btf_ptr_size, u64 flags);
      
      Flags and struct btf_ptr definitions/use are identical to the
      bpf_snprintf_btf helper, and the helper returns 0 on success
      or a negative error value.
      Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
      Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/1601292670-1616-8-git-send-email-alan.maguire@oracle.com
    • selftests/bpf: Fix overflow tests to reflect iter size increase · eb58bbf2
      Alan Maguire authored
      The increase of the bpf iter size to PAGE_SIZE << 3 means that overflow
      tests assuming a page-sized buffer need to be bumped as well.
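Assuming 4 KiB pages (PAGE_SIZE is architecture- and config-dependent), the arithmetic behind the bump is sketched below; `min_overflow_len()` is a hypothetical helper, not code from the selftests.

```c
#include <assert.h>

/* Assumption: 4 KiB pages. */
#define ASSUMED_PAGE_SIZE 4096UL
#define OLD_ITER_SEQ_SIZE ASSUMED_PAGE_SIZE
#define NEW_ITER_SEQ_SIZE (ASSUMED_PAGE_SIZE << 3)

/* Hypothetical helper: the smallest output that overflows buf_size bytes. */
static unsigned long min_overflow_len(unsigned long buf_size)
{
	return buf_size + 1;
}
```

An output sized to overflow the old page-sized buffer (4097 bytes) fits comfortably in the 32768-byte buffer, which is why such tests have to be resized.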
      Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/1601292670-1616-7-git-send-email-alan.maguire@oracle.com
    • bpf: Bump iter seq size to support BTF representation of large data structures · af653209
      Alan Maguire authored
      BPF iter size is limited to PAGE_SIZE; if we wish to display BTF-based
      representations of larger kernel data structures such as task_struct,
      this will be insufficient.
      Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
      Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/1601292670-1616-6-git-send-email-alan.maguire@oracle.com
    • selftests/bpf: Add bpf_snprintf_btf helper tests · 076a95f5
      Alan Maguire authored
      Add tests verifying snprintf()ing of various data structures and
      flag combinations using a tp_btf program. Tests are skipped
      if __builtin_btf_type_id is not available to retrieve BTF
      type ids.
      Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/1601292670-1616-5-git-send-email-alan.maguire@oracle.com
    • bpf: Add bpf_snprintf_btf helper · c4d0bfb4
      Alan Maguire authored
      A helper is added to support tracing kernel type information in BPF
      using the BPF Type Format (BTF).  Its signature is
      
      long bpf_snprintf_btf(char *str, u32 str_size, struct btf_ptr *ptr,
      		      u32 btf_ptr_size, u64 flags);
      
      struct btf_ptr * specifies
      
      - a pointer to the data to be traced
      - the BTF id of the type of data pointed to
      - a flags field provided for future use; these flags
        are not to be confused with the BTF_F_* flags
        below that control how the btf_ptr is displayed; the
        flags member of the struct btf_ptr may be used to
        disambiguate types in kernel versus module BTF, etc.;
        the main distinction is that these flags relate to the type
        and the information needed to identify it, not to how it
        is displayed.
      
      For example, a BPF program with a struct sk_buff *skb
      could do the following:
      
      	static struct btf_ptr b = { };
      
      	b.ptr = skb;
      	b.type_id = __builtin_btf_type_id(struct sk_buff, 1);
      	bpf_snprintf_btf(str, sizeof(str), &b, sizeof(b), 0);
      
      Default output looks like this:
      
      (struct sk_buff){
       .transport_header = (__u16)65535,
       .mac_header = (__u16)65535,
       .end = (sk_buff_data_t)192,
       .head = (unsigned char *)0x000000007524fd8b,
       .data = (unsigned char *)0x000000007524fd8b,
       .truesize = (unsigned int)768,
       .users = (refcount_t){
        .refs = (atomic_t){
         .counter = (int)1,
        },
       },
      }
      
      Flags modifying display are as follows:
      
      - BTF_F_COMPACT:	no formatting around type information
      - BTF_F_NONAME:		no struct/union member names/types
      - BTF_F_PTR_RAW:	show raw (unobfuscated) pointer values;
      			equivalent to %px.
      - BTF_F_ZERO:		show zero-valued struct/union members;
      			they are not displayed by default
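The display flags are plain bit flags that can be OR-ed together. The userspace sketch below mirrors struct btf_ptr and the BTF_F_* values locally, assumed to match the UAPI header (a real BPF program would include vmlinux.h or linux/bpf.h instead); `show_member()` and `member_sep()` are hypothetical helpers illustrating the BTF_F_ZERO and BTF_F_COMPACT semantics described above.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the UAPI struct and flag values (assumed, not included). */
struct btf_ptr {
	void *ptr;
	uint32_t type_id;
	uint32_t flags;		/* identifies the type, not how it is displayed */
};

enum {
	BTF_F_COMPACT = (1ULL << 0),
	BTF_F_NONAME  = (1ULL << 1),
	BTF_F_PTR_RAW = (1ULL << 2),
	BTF_F_ZERO    = (1ULL << 3),
};

/* Hypothetical: zero-valued members are skipped unless BTF_F_ZERO is set. */
static int show_member(uint64_t flags, int member_is_zero)
{
	return !member_is_zero || (flags & BTF_F_ZERO) != 0;
}

/* Hypothetical: BTF_F_COMPACT drops the newline after each member. */
static const char *member_sep(uint64_t flags)
{
	return (flags & BTF_F_COMPACT) ? "" : "\n";
}
```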
      Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/1601292670-1616-4-git-send-email-alan.maguire@oracle.com
    • bpf: Move to generic BTF show support, apply it to seq files/strings · 31d0bc81
      Alan Maguire authored
      Generalize the "seq_show" seq file support in btf.c to support
      a generic show callback, of which we support two instances: the
      current seq file show, and a show with snprintf() behaviour which
      instead writes the type data to a supplied string.
      
      Both classes of show function call btf_type_show() with different
      targets; the seq file or the string to be written.  In the string
      case we need to track additional data - length left in string to write
      and length to return that we would have written (a la snprintf).
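The string target's bookkeeping (space left, plus the length that would have been written) can be sketched in userspace as below. `struct str_show` and `str_show_printf()` are hypothetical names, and the truncation handling is an assumption that follows standard snprintf() semantics rather than the exact btf.c code.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

struct str_show {
	char *buf;	/* current write position */
	size_t left;	/* bytes left in buf, including room for the NUL */
	int len;	/* total length that would have been written */
};

static void str_show_printf(struct str_show *s, const char *fmt, ...)
{
	va_list args;
	int n;

	va_start(args, fmt);
	n = vsnprintf(s->buf, s->left, fmt, args);
	va_end(args);
	if (n < 0)
		return;
	s->len += n;
	if ((size_t)n < s->left) {
		/* Fully written: advance past it. */
		s->buf += n;
		s->left -= n;
	} else if (s->left > 0) {
		/* Truncated: park on the terminating NUL; buffer is now full. */
		s->buf += s->left - 1;
		s->left = 1;
	}
}
```

As with snprintf(), a caller can compare the final `len` against the buffer size to detect truncation.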
      
      By default show will display type information, field members and
      their types and values etc, and the information is indented
      based upon structure depth. Zeroed fields are omitted.
      
      Show does, however, support flags that modify its behaviour:
      
      BTF_SHOW_COMPACT - suppress newline/indent.
      BTF_SHOW_NONAME - suppress show of type and member names.
      BTF_SHOW_PTR_RAW - do not obfuscate pointer values.
      BTF_SHOW_UNSAFE - do not copy data to safe buffer before display.
      BTF_SHOW_ZERO - show zeroed values (by default they are not shown).
      Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/1601292670-1616-3-git-send-email-alan.maguire@oracle.com
    • 76654e67
      Alan Maguire authored
    • libbpf: Add btf__new_empty() to create an empty BTF object · a871b043
      Andrii Nakryiko authored
      Add the ability to create an empty BTF object from scratch. This is going to be
      used by pahole for BTF encoding, and also by selftests for convenient creation
      of BTF objects.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200926011357.2366158-7-andriin@fb.com
    • libbpf: Allow modification of BTF and add btf__add_str API · 919d2b1d
      Andrii Nakryiko authored
      Allow internal BTF representation to switch from default read-only mode, in
      which raw BTF data is a single non-modifiable block of memory with BTF header,
      types, and strings laid out sequentially and contiguously in memory, into
      a writable representation with types and strings data split out into separate
      memory regions that can be dynamically expanded.
      
      Such writable internal representation is transparent to users of libbpf APIs,
      but allows appending new types and strings at the end of BTF, which is
      a typical use case when generating BTF programmatically. All the basic
      guarantees of BTF types and strings layout are preserved, i.e., users can get
      a `struct btf_type *` pointer and read it directly. Such btf_type pointers might
      be invalidated if BTF is modified, so some care is required in such mixed
      read/write scenarios.
      
      The switch from read-only to writable configuration happens automatically the
      first time the user attempts to modify BTF by adding either a new type or a new
      string. It is still possible to get raw BTF data, which is a single piece of
      memory that can be persisted in an ELF section or into a file as raw BTF. Such
      raw data memory is also still owned by BTF and will be freed either when the BTF
      object is freed or when another modification to BTF happens, as any modification
      invalidates the BTF raw representation.
      
      This patch adds the first two BTF manipulation APIs: btf__add_str(), which
      adds arbitrary strings to the BTF string section, and btf__find_str(), which
      finds an existing string's offset but does not add the string if it's missing.
      All the added strings are automatically deduplicated. This is achieved by
      maintaining an additional string lookup index for all unique strings. Such
      index is built when BTF is switched to modifiable mode. If at that time the BTF
      strings section contained duplicate strings, they are not de-duplicated. This
      is done specifically to avoid modifying the existing content of BTF (types, their
      string offsets, etc.), which could cause confusion; this property is especially
      important if there is a struct btf_ext associated with the struct btf. By
      following this "imperfect deduplication" process, btf_ext is kept consistent and
      correct. If deduplication of strings is necessary, it can be forced by doing BTF
      deduplication, at which point all the strings will be eagerly deduplicated and
      all string offsets both in struct btf and struct btf_ext will be updated.
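The add/find/deduplicate behaviour can be modelled with a small append-only string table. In this sketch (hypothetical `str_set_*` names), a linear scan over whole strings stands in for the hash-based lookup index described above, and offset 0 holds the mandatory empty string.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define STRS_CAP 256

struct str_set {
	char data[STRS_CAP];
	int len;	/* bytes used; data[0] is the empty string */
};

static void str_set_init(struct str_set *s)
{
	s->data[0] = '\0';
	s->len = 1;
}

/* Offset of an existing whole string, or -ENOENT; never adds. */
static int str_set_find(const struct str_set *s, const char *str)
{
	int off = 0;

	while (off < s->len) {
		if (strcmp(s->data + off, str) == 0)
			return off;
		off += (int)strlen(s->data + off) + 1;
	}
	return -ENOENT;
}

/* Reuse an existing copy if present, otherwise append; returns the offset. */
static int str_set_add(struct str_set *s, const char *str)
{
	int off = str_set_find(s, str);
	size_t n;

	if (off >= 0)
		return off;
	n = strlen(str) + 1;
	if ((size_t)s->len + n > STRS_CAP)
		return -E2BIG;
	memcpy(s->data + s->len, str, n);
	off = s->len;
	s->len += (int)n;
	return off;
}
```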
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200926011357.2366158-6-andriin@fb.com
    • libbpf: Extract generic string hashing function for reuse · 7d9c71e1
      Andrii Nakryiko authored
      Calculating a hash of a zero-terminated string is a common need when using
      a hashmap, so extract it for reuse.
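A simple multiplicative string hash of the kind typically used as a hashmap hash callback is sketched below. The `toy_str_hash()` name and the multiplier of 31 are assumptions for illustration, not necessarily the function that was extracted in this patch.

```c
#include <assert.h>
#include <stddef.h>

/* h = h * 31 + c over the bytes of a NUL-terminated string. */
static size_t toy_str_hash(const char *s)
{
	size_t h = 0;

	while (*s) {
		h = h * 31 + (unsigned char)*s;
		s++;
	}
	return h;
}
```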
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200926011357.2366158-5-andriin@fb.com
    • libbpf: Generalize common logic for managing dynamically-sized arrays · 192f5a1f
      Andrii Nakryiko authored
      Managing a dynamically-sized array is common but non-trivial functionality,
      which requires a significant amount of logic and code to implement properly.
      So instead of re-implementing it every time, extract it into a helper function
      and reuse it.
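A typical shape for such a helper is an "ensure capacity" function that grows geometrically and zeroes new space. The sketch below uses a hypothetical `ensure_capacity()` with a 1.5x growth factor and a minimum of 16 elements; these choices are assumptions, not libbpf's exact policy or signature.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Ensure *data can hold at least need_cnt elements of elem_sz bytes.
 * Grows capacity by 1.5x (minimum 16) and zeroes the newly added space.
 * Returns 0 on success, -1 on overflow or allocation failure.
 */
static int ensure_capacity(void **data, size_t *cap_cnt, size_t elem_sz,
			   size_t need_cnt)
{
	size_t new_cnt;
	void *new_data;

	if (need_cnt <= *cap_cnt)
		return 0;
	new_cnt = *cap_cnt + *cap_cnt / 2;
	if (new_cnt < 16)
		new_cnt = 16;
	if (new_cnt < need_cnt)
		new_cnt = need_cnt;
	if (elem_sz && new_cnt > SIZE_MAX / elem_sz)
		return -1;
	new_data = realloc(*data, new_cnt * elem_sz);
	if (!new_data)
		return -1;
	memset((char *)new_data + *cap_cnt * elem_sz, 0,
	       (new_cnt - *cap_cnt) * elem_sz);
	*data = new_data;
	*cap_cnt = new_cnt;
	return 0;
}
```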
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200926011357.2366158-4-andriin@fb.com
    • libbpf: Remove assumption of single contiguous memory for BTF data · b8604247
      Andrii Nakryiko authored
      Refactor internals of struct btf to remove assumptions that BTF header, type
      data, and string data are laid out contiguously in memory in a single
      allocation. Now we have three separate pointers pointing to the start
      of each respective area: header, types, strings. In the next patches, these
      pointers will be re-assigned to point to independently allocated memory areas,
      if BTF needs to be modified.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200926011357.2366158-3-andriin@fb.com
    • libbpf: Refactor internals of BTF type index · 740e69c3
      Andrii Nakryiko authored
      Refactor implementation of internal BTF type index to not use direct pointers.
      Instead, it uses offsets relative to the start of the types data section. This
      allows types data to be reallocated, enabling implementation of
      modifiable BTF.
      
      Since getting a type by ID now has an extra indirection step, convert all
      internal type lookups to a new helper, btf_type_id(), that returns a non-const
      pointer to a type by its ID.
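The benefit of offsets over pointers is that lookups survive a realloc() of the types buffer. A hypothetical userspace model follows; `struct rec` stands in for struct btf_type, and `toy_type_by_id()` is not the libbpf helper.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Fixed-size record standing in for struct btf_type. */
struct rec { uint32_t name_off; uint32_t info; };

#define MAX_TYPES 15

struct type_table {
	unsigned char *types_data;	/* realloc()'ed as types are appended */
	size_t types_len;		/* bytes used in types_data */
	uint32_t offs[MAX_TYPES + 1];	/* type ID -> byte offset; ID 0 unused */
	uint32_t nr;			/* number of types added */
};

/* Append a record; returns its type ID (starting at 1) or -1 on failure. */
static int append_rec(struct type_table *t, uint32_t name_off, uint32_t info)
{
	struct rec r = { name_off, info };
	unsigned char *p;

	if (t->nr >= MAX_TYPES)
		return -1;
	p = realloc(t->types_data, t->types_len + sizeof(r));
	if (!p)
		return -1;
	/* Raw pointers into the old buffer may now be stale; offsets are not. */
	memcpy(p + t->types_len, &r, sizeof(r));
	t->types_data = p;
	t->offs[++t->nr] = (uint32_t)t->types_len;
	t->types_len += sizeof(r);
	return (int)t->nr;
}

/* Resolve ID -> pointer relative to wherever types_data currently lives. */
static struct rec *toy_type_by_id(struct type_table *t, uint32_t id)
{
	return (struct rec *)(t->types_data + t->offs[id]);
}
```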
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200926011357.2366158-2-andriin@fb.com
    • selftests: Remove fmod_ret from test_overhead · b000def2
      Toke Høiland-Jørgensen authored
      The test_overhead prog_test included an fmod_ret program that attached to
      __set_task_comm() in the kernel. However, this function was never listed as
      allowed for return modification, so this only worked because the verifier
      skipped its checks when a trampoline already existed for the attach
      point. Now that the verifier checks have been fixed, remove fmod_ret from
      the test so it works again.
      
      Fixes: 4eaf0b5c ("selftest/bpf: Fmod_ret prog and implement test_overhead as part of bench")
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Toke Høiland-Jørgensen
      bpf: verifier: refactor check_attach_btf_id() · f7b12b6f
      Toke Høiland-Jørgensen authored
      The check_attach_btf_id() function really does three things:
      
      1. It performs a bunch of checks on the program to ensure that the
         attachment is valid.
      
      2. It stores a bunch of state about the attachment being requested in
         the verifier environment and struct bpf_prog objects.
      
      3. It allocates a trampoline for the attachment.
      
      This patch splits out (1.) and (3.) into separate functions which will
      perform the checks, but return the computed values instead of directly
      modifying the environment. This is done in preparation for reusing the
      checks when the actual attachment is happening, which will allow tracing
      programs to have multiple (compatible) attachments.
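      The split between a side-effect-free check and the code that records its results might look like the sketch below. All names (prog_sketch, attach_info, env_sketch, check_attach_target(), verify_attach()) and the tiny symbol table are hypothetical simplifications, not the verifier's actual functions; the point is that the check only fills an output struct, so it can be re-run at attach time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol table standing in for a kernel symbol lookup. */
static const struct {
	const char *name;
	unsigned long addr;
} syms[] = {
	{ "security_file_open", 0x1000 },
	{ "__set_task_comm", 0x2000 },
};

struct prog_sketch {
	const char *attach_func;   /* requested attach target */
};

struct attach_info {          /* values computed by the check */
	const char *tgt_name;
	unsigned long tgt_addr;
};

struct env_sketch {           /* state owned by the verifier */
	struct attach_info attach;
	bool have_attach;
};

/* (1.) Pure check: validates the target and returns computed values via *out. */
static int check_attach_target(const struct prog_sketch *prog,
			       struct attach_info *out)
{
	for (size_t i = 0; i < sizeof(syms) / sizeof(syms[0]); i++) {
		if (strcmp(syms[i].name, prog->attach_func) == 0) {
			out->tgt_name = syms[i].name;
			out->tgt_addr = syms[i].addr;
			return 0;
		}
	}
	return -1;
}

/* (2.) The caller decides whether to store the results in its own state. */
static int verify_attach(struct env_sketch *env, const struct prog_sketch *prog)
{
	struct attach_info info;
	int err = check_attach_target(prog, &info);

	if (err)
		return err;
	env->attach = info;
	env->have_attach = true;
	return 0;
}
```

      Because check_attach_target() never writes to env, a failed check leaves the verifier state untouched, and the same function can be called again later without having to undo anything.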
      
      This also fixes a bug where a bunch of checks were skipped if a trampoline
      already existed for the tracing target.
      
      Fixes: 6ba43b76 ("bpf: Attachment verification for BPF_MODIFY_RETURN")
      Fixes: 1e6c62a8 ("bpf: Introduce sleepable BPF programs")
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Toke Høiland-Jørgensen
      bpf: change logging calls from verbose() to bpf_log() and use log pointer · efc68158
      Toke Høiland-Jørgensen authored
      In preparation for moving code around, change a bunch of references to
      env->log (and the verbose() logging helper) to use bpf_log() and a direct
      pointer to struct bpf_verifier_log. While we're touching the function
      signature, mark the 'prog' argument to bpf_check_type_match() as const.
      
      Also enhance the bpf_verifier_log_needed() check to handle NULL pointers
      for the log struct so we can re-use the code with logging disabled.
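      The NULL-tolerant logging check can be sketched like this. The struct, the helpers and the message format are simplified stand-ins with hypothetical names, not the kernel's struct bpf_verifier_log or bpf_log(); the sketch only shows why accepting a NULL log pointer lets the same checking code run with logging disabled.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Simplified, hypothetical stand-in for struct bpf_verifier_log. */
struct log_sketch {
	unsigned int level;
	char buf[128];
	size_t len;
};

/* Like the enhanced check: a NULL log simply means "no logging". */
static bool log_needed(const struct log_sketch *log)
{
	return log && log->level;
}

/* bpf_log()-style writer that takes the log pointer directly. */
static void log_write(struct log_sketch *log, const char *fmt, ...)
{
	size_t room;
	va_list args;
	int n;

	if (!log_needed(log))
		return;
	room = sizeof(log->buf) - log->len;
	if (room <= 1)
		return;   /* buffer full */
	va_start(args, fmt);
	n = vsnprintf(log->buf + log->len, room, fmt, args);
	va_end(args);
	if (n > 0)
		log->len += (size_t)n < room ? (size_t)n : room - 1;
}

/* Checking code shared by a logging caller and a silent (NULL log) caller. */
static int check_value(struct log_sketch *log, int value)
{
	if (value < 0) {
		log_write(log, "bad value %d\n", value);
		return -1;
	}
	return 0;
}
```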
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Toke Høiland-Jørgensen
      bpf: disallow attaching modify_return tracing functions to other BPF programs · 1af9270e
      Toke Høiland-Jørgensen authored
      From the checks and commit messages for modify_return, it seems it was
      never the intention that it should be possible to attach a tracing program
      with expected_attach_type == BPF_MODIFY_RETURN to another BPF program.
      However, check_attach_modify_return() only looks at the function name,
      so if the target function name starts with "security_", the attach is
      allowed even for a bpf2bpf attachment.
      
      Fix this oversight by also blocking the modification if a target program is
      supplied.
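      The resulting rule can be sketched as a small userspace model. In the kernel, a name is also accepted when the function is on the error-injection list; here that lookup is a hypothetical stub (is_error_injectable() and the name it accepts are made up), and the other names are illustrative too. The sketch shows that the new target-program check comes first, so a "security_" prefix no longer helps a bpf2bpf attachment.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SECURITY_PREFIX "security_"

/* Opaque stand-in for a target BPF program (bpf2bpf attachment). */
struct bpf_prog_sketch;

/* Hypothetical stub for the kernel's error-injection allowlist lookup. */
static bool is_error_injectable(const char *func_name)
{
	return strcmp(func_name, "example_injectable_func") == 0;
}

/* Name-based rule: error-injectable functions or "security_*" hooks. */
static int check_modify_return_name(const char *func_name)
{
	if (is_error_injectable(func_name) ||
	    strncmp(func_name, SECURITY_PREFIX, sizeof(SECURITY_PREFIX) - 1) == 0)
		return 0;
	return -EINVAL;
}

/* The fix: reject BPF_MODIFY_RETURN whenever a target program is supplied. */
static int check_fmod_ret_attach(const struct bpf_prog_sketch *tgt_prog,
				 const char *func_name)
{
	if (tgt_prog)
		return -EINVAL;   /* bpf2bpf: blocked regardless of the name */
	return check_modify_return_name(func_name);
}
```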
      
      Fixes: 18644cec ("bpf: Fix use-after-free in fmod_ret check")
      Fixes: 6ba43b76 ("bpf: Attachment verification for BPF_MODIFY_RETURN")
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  2. 28 Sep, 2020 10 commits
  3. 26 Sep, 2020 1 commit
    • John Fastabend
      bpf: Add comment to document BTF type PTR_TO_BTF_ID_OR_NULL · ba5f4cfe
      John Fastabend authored
      The meaning of PTR_TO_BTF_ID_OR_NULL differs slightly from other types
      with the *_OR_NULL suffix. For example, the types PTR_TO_SOCKET
      and PTR_TO_SOCKET_OR_NULL can be used for branch analysis because the
      type PTR_TO_SOCKET is guaranteed to _not_ have a null value.
      
      In contrast, PTR_TO_BTF_ID and PTR_TO_BTF_ID_OR_NULL have slightly
      different meanings. A PTR_TO_BTF_ID may be a NULL pointer, but it is
      safe to read through this pointer in the program context because the
      program context will handle any faults. The consequence is that for
      PTR_TO_BTF_ID the verifier can assume reads are safe, but cannot
      use the type in branch analysis. Additionally, authors need to be
      extra careful when passing PTR_TO_BTF_ID into helpers. In general,
      helpers consuming type PTR_TO_BTF_ID will need to assume it may
      be null.
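      A userspace sketch of what assuming "it may be null" means for a helper. The structs, constants and skc_to_tcp_sock_sketch() are hypothetical; the shape is loosely modeled on a socket cast helper such as bpf_skc_to_tcp_sock(), which checks its argument itself rather than relying on the verifier to have proven it non-NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified socket structure; not the kernel's layout. */
enum { PROTO_TCP = 6, PROTO_UDP = 17 };

struct sock_sketch {
	int protocol;
	int fullsock;
};

/*
 * The argument may come from a PTR_TO_BTF_ID register, which the verifier
 * cannot prove non-NULL, so the helper checks it before using it.
 */
static uintptr_t skc_to_tcp_sock_sketch(struct sock_sketch *sk)
{
	if (sk && sk->fullsock && sk->protocol == PROTO_TCP)
		return (uintptr_t)sk;
	return 0;   /* NULL: the program must branch on the result */
}
```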
      
      Since the above is not obvious to readers without background knowledge,
      let's add a comment to the type definition.
      
      Editorial comment: as networking and tracing programs get closer
      and more tightly merged, we may need to consider a new type that we
      can ensure is non-null, both for branch analysis and for passing into
      helpers.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Lorenz Bauer <lmb@cloudflare.com>
  4. 25 Sep, 2020 5 commits
    • John Fastabend
      bpf: Add AND verifier test case where 32bit and 64bit bounds differ · 99d4def4
      John Fastabend authored
      If we AND together two values whose 32bit subregs are known but whose
      64bit registers are not, we rely on the tnum value to report that the
      32bit subreg is known, and do not use mark_reg_known() directly from
      scalar32_min_max_and().
      
      Add an AND test to cover the case with a known 32bit subreg but an unknown
      64bit reg.
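      A small userspace model of the case being tested. tnum_and() mirrors the kernel's implementation, while the struct definition, tnum_subreg() and tnum_is_const() are simplified re-implementations of the kernel helpers of the same names; and_known_subregs() and its operand values are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t u64;

struct tnum {
	u64 value;   /* known bit values (unknown bits are 0 here) */
	u64 mask;    /* bits whose value is unknown */
};

#define TNUM(v, m) ((struct tnum){ .value = (v), .mask = (m) })

static struct tnum tnum_and(struct tnum a, struct tnum b)
{
	u64 alpha, beta, v;

	alpha = a.value | a.mask;
	beta = b.value | b.mask;
	v = a.value & b.value;
	return TNUM(v, alpha & beta & ~v);
}

/* Keep only the low 32 bits, i.e. the 32bit subregister view. */
static struct tnum tnum_subreg(struct tnum a)
{
	return TNUM(a.value & 0xffffffffULL, a.mask & 0xffffffffULL);
}

static bool tnum_is_const(struct tnum a)
{
	return !a.mask;
}

/* Operands: low 32 bits known, high 32 bits unknown. */
static struct tnum and_known_subregs(void)
{
	struct tnum a = TNUM(0x0000ffffULL, 0xffffffff00000000ULL);
	struct tnum b = TNUM(0x00ff00ffULL, 0xffffffff00000000ULL);

	return tnum_and(a, b);
}
```

      Here the full 64bit result is still unknown, yet its 32bit subreg is the constant 0xff, which is the situation the new test exercises.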
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • John Fastabend
      bpf, verifier: Remove redundant var_off.value ops in scalar known reg cases · 4fbb38a3
      John Fastabend authored
      In the BPF_AND and BPF_OR ALU cases we have this pattern when the src and dst
      tnums are constants.
      
       1 dst_reg->var_off = tnum_[op](dst_reg->var_off, src_reg.var_off)
       2 scalar32_min_max_[op]
       3       if (known) return
       4 scalar_min_max_[op]
       5       if (known)
       6          __mark_reg_known(dst_reg,
                         dst_reg->var_off.value [op] src_reg.var_off.value)
      
      The result is that in 1 we calculate the var_off value and store it in
      dst_reg. Then in 6 we duplicate this logic, doing the op again on the
      value.
      
      The op in 6 duplicates work because the tnum_[op] handlers have
      already done the value calculation. For example, this is tnum_and():
      
       struct tnum tnum_and(struct tnum a, struct tnum b)
       {
      	u64 alpha, beta, v;
      
      	alpha = a.value | a.mask;
      	beta = b.value | b.mask;
      	v = a.value & b.value;
      	return TNUM(v, alpha & beta & ~v);
       }
      
      So let's remove the redundant op calculation. It's confusing for readers
      and unnecessary. It's also not harmful, because those ops have the
      property r1 & r1 = r1 and r1 | r1 = r1.
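      Why the extra op in 6 is redundant, and also harmless, can be checked with a small userspace model. tnum_and() matches the function quoted above and tnum_or() follows the kernel's tnum_or(); redundant_and() and redundant_or() are hypothetical helpers that replay steps 1 and 6 for constant operands.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

struct tnum {
	u64 value;
	u64 mask;
};

#define TNUM(v, m) ((struct tnum){ .value = (v), .mask = (m) })

static struct tnum tnum_and(struct tnum a, struct tnum b)
{
	u64 alpha = a.value | a.mask;
	u64 beta = b.value | b.mask;
	u64 v = a.value & b.value;

	return TNUM(v, alpha & beta & ~v);
}

static struct tnum tnum_or(struct tnum a, struct tnum b)
{
	u64 v = a.value | b.value;
	u64 mu = a.mask | b.mask;

	return TNUM(v, mu & ~v);
}

/* Step 1 stores the result in dst; step 6 recomputes (a & b) & b == a & b. */
static u64 redundant_and(struct tnum dst, struct tnum src)
{
	dst = tnum_and(dst, src);
	return dst.value & src.value;
}

/* Same for OR: (a | b) | b == a | b. */
static u64 redundant_or(struct tnum dst, struct tnum src)
{
	dst = tnum_or(dst, src);
	return dst.value | src.value;
}
```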
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Alexei Starovoitov
      Merge branch 'enable-bpf_skc-cast-for-networking-progs' · 84085f87
      Alexei Starovoitov authored
      Martin KaFai Lau says:
      
      ====================
      This set allows networking prog types to directly read fields from
      the in-kernel socket type, e.g. "struct tcp_sock".
      
      Patch 2 has the details on the use case.
      
      v4:
      - Pass arg_btf_id instead of fn into check_reg_type() in Patch 1 (Lorenz)
      - Move arg_btf_id from func_proto to struct bpf_reg_types in Patch 2 (Lorenz)
      - Remove test_sock_fields from .gitignore in Patch 8 (Andrii)
      - Add tests to have better coverage on the modified helpers (Alexei)
        Patch 13 is added.
      - Use "void *sk" as the helper argument in UAPI bpf.h
      
      v3:
      - ARG_PTR_TO_SOCK_COMMON_OR_NULL was attempted in v2.  The _OR_NULL was
        needed because the PTR_TO_BTF_ID could be NULL, but note that a could-be-NULL
        PTR_TO_BTF_ID is not a scalar NULL to the verifier.  "_OR_NULL" implicitly
        gives an expectation that the helper can take a scalar NULL, which does
        not make sense for most helpers (all except one).  Passing a scalar NULL
        should be rejected at verification time.
      
        Thus, this patch uses ARG_PTR_TO_BTF_ID_SOCK_COMMON to specify that the
        helper can take either the btf-id ptr or the legacy PTR_TO_SOCK_COMMON, but
        not a scalar NULL.  It requires the func_proto to explicitly specify the
        arg_btf_id such that there is a very clear expectation that the helper
        can handle a NULL PTR_TO_BTF_ID.
      
      v2:
      - Add ARG_PTR_TO_SOCK_COMMON_OR_NULL (Lorenz)
      ====================
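      A simplified userspace model of the argument rule described in the cover letter. The enum reuses kernel register-type names for readability, but its values, the accepted-type list and arg_accepts() are assumptions for illustration; the real verifier additionally matches the BTF ID of a PTR_TO_BTF_ID register against the helper's arg_btf_id.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Register types; names borrowed from the verifier, values are arbitrary. */
enum reg_type_sketch {
	SCALAR_VALUE,         /* includes a literal NULL (0) */
	PTR_TO_MAP_VALUE,
	PTR_TO_SOCK_COMMON,
	PTR_TO_SOCKET,
	PTR_TO_TCP_SOCK,
	PTR_TO_BTF_ID,
};

/* Assumed set of register types accepted for ARG_PTR_TO_BTF_ID_SOCK_COMMON. */
static const enum reg_type_sketch btf_id_sock_common_types[] = {
	PTR_TO_SOCK_COMMON,
	PTR_TO_SOCKET,
	PTR_TO_TCP_SOCK,
	PTR_TO_BTF_ID,        /* may hold NULL at run time; the helper must cope */
};

static bool arg_accepts(enum reg_type_sketch t)
{
	size_t n = sizeof(btf_id_sock_common_types) /
		   sizeof(btf_id_sock_common_types[0]);

	for (size_t i = 0; i < n; i++)
		if (btf_id_sock_common_types[i] == t)
			return true;
	return false;         /* e.g. a scalar NULL is rejected at verification time */
}
```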
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Martin KaFai Lau
      bpf: selftest: Add test_btf_skc_cls_ingress · 9a856cae
      Martin KaFai Lau authored
      This patch attaches a classifier prog to the ingress filter.
      It exercises the following helpers with different socket pointer
      types in different logical branches:
      1. bpf_sk_release()
      2. bpf_sk_assign()
      3. bpf_skc_to_tcp_request_sock(), bpf_skc_to_tcp_sock()
      4. bpf_tcp_gen_syncookie(), bpf_tcp_check_syncookie()
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200925000458.3859627-1-kafai@fb.com
    • Martin KaFai Lau
      bpf: selftest: Remove enum tcp_ca_state from bpf_tcp_helpers.h · 0c402c6c
      Martin KaFai Lau authored
      The enum tcp_ca_state is available in <linux/tcp.h>.
      Remove it from bpf_tcp_helpers.h to avoid a conflict when the bpf prog
      needs to include both <linux/tcp.h> and bpf_tcp_helpers.h.
      
      Modify bpf_cubic.c and bpf_dctcp.c to use <linux/tcp.h> instead.
      <linux/stddef.h> is needed by <linux/tcp.h>.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200925000452.3859313-1-kafai@fb.com