1. 21 Jun, 2024 26 commits
  2. 17 Jun, 2024 7 commits
    • Merge branch 'bpf-support-resilient-split-btf' · f6afdaf7
      Andrii Nakryiko authored
      Alan Maguire says:
      
      ====================
      bpf: support resilient split BTF
      
      Split BPF Type Format (BTF) provides huge advantages in that kernel
      modules only have to provide type information for types that they do not
      share with the core kernel; for core kernel types, split BTF refers to
      core kernel BTF type ids.  So for a STRUCT sk_buff, a module that
      uses that structure (or a pointer to it) simply needs to refer to the
      core kernel type id, saving the need to define the structure and its many
      dependents.  This cuts down on duplication and makes BTF as compact
      as possible.
      
      However, there is a downside.  This scheme requires the references from
      split BTF to base BTF to be valid not just at encoding time, but at use
      time (when the module is loaded).  Even a small change in kernel types
      can perturb the type ids in core kernel BTF, and - if the new reproducible
      BTF option is not used - pahole's parallel processing of compilation units
      can lead to different type ids for the same kernel if the BTF is
      regenerated.
      
      So we have a robustness problem for split BTF for cases where a module is
      not always compiled at the same time as the kernel.  This problem is
      particularly acute for distros which generally want module builders to be
      able to compile a module for the lifetime of a Linux stable-based release,
      and have it continue to be valid over the lifetime of that release, even
      as changes in data structures (and hence BTF types) accrue.  Today it's not
      possible to generate BTF for modules that works beyond the initial
      kernel it is compiled against - kernel bugfixes etc invalidate the split
      BTF references to vmlinux BTF, and BTF is no longer usable for the
      module.
      
      The goal of this series is to provide additional context for cases
      like this.  That context comes in the form of distilled base BTF,
      which stands in for the base BTF and contains
      information about the types referenced from split BTF, but not their
      full descriptions.  The modified split BTF will refer to type ids in
      this .BTF.base section, and when the kernel loads such modules it
      will use that .BTF.base to map references from split BTF to the
      equivalent current vmlinux base BTF types.  Once this relocation
      process has succeeded, the module BTF available in /sys/kernel/btf
      will look exactly as if it was built with the current vmlinux;
      references to base types will be fixed up etc.
      
      A module builder - using this series along with the pahole changes -
      can then build a module with distilled base BTF via an out-of-tree
      module build, i.e.
      
      make -C . M=path/2/module
      
      The module will have a .BTF section (the split BTF) and a
      .BTF.base section.  The latter is small in size - distilled base
      BTF does not need full struct/union/enum information for named
      types for example.  For 2667 modules built with distilled base BTF,
      the average size observed was 1556 bytes (stddev 1563).  The overall
      size added to these 2667 modules was 5.3Mb.
      
      Note that for the in-tree modules, this approach is not needed as
      split and base BTF in the case of in-tree modules are always built
      and re-built together.
      
      The series first focuses on generating split BTF with distilled base
      BTF; then relocation support is added to allow split BTF with
      an associated distilled base to be relocated with a new base BTF.
      
      Next, Eduard's patch allows BTF ELF parsing to work with both
      .BTF and .BTF.base sections; this ensures that bpftool will be
      able to dump BTF for a module with a .BTF.base section for example,
      or indeed dump relocated BTF where a module and a "-B vmlinux"
      is supplied.
      
      Then we add support to resolve_btfids to ignore base BTF - i.e.
      to avoid relocation - if a .BTF.base section is found.  This ensures
      the .BTF.ids section is populated with ids relative to the distilled
      base (these will be relocated as part of module load).
      
      Finally the series supports storage of .BTF.base data/size in modules
      and supports sharing of relocation code with the kernel to allow
      relocation of module BTF.  For the kernel, this relocation
      process happens at module load time, and we relocate split BTF
      references to point at types in the current vmlinux BTF.  As part of
      this, .BTF.ids references need to be mapped also.
      
      So concretely, what happens is:
      
      - we generate split BTF in the .BTF section of a module that refers to
        types in the .BTF.base section as base types; the latter are not full
        type descriptions but provide information about the base type.  So
        a STRUCT sk_buff would be represented as a FWD struct sk_buff in
        distilled base BTF for example.
      - when the module is loaded, the split BTF is relocated with vmlinux
        BTF; in the case of the FWD struct sk_buff, we find the STRUCT sk_buff
        in vmlinux BTF and map all split BTF references to the distilled base
        FWD sk_buff, replacing them with references to the vmlinux BTF
        STRUCT sk_buff.
      
      A previous approach to this problem [1] utilized standalone BTF for such
      cases - where the BTF is not defined relative to base BTF so there is no
      relocation required.  The problem with that approach is that from
      the verifier perspective, some types are special, and having a custom
      representation of a core kernel type that did not necessarily match the
      current representation is not tenable.  So the approach taken here was
      to preserve the split BTF model while minimizing the representation of
      the context needed to relocate split and current vmlinux BTF.
      
      To generate distilled .BTF.base sections, the associated dwarves
      patch (to be applied on the "next" branch there) is needed [3].
      Without it, things will still work but modules will not be built
      with a .BTF.base section.
      
      Changes since v5[4]:
      
      - Update search of distilled types to return the first occurrence
        of a string (or a string+size pair); this allows us to iterate
        over all matches in distilled base BTF (Andrii, patch 3)
      - Update to use BTF field iterators (Andrii, patches 1, 3 and 8)
      - Update tests to cover multiple match and associated error cases
        (Eduard, patch 4)
      - Rename elf_sections_info to btf_elf_secs, remove use of
        libbpf_get_error(), reset btf->owns_base when relocation
        succeeds (Andrii, patch 5)
      
      Changes since v4[5]:
      
      - Moved embeddedness, duplicate name checks to relocation time
        and record struct/union size for all distilled struct/unions
        instead of using forwards.  This allows us to carry out
        type compatibility checks based on the base BTF we want to
        relocate with (Eduard, patches 1, 3)
      - Moved to using qsort() instead of qsort_r() as support for
        qsort_r() appears to be missing in Android libc (Andrii, patch 3)
      - Sorting/searching now incorporates size matching depending
        on BTF kind and embeddedness of struct/union (Eduard, Andrii,
        patch 3)
      - Improved naming of various types during relocation to avoid
        confusion (Andrii, patch 3)
      - Incorporated Eduard's patch (patch 5) which handles .BTF.base
        sections internally in btf_parse_elf().  This makes ELF parsing
        work with split BTF, split BTF with a distilled base, split
        BTF with a distilled base _and_ base BTF (by relocating) etc.
        Having this avoids the need for bpftool changes; it will work
        as-is with .BTF.base sections (Eduard, patch 4)
      - Updated resolve_btfids to _not_ relocate BTF for modules
        where a .BTF.base section is present; in that one case we
        do not want to relocate BTF as the .BTF.ids section should
        reflect ids in .BTF.base which will later be relocated on
        module load (Eduard, Andrii, patch 5)
      
      Changes since v3[6]:
      
      - distill now checks for duplicate-named struct/unions and records
        them as a sized struct/union to help identify which of the
        multiple base BTF structs/unions it refers to (Eduard, patch 1)
      - added test support for multiple name handling (Eduard, patch 2)
      - simplified the string mapping when updating split BTF to use
        base BTF instead of distilled base.  Since the only string
        references split BTF can make to base BTF are the names of
        the base types, create a string map from distilled string
        offset -> base BTF string offset and update string offsets
        by visiting all strings in split BTF; this saves having to
        do costly searches of base BTF (Eduard, patch 7,10)
      - fixed bpftool manpage and indentation issues (Quentin, patch 11)
      
      Also explored Eduard's suggestion of doing an implicit fallback
      to checking for .BTF.base section in btf__parse() when it is
      called to get base BTF.  However while it is doable, it turned
      out to be difficult operationally.  Since fallback is implicit
      we do not know the source of the BTF - was it from .BTF or
      .BTF.base? In bpftool, we want to try first standalone BTF,
      then split, then split with distilled base.  Having a way
      to explicitly request .BTF.base via btf__parse_opts() fits
      that model better.
      
      Changes since v2[7]:
      
      - submitted patch to use --btf_features in Makefile.btf for pahole
        v1.26 and later separately (Andrii).  That has landed in bpf-next
        now.
      - distilled base now encodes ENUM64 as fwd ENUM (size 8), eliminating
        the need for support for ENUM64 in btf__add_fwd (patch 1, Andrii)
      - moved to distilling only named types, augmenting split BTF with
        associated reference types; this simplifies greatly the distilled
        base BTF and the mapping operation between distilled and base
        BTF when relocating (most of the series changes, Andrii)
      - relocation now iterates over base BTF, looking for matches based
        on name in distilled BTF.  Distilled BTF is pre-sorted by name
        (Andrii, patch 8)
      - removed most redundant compatibility checks aside from struct
        size for base types/embedded structs and kind compatibility
        (since we only match on name) (Andrii, patch 8)
      - btf__parse_opts() now replaces btf_parse() internally in libbpf
        (Eduard, patch 3)
      
      Changes since RFC [8]:
      
      - updated terminology; we replace clunky "base reference" BTF with
        distilling base BTF into a .BTF.base section. Similarly BTF
        reconciliation becomes BTF relocation (Andrii, most patches)
      - add distilled base BTF by default for out-of-tree modules
        (Alexei, patch 8)
      - distill algorithm updated to record size of embedded struct/union
        by recording it as a 0-vlen STRUCT/UNION with size preserved
        (Andrii, patch 2)
      - verify size match on relocation for such STRUCT/UNIONs (Andrii,
        patch 9)
      - with embedded STRUCT/UNION recording size, we can have bpftool
        dump a header representation using .BTF.base + .BTF sections
        rather than special-casing and refusing to use "format c" for
        that case (patch 5)
      - match enum with enum64 and vice versa (Andrii, patch 9)
      - ensure that resolve_btfids works with BTF without .BTF.base
        section (patch 7)
      - update tests to cover embedded types, arrays and function
        prototypes (patches 3, 12)
      
      [1] https://lore.kernel.org/bpf/20231112124834.388735-14-alan.maguire@oracle.com/
      [2] https://lore.kernel.org/bpf/20240501175035.2476830-1-alan.maguire@oracle.com/
      [3] https://lore.kernel.org/bpf/20240517102714.4072080-1-alan.maguire@oracle.com/
      [4] https://lore.kernel.org/bpf/20240528122408.3154936-1-alan.maguire@oracle.com/
      [5] https://lore.kernel.org/bpf/20240517102246.4070184-1-alan.maguire@oracle.com/
      [6] https://lore.kernel.org/bpf/20240510103052.850012-1-alan.maguire@oracle.com/
      [7] https://lore.kernel.org/bpf/20240424154806.3417662-1-alan.maguire@oracle.com/
      [8] https://lore.kernel.org/bpf/20240322102455.98558-1-alan.maguire@oracle.com/
      ====================
      
      Link: https://lore.kernel.org/r/20240613095014.357981-1-alan.maguire@oracle.com
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    • resolve_btfids: Handle presence of .BTF.base section · 6ba77385
      Alan Maguire authored
      Now that btf_parse_elf() handles .BTF.base section presence,
      we need to ensure that resolve_btfids uses .BTF.base when present
      rather than the vmlinux base BTF passed in via the -B option.
      Detect .BTF.base section presence and unset the base BTF path
      to ensure that BTF ELF parsing will do the right thing.
      Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
      Link: https://lore.kernel.org/bpf/20240613095014.357981-7-alan.maguire@oracle.com
    • libbpf: Make btf_parse_elf process .BTF.base transparently · c86f180f
      Eduard Zingerman authored
      Update btf_parse_elf() to check if .BTF.base section is present.
      The logic is as follows:
      
        if .BTF.base section exists:
           distilled_base := btf_new(.BTF.base)
        if distilled_base:
           btf := btf_new(.BTF, .base_btf=distilled_base)
           if base_btf:
              btf_relocate(btf, base_btf)
        else:
           btf := btf_new(.BTF)
        return btf
      
      In other words:
      - if .BTF.base section exists, load BTF from it and use it as a base
        for .BTF load;
      - if base_btf is specified and .BTF.base section exists, relocate newly
        loaded .BTF against base_btf.
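
      The decision logic above can be modelled in a few lines of Python.
      This is only a sketch: the Btf class and parse_elf() are hypothetical
      stand-ins, not libbpf APIs, and the relocation step is reduced to
      reparenting.  It also assumes that, when no .BTF.base section exists,
      the caller's base_btf (if any) is used directly as the base, which the
      pseudocode leaves implicit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Btf:
    """Toy stand-in for a parsed BTF object (not libbpf's struct btf)."""
    section: str
    base: Optional["Btf"] = None
    relocated: bool = False


def parse_elf(sections: dict, base_btf: Optional[Btf] = None) -> Btf:
    """Mirror the branch structure described for btf_parse_elf()."""
    distilled_base = Btf(".BTF.base") if ".BTF.base" in sections else None
    if distilled_base is not None:
        # Split BTF is first parsed on top of the distilled base.
        btf = Btf(".BTF", base=distilled_base)
        if base_btf is not None:
            # Stand-in for btf_relocate(): reparent onto the supplied base.
            btf.base = base_btf
            btf.relocated = True
    else:
        # Assumption: with no .BTF.base, base_btf is used as-is.
        btf = Btf(".BTF", base=base_btf)
    return btf
```

      With both sections present and no base supplied, the result stays
      parented to the distilled base; supplying a base triggers the
      relocation path.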
      Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
      Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20240613095014.357981-6-alan.maguire@oracle.com
    • selftests/bpf: Extend distilled BTF tests to cover BTF relocation · affdeb50
      Alan Maguire authored
      Ensure relocated BTF looks as expected; in this case identical to
      original split BTF, with a few duplicate anonymous types added to
      split BTF by the relocation process.  Also add relocation tests
      for edge cases like missing type in base BTF and multiple types
      of the same name.
      Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Eduard Zingerman <eddyz87@gmail.com>
      Link: https://lore.kernel.org/bpf/20240613095014.357981-5-alan.maguire@oracle.com
    • libbpf: Split BTF relocation · 19e00c89
      Alan Maguire authored
      Map distilled base BTF type ids referenced in split BTF and their
      references to the base BTF passed in, and if the mapping succeeds,
      reparent the split BTF to the base BTF.
      
      Relocation is done by first verifying that distilled base BTF
      only consists of named INT, FLOAT, ENUM, FWD, STRUCT and
      UNION kinds; then we sort these to speed lookups.  Once sorted,
      the base BTF is iterated, and for each relevant kind we check
      for an equivalent in distilled base BTF.  When found, the
      mapping from distilled -> base BTF id and string offset is recorded.
      In establishing mappings, we need to ensure we check STRUCT/UNION
      size when the STRUCT/UNION is embedded in a split BTF STRUCT/UNION,
      and when duplicate names exist for the same STRUCT/UNION.  Otherwise
      size is ignored in matching STRUCT/UNIONs.
      
      Once all mappings are established, we can update type ids
      and string offsets in split BTF and reparent it to the new base.
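
      A rough Python model of the matching step described above follows.
      It is a sketch under stated assumptions, not the libbpf code: types
      are (kind, name, size) tuples, ids are 1-based list positions, the
      kinds_compatible() rules are a simplified guess, and the needs_size
      set (names that are embedded or duplicated) is supplied by the caller
      rather than derived.

```python
import bisect
from typing import NamedTuple


class T(NamedTuple):
    kind: str
    name: str
    size: int = 0


DISTILLED_KINDS = {"INT", "FLOAT", "ENUM", "FWD", "STRUCT", "UNION"}
BASE_KINDS = DISTILLED_KINDS | {"ENUM64"}


def kinds_compatible(d_kind, b_kind):
    """Simplified compatibility rules (an assumption of this sketch)."""
    if d_kind == "FWD":
        return b_kind in ("FWD", "STRUCT", "UNION")
    if d_kind == "ENUM":
        return b_kind in ("ENUM", "ENUM64")
    return d_kind == b_kind


def build_id_map(distilled, base, needs_size=frozenset()):
    """Return {distilled id: base id}, raising ValueError on failure."""
    for t in distilled:
        if t.kind not in DISTILLED_KINDS or not t.name:
            raise ValueError(f"unexpected distilled type: {t}")
    # Sort (name, id) pairs so each base type costs one binary search.
    by_name = sorted((t.name, i) for i, t in enumerate(distilled, start=1))
    names = [name for name, _ in by_name]
    id_map = {}
    for base_id, b in enumerate(base, start=1):
        if b.kind not in BASE_KINDS or not b.name:
            continue
        pos = bisect.bisect_left(names, b.name)
        # Visit every distilled entry sharing this name.
        while pos < len(names) and names[pos] == b.name:
            dist_id = by_name[pos][1]
            d = distilled[dist_id - 1]
            pos += 1
            if not kinds_compatible(d.kind, b.kind):
                continue
            # Size only matters for embedded or duplicate-named STRUCT/UNIONs.
            if (d.kind in ("STRUCT", "UNION") and d.name in needs_size
                    and d.size != b.size):
                continue
            if dist_id in id_map:
                raise ValueError(f"multiple base matches for {d.name}")
            id_map[dist_id] = base_id
    missing = [t.name for i, t in enumerate(distilled, start=1)
               if i not in id_map]
    if missing:
        raise ValueError(f"no base match for: {missing}")
    return id_map
```

      Iterating the (larger) base BTF once and searching the (small) sorted
      distilled set keeps the work close to linear in the size of base BTF.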
      Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Eduard Zingerman <eddyz87@gmail.com>
      Link: https://lore.kernel.org/bpf/20240613095014.357981-4-alan.maguire@oracle.com
    • selftests/bpf: Test distilled base, split BTF generation · eb20e727
      Alan Maguire authored
      Test generation of split+distilled base BTF, ensuring that
      
      - named base BTF STRUCTs and UNIONs are represented as 0-vlen sized
        STRUCT/UNIONs
      - named ENUM[64]s are represented as 0-vlen named ENUM[64]s
      - anonymous struct/unions are represented in full in split BTF
      - anonymous enums are represented in full in split BTF
      - types unreferenced from split BTF are not present in distilled
        base BTF
      
      Also test that with vmlinux BTF and split BTF based upon it,
      we only represent needed base types referenced from split BTF
      in distilled base.
      Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Eduard Zingerman <eddyz87@gmail.com>
      Link: https://lore.kernel.org/bpf/20240613095014.357981-3-alan.maguire@oracle.com
    • libbpf: Add btf__distill_base() creating split BTF with distilled base BTF · 58e185a0
      Alan Maguire authored
      To support more robust split BTF, adding supplemental context for the
      base BTF type ids that split BTF refers to is required.  Without such
      references, a simple shuffling of base BTF type ids (without any other
      significant change) invalidates the split BTF.  Here the attempt is made
      to store additional context to make split BTF more robust.
      
      This context comes in the form of distilled base BTF providing minimal
      information (name and - in some cases - size) for base INTs, FLOATs,
      STRUCTs, UNIONs, ENUMs and ENUM64s along with modified split BTF that
      points at that base and contains any additional types needed (such as
      TYPEDEF, PTR and anonymous STRUCT/UNION declarations).  This
      information constitutes the minimal BTF representation needed to
      disambiguate or remove split BTF references to base BTF.  The rules
      are as follows:
      
      - INT, FLOAT, FWD are recorded in full.
      - if a named base BTF STRUCT or UNION is referred to from split BTF, it
        will be encoded as a zero-member sized STRUCT/UNION (preserving
        size for later relocation checks).  Only base BTF STRUCT/UNIONs
        that are either embedded in split BTF STRUCT/UNIONs or that have
        multiple STRUCT/UNION instances of the same name will _need_ size
        checks at relocation time, but as it is possible a different set of
        types will be duplicates in the later to-be-resolved base BTF,
        we preserve size information for all named STRUCT/UNIONs.
      - if an ENUM[64] is named, an ENUM forward representation (an ENUM
        with no values) of the same size is used.
      - in all other cases, the type is added to the new split BTF.
      
      Avoiding struct/union/enum/enum64 expansion is important to keep the
      distilled base BTF representation to a minimum size.
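
      The rules above can be read as a small classification function.  The
      Python below is an illustrative sketch only: BaseType and distill()
      are hypothetical names, a type is reduced to (kind, name, size,
      members), and the sk_buff size used in the usage note is made up.

```python
from typing import NamedTuple, Tuple


class BaseType(NamedTuple):
    kind: str
    name: str
    size: int = 0
    members: tuple = ()


def distill(t: BaseType) -> Tuple[str, BaseType]:
    """Return (destination, representation) for a base type that split
    BTF refers to, following the rules listed above."""
    if t.kind in ("INT", "FLOAT", "FWD"):
        # Recorded in full in the distilled base.
        return ("distilled_base", t)
    if t.kind in ("STRUCT", "UNION") and t.name:
        # Zero-member STRUCT/UNION with the size preserved.
        return ("distilled_base", t._replace(members=()))
    if t.kind in ("ENUM", "ENUM64") and t.name:
        # ENUM with no values, same size (ENUM64 becomes a size-8 ENUM).
        return ("distilled_base", BaseType("ENUM", t.name, t.size, ()))
    # Everything else (PTR, TYPEDEF, anonymous types, ...) is added to
    # the new split BTF unchanged.
    return ("split", t)
```

      For instance, distill(BaseType("STRUCT", "sk_buff", 232, ("next",)))
      yields a member-less STRUCT sk_buff of size 232 in the distilled base,
      while an anonymous struct is returned with destination "split".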
      
      When successful, new representations of the distilled base BTF and new
      split BTF that refers to it are returned.  Both need to be freed by the
      caller.
      
      So to take a simple example, with split BTF with a type referring
      to "struct sk_buff", we will generate distilled base BTF with a
      0-member STRUCT sk_buff of the appropriate size, and the split BTF
      will refer to it instead.
      
      Tools like pahole can utilize such split BTF to populate the .BTF
      section (split BTF) and an additional .BTF.base section.  Then
      when the split BTF is loaded, the distilled base BTF can be used
      to relocate split BTF to reference the current (and possibly changed)
      base BTF.
      
      So for example if "struct sk_buff" was id 502 when the split BTF was
      originally generated, we can use the distilled base BTF to see that
      id 502 refers to a "struct sk_buff" and replace instances of id 502
      with the current (relocated) base BTF sk_buff type id.
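
      As a toy illustration of that replacement (a sketch, not the libbpf
      implementation): id 502 is taken from the example above, while the
      helper name and the id 517 in the current base are made up.

```python
def remap_base_refs(split_refs, distilled_names, current_ids):
    """Swap each distilled base id for the current base id of that name."""
    return [current_ids[distilled_names[type_id]] for type_id in split_refs]


# Id 502 comes from the example above; 517 is a hypothetical new id.
distilled_names = {502: "sk_buff"}
current_ids = {"sk_buff": 517}
print(remap_base_refs([502, 502], distilled_names, current_ids))  # [517, 517]
```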
      
      Distilled base BTF is small; when building a kernel with all modules
      using distilled base BTF as a test, overall module size grew by only
      5.3Mb total across ~2700 modules.
      Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Eduard Zingerman <eddyz87@gmail.com>
      Link: https://lore.kernel.org/bpf/20240613095014.357981-2-alan.maguire@oracle.com
  3. 14 Jun, 2024 4 commits
  4. 13 Jun, 2024 3 commits
    • Merge branch 'bpf-make-trusted-args-nullable' · cdbde084
      Alexei Starovoitov authored
      Vadim Fedorenko says:
      
      ====================
      bpf: make trusted args nullable
      
      The current verifier checks whether an arg is nullable only after
      checking for certain pointer types. This prevents programs from
      passing NULL to kfunc args even if they are marked as nullable. This
      patchset adjusts the verifier and changes the bpf crypto kfuncs to
      allow NULL for the IV parameter, which is optional for some ciphers.
      The benchmark shows a ~4% improvement when there is no need to
      initialise a 0-sized dynptr.
      
      v3:
      - add special selftest for nullable parameters
      v2:
      - adjust kdoc accordingly
      ====================
      
      Link: https://lore.kernel.org/r/20240613211817.1551967-1-vadfed@meta.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests: bpf: add testmod kfunc for nullable params · 2d45ab1e
      Vadim Fedorenko authored
      Add a special test to ensure that only __nullable BTF params can be
      replaced by NULL. This patch adds fake kfuncs in bpf_testmod to
      properly test different params.
      Acked-by: Eduard Zingerman <eddyz87@gmail.com>
      Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
      Link: https://lore.kernel.org/r/20240613211817.1551967-6-vadfed@meta.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests: bpf: crypto: adjust bench to use nullable IV · 9b560751
      Vadim Fedorenko authored
      The bench shows some improvements, around 4% faster on decrypt.
      
      Before:
      
      Benchmark 'crypto-decrypt' started.
      Iter   0 (325.719us): hits    5.105M/s (  5.105M/prod), drops 0.000M/s, total operations    5.105M/s
      Iter   1 (-17.295us): hits    5.224M/s (  5.224M/prod), drops 0.000M/s, total operations    5.224M/s
      Iter   2 (  5.504us): hits    4.630M/s (  4.630M/prod), drops 0.000M/s, total operations    4.630M/s
      Iter   3 (  9.239us): hits    5.148M/s (  5.148M/prod), drops 0.000M/s, total operations    5.148M/s
      Iter   4 ( 37.885us): hits    5.198M/s (  5.198M/prod), drops 0.000M/s, total operations    5.198M/s
      Iter   5 (-53.282us): hits    5.167M/s (  5.167M/prod), drops 0.000M/s, total operations    5.167M/s
      Iter   6 (-17.809us): hits    5.186M/s (  5.186M/prod), drops 0.000M/s, total operations    5.186M/s
      Summary: hits    5.092 ± 0.228M/s (  5.092M/prod), drops    0.000 ±0.000M/s, total operations    5.092 ± 0.228M/s
      
      After:
      
      Benchmark 'crypto-decrypt' started.
      Iter   0 (268.912us): hits    5.312M/s (  5.312M/prod), drops 0.000M/s, total operations    5.312M/s
      Iter   1 (124.869us): hits    5.354M/s (  5.354M/prod), drops 0.000M/s, total operations    5.354M/s
      Iter   2 (-36.801us): hits    5.334M/s (  5.334M/prod), drops 0.000M/s, total operations    5.334M/s
      Iter   3 (254.628us): hits    5.334M/s (  5.334M/prod), drops 0.000M/s, total operations    5.334M/s
      Iter   4 (-77.691us): hits    5.275M/s (  5.275M/prod), drops 0.000M/s, total operations    5.275M/s
      Iter   5 (-164.510us): hits    5.313M/s (  5.313M/prod), drops 0.000M/s, total operations    5.313M/s
      Iter   6 (-81.376us): hits    5.346M/s (  5.346M/prod), drops 0.000M/s, total operations    5.346M/s
      Summary: hits    5.326 ± 0.029M/s (  5.326M/prod), drops    0.000 ±0.000M/s, total operations    5.326 ± 0.029M/s
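
      The relative change can be checked from the two Summary lines; the
      short calculation below uses only those reported means (the variable
      names are illustrative).

```python
before = 5.092  # M/s, mean from the "Before" summary line
after = 5.326   # M/s, mean from the "After" summary line

# Relative throughput gain, in percent.
speedup_pct = (after - before) / before * 100
print(f"{speedup_pct:.1f}%")  # 4.6%
```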
      Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
      Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
      Link: https://lore.kernel.org/r/20240613211817.1551967-5-vadfed@meta.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>