1. 06 Oct, 2021 27 commits
    • libbpf: Add API documentation convention guidelines · 93303034
      Grant Seltzer authored
      This adds a section to the libbpf naming convention
      documentation that describes how to document API
      features in libbpf, specifically the format to which
      API doc comments need to conform.
      Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20211004215644.497327-1-grantseltzer@gmail.com
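
      As an illustration of the kind of doc comment such a convention
      covers, here is a minimal sketch. It assumes Doxygen-style
      @brief/@param/@return tags placed directly above the definition;
      the function demo__sum() is hypothetical and not part of libbpf.

```c
#include <stddef.h>

/**
 * @brief **demo__sum()** adds up the elements of an integer array
 * @param vals pointer to the first element; may be NULL only if *cnt* is 0
 * @param cnt number of elements in *vals*
 * @return sum of all *cnt* elements, or 0 for an empty array
 */
long demo__sum(const int *vals, size_t cnt)
{
	long sum = 0;

	for (size_t i = 0; i < cnt; i++)
		sum += vals[i];
	return sum;
}
```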
    • selftest/bpf: Switch recursion test to use htab_map_delete_elem · 189c83bd
      Jiri Olsa authored
      Currently the recursion test hooks the __htab_map_lookup_elem
      function, which is invoked from both bpf_prog and the bpf syscall.
      
      But in our kernel build, __htab_map_lookup_elem gets inlined
      within htab_map_lookup_elem, so it's not triggered and the
      test fails.
      
      Fix this by using htab_map_delete_elem, which is not inlined
      for bpf_prog calls (like htab_map_lookup_elem is) and is used
      directly as the pointer for map_delete_elem, so it won't disappear
      through inlining.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/YVnfFTL/3T6jOwHI@krava
    • bpf: Use $(pound) instead of \# in Makefiles · 929bef46
      Quentin Monnet authored
      Recent-ish versions of make no longer consider number signs ("#") as
      comment symbols when they are inserted inside of a macro reference or in
      a function invocation. In such cases, the symbols should not be escaped.
      
      There are a few occurrences of "\#" in libbpf's and samples' Makefiles.
      In the former, the backslash is harmless, because grep associates no
      particular meaning to the escaped symbol and reads it as a regular "#".
      In samples' Makefile, recent versions of make will pass the backslash
      down to the compiler, making the probe fail all the time and resulting
      in the display of a warning about "make headers_install" being required,
      even after headers have been installed.
      
      A similar issue has been addressed at some other locations by commit
      9564a8cf ("Kbuild: fix # escaping in .cmd files for future Make").
      Let's address it for libbpf's and samples' Makefiles in the same
      fashion, by using a "$(pound)" variable (pulled from
      tools/scripts/Makefile.include for libbpf, or re-defined for the
      samples).
      
      Reference for the change in make:
      https://git.savannah.gnu.org/cgit/make.git/commit/?id=c6966b323811c37acedff05b57
      
      Fixes: 2f383041 ("libbpf: Make libbpf_version.h non-auto-generated")
      Fixes: 07c3bbdb ("samples: bpf: print a warning about headers_install")
      Signed-off-by: Quentin Monnet <quentin@isovalent.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20211006111049.20708-1-quentin@isovalent.com
    • bpf, arm: Remove dummy bpf_jit_compile stub · 90982e13
      Daniel Borkmann authored
      The BPF core defines a __weak bpf_jit_compile() dummy function already
      which should only be overridden by JITs if they actually implement a
      legacy cBPF JIT. Given arm implements an eBPF JIT, this stub is not
      needed.
      
      Now that MIPS cBPF JIT is finally gone, the only JIT left that is still
      implementing bpf_jit_compile() is the sparc32 one.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    • Merge branch 'bpf-mips-jit' · f438ee21
      Daniel Borkmann authored
      Johan Almbladh says:
      
      ====================
      This is an implementation of an eBPF JIT for MIPS I-V and MIPS32/64 r1-r6.
      The new JIT is written from scratch, but uses the same overall structure
      as other eBPF JITs.
      
      Before, the MIPS JIT situation looked like this.
      
        - 32-bit: MIPS32, cBPF-only, tests fail
        - 64-bit: MIPS64r2-r6, eBPF, tests fail, incomplete eBPF ISA support
      
      The new JIT implementation raises the bar to the following level.
      
        - 32/64-bit: all MIPS ISA, eBPF, all tests pass, full eBPF ISA support
      
      Overview
      --------
      The implementation supports all 32-bit and 64-bit eBPF instructions
      defined as of this writing, including the recently-added atomics. It is
      intended to provide good performance for native word size operations,
      while also being complete so the JIT never has to fall back to the
      interpreter. The new JIT replaces the current cBPF and eBPF JITs for MIPS.
      
      The implementation is divided into separate files as follows. The source
      files contain comments describing internal mechanisms and details on
      things like eBPF-to-CPU register mappings, so I won't repeat that here.
      
        - jit_comp.[ch]    code shared between 32-bit and 64-bit JITs
        - jit_comp32.c     32-bit JIT implementation
        - jit_comp64.c     64-bit JIT implementation
      
      Both the 32-bit and 64-bit versions map all eBPF registers to native MIPS
      CPU registers. There are also enough unmapped CPU registers available to
      allow all eBPF operations implemented natively by the JIT to use only CPU
      registers without having to resort to stack scratch space.
      
      Some operations are deemed too complex to implement natively in the JIT.
      Those are instead implemented as a function call to a helper that performs
      the operation. This is done in the following cases.
      
        - 64-bit div and mod on a 32-bit CPU
        - 64-bit atomics on a 32-bit CPU
        - 32-bit atomics on a 32-bit CPU that lacks ll/sc instructions
      
      CPU errata workarounds
      ----------------------
      The JIT implements workarounds for R10000, Loongson-2F and Loongson-3 CPU
      errata. For the Loongson workarounds, I have used the public information
      available on the matter.
      
      Link: https://sourceware.org/legacy-ml/binutils/2009-11/msg00387.html
      
      Testing
      -------
      During the development of the JIT, I have added a number of new test cases
      to the test_bpf.ko test suite to be able to verify correctness of JIT
      implementations in a more systematic way. The new additions increase the
      test suite roughly three-fold, with many of the new tests being very
      extensive and even exhaustive when feasible.
      
      Link: https://lore.kernel.org/bpf/20211001130348.3670534-1-johan.almbladh@anyfinetworks.com/
      Link: https://lore.kernel.org/bpf/20210914091842.4186267-1-johan.almbladh@anyfinetworks.com/
      Link: https://lore.kernel.org/bpf/20210809091829.810076-1-johan.almbladh@anyfinetworks.com/
      
      The JIT has been tested by running the test_bpf.ko test suite in QEMU with
      the following MIPS ISAs, in both big and little endian mode, with and
      without JIT hardening enabled.
      
        MIPS32r2, MIPS32r6, MIPS64r2, MIPS64r6
      
      For the QEMU r2 targets, the correctness of pre-r2 code emitted has been
      tested by manually overriding each of the following macros with 0.
      
        cpu_has_llsc, cpu_has_mips_2, cpu_has_mips_r1, cpu_has_mips_r2
      
      Similarly, CPU errata workaround code has been tested by enabling
      each of the following configurations for the MIPS64r2 targets.
      
        CONFIG_WAR_R10000
        CONFIG_CPU_LOONGSON3_WORKAROUNDS
        CONFIG_CPU_NOP_WORKAROUNDS
        CONFIG_CPU_JUMP_WORKAROUNDS
      
      The JIT passes all tests in all configurations. Below is the summary for
      MIPS32r2 in little endian mode.
      
        test_bpf: Summary: 1006 PASSED, 0 FAILED, [994/994 JIT'ed]
        test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [8/8 JIT'ed]
        test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED
      
      According to MIPS ISA reference documentation, the result of a 32-bit ALU
      arithmetic operation on a 64-bit CPU is unpredictable if an operand
      register value is not properly sign-extended to 64 bits. To verify the
      code emitted by the JIT, the code generation engine in QEMU was modified to
      flip all low 32 bits if the above condition was not met. With this
      trip-wire installed, the kernel booted properly in qemu-system-mips64el
      and all test_bpf.ko tests passed.
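
      The condition behind that trip-wire can be sketched in plain C. This
      is only a model of the check described above, not the actual QEMU
      change; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the 64-bit register value is a sign-extended 32-bit value,
 * i.e. bits 63..32 are all copies of bit 31. */
static bool is_sign_extended32(uint64_t reg)
{
	return (uint64_t)(int64_t)(int32_t)(uint32_t)reg == reg;
}

/* Trip-wire model: pass the result through when the operand is well
 * formed, otherwise flip the low 32 bits so the error shows up in tests. */
static uint64_t tripwire(uint64_t operand, uint64_t result)
{
	if (is_sign_extended32(operand))
		return result;
	return result ^ 0xffffffffULL;
}
```

      A value such as 0x0000000080000000 fails the check because bit 31 is
      set while the upper 32 bits are zero.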
      
      Remaining features
      ------------------
      While the JIT is complete in terms of eBPF ISA support, this series does
      not include support for BPF-to-BPF calls and BPF trampolines. Those
      features are planned to be added in another patch series.
      
      The BPF_ST | BPF_NOSPEC instruction currently emits nothing. This is
      consistent with the behavior of the MIPS interpreter and the existing
      eBPF JIT.
      
      Why not build on the existing eBPF JIT?
      ---------------------------------------
      The existing eBPF JIT was originally written for MIPS64. An effort was
      made to add MIPS32 support to it in commit 716850ab ("MIPS: eBPF:
      Initial eBPF support for MIPS32 architecture."). That turned out to
      contain a number of flaws, so eBPF support for MIPS32 was disabled in
      commit 36366e36 ("MIPS: BPF: Restore MIPS32 cBPF JIT").
      
      Link: https://lore.kernel.org/bpf/5deaa994.1c69fb81.97561.647e@mx.google.com/
      
      The current eBPF JIT for MIPS64 lacks a lot of functionality regarding
      ALU32, JMP32 and atomic operations. It also lacks 32-bit CPU support on a
      fundamental level, for example 32-bit CPU register mappings and o32 ABI
      calling conventions. For optimization purposes, it tracks register usage
      through the program control flow in order to do zero-extension and sign-
      extension only when necessary, a static analysis of sorts. In my opinion,
      having this kind of complexity in JITs, and for which there is not
      adequate test coverage, is a problem. Such analysis should be done by the
      verifier, if needed at all. Finally, when I run the BPF test suite
      test_bpf.ko on the current JIT, there are errors and warnings.
      
      I believe that an eBPF JIT should strive to be correct, complete and
      optimized, and in that order. The JIT runs after the verifier has audited
      the program and given its approval. If the JIT then emits code that does
      something else, it will undermine the eBPF security model. A simple
      implementation is easier to get correct than a complex one. Furthermore,
      the real performance hit is not an extra CPU instruction here and there,
      but when the JIT bails on an unimplemented eBPF instruction and causes the
      whole program to fall back to the interpreter. My reasoning here boils
      down to the following.
      
      * The JIT should not contain a static analyzer that tracks branches.
      
      * It is acceptable to emit possibly superfluous sign-/zero-extensions for
        ALU32 and JMP32 operations on a 64-bit MIPS to guarantee correctness.
      
      * The JIT should handle all eBPF instructions on all MIPS CPUs.
      
      I conclude that the current eBPF MIPS JIT is complex, incomplete and
      incorrect. For the reasons stated above, I decided to not use the existing
      JIT implementation.
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    • mips, bpf: Remove old BPF JIT implementations · ebcbacfa
      Johan Almbladh authored
      This patch removes the old 32-bit cBPF and 64-bit eBPF JIT implementations.
      They are replaced by a new eBPF implementation that supports both 32-bit
      and 64-bit MIPS CPUs.
      Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20211005165408.2305108-8-johan.almbladh@anyfinetworks.com
    • mips, bpf: Enable eBPF JITs · 01bdc58e
      Johan Almbladh authored
      This patch enables the new eBPF JITs for 32-bit and 64-bit MIPS. It also
      disables the old cBPF JIT so that cBPF programs are converted to use the
      new JIT.
      
      Workarounds for R4000 CPU errata are not implemented by the JIT, so the
      JIT is disabled if any of those workarounds are configured.
      Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20211005165408.2305108-7-johan.almbladh@anyfinetworks.com
    • mips, bpf: Add JIT workarounds for CPU errata · 72570224
      Johan Almbladh authored
      This patch adds workarounds for the following CPU errata to the MIPS
      eBPF JIT, if enabled in the kernel configuration.
      
        - R10000 ll/sc weak ordering
        - Loongson-3 ll/sc weak ordering
        - Loongson-2F jump hang
      
      The Loongson-2F nop errata is implemented in uasm, which the JIT uses,
      so no additional mitigations are needed for that.
      Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Link: https://lore.kernel.org/bpf/20211005165408.2305108-6-johan.almbladh@anyfinetworks.com
    • mips, bpf: Add new eBPF JIT for 64-bit MIPS · fbc802de
      Johan Almbladh authored
      This is an implementation of an eBPF JIT for 64-bit MIPS III-V and
      MIPS64r1-r6. It uses the same framework introduced by the 32-bit JIT.
      Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20211005165408.2305108-5-johan.almbladh@anyfinetworks.com
    • mips, bpf: Add eBPF JIT for 32-bit MIPS · eb63cfcd
      Johan Almbladh authored
      This is an implementation of an eBPF JIT for 32-bit MIPS I-V and MIPS32.
      The implementation supports all 32-bit and 64-bit ALU and JMP operations,
      including the recently-added atomics. 64-bit div/mod and 64-bit atomics
      are implemented using function calls to math64 and atomic64 functions,
      respectively. All 32-bit operations are implemented natively by the JIT,
      except if the CPU lacks ll/sc instructions.
      
      Register mapping
      ================
      All 64-bit eBPF registers are mapped to native 32-bit MIPS register pairs,
      and the JIT does not use any stack scratch space for register swapping. This means
      that all eBPF register data is kept in CPU registers all the time, and
      this simplifies the register management a lot. It also reduces the JIT's
      pressure on temporary registers since we do not have to move data around.
      
      Native register pairs are ordered according to CPU endianness, following
      the O32 calling convention for passing 64-bit arguments and return values.
      The eBPF return value, arguments and callee-saved registers are mapped to
      their native MIPS equivalents.
      
      Since the 32 highest bits in the eBPF FP (frame pointer) register are
      always zero, only one general-purpose register is actually needed for the
      mapping. The MIPS fp register is used for this purpose. The high bits are
      mapped to MIPS register r0. This saves us one CPU register, which is much
      needed for temporaries, while still allowing us to treat the R10 (FP)
      register just like any other eBPF register in the JIT.
      
      The MIPS gp (global pointer) and at (assembler temporary) registers are
      used as internal temporary registers for constant blinding. CPU registers
      t6-t9 are used internally by the JIT when constructing more complex 64-bit
      operations. This is precisely what is needed - two registers to store an
      operand value, and two more as scratch registers when performing the
      operation.
      
      The register mapping is shown below.
      
          R0 - $v1, $v0   return value
          R1 - $a1, $a0   argument 1, passed in registers
          R2 - $a3, $a2   argument 2, passed in registers
          R3 - $t1, $t0   argument 3, passed on stack
          R4 - $t3, $t2   argument 4, passed on stack
          R5 - $t5, $t4   argument 5, passed on stack
          R6 - $s1, $s0   callee-saved
          R7 - $s3, $s2   callee-saved
          R8 - $s5, $s4   callee-saved
          R9 - $s7, $s6   callee-saved
          FP - $r0, $fp   32-bit frame pointer
          AX - $gp, $at   constant-blinding
               $t6 - $t9  unallocated, JIT temporaries
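
      The table above can be written down as a lookup structure. This is a
      sketch only, not the JIT's actual source: it assumes a little-endian
      CPU, where the first register of each listed pair holds the high
      word, and it assumes R5 occupies $t5/$t4. The enum and function
      names are hypothetical.

```c
#include <stdint.h>

/* MIPS general-purpose register numbers under their usual o32 names. */
enum mips_reg {
	MIPS_ZERO = 0, MIPS_AT = 1, MIPS_V0 = 2, MIPS_V1 = 3,
	MIPS_A0 = 4, MIPS_A1 = 5, MIPS_A2 = 6, MIPS_A3 = 7,
	MIPS_T0 = 8, MIPS_T1 = 9, MIPS_T2 = 10, MIPS_T3 = 11,
	MIPS_T4 = 12, MIPS_T5 = 13,
	MIPS_S0 = 16, MIPS_S1 = 17, MIPS_S2 = 18, MIPS_S3 = 19,
	MIPS_S4 = 20, MIPS_S5 = 21, MIPS_S6 = 22, MIPS_S7 = 23,
	MIPS_GP = 28, MIPS_FP = 30,
};

/* eBPF registers in the order of the table (hypothetical names). */
enum ebpf_reg {
	EBPF_R0, EBPF_R1, EBPF_R2, EBPF_R3, EBPF_R4, EBPF_R5,
	EBPF_R6, EBPF_R7, EBPF_R8, EBPF_R9, EBPF_FP, EBPF_AX,
	EBPF_NREGS,
};

/* One 64-bit eBPF register split over two 32-bit CPU registers. */
struct reg_pair {
	uint8_t hi;
	uint8_t lo;
};

static const struct reg_pair ebpf2mips[EBPF_NREGS] = {
	[EBPF_R0] = { MIPS_V1, MIPS_V0 },   /* return value */
	[EBPF_R1] = { MIPS_A1, MIPS_A0 },   /* arguments in registers */
	[EBPF_R2] = { MIPS_A3, MIPS_A2 },
	[EBPF_R3] = { MIPS_T1, MIPS_T0 },   /* arguments on stack */
	[EBPF_R4] = { MIPS_T3, MIPS_T2 },
	[EBPF_R5] = { MIPS_T5, MIPS_T4 },
	[EBPF_R6] = { MIPS_S1, MIPS_S0 },   /* callee-saved */
	[EBPF_R7] = { MIPS_S3, MIPS_S2 },
	[EBPF_R8] = { MIPS_S5, MIPS_S4 },
	[EBPF_R9] = { MIPS_S7, MIPS_S6 },
	[EBPF_FP] = { MIPS_ZERO, MIPS_FP }, /* high word is always zero */
	[EBPF_AX] = { MIPS_GP, MIPS_AT },   /* constant blinding */
};

static uint8_t hi_reg(enum ebpf_reg r) { return ebpf2mips[r].hi; }
static uint8_t lo_reg(enum ebpf_reg r) { return ebpf2mips[r].lo; }
```

      Mapping the high half of FP to the hardwired zero register is what
      lets FP go through the same pair-based code paths as every other
      eBPF register.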
      
      Jump offsets
      ============
      The JIT tries to map all conditional JMP operations to MIPS conditional
      PC-relative branches. The MIPS branch offset field is 18 bits, in bytes,
      which is equivalent to the eBPF 16-bit instruction offset. However, since
      the JIT may emit more than one CPU instruction per eBPF instruction, the
      field width may overflow. If that happens, the JIT converts the long
      conditional jump to a short PC-relative branch with the condition
      inverted, jumping over a long unconditional absolute jmp (j).
      
      This conversion will change the instruction offset mapping used for jumps,
      and may in turn result in more branch offset overflows. The JIT therefore
      dry-runs the translation until no more branches are converted and the
      offsets do not change anymore. There is an upper bound on this of course,
      and if the JIT hits that limit, the last two iterations are run with all
      branches being converted.
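
      The range arithmetic behind the overflow can be sketched as follows.
      This is an illustration only; the helper names are hypothetical, and
      the uniform per-instruction expansion factor is a simplifying
      assumption, since the real expansion varies by instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* A MIPS conditional branch encodes a signed 16-bit offset counted in
 * 4-byte instructions, i.e. an 18-bit signed byte offset. */
static bool fits_branch(int32_t byte_off)
{
	return (byte_off & 3) == 0 &&
	       byte_off >= -(1 << 17) && byte_off < (1 << 17);
}

/* Native byte distance for an eBPF jump over `insns` eBPF instructions,
 * assuming each one expands to `per_insn` MIPS instructions. */
static int32_t native_off(int16_t insns, int per_insn)
{
	return (int32_t)insns * per_insn * 4;
}
```

      With a one-to-one expansion every 16-bit eBPF offset fits, while an
      expansion factor of two already pushes the largest offsets out of
      range, which is when the inverted-branch-plus-jump form is needed.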
      
      Tail call count
      ===============
      The current tail call count is stored in the 16-byte area of the caller's
      stack frame that is reserved for the callee in the o32 ABI. The value is
      initialized in the prologue, and propagated to the tail-callee by skipping
      the initialization instructions when emitting the tail call.
      Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20211005165408.2305108-4-johan.almbladh@anyfinetworks.com
    • mips, uasm: Add workaround for Loongson-2F nop CPU errata · f7c036c1
      Johan Almbladh authored
      This patch implements a workaround for the Loongson-2F nop errata in generated
      code, if the existing option CONFIG_CPU_NOP_WORKAROUNDS is set. Before,
      the binutils option -mfix-loongson2f-nop was enabled, but no workaround
      was done when emitting MIPS code. Now, the nop pseudo instruction is
      emitted as "or at,at,zero" instead of the default "sll zero,zero,0". This
      is consistent with the workaround implemented by binutils.
      Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Link: https://sourceware.org/legacy-ml/binutils/2009-11/msg00387.html
      Link: https://lore.kernel.org/bpf/20211005165408.2305108-3-johan.almbladh@anyfinetworks.com
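
      To make the two nop forms concrete, the sketch below encodes both as
      MIPS R-type instruction words. It assumes the replacement is the
      binutils form "or at,at,zero" ($at is register 1); the helper names
      are hypothetical and this is not the uasm code itself.

```c
#include <stdint.h>

enum { REG_ZERO = 0, REG_AT = 1, FUNCT_SLL = 0x00, FUNCT_OR = 0x25 };

/* Encode a MIPS R-type instruction with major opcode SPECIAL (0). */
static uint32_t mips_rtype(uint32_t rs, uint32_t rt, uint32_t rd,
			   uint32_t shamt, uint32_t funct)
{
	return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct;
}

/* Default nop: sll zero,zero,0 */
static uint32_t nop_default(void)
{
	return mips_rtype(0, REG_ZERO, REG_ZERO, 0, FUNCT_SLL);
}

/* Workaround nop: or at,at,zero (rd = at, rs = at, rt = zero) */
static uint32_t nop_loongson2f(void)
{
	return mips_rtype(REG_AT, REG_ZERO, REG_AT, 0, FUNCT_OR);
}
```

      The default form is the all-zero word, while the workaround form
      encodes to 0x00200825.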
    • mips, uasm: Enable muhu opcode for MIPS R6 · e737547e
      Tony Ambardar authored
      Enable the 'muhu' instruction, complementing the existing 'mulu', needed
      to implement a MIPS32 BPF JIT.
      
      Also fix a typo in the existing definition of 'dmulu'.
      Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
      Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20211005165408.2305108-2-johan.almbladh@anyfinetworks.com
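
      A plausible reason a 32-bit JIT wants both instructions is that a
      64-bit multiply has to be assembled from 32-bit partial products.
      The sketch below models mulu and muhu as the low and high halves of
      an unsigned 32x32 multiply and combines them; it is an illustration
      with hypothetical function names, not code from the JIT.

```c
#include <stdint.h>

/* Low 32 bits of an unsigned 32x32 multiply (models R6 `mulu`). */
static uint32_t mulu32(uint32_t a, uint32_t b)
{
	return (uint32_t)((uint64_t)a * b);
}

/* High 32 bits of an unsigned 32x32 multiply (models R6 `muhu`). */
static uint32_t muhu32(uint32_t a, uint32_t b)
{
	return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Low 64 bits of a 64x64 multiply, built only from 32-bit products. */
static uint64_t mul64_from_halves(uint64_t x, uint64_t y)
{
	uint32_t xl = (uint32_t)x, xh = (uint32_t)(x >> 32);
	uint32_t yl = (uint32_t)y, yh = (uint32_t)(y >> 32);
	uint32_t lo = mulu32(xl, yl);
	/* Carry out of the low product plus the two cross terms. */
	uint32_t hi = muhu32(xl, yl) + mulu32(xl, yh) + mulu32(xh, yl);

	return ((uint64_t)hi << 32) | lo;
}
```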
    • selftests/bpf: Test new btf__add_btf() API · 9d057872
      Andrii Nakryiko authored
      Add a test that validates that btf__add_btf() API is correctly copying
      all the types from the source BTF into destination BTF object and
      adjusts type IDs and string offsets properly.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20211006051107.17921-4-andrii@kernel.org
    • selftests/bpf: Refactor btf_write selftest to reuse BTF generation logic · c65eb808
      Andrii Nakryiko authored
      The next patch will need to reuse the BTF generation logic, which tests
      every supported BTF kind, for testing the btf__add_btf() API. So
      restructure the existing selftest and make it a single subtest that uses
      the bulk VALIDATE_RAW_BTF() macro for raw BTF dump checking.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20211006051107.17921-3-andrii@kernel.org
    • libbpf: Add API that copies all BTF types from one BTF object to another · 7ca61121
      Andrii Nakryiko authored
      Add a bulk copying API, btf__add_btf(), that speeds up and simplifies
      appending entire contents of one BTF object to another one, taking care
      of copying BTF type data, adjusting resulting BTF type IDs according to
      their new locations in the destination BTF object, as well as copying
      and deduplicating all the referenced strings and updating all the string
      offsets in new BTF types as appropriate.
      
      This API is intended to be used from tools that are generating and
      otherwise manipulating BTFs generically, such as pahole. In pahole's
      case, this API is useful for speeding up parallelized BTF encoding, as
      it allows pahole to offload all the intricacies of BTF type copying to
      libbpf and handle the parallelization aspects of the process.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <songliubraving@fb.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Link: https://lore.kernel.org/bpf/20211006051107.17921-2-andrii@kernel.org
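
      The type ID adjustment can be pictured with a simplified model. It
      assumes that ID 0 (void) is shared by both objects and that copied
      types are appended right after the destination's existing types;
      remap_type_id() is a hypothetical helper and not part of libbpf.

```c
#include <stdint.h>

/*
 * Map a type ID from the source BTF to its ID in the destination.
 * dst_type_cnt is the number of type IDs the destination held before
 * the copy, counting ID 0 (void).
 */
static uint32_t remap_type_id(uint32_t src_id, uint32_t dst_type_cnt)
{
	if (src_id == 0)
		return 0;	/* void stays void */
	return src_id + dst_type_cnt - 1;
}
```

      Under this model, every type reference inside a copied type is
      shifted by the same amount, so only references to void need special
      handling.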
    • bpf, x64: Save bytes for DIV by reducing reg copies · 57a610f1
      Jie Meng authored
      Instead of unconditionally performing push/pop on %rax/%rdx in case of
      division/modulo, we can save a few bytes in case of destination register
      being either BPF r0 (%rax) or r3 (%rdx) since the result is written in
      there anyway.
      
      Also, we do not need to copy the source to %r11 unless the source is either
      %rax, %rdx or an immediate.
      
      For example, before the patch:
      
        22:   push   %rax
        23:   push   %rdx
        24:   mov    %rsi,%r11
        27:   xor    %edx,%edx
        29:   div    %r11
        2c:   mov    %rax,%r11
        2f:   pop    %rdx
        30:   pop    %rax
        31:   mov    %r11,%rax
      
      After:
      
        22:   push   %rdx
        23:   xor    %edx,%edx
        25:   div    %rsi
        28:   pop    %rdx
      Signed-off-by: Jie Meng <jmeng@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Tested-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20211002035626.2041910-1-jmeng@fb.com
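
      The decisions described in this commit message can be summarized as
      a small planning function. This is a sketch of the stated rules
      only, with hypothetical names; it does not emit machine code and is
      not the JIT's implementation.

```c
#include <stdbool.h>

/* A few x86-64 registers, enough for the illustration. */
enum x86_reg { RAX, RDX, RSI, RDI, RCX, R11 };

struct div_plan {
	bool push_rax;		/* save/restore %rax around the div */
	bool push_rdx;		/* save/restore %rdx around the div */
	bool copy_src_to_r11;	/* divisor must be moved into %r11 first */
};

/* `src` is ignored when the divisor is an immediate. */
static struct div_plan plan_div(enum x86_reg dst, enum x86_reg src,
				bool src_is_imm)
{
	struct div_plan p;

	/* div clobbers %rax and %rdx; skip saving the one that is the
	 * destination, because the result is written there anyway. */
	p.push_rax = dst != RAX;
	p.push_rdx = dst != RDX;
	/* The divisor only needs a copy if div would clobber it or if
	 * it is an immediate that has no register yet. */
	p.copy_src_to_r11 = src_is_imm || src == RAX || src == RDX;
	return p;
}
```

      For the listing above (destination %rax, source %rsi) the plan keeps
      only the push/pop of %rdx, matching the four-instruction "After"
      sequence.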
    • bpf: Avoid retpoline for bpf_for_each_map_elem · 0640c77c
      Andrey Ignatov authored
      Similarly to 09772d92 ("bpf: avoid retpoline for
      lookup/update/delete calls on maps") and 84430d42 ("bpf, verifier:
      avoid retpoline for map push/pop/peek operation") avoid indirect call
      while calling bpf_for_each_map_elem.
      
      Before (a program fragment):
      
        ; if (rules_map) {
         142: (15) if r4 == 0x0 goto pc+8
         143: (bf) r3 = r10
        ; bpf_for_each_map_elem(rules_map, process_each_rule, &ctx, 0);
         144: (07) r3 += -24
         145: (bf) r1 = r4
         146: (18) r2 = subprog[+5]
         148: (b7) r4 = 0
         149: (85) call bpf_for_each_map_elem#143680  <-- indirect call via
                                                          helper
      
      After (same program fragment):
      
         ; if (rules_map) {
          142: (15) if r4 == 0x0 goto pc+8
          143: (bf) r3 = r10
         ; bpf_for_each_map_elem(rules_map, process_each_rule, &ctx, 0);
          144: (07) r3 += -24
          145: (bf) r1 = r4
          146: (18) r2 = subprog[+5]
          148: (b7) r4 = 0
          149: (85) call bpf_for_each_array_elem#170336  <-- direct call
      
      On a benchmark that calls bpf_for_each_map_elem() once and does many
      other things (mostly checking fields in skb) with CONFIG_RETPOLINE=y,
      this makes the program faster.
      
      Before:
      
        ============================================================================
        Benchmark.cpp                                              time/iter iters/s
        ============================================================================
        IngressMatchByRemoteEndpoint                                80.78ns 12.38M
        IngressMatchByRemoteIP                                      80.66ns 12.40M
        IngressMatchByRemotePort                                    80.87ns 12.37M
      
      After:
      
        ============================================================================
        Benchmark.cpp                                              time/iter iters/s
        ============================================================================
        IngressMatchByRemoteEndpoint                                73.49ns 13.61M
        IngressMatchByRemoteIP                                      71.48ns 13.99M
        IngressMatchByRemotePort                                    70.39ns 14.21M
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211006001838.75607-1-rdna@fb.com
    • Merge branch 'Support kernel module function calls from eBPF' · 32a16f6b
      Alexei Starovoitov authored
      Kumar Kartikeya says:
      
      ====================
      
      This set enables kernel module function calls, and also modifies verifier logic
      to permit invalid kernel function calls as long as they are pruned as part of
      dead code elimination. This is done to provide better runtime portability for
      BPF objects, which can conditionally disable parts of code that are pruned later
      by the verifier (e.g. const volatile vars, kconfig options). libbpf
      modifications are made along with kernel changes to support module function
      calls.
      
      It also converts TCP congestion control objects to use the module kfunc support
      instead of relying on IS_BUILTIN ifdef.
      
      Changelog:
      ----------
      v6 -> v7
      v6: https://lore.kernel.org/bpf/20210930062948.1843919-1-memxor@gmail.com
      
       * Let __bpf_check_kfunc_call take kfunc_btf_id_list instead of generating
         callbacks (Andrii)
       * Rename it to bpf_check_mod_kfunc_call to reflect usage
       * Remove OOM checks (Alexei)
       * Remove resolve_btfids invocation for bpf_testmod (Andrii)
       * Move fd_array_cnt initialization near fd_array alloc (Andrii)
       * Rename helper to btf_find_by_name_kind and pass start_id (Andrii)
       * memset when data is NULL in add_data (Alexei)
       * Fix other nits
      
      v5 -> v6
      v5: https://lore.kernel.org/bpf/20210927145941.1383001-1-memxor@gmail.com
      
       * Rework gen_loader relocation emits
         * Only emit bpf_btf_find_by_name_kind call when required (Alexei)
         * Refactor code to emit ksym var and func relo into separate helpers, this
           will be easier to add future weak/typeless ksym support to (for my followup)
         * Count references for both ksym var and funcs, and avoid calling helpers
           unless required for both of them. This also means we share fds between
           ksym vars for the module BTFs. Also be careful with this when closing
           BTF fd so that we only close one instance of the fd for each ksym
      
      v4 -> v5
      v4: https://lore.kernel.org/bpf/20210920141526.3940002-1-memxor@gmail.com
      
       * Address comments from Alexei
         * Use reserved fd_array area in loader map instead of creating a new map
         * Drop selftest testing the 256 kfunc limit, however selftest testing reuse
           of BTF fd for same kfunc in gen_loader and libbpf is kept
       * Address comments from Andrii
         * Make --no-fail the default for resolve_btfids, i.e. only fail if we find
           BTF section and cannot process it
         * Use obj->btf_modules array to store index in the fd_array, so that we don't
           have to do any searching to reuse the index, instead only set it the first
           time a module BTF's fd is used
         * Make find_ksym_btf_id to return struct module_btf * in last parameter
         * Improve logging when index becomes bigger than INT16_MAX
         * Add btf__find_by_name_kind_own internal helper to only start searching for
           kfunc ID in module BTF, since find_ksym_btf_id already checks vmlinux BTF
           before iterating over module BTFs.
         * Fix various other nits
       * Fixes for failing selftests on BPF CI
       * Rearrange/cleanup selftests
         * Avoid testing kfunc limit (Alexei)
         * Do test gen_loader and libbpf BTF fd index dedup with 256 calls
         * Move invalid kfunc failure test to verifier selftest
         * Minimize duplication
       * Use consistent bpf_<type>_check_kfunc_call naming for module kfunc callback
       * Since we try to add fd using add_data while we can, cherry pick Alexei's
         patch from CO-RE RFC series to align gen_loader data.
      
      v3 -> v4
      v3: https://lore.kernel.org/bpf/20210915050943.679062-1-memxor@gmail.com
      
       * Address comments from Alexei
         * Drop MAX_BPF_STACK change, instead move map_fd and BTF fd to BPF array map
           and pass fd_array using BPF_PSEUDO_MAP_IDX_VALUE
       * Address comments from Andrii
         * Fix selftest to store to variable for observing function call instead of
           printk and polluting CI logs
       * Drop use of raw_tp for testing, instead reuse classifier based prog_test_run
       * Drop index + 1 based insn->off convention for kfunc module calls
       * Expand selftests to cover more corner cases
       * Misc cleanups
      
      v2 -> v3
      v2: https://lore.kernel.org/bpf/20210914123750.460750-1-memxor@gmail.com
      
       * Fix issues pointed out by Kernel Test Robot
       * Fix find_kfunc_desc to also take offset into consideration when comparing
      
      RFC v1 -> v2
      v1: https://lore.kernel.org/bpf/20210830173424.1385796-1-memxor@gmail.com
      
       * Address comments from Alexei
         * Reuse fd_array instead of introducing kfunc_btf_fds array
         * Take btf and module reference as needed, instead of preloading
         * Add BTF_KIND_FUNC relocation support to gen_loader infrastructure
       * Address comments from Andrii
         * Drop hashmap in libbpf for finding index of existing BTF in fd_array
         * Preserve invalid kfunc calls only when the symbol is weak
       * Adjust verifier selftests
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      32a16f6b
    • Kumar Kartikeya Dwivedi's avatar
      bpf: selftests: Add selftests for module kfunc support · c48e51c8
      Kumar Kartikeya Dwivedi authored
      This adds selftests that test the success and failure paths for module
      kfuncs (in the presence of invalid kfunc calls) for both libbpf and
      gen_loader. It also adds a prog_test kfunc_btf_id_list so that we can
      add module BTF ID set from bpf_testmod.
      
      This also adds a couple of test cases to the verifier selftests that
      validate whether we get an error, depending on whether an invalid kfunc
      call remains after elimination of unreachable instructions.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211002011757.311265-10-memxor@gmail.com
      c48e51c8
    • Kumar Kartikeya Dwivedi's avatar
      libbpf: Update gen_loader to emit BTF_KIND_FUNC relocations · 18f4fccb
      Kumar Kartikeya Dwivedi authored
      This change updates the BPF syscall loader to relocate BTF_KIND_FUNC
      relocations, with support for weak kfunc relocations. The general idea
      is to move map_fds to loader map, and also use the data for storing
      kfunc BTF fds. Since both reuse the fd_array parameter, they need to be
      kept together.
      
      For map_fds, we reserve MAX_USED_MAPS slots in a region, and for kfunc,
      we reserve MAX_KFUNC_DESCS. This is done so that insn->off has more
      chances of being <= INT16_MAX than treating data map as a sparse array
      and adding fd as needed.
      
      When the MAX_KFUNC_DESCS limit is reached, we fall back to the sparse
      array model, so that as long as it does remain <= INT16_MAX, we pass an
      index relative to the start of fd_array.
      
      We store all ksyms in an array where we try to avoid calling the
      bpf_btf_find_by_name_kind helper, and also reuse the BTF fd that was
      already stored. This also speeds up the loading process compared to
      emitting calls in all cases, in later tests.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211002011757.311265-9-memxor@gmail.com
      18f4fccb
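The reserved-region layout described above can be sketched as plain index arithmetic. This is only an illustration, not the gen_loader code: `struct fd_layout`, `add_map_fd` and `add_kfunc_btf_fd` are hypothetical names, the values 64 and 256 for the two limits are assumptions, and the fallback is simplified to appending directly after the reserved regions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed limits; the commit message only names the constants. */
#define MAX_USED_MAPS   64
#define MAX_KFUNC_DESCS 256

/* Hypothetical bookkeeping for one fd_array kept in the loader data. */
struct fd_layout {
	int nr_maps;      /* used slots in the map fd region */
	int nr_kfunc_fds; /* used slots in the reserved kfunc BTF fd region */
	int nr_extra;     /* fds placed past both reserved regions */
};

/* Map fds occupy indices [0, MAX_USED_MAPS). */
static int add_map_fd(struct fd_layout *l)
{
	if (l->nr_maps >= MAX_USED_MAPS)
		return -1;
	return l->nr_maps++;
}

/*
 * kfunc BTF fds occupy [MAX_USED_MAPS, MAX_USED_MAPS + MAX_KFUNC_DESCS).
 * Once that region is full, fall back to slots past it. The result is an
 * index relative to the start of fd_array and must fit in insn->off.
 */
static int add_kfunc_btf_fd(struct fd_layout *l)
{
	int idx;

	if (l->nr_kfunc_fds < MAX_KFUNC_DESCS)
		idx = MAX_USED_MAPS + l->nr_kfunc_fds;
	else
		idx = MAX_USED_MAPS + MAX_KFUNC_DESCS + l->nr_extra;

	if (idx > INT16_MAX)
		return -1; /* would not fit in the signed 16-bit insn->off */

	if (l->nr_kfunc_fds < MAX_KFUNC_DESCS)
		l->nr_kfunc_fds++;
	else
		l->nr_extra++;
	return idx;
}
```

Keeping both regions packed at the front of the array is what keeps the indices small; a purely sparse layout would let them grow with the amount of unrelated data stored before each fd.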
    • Kumar Kartikeya Dwivedi's avatar
      libbpf: Resolve invalid weak kfunc calls with imm = 0, off = 0 · 466b2e13
      Kumar Kartikeya Dwivedi authored
      Preserve these calls, as this allows the verifier to accept the
      program if they are determined to be unreachable after dead code
      elimination during program load. If they remain reachable, the verifier
      rejects the program at load time. This is done for ext->is_weak symbols,
      similar to the case for variable ksyms.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20211002011757.311265-8-memxor@gmail.com
      466b2e13
    • Kumar Kartikeya Dwivedi's avatar
      libbpf: Support kernel module function calls · 9dbe6015
      Kumar Kartikeya Dwivedi authored
      This patch adds libbpf support for kernel module function calls.
      The fd_array parameter is used during BPF program load to pass module
      BTFs referenced by the program. insn->off is set to index into this
      array, but starts from 1, because insn->off as 0 is reserved for
      btf_vmlinux.
      
      We try to use existing insn->off for a module, since the kernel limits
      the maximum distinct module BTFs for kfuncs to 256, and also because
      index must never exceed the maximum allowed value that can fit in
      insn->off (INT16_MAX). In the future, if kernel interprets signed offset
      as unsigned for kfunc calls, this limit can be increased to UINT16_MAX.
      
      Also introduce a btf__find_by_name_kind_own helper to start searching
      from module BTF's start id when we know that the BTF ID is not present
      in vmlinux BTF (in find_ksym_btf_id).
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211002011757.311265-7-memxor@gmail.com
      9dbe6015
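A minimal sketch of the index reuse described above, assuming a much smaller array than libbpf would use. `struct obj_state`, `struct module_btf` as laid out here, `FD_ARRAY_CAP` and `module_btf_index` are hypothetical stand-ins, not libbpf's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define FD_ARRAY_CAP 8 /* tiny capacity, for illustration only */

/* Hypothetical per-module record: remembers its fd_array slot once set. */
struct module_btf {
	int fd;
	int fd_array_idx; /* 0 means "not assigned yet" */
};

/* Hypothetical per-object state holding the fd_array passed at load. */
struct obj_state {
	int fd_array[FD_ARRAY_CAP];
	int fd_array_cnt; /* next free slot */
};

/*
 * Return the fd_array index to store in insn->off for this module BTF.
 * Slot 0 is never handed out because insn->off == 0 means vmlinux BTF.
 * The index is assigned on first use and reused afterwards, so no search
 * is needed on later calls into the same module.
 */
static int module_btf_index(struct obj_state *obj, struct module_btf *mod)
{
	if (mod->fd_array_idx > 0)
		return mod->fd_array_idx;

	if (obj->fd_array_cnt == 0)
		obj->fd_array_cnt = 1; /* reserve slot 0 for vmlinux BTF */

	if (obj->fd_array_cnt >= FD_ARRAY_CAP || obj->fd_array_cnt > INT16_MAX)
		return -1; /* out of slots, or index would not fit in insn->off */

	obj->fd_array[obj->fd_array_cnt] = mod->fd;
	mod->fd_array_idx = obj->fd_array_cnt++;
	return mod->fd_array_idx;
}
```

Two kfunc calls into the same module therefore produce the same insn->off, which keeps the number of distinct module BTFs seen by the kernel as low as possible.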
    • Kumar Kartikeya Dwivedi's avatar
      bpf: Enable TCP congestion control kfunc from modules · 0e32dfc8
      Kumar Kartikeya Dwivedi authored
      This commit moves BTF ID lookup into the newly added registration
      helper, in a way that the bbr, cubic, and dctcp implementations set up
      their sets in the bpf_tcp_ca kfunc_btf_set list, while the ones not
      dependent on modules are looked up from the wrapper function.
      
      This lifts the restriction that they be compiled as built-in objects,
      so they can be loaded as modules if required. Also modify Makefile.modfinal
      to call resolve_btfids for each module.
      
      Note that since kernel kfunc_ids never overlap with module kfunc_ids, we
      only match the owner for module btf id sets.
      
      See following commits for background on use of:
      
       CONFIG_X86 ifdef:
       569c484f (bpf: Limit static tcp-cc functions in the .BTF_ids list to x86)
      
       CONFIG_DYNAMIC_FTRACE ifdef:
       7aae231a (bpf: tcp: Limit calling some tcp cc functions to CONFIG_DYNAMIC_FTRACE)
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211002011757.311265-6-memxor@gmail.com
      0e32dfc8
    • Kumar Kartikeya Dwivedi's avatar
      tools: Allow specifying base BTF file in resolve_btfids · f614f2c7
      Kumar Kartikeya Dwivedi authored
      This commit allows specifying the base BTF for resolving btf id
      lists/sets during link time in the resolve_btfids tool. The base BTF is
      set to NULL if no path is passed. This allows resolving BTF ids for
      module kernel objects.
      
      Also, drop the --no-fail option, as it is only used in case the .BTF_ids
      section is not present; instead, make no-fail the default mode. The long
      option name is the same as that of pahole.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20211002011757.311265-5-memxor@gmail.com
      f614f2c7
    • Kumar Kartikeya Dwivedi's avatar
      bpf: btf: Introduce helpers for dynamic BTF set registration · 14f267d9
      Kumar Kartikeya Dwivedi authored
      This adds helpers for registering btf_id_set from modules and the
      bpf_check_mod_kfunc_call callback that can be used to look them up.
      
      With in-kernel sets, the way this is supposed to work is that the
      in-kernel callback looks up the in-kernel kfunc whitelist first, and then
      defers to the dynamic BTF set lookup if it doesn't find the BTF id. If
      there is no in-kernel BTF id set, this callback can be used directly.
      
      Also fix includes for btf.h and bpfptr.h so that they can be included in
      isolation. This is in preparation for their usage in tcp_bbr, tcp_cubic
      and tcp_dctcp modules in the next patch.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211002011757.311265-4-memxor@gmail.com
      14f267d9
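The two-step lookup described above can be sketched as follows. All names here (`struct kfunc_id_set`, `struct kfunc_id_list`, `register_kfunc_id_set`, `check_mod_kfunc_call`, `check_kfunc_call`) are hypothetical simplifications, `struct module` is a stand-in for the kernel type, and locking is omitted entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's struct module; only identity matters here. */
struct module {
	const char *name;
};

/* Hypothetical BTF id set registered by one module. */
struct kfunc_id_set {
	const uint32_t *ids;
	size_t cnt;
	const struct module *owner;
	struct kfunc_id_set *next;
};

/* Hypothetical list of dynamically registered sets. */
struct kfunc_id_list {
	struct kfunc_id_set *head;
};

static void register_kfunc_id_set(struct kfunc_id_list *l, struct kfunc_id_set *s)
{
	s->next = l->head;
	l->head = s;
}

static bool id_in(const uint32_t *ids, size_t cnt, uint32_t id)
{
	for (size_t i = 0; i < cnt; i++)
		if (ids[i] == id)
			return true;
	return false;
}

/* Dynamic lookup: the id must be in a set owned by the calling module. */
static bool check_mod_kfunc_call(const struct kfunc_id_list *l, uint32_t id,
				 const struct module *owner)
{
	for (const struct kfunc_id_set *s = l->head; s; s = s->next)
		if (s->owner == owner && id_in(s->ids, s->cnt, id))
			return true;
	return false;
}

/* In-kernel whitelist first, then defer to the dynamic sets. */
static bool check_kfunc_call(const uint32_t *builtin, size_t builtin_cnt,
			     const struct kfunc_id_list *l, uint32_t id,
			     const struct module *owner)
{
	if (id_in(builtin, builtin_cnt, id))
		return true;
	return check_mod_kfunc_call(l, id, owner);
}
```

Matching on the owner is what keeps an id registered by one module from being accepted for a call that resolves into a different module.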
    • Kumar Kartikeya Dwivedi's avatar
      bpf: Be conservative while processing invalid kfunc calls · a5d82727
      Kumar Kartikeya Dwivedi authored
      This patch also modifies the BPF verifier to only return error for
      invalid kfunc calls specially marked by userspace (with insn->imm == 0,
      insn->off == 0) after the verifier has eliminated dead instructions.
      This can be handled in the fixup stage, skipping processing during the
      add and check stages.
      
      If such an invalid call is dropped, the fixup stage will not encounter
      insn->imm as 0, otherwise it bails out and returns an error.
      
      This will be exposed as weak ksym support in libbpf in later patches.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211002011757.311265-3-memxor@gmail.com
      a5d82727
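As a rough model of the behaviour described above: an invalid call only causes an error if it is still present once dead instructions are gone. `struct insn` and `fixup_kfunc_calls` below are simplified, hypothetical stand-ins, and the `reachable` flag replaces the verifier's real dead code elimination.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct bpf_insn; only the fields used here. */
struct insn {
	bool is_kfunc_call;
	int32_t imm;    /* BTF id of the kfunc, 0 marks an invalid call */
	int16_t off;    /* fd_array index, 0 for vmlinux BTF */
	bool reachable; /* assumed result of an earlier liveness pass */
};

/*
 * Model of the fixup stage: instructions proven dead have been dropped,
 * so a call with imm == 0 is only an error if it is still live.
 * Returns 0 on success and -1 if a live invalid kfunc call remains.
 */
static int fixup_kfunc_calls(const struct insn *insns, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		const struct insn *in = &insns[i];

		if (!in->reachable)
			continue; /* eliminated as dead code */
		if (in->is_kfunc_call && in->imm == 0)
			return -1; /* invalid call survived elimination */
	}
	return 0;
}
```

A program that guards a missing kfunc behind a branch the verifier can prove is never taken therefore still loads, while one that can actually reach the call is rejected.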
    • Kumar Kartikeya Dwivedi's avatar
      bpf: Introduce BPF support for kernel module function calls · 2357672c
      Kumar Kartikeya Dwivedi authored
      This change adds support on the kernel side to allow for BPF programs to
      call kernel module functions. Userspace will prepare an array of module
      BTF fds that is passed in during BPF_PROG_LOAD using fd_array parameter.
      In the kernel, the module BTFs are placed in the auxiliary struct for
      bpf_prog, and loaded as needed.
      
      The verifier then uses insn->off to index into the fd_array. insn->off
      0 is reserved for vmlinux BTF (for backwards compat), so userspace must
      use an fd_array index > 0 for module kfunc support. kfunc_btf_tab is
      sorted based on offset in an array, and each offset corresponds to one
      descriptor, with a max limit up to 256 such module BTFs.
      
      We also change existing kfunc_tab to distinguish each element based on
      imm, off pair as each such call will now be distinct.
      
      Another change is to check_kfunc_call callback, which now includes a
      struct module * pointer, this is to be used in later patch such that the
      kfunc_id and module pointer are matched for dynamically registered BTF
      sets from loadable modules, so that same kfunc_id in two modules doesn't
      lead to check_kfunc_call succeeding. For the duration of the
      check_kfunc_call, the reference to struct module exists, as it returns
      the pointer stored in kfunc_btf_tab.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211002011757.311265-2-memxor@gmail.com
      2357672c
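To illustrate why the descriptor table has to be keyed on the (imm, off) pair, here is a small sketch using the C library's qsort and bsearch. `struct kfunc_desc` as laid out here and the helper names are hypothetical, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical descriptor: one entry per distinct (imm, off) pair. */
struct kfunc_desc {
	uint32_t func_id; /* insn->imm: BTF id of the called function */
	uint16_t offset;  /* insn->off: fd_array index, 0 = vmlinux BTF */
};

/* Order by func_id first, then by offset, so equal ids in different
 * BTFs remain distinct entries. */
static int kfunc_desc_cmp(const void *a, const void *b)
{
	const struct kfunc_desc *d0 = a, *d1 = b;

	if (d0->func_id != d1->func_id)
		return d0->func_id < d1->func_id ? -1 : 1;
	if (d0->offset != d1->offset)
		return d0->offset < d1->offset ? -1 : 1;
	return 0;
}

static void kfunc_tab_sort(struct kfunc_desc *tab, size_t n)
{
	qsort(tab, n, sizeof(*tab), kfunc_desc_cmp);
}

/* Binary search on the sorted table; NULL if the pair is not present. */
static const struct kfunc_desc *
kfunc_tab_find(const struct kfunc_desc *tab, size_t n,
	       uint32_t func_id, uint16_t offset)
{
	struct kfunc_desc key = { .func_id = func_id, .offset = offset };

	return bsearch(&key, tab, n, sizeof(*tab), kfunc_desc_cmp);
}
```

With the offset included in the comparison, the same BTF id in vmlinux and in a module BTF resolves to two different descriptors, and a lookup with the wrong offset fails instead of silently matching.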
  2. 05 Oct, 2021 13 commits