1. 13 Sep, 2021 6 commits
    • libbpf: Make libbpf_version.h non-auto-generated · 2f383041
      Andrii Nakryiko authored
      Turn the previously auto-generated libbpf_version.h header into a normal
      header file. This prevents various tricky Makefile integration issues,
      simplifies the overall build process, and also allows further extending
      it with more versioning-related APIs in the future.
      
      To prevent accidental out-of-sync versions as defined by libbpf.map and
      libbpf_version.h, Makefile checks their consistency at build time.
      
      Simultaneously with this change bump libbpf.map to v0.6.
      
      Also undo adding libbpf's output directory into include path for
      kernel/bpf/preload, bpftool, and resolve_btfids, which is not necessary
      because libbpf_version.h is just a normal header like any other.
      
      Fixes: 0b46b755 ("libbpf: Add LIBBPF_DEPRECATED_SINCE macro for scheduling API deprecations")
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210913222309.3220849-1-andrii@kernel.org
    • bpf, selftests: Replicate tailcall limit test for indirect call case · dbd7eb14
      Daniel Borkmann authored
      The tailcall_3 test program uses bpf_tail_call_static(), where the JIT
      patches a direct jump. Add a new tailcall_6 test program replicating
      exactly the same test, but ensuring that bpf_tail_call() uses a map
      index about which the verifier cannot make assumptions this time.
      
      In other words, this now covers both cases on the x86-64 JIT, meaning
      JIT images with emit_bpf_tail_call_direct() emission as well as JIT
      images with emit_bpf_tail_call_indirect() emission.
      
        # echo 1 > /proc/sys/net/core/bpf_jit_enable
        # ./test_progs -t tailcalls
        #136/1 tailcalls/tailcall_1:OK
        #136/2 tailcalls/tailcall_2:OK
        #136/3 tailcalls/tailcall_3:OK
        #136/4 tailcalls/tailcall_4:OK
        #136/5 tailcalls/tailcall_5:OK
        #136/6 tailcalls/tailcall_6:OK
        #136/7 tailcalls/tailcall_bpf2bpf_1:OK
        #136/8 tailcalls/tailcall_bpf2bpf_2:OK
        #136/9 tailcalls/tailcall_bpf2bpf_3:OK
        #136/10 tailcalls/tailcall_bpf2bpf_4:OK
        #136/11 tailcalls/tailcall_bpf2bpf_5:OK
        #136 tailcalls:OK
        Summary: 1/11 PASSED, 0 SKIPPED, 0 FAILED
      
        # echo 0 > /proc/sys/net/core/bpf_jit_enable
        # ./test_progs -t tailcalls
        #136/1 tailcalls/tailcall_1:OK
        #136/2 tailcalls/tailcall_2:OK
        #136/3 tailcalls/tailcall_3:OK
        #136/4 tailcalls/tailcall_4:OK
        #136/5 tailcalls/tailcall_5:OK
        #136/6 tailcalls/tailcall_6:OK
        [...]
      
      With the interpreter, the tailcall_1-6 tests pass as well. The later
      tailcall_bpf2bpf_* tests fail due to the lack of bpf2bpf + tailcall
      support in the interpreter, which is expected.
      
      Also, manual inspection shows that both loaded programs from the
      tailcall_3 and tailcall_6 test cases emit the expected opcodes:
      
      * tailcall_3 disasm, emit_bpf_tail_call_direct():
      
        [...]
         b:   push   %rax
         c:   push   %rbx
         d:   push   %r13
         f:   mov    %rdi,%rbx
        12:   movabs $0xffff8d3f5afb0200,%r13
        1c:   mov    %rbx,%rdi
        1f:   mov    %r13,%rsi
        22:   xor    %edx,%edx                 _
        24:   mov    -0x4(%rbp),%eax          |  limit check
        2a:   cmp    $0x20,%eax               |
        2d:   ja     0x0000000000000046       |
        2f:   add    $0x1,%eax                |
        32:   mov    %eax,-0x4(%rbp)          |_
        38:   nopl   0x0(%rax,%rax,1)
        3d:   pop    %r13
        3f:   pop    %rbx
        40:   pop    %rax
        41:   jmpq   0xffffffffffffe377
        [...]
      
      * tailcall_6 disasm, emit_bpf_tail_call_indirect():
      
        [...]
        47:   movabs $0xffff8d3f59143a00,%rsi
        51:   mov    %edx,%edx
        53:   cmp    %edx,0x24(%rsi)
        56:   jbe    0x0000000000000093        _
        58:   mov    -0x4(%rbp),%eax          |  limit check
        5e:   cmp    $0x20,%eax               |
        61:   ja     0x0000000000000093       |
        63:   add    $0x1,%eax                |
        66:   mov    %eax,-0x4(%rbp)          |_
        6c:   mov    0x110(%rsi,%rdx,8),%rcx
        74:   test   %rcx,%rcx
        77:   je     0x0000000000000093
        79:   pop    %rax
        7a:   mov    0x30(%rcx),%rcx
        7e:   add    $0xb,%rcx
        82:   callq  0x000000000000008e
        87:   pause
        89:   lfence
        8c:   jmp    0x0000000000000087
        8e:   mov    %rcx,(%rsp)
        92:   retq
        [...]
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Acked-by: Yonghong Song <yhs@fb.com>
      Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
      Acked-by: Paul Chaignon <paul@cilium.io>
      Link: https://lore.kernel.org/bpf/CAM1=_QRyRVCODcXo_Y6qOm1iT163HoiSj8U2pZ8Rj3hzMTT=HQ@mail.gmail.com
      Link: https://lore.kernel.org/bpf/20210910091900.16119-1-daniel@iogearbox.net
    • Merge branch 'bpf: introduce bpf_get_branch_snapshot' · 14bef1ab
      Alexei Starovoitov authored
      Song Liu says:
      
      ====================
      
      Changes v6 => v7:
      1. Improve/fix intel_pmu_snapshot_branch_stack() logic. (Peter).
      
      Changes v5 => v6:
      1. Add local_irq_save/restore to intel_pmu_snapshot_branch_stack. (Peter)
      2. Remove the buf and size check in bpf_get_branch_snapshot, move the
         flags check to later in the function. (Peter, Andrii)
      3. Revise comments for bpf_get_branch_snapshot in bpf.h (Andrii)
      
      Changes v4 => v5:
      1. Modify perf_snapshot_branch_stack_t to save some memcpy. (Andrii)
      2. Minor fixes in selftests. (Andrii)
      
      Changes v3 => v4:
      1. Do not reshuffle intel_pmu_disable_all(). Use some inline to save LBR
         entries. (Peter)
      2. Move static_call(perf_snapshot_branch_stack) to the helper. (Alexei)
      3. Add argument flags to bpf_get_branch_snapshot. (Andrii)
      4. Make MAX_BRANCH_SNAPSHOT an enum (Andrii), and rename it to
         PERF_MAX_BRANCH_SNAPSHOT.
      5. Make bpf_get_branch_snapshot similar to bpf_read_branch_records.
         (Andrii)
      6. Move the test target function to bpf_testmod. Updated kallsyms_find_next
         to work properly with modules. (Andrii)
      
      Changes v2 => v3:
      1. Fix the use of static_call. (Peter)
      2. Limit the use to perfmon version >= 2. (Peter)
      3. Modify intel_pmu_snapshot_branch_stack() to use intel_pmu_disable_all
         and intel_pmu_enable_all().
      
      Changes v1 => v2:
      1. Rename the helper as bpf_get_branch_snapshot;
      2. Fix/simplify the use of static_call;
      3. Instead of percpu variables, let intel_pmu_snapshot_branch_stack output
         branch records to an output argument of type perf_branch_snapshot.
      
      Branch stacks can be very useful in understanding software events. For
      example, when a long function, e.g. sys_perf_event_open, returns an errno,
      it is not obvious why the function failed. A branch stack can provide very
      helpful information in this type of scenario.
      
      This set adds support for reading the branch stack with a new BPF helper,
      bpf_get_branch_snapshot(). Currently, this is only supported on Intel
      systems. It is also possible to support the same feature on PowerPC.
      
      The hardware that records the branch stack is not stopped automatically
      on software events. Therefore, it is necessary to stop it in software as
      soon as possible; otherwise, the hardware buffers/registers will be
      flushed. One of the key design considerations in this set is to minimize
      the number of branch record entries between the moment the event triggers
      and the moment the hardware recorder is stopped. Based on this goal, the
      current design differs from the discussion in the original RFC [1]:
       1) Static call is used when supported, to save function pointer
          dereference;
       2) intel_pmu_lbr_disable_all is used instead of perf_pmu_disable(),
          because the latter uses about 10 entries before stopping LBR.
      
      With the current code, on Intel CPUs, LBR is stopped after 7 branch
      entries once fexit triggers:
      
      ID: 0 from bpf_get_branch_snapshot+18 to intel_pmu_snapshot_branch_stack+0
      ID: 1 from __brk_limit+477143934 to bpf_get_branch_snapshot+0
      ID: 2 from __brk_limit+477192263 to __brk_limit+477143880  # trampoline
      ID: 3 from __bpf_prog_enter+34 to __brk_limit+477192251
      ID: 4 from migrate_disable+60 to __bpf_prog_enter+9
      ID: 5 from __bpf_prog_enter+4 to migrate_disable+0
      ID: 6 from bpf_testmod_loop_test+20 to __bpf_prog_enter+0
      ID: 7 from bpf_testmod_loop_test+20 to bpf_testmod_loop_test+13
      ID: 8 from bpf_testmod_loop_test+20 to bpf_testmod_loop_test+13
      ...
      
      [1] https://lore.kernel.org/bpf/20210818012937.2522409-1-songliubraving@fb.com/
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Add test for bpf_get_branch_snapshot · 025bd7c7
      Song Liu authored
      This test uses bpf_get_branch_snapshot from a fexit program. The test
      uses a target function (bpf_testmod_loop_test) and compares the records
      against kallsyms. If there are not enough records matching kallsyms, the
      test fails.
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20210910183352.3151445-4-songliubraving@fb.com
    • bpf: Introduce helper bpf_get_branch_snapshot · 856c02db
      Song Liu authored
      Introduce bpf_get_branch_snapshot(), which allows a tracing program to
      get the branch trace from hardware (e.g. Intel LBR). To use the feature,
      the user needs to create a perf_event with proper branch_record filtering
      on each CPU, and then call bpf_get_branch_snapshot in the BPF program.
      On Intel CPUs, the VLBR event (raw event 0x1b00) can be used for this.
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210910183352.3151445-3-songliubraving@fb.com
    • perf: Enable branch record for software events · c22ac2a3
      Song Liu authored
      The typical way to access branch records (e.g. Intel LBR) is via a
      hardware perf_event. For CPUs with FREEZE_LBRS_ON_PMI support, the PMI
      handler can capture reliable LBR data. On the other hand, LBR can also be
      useful in non-PMI scenarios. For example, in a kretprobe or BPF fexit
      program, LBR can provide a lot of information on what happened in the
      function. Add an API to use branch records from software.
      
      Note that, when the software event triggers, it is necessary to stop the
      branch record hardware as soon as possible. Therefore, static_call is
      used to remove some branch instructions in this process.
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lore.kernel.org/bpf/20210910183352.3151445-2-songliubraving@fb.com
  2. 10 Sep, 2021 23 commits
  3. 09 Sep, 2021 1 commit
    • libbpf: Add LIBBPF_DEPRECATED_SINCE macro for scheduling API deprecations · 0b46b755
      Quentin Monnet authored
      Introduce a macro LIBBPF_DEPRECATED_SINCE(major, minor, message) to
      prepare the deprecation of two API functions. This macro marks functions
      as deprecated when libbpf's version reaches the values passed as
      arguments.
      
      As part of this change, a libbpf_version.h header is added recording the
      major (LIBBPF_MAJOR_VERSION) and minor (LIBBPF_MINOR_VERSION) libbpf
      version macros. They are now part of the libbpf public API and can be
      relied upon by user code. libbpf_version.h is installed system-wide
      alongside other libbpf public headers.
      
      Due to this new build-time auto-generated header, in-kernel applications
      relying on libbpf (resolve_btfids, bpftool, bpf_preload) are updated to
      include libbpf's output directory in their list of include search paths.
      A better fix would be to use libbpf's make_install target to install the
      public API headers, but that cleanup is left as a future improvement. The
      build changes were tested by building the kernel (with KBUILD_OUTPUT and
      O= specified explicitly), bpftool, libbpf, selftests/bpf, and
      resolve_btfids. No problems were detected.
      
      Note that, because of the constraints of the C preprocessor, we have to
      write a few lines of macro magic for each version used to prepare a
      deprecation (0.6 for now).
      
      Also, use LIBBPF_DEPRECATED_SINCE() to schedule deprecation of
      btf__get_from_id() and btf__load(), which are replaced by
      btf__load_from_kernel_by_id() and btf__load_into_kernel(), respectively,
      starting from future libbpf v0.6. This is part of libbpf 1.0 effort ([0]).
      
        [0] Closes: https://github.com/libbpf/libbpf/issues/278
      Co-developed-by: Quentin Monnet <quentin@isovalent.com>
      Co-developed-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Quentin Monnet <quentin@isovalent.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210908213226.1871016-1-andrii@kernel.org
  4. 08 Sep, 2021 6 commits
  5. 07 Sep, 2021 1 commit
  6. 05 Sep, 2021 3 commits
    • Merge tag 'perf-tools-for-v5.15-2021-09-04' of... · 27151f17
      Linus Torvalds authored
      Merge tag 'perf-tools-for-v5.15-2021-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
      Pull perf tool updates from Arnaldo Carvalho de Melo:
       "New features:
      
         - Improvements for the flamegraph python script, including:
             - Display perf.data header
             - Display PIDs of user stacks
             - Added option to change color scheme
             - Default to blue/green color scheme to improve accessibility
             - Correctly identify kernel stacks when debuginfo is available
      
         - Improvements for 'perf bench futex':
             - Add --mlockall parameter
             - Add --broadcast and --pi to the 'requeue' sub benchmark
      
         - Add support for PMU aliases.
      
         - Introduce an ARM Coresight ETE decoder.
      
         - Add a 'perf bench' entry for evlist open/close operations, to help
           quantify improvements with multithreading 'perf record'.
      
         - Allow reporting the [un]throttle PERF_RECORD_ meta event in 'perf
           script's python scripting.
      
         - Add a 'perf test' entry for PMU aliases.
      
         - Add a 'perf test' entry for 'perf record/perf report/perf script'
           pipe mode.
      
        Fixes:
      
         - perf script dlfilter (API for filtering via dynamically loaded
           shared object introduced in v5.14) fixes and a 'perf test' entry
           for it.
      
         - Fix get_current_dir_name() compilation on Android.
      
         - Fix issues with asciidoc and double dashes uses.
      
         - Fix memory leaks in the BTF handling code.
      
         - Fix leftover problems in the Documentation from the infrastructure
           originally lifted from the git codebase.
      
         - Fix *probe_vfs_getname.sh 'perf test' failures.
      
         - Handle fd gaps in 'perf test's test__dso_data_reopen().
      
         - Make sure to show disassembly warnings for 'perf annotate --stdio'.
      
         - Fix output from pipe to file and vice-versa in 'perf
           record/report/script'.
      
         - Correct 'perf data -h' output.
      
         - Fix wrong comm in system-wide mode with 'perf record --delay'.
      
         - Do not allow --for-each-cgroup without cpu in 'perf stat'
      
         - Make 'perf test --skip' work on shell tests.
      
         - Fix libperf's verbose printing.
      
        Misc improvements:
      
         - Preparatory patches for multithreading various 'perf record' phases
           (synthesizing, opening, recording, etc).
      
         - Add sparse context/locking annotations in compiler-types.h, also to
           help with the multithreading effort.
      
         - Optimize the generation of the arch specific errno tables used in
           'perf trace'.
      
         - Optimize libperf's perf_cpu_map__max().
      
         - Improve ARM's CoreSight warnings.
      
         - Report collisions in AUX records.
      
         - Improve warnings for the LLVM 'perf test' entry.
      
         - Improve the PMU events 'perf test' codebase.
      
         - perf test: Do not compare overheads in the zstd comp test
      
         - Better support annotation on ARM.
      
         - Update 'perf trace's cmd string table to decode sys_bpf() first
           arg.
      
        Vendor events:
      
         - Add JSON events and metrics for Intel's Ice Lake, Tiger Lake and
           Elkhart Lake.
      
         - Update JSON events and metrics for Intel's Cascade Lake and Sky Lake
           servers.
      
        Hardware tracing:
      
         - Improvements for the ARM hardware tracing auxtrace support"
      
      * tag 'perf-tools-for-v5.15-2021-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (130 commits)
        perf tests: Add test for PMU aliases
        perf pmu: Add PMU alias support
        perf session: Report collisions in AUX records
        perf script python: Allow reporting the [un]throttle PERF_RECORD_ meta event
        perf build: Report failure for testing feature libopencsd
        perf cs-etm: Show a warning for an unknown magic number
        perf cs-etm: Print the decoder name
        perf cs-etm: Create ETE decoder
        perf cs-etm: Update OpenCSD decoder for ETE
        perf cs-etm: Fix typo
        perf cs-etm: Save TRCDEVARCH register
        perf cs-etm: Refactor out ETMv4 header saving
        perf cs-etm: Initialise architecture based on TRCIDR1
        perf cs-etm: Refactor initialisation of decoder params.
        tools build: Fix feature detect clean for out of source builds
        perf evlist: Add evlist__for_each_entry_from() macro
        perf evsel: Handle precise_ip fallback in evsel__open_cpu()
        perf evsel: Move bpf_counter__install_pe() to success path in evsel__open_cpu()
        perf evsel: Move test_attr__open() to success path in evsel__open_cpu()
        perf evsel: Move ignore_missing_thread() to fallback code
        ...
    • Merge tag 'trace-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 58ca2415
      Linus Torvalds authored
      Pull tracing updates from Steven Rostedt:
      
       - simplify the Kconfig use of FTRACE and TRACE_IRQFLAGS_SUPPORT
      
       - bootconfig can now start histograms
      
       - bootconfig supports group/all enabling
      
       - histograms now can put values in linear size buckets
      
       - execnames can be passed to synthetic events
      
       - introduce "event probes" that attach to other events and can retrieve
         data from pointers of fields, or record fields as different types (a
         pointer to a string as a string instead of just a hex number)
      
       - various fixes and clean ups
      
      * tag 'trace-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (35 commits)
        tracing/doc: Fix table format in histogram code
        selftests/ftrace: Add selftest for testing duplicate eprobes and kprobes
        selftests/ftrace: Add selftest for testing eprobe events on synthetic events
        selftests/ftrace: Add test case to test adding and removing of event probe
        selftests/ftrace: Fix requirement check of README file
        selftests/ftrace: Add clear_dynamic_events() to test cases
        tracing: Add a probe that attaches to trace events
        tracing/probes: Reject events which have the same name of existing one
        tracing/probes: Have process_fetch_insn() take a void * instead of pt_regs
        tracing/probe: Change traceprobe_set_print_fmt() to take a type
        tracing/probes: Use struct_size() instead of defining custom macros
        tracing/probes: Allow for dot delimiter as well as slash for system names
        tracing/probe: Have traceprobe_parse_probe_arg() take a const arg
        tracing: Have dynamic events have a ref counter
        tracing: Add DYNAMIC flag for dynamic events
        tracing: Replace deprecated CPU-hotplug functions.
        MAINTAINERS: Add an entry for os noise/latency
        tracepoint: Fix kerneldoc comments
        bootconfig/tracing/ktest: Update ktest example for boot-time tracing
        tools/bootconfig: Use per-group/all enable option in ftrace2bconf script
        ...
    • Merge tag 'arc-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc · e07af262
      Linus Torvalds authored
      Pull ARC updates from Vineet Gupta:
       "Finally a big pile of changes for ARC (atomics/mm). These are from our
        internal arc64 tree, preparing mainline for eventual arc64 support.
        I'm spreading them out to avoid tsunami of patches in one release.
      
         - MM rework:
             - Implement up to 4 paging levels
           - Enable STRICT_MM_TYPECHECKS
             - switch pgtable_t back to 'struct page *'
      
         - Atomics rework / implement relaxed accessors
      
         - Retire legacy MMUv1,v2; ARC750 cores
      
         - Fixes for a few other build errors and typos"
      
      * tag 'arc-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: (33 commits)
        ARC: mm: vmalloc sync from kernel to user table to update PMD ...
        ARC: mm: support 4 levels of page tables
        ARC: mm: support 3 levels of page tables
        ARC: mm: switch to asm-generic/pgalloc.h
        ARC: mm: switch pgtable_t back to struct page *
        ARC: mm: hack to allow 2 level build with 4 level code
        ARC: mm: disintegrate pgtable.h into levels and flags
        ARC: mm: disintegrate mmu.h (arcv2 bits out)
        ARC: mm: move MMU specific bits out of entry code ...
        ARC: mm: move MMU specific bits out of ASID allocator
        ARC: mm: non-functional code movement/cleanup
        ARC: mm: pmd_populate* to use the canonical set_pmd (and drop pmd_set)
        ARC: ioremap: use more commonly used PAGE_KERNEL based uncached flag
        ARC: mm: Enable STRICT_MM_TYPECHECKS
        ARC: mm: Fixes to allow STRICT_MM_TYPECHECKS
        ARC: mm: move mmu/cache externs out to setup.h
        ARC: mm: remove tlb paranoid code
        ARC: mm: use SCRATCH_DATA0 register for caching pgdir in ARCv2 only
        ARC: retire MMUv1 and MMUv2 support
        ARC: retire ARC750 support
        ...