1. 17 Sep, 2021 8 commits
  2. 15 Sep, 2021 17 commits
  3. 14 Sep, 2021 6 commits
  4. 13 Sep, 2021 6 commits
    • libbpf: Make libbpf_version.h non-auto-generated · 2f383041
      Andrii Nakryiko authored
      Turn the previously auto-generated libbpf_version.h header into a normal
      header file. This prevents various tricky Makefile integration issues,
      simplifies the overall build process, and also makes it possible to
      extend it further with more versioning-related APIs in the future.
      
      To prevent the versions defined by libbpf.map and libbpf_version.h from
      accidentally getting out of sync, the Makefile checks their consistency
      at build time.
      
      Along with this change, bump libbpf.map to v0.6.
      
      Also undo adding libbpf's output directory to the include path for
      kernel/bpf/preload, bpftool, and resolve_btfids, which is not necessary
      because libbpf_version.h is just a normal header like any other.
      
      Fixes: 0b46b755 ("libbpf: Add LIBBPF_DEPRECATED_SINCE macro for scheduling API deprecations")
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210913222309.3220849-1-andrii@kernel.org
    • bpf, selftests: Replicate tailcall limit test for indirect call case · dbd7eb14
      Daniel Borkmann authored
      The tailcall_3 test program uses bpf_tail_call_static() where the JIT
      would patch a direct jump. Add a new tailcall_6 test program replicating
      exactly the same test, except that bpf_tail_call() now uses a map index
      the verifier cannot make assumptions about.
      
      In other words, on the x86-64 JIT this now covers both JIT images with
      emit_bpf_tail_call_direct() emission and JIT images with
      emit_bpf_tail_call_indirect() emission.
      
        # echo 1 > /proc/sys/net/core/bpf_jit_enable
        # ./test_progs -t tailcalls
        #136/1 tailcalls/tailcall_1:OK
        #136/2 tailcalls/tailcall_2:OK
        #136/3 tailcalls/tailcall_3:OK
        #136/4 tailcalls/tailcall_4:OK
        #136/5 tailcalls/tailcall_5:OK
        #136/6 tailcalls/tailcall_6:OK
        #136/7 tailcalls/tailcall_bpf2bpf_1:OK
        #136/8 tailcalls/tailcall_bpf2bpf_2:OK
        #136/9 tailcalls/tailcall_bpf2bpf_3:OK
        #136/10 tailcalls/tailcall_bpf2bpf_4:OK
        #136/11 tailcalls/tailcall_bpf2bpf_5:OK
        #136 tailcalls:OK
        Summary: 1/11 PASSED, 0 SKIPPED, 0 FAILED
      
        # echo 0 > /proc/sys/net/core/bpf_jit_enable
        # ./test_progs -t tailcalls
        #136/1 tailcalls/tailcall_1:OK
        #136/2 tailcalls/tailcall_2:OK
        #136/3 tailcalls/tailcall_3:OK
        #136/4 tailcalls/tailcall_4:OK
        #136/5 tailcalls/tailcall_5:OK
        #136/6 tailcalls/tailcall_6:OK
        [...]
      
      With the interpreter, the tailcall_1-6 tests pass as well. The later
      tailcall_bpf2bpf_* tests fail due to the lack of bpf2bpf + tailcall
      support in the interpreter, which is expected.
      
      Also, manual inspection shows that the programs loaded by the tailcall_3
      and tailcall_6 test cases both emit the expected opcodes:
      
      * tailcall_3 disasm, emit_bpf_tail_call_direct():
      
        [...]
         b:   push   %rax
         c:   push   %rbx
         d:   push   %r13
         f:   mov    %rdi,%rbx
        12:   movabs $0xffff8d3f5afb0200,%r13
        1c:   mov    %rbx,%rdi
        1f:   mov    %r13,%rsi
        22:   xor    %edx,%edx                 _
        24:   mov    -0x4(%rbp),%eax          |  limit check
        2a:   cmp    $0x20,%eax               |
        2d:   ja     0x0000000000000046       |
        2f:   add    $0x1,%eax                |
        32:   mov    %eax,-0x4(%rbp)          |_
        38:   nopl   0x0(%rax,%rax,1)
        3d:   pop    %r13
        3f:   pop    %rbx
        40:   pop    %rax
        41:   jmpq   0xffffffffffffe377
        [...]
      
      * tailcall_6 disasm, emit_bpf_tail_call_indirect():
      
        [...]
        47:   movabs $0xffff8d3f59143a00,%rsi
        51:   mov    %edx,%edx
        53:   cmp    %edx,0x24(%rsi)
        56:   jbe    0x0000000000000093        _
        58:   mov    -0x4(%rbp),%eax          |  limit check
        5e:   cmp    $0x20,%eax               |
        61:   ja     0x0000000000000093       |
        63:   add    $0x1,%eax                |
        66:   mov    %eax,-0x4(%rbp)          |_
        6c:   mov    0x110(%rsi,%rdx,8),%rcx
        74:   test   %rcx,%rcx
        77:   je     0x0000000000000093
        79:   pop    %rax
        7a:   mov    0x30(%rcx),%rcx
        7e:   add    $0xb,%rcx
        82:   callq  0x000000000000008e
        87:   pause
        89:   lfence
        8c:   jmp    0x0000000000000087
        8e:   mov    %rcx,(%rsp)
        92:   retq
        [...]
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Acked-by: Yonghong Song <yhs@fb.com>
      Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
      Acked-by: Paul Chaignon <paul@cilium.io>
      Link: https://lore.kernel.org/bpf/CAM1=_QRyRVCODcXo_Y6qOm1iT163HoiSj8U2pZ8Rj3hzMTT=HQ@mail.gmail.com
      Link: https://lore.kernel.org/bpf/20210910091900.16119-1-daniel@iogearbox.net
    • Merge branch 'bpf: introduce bpf_get_branch_snapshot' · 14bef1ab
      Alexei Starovoitov authored
      Song Liu says:
      
      ====================
      
      Changes v6 => v7:
      1. Improve/fix intel_pmu_snapshot_branch_stack() logic. (Peter).
      
      Changes v5 => v6:
      1. Add local_irq_save/restore to intel_pmu_snapshot_branch_stack. (Peter)
      2. Remove buf and size check in bpf_get_branch_snapshot, move flags check
         to later in the function. (Peter, Andrii)
      3. Revise comments for bpf_get_branch_snapshot in bpf.h (Andrii)
      
      Changes v4 => v5:
      1. Modify perf_snapshot_branch_stack_t to save some memcpy. (Andrii)
      2. Minor fixes in selftests. (Andrii)
      
      Changes v3 => v4:
      1. Do not reshuffle intel_pmu_disable_all(). Use some inline functions to
         save LBR entries. (Peter)
      2. Move static_call(perf_snapshot_branch_stack) to the helper. (Alexei)
      3. Add argument flags to bpf_get_branch_snapshot. (Andrii)
      4. Make MAX_BRANCH_SNAPSHOT an enum and rename it to
         PERF_MAX_BRANCH_SNAPSHOT. (Andrii)
      5. Make bpf_get_branch_snapshot similar to bpf_read_branch_records.
         (Andrii)
      6. Move the test target function to bpf_testmod. Update kallsyms_find_next
         to work properly with modules. (Andrii)
      
      Changes v2 => v3:
      1. Fix the use of static_call. (Peter)
      2. Limit the use to perfmon version >= 2. (Peter)
      3. Modify intel_pmu_snapshot_branch_stack() to use intel_pmu_disable_all
         and intel_pmu_enable_all().
      
      Changes v1 => v2:
      1. Rename the helper to bpf_get_branch_snapshot;
      2. Fix/simplify the use of static_call;
      3. Instead of percpu variables, let intel_pmu_snapshot_branch_stack output
         branch records to an output argument of type perf_branch_snapshot.
      
      Branch stack can be very useful in understanding software events. For
      example, when a long function, e.g. sys_perf_event_open, returns an errno,
      it is not obvious why the function failed. Branch stack could provide very
      helpful information in such scenarios.
      
      This set adds support for reading the branch stack with a new BPF helper,
      bpf_get_branch_snapshot(). Currently, this is only supported on Intel
      systems. It is also possible to support the same feature for PowerPC.
      
      The hardware that records the branch stack is not stopped automatically
      on software events. Therefore, it is necessary to stop it in software as
      soon as possible. Otherwise, the hardware buffers/registers will be
      flushed. One of the key design considerations in this set is to minimize
      the number of branch record entries between the moment the event
      triggers and the moment the hardware recorder is stopped. Based on this
      goal, the current design is different from the discussions in the
      original RFC [1]:
       1) Static call is used when supported, to save a function pointer
          dereference;
       2) intel_pmu_lbr_disable_all is used instead of perf_pmu_disable(),
          because the latter uses about 10 entries before stopping LBR.
      
      With the current code, on Intel CPUs, LBR is stopped 7 branch entries
      after fexit triggers:
      
      ID: 0 from bpf_get_branch_snapshot+18 to intel_pmu_snapshot_branch_stack+0
      ID: 1 from __brk_limit+477143934 to bpf_get_branch_snapshot+0
      ID: 2 from __brk_limit+477192263 to __brk_limit+477143880  # trampoline
      ID: 3 from __bpf_prog_enter+34 to __brk_limit+477192251
      ID: 4 from migrate_disable+60 to __bpf_prog_enter+9
      ID: 5 from __bpf_prog_enter+4 to migrate_disable+0
      ID: 6 from bpf_testmod_loop_test+20 to __bpf_prog_enter+0
      ID: 7 from bpf_testmod_loop_test+20 to bpf_testmod_loop_test+13
      ID: 8 from bpf_testmod_loop_test+20 to bpf_testmod_loop_test+13
      ...
      
      [1] https://lore.kernel.org/bpf/20210818012937.2522409-1-songliubraving@fb.com/
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Add test for bpf_get_branch_snapshot · 025bd7c7
      Song Liu authored
      This test uses bpf_get_branch_snapshot from an fexit program. The test
      uses a target function (bpf_testmod_loop_test) and compares the records
      against kallsyms. If not enough records match kallsyms, the test fails.
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20210910183352.3151445-4-songliubraving@fb.com
    • bpf: Introduce helper bpf_get_branch_snapshot · 856c02db
      Song Liu authored
      Introduce bpf_get_branch_snapshot(), which allows a tracing program to get
      the branch trace from hardware (e.g. Intel LBR). To use the feature, the
      user needs to create a perf_event with proper branch_record filtering
      on each CPU, and then call bpf_get_branch_snapshot in the BPF program.
      On Intel CPUs, the VLBR event (raw event 0x1b00) can be used for this.
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210910183352.3151445-3-songliubraving@fb.com
    • perf: Enable branch record for software events · c22ac2a3
      Song Liu authored
      The typical way to access branch records (e.g. Intel LBR) is via a hardware
      perf_event. For CPUs with FREEZE_LBRS_ON_PMI support, a PMI can capture a
      reliable LBR. On the other hand, LBR can also be useful in non-PMI
      scenarios. For example, in a kretprobe or BPF fexit program, LBR can
      provide a lot of information on what happened in the function. Add an
      API to use branch records for software use.
      
      Note that when the software event triggers, it is necessary to stop the
      branch record hardware as soon as possible. Therefore, static_call is used
      to remove some branch instructions in this process.
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lore.kernel.org/bpf/20210910183352.3151445-2-songliubraving@fb.com
  5. 10 Sep, 2021 3 commits
    • selftests/bpf: Test new __sk_buff field hwtstamp · 3384c7c7
      Vadim Fedorenko authored
      Add selftests for the new hwtstamp field, analogous to the gso_segs
      selftests introduced in commit d9ff286a ("bpf: allow BPF programs access
      skb_shared_info->gso_segs field").
      Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20210909220409.8804-3-vfedorenko@novek.ru
    • bpf: Add hardware timestamp field to __sk_buff · f64c4ace
      Vadim Fedorenko authored
      BPF programs may want to know hardware timestamps if the NIC supports
      such timestamping.
      
      Expose this data as the hwtstamp field of __sk_buff, the same way as
      gso_segs/gso_size. This field can be accessed from the same programs as
      the tstamp field, but it is read-only. An explicit check to deny access
      to padding data is added to bpf_skb_is_valid_access.
      
      Also update BPF_PROG_TEST_RUN tests of the feature.
      Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20210909220409.8804-2-vfedorenko@novek.ru
    • Merge branch 'bpf-xsk-selftests' · e876a036
      Daniel Borkmann authored
      Magnus Karlsson says:
      
      ====================
      This patch set facilitates adding new tests as well as describing
      existing ones in the xsk selftests suite and adds 3 new test suites at
      the end. The idea is to isolate the run-time that executes the test
      from the actual implementation of the test. Today, implementing a test
      amounts to adding test specific if-statements all around the run-time,
      which is not scalable or amenable to reuse. This patch set instead
      introduces a test specification that is the only thing that a test
      fills in. The run-time then gets this specification and acts upon it
      completely unaware of what test it is executing. This way, we can get
      rid of all test specific if-statements from the run-time and the
      implementation of the test can be contained in a single function. This
      hopefully makes it easier to add tests and for users to understand
      what the test accomplishes.
      
      As a recap of what the run-time does: each test is based on the
      run-time launching two threads and connecting a veth link between the
      two threads. Each thread opens an AF_XDP socket on that veth interface
      and one of them sends traffic that the other one receives and
      validates. Each thread has its own umem. Note that this behavior is
      not changed by this patch set.
      
      A test specification consists of several items. Most importantly:
      
      * Two packet streams. One for Tx thread that specifies what traffic to
        send and one for the Rx thread that specifies what that thread
        should receive. If it receives exactly what is specified, the test
        passes, otherwise it fails. A packet stream can also specify which
        buffers in the umem should be used by the Rx and Tx threads.
      
      * What kind of AF_XDP sockets it should create, and which interfaces to
        bind them to
      
      * How many times it should repeat the socket creation and destruction
      
      * The name of the test
      
      The interface for the test spec is the following:
      
      void test_spec_init(struct test_spec *test, struct ifobject *ifobj_tx,
                          struct ifobject *ifobj_rx, enum test_mode mode);
      
      /* Reset everything but the interface specifications and the mode */
      void test_spec_reset(struct test_spec *test);
      
      void test_spec_set_name(struct test_spec *test, const char *name);
      
      Packet streams have the following interfaces:
      
      struct pkt *pkt_stream_get_pkt(struct pkt_stream *pkt_stream, u32 pkt_nb);
      
      struct pkt *pkt_stream_get_next_rx_pkt(struct pkt_stream *pkt_stream);
      
      struct pkt_stream *pkt_stream_generate(struct xsk_umem_info *umem,
                                             u32 nb_pkts, u32 pkt_len);
      
      void pkt_stream_delete(struct pkt_stream *pkt_stream);
      
      struct pkt_stream *pkt_stream_clone(struct xsk_umem_info *umem,
                                          struct pkt_stream *pkt_stream);
      
      /* Replaces all packets in the stream */
      void pkt_stream_replace(struct test_spec *test, u32 nb_pkts, u32 pkt_len);
      
      /* Replaces every other packet in the stream */
      void pkt_stream_replace_half(struct test_spec *test, u32 pkt_len, u32 offset);
      
      /* For creating custom made packet streams */
      void pkt_stream_generate_custom(struct test_spec *test, struct pkt *pkts,
                                      u32 nb_pkts);
      
      /* Restores the default packet stream */
      void pkt_stream_restore_default(struct test_spec *test);
      
      In the most basic case, a test can then be described like this
      (provided the test specification has been created before calling the
      function):
      
      static bool testapp_aligned(struct test_spec *test)
      {
              test_spec_set_name(test, "RUN_TO_COMPLETION");
              testapp_validate_traffic(test);
              return true;
      }
      
      Running the same test in unaligned mode would then look like this:
      
      static bool testapp_unaligned(struct test_spec *test)
      {
              if (!hugepages_present(test->ifobj_tx)) {
                      ksft_test_result_skip("No 2M huge pages present.\n");
                      return false;
              }
      
              test_spec_set_name(test, "UNALIGNED_MODE");
              test->ifobj_tx->umem->unaligned_mode = true;
              test->ifobj_rx->umem->unaligned_mode = true;
              /* Let half of the packets straddle a buffer boundary */
              pkt_stream_replace_half(test, PKT_SIZE,
                                      XSK_UMEM__DEFAULT_FRAME_SIZE - 32);
              /* Populate fill ring with addresses in the packet stream */
              test->ifobj_rx->pkt_stream->use_addr_for_fill = true;
              testapp_validate_traffic(test);
      
              pkt_stream_restore_default(test);
              return true;
      }
      
      3 of the last 4 patches in the set add 3 new test suites, one for
      unaligned mode, one for testing the rejection of tricky invalid
      descriptors plus the acceptance of some valid ones in the Tx ring, and
      one for testing 2K frame sizes (the default is 4K).
      
      What is left to do for follow-up patches:
      
      * Convert the statistics tests to the new framework.
      
      * Implement a way of registering new tests without needing the enum
        test_type. Once this has been done (together with the previous
        bullet), all the test types can be dropped from the header
        file. This means that we should be able to add tests by just writing
        a single function with a new test specification, which is one of the
        goals.
      
      * Introduce functions for manipulating parts of the test or interface
        spec instead of direct manipulations such as
        test->ifobj_rx->pkt_stream->use_addr_for_fill = true; which is kind
        of awkward.
      
      * Move the run-time and its interface to its own .c and .h files. Then
        we can have all the tests in a separate file.
      
      * Better error reporting if a test fails. Today it does not state which
        test failed and might not continue executing the rest of the tests due
        to this failure. Failures are not propagated upwards through the
        functions, so a failed test will also be counted as a passed test,
        which messes up the stats counting. This needs to be changed.
      
      * Add an option to run a specific test instead of all of them
      
      * Introduce pacing of sent packets so that they are never dropped
        by the receiver even if it is stalled for some reason. If you run
        the current tests on a heavily loaded system, they might fail in SKB
        mode due to packets being dropped by the driver on Tx. Though I have
        never seen it, it might happen.
      
      v1 -> v2:
      
      * Fixed a number of spelling errors [Maciej]
      * Fixed use after free bug in pkt_stream_replace() [Maciej]
      * pkt_stream_set -> pkt_stream_generate_custom [Maciej]
      * Fixed formatting problem in testapp_invalid_desc() [Maciej]
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>