1. 15 Feb, 2023 5 commits
  2. 14 Feb, 2023 12 commits
    • Revert "bpf: Add --skip_encoding_btf_inconsistent_proto, --btf_gen_optimized... · 1f5dfcc7
      Alexei Starovoitov authored
      Revert "bpf: Add --skip_encoding_btf_inconsistent_proto, --btf_gen_optimized to pahole flags for v1.25"
      
      This reverts commit 0243d3df.
      
      pahole 1.25 is too aggressive in removing functions.
      With a clang-compiled kernel, the following is seen:
      WARN: resolve_btfids: unresolved symbol tcp_reno_cong_avoid
      WARN: resolve_btfids: unresolved symbol dctcp_update_alpha
      WARN: resolve_btfids: unresolved symbol cubictcp_cong_avoid
      WARN: resolve_btfids: unresolved symbol bpf_xdp_metadata_rx_timestamp
      WARN: resolve_btfids: unresolved symbol bpf_xdp_metadata_rx_hash
      WARN: resolve_btfids: unresolved symbol bpf_task_kptr_get
      WARN: resolve_btfids: unresolved symbol bpf_task_acquire_not_zero
      WARN: resolve_btfids: unresolved symbol bpf_rdonly_cast
      WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_test_static_unused_arg
      WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_test_ref
      WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_test_pass_ctx
      WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_test_pass2
      WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_test_pass1
      WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_test_mem_len_pass1
      WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_test_mem_len_fail2
      WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_test_mem_len_fail1
      WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_test_kptr_get
      WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_test_fail3
      WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_test_fail2
      WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_test_acquire
      WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_test2
      WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_test1
      WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_memb_release
      WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_memb1_release
      WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_int_mem_release
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Clean up dynptr prog_tests · 50a7cedb
      Joanne Koong authored
      Clean up prog_tests/dynptr.c by removing the unneeded "expected_err_msg"
      field in the dynptr_tests struct, which is a remnant from converting the
      fail test cases to use the generic verification tester.
      Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
      Link: https://lore.kernel.org/r/20230214051332.4007131-2-joannelkoong@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Clean up user_ringbuf, cgrp_kfunc, kfunc_dynptr_param tests · 8032cad1
      Joanne Koong authored
      Clean up user_ringbuf, cgrp_kfunc, and kfunc_dynptr_param tests to use
      the generic verification tester for checking verifier rejections.
      The generic verification tester uses btf_decl_tag-based annotations
      for verifying that the tests fail with the expected log messages.
      Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
      Acked-by: David Vernet <void@manifault.com>
      Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
      Link: https://lore.kernel.org/r/20230214051332.4007131-1-joannelkoong@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Merge branch 'BPF rbtree next-gen datastructure' · c8ea0997
      Alexei Starovoitov authored
      Dave Marchevsky says:
      
      ====================
      This series adds a rbtree datastructure following the "next-gen
      datastructure" precedent set by recently-added linked-list [0]. This is
      a reimplementation of previous rbtree RFC [1] to use kfunc + kptr
      instead of adding a new map type. This series adds a smaller set of API
      functions than that RFC - just the minimum needed to support current
      cgfifo example scheduler in ongoing sched_ext effort [2], namely:
      
        bpf_rbtree_add
        bpf_rbtree_remove
        bpf_rbtree_first
      
      The meat of this series is bugfixes and verifier infra work to support
      these API functions. Adding more rbtree kfuncs in future patches should
      be straightforward as a result.
      
      First, the series refactors and extends linked_list's release_on_unlock
      logic. The concept of "reference to node that was added to data
      structure" is formalized as "non-owning reference". From linked_list's
      perspective, this non-owning reference after
      bpf_list_push_{front,back} has the same semantics as release_on_unlock,
      with the addition that writes to such references are valid in the
      critical section. Such references are no longer marked PTR_UNTRUSTED.
      Patches 2 and 13 go into more detail.
      
      The series then adds rbtree API kfuncs and necessary verifier support
      for them - namely support for callback args to kfuncs and some
      non-owning reference interactions that linked_list didn't need.
      
      BPF rbtree uses struct rb_root_cached + existing rbtree lib under the
      hood. From the BPF program writer's perspective, a BPF rbtree is very
      similar to existing linked list. Consider the following example:
      
        struct node_data {
          long key;
          long data;
          struct bpf_rb_node node;
        };
      
        static bool less(struct bpf_rb_node *a, const struct bpf_rb_node *b)
        {
          struct node_data *node_a;
          struct node_data *node_b;
      
          node_a = container_of(a, struct node_data, node);
          node_b = container_of(b, struct node_data, node);
      
          return node_a->key < node_b->key;
        }
      
        private(A) struct bpf_spin_lock glock;
        private(A) struct bpf_rb_root groot __contains(node_data, node);
      
        /* ... in BPF program */
        struct node_data *n, *m;
        struct bpf_rb_node *res;
      
        n = bpf_obj_new(typeof(*n));
        if (!n)
          return 0; /* allocation failed, skip the rest */
        n->key = 5;
        n->data = 10;
      
        bpf_spin_lock(&glock);
        bpf_rbtree_add(&groot, &n->node, less);
        bpf_spin_unlock(&glock);
      
        bpf_spin_lock(&glock);
        res = bpf_rbtree_first(&groot);
        if (!res) {
          bpf_spin_unlock(&glock); /* tree is empty, skip */
          return 0;
        }
        res = bpf_rbtree_remove(&groot, res);
        if (!res) {
          bpf_spin_unlock(&glock);
          return 0;
        }
        bpf_spin_unlock(&glock);
      
        m = container_of(res, struct node_data, node);
        bpf_obj_drop(m);
      
      Some obvious similarities:
      
        * Special bpf_rb_root and bpf_rb_node types have the same semantics
          as bpf_list_head and bpf_list_node, respectively
        * __contains is used to associate node type with root
        * The spin_lock associated with a rbtree must be held when using
          rbtree API kfuncs
        * Nodes are allocated via bpf_obj_new and dropped via bpf_obj_drop
        * Rbtree takes ownership of node lifetime when a node is added.
          Removing a node gives ownership back to the program, requiring a
          bpf_obj_drop before program exit
      
      Some new additions as well:
      
        * Support for callbacks in kfunc args is added to enable 'less'
          callback use above
        * bpf_rbtree_first is the first graph API function to return a
          non-owning reference instead of converting an arg from own->non-own
        * Because all references to nodes already added to the rbtree are
          non-owning, bpf_rbtree_remove must accept such a reference in order
          to remove it from the tree
      
      Summary of patches:
        Patches 1 - 5 implement the meat of rbtree-specific support in this
        series, gradually building up to implemented kfuncs that verify as
        expected.
      
        Patch 6 adds the bpf_rbtree_{add,first,remove} to bpf_experimental.h.
      
        Patch 7 adds tests, Patch 8 adds documentation.
      
        [0]: lore.kernel.org/bpf/20221118015614.2013203-1-memxor@gmail.com
        [1]: lore.kernel.org/bpf/20220830172759.4069786-1-davemarchevsky@fb.com
        [2]: lore.kernel.org/bpf/20221130082313.3241517-1-tj@kernel.org
      
      Changelog:
      
      v5 -> v6: lore.kernel.org/bpf/20230212092715.1422619-1-davemarchevsky@fb.com/
      
      Patch #'s below refer to the patch's number in v5 unless otherwise stated.
      
      * General / Patch 1
        * Rebase onto latest bpf-next: "bpf: Migrate release_on_unlock logic to non-owning ref semantics"
        * This was Patch 1 of v4, was applied, not included in v6
      
      * Patch 3 - "bpf: Add bpf_rbtree_{add,remove,first} kfuncs"
        * Use bpf_callback_t instead of plain-C fn ptr for bpf_rbtree_add. This
          necessitated having bpf_rbtree_add duplicate rbtree_add's functionality.
          Wrapper function was used w/ internal __bpf_rbtree_add helper so that
          bpf_experimental.h proto could continue to use plain-C fn ptr so BPF progs
          could benefit from typechecking (Alexei)
      
      v4 -> v5: lore.kernel.org/bpf/20230209174144.3280955-1-davemarchevsky@fb.com/
      
      Patch #'s below refer to the patch's number in v4 unless otherwise stated.
      
      * General
        * Rebase onto latest bpf-next: "Merge branch 'bpf, mm: introduce cgroup.memory=nobpf'"
      
      * Patches 1-3 are squashed into "bpf: Migrate release_on_unlock logic to non-owning ref semantics".
        * Added type_is_non_owning_ref helper (Alexei)
        * Use a NON_OWN_REF type flag instead of separate bool (Alexei)
      
      * Patch 8 - "bpf: Special verifier handling for bpf_rbtree_{remove, first}"
        * When doing btf_parse_fields, reject structs with both bpf_list_node and
          bpf_rb_node fields. This is a temporary measure that can be removed after
          "collection identity" followup. See comment added in btf_parse_fields for
          more detail (Kumar, Alexei)
        * Add linked_list BTF test exercising check added to btf_parse_fields
        * Minor changes and moving around of some reg type checks due to NON_OWN_REF type flag
          introduction
      
      * Patch 10 - "selftests/bpf: Add rbtree selftests"
        * Migrate failure tests to RUN_TESTS, __failure, __msg() framework (Alexei)
      
      v3 -> v4: lore.kernel.org/bpf/20230131180016.3368305-1-davemarchevsky@fb.com/
      
      Patch #'s below refer to the patch's number in v3 unless otherwise stated.
      
      * General
        * Don't base this series on "bpf: Refactor release_regno searching logic",
          which was submitted separately as a refactor.
        * Rebase onto latest bpf-next: "samples/bpf: Add openat2() enter/exit tracepoint to syscall_tp sample"
      
      * Patch 2 - "bpf: Improve bpf_reg_state space usage for non-owning ref lock"
        * print_verifier_state change was adding redundant comma after "non_own_ref",
          fix it to put comma in correct place
        * invalidate_non_owning_refs no longer needs to take bpf_active_lock param,
          since any non-owning ref reg in env's cur_state is assumed to use that
          state's active_lock (Alexei)
        * invalidate_non_owning_refs' reg loop should check that the reg being
          inspected is a PTR_TO_BTF_ID before checking reg->non_owning_ref_lock,
          since that field is part of a union and may be filled w/ meaningless bytes
          if reg != PTR_TO_BTF_ID (Alexei)
      
      * Patch 3 - "selftests/bpf: Update linked_list tests for non-owning ref semantics"
        * Change the string searched for by the following tests:
          * linked_list/incorrect_node_off1
          * linked_list/double_push_front
          * linked_list/double_push_back
      
          necessary due to rebase / dropping of "release_regno searching logic" patch
          (see "General" changes)
      
      * Patch 8 - "bpf: Special verifier handling for bpf_rbtree_{remove, first}"
        * Just call invalidate_non_owning_refs w/ env instead of env, lock. (see
          Patch 2 changes)
      
      * Patch 11 - "bpf, documentation: Add graph documentation for non-owning refs"
        * Fix documentation formatting and improve content (David)
        * v3's version of patch 11 was missing some changes, v4's patch 11 is still
          addressing David's feedback from v2
      
      v2 -> v3: lore.kernel.org/bpf/20221217082506.1570898-1-davemarchevsky@fb.com/
      
      Patch #'s below refer to the patch's number in v2 unless otherwise stated.
      
      * Patch 1 - "bpf: Support multiple arg regs w/ ref_obj_id for kfuncs"
        * No longer needed as v3 doesn't have multiple ref_obj_id arg regs
        * The refactoring pieces were submitted separately
          (https://lore.kernel.org/bpf/20230121002417.1684602-1-davemarchevsky@fb.com/)
      
      * Patch 2 - "bpf: Migrate release_on_unlock logic to non-owning ref semantics"
        * Remove KF_RELEASE_NON_OWN flag from list API push methods, just match
          against specific kfuncs for now (Alexei, David)
        * Separate "release non owning reference" logic from KF_RELEASE logic
          (Alexei, David)
        * reg_find_field_offset now correctly tests 'rec' instead of 'reg' after
          calling reg_btf_record (Dan Carpenter)
      
      * New patch added after Patch 2 - "bpf: Improve bpf_reg_state space usage for non-owning ref lock"
        * Eliminates extra bpf_reg_state memory usage by using a bool instead of
          copying lock identity
      
      * Patch 4 - "bpf: rename list_head -> graph_root in field info types"
        * v2's version was applied to bpf-next, not included in respins
      
      * Patch 6 - "bpf: Add bpf_rbtree_{add,remove,first} kfuncs"
        * Remove KF_RELEASE_NON_OWN flag from rbtree_add, just add it to specific
          kfunc matching (Alexei, David)
      
      * Patch 9 - "bpf: Special verifier handling for bpf_rbtree_{remove, first}"
        * Remove KF_INVALIDATE_NON_OWN kfunc flag, just match against specific kfunc
          for now (Alexei, David)
      
      * Patch 11 - "libbpf: Make BTF mandatory if program BTF has spin_lock or alloc_obj type"
        * Drop for now, will submit separately
      
      * Patch 12 - "selftests/bpf: Add rbtree selftests"
        * Some expected-failure tests have different error messages due to "release
          non-owning reference logic" being separated from KF_RELEASE logic in Patch
          2 changes
      
      * Patch 13 - "bpf, documentation: Add graph documentation for non-owning refs"
        * Fix documentation formatting and improve content (David)
      
      v1 -> v2: lore.kernel.org/bpf/20221206231000.3180914-1-davemarchevsky@fb.com/
      
      Series-wide changes:
        * Rename datastructure_{head,node,api} -> graph_{root,node,api} (Alexei)
        * Use "graph datastructure" in patch summaries to refer to linked_list + rbtree
          instead of "next-gen datastructure" (Alexei)
        * Move from hacky marking of non-owning references as PTR_UNTRUSTED to
          cleaner implementation (Alexei)
        * Add invalidation of non-owning refs to rbtree_remove (Kumar, Alexei)
      
      Patch #'s below refer to the patch's number in v1 unless otherwise stated.
      
      Note that in v1 most of the meaty verifier changes were in the latter half
      of the series. Here, about half of that complexity has been moved to
      "bpf: Migrate release_on_unlock logic to non-owning ref semantics" - was Patch
      3 in v1.
      
      * Patch 1 - "bpf: Loosen alloc obj test in verifier's reg_btf_record"
        * Was applied, dropped from further iterations
      
      * Patch 2 - "bpf: map_check_btf should fail if btf_parse_fields fails"
        * Dropped in favor of verifier check-on-use: when some normal verifier
          checking expects the map to have btf_fields correctly parsed, it won't
          find any and verification will fail
      
      * New patch added before Patch 3 - "bpf: Support multiple arg regs w/ ref_obj_id for kfuncs"
        * Addition of KF_RELEASE_NON_OWN flag, which requires KF_RELEASE, and tagging
          of bpf_list_push_{front,back} KF_RELEASE | KF_RELEASE_NON_OWN, means that
          list-in-list push_{front,back} will trigger "only one ref_obj_id arg reg"
          logic. This is because "head" arg to those functions can be a list-in-list,
          which itself can be an owning reference with ref_obj_id, so release
          kfuncs need to support multiple ref_obj_id args.
      
      * Patch 3 - "bpf: Minor refactor of ref_set_release_on_unlock"
        * Now a major refactor w/ a rename to reflect this
          * "bpf: Migrate release_on_unlock logic to non-owning ref semantics"
        * Replaces release_on_unlock with active_lock logic as discussed in v1
      
      * New patch added after Patch 3 - "selftests/bpf: Update linked_list tests for non_owning_ref logic"
        * Removes "write after push" linked_list failure tests - no longer failure
          scenarios.
      
      * Patch 4 - "bpf: rename list_head -> datastructure_head in field info types"
        * rename to graph_root instead. Similar renamings across the series - see
          series-wide changes.
      
      * Patch 5 - "bpf: Add basic bpf_rb_{root,node} support"
        * OWNER_FIELD_MASK -> GRAPH_ROOT_MASK, OWNEE_FIELD_MASK -> GRAPH_NODE_MASK,
          and change of "owner"/"ownee" in big btf_check_and_fixup_fields comment to
          "root"/"node" (Alexei)
      
      * Patch 6 - "bpf: Add bpf_rbtree_{add,remove,first} kfuncs"
        * bpf_rbtree_remove can no longer return NULL. v2 continues v1's "use type
          system to prevent remove of node that isn't in a datastructure" approach,
          so rbtree_remove should never have been able to return NULL
      
      * Patch 7 - "bpf: Add support for bpf_rb_root and bpf_rb_node in kfunc args"
        * is_bpf_datastructure_api_kfunc -> is_bpf_graph_api_kfunc (Alexei)
      
      * Patch 8 - "bpf: Add callback validation to kfunc verifier logic"
        * Explicitly disallow rbtree_remove in rbtree callback
        * Explicitly disallow bpf_spin_{lock,unlock} call in rbtree callback,
          preventing possibility of "unbalanced" unlock (Alexei)
      
      * Patch 10 - "bpf, x86: BPF_PROBE_MEM handling for insn->off < 0"
        * Now that non-owning refs aren't marked PTR_UNTRUSTED it's not necessary to
          include this patch as part of the series
        * After conversation w/ Alexei, did another pass and submitted as an
          independent series (lore.kernel.org/bpf/20221213182726.325137-1-davemarchevsky@fb.com/)
      
      * Patch 13 - "selftests/bpf: Add rbtree selftests"
        * Since bpf_rbtree_remove can no longer return null, remove null checks
        * Remove test confirming that rbtree_first isn't allowed in callback. We want
          this to be possible
        * Add failure test confirming that rbtree_remove's new non-owning reference
          invalidation behavior behaves as expected
        * Add SEC("license") to rbtree_btf_fail__* progs. They were previously
          failing due to lack of this section. Now they're failing for correct
          reasons.
        * rbtree_btf_fail__add_wrong_type.c - add locking around rbtree_add, rename
          the bpf prog to something reasonable
      
      * New patch added after patch 13 - "bpf, documentation: Add graph documentation for non-owning refs"
        * Summarizes details of owning and non-owning refs which we hashed out in
          v1
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf, documentation: Add graph documentation for non-owning refs · c31315c3
      Dave Marchevsky authored
      It is difficult to intuit the semantics of owning and non-owning
      references from verifier code. In order to keep the high-level details
      from being lost in the mailing list, this patch adds documentation
      explaining semantics and details.
      
      The target audience of doc added in this patch is folks working on BPF
      internals, as there's focus on "what should the verifier do here". Via
      reorganization or copy-and-paste, much of the content can probably be
      repurposed for BPF program writer audience as well.
      Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
      Link: https://lore.kernel.org/r/20230214004017.2534011-9-davemarchevsky@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Add rbtree selftests · 215249f6
      Dave Marchevsky authored
      This patch adds selftests exercising the logic changed/added in the
      previous patches in the series. A variety of successful and unsuccessful
      rbtree usages are validated:
      
      Success:
        * Add some nodes, let map_value bpf_rb_root destructor clean them
          up
        * Add some nodes, remove one using the non-owning ref left over by a
          successful rbtree_add() call
        * Add some nodes, remove one using the non-owning ref returned by
          rbtree_first() call
      
      Failure:
        * BTF where bpf_rb_root owns bpf_list_node should fail to load
        * BTF where node of type X is added to tree containing nodes of type Y
          should fail to load
        * No calling rbtree api functions in 'less' callback for rbtree_add
        * No releasing lock in 'less' callback for rbtree_add
        * No removing a node which hasn't been added to any tree
        * No adding a node which has already been added to a tree
        * No escaping of non-owning references past their lock's
          critical section
        * No escaping of non-owning references past other invalidation points
          (rbtree_remove)
      
      These tests mostly focus on rbtree-specific additions, but some of the
      failure cases revalidate scenarios common to both linked_list and rbtree
      which are covered in the former's tests. Better to be a bit redundant in
      case linked_list and rbtree semantics deviate over time.
      Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
      Link: https://lore.kernel.org/r/20230214004017.2534011-8-davemarchevsky@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • c834df84
      Dave Marchevsky authored
    • bpf: Special verifier handling for bpf_rbtree_{remove, first} · a40d3632
      Dave Marchevsky authored
      Newly-added bpf_rbtree_{remove,first} kfuncs have some special properties
      that require handling in the verifier:
      
        * both bpf_rbtree_remove and bpf_rbtree_first return the type containing
          the bpf_rb_node field, with the offset set to that field's offset,
          instead of a struct bpf_rb_node *
          * mark_reg_graph_node helper added in previous patch generalizes
            this logic, use it
      
        * bpf_rbtree_remove's node input is a node that's been inserted
          in the tree - a non-owning reference.
      
        * bpf_rbtree_remove must invalidate non-owning references in order to
          avoid aliasing issue. Use previously-added
          invalidate_non_owning_refs helper to mark this function as a
          non-owning ref invalidation point.
      
        * Unlike other functions, which convert one of their input arg regs to
          non-owning reference, bpf_rbtree_first takes no arguments and just
          returns a non-owning reference (possibly null)
          * For now verifier logic for this is special-cased instead of
            adding new kfunc flag.
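      To make the invalidation rule concrete, here is a toy userspace model in
      C. It is not verifier code: the enum, the fixed register array and the
      toy_* functions are hypothetical stand-ins. bpf_rbtree_first hands out a
      non-owning ref, and bpf_rbtree_remove returns an owning ref while
      clobbering every non-owning ref, since any of them might alias the node
      that was just removed.

```c
#include <stdbool.h>

/* Toy model only: each "register" holding a node pointer is in one of
 * these states. Zero-initialized registers start out invalid. */
enum ref_kind { REF_INVALID, REF_OWNING, REF_NON_OWNING };

#define NREGS 4

struct toy_state {
	enum ref_kind regs[NREGS];
	bool lock_held;
};

/* bpf_rbtree_first analogue: takes no args, returns a non-owning ref.
 * Only allowed while the tree's lock is held. */
static bool toy_rbtree_first(struct toy_state *s, int dst)
{
	if (!s->lock_held)
		return false;
	s->regs[dst] = REF_NON_OWNING;
	return true;
}

/* bpf_rbtree_remove analogue: the node arg must be a non-owning ref.
 * Acts as an invalidation point: every non-owning ref is clobbered
 * because it may alias the removed node. The result is owning. */
static bool toy_rbtree_remove(struct toy_state *s, int arg, int dst)
{
	if (!s->lock_held || s->regs[arg] != REF_NON_OWNING)
		return false;
	for (int i = 0; i < NREGS; i++)
		if (s->regs[i] == REF_NON_OWNING)
			s->regs[i] = REF_INVALID;
	s->regs[dst] = REF_OWNING;
	return true;
}
```

      Copying a non-owning ref into a second register does not survive the
      remove: the copy is invalidated too, which is exactly the aliasing case
      the invalidation point is meant to close.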
      
      This patch, along with the previous one, completes special verifier
      handling for all rbtree API functions added in this series.
      
      With functional verifier handling of rbtree_remove, under current
      non-owning reference scheme, a node type with both bpf_{list,rb}_node
      fields could cause the verifier to accept programs which remove such
      nodes from collections they haven't been added to.
      
      In order to prevent this, this patch adds a check to btf_parse_fields
      which rejects structs with both bpf_{list,rb}_node fields. This is a
      temporary measure that can be removed after "collection identity"
      followup. See comment added in btf_parse_fields. A linked_list BTF test
      exercising the new check is added in this patch as well.
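      A minimal sketch of the temporary rule, assuming the parsed BTF fields
      of a struct can be reduced to a flat list of field kinds. The enum and
      struct_fields_ok() are hypothetical and are not the btf_parse_fields
      implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical field kinds standing in for parsed BTF special fields. */
enum field_kind {
	FIELD_OTHER,
	FIELD_LIST_HEAD,
	FIELD_LIST_NODE,
	FIELD_RB_ROOT,
	FIELD_RB_NODE,
};

/* Reject (return false) any struct carrying both a list node and an rb
 * node; otherwise the verifier could accept removing such a node from a
 * collection it was never added to. */
static bool struct_fields_ok(const enum field_kind *fields, size_t n)
{
	bool has_list_node = false, has_rb_node = false;

	for (size_t i = 0; i < n; i++) {
		if (fields[i] == FIELD_LIST_NODE)
			has_list_node = true;
		else if (fields[i] == FIELD_RB_NODE)
			has_rb_node = true;
	}
	return !(has_list_node && has_rb_node);
}
```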
      Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
      Link: https://lore.kernel.org/r/20230214004017.2534011-6-davemarchevsky@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Add callback validation to kfunc verifier logic · 5d92ddc3
      Dave Marchevsky authored
      Some BPF helpers take a callback function which the helper calls. For
      each helper that takes such a callback, there's a special call to
      __check_func_call with a callback-state-setting callback that sets up
      verifier bpf_func_state for the callback's frame.
      
      kfuncs don't have any of this infrastructure yet, so let's add it in
      this patch, following existing helper pattern as much as possible. To
      validate functionality of this added plumbing, this patch adds
      callback handling for the bpf_rbtree_add kfunc and hopes to lay
      groundwork for future graph datastructure callbacks.
      
      In the "general plumbing" category we have:
      
        * check_kfunc_call doing callback verification right before clearing
          CALLER_SAVED_REGS, exactly like check_helper_call
        * recognition of func_ptr BTF types in kfunc args as
          KF_ARG_PTR_TO_CALLBACK + propagation of subprogno for this arg type
      
      In the "rbtree_add / graph datastructure-specific plumbing" category:
      
        * Since bpf_rbtree_add must be called while the spin_lock associated
          with the tree is held, don't complain when callback's func_state
          doesn't unlock it by frame exit
        * Mark rbtree_add callback's args with ref_set_non_owning
          to prevent rbtree api functions from being called in the callback.
          Semantically this makes sense, as less() takes no ownership of its
          args when determining which comes first.
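      The 'less' callback drives insertion through a simple descent. The
      sketch below is userspace C under stated assumptions: an unbalanced
      binary search tree stands in for the kernel's rbtree lib (no recoloring
      or rotation), and all struct and function names are hypothetical. It
      shows how less(new, parent) picks a direction, and why tracking whether
      the descent ever went right is enough to keep a cached leftmost node,
      which makes a first() lookup O(1) for an rb_root_cached-style root.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel's rb_node / rb_root_cached. */
struct sk_node {
	struct sk_node *left, *right, *parent;
};

struct sk_root_cached {
	struct sk_node *root;
	struct sk_node *leftmost; /* cached smallest node */
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct node_data {
	long key;
	long data;
	struct sk_node node;
};

typedef bool (*less_fn)(struct sk_node *a, const struct sk_node *b);

/* Same shape as the 'less' callback in the cover letter example:
 * it only reads both nodes and takes ownership of neither. */
static bool less(struct sk_node *a, const struct sk_node *b)
{
	const struct node_data *node_a = container_of(a, struct node_data, node);
	const struct node_data *node_b = container_of(b, struct node_data, node);

	return node_a->key < node_b->key;
}

/* Go left when less(new, parent) is true, else right. The new node is the
 * leftmost only if the descent never went right. Unbalanced: the real
 * rbtree lib would also rebalance after linking the node. */
static void sk_add(struct sk_root_cached *t, struct sk_node *n, less_fn cmp)
{
	struct sk_node **link = &t->root;
	struct sk_node *parent = NULL;
	bool leftmost = true;

	while (*link) {
		parent = *link;
		if (cmp(n, parent)) {
			link = &parent->left;
		} else {
			link = &parent->right;
			leftmost = false;
		}
	}
	n->left = n->right = NULL;
	n->parent = parent;
	*link = n;
	if (leftmost)
		t->leftmost = n;
}

/* O(1) thanks to the cached pointer. */
static struct sk_node *sk_first(const struct sk_root_cached *t)
{
	return t->leftmost;
}
```

      Inserting keys 5, 2 and 9 in that order leaves 5 at the root, 2 as its
      left child and the cached leftmost node, and 9 as its right child.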
      Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
      Link: https://lore.kernel.org/r/20230214004017.2534011-5-davemarchevsky@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Add support for bpf_rb_root and bpf_rb_node in kfunc args · cd6791b4
      Dave Marchevsky authored
      Now that we find bpf_rb_root and bpf_rb_node in structs, let's give args
      that contain those types special classification and properly handle
      these types when checking kfunc args.
      
      "Properly handling" these types largely requires generalizing similar
      handling for bpf_list_{head,node}, with little new logic added in this
      patch.
      Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
      Link: https://lore.kernel.org/r/20230214004017.2534011-4-davemarchevsky@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Add bpf_rbtree_{add,remove,first} kfuncs · bd1279ae
      Dave Marchevsky authored
      This patch adds implementations of bpf_rbtree_{add,remove,first}
      and teaches verifier about their BTF_IDs as well as those of
      bpf_rb_{root,node}.
      
      All three kfuncs have some nonstandard component to their verification
      that needs to be addressed in future patches before programs can
      properly use them:
      
        * bpf_rbtree_add:     Takes 'less' callback, need to verify it
      
        * bpf_rbtree_first:   Returns ptr_to_node_type(off=rb_node_off) instead
                              of ptr_to_rb_node(off=0). Return value ref is
                              non-owning.

        * bpf_rbtree_remove:  Returns ptr_to_node_type(off=rb_node_off) instead
                              of ptr_to_rb_node(off=0). 2nd arg (node) is a
                              non-owning reference.
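      The ptr_to_node_type(off=rb_node_off) convention can be illustrated with
      offsetof in userspace C. In this sketch rb_node_stub and typed_ptr are
      hypothetical stand-ins: the runtime value is still the address of the
      embedded node, but describing it as node_data at
      off=offsetof(node_data, node) is what lets a program step back to the
      containing struct.

```c
#include <stddef.h>

/* Hypothetical stand-in: the real bpf_rb_node is opaque to programs. */
struct rb_node_stub {
	void *opaque[3];
};

struct node_data {
	long key;
	long data;
	struct rb_node_stub node;
};

/* Hypothetical verifier-side description of a pointer: a type name plus
 * an offset into that type. */
struct typed_ptr {
	const char *type_name;
	size_t off;
};

/* How the first/remove return value is described: the containing type at
 * the offset of its rb node field, not a bare rb node at off=0. */
static struct typed_ptr rbtree_ret_view(void)
{
	struct typed_ptr v = { "node_data", offsetof(struct node_data, node) };
	return v;
}

/* Knowing the offset is what makes stepping back to the container valid. */
static struct node_data *to_container(struct rb_node_stub *rb, struct typed_ptr v)
{
	return (struct node_data *)((char *)rb - v.off);
}
```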
      Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
      Link: https://lore.kernel.org/r/20230214004017.2534011-3-davemarchevsky@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Add basic bpf_rb_{root,node} support · 9c395c1b
      Dave Marchevsky authored
      This patch adds special BPF_RB_{ROOT,NODE} btf_field_types similar to
      BPF_LIST_{HEAD,NODE}, adds the necessary plumbing to detect the new
      types, and adds bpf_rb_root_free function for freeing bpf_rb_root in
      map_values.
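      Freeing a bpf_rb_root in a map_value means dropping every node still in
      the tree. A minimal sketch, assuming the destructor first detaches the
      whole tree and then walks it post-order so each node's children are
      handled before the node itself. The names rbn, rbroot, root_free() and
      count_drop() are hypothetical; a real drop would free a BPF-allocated
      object rather than count it.

```c
#include <stddef.h>

/* Hypothetical minimal tree types, not the kernel's rb_root_cached. */
struct rbn {
	struct rbn *left, *right;
};

struct rbroot {
	struct rbn *node;
};

typedef void (*drop_fn)(struct rbn *n, void *ctx);

/* Post-order: both subtrees are dropped before their parent, so no node
 * is read after it has been handed to drop(). */
static void drop_subtree(struct rbn *n, drop_fn drop, void *ctx)
{
	if (!n)
		return;
	drop_subtree(n->left, drop, ctx);
	drop_subtree(n->right, drop, ctx);
	drop(n, ctx);
}

/* Detach the whole tree, leaving the root empty, then drop every node.
 * Assumption: in the kernel the detach step would run with the tree's
 * lock held, and the walk after it would not need the lock. */
static void root_free(struct rbroot *root, drop_fn drop, void *ctx)
{
	struct rbn *orig = root->node;

	root->node = NULL;
	drop_subtree(orig, drop, ctx);
}

/* Example drop callback that only counts nodes. */
static void count_drop(struct rbn *n, void *ctx)
{
	(void)n;
	++*(int *)ctx;
}
```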
      
      The bpf_rb_root and bpf_rb_node structs are opaque types meant to
      obscure struct rb_root_cached and struct rb_node, respectively.
      
      btf_struct_access will prevent BPF programs from touching these special
      fields automatically now that they're recognized.
      
      btf_check_and_fixup_fields now groups list_head and rb_root together as
      "graph root" fields and {list,rb}_node as "graph node", and does the same
      ownership cycle checking as before. Note that this function does _not_
      prevent ownership type mixups (e.g. rb_root owning list_node) - that's
      handled by btf_parse_graph_root.
      
      After this patch, a bpf program can have a struct bpf_rb_root in a
      map_value, but cannot add anything to it or do anything useful with it.
      Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
      Link: https://lore.kernel.org/r/20230214004017.2534011-2-davemarchevsky@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  3. 13 Feb, 2023 10 commits
    • bpf: Migrate release_on_unlock logic to non-owning ref semantics · 6a3cd331
      Dave Marchevsky authored
      This patch introduces non-owning reference semantics to the verifier,
      specifically linked_list API kfunc handling. release_on_unlock logic for
      refs is refactored - with small functional changes - to implement these
      semantics, and bpf_list_push_{front,back} are migrated to use them.
      
      When a list node is pushed to a list, the program still has a pointer to
      the node:
      
        n = bpf_obj_new(typeof(*n));
      
        bpf_spin_lock(&l);
        bpf_list_push_back(&l, n);
        /* n still points to the just-added node */
        bpf_spin_unlock(&l);
      
      What the verifier considers n to be after the push, and thus what can be
      done with n, are changed by this patch.
      
      Common properties both before/after this patch:
        * After push, n is only a valid reference to the node until end of
          critical section
        * After push, n cannot be pushed to any list
        * After push, the program can read the node's fields using n
      
      Before:
        * After push, n retains the ref_obj_id which it received on
          bpf_obj_new, but the associated bpf_reference_state's
          release_on_unlock field is set to true
          * release_on_unlock field and associated logic is used to implement
            "n is only a valid ref until end of critical section"
        * After push, n cannot be written to, the node must be removed from
          the list before writing to its fields
        * After push, n is marked PTR_UNTRUSTED
      
      After:
        * After push, n's ref is released and ref_obj_id set to 0. NON_OWN_REF
          type flag is added to reg's type, indicating that it's a non-owning
          reference.
          * NON_OWN_REF flag and logic is used to implement "n is only a
            valid ref until end of critical section"
        * n can be written to (except for special fields e.g. bpf_list_node,
          timer, ...)
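
      The before/after rules above can be modelled as a tiny state machine.
      The sketch below is a userspace C toy, not verifier code: toy_ref_state,
      toy_push() and the other names are hypothetical, and the model only
      tracks the three states a pointer like n passes through under the new
      semantics (owning, non-owning, invalid).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states of a node pointer such as `n` (not verifier types). */
enum toy_ref_state {
	TOY_OWNING,     /* fresh from bpf_obj_new(): we hold the reference */
	TOY_NON_OWNING, /* after push: models NON_OWN_REF, valid until unlock */
	TOY_INVALID,    /* after bpf_spin_unlock(): must not be used */
};

struct toy_node_ptr {
	enum toy_ref_state state;
};

/* Pushing consumes the owning reference; n stays usable as non-owning. */
static bool toy_push(struct toy_node_ptr *n, bool lock_held)
{
	if (!lock_held || n->state != TOY_OWNING)
		return false; /* the verifier would reject the program */
	n->state = TOY_NON_OWNING;
	return true;
}

/* Unlocking ends the critical section and clobbers non-owning refs. */
static void toy_unlock(struct toy_node_ptr *n)
{
	if (n->state == TOY_NON_OWNING)
		n->state = TOY_INVALID;
}

static bool toy_can_read(const struct toy_node_ptr *n)
{
	return n->state != TOY_INVALID;
}

/* After this patch, non-owning refs may write ordinary fields. In this
 * model, special fields (bpf_list_node, timer, ...) are never writable. */
static bool toy_can_write(const struct toy_node_ptr *n, bool special_field)
{
	if (n->state == TOY_INVALID)
		return false;
	return !special_field;
}
```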
      
      Summary of specific implementation changes to achieve the above:
      
        * release_on_unlock field, ref_set_release_on_unlock helper, and logic
          to "release on unlock" based on that field are removed
      
        * The anonymous active_lock struct used by bpf_verifier_state is
          pulled out into a named struct bpf_active_lock.
      
        * NON_OWN_REF type flag is introduced along with verifier logic
          changes to handle non-owning refs
      
        * Helpers are added to use NON_OWN_REF flag to implement non-owning
          ref semantics as described above
          * invalidate_non_owning_refs - helper to clobber all non-owning refs
            matching a particular bpf_active_lock identity. Replaces
            release_on_unlock logic in process_spin_lock.
          * ref_set_non_owning - set NON_OWN_REF type flag after doing some
            sanity checking
          * ref_convert_owning_non_owning - convert the owning reference with
            the specified ref_obj_id to non-owning references. Set NON_OWN_REF
            flag for each reg with that ref_obj_id and zero out its ref_obj_id
      
        * Update linked_list selftests to account for minor semantic
          differences introduced by this patch
          * Writes to a release_on_unlock node ref are not allowed, while
            writes to non-owning reference pointees are. As a result the
            linked_list "write after push" failure tests are no longer scenarios
            that should fail.
          * The test##missing_lock##op and test##incorrect_lock##op
            macro-generated failure tests need to have a valid node argument in
            order to have the same error output as before. Otherwise
            verification will fail early and the expected error output won't be seen.
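
      As a rough illustration of how the new helpers interact, here is a
      userspace C toy over a flat array of registers. struct toy_reg and the
      toy_* functions are hypothetical stand-ins for bpf_reg_state,
      ref_set_non_owning(), ref_convert_owning_non_owning() and
      invalidate_non_owning_refs(); tagging each non-owning register with a
      lock id is an assumption made to model "matching a particular
      bpf_active_lock identity".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified register state (not bpf_reg_state). */
struct toy_reg {
	bool valid;          /* false once clobbered */
	bool non_own_ref;    /* models the NON_OWN_REF type flag */
	uint32_t ref_obj_id; /* non-zero only for owning references */
	uint32_t lock_id;    /* assumed: lock protecting a non-owning ref */
};

/* Like ref_set_non_owning(): sanity-check, then set the flag. */
static bool toy_ref_set_non_owning(struct toy_reg *reg, uint32_t active_lock_id)
{
	if (active_lock_id == 0 || reg->non_own_ref)
		return false; /* no lock held, or already non-owning */
	reg->non_own_ref = true;
	reg->lock_id = active_lock_id;
	return true;
}

/* Like ref_convert_owning_non_owning(): every register holding the given
 * owning ref becomes non-owning and has its ref_obj_id zeroed. */
static size_t toy_ref_convert_owning_non_owning(struct toy_reg *regs, size_t nregs,
						uint32_t ref_obj_id,
						uint32_t active_lock_id)
{
	size_t converted = 0;

	if (ref_obj_id == 0)
		return 0; /* 0 means "not a reference" */
	for (size_t i = 0; i < nregs; i++) {
		if (!regs[i].valid || regs[i].ref_obj_id != ref_obj_id)
			continue;
		if (toy_ref_set_non_owning(&regs[i], active_lock_id)) {
			regs[i].ref_obj_id = 0;
			converted++;
		}
	}
	return converted;
}

/* Like invalidate_non_owning_refs(): on unlock, clobber every non-owning
 * ref tied to the lock being released. */
static void toy_invalidate_non_owning_refs(struct toy_reg *regs, size_t nregs,
					   uint32_t lock_id)
{
	for (size_t i = 0; i < nregs; i++) {
		if (regs[i].valid && regs[i].non_own_ref &&
		    regs[i].lock_id == lock_id)
			regs[i].valid = false;
	}
}
```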
      Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
      Link: https://lore.kernel.org/r/20230212092715.1422619-2-davemarchevsky@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Daniel Borkmann
      Merge branch 'xdp-ice-mbuf' · 39c536ac
      Daniel Borkmann authored
      Alexander Lobakin says:
      
      ====================
      The set grew out of the poor performance of %BPF_F_TEST_XDP_LIVE_FRAMES
      when an ice-backed device is the sender. Initially there were around
      3.3 Mpps / thread, while I get 5.5 with skb-based pktgen ...
      
      After first fixing 0005 (0004 is a prereq for it; strange that nobody
      noticed it earlier), I started catching random OOMs. This is how 0002
      (and partially 0001) appeared.
      
      0003 is a suggestion from Maciej to not waste time refactoring dead
      lines. 0006 is a "cherry on top" to reach the final 6.7 Mpps.
      4.5 of the 6 patches are fixes, but only the first three are tagged,
      since backporting gets tricky after that. I may backport the rest
      manually later on.
      
      TL;DR for the series is that shortcuts are good, but only as long as
      they don't make the driver miss important things. %XDP_TX is purely
      driver-local, however .ndo_xdp_xmit() is not, and sometimes assumptions
      can be unsafe there.
      
      With that series and also one core code patch[0], "live frames" and
      xdp-trafficgen are now safe'n'fast on ice (probably more to come).
      
        [0] https://lore.kernel.org/all/20230209172827.874728-1-alexandr.lobakin@intel.com
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • Alexander Lobakin
      ice: Micro-optimize .ndo_xdp_xmit() path · ad07f29b
      Alexander Lobakin authored
      After the recent mbuf changes, ice_xmit_xdp_ring() became a 3-liner.
      It makes no sense to keep it global in a different file than its caller.
      Move it right next to the sole call site and mark it static. Also, it
      doesn't need a full xdp_convert_frame_to_buff(). Save several cycles
      and fill only the fields used by __ice_xmit_xdp_ring() later on.
      Finally, since it doesn't modify @xdpf in any way, mark the argument
      const to save some more (a whole 11 bytes of .text! :D).
      
      Thanks to one jump less and fewer calculations, this yields as much as
      6.7 Mpps per queue. `xdp.data_hard_start = xdpf` is again fully
      intentional (see xdp_convert_buff_to_frame()) and just works as long as
      the source device's driver has no issues.
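
      Why `xdp.data_hard_start = xdpf` works can be shown with a small
      userspace C model: when a buffer is converted to a frame, the frame
      metadata is stored at the very start of the buffer, so recomputing
      data_hard_start from data, headroom and sizeof(*xdpf) lands back on
      xdpf itself. The toy_* structs are hypothetical, trimmed stand-ins for
      struct xdp_buff and struct xdp_frame, and the fields the shortcut fills
      are an assumption, not the exact set __ice_xmit_xdp_ring() uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-ins for struct xdp_frame / xdp_buff. */
struct toy_xdp_frame {
	void *data;
	uint16_t len;
	uint16_t headroom; /* bytes between the end of this struct and data */
};

struct toy_xdp_buff {
	void *data_hard_start;
	void *data;
	void *data_end;
};

/* Like xdp_convert_buff_to_frame(): the frame metadata is written into
 * the head of the buffer, i.e. at data_hard_start. */
static struct toy_xdp_frame *toy_buff_to_frame(struct toy_xdp_buff *xdp)
{
	struct toy_xdp_frame *xdpf = xdp->data_hard_start;
	size_t headroom = (size_t)((char *)xdp->data - (char *)xdp->data_hard_start);

	assert(headroom >= sizeof(*xdpf)); /* need room for the metadata */
	xdpf->data = xdp->data;
	xdpf->len = (uint16_t)((char *)xdp->data_end - (char *)xdp->data);
	xdpf->headroom = (uint16_t)(headroom - sizeof(*xdpf));
	return xdpf;
}

/* Full reverse conversion: recompute data_hard_start from the fields. */
static void toy_frame_to_buff_full(const struct toy_xdp_frame *xdpf,
				   struct toy_xdp_buff *xdp)
{
	xdp->data_hard_start = (char *)xdpf->data - xdpf->headroom - sizeof(*xdpf);
	xdp->data = xdpf->data;
	xdp->data_end = (char *)xdpf->data + xdpf->len;
}

/* Shortcut: the frame lives at data_hard_start, so reuse its address. */
static void toy_frame_to_buff_short(const struct toy_xdp_frame *xdpf,
				    struct toy_xdp_buff *xdp)
{
	xdp->data_hard_start = (void *)xdpf;
	xdp->data = xdpf->data;
	xdp->data_end = (char *)xdpf->data + xdpf->len;
}
```

      Both conversions yield the same data_hard_start; the shortcut simply
      skips the arithmetic.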
      Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Link: https://lore.kernel.org/bpf/20230210170618.1973430-7-alexandr.lobakin@intel.com
    • Alexander Lobakin
      ice: Fix freeing XDP frames backed by Page Pool · 055d0920
      Alexander Lobakin authored
      As already mentioned, freeing any &xdp_frame via page_frag_free() is
      wrong, as it assumes the frame is backed by either an order-0 page or
      a page with no "patrons" behind it, while in fact frames backed by
      Page Pool can be redirected to a device whose driver doesn't use it.
      Keep storing a pointer to the raw buffer and then freeing it
      unconditionally via page_frag_free() for %XDP_TX frames, but introduce
      a separate type in the enum for frames coming through .ndo_xdp_xmit(),
      and free them via xdp_return_frame_bulk(). Note that saving xdpf as
      xdp_buff->data_hard_start is intentional and is always true when
      everything is configured properly.

      After this change, %XDP_REDIRECT from a Page Pool based driver to ice
      becomes zero-alloc, as it should be, and the horrendous 3.3 Mpps / queue
      turns into 6.6, hehe.
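
      A minimal sketch of the per-type freeing decision, assuming a
      simplified buffer record. The enum values and struct are hypothetical,
      and counters stand in for page_frag_free() and xdp_return_frame_bulk()
      so the dispatch can be exercised in userspace.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer types; the real enum in the ice driver may differ. */
enum toy_tx_buf_type {
	TOY_TX_BUF_EMPTY,
	TOY_TX_BUF_XDP_TX,   /* local %XDP_TX: raw buffer from our own Rx ring */
	TOY_TX_BUF_XDP_XMIT, /* from .ndo_xdp_xmit(): any memory model */
};

struct toy_tx_buf {
	enum toy_tx_buf_type type;
	void *raw_buf; /* valid for TOY_TX_BUF_XDP_TX */
	void *xdpf;    /* valid for TOY_TX_BUF_XDP_XMIT */
};

/* Counters stand in for the real free routines. */
struct toy_free_stats {
	size_t page_frag_frees;  /* would be page_frag_free(raw_buf) */
	size_t frame_bulk_frees; /* would be xdp_return_frame_bulk(xdpf, ...) */
};

static void toy_clean_tx_buf(struct toy_tx_buf *buf, struct toy_free_stats *st)
{
	switch (buf->type) {
	case TOY_TX_BUF_XDP_TX:
		/* Per the commit message, page_frag_free() stays for %XDP_TX. */
		st->page_frag_frees++;
		break;
	case TOY_TX_BUF_XDP_XMIT:
		/* Redirected frame may be Page Pool backed: let the XDP core
		 * return it via its own memory model. */
		st->frame_bulk_frees++;
		break;
	case TOY_TX_BUF_EMPTY:
		break;
	}
	buf->type = TOY_TX_BUF_EMPTY;
	buf->raw_buf = NULL;
	buf->xdpf = NULL;
}
```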
      
      Let it go with no "Fixes:" tag, as it spans a good 5+ commits and
      can't be trivially backported.
      Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Link: https://lore.kernel.org/bpf/20230210170618.1973430-6-alexandr.lobakin@intel.com
    • Alexander Lobakin
      ice: Robustify cleaning/completing XDP Tx buffers · aa1d3faf
      Alexander Lobakin authored
      When queueing frames from a Page Pool for redirecting to a device backed
      by the ice driver, `perf top` shows heavy load on page_alloc() and
      page_frag_free(), even though on a properly working system it should be
      fully, or at least almost, zero-alloc. The problem is in fact a bit
      deeper and arises from how ice cleans up completed Tx buffers.
      
      The story so far: when cleaning/freeing the resources related to
      a particular completed Tx frame (skbs, DMA mappings etc.), ice relies
      only on heuristics, without setting any type explicitly (except for dummy
      Flow Director packets, which are marked via ice_tx_buf::tx_flags).
      This kind of works, but only up to a point. For example, currently ice
      assumes that each frame coming to __ice_xmit_xdp_ring() is backed by
      either a plain order-0 page or a plain page frag, while it may also be
      backed by Page Pool or any other memory model introduced in the
      future. This means any &xdp_frame must be freed properly via the
      xdp_return_frame() family, with no assumptions.
      
      In order to do that, all of the heuristics must be replaced with setting
      the Tx buffer/frame type explicitly, just how it's always been done via
      an enum. Let us reuse 16 bits from ::tx_flags (one extra bitwise AND
      instruction won't hurt much), especially given that sometimes there was
      a check for %ICE_TX_FLAGS_DUMMY_PKT, which is now turned from a flag into
      an enum member. The rest of the changes are straightforward, and most of
      them just convert the code to rely on the type set in &ice_tx_buf rather
      than on some secondary properties.

      For now, no functional changes are intended; the change only prepares
      the ground for freeing XDP frames properly in the next step. It must be
      done atomically/synchronously so as not to break anything.
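
      One way to picture "reuse 16 bits from ::tx_flags" is a pair of 16-bit
      bitfields sharing the old 32-bit word. This is a hypothetical layout
      with made-up names (toy_tx_buf, TOY_TX_BUF_*), not the driver's actual
      definition; it shows that the dummy-packet check becomes a type
      comparison and that flags and type do not clobber each other.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type enum: the former dummy-packet flag becomes a type of
 * its own. Names and values are illustrative only. */
enum toy_tx_buf_type {
	TOY_TX_BUF_EMPTY = 0,
	TOY_TX_BUF_DUMMY,    /* Flow Director dummy packet */
	TOY_TX_BUF_FRAG,
	TOY_TX_BUF_SKB,
	TOY_TX_BUF_XDP_TX,
	TOY_TX_BUF_XDP_XMIT,
};

/* The old 32-bit tx_flags word split in two: 16 bits of flags and 16 bits
 * of explicit buffer type. The compiler masks/shifts to read either half. */
struct toy_tx_buf {
	unsigned int tx_flags:16;
	unsigned int type:16; /* enum toy_tx_buf_type */
};

/* Formerly a flag test on tx_flags; now a plain type comparison. */
static bool toy_tx_buf_is_dummy(const struct toy_tx_buf *buf)
{
	return buf->type == TOY_TX_BUF_DUMMY;
}
```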
      Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Link: https://lore.kernel.org/bpf/20230210170618.1973430-5-alexandr.lobakin@intel.com
    • Alexander Lobakin
      ice: Remove two impossible branches on XDP Tx cleaning · 923096b5
      Alexander Lobakin authored
      The tagged commit started sending %XDP_TX frames from the XSk Rx ring
      directly without converting them to an &xdp_frame. However, when XSk is
      enabled on a queue pair, it has its own separate Tx cleaning functions,
      so neither ice_clean_xdp_irq() nor ice_unmap_and_free_tx_buf() is ever
      called there.
      Remove the impossible branches in order to reduce the diffstat of the
      upcoming change.
      
      Fixes: a24b4c6e ("ice: xsk: Do not convert to buff to frame for XDP_TX")
      Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Link: https://lore.kernel.org/bpf/20230210170618.1973430-4-alexandr.lobakin@intel.com
    • Alexander Lobakin
      ice: Fix XDP Tx ring overrun · 0bd939b6
      Alexander Lobakin authored
      Sometimes, under heavy XDP Tx traffic, e.g. when using the XDP traffic
      generator (%BPF_F_TEST_XDP_LIVE_FRAMES), the machine can hit OOM due
      to the driver not freeing all of the pages passed to it by
      .ndo_xdp_xmit().
      It turned out that during the development of the tagged commit, the
      check which ensures that we have a free descriptor to queue a frame was
      moved into the branch taken only when a buffer has frags. Otherwise, we
      only run a cleaning cycle, but don't check anything.
      At the same time, there can be situations when the driver gets new
      frames to send, but there are no buffers that can be cleaned/completed
      and the ring has no free slots. It's very rare, but still possible
      (> 6.5 Mpps per ring). The driver then fills the next buffer/descriptor,
      effectively overwriting data which still needs to be freed.
      
      Restore the check after the cleaning routine to make sure there is a
      slot to queue a new frame. When there are frags, there will still be a
      separate check that we can place all of them, but if the ring is full,
      there's no point in wasting any more time.

      (minor: make `!ready_frames` unlikely since it happens ~1-2 times per
       billion frames)
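
      The restored check can be sketched with a toy ring that only tracks its
      indices. The free-slot formula is assumed to mirror the driver's
      ICE_DESC_UNUSED() macro (one slot is always kept empty); everything else
      (toy_ring, toy_xmit(), cleaning only when the ring is full) is a
      hypothetical simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical minimal Tx ring: only the indices matter here. */
struct toy_ring {
	uint16_t count;         /* number of descriptors */
	uint16_t next_to_use;   /* producer index */
	uint16_t next_to_clean; /* completion index */
};

/* Free descriptors, keeping one slot empty to tell "full" from "empty". */
static uint16_t toy_desc_unused(const struct toy_ring *r)
{
	return (uint16_t)(((r->next_to_clean > r->next_to_use) ? 0 : r->count) +
			  r->next_to_clean - r->next_to_use - 1);
}

/* Pretend the hardware completed `done` descriptors. */
static void toy_clean(struct toy_ring *r, uint16_t done)
{
	r->next_to_clean = (uint16_t)((r->next_to_clean + done) % r->count);
}

/* Queue one single-buffer frame. Returns false ("busy") instead of
 * overwriting a descriptor whose data has not been completed yet. */
static bool toy_xmit(struct toy_ring *r, uint16_t hw_done)
{
	if (toy_desc_unused(r) == 0)
		toy_clean(r, hw_done);
	/* The restored check: cleaning may have freed nothing at all. */
	if (toy_desc_unused(r) == 0)
		return false;
	r->next_to_use = (uint16_t)((r->next_to_use + 1) % r->count);
	return true;
}
```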
      
      Fixes: 3246a107 ("ice: Add support for XDP multi-buffer on Tx side")
      Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Link: https://lore.kernel.org/bpf/20230210170618.1973430-3-alexandr.lobakin@intel.com
    • Alexander Lobakin
      ice: fix ice_tx_ring::xdp_tx_active underflow · bc4db834
      Alexander Lobakin authored
      xdp_tx_active is used to indicate whether an XDP ring has any %XDP_TX
      frames queued, so that Tx cleaning can be short-cut for XSk-enabled
      queues. When !XSk, it simply indicates whether the ring has any queued
      frames in general.
      It gets incremented for each frame placed onto the ring and counts the
      whole frame, not each frag. However, currently it gets decremented in
      ice_clean_xdp_tx_buf(), which is called once per buffer, i.e. per
      frag. Thus, on completing multi-frag frames, an underflow happens.
      Move the decrement to the outer function and do it once per frame, not
      per buffer. Also, accumulate the count on the stack and update the ring
      counter after the loop is done to save several cycles.
      XSk rings are fine since there are no frags at the moment.
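
      A userspace sketch of the fixed accounting, assuming each completed
      frame records on its first buffer how many buffers it spans (nr_bufs is
      a hypothetical field, as are the toy_* names). The loop counts whole
      frames in a local variable and touches the ring counter once. With
      per-buffer decrements, the input in the usage below would subtract 4
      from a counter of 2 and wrap around.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical completed-buffer record: a frame occupies nr_bufs
 * consecutive entries, and only its first entry carries the count. */
struct toy_tx_buf {
	uint16_t nr_bufs; /* > 0 on the first buffer of a frame, 0 on frags */
};

struct toy_ring {
	uint32_t xdp_tx_active; /* frames (not buffers) currently queued */
};

/* Walk completed buffers, count whole frames on the stack, and update the
 * ring counter once after the loop. */
static void toy_clean_completed(struct toy_ring *r,
				const struct toy_tx_buf *bufs, uint32_t nbufs)
{
	uint32_t frames = 0;

	for (uint32_t i = 0; i < nbufs; ) {
		uint16_t n = bufs[i].nr_bufs;

		assert(n > 0); /* must land on the head of a frame */
		frames++;
		i += n; /* skip this frame's frags */
	}
	assert(frames <= r->xdp_tx_active);
	r->xdp_tx_active -= frames;
}
```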
      
      Fixes: 3246a107 ("ice: Add support for XDP multi-buffer on Tx side")
      Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Link: https://lore.kernel.org/bpf/20230210170618.1973430-2-alexandr.lobakin@intel.com
    • Ilya Leoshkevich
      selftests/bpf: Fix out-of-srctree build · 0b075724
      Ilya Leoshkevich authored
      Building BPF selftests out of srctree fails with:
      
        make: *** No rule to make target '/linux-build//ima_setup.sh', needed by 'ima_setup.sh'.  Stop.
      
      The culprit is the rule that defines convenient shorthands like
      "make test_progs", which builds $(OUTPUT)/test_progs. These shorthands
      make sense only for binaries that are built though; scripts that live
      in the source tree do not end up in $(OUTPUT).
      
      Therefore drop $(TEST_PROGS) and $(TEST_PROGS_EXTENDED) from the rule.
      
      The issue has existed for a while, but it became a problem only after commit
      d68ae498 ("selftests/bpf: Install all required files to run selftests"),
      which added dependencies on these scripts.
      
      Fixes: 03dcb784 ("selftests/bpf: Add simple per-test targets to Makefile")
      Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20230208231211.283606-1-iii@linux.ibm.com
    • Alan Maguire
      bpf: Add --skip_encoding_btf_inconsistent_proto, --btf_gen_optimized to pahole flags for v1.25 · 0243d3df
      Alan Maguire authored
      v1.25 of pahole supports filtering out functions with multiple inconsistent
      function prototypes or optimized-out parameters from the BTF representation.
      These present problems because there is no additional info in BTF saying which
      inconsistent prototype matches which function instance to help guide attachment,
      and functions with optimized-out parameters can lead to incorrect assumptions
      about register contents.
      
      So for now, filter out such functions while adding BTF representations for
      functions that have "."-suffixes (foo.isra.0) but no optimized-out parameters.
      This patch assumes that the changes linked below land in pahole for v1.25.
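
      The filtering rule described above can be sketched as a toy decision
      function in C. It is not pahole's implementation: struct
      toy_fn_instance, the prototype strings and the "."-suffix stripping are
      hypothetical simplifications that only illustrate when a function would
      be left out of BTF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical DWARF-derived record for one instance of a function. */
struct toy_fn_instance {
	const char *name;      /* may carry a suffix, e.g. "foo.isra.0" */
	const char *prototype; /* e.g. "int (int, long)" */
	bool optimized_out;    /* some parameter was optimized away */
};

/* Compare names ignoring any "."-suffix (foo.isra.0 -> foo). */
static bool toy_same_base(const char *a, const char *b)
{
	size_t la = strcspn(a, "."), lb = strcspn(b, ".");

	return la == lb && strncmp(a, b, la) == 0;
}

/* Encode `name` only if every instance agrees on the prototype and none
 * has optimized-out parameters. */
static bool toy_should_encode(const struct toy_fn_instance *fns, size_t n,
			      const char *name)
{
	const char *proto = NULL;

	for (size_t i = 0; i < n; i++) {
		if (!toy_same_base(fns[i].name, name))
			continue;
		if (fns[i].optimized_out)
			return false;
		if (proto && strcmp(proto, fns[i].prototype) != 0)
			return false; /* inconsistent prototypes */
		proto = fns[i].prototype;
	}
	return proto != NULL; /* unknown functions are not encoded */
}
```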
      Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: https://lore.kernel.org/bpf/1675790102-23037-1-git-send-email-alan.maguire@oracle.com
      Link: https://lore.kernel.org/bpf/1675949331-27935-1-git-send-email-alan.maguire@oracle.com
  4. 11 Feb, 2023 13 commits