1. 27 Apr, 2018 17 commits
  2. 26 Apr, 2018 23 commits
    • samples, bpf: remove redundant ret assignment in bpf_load_program() · c0885f61
      Wang Sheng-Hui authored
      Remove two redundant 'ret' assignments:
      
      * 'ret = 1' before the 'if (data_maps)' logic: any error there jumps to
        label 'done', so no 'ret = 1' is needed before the error jump.
      
      * After the '/* load programs */' part, if everything goes well, the
        BPF code is loaded and 'ret' is set to 0 by load_and_attach(). If
        something goes wrong, 'ret' is set to non-zero, and the redundant
        'ret = 0' after the for clause causes the error to be skipped.
      
        For example, if some BPF code does not provide a supported program
        type in its ELF section, e.g. SEC("unknown"), the for clause never
        calls load_and_attach() to load the BPF code. In that case 1 should
        be returned to callers instead of 0.
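
The second point can be illustrated with a small Python model of the control flow. This is only a sketch: `load_sections`, `SUPPORTED_PREFIXES`, `FAILING`, and the section names are hypothetical stand-ins, and the real code is C in samples/bpf and differs in detail.

```python
# Hypothetical model of the return-code handling described above.
SUPPORTED_PREFIXES = ("kprobe/", "tracepoint/", "xdp")  # illustrative subset
FAILING = {"kprobe/broken"}  # sections whose load is assumed to fail


def load_and_attach(section: str) -> int:
    """Stand-in for the real loader: 0 on success, non-zero on failure."""
    return 1 if section in FAILING else 0


def load_sections(sections, reset_ret_after_loop: bool) -> int:
    ret = 1  # pessimistic default: nothing has been loaded yet
    for section in sections:
        if not section.startswith(SUPPORTED_PREFIXES):
            continue  # unsupported type: load_and_attach() is never called
        ret = load_and_attach(section)
        if ret != 0:
            break  # stop at the first load error
    if reset_ret_after_loop:
        ret = 0  # the redundant assignment: masks both failure cases
    return ret
```

With the reset in place, both a failed load and an object containing only unsupported sections report success; without it, the caller sees the non-zero value.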
      Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • Merge branch 'bpf-uapi-helper-doc' · a6712d45
      Daniel Borkmann authored
      Quentin Monnet says:
      
      ====================
      eBPF helper functions can be called from within eBPF programs to perform
      a variety of tasks that would be otherwise hard or impossible to do with
      eBPF itself. There is a growing number of such helper functions in the
      kernel, but documentation is scarce. The main user space header file
      does contain a short commented description of most helpers, but it is
      somewhat outdated and not complete. It is more a "cheat sheet" than
      real documentation accessible to new eBPF developers.
      
      This commit attempts to improve the situation by replacing the existing
      overview for the helpers with a more developed description. Furthermore,
      a Python script is added to generate a manual page for eBPF helpers. The
      workflow is the following, and requires the rst2man utility:
      
          $ ./scripts/bpf_helpers_doc.py \
                  --filename include/uapi/linux/bpf.h > /tmp/bpf-helpers.rst
          $ rst2man /tmp/bpf-helpers.rst > /tmp/bpf-helpers.7
          $ man /tmp/bpf-helpers.7
      
      The objective is to keep all documentation related to the helpers in a
      single place, and to be able to generate from here a manual page that
      could be packaged in the man-pages repository and shipped with most
      distributions.
      
      Additionally, parsing the prototypes of the helper functions could
      hopefully be reused, with a different Printer object, to generate
      header files needed in some eBPF-related projects.
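
The idea of one parser feeding several printers could look roughly like the sketch below. It is not the script from this series: the comment layout in `SAMPLE`, the helper `example_helper`, and the classes `Helper`, `RstPrinter` and `HeaderPrinter` are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed, simplified comment layout (prototype, then Description and Return
# blocks); the real header and script may differ in detail.
SAMPLE = (
    " * int example_helper(void *ctx, u32 flags)\n"
    " * \tDescription\n"
    " * \t\tDo something illustrative with *ctx*.\n"
    " * \tReturn\n"
    " * \t\t0 on success, or a negative error in case of failure.\n"
)

PROTO_RE = re.compile(r"^(?P<ret>.+?)(?P<name>\w+)\((?P<args>.*)\)$")


@dataclass
class Helper:
    ret: str
    name: str
    args: str
    description: str = ""
    returns: str = ""


def parse_helpers(text: str) -> List[Helper]:
    helpers: List[Helper] = []
    section = None
    for raw in text.splitlines():
        line = re.sub(r"^ \* ?", "", raw)  # drop the comment leader
        word = line.strip()
        if not word:
            continue
        # Prototypes are the only lines that are not indented.
        proto = None if line[0].isspace() else PROTO_RE.match(line)
        if proto:
            helpers.append(Helper(proto["ret"].strip(), proto["name"], proto["args"]))
            section = None
        elif helpers and word in ("Description", "Return"):
            section = word
        elif helpers and section == "Description":
            helpers[-1].description = (helpers[-1].description + " " + word).strip()
        elif helpers and section == "Return":
            helpers[-1].returns = (helpers[-1].returns + " " + word).strip()
    return helpers


class RstPrinter:
    """Renders a helper as a small RST-like entry."""

    def render(self, h: Helper) -> str:
        return f"**{h.ret} {h.name}({h.args})**\n\t{h.description}\n\tReturn: {h.returns}"


class HeaderPrinter:
    """Renders a helper as a C-style declaration (illustrative only)."""

    def render(self, h: Helper) -> str:
        return f"{h.ret} {h.name}({h.args});"


def render_all(helpers: List[Helper], printer) -> str:
    return "\n".join(printer.render(h) for h in helpers)
```

Swapping `RstPrinter()` for `HeaderPrinter()` in `render_all(parse_helpers(SAMPLE), ...)` changes the output format without touching the parsing step, which is the kind of reuse the paragraph above hopes for.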
      
      Regarding the description of each helper, it comprises several items:
      
      - The function prototype.
      - A description of the function and of its arguments (except for a
        couple of cases, when there are no arguments and the return value
        makes the function usage really obvious).
      - A description of return values (if not void).
      
      Additional items such as the list of compatible eBPF program and map
      types for each helper, Linux kernel version that introduced the helper,
      GPL-only restriction, and commit hash could be added in the future, but
      it was decided on the mailing list to leave them aside for now.
      
      For several helpers, descriptions are inspired (at times, nearly copied)
      from the commit logs introducing them in the kernel. Many thanks to
      their respective authors! Some sentences were also adapted from comments
      from the reviews; thanks to the reviewers as well. Descriptions were
      completed as much as possible, the objective being to have something easily
      accessible even for people just starting with eBPF. There is probably a bit
      more work to do in this direction for some helpers.
      
      Some RST formatting is used in the descriptions to get "bold" and
      "italics" in manual pages. It is not used in function prototypes, to
      keep them readable; the Python script that generates the RST for the
      manual page adds formatting to the prototypes itself, to produce
      something pretty. Hopefully, the descriptions in the bpf.h file remain
      perfectly readable. Note that the few trailing white spaces are
      intentional: removing them would break paragraphs for rst2man.
      
      The descriptions should ideally be updated each time someone adds a new
      helper, or updates the behaviour (new socket option supported, ...) or
      the interface (new flags available, ...) of existing ones.
      
      To ease the review process, the documentation has been split into several
      patches.
      
      v3 -> v4:
      - Add a patch (#9) for newly added BPF helpers.
      - Add a patch (#10) to update UAPI bpf.h version under tools/.
      - Use SPDX tag in Python script.
      - Several fixes on man page header and footer, and helpers documentation.
        Please refer to individual patches for details.
      
      RFC v2 -> PATCH v3:
      Several fixes on man page header and footer, and helpers documentation.
      Please refer to individual patches for details.
      
      RFC v1 -> RFC v2:
      - Remove "For" (compatible program and map types), "Since" (minimal
        Linux kernel version required), "GPL only" sections and commit hashes
        for the helpers.
      - Add comment on top of the description list to explain how this
        documentation is supposed to be processed.
      - Update Python script accordingly (remove the same sections, and remove
        paragraphs on program types and GPL restrictions from man page
        header).
      - Split series into several patches.
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: linux-doc@vger.kernel.org
      Cc: linux-man@vger.kernel.org
    • bpf: update bpf.h uapi header for tools · 9cde0c88
      Quentin Monnet authored
      Update tools/include/uapi/linux/bpf.h file in order to reflect the
      changes for BPF helper functions documentation introduced in previous
      commits.
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: add documentation for eBPF helpers (65-66) · 2d020dd7
      Quentin Monnet authored
      Add documentation for eBPF helper functions to bpf.h user header file.
      This documentation can be parsed with the Python script provided in
      another commit of the patch series, in order to provide a RST document
      that can later be converted into a man page.
      
      The objective is to make the documentation easily understandable and
      accessible to all eBPF developers, including beginners.
      
      This patch contains descriptions for the following helper functions:
      
      Helper from Nikita:
      - bpf_xdp_adjust_tail()
      
      Helper from Eyal:
      - bpf_skb_get_xfrm_state()
      
      v4:
      - New patch (helpers did not exist yet for previous versions).
      
      Cc: Nikita V. Shirokov <tehnerd@tehnerd.com>
      Cc: Eyal Birger <eyal.birger@gmail.com>
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: add documentation for eBPF helpers (58-64) · ab127040
      Quentin Monnet authored
      Add documentation for eBPF helper functions to bpf.h user header file.
      This documentation can be parsed with the Python script provided in
      another commit of the patch series, in order to provide a RST document
      that can later be converted into a man page.
      
      The objective is to make the documentation easily understandable and
      accessible to all eBPF developers, including beginners.
      
      This patch contains descriptions for the following helper functions, all
      written by John:
      
      - bpf_redirect_map()
      - bpf_sk_redirect_map()
      - bpf_sock_map_update()
      - bpf_msg_redirect_map()
      - bpf_msg_apply_bytes()
      - bpf_msg_cork_bytes()
      - bpf_msg_pull_data()
      
      v4:
      - bpf_redirect_map(): Fix typos: "XDP_ABORT" changed to "XDP_ABORTED",
        "his" to "this". Also add a paragraph on performance improvement over
        bpf_redirect() helper.
      
      v3:
      - bpf_sk_redirect_map(): Improve description of BPF_F_INGRESS flag.
      - bpf_msg_redirect_map(): Improve description of BPF_F_INGRESS flag.
      - bpf_redirect_map(): Fix note on CPU redirection, not fully implemented
        for generic XDP but supported on native XDP.
      - bpf_msg_pull_data(): Clarify comment about invalidated verifier
        checks.
      
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: add documentation for eBPF helpers (51-57) · 7aa79a86
      Quentin Monnet authored
      Add documentation for eBPF helper functions to bpf.h user header file.
      This documentation can be parsed with the Python script provided in
      another commit of the patch series, in order to provide a RST document
      that can later be converted into a man page.
      
      The objective is to make the documentation easily understandable and
      accessible to all eBPF developers, including beginners.
      
      This patch contains descriptions for the following helper functions:
      
      Helpers from Lawrence:
      - bpf_setsockopt()
      - bpf_getsockopt()
      - bpf_sock_ops_cb_flags_set()
      
      Helpers from Yonghong:
      - bpf_perf_event_read_value()
      - bpf_perf_prog_read_value()
      
      Helper from Josef:
      - bpf_override_return()
      
      Helper from Andrey:
      - bpf_bind()
      
      v4:
      - bpf_perf_event_read_value(): State that this helper should be
        preferred over bpf_perf_event_read().
      
      v3:
      - bpf_perf_event_read_value(): Fix time of selection for perf event type
        in description. Remove occurrences of "cores" to avoid confusion with
        "CPU".
      - bpf_bind(): Remove last paragraph of description, which was off topic.
      
      Cc: Lawrence Brakmo <brakmo@fb.com>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      [for bpf_perf_event_read_value(), bpf_perf_prog_read_value()]
      Acked-by: Andrey Ignatov <rdna@fb.com>
      [for bpf_bind()]
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: add documentation for eBPF helpers (42-50) · c6b5fb86
      Quentin Monnet authored
      Add documentation for eBPF helper functions to bpf.h user header file.
      This documentation can be parsed with the Python script provided in
      another commit of the patch series, in order to provide a RST document
      that can later be converted into a man page.
      
      The objective is to make the documentation easily understandable and
      accessible to all eBPF developers, including beginners.
      
      This patch contains descriptions for the following helper functions:
      
      Helper from Kaixu:
      - bpf_perf_event_read()
      
      Helpers from Martin:
      - bpf_skb_under_cgroup()
      - bpf_xdp_adjust_head()
      
      Helpers from Sargun:
      - bpf_probe_write_user()
      - bpf_current_task_under_cgroup()
      
      Helper from Thomas:
      - bpf_skb_change_head()
      
      Helper from Gianluca:
      - bpf_probe_read_str()
      
      Helpers from Chenbo:
      - bpf_get_socket_cookie()
      - bpf_get_socket_uid()
      
      v4:
      - bpf_perf_event_read(): State that bpf_perf_event_read_value() should
        be preferred over this helper.
      - bpf_skb_change_head(): Clarify comment about invalidated verifier
        checks.
      - bpf_xdp_adjust_head(): Clarify comment about invalidated verifier
        checks.
      - bpf_probe_write_user(): Add that dst must be a valid user space
        address.
      - bpf_get_socket_cookie(): Improve description by making clearer that
        the cookie belongs to the socket, and state that it remains stable for
        the life of the socket.
      
      v3:
      - bpf_perf_event_read(): Fix time of selection for perf event type in
        description. Remove occurrences of "cores" to avoid confusion with
        "CPU".
      
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Sargun Dhillon <sargun@sargun.me>
      Cc: Thomas Graf <tgraf@suug.ch>
      Cc: Gianluca Borello <g.borello@gmail.com>
      Cc: Chenbo Feng <fengc@google.com>
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      [for bpf_skb_under_cgroup(), bpf_xdp_adjust_head()]
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: add documentation for eBPF helpers (33-41) · fa15601a
      Quentin Monnet authored
      Add documentation for eBPF helper functions to bpf.h user header file.
      This documentation can be parsed with the Python script provided in
      another commit of the patch series, in order to provide a RST document
      that can later be converted into a man page.
      
      The objective is to make the documentation easily understandable and
      accessible to all eBPF developers, including beginners.
      
      This patch contains descriptions for the following helper functions, all
      written by Daniel:
      
      - bpf_get_hash_recalc()
      - bpf_skb_change_tail()
      - bpf_skb_pull_data()
      - bpf_csum_update()
      - bpf_set_hash_invalid()
      - bpf_get_numa_node_id()
      - bpf_set_hash()
      - bpf_skb_adjust_room()
      - bpf_xdp_adjust_meta()
      
      v4:
      - bpf_skb_change_tail(): Clarify comment about invalidated verifier
        checks.
      - bpf_skb_pull_data(): Clarify the motivation for using this helper or
        bpf_skb_load_bytes(), on non-linear buffers. Fix RST formatting for
        *skb*. Clarify comment about invalidated verifier checks.
      - bpf_csum_update(): Fix description of checksum (entire packet, not IP
        checksum). Fix a typo: "header" instead of "helper".
      - bpf_set_hash_invalid(): Mention bpf_get_hash_recalc().
      - bpf_get_numa_node_id(): State that the helper is not restricted to
        programs attached to sockets.
      - bpf_skb_adjust_room(): Clarify comment about invalidated verifier
        checks.
      - bpf_xdp_adjust_meta(): Clarify comment about invalidated verifier
        checks.
      
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: add documentation for eBPF helpers (23-32) · 1fdd08be
      Quentin Monnet authored
      Add documentation for eBPF helper functions to bpf.h user header file.
      This documentation can be parsed with the Python script provided in
      another commit of the patch series, in order to provide a RST document
      that can later be converted into a man page.
      
      The objective is to make the documentation easily understandable and
      accessible to all eBPF developers, including beginners.
      
      This patch contains descriptions for the following helper functions, all
      written by Daniel:
      
      - bpf_get_prandom_u32()
      - bpf_get_smp_processor_id()
      - bpf_get_cgroup_classid()
      - bpf_get_route_realm()
      - bpf_skb_load_bytes()
      - bpf_csum_diff()
      - bpf_skb_get_tunnel_opt()
      - bpf_skb_set_tunnel_opt()
      - bpf_skb_change_proto()
      - bpf_skb_change_type()
      
      v4:
      - bpf_get_prandom_u32(): Warn that the prng is not cryptographically
        secure.
      - bpf_get_smp_processor_id(): Fix a typo (case).
      - bpf_get_cgroup_classid(): Clarify description. Add notes on the helper
        being limited to cgroup v1, and to egress path.
      - bpf_get_route_realm(): Add comparison with bpf_get_cgroup_classid().
        Add a note about usage with TC and advantage of clsact. Fix a typo in
        return value ("sdb" instead of "skb").
      - bpf_skb_load_bytes(): Make explicit that loading large data loads it
        onto the eBPF stack.
      - bpf_csum_diff(): Add a note on seed that can be cascaded. Link to
        bpf_l3|l4_csum_replace().
      - bpf_skb_get_tunnel_opt(): Add a note about usage with "collect
        metadata" mode, and example of this with Geneve.
      - bpf_skb_set_tunnel_opt(): Add a link to bpf_skb_get_tunnel_opt()
        description.
      - bpf_skb_change_proto(): Mention that the main use case is NAT64.
        Clarify comment about invalidated verifier checks.
      
      v3:
      - bpf_get_prandom_u32(): Fix helper name :(. Add description, including
        a note on the internal random state.
      - bpf_get_smp_processor_id(): Add description, including a note on the
        processor id remaining stable during program run.
      - bpf_get_cgroup_classid(): State that CONFIG_CGROUP_NET_CLASSID is
        required to use the helper. Add a reference to related documentation.
        State that placing a task in net_cls controller disables cgroup-bpf.
      - bpf_get_route_realm(): State that CONFIG_CGROUP_NET_CLASSID is
        required to use this helper.
      - bpf_skb_load_bytes(): Fix comment on current use cases for the helper.
      
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: add documentation for eBPF helpers (12-22) · c456dec4
      Quentin Monnet authored
      Add documentation for eBPF helper functions to bpf.h user header file.
      This documentation can be parsed with the Python script provided in
      another commit of the patch series, in order to provide a RST document
      that can later be converted into a man page.
      
      The objective is to make the documentation easily understandable and
      accessible to all eBPF developers, including beginners.
      
      This patch contains descriptions for the following helper functions, all
      written by Alexei:
      
      - bpf_get_current_pid_tgid()
      - bpf_get_current_uid_gid()
      - bpf_get_current_comm()
      - bpf_skb_vlan_push()
      - bpf_skb_vlan_pop()
      - bpf_skb_get_tunnel_key()
      - bpf_skb_set_tunnel_key()
      - bpf_redirect()
      - bpf_perf_event_output()
      - bpf_get_stackid()
      - bpf_get_current_task()
      
      v4:
      - bpf_redirect(): Fix typo: "XDP_ABORT" changed to "XDP_ABORTED". Add
        note on bpf_redirect_map() providing better performance. Replace "Save
        for" with "Except for".
      - bpf_skb_vlan_push(): Clarify comment about invalidated verifier
        checks.
      - bpf_skb_vlan_pop(): Clarify comment about invalidated verifier
        checks.
      - bpf_skb_get_tunnel_key(): Add notes on tunnel_id, "collect metadata"
        mode, and example tunneling protocols with which it can be used.
      - bpf_skb_set_tunnel_key(): Add a reference to the description of
        bpf_skb_get_tunnel_key().
      - bpf_perf_event_output(): Specify that, and for what purpose, the
        helper can be used with programs attached to TC and XDP.
      
      v3:
      - bpf_skb_get_tunnel_key(): Change and improve description and example.
      - bpf_redirect(): Improve description of BPF_F_INGRESS flag.
      - bpf_perf_event_output(): Fix first sentence of description. Delete
        wrong statement on context being evaluated as a struct pt_reg. Remove
        the long yet incomplete example.
      - bpf_get_stackid(): Add a note about PERF_MAX_STACK_DEPTH being
        configurable.
      
      Cc: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: add documentation for eBPF helpers (01-11) · ad4a5223
      Quentin Monnet authored
      Add documentation for eBPF helper functions to bpf.h user header file.
      This documentation can be parsed with the Python script provided in
      another commit of the patch series, in order to provide a RST document
      that can later be converted into a man page.
      
      The objective is to make the documentation easily understandable and
      accessible to all eBPF developers, including beginners.
      
      This patch contains descriptions for the following helper functions, all
      written by Alexei:
      
      - bpf_map_lookup_elem()
      - bpf_map_update_elem()
      - bpf_map_delete_elem()
      - bpf_probe_read()
      - bpf_ktime_get_ns()
      - bpf_trace_printk()
      - bpf_skb_store_bytes()
      - bpf_l3_csum_replace()
      - bpf_l4_csum_replace()
      - bpf_tail_call()
      - bpf_clone_redirect()
      
      v4:
      - bpf_map_lookup_elem(): Add "const" qualifier for key.
      - bpf_map_update_elem(): Add "const" qualifier for key and value.
      - bpf_map_lookup_elem(): Add "const" qualifier for key.
      - bpf_skb_store_bytes(): Clarify comment about invalidated verifier
        checks.
      - bpf_l3_csum_replace(): Mention L3 instead of just IP, and add a note
        about bpf_csum_diff().
      - bpf_l4_csum_replace(): Mention L4 instead of just TCP/UDP, and add a
        note about bpf_csum_diff().
      - bpf_tail_call(): Bring minor edits to description.
      - bpf_clone_redirect(): Add a note about the relation with
        bpf_redirect(). Also clarify comment about invalidated verifier
        checks.
      
      v3:
      - bpf_map_lookup_elem(): Fix description of restrictions for flags
        related to the existence of the entry.
      - bpf_trace_printk(): State that trace_pipe can be configured. Fix
        return value in case an unknown format specifier is met. Add a note on
        kernel log notice when the helper is used. Edit example.
      - bpf_tail_call(): Improve comment on stack inheritance.
      - bpf_clone_redirect(): Improve description of BPF_F_INGRESS flag.
      
      Cc: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: add script and prepare bpf.h for new helpers documentation · 56a092c8
      Quentin Monnet authored
      Remove previous "overview" of eBPF helpers from user bpf.h header.
      Replace it by a comment explaining how to process the new documentation
      (to come in following patches) with a Python script to produce RST, then
      man page documentation.
      
      Also add the aforementioned Python script under scripts/. It is used to
      process include/uapi/linux/bpf.h and to extract helper descriptions, to
      turn it into a RST document that can further be processed with rst2man
      to produce a man page. The script takes one "--filename <path/to/file>"
      option. If the script is launched from scripts/ in the kernel root
      directory, it should be able to find the location of the header to
      parse, and "--filename <path/to/file>" is then optional. If it cannot
      find the file, then the option becomes mandatory. RST-formatted
      documentation is printed to standard output.
      
      Typical workflow for producing the final man page would be:
      
          $ ./scripts/bpf_helpers_doc.py \
                  --filename include/uapi/linux/bpf.h > /tmp/bpf-helpers.rst
          $ rst2man /tmp/bpf-helpers.rst > /tmp/bpf-helpers.7
          $ man /tmp/bpf-helpers.7
      
      Note that the tool kernel-doc cannot be used to document eBPF helpers,
      whose signatures are not available directly in the header files
      (pre-processor directives are used to produce them at the beginning of
      the compilation process).
      
      v4:
      - Also remove overviews for newly added bpf_xdp_adjust_tail() and
        bpf_skb_get_xfrm_state().
      - Remove vague statement about what helpers are restricted to GPL
        programs in "LICENSE" section for man page footer.
      - Replace license boilerplate with SPDX tag for Python script.
      
      v3:
      - Change license for man page.
      - Remove "for safety reasons" from man page header text.
      - Change "packets metadata" to "packets" in man page header text.
      - Move and fix comment on helpers introducing no overhead.
      - Remove "NOTES" section from man page footer.
      - Add "LICENSE" section to man page footer.
      - Edit description of file include/uapi/linux/bpf.h in man page footer.
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • Merge branch 'bpf-tunnel-metadata-selftests' · 3f13de6d
      Daniel Borkmann authored
      William Tu says:
      
      ====================
      This patch series provides an end-to-end eBPF tunnel test suite. The
      common topology below is created for all types of tunnels:
      
      Topology:
      ---------
           root namespace   |     at_ns0 namespace
                            |
            -----------     |     -----------
            | tnl dev |     |     | tnl dev |  (overlay network)
            -----------     |     -----------
            metadata-mode   |     native-mode
             with bpf       |
                            |
            ----------      |     ----------
            |  veth1  | --------- |  veth0  |  (underlay network)
            ----------    peer    ----------
      
      Device Configuration
      --------------------
       Root namespace with metadata-mode tunnel + BPF
       Device names and addresses:
             veth1 IPv4: 172.16.1.200, IPv6: 00::22 (underlay)
             tunnel dev <type>11, ex: gre11, IPv4: 10.1.1.200 (overlay)
      
       Namespace at_ns0 with native tunnel
       Device names and addresses:
             veth0 IPv4: 172.16.1.100, IPv6: 00::11 (underlay)
             tunnel dev <type>00, ex: gre00, IPv4: 10.1.1.100 (overlay)
      
      End-to-end ping packet flow
      ---------------------------
       Most of the tests start with namespace creation and device configuration,
       then ping the underlay and overlay networks. When doing 'ping 10.1.1.100'
       from the root namespace, the following operations happen:
       1) Route lookup shows that 10.1.1.100/24 belongs to the tnl dev, so the
          packet is forwarded to the tnl dev.
       2) The tnl device's egress BPF program is triggered and sets the tunnel
          metadata, with remote_ip=172.16.1.100 and others.
       3) The outer tunnel header is prepended and the packet is routed to
          veth1's egress.
       4) veth0's ingress queue receives the tunneled packet in namespace at_ns0.
       5) The tunnel protocol handler, ex: vxlan_rcv, decapsulates the packet.
       6) The packet is forwarded to the overlay tnl dev.
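
The flow above can be traced with a toy Python model. It is a sketch under stated assumptions: the functions `route_lookup`, `encapsulate` and `decapsulate` are hypothetical, packets are plain dictionaries, and the remote endpoint is assumed to be veth0's underlay address (172.16.1.100) from the device configuration.

```python
import ipaddress

# Addresses taken from the device configuration above.
OVERLAY_NET = ipaddress.ip_network("10.1.1.0/24")
VETH1_ROOT = "172.16.1.200"    # underlay address in the root namespace
VETH0_AT_NS0 = "172.16.1.100"  # underlay address in at_ns0 (assumed remote)


def route_lookup(dst: str) -> str:
    """Step 1: overlay destinations are routed to the tunnel device."""
    return "tnl dev" if ipaddress.ip_address(dst) in OVERLAY_NET else "veth1"


def encapsulate(inner: dict, remote_ip: str) -> dict:
    """Steps 2-3: wrap the inner packet in an outer header towards remote_ip."""
    return {"outer": {"src": VETH1_ROOT, "dst": remote_ip}, "inner": inner}


def decapsulate(packet: dict) -> dict:
    """Step 5: the receiving tunnel handler strips the outer header."""
    return packet["inner"]


ping = {"src": "10.1.1.200", "dst": "10.1.1.100"}
device = route_lookup(ping["dst"])                   # overlay route
on_wire = encapsulate(ping, remote_ip=VETH0_AT_NS0)  # crosses the veth pair
delivered = decapsulate(on_wire)                     # back to the inner ping
```

The inner addresses stay in the overlay network (10.1.1.0/24) for the whole trip; only the outer header uses the underlay addresses of the veth pair.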
      
      Test Cases
      -----------------------------
       Tunnel Type |  BPF Programs
      -----------------------------
       GRE:          gre_set_tunnel, gre_get_tunnel
       IP6GRE:       ip6gretap_set_tunnel, ip6gretap_get_tunnel
       ERSPAN:       erspan_set_tunnel, erspan_get_tunnel
       IP6ERSPAN:    ip4ip6erspan_set_tunnel, ip4ip6erspan_get_tunnel
       VXLAN:        vxlan_set_tunnel, vxlan_get_tunnel
       IP6VXLAN:     ip6vxlan_set_tunnel, ip6vxlan_get_tunnel
       GENEVE:       geneve_set_tunnel, geneve_get_tunnel
       IP6GENEVE:    ip6geneve_set_tunnel, ip6geneve_get_tunnel
       IPIP:         ipip_set_tunnel, ipip_get_tunnel
       IP6IP:        ipip6_set_tunnel, ipip6_get_tunnel,
                     ip6ip6_set_tunnel, ip6ip6_get_tunnel
       XFRM:         xfrm_get_state
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • samples/bpf: remove the bpf tunnel testsuite. · b05cd740
      William Tu authored
      Move the testsuite to
      selftests/bpf/{test_tunnel_kern.c, test_tunnel.sh}
      Signed-off-by: William Tu <u9012063@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • selftests/bpf: bpf tunnel test. · 933a741e
      William Tu authored
      The patch migrates the original tests at samples/bpf/tcbpf2_kern.c
      and samples/bpf/test_tunnel_bpf.sh to selftests.  There are a couple of
      changes from the original:
          1) add ipv6 vxlan, ipv6 geneve, ipv6 ipip tests
          2) simplify the original ipip tests (remove iperf tests)
          3) improve documentation
          4) use bpf_ntoh* and bpf_hton* api
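
Regarding item 4, the following Python sketch shows what a host-to-network conversion does, assuming the bpf_hton*/bpf_ntoh* API converts between host and network byte order as the names suggest. Python's `socket.htons` is used as an analogue here; `wire_bytes_u16` and `htons_matches_wire` are hypothetical names, and none of this is the BPF API itself.

```python
import socket
import struct

ETH_P_IP = 0x0800  # EtherType for IPv4, used as a sample 16-bit value


def wire_bytes_u16(host_value: int) -> bytes:
    """Bytes as they appear in a packet header (network order, big-endian)."""
    return struct.pack("!H", host_value)


def htons_matches_wire(host_value: int) -> bool:
    # Storing the htons() result in native byte order yields the same bytes
    # as writing the original value in network order.
    native = struct.pack("=H", socket.htons(host_value))
    return native == wire_bytes_u16(host_value)


def round_trip(host_value: int) -> int:
    """Converting to network order and back returns the original value."""
    return socket.ntohs(socket.htons(host_value))
```

Using explicit conversions keeps comparisons against header fields correct on both little-endian and big-endian hosts.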
      
      In summary, 'test_tunnel_kern.o' contains the following bpf programs:
        GRE: gre_set_tunnel, gre_get_tunnel
        IP6GRE: ip6gretap_set_tunnel, ip6gretap_get_tunnel
        ERSPAN: erspan_set_tunnel, erspan_get_tunnel
        IP6ERSPAN: ip4ip6erspan_set_tunnel, ip4ip6erspan_get_tunnel
        VXLAN: vxlan_set_tunnel, vxlan_get_tunnel
        IP6VXLAN: ip6vxlan_set_tunnel, ip6vxlan_get_tunnel
        GENEVE: geneve_set_tunnel, geneve_get_tunnel
        IP6GENEVE: ip6geneve_set_tunnel, ip6geneve_get_tunnel
        IPIP: ipip_set_tunnel, ipip_get_tunnel
        IP6IP: ipip6_set_tunnel, ipip6_get_tunnel,
               ip6ip6_set_tunnel, ip6ip6_get_tunnel
        XFRM: xfrm_get_state
      Signed-off-by: William Tu <u9012063@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: fix xdp_generic for bpf_adjust_tail usecase · f7613120
      Nikita V. Shirokov authored
      When bpf_adjust_tail was introduced for generic xdp, it changed skb's tail
      pointer so that it pointed to the new "end of the packet". However, skb's
      len field wasn't modified accordingly, so the ethernet frame on the wire
      still had the original size (or an even bigger one, if adjust_head was
      used). This diff fixes this.
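
The mismatch between the tail pointer and the length field can be shown with a toy model. `ToySkb` and `apply_tail_adjust` are hypothetical names and offsets are plain integers; this is not the kernel code, only a sketch of the invariant the fix restores.

```python
from dataclasses import dataclass


@dataclass
class ToySkb:
    data: int  # offset of the first packet byte
    tail: int  # offset one past the last packet byte
    len: int   # number of bytes that will be put on the wire


def apply_tail_adjust(skb: ToySkb, new_end: int, fix_len: bool) -> ToySkb:
    """Move the tail to new_end; optionally keep len consistent with it."""
    shrink = skb.tail - new_end
    skb.tail = new_end
    if fix_len:
        skb.len -= shrink  # the missing step described above
    return skb


def is_consistent(skb: ToySkb) -> bool:
    return skb.len == skb.tail - skb.data
```

Without the length update, a 100-byte packet trimmed to 60 bytes still reports `len == 100`, which corresponds to the oversized frame described in the commit message.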
      
      Fixes: 198d83bb ("bpf: make generic xdp compatible w/ bpf_xdp_adjust_tail")
      Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • tools, bpftool: Display license GPL compatible in prog show/list · 9b984a20
      Jiri Olsa authored
      Display the license "gpl" string in bpftool prog command, like:
      
        # bpftool prog list
        5: tracepoint  name func  tag 57cd311f2e27366b  gpl
                loaded_at Apr 26/09:37  uid 0
                xlated 16B  not jited  memlock 4096B
      
        # bpftool --json --pretty prog show
        [{
                "id": 5,
                "type": "tracepoint",
                "name": "func",
                "tag": "57cd311f2e27366b",
                "gpl_compatible": true,
                "loaded_at": "Apr 26/09:37",
                "uid": 0,
                "bytes_xlated": 16,
                "jited": false,
                "bytes_memlock": 4096
            }
        ]
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      9b984a20
    • Jiri Olsa's avatar
      tools, bpf: Sync bpf.h uapi header · fb6ef42b
      Jiri Olsa authored
      Syncing the bpf.h uapi header with tools.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      fb6ef42b
    • Jiri Olsa's avatar
      bpf: Add gpl_compatible flag to struct bpf_prog_info · b85fab0e
      Jiri Olsa authored
      Add a gpl_compatible flag to struct bpf_prog_info
      so it can be dumped via bpf_prog_get_info_by_fd and
      displayed via bpftool progs dump.
      
      Alexei noticed a 4-byte hole in struct bpf_prog_info,
      so the u32 flags field is placed there, and more bit
      fields can be added to it later without breaking
      user space.
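
A small sketch of why filling a padding hole is safe for user space. The two structs below are hypothetical cut-down models, not the real `struct bpf_prog_info`; the field names and the 8-byte alignment of the 64-bit member are assumptions made for illustration. The point is that a 32-bit bit-field placed in the hole changes neither the struct size nor the offset of the members after it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout before the change: a 4-byte hole follows ifindex. */
struct prog_info_before {
    uint32_t ifindex;
    /* 4 bytes of implicit padding here */
    _Alignas(8) uint64_t netns_dev;
};

/* Hypothetical layout after the change: a u32 of bit flags fills the hole. */
struct prog_info_after {
    uint32_t ifindex;
    uint32_t gpl_compatible : 1; /* 31 spare bits left for future flags */
    _Alignas(8) uint64_t netns_dev;
};

/* Compile-time checks: same size, and the following member did not move. */
_Static_assert(sizeof(struct prog_info_before) ==
               sizeof(struct prog_info_after),
               "filling the hole must not grow the struct");
_Static_assert(offsetof(struct prog_info_before, netns_dev) ==
               offsetof(struct prog_info_after, netns_dev),
               "members after the hole must keep their offsets");
```

Because the size and later offsets stay the same, a binary built against the older layout would still read the later fields from the right place.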
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      b85fab0e
    • David S. Miller's avatar
      Merge branch 'udp-gso' · cb586c63
      David S. Miller authored
      Willem de Bruijn says:
      
      ====================
      udp gso
      
      Segmentation offload reduces cycles/byte for large packets by
      amortizing the cost of protocol stack traversal.
      
      This patchset implements GSO for UDP. A process can concatenate and
      submit multiple datagrams to the same destination in one send call
      by setting socket option SOL_UDP/UDP_SEGMENT with the segment size,
      or passing an analogous cmsg at send time.
      
      The stack will send the entire large datagram (up to the network layer
      max size) through the protocol layer. At the GSO layer, it is broken
      up into individual segments. All receive the same network layer header
      and UDP src and dst port. All but the last segment have the same UDP
      header, but the last may differ in length and checksum.
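
As a rough illustration of the interface described above, the sketch below enables UDP GSO on a socket and computes how a large send would be split. The helper names (`enable_udp_gso`, `gso_segment_count`, `gso_last_segment_len`) are hypothetical and not part of the patch set. The fallback value 103 for `UDP_SEGMENT` is taken from the Linux uapi header and is only used when the libc headers do not define it.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>

#ifndef SOL_UDP
#define SOL_UDP 17        /* same value as IPPROTO_UDP */
#endif
#ifndef UDP_SEGMENT
#define UDP_SEGMENT 103   /* from linux/udp.h; older libc headers may lack it */
#endif

/*
 * Ask the stack to split every large send on `fd` into datagrams carrying
 * `gso_size` payload bytes each. Returns the setsockopt() result (0 or -1).
 */
int enable_udp_gso(int fd, int gso_size)
{
    return setsockopt(fd, SOL_UDP, UDP_SEGMENT, &gso_size, sizeof(gso_size));
}

/* Number of datagrams produced for one send of `payload_len` bytes. */
size_t gso_segment_count(size_t payload_len, size_t gso_size)
{
    if (gso_size == 0 || payload_len == 0)
        return 0;
    return (payload_len + gso_size - 1) / gso_size;
}

/* Payload length of the final datagram, which may be shorter than the rest. */
size_t gso_last_segment_len(size_t payload_len, size_t gso_size)
{
    size_t rem;

    if (gso_size == 0 || payload_len == 0)
        return 0;
    rem = payload_len % gso_size;
    return rem ? rem : gso_size;
}
```

For example, a 4000-byte send with a 1472-byte segment size yields three datagrams, the last carrying 1056 bytes. This is why only the last segment's UDP length and checksum may differ.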
      
      Initial results show a significant reduction in UDP cycles/byte.
      See the main patch for more details and benchmark results.
      
              udp
                876 MB/s 14873 msg/s 624666 calls/s
                  11,205,777,429      cycles
      
              udp gso
               2139 MB/s 36282 msg/s 36282 calls/s
                  11,204,374,561      cycles
      
      The patch set is broken down as follows:
      - patch 1 is a prerequisite: code rearrangement, noop otherwise
      - patch 2 implements the gso logic
      - patch 3 adds protocol stack support for UDP_SEGMENT
      - patches 4, 5 and 7 are refinements
      - patch 6 adds the cmsg interface
      - patches 8..11 are tests
      
      This idea was presented previously at netconf 2017-2
      http://vger.kernel.org/netconf2017_files/rx_hardening_and_udp_gso.pdf
      
      Changes v1 -> v2
        - Convert __udp_gso_segment to modify headers after skb_segment
        - Split main patch into two, one for gso logic, one for UDP_SEGMENT
      
      Changes RFC -> v1
        - MSG_MORE:
            fixed, by allowing checksum offload with corking if gso
        - SKB_GSO_UDP_L4:
            made independent from SKB_GSO_UDP
            and removed skb_is_ufo() wrapper
        - NETIF_F_GSO_UDP_L4:
            add to netdev_features_string
            and to netdev-features.txt
            add BUILD_BUG_ON to match SKB_GSO_UDP_L4 value
        - UDP_MAX_SEGMENTS:
            introduce limit on number of segments per gso skb
            to avoid extreme cases like IP_MAX_MTU/IPV4_MIN_MTU
        - CHECKSUM_PARTIAL:
            test against missing feature after ndo_features_check
            if not supported return error, analogous to udp_send_check
        - MSG_ZEROCOPY: removed, deferred for now
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cb586c63
    • Willem de Bruijn's avatar
      selftests: udp gso benchmark · 3a687bef
      Willem de Bruijn authored
      Send udp data between a source and sink, optionally with udp gso.
      The two processes are expected to be run on separate hosts.
      
      A script is included that runs them together over loopback in a
      single namespace for functionality testing.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3a687bef
    • Willem de Bruijn's avatar
      selftests: udp gso with corking · 3f12817f
      Willem de Bruijn authored
      Corked sockets take a different path to construct a udp datagram than
      the lockless fast path. Test this alternate path.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3f12817f
    • Willem de Bruijn's avatar
      selftests: udp gso with connected sockets · e5b2d91c
      Willem de Bruijn authored
      Connected sockets use path mtu instead of device mtu.
      
      Test this path by inserting a route mtu that is lower than the device
      mtu. Verify that the path mtu for the connection matches this lower
      number, then run the same test as in the connectionless case.
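
To make the MTU dependence concrete, here is a minimal sketch of how the largest UDP payload per datagram follows from an MTU. The function name `udp_max_payload` and the enum constants are hypothetical. The header sizes assume an IPv4 header without options (20 bytes) or a fixed IPv6 header without extension headers (40 bytes), plus the 8-byte UDP header.

```c
#include <stddef.h>

/* Assumed header sizes: no IPv4 options, no IPv6 extension headers. */
enum { IPV4_HDRLEN = 20, IPV6_HDRLEN = 40, UDP_HDRLEN = 8 };

/*
 * Largest UDP payload that fits in one packet of size `mtu`.
 * Returns 0 if the MTU cannot even hold the headers.
 */
size_t udp_max_payload(size_t mtu, int is_ipv6)
{
    size_t hdrs = (is_ipv6 ? IPV6_HDRLEN : IPV4_HDRLEN) + UDP_HDRLEN;

    return mtu > hdrs ? mtu - hdrs : 0;
}
```

With a 1500-byte device MTU this gives 1472 bytes over IPv4. If a route lowers the path MTU to 1400, a connected socket would be limited to 1372 bytes per datagram instead.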
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e5b2d91c