1. 30 Mar, 2018 3 commits
    • Merge branch 'bpf-sockmap-sg-api-fixes' · 807ae7da
      Daniel Borkmann authored
      Prashant Bhole says:
      
      ====================
      These patches fix sg API usage in sockmap. Previously sockmap didn't
      use sg_init_table(), which caused a BUG_ON in the sg API to be hit
      when CONFIG_DEBUG_SG is enabled.
      
      v1: added sg_init_table() calls wherever needed.
      
      v2:
      - Patch1 adds a new helper function, sg_init_marker(), to the sg API.
      - Patch2 uses sg_init_marker() and sg_init_table() in the appropriate places.
      
      Background:
      While reviewing v1, John Fastabend raised a valid point about the
      unnecessary memset in sg_init_table(): sockmap uses an sg table that
      is embedded in a struct, and since the enclosing struct is already
      zeroed out, the memset in sg_init_table() is redundant.
      
      So Daniel Borkmann suggested defining another static inline function
      in scatterlist.h that only initializes sg_magic, and calling it from
      sg_init_table(). Following this suggestion, I defined a function,
      sg_init_marker(), which sets sg_magic and calls sg_mark_end().
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: sockmap: initialize sg table entries properly · 6ef6d84c
      Prashant Bhole authored
      When CONFIG_DEBUG_SG is set, sg->sg_magic is initialized in
      sg_init_table() and verified by the sg API while navigating the
      table. We hit a BUG_ON when the magic check fails.
      
      In functions bpf_tcp_sendpage and bpf_tcp_sendmsg, the struct
      containing the scatterlist is already zeroed out. So, to avoid an
      extra memset, we use sg_init_marker() to initialize sg_magic.
      
      Fixed the following:
      - In bpf_tcp_sendpage: initialize sg using sg_init_marker()
      - In bpf_tcp_sendmsg: replace sg_init_table() with sg_init_marker()
      - In bpf_tcp_push: replace memset with sg_init_table() where a consumed
        sg entry needs to be re-initialized.
      Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
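This is a minimal userspace sketch of the failure mode and the fix. It relies on simplified stand-ins for the kernel types. `struct scatterlist` here keeps only `sg_magic`, `page_link`, `offset` and `length`. The container `struct msg_container`, the checker `sg_magic_ok()` and the allocator `msg_container_alloc()` are hypothetical. `SG_MAGIC`, `SG_CHAIN` and `SG_END` model the kernel's debug magic and its chain/end flag bits. A table that is only zeroed fails the magic check, which is where the kernel would hit BUG_ON. Zeroing plus `sg_init_marker()` passes the check without a second memset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified userspace stand-ins, not the kernel definitions. */
#define SG_MAGIC 0x87654321UL
#define SG_CHAIN 0x01UL
#define SG_END   0x02UL

struct scatterlist {
    unsigned long sg_magic;   /* the kernel only has this with CONFIG_DEBUG_SG */
    unsigned long page_link;  /* low bits carry the SG_CHAIN/SG_END flags */
    unsigned int offset;
    unsigned int length;
};

/* Hypothetical container that embeds an sg table, like sockmap's message buffer. */
#define MAX_SG 4
struct msg_container {
    int sg_start;
    int sg_end;
    struct scatterlist sg_data[MAX_SG];
};

/* Set the debug magic on every entry and mark the last entry as the end. */
static void sg_init_marker(struct scatterlist *sgl, unsigned int nents)
{
    for (unsigned int i = 0; i < nents; i++)
        sgl[i].sg_magic = SG_MAGIC;
    sgl[nents - 1].page_link |= SG_END;
    sgl[nents - 1].page_link &= ~SG_CHAIN;
}

/* Model of the CONFIG_DEBUG_SG check: where this returns false,
 * the kernel would hit BUG_ON() instead. */
static bool sg_magic_ok(const struct scatterlist *sgl, unsigned int nents)
{
    for (unsigned int i = 0; i < nents; i++)
        if (sgl[i].sg_magic != SG_MAGIC)
            return false;
    return true;
}

/* calloc() already zeroes the container, so only the markers still need setting. */
static struct msg_container *msg_container_alloc(void)
{
    struct msg_container *m = calloc(1, sizeof(*m));
    if (m)
        sg_init_marker(m->sg_data, MAX_SG);
    return m;
}
```

A container allocated with a bare `calloc()` fails `sg_magic_ok()`. One from `msg_container_alloc()` passes and has `SG_END` set only on its last entry.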
    • lib/scatterlist: add sg_init_marker() helper · f3851786
      Prashant Bhole authored
      sg_init_marker() initializes sg_magic in the sg table and calls
      sg_mark_end() on the last entry of the table. This can be used to
      avoid the memset in sg_init_table() when the scatterlist is already
      zeroed out.
      
      For example: when the scatterlist is embedded inside another struct
      and that container struct is zeroed out.
      Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
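The layering described in this patch can be sketched in userspace. This is a simplified model, not the kernel source, and it makes two assumptions:

- `struct scatterlist` is cut down to the fields involved.
- `CONFIG_DEBUG_SG` is defined locally, so the magic is always set.

The sketch shows that `sg_init_table()` reduces to `memset()` followed by `sg_init_marker()`. A table that is already zeroed therefore ends up identical after `sg_init_marker()` alone.

```c
#include <assert.h>
#include <string.h>

#define CONFIG_DEBUG_SG 1       /* model a debug build, where sg_magic is checked */
#define SG_MAGIC 0x87654321UL
#define SG_CHAIN 0x01UL
#define SG_END   0x02UL

struct scatterlist {
    unsigned long sg_magic;
    unsigned long page_link;
    unsigned int offset;
    unsigned int length;
};

static void sg_mark_end(struct scatterlist *sg)
{
    sg->page_link |= SG_END;     /* set the termination bit */
    sg->page_link &= ~SG_CHAIN;  /* clear a possible chain bit */
}

/* Only the work a zeroed table still needs: the magic and the end marker. */
static void sg_init_marker(struct scatterlist *sgl, unsigned int nents)
{
#ifdef CONFIG_DEBUG_SG
    for (unsigned int i = 0; i < nents; i++)
        sgl[i].sg_magic = SG_MAGIC;
#endif
    sg_mark_end(&sgl[nents - 1]);
}

/* The full initializer becomes memset() plus sg_init_marker(). */
static void sg_init_table(struct scatterlist *sgl, unsigned int nents)
{
    memset(sgl, 0, sizeof(*sgl) * nents);
    sg_init_marker(sgl, nents);
}
```

The two paths produce identical tables. One runs `sg_init_table()` on uninitialized memory. The other runs `sg_init_marker()` on memory that is already zeroed.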
  2. 29 Mar, 2018 20 commits
  3. 28 Mar, 2018 12 commits
    • Merge branch 'bpf-raw-tracepoints' · f6ef5658
      Daniel Borkmann authored
      Alexei Starovoitov says:
      
      ====================
      v7->v8:
      - moved 'u32 num_args' from 'struct tracepoint' into 'struct bpf_raw_event_map'.
        This increases memory overhead, but can be optimized/compressed later.
        Now there are zero changes in tracepoint.[ch].
      
      v6->v7:
      - adopted Steven's bpf_raw_tp_map section approach to find the tracepoint
        and its corresponding bpf probe function instead of the kallsyms approach;
        dropped the kernel_tracepoint_find_by_name() patch
      
      v5->v6:
      - avoid changing the semantics of the for_each_kernel_tracepoint() function;
        instead, introduce a kernel_tracepoint_find_by_name() helper
      
      v4->v5:
      - adopted Daniel's fancy REPEAT macro in bpf_trace.c in patch 6
      
      v3->v4:
      - adopted Linus's CAST_TO_U64 macro to cast any integer, pointer, or small
        struct to u64. That nicely reduced the size of patch 1
      
      v2->v3:
      - following Linus's suggestion, introduced generic COUNT_ARGS and CONCATENATE
        macros (or rather moved them from apparmor), which cleaned up patch 6
      - added patch 4 to refactor trace_iwlwifi_dev_ucode_error() from 17 args to 4.
        Now any tracepoint with more than 12 args will cause a build error.
      
      v1->v2:
      - simplified the API by combining bpf_raw_tp_open(name) + bpf_attach(prog_fd) into
        bpf_raw_tp_open(name, prog_fd), as suggested by Daniel.
        That simplifies bpf_detach as well, which is now a simple close() of the fd.
      - fixed a memory leak in the error path, which was spotted by Daniel.
      - fixed bpf_get_stackid() and bpf_perf_event_output() when called from raw tracepoints
      - added more tests
      - fixed the allyesconfig build failure caught by buildbot
      
      v1:
      This patch set is a different way to address the pressing need to access
      task_struct pointers in sched tracepoints from bpf programs.
      
      The first approach simply added these pointers to sched tracepoints:
      https://lkml.org/lkml/2017/12/14/753
      which Peter nacked.
      A few options were discussed, and eventually the discussion converged on
      bpf-specific tracepoint_probe_register() probe functions.
      Details here:
      https://lkml.org/lkml/2017/12/20/929
      
      Patch 1 is a kernel-wide cleanup that converts pass-struct-by-value
      tracepoint arguments into pass-struct-by-reference.
      
      Patches 2 and 3 are minor cleanups to fix the allyesconfig build.
      
      Patch 4 refactors trace_iwlwifi_dev_ucode_error from 17 to 4 args.
      
      Patch 5 introduces the COUNT_ARGS macro.
      
      Patch 6 introduces the BPF_RAW_TRACEPOINT API.
      Auto-cleanup and support for multiple concurrent users are must-have
      features of a tracing API. For bpf raw tracepoints it looks like:
        // load bpf prog with BPF_PROG_TYPE_RAW_TRACEPOINT type
        prog_fd = bpf_prog_load(...);
      
        // receive anon_inode fd for given bpf_raw_tracepoint
        // and attach bpf program to it
        raw_tp_fd = bpf_raw_tracepoint_open("xdp_exception", prog_fd);
      
      Pressing Ctrl-C in the tracing daemon or cmdline tool will automatically
      detach the bpf program, unload it and unregister the tracepoint probe.
      More details in patch 6.
      
      Patch 7 - trivial support in libbpf
      Patches 8, 9 - user space tests
      
      samples/bpf/test_overhead performance on 1 cpu (events per second):
      
      tracepoint    base   kprobe+bpf  tracepoint+bpf  raw_tracepoint+bpf
      task_rename   1.1M   769K        947K            1.0M
      urandom_read  789K   697K        750K            755K
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
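One way to read the test_overhead table is to express each bpf column as a percentage of the 'base' column, which we take to be the run with no bpf program attached. The sketch below only reuses the rounded figures from the table, so the percentages are approximate. The struct and function names are illustrative.

```c
#include <assert.h>

/* Figures copied from the test_overhead table (events per second, as rounded there). */
struct overhead_row {
    const char *tracepoint;
    double base;            /* assumed: no bpf program attached */
    double kprobe_bpf;
    double tracepoint_bpf;
    double raw_tp_bpf;
};

static const struct overhead_row rows[] = {
    { "task_rename",  1.1e6, 769e3, 947e3, 1.0e6 },
    { "urandom_read", 789e3, 697e3, 750e3, 755e3 },
};

/* Throughput of a bpf variant as a percentage of the baseline. */
static double pct_of_base(double base, double measured)
{
    return 100.0 * measured / base;
}
```

For task_rename this gives about 69.9% (kprobe+bpf), 86.1% (tracepoint+bpf) and 90.9% (raw_tracepoint+bpf) of the baseline. For urandom_read it gives about 88.3%, 95.1% and 95.7%. Raw tracepoints keep the most throughput in both rows.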
    • selftests/bpf: test for bpf_get_stackid() from raw tracepoints · 3bbe0869
      Alexei Starovoitov authored
      Similar to the traditional tracepoint test, add a bpf_get_stackid() test
      from raw tracepoints, and reduce the verbosity of the existing stackmap test.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • samples/bpf: raw tracepoint test · 4662a4e5
      Alexei Starovoitov authored
      Add an empty raw_tracepoint bpf program to test overhead, similar
      to the kprobe and traditional tracepoint tests.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • libbpf: add bpf_raw_tracepoint_open helper · a0fe3e57
      Alexei Starovoitov authored
      Add the bpf_raw_tracepoint_open(const char *name, int prog_fd) API to libbpf.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
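A helper like this is a thin wrapper around the `BPF_RAW_TRACEPOINT_OPEN` sys_bpf command. The sketch below assumes uapi headers whose `union bpf_attr` has the `raw_tracepoint` member (Linux 4.17 or later). The function name `raw_tp_open_sketch` is hypothetical and avoids a clash with libbpf's own `bpf_raw_tracepoint_open()`. This is an illustration of the idea, not libbpf's code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/bpf.h>

/* Hypothetical wrapper: fill in the raw_tracepoint part of bpf_attr
 * and issue BPF_RAW_TRACEPOINT_OPEN through the bpf(2) syscall. */
static int raw_tp_open_sketch(const char *name, int prog_fd)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.raw_tracepoint.name = (uint64_t)(uintptr_t)name;
    attr.raw_tracepoint.prog_fd = (uint32_t)prog_fd;

    /* On success the kernel returns an anon_inode fd; closing it detaches the program. */
    return (int)syscall(__NR_bpf, BPF_RAW_TRACEPOINT_OPEN, &attr, sizeof(attr));
}
```

Pass it a `prog_fd` obtained from `BPF_PROG_LOAD` for a `BPF_PROG_TYPE_RAW_TRACEPOINT` program, and it should return the attached fd. With an invalid fd, or without the needed privileges, it returns -1 and sets `errno`.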
    • bpf: introduce BPF_RAW_TRACEPOINT · c4f6699d
      Alexei Starovoitov authored
      Introduce the BPF_PROG_TYPE_RAW_TRACEPOINT bpf program type to access
      kernel-internal tracepoint arguments in their raw form.
      
      From the bpf program's point of view, access to the arguments looks like:
      struct bpf_raw_tracepoint_args {
             __u64 args[0];
      };
      
      int bpf_prog(struct bpf_raw_tracepoint_args *ctx)
      {
        // the program can read args[N], where N depends on the tracepoint
        // and is statically verified at program load+attach time
        return 0;
      }
      
      The kprobe+bpf infrastructure allows programs to access function arguments.
      This feature allows programs to access raw tracepoint arguments.
      
      Similar to the proposed 'dynamic ftrace events', there are no ABI guarantees
      about what the tracepoint arguments are or what they mean.
      The program needs to type-cast args properly and use the bpf_probe_read()
      helper to access struct fields when an argument is a pointer.
      
      For every tracepoint, a __bpf_trace_##call function is generated.
      In assembly it looks like:
      (gdb) disassemble __bpf_trace_xdp_exception
      Dump of assembler code for function __bpf_trace_xdp_exception:
         0xffffffff81132080 <+0>:     mov    %ecx,%ecx
         0xffffffff81132082 <+2>:     jmpq   0xffffffff811231f0 <bpf_trace_run3>
      
      where
      
      TRACE_EVENT(xdp_exception,
              TP_PROTO(const struct net_device *dev,
                       const struct bpf_prog *xdp, u32 act),
      
      The assembly snippet above casts the 32-bit 'act' field to 'u64'
      to pass it into bpf_trace_run3(), while the 'dev' and 'xdp' args are passed as-is.
      Each of the ~500 __bpf_trace_*() functions is only 5-10 bytes long,
      and in total this approach adds 7k bytes to .text.
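The calling convention can be illustrated with a userspace model of the path from a tracepoint call to the program's `ctx->args[]`. Every name here is a hypothetical stand-in:

- `bpf_trace_xdp_exception_model()` plays the role of `__bpf_trace_xdp_exception()`.
- `trace_run3_model()` plays the role of `bpf_trace_run3()`.
- The small structs replace `struct net_device` and `struct bpf_prog`.

Pointers pass through as integers, and the 32-bit `act` is zero-extended to 64 bits, which is what `mov %ecx,%ecx` does. A real bpf program could not dereference `args[0]` directly; it would need `bpf_probe_read()`.

```c
#include <assert.h>
#include <stdint.h>

/* What the program sees: a flat array of u64 arguments. */
struct raw_tp_args_model {
    uint64_t args[3];
};

typedef uint64_t (*prog_fn)(const struct raw_tp_args_model *ctx);

/* A single attached program keeps the model small. */
static prog_fn attached_prog;

/* Stand-in for bpf_trace_run3(): pack three u64 values and run the program. */
static uint64_t trace_run3_model(uint64_t a0, uint64_t a1, uint64_t a2)
{
    struct raw_tp_args_model ctx = { .args = { a0, a1, a2 } };
    return attached_prog ? attached_prog(&ctx) : 0;
}

struct net_device_model { int ifindex; };
struct bpf_prog_model { int id; };

/* Stand-in for __bpf_trace_xdp_exception(): pointers pass through as
 * integers, and the 32-bit act is zero-extended to 64 bits. */
static uint64_t bpf_trace_xdp_exception_model(const struct net_device_model *dev,
                                              const struct bpf_prog_model *xdp,
                                              uint32_t act)
{
    return trace_run3_model((uint64_t)(uintptr_t)dev,
                            (uint64_t)(uintptr_t)xdp,
                            (uint64_t)act);
}

/* Example program: reads args[2] (act) and args[0] (dev). In userspace the
 * pointer can be dereferenced directly; a bpf program would use bpf_probe_read(). */
static uint64_t sample_prog(const struct raw_tp_args_model *ctx)
{
    const struct net_device_model *dev =
        (const struct net_device_model *)(uintptr_t)ctx->args[0];
    return ctx->args[2] * 1000 + (uint64_t)dev->ifindex;
}
```

With `attached_prog = sample_prog`, a call with `act = 3` and `ifindex = 7` returns 3007. `act = 0xffffffff` arrives in `args[2]` as 4294967295, not as a sign-extended value.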
      
      This approach gives the lowest possible overhead
      while calling trace_xdp_exception() from kernel C code and
      transitioning into bpf land.
      Since tracepoint+bpf is used at rates of 1M+ events per second,
      this is a valuable optimization.
      
      A new BPF_RAW_TRACEPOINT_OPEN sys_bpf command is introduced
      that returns an anon_inode FD of a 'bpf-raw-tracepoint' object.
      
      The user-space side looks like:
      // load bpf prog with BPF_PROG_TYPE_RAW_TRACEPOINT type
      prog_fd = bpf_prog_load(...);
      // receive anon_inode fd for given bpf_raw_tracepoint with prog attached
      raw_tp_fd = bpf_raw_tracepoint_open("xdp_exception", prog_fd);
      
      Pressing Ctrl-C in a tracing daemon or cmdline tool that uses this feature
      will automatically detach the bpf program, unload it and
      unregister the tracepoint probe.
      
      On the kernel side, the __bpf_raw_tp_map section, which holds pointers to
      each tracepoint definition and its __bpf_trace_*() probe function, is used
      to find the tracepoint named "xdp_exception" and the
      corresponding __bpf_trace_xdp_exception() probe function.
      Both are passed to tracepoint_probe_register() to connect the probe
      to the tracepoint.
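The lookup step can be modeled as a linear search over a table of name/probe pairs. In this sketch an ordinary array stands in for the `__bpf_raw_tp_map` linker section. The struct, its fields, the probe functions and the entry `hypothetical_event` are illustrative assumptions, not kernel definitions. `num_args` echoes the v8 change that moved it into `struct bpf_raw_event_map`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one entry of the __bpf_raw_tp_map section. */
struct raw_event_map_model {
    const char *tp_name;      /* the kernel entry points at the tracepoint itself */
    void (*bpf_func)(void);   /* stands in for __bpf_trace_<name>() */
    unsigned int num_args;
};

static void probe_xdp_exception(void) { }
static void probe_hypothetical_event(void) { }

/* An ordinary array instead of a linker section. */
static const struct raw_event_map_model raw_tp_map[] = {
    { "xdp_exception",      probe_xdp_exception,      3 },
    { "hypothetical_event", probe_hypothetical_event, 1 },
};

/* Find the tracepoint/probe pair that would be handed to
 * tracepoint_probe_register(); NULL if the name is unknown. */
static const struct raw_event_map_model *find_raw_tp(const char *name)
{
    size_t n = sizeof(raw_tp_map) / sizeof(raw_tp_map[0]);

    for (size_t i = 0; i < n; i++)
        if (strcmp(raw_tp_map[i].tp_name, name) == 0)
            return &raw_tp_map[i];
    return NULL;
}
```

A linear scan keeps the sketch simple. The lookup happens at open time, not on every tracepoint hit, so its cost does not affect the per-event overhead.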
      
      The addition of bpf_raw_tracepoint doesn't interfere with the ftrace and perf
      tracepoint mechanisms; perf_event_open() can be used in parallel
      on the same tracepoint.
      Multiple bpf_raw_tracepoint_open("xdp_exception", prog_fd) calls are permitted,
      each with its own bpf program. The kernel will execute
      all tracepoint probes and all attached bpf programs.
      
      In the future bpf_raw_tracepoints can be extended with
      query/introspection logic.
      
      __bpf_raw_tp_map section logic was contributed by Steven Rostedt
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • macro: introduce COUNT_ARGS() macro · cf14f27f
      Alexei Starovoitov authored
      Move the COUNT_ARGS() macro from apparmor to a generic header and extend it
      to count up to twelve.
      
      COUNT() was an alternative name for this logic, but it's used for a
      different purpose in many other places.
      
      The same applies to the CONCATENATE() macro.
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
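The sketch below shows how argument-counting and token-pasting macros of this kind work. It uses standard `__VA_ARGS__` rather than GNU named-variadic parameters. The helper names `COUNT_ARGS_PICK` and `CONCAT_RAW` and the `demo_run*` functions are hypothetical. The count comes from appending 12..0 after the real arguments so that a fixed parameter slot holds the argument count.

This version handles 1 to 12 arguments. A zero-argument count would need the GNU `, ##` comma-deletion extension, so it is left out. Pasting the count onto a prefix with `CONCATENATE()` is how a name such as `bpf_trace_run3` can be derived from an argument list.

```c
#include <assert.h>

/* Returns the 13th argument. Appending 12..0 after the real arguments makes
 * that slot hold the number of real arguments (1..12); the trailing 0 keeps
 * the variadic part non-empty. */
#define COUNT_ARGS_PICK(_1, _2, _3, _4, _5, _6, _7, _8, _9, _10, _11, _12, _n, ...) _n
#define COUNT_ARGS(...) \
    COUNT_ARGS_PICK(__VA_ARGS__, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)

/* Two levels, so that an argument such as COUNT_ARGS(...) expands before pasting. */
#define CONCAT_RAW(a, b) a ## b
#define CONCATENATE(a, b) CONCAT_RAW(a, b)

/* Hypothetical targets, analogous to picking one of bpf_trace_run1()..bpf_trace_run12(). */
static int demo_run1(void) { return 1; }
static int demo_run3(void) { return 3; }

#define DEMO_RUN(...) CONCATENATE(demo_run, COUNT_ARGS(__VA_ARGS__))()
```

`DEMO_RUN(a, b, c)` expands to `demo_run3()`. The arguments themselves are never evaluated; only their count selects the function.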
    • net/wireless/iwlwifi: fix iwlwifi_dev_ucode_error tracepoint · 4fe43c2c
      Alexei Starovoitov authored
      Fix the iwlwifi_dev_ucode_error tracepoint to pass a pointer to a table
      instead of all 17 arguments by value.
      dvm/main.c and mvm/utils.c each define 'struct iwl_error_event_table'
      with very similar yet subtly different fields and offsets.
      The tracepoint is still common and uses the definition of 'struct iwl_error_event_table'
      from dvm/commands.h while copying fields.
      In the long term, this tracepoint should probably be split into two.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • net/mac802154: disambiguate mac80215 vs mac802154 trace events · 14624a93
      Alexei Starovoitov authored
      Two trace events are defined with the same name, and both are unused.
      They conflict in the allyesconfig build, so rename one of them.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • net/mediatek: disambiguate mt76 vs mt7601u trace events · d992ee6c
      Alexei Starovoitov authored
      Two trace events are defined with the same name, and both are unused.
      They conflict in the allyesconfig build, so rename one of them.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • treewide: remove large struct-pass-by-value from tracepoint arguments · c1055475
      Alexei Starovoitov authored
      - fix trace_hfi1_ctxt_info() to pass a large struct by reference instead of by value
      - convert 'type array[]' tracepoint arguments into 'type *array',
        since the compiler will warn that sizeof('type array[]') == sizeof('type *array')
        and the latter should be used instead
      
      The CAST_TO_U64 macro in a later patch will enforce that tracepoint
      arguments can only be integers, pointers, or small structures of up to 8 bytes.
      Larger structures should be passed by reference.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
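The size restriction on tracepoint arguments can be enforced at compile time with GNU C extensions: `__typeof__`, `__builtin_choose_expr` and statement expressions, all supported by GCC and Clang. This is an illustrative model of the idea, not necessarily the exact macro from the later patch. It picks an unsigned integer type of the argument's size (1, 2, 4 or 8 bytes) and copies the bits into it. Any other size selects `void`, so declaring the temporary fails to compile.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Unsigned integer type of exactly `size` bytes; any other size yields void. */
#define UINTTYPE(size)                                          \
    __typeof__(__builtin_choose_expr((size) == 1, (uint8_t)1,   \
               __builtin_choose_expr((size) == 2, (uint16_t)2,  \
               __builtin_choose_expr((size) == 4, (uint32_t)3,  \
               __builtin_choose_expr((size) == 8, (uint64_t)4,  \
                                     (void)5)))))

/* Copy the argument's bits into that type, then widen to 64 bits.
 * For an unsupported size, `void dst_` is a compile-time error. */
#define CAST_TO_U64(x) ({                   \
    __typeof__(x) src_ = (x);               \
    UINTTYPE(sizeof(x)) dst_;               \
    memcpy(&dst_, &src_, sizeof(dst_));     \
    (uint64_t)dst_; })
```

Passing a 16-byte struct to `CAST_TO_U64()` is rejected at compile time, which is why such arguments have to become pointers. The result is the raw bits zero-extended, so an `int8_t` of -1 becomes 0xff.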
    • bpf: Add sock_ops R/W access to ipv4 tos · 6f5c39fa
      Nikita V. Shirokov authored
      Sample usage for tos ...
      
        bpf_getsockopt(skops, SOL_IP, IP_TOS, &v, sizeof(v))
      
      ... where skops is a pointer to the ctx (struct bpf_sock_ops).
      Signed-off-by: Nikita V. Shirokov <tehnerd@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
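`bpf_getsockopt()` takes the same level/optname pair as the regular socket API, so the option exposed here is `IP_TOS` at level `SOL_IP`. The code below is a userspace analogue only, not a sock_ops program, which would run in the kernel against `skops`. It sets `IP_TOS` on an ordinary UDP socket and reads it back. `tos_round_trip()` is a hypothetical helper. `SOL_IP` falls back to `IPPROTO_IP` if the libc headers do not define it; both are 0 on Linux.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SOL_IP
#define SOL_IP IPPROTO_IP   /* same level value on Linux */
#endif

/* Set IP_TOS on a UDP socket and read it back.
 * Returns the value read, or -1 if any step fails. */
static int tos_round_trip(int tos)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    int v = tos;
    socklen_t len = sizeof(v);
    int ret = -1;
    if (setsockopt(fd, SOL_IP, IP_TOS, &v, sizeof(v)) == 0 &&
        getsockopt(fd, SOL_IP, IP_TOS, &v, &len) == 0)
        ret = v;
    close(fd);
    return ret;
}
```

On a typical Linux host, `tos_round_trip(0x10)` returns 0x10. In a sandbox that forbids sockets it returns -1.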
    • samples/bpf: fix spelling mistake: "revieve" -> "receive" · 20cfb7a0
      Colin Ian King authored
      Trivial fix for a spelling mistake in error message text.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  4. 27 Mar, 2018 1 commit
  5. 26 Mar, 2018 3 commits
  6. 23 Mar, 2018 1 commit