1. 18 Dec, 2018 3 commits
    • bpf: enable cgroup local storage map pretty print with kind_flag · ffa0c1cf
      Yonghong Song authored
      Commit 970289fc0a83 ("bpf: add bpffs pretty print for cgroup
      local storage maps") added bpffs pretty print for cgroup
      local storage maps. The commit worked for struct without kind_flag
      set.
      
      This patch refactors the code so that pretty print also works
      for structs with kind_flag set.
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: btf: fix struct/union/fwd types with kind_flag · 9d5f9f70
      Yonghong Song authored
      This patch fixed two issues with BTF. One is related to
      struct/union bitfield encoding and the other is related to
      forward type.
      
      Issue #1 and solution:
      ======================
      
      Current btf encoding of bitfield follows what pahole generates.
      For each bitfield, pahole will duplicate the type chain and
      put the bitfield size at the final int or enum type.
      Since the BTF enum type cannot encode bit size,
      pahole works around the issue by generating
      an int type whenever the enum bit size is not 32.
      
      For example,
        -bash-4.4$ cat t.c
        typedef int ___int;
        enum A { A1, A2, A3 };
        struct t {
          int a[5];
          ___int b:4;
          volatile enum A c:4;
        } g;
        -bash-4.4$ gcc -c -O2 -g t.c
      The current kernel supports the following BTF encoding:
        $ pahole -JV t.o
        [1] TYPEDEF ___int type_id=2
        [2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
        [3] ENUM A size=4 vlen=3
              A1 val=0
              A2 val=1
              A3 val=2
        [4] STRUCT t size=24 vlen=3
              a type_id=5 bits_offset=0
              b type_id=9 bits_offset=160
              c type_id=11 bits_offset=164
        [5] ARRAY (anon) type_id=2 index_type_id=2 nr_elems=5
        [6] INT sizetype size=8 bit_offset=0 nr_bits=64 encoding=(none)
        [7] VOLATILE (anon) type_id=3
        [8] INT int size=1 bit_offset=0 nr_bits=4 encoding=(none)
        [9] TYPEDEF ___int type_id=8
        [10] INT (anon) size=1 bit_offset=0 nr_bits=4 encoding=SIGNED
        [11] VOLATILE (anon) type_id=10
      
      Two issues are in the above:
        . by changing enum type to int, we lost the original
          type information and this will not be ideal later
          when we try to convert BTF to a header file.
        . the type duplication for bitfields will cause
          BTF bloat. Duplicated types cannot be deduplicated
          later if the bitfield size is different.
      
      To fix this issue, this patch implemented a compatible
      change for BTF struct type encoding:
        . the bit 31 of struct_type->info, previously reserved,
          now is used to indicate whether bitfield_size is
          encoded in btf_member or not.
        . if bit 31 of struct_type->info is set,
          btf_member->offset will encode like:
            bit 0 - 23: bit offset
            bit 24 - 31: bitfield size
          if bit 31 is not set, the old behavior is preserved:
            bit 0 - 31: bit offset
      
      So if the struct contains a bit field, the maximum bit offset
      will be reduced to (2^24 - 1) instead of MAX_UINT. The maximum
      bitfield size will be 255 (2^8 - 1), which is enough for today as
      the largest bitfield a compiler can generate is 128 bits, where
      the int128 type is supported.
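
      The layout described above can be sketched in a few lines of C.
      This is only an illustration of the bit arithmetic, not code from
      the patch: the helper names (info_kind_flag, member_bit_offset,
      member_bitfield_size, pack_member_offset) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 31 of struct_type->info: set when members carry a bitfield size. */
static inline int info_kind_flag(uint32_t info)
{
	return (int)(info >> 31);
}

/* Bit offset of a member: low 24 bits with kind_flag, all 32 bits without. */
static inline uint32_t member_bit_offset(uint32_t info, uint32_t offset)
{
	return info_kind_flag(info) ? (offset & 0xffffffu) : offset;
}

/* Bitfield size: bits 24-31 with kind_flag; 0 (not encoded here) without. */
static inline uint32_t member_bitfield_size(uint32_t info, uint32_t offset)
{
	return info_kind_flag(info) ? (offset >> 24) : 0;
}

/* Pack a member offset for a struct whose kind_flag is set. */
static inline uint32_t pack_member_offset(uint32_t bitfield_size,
					  uint32_t bit_offset)
{
	assert(bitfield_size <= 0xffu);   /* only 8 bits are available */
	assert(bit_offset <= 0xffffffu);  /* only 24 bits are available */
	return (bitfield_size << 24) | bit_offset;
}
```

      For member c of struct t (bitfield size 4, bit offset 164), the
      packed value would be 0x040000a4 under this sketch.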
      
      This kernel patch intends to support the new BTF encoding:
        $ pahole -JV t.o
        [1] TYPEDEF ___int type_id=2
        [2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
        [3] ENUM A size=4 vlen=3
              A1 val=0
              A2 val=1
              A3 val=2
        [4] STRUCT t kind_flag=1 size=24 vlen=3
              a type_id=5 bitfield_size=0 bits_offset=0
              b type_id=1 bitfield_size=4 bits_offset=160
              c type_id=7 bitfield_size=4 bits_offset=164
        [5] ARRAY (anon) type_id=2 index_type_id=2 nr_elems=5
        [6] INT sizetype size=8 bit_offset=0 nr_bits=64 encoding=(none)
        [7] VOLATILE (anon) type_id=3
      
      Issue #2 and solution:
      ======================
      
      Current forward type in BTF does not specify whether the original
      type is struct or union. This will not work for type pretty print
      and BTF-to-header-file conversion as struct/union must be specified.
        $ cat tt.c
        struct t;
        union u;
        int foo(struct t *t, union u *u) { return 0; }
        $ gcc -c -g -O2 tt.c
        $ pahole -JV tt.o
        [1] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
        [2] FWD t type_id=0
        [3] PTR (anon) type_id=2
        [4] FWD u type_id=0
        [5] PTR (anon) type_id=4
      
      To fix this issue, similar to issue #1, type->info bit 31
      is used. If the bit is set, it is a union type. Otherwise, it is
      a struct type.
      
        $ pahole -JV tt.o
        [1] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
        [2] FWD t kind_flag=0 type_id=0
        [3] PTR (anon) kind_flag=0 type_id=2
        [4] FWD u kind_flag=1 type_id=0
        [5] PTR (anon) kind_flag=0 type_id=4
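
      A consumer such as a pretty printer or header generator could
      pick the keyword for a FWD type roughly as follows. This is a
      sketch only; fwd_keyword is a hypothetical helper name and not
      part of the patch.

```c
#include <stdint.h>

/*
 * Return the C keyword for a FWD type given its type->info word:
 * bit 31 set means union, bit 31 clear means struct.
 */
static inline const char *fwd_keyword(uint32_t info)
{
	return (info >> 31) ? "union" : "struct";
}
```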
      
      Pahole/LLVM change:
      ===================
      
      The new kind_flag functionality has been implemented in pahole
      and llvm:
        https://github.com/yonghong-song/pahole/tree/bitfield
        https://github.com/yonghong-song/llvm/tree/bitfield
      
      Note that pahole hasn't implemented func/func_proto kind
      and .BTF.ext. So to print function signature with bpftool,
      the llvm compiler should be used.
      
      Fixes: 69b693f0 ("bpf: btf: Introduce BPF Type Format (BTF)")
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: btf: refactor btf_int_bits_seq_show() · f97be3ab
      Yonghong Song authored
      Refactor function btf_int_bits_seq_show() by creating
      function btf_bitfield_seq_show(), which has no dependence
      on btf and btf_type. The function btf_bitfield_seq_show()
      will be used in a later patch to directly dump bitfield member values.
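
      To illustrate what dumping a bitfield member value involves, here
      is a userspace sketch that pulls an unsigned bitfield out of a
      byte buffer given only a bit offset and a bit count. It is not the
      kernel function: extract_bitfield is a hypothetical name, it
      assumes a little-endian bitfield layout, and it is limited to
      fields of at most 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Extract an unsigned bitfield of nr_bits bits that starts bits_offset
 * bits into data. Assumes little-endian layout and 1 <= nr_bits <= 32.
 */
static uint32_t extract_bitfield(const uint8_t *data, uint32_t bits_offset,
				 uint8_t nr_bits)
{
	const uint8_t *p = data + bits_offset / 8; /* first byte touched */
	uint32_t shift = bits_offset % 8;          /* bit position in it */
	uint32_t nbytes = (shift + nr_bits + 7) / 8;
	uint64_t acc = 0;
	uint32_t i;

	assert(nr_bits >= 1 && nr_bits <= 32);

	/* Assemble only the bytes that hold the field, low byte first. */
	for (i = 0; i < nbytes; i++)
		acc |= (uint64_t)p[i] << (8 * i);

	/* Drop the bits below the field, then mask off the bits above it. */
	acc >>= shift;
	return (uint32_t)(acc & ((1ULL << nr_bits) - 1));
}
```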
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  2. 17 Dec, 2018 1 commit
  3. 15 Dec, 2018 11 commits
  4. 14 Dec, 2018 3 commits
    • Merge branch 'bpf_line_info-in-verifier' · eb415c98
      Alexei Starovoitov authored
      Martin Lau says:
      
      ====================
      This patch set provides bpf_line_info during the verifier's verbose
      log.  Please see individual patch for details.
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: verbose log bpf_line_info in verifier · d9762e84
      Martin KaFai Lau authored
      This patch adds bpf_line_info to the verifier's verbose log.
      It can give error context for debugging purposes.
      
      ~~~~~~~~~~
      Here is the verbose log for backedge:
      	while (a) {
      		a += bpf_get_smp_processor_id();
      		bpf_trace_printk(fmt, sizeof(fmt), a);
      	}
      
      ~> bpftool prog load ./test_loop.o /sys/fs/bpf/test_loop type tracepoint
      13: while (a) {
      3: a += bpf_get_smp_processor_id();
      back-edge from insn 13 to 3
      
      ~~~~~~~~~~
      Here is the verbose log for invalid pkt access:
      Modification to test_xdp_noinline.c:
      
      	data = (void *)(long)xdp->data;
      	data_end = (void *)(long)xdp->data_end;
      /*
      	if (data + 4 > data_end)
      		return XDP_DROP;
      */
      	*(u32 *)data = dst->dst;
      
      ~> bpftool prog load ./test_xdp_noinline.o /sys/fs/bpf/test_xdp_noinline type xdp
      ; data = (void *)(long)xdp->data;
      224: (79) r2 = *(u64 *)(r10 -112)
      225: (61) r2 = *(u32 *)(r2 +0)
      ; *(u32 *)data = dst->dst;
      226: (63) *(u32 *)(r2 +0) = r1
      invalid access to packet, off=0 size=4, R2(id=0,off=0,r=0)
      R2 offset is outside of the packet
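
      The verifier rejects the store above because the bounds check was
      commented out. The shape of the check it expects can be sketched
      in plain userspace C. This is not BPF program code: write_dst and
      the SKETCH_* constants are hypothetical names, with values chosen
      to mirror XDP_DROP and XDP_PASS.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SKETCH_DROP = 1, SKETCH_PASS = 2 }; /* mirror XDP_DROP / XDP_PASS */

/*
 * Store a 4-byte value at the start of the packet, but only after
 * proving that at least 4 bytes lie between data and data_end.
 */
static int write_dst(void *data, void *data_end, uint32_t dst)
{
	size_t avail = (size_t)((uint8_t *)data_end - (uint8_t *)data);

	if (avail < sizeof(dst))
		return SKETCH_DROP;     /* too short: refuse to touch it */

	memcpy(data, &dst, sizeof(dst));
	return SKETCH_PASS;
}
```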
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Create a new btf_name_by_offset() for non type name use case · 23127b33
      Martin KaFai Lau authored
      The current btf_name_by_offset() is returning "(anon)" type name for
      the offset == 0 case and "(invalid-name-offset)" for the out-of-bound
      offset case.
      
      It fits well for the internal BTF verbose log purpose which
      is focusing on type.  For example,
      offset == 0 => "(anon)" => anonymous type/name.
      Returning non-NULL for the bad offset case is needed
      during the BTF verification process because the BTF verifier may
      complain about another field first before discovering the name_off
      is invalid.
      
      However, it may not be ideal for the newer use case, where the
      string is not necessarily a type name.  For example, when logging
      line_info in the BPF verifier in the next patch, it is better to
      log an empty src line instead of logging "(anon)".
      
      The existing btf_name_by_offset() is renamed to __btf_name_by_offset()
      and made static to btf.c.

      A new btf_name_by_offset() is added for generic context usage.  It
      returns "\0" for name_off == 0 (note that btf->strings[0] is "\0")
      and NULL for invalid offset.  It allows the caller to decide
      what is the best output in its context.
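
      The difference between the two lookups can be sketched as
      follows. This is a simplified userspace model, not the kernel
      code: struct str_section, name_by_offset and type_name_by_offset
      are hypothetical names that stand in for the BTF string section
      and the two functions described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the BTF string section. */
struct str_section {
	const char *strings; /* NUL-separated names; strings[0] is '\0' */
	uint32_t size;       /* total number of bytes in strings */
};

/* Generic lookup: empty string for offset 0, NULL when out of bounds. */
static const char *name_by_offset(const struct str_section *s,
				  uint32_t offset)
{
	if (offset < s->size)
		return &s->strings[offset];
	return NULL;
}

/* Type-name lookup for verbose logs: never returns NULL. */
static const char *type_name_by_offset(const struct str_section *s,
				       uint32_t offset)
{
	if (!offset)
		return "(anon)";
	if (offset < s->size)
		return &s->strings[offset];
	return "(invalid-name-offset)";
}
```

      With the generic variant, a caller such as a line_info logger can
      print an empty line for offset 0 and handle NULL explicitly.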
      
      The new btf_name_by_offset() overlaps with btf_name_offset_valid().
      Hence, btf_name_offset_valid() is removed from btf.h to keep the btf.h API
      minimal.  The existing btf_name_offset_valid() usage in btf.c could also be
      replaced later.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  5. 13 Dec, 2018 11 commits
  6. 12 Dec, 2018 5 commits
  7. 11 Dec, 2018 4 commits
    • bpf: fix up uapi helper description and sync bpf header with tools · 0bd72117
      Daniel Borkmann authored
      Minor markup fixup from bpf-next into net-next merge in the BPF helper
      description of bpf_sk_lookup_tcp() and bpf_sk_lookup_udp(). Also sync
      up the copy of bpf.h from tooling infrastructure.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · addb0679
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2018-12-11
      
      The following pull-request contains BPF updates for your *net-next* tree.
      
      It has three minor merge conflicts, resolutions:
      
      1) tools/testing/selftests/bpf/test_verifier.c
      
       Take first chunk with alignment_prevented_execution.
      
      2) net/core/filter.c
      
        [...]
        case bpf_ctx_range_ptr(struct __sk_buff, flow_keys):
        case bpf_ctx_range(struct __sk_buff, wire_len):
              return false;
        [...]
      
      3) include/uapi/linux/bpf.h
      
        Take the second chunk for the two cases each.
      
      The main changes are:
      
      1) Add support for BPF line info via BTF and extend libbpf as well
         as bpftool's program dump to annotate output with BPF C code to
         facilitate debugging and introspection, from Martin.
      
      2) Add support for BPF_ALU | BPF_ARSH | BPF_{K,X} in interpreter
         and all JIT backends, from Jiong.
      
      3) Improve BPF test coverage on archs with no efficient unaligned
         access by adding an "any alignment" flag to the BPF program load
         to forcefully disable verifier alignment checks, from David.
      
      4) Add a new bpf_prog_test_run_xattr() API to libbpf which allows for
         proper use of BPF_PROG_TEST_RUN with data_out, from Lorenz.
      
      5) Extend tc BPF programs to use a new __sk_buff field called wire_len
         for more accurate accounting of packets going to wire, from Petar.
      
      6) Improve bpftool to allow dumping the trace pipe from it and add
         several improvements in bash completion and map/prog dump,
         from Quentin.
      
      7) Optimize arm64 BPF JIT to always emit movn/movk/movk sequence for
         kernel addresses and add a dedicated BPF JIT backend allocator,
         from Ard.
      
      8) Add a BPF helper function for IR remotes to report mouse movements,
         from Sean.
      
      9) Various cleanups in BPF prog dump e.g. to make UAPI bpf_prog_info
         member naming consistent with existing conventions, from Yonghong
         and Song.
      
      10) Misc cleanups and improvements in allowing to pass interface name
          via cmdline for xdp1 BPF example, from Matteo.
      
      11) Fix a potential segfault in BPF sample loader's kprobes handling,
          from Daniel T.
      
      12) Fix SPDX license in libbpf's README.rst, from Andrey.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • neighbor: gc_list changes should be protected by table lock · 8cc196d6
      David Ahern authored
      Adding neighbor entries to and removing them from the gc_list needs
      to be done while holding the table lock; a couple of places were
      missed in the original patch.
      
      Move the list_add_tail in neigh_alloc to ___neigh_create where the lock
      is already obtained. Since neighbor entries should rarely be moved
      to/from PERMANENT state, add lock/unlock around the gc_list changes in
      neigh_change_state rather than extending the lock hold around all
      neighbor updates.
      
      Fixes: 58956317 ("neighbor: Improve garbage collection")
      Reported-by: Andrei Vagin <avagin@gmail.com>
      Reported-by: syzbot+6cc2fd1d3bdd2e007363@syzkaller.appspotmail.com
      Reported-by: syzbot+35e87b87c00f386b041f@syzkaller.appspotmail.com
      Reported-by: syzbot+b354d1fb59091ea73c37@syzkaller.appspotmail.com
      Reported-by: syzbot+3ddead5619658537909b@syzkaller.appspotmail.com
      Reported-by: syzbot+424d47d5c456ce8b2bbe@syzkaller.appspotmail.com
      Reported-by: syzbot+e4d42eb35f6a27b0a628@syzkaller.appspotmail.com
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mlx5e-updates-2018-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 93698321
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5e-updates-2018-12-10 (gre)
      
      This patch set adds GRE offloading support to Mellanox ethernet driver.
      
      Patches 1-5 replace the existing egdev mechanism with the new TC indirect
      block binds mechanism that was introduced by Netronome:
      7f76fa36 ("net: sched: register callbacks for indirect tc block binds")
      
      Patches 6-9 add GRE offloading support along with some required
      refactoring work.
      
      Patch 10, Add netif_is_gretap()/netif_is_ip6gretap()
       - Changed the is_gretap_dev and is_ip6gretap_dev logic from structure
         comparison to string comparison of the rtnl_link_ops kind field.
      
      Patch 11, add GRE offloading support to mlx5.
      
      Patch 12 removes the egdev mechanism from TC as it is no longer used by
      any of the drivers.
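
      The string comparison mentioned for patch 10 can be sketched as
      below. This is a userspace model with hypothetical types
      (struct link_ops, struct net_dev) and helper names; it only shows
      the idea of matching on the link ops kind string rather than on
      the ops structure's address.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for rtnl_link_ops and net_device. */
struct link_ops {
	const char *kind;
};

struct net_dev {
	const struct link_ops *ops;
};

/* True when the device has link ops whose kind equals the given name. */
static bool dev_kind_is(const struct net_dev *dev, const char *kind)
{
	return dev->ops && dev->ops->kind &&
	       strcmp(dev->ops->kind, kind) == 0;
}

static bool dev_is_gretap(const struct net_dev *dev)
{
	return dev_kind_is(dev, "gretap");
}

static bool dev_is_ip6gretap(const struct net_dev *dev)
{
	return dev_kind_is(dev, "ip6gretap");
}
```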
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 10 Dec, 2018 2 commits