1. 20 Apr, 2018 1 commit
  2. 19 Apr, 2018 12 commits
    • Merge branch 'bpf-type-format' · 34df9d37
      Daniel Borkmann authored
      Martin KaFai Lau says:
      
      ====================
      This patch introduces BPF Type Format (BTF).
      
      BTF (BPF Type Format) is the metadata format that describes
      the data types of a BPF program/map.  Hence, it basically focuses
      on the C programming language, which modern BPF primarily
      uses.  The first use case is to provide a generic pretty-print
      capability for a BPF map.
      
      A modified pahole that can convert dwarf to BTF is here:
      
        https://github.com/iamkafai/pahole/tree/btf
      
      Please see the individual patches for details.
      
      v5:
      - Remove BTF_KIND_FLOAT and BTF_KIND_FUNC which are not
        currently used.  They can be added in the future.
        Some bpf_df_xxx() are removed together.
      - Add comment in patch 7 to clarify that the new bpffs_map_fops
        should not be extended further.
      
      v4:
      - Fix warning (remove unneeded semicolon)
      - Remove a redundant variable (nr_bytes) from btf_int_check_meta() in
        patch 1.  Caught by W=1.
      
      v3:
      - Rebase to bpf-next
      - Fix sparse warning (by adding static)
      - Add BTF header logging: btf_verifier_log_hdr()
      - Fix the alignment test on btf->type_off
      - Add tests for the BTF header
      - Lower the max BTF size to 16MB.  It should be enough
        for some time.  We could raise it later if it would
        be needed.
      
      v2:
      - Use kvfree where needed in patch 1 and 2
      - Also consider BTF_INT_OFFSET() in the btf_int_check_meta()
        in patch 1
      - Fix an incorrect goto target in map_create() during
        the btf-error-path in patch 7
      - re-org some local vars to keep the rev xmas tree in btf.c
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: btf: Add BTF tests · c0fa1b6c
      Martin KaFai Lau authored
      This patch tests the BTF loading, map_create with BTF
      and the changes in libbpf.
      
      -r: Raw tests that test raw crafted BTF data
      -f: Test LLVM compiled bpf prog with BTF data
      -g: Test BPF_OBJ_GET_INFO_BY_FD for btf_fd
      -p: Test pretty print
      
      The tools/testing/selftests/bpf/Makefile will probe
      for BTF support in llc and pahole before generating
      debug info (-g) and converting it to BTF.  You can supply
      BTF-capable binaries through the following make variables:
      LLC, BTF_PAHOLE and LLVM_OBJCOPY.
      
      LLC: The latest llc with -mattr=dwarfris support for the bpf target.
           It is only in the master of the llvm repo for now.
      BTF_PAHOLE: The modified pahole with BTF support:
      	    https://github.com/iamkafai/pahole/tree/btf
      	    To add a BTF section: "pahole -J bpf_prog.o"
      LLVM_OBJCOPY: Any llvm-objcopy should do
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Alexei Starovoitov <ast@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: btf: Add BTF support to libbpf · 8a138aed
      Martin KaFai Lau authored
      If the ".BTF" elf section exists, libbpf will try to create
      a btf_fd (through BPF_BTF_LOAD).  If that fails, it will still
      continue loading the bpf prog/map without the BTF.
      
      If the bpf_object has a BTF loaded, it will create a map with the btf_fd.
      libbpf will try to figure out the btf_key_id and btf_value_id of a map by
      finding the BTF type with name "<map_name>_key" and "<map_name>_value".
      If they cannot be found, it will continue without using the BTF.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Alexei Starovoitov <ast@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
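The name-based lookup described in the libbpf commit can be pictured with a small sketch. It is not libbpf's code: `map_type_name()`, `find_type_id()` and the flat name table are hypothetical, and the only parts taken from the commit message are the "<map_name>_key" and "<map_name>_value" naming rule and the fallback when nothing matches.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<map_name><suffix>" in out; returns 0 on success, -1 if it does not fit. */
static int map_type_name(const char *map_name, const char *suffix,
                         char *out, size_t out_len)
{
    int n;

    assert(map_name && suffix && out);
    n = snprintf(out, out_len, "%s%s", map_name, suffix);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}

/*
 * Hypothetical type table: names[i] is the name of type id i + 1.
 * Returns the matching type id, or 0 when no type has that name,
 * in which case the caller would continue without BTF.
 */
static unsigned int find_type_id(const char *const *names, unsigned int nr,
                                 const char *wanted)
{
    unsigned int i;

    for (i = 0; i < nr; i++)
        if (names[i] && strcmp(names[i], wanted) == 0)
            return i + 1;
    return 0;
}
```

A caller would build both names for a map, look each one up, and only pass btf_key_id and btf_value_id to map creation when both lookups return a non-zero id.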
    • bpf: btf: Sync bpf.h and btf.h to tools/ · 3bd86a84
      Martin KaFai Lau authored
      This patch syncs bpf.h and btf.h to tools/.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Alexei Starovoitov <ast@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: btf: Add pretty print support to the basic arraymap · a26ca7c9
      Martin KaFai Lau authored
      This patch adds pretty print support to the basic arraymap.
      Support for other bpf maps can be added later.
      
      This patch adds new attrs to the BPF_MAP_CREATE command to allow
      specifying the btf_fd, btf_key_id and btf_value_id.
      BPF_MAP_CREATE can then associate the BTF with the map if
      the map being created supports BTF.
      
      A map that supports BTF needs to implement two new map ops,
      map_seq_show_elem() and map_check_btf().  This patch has
      implemented these new map ops for the basic arraymap.
      
      It also adds file_operations, bpffs_map_fops, to the pinned
      map such that the pinned map can be opened and read.
      After that, the user has an intuitive way to do
      "cat bpffs/pathto/a-pinned-map" instead of getting
      an error.
      
      bpffs_map_fops should not be extended further to support
      other operations.  Other operations (e.g. write/key-lookup...)
      should be realized by the userspace tools (e.g. bpftool) through
      the BPF_OBJ_GET_INFO_BY_FD, map's lookup/update interface...etc.
      Follow up patches will allow the userspace to obtain
      the BTF from a map-fd.
      
      Here is a sample output when reading a pinned arraymap
      with the following map's value:
      
      struct map_value {
      	int count_a;
      	int count_b;
      };
      
      cat /sys/fs/bpf/pinned_array_map:
      
      0: {1,2}
      1: {3,4}
      2: {5,6}
      ...
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Alexei Starovoitov <ast@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
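The sample output in the arraymap commit ("0: {1,2}") can be reproduced in userspace with a few lines of C. This is only an illustration of the output shape. The kernel derives the layout from BTF through map_seq_show_elem(), whereas the hypothetical `format_elem()` below hard-codes the two fields of `struct map_value`.

```c
#include <assert.h>
#include <stdio.h>

/* Same value layout as the example in the commit message. */
struct map_value {
    int count_a;
    int count_b;
};

/*
 * Format one array element as "<index>: {<count_a>,<count_b>}\n".
 * Returns the number of characters snprintf() would have written.
 */
static int format_elem(char *out, size_t out_len, unsigned int index,
                       const struct map_value *v)
{
    assert(out && v);
    return snprintf(out, out_len, "%u: {%d,%d}\n",
                    index, v->count_a, v->count_b);
}
```

Calling `format_elem()` once per index over an array of `struct map_value` yields lines of the same form as the `cat` output quoted in the commit message.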
    • bpf: btf: Add BPF_OBJ_GET_INFO_BY_FD support to BTF fd · 60197cfb
      Martin KaFai Lau authored
      This patch adds BPF_OBJ_GET_INFO_BY_FD support to BTF fd.
      The original BTF data, which was used to create the BTF fd during
      the earlier BPF_BTF_LOAD call, will be returned.
      
      Userspace is expected to allocate a buffer, point info.info
      at it, and set info.info_len to the buffer size before
      calling BPF_OBJ_GET_INFO_BY_FD.
      
      The original BTF data is copied to the userspace buffer (info.info).
      Only up to the user-specified info.info_len bytes will be copied.
      
      On return, info.info_len is set to the original BTF data size.
      Userspace needs to check whether it is bigger than its allocated
      buffer size.  If it is, userspace should realloc with the
      kernel-returned info.info_len and call BPF_OBJ_GET_INFO_BY_FD again.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Alexei Starovoitov <ast@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
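The grow-and-retry protocol around info.info_len may be easier to follow as code. The sketch below does not call the kernel: `fake_get_btf_info()` is a hypothetical stand-in that mimics the described contract (copy at most the caller's length, then report the full size), and `fake_btf` is arbitrary filler, not valid BTF.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Arbitrary bytes standing in for the BTF data loaded earlier. */
static const unsigned char fake_btf[] = {
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
};

/*
 * Hypothetical stand-in for BPF_OBJ_GET_INFO_BY_FD on a BTF fd:
 * copies at most *len bytes into buf, then sets *len to the full size.
 */
static int fake_get_btf_info(void *buf, unsigned int *len)
{
    unsigned int full = sizeof(fake_btf);
    unsigned int n = *len < full ? *len : full;

    memcpy(buf, fake_btf, n);
    *len = full;
    return 0;
}

/* Fetch the whole blob, growing the buffer when the first guess is too small. */
static void *fetch_btf(unsigned int *out_len)
{
    unsigned int cap = 4; /* deliberately too small to force one retry */
    void *buf = malloc(cap);

    assert(out_len);
    if (!buf)
        return NULL;

    for (;;) {
        unsigned int len = cap;
        void *bigger;

        if (fake_get_btf_info(buf, &len)) {
            free(buf);
            return NULL;
        }
        if (len <= cap) {          /* everything fit */
            *out_len = len;
            return buf;
        }
        bigger = realloc(buf, len); /* use the reported size and retry */
        if (!bigger) {
            free(buf);
            return NULL;
        }
        buf = bigger;
        cap = len;
    }
}
```

The loop form, rather than a single retry, covers the case where the reported size changes between calls. Whether that can happen for a BTF fd is not stated in the commit message.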
    • bpf: btf: Add BPF_BTF_LOAD command · f56a653c
      Martin KaFai Lau authored
      This patch adds a BPF_BTF_LOAD command which
      1) loads and verifies the BTF (implemented in earlier patches)
      2) returns a BTF fd to userspace.  In the next patch, the
         BTF fd can be specified during BPF_MAP_CREATE.
      
      It is currently limited to CAP_SYS_ADMIN.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Alexei Starovoitov <ast@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: btf: Add pretty print capability for data with BTF type info · b00b8dae
      Martin KaFai Lau authored
      This patch adds pretty print capability for data with BTF type info.
      The current usage is to allow pretty print for a BPF map.
      
      The next few patches will allow a read() on a pinned map with BTF
      type info for its key and value.
      
      This patch uses the seq_printf() infra.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Alexei Starovoitov <ast@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: btf: Check members of struct/union · 179cde8c
      Martin KaFai Lau authored
      This patch checks a few things about a struct's members:
      
      1) The member has a valid size (e.g. a "const void" is invalid)
      2) The member's size (+ the member's offset) does not exceed
         the containing struct's size.
      3) The member's offset satisfies the alignment requirement
      
      The above can only be done after the needs_resolve member's type
      is resolved.  Hence, the above is done together in
      btf_struct_resolve().
      
      Each possible member's type (e.g. int, enum, modifier...) implements
      the check_member() ops which will be called from btf_struct_resolve().
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Alexei Starovoitov <ast@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
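The three member checks listed in the struct/union commit can be expressed as one predicate. This is a simplified sketch, not the kernel's check_member() implementation: `struct member_desc` is hypothetical, and sizes and offsets are in bytes here purely to keep the arithmetic readable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, already-resolved description of one struct member. */
struct member_desc {
    uint32_t offset; /* byte offset inside the struct */
    uint32_t size;   /* resolved size of the member's type, in bytes */
    uint32_t align;  /* required alignment, in bytes */
};

static bool member_ok(const struct member_desc *m, uint32_t struct_size)
{
    assert(m);

    /* 1) The member must have a valid size (a "const void" has none). */
    if (m->size == 0)
        return false;

    /* 3) The offset must satisfy the member's alignment requirement. */
    if (m->align == 0 || m->offset % m->align != 0)
        return false;

    /* 2) offset + size must fit in the struct; written to avoid u32 overflow. */
    if (m->offset > struct_size || m->size > struct_size - m->offset)
        return false;

    return true;
}
```

The bounds check subtracts instead of adding so that a very large size cannot wrap around and appear to fit.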
    • bpf: btf: Validate type reference · eb3f595d
      Martin KaFai Lau authored
      After collecting all btf_type in the first pass in an earlier patch,
      the second pass (in this patch) can validate the reference types
      (e.g. the referenced type exists and a type does not refer to itself).
      
      While checking the reference type, it also gathers other information (e.g.
      the size of an array).  This info will be useful in checking the
      struct's members in a later patch.  It will also be useful in doing
      pretty print later.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Alexei Starovoitov <ast@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
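As a rough picture of what the second pass has to rule out, the sketch below walks reference chains in a flat table and rejects dangling references and loops. The table layout and `refs_valid()` are hypothetical and intentionally naive (quadratic in the worst case). The kernel's resolver works differently and also records sizes along the way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * refs[id] is the type id that type `id` refers to, or 0 when the chain
 * ends there. Index 0 is treated as a reserved "no type" entry in this sketch.
 */
static bool refs_valid(const uint32_t *refs, uint32_t nr_types)
{
    uint32_t id;

    assert(refs);
    for (id = 1; id < nr_types; id++) {
        uint32_t cur = id;
        uint32_t steps = 0;

        while (cur != 0) {
            if (cur >= nr_types)    /* the referenced type does not exist */
                return false;
            if (++steps > nr_types) /* longer than any loop-free chain */
                return false;
            cur = refs[cur];
        }
    }
    return true;
}
```

A chain that visits more entries than the table holds must have revisited one, which is how the step counter detects both direct self-references and longer cycles.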
    • bpf: btf: Introduce BPF Type Format (BTF) · 69b693f0
      Martin KaFai Lau authored
      This patch introduces BPF Type Format (BTF).
      
      BTF (BPF Type Format) is the metadata format that describes
      the data types of a BPF program/map.  Hence, it basically focuses
      on the C programming language, which modern BPF primarily
      uses.  The first use case is to provide a generic pretty-print
      capability for a BPF map.
      
      BTF has its roots in CTF (Compact C-Type Format).  To simplify
      the handling of BTF data, BTF removes the differences between
      small and big type/struct-member.  Hence, BTF consistently uses u32
      instead of supporting both "one u16" and "two u32 (+padding)" in
      describing type and struct-member.
      
      It also raises the limit on the number of types (and functions)
      from 0x7fff to 0x7fffffff.
      
      Due to the above changes, the format is not compatible with CTF.
      Hence, BTF starts with a new BTF_MAGIC and version number.
      
      This patch does the first verification pass to the BTF.  The first
      pass checks:
      1. meta-data size (e.g. It does not go beyond the total btf's size)
      2. name_offset is valid
      3. Each BTF_KIND (e.g. int, enum, struct....) does its
         own check of its meta-data.
      
      Some other checks, like checking a struct's member is referring
      to a valid type, can only be done in the second pass.  The second
      verification pass will be implemented in the next patch.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Alexei Starovoitov <ast@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
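Of the first-pass checks listed in this commit, the name_offset one is the simplest to illustrate. The sketch assumes a string section made of NUL-terminated names and treats an offset as valid only if it lands inside the section and a terminator follows before the end. `name_offset_valid()` is a hypothetical helper, not the kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* strs/strs_len describe the string section; off is a candidate name offset. */
static bool name_offset_valid(const char *strs, uint32_t strs_len, uint32_t off)
{
    assert(strs);

    if (off >= strs_len)
        return false;

    /* The name must be terminated before the section ends. */
    return memchr(strs + off, '\0', strs_len - off) != NULL;
}
```

Checking for the terminator up front lets later code treat a validated offset as an ordinary C string without re-checking bounds.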
    • bpf: reserve xdp_frame size in xdp headroom · 97e19cce
      Jesper Dangaard Brouer authored
      Commit 6dfb970d ("xdp: avoid leaking info stored in frame data on
      page reuse") tried to allow the user/bpf_prog to (re)use the area used
      by xdp_frame (stored in the frame headroom) by clearing that area with
      memset when bpf_xdp_adjust_head gives the bpf_prog access to the
      headroom area.
      
      The mentioned commit had two bugs: (1) it didn't take
      bpf_xdp_adjust_meta into account, and (2) a combination of
      bpf_xdp_adjust_head calls, where xdp->data is moved into the xdp_frame
      section, can clear the xdp_frame area again for an area previously
      granted to the bpf_prog.
      
      After discussions with Daniel, we chose to implement a simpler
      solution to the problem, which is to reserve the headroom used by
      xdp_frame info.
      
      This also avoids the situation where bpf_prog is allowed to adjust/add
      headers, and then XDP_REDIRECT later drops the packet due to lack of
      headroom for the xdp_frame.  This would likely confuse the end-user.
      
      Fixes: 6dfb970d ("xdp: avoid leaking info stored in frame data on page reuse")
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
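The "reserve the headroom" approach amounts to refusing any head adjustment that would move the data start into the reserved region. The sketch below models that with plain byte offsets. `struct frame_sketch`, `adjust_head()` and the reserved size passed by the caller are all hypothetical; the real helper operates on xdp_buff pointers and uses the size of struct xdp_frame.

```c
#include <assert.h>
#include <stddef.h>

/* Offsets are measured in bytes from the start of the frame's memory. */
struct frame_sketch {
    size_t data;     /* where packet data currently starts */
    size_t data_end; /* one past the last packet byte */
};

/*
 * Move the data start by delta bytes (negative grows the headers).
 * Fails with -1, leaving the frame untouched, if the new start would
 * enter the reserved region or reach the end of the packet.
 */
static int adjust_head(struct frame_sketch *f, long delta, size_t reserved)
{
    long new_data;

    assert(f && f->data <= f->data_end);
    new_data = (long)f->data + delta;

    if (new_data < (long)reserved)
        return -1;
    if ((size_t)new_data >= f->data_end)
        return -1;

    f->data = (size_t)new_data;
    return 0;
}
```

Because the reserved bytes are never handed to the program, there is nothing to clear afterwards, and a later redirect cannot run out of room for the frame info.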
  3. 18 Apr, 2018 19 commits
  4. 17 Apr, 2018 8 commits
    • ipv6: send netlink notifications for manually configured addresses · a2d481b3
      Lorenzo Bianconi authored
      Send a netlink notification when userspace adds a manually configured
      address if DAD is enabled and the optimistic flag isn't set.
      Moreover, send RTM_DELADDR notifications for tentative addresses.
      
      Some userspace applications (e.g. NetworkManager) are interested in
      addr netlink events even while the address is still in the tentative
      state; however, events are not sent until the DAD process has completed.
      If the address is added and immediately removed, userspace listeners
      are not notified. This behaviour can be easily reproduced by using
      veth interfaces:
      
      $ ip -b - <<EOF
      > link add dev vm1 type veth peer name vm2
      > link set dev vm1 up
      > link set dev vm2 up
      > addr add 2001:db8:a:b:1:2:3:4/64 dev vm1
      > addr del 2001:db8:a:b:1:2:3:4/64 dev vm1
      EOF
      
      This patch reverts the behaviour introduced by the commit f784ad3d
      ("ipv6: do not send RTM_DELADDR for tentative addresses")
      Suggested-by: Thomas Haller <thaller@redhat.com>
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4vf: display pause settings · a64dcddc
      Ganesh Goudar authored
      Add support to display pause settings
      Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: add ttl inherit support · 72f6d71e
      Hangbin Liu authored
      Like tos inherit, ttl inherit should mean inheriting the inner protocol's
      ttl value, which is not actually implemented in vxlan yet.
      
      But we cannot treat ttl == 0 as "use the inner TTL", because ttl == 0 is
      also what is used when the "ttl" option is not specified; that would be
      a behavior change and would break real use cases.
      
      So add a separate attribute, IFLA_VXLAN_TTL_INHERIT, which is set when
      "ttl inherit" is specified with the ip cmd.
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Suggested-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
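The reason for a separate flag, rather than overloading ttl == 0, can be seen in a small selection function. This is a sketch of the decision only: `struct vxlan_cfg_sketch`, `pick_outer_ttl()` and the `default_ttl` parameter are hypothetical names, and how the driver actually obtains the default is not described in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-device configuration. */
struct vxlan_cfg_sketch {
    uint8_t ttl;      /* 0 when the "ttl" option was not specified */
    bool ttl_inherit; /* set when "ttl inherit" was requested */
};

/* Choose the TTL for the outer header of an encapsulated packet. */
static uint8_t pick_outer_ttl(const struct vxlan_cfg_sketch *cfg,
                              uint8_t inner_ttl, uint8_t default_ttl)
{
    assert(cfg);

    if (cfg->ttl_inherit)  /* explicit request: copy the inner TTL */
        return inner_ttl;
    if (cfg->ttl == 0)     /* unspecified: behaviour stays as before */
        return default_ttl;
    return cfg->ttl;       /* fixed TTL chosen by the administrator */
}
```

With the flag kept separate, a device configured without any ttl option still takes the default branch, so existing setups see no change.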
    • net/ncsi: Refactor MAC, VLAN filters · 062b3e1b
      Samuel Mendoza-Jonas authored
      The NCSI driver defines a generic ncsi_channel_filter struct that can be
      used to store arbitrarily formatted filters, and several generic methods
      of accessing data stored in such a filter.
      However, both in the driver and in the NCSI specification
      there are only two actual filters: VLAN ID filters and MAC address
      filters. The splitting of the MAC filter into unicast, multicast, and
      mixed is also technically not necessary as these are stored in the same
      location in hardware.
      
      To reduce complexity, particularly in the set up and accessing of these
      generic filters, remove them in favour of two specific structs. These
      can be acted on directly and do not need several generic helper
      functions to use.
      
      This also fixes a memory error found by KASAN on ARM32 (which is not
      upstream yet), where response handlers accessing a filter's data field
      could write past allocated memory.
      
      [  114.926512] ==================================================================
      [  114.933861] BUG: KASAN: slab-out-of-bounds in ncsi_configure_channel+0x4b8/0xc58
      [  114.941304] Read of size 2 at addr 94888558 by task kworker/0:2/546
      [  114.947593]
      [  114.949146] CPU: 0 PID: 546 Comm: kworker/0:2 Not tainted 4.16.0-rc6-00119-ge156398bfcad #13
      ...
      [  115.170233] The buggy address belongs to the object at 94888540
      [  115.170233]  which belongs to the cache kmalloc-32 of size 32
      [  115.181917] The buggy address is located 24 bytes inside of
      [  115.181917]  32-byte region [94888540, 94888560)
      [  115.192115] The buggy address belongs to the page:
      [  115.196943] page:9eeac100 count:1 mapcount:0 mapping:94888000 index:0x94888fc1
      [  115.204200] flags: 0x100(slab)
      [  115.207330] raw: 00000100 94888000 94888fc1 0000003f 00000001 9eea2014 9eecaa74 96c003e0
      [  115.215444] page dumped because: kasan: bad access detected
      [  115.221036]
      [  115.222544] Memory state around the buggy address:
      [  115.227384]  94888400: fb fb fb fb fc fc fc fc 04 fc fc fc fc fc fc fc
      [  115.233959]  94888480: 00 00 00 fc fc fc fc fc 00 04 fc fc fc fc fc fc
      [  115.240529] >94888500: 00 00 04 fc fc fc fc fc 00 00 04 fc fc fc fc fc
      [  115.247077]                                             ^
      [  115.252523]  94888580: 00 04 fc fc fc fc fc fc 06 fc fc fc fc fc fc fc
      [  115.259093]  94888600: 00 00 06 fc fc fc fc fc 00 00 04 fc fc fc fc fc
      [  115.265639] ==================================================================
      Reported-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • KEYS: DNS: limit the length of option strings · c210f7b4
      Eric Biggers authored
      Adding a dns_resolver key whose payload contains a very long option name
      resulted in that string being printed in full.  This hit the WARN_ONCE()
      in set_precision() during the printk(), because printk() only supports a
      precision of up to 32767 bytes:
      
          precision 1000000 too large
          WARNING: CPU: 0 PID: 752 at lib/vsprintf.c:2189 vsnprintf+0x4bc/0x5b0
      
      Fix it by limiting option strings (combined name + value) to a much more
      reasonable 128 bytes.  The exact limit is arbitrary, but currently the
      only recognized option is formatted as "dnserror=%lu" which fits well
      within this limit.
      
      Also ratelimit the printks.
      
      Reproducer:
      
          perl -e 'print "#", "A" x 1000000, "\x00"' | keyctl padd dns_resolver desc @s
      
      This bug was found using syzkaller.
      Reported-by: Mark Rutland <mark.rutland@arm.com>
      Fixes: 4a2d7892 ("DNS: If the DNS server returns an error, allow that to be cached [ver #2]")
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
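The length limit described in the dns_resolver fix can be sketched as a validation pass over the option strings. The code below assumes options are separated by '#' (as the reproducer's leading "#" suggests) and treats 128 bytes as an inclusive maximum. `check_options()` and `OPT_MAX_LEN` are hypothetical names, and the parsing is much simpler than the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OPT_MAX_LEN 128 /* combined name + value, per the commit message */

/*
 * opts points just past the first '#'. Returns 0 if every option is
 * non-empty and at most OPT_MAX_LEN bytes long, -1 otherwise.
 */
static int check_options(const char *opts)
{
    assert(opts);

    while (*opts) {
        const char *end = strchr(opts, '#');
        size_t len = end ? (size_t)(end - opts) : strlen(opts);

        if (len == 0 || len > OPT_MAX_LEN)
            return -1;

        opts += len;
        if (*opts == '#') /* skip the separator before the next option */
            opts++;
    }
    return 0;
}
```

Rejecting an oversized option before it is ever formatted keeps the length handed to the print path far below the 32767-byte precision limit mentioned in the commit message.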
    • Davide Caratti's avatar
    • ipv6: Count interface receive statistics on the ingress netdev · bdb7cc64
      Stephen Suryaputra authored
      The statistics such as InHdrErrors should be counted on the ingress
      netdev rather than on the dev from the dst, which is the egress.
      Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv6: Make __inet6_bind static · 032234d8
      David Ahern authored
      BPF core gets access to __inet6_bind via ipv6_bpf_stub_impl, so it is
      not invoked directly outside of af_inet6.c. Make it static and move
      inet6_bind after it to avoid a forward declaration.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>