1. 27 Dec, 2019 2 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 2bbc078f
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2019-12-27
      
      The following pull-request contains BPF updates for your *net-next* tree.
      
      We've added 127 non-merge commits during the last 17 day(s) which contain
      a total of 110 files changed, 6901 insertions(+), 2721 deletions(-).
      
      There are three merge conflicts. The conflicts and their resolutions are as follows:
      
      1) Merge conflict in net/bpf/test_run.c:
      
      A tree-wide cleanup, c593642c ("treewide: Use sizeof_field() macro"),
      conflicts with b590cb5f ("bpf: Switch to offsetofend in
      BPF_PROG_TEST_RUN"):
      
        <<<<<<< HEAD
                if (!range_is_zero(__skb, offsetof(struct __sk_buff, priority) +
                                   sizeof_field(struct __sk_buff, priority),
        =======
                if (!range_is_zero(__skb, offsetofend(struct __sk_buff, priority),
        >>>>>>> 7c8dce4b
      
      A few other hunks look similar to this one. Always take the chunk with
      offsetofend(). Note that in one of them the fields differ:
      
        <<<<<<< HEAD
                if (!range_is_zero(__skb, offsetof(struct __sk_buff, tstamp) +
                                   sizeof_field(struct __sk_buff, tstamp),
        =======
                if (!range_is_zero(__skb, offsetofend(struct __sk_buff, gso_segs),
        >>>>>>> 7c8dce4b
      
      Just take the one with offsetofend() /and/ gso_segs. The latter is correct due
      to 850a88cc ("bpf: Expose __sk_buff wire_len/gso_segs to BPF_PROG_TEST_RUN").
      
      2) Merge conflict in arch/riscv/net/bpf_jit_comp.c:
      
      (I'm keeping Bjorn in Cc here for a double-check in case I got it wrong.)
      
        <<<<<<< HEAD
                if (is_13b_check(off, insn))
                        return -1;
                emit(rv_blt(tcc, RV_REG_ZERO, off >> 1), ctx);
        =======
                emit_branch(BPF_JSLT, RV_REG_T1, RV_REG_ZERO, off, ctx);
        >>>>>>> 7c8dce4b
      
      Result should look like:
      
                emit_branch(BPF_JSLT, tcc, RV_REG_ZERO, off, ctx);
      
      3) Merge conflict in arch/riscv/include/asm/pgtable.h:
      
        <<<<<<< HEAD
        =======
        #define VMALLOC_SIZE     (KERN_VIRT_SIZE >> 1)
        #define VMALLOC_END      (PAGE_OFFSET - 1)
        #define VMALLOC_START    (PAGE_OFFSET - VMALLOC_SIZE)
      
        #define BPF_JIT_REGION_SIZE     (SZ_128M)
        #define BPF_JIT_REGION_START    (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
        #define BPF_JIT_REGION_END      (VMALLOC_END)
      
        /*
         * Roughly size the vmemmap space to be large enough to fit enough
         * struct pages to map half the virtual address space. Then
         * position vmemmap directly below the VMALLOC region.
         */
        #define VMEMMAP_SHIFT \
                (CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
        #define VMEMMAP_SIZE    BIT(VMEMMAP_SHIFT)
        #define VMEMMAP_END     (VMALLOC_START - 1)
        #define VMEMMAP_START   (VMALLOC_START - VMEMMAP_SIZE)
      
        #define vmemmap         ((struct page *)VMEMMAP_START)
      
        >>>>>>> 7c8dce4b
      
      Only take the BPF_* defines from there and move them higher up in the
      same file. Remove the rest from the chunk. The VMALLOC_* etc defines
      got moved via 01f52e16 ("riscv: define vmemmap before pfn_to_page
      calls"). Result:
      
        [...]
        #define __S101  PAGE_READ_EXEC
        #define __S110  PAGE_SHARED_EXEC
        #define __S111  PAGE_SHARED_EXEC
      
        #define VMALLOC_SIZE     (KERN_VIRT_SIZE >> 1)
        #define VMALLOC_END      (PAGE_OFFSET - 1)
        #define VMALLOC_START    (PAGE_OFFSET - VMALLOC_SIZE)
      
        #define BPF_JIT_REGION_SIZE     (SZ_128M)
        #define BPF_JIT_REGION_START    (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
        #define BPF_JIT_REGION_END      (VMALLOC_END)
      
        /*
         * Roughly size the vmemmap space to be large enough to fit enough
         * struct pages to map half the virtual address space. Then
         * position vmemmap directly below the VMALLOC region.
         */
        #define VMEMMAP_SHIFT \
                (CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
        #define VMEMMAP_SIZE    BIT(VMEMMAP_SHIFT)
        #define VMEMMAP_END     (VMALLOC_START - 1)
        #define VMEMMAP_START   (VMALLOC_START - VMEMMAP_SIZE)
      
        [...]
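
The reasoning behind conflict 1 above can be illustrated with a small, self-contained sketch. It assumes local copies of the sizeof_field() and offsetofend() macros and a hypothetical struct demo_skb that only mimics the field order discussed here; it is not the real struct __sk_buff layout, and range_is_zero() below is a plain-C stand-in for the helper used in net/bpf/test_run.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the two helper macros, for illustration only. */
#define sizeof_field(TYPE, MEMBER) sizeof(((TYPE *)0)->MEMBER)
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof_field(TYPE, MEMBER))

/* Hypothetical stand-in for struct __sk_buff; NOT the real layout. */
struct demo_skb {
	uint32_t mark;
	uint32_t priority;
	uint64_t tstamp;
	uint32_t wire_len;
	uint32_t gso_segs;
	uint32_t tail[2];	/* fields the caller must leave zeroed */
};

/* Stand-in helper: true when every byte in [from, to) is zero. */
static bool range_is_zero(const void *buf, size_t from, size_t to)
{
	const unsigned char *p = buf;
	size_t i;

	assert(from <= to);
	for (i = from; i < to; i++)
		if (p[i] != 0)
			return false;
	return true;
}

/*
 * Check everything after gso_segs. Starting the range at the end of
 * tstamp instead would also reject non-zero wire_len and gso_segs,
 * which is why the field name matters in the second hunk.
 */
static bool tail_is_zero(const struct demo_skb *skb)
{
	return range_is_zero(skb, offsetofend(struct demo_skb, gso_segs),
			     sizeof(struct demo_skb));
}
```

Both spellings in the first hunk compute the same offset, so choosing the offsetofend() side changes readability only. The second hunk is different because the field itself changes.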
      
      Let me know if there are any other issues.
      
      Anyway, the main changes are:
      
      1) Extend bpftool to produce a struct (aka "skeleton") tailored and specific
         to a provided BPF object file. This provides an alternative, simplified API
         compared to standard libbpf interaction. Also, add libbpf extern variable
         resolution for .kconfig section to import Kconfig data, from Andrii Nakryiko.
      
      2) Add BPF dispatcher for XDP which is a mechanism to avoid indirect calls by
         generating a branch funnel as discussed back in bpfconf'19 at LSF/MM. Also,
         add various BPF riscv JIT improvements, from Björn Töpel.
      
      3) Extend bpftool to allow matching BPF programs and maps by name,
         from Paul Chaignon.
      
      4) Support for replacing cgroup BPF programs attached with BPF_F_ALLOW_MULTI
         flag for allowing updates without service interruption, from Andrey Ignatov.
      
      5) Cleanup and simplification of ring access functions for AF_XDP with a
         bonus of 0-5% performance improvement, from Magnus Karlsson.
      
      6) Enable BPF JITs for x86-64 and arm64 by default. Also, final version of
         audit support for BPF, from Daniel Borkmann, the latter with Jiri Olsa.
      
      7) Move and extend test_select_reuseport into BPF program tests under
         BPF selftests, from Jakub Sitnicki.
      
      8) Various BPF sample improvements for xdpsock for customizing parameters
         to set up and benchmark AF_XDP, from Jay Jayatheerthan.
      
      9) Improve libbpf to provide a ulimit hint on permission denied errors.
         Also change XDP sample programs to attach in driver mode by default,
         from Toke Høiland-Jørgensen.
      
      10) Extend BPF test infrastructure to allow changing skb mark from tc BPF
          programs, from Nikita V. Shirokov.
      
      11) Optimize prologue code sequence in BPF arm32 JIT, from Russell King.
      
      12) Fix xdp_redirect_cpu BPF sample to manually attach to tracepoints after
          libbpf conversion, from Jesper Dangaard Brouer.
      
      13) Minor misc improvements from various others.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpftool: Make skeleton C code compilable with C++ compiler · 7c8dce4b
      Andrii Nakryiko authored
      When auto-generated BPF skeleton C code is included from a C++ application, it
      triggers a compilation error because void * is implicitly converted to the
      target pointer type. This is allowed in C, but not in C++. To solve this
      problem, add explicit casts where necessary.
      
      To ensure issues like this are captured going forward, add skeleton usage in
      test_cpp test.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20191226210253.3132060-1-andriin@fb.com
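
The void * issue described in this commit can be sketched as follows. The struct and function names are hypothetical and are not what bpftool generates; the point is only that an explicit cast on the allocator's return value keeps the same source valid under both a C and a C++ compiler.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical skeleton-like object; not bpftool's generated layout. */
struct demo_skeleton {
	int map_count;
	int prog_count;
};

static struct demo_skeleton *demo_skeleton_open(int maps, int progs)
{
	struct demo_skeleton *obj;

	assert(maps >= 0 && progs >= 0);

	/*
	 * calloc() returns void *. C converts it implicitly, C++ does not,
	 * so the explicit cast is what lets a C++ compiler accept this line.
	 */
	obj = (struct demo_skeleton *)calloc(1, sizeof(*obj));
	if (!obj)
		return NULL;

	obj->map_count = maps;
	obj->prog_count = progs;
	return obj;
}

static void demo_skeleton_destroy(struct demo_skeleton *obj)
{
	free(obj);
}
```

Without the cast, a C compiler still accepts the assignment, while a C++ compiler rejects it as an invalid conversion from void *.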
  2. 26 Dec, 2019 32 commits
  3. 25 Dec, 2019 6 commits
    • Merge branch 'Simplify-IPv6-route-offload-API' · 9f6cff99
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      Simplify IPv6 route offload API
      
      Motivation
      ==========
      
      This is the IPv6 counterpart of "Simplify IPv4 route offload API" [1].
      The aim of this patch set is to simplify the IPv6 route offload API by
      making the stack a bit smarter about the notifications it is generating.
      This allows driver authors to focus on programming the underlying device
      instead of having to duplicate the IPv6 route insertion logic in their
      driver, which is error-prone.
      
      Details
      =======
      
      Today, whenever an IPv6 route is added or deleted a notification is sent
      in the FIB notification chain and it is up to offload drivers to decide
      if the route should be programmed to the hardware or not. This is not an
      easy task as in hardware routes are keyed by {prefix, prefix length,
      table id}, whereas the kernel can store multiple such routes that only
      differ in metric / nexthop info.
      
      This series makes sure that only routes that are actually used in the
      data path are notified to offload drivers. This greatly simplifies the
      work these drivers need to do, as they are now only concerned with
      programming the hardware and do not need to replicate the IPv6 route
      insertion logic and store multiple identical routes.
      
      The route that is notified is the first route in the IPv6 FIB node,
      which represents a single prefix and length in a given table. In case
      the route is deleted and there is another route with the same key, a
      replace notification is emitted. Otherwise, a delete notification is
      emitted.
      
      Unlike IPv4, in IPv6 it is possible to append individual nexthops to an
      existing multipath route. Therefore, in addition to the replace and
      delete notifications present in IPv4, an append notification is also
      used.
      
      Testing
      =======
      
      To ensure there is no degradation in route insertion rates, I averaged
      the insertion rate of 512k routes (/64 and /128) over 50 runs. Did not
      observe any degradation.
      
      Functional tests are available here [2]. They rely on route trap
      indication, which is added in a subsequent patch set.
      
      In addition, I have been running syzkaller for the past couple of weeks
      with debug options enabled. Did not observe any problems.
      
      Patch set overview
      ==================
      
      Patches #1-#7 gradually introduce the new FIB notifications
      Patch #8 converts mlxsw to use the new notifications
      Patch #9 removes the old notifications
      
      [1] https://patchwork.ozlabs.org/cover/1209738/
      [2] https://github.com/idosch/linux/tree/fib-notifier
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
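
The idea in the cover letter above, that a driver only hears about the first route of a node, can be modeled with a toy sketch. Everything here is an assumption made for illustration: the demo_* names are hypothetical, a node is reduced to an array of metrics kept in ascending order, and only the "new route becomes the first route" case is modeled. It is not the kernel's fib6 code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event names; not the kernel's FIB event constants. */
enum demo_event {
	DEMO_EVENT_NONE,	/* route is not first in the node, stay silent */
	DEMO_EVENT_REPLACE,	/* route became the first route in the node */
};

#define DEMO_MAX_ROUTES 8

/* One node stands for one {prefix, prefix length, table id} key. */
struct demo_node {
	unsigned int metric[DEMO_MAX_ROUTES];	/* ascending order (assumed) */
	size_t count;
};

/* Insert a route and report which notification, if any, would be sent. */
static enum demo_event demo_node_insert(struct demo_node *node,
					unsigned int metric)
{
	size_t pos, i;

	assert(node != NULL && node->count < DEMO_MAX_ROUTES);

	/* Find the first slot whose metric is strictly larger. */
	for (pos = 0; pos < node->count && node->metric[pos] <= metric; pos++)
		;

	/* Shift the tail up by one and place the new route. */
	for (i = node->count; i > pos; i--)
		node->metric[i] = node->metric[i - 1];
	node->metric[pos] = metric;
	node->count++;

	return pos == 0 ? DEMO_EVENT_REPLACE : DEMO_EVENT_NONE;
}
```

In this model the driver never needs to store the routes that sit behind the first one, which is the simplification the series is after.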
    • ipv6: Remove old route notifications and convert listeners · caafb250
      Ido Schimmel authored
      Now that mlxsw is converted to use the new FIB notifications it is
      possible to delete the old ones and use the new replace / append /
      delete notifications.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_router: Start using new IPv6 route notifications · dacad7b3
      Ido Schimmel authored
      With the new notifications mlxsw does not need to handle identical
      routes itself, as this is taken care of by the core IPv6 code.
      
      Instead, mlxsw only needs to take care of inserting and removing routes
      from the device.
      
      Convert mlxsw to use the new IPv6 route notifications and simplify the
      code.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Handle multipath route deletion notification · 0284696b
      Ido Schimmel authored
      When an entire multipath route is deleted, only emit a notification if
      it is the first route in the node. Emit a replace notification in case
      the last sibling is followed by another route. Otherwise, emit a delete
      notification.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Handle route deletion notification · d2f0c9b1
      Ido Schimmel authored
      For the purpose of route offload, when a single route is deleted, it is
      only of interest if it is the first route in the node or if it is
      sibling to such a route.
      
      In the first case, distinguish between several possibilities:
      
      1. Route is the last route in the node. Emit a delete notification.
      
      2. Route is followed by a non-multipath route. Emit a replace
      notification for the non-multipath route.
      
      3. Route is followed by a multipath route. Emit a replace notification
      for the multipath route.
      
      In the second case, only emit a delete notification to ensure the route
      is no longer used as a valid nexthop.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
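
The deletion cases listed in this commit message can be sketched as a small classification function. This is a toy model under stated assumptions, not the kernel implementation: the demo_* names are hypothetical, a node is a flat array of routes, and routes that share a non-zero group value are treated as siblings of one multipath route. Treating a deleted route that has siblings in the first group as the "second case" is an interpretation of the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event names; not the kernel's FIB event constants. */
enum demo_event {
	DEMO_EVENT_NONE,	/* deletion is of no interest to offload */
	DEMO_EVENT_DELETE,	/* notify deletion of the route itself */
	DEMO_EVENT_REPLACE,	/* notify the route that follows instead */
};

/* group == 0: non-multipath; equal non-zero groups: siblings (assumed). */
struct demo_route {
	int group;
};

/* Count the other routes in the node that are siblings of routes[idx]. */
static size_t demo_sibling_count(const struct demo_route *routes,
				 size_t count, size_t idx)
{
	size_t i, n = 0;

	if (routes[idx].group == 0)
		return 0;
	for (i = 0; i < count; i++)
		if (i != idx && routes[i].group == routes[idx].group)
			n++;
	return n;
}

/* Decide which event deleting routes[idx] from the node would produce. */
static enum demo_event demo_delete_event(const struct demo_route *routes,
					 size_t count, size_t idx)
{
	assert(routes != NULL && idx < count);

	/* Second case: the route is one nexthop of a multipath route. */
	if (demo_sibling_count(routes, count, idx) > 0)
		return routes[idx].group == routes[0].group ?
		       DEMO_EVENT_DELETE : DEMO_EVENT_NONE;

	/* A lone route that is not first in the node is not offloaded. */
	if (idx != 0)
		return DEMO_EVENT_NONE;

	/* First case: last route in node, or followed by another route. */
	return count == 1 ? DEMO_EVENT_DELETE : DEMO_EVENT_REPLACE;
}
```

Cases 2 and 3 of the message collapse into the same DEMO_EVENT_REPLACE result here, because the model only reports the event type and not which route is carried in the notification.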
    • ipv6: Only Replay routes of interest to new listeners · 9c6ecd3c
      Ido Schimmel authored
      When a new listener is registered to the FIB notification chain it
      receives a dump of all the available routes in the system. Instead, make
      sure to only replay the IPv6 routes that are actually used in the data
      path and are of any interest to the new listener.
      
      This is done by iterating over all the routing tables in the given
      namespace, but from each traversed node only the first route ('leaf') is
      notified. Multipath routes are notified in a single notification instead
      of one for each nexthop.
      
      Add fib6_rt_dump_tmp() to do that. Later on in the patch set it will be
      renamed to fib6_rt_dump() instead of the existing one.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>