1. 10 Mar, 2021 3 commits
    • Merge branch 'bpf-xdp-redirect' · 32f91529
      Daniel Borkmann authored
      Björn Töpel says:
      
      ====================
      This two-patch series contains two optimizations: one for the
      bpf_redirect_map() helper and one for the xdp_do_redirect() function.
      
      The bpf_redirect_map() optimization avoids the map lookup
      dispatching. Instead of selecting the correct lookup function via a
      switch-statement, bpf_redirect_map() becomes a map operation, where
      each map type has its own bpf_redirect_map() implementation. This
      way the run-time dispatch is avoided.
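      
      As a minimal, self-contained sketch of the technique in plain C (not
      the actual kernel code; all names below are illustrative), replacing
      a run-time switch with a per-type operations table looks like this:
      
        #include <stdio.h>
      
        struct map;
      
        /* Each map type supplies its own redirect implementation. */
        struct map_ops {
        	int (*redirect)(struct map *m, unsigned int index);
        };
      
        struct map {
        	const struct map_ops *ops;
        };
      
        static int devmap_redirect(struct map *m, unsigned int index)
        {
        	printf("devmap redirect to index %u\n", index);
        	return 0;
        }
      
        static const struct map_ops devmap_ops = {
        	.redirect = devmap_redirect,
        };
      
        int main(void)
        {
        	struct map m = { .ops = &devmap_ops };
      
        	/* No switch on the map type at call time: the call resolves
        	 * through the per-map ops, which is what the series does for
        	 * bpf_redirect_map(). */
        	return m.ops->redirect(&m, 3);
        }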
      
      The xdp_do_redirect() patch restructures the code so that the map
      pointer indirection can be avoided.
      
      Performance-wise, I got a 4% improvement for XSKMAP
      (sample:xdpsock/rx-drop) and an 8% improvement
      (sample:xdp_redirect_map) on my machine.
      
      v5->v6:  Removed REDIR enum, and instead use map_id and map_type. (Daniel)
               Applied Daniel's fixups on patch 1. (Daniel)
      v4->v5:  Renamed map operation to map_redirect. (Daniel)
      v3->v4:  Made bpf_redirect_map() a map operation. (Daniel)
      v2->v3:  Fix build when CONFIG_NET is not set. (lkp)
      v1->v2:  Removed warning when CONFIG_BPF_SYSCALL was not set. (lkp)
               Cleaned up case-clause in xdp_do_generic_redirect_map(). (Toke)
               Re-added comment. (Toke)
      rfc->v1: Use map_id, and remove bpf_clear_redirect_map(). (Toke)
               Get rid of the macro and use __always_inline. (Jesper)
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf, xdp: Restructure redirect actions · ee75aef2
      Björn Töpel authored
      The XDP_REDIRECT implementations for maps and non-maps are fairly
      similar, but obviously need to take different code paths depending
      on whether the target is using a map or not. Today, a redirect
      target for XDP either uses a map or is based on an ifindex.
      
      Here, the map type and id are added to bpf_redirect_info, instead of
      the actual map. The map type, map item/ifindex, and the map_id (if
      any) are passed to xdp_do_redirect().
      
      For ifindex-based redirects, used by the bpf_redirect() XDP BPF
      helper, a special map type/id pair is used: a map type of UNSPEC
      together with a map id equal to INT_MAX signals an ifindex-based
      redirect. Note that valid map ids range from 1 inclusive to INT_MAX
      exclusive, i.e. [1, INT_MAX).
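      
      As a simplified sketch of how the helper encodes that sentinel
      (field and function names follow the patch as best understood;
      treat it as illustrative rather than the exact diff):
      
        BPF_CALL_2(bpf_xdp_redirect, u32, ifindex, u64, flags)
        {
        	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
      
        	if (unlikely(flags))
        		return XDP_ABORTED;
      
        	/* Map type UNSPEC plus map_id == INT_MAX (an id never handed
        	 * out by the map id allocator) mean ifindex based redirect. */
        	ri->map_id = INT_MAX;
        	ri->map_type = BPF_MAP_TYPE_UNSPEC;
        	ri->tgt_index = ifindex;
      
        	return XDP_REDIRECT;
        }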
      
      In addition to making the code easier to follow, using an explicit
      type and id in bpf_redirect_info has a slight positive performance
      impact: it avoids a pointer indirection for the map type lookup and
      instead uses the cacheline that already holds bpf_redirect_info.
      
      Since the actual map is not passed via bpf_redirect_info anymore, the
      map lookup is only done in the BPF helper. This means that the
      bpf_clear_redirect_map() function can be removed. The actual map item
      is RCU protected.
      
      The bpf_redirect_info flags member is not used by XDP and is no
      longer read or written. The map member is now only written when
      required, not unconditionally.
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/bpf/20210308112907.559576-3-bjorn.topel@gmail.com
    • bpf, xdp: Make bpf_redirect_map() a map operation · e6a4750f
      Björn Töpel authored
      Currently the bpf_redirect_map() implementation dispatches to the
      correct map-lookup function via a switch-statement. To avoid this
      dispatching, this change adds bpf_redirect_map() as a map
      operation. Each map type provides its own bpf_redirect_map()
      implementation, and the correct function is automatically selected
      by the BPF verifier.
      
      A nice side-effect of the code movement is that the map lookup
      functions are now local to the map implementation files, which removes
      one additional function call.
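      
      In kernel terms, the shape of the change is roughly the following (a
      simplified sketch; names follow the patch as best understood):
      
        /* struct bpf_map_ops grows a redirect callback... */
        struct bpf_map_ops {
        	/* ...existing lookup/update/delete operations... */
        	int (*map_redirect)(struct bpf_map *map, u32 ifindex,
        			    u64 flags);
        };
      
        /* ...and e.g. the devmap wires it up to its own lookup, so no
         * run-time dispatch on the map type is needed: */
        static int dev_map_redirect(struct bpf_map *map, u32 ifindex,
        			    u64 flags)
        {
        	return __bpf_xdp_redirect_map(map, ifindex, flags,
        				      __dev_map_lookup_elem);
        }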
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/bpf/20210308112907.559576-2-bjorn.topel@gmail.com
  2. 09 Mar, 2021 4 commits
  3. 08 Mar, 2021 6 commits
    • selftests/bpf: Fix typo in Makefile · a0d73acc
      Jean-Philippe Brucker authored
      The selftest build fails when trying to install the scripts:
      
      rsync: [sender] link_stat "tools/testing/selftests/bpf/test_docs_build.sh" failed: No such file or directory (2)
      
      Fix the filename.
      
      Fixes: a01d935b ("tools/bpf: Remove bpf-helpers from bpftool docs")
      Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210308182830.155784-1-jean-philippe@linaro.org
    • libbpf: Fix arm64 build · a6aac408
      Jean-Philippe Brucker authored
      The macro for libbpf_smp_store_release() doesn't build on arm64, fix it.
      
      Fixes: 291471dd ("libbpf, xsk: Add libbpf_smp_store_release libbpf_smp_load_acquire")
      Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210308182521.155536-1-jean-philippe@linaro.org
    • Merge branch 'load-acquire/store-release barriers for' · bbb41728
      Andrii Nakryiko authored
      Björn Töpel says:
      
      ====================
      
      This two-patch series introduces load-acquire/store-release barriers
      for the AF_XDP rings.
      
      For most contemporary architectures, this is more effective than an
      SPSC ring based on smp_{r,w,}mb() barriers. More importantly,
      load-acquire/store-release semantics make the ring code easier to
      follow.
      
      This is effectively the change done in commit 6c43c091
      ("documentation: Update circular buffer for
      load-acquire/store-release"), but for the AF_XDP rings.
      
      Both libbpf and the kernel-side are updated.
      
      Full details are outlined in the commits!
      
      Thanks to the LKMM-folks (Paul/Alan/Will) for helping me out in this
      complicated matter!
      
      Changelog
      
      v1[1]->v2:
      * Expanded the commit message for patch 1, and included the LKMM
        litmus tests. Hopefully this clears things up. (Daniel)
      
      * Clarified why the smp_mb()/smp_load_acquire() is not needed in (A):
        there is a control dependency from the load to the store. (Toke)
      
      [1] https://lore.kernel.org/bpf/20210301104318.263262-1-bjorn.topel@gmail.com/
      
      Thanks,
      Björn
      ====================
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    • libbpf, xsk: Add libbpf_smp_store_release libbpf_smp_load_acquire · 291471dd
      Björn Töpel authored
      Now that the AF_XDP rings have load-acquire/store-release semantics,
      move libbpf to that as well.
      
      The library-internal libbpf_smp_{load_acquire,store_release} are only
      valid for 32-bit words on ARM64.
      
      Also, remove the barriers that are no longer in use.
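      
      As a sketch of the semantics using compiler builtins (the actual
      libbpf macros are hand-rolled per architecture and differ from
      this):
      
        /* Publish with release, observe with acquire: everything written
         * before the store-release is visible to whoever load-acquires
         * the same location. */
        #define libbpf_smp_store_release(p, v) \
        	__atomic_store_n((p), (v), __ATOMIC_RELEASE)
        #define libbpf_smp_load_acquire(p) \
        	__atomic_load_n((p), __ATOMIC_ACQUIRE)
      
      In the AF_XDP rings this pairs the producer's index update
      (store-release after writing the ring entries) with the consumer's
      index read (load-acquire before reading the entries).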
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/bpf/20210305094113.413544-3-bjorn.topel@gmail.com
    • xsk: Update rings for load-acquire/store-release barriers · a23b3f56
      Björn Töpel authored
      Currently, the AF_XDP rings use general smp_{r,w,}mb() barriers on
      the kernel side. On most modern architectures,
      load-acquire/store-release barriers perform better and result in
      simpler code for circular ring buffers.
      
      This change updates the XDP socket rings to use
      load-acquire/store-release barriers.
      
      It is important to note that changing from the old smp_{r,w,}mb()
      barriers to load-acquire/store-release barriers does not break
      compatibility: the old semantics work with the new ones, and vice
      versa.
      
      As pointed out by "Documentation/memory-barriers.txt" in the "SMP
      BARRIER PAIRING" section:
      
        "General barriers pair with each other, though they also pair with
        most other types of barriers, albeit without multicopy atomicity.
        An acquire barrier pairs with a release barrier, but both may also
        pair with other barriers, including of course general barriers."
      
      How different barriers behave and pair is outlined in
      "tools/memory-model/Documentation/cheatsheet.txt".
      
      In order to make sure that compatibility is not broken, LKMM herd7
      based litmus tests can be constructed and verified.
      
      We generalize the XDP socket ring to a one-entry ring, and create
      two scenarios: one where the ring is full, so only the consumer can
      proceed, followed by the producer; and one where the ring is empty,
      so only the producer can proceed, followed by the consumer. Each
      scenario is then expanded into four different tests: general
      producer/general consumer, general producer/acqrel consumer, acqrel
      producer/general consumer, and acqrel producer/acqrel consumer.
      Eight tests in total.
      
      The empty ring test:
        C spsc-rb+empty
      
        // Simple one entry ring:
        // prod cons     allowed action       prod cons
        //    0    0 =>       prod          =>   1    0
        //    0    1 =>       cons          =>   0    0
        //    1    0 =>       cons          =>   1    1
        //    1    1 =>       prod          =>   0    1
      
        {}
      
        // We start at prod==0, cons==0, data==0, i.e. nothing has been
        // written to the ring. From here only the producer can start,
        // and should write 1. Afterwards, the consumer can continue and
        // read 1 into data. Can we enter state prod==1, cons==1, with
        // the consumer having observed the incorrect value of 0?
      
        P0(int *prod, int *cons, int *data)
        {
           ... producer
        }
      
        P1(int *prod, int *cons, int *data)
        {
           ... consumer
        }
      
        exists( 1:d=0 /\ prod=1 /\ cons=1 );
      
      The full ring test:
        C spsc-rb+full
      
        // Simple one entry ring:
        // prod cons     allowed action       prod cons
        //    0    0 =>       prod          =>   1    0
        //    0    1 =>       cons          =>   0    0
        //    1    0 =>       cons          =>   1    1
        //    1    1 =>       prod          =>   0    1
      
        { prod = 1; }
      
        // We start at prod==1, cons==0, data==0, i.e. the producer has
        // written 0, so from here only the consumer can start, and should
        // consume 0. Afterwards, the producer can continue and write 1 to
        // data. Can we enter state prod==0, cons==1, with the consumer
        // having observed the write of 1?
      
        P0(int *prod, int *cons, int *data)
        {
          ... producer
        }
      
        P1(int *prod, int *cons, int *data)
        {
          ... consumer
        }
      
        exists( 1:d=1 /\ prod=0 /\ cons=1 );
      
      where P0 and P1 are (for each, the general-barrier version first,
      followed by the load-acquire/store-release version):
      
        P0(int *prod, int *cons, int *data)
        {
        	int p;
      
        	p = READ_ONCE(*prod);
        	if (READ_ONCE(*cons) == p) {
        		WRITE_ONCE(*data, 1);
        		smp_wmb();
        		WRITE_ONCE(*prod, p ^ 1);
        	}
        }
      
        P0(int *prod, int *cons, int *data)
        {
        	int p;
      
        	p = READ_ONCE(*prod);
        	if (READ_ONCE(*cons) == p) {
        		WRITE_ONCE(*data, 1);
        		smp_store_release(prod, p ^ 1);
        	}
        }
      
        P1(int *prod, int *cons, int *data)
        {
        	int c;
        	int d = -1;
      
        	c = READ_ONCE(*cons);
        	if (READ_ONCE(*prod) != c) {
        		smp_rmb();
        		d = READ_ONCE(*data);
        		smp_mb();
        		WRITE_ONCE(*cons, c ^ 1);
        	}
        }
      
        P1(int *prod, int *cons, int *data)
        {
        	int c;
        	int d = -1;
      
        	c = READ_ONCE(*cons);
        	if (smp_load_acquire(prod) != c) {
        		d = READ_ONCE(*data);
        		smp_store_release(cons, c ^ 1);
        	}
        }
      
      The full LKMM litmus tests are found at [1].
      
      On x86-64 systems, the l2fwd AF_XDP xdpsock sample performance
      increases by 1%. This is mostly because the smp_mb() is removed,
      which is a relatively expensive operation on these platforms.
      Weakly-ordered platforms, such as ARM64, might benefit even more.
      
      [1] https://github.com/bjoto/litmus-xsk
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/bpf/20210305094113.413544-2-bjorn.topel@gmail.com
    • selftests/bpf: Fix test_attach_probe for powerpc uprobes · 299194a9
      Jiri Olsa authored
      When testing uprobes, the test gets the GEP (Global Entry Point)
      address from kallsyms, but the function is then called locally,
      so the uprobe is not triggered.
      
      Fix this by adjusting the address to the LEP (Local Entry Point)
      on the powerpc arch, using the instruction check taken from the
      ppc_function_entry() function, as pointed out and explained by
      Michael and Naveen.
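      
      A hedged sketch of such a GEP-to-LEP adjustment for powerpc64 ELFv2
      (masks and opcodes follow ppc_function_entry(); the helper name is
      illustrative, not the test's exact code):
      
        #define OP_RT_RA_MASK	0xffff0000UL
        #define ADDIS_R2_R12	0x3c4c0000UL	/* addis r2, r12, imm */
        #define ADDI_R2_R2	0x38420000UL	/* addi  r2, r2, imm  */
      
        /* A GEP starts with the TOC pointer setup sequence above; the
         * LEP, which local calls enter through, sits 8 bytes past it. */
        static unsigned long adjust_to_lep(const unsigned int *insn,
        				   unsigned long addr)
        {
        	if ((insn[0] & OP_RT_RA_MASK) == ADDIS_R2_R12 &&
        	    (insn[1] & OP_RT_RA_MASK) == ADDI_R2_R2)
        		return addr + 8;	/* skip the GEP */
        	return addr;
        }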
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Link: https://lore.kernel.org/bpf/20210305134050.139840-1-jolsa@kernel.org
  4. 05 Mar, 2021 27 commits