Merge branch 'bpf-af-xdp-zc-api'

Björn Töpel says:

====================
This patch series introduces zerocopy (ZC) support for AF_XDP. Programs
using AF_XDP sockets will now receive RX packets without any copies and
can also transmit packets without incurring any copies. No modifications
to the application are needed, but the NIC driver needs to be modified
to support ZC. If ZC is not supported by the driver, the modes
introduced in the AF_XDP patch will be used. Using ZC in our micro
benchmarks results in significantly improved performance, as can be
seen in the performance section later in this cover letter.

Note that for an untrusted application, HW packet steering to a
specific queue pair (the one associated with the application) is a
requirement when using ZC, as the application would otherwise be able
to see other user space processes' packets. If the HW cannot support
the required packet steering, you need to use the XDP_SKB mode or the
XDP_DRV mode without ZC turned on. The XSKMAP introduced in the AF_XDP
patch set can be used to do load balancing in that case.

For benchmarking, you can use the xdpsock application from the AF_XDP
patch set without any modifications. Say that you would like your UDP
traffic from port 4242 to end up in queue 16, on which we will enable
AF_XDP. Here, we use ethtool for this:

  ethtool -N p3p2 rx-flow-hash udp4 fn
  ethtool -N p3p2 flow-type udp4 src-port 4242 dst-port 4242 \
    action 16

Running the rxdrop benchmark in XDP_DRV mode with zerocopy can then be
done using:

  samples/bpf/xdpsock -i p3p2 -q 16 -r -N

We have run some benchmarks on a dual socket system with two Broadwell
E5 2660 @ 2.0 GHz with hyperthreading turned off. Each socket has 14
cores, which gives a total of 28, but only two cores are used in these
experiments: one for TX/RX and one for the user space application. The
memory is DDR4 @ 2133 MT/s (1067 MHz), the size of each DIMM is
8192 MB, and with 8 of those DIMMs in the system we have 64 GB of total
memory.
The compiler used is gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0. The NIC is
Intel I40E 40Gbit/s using the i40e driver.

Below are the results in Mpps of the I40E NIC benchmark runs for 64
and 1500 byte packets, generated by a commercial packet generator HW
outputting packets at full 40 Gbit/s line rate. The results are without
retpoline so that we can compare against previous numbers.

AF_XDP performance 64 byte packets. Results from the AF_XDP V3 patch
set are also reported for ease of reference. The numbers within
parentheses are from the RFC V1 ZC patch set.

Benchmark   XDP_SKB    XDP_DRV    XDP_DRV with zerocopy
rxdrop       2.9*       9.6*       21.1(21.5)
txpush       2.6*       -          22.0(21.6)
l2fwd        1.9*       2.5*       15.3(15.0)

AF_XDP performance 1500 byte packets:

Benchmark   XDP_SKB    XDP_DRV    XDP_DRV with zerocopy
rxdrop       2.1*       3.3*       3.3(3.3)
l2fwd        1.4*       1.8*       3.1(3.1)

* From AF_XDP V3 patch set and cover letter.

So why do we not get higher values for RX similar to the 34 Mpps we
had in AF_PACKET V4? We made an experiment running the rxdrop
benchmark without using the xdp_do_redirect/flush infrastructure or an
XDP program (all traffic on a queue goes to one socket). Instead, the
driver acts directly on the AF_XDP socket. With this we got 36.9 Mpps,
a significant improvement without any change to the uapi. So not
forcing users to have an XDP program if they do not need it might be a
good idea. This measurement is actually higher than what we got with
AF_PACKET V4.

XDP performance on our system as a base line:

64 byte packets:
XDP stats       CPU     pps         issue-pps
XDP-RX CPU      16      32.3M       0

1500 byte packets:
XDP stats       CPU     pps         issue-pps
XDP-RX CPU      16      3.3M        0

The structure of the patch set is as follows:

Patches 1-3: Plumbing for AF_XDP ZC support
Patches 4-5: AF_XDP ZC for RX
Patches 6-7: AF_XDP ZC for TX
Patches 8-10: ZC support for i40e.
Patch 11: Use the bind flags in sample application to force TX skb
          path when -S is provided on the command line.
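
As a rough cross-check of the benchmark figures above, the theoretical
packet rate of a 40 Gbit/s link can be estimated from the frame size.
The sketch below is not part of the patch set. It assumes that the
quoted packet sizes are Ethernet frame sizes and that each frame
carries 20 bytes of additional on-wire overhead (preamble,
start-of-frame delimiter and inter-frame gap). The helper name
line_rate_mpps is made up for illustration.

```python
# Assumption: 7 B preamble + 1 B start-of-frame delimiter + 12 B
# inter-frame gap are added to every frame on the wire.
WIRE_OVERHEAD_BYTES = 20

# Link speed of the I40E NIC used in the benchmarks (40 Gbit/s).
LINK_BITS_PER_SEC = 40e9


def line_rate_mpps(frame_bytes, link_bps=LINK_BITS_PER_SEC):
    """Return the theoretical maximum packet rate in Mpps."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire / 1e6


for size in (64, 1500):
    print(f"{size:>5} byte frames: {line_rate_mpps(size):.2f} Mpps")
```

Under these assumptions the ceiling is about 59.5 Mpps for 64 byte
frames and about 3.3 Mpps for 1500 byte frames. That would suggest the
1500 byte zerocopy results are close to line rate, while the 64 byte
results are limited by something other than the link.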

This patch set is based on the new uapi introduced in "AF_XDP: bug
fixes and descriptor changes". You need to apply that patch set first,
before applying this one.

We based this patch set on bpf-next commit bd3a08aa ("bpf: flowlabel
in bpf_fib_lookup should be flowinfo")

Comments:

* Implementing dynamic creation and deletion of queues in the i40e
  driver would facilitate the coexistence of xdp_redirect and af_xdp.

Thanks: Björn and Magnus
====================

Note: as agreed upon, i40e/zc bits will be routed via Jeff's tree.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>