1. 10 Aug, 2018 7 commits
    • Toshiaki Makita's avatar
      veth: Avoid drops by oversized packets when XDP is enabled · dc224822
      Toshiaki Makita authored
      Oversized packets including GSO packets can be dropped if XDP is
      enabled on receiver side, so don't send such packets from peer.
      
      Drop TSO and SCTP fragmentation features so that veth devices themselves
      segment packets with XDP enabled. Also cap MTU accordingly.
      
      v4:
      - Don't auto-adjust MTU but cap max MTU.
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      dc224822
    • Toshiaki Makita's avatar
      veth: Add driver XDP · 948d4f21
      Toshiaki Makita authored
      This is the basic implementation of veth driver XDP.
      
      Incoming packets are sent from the peer veth device in the form of skb,
      so this is generally doing the same thing as generic XDP.
      
      This itself is not so useful, but a starting point to implement other
      useful veth XDP features like TX and REDIRECT.
      
      This introduces NAPI when XDP is enabled, because XDP is now heavily
      relies on NAPI context. Use ptr_ring to emulate NIC ring. Tx function
      enqueues packets to the ring and peer NAPI handler drains the ring.
      
      Currently only one ring is allocated for each veth device, so it does
      not scale on multiqueue env. This can be resolved by allocating rings
      on the per-queue basis later.
      
      Note that NAPI is not used but netif_rx is used when XDP is not loaded,
      so this does not change the default behaviour.
      
      v6:
      - Check skb->len only when allocation is needed.
      - Add __GFP_NOWARN to alloc_page() as it can be triggered by external
        events.
      
      v3:
      - Fix race on closing the device.
      - Add extack messages in ndo_bpf.
      
      v2:
      - Squashed with the patch adding NAPI.
      - Implement adjust_tail.
      - Don't acquire consumer lock because it is guarded by NAPI.
      - Make poll_controller noop since it is unnecessary.
      - Register rxq_info on enabling XDP rather than on opening the device.
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      948d4f21
    • Toshiaki Makita's avatar
      net: Export skb_headers_offset_update · b0768a86
      Toshiaki Makita authored
      This is needed for veth XDP which does skb_copy_expand()-like operation.
      
      v2:
      - Drop skb_copy_header part because it has already been exported now.
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      b0768a86
    • Daniel Borkmann's avatar
      Merge branch 'bpf-sample-cpumap-lb' · c4c20217
      Daniel Borkmann authored
      Jesper Dangaard Brouer says:
      
      ====================
      Background: cpumap moves the SKB allocation out of the driver code,
      and instead allocate it on the remote CPU, and invokes the regular
      kernel network stack with the newly allocated SKB.
      
      The idea behind the XDP CPU redirect feature, is to use XDP as a
      load-balancer step in-front of regular kernel network stack.  But the
      current sample code does not provide a good example of this.  Part of
      the reason is that, I have implemented this as part of Suricata XDP
      load-balancer.
      
      Given this is the most frequent feature request I get.  This patchset
      implement the same XDP load-balancing as Suricata does, which is a
      symmetric hash based on the IP-pairs + L4-protocol.
      
      The expected setup for the use-case is to reduce the number of NIC RX
      queues via ethtool (as XDP can handle more per core), and via
      smp_affinity assign these RX queues to a set of CPUs, which will be
      handling RX packets.  The CPUs that runs the regular network stack is
      supplied to the sample xdp_redirect_cpu tool by specifying
      the --cpu option multiple times on the cmdline.
      
      I do note that cpumap SKB creation is not feature complete yet, and
      more work is coming.  E.g. given GRO is not implemented yet, do expect
      TCP workloads to be slower.  My measurements do indicate UDP workloads
      are faster.
      ====================
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      c4c20217
    • Jesper Dangaard Brouer's avatar
      samples/bpf: xdp_redirect_cpu load balance like Suricata · 1bca4e6b
      Jesper Dangaard Brouer authored
      This implement XDP CPU redirection load-balancing across available
      CPUs, based on the hashing IP-pairs + L4-protocol.  This equivalent to
      xdp-cpu-redirect feature in Suricata, which is inspired by the
      Suricata 'ippair' hashing code.
      
      An important property is that the hashing is flow symmetric, meaning
      that if the source and destination gets swapped then the selected CPU
      will remain the same.  This is helps locality by placing both directions
      of a flows on the same CPU, in a forwarding/routing scenario.
      
      The hashing INITVAL (15485863 the 10^6th prime number) was fairly
      arbitrary choosen, but experiments with kernel tree pktgen scripts
      (pktgen_sample04_many_flows.sh +pktgen_sample05_flow_per_thread.sh)
      showed this improved the distribution.
      
      This patch also change the default loaded XDP program to be this
      load-balancer.  As based on different user feedback, this seems to be
      the expected behavior of the sample xdp_redirect_cpu.
      
      Link: https://github.com/OISF/suricata/commit/796ec08dd7a63Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      1bca4e6b
    • Jesper Dangaard Brouer's avatar
      samples/bpf: add Paul Hsieh's (LGPL 2.1) hash function SuperFastHash · 11395686
      Jesper Dangaard Brouer authored
      Adjusted function call API to take an initval. This allow the API
      user to set the initial value, as a seed. This could also be used for
      inputting the previous hash.
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      11395686
    • Björn Töpel's avatar
      Revert "xdp: add NULL pointer check in __xdp_return()" · eb91e4d4
      Björn Töpel authored
      This reverts commit 36e0f12b.
      
      The reverted commit adds a WARN to check against NULL entries in the
      mem_id_ht rhashtable. Any kernel path implementing the XDP (generic or
      driver) fast path is required to make a paired
      xdp_rxq_info_reg/xdp_rxq_info_unreg call for proper function. In
      addition, a driver using a different allocation scheme than the
      default MEM_TYPE_PAGE_SHARED is required to additionally call
      xdp_rxq_info_reg_mem_model.
      
      For MEM_TYPE_ZERO_COPY, an xdp_rxq_info_reg_mem_model call ensures
      that the mem_id_ht rhashtable has a properly inserted allocator id. If
      not, this would be a driver bug. A NULL pointer kernel OOPS is
      preferred to the WARN.
      Suggested-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarBjörn Töpel <bjorn.topel@intel.com>
      Acked-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      eb91e4d4
  2. 09 Aug, 2018 33 commits