1. 16 Sep, 2019 7 commits
    • Ilya Leoshkevich's avatar
      bpf: fix accessing bpf_sysctl.file_pos on s390 · d895a0f1
      Ilya Leoshkevich authored
      "ctx:file_pos sysctl:read write ok" fails on s390 with "Read value  !=
      nux". This is because verifier rewrites a complete 32-bit
      bpf_sysctl.file_pos update to a partial update of the first 32 bits of
      64-bit *bpf_sysctl_kern.ppos, which is not correct on big-endian
      systems.
      
      Fix by using an offset on big-endian systems.
      
      Ditto for bpf_sysctl.file_pos reads. Currently the test does not detect
      a problem there, since it expects to see 0, which it gets with high
      probability in error cases, so change it to seek to offset 3 and expect
      3 in bpf_sysctl.file_pos.
      
      Fixes: e1550bfe ("bpf: Add file_pos field to bpf_sysctl ctx")
      Signed-off-by: default avatarIlya Leoshkevich <iii@linux.ibm.com>
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20190816105300.49035-1-iii@linux.ibm.com/
      d895a0f1
    • Toke Høiland-Jørgensen's avatar
      xdp: Fix race in dev_map_hash_update_elem() when replacing element · af58e7ee
      Toke Høiland-Jørgensen authored
      syzbot found a crash in dev_map_hash_update_elem(), when replacing an
      element with a new one. Jesper correctly identified the cause of the crash
      as a race condition between the initial lookup in the map (which is done
      before taking the lock), and the removal of the old element.
      
      Rather than just add a second lookup into the hashmap after taking the
      lock, fix this by reworking the function logic to take the lock before the
      initial lookup.
      
      Fixes: 6f9d451a ("xdp: Add devmap_hash map type for looking up devices by hashed index")
      Reported-and-tested-by: syzbot+4e7a85b1432052e8d6f8@syzkaller.appspotmail.com
      Signed-off-by: default avatarToke Høiland-Jørgensen <toke@redhat.com>
      Acked-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      af58e7ee
    • Daniel Borkmann's avatar
      Merge branch 'bpf-af-xdp-unaligned-fixes' · a4fa6e16
      Daniel Borkmann authored
      Ciara Loftus says:
      
      ====================
      This patch set contains some fixes for AF_XDP zero copy in the i40e and
      ixgbe drivers as well as a fix for the 'xdpsock' sample application when
      running in unaligned mode.
      
      Patches 1 and 2 fix a regression for the i40e and ixgbe drivers which
      caused the umem headroom to be added to the xdp handle twice, resulting in
      an incorrect value being received by the user for the case where the umem
      headroom is non-zero.
      
      Patch 3 fixes an issue with the xdpsock sample application whereby the
      start of the tx packet data (offset) was not being set correctly when the
      application was being run in unaligned mode.
      
      This patch set has been applied against commit a2c11b03 ("kcm: use
      BPF_PROG_RUN")
      ====================
      Acked-by: default avatarBjörn Töpel <bjorn.topel@intel.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      a4fa6e16
    • Ciara Loftus's avatar
      samples/bpf: fix xdpsock l2fwd tx for unaligned mode · 5a712e13
      Ciara Loftus authored
      Preserve the offset of the address of the received descriptor, and include
      it in the address set for the tx descriptor, so the kernel can correctly
      locate the start of the packet data.
      
      Fixes: 03895e63 ("samples/bpf: add buffer recycling for unaligned chunks to xdpsock")
      Signed-off-by: default avatarCiara Loftus <ciara.loftus@intel.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      5a712e13
    • Ciara Loftus's avatar
      ixgbe: fix xdp handle calculations · 2e78fc62
      Ciara Loftus authored
      Commit 7cbbf9f1 ("ixgbe: fix xdp handle calculations") reintroduced
      the addition of the umem headroom to the xdp handle in the ixgbe_zca_free,
      ixgbe_alloc_buffer_slow_zc and ixgbe_alloc_buffer_zc functions. However,
      the headroom is already added to the handle in the function
      ixgbe_run_xdp_zc. This commit removes the latter addition and fixes the
      case where the headroom is non-zero.
      
      Fixes: 7cbbf9f1 ("ixgbe: fix xdp handle calculations")
      Signed-off-by: default avatarCiara Loftus <ciara.loftus@intel.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      2e78fc62
    • Ciara Loftus's avatar
      i40e: fix xdp handle calculations · 168dfc3a
      Ciara Loftus authored
      Commit 4c5d9a7f ("i40e: fix xdp handle calculations") reintroduced
      the addition of the umem headroom to the xdp handle in the i40e_zca_free,
      i40e_alloc_buffer_slow_zc and i40e_alloc_buffer_zc functions. However,
      the headroom is already added to the handle in the function i40_run_xdp_zc.
      This commit removes the latter addition and fixes the case where the
      headroom is non-zero.
      
      Fixes: 4c5d9a7f ("i40e: fix xdp handle calculations")
      Signed-off-by: default avatarCiara Loftus <ciara.loftus@intel.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      168dfc3a
    • Ilya Leoshkevich's avatar
      selftests/bpf: add bpf-gcc support · 4ce150b6
      Ilya Leoshkevich authored
      Now that binutils and gcc support for BPF is upstream, make use of it in
      BPF selftests using alu32-like approach. Share as much as possible of
      CFLAGS calculation with clang.
      
      Fixes only obvious issues, leaving more complex ones for later:
      - Use gcc-provided bpf-helpers.h instead of manually defining the
        helpers, change bpf_helpers.h include guard to avoid conflict.
      - Include <linux/stddef.h> for __always_inline.
      - Add $(OUTPUT)/../usr/include to include path in order to use local
        kernel headers instead of system kernel headers when building with O=.
      
      In order to activate the bpf-gcc support, one needs to configure
      binutils and gcc with --target=bpf and make them available in $PATH. In
      particular, gcc must be installed as `bpf-gcc`, which is the default.
      
      Right now with binutils 25a2915e8dba and gcc r275589 only a handful of
      tests work:
      
      	# ./test_progs_bpf_gcc
      	# Summary: 7/39 PASSED, 1 SKIPPED, 98 FAILED
      
      The reason for those failures are as follows:
      
      - Build errors:
        - `error: too many function arguments for eBPF` for __always_inline
          functions read_str_var and read_map_var - must be inlining issue,
          and for process_l3_headers_v6, which relies on optimizing away
          function arguments.
        - `error: indirect call in function, which are not supported by eBPF`
          where there are no obvious indirect calls in the source calls, e.g.
          in __encap_ipip_none.
        - `error: field 'lock' has incomplete type` for fields of `struct
          bpf_spin_lock` type - bpf_spin_lock is re#defined by bpf-helpers.h,
          so its usage is sensitive to order of #includes.
        - `error: eBPF stack limit exceeded` in sysctl_tcp_mem.
      - Load errors:
        - Missing object files due to above build errors.
        - `libbpf: failed to create map (name: 'test_ver.bss')`.
        - `libbpf: object file doesn't contain bpf program`.
        - `libbpf: Program '.text' contains unrecognized relo data pointing to
          section 0`.
        - `libbpf: BTF is required, but is missing or corrupted` - no BTF
          support in gcc yet.
      Signed-off-by: default avatarIlya Leoshkevich <iii@linux.ibm.com>
      Cc: Jose E. Marchesi <jose.marchesi@oracle.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      4ce150b6
  2. 06 Sep, 2019 29 commits
  3. 05 Sep, 2019 4 commits
    • Daniel Borkmann's avatar
      Merge branch 'bpf-af-xdp-barrier-fixes' · 593f191a
      Daniel Borkmann authored
      Björn Töpel says:
      
      ====================
      This is a four patch series of various barrier, {READ, WRITE}_ONCE
      cleanups in the AF_XDP socket code. More details can be found in the
      corresponding commit message. Previous revisions: v1 [4] and v2 [5].
      
      For an AF_XDP socket, most control plane operations are done under the
      control mutex (struct xdp_sock, mutex), but there are some places
      where members of the struct is read outside the control mutex. The
      dev, queue_id members are set in bind() and cleared at cleanup. The
      umem, fq, cq, tx, rx, and state member are all assigned in various
      places, e.g. bind() and setsockopt(). When the members are assigned,
      they are protected by the control mutex, but since they are read
      outside the mutex, a WRITE_ONCE is required to avoid store-tearing on
      the read-side.
      
      Prior the state variable was introduced by Ilya, the dev member was
      used to determine whether the socket was bound or not. However, when
      dev was read, proper SMP barriers and READ_ONCE were missing. In order
      to address the missing barriers and READ_ONCE, we start using the
      state variable as a point of synchronization. The state member
      read/write is paired with proper SMP barriers, and from this follows
      that the members described above does not need READ_ONCE statements if
      used in conjunction with state check.
      
      To summarize: The members struct xdp_sock members dev, queue_id, umem,
      fq, cq, tx, rx, and state were read lock-less, with incorrect barriers
      and missing {READ, WRITE}_ONCE. After this series umem, fq, cq, tx,
      rx, and state are read lock-less. When these members are updated,
      WRITE_ONCE is used. When read, READ_ONCE are only used when read
      outside the control mutex (e.g. mmap) or, not synchronized with the
      state member (XSK_BOUND plus smp_rmb())
      
      [1] https://lore.kernel.org/bpf/beef16bb-a09b-40f1-7dd0-c323b4b89b17@iogearbox.net/
      [2] https://lwn.net/Articles/793253/
      [3] https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE
      [4] https://lore.kernel.org/bpf/20190822091306.20581-1-bjorn.topel@gmail.com/
      [5] https://lore.kernel.org/bpf/20190826061053.15996-1-bjorn.topel@gmail.com/
      
      v2->v3:
        Minor restructure of commits.
        Improve cover and commit messages. (Daniel)
      v1->v2:
        Removed redundant dev check. (Jonathan)
      ====================
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      593f191a
    • Björn Töpel's avatar
      xsk: lock the control mutex in sock_diag interface · 25dc18ff
      Björn Töpel authored
      When accessing the members of an XDP socket, the control mutex should
      be held. This commit fixes that.
      Acked-by: default avatarJonathan Lemon <jonathan.lemon@gmail.com>
      Fixes: a36b38aa ("xsk: add sock_diag interface for AF_XDP")
      Signed-off-by: default avatarBjörn Töpel <bjorn.topel@intel.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      25dc18ff
    • Björn Töpel's avatar
      xsk: use state member for socket synchronization · 42fddcc7
      Björn Töpel authored
      Prior the state variable was introduced by Ilya, the dev member was
      used to determine whether the socket was bound or not. However, when
      dev was read, proper SMP barriers and READ_ONCE were missing. In order
      to address the missing barriers and READ_ONCE, we start using the
      state variable as a point of synchronization. The state member
      read/write is paired with proper SMP barriers, and from this follows
      that the members described above does not need READ_ONCE if used in
      conjunction with state check.
      
      In all syscalls and the xsk_rcv path we check if state is
      XSK_BOUND. If that is the case we do a SMP read barrier, and this
      implies that the dev, umem and all rings are correctly setup. Note
      that no READ_ONCE are needed for these variable if used when state is
      XSK_BOUND (plus the read barrier).
      
      To summarize: The members struct xdp_sock members dev, queue_id, umem,
      fq, cq, tx, rx, and state were read lock-less, with incorrect barriers
      and missing {READ, WRITE}_ONCE. Now, umem, fq, cq, tx, rx, and state
      are read lock-less. When these members are updated, WRITE_ONCE is
      used. When read, READ_ONCE are only used when read outside the control
      mutex (e.g. mmap) or, not synchronized with the state member
      (XSK_BOUND plus smp_rmb())
      
      Note that dev and queue_id do not need a WRITE_ONCE or READ_ONCE, due
      to the introduce state synchronization (XSK_BOUND plus smp_rmb()).
      
      Introducing the state check also fixes a race, found by syzcaller, in
      xsk_poll() where umem could be accessed when stale.
      Suggested-by: default avatarHillf Danton <hdanton@sina.com>
      Reported-by: syzbot+c82697e3043781e08802@syzkaller.appspotmail.com
      Fixes: 77cd0d7b ("xsk: add support for need_wakeup flag in AF_XDP rings")
      Signed-off-by: default avatarBjörn Töpel <bjorn.topel@intel.com>
      Acked-by: default avatarJonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      42fddcc7
    • Björn Töpel's avatar
      xsk: avoid store-tearing when assigning umem · 9764f4b3
      Björn Töpel authored
      The umem member of struct xdp_sock is read outside of the control
      mutex, in the mmap implementation, and needs a WRITE_ONCE to avoid
      potential store-tearing.
      Acked-by: default avatarJonathan Lemon <jonathan.lemon@gmail.com>
      Fixes: 423f3832 ("xsk: add umem fill queue support and mmap")
      Signed-off-by: default avatarBjörn Töpel <bjorn.topel@intel.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      9764f4b3