Commits · 62b31b42cff924c7d1e9a095b68ff3bbfc49b15b · nexedi / linux

24 Mar, 2019 1 commit

bpf: silence uninitialized var warning in bpf_skb_net_grow · 62b31b42

Willem de Bruijn authored Mar 23, 2019

These three variables are set in one branch and used in another under
the same condition. On some architectures, however, they still trigger
compiler warnings of the kind:

  warning: 'inner_trans' may be used uninitialized in this function [-Wmaybe-uninitialized]

Silence these false positives using the straightforward approach of
always initializing them, even if that is a bit superfluous.

Fixes: 868d5235 ("bpf: add bpf_skb_adjust_room encap flags")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
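
A minimal user-space sketch of the pattern described above, assuming a simplified offset calculation; the function and parameter names are hypothetical and not taken from bpf_skb_net_grow. The variable is only written and read under the same condition, so the zero initializer never changes the result:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical illustration: 'inner_trans' is written only when 'encap'
 * is true and read only under that same condition. The code is correct,
 * but some compilers cannot prove it and emit -Wmaybe-uninitialized,
 * so the variable is initialized up front.
 */
static size_t pick_transport_offset(bool encap, size_t net_off,
				    size_t net_len, size_t len_diff)
{
	size_t inner_trans = 0;	/* superfluous, but silences the warning */
	size_t off = net_off + net_len;

	if (encap)
		inner_trans = off + len_diff;

	/* ... unrelated work that the compiler cannot see through ... */

	if (encap)
		off = inner_trans;

	return off;
}
```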

23 Mar, 2019 2 commits

selftests: bpf: tc-bpf flow shaping with EDT · 7df5e3db

Peter Oskolkov authored Mar 22, 2019

Add a small test that shows how to shape a TCP flow in tc-bpf
with EDT and ECN.
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
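
EDT (earliest departure time) shaping can be sketched in plain user-space C: each packet is given a departure timestamp that keeps the flow at a target rate, and packets that would wait too long are marked or dropped. This is a hedged sketch, not the selftest's tc-bpf program; the rate, horizons and all names are hypothetical. Where it returns EDT_PASS_CE, a tc-bpf program could call bpf_skb_ecn_set_ce(), which the next commit makes available to BPF_PROG_TYPE_SCHED_ACT:

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical shaping parameters for this sketch only. */
#define RATE_BYTES_PER_SEC (5ULL * 1000 * 1000 / 8)	/* 5 Mbit/s */
#define ECN_HORIZON_NS (5ULL * 1000 * 1000)		/* 5 ms */
#define DROP_HORIZON_NS (2ULL * NSEC_PER_SEC)		/* 2 s */

enum edt_verdict { EDT_PASS, EDT_PASS_CE, EDT_DROP };

struct edt_state {
	uint64_t last_tstamp;	/* departure time given to the previous packet */
};

/*
 * Assign an earliest departure time to a packet of 'len' bytes seen at
 * 'now' (the value a tc-bpf program would store in skb->tstamp for an
 * EDT-aware qdisc such as fq). Packets waiting past the ECN horizon are
 * still sent but should be CE-marked; past the drop horizon they are
 * dropped without advancing the schedule.
 */
static enum edt_verdict edt_schedule(struct edt_state *st, uint64_t now,
				     uint32_t len, uint64_t *tstamp)
{
	uint64_t delay_ns = (uint64_t)len * NSEC_PER_SEC / RATE_BYTES_PER_SEC;
	uint64_t next = st->last_tstamp + delay_ns;

	if (next <= now) {
		/* The flow is below its rate: send immediately. */
		st->last_tstamp = now;
		*tstamp = now;
		return EDT_PASS;
	}
	if (next - now >= DROP_HORIZON_NS)
		return EDT_DROP;

	st->last_tstamp = next;
	*tstamp = next;
	return next - now >= ECN_HORIZON_NS ? EDT_PASS_CE : EDT_PASS;
}
```

At 5 Mbit/s a 1250-byte packet occupies 2 ms, so back-to-back packets are spaced 2 ms apart and the fourth one queued at the same instant crosses the 5 ms ECN horizon.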

bpf: make bpf_skb_ecn_set_ce callable from BPF_PROG_TYPE_SCHED_ACT · 315a2029

Peter Oskolkov authored Mar 22, 2019

This helper is useful if a bpf tc filter sets skb->tstamp.
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

22 Mar, 2019 25 commits

Merge branch 'bpf-tc-tunneling' · 629a0025

Alexei Starovoitov authored Mar 22, 2019

Willem de Bruijn says:

====================
BPF allows for dynamic tunneling, choosing the tunnel destination and
features on-demand. Extend bpf_skb_adjust_room to allow for efficient
tunneling at the TC hooks.

Most features are required for large packets with GSO, as these will
be modified after this patch.

Patch 1
  is a performance optimization, avoiding an unnecessary unclone
  for the TCP hot path.

Patches 2..6
  introduce a regression test. These can be squashed, but the code is
  arguably more readable when gradually expanding the feature set.

Patch 7
  is a performance optimization, avoiding copying network headers
  that are going to be overwritten. This also simplifies the bpf
  program.

Patch 8
  reenables bpf_skb_adjust_room for UDP packets.

Patch 9
  configures skb tunneling metadata analogous to tunnel devices.

Patches 10..13
  expand the regression test to make use of the new features and
  enable the GSO testcases.

Changes
  v1->v2
  - move BPF_F_ADJ_ROOM_MASK out of uapi as it can be expanded
  - document new flags
  - in tests replace netcat -q flag with coreutils timeout:
      the -q flag is not supported in all netcat versions
  v2->v3
  - move BPF_F_ADJ_ROOM_ENCAP_L3_MASK out of uapi as it has no
    use in userspace
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: convert bpf tunnel test to encap modes · 75a1a9fa

Willem de Bruijn authored Mar 22, 2019

Make the tests correctly annotate skbs with tunnel metadata.

This makes the gso tests succeed. Enable them.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: convert bpf tunnel test to BPF_F_ADJ_ROOM_FIXED_GSO · 94f16813

Willem de Bruijn authored Mar 22, 2019

Lower the route MTU to ensure packets fit in the device MTU after encap,
then skip the gso_size changes.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: convert bpf tunnel test to BPF_ADJ_ROOM_MAC · 005edd16

Willem de Bruijn authored Mar 22, 2019

Avoid moving the network layer header when prefixing tunnel headers.

This avoids an explicit call to bpf_skb_store_bytes and an implicit
move of the network header bytes in bpf_skb_adjust_room.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Sync bpf.h to tools · 6c408dec

Willem de Bruijn authored Mar 22, 2019

Sync include/uapi/linux/bpf.h with tools/

Changes
  v1->v2:
  - BPF_F_ADJ_ROOM_MASK moved, no longer in this commit
  v2->v3:
  - BPF_F_ADJ_ROOM_ENCAP_L3_MASK moved, no longer in this commit
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: add bpf_skb_adjust_room encap flags · 868d5235

Willem de Bruijn authored Mar 22, 2019

When pushing tunnel headers, annotate skbs in the same way as tunnel
devices.

For GSO packets, the network stack requires certain fields to be set to
segment packets with tunnel headers. For instance, gre_gso_segment
depends on the transport and inner mac headers.

Add an option to pass this information.

Remove the restriction that limits len_diff to the network header
length, which is too short, e.g., for GRE protocols.

Changes
  v1->v2:
  - document new flags
  - BPF_F_ADJ_ROOM_MASK moved
  v2->v3:
  - BPF_F_ADJ_ROOM_ENCAP_L3_MASK moved
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
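
The annotation can be pictured with a toy model: once outer headers are pushed in front of the inner network header, the former network and transport offsets become the inner ones, and the outer transport offset points at the tunnel header (e.g. GRE). This is a hypothetical sketch with made-up names, not the code added to bpf_skb_adjust_room; it assumes an L3 tunnel without an inner Ethernet header:

```c
#include <stdbool.h>

/*
 * Toy packet metadata; offsets are from the start of the packet. The
 * field names mirror sk_buff concepts, but this is not the kernel struct.
 */
struct toy_hdrs {
	unsigned int network, transport;
	unsigned int inner_mac, inner_network, inner_transport;
	bool encapsulation;
};

/*
 * Record tunnel metadata after 'len_diff' bytes of outer headers were
 * inserted right after the mac header: 'outer_l3_len' bytes of outer IP
 * header followed by an L4 tunnel header. With no inner Ethernet header,
 * inner_mac equals inner_network.
 */
static void toy_mark_encap(struct toy_hdrs *h, unsigned int len_diff,
			   unsigned int outer_l3_len)
{
	h->inner_network = h->network + len_diff;
	h->inner_transport = h->transport + len_diff;
	h->inner_mac = h->inner_network;
	/* The outer transport header is the tunnel header after outer IP. */
	h->transport = h->network + outer_l3_len;
	h->encapsulation = true;
}
```

For example, pushing an outer IPv6 header (40 bytes) plus a 4-byte GRE header in front of an IPv4/TCP packet moves the inner IPv4 header from offset 14 to 58.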

bpf: add bpf_skb_adjust_room flag BPF_F_ADJ_ROOM_FIXED_GSO · 2278f6cc

Willem de Bruijn authored Mar 22, 2019

bpf_skb_adjust_room adjusts the gso_size of GSO packets to account for
the pushed or popped header room.

This is not allowed with UDP, where gso_size delineates datagrams. Add
an option to avoid these updates and allow this call for datagrams.

It can also be used with TCP, when the MSS is known to leave enough
headroom, e.g., through MSS clamping or route MTU.

Changes v1->v2:
  - document flag BPF_F_ADJ_ROOM_FIXED_GSO
  - do not expose BPF_F_ADJ_ROOM_MASK through uapi, as it may change.

Link: https://patchwork.ozlabs.org/patch/1052497/
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
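
The gso_size arithmetic can be sketched as follows. This is a hypothetical model: SKETCH_F_FIXED_GSO is a made-up stand-in, not the uapi value of BPF_F_ADJ_ROOM_FIXED_GSO, and the MTU check only illustrates why a fixed gso_size needs headroom from MSS clamping or a lower route MTU:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag value for this sketch; not the uapi constant. */
#define SKETCH_F_FIXED_GSO (1u << 0)

/*
 * gso_size after pushing (len_diff > 0) or popping (len_diff < 0) header
 * room. By default the payload per segment shrinks or grows by len_diff
 * so each segment keeps its on-wire size; with the fixed flag it is kept.
 */
static uint16_t adjusted_gso_size(uint16_t gso_size, int len_diff,
				  unsigned int flags)
{
	if (flags & SKETCH_F_FIXED_GSO)
		return gso_size;
	return (uint16_t)(gso_size - len_diff);
}

/* With a fixed gso_size, every segment must still fit the device MTU. */
static bool segment_fits(uint32_t mtu, uint32_t hdr_len, uint16_t gso_size)
{
	return hdr_len + gso_size <= mtu;
}
```

For example, with a 20-byte ipip header on IPv4/TCP (60 bytes of headers in total), a 1460-byte MSS overflows a 1500-byte MTU, while a route MTU of 1480 yields an MSS of 1440 that fits.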

bpf: add bpf_skb_adjust_room mode BPF_ADJ_ROOM_MAC · 14aa3192

Willem de Bruijn authored Mar 22, 2019

bpf_skb_adjust_room allows inserting room in an skb.

Existing mode BPF_ADJ_ROOM_NET inserts room after the network header
by pulling the skb, moving the network header forward and zeroing the
new space.

Add new mode BPF_ADJ_ROOM_MAC that inserts room after the mac
header. This allows inserting tunnel headers in front of the network
header without having to recreate the network header in the original
space, avoiding two copies.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
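
To make the difference concrete, here is a toy model of a linear buffer with free headroom in front of the packet. It is a hypothetical sketch, not kernel code: both functions open a zeroed gap of len bytes and return how many header bytes they had to move, and the caller must ensure *data_off >= len:

```c
#include <stddef.h>
#include <string.h>

/* Like BPF_ADJ_ROOM_MAC: open the gap right after the mac header. */
static size_t grow_after_mac(unsigned char *buf, size_t *data_off,
			     size_t mac_len, size_t len)
{
	*data_off -= len;
	memmove(buf + *data_off, buf + *data_off + len, mac_len);
	memset(buf + *data_off + mac_len, 0, len);
	return mac_len;
}

/* Like BPF_ADJ_ROOM_NET: open the gap after the network header. */
static size_t grow_after_net(unsigned char *buf, size_t *data_off,
			     size_t mac_len, size_t net_len, size_t len)
{
	*data_off -= len;
	memmove(buf + *data_off, buf + *data_off + len, mac_len + net_len);
	memset(buf + *data_off + mac_len + net_len, 0, len);
	return mac_len + net_len;
}
```

With the NET variant, a program that prepends a tunnel header still has to overwrite the moved network header with the outer header and copy the inner network header into the gap. The MAC variant opens the gap exactly where the outer header belongs.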

selftests/bpf: extend bpf tunnel test with tso · 81429589

Willem de Bruijn authored Mar 22, 2019

Segmentation offload takes a longer path. Verify that the feature
works with large packets.

The test succeeds if bpf_skb_adjust_room does not set SKB_GSO_DODGY, as
veth TSO is permissive.

Without SKB_GSO_DODGY, this enables tunneled TSO offload on NICs that
support it.

The feature sets SKB_GSO_DODGY because the caller is untrusted. As a
result, the packets traverse the gso stack at least up to TCP and fail
the gso_type validation, such as the skb->encapsulation check in
gre_gso_segment and the gso_type checks introduced in commit
418e897e ("gso: validate gso_type on ipip style tunnel").

This will be addressed in a follow-on feature patch. In the meantime,
disable the new gso tests.

Changes v1->v2:
  - not all netcat versions support flag '-q', use timeout instead
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: extend bpf tunnel test with gre · 7255fade

Willem de Bruijn authored Mar 22, 2019

GRE is a commonly used protocol. Add GRE cases for both IPv4 and IPv6.

GRE also inserts headers of different sizes, which can expose some
unexpected edge cases.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: expand bpf tunnel test to ipv6 · ef81bd05

Willem de Bruijn authored Mar 22, 2019

The test only uses ipv4 so far; expand it to ipv6.
This is mostly boilerplate, a near copy of the ipv4 path.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: expand bpf tunnel test with decap · ccd34cd3

Willem de Bruijn authored Mar 22, 2019

The bpf tunnel test encapsulates using bpf, then decapsulates using
a standard tunnel device to verify correctness.

Once encap is verified, also test decap, by replacing the tunnel
device on decap with another bpf program.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: bpf tunnel encap test · 98cdabcd

Willem de Bruijn authored Mar 22, 2019

Validate basic tunnel encapsulation using ipip.

Set up two namespaces connected by veth. Connect a client and server.
Do this with and without bpf encap.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: in bpf_skb_adjust_room avoid copy in tx fast path · 908adce6

Willem de Bruijn authored Mar 22, 2019

bpf_skb_adjust_room calls skb_cow on grow.

This expensive operation can be avoided in the fast path when the only
other clone has released the header. This is the common case for TCP,
where one headerless clone is kept on the retransmit queue.

It is safe to do so even when touching the gso fields in skb_shinfo.
Regular tunnel encap with iptunnel_handle_offloads takes the same
optimization.

The tcp stack unclones in the unlikely case that it accesses these
fields through headerless clones on the retransmit queue (see
__tcp_retransmit_skb).

If any other clones are present, e.g., from packet sockets,
skb_cow_head returns the same value as skb_cow().
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
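
The distinction between the two checks can be sketched with a toy reference count. This is a hypothetical model of the bookkeeping, not the kernel's dataref encoding: skb_cow() copies whenever anyone else shares the data, while skb_cow_head() only copies when another user still needs the header:

```c
#include <stdbool.h>

/*
 * Toy model of shared skb data: 'users' counts every skb pointing at the
 * data; 'headerless' counts the clones among them that only need the
 * payload (they released the header, as TCP does for the copy it keeps
 * on the retransmit queue).
 */
struct toy_shared {
	unsigned int users;
	unsigned int headerless;
};

/* skb_cow()-like test: any other user of the data forces a copy. */
static bool must_copy_data(const struct toy_shared *s)
{
	return s->users > 1;
}

/* skb_cow_head()-like test: only other users of the header force a copy. */
static bool must_copy_header(const struct toy_shared *s)
{
	return s->users - s->headerless > 1;
}
```

In the TCP case (two users, one headerless) only the data check fires, so the copy is avoided; an extra clone from a packet socket makes both checks agree again.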

selftests: bpf: modify urandom_read and link it non-statically · f6827526

Ivan Vecera authored Mar 15, 2019

After some experimentation, I found that urandom_read does not need to
be linked statically. When the 'read' syscall is moved to a separate
non-inlined function, bpf_get_stackid() is able to find the executable
in the stack trace and extract its build_id from it.
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
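
A minimal user-space sketch of the change, assuming GCC/Clang's noinline attribute; the buffer size, loop count and return value are illustrative, and the actual selftest source may differ in detail:

```c
#include <fcntl.h>
#include <unistd.h>

#define BUF_SIZE 256

/*
 * Keep the read() loop out of line. Per the commit message, with read()
 * called from a separate non-inlined function, bpf_get_stackid() finds
 * the executable in the stack trace and can extract its build_id, so
 * the binary no longer has to be linked statically.
 */
static __attribute__((noinline)) int urandom_read(int fd, int count)
{
	char buf[BUF_SIZE];
	int i, full = 0;

	for (i = 0; i < count; ++i)
		if (read(fd, buf, BUF_SIZE) == BUF_SIZE)
			full++;
	return full;	/* number of full-sized reads */
}
```

A caller would open /dev/urandom with O_RDONLY, call urandom_read(fd, n) and close the descriptor.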

samples: bpf: add xdp_sample_pkts to .gitignore · ab99e7a8

Daniel T. Lee authored Mar 20, 2019

This commit adds xdp_sample_pkts to .gitignore, from which it is
currently omitted.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge branch 'bpf_tcp_check_syncookie' · 25694738

Alexei Starovoitov authored Mar 21, 2019

Lorenz Bauer says:

====================
This series adds the necessary helpers to determine whether a given
(encapsulated) TCP packet belongs to a connection known to the network stack.

* bpf_skc_lookup_tcp gives access to request and timewait sockets
* bpf_tcp_check_syncookie identifies the final 3WHS ACK when syncookies
  are enabled

The goal is to be able to implement load-balancing approaches like
glb-director [1] or Beamer [2] in pure eBPF. Specifically, we'd like to replace
the functionality of the glb-redirect kernel module [3] by an XDP program or
tc classifier.

Changes in v3:
* Fix missing check for ip4->ihl
* Only cast to unsigned long in BPF_CALLs

Changes in v2:
* Rename bpf_sk_check_syncookie to bpf_tcp_check_syncookie.
* Add bpf_skc_lookup_tcp. Without it bpf_tcp_check_syncookie doesn't make sense.
* Check tcp_synq_no_recent_overflow() in bpf_tcp_check_syncookie.
* Check th->syn in bpf_tcp_check_syncookie.
* Require CONFIG_IPV6 to be built-in.

1: https://github.com/github/glb-director
2: https://www.usenix.org/conference/nsdi18/presentation/olteanu
3: https://github.com/github/glb-director/tree/master/src/glb-redirect
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: add tests for bpf_tcp_check_syncookie and bpf_skc_lookup_tcp · bafc0ba8

Lorenz Bauer authored Mar 22, 2019

Add tests which verify that the new helpers work for both IPv4 and
IPv6, by forcing SYN cookies to always be on. Use a new network
namespace to avoid clobbering the global SYN cookie settings.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: test references to sock_common · 5792d52d

Lorenz Bauer authored Mar 22, 2019

Make sure that returning a struct sock_common * reference invokes
the reference tracking machinery in the verifier.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: allow specifying helper for BPF_SK_LOOKUP · dbaf2877

Lorenz Bauer authored Mar 22, 2019

Make the BPF_SK_LOOKUP macro take a helper function, to ease
writing tests for new helpers.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

tools: update include/uapi/linux/bpf.h · 253c8dde

Lorenz Bauer authored Mar 22, 2019

Pull definitions for bpf_skc_lookup_tcp and bpf_tcp_check_syncookie.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: add helper to check for a valid SYN cookie · 39904084

Lorenz Bauer authored Mar 22, 2019

Using bpf_skc_lookup_tcp it's possible to ascertain whether a packet
belongs to a known connection. However, there is one corner case: no
sockets are created if SYN cookies are active. This means that the final
ACK in the 3WHS is misclassified.

With this helper, we can look up the listening socket via
bpf_skc_lookup_tcp and then use bpf_tcp_check_syncookie to check
whether a packet is a valid SYN cookie ACK.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
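
The resulting classification can be sketched as a small decision function. This is a hypothetical user-space model, not a BPF program: the enum stands in for what a bpf_skc_lookup_tcp() lookup finds, and cookie_ok stands for bpf_tcp_check_syncookie() returning 0 (a valid SYN cookie ACK). It only covers segments without SYN set:

```c
#include <stdbool.h>

/* Hypothetical result of a bpf_skc_lookup_tcp()-style lookup. */
enum toy_sk_state {
	TOY_NO_SOCKET,
	TOY_LISTEN,
	TOY_REQUEST,
	TOY_TIME_WAIT,
	TOY_ESTABLISHED,
};

/* Does a non-SYN TCP segment belong to a connection known to this host? */
static bool toy_known_connection(enum toy_sk_state st, bool cookie_ok)
{
	switch (st) {
	case TOY_NO_SOCKET:
		return false;
	case TOY_LISTEN:
		/*
		 * Under SYN cookies no request socket exists, so the final
		 * 3WHS ACK only matches the listener: validate the cookie.
		 */
		return cookie_ok;
	default:
		/* Request, timewait or established socket found. */
		return true;
	}
}
```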

bpf: add skc_lookup_tcp helper · edbf8c01

Lorenz Bauer authored Mar 22, 2019

Allow looking up a sock_common. This gives eBPF programs
access to timewait and request sockets.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: allow helpers to return PTR_TO_SOCK_COMMON · 85a51f8c

Lorenz Bauer authored Mar 22, 2019

It's currently not possible to access timewait or request sockets
from eBPF, since there is no way to return a PTR_TO_SOCK_COMMON
from a helper. Introduce RET_PTR_TO_SOCK_COMMON to enable this
behaviour.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: track references based on is_acquire_func · 0f3adc28

Lorenz Bauer authored Mar 22, 2019

So far, the verifier only acquires reference tracking state for
RET_PTR_TO_SOCKET_OR_NULL. Instead of extending this for every
new return type which desires these semantics, acquire reference
tracking state iff the called helper is an acquire function.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
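
The idea of keying reference tracking on the called helper rather than on its return type can be sketched as a toy model. The helper IDs below are a hypothetical subset, not the real enum bpf_func_id, and the list already includes the skc_lookup_tcp helper added later in this series:

```c
#include <stdbool.h>

/* Hypothetical subset of helper IDs; not the real enum bpf_func_id. */
enum toy_helper {
	TOY_MAP_LOOKUP_ELEM,
	TOY_SK_LOOKUP_TCP,
	TOY_SK_LOOKUP_UDP,
	TOY_SKC_LOOKUP_TCP,
	TOY_SK_RELEASE,
};

/* Helpers whose returned pointer must later be released. */
static bool is_acquire_function(enum toy_helper id)
{
	return id == TOY_SK_LOOKUP_TCP || id == TOY_SK_LOOKUP_UDP ||
	       id == TOY_SKC_LOOKUP_TCP;
}

struct toy_refs {
	int next_id;	/* last reference id handed out */
	int live;	/* references not yet released */
};

/* Returns the reference id attached to the helper's result, or 0. */
static int toy_helper_call(struct toy_refs *r, enum toy_helper id)
{
	if (is_acquire_function(id)) {
		r->live++;
		return ++r->next_id;
	}
	if (id == TOY_SK_RELEASE && r->live > 0)
		r->live--;
	return 0;
}

/* A program may only exit once every acquired reference is released. */
static bool toy_exit_allowed(const struct toy_refs *r)
{
	return r->live == 0;
}
```

Adding a new acquiring helper then only means extending is_acquire_function(), whatever pointer type the helper returns.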

21 Mar, 2019 1 commit

selftests/bpf: Add arm target register definitions · 48e5d98a

Adrian Ratiu authored Mar 20, 2019

eBPF "restricted C" code can be compiled with LLVM/clang using target
triplets like armv7l-unknown-linux-gnueabihf and loaded/run with small
cross-compiled gobpf/elf [1] programs without requiring a full BCC
port, which is undesirable on small embedded systems due to its size
footprint. The only missing pieces are these helper macros, which
otherwise have to be redefined by each eBPF arm program.

[1] https://github.com/iovisor/gobpf/tree/master/elf
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
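
A sketch of what such per-architecture macros look like for 32-bit ARM, assuming the AAPCS calling convention (arguments in r0-r3, return value in r0, stack pointer in r13) and a mocked register struct so it compiles anywhere. It is not the exact contents of the selftests header:

```c
/* Mock of the 32-bit ARM register frame: r0-r15, then cpsr and orig_r0. */
struct sketch_pt_regs {
	unsigned long uregs[18];
};

/* AAPCS: the first four arguments live in r0-r3, the return value in r0. */
#define PT_REGS_PARM1(x) ((x)->uregs[0])
#define PT_REGS_PARM2(x) ((x)->uregs[1])
#define PT_REGS_PARM3(x) ((x)->uregs[2])
#define PT_REGS_PARM4(x) ((x)->uregs[3])
#define PT_REGS_RC(x)    ((x)->uregs[0])
#define PT_REGS_SP(x)    ((x)->uregs[13])	/* r13 is the stack pointer */
```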

20 Mar, 2019 11 commits

net: isdn: Make isdn_ppp_mp_discard and isdn_ppp_mp_reassembly static · a534ea30

YueHaibing authored Mar 20, 2019

Fix sparse warnings:

drivers/isdn/i4l/isdn_ppp.c:1891:16: warning:
 symbol 'isdn_ppp_mp_discard' was not declared. Should it be static?
drivers/isdn/i4l/isdn_ppp.c:1903:6: warning:
 symbol 'isdn_ppp_mp_reassembly' was not declared. Should it be static?
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: Make hclge_destroy_cmd_queue static · 881d7afd

YueHaibing authored Mar 20, 2019

Fix sparse warning:

drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c:414:6:
 warning: symbol 'hclge_destroy_cmd_queue' was not declared. Should it be static?
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-refactor-ndo_select_queue' · 75d317c4

David S. Miller authored Mar 20, 2019

Paolo Abeni says:

====================
net: refactor ndo_select_queue()

Currently, on most devices implementing ndo_select_queue(), we get 2
indirect calls per xmit packet, at least in some scenarios.

We can avoid one of these indirect calls by refactoring ndo_select_queue()
usage so that the 'fallback' argument is no longer needed.

The first patch renames a helper that is later used as a public API, the
second one changes the af packet implementation so that it uses the common
infrastructure to select the xmit queue, and the third patch drops the now
unneeded argument from ndo_select_queue().

Alternatively, we could use the INDIRECT_CALL_WRAPPER infrastructure to
avoid the fallback indirect call in the common case, but this solution also
allows for some code cleanup.

 v1 -> v2:
  - renamed select queue helpers, as per Eric's and David's suggestions
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

net: remove 'fallback' argument from dev->ndo_select_queue() · a350ecce

Paolo Abeni authored Mar 20, 2019

After the previous patch, all callers of ndo_select_queue() pass
netdev_pick_tx as the 'fallback' argument. The only exceptions are
nested calls to ndo_select_queue(), which pass down the 'fallback'
available in the current scope, which is still netdev_pick_tx.

We can drop this argument and replace the fallback() invocation with
netdev_pick_tx(). This avoids an indirect call per xmit packet in some
scenarios (TCP syn, UDP unconnected, XDP generic, pktgen) with device
drivers implementing this ndo. It also cleans up the code a bit.

Tested with ixgbe and CONFIG_FCOE=m

With pktgen using queue xmit:
threads		vanilla 	patched
		(kpps)		(kpps)
1		2334		2428
2		4166		4278
4		7895		8100

 v1 -> v2:
 - rebased after helper's name change
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
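
The effect on a driver's queue selection can be sketched with toy types. All names here are hypothetical stand-ins (toy_pick_tx plays the role of netdev_pick_tx(), and the odd-hash rule is made-up driver-specific steering). The driver hook itself is still reached indirectly; what disappears is the second indirect call through the fallback pointer:

```c
#include <stdint.h>

/* Toy device; only the number of tx queues matters here. */
struct toy_dev {
	uint16_t num_tx_queues;
};

/* Hypothetical stand-in for netdev_pick_tx(): hash-based selection. */
static uint16_t toy_pick_tx(struct toy_dev *dev, uint32_t hash)
{
	return (uint16_t)(hash % dev->num_tx_queues);
}

typedef uint16_t (*toy_fallback_t)(struct toy_dev *dev, uint32_t hash);

/* Before: the core passes the fallback in, costing an indirect call. */
static uint16_t drv_select_queue_old(struct toy_dev *dev, uint32_t hash,
				     toy_fallback_t fallback)
{
	if (hash & 1)		/* hypothetical driver-specific steering */
		return 0;
	return fallback(dev, hash);
}

/* After: the driver calls the common helper directly. */
static uint16_t drv_select_queue_new(struct toy_dev *dev, uint32_t hash)
{
	if (hash & 1)
		return 0;
	return toy_pick_tx(dev, hash);
}
```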

packet: rework packet_pick_tx_queue() to use common code selection · b71b5837

Paolo Abeni authored Mar 20, 2019

Currently packet_pick_tx_queue() is the only caller of
ndo_select_queue() using a fallback argument other than
netdev_pick_tx.

By leveraging the rx queue, we can obtain a similar queue selection
behavior using core helpers. After this change, ndo_select_queue()
is always invoked with netdev_pick_tx() as fallback.
We can change the ndo_select_queue() signature in a followup patch,
dropping an indirect call per transmitted packet in some scenarios
(e.g. TCP syn and XDP generic xmit).

This slightly changes how af packet queue selection happens when
PACKET_QDISC_BYPASS is set. It is now more similar to plain
dev_queue_xmit(), taking into account both XPS and TC mapping.

 v1  -> v2:
  - rebased after helper name change
 RFC -> v1:
  - initialize sender_cpu to the expected value
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dev: rename queue selection helpers. · 4bd97d51

Paolo Abeni authored Mar 20, 2019

With the following patches, we are going to use __netdev_pick_tx() in
many modules. Rename it to netdev_pick_tx(), to make it clear that it is
a public API.

Also rename the existing netdev_pick_tx() to netdev_core_pick_tx(),
to avoid name clashes.
Suggested-by: Eric Dumazet <edumazet@google.com>
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'qed-next' · 0b963ef2

David S. Miller authored Mar 20, 2019

Sudarsana Reddy Kalluru says:

====================
qed* enhancements.

The patch series adds a couple of enhancements for the qed/qede drivers.
Please consider applying it to 'net-next' tree.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Define new MF bit for no_vlan config · 1a3ca250

Sudarsana Reddy Kalluru authored Mar 20, 2019

The patch introduces a new Multi-Function bit for cases where the firmware
should not insert a vlan-0 tag. The new bit is defined to abstract the
implementation from the actual MF mode.
Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qede: Populate mbi version in ethtool driver query data. · a88381de

Sudarsana Reddy Kalluru authored Mar 20, 2019

The patch adds support to display MBI image version in 'ethtool -i' output.
Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

macvlan: pass get_ts_info and SIOC[SG]HWTSTAMP ioctl to real device · 254c0a2b

Hangbin Liu authored Mar 20, 2019

Similar to commit a6111d3c ("vlan: Pass SIOC[SG]HWTSTAMP ioctls to
real device") and commit 37dd9255 ("vlan: Pass ethtool get_ts_info
queries to real device."), add macvlan HW PTP support.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: use eth_broadcast_addr() to assign broadcast address · 1bfe45f4

Mao Wenan authored Mar 20, 2019

This patch uses eth_broadcast_addr() instead of memset() to assign the
broadcast address.
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
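
A user-space stand-in that behaves like the kernel helper (filling ETH_ALEN bytes with 0xff); it is redefined here only so the sketch compiles outside the kernel. The named helper states the intent more clearly than an open-coded memset(addr, 0xff, ETH_ALEN):

```c
#include <string.h>

#define ETH_ALEN 6	/* octets in an Ethernet address */

/* Stand-in for the kernel helper: set the address to ff:ff:ff:ff:ff:ff. */
static inline void eth_broadcast_addr(unsigned char *addr)
{
	memset(addr, 0xff, ETH_ALEN);
}
```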
