Commits · 1b16bf42d154c8fbbab2cccc419e2ba47d700849 · nexedi / linux

15 Jul, 2016 19 commits

macvtap: avoid hash calculating for single queue · 1b16bf42

Jason Wang authored Jul 15, 2016

We decide the rxq through calculating its hash which is not necessary
if we only have one rx queue. So this patch skip this and just return
queue 0. Test shows 22% improving on guest rx pps.

Before: 1201504 pkts/s
After:  1472731 pkts/s
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1b16bf42

Merge branch 'bpf-event-output-helper-improvements' · e980a076

David S. Miller authored Jul 15, 2016

Daniel Borkmann says:

====================
BPF event output helper improvements

This set adds improvements to the BPF event output helper to
support non-linear data sampling, here specifically, for skb
context. For details please see individual patches. The set
is based against net-next tree.

v1 -> v2:
  - Integrated and adapted Peter's diff into patch 1, updated
    the remaining ones accordingly. Thanks Peter!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e980a076

bpf: avoid stack copy and use skb ctx for event output · 555c8a86

Daniel Borkmann authored Jul 14, 2016

This work addresses a couple of issues bpf_skb_event_output()
helper currently has: i) We need two copies instead of just a
single one for the skb data when it should be part of a sample.
The data can be non-linear and thus needs to be extracted via
bpf_skb_load_bytes() helper first, and then copied once again
into the ring buffer slot. ii) Since bpf_skb_load_bytes()
currently needs to be used first, the helper needs to see a
constant size on the passed stack buffer to make sure BPF
verifier can do sanity checks on it during verification time.
Thus, just passing skb->len (or any other non-constant value)
wouldn't work, but changing bpf_skb_load_bytes() is also not
the proper solution, since the two copies are generally still
needed. iii) bpf_skb_load_bytes() is just for rather small
buffers like headers, since they need to sit on the limited
BPF stack anyway. Instead of working around in bpf_skb_load_bytes(),
this work improves the bpf_skb_event_output() helper to address
all 3 at once.

We can make use of the passed in skb context that we have in
the helper anyway, and use some of the reserved flag bits as
a length argument. The helper will use the new __output_custom()
facility from perf side with bpf_skb_copy() as callback helper
to walk and extract the data. It will pass the data for setup
to bpf_event_output(), which generates and pushes the raw record
with an additional frag part. The linear data used in the first
frag of the record serves as programmatically defined meta data
passed along with the appended sample.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

555c8a86

bpf, perf: split bpf_perf_event_output · 8e7a3920

Daniel Borkmann authored Jul 14, 2016

Split the bpf_perf_event_output() helper as a preparation into
two parts. The new bpf_perf_event_output() will prepare the raw
record itself and test for unknown flags from BPF trace context,
where the __bpf_perf_event_output() does the core work. The
latter will be reused later on from bpf_event_output() directly.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

8e7a3920

perf, events: add non-linear data support for raw records · 7e3f977e

Daniel Borkmann authored Jul 14, 2016

This patch adds support for non-linear data on raw records. It
extends raw records to have one or multiple fragments that will
be written linearly into the ring slot, where each fragment can
optionally have a custom callback handler to walk and extract
complex, possibly non-linear data.

If a callback handler is provided for a fragment, then the new
__output_custom() will be used instead of __output_copy() for
the perf_output_sample() part. perf_prepare_sample() does all
the size calculation only once, so perf_output_sample() doesn't
need to redo the same work anymore, meaning real_size and padding
will be cached in the raw record. The raw record becomes 32 bytes
in size without holes; to not increase it further and to avoid
doing unnecessary recalculations in fast-path, we can reuse
next pointer of the last fragment, idea here is borrowed from
ZERO_OR_NULL_PTR(), which should keep the perf_output_sample()
path for PERF_SAMPLE_RAW minimal.

This facility is needed for BPF's event output helper as a first
user that will, in a follow-up, add an additional perf_raw_frag
to its perf_raw_record in order to be able to more efficiently
dump skb context after a linear head meta data related to it.
skbs can be non-linear and thus need a custom output function to
dump buffers. Currently, the skb data needs to be copied twice;
with the help of __output_custom() this work only needs to be
done once. Future users could be things like XDP/BPF programs
that work on different context though and would thus also have
a different callback function.

The few users of raw records are adapted to initialize their frag
data from the raw record itself, no change in behavior for them.
The code is based upon a PoC diff provided by Peter Zijlstra [1].

[1] http://thread.gmane.org/gmane.linux.network/421294Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

7e3f977e

rxrpc: checking for IS_ERR() instead of NULL · 7acef604

Dan Carpenter authored Jul 14, 2016

The rxrpc_lookup_peer() function returns NULL on error, it never returns
error pointers.

Fixes: 8496af50 ('rxrpc: Use RCU to access a peer's service connection tree')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7acef604

net: phy: micrel: Add KSZ8041FTL fiber mode support · 77501a79

Philipp Zabel authored Jul 14, 2016

We can't detect the FXEN (fiber mode) bootstrap pin, so configure
it via a boolean device tree property "micrel,fiber-mode".
If it is enabled, auto-negotiation is not supported.
The only available modes are 100base-fx (full duplex and half duplex).
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

77501a79

wan/fsl_ucc_hdlc: info leak in uhdlc_ioctl() · 2f43b9be

Dan Carpenter authored Jul 14, 2016

There is a 2 byte struct whole after line.loopback so we need to clear
that out to avoid disclosing stack information.

Fixes: c19b6d24 ('drivers/net: support hdlc function for QE-UCC')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2f43b9be

Merge branch 'rds-enable-mprds' · 7a7d1d57

David S. Miller authored Jul 15, 2016

Sowmini Varadhan says:

====================
RDS: TCP: Enable mprds for rds-tcp

The third, and final, installment for mprds-tcp changes.

In Patch 3 of this set, if the transport support t_mp_capable,
we hash outgoing traffic across multiple paths.  Additionally, even if
the transport is MP capable, we may be peering with some node that does
not support mprds, or supports a different number of paths. This
necessitates RDS control plane changes so that both peers agree
on the number of paths to be used for the rds-tcp connection.
Patch 3 implements all these changes, which are documented in patch 5
of the series.

Patch 1 of this series is a bug fix for a race-condition
that has always existed, but is now more easily encountered with mprds.
Patch 2 is code refactoring. Patches 4 and 5 are Documentation updates.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

7a7d1d57

Documentation: RDS: Document Multipath RDS (mprds) · 09204a6c

Sowmini Varadhan authored Jul 14, 2016

Document the design of mprds, covering a brief description
of the motivation, data-structures and modifications to the
RDS control plane.
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

09204a6c

Documentation: RDS: updates for SO_RDS_TRANSPORT socket option · d67214a2

Sowmini Varadhan authored Jul 14, 2016

Update the documentation to describe the changes added by
commit 8ba38460 ("net/rds Add getsockopt support for SO_RDS_TRANSPORT")
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d67214a2

RDS: TCP: Enable multipath RDS for TCP · 5916e2c1

Sowmini Varadhan authored Jul 14, 2016

Use RDS probe-ping to compute how many paths may be used with
the peer, and to synchronously start the multiple paths. If mprds is
supported, hash outgoing traffic to one of multiple paths in rds_sendmsg()
when multipath RDS is supported by the transport.

CC: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5916e2c1

RDS: TCP: Reduce code duplication in rds_tcp_reset_callbacks() · ac3615e7

Sowmini Varadhan authored Jul 14, 2016

Some code duplication in rds_tcp_reset_callbacks() can be avoided
by having the function call rds_tcp_restore_callbacks() and
rds_tcp_set_callbacks().
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ac3615e7

RDS: TCP: avoid bad page reference in rds_tcp_listen_data_ready · a93d01f5

Sowmini Varadhan authored Jul 14, 2016

As the existing comments in rds_tcp_listen_data_ready() indicate,
it is possible under some race-windows to get to this function with the
accept() socket. If that happens, we could run into a sequence whereby

   thread 1				thread 2

rds_tcp_accept_one() thread
sets up new_sock via ->accept().
The sk_user_data is now
sock_def_readable
					data comes in for new_sock,
					->sk_data_ready is called, and
					we land in rds_tcp_listen_data_ready
rds_tcp_set_callbacks()
takes the sk_callback_lock and
sets up sk_user_data to be the cp
					read_lock sk_callback_lock
					ready = cp
					unlock sk_callback_lock
					page fault on ready

In the above sequence, we end up with a panic on a bad page reference
when trying to execute (*ready)(). Instead we need to call
sock_def_readable() safely, which is what this patch achieves.
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a93d01f5

devlink: fix trace format string · caeccd51

Arnd Bergmann authored Jul 14, 2016

Including devlink.h on ARM and probably other 32-bit architectures results in
a harmless warning:

In file included from ../include/trace/define_trace.h:95:0,
                 from ../include/trace/events/devlink.h:51,
                 from ../net/core/devlink.c:30:
include/trace/events/devlink.h: In function 'trace_raw_output_devlink_hwmsg':
include/trace/events/devlink.h:42:12: error: format '%lu' expects argument of type 'long unsigned int', but argument 10 has type 'size_t {aka unsigned int}' [-Werror=format=]

The correct format string for 'size_t' is %zu, not %lu, this works on all
architectures.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: e5224f0f ("devlink: add hardware messages tracing facility")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

caeccd51

tracing: change owner name to driver name for devlink hwmsg tracepoint · 0e1824c9

Jiri Pirko authored Jul 14, 2016

Turned on that driver->owner which is struct module is not available when
modules are disabled. Better to depend on a driver name which is
always available.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Fixes: e5224f0f ("devlink: add hardware messages tracing facility")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

0e1824c9

mlxsw: spectrum_router: Return -ENOENT in case of error · a1e3e737

Christophe Jaillet authored Jul 14, 2016

'vr' should be a valid pointer here, so returning 'PTR_ERR(vr)' is wrong.
Return an explicit error code (-ENOENT) instead.

Fixes: 61c503f9 ("mlxsw: spectrum_router: Implement fib4 add/del switchdev obj ops")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a1e3e737

net: ethernet: ll_temac: use phy_ethtool_{get|set}_link_ksettings · e6dab902

Philippe Reynes authored Jul 14, 2016

There are two generics functions phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e6dab902

net: ethernet: ll_temac: use phydev from struct net_device · 31abbe34

Philippe Reynes authored Jul 14, 2016

The private structure contain a pointer to phydev, but the structure
net_device already contain such pointer. So we can remove the pointer
phy in the private structure, and update the driver to use the
one contained in struct net_device.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

31abbe34

14 Jul, 2016 21 commits

Merge tag 'wireless-drivers-next-for-davem-2016-07-13' of... · 88b3ec52

David S. Miller authored Jul 14, 2016

Merge tag 'wireless-drivers-next-for-davem-2016-07-13' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for 4.8

Major changes:

iwlwifi

* more work on the RX path for the 9000 device series
* some more dynamic queue allocation work
* SAR BIOS implementation
* some work on debugging capabilities
* added support for GCMP encryption
* data path rework in preparation for new HW
* some cleanup to remove transport dependency on mac80211
* support for MSIx in preparation for new HW
* lots of work in preparation for HW support (9000 and a000 series)

mwifiex

* implement get_tx_power and get_antenna cfg80211 operation callbacks

wl18xx

* add support for 64bit clock

rtl8xxxu

* aggregation support (optional for now)

Also wireless-drivers is merged to fix some conflicts.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

88b3ec52

Merge branch 'pktgen-scripts' · d8c62a91

David S. Miller authored Jul 14, 2016

Jesper Dangaard Brouer says:

====================
pktgen samples: new scripts and removing older samples

This patchset is adding some pktgen sample scripts that I've been
using for a while[1], and they seams to relevant for more people.

Patchset also remove some of the older style pktgen samples.

[1] https://github.com/netoptimizer/network-testing/tree/master/pktgen
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

d8c62a91

pktgen: remove sample script pktgen.conf-1-1-rdos · d3c937bb

Jesper Dangaard Brouer authored Jul 13, 2016

Removing the pktgen sample script pktgen.conf-1-1-rdos, because
it does not contain anything that is not covered by the other and
newer style sample scripts.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d3c937bb

pktgen: add sample script pktgen_sample05_flow_per_thread.sh · d25692e4

Jesper Dangaard Brouer authored Jul 13, 2016

This pktgen sample script is useful for scalability testing a
receiver.  The script will simply generate one flow per
thread (option -t N) using the thread number as part of the
source IP-address.

The single flow sample (pktgen_sample03_burst_single_flow.sh)
have become quite popular, but it is important that developers
also make sure to benchmark scalability of multiple receive
queues.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

d25692e4

pktgen: add sample script pktgen_sample04_many_flows.sh · 15f2cbbd

Jesper Dangaard Brouer authored Jul 13, 2016

Adding a pktgen sample script that demonstrates how to use pktgen
for simulating flows.  Script will generate a certain number of
concurrent flows ($FLOWS) and each flow will contain $FLOWLEN
packets, which will be send back-to-back, before switching to a
new flow, due to flag FLOW_SEQ.

This script obsoletes the old sample script 'pktgen.conf-1-1-flows',
which is removed.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

15f2cbbd

Merge branch 'mlx5-bulk-flow-stats-sriov-tc-offloads' · 53d94892

David S. Miller authored Jul 14, 2016

Saeed Mahameed says:

====================
Mellanox 100G mlx5 Bulk flow statistics and SRIOV TC offloads

This series from Amir and Or deals with two enhancements for the mlx5 TC offloads.

The 1st two patches add bulk reading of flow counters. Few bulk counter queries are
used instead of issuing thousands firmware commands per second to get statistics of all
flows set to HW.

The next patches add TC based SRIOV offloading to mlx5, as a follow up for the e-switch
offloads mode and the VF representors. When the e-switch is set to the (new) "offloads"
mode, we can now offload TC/flower drop and forward rules, the forward action we offload
is TC mirred/redirect.

The above is done by the VF representor netdevices exporting the setup_tc ndo where from
there we're re-using and enhancing the existing mlx5 TC offloads sub-module which now
works for both the NIC and the SRIOV cases.

The series is applied on top b38a75d2 ('mlxsw: core: Trace EMAD messages')
and it has no merge issues with the on-going net submission ('mlx5 tx timeout watchdog fixes')

V2:
    - Fixed compilation warning.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

53d94892