1. 17 May, 2018 11 commits
    • 
      tcp: separate loss marking and state update on RTO · c77d62ff
      Yuchung Cheng authored
      Previously when TCP times out, it first updates cwnd and ssthresh,
      marks packets lost, and then updates congestion state again. This
      was fine because everything not yet delivered is marked lost,
      so the inflight is always 0 and cwnd can be safely set to 1 to
      retransmit one packet on timeout.
      
      But the inflight may not always be 0 on timeout if TCP changes to
      mark packets lost based on packet sent time. Therefore we must
      first mark the packet lost, then set the cwnd based on the
      (updated) inflight.
      
      This is not a pure refactor. Congestion control may potentially
      break if it uses (not yet updated) inflight to compute ssthresh.
      Fortunately no existing congestion control module does that. This
      change also alters the inflight seen when CA_LOSS_EVENT is invoked;
      only westwood processes that event, and it does not use inflight.
      
      This change has two other minor side benefits:
      1) consistent with Fast Recovery, where the inflight is updated
         before tcp_enter_recovery flips the state to CA_Recovery.
      
      2) avoid intertwining loss marking with state update, making the
         code more readable.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
      Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • 
      tcp: new helper tcp_timeout_mark_lost · 2ad55f56
      Yuchung Cheng authored
      Refactor using a new helper, tcp_timeout_mark_lost(), that marks
      packets lost upon RTO.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
      Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • 
      tcp: account lost retransmit after timeout · d716bfdb
      Yuchung Cheng authored
      The previous approach for the lost and retransmit bits was to
      wipe the slate clean: zero all the lost and retransmit bits,
      correspondingly zero the lost_out and retrans_out counters, and
      then add back the lost bits (and correspondingly increment lost_out).
      
      The new approach is to treat this very much like marking packets
      lost in fast recovery. We don’t wipe the slate clean. We just say
      that for all packets that were not yet marked sacked or lost, we now
      mark them as lost in exactly the same way we do for fast recovery.
      
      This fixes the lost retransmit accounting at RTO time and greatly
      simplifies the RTO code by sharing much of the logic with Fast
      Recovery.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
      Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • 
      tcp: simpler NewReno implementation · 6ac06ecd
      Yuchung Cheng authored
      This is a rewrite of the NewReno loss recovery implementation that
      is simpler and standalone, for readability and better performance
      by using fewer states.
      
      Note that NewReno refers to the RFC 6582 modification of the fast
      recovery algorithm. In Linux it is used only if the connection does
      not support SACK. It should not be confused with Reno (AIMD)
      congestion control.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
      Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • 
      tcp: disable RFC6675 loss detection · b38a51fe
      Yuchung Cheng authored
      This patch disables RFC6675 loss detection and makes sysctl
      net.ipv4.tcp_recovery control a binary choice between RACK (1)
      and RFC6675 (0).
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
      Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • 
      tcp: support DUPACK threshold in RACK · 20b654df
      Yuchung Cheng authored
      This patch adds support for the classic DUPACK threshold rule
      (#DupThresh) in RACK.
      
      When the number of packets SACKed is greater than or equal to the
      threshold, RACK sets the reordering window to zero, which
      immediately marks all the unsacked packets below the highest SACKed
      sequence as lost. Since this approach is known to not work well with
      reordering, RACK only uses it if no reordering has been observed.
      
      The DUPACK threshold rule is a particularly useful extension to the
      fast recoveries triggered by the RACK reordering timer: for example,
      data-center transfers where the RTT is much smaller than a timer
      tick, or high-RTT paths where the default RTT/4 wait may take too
      long.
      
      Note that this patch differs slightly from RFC6675. RFC6675
      considers a packet lost when at least #DupThresh higher-sequence
      packets are SACKed.
      
      With RACK, for connections that have seen reordering, RACK
      continues to use a dynamically-adaptive time-based reordering
      window to detect losses. But for connections on which we have not
      yet seen reordering, this patch considers a packet lost when at
      least one higher sequence packet is SACKed and the total number
      of SACKed packets is at least DupThresh. For example, suppose a
      connection has not seen reordering, and sends 10 packets, and
      packets 3, 5, 7 are SACKed. RFC6675 considers packets 1 and 2
      lost. RACK considers packets 1, 2, 4, 6 lost.
      
      There is some small risk of spurious retransmits here due to
      reordering. However, this is mostly limited to the first flight of
      a connection on which the sender receives SACKs from reordering.
      And RFC 6675 and FACK loss detection have a similar risk on the
      first flight with reordering (it's just that the risk of spurious
      retransmits from reordering was slightly narrower for those older
      algorithms due to the margin of 3*MSS).
      
      Also, the minimum reordering window is reduced from 1 msec to 0
      to recover more quickly on short-RTT transfers. RACK is therefore
      more aggressive in marking packets lost during recovery, reducing
      reliance on reordering window timeouts.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
      Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • 
      net: ethernet: ti: cpsw: disable mq feature for "AM33xx ES1.0" devices · 9611d6d6
      Ivan Khoronzhuk authored
      The early versions of AM33xx devices, i.e. the ES1.0 SoC revision,
      have an erratum limiting mq support. It is the same erratum as in
      commit 7da11600 ("drivers: net: cpsw: add am335x errata workarround for
      interrutps").
      
      AM33xx Errata [1] Advisory 1.0.9
      http://www.ti.com/lit/er/sprz360f/sprz360f.pdf
      
      Further investigation found that the driver's workaround is applied
      to all AM33xx SoCs and to DM814x, although the erratum exists only
      for ES1.0 of the AM33xx family, so the workaround needlessly limits
      mq support on later revisions. Disable mq support only for the
      affected SoCs and use separate polls for revisions that allow mq.
      Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • 
      Merge branch 'sched-refactor-NOLOCK-qdiscs' · 4b9c7768
      David S. Miller authored
      Paolo Abeni says:
      
      ====================
      sched: refactor NOLOCK qdiscs
      
      With the introduction of NOLOCK qdiscs, pfifo_fast performance in the
      uncontended scenario degraded measurably, especially after commit
      eb82a994 ("net: sched, fix OOO packets with pfifo_fast").
      
      This series restores pfifo_fast performance in that scenario to the
      previous level, mainly by reducing the number of atomic operations
      required to perform the qdisc_run() call. Performance in the
      contended scenario also increases measurably.
      
      Note: This series is on top of:
      
      sched: manipulate __QDISC_STATE_RUNNING in qdisc_run_* helpers
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • 
      pfifo_fast: drop unneeded additional lock on dequeue · 021a17ed
      Paolo Abeni authored
      After the previous patch, q->seqlock is always held when dequeue()
      is invoked for NOLOCK qdiscs, so we can drop the additional locking
      that protected that operation.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • 
      sched: replace __QDISC_STATE_RUNNING bit with a spin lock · 96009c7d
      Paolo Abeni authored
      So that we can use lockdep on it.
      The newly introduced sequence lock has the same scope as busylock,
      so it shares the same lockdep annotation, but it is only used for
      NOLOCK qdiscs.
      
      With this changeset we acquire the lock in the control path around
      the flushing operation (qdisc reset), to allow further NOLOCK qdisc
      performance improvements in the next patch.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • 
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · b9f672af
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2018-05-17
      
      The following pull-request contains BPF updates for your *net-next* tree.
      
      The main changes are:
      
      1) Provide a new BPF helper for doing a FIB and neighbor lookup
         in the kernel tables from an XDP or tc BPF program. The helper
         provides a fast-path for forwarding packets. The API supports
         IPv4, IPv6 and MPLS protocols, though only IPv4 and IPv6 are
         implemented in this initial work, from David (Ahern).
      
      2) Just a tiny diff but a huge feature enabled for the nfp driver
         by extending the BPF offload beyond a pure host processing
         offload. Offloaded XDP programs are allowed to set the RX queue
         index, thus opening the door to a fully programmable RSS/n-tuple
         filter replacement. Once BPF has decided on a queue, the device
         data-path skips conventional RSS processing completely,
         from Jakub.
      
      3) The original sockmap implementation was array based, similar to
         devmap. However, unlike devmap, where an ifindex has a 1:1
         mapping into the map, there are use cases with sockets that need
         to be referenced using longer keys. Hence, a sockhash map is
         added, reusing as much of the sockmap code as possible, from John.
      
      4) Introduce BTF ID. The ID is allocated through an IDR, similar to
         BPF maps and progs. This also makes BTF accessible to user
         space via BPF_BTF_GET_FD_BY_ID and exposes the BTF data
         through BPF_OBJ_GET_INFO_BY_FD, from Martin.
      
      5) Enable BPF stackmap with build_id also in NMI context. The
         up_read() of current->mm->mmap_sem cannot be performed there, so
         build_id could not be parsed. This work defers the up_read() via
         a per-cpu irq_work so that at least limited support can be
         enabled, from Song.
      
      6) Various BPF JIT follow-up cleanups and fixups after the
         LD_ABS/LD_IND JIT conversion, as well as an optimized 32/64 bit
         immediate load in the arm64 JIT that reduces the number of
         emitted instructions; tested real-world programs shrank by three
         percent, from Daniel.
      
      7) Add ifindex parameter to the libbpf loader in order to enable
         BPF offload support. Right now only iproute2 can load offloaded
         BPF and this will also enable libbpf for direct integration into
         other applications, from David (Beckett).
      
      8) Convert the plain text documentation under Documentation/bpf/ into
         RST format since this is the appropriate standard the kernel is
         moving to for all documentation. Also add an overview README.rst,
         from Jesper.
      
      9) Add __printf verification attribute to the bpf_verifier_vlog()
         helper. Though it uses va_list we can still allow gcc to check
         the format string, from Mathieu.
      
      10) Fix a bash reference in the BPF selftest's Makefile. The '|& ...'
          is a bash 4.0+ feature which is not guaranteed to be available
          when calling out to shell, therefore use a more portable variant,
          from Joe.
      
      11) Fix a 64 bit division in xdp_umem_reg() by using div_u64()
          instead of relying on the gcc built-in, from Björn.
      
      12) Fix a sock hashmap kmalloc warning reported by syzbot when an
          overly large key size is used in the hashmap, causing overflows
          in htab->elem_size. Reject bogus attr->key_size early in
          sock_hash_alloc(), from Yonghong.
      
      13) Ensure in BPF selftests when urandom_read is being linked that
          --build-id is always enabled so that test_stacktrace_build_id[_nmi]
          won't be failing, from Alexei.
      
      14) Add bitsperlong.h as well as errno.h uapi headers into the tools
          header infrastructure which point to one of the arch specific
          uapi headers. This was needed in order to fix a build error on
          some systems for the BPF selftests, from Sirio.
      
      15) Allow for short options to be used in the xdp_monitor BPF sample
          code. And also a bpf.h tools uapi header sync in order to fix a
          selftest build failure. Both from Prashant.
      
      16) More formally clarify the meaning of ID in the direct packet access
          section of the BPF documentation, from Wang.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 16 May, 2018 29 commits