Commits · adc6a3ab192eb40fb9d8b093c87d9aa785af4513 · Kirill Smelkov / linux

13 Apr, 2019 10 commits

rhashtable: move dereference inside rht_ptr() · adc6a3ab

NeilBrown authored Apr 12, 2019

Rather than dereferencing a pointer to a bucket and then passing the
result to rht_ptr(), we now pass in the pointer and do the dereference
in rht_ptr().

This requires that we pass in the tbl and hash as well to support RCU
checks, and means that the various rht_for_each functions can expect a
pointer that can be dereferenced without further care.

There are two places where we dereference a bucket pointer
where there is no testable protection - in each case we know
that we much have exclusive access without having taken a lock.
The previous code used rht_dereference() to pretend that holding
the mutex provided protects, but holding the mutex never provides
protection for accessing buckets.

So instead introduce rht_ptr_exclusive() that can be used when
there is known to be exclusive access without holding any locks.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

adc6a3ab

rhashtable: reorder some inline functions and macros. · c5783311

NeilBrown authored Apr 12, 2019

This patch only moves some code around, it doesn't
change the code at all.
A subsequent patch will benefit from this as it needs
to add calls to functions which are now defined before the
call-site, but weren't before.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c5783311

rhashtable: fix some __rcu annotation errors · e4edbe3c

NeilBrown authored Apr 12, 2019

With these annotations, the rhashtable now gets no
warnings when compiled with "C=1" for sparse checking.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e4edbe3c

rhashtable: use struct_size() in kvzalloc() · c252aa3e

Gustavo A. R. Silva authored Apr 11, 2019

One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along with
memory for some number of elements for that array.  For example:

struct foo {
    int stuff;
    struct boo entry[];
};

size = sizeof(struct foo) + count * sizeof(struct boo);
instance = kvzalloc(size, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kvzalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c252aa3e

Merge branch 'nfp-update-to-control-structures' · 9d60f0ea

David S. Miller authored Apr 12, 2019

Jakub Kicinski says:

====================
nfp: update to control structures

This series prepares NFP control structures for crypto offloads.
So far we mostly dealt with configuration requests under rtnl lock.
This will no longer be the case with crypto.  Additionally we will
try to reuse the BPF control message format, so we move common code
out of BPF.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

9d60f0ea

nfp: split out common control message handling code · bcf0cafa

Jakub Kicinski authored Apr 11, 2019

BPF's control message handler seems like a good base to built
on for request-reply control messages.  Split it out to allow
for reuse.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bcf0cafa

nfp: move vNIC reset before netdev init · 0a72d833

Jakub Kicinski authored Apr 11, 2019

During probe we clear vNIC configuration in case the device
wasn't closed cleanly by previous driver.  Move that code
before netdev init, so netdev init can already try to apply
its config parameters.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0a72d833

nfp: add a mutex lock for the vNIC ctrl BAR · dd5b2498

Jakub Kicinski authored Apr 11, 2019

Soon we will try to write to the vNIC mailbox without RTNL held.
Add a new mutex to protect access to specific parts of the PCI
control BAR.

Move the mailbox size checking to the mailbox lock() helper, where
it can be more effective (happen prior to potential overwrite of
other data).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dd5b2498

nfp: opportunistically poll for reconfig result · e6471828

Dirk van der Merwe authored Apr 11, 2019

If the reconfig was a quick update, we could have results available from
firmware within 200us.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e6471828

ipv6: Remove flowi6_oif compare from __ip6_route_redirect · 1deeb640

David Ahern authored Apr 12, 2019

In the review of 0b34eb00 ("ipv6: Refactor __ip6_route_redirect"),
Martin noted that the flowi6_oif compare is moved to the new helper and
should be removed from __ip6_route_redirect. Fix the oversight.

Fixes: 0b34eb00 ("ipv6: Refactor __ip6_route_redirect")
Reported-by: Martin Lau <kafai@fb.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1deeb640

12 Apr, 2019 28 commits

Merge branch 'netdevsim-Mostly-cleanup-in-sdev-bpf-iface-area' · 8c5a3ca3

David S. Miller authored Apr 12, 2019

Jiri Pirko says:

====================
netdevsim: Mostly cleanup in sdev/bpf iface area

This patches does mainly internal netdevsim code shuffle. Nothing
serious, just small changes to help readability and preparations for
future work.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

8c5a3ca3

netdevsim: move sdev-specific init/uninit code into separate functions · 4b3a84bc

Jiri Pirko authored Apr 12, 2019

In order to improve readability and prepare for future code changes,
move sdev specific init/uninit code into separate functions.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4b3a84bc

netdevsim: make bpf_offload_dev_create() per-sdev instead of first ns · b26b6946

Jiri Pirko authored Apr 12, 2019

offload dev is stored in sdev struct. However, first netdevsim instance
is used as a priv. Change this to be sdev to as it is shared among
multiple netdevsim instances.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b26b6946

netdevsim: move sdev specific bpf debugfs files to sdev dir · 38f58c97

Jiri Pirko authored Apr 12, 2019

Some netdevsim bpf debugfs files are per-sdev, yet they are defined per
netdevsim instance. Move them under sdev directory.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

38f58c97

netdevsim: move shared dev creation and destruction into separate file · af9095f0

Jiri Pirko authored Apr 12, 2019

To make code easier to read, move shared dev bits into a separate file.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

af9095f0

Documentation: net: dsa: transition to the rst format · 3c91d114

Ioana Ciornei authored Apr 12, 2019

This patch also performs some minor adjustments such as numbering for
the receive path sequence, conversion of keywords to inline literals and
adding an index page so it looks better in the output of 'make htmldocs'.
Signed-off-by: Ioana Ciornei <ciorneiioana@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3c91d114

net: veth: use generic helper to report timestamping info · 056b21fb

Julian Wiedmann authored Apr 12, 2019

For reporting the common set of SW timestamping capabilities, use
ethtool_op_get_ts_info() instead of re-implementing it.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

056b21fb

net: loopback: use generic helper to report timestamping info · af730342

Julian Wiedmann authored Apr 12, 2019

For reporting the common set of SW timestamping capabilities, use
ethtool_op_get_ts_info() instead of re-implementing it.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

af730342

net: dummy: use generic helper to report timestamping info · abe9fd57

Julian Wiedmann authored Apr 12, 2019

For reporting the common set of SW timestamping capabilities, use
ethtool_op_get_ts_info() instead of re-implementing it.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

abe9fd57

Merge branch 'smc-next' · e0a092eb

David S. Miller authored Apr 12, 2019

Ursula Braun says:

====================
net/smc: patches 2019-04-12

here are patches for SMC:
* patch 1 improves behavior of non-blocking connect
* patches 2, 3, 5, 7, and 8 improve connecting return codes
* patches 4 and 6 are a cleanups without functional change
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e0a092eb

net/smc: improve smc_conn_create reason codes · 7a62725a

Karsten Graul authored Apr 12, 2019

Rework smc_conn_create() to always return a valid DECLINE reason code.
This removes the need to translate the return codes on 4 different
places and allows to easily add more detailed return codes by changing
smc_conn_create() only.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7a62725a

net/smc: improve smc_listen_work reason codes · 9aa68d29

Karsten Graul authored Apr 12, 2019

Rework smc_listen_work() to provide improved reason codes when an
SMC connection is declined. This allows better debugging on user side.
This also adds 3 more detailed reason codes in smc_clc.h to indicate
what type of device was not found (ism or rdma or both), or if ism
cannot talk to the peer.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9aa68d29

net/smc: code cleanup smc_listen_work · 228bae05

Karsten Graul authored Apr 12, 2019

In smc_listen_work() the variables rc and reason_code are defined which
have the same meaning. Eliminate reason_code in favor of the shorter
name rc. No functional changes.
Rename the functions smc_check_ism() and smc_check_rdma() into
smc_find_ism_device() and smc_find_rdma_device() to make there purpose
more clear. No functional changes.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

228bae05

net/smc: cleanup of get vlan id · fba7e8ef

Karsten Graul authored Apr 12, 2019

The vlan_id of the underlying CLC socket was retrieved two times
during processing of the listen handshaking. Change this to get the
vlan id one time in connect and in listen processing, and reuse the id.
And add a new CLC DECLINE return code for the case when the retrieval
of the vlan id failed.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fba7e8ef

net/smc: consolidate function parameters · bc36d2fc

Karsten Graul authored Apr 12, 2019

During initialization of an SMC socket a lot of function parameters need
to get passed down the function call path. Consolidate the parameters
in a helper struct so there are less enough parameters to get all passed
by register.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bc36d2fc

net/smc: check for ip prefix and subnet · 59886697

Karsten Graul authored Apr 12, 2019

The check for a matching ip prefix and subnet was only done for SMC-R
in smc_listen_rdma_check() but not when an SMC-D connection was
possible. Rename the function into smc_listen_prfx_check() and move its
call to a place where it is called for both SMC variants.
And add a new CLC DECLINE reason for the case when the IP prefix or
subnet check fails so the reason for the failing SMC connection can be
found out more easily.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

59886697

net/smc: fallback to TCP after connect problems · 4ada81fd

Karsten Graul authored Apr 12, 2019

Correct the CLC decline reason codes for internal problems to not have
the sign bit set, negative reason codes are interpreted as not eligible
for TCP fallback.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4ada81fd

net/smc: nonblocking connect rework · 50717a37

Ursula Braun authored Apr 12, 2019

For nonblocking sockets move the kernel_connect() from the connect
worker into the initial smc_connect part to return kernel_connect()
errors other than -EINPROGRESS to user space.
Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

50717a37

xen-netback: add reference from xenvif to backend_info to facilitate coredump analysis · 6dc400af

Dongli Zhang authored Apr 12, 2019

During coredump analysis, it is not easy to obtain the address of
backend_info in xen-netback.

So far there are two ways to obtain backend_info:

1. Do what xenbus_device_find() does for vmcore to find the xenbus_device
and then derive it from dev_get_drvdata().

2. Extract backend_info from callstack of xenwatch (e.g., netback_remove()
or frontend_changed()).

This patch adds a reference from xenvif to backend_info so that it would be
much more easier to obtain backend_info during coredump analysis.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6dc400af

Merge branch 'sctp-skb-list' · 8af9f729

David S. Miller authored Apr 11, 2019

David Miller says:

====================
SCTP: Event skb list overhaul.

This patch series eliminates the explicit reference to the skb list
implementation via skb->prev dereferences.

The approach used is to pass a non-empty skb list around instead of an
event skb object which may or may not be on a list.

I'd like to thank Marcelo Leitner, Xin Long, and Neil Horman for
reviewing previous versions of this series.

Testing would be very much appreciated, in addition to the review of
course.

v4 --> v5: Rebase to net-next

v3 --> v4: Fix the logic in patch #4 so that we don't miss cases
           where we should add event to the on-stack temp list.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

8af9f729

sctp: Pass sk_buff_head explicitly to sctp_ulpq_tail_event(). · 013b96ec

David Miller authored Apr 11, 2019

Now the SKB list implementation assumption can be removed.

And now that we know that the list head is always non-NULL
we can remove the code blocks dealing with that as well.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

013b96ec

sctp: Make sctp_enqueue_event tak an skb list. · 178ca044

David Miller authored Apr 11, 2019

Pass this, instead of an event.  Then everything trickles down and we
always have events a non-empty list.

Then we needs a list creating stub to place into .enqueue_event for sctp_stream_interleave_1.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

178ca044

sctp: Use helper for sctp_ulpq_tail_event() when hooked up to ->enqueue_event · 5e8f641d

David Miller authored Apr 11, 2019

This way we can make sure events sent this way to
sctp_ulpq_tail_event() are on a list as well.  Now all such code paths
are fully covered.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5e8f641d

sctp: Always pass skbs on a list to sctp_ulpq_tail_event(). · 925b9374

David Miller authored Apr 11, 2019

This way we can simplify the logic and remove assumptions
about the implementation of skb lists.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

925b9374

sctp: Remove superfluous test in sctp_ulpq_reasm_drain(). · 0eff1052

David Miller authored Apr 11, 2019

Inside the loop, we always start with event non-NULL.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0eff1052

net: sched: flower: fix filter net reference counting · 9994677c

Vlad Buslov authored Apr 12, 2019

Fix net reference counting in fl_change() and remove redundant call to
tcf_exts_get_net() from __fl_delete(). __fl_put() already tries to get net
before releasing exts and deallocating a filter, so this code caused flower
classifier to obtain net twice per filter that is being deleted.

Implementation of __fl_delete() called tcf_exts_get_net() to pass its
result as 'async' flag to fl_mask_put(). However, 'async' flag is redundant
and only complicates fl_mask_put() implementation. This functionality seems
to be copied from filter cleanup code, where it was added by Cong with
following explanation:

This patchset tries to fix the race between call_rcu() and
cleanup_net() again. Without holding the netns refcnt the
tc_action_net_exit() in netns workqueue could be called before
filter destroy works in tc filter workqueue. This patchset
moves the netns refcnt from tc actions to tcf_exts, without
breaking per-netns tc actions.

This doesn't apply to flower mask, which doesn't call any tc action code
during cleanup. Simplify fl_mask_put() by removing the flag parameter and
always use tcf_queue_work() to free mask objects.

Fixes: 06177558 ("net: sched: flower: introduce reference counting for filters")
Fixes: 1f17f774 ("net: sched: flower: insert filter to ht before offloading it to hw")
Fixes: 05cd271f ("cls_flower: Support multiple masks per priority")
Reported-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9994677c

selftests: Add debugging options to pmtu.sh · 56490b62

David Ahern authored Apr 11, 2019

pmtu.sh script runs a number of tests and dumps a summary of pass/fail.
If a test fails, it is near impossible to debug why. For example:

    TEST: ipv6: PMTU exceptions                       [FAIL]

There are a lot of commands run behind the scenes for this test. Which
one is failing?

Add a VERBOSE option to show commands that are run and any output from
those commands. Add a PAUSE_ON_FAIL option to halt the script if a test
fails allowing users to poke around with the setup in the failed state.

In the process, rename tracing to TRACING and move declaration to top
with the new variables.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

56490b62

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · bb23581b

David S. Miller authored Apr 11, 2019

Daniel Borkmann says:

====================
pull-request: bpf-next 2019-04-12

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Improve BPF verifier scalability for large programs through two
   optimizations: i) remove verifier states that are not useful in pruning,
   ii) stop walking parentage chain once first LIVE_READ is seen. Combined
   gives approx 20x speedup. Increase limits for accepting large programs
   under root, and add various stress tests, from Alexei.

2) Implement global data support in BPF. This enables static global variables
   for .data, .rodata and .bss sections to be properly handled which allows
   for more natural program development. This also opens up the possibility
   to optimize program workflow by compiling ELFs only once and later only
   rewriting section data before reload, from Daniel and with test cases and
   libbpf refactoring from Joe.

3) Add config option to generate BTF type info for vmlinux as part of the
   kernel build process. DWARF debug info is converted via pahole to BTF.
   Latter relies on libbpf and makes use of BTF deduplication algorithm which
   results in 100x savings compared to DWARF data. Resulting .BTF section is
   typically about 2MB in size, from Andrii.

4) Add BPF verifier support for stack access with variable offset from
   helpers and add various test cases along with it, from Andrey.

5) Extend bpf_skb_adjust_room() growth BPF helper to mark inner MAC header
   so that L2 encapsulation can be used for tc tunnels, from Alan.

6) Add support for input __sk_buff context in BPF_PROG_TEST_RUN so that
   users can define a subset of allowed __sk_buff fields that get fed into
   the test program, from Stanislav.

7) Add bpf fs multi-dimensional array tests for BTF test suite and fix up
   various UBSAN warnings in bpftool, from Yonghong.

8) Generate a pkg-config file for libbpf, from Luca.

9) Dump program's BTF id in bpftool, from Prashant.

10) libbpf fix to use smaller BPF log buffer size for AF_XDP's XDP
    program, from Magnus.

11) kallsyms related fixes for the case when symbols are not present in
    BPF selftests and samples, from Daniel
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

bb23581b

11 Apr, 2019 2 commits

bpf: explicitly prohibit ctx_{in, out} in non-skb BPF_PROG_TEST_RUN · 947e8b59

Stanislav Fomichev authored Apr 11, 2019

This should allow us later to extend BPF_PROG_TEST_RUN for non-skb case
and be sure that nobody is erroneously setting ctx_{in,out}.

Fixes: b0b9395d ("bpf: support input __sk_buff context in BPF_PROG_TEST_RUN")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

947e8b59

tools: add smp_* barrier variants to include infrastructure · 6b7a2114

Daniel Borkmann authored Apr 09, 2019

Add the definition for smp_rmb(), smp_wmb(), and smp_mb() to the
tools include infrastructure: this patch adds the implementation
for x86-64 and arm64, and have it fall back as currently is for
other archs which do not have it implemented at this point. The
x86-64 one uses lock + add combination for smp_mb() with address
below red zone.

This is on top of 09d62154 ("tools, perf: add and use optimized
ring_buffer_{read_head, write_tail} helpers"), which didn't touch
smp_* barrier implementations. Magnus recently rightfully reported
however that the latter on x86-64 still wrongly falls back to sfence,
lfence and mfence respectively, thus fix that for applications under
tools making use of these to avoid such ugly surprises. The main
header under tools (include/asm/barrier.h) will in that case not
select the fallback implementation.
Reported-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

6b7a2114