Commits · cb662ac6711f7135618526221498ebfae155531a · Kirill Smelkov / linux

23 Oct, 2019 4 commits

netfilter: nf_tables: increase maximum devices number per flowtable · cb662ac6

Pablo Neira Ayuso authored Oct 16, 2019

Rise the maximum limit of devices per flowtable up to 256. Rename
NFT_FLOWTABLE_DEVICE_MAX to NFT_NETDEVICE_MAX in preparation to reuse
the netdev hook parser for ingress basechain.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

cb662ac6

netfilter: nf_tables: allow netdevice to be used only once per flowtable · b75a3e83
Pablo Neira Ayuso authored Oct 16, 2019
```
Allow netdevice only once per flowtable, otherwise hit EEXIST.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
```
b75a3e83
netfilter: nf_tables: dynamically allocate hooks per net_device in flowtables · 3f0465a9
Pablo Neira Ayuso authored Oct 16, 2019
```
Use a list of hooks per device instead an array.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
```
3f0465a9

netfilter: nf_flow_table: move priority to struct nf_flowtable · 71a8a63b

Pablo Neira Ayuso authored Oct 16, 2019

Hardware offload needs access to the priority field, store this field in
the nf_flowtable object.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

71a8a63b

17 Oct, 2019 6 commits

netfilter: nft_tproxy: Fix typo in IPv6 module description. · 0a9b3385

Norman Rasmussen authored Oct 12, 2019

Signed-off-by: Norman Rasmussen <norman@rasmussen.co.za>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

0a9b3385

netfilter: add and use nf_hook_slow_list() · ca58fbe0

Florian Westphal authored Oct 11, 2019

At this time, NF_HOOK_LIST() macro will iterate the list and then calls
nf_hook() for each individual skb.

This makes it so the entire list is passed into the netfilter core.
The advantage is that we only need to fetch the rule blob once per list
instead of per-skb.

NF_HOOK_LIST now only works for ipv4 and ipv6, as those are the only
callers.

v2: use skb_list_del_init() instead of list_del (Edward Cree)
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ca58fbe0

netfilter: conntrack: free extension area immediately · 2ad9d774

Florian Westphal authored Oct 15, 2019

Instead of waiting for rcu grace period just free it directly.

This is safe because conntrack lookup doesn't consider extensions.

Other accesses happen while ct->ext can't be free'd, either because
a ct refcount was taken or because the conntrack hash bucket lock or
the dying list spinlock have been taken.

This allows to remove __krealloc in a followup patch, netfilter was the
only user.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

2ad9d774

netfilter: ctnetlink: don't dump ct extensions of unconfirmed conntracks · 49ca022b

Florian Westphal authored Oct 15, 2019

When dumping the unconfirmed lists, the cpu that is processing the ct
entry can reallocate ct->ext at any time.

Right now accessing the extensions from another CPU is ok provided
we're holding rcu read lock: extension reallocation does use rcu.

Once RCU isn't used anymore this becomes unsafe, so skip extensions for
the unconfirmed list.

Dumping the extension area for confirmed or dying conntracks is fine:
no reallocations are allowed and list iteration holds appropriate
locks that prevent ct (and this ct->ext) from getting free'd.

v2: fix compiler warnings due to misue of 'const' and missing return
    statement (kbuild robot).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

49ca022b

Merge tag 'ipvs-next-for-v5.5' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next · 5ccbf891

Pablo Neira Ayuso authored Oct 17, 2019

Pablo Neira Ayuso says:

====================
IPVS updates for v5.5

1) Two patches to speedup ipvs netns dismantle, from Haishuang Yan.

2) Three patches to add selftest script for ipvs, also from
   Haishuang Yan.

3) Simplify __ip_vs_get_out_rt() from zhang kai.
====================
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

5ccbf891

netfilter: ecache: document extension area access rules · 63f55acf

Florian Westphal authored Oct 13, 2019

Once ct->ext gets free'd via kfree() rather than kfree_rcu we can't
access the extension area anymore without owning the conntrack.

This is a special case:

The worker is walking the pcpu dying list while holding dying list lock:
Neither ct nor ct->ext can be free'd until after the walk has completed.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

63f55acf

11 Oct, 2019 3 commits

selftests: netfilter: add ipvs tunnel test case · 176a5204

Haishuang Yan authored Oct 10, 2019

Test virtual server via ipip tunnel.

Tested:
# selftests: netfilter: ipvs.sh
# Testing DR mode...
# Testing NAT mode...
# Testing Tunnel mode...
# ipvs.sh: PASS
ok 6 selftests: netfilter: ipvs.sh
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

176a5204

selftests: netfilter: add ipvs nat test case · 0ed15462

Haishuang Yan authored Oct 10, 2019

Test virtual server via NAT.

Tested:
# selftests: netfilter: ipvs.sh
# Testing DR mode...
# Testing NAT mode...
# ipvs.sh: PASS
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

0ed15462

selftests: netfilter: add ipvs test script · 867d2190

Haishuang Yan authored Oct 10, 2019

Test virutal server via directing routing for IPv4.

Tested:

# selftests: netfilter: ipvs.sh
# Testing DR mode...
# ipvs.sh: PASS
ok 6 selftests: netfilter: ipvs.sh
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

867d2190

08 Oct, 2019 3 commits

ipvs: batch __ip_vs_dev_cleanup · ac524481

Haishuang Yan authored Sep 27, 2019

It's better to batch __ip_vs_cleanup to speedup ipvs
devices dismantle.
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

ac524481

ipvs: batch __ip_vs_cleanup · 5d5a0815

Haishuang Yan authored Sep 27, 2019

It's better to batch __ip_vs_cleanup to speedup ipvs
connections dismantle.
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

5d5a0815

ipvs: no need to update skb route entry for local destination packets. · c09b8970

zhang kai authored Sep 30, 2019

In the end of function __ip_vs_get_out_rt/__ip_vs_get_out_rt_v6,the
'local' variable is always zero.
Signed-off-by: zhang kai <zhangkaiheb@126.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

c09b8970

07 Oct, 2019 7 commits

netfilter: ipset: move ip_set_get_ip_port() to ip_set_bitmap_port.c. · f8615bf8

Jeremy Sowden authored Oct 03, 2019

ip_set_get_ip_port() is only used in ip_set_bitmap_port.c.  Move it
there and make it static.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

f8615bf8

netfilter: ipset: move function to ip_set_bitmap_ip.c. · 3fbd6c45

Jeremy Sowden authored Oct 03, 2019

One inline function in ip_set_bitmap.h is only called in
ip_set_bitmap_ip.c: move it and remove inline function specifier.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

3fbd6c45

netfilter: ipset: make ip_set_put_flags extern. · 85639185

Jeremy Sowden authored Oct 03, 2019

ip_set_put_flags is rather large for a static inline function in a
header-file.  Move it to ip_set_core.c and export it.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

85639185

netfilter: ipset: move functions to ip_set_core.c. · 2398a976

Jeremy Sowden authored Oct 03, 2019

Several inline functions in ip_set.h are only called in ip_set_core.c:
move them and remove inline function specifier.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

2398a976

netfilter: ipset: move ip_set_comment functions from ip_set.h to ip_set_core.c. · 94177f6e

Jeremy Sowden authored Oct 03, 2019

Most of the functions are only called from within ip_set_core.c.

The exception is ip_set_init_comment.  However, this is too complex to
be a good candidate for a static inline function.  Move it to
ip_set_core.c, change its linkage to extern and export it, leaving a
declaration in ip_set.h.

ip_set_comment_free is only used as an extension destructor, so change
its prototype to match and drop cast.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

94177f6e

netfilter: ipset: remove inline from static functions in .c files. · 8dea982a

Jeremy Sowden authored Oct 03, 2019

The inline function-specifier should not be used for static functions
defined in .c files since it bloats the kernel.  Instead leave the
compiler to decide which functions to inline.

While a couple of the files affected (ip_set_*_gen.h) are technically
headers, they contain templates for generating the common parts of
particular set-types and so we treat them like .c files.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

8dea982a

netfilter: ipset: add a coding-style fix to ip_set_ext_destroy. · 017f77c0

Jeremy Sowden authored Oct 03, 2019

Use a local variable to hold comment in order to align the arguments of
ip_set_comment_free properly.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

017f77c0

05 Oct, 2019 17 commits

Merge branch 'create-netdevsim-instances-in-namespace' · fbe3d0c7

David S. Miller authored Oct 05, 2019

Jiri Pirko says:

====================
create netdevsim instances in namespace

Allow user to create netdevsim devlink and netdevice instances in a
network namespace according to the namespace where the user resides in.
Add a selftest to test this.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

fbe3d0c7

selftests: test creating netdevsim inside network namespace · c04d71b5

Jiri Pirko authored Oct 05, 2019

Add a test that creates netdevsim instance inside network namespace
and verifies that the related devlink instance and port netdevices
reside in the namespace.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c04d71b5

netdevsim: create devlink and netdev instances in namespace · 7b60027b

Jiri Pirko authored Oct 05, 2019

When user does create new netdevsim instance using sysfs bus file,
create the devlink instance and related netdev instance in the namespace
of the caller.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7b60027b

net: devlink: export devlink net setter · 8273fd84

Jiri Pirko authored Oct 05, 2019

For newly allocated devlink instance allow drivers to set net struct
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8273fd84

Merge branch 'net-tls-add-ctrl-path-tracing-and-statistics' · 128d23c3

David S. Miller authored Oct 05, 2019

Jakub Kicinski says:

====================
net/tls: add ctrl path tracing and statistics

This set adds trace events related to TLS offload and basic MIB stats
for TLS.

First patch contains the TLS offload related trace points. Those are
helpful in troubleshooting offload issues, especially around the
resync paths.

Second patch adds a tracepoint to the fastpath of device offload,
it's separated out in case there will be objections to adding
fast path tracepoints. Again, it's quite useful for debugging
offload issues.

Next four patches add MIB statistics. The statistics are implemented
as per-cpu per-netns counters. Since there are currently no fast path
statistics we could move to atomic variables. Per-CPU seem more common.

Most basic statistics are number of created and live sessions, broken
out to offloaded and non-offloaded. Users seem to like those a lot.

Next there is a statistic for decryption errors. These are primarily
useful for device offload debug, in normal deployments decryption
errors should not be common.

Last but not least a counter for device RX resync.
====================
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

128d23c3

net/tls: add TlsDeviceRxResync statistic · a4d26fdb

Jakub Kicinski authored Oct 04, 2019

Add a statistic for number of RX resyncs sent down to the NIC.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a4d26fdb

net/tls: add TlsDecryptError stat · 5c5ec668

Jakub Kicinski authored Oct 04, 2019

Add a statistic for TLS record decryption errors.

Since devices are supposed to pass records as-is when they
encounter errors this statistic will count bad records in
both pure software and inline crypto configurations.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5c5ec668

net/tls: add statistics for installed sessions · b32fd3cc

Jakub Kicinski authored Oct 04, 2019

Add SNMP stats for number of sockets with successfully
installed sessions.  Break them down to software and
hardware ones.  Note that if hardware offload fails
stack uses software implementation, and counts the
session appropriately.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b32fd3cc

net/tls: add skeleton of MIB statistics · d26b698d

Jakub Kicinski authored Oct 04, 2019

Add a skeleton structure for adding TLS statistics.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d26b698d

net/tls: add device decrypted trace point · 9ec1c6ac

Jakub Kicinski authored Oct 04, 2019

Add a tracepoint to the TLS offload's fast path. This tracepoint
can be used to track the decrypted and encrypted status of received
records. Records decrypted by the device should have decrypted set
to 1, records which have neither decrypted nor decrypted set are
partially decrypted, require re-encryption and therefore are most
expensive to deal with.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9ec1c6ac

net/tls: add tracing for device/offload events · 8538d29c

Jakub Kicinski authored Oct 04, 2019

Add tracing of device-related interaction to aid performance
analysis, especially around resync:

 tls:tls_device_offload_set
 tls:tls_device_rx_resync_send
 tls:tls_device_rx_resync_nh_schedule
 tls:tls_device_rx_resync_nh_delay
 tls:tls_device_tx_resync_req
 tls:tls_device_tx_resync_send
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8538d29c

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 6f4c930e
David S. Miller authored Oct 05, 2019

6f4c930e

Merge tag 'kbuild-fixes-v5.4' of... · 2d00aee2

Linus Torvalds authored Oct 05, 2019

Merge tag 'kbuild-fixes-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - remove unneeded ar-option and KBUILD_ARFLAGS

 - remove long-deprecated SUBDIRS

 - fix modpost to suppress false-positive warnings for UML builds

 - fix namespace.pl to handle relative paths to ${objtree}, ${srctree}

 - make setlocalversion work for /bin/sh

 - make header archive reproducible

 - fix some Makefiles and documents

* tag 'kbuild-fixes-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kheaders: make headers archive reproducible
  kbuild: update compile-test header list for v5.4-rc2
  kbuild: two minor updates for Documentation/kbuild/modules.rst
  scripts/setlocalversion: clear local variable to make it work for sh
  namespace: fix namespace.pl script to support relative paths
  video/logo: do not generate unneeded logo C files
  video/logo: remove unneeded *.o pattern from clean-files
  integrity: remove pointless subdir-$(CONFIG_...)
  integrity: remove unneeded, broken attempt to add -fshort-wchar
  modpost: fix static EXPORT_SYMBOL warnings for UML build
  kbuild: correct formatting of header in kbuild module docs
  kbuild: remove SUBDIRS support
  kbuild: remove ar-option and KBUILD_ARFLAGS

2d00aee2

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 126195c9

Linus Torvalds authored Oct 05, 2019

Pull SCSI fixes from James Bottomley:
 "Twelve patches mostly small but obvious fixes or cosmetic but small
  updates"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: qla2xxx: Fix Nport ID display value
  scsi: qla2xxx: Fix N2N link up fail
  scsi: qla2xxx: Fix N2N link reset
  scsi: qla2xxx: Optimize NPIV tear down process
  scsi: qla2xxx: Fix stale mem access on driver unload
  scsi: qla2xxx: Fix unbound sleep in fcport delete path.
  scsi: qla2xxx: Silence fwdump template message
  scsi: hisi_sas: Make three functions static
  scsi: megaraid: disable device when probe failed after enabled device
  scsi: storvsc: setup 1:1 mapping between hardware queue and CPU queue
  scsi: qedf: Remove always false 'tmp_prio < 0' statement
  scsi: ufs: skip shutdown if hba is not powered
  scsi: bnx2fc: Handle scope bits when array returns BUSY or TSF

126195c9

Merge branch 'readdir' (readdir speedup and sanity checking) · 4f11918a

Linus Torvalds authored Oct 05, 2019

This makes getdents() and getdents64() do sanity checking on the
pathname that it gives to user space.  And to mitigate the performance
impact of that, it first cleans up the way it does the user copying, so
that the code avoids doing the SMAP/PAN updates between each part of the
dirent structure write.

I really wanted to do this during the merge window, but didn't have
time.  The conversion of filldir to unsafe_put_user() is something I've
had around for years now in a private branch, but the extra pathname
checking finally made me clean it up to the point where it is mergable.

It's worth noting that the filename validity checking really should be a
bit smarter: it would be much better to delay the error reporting until
the end of the readdir, so that non-corrupted filenames are still
returned.  But that involves bigger changes, so let's see if anybody
actually hits the corrupt directory entry case before worrying about it
further.

* branch 'readdir':
  Make filldir[64]() verify the directory entry filename is valid
  Convert filldir[64]() from __put_user() to unsafe_put_user()

4f11918a

Make filldir[64]() verify the directory entry filename is valid · 8a23eb80

Linus Torvalds authored Oct 05, 2019

This has been discussed several times, and now filesystem people are
talking about doing it individually at the filesystem layer, so head
that off at the pass and just do it in getdents{64}().

This is partially based on a patch by Jann Horn, but checks for NUL
bytes as well, and somewhat simplified.

There's also commentary about how it might be better if invalid names
due to filesystem corruption don't cause an immediate failure, but only
an error at the end of the readdir(), so that people can still see the
filenames that are ok.

There's also been discussion about just how much POSIX strictly speaking
requires this since it's about filesystem corruption.  It's really more
"protect user space from bad behavior" as pointed out by Jann.  But
since Eric Biederman looked up the POSIX wording, here it is for context:

 "From readdir:

   The readdir() function shall return a pointer to a structure
   representing the directory entry at the current position in the
   directory stream specified by the argument dirp, and position the
   directory stream at the next entry. It shall return a null pointer
   upon reaching the end of the directory stream. The structure dirent
   defined in the <dirent.h> header describes a directory entry.

  From definitions:

   3.129 Directory Entry (or Link)

   An object that associates a filename with a file. Several directory
   entries can associate names with the same file.

  ...

   3.169 Filename

   A name consisting of 1 to {NAME_MAX} bytes used to name a file. The
   characters composing the name may be selected from the set of all
   character values excluding the slash character and the null byte. The
   filenames dot and dot-dot have special meaning. A filename is
   sometimes referred to as a 'pathname component'."

Note that I didn't bother adding the checks to any legacy interfaces
that nobody uses.

Also note that if this ends up being noticeable as a performance
regression, we can fix that to do a much more optimized model that
checks for both NUL and '/' at the same time one word at a time.

We haven't really tended to optimize 'memchr()', and it only checks for
one pattern at a time anyway, and we really _should_ check for NUL too
(but see the comment about "soft errors" in the code about why it
currently only checks for '/')

See the CONFIG_DCACHE_WORD_ACCESS case of hash_name() for how the name
lookup code looks for pathname terminating characters in parallel.

Link: https://lore.kernel.org/lkml/20190118161440.220134-2-jannh@google.com/
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jann Horn <jannh@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8a23eb80

Convert filldir[64]() from __put_user() to unsafe_put_user() · 9f79b78e

Linus Torvalds authored May 21, 2016

We really should avoid the "__{get,put}_user()" functions entirely,
because they can easily be mis-used and the original intent of being
used for simple direct user accesses no longer holds in a post-SMAP/PAN
world.

Manually optimizing away the user access range check makes no sense any
more, when the range check is generally much cheaper than the "enable
user accesses" code that the __{get,put}_user() functions still need.

So instead of __put_user(), use the unsafe_put_user() interface with
user_access_{begin,end}() that really does generate better code these
days, and which is generally a nicer interface. Under some loads, the
multiple user writes that filldir() does are actually quite noticeable.

This also makes the dirent name copy use unsafe_put_user() with a couple
of macros. We do not want to make function calls with SMAP/PAN
disabled, and the code this generates is quite good when the
architecture uses "asm goto" for unsafe_put_user() like x86 does.

Note that this doesn't bother with the legacy cases. Nobody should use
them anyway, so performance doesn't really matter there.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9f79b78e