Commits · 8eeef2350453aa012d846457eb6ecd012a35d99b · Kirill Smelkov / linux

01 May, 2017 6 commits

netfilter: nf_ct_ext: invoke destroy even when ext is not attached · 8eeef235

Liping Zhang authored Apr 29, 2017

For NF_NAT_MANIP_SRC, we will insert the ct to the nat_bysource_table,
then remove it from the nat_bysource_table via nat_extend->destroy.

But now, the nat extension is attached on demand, so if the nat extension
is not attached, we will not be notified when the ct is destroyed, i.e.
we may fail to remove ct from the nat_bysource_table.

So just keep it simple, even if the extension is not attached, we will
still invoke the related ext->destroy. And this will also preserve the
flexibility for the future extension.

Fixes: 9a08ecfe ("netfilter: don't attach a nat extension by default")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

8eeef235

Merge tag 'ipvs3-for-v4.12' of http://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next · d1908ca8

Pablo Neira Ayuso authored May 01, 2017

Simon Horman says:

====================
Third Round of IPVS Updates for v4.12

please consider these enhancements to IPVS for v4.12.
If it is too late for v4.12 then please consider them for v4.13.

* Remove unused function
* Correct comparison of unsigned value
====================
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

d1908ca8

netfilter: snmp: avoid stack size warning · 0e72f55f

Florian Westphal authored Apr 27, 2017

net/ipv4/netfilter/nf_nat_snmp_basic.c:1158:1: warning: the frame size
of 1160 bytes is larger than 1024 bytes
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

0e72f55f

netfilter: nf_queue: only call synchronize_net twice if nf_queue is active · 039b40ee

Florian Westphal authored Apr 24, 2017

nf_unregister_net_hook(s) can avoid a second call to synchronize_net,
provided there is no nfqueue active in that net namespace (which is
the common case).

This also gets rid of the extra arg to nf_queue_nf_hook_drop(), normally
this gets called during netns cleanup so no packets should be queued.

For the rare case of base chain being unregistered or module removal
while nfqueue is in use the extra hiccup due to the packet drops isn't
a big deal.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

039b40ee

netfilter: nf_log: don't call synchronize_rcu in nf_log_unset · c83fa196

Florian Westphal authored Apr 25, 2017

nf_log_unregister() (which is what gets called in the logger backends
module exit paths) does a (required, module is removed) synchronize_rcu().

But nf_log_unset() is only called from pernet exit handlers. It doesn't
free any memory so there appears to be no need to call synchronize_rcu.

v2: Liping Zhang points out that nf_log_unregister() needs to be called
after pernet unregister, else rmmod would become unsafe.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

c83fa196

netfilter: batch synchronize_net calls during hook unregister · 933bd83e

Florian Westphal authored Apr 24, 2017

synchronize_net is expensive and slows down netns cleanup a lot.

We have two APIs to unregister a hook:
nf_unregister_net_hook (which calls synchronize_net())
and
nf_unregister_net_hooks (calls nf_unregister_net_hook in a loop)

Make nf_unregister_net_hook a wapper around new helper
__nf_unregister_net_hook, which unlinks the hook but does not free it.

Then, we can call that helper in nf_unregister_net_hooks and then
call synchronize_net() only once.

Andrey Konovalov reports this change improves syzkaller fuzzing speed at
least twice.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

933bd83e

28 Apr, 2017 2 commits

ipvs: change comparison on sync_refresh_period · fb90e8de

Aaron Conole authored Apr 12, 2017

The sync_refresh_period variable is unsigned, so it can never be < 0.
Signed-off-by: Aaron Conole <aconole@bytheb.org>
Signed-off-by: Simon Horman <horms@verge.net.au>

fb90e8de

ipvs: remove unused function ip_vs_set_state_timeout · 65ba101e

Aaron Conole authored Apr 10, 2017

There are no in-tree callers of this function and it isn't exported.
Signed-off-by: Aaron Conole <aconole@bytheb.org>
Signed-off-by: Simon Horman <horms@verge.net.au>

65ba101e

26 Apr, 2017 11 commits

netfilter: don't attach a nat extension by default · 9a08ecfe

Florian Westphal authored Apr 20, 2017

nowadays the NAT extension only stores the interface index
(used to purge connections that got masqueraded when interface goes down)
and pptp nat information.

Previous patches moved nf_ct_nat_ext_add to those places that need it.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

9a08ecfe

netfilter: pptp: attach nat extension when needed · 2fe7c321

Florian Westphal authored Apr 20, 2017

make sure nat extension gets added if the master conntrack is subject to
NAT. This will be required once the nat core stops adding it by default.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

2fe7c321

netfilter: masquerade: attach nat extension if not present · ff459018

Florian Westphal authored Apr 20, 2017

Currently the nat extension is always attached as soon as nat module is
loaded.  However, most NAT uses do not need the nat extension anymore.

Prepare to remove the add-nat-by-default by making those places that need
it attach it if its not present yet.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ff459018

netfilter: conntrack: handle initial extension alloc via krealloc · 22d4536d

Florian Westphal authored Apr 20, 2017

krealloc(NULL, ..) is same as kmalloc(), so we can avoid special-casing
the initial allocation after the prealloc removal (we had to use
->alloc_len as the initial allocation size).

This also means we do not zero the preallocated memory anymore; only
offsets[].  Existing code makes sure the new (used) extension space gets
zeroed out.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

22d4536d

netfilter: conntrack: mark extension structs as const · 23f671a1
Florian Westphal authored Apr 20, 2017
```
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
```
23f671a1

netfilter: conntrack: remove prealloc support · 54044b1f

Florian Westphal authored Apr 20, 2017

It was used by the nat extension, but since commit
7c966435 ("netfilter: move nat hlist_head to nf_conn") its only needed
for connections that use MASQUERADE target or a nat helper.

Also it seems a lot easier to preallocate a fixed size instead.

With default settings, conntrack first adds ecache extension (sysctl
defaults to 1), so we get 40(ct extension header) + 24 (ecache) == 64 byte
on x86_64 for initial allocation.

Followup patches can constify the extension structs and avoid
the initial zeroing of the entire extension area.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

54044b1f

netfilter: SYNPROXY: Return NF_STOLEN instead of NF_DROP during handshaking · 495dcb56

Gao Feng authored Apr 20, 2017

Current SYNPROXY codes return NF_DROP during normal TCP handshaking,
it is not friendly to caller. Because the nf_hook_slow would treat
the NF_DROP as an error, and return -EPERM.
As a result, it may cause the top caller think it meets one error.

For example, the following codes are from cfv_rx_poll()
	err = netif_receive_skb(skb);
	if (unlikely(err)) {
		++cfv->ndev->stats.rx_dropped;
	} else {
		++cfv->ndev->stats.rx_packets;
		cfv->ndev->stats.rx_bytes += skb_len;
	}
When SYNPROXY returns NF_DROP, then netif_receive_skb returns -EPERM.
As a result, the cfv driver would treat it as an error, and increase
the rx_dropped counter.

So use NF_STOLEN instead of NF_DROP now because there is no error
happened indeed, and free the skb directly.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

495dcb56

ebtables: remove nf_hook_register usage · aee12a0a

Florian Westphal authored Apr 20, 2017

Similar to ip_register_table, pass nf_hook_ops to ebt_register_table().
This allows to handle hook registration also via pernet_ops and allows
us to avoid use of legacy register_hook api.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

aee12a0a

netfilter: decnet: only register hooks in init namespace · 1a0ed0ad

Florian Westphal authored Apr 20, 2017

looks like decnet isn't namespacified in first place, so restrict hook
registration to the initial namespace.

Prepares for eventual removal of legacy nf_register_hook() api.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

1a0ed0ad

ipvs: convert to use pernet nf_hook api · efe41606

Florian Westphal authored Apr 19, 2017

nf_(un)register_hooks has to maintain an internal hook list to add/remove
those hooks from net namespaces as they are added/deleted.

ipvs already uses pernet_ops, so we can switch to the (more recent)
pernet hook api instead.

Compile tested only.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

efe41606

netfilter: synproxy: only register hooks when needed · 1fefe147

Florian Westphal authored Apr 19, 2017

Defer registration of the synproxy hooks until the first SYNPROXY rule is
added.  Also means we only register hooks in namespaces that need it.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

1fefe147

19 Apr, 2017 11 commits

netfilter: tcp: Use TCP_MAX_WSCALE instead of literal 14 · 122868b3

Gao Feng authored Apr 19, 2017

The window scale may be enlarged from 14 to 15 according to the itef
draft https://tools.ietf.org/html/draft-nishida-tcpm-maxwin-03.

Use the macro TCP_MAX_WSCALE to support it easily with TCP stack in
the future.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

122868b3

netfilter: ipvs: fix incorrect conflict resolution · be7be6e1

Florian Westphal authored Apr 18, 2017

The commit ab8bc7ed
("netfilter: remove nf_ct_is_untracked")
changed the line
   if (ct && !nf_ct_is_untracked(ct) && nfct_nat(ct)) {
	   to
   if (ct && nfct_nat(ct)) {

meanwhile, the commit 41390895
("netfilter: ipvs: don't check for presence of nat extension")
from ipvs-next had changed the same line to

  if (ct && !nf_ct_is_untracked(ct) && (ct->status & IPS_NAT_MASK)) {

When ipvs-next got merged into nf-next, the merge resolution took
the first version, dropping the conversion of nfct_nat().

While this doesn't cause a problem at the moment, it will once we stop
adding the nat extension by default.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

be7be6e1

nefilter: eache: reduce struct size from 32 to 24 byte · 01026ede

Florian Westphal authored Apr 18, 2017

Only "cache" needs to use ulong (its used with set_bit()), missed can use
u16.  Also add build-time assertion to ensure event bits fit.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

01026ede

netfilter: allow early drop of assured conntracks · c6dd940b

Florian Westphal authored Apr 16, 2017

If insertion of a new conntrack fails because the table is full, the kernel
searches the next buckets of the hash slot where the new connection
was supposed to be inserted at for an entry that hasn't seen traffic
in reply direction (non-assured), if it finds one, that entry is
is dropped and the new connection entry is allocated.

Allow the conntrack gc worker to also remove *assured* conntracks if
resources are low.

Do this by querying the l4 tracker, e.g. tcp connections are now dropped
if they are no longer established (e.g. in finwait).

This could be refined further, e.g. by adding 'soft' established timeout
(i.e., a timeout that is only used once we get close to resource
exhaustion).

Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

c6dd940b

netfilter: conntrack: use u8 for extension sizes again · b3a5db10

Florian Westphal authored Apr 16, 2017

commit 223b02d9
("netfilter: nf_conntrack: reserve two bytes for nf_ct_ext->len")
had to increase size of the extension offsets because total size of the
extensions had increased to a point where u8 did overflow.

3 years later we've managed to diet extensions a bit and we no longer
need u16.  Furthermore we can now add a compile-time assertion for this
problem.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

b3a5db10

netfilter: remove last traces of variable-sized extensions · faec865d

Florian Westphal authored Apr 16, 2017

get rid of the (now unused) nf_ct_ext_add_length define and also
rename the function to plain nf_ct_ext_add().
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

faec865d

netfilter: helpers: remove data_len usage for inkernel helpers · 9f0f3ebe

Florian Westphal authored Apr 16, 2017

No need to track this for inkernel helpers anymore as
NF_CT_HELPER_BUILD_BUG_ON checks do this now.

All inkernel helpers know what kind of structure they
stored in helper->data.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

9f0f3ebe

netfilter: nfnetlink_cthelper: reject too large userspace allocation requests · 157ffffe

Florian Westphal authored Apr 16, 2017

Userspace should not abuse the kernel to store large amounts of data,
reject requests larger than the private area can accommodate.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

157ffffe

netfilter: helper: add build-time asserts for helper data size · dcf67740

Florian Westphal authored Apr 16, 2017

add a 32 byte scratch area in the helper struct instead of relying
on variable sized helpers plus compile-time asserts to let us know
if 32 bytes aren't enough anymore.

Not having variable sized helpers will later allow to add BUILD_BUG_ON
for the total size of conntrack extensions -- the helper extension is
the only one that doesn't have a fixed size.

The (useless!) NF_CT_HELPER_BUILD_BUG_ON(0); are added so that in case
someone adds a new helper and copy-pastes from one that doesn't store
private data at least some indication that this macro should be used
somehow is there...
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

dcf67740

netfilter: conntrack: move helper struct to nf_conntrack_helper.h · 906535b0

Florian Westphal authored Apr 16, 2017

its definition is not needed in nf_conntrack.h.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

906535b0

netfilter: nft_ct: allow to set ctnetlink event types of a connection · 694a0055

Florian Westphal authored Apr 15, 2017

By default the kernel emits all ctnetlink events for a connection.
This allows to select the types of events to generate.

This can be used to e.g. only send DESTROY events but no NEW/UPDATE ones
and will work even if sysctl net.netfilter.nf_conntrack_events is set to 0.

This was already possible via iptables' CT target, but the nft version has
the advantage that it can also be used with already-established conntracks.

The added nf_ct_is_template() check isn't a bug fix as we only support
mark and labels (and unlike ecache the conntrack core doesn't copy those).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

694a0055

15 Apr, 2017 6 commits

netfilter: remove nf_ct_is_untracked · ab8bc7ed

Florian Westphal authored Apr 14, 2017

This function is now obsolete and always returns false.
This change has no effect on generated code.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ab8bc7ed

netfilter: kill the fake untracked conntrack objects · cc41c84b

Florian Westphal authored Apr 14, 2017

resurrect an old patch from Pablo Neira to remove the untracked objects.

Currently, there are four possible states of an skb wrt. conntrack.

1. No conntrack attached, ct is NULL.
2. Normal (kmem cache allocated) ct attached.
3. a template (kmalloc'd), not in any hash tables at any point in time
4. the 'untracked' conntrack, a percpu nf_conn object, tagged via
   IPS_UNTRACKED_BIT in ct->status.

Untracked is supposed to be identical to case 1.  It exists only
so users can check

-m conntrack --ctstate UNTRACKED vs.
-m conntrack --ctstate INVALID

e.g. attempts to set connmark on INVALID or UNTRACKED conntracks is
supposed to be a no-op.

Thus currently we need to check
 ct == NULL || nf_ct_is_untracked(ct)

in a lot of places in order to avoid altering untracked objects.

The other consequence of the percpu untracked object is that all
-j NOTRACK (and, later, kfree_skb of such skbs) result in an atomic op
(inc/dec the untracked conntracks refcount).

This adds a new kernel-private ctinfo state, IP_CT_UNTRACKED, to
make the distinction instead.

The (few) places that care about packet invalid (ct is NULL) vs.
packet untracked now need to test ct == NULL vs. ctinfo == IP_CT_UNTRACKED,
but all other places can omit the nf_ct_is_untracked() check.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

cc41c84b

netfilter: ecache: Refine the nf_ct_deliver_cached_events · 6e354a5e

Gao Feng authored Apr 13, 2017

1. Remove single !events condition check to deliver the missed event
even though there is no new event happened.

Consider this case:
1) nf_ct_deliver_cached_events is invoked at the first time, the
event is failed to deliver, then the missed is set.
2) nf_ct_deliver_cached_events is invoked again, but there is no
any new event happened.
The missed event is lost really.

It would try to send the missed event again after remove this check.
And it is ok if there is no missed event because the latter check
!((events | missed) & e->ctmask) could avoid it.

2. Correct the return value check of notify->fcn.
When send the event successfully, it returns 0, not postive value.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

6e354a5e

netfilter: nf_nat: Fix return NF_DROP in nfnetlink_parse_nat_setup · 7025bac4

Gao Feng authored Apr 12, 2017

The __nf_nat_alloc_null_binding invokes nf_nat_setup_info which may
return NF_DROP when memory is exhausted, so convert NF_DROP to -ENOMEM
to make ctnetlink happy. Or ctnetlink_setup_nat treats it as a success
when one error NF_DROP happens actully.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

7025bac4

Merge tag 'ipvs2-for-v4.12' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next · a702ece3

Pablo Neira Ayuso authored Apr 15, 2017

Simon Horman says:

====================
Second Round of IPVS Updates for v4.12

please consider these clean-ups and enhancements to IPVS for v4.12.

* Removal unused variable
* Use kzalloc where appropriate
* More efficient detection of presence of NAT extension
====================
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Conflicts:
	net/netfilter/ipvs/ip_vs_ftp.c

a702ece3

ipset: remove unused function __ip_set_get_netlink · db268d4d

Aaron Conole authored Apr 10, 2017

There are no in-tree callers.
Signed-off-by: Aaron Conole <aconole@bytheb.org>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

db268d4d

13 Apr, 2017 3 commits

netfilter: nf_conntrack: remove double assignment · 809c2d9a

Aaron Conole authored Apr 12, 2017

The protonet pointer will unconditionally be rewritten, so just do the
needed assignment first.
Signed-off-by: Aaron Conole <aconole@bytheb.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

809c2d9a

netfilter: nf_tables: remove double return statement · 79250568

Aaron Conole authored Apr 12, 2017

Signed-off-by: Aaron Conole <aconole@bytheb.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

79250568

netfilter: nat: remove rcu_read_lock in __nf_nat_decode_session. · 53890234

Taehee Yoo authored Mar 28, 2017

__nf_nat_decode_session is called from nf_nat_decode_session as decodefn.
before calling decodefn, it already set rcu_read_lock. so rcu_read_lock in
__nf_nat_decode_session can be removed.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

53890234

08 Apr, 2017 1 commit

netfilter: udplite: Remove duplicated udplite4/6 declaration · 4f139972

Gao Feng authored Apr 05, 2017

There are two nf_conntrack_l4proto_udp4 declarations in the head file
nf_conntrack_ipv4/6.h. Now remove one which is not enbraced by the macro
CONFIG_NF_CT_PROTO_UDPLITE.
Signed-off-by: Gao Feng <fgao@ikuai8.com>

4f139972