Commits · ac088a88b5d544b7b82f00214b1588b3c88a7fc6 · Kirill Smelkov / linux

28 Jan, 2019 4 commits

netfilter: conntrack: fix error path in nf_conntrack_pernet_init() · ac088a88

Cong Wang authored Jan 23, 2019

When nf_ct_netns_get() fails, it should clean up itself,
its caller doesn't need to call nf_conntrack_fini_net().

nf_conntrack_init_net() is called after registering sysctl
and proc, so its cleanup function should be called before
unregistering sysctl and proc.

Fixes: ba3fbe66 ("netfilter: nf_conntrack: provide modparam to always register conntrack hooks")
Fixes: b884fa46 ("netfilter: conntrack: unify sysctl handling")
Reported-and-tested-by: syzbot+fcee88b2d87f0539dfe9@syzkaller.appspotmail.com
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ac088a88

netfilter: nft_counter: remove wrong __percpu of nft_counter_resest()'s arg · dd03b1ad

Luc Van Oostenryck authored Jan 19, 2019

nft_counter_rest() has its first argument declared as
	struct nft_counter_percpu_priv __percpu *priv
but this structure is not percpu (it only countains
a member 'counter' which is, correctly, a pointer to a
percpu struct nft_counter).

So, remove the '__percpu' from the argument's declaration.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

dd03b1ad

ipvs: use indirect call wrappers · 6ecd7548

Matteo Croce authored Jan 19, 2019

Use the new indirect call wrappers in IPVS when calling the TCP or UDP
protocol specific functions.
This avoids an indirect calls in IPVS, and reduces the performance
impact of the Spectre mitigation.
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

6ecd7548

ipvs: avoid indirect calls when calculating checksums · fe19a8fe

Matteo Croce authored Jan 19, 2019

The function pointer ip_vs_protocol->csum_check is only used in protocol
specific code, and never in the generic one.
Remove the function pointer from struct ip_vs_protocol and call the
checksum functions directly.
This reduces the performance impact of the Spectre mitigation, and
should give a small improvement even with RETPOLINES disabled.
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

fe19a8fe

22 Jan, 2019 2 commits

netfilter: conntrack: fix bogus port values for other l4 protocols · e2f7cc72

Florian Westphal authored Jan 21, 2019

We must only extract l4 proto information if we can track the layer 4
protocol.

Before removal of pkt_to_tuple callback, the code to extract port
information was only reached for TCP/UDP/LITE/DCCP/SCTP.

The other protocols were handled by the indirect call, and the
'generic' tracker took care of other protocols that have no notion
of 'ports'.

After removal of the callback we must be more strict here and only
init port numbers for those protocols that have ports.

Fixes: df5e1629 ("netfilter: conntrack: remove pkt_to_tuple callback")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

e2f7cc72

netfilter: conntrack: fix IPV6=n builds · 81e01647

Florian Westphal authored Jan 21, 2019

Stephen Rothwell reports:
 After merging the netfilter-next tree, today's linux-next build
 (powerpc ppc64_defconfig) failed like this:

 ERROR: "nf_conntrack_invert_icmpv6_tuple" [nf_conntrack.ko] undefined!
 ERROR: "nf_conntrack_icmpv6_packet" [nf_conntrack.ko] undefined!
 ERROR: "nf_conntrack_icmpv6_init_net" [nf_conntrack.ko] undefined!
 ERROR: "icmpv6_pkt_to_tuple" [nf_conntrack.ko] undefined!
 ERROR: "nf_ct_gre_keymap_destroy" [nf_conntrack.ko] undefined!

icmpv6 related errors are due to lack of IS_ENABLED(CONFIG_IPV6) (no
icmpv6 support is builtin if kernel has CONFIG_IPV6=n), the
nf_ct_gre_keymap_destroy error is due to lack of PROTO_GRE check.

Fixes: a47c5404 ("netfilter: conntrack: handle builtin l4proto packet functions via direct calls")
Fixes: e2e48b47 ("netfilter: conntrack: handle icmp pkt_to_tuple helper via direct calls")
Fixes: 197c4300 ("netfilter: conntrack: remove invert_tuple callback")
Fixes: 2a389de8 ("netfilter: conntrack: remove l4proto init and get_net callbacks")
Fixes: e5689435 ("netfilter: conntrack: remove l4proto destroy hook")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

81e01647

18 Jan, 2019 34 commits

Revert "netfilter: nft_hash: add map lookups for hashing operations" · 0123a75e

Laura Garcia Liebana authored Jan 18, 2019

A better way to implement this from userspace has been found without
specific code in the kernel side, revert this.

Fixes: b9ccc07e ("netfilter: nft_hash: add map lookups for hashing operations")
Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

0123a75e

netfilter: nat: un-export nf_nat_used_tuple · 472caa69

Florian Westphal authored Jan 17, 2019

Not used since 203f2e78 ("netfilter: nat: remove l4proto->unique_tuple")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

472caa69

netfilter: nft_meta: Add NFT_META_I/OIFKIND meta type · 0fb4d219

wenxu authored Jan 16, 2019

In the ip_rcv the skb goes through the PREROUTING hook first, then kicks
in vrf device and go through the same hook again. When conntrack dnat
works with vrf, there will be some conflict with rules because the
packet goes through the hook twice with different nf status.

ip link add user1 type vrf table 1
ip link add user2 type vrf table 2
ip l set dev tun1 master user1
ip l set dev tun2 master user2

nft add table firewall
nft add chain firewall zones { type filter hook prerouting  priority - 300 \; }
nft add rule firewall zones counter ct zone set iif map { "tun1" : 1, "tun2" : 2 }
nft add chain firewall rule-1000-ingress
nft add rule firewall rule-1000-ingress ct zone 1 tcp dport 22 ct state new counter accept
nft add rule firewall rule-1000-ingress counter drop
nft add chain firewall rule-1000-egress
nft add rule firewall rule-1000-egress tcp dport 22 ct state new counter drop
nft add rule firewall rule-1000-egress counter accept

nft add chain firewall rules-all { type filter hook prerouting priority - 150 \; }
nft add rule firewall rules-all ip daddr vmap { "2.2.2.11" : jump rule-1000-ingress }
nft add rule firewall rules-all ct zone vmap { 1 : jump rule-1000-egress }

nft add rule firewall dnat-all ct zone vmap { 1 : jump dnat-1000 }
nft add rule firewall dnat-1000 ip daddr 2.2.2.11 counter dnat to 10.0.0.7

For a package with ip daddr 2.2.2.11 and tcp dport 22, first time accept in the
rule-1000-ingress and dnat to 10.0.0.7. Then second time the packet goto the wrong
chain rule-1000-egress which leads the packet drop

With this patch, userspace can add the 'don't re-do entire ruleset for
vrf' policy itself via:

nft add rule firewall rules-all meta iifkind "vrf" counter accept
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

0fb4d219

netfilter: nf_conntrack: provide modparam to always register conntrack hooks · ba3fbe66

Pablo Neira Ayuso authored Jan 16, 2019

The connection tracking hooks can be optionally registered per netns
when conntrack is specifically invoked from the ruleset since
0c66dc1e ("netfilter: conntrack: register hooks in netns when needed
by ruleset"). Then, since 4d3a57f2 ("netfilter: conntrack: do not
enable connection tracking unless needed"), the default behaviour is
changed to always register them on demand.

This patch provides a toggle that allows users to always register them.
Without this toggle, in order to use conntrack for statistics
collection, you need a dummy rule that refers to conntrack, eg.

        iptables -I INPUT -m state --state NEW

This patch allows users to restore the original behaviour via modparam,
ie. always register connection tracking, eg.

        modprobe nf_conntrack enable_hooks=1

Hence, no dummy rule is required.
Reported-by: Laura Garcia <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ba3fbe66

netfilter: conntrack: remove nf_ct_l4proto_find_get · 4a60dc74

Florian Westphal authored Jan 15, 2019

Its now same as __nf_ct_l4proto_find(), so rename that to
nf_ct_l4proto_find and use it everywhere.

It never returns NULL and doesn't need locks or reference counts.

Before this series:
302824  net/netfilter/nf_conntrack.ko
 21504  net/netfilter/nf_conntrack_proto_gre.ko

  text	   data	    bss	    dec	    hex	filename
  6281	   1732	      4	   8017	   1f51	nf_conntrack_proto_gre.ko
108356	  20613	    236	 129205	  1f8b5	nf_conntrack.ko

After:
294864  net/netfilter/nf_conntrack.ko
  text	   data	    bss	    dec	    hex	filename
106979	  19557	    240	 126776	  1ef38	nf_conntrack.ko

so, even with builtin gre, total size got reduced.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

4a60dc74

netfilter: conntrack: remove l4proto destroy hook · e5689435

Florian Westphal authored Jan 15, 2019

Only one user (gre), add a direct call and remove this facility.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

e5689435

netfilter: conntrack: remove l4proto init and get_net callbacks · 2a389de8

Florian Westphal authored Jan 15, 2019

Those were needed we still had modular trackers.
As we don't have those anymore, prefer direct calls and remove all
the (un)register infrastructure associated with this.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

2a389de8

netfilter: conntrack: remove sysctl registration helpers · 70aed464

Florian Westphal authored Jan 15, 2019

After previous patch these are not used anymore.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

70aed464

netfilter: conntrack: unify sysctl handling · b884fa46

Florian Westphal authored Jan 15, 2019

Due to historical reasons, all l4 trackers register their own
sysctls.

This leads to copy&pasted boilerplate code, that does exactly same
thing, just with different data structure.

Place all of this in a single file.

This allows to remove the various ctl_table pointers from the ct_netns
structure and reduces overall code size.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

b884fa46

netfilter: conntrack: avoid unneeded nf_conntrack_l4proto lookups · 303e0c55

Florian Westphal authored Jan 15, 2019

after removal of the packet and invert function pointers, several
places do not need to lookup the l4proto structure anymore.

Remove those lookups.
The function nf_ct_invert_tuplepr becomes redundant, replace
it with nf_ct_invert_tuple everywhere.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

303e0c55

netfilter: conntrack: remove pernet l4 proto register interface · edf0338d

Florian Westphal authored Jan 15, 2019

No used anymore.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

edf0338d

netfilter: conntrack: remove remaining l4proto indirect packet calls · 44fb87f6

Florian Westphal authored Jan 15, 2019

Now that all l4trackers are builtin, no need to use a mix of direct and
indirect calls.
This removes the last two users: gre and the generic l4 protocol
tracker.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

44fb87f6

netfilter: conntrack: remove module owner field · b184356d

Florian Westphal authored Jan 15, 2019

No need to get/put module owner reference, none of these can be removed
anymore.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

b184356d

netfilter: conntrack: remove invert_tuple callback · 197c4300

Florian Westphal authored Jan 15, 2019

Only used by icmp(v6).  Prefer a direct call and remove this
function from the l4proto struct.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

197c4300

netfilter: conntrack: remove pkt_to_tuple callback · df5e1629

Florian Westphal authored Jan 15, 2019

GRE is now builtin, so we can handle it via direct call and
remove the callback.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

df5e1629

netfilter: conntrack: remove net_id · 751fc301

Florian Westphal authored Jan 15, 2019

No users anymore.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

751fc301

netfilter: conntrack: gre: switch module to be built-in · 22fc4c4c

Florian Westphal authored Jan 15, 2019

This makes the last of the modular l4 trackers 'bool'.

After this, all infrastructure to handle dynamic l4 protocol registration
becomes obsolete and can be removed in followup patches.

Old:
302824 net/netfilter/nf_conntrack.ko
 21504 net/netfilter/nf_conntrack_proto_gre.ko

New:
313728 net/netfilter/nf_conntrack.ko

Old:
   text	   data	    bss	    dec	    hex	filename
   6281	   1732	      4	   8017	   1f51	nf_conntrack_proto_gre.ko
 108356	  20613	    236	 129205	  1f8b5	nf_conntrack.ko
New:
 112095	  21381	    240	 133716	  20a54	nf_conntrack.ko

The size increase is only temporary.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

22fc4c4c

netfilter: conntrack: gre: convert rwlock to rcu · 202e651c

Florian Westphal authored Jan 15, 2019

We can use gre.  Lock is only needed when a new expectation is added.

In case a single spinlock proves to be problematic we can either add one
per netns or use an array of locks combined with net_hash_mix() or similar
to pick the 'correct' one.

But given this is only needed for an expectation rather than per packet
a single one should be ok.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

202e651c

netfilter: conntrack: handle icmp pkt_to_tuple helper via direct calls · e2e48b47

Florian Westphal authored Jan 15, 2019

rather than handling them via indirect call, use a direct one instead.
This leaves GRE as the last user of this indirect call facility.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

e2e48b47

netfilter: conntrack: handle builtin l4proto packet functions via direct calls · a47c5404

Florian Westphal authored Jan 15, 2019

The l4 protocol trackers are invoked via indirect call: l4proto->packet().

With one exception (gre), all l4trackers are builtin, so we can make
.packet optional and use a direct call for most protocols.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

a47c5404

netfilter: nf_tables: Support RULE_ID reference in new rule · 75dd48e2

Phil Sutter authored Jan 14, 2019

To allow for a batch to contain rules in arbitrary ordering, introduce
NFTA_RULE_POSITION_ID attribute which works just like NFTA_RULE_POSITION
but contains the ID of another rule within the same batch. This helps
iptables-nft-restore handling dumps with mixed insert/append commands
correctly.

Note that NFTA_RULE_POSITION takes precedence over
NFTA_RULE_POSITION_ID, so if the former is present, the latter is
ignored.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

75dd48e2

netfilter: physdev: relax br_netfilter dependency · 8e2f311a

Florian Westphal authored Jan 11, 2019

Following command:
  iptables -D FORWARD -m physdev ...
causes connectivity loss in some setups.

Reason is that iptables userspace will probe kernel for the module revision
of the physdev patch, and physdev has an artificial dependency on
br_netfilter (xt_physdev use makes no sense unless a br_netfilter module
is loaded).

This causes the "phydev" module to be loaded, which in turn enables the
"call-iptables" infrastructure.

bridged packets might then get dropped by the iptables ruleset.

The better fix would be to change the "call-iptables" defaults to 0 and
enforce explicit setting to 1, but that breaks backwards compatibility.

This does the next best thing: add a request_module call to checkentry.
This was a stray '-D ... -m physdev' won't activate br_netfilter
anymore.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

8e2f311a

netfilter: conntrack: remove helper hook again · 827318fe

Florian Westphal authored Jan 09, 2019

place them into the confirm one.

Old:
 hook (300): ipv4/6_help() first call helper, then seqadj.
 hook (INT_MAX): confirm

Now:
 hook (INT_MAX): confirm, first call helper, then seqadj, then confirm

Not having the extra call is noticeable in bechmarks.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

827318fe

netfilter: nf_tables: add direct calls for all builtin expressions · 10870dd8

Florian Westphal authored Jan 08, 2019

With CONFIG_RETPOLINE its faster to add an if (ptr == &foo_func)
check and and use direct calls for all the built-in expressions.

~15% improvement in pathological cases.

checkpatch doesn't like the X macro due to the embedded return statement,
but the macro has a very limited scope so I don't think its a problem.

I would like to avoid bugs of the form
  If (e->ops->eval == (unsigned long)nft_foo_eval)
	 nft_bar_eval();

and open-coded if ()/else if()/else cascade, thus the macro.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

10870dd8

netfilter: nf_tables: handle nft_object lookups via rhltable · 4d44175a

Florian Westphal authored Jan 08, 2019

Instead of linear search, use rhlist interface to look up the objects.
This fixes rulesets with thousands of named objects (quota, counters and
the like).

We only use a single table for this and consider the address of the
table we're doing the lookup in as a part of the key.

This reduces restore time of a sample ruleset with ~20k named counters
from 37 seconds to 0.8 seconds.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

4d44175a

netfilter: nf_tables: prepare nft_object for lookups via hashtable · d152159b

Florian Westphal authored Jan 08, 2019

Add a 'key' structure for object, so we can look them up by name + table
combination (the name can be the same in each table).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

d152159b

Merge branch 'tcp_openreq_child' · 435f3f26

David S. Miller authored Jan 17, 2019

Eric Dumazet says:

====================
tcp: remove code from tcp_create_openreq_child()

tcp_create_openreq_child() is essentially cloning a listener, then
must initialize some fields that can not be inherited.

Listeners are either fresh sockets, or sockets that came through
tcp_disconnect() after a session that dirtied many fields.

By moving code to tcp_disconnect(), we can shorten time taken
to create a clone, since tcp_disconnect() operation is very
unlikely.
====================
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

435f3f26

tcp: move rx_opt & syn_data_acked init to tcp_disconnect() · 6bcdc40d

Eric Dumazet authored Jan 17, 2019

If we make sure all listeners have these fields cleared, then a clone
will also inherit zero values.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6bcdc40d

tcp: move tp->rack init to tcp_disconnect() · 792c4354

Eric Dumazet authored Jan 17, 2019

If we make sure all listeners have proper tp->rack value,
then a clone will also inherit proper initial value.

Note that fresh sockets init tp->rack from tcp_init_sock()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

792c4354

tcp: move app_limited init to tcp_disconnect() · 6cda8b74

Eric Dumazet authored Jan 17, 2019

If we make sure all listeners have app_limited set to ~0U,
then a clone will also inherit proper initial value.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6cda8b74

tcp: move retrans_out, sacked_out, tlp_high_seq, last_oow_ack_time init to tcp_disconnect() · 5c701549

Eric Dumazet authored Jan 17, 2019

If we make sure all listeners have these fields cleared, then a clone
will also inherit zero values.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5c701549

tcp: do not clear urg_data in tcp_create_openreq_child · 5d836764

Eric Dumazet authored Jan 17, 2019

All listeners have this field cleared already, since tcp_disconnect()
clears it and newly created sockets have also a zero value here.

So a clone will inherit a zero value here.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5d836764

tcp: move snd_cwnd & snd_cwnd_cnt init to tcp_disconnect() · 3a9a57f6

Eric Dumazet authored Jan 17, 2019

Passive connections can inherit proper value by cloning,
if we make sure all listeners have the proper values there.

tcp_disconnect() was setting snd_cwnd to 2, which seems
quite obsolete since IW10 adoption.

Also remove an obsolete comment.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3a9a57f6

tcp: move mdev_us init to tcp_disconnect() · b9e2e689

Eric Dumazet authored Jan 17, 2019

If we make sure a listener always has its mdev_us
field set to TCP_TIMEOUT_INIT, we do not need to rewrite
this field after a new clone is created.

tcp_disconnect() is very seldom used in real applications.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b9e2e689