Commits · 12f7a505331e6b2754684b509f2ac8f0011ce644 · nexedi / linux

16 Jun, 2012 7 commits

netfilter: add user-space connection tracking helper infrastructure · 12f7a505

Pablo Neira Ayuso authored May 13, 2012

There are good reasons to supports helpers in user-space instead:

* Rapid connection tracking helper development, as developing code
  in user-space is usually faster.

* Reliability: A buggy helper does not crash the kernel. Moreover,
  we can monitor the helper process and restart it in case of problems.

* Security: Avoid complex string matching and mangling in kernel-space
  running in privileged mode. Going further, we can even think about
  running user-space helpers as a non-root process.

* Extensibility: It allows the development of very specific helpers (most
  likely non-standard proprietary protocols) that are very likely not to be
  accepted for mainline inclusion in the form of kernel-space connection
  tracking helpers.

This patch adds the infrastructure to allow the implementation of
user-space conntrack helpers by means of the new nfnetlink subsystem
`nfnetlink_cthelper' and the existing queueing infrastructure
(nfnetlink_queue).

I had to add the new hook NF_IP6_PRI_CONNTRACK_HELPER to register
ipv[4|6]_helper which results from splitting ipv[4|6]_confirm into
two pieces. This change is required not to break NAT sequence
adjustment and conntrack confirmation for traffic that is enqueued
to our user-space conntrack helpers.

Basic operation, in a few steps:

1) Register user-space helper by means of `nfct':

 nfct helper add ftp inet tcp

 [ It must be a valid existing helper supported by conntrack-tools ]

2) Add rules to enable the FTP user-space helper which is
   used to track traffic going to TCP port 21.

For locally generated packets:

 iptables -I OUTPUT -t raw -p tcp --dport 21 -j CT --helper ftp

For non-locally generated packets:

 iptables -I PREROUTING -t raw -p tcp --dport 21 -j CT --helper ftp

3) Run the test conntrackd in helper mode (see example files under
   doc/helper/conntrackd.conf

 conntrackd

4) Generate FTP traffic going, if everything is OK, then conntrackd
   should create expectations (you can check that with `conntrack':

 conntrack -E expect

    [NEW] 301 proto=6 src=192.168.1.136 dst=130.89.148.12 sport=0 dport=54037 mask-src=255.255.255.255 mask-dst=255.255.255.255 sport=0 dport=65535 master-src=192.168.1.136 master-dst=130.89.148.12 sport=57127 dport=21 class=0 helper=ftp
[DESTROY] 301 proto=6 src=192.168.1.136 dst=130.89.148.12 sport=0 dport=54037 mask-src=255.255.255.255 mask-dst=255.255.255.255 sport=0 dport=65535 master-src=192.168.1.136 master-dst=130.89.148.12 sport=57127 dport=21 class=0 helper=ftp

This confirms that our test helper is receiving packets including the
conntrack information, and adding expectations in kernel-space.

The user-space helper can also store its private tracking information
in the conntrack structure in the kernel via the CTA_HELP_INFO. The
kernel will consider this a binary blob whose layout is unknown. This
information will be included in the information that is transfered
to user-space via glue code that integrates nfnetlink_queue and
ctnetlink.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

12f7a505

netfilter: ctnetlink: add CTA_HELP_INFO attribute · ae243bee

Pablo Neira Ayuso authored Jun 07, 2012

This attribute can be used to modify and to dump the internal
protocol information.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ae243bee

netfilter: nfnetlink_queue: add NAT TCP sequence adjustment if packet mangled · 8c88f87c

Pablo Neira Ayuso authored Jun 07, 2012

User-space programs that receive traffic via NFQUEUE may mangle packets.
If NAT is enabled, this usually puzzles sequence tracking, leading to
traffic disruptions.

With this patch, nfnl_queue will make the corresponding NAT TCP sequence
adjustment if:

1) The packet has been mangled,
2) the NFQA_CFG_F_CONNTRACK flag has been set, and
3) NAT is detected.

There are some records on the Internet complaning about this issue:
http://stackoverflow.com/questions/260757/packet-mangling-utilities-besides-iptables

By now, we only support TCP since we have no helpers for DCCP or SCTP.
Better to add this if we ever have some helper over those layer 4 protocols.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

8c88f87c

netfilter: add glue code to integrate nfnetlink_queue and ctnetlink · 9cb01766

Pablo Neira Ayuso authored Jun 07, 2012

This patch allows you to include the conntrack information together
with the packet that is sent to user-space via NFQUEUE.

Previously, there was no integration between ctnetlink and
nfnetlink_queue. If you wanted to access conntrack information
from your libnetfilter_queue program, you required to query
ctnetlink from user-space to obtain it. Thus, delaying the packet
processing even more.

Including the conntrack information is optional, you can set it
via NFQA_CFG_F_CONNTRACK flag with the new NFQA_CFG_FLAGS attribute.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

9cb01766

netfilter: nf_ct_helper: implement variable length helper private data · 1afc5679

Pablo Neira Ayuso authored Jun 07, 2012

This patch uses the new variable length conntrack extensions.

Instead of using union nf_conntrack_help that contain all the
helper private data information, we allocate variable length
area to store the private helper data.

This patch includes the modification of all existing helpers.
It also includes a couple of include header to avoid compilation
warnings.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

1afc5679

netfilter: nf_ct_ext: support variable length extensions · 3cf4c7e3

Pablo Neira Ayuso authored Feb 01, 2012

We can now define conntrack extensions of variable size. This
patch is useful to get rid of these unions:

union nf_conntrack_help
union nf_conntrack_proto
union nf_conntrack_nat_help
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

3cf4c7e3

netfilter: nf_ct_helper: allocate 16 bytes for the helper and policy names · 3a8fc53a

Pablo Neira Ayuso authored Jan 15, 2012

This patch modifies the struct nf_conntrack_helper to allocate
the room for the helper name. The maximum length is 16 bytes
(this was already introduced in 2.6.24).

For the maximum length for expectation policy names, I have
also selected 16 bytes.

This patch is required by the follow-up patch to support
user-space connection tracking helpers.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

3a8fc53a

12 Jun, 2012 2 commits

ipv4: Add interface option to enable routing of 127.0.0.0/8 · d0daebc3

Thomas Graf authored Jun 12, 2012

Routing of 127/8 is tradtionally forbidden, we consider
packets from that address block martian when routing and do
not process corresponding ARP requests.

This is a sane default but renders a huge address space
practically unuseable.

The RFC states that no address within the 127/8 block should
ever appear on any network anywhere but it does not forbid
the use of such addresses outside of the loopback device in
particular. For example to address a pool of virtual guests
behind a load balancer.

This patch adds a new interface option 'route_localnet'
enabling routing of the 127/8 address block and processing
of ARP requests on a specific interface.

Note that for the feature to work, the default local route
covering 127/8 dev lo needs to be removed.

Example:
  $ sysctl -w net.ipv4.conf.eth0.route_localnet=1
  $ ip route del 127.0.0.0/8 dev lo table local
  $ ip addr add 127.1.0.1/16 dev eth0
  $ ip route flush cache

V2: Fix invalid check to auto flush cache (thanks davem)
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d0daebc3

Merge branch 'master' of... · 0440507b

John W. Linville authored Jun 12, 2012

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem

0440507b

11 Jun, 2012 19 commits

phy: Use pr_<level> · 8d242488

Joe Perches authored Jun 09, 2012

Use a more current logging style.

Add pr_fmt and missing newlines.
Remove embedded prefixes.
Neaten phy_print_status to avoid using KERN_CONT.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8d242488

af_packet: use sizeof instead of constant in spkt_device · de74e92a

danborkmann@iogearbox.net authored Jun 10, 2012

This small patch removes access to the last element of the spkt_device
array through a constant. Instead, it is accessed by sizeof() to respect
possible changes in if_packet.h.
Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

de74e92a

inet: Fix BUG triggered by __rt{,6}_get_peer(). · 55afabaa

David S. Miller authored Jun 11, 2012

If no peer actually gets attached (either because create is zero or
the peer allocation fails) we'll trigger a BUG because we
unconditionally do an rt{,6}_peer_ptr() afterwards.

Fix this by guarding it with the proper check.
Signed-off-by: David S. Miller <davem@davemloft.net>

55afabaa

netfilter: nf_ct_tcp, udp: fix compilation with sysctl disabled · 352e04b9

Pablo Neira Ayuso authored Jun 11, 2012

This patch fixes the compilation of the TCP and UDP trackers with sysctl
compilation disabled:

net/netfilter/nf_conntrack_proto_udp.c: In function ‘udp_init_net_data’:
net/netfilter/nf_conntrack_proto_udp.c:279:13: error: ‘struct nf_proto_net’ has no member named
 ‘user’
net/netfilter/nf_conntrack_proto_tcp.c:1606:9: error: ‘struct nf_proto_net’ has no member named
 ‘user’
net/netfilter/nf_conntrack_proto_tcp.c:1643:9: error: ‘struct nf_proto_net’ has no member named
 ‘user’
Reported-by: Fengguang Wu <wfg@linux.intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

352e04b9

net: keep name_hlist close to name · 9136461a

Eric Dumazet authored Jun 11, 2012

__dev_get_by_name() is slow because pm_qos_req has been inserted between
name[] and name_hlist, adding cache misses.

pm_qos_req has nothing to do at the beginning of struct net_device
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9136461a

Merge branch 'master' of git://1984.lsi.us.es/net-next · 67da2552
David S. Miller authored Jun 11, 2012

67da2552

ssb: add missing PCI ID for b/g/n single band BCM4322 · 7f0d9f43

Jonas Gorski authored Jun 10, 2012

14e4:432c is found on some bcm63xx devices. The device is working fine
with b43.
Reported-by: Álvaro Fernández Rojas <noltari@gmail.com>
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

7f0d9f43

ath9k_hw: Initvals update for AR9462 · cba63e99

Sujith Manoharan authored Jun 08, 2012

MSI is enabled by default for most of the 4th generation
chips. Add this for AR9462 - this fixes PowerSave operation,
the chip was not entering Network-Sleep mode earlier.
With proper powering down of the MAC now, power consumption
in associated state is reduced considerably.
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

cba63e99

mwifiex: scan less channels per scan command to improve Tx traffic · 658f37b7

Amitkumar Karwar authored Jun 06, 2012

Currently 4 channels are scanned per scan command. if scan request
is issued by user during Tx traffic, radio will be out of channel
for "4 * per_chan_scan_time" for each scan command and will not be
able to receive Rx packets. This adds delay in data traffic. We can
minimize it by reducing number of channels scanned per scan command
in this scenario.

We can not always scan 1 channel per scan command due to limitation
of number of command buffers. So we add code to decide number of
channels scanned per scan command in associated state.
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

658f37b7

mwifiex: fix simultaneous scan and Tx traffic problem · 3249ba73

Amitkumar Karwar authored Jun 06, 2012

If scan operation is started when Tx traffic is already running,
driver locks Tx queue until it gets completed. With this logic
there is a delay for Tx packets.

This patch implements new approach to give Tx path higher priority
in this case. Driver internally sends multiple synchronous scan
commands to firmware when scan is requested by user. Now we will
make sure that Tx queue is empty everytime before sending next scan
command. If Tx queue isn't empty scan command will be postponsed by
20msec. This rule will be followed until Tx queue becomes empty or
timeout of 1 second happens. In case of timeout scan operation will
be aborted.
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

3249ba73

mwifiex: shorten per channel scan time · 38e8b7d9

Bing Zhao authored Jun 06, 2012

Currently the scan time per channel for active scanning is set to
200ms. It takes quite a while to finsh scanning on all channels,
especially with a dual band configuration.

Change the per channel scan time settings to the following values:

passive scan: 110ms
active scan: 30ms
specific scan: 30ms

Above settings have been tested on x86 and arm platforms.
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

38e8b7d9

Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next · b6038961
John W. Linville authored Jun 11, 2012
```
Conflicts:
	drivers/net/wireless/iwlwifi/iwl-eeprom.c
```
b6038961
Merge tag 'nfc-next-3.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-3.0 · 2e486868
John W. Linville authored Jun 11, 2012

2e486868

inet: Avoid potential NULL peer dereference. · 7b34ca2a

David S. Miller authored Jun 11, 2012

We handle NULL in rt{,6}_set_peer but then our caller will try to pass
that NULL pointer into inet_putpeer() which isn't ready for it.

Fix this by moving the NULL check one level up, and then remove the
now unnecessary NULL check from inetpeer_ptr_set_peer().
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7b34ca2a

inet: Use FIB table peer roots in routes. · 8b96d22d
David S. Miller authored Jun 11, 2012
```
Signed-off-by: David S. Miller <davem@davemloft.net>
```
8b96d22d
inet: Add inetpeer tree roots to the FIB tables. · 8e773277
David S. Miller authored Jun 11, 2012
```
Signed-off-by: David S. Miller <davem@davemloft.net>
```
8e773277

inet: Add family scope inetpeer flushes. · b48c80ec

David S. Miller authored Jun 10, 2012

This implementation can deal with having many inetpeer roots, which is
a necessary prerequisite for per-FIB table rooted peer tables.

Each family (AF_INET, AF_INET6) has a sequence number which we bump
when we get a family invalidation request.

Each peer lookup cheaply checks whether the flush sequence of the
root we are using is out of date, and if so flushes it and updates
the sequence number.
Signed-off-by: David S. Miller <davem@davemloft.net>

b48c80ec

ipv4: Kill ip_rt_frag_needed(). · 46517008

David S. Miller authored Jun 10, 2012

There is zero point to this function.

It's only real substance is to perform an extremely outdated BSD4.2
ICMP check, which we can safely remove.  If you really have a MTU
limited link being routed by a BSD4.2 derived system, here's a nickel
go buy yourself a real router.

The other actions of ip_rt_frag_needed(), checking and conditionally
updating the peer, are done by the per-protocol handlers of the ICMP
event.

TCP, UDP, et al. have a handler which will receive this event and
transmit it back into the associated route via dst_ops->update_pmtu().

This simplification is important, because it eliminates the one place
where we do not have a proper route context in which to make an
inetpeer lookup.
Signed-off-by: David S. Miller <davem@davemloft.net>

46517008

inet: Hide route peer accesses behind helpers. · 97bab73f

David S. Miller authored Jun 09, 2012

We encode the pointer(s) into an unsigned long with one state bit.

The state bit is used so we can store the inetpeer tree root to use
when resolving the peer later.

Later the peer roots will be per-FIB table, and this change works to
facilitate that.
Signed-off-by: David S. Miller <davem@davemloft.net>

97bab73f

10 Jun, 2012 3 commits

inet: Pass inetpeer root into inet_getpeer*() interfaces. · c0efc887

David S. Miller authored Jun 09, 2012

Otherwise we reference potentially non-existing members when
ipv6 is disabled.
Signed-off-by: David S. Miller <davem@davemloft.net>

c0efc887

af_unix: remove unix_iter_state · 8b51b064

Eric Dumazet authored Jun 08, 2012

As pointed out by Michael Tokarev , struct unix_iter_state is no longer
needed.
Suggested-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8b51b064

ipv6: Do not mark ipv6_inetpeer_ops as __net_initdata. · 2b823f72
David S. Miller authored Jun 09, 2012
```
Signed-off-by: David S. Miller <davem@davemloft.net>
```
2b823f72

09 Jun, 2012 5 commits

inet: Consolidate inetpeer_invalidate_tree() interfaces. · 56a6b248

David S. Miller authored Jun 09, 2012

We only need one interface for this operation, since we always know
which inetpeer root we want to flush.
Signed-off-by: David S. Miller <davem@davemloft.net>

56a6b248

inet: Initialize per-netns inetpeer roots in net/ipv{4,6}/route.c · c3426b47
David S. Miller authored Jun 09, 2012
```
Instead of net/ipv4/inetpeer.c
Signed-off-by: David S. Miller <davem@davemloft.net>
```
c3426b47

[PATCH] tcp: Cache inetpeer in timewait socket, and only when necessary. · 2397849b

David S. Miller authored Jun 09, 2012

Since it's guarenteed that we will access the inetpeer if we're trying
to do timewait recycling and TCP options were enabled on the
connection, just cache the peer in the timewait socket.

In the future, inetpeer lookups will be context dependent (per routing
realm), and this helps facilitate that as well.
Signed-off-by: David S. Miller <davem@davemloft.net>

2397849b

tcp: Get rid of inetpeer special cases. · 4670fd81

David S. Miller authored Jun 09, 2012

The get_peer method TCP uses is full of special cases that make no
sense accommodating, and it also gets in the way of doing more
reasonable things here.

First of all, if the socket doesn't have a usable cached route, there
is no sense in trying to optimize timewait recycling.

Likewise for the case where we have IP options, such as SRR enabled,
that make the IP header destination address (and thus the destination
address of the route key) differ from that of the connection's
destination address.

Just return a NULL peer in these cases, and thus we're also able to
get rid of the clumsy inetpeer release logic.
Signed-off-by: David S. Miller <davem@davemloft.net>

4670fd81

inet: Create and use rt{,6}_get_peer_create(). · fbfe95a4

David S. Miller authored Jun 08, 2012

There's a lot of places that open-code rt{,6}_get_peer() only because
they want to set 'create' to one.  So add an rt{,6}_get_peer_create()
for their sake.

There were also a few spots open-coding plain rt{,6}_get_peer() and
those are transformed here as well.
Signed-off-by: David S. Miller <davem@davemloft.net>

fbfe95a4

08 Jun, 2012 4 commits

af_unix: speedup /proc/net/unix · 7123aaa3

Eric Dumazet authored Jun 08, 2012

/proc/net/unix has quadratic behavior, and can hold unix_table_lock for
a while if high number of unix sockets are alive. (90 ms for 200k
sockets...)

We already have a hash table, so its quite easy to use it.

Problem is unbound sockets are still hashed in a single hash slot
(unix_socket_table[UNIX_HASH_TABLE])

This patch also spreads unbound sockets to 256 hash slots, to speedup
both /proc/net/unix and unix_diag.

Time to read /proc/net/unix with 200k unix sockets :
(time dd if=/proc/net/unix of=/dev/null bs=4k)

before : 520 secs
after : 2 secs
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7123aaa3

inetpeer: add parameter net for inet_getpeer_v4,v6 · 54db0cc2

Gao feng authored Jun 08, 2012

add struct net as a parameter of inet_getpeer_v[4,6],
use net to replace &init_net.

and modify some places to provide net for inet_getpeer_v[4,6]
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

54db0cc2

inetpeer: add namespace support for inetpeer · c8a627ed

Gao feng authored Jun 08, 2012

now inetpeer doesn't support namespace,the information will
be leaking across namespace.

this patch move the global vars v4_peers and v6_peers to
netns_ipv4 and netns_ipv6 as a field peers.

add struct pernet_operations inetpeer_ops to initial pernet
inetpeer data.

and change family_to_base and inet_getpeer to support namespace.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c8a627ed

wl18xx: avoid some -Wformat warnings · 934b9d1e

John W. Linville authored Jun 08, 2012

CC drivers/net/wireless/ti/wl18xx/main.o
drivers/net/wireless/ti/wl18xx/main.c: In function ‘wl18xx_conf_init’:
drivers/net/wireless/ti/wl18xx/main.c:1024:3: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘long unsigned int’ [-Wformat]
drivers/net/wireless/ti/wl18xx/main.c:1024:3: warning: format ‘%d’ expects argument of type ‘int’, but argument 3 has type ‘size_t’ [-Wformat]
Signed-off-by: John W. Linville <linville@tuxdriver.com>

934b9d1e