Commits · 94b265514a8398ba3cfecb5a821a027b68a5c38e · Kirill Smelkov / linux

31 Aug, 2009 4 commits

IPVS: Add handling of incoming ICMPV6 messages · 94b26551

Julius Volz authored Aug 31, 2009

Add handling of incoming ICMPv6 messages.
This follows the handling of IPv4 ICMP messages.

Amongst other things, this patch allows IPVS to behave sensibly
when an ICMPV6_PKT_TOOBIG message is received:

This message is received when a realserver sends a packet >PMTU to the
client. The hop on this path with insufficient MTU will generate an
ICMPv6 Packet Too Big message back to the VIP. The LVS server receives
this message, but the call to the function handling this has been
missing. Thus, IPVS fails to forward the message to the real server,
which then does not adjust the path MTU. This patch adds the missing
call to ip_vs_in_icmp_v6() in ip_vs_in() to handle this situation.

Thanks to Rob Gallagher from HEAnet for reporting this issue and for
testing this patch in production (with direct routing mode).

[horms@verge.net.au: tweaked changelog]
Signed-off-by: Julius Volz <julius.volz@gmail.com>
Tested-by: Rob Gallagher <robert.gallagher@heanet.ie>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>

94b26551

netfilter: ip6t_eui: fix read outside array bounds · 48890869

Patrick McHardy authored Aug 31, 2009

Use memcmp() instead of open coded comparison that reads one byte past
the intended end.

Based on patch from Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

48890869
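
A minimal sketch of the kind of comparison involved, assuming the match
checks the modified EUI-64 derived from a source MAC address against the
low 8 bytes of an IPv6 address. The helper name eui64_matches and the
buffer layout are illustrative and are not taken from the module itself.
The point is only that memcmp() with an explicit length cannot step past
the 8-byte buffer the way a hand-written loop with a wrong bound can.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build the modified EUI-64 interface identifier
 * from a 48-bit MAC address and compare it with the low 64 bits of an
 * IPv6 address. */
bool eui64_matches(const uint8_t mac[6], const uint8_t addr[16])
{
	uint8_t eui64[8];

	memcpy(eui64, mac, 3);          /* first three MAC bytes (OUI) */
	eui64[3] = 0xff;                /* fixed filler bytes */
	eui64[4] = 0xfe;
	memcpy(eui64 + 5, mac + 3, 3);  /* last three MAC bytes */
	eui64[0] ^= 0x02;               /* flip the universal/local bit */

	/* One bounded comparison instead of an open-coded loop whose
	 * upper limit is easy to get wrong by one. */
	return memcmp(eui64, addr + 8, sizeof(eui64)) == 0;
}
```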

netfilter: nf_conntrack: netns fix re reliable conntrack event delivery · ee254fa4

Alexey Dobriyan authored Aug 31, 2009

Conntracks on the dying list of any netns other than init_net were never killed.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>

ee254fa4

ipvs: Use atomic operations atomicly · 1e66dafc

Simon Horman authored Aug 31, 2009

As pointed out by Shin Hong, IPVS doesn't always use atomic operations
in an atomic manner. While this seems unlikely to manifest as
strange behaviour, it seems appropriate to clean this up.

Cc: shin hong <hongshin@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>

1e66dafc
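
As an illustration of what "using atomic operations non-atomically" can
look like, here is a userspace sketch with C11 atomics rather than the
kernel's atomic_t API. The function names and the packet-threshold
scenario are assumptions made for the example, not the code touched by
this commit. In the first variant the increment and the later read are
two separate operations, so another CPU can change the counter between
them. The second variant tests the value returned by the
read-modify-write itself.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Racy pattern: two separate atomic operations. Another thread may
 * increment the counter between them, so the threshold test can fire
 * twice for one count or be skipped entirely. */
bool count_packet_racy(atomic_int *pkts, int threshold)
{
	atomic_fetch_add(pkts, 1);
	return atomic_load(pkts) % threshold == 0;
}

/* Safer pattern: each caller tests the count that its own increment
 * produced (old value + 1). */
bool count_packet_atomic(atomic_int *pkts, int threshold)
{
	int now = atomic_fetch_add(pkts, 1) + 1;

	return now % threshold == 0;
}
```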

25 Aug, 2009 3 commits

netfilter: nfnetlink: constify message attributes and headers · 39938324

Patrick McHardy authored Aug 25, 2009

Signed-off-by: Patrick McHardy <kaber@trash.net>

39938324

netlink: constify nlmsghdr arguments · 3a6c2b41

Patrick McHardy authored Aug 25, 2009

Constify nlmsghdr arguments to a couple of functions as preparation
for the next patch, which will constify the netlink message data in
all nfnetlink users.
Signed-off-by: Patrick McHardy <kaber@trash.net>

3a6c2b41

netfilter: nf_conntrack: log packets dropped by helpers · 74f7a655

Patrick McHardy authored Aug 25, 2009

Log packets dropped by helpers using the netfilter logging API. This
is useful in combination with nfnetlink_log to analyze those packets
in userspace for debugging.
Signed-off-by: Patrick McHardy <kaber@trash.net>

74f7a655

24 Aug, 2009 3 commits

netfilter: bridge: refcount fix · f3abc9b9

Eric Dumazet authored Aug 24, 2009

commit f216f082
([NETFILTER]: bridge netfilter: deal with martians correctly)
added a refcount leak on in_dev.

Instead of using in_dev_get(), we can use __in_dev_get_rcu(),
as netfilter hooks are running under rcu_read_lock(), as pointed
out by Patrick.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

f3abc9b9

netfilter: nf_nat: fix inverted logic for persistent NAT mappings · cce5a5c3

Maximilian Engelhardt authored Aug 24, 2009

Kernel 2.6.30 introduced a patch [1] for the persistent option in the
netfilter SNAT target. This is exactly what we need here so I had a quick look
at the code and noticed that the patch is wrong. The logic is simply inverted.
The patch below fixes this.

Also note that because of this the default behavior of the SNAT target has
changed since kernel 2.6.30 as it now ignores the destination IP in choosing
the source IP for NAT (which should only be the case if the persistent
option is set).

[1] http://git.eu.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=98d500d66cb7940747b424b245fc6a51ecfbf005

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>

cce5a5c3
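
The intended behaviour described above can be sketched as follows. This
is a userspace toy, not the nf_nat code: the flag value, the mix()
function and pick_source() are all hypothetical stand-ins. It only shows
the direction of the condition: the destination address feeds the hash
unless the persistent option is set.

```c
#include <stdint.h>

#define RANGE_PERSISTENT 0x1u  /* hypothetical flag bit */

/* Toy mixing function standing in for the kernel's hash. */
static uint32_t mix(uint32_t a, uint32_t b)
{
	uint32_t h = a * 2654435761u;

	h ^= b + 0x9e3779b9u + (h << 6) + (h >> 2);
	return h;
}

/* Pick an index into a pool of pool_size (> 0) source addresses.
 * The destination is ignored only when the persistent flag is set;
 * the inverted version ignored it when the flag was NOT set. */
uint32_t pick_source(uint32_t src, uint32_t dst, unsigned int flags,
		     uint32_t pool_size)
{
	uint32_t dst_input = (flags & RANGE_PERSISTENT) ? 0 : dst;

	return mix(src, dst_input) % pool_size;
}
```

With the persistent flag set, one client gets the same pool entry for
every destination; without it, the destination takes part in the choice.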

netfilter: xtables: mark initial tables constant · 35aad0ff

Jan Engelhardt authored Aug 24, 2009

The input table is never modified, so it should be considered const.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>

35aad0ff

10 Aug, 2009 21 commits

Merge branch 'master' of git://dev.medozas.de/linux · dc05a564

Patrick McHardy authored Aug 10, 2009

dc05a564

netfilter: xtables: check for standard verdicts in policies · e2fe35c1

Jan Engelhardt authored Jul 18, 2009

This adds the second check that Rusty wanted to have a long time ago. :-)

Base chain policies must have absolute verdicts that cease processing
in the table, otherwise rule execution may continue in an unexpected
spurious fashion (e.g. next chain that follows in memory).
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

e2fe35c1
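
A sketch of what such a verdict check can look like, assuming the usual
xtables encoding in which a standard verdict v is stored as the negative
number -v - 1 and non-negative values are jump offsets. NF_DROP and
NF_ACCEPT carry their values from <linux/netfilter.h>; the helper name
is_absolute_policy_verdict is made up for this example.

```c
#include <stdbool.h>

/* Values as defined in <linux/netfilter.h>. */
#define NF_DROP   0
#define NF_ACCEPT 1

/* Standard verdicts are stored negated and offset by one. */
#define ENCODE_VERDICT(v) (-(v) - 1)

/* Hypothetical helper: true only for verdicts that end processing in
 * the table, i.e. an encoded ACCEPT or DROP. */
bool is_absolute_policy_verdict(int encoded)
{
	int verdict;

	if (encoded >= 0)
		return false;          /* a jump offset, not a verdict */
	verdict = -(encoded + 1);      /* decode without overflowing */
	return verdict == NF_DROP || verdict == NF_ACCEPT;
}
```

An encoded RETURN (-5 under this scheme) or a jump to another chain is
rejected, because neither ends processing in the table.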

netfilter: xtables: check for unconditionality of policies · 90e7d4ab

Jan Engelhardt authored Jul 09, 2009

This adds a check that iptables's original author Rusty set forth in
a FIXME comment.

Underflows in iptables are better known as chain policies, and are
required to be unconditional; otherwise the policy rule could be
skipped whenever it does not match. If that were to happen, rule
execution would continue in an unexpected, spurious fashion.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

90e7d4ab

netfilter: xtables: ignore unassigned hooks in check_entry_size_and_hooks · a7d51738

Jan Engelhardt authored Jul 18, 2009

The "hook_entry" and "underflow" arrays contain values even for hooks
not provided, such as PREROUTING in conjunction with the "filter"
table. Usually, the values point to whatever the next rule is. For
the upcoming unconditionality and underflow checking patches however,
we must not inspect that arbitrary rule.

Skipping unassigned hooks seems like a good idea, also because
newinfo->hook_entry and newinfo->underflow will then continue to have
the poison value for detecting abnormalities.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

a7d51738

netfilter: xtables: use memcmp in unconditional check · 47901dc2

Jan Engelhardt authored Jul 09, 2009

Instead of inspecting each u32/char open-coded, clean up and make use
of memcmp. On some arches, memcmp is implemented as assembly or GCC's
__builtin_memcmp which can possibly take advantage of known
alignment.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

47901dc2
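
The memcmp approach can be sketched like this. The struct below is a
simplified, hypothetical match specification, not the real ipt_ip or
ip6t_ip6 layout. A rule counts as unconditional when every field of its
specification is zero, so one comparison against an all-zero instance
replaces the field-by-field tests.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified match specification. All members are
 * 32-bit values or 16-byte arrays, so the struct has no padding
 * bytes that could hold stray values. */
struct match_spec {
	uint32_t src, dst;
	uint32_t src_mask, dst_mask;
	char     in_iface[16], out_iface[16];
	uint32_t proto_flags;
};

/* Compare the whole specification against a zero-initialised
 * static instance. */
bool is_unconditional(const struct match_spec *spec)
{
	static const struct match_spec uncond;

	return memcmp(spec, &uncond, sizeof(uncond)) == 0;
}
```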

netfilter: iptables: remove unused datalen variable · e5afbba1

Jan Engelhardt authored Jul 08, 2009

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

e5afbba1

netfilter: xtables: realign struct xt_target_param · 98d89b41

Jan Engelhardt authored Jul 05, 2009

This commit gets rid of a padding hole as reported by pahole(1).
Saves 8 bytes on x86_64.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

98d89b41
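
To show how member order creates or removes a padding hole, here are two
hypothetical layouts with the same members. They are not the actual
xt_target_param definition. On a typical x86_64 ABI the first is expected
to occupy 32 bytes and the second 24, but the exact numbers depend on the
target, so the helper only reports the difference.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with a hole: the 4-byte member is followed by
 * padding so that the next pointer is aligned. */
struct with_hole {
	const void  *a;
	unsigned int flags;
	const void  *b;
	uint8_t      family;
};

/* Same members, pointers first: the small members now share the
 * space that used to be padding. */
struct realigned {
	const void  *a;
	const void  *b;
	unsigned int flags;
	uint8_t      family;
};

/* Bytes saved by the reordering on the current target. */
size_t bytes_saved(void)
{
	return sizeof(struct with_hole) - sizeof(struct realigned);
}
```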

netfilter: xtables: switch table AFs to nfproto · f88e6a8a

Jan Engelhardt authored Jun 13, 2009

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

f88e6a8a

netfilter: xtables: switch hook PFs to nfproto · 24c232d8

Jan Engelhardt authored Jun 13, 2009

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

24c232d8

netfilter: conntrack: switch hook PFs to nfproto · 57750a22

Jan Engelhardt authored Jun 13, 2009

Simple substitution to indicate that the fields indeed use the
NFPROTO_ space.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

57750a22

netfilter: xtables: remove redirecting header files · 93bb1e9d

Jan Engelhardt authored Jun 12, 2009

When IPv4 and IPv6 matches were unified approx. 3.5 years ago, they
received new header filenames (e.g. xt_CLASSIFY.h). Let's remove the
old ones now.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

93bb1e9d

netfilter: xtables: remove xt_owner v0 · 6461caed

Jan Engelhardt authored Jun 12, 2009

Superseded by xt_owner v1 (v2.6.24-2388-g0265ab44).
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

6461caed

netfilter: xtables: remove xt_mark v0 · 4725c728

Jan Engelhardt authored Jun 12, 2009

Superseded by xt_mark v1 (v2.6.24-2922-g17b0d7ef).
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

4725c728

netfilter: xtables: remove xt_iprange v0 · 36d4084d

Jan Engelhardt authored Jun 12, 2009

Superseded by xt_iprange v1 (v2.6.24-2928-g1a50c5a1).
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

36d4084d

netfilter: xtables: remove xt_conntrack v0 · 9e05ec4b

Jan Engelhardt authored Jun 12, 2009

Superseded by xt_conntrack v1 (v2.6.24-2921-g64eb12f).
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

9e05ec4b

netfilter: xtables: remove xt_connmark v0 · 84899a2b

Jan Engelhardt authored Jun 12, 2009

Superseded by xt_connmark v1 (v2.6.24-2919-g96e32272).
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

84899a2b

netfilter: xtables: remove xt_MARK v0, v1 · c8001f7f

Jan Engelhardt authored Jun 12, 2009

Superseded by xt_MARK v2 (v2.6.24-2918-ge0a812ae).
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

c8001f7f

netfilter: xtables: remove xt_CONNMARK v0 · e973a70c

Jan Engelhardt authored Jun 12, 2009

Superseded by xt_CONNMARK v1 (v2.6.24-2917-g0dc8c760).
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

e973a70c

netfilter: xtables: remove xt_TOS v0 · 7cd1837b

Jan Engelhardt authored Jun 12, 2009

Superseded by xt_TOS v1 (v2.6.24-2396-g5c350e5a).
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

7cd1837b

netfilter: ebtables: Use %pM conversion specifier · be39ee11

Tobias Klauser authored Aug 10, 2009

ebt_log uses its own implementation of print_mac to print MAC addresses.
This patch converts it to use the %pM conversion specifier for printk.
Signed-off-by: Tobias Klauser <klto@zhaw.ch>
Signed-off-by: Patrick McHardy <kaber@trash.net>

be39ee11
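
The %pM specifier exists only in the kernel's printk family, so it
cannot be demonstrated directly in userspace. As a rough equivalent, the
hypothetical helper below produces the colon-separated lowercase hex
layout that %pM is understood to emit. It is the kind of local helper
that the specifier makes unnecessary.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical userspace helper: writes "xx:xx:xx:xx:xx:xx" into buf.
 * Returns the number of characters that make up the full string, as
 * snprintf() does. */
int format_mac(char *buf, size_t len, const uint8_t mac[6])
{
	return snprintf(buf, len, "%02x:%02x:%02x:%02x:%02x:%02x",
			mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
}
```

A buffer of 18 bytes holds the 17 characters plus the terminating NUL.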

netfilter: nf_conntrack: add SCTP support for SO_ORIGINAL_DST · 54981279

Rafael Laufer authored Aug 10, 2009

Signed-off-by: Patrick McHardy <kaber@trash.net>

54981279

07 Aug, 2009 4 commits

net: Avoid enqueuing skb for default qdiscs · bbd8a0d3

Krishna Kumar authored Aug 06, 2009

dev_queue_xmit enqueues an skb and calls qdisc_run, which
dequeues the skb and transmits it. In most cases, the skb that
is enqueued is the same one that is dequeued (unless the
queue gets stopped or multiple CPUs write to the same queue
and end up in a race with qdisc_run). For default qdiscs, we
can remove the redundant enqueue/dequeue and simply transmit the
skb since the default qdisc is work-conserving.

The patch uses a new flag, TCQ_F_CAN_BYPASS, to identify the
default fast queue. The controversial part of the patch is
incrementing qlen when an skb is requeued; this is to avoid
checks like the second line below:

+  } else if ((q->flags & TCQ_F_CAN_BYPASS) && !qdisc_qlen(q) &&
>>         !q->gso_skb &&
+          !test_and_set_bit(__QDISC_STATE_RUNNING, &q->state)) {

Results of 2 hours of testing with multiple netperf sessions (1,
2, 4, 8, 12 sessions on a 4-CPU System-X). The BW numbers are
aggregate Mb/s across iterations, tested with this version on
System-X boxes with Chelsio 10 Gb/s cards:

----------------------------------
Size |  ORG BW          NEW BW   |
----------------------------------
128K |  156964          159381   |
256K |  158650          162042   |
----------------------------------

Changes from ver1:

1. Move sch_direct_xmit declaration from sch_generic.h to
   pkt_sched.h
2. Update qdisc basic statistics for direct xmit path.
3. Set qlen to zero in qdisc_reset.
4. Changed some function names to more meaningful ones.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bbd8a0d3
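
The bypass decision described in this commit can be modelled in a few
lines. This is a single-threaded toy, not the dev_queue_xmit code:
struct toy_qdisc, choose_path() and the flag value are assumptions for
the example, and the plain boolean stands in for the atomic
test_and_set_bit() on __QDISC_STATE_RUNNING.

```c
#include <stdbool.h>

#define TCQ_F_CAN_BYPASS 0x4u  /* flag name from the patch; value illustrative */

/* Toy stand-in for a qdisc: only the fields the check looks at. */
struct toy_qdisc {
	unsigned int flags;
	unsigned int qlen;     /* packets currently queued */
	bool         running;  /* stands in for __QDISC_STATE_RUNNING */
};

enum xmit_path { XMIT_DIRECT, XMIT_ENQUEUED };

/* Decide how to send one packet. */
enum xmit_path choose_path(struct toy_qdisc *q)
{
	if ((q->flags & TCQ_F_CAN_BYPASS) && q->qlen == 0 && !q->running) {
		q->running = true;   /* claim the transmit path */
		/* ... the skb would go straight to the driver here ... */
		q->running = false;
		return XMIT_DIRECT;
	}
	q->qlen++;                   /* fall back to the normal enqueue */
	return XMIT_ENQUEUED;
}
```

Because a requeued skb is counted in qlen, the single qlen test is
enough to rule out the bypass while a packet is still pending.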

mlx4_en: Not using Shared Receive Queues · 9f519f68

Yevgeny Petrilin authored Aug 06, 2009

We use a 1:1 mapping between QPs and SRQs on the receive side,
so the additional level of indirection is not required. Allocate the
receive buffers for the RSS QPs.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

9f519f68

mlx4_en: Using real number of rings as RSS map size · b6b912e0

Yevgeny Petrilin authored Aug 06, 2009

There is no point in using more QPs than the actual number of receive rings.
If the RSS function for two streams gives the same result modulo the number
of rings, they will arrive at the same RX ring anyway.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

b6b912e0
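
The modulo argument can be checked with a small sketch. Both helpers are
hypothetical, and the second one assumes that QP number q feeds ring
q % num_rings and that the QP count is a multiple of the ring count.
Under those assumptions the extra QPs never change which ring a stream
lands on.

```c
#include <stdint.h>

/* Hypothetical: pick a ring directly from the RSS hash. */
unsigned int ring_direct(uint32_t hash, unsigned int num_rings)
{
	return hash % num_rings;
}

/* Hypothetical: pick a QP first, then let that QP feed ring
 * (qp % num_rings), as with more QPs than rings. */
unsigned int ring_via_qps(uint32_t hash, unsigned int num_qps,
			  unsigned int num_rings)
{
	unsigned int qp = hash % num_qps;

	return qp % num_rings;
}
```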

mlx4_en: Adaptive moderation policy change · a35ee541

Yevgeny Petrilin authored Aug 06, 2009

If the net device is identified as a "sender" (the number of sent packets
is higher than the number of received packets and the incoming packets are
small), set the moderation time to its low limit.
We do this because the incoming packets are ACKs, and we don't want to delay them.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

a35ee541

06 Aug, 2009 5 commits

net: smsc911x: switch to new dev_pm_ops · 6cb87823

Daniel Mack authored Aug 05, 2009

Hibernation is unsupported for now, which matches the actual
implementation in the driver. For freeze/thaw, the chip's D2 state should
be entered.
Signed-off-by: Daniel Mack <daniel@caiaq.de>
Acked-by: <steve.glendinning@smsc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6cb87823

tc35815: Use 0 RxFragSize.MinFrag value for non-packing mode · a48ec346

Atsushi Nemoto authored Aug 06, 2009

The datasheet says "When not enabling packing, the MinFrag value must
remain at 0".  Do not write a value to the RxFragSize register if
TC35815_USE_PACKEDBUFFER is disabled.

This is not a bugfix.  No real problem has been reported with this.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

a48ec346

tc35815: Fix rx_missed_errors count · 7bb82e83

Atsushi Nemoto authored Aug 06, 2009

The Miss_Cnt register is cleared by reading.  Accumulate its value into
the rx_missed_errors count.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

7bb82e83
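
A small simulation of the clear-on-read behaviour shows why the value
has to be accumulated. The struct and function names are hypothetical
and the register is modelled as a plain variable, so this illustrates
the pattern rather than the driver's code.

```c
#include <stdint.h>

/* Simulated device with a clear-on-read miss counter. */
struct toy_nic {
	uint32_t      miss_cnt_reg;      /* hardware side */
	unsigned long rx_missed_errors;  /* software statistics */
};

/* Reading the register returns its value and resets it to zero. */
static uint32_t read_miss_cnt(struct toy_nic *nic)
{
	uint32_t v = nic->miss_cnt_reg;

	nic->miss_cnt_reg = 0;
	return v;
}

/* Add each reading to the running total. Assigning it instead would
 * discard everything counted before the previous read. */
void update_stats(struct toy_nic *nic)
{
	nic->rx_missed_errors += read_miss_cnt(nic);
}
```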

tc35815: Increase timeout for mdio · c60a5cf7

Atsushi Nemoto authored Aug 06, 2009

The current timeout value is too short under very high load,
where jiffies might jump forward during the busy loop.
Also add a minimum delay before checking for completion of MDIO.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

c60a5cf7

tc35815: Improve BLEx / FDAEx handling · db30f5ef

Atsushi Nemoto authored Aug 06, 2009

Clear Int_BLEx / Int_FDAEx after (not before) processing Rx interrupt.
This will reduce the number of unnecessary interrupts.
Also print rx error messages only if netif_msg_rx_err() is enabled.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

db30f5ef