Commits · 266918303226cceac7eca38ced30f15f277bd89c · Kirill Smelkov / linux

12 Oct, 2007 9 commits

[SKY2]: status polling loop (post merge) · 26691830

Stephen Hemminger authored Oct 11, 2007

Handle the corner case where budget is exhausted correctly.
And save unnecessary read of index register.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

26691830

[NET]: Fix NAPI completion handling in some drivers. · 6f535763

David S. Miller authored Oct 11, 2007

In order for the list handling in net_rx_action() to be
correct, drivers must follow certain rules as stated by
this comment in net_rx_action():

		/* Drivers must not modify the NAPI state if they
		 * consume the entire weight.  In such cases this code
		 * still "owns" the NAPI instance and therefore can
		 * move the instance around on the list at-will.
		 */

A few drivers do not do this because they mix the budget checks
with reading hardware state, resulting in crashes like the one
reported by takano@axe-inc.co.jp.

BNX2 and TG3 are taken care of here, SKY2 fix is from Stephen
Hemminger.
Signed-off-by: David S. Miller <davem@davemloft.net>

6f535763

[TCP]: Limit processing lost_retrans loop to work-to-do cases · b08d6cb2

Ilpo Järvinen authored Oct 11, 2007

This addition of lost_retrans_low to tcp_sock might be
unnecessary, it's not clear how often lost_retrans worker is
executed when there wasn't work to do.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

b08d6cb2

[TCP]: Fix lost_retrans loop vs fastpath problems · f785a8e2

Ilpo Järvinen authored Oct 11, 2007

Detection implemented with lost_retrans must work also when
fastpath is taken, yet most of the queue is skipped including
(very likely) those retransmitted skb's we're interested in.
This problem appeared when the hints got added, which removed
a need to always walk over the whole write queue head.
Therefore decicion for the lost_retrans worker loop entry must
be separated from the sacktag processing more than it was
necessary before.

It turns out to be problematic to optimize the worker loop
very heavily because ack_seqs of skb may have a number of
discontinuity points. Maybe similar approach as currently is
implemented could be attempted but that's becoming more and
more complex because the trend is towards less skb walking
in sacktag marker. Trying a simple work until all rexmitted
skbs heve been processed approach.

Maybe after(highest_sack_end_seq, tp->high_seq) checking is not
sufficiently accurate and causes entry too often in no-work-to-do
cases. Since that's not known, I've separated solution to that
from this patch.

Noticed because of report against a related problem from TAKANO
Ryousei <takano@axe-inc.co.jp>. He also provided a patch to
that part of the problem. This patch includes solution to it
(though this patch has to use somewhat different placement).
TAKANO's description and patch is available here:

  http://marc.info/?l=linux-netdev&m=119149311913288&w=2

...In short, TAKANO's problem is that end_seq the loop is using
not necessarily the largest SACK block's end_seq because the
current ACK may still have higher SACK blocks which are later
by the loop.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

f785a8e2

[TCP]: No need to re-count fackets_out/sacked_out at RTO · 4cd82999

Ilpo Järvinen authored Oct 11, 2007

Both sacked_out and fackets_out are directly known from how
parameter. Since fackets_out is accurate, there's no need for
recounting (sacked_out was previously unnecessarily counted
in the loop anyway).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

4cd82999

[TCP]: Extract tcp_match_queue_to_sack from sacktag code · d1935942

Ilpo Järvinen authored Oct 11, 2007

This is necessary for upcoming DSACK bugfix. Reduces sacktag
length which is not very sad thing at all... :-)

Notice that there's a need to handle out-of-mem at caller's
place.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

d1935942

[TCP]: Kill almost unused variable pcount from sacktag · f6fb128d

Ilpo Järvinen authored Oct 11, 2007

It's on the way for future cutting of that function.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

f6fb128d

[TCP]: Fix mark_head_lost to ignore R-bit when trying to mark L · 3eec0047

Ilpo Järvinen authored Oct 11, 2007

This condition (plain R) can arise at least in recovery that
is triggered after tcp_undo_loss. There isn't any reason why
they should not be marked as lost, not marking makes in_flight
estimator to return too large values.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

3eec0047

[TCP]: Add bytes_acked (ABC) clearing to FRTO too · 16e90681

Ilpo Järvinen authored Oct 11, 2007

I was reading tcp_enter_loss while looking for Cedric's bug and
noticed bytes_acked adjustment is missing from FRTO side.

Since bytes_acked will only be used in tcp_cong_avoid, I think
it's safe to assume RTO would be spurious. During FRTO cwnd
will be not controlled by tcp_cong_avoid and if FRTO calls for
conventional recovery, cwnd is adjusted and the result of wrong
assumption is cleared from bytes_acked. If RTO was in fact
spurious, we did normal ABC already and can continue without
any additional adjustments.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

16e90681

11 Oct, 2007 22 commits

[IPv6]: Update setsockopt(IPV6_MULTICAST_IF) to support RFC 3493, try2 · 4953f0fc

Brian Haley authored Oct 11, 2007

 From RFC 3493, Section 5.2:

       IPV6_MULTICAST_IF

          Set the interface to use for outgoing multicast packets.  The
          argument is the index of the interface to use.  If the
          interface index is specified as zero, the system selects the
          interface (for example, by looking up the address in a routing
          table and using the resulting interface).

This patch adds support for (index == 0) to reset the value to it's 
original state, allowing the system to choose the best interface.  IPv4 
already behaves this way.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Acked-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4953f0fc

[NETFILTER]: x_tables: add missing ip6t_modulename aliases · 73aaf935

Jan Engelhardt authored Oct 11, 2007

The patch will add MODULE_ALIAS("ip6t_<modulename>") where missing,
otherwise you will get

	ip6tables: No chain/target/match by that name

when xt_<modulename> is not already loaded.
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

73aaf935

[NETFILTER]: nf_conntrack_tcp: fix connection reopening · 17311393

Jozsef Kadlecsik authored Oct 11, 2007

With your description I could reproduce the bug and actually you were
completely right: the code above is incorrect. Somehow I was able to
misread RFC1122 and mixed the roles :-(:

   When a connection is >>closed actively<<, it MUST linger in
   TIME-WAIT state for a time 2xMSL (Maximum Segment Lifetime).
   However, it MAY >>accept<< a new SYN from the remote TCP to
   reopen the connection directly from TIME-WAIT state, if it:
   [...]

The fix is as follows: if the receiver initiated an active close, then the
sender may reopen the connection - otherwise try to figure out if we hold
a dead connection.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

17311393

[QETH]: fix qeth_main.c · d71fce6b

Andrew Morton authored Oct 11, 2007

drivers/s390/net/qeth_main.c: In function 'qeth_hard_header_parse':
drivers/s390/net/qeth_main.c:6584: error: 'dev' undeclared (first use in this function)
drivers/s390/net/qeth_main.c:6584: error: (Each undeclared identifier is reported only once
drivers/s390/net/qeth_main.c:6584: error: for each function it appears in.)
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

d71fce6b

[NETLINK]: fib_frontend build fixes · 28f7b036

David S. Miller authored Oct 10, 2007

1) fibnl needs to be declared outside of config ifdefs,
   and also should not be explicitly initialized to NULL
2) nl_fib_input() args are wrong for netlink_kernel_create()
   input method
Signed-off-by: David S. Miller <davem@davemloft.net>

28f7b036

[IPv6]: Export userland ND options through netlink (RDNSS support) · 31910575

Pierre Ynard authored Oct 10, 2007

As discussed before, this patch provides userland with a way to access
relevant options in Router Advertisements, after they are processed
and validated by the kernel. Extra options are processed in a generic
way; this patch only exports RDNSS options described in RFC5006, but
support to control which options are exported could be easily added.

A new rtnetlink message type is defined, to transport Neighbor
Discovery options, along with optional context information. At the
moment only the address of the router sending an RDNSS option is
included, but additional attributes may be later defined, if needed by
new use cases.
Signed-off-by: Pierre Ynard <linkfanel@yahoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

31910575

[9P]: build fix with !CONFIG_SYSCTL · 092e9d93

Ingo Molnar authored Oct 10, 2007

found via make randconfig build testing: 

 net/built-in.o: In function `init_p9':
 mod.c:(.init.text+0x3b39): undefined reference to `p9_sysctl_register'
 net/built-in.o: In function `exit_p9':
 mod.c:(.exit.text+0x36b): undefined reference to `p9_sysctl_unregister'
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>

092e9d93

[NET]: Fix dev_put() and dev_hold() comments · 9ef4429b

Benjamin Thery authored Oct 10, 2007

Trivial fix: Swap comments for dev_put() and dev_hold() to get them 
at the right place.
Typo introduced by 4fa57c9ea9f36f9ca852f3a88ca5d2f1aebbc960.
Signed-of-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

9ef4429b

[NET]: make netlink user -> kernel interface synchronious · cd40b7d3

Denis V. Lunev authored Oct 10, 2007

This patch make processing netlink user -> kernel messages synchronious.
This change was inspired by the talk with Alexey Kuznetsov about current
netlink messages processing. He says that he was badly wrong when introduced 
asynchronious user -> kernel communication.

The call netlink_unicast is the only path to send message to the kernel
netlink socket. But, unfortunately, it is also used to send data to the
user.

Before this change the user message has been attached to the socket queue
and sk->sk_data_ready was called. The process has been blocked until all
pending messages were processed. The bad thing is that this processing
may occur in the arbitrary process context.

This patch changes nlk->data_ready callback to get 1 skb and force packet
processing right in the netlink_unicast.

Kernel -> user path in netlink_unicast remains untouched.

EINTR processing for in netlink_run_queue was changed. It forces rtnl_lock
drop, but the process remains in the cycle until the message will be fully
processed. So, there is no need to use this kludges now.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

cd40b7d3

[NET]: unify netlink kernel socket recognition · aed81560

Denis V. Lunev authored Oct 10, 2007

There are currently two ways to determine whether the netlink socket is a
kernel one or a user one. This patch creates a single inline call for
this purpose and unifies all the calls in the af_netlink.c

No similar calls are found outside af_netlink.c.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

aed81560

[NET]: cleanup 3rd argument in netlink_sendskb · 7ee015e0

Denis V. Lunev authored Oct 10, 2007

netlink_sendskb does not use third argument. Clean it and save a couple of
bytes.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

7ee015e0

[NET]: Make netlink processing routines semi-synchronious (inspired by rtnl) v2 · 3b71535f

Denis V. Lunev authored Oct 10, 2007

The code in netfilter/nfnetlink.c and in ./net/netlink/genetlink.c looks
like outdated copy/paste from rtnetlink.c. Push them into sync with the
original.

Changes from v1:
- deleted comment in nfnetlink_rcv_msg by request of Patrick McHardy
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

3b71535f

[NET]: rtnl_unlock cleanups · 1536cc0d

Denis V. Lunev authored Oct 10, 2007

There is no need to process outstanding netlink user->kernel packets
during rtnl_unlock now. There is no rtnl_trylock in the rtnetlink_rcv
anymore.

Normal code path is the following:
netlink_sendmsg
   netlink_unicast
       netlink_sendskb
           skb_queue_tail
           netlink_data_ready
               rtnetlink_rcv
                   mutex_lock(&rtnl_mutex);
                   netlink_run_queue(sk, qlen, &rtnetlink_rcv_msg);
                   mutex_unlock(&rtnl_mutex);

So, it is possible, that packets can be present in the rtnl->sk_receive_queue
during rtnl_unlock, but there is no need to process them at that moment as
rtnetlink_rcv for that packet is pending.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

1536cc0d

[NETLINK]: Fix typos in comments in netlink.h · d1ec3b77

Pierre Ynard authored Oct 10, 2007

This patch fixes a few typos in comments in include/net/netlink.h
Signed-off-by: Pierre Ynard <linkfanel@yahoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

d1ec3b77

[NET]: sanitize kernel_accept() error path · fa8705b0

Tony Battersby authored Oct 10, 2007

If kernel_accept() returns an error, it may pass back a pointer to
freed memory (which the caller should ignore).  Make it pass back NULL
instead for better safety.
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fa8705b0

[TG3]: Update version to 3.83 · 414c66e0

Matt Carlson authored Oct 10, 2007

Update to version 3.83.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

414c66e0

[TG3]: WOL defaults · 0527ba35

Matt Carlson authored Oct 10, 2007

This patch enables WOL by default if out-of-box WOL is enabled in the
NVRAM.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0527ba35

[TG3]: Add 5761 support · 9936bcf6

Matt Carlson authored Oct 10, 2007

This patch adds rest of the miscellaneous code required to support the
5761.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9936bcf6

[TG3]: Add 5761 APE support · 0d3031d9

Matt Carlson authored Oct 10, 2007

This patch adds support for the new APE block, present in 5761 chips.
APE stands for Application Processing Engine.  The primary function of
the APE is to process manageability traffic, such as ASF.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0d3031d9

[TG3]: Add new 5761 NVRAM decode routines · 6b91fa02

Matt Carlson authored Oct 10, 2007

This patch adds a new 5761-specific NVRAM strapping decode routine.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6b91fa02

[INET]: local port range robustness · 227b60f5

Stephen Hemminger authored Oct 10, 2007

Expansion of original idea from Denis V. Lunev <den@openvz.org>

Add robustness and locking to the local_port_range sysctl.
1. Enforce that low < high when setting.
2. Use seqlock to ensure atomic update.

The locking might seem like overkill, but there are
cases where sysadmin might want to change value in the
middle of a DoS attack.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

227b60f5

[SCTP]: port randomization · 06393009

Stephen Hemminger authored Oct 10, 2007

Add port randomization rather than a simple fixed rover
for use with SCTP.  This makes it act similar to TCP, UDP, DCCP
when allocating ports.

No longer need port_alloc_lock as well (suggestion by Brian Haley).
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

06393009

10 Oct, 2007 9 commits

[NET_SCHED]: Show timer resolution instead of clock resolution in /proc/net/psched · 3c0cfc13

Patrick McHardy authored Oct 10, 2007

The fourth parameter of /proc/net/psched is supposed to show the timer
resultion and is used by HTB userspace to calculate the necessary
burst rate. Currently we show the clock resolution, which results in a
too low burst rate when the two differ.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

3c0cfc13

[BNX2]: Update version to 1.6.7. · 32d1316b

Michael Chan authored Oct 10, 2007

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

32d1316b

[BNX2]: Fix default WoL setting. · 846f5c62

Michael Chan authored Oct 10, 2007

Change the default WoL setting to match the NVRAM's setting.  It
always defaulted to WoL disabled before and caused a lot of confusion
for users.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

846f5c62

[BNX2]: Fix remote PHY media detection problems. · 489310a4

Michael Chan authored Oct 10, 2007

The remote PHY media type and link status can change between
->probe() and ->open().  For correct operation, we need to get the
new status again during ->open().

The ethtool link test and loopback test are also fixed to work with
remote PHY.  PHY loopback is simply skipped when remote PHY is
present.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

489310a4