Commits · 4bb073c0e32a0862bdb5215d11af19f6c0180c98 · nexedi / linux

12 Jun, 2008 14 commits

net: Eliminate flush_scheduled_work() calls while RTNL is held. · 4bb073c0

David S. Miller authored Jun 12, 2008

If the RTNL is held when we invoke flush_scheduled_work() we could
deadlock.  One such case is linkwatch, a work struct which tries
to grab the RTNL semaphore.

The most common cases are net driver ->stop() methods.  The
simplest conversion is to instead use cancel_{delayed_}work_sync()
explicitly on the various work structs the driver uses.

This is an OK transformation because these work structs are doing
things like resetting the chip, restarting link negotiation, and so
forth.  And if we're bringing down the device, we're about to turn the
chip off and reset it anyway.  So if we cancel a pending work event,
that's fine here.

Some drivers were working around this deadlock by using a msleep()
polling loop of some sort, and those cases are converted to instead
use cancel_{delayed_}work_sync() as well.
Signed-off-by: David S. Miller <davem@davemloft.net>

4bb073c0

Merge branch 'davem-fixes' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 · 7afb380d
David S. Miller authored Jun 11, 2008

7afb380d

drivers/net/r6040.c: correct bad use of round_jiffies() · 208aefa2

Christophe Jaillet authored May 15, 2008

Compared to other places in the kernel, I think that this driver misuses
the function round_jiffies.
Signed-off-by: Christophe Jaillet <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

208aefa2

fec_mpc52xx: MPC52xx_MESSAGES_DEFAULT: 2nd NETIF_MSG_IFDOWN => IFUP · 8b983510

Roel Kluin authored Jun 09, 2008

Duplicate NETIF_MSG_IFDOWN, 2nd should be NETIF_MSG_IFUP
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Acked-by: Sylvain Munaut <tnt@246tNt.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

8b983510

ipg: fix receivemode IPG_RM_RECEIVEMULTICAST{,HASH} in ipg_nic_set_multicast_list() · 0761248f

Roel Kluin authored Jun 09, 2008

The branches are dead code.  Even when IFF_MULTICAST (defined as
0x1000) is set in dev->flags, dev->flags & IFF_MULTICAST & [boolean]
always evaluates to 0.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

0761248f
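
The dead-code claim in the ipg fix can be checked in isolation. This is a minimal sketch, assuming the buggy test had the shape `flags & IFF_MULTICAST & cond`. The function names and the corrected form are hypothetical and are not taken from the patch itself.

```c
#include <stdbool.h>

#define IFF_MULTICAST 0x1000 /* value quoted in the commit message */

/* Buggy shape: cond converts to 0 or 1, and 0x1000 & 1 == 0, so the
 * whole expression is 0 for every input. */
unsigned int buggy_test(unsigned int flags, bool cond)
{
    return flags & IFF_MULTICAST & cond;
}

/* Hypothetical correction: test the flag bit first, then combine the
 * two conditions logically rather than bitwise. */
bool fixed_test(unsigned int flags, bool cond)
{
    return (flags & IFF_MULTICAST) && cond;
}
```

With the flag set and the condition true, `buggy_test()` still yields 0, while `fixed_test()` yields true.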

Merge branch 'net-2.6-misc-20080611a' of git://git.linux-ipv6.org/gitroot/yoshfuji/linux-2.6-fix · a4056573
David S. Miller authored Jun 11, 2008

a4056573

Merge branch 'master' of git://eden-feed.erg.abdn.ac.uk/net-2.6 · 5cb960a8
David S. Miller authored Jun 11, 2008

5cb960a8

netfilter: nf_conntrack: fix ctnetlink related crash in nf_nat_setup_info() · ceeff754

Patrick McHardy authored Jun 11, 2008

When creation of a new conntrack entry in ctnetlink fails after having
set up the NAT mappings, the conntrack has an extension area allocated
that is not getting properly destroyed when freeing the conntrack again.
This means the NAT extension is still in the bysource hash, causing a
crash when walking over the hash chain the next time:

BUG: unable to handle kernel paging request at 00120fbd
IP: [<c03d394b>] nf_nat_setup_info+0x221/0x58a
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP

Pid: 2795, comm: conntrackd Not tainted (2.6.26-rc5 #1)
EIP: 0060:[<c03d394b>] EFLAGS: 00010206 CPU: 1
EIP is at nf_nat_setup_info+0x221/0x58a
EAX: 00120fbd EBX: 00120fbd ECX: 00000001 EDX: 00000000
ESI: 0000019e EDI: e853bbb4 EBP: e853bbc8 ESP: e853bb78
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process conntrackd (pid: 2795, ti=e853a000 task=f7de10f0 task.ti=e853a000)
Stack: 00000000 e853bc2c e85672ec 00000008 c0561084 63c1db4a 00000000 00000000
       00000000 0002e109 61d2b1c3 00000000 00000000 00000000 01114e22 61d2b1c3
       00000000 00000000 f7444674 e853bc04 00000008 c038e728 0000000a f7444674
Call Trace:
 [<c038e728>] nla_parse+0x5c/0xb0
 [<c0397c1b>] ctnetlink_change_status+0x190/0x1c6
 [<c0397eec>] ctnetlink_new_conntrack+0x189/0x61f
 [<c0119aee>] update_curr+0x3d/0x52
 [<c03902d1>] nfnetlink_rcv_msg+0xc1/0xd8
 [<c0390228>] nfnetlink_rcv_msg+0x18/0xd8
 [<c0390210>] nfnetlink_rcv_msg+0x0/0xd8
 [<c038d2ce>] netlink_rcv_skb+0x2d/0x71
 [<c0390205>] nfnetlink_rcv+0x19/0x24
 [<c038d0f5>] netlink_unicast+0x1b3/0x216
 ...

Move invocation of the extension destructors to nf_conntrack_free()
to fix this problem.

Fixes http://bugzilla.kernel.org/show_bug.cgi?id=10875
Reported-and-Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ceeff754

netfilter: Make nflog quiet when no one listen in userspace. · b66985b1

Eric Leblond authored Jun 11, 2008

The message "nf_log_packet: can't log since no backend logging module loaded
in! Please either load one, or disable logging explicitly" was displayed for
each logged packet when no userspace application is listening to nflog events.
The message seems to warn about a missing kernel module, but as said
above this is not the case. I thus propose to suppress the message (I
don't see any reason to flood the log because a user application has crashed).
Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

b66985b1

ipv6: Fail with appropriate error code when setting not-applicable sockopt. · 1717699c

YOSHIFUJI Hideaki authored Jun 12, 2008

IPV6_MULTICAST_HOPS, for example, is not valid for stream sockets.
Since they are virtually unavailable for stream sockets,
we should return ENOPROTOOPT instead of EINVAL.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

1717699c

ipv6: Check IPV6_MULTICAST_LOOP option value. · 28d44882

YOSHIFUJI Hideaki authored Jun 12, 2008

Only 0 and 1 are valid for IPV6_MULTICAST_LOOP socket option,
and we should return an error of EINVAL otherwise, per RFC3493.

Based on patch from Shan Wei <shanwei@cn.fujitsu.com>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

28d44882
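
The IPV6_MULTICAST_LOOP check described above amounts to a two-value range test. This is a minimal userspace sketch. `set_mcast_loop` and its `mc_loop` output parameter are hypothetical names, not the kernel's.

```c
#include <errno.h>

/* Accept only 0 or 1. On success store the value and return 0;
 * otherwise leave *mc_loop untouched and return -EINVAL. */
int set_mcast_loop(int val, int *mc_loop)
{
    if (val != 0 && val != 1)
        return -EINVAL;
    *mc_loop = val;
    return 0;
}
```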

ipv6: Check the hop limit setting in ancillary data. · e8766fc8

Shan Wei authored Jun 10, 2008

When specifying the outgoing hop limit as ancillary data for sendmsg(),
the kernel doesn't check the integer hop limit value as specified in
[RFC-3542] section 6.3.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

e8766fc8
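
The hop limit check can be sketched as below, assuming the ranges from RFC 3542 section 6.3: -1 selects the kernel default, 0 to 255 is used as given, and anything else is an error. `check_hoplimit` is a hypothetical name, not the kernel function.

```c
#include <errno.h>

/* Validate a hop limit value passed as ancillary data.
 * -1 means "use the default", 0..255 is a literal hop limit. */
int check_hoplimit(int val)
{
    if (val < -1 || val > 255)
        return -EINVAL;
    return 0;
}
```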

ipv6 route: Fix route lifetime in netlink message. · 36e3deae

YOSHIFUJI Hideaki authored May 13, 2008

1) We may have a route lifetime larger than INT_MAX.
In that case we had a weird value in lifetime.
Use INT_MAX if lifetime does not fit in s32.

2) Lifetime is valid iff RTF_EXPIRES is set.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

36e3deae
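
The two lifetime rules above can be sketched as a small clamp. This is a sketch under assumptions: `report_lifetime` is a hypothetical name, and reporting 0 for a route without RTF_EXPIRES is an assumption made for illustration, not something the commit message states.

```c
#include <limits.h>
#include <stdbool.h>

/* remaining: time left on the route, in whatever unit the caller uses.
 * expires:   whether RTF_EXPIRES is set on the route. */
int report_lifetime(unsigned long long remaining, bool expires)
{
    if (!expires)
        return 0;           /* assumption: no lifetime to report */
    if (remaining > INT_MAX)
        return INT_MAX;     /* does not fit in s32, so clamp */
    return (int)remaining;
}
```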

ipv6 mcast: Check address family of gf_group in getsockopt(MS_FILTER). · 20c61fbd

YOSHIFUJI Hideaki authored Apr 28, 2008

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

20c61fbd

11 Jun, 2008 6 commits

dccp: Bug in initial acknowledgment number assignment · be4c798a

Gerrit Renker authored Jun 11, 2008

Step 8.5 in RFC 4340 says for the newly cloned socket

           Initialize S.GAR := S.ISS,

but what in fact the code (minisocks.c) does is

           Initialize S.GAR := S.ISR,

which is wrong (typo?) -- fixed by the patch.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>

be4c798a

dccp ccid-3: X truncated due to type conversion · 7deb0f85

Gerrit Renker authored Jun 11, 2008

This fixes a bug in computing the inter-packet-interval t_ipi = s/X: 

 scaled_div32(a, b) uses u32 for b, but in "scaled_div32(s, X)" the type of the
 sending rate `X' is u64. Since X is scaled by 2^6, this truncates rates greater
 than 2^26 Bps (~537 Mbps).

Using full 64-bit division now.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>

7deb0f85
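
The truncation described above is easy to reproduce outside the kernel. This is a minimal sketch, assuming only what the message states: X is held scaled by 2^6 in a 64-bit variable and is then passed to a 32-bit parameter. Both function names are hypothetical.

```c
#include <stdint.h>

/* Sending rate in bytes per second, scaled by 2^6. */
uint64_t scale_rate(uint64_t bytes_per_sec)
{
    return bytes_per_sec << 6;
}

/* Mimics handing a u64 to a parameter declared as u32:
 * the upper 32 bits are silently dropped. */
uint32_t as_u32_divisor(uint64_t x_scaled)
{
    return (uint32_t)x_scaled;
}
```

A rate just below 2^26 Bps survives the conversion, while a rate of exactly 2^26 Bps scales to 2^32 and truncates to 0.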

dccp ccid-3: TFRC reverse-lookup Bug-Fix · 1e8a287c

Gerrit Renker authored Jun 11, 2008

This fixes a bug in the reverse lookup of p: given a value f(p), instead of p,
the function returned the smallest tabulated value f(p).

The smallest tabulated value of 10^6 * f(p), where

   f(p) =  sqrt(2*p/3) + 12 * sqrt(3*p/8) * (32 * p^3 + p)

for p=0.0001 is 8172.

Since this value is scaled by 10^6, the outcome of this bug is that a loss
of 8172/10^6 = 0.8172% was reported whenever the input was below the table
resolution of 0.01%.

This means that the value was over 80 times too high, resulting in large spikes
of the initial loss interval, thus unnecessarily reducing the throughput.

Also corrected the printk format (%u for u32).
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>

1e8a287c

dccp ccid-2: Bug-Fix - Ack Vectors need to be ignored on request sockets · 65907a43

Gerrit Renker authored Jun 11, 2008

This fixes an oversight from an earlier patch, ensuring that Ack Vectors
are not processed on request sockets.

The issue is that Ack Vectors must not be parsed on request sockets, since
the Ack Vector feature depends on the selection of the (TX) CCID. During the
initial handshake the CCIDs are undefined, and so RFC 4340, 10.3 applies:

"Using CCID-specific options and feature options during a negotiation
for the corresponding CCID feature is NOT RECOMMENDED [...]"

And it is not even possible: when the server receives the Request from the
client, the CCID and Ack vector features are undefined; when the Ack finalising
the 3-way handshake arrives, the request socket has not been cloned yet into a
full socket. (This order is necessary, since otherwise the newly created socket
would have to be destroyed whenever an option error occurred - a malicious
hacker could simply send garbage options and exploit this.)
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>

65907a43

dccp: Fix sparse warnings · 1e2f0e5e

Gerrit Renker authored Jun 11, 2008

This patch fixes the following sparse warnings:
 * nested min(max()) expression:
   net/dccp/ccids/ccid3.c:91:21: warning: symbol '__x' shadows an earlier one
   net/dccp/ccids/ccid3.c:91:21: warning: symbol '__y' shadows an earlier one
   
 * Declaration of function prototypes in .c instead of .h file, resulting in
   "should it be static?" warnings. 

 * Declared "struct dccpw" static (local to dccp_probe).
 
 * Disabled dccp_delayed_ack() - not fully removed due to RFC 4340, 11.3
   ("Receivers SHOULD implement delayed acknowledgement timers ...").

 * Used a different local variable name to avoid
   net/dccp/ackvec.c:293:13: warning: symbol 'state' shadows an earlier one
   net/dccp/ackvec.c:238:33: originally declared here

 * Removed unused functions `dccp_ackvector_print' and `dccp_ackvec_print'.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>

1e2f0e5e

dccp ccid-3: Bug-Fix - Zero RTT is possible · 3294f202

Gerrit Renker authored Jun 11, 2008

In commit $(825de27d) (from 27th May, commit
message `dccp ccid-3: Fix "t_ipi explosion" bug'), the CCID-3 window counter
computation was fixed to cope with RTTs < 4 microseconds.

Such RTTs can be found e.g. when running CCID-3 over loopback. The fix removed
a check against RTT < 4, but introduced a divide-by-zero bug.

All steady-state RTTs in DCCP are filtered using dccp_sample_rtt(), which
ensures non-zero samples. However, a zero RTT is possible on initialisation,
when there is no RTT sample from the Request/Response exchange.

The fix is to use the fallback-RTT from RFC 4340, 3.4.

This is also better than just fixing update_win_count() since it allows other
parts of the code to always assume that the RTT is non-zero during the time
that the CCID is used.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>

3294f202
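
The fallback idea can be sketched as follows. This is an illustration under assumptions: the 0.2 second default is the fallback RTT from RFC 4340, section 3.4, while the function names and the quarter-RTT division are hypothetical stand-ins for the window counter computation, not the kernel code.

```c
#include <stdint.h>

#define FALLBACK_RTT_US 200000u /* 0.2 s, expressed in microseconds */

/* Substitute the fallback when no RTT sample exists yet (sample == 0). */
uint32_t initial_rtt_us(uint32_t sample_us)
{
    return sample_us ? sample_us : FALLBACK_RTT_US;
}

/* Hypothetical consumer: number of quarter-RTTs elapsed in delta_us.
 * The divisor is never zero because of the fallback above. */
uint64_t quarter_rtts_elapsed(uint32_t delta_us, uint32_t rtt_sample_us)
{
    return (4ULL * delta_us) / initial_rtt_us(rtt_sample_us);
}
```

Applying the fallback once at initialisation, rather than guarding each division, is what lets later code assume a non-zero RTT.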

10 Jun, 2008 20 commits

Merge branch 'davem-fixes' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 · 513fd370
David S. Miller authored Jun 10, 2008

513fd370

net: Fix routing tables with id > 255 for legacy software · 709772e6

Krzysztof Piotr Oledzki authored Jun 10, 2008

Most legacy software does not like tables > 255, as rtm_table is u8,
so tb_id is sent &0xff and it is possible to confuse, for example,
table 510 with table 254 (main).

This patch introduces RT_TABLE_COMPAT=252 so the code uses it if
tb_id > 255. This keeps such old applications happy, while new
ones are still able to use RTA_TABLE to get a proper table id.
Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

709772e6
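
The table id aliasing and the compat value can be sketched as below. The constants 252 and 254 are from the commit message. The two function names are hypothetical and only model the value placed in the 8-bit rtm_table field.

```c
#include <stdint.h>

#define RT_TABLE_COMPAT 252
#define RT_TABLE_MAIN   254

/* Old behaviour: the id is masked to 8 bits, so large ids alias
 * small ones (510 & 0xff == 254). */
uint8_t rtm_table_old(uint32_t tb_id)
{
    return (uint8_t)(tb_id & 0xff);
}

/* New behaviour: ids that do not fit are reported as RT_TABLE_COMPAT. */
uint8_t rtm_table_new(uint32_t tb_id)
{
    return tb_id < 256 ? (uint8_t)tb_id : RT_TABLE_COMPAT;
}
```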

sky2: Hold RTNL while calling dev_close() · 68c28898

Ben Hutchings authored May 31, 2008

dev_close() must be called holding the RTNL.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

68c28898

s2io iomem annotations · 69de8d23

Al Viro authored Jun 02, 2008

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

69de8d23

atl1: fix suspend regression · ae6b4d9a

Jay Cliburn authored Jun 01, 2008

Using vendor magic to force the PHY into power save mode breaks
suspend.  It isn't needed anyway, so remove it.
Tested-by: Avuton Olrich <avuton@gmail.com>
Signed-off-by: Jay Cliburn <jacliburn@bellsouth.net>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

ae6b4d9a

qeth: start dev queue after tx drop error · d0ec0f54

Frank Blaschka authored Jun 06, 2008

In case the xmit function drops out with an error, we have to wake
the netdevice queue to start another xmit.
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

d0ec0f54

qeth: Prepare-function to call s390dbf was wrong · 345aa66e

Peter Tiedemann authored Jun 06, 2008

The function that prepares the s390dbf call handled variable arguments
incorrectly.  This worked as a macro but no longer works as a function.
Now using va_list processing.
Signed-off-by: Peter Tiedemann <ptiedem@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

345aa66e

qeth: reduce number of kernel messages · 14cc21b6

Frank Blaschka authored Jun 06, 2008

Remove unnecessary messages. Write important debug information to
s390dbf.
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

14cc21b6

qeth: Use ccw_device_get_id(). · f06f6f32

Cornelia Huck authored Jun 06, 2008

Get the devno from the ccw device via ccw_device_get_id() instead
of parsing the bus_id.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

f06f6f32

qeth: layer 3 Oops in ip event handler · e5bd7be5

Frank Blaschka authored Jun 06, 2008

The ip event handler may present us with non-qeth network interfaces.
Add a qeth card pointer check.
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

e5bd7be5

virtio: use callback on empty in virtio_net · 363f1514

Rusty Russell authored Jun 08, 2008

virtio_net uses a timer to free old transmitted packets, rather than
leaving callbacks enabled all the time.  If the host promises to
always notify us when the transmit ring is empty, we can free packets
at that point and avoid the timer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

363f1514

virtio: virtio_net free transmit skbs in a timer · 14c998f0

Mark McLoughlin authored Jun 08, 2008

virtio_net currently only frees old transmit skbs just
before queueing new ones. If the queue is full, it then
enables interrupts and waits for notification that more
work has been performed.

However, a side-effect of this scheme is that there are
always xmit skbs left dangling when no new packets are
sent, against the Documentation/networking/driver.txt
guideline:

  "... it is not allowed for your TX mitigation scheme
   to let TX packets "hang out" in the TX ring unreclaimed
   forever if no new TX packets are sent."

Add a timer to ensure that any time we queue new TX
skbs, we will shortly free them again.

This fixes an easily reproduced hang at shutdown where
iptables attempts to unload nf_conntrack and nf_conntrack
waits for an skb it is tracking to be freed, but virtio_net
never frees it.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

14c998f0

virtio: Fix typo in virtio_net_hdr comments · 2506ece0

Mark McLoughlin authored Jun 08, 2008

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

2506ece0

virtio_net: Fix skb->csum_start computation · 23cde76d

Mark McLoughlin authored Jun 08, 2008

hdr->csum_start is the offset from the start of the ethernet
header to the transport layer checksum field. skb->csum_start
is the offset from skb->head.

skb_partial_csum_set() assumes that skb->data points to the
ethernet header - i.e. it computes skb->csum_start by adding
the headroom to hdr->csum_start.

Since eth_type_trans() skb_pull()s the ethernet header,
skb_partial_csum_set() should be called before
eth_type_trans().

(Without this patch, GSO packets from a guest to the world outside the
host are corrupted).
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

23cde76d
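
The ordering problem can be illustrated with a toy model of the offsets involved. This is a simplified sketch, not the kernel's sk_buff: `struct toy_skb` and the `toy_*` helpers are hypothetical, and only the rule stated in the message is modelled (csum_start is the headroom plus the offset from the virtio header).

```c
#define ETH_HLEN 14 /* length of an Ethernet header in bytes */

struct toy_skb {
    unsigned int headroom;   /* bytes between buffer start and data */
    unsigned int csum_start; /* offset measured from buffer start */
};

/* Models skb_partial_csum_set(): headroom + offset from the header. */
void toy_partial_csum_set(struct toy_skb *skb, unsigned int start)
{
    skb->csum_start = skb->headroom + start;
}

/* Models the skb_pull() done by eth_type_trans(): data moves forward,
 * so the headroom grows by the pulled length. */
void toy_pull(struct toy_skb *skb, unsigned int len)
{
    skb->headroom += len;
}
```

Calling `toy_pull(skb, ETH_HLEN)` before `toy_partial_csum_set()` leaves csum_start 14 bytes too large, which is the corruption the patch avoids by reversing the order.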

ehea: set mac address fix · 00aaea2f

Jan-Bernd Themann authored Jun 09, 2008

eHEA has to call firmware functions in order to change the mac address
of a logical port. This patch checks if the logical port is up
when calling the register / deregister mac address calls. If the port
is down these firmware calls would fail and are therefore not executed.
Signed-off-by: Jan-Bernd Themann <themann@de.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

00aaea2f

sfc: Recover from RX queue flush failure · 23bdfdd3

Steve Hodgson authored Jun 09, 2008

RX queue flush can fail if traffic continues to arrive.  Recover by
performing an invisible reset.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

23bdfdd3

add missing lance_* exports · bf4d5934

Adrian Bunk authored Jun 10, 2008

This patch fixes the following build error:

<--  snip  -->

...
  Building modules, stage 2.
  MODPOST 1203 modules
ERROR: "lance_open" [drivers/net/mvme147.ko] undefined!
ERROR: "lance_close" [drivers/net/mvme147.ko] undefined!
ERROR: "lance_tx_timeout" [drivers/net/mvme147.ko] undefined!
ERROR: "lance_set_multicast" [drivers/net/mvme147.ko] undefined!
ERROR: "lance_start_xmit" [drivers/net/mvme147.ko] undefined!
...
make[2]: *** [__modpost] Error 1

<--  snip  -->
Reported-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

bf4d5934

ixgbe: fix typo · ff68cdbf

Jeff Kirsher authored Jun 09, 2008

Define names were accidentally transposed.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

ff68cdbf

forcedeth: msi interrupts · 4db0ee17

Ayaz Abdulla authored Jun 09, 2008

Add a workaround for lost MSI interrupts.  There is a race condition in
the HW in which future interrupts could be missed.  The workaround is to
toggle the MSI irq mask.

Added cleanup based on comments from Andrew Morton.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

4db0ee17

ipsec: pfkey should ignore events when no listeners · 99c6f60e

Jamal Hadi Salim authored Jun 10, 2008

When pfkey has no km listeners, it still does a lot of work
before finding out there ain't nobody out there.
If a tree falls in a forest and no one is around to hear it, does it make
a sound? In this case it makes a lot of noise.
With this short-circuit, adding tens of thousands of SAs using
netlink is ~10% faster.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>

99c6f60e