Commits · cbdadbbf0c790f79350a8f36029208944c5487d0 · Kirill Smelkov / linux

09 Jul, 2013 6 commits

virtio_net: fix race in RX VQ processing · cbdadbbf

Michael S. Tsirkin authored Jul 09, 2013

virtio net called virtqueue_enable_cq on RX path after napi_complete, so
with NAPI_STATE_SCHED clear - outside the implicit napi lock.
This violates the requirement to synchronize virtqueue_enable_cq wrt
virtqueue_add_buf.  In particular, used event can move backwards,
causing us to lose interrupts.
In a debug build, this can trigger panic within START_USE.

Jason Wang reports that he can trigger the races artificially,
by adding udelay() in virtqueue_enable_cb() after virtio_mb().

However, we must call napi_complete to clear NAPI_STATE_SCHED before
polling the virtqueue for used buffers, otherwise napi_schedule_prep in
a callback will fail, causing us to lose RX events.

To fix, call virtqueue_enable_cb_prepare with NAPI_STATE_SCHED
set (under napi lock), later call virtqueue_poll with
NAPI_STATE_SCHED clear (outside the lock).
Reported-by: Jason Wang <jasowang@redhat.com>
Tested-by: Jason Wang <jasowang@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cbdadbbf

virtio: support unlocked queue poll · cc229884

Michael S. Tsirkin authored Jul 09, 2013

This adds a way to check ring empty state after enable_cb outside any
locks. Will be used by virtio_net.

Note: there's room for more optimization: caller is likely to have a
memory barrier already, which means we might be able to get rid of a
barrier here.  Deferring this optimization until we do some
benchmarking.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cc229884

net/cadence/macb: fix bug/typo in extracting gem_irq_read_clear bit · 01276ed2

Jongsung Kim authored Jul 09, 2013

Signed-off-by: Jongsung Kim <neidhard.kim@lge.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

01276ed2

Documentation: Fix references to defunct linux-net@vger.kernel.org · b78ba72c

Geert Uytterhoeven authored Jul 09, 2013

linux-net@vger.kernel.org was replaced by netdev@oss.sgi.com was replaced
by netdev@vger.kernel.org.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

b78ba72c

net/fs: change busy poll time accounting · 76b1e9b9

Eliezer Tamir authored Jul 09, 2013

Suggested by Linus:
Changed time accounting for busy-poll:
- Make it microsecond based.
- Use unsigned longs.
- Revert back to use time_after instead of time_in_range.
Reorder poll/select busy loop conditions:
- Clear busy_flag after one time we can't busy-poll.
- Only init busy_end if we actually are going to busy-poll.
Added one more missing need_resched() test.
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

76b1e9b9

net: rename low latency sockets functions to busy poll · cbf55001

Eliezer Tamir authored Jul 08, 2013

Rename functions in include/net/ll_poll.h to busy wait.
Clarify documentation about expected power use increase.
Rename POLL_LL to POLL_BUSY_LOOP.
Add need_resched() testing to poll/select busy loops.

Note, that in select and poll can_busy_poll is dynamic and is
updated continuously to reflect the existence of supported
sockets with valid queue information.
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cbf55001

07 Jul, 2013 1 commit

bridge: fix some kernel warning in multicast timer · c7e8e8a8

Cong Wang authored Jul 05, 2013

Several people reported the warning: "kernel BUG at kernel/timer.c:729!"
and the stack trace is:

	#7 [ffff880214d25c10] mod_timer+501 at ffffffff8106d905
	#8 [ffff880214d25c50] br_multicast_del_pg.isra.20+261 at ffffffffa0731d25 [bridge]
	#9 [ffff880214d25c80] br_multicast_disable_port+88 at ffffffffa0732948 [bridge]
	#10 [ffff880214d25cb0] br_stp_disable_port+154 at ffffffffa072bcca [bridge]
	#11 [ffff880214d25ce8] br_device_event+520 at ffffffffa072a4e8 [bridge]
	#12 [ffff880214d25d18] notifier_call_chain+76 at ffffffff8164aafc
	#13 [ffff880214d25d50] raw_notifier_call_chain+22 at ffffffff810858f6
	#14 [ffff880214d25d60] call_netdevice_notifiers+45 at ffffffff81536aad
	#15 [ffff880214d25d80] dev_close_many+183 at ffffffff81536d17
	#16 [ffff880214d25dc0] rollback_registered_many+168 at ffffffff81537f68
	#17 [ffff880214d25de8] rollback_registered+49 at ffffffff81538101
	#18 [ffff880214d25e10] unregister_netdevice_queue+72 at ffffffff815390d8
	#19 [ffff880214d25e30] __tun_detach+272 at ffffffffa074c2f0 [tun]
	#20 [ffff880214d25e88] tun_chr_close+45 at ffffffffa074c4bd [tun]
	#21 [ffff880214d25ea8] __fput+225 at ffffffff8119b1f1
	#22 [ffff880214d25ef0] ____fput+14 at ffffffff8119b3fe
	#23 [ffff880214d25f00] task_work_run+159 at ffffffff8107cf7f
	#24 [ffff880214d25f30] do_notify_resume+97 at ffffffff810139e1
	#25 [ffff880214d25f50] int_signal+18 at ffffffff8164f292

this is due to I forgot to check if mp->timer is armed in
br_multicast_del_pg(). This bug is introduced by
commit 9f00b2e7 (bridge: only expire the mdb entry
when query is received).

Same for __br_mdb_del().
Tested-by: poma <pomidorabelisima@gmail.com>
Reported-by: LiYonghua <809674045@qq.com>
Reported-by: Robert Hancock <hancockrwd@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c7e8e8a8

05 Jul, 2013 1 commit

sfc: Fix memory leak when discarding scattered packets · 734d4e15

Ben Hutchings authored Jul 04, 2013

Commit 2768935a ('sfc: reuse pages to avoid DMA mapping/unmapping
costs') did not fully take account of DMA scattering which was
introduced immediately before.  If a received packet is invalid and
must be discarded, we only drop a reference to the first buffer's
page, but we need to drop a reference for each buffer the packet
used.

I think this bug was missed partly because efx_recycle_rx_buffers()
was not renamed and so no longer does what its name says.  It does not
change the state of buffers, but only prepares the underlying pages
for recycling.  Rename it accordingly.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

734d4e15

04 Jul, 2013 8 commits

sit: fix tunnel update via netlink · 86bd68bf

Nicolas Dichtel authored Jul 03, 2013

The device can stand in another netns, hence we need to do the lookup in netns
tunnel->net.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

86bd68bf

dt:net:stmmac: Add dt specific phy reset callback support. · 0e076471

Srinivas Kandagatla authored Jul 04, 2013

This patch adds phy reset callback support for stmmac driver via device
trees. It adds three new properties to gmac device tree bindings to
define the reset signal via gpio.

With this patch users can conveniently pass reset gpio number with pre,
pulse and post delay in micro secs via DTs.

 active low:
		_________		 ____________
	<pre-delay>	|<pulse-delay>	|<post-delay>
		 	|		|
		 	|_______________|

 active high:
 			 ________________
	<pre-delay>	|<pulse-delay>	|<post-delay>
			|		|
		________|		|___________
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0e076471

dt:net:stmmac: Add support to dwmac version 3.610 and 3.710 · 25c83b5c

Srinivas Kandagatla authored Jul 04, 2013

This patch adds dt support to dwmac version 3.610 and 3.710 these
versions are integrated in STiH415 and STiH416 ARM A9 SOCs.
To support these IP version, some of the device tree properties are
extended.
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

25c83b5c

dt:net:stmmac: Allocate platform data only if its NULL. · d741434c

Srinivas Kandagatla authored Jul 04, 2013

In some DT use-cases platform data might be already allocated and passed
via AUXDATA. These are the cases where machine level code populates few
callbacks in the platform data.

This patch adds check and reuses platform_data if its valid, before
allocating a new one.
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d741434c

net:stmmac: fix memleak in the open method · c9324d18

Giuseppe CAVALLARO authored Jul 04, 2013

This patch is to fix a memory leak in the open method, it reviews error
conditions freeing the resources previously allocated and not freed in
cased of DMA failure.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c9324d18

ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available · 3630d400

Hannes Frederic Sowa authored Jul 03, 2013

After the removal of rt->n we do not create a neighbour entry at route
insertion time (rt6_bind_neighbour is gone). As long as no neighbour is
created because of "useful traffic" we skip this routing entry because
rt6_check_neigh cannot pick up a valid neighbour (neigh == NULL) and
thus returns false.

This change was introduced by commit
887c95cc ("ipv6: Complete neighbour
entry removal from dst_entry.")

To quote RFC4191:
"If the host has no information about the router's reachability, then
the host assumes the router is reachable."

and also:
"A host MUST NOT probe a router's reachability in the absence of useful
traffic that the host would have sent to the router if it were reachable."

So, just assume the router is reachable and let's rt6_probe do the
rest. We don't need to create a neighbour on route insertion time.

If we don't compile with CONFIG_IPV6_ROUTER_PREF (RFC4191 support)
a neighbour is only valid if its nud_state is NUD_VALID. I did not find
any references that we should probe the router on route insertion time
via the other RFCs. So skip this route in that case.

v2:
a) use IS_ENABLED instead of #ifdefs (thanks to Sergei Shtylyov)
Reported-by: Pierre Emeriaud <petrus.lt@gmail.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

3630d400

net: ipv6: fix wrong ping_v6_sendmsg return value · fbfe80c8

Lorenzo Colitti authored Jul 04, 2013

ping_v6_sendmsg currently returns 0 on success. It should return
the number of bytes written instead.
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fbfe80c8

net: ipv6: add missing lock in ping_v6_sendmsg · a1bdc455

Lorenzo Colitti authored Jul 04, 2013

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a1bdc455

03 Jul, 2013 9 commits

netem: fix possible NULL deref in netem_dequeue() · 36b7bfe0

Eric Dumazet authored Jul 03, 2013

commit aec0a40a ("netem: use rb tree to implement the time queue")
added a regression if a child qdisc is attached to netem, as we perform
a NULL dereference.

Fix this by adding a temporary variable to cache
netem_skb_cb(skb)->time_to_send.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

36b7bfe0

net: sock: fix TCP_SKB_MIN_TRUESIZE · 9eb5bf83

Eric Dumazet authored Jul 03, 2013

commit eea86af6 ("net: sock: adapt SOCK_MIN_RCVBUF and
SOCK_MIN_SNDBUF") forgot the sk_buff alignment taken into account
in __alloc_skb() : skb->truesize = SKB_TRUESIZE(size);

While above commit fixed the sender issue, the receiver is still
dropping the second packet (on loopback device), because the receiver
socket can not really hold two skbs :
First packet truesize already is above sk_rcvbuf, so even TCP coalescing
cannot help.

On a typical 64bit build, each tcp skb truesize is 2304, instead of 2272
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9eb5bf83

net/mlx4: fix small memory leak on error · 9caf83c3

Dan Carpenter authored Jul 03, 2013

"work" needs to be freed before returning on this error path.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9caf83c3

core: Copy inner_protocol in copy_skb_header() · 4bc41b84

Joe Stringer authored Jul 03, 2013

inner_protocol was added to struct sk_buff in
0d89d203 ("MPLS: Add limited GSO support"),
which is scheduled to be included in v3.11.

That patch did not update __copy_skb_header to copy the inner_protocol.
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

4bc41b84

net: fec: Add VLAN receive HW support. · cdffcf1b

Jim Baxter authored Jul 02, 2013

This enables the driver to take advantage of the FEC VLAN
indicator to improve performance.
Signed-off-by: Jim Baxter <jim_baxter@mentor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cdffcf1b

alx: remove WoL support · bc2bebe8

Johannes Berg authored Jul 03, 2013

Unfortunately, WoL is broken and the system will immediately
resume after suspending, and I can't seem to figure out why.
Remove WoL support until the issue can be found.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

bc2bebe8

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 0c1072ae

David S. Miller authored Jul 03, 2013

Conflicts:
	drivers/net/ethernet/freescale/fec_main.c
	drivers/net/ethernet/renesas/sh_eth.c
	net/ipv4/gre.c

The GRE conflict is between a bug fix (kfree_skb --> kfree_skb_list)
and the splitting of the gre.c code into seperate files.

The FEC conflict was two sets of changes adding ethtool support code
in an "!CONFIG_M5272" CPP protected block.

Finally the sh_eth.c conflict was between one commit add bits set
in the .eesr_err_check mask whilst another commit removed the
.tx_error_check member and assignments.
Signed-off-by: David S. Miller <davem@davemloft.net>

0c1072ae

net: gre: move GSO functions to gre_offload · c50cd357

Daniel Borkmann authored Jul 01, 2013

Similarly to TCP/UDP offloading, move all related GRE functions to
gre_offload.c to make things more explicit and similar to the rest
of the code.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c50cd357

net: lls fix build with allnoconfig · 419076f5

Eliezer Tamir authored Jul 03, 2013

correct placeholder declarations to prevent build breakage when
!CONFIG_NET_LL_RX_POLL
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

419076f5

02 Jul, 2013 15 commits

ip_tunnels: Use skb-len to PMTU check. · 23a3647b

Pravin B Shelar authored Jul 02, 2013

In path mtu check, ip header total length works for gre device
but not for gre-tap device.  Use skb len which is consistent
for all tunneling types.  This is old bug in gre.
This also fixes mtu calculation bug introduced by
commit c5441932 (GRE: Refactor GRE tunneling code).
Reported-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

23a3647b

Merge branch 'l2tp_seq' · 784771e7

David S. Miller authored Jul 02, 2013

James Chapman says:

====================
L2TP data sequence numbers, if enabled, ensure in-order delivery. A
receiver may reorder data packets, or simply drop out-of-sequence
packets. If reordering is not enabled, the current implementation does
not handle data packet loss correctly, which can result in a stalled
L2TP session datapath as soon as the first packet is lost. Most L2TP
users either disable sequence numbers or enable data packet reordering
when sequence numbers are used to circumvent the issue. This patch
series fixes the problem, and makes the L2TP sequence number handling
RFC-compliant.

v2 incorporates string format changes requested by sergei.shtylyov@cogentembedded.com.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

784771e7

l2tp: make datapath resilient to packet loss when sequence numbers enabled · a0dbd822

James Chapman authored Jul 02, 2013

If L2TP data sequence numbers are enabled and reordering is not
enabled, data reception stops if a packet is lost since the kernel
waits for a sequence number that is never resent. (When reordering is
enabled, data reception restarts when the reorder timeout expires.) If
no reorder timeout is set, we should count the number of in-sequence
packets after the out-of-sequence (OOS) condition is detected, and reset
sequence number state after a number of such packets are received.

For now, the number of in-sequence packets while in OOS state which
cause the sequence number state to be reset is hard-coded to 5. This
could be configurable later.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a0dbd822

l2tp: make datapath sequence number support RFC-compliant · 8a1631d5

James Chapman authored Jul 02, 2013

The L2TP datapath is not currently RFC-compliant when sequence numbers
are used in L2TP data packets. According to the L2TP RFC, any received
sequence number NR greater than or equal to the next expected NR is
acceptable, where the "greater than or equal to" test is determined by
the NR wrap point. This differs for L2TPv2 and L2TPv3, so add state in
the session context to hold the max NR value and the NR window size in
order to do the acceptable sequence number value check. These might be
configurable later, but for now we derive it from the tunnel L2TP
version, which determines the sequence number field size.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8a1631d5

l2tp: do data sequence number handling in a separate func · b6dc01a4

James Chapman authored Jul 02, 2013

This change moves some code handling data sequence numbers into a
separate function to avoid too much indentation. This is to prepare
for some changes to data sequence number handling in subsequent
patches.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b6dc01a4

sctp: use get_unused_fd_flags(0) instead of get_unused_fd() · 8a59bd3e

Yann Droneaud authored Jul 02, 2013

Macro get_unused_fd() is used to allocate a file descriptor with
default flags. Those default flags (0) can be "unsafe":
O_CLOEXEC must be used by default to not leak file descriptor
across exec().

Instead of macro get_unused_fd(), functions anon_inode_getfd()
or get_unused_fd_flags() should be used with flags given by userspace.
If not possible, flags should be set to O_CLOEXEC to provide userspace
with a default safe behavor.

In a further patch, get_unused_fd() will be removed so that
new code start using anon_inode_getfd() or get_unused_fd_flags()
with correct flags.

This patch replaces calls to get_unused_fd() with equivalent call to
get_unused_fd_flags(0) to preserve current behavor for existing code.

The hard coded flag value (0) should be reviewed on a per-subsystem basis,
and, if possible, set to O_CLOEXEC.
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8a59bd3e

stmmac: dity-up and rework the driver debug levels · 83d7af64

Giuseppe CAVALLARO authored Jul 02, 2013

Prior this patch, the internal debugging was based on ifdef
and also some printk were useless because many info are exposed
via ethtool.
This patch remove all the ifdef defines and now we only use
netif_msg_XXX levels.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

83d7af64

core/dev: set pkt_type after eth_type_trans() in dev_forward_skb() · 06a23fe3

Isaku Yamahata authored Jul 02, 2013

The dev_forward_skb() assignment of pkt_type should be done
after the call to eth_type_trans().

ip-encapsulated packets can be handled by localhost. But skb->pkt_type
can be PACKET_OTHERHOST when packet comes via veth into ip tunnel device.
In that case, the packet is dropped by ip_rcv().
Although this example uses gretap. l2tp-eth also has same issue.
For l2tp-eth case, add dummy device for ip address and ip l2tp command.

netns A |                     root netns                      | netns B
   veth<->veth=bridge=gretap <-loop back-> gretap=bridge=veth<->veth

arp packet ->
pkt_type
         BROADCAST------------>ip_rcv()------------------------>

                                                             <- arp reply
                                                                pkt_type
                               ip_rcv()<-----------------OTHERHOST
                               drop

sample operations
  ip link add tapa type gretap remote 172.17.107.4 local 172.17.107.3
  ip link add tapb type gretap remote 172.17.107.3 local 172.17.107.4
  ip link set tapa up
  ip link set tapb up
  ip address add 172.17.107.3 dev tapa
  ip address add 172.17.107.4 dev tapb
  ip route get 172.17.107.3
  > local 172.17.107.3 dev lo  src 172.17.107.3
  >    cache <local>
  ip route get 172.17.107.4
  > local 172.17.107.4 dev lo  src 172.17.107.4
  >    cache <local>
  ip link add vetha type veth peer name vetha-peer
  ip link add vethb type veth peer name vethb-peer
  brctl addbr bra
  brctl addbr brb
  brctl addif bra tapa
  brctl addif bra vetha-peer
  brctl addif brb tapb
  brctl addif brb vethb-peer
  brctl show
  > bridge name     bridge id               STP enabled     interfaces
  > bra             8000.6ea21e758ff1       no              tapa
  >                                                         vetha-peer
  > brb             8000.420020eb92d5       no              tapb
  >                                                         vethb-peer
  ip link set vetha-peer up
  ip link set vethb-peer up
  ip link set bra up
  ip link set brb up
  ip netns add a
  ip netns add b
  ip link set vetha netns a
  ip link set vethb netns b
  ip netns exec a ip address add 10.0.0.3/24 dev vetha
  ip netns exec b ip address add 10.0.0.4/24 dev vethb
  ip netns exec a ip link set vetha up
  ip netns exec b ip link set vethb up
  ip netns exec a arping -I vetha 10.0.0.4
  ARPING 10.0.0.4 from 10.0.0.3 vetha
  ^CSent 2 probes (2 broadcast(s))
  Received 0 response(s)

Cc: Jason Wang <jasowang@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Hong Zhiguo <honkiko@gmail.com>
Cc: Rami Rosen <ramirose@gmail.com>
Cc: Tom Parkin <tparkin@katalix.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: Jesse Gross <jesse@nicira.com>
Cc: dev@openvswitch.org
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

06a23fe3

net: convert lls to use time_in_range() · 1bc2774d

Eliezer Tamir authored Jul 02, 2013

Time in range will fail safely if we move to a different cpu with an
extremely large clock skew.
Add time_in_range64() and convert lls to use it.

changelog:
v2
- fixed double call to sched_clock in can_poll_ll
- fixed checkpatchisms
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1bc2774d

nlmon: use standard rtnetlink link api for add/del devices · 7e6d4da8

Jiri Pirko authored Jul 02, 2013

It is not nice when netdev is created right after module load and with
some implicit name. So rather change nlmon to use standard rtnl link API.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7e6d4da8

ipv6: ip6_append_data_mtu did not care about pmtudisc and frag_size · 75a493e6

Hannes Frederic Sowa authored Jul 02, 2013

If the socket had an IPV6_MTU value set, ip6_append_data_mtu lost track
of this when appending the second frame on a corked socket. This results
in the following splat:

[37598.993962] ------------[ cut here ]------------
[37598.994008] kernel BUG at net/core/skbuff.c:2064!
[37598.994008] invalid opcode: 0000 [#1] SMP
[37598.994008] Modules linked in: tcp_lp uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media vfat fat usb_storage fuse ebtable_nat xt_CHECKSUM bridge stp llc ipt_MASQUERADE nf_conntrack_netbios_ns nf_conntrack_broadcast ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat
+nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi
+scsi_transport_iscsi rfcomm bnep iTCO_wdt iTCO_vendor_support snd_hda_codec_conexant arc4 iwldvm mac80211 snd_hda_intel acpi_cpufreq mperf coretemp snd_hda_codec microcode cdc_wdm cdc_acm
[37598.994008]  snd_hwdep cdc_ether snd_seq snd_seq_device usbnet mii joydev btusb snd_pcm bluetooth i2c_i801 e1000e lpc_ich mfd_core ptp iwlwifi pps_core snd_page_alloc mei cfg80211 snd_timer thinkpad_acpi snd tpm_tis soundcore rfkill tpm tpm_bios vhost_net tun macvtap macvlan kvm_intel kvm uinput binfmt_misc
+dm_crypt i915 i2c_algo_bit drm_kms_helper drm i2c_core wmi video
[37598.994008] CPU 0
[37598.994008] Pid: 27320, comm: t2 Not tainted 3.9.6-200.fc18.x86_64 #1 LENOVO 27744PG/27744PG
[37598.994008] RIP: 0010:[<ffffffff815443a5>]  [<ffffffff815443a5>] skb_copy_and_csum_bits+0x325/0x330
[37598.994008] RSP: 0018:ffff88003670da18  EFLAGS: 00010202
[37598.994008] RAX: ffff88018105c018 RBX: 0000000000000004 RCX: 00000000000006c0
[37598.994008] RDX: ffff88018105a6c0 RSI: ffff88018105a000 RDI: ffff8801e1b0aa00
[37598.994008] RBP: ffff88003670da78 R08: 0000000000000000 R09: ffff88018105c040
[37598.994008] R10: ffff8801e1b0aa00 R11: 0000000000000000 R12: 000000000000fff8
[37598.994008] R13: 00000000000004fc R14: 00000000ffff0504 R15: 0000000000000000
[37598.994008] FS:  00007f28eea59740(0000) GS:ffff88023bc00000(0000) knlGS:0000000000000000
[37598.994008] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[37598.994008] CR2: 0000003d935789e0 CR3: 00000000365cb000 CR4: 00000000000407f0
[37598.994008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[37598.994008] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[37598.994008] Process t2 (pid: 27320, threadinfo ffff88003670c000, task ffff88022c162ee0)
[37598.994008] Stack:
[37598.994008]  ffff88022e098a00 ffff88020f973fc0 0000000000000008 00000000000004c8
[37598.994008]  ffff88020f973fc0 00000000000004c4 ffff88003670da78 ffff8801e1b0a200
[37598.994008]  0000000000000018 00000000000004c8 ffff88020f973fc0 00000000000004c4
[37598.994008] Call Trace:
[37598.994008]  [<ffffffff815fc21f>] ip6_append_data+0xccf/0xfe0
[37598.994008]  [<ffffffff8158d9f0>] ? ip_copy_metadata+0x1a0/0x1a0
[37598.994008]  [<ffffffff81661f66>] ? _raw_spin_lock_bh+0x16/0x40
[37598.994008]  [<ffffffff8161548d>] udpv6_sendmsg+0x1ed/0xc10
[37598.994008]  [<ffffffff812a2845>] ? sock_has_perm+0x75/0x90
[37598.994008]  [<ffffffff815c3693>] inet_sendmsg+0x63/0xb0
[37598.994008]  [<ffffffff812a2973>] ? selinux_socket_sendmsg+0x23/0x30
[37598.994008]  [<ffffffff8153a450>] sock_sendmsg+0xb0/0xe0
[37598.994008]  [<ffffffff810135d1>] ? __switch_to+0x181/0x4a0
[37598.994008]  [<ffffffff8153d97d>] sys_sendto+0x12d/0x180
[37598.994008]  [<ffffffff810dfb64>] ? __audit_syscall_entry+0x94/0xf0
[37598.994008]  [<ffffffff81020ed1>] ? syscall_trace_enter+0x231/0x240
[37598.994008]  [<ffffffff8166a7e7>] tracesys+0xdd/0xe2
[37598.994008] Code: fe 07 00 00 48 c7 c7 04 28 a6 81 89 45 a0 4c 89 4d b8 44 89 5d a8 e8 1b ac b1 ff 44 8b 5d a8 4c 8b 4d b8 8b 45 a0 e9 cf fe ff ff <0f> 0b 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 48
[37598.994008] RIP  [<ffffffff815443a5>] skb_copy_and_csum_bits+0x325/0x330
[37598.994008]  RSP <ffff88003670da18>
[37599.007323] ---[ end trace d69f6a17f8ac8eee ]---

While there, also check if path mtu discovery is activated for this
socket. The logic was adapted from ip6_append_data when first writing
on the corked socket.

This bug was introduced with commit
0c183379 ("ipv6: fix incorrect ipsec
fragment").

v2:
a) Replace IPV6_PMTU_DISC_DO with IPV6_PMTUDISC_PROBE.
b) Don't pass ipv6_pinfo to ip6_append_data_mtu (suggestion by Gao
   feng, thanks!).
c) Change mtu to unsigned int, else we get a warning about
   non-matching types because of the min()-macro type-check.
Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

75a493e6

ipv6: call udp_push_pending_frames when uncorking a socket with AF_INET pending data · 8822b64a

Hannes Frederic Sowa authored Jul 01, 2013

We accidentally call down to ip6_push_pending_frames when uncorking
pending AF_INET data on a ipv6 socket. This results in the following
splat (from Dave Jones):

skbuff: skb_under_panic: text:ffffffff816765f6 len:48 put:40 head:ffff88013deb6df0 data:ffff88013deb6dec tail:0x2c end:0xc0 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:126!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: dccp_ipv4 dccp 8021q garp bridge stp dlci mpoa snd_seq_dummy sctp fuse hidp tun bnep nfnetlink scsi_transport_iscsi rfcomm can_raw can_bcm af_802154 appletalk caif_socket can caif ipt_ULOG x25 rose af_key pppoe pppox ipx phonet irda llc2 ppp_generic slhc p8023 psnap p8022 llc crc_ccitt atm bluetooth
+netrom ax25 nfc rfkill rds af_rxrpc coretemp hwmon kvm_intel kvm crc32c_intel snd_hda_codec_realtek ghash_clmulni_intel microcode pcspkr snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep usb_debug snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c
CPU: 2 PID: 8095 Comm: trinity-child2 Not tainted 3.10.0-rc7+ #37
task: ffff8801f52c2520 ti: ffff8801e6430000 task.ti: ffff8801e6430000
RIP: 0010:[<ffffffff816e759c>]  [<ffffffff816e759c>] skb_panic+0x63/0x65
RSP: 0018:ffff8801e6431de8  EFLAGS: 00010282
RAX: 0000000000000086 RBX: ffff8802353d3cc0 RCX: 0000000000000006
RDX: 0000000000003b90 RSI: ffff8801f52c2ca0 RDI: ffff8801f52c2520
RBP: ffff8801e6431e08 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022ea0c800
R13: ffff88022ea0cdf8 R14: ffff8802353ecb40 R15: ffffffff81cc7800
FS:  00007f5720a10740(0000) GS:ffff880244c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000005862000 CR3: 000000022843c000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
 ffff88013deb6dec 000000000000002c 00000000000000c0 ffffffff81a3f6e4
 ffff8801e6431e18 ffffffff8159a9aa ffff8801e6431e90 ffffffff816765f6
 ffffffff810b756b 0000000700000002 ffff8801e6431e40 0000fea9292aa8c0
Call Trace:
 [<ffffffff8159a9aa>] skb_push+0x3a/0x40
 [<ffffffff816765f6>] ip6_push_pending_frames+0x1f6/0x4d0
 [<ffffffff810b756b>] ? mark_held_locks+0xbb/0x140
 [<ffffffff81694919>] udp_v6_push_pending_frames+0x2b9/0x3d0
 [<ffffffff81694660>] ? udplite_getfrag+0x20/0x20
 [<ffffffff8162092a>] udp_lib_setsockopt+0x1aa/0x1f0
 [<ffffffff811cc5e7>] ? fget_light+0x387/0x4f0
 [<ffffffff816958a4>] udpv6_setsockopt+0x34/0x40
 [<ffffffff815949f4>] sock_common_setsockopt+0x14/0x20
 [<ffffffff81593c31>] SyS_setsockopt+0x71/0xd0
 [<ffffffff816f5d54>] tracesys+0xdd/0xe2
Code: 00 00 48 89 44 24 10 8b 87 d8 00 00 00 48 89 44 24 08 48 8b 87 e8 00 00 00 48 c7 c7 c0 04 aa 81 48 89 04 24 31 c0 e8 e1 7e ff ff <0f> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55
RIP  [<ffffffff816e759c>] skb_panic+0x63/0x65
 RSP <ffff8801e6431de8>

This patch adds a check if the pending data is of address family AF_INET
and directly calls udp_push_ending_frames from udp_v6_push_pending_frames
if that is the case.

This bug was found by Dave Jones with trinity.

(Also move the initialization of fl6 below the AF_INET check, even if
not strictly necessary.)

Cc: Dave Jones <davej@redhat.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

8822b64a

net: fec: Fix RMON registers on imx6 · b9eef55c

Jim Baxter authored Jul 01, 2013

commit 38ae92dc "fec: Add support for reading
RMON registers" causes the imx6Q to crash.

This fixes it by only enabling the RMON registers, the
registers are already cleared by the MAC being reset.
Signed-off-by: Jim Baxter <jim_baxter@mentor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b9eef55c

cxgb3: Missing rtnl lock in error recovery · 7cc47d13

Benjamin Herrenschmidt authored Jun 30, 2013

When exercising error injection on IBM pseries machine, I hit the
following warning:

[  251.450043] RTAS: event: 89, Type: Platform Error, Severity: 2
[  253.549822] cxgb3 0006:01:00.0: enabling device (0140 -> 0142)
[  253.713560] cxgb3 0006:01:00.0: adapter recovering, PEX ERR 0x100
[  254.895437] RTNL: assertion failed at net/core/dev.c (2031)
[  254.895467] CPU: 6 PID: 5449 Comm: eehd Tainted: G        W    3.10.0-rc7-00157-gea461abf #19
[  254.895474] Call Trace:
[  254.895483] [c000000fac56f7d0] [c000000000014dcc] .show_stack+0x7c/0x1f0 (unreliable)
[  254.895493] [c000000fac56f8a0] [c0000000007ba318] .dump_stack+0x28/0x3c
[  254.895500] [c000000fac56f910] [c0000000006c0384] .netif_set_real_num_tx_queues+0x224/0x230
[  254.895515] [c000000fac56f9b0] [d00000000ef35510] .cxgb_open+0x80/0x3f0 [cxgb3]
[  254.895525] [c000000fac56fa50] [d00000000ef35914] .t3_resume_ports+0x94/0x100 [cxgb3]
[  254.895533] [c000000fac56fae0] [c00000000005fc8c] .eeh_report_resume+0x8c/0xd0
[  254.895539] [c000000fac56fb60] [c00000000005e9fc] .eeh_pe_dev_traverse+0x9c/0x190
[  254.895545] [c000000fac56fc10] [c000000000060000] .eeh_handle_event+0x110/0x330
[  254.895551] [c000000fac56fca0] [c000000000060350] .eeh_event_handler+0x130/0x1a0
[  254.895558] [c000000fac56fd30] [c0000000000ad758] .kthread+0xe8/0xf0
[  254.895566] [c000000fac56fe30] [c00000000000a05c] .ret_from_kernel_thread+0x5c/0x80

It appears that t3_resume_ports() is called with the rtnl_lock held from
the fatal error task but not from the PCI error callbacks. This fixes it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

7cc47d13

net: cdc_ether: allow combined control and data interface · 1fc4c84d

Bjørn Mork authored Jun 29, 2013

Some Icera based Huawei modems handled by this driver are not
completely CDC ECM compliant, using the same USB interface for both
control and data. The CDC functional descriptors include a Union
naming this interface as both master and slave, so it is supportable
by relaxing the descriptor parsing in case these interfaces are
identical.

This has been tested on a Huawei K3806 and verified to add support
for that device.
Reported-and-tested-by: Enrico Mioso <mrkiko.rs@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Oliver Neukum <oliver@neukum.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

1fc4c84d