Commits · 83bf79b6bb64e686ccf0b66b6828b7bfe9f70ae9 · nexedi / linux

11 Mar, 2014 4 commits

stmmac: disable at run-time the EEE if not supported · 83bf79b6

Giuseppe CAVALLARO authored Mar 10, 2014

This patch is to disable the EEE (so HW and timers)
for example when the phy communicates that the EEE
can be supported anymore.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

83bf79b6

vmxnet3: fix netpoll race condition · d25f06ea

Neil Horman authored Mar 10, 2014

vmxnet3's netpoll driver is incorrectly coded.  It directly calls
vmxnet3_do_poll, which is the driver internal napi poll routine.  As the netpoll
controller method doesn't block real napi polls in any way, there is a potential
for race conditions in which the netpoll controller method and the napi poll
method run concurrently.  The result is data corruption causing panics such as this
one recently observed:
PID: 1371   TASK: ffff88023762caa0  CPU: 1   COMMAND: "rs:main Q:Reg"
 #0 [ffff88023abd5780] machine_kexec at ffffffff81038f3b
 #1 [ffff88023abd57e0] crash_kexec at ffffffff810c5d92
 #2 [ffff88023abd58b0] oops_end at ffffffff8152b570
 #3 [ffff88023abd58e0] die at ffffffff81010e0b
 #4 [ffff88023abd5910] do_trap at ffffffff8152add4
 #5 [ffff88023abd5970] do_invalid_op at ffffffff8100cf95
 #6 [ffff88023abd5a10] invalid_op at ffffffff8100bf9b
    [exception RIP: vmxnet3_rq_rx_complete+1968]
    RIP: ffffffffa00f1e80  RSP: ffff88023abd5ac8  RFLAGS: 00010086
    RAX: 0000000000000000  RBX: ffff88023b5dcee0  RCX: 00000000000000c0
    RDX: 0000000000000000  RSI: 00000000000005f2  RDI: ffff88023b5dcee0
    RBP: ffff88023abd5b48   R8: 0000000000000000   R9: ffff88023a3b6048
    R10: 0000000000000000  R11: 0000000000000002  R12: ffff8802398d4cd8
    R13: ffff88023af35140  R14: ffff88023b60c890  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff88023abd5b50] vmxnet3_do_poll at ffffffffa00f204a [vmxnet3]
 #8 [ffff88023abd5b80] vmxnet3_netpoll at ffffffffa00f209c [vmxnet3]
 #9 [ffff88023abd5ba0] netpoll_poll_dev at ffffffff81472bb7

The fix is to do as other drivers do, and have the poll controller call the top
half interrupt handler, which schedules a napi poll properly to recieve frames

Tested by myself, successfully.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Shreyas Bhatewara <sbhatewara@vmware.com>
CC: "VMware, Inc." <pv-drivers@vmware.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: stable@vger.kernel.org
Reviewed-by: Shreyas N Bhatewara <sbhatewara@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d25f06ea

vlan: Set correct source MAC address with TX VLAN offload enabled · dd38743b

Peter Boström authored Mar 10, 2014

With TX VLAN offload enabled the source MAC address for frames sent using the
VLAN interface is currently set to the address of the real interface. This is
wrong since the VLAN interface may be configured with a different address.

The bug was introduced in commit 2205369a
("vlan: Fix header ops passthru when doing TX VLAN offload.").

This patch sets the source address before calling the create function of the
real interface.
Signed-off-by: Peter Boström <peter.bostrom@netrounds.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dd38743b

Xen-netback: Fix issue caused by using gso_type wrongly · 5bd07670

Annie Li authored Mar 10, 2014

Current netback uses gso_type to check whether the skb contains
gso offload, and this is wrong. Gso_size is the right one to
check gso existence, and gso_type is only used to check gso type.

Some skbs contains nonzero gso_type and zero gso_size, current
netback would treat these skbs as gso and create wrong response
for this. This also causes ssh failure to domu from other server.

V2: use skb_is_gso function as Paul Durrant suggested
Signed-off-by: Annie Li <annie.li@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5bd07670

10 Mar, 2014 4 commits

pkt_sched: fq: do not hold qdisc lock while allocating memory · 2818fa0f

Eric Dumazet authored Mar 06, 2014

Resizing fq hash table allocates memory while holding qdisc spinlock,
with BH disabled.

This is definitely not good, as allocation might sleep.

We can drop the lock and get it when needed, we hold RTNL so no other
changes can happen at the same time.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: afe4fd06 ("pkt_sched: fq: Fair Queue packet scheduler")
Signed-off-by: David S. Miller <davem@davemloft.net>

2818fa0f

bna: Replace large udelay() with mdelay() · bc48bc80

Ben Hutchings authored Mar 09, 2014

udelay() does not work on some architectures for values above
2000, in particular on ARM:

ERROR: "__bad_udelay" [drivers/net/ethernet/brocade/bna/bna.ko] undefined!
Reported-by: Vagrant Cascadian <vagrant@debian.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

bc48bc80

pkt_sched: move the sanity test in qdisc_list_add() · 37314363

Eric Dumazet authored Mar 08, 2014

The WARN_ON(root == &noop_qdisc)) added in qdisc_list_add()
can trigger in normal conditions when devices are not up.
It should be done only right before the list_add_tail() call.

Fixes: e57a784d ("pkt_sched: set root qdisc before change() in attach_default_qdiscs()")
Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Tested-by: Mirco Tischler <mt-ml@gmx.de>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

37314363

Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless · 92f092d1

David S. Miller authored Mar 10, 2014

John W. Linville says:

====================
Please pull this batch of fixes intende for the 3.14 stream...

For the mac80211 bits, Johannes says:

"Here I have a fix from Eliad for the minimal channel width calculation
in the mac80211 code which lead to monitor mode not working at all for
drivers using that. One of my fixes is for an issue noticed by Michal,
we clear an already cleared value but do it without locking, so just
remove that. The other is for a data leak - we leak two bytes of kernel
memory out over the air in QoS NULL frames because those don't get a
sequence number assigned in the TX path."

For the iwlwifi bits, Emmanuel says:

"One more fix and an update for device IDs.
There is a bugzilla reported for the fix which is mentioned in the commit message."

Along with those...

Amitkumar Karwar provides two mwifiex fixes, both correcting some
data transcription problems.

Ivaylo Dimitrov uses skb_trim in the wl1251 driver to avoid
HAVE_EFFICIENT_UNALIGNED_ACCESS problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

92f092d1

09 Mar, 2014 1 commit

bnx2: Fix shutdown sequence · a8d9bc2e

Michael Chan authored Mar 09, 2014

The pci shutdown handler added in:

    bnx2: Add pci shutdown handler
    commit 25bfb1dd

created a shutdown down sequence without chip reset if the device was
never brought up.  This can cause the firmware to shutdown the PHY
prematurely and cause MMIO read cycles to be unresponsive.  On some
systems, it may generate NMI in the bnx2's pci shutdown handler.

The fix is to tell the firmware not to shutdown the PHY if there was
no prior chip reset.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a8d9bc2e

07 Mar, 2014 1 commit

Merge branch 'master' of... · 97bd5f00

John W. Linville authored Mar 07, 2014

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem

97bd5f00

06 Mar, 2014 30 commits

ipv6: don't set DST_NOCOUNT for remotely added routes · c88507fb

Sabrina Dubroca authored Mar 06, 2014

DST_NOCOUNT should only be used if an authorized user adds routes
locally. In case of routes which are added on behalf of router
advertisments this flag must not get used as it allows an unlimited
number of routes getting added remotely.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

c88507fb

net/mlx4_core: mlx4_init_slave() shouldn't access comm channel before PF is ready · 97989356

Amir Vadai authored Mar 06, 2014

Currently, the PF call to pci_enable_sriov from the PF probe function
stalls for 10 seconds times the number of VFs probed on the host. This
happens because the way for such VFs to determine of the PF
initialization finished, is by attempting to issue reset on the
comm-channel and get timeout (after 10s).

The PF probe function is called from a kenernel workqueue, and therefore
during that time, rcu lock is being held and kernel's workqueue is
stalled. This blocks other processes that try to use the workqueue
or rcu lock.  For example, interface renaming which is calling
rcu_synchronize is blocked, and timedout by systemd.

Changed mlx4_init_slave() to allow VF probed on the host to immediatly
detect that the PF is not ready, and return EPROBE_DEFER instantly.

Only when the PF finishes the initialization, allow such VFs to
access the comm channel.

This issue and fix are relevant only for probed VFs on the hypervisor,
there is no way to pass this information to a VM until comm channel is
ready, so in a VM, if PF is not ready, the first command will be timedout
after 10 seconds and return EPROBE_DEFER.
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

97989356

net/mlx4_core: Fix memory access error in mlx4_QUERY_DEV_CAP_wrapper() · 57352ef4

Amir Vadai authored Mar 06, 2014

Fix a regression introduced by [1]. outbox was accessed instead of
outbox->buf. Typo was copy-pasted to [2] and [3].

[1] - cc1ade94 mlx4_core: Disable memory windows for virtual functions
[2] - 4de65803 mlx4_core: Add support for steerable IB UD QPs
[3] - 7ffdf726 net/mlx4_core: Add basic support for TCP/IP offloads under
      tunneling
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

57352ef4

bonding: correctly handle out of range parameters for lp_interval · 5bd4e4c1

Sasha Levin authored Mar 06, 2014

We didn't correctly check cases where the value for lp_interval is not
within the legal range due to a missing table terminator.

This would let userspace trigger a kernel panic by specifying a value out
of range:

	echo -1 > /sys/devices/virtual/net/bond0/bonding/lp_interval

Introduced by commit 4325b374 ("bonding: convert lp_interval to use
the new option API").
Acked-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5bd4e4c1

ipv6: Fix exthdrs offload registration. · d2d273ff

Anton Nayshtut authored Mar 05, 2014

Without this fix, ipv6_exthdrs_offload_init doesn't register IPPROTO_DSTOPTS
offload, but returns 0 (as the IPPROTO_ROUTING registration actually succeeds).

This then causes the ipv6_gso_segment to drop IPv6 packets with IPPROTO_DSTOPTS
header.

The issue detected and the fix verified by running MS HCK Offload LSO test on
top of QEMU Windows guests, as this test sends IPv6 packets with
IPPROTO_DSTOPTS.
Signed-off-by: Anton Nayshtut <anton@swortex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d2d273ff

ibmveth: Fix endian issues with MAC addresses · d746ca95

Anton Blanchard authored Mar 05, 2014

The code to load a MAC address into a u64 for passing to the
hypervisor via a register is broken on little endian.

Create a helper function called ibmveth_encode_mac_addr
which does the right thing in both big and little endian.

We were storing the MAC address in a long in struct ibmveth_adapter.
It's never used so remove it - we don't need another place in the
driver where we create endian issues with MAC addresses.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

d746ca95

net: unix socket code abuses csum_partial · 0a13404d

Anton Blanchard authored Mar 05, 2014

The unix socket code is using the result of csum_partial to
hash into a lookup table:

	unix_hash_fold(csum_partial(sunaddr, len, 0));

csum_partial is only guaranteed to produce something that can be
folded into a checksum, as its prototype explains:

 * returns a 32-bit number suitable for feeding into itself
 * or csum_tcpudp_magic

The 32bit value should not be used directly.

Depending on the alignment, the ppc64 csum_partial will return
different 32bit partial checksums that will fold into the same
16bit checksum.

This difference causes the following testcase (courtesy of
Gustavo) to sometimes fail:

#include <sys/socket.h>
#include <stdio.h>

int main()
{
	int fd = socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0);

	int i = 1;
	setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &i, 4);

	struct sockaddr addr;
	addr.sa_family = AF_LOCAL;
	bind(fd, &addr, 2);

	listen(fd, 128);

	struct sockaddr_storage ss;
	socklen_t sslen = (socklen_t)sizeof(ss);
	getsockname(fd, (struct sockaddr*)&ss, &sslen);

	fd = socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0);

	if (connect(fd, (struct sockaddr*)&ss, sslen) == -1){
		perror(NULL);
		return 1;
	}
	printf("OK\n");
	return 0;
}

As suggested by davem, fix this by using csum_fold to fold the
partial 32bit checksum into a 16bit checksum before using it.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

0a13404d

net: Improve SO_TIMESTAMPING documentation and fix a minor code bug · adca4767

Andrew Lutomirski authored Mar 04, 2014

The original documentation was very unclear.

The code fix is presumably related to the formerly unclear
documentation: SOCK_TIMESTAMPING_RX_SOFTWARE has no effect on
__sock_recv_timestamp's behavior, so calling __sock_recv_ts_and_drops
from sock_recv_ts_and_drops if only SOCK_TIMESTAMPING_RX_SOFTWARE is
set is pointless.  This should have no user-observable effect.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

adca4767

phy: fix compiler array bounds warning on settings[] · 4ae6e50c

Bjorn Helgaas authored Mar 04, 2014

With -Werror=array-bounds, gcc v4.7.x warns that in phy_find_valid(), the
settings[] "array subscript is above array bounds", I think because idx is
a signed integer and if the caller supplied idx < 0, we pass the guard but
still reference out of bounds.

Fix this by making idx unsigned here and elsewhere.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4ae6e50c

inet: frag: make sure forced eviction removes all frags · e588e2f2

Florian Westphal authored Mar 06, 2014

Quoting Alexander Aring:
  While fragmentation and unloading of 6lowpan module I got this kernel Oops
  after few seconds:

  BUG: unable to handle kernel paging request at f88bbc30
  [..]
  Modules linked in: ipv6 [last unloaded: 6lowpan]
  Call Trace:
   [<c012af4c>] ? call_timer_fn+0x54/0xb3
   [<c012aef8>] ? process_timeout+0xa/0xa
   [<c012b66b>] run_timer_softirq+0x140/0x15f

Problem is that incomplete frags are still around after unload; when
their frag expire timer fires, we get crash.

When a netns is removed (also done when unloading module), inet_frag
calls the evictor with 'force' argument to purge remaining frags.

The evictor loop terminates when accounted memory ('work') drops to 0
or the lru-list becomes empty.  However, the mem accounting is done
via percpu counters and may not be accurate, i.e. loop may terminate
prematurely.

Alter evictor to only stop once the lru list is empty when force is
requested.
Reported-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Reported-by: Alexander Aring <alex.aring@gmail.com>
Tested-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e588e2f2

Merge branch 'tipc' · 409e1456

David S. Miller authored Mar 06, 2014

Eric Hugne says:

====================
tipc: refcount and memory leak fixes

v3: Remove error logging from data path completely. Rebased on top of
    latest net merge.

v2: Drop specific -ENOMEM logging in patch #1 (tipc: allow connection
    shutdown callback to be invoked in advance) And add a general error
    message if an internal server tries to send a message on a
    closed/nonexisting connection.

In addition to the fix for refcount leak and memory leak during
module removal, we also fix a problem where the topology server
listening socket where unexpectedly closed. We also eliminate an
unnecessary context switch during accept()/recvmsg() for nonblocking
sockets.

It might be good to include this patchset in stable aswell. After the
v3 rebase on latest merge from net all patches apply cleanly on that
tree.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

409e1456

tipc: don't log disabled tasklet handler errors · 2892505e

Erik Hugne authored Mar 06, 2014

Failure to schedule a TIPC tasklet with tipc_k_signal because the
tasklet handler is disabled is not an error. It means TIPC is
currently in the process of shutting down. We remove the error
logging in this case.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2892505e

tipc: fix memory leak during module removal · 1bb8dce5

Erik Hugne authored Mar 06, 2014

When the TIPC module is removed, the tasklet handler is disabled
before all other subsystems. This will cause lingering publications
in the name table because the node_down tasklets responsible to
clean up publications from an unreachable node will never run.
When the name table is shut down, these publications are detected
and an error message is logged:
tipc: nametbl_stop(): orphaned hash chain detected
This is actually a memory leak, introduced with commit
993b858e ("tipc: correct the order
of stopping services at rmmod")

Instead of just logging an error and leaking memory, we free
the orphaned entries during nametable shutdown.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1bb8dce5

tipc: drop subscriber connection id invalidation · edcc0511

Erik Hugne authored Mar 06, 2014

When a topology server subscriber is disconnected, the associated
connection id is set to zero. A check vs zero is then done in the
subscription timeout function to see if the subscriber have been
shut down. This is unnecessary, because all subscription timers
will be cancelled when a subscriber terminates. Setting the
connection id to zero is actually harmful because id zero is the
identity of the topology server listening socket, and can cause a
race that leads to this socket being closed instead.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

edcc0511

tipc: avoid to unnecessary process switch under non-block mode · fe8e4649

Ying Xue authored Mar 06, 2014

When messages are received via tipc socket under non-block mode,
schedule_timeout() is called in tipc_wait_for_rcvmsg(), that is,
the process of receiving messages will be scheduled once although
timeout value passed to schedule_timeout() is 0. The same issue
exists in accept()/wait_for_accept(). To avoid this unnecessary
process switch, we only call schedule_timeout() if the timeout
value is non-zero.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fe8e4649

tipc: fix connection refcount leak · 4652edb7

Ying Xue authored Mar 06, 2014

When tipc_conn_sendmsg() calls tipc_conn_lookup() to query a
connection instance, its reference count value is increased if
it's found. But subsequently if it's found that the connection is
closed, the work of sending message is not queued into its server
send workqueue, and the connection reference count is not decreased.
This will cause a reference count leak. To reproduce this problem,
an application would need to open and closes topology server
connections with high intensity.

We fix this by immediately decrementing the connection reference
count if a send fails due to the connection being closed.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4652edb7

tipc: allow connection shutdown callback to be invoked in advance · 6d4ebeb4

Ying Xue authored Mar 06, 2014

Currently connection shutdown callback function is called when
connection instance is released in tipc_conn_kref_release(), and
receiving packets and sending packets are running in different
threads. Even if connection is closed by the thread of receiving
packets, its shutdown callback may not be called immediately as
the connection reference count is non-zero at that moment. So,
although the connection is shut down by the thread of receiving
packets, the thread of sending packets doesn't know it. Before
its shutdown callback is invoked to tell the sending thread its
connection has been closed, the sending thread may deliver
messages by tipc_conn_sendmsg(), this is why the following error
information appears:

"Sending subscription event failed, no memory"

To eliminate it, allow connection shutdown callback function to
be called before connection id is removed in tipc_close_conn(),
which makes the sending thread know the truth in time that its
socket is closed so that it doesn't send message to it. We also
remove the "Sending XXX failed..." error reporting for topology
and config services.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6d4ebeb4

l2tp: fix userspace reception on plain L2TP sockets · 9e9cb622

Guillaume Nault authored Mar 06, 2014

As pppol2tp_recv() never queues up packets to plain L2TP sockets,
pppol2tp_recvmsg() never returns data to userspace, thus making
the recv*() system calls unusable.

Instead of dropping packets when the L2TP socket isn't bound to a PPP
channel, this patch adds them to its reception queue.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

9e9cb622

l2tp: fix manual sequencing (de)activation in L2TPv2 · bb5016ea

Guillaume Nault authored Mar 06, 2014

Commit e0d4435f "l2tp: Update PPP-over-L2TP driver to work over L2TPv3"
broke the PPPOL2TP_SO_SENDSEQ setsockopt. The L2TP header length was
previously computed by pppol2tp_l2t_header_len() before each call to
l2tp_xmit_skb(). Now that header length is retrieved from the hdr_len
session field, this field must be updated every time the L2TP header
format is modified, or l2tp_xmit_skb() won't push the right amount of
data for the L2TP header.

This patch uses l2tp_session_set_header_len() to adjust hdr_len every
time sequencing is (de)activated from userspace (either by the
PPPOL2TP_SO_SENDSEQ setsockopt or the L2TP_ATTR_SEND_SEQ netlink
attribute).
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

bb5016ea

mwifiex: save and copy AP's VHT capability info correctly · d5124648

Amitkumar Karwar authored Mar 04, 2014

While preparing association request, intersection of device's
VHT capability information and corresponding field advertised
by AP is used.

This patch fixes a couple errors while saving and copying vht_cap
and vht_oper fields from AP's beacon.

Cc: <stable@vger.kernel.org> # 3.9+
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

d5124648

mwifiex: copy AP's HT capability info correctly · c99b1861

Amitkumar Karwar authored Mar 04, 2014

While preparing association request, intersection of device's HT
capability information and corresponding fields advertised by AP
is used.

This patch fixes an error while copying this field from AP's
beacon.

Cc: <stable@vger.kernel.org>
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

c99b1861

Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 · 73c8daa0
John W. Linville authored Mar 06, 2014

73c8daa0

hyperv: Move state setting for link query · 1b07da51

Haiyang Zhang authored Mar 04, 2014

It moves the state setting for query into rndis_filter_receive_response().
All callbacks including query-complete and status-callback are synchronized
by channel->inbound_lock. This prevents pentential race between them.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1b07da51

net: macb: DMA-unmap full rx-buffer · 48330e08

Soren Brinkmann authored Mar 04, 2014

When allocating RX buffers a fixed size is used, while freeing is based
on actually received bytes, resulting in the following kernel warning
when CONFIG_DMA_API_DEBUG is enabled:
 WARNING: CPU: 0 PID: 0 at lib/dma-debug.c:1051 check_unmap+0x258/0x894()
 macb e000b000.ethernet: DMA-API: device driver frees DMA memory with different size [device address=0x000000002d170040] [map size=1536 bytes] [unmap size=60 bytes]
 Modules linked in:
 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.0-rc3-xilinx-00220-g49f84081ce4f #65
 [<c001516c>] (unwind_backtrace) from [<c0011df8>] (show_stack+0x10/0x14)
 [<c0011df8>] (show_stack) from [<c03c775c>] (dump_stack+0x7c/0xc8)
 [<c03c775c>] (dump_stack) from [<c00245cc>] (warn_slowpath_common+0x60/0x84)
 [<c00245cc>] (warn_slowpath_common) from [<c0024670>] (warn_slowpath_fmt+0x2c/0x3c)
 [<c0024670>] (warn_slowpath_fmt) from [<c0227d44>] (check_unmap+0x258/0x894)
 [<c0227d44>] (check_unmap) from [<c0228588>] (debug_dma_unmap_page+0x64/0x70)
 [<c0228588>] (debug_dma_unmap_page) from [<c02ab78c>] (gem_rx+0x118/0x170)
 [<c02ab78c>] (gem_rx) from [<c02ac4d4>] (macb_poll+0x24/0x94)
 [<c02ac4d4>] (macb_poll) from [<c031222c>] (net_rx_action+0x6c/0x188)
 [<c031222c>] (net_rx_action) from [<c0028a28>] (__do_softirq+0x108/0x280)
 [<c0028a28>] (__do_softirq) from [<c0028e8c>] (irq_exit+0x84/0xf8)
 [<c0028e8c>] (irq_exit) from [<c000f360>] (handle_IRQ+0x68/0x8c)
 [<c000f360>] (handle_IRQ) from [<c0008528>] (gic_handle_irq+0x3c/0x60)
 [<c0008528>] (gic_handle_irq) from [<c0012904>] (__irq_svc+0x44/0x78)
 Exception stack(0xc056df20 to 0xc056df68)
 df20: 00000001 c0577430 00000000 c0577430 04ce8e0d 00000002 edfce238 00000000
 df40: 04e20f78 00000002 c05981f4 00000000 00000008 c056df68 c0064008 c02d7658
 df60: 20000013 ffffffff
 [<c0012904>] (__irq_svc) from [<c02d7658>] (cpuidle_enter_state+0x54/0xf8)
 [<c02d7658>] (cpuidle_enter_state) from [<c02d77dc>] (cpuidle_idle_call+0xe0/0x138)
 [<c02d77dc>] (cpuidle_idle_call) from [<c000f660>] (arch_cpu_idle+0x8/0x3c)
 [<c000f660>] (arch_cpu_idle) from [<c006bec4>] (cpu_startup_entry+0xbc/0x124)
 [<c006bec4>] (cpu_startup_entry) from [<c053daec>] (start_kernel+0x350/0x3b0)
 ---[ end trace d5fdc38641bd3a11 ]---
 Mapped at:
  [<c0227184>] debug_dma_map_page+0x48/0x11c
  [<c02ab32c>] gem_rx_refill+0x154/0x1f8
  [<c02ac7b4>] macb_open+0x270/0x3e0
  [<c03152e0>] __dev_open+0x7c/0xfc
  [<c031554c>] __dev_change_flags+0x8c/0x140

Fixing this by passing the same size which is passed during mapping the
memory to the unmap function as well.
Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
Reviewed-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

48330e08

net: macb: Check DMA mappings for error · 92030908

Soren Brinkmann authored Mar 04, 2014

With CONFIG_DMA_API_DEBUG enabled the following warning is printed:
 WARNING: CPU: 0 PID: 619 at lib/dma-debug.c:1101 check_unmap+0x758/0x894()
 macb e000b000.ethernet: DMA-API: device driver failed to check map error[device address=0x000000002d171c02] [size=322 bytes] [mapped as single]
 Modules linked in:
 CPU: 0 PID: 619 Comm: udhcpc Not tainted 3.14.0-rc3-xilinx-00219-gd158fc7f #63
 [<c001516c>] (unwind_backtrace) from [<c0011df8>] (show_stack+0x10/0x14)
 [<c0011df8>] (show_stack) from [<c03c7714>] (dump_stack+0x7c/0xc8)
 [<c03c7714>] (dump_stack) from [<c00245cc>] (warn_slowpath_common+0x60/0x84)
 [<c00245cc>] (warn_slowpath_common) from [<c0024670>] (warn_slowpath_fmt+0x2c/0x3c)
 [<c0024670>] (warn_slowpath_fmt) from [<c0228244>] (check_unmap+0x758/0x894)
 [<c0228244>] (check_unmap) from [<c0228588>] (debug_dma_unmap_page+0x64/0x70)
 [<c0228588>] (debug_dma_unmap_page) from [<c02aba64>] (macb_interrupt+0x1f8/0x2dc)
 [<c02aba64>] (macb_interrupt) from [<c006c6e4>] (handle_irq_event_percpu+0x2c/0x178)
 [<c006c6e4>] (handle_irq_event_percpu) from [<c006c86c>] (handle_irq_event+0x3c/0x5c)
 [<c006c86c>] (handle_irq_event) from [<c006f548>] (handle_fasteoi_irq+0xb8/0x100)
 [<c006f548>] (handle_fasteoi_irq) from [<c006c148>] (generic_handle_irq+0x20/0x30)
 [<c006c148>] (generic_handle_irq) from [<c000f35c>] (handle_IRQ+0x64/0x8c)
 [<c000f35c>] (handle_IRQ) from [<c0008528>] (gic_handle_irq+0x3c/0x60)
 [<c0008528>] (gic_handle_irq) from [<c0012904>] (__irq_svc+0x44/0x78)
 Exception stack(0xed197f60 to 0xed197fa8)
 7f60: 00000134 60000013 bd94362e bd94362e be96b37c 00000014 fffffd72 00000122
 7f80: c000ebe4 ed196000 00000000 00000011 c032c0d8 ed197fa8 c0064008 c000ea20
 7fa0: 60000013 ffffffff
 [<c0012904>] (__irq_svc) from [<c000ea20>] (ret_fast_syscall+0x0/0x48)
 ---[ end trace 478f921d0d542d1e ]---
 Mapped at:
  [<c0227184>] debug_dma_map_page+0x48/0x11c
  [<c02aaca0>] macb_start_xmit+0x184/0x2a8
  [<c03143c0>] dev_hard_start_xmit+0x334/0x470
  [<c032c09c>] sch_direct_xmit+0x78/0x2f8
  [<c0314814>] __dev_queue_xmit+0x318/0x708

due to missing checks of the dma mapping. Add the appropriate checks to fix
this.
Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
Reviewed-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

92030908

net: sctp: fix skb leakage in COOKIE ECHO path of chunk->auth_chunk · c485658b

Daniel Borkmann authored Mar 04, 2014

While working on ec0223ec ("net: sctp: fix sctp_sf_do_5_1D_ce to
verify if we/peer is AUTH capable"), we noticed that there's a skb
memory leakage in the error path.

Running the same reproducer as in ec0223ec and by unconditionally
jumping to the error label (to simulate an error condition) in
sctp_sf_do_5_1D_ce() receive path lets kmemleak detector bark about
the unfreed chunk->auth_chunk skb clone:

Unreferenced object 0xffff8800b8f3a000 (size 256):
  comm "softirq", pid 0, jiffies 4294769856 (age 110.757s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    89 ab 75 5e d4 01 58 13 00 00 00 00 00 00 00 00  ..u^..X.........
  backtrace:
    [<ffffffff816660be>] kmemleak_alloc+0x4e/0xb0
    [<ffffffff8119f328>] kmem_cache_alloc+0xc8/0x210
    [<ffffffff81566929>] skb_clone+0x49/0xb0
    [<ffffffffa0467459>] sctp_endpoint_bh_rcv+0x1d9/0x230 [sctp]
    [<ffffffffa046fdbc>] sctp_inq_push+0x4c/0x70 [sctp]
    [<ffffffffa047e8de>] sctp_rcv+0x82e/0x9a0 [sctp]
    [<ffffffff815abd38>] ip_local_deliver_finish+0xa8/0x210
    [<ffffffff815a64af>] nf_reinject+0xbf/0x180
    [<ffffffffa04b4762>] nfqnl_recv_verdict+0x1d2/0x2b0 [nfnetlink_queue]
    [<ffffffffa04aa40b>] nfnetlink_rcv_msg+0x14b/0x250 [nfnetlink]
    [<ffffffff815a3269>] netlink_rcv_skb+0xa9/0xc0
    [<ffffffffa04aa7cf>] nfnetlink_rcv+0x23f/0x408 [nfnetlink]
    [<ffffffff815a2bd8>] netlink_unicast+0x168/0x250
    [<ffffffff815a2fa1>] netlink_sendmsg+0x2e1/0x3f0
    [<ffffffff8155cc6b>] sock_sendmsg+0x8b/0xc0
    [<ffffffff8155d449>] ___sys_sendmsg+0x369/0x380

What happens is that commit bbd0d598 clones the skb containing
the AUTH chunk in sctp_endpoint_bh_rcv() when having the edge case
that an endpoint requires COOKIE-ECHO chunks to be authenticated:

  ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ---------->
  <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] ---------
  ------------------ AUTH; COOKIE-ECHO ---------------->
  <-------------------- COOKIE-ACK ---------------------

When we enter sctp_sf_do_5_1D_ce() and before we actually get to
the point where we process (and subsequently free) a non-NULL
chunk->auth_chunk, we could hit the "goto nomem_init" path from
an error condition and thus leave the cloned skb around w/o
freeing it.

The fix is to centrally free such clones in sctp_chunk_destroy()
handler that is invoked from sctp_chunk_free() after all refs have
dropped; and also move both kfree_skb(chunk->auth_chunk) there,
so that chunk->auth_chunk is either NULL (since sctp_chunkify()
allocs new chunks through kmem_cache_zalloc()) or non-NULL with
a valid skb pointer. chunk->skb and chunk->auth_chunk are the
only skbs in the sctp_chunk structure that need to be handeled.

While at it, we should use consume_skb() for both. It is the same
as dev_kfree_skb() but more appropriately named as we are not
a device but a protocol. Also, this effectively replaces the
kfree_skb() from both invocations into consume_skb(). Functions
are the same only that kfree_skb() assumes that the frame was
being dropped after a failure (e.g. for tools like drop monitor),
usage of consume_skb() seems more appropriate in function
sctp_chunk_destroy() though.

Fixes: bbd0d598 ("[SCTP]: Implement the receive and verification of AUTH chunk")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <yasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c485658b

r8152: disable the ECM mode · 10c32717

hayeswang authored Mar 04, 2014

There are known issues for switching the drivers between ECM mode and
vendor mode. The interrup transfer may become abnormal. The hardware
may have the opportunity to die if you change the configuration without
unloading the current driver first, because all the control transfers
of the current driver would fail after the command of switching the
configuration.

Although to use the ecm driver and vendor driver independently is fine,
it may have problems to change the driver from one to the other by
switching the configuration. Additionally, now the vendor mode driver
is more powerful than the ECM driver. Thus, disable the ECM mode driver,
and let r8152 to set the configuration to vendor mode and reset the
device automatically.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

10c32717

net/mlx4: Support shutdown() interface · 367d56f7

Gavin Shan authored Mar 04, 2014

In kexec scenario, we failed to load the mlx4 driver in the
second kernel because the ownership bit was hold by the first
kernel without release correctly.

The patch adds shutdown() interface so that the ownership can
be released correctly in the first kernel. It also helps avoiding
EEH error happened during boot stage of the second kernel because
of undesired traffic, which can't be handled by hardware during
that stage on Power platform.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Tested-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

367d56f7

bridge: multicast: add sanity check for query source addresses · 6565b9ee

Linus Lüssing authored Mar 04, 2014

MLD queries are supposed to have an IPv6 link-local source address
according to RFC2710, section 4 and RFC3810, section 5.1.14. This patch
adds a sanity check to ignore such broken MLD queries.

Without this check, such malformed MLD queries can result in a
denial of service: The queries are ignored by any MLD listener
therefore they will not respond with an MLD report. However,
without this patch these malformed MLD queries would enable the
snooping part in the bridge code, potentially shutting down the
according ports towards these hosts for multicast traffic as the
bridge did not learn about these listeners.
Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

6565b9ee

net: fix for a race condition in the inet frag code · 24b9bf43

Nikolay Aleksandrov authored Mar 03, 2014

I stumbled upon this very serious bug while hunting for another one,
it's a very subtle race condition between inet_frag_evictor,
inet_frag_intern and the IPv4/6 frag_queue and expire functions
(basically the users of inet_frag_kill/inet_frag_put).

What happens is that after a fragment has been added to the hash chain
but before it's been added to the lru_list (inet_frag_lru_add) in
inet_frag_intern, it may get deleted (either by an expired timer if
the system load is high or the timer sufficiently low, or by the
fraq_queue function for different reasons) before it's added to the
lru_list, then after it gets added it's a matter of time for the
evictor to get to a piece of memory which has been freed leading to a
number of different bugs depending on what's left there.

I've been able to trigger this on both IPv4 and IPv6 (which is normal
as the frag code is the same), but it's been much more difficult to
trigger on IPv4 due to the protocol differences about how fragments
are treated.

The setup I used to reproduce this is: 2 machines with 4 x 10G bonded
in a RR bond, so the same flow can be seen on multiple cards at the
same time. Then I used multiple instances of ping/ping6 to generate
fragmented packets and flood the machines with them while running
other processes to load the attacked machine.

*It is very important to have the _same flow_ coming in on multiple CPUs
concurrently. Usually the attacked machine would die in less than 30
minutes, if configured properly to have many evictor calls and timeouts
it could happen in 10 minutes or so.

An important point to make is that any caller (frag_queue or timer) of
inet_frag_kill will remove both the timer refcount and the
original/guarding refcount thus removing everything that's keeping the
frag from being freed at the next inet_frag_put.  All of this could
happen before the frag was ever added to the LRU list, then it gets
added and the evictor uses a freed fragment.

An example for IPv6 would be if a fragment is being added and is at
the stage of being inserted in the hash after the hash lock is
released, but before inet_frag_lru_add executes (or is able to obtain
the lru lock) another overlapping fragment for the same flow arrives
at a different CPU which finds it in the hash, but since it's
overlapping it drops it invoking inet_frag_kill and thus removing all
guarding refcounts, and afterwards freeing it by invoking
inet_frag_put which removes the last refcount added previously by
inet_frag_find, then inet_frag_lru_add gets executed by
inet_frag_intern and we have a freed fragment in the lru_list.

The fix is simple, just move the lru_add under the hash chain locked
region so when a removing function is called it'll have to wait for
the fragment to be added to the lru_list, and then it'll remove it (it
works because the hash chain removal is done before the lru_list one
and there's no window between the two list adds when the frag can get
dropped). With this fix applied I couldn't kill the same machine in 24
hours with the same setup.

Fixes: 3ef0eb0d ("net: frag, move LRU list maintenance outside of
rwlock")

CC: Florian Westphal <fw@strlen.de>
CC: Jesper Dangaard Brouer <brouer@redhat.com>
CC: David S. Miller <davem@davemloft.net>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

24b9bf43