Commits · dbde497966804e63a38fdedc1e3815e77097efc2 · nexedi / linux

19 Nov, 2013 4 commits

tcp: don't update snd_nxt, when a socket is switched from repair mode · dbde4979

Andrey Vagin authored Nov 19, 2013

snd_nxt must be updated synchronously with sk_send_head.  Otherwise
tp->packets_out may be updated incorrectly, what may bring a kernel panic.

Here is a kernel panic from my host.
[  103.043194] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[  103.044025] IP: [<ffffffff815aaaaf>] tcp_rearm_rto+0xcf/0x150
...
[  146.301158] Call Trace:
[  146.301158]  [<ffffffff815ab7f0>] tcp_ack+0xcc0/0x12c0

Before this panic a tcp socket was restored. This socket had sent and
unsent data in the write queue. Sent data was restored in repair mode,
then the socket was switched from reapair mode and unsent data was
restored. After that the socket was switched back into repair mode.

In that moment we had a socket where write queue looks like this:
snd_una    snd_nxt   write_seq
   |_________|________|
             |
	  sk_send_head

After a second switching from repair mode the state of socket was
changed:

snd_una          snd_nxt, write_seq
   |_________ ________|
             |
	  sk_send_head

This state is inconsistent, because snd_nxt and sk_send_head are not
synchronized.

Bellow you can find a call trace, how packets_out can be incremented
twice for one skb, if snd_nxt and sk_send_head are not synchronized.
In this case packets_out will be always positive, even when
sk_write_queue is empty.

tcp_write_wakeup
	skb = tcp_send_head(sk);
	tcp_fragment
		if (!before(tp->snd_nxt, TCP_SKB_CB(buff)->end_seq))
			tcp_adjust_pcount(sk, skb, diff);
	tcp_event_new_data_sent
		tp->packets_out += tcp_skb_pcount(skb);

I think update of snd_nxt isn't required, when a socket is switched from
repair mode.  Because it's initialized in tcp_connect_init. Then when a
write queue is restored, snd_nxt is incremented in tcp_event_new_data_sent,
so it's always is in consistent state.

I have checked, that the bug is not reproduced with this patch and
all tests about restoring tcp connections work fine.

Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dbde4979

atm: idt77252: fix dev refcnt leak · b5de4a22

Ying Xue authored Nov 19, 2013

init_card() calls dev_get_by_name() to get a network deceive. But it
doesn't decrease network device reference count after the device is
used.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b5de4a22

xfrm: Release dst if this dst is improper for vti tunnel · 236c9f84

fan.du authored Nov 19, 2013

After searching rt by the vti tunnel dst/src parameter,
if this rt has neither attached to any transformation
nor the transformation is not tunnel oriented, this rt
should be released back to ip layer.

otherwise causing dst memory leakage.
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

236c9f84

netlink: fix documentation typo in netlink_set_err() · 840e93f2

Johannes Berg authored Nov 19, 2013

The parameter is just 'group', not 'groups', fix the documentation typo.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

840e93f2

18 Nov, 2013 14 commits

be2net: Delete secondary unicast MAC addresses during be_close · d11a347d

Ajit Khaparde authored Nov 18, 2013

Secondary unicast MAC addresses will get deleted only when the interface is UP.
When the interface is DOWN, though these secondary MAC addresses are unusable
and awaiting to be deleted, cause the firmware to believe that they are being used.
If the user intends to set a MAC address as primary MAC from one of these
secondary MAC addresses, the firmware returns a MAC address Collision error.
Delete these secondary MAC addresses during be_close.

The secondary MAC addresses list will be refreshed during interface open anyway.
Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d11a347d

be2net: Fix unconditional enabling of Rx interface options · 012bd387

Ajit Khaparde authored Nov 18, 2013

The driver currently requests the firmware to enable rx_interface options
without considering if the interface was created with that capability.
This could cause commands to firmware to fail.

To avoid this, enable only those options on an interface if the interface
was created with that capability.
Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

012bd387

net, virtio_net: replace the magic value · 0f13b66b

Zhi Yong Wu authored Nov 18, 2013

It is more appropriate to use # of queue pairs currently used by
the driver instead of a magic value.
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0f13b66b

ping: prevent NULL pointer dereference on write to msg_name · cf970c00

Hannes Frederic Sowa authored Nov 18, 2013

A plain read() on a socket does set msg->msg_name to NULL. So check for
NULL pointer first.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

cf970c00

Merge branch 'bnx2x' · 4861292f

David S. Miller authored Nov 18, 2013

Yuval Mintz says:

====================
bnx2x: Bug fixes patch series

This series contains several fixes, relating either to SR-IOV flows
or to critical sections protected by the rtnl lock.

Please consider applying these patches to `net'.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

4861292f

bnx2x: Prevent "timeout waiting for state X" · 6ffa39f2

Dmitry Kravkov authored Nov 17, 2013

Current driver release rtnl lock in between DCB re-configuration.
As a result, other flows (e.g., mtu config) may enter in between and fail
due to halted tx path for dcb configuration.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6ffa39f2

bnx2x: prevent CFC attention · ffa1cb96

Dmitry Kravkov authored Nov 17, 2013

During VF load, prior to sending messages on HW channel to PF the VF
checks its bulletin board to see whether the PF indicated it has closed;
If a closed PF is encountered, the VF skips sending the message.

Due to incorrect return values, there's a possible scenario in which the VF
finishes loading "successfully", while the PF hasn't actually fully configured
FW/HW for the VFs supposed configuration.
Once VF tries to send Tx packets, HW will raise an attention (and FW possibly
will start treat the VF as malicious).

The patch fails the loading process in such a scenario.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ffa1cb96

bnx2x: Prevent panic during DMAE timeout · 9dcd9acd

Dmitry Kravkov authored Nov 17, 2013

If chip enters a recovery flow just after the driver issues a DMAE request
the DMAE will timeout. Current code will cause a bnx2x_panic() as a result,
which means interface will no longer be usable (regardless of the recovery
results), as bnx2x_panic() is irreversible for the driver.

As this is a possible flow, the panic should be reached only when driver
is compiled with STOP_ON_ERROR.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9dcd9acd

bnx2x: Clean the sp rtnl task upon unload · a0d307b2

Dmitry Kravkov authored Nov 17, 2013

While unloading, bnx2x needs to clean the sp_rtnl_state to prevent
configuration made before the unload to be applied afterwards with
stale values.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a0d307b2

ipv6: Fix inet6_init() cleanup order · eca42aaf

Vlad Yasevich authored Nov 16, 2013

Commit 6d0bfe22
	net: ipv6: Add IPv6 support to the ping socket

introduced a change in the cleanup logic of inet6_init and
has a bug in that ipv6_packet_cleanup() may not be called.
Fix the cleanup ordering.

CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
CC: Lorenzo Colitti <lorenzo@google.com>
CC: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

eca42aaf

genetlink: rename shadowed variable · 029b234f

Johannes Berg authored Nov 18, 2013

Sparse pointed out that the new flags variable I had added
shadowed an existing one, rename the new one to avoid that,
making the code clearer.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

029b234f

inet: prevent leakage of uninitialized memory to user in recv syscalls · bceaa902

Hannes Frederic Sowa authored Nov 18, 2013

Only update *addr_len when we actually fill in sockaddr, otherwise we
can return uninitialized memory from the stack to the caller in the
recvfrom, recvmmsg and recvmsg syscalls. Drop the the (addr_len == NULL)
checks because we only get called with a valid addr_len pointer either
from sock_common_recvmsg or inet_recvmsg.

If a blocking read waits on a socket which is concurrently shut down we
now return zero and set msg_msgnamelen to 0.
Reported-by: mpb <mpb.mail@gmail.com>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bceaa902

net: ipv6: ndisc: Fix warning when CONFIG_SYSCTL=n · bcd081a3

Fabio Estevam authored Nov 16, 2013

When CONFIG_SYSCTL=n the following build warning happens:

net/ipv6/ndisc.c:1730:1: warning: label 'out' defined but not used [-Wunused-label]

The 'out' label is only used when CONFIG_SYSCTL=y, so move it inside the
'ifdef CONFIG_SYSCTL' block.
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bcd081a3

xen-netfront: fix missing rx_refill_timer when allocate memory failed · fdcf7765

Ma JieYue authored Nov 15, 2013

There was a bug in xennet_alloc_rx_buffers, when allocating page or
sk_buff failed, and at the same time rx_batch queue not empty,
the rx_refill_timer timer won't be scheduled. If finally the remaining
request buffers in rx ring less than what backend driver expected,
the backend driver would think of rx ring as full and start dropping packets.
In such situation, there is no way for the netfront driver to recover
automatically, so that the device can not work properly.

The patch fixes the problem by always scheduling rx_refill_timer timer when
alloc_page or __netdev_alloc_skb fails, no matter whether rx_batch queue is
empty or not. It ensures that the rx ring request buffers will finally meet
the backend needs.
Signed-off-by: Ma JieYue <jieyue.majy@alibaba-inc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fdcf7765

16 Nov, 2013 5 commits

net: Handle CHECKSUM_COMPLETE more adequately in pskb_trim_rcsum(). · 018c5bba

David S. Miller authored Nov 15, 2013

Currently pskb_trim_rcsum() just balks on CHECKSUM_COMPLETE packets
and remarks them as CHECKSUM_NONE, forcing a software checksum
validation later.

We have all of the mechanics available to fixup the skb->csum value,
even for complicated fragmented packets, via the helpers
skb_checksum() and csum_sub().

So just use them.

Based upon a suggestion by Herbert Xu.
Signed-off-by: David S. Miller <davem@davemloft.net>

018c5bba

pkt_sched: fq: fix pacing for small frames · f52ed899

Eric Dumazet authored Nov 15, 2013

For performance reasons, sch_fq tried hard to not setup timers for every
sent packet, using a quantum based heuristic : A delay is setup only if
the flow exhausted its credit.

Problem is that application limited flows can refill their credit
for every queued packet, and they can evade pacing.

This problem can also be triggered when TCP flows use small MSS values,
as TSO auto sizing builds packets that are smaller than the default fq
quantum (3028 bytes)

This patch adds a 40 ms delay to guard flow credit refill.

Fixes: afe4fd06 ("pkt_sched: fq: Fair Queue packet scheduler")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f52ed899

pkt_sched: fq: warn users using defrate · 65c5189a

Eric Dumazet authored Nov 15, 2013

Commit 7eec4174 ("pkt_sched: fq: fix non TCP flows pacing")
obsoleted TCA_FQ_FLOW_DEFAULT_RATE without notice for the users.

Suggested by David Miller
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

65c5189a

genetlink: unify registration functions · 568508aa

Johannes Berg authored Nov 15, 2013

Now that the ops assignment is just two variables rather than a
long list iteration etc., there's no reason to separately export
__genl_register_family() and __genl_register_family_with_ops().

Unify the two functions into __genl_register_family() and make
genl_register_family_with_ops() call it after assigning the ops.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

568508aa

MAINTAINERS: Update Pegasus and RTL8150 repositories; · 052e3128

Petko Manolov authored Nov 15, 2013

The diff is against latest 'net' repository;
Signed-off-by: Petko Manolov <petkan@nucleusys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

052e3128

15 Nov, 2013 7 commits

net: ethernet: ti/cpsw: do not crash on single-MAC machines during resume · 1e7a2e21

Daniel Mack authored Nov 15, 2013

During resume, use for_each_slave to walk the slaves of the cpsw, and
soft-reset each of them. This prevents oopses if there is only one
slave configured.
Signed-off-by: Daniel Mack <zonque@gmail.com>
Acked-by: Mugunthan V N <mugunthanvnm@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1e7a2e21

Merge branch 'macvlan' · 82c80e9d

David S. Miller authored Nov 15, 2013

Michal Kubecek says:

====================
macvlan: disable LRO on lowerdev instead of a macvlan

A customer of ours encountered a problem with LRO on an ixgbe network
card. Analysis showed that it was a known conflict of forwarding and LRO
but the forwarding was enabled in an LXC container where only a macvlan
was, not the ethernet device itself.

I believe the solution is exactly the same as what we do for "normal"
(802.1q) VLAN devices: if dev_disable_lro() is called for such device,
LRO is disabled on the underlying "real" device instead.

v2: adapt to changes merged from net-next

v3: use BUG() in macvlan_dev_real_dev() if compiled without macvlan
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

82c80e9d

macvlan: disable LRO on lower device instead of macvlan · 529d0489

Michal Kubeček authored Nov 15, 2013

A macvlan device has always LRO disabled so that calling
dev_disable_lro() on it does nothing. If we need to disable LRO
e.g. because

  - the macvlan device is inserted into a bridge
  - IPv6 forwarding is enabled for it
  - it is in a different namespace than lowerdev and IPv4
    forwarding is enabled in it

we need to disable LRO on its underlying device instead (as we
do for 802.1q VLAN devices).

v2: use newly introduced netif_is_macvlan()
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

529d0489

macvlan: introduce macvlan_dev_real_dev() helper function · be9eac48

Michal Kubeček authored Nov 15, 2013

Introduce helper function macvlan_dev_real_dev which returns the
underlying device of a macvlan device, similar to vlan_dev_real_dev()
for 802.1q VLAN devices.

v2: IFF_MACVLAN flag and equivalent of is_macvlan_dev() were
introduced in the meantime

v3: do BUG() if compiled without macvlan support
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

be9eac48

bonding: add ip checks when store ip target · f9de11a1

Wang Weidong authored Nov 15, 2013

I met a Bug when I add ip target with the wrong ip address:

echo +500.500.500.500 > /sys/class/net/bond0/bonding/arp_ip_target

the wrong ip address will transfor to 245.245.245.244 and add
to the ip target success, it is uncorrect, so I add checks to avoid
adding wrong address.

The in4_pton() will set wrong ip address to 0.0.0.0, it will return by
the next check and will not add to ip target.

v2
According Veaceslav's opinion, simplify the code.

v3
According Veaceslav's opinion, add broadcast check and make a micro
definition to package it.

v4
Solve the problem of the format which David point out.
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f9de11a1

6lowpan: Uncompression of traffic class field was incorrect · 1188f054

Jukka Rissanen authored Nov 13, 2013

If priority/traffic class field in IPv6 header is set (seen when
using ssh), the uncompression sets the TC and Flow fields incorrectly.

Example:

This is IPv6 header of a sent packet. Note the priority/TC (=1) in
the first byte.

00000000: 61 00 00 00 00 2c 06 40 fe 80 00 00 00 00 00 00
00000010: 02 02 72 ff fe c6 42 10 fe 80 00 00 00 00 00 00
00000020: 02 1e ab ff fe 4c 52 57

This gets compressed like this in the sending side

00000000: 72 31 04 06 02 1e ab ff fe 4c 52 57 ec c2 00 16
00000010: aa 2d fe 92 86 4e be c6 ....

In the receiving end, the packet gets uncompressed to this
IPv6 header

00000000: 60 06 06 02 00 2a 1e 40 fe 80 00 00 00 00 00 00
00000010: 02 02 72 ff fe c6 42 10 fe 80 00 00 00 00 00 00
00000020: ab ff fe 4c 52 57 ec c2

First four bytes are set incorrectly and we have also lost
two bytes from destination address.

The fix is to switch the case values in switch statement
when checking the TC field.
Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1188f054

tipc: fix dereference before check warning · 3db0a197

Erik Hugne authored Nov 13, 2013

This fixes the following Smatch warning:
net/tipc/link.c:2364 tipc_link_recv_fragment()
    warn: variable dereferenced before check '*head' (see line 2361)

A null pointer might be passed to skb_try_coalesce if
a malicious sender injects orphan fragments on a link.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3db0a197

14 Nov, 2013 10 commits

ipv4: fix possible seqlock deadlock · c9e90429

Eric Dumazet authored Nov 14, 2013

ip4_datagram_connect() being called from process context,
it should use IP_INC_STATS() instead of IP_INC_STATS_BH()
otherwise we can deadlock on 32bit arches, or get corruptions of
SNMP counters.

Fixes: 584bdf8c ("[IPV4]: Fix "ipOutNoRoutes" counter error for TCP and UDP")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c9e90429

net/hsr: Fix possible leak in 'hsr_get_node_status()' · 84a035f6

Geyslan G. Bem authored Nov 14, 2013

If 'hsr_get_node_data()' returns error, going directly to 'fail' label
doesn't free the memory pointed by 'skb_out'.
Signed-off-by: Geyslan G. Bem <geyslan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

84a035f6

Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless · 8422d1f1

David S. Miller authored Nov 14, 2013

John W. Linville says:

====================
pull request: wireless 2013-11-14

Please pull this batch of fixes intended for the 3.13 stream!

Amitkumar Karwar offers a quartet of mwifiex fixes, including an
endian fix and three fixes for invalid memory access.

Avinash Patil trims the packet length value for packets received from
an SDIO interface.

Colin Ian King fixes a NULL pointer dereference in the rtlwifi
efuse code.

Dan Carpenter cleans-up an mwifiex integer underflow, a potential
libertas oops, a memory corrupion bug in wcn36xx, and a locking issue
also in wcn36xx.

Dan Williams helps prism54 devices to avoid being misclassified as
Ethernet devices.

Felipe Pena fixes a couple of typo errors, one in rt2x00 and the
other in rtlwifi.

Janusz Dziedzic corrects a pair of DFS-related problems in ath9k.

Larry Finger patches three rtlwifi drivers to correctly report signal
strength even for an unassociated AP.

Mark Cave-Ayland rewrites some endian-illiterate packet type extraction
code in rtlwifi.

Stanislaw Gruszka addresses an rt2x00 regression related to setting
HT station WCID and AMPDU density parameters.

Sujith Manoharan corrects the initvals settings for AR9485.

Ujjal Roy patches an obscure bit of code in mwifiex that was using
the wrong definition of eth_hdr when briding patches in AP mode.

Wei Yongjun fixes a couple of bugs: one is a return code handling
bug in libertas; and, the other is a locking issue in wcn36xx.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

8422d1f1

virtio-net: mergeable buffer size should include virtio-net header · 5061de36

Michael Dalton authored Nov 14, 2013

Commit 2613af0e ("virtio_net: migrate mergeable rx buffers to page
frag allocators") changed the mergeable receive buffer size from PAGE_SIZE
to MTU-size. However, the merge buffer size does not take into account the
size of the virtio-net header. Consequently, packets that are MTU-size
will take two buffers intead of one (to store the virtio-net header),
substantially decreasing the throughput of MTU-size traffic due to TCP
window / SKB truesize effects.

This commit changes the mergeable buffer size to include the virtio-net
header. The buffer size is cacheline-aligned because skb_page_frag_refill
will not automatically align the requested size.

Benchmarks taken from an average of 5 netperf 30-second TCP_STREAM runs
between two QEMU VMs on a single physical machine. Each VM has two VCPUs and
vhost enabled. All VMs and vhost threads run in a single 4 CPU cgroup
cpuset, using cgroups to ensure that other processes in the system will not
be scheduled on the benchmark CPUs. Transmit offloads and mergeable receive
buffers are enabled, but guest_tso4 / guest_csum are explicitly disabled to
force MTU-sized packets on the receiver.

next-net trunk before 2613af0e (PAGE_SIZE buf): 3861.08Gb/s
net-next trunk (MTU 1500- packet uses two buf due to size bug): 4076.62Gb/s
net-next trunk (MTU 1480- packet fits in one buf): 6301.34Gb/s
net-next trunk w/ size fix (MTU 1500 - packet fits in one buf): 6445.44Gb/s
Suggested-by: Eric Northup <digitaleric@google.com>
Signed-off-by: Michael Dalton <mwdalton@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5061de36

connector: improved unaligned access error fix · 1ca1a4cf

Chris Metcalf authored Nov 14, 2013

In af3e095a, Erik Jacobsen fixed one type of unaligned access
bug for ia64 by converting a 64-bit write to use put_unaligned().
Unfortunately, since gcc will convert a short memset() to a series
of appropriately-aligned stores, the problem is now visible again
on tilegx, where the memset that zeros out proc_event is converted
to three 64-bit stores, causing an unaligned access panic.

A better fix for the original problem is to ensure that proc_event
is aligned to 8 bytes here. We can do that relatively easily by
arranging to start the struct cn_msg aligned to 8 bytes and then
offset by 4 bytes. Doing so means that the immediately following
proc_event structure is then correctly aligned to 8 bytes.

The result is that the memset() stores are now aligned, and as an
added benefit, we can remove the put_unaligned() calls in the code.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1ca1a4cf

pkt_sched: fq: change classification of control packets · 2abc2f07

Maciej Żenczykowski authored Nov 14, 2013

Initial sch_fq implementation copied code from pfifo_fast to classify
a packet as a high prio packet.

This clashes with setups using PRIO with say 7 bands, as one of the
band could be incorrectly (mis)classified by FQ.

Packets would be queued in the 'internal' queue, and no pacing ever
happen for this special queue.

Fixes: afe4fd06 ("pkt_sched: fq: Fair Queue packet scheduler")
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2abc2f07

alx: Reset phy speed after resume · b54629e2

hahnjo authored Nov 12, 2013

This fixes bug 62491 (https://bugzilla.kernel.org/show_bug.cgi?id=62491).
After resuming some users got the following error flooding the kernel log:
alx 0000:02:00.0: invalid PHY speed/duplex: 0xffff
Signed-off-by: Jonas Hahnfeld <linux@hahnjo.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

b54629e2

Merge branch 'genetlink' · 4fb09a87

David S. Miller authored Nov 14, 2013

Johannes Berg says:

====================
genetlink: reduce ops size and complexity (v2)

As before - reduce the complexity and data/code size of genetlink ops
by making them an array rather than a linked list. Most users already
use an array thanks to genl_register_family_with_ops(), so convert the
remaining ones allowing us to get rid of the list head in each op.

Also make them const, this just makes sense at that point and the security
people like making function pointers const as well :-)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

4fb09a87

genetlink: make genl_ops flags a u8 and move to end · 3f5ccd06

Johannes Berg authored Nov 14, 2013

To save some space in the struct on 32-bit systems,
make the flags a u8 (only 4 bits are used) and also
move them to the end of the struct.

This has no impact on 64-bit systems as alignment of
the struct in an array uses up the space anyway.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3f5ccd06

genetlink: make all genl_ops users const · 4534de83

Johannes Berg authored Nov 14, 2013

Now that genl_ops are no longer modified in place when
registering, they can be made const. This patch was done
mostly with spatch:

@@
identifier ops;
@@
+const
 struct genl_ops ops[] = {
 ...
 };

(except the struct thing in net/openvswitch/datapath.c)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4534de83