Commits · 715dc1f342713816d1be1c37643a2c9e6ee181a7 · nexedi / linux

03 May, 2012 18 commits

net: Fix truesize accounting in skb_gro_receive() · 715dc1f3

Eric Dumazet authored May 02, 2012

GRO is very optimistic in skb truesize estimates, only taking into
account the used part of fragments.

Be conservative, and use more precise computation, so that bloated GRO
skbs can be collapsed eventually.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

715dc1f3

ixgbe: Fix use after free on module remove · af94bf6d

Alexander Duyck authored May 02, 2012

While testing the TCP changes I had to fix an issue in order to be able to
load and unload the module.

The recent patch that added thermal sensor support added a use after free
bug on module unload with an 82598 adapter in the system.  To resolve the
issue I have updated the code so that when we free the info_kobj we set it
back to NULL.

I suspect there are likely other bugs present, but I will leave that for
another patch that can undergo more testing.

I am submitting this directly to net-next since this fixes a fairly serious
bug that will lock up the ixgbe module until the system is rebooted.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

af94bf6d

tcp: move stats merge to the end of tcp_try_coalesce · 34a802a5

Alexander Duyck authored May 02, 2012

This change cleans up the last bits of tcp_try_coalesce so that we only
need one goto which jumps to the end of the function.  The idea is to make
the code more readable by putting things in a linear order so that we start
execution at the top of the function, and end it at the bottom.

I also made a slight tweak to the code for handling frags when we are a
clone.  Instead of making it an if (clone) loop else nr_frags = 0 I changed
the logic so that if (!clone) we just set the number of frags to 0 which
disables the for loop anyway.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

34a802a5

tcp: Move code related to head frag in tcp_try_coalesce · 57b55a7e

Alexander Duyck authored May 02, 2012

This change reorders the code related to the use of an skb->head_frag so it
is placed before we check the rest of the frags.  This allows the code to
read more linearly instead of like some sort of loop.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

57b55a7e

tcp: Fix truesize accounting in tcp_try_coalesce · c73c3d9c

Alexander Duyck authored May 02, 2012

This patch addresses several issues in the way we were tracking the
truesize in tcp_try_coalesce.

First it was using ksize which prevents us from having a 0 sized head frag
and getting a usable result.  To resolve that this patch uses the end
pointer which is set based off either ksize, or the frag_size supplied in
build_skb.  This allows us to compute the original truesize of the entire
buffer and remove that value leaving us with just what was added as pages.

The second issue was the use of skb->len if there is a mergeable head frag.
We should only need to remove the size of an data aligned sk_buff from our
current skb->truesize to compute the delta for a buffer with a reused head.
By using skb->len the value of truesize was being artificially reduced
which means that head frags could use more memory than buffers using
standard allocations.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c73c3d9c

net: Add missing linux/prefetch.h include to net/core/sock.c · 8c1ae10d
David S. Miller authored May 03, 2012
```
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
```
8c1ae10d

net: Stop decapitating clones that have a head_frag · 2996d31f

Alexander Duyck authored May 02, 2012

This change is meant ot prevent stealing the skb->head to use as a page in
the event that the skb->head was cloned.  This allows the other clones to
track each other via shinfo->dataref.

Without this we break down to two methods for tracking the reference count,
one being dataref, the other being the page count.  As a result it becomes
difficult to track how many references there are to skb->head.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2996d31f

net: implement tcp coalescing in tcp_queue_rcv() · b081f85c

Eric Dumazet authored May 02, 2012

Extend tcp coalescing implementing it from tcp_queue_rcv(), the main
receiver function when application is not blocked in recvmsg().

Function tcp_queue_rcv() is moved a bit to allow its call from
tcp_data_queue()

This gives good results especially if GRO could not kick, and if skb
head is a fragment.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b081f85c

net: take care of cloned skbs in tcp_try_coalesce() · 923dd347

Eric Dumazet authored May 02, 2012

Before stealing fragments or skb head, we must make sure skbs are not
cloned.

Alexander was worried about destination skb being cloned : In bridge
setups, a driver could be fooled if skb->data_len would not match skb
nr_frags.

If source skb is cloned, we must take references on pages instead.

Bug happened using tcpdump (if not using mmap())

Introduce kfree_skb_partial() helper to cleanup code.
Reported-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

923dd347

be2net: Fix EEH error reset before a flash dump completes · eeb7fc7b

Somnath Kotur authored May 02, 2012

An EEH error can cause the FW to trigger a flash debug dump.
Resetting the card while flash dump is in progress can cause it not to recover.
Wait for it to finish before letting EEH flow to reset the card.
Signed-off-by: Sathya Perla <Sathya.Perla@emulex.com>
Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

eeb7fc7b

be2net: Record receive queue index in skb to aid RPS. · aaa6daec

Somnath Kotur authored May 02, 2012

Signed-off-by: Sarveshwar Bandi <Sarveshwar.Bandi@emulex.com>
Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aaa6daec

be2net: Fix to apply duplex value as unknown when link is down. · 682256db

Somnath Kotur authored May 02, 2012

Suggested-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Sarveshwar Bandi <sarveshwar.bandi@emulex.com>
Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

682256db

be2net: Fix to not set link speed for disabled functions of a UMC card · 22ca7a6e

Somnath Kotur authored May 02, 2012

This renders the interface view somewhat inconsistent from the Host OS POV
considering the rest of the interfaces are showing their respective speeds
based on the bandwidth assigned to them.
Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

22ca7a6e

tcp: early retransmit: delayed fast retransmit · 750ea2ba

Yuchung Cheng authored May 02, 2012

Implementing the advanced early retransmit (sysctl_tcp_early_retrans==2).
Delays the fast retransmit by an interval of RTT/4. We borrow the
RTO timer to implement the delay. If we receive another ACK or send
a new packet, the timer is cancelled and restored to original RTO
value offset by time elapsed.  When the delayed-ER timer fires,
we enter fast recovery and perform fast retransmit.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

750ea2ba

tcp: early retransmit · eed530b6

Yuchung Cheng authored May 02, 2012

This patch implements RFC 5827 early retransmit (ER) for TCP.
It reduces DUPACK threshold (dupthresh) if outstanding packets are
less than 4 to recover losses by fast recovery instead of timeout.

While the algorithm is simple, small but frequent network reordering
makes this feature dangerous: the connection repeatedly enter
false recovery and degrade performance. Therefore we implement
a mitigation suggested in the appendix of the RFC that delays
entering fast recovery by a small interval, i.e., RTT/4. Currently
ER is conservative and is disabled for the rest of the connection
after the first reordering event. A large scale web server
experiment on the performance impact of ER is summarized in
section 6 of the paper "Proportional Rate Reduction for TCP”,
IMC 2011. http://conferences.sigcomm.org/imc/2011/docs/p155.pdf

Note that Linux has a similar feature called THIN_DUPACK. The
differences are THIN_DUPACK do not mitigate reorderings and is only
used after slow start. Currently ER is disabled if THIN_DUPACK is
enabled. I would be happy to merge THIN_DUPACK feature with ER if
people think it's a good idea.

ER is enabled by sysctl_tcp_early_retrans:
  0: Disables ER

  1: Reduce dupthresh to packets_out - 1 when outstanding packets < 4.

  2: (Default) reduce dupthresh like mode 1. In addition, delay
     entering fast recovery by RTT/4.

Note: mode 2 is implemented in the third part of this patch series.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

eed530b6

tcp: early retransmit: tcp_enter_recovery() · 1fbc3405

Yuchung Cheng authored May 02, 2012

This a prepartion patch that refactors the code to enter recovery
into a new function tcp_enter_recovery(). It's needed to implement
the delayed fast retransmit in ER.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1fbc3405

net/pasemi: fix compiler warning · 5c6239c8

Stephen Rothwell authored May 03, 2012

Fix this compiler warning (on PowerPC) by not marking a parameter as
const:

drivers/net/ethernet/pasemi/pasemi_mac.c: In function 'pasemi_mac_replenish_rx_ring':
drivers/net/ethernet/pasemi/pasemi_mac.c:646:3: warning: passing argument 1 of 'netdev_alloc_skb' discards qualifiers from pointer target type
include/linux/skbuff.h:1706:31: note: expected 'struct net_device *' but argument is of type 'const struct net_device *'

Cc: Olof Johansson <olof@lixom.net>
Cc: Pradeep A. Dalvi <netdev@pradeepdalvi.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

5c6239c8

bnx2x: fix handling single MSIX mode for 57710/57711 · 69c326b3

Dmitry Kravkov authored May 02, 2012

commit 30a5de77 added
ability to use single MSI-X vector, but lack proper
handling for 57710/57711 HW
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

69c326b3

02 May, 2012 7 commits

ixgbe: Reset max_vfs to zero when user request is out of range · 6b42a9c5

Greg Rose authored Apr 17, 2012

If the user request for the number of VFs in the max_vfs parameter is
out of range then reset the value to the default value of zero. This
makes the behavior of the ixgbe driver the same as for the igb driver.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Robert Garrett <robertx.e.garrett@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

6b42a9c5

ixgbe: Deny MACVLAN requests from VFs with admin set MAC · 2ee7065f

Greg Rose authored Mar 24, 2012

If the host VMM administrator has set the virtual function device's
MAC address then also deny VF requests for MACVLAN filters.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Garrett, Robert <robertx.e.garrett@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

2ee7065f

ixgbe: add hwmon interface to export thermal data · 3ca8bc6d

Don Skidmore authored Apr 12, 2012

Some of our adapters have thermal data available, this patch exports
this data via hwmon sysfs interface.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

3ca8bc6d

ixgbe: add support functions to access thermal data · e1ea9158

Don Skidmore authored Feb 17, 2012

Some 82599 adapters contain thermal data that we can get to via
an i2c interface.  These functions provide support to get at that
data.  A following patch will export this data.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1ea9158

e1000e: fix .ndo_set_rx_mode for 82579 · 69e1e019

Bruce Allan authored Apr 14, 2012

Secondary unicast and multicast addresses are added to the Receive
Address registers (RAR) for most parts supported by the driver. For
82579, there is only one actual RAR and a number of Shared Receive Address
registers (SHRAR) that are shared among the driver and f/w which can be
reserved and write-protected by the f/w. On this device, use the SHRARs
that are not taken by f/w for the additional addresses.

Add a MAC ops function pointer infrastructure (similar to other MAC
operations in the driver) for setting RARs, introduce a new rar_set
function for 82579 and convert the existing code that sets RARs on other
devices to a generic rar_set function.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

69e1e019

e1000e: PHY initialization flow changes for 82577/8/9 · cb17aab9

Bruce Allan authored Apr 13, 2012

The PHY initialization flows and assorted workarounds for 82577/8/9 done
during driver load and resume from Sx should be the same yet they are not.
Combine the current flows/workarounds into a common set of functions that
are called during the different code paths.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

cb17aab9

e1000e: workaround EEPROM configuration change on 82579 · 62bc813e

Bruce Allan authored Mar 20, 2012

An update to the EEPROM on 82579 will extend a delay in hardware to fix an
issue with WoL not working after a G3->S5 transition which is unrelated to
the driver. However, this extended delay conflicts with nominal operation
of the device when it is initialized by the driver and after every reset
of the hardware (i.e. the driver starts configuring the device before the
hardware is done with it's own configuration work). The workaround for
when the driver is in control of the device is to tell the hardware after
every reset the configuration delay should be the original shorter one.

Some pre-existing variables are renamed generically to be re-used with
new register accesses.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

62bc813e

01 May, 2012 15 commits

netem: add ECN capability · e4ae004b

Eric Dumazet authored Apr 30, 2012

Add ECN (Explicit Congestion Notification) marking capability to netem

tc qdisc add dev eth0 root netem drop 0.5 ecn

Instead of dropping packets, try to ECN mark them.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

e4ae004b

net: skb_peek()/skb_peek_tail() cleanups · 18d07000

Eric Dumazet authored Apr 30, 2012

remove useless casts and rename variables for less confusion.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

18d07000

net: add a prefetch in socket backlog processing · e4cbb02a

Eric Dumazet authored Apr 30, 2012

TCP or UDP stacks have big enough latencies that prefetching next
pointer is worth it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e4cbb02a

l2tp: let iproute2 create L2TPv3 IP tunnels using IPv6 · 5dac94e1

James Chapman authored Apr 29, 2012

The netlink API lets users create unmanaged L2TPv3 tunnels using
iproute2. Until now, a request to create an unmanaged L2TPv3 IP
encapsulation tunnel over IPv6 would be rejected with
EPROTONOSUPPORT. Now that l2tp_ip6 implements sockets for L2TP IP
encapsulation over IPv6, we can add support for that tunnel type.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5dac94e1

l2tp: introduce L2TPv3 IP encapsulation support for IPv6 · a32e0eec

Chris Elston authored Apr 29, 2012

L2TPv3 defines an IP encapsulation packet format where data is carried
directly over IP (no UDP). The kernel already has support for L2TP IP
encapsulation over IPv4 (l2tp_ip). This patch introduces support for
L2TP IP encapsulation over IPv6.

The implementation is derived from ipv6/raw and ipv4/l2tp_ip.
Signed-off-by: Chris Elston <celston@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a32e0eec

ipv6: Export ipv6 functions for use by other protocols · a495f836

Chris Elston authored Apr 29, 2012

For implementing other protocols on top of IPv6, such as L2TPv3's IP
encapsulation over ipv6, we'd like to call some IPv6 functions which
are not currently exported. This patch exports them.
Signed-off-by: Chris Elston <celston@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a495f836

l2tp: netlink api for l2tpv3 ipv6 unmanaged tunnels · f9bac8df

Chris Elston authored Apr 29, 2012

This patch adds support for unmanaged L2TPv3 tunnels over IPv6 using
the netlink API. We already support unmanaged L2TPv3 tunnels over
IPv4. A patch to iproute2 to make use of this feature will be
submitted separately.
Signed-off-by: Chris Elston <celston@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f9bac8df

l2tp: show IPv6 addresses in l2tp debugfs file · 2121c3f5

Chris Elston authored Apr 29, 2012

If an L2TP tunnel uses IPv6, make sure the l2tp debugfs file shows the
IPv6 address correctly.
Signed-off-by: Chris Elston <celston@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2121c3f5

l2tp: pppol2tp_connect() handles ipv6 sockaddr variants · b79585f5

James Chapman authored Apr 29, 2012

Userspace uses connect() to associate a pppol2tp socket with a tunnel
socket. This needs to allow the caller to supply the new IPv6
sockaddr_pppol2tp structures if IPv6 is used.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b79585f5

pppox: Replace __attribute__((packed)) in if_pppox.h · 9d4ec1ae

James Chapman authored Apr 29, 2012

Checkpatch warns about the use of __attribute__((packed)). So use the
recommended __packed syntax instead.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9d4ec1ae

l2tp: remove unused stats from l2tp_ip socket · c8657fd5

James Chapman authored Apr 29, 2012

The l2tp_ip socket currently maintains packet/byte stats in its
private socket structure. But these counters aren't exposed to
userspace and so serve no purpose. The counters were also
smp-unsafe. So this patch just gets rid of the stats.

While here, change a couple of internal __u32 variables to u32.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c8657fd5

l2tp: Use ip4_datagram_connect() in l2tp_ip_connect() · de3c7a18

James Chapman authored Apr 29, 2012

Cleanup the l2tp_ip code to make use of an existing ipv4 support function.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

de3c7a18

l2tp: fix locking of 64-bit counters for smp · 5de7aee5

James Chapman authored Apr 29, 2012

L2TP uses 64-bit counters but since these are not updated atomically,
we need to make them safe for smp. This patch addresses that.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5de7aee5

atl1c: remove PHY polling from atl1c_change_mtu · 80bcb423

Huang, Xiong authored Apr 30, 2012

PHY polling code for FPGA is considered in every MDIO R/W API.
no need to add additional code to atl1c_change_mtu.
Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: David Liu <dwliu@qca.qaulcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

80bcb423

atl1c: Disable L0S when no cable link · 4fc36352

Huang, Xiong authored Apr 30, 2012

L0S might be unstable if no cable link, only enable it when link up.
Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4fc36352