Commits · eed530b6c67624db3f2cf477bac7c4d005d8f7ba · nexedi / linux

03 May, 2012 4 commits

Yuchung Cheng authored May 02, 2012

This patch implements RFC 5827 early retransmit (ER) for TCP.
It reduces DUPACK threshold (dupthresh) if outstanding packets are
less than 4 to recover losses by fast recovery instead of timeout.

While the algorithm is simple, small but frequent network reordering
makes this feature dangerous: the connection repeatedly enter
false recovery and degrade performance. Therefore we implement
a mitigation suggested in the appendix of the RFC that delays
entering fast recovery by a small interval, i.e., RTT/4. Currently
ER is conservative and is disabled for the rest of the connection
after the first reordering event. A large scale web server
experiment on the performance impact of ER is summarized in
section 6 of the paper "Proportional Rate Reduction for TCP”,
IMC 2011. http://conferences.sigcomm.org/imc/2011/docs/p155.pdf

Note that Linux has a similar feature called THIN_DUPACK. The
differences are THIN_DUPACK do not mitigate reorderings and is only
used after slow start. Currently ER is disabled if THIN_DUPACK is
enabled. I would be happy to merge THIN_DUPACK feature with ER if
people think it's a good idea.

ER is enabled by sysctl_tcp_early_retrans:
  0: Disables ER

  1: Reduce dupthresh to packets_out - 1 when outstanding packets < 4.

  2: (Default) reduce dupthresh like mode 1. In addition, delay
     entering fast recovery by RTT/4.

Note: mode 2 is implemented in the third part of this patch series.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

eed530b6

tcp: early retransmit: tcp_enter_recovery() · 1fbc3405

Yuchung Cheng authored May 02, 2012

This a prepartion patch that refactors the code to enter recovery
into a new function tcp_enter_recovery(). It's needed to implement
the delayed fast retransmit in ER.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1fbc3405

net/pasemi: fix compiler warning · 5c6239c8

Stephen Rothwell authored May 03, 2012

Fix this compiler warning (on PowerPC) by not marking a parameter as
const:

drivers/net/ethernet/pasemi/pasemi_mac.c: In function 'pasemi_mac_replenish_rx_ring':
drivers/net/ethernet/pasemi/pasemi_mac.c:646:3: warning: passing argument 1 of 'netdev_alloc_skb' discards qualifiers from pointer target type
include/linux/skbuff.h:1706:31: note: expected 'struct net_device *' but argument is of type 'const struct net_device *'

Cc: Olof Johansson <olof@lixom.net>
Cc: Pradeep A. Dalvi <netdev@pradeepdalvi.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

5c6239c8

bnx2x: fix handling single MSIX mode for 57710/57711 · 69c326b3

Dmitry Kravkov authored May 02, 2012

commit 30a5de77 added
ability to use single MSI-X vector, but lack proper
handling for 57710/57711 HW
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

69c326b3

02 May, 2012 7 commits

ixgbe: Reset max_vfs to zero when user request is out of range · 6b42a9c5

Greg Rose authored Apr 17, 2012

If the user request for the number of VFs in the max_vfs parameter is
out of range then reset the value to the default value of zero. This
makes the behavior of the ixgbe driver the same as for the igb driver.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Robert Garrett <robertx.e.garrett@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

6b42a9c5

ixgbe: Deny MACVLAN requests from VFs with admin set MAC · 2ee7065f

Greg Rose authored Mar 24, 2012

If the host VMM administrator has set the virtual function device's
MAC address then also deny VF requests for MACVLAN filters.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Garrett, Robert <robertx.e.garrett@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

2ee7065f

ixgbe: add hwmon interface to export thermal data · 3ca8bc6d

Don Skidmore authored Apr 12, 2012

Some of our adapters have thermal data available, this patch exports
this data via hwmon sysfs interface.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

3ca8bc6d

ixgbe: add support functions to access thermal data · e1ea9158

Don Skidmore authored Feb 17, 2012

Some 82599 adapters contain thermal data that we can get to via
an i2c interface.  These functions provide support to get at that
data.  A following patch will export this data.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1ea9158

e1000e: fix .ndo_set_rx_mode for 82579 · 69e1e019

Bruce Allan authored Apr 14, 2012

Secondary unicast and multicast addresses are added to the Receive
Address registers (RAR) for most parts supported by the driver. For
82579, there is only one actual RAR and a number of Shared Receive Address
registers (SHRAR) that are shared among the driver and f/w which can be
reserved and write-protected by the f/w. On this device, use the SHRARs
that are not taken by f/w for the additional addresses.

Add a MAC ops function pointer infrastructure (similar to other MAC
operations in the driver) for setting RARs, introduce a new rar_set
function for 82579 and convert the existing code that sets RARs on other
devices to a generic rar_set function.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

69e1e019

e1000e: PHY initialization flow changes for 82577/8/9 · cb17aab9

Bruce Allan authored Apr 13, 2012

The PHY initialization flows and assorted workarounds for 82577/8/9 done
during driver load and resume from Sx should be the same yet they are not.
Combine the current flows/workarounds into a common set of functions that
are called during the different code paths.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

cb17aab9

e1000e: workaround EEPROM configuration change on 82579 · 62bc813e

Bruce Allan authored Mar 20, 2012

An update to the EEPROM on 82579 will extend a delay in hardware to fix an
issue with WoL not working after a G3->S5 transition which is unrelated to
the driver. However, this extended delay conflicts with nominal operation
of the device when it is initialized by the driver and after every reset
of the hardware (i.e. the driver starts configuring the device before the
hardware is done with it's own configuration work). The workaround for
when the driver is in control of the device is to tell the hardware after
every reset the configuration delay should be the original shorter one.

Some pre-existing variables are renamed generically to be re-used with
new register accesses.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

62bc813e

01 May, 2012 29 commits

netem: add ECN capability · e4ae004b

Eric Dumazet authored Apr 30, 2012

Add ECN (Explicit Congestion Notification) marking capability to netem

tc qdisc add dev eth0 root netem drop 0.5 ecn

Instead of dropping packets, try to ECN mark them.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

e4ae004b

net: skb_peek()/skb_peek_tail() cleanups · 18d07000

Eric Dumazet authored Apr 30, 2012

remove useless casts and rename variables for less confusion.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

18d07000

net: add a prefetch in socket backlog processing · e4cbb02a

Eric Dumazet authored Apr 30, 2012

TCP or UDP stacks have big enough latencies that prefetching next
pointer is worth it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e4cbb02a

l2tp: let iproute2 create L2TPv3 IP tunnels using IPv6 · 5dac94e1

James Chapman authored Apr 29, 2012

The netlink API lets users create unmanaged L2TPv3 tunnels using
iproute2. Until now, a request to create an unmanaged L2TPv3 IP
encapsulation tunnel over IPv6 would be rejected with
EPROTONOSUPPORT. Now that l2tp_ip6 implements sockets for L2TP IP
encapsulation over IPv6, we can add support for that tunnel type.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5dac94e1

l2tp: introduce L2TPv3 IP encapsulation support for IPv6 · a32e0eec

Chris Elston authored Apr 29, 2012

L2TPv3 defines an IP encapsulation packet format where data is carried
directly over IP (no UDP). The kernel already has support for L2TP IP
encapsulation over IPv4 (l2tp_ip). This patch introduces support for
L2TP IP encapsulation over IPv6.

The implementation is derived from ipv6/raw and ipv4/l2tp_ip.
Signed-off-by: Chris Elston <celston@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a32e0eec

ipv6: Export ipv6 functions for use by other protocols · a495f836

Chris Elston authored Apr 29, 2012

For implementing other protocols on top of IPv6, such as L2TPv3's IP
encapsulation over ipv6, we'd like to call some IPv6 functions which
are not currently exported. This patch exports them.
Signed-off-by: Chris Elston <celston@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a495f836

l2tp: netlink api for l2tpv3 ipv6 unmanaged tunnels · f9bac8df

Chris Elston authored Apr 29, 2012

This patch adds support for unmanaged L2TPv3 tunnels over IPv6 using
the netlink API. We already support unmanaged L2TPv3 tunnels over
IPv4. A patch to iproute2 to make use of this feature will be
submitted separately.
Signed-off-by: Chris Elston <celston@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f9bac8df

l2tp: show IPv6 addresses in l2tp debugfs file · 2121c3f5

Chris Elston authored Apr 29, 2012

If an L2TP tunnel uses IPv6, make sure the l2tp debugfs file shows the
IPv6 address correctly.
Signed-off-by: Chris Elston <celston@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2121c3f5

l2tp: pppol2tp_connect() handles ipv6 sockaddr variants · b79585f5

James Chapman authored Apr 29, 2012

Userspace uses connect() to associate a pppol2tp socket with a tunnel
socket. This needs to allow the caller to supply the new IPv6
sockaddr_pppol2tp structures if IPv6 is used.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b79585f5

pppox: Replace __attribute__((packed)) in if_pppox.h · 9d4ec1ae

James Chapman authored Apr 29, 2012

Checkpatch warns about the use of __attribute__((packed)). So use the
recommended __packed syntax instead.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9d4ec1ae

l2tp: remove unused stats from l2tp_ip socket · c8657fd5

James Chapman authored Apr 29, 2012

The l2tp_ip socket currently maintains packet/byte stats in its
private socket structure. But these counters aren't exposed to
userspace and so serve no purpose. The counters were also
smp-unsafe. So this patch just gets rid of the stats.

While here, change a couple of internal __u32 variables to u32.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c8657fd5

l2tp: Use ip4_datagram_connect() in l2tp_ip_connect() · de3c7a18

James Chapman authored Apr 29, 2012

Cleanup the l2tp_ip code to make use of an existing ipv4 support function.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

de3c7a18

l2tp: fix locking of 64-bit counters for smp · 5de7aee5

James Chapman authored Apr 29, 2012

L2TP uses 64-bit counters but since these are not updated atomically,
we need to make them safe for smp. This patch addresses that.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5de7aee5

atl1c: remove PHY polling from atl1c_change_mtu · 80bcb423

Huang, Xiong authored Apr 30, 2012

PHY polling code for FPGA is considered in every MDIO R/W API.
no need to add additional code to atl1c_change_mtu.
Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: David Liu <dwliu@qca.qaulcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

80bcb423

atl1c: Disable L0S when no cable link · 4fc36352

Huang, Xiong authored Apr 30, 2012

L0S might be unstable if no cable link, only enable it when link up.
Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4fc36352

atl1c: do MAC-reset when PHY link down · 5e5c0964

Huang, Xiong authored Apr 30, 2012

There may be tx-skbs still pending in HW when PHY link down.
Reset MAC will make the DMA engine go to the start point.
and release all pending skbs.
Note: Reset MAC will clear any interrupt status and mask.
Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5e5c0964

atl1c: cancel task when interface closed · 0aa76ce3

Huang, Xiong authored Apr 30, 2012

common_task might be running while close routine is called,
wait/cancel it.
Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0aa76ce3

atl1c: enlarge L1 response waiting timer · f56fa567

Huang, Xiong authored Apr 30, 2012

The hardware incorrectly process L0S/L1 entrance if the chipset/root
response after specific/shorter timer and cause system hang.
Enlarge the timeout value to avoid this issue.
Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f56fa567

atl1c: refine mac address related code · 229e6b6e

Huang, Xiong authored Apr 30, 2012

On some platform with EEPROM/OTP existing, the BIOS could overwrite
a new MAC address for the NIC. so, the permanent mac address should
be from BIOS. the address is restored when driver removing.
Voltage raising isn't applicable for l1d.
Replace swab32 with htonl for big/little endian platform.
related Registers are refined as well.
Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

229e6b6e

atl1c: remove code of closing register writable attribution · e1192580

Huang, Xiong authored Apr 30, 2012

The Close-action is done by atl1c_reset_pcie, remove it from
atl1c_get_permanent_address.
Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1192580

atl1c: clear WoL status when reset pcie · 87eabe6b

Huang, Xiong authored Apr 30, 2012

WoL status is read-clear and should be cleared when in S0
status.
putting it in atl1c_reset_pcie is more suitable than
in atl1c_get_permanent_address.
Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

87eabe6b

atl1c: add PHY link event(up/down) patch · 903d7ce0

Huang, Xiong authored Apr 30, 2012

On some platforms the PHY settings need to change depending on the
cable link status to get better stability.
Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

903d7ce0

atl1c: add workaround for issue of bit INTX-disable for MSI interrupt · 7cb6a291

Huang, Xiong authored Apr 30, 2012

All supported devices have one issue that msi interrupt doesn't assert
if pci command register bit (PCI_COMMAND_INTX_DISABLE) is set.
Add workaround in drivers/pci/quirks.c
Signed-off-by: xiong <xiong@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7cb6a291

Merge branch 'tipc_net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux · b6d151bb
David S. Miller authored Apr 30, 2012

b6d151bb

bnx2x: remove some bloat · 1191cb83

Eric Dumazet authored Apr 27, 2012

Before doing skb->head_frag work on bnx2x driver, I found too much stuff
was inlined in bnx2x/bnx2x_cmn.h for no good reason and made my work not
very easy.

Move some big functions out of this include file to the respective .c
file.

A lot of inline keywords are not needed at all in this huge driver.

   text	   data	    bss	    dec	    hex	filename
 490083	   1270	     56	 491409	  77f91	bnx2x/bnx2x.ko.before
 484206	   1270	     56	 485532	  7689c	bnx2x/bnx2x.ko
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Eilon Greenstein <eilong@broadcom.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Matt Carlson <mcarlson@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1191cb83

pch_gbe: reprogram multicast address register on reset · d344c4f3

RongQing.Li authored Apr 27, 2012

The reset logic after a Rx FIFO overrun will clear the programmed
multicast addresses. This patch fixes the issue by reprogramming the
registers after the reset.

The commit eefc48b0 ("pch_gbe: reprogram multicast address register on
reset") tried to fix this problem, but it introduces unnecessary
codes. In fact, all multicast addresses have been saved in netdev->mc,
So we can call pch_gbe_set_multi() directly after reset_hw and
reset_rx.

This commit kills 50+ line codes

Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Takahiro Shimizu <tshimizu818@gmail.com>
Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d344c4f3

net: makes skb_splice_bits() aware of skb->head_frag · 1d0c0b32

Eric Dumazet authored Apr 27, 2012

__skb_splice_bits() can check if skb to be spliced has its skb->head
mapped to a page fragment, instead of a kmalloc() area.

If so we can avoid a copy of the skb head and get a reference on
underlying page.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Matt Carlson <mcarlson@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1d0c0b32

tcp: makes tcp_try_coalesce aware of skb->head_frag · 329033f6

Eric Dumazet authored Apr 27, 2012

TCP coalesce can check if skb to be merged has its skb->head mapped to a
page fragment, instead of a kmalloc() area.

We had to disable coalescing in this case, for performance reasons.

We 'upgrade' skb->head as a fragment in itself.

This reduces number of cache misses when user makes its copies, since a
less sk_buff are fetched.

This makes receive and ofo queues shorter and thus reduce cache line
misses in TCP stack.

This is a followup of patch "net: allow skb->head to be a page fragment"

Tested with tg3 nic, with GRO on or off. We can see "TCPRcvCoalesce"
counter being incremented.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Matt Carlson <mcarlson@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

329033f6

net: make GRO aware of skb->head_frag · d7e8883c

Eric Dumazet authored Apr 30, 2012

GRO can check if skb to be merged has its skb->head mapped to a page
fragment, instead of a kmalloc() area.

We 'upgrade' skb->head as a fragment in itself

This avoids the frag_list fallback, and permits to build true GRO skb
(one sk_buff and up to 16 fragments), using less memory.

This reduces number of cache misses when user makes its copy, since a
single sk_buff is fetched.

This is a followup of patch "net: allow skb->head to be a page fragment"
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Matt Carlson <mcarlson@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d7e8883c