Commits · fe62d001372388abb15a324148c913f9b43722a8 · Kirill Smelkov / linux

03 Jun, 2014 1 commit

ethtool: Replace ethtool_ops::{get,set}_rxfh_indir() with {get,set}_rxfh() · fe62d001

Ben Hutchings authored May 15, 2014

ETHTOOL_{G,S}RXFHINDIR and ETHTOOL_{G,S}RSSH should work for drivers
regardless of whether they expose the hash key, unless you try to
set a hash key for a driver that doesn't expose it.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

19 May, 2014 6 commits

ethtool, be2net: constify array pointer parameters to ethtool_ops::set_rxfh · 33cb0fa7

Ben Hutchings authored May 15, 2014

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>


ethtool: Disallow ETHTOOL_SRSSH with both indir table and hash key unchanged · 61d88c68

Ben Hutchings authored May 19, 2014

This would be a no-op, so there is no reason to request it.

This also allows conversion of the current implementations of
ethtool_ops::{get,set}_rxfh_indir to ethtool_ops::{get,set}_rxfh
with no change other than their parameters.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
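
A minimal sketch of the check described above, assuming that a key_size of zero means "hash key unchanged" and that the no-change marker for the indirection table is 0xffffffff. The constant name `ETH_RXFH_INDIR_NO_CHANGE` and the helper `check_srssh_request` are assumptions made for illustration, not code from the patch:

```python
import errno

# Assumed name for the "leave the indirection table alone" marker.
ETH_RXFH_INDIR_NO_CHANGE = 0xFFFFFFFF


def check_srssh_request(indir_size, key_size):
    """Return 0 if the request changes something, else -EINVAL."""
    indir_unchanged = indir_size == ETH_RXFH_INDIR_NO_CHANGE
    key_unchanged = key_size == 0  # assumption: zero means "no new key"
    if indir_unchanged and key_unchanged:
        # Nothing would change, so the request is rejected as a no-op.
        return -errno.EINVAL
    return 0
```

With this model, a request that supplies only a key (`check_srssh_request(ETH_RXFH_INDIR_NO_CHANGE, 40)`) passes, while one that supplies neither is refused.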

ethtool: Expand documentation of ethtool_ops::{get,set}_rxfh() · 678e30df

Ben Hutchings authored May 19, 2014

Some corner-cases are not explained properly.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>


ethtool: Improve explanation of the two arrays following struct ethtool_rxfh · 38c891a4

Ben Hutchings authored May 15, 2014

The use of two variable-length arrays is unusual so deserves a bit
more explanation.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
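
As a rough illustration of the "two arrays following the struct" layout, the sketch below packs a buffer in which a fixed header is followed first by the indirection table (indir_size 32-bit entries) and then by the hash key (key_size bytes). The header field order (cmd, rss_context, indir_size, key_size, two reserved words) is an assumption, the command value passed in is a placeholder, and `pack_rxfh` / `split_rxfh` are hypothetical helpers:

```python
import struct

# Assumed fixed header: cmd, rss_context, indir_size, key_size, rsvd[2] (u32 each).
HEADER = struct.Struct("=6I")


def pack_rxfh(cmd, indir, key, rss_context=0):
    """Pack header + indirection table (u32 entries) + hash key (bytes)."""
    header = HEADER.pack(cmd, rss_context, len(indir), len(key), 0, 0)
    table = struct.pack(f"={len(indir)}I", *indir)
    return header + table + bytes(key)


def split_rxfh(buf):
    """Recover the two variable-length arrays from a packed buffer."""
    _cmd, _ctx, indir_size, key_size, _r0, _r1 = HEADER.unpack_from(buf, 0)
    table_off = HEADER.size
    key_off = table_off + 4 * indir_size  # the key starts right after the table
    indir = list(struct.unpack_from(f"={indir_size}I", buf, table_off))
    key = bytes(buf[key_off:key_off + key_size])
    return indir, key
```

The point of the sketch is that the key has no field of its own: its offset can only be found by skipping over indir_size table entries.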

ethtool: Name the 'no change' value for setting RSS hash key but not indir table · 7455fa24

Ben Hutchings authored May 15, 2014

We usually allocate special values of u32 fields starting from the top
down, so also change the value to 0xffffffff.  As these operations
haven't been included in a stable release yet, it's not too late to
change.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

ethtool: Return immediately on error in ethtool_copy_validate_indir() · fb95cd8d

Ben Hutchings authored May 15, 2014

We must return -EFAULT immediately rather than continuing into
the loop.

Similarly, we may as well return -EINVAL directly.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
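
The early-return pattern can be sketched as follows. This is a simplified model, not the kernel function: a failed copy from user space is modelled by passing `None`, and the validity rule (every entry must name an existing RX ring) is an assumption:

```python
import errno


def copy_validate_indir(user_entries, num_rx_rings):
    """Return (status, table); status is 0 or a negative errno value."""
    if user_entries is None:
        # Models a failed copy: bail out before the validation loop runs.
        return -errno.EFAULT, None
    for ring in user_entries:
        if ring >= num_rx_rings:
            # The first out-of-range entry ends validation directly.
            return -errno.EINVAL, None
    return 0, list(user_entries)
```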

14 May, 2014 27 commits

driver/net/ethernet/ec_bhf.c: fix sparse warnings · eb02a272

Darek Marcinkiewicz authored May 14, 2014

Sparse was reporting quite a few warnings for the driver.
Those get fixed by this patch.
Signed-off-by: Dariusz Marcinkiewicz <reksio@newterm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Use a more standard macro for INET_ADDR_COOKIE · c7228317

Joe Perches authored May 13, 2014

Missing a colon on definition use is a bit odd so
change the macro for the 32 bit case to declare an
__attribute__((unused)) and __deprecated variable.

The __deprecated attribute will cause gcc to emit
an error if the variable is actually used.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: systemport: Use devm_ioremap_resource() · 126e6122

Jingoo Han authored May 14, 2014

Use devm_ioremap_resource() because devm_request_and_ioremap() is
obsoleted by devm_ioremap_resource().
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlx4-next' · 005e35f5

David S. Miller authored May 14, 2014

Or Gerlitz says:

====================
Mellanox driver update 2014-05-12

This patchset introduces some small bug fixes:

Eyal fixed some compilation and syntactic checker warnings. Ido fixed a
corruption in user priority mapping when changing the number of channels. Shani
fixed some other problems when modifying the MAC address. Yuval fixed a problem
when changing IRQ affinity during high traffic: IRQ changes took effect
only after the first pause in traffic.

This patchset was tested and applied over commit 93dccc59: "mdio_bus: fix
devm_mdiobus_alloc_size export"

Changes from V1:
- applied feedback from Dave to use true/false and not 0/1 in patch 1/9
- removed the patch from Noa which addressed a bug in flow steering table
  when using a bond device, as the fix might need to be in the bonding driver;
  this is now discussed in the netdev thread "bonding directly changes
  underlying device address"

Changes from V0:
- Patch 1/9 - net/mlx4_core: Enforce irq affinity changes immediatly
  - Moved the new members to a hot cache line as Eric suggested
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Fix inaccurate return value of mlx4_flow_attach() · 75720384

Eyal Perry authored May 14, 2014

Adopt the "info: why not propagate 'ret' from parse_trans_rule()..."
suggestion made by the smatch semantic checker on:
drivers/net/ethernet/mellanox/mlx4/mcg.c:867 mlx4_flow_attach()
Signed-off-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Using positive error value for unsigned · c3ca5205

Eyal Perry authored May 14, 2014

Use a positive value for the error, MLX4_NET_TRANS_RULE_NUM, instead
of -EPROTONOSUPPORT, to remove a compilation warning.
Signed-off-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Protect MAC address modification with the state_lock mutex · fe1ff29d

Shani Michaelli authored May 14, 2014

This patch solves an issue that could arise when modifying the
device's MAC. It occurs due to simultaneous access to priv->mac_hash
from two contexts. The buggy scenario is described below:
Context 1: copy the new mac address to the dev->dev_addr field.
Context 2: mlx4_en_do_uc_filter removes prev_mac entry from the mac_hash
           db since it is not in dev->uc and not equal to dev->dev_addr.
Context 1: mlx4_en_do_set_mac() calls mlx4_en_replace_mac() to replace
           prev_mac with dev_addr but it fails to update the mac_hash db
           since it no longer contains prev_mac, therefore it returns
           with an error.

The fix is to prevent mlx4_en_do_uc_filter from running between the two
context 1 steps described above. This is done by putting both steps
under the mdev->state_lock mutex, which solves the issue because
mlx4_en_do_uc_filter is already protected by mdev->state_lock.
Reviewed-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
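
A toy model of the locking fix, with a Python `threading.Lock` standing in for the state_lock mutex. The class and its fields are hypothetical and only mirror the names used in the message; the real driver logic is more involved:

```python
import threading


class MacState:
    """Models dev_addr, prev_mac and the mac_hash db behind one lock."""

    def __init__(self, mac):
        self.state_lock = threading.Lock()
        self.dev_addr = mac
        self.prev_mac = mac
        self.mac_hash = {mac}
        self.uc = set()  # additional unicast addresses

    def do_uc_filter(self):
        with self.state_lock:
            for mac in list(self.mac_hash):
                if mac not in self.uc and mac != self.dev_addr:
                    self.mac_hash.discard(mac)

    def set_mac(self, new_mac):
        # Both context 1 steps run under the same lock, so the filter
        # can never observe the state between them.
        with self.state_lock:
            self.dev_addr = new_mac                # step 1: copy new address
            if self.prev_mac not in self.mac_hash:
                return False                       # replace would fail
            self.mac_hash.remove(self.prev_mac)    # step 2: replace entry
            self.mac_hash.add(new_mac)
            self.prev_mac = new_mac
            return True
```

If the lock in `set_mac` covered only the second step, a concurrent `do_uc_filter` call could drop the prev_mac entry after step 1 and make the replacement fail, which is the scenario listed above.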

net/mlx4_core: Removed unnecessary bit operation condition · 483e0132

Eyal Perry authored May 14, 2014

Fix the "warn: suspicious bitop condition" made by the smatch semantic
checker on:
drivers/net/ethernet/mellanox/mlx4/main.c:509 mlx4_slave_cap()
Signed-off-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Fix smatch error - possible access to a null variable · c05a116f

Eyal Perry authored May 14, 2014

Fix the "error: we previously assumed 'out_param' could be null" found
by smatch semantic checker on:
drivers/net/ethernet/mellanox/mlx4/cmd.c:506 mlx4_cmd_poll()
drivers/net/ethernet/mellanox/mlx4/cmd.c:578 mlx4_cmd_wait()
Signed-off-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Fix errors in MAC address changing when port is down · ee755324

Shani Michaelli authored May 14, 2014

This patch fixes an issue that happens when changing the MAC address while
the port is down, described as follows:
1. Set the port down.
2. Change the MAC address - mlx4_en_set_mac() will change dev->dev_addr.
3. Set the port up - will result in mlx4_en_do_uc_filter that will
   remove the prev_mac entry from the mac_hash db.
4. Changing the MAC address again will eventually trigger the call to
   mlx4_en_replace_mac() in order to replace prev_mac with dev_addr, but
   the prev_mac entry no longer exists in the mac_hash db, so
   the operation fails.

The fix is to set prev_mac to the new MAC address, so that in step 3
above, after setting the port up, mlx4_en_get_qp() updates
mac_hash with the entry of dev_addr, which is equal to prev_mac.
Therefore in step 4, when calling mlx4_en_replace_mac, the entry related
to prev_mac exists in mac_hash and the replace operation succeeds.
Reviewed-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
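
The four steps can be replayed with a small state model. Everything here is a simplification for illustration: the `Port` class is hypothetical, and the `fixed` flag switches between the old and the new behaviour of the MAC change while the port is down:

```python
class Port:
    def __init__(self, mac, fixed):
        self.fixed = fixed
        self.up = True
        self.dev_addr = mac
        self.prev_mac = mac
        self.mac_hash = {mac}

    def set_down(self):
        self.up = False

    def set_up(self):
        self.up = True
        self.mac_hash.add(self.dev_addr)  # models mlx4_en_get_qp()
        # Models mlx4_en_do_uc_filter(): drop entries other than dev_addr.
        self.mac_hash = {m for m in self.mac_hash if m == self.dev_addr}

    def set_mac(self, new_mac):
        self.dev_addr = new_mac
        if not self.up:
            if self.fixed:
                self.prev_mac = new_mac   # the fix: keep prev_mac in sync
            return True
        if self.prev_mac not in self.mac_hash:
            return False                  # replace fails: entry is gone
        self.mac_hash.remove(self.prev_mac)
        self.mac_hash.add(new_mac)
        self.prev_mac = new_mac
        return True


def scenario(fixed):
    port = Port("A", fixed)
    port.set_down()            # step 1
    port.set_mac("B")          # step 2
    port.set_up()              # step 3
    return port.set_mac("C")   # step 4
```

In this model `scenario(False)` fails at step 4 and `scenario(True)` succeeds.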

net/mlx4_en: User prio mapping gets corrupted when changing number of channels · f5b6345b

Ido Shamay authored May 14, 2014

When using ethtool set_channels, mlx4_en_setup_tc is always called, even
when it was not configured. Fixed code to call mlx4_en_setup_tc() only
if needed.
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Enforce irq affinity changes immediatly · 2eacc23c

Yuval Atias authored May 14, 2014

During heavy traffic, napi is constantly polling the completion queue
and no interrupt is fired. Because of that, changes to irq affinity are
ignored until traffic is stopped and resumed.

By registering to the irq notifier mechanism, and forcing an interrupt when
affinity is changed, irq affinity changes will be immediately enforced.
Signed-off-by: Yuval Atias <yuvala@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

macvlan: Propagate lowerdev MTU changes · 3763e7ef

dingtianhong authored May 13, 2014

When the physical MTU changes we should ensure that the MTU of every
existing MACVLAN device does not exceed the new lowerdev MTU. This patch
adds that propagation.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
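
The propagation rule amounts to clamping each MACVLAN MTU to the new lowerdev MTU. A minimal sketch, with a hypothetical helper and device names chosen only for illustration:

```python
def propagate_lowerdev_mtu(new_lower_mtu, macvlan_mtus):
    """Return the MACVLAN MTUs after clamping each to the lowerdev MTU."""
    return {name: min(mtu, new_lower_mtu) for name, mtu in macvlan_mtus.items()}
```

For example, lowering the physical MTU to 1400 reduces a MACVLAN device at 1500 to 1400 and leaves one at 1280 untouched.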

dccp: make the request_retries minimum is 1 · 8ba7e7bf

wangweidong authored May 13, 2014

Documentation/networking/dccp.txt states that request_retries
should be greater than 0. So make extra1 &one instead
of &zero.
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

snmp: fix some left over of snmp stats · c9f2dba6

WANG Cong authored May 12, 2014

Fengguang reported the following sparse warning:

>> net/ipv6/proc.c:198:41: sparse: incorrect type in argument 1 (different address spaces)
   net/ipv6/proc.c:198:41:    expected void [noderef] <asn:3>*mib
   net/ipv6/proc.c:198:41:    got void [noderef] <asn:3>**pcpumib

Fixes: commit 698365fa (net: clean up snmp stats code)
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: make ip_local_reserved_ports per netns · 122ff243

WANG Cong authored May 12, 2014

ip_local_port_range is already per netns, so should ip_local_reserved_ports
be. And since it is none by default we don't actually need it when we don't
enable CONFIG_SYSCTL.

By the way, rename inet_is_reserved_local_port() to inet_is_local_reserved_port()

Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

irda: sh_irda: Enable driver compilation with COMPILE_TEST · 9cc5e36d

Laurent Pinchart authored May 13, 2014

This helps increase build testing coverage.
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tipc-next' · d6cc76d3

David S. Miller authored May 14, 2014

Jon Maloy says:

====================
tipc: bug fixes and improvements

Intensive and extensive testing has revealed some rather infrequent
problems related to flow control, buffer handling and link
establishment. Commits #1 to #4 deal with these problems.

The remaining four commits are just code improvements, aiming at
making the code more comprehensible and maintainable. There are
no functional enhancements in this series.

v2: Fixed a typo in commit log #2. Otherwise no changes from v1.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: merge port message reception into socket reception function · 9816f061

Jon Paul Maloy authored May 14, 2014

In order to reduce complexity and save a call level during message
reception at port/socket level, we remove the function tipc_port_rcv()
and merge its functionality into tipc_sk_rcv().
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: clean up neigbor discovery message reception · c82910e2

Jon Paul Maloy authored May 14, 2014

The function tipc_disc_rcv(), which is handling received neighbor
discovery messages, is perceived as messy, and it is hard to verify
its correctness by code inspection. The fact that the task it is set
to resolve is fairly complex does not make the situation better.

In this commit we try to take a more systematic approach to the
problem. We define a decision machine which takes three state flags
as input, and produces three action flags as output. We then walk
through all permutations of the state flags, and for each of them we
describe verbally what is going on, plus that we set zero or more of
the action flags. The action flags indicate what should be done once
the decision machine has finished its job, while the last part of the
function deals with performing those actions.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
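
The shape of such a decision machine can be sketched as below. The flag names and the rule attached to each combination are illustrative assumptions and are not taken from the patch; the point is only the structure: three state flags in, three action flags out, and every permutation covered explicitly:

```python
from itertools import product


def decide(sign_match, addr_match, link_up):
    """Map three state flags to three action flags (hypothetical rules)."""
    accept_addr = accept_sign = respond = False
    if sign_match and addr_match:
        respond = not link_up            # known peer; answer if link is down
    elif sign_match and not addr_match:
        accept_addr = True               # same peer, new media address
        respond = not link_up
    elif not sign_match and addr_match:
        accept_sign = True               # peer restarted on the same address
        respond = not link_up
    elif not link_up:
        accept_addr = accept_sign = respond = True
    # else: nothing matches while the link is up; take no action
    return accept_addr, accept_sign, respond


# Walk through all permutations of the state flags once, up front.
TABLE = {state: decide(*state) for state in product((False, True), repeat=3)}
```

Performing the actions is then a separate step that only looks at the three returned flags, which keeps the decision logic easy to inspect.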

tipc: improve and extend media address conversion functions · 38504c28

Jon Paul Maloy authored May 14, 2014

TIPC currently handles two media specific addresses: Ethernet MAC
addresses and InfiniBand addresses. Those are kept in three different
formats:

1) A "raw" format as obtained from the device. This format is known
   only by the media specific adapter code in eth_media.c and
   ib_media.c.
2) A "generic" internal format, in the form of struct tipc_media_addr,
   which can be referenced and passed around by the generic media-
   unaware code.
3) A serialized version of the latter, to be conveyed in neighbor
   discovery messages.

Conversion between the three formats can only be done by the media
specific code, so we have function pointers for this purpose in
struct tipc_media. Here, the media adapters can install their own
conversion functions at startup.

We now introduce a new such function, 'raw2addr()', whose purpose
is to convert from format 1 to format 2 above. We also try, as far as
possible, to make the commenting, variable names and usage of these
functions uniform, with the purpose of making them more comprehensible.

We can now also remove the function tipc_l2_media_addr_set(), whose
job is done better by the new function.

Finally, we expand the field for serialized addresses (format 3)
in discovery messages from 20 to 32 bytes. This is permitted
according to the spec, and reduces the risk of problems when we
add new media in the future.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: rename and move message reassembly function · 37e22164

Jon Paul Maloy authored May 14, 2014

The function tipc_link_frag_rcv() is in reality a re-entrant generic
message reassembly function that has nothing in particular to do with
the link, where it is defined now. This becomes obvious when we see
the need to call the function from other places in the code.

In this commit we rename it to tipc_buf_append() and move it to the file
msg.c. We also simplify its signature by moving the tail pointer to
the control block of the head buffer, hence making the head buffer
self-contained.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: mark head of reassembly buffer as non-linear · 5074ab89

Jon Paul Maloy authored May 14, 2014

The message reassembly function does not update the 'len' and 'data_len'
fields of the head skbuff correctly when fragments are chained to it.
This may sometimes lead to obscure errors, such as fragment reordering
when we receive fragments which are cloned buffers.

This commit fixes this, by ensuring that the two fields are updated
correctly.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: don't record link RESET or ACTIVATE messages as traffic · ec37dcd3

Jon Paul Maloy authored May 14, 2014

In the current code, all incoming LINK_PROTOCOL messages, irrespective
of type, nudge the "last message received" checkpoint, informing the
link state machine that a message was received from the peer since last
supervision timeout event. This inhibits the link from starting probing
the peer unnecessarily.

However, not only STATE messages are recorded as legitimate incoming
traffic this way, but even RESET and ACTIVATE messages, which in
reality are there to inform the link that the peer endpoint has been
reset. At the same time, some RESET messages may be dropped instead
of causing a link reset. This happens when the link endpoint thinks
it is fully up and working, and the session number of the RESET is
lower than or equal to the current link session. In such cases the
RESET is perceived as a delayed remnant from an earlier session, or
the current one, and dropped.

Now, if a TIPC module is removed and then immediately reinserted, e.g.
when using a script, RESET messages may arrive at the peer link endpoint
before this one has had time to discover the failure. The RESET may be
dropped because of the session number, but only after it has been
recorded as a legitimate traffic event. Hence, the receiving link will
not start probing, and not discover that the peer endpoint is down, at
the same time ignoring the periodic RESET messages coming from that
endpoint. We have ended up in a stale state where a failed link cannot
be re-established.

In this commit, we remedy this by nudging the checkpoint only for
received STATE messages, not for RESET or ACTIVATE messages.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
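
A toy model of the changed checkpoint rule. The class, method and field names are hypothetical; only the message type names come from the text above:

```python
STATE, RESET, ACTIVATE = "STATE", "RESET", "ACTIVATE"


class LinkEndpoint:
    """Models the 'last message received' checkpoint of one link endpoint."""

    def __init__(self):
        self.traffic_seen = False

    def rcv_protocol_msg(self, msg_type):
        # Only STATE messages count as proof that the peer is alive.
        if msg_type == STATE:
            self.traffic_seen = True

    def supervision_timeout(self):
        """Return True if the link should start probing the peer."""
        must_probe = not self.traffic_seen
        self.traffic_seen = False  # start a new supervision interval
        return must_probe
```

With this rule an endpoint that receives nothing but RESET messages still starts probing at the next timeout, so it can discover that the peer went down.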

tipc: compensate for double accounting in socket rcv buffer · 4f4482dc

Jon Paul Maloy authored May 14, 2014

The function net/core/sock.c::__release_sock() runs a tight loop
to move buffers from the socket backlog queue to the receive queue.

As a security measure, sk_backlog.len of the receiving socket
is not set to zero until after the loop is finished, i.e., until
the whole backlog queue has been transferred to the receive queue.
During this transfer, the data that has already been moved is counted
both in the backlog queue and the receive queue, hence giving an
incorrect picture of the available queue space for new arriving buffers.

This leads to unnecessary rejection of buffers by sk_add_backlog(),
which in TIPC leads to unnecessarily broken connections.

In this commit, we compensate for this double accounting by adding
a counter that keeps track of it. The function socket.c::backlog_rcv()
receives buffers one by one from __release_sock(), and adds them to the
socket receive queue. If the transfer is successful, it increases a new
atomic counter 'tipc_sock::dupl_rcvcnt' with 'truesize' of the
transferred buffer. If a new buffer arrives during this transfer and
finds the socket busy (owned), we attempt to add it to the backlog.
However, when sk_add_backlog() is called, we adjust the 'limit'
parameter with the value of the new counter, so that the risk of
inadvertent rejection is eliminated.

It should be noted that this change does not invalidate the original
purpose of zeroing 'sk_backlog.len' after the full transfer. We set an
upper limit for dupl_rcvcnt, so that if a 'wild' sender (i.e., one that
doesn't respect the send window) keeps pumping in buffers to
sk_add_backlog(), it will eventually reach an upper limit
(2 x TIPC_CONN_OVERLOAD_LIMIT). After that, no messages can be added
to the backlog, and the connection will be broken. Ordinary, well-
behaved senders will never reach this buffer limit at all.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
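
The double accounting can be reproduced with plain arithmetic. This is a simplified model under stated assumptions: the space check is taken to be "backlog length plus receive queue allocation exceeds the limit", the upper bound on dupl_rcvcnt is left out, and `queues_full` and `simulate` are hypothetical helpers:

```python
def queues_full(backlog_len, rmem_alloc, limit):
    """Model of the space check: True means a new buffer is rejected."""
    return backlog_len + rmem_alloc > limit


def simulate(limit, truesizes, moved, compensate):
    """Check for space after `moved` buffers have left the backlog."""
    backlog_len = sum(truesizes)          # not zeroed until the loop ends
    rmem_alloc = sum(truesizes[:moved])   # already on the receive queue
    # The moved buffers are counted twice; the new counter tracks that amount.
    dupl_rcvcnt = rmem_alloc if compensate else 0
    return queues_full(backlog_len, rmem_alloc, limit + dupl_rcvcnt)
```

With a limit of 100 and four buffers of truesize 20, two of which have already been moved, the uncompensated check sees 120 and rejects a new buffer even though only 80 is really queued; with the counter added to the limit the buffer is accepted.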

tipc: decrease connection flow control window · 6163a194

Jon Paul Maloy authored May 14, 2014

Memory overhead when allocating big buffers for data transfer may
be quite significant. E.g., truesize of a 64 KB buffer turns out
to be 132 KB, 2 x the requested size.

This invalidates the "worst case" calculation we have been
using to determine the default socket receive buffer limit,
which is based on the assumption that 1024x64KB = 67MB buffers
may be queued up on a socket.

Since TIPC connections cannot survive hitting the buffer limit,
we have to compensate for this overhead.

We do that in this commit by halving the fixed connection flow
control window from 1024 (2*512) messages to 512 (2*256). Since
older version nodes send out acks at 512 message intervals,
compatibility with such nodes is guaranteed, although performance
may be non-optimal in such cases.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
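
The arithmetic behind the change, worked through as a short calculation. It assumes 1 KB = 1024 bytes and takes the 132 KB truesize figure from the message at face value; the helper name is hypothetical:

```python
KB = 1024


def worst_case_bytes(window_msgs, per_msg_bytes):
    """Bytes queued if a full window of messages sits on the socket."""
    return window_msgs * per_msg_bytes


old_assumed = worst_case_bytes(1024, 64 * KB)    # 67108864, the "67MB" above
old_actual = worst_case_bytes(1024, 132 * KB)    # 138412032 with real truesize
new_actual = worst_case_bytes(512, 132 * KB)     # 69206016 after halving
```

Halving the window brings the real worst case (about 69 MB) back close to the 67 MB the original calculation assumed.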

bonding: alloc the structure ad_info dynamically in per slave · 3fdddd85

dingtianhong authored May 12, 2014

struct ad_slave_info is very large and is only used in 802.3ad mode,
so allocating the structure dynamically could save 356 Bits for every slave in
non-802.3ad mode.

Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Acked-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

13 May, 2014 6 commits

sh_eth: replace devm_kzalloc() with devm_kmalloc_array() · 86b5d251

Sergei Shtylyov authored May 13, 2014

When I was converting the driver to the managed device API, only devm_kzalloc()
was available for memory allocation, so I had to use it, even though zeroing out the
PHY IRQ array right before initializing all its entries to PHY_POLL was quite
stupid. Now that devm_kmalloc_array() has become available, we can avoid the
needless zeroing out...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tg3-next' · 1e1c77bf

David S. Miller authored May 13, 2014

Michael Chan says:

====================
tg3: TSO related enhancements to prevent memory allocation failure

Michael Chan (3):
  tg3: Don't modify ip header fields when doing GSO
  tg3: Prevent page allocation failure during TSO workaround
  tg3: Update copyright and version to 3.137
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

tg3: Update copyright and version to 3.137 · de750e4c

Michael Chan authored May 11, 2014

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tg3: Prevent page allocation failure during TSO workaround · d3f6f3a1

Michael Chan authored May 11, 2014

If any TSO fragment hits hardware bug conditions (e.g. 4G boundary), the
driver works around this by calling skb_copy() to copy to a linear SKB. Users
have reported page allocation failures as the TSO packet can be up to 64K.
Copying such a large packet is also very inefficient. We fix this by using
existing tg3_tso_bug() to transmit the packet using GSO.
Signed-off-by: Prashant Sreedharan <prashant@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tg3: Don't modify ip header fields when doing GSO · d71c0dc4

Michael Chan authored May 11, 2014

tg3 uses GSO as a workaround if the hardware cannot perform TSO on certain
packets.  We should not modify the ip header fields if we do GSO on the
packet.  It happens to work by accident because GSO recalculates the IP
checksum and IP total length.

Also fix the tg3_start_xmit comment to reflect that this is the only
xmit function for all devices.
Signed-off-by: Prashant Sreedharan <prashant@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'inet_fwmark_reflect' · b6bd26c4

David S. Miller authored May 13, 2014

Lorenzo Colitti says:

====================
Make mark-based routing work better with multiple separate networks.

Mark-based routing (ip rule fwmark 17 lookup 100) combined with
either iptables marking (iptables -j MARK --set-mark 17) or
application-based marking (the SO_MARK setsockopt) is a good
way to deal with connecting simultaneously to multiple networks.

Each network can be given a routing table, and ip rules can
be configured to make different fwmarks select different
networks. Applications can select networks by setting
appropriate socket marks, and iptables rules can be used to
handle non-aware applications, enforce policy, etc.
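
Application-based marking can be sketched from user space as below. This is an illustration only: `open_marked_socket` is a hypothetical helper, the mark value 17 mirrors the example above, setting SO_MARK normally needs CAP_NET_ADMIN, and the constant is only exposed by Python on Linux:

```python
import socket

# socket.SO_MARK is only defined on Linux; None means "not available here".
SO_MARK = getattr(socket, "SO_MARK", None)


def open_marked_socket(mark):
    """Create a TCP socket and try to tag it with a firewall mark."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if SO_MARK is None:
        return sock, False
    try:
        sock.setsockopt(socket.SOL_SOCKET, SO_MARK, mark)
        marked = True
    except OSError:
        # Typically a permission error when the caller lacks CAP_NET_ADMIN.
        marked = False
    return sock, marked
```

A caller would use `sock, marked = open_marked_socket(17)` and fall back to iptables-based marking when `marked` is false.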

This patch series improves functionality when mark-based routing
is used in this way. Current behaviour has the following
limitations:

1. Kernel-originated replies that are not associated with a
   socket always use a mark of zero. This means that, for
   example, when the kernel sends a ping reply or a TCP reset,
   it does not send it on the network from which it received the
   original packet.
2. Path MTU discovery, which is triggered by incoming packets,
   does not always work correctly, because the routing lookups it
   uses to clone routes do not take the fwmark into account and
   thus can happen in the wrong routing table.
3. Application-based marking works well for outbound connections,
   but does not work well for incoming connections. Marking a
   listening socket causes that socket to only accept
   connections from a given network, and sockets that are
   returned by accept() are not marked (and are thus not routed
   correctly).

sysctl. This causes route lookups for kernel-generated replies
and PMTUD to use the fwmark of the packet that caused them.

which causes TCP sockets returned by accept() to be marked with
the same mark that sent the initial SYN packet.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
