Commits · 887c95cc1da53f66a5890fdeab13414613010097 · Kirill Smelkov / linux

17 Jan, 2013 18 commits

ipv6: Complete neighbour entry removal from dst_entry. · 887c95cc

YOSHIFUJI Hideaki / 吉藤英明 authored Jan 17, 2013

CC: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

887c95cc

ipv6: Do not depend on rt->n in ip6_finish_output2(). · 6fd6ce20

YOSHIFUJI Hideaki / 吉藤英明 authored Jan 17, 2013

If neigh is not found, create a new one.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

6fd6ce20

ipv6: Do not depend on rt->n in ip6_dst_lookup_tail(). · 707be1ff

YOSHIFUJI Hideaki / 吉藤英明 authored Jan 17, 2013

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

707be1ff

ipv6: Introduce rt6_nexthop() to select nexthop address. · 9bb5a148

YOSHIFUJI Hideaki / 吉藤英明 authored Jan 17, 2013

For RTF_GATEWAY route, return rt->rt6i_gateway.
Otherwise, return 2nd argument (destination address).

This will be used by the following patches, which remove the rt->n
dependency in ip6_dst_lookup_tail() and ip6_finish_output2().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

9bb5a148
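
The selection rule described in this commit message can be sketched in a few lines of userspace C. This is only an illustration under stated assumptions: `struct sketch_route`, `SKETCH_RTF_GATEWAY` and `sketch_nexthop()` are hypothetical stand-ins, not the kernel's `struct rt6_info`, `RTF_GATEWAY` or `rt6_nexthop()`.

```c
#include <netinet/in.h>   /* struct in6_addr */

/* Hypothetical flag and route type, for illustration only; the kernel's
 * route structure carries many more fields. */
#define SKETCH_RTF_GATEWAY 0x0002u

struct sketch_route {
    unsigned int    flags;    /* route flags, e.g. SKETCH_RTF_GATEWAY */
    struct in6_addr gateway;  /* stands in for rt->rt6i_gateway */
};

/* Return the address to resolve as the next hop: the gateway for a
 * gateway route, otherwise the destination supplied by the caller. */
static const struct in6_addr *
sketch_nexthop(const struct sketch_route *rt, const struct in6_addr *daddr)
{
    if (rt->flags & SKETCH_RTF_GATEWAY)
        return &rt->gateway;
    return daddr;
}
```

A caller would then look up the neighbour entry for the returned address instead of reading a neighbour pointer cached in the route.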

ipv6: Do not depend on rt->n in rt6_probe(). · 2152caea

YOSHIFUJI Hideaki / 吉藤英明 authored Jan 17, 2013

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

2152caea

ipv6: Do not depend on rt->n in rt6_check_neigh(). · 145a3621

YOSHIFUJI Hideaki / 吉藤英明 authored Jan 17, 2013

CC: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

145a3621

ipv6: Do not depend on rt->n in ip6_pol_route(). · c440f160

YOSHIFUJI Hideaki / 吉藤英明 authored Jan 17, 2013

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

c440f160

ndisc: Introduce __ipv6_neigh_lookup_noref(). · ac3175fe

YOSHIFUJI Hideaki / 吉藤英明 authored Jan 17, 2013

This function, which looks up neighbour entry for an IPv6 address
without touching refcnt, will be used for patches to remove
dependency on rt->n (neighbour entry in rt6_info).
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ac3175fe

ipv6 route: Dump gateway based on RTF_GATEWAY flag and rt->rt6i_gateway. · dd0cbf29

YOSHIFUJI Hideaki / 吉藤英明 authored Jan 17, 2013

Do not depend on rt->n.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

dd0cbf29

ndisc: Remove tbl argument for __ipv6_neigh_lookup(). · 8e022ee6

YOSHIFUJI Hideaki / 吉藤英明 authored Jan 17, 2013

We can refer to nd_tbl directly.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

8e022ee6

ndisc: Update neigh->updated with write lock. · 7ff74a59

YOSHIFUJI Hideaki / 吉藤英明 authored Jan 17, 2013

neigh->nud_state and neigh->updated are under protection of
neigh->lock.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

7ff74a59

bnx2x: fix GRO parameters · cbf1de72

Yuval Mintz authored Jan 17, 2013

bnx2x does an internal GRO pass but doesn't provide gso_segs, thus
breaking qdisc_pkt_len_init() in case ingress qdisc is used.

We store gso_segs in NAPI_GRO_CB(skb)->count, where tcp_gro_complete()
expects to find the number of aggregated segments.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cbf1de72

cxgb3: Fix Tx csum stats · bc6c47b5

Vipul Pandya authored Jan 16, 2013

Signed-off-by: Jay Hernandez <jay@chelsio.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bc6c47b5

ipv6: fix ipv6_prefix_equal64_half mask conversion · 512613d7

Fabio Baltieri authored Jan 16, 2013

Fix the 64bit optimized version of ipv6_prefix_equal to convert the
bitmask to network byte order only after the bit-shift.

The bug was introduced in:

38675170 ipv6: 64bit version of ipv6_prefix_equal().
Signed-off-by: Fabio Baltieri <fabio.baltieri@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

512613d7
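
The ordering this fix insists on (shift the mask in host byte order, then convert it to network byte order) can be illustrated with a small userspace sketch. The helper names below are hypothetical, and `sketch_to_be64()` is a portable stand-in written for this example rather than the kernel's conversion macro.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Portable stand-in for a host-to-big-endian conversion: returns a value
 * whose in-memory bytes are the big-endian encoding of `host`. */
static uint64_t sketch_to_be64(uint64_t host)
{
    unsigned char bytes[8];
    uint64_t out;

    for (int i = 0; i < 8; i++)
        bytes[i] = (unsigned char)(host >> (56 - 8 * i));
    memcpy(&out, bytes, sizeof(out));
    return out;
}

/* Compare the first `len` bits (0..64) of two 8-byte address halves that
 * are stored in network byte order. */
static bool sketch_prefix_equal64_half(const unsigned char a[8],
                                       const unsigned char b[8],
                                       unsigned int len)
{
    uint64_t wa, wb, mask;

    if (len == 0)
        return true;
    memcpy(&wa, a, sizeof(wa));
    memcpy(&wb, b, sizeof(wb));
    /* Shift in host order first, convert to network order second. */
    mask = sketch_to_be64(~UINT64_C(0) << (64 - len));
    return ((wa ^ wb) & mask) == 0;
}
```

Converting an all-ones mask before shifting would, on a little-endian machine, leave the set bits over the trailing bytes of the address rather than the leading prefix bytes, which is the kind of mismatch the commit message describes.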

net: increase fragment memory usage limits · c2a93660

Jesper Dangaard Brouer authored Jan 15, 2013

Increase the memory usage limits for incomplete IP fragments.

Arguing for new thresh high/low values:

 High threshold = 4 MBytes
 Low  threshold = 3 MBytes

The fragmentation memory accounting code tries to account for the
real memory usage by measuring both the size of the frag queue struct
(inet_frag_queue (ipv4:ipq/ipv6:frag_queue)) and the SKB's truesize.

We want to be able to hold on to enough fragments to ensure good
performance, without letting incomplete fragments hurt scalability
by causing the number of inet_frag_queue entries to grow too much
(resulting in longer searches for frag queues).

For IPv4, how much memory does the largest frag consume?

Maximum size fragment is 64K, which is approx 44 fragments with
MTU(1500) sized packets. Sizeof(struct ipq) is 200.  A 1500 byte
packet results in a truesize of 2944 (not 2048 as I first assumed)

  (44*2944)+200 = 129736 bytes

The current default high thresh of 262144 bytes is obviously
problematic, as only two 64K fragments can fit in the queue at the
same time.

How many 64K fragments can we fit into 4 MBytes:

  4*2^20/((44*2944)+200) = 32.34 fragment in queues

An attacker could send a separate/distinct fake fragment packet per
queue, causing us to allocate one inet_frag_queue per packet, and thus
attacking the hash table and its lists.

How many frag queues do we need to store, and given a current hash size
of 64, what is the average list length?

Using one MTU sized fragment per inet_frag_queue, each consuming
(2944+200) 3144 bytes.

  4*2^20/(2944+200) = 1334 frag queues -> 21 avg list length

An attacker could send small fragments; the smallest packet I could send
resulted in a truesize of 896 bytes (I'm a little surprised by this).

  4*2^20/(896+200)  = 3827 frag queues -> 59 avg list length

When increasing these numbers, we also need to follow up with
improvements that are going to help scalability.  Simply increasing
the hash size is not enough, as the current implementation does not
have per hash bucket locking.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c2a93660
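
The arithmetic in this commit message can be reproduced with a short C sketch. The constants are the figures quoted in the message (200 bytes per struct ipq, truesize of 2944 or 896 bytes, 44 fragments per 64K datagram, 64 hash buckets); the names are hypothetical. The helpers use integer division, so results are truncated and can differ by one from the rounded figures in the message.

```c
#include <stddef.h>

enum {
    SKETCH_IPQ_SIZE       = 200,   /* sizeof(struct ipq) quoted in the message */
    SKETCH_TRUESIZE_MTU   = 2944,  /* truesize of a 1500 byte packet */
    SKETCH_TRUESIZE_SMALL = 896,   /* truesize of the smallest packet tried */
    SKETCH_FRAGS_PER_64K  = 44,    /* MTU-sized fragments in a 64K datagram */
    SKETCH_HASH_BUCKETS   = 64     /* hash size quoted in the message */
};

/* Proposed high threshold: 4 MBytes. */
static const size_t SKETCH_HIGH_THRESH = 4u * 1024u * 1024u;

/* Memory charged for one frag queue holding `frags` fragments. */
static size_t sketch_queue_cost(size_t frags, size_t truesize)
{
    return frags * truesize + SKETCH_IPQ_SIZE;
}

/* Whole frag queues of a given cost that fit under a memory limit. */
static size_t sketch_queues_that_fit(size_t limit, size_t cost)
{
    return limit / cost;
}

/* Average hash chain length (truncated) for a given number of queues. */
static size_t sketch_avg_list_len(size_t queues)
{
    return queues / SKETCH_HASH_BUCKETS;
}
```

For example, `sketch_queues_that_fit(262144, sketch_queue_cost(SKETCH_FRAGS_PER_64K, SKETCH_TRUESIZE_MTU))` evaluates to 2, matching the observation that only two 64K datagrams fit under the old default.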

sk-filter: Add ability to lock a socket filter program · d59577b6

Vincent Bernat authored Jan 16, 2013

While a privileged program can open a raw socket, attach some
restrictive filter and drop its privileges (or send the socket to an
unprivileged program through some Unix socket), the filter can still
be removed or modified by the unprivileged program. This commit adds a
socket option to lock the filter (SO_LOCK_FILTER) preventing any
modification of a socket filter program.

This is similar to the OpenBSD BIOCLOCK ioctl on bpf sockets, except that
even root is not allowed to change/drop the filter.

The state of the lock can be read with getsockopt(). No error is
triggered if the state is not changed. -EPERM is returned when a user
tries to remove the lock or to change/remove the filter while the lock
is active. The check is done directly in sk_attach_filter() and
sk_detach_filter(), so it is not limited to the setsockopt() syscall.
Signed-off-by: Vincent Bernat <bernat@luffy.cx>
Signed-off-by: David S. Miller <davem@davemloft.net>

d59577b6

netpoll: fix a missing dev refcounting · 5bd30d39

Cong Wang authored Jan 17, 2013

__dev_get_by_name() doesn't refcount the network device,
so we have to do this by ourselves. Noticed by Eric.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5bd30d39

ipv6: Fix endianess warning in ip6_flow_hdr(). · 07f623d3

YOSHIFUJI Hideaki authored Jan 17, 2013

Commit 3e4e4c1f ("ipv6: Introduce ip6_flow_hdr() to fill version,
tclass and flowlabel.") uses ntohl(), which should be htonl().

Found by Fengguang Wu <fengguang.wu@intel.com>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

07f623d3
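
To make the direction of the conversion concrete, here is a hedged userspace sketch that builds the first 32-bit word of an IPv6 header (version, traffic class, flow label). `sketch_flow_hdr()` is a hypothetical helper, not the kernel's ip6_flow_hdr(), and it assumes both inputs are given in host byte order. The value is assembled in host order and then converted to network order, so htonl() is the call that expresses the intent; on common platforms ntohl() performs the same byte operation, which is consistent with the commit being described as a warning fix.

```c
#include <arpa/inet.h>  /* htonl */
#include <stdint.h>
#include <string.h>

/* Write the first four bytes of an IPv6 header: 4-bit version (6),
 * 8-bit traffic class, 20-bit flow label.
 * Assumption: tclass and flowlabel are in host byte order. */
static void sketch_flow_hdr(unsigned char hdr[4], unsigned int tclass,
                            uint32_t flowlabel)
{
    uint32_t word = htonl(0x60000000u |
                          ((uint32_t)(tclass & 0xffu) << 20) |
                          (flowlabel & 0x000fffffu));

    memcpy(hdr, &word, sizeof(word));
}
```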

16 Jan, 2013 22 commits

r8169: remove unneeded dirty_rx index · 9fba0812

Timo Teräs authored Jan 15, 2013

After commit 6f0333b8 ("r8169: use 50% less ram for RX ring") the rx
ring buffers are always copied making dirty_rx useless.
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9fba0812

netpoll: fix a rtnl lock assertion failure · f92d3180

Cong Wang authored Jan 14, 2013

v4: hold rtnl lock for the whole netpoll_setup()
v3: remove the comment
v2: use RCU read lock

This patch fixes the following warning:

[   72.013864] RTNL: assertion failed at net/core/dev.c (4955)
[   72.017758] Pid: 668, comm: netpoll-prep-v6 Not tainted 3.8.0-rc1+ #474
[   72.019582] Call Trace:
[   72.020295]  [<ffffffff8176653d>] netdev_master_upper_dev_get+0x35/0x58
[   72.022545]  [<ffffffff81784edd>] netpoll_setup+0x61/0x340
[   72.024846]  [<ffffffff815d837e>] store_enabled+0x82/0xc3
[   72.027466]  [<ffffffff815d7e51>] netconsole_target_attr_store+0x35/0x37
[   72.029348]  [<ffffffff811c3479>] configfs_write_file+0xe2/0x10c
[   72.030959]  [<ffffffff8115d239>] vfs_write+0xaf/0xf6
[   72.032359]  [<ffffffff81978a05>] ? sysret_check+0x22/0x5d
[   72.033824]  [<ffffffff8115d453>] sys_write+0x5c/0x84
[   72.035328]  [<ffffffff819789d9>] system_call_fastpath+0x16/0x1b

In case of other races, hold rtnl lock for the entire netpoll_setup() function.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f92d3180

vmxnet3: better RSS support · 7db11f75

Stephen Hemminger authored Jan 15, 2013

The VMXNET3 device provides RSS hash value for received packets,
but it is not being used.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

7db11f75

vmxnet3: use static RSS key · 66d35910

Stephen Hemminger authored Jan 15, 2013

Rather than generating a different RSS key on each boot, just use
a predetermined value that will map same flow to same value on
every device for more predictable testing. This is already done
on most hardware drivers.

The initial key value is just some arbitrary bits extracted once
from /dev/random.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

66d35910

vmxnet3: remove unused irq_share_mode · 4db37a78

Stephen Hemminger authored Jan 15, 2013

This static variable is never set; it is initialized to 0, which
is VMXNET3_INTR_BUDDYSHARE, and never changes.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

4db37a78

vmxnet3: remove device counter · f32a2605

Stephen Hemminger authored Jan 15, 2013

An atomic counter of devices present is maintained but never used.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

f32a2605

vmxnet3: remove VMXNET3_MAX_DEVICES · 4816a072

Stephen Hemminger authored Jan 15, 2013

Defined but never used.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

4816a072

vmxnet3: use netdev_ printk wrappers · 204a6e65

Stephen Hemminger authored Jan 15, 2013

Use the standard netdev_xxx() and dev_xxx() wrappers to format
log messages.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

204a6e65

vmxnet3: use netdev_dbg · fdcd79b9

Stephen Hemminger authored Jan 15, 2013

Use netdev_dbg() rather than dev_dbg() because the former prints
the device name which is more useful than the pci name.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

fdcd79b9

vmxnet3: fix messages printed before registration · 4bad25fa

Stephen Hemminger authored Jan 15, 2013

This fixes the messages that this device prints at boot time
when netdev_err() is called before register_netdevice().
Switch to the dev_XXX macros, which correlate the message with the PCI
info that is already available.

Rather than fixing the features message, just remove it since
the information is redundant and available through ethtool.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

4bad25fa

vmxnet3: remove unnecessary bookkeeping · 69b9a712

Stephen Hemminger authored Jan 15, 2013

The uncommitted[] array was set but never used except in a debug
message. Remove it.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

69b9a712

vmxnet3: use netdev_alloc_skb_ip_align · 0d735f13

Stephen Hemminger authored Jan 15, 2013

Use netdev_alloc_skb_ip_align() rather than open-coding it with
dev_alloc_skb(). Change allocation at startup to use GFP_KERNEL.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

0d735f13

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next · 8c174e6f

David S. Miller authored Jan 16, 2013

Jeff Kirsher says:

====================
This series contains updates to e1000e only.

v2- updates patch 09/15 "e1000e: resolve checkpatch PREFER_PR_LEVEL warning"
    based on feedback from Joe Perches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

8c174e6f

e1000e: merge multiple conditional statements into one · d60923c4

Bruce Allan authored Dec 05, 2012

Cleanup a set of conditional tests.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

d60923c4

e1000e: cleanup code duplication · e3d14b08

Bruce Allan authored Dec 05, 2012

The removed code block is duplicated in e1000e_write_itr() so use that
instead.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e3d14b08

e1000e: cleanup magic number · 3a3104e7

Bruce Allan authored Dec 05, 2012

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

3a3104e7

e1000e: cleanup unnecessary line wrap · 1860ac84

Bruce Allan authored Dec 05, 2012

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

1860ac84

e1000e: cleanup unusual comment placement · 2a2293b9

Bruce Allan authored Dec 05, 2012

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

2a2293b9

e1000e: cleanup redundant statistics counter · 0a939912

Bruce Allan authored Dec 05, 2012

rx_long_byte_count can be removed since it is duplicated in rx_bytes
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

0a939912

e1000e: resolve checkpatch PREFER_PR_LEVEL warning · 7dbc1672

Bruce Allan authored Jan 12, 2013

WARNING: Prefer netdev_info(netdev, ... then dev_info(dev, ...
then pr_info(...  to printk(KERN_INFO ...

v2 - remove unnecessary "e1000e:" prefix as pointed out by Joe Perches
     since that produces a redundant "e1000e:" in the log message

Cc: Joe Perches <joe@perches.com>
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

7dbc1672

e1000e: add missing bailout on error · 8e5ab42d

Bruce Allan authored Dec 05, 2012

...discovered during code inspection.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

8e5ab42d

e1000e: unexpected "Reset adapter" message when cable pulled · 12d43f7d

Bruce Allan authored Dec 05, 2012

When there is heavy traffic and the cable is pulled, the driver must reset
the adapter to flush the Tx queue in hardware. This causes the reset path
to be scheduled and logs the message "Reset adapter" which could be
misinterpreted as an error by the user. Change how the reset path is invoked
for this scenario by using the same method done in an existing work-around
for 80003es2lan (i.e. set a flag and if the flag is set in the reset code
do not log the "Reset adapter" message since the reset is expected).

Rename FLAG_RX_RESTART_NOW to FLAG_RESTART_NOW since it is used for
resets in both the Rx and Tx specific code.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

12d43f7d