Commits · 23624935e0c4b04730ed8d7d21f0cd25b2c2cda1 · nexedi / linux

22 Jan, 2011 1 commit

net_sched: TCQ_F_CAN_BYPASS generalization · 23624935

Eric Dumazet authored Jan 21, 2011

Now that the qdisc stab is handled before the TCQ_F_CAN_BYPASS test in
__dev_xmit_skb(), we can generalize TCQ_F_CAN_BYPASS to qdiscs other
than pfifo_fast: pfifo, bfifo, pfifo_head_drop and sfq.

SFQ is special because it can have external classifiers; in those
cases we cannot bypass the queue discipline (the packet could be dropped
by the classifier) without the admin asking for it, or without further
changes.

It's worth doing this, especially for SFQ, since it avoids dirtying
memory when no packets are already waiting in the queue.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
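
The bypass condition described above can be sketched as a small userspace model. This is only an illustration, not the kernel code: the struct, the flag value and the function names (model_qdisc, MODEL_TCQ_F_CAN_BYPASS, model_update_bypass, model_can_bypass) are hypothetical, and the "running" field is an assumed stand-in for whatever test the kernel uses to detect a concurrent dequeue.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag value for this model; the kernel defines its own. */
#define MODEL_TCQ_F_CAN_BYPASS 0x4u

struct model_qdisc {
	unsigned int flags;   /* capability flags */
	size_t qlen;          /* packets currently queued */
	bool running;         /* assumed: someone is already dequeuing */
	bool has_classifier;  /* e.g. SFQ with an external filter attached */
};

/* SFQ-style rule from the commit message: a classifier may drop the
 * packet, so the qdisc must not be bypassed while one is attached. */
static void model_update_bypass(struct model_qdisc *q)
{
	if (q->has_classifier)
		q->flags &= ~MODEL_TCQ_F_CAN_BYPASS;
	else
		q->flags |= MODEL_TCQ_F_CAN_BYPASS;
}

/* True when a packet may skip enqueue/dequeue and go straight to the
 * driver: the qdisc allows it, nothing is queued, nobody is dequeuing. */
static bool model_can_bypass(const struct model_qdisc *q)
{
	return (q->flags & MODEL_TCQ_F_CAN_BYPASS) &&
	       q->qlen == 0 && !q->running;
}
```

In this model an empty SFQ without classifiers takes the fast path and never touches its queue memory, which is the saving the commit message refers to.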


21 Jan, 2011 18 commits

net: netif_setup_tc() is static · bb134d22

Eric Dumazet authored Jan 20, 2011

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


rtnetlink: fix link attribute validation with IFLA_GROUP · ffa934f1

Patrick McHardy authored Jan 20, 2011

rtnl_group_changelink() is invoked by rtnl_newlink() before the link
attributes have been validated. Additionally the group changes are
performed even if NLM_F_CREATE is specified and a new link is
created, while more reasonable semantics would be to set the group
value on the newly created link.

Fix both problems by moving the rtnl_group_changelink() invocation
down to the handling of non-existent links without NLM_F_CREATE,
and add a dev_set_group() call to rtnl_create_link().
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Vlad Dogaru <ddvlad@rosedu.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


ppp: Use SKB queue abstraction interfaces in fragment processing. · d52344a7

David S. Miller authored Jan 20, 2011

No more direct references to SKB queue and list implementation
details.
Signed-off-by: David S. Miller <davem@davemloft.net>


net: Add safe reverse SKB queue walkers. · 686a2955

David S. Miller authored Jan 20, 2011

Signed-off-by: David S. Miller <davem@davemloft.net>

ppp: Reconstruct fragmented packets using frag lists instead of copying. · 212bfb9e

David S. Miller authored Jan 20, 2011

[paulus@samba.org: fixed a couple of bugs]
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>


ppp: Clean up kernel log messages. · b48f8c23

David S. Miller authored Jan 20, 2011

Use netdev_*() and pr_*().

To preserve existing semantics in cases where KERN_DEBUG is indeed
appropriate, use netdev_printk(KERN_DEBUG, ...)

Convert PPPIOCDETACH to pr_warn() because an unexpected file count is
a serious bug and should be logged with KERN_WARNING.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>


dccp: clean up unused DCCP_STATE_MASK definition · d18046b3

Shan Wei authored Jan 19, 2011

Remove unused DCCP_STATE_MASK macro.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>


enic: Bug Fix: Dont reset ENIC_SET_APPLIED flag on port profile disassociate · 4dce2396

Roopa Prabhu authored Jan 20, 2011

enic_get_vf_port returns the port profile operation status only if the
ENIC_SET_APPLIED flag is set. A recent rework of enic_set_port_profile added
code to reset this flag on disassociate. As a result, a client calling
enic_get_vf_port to get the status of a port profile disassociate always gets
a return value of ENODATA. This patch renames ENIC_SET_APPLIED to the more
appropriate ENIC_PORT_REQUEST_APPLIED and reverts the recent change, so that
the flag is set on both associate and disassociate of a port profile.
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David Wang <dwang2@cisco.com>
Signed-off-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


ipv6: raw: rcu annotations · f2eda47d

Eric Dumazet authored Jan 20, 2011

Remove sparse warnings, using a function typedef to be able to use __rcu
annotation on mh_filter pointer.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


neigh: __rcu annotations · 6193d2be

Eric Dumazet authored Jan 19, 2011

Fix some minor issues and sparse (__rcu) warnings.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: ipv6: sit: fix rcu annotations · 753ea8e9

Eric Dumazet authored Jan 20, 2011

Fix minor __rcu annotations and remove sparse warnings
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


via-velocity: fix the WOL bug on 1000M full duplex forced mode. · 2ffa007e

françois romieu authored Jan 20, 2011

The VIA velocity card cannot be woken up by a WOL tool when it is in
forced 1000M full duplex mode. This patch fixes the bug.
Signed-off-by: David Lv <DavidLv@viatech.com.cn>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net_sched: RCU conversion of stab · a2da570d

Eric Dumazet authored Jan 20, 2011

This patch converts stab qdisc management to RCU, so that we can perform
the qdisc_calculate_pkt_len() call before getting qdisc lock.

This shortens the time the lock is held in __dev_xmit_skb().

This permits more qdiscs to get TCQ_F_CAN_BYPASS status, avoiding a lot
of cache misses and so reducing latencies.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Jesper Dangaard Brouer <hawk@diku.dk>
CC: Jarek Poplawski <jarkao2@gmail.com>
CC: Jamal Hadi Salim <hadi@cyberus.ca>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net_sched: move TCQ_F_THROTTLED flag · fd245a4a

Eric Dumazet authored Jan 20, 2011

In commit 37112105 (net: QDISC_STATE_RUNNING dont need atomic bit
ops) I moved QDISC_STATE_RUNNING flag to __state container, located in
the cache line containing qdisc lock and often dirtied fields.

I now move the TCQ_F_THROTTLED bit too, so that the first cache line
stays read-mostly and shared by all cpus. This should speed up HTB/CBQ,
for example.

Not using test_bit()/__clear_bit()/__test_and_set_bit() allows using an
"unsigned int" for the __state container, reducing Qdisc size by 8 bytes.

Introduce helpers to hide implementation details.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Jesper Dangaard Brouer <hawk@diku.dk>
CC: Jarek Poplawski <jarkao2@gmail.com>
CC: Jamal Hadi Salim <hadi@cyberus.ca>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
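
As a rough illustration of the helper idea, the state bits can be hidden behind small accessors that use plain (non-atomic) operations on an unsigned int. All names and bit values below are hypothetical, and the sketch assumes every caller already holds the qdisc lock, which is what makes the non-atomic updates safe.

```c
#include <stdbool.h>

/* Hypothetical bit values for this model. */
enum {
	MODEL_STATE_RUNNING   = 1u << 0,
	MODEL_STATE_THROTTLED = 1u << 1,
};

struct model_qdisc {
	unsigned int flags;  /* read-mostly configuration flags */
	unsigned int state;  /* frequently written; kept next to the lock */
};

/* Plain loads and stores: callers are assumed to hold the qdisc lock. */
static bool model_is_throttled(const struct model_qdisc *q)
{
	return (q->state & MODEL_STATE_THROTTLED) != 0;
}

static void model_set_throttled(struct model_qdisc *q)
{
	q->state |= MODEL_STATE_THROTTLED;
}

static void model_clear_throttled(struct model_qdisc *q)
{
	q->state &= ~MODEL_STATE_THROTTLED;
}
```

Because callers only go through the accessors, the bit can later move to another field or change representation without touching the schedulers that use it.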


net_sched: sfq: allow divisor to be a parameter · 817fb15d

Eric Dumazet authored Jan 20, 2011

SFQ currently uses a 1024-slot hash table, and allocating its internal
structure (sfq_sched_data) needs an order-1 page on x86_64.

Allow the tc command to specify a divisor value (hash table size)
between 1 and 65536.
If no value is provided, assume the default size of 1024.

This allows admins to set up a smaller (or bigger) SFQ for specific needs.

This also brings sfq_sched_data allocations back to order-0 ones, saving
3KB per SFQ qdisc.

Jesper uses ~55,000 SFQ qdiscs in one machine, so this patch should free
165 MB of memory (55,000 x 3KB).
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Jesper Dangaard Brouer <hawk@diku.dk>
CC: Jarek Poplawski <jarkao2@gmail.com>
CC: Jamal Hadi Salim <hadi@cyberus.ca>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: dev_close_many() is static · 3fbd8758

Eric Dumazet authored Jan 19, 2011

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Octavian Purdila <opurdila@ixiacom.com>
Reviewed-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


atl1e: remove private #define. · ccd5c8ef

françois romieu authored Jan 20, 2011

Either unused or duplicates from mii.h.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Cc: Jay Cliburn <jcliburn@gmail.com>
Cc: Chris Snook <chris.snook@gmail.com>
Cc: Jie Yang <jie.yang@atheros.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


atl1c: remove private #define. · 34aac66c

françois romieu authored Jan 20, 2011

Either unused or duplicates from mii.h.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Cc: Jay Cliburn <jcliburn@gmail.com>
Cc: Chris Snook <chris.snook@gmail.com>
Cc: Jie Yang <jie.yang@atheros.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


20 Jan, 2011 21 commits

netfilter: add a missing include in nf_conntrack_reasm.c · bced94ed

Eric Dumazet authored Jan 20, 2011

After commit ae90bdea (netfilter: fix compilation when conntrack is
disabled but tproxy is enabled) we get the following warnings:

net/ipv6/netfilter/nf_conntrack_reasm.c:520:16: warning: symbol
'nf_ct_frag6_gather' was not declared. Should it be static?
net/ipv6/netfilter/nf_conntrack_reasm.c:591:6: warning: symbol
'nf_ct_frag6_output' was not declared. Should it be static?
net/ipv6/netfilter/nf_conntrack_reasm.c:612:5: warning: symbol
'nf_ct_frag6_init' was not declared. Should it be static?
net/ipv6/netfilter/nf_conntrack_reasm.c:640:6: warning: symbol
'nf_ct_frag6_cleanup' was not declared. Should it be static?

Fix this by including net/netfilter/ipv6/nf_defrag_ipv6.h.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>


netfilter: nf_conntrack: fix linker error with NF_CONNTRACK_TIMESTAMP=n · 2f1e3176

Patrick McHardy authored Jan 20, 2011

net/built-in.o: In function `nf_conntrack_init_net':
net/netfilter/nf_conntrack_core.c:1521:
	undefined reference to `nf_conntrack_tstamp_init'
net/netfilter/nf_conntrack_core.c:1531:
	undefined reference to `nf_conntrack_tstamp_fini'

Add dummy inline functions for the =n case to fix this.
Reported-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>


netfilter: xtables: add missing header inclusions for headers_check · 06988b06

Jan Engelhardt authored Jan 20, 2011

Resolve these warnings on `make headers_check`:

usr/include/linux/netfilter/xt_CT.h:7: found __[us]{8,16,32,64} type
without #include <linux/types.h>
...
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>


netfilter: nf_nat: place conntrack in source hash after SNAT is done · 41a7cab6

Changli Gao authored Jan 20, 2011

If SNAT has not been done yet, other conntracks may get the wrong info.

Also, because the filter table comes after the DNAT table, packets that
are dropped in the filter table needlessly end up in the bysource hash
table.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>


Merge branch 'connlimit' of git://dev.medozas.de/linux · 4cda47d2

Patrick McHardy authored Jan 20, 2011

netfilter: xtables: remove duplicate member · ba12b130

Jan Engelhardt authored Jan 20, 2011

Accidentally missed removing the old out-of-union "inverse" member,
which caused the struct size to change which then gives size mismatch
warnings when using an old iptables.

It is interesting to see that gcc did not warn about this before.
(Filed http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47376 )
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>


Merge branch 'connlimit' of git://dev.medozas.de/linux · 82d800d8

Patrick McHardy authored Jan 20, 2011

Conflicts:
	Documentation/feature-removal-schedule.txt
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: do not omit re-route check on NF_QUEUE verdict · 28a51ba5

Florian Westphal authored Jan 20, 2011

ret != NF_QUEUE only works in the "--queue-num 0" case; for
queues > 0 the test should be '(ret & NF_VERDICT_MASK) != NF_QUEUE'.

However, NF_QUEUE no longer DROPs the skb unconditionally if queueing
fails (due to NF_VERDICT_FLAG_QUEUE_BYPASS verdict flag), so the
re-route test should also be performed if this flag is set in the
verdict.

The full test would then look something like

&& ((ret & NF_VERDICT_MASK) == NF_QUEUE && (ret & NF_VERDICT_FLAG_QUEUE_BYPASS))

This is rather ugly, so just remove the NF_QUEUE test altogether.

The only effect is that we might perform an unnecessary route lookup
in the NF_QUEUE case.

ip6table_mangle did not have such a check.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
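
The difference between the two comparisons can be shown with a self-contained sketch. The constants are defined locally and mirror the values in <linux/netfilter.h> as best recalled, so treat the exact numbers as assumptions; naive_is_queue() and masked_is_queue() are hypothetical helper names used only for this illustration.

```c
#include <stdbool.h>

/* Local copies for illustration; values are assumed to match the header. */
#define NF_QUEUE                     3u
#define NF_VERDICT_MASK              0x000000ffu
#define NF_VERDICT_FLAG_QUEUE_BYPASS 0x00008000u
#define NF_VERDICT_QMASK             0xffff0000u

/* The queue number is carried in the upper 16 bits of the verdict. */
#define NF_QUEUE_NR(x) \
	((((unsigned int)(x) << 16) & NF_VERDICT_QMASK) | NF_QUEUE)

/* Plain comparison: only matches queue 0 with no flags set. */
static bool naive_is_queue(unsigned int ret)
{
	return ret == NF_QUEUE;
}

/* Masked comparison: matches any queue number and any flag bits. */
static bool masked_is_queue(unsigned int ret)
{
	return (ret & NF_VERDICT_MASK) == NF_QUEUE;
}
```

With these definitions, a verdict for "--queue-num 1" is recognised only by the masked test, which is the mismatch the commit message describes before it drops the test altogether.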


Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 · a07aa004

David S. Miller authored Jan 20, 2011

netfilter: xtables: remove extraneous header that slipped in · 5d844928

Jan Engelhardt authored Jan 20, 2011

Commit 0b8ad876 (netfilter: xtables: add missing header files to export
list) erroneously added this.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>


net_sched: cleanups · cc7ec456

Eric Dumazet authored Jan 19, 2011

Clean up net/sched code to current CodingStyle and practices.

Reduce inline abuse.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


af_unix: coding style: remove one level of indentation in unix_shutdown() · 7180a031

Alban Crequy authored Jan 19, 2011

Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
Reviewed-by: Ian Molton <ian.molton@collabora.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>


net_sched: implement a root container qdisc sch_mqprio · b8970f0b

John Fastabend authored Jan 17, 2011

This implements a mqprio queueing discipline that by default creates
a pfifo_fast qdisc per tx queue and provides the needed configuration
interface.

Using the mqprio qdisc, the number of tcs currently in use, along
with the range of queues allotted to each class, can be configured. By
default skbs are mapped to traffic classes using the skb priority.
This mapping is configurable.

Configurable parameters:

struct tc_mqprio_qopt {
	__u8    num_tc;
	__u8    prio_tc_map[TC_BITMASK + 1];
	__u8    hw;
	__u16   count[TC_MAX_QUEUE];
	__u16   offset[TC_MAX_QUEUE];
};

Here the count/offset pairing gives the queue alignment and the
prio_tc_map gives the mapping from skb->priority to tc.

The hw bit determines if the hardware should configure the count
and offset values. If the hardware bit is set then the operation
will fail if the hardware does not implement the ndo_setup_tc
operation. This is to avoid undetermined states where the hardware
may or may not control the queue mapping. Also minimal bounds
checking is done on the count/offset to verify a queue does not
exceed num_tx_queues and that queue ranges do not overlap. Otherwise
it is left to user policy or hardware configuration to create
useful mappings.

It is expected that hardware QOS schemes can be implemented by
creating appropriate mappings of queues in ndo_setup_tc().

One expected use case is drivers will use the ndo_setup_tc to map
queue ranges onto 802.1Q traffic classes. This provides a generic
mechanism to map network traffic onto these traffic classes and
removes the need for lower layer drivers to know specifics about
traffic types.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
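
The bounds checking mentioned above (no range may exceed num_tx_queues, and ranges may not overlap) could look roughly like the following sketch. It is not the code from the patch: the trimmed struct and the names model_mqprio_qopt, MODEL_TC_MAX_QUEUE and model_mqprio_ranges_ok are hypothetical, and rejecting empty ranges is an added assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_TC_MAX_QUEUE 16 /* assumed array size for this model */

/* Trimmed-down stand-in for the count/offset part of the options. */
struct model_mqprio_qopt {
	uint8_t  num_tc;
	uint16_t count[MODEL_TC_MAX_QUEUE];
	uint16_t offset[MODEL_TC_MAX_QUEUE];
};

/* Returns true when every traffic class owns a non-empty queue range
 * inside [0, num_tx_queues) and no two ranges share a queue. */
static bool model_mqprio_ranges_ok(const struct model_mqprio_qopt *q,
				   unsigned int num_tx_queues)
{
	unsigned int i, j;

	if (q->num_tc > MODEL_TC_MAX_QUEUE)
		return false;

	for (i = 0; i < q->num_tc; i++) {
		/* One past the last queue of class i. */
		unsigned int end_i = (unsigned int)q->offset[i] + q->count[i];

		if (q->count[i] == 0 || end_i > num_tx_queues)
			return false;

		for (j = i + 1; j < q->num_tc; j++) {
			unsigned int end_j =
				(unsigned int)q->offset[j] + q->count[j];

			/* Two half-open ranges overlap if each starts
			 * before the other one ends. */
			if (q->offset[i] < end_j && q->offset[j] < end_i)
				return false;
		}
	}
	return true;
}
```

For example, two classes with count {4, 4} and offset {0, 4} pass on a device with 8 tx queues, while offset {0, 2} fails because queues 2 and 3 would belong to both classes.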


net: implement mechanism for HW based QOS · 4f57c087

John Fastabend authored Jan 17, 2011

This patch provides a mechanism for lower layer devices to
steer traffic to tx queues using skb->priority. This allows
hardware based QOS schemes to use the default qdisc without
incurring the penalties related to global state and the qdisc
lock, while reliably receiving skbs on the correct tx ring
to avoid head of line blocking resulting from shuffling in
the LLD. Finally, all the goodness from txq caching and xps/rps
can still be leveraged.

Many drivers and hardware exist with the ability to implement
QOS schemes in the hardware but currently these drivers tend
to rely on firmware to reroute specific traffic, a driver
specific select_queue or the queue_mapping action in the
qdisc.

When select_queue is used for this, drivers need to be updated for
each and every traffic type, and we lose the goodness of much
of the upstream work. Firmware solutions are inherently
inflexible. And finally, if admins are expected to build a
qdisc and filter rules to steer traffic, this requires knowledge
of how the hardware is currently configured. The number of tx
queues and the queue offsets may change depending on resources.
This approach also incurs all the overhead of a qdisc with filters.

With the mechanism in this patch users can set the skb priority using
the expected methods, i.e. setsockopt(), or the stack can set the
priority directly. The skb will then be steered to the correct tx queues
aligned with hardware QOS traffic classes. In the normal case, with a
single traffic class and all queues in this class, everything
works as is until the LLD enables multiple tcs.

To steer the skb we mask the priority down to its lower 4 bits
and allow the hardware to configure up to 15 distinct classes
of traffic. This is expected to be sufficient for most applications;
at any rate it is more than the 802.1Q spec designates and is
equal to the number of prio bands currently implemented in
the default qdisc.

This, in conjunction with a userspace application such as
lldpad, can be used to implement 802.1Q transmission selection
algorithms, one of which is the extended transmission
selection algorithm currently being used for DCB.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
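
The steering path described above (priority, then traffic class, then a queue inside that class's range) can be sketched in userspace as follows. The struct layout, the names prefixed with model_, and the use of a flow hash to spread traffic inside a range are assumptions made for this illustration, not a copy of the patch.

```c
#include <stdint.h>

#define MODEL_TC_BITMASK 15 /* keep the lower 4 bits of the priority */

struct model_txq_range {
	uint16_t count;   /* number of tx queues in this class */
	uint16_t offset;  /* first tx queue of this class */
};

struct model_dev {
	uint8_t prio_tc_map[MODEL_TC_BITMASK + 1];
	struct model_txq_range tc_to_txq[MODEL_TC_BITMASK + 1];
};

/* Map an skb priority to a traffic class using only its lower 4 bits.
 * Map entries are assumed to be valid indexes into tc_to_txq. */
static unsigned int model_prio_to_tc(const struct model_dev *dev,
				     uint32_t priority)
{
	return dev->prio_tc_map[priority & MODEL_TC_BITMASK];
}

/* Pick a tx queue inside the range owned by the packet's class.
 * The selected range is assumed to have count > 0. */
static unsigned int model_pick_txq(const struct model_dev *dev,
				   uint32_t priority, uint32_t flow_hash)
{
	const struct model_txq_range *r =
		&dev->tc_to_txq[model_prio_to_tc(dev, priority)];

	return r->offset + flow_hash % r->count;
}
```

With priority 5 mapped to class 1 and class 1 owning queues 4 to 7, every packet sent with priority 5 lands on one of those four queues regardless of its flow hash.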


netlink: support setting devgroup parameters · e7ed828f

Vlad Dogaru authored Jan 13, 2011

If an rtnetlink request specifies a negative or zero ifindex and has no
interface name attribute, but has a group attribute, then the changes
are made to all the interfaces belonging to the specified group.
Signed-off-by: Vlad Dogaru <ddvlad@rosedu.org>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
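
The selection rule above can be modelled in a few lines. This is a userspace sketch with hypothetical types and names (model_netdev, model_request, model_group_changelink), and it uses an MTU change as a stand-in for whatever attributes the request carries.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_netdev {
	int ifindex;
	int group;
	unsigned int mtu;
};

struct model_request {
	int ifindex;          /* <= 0 means no specific device */
	const char *ifname;   /* NULL when no name attribute is present */
	bool has_group;       /* group attribute present? */
	int group;
	unsigned int new_mtu; /* example change to apply */
};

/* Applies the change to every device in the requested group.
 * Returns the number of devices changed, or -1 when the request
 * addresses a single device or carries no group attribute. */
static int model_group_changelink(struct model_netdev *devs, size_t n,
				  const struct model_request *req)
{
	int changed = 0;
	size_t i;

	if (req->ifindex > 0 || req->ifname != NULL || !req->has_group)
		return -1;

	for (i = 0; i < n; i++) {
		if (devs[i].group == req->group) {
			devs[i].mtu = req->new_mtu;
			changed++;
		}
	}
	return changed;
}
```

A request with ifindex 0, no name and group 1 therefore updates every device in group 1 and leaves the rest untouched.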


net_device: add support for network device groups · cbda10fa

Vlad Dogaru authored Jan 13, 2011

Net devices can now be grouped, enabling simpler manipulation from
userspace. This patch adds a group field to the net_device structure, as
well as rtnetlink support to query and modify it.
Signed-off-by: Vlad Dogaru <ddvlad@rosedu.org>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: cleanup unused macros in net directory · 441c793a

Shan Wei authored Jan 13, 2011

Clean up some unused macros in net/*:
1. Macros left behind by code changes,
   e.g. PGV_FROM_VMALLOC, KMEM_SAFETYZONE.
2. Macros never used since they were introduced to the kernel,
   e.g. P9_RDMA_MAX_SGE, UTIL_CTRL_PKT_SIZE.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


vxge: update driver version · 6997e618

Jon Mason authored Jan 18, 2011

Update vxge driver version to 2.5.2
Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


vxge: MSIX one shot mode · 16fded7d

Jon Mason authored Jan 18, 2011

To reduce the possibility of losing an interrupt in the handler due to a
race between an interrupt processing and disable/enable of interrupts,
enable MSIX one shot.

Also, add support for adaptive interrupt coalescing.
Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: Masroor Vettuparambil <masroor.vettuparambil@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


vxge: correct eprom version detection · 1d15f81c

Jon Mason authored Jan 18, 2011

The firmware PXE EPROM version detection is failing because the wrong
parameter is passed into the firmware query function.  Also, the version
printing function has an extraneous newline.
Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: Sivakumar Subramani <sivakumar.subramani@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


vxge: cleanup probe error paths · 6cca2003

Jon Mason authored Jan 18, 2011

Reorder the commands to be in the inverse order of their allocations
(instead of the random order they appear to be in), propagate return
code on errors from pci_request_region and register_netdev, reduce the
config_dev_cnt and total_dev_cnt counters on remove, and return the
correct error code for vdev->vpaths kzalloc failures.  Also, prevent
leaking vdev->vpaths memory and the netdev in the vxge_probe error path,
which happened because vxge_device_unregister does not free them.
Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: Sivakumar Subramani <sivakumar.subramani@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
