Commits · af356afa010f3cd2c8b8fcc3bce90f7a7b7ec02a · Kirill Smelkov / linux

06 Sep, 2009 4 commits

net_sched: reintroduce dev->qdisc for use by sch_api · af356afa

Patrick McHardy authored Sep 04, 2009

Currently the multiqueue integration with the qdisc API suffers from
a few problems:

- with multiple queues, all root qdiscs use the same handle. This means
  they can't be exposed to userspace in a backwards compatible fashion.

- all API operations always refer to queue number 0. Newly created
  qdiscs are automatically shared between all queues, its not possible
  to address individual queues or restore multiqueue behaviour once a
  shared qdisc has been attached.

- Dumps only contain the root qdisc of queue 0, in case of non-shared
  qdiscs this means the statistics are incomplete.

This patch reintroduces dev->qdisc, which points to the (single) root qdisc
from userspace's point of view. Currently it either points to the first
(non-shared) default qdisc, or a qdisc shared between all queues. The
following patches will introduce a classful dummy qdisc, which will be used
as root qdisc and contain the per-queue qdiscs as children.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

af356afa

net_sched: remove some unnecessary checks in classful schedulers · 5b9a9ccf

Patrick McHardy authored Sep 04, 2009

The class argument to the ->graft(), ->leaf(), ->dump(), ->dump_stats() all
originate from either ->get() or ->walk() and are always valid.

Remove unnecessary checks.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

5b9a9ccf

net_sched: make cls_ops->change and cls_ops->delete optional · de6d5cdf

Patrick McHardy authored Sep 04, 2009

Some schedulers don't support creating, changing or deleting classes.
Make the respective callbacks optionally and consistently return
-EOPNOTSUPP for unsupported operations, instead of currently either
-EOPNOTSUPP, -ENOSYS or no error.

In case of sch_prio and sch_multiq, the removed operations additionally
checked for an invalid class. This is not necessary since the class
argument can only orginate from ->get() or in case of ->change is 0
for creation of new classes, in which case ->change() incorrectly
returned -ENOENT.

As a side-effect, this patch fixes a possible (root-only) NULL pointer
function call in sch_ingress, which didn't implement a so far mandatory
->delete() operation.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

de6d5cdf

net_sched: make cls_ops->tcf_chain() optional · 71ebe5e9

Patrick McHardy authored Sep 04, 2009

Some qdiscs don't support attaching filters. Handle this centrally in
cls_api and return a proper errno code (EOPNOTSUPP) instead of EINVAL.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

71ebe5e9

05 Sep, 2009 2 commits

net_sched: fix class grafting errno codes · c9f1d038

Patrick McHardy authored Sep 04, 2009

If the parent qdisc doesn't support classes, use EOPNOTSUPP.
If the parent class doesn't exist, use ENOENT. Currently EINVAL
is returned in both cases.

Additionally check whether grafting is supported and remove a now
unnecessary graft function from sch_ingress.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

c9f1d038

netlink: silence compiler warning · b1f57195

Brian Haley authored Sep 04, 2009

  CC      net/netlink/genetlink.o
net/netlink/genetlink.c: In function ‘genl_register_mc_group’:
net/netlink/genetlink.c:139: warning: ‘err’ may be used uninitialized in this function

From following the code 'err' is initialized, but set it to zero to
silence the warning.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b1f57195

04 Sep, 2009 34 commits

sctp: Catch bogus stream sequence numbers · f1751c57

Vlad Yasevich authored Sep 04, 2009

Since our TSN map is capable of holding at most a 4K chunk gap,
there is no way that during this gap, a stream sequence number
(unsigned short) can wrap such that the new number is smaller
then the next expected one.  If such a case is encountered,
this is a protocol violation.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

f1751c57

sctp: remove dup code in net/sctp/output.c · be297143

Wei Yongjun authored Sep 04, 2009

Use sctp_packet_reset() instead of dup code.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

be297143

sctp: turn flags in 'struct sctp_association' into bit fields · 9237ccbc

Wei Yongjun authored Sep 04, 2009

This shrinks the size of struct sctp_association a little.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

9237ccbc

sctp: Sysctl configuration for IPv4 Address Scoping · 72388433

Bhaskar Dutta authored Sep 03, 2009

This patch introduces a new sysctl option to make IPv4 Address Scoping
configurable <draft-stewart-tsvwg-sctp-ipv4-00.txt>.

In networking environments where DNAT rules in iptables prerouting
chains convert destination IP's to link-local/private IP addresses,
SCTP connections fail to establish as the INIT chunk is dropped by the
kernel due to address scope match failure.
For example to support overlapping IP addresses (same IP address with
different vlan id) a Layer-5 application listens on link local IP's,
and there is a DNAT rule that maps the destination IP to a link local
IP. Such applications never get the SCTP INIT if the address-scoping
draft is strictly followed.

This sysctl configuration allows SCTP to function in such
unconventional networking environments.

Sysctl options:
0 - Disable IPv4 address scoping draft altogether
1 - Enable IPv4 address scoping (default, current behavior)
2 - Enable address scoping but allow IPv4 private addresses in init/init-ack
3 - Enable address scoping but allow IPv4 link local address in init/init-ack
Signed-off-by: Bhaskar Dutta <bhaskar.dutta@globallogic.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

72388433

sctp: Get rid of an extra routing lookup when adding a transport. · 8da645e1

Vlad Yasevich authored Sep 04, 2009

We used to perform 2 routing lookups for a new transport: one
just for path mtu detection, and one to actually route to destination
and path mtu update when sending a packet.  There is no point in doing
both of them, especially since the first one just for path mtu doesn't
take into account source address and sometimes gives the wrong route,
causing path mtu updates anyway.

We now do just the one call to do both route to destination and get
path mtu updates.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

8da645e1

sctp: Turn flags in 'sctp_packet' into bit fields · a803c942

Vlad Yasevich authored Sep 04, 2009

This shrinks the size of sctp_packet a little.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

a803c942

sctp: Correctly track if AUTH has been bundled. · 4007cc88

Vlad Yasevich authored Sep 04, 2009

We currently track if AUTH has been bundled using the 'auth'
pointer to the chunk.  However, AUTH is disallowed after DATA
is already in the packet, so we need to instead use the
'has_auth' field.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

4007cc88

sctp: fix to reset packet information after packet transmit · d521c08f

Wei Yongjun authored Sep 02, 2009

The packet information does not reset after packet transmit, this
may cause some problems such as following DATA chunk be sent without
AUTH chunk, even if the authentication of DATA chunk has been
requested by the peer.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

d521c08f

sctp: Failover transmitted list on transport delete · 31b02e15

Vlad Yasevich authored Sep 04, 2009

Add-IP feature allows users to delete an active transport.  If that
transport has chunks in flight, those chunks need to be moved to another
transport or association may get into unrecoverable state.
Reported-by: Rafael Laufer <rlaufer@cisco.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

31b02e15

sctp: Fix SCTP_MAXSEG socket option to comply to spec. · f68b2e05

Vlad Yasevich authored Sep 04, 2009

We had a bug that we never stored the user-defined value for
MAXSEG when setting the value on an association.  Thus future
PMTU events ended up re-writing the frag point and increasing
it past user limit.  Additionally, when setting the option on
the socket/endpoint, we effect all current associations, which
is against spec.

Now, we store the user 'maxseg' value along with the computed
'frag_point'.  We inherit 'maxseg' from the socket at association
creation and use it as an upper limit for 'frag_point' when its
set.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

f68b2e05

sctp: Don't do NAGLE delay on large writes that were fragmented small · cb95ea32

Vlad Yasevich authored Sep 04, 2009

SCTP will delay the last part of a large write due to NAGLE, if that
part is smaller then MTU. Since we are doing large writes, we might
as well send the last portion now instead of waiting untill the next
large write happens. The small portion will be sent as is regardless,
so it's better to not delay it.

This is a result of much discussions with Wei Yongjun <yjwei@cn.fujitsu.com>
and Doug Graham <dgraham@nortel.com>. Many thanks go out to them.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

cb95ea32

sctp: Nagle delay should be based on path mtu · b29e7907

Vlad Yasevich authored Sep 04, 2009

The decision to delay due to Nagle should be based on the path mtu
and future packet size.  We currently incorrectly base it on
'frag_point' which is the SCTP DATA segment size, and also we do
not count DATA chunk header overhead in the computation.  This
actuall allows situations where a user can set low 'frag_point',
and then send small messages without delay.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

b29e7907

sctp: Try not to change a_rwnd when faking a SACK from SHUTDOWN. · d4d6fb57

Vlad Yasevich authored Sep 04, 2009

We currently set a_rwnd to 0 when faking a SACK from SHUTDOWN.
This results in an hung association if the remote only uses
SHUTDOWNs (which it's allowed to do) to acknowlege DATA when
closing.  The reason for that is that we simply honor the a_rwnd
from the sack, but since we faked it to be 0, we enter 0-window
probing.  The fix is to use the peers old rwnd and add our flight
size to it.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

d4d6fb57

sctp: drop a_rwnd to 0 when receive buffer overflows. · 4d3c46e6

Vlad Yasevich authored Sep 04, 2009

SCTP has a problem that when small chunks are used, it is possible
to exhaust the receiver buffer without fully closing receive window.
This happens due to all overhead that we have account for with small
messages. To fix this, when receive buffer is exceeded, we'll drop
the window to 0 and save the 'drop' portion. When application starts
reading data and freeing up recevie buffer space, we'll wait until
we've reached the 'drop' window and then add back this 'drop' one
mtu at a time. This worked well in testing and under stress produced
rather even recovery.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

4d3c46e6

sctp: Clear fast_recovery on the transport when T3 timer expires. · 33ce8281

Vlad Yasevich authored Sep 04, 2009

If T3 timer expires, we are retransmitting data due to timeout any
any fast recovery is null and void.  We can clear the fast recovery
flag.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

33ce8281

sctp: Fix error count increments that were results of HEARTBEATS · b9f84786

Vlad Yasevich authored Aug 26, 2009

SCTP RFC 4960 states that unacknowledged HEARTBEATS count as
errors agains a given transport or endpoint.  As such, we
should increment the error counts for only for unacknowledged
HB, otherwise we detect failure too soon.  This goes for both
the overall error count and the path error count.

Now, there is a difference in how the detection is done
between the two.  The path error detection is done after
the increment, so to detect it properly, we actually need
to exceed the path threshold.  The overall error detection
is done _BEFORE_ the increment.  Thus to detect the failure,
it's enough for the error count to match the threshold.
This is why all the state functions use '>=' to detect failure,
while path detection uses '>'.

Thanks goes to Chunbo Luo <chunbo.luo@windriver.com> who first
proposed patches to fix this issue and made me re-read the spec
and the code to figure out how this cruft really works.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

b9f84786

sctp: use proc_create() · d71a09ed

Alexey Dobriyan authored Aug 23, 2009

create_proc_entry() is deprecated (not formally, though).
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

d71a09ed

sctp: fix check the chunk length of received HEARTBEAT-ACK chunk · dadb50cc

Wei Yongjun authored Aug 22, 2009

The receiver of the HEARTBEAT should respond with a HEARTBEAT ACK
that contains the Heartbeat Information field copied from the
received HEARTBEAT chunk. So the received HEARTBEAT-ACK chunk
must have a length of:
  sizeof(sctp_chunkhdr_t) + sizeof(sctp_sender_hb_info_t)

A badly formatted HB-ACK chunk, it is possible that we may access
invalid memory.  We should really make sure that the chunk format
is what we expect, before attempting to touch the data.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

dadb50cc

sctp: drop SHUTDOWN chunk if the TSN is less than the CTSN · a2f36eec

Wei Yongjun authored Aug 22, 2009

If Cumulative TSN Ack field of SHUTDOWN chunk is less than the
Cumulative TSN Ack Point then drop the SHUTDOWN chunk.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

a2f36eec

sctp: Send user messages to the lower layer as one · 9c5c62be

Vlad Yasevich authored Aug 10, 2009

Currenlty, sctp breaks up user messages into fragments and
sends each fragment to the lower layer by itself.  This means
that for each fragment we go all the way down the stack
and back up.  This also discourages bundling of multiple
fragments when they can fit into a sigle packet (ex: due
to user setting a low fragmentation threashold).

We introduce a new command SCTP_CMD_SND_MSG and hand the
whole message down state machine.  The state machine and
the side-effect parser will cork the queue, add all chunks
from the message to the queue, and then un-cork the queue
thus causing the chunks to get transmitted.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

9c5c62be

sctp: Try to encourage SACK bundling with DATA. · 5d7ff261

Vlad Yasevich authored Aug 07, 2009

If the association has a SACK timer pending and now DATA queued
to be send, we'll try to bundle the SACK with the next application send.
As such, try encourage bundling by accounting for SACK in the size
of the first chunk fragment.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

5d7ff261

sctp: Generate SACKs when actually sending outbound DATA · e83963b7

Vlad Yasevich authored Aug 07, 2009

We are now trying to bundle SACKs when we have outbound
DATA to send.  However, there are situations where this
outbound DATA will not be sent (due to congestion or 
available window).  In such cases it's ok to wait for the
timer to expire.  This patch refactors the sending code
so that betfore attempting to bundle the SACK we check
to see if the DATA will actually be transmitted.

Based on eirlier works for Doug Graham <dgraham@nortel.com> and
Wei Youngjun <yjwei@cn.fujitsu.com>.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

e83963b7

sctp: Fix data segmentation with small frag_size · 3e62abf9

Vlad Yasevich authored Sep 04, 2009

Since an application may specify the maximum SCTP fragment size
that all data should be fragmented to, we need to fix how
we do segmentation.   Right now, if a user specifies a small
fragment size, the segment size can go negative in the presence
of AUTH or COOKIE_ECHO bundling.

What we need to do is track the largest possbile DATA chunk that
can fit into the mtu.  Then if the fragment size specified is
bigger then this maximum length, we'll shrink it down.  Otherwise,
we just use the smaller segment size without changing it further.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

3e62abf9

sctp: Disallow new connection on a closing socket · bec9640b

Vlad Yasevich authored Jul 30, 2009

If a socket has a lot of association that are in the process of
of being closed/aborted, it is possible for a remote to establish
new associations during the time period that the old ones are shutting
down. If this was a result of a close() call, there will be no socket
and will cause a memory leak. We'll prevent this by setting the
socket state to CLOSING and disallow new associations when in this state.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

bec9640b

sctp: Fix piggybacked ACKs · af87b823

Doug Graham authored Jul 29, 2009

This patch corrects the conditions under which a SACK will be piggybacked
on a DATA packet. The previous condition was incorrect due to a
misinterpretation of RFC 4960 and/or RFC 2960. Specifically, the
following paragraph from section 6.2 had not been implemented correctly:

Before an endpoint transmits a DATA chunk, if any received DATA
chunks have not been acknowledged (e.g., due to delayed ack), the
sender should create a SACK and bundle it with the outbound DATA
chunk, as long as the size of the final SCTP packet does not exceed
the current MTU. See Section 6.2.

When about to send a DATA chunk, the code now checks to see if the SACK
timer is running. If it is, we know we have a SACK to send to the
peer, so we append the SACK (assuming available space in the packet)
and turn off the timer. For a simple request-response scenario, this
will result in the SACK being bundled with the response, meaning the
the SACK is received quickly by the client, and also meaning that no
separate SACK packet needs to be sent by the server to acknowledge the
request. Prior to this patch, a separate SACK packet would have been
sent by the server SCTP only after its delayed-ACK timer had expired
(usually 200ms). This is wasteful of bandwidth, and can also have a
major negative impact on performance due the interaction of delayed ACKs
with the Nagle algorithm.
Signed-off-by: Doug Graham <dgraham@nortel.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

af87b823

sctp: remove unused union (sctp_cmsg_data_t) definition · b4e8c6a7

Rami Rosen authored Jul 30, 2009

This patch removes an unused union definition (sctp_cmsg_data_t)
from include/net/sctp/user.h.
Signed-off-by: Rami Rosen <rosenrami@gmail.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

b4e8c6a7

sctp: release cached route when the transport goes down. · 40187886

Vlad Yasevich authored Jun 23, 2009

When the sctp transport is marked down, we can release the
cached route and force a new lookup when attempting to use
this transport for anything.  This way, if a better route
or source address is available, we'll try to use it.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

40187886

sctp: update the route for non-active transports after addresses are added · 3cd9749c

Wei Yongjun authored Jun 16, 2009

Update the route and saddr entries for the non-active transports as some
of the added addresses can be used as better source addresses, or may
be there is a better route.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

3cd9749c

sctp: check the unrecognized ASCONF parameter before access it · 44e65c1e

Wei Yongjun authored Jun 16, 2009

This patch fix to check the unrecognized ASCONF parameter before
access it.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

44e65c1e

sctp: avoid overwrite the return value of sctp_process_asconf_ack() · 425e0f68

Wei Yongjun authored Jun 16, 2009

The return value of sctp_process_asconf_ack() may be
overwritten while process parameters with no error.
This patch fixed the problem.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

425e0f68

net: Fix a build break because of a typo in drivers/net/3c503.c · 8a34e2f8
Sachin Sant authored Sep 04, 2009
```
Signed-off-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
```
8a34e2f8

can: sja1000: legacy SJA1000 ISA bus driver · 2a6ba39a

Wolfgang Grandegger authored Sep 01, 2009

This patch adds support for legacy SJA1000 CAN controllers on the ISA
or PC-104 bus. The I/O port or memory address and the IRQ number must
be specified via module parameters:

  insmod sja1000_isa.ko port=0x310,0x380 irq=7,11

for ISA devices using I/O ports or:

  insmod sja1000_isa.ko mem=0xd1000,0xd1000 irq=7,11

for memory mapped ISA devices.

Indirect access via address and data port is supported as well:

  insmod sja1000_isa.ko port=0x310,0x380 indirect=1 irq=7,11

Here is a full list of the supported module parameters:

  port:I/O port number (array of ulong)
  mem:I/O memory address (array of ulong)
  indirect:Indirect access via address and data port (array of byte)
  irq:IRQ number (array of int)
  clk:External oscillator clock frequency (default=16000000 [16 MHz])
      (array of int)
  cdr:Clock divider register (default=0x48 [CDR_CBP | CDR_CLK_OFF])
      (array of byte)
  ocr:Output clock register (default=0x18 [OCR_TX0_PUSHPULL])
      (array of byte)

Note: for clk, cdr, ocr, the first argument re-defines the default
for all other devices, e.g.:

 insmod sja1000_isa.ko mem=0xd1000,0xd1000 irq=7,11 clk=24000000

is equivalent to

 insmod sja1000_isa.ko mem=0xd1000,0xd1000 irq=7,11 \
                       clk=24000000,24000000
Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Tested-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

2a6ba39a

can: sja1000: fix network statistics update · 8935f57e

Wolfgang Grandegger authored Sep 01, 2009

The member "tx_bytes" of "struct net_device_stats" should be
incremented when the interrupt is done and an "arbitration
lost error" is a TX error and the statistics should be updated
accordingly.
Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8935f57e

can: add can_free_echo_skb() for upcoming drivers · 39e3ab6f

Wolfgang Grandegger authored Sep 01, 2009

This patch adds the function can_free_echo_skb to the CAN
device interface to allow upcoming drivers to release echo
skb's in case of error.
Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

39e3ab6f