Commits · af3b867e2f6b72422bc7aacb1f1e26f47a9649bc · Kirill Smelkov / linux

28 Jan, 2008 40 commits

[DCCP]: Support inserting options during the 3-way handshake · af3b867e

Gerrit Renker authored Dec 13, 2007

This provides a separate routine to insert options during the initial handshake.
The main purpose is to conduct feature negotiation; for the moment, the only user
is the timestamp echo needed for the (CCID3) handshake RTT sample.

Padding of options has been put into a small separate routine, to be shared between
the two functions. This could also be used as a generic routine to finish inserting
options.

Also removed an `XXX' comment since its content was obvious.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
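
The padding step can be pictured with a minimal userspace sketch. It assumes the RFC 4340 rule that the option area must end on a 32-bit boundary and that padding bytes are zero (the Padding option, type 0). The name pad_options and its buffer-based interface are hypothetical; the kernel routine works on an sk_buff instead.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not the kernel routine): append zero bytes to an
 * option buffer until its length is a multiple of 4.
 * Returns 0 on success, -1 if the buffer capacity is too small.
 */
int pad_options(unsigned char *buf, size_t *opt_len, size_t cap)
{
    size_t rem = *opt_len % 4;
    size_t pad = rem ? 4 - rem : 0;

    if (*opt_len + pad > cap)
        return -1;

    memset(buf + *opt_len, 0, pad);   /* zero bytes act as Padding options */
    *opt_len += pad;
    return 0;
}
```

Keeping the padding in one helper means every path that finishes an option list can call it last, whichever routine inserted the options.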


[DCCP]: Handle timestamps on Request/Response exchange separately · b4d4f7c7

Gerrit Renker authored Dec 13, 2007

In DCCP, timestamps can occur on packets at any time; CCID3 uses a timestamp (and its echo) on the Request/Response
exchange. This patch addresses the following situation:
	* timestamps are recorded on the listening socket;
	* Responses are sent from dccp_request_sockets;
	* suppose two connection requests reach the listening socket within a very short time of each other;
	* then the first timestamp value gets overwritten by the second connection request.

To avoid this, the patch separates timestamps into
 * those which are received by the server during the initial handshake (on dccp_request_sock);
 * those which are received by the client or the server after connection establishment.

As before, a timestamp of 0 is regarded as indicating that no (meaningful) timestamp has been
received (in addition, a warning message is printed if hosts send 0-valued timestamps).

The timestamp-echoing now works as follows:
 * when a timestamp is present on the initial Request, it is placed into dreq, due to the
   call to dccp_parse_options in dccp_v{4,6}_conn_request;
 * when a timestamp is present on the Ack leading from RESPOND => OPEN, it is copied over
   from the request_sock into the child socket in dccp_create_openreq_child;
 * timestamps received on an (established) dccp_sock are treated as before.

Since Elapsed Time is measured in hundredths of milliseconds (RFC 4340, 13.2), the new dccp_timestamp()
function is used, as it is expected that the time between receiving the timestamp and
sending the timestamp echo will be very small compared with the wrap-around time. As a byproduct,
this allows smaller timestamping-time fields.

Furthermore, inserting the Timestamp Echo option has been taken out of the block starting with
'!dccp_packet_without_ack()', since Timestamp Echo can be carried on any packet (RFC 4340, 5.8 and 13.3).
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
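
The overwrite problem and its fix can be sketched in a few lines of userspace C. All struct and field names here are hypothetical stand-ins, not the kernel's; the only rule taken from the text is that a timestamp of 0 means "nothing received".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection-attempt state (stand-in for a request sock). */
struct request_state {
    uint32_t timestamp_echo;   /* 0 means no (meaningful) timestamp seen */
};

/* Store the timestamp on the request itself, not on the shared listener. */
void record_request_timestamp(struct request_state *req, uint32_t tsval)
{
    req->timestamp_echo = tsval;
}

/* A Timestamp Echo is only worth sending if a non-zero value was recorded. */
bool has_timestamp(const struct request_state *req)
{
    return req->timestamp_echo != 0;
}
```

With one slot per request, two Requests arriving back to back each keep their own value, so each Response can echo the timestamp that belongs to it.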


[DCCP]: Add (missing) option parsing to request_sock processing · 8109616e

Gerrit Renker authored Dec 13, 2007

This adds option-parsing code to processing of Acks in the listening state
on request_socks on the server, serving two purposes:
 (i)  resolves a FIXME (removed);
 (ii) paves the way for feature-negotiation during connection-setup.

There is an intended subtlety here with regard to dccp_check_req:

 Parsing options happens only after testing whether the received packet is
 a retransmitted Request.  Otherwise, if the Request contained (a possibly
 large number of) feature-negotiation options, recomputing state would have to
 happen each time a retransmitted Request arrives, which opens the door to an
 easy DoS attack.  Since in a genuine retransmission the options should not be
 different from the original, reusing the already computed state seems better.

 The other point is that timestamp options on a retransmitted Request will not
 be answered. This means that in the presence of retransmission (likely
 due to loss and/or other problems), the use of Request/Response RTT sampling
 is suspended, so that startup problems here do not propagate.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
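
The ordering described above (test for a retransmitted Request first, parse options only afterwards) can be illustrated with a small sketch. The types, the parse counter and the verdict names are hypothetical; they only model the control flow, not dccp_check_req itself.

```c
#include <stdbool.h>

enum pkt_type { PKT_REQUEST, PKT_ACK };
enum verdict  { RESEND_RESPONSE, ACCEPT, DROP };

struct packet       { enum pkt_type type; bool options_valid; };
struct pending_conn { int parse_calls; };   /* counts the expensive work */

/* Stand-in for option parsing, which may be costly with many options. */
static bool parse_options(struct pending_conn *pc, const struct packet *p)
{
    pc->parse_calls++;
    return p->options_valid;
}

enum verdict check_req(struct pending_conn *pc, const struct packet *p)
{
    /* Retransmitted Request: reuse the existing state, skip parsing. */
    if (p->type == PKT_REQUEST)
        return RESEND_RESPONSE;

    /* Only now pay for option parsing. */
    if (!parse_options(pc, p))
        return DROP;

    return ACCEPT;
}
```

However many times a Request is replayed, parse_calls stays at zero, which is the property that closes the DoS avenue mentioned in the text.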


[DCCP]: Allow to parse options on Request Sockets · 8b819412

Gerrit Renker authored Dec 13, 2007

The option parsing code currently only parses on full sk's. This causes a problem for
options sent during the initial handshake (in particular timestamps and feature-negotiation
options). Therefore, this patch extends the option parsing code with an additional argument
for request_socks: if it is non-NULL, options are parsed on the request socket, otherwise
the normal path (parsing on the sk) is used.

Subsequent patches, which implement feature negotiation during connection setup, make use
of this facility.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
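
The "extra argument" pattern can be sketched as follows. The struct names and the single timestamp field are hypothetical; the point is only the dispatch on a possibly-NULL request pointer.

```c
#include <stddef.h>
#include <stdint.h>

struct full_sock { uint32_t timestamp; };   /* established connection  */
struct req_sock  { uint32_t timestamp; };   /* handshake still pending */

/*
 * Hypothetical parser entry point: if req is non-NULL the parsed value is
 * stored on the request socket, otherwise on the full socket.
 */
void parse_timestamp_option(struct full_sock *sk, struct req_sock *req,
                            uint32_t value)
{
    if (req != NULL)
        req->timestamp = value;
    else
        sk->timestamp = value;
}
```

Callers on the established path pass NULL for req and behave as before, so the change is confined to the handshake code.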


[DCCP]: Collapse repeated `len' statements into one · 79133506

Gerrit Renker authored Dec 13, 2007

This replaces 4 individual assignments for `len' with a single
one, placed where the control flow of those 4 leads to.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


[DCCP]: Support for server holding timewait state · b8599d20

Gerrit Renker authored Dec 13, 2007

This adds a socket option and signalling support for the case where the server
holds timewait state on closing the connection, as described in RFC 4340, 8.3.

Since holding timewait state at the server is the unusual case, it is enabled
via a socket option. Documentation for this socket option has been added.

The setsockopt statement has been made robust against the different ways of
expressing a boolean `true' value, following a suggestion by Ian McDonald.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
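
Why a boolean socket option needs normalising can be shown with a one-bit flag. This is a sketch with hypothetical names, not the patch's code: assigning a user value such as 2 directly to an unsigned 1-bit field would keep only the low bit and store 0.

```c
/* Hypothetical socket flags; the field is a single bit wide. */
struct sock_flags {
    unsigned int server_timewait : 1;
};

/* Treat any non-zero user value as true before storing it. */
void set_server_timewait(struct sock_flags *f, int val)
{
    f->server_timewait = (val != 0);
}
```

With the comparison in place, 1, 2 and -1 all enable the option and only 0 disables it.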


[DCCP]: Use maximum-RTO backoff from DCCP spec · 28be5440

Gerrit Renker authored Dec 13, 2007

This removes another FIXME: the code was using the TCP maximum RTO rather than
the value specified by the DCCP specification. Across the sections of RFC 4340,
64 seconds is consistently suggested as the maximum RTO backoff value, and this
is the value which is now used.

I have checked both termination cases for retransmissions of Close/CloseReq:
with the default value 15 of `retries2', and an initial icsk_retransmit = 0,
it takes about 614 seconds to declare a non-responding peer as dead, after
which the final terminating Reset is sent. With the TCP maximum RTO value of
120 seconds it takes (as might be expected) almost twice as long, about 23
minutes.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
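
The quoted totals can be checked with a few lines of arithmetic. The sketch below sums a doubling timeout capped at a maximum. The inputs are assumptions chosen because they reproduce the figures in the message, not values read from the patch: 16 timer intervals, a 0.4 s initial timeout with the 64 s cap, and a 3 s initial timeout with the 120 s cap.

```c
#include <stdint.h>

/*
 * Sum `intervals` retransmission timeouts, starting at initial_ms and
 * doubling after each one, with the doubled value capped at max_ms.
 */
uint64_t total_backoff_ms(uint64_t initial_ms, uint64_t max_ms,
                          unsigned int intervals)
{
    uint64_t rto = initial_ms;
    uint64_t total = 0;

    for (unsigned int i = 0; i < intervals; i++) {
        total += rto;
        rto = (rto * 2 > max_ms) ? max_ms : rto * 2;
    }
    return total;
}
```

Under those assumptions, total_backoff_ms(400, 64000, 16) gives 614000 ms (about 614 seconds) and total_backoff_ms(3000, 120000, 16) gives 1389000 ms (about 23 minutes).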


[DCCP]: Shift the retransmit timer for active-close into output.c · 92d31920

Gerrit Renker authored Dec 13, 2007

When performing active close, RFC 4340, 8.3 requires retransmitting the
Close/CloseReq with a backoff-retransmit timer that initially starts at 2 RTTs.

This patch shifts the existing code for active-close retransmit timer
into output.c, so that the retransmit timer is started when the first
Close/CloseReq is sent. Previously, the timer was started when, after
releasing the socket in dccp_close(), the actively-closing side had not yet
reached the CLOSED/TIMEWAIT state.

The patch further reduces the initial timeout from 3 seconds to the required
2 RTTs, where - in the absence of a known RTT - the fallback value specified in
RFC 4340, 3.4 is used.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
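
The initial timeout rule can be written down as a small function. This is a sketch: the function and macro names are hypothetical, and the 0.2 s fallback RTT is an assumption about the value RFC 4340, 3.4 suggests.

```c
#include <stdint.h>

/* Assumed fallback round-trip time: 0.2 s, in microseconds. */
#define FALLBACK_RTT_US 200000u

/*
 * Initial Close/CloseReq retransmit timeout: 2 RTTs, using the fallback
 * RTT when no sample is available (rtt_us == 0).
 */
uint32_t close_timeout_us(uint32_t rtt_us)
{
    uint32_t rtt = rtt_us ? rtt_us : FALLBACK_RTT_US;

    return 2 * rtt;
}
```

With no RTT sample this yields 0.4 s, compared with the 3 seconds used before.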


[IPV6]: fix section mismatch warnings · 09f7709f

Daniel Lezcano authored Dec 13, 2007

Removed the useless and buggy __exit section annotations in the different
ipv6 subsystems. Otherwise, the annotated functions would be called from
within an init section when rolling back after an error in the
protocol initialization.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


[DCCP]: Perform SHUT_RD and SHUT_WR on receiving close · 69567d0b

Gerrit Renker authored Dec 13, 2007

This patch performs two changes:

1) Close the write-end in addition to the read-end when a fin-like segment
  (Close or CloseReq) is received by DCCP. This accounts for the fact that DCCP,
  in contrast to TCP, does not have a half-close. RFC 4340 says in this respect
  that when a fin-like segment has been sent there is no guarantee at all that
  any further data will be processed.
  Thus this patch performs SHUT_WR in addition to the SHUT_RD when a fin-like
  segment is encountered.

2) Minor change: I noted that code appears twice in different places and think it
   makes sense to put this into a self-contained function (dccp_enqueue()).
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
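
The effect of change 1) can be modelled with a shutdown bitmask. The struct and function names are hypothetical; the two flag values mirror the kernel's RCV_SHUTDOWN and SEND_SHUTDOWN bits, which is an assumption as far as this patch is concerned.

```c
#define RCV_SHUTDOWN  1   /* no more data will be read    */
#define SEND_SHUTDOWN 2   /* no more data will be written */

struct conn { unsigned int shutdown; };

/* On Close/CloseReq, shut down both directions: DCCP has no half-close. */
void on_close_received(struct conn *c)
{
    c->shutdown |= RCV_SHUTDOWN | SEND_SHUTDOWN;
}

int can_send(const struct conn *c)
{
    return !(c->shutdown & SEND_SHUTDOWN);
}
```

A TCP-style handler would set only the receive bit here, leaving the write side open for a half-close that DCCP does not offer.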


[DECNET]: Fix inverted wait flag in xfrm_lookup call · 96eba69d

Herbert Xu authored Dec 13, 2007

My previous patch made the wait flag take the opposite value to what
it should be.  This patch fixes that.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


[NET]: Check RTNL status in unregister_netdevice · a6620712

Herbert Xu authored Dec 12, 2007

The caller must hold the RTNL so let's check it in unregister_netdevice.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


[IPSEC]: Do not let packets pass when ICMP flag is off · aebcf82c

Herbert Xu authored Dec 12, 2007

This fixes a logical error in ICMP policy checks which lets
packets through if the state ICMP flag is off.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


[IPSEC]: Make callers of xfrm_lookup to use XFRM_LOOKUP_WAIT · bb72845e

Herbert Xu authored Dec 12, 2007

This patch converts all callers of xfrm_lookup that used an
explicit value of 1 to indicate blocking to use the new flag
XFRM_LOOKUP_WAIT.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


[IPSEC]: Fix reversed ICMP6 policy check · 7233b9f3

Herbert Xu authored Dec 12, 2007

The policy check I added for ICMP on IPv6 is reversed.  This
patch fixes that.

It also adds an skb->sp check so that unprotected packets that
fail the policy check do not crash the machine.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


[BNX2]: Fix compiler warning. · 2ba582b7

Michael Chan authored Dec 21, 2007

Change bnx2_init_napi() to void.

Warning was noted by DaveM.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


[BNX2]: Update version to 1.7.1. · f13561cb

Michael Chan authored Dec 20, 2007

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


[BNX2]: Enable new tx ring. · 57851d84

Michael Chan authored Dec 20, 2007

Enable new tx ring and add new MSIX handler and NAPI poll function
for the new tx ring.  Enable MSIX when the hardware supports it.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


[BNX2]: Add support for a new tx ring. · c76c0475

Michael Chan authored Dec 20, 2007

To separate TX IRQs into a different MSIX vector, we need to
support a new tx ring.  The original tx ring will still be used
when not using MSIX.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


[BNX2]: Support multiple MSIX IRQs. · b4b36042

Michael Chan authored Dec 20, 2007

Change bnx2_napi struct into an array and add code to manage multiple
IRQs.  MSIX hardware structures and new registers are also added.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


[BNX2]: Move rx indexes into bnx2_napi struct. · a1f60190

Michael Chan authored Dec 20, 2007

Rx related fields used in NAPI polling are moved from the main
bnx2 struct to the bnx2_napi struct.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


[BNX2]: Move tx indexes into bnx2_napi struct. · a550c99b

Michael Chan authored Dec 20, 2007

Tx related fields used in NAPI polling are moved from the main
bnx2 struct to the bnx2_napi struct.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


[BNX2]: Introduce new bnx2_napi structure. · 35efa7c1

Michael Chan authored Dec 20, 2007

Introduce a bnx2_napi structure that will hold a napi_struct and
other fields to handle NAPI polling for the napi_struct.  Various tx
and rx indexes and status block pointers will be moved from the main
bnx2 structure to this bnx2_napi structure.

Most NAPI path functions are modified to be passed this bnx2_napi
struct pointer.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


[BNX2]: Restructure IRQ datastructures. · 6d866ffc

Michael Chan authored Dec 20, 2007

Add a table to keep track of multiple IRQs and restructure the IRQ
request and free functions so that they can be easily expanded to
handle multiple IRQs.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


[BNX2]: Add function to fetch hardware tx index. · ead7270b

Michael Chan authored Dec 20, 2007

This makes the code cleaner and easier to support different tx rings.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


[BNX2]: Update version to 1.6.9. · a0d142c6

Michael Chan authored Dec 12, 2007

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


[BNX2]: Enable S/G for jumbo RX. · 84eaa187

Michael Chan authored Dec 12, 2007

If the MTU requires more than 1 page for the SKB, enable the page ring
and calculate the size of the page ring.  This will guarantee order-0
allocation regardless of the MTU size.

Fix up the loopback test packet size so that we don't deal with the pages
during the loopback test.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
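
The sizing decision can be illustrated with a simplified calculation. This is a sketch under stated assumptions, not the driver's formula: the function name is hypothetical, buffer overhead is ignored, and a frame is taken to need the page ring only when it does not fit in one page.

```c
#include <stddef.h>

/*
 * Hypothetical sizing helper: number of order-0 pages needed per packet
 * for the part of a frame that does not fit in the linear buffer.
 * Returns 0 when the whole frame fits in a single page (no page ring).
 */
size_t pages_per_packet(size_t frame_len, size_t linear_len, size_t page_size)
{
    if (frame_len <= page_size)
        return 0;
    if (linear_len > frame_len)
        linear_len = frame_len;

    /* Round up to whole pages. */
    return (frame_len - linear_len + page_size - 1) / page_size;
}
```

For example, with 4096-byte pages and a 128-byte linear part, a 1500-byte frame needs no pages while a 9000-byte jumbo frame needs 3, each of them an order-0 allocation.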


[BNX2]: Add fast path code to handle RX pages. · 1db82f2a

Michael Chan authored Dec 12, 2007

Add function to reuse a page in case of allocation or other errors.
Add code to construct the completed SKB with the additional data in
the pages.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


[BNX2]: Add init. code to handle RX pages. · 47bf4246

Michael Chan authored Dec 12, 2007

Add new fields to keep track of the pages and the page rings.
Add functions to allocate and free pages.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


[BNX2]: Update firmware to support S/G RX buffers. · 110d0ef9

Michael Chan authored Dec 12, 2007

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


[BNX2]: Restructure RX ring init. code. · 5d5d0015

Michael Chan authored Dec 12, 2007

Factor out the common functions that will be used to initialize the
normal RX rings and the page rings.

Change the copybreak constant RX_COPY_THRESH to 128.  This same
constant will be used for the max. size of the linear SKB when pages
are used.  Copybreak will be turned off when pages are used.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


[BNX2]: Restructure RX fast path handling. · 85833c62

Michael Chan authored Dec 12, 2007

Add a new function to handle new SKB allocation and to prepare the
completed SKB.  This makes it easier to add support for non-linear
SKB.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


[BNX2]: Add ring constants. · e343d55c

Michael Chan authored Dec 12, 2007

Define the various ring constants to make the code cleaner.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


[NET]: fix drivers/net/ns83820.c build · f5f97b57

Andrew Morton authored Dec 12, 2007

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


[IPIP]: Allow rebinding the tunnel to another interface · 5533995b

Michal Schmidt authored Dec 12, 2007

Once created, an IP tunnel can't be bound to another device.
(reported as https://bugzilla.redhat.com/show_bug.cgi?id=419671)

To reproduce:

# create a tunnel:
ip tunnel add tunneltest0 mode ipip remote 10.0.0.1 dev eth0
# try to change the bound device from eth0 to eth1:
ip tunnel change tunneltest0 dev eth1
# show the result:
ip tunnel show tunneltest0

tunneltest0: ip/ip  remote 10.0.0.1  local any  dev eth0  ttl inherit

Notice the bound device has not changed from eth0 to eth1.

This patch fixes it. When changing the binding, it also recalculates the
MTU according to the new bound device's MTU.

If the change is acceptable, I'll do the same for GRE and SIT tunnels.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
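
The rebinding and MTU recalculation can be modelled in userspace as follows. The structs and the function are hypothetical, and the formula (underlying MTU minus one 20-byte IPv4 header) is an assumption about how the tunnel MTU is derived.

```c
#define IPV4_HDR_LEN 20   /* outer IPv4 header without options */

struct netdev { const char *name; int mtu; };
struct tunnel { const struct netdev *link; int mtu; };

/* Bind (or rebind) the tunnel and derive its MTU from the new device. */
void tunnel_bind(struct tunnel *t, const struct netdev *dev)
{
    t->link = dev;
    t->mtu = dev->mtu - IPV4_HDR_LEN;
}
```

Rebinding from a 1500-byte eth0 to a 9000-byte eth1 would move the tunnel MTU from 1480 to 8980 in this model; without the recalculation the tunnel would keep the MTU of the old device.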


[NET]: Remove unused define from loopback driver. · 6a7657f5

Pavel Emelyanov authored Dec 12, 2007

The LOOPBACK_OVERHEAD is not used in this file at all.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


[NETNS]: network namespace was passed into dev_getbyhwaddr but not used · 81103a52

Denis V. Lunev authored Dec 12, 2007

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


[NET]: Remove FASTCALL macro · 41380930

Harvey Harrison authored Dec 12, 2007

X86_32 was the last user of the FASTCALL macro. Now that it
uses regparm(3) by default, this macro expands to nothing.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


[IPSEC]: Add ICMP host relookup support · 8b7817f3

Herbert Xu authored Dec 12, 2007

RFC 4301 requires us to relookup ICMP traffic that does not match any
policies using the reverse of its payload. This patch implements this
for ICMP traffic that originates from or terminates on localhost.

This is activated on outbound with the new policy flag XFRM_POLICY_ICMP,
and on inbound by the new state flag XFRM_STATE_ICMP.

On inbound the policy check is now performed by the ICMP protocol so
that it can repeat the policy check where necessary.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


[IPSEC]: Added xfrm_decode_session_reverse and xfrmX_policy_check_reverse · d5422efe

Herbert Xu authored Dec 12, 2007

RFC 4301 requires us to relookup ICMP traffic that does not match any
policies using the reverse of its payload.  This patch adds the functions
xfrm_decode_session_reverse and xfrmX_policy_check_reverse so we can get
the reverse flow to perform such a lookup.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
