Repository: Kirill Smelkov / linux

Commit b4f94e3f, authored Dec 22, 2002 by David S. Miller

Merge ssh://nuts.ninka.net//home/davem/src/BK/net-2.5
into kernel.bkbits.net:/home/davem/net-2.5

Parents: 52a150d8, 545ac82c

Showing 32 changed files with 2212 additions and 864 deletions
Changed files (additions, deletions):

Documentation/networking/bonding.txt    +377 -133
Documentation/networking/ifenslave.c    +90 -3
MAINTAINERS                             +6 -0
drivers/net/Kconfig                     +5 -3
drivers/net/bonding.c                   +982 -215
drivers/net/pppoe.c                     +24 -21
drivers/net/pppox.c                     +9 -9
include/linux/if_bonding.h              +7 -3
include/linux/sysctl.h                  +2 -1
include/net/dst.h                       +2 -0
include/net/sctp/sctp.h                 +2 -2
include/net/sctp/sm.h                   +10 -3
include/net/sctp/structs.h              +21 -8
net/core/netfilter.c                    +9 -1
net/core/pktgen.c                       +4 -4
net/ipv4/route.c                        +44 -36
net/ipv4/tcp_ipv4.c                     +1 -0
net/ipv6/ndisc.c                        +1 -0
net/key/af_key.c                        +1 -1
net/sctp/associola.c                    +6 -3
net/sctp/bind_addr.c                    +2 -2
net/sctp/input.c                        +6 -3
net/sctp/ipv6.c                         +110 -78
net/sctp/protocol.c                     +106 -44
net/sctp/sm_make_chunk.c                +50 -10
net/sctp/sm_sideeffect.c                +22 -9
net/sctp/sm_statefuns.c                 +137 -57
net/sctp/sm_statetable.c                +1 -1
net/sctp/socket.c                       +135 -211
net/sctp/sysctl.c                       +5 -0
net/sctp/transport.c                    +1 -1
net/sctp/ulpevent.c                     +34 -2
Documentation/networking/bonding.txt
...
...
@@ -17,6 +17,23 @@ extreme-linux and beowulf sites will not work with this version of the driver.
For new versions of the driver, patches for older kernels and the updated
userspace tools, please follow the links at the end of this file.
Table of Contents
=================
Installation
Bond Configuration
Module Parameters
Configuring Multiple Bonds
Switch Configuration
Verifying Bond Configuration
Frequently Asked Questions
High Availability
Promiscuous Sniffing notes
Limitations
Resources and Links
Installation
============
...
...
@@ -51,16 +68,21 @@ To install ifenslave.c, do:
# gcc -Wall -Wstrict-prototypes -O -I/usr/src/linux/include ifenslave.c -o ifenslave
# cp ifenslave /sbin/ifenslave
3) Configure your system
------------------------
Also see the following section on the module parameters. You will need to add
at least the following line to /etc/conf.modules (or /etc/modules.conf):
Bond Configuration
==================
You will need to add at least the following line to /etc/modules.conf
so the bonding driver will automatically load when the bond0 interface is
configured. Refer to the modules.conf manual page for specific modules.conf
syntax details. The Module Parameters section of this document describes each
bonding driver parameter.
alias bond0 bonding
Use standard distribution techniques to define bond0 network interface. For
example, on modern RedHat distributions, create ifcfg-bond0 file in
/etc/sysconfig/network-scripts directory that looks like this:
Use standard distribution techniques to define the bond0 network interface. For
example, on modern Red Hat distributions, create an ifcfg-bond0 file in
the /etc/sysconfig/network-scripts directory that resembles the following:
DEVICE=bond0
IPADDR=192.168.1.1
...
...
@@ -71,12 +93,12 @@ ONBOOT=yes
BOOTPROTO=none
USERCTL=no
(put the appropriate values for you network instead of 192.168.1).
(use appropriate values for your network above)
All interfaces that are part of the trunk, should have SLAVE and MASTER
definitions. For example, in the case of RedHat, if you wish to make eth0 and
eth1 (or other interfaces) a part of the bonding interface bond0, their config
files (ifcfg-eth0, ifcfg-eth1, etc.) should look like this:
All interfaces that are part of a bond should have SLAVE and MASTER
definitions. For example, in the case of Red Hat, if you wish to make eth0 and
eth1 a part of the bonding interface bond0, their config files (ifcfg-eth0 and
ifcfg-eth1) should resemble the following:
DEVICE=eth0
USERCTL=no
...
...
@@ -85,78 +107,193 @@ MASTER=bond0
SLAVE=yes
BOOTPROTO=none
(use DEVICE=eth1 for eth1 and MASTER=bond1 for bond1 if you have configured
second bonding interface).
Use DEVICE=eth1 in the ifcfg-eth1 config file. If you configure a second bonding
interface (bond1), use MASTER=bond1 in the config file to make the network
interface be a slave of bond1.
Restart the networking subsystem or just bring up the bonding device if your
administration tools allow it. Otherwise, reboot. (For the case of RedHat
distros, you can do `ifup bond0' or `/etc/rc.d/init.d/network restart'.)
administration tools allow it. Otherwise, reboot. On Red Hat distros you can
issue `ifup bond0' or `/etc/rc.d/init.d/network restart'.
If the administration tools of your distribution do not support master/slave
notation in configuration of network interfaces, you will need to configure
the bonding device with the following commands manually:
notation in configuring network interfaces, you will need to manually configure
the bonding device with the following commands:
# /sbin/ifconfig bond0 192.168.1.1 netmask 255.255.255.0 \
broadcast 192.168.1.255 up
# /sbin/ifconfig bond0 192.168.1.1 up
# /sbin/ifenslave bond0 eth0
# /sbin/ifenslave bond0 eth1
(substitute 192.168.1.1 with your IP address and add custom network and custom
netmask to the arguments of ifconfig if required).
(use appropriate values for your network above)
You can then create a script with these commands and put it into the appropriate
rc directory.
You can then create a script containing these commands and place it in the
appropriate rc directory.
If you specifically need that all your network drivers are loaded before the
bonding driver, use one of modutils' powerful features : in your modules.conf,
tell that when asked for bond0, modprobe should first load all your interfaces :
If you specifically need all network drivers loaded before the bonding driver,
adding the following line to modules.conf will cause the network driver for
eth0 and eth1 to be loaded before the bonding driver.
probeall bond0 eth0 eth1 bonding
Be careful not to reference bond0 itself at the end of the line, or modprobe will
die in an endless recursive loop.
Be careful not to reference bond0 itself at the end of the line, or modprobe
will die in an endless recursive loop.
To have device characteristics (such as MTU size) propagate to slave devices,
set the bond characteristics before enslaving the device. The characteristics
are propagated during the enslave process.
If running SNMP agents, the bonding driver should be loaded before any network
drivers participating in a bond. This requirement is due to the interface
index (ipAdEntIfIndex) being associated to the first interface found with a
given IP address. That is, there is only one ipAdEntIfIndex for each IP
address. For example, if eth0 and eth1 are slaves of bond0 and the driver for
eth0 is loaded before the bonding driver, the interface for the IP address
will be associated with the eth0 interface. This configuration is shown below,
the IP address 192.168.1.1 has an interface index of 2 which indexes to eth0
in the ifDescr table (ifDescr.2).
interfaces.ifTable.ifEntry.ifDescr.1 = lo
interfaces.ifTable.ifEntry.ifDescr.2 = eth0
interfaces.ifTable.ifEntry.ifDescr.3 = eth1
interfaces.ifTable.ifEntry.ifDescr.4 = eth2
interfaces.ifTable.ifEntry.ifDescr.5 = eth3
interfaces.ifTable.ifEntry.ifDescr.6 = bond0
ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.10.10.10 = 5
ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.192.168.1.1 = 2
ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.74.20.94 = 4
ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1
This problem is avoided by loading the bonding driver before any network
drivers participating in a bond. Below is an example of loading the bonding
driver first, the IP address 192.168.1.1 is correctly associated with ifDescr.2.
interfaces.ifTable.ifEntry.ifDescr.1 = lo
interfaces.ifTable.ifEntry.ifDescr.2 = bond0
interfaces.ifTable.ifEntry.ifDescr.3 = eth0
interfaces.ifTable.ifEntry.ifDescr.4 = eth1
interfaces.ifTable.ifEntry.ifDescr.5 = eth2
interfaces.ifTable.ifEntry.ifDescr.6 = eth3
ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.10.10.10 = 6
ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.192.168.1.1 = 2
ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.74.20.94 = 5
ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1
While some distributions may not report the interface name in ifDescr,
the association between the IP address and IfIndex remains and SNMP
functions such as Interface_Scan_Next will report that association.
Module Parameters
=================
Optional parameters for the bonding driver can be supplied as command line
arguments to the insmod command. Typically, these parameters are specified in
the file /etc/modules.conf (see the manual page for modules.conf). The
available bonding driver parameters are listed below. If a parameter is not
specified the default value is used. When initially configuring a bond, it
is recommended "tail -f /var/log/messages" be run in a separate window to
watch for bonding driver error messages.
It is critical that either the miimon or arp_interval and arp_ip_target
parameters be specified, otherwise serious network degradation will occur
during link failures.
mode
Specifies one of four bonding policies. The default is round-robin.
Possible values are:
0 Round-robin policy: Transmit in a sequential order from the
first available slave through the last. This mode provides
load balancing and fault tolerance.
1 Active-backup policy: Only one slave in the bond is active. A
different slave becomes active if, and only if, the active slave
fails. The bond's MAC address is externally visible on only
one port (network adapter) to avoid confusing the switch.
This mode provides fault tolerance.
4) Module parameters.
---------------------
The following module parameters can be passed:
2 XOR policy: Transmit based on [(source MAC address XOR'd with
destination MAC address) modulo slave count]. This selects the
same slave for each destination MAC address. This mode provides
load balancing and fault tolerance.
mode=
3 Broadcast policy: transmits everything on all slave interfaces.
This mode provides fault tolerance.
Possible values are 0 (round robin policy, default) and 1 (active backup
policy), and 2 (XOR). See question 9 and the HA section for additional info.
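The XOR policy described above can be sketched in a few lines of C. This is only an illustration, not the driver's code: the helper name xor_select_slave is hypothetical, and hashing just the last octet of each MAC address is an assumption made to keep the example small.

```c
#include <assert.h>
#include <stdint.h>

#define MAC_LEN 6	/* bytes in an Ethernet MAC address */

/*
 * Hypothetical helper: return the index of the slave that would carry a
 * frame under the XOR policy. Assumption: only the last octet of the
 * source and destination MAC addresses is hashed.
 */
static unsigned int xor_select_slave(const uint8_t src[MAC_LEN],
				     const uint8_t dst[MAC_LEN],
				     unsigned int slave_count)
{
	assert(slave_count > 0);	/* a bond with no slaves cannot transmit */
	return (unsigned int)(src[MAC_LEN - 1] ^ dst[MAC_LEN - 1]) % slave_count;
}
```

Because the result depends only on the two addresses and the slave count, every frame to a given destination leaves through the same slave for as long as the slave count does not change.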
miimon
miimon=
Specifies the frequency in milli-seconds that MII link monitoring will
occur. A value of zero disables MII link monitoring. A value of
100 is a good starting point. See High Availability section for
additional information. The default value is 0.
Use integer value for the frequency (in ms) of MII link monitoring. Zero value
is default and means the link monitoring will be disabled. A good value is 100
if you wish to use link monitoring. See HA section for additional info.
downdelay
downdelay=
Specifies the delay time in milli-seconds to disable a link after a
link failure has been detected. This should be a multiple of miimon
value, otherwise the value will be rounded. The default value is 0.
Use integer value for delaying disabling a link by this number (in ms) after
the link failure has been detected. Must be a multiple of miimon. Default
value is zero. See HA section for additional info.
updelay
updelay=
Specifies the delay time in milli-seconds to enable a link after a
link up status has been detected. This should be a multiple of miimon
value, otherwise the value will be rounded. The default value is 0.
Use integer value for delaying enabling a link by this number (in ms) after
the "link up" status has been detected. Must be a multiple of miimon. Default
value is zero. See HA section for additional info.
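The text above says that downdelay and updelay are rounded when they are not a multiple of miimon. A minimal sketch of one way this could work is shown below. The helper names are hypothetical, and rounding down (integer division into whole polling intervals) is an assumption; the text does not state the rounding direction.

```c
#include <assert.h>

/*
 * Hypothetical helper: express a delay as a whole number of miimon
 * polling intervals. Assumption: the remainder is discarded.
 */
static unsigned int delay_to_ticks(unsigned int delay_ms, unsigned int miimon_ms)
{
	assert(miimon_ms > 0);	/* a delay only makes sense with MII monitoring on */
	return delay_ms / miimon_ms;
}

/* Hypothetical helper: the delay that actually applies after rounding. */
static unsigned int effective_delay_ms(unsigned int delay_ms, unsigned int miimon_ms)
{
	return delay_to_ticks(delay_ms, miimon_ms) * miimon_ms;
}
```

Under this assumption, miimon=100 with downdelay=250 behaves like downdelay=200, which is why choosing an exact multiple avoids surprises.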
arp_interval
arp_interval=
Specifies the ARP monitoring frequency in milli-seconds.
If ARP monitoring is used in a load-balancing mode (mode 0 or 2), the
switch should be configured in a mode that evenly distributes packets
across all links - such as round-robin. If the switch is configured to
distribute the packets in an XOR fashion, all replies from the ARP
targets will be received on the same link which could cause the other
team members to fail. ARP monitoring should not be used in conjunction
with miimon. A value of 0 disables ARP monitoring. The default value
is 0.
Use integer value for the frequency (in ms) of arp monitoring. Zero value
is default and means the arp monitoring will be disabled. See HA section
for additional info. This field is value in active_backup mode only.
arp_ip_target
arp_ip_target=
Specifies the ip addresses to use when arp_interval is > 0. These are
the targets of the ARP request sent to determine the health of the link
to the targets. Specify these values in ddd.ddd.ddd.ddd format.
Multiple ip addresses must be separated by a comma. At least one ip
address needs to be given for ARP monitoring to work. The maximum number
of targets that can be specified is set at 16.
An ip address to use when arp_interval is > 0. This is the target of the
arp request sent to determine the health of the link to the target.
Specify this value in ddd.ddd.ddd.ddd format.
primary
If you need to configure several bonding devices, the driver must be loaded
several times. I.e. for two bonding devices, your /etc/conf.modules must look
like this:
A string (eth0, eth2, etc) to equate to a primary device. If this
value is entered, and the device is on-line, it will be used first as
the output media. Only when this device is off-line, will alternate
devices be used. Otherwise, once a failover is detected and a new
default output is chosen, it will remain the output media until it too
fails. This is useful when one slave was preferred over another, i.e.
when one slave is 1000Mbps and another is 100Mbps. If the 1000Mbps
slave fails and is later restored, it may be preferred the faster slave
gracefully become the active slave - without deliberately failing the
100Mbps slave. Specifying a primary is only valid in active-backup mode.
multicast
Integer value for the mode of operation for multicast support.
Possible values are:
0 Disabled (no multicast support)
1 Enabled on active slave only, useful in active-backup mode
2 Enabled on all slaves, this is the default
Configuring Multiple Bonds
==========================
If several bonding interfaces are required, the driver must be loaded
multiple times. For example, to configure two bonding interfaces with link
monitoring performed every 100 milli-seconds, the /etc/conf.modules should
resemble the following:
alias bond0 bonding
alias bond1 bonding
...
...
@@ -164,10 +301,67 @@ alias bond1 bonding
options bond0 miimon=100
options bond1 -o bonding1 miimon=100
5) Testing configuration
------------------------
You can test the configuration and transmit policy with ifconfig. For example,
for round robin policy, you should get something like this:
Configuring Multiple ARP Targets
================================
While ARP monitoring can be done with just one target, it can be useful
in a High Availability setup to have several targets to monitor. In the
case of just one target, the target itself may go down or have a problem
making it unresponsive to ARP requests. Having an additional target (or
several) would increase the reliability of the ARP monitoring.
Multiple ARP targets must be separated by commas as follows:
# example options for ARP monitoring with three targets
alias bond0 bonding
options bond0 arp_interval=60 arp_ip_target=192.168.0.1,192.168.0.3,192.168.0.9
For just a single target the options would resemble:
# example options for ARP monitoring with one target
alias bond0 bonding
options bond0 arp_interval=60 arp_ip_target=192.168.0.100
Switch Configuration
====================
While the switch does not need to be configured when the active-backup
policy is used (mode=1), it does need to be configured for the round-robin,
XOR, and broadcast policies (mode=0, mode=2, and mode=3).
Verifying Bond Configuration
============================
1) Bonding information files
----------------------------
The bonding driver information files reside in the /proc/net/bond* directories.
Sample contents of /proc/net/bond0/info after the driver is loaded with
parameters of mode=0 and miimon=1000 is shown below.
Bonding Mode: load balancing (round-robin)
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 1000
Up Delay (ms): 0
Down Delay (ms): 0
Slave Interface: eth1
MII Status: up
Link Failure Count: 1
Slave Interface: eth0
MII Status: up
Link Failure Count: 1
2) Network verification
-----------------------
The network configuration can be verified using the ifconfig command. In
the example below, the bond0 interface is the master (MASTER) while eth0 and
eth1 are slaves (SLAVE). Notice all slaves of bond0 have the same MAC address
(HWaddr) as bond0.
[root]# /sbin/ifconfig
bond0 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4
...
...
@@ -193,8 +387,9 @@ eth1 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4
collisions:0 txqueuelen:100
Interrupt:9 Base address:0x1400
Questions :
===========
Frequently Asked Questions
==========================
1. Is it SMP safe?
...
...
@@ -209,31 +404,30 @@ Questions :
3. How many bonding devices can I have?
One for each module you load. See section on module parameters for how
One for each module you load. See section on Module Parameters for how
to accomplish this.
4. How many slaves can a bonding device have?
Limited by the number of network interfaces Linux supports and the
number of cards you can place in your system.
Limited by the number of network interfaces Linux supports and/or the
number of network cards you can place in your system.
5. What happens when a slave link dies?
If your ethernet cards support MII status monitoring and the MII
monitoring has been enabled in the driver (see description of module
parameters), there will be no adverse consequences. This release
of the bonding driver knows how to get the MII information and
If your ethernet cards support MII or ETHTOOL link status monitoring
and the MII monitoring has been enabled in the driver (see description
of module parameters), there will be no adverse consequences. This
release of the bonding driver knows how to get the MII information and
enables or disables its slaves according to their link status.
See section on HA for additional information.
See section on High Availability for additional information.
For ethernet cards not supporting MII status, or if you wish to
verify that packets have been both send and received, you may
configure the arp_interval and arp_ip_target. If packets have
not been sent or received during this interval, an arp request
is sent to the target to generate send and receive traffic.
If after this interval, either the successful send and/or
receive count has not incremented, the next slave in the sequence
will become the active slave.
For ethernet cards not supporting MII status, the arp_interval and
arp_ip_target parameters must be specified for bonding to work
correctly. If packets have not been sent or received during the
specified arp_interval duration, an ARP request is sent to the targets
to generate send and receive traffic. If after this interval, either
the successful send and/or receive count has not incremented, the next
slave in the sequence will become the active slave.
If neither mii_monitor and arp_interval is configured, the bonding
driver will not handle this situation very well. The driver will
...
...
@@ -245,11 +439,12 @@ Questions :
6. Can bonding be used for High Availability?
Yes, if you use MII monitoring and ALL your cards support MII link
status reporting. See section on HA for more information.
status reporting. See section on High Availability for more information.
7. Which switches/systems does it work with?
In round-robin mode, it works with systems that support trunking:
In round-robin and XOR mode, it works with systems that support
trunking:
* Cisco 5500 series (look for EtherChannel support).
* SunTrunking software.
...
...
@@ -259,7 +454,8 @@ Questions :
units.
* Linux bonding, of course !
In Active-backup mode, it should work with any Layer-II switches.
In active-backup mode, it should work with any Layer-II switches.
8. Where does a bonding device get its MAC address from?
...
...
@@ -297,55 +493,68 @@ Questions :
9. Which transmit policies can be used?
Round robin, based on the order of enslaving, the output device
Round-robin, based on the order of enslaving, the output device
is selected based on the next available slave. Regardless of
the source and/or destination of the packet.
XOR, based on (src hw addr XOR dst hw addr) % slave cnt. This
selects the same slave for each destination hw address.
Active-backup policy that ensures that one and only one device will
transmit at any given moment. Active-backup policy is useful for
implementing high availability solutions using two hubs (see
section on HA).
section on High Availability).
High availability
=================
XOR, based on (src hw addr XOR dst hw addr) % slave count. This
policy selects the same slave for each destination hw address.
Broadcast policy transmits everything on all slave interfaces.
To implement high availability using the bonding driver, you need to
compile the driver as module because currently it is the only way to pass
parameters to the driver. This may change in the future.
High availability is achieved by using MII status reporting. You need to
verify that all your interfaces support MII link status reporting. On Linux
kernel 2.2.17, all the 100 Mbps capable drivers and yellowfin gigabit driver
support it. If your system has an interface that does not support MII status
reporting, a failure of its link will not be detected!
High Availability
=================
The bonding driver can regularly check all its slaves links by checking the
MII status registers. The check interval is specified by the module argument
"miimon" (MII monitoring). It takes an integer that represents the
checking time in milliseconds. It should not come to close to (1000/HZ)
(10 ms on i386) because it may then reduce the system interactivity. 100 ms
seems to be a good value. It means that a dead link will be detected at most
100 ms after it goes down.
To implement high availability using the bonding driver, the driver needs to be
compiled as a module, because currently it is the only way to pass parameters
to the driver. This may change in the future.
High availability is achieved by using MII or ETHTOOL status reporting. You
need to verify that all your interfaces support MII or ETHTOOL link status
reporting. On Linux kernel 2.2.17, all the 100 Mbps capable drivers and
yellowfin gigabit driver support MII. To determine if ETHTOOL link reporting
is available for interface eth0, type "ethtool eth0" and the "Link detected:"
line should contain the correct link status. If your system has an interface
that does not support MII or ETHTOOL status reporting, a failure of its link
will not be detected! A message indicating MII and ETHTOOL is not supported by
a network driver is logged when the bonding driver is loaded with a non-zero
miimon value.
The bonding driver can regularly check all its slaves links using the ETHTOOL
IOCTL (ETHTOOL_GLINK command) or by checking the MII status registers. The
check interval is specified by the module argument "miimon" (MII monitoring).
It takes an integer that represents the checking time in milliseconds. It
should not come too close to (1000/HZ) (10 milli-seconds on i386) because it
may then reduce the system interactivity. A value of 100 seems to be a good
starting point. It means that a dead link will be detected at most 100
milli-seconds after it goes down.
Example:
# modprobe bonding miimon=100
Or, put in your /etc/modules.conf:
Or, put the following lines in /etc/modules.conf:
alias bond0 bonding
options bond0 miimon=100
There are currently two policies for high availability, depending on whether
a) hosts are connected to a single host or switch that support trunking
b) hosts are connected to several different switches or a single switch that
does not support trunking.
There are currently two policies for high availability. They are dependent on
whether:
a) hosts are connected to a single host or switch that support trunking
b) hosts are connected to several different switches or a single switch that
does not support trunking
1) HA on a single switch or host - load balancing
-------------------------------------------------
1) High Availability on a single switch or host - load balancing
----------------------------------------------------------------
It is the easiest to set up and to understand. Simply configure the
remote equipment (host or switch) to aggregate traffic over several
ports (Trunk, EtherChannel, etc.) and configure the bonding interfaces.
...
...
@@ -356,7 +565,7 @@ encounter problems on some buggy switches that disable the trunk for a
long time if all ports in a trunk go down. This is not Linux, but really
the switch (reboot it to ensure).
Example 1 : host to host at double speed
Example 1 : host to host at twice the speed
+----------+ +----------+
| |eth0 eth0| |
...
...
@@ -370,7 +579,7 @@ Example 1 : host to host at double speed
# ifconfig bond0 addr
# ifenslave bond0 eth0 eth1
Example 2 : host to switch at double speed
Example 2 : host to switch at twice the speed
+----------+ +----------+
| |eth0 port1| |
...
...
@@ -384,7 +593,9 @@ Example 2 : host to switch at double speed
# ifconfig bond0 addr and port2
# ifenslave bond0 eth0 eth1
2) HA on two or more switches (or a single switch without trunking support)
2) High Availability on two or more switches (or a single switch without
trunking support)
---------------------------------------------------------------------------
This mode is more problematic because it relies on the fact that there
are multiple ports and the host's MAC address should be visible on one
...
...
@@ -423,14 +634,14 @@ point of failure" solution.
+--------------+ host2 +----------------+
eth0 +-------+ eth1
In this configuration, there are an ISL - Inter Switch Link (could be a trunk),
In this configuration, there is an ISL - Inter Switch Link (could be a trunk),
several servers (host1, host2 ...) attached to both switches each, and one or
more ports to the outside world (port3...). One and only one slave on each host
is active at a time, while all links are still monitored (the system can
detect a failure of active and backup links).
Each time a host changes its active interface, it sticks to the new one until
it goes down. In this example, the hosts are not too much affected by the
it goes down. In this example, the hosts are negligibly affected by the
expiration time of the switches' forwarding tables.
If host1 and host2 have the same functionality and are used in load balancing
...
...
@@ -460,6 +671,7 @@ Each time the host changes its active interface, it sticks to the new one until
it goes down. In this example, the host is strongly affected by the expiration
time of the switch forwarding table.
3) Adapting to your switches' timing
------------------------------------
If your switches take a long time to go into backup mode, it may be
...
...
@@ -488,8 +700,34 @@ Examples :
# modprobe bonding miimon=100 mode=1 downdelay=2000 updelay=5000
# modprobe bonding miimon=100 mode=0 downdelay=0 updelay=5000
4) Limitations
--------------
Promiscuous Sniffing notes
==========================
If you wish to bond channels together for a network sniffing
application --- you wish to run tcpdump, or ethereal, or an IDS like
snort, with its input aggregated from multiple interfaces using the
bonding driver --- then you need to handle the Promiscuous interface
setting by hand. Specifically, when you "ifconfig bond0 up" you
must add the promisc flag there; it will be propagated down to the
slave interfaces at ifenslave time; a full example might look like:
grep bond0 /etc/modules.conf || echo alias bond0 bonding >>/etc/modules.conf
ifconfig bond0 promisc up
for if in eth1 eth2 ...;do
ifconfig $if up
ifenslave bond0 $if
done
snort ... -i bond0 ...
Ifenslave also wants to propagate addresses from interface to
interface, appropriately for its design functions in HA and channel
capacity aggregating; but it works fine for unnumbered interfaces;
just ignore all the warnings it emits.
Limitations
===========
The main limitations are :
- only the link status is monitored. If the switch on the other side is
partially down (e.g. doesn't forward anymore, but the link is OK), the link
...
...
@@ -500,7 +738,13 @@ The main limitations are :
Use the arp_interval/arp_ip_target parameters to count incoming/outgoing
frames.
Resources and links
- A Transmit Load Balancing policy is not currently available. This mode
allows every slave in the bond to transmit while only one receives. If
the "receiving" slave fails, another slave takes over the MAC address of
the failed receiving slave.
Resources and Links
===================
Current development on this driver is posted to:
...
...
Documentation/networking/ifenslave.c
...
...
@@ -41,6 +41,16 @@
* - 2002/02/18 Erik Habbinga <erik_habbinga @ hp dot com> :
* - ifr2.ifr_flags was not initialized in the hwaddr_notset case,
* SIOCGIFFLAGS now called before hwaddr_notset test
*
* - 2002/10/31 Tony Cureington <tony.cureington * hp_com> :
* - If the master does not have a hardware address when the first slave
* is enslaved, the master is assigned the hardware address of that
* slave - there is a comment in bonding.c stating "ifenslave takes
* care of this now." This corrects the problem of slaves having
* different hardware addresses in active-backup mode when
* multiple interfaces are specified on a single ifenslave command
* (ifenslave bond0 eth0 eth1).
*
*/
static char *version =
...
...
@@ -131,6 +141,7 @@ main(int argc, char **argv)
	sa_family_t master_family;
	char **spp, *master_ifname, *slave_ifname;
	int hwaddr_notset;
	int master_up;
	while ((c = getopt_long(argc, argv, "acdfrvV?h", longopts, 0)) != EOF)
		switch (c) {
...
...
@@ -300,10 +311,86 @@ main(int argc, char **argv)
return
1
;
}
if
(
hwaddr_notset
)
{
/* we do nothing */
	if (hwaddr_notset) {
		/* assign the slave hw address to the
		 * master since it currently does not
		 * have one; otherwise, slaves may
		 * have different hw addresses in
		 * active-backup mode as seen when enslaving
		 * using "ifenslave bond0 eth0 eth1" because
		 * hwaddr_notset is set outside this loop.
		 * TODO: put this and the "else" portion in
		 * a function.
		 */
		goterr = 0;
		master_up = 0;
		if (if_flags.ifr_flags & IFF_UP) {
			if_flags.ifr_flags &= ~IFF_UP;
			if (ioctl(skfd, SIOCSIFFLAGS, &if_flags) < 0) {
				goterr = 1;
				fprintf(stderr,
					"Shutting down "
					"interface %s failed: "
					"%s\n",
					master_ifname, strerror(errno));
			} else {
				/* we took the master down,
				 * so we must bring it up
				 */
				master_up = 1;
			}
		}
		if (!goterr) {
			/* get the slaves MAC address */
			strncpy(if_hwaddr.ifr_name, slave_ifname, IFNAMSIZ);
			if (ioctl(skfd, SIOCGIFHWADDR, &if_hwaddr) < 0) {
				fprintf(stderr,
					"Could not get MAC "
					"address of %s: %s\n",
					slave_ifname, strerror(errno));
				strncpy(if_hwaddr.ifr_name, master_ifname, IFNAMSIZ);
				goterr = 1;
			}
		}
		if (!goterr) {
			strncpy(if_hwaddr.ifr_name, master_ifname, IFNAMSIZ);
			if (ioctl(skfd, SIOCSIFHWADDR, &if_hwaddr) < 0) {
				fprintf(stderr,
					"Could not set MAC "
					"address of %s: %s\n",
					master_ifname, strerror(errno));
				goterr = 1;
			} else {
				hwaddr_notset = 0;
			}
		}
		if (master_up) {
			if_flags.ifr_flags |= IFF_UP;
			if (ioctl(skfd, SIOCSIFFLAGS, &if_flags) < 0) {
				fprintf(stderr,
					"Bringing up interface "
					"%s failed: %s\n",
					master_ifname, strerror(errno));
}
else
{
/* we'll assign master's hwaddr to this slave */
}
}
else
{
/* we'll assign master's hwaddr to this slave */
		if (ifr2.ifr_flags & IFF_UP) {
			ifr2.ifr_flags &= ~IFF_UP;
			if (ioctl(skfd, SIOCSIFFLAGS, &ifr2) < 0) {
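The core of the ifenslave.c change in this hunk is "read the slave's hardware address, then write it to the master". A minimal sketch of just that step follows, assuming a Linux userspace build; `copy_hwaddr_to_master` is a hypothetical helper name, not a function in ifenslave.c, and the sketch leaves out the part where the master is taken down first and brought back up afterwards.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/sockios.h>

/* Hypothetical helper (not part of ifenslave.c): copy the hardware address
 * of slave_ifname onto master_ifname using an already opened socket skfd.
 * Returns 0 on success, -1 if either ioctl fails (errno is left set). */
static int copy_hwaddr_to_master(int skfd, const char *master_ifname,
				 const char *slave_ifname)
{
	struct ifreq ifr;

	assert(master_ifname != NULL && slave_ifname != NULL);

	/* read the slave's MAC address into ifr.ifr_hwaddr */
	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, slave_ifname, IFNAMSIZ - 1);
	if (ioctl(skfd, SIOCGIFHWADDR, &ifr) < 0)
		return -1;

	/* reuse the same request, now addressed to the master */
	memset(ifr.ifr_name, 0, IFNAMSIZ);
	strncpy(ifr.ifr_name, master_ifname, IFNAMSIZ - 1);
	if (ioctl(skfd, SIOCSIFHWADDR, &ifr) < 0)
		return -1;

	return 0;
}
```

A caller would pass a datagram socket, for example one from `socket(AF_INET, SOCK_DGRAM, 0)`. Setting a hardware address normally needs administrative privileges, so an unprivileged run is expected to fail at the second ioctl.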
...
...
MAINTAINERS
...
...
@@ -1521,6 +1521,12 @@ M: Kai.Makisara@metla.fi
L: linux-scsi@vger.kernel.org
S: Maintained
SCTP PROTOCOL
P: Jon Grimm
M: jgrimm2@us.ibm.com
L: lksctp-developers@lists.sourceforge.net
S: Supported
SCx200 CPU SUPPORT
P: Christer Weinigel
M: christer@weinigel.se
...
...
drivers/net/Kconfig
...
...
@@ -2028,9 +2028,11 @@ config PPPOE
help
Support for PPP over Ethernet.
This driver requires a specially patched pppd daemon. The patch to
pppd, along with binaries of a patched pppd package can be found at:
<http://www.shoshin.uwaterloo.ca/~mostrows/>.
This driver requires the latest version of pppd from the CVS
repository at cvs.samba.org. Alternatively, see the
RoaringPenguin package (http://www.roaringpenguin.com/pppoe)
which contains instruction on how to use this driver (under
the heading "Kernel mode PPPoE").
config PPPOATM
tristate "PPP over ATM"
...
...
drivers/net/bonding.c
...
...
@@ -177,20 +177,91 @@
* - Port Gleb Natapov's multicast support patchs from 2.4.12
* to 2.4.18 adding support for multicast.
*
* 2002/06/17 - Tony Cureington <tony.cureington * hp_com>
* 2002/06/10 - Tony Cureington <tony.cureington * hp_com>
* - corrected uninitialized pointer (ifr.ifr_data) in bond_check_dev_link;
* actually changed function to use ETHTOOL, then MIIPHY, and finally
* MIIREG to determine the link status
* actually changed function to use MIIPHY, then MIIREG, and finally
* ETHTOOL to determine the link status
* - fixed bad ifr_data pointer assignments in bond_ioctl
* - corrected mode 1 being reported as active-backup in bond_get_info;
* also added text to distinguish type of load balancing (rr or xor)
* - change arp_ip_target module param from "1-12s" (array of 12 ptrs)
* to "s" (a single ptr)
*
* 2002/08/30 - Jay Vosburgh <fubar at us dot ibm dot com>
* - Removed acquisition of xmit_lock in set_multicast_list; caused
* deadlock on SMP (lock is held by caller).
* - Revamped SIOCGMIIPHY, SIOCGMIIREG portion of bond_check_dev_link().
*
* 2002/09/18 - Jay Vosburgh <fubar at us dot ibm dot com>
* - Fixed up bond_check_dev_link() (and callers): removed some magic
* numbers, banished local MII_ defines, wrapped ioctl calls to
* prevent EFAULT errors
*
* 2002/9/30 - Jay Vosburgh <fubar at us dot ibm dot com>
* - make sure the ip target matches the arp_target before saving the
* hw address.
*
* 2002/9/30 - Dan Eisner <eisner at 2robots dot com>
* - make sure my_ip is set before taking down the link, since
* not all switches respond if the source ip is not set.
*
* 2002/10/8 - Janice Girouard <girouard at us dot ibm dot com>
* - read in the local ip address when enslaving a device
* - add primary support
* - make sure 2*arp_interval has passed when a new device
* is brought on-line before taking it down.
*
* 2002/09/11 - Philippe De Muyter <phdm at macqel dot be>
* - Added bond_xmit_broadcast logic.
* - Added bond_mode() support function.
*
* 2002/10/26 - Laurent Deniel <laurent.deniel at free.fr>
* - allow to register multicast addresses only on active slave
* (useful in active-backup mode)
* - add multicast module parameter
* - fix deletion of multicast groups after unloading module
*
* 2002/11/06 - Kameshwara Rayaprolu <kameshwara.rao * wipro_com>
* - Changes to prevent panic from closing the device twice; if we close
* the device in bond_release, we must set the original_flags to down
* so it won't be closed again by the network layer.
*
* 2002/11/07 - Tony Cureington <tony.cureington * hp_com>
* - Fix arp_target_hw_addr memory leak
* - Created activebackup_arp_monitor function to handle arp monitoring
* in active backup mode - the bond_arp_monitor had several problems...
* such as allowing slaves to tx arps sequentially without any delay
* for a response
* - Renamed bond_arp_monitor to loadbalance_arp_monitor and re-wrote
* this function to just handle arp monitoring in load-balancing mode;
* it is a lot more compact now
* - Changes to ensure one and only one slave transmits in active-backup
* mode
* - Robustesize parameters; warn users about bad combinations of
* parameters; also if miimon is specified and a network driver does
* not support MII or ETHTOOL, inform the user of this
* - Changes to support link_failure_count when in arp monitoring mode
* - Fix up/down delay reported in /proc
* - Added version; log version; make version available from "modinfo -d"
* - Fixed problem in bond_check_dev_link - if the first IOCTL (SIOCGMIIPH)
* failed, the ETHTOOL ioctl never got a chance
*
* 2002/11/16 - Laurent Deniel <laurent.deniel at free.fr>
* - fix multicast handling in activebackup_arp_monitor
* - remove one unnecessary and confusing current_slave == slave test
* in activebackup_arp_monitor
*
* 2002/11/17 - Laurent Deniel <laurent.deniel at free.fr>
* - fix bond_slave_info_query when slave_id = num_slaves
*
* 2002/11/19 - Janice Girouard <girouard at us dot ibm dot com>
* - correct ifr_data reference. Update ifr_data reference
* to mii_ioctl_data struct values to avoid confusion.
*
*
* 2002/11/22 - Bert Barbe <bert.barbe at oracle dot com>
* - Add support for multiple arp_ip_target
*
*/
#include <linux/config.h>
...
...
@@ -201,6 +272,7 @@
#include <linux/interrupt.h>
#include <linux/ioport.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/init.h>
...
...
@@ -208,6 +280,7 @@
#include <linux/socket.h>
#include <linux/errno.h>
#include <linux/netdevice.h>
#include <linux/inetdevice.h>
#include <linux/etherdevice.h>
#include <linux/skbuff.h>
#include <net/sock.h>
...
...
@@ -225,6 +298,13 @@
#include <asm/dma.h>
#include <asm/uaccess.h>
#define DRV_VERSION "2.4.20-20021210"
#define DRV_RELDATE "December 10, 2002"
#define DRV_NAME "bonding"
#define DRV_DESCRIPTION "Ethernet Channel Bonding Driver"
static const char *version =
DRV_NAME ".c:v" DRV_VERSION " (" DRV_RELDATE ")\n";
/* monitor all links that often (in milliseconds). <=0 disables monitoring */
#ifndef BOND_LINK_MON_INTERV
...
...
@@ -235,20 +315,31 @@
#define BOND_LINK_ARP_INTERV 0
#endif
#ifndef MAX_ARP_IP_TARGETS
#define MAX_ARP_IP_TARGETS 16
#endif
static int arp_interval = BOND_LINK_ARP_INTERV;
static char *arp_ip_target = NULL;
static unsigned long arp_target = 0;
static char *arp_ip_target[MAX_ARP_IP_TARGETS] = { NULL, };
static unsigned long arp_target[MAX_ARP_IP_TARGETS] = { 0, } ;
static int arp_ip_count = 0;
static u32 my_ip = 0;
char *arp_target_hw_addr = NULL;
static char *primary = NULL;
static int max_bonds = BOND_DEFAULT_MAX_BONDS;
static int miimon = BOND_LINK_MON_INTERV;
static int mode = BOND_MODE_ROUNDROBIN;
static int updelay = 0;
static int downdelay = 0;
#define BOND_MULTICAST_DISABLED 0
#define BOND_MULTICAST_ACTIVE 1
#define BOND_MULTICAST_ALL 2
static int multicast = BOND_MULTICAST_ALL;
static int first_pass = 1;
int bond_cnt;
static struct bonding *these_bonds = NULL;
static struct net_device *dev_bonds = NULL;
...
...
@@ -259,13 +350,17 @@ MODULE_PARM_DESC(miimon, "Link check interval in milliseconds");
MODULE_PARM(mode, "i");
MODULE_PARM(arp_interval, "i");
MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
MODULE_PARM(arp_ip_target, "s");
MODULE_PARM_DESC(arp_ip_target, "arp target in n.n.n.n form");
MODULE_PARM(arp_ip_target, "1-" __MODULE_STRING(MAX_ARP_IP_TARGETS) "s");
MODULE_PARM_DESC(arp_ip_target, "arp targets in n.n.n.n form");
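The new `arp_ip_target` parameter type is assembled at compile time from `MAX_ARP_IP_TARGETS`. The sketch below shows how such a string can be built, assuming `__MODULE_STRING` expands its argument before stringifying it; `STRINGIFY_ARG`, `STRINGIFY` and `arp_ip_target_type` are names made up for this example.

```c
#include <assert.h>
#include <string.h>

/* Two-level stringification: the outer macro expands its argument first,
 * the inner one turns the result into a string literal.
 * Both macro names are illustrative, not kernel names. */
#define STRINGIFY_ARG(x) #x
#define STRINGIFY(x) STRINGIFY_ARG(x)

#define MAX_ARP_IP_TARGETS 16

/* Adjacent string literals are concatenated by the compiler. */
static const char arp_ip_target_type[] =
	"1-" STRINGIFY(MAX_ARP_IP_TARGETS) "s";

/* Compile-time check that the result has the length of "1-16s". */
static_assert(sizeof(arp_ip_target_type) == sizeof("1-16s"),
	      "unexpected parameter type string");
```

With a single-level macro the result would be the literal text "1-MAX_ARP_IP_TARGETSs", which is why the extra expansion step matters.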
MODULE_PARM_DESC(mode, "Mode of operation : 0 for round robin, 1 for active-backup, 2 for xor");
MODULE_PARM(updelay, "i");
MODULE_PARM_DESC(updelay, "Delay before considering link up, in milliseconds");
MODULE_PARM(downdelay, "i");
MODULE_PARM_DESC(downdelay, "Delay before considering link down, in milliseconds");
MODULE_PARM(primary, "s");
MODULE_PARM_DESC(primary, "Primary network device to use");
MODULE_PARM(multicast, "i");
MODULE_PARM_DESC(multicast, "Mode for multicast support : 0 for none, 1 for active slave, 2 for all slaves (default)");
extern void arp_send(int type, int ptype, u32 dest_ip, struct net_device *dev,
		     u32 src_ip, unsigned char *dest_hw, unsigned char *src_hw,
...
...
@@ -276,7 +371,8 @@ static int bond_xmit_xor(struct sk_buff *skb, struct net_device *dev);
static int bond_xmit_activebackup(struct sk_buff *skb, struct net_device *dev);
static struct net_device_stats *bond_get_stats(struct net_device *dev);
static void bond_mii_monitor(struct net_device *dev);
static void bond_arp_monitor(struct net_device *dev);
static void loadbalance_arp_monitor(struct net_device *dev);
static void activebackup_arp_monitor(struct net_device *dev);
static int bond_event(struct notifier_block *this, unsigned long event, void *ptr);
static void bond_restore_slave_flags(slave_t *slave);
static void bond_mc_list_destroy(struct bonding *bond);
...
...
@@ -287,6 +383,7 @@ static inline int dmi_same(struct dev_mc_list *dmi1, struct dev_mc_list *dmi2);
static void bond_set_promiscuity(bonding_t *bond, int inc);
static void bond_set_allmulti(bonding_t *bond, int inc);
static struct dev_mc_list *bond_mc_list_find_dmi(struct dev_mc_list *dmi, struct dev_mc_list *mc_list);
static void bond_mc_update(bonding_t *bond, slave_t *new, slave_t *old);
static void bond_set_slave_inactive_flags(slave_t *slave);
static void bond_set_slave_active_flags(slave_t *slave);
static int bond_enslave(struct net_device *master, struct net_device *slave);
...
...
@@ -309,6 +406,47 @@ static int bond_get_info(char *buf, char **start, off_t offset, int length);
#define IS_UP(dev) ((((dev)->flags & (IFF_UP)) == (IFF_UP)) && \
(netif_running(dev) && netif_carrier_ok(dev)))
static void arp_send_all(slave_t *slave)
{
	int i;

	for (i = 0; (i < MAX_ARP_IP_TARGETS) && arp_target[i]; i++) {
		arp_send(ARPOP_REQUEST, ETH_P_ARP, arp_target[i], slave->dev,
			 my_ip, arp_target_hw_addr, slave->dev->dev_addr,
			 arp_target_hw_addr);
	}
}
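The loop in `arp_send_all()` stops at the first zero entry of `arp_target[]` or after `MAX_ARP_IP_TARGETS` entries, whichever comes first. The sketch below isolates that termination rule; `count_arp_targets` is a hypothetical function written for illustration and the value 16 is taken from the `#define` earlier in this diff.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ARP_IP_TARGETS 16

/* Count how many targets a loop of the arp_send_all() form would visit:
 * it stops at the first zero entry or after MAX_ARP_IP_TARGETS entries. */
static int count_arp_targets(const unsigned long *targets)
{
	int i;

	assert(targets != NULL);
	for (i = 0; (i < MAX_ARP_IP_TARGETS) && targets[i]; i++)
		;
	return i;
}
```

One consequence of this rule is that any target stored after a zero entry is never probed, so the array has to be filled without gaps.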
static const char *bond_mode(void)
{
	switch (mode) {
	case BOND_MODE_ROUNDROBIN:
		return "load balancing (round-robin)";
	case BOND_MODE_ACTIVEBACKUP:
		return "fault-tolerance (active-backup)";
	case BOND_MODE_XOR:
		return "load balancing (xor)";
	case BOND_MODE_BROADCAST:
		return "fault-tolerance (broadcast)";
	default:
		return "unknown";
	}
}
static const char *multicast_mode(void)
{
	switch (multicast) {
	case BOND_MULTICAST_DISABLED:
		return "disabled";
	case BOND_MULTICAST_ACTIVE:
		return "active slave only";
	case BOND_MULTICAST_ALL:
		return "all slaves";
	default:
		return "unknown";
	}
}
static void bond_restore_slave_flags(slave_t *slave)
{
	slave->dev->flags = slave->original_flags;
...
...
@@ -415,37 +553,38 @@ static u16 bond_check_dev_link(struct net_device *dev)
	/* call it and not the others for that team */
	/* member. */
	/* try SOICETHTOOL ioctl, some drivers cache ETHTOOL_GLINK */
	/* for a period of time; we need to encourage link status */
	/* be reported by network drivers in real time; if the */
	/* value is cached, the mmimon module parm may have no */
	/* effect... */
	etool.cmd = ETHTOOL_GLINK;
	ifr.ifr_data = (char *)&etool;
	if (IOCTL(dev, &ifr, SIOCETHTOOL) == 0) {
		if (etool.data == 1) {
			return BMSR_LSTATUS;
		} else {
			return (0);
		}
	}
	/*
	 * We cannot assume that SIOCGMIIPHY will also read a
	 * register; not all network drivers support that.
	 * register; not all network drivers (e.g., e100)
	 * support that.
	 */
	/* Yes, the mii is overlaid on the ifreq.ifr_ifru */
	mii = (struct mii_ioctl_data *)&ifr.ifr_data;
	if (IOCTL(dev, &ifr, SIOCGMIIPHY) != 0) {
		return BMSR_LSTATUS; /* can't tell */
	}
	if (IOCTL(dev, &ifr, SIOCGMIIPHY) == 0) {
		mii->reg_num = MII_BMSR;
		if (IOCTL(dev, &ifr, SIOCGMIIREG) == 0) {
			return mii->val_out & BMSR_LSTATUS;
		}
	}
	/* try SIOCETHTOOL ioctl, some drivers cache ETHTOOL_GLINK */
	/* for a period of time so we attempt to get link status */
	/* from it last if the above MII ioctls fail... */
	etool.cmd = ETHTOOL_GLINK;
	ifr.ifr_data = (char *)&etool;
	if (IOCTL(dev, &ifr, SIOCETHTOOL) == 0) {
		if (etool.data == 1) {
			return BMSR_LSTATUS;
		} else {
#ifdef BONDING_DEBUG
			printk(KERN_INFO
			       ":: SIOCETHTOOL shows failure\n");
#endif
			return (0);
		}
	}
	}
	return BMSR_LSTATUS; /* spoof link up ( we can't check it) */
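The reordered `bond_check_dev_link()` in this hunk tries the MII ioctls first, falls back to ETHTOOL, and reports link up when neither answers. A small sketch of that decision order follows, with the ioctl results passed in as plain data so it can run in userspace; `struct link_probe` and `link_status` are hypothetical names, and `BMSR_LSTATUS` is defined locally with the value used by the MII status register's link bit.

```c
#include <assert.h>
#include <stddef.h>

#define BMSR_LSTATUS 0x0004 /* link status bit of the MII status register */

/* Results of the probes, gathered by the caller (hypothetical struct). */
struct link_probe {
	int mii_phy_ok;        /* SIOCGMIIPHY succeeded */
	int mii_reg_ok;        /* SIOCGMIIREG succeeded */
	unsigned mii_bmsr;     /* BMSR value read via SIOCGMIIREG */
	int ethtool_ok;        /* SIOCETHTOOL with ETHTOOL_GLINK succeeded */
	unsigned ethtool_link; /* 1 if ethtool reports link */
};

/* Returns BMSR_LSTATUS for "link up" and 0 for "link down". */
static unsigned link_status(const struct link_probe *p)
{
	assert(p != NULL);
	if (p->mii_phy_ok && p->mii_reg_ok)
		return p->mii_bmsr & BMSR_LSTATUS;
	if (p->ethtool_ok)
		return (p->ethtool_link == 1) ? BMSR_LSTATUS : 0;
	return BMSR_LSTATUS; /* cannot tell: report link up */
}
```

The last branch mirrors the "spoof link up" comment: a driver that supports neither interface is always treated as up, which is why the enslave path warns about it.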
...
...
@@ -483,7 +622,11 @@ static int bond_open(struct net_device *dev)
		init_timer(arp_timer);
		arp_timer->expires = jiffies + (arp_interval * HZ / 1000);
		arp_timer->data = (unsigned long)dev;
		arp_timer->function = (void *)&bond_arp_monitor;
		if (mode == BOND_MODE_ACTIVEBACKUP) {
			arp_timer->function = (void *)&activebackup_arp_monitor;
		} else {
			arp_timer->function = (void *)&loadbalance_arp_monitor;
		}
		add_timer(arp_timer);
	}
	return 0;
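The timer expiry above converts `arp_interval` from milliseconds to ticks with `arp_interval * HZ / 1000`. The arithmetic can be sketched on its own as follows; `ms_to_ticks` is a hypothetical helper and the tick rate is passed as a parameter instead of using the kernel's `HZ`.

```c
#include <assert.h>

/* Convert a millisecond interval to timer ticks the way the hunk above
 * does: interval_ms * hz / 1000, in truncating integer arithmetic. */
static long ms_to_ticks(long interval_ms, long hz)
{
	assert(hz > 0);
	return interval_ms * hz / 1000;
}
```

Because the division truncates, an interval shorter than one tick (for example 5 ms at 100 ticks per second) becomes 0 ticks.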
...
...
@@ -501,6 +644,10 @@ static int bond_close(struct net_device *master)
	}
	if (arp_interval > 0) { /* arp interval, in milliseconds. */
		del_timer(&bond->arp_timer);
		if (arp_target_hw_addr != NULL) {
			kfree(arp_target_hw_addr);
			arp_target_hw_addr = NULL;
		}
	}
	/* Release the bonded slaves */
...
...
@@ -545,9 +692,18 @@ static void bond_mc_list_destroy(struct bonding *bond)
static void bond_mc_add(bonding_t *bond, void *addr, int alen)
{
	slave_t *slave;
	for (slave = bond->prev; slave != (slave_t *)bond; slave = slave->prev) {
	switch (multicast) {
	case BOND_MULTICAST_ACTIVE:
		/* write lock already acquired */
		if (bond->current_slave != NULL)
			dev_mc_add(bond->current_slave->dev, addr, alen, 0);
		break;
	case BOND_MULTICAST_ALL:
		for (slave = bond->prev; slave != (slave_t *)bond; slave = slave->prev)
			dev_mc_add(slave->dev, addr, alen, 0);
		break;
	case BOND_MULTICAST_DISABLED:
		break;
	}
}
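The three multicast modes decide which slaves receive a multicast address: none, only the current slave, or every slave. The sketch below models that selection on an array of flags instead of the driver's slave list; `select_mc_slaves` and its index-based interface are assumptions made for this example, while the mode values match the `#define`s added in this diff.

```c
#include <assert.h>
#include <stddef.h>

enum {
	BOND_MULTICAST_DISABLED = 0,
	BOND_MULTICAST_ACTIVE = 1,
	BOND_MULTICAST_ALL = 2
};

/* Mark in selected[] which of the n slaves would receive a multicast
 * address. active is the index of the current slave, or -1 if there is
 * none. Returns the number of slaves selected. */
static int select_mc_slaves(int multicast, int n, int active, int *selected)
{
	int i, count = 0;

	assert(selected != NULL && n >= 0);
	for (i = 0; i < n; i++)
		selected[i] = 0;

	switch (multicast) {
	case BOND_MULTICAST_ACTIVE:
		/* only the current slave, if there is one */
		if (active >= 0 && active < n) {
			selected[active] = 1;
			count = 1;
		}
		break;
	case BOND_MULTICAST_ALL:
		for (i = 0; i < n; i++)
			selected[i] = 1;
		count = n;
		break;
	case BOND_MULTICAST_DISABLED:
	default:
		break;
	}
	return count;
}
```

The same three-way switch recurs in `bond_mc_delete`, `bond_set_promiscuity` and `bond_set_allmulti`, with only the per-slave call changing.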
...
...
@@ -557,9 +713,19 @@ static void bond_mc_add(bonding_t *bond, void *addr, int alen)
static void bond_mc_delete(bonding_t *bond, void *addr, int alen)
{
	slave_t *slave;
	switch (multicast) {
	case BOND_MULTICAST_ACTIVE:
		/* write lock already acquired */
		if (bond->current_slave != NULL)
			dev_mc_delete(bond->current_slave->dev, addr, alen, 0);
		break;
	case BOND_MULTICAST_ALL:
		for (slave = bond->prev; slave != (slave_t *)bond; slave = slave->prev)
			dev_mc_delete(slave->dev, addr, alen, 0);
		break;
	case BOND_MULTICAST_DISABLED:
		break;
	}
}
/*
...
...
@@ -603,9 +769,19 @@ static inline int dmi_same(struct dev_mc_list *dmi1, struct dev_mc_list *dmi2)
static void bond_set_promiscuity(bonding_t *bond, int inc)
{
	slave_t *slave;
	switch (multicast) {
	case BOND_MULTICAST_ACTIVE:
		/* write lock already acquired */
		if (bond->current_slave != NULL)
			dev_set_promiscuity(bond->current_slave->dev, inc);
		break;
	case BOND_MULTICAST_ALL:
		for (slave = bond->prev; slave != (slave_t *)bond; slave = slave->prev)
			dev_set_promiscuity(slave->dev, inc);
		break;
	case BOND_MULTICAST_DISABLED:
		break;
	}
}
/*
...
...
@@ -614,9 +790,19 @@ static void bond_set_promiscuity(bonding_t *bond, int inc)
static void bond_set_allmulti(bonding_t *bond, int inc)
{
	slave_t *slave;
	switch (multicast) {
	case BOND_MULTICAST_ACTIVE:
		/* write lock already acquired */
		if (bond->current_slave != NULL)
			dev_set_allmulti(bond->current_slave->dev, inc);
		break;
	case BOND_MULTICAST_ALL:
		for (slave = bond->prev; slave != (slave_t *)bond; slave = slave->prev)
			dev_set_allmulti(slave->dev, inc);
		break;
	case BOND_MULTICAST_DISABLED:
		break;
	}
}
/*
...
...
@@ -641,6 +827,8 @@ static void set_multicast_list(struct net_device *master)
	struct dev_mc_list *dmi;
	unsigned long flags = 0;
	if (multicast == BOND_MULTICAST_DISABLED)
		return;
	/*
	 * Lock the private data for the master
	 */
...
...
@@ -682,6 +870,43 @@ static void set_multicast_list(struct net_device *master)
	write_unlock_irqrestore(&bond->lock, flags);
}
/*
 * Update the mc list and multicast-related flags for the new and
 * old active slaves (if any) according to the multicast mode
 */
static void bond_mc_update(bonding_t *bond, slave_t *new, slave_t *old)
{
	struct dev_mc_list *dmi;
	switch (multicast) {
	case BOND_MULTICAST_ACTIVE:
		if (bond->device->flags & IFF_PROMISC) {
			if (old != NULL && new != old)
				dev_set_promiscuity(old->dev, -1);
			dev_set_promiscuity(new->dev, 1);
		}
		if (bond->device->flags & IFF_ALLMULTI) {
			if (old != NULL && new != old)
				dev_set_allmulti(old->dev, -1);
			dev_set_allmulti(new->dev, 1);
		}
		/* first remove all mc addresses from old slave if any,
		   and _then_ add them to new active slave */
		if (old != NULL && new != old) {
			for (dmi = bond->device->mc_list; dmi != NULL; dmi = dmi->next)
				dev_mc_delete(old->dev, dmi->dmi_addr, dmi->dmi_addrlen, 0);
		}
		for (dmi = bond->device->mc_list; dmi != NULL; dmi = dmi->next)
			dev_mc_add(new->dev, dmi->dmi_addr, dmi->dmi_addrlen, 0);
		break;
	case BOND_MULTICAST_ALL:
		/* nothing to do: mc list is already up-to-date on all slaves */
		break;
	case BOND_MULTICAST_DISABLED:
		break;
	}
}
/*
 * This function counts the number of attached
 * slaves for use by bond_xmit_xor.
...
...
@@ -703,9 +928,16 @@ static int bond_enslave(struct net_device *master_dev,
	bonding_t *bond = NULL;
	slave_t *new_slave = NULL;
	unsigned long flags = 0;
	unsigned long rflags = 0;
	int ndx = 0;
	int err = 0;
	struct dev_mc_list *dmi;
	struct in_ifaddr **ifap;
	struct in_ifaddr *ifa;
	static int (*ioctl)(struct net_device *, struct ifreq *, int);
	struct ifreq ifr;
	struct ethtool_value etool;
	int link_reporting = 0;
	if (master_dev == NULL || slave_dev == NULL) {
		return -ENODEV;
...
...
@@ -758,6 +990,7 @@ static int bond_enslave(struct net_device *master_dev,
	new_slave->dev = slave_dev;
	if (multicast == BOND_MULTICAST_ALL) {
		/* set promiscuity level to new slave */
		if (master_dev->flags & IFF_PROMISC)
			dev_set_promiscuity(slave_dev, 1);
...
...
@@ -769,6 +1002,7 @@ static int bond_enslave(struct net_device *master_dev,
		/* upload master's mc_list to new slave */
		for (dmi = master_dev->mc_list; dmi != NULL; dmi = dmi->next)
			dev_mc_add(slave_dev, dmi->dmi_addr, dmi->dmi_addrlen, 0);
	}
	/*
	 * queue to the end of the slaves list, make the first element its
...
...
@@ -799,6 +1033,56 @@ static int bond_enslave(struct net_device *master_dev,
	new_slave->delay = 0;
	new_slave->link_failure_count = 0;
	if (miimon > 0) {
		/* if the network driver for the slave does not support
		 * ETHTOOL/MII link status reporting, warn the user of this
		 */
		if ((ioctl = slave_dev->do_ioctl) != NULL) {
			etool.cmd = ETHTOOL_GLINK;
			ifr.ifr_data = (char *)&etool;
			if (IOCTL(slave_dev, &ifr, SIOCETHTOOL) == 0) {
				link_reporting = 1;
			} else {
				if (IOCTL(slave_dev, &ifr, SIOCGMIIPHY) == 0) {
					/* Yes, the mii is overlaid on the
					 * ifreq.ifr_ifru
					 */
					((struct mii_ioctl_data *)(&ifr.ifr_data))->reg_num = 1;
					if (IOCTL(slave_dev, &ifr, SIOCGMIIREG) == 0) {
						link_reporting = 1;
					}
				}
			}
		}
		if ((link_reporting == 0) && (arp_interval == 0)) {
			/* miimon is set but a bonded network driver does
			 * not support ETHTOOL/MII and arp_interval is
			 * not set
			 */
			printk(KERN_ERR
			       "bond_enslave(): MII and ETHTOOL support not "
			       "available for interface %s, and "
			       "arp_interval/arp_ip_target module parameters "
			       "not specified, thus bonding will not detect "
			       "link failures! see bonding.txt for details.\n",
			       slave_dev->name);
		} else if (link_reporting == 0) {
			/* unable get link status using mii/ethtool */
			printk(KERN_WARNING
			       "bond_enslave: can't get link status from "
			       "interface %s; the network driver associated "
			       "with this interface does not support "
			       "MII or ETHTOOL link status reporting, thus "
			       "miimon has no effect on this interface.\n",
			       slave_dev->name);
		}
	}
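The block above distinguishes two cases when `miimon` is set but the slave's driver cannot report link state: an error when ARP monitoring is also off, and a warning when ARP monitoring is available as a fallback. The sketch below captures that classification as a pure function; the enum and the name `classify_link_monitoring` are assumptions made for this example.

```c
#include <assert.h>

/* Hypothetical result codes for this sketch. */
enum link_warning {
	WARN_NONE,                 /* nothing to report */
	WARN_MIIMON_INEFFECTIVE,   /* miimon has no effect, arp monitoring is on */
	ERR_NO_FAILURE_DETECTION   /* neither MII/ETHTOOL nor arp monitoring */
};

static enum link_warning classify_link_monitoring(int miimon,
						  int link_reporting,
						  int arp_interval)
{
	assert(link_reporting == 0 || link_reporting == 1);
	if (miimon <= 0 || link_reporting)
		return WARN_NONE;
	if (arp_interval == 0)
		return ERR_NO_FAILURE_DETECTION;
	return WARN_MIIMON_INEFFECTIVE;
}
```

For instance, `classify_link_monitoring(100, 0, 0)` corresponds to the `KERN_ERR` message and `classify_link_monitoring(100, 0, 1000)` to the `KERN_WARNING` one.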
	/* check for initial state */
	if ((miimon <= 0) || (bond_check_dev_link(slave_dev) == BMSR_LSTATUS)) {
...
...
@@ -806,6 +1090,7 @@ static int bond_enslave(struct net_device *master_dev,
		printk(KERN_CRIT "Initial state of slave_dev is BOND_LINK_UP\n");
#endif
		new_slave->link = BOND_LINK_UP;
		new_slave->jiffies = jiffies;
	} else {
#ifdef BONDING_DEBUG
...
...
@@ -832,6 +1117,7 @@ static int bond_enslave(struct net_device *master_dev,
			   is OK, so make this interface the active one */
			bond->current_slave = new_slave;
			bond_set_slave_active_flags(new_slave);
			bond_mc_update(bond, new_slave, NULL);
		} else {
#ifdef BONDING_DEBUG
...
...
@@ -839,16 +1125,25 @@ static int bond_enslave(struct net_device *master_dev,
#endif
			bond_set_slave_inactive_flags(new_slave);
		}
		read_lock_irqsave(&(((struct in_device *)slave_dev->ip_ptr)->lock), rflags);
		ifap = &(((struct in_device *)slave_dev->ip_ptr)->ifa_list);
		ifa = *ifap;
		my_ip = ifa->ifa_address;
		read_unlock_irqrestore(&(((struct in_device *)slave_dev->ip_ptr)->lock), rflags);
		/* if there is a primary slave, remember it */
		if (primary != NULL)
			if (strcmp(primary, new_slave->dev->name) == 0)
				bond->primary_slave = new_slave;
	} else {
#ifdef BONDING_DEBUG
		printk(KERN_CRIT "This slave is always active in trunk mode\n");
#endif
		/* always active in trunk mode */
		new_slave->state = BOND_STATE_ACTIVE;
		if (bond->current_slave == NULL) {
		if (bond->current_slave == NULL)
			bond->current_slave = new_slave;
		}
	}
	update_slave_cnt(bond);
...
...
@@ -938,6 +1233,7 @@ static int bond_change_active(struct net_device *master_dev, struct net_device *
	    IS_UP(newactive->dev)) {
		bond_set_slave_inactive_flags(oldactive);
		bond_set_slave_active_flags(newactive);
		bond_mc_update(bond, newactive, oldactive);
		bond->current_slave = newactive;
		printk("%s : activate %s(old : %s)\n",
		       master_dev->name, newactive->dev->name,
...
...
@@ -978,6 +1274,7 @@ slave_t *change_active_interface(bonding_t *bond)
		newslave = bond->current_slave = bond->next;
		write_unlock(&bond->ptrlock);
	} else {
		printk(" but could not find any %s interface.\n",
		       (mode == BOND_MODE_ACTIVEBACKUP) ? "backup" : "other");
		write_lock(&bond->ptrlock);
...
...
@@ -985,16 +1282,38 @@ slave_t *change_active_interface(bonding_t *bond)
			write_unlock(&bond->ptrlock);
			return NULL; /* still no slave, return NULL */
		}
	} else if (mode == BOND_MODE_ACTIVEBACKUP) {
		/* make sure oldslave doesn't send arps - this could
		 * cause a ping-pong effect between interfaces since they
		 * would be able to tx arps - in active backup only one
		 * slave should be able to tx arps, and that should be
		 * the current_slave; the only exception is when all
		 * slaves have gone down, then only one non-current slave can
		 * send arps at a time; clearing oldslaves' mc list is handled
		 * later in this function.
		 */
		bond_set_slave_inactive_flags(oldslave);
	}
	mintime = updelay;
	/* first try the primary link; if arping, a link must tx/rx traffic
	 * before it can be considered the current_slave - also, we would skip
	 * slaves between the current_slave and primary_slave that may be up
	 * and able to arp
	 */
	if ((bond->primary_slave != NULL) && (arp_interval == 0)) {
		if (IS_UP(bond->primary_slave->dev))
			newslave = bond->primary_slave;
	}
	do {
		if (IS_UP(newslave->dev)) {
			if (newslave->link == BOND_LINK_UP) {
				/* this one is immediately usable */
				if (mode == BOND_MODE_ACTIVEBACKUP) {
					bond_set_slave_active_flags(newslave);
					bond_mc_update(bond, newslave, oldslave);
					printk(" and making interface %s the active one.\n",
					       newslave->dev->name);
				}
...
...
@@ -1030,14 +1349,30 @@ slave_t *change_active_interface(bonding_t *bond)
		bestslave->delay = 0;
		bestslave->link = BOND_LINK_UP;
		bestslave->jiffies = jiffies;
		bond_set_slave_active_flags(bestslave);
		bond_mc_update(bond, bestslave, oldslave);
		write_lock(&bond->ptrlock);
		bond->current_slave = bestslave;
		write_unlock(&bond->ptrlock);
		return bestslave;
	}
	if ((mode == BOND_MODE_ACTIVEBACKUP) &&
	    (multicast == BOND_MULTICAST_ACTIVE) &&
	    (oldslave != NULL)) {
		/* flush bonds (master's) mc_list from oldslave since it wasn't
		 * updated (and deleted) above
		 */
		bond_mc_list_flush(oldslave->dev, bond->device);
		if (bond->device->flags & IFF_PROMISC) {
			dev_set_promiscuity(oldslave->dev, -1);
		}
		if (bond->device->flags & IFF_ALLMULTI) {
			dev_set_allmulti(oldslave->dev, -1);
		}
	}
	printk(" but could not find any %s interface.\n",
	       (mode == BOND_MODE_ACTIVEBACKUP) ? "backup" : "other");
...
...
@@ -1081,6 +1416,7 @@ static int bond_release(struct net_device *master, struct net_device *slave)
		return -EINVAL;
	}
	bond->current_arp_slave = NULL;
	our_slave = (slave_t *)bond;
	old_current = bond->current_slave;
	while ((our_slave = our_slave->prev) != (slave_t *)bond) {
...
...
@@ -1101,6 +1437,7 @@ static int bond_release(struct net_device *master, struct net_device *slave)
			/* release the slave from its bond */
			if (multicast == BOND_MULTICAST_ALL) {
				/* flush master's mc_list from slave */
				bond_mc_list_flush(slave, master);
...
...
@@ -1111,6 +1448,7 @@ static int bond_release(struct net_device *master, struct net_device *slave)
				/* unset allmulti level from slave */
				if (master->flags & IFF_ALLMULTI)
					dev_set_allmulti(slave, -1);
			}
			netdev_set_master(slave, NULL);
...
...
@@ -1122,6 +1460,7 @@ static int bond_release(struct net_device *master, struct net_device *slave)
			if (slave->flags & IFF_NOARP ||
			    bond->current_slave != NULL) {
				dev_close(slave);
				our_slave->original_flags &= ~IFF_UP;
			}
			bond_restore_slave_flags(our_slave);
...
...
@@ -1135,6 +1474,10 @@ static int bond_release(struct net_device *master, struct net_device *slave)
			update_slave_cnt(bond);
			if (bond->primary_slave == our_slave) {
				bond->primary_slave = NULL;
			}
			write_unlock_irqrestore(&bond->lock, flags);
			return 0; /* deletion OK */
		}
...
...
@@ -1166,12 +1509,28 @@ static int bond_release_all(struct net_device *master)
	}
	bond = (struct bonding *)master->priv;
	bond->current_slave = NULL;
	bond->current_arp_slave = NULL;
	while ((our_slave = bond->prev) != (slave_t *)bond) {
		slave_dev = our_slave->dev;
		bond->prev = our_slave->prev;
		if (multicast == BOND_MULTICAST_ALL ||
		    (multicast == BOND_MULTICAST_ACTIVE &&
		     bond->current_slave == our_slave)) {
			/* flush master's mc_list from slave */
			bond_mc_list_flush(slave_dev, master);
			/* unset promiscuity level from slave */
			if (master->flags & IFF_PROMISC)
				dev_set_promiscuity(slave_dev, -1);
			/* unset allmulti level from slave */
			if (master->flags & IFF_ALLMULTI)
				dev_set_allmulti(slave_dev, -1);
		}
		kfree(our_slave);
		netdev_set_master(slave_dev, NULL);
...
...
@@ -1183,9 +1542,12 @@ static int bond_release_all(struct net_device *master)
		if (slave_dev->flags & IFF_NOARP)
			dev_close(slave_dev);
	}
	bond->current_slave = NULL;
	bond->next = (slave_t *)bond;
	bond->slave_cnt = 0;
	printk(KERN_INFO "%s: releases all slaves\n", master->name);
	bond->primary_slave = NULL;
	printk(KERN_INFO "%s: released all slaves\n", master->name);
	return 0;
}
...
...
@@ -1291,6 +1653,7 @@ static void bond_mii_monitor(struct net_device *master)
			} else {
				/* link up again */
				slave->link = BOND_LINK_UP;
				slave->jiffies = jiffies;
				printk(KERN_INFO
				       "%s: link status up again after %d ms "
				       "for interface %s.\n",
...
...
@@ -1343,8 +1706,10 @@ static void bond_mii_monitor(struct net_device *master)
			if (slave->delay == 0) {
				/* now the link has been up for long time enough */
				slave->link = BOND_LINK_UP;
				slave->jiffies = jiffies;
				if (mode == BOND_MODE_ACTIVEBACKUP) {
				if ((mode == BOND_MODE_ACTIVEBACKUP) ||
				    (slave != bond->primary_slave)) {
					/* prevent it from being the active one */
					slave->state = BOND_STATE_BACKUP;
				}
...
...
@@ -1358,15 +1723,26 @@ static void bond_mii_monitor(struct net_device *master)
"for interface %s.
\n
"
,
				       master->name, dev->name);
				if ((bond->primary_slave != NULL) &&
				    (slave == bond->primary_slave))
					change_active_interface(bond);
			} else
				slave->delay--;
			/* we'll also look for the mostly eligible slave */
			if (bond->primary_slave == NULL) {
				if (IS_UP(dev) && (slave->delay < mindelay)) {
					mindelay = slave->delay;
					bestslave = slave;
				}
			} else if ((IS_UP(bond->primary_slave->dev)) ||
				   ((!IS_UP(bond->primary_slave->dev)) &&
				    (IS_UP(dev) && (slave->delay < mindelay)))) {
				mindelay = slave->delay;
				bestslave = slave;
			}
			}
			break;
		} /* end of switch */
...
...
@@ -1380,6 +1756,7 @@ static void bond_mii_monitor(struct net_device *master)
	oldcurrent = bond->current_slave;
	read_unlock(&bond->ptrlock);
	/* no active interface at the moment or need to bring up the primary */
	if (oldcurrent == NULL) { /* no active interface at the moment */
		if (bestslave != NULL) { /* last chance to find one ? */
			if (bestslave->link == BOND_LINK_UP) {
...
...
@@ -1395,10 +1772,12 @@ static void bond_mii_monitor(struct net_device *master)
				bestslave->delay = 0;
				bestslave->link = BOND_LINK_UP;
				bestslave->jiffies = jiffies;
			}
			if (mode == BOND_MODE_ACTIVEBACKUP) {
				bond_set_slave_active_flags(bestslave);
				bond_mc_update(bond, bestslave, NULL);
			} else {
				bestslave->state = BOND_STATE_ACTIVE;
			}
...
...
@@ -1420,10 +1799,12 @@ static void bond_mii_monitor(struct net_device *master)
/*
 * this function is called regularly to monitor each slave's link
 * insuring that traffic is being sent and received. If the adapter
 * has been dormant, then an arp is transmitted to generate traffic
 * ensuring that traffic is being sent and received when arp monitoring
 * is used in load-balancing mode. if the adapter has been dormant, then an
 * arp is transmitted to generate traffic. see activebackup_arp_monitor for
 * arp monitoring in active backup mode.
 */
static void bond_arp_monitor(struct net_device *master)
static void loadbalance_arp_monitor(struct net_device *master)
{
	bonding_t *bond;
	unsigned long flags;
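The monitor described in the comment above treats a slave as alive when it has both sent and received traffic recently. A minimal sketch of such a test follows, assuming tick counters stored as `unsigned long`; `slave_has_traffic` and its parameter names are hypothetical, and `delta` stands for the allowed age in ticks.

```c
#include <assert.h>

/* A slave counts as having traffic when both its last transmit and its
 * last receive happened within delta ticks of now. Unsigned subtraction
 * keeps the comparison valid when the tick counter wraps around. */
static int slave_has_traffic(unsigned long now, unsigned long trans_start,
			     unsigned long last_rx, unsigned long delta)
{
	return ((now - trans_start) <= delta) && ((now - last_rx) <= delta);
}
```

Requiring both directions means a slave that can transmit but never hears a reply is not considered up.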
...
...
@@ -1439,147 +1820,358 @@ static void bond_arp_monitor(struct net_device *master)
	read_lock_irqsave(&bond->lock, flags);
	if (!IS_UP(master)) {
	/* TODO: investigate why rtnl_shlock_nowait and rtnl_exlock_nowait
	 * are called below and add comment why they are required...
	 */
	if ((!IS_UP(master)) || rtnl_shlock_nowait()) {
		mod_timer(&bond->arp_timer, next_timer);
		goto arp_monitor_out;
	}
	if (rtnl_shlock_nowait()) {
		goto arp_monitor_out;
		read_unlock_irqrestore(&bond->lock, flags);
		return;
	}
	if (rtnl_exlock_nowait()) {
		rtnl_shunlock();
		goto arp_monitor_out;
		mod_timer(&bond->arp_timer, next_timer);
		read_unlock_irqrestore(&bond->lock, flags);
		return;
	}
	/* see if any of the previous devices are up now (i.e. they have seen a
	 * response from an arp request sent by another adapter, since they
	 * have the same hardware address).
	/* see if any of the previous devices are up now (i.e. they have
	 * xmt and rcv traffic). the current_slave does not come into
	 * the picture unless it is null. also, slave->jiffies is not needed
	 * here because we send an arp on each slave and give a slave as
	 * long as it needs to get the tx/rx within the delta.
	 * TODO: what about up/down delay in arp mode? it wasn't here before
	 * so it can wait
	 */
	slave = (slave_t *)bond;
	while ((slave = slave->prev) != (slave_t *)bond) {
		read_lock(&bond->ptrlock);
		if ( (!(slave->link == BOND_LINK_UP)) &&
		     (slave != bond->current_slave) ) {
		if (slave->link != BOND_LINK_UP) {
			read_unlock(&bond->ptrlock);
			if ( ((jiffies - slave->dev->trans_start) <=
			if (((jiffies - slave->dev->trans_start) <=
			     the_delta_in_ticks) &&
			    ((jiffies - slave->dev->last_rx) <=
			     the_delta_in_ticks) ) {
			     the_delta_in_ticks)) {
				slave->link = BOND_LINK_UP;
				write_lock(&bond->ptrlock);
				if (bond->current_slave == NULL) {
				slave->state = BOND_STATE_ACTIVE;
				bond->current_slave = slave;
				/* primary_slave has no meaning in round-robin
				 * mode. the window of a slave being up and
				 * current_slave being null after enslaving
				 * is closed.
				 */
				read_lock(&bond->ptrlock);
				if (bond->current_slave == NULL) {
					read_unlock(&bond->ptrlock);
					printk(KERN_INFO
					       "%s: link status definitely up "
					       "for interface %s, ",
					       master->name, slave->dev->name);
					change_active_interface(bond);
				} else {
					read_unlock(&bond->ptrlock);
					printk(KERN_INFO
					       "%s: interface %s is now up\n",
					       master->name, slave->dev->name);
				}
				if (slave != bond->current_slave) {
					slave->dev->flags |= IFF_NOARP;
}
write_unlock
(
&
bond
->
ptrlock
);
}
else
{
/* slave->link == BOND_LINK_UP */
/* not all switches will respond to an arp request
* when the source ip is 0, so don't take the link down
* if we don't know our ip yet
*/
if
(((
jiffies
-
slave
->
dev
->
trans_start
)
>=
(
2
*
the_delta_in_ticks
))
||
(((
jiffies
-
slave
->
dev
->
last_rx
)
>=
(
2
*
the_delta_in_ticks
))
&&
my_ip
!=
0
))
{
slave
->
link
=
BOND_LINK_DOWN
;
slave
->
state
=
BOND_STATE_BACKUP
;
if
(
slave
->
link_failure_count
<
UINT_MAX
)
{
slave
->
link_failure_count
++
;
}
printk
(
KERN_INFO
"%s: interface %s is now down.
\n
"
,
master
->
name
,
slave
->
dev
->
name
);
read_lock
(
&
bond
->
ptrlock
);
if
(
slave
==
bond
->
current_slave
)
{
read_unlock
(
&
bond
->
ptrlock
);
change_active_interface
(
bond
);
}
else
{
read_unlock
(
&
bond
->
ptrlock
);
}
}
}
/* note: if switch is in round-robin mode, all links
* must tx arp to ensure all links rx an arp - otherwise
* links may oscillate or not come up at all; if switch is
* in something like xor mode, there is nothing we can
* do - all replies will be rx'ed on same link causing slaves
* to be unstable during low/no traffic periods
*/
if
(
IS_UP
(
slave
->
dev
))
{
arp_send_all
(
slave
);
}
}
rtnl_exunlock
();
rtnl_shunlock
();
read_unlock_irqrestore
(
&
bond
->
lock
,
flags
);
/* re-arm the timer */
mod_timer
(
&
bond
->
arp_timer
,
next_timer
);
}
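The up/down thresholds used by the load-balancing ARP monitor above can be summarized in a small user-space sketch. This is not the driver code: `enum link_state`, `delta_in_ticks()` and `next_link_state()` are hypothetical names introduced here for illustration, the tick counters stand in for `jiffies`, `trans_start` and `last_rx`, and `have_ip` stands in for the `my_ip != 0` test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for BOND_LINK_UP / BOND_LINK_DOWN. */
enum link_state { LINK_UP, LINK_DOWN };

/* Convert the ARP interval (milliseconds) to timer ticks, mirroring the
 * driver's arp_interval * HZ / 1000 computation. */
unsigned long delta_in_ticks(unsigned long arp_interval_ms, unsigned long hz)
{
	assert(hz > 0);
	return arp_interval_ms * hz / 1000;
}

/* Decide the next link state of one slave from the age of its last
 * transmit and receive. Unsigned subtraction keeps the ages correct
 * across counter wrap-around. */
enum link_state next_link_state(enum link_state cur, unsigned long now,
				unsigned long last_tx, unsigned long last_rx,
				unsigned long delta, bool have_ip)
{
	unsigned long tx_age = now - last_tx;
	unsigned long rx_age = now - last_rx;

	if (cur != LINK_UP) {
		/* A down link comes up only after both tx and rx traffic
		 * were seen within one delta. */
		if (tx_age <= delta && rx_age <= delta)
			return LINK_UP;
		return cur;
	}

	/* An up link goes down when tx is stale for two deltas, or when rx
	 * is stale for two deltas and our own IP address is known. */
	if (tx_age >= 2 * delta || (rx_age >= 2 * delta && have_ip))
		return LINK_DOWN;
	return LINK_UP;
}
```

The asymmetry (one delta to come up, two to go down) gives a slave a full extra interval to exchange an ARP before it is declared dead.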
/*
* When using arp monitoring in active-backup mode, this function is
* called to determine if any backup slaves have went down or a new
* current slave needs to be found.
* The backup slaves never generate traffic, they are considered up by merely
* receiving traffic. If the current slave goes down, each backup slave will
* be given the opportunity to tx/rx an arp before being taken down - this
* prevents all slaves from being taken down due to the current slave not
* sending any traffic for the backups to receive. The arps are not necessarily
* necessary, any tx and rx traffic will keep the current slave up. While any
* rx traffic will keep the backup slaves up, the current slave is responsible
* for generating traffic to keep them up regardless of any other traffic they
* may have received.
* see loadbalance_arp_monitor for arp monitoring in load balancing mode
*/
static void activebackup_arp_monitor(struct net_device *master)
{
	bonding_t *bond;
	unsigned long flags;
	slave_t *slave;
	int the_delta_in_ticks = arp_interval * HZ / 1000;
	int next_timer = jiffies + (arp_interval * HZ / 1000);

	bond = (struct bonding *) master->priv;
	if (master->priv == NULL) {
		mod_timer(&bond->arp_timer, next_timer);
		return;
	}

	read_lock_irqsave(&bond->lock, flags);

	if (!IS_UP(master)) {
		mod_timer(&bond->arp_timer, next_timer);
		read_unlock_irqrestore(&bond->lock, flags);
		return;
	}
/* determine if any slave has come up or any backup slave has
* gone down
* TODO: what about up/down delay in arp mode? it wasn't here before
* so it can wait
*/
slave
=
(
slave_t
*
)
bond
;
while
((
slave
=
slave
->
prev
)
!=
(
slave_t
*
)
bond
)
{
if
(
slave
->
link
!=
BOND_LINK_UP
)
{
if
((
jiffies
-
slave
->
dev
->
last_rx
)
<=
the_delta_in_ticks
)
{
arp_send
(
ARPOP_REQUEST
,
ETH_P_ARP
,
arp_target
,
slave
->
dev
,
my_ip
,
arp_target_hw_addr
,
slave
->
dev
->
dev_addr
,
arp_target_hw_addr
);
slave
->
link
=
BOND_LINK_UP
;
write_lock
(
&
bond
->
ptrlock
);
if
((
bond
->
current_slave
==
NULL
)
&&
((
jiffies
-
slave
->
dev
->
trans_start
)
<=
the_delta_in_ticks
))
{
bond
->
current_slave
=
slave
;
bond_set_slave_active_flags
(
slave
);
bond_mc_update
(
bond
,
slave
,
NULL
);
bond
->
current_arp_slave
=
NULL
;
}
else
if
(
bond
->
current_slave
!=
slave
)
{
/* this slave has just come up but we
* already have a current slave; this
* can also happen if bond_enslave adds
* a new slave that is up while we are
* searching for a new slave
*/
bond_set_slave_inactive_flags
(
slave
);
bond
->
current_arp_slave
=
NULL
;
}
if
(
slave
==
bond
->
current_slave
)
{
printk
(
KERN_INFO
"%s: %s is up and now the "
"active interface
\n
"
,
master
->
name
,
slave
->
dev
->
name
);
}
else
{
printk
(
KERN_INFO
"%s: backup interface %s is "
"now up
\n
"
,
master
->
name
,
slave
->
dev
->
name
);
}
write_unlock
(
&
bond
->
ptrlock
);
}
}
else
}
else
{
read_lock
(
&
bond
->
ptrlock
);
if
((
slave
!=
bond
->
current_slave
)
&&
(
bond
->
current_arp_slave
==
NULL
)
&&
(((
jiffies
-
slave
->
dev
->
last_rx
)
>=
3
*
the_delta_in_ticks
)
&&
(
my_ip
!=
0
)))
{
/* a backup slave has gone down; three times
* the delta allows the current slave to be
* taken out before the backup slave.
* note: a non-null current_arp_slave indicates
* the current_slave went down and we are
* searching for a new one; under this
* condition we only take the current_slave
* down - this gives each slave a chance to
* tx/rx traffic before being taken out
*/
read_unlock
(
&
bond
->
ptrlock
);
slave
->
link
=
BOND_LINK_DOWN
;
if
(
slave
->
link_failure_count
<
UINT_MAX
)
{
slave
->
link_failure_count
++
;
}
bond_set_slave_inactive_flags
(
slave
);
printk
(
KERN_INFO
"%s: backup interface %s is now down
\n
"
,
master
->
name
,
slave
->
dev
->
name
);
}
else
{
read_unlock
(
&
bond
->
ptrlock
);
}
}
}
read_lock
(
&
bond
->
ptrlock
);
slave
=
bond
->
current_slave
;
read_unlock
(
&
bond
->
ptrlock
);
if
(
slave
!=
0
)
{
if
(
slave
!=
NULL
)
{
/* see if you need to take down the current_slave, since
* you haven't seen an arp in 2*arp_intervals
/* if we have sent traffic in the past 2*arp_intervals but
* haven't xmit and rx traffic in that time interval, select
* a different slave. slave->jiffies is only updated when
* a slave first becomes the current_slave - not necessarily
* after every arp; this ensures the slave has a full 2*delta
* before being taken out. if a primary is being used, check
* if it is up and needs to take over as the current_slave
*/
if
(
((
jiffies
-
slave
->
dev
->
trans_start
)
>=
if
((((
jiffies
-
slave
->
dev
->
trans_start
)
>=
(
2
*
the_delta_in_ticks
))
||
((
jiffies
-
slave
->
dev
->
last_rx
)
>=
(
2
*
the_delta_in_ticks
))
)
{
(((
jiffies
-
slave
->
dev
->
last_rx
)
>=
(
2
*
the_delta_in_ticks
))
&&
(
my_ip
!=
0
)))
&&
((
jiffies
-
slave
->
jiffies
)
>=
2
*
the_delta_in_ticks
))
{
if
(
slave
->
link
==
BOND_LINK_UP
)
{
slave
->
link
=
BOND_LINK_DOWN
;
slave
->
state
=
BOND_STATE_BACKUP
;
/*
* we want to see arps, otherwise we couldn't
* bring the adapter back online...
*/
printk
(
KERN_INFO
"%s: link status definitely "
"down for interface %s, "
"disabling it"
,
slave
->
dev
->
master
->
name
,
if
(
slave
->
link_failure_count
<
UINT_MAX
)
{
slave
->
link_failure_count
++
;
}
printk
(
KERN_INFO
"%s: link status down for "
"active interface %s, disabling it"
,
master
->
name
,
slave
->
dev
->
name
);
/* find a new interface and be verbose */
change_active_interface
(
bond
);
read_lock
(
&
bond
->
ptrlock
);
slave
=
bond
->
current_slave
;
read_unlock
(
&
bond
->
ptrlock
);
slave
=
change_active_interface
(
bond
);
bond
->
current_arp_slave
=
slave
;
if
(
slave
!=
NULL
)
{
slave
->
jiffies
=
jiffies
;
}
}
else
if
((
bond
->
primary_slave
!=
NULL
)
&&
(
bond
->
primary_slave
!=
slave
)
&&
(
bond
->
primary_slave
->
link
==
BOND_LINK_UP
))
{
/* at this point, slave is the current_slave */
printk
(
KERN_INFO
"%s: changing from interface %s to primary "
"interface %s
\n
"
,
master
->
name
,
slave
->
dev
->
name
,
bond
->
primary_slave
->
dev
->
name
);
/* primary is up so switch to it */
bond_set_slave_inactive_flags
(
slave
);
bond_mc_update
(
bond
,
bond
->
primary_slave
,
slave
);
write_lock
(
&
bond
->
ptrlock
);
bond
->
current_slave
=
bond
->
primary_slave
;
write_unlock
(
&
bond
->
ptrlock
);
slave
=
bond
->
primary_slave
;
bond_set_slave_active_flags
(
slave
);
slave
->
jiffies
=
jiffies
;
}
else
{
bond
->
current_arp_slave
=
NULL
;
}
/*
* ok, we know up/down, so just send a arp out if there has
* been no activity for a while
*/
if
(
slave
!=
NULL
)
{
if
(
((
jiffies
-
slave
->
dev
->
trans_start
)
>=
the_delta_in_ticks
)
||
((
jiffies
-
slave
->
dev
->
last_rx
)
>=
the_delta_in_ticks
)
)
{
arp_send
(
ARPOP_REQUEST
,
ETH_P_ARP
,
arp_target
,
slave
->
dev
,
my_ip
,
arp_target_hw_addr
,
slave
->
dev
->
dev_addr
,
arp_target_hw_addr
);
/* the current slave must tx an arp to ensure backup slaves
* rx traffic
*/
if
((
slave
!=
NULL
)
&&
(((
jiffies
-
slave
->
dev
->
last_rx
)
>=
the_delta_in_ticks
)
&&
(
my_ip
!=
0
)))
{
arp_send_all
(
slave
);
}
}
/* if we don't have a current_slave, search for the next available
* backup slave from the current_arp_slave and make it the candidate
* for becoming the current_slave
*/
if
(
slave
==
NULL
)
{
if
((
bond
->
current_arp_slave
==
NULL
)
||
(
bond
->
current_arp_slave
==
(
slave_t
*
)
bond
))
{
bond
->
current_arp_slave
=
bond
->
prev
;
}
if
(
bond
->
current_arp_slave
!=
(
slave_t
*
)
bond
)
{
bond_set_slave_inactive_flags
(
bond
->
current_arp_slave
);
slave
=
bond
->
current_arp_slave
->
next
;
/* search for next candidate */
do
{
if
(
IS_UP
(
slave
->
dev
))
{
slave
->
link
=
BOND_LINK_BACK
;
bond_set_slave_active_flags
(
slave
);
arp_send_all
(
slave
);
slave
->
jiffies
=
jiffies
;
bond
->
current_arp_slave
=
slave
;
break
;
}
/* if we have no current slave.. try sending
* an arp on all of the interfaces
/* if the link state is up at this point, we
* mark it down - this can happen if we have
* simultaneous link failures and
* change_active_interface doesn't make this
* one the current slave so it is still marked
* up when it is actually down
*/
if
(
slave
->
link
==
BOND_LINK_UP
)
{
slave
->
link
=
BOND_LINK_DOWN
;
if
(
slave
->
link_failure_count
<
UINT_MAX
)
{
slave
->
link_failure_count
++
;
}
read_lock
(
&
bond
->
ptrlock
);
if
(
bond
->
current_slave
==
NULL
)
{
read_unlock
(
&
bond
->
ptrlock
);
slave
=
(
slave_t
*
)
bond
;
while
((
slave
=
slave
->
prev
)
!=
(
slave_t
*
)
bond
)
{
arp_send
(
ARPOP_REQUEST
,
ETH_P_ARP
,
arp_target
,
slave
->
dev
,
my_ip
,
arp_target_hw_addr
,
slave
->
dev
->
dev_addr
,
arp_target_hw_addr
);
bond_set_slave_inactive_flags
(
slave
);
printk
(
KERN_INFO
"%s: backup interface "
"%s is now down.
\n
"
,
master
->
name
,
slave
->
dev
->
name
);
}
}
while
((
slave
=
slave
->
next
)
!=
bond
->
current_arp_slave
->
next
);
}
else
{
read_unlock
(
&
bond
->
ptrlock
);
}
rtnl_exunlock
();
rtnl_shunlock
();
arp_monitor_out:
read_unlock_irqrestore
(
&
bond
->
lock
,
flags
);
/* re-arm the timer */
mod_timer
(
&
bond
->
arp_timer
,
next_timer
);
read_unlock_irqrestore
(
&
bond
->
lock
,
flags
);
}
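The timing rules of the active-backup ARP monitor above (a backup is dropped after three deltas without receive traffic, the current slave after two) can be sketched as two predicates. This is an illustrative user-space reduction, not the driver code: `backup_goes_down()` and `current_goes_down()` are hypothetical names, `searching_for_current` models a non-NULL `current_arp_slave`, `time_as_current` models `jiffies - slave->jiffies`, and `have_ip` models `my_ip != 0`.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* A backup slave is taken down only when no search for a new current
 * slave is in progress, our IP is known, and nothing was received for
 * three deltas. */
bool backup_goes_down(unsigned long rx_age, unsigned long delta,
		      bool searching_for_current, bool have_ip)
{
	assert(delta <= ULONG_MAX / 3);	/* keep 3 * delta from overflowing */
	return !searching_for_current && have_ip && rx_age >= 3 * delta;
}

/* The current slave is taken down when its traffic is stale for two
 * deltas and it has held the current role for at least two deltas. */
bool current_goes_down(unsigned long tx_age, unsigned long rx_age,
		       unsigned long time_as_current, unsigned long delta,
		       bool have_ip)
{
	assert(delta <= ULONG_MAX / 3);
	bool stale = tx_age >= 2 * delta || (rx_age >= 2 * delta && have_ip);
	return stale && time_as_current >= 2 * delta;
}
```

Because the backup threshold is larger than the current-slave threshold, a failing current slave is removed before the backups that depend on its ARP traffic are blamed.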
#define isdigit(c) (c >= '0' && c <= '9')
__inline static int atoi(char **s)
{
...
...
@@ -1720,7 +2312,7 @@ static int bond_slave_info_query(struct net_device *master,
}
	read_unlock_irqrestore(&bond->lock, flags);

	if (cur_ndx == info->slave_id) {
	if (slave != (slave_t *)bond) {
		strcpy(info->slave_name, slave->dev->name);
		info->link = slave->link;
		info->state = slave->state;
...
...
@@ -1737,7 +2329,7 @@ static int bond_ioctl(struct net_device *master_dev, struct ifreq *ifr, int cmd)
	struct net_device *slave_dev = NULL;
	struct ifbond *u_binfo = NULL, k_binfo;
	struct ifslave *u_sinfo = NULL, k_sinfo;
	u16 *data = NULL;
	struct mii_ioctl_data *mii = NULL;
	int ret = 0;
#ifdef BONDING_DEBUG
...
...
@@ -1747,23 +2339,23 @@ static int bond_ioctl(struct net_device *master_dev, struct ifreq *ifr, int cmd)
	switch (cmd) {
	case SIOCGMIIPHY:
		data = (u16 *)ifr->ifr_data;
		if (data == NULL) {
		mii = (struct mii_ioctl_data *)&ifr->ifr_data;
		if (mii == NULL) {
			return -EINVAL;
		}
		data[0] = 0;
		mii->phy_id = 0;
		/* Fall Through */
	case SIOCGMIIREG:
		/*
		 * We do this again just in case we were called by SIOCGMIIREG
		 * instead of SIOCGMIIPHY.
		 */
		data = (u16 *)ifr->ifr_data;
		if (data == NULL) {
		mii = (struct mii_ioctl_data *)&ifr->ifr_data;
		if (mii == NULL) {
			return -EINVAL;
		}
		if (data[1] == 1) {
			data[3] = bond_check_mii_link(
		if (mii->reg_num == 1) {
			mii->val_out = bond_check_mii_link(
				(struct bonding *)master_dev->priv);
		}
		return 0;
...
...
@@ -1846,6 +2438,65 @@ static int bond_accept_fastpath(struct net_device *dev, struct dst_entry *dst)
}
#endif
/*
* in broadcast mode, we send everything to all usable interfaces.
*/
static int bond_xmit_broadcast(struct sk_buff *skb, struct net_device *dev)
{
	slave_t *slave, *start_at;
	struct bonding *bond = (struct bonding *) dev->priv;
	unsigned long flags;
	struct net_device *device_we_should_send_to = 0;

	if (!IS_UP(dev)) { /* bond down */
		dev_kfree_skb(skb);
		return 0;
	}

	read_lock_irqsave(&bond->lock, flags);

	read_lock(&bond->ptrlock);
	slave = start_at = bond->current_slave;
	read_unlock(&bond->ptrlock);

	if (slave == NULL) { /* we're at the root, get the first slave */
		/* no suitable interface, frame not sent */
		read_unlock_irqrestore(&bond->lock, flags);
		dev_kfree_skb(skb);
		return 0;
	}

	do {
		if (IS_UP(slave->dev)
		    && (slave->link == BOND_LINK_UP)
		    && (slave->state == BOND_STATE_ACTIVE)) {
			if (device_we_should_send_to) {
				struct sk_buff *skb2;
				if ((skb2 = skb_clone(skb, GFP_ATOMIC)) == NULL) {
					printk(KERN_ERR "bond_xmit_broadcast: skb_clone() failed\n");
					continue;
				}

				skb2->dev = device_we_should_send_to;
				skb2->priority = 1;
				dev_queue_xmit(skb2);
			}
			device_we_should_send_to = slave->dev;
		}
	} while ((slave = slave->next) != start_at);

	if (device_we_should_send_to) {
		skb->dev = device_we_should_send_to;
		skb->priority = 1;
		dev_queue_xmit(skb);
	} else
		dev_kfree_skb(skb);

	/* frame sent to all suitable interfaces */
	read_unlock_irqrestore(&bond->lock, flags);
	return 0;
}
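The "clone for all but the last" pattern in bond_xmit_broadcast() can be illustrated outside the kernel. The sketch below is a simplified model, assuming a flat array instead of the driver's circular slave list; `struct port` and `broadcast_frame()` are hypothetical names, and a counter increment stands in for `dev_queue_xmit()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a slave device. */
struct port {
	bool usable;		/* models IS_UP && LINK_UP && STATE_ACTIVE */
	int frames_sent;	/* models frames handed to dev_queue_xmit() */
};

/* Deliver one frame to every usable port. The send to a port is deferred
 * until the next usable port is found, so every port but the last gets a
 * clone and the last one gets the original buffer. Returns the number of
 * clones made; *original_sent reports whether the original was consumed
 * (if not, the caller would free it). */
int broadcast_frame(struct port *ports, size_t n, bool *original_sent)
{
	struct port *pending = NULL;
	int clones = 0;
	size_t i;

	assert(ports != NULL || n == 0);
	assert(original_sent != NULL);

	for (i = 0; i < n; i++) {
		if (!ports[i].usable)
			continue;
		if (pending != NULL) {
			pending->frames_sent++;	/* send a clone */
			clones++;
		}
		pending = &ports[i];
	}

	if (pending != NULL)
		pending->frames_sent++;		/* send the original */
	*original_sent = (pending != NULL);
	return clones;
}
```

Deferring each send by one step avoids an unnecessary clone: with k usable ports only k - 1 copies are made.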
static int bond_xmit_roundrobin(struct sk_buff *skb, struct net_device *dev)
{
	slave_t *slave, *start_at;
...
...
@@ -1978,15 +2629,26 @@ static int bond_xmit_activebackup(struct sk_buff *skb, struct net_device *dev)
}
/* if we are sending arp packets and don't know
the target hw address, save it so we don't need
to use a broadcast address */
	if ( (arp_interval > 0) && (arp_target_hw_addr == NULL) &&
	 * the target hw address, save it so we don't need
	 * to use a broadcast address.
	 * don't do this if in active backup mode because the slaves must
	 * receive packets to stay up, and the only ones they receive are
	 * broadcasts.
	 */
	if ( (mode != BOND_MODE_ACTIVEBACKUP) &&
	     (arp_ip_count == 1) &&
	     (arp_interval > 0) && (arp_target_hw_addr == NULL) &&
	     (skb->protocol == __constant_htons(ETH_P_IP) ) ) {
		struct ethhdr *eth_hdr =
			(struct ethhdr *) (((char *)skb->data));
		struct iphdr *ip_hdr = (struct iphdr *)(eth_hdr + 1);

		if (arp_target[0] == ip_hdr->daddr) {
			arp_target_hw_addr = kmalloc(ETH_ALEN, GFP_KERNEL);
			if (arp_target_hw_addr != NULL)
				memcpy(arp_target_hw_addr, eth_hdr->h_dest, ETH_ALEN);
		}
	}

	read_lock_irqsave(&bond->lock, flags);
...
...
@@ -2074,29 +2736,7 @@ static int bond_get_info(char *buf, char **start, off_t offset, int length)
*/
	link = bond_check_mii_link(bond);

	len += sprintf(buf + len, "Bonding Mode: ");

	switch (mode) {
	case BOND_MODE_ACTIVEBACKUP:
		len += sprintf(buf + len, "%s\n", "active-backup");
		break;

	case BOND_MODE_ROUNDROBIN:
		len += sprintf(buf + len, "%s\n", "load balancing (round-robin)");
		break;

	case BOND_MODE_XOR:
		len += sprintf(buf + len, "%s\n", "load balancing (xor)");
		break;

	default:
		len += sprintf(buf + len, "%s\n", "unknown");
		break;
	}
	len += sprintf(buf + len, "Bonding Mode: %s\n", bond_mode());

	if (mode == BOND_MODE_ACTIVEBACKUP) {
		read_lock_irqsave(&bond->lock, flags);
...
...
@@ -2115,8 +2755,11 @@ static int bond_get_info(char *buf, char **start, off_t offset, int length)
			link == BMSR_LSTATUS ? "up\n" : "down\n");
	len += sprintf(buf + len, "MII Polling Interval (ms): %d\n", miimon);
	len += sprintf(buf + len, "Up Delay (ms): %d\n", updelay);
	len += sprintf(buf + len, "Down Delay (ms): %d\n", downdelay);
	len += sprintf(buf + len, "Up Delay (ms): %d\n", updelay * miimon);
	len += sprintf(buf + len, "Down Delay (ms): %d\n", downdelay * miimon);
	len += sprintf(buf + len, "Multicast Mode: %s\n", multicast_mode());

	read_lock_irqsave(&bond->lock, flags);
	for (slave = bond->prev; slave != (slave_t *)bond;
...
...
@@ -2205,6 +2848,7 @@ static struct notifier_block bond_netdev_notifier = {
static int __init bond_init(struct net_device *dev)
{
	bonding_t *bond, *this_bond, *last_bond;
	int count;

#ifdef BONDING_DEBUG
	printk(KERN_INFO "Begin bond_init for %s\n", dev->name);
...
...
@@ -2228,6 +2872,7 @@ static int __init bond_init(struct net_device *dev)
	bond->next = bond->prev = (slave_t *)bond;
	bond->current_slave = NULL;
	bond->current_arp_slave = NULL;
	bond->device = dev;
	dev->priv = bond;
...
...
@@ -2238,6 +2883,8 @@ static int __init bond_init(struct net_device *dev)
		dev->hard_start_xmit = bond_xmit_roundrobin;
	} else if (mode == BOND_MODE_XOR) {
		dev->hard_start_xmit = bond_xmit_xor;
	} else if (mode == BOND_MODE_BROADCAST) {
		dev->hard_start_xmit = bond_xmit_broadcast;
	} else {
		printk(KERN_ERR "Unknown bonding mode %d\n", mode);
		kfree(bond->stats);
...
...
@@ -2272,7 +2919,18 @@ static int __init bond_init(struct net_device *dev)
	} else {
		printk("out MII link monitoring");
	}
	printk(", in %s mode.\n", mode ? "active-backup" : "bonding");
	printk(", in %s mode.\n", bond_mode());

	printk(KERN_INFO "%s registered with", dev->name);
	if (arp_interval > 0) {
		printk(" ARP monitoring set to %d ms with %d target(s):",
		       arp_interval, arp_ip_count);
		for (count = 0; count < arp_ip_count; count++)
			printk(" %s", arp_ip_target[count]);
		printk("\n");
	} else {
		printk("out ARP monitoring\n");
	}

#ifdef CONFIG_PROC_FS
	bond->bond_proc_dir = proc_mkdir(dev->name, proc_net);
...
...
@@ -2329,6 +2987,8 @@ static int __init bonding_init(void)
/* Find a name for this unit */
	static struct net_device *dev_bond = NULL;

	printk(KERN_INFO "%s", version);

	if (max_bonds < 1 || max_bonds > INT_MAX) {
		printk(KERN_WARNING
		       "bonding_init(): max_bonds (%d) not in range %d-%d, "
...
...
@@ -2343,6 +3003,14 @@ static int __init bonding_init(void)
}
	memset(dev_bonds, 0, max_bonds * sizeof(struct net_device));

	if (miimon < 0) {
		printk(KERN_WARNING
		       "bonding_init(): miimon module parameter (%d), "
		       "not in range 0-%d, so it was reset to %d\n",
		       miimon, INT_MAX, BOND_LINK_MON_INTERV);
		miimon = BOND_LINK_MON_INTERV;
	}

	if (updelay < 0) {
		printk(KERN_WARNING
		       "bonding_init(): updelay module parameter (%d), "
...
...
@@ -2359,6 +3027,52 @@ static int __init bonding_init(void)
		downdelay = 0;
	}

	if (miimon == 0) {
		if ((updelay != 0) || (downdelay != 0)) {
			/* just warn the user the up/down delay will have
			 * no effect since miimon is zero...
			 */
			printk(KERN_WARNING
			       "bonding_init(): miimon module parameter not "
			       "set and updelay (%d) or downdelay (%d) module "
			       "parameter is set; updelay and downdelay have "
			       "no effect unless miimon is set\n",
			       updelay, downdelay);
		}
	} else {
		/* don't allow arp monitoring */
		if (arp_interval != 0) {
			printk(KERN_WARNING
			       "bonding_init(): miimon (%d) and arp_interval "
			       "(%d) can't be used simultaneously, "
			       "disabling ARP monitoring\n",
			       miimon, arp_interval);
			arp_interval = 0;
		}

		if ((updelay % miimon) != 0) {
			/* updelay will be rounded in bond_init() when it
			 * is divided by miimon, we just inform user here
			 */
			printk(KERN_WARNING
			       "bonding_init(): updelay (%d) is not a multiple "
			       "of miimon (%d), updelay rounded to %d ms\n",
			       updelay, miimon, (updelay / miimon) * miimon);
		}

		if ((downdelay % miimon) != 0) {
			/* downdelay will be rounded in bond_init() when it
			 * is divided by miimon, we just inform user here
			 */
			printk(KERN_WARNING
			       "bonding_init(): downdelay (%d) is not a "
			       "multiple of miimon (%d), downdelay rounded "
			       "to %d ms\n",
			       downdelay, miimon,
			       (downdelay / miimon) * miimon);
		}
	}

	if (arp_interval < 0) {
		printk(KERN_WARNING
		       "bonding_init(): arp_interval module parameter (%d), "
...
...
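The rounding that the warnings in the hunk above describe is plain integer division: the delay is stored as a whole number of miimon polling intervals, so any remainder is dropped. A minimal sketch, with `delay_in_polls()` and `effective_delay_ms()` as hypothetical helper names that do not appear in the driver:

```c
#include <assert.h>

/* Number of whole MII polling intervals that fit in the requested delay. */
int delay_in_polls(int delay_ms, int miimon_ms)
{
	assert(miimon_ms > 0 && delay_ms >= 0);
	return delay_ms / miimon_ms;
}

/* Delay that actually takes effect, in milliseconds: the requested value
 * rounded down to a multiple of the polling interval, matching the
 * (updelay / miimon) * miimon expression printed in the warning. */
int effective_delay_ms(int delay_ms, int miimon_ms)
{
	return delay_in_polls(delay_ms, miimon_ms) * miimon_ms;
}
```

For example, a requested updelay of 250 ms with miimon set to 100 ms becomes 2 polls, that is 200 ms; a delay shorter than one polling interval rounds down to zero.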
@@ -2367,11 +3081,63 @@ static int __init bonding_init(void)
		arp_interval = BOND_LINK_ARP_INTERV;
	}

	if (arp_ip_target) {
	for (arp_ip_count = 0;
	     (arp_ip_count < MAX_ARP_IP_TARGETS) && arp_ip_target[arp_ip_count];
	     arp_ip_count++) {
		/* TODO: check and log bad ip address */
		if (my_inet_aton(arp_ip_target, &arp_target) == 0) {
		if (my_inet_aton(arp_ip_target[arp_ip_count],
				 &arp_target[arp_ip_count]) == 0) {
			printk(KERN_WARNING
			       "bonding_init(): bad arp_ip_target module "
			       "parameter (%s), ARP monitoring will not be "
			       "performed\n",
			       arp_ip_target[arp_ip_count]);
			arp_interval = 0;
		}
	}

	if ((arp_interval > 0) && (arp_ip_count == 0)) {
		/* don't allow arping if no arp_ip_target given... */
		printk(KERN_WARNING
		       "bonding_init(): arp_interval module parameter "
		       "(%d) specified without providing an arp_ip_target "
		       "parameter, arp_interval was reset to 0\n",
		       arp_interval);
		arp_interval = 0;
	}

	if ((miimon == 0) && (arp_interval == 0)) {
		/* miimon and arp_interval not set, we need one so things
		 * work as expected, see bonding.txt for details
		 */
		printk(KERN_ERR
		       "bonding_init(): either miimon or "
		       "arp_interval and arp_ip_target module parameters "
		       "must be specified, otherwise bonding will not detect "
		       "link failures! see bonding.txt for details.\n");
	}

	if ((primary != NULL) && (mode != BOND_MODE_ACTIVEBACKUP)){
		/* currently, using a primary only makes sence
		 * in active backup mode
		 */
		printk(KERN_WARNING
		       "bonding_init(): %s primary device specified but has "
		       " no effect in %s mode\n",
		       primary, bond_mode());
		primary = NULL;
	}

	if (multicast != BOND_MULTICAST_DISABLED &&
	    multicast != BOND_MULTICAST_ACTIVE &&
	    multicast != BOND_MULTICAST_ALL) {
		printk(KERN_WARNING
		       "bonding_init(): unknown multicast module "
		       "parameter (%d), multicast reset to %d\n",
		       multicast, BOND_MULTICAST_ALL);
		multicast = BOND_MULTICAST_ALL;
	}

	for (no = 0; no < max_bonds; no++) {
...
...
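The target-parsing loop in the hunk above walks a NULL-terminated array of address strings, stops at a fixed cap, and flags ARP monitoring as unusable if any entry fails to parse. The user-space sketch below shows the same shape under stated assumptions: POSIX `inet_pton()` replaces the driver-local `my_inet_aton()`, `SKETCH_MAX_TARGETS` is an illustrative cap (the real `MAX_ARP_IP_TARGETS` value is not shown in this diff), and `parse_targets()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <arpa/inet.h>
#include <netinet/in.h>

#define SKETCH_MAX_TARGETS 16	/* illustrative cap, not taken from the driver */

/* Parse up to SKETCH_MAX_TARGETS dotted-quad strings from a
 * NULL-terminated array into out[]. Returns the number of entries
 * walked; *all_valid is cleared if any entry failed to parse, which
 * corresponds to the driver resetting arp_interval to 0. */
int parse_targets(const char *const *targets, struct in_addr *out,
		  int *all_valid)
{
	int count;

	assert(targets != NULL && out != NULL && all_valid != NULL);

	*all_valid = 1;
	for (count = 0;
	     count < SKETCH_MAX_TARGETS && targets[count] != NULL;
	     count++) {
		if (inet_pton(AF_INET, targets[count], &out[count]) != 1)
			*all_valid = 0;
	}
	return count;
}
```

As in the driver loop, a bad entry does not stop the walk; it only marks the whole configuration as invalid, so the count still reflects every string supplied.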
@@ -2420,6 +3186,7 @@ static void __exit bonding_exit(void)
module_init(bonding_init);
module_exit(bonding_exit);
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION(DRV_DESCRIPTION ", v" DRV_VERSION);
/*
* Local variables:
...
...
drivers/net/pppoe.c
...
...
@@ -5,7 +5,7 @@
* PPPoE --- PPP over Ethernet (RFC 2516)
*
*
* Version: 0.6.11
* Version: 0.7.0
*
* 220102 : Fix module use count on failure in pppoe_create, pppox_sk -acme
* 030700 : Fixed connect logic to allow for disconnect.
...
...
@@ -36,7 +36,8 @@
* from interrupts. Thus, we mark the socket as a ZOMBIE
* and do the unregistration later.
* 081002 : seq_file support for proc stuff -acme
*
* 111602 : Merge all 2.4 fixes into 2.5/2.6 tree. Label 2.5/2.6
* as version 0.7. Spacing cleanup.
* Author: Michal Ostrowski <mostrows@speakeasy.net>
* Contributors:
* Arnaldo Carvalho de Melo <acme@conectiva.com.br>
...
...
@@ -443,8 +444,10 @@ static int pppoe_disc_rcv(struct sk_buff *skb,
* what kind of SKB it is during backlog rcv.
*/
		if (sock_owned_by_user(sk) == 0) {
			/* We're no longer connect at the PPPOE layer,
			 * and must wait for ppp channel to disconnect us.
			 */
			sk->state = PPPOX_ZOMBIE;
			pppox_unbind_sock(sk);
		}

		bh_unlock_sock(sk);
...
...
@@ -583,8 +586,7 @@ int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr,
	if ((sk->state & PPPOX_CONNECTED) && sp->sa_addr.pppoe.sid)
		goto end;

	/* Check for already disconnected sockets,
	   on attempts to disconnect */
	/* Check for already disconnected sockets, on attempts to disconnect */
	error = -EALREADY;
	if ((sk->state & PPPOX_DEAD) && !sp->sa_addr.pppoe.sid)
		goto end;
...
...
@@ -596,6 +598,7 @@ int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr,
/* Delete the old binding */
		delete_item(po->pppoe_pa.sid, po->pppoe_pa.remote);

		if (po->pppoe_dev)
			dev_put(po->pppoe_dev);

		memset(po, 0, sizeof(struct pppox_opt));
...
...
drivers/net/pppox.c
...
...
@@ -5,9 +5,9 @@
* PPPoE --- PPP over Ethernet (RFC 2516)
*
*
* Version: 0.5.1
* Version: 0.5.2
*
* Author: Michal Ostrowski <mostrows@styx.uwaterloo.ca>
* Author: Michal Ostrowski <mostrows@speakeasy.net>
*
* 051000 : Initialization cleanup
*
...
...
@@ -65,9 +65,9 @@ void pppox_unbind_sock(struct sock *sk)
{
/* Clear connection to ppp device, if attached. */
	if (sk->state & PPPOX_BOUND) {
	if (sk->state & (PPPOX_BOUND | PPPOX_ZOMBIE)) {
		ppp_unregister_channel(&pppox_sk(sk)->chan);
		sk->state &= ~PPPOX_BOUND;
		sk->state = PPPOX_DEAD;
}
}
...
...
include/linux/if_bonding.h
...
...
@@ -40,6 +40,7 @@
#define BOND_MODE_ROUNDROBIN 0
#define BOND_MODE_ACTIVEBACKUP 1
#define BOND_MODE_XOR 2
#define BOND_MODE_BROADCAST 3
/* each slave's link has 4 states */
#define BOND_LINK_UP 0 /* link is up and running */
...
...
@@ -74,6 +75,7 @@ typedef struct slave {
	struct slave *prev;
	struct net_device *dev;
	short delay;
	unsigned long jiffies;
	char link;	/* one of BOND_LINK_XXXX */
	char state;	/* one of BOND_STATE_XXXX */
	unsigned short original_flags;
...
...
@@ -93,6 +95,8 @@ typedef struct bonding {
	slave_t *next;
	slave_t *prev;
	slave_t *current_slave;
	slave_t *primary_slave;
	slave_t *current_arp_slave;
	__s32 slave_cnt;
	rwlock_t lock;
	rwlock_t ptrlock;
...
...
include/linux/sysctl.h
...
...
@@ -544,7 +544,8 @@ enum {
	NET_SCTP_PATH_MAX_RETRANS = 8,
	NET_SCTP_MAX_INIT_RETRANSMITS = 9,
	NET_SCTP_HB_INTERVAL = 10,
	NET_SCTP_MAX_BURST = 11,
	NET_SCTP_PRESERVE_ENABLE = 11,
	NET_SCTP_MAX_BURST = 12,
};
/* CTL_PROC names: */
...
...
include/net/dst.h
...
...
@@ -10,6 +10,7 @@
#include <linux/config.h>
#include <linux/rtnetlink.h>
#include <linux/rcupdate.h>
#include <net/neighbour.h>
#include <asm/processor.h>
...
...
@@ -71,6 +72,7 @@ struct dst_entry
#endif
	struct dst_ops *ops;
	struct rcu_head rcu_head;

	char info[0];
};
...
...
include/net/sctp/sctp.h
...
...
@@ -123,8 +123,8 @@ extern sctp_protocol_t sctp_proto;
extern struct sock *sctp_get_ctl_sock(void);
extern int sctp_copy_local_addr_list(sctp_protocol_t *, sctp_bind_addr_t *,
				     sctp_scope_t, int priority, int flags);
extern sctp_pf_t *sctp_get_pf_specific(int family);
extern void sctp_set_pf_specific(int family, sctp_pf_t *);
extern struct sctp_pf *sctp_get_pf_specific(sa_family_t family);
extern int sctp_register_pf(struct sctp_pf *, sa_family_t);
/*
* sctp_socket.c
...
...
include/net/sctp/sm.h
...
...
@@ -140,6 +140,8 @@ sctp_state_fn_t sctp_sf_do_5_2_2_dupinit;
sctp_state_fn_t sctp_sf_do_5_2_4_dupcook;
sctp_state_fn_t sctp_sf_unk_chunk;
sctp_state_fn_t sctp_sf_do_8_5_1_E_sa;
sctp_state_fn_t sctp_sf_cookie_echoed_err;
sctp_state_fn_t sctp_sf_do_5_2_6_stale;
/* Prototypes for primitive event state functions. */
sctp_state_fn_t sctp_sf_do_prm_asoc;
...
...
@@ -175,7 +177,6 @@ sctp_state_fn_t sctp_sf_autoclose_timer_expire;
*/
/* Prototypes for chunk state functions. Not in use. */
sctp_state_fn_t sctp_sf_do_5_2_6_stale;
sctp_state_fn_t sctp_sf_do_9_2_reshutack;
sctp_state_fn_t sctp_sf_do_9_2_reshut;
sctp_state_fn_t sctp_sf_do_9_2_shutack;
...
...
@@ -211,7 +212,7 @@ void sctp_populate_tie_tags(__u8 *cookie, __u32 curTag, __u32 hisTag);
/* Prototypes for chunk-building functions. */
sctp_chunk_t *sctp_make_init(const sctp_association_t *,
			     const sctp_bind_addr_t *,
			     int priority);
			     int priority, int vparam_len);
sctp_chunk_t *sctp_make_init_ack(const sctp_association_t *,
				 const sctp_chunk_t *,
				 const int priority,
...
...
@@ -322,9 +323,15 @@ sctp_pack_cookie(const sctp_endpoint_t *, const sctp_association_t *,
		 const __u8 *, int addrs_len);
sctp_association_t *sctp_unpack_cookie(const sctp_endpoint_t *,
				       const sctp_association_t *,
				       sctp_chunk_t *, int priority, int *err);
				       sctp_chunk_t *, int priority, int *err,
				       sctp_chunk_t **err_chk_p);
int sctp_addip_addr_config(sctp_association_t *, sctp_param_t,
			   struct sockaddr_storage *, int);
void sctp_send_stale_cookie_err(const sctp_endpoint_t *ep,
				const sctp_association_t *asoc,
				const sctp_chunk_t *chunk,
				sctp_cmd_seq_t *commands,
				sctp_chunk_t *err_chunk);
/* 3rd level prototypes */
__u32 sctp_generate_tag(const sctp_endpoint_t *);
...
...
include/net/sctp/structs.h
...
...
@@ -42,6 +42,7 @@
* Sridhar Samudrala <sri@us.ibm.com>
* Daisy Chang <daisyc@us.ibm.com>
* Dajiang Zhang <dajiang.zhang@nokia.com>
* Ardelle Fan <ardelle.fan@intel.com>
*
* Any bugs reported given to us we will try to fix... any fixes shared will
* be incorporated into the next SCTP release.
...
...
@@ -183,6 +184,9 @@ struct SCTP_protocol {
/* Valid.Cookie.Life - 60 seconds */
	int valid_cookie_life;

	/* Whether Cookie Preservative is enabled(1) or not(0) */
	int cookie_preserve_enable;
/* Association.Max.Retrans - 10 attempts
* Path.Max.Retrans - 5 attempts (per destination address)
* Max.Init.Retransmits - 8 attempts
...
...
@@ -234,7 +238,7 @@ struct SCTP_protocol {
* Pointers to address related SCTP functions.
* (i.e. things that depend on the address family.)
*/
typedef struct sctp_func {
struct sctp_af {
	int (*queue_xmit) (struct sk_buff *skb);
	int (*setsockopt) (struct sock *sk,
			   int level,
...
@@ -259,27 +263,34 @@ typedef struct sctp_func {
	void (*from_skb) (union sctp_addr *,
			  struct sk_buff *skb,
			  int saddr);
	void (*from_sk) (union sctp_addr *,
			 struct sock *sk);
	void (*to_sk) (union sctp_addr *,
		       struct sock *sk);
	int (*addr_valid) (union sctp_addr *);
	sctp_scope_t (*scope) (union sctp_addr *);
	void (*inaddr_any) (union sctp_addr *, unsigned short);
	int (*is_any) (const union sctp_addr *);
	int (*available) (const union sctp_addr *);
	__u16 net_header_len;
	int sockaddr_len;
	sa_family_t sa_family;
	struct list_head list;
} sctp_func_t;
};

sctp_func_t *sctp_get_af_specific(sa_family_t);
struct sctp_af *sctp_get_af_specific(sa_family_t);
int sctp_register_af(struct sctp_af *);

/* Protocol family functions. */
typedef struct sctp_pf {
	void (*event_msgname)(sctp_ulpevent_t *, char *, int *);
	void (*skb_msgname)(struct sk_buff *, char *, int *);
	int (*af_supported)(sa_family_t);
	void (*skb_msgname) (struct sk_buff *, char *, int *);
	int (*af_supported) (sa_family_t);
	int (*cmp_addr) (const union sctp_addr *,
			 const union sctp_addr *,
			 struct sctp_opt *);
	struct sctp_func *af;
	int (*bind_verify) (struct sctp_opt *, union sctp_addr *);
	struct sctp_af *af;
} sctp_pf_t;
/* SCTP Socket type: UDP or TCP style. */
...
...
@@ -623,7 +634,7 @@ struct SCTP_transport {
 	union sctp_addr ipaddr;
 	/* These are the functions we call to handle LLP stuff. */
-	sctp_func_t *af_specific;
+	struct sctp_af *af_specific;
 	/* Which association do we belong to? */
 	sctp_association_t *asoc;
...
...
@@ -1271,7 +1282,6 @@ struct SCTP_association {
/* The cookie life I award for any cookie. */
 	struct timeval cookie_life;
-	__u32 cookie_preserve;
/* Overall : The overall association error count.
* Error Count : [Clear this any time I get something.]
...
...
@@ -1350,6 +1360,9 @@ struct SCTP_association {
*/
 	__u32 rwnd;
+	/* This is the last advertised value of rwnd over a SACK chunk. */
+	__u32 a_rwnd;
/* Number of bytes by which the rwnd has slopped. The rwnd is allowed
* to slop over a maximum of the association's frag_point.
*/
...
...
net/core/netfilter.c
...
...
@@ -574,6 +574,14 @@ void nf_reinject(struct sk_buff *skb, struct nf_info *info,
/* Release those devices we held, or Alexey will kill me. */
 	if (info->indev) dev_put(info->indev);
 	if (info->outdev) dev_put(info->outdev);
+#if defined(CONFIG_BRIDGE) || defined(CONFIG_BRIDGE_MODULE)
+	if (skb->nf_bridge) {
+		if (skb->nf_bridge->physindev)
+			dev_put(skb->nf_bridge->physindev);
+		if (skb->nf_bridge->physoutdev)
+			dev_put(skb->nf_bridge->physoutdev);
+	}
+#endif
 	kfree(info);
 	return;
...
...
net/core/pktgen.c
...
...
@@ -207,13 +207,13 @@ static struct pktgen_info pginfos[MAX_PKTGEN];
 /** Convert to miliseconds */
-inline __u64 tv_to_ms(const struct timeval* tv) {
+static inline __u64 tv_to_ms(const struct timeval* tv) {
 	__u64 ms = tv->tv_usec / 1000;
 	ms += (__u64)tv->tv_sec * (__u64)1000;
 	return ms;
 }
-inline __u64 getCurMs(void) {
+static inline __u64 getCurMs(void) {
 	struct timeval tv;
 	do_gettimeofday(&tv);
 	return tv_to_ms(&tv);
...
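The conversion in `tv_to_ms()` above can be tried outside the kernel. The following is a minimal userspace sketch, assuming `uint64_t` in place of `__u64`; the name `tv_to_ms_sketch` is hypothetical and not part of pktgen.

```c
#include <stdint.h>
#include <sys/time.h>

/* Userspace analogue of tv_to_ms(): whole milliseconds held in a timeval.
 * The seconds field is widened to 64 bits before the multiply so the
 * product cannot wrap on platforms where time_t is 32 bits wide.
 */
static inline uint64_t tv_to_ms_sketch(const struct timeval *tv)
{
	uint64_t ms = (uint64_t)tv->tv_usec / 1000;	/* usec -> ms, truncating */

	ms += (uint64_t)tv->tv_sec * (uint64_t)1000;	/* sec -> ms */
	return ms;
}
```

Marking such a helper `static`, as this commit does, keeps the symbol local to the file, so it cannot collide with an identically named function elsewhere in the build.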
...
@@ -1277,7 +1277,7 @@ static int proc_write(struct file *file, const char *user_buffer,
 }
-int create_proc_dir(void)
+static int create_proc_dir(void)
 {
 	int len;
 	/* does proc_dir already exists */
...
...
@@ -1295,7 +1295,7 @@ int create_proc_dir(void)
 	return 1;
 }
-int remove_proc_dir(void)
+static int remove_proc_dir(void)
 {
 	remove_proc_entry(PG_PROC_DIR, proc_net);
 	return 1;
...
...
net/ipv4/route.c
...
...
@@ -86,6 +86,7 @@
#include <linux/mroute.h>
#include <linux/netfilter_ipv4.h>
#include <linux/random.h>
#include <linux/rcupdate.h>
#include <net/protocol.h>
#include <net/ip.h>
#include <net/route.h>
...
...
@@ -178,7 +179,7 @@ __u8 ip_tos2prio[16] = {
/* The locking scheme is rather straight forward:
*
- * 1) A BH protected rwlocks protect buckets of the central route hash.
+ * 1) Read-Copy Update protects the buckets of the central route hash.
* 2) Only writers remove entries, and they hold the lock
* as they look at rtable reference counts.
* 3) Only readers acquire references to rtable entries,
...
...
@@ -188,7 +189,7 @@ __u8 ip_tos2prio[16] = {
 struct rt_hash_bucket {
 	struct rtable	*chain;
-	rwlock_t	lock;
+	spinlock_t	lock;
 } __attribute__((__aligned__(8)));
 static struct rt_hash_bucket 	*rt_hash_table;
...
...
@@ -220,11 +221,11 @@ static struct rtable *rt_cache_get_first(struct seq_file *seq)
 	struct rt_cache_iter_state *st = seq->private;
 	for (st->bucket = rt_hash_mask; st->bucket >= 0; --st->bucket) {
-		read_lock_bh(&rt_hash_table[st->bucket].lock);
+		rcu_read_lock();
 		r = rt_hash_table[st->bucket].chain;
 		if (r)
 			break;
-		read_unlock_bh(&rt_hash_table[st->bucket].lock);
+		rcu_read_unlock();
 	}
 	return r;
 }
...
...
@@ -233,12 +234,13 @@ static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r)
 {
 	struct rt_cache_iter_state *st = seq->private;
+	read_barrier_depends();
 	r = r->u.rt_next;
 	while (!r) {
-		read_unlock_bh(&rt_hash_table[st->bucket].lock);
+		rcu_read_unlock();
 		if (--st->bucket < 0)
 			break;
-		read_lock_bh(&rt_hash_table[st->bucket].lock);
+		rcu_read_lock();
 		r = rt_hash_table[st->bucket].chain;
 	}
 	return r;
...
...
@@ -276,7 +278,7 @@ static void rt_cache_seq_stop(struct seq_file *seq, void *v)
 	if (v && v != (void *)1) {
 		struct rt_cache_iter_state *st = seq->private;
-		read_unlock_bh(&rt_hash_table[st->bucket].lock);
+		rcu_read_unlock();
 	}
 }
...
...
@@ -406,13 +408,13 @@ void __init rt_cache_proc_exit(void)
 static __inline__ void rt_free(struct rtable *rt)
 {
-	dst_free(&rt->u.dst);
+	call_rcu(&rt->u.dst.rcu_head, (void (*)(void *))dst_free, &rt->u.dst);
 }
 static __inline__ void rt_drop(struct rtable *rt)
 {
 	ip_rt_put(rt);
-	dst_free(&rt->u.dst);
+	call_rcu(&rt->u.dst.rcu_head, (void (*)(void *))dst_free, &rt->u.dst);
 }
 static __inline__ int rt_fast_clean(struct rtable *rth)
...
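In the hunk above, `rt_free()` and `rt_drop()` stop freeing a route entry immediately and instead hand `call_rcu()` a head, a callback and an argument, so lockless readers still walking the chain never touch freed memory. The toy model below mirrors only that calling shape in single-threaded userspace. It is not RCU: there is no grace-period detection, and `defer_call`, `run_deferred`, `struct entry` and `unlink_first` are hypothetical names invented for this sketch.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's rcu_head. */
struct deferred_head {
	struct deferred_head *next;
	void (*func)(void *arg);
	void *arg;
};

static struct deferred_head *pending;	/* callbacks not yet run */
static int freed_count;			/* bookkeeping for the demo only */

/* Queue func(arg) for later, the way call_rcu(head, func, arg) is used above. */
static void defer_call(struct deferred_head *head,
		       void (*func)(void *arg), void *arg)
{
	head->func = func;
	head->arg = arg;
	head->next = pending;
	pending = head;
}

/* Stand-in for "the grace period has ended": run every queued callback.
 * Returns how many callbacks were run.
 */
static int run_deferred(void)
{
	int n = 0;

	while (pending) {
		struct deferred_head *h = pending;

		pending = h->next;	/* read next before func may free h */
		h->func(h->arg);
		n++;
	}
	return n;
}

/* A chained entry with an embedded head, loosely like rtable/dst_entry. */
struct entry {
	int key;
	struct entry *next;
	struct deferred_head head;
};

static void entry_free(void *arg)
{
	free(arg);
	freed_count++;
}

/* Unlink the first entry now, release its memory later. */
static void unlink_first(struct entry **chain)
{
	struct entry *e = *chain;

	if (!e)
		return;
	*chain = e->next;
	defer_call(&e->head, entry_free, e);
}
```

After `unlink_first()` the entry is gone from the chain but still valid memory; it is released only when `run_deferred()` runs. That gap is what lets the read side in the later hunks drop the per-bucket lock in favour of `rcu_read_lock()`.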
@@ -465,7 +467,7 @@ static void SMP_TIMER_NAME(rt_check_expire)(unsigned long dummy)
 		i = (i + 1) & rt_hash_mask;
 		rthp = &rt_hash_table[i].chain;
-		write_lock(&rt_hash_table[i].lock);
+		spin_lock(&rt_hash_table[i].lock);
 		while ((rth = *rthp) != NULL) {
 			if (rth->u.dst.expires) {
 				/* Entry is expired even if it is in use */
...
...
@@ -484,7 +486,7 @@ static void SMP_TIMER_NAME(rt_check_expire)(unsigned long dummy)
 			*rthp = rth->u.rt_next;
 			rt_free(rth);
 		}
-		write_unlock(&rt_hash_table[i].lock);
+		spin_unlock(&rt_hash_table[i].lock);
 		/* Fallback loop breaker. */
 		if ((jiffies - now) > 0)
...
...
@@ -507,11 +509,11 @@ static void SMP_TIMER_NAME(rt_run_flush)(unsigned long dummy)
 	rt_deadline = 0;
 	for (i = rt_hash_mask; i >= 0; i--) {
-		write_lock_bh(&rt_hash_table[i].lock);
+		spin_lock_bh(&rt_hash_table[i].lock);
 		rth = rt_hash_table[i].chain;
 		if (rth)
 			rt_hash_table[i].chain = NULL;
-		write_unlock_bh(&rt_hash_table[i].lock);
+		spin_unlock_bh(&rt_hash_table[i].lock);
 		for (; rth; rth = next) {
 			next = rth->u.rt_next;
...
...
@@ -635,7 +637,7 @@ static int rt_garbage_collect(void)
 			k = (k + 1) & rt_hash_mask;
 			rthp = &rt_hash_table[k].chain;
-			write_lock_bh(&rt_hash_table[k].lock);
+			spin_lock_bh(&rt_hash_table[k].lock);
 			while ((rth = *rthp) != NULL) {
 				if (!rt_may_expire(rth, tmo, expire)) {
 					tmo >>= 1;
...
...
@@ -646,7 +648,7 @@ static int rt_garbage_collect(void)
 				rt_free(rth);
 				goal--;
 			}
-			write_unlock_bh(&rt_hash_table[k].lock);
+			spin_unlock_bh(&rt_hash_table[k].lock);
 			if (goal <= 0)
 				break;
 		}
...
...
@@ -714,7 +716,7 @@ static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp)
 restart:
 	rthp = &rt_hash_table[hash].chain;
-	write_lock_bh(&rt_hash_table[hash].lock);
+	spin_lock_bh(&rt_hash_table[hash].lock);
 	while ((rth = *rthp) != NULL) {
 		if (compare_keys(&rth->fl, &rt->fl)) {
 			/* Put it first */
...
...
@@ -725,7 +727,7 @@ static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp)
 			rth->u.dst.__use++;
 			dst_hold(&rth->u.dst);
 			rth->u.dst.lastuse = now;
-			write_unlock_bh(&rt_hash_table[hash].lock);
+			spin_unlock_bh(&rt_hash_table[hash].lock);
 			rt_drop(rt);
 			*rp = rth;
...
...
@@ -741,7 +743,7 @@ static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp)
 	if (rt->rt_type == RTN_UNICAST || rt->fl.iif == 0) {
 		int err = arp_bind_neighbour(&rt->u.dst);
 		if (err) {
-			write_unlock_bh(&rt_hash_table[hash].lock);
+			spin_unlock_bh(&rt_hash_table[hash].lock);
 			if (err != -ENOBUFS) {
 				rt_drop(rt);
...
...
@@ -782,7 +784,7 @@ static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp)
 	}
 #endif
 	rt_hash_table[hash].chain = rt;
-	write_unlock_bh(&rt_hash_table[hash].lock);
+	spin_unlock_bh(&rt_hash_table[hash].lock);
 	*rp = rt;
 	return 0;
 }
...
...
@@ -849,7 +851,7 @@ static void rt_del(unsigned hash, struct rtable *rt)
 {
 	struct rtable **rthp;
-	write_lock_bh(&rt_hash_table[hash].lock);
+	spin_lock_bh(&rt_hash_table[hash].lock);
 	ip_rt_put(rt);
 	for (rthp = &rt_hash_table[hash].chain; *rthp;
 	     rthp = &(*rthp)->u.rt_next)
...
...
@@ -858,7 +860,7 @@ static void rt_del(unsigned hash, struct rtable *rt)
 			rt_free(rt);
 			break;
 		}
-	write_unlock_bh(&rt_hash_table[hash].lock);
+	spin_unlock_bh(&rt_hash_table[hash].lock);
 }
 void ip_rt_redirect(u32 old_gw, u32 daddr, u32 new_gw,
...
...
@@ -897,10 +899,11 @@ void ip_rt_redirect(u32 old_gw, u32 daddr, u32 new_gw,
 			rthp=&rt_hash_table[hash].chain;
-			read_lock(&rt_hash_table[hash].lock);
+			rcu_read_lock();
 			while ((rth = *rthp) != NULL) {
 				struct rtable *rt;
+				read_barrier_depends();
 				if (rth->fl.fl4_dst != daddr ||
 				    rth->fl.fl4_src != skeys[i] ||
 				    rth->fl.fl4_tos != tos ||
...
...
@@ -918,7 +921,7 @@ void ip_rt_redirect(u32 old_gw, u32 daddr, u32 new_gw,
 					break;
 				dst_clone(&rth->u.dst);
-				read_unlock(&rt_hash_table[hash].lock);
+				rcu_read_unlock();
 				rt = dst_alloc(&ipv4_dst_ops);
 				if (rt == NULL) {
...
...
@@ -929,6 +932,7 @@ void ip_rt_redirect(u32 old_gw, u32 daddr, u32 new_gw,
 				/* Copy all the information. */
 				*rt = *rth;
+				INIT_RCU_HEAD(&rt->u.dst.rcu_head);
 				rt->u.dst.__use		= 1;
 				atomic_set(&rt->u.dst.__refcnt, 1);
 				if (rt->u.dst.dev)
...
...
@@ -964,7 +968,7 @@ void ip_rt_redirect(u32 old_gw, u32 daddr, u32 new_gw,
 					ip_rt_put(rt);
 					goto do_next;
 				}
-			read_unlock(&rt_hash_table[hash].lock);
+			rcu_read_unlock();
 		do_next:
 			;
 		}
...
...
@@ -1144,9 +1148,10 @@ unsigned short ip_rt_frag_needed(struct iphdr *iph, unsigned short new_mtu)
 	for (i = 0; i < 2; i++) {
 		unsigned hash = rt_hash_code(daddr, skeys[i], tos);
-		read_lock(&rt_hash_table[hash].lock);
+		rcu_read_lock();
 		for (rth = rt_hash_table[hash].chain; rth;
 		     rth = rth->u.rt_next) {
+			read_barrier_depends();
 			if (rth->fl.fl4_dst == daddr &&
 			    rth->fl.fl4_src == skeys[i] &&
 			    rth->rt_dst == daddr &&
...
...
@@ -1182,7 +1187,7 @@ unsigned short ip_rt_frag_needed(struct iphdr *iph, unsigned short new_mtu)
 				}
 			}
 		}
-		read_unlock(&rt_hash_table[hash].lock);
+		rcu_read_unlock();
 	}
 	return est_mtu ? : new_mtu;
 }
...
...
@@ -1736,8 +1741,9 @@ int ip_route_input(struct sk_buff *skb, u32 daddr, u32 saddr,
 	tos &= IPTOS_RT_MASK;
 	hash = rt_hash_code(daddr, saddr ^ (iif << 5), tos);
-	read_lock(&rt_hash_table[hash].lock);
+	rcu_read_lock();
 	for (rth = rt_hash_table[hash].chain; rth; rth = rth->u.rt_next) {
+		read_barrier_depends();
 		if (rth->fl.fl4_dst == daddr &&
 		    rth->fl.fl4_src == saddr &&
 		    rth->fl.iif == iif &&
...
...
@@ -1750,12 +1756,12 @@ int ip_route_input(struct sk_buff *skb, u32 daddr, u32 saddr,
 			dst_hold(&rth->u.dst);
 			rth->u.dst.__use++;
 			rt_cache_stat[smp_processor_id()].in_hit++;
-			read_unlock(&rt_hash_table[hash].lock);
+			rcu_read_unlock();
 			skb->dst = (struct dst_entry*)rth;
 			return 0;
 		}
 	}
-	read_unlock(&rt_hash_table[hash].lock);
+	rcu_read_unlock();
/* Multicast recognition logic is moved from route cache to here.
The problem was that too many Ethernet cards have broken/missing
...
...
@@ -2100,8 +2106,9 @@ int __ip_route_output_key(struct rtable **rp, const struct flowi *flp)
 	hash = rt_hash_code(flp->fl4_dst, flp->fl4_src ^ (flp->oif << 5), flp->fl4_tos);
-	read_lock_bh(&rt_hash_table[hash].lock);
+	rcu_read_lock();
 	for (rth = rt_hash_table[hash].chain; rth; rth = rth->u.rt_next) {
+		read_barrier_depends();
 		if (rth->fl.fl4_dst == flp->fl4_dst &&
 		    rth->fl.fl4_src == flp->fl4_src &&
 		    rth->fl.iif == 0 &&
...
...
@@ -2115,12 +2122,12 @@ int __ip_route_output_key(struct rtable **rp, const struct flowi *flp)
 			dst_hold(&rth->u.dst);
 			rth->u.dst.__use++;
 			rt_cache_stat[smp_processor_id()].out_hit++;
-			read_unlock_bh(&rt_hash_table[hash].lock);
+			rcu_read_unlock();
 			*rp = rth;
 			return 0;
 		}
 	}
-	read_unlock_bh(&rt_hash_table[hash].lock);
+	rcu_read_unlock();
 	return ip_route_output_slow(rp, flp);
 }
...
...
@@ -2328,9 +2335,10 @@ int ip_rt_dump(struct sk_buff *skb, struct netlink_callback *cb)
 		if (h < s_h) continue;
 		if (h > s_h)
 			s_idx = 0;
-		read_lock_bh(&rt_hash_table[h].lock);
+		rcu_read_lock();
 		for (rt = rt_hash_table[h].chain, idx = 0; rt;
 		     rt = rt->u.rt_next, idx++) {
+			read_barrier_depends();
 			if (idx < s_idx)
 				continue;
 			skb->dst = dst_clone(&rt->u.dst);
...
...
@@ -2338,12 +2346,12 @@ int ip_rt_dump(struct sk_buff *skb, struct netlink_callback *cb)
 					 cb->nlh->nlmsg_seq,
 					 RTM_NEWROUTE, 1) <= 0) {
 				dst_release(xchg(&skb->dst, NULL));
-				read_unlock_bh(&rt_hash_table[h].lock);
+				rcu_read_unlock();
 				goto done;
 			}
 			dst_release(xchg(&skb->dst, NULL));
 		}
-		read_unlock_bh(&rt_hash_table[h].lock);
+		rcu_read_unlock();
 	}
 done:
...
...
@@ -2627,7 +2635,7 @@ int __init ip_rt_init(void)
 	rt_hash_mask--;
 	for (i = 0; i <= rt_hash_mask; i++) {
-		rt_hash_table[i].lock = RW_LOCK_UNLOCKED;
+		rt_hash_table[i].lock = SPIN_LOCK_UNLOCKED;
 		rt_hash_table[i].chain = NULL;
 	}
...
...
net/ipv4/tcp_ipv4.c
...
...
@@ -2236,6 +2236,7 @@ static void *listening_get_next(struct seq_file *seq, void *cur)
goto
get_req
;
}
read_unlock_bh
(
&
tp
->
syn_wait_lock
);
sk
=
sk
->
next
;
}
if
(
++
st
->
bucket
<
TCP_LHTABLE_SIZE
)
{
sk
=
tcp_listening_hash
[
st
->
bucket
];
...
...
net/ipv6/ndisc.c
...
...
@@ -871,6 +871,7 @@ static void ndisc_router_discovery(struct sk_buff *skb)
}
if
(
!
ndisc_parse_options
(
opt
,
optlen
,
&
ndopts
))
{
in6_dev_put
(
in6_dev
);
if
(
net_ratelimit
())
ND_PRINTK2
(
KERN_WARNING
"ICMP6 RA: invalid ND option, ignored.
\n
"
);
...
...
net/key/af_key.c
/*
- * net/key/pfkeyv2.c	An implemenation of PF_KEYv2 sockets.
+ * net/key/af_key.c	An implementation of PF_KEYv2 sockets.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
...
...
net/sctp/associola.c
...
...
@@ -128,8 +128,9 @@ sctp_association_t *sctp_association_init(sctp_association_t *asoc,
 	asoc->state_timestamp = jiffies;
 	/* Set things that have constant value. */
-	asoc->cookie_life.tv_sec = SCTP_DEFAULT_COOKIE_LIFE_SEC;
-	asoc->cookie_life.tv_usec = SCTP_DEFAULT_COOKIE_LIFE_USEC;
+	asoc->cookie_life.tv_sec = sctp_proto.valid_cookie_life / HZ;
+	asoc->cookie_life.tv_usec = (sctp_proto.valid_cookie_life % HZ) *
+					1000000L / HZ;
 	asoc->pmtu = 0;
 	asoc->frag_point = 0;
...
...
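The new `cookie_life` assignment in the hunk above splits a tick count into whole seconds and leftover microseconds. The sketch below reproduces that arithmetic in userspace. It assumes `valid_cookie_life` is held in timer ticks (the division by `HZ` suggests this) and assumes a tick rate of 100 per second; `SKETCH_HZ` and `ticks_to_timeval` are hypothetical names.

```c
#include <sys/time.h>

#define SKETCH_HZ 100	/* assumed tick rate; the kernel's HZ is configuration dependent */

/* Split a tick count into seconds plus microseconds, following the
 * tv_sec / tv_usec expressions used for asoc->cookie_life.
 */
static struct timeval ticks_to_timeval(long ticks)
{
	struct timeval tv;

	tv.tv_sec = ticks / SKETCH_HZ;				/* whole seconds */
	tv.tv_usec = (ticks % SKETCH_HZ) * 1000000L / SKETCH_HZ;	/* remainder in usec */
	return tv;
}
```

With these assumptions, 6000 ticks gives the 60-second Valid.Cookie.Life default from the `structs.h` comment, and 150 ticks gives 1 second plus 500000 microseconds.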
@@ -185,6 +186,8 @@ sctp_association_t *sctp_association_init(sctp_association_t *asoc,
 	else
 		asoc->rwnd = sk->rcvbuf;
+	asoc->a_rwnd = 0;
 	asoc->rwnd_over = 0;
/* Use my own max window until I learn something better. */
...
...
@@ -642,7 +645,7 @@ __u16 __sctp_association_get_next_ssn(sctp_association_t *asoc, __u16 sid)
 int sctp_cmp_addr_exact(const union sctp_addr *ss1,
 			const union sctp_addr *ss2)
 {
-	struct sctp_func *af;
+	struct sctp_af *af;
 	af = sctp_get_af_specific(ss1->sa.sa_family);
 	if (!af)
...
...
net/sctp/bind_addr.c
...
...
@@ -327,7 +327,7 @@ static int sctp_copy_one_addr(sctp_bind_addr_t *dest, union sctp_addr *addr,
/* Is this a wildcard address? */
 int sctp_is_any(const union sctp_addr *addr)
 {
-	struct sctp_func *af = sctp_get_af_specific(addr->sa.sa_family);
+	struct sctp_af *af = sctp_get_af_specific(addr->sa.sa_family);
 	if (!af)
 		return 0;
 	return af->is_any(addr);
...
...
@@ -362,7 +362,7 @@ int sctp_in_scope(const union sctp_addr *addr, sctp_scope_t scope)
/* What is the scope of 'addr'? */
 sctp_scope_t sctp_scope(const union sctp_addr *addr)
 {
-	struct sctp_func *af;
+	struct sctp_af *af;
 	af = sctp_get_af_specific(addr->sa.sa_family);
 	if (!af)
...
...
net/sctp/input.c
...
...
@@ -42,6 +42,7 @@
* Hui Huang <hui.huang@nokia.com>
* Daisy Chang <daisyc@us.ibm.com>
* Sridhar Samudrala <sri@us.ibm.com>
* Ardelle Fan <ardelle.fan@intel.com>
*
* Any bugs reported given to us we will try to fix... any fixes shared will
* be incorporated into the next SCTP release.
...
...
@@ -96,7 +97,7 @@ int sctp_rcv(struct sk_buff *skb)
 	struct sctphdr *sh;
 	union sctp_addr src;
 	union sctp_addr dest;
-	struct sctp_func *af;
+	struct sctp_af *af;
 	int ret = 0;
 	if (skb->pkt_type != PACKET_HOST)
...
...
@@ -279,6 +280,7 @@ int sctp_rcv_ootb(struct sk_buff *skb)
 {
 	sctp_chunkhdr_t *ch;
 	__u8 *ch_end;
+	sctp_errhdr_t *err;
 	ch = (sctp_chunkhdr_t *) skb->data;
...
...
@@ -308,7 +310,8 @@ int sctp_rcv_ootb(struct sk_buff *skb)
 			goto discard;
 		if (ch->type == SCTP_CID_ERROR) {
-			/* FIXME - Need to check the "Stale cookie" ERROR. */
+			err = (sctp_errhdr_t *)(ch + sizeof(sctp_chunkhdr_t));
+			if (SCTP_ERROR_STALE_COOKIE == err->cause)
 				goto discard;
 		}
...
...
net/sctp/ipv6.c
...
...
@@ -76,8 +76,19 @@
#include <asm/uaccess.h>
/* FIXME: Cleanup so we don't need TEST_FRAME here. */
#ifndef TEST_FRAME
extern
struct
notifier_block
sctp_inetaddr_notifier
;
/* FIXME: This macro needs to be moved to a common header file. */
#define NIP6(addr) \
ntohs((addr)->s6_addr16[0]), \
ntohs((addr)->s6_addr16[1]), \
ntohs((addr)->s6_addr16[2]), \
ntohs((addr)->s6_addr16[3]), \
ntohs((addr)->s6_addr16[4]), \
ntohs((addr)->s6_addr16[5]), \
ntohs((addr)->s6_addr16[6]), \
ntohs((addr)->s6_addr16[7])
/* FIXME: Comments. */
static
inline
void
sctp_v6_err
(
struct
sk_buff
*
skb
,
struct
inet6_skb_parm
*
opt
,
...
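The `NIP6()` macro in the hunk above expands an IPv6 address into eight host-order 16-bit values, one per `%04x` in a format string. A userspace sketch of the same formatting follows. It reads the portable byte array `s6_addr` rather than `s6_addr16`, and `format_ip6_groups` is a hypothetical helper, not a kernel or libc function.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Parse a textual IPv6 address and print it as eight zero-padded 16-bit
 * groups, the layout NIP6() feeds to "%04x:%04x:...". The address bytes are
 * in network order, so each group is (high byte << 8) | low byte, which is
 * the value ntohs() yields for the matching s6_addr16[] element.
 * Returns 0 on success, -1 if the text is not a valid IPv6 address.
 */
static int format_ip6_groups(const char *text, char *out, size_t outlen)
{
	struct in6_addr a;
	unsigned int g[8];
	int i;

	if (inet_pton(AF_INET6, text, &a) != 1)
		return -1;
	for (i = 0; i < 8; i++)
		g[i] = ((unsigned int)a.s6_addr[2 * i] << 8) | a.s6_addr[2 * i + 1];
	snprintf(out, outlen, "%04x:%04x:%04x:%04x:%04x:%04x:%04x:%04x",
		 g[0], g[1], g[2], g[3], g[4], g[5], g[6], g[7]);
	return 0;
}
```

A 40-byte buffer holds the result: 32 hex digits, 7 colons and the terminating NUL.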
...
@@ -92,13 +103,38 @@ static inline int sctp_v6_xmit(struct sk_buff *skb)
struct
sock
*
sk
=
skb
->
sk
;
struct
ipv6_pinfo
*
np
=
inet6_sk
(
sk
);
struct
flowi
fl
;
struct
dst_entry
*
dst
;
struct
dst_entry
*
dst
=
skb
->
dst
;
struct
rt6_info
*
rt6
=
(
struct
rt6_info
*
)
dst
;
struct
in6_addr
saddr
;
int
err
=
0
;
int
err
;
fl
.
proto
=
sk
->
protocol
;
fl
.
fl6_dst
=
&
np
->
daddr
;
fl
.
fl6_src
=
NULL
;
fl
.
fl6_dst
=
&
rt6
->
rt6i_dst
.
addr
;
/* FIXME: Currently, ip6_route_output() doesn't fill in the source
* address in the returned route entry. So we call ipv6_get_saddr()
* to get an appropriate source address. It is possible that this address
* may not be part of the bind address list of the association.
* Once ip6_route_ouput() is fixed so that it returns a route entry
* with an appropriate source address, the following if condition can
* be removed. With ip6_route_output() returning a source address filled
* route entry, sctp_transport_route() can do real source address
* selection for v6.
*/
if
(
ipv6_addr_any
(
&
rt6
->
rt6i_src
.
addr
))
{
err
=
ipv6_get_saddr
(
dst
,
fl
.
fl6_dst
,
&
saddr
);
if
(
err
)
{
printk
(
KERN_ERR
"%s: No saddr available for "
"DST=%04x:%04x:%04x:%04x:%04x:%04x:%04x:%04x
\n
"
,
__FUNCTION__
,
NIP6
(
fl
.
fl6_src
));
return
err
;
}
fl
.
fl6_src
=
&
saddr
;
}
else
{
fl
.
fl6_src
=
&
rt6
->
rt6i_src
.
addr
;
}
fl
.
fl6_flowlabel
=
np
->
flow_label
;
IP6_ECN_flow_xmit
(
sk
,
fl
.
fl6_flowlabel
);
...
...
@@ -111,63 +147,8 @@ static inline int sctp_v6_xmit(struct sk_buff *skb)
fl
.
nl_u
.
ip6_u
.
daddr
=
rt0
->
addr
;
}
dst
=
__sk_dst_check
(
sk
,
np
->
dst_cookie
);
if
(
dst
==
NULL
)
{
dst
=
ip6_route_output
(
sk
,
&
fl
);
if
(
dst
->
error
)
{
sk
->
err_soft
=
-
dst
->
error
;
dst_release
(
dst
);
return
-
sk
->
err_soft
;
}
ip6_dst_store
(
sk
,
dst
,
NULL
);
}
skb
->
dst
=
dst_clone
(
dst
);
/* FIXME: This is all temporary until real source address
* selection is done.
*/
if
(
ipv6_addr_any
(
&
np
->
saddr
))
{
err
=
ipv6_get_saddr
(
dst
,
fl
.
fl6_dst
,
&
saddr
);
if
(
err
)
printk
(
KERN_ERR
"sctp_v6_xmit: no saddr available
\n
"
);
/* FIXME: This is a workaround until we get
* real source address selection done. This is here
* to disallow loopback when the scoping rules have
* not bound loopback to the endpoint.
*/
if
(
sctp_ipv6_addr_type
(
&
saddr
)
&
IPV6_ADDR_LOOPBACK
)
{
if
(
!
(
sctp_ipv6_addr_type
(
&
np
->
daddr
)
&
IPV6_ADDR_LOOPBACK
))
{
ipv6_addr_copy
(
&
saddr
,
&
np
->
daddr
);
}
}
fl
.
fl6_src
=
&
saddr
;
}
else
{
fl
.
fl6_src
=
&
np
->
saddr
;
}
/* Restore final destination back after routing done */
fl
.
nl_u
.
ip6_u
.
daddr
=
&
np
->
daddr
;
return
ip6_xmit
(
sk
,
skb
,
&
fl
,
np
->
opt
);
}
#endif
/* TEST_FRAME */
/* FIXME: This macro needs to be moved to a common header file. */
#define NIP6(addr) \
ntohs((addr)->s6_addr16[0]), \
ntohs((addr)->s6_addr16[1]), \
ntohs((addr)->s6_addr16[2]), \
ntohs((addr)->s6_addr16[3]), \
ntohs((addr)->s6_addr16[4]), \
ntohs((addr)->s6_addr16[5]), \
ntohs((addr)->s6_addr16[6]), \
ntohs((addr)->s6_addr16[7])
/* Returns the dst cache entry for the given source and destination ip
* addresses.
...
...
@@ -261,6 +242,20 @@ static void sctp_v6_from_skb(union sctp_addr *addr,struct sk_buff *skb,
ipv6_addr_copy
(
&
addr
->
v6
.
sin6_addr
,
from
);
}
/* Initialize an sctp_addr from a socket. */
static
void
sctp_v6_from_sk
(
union
sctp_addr
*
addr
,
struct
sock
*
sk
)
{
addr
->
v6
.
sin6_family
=
AF_INET6
;
addr
->
v6
.
sin6_port
=
inet_sk
(
sk
)
->
num
;
addr
->
v6
.
sin6_addr
=
inet6_sk
(
sk
)
->
rcv_saddr
;
}
/* Initialize sk->rcv_saddr from sctp_addr. */
static
void
sctp_v6_to_sk
(
union
sctp_addr
*
addr
,
struct
sock
*
sk
)
{
inet6_sk
(
sk
)
->
rcv_saddr
=
addr
->
v6
.
sin6_addr
;
}
/* Initialize a sctp_addr from a dst_entry. */
static
void
sctp_v6_dst_saddr
(
union
sctp_addr
*
addr
,
struct
dst_entry
*
dst
)
{
...
...
@@ -270,7 +265,7 @@ static void sctp_v6_dst_saddr(union sctp_addr *addr, struct dst_entry *dst)
}
/* Compare addresses exactly. Well.. almost exactly; ignore scope_id
- * for now.  FIXME.
+ * for now.  FIXME: v4-mapped-v6.
*/
static
int
sctp_v6_cmp_addr
(
const
union
sctp_addr
*
addr1
,
const
union
sctp_addr
*
addr2
)
...
...
@@ -300,6 +295,22 @@ static int sctp_v6_is_any(const union sctp_addr *addr)
return
IPV6_ADDR_ANY
==
type
;
}
/* Should this be available for binding? */
static
int
sctp_v6_available
(
const
union
sctp_addr
*
addr
)
{
int
type
;
struct
in6_addr
*
in6
=
(
struct
in6_addr
*
)
&
addr
->
v6
.
sin6_addr
;
type
=
ipv6_addr_type
(
in6
);
if
(
IPV6_ADDR_ANY
==
type
)
return
1
;
if
(
!
(
type
&
IPV6_ADDR_UNICAST
))
return
0
;
return
ipv6_chk_addr
(
in6
,
NULL
);
}
/* This function checks if the address is a valid address to be used for
* SCTP.
*
...
...
@@ -309,7 +320,7 @@ static int sctp_v6_is_any(const union sctp_addr *addr)
*/
 static int sctp_v6_addr_valid(union sctp_addr *addr)
 {
-	int ret = sctp_ipv6_addr_type(&addr->v6.sin6_addr);
+	int ret = ipv6_addr_type(&addr->v6.sin6_addr);
/* FIXME: v4-mapped-v6 address support. */
...
...
@@ -448,7 +459,7 @@ static int sctp_inet6_cmp_addr(const union sctp_addr *addr1,
 			       const union sctp_addr *addr2,
 			       struct sctp_opt *opt)
 {
-	struct sctp_func *af1, *af2;
+	struct sctp_af *af1, *af2;
 	af1 = sctp_get_af_specific(addr1->sa.sa_family);
 	af2 = sctp_get_af_specific(addr2->sa.sa_family);
...
...
@@ -465,7 +476,21 @@ static int sctp_inet6_cmp_addr(const union sctp_addr *addr1,
return
af1
->
cmp_addr
(
addr1
,
addr2
);
}
/* Verify that the provided sockaddr looks bindable. Common verification,
* has already been taken care of.
*/
static
int
sctp_inet6_bind_verify
(
struct
sctp_opt
*
opt
,
union
sctp_addr
*
addr
)
{
struct
sctp_af
*
af
;
/* ASSERT: address family has already been verified. */
if
(
addr
->
sa
.
sa_family
!=
AF_INET6
)
{
af
=
sctp_get_af_specific
(
addr
->
sa
.
sa_family
);
}
else
af
=
opt
->
pf
->
af
;
return
af
->
available
(
addr
);
}
static
struct
proto_ops
inet6_seqpacket_ops
=
{
.
family
=
PF_INET6
,
...
...
@@ -501,29 +526,33 @@ static struct inet6_protocol sctpv6_protocol = {
 	.err_handler  = sctp_v6_err,
 };
-static sctp_func_t sctp_ipv6_specific = {
+static struct sctp_af sctp_ipv6_specific = {
 	.queue_xmit      = sctp_v6_xmit,
 	.setsockopt      = ipv6_setsockopt,
 	.getsockopt      = ipv6_getsockopt,
 	.get_dst         = sctp_v6_get_dst,
 	.copy_addrlist   = sctp_v6_copy_addrlist,
 	.from_skb        = sctp_v6_from_skb,
+	.from_sk         = sctp_v6_from_sk,
+	.to_sk           = sctp_v6_to_sk,
 	.dst_saddr       = sctp_v6_dst_saddr,
 	.cmp_addr        = sctp_v6_cmp_addr,
 	.scope           = sctp_v6_scope,
 	.addr_valid      = sctp_v6_addr_valid,
 	.inaddr_any      = sctp_v6_inaddr_any,
 	.is_any          = sctp_v6_is_any,
+	.available       = sctp_v6_available,
 	.net_header_len  = sizeof(struct ipv6hdr),
 	.sockaddr_len    = sizeof(struct sockaddr_in6),
 	.sa_family       = AF_INET6,
 };
-static sctp_pf_t sctp_pf_inet6_specific = {
+static struct sctp_pf sctp_pf_inet6_specific = {
 	.event_msgname = sctp_inet6_event_msgname,
 	.skb_msgname   = sctp_inet6_skb_msgname,
 	.af_supported  = sctp_inet6_af_supported,
 	.cmp_addr      = sctp_inet6_cmp_addr,
+	.bind_verify   = sctp_inet6_bind_verify,
 	.af            = &sctp_ipv6_specific,
 };
...
...
@@ -538,11 +567,13 @@ int sctp_v6_init(void)
 	inet6_register_protosw(&sctpv6_protosw);
 	/* Register the SCTP specfic PF_INET6 functions. */
-	sctp_set_pf_specific(PF_INET6, &sctp_pf_inet6_specific);
+	sctp_register_pf(&sctp_pf_inet6_specific, PF_INET6);
+	/* Register the SCTP specfic AF_INET6 functions. */
+	sctp_register_af(&sctp_ipv6_specific);
-	/* Fill in address family info. */
-	INIT_LIST_HEAD(&sctp_ipv6_specific.list);
-	list_add_tail(&sctp_ipv6_specific.list, &sctp_proto.address_families);
+	/* Register notifier for inet6 address additions/deletions. */
+	register_inet6addr_notifier(&sctp_inetaddr_notifier);
 	return 0;
 }
...
...
@@ -553,4 +584,5 @@ void sctp_v6_exit(void)
list_del
(
&
sctp_ipv6_specific
.
list
);
inet6_del_protocol
(
&
sctpv6_protocol
,
IPPROTO_SCTP
);
inet6_unregister_protosw
(
&
sctpv6_protosw
);
unregister_inet6addr_notifier
(
&
sctp_inetaddr_notifier
);
}
net/sctp/protocol.c
...
...
@@ -40,6 +40,7 @@
* Jon Grimm <jgrimm@us.ibm.com>
* Sridhar Samudrala <sri@us.ibm.com>
* Daisy Chang <daisyc@us.ibm.com>
* Ardelle Fan <ardelle.fan@intel.com>
*
* Any bugs reported given to us we will try to fix... any fixes shared will
* be incorporated into the next SCTP release.
...
...
@@ -67,8 +68,10 @@ struct sctp_mib sctp_statistics[NR_CPUS * 2];
*/
 static struct socket *sctp_ctl_socket;
-static sctp_pf_t *sctp_pf_inet6_specific;
-static sctp_pf_t *sctp_pf_inet_specific;
+static struct sctp_pf *sctp_pf_inet6_specific;
+static struct sctp_pf *sctp_pf_inet_specific;
+static struct sctp_af *sctp_af_v4_specific;
+static struct sctp_af *sctp_af_v6_specific;
 extern struct net_proto_family inet_family_ops;
...
...
@@ -140,12 +143,12 @@ static void __sctp_get_local_addr_list(sctp_protocol_t *proto)
 {
 	struct net_device *dev;
 	struct list_head *pos;
-	struct sctp_func *af;
+	struct sctp_af *af;
 	read_lock(&dev_base_lock);
 	for (dev = dev_base; dev; dev = dev->next) {
 		list_for_each(pos, &proto->address_families) {
-			af = list_entry(pos, sctp_func_t, list);
+			af = list_entry(pos, struct sctp_af, list);
 			af->copy_addrlist(&proto->local_addr_list, dev);
 		}
 	}
...
...
@@ -251,7 +254,6 @@ struct dst_entry *sctp_v4_get_dst(union sctp_addr *daddr,
return
&
rt
->
u
.
dst
;
}
/* Initialize a sctp_addr from in incoming skb. */
static
void
sctp_v4_from_skb
(
union
sctp_addr
*
addr
,
struct
sk_buff
*
skb
,
int
is_saddr
)
...
...
@@ -274,6 +276,21 @@ static void sctp_v4_from_skb(union sctp_addr *addr, struct sk_buff *skb,
 	memcpy(&addr->v4.sin_addr.s_addr, from, sizeof(struct in_addr));
 }
+/* Initialize an sctp_addr from a socket. */
+static void sctp_v4_from_sk(union sctp_addr *addr, struct sock *sk)
+{
+	addr->v4.sin_family = AF_INET;
+	addr->v4.sin_port = inet_sk(sk)->num;
+	addr->v4.sin_addr.s_addr = inet_sk(sk)->rcv_saddr;
+}
+/* Initialize sk->rcv_saddr from sctp_addr. */
+static void sctp_v4_to_sk(union sctp_addr *addr, struct sock *sk)
+{
+	inet_sk(sk)->rcv_saddr = addr->v4.sin_addr.s_addr;
+}
 /* Initialize a sctp_addr from a dst_entry. */
 static void sctp_v4_dst_saddr(union sctp_addr *saddr, struct dst_entry *dst)
 {
...
...
@@ -311,7 +328,7 @@ static int sctp_v4_is_any(const union sctp_addr *addr)
}
/* This function checks if the address is a valid address to be used for
- * SCTP.
+ * SCTP binding.
*
* Output:
* Return 0 - If the address is a non-unicast or an illegal address.
...
...
@@ -326,6 +343,18 @@ static int sctp_v4_addr_valid(union sctp_addr *addr)
 	return 1;
 }
+/* Should this be available for binding? */
+static int sctp_v4_available(const union sctp_addr *addr)
+{
+	int ret = inet_addr_type(addr->v4.sin_addr.s_addr);
+	/* FIXME: ip_nonlocal_bind sysctl support. */
+	if (addr->v4.sin_addr.s_addr != INADDR_ANY && ret != RTN_LOCAL)
+		return 0;
+	return 1;
+}
/* Checking the loopback, private and other address scopes as defined in
* RFC 1918. The IPv4 scoping is based on the draft for SCTP IPv4
* scoping <draft-stewart-tsvwg-sctp-ipv4-00.txt>.
...
...
@@ -365,10 +394,10 @@ static sctp_scope_t sctp_v4_scope(union sctp_addr *addr)
 	return retval;
 }
-/* Event handler for inet device events.
+/* Event handler for inet address addition/deletion events.
  * Basically, whenever there is an event, we re-build our local address list.
  */
-static int sctp_netdev_event(struct notifier_block *this, unsigned long event,
+static int sctp_inetaddr_event(struct notifier_block *this, unsigned long event,
 			     void *ptr)
 {
 	long flags __attribute__ ((unused));
...
...
@@ -405,29 +434,42 @@ int sctp_ctl_sock_init(void)
 	return 0;
 }
-/* Get the table of functions for manipulating a particular address
- * family.
- */
-sctp_func_t *sctp_get_af_specific(sa_family_t family)
+/* Register address family specific functions. */
+int sctp_register_af(struct sctp_af *af)
 {
-	struct list_head *pos;
-	sctp_protocol_t *proto = sctp_get_protocol();
-	struct sctp_func *retval, *af;
+	switch (af->sa_family) {
+	case AF_INET:
+		if (sctp_af_v4_specific)
+			return 0;
+		sctp_af_v4_specific = af;
+		break;
+	case AF_INET6:
+		if (sctp_af_v6_specific)
+			return 0;
+		sctp_af_v6_specific = af;
+		break;
+	default:
+		return 0;
+	}
-	retval = NULL;
+	INIT_LIST_HEAD(&af->list);
+	list_add_tail(&af->list, &sctp_proto.address_families);
+	return 1;
+}
-	/* Cycle through all AF specific functions looking for a
-	 * match.
+/* Get the table of functions for manipulating a particular address
+ * family.
  */
-	list_for_each(pos, &proto->address_families) {
-		af = list_entry(pos, sctp_func_t, list);
-		if (family == af->sa_family) {
-			retval = af;
-			break;
-		}
+struct sctp_af *sctp_get_af_specific(sa_family_t family)
+{
+	switch (family) {
+	case AF_INET:
+		return sctp_af_v4_specific;
+	case AF_INET6:
+		return sctp_af_v6_specific;
+	default:
+		return NULL;
 	}
-	return retval;
 }
/* Common code to initialize a AF_INET msg_name. */
...
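The hunk above replaces a list walk in `sctp_get_af_specific()` with one pointer slot per address family, filled once by `sctp_register_af()`. The pattern can be sketched in userspace as follows. `struct af_ops_sketch`, `register_af_sketch` and `get_af_sketch` are hypothetical names, and the sketch leaves out the `list_add_tail()` bookkeeping that the real function still performs.

```c
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical, trimmed-down stand-in for struct sctp_af. */
struct af_ops_sketch {
	sa_family_t sa_family;
	const char *name;
};

static struct af_ops_sketch *af_v4_slot;
static struct af_ops_sketch *af_v6_slot;

/* Register a per-family table. Returns 1 on success, 0 when the family is
 * unknown or a table for it is already registered (first caller wins).
 */
static int register_af_sketch(struct af_ops_sketch *af)
{
	switch (af->sa_family) {
	case AF_INET:
		if (af_v4_slot)
			return 0;
		af_v4_slot = af;
		break;
	case AF_INET6:
		if (af_v6_slot)
			return 0;
		af_v6_slot = af;
		break;
	default:
		return 0;
	}
	return 1;
}

/* Constant-time lookup; NULL when nothing is registered for the family. */
static struct af_ops_sketch *get_af_sketch(sa_family_t family)
{
	switch (family) {
	case AF_INET:
		return af_v4_slot;
	case AF_INET6:
		return af_v6_slot;
	default:
		return NULL;
	}
}
```

Callers still need the `if (!af)` check seen throughout this diff, because a lookup for a family whose module has not registered, such as IPv6 support that is not loaded, returns NULL.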
...
@@ -495,21 +537,28 @@ static int sctp_inet_cmp_addr(const union sctp_addr *addr1,
 	return 0;
 }
+/* Verify that provided sockaddr looks bindable.  Common verification has
+ * already been taken care of.
+ */
+static int sctp_inet_bind_verify(struct sctp_opt *opt, union sctp_addr *addr)
+{
+	return sctp_v4_available(addr);
+}
-struct sctp_func sctp_ipv4_specific;
+struct sctp_af sctp_ipv4_specific;
-static sctp_pf_t sctp_pf_inet = {
+static struct sctp_pf sctp_pf_inet = {
 	.event_msgname = sctp_inet_event_msgname,
 	.skb_msgname   = sctp_inet_skb_msgname,
 	.af_supported  = sctp_inet_af_supported,
 	.cmp_addr      = sctp_inet_cmp_addr,
+	.bind_verify   = sctp_inet_bind_verify,
 	.af            = &sctp_ipv4_specific,
 };
-/* Registration for netdev events. */
-struct notifier_block sctp_netdev_notifier = {
-	.notifier_call = sctp_netdev_event,
+/* Notifier for inetaddr addition/deletion events. */
+struct notifier_block sctp_inetaddr_notifier = {
+	.notifier_call = sctp_inetaddr_event,
 };
/* Socket operations. */
...
...
@@ -551,25 +600,28 @@ static struct inet_protocol sctp_protocol = {
};
/* IPv4 address related functions. */
struct sctp_func sctp_ipv4_specific = {
struct sctp_af sctp_ipv4_specific = {
	.queue_xmit = ip_queue_xmit,
	.setsockopt = ip_setsockopt,
	.getsockopt = ip_getsockopt,
	.get_dst = sctp_v4_get_dst,
	.copy_addrlist = sctp_v4_copy_addrlist,
	.from_skb = sctp_v4_from_skb,
	.from_sk = sctp_v4_from_sk,
	.to_sk = sctp_v4_to_sk,
	.dst_saddr = sctp_v4_dst_saddr,
	.cmp_addr = sctp_v4_cmp_addr,
	.addr_valid = sctp_v4_addr_valid,
	.inaddr_any = sctp_v4_inaddr_any,
	.is_any = sctp_v4_is_any,
	.available = sctp_v4_available,
	.scope = sctp_v4_scope,
	.net_header_len = sizeof(struct iphdr),
	.sockaddr_len = sizeof(struct sockaddr_in),
	.sa_family = AF_INET,
};
sctp_pf_t *sctp_get_pf_specific(int family) {
struct sctp_pf *sctp_get_pf_specific(sa_family_t family) {
	switch (family) {
	case PF_INET:
...
...
@@ -581,20 +633,24 @@ sctp_pf_t *sctp_get_pf_specific(int family) {
}
}
/* Set the PF specific function table. */
void sctp_set_pf_specific(int family, sctp_pf_t *pf)
/* Register the PF specific function table. */
int sctp_register_pf(struct sctp_pf *pf, sa_family_t family)
{
	switch (family) {
	case PF_INET:
		if (sctp_pf_inet_specific)
			return 0;
		sctp_pf_inet_specific = pf;
		break;
	case PF_INET6:
		if (sctp_pf_inet6_specific)
			return 0;
		sctp_pf_inet6_specific = pf;
		break;
	default:
		BUG();
		break;
		return 0;
	}
	return 1;
}
/* Initialize the universe into something sensible. */
...
...
@@ -617,7 +673,7 @@ int sctp_init(void)
	sctp_dbg_objcnt_init();
	/* Initialize the SCTP specific PF functions. */
	sctp_set_pf_specific(PF_INET, &sctp_pf_inet);
	sctp_register_pf(&sctp_pf_inet, PF_INET);
/*
* 14. Suggested SCTP Protocol Parameter Values
*/
...
...
@@ -636,6 +692,9 @@ int sctp_init(void)
/* Valid.Cookie.Life - 60 seconds */
	sctp_proto.valid_cookie_life = 60 * HZ;
	/* Whether Cookie Preservative is enabled(1) or not(0) */
	sctp_proto.cookie_preserve_enable = 1;
	/* Max.Burst - 4 */
	sctp_proto.max_burst = SCTP_MAX_BURST;
...
...
@@ -709,8 +768,7 @@ int sctp_init(void)
	sctp_sysctl_register();
	INIT_LIST_HEAD(&sctp_proto.address_families);
	INIT_LIST_HEAD(&sctp_ipv4_specific.list);
	list_add_tail(&sctp_ipv4_specific.list, &sctp_proto.address_families);
	sctp_register_af(&sctp_ipv4_specific);
	status = sctp_v6_init();
	if (status)
...
...
@@ -727,7 +785,9 @@ int sctp_init(void)
	INIT_LIST_HEAD(&sctp_proto.local_addr_list);
	sctp_proto.local_addr_lock = SPIN_LOCK_UNLOCKED;
	register_inetaddr_notifier(&sctp_netdev_notifier);
	/* Register notifier for inet address additions/deletions. */
	register_inetaddr_notifier(&sctp_inetaddr_notifier);
	sctp_get_local_addr_list(&sctp_proto);
	return 0;
...
...
@@ -757,8 +817,10 @@ void sctp_exit(void)
* up all the remaining associations and all that memory.
*/
/* Unregister notifier for inet address additions/deletions. */
	unregister_inetaddr_notifier(&sctp_inetaddr_notifier);
	/* Free the local address list. */
	unregister_inetaddr_notifier(&sctp_netdev_notifier);
	sctp_free_local_addr_list(&sctp_proto);
/* Free the control endpoint. */
...
...
net/sctp/sm_make_chunk.c
...
...
@@ -77,11 +77,16 @@ static const sctp_supported_addrs_param_t sat_param = {
	{
		SCTP_PARAM_SUPPORTED_ADDRESS_TYPES,
		__constant_htons(SCTP_SAT_LEN),
	},
	{ /* types[] */
	}
};
/* gcc 3.2 doesn't allow initialization of zero-length arrays. So the above
 * structure is split and the address types array is initialized using a
 * fixed length array.
 */
static const __u16 sat_addr_types[2] = {
	SCTP_PARAM_IPV4_ADDRESS,
	SCTP_V6(SCTP_PARAM_IPV6_ADDRESS,)
	}
};
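The comment in the hunk above explains why the parameter is split: the header keeps its own initializer, and the address types move into a fixed-length array that is appended separately. A minimal user-space sketch of that split follows. `struct param_hdr`, the constants and `append_supported_types()` are hypothetical names for illustration, and byte-order conversion is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameter header: a type code and the total parameter length. */
struct param_hdr {
	uint16_t type;
	uint16_t length;
};

/* Illustrative constants; treat the numeric values as assumptions. */
enum {
	PARAM_SUPPORTED_ADDR_TYPES = 12,
	ADDR_TYPE_IPV4 = 5,
	ADDR_TYPE_IPV6 = 6
};

/* The variable part lives in its own fixed-length array, so no
 * zero-length (flexible) array member needs a static initializer.
 */
static const uint16_t addr_types[2] = { ADDR_TYPE_IPV4, ADDR_TYPE_IPV6 };

/* Writes the header followed by the address types into buf.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
size_t append_supported_types(uint8_t *buf, size_t cap)
{
	struct param_hdr hdr;
	size_t total = sizeof(hdr) + sizeof(addr_types);

	if (cap < total)
		return 0;
	hdr.type = PARAM_SUPPORTED_ADDR_TYPES;
	hdr.length = (uint16_t)total;
	memcpy(buf, &hdr, sizeof(hdr));
	memcpy(buf + sizeof(hdr), addr_types, sizeof(addr_types));
	return total;
}
```

The two `memcpy()` calls play the role of the two `sctp_addto_chunk()` calls that a later hunk in this file adds for the header and for `sat_addr_types`.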
/* RFC 2960 3.3.2 Initiation (INIT) (1)
...
...
@@ -163,7 +168,7 @@ void sctp_init_cause(sctp_chunk_t *chunk, __u16 cause_code,
*/
sctp_chunk_t *sctp_make_init(const sctp_association_t *asoc,
			     const sctp_bind_addr_t *bp,
			     int priority)
			     int priority, int vparam_len)
{
	sctp_inithdr_t init;
	union sctp_params addrs;
...
...
@@ -192,6 +197,7 @@ sctp_chunk_t *sctp_make_init(const sctp_association_t *asoc,
	chunksize = sizeof(init) + addrs_len + SCTP_SAT_LEN;
	chunksize += sizeof(ecap_param);
	chunksize += vparam_len;
/* RFC 2960 3.3.2 Initiation (INIT) (1)
*
...
...
@@ -213,7 +219,10 @@ sctp_chunk_t *sctp_make_init(const sctp_association_t *asoc,
	sctp_addto_chunk(retval, sizeof(init), &init);
	retval->param_hdr.v =
		sctp_addto_chunk(retval, addrs_len, addrs.v);
	sctp_addto_chunk(retval, SCTP_SAT_LEN, &sat_param);
	sctp_addto_chunk(retval, sizeof(sctp_paramhdr_t), &sat_param);
	sctp_addto_chunk(retval, sizeof(sat_addr_types), sat_addr_types);
	sctp_addto_chunk(retval, sizeof(ecap_param), &ecap_param);
nodata:
...
...
@@ -1337,7 +1346,7 @@ sctp_cookie_param_t *sctp_pack_cookie(const sctp_endpoint_t *ep,
sctp_association_t *sctp_unpack_cookie(const sctp_endpoint_t *ep,
				       const sctp_association_t *asoc,
				       sctp_chunk_t *chunk, int priority,
				       int *error)
				       int *error, sctp_chunk_t **err_chk_p)
{
	sctp_association_t *retval = NULL;
	sctp_signed_cookie_t *cookie;
...
...
@@ -1394,7 +1403,29 @@ sctp_association_t *sctp_unpack_cookie(const sctp_endpoint_t *ep,
* for init collision case of lost COOKIE ACK.
*/
	if (!asoc && tv_lt(bear_cookie->expiration, chunk->skb->stamp)) {
		/*
		 * Section 3.3.10.3 Stale Cookie Error (3)
		 *
		 * Cause of error
		 * ---------------
		 * Stale Cookie Error: Indicates the receipt of a valid State
		 * Cookie that has expired.
		 */
		*err_chk_p = sctp_make_op_error_space(asoc, chunk,
				ntohs(chunk->chunk_hdr->length));
		if (*err_chk_p) {
			suseconds_t usecs = (chunk->skb->stamp.tv_sec -
				bear_cookie->expiration.tv_sec) * 1000000L +
				chunk->skb->stamp.tv_usec -
				bear_cookie->expiration.tv_usec;
			usecs = htonl(usecs);
			sctp_init_cause(*err_chk_p, SCTP_ERROR_STALE_COOKIE,
					&usecs, sizeof(usecs));
			*error = -SCTP_IERROR_STALE_COOKIE;
		}
		else
			*error = -SCTP_IERROR_NOMEM;
		goto fail;
	}
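The hunk above computes how stale the cookie is, in microseconds, as the difference between the packet timestamp and the cookie expiration, then passes the value through `htonl()` before attaching it to the error cause. A minimal user-space sketch of the subtraction follows; `staleness_usecs()` is a hypothetical helper name, and the 64-bit result type is a choice made for this sketch rather than something taken from the kernel code.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>   /* struct timeval */

/* Microseconds by which `now` is later than `expiry`.
 * The caller is expected to have checked that the cookie really expired,
 * as the tv_lt() test does in the hunk, so the result is non-negative.
 */
int64_t staleness_usecs(const struct timeval *now, const struct timeval *expiry)
{
	/* Whole seconds first, then the (possibly negative) usec remainder. */
	return (int64_t)(now->tv_sec - expiry->tv_sec) * 1000000
	     + (int64_t)(now->tv_usec - expiry->tv_usec);
}
```

For example, a packet stamped at 10.200000 s against an expiration of 8.700000 s yields 2 000 000 + (200 000 - 700 000) = 1 500 000 usec. The negative microsecond term is fine because the seconds term already covers it.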
...
...
@@ -1751,6 +1782,7 @@ int sctp_process_param(sctp_association_t *asoc, union sctp_params param,
	__u16 sat;
	int retval = 1;
	sctp_scope_t scope;
	time_t stale;
/* We maintain all INIT parameters in network byte order all the
* time. This allows us to not worry about whether the parameters
...
...
@@ -1770,8 +1802,16 @@ int sctp_process_param(sctp_association_t *asoc, union sctp_params param,
		break;
	case SCTP_PARAM_COOKIE_PRESERVATIVE:
		asoc->cookie_preserve = ntohl(param.life->lifespan_increment);
		if (!sctp_proto.cookie_preserve_enable)
			break;
		stale = ntohl(param.life->lifespan_increment);
		/* Suggested Cookie Life span increment's unit is msec,
		 * (1/1000sec).
		 */
		asoc->cookie_life.tv_sec += stale / 1000;
		asoc->cookie_life.tv_usec += (stale % 1000) * 1000;
		break;
	case SCTP_PARAM_HOST_NAME_ADDRESS:
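The Cookie Preservative case above extends the cookie lifetime by a millisecond increment, split into whole seconds and a microsecond remainder. A minimal user-space sketch of that split follows. `timeval_add_msec()` is a hypothetical helper, and the carry step at the end is an addition made in this sketch: the hunk as shown does not normalize `tv_usec`, so the field could reach 1 000 000 or more after the addition.

```c
#include <assert.h>
#include <sys/time.h>   /* struct timeval */

/* Adds `msec` milliseconds to *tv.
 * Assumes tv->tv_usec is already in the range [0, 999999] on entry.
 */
void timeval_add_msec(struct timeval *tv, unsigned long msec)
{
	tv->tv_sec += msec / 1000;            /* whole seconds */
	tv->tv_usec += (msec % 1000) * 1000;  /* remainder, msec -> usec */

	/* Carry (not present in the hunk): at most one second can
	 * overflow, since both usec terms are below 1000000.
	 */
	if (tv->tv_usec >= 1000000) {
		tv->tv_sec += 1;
		tv->tv_usec -= 1000000;
	}
}
```

For example, adding 1250 msec to 60.900000 s gives 61 s plus 1 150 000 usec before the carry, and 62.150000 s after it.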
...
...
net/sctp/sm_sideeffect.c
...
...
@@ -68,7 +68,8 @@ static void sctp_do_8_2_transport_strike(sctp_association_t *asoc,
					 sctp_transport_t *transport);
static void sctp_cmd_init_failed(sctp_cmd_seq_t *, sctp_association_t *asoc);
static void sctp_cmd_assoc_failed(sctp_cmd_seq_t *, sctp_association_t *asoc,
				  sctp_event_t event_type, sctp_chunk_t *chunk);
				  sctp_event_t event_type,
				  sctp_subtype_t stype,
				  sctp_chunk_t *chunk);
static int sctp_cmd_process_init(sctp_cmd_seq_t *, sctp_association_t *asoc,
				 sctp_chunk_t *chunk,
				 sctp_init_chunk_t *peer_init,
...
...
@@ -517,7 +518,7 @@ int sctp_cmd_interpreter(sctp_event_t event_type, sctp_subtype_t subtype,
		case SCTP_CMD_ASSOC_FAILED:
			sctp_cmd_assoc_failed(commands, asoc, event_type,
					      chunk);
					      subtype, chunk);
			break;
		case SCTP_CMD_COUNTER_INC:
...
...
@@ -736,6 +737,9 @@ int sctp_gen_sack(sctp_association_t *asoc, int force, sctp_cmd_seq_t *commands)
	if (!sack)
		goto nomem;
	/* Update the last advertised rwnd value. */
	asoc->a_rwnd = asoc->rwnd;
	asoc->peer.sack_needed = 0;
	asoc->peer.next_dup_tsn = 0;
...
...
@@ -1046,19 +1050,28 @@ static void sctp_cmd_init_failed(sctp_cmd_seq_t *commands,
static void sctp_cmd_assoc_failed(sctp_cmd_seq_t *commands,
				  sctp_association_t *asoc,
				  sctp_event_t event_type,
				  sctp_subtype_t subtype,
				  sctp_chunk_t *chunk)
{
	sctp_ulpevent_t *event;
	__u16 error = 0;
	if (event_type == SCTP_EVENT_T_PRIMITIVE)
	switch (event_type) {
	case SCTP_EVENT_T_PRIMITIVE:
		if (SCTP_PRIMITIVE_ABORT == subtype.primitive)
			error = SCTP_ERROR_USER_ABORT;
		break;
	case SCTP_EVENT_T_CHUNK:
		if (chunk && (SCTP_CID_ABORT == chunk->chunk_hdr->type) &&
		    (ntohs(chunk->chunk_hdr->length) >=
			(sizeof(struct sctp_chunkhdr) +
		    (ntohs(chunk->chunk_hdr->length) >=
			(sizeof(struct sctp_chunkhdr) +
			 sizeof(struct sctp_errhdr)))) {
			error = ((sctp_errhdr_t *)chunk->skb->data)->cause;
		}
		break;
	default:
		break;
	}
	event = sctp_ulpevent_make_assoc_change(asoc, 0,
...
...
net/sctp/sm_statefuns.c
...
...
@@ -2,6 +2,7 @@
* Copyright (c) 1999-2000 Cisco, Inc.
* Copyright (c) 1999-2001 Motorola, Inc.
* Copyright (c) 2001-2002 International Business Machines, Corp.
* Copyright (c) 2001-2002 Intel Corp.
* Copyright (c) 2002 Nokia Corp.
*
* This file is part of the SCTP kernel reference Implementation
...
...
@@ -502,6 +503,7 @@ sctp_disposition_t sctp_sf_do_5_1D_ce(const sctp_endpoint_t *ep,
	sctp_chunk_t *repl;
	sctp_ulpevent_t *ev;
	int error = 0;
	sctp_chunk_t *err_chk_p;
/* If the packet is an OOTB packet which is temporarily on the
* control endpoint, responding with an ABORT.
...
...
@@ -521,7 +523,8 @@ sctp_disposition_t sctp_sf_do_5_1D_ce(const sctp_endpoint_t *ep,
* "Z" will reply with a COOKIE ACK chunk after building a TCB
* and moving to the ESTABLISHED state.
*/
	new_asoc = sctp_unpack_cookie(ep, asoc, chunk, GFP_ATOMIC, &error);
	new_asoc = sctp_unpack_cookie(ep, asoc, chunk, GFP_ATOMIC, &error,
				      &err_chk_p);
/* FIXME:
* If the re-build failed, what is the proper error path
...
...
@@ -537,6 +540,11 @@ sctp_disposition_t sctp_sf_do_5_1D_ce(const sctp_endpoint_t *ep,
		case -SCTP_IERROR_NOMEM:
			goto nomem;
		case -SCTP_IERROR_STALE_COOKIE:
			sctp_send_stale_cookie_err(ep, asoc, chunk, commands,
						   err_chk_p);
			return sctp_sf_pdiscard(ep, asoc, type, arg, commands);
		case -SCTP_IERROR_BAD_SIG:
		default:
			return sctp_sf_pdiscard(ep, asoc, type, arg, commands);
...
...
@@ -862,8 +870,8 @@ sctp_disposition_t sctp_sf_backbeat_8_3(const sctp_endpoint_t *ep,
/* Check if the timestamp looks valid. */
if
(
time_after
(
hbinfo
->
sent_at
,
jiffies
)
||
time_after
(
jiffies
,
hbinfo
->
sent_at
+
max_interval
))
{
SCTP_DEBUG_PRINTK
(
"%s: HEARTBEAT ACK with invalid timestamp
received for transport: %p
\n
"
,
SCTP_DEBUG_PRINTK
(
"%s: HEARTBEAT ACK with invalid timestamp
"
"
received for transport: %p
\n
"
,
__FUNCTION__
,
link
);
return
SCTP_DISPOSITION_DISCARD
;
}
...
...
@@ -1562,6 +1570,7 @@ sctp_disposition_t sctp_sf_do_5_2_4_dupcook(const sctp_endpoint_t *ep,
	sctp_association_t *new_asoc;
	int error = 0;
	char action;
	sctp_chunk_t *err_chk_p;
/* "Decode" the chunk. We have no optional parameters so we
* are in good shape.
...
...
@@ -1575,7 +1584,8 @@ sctp_disposition_t sctp_sf_do_5_2_4_dupcook(const sctp_endpoint_t *ep,
* current association, consider the State Cookie valid even if
* the lifespan is exceeded.
*/
	new_asoc = sctp_unpack_cookie(ep, asoc, chunk, GFP_ATOMIC, &error);
	new_asoc = sctp_unpack_cookie(ep, asoc, chunk, GFP_ATOMIC, &error,
				      &err_chk_p);
/* FIXME:
* If the re-build failed, what is the proper error path
...
...
@@ -1591,6 +1601,12 @@ sctp_disposition_t sctp_sf_do_5_2_4_dupcook(const sctp_endpoint_t *ep,
		case -SCTP_IERROR_NOMEM:
			goto nomem;
		case -SCTP_IERROR_STALE_COOKIE:
			sctp_send_stale_cookie_err(ep, asoc, chunk, commands,
						   err_chk_p);
			return sctp_sf_pdiscard(ep, asoc, type, arg, commands);
			break;
		case -SCTP_IERROR_BAD_SIG:
		default:
			return sctp_sf_pdiscard(ep, asoc, type, arg, commands);
...
...
@@ -1706,7 +1722,47 @@ sctp_disposition_t sctp_sf_shutdown_ack_sent_abort(const sctp_endpoint_t *ep,
return
sctp_sf_shutdown_sent_abort
(
ep
,
asoc
,
type
,
arg
,
commands
);
}
#if 0
/*
* Handle an Error received in COOKIE_ECHOED state.
*
* Only handle the error type of stale COOKIE Error, the other errors will
* be ignored.
*
* Inputs
* (endpoint, asoc, chunk)
*
* Outputs
* (asoc, reply_msg, msg_up, timers, counters)
*
* The return value is the disposition of the chunk.
*/
sctp_disposition_t
sctp_sf_cookie_echoed_err
(
const
sctp_endpoint_t
*
ep
,
const
sctp_association_t
*
asoc
,
const
sctp_subtype_t
type
,
void
*
arg
,
sctp_cmd_seq_t
*
commands
)
{
sctp_chunk_t
*
chunk
=
arg
;
sctp_errhdr_t
*
err
;
/* If we have gotten too many failures, give up. */
if
(
1
+
asoc
->
counters
[
SCTP_COUNTER_INIT_ERROR
]
>
asoc
->
max_init_attempts
)
{
/* INIT_FAILED will issue an ulpevent. */
sctp_add_cmd_sf
(
commands
,
SCTP_CMD_INIT_FAILED
,
SCTP_NULL
());
return
SCTP_DISPOSITION_DELETE_TCB
;
}
err
=
(
sctp_errhdr_t
*
)(
chunk
->
skb
->
data
);
/* Process the error here */
switch
(
err
->
cause
)
{
case
SCTP_ERROR_STALE_COOKIE
:
return
sctp_sf_do_5_2_6_stale
(
ep
,
asoc
,
type
,
arg
,
commands
);
default:
return
sctp_sf_pdiscard
(
ep
,
asoc
,
type
,
arg
,
commands
);
}
}
/*
* Handle a Stale COOKIE Error
*
...
...
@@ -1732,47 +1788,30 @@ sctp_disposition_t sctp_sf_shutdown_ack_sent_abort(const sctp_endpoint_t *ep,
*
* The return value is the disposition of the chunk.
*/
sctp_disposition_t do_5_2_6_stale(const sctp_endpoint_t *ep,
sctp_disposition_t
sctp_sf_
do_5_2_6_stale
(
const
sctp_endpoint_t
*
ep
,
const
sctp_association_t
*
asoc
,
const
sctp_subtype_t
type
,
void
*
arg
,
sctp_cmd_seq_t
*
commands
)
{
sctp_chunk_t
*
chunk
=
arg
;
time_t
stale
;
sctp_cookie_preserve_param_t
bht
;
sctp_errhdr_t
*
err
;
struct
list_head
*
pos
;
sctp_transport_t
*
t
;
sctp_chunk_t
*
reply
;
sctp_bind_addr_t
*
bp
;
int
attempts
;
/* This is not a real chunk type. It is a subtype of the
* ERROR chunk type. The ERROR chunk processing will bring us
* here.
*/
sctp_chunk_t *in_packet;
stp_chunk_t *reply;
sctp_inithdr_t initack;
__u8 *addrs;
int addrs_len;
time_t rtt;
struct sctpCookiePreserve bht;
attempts
=
asoc
->
counters
[
SCTP_COUNTER_INIT_ERROR
]
+
1
;
/* If we have gotten too many failures, give up. */
if (1 + asoc->counters[SctpCounterInits] > asoc->max_init_attempts) {
/* FIXME: Move to new ulpevent. */
retval->event_up = sctp_make_ulp_init_timeout(asoc);
if (!retval->event_up)
goto nomem;
sctp_add_cmd_sf(retval->commands, SCTP_CMD_DELETE_TCB,
SCTP_NULL());
if
(
attempts
>=
asoc
->
max_init_attempts
)
{
sctp_add_cmd_sf
(
commands
,
SCTP_CMD_INIT_FAILED
,
SCTP_NULL
());
return
SCTP_DISPOSITION_DELETE_TCB
;
}
retval->counters[0] = SCTP_COUNTER_INCR;
retval->counters[0] = SctpCounterInits;
retval->counters[1] = 0;
retval->counters[1] = 0;
/* Calculate the RTT in ms. */
/* BUG--we should get the send time of the HEARTBEAT REQUEST. */
in_packet = chunk;
rtt = 1000 * timeval_sub(in_packet->skb->stamp,
asoc->c.state_timestamp);
err
=
(
sctp_errhdr_t
*
)(
chunk
->
skb
->
data
);
/* When calculating the time extension, an implementation
* SHOULD use the RTT information measured based on the
...
...
@@ -1780,28 +1819,48 @@ sctp_disposition_t do_5_2_6_stale(const sctp_endpoint_t *ep,
* more than 1 second beyond the measured RTT, due to long
* State Cookie lifetimes making the endpoint more subject to
* a replay attack.
* Measure of Staleness's unit is usec. (1/1000000 sec)
* Suggested Cookie Life-span Increment's unit is msec.
* (1/1000 sec)
* In general, if you use the suggested cookie life, the value
* found in the field of measure of staleness should be doubled
* to give ample time to retransmit the new cookie and thus
* yield a higher probability of success on the reattempt.
*/
bht.p = {SCTP_COOKIE_PRESERVE, 8}
;
bht.extraTime = htonl(rtt + 1000)
;
stale
=
ntohl
(
*
(
suseconds_t
*
)((
u8
*
)
err
+
sizeof
(
sctp_errhdr_t
)))
;
stale
=
stale
<<
1
/
1000
;
initack.init_tag = htonl(asoc->c.my_vtag);
initack.a_rwnd = htonl(atomic_read(&asoc->rnwd));
initack.num_outbound_streams = htons(asoc->streamoutcnt);
initack.num_inbound_streams = htons(asoc->streamincnt);
initack.initial_tsn = htonl(asoc->c.initSeqNumber);
sctp_get_my_addrs(asoc, &addrs, &addrs_len);
bht
.
param_hdr
.
type
=
SCTP_PARAM_COOKIE_PRESERVATIVE
;
bht
.
param_hdr
.
length
=
htons
(
sizeof
(
bht
));
bht
.
lifespan_increment
=
htonl
(
stale
);
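A note on the `stale = stale << 1 / 1000;` statement in the hunk above: in C, `/` binds more tightly than `<<`, so the expression parses as `stale << (1 / 1000)`, which is `stale << 0` and leaves the value unchanged. The surrounding comment speaks of doubling the measure of staleness and of a usec to msec unit change, so the intent may have been `(stale << 1) / 1000`; that reading is an assumption, not something the diff states. The sketch below, with hypothetical function names, contrasts the two parses.

```c
#include <assert.h>

/* The expression exactly as it appears in the hunk.
 * Integer division gives 1 / 1000 == 0, so this is usecs << 0.
 */
unsigned long as_written(unsigned long usecs)
{
	return usecs << 1 / 1000;
}

/* Assumed intent: double the staleness, then convert usec to msec. */
unsigned long doubled_msec(unsigned long usecs)
{
	return (usecs << 1) / 1000;
}
```

For a staleness of 1 500 000 usec, `as_written()` returns 1 500 000 while `doubled_msec()` returns 3000.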
/* Build that new INIT chunk. */
reply = sctp_make_chunk(SCTP_INITIATION, 0,
sizeof(initack)
+ sizeof(bht)
+ addrs_len);
bp
=
(
sctp_bind_addr_t
*
)
&
asoc
->
base
.
bind_addr
;
reply
=
sctp_make_init
(
asoc
,
bp
,
GFP_ATOMIC
,
sizeof
(
bht
));
if
(
!
reply
)
goto
nomem
;
sctp_addto_chunk(reply, sizeof(initack), &initack);
sctp_addto_chunk
(
reply
,
sizeof
(
bht
),
&
bht
);
sctp_addto_chunk(reply, addrs_len, addrs);
/* Cast away the const modifier, as we want to just
* rerun it through as a side effect.
*/
sctp_add_cmd_sf
(
commands
,
SCTP_CMD_COUNTER_INC
,
SCTP_COUNTER
(
SCTP_COUNTER_INIT_ERROR
));
/* If we've sent any data bundled with COOKIE-ECHO we need to resend. */
list_for_each
(
pos
,
&
asoc
->
peer
.
transport_addr_list
)
{
t
=
list_entry
(
pos
,
sctp_transport_t
,
transports
);
sctp_add_cmd_sf
(
commands
,
SCTP_CMD_RETRAN
,
SCTP_TRANSPORT
(
t
));
}
sctp_add_cmd_sf
(
commands
,
SCTP_CMD_TIMER_STOP
,
SCTP_TO
(
SCTP_EVENT_TIMEOUT_T1_COOKIE
));
sctp_add_cmd_sf
(
commands
,
SCTP_CMD_NEW_STATE
,
SCTP_STATE
(
SCTP_STATE_COOKIE_WAIT
));
sctp_add_cmd_sf
(
commands
,
SCTP_CMD_TIMER_START
,
SCTP_TO
(
SCTP_EVENT_TIMEOUT_T1_INIT
));
sctp_add_cmd_sf
(
commands
,
SCTP_CMD_REPLY
,
SCTP_CHUNK
(
reply
));
return
SCTP_DISPOSITION_CONSUME
;
...
...
@@ -1809,7 +1868,6 @@ sctp_disposition_t do_5_2_6_stale(const sctp_endpoint_t *ep,
nomem:
return
SCTP_DISPOSITION_NOMEM
;
}
#endif /* 0 */
/*
* Process an ABORT.
...
...
@@ -3220,7 +3278,7 @@ sctp_disposition_t sctp_sf_do_prm_asoc(const sctp_endpoint_t *ep,
* 1 to 4294967295 (see 5.3.1 for Tag value selection). ...
*/
	repl = sctp_make_init(asoc, bp, GFP_ATOMIC);
	repl = sctp_make_init(asoc, bp, GFP_ATOMIC, 0);
	if (!repl)
		goto nomem;
...
...
@@ -3992,7 +4050,7 @@ sctp_disposition_t sctp_sf_t1_timer_expire(const sctp_endpoint_t *ep,
	switch (timer) {
	case SCTP_EVENT_TIMEOUT_T1_INIT:
		bp = (sctp_bind_addr_t *) &asoc->base.bind_addr;
		repl = sctp_make_init(asoc, bp, GFP_ATOMIC);
		repl = sctp_make_init(asoc, bp, GFP_ATOMIC, 0);
		break;
	case SCTP_EVENT_TIMEOUT_T1_COOKIE:
...
...
@@ -4334,3 +4392,25 @@ void sctp_ootb_pkt_free(sctp_packet_t *packet)
sctp_transport_free
(
packet
->
transport
);
sctp_packet_free
(
packet
);
}
/* Send a stale cookie error when an invalid COOKIE ECHO chunk is found */
void sctp_send_stale_cookie_err(const sctp_endpoint_t *ep,
				const sctp_association_t *asoc,
				const sctp_chunk_t *chunk,
				sctp_cmd_seq_t *commands,
				sctp_chunk_t *err_chunk)
{
	sctp_packet_t *packet;
	if (err_chunk) {
		packet = sctp_ootb_pkt_new(asoc, chunk);
		if (packet) {
			/* Set the skb to the belonging sock for accounting. */
			err_chunk->skb->sk = ep->base.sk;
			sctp_packet_append_chunk(packet, err_chunk);
			sctp_add_cmd_sf(commands, SCTP_CMD_SEND_PKT,
					SCTP_PACKET(packet));
		}
		else
			sctp_free_chunk(err_chunk);
	}
}
net/sctp/sm_statetable.c
...
...
@@ -295,7 +295,7 @@ sctp_sm_table_entry_t *sctp_sm_lookup_event(sctp_event_t event_type,
	/* SCTP_STATE_COOKIE_WAIT */ \
	{.fn = sctp_sf_not_impl, .name = "sctp_sf_not_impl"}, \
	/* SCTP_STATE_COOKIE_ECHOED */ \
	{.fn = sctp_sf_not_impl, .name = "sctp_sf_not_impl"}, \
	{.fn = sctp_sf_cookie_echoed_err, .name = "sctp_sf_cookie_echoed_err"}, \
	/* SCTP_STATE_ESTABLISHED */ \
	{.fn = sctp_sf_operr_notify, .name = "sctp_sf_operr_notify"}, \
	/* SCTP_STATE_SHUTDOWN_PENDING */ \
...
...
net/sctp/socket.c
...
...
@@ -87,12 +87,7 @@ static int sctp_wait_for_sndbuf(sctp_association_t *asoc, long *timeo_p,
int
msg_len
);
static
int
sctp_wait_for_packet
(
struct
sock
*
sk
,
int
*
err
,
long
*
timeo_p
);
static
int
sctp_wait_for_connect
(
sctp_association_t
*
asoc
,
long
*
timeo_p
);
static
inline
void
sctp_sk_addr_set
(
struct
sock
*
,
const
union
sctp_addr
*
newaddr
,
union
sctp_addr
*
saveaddr
);
static
inline
void
sctp_sk_addr_restore
(
struct
sock
*
,
const
union
sctp_addr
*
);
static
inline
int
sctp_verify_addr
(
struct
sock
*
,
struct
sockaddr
*
,
int
);
static
inline
int
sctp_verify_addr
(
struct
sock
*
,
union
sctp_addr
*
,
int
);
static
int
sctp_bindx_add
(
struct
sock
*
,
struct
sockaddr_storage
*
,
int
);
static
int
sctp_bindx_rem
(
struct
sock
*
,
struct
sockaddr_storage
*
,
int
);
static
int
sctp_do_bind
(
struct
sock
*
,
union
sctp_addr
*
,
int
);
...
...
@@ -133,101 +128,75 @@ int sctp_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len)
return
retval
;
}
static
long
sctp_get_port_local
(
struct
sock
*
,
un
signed
short
);
static
long
sctp_get_port_local
(
struct
sock
*
,
un
ion
sctp_addr
*
);
/*
Bind a local address either to an endpoint or to an association.
*/
SCTP_STATIC
int
sctp_do_bind
(
struct
sock
*
sk
,
union
sctp_addr
*
newaddr
,
int
addr_
len
)
/*
Verify this is a valid sockaddr.
*/
static
struct
sctp_af
*
sctp_sockaddr_af
(
struct
sctp_opt
*
opt
,
union
sctp_addr
*
addr
,
int
len
)
{
sctp_opt_t
*
sp
=
sctp_sk
(
sk
);
sctp_endpoint_t
*
ep
=
sp
->
ep
;
sctp_bind_addr_t
*
bp
=
&
ep
->
base
.
bind_addr
;
unsigned
short
sa_family
=
newaddr
->
sa
.
sa_family
;
union
sctp_addr
tmpaddr
,
saveaddr
;
unsigned
short
*
snum
;
int
ret
=
0
;
struct
sctp_af
*
af
;
SCTP_DEBUG_PRINTK
(
"sctp_do_bind(sk: %p, newaddr: %p, addr_len: %d)
\n
"
,
sk
,
newaddr
,
addr_len
);
/* Check minimum size. */
if
(
len
<
sizeof
(
struct
sockaddr
))
return
NULL
;
/* FIXME: This function needs to handle v4-mapped-on-v6
* addresses!
*/
if
(
PF_INET
==
sk
->
family
)
{
if
(
sa_family
!=
AF_INET
)
return
-
EINVAL
;
}
/* Does this PF support this AF? */
if
(
!
opt
->
pf
->
af_supported
(
addr
->
sa
.
sa_family
))
return
NULL
;
/*
Make a local copy of the new address.
*/
tmpaddr
=
*
newaddr
;
/*
If we get this far, af is valid.
*/
af
=
sctp_get_af_specific
(
addr
->
sa
.
sa_family
)
;
switch
(
sa_family
)
{
case
AF_INET
:
if
(
addr_len
<
sizeof
(
struct
sockaddr_in
))
return
-
EINVAL
;
if
(
len
<
af
->
sockaddr_len
)
return
NULL
;
ret
=
inet_addr_type
(
newaddr
->
v4
.
sin_addr
.
s_addr
);
return
af
;
}
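The new `sctp_sockaddr_af()` shown above validates a socket address in three steps: a minimum-size check, a check that the family is supported, and a check against the length that family requires. A minimal user-space sketch of the same sequence follows. `family_sockaddr_len()` and `sockaddr_looks_valid()` are hypothetical names, and the hard-coded family table stands in for the `af_supported` and `sockaddr_len` lookups in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>   /* struct sockaddr, AF_INET, AF_INET6 */
#include <netinet/in.h>   /* struct sockaddr_in, struct sockaddr_in6 */

/* Required sockaddr length for a supported family, or 0 if the family
 * is not supported (stand-in for the per-family table in the diff).
 */
static size_t family_sockaddr_len(sa_family_t family)
{
	switch (family) {
	case AF_INET:
		return sizeof(struct sockaddr_in);
	case AF_INET6:
		return sizeof(struct sockaddr_in6);
	default:
		return 0;
	}
}

/* Returns 1 if the address passes all three checks, 0 otherwise. */
int sockaddr_looks_valid(const struct sockaddr *addr, size_t len)
{
	size_t need;

	/* 1. Too short to even carry a family field reliably. */
	if (len < sizeof(struct sockaddr))
		return 0;
	/* 2. Is the family one we know about? */
	need = family_sockaddr_len(addr->sa_family);
	if (need == 0)
		return 0;
	/* 3. Long enough for that family's address structure? */
	return len >= need;
}
```

Checking the generic minimum first matters because `sa_family` may only be read once the buffer is known to be large enough to hold it.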
/* FIXME:
* Should we allow apps to bind to non-local addresses by
* checking the IP sysctl parameter "ip_nonlocal_bind"?
*/
if
(
newaddr
->
v4
.
sin_addr
.
s_addr
!=
INADDR_ANY
&&
ret
!=
RTN_LOCAL
)
return
-
EADDRNOTAVAIL
;
tmpaddr
.
v4
.
sin_port
=
htons
(
tmpaddr
.
v4
.
sin_port
);
snum
=
&
tmpaddr
.
v4
.
sin_port
;
break
;
/* Bind a local address either to an endpoint or to an association. */
SCTP_STATIC
int
sctp_do_bind
(
struct
sock
*
sk
,
union
sctp_addr
*
addr
,
int
len
)
{
sctp_opt_t
*
sp
=
sctp_sk
(
sk
);
sctp_endpoint_t
*
ep
=
sp
->
ep
;
sctp_bind_addr_t
*
bp
=
&
ep
->
base
.
bind_addr
;
struct
sctp_af
*
af
;
unsigned
short
snum
;
int
ret
=
0
;
case
AF_INET6
:
SCTP_V6
(
/* FIXME: Hui, please verify this. Looking at
* the ipv6 code I see a SIN6_LEN_RFC2133 check.
* I'm guessing that scope_id is a newer addition.
*/
if
(
addr_len
<
sizeof
(
struct
sockaddr_in6
))
SCTP_DEBUG_PRINTK
(
"sctp_do_bind(sk: %p, newaddr: %p, len: %d)
\n
"
,
sk
,
addr
,
len
);
/* Common sockaddr verification. */
af
=
sctp_sockaddr_af
(
sp
,
addr
,
len
);
if
(
!
af
)
return
-
EINVAL
;
/* FIXME - The support for IPv6 multiple types
* of addresses need to be added later.
*/
ret
=
sctp_ipv6_addr_type
(
&
newaddr
->
v6
.
sin6_addr
);
tmpaddr
.
v6
.
sin6_port
=
htons
(
tmpaddr
.
v6
.
sin6_port
);
snum
=
&
tmpaddr
.
v6
.
sin6_port
;
break
;
)
/* PF specific bind() address verification. */
if
(
!
sp
->
pf
->
bind_verify
(
sp
,
addr
))
return
-
EADDRNOTAVAIL
;
default:
return
-
EINVAL
;
};
snum
=
ntohs
(
addr
->
v4
.
sin_port
);
SCTP_DEBUG_PRINTK
(
"sctp_do_bind: port: %d, new port: %d
\n
"
,
bp
->
port
,
*
snum
);
bp
->
port
,
snum
);
/* We must either be unbound, or bind to the same port. */
if
(
bp
->
port
&&
(
*
snum
!=
bp
->
port
))
{
if
(
bp
->
port
&&
(
snum
!=
bp
->
port
))
{
SCTP_DEBUG_PRINTK
(
"sctp_do_bind:"
" New port %d does not match existing port "
"%d.
\n
"
,
*
snum
,
bp
->
port
);
"%d.
\n
"
,
snum
,
bp
->
port
);
return
-
EINVAL
;
}
if
(
*
snum
&&
*
snum
<
PROT_SOCK
&&
!
capable
(
CAP_NET_BIND_SERVICE
))
if
(
snum
&&
snum
<
PROT_SOCK
&&
!
capable
(
CAP_NET_BIND_SERVICE
))
return
-
EACCES
;
/* FIXME - Make socket understand that there might be multiple bind
* addresses and there will be multiple source addresses involved in
* routing and failover decisions.
*/
sctp_sk_addr_set
(
sk
,
&
tmpaddr
,
&
saveaddr
);
/* Make sure we are allowed to bind here.
* The function sctp_get_port_local() does duplicate address
* detection.
*/
if
((
ret
=
sctp_get_port_local
(
sk
,
*
snum
)))
{
sctp_sk_addr_restore
(
sk
,
&
saveaddr
);
if
((
ret
=
sctp_get_port_local
(
sk
,
addr
)))
{
if
(
ret
==
(
long
)
sk
)
{
/* This endpoint has a conflicting address. */
return
-
EINVAL
;
...
...
@@ -237,25 +206,32 @@ SCTP_STATIC int sctp_do_bind(struct sock *sk, union sctp_addr *newaddr,
}
/* Refresh ephemeral port. */
if
(
!*
snum
)
*
snum
=
inet_sk
(
sk
)
->
num
;
if
(
!
snum
)
snum
=
inet_sk
(
sk
)
->
num
;
/* The getsockname() API depends on 'sport' being set. */
inet_sk
(
sk
)
->
sport
=
htons
(
inet_sk
(
sk
)
->
num
);
/* Add the address to the bind address list. */
sctp_local_bh_disable
();
sctp_write_lock
(
&
ep
->
base
.
addr_lock
);
/* Use GFP_ATOMIC since BHs are disabled. */
if
((
ret
=
sctp_add_bind_addr
(
bp
,
&
tmpaddr
,
GFP_ATOMIC
)))
{
sctp_sk_addr_restore
(
sk
,
&
saveaddr
);
}
else
if
(
!
bp
->
port
)
{
bp
->
port
=
*
snum
;
}
addr
->
v4
.
sin_port
=
ntohs
(
addr
->
v4
.
sin_port
);
ret
=
sctp_add_bind_addr
(
bp
,
addr
,
GFP_ATOMIC
);
addr
->
v4
.
sin_port
=
htons
(
addr
->
v4
.
sin_port
);
if
(
!
ret
&&
!
bp
->
port
)
bp
->
port
=
snum
;
sctp_write_unlock
(
&
ep
->
base
.
addr_lock
);
sctp_local_bh_enable
();
/* Copy back into socket for getsockname() use. */
if
(
!
ret
)
{
inet_sk
(
sk
)
->
sport
=
htons
(
inet_sk
(
sk
)
->
num
);
af
->
to_sk
(
addr
,
sk
);
}
return
ret
;
}
...
...
@@ -735,7 +711,7 @@ SCTP_STATIC void sctp_close(struct sock *sk, long timeout)
SCTP_STATIC
int
sctp_msghdr_parse
(
const
struct
msghdr
*
,
sctp_cmsgs_t
*
);
SCTP_STATIC
int
sctp_sendmsg
(
struct
kiocb
*
iocb
,
struct
sock
*
sk
,
struct
msghdr
*
msg
,
int
size
)
struct
msghdr
*
msg
,
int
msg_len
)
{
sctp_opt_t
*
sp
;
sctp_endpoint_t
*
ep
;
...
...
@@ -750,13 +726,12 @@ SCTP_STATIC int sctp_sendmsg(struct kiocb *iocb, struct sock *sk,
sctp_assoc_t
associd
=
NULL
;
sctp_cmsgs_t
cmsgs
=
{
0
};
int
err
;
size_t
msg_len
;
sctp_scope_t
scope
;
long
timeo
;
__u16
sinfo_flags
=
0
;
SCTP_DEBUG_PRINTK
(
"sctp_sendmsg(sk: %p, msg: %p,
"
"size: %d)
\n
"
,
sk
,
msg
,
size
);
SCTP_DEBUG_PRINTK
(
"sctp_sendmsg(sk: %p, msg: %p,
msg_len: %d)
\n
"
,
sk
,
msg
,
msg_len
);
err
=
0
;
sp
=
sctp_sk
(
sk
);
...
...
@@ -778,12 +753,16 @@ SCTP_STATIC int sctp_sendmsg(struct kiocb *iocb, struct sock *sk,
* For a peeled-off socket, msg_name is ignored.
*/
	if ((SCTP_SOCKET_UDP_HIGH_BANDWIDTH != sp->type) && msg->msg_name) {
		err = sctp_verify_addr(sk, (struct sockaddr *)msg->msg_name,
				       msg->msg_namelen);
		int msg_namelen = msg->msg_namelen;
		err = sctp_verify_addr(sk, (union sctp_addr *)msg->msg_name,
				       msg_namelen);
		if (err)
			return err;
		memcpy(&to, msg->msg_name, msg->msg_namelen);
		if (msg_namelen > sizeof(to))
			msg_namelen = sizeof(to);
		memcpy(&to, msg->msg_name, msg_namelen);
		SCTP_DEBUG_PRINTK("Just memcpy'd. msg_name is "
				  "0x%x:%u.\n",
				  to.v4.sin_addr.s_addr, to.v4.sin_port);
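The hunk above stops copying `msg->msg_namelen` bytes blindly into `to` and instead clamps the length to `sizeof(to)` first, which keeps an oversized caller-supplied length from overrunning the destination. A minimal user-space sketch of the clamp-then-copy step follows; `copy_clamped()` is a hypothetical helper, not a function from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies at most dst_size bytes from src to dst.
 * Returns the number of bytes actually copied.
 */
size_t copy_clamped(void *dst, size_t dst_size, const void *src, size_t src_len)
{
	/* Clamp the caller-supplied length to the destination capacity. */
	if (src_len > dst_size)
		src_len = dst_size;
	memcpy(dst, src, src_len);
	return src_len;
}
```

Returning the clamped length lets the caller notice truncation by comparing it with the length it asked for.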
...
...
@@ -792,8 +771,6 @@ SCTP_STATIC int sctp_sendmsg(struct kiocb *iocb, struct sock *sk,
msg_name
=
msg
->
msg_name
;
}
msg_len
=
get_user_iov_size
(
msg
->
msg_iov
,
msg
->
msg_iovlen
);
sinfo
=
cmsgs
.
info
;
sinit
=
cmsgs
.
init
;
...
...
@@ -1216,9 +1193,11 @@ SCTP_STATIC int sctp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr
* Otherwise, set MSG_EOR indicating the end of a message.
*/
if
(
skb_len
>
copied
)
{
msg
->
msg_flags
&=
~
MSG_EOR
;
if
(
flags
&
MSG_PEEK
)
goto
out_free
;
sctp_skb_pull
(
skb
,
copied
);
skb_queue_head
(
&
sk
->
receive_queue
,
skb
);
msg
->
msg_flags
&=
~
MSG_EOR
;
goto
out
;
}
else
{
msg
->
msg_flags
|=
MSG_EOR
;
...
...
@@ -1335,6 +1314,16 @@ static inline int sctp_setsockopt_set_peer_addr_params(struct sock *sk,
return
0
;
}
static inline int sctp_setsockopt_initmsg(struct sock *sk, char *optval,
					  int optlen)
{
	if (optlen != sizeof(struct sctp_initmsg))
		return -EINVAL;
	if (copy_from_user(&sctp_sk(sk)->initmsg, optval, optlen))
		return -EFAULT;
	return 0;
}
/* API 6.2 setsockopt(), getsockopt()
*
* Applications use setsockopt() and getsockopt() to set or retrieve
...
...
@@ -1359,9 +1348,6 @@ SCTP_STATIC int sctp_setsockopt(struct sock *sk, int level, int optname,
{
int
retval
=
0
;
char
*
tmp
;
sctp_protocol_t
*
proto
=
sctp_get_protocol
();
struct
list_head
*
pos
;
sctp_func_t
*
af
;
SCTP_DEBUG_PRINTK
(
"sctp_setsockopt(sk: %p... optname: %d)
\n
"
,
sk
,
optname
);
...
...
@@ -1373,15 +1359,10 @@ SCTP_STATIC int sctp_setsockopt(struct sock *sk, int level, int optname,
* are at all well-founded.
*/
if
(
level
!=
SOL_SCTP
)
{
list_for_each
(
pos
,
&
proto
->
address_families
)
{
af
=
list_entry
(
pos
,
sctp_func_t
,
list
);
retval
=
af
->
setsockopt
(
sk
,
level
,
optname
,
optval
,
optlen
);
if
(
retval
<
0
)
struct
sctp_af
*
af
=
sctp_sk
(
sk
)
->
pf
->
af
;
retval
=
af
->
setsockopt
(
sk
,
level
,
optname
,
optval
,
optlen
);
goto
out_nounlock
;
}
}
sctp_lock_sock
(
sk
);
...
...
@@ -1430,6 +1411,10 @@ SCTP_STATIC int sctp_setsockopt(struct sock *sk, int level, int optname,
optlen
);
break
;
case
SCTP_INITMSG
:
retval
=
sctp_setsockopt_initmsg
(
sk
,
optval
,
optlen
);
break
;
default:
retval
=
-
ENOPROTOOPT
;
break
;
...
...
@@ -1484,7 +1469,7 @@ SCTP_STATIC int sctp_connect(struct sock *sk, struct sockaddr *uaddr,
goto
out_unlock
;
}
err
=
sctp_verify_addr
(
sk
,
uaddr
,
addr_len
);
err
=
sctp_verify_addr
(
sk
,
(
union
sctp_addr
*
)
uaddr
,
addr_len
);
if
(
err
)
goto
out_unlock
;
...
...
@@ -1938,13 +1923,19 @@ static inline int sctp_getsockopt_get_peer_addr_params(struct sock *sk,
return
0
;
}
static inline int sctp_getsockopt_initmsg(struct sock *sk, int len,
					  char *optval, int *optlen)
{
	if (len != sizeof(struct sctp_initmsg))
		return -EINVAL;
	if (copy_to_user(optval, &sctp_sk(sk)->initmsg, len))
		return -EFAULT;
	return 0;
}
SCTP_STATIC
int
sctp_getsockopt
(
struct
sock
*
sk
,
int
level
,
int
optname
,
char
*
optval
,
int
*
optlen
)
{
int
retval
=
0
;
sctp_protocol_t
*
proto
=
sctp_get_protocol
();
sctp_func_t
*
af
;
struct
list_head
*
pos
;
int
len
;
SCTP_DEBUG_PRINTK
(
"sctp_getsockopt(sk: %p, ...)
\n
"
,
sk
);
...
...
@@ -1956,14 +1947,11 @@ SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname,
 	 * are at all well-founded.
 	 */
 	if (level != SOL_SCTP) {
-		list_for_each(pos, &proto->address_families) {
-			af = list_entry(pos, sctp_func_t, list);
-			retval = af->getsockopt(sk, level, optname,
-						optval, optlen);
-			if (retval < 0)
+		struct sctp_af *af = sctp_sk(sk)->pf->af;
+		retval = af->getsockopt(sk, level, optname, optval, optlen);
 		return retval;
-		}
 	}
 	if (get_user(len, optlen))
 		return -EFAULT;
...
...
@@ -1997,6 +1985,10 @@ SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname,
 						optlen);
 		break;
+	case SCTP_INITMSG:
+		retval = sctp_getsockopt_initmsg(sk, len, optval, optlen);
+		break;
 	default:
 		retval = -ENOPROTOOPT;
 		break;
...
...
@@ -2030,13 +2022,18 @@ static void sctp_unhash(struct sock *sk)
  */
 static sctp_bind_bucket_t *sctp_bucket_create(sctp_bind_hashbucket_t *head,
 					      unsigned short snum);
-static long sctp_get_port_local(struct sock *sk, unsigned short snum)
+static long sctp_get_port_local(struct sock *sk, union sctp_addr *addr)
 {
 	sctp_bind_hashbucket_t *head; /* hash list */
 	sctp_bind_bucket_t *pp; /* hash list port iterator */
 	sctp_protocol_t *sctp = sctp_get_protocol();
+	unsigned short snum;
 	int ret;
+	/* NOTE: Remember to put this back to net order. */
+	addr->v4.sin_port = ntohs(addr->v4.sin_port);
+	snum = addr->v4.sin_port;
 	SCTP_DEBUG_PRINTK("sctp_get_port() begins, snum=%d\n", snum);
 	sctp_local_bh_disable();
...
...
@@ -2101,6 +2098,7 @@ static long sctp_get_port_local(struct sock *sk, unsigned short snum)
 		}
 	}
 	if (pp != NULL && pp->sk != NULL) {
 		/* We had a port hash table hit - there is an
 		 * available port (pp != NULL) and it is being
...
...
@@ -2108,7 +2106,6 @@ static long sctp_get_port_local(struct sock *sk, unsigned short snum)
 		 * socket is going to be sk2.
 		 */
 		int sk_reuse = sk->reuse;
-		union sctp_addr tmpaddr;
 		struct sock *sk2 = pp->sk;
 		SCTP_DEBUG_PRINTK("sctp_get_port() found a "
...
...
@@ -2116,27 +2113,6 @@ static long sctp_get_port_local(struct sock *sk, unsigned short snum)
 		if (pp->fastreuse != 0 && sk->reuse != 0)
 			goto success;
-		/* FIXME - multiple addresses need to be supported
-		 * later.
-		 */
-		switch (sk->family) {
-		case PF_INET:
-			tmpaddr.v4.sin_family = AF_INET;
-			tmpaddr.v4.sin_port = snum;
-			tmpaddr.v4.sin_addr.s_addr = inet_sk(sk)->rcv_saddr;
-			break;
-		case PF_INET6:
-			SCTP_V6(tmpaddr.v6.sin6_family = AF_INET6;
-				tmpaddr.v6.sin6_port = snum;
-				tmpaddr.v6.sin6_addr =
-					inet6_sk(sk)->rcv_saddr;
-			)
-			break;
-		default:
-			break;
-		};
 		/* Run through the list of sockets bound to the port
 		 * (pp->port) [via the pointers bind_next and
 		 * bind_pprev in the struct sock *sk2 (pp->sk)]. On each one,
...
...
@@ -2154,8 +2130,7 @@ static long sctp_get_port_local(struct sock *sk, unsigned short snum)
 			if (sk_reuse && sk2->reuse)
 				continue;
-			if (sctp_bind_addr_match(&ep2->base.bind_addr,
-						 &tmpaddr,
+			if (sctp_bind_addr_match(&ep2->base.bind_addr, addr,
 						 sctp_sk(sk)))
 				goto found;
 		}
...
...
@@ -2207,12 +2182,25 @@ static long sctp_get_port_local(struct sock *sk, unsigned short snum)
 	sctp_local_bh_enable();
 	SCTP_DEBUG_PRINTK("sctp_get_port() ends, ret=%d\n", ret);
+	addr->v4.sin_port = htons(addr->v4.sin_port);
 	return ret;
 }
 /* Assign a 'snum' port to the socket. If snum == 0, an ephemeral
  * port is requested.
  */
 static int sctp_get_port(struct sock *sk, unsigned short snum)
 {
-	long ret = sctp_get_port_local(sk, snum);
+	long ret;
+	union sctp_addr addr;
+	struct sctp_af *af = sctp_sk(sk)->pf->af;
+	/* Set up a dummy address struct from the sk. */
+	af->from_sk(&addr, sk);
+	addr.v4.sin_port = htons(snum);
+	/* Note: sk->num gets filled in if ephemeral port request. */
+	ret = sctp_get_port_local(sk, &addr);
 	return (ret ? 1 : 0);
 }
...
...
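After this change `sctp_get_port_local()` receives an address whose port is in network order, flips it to host order for its own use, and must flip it back before returning (the "put this back to net order" note). A small userspace sketch of that round trip follows. The `_sketch` function names and the `seen_host_port` out-parameter are hypothetical and exist only to make the conversion observable.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Stand-in for the local lookup: works on the port in host order and
 * hands the address back in network order.
 */
static long get_port_local_sketch(struct sockaddr_in *addr,
				  unsigned short *seen_host_port)
{
	unsigned short snum;

	assert(addr != NULL && seen_host_port != NULL);

	/* Convert in place so code below sees a host-order port. */
	addr->sin_port = ntohs(addr->sin_port);
	snum = addr->sin_port;
	*seen_host_port = snum;

	/* ... the port hash lookup would happen here ... */

	/* Restore network order before the caller sees the address again. */
	addr->sin_port = htons(addr->sin_port);
	return 0;
}

/* Stand-in for the wrapper: build a dummy address and pass it down.
 * Returns 0 when the lookup succeeded and saw the expected host-order port.
 */
static int get_port_sketch(unsigned short snum, struct sockaddr_in *out)
{
	unsigned short seen = 0;
	long ret;

	memset(out, 0, sizeof(*out));
	out->sin_family = AF_INET;
	out->sin_port = htons(snum);
	ret = get_port_local_sketch(out, &seen);
	return (ret == 0 && seen == snum) ? 0 : 1;
}
```

Converting in place keeps a single address object flowing through the helpers, at the cost that every exit path has to remember the restoring `htons()`.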
@@ -2413,7 +2401,7 @@ void sctp_put_port(struct sock *sk)
 static int sctp_autobind(struct sock *sk)
 {
 	union sctp_addr autoaddr;
-	struct sctp_func *af;
+	struct sctp_af *af;
 	unsigned short port;
 	/* Initialize a local sockaddr structure to INADDR_ANY. */
...
...
@@ -2537,58 +2525,6 @@ SCTP_STATIC int sctp_msghdr_parse(const struct msghdr *msg,
 	return 0;
 }
-/* Setup sk->rcv_saddr before calling get_port(). */
-static inline void sctp_sk_addr_set(struct sock *sk,
-				    const union sctp_addr *newaddr,
-				    union sctp_addr *saveaddr)
-{
-	struct inet_opt *inet = inet_sk(sk);
-	saveaddr->sa.sa_family = newaddr->sa.sa_family;
-	switch (newaddr->sa.sa_family) {
-	case AF_INET:
-		saveaddr->v4.sin_addr.s_addr = inet->rcv_saddr;
-		inet->rcv_saddr = inet->saddr = newaddr->v4.sin_addr.s_addr;
-		break;
-	case AF_INET6:
-		SCTP_V6({
-			struct ipv6_pinfo *np = inet6_sk(sk);
-			saveaddr->v6.sin6_addr = np->rcv_saddr;
-			np->rcv_saddr = np->saddr = newaddr->v6.sin6_addr;
-			break;
-		})
-	default:
-		break;
-	};
-}
-/* Restore sk->rcv_saddr after failing get_port(). */
-static inline void sctp_sk_addr_restore(struct sock *sk,
-					const union sctp_addr *addr)
-{
-	struct inet_opt *inet = inet_sk(sk);
-	switch (addr->sa.sa_family) {
-	case AF_INET:
-		inet->rcv_saddr = inet->saddr = addr->v4.sin_addr.s_addr;
-		break;
-	case AF_INET6:
-		SCTP_V6({
-			struct ipv6_pinfo *np = inet6_sk(sk);
-			np->rcv_saddr = np->saddr = addr->v6.sin6_addr;
-			break;
-		})
-	default:
-		break;
-	};
-}
 /*
  * Wait for a packet..
  * Note: This function is the same function as in core/datagram.c
...
...
@@ -2711,27 +2647,15 @@ static struct sk_buff *sctp_skb_recv_datagram(struct sock *sk, int flags, int no
 }
 /* Verify that this is a valid address. */
-static int sctp_verify_addr(struct sock *sk, struct sockaddr *addr, int len)
+static int sctp_verify_addr(struct sock *sk, union sctp_addr *addr, int len)
 {
-	struct sctp_func *af;
-	/* Check minimum size. */
-	if (len < sizeof(struct sockaddr))
-		return -EINVAL;
+	struct sctp_af *af;
-	/* Do we support this address family in general? */
-	af = sctp_get_af_specific(addr->sa_family);
+	/* Verify basic sockaddr. */
+	af = sctp_sockaddr_af(sctp_sk(sk), addr, len);
 	if (!af)
 		return -EINVAL;
-	/* Does this PF support this AF? */
-	if (!sctp_sk(sk)->pf->af_supported(addr->sa_family))
-		return -EINVAL;
-	/* Verify the minimum for this AF sockaddr. */
-	if (len < af->sockaddr_len)
-		return -EINVAL;
 	/* Is this a valid SCTP address? */
 	if (!af->addr_valid((union sctp_addr *)addr))
 		return -EINVAL;
...
...
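The lines removed from `sctp_verify_addr()` checked, in order, that the address family is known and that the supplied length covers that family's sockaddr; the hunk moves this behind a single `sctp_sockaddr_af()` call. A userspace sketch of that family lookup plus length check is below. The `af_sketch` table and the helper names are hypothetical. The per-socket `af_supported` test and `addr_valid` are not modeled, and nothing here claims to match the body of `sctp_sockaddr_af()`, which the diff does not show.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical per-family description; only the minimum length is kept. */
struct af_sketch {
	sa_family_t family;
	size_t sockaddr_len;
};

static const struct af_sketch af_table[] = {
	{ AF_INET,  sizeof(struct sockaddr_in)  },
	{ AF_INET6, sizeof(struct sockaddr_in6) },
};

/* Return the family entry when the family is known and len is sufficient,
 * otherwise NULL.
 */
static const struct af_sketch *sockaddr_af_sketch(const struct sockaddr *addr,
						   size_t len)
{
	size_t i;

	/* The family field itself must be readable before it is inspected. */
	if (len < sizeof(addr->sa_family))
		return NULL;
	for (i = 0; i < sizeof(af_table) / sizeof(af_table[0]); i++) {
		if (af_table[i].family != addr->sa_family)
			continue;
		return len >= af_table[i].sockaddr_len ? &af_table[i] : NULL;
	}
	return NULL;
}

/* Returns 0 for an acceptable address, -EINVAL otherwise. */
static int verify_addr_sketch(const struct sockaddr *addr, size_t len)
{
	assert(addr != NULL);
	return sockaddr_af_sketch(addr, len) ? 0 : -EINVAL;
}
```

Folding the lookup and the length test into one helper leaves the caller with a single NULL check, which is the shape the added lines in the hunk have.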
net/sctp/sysctl.c
/* SCTP kernel reference Implementation
* Copyright (c) 2002 International Business Machines Corp.
* Copyright (c) 2002 Intel Corp.
*
* This file is part of the SCTP kernel reference Implementation
*
...
...
@@ -32,6 +33,7 @@
* Written or modified by:
* Mingqin Liu <liuming@us.ibm.com>
* Jon Grimm <jgrimm@us.ibm.com>
* Ardelle Fan <ardelle.fan@intel.com>
*
* Any bugs reported given to us we will try to fix... any fixes shared will
* be incorporated into the next SCTP release.
...
...
@@ -70,6 +72,9 @@ static ctl_table sctp_table[] = {
 	{ NET_SCTP_HB_INTERVAL, "hb_interval", &sctp_proto.hb_interval,
 	  sizeof(int), 0644, NULL, &proc_dointvec_jiffies, &sysctl_jiffies },
+	{ NET_SCTP_PRESERVE_ENABLE, "cookie_preserve_enable",
+	  &sctp_proto.cookie_preserve_enable, sizeof(int), 0644, NULL,
+	  &proc_dointvec_jiffies, &sysctl_jiffies },
 	{ NET_SCTP_RTO_ALPHA, "rto_alpha_exp_divisor", &sctp_proto.rto_alpha,
 	  sizeof(int), 0644, NULL, &proc_dointvec },
...
...
net/sctp/transport.c
...
...
@@ -207,7 +207,7 @@ void sctp_transport_route(sctp_transport_t *transport, union sctp_addr *saddr,
 			  struct sctp_opt *opt)
 {
 	sctp_association_t *asoc = transport->asoc;
-	struct sctp_func *af = transport->af_specific;
+	struct sctp_af *af = transport->af_specific;
 	union sctp_addr *daddr = &transport->ipaddr;
 	sctp_bind_addr_t *bp;
 	rwlock_t *addr_lock;
...
...
net/sctp/ulpevent.c
...
...
@@ -762,6 +762,8 @@ static void sctp_rcvmsg_rfree(struct sk_buff *skb)
 {
 	sctp_association_t *asoc;
 	sctp_ulpevent_t *event;
+	sctp_chunk_t *sack;
+	struct timer_list *timer;
 	/* Current stack structures assume that the rcv buffer is
 	 * per socket. For UDP style sockets this is not true as
...
@@ -782,9 +784,39 @@ static void sctp_rcvmsg_rfree(struct sk_buff *skb)
 		asoc->rwnd += skb->len;
 	}
-	SCTP_DEBUG_PRINTK("rwnd increased by %d to (%u, %u)\n",
-			  skb->len, asoc->rwnd, asoc->rwnd_over);
+	SCTP_DEBUG_PRINTK("rwnd increased by %d to (%u, %u) - %u\n",
+			  skb->len, asoc->rwnd, asoc->rwnd_over, asoc->a_rwnd);
+	/* Send a window update SACK if the rwnd has increased by at least the
+	 * minimum of the association's PMTU and half of the receive buffer.
+	 * The algorithm used is similar to the one described in Section 4.2.3.3
+	 * of RFC 1122.
+	 */
+	if ((asoc->state == SCTP_STATE_ESTABLISHED) &&
+	    (asoc->rwnd > asoc->a_rwnd) &&
+	    ((asoc->rwnd - asoc->a_rwnd) >=
+	     min_t(__u32, (asoc->base.sk->rcvbuf >> 1), asoc->pmtu))) {
+		SCTP_DEBUG_PRINTK("Sending window update SACK- rwnd: %u "
+				  "a_rwnd: %u\n", asoc->rwnd, asoc->a_rwnd);
+		sack = sctp_make_sack(asoc);
+		if (!sack)
+			goto out;
+		/* Update the last advertised rwnd value. */
+		asoc->a_rwnd = asoc->rwnd;
+		asoc->peer.sack_needed = 0;
+		asoc->peer.next_dup_tsn = 0;
+		sctp_push_outqueue(&asoc->outqueue, sack);
+		/* Stop the SACK timer. */
+		timer = &asoc->timers[SCTP_EVENT_TIMEOUT_SACK];
+		if (timer_pending(timer) && del_timer(timer))
+			sctp_association_put(asoc);
+	}
+out:
 	sctp_association_put(asoc);
 }
...
...
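The window-update rule added to `sctp_rcvmsg_rfree()` reduces to one comparison: send a SACK when the receive window has grown past the last advertised value by at least min(rcvbuf / 2, PMTU). A standalone sketch of just that predicate follows. The function name and the flat parameter list are illustrative, and the association-state check from the hunk is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decide whether the receive window has opened far enough since the last
 * advertisement (a_rwnd) to justify an immediate window-update SACK.
 */
static bool window_update_needed(uint32_t rwnd, uint32_t a_rwnd,
				 uint32_t rcvbuf, uint32_t pmtu)
{
	uint32_t half_buf = rcvbuf >> 1;
	uint32_t threshold = half_buf < pmtu ? half_buf : pmtu;

	/* A zero PMTU would make every increase qualify. */
	assert(pmtu > 0);

	/* Test for growth first so the unsigned subtraction cannot wrap. */
	if (rwnd <= a_rwnd)
		return false;
	return (rwnd - a_rwnd) >= threshold;
}
```

With a 65536-byte buffer and a 1500-byte PMTU the threshold is 1500, so growth from 1000 to 3000 qualifies while growth from 1000 to 2000 does not.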