Commit 69904112 authored by David S. Miller

Merge nuts.ninka.net:/home/davem/src/BK/sctp-2.5

into nuts.ninka.net:/home/davem/src/BK/net-2.5
parents 54bf5a09 5d7154f7
@@ -17,6 +17,23 @@ extreme-linux and beowulf sites will not work with this version of the driver.
 For new versions of the driver, patches for older kernels and the updated
 userspace tools, please follow the links at the end of this file.
Table of Contents
=================
Installation
Bond Configuration
Module Parameters
Configuring Multiple Bonds
Switch Configuration
Verifying Bond Configuration
Frequently Asked Questions
High Availability
Promiscuous Sniffing notes
Limitations
Resources and Links
 Installation
 ============
@@ -51,16 +68,21 @@ To install ifenslave.c, do:
 # gcc -Wall -Wstrict-prototypes -O -I/usr/src/linux/include ifenslave.c -o ifenslave
 # cp ifenslave /sbin/ifenslave
-3) Configure your system
-------------------------
-Also see the following section on the module parameters. You will need to add
-at least the following line to /etc/conf.modules (or /etc/modules.conf):
+Bond Configuration
+==================
+You will need to add at least the following line to /etc/modules.conf
+so the bonding driver will automatically load when the bond0 interface is
+configured. Refer to the modules.conf manual page for specific modules.conf
+syntax details. The Module Parameters section of this document describes each
+bonding driver parameter.
 alias bond0 bonding
-Use standard distribution techniques to define bond0 network interface. For
-example, on modern RedHat distributions, create ifcfg-bond0 file in
-/etc/sysconfig/network-scripts directory that looks like this:
+Use standard distribution techniques to define the bond0 network interface. For
+example, on modern Red Hat distributions, create an ifcfg-bond0 file in
+the /etc/sysconfig/network-scripts directory that resembles the following:
 DEVICE=bond0
 IPADDR=192.168.1.1
@@ -71,12 +93,12 @@ ONBOOT=yes
 BOOTPROTO=none
 USERCTL=no
-(put the appropriate values for you network instead of 192.168.1).
+(use appropriate values for your network above)
-All interfaces that are part of the trunk, should have SLAVE and MASTER
-definitions. For example, in the case of RedHat, if you wish to make eth0 and
-eth1 (or other interfaces) a part of the bonding interface bond0, their config
-files (ifcfg-eth0, ifcfg-eth1, etc.) should look like this:
+All interfaces that are part of a bond should have SLAVE and MASTER
+definitions. For example, in the case of Red Hat, if you wish to make eth0 and
+eth1 a part of the bonding interface bond0, their config files (ifcfg-eth0 and
+ifcfg-eth1) should resemble the following:
 DEVICE=eth0
 USERCTL=no
@@ -85,89 +107,261 @@ MASTER=bond0
 SLAVE=yes
 BOOTPROTO=none
-(use DEVICE=eth1 for eth1 and MASTER=bond1 for bond1 if you have configured
-second bonding interface).
+Use DEVICE=eth1 in the ifcfg-eth1 config file. If you configure a second bonding
+interface (bond1), use MASTER=bond1 in the config file to make the network
+interface be a slave of bond1.
 Restart the networking subsystem or just bring up the bonding device if your
-administration tools allow it. Otherwise, reboot. (For the case of RedHat
-distros, you can do `ifup bond0' or `/etc/rc.d/init.d/network restart'.)
+administration tools allow it. Otherwise, reboot. On Red Hat distros you can
+issue `ifup bond0' or `/etc/rc.d/init.d/network restart'.
 If the administration tools of your distribution do not support master/slave
-notation in configuration of network interfaces, you will need to configure
-the bonding device with the following commands manually:
+notation in configuring network interfaces, you will need to manually configure
+the bonding device with the following commands:
-# /sbin/ifconfig bond0 192.168.1.1 up
+# /sbin/ifconfig bond0 192.168.1.1 netmask 255.255.255.0 \
+      broadcast 192.168.1.255 up
 # /sbin/ifenslave bond0 eth0
 # /sbin/ifenslave bond0 eth1
-(substitute 192.168.1.1 with your IP address and add custom network and custom
-netmask to the arguments of ifconfig if required).
+(use appropriate values for your network above)
-You can then create a script with these commands and put it into the appropriate
-rc directory.
+You can then create a script containing these commands and place it in the
+appropriate rc directory.
-If you specifically need that all your network drivers are loaded before the
-bonding driver, use one of modutils' powerful features : in your modules.conf,
-tell that when asked for bond0, modprobe should first load all your interfaces :
+If you specifically need all network drivers loaded before the bonding driver,
+adding the following line to modules.conf will cause the network driver for
+eth0 and eth1 to be loaded before the bonding driver.
 probeall bond0 eth0 eth1 bonding
-Be careful not to reference bond0 itself at the end of the line, or modprobe will
-die in an endless recursive loop.
+Be careful not to reference bond0 itself at the end of the line, or modprobe
+will die in an endless recursive loop.
-4) Module parameters.
----------------------
-The following module parameters can be passed:
-mode=
-Possible values are 0 (round robin policy, default) and 1 (active backup
-policy), and 2 (XOR). See question 9 and the HA section for additional info.
-miimon=
-Use integer value for the frequency (in ms) of MII link monitoring. Zero value
-is default and means the link monitoring will be disabled. A good value is 100
-if you wish to use link monitoring. See HA section for additional info.
-downdelay=
-Use integer value for delaying disabling a link by this number (in ms) after
-the link failure has been detected. Must be a multiple of miimon. Default
-value is zero. See HA section for additional info.
-updelay=
-Use integer value for delaying enabling a link by this number (in ms) after
-the "link up" status has been detected. Must be a multiple of miimon. Default
-value is zero. See HA section for additional info.
+To have device characteristics (such as MTU size) propagate to slave devices,
+set the bond characteristics before enslaving the device. The characteristics
+are propagated during the enslave process.
+If running SNMP agents, the bonding driver should be loaded before any network
+drivers participating in a bond. This requirement is due to the interface
+index (ipAdEntIfIndex) being associated to the first interface found with a
+given IP address. That is, there is only one ipAdEntIfIndex for each IP
+address. For example, if eth0 and eth1 are slaves of bond0 and the driver for
+eth0 is loaded before the bonding driver, the interface for the IP address
+will be associated with the eth0 interface. This configuration is shown below;
+the IP address 192.168.1.1 has an interface index of 2 which indexes to eth0
+in the ifDescr table (ifDescr.2).
+interfaces.ifTable.ifEntry.ifDescr.1 = lo
+interfaces.ifTable.ifEntry.ifDescr.2 = eth0
+interfaces.ifTable.ifEntry.ifDescr.3 = eth1
+interfaces.ifTable.ifEntry.ifDescr.4 = eth2
+interfaces.ifTable.ifEntry.ifDescr.5 = eth3
+interfaces.ifTable.ifEntry.ifDescr.6 = bond0
+ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.10.10.10 = 5
+ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.192.168.1.1 = 2
+ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.74.20.94 = 4
+ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1
+This problem is avoided by loading the bonding driver before any network
+drivers participating in a bond. Below is an example of loading the bonding
+driver first; the IP address 192.168.1.1 is correctly associated with ifDescr.2.
+interfaces.ifTable.ifEntry.ifDescr.1 = lo
+interfaces.ifTable.ifEntry.ifDescr.2 = bond0
+interfaces.ifTable.ifEntry.ifDescr.3 = eth0
+interfaces.ifTable.ifEntry.ifDescr.4 = eth1
+interfaces.ifTable.ifEntry.ifDescr.5 = eth2
+interfaces.ifTable.ifEntry.ifDescr.6 = eth3
+ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.10.10.10 = 6
+ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.192.168.1.1 = 2
+ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.74.20.94 = 5
+ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1
+While some distributions may not report the interface name in ifDescr,
+the association between the IP address and IfIndex remains and SNMP
+functions such as Interface_Scan_Next will report that association.
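The association between an IP address and an interface name can be checked
mechanically. The following is a minimal sketch, assuming the SNMP walk output
is available on standard input in the same "name = value" form as the two
listings in this section. The function name iface_for_ip is hypothetical and
is not part of the bonding driver or of any SNMP package.

```shell
# Hypothetical helper: resolve an IP address to the interface name that
# SNMP associates with it. Reads walk output in "name = value" form on stdin.
# Usage: iface_for_ip IP_ADDRESS
iface_for_ip() {
    awk -v ip="$1" '
        # Remember the name behind each interface index.
        /ifDescr\./ { n = split($1, p, "."); name[p[n]] = $3 }
        # Remember the index reported for the requested address.
        $1 ~ /ipAdEntIfIndex\./ {
            addr = $1
            sub(/^.*ipAdEntIfIndex\./, "", addr)
            if (addr == ip) idx = $3
        }
        END { if (idx != "") print name[idx] }
    '
}
```

Given the listing where eth0 is loaded first, "iface_for_ip 192.168.1.1"
prints eth0; given the listing where the bonding driver is loaded first, it
prints bond0.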
-arp_interval=
+Module Parameters
+=================
+Optional parameters for the bonding driver can be supplied as command line
+arguments to the insmod command. Typically, these parameters are specified in
+the file /etc/modules.conf (see the manual page for modules.conf). The
+available bonding driver parameters are listed below. If a parameter is not
+specified the default value is used. When initially configuring a bond, it
+is recommended "tail -f /var/log/messages" be run in a separate window to
+watch for bonding driver error messages.
+It is critical that either the miimon or arp_interval and arp_ip_target
+parameters be specified, otherwise serious network degradation will occur
+during link failures.
mode
Specifies one of four bonding policies. The default is round-robin.
Possible values are:
0 Round-robin policy: Transmit in a sequential order from the
first available slave through the last. This mode provides
load balancing and fault tolerance.
1 Active-backup policy: Only one slave in the bond is active. A
different slave becomes active if, and only if, the active slave
fails. The bond's MAC address is externally visible on only
one port (network adapter) to avoid confusing the switch.
This mode provides fault tolerance.
2 XOR policy: Transmit based on [(source MAC address XOR'd with
destination MAC address) modulo slave count]. This selects the
same slave for each destination MAC address. This mode provides
load balancing and fault tolerance.
3 Broadcast policy: transmits everything on all slave interfaces.
This mode provides fault tolerance.
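The XOR policy formula can be tried out by hand. The following is a minimal
sketch that takes the formula literally over the full 48-bit addresses; the
driver itself may hash only part of each address, so the helper
xor_slave_index (a hypothetical name) is illustrative only and its result is
not guaranteed to match the slave the driver picks.

```shell
# Hypothetical helper: slave index for a frame under the XOR policy,
# applying (source MAC XOR destination MAC) modulo slave count literally.
# Usage: xor_slave_index SRC_MAC DST_MAC SLAVE_COUNT
xor_slave_index() {
    # Strip the colons so each address becomes one hexadecimal number.
    src=$(printf '%s' "$1" | tr -d ':')
    dst=$(printf '%s' "$2" | tr -d ':')
    echo $(( (0x$src ^ 0x$dst) % $3 ))
}
```

Because the result depends only on the two addresses and the slave count, the
same source and destination pair always maps to the same slave.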
miimon
Specifies the frequency in milli-seconds that MII link monitoring will
occur. A value of zero disables MII link monitoring. A value of
100 is a good starting point. See High Availability section for
additional information. The default value is 0.
downdelay
Specifies the delay time in milli-seconds to disable a link after a
link failure has been detected. This should be a multiple of miimon
value, otherwise the value will be rounded. The default value is 0.
updelay
Specifies the delay time in milli-seconds to enable a link after a
link up status has been detected. This should be a multiple of miimon
value, otherwise the value will be rounded. The default value is 0.
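Since downdelay and updelay are rounded when they are not multiples of
miimon, it can help to compute the effective value before loading the driver.
The following is a minimal sketch; round_delay is a hypothetical helper, and
rounding down is an assumption, because the text above only says the value
will be rounded.

```shell
# Hypothetical helper: round a delay to a whole multiple of miimon.
# Rounding down is an assumption; the direction is not stated in the text.
# Usage: round_delay DELAY_MS MIIMON_MS
round_delay() {
    if [ "$2" -le 0 ]; then
        echo "cannot round against a miimon value of $2" >&2
        return 1
    fi
    echo $(( $1 / $2 * $2 ))
}
```

For example, "round_delay 250 100" prints 200, while a delay that is already
a multiple of miimon is printed unchanged.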
arp_interval
Specifies the ARP monitoring frequency in milli-seconds.
If ARP monitoring is used in a load-balancing mode (mode 0 or 2), the
switch should be configured in a mode that evenly distributes packets
across all links - such as round-robin. If the switch is configured to
distribute the packets in an XOR fashion, all replies from the ARP
targets will be received on the same link which could cause the other
team members to fail. ARP monitoring should not be used in conjunction
with miimon. A value of 0 disables ARP monitoring. The default value
is 0.
arp_ip_target
Specifies the IP addresses to use when arp_interval is > 0. These are
the targets of the ARP requests sent to determine the health of the link
to the targets. Specify these values in ddd.ddd.ddd.ddd format.
Multiple IP addresses must be separated by a comma. At least one IP
address needs to be given for ARP monitoring to work. The maximum number
of targets that can be specified is set at 16.
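The two limits on arp_ip_target (at least one target, at most 16) can be
checked before the value is written to modules.conf. The following is a
minimal sketch; check_arp_targets is a hypothetical helper and it does not
validate the ddd.ddd.ddd.ddd format of each address.

```shell
# Hypothetical helper: check a comma-separated arp_ip_target list against
# the limits stated above. Prints the number of targets on success.
# Usage: check_arp_targets TARGET_LIST
check_arp_targets() {
    [ -n "$1" ] || { echo "no ARP target given" >&2; return 1; }
    # Count the comma-separated fields.
    count=$(printf '%s\n' "$1" | awk -F, '{ print NF }')
    if [ "$count" -gt 16 ]; then
        echo "too many ARP targets: $count (maximum is 16)" >&2
        return 1
    fi
    echo "$count"
}
```

For example, "check_arp_targets 192.168.0.1,192.168.0.3,192.168.0.9" prints 3.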
primary
A string (eth0, eth2, etc) to equate to a primary device. If this
value is entered, and the device is on-line, it will be used first as
the output media. Only when this device is off-line will alternate
devices be used. Otherwise, once a failover is detected and a new
default output is chosen, it will remain the output media until it too
fails. This is useful when one slave is preferred over another, e.g.
when one slave is 1000Mbps and another is 100Mbps. If the 1000Mbps
slave fails and is later restored, it may be preferable for the faster
slave to gracefully become the active slave again, without deliberately
failing the 100Mbps slave. Specifying a primary is only valid in
active-backup mode.
multicast
Integer value for the mode of operation for multicast support.
Possible values are:
0 Disabled (no multicast support)
1 Enabled on active slave only, useful in active-backup mode
2 Enabled on all slaves, this is the default
Configuring Multiple Bonds
==========================
If several bonding interfaces are required, the driver must be loaded
multiple times. For example, to configure two bonding interfaces with link
monitoring performed every 100 milli-seconds, the /etc/conf.modules should
resemble the following:
-Use integer value for the frequency (in ms) of arp monitoring. Zero value
-is default and means the arp monitoring will be disabled. See HA section
-for additional info. This field is value in active_backup mode only.
-arp_ip_target=
+alias bond0 bonding
+alias bond1 bonding
+options bond0 miimon=100
+options bond1 -o bonding1 miimon=100
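The same pattern can be generated for any number of bonds. The following is a
minimal sketch; bond_modules_conf is a hypothetical helper, and the
"-o bondingN" naming for bonds beyond the second is extrapolated from the
two-bond example, so treat its output as a starting point to review rather
than a verified configuration.

```shell
# Hypothetical helper: print modules.conf lines for COUNT bonds, following
# the pattern of the two-bond example ("-o bondingN" is an extrapolation).
# Usage: bond_modules_conf COUNT MIIMON_MS
bond_modules_conf() {
    i=0
    while [ "$i" -lt "$1" ]; do
        echo "alias bond$i bonding"
        i=$((i + 1))
    done
    i=0
    while [ "$i" -lt "$1" ]; do
        if [ "$i" -eq 0 ]; then
            echo "options bond0 miimon=$2"
        else
            echo "options bond$i -o bonding$i miimon=$2"
        fi
        i=$((i + 1))
    done
}
```

Running "bond_modules_conf 2 100" reproduces the four lines of the two-bond
example.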
-An ip address to use when arp_interval is > 0. This is the target of the
-arp request sent to determine the health of the link to the target.
-Specify this value in ddd.ddd.ddd.ddd format.
-If you need to configure several bonding devices, the driver must be loaded
-several times. I.e. for two bonding devices, your /etc/conf.modules must look
-like this:
+Configuring Multiple ARP Targets
+================================
+While ARP monitoring can be done with just one target, it can be useful
+in a High Availability setup to have several targets to monitor. In the
+case of just one target, the target itself may go down or have a problem
+making it unresponsive to ARP requests. Having an additional target (or
+several) would increase the reliability of the ARP monitoring.
+Multiple ARP targets must be separated by commas as follows:
+# example options for ARP monitoring with three targets
 alias bond0 bonding
-alias bond1 bonding
-options bond0 miimon=100
-options bond1 -o bonding1 miimon=100
-5) Testing configuration
-------------------------
-You can test the configuration and transmit policy with ifconfig. For example,
-for round robin policy, you should get something like this:
+options bond0 arp_interval=60 arp_ip_target=192.168.0.1,192.168.0.3,192.168.0.9
+For just a single target the options would resemble:
+# example options for ARP monitoring with one target
+alias bond0 bonding
+options bond0 arp_interval=60 arp_ip_target=192.168.0.100
Switch Configuration
====================
While the switch does not need to be configured when the active-backup
policy is used (mode=1), it does need to be configured for the round-robin,
XOR, and broadcast policies (mode=0, mode=2, and mode=3).
Verifying Bond Configuration
============================
1) Bonding information files
----------------------------
The bonding driver information files reside in the /proc/net/bond* directories.
Sample contents of /proc/net/bond0/info after the driver is loaded with
parameters of mode=0 and miimon=1000 are shown below.
Bonding Mode: load balancing (round-robin)
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 1000
Up Delay (ms): 0
Down Delay (ms): 0
Slave Interface: eth1
MII Status: up
Link Failure Count: 1
Slave Interface: eth0
MII Status: up
Link Failure Count: 1
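The information file is easy to summarize with standard tools. The following
is a minimal sketch that assumes the file keeps the "Name: value" layout of
the sample above; slave_summary is a hypothetical helper, not a tool shipped
with the driver.

```shell
# Hypothetical helper: list each slave with its MII status and link failure
# count, reading text in the layout of the sample above on stdin.
# Usage: slave_summary < /proc/net/bond0/info
slave_summary() {
    awk -F': *' '
        # Drop any leading indentation before the fields are compared.
        { sub(/^[ \t]+/, "") }
        $1 == "Slave Interface" { slave = $2; status = "unknown" }
        # Ignore the bond-wide MII status that precedes the first slave.
        $1 == "MII Status" && slave != "" { status = $2 }
        $1 == "Link Failure Count" && slave != "" { print slave, status, $2 }
    '
}
```

For the sample above this prints "eth1 up 1" followed by "eth0 up 1".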
2) Network verification
-----------------------
The network configuration can be verified using the ifconfig command. In
the example below, the bond0 interface is the master (MASTER) while eth0 and
eth1 are slaves (SLAVE). Notice all slaves of bond0 have the same MAC address
(HWaddr) as bond0.
 [root]# /sbin/ifconfig
 bond0 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4
@@ -193,12 +387,13 @@ eth1 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4
 collisions:0 txqueuelen:100
 Interrupt:9 Base address:0x1400
-Questions :
-===========
+Frequently Asked Questions
+==========================
 1. Is it SMP safe?
 Yes. The old 2.0.xx channel bonding patch was not SMP safe.
 The new driver was designed to be SMP safe from the start.
 2. What type of cards will work with it?
@@ -209,31 +404,30 @@ Questions :
 3. How many bonding devices can I have?
-One for each module you load. See section on module parameters for how
+One for each module you load. See section on Module Parameters for how
 to accomplish this.
 4. How many slaves can a bonding device have?
-Limited by the number of network interfaces Linux supports and the
-number of cards you can place in your system.
+Limited by the number of network interfaces Linux supports and/or the
+number of network cards you can place in your system.
 5. What happens when a slave link dies?
-If your ethernet cards support MII status monitoring and the MII
-monitoring has been enabled in the driver (see description of module
-parameters), there will be no adverse consequences. This release
-of the bonding driver knows how to get the MII information and
+If your ethernet cards support MII or ETHTOOL link status monitoring
+and the MII monitoring has been enabled in the driver (see description
+of module parameters), there will be no adverse consequences. This
+release of the bonding driver knows how to get the MII information and
 enables or disables its slaves according to their link status.
-See section on HA for additional information.
+See section on High Availability for additional information.
-For ethernet cards not supporting MII status, or if you wish to
-verify that packets have been both send and received, you may
-configure the arp_interval and arp_ip_target. If packets have
-not been sent or received during this interval, an arp request
-is sent to the target to generate send and receive traffic.
-If after this interval, either the successful send and/or
-receive count has not incremented, the next slave in the sequence
-will become the active slave.
+For ethernet cards not supporting MII status, the arp_interval and
+arp_ip_target parameters must be specified for bonding to work
+correctly. If packets have not been sent or received during the
+specified arp_interval duration, an ARP request is sent to the targets
+to generate send and receive traffic. If after this interval, either
+the successful send and/or receive count has not incremented, the next
+slave in the sequence will become the active slave.
 If neither mii_monitor nor arp_interval is configured, the bonding
 driver will not handle this situation very well. The driver will
@@ -245,11 +439,12 @@ Questions :
 6. Can bonding be used for High Availability?
 Yes, if you use MII monitoring and ALL your cards support MII link
-status reporting. See section on HA for more information.
+status reporting. See section on High Availability for more information.
 7. Which switches/systems does it work with?
-In round-robin mode, it works with systems that support trunking:
+In round-robin and XOR mode, it works with systems that support
+trunking:
 * Cisco 5500 series (look for EtherChannel support).
 * SunTrunking software.
@@ -259,7 +454,8 @@ Questions :
 units.
 * Linux bonding, of course !
-In Active-backup mode, it should work with any Layer-II switches.
+In active-backup mode, it should work with any Layer-II switches.
 8. Where does a bonding device get its MAC address from?
@@ -297,55 +493,68 @@ Questions :
 9. Which transmit policies can be used?
-Round robin, based on the order of enslaving, the output device
+Round-robin, based on the order of enslaving, the output device
 is selected based on the next available slave. Regardless of
 the source and/or destination of the packet.
-XOR, based on (src hw addr XOR dst hw addr) % slave cnt. This
-selects the same slave for each destination hw address.
 Active-backup policy that ensures that one and only one device will
 transmit at any given moment. Active-backup policy is useful for
 implementing high availability solutions using two hubs (see
-section on HA).
+section on High Availability).
-High availability
-=================
+XOR, based on (src hw addr XOR dst hw addr) % slave count. This
+policy selects the same slave for each destination hw address.
+Broadcast policy transmits everything on all slave interfaces.
-To implement high availability using the bonding driver, you need to
-compile the driver as module because currently it is the only way to pass
-parameters to the driver. This may change in the future.
-High availability is achieved by using MII status reporting. You need to
-verify that all your interfaces support MII link status reporting. On Linux
-kernel 2.2.17, all the 100 Mbps capable drivers and yellowfin gigabit driver
-support it. If your system has an interface that does not support MII status
-reporting, a failure of its link will not be detected!
-The bonding driver can regularly check all its slaves links by checking the
-MII status registers. The check interval is specified by the module argument
-"miimon" (MII monitoring). It takes an integer that represents the
-checking time in milliseconds. It should not come to close to (1000/HZ)
-(10 ms on i386) because it may then reduce the system interactivity. 100 ms
-seems to be a good value. It means that a dead link will be detected at most
-100 ms after it goes down.
+High Availability
+=================
+To implement high availability using the bonding driver, the driver needs to be
+compiled as a module, because currently it is the only way to pass parameters
+to the driver. This may change in the future.
+High availability is achieved by using MII or ETHTOOL status reporting. You
+need to verify that all your interfaces support MII or ETHTOOL link status
+reporting. On Linux kernel 2.2.17, all the 100 Mbps capable drivers and
+yellowfin gigabit driver support MII. To determine if ETHTOOL link reporting
+is available for interface eth0, type "ethtool eth0" and the "Link detected:"
+line should contain the correct link status. If your system has an interface
+that does not support MII or ETHTOOL status reporting, a failure of its link
+will not be detected! A message indicating MII and ETHTOOL is not supported by
+a network driver is logged when the bonding driver is loaded with a non-zero
+miimon value.
+The bonding driver can regularly check all its slaves' links using the ETHTOOL
+IOCTL (ETHTOOL_GLINK command) or by checking the MII status registers. The
+check interval is specified by the module argument "miimon" (MII monitoring).
+It takes an integer that represents the checking time in milliseconds. It
+should not come too close to (1000/HZ) (10 milli-seconds on i386) because it
+may then reduce the system interactivity. A value of 100 seems to be a good
+starting point. It means that a dead link will be detected at most 100
+milli-seconds after it goes down.
 Example:
 # modprobe bonding miimon=100
-Or, put in your /etc/modules.conf :
+Or, put the following lines in /etc/modules.conf:
 alias bond0 bonding
 options bond0 miimon=100
-There are currently two policies for high availability, depending on whether
-a) hosts are connected to a single host or switch that support trunking
-b) hosts are connected to several different switches or a single switch that
-does not support trunking.
+There are currently two policies for high availability. They are dependent on
+whether:
+a) hosts are connected to a single host or switch that supports trunking
+b) hosts are connected to several different switches or a single switch that
+does not support trunking
-1) HA on a single switch or host - load balancing
--------------------------------------------------
+1) High Availability on a single switch or host - load balancing
+----------------------------------------------------------------
 It is the easiest to set up and to understand. Simply configure the
 remote equipment (host or switch) to aggregate traffic over several
 ports (Trunk, EtherChannel, etc.) and configure the bonding interfaces.
@@ -356,7 +565,7 @@ encounter problems on some buggy switches that disable the trunk for a
 long time if all ports in a trunk go down. This is not Linux, but really
 the switch (reboot it to ensure).
-Example 1 : host to host at double speed
+Example 1 : host to host at twice the speed
 +----------+                          +----------+
 |          |eth0                  eth0|          |
@@ -370,7 +579,7 @@ Example 1 : host to host at double speed
 # ifconfig bond0 addr
 # ifenslave bond0 eth0 eth1
-Example 2 : host to switch at double speed
+Example 2 : host to switch at twice the speed
 +----------+                          +----------+
 |          |eth0                 port1|          |
@@ -384,7 +593,9 @@ Example 2 : host to switch at double speed
 # ifconfig bond0 addr                    and port2
 # ifenslave bond0 eth0 eth1
-2) HA on two or more switches (or a single switch without trunking support)
+2) High Availability on two or more switches (or a single switch without
+trunking support)
 ---------------------------------------------------------------------------
 This mode is more problematic because it relies on the fact that there
 are multiple ports and the host's MAC address should be visible on one
@@ -423,14 +634,14 @@ point of failure" solution.
 +--------------+ host2 +----------------+
 eth0 +-------+ eth1
-In this configuration, there are an ISL - Inter Switch Link (could be a trunk),
+In this configuration, there is an ISL - Inter Switch Link (could be a trunk),
 several servers (host1, host2 ...) attached to both switches each, and one or
 more ports to the outside world (port3...). One and only one slave on each host
 is active at a time, while all links are still monitored (the system can
 detect a failure of active and backup links).
 Each time a host changes its active interface, it sticks to the new one until
-it goes down. In this example, the hosts are not too much affected by the
+it goes down. In this example, the hosts are negligibly affected by the
 expiration time of the switches' forwarding tables.
 If host1 and host2 have the same functionality and are used in load balancing
@@ -460,6 +671,7 @@ Each time the host changes its active interface, it sticks to the new one until
 it goes down. In this example, the host is strongly affected by the expiration
 time of the switch forwarding table.
 3) Adapting to your switches' timing
 ------------------------------------
 If your switches take a long time to go into backup mode, it may be
@@ -488,8 +700,34 @@ Examples :
 # modprobe bonding miimon=100 mode=1 downdelay=2000 updelay=5000
 # modprobe bonding miimon=100 mode=0 downdelay=0 updelay=5000
-4) Limitations
---------------
+Promiscuous Sniffing notes
+==========================
If you wish to bond channels together for a network sniffing
application --- you wish to run tcpdump, or ethereal, or an IDS like
snort, with its input aggregated from multiple interfaces using the
bonding driver --- then you need to handle the Promiscuous interface
setting by hand. Specifically, when you "ifconfig bond0 up" you
must add the promisc flag there; it will be propagated down to the
slave interfaces at ifenslave time; a full example might look like:
# append with >> so existing modules.conf entries are not overwritten
grep bond0 /etc/modules.conf || echo alias bond0 bonding >>/etc/modules.conf
ifconfig bond0 promisc up
for dev in eth1 eth2 ...;do
    ifconfig $dev up
    ifenslave bond0 $dev
done
snort ... -i bond0 ...
Ifenslave also wants to propagate addresses from interface to
interface, appropriately for its design functions in HA and channel
capacity aggregating; but it works fine for unnumbered interfaces;
just ignore all the warnings it emits.
+Limitations
+===========
 The main limitations are :
 - only the link status is monitored. If the switch on the other side is
 partially down (e.g. doesn't forward anymore, but the link is OK), the link
@@ -500,7 +738,13 @@ The main limitations are :
 Use the arp_interval/arp_ip_target parameters to count incoming/outgoing
 frames.
+- A Transmit Load Balancing policy is not currently available. This mode
+allows every slave in the bond to transmit while only one receives. If
+the "receiving" slave fails, another slave takes over the MAC address of
+the failed receiving slave.
-Resources and links
+Resources and Links
 ===================
 Current development on this driver is posted to:
......
...@@ -41,6 +41,16 @@ ...@@ -41,6 +41,16 @@
* - 2002/02/18 Erik Habbinga <erik_habbinga @ hp dot com> : * - 2002/02/18 Erik Habbinga <erik_habbinga @ hp dot com> :
* - ifr2.ifr_flags was not initialized in the hwaddr_notset case, * - ifr2.ifr_flags was not initialized in the hwaddr_notset case,
* SIOCGIFFLAGS now called before hwaddr_notset test * SIOCGIFFLAGS now called before hwaddr_notset test
*
* - 2002/10/31 Tony Cureington <tony.cureington * hp_com> :
* - If the master does not have a hardware address when the first slave
* is enslaved, the master is assigned the hardware address of that
* slave - there is a comment in bonding.c stating "ifenslave takes
* care of this now." This corrects the problem of slaves having
* different hardware addresses in active-backup mode when
* multiple interfaces are specified on a single ifenslave command
* (ifenslave bond0 eth0 eth1).
*
*/ */
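The 2002/10/31 changelog entry above describes an ordering rule that is
easy to get wrong, so here is a small stand-alone simulation of it. This is
a sketch under stated assumptions, not the ifenslave code: struct fake_if,
mac_is_unset() and enslave() are hypothetical names, and "all-zero address
means not set" is an assumption made for the illustration.

```c
#include <assert.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical stand-in for a network interface. */
struct fake_if {
	unsigned char mac[MAC_LEN];
};

/* Assumption: an all-zero address means "hardware address not set". */
int mac_is_unset(const unsigned char *mac)
{
	static const unsigned char zero[MAC_LEN];

	return memcmp(mac, zero, MAC_LEN) == 0;
}

/*
 * The check is repeated for every slave, so only the first slave can
 * donate its address; every later slave receives the master's address.
 */
void enslave(struct fake_if *master, struct fake_if *slave)
{
	assert(master != NULL && slave != NULL);
	if (mac_is_unset(master->mac))
		memcpy(master->mac, slave->mac, MAC_LEN);
	else
		memcpy(slave->mac, master->mac, MAC_LEN);
}
```

Enslaving two interfaces in one pass, as in "ifenslave bond0 eth0 eth1",
leaves the master and both slaves with the first slave's address.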
static char *version = static char *version =
...@@ -131,6 +141,7 @@ main(int argc, char **argv) ...@@ -131,6 +141,7 @@ main(int argc, char **argv)
sa_family_t master_family; sa_family_t master_family;
char **spp, *master_ifname, *slave_ifname; char **spp, *master_ifname, *slave_ifname;
int hwaddr_notset; int hwaddr_notset;
int master_up;
while ((c = getopt_long(argc, argv, "acdfrvV?h", longopts, 0)) != EOF) while ((c = getopt_long(argc, argv, "acdfrvV?h", longopts, 0)) != EOF)
switch (c) { switch (c) {
...@@ -300,10 +311,86 @@ main(int argc, char **argv) ...@@ -300,10 +311,86 @@ main(int argc, char **argv)
return 1; return 1;
} }
if (hwaddr_notset) { /* we do nothing */ if (hwaddr_notset) {
/* assign the slave hw address to the
* master since it currently does not
* have one; otherwise, slaves may
* have different hw addresses in
* active-backup mode as seen when enslaving
* using "ifenslave bond0 eth0 eth1" because
* hwaddr_notset is set outside this loop.
* TODO: put this and the "else" portion in
* a function.
*/
goterr = 0;
master_up = 0;
if (if_flags.ifr_flags & IFF_UP) {
if_flags.ifr_flags &= ~IFF_UP;
if (ioctl(skfd, SIOCSIFFLAGS,
&if_flags) < 0) {
goterr = 1;
fprintf(stderr,
"Shutting down "
"interface %s failed: "
"%s\n",
master_ifname,
strerror(errno));
} else {
/* we took the master down,
* so we must bring it up
*/
master_up = 1;
}
}
} if (!goterr) {
else { /* we'll assign master's hwaddr to this slave */ /* get the slaves MAC address */
strncpy(if_hwaddr.ifr_name,
slave_ifname, IFNAMSIZ);
if (ioctl(skfd, SIOCGIFHWADDR,
&if_hwaddr) < 0) {
fprintf(stderr,
"Could not get MAC "
"address of %s: %s\n",
slave_ifname,
strerror(errno));
strncpy(if_hwaddr.ifr_name,
master_ifname,
IFNAMSIZ);
goterr=1;
}
}
if (!goterr) {
strncpy(if_hwaddr.ifr_name,
master_ifname, IFNAMSIZ);
if (ioctl(skfd, SIOCSIFHWADDR,
&if_hwaddr) < 0) {
fprintf(stderr,
"Could not set MAC "
"address of %s: %s\n",
master_ifname,
strerror(errno));
goterr=1;
} else {
hwaddr_notset = 0;
}
}
if (master_up) {
if_flags.ifr_flags |= IFF_UP;
if (ioctl(skfd, SIOCSIFFLAGS,
&if_flags) < 0) {
fprintf(stderr,
"Bringing up interface "
"%s failed: %s\n",
master_ifname,
strerror(errno));
}
}
} else {
/* we'll assign master's hwaddr to this slave */
if (ifr2.ifr_flags & IFF_UP) { if (ifr2.ifr_flags & IFF_UP) {
ifr2.ifr_flags &= ~IFF_UP; ifr2.ifr_flags &= ~IFF_UP;
if (ioctl(skfd, SIOCSIFFLAGS, &ifr2) < 0) { if (ioctl(skfd, SIOCSIFFLAGS, &ifr2) < 0) {
......
...@@ -1521,6 +1521,12 @@ M: Kai.Makisara@metla.fi ...@@ -1521,6 +1521,12 @@ M: Kai.Makisara@metla.fi
L: linux-scsi@vger.kernel.org L: linux-scsi@vger.kernel.org
S: Maintained S: Maintained
SCTP PROTOCOL
P: Jon Grimm
M: jgrimm2@us.ibm.com
L: lksctp-developers@lists.sourceforge.net
S: Supported
SCx200 CPU SUPPORT SCx200 CPU SUPPORT
P: Christer Weinigel P: Christer Weinigel
M: christer@weinigel.se M: christer@weinigel.se
......
...@@ -177,20 +177,91 @@ ...@@ -177,20 +177,91 @@
* - Port Gleb Natapov's multicast support patches from 2.4.12 * - Port Gleb Natapov's multicast support patches from 2.4.12
* to 2.4.18 adding support for multicast. * to 2.4.18 adding support for multicast.
* *
* 2002/06/17 - Tony Cureington <tony.cureington * hp_com> * 2002/06/10 - Tony Cureington <tony.cureington * hp_com>
* - corrected uninitialized pointer (ifr.ifr_data) in bond_check_dev_link; * - corrected uninitialized pointer (ifr.ifr_data) in bond_check_dev_link;
* actually changed function to use ETHTOOL, then MIIPHY, and finally * actually changed function to use MIIPHY, then MIIREG, and finally
* MIIREG to determine the link status * ETHTOOL to determine the link status
* - fixed bad ifr_data pointer assignments in bond_ioctl * - fixed bad ifr_data pointer assignments in bond_ioctl
* - corrected mode 1 being reported as active-backup in bond_get_info; * - corrected mode 1 being reported as active-backup in bond_get_info;
* also added text to distinguish type of load balancing (rr or xor) * also added text to distinguish type of load balancing (rr or xor)
* - change arp_ip_target module param from "1-12s" (array of 12 ptrs) * - change arp_ip_target module param from "1-12s" (array of 12 ptrs)
* to "s" (a single ptr) * to "s" (a single ptr)
* *
* 2002/08/30 - Jay Vosburgh <fubar at us dot ibm dot com>
* - Removed acquisition of xmit_lock in set_multicast_list; caused
* deadlock on SMP (lock is held by caller).
* - Revamped SIOCGMIIPHY, SIOCGMIIREG portion of bond_check_dev_link().
*
* 2002/09/18 - Jay Vosburgh <fubar at us dot ibm dot com> * 2002/09/18 - Jay Vosburgh <fubar at us dot ibm dot com>
* - Fixed up bond_check_dev_link() (and callers): removed some magic * - Fixed up bond_check_dev_link() (and callers): removed some magic
* numbers, banished local MII_ defines, wrapped ioctl calls to * numbers, banished local MII_ defines, wrapped ioctl calls to
* prevent EFAULT errors * prevent EFAULT errors
*
* 2002/9/30 - Jay Vosburgh <fubar at us dot ibm dot com>
* - make sure the ip target matches the arp_target before saving the
* hw address.
*
* 2002/9/30 - Dan Eisner <eisner at 2robots dot com>
* - make sure my_ip is set before taking down the link, since
* not all switches respond if the source ip is not set.
*
* 2002/10/8 - Janice Girouard <girouard at us dot ibm dot com>
* - read in the local ip address when enslaving a device
* - add primary support
* - make sure 2*arp_interval has passed when a new device
* is brought on-line before taking it down.
*
* 2002/09/11 - Philippe De Muyter <phdm at macqel dot be>
* - Added bond_xmit_broadcast logic.
* - Added bond_mode() support function.
*
* 2002/10/26 - Laurent Deniel <laurent.deniel at free.fr>
* - allow to register multicast addresses only on active slave
* (useful in active-backup mode)
* - add multicast module parameter
* - fix deletion of multicast groups after unloading module
*
* 2002/11/06 - Kameshwara Rayaprolu <kameshwara.rao * wipro_com>
* - Changes to prevent panic from closing the device twice; if we close
* the device in bond_release, we must set the original_flags to down
* so it won't be closed again by the network layer.
*
* 2002/11/07 - Tony Cureington <tony.cureington * hp_com>
* - Fix arp_target_hw_addr memory leak
* - Created activebackup_arp_monitor function to handle arp monitoring
* in active backup mode - the bond_arp_monitor had several problems...
* such as allowing slaves to tx arps sequentially without any delay
* for a response
* - Renamed bond_arp_monitor to loadbalance_arp_monitor and re-wrote
* this function to just handle arp monitoring in load-balancing mode;
* it is a lot more compact now
* - Changes to ensure one and only one slave transmits in active-backup
* mode
* - Robustesize parameters; warn users about bad combinations of
* parameters; also if miimon is specified and a network driver does
* not support MII or ETHTOOL, inform the user of this
* - Changes to support link_failure_count when in arp monitoring mode
* - Fix up/down delay reported in /proc
* - Added version; log version; make version available from "modinfo -d"
* - Fixed problem in bond_check_dev_link - if the first IOCTL (SIOCGMIIPHY)
* failed, the ETHTOOL ioctl never got a chance
*
* 2002/11/16 - Laurent Deniel <laurent.deniel at free.fr>
* - fix multicast handling in activebackup_arp_monitor
* - remove one unnecessary and confusing current_slave == slave test
* in activebackup_arp_monitor
*
* 2002/11/17 - Laurent Deniel <laurent.deniel at free.fr>
* - fix bond_slave_info_query when slave_id = num_slaves
*
* 2002/11/19 - Janice Girouard <girouard at us dot ibm dot com>
* - correct ifr_data reference. Update ifr_data reference
* to mii_ioctl_data struct values to avoid confusion.
*
*
* 2002/11/22 - Bert Barbe <bert.barbe at oracle dot com>
* - Add support for multiple arp_ip_target
*
*/ */
#include <linux/config.h> #include <linux/config.h>
...@@ -201,6 +272,7 @@ ...@@ -201,6 +272,7 @@
#include <linux/interrupt.h> #include <linux/interrupt.h>
#include <linux/ioport.h> #include <linux/ioport.h>
#include <linux/in.h> #include <linux/in.h>
#include <linux/ip.h>
#include <linux/slab.h> #include <linux/slab.h>
#include <linux/string.h> #include <linux/string.h>
#include <linux/init.h> #include <linux/init.h>
...@@ -208,6 +280,7 @@ ...@@ -208,6 +280,7 @@
#include <linux/socket.h> #include <linux/socket.h>
#include <linux/errno.h> #include <linux/errno.h>
#include <linux/netdevice.h> #include <linux/netdevice.h>
#include <linux/inetdevice.h>
#include <linux/etherdevice.h> #include <linux/etherdevice.h>
#include <linux/skbuff.h> #include <linux/skbuff.h>
#include <net/sock.h> #include <net/sock.h>
...@@ -225,6 +298,13 @@ ...@@ -225,6 +298,13 @@
#include <asm/dma.h> #include <asm/dma.h>
#include <asm/uaccess.h> #include <asm/uaccess.h>
#define DRV_VERSION "2.4.20-20021210"
#define DRV_RELDATE "December 10, 2002"
#define DRV_NAME "bonding"
#define DRV_DESCRIPTION "Ethernet Channel Bonding Driver"
static const char *version =
DRV_NAME ".c:v" DRV_VERSION " (" DRV_RELDATE ")\n";
/* monitor all links that often (in milliseconds). <=0 disables monitoring */ /* monitor all links that often (in milliseconds). <=0 disables monitoring */
#ifndef BOND_LINK_MON_INTERV #ifndef BOND_LINK_MON_INTERV
...@@ -235,20 +315,31 @@ ...@@ -235,20 +315,31 @@
#define BOND_LINK_ARP_INTERV 0 #define BOND_LINK_ARP_INTERV 0
#endif #endif
#ifndef MAX_ARP_IP_TARGETS
#define MAX_ARP_IP_TARGETS 16
#endif
static int arp_interval = BOND_LINK_ARP_INTERV; static int arp_interval = BOND_LINK_ARP_INTERV;
static char *arp_ip_target = NULL; static char *arp_ip_target[MAX_ARP_IP_TARGETS] = { NULL, };
static unsigned long arp_target = 0; static unsigned long arp_target[MAX_ARP_IP_TARGETS] = { 0, } ;
static int arp_ip_count = 0;
static u32 my_ip = 0; static u32 my_ip = 0;
char *arp_target_hw_addr = NULL; char *arp_target_hw_addr = NULL;
static char *primary= NULL;
static int max_bonds = BOND_DEFAULT_MAX_BONDS; static int max_bonds = BOND_DEFAULT_MAX_BONDS;
static int miimon = BOND_LINK_MON_INTERV; static int miimon = BOND_LINK_MON_INTERV;
static int mode = BOND_MODE_ROUNDROBIN; static int mode = BOND_MODE_ROUNDROBIN;
static int updelay = 0; static int updelay = 0;
static int downdelay = 0; static int downdelay = 0;
#define BOND_MULTICAST_DISABLED 0
#define BOND_MULTICAST_ACTIVE 1
#define BOND_MULTICAST_ALL 2
static int multicast = BOND_MULTICAST_ALL;
static int first_pass = 1; static int first_pass = 1;
int bond_cnt;
static struct bonding *these_bonds = NULL; static struct bonding *these_bonds = NULL;
static struct net_device *dev_bonds = NULL; static struct net_device *dev_bonds = NULL;
...@@ -259,13 +350,17 @@ MODULE_PARM_DESC(miimon, "Link check interval in milliseconds"); ...@@ -259,13 +350,17 @@ MODULE_PARM_DESC(miimon, "Link check interval in milliseconds");
MODULE_PARM(mode, "i"); MODULE_PARM(mode, "i");
MODULE_PARM(arp_interval, "i"); MODULE_PARM(arp_interval, "i");
MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds"); MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
MODULE_PARM(arp_ip_target, "s"); MODULE_PARM(arp_ip_target, "1-" __MODULE_STRING(MAX_ARP_IP_TARGETS) "s");
MODULE_PARM_DESC(arp_ip_target, "arp target in n.n.n.n form"); MODULE_PARM_DESC(arp_ip_target, "arp targets in n.n.n.n form");
MODULE_PARM_DESC(mode, "Mode of operation : 0 for round robin, 1 for active-backup, 2 for xor"); MODULE_PARM_DESC(mode, "Mode of operation : 0 for round robin, 1 for active-backup, 2 for xor");
MODULE_PARM(updelay, "i"); MODULE_PARM(updelay, "i");
MODULE_PARM_DESC(updelay, "Delay before considering link up, in milliseconds"); MODULE_PARM_DESC(updelay, "Delay before considering link up, in milliseconds");
MODULE_PARM(downdelay, "i"); MODULE_PARM(downdelay, "i");
MODULE_PARM_DESC(downdelay, "Delay before considering link down, in milliseconds"); MODULE_PARM_DESC(downdelay, "Delay before considering link down, in milliseconds");
MODULE_PARM(primary, "s");
MODULE_PARM_DESC(primary, "Primary network device to use");
MODULE_PARM(multicast, "i");
MODULE_PARM_DESC(multicast, "Mode for multicast support : 0 for none, 1 for active slave, 2 for all slaves (default)");
extern void arp_send( int type, int ptype, u32 dest_ip, struct net_device *dev, extern void arp_send( int type, int ptype, u32 dest_ip, struct net_device *dev,
u32 src_ip, unsigned char *dest_hw, unsigned char *src_hw, u32 src_ip, unsigned char *dest_hw, unsigned char *src_hw,
...@@ -276,7 +371,8 @@ static int bond_xmit_xor(struct sk_buff *skb, struct net_device *dev); ...@@ -276,7 +371,8 @@ static int bond_xmit_xor(struct sk_buff *skb, struct net_device *dev);
static int bond_xmit_activebackup(struct sk_buff *skb, struct net_device *dev); static int bond_xmit_activebackup(struct sk_buff *skb, struct net_device *dev);
static struct net_device_stats *bond_get_stats(struct net_device *dev); static struct net_device_stats *bond_get_stats(struct net_device *dev);
static void bond_mii_monitor(struct net_device *dev); static void bond_mii_monitor(struct net_device *dev);
static void bond_arp_monitor(struct net_device *dev); static void loadbalance_arp_monitor(struct net_device *dev);
static void activebackup_arp_monitor(struct net_device *dev);
static int bond_event(struct notifier_block *this, unsigned long event, void *ptr); static int bond_event(struct notifier_block *this, unsigned long event, void *ptr);
static void bond_restore_slave_flags(slave_t *slave); static void bond_restore_slave_flags(slave_t *slave);
static void bond_mc_list_destroy(struct bonding *bond); static void bond_mc_list_destroy(struct bonding *bond);
...@@ -287,6 +383,7 @@ static inline int dmi_same(struct dev_mc_list *dmi1, struct dev_mc_list *dmi2); ...@@ -287,6 +383,7 @@ static inline int dmi_same(struct dev_mc_list *dmi1, struct dev_mc_list *dmi2);
static void bond_set_promiscuity(bonding_t *bond, int inc); static void bond_set_promiscuity(bonding_t *bond, int inc);
static void bond_set_allmulti(bonding_t *bond, int inc); static void bond_set_allmulti(bonding_t *bond, int inc);
static struct dev_mc_list* bond_mc_list_find_dmi(struct dev_mc_list *dmi, struct dev_mc_list *mc_list); static struct dev_mc_list* bond_mc_list_find_dmi(struct dev_mc_list *dmi, struct dev_mc_list *mc_list);
static void bond_mc_update(bonding_t *bond, slave_t *new, slave_t *old);
static void bond_set_slave_inactive_flags(slave_t *slave); static void bond_set_slave_inactive_flags(slave_t *slave);
static void bond_set_slave_active_flags(slave_t *slave); static void bond_set_slave_active_flags(slave_t *slave);
static int bond_enslave(struct net_device *master, struct net_device *slave); static int bond_enslave(struct net_device *master, struct net_device *slave);
...@@ -308,6 +405,47 @@ static int bond_get_info(char *buf, char **start, off_t offset, int length); ...@@ -308,6 +405,47 @@ static int bond_get_info(char *buf, char **start, off_t offset, int length);
#define IS_UP(dev) ((((dev)->flags & (IFF_UP)) == (IFF_UP)) && \ #define IS_UP(dev) ((((dev)->flags & (IFF_UP)) == (IFF_UP)) && \
(netif_running(dev) && netif_carrier_ok(dev))) (netif_running(dev) && netif_carrier_ok(dev)))
static void arp_send_all(slave_t *slave)
{
int i;
for ( i=0; (i<MAX_ARP_IP_TARGETS) && arp_target[i]; i++) {
arp_send(ARPOP_REQUEST, ETH_P_ARP, arp_target[i], slave->dev,
my_ip, arp_target_hw_addr, slave->dev->dev_addr,
arp_target_hw_addr);
}
}
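The loop in arp_send_all() stops at the first zero entry, so the target
array must be filled from the front with no gaps. The user-space sketch
below illustrates that rule; it is an assumption-laden illustration, not
driver code. parse_arp_targets() and count_arp_targets() are hypothetical
helpers, and inet_pton() stands in for whatever conversion the module does
at load time.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ARP_IP_TARGETS 16

/*
 * Convert dotted-quad strings into network-order addresses. Parsing
 * stops at the first NULL, unparsable, or 0.0.0.0 entry; unused slots
 * stay zero. The input must be NULL-terminated or hold at least
 * MAX_ARP_IP_TARGETS entries. Returns the number of targets stored.
 */
int parse_arp_targets(const char *const *strs, uint32_t *out)
{
	int n = 0;

	assert(strs != NULL && out != NULL);
	for (int i = 0; i < MAX_ARP_IP_TARGETS; i++)
		out[i] = 0;

	while (n < MAX_ARP_IP_TARGETS && strs[n] != NULL) {
		struct in_addr a;

		if (inet_pton(AF_INET, strs[n], &a) != 1 || a.s_addr == 0)
			break;
		out[n++] = a.s_addr;
	}
	return n;
}

/* Same termination rule as the loop above: walk to the first zero slot. */
int count_arp_targets(const uint32_t *targets)
{
	int i;

	for (i = 0; i < MAX_ARP_IP_TARGETS && targets[i]; i++)
		;
	return i;
}
```

A bad entry in the middle of the list therefore hides every target after
it, which is one reason to validate the list when it is parsed rather than
when it is used.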
static const char *bond_mode(void)
{
switch (mode) {
case BOND_MODE_ROUNDROBIN :
return "load balancing (round-robin)";
case BOND_MODE_ACTIVEBACKUP :
return "fault-tolerance (active-backup)";
case BOND_MODE_XOR :
return "load balancing (xor)";
case BOND_MODE_BROADCAST :
return "fault-tolerance (broadcast)";
default :
return "unknown";
}
}
static const char *multicast_mode(void)
{
switch(multicast) {
case BOND_MULTICAST_DISABLED :
return "disabled";
case BOND_MULTICAST_ACTIVE :
return "active slave only";
case BOND_MULTICAST_ALL :
return "all slaves";
default :
return "unknown";
}
}
static void bond_restore_slave_flags(slave_t *slave) static void bond_restore_slave_flags(slave_t *slave)
{ {
...@@ -415,11 +553,24 @@ static u16 bond_check_dev_link(struct net_device *dev) ...@@ -415,11 +553,24 @@ static u16 bond_check_dev_link(struct net_device *dev)
/* call it and not the others for that team */ /* call it and not the others for that team */
/* member. */ /* member. */
/* try SOICETHTOOL ioctl, some drivers cache ETHTOOL_GLINK */ /*
/* for a period of time; we need to encourage link status */ * We cannot assume that SIOCGMIIPHY will also read a
/* be reported by network drivers in real time; if the */ * register; not all network drivers (e.g., e100)
/* value is cached, the mmimon module parm may have no */ * support that.
/* effect... */ */
/* Yes, the mii is overlaid on the ifreq.ifr_ifru */
mii = (struct mii_ioctl_data *)&ifr.ifr_data;
if (IOCTL(dev, &ifr, SIOCGMIIPHY) == 0) {
mii->reg_num = MII_BMSR;
if (IOCTL(dev, &ifr, SIOCGMIIREG) == 0) {
return mii->val_out & BMSR_LSTATUS;
}
}
/* try SIOCETHTOOL ioctl, some drivers cache ETHTOOL_GLINK */
/* for a period of time so we attempt to get link status */
/* from it last if the above MII ioctls fail... */
etool.cmd = ETHTOOL_GLINK; etool.cmd = ETHTOOL_GLINK;
ifr.ifr_data = (char*)&etool; ifr.ifr_data = (char*)&etool;
if (IOCTL(dev, &ifr, SIOCETHTOOL) == 0) { if (IOCTL(dev, &ifr, SIOCETHTOOL) == 0) {
...@@ -427,26 +578,14 @@ static u16 bond_check_dev_link(struct net_device *dev) ...@@ -427,26 +578,14 @@ static u16 bond_check_dev_link(struct net_device *dev)
return BMSR_LSTATUS; return BMSR_LSTATUS;
} }
else { else {
#ifdef BONDING_DEBUG
printk(KERN_INFO
":: SIOCETHTOOL shows failure \n");
#endif
return(0); return(0);
} }
} }
/*
* We cannot assume that SIOCGMIIPHY will also read a
* register; not all network drivers support that.
*/
/* Yes, the mii is overlaid on the ifreq.ifr_ifru */
mii = (struct mii_ioctl_data *)&ifr.ifr_data;
if (IOCTL(dev, &ifr, SIOCGMIIPHY) != 0) {
return BMSR_LSTATUS; /* can't tell */
}
mii->reg_num = MII_BMSR;
if (IOCTL(dev, &ifr, SIOCGMIIREG) == 0) {
return mii->val_out & BMSR_LSTATUS;
}
} }
return BMSR_LSTATUS; /* spoof link up ( we can't check it) */ return BMSR_LSTATUS; /* spoof link up ( we can't check it) */
} }
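The new probe order in bond_check_dev_link() (MII first, ETHTOOL second,
"link up" when neither answers) can be sketched without any kernel
dependencies. The code below is an illustration only: struct link_probes,
check_link() and the probe_* stubs are hypothetical, and the real ioctl
calls are replaced by callbacks so the fallback logic can be exercised in
user space.

```c
#include <assert.h>
#include <stddef.h>

#define LINK_UP   0x0004	/* same value as BMSR_LSTATUS in <linux/mii.h> */
#define LINK_DOWN 0

/*
 * Hypothetical probe interface: each callback returns 0 on success and
 * stores the link state in *up, or returns nonzero if unsupported.
 */
struct link_probes {
	int (*mii)(int *up);
	int (*ethtool)(int *up);
};

/* Stub probes used only for demonstration. */
int probe_fail(int *up) { (void)up; return -1; }
int probe_up(int *up)   { *up = 1; return 0; }
int probe_down(int *up) { *up = 0; return 0; }

int check_link(const struct link_probes *p)
{
	int up = 0;

	assert(p != NULL);
	/* 1) MII registers first: they are less likely to be cached. */
	if (p->mii != NULL && p->mii(&up) == 0)
		return up ? LINK_UP : LINK_DOWN;
	/* 2) ETHTOOL last, only when the MII probe did not answer. */
	if (p->ethtool != NULL && p->ethtool(&up) == 0)
		return up ? LINK_UP : LINK_DOWN;
	/* 3) Nothing answered: report link up, since it cannot be checked. */
	return LINK_UP;
}
```

The design point is that a failed first probe must fall through to the
next one instead of returning early, which is the problem the 2002/11/07
changelog entry describes.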
...@@ -483,7 +622,11 @@ static int bond_open(struct net_device *dev) ...@@ -483,7 +622,11 @@ static int bond_open(struct net_device *dev)
init_timer(arp_timer); init_timer(arp_timer);
arp_timer->expires = jiffies + (arp_interval * HZ / 1000); arp_timer->expires = jiffies + (arp_interval * HZ / 1000);
arp_timer->data = (unsigned long)dev; arp_timer->data = (unsigned long)dev;
arp_timer->function = (void *)&bond_arp_monitor; if (mode == BOND_MODE_ACTIVEBACKUP) {
arp_timer->function = (void *)&activebackup_arp_monitor;
} else {
arp_timer->function = (void *)&loadbalance_arp_monitor;
}
add_timer(arp_timer); add_timer(arp_timer);
} }
return 0; return 0;
...@@ -501,6 +644,10 @@ static int bond_close(struct net_device *master) ...@@ -501,6 +644,10 @@ static int bond_close(struct net_device *master)
} }
if (arp_interval> 0) { /* arp interval, in milliseconds. */ if (arp_interval> 0) { /* arp interval, in milliseconds. */
del_timer(&bond->arp_timer); del_timer(&bond->arp_timer);
if (arp_target_hw_addr != NULL) {
kfree(arp_target_hw_addr);
arp_target_hw_addr = NULL;
}
} }
/* Release the bonded slaves */ /* Release the bonded slaves */
...@@ -545,9 +692,18 @@ static void bond_mc_list_destroy(struct bonding *bond) ...@@ -545,9 +692,18 @@ static void bond_mc_list_destroy(struct bonding *bond)
static void bond_mc_add(bonding_t *bond, void *addr, int alen) static void bond_mc_add(bonding_t *bond, void *addr, int alen)
{ {
slave_t *slave; slave_t *slave;
switch (multicast) {
for (slave = bond->prev; slave != (slave_t*)bond; slave = slave->prev) { case BOND_MULTICAST_ACTIVE :
dev_mc_add(slave->dev, addr, alen, 0); /* write lock already acquired */
if (bond->current_slave != NULL)
dev_mc_add(bond->current_slave->dev, addr, alen, 0);
break;
case BOND_MULTICAST_ALL :
for (slave = bond->prev; slave != (slave_t*)bond; slave = slave->prev)
dev_mc_add(slave->dev, addr, alen, 0);
break;
case BOND_MULTICAST_DISABLED :
break;
} }
} }
...@@ -557,9 +713,19 @@ static void bond_mc_add(bonding_t *bond, void *addr, int alen) ...@@ -557,9 +713,19 @@ static void bond_mc_add(bonding_t *bond, void *addr, int alen)
static void bond_mc_delete(bonding_t *bond, void *addr, int alen) static void bond_mc_delete(bonding_t *bond, void *addr, int alen)
{ {
slave_t *slave; slave_t *slave;
switch (multicast) {
for (slave = bond->prev; slave != (slave_t*)bond; slave = slave->prev) case BOND_MULTICAST_ACTIVE :
dev_mc_delete(slave->dev, addr, alen, 0); /* write lock already acquired */
if (bond->current_slave != NULL)
dev_mc_delete(bond->current_slave->dev, addr, alen, 0);
break;
case BOND_MULTICAST_ALL :
for (slave = bond->prev; slave != (slave_t*)bond; slave = slave->prev)
dev_mc_delete(slave->dev, addr, alen, 0);
break;
case BOND_MULTICAST_DISABLED :
break;
}
} }
/* /*
...@@ -603,9 +769,19 @@ static inline int dmi_same(struct dev_mc_list *dmi1, struct dev_mc_list *dmi2) ...@@ -603,9 +769,19 @@ static inline int dmi_same(struct dev_mc_list *dmi1, struct dev_mc_list *dmi2)
static void bond_set_promiscuity(bonding_t *bond, int inc) static void bond_set_promiscuity(bonding_t *bond, int inc)
{ {
slave_t *slave; slave_t *slave;
switch (multicast) {
for (slave = bond->prev; slave != (slave_t*)bond; slave = slave->prev) case BOND_MULTICAST_ACTIVE :
dev_set_promiscuity(slave->dev, inc); /* write lock already acquired */
if (bond->current_slave != NULL)
dev_set_promiscuity(bond->current_slave->dev, inc);
break;
case BOND_MULTICAST_ALL :
for (slave = bond->prev; slave != (slave_t*)bond; slave = slave->prev)
dev_set_promiscuity(slave->dev, inc);
break;
case BOND_MULTICAST_DISABLED :
break;
}
} }
/* /*
...@@ -614,9 +790,19 @@ static void bond_set_promiscuity(bonding_t *bond, int inc) ...@@ -614,9 +790,19 @@ static void bond_set_promiscuity(bonding_t *bond, int inc)
static void bond_set_allmulti(bonding_t *bond, int inc) static void bond_set_allmulti(bonding_t *bond, int inc)
{ {
slave_t *slave; slave_t *slave;
switch (multicast) {
for (slave = bond->prev; slave != (slave_t*)bond; slave = slave->prev) case BOND_MULTICAST_ACTIVE :
dev_set_allmulti(slave->dev, inc); /* write lock already acquired */
if (bond->current_slave != NULL)
dev_set_allmulti(bond->current_slave->dev, inc);
break;
case BOND_MULTICAST_ALL :
for (slave = bond->prev; slave != (slave_t*)bond; slave = slave->prev)
dev_set_allmulti(slave->dev, inc);
break;
case BOND_MULTICAST_DISABLED :
break;
}
} }
/* /*
...@@ -641,6 +827,8 @@ static void set_multicast_list(struct net_device *master) ...@@ -641,6 +827,8 @@ static void set_multicast_list(struct net_device *master)
struct dev_mc_list *dmi; struct dev_mc_list *dmi;
unsigned long flags = 0; unsigned long flags = 0;
if (multicast == BOND_MULTICAST_DISABLED)
return;
/* /*
* Lock the private data for the master * Lock the private data for the master
*/ */
...@@ -682,6 +870,43 @@ static void set_multicast_list(struct net_device *master) ...@@ -682,6 +870,43 @@ static void set_multicast_list(struct net_device *master)
write_unlock_irqrestore(&bond->lock, flags); write_unlock_irqrestore(&bond->lock, flags);
} }
/*
* Update the mc list and multicast-related flags for the new and
* old active slaves (if any) according to the multicast mode
*/
static void bond_mc_update(bonding_t *bond, slave_t *new, slave_t *old)
{
struct dev_mc_list *dmi;
switch(multicast) {
case BOND_MULTICAST_ACTIVE :
if (bond->device->flags & IFF_PROMISC) {
if (old != NULL && new != old)
dev_set_promiscuity(old->dev, -1);
dev_set_promiscuity(new->dev, 1);
}
if (bond->device->flags & IFF_ALLMULTI) {
if (old != NULL && new != old)
dev_set_allmulti(old->dev, -1);
dev_set_allmulti(new->dev, 1);
}
/* first remove all mc addresses from old slave if any,
and _then_ add them to new active slave */
if (old != NULL && new != old) {
for (dmi = bond->device->mc_list; dmi != NULL; dmi = dmi->next)
dev_mc_delete(old->dev, dmi->dmi_addr, dmi->dmi_addrlen, 0);
}
for (dmi = bond->device->mc_list; dmi != NULL; dmi = dmi->next)
dev_mc_add(new->dev, dmi->dmi_addr, dmi->dmi_addrlen, 0);
break;
case BOND_MULTICAST_ALL :
/* nothing to do: mc list is already up-to-date on all slaves */
break;
case BOND_MULTICAST_DISABLED :
break;
}
}
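The bookkeeping done by bond_mc_update() on failover can be modelled with
plain counters. The sketch below is a simplified simulation, not the
driver's code: struct fake_slave, enum mc_mode and mc_failover() are
hypothetical, the multicast list is reduced to a count, and the allmulti
flag (handled the same way as promiscuity) is left out.

```c
#include <assert.h>
#include <stddef.h>

enum mc_mode { MC_DISABLED = 0, MC_ACTIVE = 1, MC_ALL = 2 };

/* Hypothetical stand-in for a slave device. */
struct fake_slave {
	int promisc;	/* promiscuity reference count */
	int mc_count;	/* number of multicast addresses programmed */
};

void mc_failover(enum mc_mode mode, int master_promisc, int master_mc_count,
		 struct fake_slave *new_active, struct fake_slave *old_active)
{
	assert(new_active != NULL);

	/* Only "active slave" mode moves state between slaves. */
	if (mode != MC_ACTIVE)
		return;

	if (master_promisc) {
		if (old_active != NULL && old_active != new_active)
			old_active->promisc -= 1;
		new_active->promisc += 1;
	}

	/* First remove from the old slave, then add to the new one. */
	if (old_active != NULL && old_active != new_active)
		old_active->mc_count -= master_mc_count;
	new_active->mc_count += master_mc_count;
}
```

In "all slaves" mode nothing moves because every slave already carries the
full list, which matches the empty BOND_MULTICAST_ALL case above.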
/* /*
* This function counts the number of attached * This function counts the number of attached
* slaves for use by bond_xmit_xor. * slaves for use by bond_xmit_xor.
...@@ -703,9 +928,16 @@ static int bond_enslave(struct net_device *master_dev, ...@@ -703,9 +928,16 @@ static int bond_enslave(struct net_device *master_dev,
bonding_t *bond = NULL; bonding_t *bond = NULL;
slave_t *new_slave = NULL; slave_t *new_slave = NULL;
unsigned long flags = 0; unsigned long flags = 0;
unsigned long rflags = 0;
int ndx = 0; int ndx = 0;
int err = 0; int err = 0;
struct dev_mc_list *dmi; struct dev_mc_list *dmi;
struct in_ifaddr **ifap;
struct in_ifaddr *ifa;
static int (* ioctl)(struct net_device *, struct ifreq *, int);
struct ifreq ifr;
struct ethtool_value etool;
int link_reporting = 0;
if (master_dev == NULL || slave_dev == NULL) { if (master_dev == NULL || slave_dev == NULL) {
return -ENODEV; return -ENODEV;
...@@ -758,17 +990,19 @@ static int bond_enslave(struct net_device *master_dev, ...@@ -758,17 +990,19 @@ static int bond_enslave(struct net_device *master_dev,
new_slave->dev = slave_dev; new_slave->dev = slave_dev;
/* set promiscuity level to new slave */ if (multicast == BOND_MULTICAST_ALL) {
if (master_dev->flags & IFF_PROMISC) /* set promiscuity level to new slave */
dev_set_promiscuity(slave_dev, 1); if (master_dev->flags & IFF_PROMISC)
dev_set_promiscuity(slave_dev, 1);
/* set allmulti level to new slave */ /* set allmulti level to new slave */
if (master_dev->flags & IFF_ALLMULTI) if (master_dev->flags & IFF_ALLMULTI)
dev_set_allmulti(slave_dev, 1); dev_set_allmulti(slave_dev, 1);
/* upload master's mc_list to new slave */ /* upload master's mc_list to new slave */
for (dmi = master_dev->mc_list; dmi != NULL; dmi = dmi->next) for (dmi = master_dev->mc_list; dmi != NULL; dmi = dmi->next)
dev_mc_add (slave_dev, dmi->dmi_addr, dmi->dmi_addrlen, 0); dev_mc_add (slave_dev, dmi->dmi_addr, dmi->dmi_addrlen, 0);
}
/* /*
* queue to the end of the slaves list, make the first element its * queue to the end of the slaves list, make the first element its
...@@ -799,6 +1033,56 @@ static int bond_enslave(struct net_device *master_dev, ...@@ -799,6 +1033,56 @@ static int bond_enslave(struct net_device *master_dev,
new_slave->delay = 0; new_slave->delay = 0;
new_slave->link_failure_count = 0; new_slave->link_failure_count = 0;
if (miimon > 0) {
/* if the network driver for the slave does not support
* ETHTOOL/MII link status reporting, warn the user of this
*/
if ((ioctl = slave_dev->do_ioctl) != NULL) {
etool.cmd = ETHTOOL_GLINK;
ifr.ifr_data = (char*)&etool;
if (IOCTL(slave_dev, &ifr, SIOCETHTOOL) == 0) {
link_reporting = 1;
}
else {
if (IOCTL(slave_dev, &ifr, SIOCGMIIPHY) == 0) {
/* Yes, the mii is overlaid on the
* ifreq.ifr_ifru
*/
((struct mii_ioctl_data*)
(&ifr.ifr_data))->reg_num = 1;
if (IOCTL(slave_dev, &ifr, SIOCGMIIREG)
== 0) {
link_reporting = 1;
}
}
}
}
if ((link_reporting == 0) && (arp_interval == 0)) {
/* miimon is set but a bonded network driver does
* not support ETHTOOL/MII and arp_interval is
* not set
*/
printk(KERN_ERR
"bond_enslave(): MII and ETHTOOL support not "
"available for interface %s, and "
"arp_interval/arp_ip_target module parameters "
"not specified, thus bonding will not detect "
"link failures! see bonding.txt for details.\n",
slave_dev->name);
}
else if (link_reporting == 0) {
/* unable to get link status using mii/ethtool */
printk(KERN_WARNING
"bond_enslave: can't get link status from "
"interface %s; the network driver associated "
"with this interface does not support "
"MII or ETHTOOL link status reporting, thus "
"miimon has no effect on this interface.\n",
slave_dev->name);
}
}
/* check for initial state */ /* check for initial state */
if ((miimon <= 0) || if ((miimon <= 0) ||
(bond_check_dev_link(slave_dev) == BMSR_LSTATUS)) { (bond_check_dev_link(slave_dev) == BMSR_LSTATUS)) {
...@@ -806,6 +1090,7 @@ static int bond_enslave(struct net_device *master_dev, ...@@ -806,6 +1090,7 @@ static int bond_enslave(struct net_device *master_dev,
printk(KERN_CRIT "Initial state of slave_dev is BOND_LINK_UP\n"); printk(KERN_CRIT "Initial state of slave_dev is BOND_LINK_UP\n");
#endif #endif
new_slave->link = BOND_LINK_UP; new_slave->link = BOND_LINK_UP;
new_slave->jiffies = jiffies;
} }
else { else {
#ifdef BONDING_DEBUG #ifdef BONDING_DEBUG
...@@ -832,6 +1117,7 @@ static int bond_enslave(struct net_device *master_dev, ...@@ -832,6 +1117,7 @@ static int bond_enslave(struct net_device *master_dev,
is OK, so make this interface the active one */ is OK, so make this interface the active one */
bond->current_slave = new_slave; bond->current_slave = new_slave;
bond_set_slave_active_flags(new_slave); bond_set_slave_active_flags(new_slave);
bond_mc_update(bond, new_slave, NULL);
} }
else { else {
#ifdef BONDING_DEBUG #ifdef BONDING_DEBUG
...@@ -839,15 +1125,24 @@ static int bond_enslave(struct net_device *master_dev, ...@@ -839,15 +1125,24 @@ static int bond_enslave(struct net_device *master_dev,
#endif #endif
bond_set_slave_inactive_flags(new_slave); bond_set_slave_inactive_flags(new_slave);
} }
/* the slave may have no IPv4 configuration: check before dereferencing */
if (slave_dev->ip_ptr != NULL) {
	read_lock_irqsave(&(((struct in_device *)slave_dev->ip_ptr)->lock), rflags);
	ifap= &(((struct in_device *)slave_dev->ip_ptr)->ifa_list);
	ifa = *ifap;
	if (ifa != NULL)
		my_ip = ifa->ifa_address;
	read_unlock_irqrestore(&(((struct in_device *)slave_dev->ip_ptr)->lock), rflags);
}
/* if there is a primary slave, remember it */
if (primary != NULL)
if( strcmp(primary, new_slave->dev->name) == 0)
bond->primary_slave = new_slave;
} else { } else {
#ifdef BONDING_DEBUG #ifdef BONDING_DEBUG
printk(KERN_CRIT "This slave is always active in trunk mode\n"); printk(KERN_CRIT "This slave is always active in trunk mode\n");
#endif #endif
/* always active in trunk mode */ /* always active in trunk mode */
new_slave->state = BOND_STATE_ACTIVE; new_slave->state = BOND_STATE_ACTIVE;
if (bond->current_slave == NULL) { if (bond->current_slave == NULL)
bond->current_slave = new_slave; bond->current_slave = new_slave;
}
} }
update_slave_cnt(bond); update_slave_cnt(bond);
...@@ -938,6 +1233,7 @@ static int bond_change_active(struct net_device *master_dev, struct net_device * ...@@ -938,6 +1233,7 @@ static int bond_change_active(struct net_device *master_dev, struct net_device *
IS_UP(newactive->dev)) { IS_UP(newactive->dev)) {
bond_set_slave_inactive_flags(oldactive); bond_set_slave_inactive_flags(oldactive);
bond_set_slave_active_flags(newactive); bond_set_slave_active_flags(newactive);
bond_mc_update(bond, newactive, oldactive);
bond->current_slave = newactive; bond->current_slave = newactive;
printk("%s : activate %s(old : %s)\n", printk("%s : activate %s(old : %s)\n",
master_dev->name, newactive->dev->name, master_dev->name, newactive->dev->name,
...@@ -978,6 +1274,7 @@ slave_t *change_active_interface(bonding_t *bond) ...@@ -978,6 +1274,7 @@ slave_t *change_active_interface(bonding_t *bond)
newslave = bond->current_slave = bond->next; newslave = bond->current_slave = bond->next;
write_unlock(&bond->ptrlock); write_unlock(&bond->ptrlock);
} else { } else {
printk (" but could not find any %s interface.\n", printk (" but could not find any %s interface.\n",
(mode == BOND_MODE_ACTIVEBACKUP) ? "backup":"other"); (mode == BOND_MODE_ACTIVEBACKUP) ? "backup":"other");
write_lock(&bond->ptrlock); write_lock(&bond->ptrlock);
...@@ -985,16 +1282,38 @@ slave_t *change_active_interface(bonding_t *bond) ...@@ -985,16 +1282,38 @@ slave_t *change_active_interface(bonding_t *bond)
write_unlock(&bond->ptrlock); write_unlock(&bond->ptrlock);
return NULL; /* still no slave, return NULL */ return NULL; /* still no slave, return NULL */
} }
} else if (mode == BOND_MODE_ACTIVEBACKUP) {
/* make sure oldslave doesn't send arps - this could
* cause a ping-pong effect between interfaces since they
* would be able to tx arps - in active backup only one
* slave should be able to tx arps, and that should be
* the current_slave; the only exception is when all
* slaves have gone down, then only one non-current slave can
* send arps at a time; clearing oldslave's mc list is handled
* later in this function.
*/
bond_set_slave_inactive_flags(oldslave);
} }
mintime = updelay; mintime = updelay;
/* first try the primary link; if arping, a link must tx/rx traffic
* before it can be considered the current_slave - also, we would skip
* slaves between the current_slave and primary_slave that may be up
* and able to arp
*/
if ((bond->primary_slave != NULL) && (arp_interval == 0)) {
if (IS_UP(bond->primary_slave->dev))
newslave = bond->primary_slave;
}
do { do {
if (IS_UP(newslave->dev)) { if (IS_UP(newslave->dev)) {
if (newslave->link == BOND_LINK_UP) { if (newslave->link == BOND_LINK_UP) {
/* this one is immediately usable */ /* this one is immediately usable */
if (mode == BOND_MODE_ACTIVEBACKUP) { if (mode == BOND_MODE_ACTIVEBACKUP) {
bond_set_slave_active_flags(newslave); bond_set_slave_active_flags(newslave);
bond_mc_update(bond, newslave, oldslave);
printk (" and making interface %s the active one.\n", printk (" and making interface %s the active one.\n",
newslave->dev->name); newslave->dev->name);
} }
...@@ -1030,14 +1349,30 @@ slave_t *change_active_interface(bonding_t *bond) ...@@ -1030,14 +1349,30 @@ slave_t *change_active_interface(bonding_t *bond)
bestslave->delay = 0; bestslave->delay = 0;
bestslave->link = BOND_LINK_UP; bestslave->link = BOND_LINK_UP;
bestslave->jiffies = jiffies;
bond_set_slave_active_flags(bestslave); bond_set_slave_active_flags(bestslave);
bond_mc_update(bond, bestslave, oldslave);
write_lock(&bond->ptrlock); write_lock(&bond->ptrlock);
bond->current_slave = bestslave; bond->current_slave = bestslave;
write_unlock(&bond->ptrlock); write_unlock(&bond->ptrlock);
return bestslave; return bestslave;
} }
if ((mode == BOND_MODE_ACTIVEBACKUP) &&
(multicast == BOND_MULTICAST_ACTIVE) &&
(oldslave != NULL)) {
/* flush the bond's (master's) mc_list from oldslave since it wasn't
* updated (and deleted) above
*/
bond_mc_list_flush(oldslave->dev, bond->device);
if (bond->device->flags & IFF_PROMISC) {
dev_set_promiscuity(oldslave->dev, -1);
}
if (bond->device->flags & IFF_ALLMULTI) {
dev_set_allmulti(oldslave->dev, -1);
}
}
printk (" but could not find any %s interface.\n", printk (" but could not find any %s interface.\n",
(mode == BOND_MODE_ACTIVEBACKUP) ? "backup":"other"); (mode == BOND_MODE_ACTIVEBACKUP) ? "backup":"other");
...@@ -1081,6 +1416,7 @@ static int bond_release(struct net_device *master, struct net_device *slave) ...@@ -1081,6 +1416,7 @@ static int bond_release(struct net_device *master, struct net_device *slave)
return -EINVAL; return -EINVAL;
} }
bond->current_arp_slave = NULL;
our_slave = (slave_t *)bond; our_slave = (slave_t *)bond;
old_current = bond->current_slave; old_current = bond->current_slave;
while ((our_slave = our_slave->prev) != (slave_t *)bond) { while ((our_slave = our_slave->prev) != (slave_t *)bond) {
...@@ -1101,16 +1437,18 @@ static int bond_release(struct net_device *master, struct net_device *slave) ...@@ -1101,16 +1437,18 @@ static int bond_release(struct net_device *master, struct net_device *slave)
/* release the slave from its bond */ /* release the slave from its bond */
/* flush master's mc_list from slave */ if (multicast == BOND_MULTICAST_ALL) {
bond_mc_list_flush (slave, master); /* flush master's mc_list from slave */
bond_mc_list_flush (slave, master);
/* unset promiscuity level from slave */
if (master->flags & IFF_PROMISC) /* unset promiscuity level from slave */
dev_set_promiscuity(slave, -1); if (master->flags & IFF_PROMISC)
dev_set_promiscuity(slave, -1);
/* unset allmulti level from slave */ /* unset allmulti level from slave */
if (master->flags & IFF_ALLMULTI) if (master->flags & IFF_ALLMULTI)
dev_set_allmulti(slave, -1); dev_set_allmulti(slave, -1);
}
netdev_set_master(slave, NULL); netdev_set_master(slave, NULL);
...@@ -1122,6 +1460,7 @@ static int bond_release(struct net_device *master, struct net_device *slave) ...@@ -1122,6 +1460,7 @@ static int bond_release(struct net_device *master, struct net_device *slave)
if (slave->flags & IFF_NOARP || if (slave->flags & IFF_NOARP ||
bond->current_slave != NULL) { bond->current_slave != NULL) {
dev_close(slave); dev_close(slave);
our_slave->original_flags &= ~IFF_UP;
} }
bond_restore_slave_flags(our_slave); bond_restore_slave_flags(our_slave);
...@@ -1135,6 +1474,10 @@ static int bond_release(struct net_device *master, struct net_device *slave) ...@@ -1135,6 +1474,10 @@ static int bond_release(struct net_device *master, struct net_device *slave)
update_slave_cnt(bond); update_slave_cnt(bond);
if (bond->primary_slave == our_slave) {
bond->primary_slave = NULL;
}
write_unlock_irqrestore(&bond->lock, flags); write_unlock_irqrestore(&bond->lock, flags);
return 0; /* deletion OK */ return 0; /* deletion OK */
} }
...@@ -1166,12 +1509,28 @@ static int bond_release_all(struct net_device *master) ...@@ -1166,12 +1509,28 @@ static int bond_release_all(struct net_device *master)
} }
bond = (struct bonding *) master->priv; bond = (struct bonding *) master->priv;
bond->current_slave = NULL; bond->current_arp_slave = NULL;
while ((our_slave = bond->prev) != (slave_t *)bond) { while ((our_slave = bond->prev) != (slave_t *)bond) {
slave_dev = our_slave->dev; slave_dev = our_slave->dev;
bond->prev = our_slave->prev; bond->prev = our_slave->prev;
if (multicast == BOND_MULTICAST_ALL
|| (multicast == BOND_MULTICAST_ACTIVE
&& bond->current_slave == our_slave)) {
/* flush master's mc_list from slave */
bond_mc_list_flush (slave_dev, master);
/* unset promiscuity level from slave */
if (master->flags & IFF_PROMISC)
dev_set_promiscuity(slave_dev, -1);
/* unset allmulti level from slave */
if (master->flags & IFF_ALLMULTI)
dev_set_allmulti(slave_dev, -1);
}
kfree(our_slave); kfree(our_slave);
netdev_set_master(slave_dev, NULL); netdev_set_master(slave_dev, NULL);
...@@ -1183,9 +1542,12 @@ static int bond_release_all(struct net_device *master) ...@@ -1183,9 +1542,12 @@ static int bond_release_all(struct net_device *master)
if (slave_dev->flags & IFF_NOARP) if (slave_dev->flags & IFF_NOARP)
dev_close(slave_dev); dev_close(slave_dev);
} }
bond->current_slave = NULL;
bond->next = (slave_t *)bond; bond->next = (slave_t *)bond;
bond->slave_cnt = 0; bond->slave_cnt = 0;
printk (KERN_INFO "%s: releases all slaves\n", master->name); bond->primary_slave = NULL;
printk (KERN_INFO "%s: released all slaves\n", master->name);
return 0; return 0;
} }
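The condition added to bond_release_all() above decides whether a slave being released still carries the master's multicast list and promiscuity/allmulti counts. A minimal userspace sketch of that predicate follows; the enum and function names are hypothetical stand-ins, not the driver's own identifiers.

```c
#include <stdbool.h>

/* Hypothetical mirror of the driver's multicast modes (illustration only). */
enum sketch_multicast {
	SKETCH_MC_DISABLED,
	SKETCH_MC_ACTIVE,	/* only the current slave mirrors the master */
	SKETCH_MC_ALL		/* every slave mirrors the master */
};

/*
 * True when the slave holds a copy of the master's multicast state and
 * therefore needs it flushed before the slave is released.
 */
static bool slave_holds_master_mc_state(enum sketch_multicast mode,
					bool is_current_slave)
{
	return mode == SKETCH_MC_ALL ||
	       (mode == SKETCH_MC_ACTIVE && is_current_slave);
}
```

This may also explain why the patch moves the reset of current_slave to after the loop: the loop still has to compare each slave against it.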
...@@ -1291,6 +1653,7 @@ static void bond_mii_monitor(struct net_device *master) ...@@ -1291,6 +1653,7 @@ static void bond_mii_monitor(struct net_device *master)
} else { } else {
/* link up again */ /* link up again */
slave->link = BOND_LINK_UP; slave->link = BOND_LINK_UP;
slave->jiffies = jiffies;
printk(KERN_INFO printk(KERN_INFO
"%s: link status up again after %d ms " "%s: link status up again after %d ms "
"for interface %s.\n", "for interface %s.\n",
...@@ -1343,8 +1706,10 @@ static void bond_mii_monitor(struct net_device *master) ...@@ -1343,8 +1706,10 @@ static void bond_mii_monitor(struct net_device *master)
if (slave->delay == 0) { if (slave->delay == 0) {
/* now the link has been up for long enough */ /* now the link has been up for long enough */
slave->link = BOND_LINK_UP; slave->link = BOND_LINK_UP;
slave->jiffies = jiffies;
if (mode == BOND_MODE_ACTIVEBACKUP) { if ( (mode == BOND_MODE_ACTIVEBACKUP)
|| (slave != bond->primary_slave) ) {
/* prevent it from being the active one */ /* prevent it from being the active one */
slave->state = BOND_STATE_BACKUP; slave->state = BOND_STATE_BACKUP;
} }
...@@ -1358,15 +1723,26 @@ static void bond_mii_monitor(struct net_device *master) ...@@ -1358,15 +1723,26 @@ static void bond_mii_monitor(struct net_device *master)
"for interface %s.\n", "for interface %s.\n",
master->name, master->name,
dev->name); dev->name);
if ( (bond->primary_slave != NULL)
&& (slave == bond->primary_slave) )
change_active_interface(bond);
} }
else else
slave->delay--; slave->delay--;
/* we'll also look for the most eligible slave */ /* we'll also look for the most eligible slave */
if (IS_UP(dev) && (slave->delay < mindelay)) { if (bond->primary_slave == NULL) {
if (IS_UP(dev) && (slave->delay < mindelay)) {
mindelay = slave->delay;
bestslave = slave;
}
} else if ( (IS_UP(bond->primary_slave->dev)) ||
( (!IS_UP(bond->primary_slave->dev)) &&
(IS_UP(dev) && (slave->delay < mindelay)) ) ) {
mindelay = slave->delay; mindelay = slave->delay;
bestslave = slave; bestslave = slave;
} }
} }
break; break;
} /* end of switch */ } /* end of switch */
...@@ -1380,7 +1756,8 @@ static void bond_mii_monitor(struct net_device *master) ...@@ -1380,7 +1756,8 @@ static void bond_mii_monitor(struct net_device *master)
oldcurrent = bond->current_slave; oldcurrent = bond->current_slave;
read_unlock(&bond->ptrlock); read_unlock(&bond->ptrlock);
if (oldcurrent == NULL) { /* no active interface at the moment */ /* no active interface at the moment or need to bring up the primary */
if (oldcurrent == NULL) { /* no active interface at the moment */
if (bestslave != NULL) { /* last chance to find one ? */ if (bestslave != NULL) { /* last chance to find one ? */
if (bestslave->link == BOND_LINK_UP) { if (bestslave->link == BOND_LINK_UP) {
printk (KERN_INFO printk (KERN_INFO
...@@ -1395,10 +1772,12 @@ static void bond_mii_monitor(struct net_device *master) ...@@ -1395,10 +1772,12 @@ static void bond_mii_monitor(struct net_device *master)
bestslave->delay = 0; bestslave->delay = 0;
bestslave->link = BOND_LINK_UP; bestslave->link = BOND_LINK_UP;
bestslave->jiffies = jiffies;
} }
if (mode == BOND_MODE_ACTIVEBACKUP) { if (mode == BOND_MODE_ACTIVEBACKUP) {
bond_set_slave_active_flags(bestslave); bond_set_slave_active_flags(bestslave);
bond_mc_update(bond, bestslave, NULL);
} else { } else {
bestslave->state = BOND_STATE_ACTIVE; bestslave->state = BOND_STATE_ACTIVE;
} }
...@@ -1420,10 +1799,12 @@ static void bond_mii_monitor(struct net_device *master) ...@@ -1420,10 +1799,12 @@ static void bond_mii_monitor(struct net_device *master)
/* /*
* this function is called regularly to monitor each slave's link * this function is called regularly to monitor each slave's link
* insuring that traffic is being sent and received. If the adapter * ensuring that traffic is being sent and received when arp monitoring
* has been dormant, then an arp is transmitted to generate traffic * is used in load-balancing mode. if the adapter has been dormant, then an
* arp is transmitted to generate traffic. see activebackup_arp_monitor for
* arp monitoring in active backup mode.
*/ */
static void bond_arp_monitor(struct net_device *master) static void loadbalance_arp_monitor(struct net_device *master)
{ {
bonding_t *bond; bonding_t *bond;
unsigned long flags; unsigned long flags;
...@@ -1439,147 +1820,358 @@ static void bond_arp_monitor(struct net_device *master) ...@@ -1439,147 +1820,358 @@ static void bond_arp_monitor(struct net_device *master)
read_lock_irqsave(&bond->lock, flags); read_lock_irqsave(&bond->lock, flags);
if (!IS_UP(master)) { /* TODO: investigate why rtnl_shlock_nowait and rtnl_exlock_nowait
* are called below and add comment why they are required...
*/
if ((!IS_UP(master)) || rtnl_shlock_nowait()) {
mod_timer(&bond->arp_timer, next_timer); mod_timer(&bond->arp_timer, next_timer);
goto arp_monitor_out; read_unlock_irqrestore(&bond->lock, flags);
} return;
if (rtnl_shlock_nowait()) {
goto arp_monitor_out;
} }
if (rtnl_exlock_nowait()) { if (rtnl_exlock_nowait()) {
rtnl_shunlock(); rtnl_shunlock();
goto arp_monitor_out; mod_timer(&bond->arp_timer, next_timer);
read_unlock_irqrestore(&bond->lock, flags);
return;
} }
/* see if any of the previous devices are up now (i.e. they have seen a /* see if any of the previous devices are up now (i.e. they have
* response from an arp request sent by another adapter, since they * xmt and rcv traffic). the current_slave does not come into
* have the same hardware address). * the picture unless it is null. also, slave->jiffies is not needed
* here because we send an arp on each slave and give a slave as
* long as it needs to get the tx/rx within the delta.
* TODO: what about up/down delay in arp mode? it wasn't here before
* so it can wait
*/ */
slave = (slave_t *)bond; slave = (slave_t *)bond;
while ((slave = slave->prev) != (slave_t *)bond) { while ((slave = slave->prev) != (slave_t *)bond) {
read_lock(&bond->ptrlock); if (slave->link != BOND_LINK_UP) {
if ( (!(slave->link == BOND_LINK_UP))
&& (slave != bond->current_slave) ) {
read_unlock(&bond->ptrlock);
if ( ((jiffies - slave->dev->trans_start) <= if (((jiffies - slave->dev->trans_start) <=
the_delta_in_ticks) && the_delta_in_ticks) &&
((jiffies - slave->dev->last_rx) <= ((jiffies - slave->dev->last_rx) <=
the_delta_in_ticks) ) { the_delta_in_ticks)) {
slave->link = BOND_LINK_UP; slave->link = BOND_LINK_UP;
write_lock(&bond->ptrlock); slave->state = BOND_STATE_ACTIVE;
/* primary_slave has no meaning in round-robin
* mode. the window of a slave being up and
* current_slave being null after enslaving
* is closed.
*/
read_lock(&bond->ptrlock);
if (bond->current_slave == NULL) { if (bond->current_slave == NULL) {
slave->state = BOND_STATE_ACTIVE; read_unlock(&bond->ptrlock);
printk(KERN_INFO
"%s: link status definitely up "
"for interface %s, ",
master->name,
slave->dev->name);
change_active_interface(bond);
} else {
read_unlock(&bond->ptrlock);
printk(KERN_INFO
"%s: interface %s is now up\n",
master->name,
slave->dev->name);
}
}
} else {
/* slave->link == BOND_LINK_UP */
/* not all switches will respond to an arp request
* when the source ip is 0, so don't take the link down
* if we don't know our ip yet
*/
if (((jiffies - slave->dev->trans_start) >=
(2*the_delta_in_ticks)) ||
(((jiffies - slave->dev->last_rx) >=
(2*the_delta_in_ticks)) && my_ip !=0)) {
slave->link = BOND_LINK_DOWN;
slave->state = BOND_STATE_BACKUP;
if (slave->link_failure_count < UINT_MAX) {
slave->link_failure_count++;
}
printk(KERN_INFO
"%s: interface %s is now down.\n",
master->name,
slave->dev->name);
read_lock(&bond->ptrlock);
if (slave == bond->current_slave) {
read_unlock(&bond->ptrlock);
change_active_interface(bond);
} else {
read_unlock(&bond->ptrlock);
}
}
}
/* note: if switch is in round-robin mode, all links
* must tx arp to ensure all links rx an arp - otherwise
* links may oscillate or not come up at all; if switch is
* in something like xor mode, there is nothing we can
* do - all replies will be rx'ed on same link causing slaves
* to be unstable during low/no traffic periods
*/
if (IS_UP(slave->dev)) {
arp_send_all(slave);
}
}
rtnl_exunlock();
rtnl_shunlock();
read_unlock_irqrestore(&bond->lock, flags);
/* re-arm the timer */
mod_timer(&bond->arp_timer, next_timer);
}
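The up/down tests in loadbalance_arp_monitor() reduce to two comparisons against the ARP interval expressed in ticks. The sketch below restates them in userspace C, assuming a fixed tick rate; SKETCH_HZ, struct sketch_slave and the function names are hypothetical and stand in for the kernel's HZ, jiffies and net_device fields.

```c
#include <stdbool.h>

/* Hypothetical tick rate standing in for the kernel's HZ. */
#define SKETCH_HZ 100UL

struct sketch_slave {
	unsigned long trans_start;	/* tick of the last transmit */
	unsigned long last_rx;		/* tick of the last receive */
};

/* arp_interval is given in milliseconds; convert it to ticks. */
static unsigned long delta_in_ticks(unsigned long arp_interval_ms)
{
	return arp_interval_ms * SKETCH_HZ / 1000;
}

/* A down link comes back once it has both sent and received within delta. */
static bool link_should_come_up(const struct sketch_slave *s,
				unsigned long now, unsigned long delta)
{
	return (now - s->trans_start) <= delta &&
	       (now - s->last_rx) <= delta;
}

/*
 * An up link is taken down after two silent intervals. Receive silence is
 * ignored while our own IP is unknown, since some switches do not answer
 * ARP requests with a zero source address.
 */
static bool link_should_go_down(const struct sketch_slave *s,
				unsigned long now, unsigned long delta,
				bool ip_known)
{
	return (now - s->trans_start) >= 2 * delta ||
	       ((now - s->last_rx) >= 2 * delta && ip_known);
}
```

The subtractions are done in unsigned arithmetic, so a wrapped tick counter still yields the elapsed time as long as less than one full wrap has passed.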
/*
* When using arp monitoring in active-backup mode, this function is
* called to determine if any backup slaves have gone down or a new
* current slave needs to be found.
* The backup slaves never generate traffic, they are considered up by merely
* receiving traffic. If the current slave goes down, each backup slave will
* be given the opportunity to tx/rx an arp before being taken down - this
* prevents all slaves from being taken down due to the current slave not
* sending any traffic for the backups to receive. The arps are not strictly
* necessary; any tx and rx traffic will keep the current slave up. While any
* rx traffic will keep the backup slaves up, the current slave is responsible
* for generating traffic to keep them up regardless of any other traffic they
* may have received.
* see loadbalance_arp_monitor for arp monitoring in load balancing mode
*/
static void activebackup_arp_monitor(struct net_device *master)
{
bonding_t *bond;
unsigned long flags;
slave_t *slave;
int the_delta_in_ticks = arp_interval * HZ / 1000;
int next_timer = jiffies + (arp_interval * HZ / 1000);
bond = (struct bonding *) master->priv;
if (master->priv == NULL) {
mod_timer(&bond->arp_timer, next_timer);
return;
}
read_lock_irqsave(&bond->lock, flags);
if (!IS_UP(master)) {
mod_timer(&bond->arp_timer, next_timer);
read_unlock_irqrestore(&bond->lock, flags);
return;
}
/* determine if any slave has come up or any backup slave has
* gone down
* TODO: what about up/down delay in arp mode? it wasn't here before
* so it can wait
*/
slave = (slave_t *)bond;
while ((slave = slave->prev) != (slave_t *)bond) {
if (slave->link != BOND_LINK_UP) {
if ((jiffies - slave->dev->last_rx) <=
the_delta_in_ticks) {
slave->link = BOND_LINK_UP;
write_lock(&bond->ptrlock);
if ((bond->current_slave == NULL) &&
((jiffies - slave->dev->trans_start) <=
the_delta_in_ticks)) {
bond->current_slave = slave; bond->current_slave = slave;
bond_set_slave_active_flags(slave);
bond_mc_update(bond, slave, NULL);
bond->current_arp_slave = NULL;
} else if (bond->current_slave != slave) {
/* this slave has just come up but we
* already have a current slave; this
* can also happen if bond_enslave adds
* a new slave that is up while we are
* searching for a new slave
*/
bond_set_slave_inactive_flags(slave);
bond->current_arp_slave = NULL;
} }
if (slave != bond->current_slave) {
slave->dev->flags |= IFF_NOARP; if (slave == bond->current_slave) {
printk(KERN_INFO
"%s: %s is up and now the "
"active interface\n",
master->name,
slave->dev->name);
} else {
printk(KERN_INFO
"%s: backup interface %s is "
"now up\n",
master->name,
slave->dev->name);
} }
write_unlock(&bond->ptrlock); write_unlock(&bond->ptrlock);
} else { }
if ((jiffies - slave->dev->last_rx) <= } else {
the_delta_in_ticks) { read_lock(&bond->ptrlock);
arp_send(ARPOP_REQUEST, ETH_P_ARP, if ((slave != bond->current_slave) &&
arp_target, slave->dev, (bond->current_arp_slave == NULL) &&
my_ip, arp_target_hw_addr, (((jiffies - slave->dev->last_rx) >=
slave->dev->dev_addr, 3*the_delta_in_ticks) && (my_ip != 0))) {
arp_target_hw_addr); /* a backup slave has gone down; three times
* the delta allows the current slave to be
* taken out before the backup slave.
* note: a non-null current_arp_slave indicates
* the current_slave went down and we are
* searching for a new one; under this
* condition we only take the current_slave
* down - this gives each slave a chance to
* tx/rx traffic before being taken out
*/
read_unlock(&bond->ptrlock);
slave->link = BOND_LINK_DOWN;
if (slave->link_failure_count < UINT_MAX) {
slave->link_failure_count++;
} }
bond_set_slave_inactive_flags(slave);
printk(KERN_INFO
"%s: backup interface %s is now down\n",
master->name,
slave->dev->name);
} else {
read_unlock(&bond->ptrlock);
} }
} else }
read_unlock(&bond->ptrlock);
} }
read_lock(&bond->ptrlock); read_lock(&bond->ptrlock);
slave = bond->current_slave; slave = bond->current_slave;
read_unlock(&bond->ptrlock); read_unlock(&bond->ptrlock);
if (slave != 0) { if (slave != NULL) {
/* see if you need to take down the current_slave, since
* you haven't seen an arp in 2*arp_intervals
*/
if ( ((jiffies - slave->dev->trans_start) >=
(2*the_delta_in_ticks)) ||
((jiffies - slave->dev->last_rx) >=
(2*the_delta_in_ticks)) ) {
if (slave->link == BOND_LINK_UP) { /* if we have sent traffic in the past 2*arp_intervals but
slave->link = BOND_LINK_DOWN; * haven't xmit and rx traffic in that time interval, select
slave->state = BOND_STATE_BACKUP; * a different slave. slave->jiffies is only updated when
/* * a slave first becomes the current_slave - not necessarily
* we want to see arps, otherwise we couldn't * after every arp; this ensures the slave has a full 2*delta
* bring the adapter back online... * before being taken out. if a primary is being used, check
*/ * if it is up and needs to take over as the current_slave
printk(KERN_INFO "%s: link status definitely " */
"down for interface %s, " if ((((jiffies - slave->dev->trans_start) >=
"disabling it", (2*the_delta_in_ticks)) ||
slave->dev->master->name, (((jiffies - slave->dev->last_rx) >=
slave->dev->name); (2*the_delta_in_ticks)) && (my_ip != 0))) &&
/* find a new interface and be verbose */ ((jiffies - slave->jiffies) >= 2*the_delta_in_ticks)) {
change_active_interface(bond);
read_lock(&bond->ptrlock); slave->link = BOND_LINK_DOWN;
slave = bond->current_slave; if (slave->link_failure_count < UINT_MAX) {
read_unlock(&bond->ptrlock); slave->link_failure_count++;
}
printk(KERN_INFO "%s: link status down for "
"active interface %s, disabling it",
master->name,
slave->dev->name);
slave = change_active_interface(bond);
bond->current_arp_slave = slave;
if (slave != NULL) {
slave->jiffies = jiffies;
} }
}
/* } else if ((bond->primary_slave != NULL) &&
* ok, we know up/down, so just send a arp out if there has (bond->primary_slave != slave) &&
* been no activity for a while (bond->primary_slave->link == BOND_LINK_UP)) {
/* at this point, slave is the current_slave */
printk(KERN_INFO
"%s: changing from interface %s to primary "
"interface %s\n",
master->name,
slave->dev->name,
bond->primary_slave->dev->name);
/* primary is up so switch to it */
bond_set_slave_inactive_flags(slave);
bond_mc_update(bond, bond->primary_slave, slave);
write_lock(&bond->ptrlock);
bond->current_slave = bond->primary_slave;
write_unlock(&bond->ptrlock);
slave = bond->primary_slave;
bond_set_slave_active_flags(slave);
slave->jiffies = jiffies;
} else {
bond->current_arp_slave = NULL;
}
/* the current slave must tx an arp to ensure backup slaves
* rx traffic
*/ */
if ((slave != NULL) &&
(((jiffies - slave->dev->last_rx) >= the_delta_in_ticks) &&
(my_ip != 0))) {
arp_send_all(slave);
}
}
if (slave != NULL ) { /* if we don't have a current_slave, search for the next available
if ( ((jiffies - slave->dev->trans_start) >= * backup slave from the current_arp_slave and make it the candidate
the_delta_in_ticks) || * for becoming the current_slave
((jiffies - slave->dev->last_rx) >= */
the_delta_in_ticks) ) { if (slave == NULL) {
arp_send(ARPOP_REQUEST, ETH_P_ARP,
arp_target, slave->dev, if ((bond->current_arp_slave == NULL) ||
my_ip, arp_target_hw_addr, (bond->current_arp_slave == (slave_t *)bond)) {
slave->dev->dev_addr, bond->current_arp_slave = bond->prev;
arp_target_hw_addr);
}
} }
} if (bond->current_arp_slave != (slave_t *)bond) {
bond_set_slave_inactive_flags(bond->current_arp_slave);
slave = bond->current_arp_slave->next;
/* search for next candidate */
do {
if (IS_UP(slave->dev)) {
slave->link = BOND_LINK_BACK;
bond_set_slave_active_flags(slave);
arp_send_all(slave);
slave->jiffies = jiffies;
bond->current_arp_slave = slave;
break;
}
/* if we have no current slave.. try sending /* if the link state is up at this point, we
* an arp on all of the interfaces * mark it down - this can happen if we have
*/ * simultaneous link failures and
* change_active_interface doesn't make this
* one the current slave so it is still marked
* up when it is actually down
*/
if (slave->link == BOND_LINK_UP) {
slave->link = BOND_LINK_DOWN;
if (slave->link_failure_count <
UINT_MAX) {
slave->link_failure_count++;
}
read_lock(&bond->ptrlock); bond_set_slave_inactive_flags(slave);
if (bond->current_slave == NULL) { printk(KERN_INFO
read_unlock(&bond->ptrlock); "%s: backup interface "
slave = (slave_t *)bond; "%s is now down.\n",
while ((slave = slave->prev) != (slave_t *)bond) { master->name,
arp_send(ARPOP_REQUEST, ETH_P_ARP, arp_target, slave->dev->name);
slave->dev, my_ip, arp_target_hw_addr, }
slave->dev->dev_addr, arp_target_hw_addr); } while ((slave = slave->next) !=
bond->current_arp_slave->next);
} }
} }
else {
read_unlock(&bond->ptrlock);
}
rtnl_exunlock();
rtnl_shunlock();
arp_monitor_out:
read_unlock_irqrestore(&bond->lock, flags);
/* re-arm the timer */
mod_timer(&bond->arp_timer, next_timer); mod_timer(&bond->arp_timer, next_timer);
read_unlock_irqrestore(&bond->lock, flags);
} }
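activebackup_arp_monitor() applies different timeouts to the current slave and to backups. The following userspace sketch restates the two failure tests as read from the hunk above; all type and function names are hypothetical, and became_active plays the role of slave->jiffies.

```c
#include <stdbool.h>

struct sketch_active {
	unsigned long trans_start;	/* tick of the last transmit */
	unsigned long last_rx;		/* tick of the last receive */
	unsigned long became_active;	/* tick when it became current */
};

/*
 * The current slave fails after 2*delta of silence, but only once it has
 * held the role for a full 2*delta, so a freshly chosen slave gets a fair
 * chance to send and receive before it is judged.
 */
static bool current_slave_failed(const struct sketch_active *s,
				 unsigned long now, unsigned long delta,
				 bool ip_known)
{
	bool silent = (now - s->trans_start) >= 2 * delta ||
		      ((now - s->last_rx) >= 2 * delta && ip_known);
	bool grace_over = (now - s->became_active) >= 2 * delta;

	return silent && grace_over;
}

/*
 * A backup only receives, so it is judged on last_rx alone, with a longer
 * 3*delta timeout so the current slave is taken out first. While a search
 * for a new current slave is in progress, backups are left alone.
 */
static bool backup_slave_failed(unsigned long last_rx, unsigned long now,
				unsigned long delta, bool ip_known,
				bool searching_for_current)
{
	return !searching_for_current &&
	       (now - last_rx) >= 3 * delta && ip_known;
}
```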
#define isdigit(c) (c >= '0' && c <= '9') #define isdigit(c) (c >= '0' && c <= '9')
__inline static int atoi( char **s) __inline static int atoi( char **s)
{ {
...@@ -1720,7 +2312,7 @@ static int bond_slave_info_query(struct net_device *master, ...@@ -1720,7 +2312,7 @@ static int bond_slave_info_query(struct net_device *master,
} }
read_unlock_irqrestore(&bond->lock, flags); read_unlock_irqrestore(&bond->lock, flags);
if (cur_ndx == info->slave_id) { if (slave != (slave_t *)bond) {
strcpy(info->slave_name, slave->dev->name); strcpy(info->slave_name, slave->dev->name);
info->link = slave->link; info->link = slave->link;
info->state = slave->state; info->state = slave->state;
...@@ -1737,7 +2329,7 @@ static int bond_ioctl(struct net_device *master_dev, struct ifreq *ifr, int cmd) ...@@ -1737,7 +2329,7 @@ static int bond_ioctl(struct net_device *master_dev, struct ifreq *ifr, int cmd)
struct net_device *slave_dev = NULL; struct net_device *slave_dev = NULL;
struct ifbond *u_binfo = NULL, k_binfo; struct ifbond *u_binfo = NULL, k_binfo;
struct ifslave *u_sinfo = NULL, k_sinfo; struct ifslave *u_sinfo = NULL, k_sinfo;
u16 *data = NULL; struct mii_ioctl_data *mii = NULL;
int ret = 0; int ret = 0;
#ifdef BONDING_DEBUG #ifdef BONDING_DEBUG
...@@ -1747,23 +2339,23 @@ static int bond_ioctl(struct net_device *master_dev, struct ifreq *ifr, int cmd) ...@@ -1747,23 +2339,23 @@ static int bond_ioctl(struct net_device *master_dev, struct ifreq *ifr, int cmd)
switch (cmd) { switch (cmd) {
case SIOCGMIIPHY: case SIOCGMIIPHY:
data = (u16 *)ifr->ifr_data; mii = (struct mii_ioctl_data *)&ifr->ifr_data;
if (data == NULL) { if (mii == NULL) {
return -EINVAL; return -EINVAL;
} }
data[0] = 0; mii->phy_id = 0;
/* Fall Through */ /* Fall Through */
case SIOCGMIIREG: case SIOCGMIIREG:
/* /*
* We do this again just in case we were called by SIOCGMIIREG * We do this again just in case we were called by SIOCGMIIREG
* instead of SIOCGMIIPHY. * instead of SIOCGMIIPHY.
*/ */
data = (u16 *)ifr->ifr_data; mii = (struct mii_ioctl_data *)&ifr->ifr_data;
if (data == NULL) { if (mii == NULL) {
return -EINVAL; return -EINVAL;
} }
if (data[1] == 1) { if (mii->reg_num == 1) {
data[3] = bond_check_mii_link( mii->val_out = bond_check_mii_link(
(struct bonding *)master_dev->priv); (struct bonding *)master_dev->priv);
} }
return 0; return 0;
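The ioctl hunk above replaces raw u16 indexing (data[0], data[1], data[3]) with named fields of struct mii_ioctl_data. The sketch below uses a local mirror of that structure, redefined here so it builds outside the kernel tree, to show that the named fields sit at the same offsets the old indices addressed. The helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct mii_ioctl_data: four consecutive 16-bit fields. */
struct sketch_mii_ioctl_data {
	uint16_t phy_id;	/* formerly data[0] */
	uint16_t reg_num;	/* formerly data[1] */
	uint16_t val_in;	/* formerly data[2] */
	uint16_t val_out;	/* formerly data[3] */
};

/* Old style: the caller has to remember which index means what. */
static uint16_t reg_num_by_index(const uint16_t *data)
{
	return data[1];
}

/* New style: the field name documents itself. */
static uint16_t reg_num_by_name(const struct sketch_mii_ioctl_data *mii)
{
	return mii->reg_num;
}
```

Note that the patch also switches from ifr->ifr_data to &ifr->ifr_data, so the structure is read in place from the ifreq rather than through the stored pointer.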
...@@ -1846,6 +2438,65 @@ static int bond_accept_fastpath(struct net_device *dev, struct dst_entry *dst) ...@@ -1846,6 +2438,65 @@ static int bond_accept_fastpath(struct net_device *dev, struct dst_entry *dst)
} }
#endif #endif
/*
* in broadcast mode, we send everything to all usable interfaces.
*/
static int bond_xmit_broadcast(struct sk_buff *skb, struct net_device *dev)
{
slave_t *slave, *start_at;
struct bonding *bond = (struct bonding *) dev->priv;
unsigned long flags;
struct net_device *device_we_should_send_to = 0;
if (!IS_UP(dev)) { /* bond down */
dev_kfree_skb(skb);
return 0;
}
read_lock_irqsave(&bond->lock, flags);
read_lock(&bond->ptrlock);
slave = start_at = bond->current_slave;
read_unlock(&bond->ptrlock);
if (slave == NULL) { /* we're at the root, get the first slave */
/* no suitable interface, frame not sent */
read_unlock_irqrestore(&bond->lock, flags);
dev_kfree_skb(skb);
return 0;
}
do {
if (IS_UP(slave->dev)
&& (slave->link == BOND_LINK_UP)
&& (slave->state == BOND_STATE_ACTIVE)) {
if (device_we_should_send_to) {
struct sk_buff *skb2;
if ((skb2 = skb_clone(skb, GFP_ATOMIC)) == NULL) {
printk(KERN_ERR "bond_xmit_broadcast: skb_clone() failed\n");
continue;
}
skb2->dev = device_we_should_send_to;
skb2->priority = 1;
dev_queue_xmit(skb2);
}
device_we_should_send_to = slave->dev;
}
} while ((slave = slave->next) != start_at);
if (device_we_should_send_to) {
skb->dev = device_we_should_send_to;
skb->priority = 1;
dev_queue_xmit(skb);
} else
dev_kfree_skb(skb);
/* frame sent to all suitable interfaces */
read_unlock_irqrestore(&bond->lock, flags);
return 0;
}
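bond_xmit_broadcast() avoids one copy per frame: every usable slave except the last one found gets a clone, and the last one gets the original buffer. A minimal userspace sketch of that "clone for all but the last" walk follows, with counters standing in for skb_clone() and dev_queue_xmit(); the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_port {
	bool usable;		/* up, link up and active */
	int sent_clone;		/* frames delivered as copies */
	int sent_original;	/* frames delivered as the original buffer */
};

/*
 * Walk the ports, remembering the most recent usable one. Each time another
 * usable port turns up, the remembered one is served with a clone. After the
 * walk, the remembered port receives the original frame.
 * Returns the number of clones made, or -1 if no port was usable and the
 * frame had to be dropped.
 */
static int broadcast_frame(struct sketch_port *ports, size_t n)
{
	struct sketch_port *pending = NULL;
	int clones = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!ports[i].usable)
			continue;
		if (pending) {
			pending->sent_clone++;
			clones++;
		}
		pending = &ports[i];
	}

	if (!pending)
		return -1;	/* nothing usable: caller frees the frame */

	pending->sent_original++;
	return clones;
}
```

With k usable ports this makes k - 1 clones rather than k, which matters on a transmit path that allocates with GFP_ATOMIC.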
static int bond_xmit_roundrobin(struct sk_buff *skb, struct net_device *dev) static int bond_xmit_roundrobin(struct sk_buff *skb, struct net_device *dev)
{ {
slave_t *slave, *start_at; slave_t *slave, *start_at;
...@@ -1978,14 +2629,25 @@ static int bond_xmit_activebackup(struct sk_buff *skb, struct net_device *dev) ...@@ -1978,14 +2629,25 @@ static int bond_xmit_activebackup(struct sk_buff *skb, struct net_device *dev)
} }
/* if we are sending arp packets and don't know /* if we are sending arp packets and don't know
the target hw address, save it so we don't need * the target hw address, save it so we don't need
to use a broadcast address */ * to use a broadcast address.
if ( (arp_interval > 0) && (arp_target_hw_addr == NULL) && * don't do this if in active backup mode because the slaves must
* receive packets to stay up, and the only ones they receive are
* broadcasts.
*/
if ( (mode != BOND_MODE_ACTIVEBACKUP) &&
(arp_ip_count == 1) &&
(arp_interval > 0) && (arp_target_hw_addr == NULL) &&
(skb->protocol == __constant_htons(ETH_P_IP) ) ) { (skb->protocol == __constant_htons(ETH_P_IP) ) ) {
struct ethhdr *eth_hdr = struct ethhdr *eth_hdr =
(struct ethhdr *) (((char *)skb->data)); (struct ethhdr *) (((char *)skb->data));
arp_target_hw_addr = kmalloc(ETH_ALEN, GFP_KERNEL); struct iphdr *ip_hdr = (struct iphdr *)(eth_hdr + 1);
memcpy(arp_target_hw_addr, eth_hdr->h_dest, ETH_ALEN);
if (arp_target[0] == ip_hdr->daddr) {
arp_target_hw_addr = kmalloc(ETH_ALEN, GFP_KERNEL);
if (arp_target_hw_addr != NULL)
memcpy(arp_target_hw_addr, eth_hdr->h_dest, ETH_ALEN);
}
} }
read_lock_irqsave(&bond->lock, flags); read_lock_irqsave(&bond->lock, flags);
@@ -2074,29 +2736,7 @@ static int bond_get_info(char *buf, char **start, off_t offset, int length)
 	 */
 	link = bond_check_mii_link(bond);
-	len += sprintf(buf + len, "Bonding Mode: ");
-	switch (mode) {
-	case BOND_MODE_ACTIVEBACKUP:
-		len += sprintf(buf + len, "%s\n",
-				"active-backup");
-		break;
-	case BOND_MODE_ROUNDROBIN:
-		len += sprintf(buf + len, "%s\n",
-				"load balancing (round-robin)");
-		break;
-	case BOND_MODE_XOR:
-		len += sprintf(buf + len, "%s\n",
-				"load balancing (xor)");
-		break;
-	default:
-		len += sprintf(buf + len, "%s\n",
-				"unknown");
-		break;
-	}
+	len += sprintf(buf + len, "Bonding Mode: %s\n", bond_mode());
 	if (mode == BOND_MODE_ACTIVEBACKUP) {
 		read_lock_irqsave(&bond->lock, flags);
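Reviewer note: the hunk above replaces an open-coded switch with a call to bond_mode(), whose body is not part of this excerpt. One plausible shape for such a helper is a lookup table indexed by mode number, sketched below as a userspace approximation. The names mode_names and mode_name are hypothetical, the numeric values mirror the BOND_MODE_* defines shown in this diff, and the "broadcast" label is an assumption because the diff never shows that string.

```c
#include <stddef.h>

/* Values mirror the BOND_MODE_* defines in the diff; the enum itself is illustrative. */
enum {
	MODE_ROUNDROBIN   = 0,
	MODE_ACTIVEBACKUP = 1,
	MODE_XOR          = 2,
	MODE_BROADCAST    = 3
};

/* Labels for modes 0-2 are the strings from the removed switch statement. */
static const char *const mode_names[] = {
	[MODE_ROUNDROBIN]   = "load balancing (round-robin)",
	[MODE_ACTIVEBACKUP] = "active-backup",
	[MODE_XOR]          = "load balancing (xor)",
	[MODE_BROADCAST]    = "broadcast"	/* assumed label, not shown in the diff */
};

/* Hypothetical stand-in for bond_mode(): map a mode number to its label. */
static const char *mode_name(int mode)
{
	size_t count = sizeof(mode_names) / sizeof(mode_names[0]);

	if (mode < 0 || (size_t)mode >= count)
		return "unknown";	/* same fallback as the removed default: case */
	return mode_names[mode];
}
```

A table keeps the /proc output and the registration message in bond_init() in agreement, since both would read the same string.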
@@ -2115,8 +2755,11 @@ static int bond_get_info(char *buf, char **start, off_t offset, int length)
 			link == BMSR_LSTATUS ? "up\n" : "down\n");
 	len += sprintf(buf + len, "MII Polling Interval (ms): %d\n",
 			miimon);
-	len += sprintf(buf + len, "Up Delay (ms): %d\n", updelay);
-	len += sprintf(buf + len, "Down Delay (ms): %d\n", downdelay);
+	len += sprintf(buf + len, "Up Delay (ms): %d\n",
+			updelay * miimon);
+	len += sprintf(buf + len, "Down Delay (ms): %d\n",
+			downdelay * miimon);
+	len += sprintf(buf + len, "Multicast Mode: %s\n", multicast_mode());
 	read_lock_irqsave(&bond->lock, flags);
 	for (slave = bond->prev; slave != (slave_t *)bond;
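Reviewer note: the /proc output above now multiplies updelay and downdelay by miimon, which suggests the delays are held as a count of polling intervals rather than milliseconds. The comments added to bonding_init() later in this diff say the value is rounded when it is divided by miimon. The arithmetic can be sketched as below. Both helper names are hypothetical and this is a userspace illustration, not the driver's code.

```c
#include <assert.h>

/* Hypothetical helper: whole polling intervals that fit in delay_ms. */
static int delay_to_ticks(int delay_ms, int miimon_ms)
{
	assert(miimon_ms > 0);		/* with miimon == 0 the delays have no effect */
	return delay_ms / miimon_ms;	/* integer division rounds toward zero */
}

/* Hypothetical helper: the delay actually applied, as reported back in ms. */
static int effective_delay_ms(int delay_ms, int miimon_ms)
{
	return delay_to_ticks(delay_ms, miimon_ms) * miimon_ms;
}
```

Under these assumptions, updelay=250 with miimon=100 is stored as 2 intervals and reported as 200 ms, which matches the "(updelay / miimon) * miimon" expression in the warning message.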
@@ -2205,6 +2848,7 @@ static struct notifier_block bond_netdev_notifier = {
 static int __init bond_init(struct net_device *dev)
 {
 	bonding_t *bond, *this_bond, *last_bond;
+	int count;
 #ifdef BONDING_DEBUG
 	printk (KERN_INFO "Begin bond_init for %s\n", dev->name);
@@ -2228,6 +2872,7 @@ static int __init bond_init(struct net_device *dev)
 	bond->next = bond->prev = (slave_t *)bond;
 	bond->current_slave = NULL;
+	bond->current_arp_slave = NULL;
 	bond->device = dev;
 	dev->priv = bond;
@@ -2238,6 +2883,8 @@ static int __init bond_init(struct net_device *dev)
 		dev->hard_start_xmit = bond_xmit_roundrobin;
 	} else if (mode == BOND_MODE_XOR) {
 		dev->hard_start_xmit = bond_xmit_xor;
+	} else if (mode == BOND_MODE_BROADCAST) {
+		dev->hard_start_xmit = bond_xmit_broadcast;
 	} else {
 		printk(KERN_ERR "Unknown bonding mode %d\n", mode);
 		kfree(bond->stats);
@@ -2272,7 +2919,18 @@ static int __init bond_init(struct net_device *dev)
 	} else {
 		printk("out MII link monitoring");
 	}
-	printk(", in %s mode.\n",mode?"active-backup":"bonding");
+	printk(", in %s mode.\n", bond_mode());
+	printk(KERN_INFO "%s registered with", dev->name);
+	if (arp_interval > 0) {
+		printk(" ARP monitoring set to %d ms with %d target(s):",
+			arp_interval, arp_ip_count);
+		for (count=0 ; count<arp_ip_count ; count++)
+			printk (" %s", arp_ip_target[count]);
+		printk("\n");
+	} else {
+		printk("out ARP monitoring\n");
+	}
 #ifdef CONFIG_PROC_FS
 	bond->bond_proc_dir = proc_mkdir(dev->name, proc_net);
@@ -2329,6 +2987,8 @@ static int __init bonding_init(void)
 	/* Find a name for this unit */
 	static struct net_device *dev_bond = NULL;
+	printk(KERN_INFO "%s", version);
 	if (max_bonds < 1 || max_bonds > INT_MAX) {
 		printk(KERN_WARNING
 		       "bonding_init(): max_bonds (%d) not in range %d-%d, "
@@ -2343,6 +3003,14 @@ static int __init bonding_init(void)
 	}
 	memset(dev_bonds, 0, max_bonds*sizeof(struct net_device));
+	if (miimon < 0) {
+		printk(KERN_WARNING
+		       "bonding_init(): miimon module parameter (%d), "
+		       "not in range 0-%d, so it was reset to %d\n",
+		       miimon, INT_MAX, BOND_LINK_MON_INTERV);
+		miimon = BOND_LINK_MON_INTERV;
+	}
 	if (updelay < 0) {
 		printk(KERN_WARNING
 		       "bonding_init(): updelay module parameter (%d), "
@@ -2359,6 +3027,52 @@ static int __init bonding_init(void)
 		downdelay = 0;
 	}
+	if (miimon == 0) {
+		if ((updelay != 0) || (downdelay != 0)) {
+			/* just warn the user the up/down delay will have
+			 * no effect since miimon is zero...
+			 */
+			printk(KERN_WARNING
+			       "bonding_init(): miimon module parameter not "
+			       "set and updelay (%d) or downdelay (%d) module "
+			       "parameter is set; updelay and downdelay have "
+			       "no effect unless miimon is set\n",
+			       updelay, downdelay);
+		}
+	} else {
+		/* don't allow arp monitoring */
+		if (arp_interval != 0) {
+			printk(KERN_WARNING
+			       "bonding_init(): miimon (%d) and arp_interval "
+			       "(%d) can't be used simultaneously, "
+			       "disabling ARP monitoring\n",
+			       miimon, arp_interval);
+			arp_interval = 0;
+		}
+		if ((updelay % miimon) != 0) {
+			/* updelay will be rounded in bond_init() when it
+			 * is divided by miimon, we just inform user here
+			 */
+			printk(KERN_WARNING
+			       "bonding_init(): updelay (%d) is not a multiple "
+			       "of miimon (%d), updelay rounded to %d ms\n",
+			       updelay, miimon, (updelay / miimon) * miimon);
+		}
+		if ((downdelay % miimon) != 0) {
+			/* downdelay will be rounded in bond_init() when it
+			 * is divided by miimon, we just inform user here
+			 */
+			printk(KERN_WARNING
+			       "bonding_init(): downdelay (%d) is not a "
+			       "multiple of miimon (%d), downdelay rounded "
+			       "to %d ms\n",
+			       downdelay, miimon,
+			       (downdelay / miimon) * miimon);
+		}
+	}
 	if (arp_interval < 0) {
 		printk(KERN_WARNING
 		       "bonding_init(): arp_interval module parameter (%d), "
@@ -2367,11 +3081,63 @@ static int __init bonding_init(void)
 		arp_interval = BOND_LINK_ARP_INTERV;
 	}
-	if (arp_ip_target) {
-		/* TODO: check and log bad ip address */
-		if (my_inet_aton(arp_ip_target, &arp_target) == 0) {
-			arp_interval = 0;
+	for (arp_ip_count=0 ;
+	     (arp_ip_count < MAX_ARP_IP_TARGETS) && arp_ip_target[arp_ip_count];
+	     arp_ip_count++ ) {
+		/* TODO: check and log bad ip address */
+		if (my_inet_aton(arp_ip_target[arp_ip_count],
+				 &arp_target[arp_ip_count]) == 0) {
+			printk(KERN_WARNING
+			       "bonding_init(): bad arp_ip_target module "
+			       "parameter (%s), ARP monitoring will not be "
+			       "performed\n",
+			       arp_ip_target[arp_ip_count]);
+			arp_interval = 0;
 		}
+	}
+	if ( (arp_interval > 0) && (arp_ip_count==0)) {
+		/* don't allow arping if no arp_ip_target given... */
+		printk(KERN_WARNING
+		       "bonding_init(): arp_interval module parameter "
+		       "(%d) specified without providing an arp_ip_target "
+		       "parameter, arp_interval was reset to 0\n",
+		       arp_interval);
+		arp_interval = 0;
+	}
+	if ((miimon == 0) && (arp_interval == 0)) {
+		/* miimon and arp_interval not set, we need one so things
+		 * work as expected, see bonding.txt for details
+		 */
+		printk(KERN_ERR
+		       "bonding_init(): either miimon or "
+		       "arp_interval and arp_ip_target module parameters "
+		       "must be specified, otherwise bonding will not detect "
+		       "link failures! see bonding.txt for details.\n");
+	}
+	if ((primary != NULL) && (mode != BOND_MODE_ACTIVEBACKUP)){
+		/* currently, using a primary only makes sence
+		 * in active backup mode
+		 */
+		printk(KERN_WARNING
+		       "bonding_init(): %s primary device specified but has "
+		       " no effect in %s mode\n",
+		       primary, bond_mode());
+		primary = NULL;
+	}
+	if (multicast != BOND_MULTICAST_DISABLED &&
+	    multicast != BOND_MULTICAST_ACTIVE &&
+	    multicast != BOND_MULTICAST_ALL) {
+		printk(KERN_WARNING
+		       "bonding_init(): unknown multicast module "
+		       "parameter (%d), multicast reset to %d\n",
+		       multicast, BOND_MULTICAST_ALL);
+		multicast = BOND_MULTICAST_ALL;
 	}
 	for (no = 0; no < max_bonds; no++) {
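Reviewer note: the arp_ip_target handling in this hunk walks a NULL-terminated array of address strings, converts each one, and disables ARP monitoring if any conversion fails. The same loop shape is sketched below in userspace, under stated assumptions: inet_aton() from <arpa/inet.h> stands in for the driver's my_inet_aton(), the limit of 16 is an assumed value because MAX_ARP_IP_TARGETS is not defined in this excerpt, and parse_arp_targets is a hypothetical name.

```c
#include <arpa/inet.h>
#include <stddef.h>

#define MAX_ARP_TARGETS 16	/* assumption: the diff does not show MAX_ARP_IP_TARGETS */

/*
 * Convert up to MAX_ARP_TARGETS dotted-quad strings from a NULL-terminated
 * array. Returns the number of entries seen; *all_valid is cleared if any
 * entry fails to parse, mirroring the hunk's "arp_interval = 0" reaction.
 */
static int parse_arp_targets(const char *const targets[],
			     struct in_addr out[], int *all_valid)
{
	int count;

	*all_valid = 1;
	for (count = 0;
	     count < MAX_ARP_TARGETS && targets[count] != NULL;
	     count++) {
		/* inet_aton() returns 0 when the string is not a valid address */
		if (inet_aton(targets[count], &out[count]) == 0)
			*all_valid = 0;
	}
	return count;
}
```

As in the hunk, a bad entry still increments the count, so the caller has to check the validity flag separately from the count.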
@@ -2420,6 +3186,7 @@ static void __exit bonding_exit(void)
 module_init(bonding_init);
 module_exit(bonding_exit);
 MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION(DRV_DESCRIPTION ", v" DRV_VERSION);
 /*
  * Local variables:
......
@@ -37,9 +37,10 @@
 #define BOND_CHECK_MII_STATUS (SIOCGMIIPHY)
 #define BOND_MODE_ROUNDROBIN 0
 #define BOND_MODE_ACTIVEBACKUP 1
 #define BOND_MODE_XOR 2
+#define BOND_MODE_BROADCAST 3
 /* each slave's link has 4 states */
 #define BOND_LINK_UP 0 /* link is up and running */
@@ -74,6 +75,7 @@ typedef struct slave {
 	struct slave *prev;
 	struct net_device *dev;
 	short delay;
+	unsigned long jiffies;
 	char link; /* one of BOND_LINK_XXXX */
 	char state; /* one of BOND_STATE_XXXX */
 	unsigned short original_flags;
@@ -93,6 +95,8 @@ typedef struct bonding {
 	slave_t *next;
 	slave_t *prev;
 	slave_t *current_slave;
+	slave_t *primary_slave;
+	slave_t *current_arp_slave;
 	__s32 slave_cnt;
 	rwlock_t lock;
 	rwlock_t ptrlock;
......
@@ -207,13 +207,13 @@ static struct pktgen_info pginfos[MAX_PKTGEN];
 /** Convert to miliseconds */
-inline __u64 tv_to_ms(const struct timeval* tv) {
+static inline __u64 tv_to_ms(const struct timeval* tv) {
 	__u64 ms = tv->tv_usec / 1000;
 	ms += (__u64)tv->tv_sec * (__u64)1000;
 	return ms;
 }
-inline __u64 getCurMs(void) {
+static inline __u64 getCurMs(void) {
 	struct timeval tv;
 	do_gettimeofday(&tv);
 	return tv_to_ms(&tv);
@@ -1277,7 +1277,7 @@ static int proc_write(struct file *file, const char *user_buffer,
 }
-int create_proc_dir(void)
+static int create_proc_dir(void)
 {
 	int len;
 	/* does proc_dir already exists */
@@ -1295,7 +1295,7 @@ int create_proc_dir(void)
 	return 1;
 }
-int remove_proc_dir(void)
+static int remove_proc_dir(void)
 {
 	remove_proc_entry(PG_PROC_DIR, proc_net);
 	return 1;
......
@@ -871,6 +871,7 @@ static void ndisc_router_discovery(struct sk_buff *skb)
 	}
 	if (!ndisc_parse_options(opt, optlen, &ndopts)) {
+		in6_dev_put(in6_dev);
 		if (net_ratelimit())
 			ND_PRINTK2(KERN_WARNING
 				   "ICMP6 RA: invalid ND option, ignored.\n");
......
 /*
- * net/key/pfkeyv2.c An implemenation of PF_KEYv2 sockets.
+ * net/key/af_key.c An implementation of PF_KEYv2 sockets.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
......