Based on kernel version 3.2. Page generated on 2012-01-05 23:29 EST.

1	
2			Linux Ethernet Bonding Driver HOWTO
3	
4			Latest update: 27 April 2011
5	
6	Initial release : Thomas Davis <tadavis at lbl.gov>
7	Corrections, HA extensions : 2000/10/03-15 :
8	  - Willy Tarreau <willy at meta-x.org>
9	  - Constantine Gavrilov <const-g at xpert.com>
10	  - Chad N. Tindel <ctindel at ieee dot org>
11	  - Janice Girouard <girouard at us dot ibm dot com>
12	  - Jay Vosburgh <fubar at us dot ibm dot com>
13	
14	Reorganized and updated Feb 2005 by Jay Vosburgh
15	Added Sysfs information: 2006/04/24
16	  - Mitch Williams <mitch.a.williams at intel.com>
17	
18	Introduction
19	============
20	
21		The Linux bonding driver provides a method for aggregating
22	multiple network interfaces into a single logical "bonded" interface.
23	The behavior of the bonded interfaces depends upon the mode; generally
24	speaking, modes provide either hot standby or load balancing services.
25	Additionally, link integrity monitoring may be performed.
26		
27		The bonding driver originally came from Donald Becker's
28	beowulf patches for kernel 2.0. It has changed quite a bit since, and
29	the original tools from extreme-linux and beowulf sites will not work
30	with this version of the driver.
31	
32		For new versions of the driver, updated userspace tools, and
33	who to ask for help, please follow the links at the end of this file.
34	
35	Table of Contents
36	=================
37	
38	1. Bonding Driver Installation
39	
40	2. Bonding Driver Options
41	
42	3. Configuring Bonding Devices
43	3.1	Configuration with Sysconfig Support
44	3.1.1		Using DHCP with Sysconfig
45	3.1.2		Configuring Multiple Bonds with Sysconfig
46	3.2	Configuration with Initscripts Support
47	3.2.1		Using DHCP with Initscripts
48	3.2.2		Configuring Multiple Bonds with Initscripts
49	3.3	Configuring Bonding Manually with Ifenslave
50	3.3.1		Configuring Multiple Bonds Manually
51	3.4	Configuring Bonding Manually via Sysfs
52	3.5	Configuration with Interfaces Support
53	3.6	Overriding Configuration for Special Cases
54	
55	4. Querying Bonding Configuration
56	4.1	Bonding Configuration
57	4.2	Network Configuration
58	
59	5. Switch Configuration
60	
61	6. 802.1q VLAN Support
62	
63	7. Link Monitoring
64	7.1	ARP Monitor Operation
65	7.2	Configuring Multiple ARP Targets
66	7.3	MII Monitor Operation
67	
68	8. Potential Trouble Sources
69	8.1	Adventures in Routing
70	8.2	Ethernet Device Renaming
71	8.3	Painfully Slow Or No Failed Link Detection By Miimon
72	
73	9. SNMP agents
74	
75	10. Promiscuous mode
76	
77	11. Configuring Bonding for High Availability
78	11.1	High Availability in a Single Switch Topology
79	11.2	High Availability in a Multiple Switch Topology
80	11.2.1		HA Bonding Mode Selection for Multiple Switch Topology
81	11.2.2		HA Link Monitoring for Multiple Switch Topology
82	
83	12. Configuring Bonding for Maximum Throughput
84	12.1	Maximum Throughput in a Single Switch Topology
85	12.1.1		MT Bonding Mode Selection for Single Switch Topology
86	12.1.2		MT Link Monitoring for Single Switch Topology
87	12.2	Maximum Throughput in a Multiple Switch Topology
88	12.2.1		MT Bonding Mode Selection for Multiple Switch Topology
89	12.2.2		MT Link Monitoring for Multiple Switch Topology
90	
91	13. Switch Behavior Issues
92	13.1	Link Establishment and Failover Delays
93	13.2	Duplicated Incoming Packets
94	
95	14. Hardware Specific Considerations
96	14.1	IBM BladeCenter
97	
98	15. Frequently Asked Questions
99	
100	16. Resources and Links
101	
102	
103	1. Bonding Driver Installation
104	==============================
105	
	Most popular distro kernels ship with the bonding driver
already available as a module and the ifenslave user level control
program installed and ready for use. If your distro does not, or you
need to compile bonding from source (e.g., when configuring and
installing a mainline kernel from kernel.org), you'll need to perform
the following steps:
112	
113	1.1 Configure and build the kernel with bonding
114	-----------------------------------------------
115	
116		The current version of the bonding driver is available in the
117	drivers/net/bonding subdirectory of the most recent kernel source
118	(which is available on http://kernel.org).  Most users "rolling their
119	own" will want to use the most recent kernel from kernel.org.
120	
	Configure the kernel with "make menuconfig" (or "make xconfig" or
"make config"), then select "Bonding driver support" in the "Network
device support" section.  It is recommended that you configure the
driver as a module, since that is currently the only way to pass
parameters to the driver or configure more than one bonding device.
126	
127		Build and install the new kernel and modules, then continue
128	below to install ifenslave.
129	
130	1.2 Install ifenslave Control Utility
131	-------------------------------------
132	
	The ifenslave user level control program is included in the
kernel source tree, in the file Documentation/networking/ifenslave.c.
It is generally recommended that you use the ifenslave that
corresponds to the kernel that you are using (either from the same
source tree or supplied with the distro).  However, ifenslave
executables from older kernels should function (but features newer
than the ifenslave release are not supported).  Running an ifenslave
that is newer than the kernel is not supported, and may or may not
work.
142	
143		To install ifenslave, do the following:
144	
# gcc -Wall -O -I/usr/src/linux/include ifenslave.c -o ifenslave
# cp ifenslave /sbin/ifenslave
147	
148		If your kernel source is not in "/usr/src/linux," then replace
149	"/usr/src/linux/include" in the above with the location of your kernel
150	source include directory.
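
	As a minimal sketch of the substitution described above, the
helper below prints the build command for a given kernel source
directory.  The function name ifenslave_build_cmd and the example path
/usr/src/linux-3.2 are hypothetical and only serve as illustration.

```shell
# Print the ifenslave build command for a given kernel source tree.
# The first argument is the kernel source directory; it defaults to
# /usr/src/linux, the location assumed in this document.
ifenslave_build_cmd() {
    ksrc=${1:-/usr/src/linux}
    printf 'gcc -Wall -O -I%s/include ifenslave.c -o ifenslave\n' "$ksrc"
}

# Example: a kernel tree unpacked under a hypothetical /usr/src/linux-3.2
ifenslave_build_cmd /usr/src/linux-3.2
```

	The helper only prints the command, so it can be inspected
before being run as root.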
151	
152		You may wish to back up any existing /sbin/ifenslave, or, for
153	testing or informal use, tag the ifenslave to the kernel version
154	(e.g., name the ifenslave executable /sbin/ifenslave-2.6.10).
155	
156	IMPORTANT NOTE:
157	
158		If you omit the "-I" or specify an incorrect directory, you
159	may end up with an ifenslave that is incompatible with the kernel
160	you're trying to build it for.  Some distros (e.g., Red Hat from 7.1
161	onwards) do not have /usr/include/linux symbolically linked to the
162	default kernel source include directory.
163	
164	SECOND IMPORTANT NOTE:
165		If you plan to configure bonding using sysfs or using the
166	/etc/network/interfaces file, you do not need to use ifenslave.
167	
168	2. Bonding Driver Options
169	=========================
170	
171		Options for the bonding driver are supplied as parameters to the
172	bonding module at load time, or are specified via sysfs.
173	
174		Module options may be given as command line arguments to the
175	insmod or modprobe command, but are usually specified in either the
176	/etc/modules.conf or /etc/modprobe.conf configuration file, or in a
177	distro-specific configuration file (some of which are detailed in the next
178	section).
179	
	Details on bonding support for sysfs are provided in the
"Configuring Bonding Manually via Sysfs" section, below.
182	
183		The available bonding driver parameters are listed below. If a
184	parameter is not specified the default value is used.  When initially
185	configuring a bond, it is recommended "tail -f /var/log/messages" be
186	run in a separate window to watch for bonding driver error messages.
187	
188		It is critical that either the miimon or arp_interval and
189	arp_ip_target parameters be specified, otherwise serious network
190	degradation will occur during link failures.  Very few devices do not
191	support at least miimon, so there is really no reason not to use it.
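
	The requirement above can be checked before a set of options
is used.  The following is a rough sketch, not part of the driver: the
function name has_link_monitor and the sample option strings are
hypothetical, and the values are assumed to be plain decimal numbers.

```shell
# Return 0 if an option string enables a link monitor: either a nonzero
# miimon, or a nonzero arp_interval together with an arp_ip_target.
has_link_monitor() {
    miimon=0 arp_interval=0 arp_ip_target=
    for opt in $1; do
        case $opt in
            miimon=*)        miimon=${opt#miimon=} ;;
            arp_interval=*)  arp_interval=${opt#arp_interval=} ;;
            arp_ip_target=*) arp_ip_target=${opt#arp_ip_target=} ;;
        esac
    done
    [ "$miimon" -gt 0 ] && return 0
    [ "$arp_interval" -gt 0 ] && [ -n "$arp_ip_target" ] && return 0
    return 1
}

# Hypothetical option string; 100 is the miimon starting point
# suggested under the miimon option.
if has_link_monitor "mode=active-backup miimon=100"; then
    echo "link monitoring configured"
fi
```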
192	
193		Options with textual values will accept either the text name
194	or, for backwards compatibility, the option value.  E.g.,
195	"mode=802.3ad" and "mode=4" set the same mode.
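
	To illustrate the equivalence of names and numbers, the sketch
below maps either form of the mode option to its numeric value, using
the names and numbers listed under the mode option.  The function name
bond_mode_number is hypothetical.

```shell
# Map a bonding mode given by name or by number to its numeric value.
bond_mode_number() {
    case $1 in
        balance-rr|0)    echo 0 ;;
        active-backup|1) echo 1 ;;
        balance-xor|2)   echo 2 ;;
        broadcast|3)     echo 3 ;;
        802.3ad|4)       echo 4 ;;
        balance-tlb|5)   echo 5 ;;
        balance-alb|6)   echo 6 ;;
        *) echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

bond_mode_number 802.3ad   # prints 4
bond_mode_number 4         # prints 4
```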
196	
197		The parameters are as follows:
198	
199	ad_select
200	
201		Specifies the 802.3ad aggregation selection logic to use.  The
202		possible values and their effects are:
203	
204		stable or 0
205	
206			The active aggregator is chosen by largest aggregate
207			bandwidth.
208	
209			Reselection of the active aggregator occurs only when all
210			slaves of the active aggregator are down or the active
211			aggregator has no slaves.
212	
213			This is the default value.
214	
215		bandwidth or 1
216	
217			The active aggregator is chosen by largest aggregate
218			bandwidth.  Reselection occurs if:
219	
220			- A slave is added to or removed from the bond
221	
222			- Any slave's link state changes
223	
224			- Any slave's 802.3ad association state changes
225	
226			- The bond's administrative state changes to up
227	
228		count or 2
229	
230			The active aggregator is chosen by the largest number of
231			ports (slaves).  Reselection occurs as described under the
232			"bandwidth" setting, above.
233	
234		The bandwidth and count selection policies permit failover of
235		802.3ad aggregations when partial failure of the active aggregator
236		occurs.  This keeps the aggregator with the highest availability
237		(either in bandwidth or in number of ports) active at all times.
238	
239		This option was added in bonding version 3.4.0.
240	
241	all_slaves_active
242	
	Specifies whether duplicate frames (received on inactive ports)
	should be dropped (0) or delivered (1).

	Normally, bonding will drop duplicate frames (received on inactive
	ports), which is desirable for most users.  In some cases, however,
	it is useful to allow duplicate frames to be delivered.
249	
250		The default value is 0 (drop duplicate frames received on inactive
251		ports).
252	
253	arp_interval
254	
255		Specifies the ARP link monitoring frequency in milliseconds.
256	
	The ARP monitor works by periodically checking the slave
	devices to determine whether they have sent or received
	traffic recently (the precise criteria depend upon the
	bonding mode, and the state of the slave).  Regular traffic is
	generated via ARP probes issued for the addresses specified by
	the arp_ip_target option.
263	
264		This behavior can be modified by the arp_validate option,
265		below.
266	
267		If ARP monitoring is used in an etherchannel compatible mode
268		(modes 0 and 2), the switch should be configured in a mode
269		that evenly distributes packets across all links. If the
270		switch is configured to distribute the packets in an XOR
271		fashion, all replies from the ARP targets will be received on
272		the same link which could cause the other team members to
273		fail.  ARP monitoring should not be used in conjunction with
274		miimon.  A value of 0 disables ARP monitoring.  The default
275		value is 0.
276	
277	arp_ip_target
278	
279		Specifies the IP addresses to use as ARP monitoring peers when
280		arp_interval is > 0.  These are the targets of the ARP request
281		sent to determine the health of the link to the targets.
282		Specify these values in ddd.ddd.ddd.ddd format.  Multiple IP
283		addresses must be separated by a comma.  At least one IP
284		address must be given for ARP monitoring to function.  The
285		maximum number of targets that can be specified is 16.  The
286		default value is no IP addresses.
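
	The constraints above (dotted-quad format, comma separation,
between 1 and 16 targets) can be sketched as a simple check.  This is
only an illustration of the stated rules, not the driver's own
parsing; the function name valid_arp_ip_target and the sample
addresses are hypothetical.

```shell
# Return 0 if the argument is a comma-separated list of 1 to 16
# dotted-quad IPv4 addresses, 1 otherwise.
valid_arp_ip_target() {
    [ -n "$1" ] || return 1
    count=0
    old_ifs=$IFS; IFS=,
    set -- $1                      # split the list on commas
    IFS=$old_ifs
    for addr in "$@"; do
        count=$((count + 1))
        # four groups of one to three digits
        echo "$addr" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$' || return 1
        # no octet may exceed 255
        for octet in $(echo "$addr" | tr '.' ' '); do
            [ "$octet" -le 255 ] || return 1
        done
    done
    [ "$count" -le 16 ]
}

if valid_arp_ip_target "192.168.0.1,192.168.0.2"; then
    echo "targets accepted"
fi
```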
287	
288	arp_validate
289	
290		Specifies whether or not ARP probes and replies should be
291		validated in the active-backup mode.  This causes the ARP
292		monitor to examine the incoming ARP requests and replies, and
293		only consider a slave to be up if it is receiving the
294		appropriate ARP traffic.
295	
296		Possible values are:
297	
298		none or 0
299	
300			No validation is performed.  This is the default.
301	
302		active or 1
303	
304			Validation is performed only for the active slave.
305	
306		backup or 2
307	
308			Validation is performed only for backup slaves.
309	
310		all or 3
311	
312			Validation is performed for all slaves.
313	
314		For the active slave, the validation checks ARP replies to
315		confirm that they were generated by an arp_ip_target.  Since
316		backup slaves do not typically receive these replies, the
317		validation performed for backup slaves is on the ARP request
318		sent out via the active slave.  It is possible that some
319		switch or network configurations may result in situations
320		wherein the backup slaves do not receive the ARP requests; in
321		such a situation, validation of backup slaves must be
322		disabled.
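
	The numeric values above can be read as a two-bit mask, with
bit 1 covering the active slave and bit 2 covering backup slaves.  The
sketch below uses that reading to report whether a slave in a given
role is validated; the function name arp_validate_applies is
hypothetical, and the bitmask interpretation is an assumption drawn
from the values listed here.

```shell
# Return 0 if a slave in the given role ("active" or "backup") is
# validated under the given arp_validate value (name or number).
arp_validate_applies() {
    case $1 in
        none|0)   value=0 ;;
        active|1) value=1 ;;
        backup|2) value=2 ;;
        all|3)    value=3 ;;
        *) return 2 ;;
    esac
    case $2 in
        active) bit=1 ;;
        backup) bit=2 ;;
        *) return 2 ;;
    esac
    [ $(( value & bit )) -ne 0 ]
}

if arp_validate_applies all backup; then
    echo "backup slaves are validated"
fi
```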
323	
324		This option is useful in network configurations in which
325		multiple bonding hosts are concurrently issuing ARPs to one or
326		more targets beyond a common switch.  Should the link between
327		the switch and target fail (but not the switch itself), the
328		probe traffic generated by the multiple bonding instances will
329		fool the standard ARP monitor into considering the links as
330		still up.  Use of the arp_validate option can resolve this, as
331		the ARP monitor will only consider ARP requests and replies
332		associated with its own instance of bonding.
333	
334		This option was added in bonding version 3.1.0.
335	
336	downdelay
337	
338		Specifies the time, in milliseconds, to wait before disabling
339		a slave after a link failure has been detected.  This option
340		is only valid for the miimon link monitor.  The downdelay
341		value should be a multiple of the miimon value; if not, it
342		will be rounded down to the nearest multiple.  The default
343		value is 0.
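
	The rounding rule above, which the updelay option shares, can
be sketched with integer arithmetic.  The function name
round_to_miimon is hypothetical, and the sketch assumes miimon is
greater than zero.

```shell
# Round a delay (in ms) down to the nearest multiple of miimon (in ms).
# Integer division discards the remainder, which gives the rounding.
round_to_miimon() {
    delay=$1 miimon=$2
    echo $(( delay / miimon * miimon ))
}

round_to_miimon 250 100   # prints 200
```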
344	
345	fail_over_mac
346	
347		Specifies whether active-backup mode should set all slaves to
348		the same MAC address at enslavement (the traditional
349		behavior), or, when enabled, perform special handling of the
350		bond's MAC address in accordance with the selected policy.
351	
352		Possible values are:
353	
354		none or 0
355	
356			This setting disables fail_over_mac, and causes
357			bonding to set all slaves of an active-backup bond to
358			the same MAC address at enslavement time.  This is the
359			default.
360	
361		active or 1
362	
363			The "active" fail_over_mac policy indicates that the
364			MAC address of the bond should always be the MAC
365			address of the currently active slave.  The MAC
366			address of the slaves is not changed; instead, the MAC
367			address of the bond changes during a failover.
368	
369			This policy is useful for devices that cannot ever
370			alter their MAC address, or for devices that refuse
371			incoming broadcasts with their own source MAC (which
372			interferes with the ARP monitor).
373	
374			The down side of this policy is that every device on
375			the network must be updated via gratuitous ARP,
376			vs. just updating a switch or set of switches (which
377			often takes place for any traffic, not just ARP
378			traffic, if the switch snoops incoming traffic to
379			update its tables) for the traditional method.  If the
380			gratuitous ARP is lost, communication may be
381			disrupted.
382	
383			When this policy is used in conjunction with the mii
384			monitor, devices which assert link up prior to being
385			able to actually transmit and receive are particularly
386			susceptible to loss of the gratuitous ARP, and an
387			appropriate updelay setting may be required.
388	
389		follow or 2
390	
391			The "follow" fail_over_mac policy causes the MAC
392			address of the bond to be selected normally (normally
393			the MAC address of the first slave added to the bond).
394			However, the second and subsequent slaves are not set
395			to this MAC address while they are in a backup role; a
396			slave is programmed with the bond's MAC address at
397			failover time (and the formerly active slave receives
398			the newly active slave's MAC address).
399	
400			This policy is useful for multiport devices that
401			either become confused or incur a performance penalty
402			when multiple ports are programmed with the same MAC
403			address.
404	
405	
406		The default policy is none, unless the first slave cannot
407		change its MAC address, in which case the active policy is
408		selected by default.
409	
410		This option may be modified via sysfs only when no slaves are
411		present in the bond.
412	
413		This option was added in bonding version 3.2.0.  The "follow"
414		policy was added in bonding version 3.3.0.
415	
416	lacp_rate
417	
	Specifies the rate at which the link partner is asked
	to transmit LACPDU packets in 802.3ad mode.  Possible values
	are:
421	
422		slow or 0
423			Request partner to transmit LACPDUs every 30 seconds
424	
425		fast or 1
426			Request partner to transmit LACPDUs every 1 second
427	
428		The default is slow.
429	
430	max_bonds
431	
432		Specifies the number of bonding devices to create for this
433		instance of the bonding driver.  E.g., if max_bonds is 3, and
434		the bonding driver is not already loaded, then bond0, bond1
435		and bond2 will be created.  The default value is 1.  Specifying
436		a value of 0 will load bonding, but will not create any devices.
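
	As an illustration of the naming described above, the sketch
below lists the device names that a given max_bonds value would
produce when the driver is not already loaded.  The function name
bond_device_names is hypothetical.

```shell
# Print the bonding device names created for a given max_bonds value,
# one per line, starting at bond0.
bond_device_names() {
    i=0
    while [ "$i" -lt "$1" ]; do
        echo "bond$i"
        i=$((i + 1))
    done
}

bond_device_names 3   # prints bond0, bond1 and bond2
```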
437	
438	miimon
439	
440		Specifies the MII link monitoring frequency in milliseconds.
441		This determines how often the link state of each slave is
442		inspected for link failures.  A value of zero disables MII
443		link monitoring.  A value of 100 is a good starting point.
444		The use_carrier option, below, affects how the link state is
445		determined.  See the High Availability section for additional
446		information.  The default value is 0.
447	
448	min_links
449	
450		Specifies the minimum number of links that must be active before
451		asserting carrier. It is similar to the Cisco EtherChannel min-links
452		feature. This allows setting the minimum number of member ports that
453		must be up (link-up state) before marking the bond device as up
454		(carrier on). This is useful for situations where higher level services
455		such as clustering want to ensure a minimum number of low bandwidth
	links are active before switchover. This option only affects 802.3ad
	mode.
458	
459		The default value is 0. This will cause carrier to be asserted (for
460		802.3ad mode) whenever there is an active aggregator, regardless of the
461		number of available links in that aggregator. Note that, because an
462		aggregator cannot be active without at least one available link,
463		setting this option to 0 or to 1 has the exact same effect.
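
	The carrier rule above can be sketched as a comparison.  This
is a simplified model for 802.3ad mode only; the function name
bond_carrier_on is hypothetical, and its first argument is assumed to
be the number of available links in the active aggregator.

```shell
# Return 0 (carrier on) if the active aggregator has enough links.
# An aggregator needs at least one available link to be active, so
# min_links values of 0 and 1 behave identically.
bond_carrier_on() {
    links=$1 min_links=$2
    [ "$links" -ge 1 ] && [ "$links" -ge "$min_links" ]
}

if bond_carrier_on 1 2; then
    echo "carrier on"
else
    echo "carrier off"   # one link is below min_links=2
fi
```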
464	
465	mode
466	
467		Specifies one of the bonding policies. The default is
468		balance-rr (round robin).  Possible values are:
469	
470		balance-rr or 0
471	
472			Round-robin policy: Transmit packets in sequential
473			order from the first available slave through the
474			last.  This mode provides load balancing and fault
475			tolerance.
476	
477		active-backup or 1
478	
479			Active-backup policy: Only one slave in the bond is
480			active.  A different slave becomes active if, and only
481			if, the active slave fails.  The bond's MAC address is
482			externally visible on only one port (network adapter)
483			to avoid confusing the switch.
484	
		In bonding version 2.6.2 or later, when a failover
		occurs in active-backup mode, bonding will issue one
		or more gratuitous ARPs on the newly active slave.
		One gratuitous ARP is issued for the bonding master
		interface and each VLAN interface configured above
		it, provided that the interface has at least one IP
		address configured.  Gratuitous ARPs issued for VLAN
		interfaces are tagged with the appropriate VLAN id.
493	
494			This mode provides fault tolerance.  The primary
495			option, documented below, affects the behavior of this
496			mode.
497	
498		balance-xor or 2
499	
500			XOR policy: Transmit based on the selected transmit
501			hash policy.  The default policy is a simple [(source
502			MAC address XOR'd with destination MAC address) modulo
503			slave count].  Alternate transmit policies may be
504			selected via the xmit_hash_policy option, described
505			below.
506	
507			This mode provides load balancing and fault tolerance.
508	
509		broadcast or 3
510	
511			Broadcast policy: transmits everything on all slave
512			interfaces.  This mode provides fault tolerance.
513	
514		802.3ad or 4
515	
516			IEEE 802.3ad Dynamic link aggregation.  Creates
517			aggregation groups that share the same speed and
518			duplex settings.  Utilizes all slaves in the active
519			aggregator according to the 802.3ad specification.
520	
		Slave selection for outgoing traffic is done according
		to the transmit hash policy, which may be changed from
		the default simple XOR policy via the xmit_hash_policy
		option, documented below.  Note that not all transmit
		policies may be 802.3ad compliant, particularly with
		regard to the packet mis-ordering requirements of
		section 43.2.4 of the 802.3ad standard.  Differing
		peer implementations will have varying tolerances for
		noncompliance.
530	
531			Prerequisites:
532	
533			1. Ethtool support in the base drivers for retrieving
534			the speed and duplex of each slave.
535	
536			2. A switch that supports IEEE 802.3ad Dynamic link
537			aggregation.
538	
539			Most switches will require some type of configuration
540			to enable 802.3ad mode.
541	
542		balance-tlb or 5
543	
544			Adaptive transmit load balancing: channel bonding that
545			does not require any special switch support.  The
546			outgoing traffic is distributed according to the
547			current load (computed relative to the speed) on each
548			slave.  Incoming traffic is received by the current
549			slave.  If the receiving slave fails, another slave
550			takes over the MAC address of the failed receiving
551			slave.
552	
553			Prerequisite:
554	
555			Ethtool support in the base drivers for retrieving the
556			speed of each slave.
557	
558		balance-alb or 6
559	
560			Adaptive load balancing: includes balance-tlb plus
561			receive load balancing (rlb) for IPV4 traffic, and
562			does not require any special switch support.  The
563			receive load balancing is achieved by ARP negotiation.
564			The bonding driver intercepts the ARP Replies sent by
565			the local system on their way out and overwrites the
566			source hardware address with the unique hardware
567			address of one of the slaves in the bond such that
568			different peers use different hardware addresses for
569			the server.
570	
571			Receive traffic from connections created by the server
572			is also balanced.  When the local system sends an ARP
573			Request the bonding driver copies and saves the peer's
574			IP information from the ARP packet.  When the ARP
575			Reply arrives from the peer, its hardware address is
576			retrieved and the bonding driver initiates an ARP
577			reply to this peer assigning it to one of the slaves
578			in the bond.  A problematic outcome of using ARP
579			negotiation for balancing is that each time that an
580			ARP request is broadcast it uses the hardware address
581			of the bond.  Hence, peers learn the hardware address
582			of the bond and the balancing of receive traffic
583			collapses to the current slave.  This is handled by
584			sending updates (ARP Replies) to all the peers with
585			their individually assigned hardware address such that
586			the traffic is redistributed.  Receive traffic is also
587			redistributed when a new slave is added to the bond
588			and when an inactive slave is re-activated.  The
589			receive load is distributed sequentially (round robin)
590			among the group of highest speed slaves in the bond.
591	
		When a link is reconnected or a new slave joins the
		bond the receive traffic is redistributed among all
		active slaves in the bond by initiating ARP Replies
		with the selected MAC address to each of the
		clients. The updelay parameter (detailed below) must
		be set to a value equal to or greater than the switch's
		forwarding delay so that the ARP Replies sent to the
		peers will not be blocked by the switch.
600	
601			Prerequisites:
602	
603			1. Ethtool support in the base drivers for retrieving
604			the speed of each slave.
605	
606			2. Base driver support for setting the hardware
607			address of a device while it is open.  This is
608			required so that there will always be one slave in the
609			team using the bond hardware address (the
610			curr_active_slave) while having a unique hardware
611			address for each slave in the bond.  If the
612			curr_active_slave fails its hardware address is
613			swapped with the new curr_active_slave that was
614			chosen.
615	
616	num_grat_arp
617	num_unsol_na
618	
619		Specify the number of peer notifications (gratuitous ARPs and
620		unsolicited IPv6 Neighbor Advertisements) to be issued after a
621		failover event.  As soon as the link is up on the new slave
622		(possibly immediately) a peer notification is sent on the
623		bonding device and each VLAN sub-device.  This is repeated at
624		each link monitor interval (arp_interval or miimon, whichever
625		is active) if the number is greater than 1.
626	
627		The valid range is 0 - 255; the default value is 1.  These options
628		affect only the active-backup mode.  These options were added for
629		bonding versions 3.3.0 and 3.4.0 respectively.
630	
631		From Linux 3.0 and bonding version 3.7.1, these notifications
632		are generated by the ipv4 and ipv6 code and the numbers of
633		repetitions cannot be set independently.
634	
635	primary
636	
637		A string (eth0, eth2, etc) specifying which slave is the
638		primary device.  The specified device will always be the
639		active slave while it is available.  Only when the primary is
640		off-line will alternate devices be used.  This is useful when
641		one slave is preferred over another, e.g., when one slave has
642		higher throughput than another.
643	
644		The primary option is only valid for active-backup mode.
645	
646	primary_reselect
647	
648		Specifies the reselection policy for the primary slave.  This
649		affects how the primary slave is chosen to become the active slave
650		when failure of the active slave or recovery of the primary slave
651		occurs.  This option is designed to prevent flip-flopping between
652		the primary slave and other slaves.  Possible values are:
653	
654		always or 0 (default)
655	
656			The primary slave becomes the active slave whenever it
657			comes back up.
658	
659		better or 1
660	
		The primary slave becomes the active slave when it comes
		back up, if the speed and duplex of the primary slave are
		better than the speed and duplex of the current active
		slave.
665	
666		failure or 2
667	
668			The primary slave becomes the active slave only if the
669			current active slave fails and the primary slave is up.
670	
671		The primary_reselect setting is ignored in two cases:
672	
673			If no slaves are active, the first slave to recover is
674			made the active slave.
675	
676			When initially enslaved, the primary slave is always made
677			the active slave.
678	
679		Changing the primary_reselect policy via sysfs will cause an
680		immediate selection of the best active slave according to the new
681		policy.  This may or may not result in a change of the active
682		slave, depending upon the circumstances.
683	
684		This option was added for bonding version 3.6.0.
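
	The three reselection policies can be sketched as a decision
function.  This is a simplified model, not the driver's code: the
function name primary_takes_over and its arguments are hypothetical,
and it assumes that an available primary is selected under every
policy once the current active slave has failed.

```shell
# Decide whether the primary slave should become the active slave.
#   $1  policy: always|better|failure (or 0|1|2)
#   $2  1 if the primary slave's link is up, else 0
#   $3  1 if the current active slave has failed, else 0
#   $4  1 if the primary's speed and duplex are better than those of
#       the current active slave, else 0
primary_takes_over() {
    [ "$2" -eq 1 ] || return 1     # a primary that is down never takes over
    [ "$3" -eq 1 ] && return 0     # active slave failed: use the primary
    case $1 in
        always|0)  return 0 ;;
        better|1)  [ "$4" -eq 1 ] ;;
        failure|2) return 1 ;;
        *) return 2 ;;
    esac
}

# The primary recovers while a slower slave is active, policy "better".
if primary_takes_over better 1 0 1; then
    echo "primary becomes active"
fi
```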
685	
686	updelay
687	
688		Specifies the time, in milliseconds, to wait before enabling a
689		slave after a link recovery has been detected.  This option is
690		only valid for the miimon link monitor.  The updelay value
691		should be a multiple of the miimon value; if not, it will be
692		rounded down to the nearest multiple.  The default value is 0.
693	
694	use_carrier
695	
696		Specifies whether or not miimon should use MII or ETHTOOL
697		ioctls vs. netif_carrier_ok() to determine the link
698		status. The MII or ETHTOOL ioctls are less efficient and
699		utilize a deprecated calling sequence within the kernel.  The
700		netif_carrier_ok() relies on the device driver to maintain its
701		state with netif_carrier_on/off; at this writing, most, but
702		not all, device drivers support this facility.
703	
704		If bonding insists that the link is up when it should not be,
705		it may be that your network device driver does not support
706		netif_carrier_on/off.  The default state for netif_carrier is
707		"carrier on," so if a driver does not support netif_carrier,
708		it will appear as if the link is always up.  In this case,
709		setting use_carrier to 0 will cause bonding to revert to the
710		MII / ETHTOOL ioctl method to determine the link state.
711	
712		A value of 1 enables the use of netif_carrier_ok(), a value of
713		0 will use the deprecated MII / ETHTOOL ioctls.  The default
714		value is 1.
715	
716	xmit_hash_policy
717	
718		Selects the transmit hash policy to use for slave selection in
719		balance-xor and 802.3ad modes.  Possible values are:
720	
721		layer2
722	
723			Uses XOR of hardware MAC addresses to generate the
724			hash.  The formula is
725	
726			(source MAC XOR destination MAC) modulo slave count
727	
728			This algorithm will place all traffic to a particular
729			network peer on the same slave.
730	
731			This algorithm is 802.3ad compliant.
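
		The layer2 formula can be tried out with shell
		arithmetic.  The sketch below applies the formula as
		written to whole MAC addresses; the function names
		and the sample addresses are hypothetical, and the
		driver itself may use only part of each address.

```shell
# Convert a MAC address of the form aa:bb:cc:dd:ee:ff to an integer.
mac_to_int() {
    echo $(( 0x$(echo "$1" | tr -d ':') ))
}

# (source MAC XOR destination MAC) modulo slave count
#   $1 source MAC, $2 destination MAC, $3 slave count
layer2_hash() {
    src=$(mac_to_int "$1")
    dst=$(mac_to_int "$2")
    echo $(( (src ^ dst) % $3 ))
}

layer2_hash 00:11:22:33:44:55 00:11:22:33:44:58 2   # prints 1
```

		Because only the two MAC addresses enter the hash,
		every packet sent to this peer selects the same slave.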
732	
733		layer2+3
734	
735			This policy uses a combination of layer2 and layer3
736			protocol information to generate the hash.
737	
738			Uses XOR of hardware MAC addresses and IP addresses to
739			generate the hash.  The formula is
740	
741			(((source IP XOR dest IP) AND 0xffff) XOR
742				( source MAC XOR destination MAC ))
743					modulo slave count
744	
745			This algorithm will place all traffic to a particular
746			network peer on the same slave.  For non-IP traffic,
747			the formula is the same as for the layer2 transmit
748			hash policy.
749	
750			This policy is intended to provide a more balanced
751			distribution of traffic than layer2 alone, especially
752			in environments where a layer3 gateway device is
753			required to reach most destinations.
754	
755			This algorithm is 802.3ad compliant.
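
		The layer2+3 formula for IP traffic can be sketched
		in the same way.  The function names and sample
		addresses below are hypothetical; the sketch applies
		the formula as written and assumes IPv4 addresses
		without leading zeros in any octet.

```shell
# Convert a MAC address of the form aa:bb:cc:dd:ee:ff to an integer.
mac_to_int() {
    echo $(( 0x$(echo "$1" | tr -d ':') ))
}

# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1                      # split the address on dots
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# (((source IP XOR dest IP) AND 0xffff) XOR
#     (source MAC XOR destination MAC)) modulo slave count
#   $1 source MAC, $2 destination MAC, $3 source IP, $4 dest IP,
#   $5 slave count
layer23_hash() {
    mac_xor=$(( $(mac_to_int "$1") ^ $(mac_to_int "$2") ))
    ip_xor=$(( $(ip_to_int "$3") ^ $(ip_to_int "$4") ))
    echo $(( ((ip_xor & 0xffff) ^ mac_xor) % $5 ))
}

layer23_hash 00:11:22:33:44:55 00:11:22:33:44:58 \
    192.168.0.1 192.168.0.2 4      # prints 2
```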
756	
757		layer3+4
758	
759			This policy uses upper layer protocol information,
760			when available, to generate the hash.  This allows for
761			traffic to a particular network peer to span multiple
762			slaves, although a single connection will not span
763			multiple slaves.
764	
765			The formula for unfragmented TCP and UDP packets is
766	
		((source port XOR dest port) XOR
			 ((source IP XOR dest IP) AND 0xffff))
				modulo slave count
770	
771			For fragmented TCP or UDP packets and all other IP
772			protocol traffic, the source and destination port
773			information is omitted.  For non-IP traffic, the
774			formula is the same as for the layer2 transmit hash
775			policy.
776	
777			This policy is intended to mimic the behavior of
778			certain switches, notably Cisco switches with PFC2 as
779			well as some Foundry and IBM products.
780	
781			This algorithm is not fully 802.3ad compliant.  A
782			single TCP or UDP conversation containing both
783			fragmented and unfragmented packets will see packets
784			striped across two interfaces.  This may result in out
		of order delivery.  Most traffic types will not meet
		these criteria, as TCP rarely fragments traffic, and
787			most UDP traffic is not involved in extended
788			conversations.  Other implementations of 802.3ad may
789			or may not tolerate this noncompliance.
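
		The layer3+4 formula for unfragmented TCP and UDP
		packets can be sketched with shell arithmetic.  The
		function names, addresses and port numbers below are
		hypothetical; the sketch applies the formula as
		written and assumes IPv4 addresses without leading
		zeros in any octet.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1                      # split the address on dots
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# ((source port XOR dest port) XOR
#     ((source IP XOR dest IP) AND 0xffff)) modulo slave count
#   $1 source IP, $2 dest IP, $3 source port, $4 dest port,
#   $5 slave count
layer34_hash() {
    ip_xor=$(( $(ip_to_int "$1") ^ $(ip_to_int "$2") ))
    echo $(( (($3 ^ $4) ^ (ip_xor & 0xffff)) % $5 ))
}

# Two connections to the same peer, differing only in source port.
layer34_hash 192.168.0.1 192.168.0.2 40000 80 2   # prints 1
layer34_hash 192.168.0.1 192.168.0.2 40001 80 2   # prints 0
```

		In this example the two connections select different
		slaves, while every packet of one connection keeps
		selecting the same slave.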
790	
791		The default value is layer2.  This option was added in bonding
792		version 2.6.3.  In earlier versions of bonding, this parameter
793		does not exist, and the layer2 policy is the only policy.  The
794		layer2+3 value was added for bonding version 3.2.2.
795	
796	resend_igmp
797	
798		Specifies the number of IGMP membership reports to be issued after
799		a failover event. One membership report is issued immediately after
800		the failover; subsequent reports are sent at 200ms intervals.
801	
802		The valid range is 0 - 255; the default value is 1. A value of 0
803		prevents the IGMP membership report from being issued in response
804		to the failover event.
805	
806		This option is useful for bonding modes balance-rr (0), active-backup
807		(1), balance-tlb (5) and balance-alb (6), in which a failover can
808		switch the IGMP traffic from one slave to another.  Therefore a fresh
809		IGMP report must be issued to cause the switch to forward the incoming
810		IGMP traffic over the newly selected slave.
811	
812		This option was added for bonding version 3.7.0.
813	
814	3. Configuring Bonding Devices
815	==============================
816	
817		You can configure bonding using either your distro's network
818	initialization scripts, or manually using either ifenslave or the
819	sysfs interface.  Distros generally use one of three packages for the
820	network initialization scripts: initscripts, sysconfig or interfaces.
821	Recent versions of these packages have support for bonding, while older
822	versions do not.
823	
824		We will first describe the options for configuring bonding for
825	distros using versions of initscripts, sysconfig and interfaces with full
826	or partial support for bonding, then provide information on enabling
827	bonding without support from the network initialization scripts (i.e.,
828	older versions of initscripts or sysconfig).
829	
830		If you're unsure whether your distro uses sysconfig,
831	initscripts or interfaces, or don't know if it's new enough, have no fear.
832	Determining this is fairly straightforward.
833	
834		First, look for a file called interfaces in the /etc/network directory.
835	If this file is present on your system, then your system uses interfaces.  See
836	Configuration with Interfaces Support.
837	
838		Otherwise, issue the command:
839	
840	$ rpm -qf /sbin/ifup
841	
842		It will respond with a line of text starting with either
843	"initscripts" or "sysconfig," followed by some numbers.  This is the
844	package that provides your network initialization scripts.
845	
846		Next, to determine if your installation supports bonding,
847	issue the command:
848	
849	$ grep ifenslave /sbin/ifup
850	
851		If this returns any matches, then your initscripts or
852	sysconfig has support for bonding.
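
The checks above can be collected into a pair of shell functions.
This is only a sketch: the function names and the optional path prefix
argument are inventions of this example (the prefix exists so the
functions can be tried against a scratch directory), and the rpm query
is only meaningful on an rpm-based system.

```shell
# Print "interfaces", or the name of the package owning /sbin/ifup,
# or "unknown" if neither check gives an answer.
detect_netinit() {
    local prefix="${1:-}" pkg
    if [ -f "$prefix/etc/network/interfaces" ]; then
        echo "interfaces"
        return 0
    fi
    # rpm -qf names the package that provides the ifup script.
    if command -v rpm >/dev/null 2>&1 &&
       pkg=$(rpm -qf --qf '%{NAME}\n' "$prefix/sbin/ifup" 2>/dev/null); then
        echo "$pkg"
        return 0
    fi
    echo "unknown"
    return 1
}

# Succeed if the ifup script mentions ifenslave, which the text above
# takes as the sign of bonding support.
ifup_supports_bonding() {
    grep -q ifenslave "${1:-}/sbin/ifup" 2>/dev/null
}
```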
853	
854	3.1 Configuration with Sysconfig Support
855	----------------------------------------
856	
857		This section applies to distros using a version of sysconfig
858	with bonding support, for example, SuSE Linux Enterprise Server 9.
859	
860		SuSE SLES 9's networking configuration system does support
861	bonding; however, at this writing, the YaST system configuration
862	front end does not provide any means to work with bonding devices.
863	Bonding devices can be managed by hand, however, as follows.
864	
865		First, if they have not already been configured, configure the
866	slave devices.  On SLES 9, this is most easily done by running the
867	yast2 sysconfig configuration utility.  The goal is to create an
868	ifcfg-id file for each slave device.  The simplest way to accomplish
869	this is to configure the devices for DHCP (this is only to get the
870	ifcfg-id file created; see below for some issues with DHCP).  The
871	name of the configuration file for each device will be of the form:
872	
873	ifcfg-id-xx:xx:xx:xx:xx:xx
874	
875		Where the "xx" portion will be replaced with the digits from
876	the device's permanent MAC address.
877	
878		Once the set of ifcfg-id-xx:xx:xx:xx:xx:xx files has been
879	created, it is necessary to edit the configuration files for the slave
880	devices (the MAC addresses correspond to those of the slave devices).
881	Before editing, the file will contain multiple lines, and will look
882	something like this:
883	
884	BOOTPROTO='dhcp'
885	STARTMODE='on'
886	USERCTL='no'
887	UNIQUE='XNzu.WeZGOGF+4wE'
888	_nm_name='bus-pci-0001:61:01.0'
889	
890		Change the BOOTPROTO and STARTMODE lines to the following:
891	
892	BOOTPROTO='none'
893	STARTMODE='off'
894	
895		Do not alter the UNIQUE or _nm_name lines.  Remove any other
896	lines (USERCTL, etc).
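
For more than a couple of slaves, the edit just described may be
easier to script.  The function below is a minimal sketch of it; the
name convert_slave_ifcfg is hypothetical, and the function assumes the
file has the simple one-assignment-per-line layout shown above.

```shell
# Rewrite one slave's ifcfg-id file in place: set BOOTPROTO and
# STARTMODE, keep the UNIQUE and _nm_name lines as they are, and drop
# every other line.
convert_slave_ifcfg() {
    local file="$1" kept
    [ -r "$file" ] || return 1
    # Read the lines to keep before the file is overwritten.
    kept=$(grep -E "^(UNIQUE|_nm_name)=" "$file")
    {
        echo "BOOTPROTO='none'"
        echo "STARTMODE='off'"
        if [ -n "$kept" ]; then
            printf '%s\n' "$kept"
        fi
    } > "$file"
}
```

It would be called once per slave file, for example on each
ifcfg-id-xx:xx:xx:xx:xx:xx file belonging to a slave device.  Keep a
backup copy of the files before trying it.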
897	
898		Once the ifcfg-id-xx:xx:xx:xx:xx:xx files have been modified,
899	it's time to create the configuration file for the bonding device
900	itself.  This file is named ifcfg-bondX, where X is the number of the
901	bonding device to create, starting at 0.  The first such file is
902	ifcfg-bond0, the second is ifcfg-bond1, and so on.  The sysconfig
903	network configuration system will correctly start multiple instances
904	of bonding.
905	
906		The contents of the ifcfg-bondX file are as follows:
907	
908	BOOTPROTO="static"
909	BROADCAST="10.0.2.255"
910	IPADDR="10.0.2.10"
911	NETMASK="255.255.255.0"
912	NETWORK="10.0.2.0"
913	REMOTE_IPADDR=""
914	STARTMODE="onboot"
915	BONDING_MASTER="yes"
916	BONDING_MODULE_OPTS="mode=active-backup miimon=100"
917	BONDING_SLAVE0="eth0"
918	BONDING_SLAVE1="bus-pci-0000:06:08.1"
919	
920		Replace the sample BROADCAST, IPADDR, NETMASK and NETWORK
921	values with the appropriate values for your network.
922	
923		The STARTMODE specifies when the device is brought online.
924	The possible values are:
925	
926		onboot:	 The device is started at boot time.  If you're not
927			 sure, this is probably what you want.
928	
929		manual:	 The device is started only when ifup is called
930			 manually.  Bonding devices may be configured this
931			 way if you do not wish them to start automatically
932			 at boot for some reason.
933	
934		hotplug: The device is started by a hotplug event.  This is not
935			 a valid choice for a bonding device.
936	
937		off or ignore: The device configuration is ignored.
938	
939		The line BONDING_MASTER='yes' indicates that the device is a
940	bonding master device.  The only useful value is "yes."
941	
942		The contents of BONDING_MODULE_OPTS are supplied to the
943	instance of the bonding module for this device.  Specify the options
944	for the bonding mode, link monitoring, and so on here.  Do not include
945	the max_bonds bonding parameter; this will confuse the configuration
946	system if you have multiple bonding devices.
947	
948		Finally, supply one BONDING_SLAVEn="slave device" for each
949	slave, where "n" is an increasing value, one for each slave.  The
950	"slave device" is either an interface name, e.g., "eth0", or a device
951	specifier for the network device.  The interface name is easier to
952	find, but the ethN names are subject to change at boot time if, e.g.,
953	a device early in the sequence has failed.  The device specifiers
954	(bus-pci-0000:06:08.1 in the example above) specify the physical
955	network device, and will not change unless the device's bus location
956	changes (for example, it is moved from one PCI slot to another).  The
957	example above uses one of each type for demonstration purposes; most
958	configurations will choose one or the other for all slave devices.
959	
960		When all configuration files have been modified or created,
961	networking must be restarted for the configuration changes to take
962	effect.  This can be accomplished via the following:
963	
964	# /etc/init.d/network restart
965	
966		Note that the network control script (/sbin/ifdown) will
967	remove the bonding module as part of the network shutdown processing,
968	so it is not necessary to remove the module by hand if, e.g., the
969	module parameters have changed.
970	
971		Also, at this writing, YaST/YaST2 will not manage bonding
972	devices (it does not show bonding interfaces in its list of network
973	devices).  It is necessary to edit the configuration files by hand to
974	change the bonding configuration.
975	
976		Additional general options and details of the ifcfg file
977	format can be found in an example ifcfg template file:
978	
979	/etc/sysconfig/network/ifcfg.template
980	
981		Note that the template does not document the various BONDING_
982	settings described above, but does describe many of the other options.
983	
984	3.1.1 Using DHCP with Sysconfig
985	-------------------------------
986	
987		Under sysconfig, configuring a device with BOOTPROTO='dhcp'
988	will cause it to query DHCP for its IP address information.  At this
989	writing, this does not function for bonding devices; the scripts
990	attempt to obtain the device address from DHCP prior to adding any of
991	the slave devices.  Without active slaves, the DHCP requests are not
992	sent to the network.
993	
994	3.1.2 Configuring Multiple Bonds with Sysconfig
995	-----------------------------------------------
996	
997		The sysconfig network initialization system is capable of
998	handling multiple bonding devices.  All that is necessary is for each
999	bonding instance to have an appropriately configured ifcfg-bondX file
1000	(as described above).  Do not specify the "max_bonds" parameter to any
1001	instance of bonding, as this will confuse sysconfig.  If you require
1002	multiple bonding devices with identical parameters, create multiple
1003	ifcfg-bondX files.
1004	
1005		Because the sysconfig scripts supply the bonding module
1006	options in the ifcfg-bondX file, it is not necessary to add them to
1007	the system /etc/modules.conf or /etc/modprobe.conf configuration file.
1008	
1009	3.2 Configuration with Initscripts Support
1010	------------------------------------------
1011	
1012		This section applies to distros using a recent version of
1013	initscripts with bonding support, for example, Red Hat Enterprise Linux
1014	version 3 or later, Fedora, etc.  On these systems, the network
1015	initialization scripts have knowledge of bonding, and can be configured to
1016	control bonding devices.  Note that older versions of the initscripts
1017	package have lower levels of support for bonding; this will be noted where
1018	applicable.
1019	
1020		These distros will not automatically load the network adapter
1021	driver unless the ethX device is configured with an IP address.
1022	Because of this constraint, users must manually configure a
1023	network-script file for all physical adapters that will be members of
1024	a bondX link.  Network script files are located in the directory:
1025	
1026	/etc/sysconfig/network-scripts
1027	
1028		The file name must be prefixed with "ifcfg-eth" and suffixed
1029	with the adapter's physical adapter number.  For example, the script
1030	for eth0 would be named /etc/sysconfig/network-scripts/ifcfg-eth0.
1031	Place the following text in the file:
1032	
1033	DEVICE=eth0
1034	USERCTL=no
1035	ONBOOT=yes
1036	MASTER=bond0
1037	SLAVE=yes
1038	BOOTPROTO=none
1039	
1040		The DEVICE= line will be different for every ethX device and
1041	must correspond with the name of the file, i.e., ifcfg-eth1 must have
1042	a device line of DEVICE=eth1.  The setting of the MASTER= line will
1043	also depend on the final bonding interface name chosen for your bond.
1044	As with other network devices, these typically start at 0, and go up
1045	one for each device, i.e., the first bonding instance is bond0, the
1046	second is bond1, and so on.
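
Since the slave files differ only in the DEVICE= line, they can be
generated rather than typed.  The following is a sketch; the function
name write_slave_cfgs and its argument order are made up for this
example, and the file contents are exactly those shown above.

```shell
# Write an ifcfg-<dev> file for every slave of one bond.
# Usage: write_slave_cfgs <master> <directory> <dev>...
write_slave_cfgs() {
    local master="$1" dir="$2" dev
    shift 2
    for dev in "$@"; do
        cat > "$dir/ifcfg-$dev" <<EOF
DEVICE=$dev
USERCTL=no
ONBOOT=yes
MASTER=$master
SLAVE=yes
BOOTPROTO=none
EOF
    done
}
```

For example, "write_slave_cfgs bond0 /etc/sysconfig/network-scripts
eth0 eth1" would create ifcfg-eth0 and ifcfg-eth1, both naming bond0
as their master.  Existing files of the same name are overwritten.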
1047	
1048		Next, create a bond network script.  The file name for this
1049	script will be /etc/sysconfig/network-scripts/ifcfg-bondX where X is
1050	the number of the bond.  For bond0 the file is named "ifcfg-bond0",
1051	for bond1 it is named "ifcfg-bond1", and so on.  Within that file,
1052	place the following text:
1053	
1054	DEVICE=bond0
1055	IPADDR=192.168.1.1
1056	NETMASK=255.255.255.0
1057	NETWORK=192.168.1.0
1058	BROADCAST=192.168.1.255
1059	ONBOOT=yes
1060	BOOTPROTO=none
1061	USERCTL=no
1062	
1063		Be sure to change the networking specific lines (IPADDR,
1064	NETMASK, NETWORK and BROADCAST) to match your network configuration.
1065	
1066		For later versions of initscripts, such as that found with Fedora
1067	7 (or later) and Red Hat Enterprise Linux version 5 (or later), it is possible,
1068	and, indeed, preferable, to specify the bonding options in the ifcfg-bond0
1069	file, e.g. a line of the format:
1070	
1071	BONDING_OPTS="mode=active-backup arp_interval=60 arp_ip_target=192.168.1.254"
1072	
1073		will configure the bond with the specified options.  The options
1074	specified in BONDING_OPTS are identical to the bonding module parameters
1075	except for the arp_ip_target field when using versions of initscripts older
1076	than 8.57 (Fedora 8) and 8.45.19 (Red Hat Enterprise Linux 5.2).  When
1077	using older versions each target should be included as a separate option and
1078	should be preceded by a '+' to indicate it should be added to the list of
1079	queried targets, e.g.,
1080	
1081		arp_ip_target=+192.168.1.1 arp_ip_target=+192.168.1.2
1082	
1083		is the proper syntax to specify multiple targets.  When specifying
1084	options via BONDING_OPTS, it is not necessary to edit /etc/modules.conf or
1085	/etc/modprobe.conf.
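
If the ARP targets are already kept as a comma separated list (the
form the arp_ip_target module parameter accepts), the older
BONDING_OPTS syntax can be derived from it mechanically.  The helper
below is a sketch with a hypothetical name; it only builds the string
and does not touch any configuration file.

```shell
# Turn "a,b,c" into "arp_ip_target=+a arp_ip_target=+b arp_ip_target=+c".
old_style_arp_targets() {
    local IFS=, target out=""
    # With IFS set to a comma, the unquoted $1 splits into one word
    # per target address.
    for target in $1; do
        out="${out:+$out }arp_ip_target=+$target"
    done
    printf '%s\n' "$out"
}
```

For the two targets used above, "old_style_arp_targets
192.168.1.1,192.168.1.2" prints the option string shown in the text.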
1086	
1087		For even older versions of initscripts that do not support
1088	BONDING_OPTS, it is necessary to edit /etc/modules.conf (or
1089	/etc/modprobe.conf, depending upon your distro) to load the bonding module
1090	with your desired options when the bond0 interface is brought up.  The
1091	following lines in /etc/modules.conf (or modprobe.conf) will load the
1092	bonding module, and select its options:
1093	
1094	alias bond0 bonding
1095	options bond0 mode=balance-alb miimon=100
1096	
1097		Replace the sample parameters with the appropriate set of
1098	options for your configuration.
1099	
1100		Finally, run "/etc/rc.d/init.d/network restart" as root.  This
1101	will restart the networking subsystem, and your bond link should now be
1102	up and running.
1103	
1104	3.2.1 Using DHCP with Initscripts
1105	---------------------------------
1106	
1107		Recent versions of initscripts (the versions supplied with Fedora
1108	Core 3 and Red Hat Enterprise Linux 4, or later versions, are reported to
1109	work) have support for assigning IP information to bonding devices via
1110	DHCP.
1111	
1112		To configure bonding for DHCP, configure it as described
1113	above, except replace the line "BOOTPROTO=none" with "BOOTPROTO=dhcp"
1114	and add a line consisting of "TYPE=Bonding".  Note that the TYPE value
1115	is case sensitive.
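
Putting the two changes together, an ifcfg-bond0 file for DHCP might
look like the fragment below.  It is an illustration derived from the
static example in section 3.2, not a tested configuration; the static
address lines (IPADDR, NETMASK, NETWORK and BROADCAST) are left out
here on the assumption that DHCP supplies that information.

```shell
# /etc/sysconfig/network-scripts/ifcfg-bond0 (illustrative)
DEVICE=bond0
ONBOOT=yes
BOOTPROTO=dhcp
USERCTL=no
TYPE=Bonding
```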
1116	
1117	3.2.2 Configuring Multiple Bonds with Initscripts
1118	-------------------------------------------------
1119	
1120		Initscripts packages that are included with Fedora 7 and Red Hat
1121	Enterprise Linux 5 support multiple bonding interfaces by simply
1122	specifying the appropriate BONDING_OPTS= in ifcfg-bondX where X is the
1123	number of the bond.  This support requires sysfs support in the kernel,
1124	and a bonding driver of version 3.0.0 or later.  Other configurations may
1125	not support this method for specifying multiple bonding interfaces; for
1126	those instances, see the "Configuring Multiple Bonds Manually" section,
1127	below.
1128	
1129	3.3 Configuring Bonding Manually with Ifenslave
1130	-----------------------------------------------
1131	
1132		This section applies to distros whose network initialization
1133	scripts (the sysconfig or initscripts package) do not have specific
1134	knowledge of bonding.  One such distro is SuSE Linux Enterprise Server
1135	version 8.
1136	
1137		The general method for these systems is to place the bonding
1138	module parameters into /etc/modules.conf or /etc/modprobe.conf (as
1139	appropriate for the installed distro), then add modprobe and/or
1140	ifenslave commands to the system's global init script.  The name of
1141	the global init script differs; for sysconfig, it is
1142	/etc/init.d/boot.local and for initscripts it is /etc/rc.d/rc.local.
1143	
1144		For example, if you wanted to make a simple bond of two e100
1145	devices (presumed to be eth0 and eth1), and have it persist across
1146	reboots, edit the appropriate file (/etc/init.d/boot.local or
1147	/etc/rc.d/rc.local), and add the following:
1148	
1149	modprobe bonding mode=balance-alb miimon=100
1150	modprobe e100
1151	ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up
1152	ifenslave bond0 eth0
1153	ifenslave bond0 eth1
1154	
1155		Replace the example bonding module parameters and bond0
1156	network configuration (IP address, netmask, etc) with the appropriate
1157	values for your configuration.
1158	
1159		Unfortunately, this method will not provide support for the
1160	ifup and ifdown scripts on the bond devices.  To reload the bonding
1161	configuration, it is necessary to run the initialization script, e.g.,
1162	
1163	# /etc/init.d/boot.local
1164	
1165		or
1166	
1167	# /etc/rc.d/rc.local
1168	
1169		It may be desirable in such a case to create a separate script
1170	which only initializes the bonding configuration, then call that
1171	separate script from within boot.local.  This allows for bonding to be
1172	enabled without re-running the entire global init script.
1173	
1174		To shut down the bonding devices, it is necessary to first
1175	mark the bonding device itself as being down, then remove the
1176	appropriate device driver modules.  For our example above, you can do
1177	the following:
1178	
1179	# ifconfig bond0 down
1180	# rmmod bonding
1181	# rmmod e100
1182	
1183		Again, for convenience, it may be desirable to create a script
1184	with these commands.
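
One possible shape for such a script is sketched below.  It wraps the
start and stop sequences from this section in two functions.  The
function names and the DRY_RUN variable are conventions invented for
this example; the commands and their order are the ones given above.

```shell
# Run a command, or only print it when DRY_RUN=1, so the sequence can
# be reviewed without touching the running system.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Bring up bond0 with two e100 slaves, as in the example above.
bond_start() {
    run modprobe bonding mode=balance-alb miimon=100
    run modprobe e100
    run ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up
    run ifenslave bond0 eth0
    run ifenslave bond0 eth1
}

# Mark the bond down first, then remove the driver modules.
bond_stop() {
    run ifconfig bond0 down
    run rmmod bonding
    run rmmod e100
}
```

The global init script (boot.local or rc.local) could then source this
file and call bond_start, and an administrator could call bond_stop
followed by bond_start to reload the bonding configuration.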
1185	
1186	
1187	3.3.1 Configuring Multiple Bonds Manually
1188	-----------------------------------------
1189	
1190		This section contains information on configuring multiple
1191	bonding devices with differing options for those systems whose network
1192	initialization scripts lack support for configuring multiple bonds.
1193	
1194		If you require multiple bonding devices, but all with the same
1195	options, you may wish to use the "max_bonds" module parameter,
1196	documented above.
1197	
1198		To create multiple bonding devices with differing options, it is
1199	preferable to use bonding parameters exported by sysfs, documented in the
1200	section below.
1201	
1202		For versions of bonding without sysfs support, the only means to
1203	provide multiple instances of bonding with differing options is to load
1204	the bonding driver multiple times.  Note that current versions of the
1205	sysconfig network initialization scripts handle this automatically; if
1206	your distro uses these scripts, no special action is needed.  See the
1207	section Configuring Bonding Devices, above, if you're not sure about your
1208	network initialization scripts.
1209	
1210		To load multiple instances of the module, it is necessary to
1211	specify a different name for each instance (the module loading system
1212	requires that every loaded module, even multiple instances of the same
1213	module, have a unique name).  This is accomplished by supplying multiple
1214	sets of bonding options in /etc/modprobe.conf, for example:
1215	
1216	alias bond0 bonding
1217	options bond0 -o bond0 mode=balance-rr miimon=100
1218	
1219	alias bond1 bonding
1220	options bond1 -o bond1 mode=balance-alb miimon=50
1221	
1222		will load the bonding module two times.  The first instance is
1223	named "bond0" and creates the bond0 device in balance-rr mode with an
1224	miimon of 100.  The second instance is named "bond1" and creates the
1225	bond1 device in balance-alb mode with an miimon of 50.
1226	
1227		In some circumstances (typically with older distributions),
1228	the above does not work, and the second bonding instance never sees
1229	its options.  In that case, the second options line can be substituted
1230	as follows:
1231	
1232	install bond1 /sbin/modprobe --ignore-install bonding -o bond1 \
1233		mode=balance-alb miimon=50
1234	
1235		This may be repeated any number of times, specifying a new and
1236	unique name in place of bond1 for each subsequent instance.
1237	
1238		It has been observed that some Red Hat supplied kernels are unable
1239	to rename modules at load time (the "-o bond1" part).  Attempts to pass
1240	that option to modprobe will produce an "Operation not permitted" error.
1241	This has been reported on some Fedora Core kernels, and has been seen on
1242	RHEL 4 as well.  On kernels exhibiting this problem, it will be impossible
1243	to configure multiple bonds with differing parameters (as they are older
1244	kernels, and also lack sysfs support).
1245	
1246	3.4 Configuring Bonding Manually via Sysfs
1247	------------------------------------------
1248	
1249		Starting with version 3.0.0, Channel Bonding may be configured
1250	via the sysfs interface.  This interface allows dynamic configuration
1251	of all bonds in the system without unloading the module.  It also
1252	allows for adding and removing bonds at runtime.  Ifenslave is no
1253	longer required, though it is still supported.
1254	
1255		Use of the sysfs interface allows you to use multiple bonds
1256	with different configurations without having to reload the module.
1257	It also allows you to use multiple, differently configured bonds when
1258	bonding is compiled into the kernel.
1259	
1260		You must have the sysfs filesystem mounted to configure
1261	bonding this way.  The examples in this document assume that you
1262	are using the standard mount point for sysfs, i.e., /sys.  If your
1263	sysfs filesystem is mounted elsewhere, you will need to adjust the
1264	example paths accordingly.
1265	
1266	Creating and Destroying Bonds
1267	-----------------------------
1268	To add a new bond foo:
1269	# echo +foo > /sys/class/net/bonding_masters
1270	
1271	To remove an existing bond bar:
1272	# echo -bar > /sys/class/net/bonding_masters
1273	
1274	To show all existing bonds:
1275	# cat /sys/class/net/bonding_masters
1276	
1277	NOTE: due to 4K size limitation of sysfs files, this list may be
1278	truncated if you have more than a few hundred bonds.  This is unlikely
1279	to occur under normal operating conditions.
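
An init script that may run more than once can check the list before
adding a bond.  The sketch below assumes the bonding_masters file
holds the bond names separated by whitespace.  The function names and
the optional second argument (an alternative sysfs mount point) are
inventions of this example.

```shell
# Succeed if a bond with exactly this name is listed.
bond_exists() {
    local name="$1" sysfs="${2:-/sys}" b
    for b in $(cat "$sysfs/class/net/bonding_masters" 2>/dev/null); do
        [ "$b" = "$name" ] && return 0
    done
    return 1
}

# Create the bond only if it is not already present.
ensure_bond() {
    local name="$1" sysfs="${2:-/sys}"
    bond_exists "$name" "$sysfs" ||
        echo "+$name" > "$sysfs/class/net/bonding_masters"
}
```

For example, "ensure_bond bond1" writes +bond1 to bonding_masters only
when bond1 is not already listed there.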
1280	
1281	Adding and Removing Slaves
1282	--------------------------
1283		Interfaces may be enslaved to a bond using the file
1284	/sys/class/net/<bond>/bonding/slaves.  The semantics for this file
1285	are the same as for the bonding_masters file.
1286	
1287	To enslave interface eth0 to bond bond0:
1288	# ifconfig bond0 up
1289	# echo +eth0 > /sys/class/net/bond0/bonding/slaves
1290	
1291	To free slave eth0 from bond bond0:
1292	# echo -eth0 > /sys/class/net/bond0/bonding/slaves
1293	
1294		When an interface is enslaved to a bond, symlinks between the
1295	two are created in the sysfs filesystem.  In this case, you would get
1296	/sys/class/net/bond0/slave_eth0 pointing to /sys/class/net/eth0, and
1297	/sys/class/net/eth0/master pointing to /sys/class/net/bond0.
1298	
1299		This means that you can tell quickly whether or not an
1300	interface is enslaved by looking for the master symlink.  Thus:
1301	# echo -eth0 > /sys/class/net/eth0/master/bonding/slaves
1302	will free eth0 from whatever bond it is enslaved to, regardless of
1303	the name of the bond interface.
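
The master symlink can also be read from a script.  The helper below
is a sketch built on the symlink layout described above; its name and
the optional sysfs mount point argument are hypothetical.

```shell
# Print the name of the bond that an interface is enslaved to.
# Returns non-zero if the interface has no master symlink.
bond_master_of() {
    local iface="$1" sysfs="${2:-/sys}" link
    link="$sysfs/class/net/$iface/master"
    [ -L "$link" ] || return 1
    # The symlink points at the master's directory; its last path
    # component is the bond name.
    basename "$(readlink "$link")"
}
```

With eth0 enslaved to bond0, "bond_master_of eth0" prints bond0; for
an interface that is not enslaved it prints nothing and fails.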
1304	
1305	Changing a Bond's Configuration
1306	-------------------------------
1307		Each bond may be configured individually by manipulating the
1308	files located in /sys/class/net/<bond name>/bonding
1309	
1310		The names of these files correspond directly with the command-
1311	line parameters described elsewhere in this file, and, with the
1312	exception of arp_ip_target, they accept the same values.  To see the
1313	current setting, simply cat the appropriate file.
1314	
1315		A few examples will be given here; for specific usage
1316	guidelines for each parameter, see the appropriate section in this
1317	document.
1318	
1319	To configure bond0 for balance-alb mode:
1320	# ifconfig bond0 down
1321	# echo 6 > /sys/class/net/bond0/bonding/mode
1322	 - or -
1323	# echo balance-alb > /sys/class/net/bond0/bonding/mode
1324		NOTE: The bond interface must be down before the mode can be
1325	changed.
1326	
1327	To enable MII monitoring on bond0 with a 1 second interval:
1328	# echo 1000 > /sys/class/net/bond0/bonding/miimon
1329		NOTE: If ARP monitoring is enabled, it will be disabled when MII
1330	monitoring is enabled, and vice-versa.
1331	
1332	To add ARP targets:
1333	# echo +192.168.0.100 > /sys/class/net/bond0/bonding/arp_ip_target
1334	# echo +192.168.0.101 > /sys/class/net/bond0/bonding/arp_ip_target
1335		NOTE:  up to 16 target addresses may be specified.
1336	
1337	To remove an ARP target:
1338	# echo -192.168.0.100 > /sys/class/net/bond0/bonding/arp_ip_target
1339	
1340	Example Configuration
1341	---------------------
1342		We begin with the same example that is shown in section 3.3,
1343	executed with sysfs, and without using ifenslave.
1344	
1345		To make a simple bond of two e100 devices (presumed to be eth0
1346	and eth1), and have it persist across reboots, edit the appropriate
1347	file (/etc/init.d/boot.local or /etc/rc.d/rc.local), and add the
1348	following:
1349	
1350	modprobe bonding
1351	modprobe e100
1352	echo balance-alb > /sys/class/net/bond0/bonding/mode
1353	ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up
1354	echo 100 > /sys/class/net/bond0/bonding/miimon
1355	echo +eth0 > /sys/class/net/bond0/bonding/slaves
1356	echo +eth1 > /sys/class/net/bond0/bonding/slaves
1357	
1358		To add a second bond, with two e1000 interfaces in
1359	active-backup mode, using ARP monitoring, add the following lines to
1360	your init script:
1361	
1362	modprobe e1000
1363	echo +bond1 > /sys/class/net/bonding_masters
1364	echo active-backup > /sys/class/net/bond1/bonding/mode
1365	ifconfig bond1 192.168.2.1 netmask 255.255.255.0 up
1366	echo +192.168.2.100 > /sys/class/net/bond1/bonding/arp_ip_target
1367	echo 2000 > /sys/class/net/bond1/bonding/arp_interval
1368	echo +eth2 > /sys/class/net/bond1/bonding/slaves
1369	echo +eth3 > /sys/class/net/bond1/bonding/slaves
1370	
1371	3.5 Configuration with Interfaces Support
1372	-----------------------------------------
1373	
1374		This section applies to distros which use the /etc/network/interfaces
1375	file to describe network interface configuration, most notably Debian and its
1376	derivatives.
1377	
1378		The ifup and ifdown commands on Debian don't support bonding out of
1379	the box. The ifenslave-2.6 package should be installed to provide bonding
1380	support.  Once installed, this package will provide bond-* options for use
1381	in /etc/network/interfaces.
1382	
1383		Note that the ifenslave-2.6 package will load the bonding module and use
1384	the ifenslave command when appropriate.
1385	
1386	Example Configurations
1387	----------------------
1388	
1389	In /etc/network/interfaces, the following stanza will configure bond0, in
1390	active-backup mode, with eth0 and eth1 as slaves.
1391	
1392	auto bond0
1393	iface bond0 inet dhcp
1394		bond-slaves eth0 eth1
1395		bond-mode active-backup
1396		bond-miimon 100
1397		bond-primary eth0 eth1
1398	
1399	If the above configuration doesn't work, you might have a system using
1400	upstart for system startup. This is most notably true for recent
1401	Ubuntu versions. The following stanza in /etc/network/interfaces will
1402	produce the same result on those systems.
1403	
1404	auto bond0
1405	iface bond0 inet dhcp
1406		bond-slaves none
1407		bond-mode active-backup
1408		bond-miimon 100
1409	
1410	auto eth0
1411	iface eth0 inet manual
1412		bond-master bond0
1413		bond-primary eth0 eth1
1414	
1415	auto eth1
1416	iface eth1 inet manual
1417		bond-master bond0
1418		bond-primary eth0 eth1
1419	
1420	For a full list of bond-* supported options in /etc/network/interfaces and some
1421	more advanced examples tailored to your particular distro, see the files in
1422	/usr/share/doc/ifenslave-2.6.
1423	
1424	3.6 Overriding Configuration for Special Cases
1425	----------------------------------------------
1426	
1427	When using the bonding driver, the physical port which transmits a frame is
1428	typically selected by the bonding driver, and is not relevant to the user or
1429	system administrator.  The output port is simply selected using the policies of
1430	the selected bonding mode.  On occasion however, it is helpful to direct certain
1431	classes of traffic to certain physical interfaces on output to implement
1432	slightly more complex policies.  For example, to reach a web server over a
1433	bonded interface in which eth0 connects to a private network, while eth1
1434	connects via a public network, it may be desirable to bias the bond to send said
1435	traffic over eth0 first, using eth1 only as a fall back, while all other traffic
1436	can safely be sent over either interface.  Such configurations may be achieved
1437	using the traffic control utilities inherent in Linux.
1438	
1439	By default the bonding driver is multiqueue aware and 16 queues are created
1440	when the driver initializes (see Documentation/networking/multiqueue.txt
1441	for details).  If more or fewer queues are desired the module parameter
1442	tx_queues can be used to change this value.  There is no sysfs parameter
1443	available as the allocation is done at module init time.
1444	
1445	The output of the file /proc/net/bonding/bondX has changed so the output Queue
1446	ID is now printed for each slave:
1447	
1448	Bonding Mode: fault-tolerance (active-backup)
1449	Primary Slave: None
1450	Currently Active Slave: eth0
1451	MII Status: up
1452	MII Polling Interval (ms): 0
1453	Up Delay (ms): 0
1454	Down Delay (ms): 0
1455	
1456	Slave Interface: eth0
1457	MII Status: up
1458	Link Failure Count: 0
1459	Permanent HW addr: 00:1a:a0:12:8f:cb
1460	Slave queue ID: 0
1461	
1462	Slave Interface: eth1
1463	MII Status: up
1464	Link Failure Count: 0
1465	Permanent HW addr: 00:1a:a0:12:8f:cc
1466	Slave queue ID: 2
1467	
1468	The queue_id for a slave can be set using the command:
1469	
1470	# echo "eth1:2" > /sys/class/net/bond0/bonding/queue_id
1471	
1472	Any interface that needs a queue_id set should set it with multiple calls
1473	like the one above until proper priorities are set for all interfaces.  On
1474	distributions that allow configuration via initscripts, multiple 'queue_id'
1475	arguments can be added to BONDING_OPTS to set all needed slave queues.
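
To check the result, the queue ID of each slave can be pulled out of
the /proc file shown above.  The function below is a sketch that
relies only on the "Slave Interface" and "Slave queue ID" lines of
that output; its name is made up, and the default file name assumes
the bond is called bond0.

```shell
# Print "<slave> <queue id>" for each slave of a bond.
slave_queue_ids() {
    awk -F': ' '
        { sub(/^[ \t]+/, "") }                   # tolerate indentation
        $1 == "Slave Interface" { slave = $2 }   # remember current slave
        $1 == "Slave queue ID"  { print slave, $2 }
    ' "${1:-/proc/net/bonding/bond0}"
}
```

For the sample output above this prints "eth0 0" and "eth1 2", one
pair per line.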
1476	
1477	These queue IDs can be used in conjunction with the tc utility to configure
1478	a multiqueue qdisc and filters to bias certain traffic to transmit on certain
1479	slave devices.  For instance, say we wanted, in the above configuration to
1480	force all traffic bound to 192.168.1.100 to use eth1 in the bond as its output
1481	device. The following commands would accomplish this:
1482	
1483	# tc qdisc add dev bond0 handle 1 root multiq
1484	
1485	# tc filter add dev bond0 protocol ip parent 1: prio 1 u32 match ip dst \
1486		192.168.1.100 action skbedit queue_mapping 2
1487	
1488	These commands tell the kernel to attach a multiqueue queue discipline to the
1489	bond0 interface and filter traffic enqueued to it, such that packets with a dst
1490	ip of 192.168.1.100 have their output queue mapping value overwritten to 2.
1491	This value is then passed into the driver, causing the normal output path
1492	selection policy to be overridden, selecting instead qid 2, which maps to eth1.
1493	
1494	Note that qid values begin at 1.  Qid 0 is reserved to indicate to the driver
1495	that normal output policy selection should take place.  One benefit of simply
1496	leaving the qid for a slave at 0 is the multiqueue awareness in the bonding
1497	driver that is now present.  This awareness allows tc filters to be placed on
1498	slave devices as well as bond devices and the bonding driver will simply act as
1499	a pass-through for selecting output queues on the slave device rather than 
1500	output port selection.
1501	
1502	This feature first appeared in bonding driver version 3.7.0 and support for
1503	output slave selection was limited to round-robin and active-backup modes.
1504	
1505	4. Querying Bonding Configuration
1506	=================================
1507	
1508	4.1 Bonding Configuration
1509	-------------------------
1510	
1511		Each bonding device has a read-only file residing in the
1512	/proc/net/bonding directory.  The file contents include information
1513	about the bonding configuration, options and state of each slave.
1514	
1515		For example, the contents of /proc/net/bonding/bond0 after the
1516	driver is loaded with parameters of mode=0 and miimon=1000 are
1517	generally as follows:
1518	
1519		Ethernet Channel Bonding Driver: 2.6.1 (October 29, 2004)
1520	        Bonding Mode: load balancing (round-robin)
1521	        Currently Active Slave: eth0
1522	        MII Status: up
1523	        MII Polling Interval (ms): 1000
1524	        Up Delay (ms): 0
1525	        Down Delay (ms): 0
1526	
1527	        Slave Interface: eth1
1528	        MII Status: up
1529	        Link Failure Count: 1
1530	
1531	        Slave Interface: eth0
1532	        MII Status: up
1533	        Link Failure Count: 1
1534	
1535		The precise format and contents will change depending upon the
1536	bonding configuration, state, and version of the bonding driver.
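
	A script that needs these values can split the file into
bond-level and per-slave fields.  The Python sketch below does this for
the sample shown above; the function name and the "name" key are
illustrative only, and because the format varies between driver
versions it should be treated as a starting point rather than a
complete parser.

```python
SAMPLE = """\
Ethernet Channel Bonding Driver: 2.6.1 (October 29, 2004)
Bonding Mode: load balancing (round-robin)
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 1000
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth1
MII Status: up
Link Failure Count: 1

Slave Interface: eth0
MII Status: up
Link Failure Count: 1
"""


def parse_bonding_status(text):
    """Return (bond_fields, slaves) parsed from /proc/net/bonding text."""
    bond = {}
    slaves = []
    current = bond  # fields before the first slave describe the bond
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # blank line, or a line without "key: value"
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = {"name": value}  # start a new per-slave record
            slaves.append(current)
        else:
            current[key] = value
    return bond, slaves


bond, slaves = parse_bonding_status(SAMPLE)
print(bond["Bonding Mode"])
print([s["name"] for s in slaves])
```

	On a live system the text would instead be read from
/proc/net/bonding/bond0.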
1537	
1538	4.2 Network configuration
1539	-------------------------
1540	
1541		The network configuration can be inspected using the ifconfig
1542	command.  Bonding devices will have the MASTER flag set; bonding slave
1543	devices will have the SLAVE flag set.  The ifconfig output does not
1544	contain information on which slaves are associated with which masters.
1545	
1546		In the example below, the bond0 interface is the master
1547	(MASTER) while eth0 and eth1 are slaves (SLAVE).  Notice that all slaves
1548	of bond0 have the same MAC address (HWaddr) as bond0.  This holds for all
1549	modes except TLB and ALB, which require a unique MAC address for each slave.
1550	
1551	# /sbin/ifconfig
1552	bond0     Link encap:Ethernet  HWaddr 00:C0:F0:1F:37:B4
1553	          inet addr:XXX.XXX.XXX.YYY  Bcast:XXX.XXX.XXX.255  Mask:255.255.252.0
1554	          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
1555	          RX packets:7224794 errors:0 dropped:0 overruns:0 frame:0
1556	          TX packets:3286647 errors:1 dropped:0 overruns:1 carrier:0
1557	          collisions:0 txqueuelen:0
1558	
1559	eth0      Link encap:Ethernet  HWaddr 00:C0:F0:1F:37:B4
1560	          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
1561	          RX packets:3573025 errors:0 dropped:0 overruns:0 frame:0
1562	          TX packets:1643167 errors:1 dropped:0 overruns:1 carrier:0
1563	          collisions:0 txqueuelen:100
1564	          Interrupt:10 Base address:0x1080
1565	
1566	eth1      Link encap:Ethernet  HWaddr 00:C0:F0:1F:37:B4
1567	          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
1568	          RX packets:3651769 errors:0 dropped:0 overruns:0 frame:0
1569	          TX packets:1643480 errors:0 dropped:0 overruns:0 carrier:0
1570	          collisions:0 txqueuelen:100
1571	          Interrupt:9 Base address:0x1400
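
	The MASTER and SLAVE flags can also be picked out of this output
mechanically.  The following Python sketch assumes the classic ifconfig
layout shown above (interface name in the first column, indented detail
lines); the function name and the abbreviated sample are illustrative.

```python
SAMPLE_IFCONFIG = """\
bond0     Link encap:Ethernet  HWaddr 00:C0:F0:1F:37:B4
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1

eth0      Link encap:Ethernet  HWaddr 00:C0:F0:1F:37:B4
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1

eth1      Link encap:Ethernet  HWaddr 00:C0:F0:1F:37:B4
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
"""


def bonding_roles(ifconfig_text):
    """Map each interface name to 'master', 'slave' or 'other'."""
    roles = {}
    name = None
    for line in ifconfig_text.splitlines():
        if line and not line[0].isspace():
            # an unindented line starts a new interface stanza
            name = line.split()[0]
            roles[name] = "other"
        elif name is not None:
            words = line.split()
            if "MASTER" in words:
                roles[name] = "master"
            elif "SLAVE" in words:
                roles[name] = "slave"
    return roles


print(bonding_roles(SAMPLE_IFCONFIG))
```

	As noted above, this reveals only the role of each interface,
not which master a given slave belongs to.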
1572	
1573	5. Switch Configuration
1574	=======================
1575	
1576		For this section, "switch" refers to whatever system the
1577	bonded devices are directly connected to (i.e., where the other end of
1578	the cable plugs into).  This may be an actual dedicated switch device,
1579	or it may be another regular system (e.g., another computer running
1580	Linux).
1581	
1582		The active-backup, balance-tlb and balance-alb modes do not
1583	require any specific configuration of the switch.
1584	
1585		The 802.3ad mode requires that the switch have the appropriate
1586	ports configured as an 802.3ad aggregation.  The precise method used
1587	to configure this varies from switch to switch, but, for example, a
1588	Cisco 3550 series switch requires that the appropriate ports first be
1589	grouped together in a single etherchannel instance, then that
1590	etherchannel is set to mode "lacp" to enable 802.3ad (instead of
1591	standard EtherChannel).
1592	
1593		The balance-rr, balance-xor and broadcast modes generally
1594	require that the switch have the appropriate ports grouped together.
1595	The nomenclature for such a group differs between switches; it may be
1596	called an "etherchannel" (as in the Cisco example, above), a "trunk
1597	group" or some other similar variation.  For these modes, each switch
1598	will also have its own configuration options for the switch's transmit
1599	policy to the bond.  Typical choices include XOR of either the MAC or
1600	IP addresses.  The transmit policy of the two peers does not need to
1601	match.  For these three modes, the bonding mode really selects a
1602	transmit policy for an EtherChannel group; all three will interoperate
1603	with another EtherChannel group.
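
	The requirements in this section can be condensed into a lookup
table, for instance in a provisioning script.  The Python sketch below
only restates the text above; the short requirement labels and the
function name are chosen here for illustration.

```python
# Switch-side requirement for each bonding mode, as described above.
SWITCH_REQUIREMENT = {
    "active-backup": "none",
    "balance-tlb": "none",
    "balance-alb": "none",
    "802.3ad": "802.3ad aggregation",
    "balance-rr": "port group",
    "balance-xor": "port group",
    "broadcast": "port group",
}


def switch_requirement(mode):
    """Return the switch configuration a bonding mode generally needs."""
    try:
        return SWITCH_REQUIREMENT[mode]
    except KeyError:
        raise ValueError("unknown bonding mode: %r" % (mode,))


for mode in sorted(SWITCH_REQUIREMENT):
    print(mode, "->", switch_requirement(mode))
```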
1604	
1605	
1606	6. 802.1q VLAN Support
1607	======================
1608	
1609		It is possible to configure VLAN devices over a bond interface
1610	using the 8021q driver.  However, only packets coming from the 8021q
1611	driver and passing through bonding will be tagged by default.  Self
1612	generated packets, for example, bonding's learning packets or ARP
1613	packets generated by either ALB mode or the ARP monitor mechanism, are
1614	tagged internally by bonding itself.  As a result, bonding must
1615	"learn" the VLAN IDs configured above it, and use those IDs to tag
1616	self generated packets.
1617	
1618		For reasons of simplicity, and to support the use of adapters
1619	that can do VLAN hardware acceleration offloading, the bonding
1620	interface declares itself as fully hardware offloading capable; it gets
1621	the add_vid/kill_vid notifications to gather the necessary
1622	information, and it propagates those actions to the slaves.  In case
1623	of mixed adapter types, hardware accelerated tagged packets that
1624	should go through an adapter that is not offloading capable are
1625	"un-accelerated" by the bonding driver so the VLAN tag sits in the
1626	regular location.
1627	
1628		VLAN interfaces *must* be added on top of a bonding interface
1629	only after enslaving at least one slave.  The bonding interface has a
1630	hardware address of 00:00:00:00:00:00 until the first slave is added.
1631	If the VLAN interface is created prior to the first enslavement, it
1632	would pick up the all-zeroes hardware address.  Once the first slave
1633	is attached to the bond, the bond device itself will pick up the
1634	slave's hardware address, which is then available for the VLAN device.
1635	
1636		Also, be aware that a similar problem can occur if all slaves
1637	are released from a bond that still has one or more VLAN interfaces on
1638	top of it.  When a new slave is added, the bonding interface will
1639	obtain its hardware address from the first slave, which might not
1640	match the hardware address of the VLAN interfaces (which was
1641	ultimately copied from an earlier slave).
1642	
1643		There are two methods to ensure that the VLAN device operates
1644	with the correct hardware address if all slaves are removed from a
1645	bond interface:
1646	
1647		1. Remove all VLAN interfaces, then recreate them.
1648	
1649		2. Set the bonding interface's hardware address so that it
1650	matches the hardware address of the VLAN interfaces.
1651	
1652		Note that changing a VLAN interface's HW address would set the
1653	underlying device -- i.e. the bonding interface -- to promiscuous
1654	mode, which might not be what you want.
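
	The ordering problem can be seen in a toy Python model of the
address handling described in this section.  This is not how the
drivers are implemented: the class, the function and the second MAC
address are invented for illustration, and the model tracks nothing
except which hardware address each device ends up with.

```python
ZERO_MAC = "00:00:00:00:00:00"


class Bond:
    """Toy bond: takes its MAC from the first slave attached."""

    def __init__(self):
        self.mac = ZERO_MAC
        self.slaves = []

    def enslave(self, slave_mac):
        if not self.slaves:
            self.mac = slave_mac  # first slave supplies the bond's MAC
        self.slaves.append(slave_mac)

    def release_all(self):
        self.slaves = []


def create_vlan(bond):
    """A VLAN device copies the bond's MAC at creation time."""
    return bond.mac


bond = Bond()
too_early = create_vlan(bond)        # created before any slave exists
bond.enslave("00:C0:F0:1F:37:B4")
vlan_mac = create_vlan(bond)         # created after the first slave

bond.release_all()
bond.enslave("00:11:22:33:44:55")    # hypothetical replacement slave

print(too_early)             # all-zeroes address
print(vlan_mac == bond.mac)  # False: the VLAN kept the earlier address
```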
1655	
1656	
1657	7. Link Monitoring
1658	==================
1659	
1660		The bonding driver at present supports two schemes for
1661	monitoring a slave device's link state: the ARP monitor and the MII
1662	monitor.
1663	
1664		At the present time, due to implementation restrictions in the
1665	bonding driver itself, it is not possible to enable both ARP and MII
1666	monitoring simultaneously.
1667	
1668	7.1 ARP Monitor Operation
1669	-------------------------
1670	
1671		The ARP monitor operates as its name suggests: it sends ARP
1672	queries to one or more designated peer systems on the network, and
1673	uses the response as an indication that the link is operating.  This
1674	gives some assurance that traffic is actually flowing to and from one
1675	or more peers on the local network.
1676	
1677		The ARP monitor relies on the device driver itself to verify
1678	that traffic is flowing.  In particular, the driver must keep up to
1679	date the last receive time, dev->last_rx, and transmit start time,
1680	dev->trans_start.  If these are not updated by the driver, then the
1681	ARP monitor will immediately fail any slaves using that driver, and
1682	those slaves will stay down.  If network monitoring (tcpdump, etc.)
1683	shows the ARP requests and replies on the network, then it may be that
1684	your device driver is not updating last_rx and trans_start.
1685	
1686	7.2 Configuring Multiple ARP Targets
1687	------------------------------------
1688	
1689		While ARP monitoring can be done with just one target, it can
1690	be useful in a High Availability setup to have several targets to
1691	monitor.  In the case of just one target, the target itself may go
1692	down or have a problem making it unresponsive to ARP requests.  Having
1693	an additional target (or several) increases the reliability of the ARP
1694	monitoring.
1695	
1696		Multiple ARP targets must be separated by commas as follows:
1697	
1698	# example options for ARP monitoring with three targets
1699	alias bond0 bonding
1700	options bond0 arp_interval=60 arp_ip_target=192.168.0.1,192.168.0.3,192.168.0.9
1701	
1702		For just a single target the options would resemble:
1703	
1704	# example options for ARP monitoring with one target
1705	alias bond0 bonding
1706	options bond0 arp_interval=60 arp_ip_target=192.168.0.100
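
	When these option lines are generated by a script, it helps to
validate the targets first.  The Python sketch below builds the
"options" line in the format shown above.  The function name is
illustrative, and the limit of 16 targets is taken from the
arp_ip_target option description; verify it against your driver
version.

```python
import ipaddress

MAX_ARP_TARGETS = 16  # per the arp_ip_target option description


def arp_monitor_options(bond, interval_ms, targets):
    """Build an 'options' line enabling ARP monitoring for one bond."""
    if not targets:
        raise ValueError("at least one ARP target is required")
    if len(targets) > MAX_ARP_TARGETS:
        raise ValueError("too many ARP targets")
    # IPv4Address raises a ValueError subclass on malformed input
    checked = [str(ipaddress.IPv4Address(t)) for t in targets]
    return "options %s arp_interval=%d arp_ip_target=%s" % (
        bond, interval_ms, ",".join(checked))


print(arp_monitor_options(
    "bond0", 60, ["192.168.0.1", "192.168.0.3", "192.168.0.9"]))
```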
1707	
1708	
1709	7.3 MII Monitor Operation
1710	-------------------------
1711	
1712		The MII monitor monitors only the carrier state of the local
1713	network interface.  It accomplishes this in one of three ways: by
1714	depending upon the device driver to maintain its carrier state, by
1715	querying the device's MII registers, or by making an ethtool query to
1716	the device.
1717	
1718		If the use_carrier module parameter is 1 (the default value),
1719	then the MII monitor will rely on the driver for carrier state
1720	information (via the netif_carrier subsystem).  As explained in the
1721	use_carrier parameter information, above, if the MII monitor fails to
1722	detect carrier loss on the device (e.g., when the cable is physically
1723	disconnected), it may be that the driver does not support
1724	netif_carrier.
1725	
1726		If use_carrier is 0, then the MII monitor will first query the
1727	device's MII registers (via ioctl) and check the link state.  If that
1728	request fails (not just that it returns carrier down), then the MII
1729	monitor will make an ethtool ETHTOOL_GLINK request to attempt to obtain
1730	the same information.  If both methods fail (i.e., the driver either
1731	does not support or had some error in processing both the MII register
1732	and ethtool requests), then the MII monitor will assume the link is
1733	up.
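
	The decision sequence described above can be summarized in a few
lines of Python.  This is a sketch of the documented order of checks,
not the driver's code: the three callables stand in for the
netif_carrier state, the MII register ioctl and the ethtool request,
and returning None to mean "the request itself failed" is a convention
adopted here.

```python
def mii_link_up(use_carrier, netif_carrier, mii_query, ethtool_query):
    """Return True if the MII monitor would treat the link as up.

    netif_carrier returns True or False.  mii_query and ethtool_query
    return True (carrier up), False (carrier down) or None (request
    failed or is unsupported).
    """
    if use_carrier:
        return bool(netif_carrier())
    for query in (mii_query, ethtool_query):
        state = query()
        if state is not None:
            return bool(state)  # a valid answer, up or down, is final
    return True  # both methods failed: the link is assumed to be up


# MII ioctl unsupported, ethtool reports carrier down
print(mii_link_up(0, lambda: True, lambda: None, lambda: False))  # False
```

	The final "return True" is the reason a driver that supports
neither method will never be seen as down by the MII monitor.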
1734	
1735	8. Potential Sources of Trouble
1736	===============================
1737	
1738	8.1 Adventures in Routing
1739	-------------------------
1740	
1741		When bonding is configured, it is important that the slave
1742	devices not have routes that supersede routes of the master (or,
1743	generally, not have routes at all).  For example, suppose the bonding
1744	device bond0 has two slaves, eth0 and eth1, and the routing table is
1745	as follows:
1746	
1747	Kernel IP routing table
1748	Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
1749	10.0.0.0        0.0.0.0         255.255.0.0     U        40 0          0 eth0
1750	10.0.0.0        0.0.0.0         255.255.0.0     U        40 0          0 eth1
1751	10.0.0.0        0.0.0.0         255.255.0.0     U        40 0          0 bond0
1752	127.0.0.0       0.0.0.0         255.0.0.0       U        40 0          0 lo
1753	
1754		This routing configuration will likely still update the
1755	receive/transmit times in the driver (needed by the ARP monitor), but
1756	may bypass the bonding driver (because outgoing traffic to, in this
1757	case, another host on network 10 would use eth0 or eth1 before bond0).
1758	
1759		The ARP monitor (and ARP itself) may become confused by this
1760	configuration, because ARP requests (generated by the ARP monitor)
1761	will be sent on one interface (bond0), but the corresponding reply
1762	will arrive on a different interface (eth0).  This reply looks to ARP
1763	as an unsolicited ARP reply (because ARP matches replies on an
1764	interface basis), and is discarded.  The MII monitor is not affected
1765	by the state of the routing table.
1766	
1767		The solution here is simply to ensure that slaves do not have
1768	routes of their own, and if for some reason they must, those routes do
1769	not supersede routes of their master.  This should generally be the
1770	case, but unusual configurations or errant manual or automatic static
1771	route additions may cause trouble.
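
	A quick way to spot such routes is to scan the routing table
for entries whose interface is a slave.  The Python sketch below
assumes the eight-column "route -n" layout shown above; the function
name is illustrative, and the slave names must be supplied by the
caller.

```python
SAMPLE_ROUTES = """\
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
10.0.0.0        0.0.0.0         255.255.0.0     U        40 0          0 eth0
10.0.0.0        0.0.0.0         255.255.0.0     U        40 0          0 eth1
10.0.0.0        0.0.0.0         255.255.0.0     U        40 0          0 bond0
127.0.0.0       0.0.0.0         255.0.0.0       U        40 0          0 lo
"""


def slave_routes(route_text, slaves):
    """Return (iface, destination, genmask) for routes owned by slaves."""
    found = []
    for line in route_text.splitlines():
        fields = line.split()
        # skip the title line, the column header and anything malformed
        if len(fields) < 8 or fields[0] == "Destination":
            continue
        destination, genmask, iface = fields[0], fields[2], fields[-1]
        if iface in slaves:
            found.append((iface, destination, genmask))
    return found


for entry in slave_routes(SAMPLE_ROUTES, {"eth0", "eth1"}):
    print("route on slave:", entry)
```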
1772	
1773	8.2 Ethernet Device Renaming
1774	----------------------------
1775	
1776		On systems with network configuration scripts that do not
1777	associate physical devices directly with network interface names (so
1778	that the same physical device always has the same "ethX" name), it may
1779	be necessary to add some special logic to either /etc/modules.conf or
1780	/etc/modprobe.conf (depending upon which is installed on the system).
1781	
1782		For example, given a modules.conf containing the following:
1783	
1784	alias bond0 bonding
1785	options bond0 mode=some-mode miimon=50
1786	alias eth0 tg3
1787	alias eth1 tg3
1788	alias eth2 e1000
1789	alias eth3 e1000
1790	
1791		If neither eth0 nor eth1 is a slave to bond0, then when the
1792	bond0 interface comes up, the devices may end up reordered.  This
1793	happens because bonding is loaded first, then its slave devices'
1794	drivers are loaded next.  Since no other drivers have been loaded,
1795	when the e1000 driver loads, it will receive eth0 and eth1 for its
1796	devices, but the bonding configuration tries to enslave eth2 and eth3
1797	(which may later be assigned to the tg3 devices).
1798	
1799		Adding the following:
1800	
1801	add above bonding e1000 tg3
1802	
1803		causes modprobe to load e1000 then tg3, in that order, when
1804	bonding is loaded.  This command is fully documented in the
1805	modules.conf manual page.
1806	
1807		On systems utilizing modprobe.conf (or modprobe.conf.local),
1808	an equivalent problem can occur.  In this case, the following can be
1809	added to modprobe.conf (or modprobe.conf.local, as appropriate), as
1810	follows (all on one line; it has been split here for clarity):
1811	
1812	install bonding /sbin/modprobe tg3; /sbin/modprobe e1000;
1813		/sbin/modprobe --ignore-install bonding
1814	
1815		This will, when loading the bonding module, rather than
1816	performing the normal action, instead execute the provided command.
1817	This command loads the device drivers in the order needed, then calls
1818	modprobe with --ignore-install to cause the normal action to then take
1819	place.  Full documentation on this can be found in the modprobe.conf
1820	and modprobe manual pages.
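
	If this line is generated rather than written by hand, a small
helper keeps the driver order explicit.  The Python sketch below is
illustrative only: the function name is invented, and the output is
simply the single-line form of the modprobe.conf entry shown above.

```python
def bonding_install_line(drivers, modprobe="/sbin/modprobe"):
    """Build a modprobe.conf 'install bonding' line.

    drivers lists the slave device drivers in the order they should be
    loaded before the bonding module itself.
    """
    steps = ["%s %s" % (modprobe, driver) for driver in drivers]
    # --ignore-install makes modprobe perform the normal load of bonding
    steps.append("%s --ignore-install bonding" % modprobe)
    return "install bonding " + "; ".join(steps)


print(bonding_install_line(["tg3", "e1000"]))
```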
1821	
1822	8.3 Painfully Slow Or No Failed Link Detection By Miimon
1823	--------------------------------------------------------
1824	
1825		By default, bonding enables the use_carrier option, which
1826	instructs bonding to trust the driver to maintain carrier state.
1827	
1828		As discussed in the options section, above, some drivers do
1829	not support the netif_carrier_on/_off link state tracking system.
1830	With use_carrier enabled, bonding will always see these links as up,
1831	regardless of their actual state.
1832	
1833		Additionally, other drivers do support netif_carrier, but do
1834	not maintain it in real time, e.g., only polling the link state at
1835	some fixed interval.  In this case, miimon will detect failures, but
1836	only after some long period of time has expired.  If it appears that
1837	miimon is very slow in detecting link failures, try specifying
1838	use_carrier=0 to see if that improves the failure detection time.  If
1839	it does, then it may be that the driver checks the carrier state at a
1840	fixed interval, but does not cache the MII register values (so the
1841	use_carrier=0 method of querying the registers directly works).  If
1842	use_carrier=0 does not improve the failover, then the driver may cache
1843	the registers, or the problem may be elsewhere.
1844	
1845		Also, remember that miimon only checks for the device's
1846	carrier state.  It has no way to determine the state of devices on or
1847	beyond other ports of a switch, or if a switch is refusing to pass
1848	traffic while still maintaining carrier on.
1849	
1850	9. SNMP agents
1851	==============
1852	
1853		If running SNMP agents, the bonding driver should be loaded
1854	before any network drivers participating in a bond.  This requirement
1855	is due to the interface index (ipAdEntIfIndex) being associated with
1856	the first interface found with a given IP address.  That is, there is
1857	only one ipAdEntIfIndex for each IP address.  For example, if eth0 and
1858	eth1 are slaves of bond0 and the driver for eth0 is loaded before the
1859	bonding driver, the interface for the IP address will be associated
1860	with the eth0 interface.  This configuration is shown below; the IP
1861	address 192.168.1.1 has an interface index of 2 which indexes to eth0
1862	in the ifDescr table (ifDescr.2).
1863	
1864	     interfaces.ifTable.ifEntry.ifDescr.1 = lo
1865	     interfaces.ifTable.ifEntry.ifDescr.2 = eth0
1866	     interfaces.ifTable.ifEntry.ifDescr.3 = eth1
1867	     interfaces.ifTable.ifEntry.ifDescr.4 = eth2
1868	     interfaces.ifTable.ifEntry.ifDescr.5 = eth3
1869	     interfaces.ifTable.ifEntry.ifDescr.6 = bond0
1870	     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.10.10.10 = 5
1871	     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.192.168.1.1 = 2
1872	     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.74.20.94 = 4
1873	     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1
1874	
1875		This problem is avoided by loading the bonding driver before
1876	any network drivers participating in a bond.  Below is an example of
1877	loading the bonding driver first; the IP address 192.168.1.1 is
1878	correctly associated with ifDescr.2.
1879	
1880	     interfaces.ifTable.ifEntry.ifDescr.1 = lo
1881	     interfaces.ifTable.ifEntry.ifDescr.2 = bond0
1882	     interfaces.ifTable.ifEntry.ifDescr.3 = eth0
1883	     interfaces.ifTable.ifEntry.ifDescr.4 = eth1
1884	     interfaces.ifTable.ifEntry.ifDescr.5 = eth2
1885	     interfaces.ifTable.ifEntry.ifDescr.6 = eth3
1886	     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.10.10.10 = 6
1887	     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.192.168.1.1 = 2
1888	     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.74.20.94 = 5
1889	     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1
1890	
1891		While some distributions may not report the interface name in
1892	ifDescr, the association between the IP address and IfIndex remains
1893	and SNMP functions such as Interface_Scan_Next will report that
1894	association.
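
	The association can be checked by joining the two tables.  The
Python sketch below resolves each IP address to an interface name from
output in the format shown above; the function name is illustrative,
and it assumes the agent reports interface names in ifDescr.

```python
WRONG_ORDER = """\
     interfaces.ifTable.ifEntry.ifDescr.1 = lo
     interfaces.ifTable.ifEntry.ifDescr.2 = eth0
     interfaces.ifTable.ifEntry.ifDescr.3 = eth1
     interfaces.ifTable.ifEntry.ifDescr.4 = eth2
     interfaces.ifTable.ifEntry.ifDescr.5 = eth3
     interfaces.ifTable.ifEntry.ifDescr.6 = bond0
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.10.10.10 = 5
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.192.168.1.1 = 2
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.74.20.94 = 4
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1
"""


def ip_to_interface(walk_text):
    """Map each IP address to the ifDescr its ipAdEntIfIndex points at."""
    descr = {}
    ip_index = {}
    for line in walk_text.splitlines():
        oid, sep, value = line.partition("=")
        if not sep:
            continue
        oid, value = oid.strip(), value.strip()
        if ".ifDescr." in oid:
            descr[int(oid.rsplit(".", 1)[1])] = value
        elif ".ipAdEntIfIndex." in oid:
            ip = oid.split(".ipAdEntIfIndex.", 1)[1]
            ip_index[ip] = int(value)
    return {ip: descr.get(index) for ip, index in ip_index.items()}


print(ip_to_interface(WRONG_ORDER)["192.168.1.1"])  # eth0
```

	With the bonding driver loaded first, the same lookup on the
second listing above yields bond0 for 192.168.1.1.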
1895	
1896	10. Promiscuous mode
1897	====================
1898	
1899		When running network monitoring tools, e.g., tcpdump, it is
1900	common to enable promiscuous mode on the device, so that all traffic
1901	is seen (instead of seeing only traffic destined for the local host).
1902	The bonding driver handles promiscuous mode changes to the bonding
1903	master device (e.g., bond0), and propagates the setting to the slave
1904	devices.
1905	
1906		For the balance-rr, balance-xor, broadcast, and 802.3ad modes,
1907	the promiscuous mode setting is propagated to all slaves.
1908	
1909		For the active-backup, balance-tlb and balance-alb modes, the
1910	promiscuous mode setting is propagated only to the active slave.
1911	
1912		For balance-tlb mode, the active slave is the slave currently
1913	receiving inbound traffic.
1914	
1915		For balance-alb mode, the active slave is the slave used as a
1916	"primary."  This slave is used for mode-specific control traffic, for
1917	sending to peers that are unassigned, or if the load is unbalanced.
1918	
1919		For the active-backup, balance-tlb and balance-alb modes, when
1920	the active slave changes (e.g., due to a link failure), the
1921	promiscuous setting will be propagated to the new active slave.
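
	The propagation rules above reduce to a simple mode check, as in
the following Python sketch.  It only restates this section: the
function and set names are illustrative, and the caller supplies the
current active slave.

```python
ALL_SLAVES_MODES = {"balance-rr", "balance-xor", "broadcast", "802.3ad"}
ACTIVE_ONLY_MODES = {"active-backup", "balance-tlb", "balance-alb"}


def promisc_targets(mode, slaves, active_slave):
    """Return the slaves that receive the master's promiscuous setting."""
    if mode in ALL_SLAVES_MODES:
        return list(slaves)
    if mode in ACTIVE_ONLY_MODES:
        return [active_slave]
    raise ValueError("unknown bonding mode: %r" % (mode,))


print(promisc_targets("802.3ad", ["eth0", "eth1"], "eth0"))
print(promisc_targets("active-backup", ["eth0", "eth1"], "eth0"))
# after a failover the setting follows the new active slave
print(promisc_targets("active-backup", ["eth0", "eth1"], "eth1"))
```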
1922	
1923	11. Configuring Bonding for High Availability
1924	=============================================
1925	
1926		High Availability refers to configurations that provide
1927	maximum network availability by having redundant or backup devices,
1928	links or switches between the host and the rest of the world.  The
1929	goal is to provide the maximum availability of network connectivity
1930	(i.e., the network always works), even though other configurations
1931	could provide higher throughput.
1932	
1933	11.1 High Availability in a Single Switch Topology
1934	--------------------------------------------------
1935	
1936		If two hosts (or a host and a single switch) are directly
1937	connected via multiple physical links, then there is no availability
1938	penalty to optimizing for maximum bandwidth.  In this case, there is
1939	only one switch (or peer), so if it fails, there is no alternative
1940	access to fail over to.  Additionally, the bonding load balance modes
1941	support link monitoring of their members, so if individual links fail,
1942	the load will be rebalanced across the remaining devices.
1943	
1944		See Section 12, "Configuring Bonding for Maximum Throughput"
1945	for information on configuring bonding with one peer device.
1946	
1947	11.2 High Availability in a Multiple Switch Topology
1948	----------------------------------------------------
1949	
1950		With multiple switches, the configuration of bonding and the
1951	network changes dramatically.  In multiple switch topologies, there is
1952	a trade off between network availability and usable bandwidth.
1953	
1954		Below is a sample network, configured to maximize the
1955	availability of the network:
1956	
1957	                |                                     |
1958	                |port3                           port3|
1959	          +-----+----+                          +-----+----+
1960	          |          |port2       ISL      port2|          |
1961	          | switch A +--------------------------+ switch B |
1962	          |          |                          |          |
1963	          +-----+----+                          +-----+----+
1964	                |port1                           port1|
1965	                |             +-------+               |
1966	                +-------------+ host1 +---------------+
1967	                         eth0 +-------+ eth1
1968	
1969		In this configuration, there is a link between the two
1970	switches (ISL, or inter switch link), and multiple ports connecting to
1971	the outside world ("port3" on each switch).  There is no technical
1972	reason that this could not be extended to a third switch.
1973	
1974	11.2.1 HA Bonding Mode Selection for Multiple Switch Topology
1975	-------------------------------------------------------------
1976	
1977		In a topology such as the example above, the active-backup and
1978	broadcast modes are the only useful bonding modes when optimizing for
1979	availability; the other modes require all links to terminate on the
1980	same peer for them to behave rationally.
1981	
1982	active-backup: This is generally the preferred mode, particularly if
1983		the switches have an ISL and play together well.  If the
1984		network configuration is such that one switch is specifically
1985		a backup switch (e.g., has lower capacity, higher cost, etc),
1986		then the primary option can be used to ensure that the
1987		preferred link is always used when it is available.
1988	
1989	broadcast: This mode is really a special purpose mode, and is suitable
1990		only for very specific needs.  For example, suppose the two
1991		switches are not connected (no ISL), and the networks beyond
1992		them are totally independent.  In this case, if it is
1993		necessary for some specific one-way traffic to reach both
1994		independent networks, then the broadcast mode may be suitable.
1995	
1996	11.2.2 HA Link Monitoring Selection for Multiple Switch Topology
1997	----------------------------------------------------------------
1998	
1999		The choice of link monitoring ultimately depends upon your
2000	switch.  If the switch can reliably fail ports in response to other
2001	failures, then either the MII or ARP monitors should work.  For
2002	instance, in the example above, if the "port3" link fails at the remote
2003	end, the MII monitor has no direct means to detect this.  The ARP
2004	monitor could be configured with a target at the remote end of port3,
2005	thus detecting that failure without switch support.
2006	
2007		In general, however, in a multiple switch topology, the ARP
2008	monitor can provide a higher level of reliability in detecting end to
2009	end connectivity failures (which may be caused by the failure of any
2010	individual component to pass traffic for any reason).  Additionally,
2011	the ARP monitor should be configured with multiple targets (at least
2012	one for each switch in the network).  This will ensure that,
2013	regardless of which switch is active, the ARP monitor has a suitable
2014	target to query.
2015	
2016		Note, also, that many switches now support a functionality
2017	generally referred to as "trunk failover."  This is a feature of the
2018	switch that causes the link state of a particular switch port to be set
2019	down (or up) when the state of another switch port goes down (or up).
2020	Its purpose is to propagate link failures from logically "exterior" ports
2021	to the logically "interior" ports that bonding is able to monitor via
2022	miimon.  Availability and configuration for trunk failover varies by
2023	switch, but this can be a viable alternative to the ARP monitor when using
2024	suitable switches.
2025	
2026	12. Configuring Bonding for Maximum Throughput
2027	==============================================
2028	
2029	12.1 Maximizing Throughput in a Single Switch Topology
2030	------------------------------------------------------
2031	
2032		In a single switch configuration, the best method to maximize
2033	throughput depends upon the application and network environment.  The
2034	various load balancing modes each have strengths and weaknesses in
2035	different environments, as detailed below.
2036	
2037		For this discussion, we will break down the topologies into
2038	two categories.  Depending upon the destination of most traffic, we
2039	categorize them into either "gatewayed" or "local" configurations.
2040	
2041		In a gatewayed configuration, the "switch" is acting primarily
2042	as a router, and the majority of traffic passes through this router to
2043	other networks.  An example would be the following:
2044	
2045	
2046	     +----------+                     +----------+
2047	     |          |eth0            port1|          | to other networks
2048	     | Host A   +---------------------+ router   +------------------->
2049	     |          +---------------------+          | Hosts B and C are out
2050	     |          |eth1            port2|          | here somewhere
2051	     +----------+                     +----------+
2052	
2053		The router may be a dedicated router device, or another host
2054	acting as a gateway.  For our discussion, the important point is that
2055	the majority of traffic from Host A will pass through the router to
2056	some other network before reaching its final destination.
2057	
2058		In a gatewayed network configuration, although Host A may
2059	communicate with many other systems, all of its traffic will be sent
2060	and received via one other peer on the local network, the router.
2061	
2062		Note that the case of two systems connected directly via
2063	multiple physical links is, for purposes of configuring bonding, the
2064	same as a gatewayed configuration.  In that case, it happens that all
2065	traffic is destined for the "gateway" itself, not some other network
2066	beyond the gateway.
2067	
2068		In a local configuration, the "switch" is acting primarily as
2069	a switch, and the majority of traffic passes through this switch to
2070	reach other stations on the same network.  An example would be the
2071	following:
2072	
2073	    +----------+            +----------+       +--------+
2074	    |          |eth0   port1|          +-------+ Host B |
2075	    |  Host A  +------------+  switch  |port3  +--------+
2076	    |          +------------+          |                  +--------+
2077	    |          |eth1   port2|          +------------------+ Host C |
2078	    +----------+            +----------+port4             +--------+
2079	
2080	
2081		Again, the switch may be a dedicated switch device, or another
2082	host acting as a gateway.  For our discussion, the important point is
2083	that the majority of traffic from Host A is destined for other hosts
2084	on the same local network (Hosts B and C in the above example).
2085	
2086		In summary, in a gatewayed configuration, traffic to and from
2087	the bonded device will be to the same MAC level peer on the network
2088	(the gateway itself, i.e., the router), regardless of its final
2089	destination.  In a local configuration, traffic flows directly to and
2090	from the final destinations, thus, each destination (Host B, Host C)
2091	will be addressed directly by their individual MAC addresses.
2092	
2093		This distinction between a gatewayed and a local network
2094	configuration is important because many of the load balancing modes
2095	available use the MAC addresses of the local network source and
2096	destination to make load balancing decisions.  The behavior of each
2097	mode is described below.
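
	The effect of MAC-based balancing on the two configurations can
be shown with a short Python sketch.  It assumes the layer2 transmit
hash, (source MAC XOR destination MAC) modulo slave count, from the
xmit_hash_policy option description.  Hashing only the last byte of
each address is a further assumption, and every MAC address other than
Host A's is made up for the example.

```python
def layer2_hash(src_mac, dst_mac, slave_count):
    """Pick a slave index from the last byte of each MAC address."""
    src_last = int(src_mac.split(":")[-1], 16)
    dst_last = int(dst_mac.split(":")[-1], 16)
    return (src_last ^ dst_last) % slave_count


HOST_A = "00:C0:F0:1F:37:B4"
ROUTER = "00:11:22:33:44:01"   # hypothetical addresses
HOST_B = "00:11:22:33:44:02"
HOST_C = "00:11:22:33:44:03"

# Gatewayed: every packet is addressed to the router's MAC, so every
# destination behind the router hashes to the same slave.
print(layer2_hash(HOST_A, ROUTER, 2))

# Local: each peer has its own MAC, so traffic can spread over slaves.
print(layer2_hash(HOST_A, HOST_B, 2), layer2_hash(HOST_A, HOST_C, 2))
```

	With these addresses Hosts B and C land on different slaves,
while all gatewayed traffic is confined to one.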
2098	
2099	
2100	12.1.1 MT Bonding Mode Selection for Single Switch Topology
2101	-----------------------------------------------------------
2102	
2103		This configuration is the easiest to set up and to understand,
2104	although you will have to decide which bonding mode best suits your
2105	needs.  The trade offs for each mode are detailed below:
2106	
2107	balance-rr: This mode is the only mode that will permit a single
2108		TCP/IP connection to stripe traffic across multiple
2109		interfaces. It is therefore the only mode that will allow a
2110		single TCP/IP stream to utilize more than one interface's
2111		worth of throughput.  This comes at a cost, however: the
2112		striping generally results in peer systems receiving packets out
2113		of order, causing TCP/IP's congestion control system to kick
2114		in, often by retransmitting segments.
2115	
2116		It is possible to adjust TCP/IP's congestion limits by
2117		altering the net.ipv4.tcp_reordering sysctl parameter.  The
2118		usual default value is 3, and the maximum useful value is 127.
2119		For a four interface balance-rr bond, expect that a single
2120		TCP/IP stream will utilize no more than approximately 2.3
2121		interfaces' worth of throughput, even after adjusting
2122		tcp_reordering.
2123	
2124		Note that the fraction of packets that will be delivered out of
2125		order is highly variable, and is unlikely to be zero.  The level
2126		of reordering depends upon a variety of factors, including the
2127		networking interfaces, the switch, and the topology of the
2128		configuration.  Speaking in general terms, higher speed network
2129		cards produce more reordering (due to factors such as packet
2130		coalescing), and a "many to many" topology will reorder at a
2131		higher rate than a "many slow to one fast" configuration.
2132	
2133		Many switches do not support any modes that stripe traffic
2134		(instead choosing a port based upon IP or MAC level addresses);
2135		for those devices, traffic for a particular connection flowing
2136		through the switch to a balance-rr bond will not utilize greater
2137		than one interface's worth of bandwidth.
2138	
2139		If you are utilizing protocols other than TCP/IP, UDP for
2140		example, and your application can tolerate out of order
2141		delivery, then this mode can allow for single stream datagram
2142		performance that scales near linearly as interfaces are added
2143		to the bond.
2144	
2145		This mode requires the switch to have the appropriate ports
2146		configured for "etherchannel" or "trunking."
2147	
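	As a minimal sketch of the tcp_reordering adjustment described
	above, the shell functions below wrap the sysctl command.  The
	function names are hypothetical (they are not part of any
	package), and the 127 ceiling is simply the maximum useful
	value quoted above.  Changing the value requires root and does
	not persist across reboots.

```shell
# Hypothetical helper: set net.ipv4.tcp_reordering on the running kernel.
# Values above 127 are refused because the text above gives 127 as the
# maximum useful value.
set_tcp_reordering() {
    case "$1" in
        ''|*[!0-9]*)
            echo "usage: set_tcp_reordering <value up to 127>" >&2
            return 2 ;;
    esac
    if [ "$1" -gt 127 ]; then
        echo "set_tcp_reordering: $1 is above the useful maximum of 127" >&2
        return 2
    fi
    sysctl -w "net.ipv4.tcp_reordering=$1"
}

# Hypothetical helper: print the value currently in effect.
show_tcp_reordering() {
    sysctl -n net.ipv4.tcp_reordering
}
```

	For example, "set_tcp_reordering 127" raises the limit as far
	as is useful, and "show_tcp_reordering" confirms the result.
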
2148	active-backup: There is not much advantage in this network topology to
2149		the active-backup mode, as the inactive backup devices are all
2150		connected to the same peer as the primary.  In this case, a
2151		load balancing mode (with link monitoring) will provide the
2152		same level of network availability, but with increased
2153		available bandwidth.  On the plus side, active-backup mode
2154		does not require any configuration of the switch, so it may
2155		have value if the hardware available does not support any of
2156		the load balance modes.
2157	
2158	balance-xor: This mode will limit traffic such that packets destined
2159		for specific peers will always be sent over the same
2160		interface.  Since the destination is determined by the MAC
2161		addresses involved, this mode works best in a "local" network
2162		configuration (as described above), with destinations all on
2163		the same local network.  This mode is likely to be suboptimal
2164		if all your traffic is passed through a single router (i.e., a
2165		"gatewayed" network configuration, as described above).
2166	
2167		As with balance-rr, the switch ports need to be configured for
2168		"etherchannel" or "trunking."
2169	
2170	broadcast: Like active-backup, there is not much advantage to this
2171		mode in this type of network topology.
2172	
2173	802.3ad: This mode can be a good choice for this type of network
2174		topology.  The 802.3ad mode is an IEEE standard, so all peers
2175		that implement 802.3ad should interoperate well.  The 802.3ad
2176		protocol includes automatic configuration of the aggregates,
2177		so minimal manual configuration of the switch is needed
2178		(typically only to designate that some set of devices is
2179		available for 802.3ad).  The 802.3ad standard also mandates
2180		that frames be delivered in order (within certain limits), so
2181		in general single connections will not see misordering of
2182		packets.  The 802.3ad mode does have some drawbacks: the
2183		standard mandates that all devices in the aggregate operate at
2184		the same speed and duplex.  Also, as with all bonding load
2185		balance modes other than balance-rr, no single connection will
2186		be able to utilize more than a single interface's worth of
2187		bandwidth.  
2188	
2189		Additionally, the Linux bonding 802.3ad implementation
2190		distributes traffic by peer (using an XOR of MAC addresses),
2191		so in a "gatewayed" configuration, all outgoing traffic will
2192		generally use the same device.  Incoming traffic may also end
2193		up on a single device, but that is dependent upon the
2194		balancing policy of the peer's 802.3ad implementation.  In a
2195		"local" configuration, traffic will be distributed across the
2196		devices in the bond.
2197	
2198		Finally, the 802.3ad mode mandates the use of the MII monitor;
2199		therefore, the ARP monitor is not available in this mode.
2200	
2201	balance-tlb: The balance-tlb mode balances outgoing traffic by peer.
2202		Since the balancing is done according to MAC address, in a
2203		"gatewayed" configuration (as described above), this mode will
2204		send all traffic across a single device.  However, in a
2205		"local" network configuration, this mode balances multiple
2206		local network peers across devices in a vaguely intelligent
2207		manner (not a simple XOR as in balance-xor or 802.3ad mode),
2208		so that mathematically unlucky MAC addresses (i.e., ones that
2209		XOR to the same value) will not all "bunch up" on a single
2210		interface.
2211	
2212		Unlike 802.3ad, interfaces may be of differing speeds, and no
2213		special switch configuration is required.  On the down side,
2214		in this mode all incoming traffic arrives over a single
2215		interface, this mode requires certain ethtool support in the
2216		network device driver of the slave interfaces, and the ARP
2217		monitor is not available.
2218	
2219	balance-alb: This mode is everything that balance-tlb is, and more.
2220		It has all of the features (and restrictions) of balance-tlb,
2221		and will also balance incoming traffic from local network
2222		peers (as described in the Bonding Module Options section,
2223		above).
2224	
2225		The only additional down side to this mode is that the network
2226		device driver must support changing the hardware address while
2227		the device is open.
2228	
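	Two of the modes above (balance-xor and 802.3ad) select the
outgoing device from an XOR of MAC addresses.  The sketch below
illustrates why that concentrates traffic in a "gatewayed"
configuration.  It assumes the formula (source MAC XOR destination
MAC) modulo slave count and, as a simplification, uses only the last
octet of each address; the function name and the MAC addresses are
made up for illustration.

```shell
# Hypothetical helper: index of the slave a frame would be sent on.
# $1 = source MAC, $2 = destination MAC, $3 = number of slaves.
# Simplification (assumption): only the last octet of each MAC is used.
xor_slave() {
    src=${1##*:}
    dst=${2##*:}
    echo $(( (0x$src ^ 0x$dst) % $3 ))
}

# "Gatewayed" case: every remote destination is reached through the
# router's MAC address, so every frame maps to the same slave.
xor_slave 00:11:22:33:44:55 00:aa:bb:cc:dd:01 2
xor_slave 00:11:22:33:44:55 00:aa:bb:cc:dd:01 2

# "Local" case: different peer MAC addresses can map to different slaves.
xor_slave 00:11:22:33:44:55 00:aa:bb:cc:dd:02 2
xor_slave 00:11:22:33:44:55 00:aa:bb:cc:dd:03 2
```

	The first two calls model two connections leaving through the
same router and therefore print the same index; the last two model two
local peers, which in this example land on different devices.
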
2229	12.1.2 MT Link Monitoring for Single Switch Topology
2230	----------------------------------------------------
2231	
2232		The choice of link monitoring may largely depend upon which
2233	mode you choose to use.  The more advanced load balancing modes do not
2234	support the use of the ARP monitor, and are thus restricted to using
2235	the MII monitor (which does not provide as high a level of end to end
2236	assurance as the ARP monitor).
2237	
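	The restriction above can be summarized as a small lookup.
This is only a sketch: the function name is hypothetical, and the
assumption that the remaining modes accept either monitor is inferred
from the mode descriptions rather than stated in this section.

```shell
# Hypothetical helper: list the link monitors usable with a bonding mode.
monitors_for_mode() {
    case "$1" in
        802.3ad|balance-tlb|balance-alb)
            # The more advanced load balancing modes: MII monitor only.
            echo "mii" ;;
        balance-rr|active-backup|balance-xor|broadcast)
            # Assumption: these modes accept either monitor.
            echo "mii arp" ;;
        *)
            echo "unknown mode: $1" >&2
            return 1 ;;
    esac
}
```

	For example, "monitors_for_mode balance-alb" prints only "mii".
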
2238	12.2 Maximum Throughput in a Multiple Switch Topology
2239	-----------------------------------------------------
2240	
2241		Multiple switches may be utilized to optimize for throughput
2242	when they are configured in parallel as part of an isolated network
2243	between two or more systems, for example:
2244	
2245	                       +-----------+
2246	                       |  Host A   | 
2247	                       +-+---+---+-+
2248	                         |   |   |
2249	                +--------+   |   +---------+
2250	                |            |             |
2251	         +------+---+  +-----+----+  +-----+----+
2252	         | Switch A |  | Switch B |  | Switch C |
2253	         +------+---+  +-----+----+  +-----+----+
2254	                |            |             |
2255	                +--------+   |   +---------+
2256	                         |   |   |
2257	                       +-+---+---+-+
2258	                       |  Host B   | 
2259	                       +-----------+
2260	
2261		In this configuration, the switches are isolated from one
2262	another.  One reason to employ a topology such as this is for an
2263	isolated network with many hosts (a cluster configured for high
2264	performance, for example): using multiple smaller switches can be more
2265	cost effective than a single larger switch.  On a network with 24
2266	hosts, three 24 port switches can be significantly less expensive than
2267	a single 72 port switch.
2268	
2269		If access beyond the network is required, an individual host
2270	can be equipped with an additional network device connected to an
2271	external network; this host then additionally acts as a gateway.
2272	
2273	12.2.1 MT Bonding Mode Selection for Multiple Switch Topology
2274	-------------------------------------------------------------
2275	
2276		In actual practice, the bonding mode typically employed in
2277	configurations of this type is balance-rr.  Historically, in this
2278	network configuration, the usual caveats about out of order packet
2279	delivery are mitigated by the use of network adapters that do not do
2280	any kind of packet coalescing (via the use of NAPI, or because the
2281	device itself does not generate interrupts until some number of
2282	packets has arrived).  When employed in this fashion, the balance-rr
2283	mode allows individual connections between two hosts to effectively
2284	utilize greater than one interface's bandwidth.
2285	
2286	12.2.2 MT Link Monitoring for Multiple Switch Topology
2287	------------------------------------------------------
2288	
2289		Again, in actual practice, the MII monitor is most often used
2290	in this configuration, as performance is given preference over
2291	availability.  The ARP monitor will function in this topology, but its
2292	advantages over the MII monitor are mitigated by the volume of probes
2293	needed as the number of systems involved grows (remember that each
2294	host in the network is configured with bonding).
2295	
2296	13. Switch Behavior Issues
2297	==========================
2298	
2299	13.1 Link Establishment and Failover Delays
2300	-------------------------------------------
2301	
2302		Some switches exhibit undesirable behavior with regard to the
2303	timing of link up and down reporting by the switch.
2304	
2305		First, when a link comes up, some switches may indicate that
2306	the link is up (carrier available), but not pass traffic over the
2307	interface for some period of time.  This delay is typically due to
2308	some type of autonegotiation or routing protocol, but may also occur
2309	during switch initialization (e.g., during recovery after a switch
2310	failure).  If you find this to be a problem, specify an appropriate
2311	value to the updelay bonding module option to delay the use of the
2312	relevant interface(s).
2313	
2314		Second, some switches may "bounce" the link state one or more
2315	times while a link is changing state.  This occurs most commonly while
2316	the switch is initializing.  Again, an appropriate updelay value may
2317	help.
2318	
2319		Note that when a bonding interface has no active links, the
2320	driver will immediately reuse the first link that goes up, even if the
2321	updelay parameter has been specified (the updelay is ignored in this
2322	case).  If there are slave interfaces waiting for the updelay timeout
2323	to expire, the interface that first went into that state will be
2324	immediately reused.  This reduces down time of the network if the
2325	value of updelay has been overestimated, and since this occurs only in
2326	cases with no connectivity, there is no additional penalty for
2327	ignoring the updelay.
2328	
2329		In addition to the concerns about switch timings, if your
2330	switches take a long time to go into backup mode, it may be desirable
2331	to not activate a backup interface immediately after a link goes down.
2332	Failover may be delayed via the downdelay bonding module option.
2333	
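	The sketch below shows one way to prepare updelay and downdelay
values.  It assumes that both delays are used together with the MII
monitor and that a delay which is not a multiple of miimon is rounded
down to the nearest multiple (the behavior described for these options
in the module options section).  The function names and the sample
numbers are hypothetical.

```shell
# Hypothetical helper: round a delay down to a multiple of miimon.
# $1 = miimon interval in ms, $2 = requested delay in ms.
effective_delay() {
    if [ "$1" -le 0 ]; then
        echo "effective_delay: miimon must be greater than 0" >&2
        return 1
    fi
    echo $(( $2 / $1 * $1 ))
}

# Hypothetical helper: build a bonding option string with both delays.
# $1 = miimon, $2 = updelay, $3 = downdelay (all in ms).
bonding_delay_options() {
    up=$(effective_delay "$1" "$2") || return 1
    down=$(effective_delay "$1" "$3") || return 1
    echo "miimon=$1 updelay=$up downdelay=$down"
}

# A requested updelay of 250 ms with miimon=100 is reduced to 200 ms.
bonding_delay_options 100 250 200
```

	The resulting string could be supplied as part of the bonding
module options; choosing delays that are already multiples of miimon
avoids any surprise from the rounding.
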
2334	13.2 Duplicated Incoming Packets
2335	--------------------------------
2336	
2337		NOTE: Starting with version 3.0.2, the bonding driver has logic to
2338	suppress duplicate packets, which should largely eliminate this problem.
2339	The following description is kept for reference.
2340	
2341		It is not uncommon to observe a short burst of duplicated
2342	traffic when the bonding device is first used, or after it has been
2343	idle for some period of time.  This is most easily observed by issuing
2344	a "ping" to some other host on the network, and noticing that the
2345	output from ping flags duplicates (typically one per slave).
2346	
2347		For example, on a bond in active-backup mode with five slaves
2348	all connected to one switch, the output may appear as follows:
2349	
2350	# ping -n 10.0.4.2
2351	PING 10.0.4.2 (10.0.4.2) from 10.0.3.10 : 56(84) bytes of data.
2352	64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.7 ms
2353	64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)
2354	64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)
2355	64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)
2356	64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)
2357	64 bytes from 10.0.4.2: icmp_seq=2 ttl=64 time=0.216 ms
2358	64 bytes from 10.0.4.2: icmp_seq=3 ttl=64 time=0.267 ms
2359	64 bytes from 10.0.4.2: icmp_seq=4 ttl=64 time=0.222 ms
2360	
2361		This is not due to an error in the bonding driver; rather, it
2362	is a side effect of how many switches update their MAC forwarding
2363	tables.  Initially, the switch does not associate the MAC address in
2364	the packet with a particular switch port, and so it may send the
2365	traffic to all ports until its MAC forwarding table is updated.  Since
2366	the interfaces attached to the bond may occupy multiple ports on a
2367	single switch, when the switch (temporarily) floods the traffic to all
2368	ports, the bond device receives multiple copies of the same packet
2369	(one per slave device).
2370	
2371		The duplicated packet behavior is switch dependent: some
2372	switches exhibit this, and some do not.  On switches that display this
2373	behavior, it can be induced by clearing the MAC forwarding table (on
2374	most Cisco switches, the privileged command "clear mac address-table
2375	dynamic" will accomplish this).
2376	
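	To check whether a given switch shows this behavior, the
duplicates flagged by ping can simply be counted.  The helper below is
a sketch with a hypothetical name; the sample input is taken from the
ping output shown earlier in this section.

```shell
# Hypothetical helper: count replies that ping flagged as duplicates.
# Reads ping output on standard input.
count_dups() {
    grep -cF '(DUP!)'
}

# Sample input: the first reply, its four copies, and the next reply.
count_dups <<'EOF'
64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.7 ms
64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)
64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)
64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)
64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)
64 bytes from 10.0.4.2: icmp_seq=2 ttl=64 time=0.216 ms
EOF
```

	On a live system the same function can be fed directly, e.g.,
"ping -n -c 4 10.0.4.2 | count_dups".  A count equal to the number of
slaves minus one on the first reply is consistent with the flooding
described above.
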
2377	14. Hardware Specific Considerations
2378	====================================
2379	
2380		This section contains additional information for configuring
2381	bonding on specific hardware platforms, or for interfacing bonding
2382	with particular switches or other devices.
2383	
2384	14.1 IBM BladeCenter
2385	--------------------
2386	
2387		This applies to the JS20 and similar systems.
2388	
2389		On the JS20 blades, the bonding driver supports only
2390	balance-rr, active-backup, balance-tlb and balance-alb modes.  This is
2391	largely due to the network topology inside the BladeCenter, detailed
2392	below.
2393	
2394	JS20 network adapter information
2395	--------------------------------
2396	
2397		All JS20s come with two Broadcom Gigabit Ethernet ports
2398	integrated on the planar (that's "motherboard" in IBM-speak).  In the
2399	BladeCenter chassis, the eth0 port of all JS20 blades is hard wired to
2400	I/O Module #1; similarly, all eth1 ports are wired to I/O Module #2.
2401	An add-on Broadcom daughter card can be installed on a JS20 to provide
2402	two more Gigabit Ethernet ports.  These ports, eth2 and eth3, are
2403	wired to I/O Modules 3 and 4, respectively.
2404	
2405		Each I/O Module may contain either a switch or a passthrough
2406	module (which allows ports to be directly connected to an external
2407	switch).  Some bonding modes require a specific BladeCenter internal
2408	network topology in order to function; these are detailed below.
2409	
2410		Additional BladeCenter-specific networking information can be
2411	found in two IBM Redbooks (www.ibm.com/redbooks):
2412	
2413	"IBM eServer BladeCenter Networking Options"
2414	"IBM eServer BladeCenter Layer 2-7 Network Switching"
2415	
2416	BladeCenter networking configuration
2417	------------------------------------
2418	
2419		Because a BladeCenter can be configured in a very large number
2420	of ways, this discussion will be confined to describing basic
2421	configurations.
2422	
2423		Normally, Ethernet Switch Modules (ESMs) are used in I/O
2424	modules 1 and 2.  In this configuration, the eth0 and eth1 ports of a
2425	JS20 will be connected to different internal switches (in the
2426	respective I/O modules).
2427	
2428		A passthrough module (PM; either optical, OPM, or copper, CPM)
2429	connects the I/O module directly to an external
2430	switch.  By using PMs in I/O module #1 and #2, the eth0 and eth1
2431	interfaces of a JS20 can be redirected to the outside world and
2432	connected to a common external switch.
2433	
2434		Depending upon the mix of ESMs and PMs, the network will
2435	appear to bonding as either a single switch topology (all PMs) or as a
2436	multiple switch topology (one or more ESMs, zero or more PMs).  It is
2437	also possible to connect ESMs together, resulting in a configuration
2438	much like the example in "High Availability in a Multiple Switch
2439	Topology," above.
2440	
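	The rule in the preceding paragraph can be written down as a
short classification.  This is a sketch only: the function name and the
"esm"/"pm" keywords are hypothetical, and the "single switch" result
assumes that all passthrough modules lead to a common external switch.

```shell
# Hypothetical helper: classify the topology as seen by bonding.
# Arguments: the module type in each I/O bay used by the bond,
# "esm" (Ethernet Switch Module) or "pm" (passthrough module).
bladecenter_topology() {
    [ "$#" -gt 0 ] || { echo "need at least one module type" >&2; return 2; }
    topology="single switch"
    for module in "$@"; do
        case "$module" in
            esm) topology="multiple switch" ;;
            pm)  ;;
            *)   echo "unknown module type: $module" >&2; return 2 ;;
        esac
    done
    echo "$topology"
}
```

	For example, "bladecenter_topology pm pm" reports a single
switch topology, while "bladecenter_topology esm pm" reports a multiple
switch topology.
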
2441	Requirements for specific modes
2442	-------------------------------
2443	
2444		The balance-rr mode requires the use of passthrough modules
2445	for devices in the bond, all connected to a common external switch.
2446	That switch must be configured for "etherchannel" or "trunking" on the
2447	appropriate ports, as is usual for balance-rr.
2448	
2449		The balance-alb and balance-tlb modes will function with
2450	either switch modules or passthrough modules (or a mix).  The only
2451	specific requirement for these modes is that all network interfaces
2452	must be able to reach all destinations for traffic sent over the
2453	bonding device (i.e., the network must converge at some point outside
2454	the BladeCenter).
2455	
2456		The active-backup mode has no additional requirements.
2457	
2458	Link monitoring issues
2459	----------------------
2460	
2461		When an Ethernet Switch Module is in place, only the ARP
2462	monitor will reliably detect link loss to an external switch.  This is
2463	nothing unusual, but examination of the BladeCenter cabinet would
2464	suggest that the "external" network ports are the ethernet ports for
2465	the system, when in fact there is a switch between these "external"
2466	ports and the devices on the JS20 system itself.  The MII monitor is
2467	only able to detect link failures between the ESM and the JS20 system.
2468	
2469		When a passthrough module is in place, the MII monitor does
2470	detect failures to the "external" port, which is then directly
2471	connected to the JS20 system.
2472	
2473	Other concerns
2474	--------------
2475	
2476		The Serial Over LAN (SoL) link is established over the primary
2477	ethernet (eth0) only; therefore, any loss of link to eth0 will result
2478	in losing your SoL connection.  It will not fail over with other
2479	network traffic, as the SoL system is beyond the control of the
2480	bonding driver.
2481	
2482		It may be desirable to disable spanning tree on the switch
2483	(either the internal Ethernet Switch Module, or an external switch) to
2484	avoid fail-over delay issues when using bonding.
2485	
2486		
2487	15. Frequently Asked Questions
2488	==============================
2489	
2490	1.  Is it SMP safe?
2491	
2492		Yes. The old 2.0.xx channel bonding patch was not SMP safe.
2493	The new driver was designed to be SMP safe from the start.
2494	
2495	2.  What type of cards will work with it?
2496	
2497		Any Ethernet type cards (you can even mix cards - an Intel
2498	EtherExpress PRO/100 and a 3Com 3c905b, for example).  For most modes,
2499	devices need not be of the same speed.
2500	
2501		Starting with version 3.2.1, bonding also supports Infiniband
2502	slaves in active-backup mode.
2503	
2504	3.  How many bonding devices can I have?
2505	
2506		There is no limit.
2507	
2508	4.  How many slaves can a bonding device have?
2509	
2510		This is limited only by the number of network interfaces Linux
2511	supports and/or the number of network cards you can place in your
2512	system.
2513	
2514	5.  What happens when a slave link dies?
2515	
2516		If link monitoring is enabled, then the failing device will be
2517	disabled.  The active-backup mode will fail over to a backup link, and
2518	other modes will ignore the failed link.  The link will continue to be
2519	monitored, and should it recover, it will rejoin the bond (in whatever
2520	manner is appropriate for the mode). See the sections on High
2521	Availability and the documentation for each mode for additional
2522	information.
2523		
2524		Link monitoring can be enabled via either the miimon or
2525	arp_interval parameters (described in the module parameters section,
2526	above).  In general, miimon monitors the carrier state as sensed by
2527	the underlying network device, and the arp monitor (arp_interval)
2528	monitors connectivity to another host on the local network.
2529	
2530		If no link monitoring is configured, the bonding driver will
2531	be unable to detect link failures, and will assume that all links are
2532	always available.  This will likely result in lost packets, and a
2533	resulting degradation of performance.  The precise performance loss
2534	depends upon the bonding mode and network configuration.
2535	
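	Whether any link monitoring is in effect can be checked from
the bond's sysfs directory.  The sketch below assumes the miimon and
arp_interval files under /sys/class/net/bond0/bonding each contain a
single number in milliseconds; the function name is hypothetical, and
the directory is a parameter so that another bond can be inspected.

```shell
# Hypothetical helper: report which link monitor a bond has enabled.
# $1 = the bond's sysfs directory (default: /sys/class/net/bond0/bonding).
bond_monitoring() {
    dir=${1:-/sys/class/net/bond0/bonding}
    miimon=$(cat "$dir/miimon") || return 1
    arp=$(cat "$dir/arp_interval") || return 1
    if [ "$miimon" -gt 0 ]; then
        echo "mii monitor every $miimon ms"
    elif [ "$arp" -gt 0 ]; then
        echo "arp monitor every $arp ms"
    else
        echo "no link monitoring: failures will go undetected"
    fi
}
```

	Running "bond_monitoring /sys/class/net/bond1/bonding" would
inspect a second bond, if one exists.
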
2536	6.  Can bonding be used for High Availability?
2537	
2538		Yes.  See the section on High Availability for details.
2539	
2540	7.  Which switches/systems does it work with?
2541	
2542		The full answer to this depends upon the desired mode.
2543	
2544		In the basic balance modes (balance-rr and balance-xor), it
2545	works with any system that supports etherchannel (also called
2546	trunking).  Most managed switches currently available have such
2547	support, and many unmanaged switches as well.
2548	
2549		The advanced balance modes (balance-tlb and balance-alb) do
2550	not have special switch requirements, but do need device drivers that
2551	support specific features (described in the appropriate section under
2552	module parameters, above).
2553	
2554		In 802.3ad mode, it works with systems that support IEEE
2555	802.3ad Dynamic Link Aggregation.  Most managed and many unmanaged
2556	switches currently available support 802.3ad.
2557	
2558		The active-backup mode should work with any Layer-II switch.
2559	
2560	8.  Where does a bonding device get its MAC address from?
2561	
2562		When using slave devices that have fixed MAC addresses, or when
2563	the fail_over_mac option is enabled, the bonding device's MAC address is
2564	the MAC address of the active slave.
2565	
2566		For other configurations, if not explicitly configured (with
2567	ifconfig or ip link), the MAC address of the bonding device is taken from
2568	its first slave device.  This MAC address is then passed to all following
2569	slaves and remains persistent (even if the first slave is removed) until
2570	the bonding device is brought down or reconfigured.
2571	
2572		If you wish to change the MAC address, you can set it with
2573	ifconfig or ip link:
2574	
2575	# ifconfig bond0 hw ether 00:11:22:33:44:55
2576	
2577	# ip link set bond0 address 66:77:88:99:aa:bb
2578	
2579		The MAC address can also be changed by bringing down/up the
2580	device and then changing its slaves (or their order):
2581	
2582	# ifconfig bond0 down ; modprobe -r bonding
2583	# ifconfig bond0 .... up
2584	# ifenslave bond0 eth...
2585	
2586		This method will automatically take the address from the next
2587	slave that is added.
2588	
2589		To restore your slaves' MAC addresses, you need to detach them
2590	from the bond (`ifenslave -d bond0 eth0'). The bonding driver will
2591	then restore the MAC addresses that the slaves had before they were
2592	enslaved.
2593	
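	To see which MAC addresses are in use, the bond and its slaves
can be listed side by side.  The sketch below assumes the usual sysfs
layout (an "address" file per interface and a space separated
"bonding/slaves" file for the bond); the function name is hypothetical,
and the sysfs directory is a parameter only so that the function can be
tried against a scratch directory.

```shell
# Hypothetical helper: print the MAC address of a bond and of each slave.
# $1 = bond name (default: bond0)
# $2 = sysfs network directory (default: /sys/class/net)
bond_macs() {
    bond=${1:-bond0}
    net=${2:-/sys/class/net}
    addr=$(cat "$net/$bond/address") || return 1
    slaves=$(cat "$net/$bond/bonding/slaves") || return 1
    echo "$bond $addr"
    for slave in $slaves; do
        echo "$slave $(cat "$net/$slave/address")"
    done
}
```

	While a slave is enslaved it will normally show the bond's
address (unless fail_over_mac is in use); after detaching it, the same
listing can be used to confirm that its original address was restored.
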
2594	16. Resources and Links
2595	=======================
2596	
2597		The latest version of the bonding driver can be found in the latest
2598	version of the linux kernel, found on http://kernel.org
2599	
2600		The latest version of this document can be found in the latest kernel
2601	source (named Documentation/networking/bonding.txt).
2602	
2603		Discussions regarding the usage of the bonding driver take place on the
2604	bonding-devel mailing list, hosted at sourceforge.net. If you have questions or
2605	problems, post them to the list.  The list address is:
2606	
2607	bonding-devel@lists.sourceforge.net
2608	
2609		The administrative interface (to subscribe or unsubscribe) can
2610	be found at:
2611	
2612	https://lists.sourceforge.net/lists/listinfo/bonding-devel
2613	
2614		Discussions regarding the development of the bonding driver take place
2615	on the main Linux network mailing list, hosted at vger.kernel.org. The list
2616	address is:
2617	
2618	netdev@vger.kernel.org
2619	
2620		The administrative interface (to subscribe or unsubscribe) can
2621	be found at:
2622	
2623	http://vger.kernel.org/vger-lists.html#netdev
2624	
2625	Donald Becker's Ethernet Drivers and diag programs may be found at:
2626	 - http://web.archive.org/web/*/http://www.scyld.com/network/
2627	
2628	You will also find a lot of information regarding Ethernet, NWay, MII,
2629	etc. at www.scyld.com.
2630	
2631	-- END --