Based on kernel version 3.13. Page generated on 2014-01-20 22:03 EST.

1	
2			Linux Ethernet Bonding Driver HOWTO
3	
4			Latest update: 27 April 2011
5	
6	Initial release : Thomas Davis <tadavis at lbl.gov>
7	Corrections, HA extensions : 2000/10/03-15 :
8	  - Willy Tarreau <willy at meta-x.org>
9	  - Constantine Gavrilov <const-g at xpert.com>
10	  - Chad N. Tindel <ctindel at ieee dot org>
11	  - Janice Girouard <girouard at us dot ibm dot com>
12	  - Jay Vosburgh <fubar at us dot ibm dot com>
13	
14	Reorganized and updated Feb 2005 by Jay Vosburgh
15	Added Sysfs information: 2006/04/24
16	  - Mitch Williams <mitch.a.williams at intel.com>
17	
18	Introduction
19	============
20	
21		The Linux bonding driver provides a method for aggregating
22	multiple network interfaces into a single logical "bonded" interface.
23	The behavior of the bonded interfaces depends upon the mode; generally
24	speaking, modes provide either hot standby or load balancing services.
25	Additionally, link integrity monitoring may be performed.
26		
27		The bonding driver originally came from Donald Becker's
28	beowulf patches for kernel 2.0. It has changed quite a bit since, and
29	the original tools from extreme-linux and beowulf sites will not work
30	with this version of the driver.
31	
32		For new versions of the driver, updated userspace tools, and
33	who to ask for help, please follow the links at the end of this file.
34	
35	Table of Contents
36	=================
37	
38	1. Bonding Driver Installation
39	
40	2. Bonding Driver Options
41	
42	3. Configuring Bonding Devices
43	3.1	Configuration with Sysconfig Support
44	3.1.1		Using DHCP with Sysconfig
45	3.1.2		Configuring Multiple Bonds with Sysconfig
46	3.2	Configuration with Initscripts Support
47	3.2.1		Using DHCP with Initscripts
48	3.2.2		Configuring Multiple Bonds with Initscripts
49	3.3	Configuring Bonding Manually with Ifenslave
50	3.3.1		Configuring Multiple Bonds Manually
51	3.4	Configuring Bonding Manually via Sysfs
52	3.5	Configuration with Interfaces Support
53	3.6	Overriding Configuration for Special Cases
54	
55	4. Querying Bonding Configuration
56	4.1	Bonding Configuration
57	4.2	Network Configuration
58	
59	5. Switch Configuration
60	
61	6. 802.1q VLAN Support
62	
63	7. Link Monitoring
64	7.1	ARP Monitor Operation
65	7.2	Configuring Multiple ARP Targets
66	7.3	MII Monitor Operation
67	
68	8. Potential Trouble Sources
69	8.1	Adventures in Routing
70	8.2	Ethernet Device Renaming
71	8.3	Painfully Slow Or No Failed Link Detection By Miimon
72	
73	9. SNMP agents
74	
75	10. Promiscuous mode
76	
77	11. Configuring Bonding for High Availability
78	11.1	High Availability in a Single Switch Topology
79	11.2	High Availability in a Multiple Switch Topology
80	11.2.1		HA Bonding Mode Selection for Multiple Switch Topology
81	11.2.2		HA Link Monitoring for Multiple Switch Topology
82	
83	12. Configuring Bonding for Maximum Throughput
84	12.1	Maximum Throughput in a Single Switch Topology
85	12.1.1		MT Bonding Mode Selection for Single Switch Topology
86	12.1.2		MT Link Monitoring for Single Switch Topology
87	12.2	Maximum Throughput in a Multiple Switch Topology
88	12.2.1		MT Bonding Mode Selection for Multiple Switch Topology
89	12.2.2		MT Link Monitoring for Multiple Switch Topology
90	
91	13. Switch Behavior Issues
92	13.1	Link Establishment and Failover Delays
93	13.2	Duplicated Incoming Packets
94	
95	14. Hardware Specific Considerations
96	14.1	IBM BladeCenter
97	
98	15. Frequently Asked Questions
99	
100	16. Resources and Links
101	
102	
103	1. Bonding Driver Installation
104	==============================
105	
106		Most popular distro kernels ship with the bonding driver
107	already available as a module. If your distro does not, or you
108	have need to compile bonding from source (e.g., configuring and
109	installing a mainline kernel from kernel.org), you'll need to perform
110	the following steps:
111	
112	1.1 Configure and build the kernel with bonding
113	-----------------------------------------------
114	
115		The current version of the bonding driver is available in the
116	drivers/net/bonding subdirectory of the most recent kernel source
117	(which is available on http://kernel.org).  Most users "rolling their
118	own" will want to use the most recent kernel from kernel.org.
119	
120		Configure kernel with "make menuconfig" (or "make xconfig" or
121	"make config"), then select "Bonding driver support" in the "Network
122	device support" section.  It is recommended that you configure the
123	driver as a module, since this is currently the only way to pass
124	parameters to the driver or configure more than one bonding device.
125	
126		Build and install the new kernel and modules.
127	
128	1.2 Bonding Control Utility
129	---------------------------
130	
131		It is recommended to configure bonding via iproute2 (netlink)
132	or sysfs; the old ifenslave control utility is obsolete.
133	
134	2. Bonding Driver Options
135	=========================
136	
137		Options for the bonding driver are supplied as parameters to the
138	bonding module at load time, or are specified via sysfs.
139	
140		Module options may be given as command line arguments to the
141	insmod or modprobe command, but are usually specified in either the
142	/etc/modprobe.d/*.conf configuration files, or in a distro-specific
143	configuration file (some of which are detailed in the next section).
144	
145		Details on bonding support for sysfs are provided in the
146	"Configuring Bonding Manually via Sysfs" section, below.
147	
148		The available bonding driver parameters are listed below. If a
149	parameter is not specified the default value is used.  When initially
150	configuring a bond, it is recommended "tail -f /var/log/messages" be
151	run in a separate window to watch for bonding driver error messages.
152	
153		It is critical that either the miimon or arp_interval and
154	arp_ip_target parameters be specified, otherwise serious network
155	degradation will occur during link failures.  Very few devices do not
156	support at least miimon, so there is really no reason not to use it.
157	
158		Options with textual values will accept either the text name
159	or, for backwards compatibility, the option value.  E.g.,
160	"mode=802.3ad" and "mode=4" set the same mode.
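
	As an illustration of the name/number equivalence, the following
	Python sketch maps the mode names listed under "mode" in this
	document to their numeric values.  The helper name mode_number is
	hypothetical and is not part of the driver or of any tool.

```python
# Mode names and numeric values as listed under "mode" in this document.
BONDING_MODES = {
    "balance-rr": 0,
    "active-backup": 1,
    "balance-xor": 2,
    "broadcast": 3,
    "802.3ad": 4,
    "balance-tlb": 5,
    "balance-alb": 6,
}


def mode_number(value):
    """Return the numeric mode for a text name or a numeric string.

    Hypothetical helper for illustration only.
    """
    if value in BONDING_MODES:
        return BONDING_MODES[value]
    number = int(value)  # raises ValueError for unknown text
    if number not in BONDING_MODES.values():
        raise ValueError("unknown bonding mode: %r" % (value,))
    return number
```

	With this helper, mode_number("802.3ad") and mode_number("4")
	return the same value.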
161	
162		The parameters are as follows:
163	
164	active_slave
165	
166		Specifies the new active slave for modes that support it
167		(active-backup, balance-alb and balance-tlb).  Possible values
168		are the name of any currently enslaved interface, or an empty
169		string.  If a name is given, the slave and its link must be up in order
170		to be selected as the new active slave.  If an empty string is
171		specified, the current active slave is cleared, and a new active
172		slave is selected automatically.
173	
174		Note that this is only available through the sysfs interface. No module
175		parameter by this name exists.
176	
177		The normal value of this option is the name of the currently
178		active slave, or the empty string if there is no active slave or
179		the current mode does not use an active slave.
180	
181	ad_select
182	
183		Specifies the 802.3ad aggregation selection logic to use.  The
184		possible values and their effects are:
185	
186		stable or 0
187	
188			The active aggregator is chosen by largest aggregate
189			bandwidth.
190	
191			Reselection of the active aggregator occurs only when all
192			slaves of the active aggregator are down or the active
193			aggregator has no slaves.
194	
195			This is the default value.
196	
197		bandwidth or 1
198	
199			The active aggregator is chosen by largest aggregate
200			bandwidth.  Reselection occurs if:
201	
202			- A slave is added to or removed from the bond
203	
204			- Any slave's link state changes
205	
206			- Any slave's 802.3ad association state changes
207	
208			- The bond's administrative state changes to up
209	
210		count or 2
211	
212			The active aggregator is chosen by the largest number of
213			ports (slaves).  Reselection occurs as described under the
214			"bandwidth" setting, above.
215	
216		The bandwidth and count selection policies permit failover of
217		802.3ad aggregations when partial failure of the active aggregator
218		occurs.  This keeps the aggregator with the highest availability
219		(either in bandwidth or in number of ports) active at all times.
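
		The selection criteria above can be sketched in Python as
		follows.  This is an illustration only, not the driver's
		code: the function name, the dictionary layout and the tie
		handling (the first candidate wins) are assumptions, and the
		sketch covers only which aggregator is preferred, not when
		reselection is triggered.

```python
def select_aggregator(aggregators, policy="stable"):
    """Return the name of the preferred aggregator.

    aggregators maps a hypothetical aggregator name to a list holding
    the speed (in Mbit/s) of each of its usable slaves.
    """
    if policy in ("stable", "bandwidth", 0, 1):
        # Both policies prefer the largest aggregate bandwidth; they
        # differ only in when reselection happens.
        def key(name):
            return sum(aggregators[name])
    elif policy in ("count", 2):
        # Prefer the aggregator with the most ports (slaves).
        def key(name):
            return len(aggregators[name])
    else:
        raise ValueError("unknown ad_select policy: %r" % (policy,))
    return max(aggregators, key=key)
```

		For example, given one aggregator with a single 10000 Mbit/s
		slave and another with three 1000 Mbit/s slaves, "bandwidth"
		prefers the former and "count" prefers the latter.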
220	
221		This option was added in bonding version 3.4.0.
222	
223	all_slaves_active
224	
225		Specifies that duplicate frames (received on inactive ports) should be
226		dropped (0) or delivered (1).
227	
228		Normally, bonding will drop duplicate frames (received on inactive
229		ports), which is desirable for most users.  In some configurations,
230		however, it is useful to allow duplicate frames to be delivered.
231	
232		The default value is 0 (drop duplicate frames received on inactive
233		ports).
234	
235	arp_interval
236	
237		Specifies the ARP link monitoring frequency in milliseconds.
238	
239		The ARP monitor works by periodically checking the slave
240		devices to determine whether they have sent or received
241		traffic recently (the precise criteria depend upon the
242		bonding mode, and the state of the slave).  Regular traffic is
243		generated via ARP probes issued for the addresses specified by
244		the arp_ip_target option.
245	
246		This behavior can be modified by the arp_validate option,
247		below.
248	
249		If ARP monitoring is used in an etherchannel compatible mode
250		(modes 0 and 2), the switch should be configured in a mode
251		that evenly distributes packets across all links. If the
252		switch is configured to distribute the packets in an XOR
253		fashion, all replies from the ARP targets will be received on
254		the same link which could cause the other team members to
255		fail.  ARP monitoring should not be used in conjunction with
256		miimon.  A value of 0 disables ARP monitoring.  The default
257		value is 0.
258	
259	arp_ip_target
260	
261		Specifies the IP addresses to use as ARP monitoring peers when
262		arp_interval is > 0.  These are the targets of the ARP request
263		sent to determine the health of the link to the targets.
264		Specify these values in ddd.ddd.ddd.ddd format.  Multiple IP
265		addresses must be separated by a comma.  At least one IP
266		address must be given for ARP monitoring to function.  The
267		maximum number of targets that can be specified is 16.  The
268		default value is no IP addresses.
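
		The format rules above can be checked before a value is
		handed to the driver.  The sketch below is a hypothetical
		user-space helper built on Python's standard ipaddress
		module; it is not the driver's own parser.

```python
import ipaddress

MAX_ARP_TARGETS = 16  # limit stated in the text above


def parse_arp_ip_target(value):
    """Parse a comma-separated arp_ip_target string into IPv4 addresses."""
    if not value.strip():
        return []  # the default: no IP addresses
    # IPv4Address raises a ValueError subclass on malformed input.
    targets = [ipaddress.IPv4Address(part.strip())
               for part in value.split(",")]
    if len(targets) > MAX_ARP_TARGETS:
        raise ValueError(
            "at most %d ARP targets may be specified" % MAX_ARP_TARGETS)
    return targets
```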
269	
270	arp_validate
271	
272		Specifies whether or not ARP probes and replies should be
273		validated in the active-backup mode.  This causes the ARP
274		monitor to examine the incoming ARP requests and replies, and
275		only consider a slave to be up if it is receiving the
276		appropriate ARP traffic.
277	
278		Possible values are:
279	
280		none or 0
281	
282			No validation is performed.  This is the default.
283	
284		active or 1
285	
286			Validation is performed only for the active slave.
287	
288		backup or 2
289	
290			Validation is performed only for backup slaves.
291	
292		all or 3
293	
294			Validation is performed for all slaves.
295	
296		For the active slave, the validation checks ARP replies to
297		confirm that they were generated by an arp_ip_target.  Since
298		backup slaves do not typically receive these replies, the
299		validation performed for backup slaves is on the ARP request
300		sent out via the active slave.  It is possible that some
301		switch or network configurations may result in situations
302		wherein the backup slaves do not receive the ARP requests; in
303		such a situation, validation of backup slaves must be
304		disabled.
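
		The numeric values listed above can be read as a two-bit
		mask (1 for the active slave, 2 for backup slaves, 3 for
		both).  That reading is an observation drawn from the
		numbering, and the helper below is a hypothetical Python
		sketch rather than driver code.

```python
ARP_VALIDATE = {"none": 0, "active": 1, "backup": 2, "all": 3}


def is_validated(arp_validate, slave_is_active):
    """Return True if ARP validation applies to the given slave role."""
    value = ARP_VALIDATE.get(arp_validate, arp_validate)
    if value not in (0, 1, 2, 3):
        raise ValueError("unknown arp_validate value: %r" % (arp_validate,))
    role_bit = 1 if slave_is_active else 2
    return bool(value & role_bit)
```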
305	
306		The validation of ARP requests on backup slaves mainly
307		helps bonding decide which slaves are more likely to
308		work if the active slave fails; it does not
309		guarantee that a backup slave will work if it is selected
310		as the next active slave.
311	
312		This option is useful in network configurations in which
313		multiple bonding hosts are concurrently issuing ARPs to one or
314		more targets beyond a common switch.  Should the link between
315		the switch and target fail (but not the switch itself), the
316		probe traffic generated by the multiple bonding instances will
317		fool the standard ARP monitor into considering the links as
318		still up.  Use of the arp_validate option can resolve this, as
319		the ARP monitor will only consider ARP requests and replies
320		associated with its own instance of bonding.
321	
322		This option was added in bonding version 3.1.0.
323	
324	arp_all_targets
325	
326		Specifies the quantity of arp_ip_targets that must be reachable
327		in order for the ARP monitor to consider a slave as being up.
328		This option affects only active-backup mode for slaves with
329		arp_validate enabled.
330	
331		Possible values are:
332	
333		any or 0
334	
335			consider the slave up only when any of the arp_ip_targets
336			is reachable
337	
338		all or 1
339	
340			consider the slave up only when all of the arp_ip_targets
341			are reachable
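
		The two settings amount to an "any" versus "all" test over
		the configured targets.  A minimal Python sketch, assuming a
		list of per-target reachability flags (the function name and
		the treatment of an empty target list are assumptions):

```python
def slave_considered_up(target_reachable, arp_all_targets="any"):
    """Decide whether a slave counts as up under arp_all_targets.

    target_reachable holds one boolean per configured arp_ip_target.
    """
    if not target_reachable:
        return False  # assumption: no targets means nothing to confirm
    if arp_all_targets in ("any", 0):
        return any(target_reachable)
    if arp_all_targets in ("all", 1):
        return all(target_reachable)
    raise ValueError("unknown arp_all_targets value: %r" % (arp_all_targets,))
```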
342	
343	downdelay
344	
345		Specifies the time, in milliseconds, to wait before disabling
346		a slave after a link failure has been detected.  This option
347		is only valid for the miimon link monitor.  The downdelay
348		value should be a multiple of the miimon value; if not, it
349		will be rounded down to the nearest multiple.  The default
350		value is 0.
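
		The rounding rule can be illustrated with a short Python
		sketch; the same rule is stated for updelay later in this
		document.  The function name is hypothetical.

```python
def effective_delay(delay_ms, miimon_ms):
    """Round delay_ms down to the nearest multiple of miimon_ms."""
    if miimon_ms <= 0:
        raise ValueError("miimon must be greater than 0 for this to apply")
    return (delay_ms // miimon_ms) * miimon_ms
```

		For example, with miimon=100, a downdelay of 250 behaves as
		200, and a downdelay of 50 behaves as 0.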
351	
352	fail_over_mac
353	
354		Specifies whether active-backup mode should set all slaves to
355		the same MAC address at enslavement (the traditional
356		behavior), or, when enabled, perform special handling of the
357		bond's MAC address in accordance with the selected policy.
358	
359		Possible values are:
360	
361		none or 0
362	
363			This setting disables fail_over_mac, and causes
364			bonding to set all slaves of an active-backup bond to
365			the same MAC address at enslavement time.  This is the
366			default.
367	
368		active or 1
369	
370			The "active" fail_over_mac policy indicates that the
371			MAC address of the bond should always be the MAC
372			address of the currently active slave.  The MAC
373			address of the slaves is not changed; instead, the MAC
374			address of the bond changes during a failover.
375	
376			This policy is useful for devices that cannot ever
377			alter their MAC address, or for devices that refuse
378			incoming broadcasts with their own source MAC (which
379			interferes with the ARP monitor).
380	
381			The down side of this policy is that every device on
382			the network must be updated via gratuitous ARP,
383			vs. just updating a switch or set of switches (which
384			often takes place for any traffic, not just ARP
385			traffic, if the switch snoops incoming traffic to
386			update its tables) for the traditional method.  If the
387			gratuitous ARP is lost, communication may be
388			disrupted.
389	
390			When this policy is used in conjunction with the mii
391			monitor, devices which assert link up prior to being
392			able to actually transmit and receive are particularly
393			susceptible to loss of the gratuitous ARP, and an
394			appropriate updelay setting may be required.
395	
396		follow or 2
397	
398			The "follow" fail_over_mac policy causes the MAC
399			address of the bond to be selected normally (normally
400			the MAC address of the first slave added to the bond).
401			However, the second and subsequent slaves are not set
402			to this MAC address while they are in a backup role; a
403			slave is programmed with the bond's MAC address at
404			failover time (and the formerly active slave receives
405			the newly active slave's MAC address).
406	
407			This policy is useful for multiport devices that
408			either become confused or incur a performance penalty
409			when multiple ports are programmed with the same MAC
410			address.
411	
412	
413		The default policy is none, unless the first slave cannot
414		change its MAC address, in which case the active policy is
415		selected by default.
416	
417		This option may be modified via sysfs only when no slaves are
418		present in the bond.
419	
420		This option was added in bonding version 3.2.0.  The "follow"
421		policy was added in bonding version 3.3.0.
422	
423	lacp_rate
424	
425		Option specifying the rate at which we'll ask our link partner
426		to transmit LACPDU packets in 802.3ad mode.  Possible values
427		are:
428	
429		slow or 0
430			Request partner to transmit LACPDUs every 30 seconds
431	
432		fast or 1
433			Request partner to transmit LACPDUs every 1 second
434	
435		The default is slow.
436	
437	max_bonds
438	
439		Specifies the number of bonding devices to create for this
440		instance of the bonding driver.  E.g., if max_bonds is 3, and
441		the bonding driver is not already loaded, then bond0, bond1
442		and bond2 will be created.  The default value is 1.  Specifying
443		a value of 0 will load bonding, but will not create any devices.
444	
445	miimon
446	
447		Specifies the MII link monitoring frequency in milliseconds.
448		This determines how often the link state of each slave is
449		inspected for link failures.  A value of zero disables MII
450		link monitoring.  A value of 100 is a good starting point.
451		The use_carrier option, below, affects how the link state is
452		determined.  See the High Availability section for additional
453		information.  The default value is 0.
454	
455	min_links
456	
457		Specifies the minimum number of links that must be active before
458		asserting carrier. It is similar to the Cisco EtherChannel min-links
459		feature. This allows setting the minimum number of member ports that
460		must be up (link-up state) before marking the bond device as up
461		(carrier on). This is useful for situations where higher level services
462		such as clustering want to ensure a minimum number of low bandwidth
463		links are active before switchover. This option only affects 802.3ad
464		mode.
465	
466		The default value is 0. This will cause carrier to be asserted (for
467		802.3ad mode) whenever there is an active aggregator, regardless of the
468		number of available links in that aggregator. Note that, because an
469		aggregator cannot be active without at least one available link,
470		setting this option to 0 or to 1 has the exact same effect.
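
		The carrier rule described above, including the equivalence
		of 0 and 1, can be sketched in Python.  This is an
		illustration with a hypothetical function name, not the
		driver's implementation.

```python
def carrier_asserted(active_links, min_links=0):
    """Return True if an 802.3ad bond would assert carrier.

    An aggregator cannot be active without at least one available
    link, so min_links values of 0 and 1 behave identically.
    """
    return active_links >= max(min_links, 1)
```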
471	
472	mode
473	
474		Specifies one of the bonding policies. The default is
475		balance-rr (round robin).  Possible values are:
476	
477		balance-rr or 0
478	
479			Round-robin policy: Transmit packets in sequential
480			order from the first available slave through the
481			last.  This mode provides load balancing and fault
482			tolerance.
483	
484		active-backup or 1
485	
486			Active-backup policy: Only one slave in the bond is
487			active.  A different slave becomes active if, and only
488			if, the active slave fails.  The bond's MAC address is
489			externally visible on only one port (network adapter)
490			to avoid confusing the switch.
491	
492			In bonding version 2.6.2 or later, when a failover
493			occurs in active-backup mode, bonding will issue one
494			or more gratuitous ARPs on the newly active slave.
495			One gratuitous ARP is issued for the bonding master
496			interface and each VLAN interface configured above
497			it, provided that the interface has at least one IP
498			address configured.  Gratuitous ARPs issued for VLAN
499			interfaces are tagged with the appropriate VLAN id.
500	
501			This mode provides fault tolerance.  The primary
502			option, documented below, affects the behavior of this
503			mode.
504	
505		balance-xor or 2
506	
507			XOR policy: Transmit based on the selected transmit
508			hash policy.  The default policy is a simple [(source
509			MAC address XOR'd with destination MAC address) modulo
510			slave count].  Alternate transmit policies may be
511			selected via the xmit_hash_policy option, described
512			below.
513	
514			This mode provides load balancing and fault tolerance.
515	
516		broadcast or 3
517	
518			Broadcast policy: transmits everything on all slave
519			interfaces.  This mode provides fault tolerance.
520	
521		802.3ad or 4
522	
523			IEEE 802.3ad Dynamic link aggregation.  Creates
524			aggregation groups that share the same speed and
525			duplex settings.  Utilizes all slaves in the active
526			aggregator according to the 802.3ad specification.
527	
528			Slave selection for outgoing traffic is done according
529			to the transmit hash policy, which may be changed from
530			the default simple XOR policy via the xmit_hash_policy
531			option, documented below.  Note that not all transmit
532			policies may be 802.3ad compliant, particularly in
533			regards to the packet mis-ordering requirements of
534			section 43.2.4 of the 802.3ad standard.  Differing
535			peer implementations will have varying tolerances for
536			noncompliance.
537	
538			Prerequisites:
539	
540			1. Ethtool support in the base drivers for retrieving
541			the speed and duplex of each slave.
542	
543			2. A switch that supports IEEE 802.3ad Dynamic link
544			aggregation.
545	
546			Most switches will require some type of configuration
547			to enable 802.3ad mode.
548	
549		balance-tlb or 5
550	
551			Adaptive transmit load balancing: channel bonding that
552			does not require any special switch support.  The
553			outgoing traffic is distributed according to the
554			current load (computed relative to the speed) on each
555			slave.  Incoming traffic is received by the current
556			slave.  If the receiving slave fails, another slave
557			takes over the MAC address of the failed receiving
558			slave.
559	
560			Prerequisite:
561	
562			Ethtool support in the base drivers for retrieving the
563			speed of each slave.
564	
565		balance-alb or 6
566	
567			Adaptive load balancing: includes balance-tlb plus
568			receive load balancing (rlb) for IPV4 traffic, and
569			does not require any special switch support.  The
570			receive load balancing is achieved by ARP negotiation.
571			The bonding driver intercepts the ARP Replies sent by
572			the local system on their way out and overwrites the
573			source hardware address with the unique hardware
574			address of one of the slaves in the bond such that
575			different peers use different hardware addresses for
576			the server.
577	
578			Receive traffic from connections created by the server
579			is also balanced.  When the local system sends an ARP
580			Request the bonding driver copies and saves the peer's
581			IP information from the ARP packet.  When the ARP
582			Reply arrives from the peer, its hardware address is
583			retrieved and the bonding driver initiates an ARP
584			reply to this peer assigning it to one of the slaves
585			in the bond.  A problematic outcome of using ARP
586			negotiation for balancing is that each time that an
587			ARP request is broadcast it uses the hardware address
588			of the bond.  Hence, peers learn the hardware address
589			of the bond and the balancing of receive traffic
590			collapses to the current slave.  This is handled by
591			sending updates (ARP Replies) to all the peers with
592			their individually assigned hardware address such that
593			the traffic is redistributed.  Receive traffic is also
594			redistributed when a new slave is added to the bond
595			and when an inactive slave is re-activated.  The
596			receive load is distributed sequentially (round robin)
597			among the group of highest speed slaves in the bond.
598	
599			When a link is reconnected or a new slave joins the
600			bond the receive traffic is redistributed among all
601			active slaves in the bond by initiating ARP Replies
602			with the selected MAC address to each of the
603			clients. The updelay parameter (detailed below) must
604			be set to a value equal to or greater than the switch's
605			forwarding delay so that the ARP Replies sent to the
606			peers will not be blocked by the switch.
607	
608			Prerequisites:
609	
610			1. Ethtool support in the base drivers for retrieving
611			the speed of each slave.
612	
613			2. Base driver support for setting the hardware
614			address of a device while it is open.  This is
615			required so that there will always be one slave in the
616			team using the bond hardware address (the
617			curr_active_slave) while having a unique hardware
618			address for each slave in the bond.  If the
619			curr_active_slave fails its hardware address is
620			swapped with the new curr_active_slave that was
621			chosen.
622	
623	num_grat_arp
624	num_unsol_na
625	
626		Specify the number of peer notifications (gratuitous ARPs and
627		unsolicited IPv6 Neighbor Advertisements) to be issued after a
628		failover event.  As soon as the link is up on the new slave
629		(possibly immediately) a peer notification is sent on the
630		bonding device and each VLAN sub-device.  This is repeated at
631		each link monitor interval (arp_interval or miimon, whichever
632		is active) if the number is greater than 1.
633	
634		The valid range is 0 - 255; the default value is 1.  These options
635		affect only the active-backup mode.  These options were added for
636		bonding versions 3.3.0 and 3.4.0 respectively.
637	
638		From Linux 3.0 and bonding version 3.7.1, these notifications
639		are generated by the ipv4 and ipv6 code and the numbers of
640		repetitions cannot be set independently.
641	
642	packets_per_slave
643	
644		Specify the number of packets to transmit through a slave before
645		moving to the next one. When set to 0 then a slave is chosen at
646		random.
647	
648		The valid range is 0 - 65535; the default value is 1. This option
649		has effect only in balance-rr mode.
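
		The effect of this option on the transmit order can be
		sketched in Python.  The function and interface names are
		hypothetical, and starting from the first slave in the list
		is an assumption.

```python
import random


def slave_sequence(slaves, packets_per_slave, n_packets, rng=None):
    """Return the slave used for each of n_packets transmitted packets."""
    if not 0 <= packets_per_slave <= 65535:
        raise ValueError("packets_per_slave must be in the range 0 - 65535")
    if packets_per_slave == 0:
        # A value of 0 selects a slave at random for every packet.
        rng = rng or random.Random()
        return [rng.choice(slaves) for _ in range(n_packets)]
    # Otherwise send packets_per_slave packets, then move to the next slave.
    return [slaves[(i // packets_per_slave) % len(slaves)]
            for i in range(n_packets)]
```

		With two slaves and packets_per_slave=2, six packets go out
		as eth0, eth0, eth1, eth1, eth0, eth0.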
650	
651	primary
652	
653		A string (eth0, eth2, etc) specifying which slave is the
654		primary device.  The specified device will always be the
655		active slave while it is available.  Only when the primary is
656		off-line will alternate devices be used.  This is useful when
657		one slave is preferred over another, e.g., when one slave has
658		higher throughput than another.
659	
660		The primary option is only valid for active-backup mode.
661	
662	primary_reselect
663	
664		Specifies the reselection policy for the primary slave.  This
665		affects how the primary slave is chosen to become the active slave
666		when failure of the active slave or recovery of the primary slave
667		occurs.  This option is designed to prevent flip-flopping between
668		the primary slave and other slaves.  Possible values are:
669	
670		always or 0 (default)
671	
672			The primary slave becomes the active slave whenever it
673			comes back up.
674	
675		better or 1
676	
677			The primary slave becomes the active slave when it comes
678			back up, if the speed and duplex of the primary slave are
679			better than the speed and duplex of the current active
680			slave.
681	
682		failure or 2
683	
684			The primary slave becomes the active slave only if the
685			current active slave fails and the primary slave is up.
686	
687		The primary_reselect setting is ignored in two cases:
688	
689			If no slaves are active, the first slave to recover is
690			made the active slave.
691	
692			When initially enslaved, the primary slave is always made
693			the active slave.
694	
695		Changing the primary_reselect policy via sysfs will cause an
696		immediate selection of the best active slave according to the new
697		policy.  This may or may not result in a change of the active
698		slave, depending upon the circumstances.
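
		The three policies can be summarized as a decision function.
		The Python sketch below is an illustration only: the Slave
		tuple and the function name are hypothetical, and comparing
		speed first and duplex second for the "better" policy is an
		assumption about how "better" is ranked.

```python
from collections import namedtuple

# Hypothetical description of a slave: name, link state, speed in
# Mbit/s, and whether the link is full duplex.
Slave = namedtuple("Slave", "name up speed full_duplex")


def primary_becomes_active(policy, primary, active):
    """Return True if the primary slave should become the active slave.

    active is the current active slave, or None if there is none.
    """
    if not primary.up:
        return False
    if active is None or not active.up:
        # No working active slave: a recovered slave takes over
        # regardless of the policy.
        return True
    if policy in ("always", 0):
        return True
    if policy in ("better", 1):
        return ((primary.speed, primary.full_duplex) >
                (active.speed, active.full_duplex))
    if policy in ("failure", 2):
        return False  # the active slave is still up
    raise ValueError("unknown primary_reselect policy: %r" % (policy,))
```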
699	
700		This option was added for bonding version 3.6.0.
701	
702	updelay
703	
704		Specifies the time, in milliseconds, to wait before enabling a
705		slave after a link recovery has been detected.  This option is
706		only valid for the miimon link monitor.  The updelay value
707		should be a multiple of the miimon value; if not, it will be
708		rounded down to the nearest multiple.  The default value is 0.
709	
710	use_carrier
711	
712		Specifies whether or not miimon should use MII or ETHTOOL
713		ioctls vs. netif_carrier_ok() to determine the link
714		status. The MII or ETHTOOL ioctls are less efficient and
715		utilize a deprecated calling sequence within the kernel.  The
716		netif_carrier_ok() relies on the device driver to maintain its
717		state with netif_carrier_on/off; at this writing, most, but
718		not all, device drivers support this facility.
719	
720		If bonding insists that the link is up when it should not be,
721		it may be that your network device driver does not support
722		netif_carrier_on/off.  The default state for netif_carrier is
723		"carrier on," so if a driver does not support netif_carrier,
724		it will appear as if the link is always up.  In this case,
725		setting use_carrier to 0 will cause bonding to revert to the
726		MII / ETHTOOL ioctl method to determine the link state.
727	
728		A value of 1 enables the use of netif_carrier_ok(), a value of
729		0 will use the deprecated MII / ETHTOOL ioctls.  The default
730		value is 1.
731	
732	xmit_hash_policy
733	
734		Selects the transmit hash policy to use for slave selection in
735		balance-xor and 802.3ad modes.  Possible values are:
736	
737		layer2
738	
739			Uses XOR of hardware MAC addresses to generate the
740			hash.  The formula is
741	
742			(source MAC XOR destination MAC) modulo slave count
743	
744			This algorithm will place all traffic to a particular
745			network peer on the same slave.
746	
747			This algorithm is 802.3ad compliant.
748	
749		layer2+3
750	
751			This policy uses a combination of layer2 and layer3
752			protocol information to generate the hash.
753	
754			Uses XOR of hardware MAC addresses and IP addresses to
755			generate the hash.  The formula is
756	
757			hash = source MAC XOR destination MAC
758			hash = hash XOR source IP XOR destination IP
759			hash = hash XOR (hash RSHIFT 16)
760			hash = hash XOR (hash RSHIFT 8)
761			And then hash is reduced modulo slave count.
762	
763			If the protocol is IPv6 then the source and destination
764			addresses are first hashed using ipv6_addr_hash.
765	
766			This algorithm will place all traffic to a particular
767			network peer on the same slave.  For non-IP traffic,
768			the formula is the same as for the layer2 transmit
769			hash policy.
770	
771			This policy is intended to provide a more balanced
772			distribution of traffic than layer2 alone, especially
773			in environments where a layer3 gateway device is
774			required to reach most destinations.
775	
776			This algorithm is 802.3ad compliant.
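
			The layer2+3 formula above can be written out as a
			short Python sketch.  It is a literal reading of the
			formula for IPv4 using whole addresses as integers;
			the driver works on fixed-width values and may use
			only part of each address, so results need not match
			the kernel bit for bit.  All function names are
			hypothetical.

```python
import ipaddress


def mac_to_int(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to an integer."""
    return int(mac.replace(":", ""), 16)


def layer2_3_hash(src_mac, dst_mac, slave_count, src_ip=None, dst_ip=None):
    """Return the slave index chosen by a literal layer2+3 computation."""
    h = mac_to_int(src_mac) ^ mac_to_int(dst_mac)
    if src_ip is None or dst_ip is None:
        # Non-IP traffic: same as the layer2 policy.
        return h % slave_count
    h ^= int(ipaddress.IPv4Address(src_ip)) ^ int(ipaddress.IPv4Address(dst_ip))
    h ^= h >> 16
    h ^= h >> 8
    return h % slave_count
```

			Because only addresses enter the hash, repeated calls
			for the same pair of hosts always return the same
			slave index.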
777	
778		layer3+4
779	
780			This policy uses upper layer protocol information,
781			when available, to generate the hash.  This allows for
782			traffic to a particular network peer to span multiple
783			slaves, although a single connection will not span
784			multiple slaves.
785	
786			The formula for unfragmented TCP and UDP packets is
787	
788			hash = source port, destination port (as in the header)
789			hash = hash XOR source IP XOR destination IP
790			hash = hash XOR (hash RSHIFT 16)
791			hash = hash XOR (hash RSHIFT 8)
792			And then hash is reduced modulo slave count.
793	
794			If the protocol is IPv6 then the source and destination
795			addresses are first hashed using ipv6_addr_hash.
796	
797			For fragmented TCP or UDP packets and all other IPv4 and
798			IPv6 protocol traffic, the source and destination port
799			information is omitted.  For non-IP traffic, the
800			formula is the same as for the layer2 transmit hash
801			policy.
802	
803			This algorithm is not fully 802.3ad compliant.  A
804			single TCP or UDP conversation containing both
805			fragmented and unfragmented packets will see packets
806			striped across two interfaces.  This may result in out
807			of order delivery.  Most traffic types will not meet
808			these criteria, as TCP rarely fragments traffic, and
809			most UDP traffic is not involved in extended
810			conversations.  Other implementations of 802.3ad may
811			or may not tolerate this noncompliance.
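
As an illustration of the layer3+4 formula above, the following shell
sketch computes a slave index for an unfragmented IPv4 TCP or UDP packet.
It assumes that the two ports are combined into one 32-bit value with the
source port in the upper 16 bits, and that addresses are treated as
host-order 32-bit integers.  The driver operates on the fields as they
appear in the header, so the slave it actually selects may differ.  The
function names ip_to_int and layer3_4_slave are hypothetical.

```shell
#!/bin/sh
# Illustrative only: mirrors the documented layer3+4 formula, not the
# driver's exact byte-order handling.

# ip_to_int ADDR: convert a dotted-quad IPv4 address to an integer.
# Runs in a subshell so the IFS change does not leak to the caller.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# layer3_4_slave SRC_IP SRC_PORT DST_IP DST_PORT SLAVE_COUNT
# Print the index of the slave the formula would select.
layer3_4_slave() {
    src=$(ip_to_int "$1")
    dst=$(ip_to_int "$3")
    # Assumption: source port in the upper 16 bits, destination port
    # in the lower 16 bits.
    hash=$(( (($2 << 16) | $4) & 0xffffffff ))
    hash=$(( hash ^ src ^ dst ))
    hash=$(( hash ^ (hash >> 16) ))
    hash=$(( hash ^ (hash >> 8) ))
    echo $(( hash % $5 ))
}

# Two connections between the same hosts that differ only in the
# source port may be assigned to different slaves.
layer3_4_slave 192.168.1.10 40000 192.168.1.20 80 2
layer3_4_slave 192.168.1.10 40001 192.168.1.20 80 2
```

Because the ports take part in the hash, every packet of one connection
maps to the same slave, while separate connections to the same peer can
be spread over several slaves.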
812	
813		encap2+3
814	
815			This policy uses the same formula as layer2+3 but it
816			relies on skb_flow_dissect to obtain the header fields
817			which might result in the use of inner headers if an
818			encapsulation protocol is used. For example, this will
819			improve performance for tunnel users because the
820			packets will be distributed according to the encapsulated
821			flows.
822	
823		encap3+4
824	
825			This policy uses the same formula as layer3+4 but it
826			relies on skb_flow_dissect to obtain the header fields
827			which might result in the use of inner headers if an
828			encapsulation protocol is used. For example, this will
829			improve performance for tunnel users because the
830			packets will be distributed according to the encapsulated
831			flows.
832	
833		The default value is layer2.  This option was added in bonding
834		version 2.6.3.  In earlier versions of bonding, this parameter
835		does not exist, and the layer2 policy is the only policy.  The
836		layer2+3 value was added for bonding version 3.2.2.
837	
838	resend_igmp
839	
840		Specifies the number of IGMP membership reports to be issued after
841		a failover event. One membership report is issued immediately after
842		the failover; subsequent reports are sent at 200ms intervals.
843	
844		The valid range is 0 - 255; the default value is 1. A value of 0
845		prevents the IGMP membership report from being issued in response
846		to the failover event.
847	
848		This option is useful for bonding modes balance-rr (0), active-backup
849		(1), balance-tlb (5) and balance-alb (6), in which a failover can
850		switch the IGMP traffic from one slave to another.  Therefore a fresh
851		IGMP report must be issued to cause the switch to forward the incoming
852		IGMP traffic over the newly selected slave.
853	
854		This option was added for bonding version 3.7.0.
855	
856	3. Configuring Bonding Devices
857	==============================
858	
859		You can configure bonding using either your distro's network
860	initialization scripts, or manually using either iproute2 or the
861	sysfs interface.  Distros generally use one of three packages for the
862	network initialization scripts: initscripts, sysconfig or interfaces.
863	Recent versions of these packages have support for bonding, while older
864	versions do not.
865	
866		We will first describe the options for configuring bonding for
867	distros using versions of initscripts, sysconfig and interfaces with full
868	or partial support for bonding, then provide information on enabling
869	bonding without support from the network initialization scripts (i.e.,
870	older versions of initscripts or sysconfig).
871	
872		If you're unsure whether your distro uses sysconfig,
873	initscripts or interfaces, or don't know if it's new enough, have no fear.
874	Determining this is fairly straightforward.
875	
876		First, look for a file called interfaces in the /etc/network directory.
877	If this file is present on your system, then your system uses interfaces. See
878	Configuration with Interfaces Support.
879	
880		Otherwise, issue the command:
881	
882	$ rpm -qf /sbin/ifup
883	
884		It will respond with a line of text starting with either
885	"initscripts" or "sysconfig," followed by some numbers.  This is the
886	package that provides your network initialization scripts.
887	
888		Next, to determine if your installation supports bonding,
889	issue the command:
890	
891	$ grep ifenslave /sbin/ifup
892	
893		If this returns any matches, then your initscripts or
894	sysconfig has support for bonding.
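
The checks above can be collected into a small shell sketch.  The
function names and the optional ROOT prefix argument (present only so
that the logic can be tried against a scratch directory) are
hypothetical.

```shell
#!/bin/sh
# netconfig_type [ROOT]
# Print "interfaces" if ROOT/etc/network/interfaces exists.  Otherwise
# print the rpm package that owns ROOT/sbin/ifup, or "unknown" if that
# cannot be determined.
netconfig_type() {
    root=${1:-}
    if [ -f "$root/etc/network/interfaces" ]; then
        echo "interfaces"
    elif [ -e "$root/sbin/ifup" ] && command -v rpm >/dev/null 2>&1; then
        if pkg=$(rpm -qf "$root/sbin/ifup" 2>/dev/null); then
            echo "$pkg"
        else
            echo "unknown"
        fi
    else
        echo "unknown"
    fi
}

# ifup_supports_bonding [ROOT]
# Succeed if the ifup script mentions ifenslave.
ifup_supports_bonding() {
    grep -q ifenslave "${1:-}/sbin/ifup" 2>/dev/null
}
```

On a live system both functions would be called without an argument, for
example "netconfig_type" followed by "ifup_supports_bonding && echo yes".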
895	
896	3.1 Configuration with Sysconfig Support
897	----------------------------------------
898	
899		This section applies to distros using a version of sysconfig
900	with bonding support, for example, SuSE Linux Enterprise Server 9.
901	
902		SuSE SLES 9's networking configuration system does support
903	bonding, however, at this writing, the YaST system configuration
904	front end does not provide any means to work with bonding devices.
905	Bonding devices can be managed by hand, however, as follows.
906	
907		First, if they have not already been configured, configure the
908	slave devices.  On SLES 9, this is most easily done by running the
909	yast2 sysconfig configuration utility.  The goal is to create an
910	ifcfg-id file for each slave device.  The simplest way to accomplish
911	this is to configure the devices for DHCP (this is only to get the
912	ifcfg-id file created; see below for some issues with DHCP).  The
913	name of the configuration file for each device will be of the form:
914	
915	ifcfg-id-xx:xx:xx:xx:xx:xx
916	
917		Where the "xx" portion will be replaced with the digits from
918	the device's permanent MAC address.
919	
920		Once the set of ifcfg-id-xx:xx:xx:xx:xx:xx files has been
921	created, it is necessary to edit the configuration files for the slave
922	devices (the MAC addresses correspond to those of the slave devices).
923	Before editing, the file will contain multiple lines, and will look
924	something like this:
925	
926	BOOTPROTO='dhcp'
927	STARTMODE='on'
928	USERCTL='no'
929	UNIQUE='XNzu.WeZGOGF+4wE'
930	_nm_name='bus-pci-0001:61:01.0'
931	
932		Change the BOOTPROTO and STARTMODE lines to the following:
933	
934	BOOTPROTO='none'
935	STARTMODE='off'
936	
937		Do not alter the UNIQUE or _nm_name lines.  Remove any other
938	lines (USERCTL, etc).
939	
940		Once the ifcfg-id-xx:xx:xx:xx:xx:xx files have been modified,
941	it's time to create the configuration file for the bonding device
942	itself.  This file is named ifcfg-bondX, where X is the number of the
943	bonding device to create, starting at 0.  The first such file is
944	ifcfg-bond0, the second is ifcfg-bond1, and so on.  The sysconfig
945	network configuration system will correctly start multiple instances
946	of bonding.
947	
948		The contents of the ifcfg-bondX file are as follows:
949	
950	BOOTPROTO="static"
951	BROADCAST="10.0.2.255"
952	IPADDR="10.0.2.10"
953	NETMASK="255.255.255.0"
954	NETWORK="10.0.2.0"
955	REMOTE_IPADDR=""
956	STARTMODE="onboot"
957	BONDING_MASTER="yes"
958	BONDING_MODULE_OPTS="mode=active-backup miimon=100"
959	BONDING_SLAVE0="eth0"
960	BONDING_SLAVE1="bus-pci-0000:06:08.1"
961	
962		Replace the sample BROADCAST, IPADDR, NETMASK and NETWORK
963	values with the appropriate values for your network.
964	
965		The STARTMODE specifies when the device is brought online.
966	The possible values are:
967	
968		onboot:	 The device is started at boot time.  If you're not
969			 sure, this is probably what you want.
970	
971		manual:	 The device is started only when ifup is called
972			 manually.  Bonding devices may be configured this
973			 way if you do not wish them to start automatically
974			 at boot for some reason.
975	
976		hotplug: The device is started by a hotplug event.  This is not
977			 a valid choice for a bonding device.
978	
979		off or ignore: The device configuration is ignored.
980	
981		The line BONDING_MASTER='yes' indicates that the device is a
982	bonding master device.  The only useful value is "yes."
983	
984		The contents of BONDING_MODULE_OPTS are supplied to the
985	instance of the bonding module for this device.  Specify the options
986	for the bonding mode, link monitoring, and so on here.  Do not include
987	the max_bonds bonding parameter; this will confuse the configuration
988	system if you have multiple bonding devices.
989	
990		Finally, supply one BONDING_SLAVEn="slave device" for each
991	slave, where "n" is an increasing value, one for each slave.  The
992	"slave device" is either an interface name, e.g., "eth0", or a device
993	specifier for the network device.  The interface name is easier to
994	find, but the ethN names are subject to change at boot time if, e.g.,
995	a device early in the sequence has failed.  The device specifiers
996	(bus-pci-0000:06:08.1 in the example above) specify the physical
997	network device, and will not change unless the device's bus location
998	changes (for example, it is moved from one PCI slot to another).  The
999	example above uses one of each type for demonstration purposes; most
1000	configurations will choose one or the other for all slave devices.
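
Because the BONDING_SLAVEn lines follow a simple numbering scheme, they
can be generated rather than typed.  The following is a minimal sketch;
the function name bonding_slave_lines is hypothetical, and the output is
meant to be reviewed and then appended to the ifcfg-bondX file by hand.

```shell
#!/bin/sh
# bonding_slave_lines DEVICE...
# Print one BONDING_SLAVEn="device" line per argument, numbering the
# slaves from 0 in the order given.
bonding_slave_lines() {
    n=0
    for dev in "$@"; do
        echo "BONDING_SLAVE$n=\"$dev\""
        n=$((n + 1))
    done
}

# Interface names and device specifiers may both be used as arguments.
bonding_slave_lines eth0 bus-pci-0000:06:08.1
```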
1001	
1002		When all configuration files have been modified or created,
1003	networking must be restarted for the configuration changes to take
1004	effect.  This can be accomplished via the following:
1005	
1006	# /etc/init.d/network restart
1007	
1008		Note that the network control script (/sbin/ifdown) will
1009	remove the bonding module as part of the network shutdown processing,
1010	so it is not necessary to remove the module by hand if, e.g., the
1011	module parameters have changed.
1012	
1013		Also, at this writing, YaST/YaST2 will not manage bonding
1014	devices (they do not show bonding interfaces in their list of network
1015	devices).  It is necessary to edit the configuration file by hand to
1016	change the bonding configuration.
1017	
1018		Additional general options and details of the ifcfg file
1019	format can be found in an example ifcfg template file:
1020	
1021	/etc/sysconfig/network/ifcfg.template
1022	
1023		Note that the template does not document the various BONDING_
1024	settings described above, but does describe many of the other options.
1025	
1026	3.1.1 Using DHCP with Sysconfig
1027	-------------------------------
1028	
1029		Under sysconfig, configuring a device with BOOTPROTO='dhcp'
1030	will cause it to query DHCP for its IP address information.  At this
1031	writing, this does not function for bonding devices; the scripts
1032	attempt to obtain the device address from DHCP prior to adding any of
1033	the slave devices.  Without active slaves, the DHCP requests are not
1034	sent to the network.
1035	
1036	3.1.2 Configuring Multiple Bonds with Sysconfig
1037	-----------------------------------------------
1038	
1039		The sysconfig network initialization system is capable of
1040	handling multiple bonding devices.  All that is necessary is for each
1041	bonding instance to have an appropriately configured ifcfg-bondX file
1042	(as described above).  Do not specify the "max_bonds" parameter to any
1043	instance of bonding, as this will confuse sysconfig.  If you require
1044	multiple bonding devices with identical parameters, create multiple
1045	ifcfg-bondX files.
1046	
1047		Because the sysconfig scripts supply the bonding module
1048	options in the ifcfg-bondX file, it is not necessary to add them to
1049	the system /etc/modprobe.d/*.conf configuration files.
1050	
1051	3.2 Configuration with Initscripts Support
1052	------------------------------------------
1053	
1054		This section applies to distros using a recent version of
1055	initscripts with bonding support, for example, Red Hat Enterprise Linux
1056	version 3 or later, Fedora, etc.  On these systems, the network
1057	initialization scripts have knowledge of bonding, and can be configured to
1058	control bonding devices.  Note that older versions of the initscripts
1059	package have lower levels of support for bonding; this will be noted where
1060	applicable.
1061	
1062		These distros will not automatically load the network adapter
1063	driver unless the ethX device is configured with an IP address.
1064	Because of this constraint, users must manually configure a
1065	network-script file for all physical adapters that will be members of
1066	a bondX link.  Network script files are located in the directory:
1067	
1068	/etc/sysconfig/network-scripts
1069	
1070		The file name must be prefixed with "ifcfg-eth" and suffixed
1071	with the adapter's physical adapter number.  For example, the script
1072	for eth0 would be named /etc/sysconfig/network-scripts/ifcfg-eth0.
1073	Place the following text in the file:
1074	
1075	DEVICE=eth0
1076	USERCTL=no
1077	ONBOOT=yes
1078	MASTER=bond0
1079	SLAVE=yes
1080	BOOTPROTO=none
1081	
1082		The DEVICE= line will be different for every ethX device and
1083	must correspond with the name of the file, i.e., ifcfg-eth1 must have
1084	a device line of DEVICE=eth1.  The setting of the MASTER= line will
1085	also depend on the final bonding interface name chosen for your bond.
1086	As with other network devices, these typically start at 0, and go up
1087	one for each device, i.e., the first bonding instance is bond0, the
1088	second is bond1, and so on.
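
For a given bond, the slave files differ only in the DEVICE= line, so
they can be generated in a loop.  The following is a minimal sketch; the
function name write_slave_ifcfg and its argument order are hypothetical,
and the target directory is passed explicitly so that the result can be
reviewed before it is copied into /etc/sysconfig/network-scripts.

```shell
#!/bin/sh
# write_slave_ifcfg DIR MASTER DEVICE...
# Write DIR/ifcfg-DEVICE for each slave, using the settings shown above.
write_slave_ifcfg() {
    dir=$1
    master=$2
    shift 2
    for dev in "$@"; do
        cat > "$dir/ifcfg-$dev" <<EOF
DEVICE=$dev
USERCTL=no
ONBOOT=yes
MASTER=$master
SLAVE=yes
BOOTPROTO=none
EOF
    done
}

# Example: create files for eth0 and eth1 as slaves of bond0 in a
# scratch directory.
scratch=$(mktemp -d)
write_slave_ifcfg "$scratch" bond0 eth0 eth1
ls "$scratch"
```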
1089	
1090		Next, create a bond network script.  The file name for this
1091	script will be /etc/sysconfig/network-scripts/ifcfg-bondX where X is
1092	the number of the bond.  For bond0 the file is named "ifcfg-bond0",
1093	for bond1 it is named "ifcfg-bond1", and so on.  Within that file,
1094	place the following text:
1095	
1096	DEVICE=bond0
1097	IPADDR=192.168.1.1
1098	NETMASK=255.255.255.0
1099	NETWORK=192.168.1.0
1100	BROADCAST=192.168.1.255
1101	ONBOOT=yes
1102	BOOTPROTO=none
1103	USERCTL=no
1104	
1105		Be sure to change the networking specific lines (IPADDR,
1106	NETMASK, NETWORK and BROADCAST) to match your network configuration.
1107	
1108		For later versions of initscripts, such as that found with Fedora
1109	7 (or later) and Red Hat Enterprise Linux version 5 (or later), it is possible,
1110	and, indeed, preferable, to specify the bonding options in the ifcfg-bond0
1111	file, e.g. a line of the format:
1112	
1113	BONDING_OPTS="mode=active-backup arp_interval=60 arp_ip_target=192.168.1.254"
1114	
1115		will configure the bond with the specified options.  The options
1116	specified in BONDING_OPTS are identical to the bonding module parameters
1117	except for the arp_ip_target field when using versions of initscripts older
1118	than 8.57 (Fedora 8) and 8.45.19 (Red Hat Enterprise Linux 5.2).  When
1119	using older versions each target should be included as a separate option and
1120	should be preceded by a '+' to indicate it should be added to the list of
1121	queried targets, e.g.,
1122	
1123		arp_ip_target=+192.168.1.1 arp_ip_target=+192.168.1.2
1124	
1125		is the proper syntax to specify multiple targets.  When specifying
1126	options via BONDING_OPTS, it is not necessary to edit /etc/modprobe.d/*.conf.
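
For the older initscripts syntax, the repeated arp_ip_target options can
be assembled from a list of addresses.  This is a sketch only; the
function name old_arp_targets is hypothetical, and the resulting line
still has to be placed in the ifcfg-bond0 file by hand.

```shell
#!/bin/sh
# old_arp_targets ADDR...
# Print one "arp_ip_target=+ADDR" option per address, space separated,
# as expected by the older initscripts versions described above.
old_arp_targets() {
    opts=
    for target in "$@"; do
        opts="${opts:+$opts }arp_ip_target=+$target"
    done
    echo "$opts"
}

# Compose a complete BONDING_OPTS line with two ARP targets.
echo "BONDING_OPTS=\"mode=active-backup arp_interval=60 $(old_arp_targets 192.168.1.1 192.168.1.2)\""
```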
1127	
1128		For even older versions of initscripts that do not support
1129	BONDING_OPTS, it is necessary to edit /etc/modprobe.d/*.conf (depending upon
1130	your distro) to load the bonding module with your desired options when the
1131	bond0 interface is brought up.  The following lines in /etc/modprobe.d/*.conf
1132	will load the bonding module, and select its options:
1133	
1134	alias bond0 bonding
1135	options bond0 mode=balance-alb miimon=100
1136	
1137		Replace the sample parameters with the appropriate set of
1138	options for your configuration.
1139	
1140		Finally, run "/etc/rc.d/init.d/network restart" as root.  This
1141	will restart the networking subsystem, and your bond link should now be
1142	up and running.
1143	
1144	3.2.1 Using DHCP with Initscripts
1145	---------------------------------
1146	
1147		Recent versions of initscripts (the versions supplied with Fedora
1148	Core 3 and Red Hat Enterprise Linux 4, or later versions, are reported to
1149	work) have support for assigning IP information to bonding devices via
1150	DHCP.
1151	
1152		To configure bonding for DHCP, configure it as described
1153	above, except replace the line "BOOTPROTO=none" with "BOOTPROTO=dhcp"
1154	and add a line consisting of "TYPE=Bonding".  Note that the TYPE value
1155	is case sensitive.
1156	
1157	3.2.2 Configuring Multiple Bonds with Initscripts
1158	-------------------------------------------------
1159	
1160		Initscripts packages that are included with Fedora 7 and Red Hat
1161	Enterprise Linux 5 support multiple bonding interfaces by simply
1162	specifying the appropriate BONDING_OPTS= in ifcfg-bondX where X is the
1163	number of the bond.  This support requires sysfs support in the kernel,
1164	and a bonding driver of version 3.0.0 or later.  Other configurations may
1165	not support this method for specifying multiple bonding interfaces; for
1166	those instances, see the "Configuring Multiple Bonds Manually" section,
1167	below.
1168	
1169	3.3 Configuring Bonding Manually with iproute2
1170	-----------------------------------------------
1171	
1172		This section applies to distros whose network initialization
1173	scripts (the sysconfig or initscripts package) do not have specific
1174	knowledge of bonding.  One such distro is SuSE Linux Enterprise Server
1175	version 8.
1176	
1177		The general method for these systems is to place the bonding
1178	module parameters into a config file in /etc/modprobe.d/ (as
1179	appropriate for the installed distro), then add modprobe and/or
1180	`ip link` commands to the system's global init script.  The name of
1181	the global init script differs; for sysconfig, it is
1182	/etc/init.d/boot.local and for initscripts it is /etc/rc.d/rc.local.
1183	
1184		For example, if you wanted to make a simple bond of two e100
1185	devices (presumed to be eth0 and eth1), and have it persist across
1186	reboots, edit the appropriate file (/etc/init.d/boot.local or
1187	/etc/rc.d/rc.local), and add the following:
1188	
1189	modprobe bonding mode=balance-alb miimon=100
1190	modprobe e100
1191	ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up
1192	ip link set eth0 master bond0
1193	ip link set eth1 master bond0
1194	
1195		Replace the example bonding module parameters and bond0
1196	network configuration (IP address, netmask, etc) with the appropriate
1197	values for your configuration.
1198	
1199		Unfortunately, this method will not provide support for the
1200	ifup and ifdown scripts on the bond devices.  To reload the bonding
1201	configuration, it is necessary to run the initialization script, e.g.,
1202	
1203	# /etc/init.d/boot.local
1204	
1205		or
1206	
1207	# /etc/rc.d/rc.local
1208	
1209		It may be desirable in such a case to create a separate script
1210	which only initializes the bonding configuration, then call that
1211	separate script from within boot.local.  This allows for bonding to be
1212	enabled without re-running the entire global init script.
1213	
1214		To shut down the bonding devices, it is necessary to first
1215	mark the bonding device itself as being down, then remove the
1216	appropriate device driver modules.  For our example above, you can do
1217	the following:
1218	
1219	# ifconfig bond0 down
1220	# rmmod bonding
1221	# rmmod e100
1222	
1223		Again, for convenience, it may be desirable to create a script
1224	with these commands.
1225	
1226	
1227	3.3.1 Configuring Multiple Bonds Manually
1228	-----------------------------------------
1229	
1230		This section contains information on configuring multiple
1231	bonding devices with differing options for those systems whose network
1232	initialization scripts lack support for configuring multiple bonds.
1233	
1234		If you require multiple bonding devices, but all with the same
1235	options, you may wish to use the "max_bonds" module parameter,
1236	documented above.
1237	
1238		To create multiple bonding devices with differing options, it is
1239	preferable to use bonding parameters exported by sysfs, documented in the
1240	section below.
1241	
1242		For versions of bonding without sysfs support, the only means to
1243	provide multiple instances of bonding with differing options is to load
1244	the bonding driver multiple times.  Note that current versions of the
1245	sysconfig network initialization scripts handle this automatically; if
1246	your distro uses these scripts, no special action is needed.  See the
1247	section Configuring Bonding Devices, above, if you're not sure about your
1248	network initialization scripts.
1249	
1250		To load multiple instances of the module, it is necessary to
1251	specify a different name for each instance (the module loading system
1252	requires that every loaded module, even multiple instances of the same
1253	module, have a unique name).  This is accomplished by supplying multiple
1254	sets of bonding options in /etc/modprobe.d/*.conf, for example:
1255	
1256	alias bond0 bonding
1257	options bond0 -o bond0 mode=balance-rr miimon=100
1258	
1259	alias bond1 bonding
1260	options bond1 -o bond1 mode=balance-alb miimon=50
1261	
1262		will load the bonding module two times.  The first instance is
1263	named "bond0" and creates the bond0 device in balance-rr mode with an
1264	miimon of 100.  The second instance is named "bond1" and creates the
1265	bond1 device in balance-alb mode with an miimon of 50.
1266	
1267		In some circumstances (typically with older distributions),
1268	the above does not work, and the second bonding instance never sees
1269	its options.  In that case, the second options line can be substituted
1270	as follows:
1271	
1272	install bond1 /sbin/modprobe --ignore-install bonding -o bond1 \
1273		mode=balance-alb miimon=50
1274	
1275		This may be repeated any number of times, specifying a new and
1276	unique name in place of bond1 for each subsequent instance.
1277	
1278		It has been observed that some Red Hat supplied kernels are unable
1279	to rename modules at load time (the "-o bond1" part).  Attempts to pass
1280	that option to modprobe will produce an "Operation not permitted" error.
1281	This has been reported on some Fedora Core kernels, and has been seen on
1282	RHEL 4 as well.  On kernels exhibiting this problem, it will be impossible
1283	to configure multiple bonds with differing parameters (as they are older
1284	kernels, and also lack sysfs support).
1285	
1286	3.4 Configuring Bonding Manually via Sysfs
1287	------------------------------------------
1288	
1289		Starting with version 3.0.0, Channel Bonding may be configured
1290	via the sysfs interface.  This interface allows dynamic configuration
1291	of all bonds in the system without unloading the module.  It also
1292	allows for adding and removing bonds at runtime.  Ifenslave is no
1293	longer required, though it is still supported.
1294	
1295		Use of the sysfs interface allows you to use multiple bonds
1296	with different configurations without having to reload the module.
1297	It also allows you to use multiple, differently configured bonds when
1298	bonding is compiled into the kernel.
1299	
1300		You must have the sysfs filesystem mounted to configure
1301	bonding this way.  The examples in this document assume that you
1302	are using the standard mount point for sysfs, i.e., /sys.  If your
1303	sysfs filesystem is mounted elsewhere, you will need to adjust the
1304	example paths accordingly.
1305	
1306	Creating and Destroying Bonds
1307	-----------------------------
1308	To add a new bond foo:
1309	# echo +foo > /sys/class/net/bonding_masters
1310	
1311	To remove an existing bond bar:
1312	# echo -bar > /sys/class/net/bonding_masters
1313	
1314	To show all existing bonds:
1315	# cat /sys/class/net/bonding_masters
1316	
1317	NOTE: due to the 4K size limitation of sysfs files, this list may be
1318	truncated if you have more than a few hundred bonds.  This is unlikely
1319	to occur under normal operating conditions.
1320	
1321	Adding and Removing Slaves
1322	--------------------------
1323		Interfaces may be enslaved to a bond using the file
1324	/sys/class/net/<bond>/bonding/slaves.  The semantics for this file
1325	are the same as for the bonding_masters file.
1326	
1327	To enslave interface eth0 to bond bond0:
1328	# ifconfig bond0 up
1329	# echo +eth0 > /sys/class/net/bond0/bonding/slaves
1330	
1331	To free slave eth0 from bond bond0:
1332	# echo -eth0 > /sys/class/net/bond0/bonding/slaves
1333	
1334		When an interface is enslaved to a bond, symlinks between the
1335	two are created in the sysfs filesystem.  In this case, you would get
1336	/sys/class/net/bond0/slave_eth0 pointing to /sys/class/net/eth0, and
1337	/sys/class/net/eth0/master pointing to /sys/class/net/bond0.
1338	
1339		This means that you can tell quickly whether or not an
1340	interface is enslaved by looking for the master symlink.  Thus:
1341	# echo -eth0 > /sys/class/net/eth0/master/bonding/slaves
1342	will free eth0 from whatever bond it is enslaved to, regardless of
1343	the name of the bond interface.
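
The master symlink can also be used from a script to find out which bond,
if any, an interface belongs to.  The following is a minimal sketch; the
function name bond_master_of and the SYSFS_NET variable (which only exists
so the path can be overridden if sysfs is mounted elsewhere) are
hypothetical.

```shell
#!/bin/sh
# Location of the per-interface sysfs directories.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

# bond_master_of IFACE
# Print the name of the bond that IFACE is enslaved to.  Return non-zero
# and print nothing if IFACE has no master symlink.
bond_master_of() {
    link="$SYSFS_NET/$1/master"
    [ -L "$link" ] || return 1
    basename "$(readlink "$link")"
}

# Example: report the master of eth0, if there is one.
if master=$(bond_master_of eth0); then
    echo "eth0 is enslaved to $master"
else
    echo "eth0 is not enslaved"
fi
```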
1344	
1345	Changing a Bond's Configuration
1346	-------------------------------
1347		Each bond may be configured individually by manipulating the
1348	files located in /sys/class/net/<bond name>/bonding
1349	
1350		The names of these files correspond directly with the command-
1351	line parameters described elsewhere in this file, and, with the
1352	exception of arp_ip_target, they accept the same values.  To see the
1353	current setting, simply cat the appropriate file.
1354	
1355		A few examples will be given here; for specific usage
1356	guidelines for each parameter, see the appropriate section in this
1357	document.
1358	
1359	To configure bond0 for balance-alb mode:
1360	# ifconfig bond0 down
1361	# echo 6 > /sys/class/net/bond0/bonding/mode
1362	 - or -
1363	# echo balance-alb > /sys/class/net/bond0/bonding/mode
1364		NOTE: The bond interface must be down before the mode can be
1365	changed.
1366	
1367	To enable MII monitoring on bond0 with a 1 second interval:
1368	# echo 1000 > /sys/class/net/bond0/bonding/miimon
1369		NOTE: If ARP monitoring is enabled, it will be disabled when MII
1370	monitoring is enabled, and vice-versa.
1371	
1372	To add ARP targets:
1373	# echo +192.168.0.100 > /sys/class/net/bond0/bonding/arp_ip_target
1374	# echo +192.168.0.101 > /sys/class/net/bond0/bonding/arp_ip_target
1375		NOTE:  up to 16 target addresses may be specified.
1376	
1377	To remove an ARP target:
1378	# echo -192.168.0.100 > /sys/class/net/bond0/bonding/arp_ip_target
1379	
1380	To configure the interval between learning packet transmits:
1381	# echo 12 > /sys/class/net/bond0/bonding/lp_interval
1382		NOTE: the lp_interval is the number of seconds between instances where
1383	the bonding driver sends learning packets to each slave's peer switch.  The
1384	default interval is 1 second.
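
When many settings are changed from a script, the repeated sysfs paths
can be wrapped in two small helpers.  This is a sketch under the
assumption that sysfs is mounted at /sys; the names bond_get, bond_set
and SYSFS_NET are hypothetical.

```shell
#!/bin/sh
# Location of the per-interface sysfs directories.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

# bond_get BOND ATTR
# Print the current value of a bonding attribute.
bond_get() {
    cat "$SYSFS_NET/$1/bonding/$2"
}

# bond_set BOND ATTR VALUE
# Write VALUE to a bonding attribute; fail with a message if the
# attribute file does not exist or is not writable.
bond_set() {
    attr="$SYSFS_NET/$1/bonding/$2"
    if [ ! -w "$attr" ]; then
        echo "bond_set: $attr is not writable" >&2
        return 1
    fi
    echo "$3" > "$attr"
}
```

With these helpers, "bond_set bond0 miimon 1000" corresponds to the
miimon example above, and "bond_get bond0 mode" shows the current mode.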
1385	
1386	Example Configuration
1387	---------------------
1388		We begin with the same example that is shown in section 3.3,
1389	executed with sysfs, and without using ifenslave.
1390	
1391		To make a simple bond of two e100 devices (presumed to be eth0
1392	and eth1), and have it persist across reboots, edit the appropriate
1393	file (/etc/init.d/boot.local or /etc/rc.d/rc.local), and add the
1394	following:
1395	
1396	modprobe bonding
1397	modprobe e100
1398	echo balance-alb > /sys/class/net/bond0/bonding/mode
1399	ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up
1400	echo 100 > /sys/class/net/bond0/bonding/miimon
1401	echo +eth0 > /sys/class/net/bond0/bonding/slaves
1402	echo +eth1 > /sys/class/net/bond0/bonding/slaves
1403	
1404		To add a second bond, with two e1000 interfaces in
1405	active-backup mode, using ARP monitoring, add the following lines to
1406	your init script:
1407	
1408	modprobe e1000
1409	echo +bond1 > /sys/class/net/bonding_masters
1410	echo active-backup > /sys/class/net/bond1/bonding/mode
1411	ifconfig bond1 192.168.2.1 netmask 255.255.255.0 up
1412	echo +192.168.2.100 > /sys/class/net/bond1/bonding/arp_ip_target
1413	echo 2000 > /sys/class/net/bond1/bonding/arp_interval
1414	echo +eth2 > /sys/class/net/bond1/bonding/slaves
1415	echo +eth3 > /sys/class/net/bond1/bonding/slaves
1416	
1417	3.5 Configuration with Interfaces Support
1418	-----------------------------------------
1419	
1420		This section applies to distros which use the /etc/network/interfaces file
1421	to describe network interface configuration, most notably Debian and its
1422	derivatives.
1423	
1424		The ifup and ifdown commands on Debian don't support bonding out of
1425	the box. The ifenslave-2.6 package should be installed to provide bonding
1426	support.  Once installed, this package will provide bond-* options for use
1427	in /etc/network/interfaces.
1428	
1429		Note that the ifenslave-2.6 package will load the bonding module and use
1430	the ifenslave command when appropriate.
1431	
1432	Example Configurations
1433	----------------------
1434	
1435	In /etc/network/interfaces, the following stanza will configure bond0, in
1436	active-backup mode, with eth0 and eth1 as slaves.
1437	
1438	auto bond0
1439	iface bond0 inet dhcp
1440		bond-slaves eth0 eth1
1441		bond-mode active-backup
1442		bond-miimon 100
1443		bond-primary eth0 eth1
1444	
1445	If the above configuration doesn't work, you might have a system using
1446	upstart for system startup. This is most notably true for recent
1447	Ubuntu versions. The following stanza in /etc/network/interfaces will
1448	produce the same result on those systems.
1449	
1450	auto bond0
1451	iface bond0 inet dhcp
1452		bond-slaves none
1453		bond-mode active-backup
1454		bond-miimon 100
1455	
1456	auto eth0
1457	iface eth0 inet manual
1458		bond-master bond0
1459		bond-primary eth0 eth1
1460	
1461	auto eth1
1462	iface eth1 inet manual
1463		bond-master bond0
1464		bond-primary eth0 eth1
1465	
1466	For a full list of bond-* supported options in /etc/network/interfaces and some
1467	more advanced examples tailored to your particular distro, see the files in
1468	/usr/share/doc/ifenslave-2.6.
1469	
1470	3.6 Overriding Configuration for Special Cases
1471	----------------------------------------------
1472	
1473	When using the bonding driver, the physical port which transmits a frame is
1474	typically selected by the bonding driver, and is not relevant to the user or
1475	system administrator.  The output port is simply selected using the policies of
1476	the selected bonding mode.  On occasion however, it is helpful to direct certain
1477	classes of traffic to certain physical interfaces on output to implement
1478	slightly more complex policies.  For example, to reach a web server over a
1479	bonded interface in which eth0 connects to a private network, while eth1
1480	connects via a public network, it may be desirable to bias the bond to send said
1481	traffic over eth0 first, using eth1 only as a fall back, while all other traffic
1482	can safely be sent over either interface.  Such configurations may be achieved
1483	using the traffic control utilities inherent in Linux.
1484	
1485	By default the bonding driver is multiqueue aware and 16 queues are created
1486	when the driver initializes (see Documentation/networking/multiqueue.txt
1487	for details).  If more or fewer queues are desired, the module parameter
1488	tx_queues can be used to change this value.  There is no sysfs parameter
1489	available as the allocation is done at module init time.
1490	
1491	The output of the file /proc/net/bonding/bondX has changed so the output Queue
1492	ID is now printed for each slave:
1493	
1494	Bonding Mode: fault-tolerance (active-backup)
1495	Primary Slave: None
1496	Currently Active Slave: eth0
1497	MII Status: up
1498	MII Polling Interval (ms): 0
1499	Up Delay (ms): 0
1500	Down Delay (ms): 0
1501	
1502	Slave Interface: eth0
1503	MII Status: up
1504	Link Failure Count: 0
1505	Permanent HW addr: 00:1a:a0:12:8f:cb
1506	Slave queue ID: 0
1507	
1508	Slave Interface: eth1
1509	MII Status: up
1510	Link Failure Count: 0
1511	Permanent HW addr: 00:1a:a0:12:8f:cc
1512	Slave queue ID: 2
1513	
1514	The queue_id for a slave can be set using the command:
1515	
1516	# echo "eth1:2" > /sys/class/net/bond0/bonding/queue_id
1517	
1518	Any interface that needs a queue_id set should set it with multiple calls
1519	like the one above until proper priorities are set for all interfaces.  On
1520	distributions that allow configuration via initscripts, multiple 'queue_id'
1521	arguments can be added to BONDING_OPTS to set all needed slave queues.
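
To review which queue IDs are currently assigned, the per-slave entries
in /proc/net/bonding/bondX can be summarized with awk.  This sketch
assumes the "Slave Interface:" and "Slave queue ID:" field labels shown
in the sample output above; the function name slave_queue_ids is
hypothetical.

```shell
#!/bin/sh
# slave_queue_ids FILE
# Print "<slave> <queue id>" for every slave listed in a
# /proc/net/bonding/bondX style file.
slave_queue_ids() {
    awk -F': ' '
        /^Slave Interface:/ { slave = $2 }
        /^Slave queue ID:/  { print slave, $2 }
    ' "$1"
}

# Example: summarize bond0 if the bonding driver is loaded.
if [ -r /proc/net/bonding/bond0 ]; then
    slave_queue_ids /proc/net/bonding/bond0
fi
```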
1522	
1523	These queue IDs can be used in conjunction with the tc utility to configure
1524	a multiqueue qdisc and filters to bias certain traffic to transmit on certain
1525	slave devices.  For instance, say we wanted, in the above configuration, to
1526	force all traffic bound to 192.168.1.100 to use eth1 in the bond as its output
1527	device. The following commands would accomplish this:
1528	
1529	# tc qdisc add dev bond0 handle 1 root multiq
1530	
1531	# tc filter add dev bond0 protocol ip parent 1: prio 1 u32 match ip dst \
1532		192.168.1.100 action skbedit queue_mapping 2
1533	
1534	These commands tell the kernel to attach a multiqueue queue discipline to the
1535	bond0 interface and filter traffic enqueued to it, such that packets with a dst
1536	ip of 192.168.1.100 have their output queue mapping value overwritten to 2.
1537	This value is then passed into the driver, causing the normal output path
1538	selection policy to be overridden, selecting instead qid 2, which maps to eth1.
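
When several destinations need to be steered, the same two tc commands can
be generated rather than typed.  The sketch below only prints the commands
so they can be reviewed before being run as root; the helper name
steer_to_queue and its "destination=qid" argument format are assumptions
made for this example.

```shell
#!/bin/sh
# Sketch: print the tc commands that map each destination IP to a
# slave queue.  Nothing is executed; the output is meant for review.
steer_to_queue() (
    bond=$1; shift
    echo "tc qdisc add dev $bond handle 1 root multiq"
    for rule in "$@"; do
        dst=${rule%%=*}     # destination IP, left of '='
        qid=${rule#*=}      # queue id, right of '='
        echo "tc filter add dev $bond protocol ip parent 1: prio 1 u32 match ip dst $dst action skbedit queue_mapping $qid"
    done
)
```

Calling steer_to_queue bond0 192.168.1.100=2 prints the two commands used
in the example above.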
1539	
1540	Note that qid values begin at 1.  Qid 0 is reserved to indicate to the driver
1541	that normal output policy selection should take place.  One benefit of simply
1542	leaving the qid for a slave at 0 is the multiqueue awareness that is now
1543	present in the bonding driver.  This awareness allows tc filters to be placed on
1544	slave devices as well as bond devices, and the bonding driver will simply act as
1545	a pass-through for selecting output queues on the slave device rather than
1546	output port selection.
1547	
1548	This feature first appeared in bonding driver version 3.7.0 and support for
1549	output slave selection was limited to round-robin and active-backup modes.
1550	
1551	4. Querying Bonding Configuration
1552	=================================
1553	
1554	4.1 Bonding Configuration
1555	-------------------------
1556	
1557		Each bonding device has a read-only file residing in the
1558	/proc/net/bonding directory.  The file contents include information
1559	about the bonding configuration, options and state of each slave.
1560	
1561		For example, the contents of /proc/net/bonding/bond0 after the
1562	driver is loaded with parameters of mode=0 and miimon=1000 are
1563	generally as follows:
1564	
1565		Ethernet Channel Bonding Driver: 2.6.1 (October 29, 2004)
1566	        Bonding Mode: load balancing (round-robin)
1567	        Currently Active Slave: eth0
1568	        MII Status: up
1569	        MII Polling Interval (ms): 1000
1570	        Up Delay (ms): 0
1571	        Down Delay (ms): 0
1572	
1573	        Slave Interface: eth1
1574	        MII Status: up
1575	        Link Failure Count: 1
1576	
1577	        Slave Interface: eth0
1578	        MII Status: up
1579	        Link Failure Count: 1
1580	
1581		The precise format and contents will change depending upon the
1582	bonding configuration, state, and version of the bonding driver.
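
	Because the format varies, scripts that read this file should
match on field labels rather than line positions.  The following is a
minimal sketch that lists each slave with its MII status and link
failure count, using only the labels shown in the sample above; the
function name bond_slaves is an assumption, and fields that are absent
are printed as "-".

```shell
#!/bin/sh
# Sketch: summarize the per-slave stanzas of a /proc/net/bonding file.
# Prints one line per slave: name, MII status, link failure count.
bond_slaves() {
    awk -F': ' '
        function emit() { if (name != "") print name, status, fails }
        /^[ \t]*Slave Interface:/    { emit(); name = $2; status = "-"; fails = "-" }
        /^[ \t]*MII Status:/         { if (name != "") status = $2 }
        /^[ \t]*Link Failure Count:/ { if (name != "") fails = $2 }
        END { emit() }
    ' "${1:-/proc/net/bonding/bond0}"
}
```

	The bond-level "MII Status" line that precedes the first slave
stanza is deliberately ignored, since no slave name has been seen yet.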
1583	
1584	4.2 Network configuration
1585	-------------------------
1586	
1587		The network configuration can be inspected using the ifconfig
1588	command.  Bonding devices will have the MASTER flag set; bonding slave
1589	devices will have the SLAVE flag set.  The ifconfig output does not
1590	contain information on which slaves are associated with which masters.
1591	
1592		In the example below, the bond0 interface is the master
1593	(MASTER) while eth0 and eth1 are slaves (SLAVE).  Notice that all slaves of
1594	bond0 have the same MAC address (HWaddr) as bond0 for all modes except
1595	TLB and ALB, which require a unique MAC address for each slave.
1596	
1597	# /sbin/ifconfig
1598	bond0     Link encap:Ethernet  HWaddr 00:C0:F0:1F:37:B4
1599	          inet addr:XXX.XXX.XXX.YYY  Bcast:XXX.XXX.XXX.255  Mask:255.255.252.0
1600	          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
1601	          RX packets:7224794 errors:0 dropped:0 overruns:0 frame:0
1602	          TX packets:3286647 errors:1 dropped:0 overruns:1 carrier:0
1603	          collisions:0 txqueuelen:0
1604	
1605	eth0      Link encap:Ethernet  HWaddr 00:C0:F0:1F:37:B4
1606	          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
1607	          RX packets:3573025 errors:0 dropped:0 overruns:0 frame:0
1608	          TX packets:1643167 errors:1 dropped:0 overruns:1 carrier:0
1609	          collisions:0 txqueuelen:100
1610	          Interrupt:10 Base address:0x1080
1611	
1612	eth1      Link encap:Ethernet  HWaddr 00:C0:F0:1F:37:B4
1613	          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
1614	          RX packets:3651769 errors:0 dropped:0 overruns:0 frame:0
1615	          TX packets:1643480 errors:0 dropped:0 overruns:0 carrier:0
1616	          collisions:0 txqueuelen:100
1617	          Interrupt:9 Base address:0x1400
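
	Since ifconfig does not show which slaves belong to which
master, the per-bond sysfs "slaves" file can be read instead.  The
sketch below prints one "master: slaves" line per bond; the function
name bond_members and the SYSFS_NET override (used to point the
function at a scratch directory) are assumptions made for this
example.

```shell
#!/bin/sh
# Sketch: print "master: slave slave ..." for every bonding device.
# SYSFS_NET is a hypothetical override for the sysfs network directory.
bond_members() (
    root=${SYSFS_NET:-/sys/class/net}
    for f in "$root"/*/bonding/slaves; do
        [ -r "$f" ] || continue         # also skips an unmatched glob
        master=${f%/bonding/slaves}
        master=${master##*/}
        echo "$master: $(cat "$f")"
    done
)
```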
1618	
1619	5. Switch Configuration
1620	=======================
1621	
1622		For this section, "switch" refers to whatever system the
1623	bonded devices are directly connected to (i.e., where the other end of
1624	the cable plugs into).  This may be an actual dedicated switch device,
1625	or it may be another regular system (e.g., another computer running
1626	Linux).
1627	
1628		The active-backup, balance-tlb and balance-alb modes do not
1629	require any specific configuration of the switch.
1630	
1631		The 802.3ad mode requires that the switch have the appropriate
1632	ports configured as an 802.3ad aggregation.  The precise method used
1633	to configure this varies from switch to switch, but, for example, a
1634	Cisco 3550 series switch requires that the appropriate ports first be
1635	grouped together in a single etherchannel instance, then that
1636	etherchannel is set to mode "lacp" to enable 802.3ad (instead of
1637	standard EtherChannel).
1638	
1639		The balance-rr, balance-xor and broadcast modes generally
1640	require that the switch have the appropriate ports grouped together.
1641	The nomenclature for such a group differs between switches; it may be
1642	called an "etherchannel" (as in the Cisco example, above), a "trunk
1643	group" or some other similar variation.  For these modes, each switch
1644	will also have its own configuration options for the switch's transmit
1645	policy to the bond.  Typical choices include XOR of either the MAC or
1646	IP addresses.  The transmit policy of the two peers does not need to
1647	match.  For these three modes, the bonding mode really selects a
1648	transmit policy for an EtherChannel group; all three will interoperate
1649	with another EtherChannel group.
1650	
1651	
1652	6. 802.1q VLAN Support
1653	======================
1654	
1655		It is possible to configure VLAN devices over a bond interface
1656	using the 8021q driver.  However, only packets coming from the 8021q
1657	driver and passing through bonding will be tagged by default.
1658	Self-generated packets, for example, bonding's learning packets or ARP
1659	packets generated by either ALB mode or the ARP monitor mechanism, are
1660	tagged internally by bonding itself.  As a result, bonding must
1661	"learn" the VLAN IDs configured above it, and use those IDs to tag
1662	self-generated packets.
1663	
1664		For reasons of simplicity, and to support the use of adapters
1665	that can do VLAN hardware acceleration offloading, the bonding
1666	interface declares itself as fully hardware offloading capable.  It gets
1667	the add_vid/kill_vid notifications to gather the necessary
1668	information, and it propagates those actions to the slaves.  In case
1669	of mixed adapter types, hardware accelerated tagged packets that
1670	should go through an adapter that is not offloading capable are
1671	"un-accelerated" by the bonding driver so the VLAN tag sits in the
1672	regular location.
1673	
1674		VLAN interfaces *must* be added on top of a bonding interface
1675	only after enslaving at least one slave.  The bonding interface has a
1676	hardware address of 00:00:00:00:00:00 until the first slave is added.
1677	If the VLAN interface is created prior to the first enslavement, it
1678	would pick up the all-zeroes hardware address.  Once the first slave
1679	is attached to the bond, the bond device itself will pick up the
1680	slave's hardware address, which is then available for the VLAN device.
1681	
1682		Also, be aware that a similar problem can occur if all slaves
1683	are released from a bond that still has one or more VLAN interfaces on
1684	top of it.  When a new slave is added, the bonding interface will
1685	obtain its hardware address from the first slave, which might not
1686	match the hardware address of the VLAN interfaces (which was
1687	ultimately copied from an earlier slave).
1688	
1689		There are two methods to ensure that the VLAN device operates
1690	with the correct hardware address if all slaves are removed from a
1691	bond interface:
1692	
1693		1. Remove all VLAN interfaces then recreate them
1694	
1695		2. Set the bonding interface's hardware address so that it
1696	matches the hardware address of the VLAN interfaces.
1697	
1698		Note that changing a VLAN interface's HW address would set the
1699	underlying device -- i.e. the bonding interface -- to promiscuous
1700	mode, which might not be what you want.
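
	The ordering requirement above can be checked before a VLAN
interface is created.  The following sketch succeeds only if the bond
already has a hardware address other than all zeroes; the function
name bond_ready_for_vlan and the SYSFS_NET override are assumptions
made for this example, and the VLAN id in the usage comment is
arbitrary.

```shell
#!/bin/sh
# Sketch: succeed only if the bond has picked up a slave's hardware
# address, i.e. at least one slave has been enslaved.
# SYSFS_NET is a hypothetical override for the sysfs network directory.
bond_ready_for_vlan() (
    addr_file="${SYSFS_NET:-/sys/class/net}/$1/address"
    addr=$(cat "$addr_file" 2>/dev/null) || exit 2
    [ -n "$addr" ] && [ "$addr" != "00:00:00:00:00:00" ]
)

# Example use (as root):
#   bond_ready_for_vlan bond0 &&
#       ip link add link bond0 name bond0.100 type vlan id 100
```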
1701	
1702	
1703	7. Link Monitoring
1704	==================
1705	
1706		The bonding driver at present supports two schemes for
1707	monitoring a slave device's link state: the ARP monitor and the MII
1708	monitor.
1709	
1710		At the present time, due to implementation restrictions in the
1711	bonding driver itself, it is not possible to enable both ARP and MII
1712	monitoring simultaneously.
1713	
1714	7.1 ARP Monitor Operation
1715	-------------------------
1716	
1717		The ARP monitor operates as its name suggests: it sends ARP
1718	queries to one or more designated peer systems on the network, and
1719	uses the response as an indication that the link is operating.  This
1720	gives some assurance that traffic is actually flowing to and from one
1721	or more peers on the local network.
1722	
1723		The ARP monitor relies on the device driver itself to verify
1724	that traffic is flowing.  In particular, the driver must keep up to
1725	date the last receive time, dev->last_rx, and transmit start time,
1726	dev->trans_start.  If these are not updated by the driver, then the
1727	ARP monitor will immediately fail any slaves using that driver, and
1728	those slaves will stay down.  If network monitoring (tcpdump, etc.)
1729	shows the ARP requests and replies on the network, then it may be that
1730	your device driver is not updating last_rx and trans_start.
1731	
1732	7.2 Configuring Multiple ARP Targets
1733	------------------------------------
1734	
1735		While ARP monitoring can be done with just one target, it can
1736	be useful in a High Availability setup to have several targets to
1737	monitor.  In the case of just one target, the target itself may go
1738	down or have a problem making it unresponsive to ARP requests.  Having
1739	an additional target (or several) increases the reliability of the ARP
1740	monitoring.
1741	
1742		Multiple ARP targets must be separated by commas as follows:
1743	
1744	# example options for ARP monitoring with three targets
1745	alias bond0 bonding
1746	options bond0 arp_interval=60 arp_ip_target=192.168.0.1,192.168.0.3,192.168.0.9
1747	
1748		For just a single target the options would resemble:
1749	
1750	# example options for ARP monitoring with one target
1751	alias bond0 bonding
1752	options bond0 arp_interval=60 arp_ip_target=192.168.0.100
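
	When the target list is maintained elsewhere (for example, in
a site inventory), the options line can be generated.  The helper
below is a sketch only; the name arp_options_line is an assumption,
and it simply joins its arguments with commas in the form shown above.

```shell
#!/bin/sh
# Sketch: build an "options" line for ARP monitoring.
# Usage: arp_options_line BOND INTERVAL TARGET [TARGET...]
arp_options_line() (
    bond=$1; interval=$2; shift 2
    [ "$#" -ge 1 ] || { echo "at least one ARP target is required" >&2; exit 1; }
    targets=$(printf '%s,' "$@")        # "a,b,c,"
    echo "options $bond arp_interval=$interval arp_ip_target=${targets%,}"
)
```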
1753	
1754	
1755	7.3 MII Monitor Operation
1756	-------------------------
1757	
1758		The MII monitor monitors only the carrier state of the local
1759	network interface.  It accomplishes this in one of three ways: by
1760	depending upon the device driver to maintain its carrier state, by
1761	querying the device's MII registers, or by making an ethtool query to
1762	the device.
1763	
1764		If the use_carrier module parameter is 1 (the default value),
1765	then the MII monitor will rely on the driver for carrier state
1766	information (via the netif_carrier subsystem).  As explained in the
1767	use_carrier parameter information, above, if the MII monitor fails to
1768	detect carrier loss on the device (e.g., when the cable is physically
1769	disconnected), it may be that the driver does not support
1770	netif_carrier.
1771	
1772		If use_carrier is 0, then the MII monitor will first query the
1773	device's MII registers (via ioctl) and check the link state.  If that
1774	request fails (not just that it returns carrier down), then the MII
1775	monitor will make an ethtool ETHTOOL_GLINK request to attempt to obtain
1776	the same information.  If both methods fail (i.e., the driver either
1777	does not support or had some error in processing both the MII register
1778	and ethtool requests), then the MII monitor will assume the link is
1779	up.
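
	The order of these checks can be summarized as a small
decision function.  This is a model of the description above, not the
driver's code: the function name mii_link_state is an assumption, and
each probe argument is given as "up", "down", or "fail" (meaning the
request itself failed).

```shell
#!/bin/sh
# Model of the MII monitor's decision order only.
# Usage: mii_link_state USE_CARRIER CARRIER MII_RESULT ETHTOOL_RESULT
mii_link_state() (
    use_carrier=$1; carrier=$2; mii=$3; ethtool=$4
    if [ "$use_carrier" = 1 ]; then
        echo "$carrier"     # trust the driver's netif_carrier state
    elif [ "$mii" != fail ]; then
        echo "$mii"         # the MII register query answered
    elif [ "$ethtool" != fail ]; then
        echo "$ethtool"     # fall back to the ETHTOOL_GLINK result
    else
        echo up             # both requests failed: link assumed up
    fi
)
```

	Note the last branch: a driver that supports neither request
will always appear to have link when use_carrier is 0.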
1780	
1781	8. Potential Sources of Trouble
1782	===============================
1783	
1784	8.1 Adventures in Routing
1785	-------------------------
1786	
1787		When bonding is configured, it is important that the slave
1788	devices not have routes that supersede routes of the master (or,
1789	generally, not have routes at all).  For example, suppose the bonding
1790	device bond0 has two slaves, eth0 and eth1, and the routing table is
1791	as follows:
1792	
1793	Kernel IP routing table
1794	Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
1795	10.0.0.0        0.0.0.0         255.255.0.0     U        40 0          0 eth0
1796	10.0.0.0        0.0.0.0         255.255.0.0     U        40 0          0 eth1
1797	10.0.0.0        0.0.0.0         255.255.0.0     U        40 0          0 bond0
1798	127.0.0.0       0.0.0.0         255.0.0.0       U        40 0          0 lo
1799	
1800		This routing configuration will likely still update the
1801	receive/transmit times in the driver (needed by the ARP monitor), but
1802	may bypass the bonding driver (because outgoing traffic to, in this
1803	case, another host on network 10 would use eth0 or eth1 before bond0).
1804	
1805		The ARP monitor (and ARP itself) may become confused by this
1806	configuration, because ARP requests (generated by the ARP monitor)
1807	will be sent on one interface (bond0), but the corresponding reply
1808	will arrive on a different interface (eth0).  This reply looks to ARP
1809	as an unsolicited ARP reply (because ARP matches replies on an
1810	interface basis), and is discarded.  The MII monitor is not affected
1811	by the state of the routing table.
1812	
1813		The solution here is simply to ensure that slaves do not have
1814	routes of their own, and if for some reason they must, those routes do
1815	not supersede routes of their master.  This should generally be the
1816	case, but unusual configurations or errant manual or automatic static
1817	route additions may cause trouble.
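
	A quick way to look for such routes is to filter the routing
table on its interface column.  The sketch below assumes input in the
"route -n" layout shown above, with the interface name in the last
column; the function name slave_routes is an assumption made for this
example.

```shell
#!/bin/sh
# Sketch: print routing table lines whose interface is a slave device.
# Usage: slave_routes TABLE_FILE SLAVE [SLAVE...]
# TABLE_FILE may be "-" to read standard input.
slave_routes() (
    table=$1; shift
    awk -v slaves="$*" '
        BEGIN { n = split(slaves, s, " "); for (i = 1; i <= n; i++) is_slave[s[i]] = 1 }
        ($NF in is_slave) { print }
    ' "$table"
)
```

	For example: route -n | slave_routes - eth0 eth1
Any output indicates a route that should be removed or reviewed.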
1818	
1819	8.2 Ethernet Device Renaming
1820	----------------------------
1821	
1822		On systems with network configuration scripts that do not
1823	associate physical devices directly with network interface names (so
1824	that the same physical device always has the same "ethX" name), it may
1825	be necessary to add some special logic to config files in
1826	/etc/modprobe.d/.
1827	
1828		For example, given a modules.conf containing the following:
1829	
1830	alias bond0 bonding
1831	options bond0 mode=some-mode miimon=50
1832	alias eth0 tg3
1833	alias eth1 tg3
1834	alias eth2 e1000
1835	alias eth3 e1000
1836	
1837		If neither eth0 nor eth1 is a slave of bond0, then when the
1838	bond0 interface comes up, the devices may end up reordered.  This
1839	happens because bonding is loaded first, then its slave devices'
1840	drivers are loaded next.  Since no other drivers have been loaded,
1841	when the e1000 driver loads, it will receive eth0 and eth1 for its
1842	devices, but the bonding configuration tries to enslave eth2 and eth3
1843	(which may later be assigned to the tg3 devices).
1844	
1845		Adding the following:
1846	
1847	add above bonding e1000 tg3
1848	
1849		causes modprobe to load e1000 then tg3, in that order, when
1850	bonding is loaded.  This command is fully documented in the
1851	modules.conf manual page.
1852	
1853		On systems utilizing modprobe an equivalent problem can occur.
1854	In this case, the following can be added to config files in
1855	/etc/modprobe.d/ as:
1856	
1857	softdep bonding pre: tg3 e1000
1858	
1859		This will load tg3 and e1000 modules before loading the bonding one.
1860	Full documentation on this can be found in the modprobe.d and modprobe
1861	manual pages.
1862	
1863	8.3 Painfully Slow Or No Failed Link Detection By Miimon
1864	--------------------------------------------------------
1865	
1866		By default, bonding enables the use_carrier option, which
1867	instructs bonding to trust the driver to maintain carrier state.
1868	
1869		As discussed in the options section, above, some drivers do
1870	not support the netif_carrier_on/_off link state tracking system.
1871	With use_carrier enabled, bonding will always see these links as up,
1872	regardless of their actual state.
1873	
1874		Additionally, other drivers do support netif_carrier, but do
1875	not maintain it in real time, e.g., only polling the link state at
1876	some fixed interval.  In this case, miimon will detect failures, but
1877	only after some long period of time has expired.  If it appears that
1878	miimon is very slow in detecting link failures, try specifying
1879	use_carrier=0 to see if that improves the failure detection time.  If
1880	it does, then it may be that the driver checks the carrier state at a
1881	fixed interval, but does not cache the MII register values (so the
1882	use_carrier=0 method of querying the registers directly works).  If
1883	use_carrier=0 does not improve the failover, then the driver may cache
1884	the registers, or the problem may be elsewhere.
1885	
1886		Also, remember that miimon only checks for the device's
1887	carrier state.  It has no way to determine the state of devices on or
1888	beyond other ports of a switch, or if a switch is refusing to pass
1889	traffic while still maintaining carrier on.
1890	
1891	9. SNMP agents
1892	==============
1893	
1894		If running SNMP agents, the bonding driver should be loaded
1895	before any network drivers participating in a bond.  This requirement
1896	is due to the interface index (ipAdEntIfIndex) being associated with
1897	the first interface found with a given IP address.  That is, there is
1898	only one ipAdEntIfIndex for each IP address.  For example, if eth0 and
1899	eth1 are slaves of bond0 and the driver for eth0 is loaded before the
1900	bonding driver, the interface for the IP address will be associated
1901	with the eth0 interface.  This configuration is shown below; the IP
1902	address 192.168.1.1 has an interface index of 2 which indexes to eth0
1903	in the ifDescr table (ifDescr.2).
1904	
1905	     interfaces.ifTable.ifEntry.ifDescr.1 = lo
1906	     interfaces.ifTable.ifEntry.ifDescr.2 = eth0
1907	     interfaces.ifTable.ifEntry.ifDescr.3 = eth1
1908	     interfaces.ifTable.ifEntry.ifDescr.4 = eth2
1909	     interfaces.ifTable.ifEntry.ifDescr.5 = eth3
1910	     interfaces.ifTable.ifEntry.ifDescr.6 = bond0
1911	     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.10.10.10 = 5
1912	     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.192.168.1.1 = 2
1913	     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.74.20.94 = 4
1914	     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1
1915	
1916		This problem is avoided by loading the bonding driver before
1917	any network drivers participating in a bond.  Below is an example of
1918	loading the bonding driver first; the IP address 192.168.1.1 is
1919	correctly associated with ifDescr.2.
1920	
1921	     interfaces.ifTable.ifEntry.ifDescr.1 = lo
1922	     interfaces.ifTable.ifEntry.ifDescr.2 = bond0
1923	     interfaces.ifTable.ifEntry.ifDescr.3 = eth0
1924	     interfaces.ifTable.ifEntry.ifDescr.4 = eth1
1925	     interfaces.ifTable.ifEntry.ifDescr.5 = eth2
1926	     interfaces.ifTable.ifEntry.ifDescr.6 = eth3
1927	     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.10.10.10 = 6
1928	     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.192.168.1.1 = 2
1929	     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.74.20.94 = 5
1930	     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1
1931	
1932		While some distributions may not report the interface name in
1933	ifDescr, the association between the IP address and IfIndex remains
1934	and SNMP functions such as Interface_Scan_Next will report that
1935	association.
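
	Whether the load order came out as intended can be checked by
comparing interface indexes in sysfs.  The following sketch reports
any slave whose index is lower than that of its bond; the function
name check_ifindex_order and the SYSFS_NET override are assumptions
made for this example.

```shell
#!/bin/sh
# Sketch: report slaves that were registered before their bond.
# Usage: check_ifindex_order BOND SLAVE [SLAVE...]
# Returns 0 if the bond precedes all slaves, 1 otherwise.
check_ifindex_order() (
    root=${SYSFS_NET:-/sys/class/net}
    bond=$1; shift
    bond_idx=$(cat "$root/$bond/ifindex") || exit 2
    status=0
    for slave in "$@"; do
        idx=$(cat "$root/$slave/ifindex") || exit 2
        if [ "$idx" -lt "$bond_idx" ]; then
            echo "$slave (ifindex $idx) precedes $bond (ifindex $bond_idx)"
            status=1
        fi
    done
    exit "$status"
)
```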
1936	
1937	10. Promiscuous mode
1938	====================
1939	
1940		When running network monitoring tools, e.g., tcpdump, it is
1941	common to enable promiscuous mode on the device, so that all traffic
1942	is seen (instead of seeing only traffic destined for the local host).
1943	The bonding driver handles promiscuous mode changes to the bonding
1944	master device (e.g., bond0), and propagates the setting to the slave
1945	devices.
1946	
1947		For the balance-rr, balance-xor, broadcast, and 802.3ad modes,
1948	the promiscuous mode setting is propagated to all slaves.
1949	
1950		For the active-backup, balance-tlb and balance-alb modes, the
1951	promiscuous mode setting is propagated only to the active slave.
1952	
1953		For balance-tlb mode, the active slave is the slave currently
1954	receiving inbound traffic.
1955	
1956		For balance-alb mode, the active slave is the slave used as a
1957	"primary."  This slave is used for mode-specific control traffic, for
1958	sending to peers that are unassigned or if the load is unbalanced.
1959	
1960		For the active-backup, balance-tlb and balance-alb modes, when
1961	the active slave changes (e.g., due to a link failure), the
1962	promiscuous setting will be propagated to the new active slave.
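
	To see which slaves actually received the setting, the
interface flags word in sysfs can be tested for IFF_PROMISC (0x100).
The sketch below is illustrative; the function name is_promisc and the
SYSFS_NET override are assumptions made for this example.

```shell
#!/bin/sh
# Sketch: succeed if the named interface has IFF_PROMISC (0x100) set
# in its sysfs flags word.
# SYSFS_NET is a hypothetical override for the sysfs network directory.
is_promisc() (
    flags=$(cat "${SYSFS_NET:-/sys/class/net}/$1/flags") || exit 2
    [ $(( $flags & 0x100 )) -ne 0 ]
)

# Example use:
#   for dev in bond0 eth0 eth1; do
#       is_promisc "$dev" && echo "$dev: promiscuous"
#   done
```

	In active-backup, balance-tlb and balance-alb modes, only the
active slave is expected to be listed alongside the bond.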
1963	
1964	11. Configuring Bonding for High Availability
1965	=============================================
1966	
1967		High Availability refers to configurations that provide
1968	maximum network availability by having redundant or backup devices,
1969	links or switches between the host and the rest of the world.  The
1970	goal is to provide the maximum availability of network connectivity
1971	(i.e., the network always works), even though other configurations
1972	could provide higher throughput.
1973	
1974	11.1 High Availability in a Single Switch Topology
1975	--------------------------------------------------
1976	
1977		If two hosts (or a host and a single switch) are directly
1978	connected via multiple physical links, then there is no availability
1979	penalty to optimizing for maximum bandwidth.  In this case, there is
1980	only one switch (or peer), so if it fails, there is no alternative
1981	access to fail over to.  Additionally, the bonding load balance modes
1982	support link monitoring of their members, so if individual links fail,
1983	the load will be rebalanced across the remaining devices.
1984	
1985		See Section 12, "Configuring Bonding for Maximum Throughput"
1986	for information on configuring bonding with one peer device.
1987	
1988	11.2 High Availability in a Multiple Switch Topology
1989	----------------------------------------------------
1990	
1991		With multiple switches, the configuration of bonding and the
1992	network changes dramatically.  In multiple switch topologies, there is
1993	a trade off between network availability and usable bandwidth.
1994	
1995		Below is a sample network, configured to maximize the
1996	availability of the network:
1997	
1998	                |                                     |
1999	                |port3                           port3|
2000	          +-----+----+                          +-----+----+
2001	          |          |port2       ISL      port2|          |
2002	          | switch A +--------------------------+ switch B |
2003	          |          |                          |          |
2004	          +-----+----+                          +-----+----+
2005	                |port1                           port1|
2006	                |             +-------+               |
2007	                +-------------+ host1 +---------------+
2008	                         eth0 +-------+ eth1
2009	
2010		In this configuration, there is a link between the two
2011	switches (ISL, or inter switch link), and multiple ports connecting to
2012	the outside world ("port3" on each switch).  There is no technical
2013	reason that this could not be extended to a third switch.
2014	
2015	11.2.1 HA Bonding Mode Selection for Multiple Switch Topology
2016	-------------------------------------------------------------
2017	
2018		In a topology such as the example above, the active-backup and
2019	broadcast modes are the only useful bonding modes when optimizing for
2020	availability; the other modes require all links to terminate on the
2021	same peer for them to behave rationally.
2022	
2023	active-backup: This is generally the preferred mode, particularly if
2024		the switches have an ISL and play together well.  If the
2025		network configuration is such that one switch is specifically
2026		a backup switch (e.g., has lower capacity, higher cost, etc),
2027		then the primary option can be used to ensure that the
2028		preferred link is always used when it is available.
2029	
2030	broadcast: This mode is really a special purpose mode, and is suitable
2031		only for very specific needs.  For example, if the two
2032		switches are not connected (no ISL), and the networks beyond
2033		them are totally independent.  In this case, if it is
2034		necessary for some specific one-way traffic to reach both
2035		independent networks, then the broadcast mode may be suitable.
2036	
2037	11.2.2 HA Link Monitoring Selection for Multiple Switch Topology
2038	----------------------------------------------------------------
2039	
2040		The choice of link monitoring ultimately depends upon your
2041	switch.  If the switch can reliably fail ports in response to other
2042	failures, then either the MII or ARP monitors should work.  For
2043	instance, in the above example, if the "port3" link fails at the remote
2044	end, the MII monitor has no direct means to detect this.  The ARP
2045	monitor could be configured with a target at the remote end of port3,
2046	thus detecting that failure without switch support.
2047	
2048		In general, however, in a multiple switch topology, the ARP
2049	monitor can provide a higher level of reliability in detecting end to
2050	end connectivity failures (which may be caused by the failure of any
2051	individual component to pass traffic for any reason).  Additionally,
2052	the ARP monitor should be configured with multiple targets (at least
2053	one for each switch in the network).  This will ensure that,
2054	regardless of which switch is active, the ARP monitor has a suitable
2055	target to query.
2056	
2057		Note, also, that many switches now support a functionality
2058	generally referred to as "trunk failover."  This is a feature of the
2059	switch that causes the link state of a particular switch port to be set
2060	down (or up) when the state of another switch port goes down (or up).
2061	Its purpose is to propagate link failures from logically "exterior" ports
2062	to the logically "interior" ports that bonding is able to monitor via
2063	miimon.  Availability and configuration for trunk failover varies by
2064	switch, but this can be a viable alternative to the ARP monitor when using
2065	suitable switches.
2066	
2067	12. Configuring Bonding for Maximum Throughput
2068	==============================================
2069	
2070	12.1 Maximizing Throughput in a Single Switch Topology
2071	------------------------------------------------------
2072	
2073		In a single switch configuration, the best method to maximize
2074	throughput depends upon the application and network environment.  The
2075	various load balancing modes each have strengths and weaknesses in
2076	different environments, as detailed below.
2077	
2078		For this discussion, we will break down the topologies into
2079	two categories.  Depending upon the destination of most traffic, we
2080	categorize them into either "gatewayed" or "local" configurations.
2081	
2082		In a gatewayed configuration, the "switch" is acting primarily
2083	as a router, and the majority of traffic passes through this router to
2084	other networks.  An example would be the following:
2085	
2086	
2087	     +----------+                     +----------+
2088	     |          |eth0            port1|          | to other networks
2089	     | Host A   +---------------------+ router   +------------------->
2090	     |          +---------------------+          | Hosts B and C are out
2091	     |          |eth1            port2|          | here somewhere
2092	     +----------+                     +----------+
2093	
2094		The router may be a dedicated router device, or another host
2095	acting as a gateway.  For our discussion, the important point is that
2096	the majority of traffic from Host A will pass through the router to
2097	some other network before reaching its final destination.
2098	
2099		In a gatewayed network configuration, although Host A may
2100	communicate with many other systems, all of its traffic will be sent
2101	and received via one other peer on the local network, the router.
2102	
2103		Note that the case of two systems connected directly via
2104	multiple physical links is, for purposes of configuring bonding, the
2105	same as a gatewayed configuration.  In that case, it happens that all
2106	traffic is destined for the "gateway" itself, not some other network
2107	beyond the gateway.
2108	
2109		In a local configuration, the "switch" is acting primarily as
2110	a switch, and the majority of traffic passes through this switch to
2111	reach other stations on the same network.  An example would be the
2112	following:
2113	
2114	    +----------+            +----------+       +--------+
2115	    |          |eth0   port1|          +-------+ Host B |
2116	    |  Host A  +------------+  switch  |port3  +--------+
2117	    |          +------------+          |                  +--------+
2118	    |          |eth1   port2|          +------------------+ Host C |
2119	    +----------+            +----------+port4             +--------+
2120	
2121	
2122		Again, the switch may be a dedicated switch device, or another
2123	host acting as a gateway.  For our discussion, the important point is
2124	that the majority of traffic from Host A is destined for other hosts
2125	on the same local network (Hosts B and C in the above example).
2126	
2127		In summary, in a gatewayed configuration, traffic to and from
2128	the bonded device will be to the same MAC level peer on the network
2129	(the gateway itself, i.e., the router), regardless of its final
2130	destination.  In a local configuration, traffic flows directly to and
2131	from the final destinations; thus, each destination (Host B, Host C)
2132	will be addressed directly by its individual MAC address.
2133	
2134		This distinction between a gatewayed and a local network
2135	configuration is important because many of the load balancing modes
2136	available use the MAC addresses of the local network source and
2137	destination to make load balancing decisions.  The behavior of each
2138	mode is described below.
2139	
2140	
2141	12.1.1 MT Bonding Mode Selection for Single Switch Topology
2142	-----------------------------------------------------------
2143	
2144		This configuration is the easiest to set up and to understand,
2145	although you will have to decide which bonding mode best suits your
2146	needs.  The trade offs for each mode are detailed below:
2147	
2148	balance-rr: This mode is the only mode that will permit a single
2149		TCP/IP connection to stripe traffic across multiple
2150		interfaces. It is therefore the only mode that will allow a
2151		single TCP/IP stream to utilize more than one interface's
2152		worth of throughput.  This comes at a cost, however: the
2153		striping generally results in peer systems receiving packets out
2154		of order, causing TCP/IP's congestion control system to kick
2155		in, often by retransmitting segments.
2156	
2157		It is possible to adjust TCP/IP's congestion limits by
2158		altering the net.ipv4.tcp_reordering sysctl parameter.  The
2159		usual default value is 3, and the maximum useful value is 127.
2160		For a four interface balance-rr bond, expect that a single
2161		TCP/IP stream will utilize no more than approximately 2.3
2162		interfaces' worth of throughput, even after adjusting
2163		tcp_reordering.
2164	
2165		Note that the fraction of packets that will be delivered out of
2166		order is highly variable, and is unlikely to be zero.  The level
2167		of reordering depends upon a variety of factors, including the
2168		networking interfaces, the switch, and the topology of the
2169		configuration.  Speaking in general terms, higher speed network
2170		cards produce more reordering (due to factors such as packet
2171		coalescing), and a "many to many" topology will reorder at a
2172		higher rate than a "many slow to one fast" configuration.
2173	
2174		Many switches do not support any modes that stripe traffic
2175		(instead choosing a port based upon IP or MAC level addresses);
2176		for those devices, traffic for a particular connection flowing
2177		through the switch to a balance-rr bond will not utilize greater
2178		than one interface's worth of bandwidth.
2179	
2180		If you are utilizing protocols other than TCP/IP, UDP for
2181		example, and your application can tolerate out of order
2182		delivery, then this mode can allow for single stream datagram
2183		performance that scales near linearly as interfaces are added
2184		to the bond.
2185	
2186		This mode requires the switch to have the appropriate ports
2187		configured for "etherchannel" or "trunking."
2188	
2189	active-backup: There is not much advantage in this network topology to
2190		the active-backup mode, as the inactive backup devices are all
2191		connected to the same peer as the primary.  In this case, a
2192		load balancing mode (with link monitoring) will provide the
2193		same level of network availability, but with increased
2194		available bandwidth.  On the plus side, active-backup mode
2195		does not require any configuration of the switch, so it may
2196		have value if the hardware available does not support any of
2197		the load balance modes.
2198	
2199	balance-xor: This mode will limit traffic such that packets destined
2200		for specific peers will always be sent over the same
2201		interface.  Since the destination is determined by the MAC
2202		addresses involved, this mode works best in a "local" network
2203		configuration (as described above), with destinations all on
2204		the same local network.  This mode is likely to be suboptimal
2205		if all your traffic is passed through a single router (i.e., a
2206		"gatewayed" network configuration, as described above).
2207	
2208		As with balance-rr, the switch ports need to be configured for
2209		"etherchannel" or "trunking."
2210	
2211	broadcast: Like active-backup, there is not much advantage to this
2212		mode in this type of network topology.
2213	
2214	802.3ad: This mode can be a good choice for this type of network
2215		topology.  The 802.3ad mode is an IEEE standard, so all peers
2216		that implement 802.3ad should interoperate well.  The 802.3ad
2217		protocol includes automatic configuration of the aggregates,
2218		so minimal manual configuration of the switch is needed
2219		(typically only to designate that some set of devices is
2220		available for 802.3ad).  The 802.3ad standard also mandates
2221		that frames be delivered in order (within certain limits), so
2222		in general single connections will not see misordering of
2223		packets.  The 802.3ad mode does have some drawbacks: the
2224		standard mandates that all devices in the aggregate operate at
2225		the same speed and duplex.  Also, as with all bonding load
2226		balance modes other than balance-rr, no single connection will
2227		be able to utilize more than a single interface's worth of
2228		bandwidth.  
2229	
2230		Additionally, the Linux bonding 802.3ad implementation
2231		distributes traffic by peer (using an XOR of MAC addresses),
2232		so in a "gatewayed" configuration, all outgoing traffic will
2233		generally use the same device.  Incoming traffic may also end
2234		up on a single device, but that is dependent upon the
2235		balancing policy of the peer's 802.3ad implementation.  In a
2236		"local" configuration, traffic will be distributed across the
2237		devices in the bond.
2238	
2239		Finally, the 802.3ad mode mandates the use of the MII monitor;
2240		therefore, the ARP monitor is not available in this mode.
2241	
2242	balance-tlb: The balance-tlb mode balances outgoing traffic by peer.
2243		Since the balancing is done according to MAC address, in a
2244		"gatewayed" configuration (as described above), this mode will
2245		send all traffic across a single device.  However, in a
2246		"local" network configuration, this mode balances multiple
2247		local network peers across devices in a vaguely intelligent
2248		manner (not a simple XOR as in balance-xor or 802.3ad mode),
2249		so that mathematically unlucky MAC addresses (i.e., ones that
2250		XOR to the same value) will not all "bunch up" on a single
2251		interface.
2252	
2253		Unlike 802.3ad, interfaces may be of differing speeds, and no
2254		special switch configuration is required.  On the down side,
2255		this mode has three restrictions: all incoming traffic arrives
2256		over a single interface, the network device driver of the slave
2257		interfaces must provide certain ethtool support, and the ARP
2258		monitor is not available.
2259	
2260	balance-alb: This mode is everything that balance-tlb is, and more.
2261		It has all of the features (and restrictions) of balance-tlb,
2262		and will also balance incoming traffic from local network
2263		peers (as described in the Bonding Module Options section,
2264		above).
2265	
2266		The only additional down side to this mode is that the network
2267		device driver must support changing the hardware address while
2268		the device is open.
2269	
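For the balance-rr case described above, the tcp_reordering adjustment
might be approached as in the following sketch.  The helper name
current_reordering and the REORDER_FILE variable are hypothetical; the
value 127 is the maximum useful value quoted above, and whether it
suits a given workload is left to the administrator.

```shell
#!/bin/sh
# Path to the live sysctl value; overridable so the helper can be tried
# against a scratch file (REORDER_FILE is a hypothetical variable name).
REORDER_FILE=${REORDER_FILE:-/proc/sys/net/ipv4/tcp_reordering}

# Print the current threshold, or "unknown" if the file is unreadable.
current_reordering() {
	cat "$REORDER_FILE" 2>/dev/null || echo unknown
}

echo "net.ipv4.tcp_reordering is currently: $(current_reordering)"

# To raise the threshold for a balance-rr bond (requires root):
#   sysctl -w net.ipv4.tcp_reordering=127
```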
2270	12.1.2 MT Link Monitoring for Single Switch Topology
2271	----------------------------------------------------
2272	
2273		The choice of link monitoring may largely depend upon which
2274	mode you choose to use.  The more advanced load balancing modes do not
2275	support the use of the ARP monitor, and are thus restricted to using
2276	the MII monitor (which does not provide as high a level of end to end
2277	assurance as the ARP monitor).
2278	
2279	12.2 Maximum Throughput in a Multiple Switch Topology
2280	-----------------------------------------------------
2281	
2282		Multiple switches may be utilized to optimize for throughput
2283	when they are configured in parallel as part of an isolated network
2284	between two or more systems, for example:
2285	
2286	                       +-----------+
2287	                       |  Host A   | 
2288	                       +-+---+---+-+
2289	                         |   |   |
2290	                +--------+   |   +---------+
2291	                |            |             |
2292	         +------+---+  +-----+----+  +-----+----+
2293	         | Switch A |  | Switch B |  | Switch C |
2294	         +------+---+  +-----+----+  +-----+----+
2295	                |            |             |
2296	                +--------+   |   +---------+
2297	                         |   |   |
2298	                       +-+---+---+-+
2299	                       |  Host B   | 
2300	                       +-----------+
2301	
2302		In this configuration, the switches are isolated from one
2303	another.  One reason to employ a topology such as this is cost: for an
2304	isolated network with many hosts (a cluster configured for high
2305	performance, for example), multiple smaller switches can be more
2306	cost effective than a single larger switch.  For example, on a network
2307	with 24 hosts, three 24 port switches can be significantly less
2308	expensive than a single 72 port switch.
2309	
2310		If access beyond the network is required, an individual host
2311	can be equipped with an additional network device connected to an
2312	external network; this host then additionally acts as a gateway.
2313	
2314	12.2.1 MT Bonding Mode Selection for Multiple Switch Topology
2315	-------------------------------------------------------------
2316	
2317		In actual practice, the bonding mode typically employed in
2318	configurations of this type is balance-rr.  Historically, in this
2319	network configuration, the usual caveats about out of order packet
2320	delivery are mitigated by the use of network adapters that do not do
2321	any kind of packet coalescing (whether via NAPI or because the
2322	device itself does not generate interrupts until some number of
2323	packets has arrived).  When employed in this fashion, the balance-rr
2324	mode allows individual connections between two hosts to effectively
2325	utilize greater than one interface's bandwidth.
2326	
2327	12.2.2 MT Link Monitoring for Multiple Switch Topology
2328	------------------------------------------------------
2329	
2330		Again, in actual practice, the MII monitor is most often used
2331	in this configuration, as performance is given preference over
2332	availability.  The ARP monitor will function in this topology, but its
2333	advantages over the MII monitor are mitigated by the volume of probes
2334	needed as the number of systems involved grows (remember that each
2335	host in the network is configured with bonding).
2336	
2337	13. Switch Behavior Issues
2338	==========================
2339	
2340	13.1 Link Establishment and Failover Delays
2341	-------------------------------------------
2342	
2343		Some switches exhibit undesirable behavior with regard to the
2344	timing of link up and down reporting by the switch.
2345	
2346		First, when a link comes up, some switches may indicate that
2347	the link is up (carrier available), but not pass traffic over the
2348	interface for some period of time.  This delay is typically due to
2349	some type of autonegotiation or routing protocol, but may also occur
2350	during switch initialization (e.g., during recovery after a switch
2351	failure).  If you find this to be a problem, specify an appropriate
2352	value to the updelay bonding module option to delay the use of the
2353	relevant interface(s).
2354	
2355		Second, some switches may "bounce" the link state one or more
2356	times while a link is changing state.  This occurs most commonly while
2357	the switch is initializing.  Again, an appropriate updelay value may
2358	help.
2359	
2360		Note that when a bonding interface has no active links, the
2361	driver will immediately reuse the first link that goes up, even if the
2362	updelay parameter has been specified (the updelay is ignored in this
2363	case).  If there are slave interfaces waiting for the updelay timeout
2364	to expire, the interface that first went into that state will be
2365	immediately reused.  This reduces down time of the network if the
2366	value of updelay has been overestimated, and since this occurs only in
2367	cases with no connectivity, there is no additional penalty for
2368	ignoring the updelay.
2369	
2370		In addition to the concerns about switch timings, if your
2371	switches take a long time to go into backup mode, it may be desirable
2372	to not activate a backup interface immediately after a link goes down.
2373	Failover may be delayed via the downdelay bonding module option.
2374	
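The delays discussed above could be combined into a module options
line as in the sketch below.  It assumes the rule given in the module
options section of this document, namely that updelay and downdelay
are rounded down to a multiple of miimon.  The helper name
effective_delay and all of the values are hypothetical.

```shell
#!/bin/sh
# Mirror the rounding rule assumed above: a delay that is not a
# multiple of miimon is rounded down to the nearest multiple.
effective_delay() {	# usage: effective_delay <delay ms> <miimon ms>
	echo $(( $1 / $2 * $2 ))
}

MIIMON=100					# polling interval, in ms
UPDELAY=$(effective_delay 30000 $MIIMON)	# switch is slow to forward
DOWNDELAY=$(effective_delay 250 $MIIMON)	# rounds down to 200

# Print an options line suitable for a modprobe configuration file.
echo "options bonding miimon=$MIIMON updelay=$UPDELAY downdelay=$DOWNDELAY"
```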
2375	13.2 Duplicated Incoming Packets
2376	--------------------------------
2377	
2378		NOTE: Starting with version 3.0.2, the bonding driver has logic to
2379	suppress duplicate packets, which should largely eliminate this problem.
2380	The following description is kept for reference.
2381	
2382		It is not uncommon to observe a short burst of duplicated
2383	traffic when the bonding device is first used, or after it has been
2384	idle for some period of time.  This is most easily observed by issuing
2385	a "ping" to some other host on the network, and noticing that the
2386	output from ping flags duplicates (typically one per slave).
2387	
2388		For example, on a bond in active-backup mode with five slaves
2389	all connected to one switch, the output may appear as follows:
2390	
2391	# ping -n 10.0.4.2
2392	PING 10.0.4.2 (10.0.4.2) from 10.0.3.10 : 56(84) bytes of data.
2393	64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.7 ms
2394	64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)
2395	64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)
2396	64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)
2397	64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)
2398	64 bytes from 10.0.4.2: icmp_seq=2 ttl=64 time=0.216 ms
2399	64 bytes from 10.0.4.2: icmp_seq=3 ttl=64 time=0.267 ms
2400	64 bytes from 10.0.4.2: icmp_seq=4 ttl=64 time=0.222 ms
2401	
2402		This is not due to an error in the bonding driver; rather, it
2403	is a side effect of how many switches update their MAC forwarding
2404	tables.  Initially, the switch does not associate the MAC address in
2405	the packet with a particular switch port, and so it may send the
2406	traffic to all ports until its MAC forwarding table is updated.  Since
2407	the interfaces attached to the bond may occupy multiple ports on a
2408	single switch, when the switch (temporarily) floods the traffic to all
2409	ports, the bond device receives multiple copies of the same packet
2410	(one per slave device).
2411	
2412		The duplicated packet behavior is switch dependent; some
2413	switches exhibit this, and some do not.  On switches that display this
2414	behavior, it can be induced by clearing the MAC forwarding table (on
2415	most Cisco switches, the privileged command "clear mac address-table
2416	dynamic" will accomplish this).
2417	
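To check whether a particular switch exhibits this behavior, one
option is to count the replies that ping flags as duplicates.  The
following is a minimal sketch; count_dups is a hypothetical helper
name, and the sample input is the ping output shown above.

```shell
#!/bin/sh
# Count the replies that ping marked as duplicates.  Reads ping output
# on standard input.
count_dups() {
	grep -c '(DUP!)' || true	# grep exits non-zero when the count is 0
}

# Sample input taken from the ping output shown above.
count_dups <<'EOF'
64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.7 ms
64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)
64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)
64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)
64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)
64 bytes from 10.0.4.2: icmp_seq=2 ttl=64 time=0.216 ms
64 bytes from 10.0.4.2: icmp_seq=3 ttl=64 time=0.267 ms
64 bytes from 10.0.4.2: icmp_seq=4 ttl=64 time=0.222 ms
EOF
```

On a live system the same helper can be fed directly, for example
"ping -c 4 -n 10.0.4.2 | count_dups".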
2418	14. Hardware Specific Considerations
2419	====================================
2420	
2421		This section contains additional information for configuring
2422	bonding on specific hardware platforms, or for interfacing bonding
2423	with particular switches or other devices.
2424	
2425	14.1 IBM BladeCenter
2426	--------------------
2427	
2428		This applies to the JS20 and similar systems.
2429	
2430		On the JS20 blades, the bonding driver supports only
2431	balance-rr, active-backup, balance-tlb and balance-alb modes.  This is
2432	largely due to the network topology inside the BladeCenter, detailed
2433	below.
2434	
2435	JS20 network adapter information
2436	--------------------------------
2437	
2438		All JS20s come with two Broadcom Gigabit Ethernet ports
2439	integrated on the planar (that's "motherboard" in IBM-speak).  In the
2440	BladeCenter chassis, the eth0 port of all JS20 blades is hard wired to
2441	I/O Module #1; similarly, all eth1 ports are wired to I/O Module #2.
2442	An add-on Broadcom daughter card can be installed on a JS20 to provide
2443	two more Gigabit Ethernet ports.  These ports, eth2 and eth3, are
2444	wired to I/O Modules 3 and 4, respectively.
2445	
2446		Each I/O Module may contain either a switch or a passthrough
2447	module (which allows ports to be directly connected to an external
2448	switch).  Some bonding modes require a specific BladeCenter internal
2449	network topology in order to function; these are detailed below.
2450	
2451		Additional BladeCenter-specific networking information can be
2452	found in two IBM Redbooks (www.ibm.com/redbooks):
2453	
2454	"IBM eServer BladeCenter Networking Options"
2455	"IBM eServer BladeCenter Layer 2-7 Network Switching"
2456	
2457	BladeCenter networking configuration
2458	------------------------------------
2459	
2460		Because a BladeCenter can be configured in a very large number
2461	of ways, this discussion will be confined to describing basic
2462	configurations.
2463	
2464		Normally, Ethernet Switch Modules (ESMs) are used in I/O
2465	modules 1 and 2.  In this configuration, the eth0 and eth1 ports of a
2466	JS20 will be connected to different internal switches (in the
2467	respective I/O modules).
2468	
2469		A passthrough module (OPM or CPM, for optical or copper
2470	passthrough module) connects the I/O module directly to an external
2471	switch.  By using PMs in I/O module #1 and #2, the eth0 and eth1
2472	interfaces of a JS20 can be redirected to the outside world and
2473	connected to a common external switch.
2474	
2475		Depending upon the mix of ESMs and PMs, the network will
2476	appear to bonding as either a single switch topology (all PMs) or as a
2477	multiple switch topology (one or more ESMs, zero or more PMs).  It is
2478	also possible to connect ESMs together, resulting in a configuration
2479	much like the example in "High Availability in a Multiple Switch
2480	Topology," above.
2481	
2482	Requirements for specific modes
2483	-------------------------------
2484	
2485		The balance-rr mode requires the use of passthrough modules
2486	for devices in the bond, all connected to a common external switch.
2487	That switch must be configured for "etherchannel" or "trunking" on the
2488	appropriate ports, as is usual for balance-rr.
2489	
2490		The balance-alb and balance-tlb modes will function with
2491	either switch modules or passthrough modules (or a mix).  The only
2492	specific requirement for these modes is that all network interfaces
2493	must be able to reach all destinations for traffic sent over the
2494	bonding device (i.e., the network must converge at some point outside
2495	the BladeCenter).
2496	
2497		The active-backup mode has no additional requirements.
2498	
2499	Link monitoring issues
2500	----------------------
2501	
2502		When an Ethernet Switch Module is in place, only the ARP
2503	monitor will reliably detect link loss to an external switch.  This is
2504	nothing unusual, but examination of the BladeCenter cabinet would
2505	suggest that the "external" network ports are the ethernet ports for
2506	the system, when in fact there is a switch between these "external"
2507	ports and the devices on the JS20 system itself.  The MII monitor is
2508	only able to detect link failures between the ESM and the JS20 system.
2509	
2510		When a passthrough module is in place, the MII monitor does
2511	detect failures to the "external" port, which is then directly
2512	connected to the JS20 system.
2513	
2514	Other concerns
2515	--------------
2516	
2517		The Serial Over LAN (SoL) link is established over the primary
2518	ethernet (eth0) only; therefore, any loss of link to eth0 will result
2519	in losing your SoL connection.  It will not fail over with other
2520	network traffic, as the SoL system is beyond the control of the
2521	bonding driver.
2522	
2523		It may be desirable to disable spanning tree on the switch
2524	(either the internal Ethernet Switch Module, or an external switch) to
2525	avoid fail-over delay issues when using bonding.
2526	
2527		
2528	15. Frequently Asked Questions
2529	==============================
2530	
2531	1.  Is it SMP safe?
2532	
2533		Yes. The old 2.0.xx channel bonding patch was not SMP safe.
2534	The new driver was designed to be SMP safe from the start.
2535	
2536	2.  What type of cards will work with it?
2537	
2538		Any Ethernet type cards (you can even mix cards - an Intel
2539	EtherExpress PRO/100 and a 3com 3c905b, for example).  For most modes,
2540	devices need not be of the same speed.
2541	
2542		Starting with version 3.2.1, bonding also supports Infiniband
2543	slaves in active-backup mode.
2544	
2545	3.  How many bonding devices can I have?
2546	
2547		There is no limit.
2548	
2549	4.  How many slaves can a bonding device have?
2550	
2551		This is limited only by the number of network interfaces Linux
2552	supports and/or the number of network cards you can place in your
2553	system.
2554	
2555	5.  What happens when a slave link dies?
2556	
2557		If link monitoring is enabled, then the failing device will be
2558	disabled.  The active-backup mode will fail over to a backup link, and
2559	other modes will ignore the failed link.  The link will continue to be
2560	monitored, and should it recover, it will rejoin the bond (in whatever
2561	manner is appropriate for the mode). See the sections on High
2562	Availability and the documentation for each mode for additional
2563	information.
2564		
2565		Link monitoring can be enabled via either the miimon or
2566	arp_interval parameters (described in the module parameters section,
2567	above).  In general, miimon monitors the carrier state as sensed by
2568	the underlying network device, and the arp monitor (arp_interval)
2569	monitors connectivity to another host on the local network.
2570	
2571		If no link monitoring is configured, the bonding driver will
2572	be unable to detect link failures, and will assume that all links are
2573	always available.  This will likely result in lost packets, and a
2574	resulting degradation of performance.  The precise performance loss
2575	depends upon the bonding mode and network configuration.
2576	
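To see which slaves the link monitor currently considers failed, the
bonding status file can be scanned as in this sketch.  The helper name
failed_slaves is hypothetical, and the file layout ("Slave Interface:"
followed by "MII Status:") is assumed to match the
/proc/net/bonding/bondX output described earlier in this document.

```shell
#!/bin/sh
# List slaves whose "MII Status" is not "up" in a bonding status file.
# The bond-level "MII Status" line precedes the first "Slave Interface"
# line, so it is skipped while no slave name has been seen.
failed_slaves() {	# usage: failed_slaves <status file>
	awk '
		/^Slave Interface:/ { slave = $3 }
		/^MII Status:/ && slave != "" && $3 != "up" { print slave }
	' "$1"
}

# Check bond0 if it exists on this system.
if [ -r /proc/net/bonding/bond0 ]; then
	failed_slaves /proc/net/bonding/bond0
fi
```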
2577	6.  Can bonding be used for High Availability?
2578	
2579		Yes.  See the section on High Availability for details.
2580	
2581	7.  Which switches/systems does it work with?
2582	
2583		The full answer to this depends upon the desired mode.
2584	
2585		In the basic balance modes (balance-rr and balance-xor), it
2586	works with any system that supports etherchannel (also called
2587	trunking).  Most managed switches currently available have such
2588	support, and many unmanaged switches as well.
2589	
2590		The advanced balance modes (balance-tlb and balance-alb) do
2591	not have special switch requirements, but do need device drivers that
2592	support specific features (described in the appropriate section under
2593	module parameters, above).
2594	
2595		In 802.3ad mode, it works with systems that support IEEE
2596	802.3ad Dynamic Link Aggregation.  Most managed and many unmanaged
2597	switches currently available support 802.3ad.
2598	
2599		The active-backup mode should work with any Layer-II switch.
2600	
2601	8.  Where does a bonding device get its MAC address from?
2602	
2603		When using slave devices that have fixed MAC addresses, or when
2604	the fail_over_mac option is enabled, the bonding device's MAC address is
2605	the MAC address of the active slave.
2606	
2607		For other configurations, if not explicitly configured (with
2608	ifconfig or ip link), the MAC address of the bonding device is taken from
2609	its first slave device.  This MAC address is then passed to all following
2610	slaves and remains persistent (even if the first slave is removed) until
2611	the bonding device is brought down or reconfigured.
2612	
2613		If you wish to change the MAC address, you can set it with
2614	ifconfig or ip link:
2615	
2616	# ifconfig bond0 hw ether 00:11:22:33:44:55
2617	
2618	# ip link set bond0 address 66:77:88:99:aa:bb
2619	
2620		The MAC address can also be changed by bringing down/up the
2621	device and then changing its slaves (or their order):
2622	
2623	# ifconfig bond0 down ; modprobe -r bonding
2624	# ifconfig bond0 .... up
2625	# ifenslave bond0 eth...
2626	
2627		This method will automatically take the address from the next
2628	slave that is added.
2629	
2630		To restore your slaves' MAC addresses, you need to detach them
2631	from the bond (`ifenslave -d bond0 eth0'). The bonding driver will
2632	then restore the MAC addresses that the slaves had before they were
2633	enslaved.
2634	
2635	16. Resources and Links
2636	=======================
2637	
2638		The latest version of the bonding driver can be found in the latest
2639	version of the linux kernel, found on http://kernel.org
2640	
2641		The latest version of this document can be found in the latest kernel
2642	source (named Documentation/networking/bonding.txt).
2643	
2644		Discussions regarding the usage of the bonding driver take place on the
2645	bonding-devel mailing list, hosted at sourceforge.net. If you have questions or
2646	problems, post them to the list.  The list address is:
2647	
2648	bonding-devel@lists.sourceforge.net
2649	
2650		The administrative interface (to subscribe or unsubscribe) can
2651	be found at:
2652	
2653	https://lists.sourceforge.net/lists/listinfo/bonding-devel
2654	
2655		Discussions regarding the development of the bonding driver take place
2656	on the main Linux network mailing list, hosted at vger.kernel.org. The list
2657	address is:
2658	
2659	netdev@vger.kernel.org
2660	
2661		The administrative interface (to subscribe or unsubscribe) can
2662	be found at:
2663	
2664	http://vger.kernel.org/vger-lists.html#netdev
2665	
2666	Donald Becker's Ethernet Drivers and diag programs may be found at:
2667	 - http://web.archive.org/web/*/http://www.scyld.com/network/ 
2668	
2669	You will also find a lot of information regarding Ethernet, NWay, MII,
2670	etc. at www.scyld.com.
2671	
2672	-- END --