Based on kernel version 4.16.1. Page generated on 2018-04-09 11:53 EST.

                 Chelsio N210 10Gb Ethernet Network Controller

                         Driver Release Notes for Linux

                                 Version 2.1.1

                                 June 20, 2005

CONTENTS
========
 INTRODUCTION
 FEATURES
 PERFORMANCE
 DRIVER MESSAGES
 KNOWN ISSUES
 SUPPORT


INTRODUCTION
============

 This document describes the Linux driver for the Chelsio 10Gb Ethernet
 Network Controller. This driver supports the Chelsio N210 NIC and is
 backward compatible with the Chelsio N110 model 10Gb NICs.


FEATURES
========

 Adaptive Interrupts (adaptive-rx)
 ---------------------------------

  This feature provides an adaptive algorithm that adjusts the interrupt
  coalescing parameters, allowing the driver to dynamically adapt the latency
  settings to achieve the highest performance under various types of network
  load.

  This feature is controlled with ethtool. Please see the ethtool manpage for
  additional usage information.

  By default, adaptive-rx is disabled.
  To enable adaptive-rx:

      ethtool -C <interface> adaptive-rx on

  To disable adaptive-rx:

      ethtool -C <interface> adaptive-rx off

  After adaptive-rx is disabled, the timer latency value is set to 50us.
  You may then set a different timer latency:

      ethtool -C <interface> rx-usecs <microseconds>

  For example, to set the timer latency to 100us on eth0:

      ethtool -C eth0 rx-usecs 100

  You may also provide a timer latency value while disabling adaptive-rx:

      ethtool -C <interface> adaptive-rx off rx-usecs <microseconds>

  If adaptive-rx is disabled and a timer latency value is specified, the timer
  keeps the specified value until it is changed by the user or until
  adaptive-rx is enabled.

  To view the current adaptive-rx setting and timer latency value:

      ethtool -c <interface>
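  The same steps can be wrapped in a small shell sketch. It assumes a POSIX
  sh and "ethtool -c" output containing "Adaptive RX:" and "rx-usecs:" lines
  in the layout shown in the comments; set_rx_latency and show_rx_coalesce
  are hypothetical helper names, not part of the driver or of ethtool.

```shell
# Hypothetical helpers for illustration; not part of the driver or ethtool.

# Disable adaptive-rx and set a fixed interrupt timer latency in one call.
# $1 = interface (e.g. eth0), $2 = timer latency in microseconds.
set_rx_latency() {
    ethtool -C "$1" adaptive-rx off rx-usecs "$2"
}

# Summarize "ethtool -c <interface>" output read on stdin, assuming lines like
#   Adaptive RX: off  TX: off
#   rx-usecs: 100
# The "^rx-usecs:" pattern does not match rx-usecs-irq, rx-usecs-low, etc.
show_rx_coalesce() {
    awk '
        /^Adaptive RX:/ { adaptive = $3 }
        /^rx-usecs:/    { usecs = $2 }
        END { printf "adaptive-rx=%s rx-usecs=%s\n", adaptive, usecs }
    '
}
```

  For example, after "set_rx_latency eth0 100", the command
  "ethtool -c eth0 | show_rx_coalesce" should report adaptive-rx=off and
  rx-usecs=100 if the driver accepted the new values.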


 TCP Segmentation Offloading (TSO) Support
 -----------------------------------------

  This feature, also known as "large send", enables a system's protocol stack
  to offload portions of outbound TCP processing to a network interface card,
  thereby reducing system CPU utilization and enhancing performance.

  This feature is controlled with ethtool version 1.8 or higher. Please see
  the ethtool manpage for additional usage information.

  By default, TSO is enabled.
  To disable TSO:

      ethtool -K <interface> tso off

  To enable TSO:

      ethtool -K <interface> tso on

  To view the status of TSO:

      ethtool -k <interface>
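  For scripts that need the current TSO state, a sketch that parses
  "ethtool -k" output. It assumes the feature line reads either
  "tcp segmentation offload: on" (older ethtool releases) or
  "tcp-segmentation-offload: on" (newer releases, possibly followed by
  "[fixed]"); tso_state and ensure_tso_on are hypothetical helper names.

```shell
# Hypothetical helpers for illustration; not part of the driver or ethtool.

# Print "on" or "off" from "ethtool -k <interface>" output read on stdin.
# Both spellings of the feature name are accepted; indented sub-feature
# lines such as "tx-tcp-segmentation: on" are skipped by the ^ anchor.
tso_state() {
    awk '/^tcp[- ]segmentation[- ]offload:/ {
        sub(/^[^:]*: */, "")   # drop the feature name and the colon
        print $1               # first word only: "on" or "off", not "[fixed]"
    }'
}

# Enable TSO only if it is currently not on. $1 = interface.
ensure_tso_on() {
    if [ "$(ethtool -k "$1" | tso_state)" != "on" ]; then
        ethtool -K "$1" tso on
    fi
}
```

  For example, "ethtool -k eth0 | tso_state" prints on or off, and
  "ensure_tso_on eth0" re-enables TSO only when it is off.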


PERFORMANCE
===========

 The following information is provided as an example of how to change system
 parameters for performance tuning and which values to use. You may or may
 not want to change these system parameters, depending on your
 server/workstation application. Doing so is not warranted in any way by
 Chelsio Communications, and is done at "YOUR OWN RISK". Chelsio will not be
 held responsible for loss of data or damage to equipment.

 Your distribution may have a different way of doing things, or you may prefer
 a different method. These commands are shown only to provide an example of
 what to do and are by no means definitive.

 Any of the following system changes will last only until you reboot your
 system. You may want to write a script that runs at boot-up and applies the
 optimal settings for your system.
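 One possible approach for the sysctl settings in the list that follows,
 assuming a distribution that applies /etc/sysctl.conf at boot (many do, and
 "sysctl -p" reloads it by hand), is to record them there instead of
 re-running "sysctl -w". The hypothetical helper to_sysctl_conf turns
 "key=value" lines, written as passed to "sysctl -w" but without shell
 quotes, into sysctl.conf syntax. The setpci and smp_affinity settings are
 not sysctls and still need a boot-up script of their own.

```shell
# Hypothetical helper for illustration.
# Convert "key=value" lines on stdin (the argument form used by "sysctl -w")
# into "key = value" lines for /etc/sysctl.conf. Everything after the first
# "=" is kept as the value, so multi-field values such as tcp_rmem stay
# intact. Blank lines are skipped.
to_sysctl_conf() {
    while IFS='=' read -r key value; do
        [ -n "$key" ] || continue
        printf '%s = %s\n' "$key" "$value"
    done
}
```

 For example, as root:

      printf '%s\n' net.core.rmem_max=1024000 \
          'net.ipv4.tcp_rmem=10000000 10000000 10000000' |
          to_sysctl_conf >> /etc/sysctl.conf
      sysctl -p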

  Setting PCI Latency Timer:
      setpci -d 1425:* 0x0c.l=0x0000F800

  Disabling TCP timestamps:
      sysctl -w net.ipv4.tcp_timestamps=0

  Disabling SACK:
      sysctl -w net.ipv4.tcp_sack=0

  Allowing a large number of pending incoming connection requests:
      sysctl -w net.ipv4.tcp_max_syn_backlog=3000

  Setting maximum receive socket buffer size:
      sysctl -w net.core.rmem_max=1024000

  Setting maximum send socket buffer size:
      sysctl -w net.core.wmem_max=1024000

  Setting smp_affinity (on a multiprocessor system) to a single CPU:
      echo 1 > /proc/irq/<interrupt_number>/smp_affinity

  Setting default receive socket buffer size:
      sysctl -w net.core.rmem_default=524287

  Setting default send socket buffer size:
      sysctl -w net.core.wmem_default=524287

  Setting maximum ancillary (option) memory buffer size per socket:
      sysctl -w net.core.optmem_max=524287

  Setting maximum input backlog (number of received packets queued before
  the kernel starts dropping them):
      sysctl -w net.core.netdev_max_backlog=300000

  Setting TCP read buffers (min/default/max, in bytes):
      sysctl -w net.ipv4.tcp_rmem="10000000 10000000 10000000"

  Setting TCP write buffers (min/default/max, in bytes):
      sysctl -w net.ipv4.tcp_wmem="10000000 10000000 10000000"

  Setting TCP buffer space (min/pressure/max, in pages, not bytes):
      sysctl -w net.ipv4.tcp_mem="10000000 10000000 10000000"

  TCP window size for single connections:
   The receive buffer (RX_WINDOW) size must be at least as large as the
   Bandwidth-Delay Product of the communication link between the sender and
   receiver. Due to variations in RTT, you may want to increase the buffer
   size up to 2 times the Bandwidth-Delay Product. See page 289 of
   "TCP/IP Illustrated, Volume 1, The Protocols" by W. Richard Stevens.
   At 10Gb speeds (1.25 GBytes/s, i.e. 1,250,000 bytes per millisecond), use
   the following formula:
       RX_WINDOW (bytes) >= 1,250,000 * RTT (in milliseconds)
       Example for an RTT of 100us (0.1 ms):
           RX_WINDOW >= 1,250,000 * 0.1 = 125,000 bytes
   RX_WINDOW sizes of 256KB - 512KB should be sufficient.
   Setting the min, default, and max receive buffer (RX_WINDOW) size:
       sysctl -w net.ipv4.tcp_rmem="<min> <default> <max>"

  TCP window size for multiple connections:
   The receive buffer (RX_WINDOW) size may be calculated the same way as for
   a single connection, but should be divided by the number of connections.
   The smaller window prevents congestion and facilitates better pacing,
   especially if MAC-level flow control does not work well or is not
   supported on the machine. Experimentation may be necessary to find the
   correct value; this method is provided only as a starting point for the
   receive buffer size.
   Setting the min, default, and max receive buffer (RX_WINDOW) size is
   performed in the same manner as for a single connection.
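  The two window-sizing rules above reduce to integer arithmetic. A sketch,
  with rx_window as a hypothetical helper name: it takes the RTT in
  microseconds, the number of connections (default 1) and the link rate in
  Mbit/s (default 10000), and prints the 1x and 2x Bandwidth-Delay Product
  per connection in bytes. Using these as the default and max fields of
  tcp_rmem, with 4096 (a common kernel default) as the min field, is an
  assumption for illustration, not a Chelsio recommendation.

```shell
# Hypothetical helper for illustration.
# rx_window RTT_US [NCONN] [LINK_MBIT]
# Bandwidth-Delay Product: (LINK_MBIT * 10^6 / 8) bytes/s * (RTT_US * 10^-6) s
#                        = LINK_MBIT * RTT_US / 8 bytes.
# At 10000 Mbit/s this is 1250 bytes per microsecond of RTT, i.e. the
# 1,250,000 bytes per millisecond used in the formula above.
# Prints "<1x BDP> <2x BDP>", each divided by the number of connections.
rx_window() {
    rtt_us=$1
    nconn=${2:-1}
    mbit=${3:-10000}
    bdp=$(( mbit * rtt_us / 8 ))
    echo "$(( bdp / nconn )) $(( 2 * bdp / nconn ))"
}
```

  For example, "rx_window 100" prints "125000 250000" (the single-connection
  example above plus 2x headroom), and "rx_window 100 4" prints
  "31250 62500" for four connections sharing the link. The first result
  could then be applied with:
      sysctl -w net.ipv4.tcp_rmem="4096 125000 250000"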


DRIVER MESSAGES
===============

 The following messages are the most common messages logged by syslog. These
 may be found in /var/log/messages.

  Driver up:
     Chelsio Network Driver - version 2.1.1

  NIC detected:
     eth#: Chelsio N210 1x10GBaseX NIC (rev #), PCIX 133MHz/64-bit

  Link up:
     eth#: link is up at 10 Gbps, full duplex

  Link down:
     eth#: link is down
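 A sketch that reports the most recent link state of one interface from
 these messages, assuming they appear in the "eth#: link is ..." format
 shown above; link_state is a hypothetical helper name.

```shell
# Hypothetical helper for illustration.
# Print "up" or "down" from the last "<dev>: link is ..." message found in
# log text on stdin; exit status 1 if the interface is never mentioned.
# $1 = interface name, e.g. eth0.
link_state() {
    awk -v pat=" $1: link is " '
        {
            line = " " $0       # so a match at column 1 also has a leading space
            i = index(line, pat)
            if (i) {
                split(substr(line, i + length(pat)), word, " ")
                state = word[1] # "up" or "down"
            }
        }
        END {
            if (state == "") exit 1
            print state
        }
    '
}
```

 For example, "link_state eth0 < /var/log/messages" prints up or down, and
 "dmesg | link_state eth0" does the same for the kernel ring buffer.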


KNOWN ISSUES
============

 These issues have been identified during testing. The following information
 is provided as a workaround for each problem. In some cases, the problem is
 inherent to Linux or to a particular Linux distribution and/or hardware
 platform.

  1. Large number of TCP retransmits on a multiprocessor (SMP) system.

      On a system with multiple CPUs, the interrupt (IRQ) for the network
      controller may be bound to more than one CPU. This can cause TCP
      retransmits when packets are processed on different CPUs and are
      re-assembled in a different order than expected.

      To eliminate the TCP retransmits, set smp_affinity on the particular
      interrupt to a single CPU. You can locate the interrupt (IRQ) used by
      the N110/N210 with ifconfig:
          ifconfig <dev_name> | grep Interrupt
      Set the smp_affinity to a single CPU:
          echo 1 > /proc/irq/<interrupt_number>/smp_affinity

      It is strongly recommended that you do not run the irqbalance daemon on
      your system, as it will change any smp_affinity setting you have
      applied. The irqbalance daemon runs at 10-second intervals and binds
      interrupts to the CPU it determines to be least loaded. To disable
      this daemon at boot:
          chkconfig --level 2345 irqbalance off
      chkconfig does not stop an instance that is already running; to stop
      it immediately:
          service irqbalance stop

      By default, some Linux distributions enable the kernel irqbalance
      feature, which performs the same function as the daemon. To disable
      this feature, add the following parameter to the kernel command line
      in your bootloader configuration:
          noirqbalance

          Example using the Grub bootloader:
              title Red Hat Enterprise Linux AS (2.4.21-27.ELsmp)
              root (hd0,0)
              kernel /vmlinuz-2.4.21-27.ELsmp ro root=/dev/hda3 noirqbalance
              initrd /initrd-2.4.21-27.ELsmp.img
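      The ifconfig and echo steps above can be combined into a small script.
      This sketch assumes net-tools ifconfig output containing
      "Interrupt:<n>" and CPU numbers below 32; cpu_mask,
      irq_from_ifconfig and bind_nic_irq are hypothetical helper names. The
      smp_affinity value is a hexadecimal bitmask with one bit per CPU, so
      CPU n corresponds to 1 << n.

```shell
# Hypothetical helpers for illustration.

# Print the smp_affinity bitmask (hex) that selects only CPU $1.
# Valid for CPUs 0-31; larger systems use comma-separated 32-bit groups.
cpu_mask() {
    printf '%x\n' $(( 1 << $1 ))
}

# Extract the IRQ number from "ifconfig <dev>" output on stdin,
# e.g. "Interrupt:24 Base address:0x2000" -> 24.
irq_from_ifconfig() {
    sed -n 's/.*Interrupt:\([0-9][0-9]*\).*/\1/p'
}

# Bind the interrupt of interface $1 to CPU $2 (must be run as root).
bind_nic_irq() {
    irq=$(ifconfig "$1" | irq_from_ifconfig)
    if [ -z "$irq" ]; then
        echo "bind_nic_irq: no IRQ found for $1" >&2
        return 1
    fi
    cpu_mask "$2" > "/proc/irq/$irq/smp_affinity"
}
```

      For example, "bind_nic_irq eth1 0" writes 1 to the interface's
      smp_affinity file, like the echo command above, and
      "bind_nic_irq eth1 2" writes 4.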

  2. After running insmod, the driver is loaded and an incorrect network
     interface is brought up without running ifup.

      When using 2.4.x kernels, including RHEL kernels, the Linux kernel
      invokes a script named "hotplug". This script is primarily used to
      automatically bring up USB devices when they are plugged in; however,
      the script also attempts to automatically bring up a network interface
      after the kernel module is loaded. The hotplug script does this by
      scanning the ifcfg-eth# config files in /etc/sysconfig/network-scripts,
      looking for HWADDR=<mac_address>.

      If the hotplug script does not find the HWADDR within any of the
      ifcfg-eth# files, it will bring up the device with the next available
      interface name. If this interface is already configured for a different
      network card, your new interface will have an incorrect IP address and
      incorrect network settings.

      To solve this issue, add the HWADDR=<mac_address> key to the interface
      config file of your network controller.

      To disable this "hotplug" feature, you may try adding the driver (module
      name) to the "blacklist" file located in /etc/hotplug. However, this has
      been found not to work for network devices, because the net.agent
      script does not use the blacklist file. Instead, remove or rename the
      net.agent script located in /etc/hotplug to disable this feature.
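      Adding the HWADDR key can be scripted. A sketch, assuming net-tools
      ifconfig output that contains "HWaddr <mac>" and a Red Hat style
      ifcfg file; mac_from_ifconfig and add_hwaddr are hypothetical helper
      names.

```shell
# Hypothetical helpers for illustration.

# Print the MAC address from "ifconfig <dev>" output on stdin,
# e.g. "eth2  Link encap:Ethernet  HWaddr 00:07:43:00:00:01".
mac_from_ifconfig() {
    sed -n 's/.*HWaddr \([0-9A-Fa-f:][0-9A-Fa-f:]*\).*/\1/p'
}

# Append HWADDR=<mac> to an ifcfg file unless it already has an HWADDR key.
# $1 = config file, e.g. /etc/sysconfig/network-scripts/ifcfg-eth2
# $2 = MAC address of the Chelsio interface
add_hwaddr() {
    if [ -z "$2" ]; then
        echo "add_hwaddr: no MAC address given" >&2
        return 1
    fi
    if grep -q '^HWADDR=' "$1"; then
        echo "add_hwaddr: $1 already has an HWADDR key, left unchanged" >&2
        return 0
    fi
    echo "HWADDR=$2" >> "$1"
}
```

      For example (interface and file names are placeholders):
          add_hwaddr /etc/sysconfig/network-scripts/ifcfg-eth2 \
              "$(ifconfig eth2 | mac_from_ifconfig)"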

  3. Transport Protocol (TP) hangs when running heavy multi-connection traffic
     on an AMD Opteron system with a HyperTransport PCI-X Tunnel chipset.

      If your AMD Opteron system uses the AMD-8131 HyperTransport PCI-X Tunnel
      chipset, you may experience the "133-MHz Mode Split Completion Data
      Corruption" bug identified by AMD while using a 133 MHz PCI-X card on
      the PCI-X bus.

      AMD states, "Under highly specific conditions, the AMD-8131 PCI-X Tunnel
      can provide stale data via split completion cycles to a PCI-X card that
      is operating at 133 Mhz", causing data corruption.

      AMD provides three workarounds for this problem; Chelsio recommends the
      first option for best performance with this bug:

        For 133 MHz secondary bus operation, limit the transaction length and
        the number of outstanding transactions, via BIOS configuration
        programming of the PCI-X card, to the following:

           Data Length (bytes): 1k
           Total allowed outstanding transactions: 2

      Please refer to AMD 8131-HT/PCI-X Errata 26310 Rev 3.08 August 2004,
      section 56, "133-MHz Mode Split Completion Data Corruption" for more
      details on this bug and the workarounds suggested by AMD.

      It may be possible to work outside AMD's recommended PCI-X settings; try
      increasing the Data Length to 2k bytes for increased performance. If you
      have issues with these settings, please revert to the "safe" settings
      and reproduce the problem before submitting a bug or asking for support.

      NOTE: The default setting on most systems is 8 outstanding transactions
            and 2k bytes data length.
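      To check whether a system combines the affected chipset with a
      Chelsio adapter, "lspci -n" output can be searched for the AMD-8131
      PCI-X bridge (vendor:device 1022:7450) and for Chelsio's vendor ID
      1425, the same ID used in the setpci command in the PERFORMANCE
      section. A sketch; needs_8131_workaround is a hypothetical helper
      name, and a match only means the workaround may apply, not that the
      bug has occurred.

```shell
# Hypothetical helper for illustration.
# Read "lspci -n" output on stdin. Succeed (exit status 0) if it lists
# both an AMD-8131 PCI-X bridge (1022:7450) and a Chelsio device (1425:xxxx).
needs_8131_workaround() {
    awk '
        / 1022:7450/        { amd = 1 }
        / 1425:[0-9a-fA-F]/ { chelsio = 1 }
        END { if (amd && chelsio) exit 0; exit 1 }
    '
}
```

      For example:
          lspci -n | needs_8131_workaround && echo "review the PCI-X settings"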

  4. On multiprocessor systems, it has been noted that an application which
     is handling 10Gb networking can switch between CPUs, causing degraded
     and/or unstable performance.

      If running on an SMP system and taking performance measurements, it
      is suggested that you either run netperf 2.4.0 or later or use a
      binding tool such as Tim Hockin's procstate utilities (runon)
      <http://www.hockin.org/~thockin/procstate/>.

      Binding netserver and netperf (or other applications) to particular
      CPUs can make a significant difference in performance measurements.
      You may need to experiment to determine which CPU to bind the
      application to in order to achieve the best performance for your
      system.

      If you are developing an application designed for 10Gb networking,
      keep in mind that you may want to use the sched_setaffinity() and
      sched_getaffinity() system calls to bind your application to a CPU.

      If you are just running user-space applications such as ftp, telnet,
      etc., you may want to try the runon tool provided by Tim Hockin's
      procstate utility. You could also try bringing up the interface from a
      particular CPU: runon 0 ifup eth0
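      As an alternative to runon, the taskset utility from util-linux (not
      otherwise mentioned in these notes) sets CPU affinity through
      sched_setaffinity(2) before starting a command. A sketch, assuming
      taskset is installed and the kernel reports Cpus_allowed_list in
      /proc/<pid>/status; pin_to_cpu and allowed_cpus are hypothetical
      helper names.

```shell
# Hypothetical helpers for illustration.

# Run a command restricted to one CPU (or a CPU list such as "2,3").
# taskset sets the affinity and then executes the command.
pin_to_cpu() {
    cpus=$1
    shift
    taskset -c "$cpus" "$@"
}

# Print the CPUs a process may run on, e.g. "0-3" or "2", from
# /proc/<pid>/status. Default is /proc/self, i.e. the awk process itself,
# which inherits the caller's affinity.
allowed_cpus() {
    awk '/^Cpus_allowed_list:/ { print $2 }' "/proc/${1:-self}/status"
}
```

      For example, "pin_to_cpu 1 netserver" starts netserver on CPU 1, and
      "allowed_cpus <pid>" shows which CPUs a running process may use.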


SUPPORT
=======

 If you have problems with the software or hardware, please contact our
 customer support team via email at support@chelsio.com or check our website
 at http://www.chelsio.com

===============================================================================

 Chelsio Communications
 370 San Aleso Ave.
 Suite 100
 Sunnyvale, CA 94085
 http://www.chelsio.com

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License, version 2, as
published by the Free Software Foundation.

You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.

THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

 Copyright (c) 2003-2005 Chelsio Communications. All rights reserved.

===============================================================================