Based on kernel version 4.16.1. Page generated on 2018-04-09 11:53 EST.

1	
2	            The Spidernet Device Driver
3	            ===========================
4	
5	Written by Linas Vepstas <linas@austin.ibm.com>
6	
7	Version of 7 June 2007
8	
9	Abstract
10	========
11	This document sketches the structure of portions of the spidernet
12	device driver in the Linux kernel tree. The spidernet is a gigabit
13	ethernet device built into the Toshiba southbridge commonly used
14	in the SONY Playstation 3 and the IBM QS20 Cell blade.
15	
16	The Structure of the RX Ring.
17	=============================
18	The receive (RX) ring is a circular linked list of RX descriptors,
19	together with three pointers into the ring that are used to manage its
20	contents.
21	
22	The elements of the ring are called "descriptors" or "descrs"; they
23	describe the received data. This includes a pointer to a buffer
24	containing the received data, the buffer size, and various status bits.
25	
26	There are three primary states that a descriptor can be in: "empty",
27	"full" and "not-in-use".  An "empty" or "ready" descriptor is ready
28	to receive data from the hardware. A "full" descriptor has data in it,
29	and is waiting to be emptied and processed by the OS. A "not-in-use"
30	descriptor is neither empty nor full; it is simply not ready. It may
31	not even have a data buffer in it, or is otherwise unusable.
32	
33	During normal operation, on device startup, the OS (specifically, the
34	spidernet device driver) allocates a set of RX descriptors and RX
35	buffers. These are all marked "empty", ready to receive data. This
36	ring is handed off to the hardware, which sequentially fills in the
37	buffers, and marks them "full". The OS follows up, taking the full
38	buffers, processing them, and re-marking them empty.
39	
40	This filling and emptying is managed by three pointers: the "head"
41	and "tail" pointers, managed by the OS, and a hardware current
42	descriptor pointer (GDACTDPA). The GDACTDPA points at the descr
43	currently being filled. When this descr is filled, the hardware
44	marks it full, and advances the GDACTDPA by one.  Thus, when there is
45	flowing RX traffic, every descr behind it should be marked "full",
46	and everything in front of it should be "empty".  If the hardware
47	discovers that the current descr is not empty, it will signal an
48	interrupt, and halt processing.
49	
50	The tail pointer tails or trails the hardware pointer. When the
51	hardware is ahead, the tail pointer will be pointing at a "full"
52	descr. The OS will process this descr, and then mark it "not-in-use",
53	and advance the tail pointer.  Thus, when there is flowing RX traffic,
54	all of the descrs in front of the tail pointer should be "full", and
55	all of those behind it should be "not-in-use". When RX traffic is not
56	flowing, then the tail pointer can catch up to the hardware pointer.
57	The OS will then note that the current tail is "empty", and halt
58	processing.
59	
60	The head pointer (somewhat mis-named) follows after the tail pointer.
61	When traffic is flowing, then the head pointer will be pointing at
62	a "not-in-use" descr. The OS will perform various housekeeping duties
63	on this descr. This includes allocating a new data buffer and
64	dma-mapping it so as to make it visible to the hardware. The OS will
65	then mark the descr as "empty", ready to receive data. Thus, when there
66	is flowing RX traffic, everything in front of the head pointer should
67	be "not-in-use", and everything behind it should be "empty". If no
68	RX traffic is flowing, then the head pointer can catch up to the tail
69	pointer, at which point the OS will notice that the head descr is
70	"empty", and it will halt processing.
71	
72	Thus, in an idle system, the GDACTDPA, tail and head pointers will
73	all be pointing at the same descr, which should be "empty". All of the
74	other descrs in the ring should be "empty" as well.
75	
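The interplay of the three pointers can be modelled in a few lines
of C. The following is a minimal sketch, not the driver's code: the
ring size, the struct rx_ring type and the hw_receive(), os_process()
and os_refill() helpers are all hypothetical names chosen for
illustration, and buffer allocation and dma-mapping are omitted.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 8	/* hypothetical; a real ring is larger */

enum state { EMPTY, FULL, NOT_IN_USE };

struct rx_ring {
	enum state descr[RING_SIZE];
	int hw;		/* models GDACTDPA: descr the hardware fills next */
	int tail;	/* OS: next descr to process */
	int head;	/* OS: next descr to refill */
};

static void ring_init(struct rx_ring *r)
{
	for (int i = 0; i < RING_SIZE; i++)
		r->descr[i] = EMPTY;
	r->hw = r->tail = r->head = 0;
}

/* Hardware side: fill the current descr, or refuse if it is not empty. */
static bool hw_receive(struct rx_ring *r)
{
	assert(r->hw >= 0 && r->hw < RING_SIZE);
	if (r->descr[r->hw] != EMPTY)
		return false;	/* real hardware would interrupt and halt */
	r->descr[r->hw] = FULL;
	r->hw = (r->hw + 1) % RING_SIZE;
	return true;
}

/* OS side, tail pointer: process full descrs until a non-full one. */
static int os_process(struct rx_ring *r)
{
	int n = 0;

	while (r->descr[r->tail] == FULL) {
		r->descr[r->tail] = NOT_IN_USE;
		r->tail = (r->tail + 1) % RING_SIZE;
		n++;
	}
	return n;
}

/* OS side, head pointer: mark not-in-use descrs empty again. */
static int os_refill(struct rx_ring *r)
{
	int n = 0;

	while (r->descr[r->head] == NOT_IN_USE) {
		r->descr[r->head] = EMPTY;	/* new buffer would go here */
		r->head = (r->head + 1) % RING_SIZE;
		n++;
	}
	return n;
}
```

Starting from an idle ring, three calls to hw_receive() leave hw at 3
while tail and head stay at 0; os_process() then moves tail to 3, and
os_refill() moves head to 3, returning the ring to the idle state.
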
76	The show_rx_chain() routine will print out the locations of the
77	GDACTDPA, tail and head pointers. It will also summarize the contents
78	of the ring, starting at the tail pointer, and listing the status
79	of the descrs that follow.
80	
81	A typical example of the output, for a nearly idle system, might be
82	
83	net eth1: Total number of descrs=256
84	net eth1: Chain tail located at descr=20
85	net eth1: Chain head is at 20
86	net eth1: HW curr desc (GDACTDPA) is at 21
87	net eth1: Have 1 descrs with stat=x40800101
88	net eth1: HW next desc (GDACNEXTDA) is at 22
89	net eth1: Last 255 descrs with stat=xa0800000
90	
91	In the above, the hardware has filled in one descr, number 20. Both
92	head and tail are pointing at 20, because it has not yet been emptied.
93	Meanwhile, hw is pointing at 21, which is free.
94	
95	The "Have nnn descrs" refers to the descr starting at the tail: in this
96	case, nnn=1 descr, starting at descr 20. The "Last nnn descrs" refers
97	to all of the rest of the descrs, from the last status change. The "nnn"
98	is a count of how many descrs have exactly the same status.
99	
100	The status x4... corresponds to "full" and status xa... corresponds
101	to "empty". The actual value printed is RXCOMST_A.
102	
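The "Have nnn descrs" lines amount to a run-length summary of the
status words, walked once around the ring from the tail. A minimal
sketch of that grouping follows; struct run and summarize_ring() are
hypothetical names, and the statuses are held in a plain array here
rather than in the driver's descriptor chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct run {
	uint32_t stat;	/* status word shared by the whole run */
	size_t count;	/* number of consecutive descrs with that status */
};

/*
 * Walk the ring once, starting at tail, grouping consecutive descrs
 * that have exactly the same status. Returns the number of runs
 * written to out, which is at most max_runs.
 */
static size_t summarize_ring(const uint32_t *stat, size_t n, size_t tail,
			     struct run *out, size_t max_runs)
{
	size_t nruns = 0;

	assert(n > 0 && tail < n && max_runs > 0);

	for (size_t i = 0; i < n; i++) {
		uint32_t s = stat[(tail + i) % n];

		if (nruns > 0 && out[nruns - 1].stat == s) {
			out[nruns - 1].count++;
		} else {
			if (nruns == max_runs)
				break;	/* no room for another run */
			out[nruns].stat = s;
			out[nruns].count = 1;
			nruns++;
		}
	}
	return nruns;
}
```

Applied to the nearly idle example above (256 descrs, tail at 20, only
descr 20 full), this yields two runs: one descr with x40800101
followed by 255 descrs with xa0800000.
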
103	In the device driver source code, a different set of names is
104	used for these same concepts, so that
105	
106	"empty" == SPIDER_NET_DESCR_CARDOWNED == 0xa
107	"full"  == SPIDER_NET_DESCR_FRAME_END == 0x4
108	"not-in-use" == SPIDER_NET_DESCR_NOT_IN_USE == 0xf
109	
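Given the values above, a printed status word can be classified by
looking at its top nibble. The sketch below assumes that the state
occupies bits 31..28 of the status word, which is consistent with the
x4... and xa... values shown earlier; the macro, enum and function
names are hypothetical and are not the driver's own.

```c
#include <assert.h>
#include <stdint.h>

#define STATE_MASK		0xF0000000u	/* assumption: bits 31..28 */
#define STATE_CARDOWNED		0xA0000000u	/* "empty" */
#define STATE_FRAME_END		0x40000000u	/* "full" */
#define STATE_NOT_IN_USE	0xF0000000u	/* "not-in-use" */

enum rx_state { RX_EMPTY, RX_FULL, RX_NOT_IN_USE, RX_OTHER };

/* Map a raw status word onto the three states used in this document. */
static enum rx_state rx_state_of(uint32_t stat)
{
	switch (stat & STATE_MASK) {
	case STATE_CARDOWNED:
		return RX_EMPTY;
	case STATE_FRAME_END:
		return RX_FULL;
	case STATE_NOT_IN_USE:
		return RX_NOT_IN_USE;
	default:
		return RX_OTHER;	/* any state not covered here */
	}
}

/* Human-readable name for a state, using this document's terms. */
static const char *rx_state_name(enum rx_state s)
{
	static const char *const names[] = {
		"empty", "full", "not-in-use", "other"
	};

	assert((unsigned)s < sizeof(names) / sizeof(names[0]));
	return names[s];
}
```

For example, rx_state_of(0x40800101) gives RX_FULL and
rx_state_of(0xa0800000) gives RX_EMPTY.
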
110	
111	The RX RAM full bug/feature
112	===========================
113	
114	As long as the OS can empty out the RX buffers at a rate faster than
115	the hardware can fill them, there is no problem. If, for some reason,
116	the OS fails to empty the RX ring fast enough, the hardware GDACTDPA
117	pointer will catch up to the head, notice the not-empty condition,
118	and stop. However, RX packets may still continue arriving on the wire.
119	The spidernet chip can save some limited number of these in local RAM.
120	When this local RAM fills up, the spider chip will issue an interrupt
121	indicating this (GHIINT0STS will show ERRINT, and the GRMFLLINT bit
122	will be set in GHIINT1STS).  When the RX RAM full condition occurs,
123	a certain bug/feature is triggered that has to be specially handled.
124	This section describes the special handling for this condition.
125	
126	When the OS finally has a chance to run, it will empty out the RX ring.
127	In particular, it will clear the descriptor on which the hardware had
128	stopped. However, once the hardware has decided that a certain
129	descriptor is invalid, it will not restart at that descriptor; instead
130	it will restart at the next descr. This potentially will lead to a
131	deadlock condition, as the tail pointer will be pointing at this descr,
132	which, from the OS point of view, is empty; the OS will be waiting for
133	this descr to be filled. However, the hardware has skipped this descr,
134	and is filling the next descrs. Since the OS doesn't see this, there
135	is a potential deadlock, with the OS waiting for one descr to fill,
136	while the hardware is waiting for a different set of descrs to become
137	empty.
138	
139	A call to show_rx_chain() at this point indicates the nature of the
140	problem. A typical print when the network is hung shows the following:
141	
142	net eth1: Spider RX RAM full, incoming packets might be discarded!
143	net eth1: Total number of descrs=256
144	net eth1: Chain tail located at descr=255
145	net eth1: Chain head is at 255
146	net eth1: HW curr desc (GDACTDPA) is at 0
147	net eth1: Have 1 descrs with stat=xa0800000
148	net eth1: HW next desc (GDACNEXTDA) is at 1
149	net eth1: Have 127 descrs with stat=x40800101
150	net eth1: Have 1 descrs with stat=x40800001
151	net eth1: Have 126 descrs with stat=x40800101
152	net eth1: Last 1 descrs with stat=xa0800000
153	
154	Both the tail and head pointers are pointing at descr 255, which is
155	marked xa... which is "empty". Thus, from the OS point of view, there
156	is nothing to be done. In particular, there is the implicit assumption
157	that everything in front of the "empty" descr must surely also be empty,
158	as explained in the last section. The OS is waiting for descr 255 to
159	become non-empty, which, in this case, will never happen.
160	
161	The HW pointer is at descr 0. This descr is marked 0x4.. or "full".
162	Since it's already full, the hardware can do nothing more, and thus has
163	halted processing. Notice that descrs 0 through 253 are all marked
164	"full", while descrs 254 and 255 are empty. (The "Last 1 descrs" is
165	descr 254, since tail was at 255.) Thus, the system is deadlocked,
166	and there can be no forward progress; the OS thinks there's nothing
167	to do, and the hardware has nowhere to put incoming data.
168	
169	This bug/feature is worked around with the spider_net_resync_head_ptr()
170	routine. When the driver receives RX interrupts, but an examination
171	of the RX chain seems to show it is empty, then it is probable that
172	the hardware has skipped a descr or two (sometimes dozens under heavy
173	network conditions). The spider_net_resync_head_ptr() subroutine will
174	search the ring for the next full descr, and the driver will resume
175	operations there.  Since this will leave "holes" in the ring, there
176	is also a spider_net_resync_tail_ptr() that will skip over such holes.
177	
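The core of the resync idea is a forward search for the next full
descr. The following is a minimal sketch of that search only, and is
not the body of spider_net_resync_head_ptr(): next_full_descr() is a
hypothetical helper, the ring is modelled as an array of status
words, and "full" is assumed to mean a top nibble of 0x4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STATE_MASK	0xF0000000u	/* assumption: bits 31..28 */
#define STATE_FULL	0x40000000u	/* "full", status x4... */

/*
 * Return the index of the first full descr at or after start,
 * wrapping around the ring, or -1 if no descr is full.
 */
static long next_full_descr(const uint32_t *stat, size_t n, size_t start)
{
	assert(n > 0 && start < n);

	for (size_t i = 0; i < n; i++) {
		size_t idx = (start + i) % n;

		if ((stat[idx] & STATE_MASK) == STATE_FULL)
			return (long)idx;
	}
	return -1;
}
```

For the hung ring shown above, a search starting at descr 255 skips
the empty descr and returns 0, which is where the OS would resume.
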
178	As of this writing, the spider_net_resync() strategy seems to work very
179	well, even under heavy network loads.
180	
181	
182	The TX ring
183	===========
184	The TX ring uses a low-watermark interrupt scheme to make sure that
185	the TX queue is appropriately serviced for large packet sizes.
186	
187	For packet sizes greater than about 1 KByte, the kernel can fill
188	the TX ring quicker than the device can drain it. Once the ring
189	is full, the netdev is stopped. When there is room in the ring,
190	the netdev needs to be reawakened, so that more TX packets are placed
191	in the ring. The hardware can empty the ring about four times per jiffy,
192	so it's not appropriate to wait for the poll routine to refill, since
193	the poll routine runs only once per jiffy.  The low-watermark mechanism
194	marks a descr about 1/4 of the way from the bottom of the queue, so
195	that an interrupt is generated when the descr is processed. This
196	interrupt wakes up the netdev, which can then refill the queue.
197	For large packets, this mechanism generates a relatively small number
198	of interrupts, about 1K/sec. For smaller packets, this will drop to zero
199	interrupts, as the hardware can empty the queue faster than the kernel
200	can fill it.
201	
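The choice of which descr to mark can be sketched as simple index
arithmetic. The sketch below rests on two assumptions that are not
stated above: that "1/4 of the way from the bottom" means three
quarters of the way in from the tail, so that about a quarter of the
queued descrs remain when the interrupt fires, and that no mark is
set when fewer than a quarter of the ring is queued. The function
name low_watermark_index() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the ring index of the descr to flag for a low-watermark
 * interrupt, or -1 if the queue is too short to bother.
 *
 * tail      - index of the oldest queued TX descr
 * queued    - number of descrs currently queued
 * ring_size - total number of descrs in the TX ring
 */
static long low_watermark_index(size_t tail, size_t queued, size_t ring_size)
{
	assert(ring_size > 0 && tail < ring_size && queued <= ring_size);

	/* Assumption: a short queue drains without needing an interrupt. */
	if (queued < ring_size / 4)
		return -1;

	/* Three quarters in from the tail, wrapping around the ring. */
	return (long)((tail + (queued * 3) / 4) % ring_size);
}
```

With a completely full 256-entry ring and the tail at 0, this flags
descr 192, leaving the last 64 descrs queued when the netdev is woken.
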
202	
203	 ======= END OF DOCUMENT ========