Based on kernel version 4.9. Page generated on 2016-12-21 14:37 EST.


               ==========================================
               Xillybus driver for generic FPGA interface
               ==========================================

Author: Eli Billauer, Xillybus Ltd. (http://xillybus.com)
Email:  eli.billauer@gmail.com or as advertised on Xillybus' site.

Contents:

 - Introduction
  -- Background
  -- Xillybus Overview

 - Usage
  -- User interface
  -- Synchronization
  -- Seekable pipes

 - Internals
  -- Source code organization
  -- Pipe attributes
  -- Host never reads from the FPGA
  -- Channels, pipes, and the message channel
  -- Data streaming
  -- Data granularity
  -- Probing
  -- Buffer allocation
  -- The "nonempty" message (supporting poll)


INTRODUCTION
============

Background
----------

An FPGA (Field Programmable Gate Array) is a piece of logic hardware, which
can be programmed to become virtually anything that is usually found as a
dedicated chipset: For instance, a display adapter, network interface card,
or even a processor with its peripherals. FPGAs are the LEGO of hardware:
Based upon certain building blocks, you make your own toys the way you like
them. It's usually pointless to reimplement something that is already
available on the market as a chipset, so FPGAs are mostly used when some
special functionality is needed, and the production volume is relatively low
(hence not justifying the development of an ASIC).

The challenge with FPGAs is that everything is implemented at a very low
level, even lower than assembly language. In order to allow FPGA designers to
focus on their specific project, and not reinvent the wheel over and over
again, pre-designed building blocks, known as IP cores, are often used. These
are the FPGA parallels of library functions. IP cores may implement certain
mathematical functions, a functional unit (e.g. a USB interface), an entire
processor (e.g. ARM) or anything that might come in handy. Think of them as a
building block, with electrical wires dangling on the sides for connection to
other blocks.

One of the daunting tasks in FPGA design is communicating with a full-blown
operating system (actually, with the processor running it): Implementing the
low-level bus protocol and the somewhat higher-level interface with the host
(registers, interrupts, DMA etc.) is a project in itself. When the FPGA's
function is a well-known one (e.g. a video adapter card, or a NIC), it can
make sense to design the FPGA's interface logic specifically for the project.
A special driver is then written to present the FPGA as a well-known interface
to the kernel and/or user space. In that case, there is no reason to treat the
FPGA differently from any other device on the bus.

However, it's common that the desired data communication doesn't fit any
well-known peripheral function. Also, the effort of designing an elegant
abstraction for the data exchange is often considered too big. In those cases,
a quicker and possibly less elegant solution is sought: The driver is
effectively written as a user space program, leaving the kernel space part
with just elementary data transport. This still requires designing some
interface logic for the FPGA, and writing a simple ad-hoc driver for the
kernel.

Xillybus Overview
-----------------

Xillybus is an IP core and a Linux driver. Together, they form a kit for
elementary data transport between an FPGA and the host, providing pipe-like
data streams with a straightforward user interface. It's intended as a
low-effort solution for mixed FPGA-host projects, for which it makes sense to
have the project-specific part of the driver running in a user-space program.

Since the communication requirements may vary significantly from one FPGA
project to another (the number of data pipes needed in each direction and
their attributes), there isn't one specific chunk of logic that is the
Xillybus IP core. Rather, the IP core is configured and built based upon a
specification given by its end user.

Xillybus presents independent data streams, which resemble pipes or TCP/IP
communication to the user. At the host side, a character device file is used
just like any pipe file. On the FPGA side, hardware FIFOs are used to stream
the data. This is contrary to a common method of communicating through
fixed-sized buffers (even though such buffers are used by Xillybus under the
hood). There may be more than a hundred of these streams on a single IP core,
or as few as one, depending on the configuration.

In order to ease the deployment of the Xillybus IP core, it contains a simple
data structure which completely defines the core's configuration. The Linux
driver fetches this data structure during its initialization process, and sets
up the DMA buffers and character devices accordingly. As a result, a single
driver works out of the box with any Xillybus IP core.

The data structure just mentioned should not be confused with PCI's
configuration space or the Flattened Device Tree.

USAGE
=====

User interface
--------------

On the host, all interface with Xillybus is done through /dev/xillybus_*
device files, which are generated automatically as the driver loads. The
names of these files depend on the IP core that is loaded in the FPGA (see
Probing below). To communicate with the FPGA, open the device file that
corresponds to the hardware FIFO you want to send data to or receive data
from, and use plain write() or read() calls, just like with a regular pipe.
In particular, it makes perfect sense to go:

$ cat mydata > /dev/xillybus_thisfifo

$ cat /dev/xillybus_thatfifo > hisdata

possibly pressing CTRL-C at some stage, even though the xillybus_* pipes have
the capability to send an EOF (but may not use it).
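
From a program rather than a shell, the same transfer is a plain read() and
write() loop. The following is a minimal user-space sketch; copy_stream() and
write_all() are hypothetical helper names, not part of the driver. write_all()
loops because a pipe may complete fewer bytes than requested (see
"allowpartial" under Pipe attributes below).

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Write all len bytes to fd, retrying on partial writes and EINTR.
 * Returns 0 on success, -1 on error (errno is set by write()). */
static int write_all(int fd, const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        buf += n;               /* partial completion is legal on pipes */
        len -= (size_t)n;
    }
    return 0;
}

/* Copy everything from in_fd to out_fd until EOF, like "cat". */
static int copy_stream(int in_fd, int out_fd)
{
    char buf[65536];

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof(buf));
        if (n == 0)
            return 0;           /* EOF: the writer closed its end */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(out_fd, buf, (size_t)n) < 0)
            return -1;
    }
}
```

For instance, with fd = open("/dev/xillybus_thisfifo", O_WRONLY), the call
copy_stream(STDIN_FILENO, fd) does the same as the first cat command above.
The device name is only the placeholder used in that example.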

The driver and hardware are designed to behave sensibly as pipes, including:

* Supporting non-blocking I/O (by setting O_NONBLOCK on open()).

* Supporting poll() and select().

* Being bandwidth efficient under load (using DMA), while also handling small
  pieces of data sent across (like TCP/IP) by autoflushing.
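
Because of the non-blocking and poll() support listed above, a standard event
loop works unchanged. The sketch below assumes the device file was opened with
O_NONBLOCK, e.g. open("/dev/xillybus_thatfifo", O_RDONLY | O_NONBLOCK), where
the name is again only a placeholder. read_when_ready() is a hypothetical
helper, and nothing in it is Xillybus-specific, which is the point.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable, then read without
 * blocking. Returns bytes read, 0 on timeout or EOF, -1 on error.
 * fd is expected to have been opened with O_NONBLOCK. */
static ssize_t read_when_ready(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;

    assert(fd >= 0);
    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;               /* timed out, nothing arrived */

    ssize_t n = read(fd, buf, len);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;               /* readiness was spurious; try again later */
    return n;
}
```

Since poll() and select() are supported, several Xillybus device files can be
watched in the same pollfd array.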

A device file can be read-only, write-only or bidirectional. Bidirectional
device files are treated like two independent pipes (except for sharing a
"channel" structure in the implementation code).

Synchronization
---------------

Xillybus pipes are configured (on the IP core) to be either synchronous or
asynchronous. For a synchronous pipe, write() returns successfully only after
some data has been submitted and acknowledged by the FPGA. This slows down
bulk data transfers, and makes such pipes nearly impossible to use with
streams that require data at a constant rate: No data is transmitted to the
FPGA between write() calls, in particular when the process loses the CPU.

When a pipe is configured as asynchronous, write() returns as soon as there
is enough room in the buffers to store any of the data.

For FPGA to host pipes, asynchronous pipes allow data transfer from the FPGA
as soon as the respective device file is opened, regardless of whether the
data has been requested by a read() call. On synchronous pipes, only the
amount of data requested by a read() call is transmitted.

In summary, for synchronous pipes, data between the host and FPGA is
transmitted only to satisfy the read() or write() call currently handled
by the driver, and those calls wait for the transmission to complete before
returning.

Note that the synchronization attribute has nothing to do with the possibility
that read() or write() completes fewer bytes than requested. There is a
separate configuration flag ("allowpartial") that determines whether such a
partial completion is allowed.

Seekable pipes
--------------

A synchronous pipe can be configured to have the stream's position exposed
to the user logic at the FPGA. Such a pipe is also seekable on the host API.
With this feature, a memory or register interface can be attached on the
FPGA side to the seekable stream. Reading or writing to a certain address in
the attached memory is done by seeking to the desired address, and calling
read() or write() as required.
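
A seekable pipe is therefore accessed with lseek() followed by read() or
write(). In the sketch below, write_at() and read_at() are hypothetical
helpers, and it is assumed that the byte offset passed to lseek() is the
address the FPGA logic sees. How the stream position maps onto word addresses
for 16 or 32 bit pipes is not covered here and should be checked against the
IP core's documentation.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* open() and O_* flags, for callers */
#include <sys/types.h>
#include <unistd.h>

/* Write len bytes at position addr: seek first, then write, retrying on
 * partial completion. Returns 0 on success, -1 on error. */
static int write_at(int fd, off_t addr, const void *data, size_t len)
{
    const char *p = data;

    assert(data != NULL || len == 0);
    if (lseek(fd, addr, SEEK_SET) == (off_t)-1)
        return -1;              /* e.g. ESPIPE on a non-seekable file */
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read up to len bytes from position addr. Returns bytes read or -1. */
static ssize_t read_at(int fd, off_t addr, void *data, size_t len)
{
    if (lseek(fd, addr, SEEK_SET) == (off_t)-1)
        return -1;
    return read(fd, data, len);
}
```

On an ordinary pipe, lseek() fails with ESPIPE, so both helpers return -1
there without transferring anything.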


INTERNALS
=========

Source code organization
------------------------

The Xillybus driver consists of a core module, xillybus_core.c, and modules
that depend on the specific bus interface (xillybus_of.c and xillybus_pcie.c).

The bus-specific modules are those probed when a suitable device is found by
the kernel. Since the DMA mapping and synchronization functions, which are bus
dependent by their nature, are used by the core module, a
xilly_endpoint_hardware structure is passed to the core module on
initialization. This structure is populated with pointers to wrapper functions
which execute the DMA-related operations on the bus.
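
The pattern is a table of function pointers, filled in by the bus-specific
module and called by the core. The sketch below is a user-space simplification
with hypothetical names (struct dma_ops_sketch, core_setup_buffer() and a fake
backend); it does not reproduce the members of the actual
xilly_endpoint_hardware structure, which uses kernel types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bus-ops table, simplified for user space. */
struct dma_ops_sketch {
    int  (*map_single)(void *ctx, void *ptr, size_t size, uint64_t *dma_addr);
    void (*sync_for_cpu)(void *ctx, uint64_t dma_addr, size_t size);
    void (*sync_for_device)(void *ctx, uint64_t dma_addr, size_t size);
    void *ctx;                  /* bus-specific private data */
};

/* The core only calls through the table, so it never needs to know
 * whether it sits on PCIe or on a device-tree (OF) bus. */
struct core_sketch {
    const struct dma_ops_sketch *ops;
};

static void core_init(struct core_sketch *core, const struct dma_ops_sketch *ops)
{
    assert(ops && ops->map_single && ops->sync_for_cpu && ops->sync_for_device);
    core->ops = ops;
}

/* Map a buffer and hand it to the device, via the bus wrappers only. */
static int core_setup_buffer(struct core_sketch *core, void *buf, size_t size,
                             uint64_t *dma_addr)
{
    int rc = core->ops->map_single(core->ops->ctx, buf, size, dma_addr);
    if (rc == 0)
        core->ops->sync_for_device(core->ops->ctx, *dma_addr, size);
    return rc;
}

/* Give the buffer back to the CPU before looking at device-written data. */
static void core_take_buffer(struct core_sketch *core, uint64_t dma_addr,
                             size_t size)
{
    core->ops->sync_for_cpu(core->ops->ctx, dma_addr, size);
}

/* A fake "bus" for illustration only: the DMA address is just the
 * pointer value, and sync calls are merely counted. */
static int fake_syncs;
static int fake_map(void *ctx, void *ptr, size_t size, uint64_t *dma_addr)
{
    (void)ctx;
    (void)size;
    *dma_addr = (uint64_t)(uintptr_t)ptr;
    return 0;
}
static void fake_sync(void *ctx, uint64_t dma_addr, size_t size)
{
    (void)ctx;
    (void)dma_addr;
    (void)size;
    fake_syncs++;
}
static const struct dma_ops_sketch fake_ops = { fake_map, fake_sync, fake_sync, NULL };
```

A PCIe module and an OF module would each supply their own table, while the
core code stays identical.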

Pipe attributes
---------------

Each pipe has a number of attributes which are set when the FPGA component
(IP core) is built. They are fetched from the IDT (the data structure which
defines the core's configuration, see Probing below) by xilly_setupchannels()
in xillybus_core.c as follows:

* is_writebuf: The pipe's direction. A non-zero value means it's an FPGA to
  host pipe (the FPGA "writes").

* channelnum: The pipe's identification number in communication between the
  host and FPGA.

* format: The underlying data width. See Data granularity below.

* allowpartial: A non-zero value means that a read() or write() (whichever
  applies) may return with fewer than the requested number of bytes. The
  common choice is a non-zero value, to match standard UNIX behavior.

* synchronous: A non-zero value means that the pipe is synchronous. See
  Synchronization above.

* bufsize: Each DMA buffer's size. Always a power of two.

* bufnum: The number of buffers allocated for this pipe. Always a power of two.

* exclusive_open: A non-zero value forces exclusive opening of the associated
  device file. If the device file is bidirectional, and already opened only in
  one direction, the opposite direction may be opened once.

* seekable: A non-zero value indicates that the pipe is seekable. See
  Seekable pipes above.

* supports_nonempty: A non-zero value (which is typical) indicates that the
  hardware will send the messages that are necessary to support select() and
  poll() for this pipe.

Host never reads from the FPGA
------------------------------

Even though PCI Express is hotpluggable in general, a typical motherboard
doesn't expect a card to go away all of a sudden. But since the PCIe card
is based upon reprogrammable logic, a sudden disappearance from the bus is
quite likely as a result of an accidental reprogramming of the FPGA while the
host is up. In practice, nothing happens immediately in such a situation. But
if the host attempts to read from an address that is mapped to the PCI Express
device, that leads to an immediate freeze of the system on some motherboards,
even though the PCIe standard requires a graceful recovery.

In order to avoid these freezes, the Xillybus driver refrains completely from
reading from the device's register space. All communication from the FPGA to
the host is done through DMA. In particular, the Interrupt Service Routine
doesn't follow the common practice of checking a status register when it's
invoked. Rather, the FPGA prepares a small buffer containing short messages,
which inform the host what the interrupt was about.

This mechanism is used on non-PCIe buses as well for the sake of uniformity.
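
The interrupt handler's side of this scheme amounts to walking through the
messages the FPGA has written into host memory. In the sketch below, the
struct msg layout and the numeric opcode values are made up for illustration
(the real messages are packed differently, and their encoding isn't described
in this document); only the opcode names echo the XILLYMSG_OPCODE_RELEASEBUF
and XILLYMSG_OPCODE_NONEMPTY messages mentioned below. What it shows is that
the handler learns everything from DMA-written memory and never reads a device
register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes and message layout (values made up). */
enum msg_opcode { OP_RELEASEBUF = 1, OP_NONEMPTY = 2 };

struct msg {
    uint8_t  opcode;
    uint8_t  channel;           /* which channel the message is about */
    uint16_t bufno;             /* buffer number, for OP_RELEASEBUF */
    uint32_t bytes;             /* valid data in that buffer */
};

struct chan_state {
    uint32_t ready_bytes;       /* data the host may hand to read() */
    int      nonempty;          /* set by OP_NONEMPTY */
};

/* Process messages the FPGA wrote via DMA. No device register is read:
 * all information comes from host memory. */
static void handle_messages(const struct msg *m, size_t count,
                            struct chan_state *ch, size_t nchan)
{
    assert(m != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (m[i].channel == 0 || m[i].channel >= nchan)
            continue;           /* channel 0 itself carries no data */
        struct chan_state *c = &ch[m[i].channel];
        switch (m[i].opcode) {
        case OP_RELEASEBUF:
            c->ready_bytes += m[i].bytes;
            break;
        case OP_NONEMPTY:
            c->nonempty = 1;
            break;
        default:
            break;              /* unknown opcode: ignored in this sketch */
        }
    }
}
```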


Channels, pipes, and the message channel
----------------------------------------

Each of the (possibly bidirectional) pipes presented to the user is allocated
a data channel between the FPGA and the host. The distinction between channels
and pipes is necessary only because of channel 0, which is used for
interrupt-related messages from the FPGA, and has no pipe attached to it.

Data streaming
--------------

Even though a non-segmented data stream is presented to the user at both
sides, the implementation relies on a set of DMA buffers which is allocated
for each channel. For the sake of illustration, let's take the FPGA to host
direction: As data streams into the respective channel's interface in the
FPGA, the Xillybus IP core writes it to one of the DMA buffers. When the
buffer is full, the FPGA informs the host about that (appending a
XILLYMSG_OPCODE_RELEASEBUF message to channel 0 and sending an interrupt if
necessary). The host responds by making the data available for reading through
the character device. When all data has been read, the host writes to the
FPGA's buffer control register, allowing the buffer to be overwritten. Flow
control mechanisms exist on both sides to prevent underflows and overflows.

This is not good enough for creating a TCP/IP-like stream: If the data flow
stops momentarily before a DMA buffer is filled, the intuitive expectation is
that the partial data in the buffer will arrive anyhow, despite the buffer not
being completed. This is implemented by adding a field in the
XILLYMSG_OPCODE_RELEASEBUF message, through which the FPGA informs not just
which buffer is submitted, but how much data it contains.
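
The flow described in the last two paragraphs can be modeled in a few lines.
The following is a single-threaded user-space simulation, not driver code:
NBUF and BUFSZ stand in for bufnum and bufsize, and the submitted[] array
plays the role of both the RELEASEBUF messages (with their byte count) and the
host's writes to the buffer control register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NBUF  4                 /* stands in for bufnum (a power of two) */
#define BUFSZ 8                 /* stands in for bufsize; tiny for the demo */

/* Simulated FPGA to host channel: a ring of DMA buffers. */
struct ring {
    char   data[NBUF][BUFSZ];
    size_t fill;                /* bytes in the buffer the FPGA is filling */
    size_t submitted[NBUF];     /* byte count per submitted buffer, 0 = free */
    int    wr;                  /* buffer the FPGA is filling */
    int    rd;                  /* next buffer the host will consume */
};

/* FPGA side: submit the current buffer together with its byte count. */
static void fpga_submit(struct ring *r)
{
    if (r->fill == 0)
        return;                 /* nothing to submit */
    r->submitted[r->wr] = r->fill;
    r->wr = (r->wr + 1) % NBUF;
    r->fill = 0;
}

/* FPGA side: push one byte. Returns false if the buffer it needs is still
 * held by the host (flow control: the FPGA must wait). */
static bool fpga_push(struct ring *r, char c)
{
    if (r->submitted[r->wr] != 0)
        return false;
    assert(r->fill < BUFSZ);
    r->data[r->wr][r->fill++] = c;
    if (r->fill == BUFSZ)
        fpga_submit(r);         /* full buffers are submitted right away */
    return true;
}

/* Host side: consume one submitted buffer and release it for reuse.
 * Returns the number of bytes copied to out (0 if none submitted). */
static size_t host_take(struct ring *r, char *out)
{
    size_t n = r->submitted[r->rd];

    if (n == 0)
        return 0;
    memcpy(out, r->data[r->rd], n);
    r->submitted[r->rd] = 0;    /* like writing the buffer control register */
    r->rd = (r->rd + 1) % NBUF;
    return n;
}
```

Pushing five bytes and calling host_take() yields nothing until fpga_submit()
is called, which corresponds to the host asking for the partial buffer (see
the timeout below). Once all NBUF buffers are submitted but not yet taken,
fpga_push() returns false, which is the flow control mentioned above.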

But the FPGA will submit a partially filled buffer only if directed to do so
by the host. This situation occurs when the read() method has been blocking
for XILLY_RX_TIMEOUT jiffies (currently 10 ms), after which the host commands
the FPGA to submit a DMA buffer as soon as it can. This timeout mechanism
balances bus bandwidth efficiency (preventing a lot of partially filled
buffers from being sent) against keeping the latency fairly low for tails of
data.
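
Because a jiffy lasts 1/HZ seconds and HZ is a kernel configuration choice,
the same 10 ms corresponds to different jiffy counts. Assuming the constant is
computed as 10*HZ/1000 with integer division (check xillybus_core.c for the
exact definition), HZ=100 gives 1 jiffy, HZ=250 gives 2 jiffies (8 ms) and
HZ=1000 gives 10 jiffies. The helpers below are hypothetical and only
reproduce this arithmetic.

```c
#include <assert.h>

/* Timeout in jiffies for a given HZ, assuming the 10*HZ/1000 form.
 * Integer division truncates, so the delay may fall a bit short of 10 ms. */
static unsigned long rx_timeout_jiffies(unsigned long hz)
{
    return 10 * hz / 1000;
}

/* Convert a jiffy count back to microseconds for comparison. */
static unsigned long jiffies_to_us(unsigned long j, unsigned long hz)
{
    assert(hz > 0);
    return j * 1000000UL / hz;
}
```

This truncation is one reason the timeouts in this section are described as
approximate.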

A similar mechanism is used in the host to FPGA direction. The handling of
partial DMA buffers is somewhat different, though. The user can tell the
driver to submit all data it has in the buffers to the FPGA, by issuing a
write() with the byte count set to zero. This is similar to a flush request,
but it doesn't block. There is also an autoflushing mechanism, which triggers
an equivalent flush roughly XILLY_RX_TIMEOUT jiffies after the last write().
This allows the user to be oblivious to the underlying buffering mechanism
and yet enjoy a stream-like interface.
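
From user space, the flush request is just a zero-length write().
xilly_flush_nowait() below is a hypothetical wrapper name. On Linux, a
zero-length write() to an ordinary pipe returns 0 without doing anything, so
the call is harmless if the program is pointed at a regular pipe for testing.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Ask the driver to push whatever it holds in its DMA buffers to the
 * FPGA: a write() with a byte count of zero. This does not wait for the
 * data to arrive. Returns 0 on success, -1 on error. */
static int xilly_flush_nowait(int fd)
{
    char dummy = 0;             /* never transferred: the count is zero */
    ssize_t rc;

    do {
        rc = write(fd, &dummy, 0);
    } while (rc < 0 && errno == EINTR);
    return rc < 0 ? -1 : 0;
}
```

A typical use is right after the last write() of a burst, when the FPGA
should see the data promptly rather than after the autoflush delay.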

Note that the issue of partial buffer flushing is irrelevant for pipes having
the "synchronous" attribute nonzero, since synchronous pipes don't allow data
to lie around in the DMA buffers between read() and write() anyhow.

Data granularity
----------------

The data arrives or is sent at the FPGA as 8, 16 or 32 bit wide words, as
configured by the "format" attribute. Whenever possible, the driver attempts
to hide this when the pipe is accessed differently from its natural alignment.
For example, reading single bytes from a pipe with 32 bit granularity works
with no issues. Writing single bytes to pipes with 16 or 32 bit granularity
will also work, but the driver can't send partially completed words to the
FPGA, so the transmission of up to one word may be held until it's fully
occupied with user data.

This somewhat complicates the handling of host to FPGA streams, because
when a buffer is flushed, it may contain up to 3 bytes that don't form a
complete word in the FPGA, and hence can't be sent. To prevent loss of data,
these leftover bytes need to be moved to the next buffer. The parts in
xillybus_core.c that mention "leftovers" in some way are related to this
complication.
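
The leftover arithmetic is modular arithmetic on the word width w: of len
bytes, len - len % w can be sent as whole words, and len % w (at most 3 for
32 bit words) must be carried into the next buffer. sendable_bytes() and
carry_leftovers() are hypothetical names for this sketch; the driver's own
code in xillybus_core.c may organize this differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes of a len-byte buffer that can be sent as whole words on a pipe
 * whose words are word_bytes wide (1, 2 or 4). */
static size_t sendable_bytes(size_t len, size_t word_bytes)
{
    assert(word_bytes == 1 || word_bytes == 2 || word_bytes == 4);
    return len - len % word_bytes;
}

/* Move the leftover bytes at the end of a flushed buffer to the start of
 * the next buffer, so no data is lost. Returns how many bytes moved. */
static size_t carry_leftovers(const char *flushed, size_t len,
                              size_t word_bytes, char *next)
{
    size_t sent = sendable_bytes(len, word_bytes);
    size_t left = len - sent;   /* at most word_bytes - 1, i.e. <= 3 */

    memcpy(next, flushed + sent, left);
    return left;
}
```

For example, flushing 10 bytes on a 32 bit pipe sends 8 bytes (two words) and
carries the last 2 into the next buffer.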

Probing
-------

As mentioned earlier, the number of pipes that are created when the driver
loads and their attributes depend on the Xillybus IP core in the FPGA. During
the driver's initialization, a blob containing configuration info, the
Interface Description Table (IDT), is sent from the FPGA to the host. The
bootstrap process is done in three phases:

1. Acquire the length of the IDT, so a buffer can be allocated for it. This
   is done by sending a quiesce command to the device, since the
   acknowledgment for this command contains the IDT's buffer length.

2. Acquire the IDT itself.

3. Create the interfaces according to the IDT.
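
The first two phases can be sketched against an abstract device. struct
dev_sketch, get_idt() and the fake device are hypothetical; in the real
driver, the quiesce command's acknowledgment and the IDT itself arrive by DMA,
in line with the rule that the host never reads from the FPGA. Phase 3 is
left out because the IDT's internal format is not described in this document.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical device interface for the bootstrap phases. */
struct dev_sketch {
    size_t (*quiesce)(void *ctx);                   /* returns IDT length */
    int    (*fetch_idt)(void *ctx, void *buf, size_t len);
    void   *ctx;
};

/* Phases 1 and 2: learn the IDT's length, allocate, then fetch it.
 * Returns a malloc()ed copy of the IDT (caller frees), or NULL. */
static unsigned char *get_idt(const struct dev_sketch *d, size_t *len_out)
{
    size_t len;
    unsigned char *idt;

    assert(d != NULL && len_out != NULL);
    len = d->quiesce(d->ctx);                       /* phase 1 */
    if (len == 0)
        return NULL;
    idt = malloc(len);
    if (idt == NULL)
        return NULL;
    if (d->fetch_idt(d->ctx, idt, len) != 0) {      /* phase 2 */
        free(idt);
        return NULL;
    }
    *len_out = len;
    return idt;                 /* phase 3 would parse this, create pipes */
}

/* A fake device for illustration; its "IDT" is an arbitrary blob. */
static const unsigned char fake_idt[] = { 0x01, 0x02, 0x03, 0x04, 0x05 };
static size_t fake_quiesce(void *ctx) { (void)ctx; return sizeof(fake_idt); }
static int fake_fetch(void *ctx, void *buf, size_t len)
{
    (void)ctx;
    if (len != sizeof(fake_idt))
        return -1;
    memcpy(buf, fake_idt, len);
    return 0;
}
static const struct dev_sketch fake_dev = { fake_quiesce, fake_fetch, NULL };
```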

Buffer allocation
-----------------

In order to simplify the logic that prevents illegal boundary crossings of
PCIe packets, the following rule applies: If a buffer is smaller than 4kB,
it must not cross a 4kB boundary. Otherwise, it must be 4kB aligned. The
xilly_setupchannels() function allocates these buffers by requesting whole
pages from the kernel, and dividing them into DMA buffers as necessary. Since
all buffers' sizes are powers of two, it's possible to pack any set of such
buffers, with a maximal waste of one page of memory.

All buffers are allocated when the driver is loaded. This is necessary,
since large contiguous physical memory segments are sometimes requested,
which are more likely to be available when the system is freshly booted.

Buffers are allocated in the same order as their pipes appear in the IDT.
The driver relies on a rule that the pipes are sorted with decreasing buffer
size in the IDT. If a requested buffer is larger than or equal to a page,
the necessary number of pages is requested from the kernel, and these are
used for this buffer. If the requested buffer is smaller than a page, one
single page is requested from the kernel, and that page is partially used.
Or, if there already is a partially used page at hand, the buffer is packed
into that page. It can be shown that all pages requested from the kernel
(except possibly for the last) are 100% utilized this way.
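
Why no page is left half-used: with power-of-two sizes placed in decreasing
order, every buffer already in the current partial page is at least as large
as the next one, so the page's fill level is a multiple of the next buffer's
size. Since 4096 is also such a multiple, the next buffer either fits in the
remaining space without crossing the page boundary, or the page is exactly
full. The user-space simulation below applies this rule; it is not the
driver's code, and pack_buffers(), struct placement and the 4096-byte PAGE
are assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE 4096u              /* assumed page size */

struct placement {
    unsigned page;              /* index of the (first) page, in request order */
    unsigned offset;            /* byte offset within that page */
};

/* Place buffers (powers of two, sorted by decreasing size) as described
 * above. Returns the total number of pages requested from the kernel. */
static unsigned pack_buffers(const unsigned *size, size_t n,
                             struct placement *out)
{
    unsigned pages = 0;
    unsigned off = 0;           /* fill level of the partial page, 0 = none */

    for (size_t i = 0; i < n; i++) {
        assert(size[i] != 0 && (size[i] & (size[i] - 1)) == 0);
        assert(i == 0 || size[i] <= size[i - 1]);
        if (size[i] >= PAGE) {
            out[i].page = pages;                /* whole pages of its own */
            out[i].offset = 0;
            pages += size[i] / PAGE;
            continue;
        }
        if (off == 0)
            pages++;                            /* request a fresh page */
        out[i].page = pages - 1;
        out[i].offset = off;                    /* multiple of size[i] */
        off = (off + size[i]) % PAGE;           /* page full: back to 0 */
    }
    return pages;
}
```

For the sizes 8192, 4096, 2048, 1024, 1024 and 512 bytes, it requests 5
pages: two for the first buffer, one for the second, one shared exactly by
2048 + 1024 + 1024, and a last one of which only 512 bytes are used.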

The "nonempty" message (supporting poll)
----------------------------------------

In order to support the "poll" method (and hence select()), there is a small
catch regarding the FPGA to host direction: The FPGA may have filled a DMA
buffer with some data, but not submitted that buffer. If the host waited for
the buffer's submission by the FPGA, there would be a possibility that the
FPGA side has sent data, but a select() call would still block, because the
host has not received any notification about this. This is solved with
XILLYMSG_OPCODE_NONEMPTY messages sent by the FPGA when a channel goes from
completely empty to containing some data.

These messages are used only to support poll() and select(). The IP core can
be configured not to send them for a slight reduction in bandwidth usage.
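
On the host, the NONEMPTY message only needs to set a flag that the poll
method consults. The sketch below uses a hypothetical struct rd_chan; it
reports readability either when submitted data is waiting or when the FPGA has
announced data in a buffer it has not submitted yet, since a read() can then
obtain that data after the host asks for the partial buffer (see Data
streaming above). When to clear the flag depends on driver details not
covered here, so the sketch never clears it.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical host-side state of one FPGA to host channel. */
struct rd_chan {
    size_t submitted_bytes;     /* data in buffers the FPGA has submitted */
    bool   nonempty;            /* set when a NONEMPTY message arrives */
};

/* The two message handlers that matter for poll(). */
static void on_nonempty(struct rd_chan *c)
{
    c->nonempty = true;
}

static void on_releasebuf(struct rd_chan *c, size_t bytes)
{
    c->submitted_bytes += bytes;
}

/* Readiness as a poll method might report it. Without the nonempty flag,
 * data sitting in an unsubmitted buffer would leave select() blocked. */
static short rd_poll_mask(const struct rd_chan *c)
{
    assert(c != NULL);
    if (c->submitted_bytes > 0 || c->nonempty)
        return POLLIN;
    return 0;
}
```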

Information is copyright its respective author. All material is available from the Linux Kernel Source distributed under a GPL License. This page is provided as a free service by mjmwired.net.