Based on kernel version 4.16.1. Page generated on 2018-04-09 11:53 EST.

Coherent Accelerator Interface (CXL)
====================================

Introduction
============

    The coherent accelerator interface is designed to allow the
    coherent connection of accelerators (FPGAs and other devices) to a
    POWER system. These devices need to adhere to the Coherent
    Accelerator Interface Architecture (CAIA).

    IBM refers to this as the Coherent Accelerator Processor Interface
    or CAPI. In the kernel it's referred to by the name CXL to avoid
    confusion with the ISDN CAPI subsystem.

    Coherent in this context means that the accelerator and CPUs can
    both access system memory directly and with the same effective
    addresses.


Hardware overview
=================

         POWER8/9             FPGA
       +----------+        +---------+
       |          |        |         |
       |   CPU    |        |   AFU   |
       |          |        |         |
       |          |        |         |
       |          |        |         |
       +----------+        +---------+
       |   PHB    |        |         |
       |   +------+        |   PSL   |
       |   | CAPP |<------>|         |
       +---+------+  PCIE  +---------+

    The POWER8/9 chip has a Coherently Attached Processor Proxy (CAPP)
    unit, which is part of the PCIe Host Bridge (PHB). Linux manages
    the CAPP through calls into OPAL and never programs it directly.

    The FPGA (or coherently attached device) consists of two parts:
    the POWER Service Layer (PSL) and the Accelerator Function Unit
    (AFU). The AFU implements the specific functionality behind the
    PSL. The PSL, among other things, provides memory address
    translation services that give each AFU direct access to userspace
    memory.

    The AFU is the core part of the accelerator (e.g. the compression
    or crypto function). The kernel has no knowledge of the function
    of the AFU. Only userspace interacts directly with the AFU.

    The PSL provides the translation and interrupt services that the
    AFU needs, and it is what the kernel interacts with. For example,
    if the AFU needs to read a particular effective address, it sends
    that address to the PSL; the PSL then translates it, fetches the
    data from memory and returns it to the AFU. If the PSL has a
    translation miss, it interrupts the kernel and the kernel services
    the fault. The fault is serviced in the context of whoever owns
    that acceleration function.

    POWER8 <-----> PSL Version 8 is compliant with CAIA Version 1.0.
    POWER9 <-----> PSL Version 9 is compliant with CAIA Version 2.0.
    PSL Version 9 provides new features such as:
    * Interaction with the nest MMU on the P9 chip.
    * Native DMA support.
    * Support for sending ASB_Notify messages for host thread wakeup.
    * Support for atomic operations.
    * ...

    Cards with a PSL9 won't work on a POWER8 system and cards with a
    PSL8 won't work on a POWER9 system.

AFU Modes
=========

    The AFU supports two programming modes: dedicated and AFU
    directed. An AFU may support one or both modes.

    In dedicated mode only one MMU context is supported, so only one
    userspace process can use the accelerator at a time.

    In AFU directed mode, up to 16K simultaneous contexts can be
    supported. This means up to 16K simultaneous userspace
    applications may use the accelerator (although specific AFUs may
    support fewer). In this mode, the AFU sends a 16-bit context ID
    with each of its requests. This tells the PSL which context is
    associated with each operation. If the PSL can't translate an
    operation, the kernel can also read the ID to determine the
    userspace context associated with that operation.


MMIO space
==========

    A portion of the accelerator MMIO space can be directly mapped
    from the AFU to userspace. Either the whole space or just a
    per-context portion can be mapped. The hardware is self-describing,
    so the kernel can determine the offset and size of the per-context
    portion.


Interrupts
==========

    AFUs may generate interrupts that are destined for userspace. These
    are received by the kernel as hardware interrupts and passed on to
    userspace through the read() syscall documented below.

    Data storage faults and error interrupts are handled by the kernel
    driver.


Work Element Descriptor (WED)
=============================

    The WED is a 64-bit parameter passed to the AFU when a context is
    started. Its format is defined by the AFU, so the kernel has no
    knowledge of what it represents. Typically it is the effective
    address of a work queue or status block where the AFU and
    userspace can share control and status information.
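To make this concrete, a program might keep a small control/status block in its own memory and pass that block's effective address as the WED. The sketch below assumes such a layout: `struct demo_status_block`, its fields, and the 128-byte alignment are all hypothetical, since the real format is entirely AFU-defined.

```c
#include <stdint.h>

/* Hypothetical control/status block shared by userspace and an AFU.
 * The real layout is defined by the AFU; the kernel never looks at it.
 * The 128-byte alignment (one POWER cache line) is an assumption. */
struct demo_status_block {
	volatile uint64_t command;  /* written by userspace */
	volatile uint64_t status;   /* written by the AFU */
	uint64_t src_ea;            /* effective address of the input buffer */
	uint64_t dst_ea;            /* effective address of the output buffer */
} __attribute__((aligned(128)));

/* The AFU shares the process's effective addresses, so the WED can
 * simply be the block's virtual address widened to 64 bits. */
uint64_t demo_wed_from_block(const struct demo_status_block *blk)
{
	return (uint64_t)(uintptr_t)blk;
}
```

The resulting value is what would go into the work_element_descriptor field when the context is started.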


User API
========

1. AFU character devices

    For AFUs operating in AFU directed mode, two character device
    files are created: /dev/cxl/afu0.0m corresponds to a master
    context and /dev/cxl/afu0.0s corresponds to a slave context.
    Master contexts have access to the full MMIO space an AFU
    provides. Slave contexts have access only to the per-process MMIO
    space an AFU provides.

    For AFUs operating in dedicated process mode, the driver creates
    only a single character device per AFU, called /dev/cxl/afu0.0d.
    It has access to the entire MMIO space that the AFU provides (like
    master contexts in AFU directed mode).
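A program that supports either mode can derive the device node from the AFU's mode. The sketch below assumes the afuX.Y naming seen in these examples, reading X as the card index and Y as the AFU index (an interpretation, not something stated here); `demo_afu_dev_path` and `enum demo_afu_node` are hypothetical helpers.

```c
#include <stdio.h>

/* Suffix of the character device node for each kind of context. */
enum demo_afu_node {
	DEMO_NODE_MASTER    = 'm',  /* AFU directed, full MMIO space */
	DEMO_NODE_SLAVE     = 's',  /* AFU directed, per-process MMIO space */
	DEMO_NODE_DEDICATED = 'd',  /* dedicated process mode */
};

/* Format "/dev/cxl/afu<card>.<afu><suffix>" into buf. Returns the
 * length snprintf() reports; a value >= len means truncation. */
int demo_afu_dev_path(char *buf, size_t len, int card, int afu,
		      enum demo_afu_node node)
{
	return snprintf(buf, len, "/dev/cxl/afu%d.%d%c", card, afu,
			(char)node);
}
```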

    The types described below are defined in include/uapi/misc/cxl.h.

    The following file operations are supported on both slave and
    master devices.

    A userspace library, libcxl, is available at:
        https://github.com/ibm-capi/libcxl
    It provides a C interface to this kernel API.

open
----

    Opens the device and allocates a file descriptor to be used with
    the rest of the API.

    A dedicated mode AFU has only one context and allows the device
    to be opened only once.

    An AFU directed mode AFU can have many contexts; the device can be
    opened once for each context that is available.

    When all available contexts are allocated, the open call will fail
    and return -ENOSPC.
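In C, a caller typically distinguishes ENOSPC, meaning every context is already in use, from other open() failures. This is a minimal sketch: `demo_open_afu` is a hypothetical helper, and the O_RDWR access mode is an assumption about what the application needs.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Open an AFU character device. Returns the new file descriptor, or a
 * negative errno value on failure. */
int demo_open_afu(const char *path)
{
	int fd = open(path, O_RDWR | O_CLOEXEC);

	if (fd < 0) {
		int err = errno;  /* save before fprintf() can change it */

		if (err == ENOSPC)
			fprintf(stderr, "%s: all AFU contexts are in use\n", path);
		else
			fprintf(stderr, "%s: %s\n", path, strerror(err));
		return -err;
	}
	return fd;
}
```

A caller that gets -ENOSPC could retry after another user closes its descriptor.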

    Note: IRQs need to be allocated for each context, which may limit
          the number of contexts that can be created, and therefore
          how many times the device can be opened. The POWER8 CAPP
          supports 2040 IRQs and 3 are used by the kernel, so 2037 are
          left. If 1 IRQ is needed per context, then only 2037
          contexts can be allocated. If 4 IRQs are needed per context,
          then only 509 contexts (2037/4, rounded down) can be
          allocated.
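The same arithmetic as a small helper, which may be handy when sizing how many concurrent users an AFU can serve. `demo_max_contexts` is a hypothetical name, and it models only the IRQ limit; the AFU's own context limit may be lower still.

```c
/* Maximum number of contexts that the IRQ supply allows:
 * (total IRQs - IRQs reserved by the kernel) / IRQs per context,
 * rounded down by integer division. Returns 0 for invalid input. */
unsigned int demo_max_contexts(unsigned int total_irqs,
			       unsigned int kernel_irqs,
			       unsigned int irqs_per_context)
{
	if (irqs_per_context == 0 || kernel_irqs >= total_irqs)
		return 0;
	return (total_irqs - kernel_irqs) / irqs_per_context;
}
```

With the POWER8 figures above, demo_max_contexts(2040, 3, 1) gives 2037 and demo_max_contexts(2040, 3, 4) gives 509.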


ioctl
-----

    CXL_IOCTL_START_WORK:
        Starts the AFU context and associates it with the current
        process. Once this ioctl succeeds, all memory mapped into this
        process is accessible to this AFU context using the same
        effective addresses. No additional calls are required to
        map/unmap memory. The AFU memory context will be updated as
        userspace allocates and frees memory. This ioctl returns once
        the AFU context is started.

        Takes a pointer to a struct cxl_ioctl_start_work:

                struct cxl_ioctl_start_work {
                        __u64 flags;
                        __u64 work_element_descriptor;
                        __u64 amr;
                        __s16 num_interrupts;
                        __s16 reserved1;
                        __s32 reserved2;
                        __u64 reserved3;
                        __u64 reserved4;
                        __u64 reserved5;
                        __u64 reserved6;
                };

            flags:
                Indicates which optional fields in the structure are
                valid.

            work_element_descriptor:
                The Work Element Descriptor (WED) is a 64-bit argument
                defined by the AFU. Typically this is an effective
                address pointing to an AFU-specific structure
                describing what work to perform.

            amr:
                Authority Mask Register (AMR), same as the powerpc
                AMR. This field is only used by the kernel when the
                corresponding CXL_START_WORK_AMR value is specified in
                flags. If not specified, the kernel will use a default
                value of 0.

            num_interrupts:
                Number of userspace interrupts to request. This field
                is only used by the kernel when the corresponding
                CXL_START_WORK_NUM_IRQS value is specified in flags.
                If not specified, the minimum number required by the
                AFU will be allocated. The minimum and maximum numbers
                can be obtained from sysfs.

            reserved fields:
                For ABI padding and future extensions.
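A sketch of preparing the request in C. To stay compilable without the kernel header it uses `struct demo_start_work`, a local mirror of the layout above, and the numeric flag values are assumed to match CXL_START_WORK_AMR and CXL_START_WORK_NUM_IRQS in include/uapi/misc/cxl.h; check them against that header. Real code would include the header and pass the filled structure as `ioctl(fd, CXL_IOCTL_START_WORK, &work)`.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of struct cxl_ioctl_start_work (64 bytes, no padding). */
struct demo_start_work {
	uint64_t flags;
	uint64_t work_element_descriptor;
	uint64_t amr;
	int16_t  num_interrupts;
	int16_t  reserved1;
	int32_t  reserved2;
	uint64_t reserved3;
	uint64_t reserved4;
	uint64_t reserved5;
	uint64_t reserved6;
};

/* ASSUMED values; verify against CXL_START_WORK_AMR and
 * CXL_START_WORK_NUM_IRQS in include/uapi/misc/cxl.h. */
#define DEMO_START_WORK_AMR      0x0000000000000001ULL
#define DEMO_START_WORK_NUM_IRQS 0x0000000000000002ULL

/* Fill in a start-work request. A negative num_irqs leaves
 * DEMO_START_WORK_NUM_IRQS clear so the AFU's minimum is allocated;
 * the AMR flag is never set, so the kernel uses its default of 0. */
void demo_prepare_start_work(struct demo_start_work *w, uint64_t wed,
			     int16_t num_irqs)
{
	memset(w, 0, sizeof(*w));  /* flags, amr and reserved fields start at 0 */
	w->work_element_descriptor = wed;
	if (num_irqs >= 0) {
		w->flags |= DEMO_START_WORK_NUM_IRQS;
		w->num_interrupts = num_irqs;
	}
}
```

Zeroing the whole structure first keeps the reserved fields clean for future extensions.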

    CXL_IOCTL_GET_PROCESS_ELEMENT:
        Gets the current context ID, also known as the process element.
        The value is returned from the kernel as a __u32.


mmap
----

    An AFU may have an MMIO space to facilitate communication with the
    AFU. If it does, the MMIO space can be accessed via mmap. The size
    and contents of this area are specific to the particular AFU. The
    size can be discovered via sysfs.

    In AFU directed mode, master contexts are allowed to map all of
    the MMIO space and slave contexts are allowed to map only the
    per-process MMIO space associated with the context. In dedicated
    process mode the entire MMIO space can always be mapped.

    mmap() must be called after the CXL_IOCTL_START_WORK ioctl.

    Care should be taken when accessing MMIO space. Only 32-bit and
    64-bit accesses are supported by POWER8. Also, the AFU will be
    designed with a specific endianness, so all MMIO accesses should
    take endianness into account (the endian(3) conversion functions,
    such as le64toh() and be64toh(), are recommended). These endian
    issues apply equally to any shared memory queues the WED may
    describe.
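As an illustration of those constraints, the helpers below access a mapped MMIO area with single aligned 64-bit loads and stores and convert with be64toh()/htobe64(). They assume a big-endian AFU and naturally aligned register offsets, and they omit any memory barriers a real program may need; `demo_mmio_read64_be` and `demo_mmio_write64_be` are hypothetical names. The area itself would typically come from something like `mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0)` after CXL_IOCTL_START_WORK, with size taken from sysfs.

```c
#define _DEFAULT_SOURCE
#include <endian.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 64-bit big-endian register at a naturally aligned offset.
 * One volatile 64-bit load is used: byte-wise copies would issue
 * access sizes that POWER8 MMIO does not support. */
uint64_t demo_mmio_read64_be(const volatile void *mmio, size_t offset)
{
	const volatile uint64_t *reg = (const volatile uint64_t *)
		((const volatile unsigned char *)mmio + offset);

	return be64toh(*reg);
}

/* Write a 64-bit value to a big-endian register with one 64-bit store. */
void demo_mmio_write64_be(volatile void *mmio, size_t offset, uint64_t val)
{
	volatile uint64_t *reg = (volatile uint64_t *)
		((volatile unsigned char *)mmio + offset);

	*reg = htobe64(val);
}
```

For a little-endian AFU the same pattern applies with le64toh()/htole64().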


read
----

    Reads events from the AFU. Blocks if no events are pending
    (unless O_NONBLOCK is supplied). Returns -EIO in the case of an
    unrecoverable error or if the card is removed.

    read() will always return an integral number of events.

    The buffer passed to read() must be at least 4096 bytes.

    The result of the read will be a buffer of one or more events.
    Each event is of type struct cxl_event and may vary in size.

            struct cxl_event {
                    struct cxl_event_header header;
                    union {
                            struct cxl_event_afu_interrupt irq;
                            struct cxl_event_data_storage fault;
                            struct cxl_event_afu_error afu_error;
                    };
            };

    The struct cxl_event_header is defined as:

            struct cxl_event_header {
                    __u16 type;
                    __u16 size;
                    __u16 process_element;
                    __u16 reserved1;
            };

        type:
            This defines the type of event. The type determines how
            the rest of the event is structured. These types are
            described below and defined by enum cxl_event_type.

        size:
            This is the size of the event in bytes, including the
            struct cxl_event_header. The start of the next event can
            be found at this offset from the start of the current
            event.

        process_element:
            Context ID of the event.

        reserved field:
            For future extensions and padding.
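Because events vary in size, a reader advances through the buffer by each header's size field. A sketch, assuming `len` is the byte count returned by read(); `struct demo_event_header` is a local mirror of struct cxl_event_header and `demo_count_events` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct cxl_event_header. */
struct demo_event_header {
	uint16_t type;
	uint16_t size;
	uint16_t process_element;
	uint16_t reserved1;
};

/* Count the events in the first len bytes of a read() buffer, where
 * len is read()'s return value. Returns -1 if a header is truncated
 * or its size field is impossible. */
int demo_count_events(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	size_t off = 0;
	int count = 0;

	while (off < len) {
		struct demo_event_header hdr;

		if (len - off < sizeof(hdr))
			return -1;
		/* memcpy avoids assuming the header is aligned in buf */
		memcpy(&hdr, p + off, sizeof(hdr));
		if (hdr.size < sizeof(hdr) || hdr.size > len - off)
			return -1;
		count++;
		off += hdr.size;  /* next event starts size bytes later */
	}
	return count;
}
```

Validating size before advancing keeps a corrupted header from sending the loop past the end of the data.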

    If the event type is CXL_EVENT_AFU_INTERRUPT then the event
    structure is defined as:

            struct cxl_event_afu_interrupt {
                    __u16 flags;
                    __u16 irq; /* Raised AFU interrupt number */
                    __u32 reserved1;
            };

        flags:
            These flags indicate which optional fields are present
            in this struct. Currently all fields are mandatory.

        irq:
            The IRQ number sent by the AFU.

        reserved field:
            For future extensions and padding.

    If the event type is CXL_EVENT_DATA_STORAGE then the event
    structure is defined as:

            struct cxl_event_data_storage {
                    __u16 flags;
                    __u16 reserved1;
                    __u32 reserved2;
                    __u64 addr;
                    __u64 dsisr;
                    __u64 reserved3;
            };

        flags:
            These flags indicate which optional fields are present in
            this struct. Currently all fields are mandatory.

        addr:
            The address that the AFU unsuccessfully attempted to
            access. Valid accesses will be handled transparently by the
            kernel but invalid accesses will generate this event.

        dsisr:
            This field gives information on the type of fault. It is a
            copy of the DSISR from the PSL hardware when the address
            fault occurred. The form of the DSISR is as defined in the
            CAIA.

        reserved fields:
            For future extensions and padding.

    If the event type is CXL_EVENT_AFU_ERROR then the event structure
    is defined as:

            struct cxl_event_afu_error {
                    __u16 flags;
                    __u16 reserved1;
                    __u32 reserved2;
                    __u64 error;
            };

        flags:
            These flags indicate which optional fields are present in
            this struct. Currently all fields are mandatory.

        error:
            Error status from the AFU. Defined by the AFU.

        reserved fields:
            For future extensions and padding.
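Putting the pieces together, a handler can switch on header.type and read the matching union member. This sketch mirrors the structures above under `demo_` names, and the numeric event type values are assumed to match enum cxl_event_type in include/uapi/misc/cxl.h; check them against that header. `demo_describe_event` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* ASSUMED to match enum cxl_event_type; verify against the header. */
enum {
	DEMO_EVENT_AFU_INTERRUPT = 1,
	DEMO_EVENT_DATA_STORAGE  = 2,
	DEMO_EVENT_AFU_ERROR     = 3,
};

/* Local mirrors of the event structures documented above. */
struct demo_ev_hdr { uint16_t type, size, process_element, reserved1; };
struct demo_ev_irq { uint16_t flags, irq; uint32_t reserved1; };
struct demo_ev_fault {
	uint16_t flags, reserved1;
	uint32_t reserved2;
	uint64_t addr, dsisr, reserved3;
};
struct demo_ev_err {
	uint16_t flags, reserved1;
	uint32_t reserved2;
	uint64_t error;
};
struct demo_ev {
	struct demo_ev_hdr header;
	union {
		struct demo_ev_irq irq;
		struct demo_ev_fault fault;
		struct demo_ev_err afu_error;
	};
};

/* Write a one-line description of ev into buf. Returns the event type,
 * or -1 for a type this sketch does not know. */
int demo_describe_event(const struct demo_ev *ev, char *buf, size_t len)
{
	unsigned int pe = ev->header.process_element;

	switch (ev->header.type) {
	case DEMO_EVENT_AFU_INTERRUPT:
		snprintf(buf, len, "pe %u: AFU interrupt %u", pe,
			 (unsigned int)ev->irq.irq);
		return DEMO_EVENT_AFU_INTERRUPT;
	case DEMO_EVENT_DATA_STORAGE:
		snprintf(buf, len, "pe %u: fault at 0x%llx, DSISR 0x%llx", pe,
			 (unsigned long long)ev->fault.addr,
			 (unsigned long long)ev->fault.dsisr);
		return DEMO_EVENT_DATA_STORAGE;
	case DEMO_EVENT_AFU_ERROR:
		snprintf(buf, len, "pe %u: AFU error 0x%llx", pe,
			 (unsigned long long)ev->afu_error.error);
		return DEMO_EVENT_AFU_ERROR;
	default:
		snprintf(buf, len, "pe %u: unknown event type %u", pe,
			 (unsigned int)ev->header.type);
		return -1;
	}
}
```

When working directly on the read() buffer, it is prudent to check that header.size covers the union member before reading it.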


2. Card character device (PowerVM guest only)

    In a PowerVM guest, an extra character device is created for the
    card. The device is only used to write (flash) a new image to the
    FPGA accelerator. Once the image is written and verified, the
    device tree is updated and the card is reset to reload the updated
    image.

open
----

    Opens the device and allocates a file descriptor to be used with
    the rest of the API. The device can only be opened once.

ioctl
-----

    CXL_IOCTL_DOWNLOAD_IMAGE:
    CXL_IOCTL_VALIDATE_IMAGE:
        Starts and controls flashing a new FPGA image. Partial
        reconfiguration is not supported (yet), so the image must
        contain a copy of the PSL and AFU(s). Since an image can be
        quite large, the caller may have to iterate, splitting the
        image into smaller chunks.

        Takes a pointer to a struct cxl_adapter_image:

                struct cxl_adapter_image {
                        __u64 flags;
                        __u64 data;
                        __u64 len_data;
                        __u64 len_image;
                        __u64 reserved1;
                        __u64 reserved2;
                        __u64 reserved3;
                        __u64 reserved4;
                };

            flags:
                These flags indicate which optional fields are present
                in this struct. Currently all fields are mandatory.

            data:
                Pointer to a buffer with part of the image to write to
                the card.

            len_data:
                Size of the buffer pointed to by data.

            len_image:
                Full size of the image.
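The chunking described above can be sketched as a loop that fills one struct cxl_adapter_image per chunk, with len_image always carrying the full size. The exact sequencing of CXL_IOCTL_DOWNLOAD_IMAGE and CXL_IOCTL_VALIDATE_IMAGE is not described here, so the sketch hides the ioctl behind a hypothetical callback type, `demo_send_chunk_fn`; `struct demo_adapter_image` is a local mirror of the documented layout, and `demo_tally_chunk` is a dry-run callback for testing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct cxl_adapter_image. */
struct demo_adapter_image {
	uint64_t flags;
	uint64_t data;       /* user pointer to this chunk */
	uint64_t len_data;   /* bytes in this chunk */
	uint64_t len_image;  /* size of the whole image */
	uint64_t reserved1, reserved2, reserved3, reserved4;
};

/* Hypothetical stand-in for one image ioctl; returns 0 on success. */
typedef int (*demo_send_chunk_fn)(const struct demo_adapter_image *img,
				  void *ctx);

/* Hand the image to send_chunk() in pieces of at most chunk_size
 * bytes. Returns 0 on success, -1 for a zero chunk size, or the first
 * nonzero value returned by send_chunk(). */
int demo_send_image(const void *image, size_t image_len, size_t chunk_size,
		    demo_send_chunk_fn send_chunk, void *ctx)
{
	const unsigned char *p = image;
	size_t off = 0;

	if (chunk_size == 0)
		return -1;
	while (off < image_len) {
		struct demo_adapter_image img;
		size_t n = image_len - off;
		int rc;

		if (n > chunk_size)
			n = chunk_size;
		memset(&img, 0, sizeof(img));  /* flags and reserved fields = 0 */
		img.data = (uint64_t)(uintptr_t)(p + off);
		img.len_data = n;
		img.len_image = image_len;
		rc = send_chunk(&img, ctx);
		if (rc != 0)
			return rc;
		off += n;
	}
	return 0;
}

/* Dry-run callback that only tallies chunks and bytes. */
struct demo_tally { size_t chunks, bytes; };

int demo_tally_chunk(const struct demo_adapter_image *img, void *ctx)
{
	struct demo_tally *t = ctx;

	t->chunks++;
	t->bytes += img->len_data;
	return 0;
}
```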


Sysfs Class
===========

    A cxl sysfs class is added under /sys/class/cxl to facilitate
    enumeration and tuning of the accelerators. Its layout is
    described in Documentation/ABI/testing/sysfs-class-cxl.
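Attributes in that class are small text files, so reading one (for example the minimum IRQ count mentioned under CXL_IOCTL_START_WORK) is plain file I/O. In the sketch below `demo_read_sysfs_ulong` is a hypothetical helper, and a path such as /sys/class/cxl/afu0.0/irqs_min is an assumption to be checked against sysfs-class-cxl.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Read one unsigned integer from a sysfs attribute. Returns 0 and
 * stores the value on success, -1 on error. */
int demo_read_sysfs_ulong(const char *path, unsigned long *out)
{
	char buf[64];
	char *end;
	unsigned long val;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	errno = 0;
	/* base 0: decimal, 0x-prefixed hex, or leading-0 octal */
	val = strtoul(buf, &end, 0);
	if (errno != 0 || end == buf)
		return -1;
	*out = val;
	return 0;
}
```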


Udev rules
==========

    The following udev rules could be used to create a symlink to the
    most logical chardev to use in any programming mode (afuX.Yd for
    dedicated, afuX.Ys for AFU directed), since the API is virtually
    identical for each:

        SUBSYSTEM=="cxl", ATTRS{mode}=="dedicated_process", SYMLINK="cxl/%b"
        SUBSYSTEM=="cxl", ATTRS{mode}=="afu_directed", \
                          KERNEL=="afu[0-9]*.[0-9]*s", SYMLINK="cxl/%b"