
Documentation / powerpc / cxlflash.txt


Based on kernel version 4.16.1. Page generated on 2018-04-09 11:53 EST.

1	Introduction
2	============
3	
4	    The IBM Power architecture provides support for CAPI (Coherent
5	    Accelerator Power Interface), which is available to certain PCIe slots
6	    on Power 8 systems. CAPI can be thought of as a special tunneling
7	    protocol through PCIe that allows PCIe adapters to look like special
8	    purpose co-processors which can read or write an application's
9	    memory and generate page faults. As a result, the host interface to
10	    an adapter running in CAPI mode does not require the data buffers to
11	    be mapped to the device's memory (IOMMU bypass) nor does it require
12	    memory to be pinned.
13	
14	    On Linux, Coherent Accelerator (CXL) kernel services present CAPI
15	    devices as a PCI device by implementing a virtual PCI host bridge.
16	    This abstraction simplifies the infrastructure and programming
17	    model, allowing for drivers to look similar to other native PCI
18	    device drivers.
19	
20	    CXL provides a mechanism by which user space applications can
21	    directly talk to a device (network or storage) bypassing the typical
22	    kernel/device driver stack. The CXL Flash Adapter Driver enables a
23	    user space application direct access to Flash storage.
24	
25	    The CXL Flash Adapter Driver is a kernel module that sits in the
26	    SCSI stack as a low level device driver (below the SCSI disk and
27	    protocol drivers) for the IBM CXL Flash Adapter. This driver is
28	    responsible for the initialization of the adapter, setting up the
29	    special path for user space access, and performing error recovery. It
30	    communicates directly with the Flash Accelerator Functional Unit (AFU)
31	    as described in Documentation/powerpc/cxl.txt.
32	
33	    The cxlflash driver supports two, mutually exclusive, modes of
34	    operation at the device (LUN) level:
35	
36	        - Any flash device (LUN) can be configured to be accessed as a
37	          regular disk device (e.g., /dev/sdc). This is the default mode.
38	
39	        - Any flash device (LUN) can be configured to be accessed from
40	          user space with a special block library. This mode further
41	          specifies the means of accessing the device and provides for
42	          either raw access to the entire LUN (referred to as direct
43	          or physical LUN access) or access to a kernel/AFU-mediated
44	          partition of the LUN (referred to as virtual LUN access). The
45	          segmentation of a disk device into virtual LUNs is assisted
46	          by special translation services provided by the Flash AFU.
47	
48	Overview
49	========
50	
51	    The Coherent Accelerator Interface Architecture (CAIA) introduces a
52	    concept of a master context. A master typically has special privileges
53	    granted to it by the kernel or hypervisor allowing it to perform AFU
54	    wide management and control. The master may or may not be involved
55	    directly in each user I/O, but at the minimum is involved in the
56	    initial setup before the user application is allowed to send requests
57	    directly to the AFU.
58	
59	    The CXL Flash Adapter Driver establishes a master context with the
60	    AFU. It uses memory mapped I/O (MMIO) for this control and setup. The
61	    Adapter Problem Space Memory Map looks like this:
62	
63	                     +-------------------------------+
64	                     |    512 * 64 KB User MMIO      |
65	                     |        (per context)          |
66	                     |       User Accessible         |
67	                     +-------------------------------+
68	                     |    512 * 128 B per context    |
69	                     |    Provisioning and Control   |
70	                     |   Trusted Process accessible  |
71	                     +-------------------------------+
72	                     |         64 KB Global          |
73	                     |   Trusted Process accessible  |
74	                     +-------------------------------+
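
    The offsets of the three regions can be worked out from the sizes
    in the diagram. The sketch below is illustrative only: it assumes
    the regions are contiguous, in the order drawn, with the top box
    (User MMIO) at offset 0 of the problem space. The diagram suggests
    this layout but does not state it. All EX_ and ex_ names are
    hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sizes taken from the memory map above. */
#define EX_MAX_CONTEXTS   512u
#define EX_USER_MMIO_SIZE (64u * 1024u)  /* per-context user MMIO     */
#define EX_CTRL_SIZE      128u           /* per-context prov/control  */
#define EX_GLOBAL_SIZE    (64u * 1024u)  /* global area               */

/* Assumption: regions are contiguous, user MMIO first, at offset 0. */
uint64_t ex_user_mmio_offset(unsigned int ctx)
{
	assert(ctx < EX_MAX_CONTEXTS);
	return (uint64_t)ctx * EX_USER_MMIO_SIZE;
}

/* Offset of the provisioning and control area for one context. */
uint64_t ex_ctrl_offset(unsigned int ctx)
{
	assert(ctx < EX_MAX_CONTEXTS);
	return (uint64_t)EX_MAX_CONTEXTS * EX_USER_MMIO_SIZE +
	       (uint64_t)ctx * EX_CTRL_SIZE;
}

/* Offset of the global area, after both per-context regions. */
uint64_t ex_global_offset(void)
{
	return (uint64_t)EX_MAX_CONTEXTS * (EX_USER_MMIO_SIZE + EX_CTRL_SIZE);
}
```

    Under these assumptions the user MMIO region spans 32 MB, the
    per-context control region 64 KB, and the whole map 32 MB + 128 KB.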
75	
76	    This driver configures itself into the SCSI software stack as an
77	    adapter driver. The driver is the only entity that is considered a
78	    Trusted Process to program the Provisioning and Control and Global
79	    areas in the MMIO Space shown above.  The master context driver
80	    discovers all LUNs attached to the CXL Flash adapter and instantiates
81	    scsi block devices (/dev/sdb, /dev/sdc etc.) for each unique LUN
82	    seen from each path.
83	
84	    Once these scsi block devices are instantiated, an application
85	    written to a specification provided by the block library may get
86	    access to the Flash from user space (without requiring a system call).
87	
88	    This master context driver also provides a series of ioctls for this
89	    block library to enable this user space access.  The driver supports
90	    two modes for accessing the block device.
91	
92	    The first mode is called a virtual mode. In this mode a single scsi
93	    block device (/dev/sdb) may be carved up into any number of distinct
94	    virtual LUNs. The virtual LUNs may be resized as long as the sum of
95	    the sizes of all the virtual LUNs, along with their associated
96	    metadata, does not exceed the physical capacity.
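
    The resize constraint above can be expressed as a simple capacity
    check. This is a minimal sketch, not the driver's allocator: the
    function name, the block-based units, and the single meta_blocks
    figure standing in for the metadata overhead are all assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if resizing virtual LUN `idx` to `new_blocks` keeps the
 * total (all virtual LUNs plus metadata) within the physical capacity.
 * All quantities are in blocks; `meta_blocks` is a hypothetical figure. */
bool ex_vlun_resize_fits(const uint64_t *vlun_blocks, size_t count,
			 size_t idx, uint64_t new_blocks,
			 uint64_t meta_blocks, uint64_t phys_blocks)
{
	uint64_t remaining;
	size_t i;

	if (idx >= count || meta_blocks > phys_blocks)
		return false;
	remaining = phys_blocks - meta_blocks;

	for (i = 0; i < count; i++) {
		uint64_t need = (i == idx) ? new_blocks : vlun_blocks[i];

		if (need > remaining)  /* subtracting avoids overflow */
			return false;
		remaining -= need;
	}
	return true;
}
```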
97	
98	    The second mode is called the physical mode. In this mode a single
99	    block device (/dev/sdb) may be opened directly by the block library
100	    and the entire space for the LUN is available to the application.
101	
102	    Only the physical mode provides persistence of the data, i.e., the
103	    data written to the block device will survive application exit and
104	    restart and also reboot. The virtual LUNs do not persist (i.e. do
105	    not survive after the application terminates or the system reboots).
106	
107	
108	Block library API
109	=================
110	
111	    Applications intending to get access to the CXL Flash from user
112	    space should use the block library, as it abstracts the details of
113	    interfacing directly with the cxlflash driver that are necessary for
114	    performing administrative actions (e.g., setup, tear down, resize).
115	    The block library can be thought of as a 'user' of services,
116	    implemented as IOCTLs, that are provided by the cxlflash driver
117	    specifically for devices (LUNs) operating in user space access
118	    mode. While it is not a requirement that applications understand
119	    the interface between the block library and the cxlflash driver,
120	    a high-level overview of each supported service (IOCTL) is provided
121	    below.
122	
123	    The block library can be found on GitHub:
124	    http://github.com/open-power/capiflash
125	
126	
127	CXL Flash Driver LUN IOCTLs
128	===========================
129	
130	    Users, such as the block library, that wish to interface with a flash
131	    device (LUN) via user space access need to use the services provided
132	    by the cxlflash driver. As these services are implemented as ioctls,
133	    a file descriptor handle must first be obtained in order to establish
134	    the communication channel between a user and the kernel.  This file
135	    descriptor is obtained by opening the device special file associated
136	    with the scsi disk device (/dev/sdb) that was created during LUN
137	    discovery. As per the location of the cxlflash driver within the
138	    SCSI protocol stack, this open is actually not seen by the cxlflash
139	    driver. Upon successful open, the user receives a file descriptor
140	    (herein referred to as fd1) that should be used for issuing the
141	    subsequent ioctls listed below.
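
    Obtaining fd1 and issuing a request on it could look like the
    sketch below. The helper names are hypothetical, the O_RDWR open
    mode is an assumption, and the request codes and argument
    structures would come from uapi/scsi/cxlflash_ioctl.h (not
    reproduced here). Both helpers convert the usual -1/errno result
    into a negative errno value.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>

/* Opens the SCSI disk special file for a LUN (e.g. "/dev/sdb") and
 * returns fd1, or a negative errno value on failure. */
int ex_open_lun(const char *path)
{
	int fd1 = open(path, O_RDWR);  /* open mode is an assumption */

	return fd1 >= 0 ? fd1 : -errno;
}

/* Issues one ioctl on fd1; `req` and `arg` are supplied by the caller.
 * Returns 0 on success or a negative errno value on failure. */
int ex_lun_ioctl(int fd1, unsigned long req, void *arg)
{
	return ioctl(fd1, req, arg) == -1 ? -errno : 0;
}
```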
142	
143	    The structure definitions for these IOCTLs are available in:
144	    uapi/scsi/cxlflash_ioctl.h
145	
146	DK_CXLFLASH_ATTACH
147	------------------
148	
149	    This ioctl obtains, initializes, and starts a context using the CXL
150	    kernel services. These services specify a context id (u16) by which
151	    to uniquely identify the context and its allocated resources. The
152	    services additionally provide a second file descriptor (herein
153	    referred to as fd2) that is used by the block library to initiate
154	    memory mapped I/O (via mmap()) to the CXL flash device and poll for
155	    completion events. This file descriptor is intentionally installed by
156	    this driver and not the CXL kernel services to allow for intermediary
157	    notification and access in the event of a non-user-initiated close(),
158	    such as a killed process. This design point is described in further
159	    detail in the description for the DK_CXLFLASH_DETACH ioctl.
160	
161	    There are a few important aspects regarding the "tokens" (context id
162	    and fd2) that are provided back to the user:
163	
164	        - These tokens are only valid for the process under which they
165	          were created. The child of a forked process cannot continue
166	          to use the context id or file descriptor created by its parent
167	          (see DK_CXLFLASH_VLUN_CLONE for further details).
168	
169	        - These tokens are only valid for the lifetime of the context and
170	          the process under which they were created. Once either is
171	          destroyed, the tokens are to be considered stale and subsequent
172	          usage will result in errors.
173	
174		- A valid adapter file descriptor (fd2 >= 0) is only returned on
175		  the initial attach for a context. Subsequent attaches to an
176		  existing context (DK_CXLFLASH_ATTACH_REUSE_CONTEXT flag present)
177		  do not provide the adapter file descriptor as it was previously
178		  made known to the application.
179	
180	        - When a context is no longer needed, the user shall detach from
181	          the context via the DK_CXLFLASH_DETACH ioctl. When the attach
182		  returned a valid adapter file descriptor and the return flag
183		  DK_CXLFLASH_APP_CLOSE_ADAP_FD was present, the application _must_
184		  close the adapter file descriptor following a successful detach.
185	
186		- When this ioctl returns with a valid fd2 and the return flag
187		  DK_CXLFLASH_APP_CLOSE_ADAP_FD is present, the application _must_
188		  close fd2 in the following circumstances:
189	
190			+ Following a successful detach of the last user of the context
191			+ Following a successful recovery on the context's original fd2
192			+ In the child process of a fork(), following a clone ioctl,
193			  on the fd2 associated with the source context
194	
195	        - At any time, a close on fd2 will invalidate the tokens. Applications
196		  should exercise caution to only close fd2 when appropriate (outlined
197		  in the previous bullet) to avoid premature loss of I/O.
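
    The rules for closing fd2 listed above can be condensed into one
    predicate. This is an illustrative sketch: the event enum and the
    function are hypothetical, and EX_APP_CLOSE_ADAP_FD is a
    placeholder bit, not the real value of
    DK_CXLFLASH_APP_CLOSE_ADAP_FD.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit: the real flag value is defined in
 * uapi/scsi/cxlflash_ioctl.h and is not reproduced here. */
#define EX_APP_CLOSE_ADAP_FD (1ull << 0)

enum ex_event {
	EX_DETACHED_LAST_USER,  /* successful detach of the last user  */
	EX_CONTEXT_RECOVERED,   /* successful recovery of the context  */
	EX_CHILD_CLONED,        /* child of fork(), after clone ioctl  */
	EX_OTHER                /* any other point in the program      */
};

/* attach_flags: return flags saved from the attach; fd2: adapter fd.
 * Returns true when the application is required to close fd2. */
bool ex_must_close_fd2(uint64_t attach_flags, int fd2, enum ex_event ev)
{
	if (fd2 < 0 || !(attach_flags & EX_APP_CLOSE_ADAP_FD))
		return false;
	return ev == EX_DETACHED_LAST_USER ||
	       ev == EX_CONTEXT_RECOVERED ||
	       ev == EX_CHILD_CLONED;
}
```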
198	
199	DK_CXLFLASH_USER_DIRECT
200	-----------------------
201	    This ioctl is responsible for transitioning the LUN to direct
202	    (physical) mode access and configuring the AFU for direct access from
203	    user space on a per-context basis. Additionally, the block size and
204	    last logical block address (LBA) are returned to the user.
205	
206	    As mentioned previously, when operating in user space access mode,
207	    LUNs may be accessed in whole or in part. Only one mode is allowed
208	    at a time and if one mode is active (outstanding references exist),
209	    requests to use the LUN in a different mode are denied.
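
    The mode exclusivity described above amounts to reference counting
    per LUN. The following is a minimal sketch of that idea and not
    the driver's implementation; the structure, the function names,
    and the -EINVAL error code are assumptions.

```c
#include <errno.h>

enum ex_mode { EX_MODE_NONE, EX_MODE_DIRECT, EX_MODE_VIRTUAL };

struct ex_lun {
	enum ex_mode mode;   /* current user space access mode */
	unsigned int users;  /* outstanding references         */
};

/* Takes a reference in `mode`. Returns 0, or -EINVAL if the LUN is
 * already in use in the other mode (error code is illustrative). */
int ex_lun_get(struct ex_lun *lun, enum ex_mode mode)
{
	if (mode == EX_MODE_NONE)
		return -EINVAL;
	if (lun->users > 0 && lun->mode != mode)
		return -EINVAL;
	lun->mode = mode;
	lun->users++;
	return 0;
}

/* Drops a reference; the mode is cleared with the last reference. */
void ex_lun_put(struct ex_lun *lun)
{
	if (lun->users > 0 && --lun->users == 0)
		lun->mode = EX_MODE_NONE;
}
```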
210	
211	    The AFU is configured for direct access from user space by adding an
212	    entry to the AFU's resource handle table. The index of the entry is
213	    treated as a resource handle that is returned to the user. The user
214	    is then able to use the handle to reference the LUN during I/O.
215	
216	DK_CXLFLASH_USER_VIRTUAL
217	------------------------
218	    This ioctl is responsible for transitioning the LUN to virtual mode
219	    of access and configuring the AFU for virtual access from user space
220	    on a per-context basis. Additionally, the block size and last logical
221	    block address (LBA) are returned to the user.
222	
223	    As mentioned previously, when operating in user space access mode,
224	    LUNs may be accessed in whole or in part. Only one mode is allowed
225	    at a time and if one mode is active (outstanding references exist),
226	    requests to use the LUN in a different mode are denied.
227	
228	    The AFU is configured for virtual access from user space by adding
229	    an entry to the AFU's resource handle table. The index of the entry
230	    is treated as a resource handle that is returned to the user. The
231	    user is then able to use the handle to reference the LUN during I/O.
232	
233	    By default, the virtual LUN is created with a size of 0. The user
234	    would need to use the DK_CXLFLASH_VLUN_RESIZE ioctl to grow the
235	    virtual LUN to a desired size. To avoid having to perform this
236	    resize for the initial creation of the virtual LUN, the user has the
237	    option of specifying a size as part of the DK_CXLFLASH_USER_VIRTUAL
238	    ioctl, such that when success is returned to the user, the
239	    resource handle that is provided is already referencing provisioned
240	    storage. This is reflected by the last LBA being a non-zero value.
241	
242	    When a LUN is accessible from more than one port, this ioctl will
243	    return with the DK_CXLFLASH_ALL_PORTS_ACTIVE return flag set. This
244	    provides the user with a hint that I/O can be retried in the event
245	    of an I/O error as the LUN can be reached over multiple paths.
246	
247	DK_CXLFLASH_VLUN_RESIZE
248	-----------------------
249	    This ioctl is responsible for resizing a previously created virtual
250	    LUN and will fail if invoked upon a LUN that is not in virtual
251	    mode. Upon success, an updated last LBA is returned to the user
252	    indicating the new size of the virtual LUN associated with the
253	    resource handle.
254	
255	    The partitioning of virtual LUNs is jointly mediated by the cxlflash
256	    driver and the AFU. An allocation table is kept for each LUN that is
257	    operating in the virtual mode and used to program a LUN translation
258	    table that the AFU references when provided with a resource handle.
259	
260	    This ioctl can return -EAGAIN if an AFU sync operation takes too long.
261	    In addition to returning a failure to the user, cxlflash will also schedule
262	    an asynchronous AFU reset. Should the user choose to retry the operation,
263	    it is expected to succeed. If this ioctl fails with -EAGAIN, the user
264	    can either retry the operation or treat it as a failure.
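
    A caller that chooses to retry on -EAGAIN might use a bounded loop
    such as the one sketched below. The callback type, the helper, and
    the stand-in operation are hypothetical; a real caller would wrap
    the ioctl and translate errno into a negative return value. The
    text above does not specify a delay between attempts, so none is
    shown.

```c
#include <errno.h>

/* Any operation that returns 0 on success or a negative errno value. */
typedef int (*ex_op_fn)(void *arg);

/* Retries `op` while it reports -EAGAIN, up to `max_tries` attempts.
 * Returns the result of the last attempt. */
int ex_retry_eagain(ex_op_fn op, void *arg, int max_tries)
{
	int rc = -EINVAL;  /* returned if max_tries is not positive */
	int i;

	for (i = 0; i < max_tries; i++) {
		rc = op(arg);
		if (rc != -EAGAIN)
			break;
	}
	return rc;
}

/* Stand-in for a resize request: fails with -EAGAIN until the counter
 * pointed to by `arg` reaches zero. Not part of any real API. */
int ex_fake_resize(void *arg)
{
	int *failures_left = arg;

	if (*failures_left > 0) {
		(*failures_left)--;
		return -EAGAIN;
	}
	return 0;
}
```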
265	
266	DK_CXLFLASH_RELEASE
267	-------------------
268	    This ioctl is responsible for releasing a previously obtained
269	    reference to either a physical or virtual LUN. This can be
270	    thought of as the inverse of the DK_CXLFLASH_USER_DIRECT or
271	    DK_CXLFLASH_USER_VIRTUAL ioctls. Upon success, the resource handle
272	    is no longer valid and the entry in the resource handle table is
273	    made available to be used again.
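
    The life cycle of a resource handle (allocated as a table index,
    then returned to the table on release) can be sketched as follows.
    This is a toy model under stated assumptions: the table size, the
    structure, and the function names are hypothetical and do not
    reflect the AFU's actual table format.

```c
#include <stdbool.h>

#define EX_NUM_HANDLES 16  /* hypothetical table size */

struct ex_rht {
	bool in_use[EX_NUM_HANDLES];
};

/* Claims the first free entry; its index is the resource handle.
 * Returns -1 when the table is full. */
int ex_rht_alloc(struct ex_rht *t)
{
	int i;

	for (i = 0; i < EX_NUM_HANDLES; i++) {
		if (!t->in_use[i]) {
			t->in_use[i] = true;
			return i;
		}
	}
	return -1;
}

/* Releases a handle so the entry can be reused. Returns false if the
 * handle is out of range or not currently allocated. */
bool ex_rht_release(struct ex_rht *t, int handle)
{
	if (handle < 0 || handle >= EX_NUM_HANDLES || !t->in_use[handle])
		return false;
	t->in_use[handle] = false;
	return true;
}
```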
274	
275	    As part of the release process for virtual LUNs, the virtual LUN
276	    is first resized to 0 to clear out and free the translation tables
277	    associated with the virtual LUN reference.
278	
279	DK_CXLFLASH_DETACH
280	------------------
281	    This ioctl is responsible for unregistering a context with the
282	    cxlflash driver and releasing outstanding resources that were
283	    not explicitly released via the DK_CXLFLASH_RELEASE ioctl. Upon
284	    success, all "tokens" which had been provided to the user from the
285	    DK_CXLFLASH_ATTACH onward are no longer valid.
286	
287	    When the DK_CXLFLASH_APP_CLOSE_ADAP_FD flag was returned on a successful
288	    attach, the application _must_ close the fd2 associated with the context
289	    following the detach of the final user of the context.
290	
291	DK_CXLFLASH_VLUN_CLONE
292	----------------------
293	    This ioctl is responsible for cloning a previously created
294	    context to a more recently created context. It exists solely to
295	    support maintaining user space access to storage after a process
296	    forks. Upon success, the child process (which invoked the ioctl)
297	    will have access to the same LUNs via the same resource handle(s)
298	    as the parent, but under a different context.
299	
300	    Context sharing across processes is not supported with CXL and
301	    therefore each fork must be met with establishing a new context
302	    for the child process. This ioctl simplifies the state management
303	    and playback required by a user in such a scenario. When a process
304	    forks, the child process can clone the parent's context by first creating
305	    a context (via DK_CXLFLASH_ATTACH) and then using this ioctl to
306	    perform the clone from the parent to the child.
307	
308	    The clone itself is fairly simple. The resource handle and LUN
309	    translation tables are copied from the parent context to the child's
310	    and then synced with the AFU.
311	
312	    When the DK_CXLFLASH_APP_CLOSE_ADAP_FD flag was returned on a successful
313	    attach, the application _must_ close the fd2 associated with the source
314	    context (still resident/accessible in the parent process) following the
315	    clone. This is to avoid a stale entry in the file descriptor table of the
316	    child process.
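
    The order of operations in the child after a fork (attach, clone,
    then close the inherited fd2) is sketched below. The callback
    table and the recording stubs are hypothetical; in a real program
    the callbacks would wrap DK_CXLFLASH_ATTACH, DK_CXLFLASH_VLUN_CLONE
    and close().

```c
#include <stdbool.h>

/* Hypothetical wrappers; each returns 0 or a negative errno value. */
struct ex_clone_ops {
	int (*attach)(void *priv, int *child_ctx);           /* attach ioctl */
	int (*clone)(void *priv, int src_ctx, int dst_ctx);  /* clone ioctl  */
	int (*close_fd)(void *priv, int fd);                 /* close()      */
};

/* Runs in the child after fork(). `close_src_fd2` records whether the
 * parent's attach returned DK_CXLFLASH_APP_CLOSE_ADAP_FD. */
int ex_child_after_fork(const struct ex_clone_ops *ops, void *priv,
			int parent_ctx, int parent_fd2, bool close_src_fd2)
{
	int child_ctx;
	int rc;

	rc = ops->attach(priv, &child_ctx);            /* 1. new context  */
	if (rc)
		return rc;
	rc = ops->clone(priv, parent_ctx, child_ctx);  /* 2. copy tables  */
	if (rc)
		return rc;
	if (close_src_fd2)                             /* 3. drop old fd2 */
		rc = ops->close_fd(priv, parent_fd2);
	return rc;
}

/* Recording stubs, used only to exercise the flow in this sketch. */
struct ex_trace { char steps[3]; int n; };

static void ex_note(void *priv, char c)
{
	struct ex_trace *t = priv;

	if (t->n < 3)
		t->steps[t->n++] = c;
}

int ex_stub_attach(void *priv, int *ctx)
{
	ex_note(priv, 'A');
	*ctx = 7;  /* arbitrary context id for the child */
	return 0;
}

int ex_stub_clone(void *priv, int src, int dst)
{
	(void)src;
	(void)dst;
	ex_note(priv, 'C');
	return 0;
}

int ex_stub_close(void *priv, int fd)
{
	(void)fd;
	ex_note(priv, 'X');
	return 0;
}
```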
317	
318	    This ioctl can return -EAGAIN if an AFU sync operation takes too long.
319	    In addition to returning a failure to the user, cxlflash will also schedule
320	    an asynchronous AFU reset. Should the user choose to retry the operation,
321	    it is expected to succeed. If this ioctl fails with -EAGAIN, the user
322	    can either retry the operation or treat it as a failure.
323	
324	DK_CXLFLASH_VERIFY
325	------------------
326	    This ioctl is used to detect various changes such as the capacity of
327	    the disk changing, the number of LUNs visible changing, etc. In cases
328	    where the changes affect the application (such as a LUN resize), the
329	    cxlflash driver will report the changed state to the application.
330	
331	    The user calls in when they want to validate that a LUN hasn't been
332	    changed in response to a check condition. As the user is operating out
333	    of band from the kernel, they will see these types of events without
334	    the kernel's knowledge. When encountered, the user's architected
335	    behavior is to call in to this ioctl, indicating what they want to
336	    verify and passing along any appropriate information. For now, only
337	    verifying a LUN change (i.e., a different size) with sense data is
338	    supported.
339	
340	DK_CXLFLASH_RECOVER_AFU
341	-----------------------
342	    This ioctl is used to drive recovery (if such an action is warranted)
343	    of a specified user context. Any state associated with the user context
344	    is re-established upon successful recovery.
345	
346	    User contexts are put into an error condition when the device needs to
347	    be reset or is terminating. Users are notified of this error condition
348	    by seeing all 0xF's on an MMIO read. Upon encountering this, the
349	    architected behavior for a user is to call into this ioctl to recover
350	    their context. A user may also call into this ioctl at any time to
351	    check if the device is operating normally. If a failure is returned
352	    from this ioctl, the user is expected to gracefully clean up their
353	    context via release/detach ioctls. Until they do, the context they
354	    hold is not relinquished. The user may also optionally exit the process
355	    at which time the context/resources they held will be freed as part of
356	    the release fop.
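
    The decision flow described above can be sketched as two small
    helpers. The names are hypothetical, and the sketch assumes that
    MMIO reads are 64 bits wide, so that "all 0xF's" corresponds to a
    value with every bit set.

```c
#include <stdbool.h>
#include <stdint.h>

enum ex_action {
	EX_CONTINUE,      /* keep issuing I/O                      */
	EX_CALL_RECOVER,  /* issue DK_CXLFLASH_RECOVER_AFU         */
	EX_CLEAN_UP       /* release and detach, or exit           */
};

/* Assumption: a 64-bit MMIO read; all 0xF's means every bit is set. */
bool ex_mmio_all_ones(uint64_t value)
{
	return value == UINT64_MAX;
}

/* What to do after an MMIO read returned `value`. */
enum ex_action ex_after_mmio_read(uint64_t value)
{
	return ex_mmio_all_ones(value) ? EX_CALL_RECOVER : EX_CONTINUE;
}

/* What to do after the recovery ioctl returned `rc` (0 on success). */
enum ex_action ex_after_recover(int rc)
{
	return rc == 0 ? EX_CONTINUE : EX_CLEAN_UP;
}
```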
357	
358	    When the DK_CXLFLASH_APP_CLOSE_ADAP_FD flag was returned on a successful
359	    attach, the application _must_ unmap and close the fd2 associated with the
360	    original context following this ioctl returning success and indicating that
361	    the context was recovered (DK_CXLFLASH_RECOVER_AFU_CONTEXT_RESET).
362	
363	DK_CXLFLASH_MANAGE_LUN
364	----------------------
365	    This ioctl is used to switch a LUN from a mode where it is available
366	    for file-system access (legacy), to a mode where it is set aside for
367	    exclusive user space access (superpipe). In case a LUN is visible
368	    across multiple ports and adapters, this ioctl is used to uniquely
369	    identify each LUN by its World Wide Node Name (WWNN).
370	
371	
372	CXL Flash Driver Host IOCTLs
373	============================
374	
375	    Each host adapter instance that is supported by the cxlflash driver
376	    has a special character device associated with it to enable a set of
377	    host management functions. These character devices are hosted in a
378	    class dedicated for cxlflash and can be accessed via /dev/cxlflash/*.
379	
380	    Applications can be written to perform various functions using the
381	    host ioctl APIs below.
382	
383	    The structure definitions for these IOCTLs are available in:
384	    uapi/scsi/cxlflash_ioctl.h
385	
386	HT_CXLFLASH_LUN_PROVISION
387	-------------------------
388	    This ioctl is used to create and delete persistent LUNs on cxlflash
389	    devices that lack an external LUN management interface. It is only
390	    valid when used with AFUs that support the LUN provision capability.
391	
392	    When sufficient space is available, LUNs can be created by specifying
393	    the target port to host the LUN and a desired size in 4K blocks. Upon
394	    success, the LUN ID and WWID of the created LUN will be returned and
395	    the SCSI bus can be scanned to detect the change in LUN topology. Note
396	    that partial allocations are not supported. Should a creation fail due
397	    to a space issue, the target port can be queried for its current LUN
398	    geometry.
399	
400	    To remove a LUN, the device must first be disassociated from the Linux
401	    SCSI subsystem. The LUN deletion can then be initiated by specifying a
402	    target port and LUN ID. Upon success, the LUN geometry associated with
403	    the port will be updated to reflect the new number of provisioned LUNs
404	    and available capacity.
405	
406	    To query the LUN geometry of a port, the target port is specified and
407	    upon success, the following information is presented:
408	
409	        - Maximum number of provisioned LUNs allowed for the port
410	        - Current number of provisioned LUNs for the port
411	        - Maximum total capacity of provisioned LUNs for the port (4K blocks)
412	        - Current total capacity of provisioned LUNs for the port (4K blocks)
413	
414	    With this information, the number of available LUNs and capacity can be
415	    calculated.
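
    The calculation is a pair of subtractions. In the sketch below the
    structure is a hypothetical mirror of the four values listed
    above, not the layout used by the ioctl, and the function names
    are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the four values returned by the query. */
struct ex_lun_geometry {
	uint64_t max_num_luns;  /* maximum provisioned LUNs for the port */
	uint64_t cur_num_luns;  /* current provisioned LUNs for the port */
	uint64_t max_cap;       /* maximum total capacity, 4K blocks     */
	uint64_t cur_cap;       /* current total capacity, 4K blocks     */
};

/* Number of LUNs that can still be created on the port. */
uint64_t ex_luns_available(const struct ex_lun_geometry *g)
{
	return g->max_num_luns > g->cur_num_luns ?
	       g->max_num_luns - g->cur_num_luns : 0;
}

/* Capacity still available on the port, in 4K blocks. */
uint64_t ex_blocks_available(const struct ex_lun_geometry *g)
{
	return g->max_cap > g->cur_cap ? g->max_cap - g->cur_cap : 0;
}

/* Partial allocations are not supported, so the whole request must fit. */
bool ex_can_create(const struct ex_lun_geometry *g, uint64_t size_4k)
{
	return ex_luns_available(g) > 0 &&
	       size_4k <= ex_blocks_available(g);
}
```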
416	
417	HT_CXLFLASH_AFU_DEBUG
418	---------------------
419	    This ioctl is used to debug AFUs by supporting a command pass-through
420	    interface. It is only valid when used with AFUs that support the AFU
421	    debug capability.
422	
423	    With the exception of buffer management, AFU debug commands are opaque to
424	    cxlflash and treated as pass-through. For debug commands that do require
425	    data transfer, the user supplies an adequately sized data buffer and must
426	    specify the data transfer direction with respect to the host. There is a
427	    maximum transfer size of 256K imposed. Note that partial read completions
428	    are not supported - when errors are experienced with a host read data
429	    transfer, the data buffer is not copied back to the user.
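
    An application could check a debug request against these
    constraints before issuing it. This is a hedged sketch: the names
    are hypothetical, and it assumes that "256K" means 256 * 1024
    bytes.

```c
#include <stdbool.h>
#include <stddef.h>

#define EX_DEBUG_MAX_XFER (256u * 1024u)  /* assumes 256K = 256 KiB */

/* Transfer direction with respect to the host. */
enum ex_dir { EX_DIR_NONE, EX_HOST_READ, EX_HOST_WRITE };

/* Checks a request locally: a command with data needs a buffer, a
 * direction, and a length within the maximum transfer size. */
bool ex_debug_request_ok(const void *buf, size_t len, enum ex_dir dir)
{
	if (len == 0)
		return true;  /* command without a data transfer */
	if (buf == NULL || dir == EX_DIR_NONE)
		return false;
	return len <= EX_DEBUG_MAX_XFER;
}
```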