About Kernel Documentation Linux Kernel Contact Linux Resources Linux Blog

Documentation / s390 / vfio-ccw.txt


Based on kernel version 4.16.1. Page generated on 2018-04-09 11:53 EST.

vfio-ccw: the basic infrastructure
==================================

Introduction
------------

Here we describe the vfio support for I/O subchannel devices for
Linux/s390. The motivation for vfio-ccw is to pass subchannels through
to a virtual machine, and vfio is the means to do so.

Unlike other hardware architectures, s390 defines a unified I/O access
method called Channel I/O. It has its own access patterns:
- Channel programs run asynchronously on a separate (co)processor.
- The channel subsystem will access any memory designated by the caller
  in the channel program directly, i.e. there is no iommu involved.
Thus when we introduce vfio support for these devices, we realize it
with a mediated device (mdev) implementation. The vfio mdev will be
added to an iommu group, so that it can be managed by the vfio
framework. We also add read/write callbacks for special vfio I/O
regions to pass the channel programs from the mdev to its parent device
(the real I/O subchannel device), which does further address
translation and performs the I/O instructions.

This document does not intend to explain the s390 I/O architecture in
every detail. More information can be found here:
- A good starting point for Channel I/O in general:
  https://en.wikipedia.org/wiki/Channel_I/O
- s390 architecture:
  s390 Principles of Operation manual (IBM Form. No. SA22-7832)
- The existing Qemu code, which implements a simple emulated channel
  subsystem, is also a good reference. It makes it easier to follow
  the flow.
  qemu/hw/s390x/css.c

For the vfio mediated device framework:
- Documentation/vfio-mediated-device.txt

Motivation of vfio-ccw
----------------------

Currently, a guest virtualized via qemu/kvm on s390 only sees
paravirtualized virtio devices via the "Virtio Over Channel I/O
(virtio-ccw)" transport. This makes virtio devices discoverable via
standard operating system algorithms for handling channel devices.

However, this is not enough. On s390, the majority of devices use the
standard Channel I/O based mechanism, and we also need to be able to
pass them through to a Qemu virtual machine. This includes devices that
don't have a virtio counterpart (e.g. tape drives) or that have
specific characteristics which guests want to exploit.

For passing a device to a guest, we want to use the same interface as
everybody else, namely vfio. Thus, we introduce vfio support for
channel devices and name this new vfio device "vfio-ccw".

Access patterns of CCW devices
------------------------------

The s390 architecture implements a so-called channel subsystem, which
provides a unified view of the devices physically attached to the
system. Although the s390 hardware platform knows about a huge variety
of different peripheral attachments like disk devices (aka. DASDs),
tapes, communication controllers, etc., they can all be accessed by a
well-defined access method, and they present I/O completion in a
unified way: I/O interruptions.

All I/O requires the use of channel command words (CCWs). A CCW is an
instruction to a specialized I/O channel processor. A channel program is
a sequence of CCWs which are executed by the I/O channel subsystem. To
issue a channel program to the channel subsystem, it is required to
build an operation request block (ORB), which specifies the format of
the CCWs and other control information to the system. The operating
system signals the I/O channel subsystem to begin executing the channel
program with a SSCH (start subchannel) instruction. The central
processor is then free to proceed with non-I/O instructions until
interrupted. The I/O completion result is received by the interrupt
handler in the form of an interrupt response block (IRB).
82	Back to vfio-ccw, in short:
83	- ORBs and channel programs are built in guest kernel (with guest
84	  physical addresses).
85	- ORBs and channel programs are passed to the host kernel.
86	- Host kernel translates the guest physical addresses to real addresses
87	  and starts the I/O with issuing a privileged Channel I/O instruction
88	  (e.g SSCH).
89	- channel programs run asynchronously on a separate processor.
90	- I/O completion will be signaled to the host with I/O interruptions.
91	  And it will be copied as IRB to user space to pass it back to the
92	  guest.

Physical vfio ccw device and its child mdev
-------------------------------------------

As mentioned above, we realize vfio-ccw with a mdev implementation.

Channel I/O does not have IOMMU hardware support, so the physical
vfio-ccw device does not have an IOMMU level translation or isolation.

Subchannel I/O instructions are all privileged instructions. When
handling the I/O instruction interception, vfio-ccw polices and
translates the channel program in software before it gets sent to
hardware.

Within this implementation, we have two drivers for two types of
devices:
- The vfio_ccw driver for the physical subchannel device.
  This is an I/O subchannel driver for the real subchannel device. It
  realizes a group of callbacks and registers to the mdev framework as a
  parent (physical) device. As a consequence, mdev provides vfio_ccw a
  generic interface (sysfs) to create mdev devices. A vfio mdev can
  then be created by vfio_ccw and added to the mediated bus. It is the
  vfio device that is added to an IOMMU group and a vfio group.
  vfio_ccw also provides an I/O region to accept channel program
  requests from user space and to store the I/O interrupt result for
  user space to retrieve. To notify user space of an I/O completion, it
  offers an interface to set up an eventfd for asynchronous signaling.

- The vfio_mdev driver for the mediated vfio ccw device.
  This is provided by the mdev framework. It is a vfio device driver for
  the mdev that is created by vfio_ccw.
  It realizes a group of vfio device driver callbacks, adds itself to a
  vfio group, and registers itself to the mdev framework as a mdev
  driver.
  It uses a vfio iommu backend that uses the existing map and unmap
  ioctls, but rather than programming them into an IOMMU for a device,
  it simply stores the translations for use by later requests. This
  means that a device programmed in a VM with guest physical addresses
  can have the vfio kernel convert that address to a process virtual
  address, pin the page and program the hardware with the host physical
  address in one step.
  For a mdev, the vfio iommu backend will not pin the pages during the
  VFIO_IOMMU_MAP_DMA ioctl. The mdev framework will only maintain a
  database of the iova<->vaddr mappings in this operation. The vfio
  iommu backend exports the vfio_pin_pages and vfio_unpin_pages
  interfaces for the physical devices to pin and unpin pages on demand.
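
The idea of storing translations at map time and resolving them only
when a request needs them can be sketched in plain C as follows. This
is a toy model under our own assumptions: the names mapping_db,
db_map() and db_lookup() are hypothetical, the table is a fixed-size
array, and no pages are actually pinned. It is not the data structure
used by the vfio iommu backend.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MAPPINGS 16

/* One recorded mapping: [iova, iova + size) -> [vaddr, vaddr + size). */
struct dma_mapping {
	uint64_t iova;
	uint64_t vaddr;
	uint64_t size;
};

struct mapping_db {
	struct dma_mapping map[MAX_MAPPINGS];
	size_t used;
};

/* Map step: only remember the range, pin nothing. */
static int db_map(struct mapping_db *db, uint64_t iova, uint64_t vaddr,
		  uint64_t size)
{
	if (db->used == MAX_MAPPINGS || size == 0)
		return -1;
	db->map[db->used++] = (struct dma_mapping){ iova, vaddr, size };
	return 0;
}

/* Later request: resolve one iova to the vaddr recorded for it. */
static int db_lookup(const struct mapping_db *db, uint64_t iova,
		     uint64_t *vaddr)
{
	for (size_t i = 0; i < db->used; i++) {
		const struct dma_mapping *m = &db->map[i];

		if (iova >= m->iova && iova - m->iova < m->size) {
			*vaddr = m->vaddr + (iova - m->iova);
			return 0;
		}
	}
	return -1;	/* iova was never mapped */
}
```

In this model the map step is cheap, and the cost of resolving (and, in
a real implementation, pinning) is paid only for the addresses that a
channel program actually references.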

Below is a high-level block diagram.

 +-------------+
 |             |
 | +---------+ | mdev_register_driver() +--------------+
 | |  Mdev   | +<-----------------------+              |
 | |  bus    | |                        | vfio_mdev.ko |
 | | driver  | +----------------------->+              |<-> VFIO user
 | +---------+ |    probe()/remove()    +--------------+    APIs
 |             |
 |  MDEV CORE  |
 |   MODULE    |
 |   mdev.ko   |
 | +---------+ | mdev_register_device() +--------------+
 | |Physical | +<-----------------------+              |
 | | device  | |                        |  vfio_ccw.ko |<-> subchannel
 | |interface| +----------------------->+              |     device
 | +---------+ |       callback         +--------------+
 +-------------+

The process of how these work together:
1. vfio_ccw.ko drives the physical I/O subchannel, and registers the
   physical device (with callbacks) to the mdev framework.
   When vfio_ccw probes the subchannel device, it registers the device
   pointer and callbacks to the mdev framework. Mdev related file nodes
   are then created in sysfs under the device node of the subchannel
   device, namely 'mdev_create', 'mdev_destroy' and
   'mdev_supported_types'.
2. Create a mediated vfio ccw device.
   Using the 'mdev_create' sysfs file, we need to manually create one
   (and only one for our case) mediated device.
3. vfio_mdev.ko drives the mediated ccw device.
   vfio_mdev is also the vfio device driver. It will probe the mdev and
   add it to an iommu_group and a vfio_group. Then we can pass the mdev
   through to a guest.

vfio-ccw I/O region
-------------------

An I/O region is used to accept channel program requests from user
space and to store the I/O interrupt result for user space to retrieve.
The definition of the region is:

struct ccw_io_region {
#define ORB_AREA_SIZE 12
	__u8	orb_area[ORB_AREA_SIZE];
#define SCSW_AREA_SIZE 12
	__u8	scsw_area[SCSW_AREA_SIZE];
#define IRB_AREA_SIZE 96
	__u8	irb_area[IRB_AREA_SIZE];
	__u32	ret_code;
} __packed;

When starting an I/O request, orb_area should be filled with the
guest ORB, and scsw_area should be filled with the SCSW of the virtual
subchannel.

irb_area stores the I/O result.

ret_code stores a return code for each access of the region.
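
As a rough illustration of how a user space program might use this
region, the sketch below mirrors the structure with fixed-width types
and wraps the two accesses in helpers. The helper names
submit_request() and fetch_irb() are hypothetical, and we assume that
the caller already holds the vfio device fd and knows the offset of the
region within it. Whether a given program accesses the whole region in
one read or write is a design choice of that program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define ORB_AREA_SIZE 12
#define SCSW_AREA_SIZE 12
#define IRB_AREA_SIZE 96

/* User space mirror of the region, using uint8_t/uint32_t for __u8/__u32. */
struct ccw_io_region {
	uint8_t  orb_area[ORB_AREA_SIZE];
	uint8_t  scsw_area[SCSW_AREA_SIZE];
	uint8_t  irb_area[IRB_AREA_SIZE];
	uint32_t ret_code;
} __attribute__((packed));

/*
 * Submit a request: fill in the ORB and SCSW, then write the region.
 * fd and region_offset are assumed to be supplied by the caller.
 */
static int submit_request(int fd, off_t region_offset,
			  const uint8_t orb[ORB_AREA_SIZE],
			  const uint8_t scsw[SCSW_AREA_SIZE])
{
	struct ccw_io_region region;

	memset(&region, 0, sizeof(region));
	memcpy(region.orb_area, orb, ORB_AREA_SIZE);
	memcpy(region.scsw_area, scsw, SCSW_AREA_SIZE);

	if (pwrite(fd, &region, sizeof(region), region_offset)
	    != (ssize_t)sizeof(region))
		return -1;
	return 0;
}

/* Fetch the result: read the region back and copy out the IRB. */
static int fetch_irb(int fd, off_t region_offset,
		     uint8_t irb[IRB_AREA_SIZE])
{
	struct ccw_io_region region;

	if (pread(fd, &region, sizeof(region), region_offset)
	    != (ssize_t)sizeof(region))
		return -1;
	memcpy(irb, region.irb_area, IRB_AREA_SIZE);
	return 0;
}
```

With the sizes given above, the region occupies 124 bytes: orb_area at
offset 0, scsw_area at offset 12, irb_area at offset 24 and ret_code at
offset 120.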

vfio-ccw patches overview
-------------------------

For now, our patches are rebased on the latest mdev implementation.
vfio-ccw follows what vfio-pci did on the s390 platform and uses
vfio-iommu-type1 as the vfio iommu backend. It's a good start to launch
the code review for vfio-ccw. Note that the implementation is far from
complete yet, but we'd like to get feedback on the general
architecture.

* CCW translation APIs
- Description:
  These introduce a group of APIs (starting with 'cp_') to do CCW
  translation. The CCWs passed in by a user space program are
  organized with their guest physical memory addresses. These APIs
  will copy the CCWs into kernel space, and assemble a runnable
  kernel channel program by replacing the guest physical addresses with
  their corresponding host physical addresses.
- Patches:
  vfio: ccw: introduce channel program interfaces
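
The copy-then-rewrite idea behind the translation can be sketched as
follows. This is a simplified model and not the 'cp_' implementation:
sketch_translate_chain(), translate_fn and toy_translate() are
hypothetical names, the address mapping is a fixed offset chosen for
illustration, and the sketch assumes that every data area is contiguous
in host memory, which a real translation cannot assume.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Format-1 CCW: command code, flags, byte count, data address. */
struct ccw1 {
	uint8_t  cmd_code;
	uint8_t  flags;
	uint16_t count;
	uint32_t cda;
} __attribute__((packed, aligned(8)));

/* Hypothetical translation hook: guest physical -> host physical. */
typedef int (*translate_fn)(uint32_t guest_addr, uint32_t *host_addr);

/*
 * Copy a guest chain into a private buffer and rewrite each data
 * address. Returns 0 on success and -1 if an address cannot be
 * translated; in that case the copy must not be issued.
 */
static int sketch_translate_chain(const struct ccw1 *guest,
				  struct ccw1 *host, size_t len,
				  translate_fn translate)
{
	memcpy(host, guest, len * sizeof(*guest));
	for (size_t i = 0; i < len; i++) {
		uint32_t haddr;

		if (translate(host[i].cda, &haddr) != 0)
			return -1;
		host[i].cda = haddr;
	}
	return 0;
}

/* Toy mapping for illustration: guest memory starts at host 0x40000000. */
static int toy_translate(uint32_t guest_addr, uint32_t *host_addr)
{
	if (guest_addr >= 0x10000000u)
		return -1;	/* outside the toy guest's memory */
	*host_addr = 0x40000000u + guest_addr;
	return 0;
}
```

Working on a private copy means the guest cannot modify the channel
program after its addresses have been checked and rewritten, and the
guest's own copy keeps its guest physical addresses.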

* vfio_ccw device driver
- Description:
  The following patches utilize the CCW translation APIs and introduce
  vfio_ccw, which is the driver for the I/O subchannel devices you want
  to pass through.
  vfio_ccw implements the following vfio ioctls:
    VFIO_DEVICE_GET_INFO
    VFIO_DEVICE_GET_IRQ_INFO
    VFIO_DEVICE_GET_REGION_INFO
    VFIO_DEVICE_RESET
    VFIO_DEVICE_SET_IRQS
  This provides an I/O region, so that the user space program can pass a
  channel program to the kernel, which does further CCW translation
  before issuing it to a real device.
  This also provides the SET_IRQS ioctl to set up an event notifier that
  notifies the user space program of the I/O completion in an
  asynchronous way.
- Patches:
  vfio: ccw: basic implementation for vfio_ccw driver
  vfio: ccw: introduce ccw_io_region
  vfio: ccw: realize VFIO_DEVICE_GET_REGION_INFO ioctl
  vfio: ccw: realize VFIO_DEVICE_RESET ioctl
  vfio: ccw: realize VFIO_DEVICE_G(S)ET_IRQ_INFO ioctls

The user of vfio-ccw is not limited to Qemu, but Qemu is definitely a
good example for understanding how these patches work. Here is a little
more detail on how an I/O request triggered by the Qemu guest will be
handled (without error handling).

Explanation:
Q1-Q7: Qemu side process.
K1-K6: Kernel side process.

Q1. Get I/O region info during initialization.
Q2. Set up event notifier and handler to handle I/O completion.

... ...

Q3. Intercept a ssch instruction.
Q4. Write the guest channel program and ORB to the I/O region.
    K1. Copy from guest to kernel.
    K2. Translate the guest channel program to a host kernel space
        channel program, which becomes runnable for a real device.
    K3. With the necessary information contained in the orb passed in
        by Qemu, issue the ccwchain to the device.
    K4. Return the ssch CC code.
Q5. Return the CC code to the guest.

... ...

    K5. Interrupt handler gets the I/O result and writes the result to
        the I/O region.
    K6. Signal Qemu to retrieve the result.
Q6. Get the signal and event handler reads out the result from the I/O
    region.
Q7. Update the irb for the guest.
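
The signaling in K6 and Q6 can be tried out in user space with a plain
eventfd. In the sketch below, signal_completion() stands in for the
kernel side and wait_completion() for the handler in user space. Both
names are hypothetical, and the sketch leaves out how the eventfd is
handed to the driver in the first place.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* K6 stand-in: bump the eventfd counter to announce one completion. */
static int signal_completion(int efd)
{
	uint64_t one = 1;

	if (write(efd, &one, sizeof(one)) != (ssize_t)sizeof(one))
		return -1;
	return 0;
}

/*
 * Q6 stand-in: block until signaled, then return how many completions
 * were announced since the last read. Reading resets the counter.
 */
static int64_t wait_completion(int efd)
{
	uint64_t count;

	if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		return -1;
	return (int64_t)count;
}
```

Since an eventfd only carries a counter, the handler still has to read
the I/O region afterwards to obtain the actual result.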

Limitations
-----------

The current vfio-ccw implementation focuses on supporting the basic
commands needed to implement block device functionality (read/write)
of DASD/ECKD devices only. Some commands may need special handling in
the future, for example, anything related to path grouping.

DASD is a kind of storage device, while ECKD is a data recording format.
More information on DASD and ECKD can be found here:
https://en.wikipedia.org/wiki/Direct-access_storage_device
https://en.wikipedia.org/wiki/Count_key_data

Together with the corresponding work in Qemu, we can now bring the
passed-through DASD/ECKD device online in a guest and use it as a block
device.

Reference
---------
1. ESA/s390 Principles of Operation manual (IBM Form. No. SA22-7832)
2. ESA/390 Common I/O Device Commands manual (IBM Form. No. SA22-7204)
3. https://en.wikipedia.org/wiki/Channel_I/O
4. Documentation/s390/cds.txt
5. Documentation/vfio.txt
6. Documentation/vfio-mediated-device.txt