Documentation / networking / kcm.txt




Based on kernel version 4.9. Page generated on 2016-12-21 14:36 EST.

1	Kernel Connection Multiplexor
2	-----------------------------
3	
4	Kernel Connection Multiplexor (KCM) is a mechanism that provides a message based
5	interface over TCP for generic application protocols. With KCM an application
6	can efficiently send and receive application protocol messages over TCP using
7	datagram sockets.
8	
9	KCM implements an NxM multiplexor in the kernel as diagrammed below:
10	
11	+------------+   +------------+   +------------+   +------------+
12	| KCM socket |   | KCM socket |   | KCM socket |   | KCM socket |
13	+------------+   +------------+   +------------+   +------------+
14	      |                 |               |                |
15	      +-----------+     |               |     +----------+
16	                  |     |               |     |
17	               +----------------------------------+
18	               |           Multiplexor            |
19	               +----------------------------------+
20	                 |   |           |           |  |
21	       +---------+   |           |           |  ------------+
22	       |             |           |           |              |
23	+----------+  +----------+  +----------+  +----------+ +----------+
24	|  Psock   |  |  Psock   |  |  Psock   |  |  Psock   | |  Psock   |
25	+----------+  +----------+  +----------+  +----------+ +----------+
26	      |              |           |            |             |
27	+----------+  +----------+  +----------+  +----------+ +----------+
28	| TCP sock |  | TCP sock |  | TCP sock |  | TCP sock | | TCP sock |
29	+----------+  +----------+  +----------+  +----------+ +----------+
30	
31	KCM sockets
32	-----------
33	
34	The KCM sockets provide the user interface to the multiplexor. All the KCM sockets
35	bound to a multiplexor are considered to have equivalent function, and I/O
36	operations in different sockets may be done in parallel without the need for
37	synchronization between threads in userspace.
38	
39	Multiplexor
40	-----------
41	
42	The multiplexor provides the message steering. In the transmit path, messages
43	written on a KCM socket are sent atomically on an appropriate TCP socket.
44	Similarly, in the receive path, messages are constructed on each TCP socket
45	(Psock) and complete messages are steered to a KCM socket.
46	
47	TCP sockets & Psocks
48	--------------------
49	
50	TCP sockets may be bound to a KCM multiplexor. A Psock structure is allocated
51	for each bound TCP socket; this structure holds the state for constructing
52	messages on receive as well as other connection-specific information for KCM.
53	
54	Connected mode semantics
55	------------------------
56	
57	Each multiplexor assumes that all attached TCP connections are to the same
58	destination and can use the different connections for load balancing when
59	transmitting. The normal send and recv calls (including sendmmsg and recvmmsg)
60	can be used to send and receive messages from the KCM socket.
61	
62	Socket types
63	------------
64	
65	KCM supports SOCK_DGRAM and SOCK_SEQPACKET socket types.
66	
67	Message delineation
68	-------------------
69	
70	Messages are sent over a TCP stream with some application protocol message
71	format that typically includes a header which frames the messages. The length
72	of a received message can be deduced from the application protocol header
73	(often just a simple length field).
74	
75	A TCP stream must be parsed to determine message boundaries. Berkeley Packet
76	Filter (BPF) is used for this. When attaching a TCP socket to a multiplexor a
77	BPF program must be specified. The program is called at the start of receiving
78	a new message and is given an skbuff that contains the bytes received so far.
79	It parses the message header and returns the length of the message. Given this
80	information, KCM will construct the message of the stated length and deliver it
81	to a KCM socket.
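
The framing rule described above can be illustrated in user space. The
following is a minimal sketch, not KCM code: it assumes a 4-byte big-endian
length prefix that counts only the payload (the same rule as the Thrift BPF
program shown later in this document), and the names frame_length and
count_complete_messages are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: total length (header + payload) of the message that
 * starts at buf, or 0 if fewer than 4 header bytes are available so far.
 * Assumes a 4-byte big-endian payload length prefix. */
static size_t frame_length(const uint8_t *buf, size_t avail)
{
	uint32_t payload;

	if (avail < 4)
		return 0;
	payload = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
		  ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
	return (size_t)payload + 4;
}

/* Count the complete messages at the front of a byte stream and report how
 * many bytes they occupy. A trailing partial message is left unconsumed,
 * much as a receiver must wait for the rest of it to arrive. */
static size_t count_complete_messages(const uint8_t *stream, size_t len,
				      size_t *consumed)
{
	size_t off = 0, n = 0;

	for (;;) {
		size_t need = frame_length(stream + off, len - off);

		if (need == 0 || need > len - off)
			break;
		off += need;
		n++;
	}
	*consumed = off;
	return n;
}
```

In KCM itself only the equivalent of frame_length is supplied by the
application (as a BPF program); assembling and delivering the message is
done by the kernel.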
82	
83	TCP socket management
84	---------------------
85	
86	When a TCP socket is attached to a KCM multiplexor, data ready (POLLIN) and
87	write space available (POLLOUT) events are handled by the multiplexor. If there
88	is a state change (disconnection) or other error on a TCP socket, an error is
89	posted on the TCP socket so that a POLLERR event happens and KCM discontinues
90	using the socket. When the application gets the error notification for a
91	TCP socket, it should unattach the socket from KCM and then handle the error
92	condition (the typical response is to close the socket and create a new
93	connection if necessary).
94	
95	KCM limits the maximum receive message size to be the size of the receive
96	socket buffer on the attached TCP socket (the socket buffer size can be set by
97	SO_RCVBUF). If the length of a new message reported by the BPF program is
98	greater than this limit, a corresponding error (EMSGSIZE) is posted on the TCP
99	socket. The BPF program may also enforce a maximum message size and report an
100	error when it is exceeded.
101	
102	A timeout may be set for assembling messages on a receive socket. The timeout
103	value is taken from the receive timeout of the attached TCP socket (this is set
104	by SO_RCVTIMEO). If the timer expires before assembly is complete, an error
105	(ETIMEDOUT) is posted on the socket.
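
Both limits above are ordinary socket options on the TCP socket. The
following is a minimal sketch of setting them before the socket is attached;
the helper name configure_transport and its parameters are hypothetical. Note
that the kernel may adjust the requested buffer size (see socket(7)), so an
application that depends on an exact limit should read the value back with
getsockopt.

```c
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper: set the receive buffer size (which bounds the
 * maximum receive message size) and the receive timeout (used as the
 * message assembly timeout) on a TCP socket.
 * Returns 0 on success, -1 on error with errno set by setsockopt. */
static int configure_transport(int tcpfd, int rcvbuf_bytes,
			       long assembly_timeout_ms)
{
	struct timeval tv;

	if (setsockopt(tcpfd, SOL_SOCKET, SO_RCVBUF,
		       &rcvbuf_bytes, sizeof(rcvbuf_bytes)) < 0)
		return -1;

	tv.tv_sec = assembly_timeout_ms / 1000;
	tv.tv_usec = (assembly_timeout_ms % 1000) * 1000;
	if (setsockopt(tcpfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
		return -1;

	return 0;
}
```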
106	
107	User interface
108	==============
109	
110	Creating a multiplexor
111	----------------------
112	
113	A new multiplexor and initial KCM socket are created by a socket call:
114	
115	  socket(AF_KCM, type, protocol)
116	
117	  - type is either SOCK_DGRAM or SOCK_SEQPACKET
118	  - protocol is KCMPROTO_CONNECTED
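
A minimal sketch of the call with error handling follows. It assumes the
kernel was built with KCM support (CONFIG_AF_KCM) and that linux/kcm.h is
installed; the helper name kcm_open is hypothetical, and the AF_KCM fallback
is only for C libraries whose headers predate KCM.

```c
#include <errno.h>
#include <sys/socket.h>
#include <linux/kcm.h>		/* KCMPROTO_CONNECTED */

#ifndef AF_KCM
#define AF_KCM 41		/* value used by the kernel; fallback only */
#endif

/* Hypothetical helper: create a new multiplexor and its first KCM socket.
 * type is SOCK_DGRAM or SOCK_SEQPACKET. Returns the new descriptor, or -1
 * with errno set (for example when the kernel lacks KCM support). */
static int kcm_open(int type)
{
	if (type != SOCK_DGRAM && type != SOCK_SEQPACKET) {
		errno = EINVAL;
		return -1;
	}
	return socket(AF_KCM, type, KCMPROTO_CONNECTED);
}
```

A caller would typically check for -1 and fall back to plain TCP handling
when KCM is not available.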
119	
120	Cloning KCM sockets
121	-------------------
122	
123	After the first KCM socket is created using the socket call as described
124	above, additional sockets for the multiplexor can be created by cloning
125	a KCM socket. This is accomplished by an ioctl on a KCM socket:
126	
127	  /* From linux/kcm.h */
128	  struct kcm_clone {
129	        int fd;
130	  };
131	
132	  struct kcm_clone info;
133	
134	  memset(&info, 0, sizeof(info));
135	
136	  err = ioctl(kcmfd, SIOCKCMCLONE, &info);
137	
138	  if (!err)
139	    newkcmfd = info.fd;
140	
141	Attach transport sockets
142	------------------------
143	
144	Attaching of transport sockets to a multiplexor is performed by calling an
145	ioctl on a KCM socket for the multiplexor, e.g.:
146	
147	  /* From linux/kcm.h */
148	  struct kcm_attach {
149	        int fd;
150	        int bpf_fd;
151	  };
152	
153	  struct kcm_attach info;
154	
155	  memset(&info, 0, sizeof(info));
156	
157	  info.fd = tcpfd;
158	  info.bpf_fd = bpf_prog_fd;
159	
160	  ioctl(kcmfd, SIOCKCMATTACH, &info);
161	
162	The kcm_attach structure contains:
163	  fd: file descriptor for the TCP socket being attached
164	  bpf_fd: file descriptor for the compiled BPF program that has been loaded
165	
166	Unattach transport sockets
167	--------------------------
168	
169	Unattaching a transport socket from a multiplexor is straightforward. An
170	"unattach" ioctl is done with the kcm_unattach structure as the argument:
171	
172	  /* From linux/kcm.h */
173	  struct kcm_unattach {
174	        int fd;
175	  };
176	
177	  struct kcm_unattach info;
178	
179	  memset(&info, 0, sizeof(info));
180	
181	  info.fd = tcpfd;
182	
183	  ioctl(kcmfd, SIOCKCMUNATTACH, &info);
184	
185	Disabling receive on KCM socket
186	-------------------------------
187	
188	A setsockopt is used to disable or enable receiving on a KCM socket.
189	When receive is disabled, any pending messages in the socket's
190	receive buffer are moved to other sockets. This feature is useful
191	if an application thread knows that it will be doing a lot of
192	work on a request and won't be able to service new messages for a
193	while. Example use:
194	
195	  int val = 1;
196	
197	  setsockopt(kcmfd, SOL_KCM, KCM_RECV_DISABLE, &val, sizeof(val));
198	
199	BPF programs for message delineation
200	------------------------------------
201	
202	BPF programs can be compiled using the BPF LLVM backend. For example,
203	the BPF program for parsing Thrift is:
204	
205	  #include "bpf.h" /* for __sk_buff */
206	  #include "bpf_helpers.h" /* for load_word intrinsic */
207	
208	  SEC("socket_kcm")
209	  int bpf_prog1(struct __sk_buff *skb)
210	  {
211	       return load_word(skb, 0) + 4;
212	  }
213	
214	  char _license[] SEC("license") = "GPL";
215	
216	Use in applications
217	===================
218	
219	KCM accelerates application layer protocols. Specifically, it allows
220	applications to use a message based interface for sending and receiving
221	messages. The kernel provides necessary assurances that messages are sent
222	and received atomically. This relieves much of the burden applications have
223	in mapping a message based protocol onto the TCP stream. KCM also makes
224	application layer messages a unit of work in the kernel for the purposes of
225	steering and scheduling, which in turn allows a simpler networking model in
226	multithreaded applications.
227	
228	Configurations
229	--------------
230	
231	In an Nx1 configuration, KCM logically provides multiple socket handles
232	to the same TCP connection. This allows parallelism between I/O
233	operations on the TCP socket (for instance, copyin and copyout of data are
234	parallelized). In an application, a KCM socket can be opened for each
235	processing thread and inserted into the epoll set (similar to how SO_REUSEPORT
236	is used to allow multiple listener sockets on the same port).
237	
238	In an MxN configuration, multiple connections are established to the
239	same destination. These are used for simple load balancing.
240	
241	Message batching
242	----------------
243	
244	The primary purpose of KCM is load balancing between KCM sockets and hence
245	threads in a nominal use case. Perfect load balancing, that is steering
246	each received message to a different KCM socket or steering each sent
247	message to a different TCP socket, can negatively impact performance
248	since this doesn't allow for affinities to be established. Balancing
249	based on groups, or batches of messages, can be beneficial for performance.
250	
251	On transmit, there are three ways an application can batch (pipeline)
252	messages on a KCM socket:
253	  1) Send multiple messages in a single sendmmsg.
254	  2) Send a group of messages each with a sendmsg call, where all messages
255	     except the last have MSG_BATCH in the flags of the sendmsg call.
256	  3) Create a "super message" composed of multiple messages and send this
257	     with a single sendmsg.
258	
259	On receive, the KCM module attempts to queue messages received on the
260	same KCM socket during each TCP ready callback. The targeted KCM socket
261	changes at each receive ready callback on the KCM socket. The application
262	does not need to configure this.
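
The second transmit option (one sendmsg call per message, with MSG_BATCH on
all but the last) can be sketched as follows. This is an illustration under
assumptions, not a reference implementation: the helper name send_batch is
hypothetical, fd is expected to be a KCM socket, and the MSG_BATCH fallback
value is the one used by the kernel headers for C libraries that do not
define it.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef MSG_BATCH
#define MSG_BATCH 0x40000	/* kernel value; fallback for older libc */
#endif

/* Hypothetical helper: send n messages, one sendmsg call each. Every
 * message except the last carries MSG_BATCH to signal that more messages
 * follow. Returns the number of messages sent, or -1 if none was sent. */
static int send_batch(int fd, struct iovec *msgs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		struct msghdr mh;
		int flags = (i + 1 < n) ? MSG_BATCH : 0;

		memset(&mh, 0, sizeof(mh));
		mh.msg_iov = &msgs[i];	/* one iovec per message */
		mh.msg_iovlen = 1;

		if (sendmsg(fd, &mh, flags) < 0)
			return i ? (int)i : -1;
	}
	return (int)n;
}
```

Each iovec here describes one complete application message, so message
boundaries are the same as with n separate unbatched sendmsg calls.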
263	
264	Error handling
265	--------------
266	
267	An application should include a thread to monitor errors raised on
268	the TCP connection. Normally, this will be done by placing each
269	TCP socket attached to a KCM multiplexor in an epoll set for POLLERR
270	events. If an error occurs on an attached TCP socket, KCM sets EPIPE on
271	the socket, thus waking up the application thread. When the application
272	sees the error (which may just be a disconnect) it should unattach the
273	socket from KCM and then close it. It is assumed that once an error is
274	posted on the TCP socket the data stream is unrecoverable (i.e. an error
275	may have occurred in the middle of receiving a message).
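
The monitoring thread described above could be built around two small
helpers like the ones below. This is a sketch only: the names
watch_transport and wait_failed_transport are hypothetical, and the
unattach and close steps are left to the caller.

```c
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>

/* Hypothetical helper: add an attached TCP socket to the epoll set.
 * Only error conditions are requested; data ready and write space events
 * are left to the multiplexor. Returns 0 on success, -1 on error. */
static int watch_transport(int epfd, int tcpfd)
{
	struct epoll_event ev;

	memset(&ev, 0, sizeof(ev));
	ev.events = EPOLLERR;	/* EPOLLHUP is always reported as well */
	ev.data.fd = tcpfd;
	return epoll_ctl(epfd, EPOLL_CTL_ADD, tcpfd, &ev);
}

/* Hypothetical helper: wait up to timeout_ms for one watched socket to
 * report an error or hangup. Returns that descriptor and stores its
 * pending SO_ERROR value in *err, or returns -1 on timeout or failure. */
static int wait_failed_transport(int epfd, int timeout_ms, int *err)
{
	struct epoll_event ev;
	socklen_t len = sizeof(*err);

	if (epoll_wait(epfd, &ev, 1, timeout_ms) <= 0)
		return -1;
	if (!(ev.events & (EPOLLERR | EPOLLHUP)))
		return -1;

	*err = 0;
	if (getsockopt(ev.data.fd, SOL_SOCKET, SO_ERROR, err, &len) < 0)
		return -1;

	/* The caller should now unattach this socket from KCM
	 * (SIOCKCMUNATTACH) and then close it. */
	return ev.data.fd;
}
```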
276	
277	TCP connection monitoring
278	-------------------------
279	
280	In KCM there is no means to correlate a message to the TCP socket that
281	was used to send or receive the message (except in the case there is
282	only one attached TCP socket). However, the application does retain
283	an open file descriptor to the socket so it will be able to get statistics
284	from the socket which can be used in detecting issues (such as high
285	retransmissions on the socket).

Information is copyright its respective author. All material is available from the Linux Kernel Source distributed under a GPL License. This page is provided as a free service by mjmwired.net.